So, I'm trying to extract data from Reddit posts using praw and convert it into a JSON Lines file.
I need something like this:
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?"], "response": ["Debug Stick?"], "id": "gabsj3"}
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?", "Debug Stick?"], "response": ["My guess is the dot is flat out gone\n\nThere's no way for it to exist so why would they leave it in"], "id": "gabsj3"}
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?", "Debug Stick?", "My guess is the dot is flat out gone\n\nThere's no way for it to exist so why would they leave it in"], "response": ["No, it's still in the game. Use the debug stick to set all sides to `none`"], "id": "gabsj3"}
So the context contains ["POST TITLE", "FIRST-LEVEL COMMENT", "SECOND-LEVEL COMMENT", "ETC..."], and the response contains the last-level comment. For this Reddit post, the output should also contain:
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?", "Debug Stick?", "My guess is the dot is flat out gone\n\nThere's no way for it to exist so why would they leave it in", "No, it's still in the game. Use the debug stick to set all sides to `none`"], "response": ["Huh, alright"], "id": "gabsj3"}
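To make the target shape concrete, here is a minimal sketch of the traversal I think is needed, run on a plain nested dict rather than on live praw objects. The names `build_records` and `walk`, the sample data, and the dict layout (`body`, `replies`) are placeholders I made up for illustration, not praw API. With praw I would presumably read `comment.body` and `comment.replies` instead.

```python
import json


def build_records(title, comments, post_id):
    """Yield one record per reply, with the full chain above it as context.

    `comments` is a hypothetical list of dicts shaped like
    {"body": str, "replies": [...]}, standing in for praw comments.
    """
    def walk(node, context):
        # Everything above the replies, including this comment, is their context.
        context = context + [node["body"]]
        for reply in node.get("replies", []):
            yield {"context": list(context), "response": [reply["body"]], "id": post_id}
            # Recurse so deeper replies get a longer context.
            yield from walk(reply, context)

    for top_level in comments:
        yield from walk(top_level, [title])


# Made-up sample thread: one top-level comment with two direct replies,
# one of which has a reply of its own.
sample = [
    {"body": "first-level", "replies": [
        {"body": "second-level A", "replies": [
            {"body": "third-level", "replies": []},
        ]},
        {"body": "second-level B", "replies": []},
    ]},
]

for record in build_records("post title", sample, "abc123"):
    # One JSON object per line is the JSON Lines layout.
    print(json.dumps(record))
```

Each reply becomes its own line, and sibling replies share the same context instead of being merged into one response list.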
But the output of my code looks like this:
{"context": ["Cross your redstone wires - Snapshot 20w18a is out", "But how will people get a blood spot effect now if the redstone default is a cross again?"], "response": ["Debug Stick?", "I think we can still use resource packs to change it back into a dot, I don't know so don't quote me on that", "I honestly think the cross redstone looks a bit more like a splatter."], "id": "gabsj3"}
Here is my code:
import praw
import jsonlines

reddit = praw.Reddit(client_id='-', client_secret='-', user_agent='user_agent')
max = 1000
sequence = 1
for post in reddit.subreddit('minecraft').new(limit=max):
    data = []
    title = []
    comment = []
    response = []
    post_id = post.id
    titl = post.title
    # print("https://www.reddit.com/"+post.permalink)
    print("Fetched " + str(sequence) + " posts .. ")
    title.append(titl)
    try:
        submission = reddit.submission(id=post_id)
        submission.comments.replace_more(limit=None)
        sequence = sequence + 1
        for top_level_comment in submission.comments:
            cmnt_body = top_level_comment.body
            comment.append(cmnt_body)
            for second_level_comment in top_level_comment.replies:
                response.append(second_level_comment.body)
            context = [title[0], comment[0]]
            data.append({"context": context, "response": response, "id": post_id})
            response = []
            # print(data[0])
            with jsonlines.open('2020-04-30_12.jsonl', mode='a') as writer:
                writer.write(data.pop())
            comment.pop()
        title.pop()
    except Exception:
        pass