JSONDecodeError when streaming tweets
0 votes
/ December 17, 2018

I'm using Twython to fetch live tweets and append them to a JSON file. Because the tweets are dumped into the file one after another, I'm trying to separate the objects and reformat the file to avoid the "Multiple JSON root elements" error. This is the code I use:

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        with open('fetched_tweets.json', 'a') as tf:
            json.dump(data, tf)
            tf.write("\n")
            time.sleep(10)

            content = open('fetched_tweets.json', "r").read()
            n = [json.loads(str(item)) for item in content.strip().split('\n')]
            with open('test.json', 'w') as m:
                json.dump(n, m)
            return True

    def on_error(self, status):
        print(status)

My problem: the code gives me the following error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

What I don't understand: if I run the following code on its own, it works fine (it reformats the JSON data dumped into fetched_tweets.json into a valid JSON file, 'test.json'), but it raises the error once I add it to the main script:

content = open('fetched_tweets.json', "r").read()
n = [json.loads(str(item)) for item in content.strip().split('\n')]
with open('test.json', 'w') as m:
    json.dump(n, m)

What I need: I need to run everything in the same script without any errors.

Note: I'm using a Jupyter notebook.

EDIT: the data in fetched_tweets.json looks like the API JSON response at this link: https://gist.github.com/hrp/900964. I use the code below to put each tweet on its own line:

tf.write("\n")

Then I use:

content = open('fetched_tweets.json', "r").read()
n = [json.loads(str(item)) for item in content.strip().split('\n')]
with open('test.json', 'w') as m:
    json.dump(n, m)

to reformat the file into a valid JSON file.
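The conversion described in this edit (one JSON object per line, then a single list written out as one document) can be sketched on in-memory text as follows. This is only an illustration: the helper name jsonl_to_list and the sample records are made up, and the blank-line check is an added safeguard that is not in the original snippet.

```python
import json


def jsonl_to_list(text):
    """Parse text holding one JSON object per line into a list of objects."""
    # Skip blank lines: json.loads("") raises JSONDecodeError.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical stand-in for the contents of fetched_tweets.json.
sample = '{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}\n'
tweets = jsonl_to_list(sample)

# Serialising the list yields one JSON document with a single root element.
as_single_document = json.dumps(tweets)
print(as_single_document)
```

Splitting on newlines is safe for this layout because json.dump escapes line breaks inside strings as \n, so each record stays on one physical line.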

Sample data from the 'fetched_tweets.json' file before applying content.strip():

{"created_at": "Mon Dec 17 22:38:45 +0000 2018", "id": 1074796067898748929, "id_str": "1074796067898748929", "text": "RT @robreiner: It\u2019s clear. The President is a criminal. He has committed felonies. In the United States of America no one is above the law.\u2026", "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "truncated": false, "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 817409784777506816, "id_str": "817409784777506816", "name": "UNrealDonaldTrump", "screen_name": "UNreal_Donald_T", "location": "Europe", "url": null, "description": "\ud83c\uddec\ud83c\udde7Old Fernebergian. Follows NASCAR & NFL. Don't make me laugh; it hurts my back\ud83d\ude00. Ex-adman MIPA(retd). Thank gawd for KODI & a VPN.", "translator_type": "none", "protected": false, "verified": false, "followers_count": 367, "friends_count": 103, "listed_count": 12, "favourites_count": 65139, "statuses_count": 51622, "created_at": "Fri Jan 06 16:37:34 +0000 2017", "utc_offset": null, "time_zone": null, "geo_enabled": false, "lang": "en", "contributors_enabled": false, "is_translator": false, "profile_background_color": "000000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, "profile_link_color": "000000", "profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "000000", "profile_text_color": "000000", "profile_use_background_image": false, "profile_image_url": "http://pbs.twimg.com/profile_images/1011365968805875712/Sdu90pe9_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1011365968805875712/Sdu90pe9_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/817409784777506816/1494141668", "default_profile": 
false, "default_profile_image": false, "following": null, "follow_request_sent": null, "notifications": null}, "geo": null, "coordinates": null, "place": null, "contributors": null, "retweeted_status": {"created_at": "Mon Dec 17 15:21:01 +0000 2018", "id": 1074685908496961537, "id_str": "1074685908496961537", "text": "It\u2019s clear. The President is a criminal. He has committed felonies. In the United States of America no one is above\u2026 https://t.co/", "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "truncated": true, "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 738080573365702657, "id_str": "738080573365702657", "name": "Rob Reiner", "screen_name": "robreiner", "location": "California, USA", "url": null, "description": "Filmmaker, actor, producer, husband, and father.", "translator_type": "none", "protected": false, "verified": true, "followers_count": 539787, "friends_count": 277, "listed_count": 2709, "favourites_count": 67957, "statuses_count": 2313, "created_at": "Wed Jun 01 18:51:36 +0000 2016", "utc_offset": null, "time_zone": null, "geo_enabled": false, "lang": "en", "contributors_enabled": false, "is_translator": false, "profile_background_color": "F5F8FA", "profile_background_image_url": "", "profile_background_image_url_https": "", "profile_background_tile": false, "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "profile_image_url": "http://pbs.twimg.com/profile_images/740361916883730432/B44FKZvz_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/740361916883730432/B44FKZvz_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/738080573365702657/1517362906", "default_profile": true, 
"default_profile_image": false, "following": null, "follow_request_sent": null, "notifications": null}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "extended_tweet": {"full_text": "It\u2019s clear. The President is a criminal. He has committed felonies. In the United States of America no one is above the law. There is nothing in the Constitution that says a President can\u2019t be indicted. Donald Trump must be indicted.", "display_text_range": [0, 233], "entities": {"hashtags": [], "urls": [], "user_mentions": [], "symbols": []}}, "quote_count": 312, "reply_count": 776, "retweet_count": 7790, "favorite_count": 29009, "entities": {"hashtags": [], "urls": [{"url": "https://t.co", "expanded_url": "https://twitter.com/i/web/status/1074685908496961537", "display_url": "twitter.com/i/web/status/1\u2026", "indices": [117, 140]}], "user_mentions": [], "symbols": []}, "favorited": false, "retweeted": false, "filter_level": "low", "lang": "en"}, "is_quote_status": false, "quote_count": 0, "reply_count": 0, "retweet_count": 0, "favorite_count": 0, "entities": {"hashtags": [], "urls": [], "user_mentions": [{"screen_name": "robreiner", "name": "Rob Reiner", "id": 738080573365702657, "id_str": "738080573365702657", "indices": [3, 13]}], "symbols": []}, "favorited": false, "retweeted": false, "filter_level": "low", "lang": "en", "timestamp_ms": "1545086325989"}
{"created_at": "Mon Dec 17 22:38:46 +0000 2018", "id": 1074796068015992832, "id_str": "1074796068015992832", "text": "RT @EdKrassen: BREAKING:  Donald Trump has won the \"Golden Idiot\" award from the Heute-Show, a late-night satirical German television progr\u2026", "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "truncated": false, "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 23458866, "id_str": "23458866", "name": "Prudence Cain", "screen_name": "iggasuz", "location": "Colorado", "url": null, "description": "Passionate about reading.  Folks should try it.  If salty language offends you, you might not want to follow.  Retired medical professional.  Vet.", "translator_type": "none", "protected": false, "verified": false, "followers_count": 2030, "friends_count": 4328, "listed_count": 9, "favourites_count": 77824, "statuses_count": 38918, "created_at": "Mon Mar 09 16:56:40 +0000 2009", "utc_offset": null, "time_zone": null, "geo_enabled": true, "lang": "en", "contributors_enabled": false, "is_translator": false, "profile_background_color": "642D8B", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme10/bg.gif", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme10/bg.gif", "profile_background_tile": true, "profile_link_color": "1B95E0", "profile_sidebar_border_color": "DA65AD", "profile_sidebar_fill_color": "7AC3EE", "profile_text_color": "3D1957", "profile_use_background_image": true, "profile_image_url": "http://pbs.twimg.com/profile_images/863044678832111616/E8oRd-1l_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/863044678832111616/E8oRd-1l_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/23458866/1541786436", "default_profile": false, "default_profile_image": false, "following": null, 
"follow_request_sent": null, "notifications": null}, "geo": null, "coordinates": null, "place": null, "contributors": null, "retweeted_status": {"created_at": "Mon Dec 17 22:20:00 +0000 2018", "id": 1074791346521681920, "id_str": "1074791346521681920", "text": "BREAKING:  Donald Trump has won the \"Golden Idiot\" award from the Heute-Show, a late-night satirical German televis\u2026 https://t.co", "source": "<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck</a>", "truncated": true, "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 132339474, "id_str": "132339474", "name": "Ed Krassenstein", "screen_name": "EdKrassen", "location": "Fort Myers, FL", "url": "http://edkrassenstein.com", "description": "Co-founder of @HillReporter, Author \"How the People Trumped Ronald Plump\", ed@hillreporter.com, edkrassen@protonmail.com - Twin of @Krassenstein", "translator_type": "none", "protected": false, "verified": false, "followers_count": 851760, "friends_count": 659436, "listed_count": 6654, "favourites_count": 34890, "statuses_count": 40251, "created_at": "Tue Apr 13 00:00:13 +0000 2010", "utc_offset": null, "time_zone": null, "geo_enabled": false, "lang": "en", "contributors_enabled": false, "is_translator": false, "profile_background_color": "000000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, "profile_link_color": "229900", "profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "000000", "profile_text_color": "000000", "profile_use_background_image": false, "profile_image_url": "http://pbs.twimg.com/profile_images/1032612349633486848/t35esAW6_normal.jpg", "profile_image_url_https": 
"https://pbs.twimg.com/profile_images/1032612349633486848/t35esAW6_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/132339474/1537441493", "default_profile": false, "default_profile_image": false, "following": null, "follow_request_sent": null, "notifications": null}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "extended_tweet": {"full_text": "BREAKING:  Donald Trump has won the \"Golden Idiot\" award from the Heute-Show, a late-night satirical German television program, for the 4th year in a row.\n\nCongratulations Mr. Trump.  You are the mockery of the world, again, and again, and again, and again!", "display_text_range": [0, 257], "entities": {"hashtags": [], "urls": [], "user_mentions": [], "symbols": []}}, "quote_count": 55, "reply_count": 77, "retweet_count": 455, "favorite_count": 1724, "entities": {"hashtags": [], "urls": [{"url": "https://t.co", "expanded_url": "https://twitter.com/i/web/status/1074791346521681920", "display_url": "twitter.com/i/web/status/1\u2026", "indices": [117, 140]}], "user_mentions": [], "symbols": []}, "favorited": false, "retweeted": false, "filter_level": "low", "lang": "en"}, "is_quote_status": false, "quote_count": 0, "reply_count": 0, "retweet_count": 0, "favorite_count": 0, "entities": {"hashtags": [], "urls": [], "user_mentions": [{"screen_name": "EdKrassen", "name": "Ed Krassenstein", "id": 132339474, "id_str": "132339474", "indices": [3, 13]}], "symbols": []}, "favorited": false, "retweeted": false, "filter_level": "low", "lang": "en", "timestamp_ms": "1545086326017"}

1 Answer

0 votes
/ December 18, 2018

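I managed to get it working by changing the script as follows. The file is now read back only after the with block that writes the tweet has finished, and the time.sleep(10) call is gone: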

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            print(data['text'])

    def on_error(self, status_code, data):
        print(status_code)


# This second definition replaces the class above.
class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        # Append the new tweet as one line; the file is closed on leaving the block.
        with open('fetched_tweets.json', 'a') as tf:
            json.dump(data, tf)
            tf.write("\n")

        # Read everything back and rewrite it as a single JSON array.
        contents = open('fetched_tweets.json', "r").read()
        data = [json.loads(str(item)) for item in contents.strip().split('\n')]
        with open('test.json', 'w') as m:
            json.dump(data, m)

        return True

    def on_error(self, status):
        print(status)


stream = MyStreamer(APP_KEY, APP_SECRET,
                    OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
tweet = stream.statuses.filter(track='Keyword')
...
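A likely reason this version works, offered as a sketch rather than a confirmed diagnosis: Python buffers file writes, so while the append handle is still open, a second handle on the same file may see nothing yet, and json.loads on an empty string raises exactly the error from the question. The snippet below uses a temporary file and a made-up tweet dictionary to show the difference between reading before and after the file is closed.

```python
import json
import os
import tempfile

# Temporary path and toy payload are assumptions for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "fetched_tweets.json")
tweet = {"id": 1, "text": "hello"}

with open(path, "a") as tf:
    json.dump(tweet, tf)
    tf.write("\n")
    # Still inside the with block: the write may be sitting in Python's
    # buffer, so another handle can see an empty or incomplete file.
    with open(path, "r") as peek:
        seen_before_close = peek.read()

# The with block has closed (and flushed) the file; reading is now safe.
with open(path, "r") as f:
    seen_after_close = f.read()

# Parsing an empty string reproduces the message from the question.
try:
    json.loads("")
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)

tweets = [json.loads(line) for line in seen_after_close.splitlines() if line.strip()]
print(repr(seen_before_close), tweets, error_message)
```

If the read has to stay inside the with block, calling tf.flush() before opening the second handle should have the same effect as closing the file first.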