I'd like to know whether it's possible to analyze news headlines (using NLTK / VADER sentiment) in real time, i.e. on streaming news.
Below is the code that feeds my system with news (headlines):
```python
import time

import praw

# must be edited to properly authenticate
reddit = praw.Reddit(client_id='xxxx',
                     client_secret='MLK5gKaEM2FxxxxxxxxI',
                     user_agent='testing_api')
subreddit = reddit.subreddit('worldnews')

seen_submissions = set()  # fullnames of submissions already printed
while True:
    # fetch the 10 newest posts and print only the ones not seen before
    for submission in subreddit.new(limit=10):
        if submission.fullname not in seen_submissions:
            seen_submissions.add(submission.fullname)
            print('{} {}\n'.format(submission.title, submission.url))
    time.sleep(60)  # sleep for a minute (60 seconds)
```
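One way to make this feeder usable for real-time analysis is to turn the polling loop into a generator, so that a consumer receives each new submission the moment it is seen instead of the loop only printing it. The sketch below is an assumption, not part of PRAW's API: `stream_new` is a hypothetical helper, and `fetch`/`key` are injected so it can run without Reddit credentials. With the script above you would pass something like `lambda: subreddit.new(limit=10)` as `fetch` and `lambda s: s.fullname` as `key`.

```python
import itertools
import time
from typing import Callable, Hashable, Iterable, Iterator, Set, TypeVar

T = TypeVar("T")


def stream_new(
    fetch: Callable[[], Iterable[T]],
    key: Callable[[T], Hashable],
    interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Poll `fetch` forever, yielding each item the first time its key appears."""
    seen: Set[Hashable] = set()
    while True:
        for item in fetch():
            k = key(item)
            if k not in seen:
                seen.add(k)
                yield item  # hand the new item to the consumer right away
        sleep(interval)  # wait before polling again


# Demo with canned batches instead of Reddit; the no-op sleep keeps it instant.
batches = iter([["a", "b"], ["b", "c"], ["c", "d"]])
new_items = list(
    itertools.islice(
        stream_new(lambda: next(batches), key=str, sleep=lambda s: None), 4
    )
)
print(new_items)  # ['a', 'b', 'c', 'd']
```

The `seen` set grows without bound in a long-running process; capping it (for example with a fixed-size `collections.deque` kept alongside the set) is a reasonable refinement. PRAW also provides `subreddit.stream.submissions()`, which handles polling and de-duplication itself; check the PRAW documentation for its options.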
Using SentimentIntensityAnalyzer, I built:
```python
import math
import time
from pprint import pprint

import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
import praw
import seaborn as sns
from IPython import display
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

nltk.download('vader_lexicon')
sns.set(style='darkgrid', context='talk', palette='Dark2')

reddit = praw.Reddit(client_id='xxxx',
                     client_secret='MLK5gKaEM2FxxxxxxxxI',
                     user_agent='testing_api')
subreddit = reddit.subreddit('worldnews')

# collect the titles of new posts, polling once a minute
headlines = set()
while True:
    for submission in subreddit.new(limit=10):
        if submission.title not in headlines:
            headlines.add(submission.title)
    time.sleep(60)  # sleep for a minute (60 seconds)

# score the collected headlines with VADER
# NOTE: unreachable as written, since the `while True` loop above never breaks
sia = SIA()
results = []
for line in headlines:
    pol_score = sia.polarity_scores(line)
    pol_score['headline'] = line
    results.append(pol_score)

pprint(results, width=100)
```
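As written, the `while True` polling loop never breaks, so the VADER scoring that follows it is never reached, which is why nothing appears in the console. A minimal sketch of the alternative is to score each headline as soon as it arrives rather than after collection finishes. Here `score_headlines` and `fake_score` are hypothetical names; `fake_score` stands in for `sia.polarity_scores` so the example runs without NLTK's lexicon or Reddit access.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def score_headlines(
    headlines: Iterable[str],
    score: Callable[[str], Dict[str, float]],
) -> Iterator[Dict]:
    """Score each headline as soon as it arrives and yield the result."""
    for line in headlines:
        result = dict(score(line))  # copy so the scorer's dict is not mutated
        result["headline"] = line
        yield result


# Hypothetical stand-in scorer so the demo runs without NLTK or Reddit.
def fake_score(text: str) -> Dict[str, float]:
    return {"compound": -0.5 if "Unhealthy" in text else 0.0}


demo = ["Nearly Half of Americans Breathing Unhealthy Air", "Markets open flat"]
results: List[Dict] = []
for r in score_headlines(demo, fake_score):
    print(r)  # printed immediately, one headline at a time
    results.append(r)
```

With the real objects, `score` would be `sia.polarity_scores` and `headlines` any iterable that keeps producing new titles, for example a generator version of the polling loop that yields `submission.title` right after adding it to the `headlines` set.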
I don't see anything displayed in the console... I expect to see something like this (in real time):
```
{'compound': -0.5267,
 'headline': 'Report: Nearly Half of Americans Breathing Unhealthy Air',
 'neg': 0.327,
 'neu': 0.673,
 'pos': 0.0},
{'compound': -0.0754,
 'headline': 'The Implications of Trump Derangement Syndrome | Even now, vehement Trump '
             'supporters seem to believe that most criticism of the president is explained by '
             'widespread TDS.',
 'neg': 0.11,
 'neu': 0.791,
 'pos': 0.1}]
```
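Once results arrive one at a time, the `compound` score is usually the field to act on. A minimal sketch that labels each record using the ±0.05 cut-offs suggested in the VADER project's README (treat the threshold as a tunable assumption) and computes the mean compound score; `label` is a hypothetical helper, and the two records are copied from the expected output above with the `neg`/`neu`/`pos` keys omitted.

```python
from statistics import mean
from typing import Dict, List


def label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse label using +/- threshold cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Records taken from the expected output above (neg/neu/pos omitted).
results: List[Dict] = [
    {"compound": -0.5267,
     "headline": "Report: Nearly Half of Americans Breathing Unhealthy Air"},
    {"compound": -0.0754,
     "headline": ("The Implications of Trump Derangement Syndrome | Even now, vehement Trump "
                  "supporters seem to believe that most criticism of the president is explained by "
                  "widespread TDS.")},
]

for r in results:
    print(f"{label(r['compound']):>8}  {r['compound']:+.4f}  {r['headline']}")

running = mean(r["compound"] for r in results)  # average sentiment so far
print(f"mean compound: {running:+.4f}")
```

In a live loop you would update the mean incrementally (keep a count and a sum) instead of recomputing it over every stored record.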