Sentiment analysis of financial reports - Python
October 1, 2019

I have been trying to analyze the sentiment of financial statements. I am using VADER from nltk (the vader_lexicon resource) after adding financial words to the lexicon. To extend the financial vocabulary I use the Loughran-McDonald word lists, taken from here.

The code that adds the words is as follows:

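import csv
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the VADER lexicon to be available: nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()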

# stock market lexicon
stock_lex = pd.read_csv('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data/stock_lex.csv')
stock_lex['sentiment'] = (stock_lex['Aff_Score'] + stock_lex['Neg_Score'])/2
stock_lex = dict(zip(stock_lex.Item, stock_lex.sentiment))
stock_lex = {k:v for k,v in stock_lex.items() if len(k.split(' '))==1}
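# Rescale to the [-4, 4] range; compute the extremes once instead of on every iteration
max_score = max(stock_lex.values())
min_score = min(stock_lex.values())
stock_lex_scaled = {}
for k, v in stock_lex.items():
    if v > 0:
        stock_lex_scaled[k] = v / max_score * 4
    else:
        stock_lex_scaled[k] = v / min_score * -4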

# Loughran and McDonald
positive = []
with open('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data/lm_positive.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        positive.append(row[0].strip())

negative = []
with open('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data/lm_negative.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        entry = row[0].strip().split(" ")
        if len(entry) > 1:
            negative.extend(entry)
        else:
            negative.append(entry[0])

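# Merge the lexicons. Later update() calls overwrite earlier entries, so for a word
# that appears in several lists the original VADER score (applied last) wins.
final_lex = {}
final_lex.update({word:2.0 for word in positive})
final_lex.update({word:-2.0 for word in negative})
final_lex.update(stock_lex_scaled)
final_lex.update(sia.lexicon)
sia.lexicon = final_lex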
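
A side note on the merge order: with `dict.update`, whichever dictionary is applied last wins for overlapping words. The following minimal sketch uses two toy dictionaries with invented scores (they are not real VADER or Loughran-McDonald values) to show the effect of swapping the order.

```python
# Toy scores, invented for illustration only.
vader_like = {"liability": -0.5, "great": 3.1}
finance = {"liability": -2.0, "impairment": -2.0}

# Same order as the code above: the general-purpose entries are applied last and win.
general_wins = {}
general_wins.update(finance)
general_wins.update(vader_like)

# Reversed order: the finance entries are applied last and win.
finance_wins = {}
finance_wins.update(vader_like)
finance_wins.update(finance)

print(general_wins["liability"])  # -0.5
print(finance_wins["liability"])  # -2.0
```

Whether the finance-specific score should override the general one is a design choice; the sketch only shows how the order decides it.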

Although the overall results have improved considerably, the model does not seem to understand numbers. For example:

sia.polarity_scores('Royal Dutch Shell plc announced earnings results for the second quarter ended June 30, 2019. \
For the second quarter, the company announced total revenue was USD 91,838 million compared to USD 99,268 million a year \
ago. Net income was USD 2,998 million compared to USD 6,024 million a year ago. Basic earnings per share was USD 0.37 \
compared to USD 0.72 a year ago. For the half year, total revenue was USD 177,499 million compared to USD 190,382 million \
a year ago. Net income was USD 8,999 million compared to USD 11,923 million a year ago. Basic earnings per share was \
USD 1.11 compared to USD 1.44 a year ago. Diluted earnings per share was USD 1.1 compared to USD 1.42 a year ago.')

-0.81

This is exactly right. But even when I change the numbers so that every figure is higher than a year ago:

sia.polarity_scores('Royal Dutch Shell plc announced earnings results for the second quarter ended June 30, 2019. \
For the second quarter, the company announced total revenue was USD 91,838 million compared to USD 69,268 million a year \
ago. Net income was USD 2,998 million compared to USD 1,024 million a year ago. Basic earnings per share was USD 0.37 \
compared to USD 0.17 a year ago. For the half year, total revenue was USD 177,499 million compared to USD 150,382 million \
a year ago. Net income was USD 8,999 million compared to USD 6,923 million a year ago. Basic earnings per share was \
USD 1.11 compared to USD 1.04 a year ago. Diluted earnings per share was USD 1.1 compared to USD 1.02 a year ago.')

-0.81

the sentiment score returned is still negative.
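
A likely explanation, assuming the scorer works purely by looking each token up in its lexicon: number tokens such as `2,998` have no lexicon entry, so the two texts look identical once the figures are set aside. The minimal sketch below does not call VADER at all; it only checks that the non-numeric tokens of a "falling" and a "rising" sentence are the same. The helper name `non_numeric_tokens` is made up for this illustration.

```python
import re

# A token counts as a number if it is digits with optional thousands commas and decimals.
NUMBER = re.compile(r"^\d[\d,]*(\.\d+)?$")

def non_numeric_tokens(text):
    """Return the whitespace-separated tokens that are not plain numbers."""
    tokens = [t.strip(".,") for t in text.split()]
    return [t for t in tokens if t and not NUMBER.match(t)]

falling = "Net income was USD 2,998 million compared to USD 6,024 million a year ago."
rising = "Net income was USD 2,998 million compared to USD 1,024 million a year ago."

# The only difference between the two sentences is a number,
# so the word sequences a lexicon lookup would see are identical.
print(non_numeric_tokens(falling) == non_numeric_tokens(rising))  # True
```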

Is there a way to help the model understand these numbers in the context of the surrounding text?
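
One possible direction, sketched here rather than offered as a definitive solution, is to extract the figures separately and turn each "current versus year ago" pair into a direction signal. The regular expression and the function name `numeric_trend` are assumptions for illustration; the pattern only covers the "USD X compared to USD Y" wording of the example text.

```python
import re

# Matches phrases such as "USD 91,838 million compared to USD 99,268".
# The pattern is an assumption based on the wording of the example text.
COMPARISON = re.compile(
    r"USD\s+([\d,]+(?:\.\d+)?)(?:\s+million)?\s+compared\s+to\s+USD\s+([\d,]+(?:\.\d+)?)"
)

def numeric_trend(text):
    """Return a value in [-1, 1]: share of figures that rose minus share that fell."""
    ups = downs = 0
    for current, previous in COMPARISON.findall(text):
        cur = float(current.replace(",", ""))
        prev = float(previous.replace(",", ""))
        if cur > prev:
            ups += 1
        elif cur < prev:
            downs += 1
    total = ups + downs
    return 0.0 if total == 0 else (ups - downs) / total

falling = ("Net income was USD 2,998 million compared to USD 6,024 million a year ago. "
           "Basic earnings per share was USD 0.37 compared to USD 0.72 a year ago.")
rising = ("Net income was USD 2,998 million compared to USD 1,024 million a year ago. "
          "Basic earnings per share was USD 0.37 compared to USD 0.17 a year ago.")

print(numeric_trend(falling))  # -1.0
print(numeric_trend(rising))   # 1.0
```

Such a signal could then be combined with the lexicon-based score, for example as a weighted average. The weighting, and the assumption that a higher figure is good news (which does not hold for costs or debt), would have to be decided per metric.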

...