Sentiment analysis of financial reports - Python

Question (votes: 0, answers: 1)

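I have been trying to analyze the sentiment of financial statements. I am using the nltk.vader_lexicon module after adding financial vocabulary to its lexicon. To augment the financial vocabulary, I am using the Loughran-McDonald word lists.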

The code that adds the words is as follows:

    import csv

    import pandas as pd
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # sia was not defined in the original snippet; a standard VADER analyzer is assumed
    sia = SentimentIntensityAnalyzer()

    # Stock market lexicon: average the two scores and keep single-word entries only
    stock_lex = pd.read_csv('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data/stock_lex.csv')
    stock_lex['sentiment'] = (stock_lex['Aff_Score'] + stock_lex['Neg_Score']) / 2
    stock_lex = dict(zip(stock_lex.Item, stock_lex.sentiment))
    stock_lex = {k: v for k, v in stock_lex.items() if len(k.split(' ')) == 1}

    # Rescale to VADER's -4..4 valence range (extremes computed once, outside the loop)
    max_score = max(stock_lex.values())
    min_score = min(stock_lex.values())
    stock_lex_scaled = {}
    for k, v in stock_lex.items():
        if v > 0:
            stock_lex_scaled[k] = v / max_score * 4
        else:
            stock_lex_scaled[k] = v / min_score * -4

    # Loughran and McDonald positive words
    positive = []
    with open('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data//lm_positive.csv', 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            positive.append(row[0].strip())

    # Loughran and McDonald negative words (multi-word entries are split into single words)
    negative = []
    with open('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data//lm_negative.csv', 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            entry = row[0].strip().split(" ")
            if len(entry) > 1:
                negative.extend(entry)
            else:
                negative.append(entry[0])

    # Merge the lexicons; later updates win, so VADER's own entries override the added ones
    final_lex = {}
    final_lex.update({word: 2.0 for word in positive})
    final_lex.update({word: -2.0 for word in negative})
    final_lex.update(stock_lex_scaled)
    final_lex.update(sia.lexicon)
    sia.lexicon = final_lex

Although the overall results have improved, the model does not seem to understand the numbers. For example:

    sia.polarity_scores('Royal Dutch Shell plc announced earnings results for the second quarter ended June 30, 2019. \
    For the second quarter, the company announced total revenue was USD 91,838 million compared to USD 99,268 million a year \
    ago. Net income was USD 2,998 million compared to USD 6,024 million a year ago. Basic earnings per share was USD 0.37 \
    compared to USD 0.72 a year ago. For the half year, total revenue was USD 177,499 million compared to USD 190,382 million \
    a year ago. Net income was USD 8,999 million compared to USD 11,923 million a year ago. Basic earnings per share was \
    USD 1.11 compared to USD 1.44 a year ago. Diluted earnings per share was USD 1.1 compared to USD 1.42 a year ago.')

-0.81

That is entirely correct, but even when I change the numbers:

    sia.polarity_scores('Royal Dutch Shell plc announced earnings results for the second quarter ended June 30, 2019. \
    For the second quarter, the company announced total revenue was USD 91,838 million compared to USD 69,268 million a year \
    ago. Net income was USD 2,998 million compared to USD 1,024 million a year ago. Basic earnings per share was USD 0.37 \
    compared to USD 0.17 a year ago. For the half year, total revenue was USD 177,499 million compared to USD 150,382 million \
    a year ago. Net income was USD 8,999 million compared to USD 6,923 million a year ago. Basic earnings per share was \
    USD 1.11 compared to USD 1.04 a year ago. Diluted earnings per share was USD 1.1 compared to USD 1.02 a year ago.')

-0.81

The sentiment score returned is still negative.
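
One plausible reason, sketched here with a toy scorer rather than VADER itself: a lexicon-based approach looks up a valence for each token, and number tokens such as 6,024 or 1,024 have no lexicon entry, so swapping the figures leaves the score unchanged. The lexicon values and the toy_score helper below are hypothetical and only illustrate the idea.

```python
# Toy lexicon scorer (NOT VADER): sums the valence of every token found in the lexicon.
# The words and values here are made up for illustration.
toy_lexicon = {"loss": -2.0, "decline": -2.0, "growth": 2.0}


def toy_score(text, lexicon):
    """Return the summed valence of all tokens that appear in the lexicon."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    # Tokens without an entry (including all numbers) contribute 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


worse = "Net income was USD 2,998 million compared to USD 6,024 million after a loss"
better = "Net income was USD 2,998 million compared to USD 1,024 million after a loss"

# Only the numbers differ, so both texts receive the same score.
print(toy_score(worse, toy_lexicon) == toy_score(better, toy_lexicon))  # True
```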

Is there a way to help the model understand these numbers based on the context of the surrounding text?

python nlp nltk sentiment-analysis
1 Answer
0 votes
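As I understand it, you are only adjusting the sentiment estimate based on the individual tokens that make up the sentences of the text, but that is definitely not the right approach to sentiment analysis. To train a model that can classify text, the standard approach would be to use long short-term memory (LSTM) units in a neural network. You can use the Loughran-McDonald words to map tokens to the categories listed in that file. If all of your texts follow this comparison schema ("X was USD A compared to USD B a year ago"), you can extract the numbers, calculate the change (positive or negative), and then use that value to train the model to better understand how the numbers relate to each other. That might mean mapping the relative change to a separate evaluation category that can be fed into the LSTM model.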
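
The number-extraction step could be sketched roughly as follows. This is only an illustration under assumptions: the regular expression is fitted to the "was USD A compared to USD B" phrasing in the question, and relative_changes, change_category, and the 2% threshold are hypothetical names and values, not part of NLTK or the Loughran-McDonald data.

```python
import re

# Matches "USD <current> [million] compared to USD <previous>".
# Assumption: every comparison in the text follows this phrasing.
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"
PAIR_RE = re.compile(
    r"USD\s+" + NUMBER + r"(?:\s+million)?\s+compared\s+to\s+USD\s+" + NUMBER
)


def parse_number(raw):
    """Convert a string such as '91,838' or '0.37' to a float."""
    return float(raw.replace(",", ""))


def relative_changes(text):
    """Return (current - previous) / previous for each comparison found."""
    changes = []
    for current_raw, previous_raw in PAIR_RE.findall(text):
        current = parse_number(current_raw)
        previous = parse_number(previous_raw)
        if previous != 0:  # skip pairs where the relative change is undefined
            changes.append((current - previous) / previous)
    return changes


def change_category(change, threshold=0.02):
    """Map a relative change to a coarse label (threshold is an arbitrary choice)."""
    if change > threshold:
        return "increase"
    if change < -threshold:
        return "decrease"
    return "flat"


declining = (
    "Total revenue was USD 91,838 million compared to USD 99,268 million a year ago. "
    "Net income was USD 2,998 million compared to USD 6,024 million a year ago. "
    "Basic earnings per share was USD 0.37 compared to USD 0.72 a year ago."
)
improving = (
    "Total revenue was USD 91,838 million compared to USD 69,268 million a year ago. "
    "Net income was USD 2,998 million compared to USD 1,024 million a year ago. "
    "Basic earnings per share was USD 0.37 compared to USD 0.17 a year ago."
)

print([change_category(c) for c in relative_changes(declining)])
# ['decrease', 'decrease', 'decrease']
print([change_category(c) for c in relative_changes(improving)])
# ['increase', 'increase', 'increase']
```

The resulting labels (or the raw relative changes) could then be supplied as an additional input feature alongside the token sequence when training a classifier.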