Sentiment analysis on a real-time feed using NLTK

Question

I would like to know whether it is feasible to analyze news headlines as they arrive in a real-time/streaming feed (using NLTK / VADER sentiment).

Below is the code that provides the news feed (headlines):

import praw
import time


reddit = praw.Reddit(client_id='xxxx',
                     client_secret='MLK5gKaEM2FxxxxxxxxI',
                     user_agent='testing_api')  # must be edited to properly authenticate
subreddit = reddit.subreddit('worldnews')
seen_submissions = set()

while True:
    for submission in subreddit.new(limit=10):
        if submission.fullname not in seen_submissions:
            seen_submissions.add(submission.fullname)
            print('{} {}\n'.format(submission.title, submission.url))
    time.sleep(60)  # sleep for a minute (60 seconds)

Using SentimentIntensityAnalyzer, I built the following:

from IPython import display
import math
from pprint import pprint
import pandas as pd
import numpy as np
import nltk
nltk.download('vader_lexicon')
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='darkgrid', context='talk', palette='Dark2')

import praw
import time

reddit = praw.Reddit(client_id='xxxx',
                     client_secret='MLK5gKaEM2FxxxxxxxxI',
                     user_agent='testing_api')
subreddit = reddit.subreddit('worldnews')


headlines = set()

while True:
    for submission in subreddit.new(limit=10):
        if submission.title not in headlines:
            headlines.add(submission.title)
    time.sleep(60)  # sleep for a minute (60 seconds)

from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

sia = SIA()
results = []


for line in headlines:
    pol_score = sia.polarity_scores(line)
    pol_score['headline'] = line
    results.append(pol_score)

pprint(results, width=100)

Nothing shows up in the console... I would like to see something like this (in real time):

[{'compound': -0.5267,
  'headline': 'Report: Nearly Half of Americans Breathing Unhealthy Air',
  'neg': 0.327,
  'neu': 0.673,
  'pos': 0.0},
 {'compound': -0.0754,
  'headline': 'The Implications of Trump Derangement Syndrome | Even now, vehement Trump '
              'supporters seem to believe that most criticism of the president is explained by '
              'widespread TDS.',
  'neg': 0.11,
  'neu': 0.791,
  'pos': 0.1}]
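Because the `while True:` loop never exits, the `for line in headlines:` scoring code after it is never reached, so nothing is printed. One way to get output in real time is to score each headline inside the polling loop, the first time it is seen. The sketch below is a hypothetical helper, `watch_headlines`, not part of PRAW or NLTK: it assumes a `fetch_titles` callable that returns the latest titles (for example built from `subreddit.new(limit=10)`) and takes the scorer as a parameter (intended to be `SentimentIntensityAnalyzer().polarity_scores`), so the loop logic can be tried without Reddit credentials.

```python
import time
from pprint import pprint
from typing import Callable, Dict, Iterable, List, Optional


def watch_headlines(fetch_titles: Callable[[], Iterable[str]],
                    score: Callable[[str], Dict[str, float]],
                    interval: float = 60.0,
                    max_polls: Optional[int] = None) -> List[dict]:
    """Poll a feed, score each new headline and print it immediately.

    fetch_titles and score are assumed injection points (hypothetical names).
    max_polls=None polls forever, like the while True loop in the question.
    """
    seen = set()   # titles already scored
    results = []   # all score dicts, in the order the headlines arrived
    polls = 0
    while max_polls is None or polls < max_polls:
        for title in fetch_titles():
            if title in seen:
                continue  # already scored on an earlier poll
            seen.add(title)
            result = dict(score(title))  # copy, so the scorer's dict is not mutated
            result['headline'] = title
            results.append(result)
            pprint(result, width=100)  # shown as soon as the headline is scored
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)  # wait before the next poll
    return results

# Intended use (requires praw, nltk and the vader_lexicon; not run here):
# sia = SentimentIntensityAnalyzer()
# watch_headlines(lambda: [s.title for s in subreddit.new(limit=10)], sia.polarity_scores)
```

Injecting the fetcher and the scorer keeps the polling/deduplication logic separate from the network and lexicon, which is also what makes it easy to run with a fake feed.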

Answer 1

It looks like you have not provided a complete example. You still need to call polarity_scores() and add the result to your data structure.

For example, if you were to use a dictionary:

import praw
from nltk.sentiment.vader import SentimentIntensityAnalyzer

reddit = praw.Reddit( ... )
sub = reddit.subreddit('worldnews')
analyzer = SentimentIntensityAnalyzer()

results = {}
posts = sub.new(limit=10)
for post in posts:
    title = post.title
    if title in results:
        # skip title if previously encountered
        continue

    score = analyzer.polarity_scores(title)
    results[title] = score
    results[title]['headline'] = title
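Since the dictionary is keyed by title, the `if title in results` check doubles as deduplication. As a sketch, the same logic can be pulled into a hypothetical function, `score_titles`, that accepts any iterable of titles, a scoring callable (intended to be `analyzer.polarity_scores`), and optionally the existing dictionary, so it can be called once per poll without re-scoring old headlines:

```python
from typing import Callable, Dict, Iterable, Optional


def score_titles(titles: Iterable[str],
                 score: Callable[[str], Dict[str, float]],
                 results: Optional[Dict[str, dict]] = None) -> Dict[str, dict]:
    """Score each title not already in results, keyed by title (hypothetical helper)."""
    if results is None:
        results = {}
    for title in titles:
        if title in results:
            continue  # skip titles that were already scored
        entry = dict(score(title))  # copy the scorer's dict before adding a key
        entry['headline'] = title
        results[title] = entry
    return results
```

Passing the previous dictionary back in on every poll is what carries the "already seen" state across iterations of the outer `while True:` loop.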

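You can also make the query and the loop more efficient by searching by date, or simply by tracking the timestamp of the last post you saw and short-circuiting the loop for the remaining posts, while using the original set() as you did at the start: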

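results = []   # a list: the score dicts are unhashable, so set().add(score) raises TypeError
last_date = 0  # created_utc of the newest post handled on the previous pass

...

    if post.created_utc <= last_date:
        # new() yields the newest posts first, so the remaining posts were already handled
        break

    score = analyzer.polarity_scores(post.title)
    score['headline'] = post.title
    results.append(score)

# after the loop, set last_date to the created_utc of the newest post from this pass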
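A complete version of the timestamp idea, written as a hypothetical helper `score_since` so it can be tried on stand-in objects. It assumes the posts arrive newest first (as `subreddit.new()` returns them) and have the PRAW `title` and `created_utc` attributes; it stops at the first post that is not newer than the previous pass and returns the newest timestamp so the caller can pass it back in on the next poll:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def score_since(posts: Iterable,
                last_date: float,
                score: Callable[[str], Dict[str, float]]) -> Tuple[List[dict], float]:
    """Score posts newer than last_date; return (results, newest timestamp seen)."""
    results = []
    newest = last_date
    for post in posts:
        if post.created_utc <= last_date:
            break  # newest-first order: everything from here on was handled earlier
        newest = max(newest, post.created_utc)
        entry = dict(score(post.title))
        entry['headline'] = post.title
        results.append(entry)
    return results, newest

# Intended use inside the polling loop (not run here):
# new_results, last_date = score_since(sub.new(limit=10), last_date, analyzer.polarity_scores)
```

One caveat of a strict `<=` cutoff: a post created in the same second as the last one handled would be skipped, so keeping a set of seen titles alongside the timestamp is a cheap safeguard.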

You may find this tutorial helpful for more details on building a system like this: https://www.codeproject.com/Articles/5269358/Introducing-NLTK-for-Natural-Language-Processing-w
