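I am using the tweepy Streaming API to fetch tweets that contain a particular hashtag. The problem I am facing is that I cannot extract the full text of a tweet from the Streaming API. Only 140 characters are available, and everything after that is truncated.
Here is the code: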
from datetime import datetime
from time import sleep

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

def analyze_status(text):
    # True when the text starts with the retweet marker 'RT'
    return 'RT' in text[0:3]

class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Skip retweets; append every other tweet to a file
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a', encoding='utf-8') as tf:
                tf.write(status.text + '\n\n')
            print(status.text)

    def on_error(self, status):
        # status is an integer HTTP status code, so convert it before concatenating
        print("Error Code : " + str(status))

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # In tweepy 3.x, last_response is a requests.Response, so read its headers mapping
    headers = api.last_response.headers
    # Get the number of remaining requests
    remaining = int(headers['x-rate-limit-remaining'])
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(headers['x-rate-limit-limit'])
        reset = int(headers['x-rate-limit-reset'])
        # Convert the epoch timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))
        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False
    # We have not reached the rate limit
    return True

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')
# tweepy 3.7+ names this keyword is_async, because async is reserved in Python 3.7+
myStream.filter(track=['#bitcoin'], is_async=True)
Does anyone have a way to solve this?
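tweet_mode=extended has no effect in this code, because the Streaming API does not support that parameter. If a Tweet contains longer text, the JSON response will include an additional object named extended_tweet, which in turn contains a field named full_text.
In that case, you will want something like print(status.extended_tweet['full_text']) to extract the longer text.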
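To make the payload shape described above concrete, here is a minimal sketch that works on plain dicts standing in for decoded streaming payloads. The sample payloads and the helper name full_text_from_payload are made up for illustration; only the text, extended_tweet, and full_text keys come from the answer above.

```python
def full_text_from_payload(payload):
    """Return the long text when present, else the regular text field."""
    extended = payload.get("extended_tweet")
    if extended is not None:
        # Long tweets: the complete text lives inside the nested object
        return extended["full_text"]
    # Short tweets: there is no extended_tweet, and text is already complete
    return payload["text"]


# Hypothetical payloads, not real tweets
short_tweet = {"text": "Short tweet about #bitcoin"}
long_tweet = {
    "text": "A long tweet that was cut off...",
    "extended_tweet": {
        "full_text": "A long tweet that was cut off in text but is complete here. #bitcoin"
    },
}

print(full_text_from_payload(short_tweet))
print(full_text_from_payload(long_tweet))
```

The same lookup applies to a tweepy status object, where the nested object is exposed as status.extended_tweet only when the tweet is long enough to need it.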
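You have to enable extended tweet mode, like this:
s = tweepy.Stream(auth, l, tweet_mode='extended')
Then you can print the extended tweet. Keep in mind that, because of how the Twitter API works, you have to make sure the extended tweet actually exists, otherwise accessing it will throw an error: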
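from tweepy import StreamListener

class listener(StreamListener):
    def on_status(self, status):
        try:
            # Long tweets carry the full text in the extended_tweet dict
            print(status.extended_tweet['full_text'])
        except AttributeError:
            # No extended_tweet attribute: the regular text is already complete
            print(status.text)
        return True

    def on_error(self, status_code):
        # 420 means the client is being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False

# Create the listener only after the class has been defined
l = listener()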
This worked for me.
Building on @AndyPiper's answer, you can use try/except to check whether the extended tweet is there:
def get_tweet_text(tweet):
    try:
        return tweet.extended_tweet['full_text']
    except AttributeError:
        return tweet.text
Or check the internal JSON:
def get_tweet_text(tweet):
    if 'extended_tweet' in tweet._json:
        return tweet.extended_tweet['full_text']
    else:
        return tweet.text
Note that extended_tweet is a dict object, so "tweet.extended_tweet.full_text" does not actually work and will raise an error.
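The difference can be checked without a live stream. The sketch below uses types.SimpleNamespace as a stand-in for a tweepy status object (an assumption for illustration only; a real status has many more attributes) and shows that attribute access on the dict fails while key access works.

```python
from types import SimpleNamespace

# Stand-in for a tweepy status: extended_tweet is a plain dict
status = SimpleNamespace(
    text="Truncated text...",
    extended_tweet={"full_text": "The complete, untruncated text."},
)

try:
    # A dict has no full_text attribute, so this raises AttributeError
    status.extended_tweet.full_text
    attribute_access_works = True
except AttributeError:
    attribute_access_works = False

print(attribute_access_works)
print(status.extended_tweet["full_text"])
```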
In addition to the previous answers: in my case it only worked as status.extended_tweet['full_text'], because status.extended_tweet is just a dictionary.