How to retrieve 100 tweets from a time period on Twitter API v2 with Python using next_token

Question

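I'm trying to retrieve 5000 tweets in total with the Basic access API. Each request returns fewer than 100 tweets, so I need a duplicate check in my code. I want to use the next_token parameter, but I don't know how to add it to this code so that the API stops returning the same set of tweets each time and wasting my requests.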
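
A minimal sketch of next_token pagination, independent of tweepy: `fetch_page` is a hypothetical callable that makes one request for a given token and returns a page shaped like the raw v2 JSON body (a `data` list plus a `meta` dict whose `next_token` is missing on the last page). With tweepy it could call `client.search_recent_tweets(..., next_token=token)` on a `tweepy.Client` created with `return_type=dict`. The loop stops as soon as `target` tweets are collected, so no extra pages are requested, and it skips tweet IDs it has already seen.

```python
from typing import Callable, Optional


def collect_tweets(
    fetch_page: Callable[[Optional[str]], dict], target: int
) -> list[dict]:
    """Follow meta.next_token until `target` unique tweets are collected
    or the results run out."""
    tweets: list[dict] = []
    seen_ids: set[str] = set()
    token: Optional[str] = None  # None means "first page"
    while len(tweets) < target:
        page = fetch_page(token)  # one API request per call
        for tweet in page.get("data") or []:  # "data" is absent on empty pages
            if tweet["id"] not in seen_ids:
                seen_ids.add(tweet["id"])
                tweets.append(tweet)
        token = page.get("meta", {}).get("next_token")
        if token is None:  # no more pages for this query and time window
            break
    return tweets[:target]


# Fake fetcher serving three pages of two tweets each (stands in for the API).
PAGES = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "a"}},
    "a": {"data": [{"id": "3"}, {"id": "4"}], "meta": {"next_token": "b"}},
    "b": {"data": [{"id": "5"}, {"id": "6"}], "meta": {}},
}
calls = []


def fake_fetch(token):
    calls.append(token)  # record which tokens were requested
    return PAGES[token]


result = collect_tweets(fake_fetch, target=3)
print([t["id"] for t in result])  # ['1', '2', '3']
print(calls)  # [None, 'a']
```

With `target=3` the loop makes two requests and never asks for the third page. tweepy also ships `tweepy.Paginator`, which follows `next_token` for you, and its `flatten(limit=...)` caps how many tweets it yields. Either way, every tweet on every fetched page counts toward the monthly cap, including tweet 4 here, which was pulled but dropped by the final slice.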

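Second, I want to extract tweets by keyword (which I've already done) and by user-defined location (e.g. the US and the UK). How can I add a profile-location filter to this search? (To clarify, I don't need the tweet's geolocation, but the location in the user's profile.)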
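
As far as I know, the v2 recent-search query language has no operator that matches the author's profile location, so a common workaround is to filter client-side: request `expansions=['author_id']` and `user_fields=['location']`, then look up each tweet's author in the `includes.users` list. The sketch below assumes a raw v2 response dict and uses a hypothetical keyword list; the profile location is free text typed by the user, so a case-insensitive substring check is only a rough heuristic.

```python
def filter_by_profile_location(response: dict, keywords: list[str]) -> list[dict]:
    """Keep tweets whose author's profile location contains any keyword.

    `response` is assumed to be a raw v2 search response requested with
    expansions=author_id and user.fields=location.
    """
    # Map author id -> lower-cased free-text profile location ("" if unset).
    locations = {
        user["id"]: (user.get("location") or "").lower()
        for user in response.get("includes", {}).get("users", [])
    }
    wanted = [k.lower() for k in keywords]
    return [
        tweet
        for tweet in response.get("data") or []
        if any(k in locations.get(tweet.get("author_id"), "") for k in wanted)
    ]


sample = {
    "data": [
        {"id": "1", "author_id": "10", "text": "hello"},
        {"id": "2", "author_id": "20", "text": "bonjour"},
        {"id": "3", "author_id": "30", "text": "hi"},
    ],
    "includes": {
        "users": [
            {"id": "10", "location": "London, UK"},
            {"id": "20", "location": "Paris"},
            {"id": "30"},  # no profile location set
        ]
    },
}
# Hypothetical keyword list for US/UK profiles.
US_UK = ["usa", "united states", "uk", "united kingdom", "england", "new york"]
print([t["id"] for t in filter_by_profile_location(sample, US_UK)])  # ['1']
```

Substring matching has false positives (for example "uk" also matches "Ukraine"), and tweets filtered out this way still count toward the tweet cap because they were already pulled.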

Finally, truncated tweets appear in my data: the tweet text is cut short and followed by the symbol "…".

Any help would be greatly appreciated. Thank you.

import os

import pandas as pd
import tweepy

import config1

client = tweepy.Client(bearer_token=config1.bearer_token, wait_on_rate_limit=True)

# Define the query parameters
keywords = 'COVID OR women'  # OR must be upper-case; lower-case "or" is matched as a word
start_date = '2023-07-22T02:00:00Z'
end_date = '2023-07-22T18:00:00Z'

# Create a list to store the extracted data
tweets_data = []

# Perform the search query for the keywords; the parentheses stop the filters
# from binding only to the last keyword (AND is applied before OR)
query = f'({keywords}) lang:en -is:retweet -is:quote -has:media'

response = client.search_recent_tweets(
    query=query,
    max_results=100,
    start_time=start_date,
    end_time=end_date,
    tweet_fields=['id', 'text', 'created_at', 'public_metrics'],
    expansions=['geo.place_id'],
)

# Extract the desired information from each tweet
# (response.data is None when nothing matched)
for tweet in response.data or []:
    tweet_data = {
        'Tweet ID': tweet['id'],
        'Text': tweet['text'].encode('utf-8', 'ignore').decode('utf-8'),
        'Public metrics': {
            'retweet_count': tweet['public_metrics']['retweet_count'],
            'reply_count': tweet['public_metrics']['reply_count'],
            'like_count': tweet['public_metrics']['like_count'],
        },
        'Created At': tweet['created_at'],
        'Place': tweet['geo'],
    }
    # Add the tweet data to the list
    tweets_data.append(tweet_data)

# Create a DataFrame from the extracted tweet data
df = pd.DataFrame(tweets_data)

# Load the existing CSV file, if there is one, and append the new rows
csv_path = 'tweets_ex.csv'
if os.path.exists(csv_path):
    existing_df = pd.read_csv(csv_path, encoding='utf-8')
    updated_df = pd.concat([existing_df, df], ignore_index=True)
else:
    updated_df = df

# Drop duplicate tweets based on the 'Tweet ID'
updated_df = updated_df.drop_duplicates(subset='Tweet ID')

# Save the updated DataFrame to the CSV file
updated_df.to_csv(csv_path, encoding='utf-8', index=False)

# Print the updated number of rows
print(f"The updated number of rows in the CSV file is: {len(updated_df)}")

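When I tried next_token, it said the rate limit was exceeded and slept for more than 800 seconds, so I eventually had to interrupt it. In the end no tweets were returned, but it still used up my tweet pulls. I only asked for 35 tweets, yet my developer portal shows 2100 tweets pulled!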
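
`max_results` caps the size of each page, not the total, and every tweet returned on every page counts toward the monthly tweet cap. 2100 is exactly 60 × 35, which is consistent with a loop that kept requesting full pages of 35 tweets until the rate limit paused it. The helpers below (hypothetical names) only spell out that budgeting arithmetic, assuming every page comes back full:

```python
import math


def requests_needed(target: int, max_results: int) -> int:
    """Number of full pages (requests) needed to pull `target` tweets."""
    if not 10 <= max_results <= 100:  # accepted range for recent search
        raise ValueError("max_results must be between 10 and 100")
    return math.ceil(target / max_results)


def tweets_pulled(pages: int, max_results: int) -> int:
    """Upper bound on tweets counted against the cap after `pages` requests."""
    return pages * max_results


print(tweets_pulled(60, 35))       # 2100: 60 pages of 35 tweets
print(requests_needed(5000, 100))  # 50 requests at the largest page size
print(requests_needed(5000, 35))   # 143 requests at 35 per page
```

Using the largest page size needs the fewest requests for the same number of tweets, which matters when the rate limit, not the cap, is what stalls the loop.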

For the truncated tweets, I tried tweet_mode=extended, but I don't think it's compatible with v2.
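
`tweet_mode=extended` is a v1.1 parameter with no v2 counterpart. My understanding is that in v2 the `text` field holds up to 280 characters, and for longer posts the full body comes back in a separate `note_tweet` object (with its own `text` key) when `'note_tweet'` is added to `tweet_fields`. Retweets are also shortened, but the query already excludes them with `-is:retweet`. A sketch that prefers the full text when present, assuming raw dict tweets (with tweepy's `Tweet` objects the raw dict is `tweet.data`):

```python
def full_text(tweet: dict) -> str:
    """Return the untruncated text of a raw v2 tweet dict.

    Assumes the request included tweet_fields=['note_tweet']; long posts
    then carry their full body in tweet['note_tweet']['text'].
    """
    note = tweet.get("note_tweet") or {}  # missing or null -> empty dict
    return note.get("text") or tweet["text"]


short_tweet = {"id": "1", "text": "A normal tweet"}
long_tweet = {
    "id": "2",
    "text": "The first 280 characters of a long post…",
    "note_tweet": {"text": "The first 280 characters of a long post, plus the rest."},
}
print(full_text(short_tweet))  # A normal tweet
print(full_text(long_tweet))   # The first 280 characters of a long post, plus the rest.
```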

# Perform the search query for the keywords, following next_token until
# `target` tweets have been collected or there are no more pages
target = 35  # total number of tweets wanted (e.g. 5000 for the full run)
query = f'({keywords}) lang:en -is:retweet -is:quote -has:media'
next_token = None
while len(tweets_data) < target:
    response = client.search_recent_tweets(
        query=query,
        # Ask only for what is still needed; the API accepts 10-100 per page
        max_results=max(10, min(100, target - len(tweets_data))),
        start_time=start_date,
        end_time=end_date,
        tweet_fields=['id', 'text', 'created_at', 'public_metrics'],
        expansions=['geo.place_id'],
        next_token=next_token,
    )

    for tweet in response.data or []:  # data is None when a page is empty
        tweet_data = {
            'Tweet ID': tweet['id'],
            'Text': tweet['text'].encode('utf-8', 'ignore').decode('utf-8'),
            'Public metrics': {
                'retweet_count': tweet['public_metrics']['retweet_count'],
                'reply_count': tweet['public_metrics']['reply_count'],
                'like_count': tweet['public_metrics']['like_count'],
            },
            'Created At': tweet['created_at'],
            'Place': tweet['geo'],
        }
        tweets_data.append(tweet_data)

    # Follow the next page only if the API returned a token
    next_token = response.meta.get('next_token')
    if next_token is None:
        break

Answer 1

Were you able to extract tweets by user-defined location and solve the truncation problem?