Using praw to scrape a list of subreddits: "TypeError: 'Subreddit' object is not iterable."


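I'm using praw in Python 3 to scrape posts and comments from a list of subreddits. This code previously worked for 1 subreddit, and also for a list of [j] search terms within a list of [i] subreddits. I removed the list of search terms and just want it to iterate over the list of subreddits, but I keep getting "TypeError: 'Subreddit' object is not iterable." I don't understand what's going on.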

subs= ["sub1","sub2", "sub3", "sub4"]

commentsDict = {"comment_user": [], "comment_text":[], "comment_score":[], "comment_date":[] }
postsDict = {"post_title" : [], "post_score" : [], "post_comments_num":[], "post_date":[], \
                "post_user":[], "post_text":[], "post_id":[]}

for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):
        submission.comment_sort = 'new'
        comments = list(submission.comments)
        for comments in submission.comments:
            postsDict["post_title"].append(submission.title)#title of post with comment
            postsDict["post_score"].append(submission.score)#upvotes-downvotes
            postsDict["post_text"].append(submission.selftext)#get body of post
            postsDict["post_id"].append(submission.id)#unique id address for post
            postsDict["post_user"].append(submission.author)  #user name of poster
            postsDict["post_comments_num"].append(submission.num_comments) #number of comments on post
            date = submission.created_utc                                  #create variable for date
            timestamp = datetime.datetime.fromtimestamp(date)              #create variable to translate unix date 
            postsDict["post_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #extract date and add to dict
            for top_level_comment in submission.comments:                   #create loop for extracting comments
                if isinstance(top_level_comment, MoreComments):
                    continue
            submission.comments.replace_more(limit=None)                   #tell Praw to click more comments and get those too
            commentsDict["comment_user"].append(comments.author)              #get comment username
            commentsDict["comment_score"].append(comments.score)            #comment upvotes-downvotes
            date = comments.created                                         #same date as above but for comments
            timestamp = datetime.datetime.fromtimestamp(date)
            commentsDict["comment_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #add translated unix date to dict
            commentsDict["comment_text"].append(comments.body)      #get comment text 

Thanks in advance for your help.

python-3.x praw
Answer 1

You need to use subreddit.stream.submissions() as the generator for your for loop, for example:

sub = reddit.subreddit(subreddit_name)
for submission in sub.stream.submissions():
    print(submission.title)  # do stuff with each submission
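Note that a stream does not finish on its own: it keeps polling for new submissions, so a plain for loop over it runs until you break out. A minimal sketch that stops after a fixed number of items using itertools.islice; the helper name take is hypothetical, and the demo feeds it an ordinary endless generator instead of a live PRAW stream so it runs without credentials.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(stream: Iterable[T], n: int) -> List[T]:
    """Return at most the first n items of a (possibly endless) iterable."""
    # islice stops pulling from the source after n items, so this returns
    # even though the underlying generator would keep going forever.
    return list(itertools.islice(stream, n))


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for sub.stream.submissions(): yields titles forever."""
    for i in itertools.count():
        yield f"post {i}"


print(take(fake_stream(), 3))  # ['post 0', 'post 1', 'post 2']
```

With a real stream, take(sub.stream.submissions(), 10) would return ten Submission objects, blocking until that many are available.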

Answer 2

First (unrelated to your problem), this loop iterates over indices into the list subs and then uses each index to fetch an item:

for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):

Instead, iterate over the subreddit names directly:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit):
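Both loops visit the same names in the same order; direct iteration just drops the index bookkeeping, and enumerate gives you the position too if you ever need it. A small self-contained illustration using the placeholder names from the question:

```python
subs = ["sub1", "sub2", "sub3", "sub4"]

# Index-based loop, as in the question.
by_index = []
for i in range(len(subs)):
    by_index.append(subs[i])

# Direct iteration: same items, no index needed.
direct = []
for name in subs:
    direct.append(name)

# enumerate() when you also want the position of each item.
numbered = list(enumerate(subs))

print(by_index == direct)  # True
print(numbered[1])  # (1, 'sub2')
```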

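Now to fix your PRAW error: you can't iterate over a subreddit itself (for submission in reddit.subreddit(subreddit)). You have to specify which listing you want to iterate over (such as new, hot, or top). You can find the available listings in PRAW's documentation for Subreddit. These listings correspond to the tabs you see when you view a subreddit on the web.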
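The error comes from Python's iteration protocol rather than from Reddit: a for loop calls iter() on the object, and an instance of a class that defines neither __iter__ nor __getitem__ raises "TypeError: '<class name>' object is not iterable". PRAW's listing methods, by contrast, return a ListingGenerator, which is iterable. The sketch below reproduces this offline with a hypothetical stand-in class that is merely named Subreddit; it is not PRAW's class.

```python
class Subreddit:
    """Hypothetical stand-in, NOT praw.models.Subreddit: it has a listing
    method but no __iter__, so a for loop cannot iterate over it directly."""

    def __init__(self, name):
        self.display_name = name

    def hot(self, limit=3):
        # A listing method returns a generator, and generators are iterable.
        return (f"{self.display_name} hot post {i}" for i in range(limit))


sub = Subreddit("sub1")

error_message = None
try:
    for submission in sub:  # the same mistake as in the question
        pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'Subreddit' object is not iterable

for submission in sub.hot(limit=2):  # iterate over a listing instead
    print(submission)  # sub1 hot post 0, then sub1 hot post 1
```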

[Image: the tabs of a subreddit on Reddit: hot, new, rising, controversial, top, gilded]

For example, using the hot listing:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot():

If you want to specify how many posts are returned, use the limit parameter:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot(limit=5):

The code above gives you at most 5 submissions per subreddit.
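Putting direct iteration, a chosen listing and limit together, a minimal sketch. It assumes reddit is an authenticated praw.Reddit instance created elsewhere; the helper name titles_per_subreddit and its listing argument are hypothetical. getattr picks the listing method by name ("hot", "new", "top", ...), and the demo uses small fake classes in place of PRAW so it runs without credentials.

```python
def titles_per_subreddit(reddit, names, listing="hot", limit=5):
    """Collect up to `limit` submission titles from one listing of each subreddit."""
    titles = {}
    for name in names:  # iterate over the names directly
        fetch = getattr(reddit.subreddit(name), listing)  # e.g. subreddit.hot
        titles[name] = [submission.title for submission in fetch(limit=limit)]
    return titles


# --- Offline stand-ins for praw.Reddit / Subreddit / Submission (demo only) ---
class FakeSubmission:
    def __init__(self, title):
        self.title = title


class FakeSubreddit:
    def __init__(self, name):
        self.name = name

    def hot(self, limit=None):
        posts = [FakeSubmission(f"{self.name} post {i}") for i in range(10)]
        return iter(posts[:limit])  # posts[:None] returns every post


class FakeReddit:
    def subreddit(self, name):
        return FakeSubreddit(name)


result = titles_per_subreddit(FakeReddit(), ["sub1", "sub2"], limit=2)
print(result["sub1"])  # ['sub1 post 0', 'sub1 post 1']
```

Against a real client the call would look like titles_per_subreddit(reddit, subs, listing="new", limit=10).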

The rest of your code does some unorthodox things, as in your previous post. For example:

comments = list(submission.comments)
for comments in submission.comments:

You set comments to something and then never use it, because it is redefined on the very next line. I would delete the comments = line, since it does nothing.
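To see why the first assignment is wasted: a for statement rebinds its loop variable on every pass, so whatever comments held before the loop is gone after the first iteration, and once the loop ends it holds only the last item. A plain-list illustration with placeholder data:

```python
items = ["first", "second", "third"]

comments = list(items)  # assigned here...
for comments in items:  # ...and immediately rebound to each item in turn
    pass

print(comments)  # third
```

Naming the loop variable comment (singular) avoids this kind of confusion.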

Also, for every comment in the post, you loop over all of the post's comments again and do nothing with them:

for top_level_comment in submission.comments:                   #create loop for extracting comments
    if isinstance(top_level_comment, MoreComments):
        continue

I don't know what you intended this code to do, but right now it does nothing except waste time, so I would delete it as well.
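With those pieces removed, the loop might look like the sketch below. It is a minimal sketch under stated assumptions, not a tested drop-in: reddit is assumed to be an authenticated praw.Reddit instance, the function name scrape and its limit/more_limit defaults are hypothetical, and the hot listing is an arbitrary choice. It keeps the question's layout of one row per comment in both dicts, calls replace_more() once per submission, and uses created_utc formatted in UTC. It also changes the date pattern from %D, which strftime expands to mm/dd/yy, to %d (day of month).

```python
import datetime


def to_date(unix_ts):
    """Format a Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    dt = datetime.datetime.fromtimestamp(unix_ts, tz=datetime.timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")  # '%d', not '%D'


def scrape(reddit, subs, limit=5, more_limit=None):
    """Return (postsDict, commentsDict) with one row per comment in each."""
    postsDict = {"post_title": [], "post_score": [], "post_comments_num": [],
                 "post_date": [], "post_user": [], "post_text": [], "post_id": []}
    commentsDict = {"comment_user": [], "comment_text": [],
                    "comment_score": [], "comment_date": []}

    for name in subs:  # iterate over the names directly
        for submission in reddit.subreddit(name).hot(limit=limit):  # a listing
            submission.comment_sort = "new"  # set before comments are loaded
            # Replace "load more comments" stubs once per submission:
            # limit=None fetches all of them (slow on big threads), 0 drops them.
            submission.comments.replace_more(limit=more_limit)
            # Top-level comments only; submission.comments.list() would
            # include replies as well.
            for comment in submission.comments:
                # Post fields repeat per comment, as in the question,
                # so both dicts stay the same length.
                postsDict["post_title"].append(submission.title)
                postsDict["post_score"].append(submission.score)
                postsDict["post_text"].append(submission.selftext)
                postsDict["post_id"].append(submission.id)
                postsDict["post_user"].append(submission.author)
                postsDict["post_comments_num"].append(submission.num_comments)
                postsDict["post_date"].append(to_date(submission.created_utc))
                commentsDict["comment_user"].append(comment.author)
                commentsDict["comment_score"].append(comment.score)
                commentsDict["comment_text"].append(comment.body)
                commentsDict["comment_date"].append(to_date(comment.created_utc))
    return postsDict, commentsDict


# --- Offline stand-ins for PRAW objects, for demonstration only ---
class FakeComments(list):
    def replace_more(self, limit=32):
        return []  # nothing to expand in the fake


class FakeComment:
    def __init__(self, body):
        self.author, self.body, self.score, self.created_utc = "u1", body, 1, 0


class FakeSubmission:
    def __init__(self, title, bodies):
        self.title, self.score, self.selftext, self.id = title, 10, "", title
        self.author, self.num_comments, self.created_utc = "op", len(bodies), 0
        self.comment_sort = "confidence"
        self.comments = FakeComments(FakeComment(b) for b in bodies)


class FakeSubreddit:
    def __init__(self, name):
        self.name = name

    def hot(self, limit=None):
        posts = [FakeSubmission(f"{self.name}-p{i}", ["a", "b"]) for i in range(3)]
        return iter(posts[:limit])


class FakeReddit:
    def subreddit(self, name):
        return FakeSubreddit(name)


posts, comments = scrape(FakeReddit(), ["sub1", "sub2"], limit=2)
print(len(posts["post_title"]), posts["post_date"][0])  # 8 1970-01-01 00:00:00
```

With real credentials you would pass reddit = praw.Reddit(client_id=..., client_secret=..., user_agent=...) instead of FakeReddit().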
