Using praw to scrape a list of subreddits: "TypeError: 'Subreddit' object is not iterable"

Question

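I'm using praw with Python 3 to scrape posts and comments from a list of subreddits. This code previously worked for a single subreddit, and also for a list of [j] search terms across a list of [i] subreddits. I removed the search-term list and now only want it to loop over the list of subreddits, but I keep getting "TypeError: 'Subreddit' object is not iterable". I don't understand what is going on.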

subs= ["sub1","sub2", "sub3", "sub4"]

commentsDict = {"comment_user": [], "comment_text":[], "comment_score":[], "comment_date":[] }
postsDict = {"post_title" : [], "post_score" : [], "post_comments_num":[], "post_date":[], \
                "post_user":[], "post_text":[], "post_id":[]}

for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):
        submission.comment_sort = 'new'
        comments = list(submission.comments)
        for comments in submission.comments:
            postsDict["post_title"].append(submission.title)#title of post with comment
            postsDict["post_score"].append(submission.score)#upvotes-downvotes
            postsDict["post_text"].append(submission.selftext)#get body of post
            postsDict["post_id"].append(submission.id)#unique id address for post
            postsDict["post_user"].append(submission.author)  #user name of poster
            postsDict["post_comments_num"].append(submission.num_comments) #number of comments on post
            date = submission.created_utc                                  #create variable for date
            timestamp = datetime.datetime.fromtimestamp(date)              #create variable to translate unix date 
            postsDict["post_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #extract date and add to dict
            for top_level_comment in submission.comments:                   #create loop for extracting comments
                if isinstance(top_level_comment, MoreComments):
                    continue
            submission.comments.replace_more(limit=None)                   #tell Praw to click more comments and get those too
            commentsDict["comment_user"].append(comments.author)              #get comment username
            commentsDict["comment_score"].append(comments.score)            #comment upvotes-downvotes
            date = comments.created                                         #same date as above but for comments
            timestamp = datetime.datetime.fromtimestamp(date)
            commentsDict["comment_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #add translated unix date to dict
            commentsDict["comment_text"].append(comments.body)      #get comment text

Thanks in advance for your help.

Answer 1

You need to use subreddit.stream.submissions() as the generator for your for loop, e.g.

sub = reddit.subreddit(subreddit_name)
for submission in sub.stream.submissions():
    # Do stuff with submission
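
A stream keeps yielding submissions as they are posted and does not end by itself, so a scraping loop over it usually needs its own stopping condition. Below is a minimal sketch, assuming `reddit` is an already authenticated praw.Reddit instance. The helper name `take_from_stream` and the cap `n` are hypothetical, and joining the names with `+` relies on PRAW's multi-subreddit syntax.

```python
from itertools import islice


def take_from_stream(reddit, subs, n):
    """Return the first n submissions from a combined stream of several subreddits.

    `reddit` is assumed to be an authenticated praw.Reddit instance.
    """
    # "sub1+sub2" addresses several subreddits with a single Subreddit object.
    combined = reddit.subreddit("+".join(subs))
    # The stream never ends by itself, so islice stops it after n items.
    return list(islice(combined.stream.submissions(), n))
```

Called as `take_from_stream(reddit, subs, 10)`, this returns once ten submissions have arrived, which may take a while on quiet subreddits.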

Answer 2

First (unrelated to your problem), this loop iterates over indices into the list subs and then uses each index to fetch an item:

for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):

Instead, iterate directly over the subreddits:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit):

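Now to fix your PRAW error: you can't iterate over a subreddit by itself (for submission in reddit.subreddit(subreddit)). You have to specify which listing you want to iterate over (such as new, hot, or top). You can see the available listings in the PRAW documentation for Subreddit. These listings correspond to the various tabs you see when viewing a subreddit on the web.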

For example, using the hot listing:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot():

If you want to specify the number of posts returned, you can use the limit parameter:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot(limit=5):

The code above will give you at most 5 submissions from each subreddit.
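
To make the choice of listing explicit, here is a minimal sketch that selects the listing by name and applies the same limit to every subreddit. It assumes `reddit` is an already authenticated praw.Reddit instance. The helper name `iter_listing` and its parameters are hypothetical, not part of PRAW.

```python
def iter_listing(reddit, subs, listing="hot", limit=5):
    """Yield (subreddit_name, submission) pairs from one listing of each subreddit.

    `reddit` is assumed to be an authenticated praw.Reddit instance.
    """
    if listing not in ("hot", "new", "top", "rising", "controversial"):
        raise ValueError(f"unsupported listing: {listing!r}")
    for name in subs:
        subreddit = reddit.subreddit(name)
        # Look up the listing method by name, e.g. subreddit.hot or subreddit.new.
        # The Subreddit object itself is not iterable; the listing it returns is.
        fetch = getattr(subreddit, listing)
        for submission in fetch(limit=limit):
            yield name, submission
```

For example, `for name, submission in iter_listing(reddit, subs, listing="new", limit=5):` walks the five newest posts of each subreddit in turn.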

The rest of your code does some unorthodox things. Earlier in your code you have this:

comments = list(submission.comments)
for comments in submission.comments:

You set comments equal to something and then never use it, because it is redefined on the next line. I would delete the comments = line, since it does nothing.

Also, for every comment in the post, you iterate over all the comments in the post and do nothing with them:

for top_level_comment in submission.comments:                   #create loop for extracting comments
    if isinstance(top_level_comment, MoreComments):
        continue

I don't know what you wanted this code to do, but right now it does nothing except waste time, so I would delete it as well.
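
Putting these points together, one possible cleaned-up version of the loop is sketched below. It is not the asker's final code: the function name `scrape` is hypothetical, `reddit` is assumed to be an authenticated praw.Reddit instance, each post is recorded once rather than once per comment, and the date format uses `%d` for the day of the month where the question has `%D`.

```python
from datetime import datetime, timezone


def _fmt(ts):
    """Format a Unix timestamp as a UTC date string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def scrape(reddit, subs, limit=5):
    """Collect posts and their top-level comments from each subreddit's hot listing."""
    posts = {"post_title": [], "post_score": [], "post_comments_num": [],
             "post_date": [], "post_user": [], "post_text": [], "post_id": []}
    comments = {"comment_user": [], "comment_text": [],
                "comment_score": [], "comment_date": []}
    for name in subs:
        for submission in reddit.subreddit(name).hot(limit=limit):
            # The sort order must be set before the comments are first accessed.
            submission.comment_sort = "new"
            # limit=0 drops the MoreComments placeholders without fetching them;
            # limit=None would fetch every hidden comment instead (slower).
            submission.comments.replace_more(limit=0)

            posts["post_title"].append(submission.title)
            posts["post_score"].append(submission.score)
            posts["post_text"].append(submission.selftext)
            posts["post_id"].append(submission.id)
            posts["post_user"].append(submission.author)
            posts["post_comments_num"].append(submission.num_comments)
            posts["post_date"].append(_fmt(submission.created_utc))

            # Iterating submission.comments yields the top-level comments.
            for comment in submission.comments:
                comments["comment_user"].append(comment.author)
                comments["comment_score"].append(comment.score)
                comments["comment_date"].append(_fmt(comment.created_utc))
                comments["comment_text"].append(comment.body)
    return posts, comments
```

Because replace_more runs before the comment loop, no isinstance check against MoreComments is needed inside it.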
