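I'm using praw with Python 3 to scrape posts and comments from a list of subreddits. This code previously worked for a single subreddit, and also for a list of [j] search terms across a list of [i] subreddits. I removed the search-term list and just want it to loop over the list of subreddits, but I keep getting "TypeError: 'Subreddit' object is not iterable". I don't understand what's going on.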
subs = ["sub1", "sub2", "sub3", "sub4"]
commentsDict = {"comment_user": [], "comment_text": [], "comment_score": [], "comment_date": []}
postsDict = {"post_title": [], "post_score": [], "post_comments_num": [], "post_date": [], \
             "post_user": [], "post_text": [], "post_id": []}
for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):
        submission.comment_sort = 'new'
        comments = list(submission.comments)
        for comments in submission.comments:
            postsDict["post_title"].append(submission.title)  # title of post with comment
            postsDict["post_score"].append(submission.score)  # upvotes-downvotes
            postsDict["post_text"].append(submission.selftext)  # get body of post
            postsDict["post_id"].append(submission.id)  # unique id address for post
            postsDict["post_user"].append(submission.author)  # user name of poster
            postsDict["post_comments_num"].append(submission.num_comments)  # number of comments on post
            date = submission.created_utc  # create variable for date
            timestamp = datetime.datetime.fromtimestamp(date)  # create variable to translate unix date
            postsDict["post_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S'))  # extract date and add to dict
            for top_level_comment in submission.comments:  # create loop for extracting comments
                if isinstance(top_level_comment, MoreComments):
                    continue
            submission.comments.replace_more(limit=None)  # tell Praw to click more comments and get those too
            commentsDict["comment_user"].append(comments.author)  # get comment username
            commentsDict["comment_score"].append(comments.score)  # comment upvotes-downvotes
            date = comments.created  # same date as above but for comments
            timestamp = datetime.datetime.fromtimestamp(date)
            commentsDict["comment_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S'))  # add translated unix date to dict
            commentsDict["comment_text"].append(comments.body)  # get comment text
Thanks in advance for your help.
You need to use subreddit.stream.submissions() as the generator for your for loop, for example:
sub = reddit.subreddit(subreddit_name)
for submission in sub.stream.submissions():
    # Do stuff with submission
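One thing to be aware of with the stream approach: as far as I know, stream.submissions() first yields recent submissions and then keeps waiting for new ones, so a plain for loop over it does not finish on its own. Below is a minimal sketch, assuming you only want the first few items. The function name take_from_stream and the parameter n are made up for illustration, and subreddit is assumed to be whatever reddit.subreddit(name) returns.

```python
from itertools import islice


def take_from_stream(subreddit, n):
    """Return the first n submissions yielded by the subreddit's stream.

    `subreddit` is assumed to be a PRAW Subreddit (anything that exposes
    .stream.submissions() will work). The stream is endless, so islice
    stops the loop after n items instead of iterating forever.
    """
    return list(islice(subreddit.stream.submissions(), n))
```

If fewer than n submissions are available, the call blocks until enough new ones arrive, so this only suits cases where waiting is acceptable.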
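First (unrelated to your problem), this loop iterates over indexes into the list subs and then uses each index to fetch an item:
for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):
Iterate directly over the subreddit names instead:
for subreddit in subs:
    for submission in reddit.subreddit(subreddit):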
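Now to fix your PRAW error: you can't iterate over a subreddit itself (for submission in reddit.subreddit(subreddit)). You have to specify which listing you want to iterate over (such as new, hot, or top). The available listings are described in the PRAW documentation for Subreddit. These listings correspond to the tabs you see when viewing a subreddit on the web.
For example, using the hot listing:
for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot():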
If you want to limit the number of posts returned, you can use the limit parameter:
for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot(limit=5):
The code above gives you at most 5 submissions per subreddit.
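Putting the listing and limit together, a rough sketch of the post-collecting part could look like the following. The names collect_posts and ALLOWED_LISTINGS are hypothetical, reddit is assumed to be an already configured praw.Reddit instance, and only a few of the fields from the question are kept to stay short.

```python
from datetime import datetime, timezone

# Listing names assumed to be the ones you care about; adjust as needed.
ALLOWED_LISTINGS = ("hot", "new", "top", "rising", "controversial")


def collect_posts(reddit, subreddit_names, listing="hot", limit=5):
    """Collect a few post fields from one listing of each subreddit.

    `reddit` is assumed to be a configured praw.Reddit instance.
    """
    if listing not in ALLOWED_LISTINGS:
        raise ValueError(f"unsupported listing: {listing!r}")
    posts = {"post_title": [], "post_score": [], "post_id": [], "post_date": []}
    for name in subreddit_names:
        # Look the listing method up by name, e.g. subreddit.hot(limit=5).
        listing_method = getattr(reddit.subreddit(name), listing)
        for submission in listing_method(limit=limit):
            posts["post_title"].append(submission.title)
            posts["post_score"].append(submission.score)
            posts["post_id"].append(submission.id)
            # created_utc is a Unix timestamp; format it as UTC.
            created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
            posts["post_date"].append(created.strftime("%Y-%m-%d %H:%M:%S"))
    return posts
```

Usage would be something like posts = collect_posts(reddit, subs, listing="hot", limit=5), after which every list in posts has one entry per submission.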
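The rest of your code does some unorthodox things. As I noted in your previous post, you have this:
comments = list(submission.comments)
for comments in submission.comments:
You set comments to something and then never use it, because it is redefined on the very next line. I would delete the comments = line, since it does nothing.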
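Also, for every comment in the post, you loop over all of the post's comments and do nothing with them:
for top_level_comment in submission.comments:  # create loop for extracting comments
    if isinstance(top_level_comment, MoreComments):
        continue
I'm not sure what you intended this code to do, but right now it does nothing except waste time, so I would remove it as well.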
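If the intent was to skip the MoreComments placeholders and record each top-level comment, a sketch along these lines might be closer to what you want. It is a guess at the intent, not a drop-in fix: collect_top_level_comments is a made-up name, submission is assumed to be a PRAW Submission, and the date format uses %d (day of month) on the assumption that this is what the original %D was meant to be.

```python
from datetime import datetime, timezone


def collect_top_level_comments(submission):
    """Return a dict of fields for each top-level comment on a submission.

    `submission` is assumed to be a PRAW Submission.
    """
    comments = {
        "comment_user": [],
        "comment_text": [],
        "comment_score": [],
        "comment_date": [],
    }
    # Resolve MoreComments placeholders once, before iterating.
    # limit=0 simply drops them; limit=None would fetch them all,
    # at the cost of extra requests.
    submission.comments.replace_more(limit=0)
    for comment in submission.comments:
        comments["comment_user"].append(comment.author)  # None if the account was deleted
        comments["comment_text"].append(comment.body)
        comments["comment_score"].append(comment.score)
        # created_utc is a Unix timestamp; format it as UTC.
        created = datetime.fromtimestamp(comment.created_utc, tz=timezone.utc)
        comments["comment_date"].append(created.strftime("%Y-%m-%d %H:%M:%S"))
    return comments
```

Calling replace_more once up front means the loop never sees a MoreComments object, so the isinstance check is no longer needed.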