How to build a list in a loop with the Reddit API, manage it, and create a DataFrame


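I am very new to the Reddit API (PRAW / PSAW), to Python, and to programming in general. What I am trying to do is get the top submissions from certain subreddits over a 6-month period, then convert the list to a DataFrame and export it to a CSV file.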

I want to:

  1. Get the length of the list
  2. Sort it by date (epoch)
  3. Make a DataFrame out of it

What I have tried so far:

list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    if submission.created_utc >=1569902400 and submission.created_utc <=1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id) # This seems to get me the data I want.
        len() # I want to check the length, but it doesn't work. It just gives me a row of zeroes.
        sorted(submission.created_utc) # This also doesn't work. It says 'float' object is not iterable. 
                                       # I tried converting to int, but also didn't work.
pd.DataFrame(list_submission) # Also doesn't work.

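In short, making a DataFrame out of this would also take care of the first two problems, though I think being able to do them in code would help when evaluating the list!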

python reddit praw
1 Answer

To answer the three parts of your question:

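  1. To get the length of a list, pass the list you want to measure to the len() function. To find the length of list_submission, call len(list_submission). As written, you are in effect asking for the length of nothing, which is why you do not get a meaningful count. Your loop also never adds anything to list_submission, so the list would be empty in any case.
  2. If a submission meets your criteria, append it to the list with list_submission.append(submission). Then, once the for loop has finished, sort the whole list with sorted(). It needs the whole list plus the key to sort by, so it looks like sorted(list_submission, key=lambda submission: submission.created_utc). sorted() returns a new list rather than sorting in place, so assign the result to a variable. The error you got came from passing a single float (submission.created_utc) where an iterable was expected.
  3. Your approach of passing the list to pd.DataFrame() is the right idea. You can set the column names with columns = ['created_utc', 'title', 'score', 'id']. For those four names to line up, each entry in the list has to be a row of those four values (for example a tuple) rather than the Submission object itself.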
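The first two points can be tried without touching the Reddit API at all. The following is a minimal sketch, assuming stand-in objects in place of real PRAW submissions: fake_submissions and its values are made up for illustration, and only the four attributes used in the question are modeled.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for PRAW submissions (not real Reddit data).
fake_submissions = [
    SimpleNamespace(created_utc=1580000000.0, title="b", score=10, id="x2"),
    SimpleNamespace(created_utc=1570000000.0, title="a", score=25, id="x1"),
    SimpleNamespace(created_utc=1590000000.0, title="c", score=5, id="x3"),  # after END
]

# Same epoch bounds as in the question.
START, END = 1569902400, 1585627200

list_submission = []
for submission in fake_submissions:
    if START <= submission.created_utc <= END:
        list_submission.append(submission)

# len() needs the list as its argument.
count = len(list_submission)

# sorted() returns a new list; the key picks the value to sort by.
ordered = sorted(list_submission, key=lambda s: s.created_utc)
ordered_ids = [s.id for s in ordered]
```

With real data, the loop would iterate over reddit.subreddit('bitcoin').top(limit=None) instead of fake_submissions; the filtering, len() and sorted() calls stay the same.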

The final code would look something like this:

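import pandas as pd

# Assumes `reddit` is an already-configured praw.Reddit instance.
list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    # Keep only submissions inside the chosen epoch window.
    if 1569902400 <= submission.created_utc <= 1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id)
        list_submission.append(submission)

print(len(list_submission))  # number of matching submissions

# sorted() returns a new list, so the result has to be assigned.
list_submission = sorted(list_submission, key=lambda submission: submission.created_utc)

# One tuple per submission, so the four column names line up with the data.
rows = [(s.created_utc, s.title, s.score, s.id) for s in list_submission]
df = pd.DataFrame(rows, columns=['created_utc', 'title', 'score', 'id'])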
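
The question also mentions exporting to CSV, which the answer does not cover. One possible way to do it is sketched below, assuming rows shaped like (created_utc, title, score, id); the sample rows and the extra "created" column are illustrative assumptions, not part of the original answer. It writes to an in-memory buffer so it runs anywhere; passing a file path to to_csv() would write a file instead.

```python
import io

import pandas as pd

# Hypothetical rows; real values would come from the PRAW loop.
rows = [
    (1570000000.0, "first post", 25, "x1"),
    (1580000000.0, "second post", 10, "x2"),
]
df = pd.DataFrame(rows, columns=["created_utc", "title", "score", "id"])

# Optional: add a human-readable timestamp next to the epoch seconds.
df["created"] = pd.to_datetime(df["created_utc"], unit="s")

# index=False leaves the DataFrame's row index out of the CSV.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Here csv_text holds a header line followed by one line per submission.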