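I'm very new to the Reddit API (PRAW / PSAW), to Python, and to programming in general. What I'm trying to do is get the top submissions from certain subreddits over a 6-month period, then convert the list to a DataFrame and save it as a CSV file.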
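I want to:
1. Check how many submissions I collected.
2. Sort the submissions by creation date.
3. Convert the list to a DataFrame.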
What I've tried so far:
list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    if submission.created_utc >= 1569902400 and submission.created_utc <= 1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id)  # This seems to get me the data I want.
        len()  # I want to check the length, but it doesn't work. It just gives me a row of zeroes.
        sorted(submission.created_utc)  # This also doesn't work. It says 'float' object is not iterable.
        # I tried converting to int, but that also didn't work.
pd.DataFrame(list_submission)  # Also doesn't work.
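In short, I suspect that building a DataFrame from this would also solve the first two problems, though I think being able to do them in code would help when evaluating the list!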
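To answer the 3 parts of your question:
1. You have to pass something to the len() function. To find the length of list_submission, call len(list_submission). A bare len() raises a TypeError, and since nothing is ever appended to list_submission in your loop, the list stays empty, which is why you see zeros.
2. You first need to add each submission to the list with list_submission.append(submission). Then, after the for loop has finished, you can sort the whole list with sorted(). You need to pass the entire list plus a key to sort by, so it looks like sorted(list_submission, key=lambda submission: submission.created_utc). Note that sorted() returns a new list, so assign the result to a variable. You got the error because you passed a single float (submission.created_utc) where sorted() expects an iterable.
3. To build the DataFrame, pass one row of values per submission and set the column names with columns = ['created_utc', 'title', 'score', 'id'].
The final code will look something like this: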
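import pandas as pd

# Assumes `reddit` is an already-authenticated praw.Reddit instance.
list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    # Keep only submissions created inside the window (epoch seconds).
    if 1569902400 <= submission.created_utc <= 1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id)
        list_submission.append(submission)
print(len(list_submission))
# sorted() returns a new list, so keep the result.
list_submission = sorted(list_submission, key=lambda submission: submission.created_utc)
# One row of plain values per submission, matching the column names below.
rows = [(s.created_utc, s.title, s.score, s.id) for s in list_submission]
df = pd.DataFrame(rows, columns=['created_utc', 'title', 'score', 'id'])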
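
If you want to try the filtering, counting, and sorting logic without Reddit credentials, here is a minimal sketch using only the standard library. The fake_submissions objects and the collect_rows helper are hypothetical stand-ins invented for illustration; they are not part of PRAW and only model the four attributes used above. The window bounds are the epoch values from the question.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Hypothetical stand-ins for PRAW submissions (assumption: only these
# four attributes matter for this example).
fake_submissions = [
    SimpleNamespace(created_utc=1577836800.0, title="B", score=10, id="b2"),
    SimpleNamespace(created_utc=1569902400.0, title="A", score=25, id="a1"),
    SimpleNamespace(created_utc=1590000000.0, title="C", score=99, id="c3"),  # after the window
]

# Window bounds taken from the question (epoch seconds).
START, END = 1569902400, 1585627200


def collect_rows(submissions, start=START, end=END):
    """Filter by created_utc, sort oldest first, and return plain dict rows."""
    kept = [s for s in submissions if start <= s.created_utc <= end]
    # sorted() returns a new list; the key picks the value to sort by.
    kept = sorted(kept, key=lambda s: s.created_utc)
    return [
        {"created_utc": s.created_utc, "title": s.title, "score": s.score, "id": s.id}
        for s in kept
    ]


rows = collect_rows(fake_submissions)
print(len(rows))                    # how many submissions fell inside the window
print([row["id"] for row in rows])  # ids in chronological order
# Convert the first timestamp to a readable UTC datetime as a sanity check.
print(datetime.fromtimestamp(rows[0]["created_utc"], tz=timezone.utc).isoformat())
```

A list of dicts like rows can be passed straight to pd.DataFrame(rows), which takes the column names from the dict keys, and the resulting DataFrame can then be written out with its to_csv method.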