I'm trying to scrape recipes using Reddit's API, but I keep getting an error. I'd really appreciate any help fixing it.
Here is the code I'm using:
#! python3
import praw
import pandas as pd
import datetime as dt
reddit=praw.Reddit(client_id='RpdZdsNcyIE9vg', \
client_secret='aVlCaLr5XMfP4BP-1a8-4B2uOo8', \
user_agent= 'Food Parser', \
username= 'AndrewPlummer2020', \
password= 'John3:18')
subreddit=reddit.subreddit('recipes')
top_subreddit=subreddit.top(limit=800)
for submission in subreddit.top(limit=1):
    print(submission.title, submission.id)
topics_dict = {"title":[], \
"score":[], \
"id": [], "url": [], \
"comms_num": [], \
"created": [], \
"body": []}
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
topics_data=pd.DataFrame(topics_dict)
topics_data.to_csv("Dish Recpies.csv", set='\t')
Here is the error I get:
Traceback (most recent call last):
File "C:/Users/plumm/AppData/Local/Programs/Python/Python37/Reddit_scraper.py", line 27, in <module>
topics_data=pd.DataFrame(topics_dict)
File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 411, in __init__
mgr = init_dict(data, index, columns, dtype=dtype)
File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
index = extract_index(arrays)
File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 368, in extract_index
raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length
Any help would be appreciated. Thanks in advance.
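Pandas is complaining that your arrays have different lengths.
As @Aran-Fey mentioned, this is because your lists topics_dict['id']
and topics_dict['url']
stay empty: nothing is ever appended to them in this loop, while the other five lists get one entry per submission.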
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
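To see the failure in isolation, here is a minimal sketch with made-up data (the column names and values are illustrative only): two filled lists plus one list that was never appended to reproduce the same ValueError. The exact wording of the message differs between pandas versions, but it always mentions that the arrays must be the same length.

```python
import pandas as pd


def build_frame(columns):
    """Try to build a DataFrame from a dict of lists; return (frame, error_text)."""
    try:
        return pd.DataFrame(columns), None
    except ValueError as exc:
        # pandas refuses to build a frame when the lists differ in length
        return None, str(exc)


# "id" was never appended to, just like topics_dict['id'] in the question
broken = {"title": ["a", "b"], "score": [10, 20], "id": []}
frame, error = build_frame(broken)
print(error)

# Once every list holds one entry per row, construction succeeds
fixed = {"title": ["a", "b"], "score": [10, 20], "id": ["x1", "x2"]}
frame, error = build_frame(fixed)
print(frame.shape)  # (2, 3)
```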
To fix it, add the following lines:
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
    # Also record the id and url so every list gets one entry per submission
    topics_dict['id'].append(submission.id)
    topics_dict['url'].append(submission.url)
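An alternative that makes this kind of mismatch impossible is to build one dict per submission and pass the list of dicts to pandas, because each row then carries all of its fields together. The sketch below uses `types.SimpleNamespace` objects as hypothetical stand-ins for PRAW submissions (made-up values) so it runs without network access; with the real API you would pass `top_subreddit` instead. The attribute names are the ones the question already uses.

```python
from types import SimpleNamespace

import pandas as pd


def submissions_to_frame(submissions):
    """Build one row per submission so every column always has the same length."""
    rows = []
    for submission in submissions:
        rows.append({
            "title": submission.title,
            "score": submission.score,
            "id": submission.id,
            "url": submission.url,
            "comms_num": submission.num_comments,
            "created": submission.created,
            "body": submission.selftext,
        })
    return pd.DataFrame(rows)


# Hypothetical stand-ins for praw Submission objects (made-up values)
fake = [
    SimpleNamespace(title="Soup", score=5, id="abc", url="https://example.com/1",
                    num_comments=2, created=1600000000.0, selftext="..."),
    SimpleNamespace(title="Bread", score=9, id="def", url="https://example.com/2",
                    num_comments=0, created=1600000100.0, selftext=""),
]
frame = submissions_to_frame(fake)
print(frame.shape)  # (2, 7)
```

Forgetting a field here would drop that column entirely instead of producing a shorter list, so the DataFrame can still be built.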
Output on repl.it when printing the DataFrame that gets written to the CSV file:
title ... body
0 Garlic Butter Steak and Potatoes Skillet ...
1 This shoyu ramen broth is our family's favorit... ...
2 Wasn't sure how to properly thank a stranger f... ...
3 I'm working on moving all my mothers hand writ... ...
4 I made my first ever loaf of bread ...
.. ... ... ...
795 Linguine with Golden Beet and Beef Ragù ...
796 Vegan Stone Fruit Tart ...
797 Mojito Chicken Tacos ...
798 Mongolian Chicken ...
799 Stuffed Handmade flat bread ( Paratha )with Eg... ...
[800 rows x 7 columns]
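One more thing in the question's last line: the keyword that chooses the delimiter in `to_csv` is `sep`, not `set`, and `to_csv` does not accept unknown keywords, so `set='\t'` fails once the ValueError is fixed. A small sketch of writing a tab-separated file and reading it back, using made-up rows and a temporary directory (`index=False` is an optional choice here that skips the row-number column):

```python
import os
import tempfile

import pandas as pd

# Made-up rows standing in for the scraped data
frame = pd.DataFrame({"title": ["Recipe A", "Recipe B"], "score": [120, 95]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recipes.tsv")
    # sep selects the delimiter; index=False leaves out the index column
    frame.to_csv(path, sep="\t", index=False)
    # read_csv needs the same sep to split the columns back apart
    round_trip = pd.read_csv(path, sep="\t")

print(round_trip.equals(frame))  # True
```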
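One last note: your password and client_secret are visible in the code above, so don't forget to change them once the problem is solved :)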
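One way to avoid pasting secrets into the script again is to read them from environment variables. The variable names below (`REDDIT_CLIENT_ID` and so on) and the helper `reddit_credentials` are hypothetical, chosen for this sketch; the returned dict holds the same keyword arguments the question passes to `praw.Reddit`.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup
ENV_KEYS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def reddit_credentials(environ=os.environ, user_agent="Food Parser"):
    """Collect praw.Reddit keyword arguments from the environment (hypothetical helper)."""
    missing = [var for var in ENV_KEYS.values() if var not in environ]
    if missing:
        # Fail early with a clear message instead of sending empty credentials
        raise KeyError("missing environment variables: " + ", ".join(missing))
    creds = {arg: environ[var] for arg, var in ENV_KEYS.items()}
    creds["user_agent"] = user_agent
    return creds
```

Usage would then be `reddit = praw.Reddit(**reddit_credentials())`, with the four variables set in your shell rather than in the file.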