I'm trying to scrape data with Reddit's API, but I get a ValueError when I run my code. Why does this happen?


I'm trying to scrape recipes through Reddit's API, but I keep getting an error. I'd appreciate any help solving this.

Here is the code I'm using:

#! python3
import praw
import pandas as pd
import datetime as dt
reddit=praw.Reddit(client_id='RpdZdsNcyIE9vg', \
                   client_secret='aVlCaLr5XMfP4BP-1a8-4B2uOo8', \
                   user_agent= 'Food Parser', \
                   username= 'AndrewPlummer2020', \
                   password= 'John3:18')
subreddit=reddit.subreddit('recipes')
top_subreddit=subreddit.top(limit=800)
for submission in subreddit.top(limit=1):
    print(submission.title, submission.id)
topics_dict = {"title":[], \
               "score":[], \
               "id": [], "url": [], \
               "comms_num": [], \
               "created": [], \
               "body": []}
for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)

topics_data=pd.DataFrame(topics_dict)
topics_data.to_csv("Dish Recpies.csv", sep='\t')

This is the error I get:

Traceback (most recent call last):
  File "C:/Users/plumm/AppData/Local/Programs/Python/Python37/Reddit_scraper.py", line 27, in <module>
    topics_data=pd.DataFrame(topics_dict)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 411, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Users\plumm\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py", line 368, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length

Any help would be greatly appreciated. Thank you in advance.

python pandas api reddit
1 Answer

pandas is complaining that your arrays have different lengths.

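As @Aran-Fey mentioned, this happens because the lists topics_dict['id'] and topics_dict['url'] are still empty: nothing is ever appended to them in the loop below, while every other list receives one value per submission.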

for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)
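
A quick way to confirm this diagnosis is to count how many values each list holds before building the DataFrame. The snippet below is only a minimal sketch: the sample topics_dict is a hypothetical stand-in for what the loop above would leave behind after three submissions, and column_lengths is a helper name invented for this illustration.

```python
# Hypothetical stand-in for topics_dict after the loop above has processed
# three submissions: 'id' and 'url' were never appended to.
topics_dict = {
    "title": ["a", "b", "c"],
    "score": [10, 20, 30],
    "id": [],
    "url": [],
    "comms_num": [1, 2, 3],
    "created": [1.0, 2.0, 3.0],
    "body": ["", "", ""],
}


def column_lengths(columns):
    """Return a {column name: number of values} mapping."""
    return {name: len(values) for name, values in columns.items()}


lengths = column_lengths(topics_dict)
print(lengths)

# Any column shorter than the longest one is a column the loop forgot to fill.
expected = max(lengths.values())
short_columns = [name for name, n in lengths.items() if n != expected]
print(short_columns)  # ['id', 'url']
```

If short_columns is not empty, pd.DataFrame(topics_dict) will raise the "arrays must all be same length" error shown in the question.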

To fix this, append to those two lists as well:

for submission in top_subreddit:
    topics_dict['title'].append(submission.title)
    topics_dict['score'].append(submission.score)
    topics_dict['comms_num'].append(submission.num_comments)
    topics_dict['created'].append(submission.created)
    topics_dict['body'].append(submission.selftext)

    # Add url and id
    topics_dict['id'].append(submission.id)
    topics_dict['url'].append(submission.url)
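
As an alternative to keeping seven parallel lists in sync, each submission can be turned into one complete row, which makes it impossible for a column to be skipped. This is a sketch of a different technique from the fix above, not part of the original answer: submission_to_row is a hypothetical helper, and the SimpleNamespace objects are stand-ins for PRAW submissions so that the example runs without network access.

```python
from types import SimpleNamespace


def submission_to_row(submission):
    """Build one complete row (one value per column) from a submission."""
    return {
        "title": submission.title,
        "score": submission.score,
        "id": submission.id,
        "url": submission.url,
        "comms_num": submission.num_comments,
        "created": submission.created,
        "body": submission.selftext,
    }


# Hypothetical stand-ins for PRAW submissions, so the sketch runs offline.
fake_submissions = [
    SimpleNamespace(title="Garlic bread", score=12, id="abc123",
                    url="https://example.com/1", num_comments=3,
                    created=1577836800.0, selftext=""),
    SimpleNamespace(title="Tomato soup", score=7, id="def456",
                    url="https://example.com/2", num_comments=0,
                    created=1577923200.0, selftext="Simmer for 20 minutes."),
]

# Every row carries all seven keys, so the columns cannot differ in length.
rows = [submission_to_row(s) for s in fake_submissions]
print(len(rows), sorted(rows[0]))
```

In the real script, the list comprehension would iterate over top_subreddit instead, and the resulting list of dicts could be passed straight to pd.DataFrame(rows), which accepts a list of dictionaries.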

Output on repl.it when printing the contents of the CSV file:

                                                 title  ...  body
0             Garlic Butter Steak and Potatoes Skillet  ...
1    This shoyu ramen broth is our family's favorit...  ...
2    Wasn't sure how to properly thank a stranger f...  ...
3    I'm working on moving all my mothers hand writ...  ...
4                   I made my first ever loaf of bread  ...
..                                                 ...  ...   ...
795            Linguine with Golden Beet and Beef Ragù  ...
796                             Vegan Stone Fruit Tart  ...
797                               Mojito Chicken Tacos  ...
798                                  Mongolian Chicken  ...
799  Stuffed Handmade flat bread ( Paratha )with Eg...  ...

[800 rows x 7 columns]

One last note: once your problem is solved, don't forget to change your password and client_secret :)
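
One way to avoid pasting credentials into a script (and into a public question) is to read them from environment variables. The following is a minimal sketch under stated assumptions: the variable names such as REDDIT_CLIENT_ID are hypothetical, and load_reddit_credentials is a helper invented for this illustration, not something PRAW provides.

```python
import os

# Maps praw.Reddit keyword arguments to environment variable names.
# The variable names are hypothetical; pick whatever suits your setup.
ENV_NAMES = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def load_reddit_credentials(environ=None):
    """Return the credential keyword arguments, or raise if any is missing."""
    if environ is None:
        environ = os.environ
    missing = [env for env in ENV_NAMES.values() if env not in environ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {arg: environ[env] for arg, env in ENV_NAMES.items()}


# Demonstration with a fake environment, so nothing real is needed to run this.
fake_environ = {
    "REDDIT_CLIENT_ID": "my-id",
    "REDDIT_CLIENT_SECRET": "my-secret",
    "REDDIT_USERNAME": "my-user",
    "REDDIT_PASSWORD": "my-password",
}
credentials = load_reddit_credentials(fake_environ)
print(sorted(credentials))
```

In the script from the question, the call would then become reddit = praw.Reddit(user_agent='Food Parser', **load_reddit_credentials()), with the real values set in the shell rather than in the source file.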
