How do I iterate with nested for loops over a pandas DataFrame?

Question

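I am trying to iterate over the Hacker News dataset and sort the posts into the 3 categories (i.e. post types) found on the HN forum: ask_posts, show_posts and other_posts.

In short, I am trying to find the average number of comments per post for each category (as shown below).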

import pandas as pd
import datetime as dt

df = pd.read_csv('HN_posts_year_to_Sep_26_2016.csv')

ask_posts = []
show_posts = []
other_post = []
total_ask_comments = 0
total_show_comments = 0

for i, row in df.iterrows():
    title = row.title
    comments = row['num_comments']
    if title.lower().startswith('ask hn'):
        ask_posts.append(title)
        for post in ask_posts:
            total_ask_comments += comments
    elif title.lower().startswith('show hn'):
        show_posts.append(title)
        for post in show_posts:
            total_show_comments += comments
    else:
        other_post.append(title)

avg_ask_comments = total_ask_comments/len(ask_posts)
avg_show_comments = total_show_comments/len(show_posts)


print(total_ask_comments)
print(total_show_comments)

print(avg_ask_comments)
print(avg_show_comments)
The results are, respectively:

395976587

250362315

and

43328.21829521829

24646.81187241583

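These values seem very high, and I suspect that has to do with how the nested loops are structured. Is this approach correct? It is important that I do this with for loops.

Thanks for any help with, or validation of, my code.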

Answer 1

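Iterating over a pandas DataFrame row by row to pull out the information you need is very slow. Filtering the DataFrame to get that information is much faster.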
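As a rough illustration of the filtering idea, the sketch below builds boolean masks from the title column and averages num_comments for each category. The small DataFrame is made-up sample data standing in for the CSV, and the names lower_titles, is_ask, is_show, ask_count and loop_avg_ask are hypothetical, not from the question. Because the question insists on for loops, a single-pass loop is included as well. The totals in the question are most likely inflated because each inner loop adds the current row's comment count once for every post already collected, rather than once per post.

```python
import pandas as pd

# Made-up sample data standing in for the Hacker News CSV (assumption).
df = pd.DataFrame({
    "title": ["Ask HN: A?", "Show HN: B", "Some news", "ask hn: C?", "Show HN: D"],
    "num_comments": [10, 4, 7, 20, 6],
})

# Filtering approach: one boolean mask per category, no explicit loop.
lower_titles = df["title"].str.lower()
is_ask = lower_titles.str.startswith("ask hn")
is_show = lower_titles.str.startswith("show hn")

avg_ask_comments = df.loc[is_ask, "num_comments"].mean()
avg_show_comments = df.loc[is_show, "num_comments"].mean()
avg_other_comments = df.loc[~(is_ask | is_show), "num_comments"].mean()

# Loop approach: add each row's comments exactly once, with no inner loop.
total_ask_comments = 0
ask_count = 0
for title, comments in zip(df["title"], df["num_comments"]):
    if title.lower().startswith("ask hn"):
        total_ask_comments += comments
        ask_count += 1
loop_avg_ask = total_ask_comments / ask_count

print(avg_ask_comments, avg_show_comments, avg_other_comments)
print(loop_avg_ask)
```

Both approaches should agree on the ask average for the same data. On the real dataset, titles with missing values would need handling first (for example by dropping those rows), which this sketch does not cover.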
