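I was given a problem to solve, and I thought I had done it well.
The problem: print the average number of reviews for books rated above 4 with more than 1000 total ratings, and also show it for books rated in the 2-3 range with fewer than 500 total ratings.
Here is my code: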
import pandas as pd
df = pd.read_csv('data/GoodReads_100k_books.csv')
above = df[(df['rating'] > 4) & (df['totalratings'] > 1000)].mean()
below = df[(2 <= df['rating'] <= 3) & (df['totalratings'] < 500)].mean()
print(above)
print(below)
It looks fine to me, but it returns this error:
TypeError Traceback (most recent call last)
Cell In[15], line 5
1 import pandas as pd
3 df = pd.read_csv('data/GoodReads_100k_books.csv')
----> 5 above = df[(df['rating'] > 4) & (df['totalratings'] > 1000)].mean()
7 below = df[(2 <= df['rating'] <= 3) & (df['totalratings'] < 500)].mean()
9 print(above)
TypeError: can only concatenate str (not "int") to str
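I asked TabnineAI what is wrong with the code and why it returns this error.
It said there is nothing wrong with the code and that concatenation has nothing to do with my code.
How can I fix this?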
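The error message indicates that mean() is trying to add up string values as if they were numbers. This happens because mean() is applied to the whole filtered DataFrame, including columns that hold strings, such as the title column.
To fix this, apply mean() only to the numeric columns. Note that the chained comparison 2 <= df['rating'] <= 3 is also split into two conditions joined with &, because pandas cannot evaluate a chained comparison on a Series. Here is a quick way to do it: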
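import pandas as pd
df = pd.read_csv('data/GoodReads_100k_books.csv')
# Restrict the averages to columns that are known to be numeric
numeric_columns = ['rating', 'totalratings']
# Books rated above 4 with more than 1000 total ratings
above = df[(df['rating'] > 4) & (df['totalratings'] > 1000)][numeric_columns].mean()
# Books rated from 2 to 3 (inclusive) with fewer than 500 total ratings
below = df[(df['rating'] >= 2) & (df['rating'] <= 3) & (df['totalratings'] < 500)][numeric_columns].mean()
print(above)
print(below)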
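Since the task asks for the average number of reviews, it may be simpler to average that one column directly. Below is a minimal sketch on a small made-up DataFrame rather than the real CSV. It assumes the review counts live in a column named reviews, which is a guess that should be checked against df.columns. It also shows Series.between as a shorter way to write the 2-3 range, and mean(numeric_only=True) as an alternative to listing the numeric columns by hand.

```python
import pandas as pd

# Made-up sample data standing in for the CSV.
# ASSUMPTION: the review counts are in a column called 'reviews'.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [4.5, 4.2, 2.5, 3.0],
    "totalratings": [1500, 2000, 100, 400],
    "reviews": [120, 80, 10, 30],
})

# Build each filter once so it can be reused.
high = df[(df["rating"] > 4) & (df["totalratings"] > 1000)]
# between() is inclusive on both ends by default, i.e. 2 <= rating <= 3.
low = df[df["rating"].between(2, 3) & (df["totalratings"] < 500)]

# Average a single numeric column, so string columns are never touched.
above_reviews = high["reviews"].mean()
below_reviews = low["reviews"].mean()
print(above_reviews)
print(below_reviews)

# Alternative: average every numeric column and skip the string ones.
above_all = high.mean(numeric_only=True)
print(above_all)
```

Selecting one column before calling mean() avoids the string columns entirely, so the original TypeError cannot occur for that call.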