I have a CSV file like this:
year,value
1897.386301369863,0.6
1897.3890410958904,1.1
1897.3917808219178,0.0
1897.3945205479451,8.3
1897.3972602739725,3.3
1897.4,6.7
1897.4027397260274,0.6
1897.4054794520548,2.2
1897.4082191780822,0.6
1897.4109589041095,9.4
1897.4136986301369,9.4
1897.4164383561645,31.1
This is the code I wrote:
import pandas as pd
df1 = pd.read_csv("[Path to file is here]", header=0, sep=",")
df1["year"] = df1["year"].astype(int)
n1 = df1.groupby("year")["value"].mean()
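But I keep getting this error message:
pandas.core.base.DataError: No numeric types to aggregate
I have checked this code many times, and it used to work, but I can't figure out what went wrong.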
You can do this:
df1["year"] = df1["year"].astype(int)
df1["value"] = pd.to_numeric(df1["value"])
n1 = df1.groupby("year")["value"].mean()
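This error usually means the value column was read as object (strings) rather than as a numeric type, for example because some rows contain blanks or other non-numeric text. That is an assumption here, since the sample rows shown in the question are all numeric. Below is a minimal sketch, using a small hypothetical DataFrame in place of the real file, of how you might check the dtype and turn unparseable entries into NaN with errors="coerce":

```python
import pandas as pd

# Hypothetical stand-in for the real file: one value is a whitespace-only string,
# which forces the whole column to object dtype.
df1 = pd.DataFrame({
    "year": [1897.38, 1897.40, 1898.10, 1898.20],
    "value": ["0.6", " ", "2.0", "4.0"],
})
print(df1.dtypes)  # inspect the dtype of "value" before converting

# errors="coerce" converts anything that cannot be parsed into NaN
df1["value"] = pd.to_numeric(df1["value"], errors="coerce")
df1["year"] = df1["year"].astype(int)

# mean() skips NaN by default, so the blank row is ignored in its group
n1 = df1.groupby("year")["value"].mean()
print(n1)
```

If df1.dtypes on your real data already shows float64 for value, the cause is something else and this sketch does not apply.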
If it is acceptable to replace the missing value data with 0, the following should solve your problem:
import pandas as pd
import numpy as np
df1 = pd.read_csv("./a.csv", header=0, sep=",")
df1["value"] = df1["value"].replace(r'^\s*$', np.nan, regex=True)
df1["value"] = df1["value"].astype(float)
df1["year"] = df1["year"].astype(int)
df1["value"] = df1["value"].fillna(0)
n1 = df1.groupby("year")["value"].mean()
print(n1)
If you want to omit the missing data instead, use the following approach:
import pandas as pd
import numpy as np
df1 = pd.read_csv("./a.csv", header=0, sep=",")
df1["value"] = df1["value"].replace(r'^\s*$', np.nan, regex=True)
# Drop rows whose value is missing; .copy() avoids SettingWithCopyWarning below
df1 = df1.dropna(subset=["value"]).copy()
df1["value"] = df1["value"].astype(float)
df1["year"] = df1["year"].astype(int)
n1 = df1.groupby("year")["value"].mean()
print(n1)
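The two approaches give different yearly means, so it is worth choosing deliberately. Here is a small sketch on made-up data (the values are hypothetical and not taken from your file) showing how filling with 0 pulls the mean down while dropping the row leaves it unchanged:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the year 1897 has one real reading and one missing reading
df = pd.DataFrame({"year": [1897, 1897, 1898], "value": [0.6, np.nan, 2.0]})

# Option 1: treat missing readings as 0, so they count toward the average
filled = df.assign(value=df["value"].fillna(0)).groupby("year")["value"].mean()

# Option 2: drop missing readings, so only real measurements are averaged
dropped = df.dropna(subset=["value"]).groupby("year")["value"].mean()

print(filled)
print(dropped)
```

Filling with 0 only makes sense if a missing entry really means "nothing was measured because the value was zero"; otherwise dropping is usually the safer choice.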