Using groupby with as_index=False, count, and to_frame raises 'DataFrame' object has no attribute 'to_frame'

Question

I am trying to create a new dataframe from an existing one using groupby(), count(), and to_frame(). After adding as_index=False to the groupby, I get AttributeError: 'DataFrame' object has no attribute 'to_frame'.

Here is the code:


    newdat = indat.query('-1017 <= WDIR16 <= -1000')
    newdat.reset_index(drop=True, inplace=True)
    newdat.sort_values(by=['YEAR', 'MO', 'GP', 'HR'], inplace=True)

    # Find Count
    w1 = newdat.groupby(['YEAR','MO', 'GP','HR'], as_index=False)["WDIR16"].count().to_frame(name='wndclimodirectionobsqty').reset_index()

    # Find Means
    w1['wndclimomeanspeedrate'] = newdat.groupby(['YEAR','MO', 'GP', 'HR'], as_index=False).aggregate({'WSPD':'mean'}, as_index=False).values

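The error occurs on the line with to_frame. I use as_index=False in the groupby because the existing dataframe may sometimes be empty while still having columns (reference: "Keep columns after a groupby in an empty dataframe"). If I omit as_index=False, the to_frame line works. However, if the dataframe being grouped is empty, the empty columns are not carried over to the new dataframe. Any ideas?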
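To make the error concrete, here is a minimal sketch of the difference in return types. The toy values are made up for illustration; only the column names HR and WDIR16 come from the code above.

```python
import pandas as pd

# Toy data (assumed values); only the column names come from the question.
df = pd.DataFrame({"HR": [6, 6, 9], "WDIR16": [-1014, -1014, -1014]})

# Default as_index=True: selecting one column and counting yields a Series,
# and a Series has a .to_frame() method.
as_series = df.groupby("HR")["WDIR16"].count()

# as_index=False: the same call yields a DataFrame whose group keys are
# regular columns, and a DataFrame has no .to_frame() method.
as_frame = df.groupby("HR", as_index=False)["WDIR16"].count()

print(type(as_series).__name__, hasattr(as_series, "to_frame"))
print(type(as_frame).__name__, hasattr(as_frame, "to_frame"))
```

Since the as_index=False result is already a DataFrame, the to_frame(name=...) step has nothing to convert, which is why the call fails there.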

Here are a few rows of the newdat dataframe:

NETWORKTYPE,PLATFORMID,REPORTTYPECODE,OBSERVATIONTIME,YEAR,MO,DAY,HR,MINUTEDV,PLATFORMHEIGHT,TEMPC,DEWPC,WDIR,WSPD,GUST,SLP,STNPRES,ALSTG,CIG,SKY,CAVOK,VSBY,PRCP1,PRCPTIM1,PRCP2,PRCPTIM2,PRCP3,PRCPTIM3,PRCP4,PRCPTIM4,HUMREL,VAPOR,ABSHUM,SPHUM,TVIRTK,DENSITY,DENALT,PRSALT,SKY100,TEMP_GE32,TEMP_LE0,TEMP_LEM17,TSTM,FOG,FOG3MILE,BLOWSNOW,BLOWSAND,FREZRAIN,HAIL,SNOW,FROZPRCP,SNOWICE,RAIN,ALLPRECP,SMOKHAZE,SANDSNOW,OBSTVISN,U,V,WDIRCOS,WDIRSIN,WDIR16,CALM,LIGHT,WSPDGT12_8,WSPDGT12_3,WSPDGT9_7,WSPDGT17_5,WSPDGT25_2,VSBY_800,VSBY_1600,VSBY_3200,VSBY_4800,GP
ICAO ,KOFF,SAO  ,1948-01-12 06:00:00,1948,1,12,6,0,320.0,2.4,0.2,290.0,4.6,,,,,22000.0,8.0,N,11200,,,,,,,,,,6.196962,4.87,,,,,0,100.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4.32259,-1.57329,0.3420201029058752,-0.9396926354975091,-1014,0,0,0,0,0,0,0,1,1,1,1,3
ICAO ,KOFF,SAO  ,1948-01-12 07:00:00,1948,1,12,6,0,320.0,2.4,-2.6,290.0,5.1,,,,,22000.0,7.0,N,8000,,,,,,,,,,5.045877,3.97,,,,,0,87.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,4.79243,-1.7443,0.3420201029058752,-0.9396926354975091,-1014,0,0,0,0,0,0,0,1,1,1,1,3
ICAO ,KOFF,SAO  ,1948-01-12 08:00:00,1948,1,12,9,0,320.0,0.8,-1.5,290.0,4.6,,,,,22000.0,7.0,N,11200,,,,,,,,,,5.473223,4.33,,,,,0,87.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4.32259,-1.57329,0.3420201029058752,-0.9396926354975091,-1014,0,0,0,0,0,0,0,1,1,1,1,3

Answer 1

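IIUC, you can do the following. With as_index=False, count() already returns a DataFrame (the group keys stay as regular columns), so there is nothing to convert with to_frame(); rename the counted column instead: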

import pandas as pd

# "your_data.csv" is a placeholder path for the data shown in the question
indat = pd.read_csv("your_data.csv")

newdat = indat.query("-1017 <= WDIR16 <= -1000")
newdat.reset_index(drop=True, inplace=True)
newdat.sort_values(by=["YEAR", "MO", "GP", "HR"], inplace=True)

# Find Count
w1 = (
    newdat.groupby(["YEAR", "MO", "GP", "HR"], as_index=False)["WDIR16"]
    .count()
    .rename(columns={"WDIR16": "wndclimodirectionobsqty"})
)

# Find Means
w1["wndclimomeanspeedrate"] = (
    newdat.groupby(["YEAR", "MO", "GP", "HR"])["WSPD"].agg("mean").values
)

print(w1)

Prints:

   YEAR  MO  GP  HR  wndclimodirectionobsqty  wndclimomeanspeedrate
0  1948   1   3   6                        2                   4.85
1  1948   1   3   9                        1                   4.60
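As a possible alternative, both statistics can be computed in a single groupby using named aggregation (available since pandas 0.25), which avoids assigning the means positionally via .values. This is only a sketch: the small frame below is hand-built from the YEAR, MO, GP, HR, WDIR16, and WSPD values of the three sample rows, and the `keys` variable is introduced here for illustration.

```python
import pandas as pd

# Hand-built frame mirroring the relevant columns of the three sample rows.
newdat = pd.DataFrame(
    {
        "YEAR": [1948, 1948, 1948],
        "MO": [1, 1, 1],
        "GP": [3, 3, 3],
        "HR": [6, 6, 9],
        "WDIR16": [-1014, -1014, -1014],
        "WSPD": [4.6, 5.1, 4.6],
    }
)

keys = ["YEAR", "MO", "GP", "HR"]  # illustrative helper, not in the original code

# One pass: each keyword becomes an output column, built from
# a (source column, aggregation function) pair.
w1 = newdat.groupby(keys, as_index=False).agg(
    wndclimodirectionobsqty=("WDIR16", "count"),
    wndclimomeanspeedrate=("WSPD", "mean"),
)

print(w1)
```

Because the count and the mean come from the same grouped object, the two columns cannot fall out of alignment. Whether the empty-dataframe case keeps all columns is worth checking on your pandas version before relying on it.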
