Pandas: aggregate while concatenating other columns


This is the dataset I'm working with:

import pandas as pd

data = [['2608 W SYLVESTER ST', 'PASCO', 'WA', 4304],
        ['61 W MESQUITE BLVD', 'MESQUITE', 'NV', 115000],
        ['287 NW 3RD AVE', 'ESTACADA', 'OR', 1000],
        ['287 NW 3RD AVE', 'ESTACADA', 'OR', 2000],
        ['287 NW 3RD AVE', 'ESTACADA', 'OR', 7000]]

df = pd.DataFrame(data, columns=['site_address', 'site_city', 'site_state', 'price'])

The DataFrame looks like this:

          site_address         site_city         site_state   price
0  2608 W SYLVESTER ST             PASCO                 WA    4304
1   61 W MESQUITE BLVD          MESQUITE                 NV  115000
2       287 NW 3RD AVE          ESTACADA                 OR    1000
3       287 NW 3RD AVE          ESTACADA                 OR    2000
4       287 NW 3RD AVE          ESTACADA                 OR    7000

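I need to produce the following JSON structure, where rows with the same address are combined and their prices summed: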

{
   "sites": [
      {
        "location": "2608 W SYLVESTER ST, PASCO WA",
        "value": 4304
      },
      {
        "location": "61 W MESQUITE BLVD, MESQUITE NV",
        "value": 115000
      },
      {
        "location": "287 NW 3RD AVE, ESTACADA OR",
        "value": 10000
      }
   ]
}

I tried using pandas groupby and agg:

df_grp = df.groupby('site_address', as_index=False).agg(**{
    'location': ** NEED HELP HERE **,
    'value': ('price', 'sum')
}).get(['location', 'value']).reset_index(drop=True)

result = json.loads(df_grp.to_json(orient='records'))

print(result)
Tags: python, pandas, aggregate
1 Answer

Try:

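import pandas as pd

# Sum price for each unique (address, city, state) combination
df = df.groupby(["site_address", "site_city", "site_state"]).agg("sum").reset_index()
print(
    {
        "sites": df.apply(
            lambda x: {
                # "ADDRESS, CITY STATE", the format requested in the question
                "location": f"{x['site_address']}, {x['site_city']} {x['site_state']}",
                # plain Python int, so the value prints and serializes cleanly
                "value": int(x["price"]),
            },
            axis=1,
        ).to_list()
    }
)

Prints (wrapped across lines for readability):

{
    'sites': [
        {'location': '2608 W SYLVESTER ST, PASCO WA', 'value': 4304},
        {'location': '287 NW 3RD AVE, ESTACADA OR', 'value': 10000},
        {'location': '61 W MESQUITE BLVD, MESQUITE NV', 'value': 115000}
    ]
}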
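A variant that stays closer to the groupby/agg attempt in the question is sketched below, assuming pandas 0.25 or later (named aggregation). Each named aggregation `(column, func)` works on a single column, so `location` cannot easily be assembled inside `agg`; this sketch builds it first and then groups on it. `sort=False` keeps groups in order of first appearance, so the records come out in the order shown in the desired JSON rather than sorted by address. One assumption to keep in mind: two different (address, city, state) triples are treated as distinct only if their combined strings differ.

```python
import json

import pandas as pd

data = [
    ["2608 W SYLVESTER ST", "PASCO", "WA", 4304],
    ["61 W MESQUITE BLVD", "MESQUITE", "NV", 115000],
    ["287 NW 3RD AVE", "ESTACADA", "OR", 1000],
    ["287 NW 3RD AVE", "ESTACADA", "OR", 2000],
    ["287 NW 3RD AVE", "ESTACADA", "OR", 7000],
]
df = pd.DataFrame(data, columns=["site_address", "site_city", "site_state", "price"])

# Build the "ADDRESS, CITY STATE" string up front so it can serve as the group key.
location = df["site_address"] + ", " + df["site_city"] + " " + df["site_state"]

# Named aggregation: output column "value" is the sum of "price" per location.
# sort=False keeps groups in order of first appearance instead of sorting the keys.
sites = (
    df.assign(location=location)
    .groupby("location", as_index=False, sort=False)
    .agg(value=("price", "sum"))
)

# to_json(orient="records") gives a list of {"location": ..., "value": ...} objects,
# which is then wrapped under the "sites" key.
result = {"sites": json.loads(sites.to_json(orient="records"))}
print(json.dumps(result, indent=4))
```

Because `json.loads` turns the values back into plain Python ints, `result` can be passed straight to `json.dumps` or returned from a web handler without further conversion.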
© www.soinside.com 2019 - 2024. All rights reserved.