Polars equivalent of the pandas expression df.groupby(['col1','col2'])['col3'].sum().unstack()

Question (votes: 0, answers: 1)
import pandas as pd

pandasdf = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
pandasdf.groupby(["fruits","cars"])['B'].sum().unstack()

How can I create the equivalent truth table in Polars?

I want to turn the table below into a truth table:

import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
# Newer Polars releases spell this method group_by.
df.groupby(["fruits", "cars"]).agg(pl.col("B").sum())  # -> truth table

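Efficiency matters because the dataset is very large (it is used with the apriori algorithm).

The unstack function in Polars works differently. A Polars alternative to pd.crosstab would also work.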
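
For reference, the pandas result being asked about can be sketched as follows. The final boolean step is an assumption about what "truth table" means here (True where a fruits/cars combination occurs at all); the names `wide`, `truth` and `ct` are illustrative only.

```python
import pandas as pd

pandasdf = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)

# Sum B per (fruits, cars) pair, then move "cars" from the index to the columns.
wide = pandasdf.groupby(["fruits", "cars"])["B"].sum().unstack()

# Assumed reading of "truth table": True where the pair exists, False where it is missing.
truth = wide.notna()

# pd.crosstab can build the same wide table directly from the two key columns.
ct = pd.crosstab(pandasdf["fruits"], pandasdf["cars"], values=pandasdf["B"], aggfunc="sum")

print(wide)
print(truth)
```

Pairs that never occur (apple with audi in this data) come out as NaN in `wide`, which is what the boolean step relies on.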

python dataframe analytics data-analysis python-polars
1 answer
4 votes

It looks like you want a pivot:

import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)

df.pivot(values="B", index="cars", columns="fruits", aggregate_function="sum")
shape: (2, 3)
┌────────┬────────┬───────┐
│ cars   ┆ banana ┆ apple │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ i64    ┆ i64   │
╞════════╪════════╪═══════╡
│ beetle ┆ 6      ┆ 5     │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ audi   ┆ 4      ┆ null  │
└────────┴────────┴───────┘
