Explode a Polars DataFrame column without duplicating other column values

Question

As a minimal example, let's say we have the following Polars DataFrame:

import polars as pl

df = pl.DataFrame({"sub_id": [1,2,3], "engagement": ["one:one,two:two", "one:two,two:one", "one:one"], "total_duration": [123, 456, 789]})

sub_id	engagement	total_duration
1	one:one,two:two	123
2	one:two,two:one	456
3	one:one	789

Then we explode the "engagement" column:

df = df.with_columns(pl.col("engagement").str.split(",")).explode("engagement")

and get:

sub_id	engagement	total_duration
1	one:one	123
1	two:two	123
2	one:two	456
2	two:one	456
3	one:one	789

For visualization I use Plotly, with the following code:

import plotly.express as px
fig = px.bar(df, x="sub_id", y="total_duration", color="engagement")
fig.show()

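This basically means that total_duration (total watch time) is now doubled for subscribers 1 and 2, because each of them has two rows that are stacked in the bar chart. How can I keep the correct total duration for each subscriber while still keeping the engagement groups as shown in the figure legend?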

Answer 1

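One option for handling this in Polars is to divide the total_duration column by the number of rows with the given sub_id, so that the per-row values add back up to the original total.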

(
    df
    .with_columns(
        pl.col("engagement").str.split(",")
    )
    .explode("engagement")
    .with_columns(
        pl.col("total_duration") / pl.len().over("sub_id")
    )
)

shape: (5, 3)
┌────────┬────────────┬────────────────┐
│ sub_id ┆ engagement ┆ total_duration │
│ ---    ┆ ---        ┆ ---            │
│ i64    ┆ str        ┆ f64            │
╞════════╪════════════╪════════════════╡
│ 1      ┆ one:one    ┆ 61.5           │
│ 1      ┆ two:two    ┆ 61.5           │
│ 2      ┆ one:two    ┆ 228.0          │
│ 2      ┆ two:one    ┆ 228.0          │
│ 3      ┆ one:one    ┆ 789.0          │
└────────┴────────────┴────────────────┘
