How to transpose in Polars


Transpose Example

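How can I do this in Polars? I'm trying to display some sales information that I query and process in Polars, but I'm not sure how to do the final transformation.

The Polars docs seem to support it, but it's confusing how to actually write the code.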

python python-polars
1 Answer

transpose is a bit tricky because you have to manually rename the columns using the values in the first row.

Initialize df

import polars as pl

df = pl.DataFrame(dict(
    Year=[2018,2019,2020],
    Product1=[100,120,140],
    Product2=[130,150,170],
    Product3=[1,2,3]
))

First attempt (not the solution)

df.transpose(include_header=True)
shape: (4, 4)
┌──────────┬──────────┬──────────┬──────────┐
│ column   ┆ column_0 ┆ column_1 ┆ column_2 │
│ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str      ┆ i64      ┆ i64      ┆ i64      │
╞══════════╪══════════╪══════════╪══════════╡
│ Year     ┆ 2018     ┆ 2019     ┆ 2020     │
│ Product1 ┆ 100      ┆ 120      ┆ 140      │
│ Product2 ┆ 130      ┆ 150      ┆ 170      │
│ Product3 ┆ 1        ┆ 2        ┆ 3        │
└──────────┴──────────┴──────────┴──────────┘

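But you want the first row to become the column headers rather than be part of the df itself. Here are two ways to do that.

Intermediate df approach (easier to see what's happening)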

df_t = df.transpose(include_header=True)
df_t.columns = [str(x) for x in df_t.slice(0,1).rows()[0]]
# slice(0,1) isn't strictly necessary, but with a big df it avoids
# converting the whole df to a list of tuples when you only need
# the first row

df_t = df_t.rename({df_t.columns[0]:""})
# this rename is also optional; it sets the first column's name to a blank string
# to match the example

df_t = df_t.slice(1) # this slice is necessary to drop the row that now duplicates the column headings
df_t


shape: (3, 4)
┌──────────┬──────┬──────┬──────┐
│          ┆ 2018 ┆ 2019 ┆ 2020 │
│ ---      ┆ ---  ┆ ---  ┆ ---  │
│ str      ┆ i64  ┆ i64  ┆ i64  │
╞══════════╪══════╪══════╪══════╡
│ Product1 ┆ 100  ┆ 120  ┆ 140  │
│ Product2 ┆ 130  ┆ 150  ┆ 170  │
│ Product3 ┆ 1    ┆ 2    ┆ 3    │
└──────────┴──────┴──────┴──────┘

"One-liner" approach (looks a bit hacky)

(
    df
    .transpose(include_header=True)
    .pipe(lambda df_t: (
    # using pipe here lets us work from the output of the transpose without setting global
    # intermediate variables. We just make a lambda and then chain everything from that
    # df_t, which is scoped to the lambda so it doesn't clutter up global vars.
        df_t
        .rename(
            { old_col:
                str(new_col) if new_col != df_t.slice(0,1).select(pl.first()).item() else "" 
                for old_col,new_col in zip(df_t.columns, df_t.slice(0,1).rows()[0])}
        )
        # In this approach we want to chain everything, so we rename the columns with this
        # dictionary comprehension instead of assigning to df_t.columns as before;
        # the result is the same
        .slice(1)
    ))
)

shape: (3, 4)
┌──────────┬──────┬──────┬──────┐
│          ┆ 2018 ┆ 2019 ┆ 2020 │
│ ---      ┆ ---  ┆ ---  ┆ ---  │
│ str      ┆ i64  ┆ i64  ┆ i64  │
╞══════════╪══════╪══════╪══════╡
│ Product1 ┆ 100  ┆ 120  ┆ 140  │
│ Product2 ┆ 130  ┆ 150  ┆ 170  │
│ Product3 ┆ 1    ┆ 2    ┆ 3    │
└──────────┴──────┴──────┴──────┘