How do I melt a dataframe from wide to long twice on the same column?

Question

I have a dataframe:

import numpy as np
import pandas as pd

data = [
    [1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
    [1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
]

df = pd.DataFrame(
    data,
    columns=[
        "ID",
        "date",
        "code",
        "shape",
        "circle_X_rating",
        "circle_Y_rating",
        "square_X_rating",
        "square_Y_rating",
    ],
)
df


ID date      code shape circle_X_rating circle_Y_rating square_X_rating square_Y_rating
1 2022-04-29  123 circle       1               3.0             6              7.3
1 2022-02-10  456 square       4               NaN             3              9.0

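I want to melt this dataframe so that it has a single shape column and two rating columns, X_rating and Y_rating, but I can't work out how to do it. This is how I am melting it at the moment, and what I get: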

test = (
    pd.melt(
        df,
        id_vars=[
            "ID",
            "date",
            "code",
            "shape",
        ],
        value_vars=[
            "circle_X_rating",
            "circle_Y_rating",
            "square_X_rating",
            "square_Y_rating",
        ],
        var_name="shape_for_rating",
        value_name="shape_rating",
    )
    .assign(
        shape_for_rating=lambda df: df["shape"].apply(lambda a_str: a_str.split("_")[0])
    )
    .query("shape == shape")
    .drop(columns=["shape_for_rating"])
)
test

    ID  date        code    shape   shape_rating
0   1   2022-04-29  123     circle      1.0
1   1   2022-02-10  456     square      4.0
2   1   2022-04-29  123     circle      3.0
3   1   2022-02-10  456     square      NaN
4   1   2022-04-29  123     circle      6.0
5   1   2022-02-10  456     square      3.0
6   1   2022-04-29  123     circle      7.3
7   1   2022-02-10  456     square      9.0

But what I actually want is:

    ID  date        code    shape   X_rating   Y_rating
0   1   2022-04-29  123     circle       1.0        3.0
1   1   2022-04-29  123     square       6.0        7.3
2   1   2022-02-10  456     circle       4.0        NaN
3   1   2022-02-10  456     square       3.0        9.0
...

Does anyone know the best way to do this? I've been spinning my wheels.

Answer 1

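Try wide_to_long, after moving the shape prefix of each column name to the end so that X_rating and Y_rating can serve as the stub names: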

df.columns = df.columns.str.split('_',n=1).map(lambda x : '_'.join(x[::-1]))

df = pd.wide_to_long(df, 
                    stubnames = ['X_rating','Y_rating'], 
                    i = ['ID', 'date', 'code', 'shape'], 
                    j = 'shape1',
                    suffix = r'\w+').reset_index()
df
Out[84]: 
   ID        date  code   shape   shape1  X_rating  Y_rating
0   1  2022-04-29   123  circle  _circle         1       3.0
1   1  2022-04-29   123  circle  _square         6       7.3
2   1  2022-02-10   456  square  _circle         4       NaN
3   1  2022-02-10   456  square  _square         3       9.0
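In the output above, shape1 keeps a leading underscore because no sep is passed to wide_to_long, and the original shape column is repeated on every row. Below is a minimal sketch of a variant, assuming the original shape column is not needed and can be dropped before reshaping. The names renamed and long_df are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
        [1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
    ],
    columns=[
        "ID", "date", "code", "shape",
        "circle_X_rating", "circle_Y_rating",
        "square_X_rating", "square_Y_rating",
    ],
)

# Assumption: the original "shape" column is not wanted in the result.
# Move the shape prefix to the end: "circle_X_rating" -> "X_rating_circle".
# Names without an underscore ("ID", "date", "code") are left unchanged.
renamed = df.drop(columns="shape").rename(
    columns=lambda c: "_".join(c.split("_", 1)[::-1])
)

long_df = pd.wide_to_long(
    renamed,
    stubnames=["X_rating", "Y_rating"],
    i=["ID", "date", "code"],
    j="shape",
    sep="_",        # strip the separator so the suffix is "circle", not "_circle"
    suffix=r"\w+",  # the default suffix only matches digits
).reset_index()

print(long_df)
```

This should give one row per (ID, date, code, shape) combination, with X_rating and Y_rating as separate columns.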

Answer 2

This is easy to do with pivot_longer from janitor:

# pip install pyjanitor
import janitor

out = df.pivot_longer(index=['ID', 'date', 'code', 'shape'],
                      names_to=('shape', '.value'),
                      names_pattern=(r'([^_]*)_(.*)'),
                      sort_by_appearance=True
                      )

Output:

   ID        date  code   shape  X_rating  Y_rating
0   1  2022-04-29   123  circle         1       3.0
1   1  2022-04-29   123  square         6       7.3
2   1  2022-02-10   456  circle         4       NaN
3   1  2022-02-10   456  square         3       9.0
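If installing an extra package is not an option, the same reshaping can be sketched with plain pandas by melting once, splitting the melted column name into a shape part and a rating part, and pivoting the rating part back into columns. This is only one possible approach, and it assumes the original shape column can be dropped. The names melted and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
        [1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
    ],
    columns=[
        "ID", "date", "code", "shape",
        "circle_X_rating", "circle_Y_rating",
        "square_X_rating", "square_Y_rating",
    ],
)

# Assumption: the original "shape" column is not wanted in the result.
# Melt every rating column into "variable" / "value" pairs.
melted = df.drop(columns="shape").melt(id_vars=["ID", "date", "code"])

# "circle_X_rating" -> shape "circle", rating "X_rating".
melted[["shape", "rating"]] = melted["variable"].str.split("_", n=1, expand=True)

# Spread the rating names back out into columns, one row per shape.
out = (
    melted.pivot(
        index=["ID", "date", "code", "shape"],
        columns="rating",
        values="value",
    )
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "rating" axis label
)

print(out)
```

Because melt stacks integers and floats into a single value column, both rating columns come out as floats here, and pivot sorts the rows by the index columns rather than by order of appearance.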
