Expand the unique values of a column into multiple columns, for X columns in a DataFrame

Question

I need to reshape this DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
   'foo': ['one', 'one', 'one', 'two', 'two', 'two', 'three', 'three', 'three'],
   'tak': ['dgad', 'dgad', 'dgad', 'ogfagas', 'ogfagas', 'ogfagas', 'adgadg', 'adgadg', 'adgadg'],
   'bar': ['B', 'B', 'A', 'C', 'A', 'C', 'C', 'C', 'C'],
   'nix': ['Z', 'Z', 'Z', 'G', 'G', 'G', 'Z', 'G', 'G']
})

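...into a DataFrame where foo and tak are the index (each unique value of tak never has more than one unique value of foo). For bar and nix (in practice I need to do this for 10 different columns), I need to pivot each of these columns into multiple columns, where bar_1 holds the first unique value of bar for each index entry, bar_2 holds the second unique value of bar for each foo group, and so on. If a given foo group has only one unique value, or none, in bar or nix, np.nan should be inserted. Like this: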

pd.DataFrame({
   'foo': ['one', 'three', 'two'],
   'tak': ['dgad', 'ogfagas', 'adgadg'],
   'bar_one': ['B', 'C', 'C'],
   'bar_two': ['A', 'A', np.nan],
   'nix_one': ['Z' , 'G', 'Z'],
   'nix_two': [np.nan, np.nan, 'G']
})

What I am currently doing is using .pivot_table with this aggregation function:

pivot_df = df.pivot_table(
   index=['foo', 'tak'],
   values=['bar', 'nix'],
   aggfunc = lambda x: list(set(x))
)

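Then I expand these columns, which hold a list of unique values per foo-tak group, into multiple columns and concatenate them using a list comprehension: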

pd.concat(
   [
       pivot_df[column].apply(pd.Series)
       for column in ['bar', 'nix']
   ],
   axis=1
)

Is there a simpler, more direct, or more Pythonic way to do this transformation?

Answer 1

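cols = ['bar', 'nix']

# This appears to assume that df holds a list of unique values in each of cols,
# e.g. pivot_df.reset_index() from the question, not the raw input frame.
out = pd.concat([df.drop(cols, axis=1)] + [pd.DataFrame(df[col].tolist()).add_prefix(col) for col in cols], axis=1)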

Output:

     foo      tak bar0  bar1 nix0  nix1
0    one     dgad    B     A    Z  None
1  three  ogfagas    C     A    G  None
2    two   adgadg    C  None    Z     G
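
The answer above starts from a frame whose bar and nix cells already hold lists. As a minimal sketch of an alternative that starts from the raw DataFrame in the question and keeps values in order of first appearance, one could number the unique values within each (foo, tak) group and pivot on that number. The helper name spread_unique and the bar_1/bar_2 column naming are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two', 'three', 'three', 'three'],
    'tak': ['dgad', 'dgad', 'dgad', 'ogfagas', 'ogfagas', 'ogfagas', 'adgadg', 'adgadg', 'adgadg'],
    'bar': ['B', 'B', 'A', 'C', 'A', 'C', 'C', 'C', 'C'],
    'nix': ['Z', 'Z', 'Z', 'G', 'G', 'G', 'Z', 'G', 'G'],
})

keys = ['foo', 'tak']
cols = ['bar', 'nix']


def spread_unique(frame, keys, col):
    """Hypothetical helper: one output column per unique value of `col` within each key group."""
    # Keep each (keys, col) combination once, in order of first appearance.
    uniq = frame[keys + [col]].drop_duplicates()
    # Number the unique values inside each group: 1, 2, ...
    uniq = uniq.assign(n=uniq.groupby(keys).cumcount() + 1)
    # Turn the numbering into columns; groups with fewer values get NaN.
    wide = uniq.pivot(index=keys, columns='n', values=col)
    wide.columns = [f'{col}_{n}' for n in wide.columns]
    return wide


out = pd.concat([spread_unique(df, keys, col) for col in cols], axis=1).reset_index()
print(out)
```

Under these assumptions, out has one row per (foo, tak) pair and the columns foo, tak, bar_1, bar_2, nix_1, nix_2. Because the numbering comes from drop_duplicates rather than set, the order of the values within each group is deterministic.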
