Create new columns in a data frame using values from other columns based on a condition

Question

Suppose I have a data frame (df) like the one below (the dots stand for more columns):

Type  Price1  Price2  Price3  Price4  Price5  ... ... 
A       nan     1       nan     nan     2           
A       nan     3       nan     nan     2
B       nan     nan     4       5       nan
B       nan     nan     6       7       nan
C       nan     2       nan     nan     1  
C       nan     4       nan     nan     3
D       1       8       nan     nan     nan
D       9       6       nan     nan     nan
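
To reproduce the example, the frame above can be built as follows. This is only a sketch: it includes the five Price columns shown and leaves out the elided ones, and it assumes the values are floats with numpy NaN for the missing entries.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample frame from the question; the elided "..." columns are omitted.
df = pd.DataFrame({
    'Type':   ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Price1': [nan, nan, nan, nan, nan, nan, 1, 9],
    'Price2': [1, 3, nan, nan, 2, 4, 8, 6],
    'Price3': [nan, nan, 4, 6, nan, nan, nan, nan],
    'Price4': [nan, nan, 5, 7, nan, nan, nan, nan],
    'Price5': [2, 2, nan, nan, 1, 3, nan, nan],
})

print(df)
```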

I need to transform this data frame into the following:

Type   newcol1    newcol2  ...  ...
     A      2           1
     A      2           3
     B      4           5
     B      6           7
     C      1           2   
     C      3           4   
     D      1           8       
     D      9           6

These are the criteria for which source column supplies each new column:

if type == 'A'
'Price5' -> 'newcol1' && 'Price2' -> 'newcol2'

if type == 'B'
'Price3' -> 'newcol1' && 'Price4' -> 'newcol2'

if type == 'C'
'Price5' -> 'newcol1' && 'Price2' -> 'newcol2'

if type == 'D'
'Price1' -> 'newcol1' && 'Price2' -> 'newcol2'

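My approach was to use the mask function, assign the values to two columns, and then rename them. This works, but I have to handle each case separately, which gets messy: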

df = df.mask(df['Type'].eq('A'), df.assign(**{'Price3': df['Price5'].values, 'Price4': df['Price2'].values}))

df.rename(columns = {'Price3':'newcol1'}, inplace = True) 
df.rename(columns = {'Price4':'newcol2'}, inplace = True)

How can I achieve this more efficiently?

Answer 1

You can use a custom groupby.apply together with rename and reindex:

# dictionary to define the mappings per group
dic = {'A': {'Price5': 'newcol1', 'Price2': 'newcol2'},
       'B': {'Price3': 'newcol1', 'Price4': 'newcol2'},
       'C': {'Price5': 'newcol1', 'Price2': 'newcol2'},
       'D': {'Price1' :'newcol1', 'Price2': 'newcol2'},
       }

def f(g):
    # for each group, rename and select the columns
    d = dic.get(g.name, {})
    return (g.rename(columns=d)
             .reindex(columns=d.values())
           )

# apply the function per group
out = df.groupby('Type').apply(f).reset_index(0)

Output:

  Type  newcol1  newcol2
0    A      2.0      1.0
1    A      2.0      3.0
2    B      4.0      5.0
3    B      6.0      7.0
4    C      1.0      2.0
5    C      3.0      4.0
6    D      1.0      8.0
7    D      9.0      6.0
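
How groupby.apply treats the grouping column has changed between pandas releases, so a variant that iterates over the groups explicitly may be easier to keep working across versions. The following is a minimal sketch of that idea, not part of the original answer. It reuses the same dic mapping and rebuilds the sample frame (Price1 to Price5 only) so that it runs on its own.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    'Type':   ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Price1': [nan, nan, nan, nan, nan, nan, 1, 9],
    'Price2': [1, 3, nan, nan, 2, 4, 8, 6],
    'Price3': [nan, nan, 4, 6, nan, nan, nan, nan],
    'Price4': [nan, nan, 5, 7, nan, nan, nan, nan],
    'Price5': [2, 2, nan, nan, 1, 3, nan, nan],
})

# Same per-group mapping as in the answer above
dic = {'A': {'Price5': 'newcol1', 'Price2': 'newcol2'},
       'B': {'Price3': 'newcol1', 'Price4': 'newcol2'},
       'C': {'Price5': 'newcol1', 'Price2': 'newcol2'},
       'D': {'Price1': 'newcol1', 'Price2': 'newcol2'},
       }

parts = []
for key, g in df.groupby('Type'):
    d = dic.get(key, {})  # a group without a mapping keeps only 'Type'
    # rename the source columns, then keep 'Type' plus the renamed ones
    parts.append(g.rename(columns=d)[['Type', *d.values()]])

# stack the per-group pieces back together; the original index is preserved
out = pd.concat(parts)
print(out)
```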

Answer 2

You can use a lambda function:

import numpy as np

df['newcol1'] = df.apply(lambda x: x.Price5 if x.Type == 'A' else (x.Price3 if x.Type == 'B' else (x.Price5 if x.Type == 'C' else (x.Price1 if x.Type == 'D' else np.nan))), axis=1)
df['newcol2'] = df.apply(lambda x: x.Price2 if x.Type == 'A' else (x.Price4 if x.Type == 'B' else (x.Price2 if x.Type == 'C' else (x.Price2 if x.Type == 'D' else np.nan))), axis=1)
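
A row-wise apply calls the lambda once per row, which can get slow on large frames. One vectorized alternative, offered here as a sketch rather than something from the answers above, is numpy.select: build one boolean condition per Type and pick from the matching source column. The names src1, src2 and pick are hypothetical helpers introduced for illustration, and the sample frame is rebuilt (Price1 to Price5 only) so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    'Type':   ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Price1': [nan, nan, nan, nan, nan, nan, 1, 9],
    'Price2': [1, 3, nan, nan, 2, 4, 8, 6],
    'Price3': [nan, nan, 4, 6, nan, nan, nan, nan],
    'Price4': [nan, nan, 5, 7, nan, nan, nan, nan],
    'Price5': [2, 2, nan, nan, 1, 3, nan, nan],
})

# Hypothetical lookup tables: Type -> source column for each new column
src1 = {'A': 'Price5', 'B': 'Price3', 'C': 'Price5', 'D': 'Price1'}
src2 = {'A': 'Price2', 'B': 'Price4', 'C': 'Price2', 'D': 'Price2'}

def pick(frame, mapping):
    """Return, per row, the value of the column mapped to that row's Type."""
    conditions = [frame['Type'].eq(t) for t in mapping]
    choices = [frame[col] for col in mapping.values()]
    # rows whose Type is not in the mapping get NaN
    return np.select(conditions, choices, default=np.nan)

out = df[['Type']].copy()
out['newcol1'] = pick(df, src1)
out['newcol2'] = pick(df, src2)
print(out)
```

Keeping the mapping in dictionaries means adding a new Type is a one-line change instead of another nested conditional.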
