Apply a function to the columns of a data frame whose headers contain a specific string

Question (votes: 1, answers: 1)

I have a data frame named passenger_details that looks like this:

Passenger     Age  Gender   Commute_to_work    Commute_mode    Commute_time ...
Passenger1    32   Male      I drive to work      car              1 hour
Passenger2    26   Female    I take the metro     train            NaN    ...
Passenger3    33   Female      NaN                 NaN             30 mins      ...
Passenger4    29   Female    I take the metro     train            NaN     ...
...

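I want to apply an if-style function to the columns whose headers contain the string 'Commute', converting missing values (NaN) to 0 and present values to 1.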

This is essentially what I am trying to achieve:

Passenger     Age  Gender   Commute_to_work    Commute_mode    Commute_time ...
Passenger1    32   Male         1                 1              1
Passenger2    26   Female       1                 1              0    ...
Passenger3    33   Female       0                 0              1      ...
Passenger4    29   Female       1                 1              0     ...
...

However, I am struggling to work out how to write the code. This is what I tried:

passenger_details = passenger_details.filter(regex = 'Location_', axis = 1).apply(lambda value: str(value).replace('value', '1', 'NaN','0'))

But I get a TypeError:

'replace() takes at most 3 arguments (4 given)'

Any help would be appreciated.

python pandas filter apply data-cleaning
1 answer (votes: 1)

Select the columns with Index.str.contains, test for non-missing values with DataFrame.notna, and finally cast to integers so that True/False map to 1/0:

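# df stands for passenger_details; c is a boolean mask of the 'Commute' columns
c = df.columns.str.contains('Commute')
# notna() is True where a value is present; astype(int) turns True/False into 1/0
df.loc[:, c] = df.loc[:, c].notna().astype(int)
print(df)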
    Passenger  Age  Gender  Commute_to_work  Commute_mode  Commute_time
0  Passenger1   32    Male                1             1             1
1  Passenger2   26  Female                1             1             0
2  Passenger3   33  Female                0             0             1
3  Passenger4   29  Female                1             1             0
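
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt only from the four rows shown in the question (the elided "..." rows and columns are left out), and the name commute_cols is just an illustrative choice, not something from the original post.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the rows shown in the question;
# the elided "..." rows and columns are intentionally omitted.
passenger_details = pd.DataFrame({
    "Passenger": ["Passenger1", "Passenger2", "Passenger3", "Passenger4"],
    "Age": [32, 26, 33, 29],
    "Gender": ["Male", "Female", "Female", "Female"],
    "Commute_to_work": ["I drive to work", "I take the metro", np.nan, "I take the metro"],
    "Commute_mode": ["car", "train", np.nan, "train"],
    "Commute_time": ["1 hour", np.nan, "30 mins", np.nan],
})

# Names of the columns whose header contains 'Commute' (illustrative variable name)
mask = passenger_details.columns.str.contains("Commute")
commute_cols = passenger_details.columns[mask].tolist()

# notna() yields True/False per cell; astype(int) maps that to 1/0.
passenger_details[commute_cols] = passenger_details[commute_cols].notna().astype(int)

print(passenger_details)
```

As for the TypeError in the question: Python's str.replace accepts only (old, new[, count]), so passing four arguments fails before any replacement happens. In addition, DataFrame.apply hands each column to the lambda as a whole Series, so str(value) would stringify the entire column rather than individual cells.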