I have a DataFrame named passenger_details, shown below:
Passenger Age Gender Commute_to_work Commute_mode Commute_time ...
Passenger1 32 Male I drive to work car 1 hour
Passenger2 26 Female I take the metro train NaN ...
Passenger3 33 Female NaN NaN 30 mins ...
Passenger4 29 Female I take the metro train NaN ...
...
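I want to apply an if-style function to the columns whose header contains the string 'Commute': missing values (NaN) should become 0, and any value that is present should become 1.
This is basically what I want to achieve: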
Passenger Age Gender Commute_to_work Commute_mode Commute_time ...
Passenger1 32 Male 1 1 1
Passenger2 26 Female 1 1 0 ...
Passenger3 33 Female 0 0 1 ...
Passenger4 29 Female 1 1 0 ...
...
However, I am struggling with how to write the code. This is what I have tried:
passenger_details = passenger_details.filter(regex = 'Location_', axis = 1).apply(lambda value: str(value).replace('value', '1', 'NaN','0'))
But I get a TypeError:
'replace() takes at most 3 arguments (4 given)'
Any help would be appreciated.
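Select the columns with Index.str.contains, test for non-missing values with DataFrame.notna, and finally convert True/False to 1/0 by casting to integers (df here stands for the passenger_details DataFrame):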
c = df.columns.str.contains('Commute')
df.loc[:, c] = df.loc[:, c].notna().astype(int)
print (df)
Passenger Age Gender Commute_to_work Commute_mode Commute_time
0 Passenger1 32 Male 1 1 1
1 Passenger2 26 Female 1 1 0
2 Passenger3 33 Female 0 0 1
3 Passenger4 29 Female 1 1 0
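
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the four rows shown in the question. The way each row's text is split across the three Commute columns is an assumption, the elided columns are left out, and the name commute_cols is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the split of the free text
# across the three Commute columns is an assumption.
df = pd.DataFrame({
    "Passenger": ["Passenger1", "Passenger2", "Passenger3", "Passenger4"],
    "Age": [32, 26, 33, 29],
    "Gender": ["Male", "Female", "Female", "Female"],
    "Commute_to_work": ["I drive to work", "I take the metro", np.nan, "I take the metro"],
    "Commute_mode": ["car", "train", np.nan, "train"],
    "Commute_time": ["1 hour", np.nan, "30 mins", np.nan],
})

# Labels of the columns whose header contains 'Commute'
commute_cols = df.columns[df.columns.str.contains("Commute")]

# notna() yields True for present values and False for NaN;
# astype(int) maps True/False to 1/0.
df[commute_cols] = df[commute_cols].notna().astype(int)

print(df)
```

As for the error in the question: Python's str.replace accepts only an old substring, a new substring, and an optional count, so passing four arguments raises the TypeError. In addition, apply on a DataFrame passes a whole column to the lambda, so str(value) would turn the entire column into one string rather than handling each cell.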