我有如下数据仓库。
ID COUNTRY GENDER AGE V1 V2 V3 V4 V5
1 1 1 53 APPLE apple bosck APPLE123 xApple111t
2 2 2 51 BEKO beko SIMSUNG SamsungO123 ttBeko111t
3 3 1 24 SAMSUNG bosch SEMSUNG BOSC1123 uuSAMSUNG111t
如果列表中有相同值或包含特定值,我想替换为np.nan。我在下面尝试过,但是发生了错误。
remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']
df.iloc[:,4:].str.replace(remove_list, np.nan, case=False) # exact match & case sensitive
df.iloc[:,4:].str.contains(remove_contain_list, np.nan, case=False) # contain & case sensitive
我该如何解决这些问题?
用途:
remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']
s = df.iloc[:,4:].stack()
m1 = s.str.lower().isin([x.lower() for x in remove_list])
m2 = s.str.contains('|'.join(remove_contain_list), case=False)
s = s.mask(m1 | m2)
df.iloc[:,4:] = s.unstack()
print (df)
ID COUNTRY GENDER AGE V1 V2 V3 V4 V5
0 1 1 1 53 NaN NaN NaN APPLE123 xApple111t
1 2 2 2 51 NaN NaN NaN NaN ttBeko111t
2 3 3 1 24 NaN NaN NaN NaN NaN