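I'm new to Python and trying to create a boolean mask to subset a dataset. I'd appreciate any guidance on how to get this mask working and on what I'm doing wrong. Thanks!
My df looks like this: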
import pandas as pd
df= pd.DataFrame({'A': ['A-dog', "B-dog","C-cat" , "E-snake", "F-hamser"],
'B': ['F-dog', "B-parrot","C-snake" , "E-cat", "F-bird"],
'C': [1, 2, 3, 4, 5],
'D': [22,23,24,25,26],
'E': ['A-snake', "B-dog","C-snake" , "E-snake", "F-snake"],
'Flag': [0,0,0,0,0]})
df
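I want to evaluate columns A, B and E: in any cell that ends with "dog" or "cat", replace that ending with "", and set the "Flag" column to 1 on every row where I made a replacement.
I'd like to build a boolean mask so that I can both replace the strings and set "Flag" to 1, but my mask doesn't work.
Here is what I tried: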
cols=['A','B','E']
mask=df[cols].apply(lambda x: 'dog' or 'cat' in x[-3:])
# x[-3:] to select the last three characters of the string.
# If the mask were working, I would change the flag variable in this way
df.loc(mask.any(axis=1),'Flag')=1
The resulting df I'm after would look like this:
res= pd.DataFrame({'A': ['A-', "B-","C-" , "E-snake", "F-hamser"],
'B': ['F-', "B-parrot","C-snake" , "E-", "F-bird"],
'C': [1, 2, 3, 4, 5],
'D': [22,23,24,25,26],
'E': ['A-snake', "B-","C-snake" , "E-snake", "F-snake"],
'Flag': [1,1,1,1,0]})
res
You can use .str.endswith to create the mask (this method also accepts a tuple of values):
cols = ["A", "B", "E"]
mask = df[cols].apply(lambda x: x.str.endswith(("dog", "cat")))
df["Flag"] = mask.any(axis=1).astype(int)
df[mask] = df[mask].apply(lambda x: x.str[:-3] if x.notna().any() else x)
print(df)
Prints:
          A         B  C   D        E  Flag
0        A-        F-  1  22  A-snake     1
1        B-  B-parrot  2  23       B-     1
2        C-   C-snake  3  24  C-snake     1
3   E-snake        E-  4  25  E-snake     1
4  F-hamser    F-bird  5  26  F-snake     0