I have data in the following tabular form:
Name  Mas      Sce
M     ( (87)   83
      (91)
      (97) )
T     (77)     76
R     (60)     32
G     (95)     20
M     ( (50)   89
      (50)
      (99) )
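Some of my records span multiple rows, for example the M case. The values are enclosed in parentheses, and an outer pair of parentheses wraps the rows that belong together.
I tried dropping duplicates. That works when each record is a single row. Now, however, several rows form one group.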
import pandas as pd
# creating a DataFrame
# named `data` rather than `dict` so the built-in is not shadowed
data = {'Name': ['M', None, None, 'T', 'R', 'G', 'M', '', ''],
        'Mas': ['( (87)', '(91)', '(97) )', '(77)', '(60)', '(95)', '( (50)', '(50)', '(99) )'],
        'Sce': ['83', '', '', '76', '32', '20', '89', '', '']}
df = pd.DataFrame(data)
df['Name'] = df['Name'].ffill()
print(df)
df.drop_duplicates(subset='Name',keep='first',inplace=True)
# displaying the DataFrame
print(df)
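Stepping through the attempt above on the same sample data suggests why it falls short. This is only a small sketch; the names `filled` and `kept` are introduced here for illustration. `ffill()` fills the `None` entries but leaves the empty strings alone, and `drop_duplicates` then keeps just one row per distinct name, so the continuation rows of the first M are lost while one blank-named row of the second M survives.

```python
import pandas as pd

# Same sample data as above: the first M uses None for its continuation rows,
# the second M uses empty strings.
df = pd.DataFrame({
    "Name": ["M", None, None, "T", "R", "G", "M", "", ""],
    "Mas": ["( (87)", "(91)", "(97) )", "(77)", "(60)", "(95)", "( (50)", "(50)", "(99) )"],
    "Sce": ["83", "", "", "76", "32", "20", "89", "", ""],
})

# ffill() treats None as missing, but "" is an ordinary value and is not filled.
filled = df["Name"].ffill()
print(filled.tolist())  # ['M', 'M', 'M', 'T', 'R', 'G', 'M', '', '']

# One row per distinct name is kept: 'M', 'T', 'R', 'G' and the first ''.
kept = df.assign(Name=filled).drop_duplicates(subset="Name", keep="first")
print(kept.index.tolist())  # [0, 3, 4, 5, 7]
```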
I want to remove the records that appear more than once, in this case the second M, so that the result looks like this:
Name  Mas      Sce
M     ( (87)   83
      (91)
      (97) )
T     (77)     76
R     (60)     32
G     (95)     20
Try:
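import numpy as np

# assumes `df` as first constructed in the question (before ffill / drop_duplicates)
# make the `Name` column consistent -> change "" and None to NaN
df["Name"] = np.where(df["Name"].isin(["", None]), np.nan, df["Name"])

# create a mask of what to keep and what to discard:
# a named row is flagged when its name has been seen before, and the unnamed
# continuation rows inherit the flag of the row above them via ffill()
mask = ~(
    pd.Series(
        np.where(df["Name"].notna(), df["Name"].duplicated(keep="first"), np.nan),
        index=df.index,
    )
    .ffill()
    .astype(bool)
)

# print final df
print(df[mask])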
Prints:
  Name     Mas Sce
0    M  ( (87)  83
1  NaN    (91)
2  NaN  (97) )
3    T    (77)  76
4    R    (60)  32
5    G    (95)  20
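An alternative way to express the same idea, offered only as a sketch: number the records explicitly and keep every row whose record is the first one carrying its name. It assumes, as in the sample data, that each record starts on a row with a non-blank `Name` and that its continuation rows have `None` or `""` there. The names `name`, `starts`, `record_id`, `first_ids` and `result` are illustrative and not part of the code above.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Name": ["M", None, None, "T", "R", "G", "M", "", ""],
    "Mas": ["( (87)", "(91)", "(97) )", "(77)", "(60)", "(95)", "( (50)", "(50)", "(99) )"],
    "Sce": ["83", "", "", "76", "32", "20", "89", "", ""],
})

# Treat "" like a missing value, without modifying df itself.
name = df["Name"].mask(df["Name"] == "")

# True on the first row of every record (the row that carries a name).
starts = name.notna()

# Running record number; continuation rows share the number of their first row.
record_id = starts.cumsum()

# Record numbers whose name has not appeared in an earlier record.
first_ids = record_id[starts & ~name.duplicated(keep="first")]

# Keep all rows (first and continuation) of those records.
result = df[record_id.isin(first_ids)]
print(result)
```

Because `df` is left untouched, the `Name` column of `result` keeps its original `None` entries instead of being rewritten to `NaN`; whether that matters depends on what happens to the frame afterwards.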