Grouping data and removing duplicate groups

Question

I have data in the following tabular form:

Name     Mas       Sce
   M      ( (87)    83
            (91)    
          (97) )    
   T        (77)    76
   R        (60)    32
   G        (95)    20
   M     ( (50)     89
            (50)    
          (99) )    

Some of my records span several rows, for example the M case. Such multi-row data is enclosed in parentheses.

I tried drop_duplicates. It works when each record is a single row. Now, however, several rows belong together as one group:

import pandas as pd
 
d = {'Name': ['M', None, None, 'T', 'R', 'G', 'M', '', ''],
     'Mas': ['( (87)', '(91)', '(97) )', '(77)', '(60)', '(95)', '( (50)', '(50)', '(99) )'],
     'Sce': ['83', '', '', '76', '32', '20', '89', '', '']}
df = pd.DataFrame(d)
df['Name'] = df['Name'].ffill()
print(df)

df.drop_duplicates(subset='Name', keep='first', inplace=True)
print(df)
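
To make the problem concrete, here is a minimal sketch of which rows the attempt above keeps, using the same `d`. The names `names` and `kept` are illustrative and not part of the original code.

```python
import pandas as pd

d = {'Name': ['M', None, None, 'T', 'R', 'G', 'M', '', ''],
     'Mas': ['( (87)', '(91)', '(97) )', '(77)', '(60)', '(95)', '( (50)', '(50)', '(99) )'],
     'Sce': ['83', '', '', '76', '32', '20', '89', '', '']}
df = pd.DataFrame(d)

# ffill() fills the None entries but leaves the empty strings untouched
names = df['Name'].ffill()

# Same selection that drop_duplicates(subset='Name', keep='first') would make
kept = df.loc[~names.duplicated(keep='first')]
print(kept.index.tolist())  # [0, 3, 4, 5, 7]
```

The continuation rows of the first M (rows 1 and 2) are dropped as duplicates, while row 7 of the second M survives because its Name is an empty string rather than a missing value.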

I want to remove the groups that occur more than once. In this case that is the second M, so the result should be:

Name     Mas       Sce
   M      ( (87)    83
            (91)    
          (97) )    
   T        (77)    76
   R        (60)    32
   G        (95)    20
python pandas dataframe
1 Answer

Try:

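import numpy as np
import pandas as pd

# note: `df` here is the frame as first built from `d`,
# before the ffill() and drop_duplicates() calls in the question

# make the `Name` column consistent -> change "" and None to NaN
df["Name"] = np.where(df["Name"].isin(["", None]), np.nan, df["Name"])

# create a mask of what to keep and what to discard:
# named rows get their duplicated() flag, continuation rows get NaN,
# and ffill() then copies each flag down to the rest of its group
mask = ~(
    pd.Series(
        np.where(df["Name"].notna(), df["Name"].duplicated(keep="first"), np.nan),
        index=df.index,
    )
    .ffill()
    .astype(bool)
)

# print final df
print(df[mask])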

Prints:

  Name     Mas Sce
0    M  ( (87)  83
1  NaN    (91)    
2  NaN  (97) )    
3    T    (77)  76
4    R    (60)  32
5    G    (95)  20
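
An alternative way to express the same idea is to give every row an explicit group id and keep only the first group seen for each name. This is a sketch, not part of the original answer. It assumes that a non-empty `Name` always marks the start of a new group and that the first row has a name. The variables `starts`, `group_id`, `group_name`, `first_group` and `result` are illustrative.

```python
import pandas as pd

d = {'Name': ['M', None, None, 'T', 'R', 'G', 'M', '', ''],
     'Mas': ['( (87)', '(91)', '(97) )', '(77)', '(60)', '(95)', '( (50)', '(50)', '(99) )'],
     'Sce': ['83', '', '', '76', '32', '20', '89', '', '']}
df = pd.DataFrame(d)

# A row starts a new group when its Name is neither missing nor an empty string
starts = df['Name'].notna() & (df['Name'] != '')

# Running count of group starts: every row of a group shares one id
group_id = starts.cumsum()

# Name of each group, copied down to its continuation rows
group_name = df['Name'].where(starts).ffill()

# For every name, find the id of the first group carrying that name
first_group = group_id.groupby(group_name).transform('min')

# Keep only the rows that belong to that first group
result = df[group_id == first_group]
print(result)
```

Unlike the mask above, this leaves the `Name` column as it was, so continuation rows keep their original `None` or empty-string values.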