Python pandas: group data and remove duplicates

Question  Votes: 0  Answers: 1

I have data in the form of the table below:



Name     Mas       Sce
   M     ( (87)    83
           (91)
           (97) )
   T       (77)    76
   R       (60)    32
   G       (95)    20
   M     ( (50)    89
           (50)
           (99) )

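Some of my records span multiple rows, as in the M case. The values belonging to such a record are enclosed in an outer pair of parentheses.

I tried drop_duplicates. It works when every record is a single row. However, now several rows together form one group.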




import pandas as pd

# creating a DataFrame
data = {'Name': ['M', None, None, 'T', 'R', 'G', 'M', '', ''],
        'Mas': ['( (87)', '(91)', '(97) )', '(77)', '(60)', '(95)', '( (50)', '(50)', '(99) )'],
        'Sce': ['83', '', '', '76', '32', '20', '89', '', '']}
df = pd.DataFrame(data)
df['Name'] = df['Name'].ffill()
print(df)

df.drop_duplicates(subset='Name', keep='first', inplace=True)

# displaying the DataFrame
print(df)
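For reference, here is a small sketch of what this attempt does on the sample data. It rebuilds the same frame and uses non-inplace calls so the intermediate result can be inspected; the variable names `filled` and `attempt` are only illustrative.

```python
import pandas as pd

data = {
    "Name": ["M", None, None, "T", "R", "G", "M", "", ""],
    "Mas": ["( (87)", "(91)", "(97) )", "(77)", "(60)", "(95)", "( (50)", "(50)", "(99) )"],
    "Sce": ["83", "", "", "76", "32", "20", "89", "", ""],
}
df = pd.DataFrame(data)

# Same steps as the attempt: forward-fill Name, then drop duplicate names.
filled = df.assign(Name=df["Name"].ffill())
attempt = filled.drop_duplicates(subset="Name", keep="first")

# ffill only fills None/NaN, so the "" cells of the second M stay empty,
# and drop_duplicates keeps just one row per distinct Name value.
print(attempt.index.tolist())  # [0, 3, 4, 5, 7]
```

So the continuation rows of the first M (rows 1 and 2) are dropped, while one continuation row of the second M (row 7) survives, which is not the desired result.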

I want to remove the records that occur more than once, in this case the second M. The desired result:



Name     Mas       Sce
   M     ( (87)    83
           (91)
           (97) )
   T       (77)    76
   R       (60)    32
   G       (95)    20



python pandas dataframe
1 answer
0 votes

Try:

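import numpy as np
import pandas as pd

# `df` is the DataFrame built in the question, without the
# ffill / drop_duplicates steps applied

# make the `Name` column consistent -> change "" and None to NaN
df["Name"] = np.where(df["Name"].isin(["", None]), np.nan, df["Name"])

# create a mask of what to keep and what to discard:
# rows that start a record get their duplicated() flag, continuation rows
# get NaN and inherit the flag of the record start above them via ffill
mask = ~(
    pd.Series(
        np.where(df["Name"].notna(), df["Name"].duplicated(keep="first"), np.nan),
        index=df.index,
    )
    .ffill()
    .astype(bool)
)

# print final df
print(df[mask])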

Prints:

  Name     Mas Sce
0    M  ( (87)  83
1  NaN    (91)    
2  NaN  (97) )    
3    T    (77)  76
4    R    (60)  32
5    G    (95)  20
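
An alternative way to express the same idea, given here only as a sketch: give every multi-row record a group id and drop whole groups whose name has already been seen. It assumes that the first row of each record carries a non-empty Name and that continuation rows have "" or None there. The helper name `drop_repeated_records` and its `key` parameter are hypothetical, not part of pandas.

```python
import pandas as pd


def drop_repeated_records(df: pd.DataFrame, key: str = "Name") -> pd.DataFrame:
    """Keep only the first multi-row record for each value of `key`."""
    # A row starts a new record when its key is neither missing nor "".
    starts = df[key].notna() & df[key].ne("")
    # Running count of record starts: all rows of one record share an id.
    group_id = starts.cumsum()
    # Key values on the starting rows only, indexed by original row label.
    first_rows = df.loc[starts, key]
    # Row labels of record starts whose key was already seen earlier.
    repeated = first_rows[first_rows.duplicated(keep="first")].index
    # Group ids of those repeated records; drop every row that carries one.
    drop_ids = group_id.loc[repeated]
    return df[~group_id.isin(drop_ids)]


data = {
    "Name": ["M", None, None, "T", "R", "G", "M", "", ""],
    "Mas": ["( (87)", "(91)", "(97) )", "(77)", "(60)", "(95)", "( (50)", "(50)", "(99) )"],
    "Sce": ["83", "", "", "76", "32", "20", "89", "", ""],
}
result = drop_repeated_records(pd.DataFrame(data))
print(result)
```

Like the mask above, this compares records by Name only, so a later M is removed even though its Mas values differ from the first one, which matches the desired output in the question.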