Dropping duplicate columns

Question   Votes: 0   Answers: 2

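If I use the code below, it keeps the columns that contain NaN (see the attached image). I have other columns like this one. Is it possible to keep the second occurrence of each duplicated column instead of the first?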

data_final2 = data_final.loc[:, ~data_final.columns.duplicated()]


python pandas
2 Answers

2 votes

Approach 1: drop columns that contain NaN

If you only need to fix this particular case, and you know the columns you want to keep contain no NaNs:

data_final2 = data_final.dropna(axis=1)
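
Note that dropna(axis=1) with the default how="any" drops every column that has at least one NaN. As a minimal sketch, assuming the unwanted duplicates are entirely NaN (the frame below is hypothetical, modelled on the column names in this answer), how="all" is a more conservative variant that removes only the all-NaN columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: each label appears twice,
# and the first copy of each is entirely NaN.
data_final = pd.DataFrame(
    [[np.nan, "A", np.nan, "1x1"],
     [np.nan, "B", np.nan, "2x2"]],
    columns=["Site", "Site", "Dimensions", "Dimensions"],
)

# how="all" drops a column only when every value in it is NaN,
# so a wanted column with an occasional missing value survives.
data_final2 = data_final.dropna(axis=1, how="all")
print(data_final2)
```

If a wanted column could itself be completely empty, this would drop it too, so the approach only fits when that cannot happen.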

Approach 2: overwrite column labels with unique names, then pick desired cols

data_final.columns = ['Site_nan', 'Site', 'Dimensions_nan', 'Dimensions']
data_final2 = data_final[['Site', 'Dimensions']].copy()
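
If renaming by hand is not practical, a related sketch is to reuse the mask from the question with keep="last", which is a documented option of Index.duplicated. This assumes the column you want is always the last occurrence of each label; the sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first copy of each label is NaN, the second holds values.
data_final = pd.DataFrame(
    [[np.nan, "A", np.nan, "1x1"],
     [np.nan, "B", np.nan, "2x2"]],
    columns=["Site", "Site", "Dimensions", "Dimensions"],
)

# keep="last" flags every occurrence except the last one as a duplicate,
# so negating the mask selects the last occurrence of each label.
mask = ~data_final.columns.duplicated(keep="last")
data_final2 = data_final.loc[:, mask]
print(data_final2)
```

This selects by position only and does not look at the values, so it fits only when the NaN copy reliably comes first.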

0 votes

Group by the column labels and take the first value in each group; first skips nulls.

df.groupby(df.columns, axis=1).first()

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'0': [1, 2, 3], '1': [np.nan]*3, '2': [np.nan]*3, '3': ['1x1', '2x2', '3x3']})
df.columns = ['Size', 'Size', 'Dims', 'Dims']

#   Size  Size Dims Dims
#0     1   NaN  NaN  1x1
#1     2   NaN  NaN  2x2
#2     3   NaN  NaN  3x3


df.groupby(df.columns, axis=1).first()

#  Dims Size
#0  1x1    1
#1  2x2    2
#2  3x3    3
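
Newer pandas releases deprecate grouping along axis=1 (as far as I know, starting with 2.1). A sketch of an equivalent that avoids the axis argument is to transpose, group on the index, and transpose back. It reuses the example frame above; the variable name result is just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'0': [1, 2, 3], '1': [np.nan]*3, '2': [np.nan]*3, '3': ['1x1', '2x2', '3x3']})
df.columns = ['Size', 'Size', 'Dims', 'Dims']

# After transposing, the duplicated labels form the index.
# Group on that index, take the first non-null value per group,
# then transpose back to the original orientation.
result = df.T.groupby(level=0).first().T
print(result)
```

Transposing a frame with mixed dtypes yields object columns, so you may need to restore dtypes afterwards, for example with result.infer_objects().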