Counting insignificant rows in pandas

Question  Votes: 2  Answers: 2

I am trying to count how many insignificant rows my dataset contains. A row is insignificant if fewer than 50% of its columns are filled.

count_insignificant_rows=0
for i in range(len(df)):
    columns_empty=0
    for column in df.columns:
        if df[column][i] is np.nan:
            columns_empty=columns_empty+1
            print(columns_empty)
    if columns_empty>=len(df.columns)/2:
        count_insignificant_rows=count_insignificant_rows+1

However, it keeps raising a KeyError: 331

What should I do?

python pandas for-loop
2 Answers
1
vote

A simpler approach is to drop the rows that have too many null values and count what was removed:

# First, create a sample df
import pandas as pd

df = pd.DataFrame.from_records(
    [{'id':1,'A':1,'B':1,'C':1,'D':1},
     {'id':2,'A':None,'B':2,'C':2,'D':2},
     {'id':3,'A':None,'B':None, 'C':3,'D':3},
     {'id':4,'A':None,'B':None, 'C':None,'D':4},
     {'id':5,'A':None,'B':None, 'C':None,'D':None}
     ], index='id')

# ----
# Next, drop the rows that have too few non-null values.
# (If your missing values are encoded as strings, zeros, or infs, convert them to nulls first with `.replace()`.)

# thresh --> minimum number of non-null values a row needs in order to be kept
thresh = -(-len(df.columns) // 2)  # half the columns, rounded up
sig_rows = len(df.dropna(axis=0, thresh=thresh))
print(f'There are {len(df)-sig_rows} insignificant rows.')
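
As for the KeyError itself: a likely cause (an assumption, since the question does not show how `df` was built) is that `df[column][i]` looks `i` up as an index label, not as a position. If the index is not the default 0..n-1, for example after filtering rows, a label such as 331 may simply not exist. The sketch below reproduces that on a made-up frame and switches to positional access with `iloc`. It also uses `pd.isna` in place of `is np.nan`, because the identity test misses `None` and NaN values that are not the `np.nan` object itself.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1, as happens after filtering.
df = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan], "B": [1.0, 2.0, np.nan]},
    index=[10, 20, 30],
)

try:
    df["A"][0]  # label lookup: there is no row labelled 0
    raised = False
except KeyError:
    raised = True

# Positional access works regardless of the index labels.
count_insignificant_rows = 0
for i in range(len(df)):
    columns_filled = sum(not pd.isna(df.iloc[i, j]) for j in range(df.shape[1]))
    if columns_filled < df.shape[1] / 2:  # fewer than 50% of the columns filled
        count_insignificant_rows += 1

print(raised, count_insignificant_rows)
```

Calling `df.reset_index(drop=True)` before the original loop would also avoid the missing-label problem, at the cost of discarding the existing index.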

1
vote

First, compute the fraction of non-missing values in each row.

n_cols = df.shape[1]  # capture the column count before the helper column is added
df["insignificant"] = df.count(axis=1) / n_cols

Then count how many rows are insignificant.

df[df["insignificant"] < 0.5].shape[0]
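
The same count can also be obtained without a helper column. A minimal sketch on a made-up four-column frame (the data and the names `filled_fraction` and `n_insignificant` are illustrative only): `notna()` returns a boolean mask, and its row-wise mean is the share of filled columns in each row.

```python
import pandas as pd

# Illustrative data: rows have 4, 3, 2, 1 and 0 filled columns respectively.
df = pd.DataFrame(
    {
        "A": [1, None, None, None, None],
        "B": [1, 2, None, None, None],
        "C": [1, 2, 3, None, None],
        "D": [1, 2, 3, 4, None],
    }
)

# Share of non-missing values per row, then count the rows below 50%.
filled_fraction = df.notna().mean(axis=1)
n_insignificant = int((filled_fraction < 0.5).sum())
print(n_insignificant)
```

Because nothing is written back to `df`, the column count used in the fraction cannot be skewed by an extra column.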