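I'm trying to count how many insignificant rows there are in my dataset. An insignificant row is one in which fewer than 50% of the columns are filled.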
count_insignificant_rows=0
for i in range(len(df)):
    columns_empty=0
    for column in df.columns:
        if df[column][i] is np.nan:
            columns_empty=columns_empty+1
    print(columns_empty)
    if columns_empty>=len(df.columns)/2:
        count_insignificant_rows=count_insignificant_rows+1
However, it keeps giving me a KeyError: 331
What should I do?
A simpler approach is to drop the rows that have too many null values and count how many were removed:
import pandas as pd

# First, create a sample df
df = pd.DataFrame.from_records(
    [{'id':1,'A':1,'B':1,'C':1,'D':1},
     {'id':2,'A':None,'B':2,'C':2,'D':2},
     {'id':3,'A':None,'B':None,'C':3,'D':3},
     {'id':4,'A':None,'B':None,'C':None,'D':4},
     {'id':5,'A':None,'B':None,'C':None,'D':None}
    ], index='id')
# ----
# Next, drop rows with too few non-null values
# (If your null values are strings, zeros, or infs, you can replace them with null values using `.replace()`)
# thresh --> keep a row only if it has at least this many non-null values
# (rounded up, so rows with fewer than 50% of the columns filled are dropped)
thresh = (len(df.columns) + 1) // 2
sig_rows = len(df.dropna(axis=0, thresh=thresh))
print(f'There are {len(df)-sig_rows} insignificant rows.')
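For reference, `thresh` in `dropna` is the minimum number of non-null values a row must have to be kept, so the rounding direction matters when the number of columns is odd. Here is a minimal sketch on a hypothetical five-column frame (the names `demo`, `min_filled`, `kept` and `insignificant_count` are made up for illustration):

```python
import math
import pandas as pd

# Hypothetical frame with five columns; each row has a different number of filled cells.
demo = pd.DataFrame(
    [
        [1, 1, 1, 1, 1],                  # 5 of 5 filled
        [1, 1, 1, None, None],            # 3 of 5 filled (60%)
        [1, 1, None, None, None],         # 2 of 5 filled (40%)
        [None, None, None, None, None],   # 0 of 5 filled
    ],
    columns=list("ABCDE"),
)

# A row counts as significant when at least half of its columns are filled,
# so the threshold is rounded up rather than down.
min_filled = math.ceil(len(demo.columns) / 2)

# dropna keeps only the rows with at least `min_filled` non-null values.
kept = demo.dropna(axis=0, thresh=min_filled)
insignificant_count = len(demo) - len(kept)
print(insignificant_count)
```

With floor division the threshold here would be 2, and the 40% row would wrongly be kept as significant.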
First, compute the fraction of non-missing values in each row.
# Compute the fraction in one step so the new column is not included in the column count
df["insignificant"] = df.count(axis=1) / df.shape[1]
Then count how many rows are insignificant.
df[df["insignificant"] < 0.5].shape[0]
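As for the KeyError in the question: `df[column][i]` looks up the index label `i`, not the i-th position, so it fails whenever the index is not a clean 0..n-1 range, for example after rows have been filtered out. That is a likely cause here, although the question does not show the index. Separately, `is np.nan` is an identity check and can miss missing values; `isna()` / `notna()` are the safer tests. Below is a minimal sketch on a hypothetical frame with a gap in its index (`df_gap` and the other names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the index skips label 1, as it might after filtering rows.
df_gap = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan],
        "B": [2.0, np.nan, 5.0],
        "C": [3.0, np.nan, np.nan],
    },
    index=[0, 2, 3],
)

# Label-based lookup fails for a label that is not in the index.
try:
    df_gap["A"][1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Vectorized alternative: fraction of filled columns per row, with no explicit loop.
filled_fraction = df_gap.notna().mean(axis=1)
count_insignificant_rows = int((filled_fraction < 0.5).sum())
print(lookup_failed, count_insignificant_rows)
```

If a loop is still preferred, positional access such as `df.iloc[i]` avoids the dependence on index labels.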