Count of values per row that do not meet a condition

Question (votes: 0, answers: 3)

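I want to compare the values in the columns 1/01, 1/02, 1/03, 1/04, 1/05 and 1/06 with the value in the Target column, using the condition in the Criteria column. For each ID, I want to get the count of all values that do not meet the criteria and store it in the Sum column.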

# importing pandas as pd 
import pandas as pd 

# Create sample dataframe
raw_data = {'ID': ['A1', 'B1', 'C1', 'D1'],
            'Domain': ['Finance', 'IT', 'IT', 'Finance'],
            'Target': [1, 2, 3, 1],
            'Criteria': ['<=', '<=', '>=', '>='],
            "1/01": [0.9, 1.1, 2.1, 1],
            "1/02": [0.4, 0.3, 0.5, 0.9],
            "1/03": [1, 1, 4, 1.1],
            "1/04": [0.7, 0.7, 0.1, 1],
            "1/05": [0.7, 0.7, 0.1, 1],
            "1/06": [0.9, 1.1, 2.1, 1]}

# 'Sum' is not a key in raw_data, so it is created as an empty (NaN) column.
df = pd.DataFrame(raw_data, columns=['ID', 'Domain', 'Target', 'Criteria', '1/01',
                                     '1/02', '1/03', '1/04', '1/05', '1/06', 'Sum'])

Example of the expected output:

   ID   Domain  Target Criteria  1/01  1/02  1/03  1/04  1/05  1/06  Sum
0  A1  Finance       1       <=   0.9   0.4   1.0   0.7   0.7   0.9  0.0
1  B1       IT       2       <=   1.1   0.3   1.0   0.7   0.7   1.1  0.0
2  C1       IT       3       >=   2.1   0.5   4.0   0.1   0.1   2.1  5.0
3  D1  Finance       1       >=   1.0   0.9   1.1   1.0   1.0   1.0  1.0
python pandas dataframe
3 Answers
1 vote

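Just use np.where to selectively count the values in each row that violate the criterion. This is optimized for the case where <= and >= are the only possible criteria.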

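import numpy as np

# `.to_numpy()` works for pandas versions >= 0.24.
# For older versions, use `.values`.
dates = df.iloc[:, 4:10].to_numpy()    # the six date columns; excludes 'Sum'
target = df[['Target']].to_numpy()     # shape (n, 1), broadcasts across columns

# For '<=' rows a violation is value > target; otherwise it is value < target.
is_le = (df['Criteria'] == '<=').to_numpy()[:, None]
df['Sum'] = np.where(is_le, dates > target, dates < target).sum(axis=1)
df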

   ID   Domain  Target Criteria  1/01  1/02  1/03  1/04  1/05  1/06  Sum
0  A1  Finance       1       <=   0.9   0.4   1.0   0.7   0.7   0.9    0
1  B1       IT       2       <=   1.1   0.3   1.0   0.7   0.7   1.1    0
2  C1       IT       3       >=   2.1   0.5   4.0   0.1   0.1   2.1    5
3  D1  Finance       1       >=   1.0   0.9   1.1   1.0   1.0   1.0    1
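
The broadcasting that this answer relies on can be seen in isolation on a small array. The following is only a minimal sketch with made-up toy data (the names `values`, `is_le` and `counts` are illustrative and not part of the answer); it shows how a column-shaped boolean array lets np.where pick a different comparison for each row.

```python
import numpy as np

# Toy data, made up for illustration: 3 rows, 2 value columns.
values = np.array([[0.5, 1.5],
                   [2.0, 2.5],
                   [1.0, 2.0]])
target = np.array([[1], [3], [1]])        # shape (3, 1): one target per row
is_le = np.array([True, False, False])    # True where the criterion is '<='

# is_le[:, None] has shape (3, 1), so it broadcasts across the columns.
# For '<=' rows np.where keeps `values > target`, otherwise `values < target`.
violations = np.where(is_le[:, None], values > target, values < target)

# Summing the booleans along axis 1 gives the violation count per row.
counts = violations.sum(axis=1)
print(counts)
```

Without the `[:, None]`, the one-dimensional mask would have shape (3,), which does not line up with the rows of a (3, 2) array, so the extra axis is what ties each criterion to its own row.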

1 vote

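The idea is to use functions from the operator module to compare the rows selected by a boolean mask against their targets, then use sum to count the values that do not match and assign the result to a new column. This is repeated in a loop over all operators in the dictionary: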

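import operator

# Map each criterion to the operator that detects a violation of it.
ops = {'>=': operator.lt,
       '<=': operator.gt}

for k, v in ops.items():
    mask = df['Criteria'].eq(k).values       # rows that use criterion k
    df1 = df.iloc[mask, 4:10]                # the six date columns of those rows
    df.loc[mask, 'new'] = v(df1, df.loc[mask, 'Target'].values[:, None]).sum(axis=1)
print (df)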
   ID   Domain  Target Criteria  1/01  1/02  1/03  1/04  1/05  1/06  Sum  new
0  A1  Finance       1       <=   0.9   0.4   1.0   0.7   0.7   0.9  0.0  0.0
1  B1       IT       2       <=   1.1   0.3   1.0   0.7   0.7   1.1  0.0  0.0
2  C1       IT       3       >=   2.1   0.5   4.0   0.1   0.1   2.1  5.0  5.0
3  D1  Finance       1       >=   1.0   0.9   1.1   1.0   1.0   1.0  1.0  1.0
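
If other criteria can appear, the same dictionary idea extends naturally. The sketch below is one possible generalization, not part of the answer: the helper name `count_violations` and the extra dictionary entries beyond '<=' and '>=' are assumptions added for illustration.

```python
import operator
import pandas as pd

# Each criterion maps to the operator that detects a *violation* of it.
# Only '<=' and '>=' appear in the question; the other entries are assumptions.
VIOLATES = {
    '<=': operator.gt, '>=': operator.lt,
    '<': operator.ge, '>': operator.le,
    '==': operator.ne, '!=': operator.eq,
}

def count_violations(df, value_cols, target_col='Target', criteria_col='Criteria'):
    """Return a Series with, per row, the number of values violating that row's criterion."""
    counts = pd.Series(0, index=df.index, dtype='int64')
    for symbol, violates in VIOLATES.items():
        mask = df[criteria_col].eq(symbol).to_numpy()   # rows using this criterion
        if not mask.any():
            continue
        values = df.loc[mask, value_cols].to_numpy()
        target = df.loc[mask, target_col].to_numpy()[:, None]  # column vector
        counts.loc[mask] = violates(values, target).sum(axis=1)
    return counts

# Sample data from the question.
df = pd.DataFrame({
    'ID': ['A1', 'B1', 'C1', 'D1'],
    'Domain': ['Finance', 'IT', 'IT', 'Finance'],
    'Target': [1, 2, 3, 1],
    'Criteria': ['<=', '<=', '>=', '>='],
    '1/01': [0.9, 1.1, 2.1, 1], '1/02': [0.4, 0.3, 0.5, 0.9],
    '1/03': [1, 1, 4, 1.1], '1/04': [0.7, 0.7, 0.1, 1],
    '1/05': [0.7, 0.7, 0.1, 1], '1/06': [0.9, 1.1, 2.1, 1],
})

value_cols = [c for c in df.columns if '/' in c]   # the date columns
df['Sum'] = count_violations(df, value_cols)
print(df[['ID', 'Sum']])
```

Selecting the value columns by name rather than by position keeps the count unaffected by any result columns that are added to the frame later.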

0 votes
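# Build strings such as '<=1' from the Criteria and Target columns.
col1 = (df['Criteria'] + df['Target'].astype(str)).to_numpy()[:, None]
# Append them to each date value (e.g. '0.9<=1'), evaluate, and count the False results.
# Note: eval executes arbitrary code, so use it only on trusted data.
# In pandas >= 2.1, applymap has been renamed to DataFrame.map.
Sum = df.loc[:, df.columns.str.contains('/')].astype(str).add(col1).applymap(lambda x: not eval(x)).sum(1)
df.assign(Sum=Sum)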


   ID   Domain  Target Criteria  1/01  1/02  1/03  1/04  1/05  1/06  Sum
0  A1  Finance       1       <=   0.9   0.4   1.0   0.7   0.7   0.9    0
1  B1       IT       2       <=   1.1   0.3   1.0   0.7   0.7   1.1    0
2  C1       IT       3       >=   2.1   0.5   4.0   0.1   0.1   2.1    5
3  D1  Finance       1       >=   1.0   0.9   1.1   1.0   1.0   1.0    1
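
To make the string-building step of this approach easier to follow, here is a minimal pure-Python sketch for a single row (row C1 of the sample data). The variable names are illustrative only and are not taken from the answer.

```python
# One row of the sample data: criterion '>=' with target 3 (row C1).
criteria, target = '>=', 3
values = [2.1, 0.5, 4, 0.1, 0.1, 2.1]

# Build one expression string per value, e.g. '2.1>=3'.
expressions = [f"{v}{criteria}{target}" for v in values]

# eval() runs each string as Python code, so this is only reasonable when the
# criteria strings come from a trusted source.
violations = sum(not eval(expr) for expr in expressions)
print(expressions[0], violations)
```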