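I have a DataFrame and want to check, for each row, which of my conditions are true. If multiple conditions are true, I want np.select to return all of the matching choices. How can I do that?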
import pandas as pd
import numpy as np
df = pd.DataFrame({'cond1': [True, True, False, True],
                   'cond2': [False, False, True, True],
                   'cond3': [True, False, False, True],
                   'value': [1, 3, 3, 6]})
conditions = [df['cond1'] & (df['value']>4),
df['cond2'],
df['cond2'] & (df['value']>2),
df['cond3'] & df['cond2']]
choices = [ '1', '2', '3', '4']
df["class"] = np.select(conditions, choices, default=np.nan)
I get:
cond1 cond2 cond3 value class
0 True False True 1 nan
1 True False False 3 nan
2 False True False 3 2
3 True True True 6 1
But I want to get this:
cond1 cond2 cond3 value class
0 True False True 1 nan
1 True False False 3 nan
2 False True False 3 2 and 3
3 True True True 6 1 and 2 and 3 and 4
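np.select only returns the choice for the first condition that is true in each row, so it cannot report several matches at once. In this example, I wrote a check_conditions function that evaluates each condition separately for every row and collects the labels of the conditions that are met into a list. It then joins those labels with "and" to represent the combined conditions in the "class" column. If no condition is met, it assigns NaN.
Does this help?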
import pandas as pd
import numpy as np
# Creating the DataFrame
df = pd.DataFrame({'cond1': [True, True, False, True],
'cond2': [False, False, True, True],
'cond3': [True, False, False, True],
'value': [1, 3, 3, 6]})
# Define a function to check conditions for each row
def check_conditions(row):
    # The same four conditions as in the question, evaluated for a single row
    conditions = [
        (row['cond1'] and (row['value'] > 4)),
        row['cond2'],
        (row['cond2'] and (row['value'] > 2)),
        (row['cond3'] and row['cond2'])
    ]
    # Labels for the conditions, in the same order
    choices = ['1', '2', '3', '4']
    # Keep the label of every condition that is met for the current row
    conditions_met = [choice for condition, choice in zip(conditions, choices) if condition]
    # Join the labels of met conditions with ' and ', or return NaN if no condition is met
    return ' and '.join(conditions_met) if conditions_met else np.nan
# Apply the check_conditions function to each row of the DataFrame
df['class'] = df.apply(check_conditions, axis=1)
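df.apply with axis=1 builds a Series for every row, which can get slow on large DataFrames. A possible alternative, sketched here under the assumption that the question's vectorized condition list is kept as-is, stacks the conditions into a boolean matrix with np.column_stack (one column per condition) and then joins the matching labels row by row. The names mask and labels are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cond1': [True, True, False, True],
                   'cond2': [False, False, True, True],
                   'cond3': [True, False, False, True],
                   'value': [1, 3, 3, 6]})

# Same condition list as in the question, written with vectorized operators
conditions = [df['cond1'] & (df['value'] > 4),
              df['cond2'],
              df['cond2'] & (df['value'] > 2),
              df['cond3'] & df['cond2']]
choices = ['1', '2', '3', '4']

# Boolean matrix: one row per DataFrame row, one column per condition
mask = np.column_stack(conditions)
labels = np.array(choices)

# For each row, keep the labels whose condition is True and join them;
# rows where no condition matched get NaN
df['class'] = [' and '.join(labels[row]) if row.any() else np.nan
               for row in mask]
print(df)
```

The conditions are still written once, in the same form you would pass to np.select, so adding a new condition only means appending to both lists.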