I have two dataframes, as shown below.
df1:
import pandas as pd

data1 = {
    'Acc': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4],
    'indi_val': ['Val1', 'val2', 'Val_E', 'Val1_E', 'Val1', 'Val3', 'val2', 'val2_E', 'val22_E', 'val2_A', 'val2_V', 'Val_E', 'Val_A', 'Val', 'Val2', 'val7'],
    'Amt': [10, 20, 5, 5, 22, 38, 15, 25, 22, 23, 24, 56, 67, 45, 87, 88]
}
df1 = pd.DataFrame(data1)
df2:
data2 = {
    'Acc': [1, 1, 2, 2, 3, 4],
    'Indi': ['With E', 'Without E', 'With E', 'Without E', 'Normal', 'Normal']
}
df2 = pd.DataFrame(data2)
Based on these two dataframes, I need to create the final output shown below:
AccNo Indi Amt
1 With E 10
1 Without E 90
2 With E 47
2 Without E 62
3 Normal 255
4 Normal 88
Logic:
With E: where the last 2 characters of df1['indi_val'] equal "_E", take sum(Amt).
Without E: where the last 2 characters of df1['indi_val'] do not equal "_E", take sum(Amt).
Normal: apply no filter on df1['indi_val'] and take sum(Amt).
I tried writing the following:
def get_indi(row):
    listval = []
    if row['Indi'] == "With E":
        #print('A')
        df1.apply(lambda df1row: listval.append(df1row['amt'] if df1row['Acc']==row['Acc'] and df1row['indi_val'][-2:]=="_E" else 0))
    if row['Indi'] == "Without E":
        df1.apply(lambda df1row: listval.append(df1row['amt'] if df1row['Acc']==row['Acc'] and df1row['indi_val'][-2:]!="_E" else 0))
    if row['Indi'] == "Normal":
        df1.apply(lambda df1row: listval.append(df1row['amt']))
    return sum(listval)
# Apply the function to create the 'Amt' column in df2
df2['Amt'] = df2.apply(get_indi)
With the code above I get the following error:
get_loc
raise KeyError(key)
KeyError: 'Indi'
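First, apply is probably not the best approach here, but for your use case you need to pass axis=1 explicitly to every apply call. Without it, apply hands each column (not each row) to the function, so row['Indi'] is looked up among the row labels and raises the KeyError. Note also that the column in df1 is named 'Amt', not 'amt'. The call should look like this: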
df2.apply(get_indi, axis=1)
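For reference, here is a minimal sketch of what a working apply-based version could look like, assuming the data from the question. It is not the asker's exact code: the nested df1.apply calls are swapped for boolean masks, the column is spelled 'Amt' as in df1, and the 'Normal' branch is also restricted to the row's account.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Acc': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4],
    'indi_val': ['Val1', 'val2', 'Val_E', 'Val1_E', 'Val1', 'Val3', 'val2', 'val2_E',
                 'val22_E', 'val2_A', 'val2_V', 'Val_E', 'Val_A', 'Val', 'Val2', 'val7'],
    'Amt': [10, 20, 5, 5, 22, 38, 15, 25, 22, 23, 24, 56, 67, 45, 87, 88],
})
df2 = pd.DataFrame({
    'Acc': [1, 1, 2, 2, 3, 4],
    'Indi': ['With E', 'Without E', 'With E', 'Without E', 'Normal', 'Normal'],
})

def get_indi(row):
    """Sum df1['Amt'] for the account in `row`, filtered according to row['Indi']."""
    same_acc = df1['Acc'] == row['Acc']
    ends_e = df1['indi_val'].str.endswith('_E')
    if row['Indi'] == 'With E':
        mask = same_acc & ends_e
    elif row['Indi'] == 'Without E':
        mask = same_acc & ~ends_e
    else:  # 'Normal': no filter on indi_val
        mask = same_acc
    return df1.loc[mask, 'Amt'].sum()

# axis=1 passes each row of df2 to get_indi
df2['Amt'] = df2.apply(get_indi, axis=1)
# Amt column: 10, 90, 47, 62, 255, 88
```

This still scans df1 once per row of df2, so it is fine for small frames but scales worse than a groupby.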
A better solution is to use pandas groupby and aggregation, followed by a merge:
df_temp = []
# normal
qwe = df1.groupby('Acc')['Amt'].sum().reset_index()
qwe['Indi'] = 'Normal'
df_temp.append(qwe)
# With E
qwe = df1[df1['indi_val'].str.endswith('_E')].groupby('Acc')['Amt'].sum().reset_index()
qwe['Indi'] = 'With E'
df_temp.append(qwe)
# Without E
qwe = df1[~df1['indi_val'].str.endswith('_E')].groupby('Acc')['Amt'].sum().reset_index()
qwe['Indi'] = 'Without E'
df_temp.append(qwe)
df_temp = pd.concat(df_temp)
# merge
df2.merge(df_temp, on=['Acc', 'Indi'])
Output:
| | Acc | Indi | Amt |
|---|---|---|---|
| 0 | 1 | With E | 10 |
| 1 | 1 | Without E | 90 |
| 2 | 2 | With E | 47 |
| 3 | 2 | Without E | 62 |
| 4 | 3 | Normal | 255 |
| 5 | 4 | Normal | 88 |
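One thing to keep in mind with the merge step: DataFrame.merge defaults to how='inner', so a row of df2 with no matching group sum is dropped from the result. A small sketch with made-up data (account 5 and the `sums` frame are hypothetical and stand in for the concatenated group sums) shows the difference from how='left':

```python
import pandas as pd

# Hypothetical case: account 5 asks for 'With E' but has no '_E' rows,
# so the aggregated sums only contain a 'Without E' entry for it.
df2_demo = pd.DataFrame({'Acc': [5, 5], 'Indi': ['With E', 'Without E']})
sums = pd.DataFrame({'Acc': [5], 'Indi': ['Without E'], 'Amt': [3]})

inner = df2_demo.merge(sums, on=['Acc', 'Indi'])             # 'With E' row is dropped
left = df2_demo.merge(sums, on=['Acc', 'Indi'], how='left')  # 'With E' row kept, Amt is NaN
left['Amt'] = left['Amt'].fillna(0)                          # treat "no rows" as a sum of 0
```

With the data from the question every row of df2 has a match, so both variants give the same result there.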
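Code
import numpy as np

# Flag rows whose 'indi_val' ends with '_E' (the rule is about the last 2 characters)
cond = df1['indi_val'].str.endswith('_E')
grp = np.where(cond, 'With E', 'Without E')
# Grouping by an unnamed array gives a 'level_1' column after reset_index
tmp = (df1.groupby(['Acc', grp])['Amt'].sum()
          .reset_index().rename({'level_1': 'Indi'}, axis=1)
)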
tmp:
Acc Indi Amt
0 1 With E 10
1 1 Without E 90
2 2 With E 47
3 2 Without E 62
4 3 With E 56
5 3 Without E 199
6 4 Without E 88
out = df2.merge(
    pd.concat([tmp, tmp.groupby('Acc')['Amt'].sum().reset_index().assign(Indi='Normal')]),
    how='left'
)
out:
Acc Indi Amt
0 1 With E 10
1 1 Without E 90
2 2 With E 47
3 2 Without E 62
4 3 Normal 255
5 4 Normal 88
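Since the rule in the question is about the last two characters, it is worth noting that Series.str.endswith('_E') and Series.str.contains('_E') can disagree when '_E' appears in the middle of a value. A small illustration, using made-up values ('val_Extra' is not in the original data):

```python
import pandas as pd

# Made-up values for illustration only
s = pd.Series(['Val_E', 'val2', 'val_Extra'])

ends = s.str.endswith('_E').tolist()   # [True, False, False]
has = s.str.contains('_E').tolist()    # [True, False, True]
print(ends, has)
```

On the data in the question both checks flag the same rows, so the sums do not change either way.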
Here is one approach:
import numpy as np # add import
out = (
    df2
    .merge(df1, on='Acc', how='left')
    .pipe(lambda x: x[
        (
            (
                np.where(x['indi_val'].str.endswith('_E'), 'With E', 'Without E') ==
                x['Indi']
            ) |
            x['Indi'].eq('Normal')
        )
    ])
    .groupby(['Acc', 'Indi'], as_index=False)['Amt'].sum()
)
Output
Acc Indi Amt
0 1 With E 10
1 1 Without E 90
2 2 With E 47
3 2 Without E 62
4 3 Normal 255
5 4 Normal 88
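Explanation
- First, left-merge df2 with df1 (df.merge).
- Use df.pipe to process the merged result (now: x) and build a condition for boolean indexing:
  - Check whether x['indi_val'] ends with '_E' (Series.str.endswith), and use np.where to get the matching label for each row, i.e. 'With E' or 'Without E'. Then check whether that label equals x['Indi'] in the same row.
  - Chain an OR (|) so that rows where x['Indi'] equals 'Normal' (Series.eq) are also allowed.
- Select from x with that condition, dropping all False rows, and apply df.groupby to get groupby.sum for the 'Amt' column.

Intermediates
After the merge, before boolean indexing: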
Acc Indi indi_val Amt
0 1 With E Val1 10 # < will be `False` (excluded), not 'With E'
1 1 With E val2 20
2 1 With E Val_E 5 # < will be `True` (included), matches 'With E'
3 1 With E Val1_E 5
4 1 With E Val1 22
5 1 With E Val3 38
6 1 Without E Val1 10
7 1 Without E val2 20
8 1 Without E Val_E 5
9 1 Without E Val1_E 5
10 1 Without E Val1 22
11 1 Without E Val3 38
12 2 With E val2 15
13 2 With E val2_E 25
14 2 With E val22_E 22
15 2 With E val2_A 23
16 2 With E val2_V 24
17 2 Without E val2 15
18 2 Without E val2_E 25
19 2 Without E val22_E 22
20 2 Without E val2_A 23
21 2 Without E val2_V 24
22 3 Normal Val_E 56 # < will be `True` (included), 'Normal' always OK
23 3 Normal Val_A 67
24 3 Normal Val 45
25 3 Normal Val2 87
26 4 Normal val7 88
After boolean indexing, before groupby:
Acc Indi indi_val Amt
2 1 With E Val_E 5
3 1 With E Val1_E 5
6 1 Without E Val1 10
7 1 Without E val2 20
10 1 Without E Val1 22
11 1 Without E Val3 38
13 2 With E val2_E 25
14 2 With E val22_E 22
17 2 Without E val2 15
20 2 Without E val2_A 23
21 2 Without E val2_V 24
22 3 Normal Val_E 56
23 3 Normal Val_A 67
24 3 Normal Val 45
25 3 Normal Val2 87
26 4 Normal val7 88
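The intermediates above can be reproduced by splitting the chain into separate steps. This is only a sketch of the same idea with the data from the question; the names merged, label, keep and filtered are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Acc': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4],
    'indi_val': ['Val1', 'val2', 'Val_E', 'Val1_E', 'Val1', 'Val3', 'val2', 'val2_E',
                 'val22_E', 'val2_A', 'val2_V', 'Val_E', 'Val_A', 'Val', 'Val2', 'val7'],
    'Amt': [10, 20, 5, 5, 22, 38, 15, 25, 22, 23, 24, 56, 67, 45, 87, 88],
})
df2 = pd.DataFrame({
    'Acc': [1, 1, 2, 2, 3, 4],
    'Indi': ['With E', 'Without E', 'With E', 'Without E', 'Normal', 'Normal'],
})

# Step 1: pair every df2 row with all df1 rows of the same account
merged = df2.merge(df1, on='Acc', how='left')

# Step 2: label each df1 row by its suffix, then keep rows whose label
# matches the requested 'Indi', plus every 'Normal' row
label = np.where(merged['indi_val'].str.endswith('_E'), 'With E', 'Without E')
keep = merged['Indi'].eq(label) | merged['Indi'].eq('Normal')
filtered = merged[keep]

# Step 3: sum the remaining amounts per account and indicator
out = filtered.groupby(['Acc', 'Indi'], as_index=False)['Amt'].sum()
print(out)
```

Printing merged and filtered along the way gives the 27-row and 16-row tables listed above.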