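Summary: I want to rewrite my function so that I don't have to hand-write every individual iloc and if/elif branch, so that it still scales when groups become too large to handle that way.
I have a sample table, df_stack_exchange: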
import pandas

data_stack_exchange = {'store': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
                       'worker': [1, 1, 2, 1, 2, 3, 1, 2, 3, 4],
                       'boxes': [105, 90, 100, 80, 10, 200, 70, 210, 50, 0],
                       'optimal_boxes': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
df_stack_exchange = pandas.DataFrame(data_stack_exchange)
| | store | worker | boxes | optimal_boxes |
---|---|---|---|---|
0 | A | 1 | 105 | 0 |
1 | B | 1 | 90 | 0 |
2 | B | 2 | 100 | 0 |
3 | C | 1 | 80 | 0 |
4 | C | 2 | 10 | 0 |
5 | C | 3 | 200 | 0 |
6 | D | 1 | 70 | 0 |
7 | D | 2 | 210 | 0 |
8 | D | 3 | 50 | 0 |
9 | D | 4 | 0 | 0 |
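Workers are prioritized in numeric order, and I want to assign each of them up to 100 boxes until there are no boxes left to assign. The one exception: if a store has only one worker available (store A), that worker gets all the boxes, even if the total is greater than 100. See the expected DataFrame below.
| | store | worker | boxes | optimal_boxes |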
---|---|---|---|---|
0 | A | 1 | 105 | 105 |
1 | B | 1 | 90 | 100 |
2 | B | 2 | 100 | 90 |
3 | C | 1 | 80 | 100 |
4 | C | 2 | 10 | 100 |
5 | C | 3 | 200 | 90 |
6 | D | 1 | 70 | 100 |
7 | D | 2 | 210 | 100 |
8 | D | 3 | 50 | 100 |
9 | D | 4 | 0 | 30 |
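To make the rule concrete, here is a minimal plain-Python sketch of the allocation for a single store. The names `allocate` and `cap` are hypothetical and not part of the original code. The sketch also assumes that any boxes beyond `cap` per worker in a multi-worker store are simply left unassigned, which the description above does not specify.

```python
def allocate(total_boxes, n_workers, cap=100):
    """Hand out up to `cap` boxes per worker, in priority order."""
    # Exception from the description: a lone worker takes everything.
    if n_workers == 1:
        return [total_boxes]

    shares = []
    remaining = total_boxes
    for _ in range(n_workers):
        share = min(cap, remaining)  # never more than the cap or what is left
        shares.append(share)
        remaining -= share
    return shares


print(allocate(105, 1))  # store A -> [105]
print(allocate(190, 2))  # store B -> [100, 90]
print(allocate(290, 3))  # store C -> [100, 100, 90]
print(allocate(330, 4))  # store D -> [100, 100, 100, 30]
```

The totals passed in are the per-store sums of the boxes column (105, 190, 290 and 330), and the results match the optimal_boxes column of the expected table.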
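I wrote the following function, which produces my expected result, but it isn't maintainable because I have to hand-write every iloc. I'd like to rewrite it with a loop, or otherwise make it scale without adding more elif branches. It stops being workable once group sizes reach 10+ instead of the current maximum of 4 (store D).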
import numpy

def box_optimizer(x):
    if x['optimal_boxes'].count() == 1:
        x['optimal_boxes'].iloc[0] = x['boxes'].sum()
        return x
    elif x['optimal_boxes'].count() == 2:
        x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        return x
    elif x['optimal_boxes'].count() == 3:
        x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[2] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        return x
    elif x['optimal_boxes'].count() == 4:
        x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[2] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[3] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        return x

df_stack_exchange_function = pandas.DataFrame(df_stack_exchange.groupby('store', as_index=False, group_keys=False).apply(box_optimizer))
# the expected dataframe output
df_stack_exchange_function
You can do this by looping over the workers within each group. Here is a modified version of your function:
import pandas as pd
import numpy as np

data_stack_exchange = {'store': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
                       'worker': [1, 1, 2, 1, 2, 3, 1, 2, 3, 4],
                       'boxes': [105, 90, 100, 80, 10, 200, 70, 210, 50, 0],
                       'optimal_boxes': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
df_stack_exchange = pd.DataFrame(data_stack_exchange)

def box_optimizer(x):
    # x is the sub-DataFrame for one store
    total_boxes = x['boxes'].sum()
    num_workers = x['worker'].nunique()
    if num_workers == 1:
        # a lone worker gets every box, even if that is more than 100
        x['optimal_boxes'] = total_boxes
    else:
        remaining_boxes = total_boxes
        # walk the workers in row order, giving each at most 100 boxes
        for _, row in x.iterrows():
            available_boxes = min(100, remaining_boxes)
            x.loc[row.name, 'optimal_boxes'] += available_boxes
            remaining_boxes -= available_boxes
            if remaining_boxes <= 0:
                break
    return x

df_stack_exchange_function = df_stack_exchange.groupby('store', as_index=False, group_keys=False).apply(box_optimizer)
print(df_stack_exchange_function)
  store  worker  boxes  optimal_boxes
0     A       1    105            105
1     B       1     90            100
2     B       2    100             90
3     C       1     80            100
4     C       2     10            100
5     C       3    200             90
6     D       1     70            100
7     D       2    210            100
8     D       3     50            100
9     D       4      0             30
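If the per-row loop becomes slow on large groups, the same allocation can probably be expressed without `apply` at all. The following is only a sketch under stated assumptions, not the approach used above: `assign_boxes` and `cap` are hypothetical names, the rows are sorted so that worker priority follows the `worker` column, and boxes beyond `cap` per worker in a multi-worker store are left unassigned. The idea is that worker number k (counting from 0) in a store receives the store total minus `cap * k`, clipped to the range 0 to `cap`.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "B", "B", "C", "C", "C", "D", "D", "D", "D"],
    "worker": [1, 1, 2, 1, 2, 3, 1, 2, 3, 4],
    "boxes": [105, 90, 100, 80, 10, 200, 70, 210, 50, 0],
})


def assign_boxes(frame, cap=100):
    # Assumption: priority within a store follows the worker number.
    out = frame.sort_values(["store", "worker"])
    grouped = out.groupby("store")

    total = grouped["boxes"].transform("sum")        # store total on every row
    position = grouped.cumcount()                    # 0, 1, 2, ... within each store
    group_size = grouped["boxes"].transform("size")  # number of workers per store

    # Boxes still unassigned when this worker's turn comes, limited to 0..cap.
    capped = (total - cap * position).clip(lower=0, upper=cap)

    # Single-worker stores keep the full total instead of the capped value.
    out["optimal_boxes"] = capped.where(group_size > 1, total)
    return out


result = assign_boxes(df)
print(result)
```

On the sample data this gives the same optimal_boxes column as the loop version (105, 100, 90, 100, 100, 90, 100, 100, 100, 30), while working on whole columns rather than one row at a time.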