Compare a DataFrame's column values against another DataFrame at day level


I have the following two dataframes. The first one is

   Box      box_cap     size      Preference
    1          16       1200           1
    2          16       1550           2
    3          15       1300           3

and the other one is

   Day  Capacity
    1        23
    2        24

I need the following dataframe as output

   Day  Box     box_cap
    1    1          16
    1    2           7
    2    2           9
    2    3          15

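Basically, the boxes have to be allocated to days according to each day's capacity. The Preference number gives the order in which the boxes are placed: since box 1 has the first preference, box 1 goes first. As the expected output shows, a box that does not fit into the remaining capacity of a day is split, and the remainder carries over to the next day (box 2 gets 7 on day 1 and 9 on day 2). I am trying to do it in the following way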

for i in df.index:
    tun = df['Tundish'][i]
    heat = df['Heat'][i]
    width = df['Width'][i]
    pref = df['Preference'][i]
    tuntobegiven = []
    for j in dfday.index:
        day = dfday['Day'][j]
        cap = dfday['Capacity'][j]
        caps = cap - heat
        if caps >= 0:
            tuntobegiven.append((tun, day))

I am not able to crack it. The data is dummy. There can be many more boxes.

python pandas dataframe
1 Answer

IMO, the simplest approach is to use iterative code, optionally with numba to improve performance:

import pandas as pd
from numba import jit


# defined at module level so that numba compiles it only once
@jit(nopython=True)  # optional: remove the decorator to run in pure Python
def compute(boxes, capacities):
    box_idx = 0
    cap_idx = 0
    out = []
    while (cap_idx < len(capacities)) and (box_idx < len(boxes)):
        # amount of the current box that fits into the current day
        take = min(boxes[box_idx], capacities[cap_idx])
        boxes[box_idx] -= take
        capacities[cap_idx] -= take
        out.append((cap_idx, box_idx, take))
        # move to the next day once this day's capacity is used up
        if not capacities[cap_idx]:
            cap_idx += 1
        # move to the next box once this box is fully placed
        if not boxes[box_idx]:
            box_idx += 1
    return out


def allocate(df, dfday):
    # ensure inputs are sorted by Preference/Day
    df = df.sort_values(by='Preference', ignore_index=True)
    dfday = dfday.sort_values(by='Day', ignore_index=True)

    # run allocation on copies, since compute() modifies its arrays in place
    out = pd.DataFrame(compute(df['box_cap'].to_numpy(copy=True),
                               dfday['Capacity'].to_numpy(copy=True)),
                       columns=['Day', 'Box', 'box_cap'])
    # convert positional indices to actual Day/Box values
    out['Day'] = dfday['Day'].to_numpy()[out['Day']]
    out['Box'] = df['Box'].to_numpy()[out['Box']]

    return out

out = allocate(df, dfday)

Output:

   Day  Box  box_cap
0    1    1       16
1    1    2        7
2    2    2        9
3    2    3       15
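
If numba is not available, the same allocation can probably also be expressed without an explicit loop. The following is only a sketch of an alternative technique (interval overlap on cumulative sums), not the method used above: each box and each day is treated as an interval on a running total, and the amount of a box placed on a day is the length of the overlap of the two intervals. The function name allocate_vectorized and the sample frames are made up for illustration, and the overlap matrix has one cell per (day, box) pair, so this assumes the inputs are small enough for that to fit in memory.

```python
import numpy as np
import pandas as pd

# sample data from the question
df = pd.DataFrame({'Box': [1, 2, 3],
                   'box_cap': [16, 16, 15],
                   'size': [1200, 1550, 1300],
                   'Preference': [1, 2, 3]})
dfday = pd.DataFrame({'Day': [1, 2], 'Capacity': [23, 24]})


def allocate_vectorized(df, dfday):
    # hypothetical helper: same ordering assumptions as the iterative version
    df = df.sort_values(by='Preference', ignore_index=True)
    dfday = dfday.sort_values(by='Day', ignore_index=True)

    # each box/day covers the interval [start, end) on a running total
    box_end = df['box_cap'].cumsum().to_numpy()
    box_start = box_end - df['box_cap'].to_numpy()
    day_end = dfday['Capacity'].cumsum().to_numpy()
    day_start = day_end - dfday['Capacity'].to_numpy()

    # overlap[i, j] = part of box j that falls into day i (<= 0 means none)
    overlap = (np.minimum(day_end[:, None], box_end[None, :])
               - np.maximum(day_start[:, None], box_start[None, :]))

    # keep only the (day, box) pairs that actually share capacity
    day_idx, box_idx = np.nonzero(overlap > 0)
    return pd.DataFrame({'Day': dfday['Day'].to_numpy()[day_idx],
                         'Box': df['Box'].to_numpy()[box_idx],
                         'box_cap': overlap[day_idx, box_idx]})


out_vec = allocate_vectorized(df, dfday)
```

On the sample data this gives the same four rows as the iterative version. For many boxes and days the loop is likely the better choice, since it only visits the pairs that actually overlap.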