python pandas如何通过其他数据框扩展数据框

问题描述 投票:0回答:3

我有两个表df1和df2。

df1是销售清单。

df2是组合的产品列表。

我想基于df1和df2扩展到df3。

df3是单个产品的销售清单。

df1(可以想象成销售清单)

enter image description here

df2(可以想象成一个组合产品列表)

enter image description here

df3(可以想象成单个产品的销售清单)

enter image description here

code:

data1 = [["Banana", "1"],
        ["Apple", "2"],
        ["Milk", "3"],
        ["Banana_milk", "1"],
        ["Apple_milk", "1"],
        ["Watermelon_milk", "2"]]
df1 = pd.DataFrame(data=data1,columns=['Part_No','Quantity'])
print(df1)

data2 = [["Banana_milk", "Banana", "1"],
        ["Banana_milk",  "Milk", "1"],
        ["Apple_milk", "Apple", "1"],
        ["Apple_milk", "Milk", "1"],
        ["Watermelon_milk", "Watermelon", "2"],
        ["Watermelon_milk", "Milk", "1"]]
df2 = pd.DataFrame(data=data2,columns=['Combination_Part_No', 'Part_No', 'Quantity'])
print(df2)
python pandas dataframe
3个回答
0
投票
import pandas as pd

data1 = [["Banana", "1"],
        ["Apple", "2"],
        ["Milk", "3"],
        ["Banana_milk", "1"],
        ["Apple_milk", "2"]]
df1 = pd.DataFrame(data=data1,columns=['Part_No','Quantity'])
display(df1)

data2 = [["Banana_milk", "Banana", "1"],
        ["Banana_milk",  "Milk", "1"],
        ["Apple_milk", "Apple", "1"],
        ["Apple_milk", "Milk", "1"]]
df2 = pd.DataFrame(data=data2,columns=['Combination_Part_No', 'Part_No', 'Quantity'])
display(df2)

这是您的数据,我将Apple_milk更改为2,易于调试。

df1.Quantity = df1.Quantity.astype(int)
df2.Quantity = df2.Quantity.astype(int)

for line in df1.to_dict("records"):
    df2.loc[df2["Combination_Part_No"] == line["Part_No"], "Quantity"] *= line["Quantity"]

for line in df2.to_dict("records"):
    df1.loc[df1["Part_No"] == line["Part_No"], "Quantity"] *= line["Quantity"]

df3 = df1[df1.Part_No.isin(df2.Part_No)]

然后,df3变为

    Part_No Quantity
0   Banana  1
1   Apple   4
2   Milk    6

0
投票

有人有更简洁的方法吗?

df1['Quantity'] = df1['Quantity'].astype(int)
df2['Quantity'] = df2['Quantity'].astype(int)

df3 = pd.DataFrame(columns=['Part_No','Quantity'])

for index, row in df1.iterrows():

    if len(df2.loc[df2["Combination_Part_No"] == row["Part_No"]]) == 0:
        df3 = df3.append(row,ignore_index=True)
    else:
        df4 = df2.loc[df2["Combination_Part_No"] == row["Part_No"]][["Part_No","Quantity"]]
        df4["Quantity"] = df4["Quantity"] * row["Quantity"]
        df3 = df3.append(df4)
df3 = df3.groupby("Part_No",as_index=False).sum()
display(df3)

enter image description here


0
投票

首先使用DataFrame.merge和左连接,然后用DataFrame.merge值替换Part_No中缺少的df2值,并用df1替换多个Quantity列,最后聚合Series.mul

Series.mul

sum
© www.soinside.com 2019 - 2024. All rights reserved.