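I have two tables, df1 and df2.
df1 is a sales list.
df2 is a list of combination products.
I want to expand df1 into df3, using df2 to break each combination down.
df3 is the sales list in terms of individual products.
df1 (think of it as the sales list)
df2 (think of it as the list of combination products)
df3 (think of it as the sales list of individual products)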
Code:
import pandas as pd

data1 = [["Banana", "1"],
         ["Apple", "2"],
         ["Milk", "3"],
         ["Banana_milk", "1"],
         ["Apple_milk", "1"],
         ["Watermelon_milk", "2"]]
df1 = pd.DataFrame(data=data1, columns=['Part_No', 'Quantity'])
print(df1)

data2 = [["Banana_milk", "Banana", "1"],
         ["Banana_milk", "Milk", "1"],
         ["Apple_milk", "Apple", "1"],
         ["Apple_milk", "Milk", "1"],
         ["Watermelon_milk", "Watermelon", "2"],
         ["Watermelon_milk", "Milk", "1"]]
df2 = pd.DataFrame(data=data2, columns=['Combination_Part_No', 'Part_No', 'Quantity'])
print(df2)
import pandas as pd

data1 = [["Banana", "1"],
         ["Apple", "2"],
         ["Milk", "3"],
         ["Banana_milk", "1"],
         ["Apple_milk", "2"]]
df1 = pd.DataFrame(data=data1, columns=['Part_No', 'Quantity'])
print(df1)

data2 = [["Banana_milk", "Banana", "1"],
         ["Banana_milk", "Milk", "1"],
         ["Apple_milk", "Apple", "1"],
         ["Apple_milk", "Milk", "1"]]
df2 = pd.DataFrame(data=data2, columns=['Combination_Part_No', 'Part_No', 'Quantity'])
print(df2)
This is your data; I changed the quantity of Apple_milk to 2 to make it easier to debug.
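# Quantities are stored as strings, so convert them before doing arithmetic.
df1.Quantity = df1.Quantity.astype(int)
df2.Quantity = df2.Quantity.astype(int)
# Scale each component quantity in df2 by the number of combinations sold in df1.
for line in df1.to_dict("records"):
    df2.loc[df2["Combination_Part_No"] == line["Part_No"], "Quantity"] *= line["Quantity"]
# Multiply each matching df1 quantity by the scaled component quantity.
for line in df2.to_dict("records"):
    df1.loc[df1["Part_No"] == line["Part_No"], "Quantity"] *= line["Quantity"]
# Keep only the rows of df1 whose part appears as a component in df2.
df3 = df1[df1.Part_No.isin(df2.Part_No)]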
df3 then becomes:
  Part_No  Quantity
0  Banana         1
1   Apple         4
2    Milk         6
Does anyone have a more concise approach?
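df1['Quantity'] = df1['Quantity'].astype(int)
df2['Quantity'] = df2['Quantity'].astype(int)
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once.
pieces = []
for index, row in df1.iterrows():
    components = df2.loc[df2["Combination_Part_No"] == row["Part_No"], ["Part_No", "Quantity"]]
    if components.empty:
        # Not a combination: keep the sales row as it is.
        pieces.append(row.to_frame().T)
    else:
        # Combination: scale each component by the number of combinations sold.
        df4 = components.copy()
        df4["Quantity"] = df4["Quantity"] * row["Quantity"]
        pieces.append(df4)
df3 = pd.concat(pieces, ignore_index=True)
# Rows taken from iterrows() come back as object dtype, so restore integers before summing.
df3["Quantity"] = df3["Quantity"].astype(int)
df3 = df3.groupby("Part_No", as_index=False).sum()
print(df3)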
First use DataFrame.merge with a left join, then fill the missing Part_No values coming from df2 with the values from df1, multiply the two Quantity columns with Series.mul, and finally aggregate with sum.
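The merge-based steps described here can be sketched roughly as follows. This is a minimal illustration rather than the answerer's original code: the suffixes `_sold` and `_per_combo`, the intermediate name `merged`, and the use of `fillna` to fill the gaps left by the left join are assumptions chosen for readability. Quantities are entered as integers directly to keep the sketch short.

```python
import pandas as pd

# Sample data from the thread (Apple_milk quantity set to 2).
df1 = pd.DataFrame(
    [["Banana", 1], ["Apple", 2], ["Milk", 3], ["Banana_milk", 1], ["Apple_milk", 2]],
    columns=["Part_No", "Quantity"],
)
df2 = pd.DataFrame(
    [["Banana_milk", "Banana", 1], ["Banana_milk", "Milk", 1],
     ["Apple_milk", "Apple", 1], ["Apple_milk", "Milk", 1]],
    columns=["Combination_Part_No", "Part_No", "Quantity"],
)

# Left join: every sales row is kept; combinations fan out to one row per component.
merged = df1.merge(
    df2,
    how="left",
    left_on="Part_No",
    right_on="Combination_Part_No",
    suffixes=("_sold", "_per_combo"),  # hypothetical suffix names
)

# Rows that are not combinations have no match in df2, so fall back to df1's part name.
merged["Part_No"] = merged["Part_No_per_combo"].fillna(merged["Part_No_sold"])

# Units sold times units per combination; a plain product counts as 1 per sale.
merged["Quantity"] = (
    merged["Quantity_sold"].mul(merged["Quantity_per_combo"].fillna(1)).astype(int)
)

# Add up standalone sales and the quantities contributed by combinations.
df3 = merged.groupby("Part_No", as_index=False, sort=False)["Quantity"].sum()
print(df3)
```

Because standalone sales and combination components are summed here, Banana comes out as 2 (1 sold on its own plus 1 from Banana_milk), Apple as 4, and Milk as 6. Passing `sort=False` keeps the parts in order of first appearance rather than alphabetical order.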