Many-to-many join behavior


I'm not quite sure how to title this question, but here is the situation. I have a DataFrame (dfOrders) with order_id and basic order information, like this:

|order_id|full_name|order_date|billing|shipping|
------------------------------------------------
|1234567 |John Doe |1/1/2019  |Address|Address1|
|1234567 |John Doe |1/1/2019  |Address|Address2|

Then I have a second DataFrame (dfStandardized) that contains the standardized address information:

|order_id|latitude |longitude |shippingZip|...
-------------------------------------------
|1234567 |97.12345 |101.1245  |12345      |...
|1234567 |98.98765 |102.9876  |12389      |...

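Essentially, the issue is that a customer placed a single order but had it shipped to two separate addresses. So there is only one order_id but two rows, each with its own shipping address. What I want is a DataFrame like this: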

|order_id|full_name|order_date|billing|shipping|latitude |longitude |shippingZip|...
---------------------------------------------------------------------------------
|1234567 |John Doe |1/1/2019  |Address|Address1|97.12345 |101.1245  |12345      |...
|1234567 |John Doe |1/1/2019  |Address|Address2|98.98765 |102.9876  |12389      |...

That is, the extra shipping information should be added only to the row with the matching address. Instead, I get this:

|order_id|full_name|order_date|billing|shipping|latitude |longitude |shippingZip|...
---------------------------------------------------------------------------------
|1234567 |John Doe |1/1/2019  |Address|Address1|97.12345 |101.1245  |12345      |...
|1234567 |John Doe |1/1/2019  |Address|Address2|98.98765 |102.9876  |12389      |...
|1234567 |John Doe |1/1/2019  |Address|Address1|98.98765 |102.9876  |12389      |...
|1234567 |John Doe |1/1/2019  |Address|Address2|97.12345 |101.1245  |12345      |...

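Each shipping address picks up the standardized attributes of both addresses. This is clearly because, with only one order_id, the merge is a many-to-many join. Is there any way to get the result I want? Here is the code I am using: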

import pandas as pd

# order_id is duplicated in both frames, so this is a many-to-many join
df = dfOrders.merge(dfStandardized, on='order_id', how='inner')
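To reproduce the duplication, here is a minimal, self-contained sketch that rebuilds the two sample tables from the question (only the columns shown; the `...` columns are left out) and runs the same inner merge. Because both rows in each frame carry order_id 1234567, every left row matches every right row, so 2 x 2 = 4 rows come back. Passing pandas' `validate='one_to_one'` argument to `merge` makes it raise a `MergeError` instead of silently returning the cross product.

```python
import pandas as pd
from pandas.errors import MergeError

# Sample data copied from the question (the "..." columns are omitted)
dfOrders = pd.DataFrame({
    "order_id": [1234567, 1234567],
    "full_name": ["John Doe", "John Doe"],
    "order_date": ["1/1/2019", "1/1/2019"],
    "billing": ["Address", "Address"],
    "shipping": ["Address1", "Address2"],
})
dfStandardized = pd.DataFrame({
    "order_id": [1234567, 1234567],
    "latitude": [97.12345, 98.98765],
    "longitude": [101.1245, 102.9876],
    "shippingZip": [12345, 12389],
})

# Same merge as in the question: order_id is duplicated on both sides,
# so each of the 2 left rows pairs with each of the 2 right rows.
df = dfOrders.merge(dfStandardized, on="order_id", how="inner")
print(len(df))  # 4

# validate="one_to_one" checks that the key is unique on both sides
# and raises MergeError when it is not.
try:
    dfOrders.merge(dfStandardized, on="order_id", how="inner", validate="one_to_one")
except MergeError as exc:
    print("MergeError:", exc)
```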
1 Answer

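Usually we would use cumcount: number the rows within each order_id in both DataFrames with groupby('order_id').cumcount(), then merge on order_id plus that counter. This assumes the standardized rows appear in the same order as the shipping rows they belong to.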
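A minimal sketch of the cumcount approach, under one key assumption: within each order_id, the rows of dfStandardized are in the same order as the matching shipping rows in dfOrders (the first standardized row belongs to Address1, the second to Address2). `groupby('order_id').cumcount()` numbers the rows 0, 1, 2, ... within each order, and merging on order_id plus that counter turns the many-to-many join into a one-to-one join. The helper column name `line` is hypothetical and is dropped after the merge. If the two frames are not in the same order, this silently pairs the wrong addresses, and you would need a column both frames share that identifies the address.

```python
import pandas as pd

# Sample data copied from the question (the "..." columns are omitted)
dfOrders = pd.DataFrame({
    "order_id": [1234567, 1234567],
    "full_name": ["John Doe", "John Doe"],
    "order_date": ["1/1/2019", "1/1/2019"],
    "billing": ["Address", "Address"],
    "shipping": ["Address1", "Address2"],
})
dfStandardized = pd.DataFrame({
    "order_id": [1234567, 1234567],
    "latitude": [97.12345, 98.98765],
    "longitude": [101.1245, 102.9876],
    "shippingZip": [12345, 12389],
})

# Number the rows within each order_id: 0 for the first, 1 for the second, ...
# ("line" is a hypothetical helper column name; assign() leaves the originals unchanged)
left = dfOrders.assign(line=dfOrders.groupby("order_id").cumcount())
right = dfStandardized.assign(line=dfStandardized.groupby("order_id").cumcount())

# (order_id, line) is now unique on both sides, so the join is one-to-one;
# validate= makes pandas raise if that ever stops being true.
df = (
    left.merge(right, on=["order_id", "line"], how="inner", validate="one_to_one")
        .drop(columns="line")
)
print(df[["order_id", "shipping", "latitude", "longitude", "shippingZip"]])
```

The same pattern works when an order has three or more shipping rows, as long as both frames list them in the same order.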
