Merge duplicate rows across multiple columns?


Sample input:

DID            FIRST NAME  MIDDLE NAME  LAST NAME  STREET  CITY  STATE  ZIP     DOB         PIN                TIN
CRA0012347129  John JR     Nan          Doe        Nan     Nan   Nan    1124.0  02-07-2020  12345678998765432  268869162.0
CRA0012347152  John        Nan          Doe        Nan     Nan   Nan    201.0   12-12-2020  12261986           Nan
CRA0012347286  John        Nan          Doe        Nan     Nan   Nan    Nan     Nan         12261986           268869162.0

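Instructions:
1. You must use the PIN and TIN columns as primary keys to merge duplicate records (rows). An example is given below.
2. You may use the DOB column as a secondary key. Consider concatenating this column with other columns to create the secondary key.
3. Make sure all "similar" records (rows) are merged based on the two points above.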
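One possible reading of the primary-key rule is that two rows belong together when they share a PIN or a TIN, and that this link is transitive (row 1 shares a TIN with row 3, row 3 shares a PIN with row 2, so all three merge). The sketch below labels such groups with a small union-find. The helper name `assign_groups` and the `group` label are hypothetical, and the code assumes the "Nan" placeholders have already been turned into real missing values and that keys are stored as strings.

```python
import pandas as pd


def assign_groups(df, key_cols=("PIN", "TIN")):
    """Give the same group id to rows that share a value in any key column."""
    parent = list(range(len(df)))  # union-find forest over row positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # smallest row position is the root

    for col in key_cols:
        first_seen = {}  # key value -> first row position that holds it
        for pos, value in enumerate(df[col]):
            if pd.isna(value):
                continue  # a missing key never links two rows
            if value in first_seen:
                union(first_seen[value], pos)
            else:
                first_seen[value] = pos

    return pd.Series([find(i) for i in range(len(df))], index=df.index, name="group")


# The three sample rows, reduced to the columns needed for grouping.
df = pd.DataFrame({
    "DID": ["CRA0012347129", "CRA0012347152", "CRA0012347286"],
    "PIN": ["12345678998765432", "12261986", "12261986"],
    "TIN": ["268869162", None, "268869162"],
})
groups = assign_groups(df)
```

A secondary key in the sense of point 2 could be handled the same way, for example by building an extra column from DOB plus the name columns and adding it to `key_cols`. Whether that is strict enough for real data is a judgment call.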

Expected output:

DID                                          FIRST NAME     MIDDLE NAME  LAST NAME  STREET  CITY  STATE  ZIP
CRA0012347129, CRA0012347152, CRA0012347286  John JR, John  NaN          Doe        NaN     NaN   NaN    1124, 201

DOB                     PIN                           TIN
02/07/2020, 12/12/2020  12345678998765432,'012261986  '268869162
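The comma-joined cells in the expected output suggest collapsing each group of rows by joining the distinct non-missing values of every column. Below is a minimal sketch of that step, assuming a hypothetical `group` column that already marks which rows belong together and assuming all values were loaded as strings (for instance with `dtype=str` in `pd.read_csv`), so ZIP does not pick up a trailing `.0`. The helper `join_unique` is illustrative only, and the sketch does not reproduce the leading apostrophes or the zero padding shown for PIN and TIN above.

```python
import pandas as pd


def join_unique(values):
    """Join distinct non-missing values in first-seen order; NaN if there are none."""
    distinct = list(dict.fromkeys(str(v) for v in values.dropna()))
    return ", ".join(distinct) if distinct else float("nan")


# Sample rows with a precomputed (hypothetical) group label.
df = pd.DataFrame({
    "group": [0, 0, 0],
    "DID": ["CRA0012347129", "CRA0012347152", "CRA0012347286"],
    "FIRST NAME": ["John JR", "John", "John"],
    "LAST NAME": ["Doe", "Doe", "Doe"],
    "STREET": [None, None, None],
    "ZIP": ["1124", "201", None],
    "PIN": ["12345678998765432", "12261986", "12261986"],
    "TIN": ["268869162", None, "268869162"],
})

# One output row per group; every other column is reduced with join_unique.
merged = df.groupby("group", sort=False).agg(join_unique).reset_index(drop=True)
print(merged.T)
```

Joining to strings loses the original dtypes, so this form suits a report better than further numeric processing.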
python pandas machine-learning data-science data-manipulation