Trouble merging and reshaping 3 large DataFrames and handling duplicates


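I am trying to merge and reshape data from 3 tables. Each table has roughly 250,000 rows and 30 columns. The data needs to be reshaped to fit a machine learning model.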

This is my original post on Stack Overflow, which describes the requirements in detail.

Here is a GitHub repository containing the 3 tables and the code for the merge attempts that failed:

I tried to merge the tables with the following code:



```python
# Option 1 from Stack Overflow
import pandas as pd

tables = [Table_1, Table_2, Table_3]

out = (pd
   # align the three tables side by side on the shared key columns
   .concat([t.set_index(['unique_ID','patient_ID', 'Week']) for t in tables], axis=1)
   # go long, then spread patients into columns (this is where the error is raised)
   .stack().unstack(level='patient_ID').add_prefix('Patient ')
   .pipe(lambda d: d.set_axis('Week'+d.index.get_level_values('Week').astype(str)
                              +' '+d.index.get_level_values(1))
        )
   .rename_axis(index='Clinical Data', columns=None).reset_index()
)
```

Output:

ValueError: Index contains duplicate entries, cannot reshape
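For context, `unstack` raises this error when two or more rows end up with the same index key, so a first step could be to list the rows whose key columns repeat. The following is only a minimal sketch: `table_1` is a made-up stand-in for one of the real tables (the actual data is not shown here), and keeping the first row per key is an assumed policy that may not be appropriate for clinical data.

```python
import pandas as pd

# Hypothetical stand-in for one of the three tables; values are invented.
table_1 = pd.DataFrame({
    "unique_ID": [1, 1, 1, 2],
    "patient_ID": [1, 1, 1, 2],
    "Week": [0, 0, 1, 0],
    "dose_received": [8.0, 8.0, 10.0, 6.0],
})

keys = ["unique_ID", "patient_ID", "Week"]

# keep=False flags every member of a duplicated group, not only the repeats.
dupes = table_1[table_1.duplicated(subset=keys, keep=False)]
print(dupes)

# Assumed policy: keep the first row per key so the key becomes unique.
deduped = table_1.drop_duplicates(subset=keys, keep="first")
print(deduped)
```

If `dupes` is non-empty for any of the real tables, that alone would be enough to trigger the `ValueError` above.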
```python
# Option 2 from Stack Overflow
from functools import reduce

tables = [Table_1, Table_2, Table_3]

# merge() defaults to an inner join, so only keys present in all three tables survive
out = (reduce(lambda a, b: a.merge(b, on=['unique_ID','patient_ID','Week']), tables)
 .melt(['unique_ID','patient_ID','Week'])
 .assign(**{'Clinical Data': lambda d: 'Week'+d.pop('Week').astype(str)
                                       +' '+d.pop('variable')})
 .pivot(index='Clinical Data', columns='patient_ID', values='value')
 .rename_axis(columns=None).reset_index()
)
```

Output: incomplete. Only a small percentage of the data is collected and reshaped.

```
    Clinical Data               1
0   Week0 VISITID_x             15031
1   Week0 VISITID_y             15031
2   Week0 admin_location        1.0
3   Week0 alc_qty               NaN
4   Week0 alc_result            0.0
5   Week0 alc_test              1.0
6   Week0 dose_received         8.0
7   Week0 medication            2.0
8   Week0 no_reason             NaN
9   Week0 other_reason          NaN
10  Week0 sr_alcohol            0.0
11  Week0 sr_amphetamine        0.0
12  Week0 sr_benzodiazepine     0.0
13  Week0 sr_cannabis           0.0
14  Week0 sr_cocaine            0.0
15  Week0 sr_methadone          0.0
16  Week0 sr_methanphetamine    0.0
17  Week0 sr_opiates            1.0
18  Week0 sr_other              0.0
19  Week0 sr_oxycodone          0.0
20  Week0 sr_propoxyphene       0.0
21  Week0 supervised            0.0
22  Week0 test_amphetamine      0.0
23  Week0 test_benzodiazepine   0.0
24  Week0 test_cannabis         0.0
25  Week0 test_cocaine          0.0
26  Week0 test_methadone        0.0
27  Week0 test_methamphetamine  0.0
28  Week0 test_opiate300        1.0
29  Week0 test_oxycodone        0.0
30  Week0 test_performed        1.0
31  Week0 test_propoxyphene     0.0
```
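One possible reason for the small result is that `merge` defaults to an inner join, so any key combination missing from one table is dropped. The `VISITID_x` / `VISITID_y` rows also suggest the tables share a non-key column. The sketch below is an illustration under assumptions, not a verified fix: `t1`, `t2` and `t3` are invented miniature tables, the merge uses only `patient_ID` and `Week` as keys (assuming `unique_ID` does not line up across tables), duplicates are resolved by keeping the first row, and `pivot_table` is swapped in for `pivot` because it tolerates repeated keys.

```python
from functools import reduce

import pandas as pd

keys = ["patient_ID", "Week"]  # assumed keys; unique_ID is left out on purpose

# Hypothetical miniature tables; column names come from the output above.
t1 = pd.DataFrame({"patient_ID": [1, 1, 2], "Week": [0, 1, 0],
                   "dose_received": [8.0, 10.0, 6.0]})
t2 = pd.DataFrame({"patient_ID": [1, 2], "Week": [0, 0],
                   "test_opiate300": [1.0, 0.0]})
t3 = pd.DataFrame({"patient_ID": [1, 2, 2], "Week": [0, 0, 0],
                   "sr_opiates": [1.0, 0.0, 0.0]})  # patient 2, week 0 repeats

# Assumed policy: keep the first row per key in each table.
tables = [t.drop_duplicates(subset=keys) for t in (t1, t2, t3)]

# how="outer" keeps key combinations that are missing from some tables.
merged = reduce(lambda a, b: a.merge(b, on=keys, how="outer"), tables)

# Long format: one row per (patient, week, variable).
long = merged.melt(id_vars=keys)
long["Clinical Data"] = "Week" + long["Week"].astype(str) + " " + long["variable"]

# pivot_table aggregates repeated keys instead of raising like pivot does.
out = (long.pivot_table(index="Clinical Data", columns="patient_ID",
                        values="value", aggfunc="first")
           .add_prefix("Patient ")
           .rename_axis(columns=None)
           .reset_index())
print(out)
```

On the real tables it may be worth comparing `len(merged)` for `how="inner"` and `how="outer"` to see how many key combinations the inner join discards.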
python pandas machine-learning bigdata reshape