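My trouble starts with JSON files that contain information about certain "devices", along with parameters that differ from device to device.
I was able to capture each device's JSON as a single-row DataFrame. Each one will have 40-60 columns, some of which are common to all devices.
Sample data is below.
Reproducible code: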
import pandas as pd

df1 = pd.DataFrame({'id': {0: 1122},
                    'c1': {0: 'uid'},
                    'c2': {0: 'iopw'},
                    'c3': {0: 'uywy'},
                    'c4': {0: '7uyw'},
                    'c5': {0: 'iwoq'},
                    'c6': {0: 'owoe'}})
df2 = pd.DataFrame({'id': {0: 9910},
                    'c1': {0: 'mnjjj'},
                    'c3': {0: 'mhji'},
                    'c6': {0: 'mb '},
                    'c8': {0: 'bly'},
                    'c14': {0: 'bnhg'},
                    'c15': {0: 'kkkl'},
                    'c20': {0: 'llug'},
                    'c25': {0: '87jo'}})
df3 = pd.DataFrame({'id': {0: 2020},
                    'c4': {0: 'kvkh'},
                    'c5': {0: 'kjhjkh'},
                    'c10': {0: 'cvcvc'},
                    'c15': {0: 'ququ'}})
I have already tried merge, but the problem with the code I tried below is that it creates duplicate columns.
from functools import reduce

dfs = [df1, df2, df3]
df_final = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), dfs)
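To make the duplication concrete, here is a small sketch on trimmed-down versions of the frames above (only a few of the columns are kept, purely for illustration). Because `pd.merge` joins on `id` only, any other column name that appears on both sides is kept twice and disambiguated with the default `_x` / `_y` suffixes:

```python
import pandas as pd
from functools import reduce

# Trimmed-down stand-ins for the sample frames (illustrative subset of columns)
df1 = pd.DataFrame({'id': [1122], 'c1': ['uid'], 'c3': ['uywy'], 'c4': ['7uyw']})
df2 = pd.DataFrame({'id': [9910], 'c1': ['mnjjj'], 'c3': ['mhji'], 'c15': ['kkkl']})
df3 = pd.DataFrame({'id': [2020], 'c4': ['kvkh'], 'c15': ['ququ']})

merged = reduce(
    lambda left, right: pd.merge(left, right, on='id', how='outer'),
    [df1, df2, df3],
)

# Shared names such as c1 no longer exist as a single column;
# they show up as suffixed pairs like c1_x / c1_y instead.
print(merged.columns.tolist())
```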
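How can I avoid the duplicates, or is there a cleaner way to merge or combine the tables so that no duplicate columns are created?
The expected output is shown below. It should have 3 rows and the correct number of columns.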
{'id': {0: 1122, 1: 9910, 2: 2020},
'c1': {0: 'uid', 1: 'mnjjj', 2: nan},
'c2': {0: 'iopw', 1: nan, 2: nan},
'c3': {0: 'uywy', 1: 'mhji', 2: nan},
'c4': {0: '7uyw', 1: nan, 2: 'kvkh'},
'c5': {0: 'iwoq', 1: nan, 2: 'kjhjkh'},
'c6': {0: 'owoe', 1: 'mb', 2: nan},
'c7': {0: nan, 1: nan, 2: nan},
'c8': {0: nan, 1: 'bly', 2: nan},
'c9': {0: nan, 1: nan, 2: nan},
'c10': {0: nan, 1: nan, 2: 'cvcvc'},
'c11': {0: nan, 1: nan, 2: nan},
'c12': {0: nan, 1: nan, 2: nan},
'c13': {0: nan, 1: nan, 2: nan},
'c14': {0: nan, 1: 'bnhg', 2: nan},
'c15': {0: nan, 1: 'kkkl', 2: 'ququ'},
'c16': {0: nan, 1: nan, 2: nan},
'c17': {0: nan, 1: nan, 2: nan},
'c18': {0: nan, 1: nan, 2: nan},
'c19': {0: nan, 1: nan, 2: nan},
'c20': {0: nan, 1: 'llug', 2: nan},
'c21': {0: nan, 1: nan, 2: nan},
'c22': {0: nan, 1: nan, 2: nan},
'c23': {0: nan, 1: nan, 2: nan},
'c24': {0: nan, 1: nan, 2: nan},
'c25': {0: nan, 1: '87jo', 2: nan}}
Use concat together with set_index, so that id becomes the index:
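A minimal sketch of the concat approach, assuming the three sample frames from the question. `pd.concat` stacks the rows and aligns columns by name, so a column shared by several frames stays a single column and missing values become NaN. The final column reordering is an optional extra and assumes every non-`id` column is named `c` followed by a number:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1122], 'c1': ['uid'], 'c2': ['iopw'], 'c3': ['uywy'],
                    'c4': ['7uyw'], 'c5': ['iwoq'], 'c6': ['owoe']})
df2 = pd.DataFrame({'id': [9910], 'c1': ['mnjjj'], 'c3': ['mhji'], 'c6': ['mb '],
                    'c8': ['bly'], 'c14': ['bnhg'], 'c15': ['kkkl'],
                    'c20': ['llug'], 'c25': ['87jo']})
df3 = pd.DataFrame({'id': [2020], 'c4': ['kvkh'], 'c5': ['kjhjkh'],
                    'c10': ['cvcvc'], 'c15': ['ququ']})
dfs = [df1, df2, df3]

# Move id into the index, stack the rows, then bring id back as a column.
df_final = pd.concat([d.set_index('id') for d in dfs], sort=False).reset_index()

# Optional: order the columns numerically (c1, c2, ..., c25) instead of
# by first appearance. Assumes the "c<number>" naming used in the sample.
cols = sorted(df_final.columns.drop('id'), key=lambda c: int(c[1:]))
df_final = df_final[['id'] + cols]

print(df_final)
```

Only columns that occur in at least one input frame appear in the result; columns that are absent from every frame (such as `c7` in the expected output) would have to be added separately, for example with `reindex`. If `id` does not need to serve as an index at any point, `pd.concat(dfs, ignore_index=True)` should give the same rows without the `set_index` / `reset_index` round trip.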