I am trying to merge two Excel files in pandas.
import pandas as pd
import numpy as np
upload_raw = pd.read_excel(r'C:\Users\Desktop\Upload Raw Data.xlsx',
sheet_name = 'Upload',
header = 0,
index_col = 0,
)
mapping = pd.read_excel(r'C:\Users\Desktop\Mapping.xlsx',
sheet_name = 'Mapping',
header = 0,
index_col = 0,
)
display(upload_raw)
display(mapping)
upload_lookup=upload_raw.merge(mapping,on ='BRANCH',how = 'outer' )
display(upload_lookup)
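I keep getting KeyError: 'BRANCH'.
I checked that the column values are all text. The Mapping file has 3 columns, while the upload file has about 4 columns.
Upload raw data: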
BRANCH DEPT CREAT_TS RAF_IND
AA &CR 2018-06-22-06.48.49.601000
03 CUE 2018-06-22-11.43.29.859000
90 T0L 2018-06-22-11.54.52.633000
Mapping data:
BRANCH UNIT MASTER
03 MAS CoE
04 NAS ET
05 ET ET
These lines stand out in the error message:
# validate the merge keys dtypes. We may need to coerce
# work-around for merge_asof(right_index=True)
# duplicate columns & possible reduce dimensionality
How can I avoid this problem?
I even tried left_on = 'True', right_on = 'True'
left_key = 'lkey', right_key = 'rkey'
but I get the error 'rkey not found'.
Regards, Ren.
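The main difference seems to be that I did not set 'BRANCH' as the index (no index_col = 0), so it remains a regular column that merge can look up.
In addition, 'BRANCH' in mapping was imported as int64 because the sample contains only digits, whereas 'BRANCH' in upload_raw was imported as object.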
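The first point can be checked without any Excel files. The sketch below is only an approximation: it assumes that set_index('BRANCH') on an in-memory frame behaves like read_excel(..., index_col = 0) when 'BRANCH' is the first column, and the sample values are copied from the question.

```python
import pandas as pd

# In-memory stand-in for the Mapping sheet (values taken from the question).
mapping = pd.DataFrame({
    "BRANCH": ["03", "04", "05"],
    "UNIT": ["MAS", "NAS", "ET"],
    "MASTER": ["CoE", "ET", "ET"],
})

# Assumption: this mimics read_excel(..., index_col=0), which uses the
# first column as the index instead of keeping it as a regular column.
indexed = mapping.set_index("BRANCH")
print("BRANCH" in indexed.columns)   # BRANCH is no longer among the columns
print(indexed.index.name)            # it is now the name of the index

# reset_index() turns the index back into an ordinary column.
restored = indexed.reset_index()
print(list(restored.columns))
```

Depending on the pandas version, merge may also accept an index level name in on=, so keeping 'BRANCH' as a regular column is the option that does not rely on that behaviour.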
upload_raw = pd.read_excel('data/2018-09-03_data_mapping.xlsx',
sheet_name = 'Upload',
header = 0)
mapping = pd.read_excel(r'data/2018-09-03_data_mapping.xlsx',
sheet_name = 'Mapping',
header = 0)
print(upload_raw)
output:
BRANCH DEPT CREAT_TS RAF_IND
0 AA &CR 2018-06-22-06.48.49.601000 NaN
1 3 CUE 2018-06-22-11.43.29.859000 NaN
2 90 T0L 2018-06-22-11.54.52.633000 NaN
mapping['BRANCH'] = mapping['BRANCH'].astype('object')
print(mapping)
output:
BRANCH UNIT MASTER
0 3 MAS CoE
1 4 NAS ET
2 5 ET ET
upload_lookup=pd.merge(left=upload_raw, right=mapping, on='BRANCH')
print(upload_lookup)
output:
BRANCH DEPT CREAT_TS RAF_IND UNIT MASTER
0 3 CUE 2018-06-22-11.43.29.859000 NaN MAS CoE
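If the branch codes are meant to be text such as '03', another option is to bring both key columns to the same string form before merging. This is a minimal sketch under assumptions that are not in the question: the frames are built in memory instead of being read from Excel, and zero-padding to two characters is assumed to be the right normalisation for these codes.

```python
import pandas as pd

# Stand-in for the Upload sheet: BRANCH holds text codes.
upload_raw = pd.DataFrame({
    "BRANCH": ["AA", "03", "90"],
    "DEPT": ["&CR", "CUE", "T0L"],
})

# Stand-in for the Mapping sheet: BRANCH was parsed as integers.
mapping = pd.DataFrame({
    "BRANCH": [3, 4, 5],
    "UNIT": ["MAS", "NAS", "ET"],
    "MASTER": ["CoE", "ET", "ET"],
})

# Assumption: codes are two characters wide, so 3 becomes '03'.
mapping["BRANCH"] = mapping["BRANCH"].astype(str).str.zfill(2)
upload_raw["BRANCH"] = upload_raw["BRANCH"].astype(str)

# how='left' keeps every upload row; rows without a match get NaN.
upload_lookup = upload_raw.merge(mapping, on="BRANCH", how="left")
print(upload_lookup)
```

With how = 'left' the unmatched rows ('AA' and '90') stay in the result with missing UNIT and MASTER, which may be closer to a lookup than the inner join used above.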