I have 2 DataFrames, as shown below:
df1 =
City Date Data1
LA 2020-01-01 20
LA 2020-01-02 30
NY 2020-01-01 50
df2 =
City Date Data2
LA 2020-01-01 2.5
LA 2020-01-02 1
LA 2020-01-03 7
NY 2020-01-01 6.5
I want to merge or concatenate these two on 'City' and 'Date', so that the result is:
City Date Data1 Data2
LA 2020-01-01 20 2.5
LA 2020-01-02 30 1
NY 2020-01-01 50 6.5
Things I have tried:
pd.concat([df1.set_index(['Country','Date'],[df1.set_index(['Country','Date'])], axis = 1)
I get the error: ValueError: cannot handle a non-unique multi-index!
Since I have the dates as the index, I am not able to merge.
The idea is to deduplicate the (City, Date) pairs with a new helper column created by GroupBy.cumcount:
print (df2)
City Date Data2
0 LA 2020-01-01 2.5
1 LA 2020-01-02 1.0 <- duplicates
2 LA 2020-01-02 7.0 <- duplicates
3 NY 2020-01-01 6.5
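To make the duplicated pairs marked above easier to spot, here is a minimal sketch that rebuilds df2 as printed in this answer (the DataFrame construction is an assumption, since the post only shows the printed frame). It flags the repeated (City, Date) pairs, checks that an index built from them is not unique, and shows the counter that GroupBy.cumcount assigns to each row.

```python
import pandas as pd

# df2 rebuilt from the printout above; (LA, 2020-01-02) appears twice.
df2 = pd.DataFrame({
    "City": ["LA", "LA", "LA", "NY"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-01"],
    "Data2": [2.5, 1.0, 7.0, 6.5],
})

# Flag every row whose (City, Date) pair occurs more than once.
dup_mask = df2.duplicated(subset=["City", "Date"], keep=False)
print(df2[dup_mask])

# An index built from these keys is not unique.
is_unique = df2.set_index(["City", "Date"]).index.is_unique
print(is_unique)

# cumcount numbers the rows within each (City, Date) group starting at 0,
# so (City, Date, g) becomes a unique key.
g = df2.groupby(["City", "Date"]).cumcount()
print(g.tolist())
```

Adding g as a third index level is what lets the two frames be aligned row by row.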
df1 = (df1.assign(g = df1.groupby(['City','Date']).cumcount())
.set_index(['City','Date','g']))
df2 = (df2.assign(g = df2.groupby(['City','Date']).cumcount())
.set_index(['City','Date','g']))
df = pd.concat([df1, df2], axis = 1)
print (df)
Data1 Data2
City Date g
LA 2020-01-01 0 20.0 2.5
2020-01-02 0 30.0 1.0
1 NaN 7.0
NY 2020-01-01 0 50.0 6.5
If you need to remove the helper level g:
df = pd.concat([df1, df2], axis = 1).reset_index(level=2, drop=True)
print (df)
Data1 Data2
City Date
LA 2020-01-01 20.0 2.5
2020-01-02 30.0 1.0
2020-01-02 NaN 7.0
NY 2020-01-01 50.0 6.5
EDIT: I think it is necessary here to convert both Date columns to datetimes and then use an inner join with DataFrame.merge:
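The code for this step is not included in the post, so the following is only a sketch of what an inner join with DataFrame.merge could look like. It assumes the sample data from the question; the DataFrame construction, the pd.to_datetime conversion of the Date columns, and the name `merged` are illustrative choices, not taken from the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the question (values copied from the post).
df1 = pd.DataFrame({
    "City": ["LA", "LA", "NY"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-01"],
    "Data1": [20, 30, 50],
})
df2 = pd.DataFrame({
    "City": ["LA", "LA", "LA", "NY"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01"],
    "Data2": [2.5, 1.0, 7.0, 6.5],
})

# Give both key columns the same dtype before joining.
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Date"] = pd.to_datetime(df2["Date"])

# An inner join keeps only the (City, Date) pairs present in both frames,
# so the extra LA 2020-01-03 row from df2 is dropped.
merged = df1.merge(df2, on=["City", "Date"], how="inner")
print(merged)
```

With this data the result has the three rows asked for in the question. Unlike the concat approach, no index has to be set, so a non-unique index is not an obstacle; if a key pair were duplicated, merge would return every matching combination for it.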