Cross join (Cartesian product) of a list with a DataFrame

Question (votes: 2, answers: 1)

I have a list and a DataFrame.

import pandas as pd
work_station = ['A','B','C']
name = ['Mike','Tom','Scott','Tracy']
salary = ['60000','50000','100000','90000']
df = pd.DataFrame({'name':name,'salary':salary})

I want to cross join work_station with df, so that the output looks like this:

station     Name    salary
  A         Mike    60000
  A         Tom     50000
  A         Scott   100000
  A         Tracy   90000
  B         Mike    60000
  B         Tom     50000
  B         Scott   100000
  B         Tracy   90000
  C         Mike    60000
  C         Tom     50000
  C         Scott   100000
  C         Tracy   90000

I tried using the * operator:

df1 = work_station * salary 

But it does not work:

TypeError: can't multiply sequence by non-int of type 'list'

Any suggestions? Thanks!

python pandas list cross-join
1 Answer
2 votes

The simplest way is to use pd.concat with the keys argument:

(pd.concat([df] * len(work_station), keys=work_station)
   .reset_index(level=1, drop=True)
   .rename_axis('station')
   .reset_index()
)

   station   name  salary
0        A   Mike   60000
1        A    Tom   50000
2        A  Scott  100000
3        A  Tracy   90000
4        B   Mike   60000
5        B    Tom   50000
6        B  Scott  100000
7        B  Tracy   90000
8        C   Mike   60000
9        C    Tom   50000
10       C  Scott  100000
11       C  Tracy   90000
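
To see what each link of that chain contributes, it can be unpacked into separate steps. This is only a sketch: the intermediate names (stacked, by_station, result) are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

work_station = ['A', 'B', 'C']
df = pd.DataFrame({'name': ['Mike', 'Tom', 'Scott', 'Tracy'],
                   'salary': ['60000', '50000', '100000', '90000']})

# Step 1: stack one copy of df per station. keys= labels each copy,
# giving a two-level index: (station label, original row number).
stacked = pd.concat([df] * len(work_station), keys=work_station)

# Step 2: drop the inner index level (the original row numbers 0..3).
by_station = stacked.reset_index(level=1, drop=True)

# Step 3: name the remaining index 'station' and move it into a column.
result = by_station.rename_axis('station').reset_index()

print(result)
```

Because every copy of df is materialised, the result has len(work_station) * len(df) rows, which is exactly the size of the Cartesian product.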

You can also take the merge route and build the Cartesian product by joining on a dummy key column:

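(pd.DataFrame(work_station, columns=['station'])
  .assign(foo=1)            # dummy key shared by every row
  .merge(df.assign(foo=1))  # joining on 'foo' pairs each station with each row of df
  .drop(columns='foo')      # positional axis, .drop('foo', 1), was removed in pandas 2.0
)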

   station   name  salary
0        A   Mike   60000
1        A    Tom   50000
2        A  Scott  100000
3        A  Tracy   90000
4        B   Mike   60000
5        B    Tom   50000
6        B  Scott  100000
7        B  Tracy   90000
8        C   Mike   60000
9        C    Tom   50000
10       C  Scott  100000
11       C  Tracy   90000
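
If you are on pandas 1.2 or later, merge also accepts how='cross', which should make the dummy key unnecessary. A minimal sketch, assuming that version; the names stations and result are illustrative only:

```python
import pandas as pd

work_station = ['A', 'B', 'C']
df = pd.DataFrame({'name': ['Mike', 'Tom', 'Scott', 'Tracy'],
                   'salary': ['60000', '50000', '100000', '90000']})

# Wrap the list in a one-column DataFrame so it can take part in a merge.
stations = pd.DataFrame({'station': work_station})

# how='cross' (pandas >= 1.2) pairs every row of stations with every row of df.
result = stations.merge(df, how='cross')

print(result)
```

This avoids adding and then dropping a helper column, at the cost of requiring a newer pandas release than the dummy-key version.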