I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'cnpj': [410000132, 410000132, 4830624000197, 4830624000197, 4830624000197], 'Nome Pessoa': ['EUGENIO LUPORINI NETO', 'JUAN MATIAS SERAGOPIAN', 'EUGENIO LUPORINI NETO', 'SIMONE FANKHAUSER', 'ALEX SOUZA']})
print(df)
            cnpj             Nome Pessoa
0      410000132   EUGENIO LUPORINI NETO
1      410000132  JUAN MATIAS SERAGOPIAN
2  4830624000197   EUGENIO LUPORINI NETO
3  4830624000197       SIMONE FANKHAUSER
4  4830624000197              ALEX SOUZA
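Each cnpj is a company. Each Nome Pessoa is a person. For every Nome Pessoa, I want to list the other people who share a cnpj with that person (preferably without duplicates). In other words, I want to list the relationships between people using cnpj as the key, so that df looks like this (or at least close to it):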
            cnpj             Nome Pessoa  Relations
0      410000132   EUGENIO LUPORINI NETO  ['JUAN MATIAS SERAGOPIAN','SIMONE FANKHAUSER','ALEX SOUZA']
1      410000132  JUAN MATIAS SERAGOPIAN  ['EUGENIO LUPORINI NETO']
2  4830624000197   EUGENIO LUPORINI NETO  ['JUAN MATIAS SERAGOPIAN','SIMONE FANKHAUSER','ALEX SOUZA']
3  4830624000197       SIMONE FANKHAUSER  ['EUGENIO LUPORINI NETO','ALEX SOUZA']
4  4830624000197              ALEX SOUZA  ['EUGENIO LUPORINI NETO','SIMONE FANKHAUSER']
For example, df['Relations'][0] = ['JUAN MATIAS SERAGOPIAN','SIMONE FANKHAUSER','ALEX SOUZA']
That is because JUAN MATIAS SERAGOPIAN appears under the same cnpj as EUGENIO LUPORINI NETO (410000132), while SIMONE FANKHAUSER and ALEX SOUZA appear with EUGENIO under another cnpj (4830624000197).
I suspect this involves groupby, but I am not sure how to implement it.
The following approach works: use apply with a lookup inside it, and attach the result to the DataFrame as a new column.
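The answer's code is not shown here, so below is only a minimal sketch of the apply idea, not necessarily what was originally posted. It assumes the df from the question; the helper name relations is hypothetical, and boolean masks with isin stand in for whatever query the original answer used.

```python
import pandas as pd

df = pd.DataFrame({
    'cnpj': [410000132, 410000132, 4830624000197, 4830624000197, 4830624000197],
    'Nome Pessoa': ['EUGENIO LUPORINI NETO', 'JUAN MATIAS SERAGOPIAN',
                    'EUGENIO LUPORINI NETO', 'SIMONE FANKHAUSER', 'ALEX SOUZA'],
})

def relations(name):
    # Hypothetical helper: every cnpj this person appears under.
    companies = df.loc[df['Nome Pessoa'] == name, 'cnpj']
    # Rows that share any of those cnpj values, excluding the person themselves.
    mask = df['cnpj'].isin(companies) & (df['Nome Pessoa'] != name)
    # unique() drops duplicates while keeping first-seen order.
    return df.loc[mask, 'Nome Pessoa'].unique().tolist()

# Run the lookup once per row and attach the result as a new column.
df['Relations'] = df['Nome Pessoa'].apply(relations)
print(df)
```

This scans the whole frame once per row, which is fine for small data. On a large frame, a self-merge on cnpj followed by a groupby on the person would probably scale better.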