Performing R's inner_join() in Python


I have a pandas DataFrame named Network, with the following structure (shown as a dict):

{'Sup': {0: 1002000157,
  1: 1002000157,
  2: 1002000157,
  3: 1002000157,
  4: 1002000157,
  5: 1002000157,
  6: 1002000157,
  7: 1002000157,
  8: 1002000157,
  9: 1002000157,
  10: 1002000157,
  11: 1002000157,
  12: 1002000157,
  13: 1002000382,
  14: 1002000382,
  15: 1002000382,
  16: 1002000382,
  17: 1002000382,
  18: 1002000382,
  19: 1002000382,
  20: 1002000382,
  21: 1002000382,
  22: 1002000382,
  23: 1002000382,
  24: 1002000382,
  25: 1002000382,
  26: 1002000382,
  27: 1002000382,
  28: 1002000382,
  29: 1002000382},
 'Cust': {0: 1002438313,
  1: 8039296054,
  2: 9003188096,
  3: 14900070991,
  4: 17005234747,
  5: 18006860724,
  6: 28000286091,
  7: 29009623382,
  8: 39000007702,
  9: 39004420023,
  10: 46000088397,
  11: 50000063751,
  12: 7000090017,
  13: 1900120936,
  14: 1900779883,
  15: 2000013994,
  16: 2001222824,
  17: 2003032125,
  18: 2900121723,
  19: 2900197555,
  20: 2902742641,
  21: 3000101113,
  22: 3000195031,
  23: 3000318054,
  24: 3900091301,
  25: 3911084436,
  26: 4900112325,
  27: 5900720933,
  28: 7000001703,
  29: 8000004881}}

I want to reproduce this R command in Python (ideally without crashing the kernel):

NodesSharingSupplier <- inner_join(Network, Network,  by=c('Sup'='Sup'))

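If I understand correctly, this is a SQL-style inner join of the table with itself, so I am worried that it cannot be reproduced simply with an inner merge on Sup in Python.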

Can you help me figure out how to reproduce it in Python?

python r join inner-join
2 Answers
2 votes

IIUC, you are looking for merge:

NodesSharingSupplier = Network.merge(Network, on='Sup', how='inner')
print(NodesSharingSupplier)

# Output
            Sup      Cust_x       Cust_y
0    1002000157  1002438313   1002438313
1    1002000157  1002438313   8039296054
2    1002000157  1002438313   9003188096
3    1002000157  1002438313  14900070991
4    1002000157  1002438313  17005234747
..          ...         ...          ...
453  1002000382  8000004881   3911084436
454  1002000382  8000004881   4900112325
455  1002000382  8000004881   5900720933
456  1002000382  8000004881   7000001703
457  1002000382  8000004881   8000004881

[458 rows x 3 columns]
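
The 458 rows follow from the group sizes: a self-join on Sup yields one row per ordered pair of customers within each supplier, so the row count is the sum of the squared group sizes (13² + 17² = 458 here). Since the question mentions the kernel crashing, it may help to estimate the result size before merging. A minimal sketch, assuming a small made-up frame (`toy` and its values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical toy data: supplier 1 has 3 customers, supplier 2 has 2.
toy = pd.DataFrame({"Sup": [1, 1, 1, 2, 2], "Cust": [10, 11, 12, 20, 21]})

# A supplier with n customers contributes n * n rows to the self-join,
# so the expected size is the sum of squared group sizes.
expected_rows = int((toy.groupby("Sup").size() ** 2).sum())

joined = toy.merge(toy, on="Sup", how="inner")

print(expected_rows, len(joined))
```

If the estimate is far larger than what fits in memory, the merge itself is unlikely to succeed, whichever language performs it.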

You can drop the rows where Cust_x == Cust_y by appending .query('Cust_x != Cust_y') after .merge(...).
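
As a rough sketch of that filter, here it is chained onto the merge on a small hypothetical frame (`toy` is made up for illustration). The second query, which keeps each unordered pair only once, is an optional extra that is not part of the original answer:

```python
import pandas as pd

# Hypothetical toy data: two suppliers with two customers each.
toy = pd.DataFrame({"Sup": [1, 1, 2, 2], "Cust": [10, 11, 20, 21]})

# Self-join on Sup, then drop rows that pair a customer with itself.
pairs = (
    toy.merge(toy, on="Sup", how="inner")
       .query("Cust_x != Cust_y")
       .reset_index(drop=True)
)

# Optional: keep (a, b) but not (b, a), so each pair appears once.
unique_pairs = pairs.query("Cust_x < Cust_y").reset_index(drop=True)

print(pairs)
print(unique_pairs)
```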

Input:

import pandas as pd
data = {'Sup': {0: 1002000157, 1: 1002000157, 2: 1002000157, 3: 1002000157, 4: 1002000157, 5: 1002000157, 6: 1002000157, 7: 1002000157, 8: 1002000157, 9: 1002000157, 10: 1002000157, 11: 1002000157, 12: 1002000157, 13: 1002000382, 14: 1002000382, 15: 1002000382, 16: 1002000382, 17: 1002000382, 18: 1002000382, 19: 1002000382, 20: 1002000382, 21: 1002000382, 22: 1002000382, 23: 1002000382, 24: 1002000382, 25: 1002000382, 26: 1002000382, 27: 1002000382, 28: 1002000382, 29: 1002000382},
        'Cust': {0: 1002438313, 1: 8039296054, 2: 9003188096, 3: 14900070991, 4: 17005234747, 5: 18006860724, 6: 28000286091, 7: 29009623382, 8: 39000007702, 9: 39004420023, 10: 46000088397, 11: 50000063751, 12: 7000090017, 13: 1900120936, 14: 1900779883, 15: 2000013994, 16: 2001222824, 17: 2003032125, 18: 2900121723, 19: 2900197555, 20: 2902742641, 21: 3000101113, 22: 3000195031, 23: 3000318054, 24: 3900091301, 25: 3911084436, 26: 4900112325, 27: 5900720933, 28: 7000001703, 29: 8000004881}}
Network = pd.DataFrame(data)

More information: Pandas Merging 101


0 votes

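Inner join: in R, merge(df1, df2) will work for these examples because R automatically joins the data frames on their common variable names, but you will most likely want to specify merge(df1, df2, by = "CustomerId") to make sure you match only on the intended fields. If the matching variables have different names in the two data frames, you can also use the by.x and by.y parameters.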
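
Since the question asks for Python, the closest pandas counterparts to by.x and by.y are the left_on and right_on arguments of DataFrame.merge. A minimal sketch, assuming two made-up frames (`customers`, `orders`, and the `cust_id`, `Name`, and `Amount` columns are hypothetical; only CustomerId comes from the answer above):

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
customers = pd.DataFrame({"CustomerId": [1, 2, 3], "Name": ["a", "b", "c"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3, 4], "Amount": [5, 7, 9, 11]})

# Roughly R's merge(customers, orders, by.x = "CustomerId", by.y = "cust_id").
joined = customers.merge(
    orders, left_on="CustomerId", right_on="cust_id", how="inner"
)

print(joined)
```

Unlike R's merge, pandas keeps both key columns in the result when their names differ, so one of them may need to be dropped afterwards.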
