Transforming a DataFrame with pandas for network analysis

Question

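I have a DataFrame of online game matches with two columns of interest: the ID of the match and the ID of a player who took part in that match. For example: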

match_id	player_id
0	1
0	2
0	3
0	4
0	5
1	6
1	1
1	7
1	8
1	2

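Here, player_id is a unique identifier for a player, and match_id is the ID of a match. Each match_id repeats a fixed number of times (say 5), because 5 is the maximum number of players who can take part in one match. So in each row, a match_id paired with a player_id means that this player took part in that match.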

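As the table above shows, two or more players can play together more than once (or never play together at all). That is why I want to turn this initial DataFrame into an adjacency matrix, where the intersection of a row and a column gives the number of matches the two players shared. An alternative is to build a DataFrame like this: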

player_1	player_2	coplays_number
1	2	2
1	3	1
1	4	1
1	10	0
1	5	1
...	...	...

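So my task is to prepare the data for further analysis of the co-play network with igraph or networkx. I also want a weighted network, that is, the weight of an edge is the number of matches the two nodes (players) played together. Here an edge means that two users have played together, whether they took part in the same match once or played as a team in two or more matches (like players 1 and 2 in the sample data above).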

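My question is: how can I transform my initial DataFrame, using pandas and numpy, into network data that igraph or networkx functions accept as an argument? Or do I not need any data manipulation at all, because igraph or networkx functions can work with the initial DataFrame directly?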

Thanks in advance for your answers and suggestions!

Answer 1

If I understand correctly, you can build the adjacency matrix with permutations from itertools and pd.crosstab:

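from itertools import permutations

import pandas as pd

# Every ordered pair of players within each match, one pair per row
pairs = (df.groupby('match_id')['player_id']
           .apply(lambda x: list(permutations(x, r=2)))
           .explode())
# Cross-tabulate first vs. second player to count shared matches
adj = pd.crosstab(pairs.str[0], pairs.str[1],
                  rownames=['Player 1'], colnames=['Player 2'])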

Output:

>>> adj
Player 2  1  2  3  4  5  6  7  8
Player 1                        
1         0  2  1  1  1  1  1  1
2         2  0  1  1  1  1  1  1
3         1  1  0  1  1  0  0  0
4         1  1  1  0  1  0  0  0
5         1  1  1  1  0  0  0  0
6         1  1  0  0  0  0  1  1
7         1  1  0  0  0  1  0  1
8         1  1  0  0  0  1  1  0
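The same matrix can also be reached with plain matrix algebra, which may be convenient when numpy is preferred. The sketch below is an alternative to the answer's approach, not part of it: it builds a match-by-player incidence matrix, multiplies its transpose by itself, and hands the result to networkx. It assumes each player appears at most once per match; the names incidence, counts, values and G are illustrative.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'match_id':  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'player_id': [1, 2, 3, 4, 5, 6, 1, 7, 8, 2],
})

# Incidence matrix: one row per match, one column per player (1 = took part)
incidence = pd.crosstab(df['match_id'], df['player_id'])

# (players x matches) times (matches x players) gives shared-match counts
counts = incidence.T.dot(incidence)

# The diagonal holds each player's own match count; zero it to avoid self-loops
values = counts.to_numpy().copy()
np.fill_diagonal(values, 0)
adj = pd.DataFrame(values, index=counts.index, columns=counts.columns)

# Non-zero cells become edges; the cell value is stored as the 'weight' attribute
G = nx.from_pandas_adjacency(adj)
print(G[1][2]['weight'])
```

Players 1 and 2 shared both matches, so their edge carries weight 2, in line with the matrix shown above.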

If you want a flat list (not an adjacency matrix), use combinations:

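from itertools import combinations

import pandas as pd

# Unordered pairs per match, counted across all matches.
# Note: each pair keeps the row order of its match, so (a, b) and (b, a)
# are counted separately unless player_id is sorted within each match.
pairs = (df.groupby('match_id')['player_id']
           .apply(lambda x: frozenset(combinations(x, r=2)))
           .explode().value_counts())

# Split the pair tuples in the index into two columns
coplays = pd.DataFrame({'Player 1': pairs.index.str[0],
                        'Player 2': pairs.index.str[1],
                        'coplays_number': pairs.tolist()})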

Output:

>>> coplays
    Player 1  Player 2  coplays_number
0          1         2               2
1          2         4               1
2          6         2               1
3          8         2               1
4          7         2               1
5          1         7               1
6          6         7               1
7          1         8               1
8          6         8               1
9          6         1               1
10         3         5               1
11         1         3               1
12         2         5               1
13         4         5               1
14         2         3               1
15         1         4               1
16         1         5               1
17         3         4               1
18         7         8               1
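In the list above, some pairs appear as (6, 2) rather than (2, 6), because each pair follows the row order inside its match. If the same two players could show up in a different order in another match, they would be counted as two different pairs. A minimal sketch of one way around this, assuming that sorting the IDs within each match is acceptable; the column names player_1, player_2 and coplays_number follow the question, and rows is an illustrative name:

```python
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'match_id':  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'player_id': [1, 2, 3, 4, 5, 6, 1, 7, 8, 2],
})

# Sort (and de-duplicate) the players of each match so that every pair
# is always written with the smaller ID first
rows = [
    pair
    for _, players in df.groupby('match_id')['player_id']
    for pair in combinations(sorted(set(players)), 2)
]

# Count how many matches each unordered pair shares
coplays = (pd.DataFrame(rows, columns=['player_1', 'player_2'])
             .groupby(['player_1', 'player_2'])
             .size()
             .reset_index(name='coplays_number'))
print(coplays)
```

Only pairs that actually played together are listed; a pair with zero shared matches, like the (1, 10) row in the question's sketch, would have to be added separately.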

Answer 2

You can merge your initial df with itself on match_id, then group by the two resulting player columns and call size() to get a weighted edge DataFrame.

df.merge(df, how='inner', on='match_id', suffixes=('1', '2'))\
.groupby(['player_id1', 'player_id2'], as_index=False).size()

You will also get rows where player_id1 == player_id2: these hold the total number of matches each player took part in.

Example

import pandas as pd
import networkx as nx

a, b, c = 'a', 'b', 'c'

df = pd.DataFrame(
{
    'match_id':  [0, 0, 0, 1, 1, 2],
    'player_id': [a, b, c, a, b, c],
})
print(df)

   match_id player_id
0         0         a
1         0         b
2         0         c
3         1         a
4         1         b
5         2         c

edges = df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\
.groupby(['player_id1', 'player_id2'], as_index=False).size()
print(edges)

  player_id1 player_id2  size
0          a          a     2
1          a          b     2
2          a          c     1
3          b          a     2
4          b          b     2
5          b          c     1
6          c          a     1
7          c          b     1
8          c          c     2

graph = nx.from_pandas_edgelist(edges, source='player_id1', target='player_id2',
edge_attr='size', create_using=nx.Graph)

pos = nx.spring_layout(graph)
nx.draw_networkx(graph, pos, with_labels=True)
nx.draw_networkx_edge_labels(graph, pos, edge_labels=nx.get_edge_attributes(graph,'size'))

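This gives an undirected graph whose edge labels are the co-play counts.

With create_using=nx.DiGraph you get a directed graph instead.

NetworkX does not draw them here, but the self-loops are weighted too: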

>>> graph['a']['a']
{'size': 2}
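If the self-loops and the mirrored rows (a, b) / (b, a) are not wanted, the merged pairs can be filtered before grouping. This is a sketch of one possible variation on the example above, assuming the player IDs are comparable with <; the attribute name weight is a choice made here, not something the answer prescribes.

```python
import networkx as nx
import pandas as pd

# Same toy data as in the example above
df = pd.DataFrame({
    'match_id':  [0, 0, 0, 1, 1, 2],
    'player_id': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# Self-merge on match_id: every ordered pair of players per match
pairs = df.merge(df, on='match_id', suffixes=('1', '2'))

# Keep each unordered pair once; this also drops player_id1 == player_id2 rows
pairs = pairs[pairs['player_id1'] < pairs['player_id2']]

# Number of shared matches per pair, stored in a 'weight' column
edges = (pairs.groupby(['player_id1', 'player_id2'])
              .size()
              .reset_index(name='weight'))

graph = nx.from_pandas_edgelist(edges, source='player_id1', target='player_id2',
                                edge_attr='weight')
print(edges)
```

Using weight as the attribute name means networkx functions that take a weight argument, such as graph.degree(weight='weight'), can use it directly.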
