在Python中绘制有向图吗?

问题描述 投票:1回答:2

我正在尝试制作有向图或Sankey图(任何可行的图)以进行客户状态迁移。数据如下所示,计数表示从当前状态迁移到下一个状态的用户数量。

**current_state         next_state          count**
New Profile              Initiated           37715
Profile Initiated          End               36411
JobRecommended             End                6202
New                        End                6171
ProfileCreated             JobRecommended     5799
Profile Initiated          ProfileCreated     4360
New                        NotOpted           3751
NotOpted                   Profile Initiated  2817
JobRecommended             InterestedInJob    2542
IntentDetected             ProfileCreated     2334
ProfileCreated             IntentDetected     1839
InterestedInJob            Applied            1671
JobRecommended             NotInterestedInJob 1477
NotInterestedInJob         ProfileCreated     1408
IntentDetected             End                1325
NotOpted                   End                1009
InterestedInJob            ProfileCreated     975
Applied                    IntentDetected     912
NotInterestedInJob         IntentDetected     720
Applied                    ProfileCreated     701
InterestedInJob            End                673

我已经编写了构建sankey的代码,但是该图不容易阅读。寻找可读的有向图。这是我的代码:

    df = pd.read_csv('input.csv')

    x = list(set(df.current_state.values) | set(df.next_state))
    di = dict()

    count = 0
    for i in x:
        di[i] = count
        count += 1

    #
    df['source'] = df['current_state'].apply(lambda y : di[y])
    df['target'] = df['next_state'].apply(lambda y : di[y])


    #
    fig = go.Figure(data=[go.Sankey(
        node = dict(
          pad = 15,
          thickness = 20,
          line = dict(color = "black", width = 0.5),
          label = x,
          color = "blue"
        ),
        link = dict(
          source = df.source, 
          target = df.target,
          value = df['count']
      ))])


    #
    fig.update_layout(title_text="Sankey Diagram", font_size=10, autosize=False,
        width=1000,
        height=1000,
        margin=go.layout.Margin(
            l=50,
            r=50,
            b=100,
            t=100,
            pad=4
        ))
    fig.show()
python plotly directed-graph
2个回答
2
投票

对于有向图,graphviz将是我选择的工具,而不是Python。

以下脚本txt2dot.py将您的数据转换为graphviz的输入文件:

text = '''New Profile              Initiated           37715
Profile Initiated          End               36411
JobRecommended             End                6202
New                        End                6171
ProfileCreated             JobRecommended     5799
Profile Initiated          ProfileCreated     4360
New                        NotOpted           3751
NotOpted                   Profile Initiated  2817
JobRecommended             InterestedInJob    2542
IntentDetected             ProfileCreated     2334
ProfileCreated             IntentDetected     1839
InterestedInJob            Applied            1671
JobRecommended             NotInterestedInJob 1477
NotInterestedInJob         ProfileCreated     1408
IntentDetected             End                1325
NotOpted                   End                1009
InterestedInJob            ProfileCreated     975
Applied                    IntentDetected     912
NotInterestedInJob         IntentDetected     720
Applied                    ProfileCreated     701
InterestedInJob            End                673'''

# Remove ambiguity and make suitable for graphviz.
text = text.replace('New Profile', 'NewProfile')
text = text.replace('New ', 'NewProfile ')
text = text.replace('Profile Initiated', 'ProfileInitiated')
text = text.replace(' Initiated', ' ProfileInitiated')

# Create edges and nodes for graphviz.
edges = [ln.split() for ln in text.splitlines()]
edges = sorted(edges, key=lambda x: -1*int(x[2]))
nodes = sorted(list(set(i[0] for i in edges) | set(i[1] for i in edges)))

print('digraph foo {')
for n in nodes:
    print(f'    {n};')
print()
for item in edges:
    print('    ', item[0],  ' -> ', item[1],  ' [label="', item[2], '"];', sep='')
print('}')

正在运行python3 txt2dot.py > foo.dot将导致:

digraph foo {
    Applied;
    End;
    IntentDetected;
    InterestedInJob;
    JobRecommended;
    NewProfile;
    NotInterestedInJob;
    NotOpted;
    ProfileCreated;
    ProfileInitiated;

    NewProfile -> ProfileInitiated [label="37715"];
    ProfileInitiated -> End [label="36411"];
    JobRecommended -> End [label="6202"];
    NewProfile -> End [label="6171"];
    ProfileCreated -> JobRecommended [label="5799"];
    ProfileInitiated -> ProfileCreated [label="4360"];
    NewProfile -> NotOpted [label="3751"];
    NotOpted -> ProfileInitiated [label="2817"];
    JobRecommended -> InterestedInJob [label="2542"];
    IntentDetected -> ProfileCreated [label="2334"];
    ProfileCreated -> IntentDetected [label="1839"];
    InterestedInJob -> Applied [label="1671"];
    JobRecommended -> NotInterestedInJob [label="1477"];
    NotInterestedInJob -> ProfileCreated [label="1408"];
    IntentDetected -> End [label="1325"];
    NotOpted -> End [label="1009"];
    InterestedInJob -> ProfileCreated [label="975"];
    Applied -> IntentDetected [label="912"];
    NotInterestedInJob -> IntentDetected [label="720"];
    Applied -> ProfileCreated [label="701"];
    InterestedInJob -> End [label="673"];
}

正在运行dot -o foo.png -Tpng foo.dot给出:

graphviz image


0
投票

这将创建一个基本的Sankey图,假定您:

  1. 将数据保存在名为state_migration.csv的文件中
  2. 用破折号/下划线/什么都没有替换标签(状态名称)中的空格
  3. 用逗号替换列之间的空格
  4. 已经安装了numpy和matplotlib

2和3可以很容易地使用任何非史前文本编辑器,甚至是Python本身,如果其中包含大量数据。我强烈建议您避免使用未加引号的空格。

import plotly.graph_objects as go
import numpy as np
import matplotlib

if __name__ == '__main__':

  with open('state_migration.csv', 'r') as finput:
    info = [[ _ for _ in _.strip().lower().split(',') ]
                for _ in finput.readlines()[1:]]
  info_t = [*map(list,zip(*info))] # info transposed

  # this exists to map the data to plotly's node indexing format
  index = {n: i for i, n in enumerate(set(info_t[0]+info_t[1]))}

  fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "black", width = 0.5),
      label = list(index.keys()),
      color = np.random.choice( list(matplotlib.colors.cnames.values()),
                                size=len(index.keys()), replace=False )
    ),
    link = dict(
      source = [index[_] for _ in info_t[0]],
      target = [index[_] for _ in info_t[1]],
      value = info_t[2]
  ))])

fig.update_layout(title_text="State Migration", font_size=12)
fig.show()

您可以拖动节点。如果要预定义它们的位置或检查其他参数,请参见this

我使用的数据是输入的纯净版本:

currentstate,next_state,count
new,initiated,37715
profileinitiated,end,36411
jobrecommended,end,6202
new,end,6171
profilecreated,jobrecommended,5799
profileinitiated,profilecreated,4360
new,notopted,3751
notopted,profileinitiated,2817
jobrecommended,interestedinjob,2542
intentdetected,profilecreated,2334
profilecreated,intentdetected,1839
interestedinjob,applied,1671
jobrecommended,notinterestedinjob,1477
notinterestedinjob,profilecreated,1408
intentdetected,end,1325
notopted,end,1009
interestedinjob,profilecreated,975
applied,intentdetected,912
notinterestedinjob,intentdetected,720
applied,profilecreated,701
interestedinjob,end,673

我将“ New Profile”更改为现有状态“ New”,因为该图否则很奇怪。随时根据需要进行调整。

我使用的库绝对不需要您想要的,我只是对它们更加熟悉。对于有向图,罗兰·史密斯(Roland Smith)为您服务。也可以使用Plotly完成,请参见其gallery

  • Plotly的替代品,按优先顺序:matplotlib,seaborne,ggplot,raw dot / graphviz
  • matplotlib仅在此处用于提供具有预定义的十六进制颜色的列表
  • numpy仅用于从列表中选择一个随机值而不进行替换(在这种情况下为颜色)

在Python 3.8.1上测试]

© www.soinside.com 2019 - 2024. All rights reserved.