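I am trying to build a directed graph or a Sankey diagram (anything that works) for customer state migration. The data looks like the following; count is the number of users who moved from the current state to the next state.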
**current_state next_state count**
New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673
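I have written code that builds a Sankey diagram, but the result is not easy to read. I am looking for a readable directed graph. Here is my code: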
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv('input.csv')
# Map every distinct state name to an integer node index.
x = list(set(df.current_state.values) | set(df.next_state))
di = dict()
count = 0
for i in x:
    di[i] = count
    count += 1

df['source'] = df['current_state'].apply(lambda y : di[y])
df['target'] = df['next_state'].apply(lambda y : di[y])

fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 15,
        thickness = 20,
        line = dict(color = "black", width = 0.5),
        label = x,
        color = "blue"
    ),
    link = dict(
        source = df.source,
        target = df.target,
        value = df['count']
    ))])

fig.update_layout(title_text="Sankey Diagram", font_size=10, autosize=False,
                  width=1000,
                  height=1000,
                  margin=go.layout.Margin(
                      l=50,
                      r=50,
                      b=100,
                      t=100,
                      pad=4
                  ))
fig.show()
For a directed graph, graphviz would be my tool of choice rather than Python.
The following script, txt2dot.py, converts your data into an input file for graphviz:
text = '''New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673'''
# Remove ambiguity and make suitable for graphviz: node names must not
# contain spaces. Join 'Profile Initiated' first so that the 'New' at the
# start of the first row is still followed by a space when it is renamed.
text = text.replace('Profile Initiated', 'ProfileInitiated')
text = text.replace('New ', 'NewProfile ')
# Create edges and nodes for graphviz.
edges = [ln.split() for ln in text.splitlines()]
edges = sorted(edges, key=lambda x: -1*int(x[2]))
nodes = sorted(list(set(i[0] for i in edges) | set(i[1] for i in edges)))
print('digraph foo {')
for n in nodes:
    print(f'    {n};')
print()
for item in edges:
    print('    ', item[0], ' -> ', item[1], ' [label="', item[2], '"];', sep='')
print('}')
Running python3 txt2dot.py > foo.dot results in:
digraph foo {
    Applied;
    End;
    IntentDetected;
    InterestedInJob;
    JobRecommended;
    NewProfile;
    NotInterestedInJob;
    NotOpted;
    ProfileCreated;
    ProfileInitiated;

    NewProfile -> ProfileInitiated [label="37715"];
    ProfileInitiated -> End [label="36411"];
    JobRecommended -> End [label="6202"];
    NewProfile -> End [label="6171"];
    ProfileCreated -> JobRecommended [label="5799"];
    ProfileInitiated -> ProfileCreated [label="4360"];
    NewProfile -> NotOpted [label="3751"];
    NotOpted -> ProfileInitiated [label="2817"];
    JobRecommended -> InterestedInJob [label="2542"];
    IntentDetected -> ProfileCreated [label="2334"];
    ProfileCreated -> IntentDetected [label="1839"];
    InterestedInJob -> Applied [label="1671"];
    JobRecommended -> NotInterestedInJob [label="1477"];
    NotInterestedInJob -> ProfileCreated [label="1408"];
    IntentDetected -> End [label="1325"];
    NotOpted -> End [label="1009"];
    InterestedInJob -> ProfileCreated [label="975"];
    Applied -> IntentDetected [label="912"];
    NotInterestedInJob -> IntentDetected [label="720"];
    Applied -> ProfileCreated [label="701"];
    InterestedInJob -> End [label="673"];
}
Running dot -o foo.png -Tpng foo.dot renders the graph to foo.png.
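If the plain graph is still hard to read, one option is to let the edge thickness reflect the count. The following is only a sketch of that idea, not part of the script above: the function name edges_to_dot and the max_width parameter are made up for illustration, and the logarithmic scaling is an arbitrary choice. It relies on the graphviz attributes penwidth and rankdir.

```python
import math


def edges_to_dot(edges, name="foo", max_width=8.0):
    """Build DOT text in which each edge's penwidth grows with log(count).

    edges is a list of (source, target, count) tuples with count >= 1.
    """
    biggest = max(count for _, _, count in edges)
    # Avoid dividing by zero when the largest count is 1 (log(1) == 0).
    scale = math.log(biggest) or 1.0
    lines = [f"digraph {name} {{", "    rankdir=LR;"]
    for src, dst, count in sorted(edges, key=lambda e: -e[2]):
        # Log scaling keeps small flows visible next to very large ones.
        width = 1.0 + (max_width - 1.0) * math.log(count) / scale
        lines.append(
            f'    {src} -> {dst} [label="{count}", penwidth={width:.2f}];'
        )
    lines.append("}")
    return "\n".join(lines)
```

Printing the result of edges_to_dot for your list of edges and passing it to dot as before should give a left-to-right layout in which the dominant transitions stand out.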
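This creates a basic Sankey diagram, assuming you:
2 and 3 are easy to do with any non-prehistoric text editor, or even with Python itself if you have a lot of data. I strongly recommend avoiding unquoted whitespace.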
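As a rough sketch of doing that clean-up in Python itself: the helper below turns whitespace-separated rows into CSV text. The name raw_to_csv and the MULTI_WORD_STATES list are assumptions made for illustration, and the sketch assumes "Profile Initiated" is the only state name containing a space, so check that against your real data.

```python
import csv
import io

# Assumed list of state names that contain spaces; extend it for your data.
MULTI_WORD_STATES = ["Profile Initiated"]


def raw_to_csv(raw_text):
    """Convert 'current next count' rows into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["current_state", "next_state", "count"])
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Join multi-word names so that splitting on whitespace is safe.
        for name in MULTI_WORD_STATES:
            line = line.replace(name, name.replace(" ", ""))
        current, nxt, count = line.split()
        writer.writerow([current.lower(), nxt.lower(), int(count)])
    return out.getvalue()
```

Writing the returned text to state_migration.csv gives a file with no spaces inside any field.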
import plotly.graph_objects as go
import numpy as np
import matplotlib.colors

if __name__ == '__main__':
    # Read the CSV, skipping the header row; each row is [current, next, count].
    with open('state_migration.csv', 'r') as finput:
        info = [[field for field in line.strip().lower().split(',')]
                for line in finput.readlines()[1:]]
    info_t = [*map(list, zip(*info))]  # info transposed
    # this exists to map the data to plotly's node indexing format
    index = {n: i for i, n in enumerate(set(info_t[0] + info_t[1]))}
    fig = go.Figure(data=[go.Sankey(
        node = dict(
            pad = 15,
            thickness = 20,
            line = dict(color = "black", width = 0.5),
            label = list(index.keys()),
            # One random, distinct named colour per node.
            color = np.random.choice(list(matplotlib.colors.cnames.values()),
                                     size=len(index.keys()), replace=False)
        ),
        link = dict(
            source = [index[s] for s in info_t[0]],
            target = [index[t] for t in info_t[1]],
            # Counts are read as strings; convert them to numbers.
            value = [int(v) for v in info_t[2]]
        ))])
    fig.update_layout(title_text="State Migration", font_size=12)
    fig.show()
You can drag the nodes around. If you want to predefine their positions or look at the other parameters, see this.
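One possible way to come up with predefined positions, offered only as a sketch: place each state in a column according to its breadth-first distance from the starting state, and spread the states of a column evenly from top to bottom. The function name layered_positions is made up for illustration. Plotly's Sankey node accepts normalized x and y lists, so the resulting coordinates could be passed there in the same order as the labels; whether this layout looks better for your data is something to try out.

```python
from collections import deque


def layered_positions(edges, start):
    """Return {node: (x, y)} with x taken from BFS depth, y spread per column.

    edges is a list of (source, target) pairs. Nodes that cannot be reached
    from start are left out of the result.
    """
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, []).append(dst)
        neighbours.setdefault(dst, [])
    # Breadth-first search: depth is the fewest transitions from start.
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    max_depth = max(depth.values()) or 1
    columns = {}
    for node, d in depth.items():
        columns.setdefault(d, []).append(node)
    positions = {}
    for d, members in columns.items():
        # Keep coordinates away from exactly 0 and 1.
        x = 0.05 + 0.9 * d / max_depth
        for k, node in enumerate(sorted(members)):
            y = (k + 1) / (len(members) + 1)
            positions[node] = (round(x, 3), round(y, 3))
    return positions
```

Note that a state such as end, which is reachable directly from the start, lands in the second column under this scheme, so you may want to override a few positions by hand.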
The data I used is a cleaned-up version of your input:
currentstate,next_state,count
new,initiated,37715
profileinitiated,end,36411
jobrecommended,end,6202
new,end,6171
profilecreated,jobrecommended,5799
profileinitiated,profilecreated,4360
new,notopted,3751
notopted,profileinitiated,2817
jobrecommended,interestedinjob,2542
intentdetected,profilecreated,2334
profilecreated,intentdetected,1839
interestedinjob,applied,1671
jobrecommended,notinterestedinjob,1477
notinterestedinjob,profilecreated,1408
intentdetected,end,1325
notopted,end,1009
interestedinjob,profilecreated,975
applied,intentdetected,912
notinterestedinjob,intentdetected,720
applied,profilecreated,701
interestedinjob,end,673
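I changed "New Profile" to the existing state "New", because the diagram looked odd otherwise. Feel free to adjust this as needed.
The libraries I used are by no means required for what you want; I am just more familiar with them. For the directed graph, Roland Smith has you covered. It can also be done with Plotly, see its gallery.
Tested on Python 3.8.1.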