Cosine similarity plot is cluttered, with the names running together

Question (votes: 0, answers: 2)

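I have a small set of documents whose cosine similarity I want to plot. The document names are long, and I can't figure out how to keep them from running together on the plot. The file names look like this: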

['0-W909MY17R0016',
 '10 ID04160056 TOR 3.17.17',
 'ENVG',
 'FA5270-14-R-0027',
 'GSS',
 'H9240819R0001_1Oct19',
 'LCLSC16R0005',
 'LTLMII RFPFINALRELEASED',
 'N00019-15-R-2004',
 'N0010418RK032_for_PR_N0010418NB058',
 'N00164-16-R-JQ94_RFP',
 'N0025319R0001',
 'N6134019R0007_RFP',
 'N66604-18-R-0881_Conformed_Through_Amendment_0006',
 'NGLD_M_Draft_RFP_Final (3)',
 'SOL-615-16-000001_-PLSO_SOL',
 'SPRDL115R0414_0000',
 'W15QKN-18-R-0065_-_MMO',
 'W58RGZ-17-R-0211',
 'W912P618B0009_FB_FAC_SUPPORT_SVCS-_FBO',
 'W91CRB17R0004_STORM_II',
 'Full_Project_Announcement_RIK-OTA-F16EW_03_Jan_2019',
 'MQ-25 Final RFP N00019-17-R-0087',
 'Solicitation N00421-18-R-0091 - Enhanced Visual Acuity (EVA)']

I computed a basic cosine distance between the documents:

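from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_distances

# dtm is the document-term matrix (one row per document)
cos_distances = cosine_distances(dtm)
# project the precomputed distance matrix down to 2D coordinates
mds_map = MDS(dissimilarity='precomputed')
pos = mds_map.fit_transform(cos_distances)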
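For reference, the cosine distance used above is 1 minus the cosine similarity. The following is a minimal NumPy sketch of that relationship, not scikit-learn's implementation; the function name `cosine_distance_matrix` and the toy matrix are hypothetical, and the sketch assumes no document row is all zeros.

```python
import numpy as np

def cosine_distance_matrix(dtm):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    dtm = np.asarray(dtm, dtype=float)
    # Scale every row to unit length; assumes no all-zero rows
    norms = np.linalg.norm(dtm, axis=1, keepdims=True)
    unit = dtm / norms
    # Dot products of unit vectors are cosine similarities
    similarity = unit @ unit.T
    # Clip to guard against tiny floating-point overshoot
    return 1.0 - np.clip(similarity, -1.0, 1.0)

# Hypothetical toy document-term matrix: 3 documents, 2 terms
toy_dtm = [[1, 0], [0, 1], [1, 1]]
toy_distances = cosine_distance_matrix(toy_dtm)
```

Documents with no terms in common end up at distance 1, and a document is at distance 0 from itself.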

And a basic matplotlib scatter plot:

#pos contains the x and y coordinates of each of the documents
x = pos[:,0]
y = pos[:,1]
#we will need matplotlib to generate a scatter plot
import matplotlib.pyplot as plt
for i, j, name in zip(x,y,files):
    plt.scatter(i,j)
    plt.text(i,j,name)


plt.show()

It looks like this:

[image]

I'm having trouble finding documentation that specifically addresses this problem.

python-3.x matplotlib cosine-similarity
2 Answers
1 vote

You can assign shorter names and then explain them in a note placed outside the plot area.

See this Stack Overflow post: Python: displaying a line of text outside a matplotlib chart
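One way this could look, as a rough sketch only: label each point with a short number and print a key beside the axes with `fig.text`. The coordinates here are random placeholders standing in for the MDS output, only four of the file names are used, and the key position (0.62, 0.5) is an assumption you would tune for your own figure.

```python
import matplotlib.pyplot as plt
import numpy as np

# A few of the file names from the question
names = ['0-W909MY17R0016',
         '10 ID04160056 TOR 3.17.17',
         'ENVG',
         'FA5270-14-R-0027']

# Placeholder coordinates; in the question these come from MDS
rng = np.random.default_rng(0)
pos = rng.random((len(names), 2))

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(pos[:, 0], pos[:, 1])

# Label each point with a short number instead of the long name
for number, (px, py) in enumerate(pos, start=1):
    ax.annotate(str(number), (px, py),
                xytext=(4, 4), textcoords='offset points')

# Leave room on the right and write the number-to-name key there
fig.subplots_adjust(right=0.6)
key = "\n".join(f"{n}: {name}" for n, name in enumerate(names, start=1))
key_text = fig.text(0.62, 0.5, key, va='center', fontsize=10)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure.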


0 votes

You can plot each point with a different color and/or marker and create a legend, placed outside the axes, that shows the file names:

import numpy as np
import matplotlib.pyplot as plt

# names is the list of file names from the question
# Random 2D points to make the scatter plot
x = np.random.random(len(names))
y = np.random.random(len(names))

fig = plt.figure(figsize=(20, 8))
ax = fig.add_subplot(111)

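If you don't want to assign a color to each file name by hand, you can sample a pyplot colormap into a list of colors and use that in the scatter plot: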

# One RGBA color per file name, spread evenly across the colormap
colors = plt.cm.rainbow(np.linspace(0, 1, len(names)))

# color= (rather than c=) avoids the warning about a single RGBA row
for i, j, name, color in zip(x, y, names, colors):
    ax.scatter(i, j, label=name, color=color)

fig.subplots_adjust(right=0.6)  # This is needed so that the legend is not cut out of the figure
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize=12)
plt.show()

Result: [image]

You can use the bbox_to_anchor parameter to move the legend.
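As an illustration of moving the legend, here is a sketch that places it below the axes in two columns instead of to the right. The coordinates are random placeholders, only four of the file names are used, and the specific values (`bbox_to_anchor=(0.5, -0.1)`, `bottom=0.3`, `ncol=2`) are assumptions to adjust for your own figure size and name lengths.

```python
import matplotlib.pyplot as plt
import numpy as np

# A few of the file names from the question
names = ['0-W909MY17R0016',
         '10 ID04160056 TOR 3.17.17',
         'ENVG',
         'FA5270-14-R-0027']

# Placeholder coordinates standing in for the MDS output
rng = np.random.default_rng(1)
pos = rng.random((len(names), 2))
colors = plt.cm.rainbow(np.linspace(0, 1, len(names)))

fig, ax = plt.subplots(figsize=(8, 6))
for (px, py), name, color in zip(pos, names, colors):
    ax.scatter(px, py, label=name, color=color)

# Reserve space under the axes, then anchor the legend's top center
# at a point just below the axes (x=0.5, y=-0.1 in axes coordinates)
fig.subplots_adjust(bottom=0.3)
legend = ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1),
                   ncol=2, fontsize=10)
```

Here `loc` names the corner of the legend box that is pinned to the `bbox_to_anchor` point, so changing either one shifts the legend.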

If you want to assign individual colors or markers, the only way I can think of is to create a dictionary. For example:

colors = plt.cm.rainbow(np.linspace(0, 1, len(names)))

# Map each file name to [color, marker]. Only four entries are shown;
# every name that is plotted needs its own entry, otherwise the
# lookup below raises a KeyError.
plot_names = {'0-W909MY17R0016': [colors[0], 'o'],
              '10 ID04160056 TOR 3.17.17': [colors[1], 'x'],
              'ENVG': [colors[2], '*'],
              'FA5270-14-R-0027': [colors[3], '^']}

for i, j, name in zip(x, y, names):
    color, marker = plot_names[name]
    ax.scatter(i, j, label=name, color=color, marker=marker)

fig.subplots_adjust(right=0.6)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize=12)

plt.show()

Result: [image]
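If typing the dictionary by hand is impractical for 24 file names, it could instead be built automatically. The sketch below is one possible approach, not part of the original answer: it cycles through a short marker list with `itertools.cycle` so every name gets an entry. The name `plot_styles`, the marker list, and the random coordinates are all illustrative assumptions.

```python
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np

# A few of the file names from the question
names = ['0-W909MY17R0016',
         '10 ID04160056 TOR 3.17.17',
         'ENVG',
         'FA5270-14-R-0027',
         'GSS',
         'H9240819R0001_1Oct19']

markers = ['o', 'x', '*', '^']  # reused in order once exhausted
colors = plt.cm.rainbow(np.linspace(0, 1, len(names)))

# Build the name -> (color, marker) mapping for every name
plot_styles = {name: (color, marker)
               for name, color, marker in zip(names, colors, cycle(markers))}

# Placeholder coordinates standing in for the MDS output
rng = np.random.default_rng(2)
pos = rng.random((len(names), 2))

fig, ax = plt.subplots(figsize=(20, 8))
for (px, py), name in zip(pos, names):
    color, marker = plot_styles[name]
    ax.scatter(px, py, label=name, color=color, marker=marker)

fig.subplots_adjust(right=0.6)
legend = ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize=12)
```

Because the colors differ for every name, repeated markers remain distinguishable in the legend.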
