Visualizing multi-label text data

Question

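I have multi-label text data. I would like to visualize it in Python with a good chart to see how much overlap there is between the labels, and whether the overlap follows any pattern, for example whether class_40 also appears 40% of the time that class_1 appears.

The data has the following form: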

paragraph_1  class_1
paragraph_11 class_2
paragraph_1  class_2
paragraph_1  class_3
paragraph_13 class_3

What is the best way to visualize this kind of data? Which library would help here: seaborn, matplotlib, or something else?

Answer 1

You can try this:

# %matplotlib inline   (uncomment when running in a Jupyter notebook)
import matplotlib.pyplot as plt
from collections import Counter

x = ['paragraph1', 'paragraph1','paragraph1','paragraph1','paragraph2', 'paragraph2','paragraph3','paragraph1','paragraph4']
y = ['class1','class1','class1', 'class2','class3','class3', 'class1', 'class3','class4']


# count the occurrences of each point
c = Counter(zip(x,y))

# create a list of the sizes, here multiplied by 10 for scale
s = [10*c[(xx,yy)] for xx,yy in zip(x,y)]

plt.grid()
# plot it
plt.scatter(x, y, s=s)
plt.show()

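The more often a (paragraph, class) pair occurs, the larger its marker.

It is a different question, but the same answer, proposed by @James, can be found here: How to have scatter points become larger for higher density using matplotlib?

Edit 1 (if you have a larger dataset): a different approach using a heatmap: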

import numpy as np
from collections import Counter
import seaborn as sns
import pandas as pd

x = ['paragraph1', 'paragraph1','paragraph1','paragraph1','paragraph2', 'paragraph2','paragraph3','paragraph1','paragraph4']
y = ['class1','class1','class1', 'class2','class3','class3', 'class1', 'class3','class4']

# count the occurrences of each point
c = Counter(zip(x,y))

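# fill pandas DataFrame with zeros (rows: classes, columns: paragraphs)
dff = pd.DataFrame(0, columns=np.unique(x), index=np.unique(y))

# count occurrences and prepare data for the heatmap
# (.loc avoids chained assignment, which may silently write to a copy)
for (paragraph, label), count in c.items():
    dff.loc[label, paragraph] = count

sns.heatmap(dff, annot=True, fmt="d")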
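
The two plots above show paragraph/class counts. For the overlap between classes themselves (the "40% of the time class_1 appears, class_40 appears too" part of the question), one possible approach is a class-by-class co-occurrence table. The following is only a minimal sketch using the standard library; the function name `label_cooccurrence` and the choice to deduplicate repeated (paragraph, class) rows are my own assumptions, not something taken from the question.

```python
from collections import defaultdict
from itertools import permutations

def label_cooccurrence(pairs):
    """Hypothetical helper: co-occurrence of labels across paragraphs.

    counts[a][b]    = number of paragraphs tagged with both a and b
    fractions[a][b] = share of the paragraphs tagged a that are also tagged b
    """
    # Collect the set of labels attached to each paragraph
    # (assumption: a repeated (paragraph, label) row counts only once).
    labels_by_paragraph = defaultdict(set)
    for paragraph, label in pairs:
        labels_by_paragraph[paragraph].add(label)

    labels = sorted({label for _, label in pairs})
    counts = {a: {b: 0 for b in labels} for a in labels}
    for label_set in labels_by_paragraph.values():
        for a in label_set:
            counts[a][a] += 1  # diagonal: paragraphs carrying label a
        for a, b in permutations(label_set, 2):
            counts[a][b] += 1  # off-diagonal: a and b on the same paragraph

    # Divide each row by its diagonal to get "given a, how often b".
    fractions = {
        a: {b: counts[a][b] / counts[a][a] for b in labels} for a in labels
    }
    return counts, fractions

# Sample rows from the question
pairs = [
    ("paragraph_1", "class_1"),
    ("paragraph_11", "class_2"),
    ("paragraph_1", "class_2"),
    ("paragraph_1", "class_3"),
    ("paragraph_13", "class_3"),
]
counts, fractions = label_cooccurrence(pairs)
print(fractions["class_2"]["class_1"])  # 0.5
```

Here class_2 is on two paragraphs and only one of them also has class_1, hence 0.5. If you want a picture, `pd.DataFrame(fractions).T` should give a table with the conditioning class on the rows, which can be passed to `sns.heatmap` in the same way as `dff` above (with `fmt=".2f"` instead of `"d"`, since the values are floats).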
