我在 python panda DataFrame 中有以下数据。我想要类似于 https://stanford.edu/~mwaskom/software/seaborn/examples/grouped_boxplot.html
中的分组箱形图对于每个 id,我希望并排绘制两个箱形图。我该如何实现这一目标。我尝试用 seaborn 包绘制它,但没有成功。
id predicted real
1 [10, 10, 10] [16, 18, 20]
2 [12, 12, 15] [15, 17, 19, 21, 23]
3 [20, 5, 4, 4] [29, 32]
4 [25, 25, 25, 24, 21] [21, 24, 25, 26, 28, 29, 30, 33]
5 [20, 20, 20, 21] [21, 22, 24, 26, 28, 30, 31, 32]
6 [8, 3, 3, 14] [25, 27]
7 [1, 4, 4, 4, 5, 6, 10] [69, 71, 72]
8 [11, 11, 11, 11] [19, 21, 22, 23, 24]
9 [7, 6, 9, 9] [19, 26, 27, 28]
10 [30, 30, 30, 30, 30] [38, 39]
注意您正在查看的示例中的表结构
import seaborn as sns
tips = sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", hue="sex", data=tips, palette="PRGn")
sns.despine(offset=10, trim=True)
tips.head()
我们的目标是将您的桌子布置成这样
from StringIO import StringIO
import pandas as pd
text = """id predicted real
1 [10, 10, 10] [16, 18, 20]
2 [12, 12, 15] [15, 17, 19, 21, 23]
3 [20, 5, 4, 4] [29, 32]
4 [25, 25, 25, 24, 21] [21, 24, 25, 26, 28, 29, 30, 33]
5 [20, 20, 20, 21] [21, 22, 24, 26, 28, 30, 31, 32]
6 [8, 3, 3, 14] [25, 27]
7 [1, 4, 4, 4, 5, 6, 10] [69, 71, 72]
8 [11, 11, 11, 11] [19, 21, 22, 23, 24]
9 [7, 6, 9, 9] [19, 26, 27, 28]
10 [30, 30, 30, 30, 30] [38, 39]"""
df = pd.read_csv(StringIO(text), sep='\s{2,}', engine='python', index_col=0)
df = df.stack().str.strip('[]') \
.str.split(', ').unstack()
df
df1 = df.stack().apply(pd.Series).stack().astype(int) \
.rename_axis(['id', 'reality', None]) \
.rename('value').reset_index(['id', 'reality']) \
.reset_index(drop=True)
df1.head()
sns.boxplot(x='id', y='value', hue='reality', data=df1, palette='PRGn')
sns.despine(offset=10, trim=True)
list type
的 pandas列,最简单的方法是将
pandas.DataFrame.melt
列转换为长格式,然后将 pandas.DataFrame.explode
列表转换为单独的值。sns.catplot
与 kind='box'
创建一个 figure-level
图,而这个 answer 创建一个 axes-level
图。
import pandas as pd
import seaborn as sns
# convert df to long-form and explode the lists
dfm = df.melt(id_vars='id').explode('value')
# plot figure-level boxplot
g = sns.catplot(kind='box', data=dfm, x='id', y='value', hue='variable', height=5, aspect=2)
df
data = {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'predicted': [[10, 10, 10], [12, 12, 15], [20, 5, 4, 4], [25, 25, 25, 24, 21], [20, 20, 20, 21], [8, 3, 3, 14], [1, 4, 4, 4, 5, 6, 10], [11, 11, 11, 11], [7, 6, 9, 9], [30, 30, 30, 30, 30]],
'real': [[16, 18, 20], [15, 17, 19, 21, 23], [29, 32], [21, 24, 25, 26, 28, 29, 30, 33], [21, 22, 24, 26, 28, 30, 31, 32], [25, 27], [69, 71, 72], [19, 21, 22, 23, 24], [19, 26, 27, 28], [38, 39]]}
df = pd.DataFrame(data)
dfm
id variable value
0 1 predicted 10
0 1 predicted 10
0 1 predicted 10
1 2 predicted 12
1 2 predicted 12
1 2 predicted 15
2 3 predicted 20
2 3 predicted 5
2 3 predicted 4
2 3 predicted 4
3 4 predicted 25
3 4 predicted 25
3 4 predicted 25
3 4 predicted 24
3 4 predicted 21
4 5 predicted 20
4 5 predicted 20
4 5 predicted 20
4 5 predicted 21
5 6 predicted 8
5 6 predicted 3
5 6 predicted 3
5 6 predicted 14
6 7 predicted 1
6 7 predicted 4
6 7 predicted 4
6 7 predicted 4
6 7 predicted 5
6 7 predicted 6
6 7 predicted 10
7 8 predicted 11
7 8 predicted 11
7 8 predicted 11
7 8 predicted 11
8 9 predicted 7
8 9 predicted 6
8 9 predicted 9
8 9 predicted 9
9 10 predicted 30
9 10 predicted 30
9 10 predicted 30
9 10 predicted 30
9 10 predicted 30
10 1 real 16
10 1 real 18
10 1 real 20
11 2 real 15
11 2 real 17
11 2 real 19
11 2 real 21
11 2 real 23
12 3 real 29
12 3 real 32
13 4 real 21
13 4 real 24
13 4 real 25
13 4 real 26
13 4 real 28
13 4 real 29
13 4 real 30
13 4 real 33
14 5 real 21
14 5 real 22
14 5 real 24
14 5 real 26
14 5 real 28
14 5 real 30
14 5 real 31
14 5 real 32
15 6 real 25
15 6 real 27
16 7 real 69
16 7 real 71
16 7 real 72
17 8 real 19
17 8 real 21
17 8 real 22
17 8 real 23
17 8 real 24
18 9 real 19
18 9 real 26
18 9 real 27
18 9 real 28
19 10 real 38
19 10 real 39