How do I plot a normalized count plot for multiple columns of data?

Question

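I have per-respondent data on which programming languages each person uses, organized by year. Each language is a feature (column), and a given respondent can use more than one language. The data looks like this: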

id | year | java | c++ | python
-------------------------------
0  | 2011 | 0    | 1   | 0
1  | 2011 | 1    | 1   | 0
…
15 | 2012 | 1    | 1   | 0
16 | 2012 | 1    | 0   | 1
…
300| 2015 | 0    | 0   | 1
…

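We might have 100 rows for 2011, 500 rows for 2012, 1000 rows for 2015, and so on. I want to compare the popularity of the languages year over year. A simple count plot won't work, because the bar for a given language could be small in 2011 and very large in 2015 purely because of the different number of rows. Instead, I want bars that show, for example, 5% using python in 2011 and 45% using python by 2015.

I tried grouping by year and then aggregating the data. That gives me the numbers I need, but I can't produce a good visualization from them.

I could melt all the data (is that the right term?) into a single column called "language", but then I wouldn't be able to compute/plot the percentage of occurrences of each language for a given year.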

df_tech = df.groupby('year').agg(['mean'])
df_tech.columns = df_tech.columns.get_level_values(0)

year | java | c++ | python
--------------------------
2011 | .342 | .432| .133
2012 | .43  | .48 | .211
...
2015 | .534 | .373| .622
...
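
As a side note on why the aggregation above already yields normalized values: the mean of a 0/1 column is the fraction of rows that contain a 1, so the differing number of respondents per year drops out. A minimal sketch on a made-up toy frame (the `raw` data and the names `share` and `percent` are hypothetical, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical toy responses: one row per respondent, 1 = uses the language.
raw = pd.DataFrame({
    "id":     [0, 1, 2, 3, 4, 5],
    "year":   [2011, 2011, 2012, 2012, 2012, 2012],
    "java":   [0, 1, 1, 1, 0, 0],
    "c++":    [1, 1, 1, 0, 0, 1],
    "python": [0, 0, 0, 1, 1, 0],
})

# The mean of a 0/1 column is the fraction of respondents with a 1,
# so grouping by year removes the effect of unequal row counts.
share = raw.drop(columns="id").groupby("year").mean()
percent = share * 100
print(percent)
```

Dropping `id` before grouping keeps the identifier from being averaged along with the language columns.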

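I haven't managed to plot every feature on the x-axis of a single plot. I've tried countplot, barplot, and so on, but I can't get multiple features to show.

Ideally, I'd like a plot with each language on the x-axis and, for each language, one bar per year.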

Answer 1

Using this dataframe:

print(df)
   year   java    c++  python
0  2011  0.342  0.432   0.133
1  2012  0.430  0.480   0.211
2  2015  0.534  0.373   0.622

DataFrame.set_index + DataFrame.plot

# %matplotlib inline  # only needed in a Jupyter notebook
df.set_index('year').plot(kind='bar',stacked=True)

You can also plot it as a function of time (a line plot):

# %matplotlib inline  # only needed in a Jupyter notebook
df.set_index('year').plot(figsize=(10,10))
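
The question asks for languages on the x-axis with one bar per year. One possible way to get that layout with the same pandas approach is to transpose the frame before plotting. This is a sketch rather than part of the original answer; the name `by_lang` and the axis labels are my own choices:

```python
import pandas as pd

# Same aggregated values as the dataframe shown above.
df = pd.DataFrame({
    "year":   [2011, 2012, 2015],
    "java":   [0.342, 0.430, 0.534],
    "c++":    [0.432, 0.480, 0.373],
    "python": [0.133, 0.211, 0.622],
})

# After transposing, languages form the index (x-axis groups)
# and each year becomes a column (one bar per year in every group).
by_lang = df.set_index("year").T

ax = by_lang.plot(kind="bar", rot=0)
ax.set_xlabel("language")
ax.set_ylabel("share of respondents")
ax.legend(title="year")
```

Multiplying `by_lang` by 100 before plotting would show percentages instead of fractions.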

Answer 2

Set `year` as the index:

Data:

 year    java    c++    python
  2011   0.342  0.432    0.133
  2012   0.430  0.480    0.211
  2015   0.534  0.373    0.622

df.set_index('year', inplace=True)

       java    c++  python
year                      
2011  0.342  0.432   0.133
2012  0.430  0.480   0.211
2015  0.534  0.373   0.622

Seaborn:

import seaborn as sns

sns.lineplot(data=df)

`df.plot()`:

df.plot()

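Barplots:

Stacked bar charts are possible, and people do use them, but they are a poor way to display data because it is hard for the eye to judge the relative proportion of each category.
The point of a plot is to show the data clearly, and plots other than stacked bars can do that better.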

df.plot.bar()

Using `seaborn.barplot`:

This requires reshaping the dataframe into tidy format, as follows:

import pandas as pd
df.reset_index(inplace=True)
df_melt = pd.melt(df, id_vars='year', var_name='lang', value_name='usage')
sns.barplot(x='year', y='usage', data=df_melt, hue='lang')
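
Regarding the worry in the question that melting makes the per-year percentage impossible to compute: `seaborn.barplot` aggregates each group with the mean by default, so it may be enough to melt the raw 0/1 responses and let the plot do the normalization. The sketch below assumes a made-up toy frame; `raw`, `raw_melt` and the `uses` column name are hypothetical, and the error bars seaborn draws by default are left on.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy responses: one row per respondent, 1 = uses the language.
raw = pd.DataFrame({
    "id":     [0, 1, 2, 3, 4, 5],
    "year":   [2011, 2011, 2012, 2012, 2012, 2012],
    "java":   [0, 1, 1, 1, 0, 0],
    "c++":    [1, 1, 1, 0, 0, 1],
    "python": [0, 0, 0, 1, 1, 0],
})

# Tidy format: one row per (respondent, language) pair.
raw_melt = raw.melt(id_vars=["id", "year"], var_name="lang", value_name="uses")

# barplot averages `uses` within each (lang, year) group, which for 0/1
# values is the share of that year's respondents using the language.
ax = sns.barplot(x="lang", y="uses", hue="year", data=raw_melt)
ax.set_ylabel("share of respondents")
```

Putting `lang` on `x` and `year` on `hue` gives the layout described in the question; swapping them reproduces the grouping used in the call above.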

FacetGrid:

order = df_melt.lang.unique()
g = sns.FacetGrid(df_melt, col='year', hue='lang', col_wrap=2)
g = g.map(sns.barplot, 'lang', 'usage', order=order)
