How do I sort a pandas boxplot by median?

Question (votes: 9, answers: 2)

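I want to draw a boxplot of column Z in dataframe df, grouped by the categories X and Y. How can I sort the boxes by median, in descending order?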

import pandas as pd
import random
n = 100
# this is probably a strange way to generate random data; please feel free to correct it
df = pd.DataFrame({"X": [random.choice(["A","B","C"]) for i in range(n)], 
                   "Y": [random.choice(["a","b","c"]) for i in range(n)],
                   "Z": [random.gauss(0,1) for i in range(n)]})
df.boxplot(column="Z", by=["X", "Y"])

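Note that this question is very similar, but it uses a different data structure. I am relatively new to pandas (and have only done a few Python tutorials in general), so I could not figure out how to make my data work with the answer posted there. This may be more of a reshaping question than a plotting question. Maybe there is a solution using groupby?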

python pandas boxplot
2 Answers

14 votes

You can use the answer from How to sort a boxplot by the median values in pandas, but you first need to group your data and build a new dataframe:

import pandas as pd
import random
import matplotlib.pyplot as plt

n = 100
# this is probably a strange way to generate random data; please feel free to correct it
df = pd.DataFrame({"X": [random.choice(["A","B","C"]) for i in range(n)], 
                   "Y": [random.choice(["a","b","c"]) for i in range(n)],
                   "Z": [random.gauss(0,1) for i in range(n)]})
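# Group by both categorical columns; each (X, Y) pair becomes one box.
grouped = df.groupby(["X", "Y"])

# One column per group, holding that group's Z values.
df2 = pd.DataFrame({col: vals["Z"] for col, vals in grouped})

# Series.sort() has been removed from pandas; sort_values() returns a sorted copy.
meds = df2.median().sort_values(ascending=False)
# Reorder the columns so the boxes appear in descending order of median.
df2 = df2[meds.index]
df2.boxplot()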

plt.show()
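
To make the reshaping step easier to check, here is a minimal sketch of the same idea on a tiny hand-made frame whose medians can be worked out by eye. The helper name `columns_sorted_by_median` and its `ascending` parameter are illustrative assumptions, not part of the answer above.

```python
import pandas as pd


def columns_sorted_by_median(df, by, column, ascending=False):
    """Return a wide frame (one column per group) ordered by group median.

    Hypothetical helper for illustration; the name is not from the answer.
    """
    # One column per group; rows keep the original index, so cells that do
    # not belong to a group are NaN (median() skips NaN by default).
    wide = pd.DataFrame({key: vals[column] for key, vals in df.groupby(by)})
    # Sort the per-column medians and reuse their index as the column order.
    order = wide.median().sort_values(ascending=ascending).index
    return wide.reindex(columns=order)


# Tiny deterministic example: medians are 2.0, 15.0 and -3.0.
small = pd.DataFrame({
    "X": ["A", "A", "B", "B", "C", "C"],
    "Y": ["a", "a", "b", "b", "a", "a"],
    "Z": [1.0, 3.0, 10.0, 20.0, -5.0, -1.0],
})
ordered = columns_sorted_by_median(small, by=["X", "Y"], column="Z")
print(list(ordered.columns))
```

Calling `ordered.boxplot()` on the result should then draw the boxes in that column order.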



11 votes

A similar answer to Alvaro Fuentes's, written as a function so it is easier to reuse:

import pandas as pd

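def boxplot_sorted(df, by, column):
  # One column per group, holding that group's values of `column`.
  df2 = pd.DataFrame({col:vals[column] for col, vals in df.groupby(by)})
  # sort_values() is ascending by default; pass ascending=False for descending order.
  meds = df2.median().sort_values()
  # Reorder the columns by median and rotate the tick labels so they stay readable.
  df2[meds.index].boxplot(rot=90)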

boxplot_sorted(df, by=["X", "Y"], column="Z")
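
As a possible variation, the boxes can also be drawn with matplotlib directly, which avoids building the NaN-padded wide frame. This is only a sketch under stated assumptions: the function name `boxplot_by_median`, the `ascending` parameter, and the choice of the headless `Agg` backend are mine, not from either answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def boxplot_by_median(df, by, column, ascending=False):
    """Draw one box per group, ordered by the group's median of `column`.

    Hypothetical helper for illustration. Returns the Axes and the group order.
    """
    # Collect each group's values as a plain array, keyed by the group label.
    groups = {key: vals[column].to_numpy() for key, vals in df.groupby(by)}
    # Median per group, used as the sort key.
    medians = {key: pd.Series(values).median() for key, values in groups.items()}
    order = sorted(groups, key=medians.get, reverse=not ascending)

    fig, ax = plt.subplots()
    ax.boxplot([groups[key] for key in order])
    # Label each box with its group key, rotated so long labels do not overlap.
    ax.set_xticklabels([str(key) for key in order], rotation=90)
    ax.set_ylabel(column)
    return ax, order


demo = pd.DataFrame({
    "X": ["A", "A", "B", "B", "C", "C"],
    "Y": ["a", "a", "b", "b", "a", "a"],
    "Z": [1.0, 3.0, 10.0, 20.0, -5.0, -1.0],
})
ax, order = boxplot_by_median(demo, by=["X", "Y"], column="Z")
print(order)
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` afterwards.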