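I have read many similar questions, but they are all quite old (see this one, which asks exactly what I am looking for: https://community.plotly.com/t/how-to-color-boxplot-by-continuous-色阶/9215).
Consider this example: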
import numpy as np
import pandas as pd
import plotly.express as px
from scipy.stats import norm
a = norm(0, 2).rvs(100)
b = norm(2, 2).rvs(100)
c = norm(4, 2).rvs(100)
vals = np.concatenate([a, b, c])
cat = list(''.join([f'{i}' * 100 for i in ['a', 'b', 'c']]))
tdf = pd.DataFrame({
    'Category': cat,
    'Values': vals
})
px.box(tdf, x='Category', y='Values')
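This is the output:
I want to color each box according to its mean value, using a color scale.
Can this be done in Plotly or Dash?
For anyone else who needs to solve this, here is my solution. It does not show a legend for the color scale, though.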
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from matplotlib.colors import to_hex

def colored_boxplot(df, x, y, cmap_name='viridis'):
    """Create a boxplot using plotly. Each box is colored according to its median value.

    Args:
        df (pandas.DataFrame): input data
        x (str): name of the grouping column (x axis)
        y (str): name of the value column (y axis)
        cmap_name (str, optional): matplotlib colormap name. Defaults to 'viridis'.

    Returns:
        plotly.graph_objects.Figure: figure with one box trace per category
    """
    # One median per category, e.g. x='Category', y='Values'
    medians = df.groupby(x)[y].median()
    # Rescale the medians so the smallest maps to 0 and the largest to 1
    norm = plt.Normalize(medians.min(), medians.max())
    cmap = plt.get_cmap(cmap_name)

    fig = go.Figure()
    for cat in df[x].unique():
        sdf = df[df[x] == cat]
        # matplotlib returns RGBA components in 0-1; Plotly expects a CSS color,
        # so convert to a hex string instead of formatting the tuple as "rgba(...)"
        color = to_hex(cmap(norm(medians.loc[cat])))
        fig.add_trace(go.Box(
            x=sdf[x],
            y=sdf[y],
            name=str(cat),
            marker=dict(color=color)
        ))
    return fig
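I am not very good at statistics, but I think the solution I wrote is fine. The statistical logic may be wrong, but I hope the way I coded it is sound.
I delegate this task to Plotly through the color parameter of the box function. I created 3 arrays that each store repeated copies of a group's mean. When I create the box plot, I tell it which heuristic to follow (color='Mean'):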
import numpy as np
import pandas as pd
import plotly.express as px
from scipy.stats import norm
a = norm(0, 2).rvs(100)
b = norm(2, 2).rvs(100)
c = norm(4, 2).rvs(100)
# Repeat each group's mean once per sample so it can be stored as a column
a_mean = np.full(100, a.mean())
b_mean = np.full(100, b.mean())
c_mean = np.full(100, c.mean())
cat = list(''.join([f'{i}' * 100 for i in ['a', 'b', 'c']]))
vals = np.concatenate([a, b, c])
mean = np.concatenate([a_mean, b_mean, c_mean])
tdf = pd.DataFrame({
    'Category': cat,
    'Values': vals,
    'Mean': mean
})
result = px.box(tdf, x='Category', y='Values', color='Mean')
result.show()
The result is: