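I'm currently working on a data visualization project in which I need to create stacked bar charts with Bokeh. The data comes from an Excel file that is updated regularly and contains multiple inputs and outputs. The structure of the data can vary, so the number of inputs and outputs may change. My goal is for the Bokeh plots to adapt to these changes automatically, without manual changes to the code. The stacked bar charts should adjust dynamically to the number of inputs and outputs in the Excel file. They should be able to visualize the various combinations of fixed inputs (e.g. Input 1 fixed, Input 2 fixed, Input 3 varying) while showing all corresponding outputs. Ideally, the solution would read the Excel file automatically, detect its input/output structure, and update the charts accordingly.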
I tried to visualize stacked bar charts for a single scenario (2 inputs and 3 outputs). In the example Excel file, the data is laid out as follows:
Input_1 | Input_2 | Output_1 | Output_2 | Output_3 |
---|---|---|---|---|
1 | 1 | 100 | 200 | 200 |
2 | 1 | 150 | 150 | 200 |
3 | 1 | 200 | 100 | 200 |
1 | 2 | 200 | 200 | 100 |
2 | 2 | 150 | 200 | 150 |
3 | 2 | 100 | 200 | 200 |
1 | 3 | 200 | 100 | 200 |
2 | 3 | 200 | 150 | 150 |
3 | 3 | 200 | 200 | 100 |
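To try the snippets in this thread without the spreadsheet, the example table can be rebuilt directly as a DataFrame. This is only a convenience sketch: the values are copied from the table above, and `example.xlsx` is the file name the question uses (writing it with `to_excel` assumes `openpyxl` is installed).

```python
import pandas as pd

# The example table above, built in code as a stand-in for pd.read_excel("example.xlsx")
data_frame = pd.DataFrame({
    "Input_1":  [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Input_2":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Output_1": [100, 150, 200, 200, 150, 100, 200, 200, 200],
    "Output_2": [200, 150, 100, 200, 200, 200, 100, 150, 200],
    "Output_3": [200, 200, 200, 100, 150, 200, 200, 150, 100],
})

# Uncomment to recreate the spreadsheet (requires openpyxl):
# data_frame.to_excel("example.xlsx", index=False)
print(data_frame.shape)  # (9, 5)
```

In this sample every row's outputs sum to 500, so all stacked bars should reach the same height.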
This works for the static scenario:
```python
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import gridplot, row
from bokeh.models import ColumnDataSource

data_frame = pd.read_excel("example.xlsx")
# Categorical x axes need string factors
data_frame['Input_1'] = data_frame['Input_1'].astype(str)
data_frame['Input_2'] = data_frame['Input_2'].astype(str)

output_file("stacked_bar_charts.html")

unique_input_1 = data_frame['Input_1'].unique()
unique_input_2 = data_frame['Input_2'].unique()

plots_for_input_1 = []
plots_for_input_2 = []

# One chart per value of Input_1, with Input_2 on the x axis
for value in unique_input_1:
    filtered_data = data_frame[data_frame["Input_1"] == value]
    filtered_data = filtered_data.sort_values(by='Input_2')
    source = ColumnDataSource(filtered_data)

    plot = figure(title=f"Input_1 = {value} fixed",
                  x_range=filtered_data["Input_2"].unique(),
                  height=300,
                  width=500
                  )
    plot.xaxis.axis_label = "Input_2"
    plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
                    x="Input_2",
                    width=0.9,
                    color=["orange", "gray", "brown"],
                    source=source,
                    legend_label=["Output 1", "Output 2", "Output 3"]
                    )
    plots_for_input_1.append(plot)

# One chart per value of Input_2, with Input_1 on the x axis
for value in unique_input_2:
    filtered_data = data_frame[data_frame['Input_2'] == value]
    filtered_data = filtered_data.sort_values(by='Input_1')
    source = ColumnDataSource(filtered_data)

    plot = figure(title=f"Input_2 = {value} fixed",
                  x_range=filtered_data["Input_1"].unique(),
                  height=300,
                  width=500
                  )
    plot.xaxis.axis_label = "Input_1"
    plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
                    x="Input_1",
                    width=0.9,
                    color=["orange", "gray", "brown"],
                    source=source,
                    legend_label=["Output 1", "Output 2", "Output 3"]
                    )
    plots_for_input_2.append(plot)

grid_for_input_1 = gridplot(plots_for_input_1, ncols=1)
grid_for_input_2 = gridplot(plots_for_input_2, ncols=1)
final_layout = row(grid_for_input_1, grid_for_input_2)
show(final_layout)
```
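One thing to watch as the spreadsheet is updated: the code above converts the inputs to strings before sorting, so the x-axis order (taken from `unique()` of the sorted column) becomes lexicographic once an input reaches two digits. A minimal pandas-only sketch, assuming a hypothetical update in which `Input_2` takes the value 10, shows the difference and one way around it (sort numerically first, convert afterwards):

```python
import pandas as pd

# Hypothetical updated data in which an input reaches two digits
df = pd.DataFrame({"Input_1": [1, 1, 1], "Input_2": [10, 2, 1],
                   "Output_1": [5, 6, 7]})

# Sorting after astype(str) compares text: "1" < "10" < "2"
as_text = df.astype({"Input_2": str}).sort_values("Input_2")
print(as_text["Input_2"].tolist())  # ['1', '10', '2']

# Sorting the numeric column first, then converting, keeps 1, 2, 10
ordered = df.sort_values("Input_2").astype({"Input_2": str})
x_factors = ordered["Input_2"].unique().tolist()  # candidate x_range factors
print(x_factors)  # ['1', '2', '10']
```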
Here are example output images showing how I visualize the data:
Bokeh example plot 1:
Bokeh example plot 2:
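I can't find a dynamic approach that copes with changing inputs and outputs, for example varying two inputs while keeping the other inputs fixed, and visualizing the effect on all outputs with stacked bar charts. The approach should also handle data updates without manually adjusting the code for each new scenario.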
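The combinations described here (every input held fixed except one) can be enumerated with pandas alone. The following is a minimal sketch, not a complete Bokeh solution: `fixed_input_panels` is a hypothetical helper that groups by all remaining input columns, so each group contains exactly one row per value of the varying input, and each yielded panel could then be fed to `figure`/`vbar_stack` as in the code above. It assumes at least two input columns; with exactly two it reduces to one chart per value of the other input, as in the static example.

```python
import itertools

import pandas as pd


def fixed_input_panels(df, inp_cols):
    """Yield (fixed_values, varying_col, rows) for every way of holding all
    input columns but one constant. Hypothetical helper; needs >= 2 inputs."""
    for varying in inp_cols:
        fixed = [c for c in inp_cols if c != varying]
        for key, sub_df in df.groupby(fixed, sort=True):
            # pandas < 2.0 yields a scalar key when grouping by a one-column list
            key = key if isinstance(key, tuple) else (key,)
            yield dict(zip(fixed, key)), varying, sub_df.sort_values(varying)


# Hypothetical 3-input table: every combination of the values 1 and 2
df3 = pd.DataFrame(list(itertools.product([1, 2], repeat=3)),
                   columns=["Input_1", "Input_2", "Input_3"])
df3["Output_1"] = range(len(df3))
inp_cols = [c for c in df3.columns if "Input" in c]

panels = list(fixed_input_panels(df3, inp_cols))
print(len(panels))  # 12 = 3 varying columns x 4 fixed combinations
for fixed_values, varying, rows in panels[:2]:
    print(fixed_values, varying, rows[varying].tolist())
```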
You can use `groupby` inside a nested loop to plot all possible I/O combinations:
```python
# inp_cols, out_cols, TITLE, COLORS, BAR_WIDTH, H, W and NCOLS
# come from the configuration listed further down
output_file("stacked_bar_charts.html")

grids = []
for inp_col in inp_cols:
    inplots = []
    for name, sub_df in data_frame.groupby(inp_col):
        for inp_diff in inp_cols.difference([inp_col], sort=False):
            plot = figure(
                title=TITLE(inp_col, name),
                x_range=data_frame[inp_diff].unique(),
                height=H, width=W,
            )
            plot.xaxis.axis_label = inp_diff
            _ = plot.vbar_stack(
                stackers=out_cols, x=inp_diff,
                width=BAR_WIDTH, color=COLORS,
                source=ColumnDataSource(sub_df.sort_values(inp_diff)),
                legend_label=out_cols.tolist(),
            )
            inplots.append(plot)
    grids.append(gridplot(inplots, ncols=NCOLS))

final_layout = row(*grids)
show(final_layout)
```
Output (`stacked_bar_charts.html`):
Configuration used:
```python
import pandas as pd
from bokeh.layouts import gridplot, row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

# PD-PREPROCESS
data_frame = pd.read_excel("example.xlsx")
inp_cols = data_frame.filter(like="Input").columns
out_cols = data_frame.filter(like="Output").columns
data_frame = data_frame.astype(dict.fromkeys(inp_cols, str))

# BOKEH-CONFIG
TITLE = "{} = {} fixed".format
COLORS = ["orange", "gray", "brown"]  # needs one color per output column
BAR_WIDTH = 0.9
H, W = 300, 500
NCOLS = 1
```
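The hard-coded `COLORS` list only fits exactly three outputs, while `vbar_stack` expects one color per stacker. One way to keep it in step with `out_cols` is to generate the list from the number of outputs. The sketch below is an assumption, not part of the answer above: `distinct_colors` is a hypothetical helper that spaces hues evenly using only the standard library. Bokeh's own `bokeh.palettes` module (for example `viridis(n)`, or `Category10` for up to 10 categories) is an alternative if its look is preferred.

```python
import colorsys


def distinct_colors(n, saturation=0.65, value=0.85):
    """Return n hex colors with evenly spaced hues (hypothetical helper)."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / max(n, 1), saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Hypothetical output columns after the spreadsheet gains a fourth output
out_cols = ["Output_1", "Output_2", "Output_3", "Output_4"]
COLORS = distinct_colors(len(out_cols))  # one color per stacker
print(len(COLORS))  # 4
```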