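I am plotting two histograms with Plotly Express. The first shows the distribution of car ads by mileage. The second uses the same mileage bins and shows the distribution of the prices of those cars. For the second plot, I want to add a data point in the middle of each bin that represents the sum of all prices in that bin divided by the number of cars in the same bin. For example, in the picture shown above, I want a data point at 48.81B/8186 ≈ 5.963 million. Please find my code below.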
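import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create the mil_counts DataFrame (number of ads per mileage value); df holds the car ads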
mil_counts = df.groupby(['mileage']).size().sort_values(ascending=False).reset_index(name='count')
fig = make_subplots(rows=1, cols=2)
# Create a Plotly Express histogram trace for mileage
mileage_histogram_trace = px.histogram(mil_counts, x="mileage", y="count", title="Mileage", nbins=20)
# Add the mileage histogram trace to the first column
fig.add_trace(go.Histogram(histfunc="sum", x=mileage_histogram_trace.data[0]['x'], y=mileage_histogram_trace.data[0]['y'],
                           name="Mileage", nbinsx=20), row=1, col=1)
# Create a Plotly Express histogram trace for price
price_histogram_trace = px.histogram(df, x="mileage", y="price", title="Price", nbins=20)
# Add the price histogram trace to the second column
fig.add_trace(go.Histogram(histfunc="sum", x=price_histogram_trace.data[0]['x'], y=price_histogram_trace.data[0]['y'],
                           name="Price", nbinsx=20), row=1, col=2)
# Calculate the average price of cars in each mileage category
mileage_x = mileage_histogram_trace.data[0]['x']
avg_price = [48813640000/8186 , ]
for x_value in mileage_x:
    indices = np.where(price_histogram_trace.data[0]['x'] == x_value)[0]
    if len(indices) > 0:
        avg_price.append(np.mean(price_histogram_trace.data[0]['y'][indices]))
    else:
        avg_price.append(0)
# Add the line trace (superimposed on the bar) with a secondary y-axis
fig.add_trace(go.Scatter(x=mileage_x, y=avg_price, mode='lines', name="Average Price (Line)", yaxis="y2"), row=1, col=2)
# Update the layout if needed
fig.update_layout(
    title_text="Mileage and Price Histograms",
    xaxis=dict(title="Mileage", domain=[0, 0.4]),
    yaxis=dict(title="Sum of Counts"),
    xaxis2=dict(title="Mileage", domain=[0.6, 0.9]),
    yaxis2=dict(title="Average Price", side="right"),
    xaxis3=dict(title="Mileage", domain=[0.95, 1.0]),
    yaxis3=dict(title="Average Price (Line)", side="right"),
)
fig.show()
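I just want to draw a line through 20 data points (from a scatter trace), where each point sits at the middle of a histogram bin and represents the average price in that bin. For example, in the picture shown above, I want a data point at 48.81B/8186 ≈ 5.963 million. What the current code does instead is add extra data points: price_histogram_trace.data[0]['x'] already holds 70k values while mileage_histogram_trace.data[0]['x'] holds 7k, so the loop ends up appending an average price for every mileage observation in the DataFrame rather than one per bin.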
If you want to plot a line of bin averages on top of the histogram, you have to compute the average of each bin yourself. Here is an example:
import plotly.graph_objects as go
import numpy as np
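# Sample data: 100 evenly spaced x values and a sine-shaped y
m = 200000
x = np.linspace(1, m, 100, dtype=int)
y = np.sin((x+m)/(m/2))*m + m
# Split y into n_bins equally sized chunks and average each chunk
n_bins = 20
chunk_size = len(y)//n_bins
y_avg = [sum(y[i*chunk_size:(i*chunk_size)+chunk_size])/chunk_size for i in range(n_bins)]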
fig = go.Figure(data=[
    go.Histogram(x=x, y=y, nbinsx=n_bins, histfunc='sum', name='histogram'),
    go.Scatter(x=x[::len(x)//n_bins]+x[chunk_size]/2, y=y_avg, name='average of data')
])
fig.show()
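The chunking above works because the sample x values are sorted and evenly spaced, so equal-sized chunks coincide with equal-width bins. For unevenly distributed data such as a mileage column, one possible approach is to bin by value with np.histogram instead. The following is only a sketch under assumptions: bin_averages is a hypothetical helper name, the mileage and price arrays are made-up stand-ins for df["mileage"] and df["price"], and the equal-width edges it builds are not guaranteed to match the bins Plotly picks for nbinsx.

```python
import numpy as np

def bin_averages(x, y, n_bins=20):
    """Group y into equal-width bins of x; return centers, sums, counts, means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Equal-width bin edges spanning the full range of x
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Number of observations per bin
    counts, _ = np.histogram(x, bins=edges)
    # Sum of y per bin (weights adds y instead of 1 for each observation)
    sums, _ = np.histogram(x, bins=edges, weights=y)
    # Midpoint of each bin, used as the x position of the average
    centers = (edges[:-1] + edges[1:]) / 2
    # Empty bins stay NaN so a line trace shows a gap instead of a misleading zero
    means = np.full(n_bins, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return centers, sums, counts, means

# Made-up data standing in for df["mileage"] and df["price"] (assumption)
rng = np.random.default_rng(0)
mileage = rng.integers(0, 200_000, size=5_000)
price = 30_000 - 0.1 * mileage + rng.normal(0, 2_000, size=5_000)

centers, sums, counts, means = bin_averages(mileage, price, n_bins=20)
```

The resulting centers and means can be passed as x and y to go.Scatter, which gives exactly n_bins points. To make the bars use the same edges, it may help to pass explicit xbins (start, end, size) to go.Histogram rather than relying on nbinsx.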
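Keep in mind, though, that depending on the aggregation function you use for the histogram (for example histfunc='sum'), the average of the data in each bin can be quite small compared to the height of the bars, so the line may look almost flat when both share the same y-axis.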