Adding another plot on top of a histogram

Question

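I am using Plotly Express to plot two histograms. The first one simply shows the distribution of car ads over mileage. The second one uses the same mileage bins and shows the sum of the prices of the cars in each bin. For the second plot, I now want to add a data point in the middle of each bin that represents the sum of all prices in that bin divided by the number of cars in the same bin.

[Image: Histogram 1 with the counts of ads per mileage bin]
[Image: Histogram 2 with the sum of all the car prices per mileage bin]

For example, for the bin highlighted in the pictures above, I would like a data point at 48.81B/8186 ≈ 5.963 million. Please find the code below.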

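import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Creating the mil_counts DataFrame (df has one row per ad, with 'mileage' and 'price' columns)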
mil_counts = df.groupby(['mileage']).size().sort_values(ascending=False).reset_index(name='count')

fig = make_subplots(rows=1, cols=2)

# Create a Plotly Express histogram trace for mileage
mileage_histogram_trace = px.histogram(mil_counts, x="mileage", y="count", title="Mileage", nbins=20)

# Add the mileage histogram trace to the first column
fig.add_trace(go.Histogram(histfunc="sum", x=mileage_histogram_trace.data[0]['x'], y=mileage_histogram_trace.data[0]['y'], 
                            name="Mileage", nbinsx=20), row=1, col=1)

# Create a Plotly Express histogram trace for price
price_histogram_trace = px.histogram(df, x="mileage", y="price", title="Price", nbins=20)

# Add the price histogram trace to the second column
fig.add_trace(go.Histogram(histfunc="sum", x=price_histogram_trace.data[0]['x'], y=price_histogram_trace.data[0]['y'], 
                            name="Price", nbinsx=20), row=1, col=2)

# Calculate the average price of cars in each mileage category
mileage_x = mileage_histogram_trace.data[0]['x']
avg_price = [48813640000/8186 , ]

for x_value in mileage_x:
    indices = np.where(price_histogram_trace.data[0]['x'] == x_value)[0]
    if len(indices) > 0:
        avg_price.append(np.mean(price_histogram_trace.data[0]['y'][indices]))
    else:
        avg_price.append(0)

# Add the line trace (superimposed on the bar) with a secondary y-axis
fig.add_trace(go.Scatter(x=mileage_x, y=avg_price, mode='lines', name="Average Price (Line)", yaxis="y2"), row=1, col=2)

# Update the layout if needed
fig.update_layout(
    title_text="Mileage and Price Histograms",
    xaxis=dict(title="Mileage", domain=[0, 0.4]),
    yaxis=dict(title="Sum of Counts"),
    xaxis2=dict(title="Mileage", domain=[0.6, 0.9]),
    yaxis2=dict(title="Average Price", side="right"),
    xaxis3=dict(title="Mileage", domain=[0.95, 1.0]),
    yaxis3=dict(title="Average Price (Line)", side="right"),
)


fig.show()

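I just want to draw a line through 20 data points (as a scatter trace), where each point sits in the middle of a histogram bin and its value is the average of the prices in that bin. For example, in the pictures above I would like a data point at 48.81B/8186 ≈ 5.963 million. What the current code does instead is add far more points: price_histogram_trace.data[0]['x'] already holds 70k values and mileage_histogram_trace.data[0]['x'] holds 7k, so the loop ends up appending an average price for every mileage observation in the DataFrame rather than one per bin.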

python-3.x plotly histogram subplot line-plot
1 Answer

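If you want to draw a line of per-bin averages on top of a histogram, you have to compute the average for each bin yourself. Here is an example: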

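import plotly.graph_objects as go
import numpy as np

# Synthetic data: 100 evenly spaced x values and a smooth y curve
m = 200000
x = np.linspace(1, m, 100, dtype=int)
y = np.sin((x+m)/(m/2))*m + m

# x is sorted and evenly spaced, so equal-sized chunks approximate the histogram bins
n_bins = 20
chunk_size = len(y)//n_bins
y_avg = [y[i*chunk_size:(i+1)*chunk_size].mean() for i in range(n_bins)]

# Approximate bin centers: first x of each chunk plus half a chunk width
x_mid = x[::chunk_size] + (x[chunk_size] - x[0])/2

fig = go.Figure(data=[
    go.Histogram(x=x, y=y, nbinsx=n_bins, histfunc='sum', name='histogram'),
    go.Scatter(x=x_mid, y=y_avg, name='average of data')
])

fig.show()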
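The chunking in the example only works because x is sorted and evenly spaced. For real data such as the mileage and price columns in the question, one possible approach is to bin with explicit edges and divide the per-bin sum by the per-bin count. The sketch below assumes equal-width bins; the helper name bin_means and the six-car sample are made up for illustration and are not part of the question's data.

```python
import numpy as np

def bin_means(x, y, n_bins=20):
    """Return (centers, sums, counts, means) of y over n_bins equal-width bins of x.

    bin_means is a hypothetical helper, not a plotly or numpy function.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Equal-width bin edges spanning the observed x range
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Number of observations per bin
    counts, _ = np.histogram(x, bins=edges)
    # Sum of y per bin (weights turns the count into a weighted sum)
    sums, _ = np.histogram(x, bins=edges, weights=y)
    # Midpoint of each bin, used as the x position of the line's points
    centers = (edges[:-1] + edges[1:]) / 2
    # Empty bins get NaN so a line plot shows a gap instead of a fake zero
    means = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)
    return centers, sums, counts, means

# Made-up sample: 6 cars falling into 2 mileage bins
mileage = [0, 10, 20, 80, 90, 100]
price = [30, 20, 10, 6, 4, 2]
centers, sums, counts, means = bin_means(mileage, price, n_bins=2)
```

This yields exactly one point per bin (20 for n_bins=20), regardless of how many rows the DataFrame has. With a DataFrame it could be called as bin_means(df["mileage"], df["price"]). Note that plotly treats nbinsx as a maximum and may choose different bin edges, so the bars and the points only line up exactly if the histogram is given the same edges.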

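Keep in mind, though, that the per-bin average can end up very small compared with the height of the bars, depending on the aggregation function you use for the histogram. With histfunc='sum', for instance, each bar is the total over all data in the bin, which is the average multiplied by the number of points in that bin.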
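One way to deal with that difference in scale is to put the average line on a secondary y-axis. The following is only a sketch under stated assumptions: the per-bin numbers are invented, build_figure is a hypothetical helper, and the bars are drawn with go.Bar because the values are already aggregated. It relies on make_subplots with a secondary_y spec, which plotly provides for this purpose.

```python
import numpy as np

# Made-up per-bin aggregates (hypothetical numbers, not the question's data)
centers = np.array([5_000, 15_000, 25_000])  # bin midpoints (mileage)
sums = np.array([4.0e9, 2.4e9, 0.9e9])       # sum of prices per bin
counts = np.array([800, 600, 300])           # number of cars per bin
means = sums / counts                        # average price per bin

def build_figure(centers, sums, means, bin_width):
    """Bars for the per-bin sums (left axis), line for the per-bin means (right axis)."""
    # Imported here so the arithmetic above also runs where plotly is not installed
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    fig = make_subplots(specs=[[{"secondary_y": True}]])
    # Pre-aggregated sums as bars on the primary (left) y-axis
    fig.add_trace(
        go.Bar(x=centers, y=sums, width=bin_width, name="Sum of prices"),
        secondary_y=False,
    )
    # One point per bin, at the bin midpoint, on the secondary (right) y-axis
    fig.add_trace(
        go.Scatter(x=centers, y=means, mode="lines+markers", name="Average price"),
        secondary_y=True,
    )
    fig.update_yaxes(title_text="Sum of prices", secondary_y=False)
    fig.update_yaxes(title_text="Average price", secondary_y=True)
    return fig
```

Calling build_figure(centers, sums, means, bin_width=10_000).show() would display the chart. In a two-column layout like the one in the question, the same idea should carry over by passing specs=[[{}, {"secondary_y": True}]] to make_subplots and adding row=1, col=2 to the add_trace calls.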
