Overlapping histograms in Seaborn


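I'm plotting two histograms with Seaborn's histplot function. The first histogram represents my entire dataset, and the second is a subset of the first. However, the second histogram doesn't seem to overlap the first the way I expected. Here is the code I'm using: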

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm

data = np.sin(np.arange(0, 6*np.pi, 0.1)) * 100
sns.scatterplot(x=[np.mean(data)], y=[0])
sns.lineplot(data)

population_size = 10000
sample_size = 100
total_means = []
for _ in range(population_size):
    # Mean of one random sample (drawn with replacement) of size sample_size
    total_means.append(np.mean(np.random.choice(data, sample_size)))

total_means = np.array(total_means)
sns.histplot(total_means, kde=True)

# Q. Find the interval in which 68% of the sample means lie
z1 = norm.ppf(.50 - .68/2)
se = np.array(data).std() / sample_size ** .5
x1 = z1 * se + np.array(data).mean()
z2 = norm.ppf(.50 + .68/2)
x2 = z2 * se + np.array(data).mean()
print(x1, x2)

plt.xticks(np.arange(total_means.min(), total_means.max(), 10))
plt.xticks(np.arange(0, 500, 100))
sns.histplot(total_means, kde=True)
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')

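I know Stack Overflow recommends against posting complete code. However, I've included all of it so the problem can be reproduced quickly with my data, without having to generate new data.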

The last two lines of my code plot the two histograms. However, it's clear from the resulting figure that they don't overlap as expected.

sns.histplot(total_means, kde=True)
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')

python matplotlib seaborn histogram central-limit-theorem
1 Answer

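I'm not sure I've understood your question correctly, but the code you posted does appear to produce overlapping plots. It's just that other things drawn on the same figure make them hard to make out.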

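Here is the same code cleaned up, with the scatter plot, the line plot, and the second plt.xticks call removed. Let me know if this is what you meant.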

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm

data = np.sin(np.arange(0, 6*np.pi, 0.1)) * 100

population_size = 10000
sample_size = 100
total_means = []
for _ in range(population_size):
    # Mean of one random sample (drawn with replacement) of size sample_size
    total_means.append(np.mean(np.random.choice(data, sample_size)))

total_means = np.array(total_means)

# Q. Find the interval in which 68% of the sample means lie
z1 = norm.ppf(.50 - .68/2)
se = np.array(data).std() / sample_size ** .5
x1 = z1 * se + np.array(data).mean()
z2 = norm.ppf(.50 + .68/2)
x2 = z2 * se + np.array(data).mean()


plt.xticks(np.arange(total_means.min(), total_means.max(), 10))
sns.histplot(total_means, kde=True, color='g')
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')

Hope this helps.
