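I am plotting two histograms with Seaborn's histplot function. The first histogram represents my entire dataset, and the second is a subset of the first. However, the second histogram does not seem to overlap the first as expected. Here is the code I am using: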
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm
data = np.sin(np.arange(0, 6*np.pi, 0.1)) * 100
sns.scatterplot(x=[np.mean(data)], y=[0])
sns.lineplot(data)
population_size = 10000
sample_size = 100
total_means = []
for x in range(population_size):
    total_means.append(np.mean(np.random.choice(data, 100)))
total_means = np.array(total_means)
sns.histplot(total_means, kde=True)
# Q. Find the interval in which 68% of the data will lie
z1 = norm.ppf(.50 - .68/2)
se = np.array(data).std() / sample_size ** .5
x1 = z1 * se + np.array(data).mean()
z2 = norm.ppf(.50 + .68/2)
x2 = z2 * se + np.array(data).mean()
print(x1, x2)
plt.xticks(np.arange(total_means.min(), total_means.max(), 10))
plt.xticks(np.arange(0, 500, 100))
sns.histplot(total_means, kde=True)
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')
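I know that posting complete code is discouraged on Stack Overflow. However, I have included it so the problem can be reproduced quickly without having to generate new data.
The last two lines of my code plot the two histograms. However, the resulting figure makes it clear that they do not overlap as expected.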
sns.histplot(total_means, kde=True)
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')
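I am not sure I understood your question correctly, but the code you provided does appear to produce overlapping plots. There are just a few other things that make them hard to tell apart.
Here is a cleaned-up version that behaves the same way. Let me know if this is what you meant.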
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm
data = np.sin(np.arange(0, 6*np.pi, 0.1)) * 100
population_size = 10000
sample_size = 100
total_means = []
for x in range(population_size):
    total_means.append(np.mean(np.random.choice(data, 100)))
total_means = np.array(total_means)
# Q. Find the interval in which 68% of the data will lie
z1 = norm.ppf(.50 - .68/2)
se = np.array(data).std() / sample_size ** .5
x1 = z1 * se + np.array(data).mean()
z2 = norm.ppf(.50 + .68/2)
x2 = z2 * se + np.array(data).mean()
plt.xticks(np.arange(total_means.min(), total_means.max(), 10))
sns.histplot(total_means, kde=True, color='g')
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')
Hope this helps.