Is there a problem with Python's plt.hist() method?

Question

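I have two dataframes named merged and initial. The second is a subset of the first. I am plotting a histogram of every column in both datasets to compare them. In the plots, the second dataframe appears to contain values that differ from the first, which should not happen because the second dataframe is a subset of the first. To check the column values, I printed them for both dataframes. For the fragC column I get [13.01 46.03 12.05 64.08 14.04] and [13.01 64.08]. As you can see, the second is a subset of the first. When I plot the histograms, I get this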

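OPERA is the second dataframe. The plot looks strange: the second dataframe seems to have values that are not present in the first, but that is not the case. I am using the following code for plotting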

for column in common_columns:
    # Exclude the excluded_columns from the comparison
    if column not in excluded_columns:
        print("")
        our_values = df1[column].values
        opera_values = df2[column].values
        print(column)
        print(our_values)
        print(opera_values)
        # Plot the distribution for df1 and df2
        plt.figure(figsize=(10, 6))
        plt.hist(df1[column], bins=20, alpha=0.5, label='our dataset')
        plt.hist(df2[column], bins=20, alpha=0.5, label='OPERA')
        plt.xlabel('Values')
        plt.ylabel('Frequency')
        plt.title(f'Distribution Comparison for Column: {column}')
        plt.legend()
        plt.tight_layout()
        plt.show()

The dataframes have a very large number of columns, so below I provide only the specific column

{0: 13.01, 1: 46.03, 2: 12.05, 3: 64.08, 4: 14.04}
{0: 13.01, 1: 64.08}

Answer 1

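The reason is that the bin spread is not the same. With bins=20, each call to plt.hist() builds its own 20 equal-width bins between the minimum and maximum of the data it is given. The first dataset gets 20 bins running from 12.05 to 64.08. The second dataset gets 20 bins running from 13.01 to 64.08. The bin edges therefore do not line up, so the bars of the two histograms are drawn at slightly different positions even where the underlying values are identical.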
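To make the mismatch concrete, here is a small sketch that computes the default bin edges for the two fragC arrays from the question. It uses np.histogram_bin_edges, which applies the same equal-width rule that plt.hist() uses for an integer bins value; the variable names are only illustrative.

```python
import numpy as np

# fragC values copied from the question
our_values = np.array([13.01, 46.03, 12.05, 64.08, 14.04])
opera_values = np.array([13.01, 64.08])

# With an integer `bins`, the edges span min(data)..max(data) of each array
our_edges = np.histogram_bin_edges(our_values, bins=20)
opera_edges = np.histogram_bin_edges(opera_values, bins=20)

# Each array has 21 edges (20 bins), but they start at different points
# and have different widths, so the bars cannot line up.
print("our dataset:", our_edges[0], "to", our_edges[-1],
      "width", our_edges[1] - our_edges[0])
print("OPERA:      ", opera_edges[0], "to", opera_edges[-1],
      "width", opera_edges[1] - opera_edges[0])
```

The value 13.01 appears in both arrays, but it falls into a bin that covers a different interval in each histogram, which is why the OPERA bars look shifted.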

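If you want the bins to start from 0, you need to specify that explicitly, for example through the range or bins argument of plt.hist().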
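One possible way to get comparable histograms is to compute a single set of bin edges and pass it to both calls. The sketch below assumes the two fragC arrays from the question and derives the edges from the combined data; this is one option, not the only one. Passing the same range=(0, upper_limit) together with bins=20 to both calls would work as well if the bins should start at 0.

```python
import numpy as np
import matplotlib.pyplot as plt

# fragC values copied from the question
our_values = np.array([13.01, 46.03, 12.05, 64.08, 14.04])
opera_values = np.array([13.01, 64.08])

# One set of 20 bins covering both datasets
shared_edges = np.histogram_bin_edges(
    np.concatenate([our_values, opera_values]), bins=20
)

plt.figure(figsize=(10, 6))
# Passing an array as `bins` makes plt.hist() use exactly these edges
our_counts, our_edges, _ = plt.hist(
    our_values, bins=shared_edges, alpha=0.5, label='our dataset')
opera_counts, opera_edges, _ = plt.hist(
    opera_values, bins=shared_edges, alpha=0.5, label='OPERA')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Distribution Comparison for Column: fragC')
plt.legend()
plt.tight_layout()
```

Call plt.show() afterwards to display the figure. Because both histograms now share the same edges, every OPERA bar sits exactly on top of a bar from the full dataset.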
