Anaconda / Jupyter Notebook / Python 3 KeyError: "Columns not found"

Question

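I am trying to make a bar chart with the code below using Anaconda / Python 3, but I keep getting an error. First I read in a dataset stored as an Excel / CSV file. The read itself works, because I can successfully produce the correlation plot and several other plots. I tried the same code with an Excel file and with a CSV file containing the same data, and I get the same error in both cases. When the program reaches the line that builds the bar chart, the following error appears: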

KeyError                                  Traceback (most recent call last)
<ipython-input-4-b72668216a77> in <module>
     64 
     65 # Compare mean and standard deviation between attributes
---> 66 compare = dataset.groupby("target_class")[['mean_profile', 'std_profile', 'kurtosis_profile', 'skewness_profile', 'mean_dmsnr_curve', 'std_dmsnr_curve', 'kurtosis_dmsnr_curve', 'skewness_dmsnr_curve']].mean().reset_index()
     67 # compare = dataset.groupby("target_class")[['mean_profile', 'std_profile', 'kurtosis_profile', 'skewness_profile', 'mean_dmsnr_curve', 'std_dmsnr_curve', 'kurtosis_dmsnr_curve', 'skewness_dmsnr_curve']].mean().reset_index()
     68 

~\Anaconda3\lib\site-packages\pandas\core\base.py in __getitem__(self, key)
    263                 bad_keys = list(set(key).difference(self.obj.columns))
    264                 raise KeyError("Columns not found: {missing}"
--> 265                                .format(missing=str(bad_keys)[1:-1]))
    266             return self._gotitem(list(key), ndim=2)
    267 

KeyError: "Columns not found: 'mean_dmsnr_curve', 'std_profile', 'std_dmsnr_curve', 'skewness_dmsnr_curve', 'mean_profile', 'kurtosis_profile', 'skewness_profile', 'kurtosis_dmsnr_curve'"

The other plots work fine, but the comparison bar chart does not. This is the code I am trying to run:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import itertools
warnings.filterwarnings("ignore")
# %matplotlib inline  (Jupyter magic; only valid inside a notebook cell)

print("This is a test right before excel file is read")

# Utilize the pandas library to read the data
dataset = pd.read_csv(r'C:\userpath\Machine Learning Project\pulsar_stars_test.csv')
#dataset = pd.read_excel(r'C:\userpath\Machine Learning Project\pulsar_stars_test.xlsx')


# Print the number of rows and columns that the data has to the user
print("This is the number of rows: ", dataset.shape[0])
print("This is the number of columns: ", dataset.shape[1])

# Use pandas to print out the information about the data
print("This is the data information: ", dataset.info())

# Use pandas to display information about missing data
print("This is the missing data: ", dataset.isnull().sum())

# Make a figure appear to display a dataset summary to the user
plt.figure(figsize = (12, 8))
sns.heatmap(dataset.describe()[1:].transpose(), annot = True, linecolor = "w", linewidth = 2, cmap = sns.color_palette("Set2"))
plt.title("Data Summary")
plt.show()

# Instantiate another figure to display some correlation data to the user
correlation = dataset.corr()
plt.figure(figsize = (10, 8))
sns.heatmap(correlation, annot = True, cmap = sns.color_palette("magma"), linewidth = 2, edgecolor = "k")
plt.title("CORRELATION BETWEEN VARIABLES")
plt.show()

# Compute the proportion of each target variable in the dataset
plt.figure(figsize = (12, 6))
plt.subplot(121)
ax = sns.countplot(y = dataset["target_class"], palette = ["r", "g"], linewidth = 1, edgecolor = "k"*2)

for i, j in enumerate(dataset["target_class"].value_counts().values):
    ax.text(.7, i, j, weight = "bold", fontsize = 27)

plt.title("Count for target variable in dataset")

plt.subplot(122)
plt.pie(dataset["target_class"].value_counts().values, labels = ["not pulsar stars", "pulsar stars"], autopct = "%1.0f%%", wedgeprops = {"linewidth":2, "edgecolor":"white"})
#plt.pie(data["target_class"].value_counts().values, labels = ["not pulsar stars", "pulsar stars"], autopct = "%1.0f%%", wedgeprops = {"linewidth":2, "edgecolor":"white"})
my_circ = plt.Circle((0,0), .7, color = "white")
plt.gca().add_artist(my_circ)
plt.subplots_adjust(wspace = .2)
plt.title("Proportion of target variable in dataset")
plt.show()

# Compare mean and standard deviation between attributes
compare = dataset.groupby("target_class")[['mean_profile', 'std_profile', 'kurtosis_profile', 'skewness_profile', 'mean_dmsnr_curve', 'std_dmsnr_curve', 'kurtosis_dmsnr_curve', 'skewness_dmsnr_curve']].mean().reset_index()
# compare = dataset.groupby("target_class")[['mean_profile', 'std_profile', 'kurtosis_profile', 'skewness_profile', 'mean_dmsnr_curve', 'std_dmsnr_curve', 'kurtosis_dmsnr_curve', 'skewness_dmsnr_curve']].mean().reset_index()

compare = compare.drop("target_class", axis = 1)
compare.plot(kind = "bar", width = 0.6, figsize = (13,6), colormap = "Set2")
plt.grid(True, alpha = 0.3)
plt.title("COMPARING MEAN OF ATTRIBUTES FOR TARGET CLASSES")

# Second comparison plot
# Assign the result to compare1 and use .std(), matching the plot title below
compare1 = dataset.groupby("target_class")[['mean_profile', 'std_profile', 'kurtosis_profile', 'mean_dmsnr_curve', 'std_dmsnr_curve','kurtosis_dmsnr_curve','skewness_dmsnr_curve']].std().reset_index()
compare1 = compare1.drop("target_class", axis = 1)
compare1.plot(kind = "bar", width = 0.6, figsize = (13, 6), colormap = "Set2")
plt.grid(True, alpha = 0.3)
plt.title("COMPARING STANDARD DEVIATION OF ATTRIBUTES FOR TARGET CLASSES")
plt.show()

The program fails at the line where the groupby function is first called (line 66). Does anyone know how to fix this?

Answer 1

    compare = pd.DataFrame(index=dataset.index)

You have to create a DataFrame and assign it to a variable first, and only then do the grouping.
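
Since the KeyError reports all eight requested names as missing while `target_class` is found, it may also be worth checking the exact column labels pandas read from the file. The following is only a sketch under assumptions: the small in-memory `dataset` and its headers (stray spaces, different capitalization) are hypothetical stand-ins for the real CSV, whose actual headers are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: headers with stray spaces and different
# capitalization, one common reason for "Columns not found".
dataset = pd.DataFrame({
    " Mean_Profile": [140.5, 102.5, 99.3, 30.1],
    " Std_Profile ": [55.6, 58.8, 41.5, 29.0],
    "target_class": [0, 0, 1, 1],
})

wanted = ["mean_profile", "std_profile"]

# Step 1: print the exact labels pandas sees (repr makes hidden spaces visible).
print([repr(c) for c in dataset.columns])

# Step 2: list the requested names that are not present verbatim.
missing = [c for c in wanted if c not in dataset.columns]
print("missing before cleanup:", missing)

# Step 3: normalize the labels (strip whitespace, lowercase), then re-check.
dataset.columns = dataset.columns.str.strip().str.lower()
missing_after = [c for c in wanted if c not in dataset.columns]
print("missing after cleanup:", missing_after)

# With matching labels, the groupby from the question no longer raises KeyError.
compare = dataset.groupby("target_class")[wanted].mean().reset_index()
print(compare)
```

If the real headers differ by more than whitespace or case (for example, long descriptive names), `dataset.rename(columns={...})` with an explicit mapping would be the analogous step.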
