Python KeyError for the x_col value in flow_from_dataframe, even though it references an existing column

Question

I have the following code:

    train_generator = aug.flow_from_dataframe(dataframe = z1,
                                              directory = None,
                                              x_col = 'id',
                                              y_col = 'label',
                                              class_mode = 'categorical',
                                              target_size = (mysize, mysize),
                                              shuffle = True,
                                              batch_size = mybatch,
                                              seed = 40)
    val_datagen = ImageDataGenerator(rescale = 1./255)
    val_generator = val_datagen.flow_from_dataframe(z2,
                                                   x_col = 'id',
                                                   y_col = 'label',
                                                   class_mode = 'categorical',
                                                   target_size = (mysize, mysize),
                                                   batch_size = mybatch,
                                                   seed = 41)
    test_datagen = ImageDataGenerator(rescale = 1./255)
    test_generator = test_datagen.flow_from_dataframe(testfiles,
                                                     x_col = 'id',
                                                     directory = None,
                                                     color_mode = 'rgb',
                                                     target_size = (mysize, mysize),
                                                     batch_size = 1,
                                                     class_mode = None,
                                                     shuffle = False,
                                                     seed = 42)

I have checked, and my DataFrames z1, z2 and testfiles all have a column labeled id. Here is the head() of each:

z1.head()

Output:
label   id
0   0.0 train/cat.5077.jpg
1   0.0 train/cat.2718.jpg
2   0.0 train/cat.10151.jpg
3   0.0 train/cat.3406.jpg
4   0.0 train/cat.4369.jpg

z2.head()

Output:
label   id
6   0.0 train/cat.8553.jpg
7   0.0 train/cat.9895.jpg
9   0.0 train/cat.6218.jpg
11  0.0 train/cat.12020.jpg
17  0.0 train/cat.10637.jpg

testfiles.head()

Output:
    id

testfiles is an empty DataFrame, but it does contain a column named id.

So the KeyError I get is confusing. It reads as follows:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
//anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'id'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-15-2a56a9e3c4bf> in <module>
     40                                                      class_mode = None,
     41                                                      shuffle = False,
---> 42                                                      seed = 42)

//anaconda3/lib/python3.7/site-packages/keras/preprocessing/image.py in flow_from_dataframe(self, dataframe, directory, x_col, y_col, weight_col, target_size, color_mode, classes, class_mode, batch_size, shuffle, seed, save_to_dir, save_prefix, save_format, subset, interpolation, validate_filenames, **kwargs)
    592             interpolation=interpolation,
    593             validate_filenames=validate_filenames,
--> 594             **kwargs
    595         )
    596 

//anaconda3/lib/python3.7/site-packages/keras/preprocessing/image.py in __init__(self, dataframe, directory, image_data_generator, x_col, y_col, weight_col, target_size, color_mode, classes, class_mode, batch_size, shuffle, seed, data_format, save_to_dir, save_prefix, save_format, subset, interpolation, dtype, validate_filenames)
    233             interpolation=interpolation,
    234             dtype=dtype,
--> 235             validate_filenames=validate_filenames)
    236 
    237 

//anaconda3/lib/python3.7/site-packages/keras_preprocessing/image/dataframe_iterator.py in __init__(self, dataframe, directory, image_data_generator, x_col, y_col, weight_col, target_size, color_mode, classes, class_mode, batch_size, shuffle, seed, data_format, save_to_dir, save_prefix, save_format, subset, interpolation, dtype, validate_filenames)
    144         if class_mode not in ["input", "multi_output", "raw", None]:
    145             self.classes = self.get_classes(df, y_col)
--> 146         self.filenames = df[x_col].tolist()
    147         self._sample_weight = df[weight_col].values if weight_col else None
    148 

//anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

//anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'id'

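Does anyone know what is going on here? As I understand it, a KeyError occurs when you reference a key that is not in a dictionary (here, a column that does not exist).

As far as I can tell, a similar problem was raised in another post, but it does not appear to have been resolved.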

Answer 1

The image data generator first validates that the files listed in your DataFrame all exist, and if there is nothing to validate, it raises a KeyError.

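If you look at the error traceback:

    # get labels for each observation
    if class_mode not in ["input", "multi_output", "raw", None]:
        self.classes = self.get_classes(df, y_col)
    self.filenames = df[x_col].tolist()  # <<< ERROR HERE
    self._sample_weight = df[weight_col].values if weight_col else None

This line converts the values in the x_col column of the DataFrame df into a list. Digging one level further:

    if self.columns.nlevels > 1:
        return self._getitem_multilevel(key)
    indexer = self.columns.get_loc(key)  # <<< ERROR HERE
    if is_integer(indexer):
        indexer = [indexer]

You can see that on this line the column named by key is what gets looked up. Passing an empty array [] is guaranteed to raise an error, because you would be using nothing as the reference.

Moreover, the image data generator only returns values, which means it is of no use when you feed it an empty array. To work around this, you can try creating a DataFrame that holds an empty string, testfiles = pd.DataFrame([''], columns = ['id']), to avoid hitting the KeyError.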
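To make the failure easier to diagnose, here is a minimal sketch of a check you could run on a DataFrame before handing it to flow_from_dataframe. It uses only pandas. The helper name check_generator_input is hypothetical and is not part of Keras or pandas. The sketch also shows that plain pandas indexing on an empty frame that declares the id column does not fail by itself, so the KeyError in the traceback presumably comes from what the generator does with the empty frame before line 146 is reached.

```python
import pandas as pd


def check_generator_input(df, x_col):
    """Hypothetical pre-flight check; not part of Keras or pandas.

    Raises a descriptive error up front instead of letting a bare
    KeyError surface from deep inside the generator.
    """
    if x_col not in df.columns:
        raise KeyError(f"column {x_col!r} not found; columns are {list(df.columns)}")
    if df.empty:
        raise ValueError(f"DataFrame has column {x_col!r} but no rows to read")
    return df[x_col].tolist()


# An empty frame that still declares the column, like testfiles in the question.
empty = pd.DataFrame(columns=["id"])
print("id" in empty.columns)  # the column is present
print(empty["id"].tolist())   # plain pandas indexing works on the empty frame

try:
    check_generator_input(empty, "id")
except ValueError as exc:
    print("caught:", exc)

# The workaround from the answer: a single placeholder row.
placeholder = pd.DataFrame([""], columns=["id"])
print(check_generator_input(placeholder, "id"))
```

If the check raises ValueError for testfiles, the fix is to populate the DataFrame with the real test file paths. The placeholder row only avoids the exception and gives the generator no usable images.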
