KeyError: 1738 when using a for loop in Python

Question

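The goal is to clean the data using stop words, stemming and so on. I have a list of statements, so I use a for loop to go through each row and clean it. If I run the steps on a single row it works fine, but when I use the for loop it raises a KeyError.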

Code:

corpus = []
for i in range(0, 2000):
  action = re.sub('[^a-zA-Z]', ' ', data1['Action'][i])
  action = action.lower()
  action = action.split()
  ps = PorterStemmer()
  action = [ps.stem(word) for word in action if not word in set(stopwords.words('english'))]
  action = ' '.join(action)
  corpus.append(action)

Error:

Traceback (most recent call last):

  File "<ipython-input-44-86c3af9b2191>", line 2, in <module>
    action = re.sub('[^a-zA-Z]', ' ', data1['Action'][i])

  File "C:\Users\bcpuser\anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Users\bcpuser\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))

  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 997, in pandas._libs.hashtable.Int64HashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1004, in pandas._libs.hashtable.Int64HashTable.get_item

KeyError: 1738

I believe this is some kind of syntax error?

Answer 1

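You are iterating over a fixed number. for i in range(0, 2000): doesn't care how many columns there are (there aren't 2000); it keeps going until the range is exhausted. Once the loop goes past the number of columns, you get a KeyError.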

We can create a very basic DataFrame to approximate your file:

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(df)

The output is:

   0  1  2
0  1  2  3
1  4  5  6

Note that the column names are just increasing integers. We can check this with print(list(df.columns)), which gives:

[0, 1, 2]

So, in this mini example, we will overshoot and try to access the columns by iterating over a range of 5 (similar to your iteration over 2000):

for x in range(5):
    print(df[x])

This prints the first three columns and then throws:

Traceback (most recent call last):

  File "D:\github\production_dashboard_v2\app\wip_planning\untitled0.py", line 6, in <module>
    print(df[x])

  File "C:\Users\jpilk\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\jpilk\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 352, in get_loc
    raise KeyError(key)

KeyError: 3

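This makes sense, because our default column names only go up to 2. Now, I suspect there is more to your actual problem (perhaps a MultiIndex), but the principle is the same. You can solve it by iterating with iloc and shape: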

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(df.shape)

for x in range(df.shape[1]):
    print(df.iloc[:, x])
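
The same thing can happen with row labels, which is what data1['Action'][i] looks up. Below is a minimal sketch, assuming (this is a guess, the question does not confirm it) that some rows were dropped or filtered before the loop, so the integer labels have gaps. The actions Series is a hypothetical stand-in for data1['Action'].

```python
import pandas as pd

# Hypothetical stand-in for data1['Action']; default labels are 0..4.
actions = pd.Series(["Open door", "Close door", "Reset pump", "Check valve", "Log out"])

# Dropping a row leaves a gap in the labels: 0, 1, 3, 4.
filtered = actions.drop(2)

try:
    filtered[2]  # label lookup: label 2 no longer exists
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access ignores labels; position 2 is now the row labelled 3.
third_by_position = filtered.iloc[2]
print(lookup_failed, third_by_position)
```

If this is what is going on, len(data1) would be smaller than the largest label, and data1.index would show which labels are missing.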

I'm not sure what you want to do with these columns, but this explains your error.
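
Applied to the loop in the question, one way to avoid the lookup by number altogether is to iterate over the values of the column. This is a rough sketch, assuming data1['Action'] holds strings. The small data1 frame and the STOP_WORDS set here are hypothetical stand-ins, and the NLTK stop-word and stemming steps are left out so the example runs without any downloads; they would go in the same place inside the loop.

```python
import re
import pandas as pd

# Hypothetical data with gaps in the row labels (0, 5, 9).
data1 = pd.DataFrame(
    {"Action": ["Opened the door!", "Closed 2 valves", "Reset the pump"]},
    index=[0, 5, 9],
)

# Placeholder stop-word set; the question uses stopwords.words('english').
STOP_WORDS = {"the", "a", "an"}

corpus = []
# Iterating the Series yields its values directly, whatever the labels are.
for text in data1["Action"]:
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    words = [w for w in words if w not in STOP_WORDS]
    corpus.append(" ".join(words))

print(corpus)
```

Because the loop never indexes by number, it handles however many rows exist and cannot run past the end.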
