KeyError: 1738 when using a for loop in Python

Question (votes: 3, answers: 1)

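The goal is to clean the data using stop-word removal, stemming, and so on. I have a list of statements, so I use a for loop to go through each row and clean it. If I run the steps on a single row, everything works, but when I run them inside the for loop, it raises a KeyError.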

Code:

corpus = []
for i in range(0, 2000):
  action = re.sub('[^a-zA-Z]', ' ', data1['Action'][i])
  action = action.lower()
  action = action.split()
  ps = PorterStemmer()
  action = [ps.stem(word) for word in action if not word in set(stopwords.words('english'))]
  action = ' '.join(action)
  corpus.append(action)

Error:

Traceback (most recent call last):

  File "<ipython-input-44-86c3af9b2191>", line 2, in <module>
    action = re.sub('[^a-zA-Z]', ' ', data1['Action'][i])

  File "C:\Users\bcpuser\anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Users\bcpuser\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))

  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 997, in pandas._libs.hashtable.Int64HashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1004, in pandas._libs.hashtable.Int64HashTable.get_item

KeyError: 1738

I suspect this is some kind of syntax error?

python data-cleaning text-processing
1 answer
0 votes

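You are iterating over a fixed number. The loop for i in range(0, 2000) does not care how many columns there are (there are not 2000); it keeps going until it has run through the whole range. Once the loop goes past the number of columns, you get a KeyError. In your code, data1['Action'][i] looks up the row label i in the Action column, so the same thing happens as soon as i is a label that does not exist in the index, which is what KeyError: 1738 reports.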
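The row-label case from the question can be reproduced in a few lines. This is only a sketch under assumptions: the three-row data1 below is made up, and dropping a row is just one possible way an integer index ends up with a missing label; the question does not say how the real data was prepared.

```python
import pandas as pd

# Hypothetical stand-in for data1: three rows, then one row removed,
# which leaves a gap in the integer row labels (0 and 2 remain).
data1 = pd.DataFrame({"Action": ["Open valve", "Close valve", "Check pump"]})
data1 = data1.drop(index=1)

missing = []
for i in range(3):
    try:
        data1["Action"][i]   # label-based lookup, as in the question
    except KeyError:
        missing.append(i)    # label i is not in the index

print(list(data1.index))
print(missing)
```

Checking list(data1.index), or whether 1738 in data1.index is True, on the real data should show whether a missing label is the cause.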


We can create a very basic DataFrame to approximate the file:

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(df)

The output is:

   0  1  2
0  1  2  3
1  4  5  6

Note that the column names are just incrementing integers. We can check this with print(list(df.columns)), which gives:

[0, 1, 2]

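So, in this mini example, we overshoot on purpose and try to access the columns by iterating over a range of 5 (similar to your iteration over 2000):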

for x in range(5):
    print(df[x])

This gets part of the way and then throws:

Traceback (most recent call last):

  File "D:\github\production_dashboard_v2\app\wip_planning\untitled0.py", line 6, in <module>
    print(df[x])

  File "C:\Users\jpilk\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\jpilk\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 352, in get_loc
    raise KeyError(key)

KeyError: 3

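This makes sense, because our default column names only go up to 2. I suspect there is more going on in your actual problem (perhaps a MultiIndex), but the principle is the same. You can fix it by iterating with iloc and shape: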

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(df.shape)

for x in range(df.shape[1]):
    print(df.iloc[:, x])

I am not sure what you want to do with these columns, but this explains your error.
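Applied to the row-wise loop in the question, the same principle could look like the sketch below. It rests on several assumptions: the sample data1 is invented, stop_words is a tiny hand-written set standing in for stopwords.words('english'), and the stemming step is left out so the example runs without the NLTK corpora. The point is only that the loop walks over the values that actually exist instead of a hard-coded range.

```python
import re
import pandas as pd

# Hypothetical data; dropping a row leaves a gap in the row labels.
data1 = pd.DataFrame(
    {"Action": ["Open the valve!", "Close the valve.", "Check pump #2"]}
)
data1 = data1.drop(index=1)

# Tiny illustrative stop-word set (an assumption, not the NLTK list).
stop_words = {"the", "a", "an"}

corpus = []
for text in data1["Action"]:   # iterate over the values, not over range(2000)
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    corpus.append(" ".join(w for w in words if w not in stop_words))

print(corpus)
```

If the loop counter is still needed, positional access with data1['Action'].iloc[i] over range(len(data1)), or renumbering the rows with data1.reset_index(drop=True), should avoid the missing-label lookup as well.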
