Resource punkt not found, but it is downloaded and installed


My DataFrame has the following columns:

Unnamed: 0, title, publication, author, year, month, title.1, content, len_article, gensim_summary, split_words, first_100_words

I am trying to run this short piece of code:

import nltk
nltk.download('punkt')
# TOKENIZE
df.first_100_words = df.first_100_words.str.lower()
df['tokenized_first_100'] = df.first_100_words.apply(lambda x: word_tokenize(x, language = 'en'))

The last line of code raises an error. This is the error message I get:

df.first_100_words = df.first_100_words.str.lower()
df['tokenized_first_100'] = df.first_100_words.apply(lambda x: word_tokenize(x, language = 'en'))
Traceback (most recent call last):

  File "<ipython-input-129-42381e657774>", line 2, in <module>
    df['tokenized_first_100'] = df.first_100_words.apply(lambda x: word_tokenize(x, language = 'en'))

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)

  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer

  File "<ipython-input-129-42381e657774>", line 2, in <lambda>
    df['tokenized_first_100'] = df.first_100_words.apply(lambda x: word_tokenize(x, language = 'en'))

  File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\tokenize\__init__.py", line 144, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)

  File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\tokenize\__init__.py", line 105, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))

  File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\data.py", line 868, in load
    opened_resource = _open(resource_url)

  File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\data.py", line 993, in _open
    return find(path_, path + ['']).open()

  File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\data.py", line 701, in find
    raise LookupError(resource_not_found)

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/en.pickle

  Searched in:
    - 'C:\\Users\\ryans/nltk_data'
    - 'C:\\Users\\ryans\\Anaconda3\\nltk_data'
    - 'C:\\Users\\ryans\\Anaconda3\\share\\nltk_data'
    - 'C:\\Users\\ryans\\Anaconda3\\lib\\nltk_data'
    - 'C:\\Users\\ryans\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
**********************************************************************

I am still quite new to tokenization in general.

The sample code comes from this site:

https://github.com/AustinKrause/Mod_5_Text_Summarizer/blob/master/Notebooks/Text_Cleaning_and_KMeans.ipynb

python python-3.x dataframe nltk tokenize
1 Answer
Just add:

nltk.download('punkt')
SENT_DETECTOR = nltk.data.load('tokenizers/punkt/english.pickle')
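
The traceback in the question shows NLTK trying to load tokenizers/punkt/en.pickle, while the file loaded above is english.pickle. So the likely cause is the language = 'en' argument rather than a missing download: the Punkt models are stored under full language names. Below is a minimal sketch of that fix, assuming the DataFrame from the question. The names PUNKT_NAMES, punkt_language and tokenize_first_100 are hypothetical and introduced here for illustration; they are not part of NLTK.

```python
# Hypothetical lookup table: short codes on the left, Punkt model names on the right.
PUNKT_NAMES = {"en": "english", "de": "german", "fr": "french", "es": "spanish"}


def punkt_language(code):
    """Return the Punkt model name for a short code; other values pass through lowercased."""
    return PUNKT_NAMES.get(code.lower(), code.lower())


def tokenize_first_100(df, language="en"):
    """Add a 'tokenized_first_100' column, similar to the code in the question."""
    # Imported here so punkt_language stays usable without NLTK installed.
    import nltk
    from nltk.tokenize import word_tokenize

    # Fetch the Punkt models; quiet=True suppresses the progress messages.
    # Assumption: if the LookupError names a different resource (newer NLTK
    # releases may ask for 'punkt_tab'), download the one it names instead.
    nltk.download("punkt", quiet=True)

    name = punkt_language(language)  # 'en' becomes 'english'
    lowered = df["first_100_words"].str.lower()
    df["tokenized_first_100"] = lowered.apply(
        lambda text: word_tokenize(text, language=name)
    )
    return df
```

Calling tokenize_first_100(df) should then look up english.pickle instead of en.pickle. Since 'english' is the default for word_tokenize, simply dropping the language argument from the original line would likely work as well.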