How do I use the `collocation_list` function on my own corpus in Python?

Question (votes: 1, answers: 1)

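I am new to Python and am trying to import my own corpus so that I can find collocations in its texts. I am using Python 3.7.5 and following the instructions in the textbook by Bird, Klein and Loper.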

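However, when I call `collocation_list` on the whole corpus, the interpreter raises "'ConcatenatedCorpusView' object has no attribute 'collocation_list'", and when I call it on a single text, it raises "'StreamBackedCorpusView' object has no attribute 'collocation_list'".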

What should I do to find collocations in the texts of my corpus?

I tried running `import nltk.collocations`, but of course that did not help...

>>> from nltk.corpus import PlaintextCorpusReader
>>> eng_corpus_root = r'D:\Corpus\EN'
>>> eng_corpus = PlaintextCorpusReader(eng_corpus_root, '.*')
>>> eng = eng_corpus.words()

>>> eng.collocation_list()
Traceback (most recent call last):
  File "<pyshell#39>", line 1, in <module>
    eng.collocation_list()
AttributeError: 'ConcatenatedCorpusView' object has no attribute 'collocation_list'

>>> eng1 = eng_corpus.words('CNN/2019.10.18_EN_CNN 2.txt')

>>> eng1.collocation_list()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    eng1.collocation_list()
AttributeError: 'StreamBackedCorpusView' object has no attribute 'collocation_list'

It would be great if I could get a result like this one (an example from the textbook mentioned above).

>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

>>> text4.collocation_list()
['United States', 'fellow citizens', 'four years', 'years ago', 'Federal Government', 'General Government', 'American people', 'Vice President', 'God bless', 'Chief Justice', 'Old World', 'Almighty God', 'Fellow citizens', 'Chief Magistrate', 'every citizen', 'one another', 'fellow Americans', 'Indian tribes', 'public debt', 'foreign nations']

Thank you very much for your help...

python attributes nltk corpus collocation
1 answer
0
votes

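Problem solved... I needed to initialize my corpus as an `nltk.text.Text` object, because `collocation_list` is a method of `Text`, not of the corpus views returned by `words()` (see: http://www.nltk.org/api/nltk.html#nltk.text.Text).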

>>> from nltk.text import Text
>>> text458 = Text(eng_corpus.words())
>>> text458.collocation_list()
['Hong Kong', 'United States', 'Getty Images', 'European Union', 'Northern Ireland', 'Boris Johnson', 'Prime Minister', 'Islamic State', 'Extinction Rebellion', 'Cape Dorset', 'extradition bill', 'Recep Tayyip', 'HONG KONG', 'Mike Pence', 'New York', 'Tayyip Erdogan', 'Democratic Forces', 'Vice President', 'Anthony Kwan', 'Kurdish fighters']

It is that simple.
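
For completeness, the `nltk.collocations` module mentioned in the question can also be used directly on a plain token sequence, without wrapping it in `Text`. The following is only a minimal sketch: the `tokens` list is a made-up stand-in for `eng_corpus.words()`, and the frequency threshold and the number of results are arbitrary choices for illustration, not values taken from the question.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Hypothetical inline token list; with the corpus from the question
# this would be eng_corpus.words() instead.
tokens = (
    "Hong Kong protests continued as Hong Kong officials met "
    "and Hong Kong residents watched"
).split()

# Count all adjacent word pairs (bigrams) in the token sequence.
finder = BigramCollocationFinder.from_words(tokens)

# Drop bigrams that occur fewer than 2 times (threshold chosen arbitrarily).
finder.apply_freq_filter(2)

# Rank the remaining bigrams by likelihood ratio and keep at most 5.
top = finder.nbest(BigramAssocMeasures.likelihood_ratio, 5)
print(top)
```

Unlike `Text.collocation_list()`, this route returns `(word1, word2)` tuples and leaves any filtering of stopwords or short words up to you, for example through `finder.apply_word_filter(...)`.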
