在python中创建(lemma, NER类型)的元组,Nlp问题。

问题描述 投票:0回答:1

我写了下面的代码,并为它做了一个字典,但我想创建元组的(lemma,NER类型)和收集计数的元组,我不知道如何做到这一点,你可以请你帮助我吗?NER类型是指名称实体识别

text = """
Seville.
Summers in the flamboyant Andalucían capital often nudge 40C, but spring is a delight, with the parks in bloom and the scent of orange blossom and jasmine in the air. And in Semana Santa (Holy Week, 14-20 April) the streets come alive with floats and processions. There is also the raucous annual Feria de Abril – a week-long fiesta of parades, flamenco and partying long into the night (4-11 May; expect higher hotel prices if you visit then).
Seville is a romantic and energetic place, with sights aplenty, from the Unesco-listed cathedral – the largest Gothic cathedral in the world – to the beautiful Alcázar royal palace. But days here are best spent simply wandering the medieval streets of Santa Cruz and along the river to La Real Maestranza, Spain’s most spectacular bullring.
Seville is the birthplace of tapas and perfect for a foodie break – join a tapas tour (try devoursevillefoodtours.com), or stop at the countless bars for a glass of sherry with local jamón ibérico (check out Bar Las Teresas in Santa Cruz or historic Casa Morales in Constitución). Great food markets include the Feria, the oldest, and the wooden, futuristic-looking Metropol Parasol.
Nightlife is, unsurprisingly, late and lively. For flamenco, try one of the peñas, or flamenco social clubs – Torres Macarena on C/Torrijano, perhaps – with bars open across town until the early hours.
Book it: In an atmospheric 18th-century house, the Hospes Casa del Rey de Baeza is a lovely place to stay in lively Santa Cruz. Doubles from £133 room only, hospes.com
Trieste.
"""

doc = nlp(text).ents
en = [(entity.text, entity.label_) for entity in doc]
en
#entities
#The list stored in variable entities is has type list[list[tuple[str, str]]],
#from pprint import pprint
pprint(en)

sum(filter(None, entities), [])

from collections import defaultdict
type2entities = defaultdict(list)
for entity, entity_type in sum(filter(None, entities), []):
  type2entities[entity_type].append(entity)

from pprint import pprint
pprint(type2entities)


python nlp nltk
1个回答
1
投票

我希望下面的代码片段能解决你的问题。

import spacy

# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")

text = ("Seville.
Summers in the flamboyant Andalucían capital often nudge 40C, but spring is a delight, with the parks in bloom and the scent of orange blossom and jasmine in the air. And in Semana Santa (Holy Week, 14-20 April) the streets come alive with floats and processions. There is also the raucous annual Feria de Abril – a week-long fiesta of parades, flamenco and partying long into the night (4-11 May; expect higher hotel prices if you visit then).
Seville is a romantic and energetic place, with sights aplenty, from the Unesco-listed cathedral – the largest Gothic cathedral in the world – to the beautiful Alcázar royal palace. But days here are best spent simply wandering the medieval streets of Santa Cruz and along the river to La Real Maestranza, Spain’s most spectacular bullring.
Seville is the birthplace of tapas and perfect for a foodie break – join a tapas tour (try devoursevillefoodtours.com), or stop at the countless bars for a glass of sherry with local jamón ibérico (check out Bar Las Teresas in Santa Cruz or historic Casa Morales in Constitución). Great food markets include the Feria, the oldest, and the wooden, futuristic-looking Metropol Parasol.
Nightlife is, unsurprisingly, late and lively. For flamenco, try one of the peñas, or flamenco social clubs – Torres Macarena on C/Torrijano, perhaps – with bars open across town until the early hours.
Book it: In an atmospheric 18th-century house, the Hospes Casa del Rey de Baeza is a lovely place to stay in lively Santa Cruz. Doubles from £133 room only, hospes.com
Trieste.")

doc = nlp(text)

lemma_ner_list = [] 

for entity in doc.ents: 
   lemma_ner_list.append((entity.lemma_, entity.label_)) 

# print list of lemma ner tuples

print(lemma_ner_list)

# print count of tuples

print(len(lemma_ner_list))
© www.soinside.com 2019 - 2024. All rights reserved.