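I'm currently building a system that takes a few words as input and replaces each word with its synonym if one is found in an XML file.
Here is the code: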
def wordproc(self, word):
    lmtzr = nltk.WordNetLemmatizer()
    tokens = nltk.word_tokenize(word)
    tokens_lemma = [lmtzr.lemmatize(tokens) for tokens in tokens]
    tagged = nltk.pos_tag(tokens)
    chunking = nltk.chunk.ne_chunk(tagged)
    important_words = []
    unimportant_tags = ['MD', 'TO', 'DT', 'JJR', 'CC', 'VBZ']
    for x in chunking:
        if x[1] not in unimportant_tags:
            important_words.append(x[0])
    print(important_words)
    self.words = (important_words)
    print(self.words)
    self.loop = len(self.words)
    self.xmlparse(self.words, self.loop)

def xmlparse(self, words, loops):
    root = ElementTree.parse('data/word-test.xml').getroot()
    for i in range(loops):
        syn_loc = [word for word in root.findall('word') if word.findtext('mainword') == words]
        for nym in syn_loc:
            print(nym.attrib)
            word_loop = self.loop
            new_word = (nym.findtext('synonym'))
            words = new_word
            print(words)
    vf = videoPlay()
    vf.moviepy(words)
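When the words from wordproc are passed to the xmlparse function, it doesn't work. Any guidance? Or am I missing a key point? Any help would be great!
Edit: here is a short excerpt of the XML file: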
<synwords>
  <word>
    <mainword>affection</mainword>
    <wordtag>N</wordtag>
    <synonym>love</synonym>
  </word>
  <word>
    <mainword>sweetie</mainword>
    <wordtag>N</wordtag>
    <synonym>love</synonym>
  </word>
  <word>
    <mainword>appreciation</mainword>
    <wordtag>N</wordtag>
    <synonym>love</synonym>
  </word>
  <word>
    <mainword>beloved</mainword>
    <wordtag>N</wordtag>
    <synonym>love</synonym>
  </word>
  <word>
    <mainword>emotion</mainword>
    <wordtag>N</wordtag>
    <synonym>love</synonym>
  </word>
</synwords>
The result I expect: given
words = ["beloved", "sweetie", "affection"]
the result, after comparing against the XML, would be
words = ["love", "love", "love"]
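Instead of searching the XML and parsing it again for every word, I suggest mapping your words to their synonyms in a Python dictionary; then you can easily look up or manipulate whatever you need. I used BeautifulSoup to parse the XML below: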
from bs4 import BeautifulSoup

xml = """<synwords>
  <word>
    <mainword>affection</mainword>
    <wordtag>N</wordtag>
    <synonym>love</synonym>
  </word>
  .
  .
  .
</synwords>"""

soup = BeautifulSoup(xml, "html.parser")  # xml is your XML content
words = soup.find_all('word')
# Map each main word to its synonym
mapped_dict = {word.find("mainword").text: word.find("synonym").text for word in words}
print(mapped_dict)
Output:
{'sweetie': 'love', 'beloved': 'love', 'appreciation': 'love', 'affection': 'love', 'emotion': 'love'}
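To connect this back to the question: in the original xmlparse, each mainword string is compared against the whole words list, so the comparison is never true and no synonym is found. Below is a minimal sketch of the dictionary approach using the standard-library ElementTree that the question already uses, instead of BeautifulSoup. The helper names build_synonym_map and replace_with_synonyms are hypothetical, and the XML is inlined as a string (shortened to three entries from the question's excerpt) rather than read from data/word-test.xml. Words with no entry are assumed to stay unchanged.

```python
import xml.etree.ElementTree as ElementTree

# Sample data taken from the question's XML excerpt (three entries only).
SAMPLE_XML = """<synwords>
  <word><mainword>affection</mainword><wordtag>N</wordtag><synonym>love</synonym></word>
  <word><mainword>sweetie</mainword><wordtag>N</wordtag><synonym>love</synonym></word>
  <word><mainword>beloved</mainword><wordtag>N</wordtag><synonym>love</synonym></word>
</synwords>"""


def build_synonym_map(root):
    """Map each <mainword> text to its <synonym> text (hypothetical helper)."""
    return {
        entry.findtext("mainword"): entry.findtext("synonym")
        for entry in root.findall("word")
    }


def replace_with_synonyms(words, synonym_map):
    """Return a new list; words without an entry are kept unchanged (assumption)."""
    return [synonym_map.get(word, word) for word in words]


root = ElementTree.fromstring(SAMPLE_XML)
synonym_map = build_synonym_map(root)

words = ["beloved", "sweetie", "affection"]
result = replace_with_synonyms(words, synonym_map)
print(result)  # ['love', 'love', 'love']
```

With a real file, ElementTree.parse('data/word-test.xml').getroot() could stand in for ElementTree.fromstring(SAMPLE_XML). Building the dictionary once means each word is a single lookup rather than a scan of the XML tree.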