Text in the output file is being overwritten

Problem description
from urllib import request
from redditscore.tokenizer import CrazyTokenizer
tokenizer = CrazyTokenizer()
url = "http://www.site.uottawa.ca/~diana/csi5386/A1_2020/microblog2011.txt"
for line in request.urlopen(url):
    tokens = tokenizer.tokenize(line.decode('utf-8'))
    #print(tokens)
with open('your_file.txt', 'a') as f:
    print(tokens)
    for item in tokens:
        f.write("%s\n" % item)

In the code above, the output is a list of tokens stored in the variable tokens. When I try to print the output to a file, the text gets overwritten and I end up with only the last line of output.

Please help.

Tags: python, nlp, tokenize
2 Answers
0 votes

Each time the loop below runs, you create a new tokens object and bind it to the same name, so the previous value is overwritten:

for line in request.urlopen(url):
    tokens = tokenizer.tokenize(line.decode('utf-8'))

So it is better to collect the tokens in a list instead:

from urllib import request
from redditscore.tokenizer import CrazyTokenizer
tokenizer = CrazyTokenizer()
url = "http://www.site.uottawa.ca/~diana/csi5386/A1_2020/microblog2011.txt"

tokens = []  # accumulate tokens from every line here
for line in request.urlopen(url):
    # extend flattens each line's token list into one flat list
    tokens.extend(tokenizer.tokenize(line.decode('utf-8')))

with open('your_file.txt', 'a') as f:  # note: 'a' appends across runs
    print(tokens)
    for item in tokens:
        f.write("%s\n" % item)
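
As a side note, if the input is large you can skip the accumulator entirely and write tokens out as each line is read. A minimal sketch, using the same URL and tokenizer as above and assuming you want one token per output line:

from urllib import request
from redditscore.tokenizer import CrazyTokenizer

tokenizer = CrazyTokenizer()
url = "http://www.site.uottawa.ca/~diana/csi5386/A1_2020/microblog2011.txt"

# Open the output once ('w' starts a fresh file each run) and stream tokens
# to it, so the full token list never has to fit in memory.
with open('your_file.txt', 'w') as f:
    for line in request.urlopen(url):
        for token in tokenizer.tokenize(line.decode('utf-8')):
            f.write("%s\n" % token)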

0 votes

This happens because you are overwriting the tokens variable inside the loop. Try this:

tokens = []
for line in request.urlopen(url):
    # append keeps one sub-list of tokens per input line
    tokens.append(tokenizer.tokenize(line.decode('utf-8')))

with open('your_file.txt', 'a') as f:
    print(tokens)
    for item in tokens:  # item is the token list for one line
        for token in item:
            f.write("%s\n" % token)
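
Note the difference from the first answer: append stores each line's token list as a single element (a list of lists), which is why the nested loop is needed when writing, while extend adds the tokens themselves. A quick illustration:

words = []
words.append(['a', 'b'])  # words == [['a', 'b']]            (nested list)
words.extend(['c', 'd'])  # words == [['a', 'b'], 'c', 'd']  (flat items)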