from urllib import request
from redditscore.tokenizer import CrazyTokenizer

tokenizer = CrazyTokenizer()
url = "http://www.site.uottawa.ca/~diana/csi5386/A1_2020/microblog2011.txt"
for line in request.urlopen(url):
    tokens = tokenizer.tokenize(line.decode('utf-8'))
    #print(tokens)

with open('your_file.txt', 'a') as f:
    print(tokens)
    for item in tokens:
        f.write("%s\n" % item)
In the code above, my output is a list of tokens stored in the variable tokens. When I try to print the output to a file, the text gets overwritten and I only end up with the last line of output in the file.
Please help..
Every time the loop below runs, you create a new value for tokens, so the previous one is overwritten:

for line in request.urlopen(url):
    tokens = tokenizer.tokenize(line.decode('utf-8'))

So it is better to collect the tokens in a list:
from urllib import request
from redditscore.tokenizer import CrazyTokenizer

tokenizer = CrazyTokenizer()
url = "http://www.site.uottawa.ca/~diana/csi5386/A1_2020/microblog2011.txt"
tokens = []
for line in request.urlopen(url):
    tokens.extend(tokenizer.tokenize(line.decode('utf-8')))
    #print(tokens)

with open('your_file.txt', 'a') as f:
    print(tokens)
    for item in tokens:
        f.write("%s\n" % item)
This happens because you are overwriting the tokens variable inside the loop. Try this:
tokens = []
for line in request.urlopen(url):
    tokens.append(tokenizer.tokenize(line.decode('utf-8')))

with open('your_file.txt', 'a') as f:
    print(tokens)
    for item in tokens:
        for token in item:
            f.write("%s\n" % token)