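I have a text file of movie reviews in which each newline ('\n') marks a new movie, i.e. a new document. However, I am having trouble reading them into a sequence like the one shown below.
Given this sample text file: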
chemistry leads outstanding another story white people learn black people humanity
trappings green book on
already visionary director coogler outdone film fits larger marvel universe
innovative directors stretching across multiple
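The first two lines make up the first document, and the next two lines make up the second document.
The goal is to convert these sentences into a "list of lists", like this: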
[[chemistry, leads, outstanding, another, story, white, people, learn, black, people, humanity, trappings, green, book, on]
, [already, visionary, director, coogler, outdone, film, fits, larger, marvel, universe, innovative, directors, stretching, across, multiple]]
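My problem is that I don't know how to detect where one document ends and the next begins, and then append each document as a new list inside the existing list. Can anyone help? Thanks in advance.
I'll be standing by to answer any follow-up questions. Thanks.
More info: I write to the file with the following code: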
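if count == 1:
    # First write: append the reviews without a leading newline.
    with open('moviedata1.txt', 'a') as f:
        for item in reviews:
            f.write(item)
else:
    if page == 1:
        # First page of a later movie: start a new line, then append.
        with open('moviedata1.txt', 'a') as f:
            f.write('\n')
            for item in reviews:
                f.write(item)
    else:
        # Later pages: keep appending to the current line.
        with open('moviedata1.txt', 'a') as f:
            for item in reviews:
                f.write(item)
# The with block closes the file, so no explicit f.close() is needed.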
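You can use zip to pair each even-indexed line with the odd-indexed line that follows it, and iterate over the pairs: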
s = ''' chemistry leads outstanding another story white people learn black people humanity
trappings green book on
already visionary director coogler outdone film fits larger marvel universe
innovative directors stretching across multiple'''
lst = []
splitted = s.split('\n')
for x, y in zip(splitted[::2], splitted[1::2]):
    lst.append(x.split() + y.split())
print(lst)
# [['chemistry', 'leads', 'outstanding', 'another', 'story', 'white', 'people', 'learn', 'black', 'people', 'humanity', 'trappings', 'green', 'book', 'on'],
# ['already', 'visionary', 'director', 'coogler', 'outdone', 'film', 'fits', 'larger', 'marvel', 'universe', 'innovative', 'directors', 'stretching', 'across', 'multiple']]
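The snippet above works on a hard-coded string. To apply the same idea to the file itself, a minimal sketch could look like the following. It assumes, as in the example, that every document spans exactly two lines; the helper names group_lines and read_documents and the lines_per_doc parameter are hypothetical and not part of the original code. Unlike zip, this version does not silently drop a trailing line when the line count is odd.

```python
def group_lines(lines, lines_per_doc=2):
    """Group consecutive lines into documents and split each document into words.

    lines_per_doc=2 is an assumption taken from the example in the question.
    """
    # Skip blank lines so a stray empty line does not shift the grouping.
    lines = [line for line in lines if line.strip()]
    documents = []
    for start in range(0, len(lines), lines_per_doc):
        tokens = []
        for line in lines[start:start + lines_per_doc]:
            tokens.extend(line.split())  # split() also strips the trailing '\n'
        documents.append(tokens)
    return documents


def read_documents(path, lines_per_doc=2):
    """Read the file at path and return one list of words per document."""
    with open(path, encoding="utf-8") as f:
        return group_lines(f, lines_per_doc)
```

With the file from the question, this would be called as read_documents('moviedata1.txt'). If the writer is later changed so that each movie really occupies a single line, passing lines_per_doc=1 should be enough.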