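In Python 3, starting from an existing .txt file containing lyrics, subtitles, or other text, I want to build a simple flat list (no nesting) of the words it contains, without spaces or other punctuation.
Based on other StackExchange questions, I came up with this: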
import csv
crimefile = open('she_loves_you.txt', 'r')
reader = csv.reader(crimefile)
allRows = list(reader) # result is a list with nested lists
ultimate = []
for i in allRows:
    ultimate += i # result is a list with elements longer than one word
ultimate2 = []
for i in ultimate:
    ultimate2 += i # result is a list with elements which are single letters
The result I am hoping for is:
['She', 'loves', 'you', 'yeah', 'yeah', 'yeah', 'She', 'loves', 'you', ...]
================================================= ======================
It would also be interesting to understand why this code (run as an extension of the code above):
import re
print (re.findall(r"[\w']+", ultimate))
produces the following error:
Traceback (most recent call last):
  File "4.4.4.csv.into.list.py", line 72, in <module>
    print (re.findall(r"[\w']+", ultimate))
  File "/usr/lib/python3.7/re.py", line 223, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
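The error message is completely clear: "expected string or bytes-like object". It means that ultimate has to be converted to a string (str) before it is passed to re.findall. If you check the type of ultimate, you will see that it is a list object.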
>>> type(ultimate)
<class 'list'>
# or
>>> type([])
<class 'list'>
In your case:
print (re.findall(r"[\w']+", str(ultimate))) # original text
# or
print (re.findall(r"[\w']+", ' '.join(ultimate))) # joined words
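The difference between the two calls above can be checked with a small sketch. The list below is a hypothetical stand-in for what csv.reader would produce from one lyric line, not the real file contents. Note that str(ultimate) is the printed form of the list, so the quotes it adds around each element are matched by the ' in the pattern, while ' '.join(ultimate) contains only the text itself.

```python
import re

# Hypothetical sample standing in for the flattened csv rows.
ultimate = ["She loves you", " yeah", " yeah", " yeah"]

# Passing the list directly reproduces the TypeError from the question.
try:
    re.findall(r"[\w']+", ultimate)
except TypeError as exc:
    error_message = str(exc)

# str() gives the list's printed form, including the quotes around each item.
from_repr = re.findall(r"[\w']+", str(ultimate))

# join() gives plain text, so only the words themselves are matched.
words = re.findall(r"[\w']+", " ".join(ultimate))

print(error_message)
print(from_repr)
print(words)
```

For a flat list of clean words, the joined version is probably the one you want.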
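Try this:
import csv
with open('she_loves_you.txt', 'r') as crimefile:
    reader = csv.reader(crimefile)
    allRows = list(reader) # result is a list with nested lists
ultimate = []
for row in allRows:
    for field in row: # each row is a list of strings, so split every field
        ultimate += field.split() # split() with no argument skips empty strings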
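Since lyrics are plain text rather than comma-separated data, the csv module may not be needed at all. Here is a minimal sketch, assuming the whole file fits in memory; the helper names words_from_text and words_from_file and the utf-8 encoding are assumptions for illustration, not something from the question.

```python
import re

def words_from_text(text):
    """Return a flat list of words, keeping apostrophes inside words."""
    return re.findall(r"[\w']+", text)

def words_from_file(path):
    """Read the whole file and split it into words (utf-8 is an assumption)."""
    with open(path, "r", encoding="utf-8") as handle:
        return words_from_text(handle.read())

# Hypothetical sample text standing in for she_loves_you.txt.
sample = "She loves you, yeah, yeah, yeah\nShe loves you, yeah, yeah, yeah\n"
print(words_from_text(sample))
```

With the real file you would call words_from_file('she_loves_you.txt'). The pattern treats commas, spaces, and line breaks alike as separators, so no nested list is ever created.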