I have the following string:
txt='agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
These are the delimiters:
delimiters = " \t,;.?!-:@[](){}_*/"
As output, I want this list of values:
"agadsfa","2asdf","sdfsaf","asfsadf","adsf","klnalfk","jn234kmafs","adfs","nlnawr23"
I tried using a regular expression:
re.split(delimiters,txt)
But I got this error:
re.error: unterminated character set at position 10
What is the problem here?
The regular expression is not correct. Also, from your comments, you've added the requirement that the delimiters string must not be changed.
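A quick way to see what the error message means, as a minimal sketch: re.split() compiles its first argument as a regular expression, so the delimiters string is parsed as a pattern rather than as a list of characters. The re.error exception carries the failing index in its pos attribute; index 10 is the '[' that opens a character set which is never closed (a ']' placed right after '[' counts as a literal, not as the closing bracket).

```python
import re

delimiters = " \t,;.?!-:@[](){}_*/"

# re.split(delimiters, txt) compiles delimiters as a pattern first,
# so compiling it directly reproduces the same error.
try:
    re.compile(delimiters)
    failure = None
except re.error as e:
    # e.msg is the message without the position; e.pos is where parsing failed
    failure = (e.msg, e.pos)
    print(f"{e.msg} at position {e.pos}: {delimiters[e.pos]!r}")
```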
So what we need to do is process that string and convert it into a proper regular expression that can be passed to split(). Here's how:
import re

# need to enclose regex in [], we want to split on any of
# the chars; also some of the chars need to be escaped
delimiters = ' \t,;.?!-:@[](){}_*/'
# escape ']' (it would close the set early) and '-' (it would form a range such as '!-:')
regex = delimiters.replace(']', r'\]').replace('-', r'\-')
regex = r'[{}]+'.format(regex)
The result is as expected:
txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
re.split(regex, txt)
=> ['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
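A more general variant of the same idea, offered as a substitute for the hand-written escaping rather than as part of the original answer: escape every delimiter with re.escape(), so the character class stays valid whatever the delimiters string contains (for example a backslash or '^'). split_on_chars is a hypothetical helper name.

```python
import re

def split_on_chars(text, delimiters):
    """Split text on runs of any character in delimiters (hypothetical helper).

    Assumes delimiters is non-empty; '[]+' would not be a valid pattern.
    """
    # re.escape() backslash-escapes special characters such as ']', '-',
    # '\\' and '^', so each one is treated literally inside [...].
    char_class = "[" + "".join(re.escape(c) for c in delimiters) + "]+"
    # Drop empty strings that appear when text starts or ends with a delimiter.
    return [part for part in re.split(char_class, text) if part]

txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
delimiters = " \t,;.?!-:@[](){}_*/"
print(split_on_chars(txt, delimiters))
```

The delimiters string itself is left untouched, which fits the requirement from the comments.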
Python 3 code:
import re
txt="agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"
# alternation of the delimiters that actually occur in txt (a subset of the full list)
delimiters = r"_|;|,|\)|\(|\[|\]"
print(list(filter(None, re.split(delimiters, txt))))
Output:
['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
Separate the delimiters with | and use Python's built-in filter() function to discard the empty strings.
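To see why the filtering step is needed: when two delimiters are adjacent, such as '_(' in the input, re.split() returns an empty string between them. The sketch below assumes the same txt and, as an addition not found in the original answer, builds the alternation from the delimiter characters with "|".join() and re.escape() instead of writing it by hand.

```python
import re

txt = "agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

# One alternative per delimiter; re.escape() handles '(', ')', '[' and ']'.
delimiters = "|".join(re.escape(c) for c in "_;,()[]")

raw = re.split(delimiters, txt)
print(raw[:3])        # the '_(' pair leaves an empty string between them

# filter(None, ...) keeps only truthy items, i.e. the non-empty strings.
words = list(filter(None, raw))
print(words)
```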
You have to use | to separate your delimiters:
import re

delimiters = r' |\t|,|;|\.|\?|!|-|:|@|\[|\]|\(|\)|\{|\}|_|\*|/'
# then use this to eliminate empty strings if you have two delimiters next to each other
print([w for w in re.split(delimiters,txt) if w])
# or list(filter(lambda a: a, re.split(delimiters,txt)))
The result is:
['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
Try this:
import re
txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"
line = re.sub(
    r"[ \t,;\.?!\-:@\[\](){}_*/]+",  # one or more delimiter characters in a row
    r",",                            # replace each run with a single comma
    txt
)
print(line.split(","))
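A note on the substitution approach, as a small sketch: because ',' is itself in the character class, every comma left after re.sub() marks exactly one delimiter run, so splitting on ',' gives the same pieces as calling re.split() on the same pattern directly. A delimiter at the very start or end of the text still produces an empty string. This answer's txt also adds '?', '!', a space and '*', so it yields twelve pieces rather than the nine from the question.

```python
import re

txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"
pattern = r"[ \t,;\.?!\-:@\[\](){}_*/]+"

via_sub = re.sub(pattern, ",", txt).split(",")
via_split = re.split(pattern, txt)
print(via_sub == via_split)   # True: both produce the same pieces
print(via_split)

# A leading or trailing delimiter leaves an empty string at that end.
print(re.split(pattern, "(abc)"))   # ['', 'abc', '']
```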