Most efficient way to remove characters forbidden in filenames from a Unicode string [duplicate]

Question

This question already has answers here:

How to remove bad path characters in Python? (4 answers)

I have a string containing some data that I parsed from the web, and I create a file named after this data.

import urllib

string = urllib.urlopen("http://example.com").read()
f = open(path + "/" + string + ".txt", "w")  # open for writing
f.write("abcdefg")
f.close()

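The problem is that it may contain one of the following characters: \ / * ? : " < > |. I am on Windows, where these characters are forbidden in filenames. Also, string is in Unicode format, which makes most solutions useless.

So my question is: what is the most efficient / Pythonic way to strip these characters? Thanks in advance!

Edit: the filename is in Unicode format, not str!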

Answer 1

The fastest way is to use unicode.translate; see the documentation for unicode.translate.

In [31]: _unistr = u'sdfjkh,/.,we/.,132?.?.23490/,/'  # any random string

In [48]: remove_punctuation_map = dict((ord(char), None) for char in '\\/*?:"<>|')

In [49]: _unistr.translate(remove_punctuation_map)
Out[49]: u'sdfjkh,.,we.,132..23490,'

To remove all punctuation instead:

In [45]: import string

In [46]: remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)

In [47]: _unistr.translate(remove_punctuation_map)
Out[47]: u'sdfjkhwe13223490'
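
For anyone reading this on Python 3, where str is already Unicode, the same translate idea might look roughly like the sketch below. The names FORBIDDEN and sanitize_filename are made up for illustration, and the character list is just the one from the question, so treat it as a starting point and not a complete Windows filename validator.

```python
# Characters the question lists as forbidden in Windows file names.
FORBIDDEN = '\\/*?:"<>|'

# Build the table once; mapping a code point to None deletes that character.
_DELETE_FORBIDDEN = {ord(ch): None for ch in FORBIDDEN}


def sanitize_filename(name):
    """Hypothetical helper: return name with the forbidden characters removed."""
    return name.translate(_DELETE_FORBIDDEN)


print(sanitize_filename('sdfjkh,/.,we/.,132?.?.23490/,/'))
```

Building the table once and reusing it avoids rebuilding the dictionary for every filename.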

Answer 2

We don't know what your data looks like, but you can use re.sub:

import re
# Pass the variable itself, not the literal text "your_string".
your_string = re.sub(r'[\\/*?:"<>|]', "", your_string)
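
If many filenames need cleaning, one possible variation is to compile the pattern once and wrap it in a small function. This is only a sketch in Python 3 syntax; strip_forbidden and its replacement parameter are hypothetical names, not part of the re module.

```python
import re

# Same character class as above, compiled once so it can be reused.
_FORBIDDEN_RE = re.compile(r'[\\/*?:"<>|]')


def strip_forbidden(name, replacement=''):
    """Hypothetical helper: replace each forbidden character with replacement."""
    return _FORBIDDEN_RE.sub(replacement, name)


print(strip_forbidden('report: draft?/final'))
print(strip_forbidden('report: draft?/final', '_'))
```

Passing a replacement such as '_' keeps word boundaries visible in the resulting name, which may or may not be what you want.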
