Removing the b prefix in encoded text in Python

Question

text = "hello world what is happening"
encodedText = text.encode('utf-16') #Encoding the input text
textReplaced = encodedText.replace('h'.encode('utf-16'), 'Q'.encode('utf-16')) #Doing the replacement of an encoded character by another encoded character

print('Input : ', text)
print('Expected Output : Qello world wQat is Qappening')
print('Actual Output : ', textReplaced.decode('utf-16'))
print('Encoded h : ', 'h'.encode('utf-16'))
print('Encoded Q : ', 'Q'.encode('utf-16'))
print('Encoded Actual Output : ', textReplaced)

Output:

Input :  hello world what is happening
Expected Output : Qello world wQat is Qappening
Actual Output :  Qello world what is happening
Encoded h :  b'\xff\xfeh\x00'
Encoded Q :  b'\xff\xfeQ\x00'
Encoded Actual Output :  b'\xff\xfeQ\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x00w\x00h\x00a\x00t\x00 \x00i\x00s\x00 \x00h\x00a\x00p\x00p\x00e\x00n\x00i\x00n\x00g\x00'

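As I understand it, the problem with the code is that every encoded string or character carries the prefix b', so the replacement only happens for the first occurrence in the encoded input.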

Answer 1

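The problem is that the replacement bytes include the byte order mark (b'\xff\xfe'), which is present only at the beginning of the byte string. If you must do the replacement on bytes rather than on str, you need to encode the replacement bytes without the BOM, using the UTF-16 encoding that matches the endianness of your system (or that of the bytes, which may be different).

Assuming the endianness of the bytes is that of your system, this will work: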
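To make the BOM point concrete, here is a small sketch comparing the generic 'utf-16' codec with the explicit 'utf-16-le' codec. The variable names are illustrative only and are not part of the original question.

```python
import codecs

bom_encoded = 'h'.encode('utf-16')     # generic codec: BOM followed by the code unit
le_encoded = 'h'.encode('utf-16-le')   # explicit endianness: no BOM is written

# The generic codec prepends a BOM in the platform's native byte order.
assert bom_encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
assert le_encoded == b'h\x00'

# A search pattern that starts with the BOM can only match at offset 0,
# which is why only the first 'h' was replaced in the question.
encoded_text = "hello world what is happening".encode('utf-16')
match_count = encoded_text.count(bom_encoded)
print(match_count)  # 1
```

The b'...' shown when printing is only how Python displays a bytes object; it is not part of the data and plays no role in the failed replacement.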

>>> import sys
>>> enc = 'utf-16-le' if sys.byteorder == 'little' else 'utf-16-be'
>>> textReplaced = encodedText.replace('h'.encode(enc), 'Q'.encode(enc))
>>> textReplaced.decode('utf-16')
'Qello world wQat is Qappening'
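If working on bytes is not a hard requirement, an alternative is to do the replacement on str and encode afterwards. The sketch below assumes the input carries a BOM (as produced by the generic 'utf-16' codec); replace_in_utf16 is a hypothetical helper name, not something from the question or from the standard library.

```python
def replace_in_utf16(data: bytes, old: str, new: str) -> bytes:
    """Hypothetical helper: decode, replace on str, then re-encode as UTF-16."""
    # decode('utf-16') reads the BOM to determine byte order and strips it.
    decoded = data.decode('utf-16')
    # str.replace works on whole characters, so it cannot match across
    # the boundary between two 16-bit code units.
    replaced = decoded.replace(old, new)
    # encode('utf-16') writes a fresh BOM in the platform's native byte order.
    return replaced.encode('utf-16')


encoded_text = "hello world what is happening".encode('utf-16')
result = replace_in_utf16(encoded_text, 'h', 'Q')
print(result.decode('utf-16'))  # Qello world wQat is Qappening
```

Replacing at the str level also sidesteps a subtle risk of byte-level replacement: in UTF-16 a two-byte pattern such as b'h\x00' can, for some non-ASCII text, match at an odd offset that straddles two characters.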
