Python: bytes not converting correctly?

Question

I am very new to binary files and I am struggling a bit. I am trying to convert a binary file to text. This is my code so far:

 with open(file_path, 'rb') as f:
  data = f.read()
  temp_data = str(data)

  if temp_data[-1] == '\\':
    temp_data = temp_data[:-1]

  temp_data = bytes(temp_data, 'utf-8')
  text = temp_data.decode('utf-8')

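It seems to be... partially working. In the long byte string I can see some of the things I expect, such as a file name and timestamps. However, I still see a lot of raw byte values. The value of the text variable is: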

 b'\x00\x00\x00\x00T\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x004\x01\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00X\x01\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00x\x01\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45.2019-03-05+01:11:21.1.100.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x95\xcc}\\\xba\xcc}\\LOG\x00\x00\x00\x00\x00\x00\x00\x00\x00OKL\x00\x04\x00\x00\x00\x01\x00\x00\x00VKL\x00\x05\x00\x00\x00\x01\x00\x00\x00YKL\x00\x06\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00h\xcc}\\\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\xa4\xcc}\\\x02\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00M\x00\x00\x00\x95\xcc}\\\xb9\xcc}\\'

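I don't know how to fix this, or what it means.

Note: I have to strip the trailing '\' from the string, because decoding gave me an error along the lines of "cannot decode because the last character is '\'".

Thanks!

Edit: I changed the code, so now it looks like this: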

 with open(file_path, 'rb') as f:
  data = f.read()

  readable_str = data.decode('utf-16')
  bytes_again = readable_str.encode('utf-16')

When I print readable_str, I get non-ASCII characters that should not be there at all. I get text like this:

TĴŘŸ䍔䑏䙅〱㄰䐮剁䵟慥䱳杯㈮㄰ⴹ㌰〭⬵㄰ㄺ㨰㔴㈮㄰ⴹ㌰〭⬵㄰ㄺ㨱ㄲㄮㄮ〰〮첕屽첺屽佌G䭏L䭖L䭙L챨屽첤屽M첕屽첹屽

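Decoding does not work with 'utf-8' or 'utf-32'. Is there a way to tell from this output which encoding to use? Are there other encodings I haven't tried? Thanks!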

Answer 1

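Python 3 is much more explicit than Python 2 about how data is read and written. Almost always, assume that what you read is bytes, decode it before working with the data in your script, and encode it back to bytes before writing it out. I strongly recommend watching nedbat's talk on Unicode in Python and on how to handle bytes input/output correctly.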
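To make the bytes-versus-text distinction concrete, here is a minimal sketch using a few bytes copied from the dump in the question. The variable names are only illustrative. It shows that str() on a bytes object returns the b'...' repr rather than decoded text, and that decoding with the wrong codec does not fail but pairs the bytes up differently, which seems to be where the unexpected CJK characters come from.

```python
header = b"\x01\x00\x00\x00"   # four raw bytes from the dump
name = b"TCODEF"               # start of the file name in the dump

# str() on bytes returns the repr (the b'...' literal), not decoded text,
# so every \x00 becomes the four characters backslash, x, 0, 0.
as_repr = str(header)
print(as_repr)                 # b'\x01\x00\x00\x00'
print(len(header), len(as_repr))

# decode() is what actually interprets bytes with a given codec.
print(name.decode("ascii"))    # TCODEF

# The wrong codec can still "succeed": UTF-16 little-endian reads the
# two bytes 0x54 0x43 ('T', 'C') as the single code point U+4354.
wrong = name.decode("utf-16-le")
print(wrong)                   # 䍔䑏䙅
```

The three characters printed at the end are the same ones that appear in the 'utf-16' output in the question, right where the file name should be.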

Anyway, what you want to do is:

with open('file.txt', 'rb') as fo:
    data = fo.read()  # This is in bytes

# Decode the bytes into a str we can work with
readable_str = data.decode('utf-8')

bytes_again = readable_str.encode('utf-8')
with open('other_file.txt', 'wb') as fw:
    fw.write(bytes_again)
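Note that the round trip above only works when the file really contains UTF-8 text. The dump in the question contains bytes such as 0x95 that are not valid UTF-8, and it looks more like a binary record format than like text in any encoding. The following is a rough sketch under that assumption: it pulls out runs of printable ASCII and reads one four-byte field as a little-endian Unix timestamp. The actual file layout is not known, so the helper name ascii_runs, the field position, the "<I" format, and the UTC interpretation are all guesses made for illustration.

```python
import re
import struct
from datetime import datetime, timezone

# A short excerpt of the dump from the question, not the whole file.
sample = (
    b"\x01\x00\x00\x00TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45"
    b"\x00\x00\x00\x00\x95\xcc}\\\xba\xcc}\\LOG\x00\x00"
)

# A strict text decode fails because 0x95 is not a valid UTF-8 start byte.
try:
    sample.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False


def ascii_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes as str."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


runs = ascii_runs(sample)
print(runs[0])  # TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45

# Assumption: these 4 bytes are a little-endian unsigned 32-bit Unix
# timestamp, interpreted here as UTC.
offset = sample.index(b"\x95\xcc}\\")
(seconds,) = struct.unpack_from("<I", sample, offset)
stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(stamp.isoformat())  # 2019-03-05T01:10:45+00:00
```

The decoded time matches the first time in the file name, which supports the timestamp guess but does not prove it. The second run returned by ascii_runs is }\LOG, because two bytes of the neighbouring binary field happen to be printable, so ASCII runs are only a rough way to find the text. For reliable parsing you would need the specification of the format that produced the file.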
