Python encoding/decoding with binary strings

Question (votes: 0, answers: 1)

Reference:

I'm trying to decode git objects:

import zlib
import os

...
# current directory is .git/objects
    for current, subs, files in os.walk('.'):
        for filename in files:
            # in format ##/#{38}
            
            path = os.path.join(current, filename)[2:]

            # 'info/' and 'pack/' exist
            # don't worry about packed files

            with open(path, 'r') as file:
                
                # returns bytes object
                # assuming UTF-8 encoding (default) vs. legacy
                # https://git-scm.com/docs/git-commit#_discussion
                # .decode() also defaults to utf-8
                
                print(zlib.decompress(file.read()).decode())
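The snippet above opens each object in text mode ('r'), so the UnicodeDecodeError is most likely raised by file.read() itself: Python tries to decode the zlib-compressed bytes as text before zlib ever sees them (and if that decode happened to succeed, zlib.decompress would reject the resulting str anyway). A minimal sketch of the same walk in binary mode, assuming the standard loose-object layout (two-hex-digit fan-out directories next to info/ and pack/); the name iter_loose_objects is hypothetical, and packed objects are ignored as in the question:

```python
import os
import zlib

HEX_DIGITS = set("0123456789abcdef")


def iter_loose_objects(objects_dir):
    """Yield (object_id, header) for every loose object in objects_dir.

    Hypothetical helper: objects stored in pack/ are not read.
    """
    for fanout in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, fanout)
        # Loose objects live in two-hex-digit directories; this skips info/ and pack/.
        if len(fanout) != 2 or not set(fanout) <= HEX_DIGITS or not os.path.isdir(subdir):
            continue
        for filename in sorted(os.listdir(subdir)):
            # 'rb' returns the raw bytes on disk; nothing is decoded here.
            with open(os.path.join(subdir, filename), "rb") as f:
                data = zlib.decompress(f.read())
            header = data.split(b"\0", 1)[0]  # e.g. b'tree 1854'
            # The header is plain ASCII, so this decode is safe.
            yield fanout + filename, header.decode("ascii")
```

Usage, from the repository root: for object_id, header in iter_loose_objects(".git/objects"): print(object_id, header). Only the header is decoded; the content stays as bytes until you know what kind of object it is.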

However, this raises a UnicodeDecodeError. The correct approach appears to be:

with open(path, 'rb') as file:
  data = zlib.decompress(file.read())
  header, content = data.split(b'\0', 1)
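Building on that split: the header has the form "<type> <size>", where size is the length of the content in bytes. A small sketch that parses it and sanity-checks the length; parse_loose_object is a hypothetical name, and the test object is made up:

```python
import zlib


def parse_loose_object(compressed):
    """Return (type, size, content) for a zlib-compressed loose git object."""
    data = zlib.decompress(compressed)
    header, content = data.split(b"\0", 1)
    obj_type, size = header.split(b" ", 1)  # e.g. b'tree', b'1854'
    size = int(size)  # int() accepts ASCII digits given as bytes
    if size != len(content):
        raise ValueError(f"header says {size} bytes, found {len(content)}")
    # Only the ASCII header is decoded; the content is returned as raw bytes.
    return obj_type.decode("ascii"), size, content


print(parse_loose_object(zlib.compress(b"blob 5\x00hello")))  # ('blob', 5, b'hello')
```

Keeping the content as bytes is deliberate: whether it is meaningful text depends on the object type.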

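and then read the result as binary data. In another related post, a commenter said that rb does no decoding at all. That seems inaccurate to me, since the binary string it returns is human-readable, so I'd like to clarify this; the documentation is fairly sparse.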
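The commenter's claim can be checked directly: in binary mode read() returns exactly the bytes stored on disk, while text mode has to decode them first. A minimal sketch using a throwaway temporary file; the object payload is made up for illustration:

```python
import os
import tempfile
import zlib

# Made-up object, compressed the way git stores loose objects on disk.
compressed = zlib.compress(b"blob 5\x00hello")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(compressed)
    path = tmp.name

text_mode_failed = False
try:
    # Binary mode: no decoding at all, we get back exactly what was written.
    with open(path, "rb") as f:
        raw = f.read()
    print(zlib.decompress(raw))  # b'blob 5\x00hello'

    # Text mode: read() decodes the compressed bytes as UTF-8, which fails.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        text_mode_failed = True
        print("text mode:", exc.reason)
finally:
    os.remove(path)
```

So "rb" really does skip decoding; the readable output comes from how bytes are displayed, not from a decode step.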

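I found that strings read with rb are binary (bytes) strings, which are written with a b prefix. My question is: if git uses UTF-8 by default (and does so in this repository), why does decoding fail? And if Python cannot decode the binary string, how does it display it in human-readable form (e.g. b'This is a string')?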

python git zlib
Answer (votes: -1)

In the example I looked at, the content part contains data that cannot be decoded as UTF-8.
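To pinpoint where decoding breaks, one can catch the UnicodeDecodeError and read its start attribute. A minimal sketch that rebuilds only the beginning of the tree object from the output further down (header, first entry, and the first two bytes of the agent.py hash); first_undecodable is a hypothetical helper name:

```python
def first_undecodable(data, encoding="utf-8"):
    """Return (offset, byte) of the first byte that fails to decode, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# Start of the tree object: header, NUL, first entry's mode and name, NUL,
# then the first two bytes of that entry's binary SHA-1.
prefix = b"tree 1854\x00100755 agent.py\x00" + bytes.fromhex("57a7")
offset, byte = first_undecodable(prefix)
print(offset, hex(byte))  # 27 0xa7
```

The offset matches the "byte 0xa7 in position 27" error quoted in the test code: the first failure lands inside the binary hash, not in the text.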

Here is the test code I used:

from pathlib import Path
import zlib

git_file = Path.home().joinpath(
    "bluez", ".git", "objects", "c9",
    "4fdc6335829ab797dd06a6f0ac3fd123dd55a8")

data = zlib.decompress(git_file.read_bytes())
print(f"Raw {data}")
print(f"Raw as hex: {data.hex(' ')}")
# Decode on all data gives
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 27: invalid start byte
# data.decode()
header, content = data.split(b'\0', 1)
print(f"Header: {header} or as UTF8: {header.decode('UTF8')}")
print(f"Content decode replace errors:\n {content.decode('UTF8', errors='replace')}")

This is the output it produces:

Raw b'tree 1854\x00100755 agent.py\x00W\xa7A\x83\xdf%\x96\x1f\xef\xac\xbf\xd4X\x11\xa45\xf2&e\xcb100644 bluezutils.py\x00 D\xe43)\x16\xccfy\xdf\xa4+\xe5\xae\xf7\x06\x8b\xd2\x18+100644 dbusdef.py\x00\xd3\x17\xc1\x8d\xe2\x82\xdd\x81\xdc\xef\x82\xa3\xac\x10\x08.\xbfV\x8e\xf2100644 example-adv-monitor\x00\xa4\x05\xfc{\x0e\x11\xfai\xa33\xcdn^\xce\xc7o\x14\xcfkO100755 example-advertisement\x00_\x02.\xe6v\x97\x0f\xf0\xac\xeb)\xc5\x85L\x8d`O\xc7\x97\x03100755 example-battery-provider\x00\x15"\xa5\xe0u\xca/\x04A\xcfu,U:]K\x10\xd8\xf7\xf4100644 example-endpoint\x00\x16e\x1ch:\x7f\xf2\xef\xba6\n\xff\x1b\xdd\x15\xda\xde\x85A\x9d100755 example-gatt-client\x00^k\xef\x9d{\x92\xb3\xb9\xc1\xe3\x8d7\xb2B\x1b\x13s\xaf\xd2\x9b100755 example-gatt-server\x00w#\x1c:\xd1\x02\xa2[AQ\x7f\x9bA|7\x1b{\xa04\xb0100644 example-player\x00\x14\x97\xd1\x10z\x16\x81H5/0\x08\xfc\t\xa1\x10\xd6\xba\x0c\x0b100755 exchange-business-cards\x00\x9a:\xa2\x9f\xb4v&\xa5\x156-B\xb2\x0cAB\x067C\x16100755 ftp-client\x00\xefuj\xb2\xb3\r\x92s*\x92\xeb(B\x8c\x02\xb7L\xaa\xfbc100755 get-managed-objects\x00Q%\xeeRG\xd8\x87\xc7\xb3\ntaQ\xafS\xee\xc1={\x95100755 get-obex-capabilities\x00\xa7\x98\nD%\x95i\xd4 \x10\xfc\x86aQ\xdc4\x02\xcbG\xe5100755 list-devices\x00\xb1\x12Ul0\xb2\n\xb6\xa6R7\x93\xda\x84\xda%]\xa87\xb2100755 list-folders\x00\xb4\xe3\xf1\x00\xb0\x96&\xda\x7f\xa5{\xbb\x1dY\xadn4l\xe3c100755 map-client\x00\xa2\xd9j\xe5\xf0\xea4\xf0\x16-\xe6Wk\xa6\x9f\xffm\xff"\xa8100755 monitor-bluetooth\x00\xa3\x97~ n\xec\xce\xd9\x1c\x1d\xa5\xafZHK\x1a\xcd\xda\xc0\x91100755 opp-client\x00O\x00\xa4\x1c\x01)\xea?\x14HD\x0e\xf1\xb1\xbf\r\xf6\xbe,\xc6100755 pbap-client\x00\xe6\xca\xfd\xd3\x01B!^\xf6\x16\xad?\xde\xfbPO\xe3\x03)0100644 sap_client.py\x00\xfe\xd1:\xed\xc8@\x16\x91Y\xf6\xfb\xab\x18d\xdfa[\xa2\x00\xb4100644 service-did.xml\x00R\xebh\xc0 \xabem\xf0f\xb6@E)\xff7u\xe4\x9dc100644 service-ftp.xml\x00\x1b\xda\x88W\xf5\xa8\x8e\x99p]\n\x05\xe7\xac\xb8\xd3\xb95L)100644 
service-opp.xml\x005\x1bJA\n\xdf\x97s\xc4\xe7\xcb\xc4\x81\xb1\xc3Y\xd9\xf3\nF100644 service-record.dtd\x00\xf5;\xe5\xd0R\xd2Un\xc4\x07)\x1a\xf1\xc3vd+#\xaa;100644 service-spp.xml\x00+\x15l?\x03\x81\x88]\\W\xda\xad\x91\x81A\xcf\xf2k&"100755 simple-agent\x00O\xda\xff\x1e\xb7e\xa4\x96k\xa5\r0\xc2is>D\x06!\x8c100755 simple-endpoint\x00Y\xca\x18\x9c\xe5\x0eF\xd0\xfc?\xfb\xad\x0bKm\xe1\xf6\xa5B2100755 simple-obex-agent\x00\x06Om0\xb9\xeb\xb2\x84P\x91\xae\xc9\xcc\xbe\xad\xc2!\x98\x08\xc7100755 simple-player\x00\x92h(D\xd0\xf4\xef\\\xef\xef__6\xbe\x8b\x8aq\xd1\x8dF100755 test-adapter\x00\x961\xd9O\xe3\x11\xb4\xbb\xc3\xf7\x07&\xfb\x1e\xf5\xf8\xcd3\xfa~100755 test-device\x00\xa1\xe5\x08\x16gO\xda6\xf4\x82M\x8d\x88\xfb\x89\x82\x04iZ\xe1100755 test-discovery\x00\xec\xcc|~1\xf0p\xb4\x91\xe7\xd1\xa0\xe0u\x9e\xc2\x9f\xc0 \'100755 test-gatt-profile\x00\xa9s\xae\x14\xed1\x81wV\xe7\x0b\xd4[\x9c;\xaaKN\xa90100755 test-health\x00\xd6\xb47\xed\x88\xc5/\xc6(\xbd\x08\x14(\x9b<\x1d]v\xaf\x1a100755 test-health-sink\x00Wf]+\xa6I\xaf\xa1\x1b%\xeed\x1c\x0cI\xa7\x868\xa1b100755 test-hfp\x00\x11\xe3(\xe5L\xc8h"\x04UQ\xce8)\xa7\xc3\xc7\xa4-\xf6100644 test-join\x00\x96\x97\x95\tG@6\x8bD\xcd\xda6\r\xa44[\x8f\xd9\x9b\xde100755 test-manager\x00?\xa7 Z\x04\xb6\xa1\xdc\xd40\xab\xd1\xfd\xbc\xab!\xdd\x8bN\x96100755 test-mesh\x00\xfb\xf2Gk\xfd6\x15\x8f\xf2\xec\xb8\xd5\xee\xc2\xe1oYgR\x02100755 test-nap\x00\xd5\xc7W\xb7\x9d\xe1\x1e\xc7s0\xcb\xf8\x1d\xe8\x07\xaf\xae\x11.E100755 test-network\x00\xac\xc7\xdf\xf6^HVt\xf0\x085\xd6\x93\x84\x88@&\xe7\xb0k100755 test-profile\x00\xaf\x1e#\xf7e\xdd\xef8\x16\xe6(\xb4\x06\xaa\x91\x05\x93\xde\xc3\xed100755 test-sap-server\x00\xdd\xb1\xef\xe9\xbc\x8c\xb6\x84\xc1>\xa0VO&\x10\x11\xc7\xb3-\x86'
Raw as hex: 74 72 65 65 20 31 38 35 34 00 31 30 30 37 35 35 20 61 67 65 6e 74 2e 70 79 00 57 a7 41 83 df 25 96 1f ef ac bf d4 58 11 a4 35 f2 26 65 cb 31 30 30 36 34 34 20 62 6c 75 65 7a 75 74 69 6c 73 2e 70 79 00 20 44 e4 33 29 16 cc 66 79 df a4 2b e5 ae f7 06 8b d2 18 2b 31 30 30 36 34 34 20 64 62 75 73 64 65 66 2e 70 79 00 d3 17 c1 8d e2 82 dd 81 dc ef 82 a3 ac 10 08 2e bf 56 8e f2 31 30 30 36 34 34 20 65 78 61 6d 70 6c 65 2d 61 64 76 2d 6d 6f 6e 69 74 6f 72 00 a4 05 fc 7b 0e 11 fa 69 a3 33 cd 6e 5e ce c7 6f 14 cf 6b 4f 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 61 64 76 65 72 74 69 73 65 6d 65 6e 74 00 5f 02 2e e6 76 97 0f f0 ac eb 29 c5 85 4c 8d 60 4f c7 97 03 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 62 61 74 74 65 72 79 2d 70 72 6f 76 69 64 65 72 00 15 22 a5 e0 75 ca 2f 04 41 cf 75 2c 55 3a 5d 4b 10 d8 f7 f4 31 30 30 36 34 34 20 65 78 61 6d 70 6c 65 2d 65 6e 64 70 6f 69 6e 74 00 16 65 1c 68 3a 7f f2 ef ba 36 0a ff 1b dd 15 da de 85 41 9d 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 67 61 74 74 2d 63 6c 69 65 6e 74 00 5e 6b ef 9d 7b 92 b3 b9 c1 e3 8d 37 b2 42 1b 13 73 af d2 9b 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 67 61 74 74 2d 73 65 72 76 65 72 00 77 23 1c 3a d1 02 a2 5b 41 51 7f 9b 41 7c 37 1b 7b a0 34 b0 31 30 30 36 34 34 20 65 78 61 6d 70 6c 65 2d 70 6c 61 79 65 72 00 14 97 d1 10 7a 16 81 48 35 2f 30 08 fc 09 a1 10 d6 ba 0c 0b 31 30 30 37 35 35 20 65 78 63 68 61 6e 67 65 2d 62 75 73 69 6e 65 73 73 2d 63 61 72 64 73 00 9a 3a a2 9f b4 76 26 a5 15 36 2d 42 b2 0c 41 42 06 37 43 16 31 30 30 37 35 35 20 66 74 70 2d 63 6c 69 65 6e 74 00 ef 75 6a b2 b3 0d 92 73 2a 92 eb 28 42 8c 02 b7 4c aa fb 63 31 30 30 37 35 35 20 67 65 74 2d 6d 61 6e 61 67 65 64 2d 6f 62 6a 65 63 74 73 00 51 25 ee 52 47 d8 87 c7 b3 0a 74 61 51 af 53 ee c1 3d 7b 95 31 30 30 37 35 35 20 67 65 74 2d 6f 62 65 78 2d 63 61 70 61 62 69 6c 69 74 69 65 73 00 a7 98 0a 44 25 95 69 d4 20 10 fc 86 61 51 dc 34 02 cb 47 e5 31 30 30 37 35 35 20 6c 69 73 74 2d 64 65 76 69 63 65 73 00 b1 12 55 6c 
30 b2 0a b6 a6 52 37 93 da 84 da 25 5d a8 37 b2 31 30 30 37 35 35 20 6c 69 73 74 2d 66 6f 6c 64 65 72 73 00 b4 e3 f1 00 b0 96 26 da 7f a5 7b bb 1d 59 ad 6e 34 6c e3 63 31 30 30 37 35 35 20 6d 61 70 2d 63 6c 69 65 6e 74 00 a2 d9 6a e5 f0 ea 34 f0 16 2d e6 57 6b a6 9f ff 6d ff 22 a8 31 30 30 37 35 35 20 6d 6f 6e 69 74 6f 72 2d 62 6c 75 65 74 6f 6f 74 68 00 a3 97 7e 20 6e ec ce d9 1c 1d a5 af 5a 48 4b 1a cd da c0 91 31 30 30 37 35 35 20 6f 70 70 2d 63 6c 69 65 6e 74 00 4f 00 a4 1c 01 29 ea 3f 14 48 44 0e f1 b1 bf 0d f6 be 2c c6 31 30 30 37 35 35 20 70 62 61 70 2d 63 6c 69 65 6e 74 00 e6 ca fd d3 01 42 21 5e f6 16 ad 3f de fb 50 4f e3 03 29 30 31 30 30 36 34 34 20 73 61 70 5f 63 6c 69 65 6e 74 2e 70 79 00 fe d1 3a ed c8 40 16 91 59 f6 fb ab 18 64 df 61 5b a2 00 b4 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 64 69 64 2e 78 6d 6c 00 52 eb 68 c0 20 ab 65 6d f0 66 b6 40 45 29 ff 37 75 e4 9d 63 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 66 74 70 2e 78 6d 6c 00 1b da 88 57 f5 a8 8e 99 70 5d 0a 05 e7 ac b8 d3 b9 35 4c 29 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 6f 70 70 2e 78 6d 6c 00 35 1b 4a 41 0a df 97 73 c4 e7 cb c4 81 b1 c3 59 d9 f3 0a 46 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 72 65 63 6f 72 64 2e 64 74 64 00 f5 3b e5 d0 52 d2 55 6e c4 07 29 1a f1 c3 76 64 2b 23 aa 3b 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 73 70 70 2e 78 6d 6c 00 2b 15 6c 3f 03 81 88 5d 5c 57 da ad 91 81 41 cf f2 6b 26 22 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 61 67 65 6e 74 00 4f da ff 1e b7 65 a4 96 6b a5 0d 30 c2 69 73 3e 44 06 21 8c 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 65 6e 64 70 6f 69 6e 74 00 59 ca 18 9c e5 0e 46 d0 fc 3f fb ad 0b 4b 6d e1 f6 a5 42 32 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 6f 62 65 78 2d 61 67 65 6e 74 00 06 4f 6d 30 b9 eb b2 84 50 91 ae c9 cc be ad c2 21 98 08 c7 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 70 6c 61 79 65 72 00 92 68 28 44 d0 f4 ef 5c ef ef 5f 5f 36 be 8b 8a 71 d1 8d 46 31 30 30 37 35 35 20 74 65 73 74 2d 61 64 61 70 74 65 72 00 96 31 
d9 4f e3 11 b4 bb c3 f7 07 26 fb 1e f5 f8 cd 33 fa 7e 31 30 30 37 35 35 20 74 65 73 74 2d 64 65 76 69 63 65 00 a1 e5 08 16 67 4f da 36 f4 82 4d 8d 88 fb 89 82 04 69 5a e1 31 30 30 37 35 35 20 74 65 73 74 2d 64 69 73 63 6f 76 65 72 79 00 ec cc 7c 7e 31 f0 70 b4 91 e7 d1 a0 e0 75 9e c2 9f c0 20 27 31 30 30 37 35 35 20 74 65 73 74 2d 67 61 74 74 2d 70 72 6f 66 69 6c 65 00 a9 73 ae 14 ed 31 81 77 56 e7 0b d4 5b 9c 3b aa 4b 4e a9 30 31 30 30 37 35 35 20 74 65 73 74 2d 68 65 61 6c 74 68 00 d6 b4 37 ed 88 c5 2f c6 28 bd 08 14 28 9b 3c 1d 5d 76 af 1a 31 30 30 37 35 35 20 74 65 73 74 2d 68 65 61 6c 74 68 2d 73 69 6e 6b 00 57 66 5d 2b a6 49 af a1 1b 25 ee 64 1c 0c 49 a7 86 38 a1 62 31 30 30 37 35 35 20 74 65 73 74 2d 68 66 70 00 11 e3 28 e5 4c c8 68 22 04 55 51 ce 38 29 a7 c3 c7 a4 2d f6 31 30 30 36 34 34 20 74 65 73 74 2d 6a 6f 69 6e 00 96 97 95 09 47 40 36 8b 44 cd da 36 0d a4 34 5b 8f d9 9b de 31 30 30 37 35 35 20 74 65 73 74 2d 6d 61 6e 61 67 65 72 00 3f a7 20 5a 04 b6 a1 dc d4 30 ab d1 fd bc ab 21 dd 8b 4e 96 31 30 30 37 35 35 20 74 65 73 74 2d 6d 65 73 68 00 fb f2 47 6b fd 36 15 8f f2 ec b8 d5 ee c2 e1 6f 59 67 52 02 31 30 30 37 35 35 20 74 65 73 74 2d 6e 61 70 00 d5 c7 57 b7 9d e1 1e c7 73 30 cb f8 1d e8 07 af ae 11 2e 45 31 30 30 37 35 35 20 74 65 73 74 2d 6e 65 74 77 6f 72 6b 00 ac c7 df f6 5e 48 56 74 f0 08 35 d6 93 84 88 40 26 e7 b0 6b 31 30 30 37 35 35 20 74 65 73 74 2d 70 72 6f 66 69 6c 65 00 af 1e 23 f7 65 dd ef 38 16 e6 28 b4 06 aa 91 05 93 de c3 ed 31 30 30 37 35 35 20 74 65 73 74 2d 73 61 70 2d 73 65 72 76 65 72 00 dd b1 ef e9 bc 8c b6 84 c1 3e a0 56 4f 26 10 11 c7 b3 2d 86

Header: b'tree 1854' or as UTF8: tree 1854
Content decode replace errors:
 100755 agent.py W�A��%�﬿�X�5�&e�100644 bluezutils.py  D�3)�fyߤ+����+100644 dbusdef.py ����݁��.�V��100644 example-adv-monitor ��{�i�3�n^��o�kO100755 example-advertisement _.�v���)ŅL�`OǗ100755 example-battery-provider "��u�/A�u,U:]K���100644 example-endpoint eh:��6
�s*��(B��L��c100755 get-managed-objects Q%�RG؇dz
taQ�S��={�100755 get-obex-capabilities ��
D%�i� ��aQ�4�G�100755 list-devices �Ul0�
��,�100755 pbap-client ����B!^��?��PO�)0100644 sap_client.py ��:��@�Y���d�a[� �100644 service-did.xml R�h� �em�f�@E)�7u�c100644 service-ftp.xml ڈW����p]
笸ӹ5L)100644 service-opp.xml 5JA
ߗs���ā��Y��
�4[�ٛ�100755 test-manager ?� Z����0�����!݋N�100755 test-mesh ��Gk�6�������oYgR100755 test-nap ��W����s0�����.E100755 test-network ����^HVt5֓��@&�k100755 test-profile �#�e��8�(�������100755 test-sap-server ݱ�鼌���>�VO&dz-�

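As you can see, the content part mixes text that converts to UTF-8 without trouble (each entry's file mode and name) with binary data that has no printable representation. In a tree object, every entry is stored as the mode, a space, the file name and a NUL byte, followed by the 20-byte binary SHA-1 of the object it points to. Those hash bytes are arbitrary values, so they are generally not valid UTF-8, and decoding the whole object fails even though the readable parts are UTF-8.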
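Because each tree entry has this fixed shape, the content can be split without guessing at encodings, and the hash bytes converted to the familiar 40-character hex form with bytes.hex(). A sketch assuming the SHA-1 object format (20-byte hashes; SHA-256 repositories would use 32); parse_tree is a hypothetical name:

```python
def parse_tree(content, hash_len=20):
    """Split the content of a tree object into (mode, name, hex_id) tuples."""
    entries = []
    pos = 0
    while pos < len(content):
        space = content.index(b" ", pos)   # mode ends at the first space
        nul = content.index(b"\0", space)  # name ends at the NUL
        mode = content[pos:space].decode("ascii")
        # Names are usually UTF-8; surrogateescape keeps any odd bytes round-trippable.
        name = content[space + 1:nul].decode("utf-8", errors="surrogateescape")
        raw_id = content[nul + 1:nul + 1 + hash_len]  # binary hash, never decoded
        entries.append((mode, name, raw_id.hex()))
        pos = nul + 1 + hash_len
    return entries


# First entry of the tree shown above, rebuilt from the hex dump.
entry = b"100755 agent.py\x00" + bytes.fromhex("57a74183df25961fefacbfd45811a435f22665cb")
print(parse_tree(entry))
# [('100755', 'agent.py', '57a74183df25961fefacbfd45811a435f22665cb')]
```

The hash is skipped by length rather than by searching for a delimiter, since the raw hash bytes may themselves contain 0x00 or 0x20.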

© www.soinside.com 2019 - 2024. All rights reserved.