根据 RFC1951:
3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
The Huffman codes for the two alphabets appear in the block
immediately after the header bits and before the actual
compressed data, first the literal/length code and then the
distance code. Each code is defined by a sequence of code
lengths, as discussed in Paragraph 3.2.2, above. For even
greater compactness, the code length sequences themselves are
compressed using a Huffman code.
我真的很抱歉,但我所理解的大部分内容不是来自 RFC1951,而是来自这里的答案。 上面的陈述并没有为初学者提供清晰的图片,也没有为数据的外观提供任何视觉背景。
我从 zlib 的 doc Algorithm.txt 中读到它们是两棵树:
Literals or match lengths are compressed with one Huffman tree, and
match distances are compressed with another tree. The trees are stored
in a compact form at the start of each block.
大家可以看到,后者比RFC1951指出得更清楚,这就带来了问题。
(HCLEN+4)*3 位之后是什么?
我知道您将 4 添加到 4 位 HCLEN 的十进制值中,然后从位流中提取该数字的确切总数,其中每次提取都是 3 位数字(0 到 7),并感谢 M. Adler 清除了位的含义从右向左阅读。
=====================================
bitstream data
_____________________________________
|17|57|????????
第一棵树的尽头在哪里?
我的目标是确定为什么共享前缀的 2 个输入没有共享压缩流中某处的几个字节。 我知道它是因为块头,但由于动态块根据输入数据构造树,所以理论上必须相同,因为它是编码而不是散列,所以头本身应该有类似的东西:
raw block data at (19, 3)
Dynamic Huffman tree: length codes: 13, distances codes: 29, code_lengths_length: 16
lengths: [4, 0, 6, 4, 5, 4, 3, 3, 5, 6, 5, 2, 0, 0, 0, 0, 3, 5, 5]
raw block data at (19, 3)
Dynamic Huffman tree: length codes: 9, distances codes: 29, code_lengths_length: 16
lengths: [4, 0, 6, 5, 3, 3, 3, 3, 5, 6, 5, 3, 0, 0, 0, 0, 3, 5, 5]
raw block data at (20, 3)
Dynamic Huffman tree: length codes: 9, distances codes: 30, code_lengths_length: 16
lengths: [3, 0, 5, 0, 6, 3, 4, 5, 6, 6, 6, 2, 0, 0, 0, 0, 2, 5, 5]
deflate-blocktype 2 dyna huff beginning at (16, 0)
raw block data at (16, 3)
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 6, 6, 4, 4, 3, 3, 5, 5, 5, 2, 0, 0, 0, 0, 3, 5, 5]
raw block data at (17, 3)
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 6, 5, 4, 4, 3, 3, 5, 6, 5, 2, 0, 0, 0, 0, 3, 5, 5]
raw block data at (17, 3)
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 5, 5, 4, 3, 3, 3, 4, 5, 5, 3, 0, 0, 0, 0, 3, 5, 5]
4C 9D D5 CE 35 3D 19 40 CF 49
6C 9C D7 B2 F3 4A 15 84 EF A9
4C 9C D7 B2 F3 BA 0D 85 EF 33
94 9D D7 B2 EB 4A 11 86 EF A9
94 9D D7 92 EB BA 11 45 DF 5D
(HCLEN+4)*3 位之后是什么?
RFC 1951 后面的内容是:
HLIT + 257 code lengths for the literal/length alphabet,
encoded using the code length Huffman code
第一棵树的尽头在哪里?
它不是一棵树,我认为你指的是第二个霍夫曼代码描述。第一个霍夫曼代码描述是
(HCLEN+4)*3
位。
当使用第一个代码描述的霍夫曼代码读取了
HLIT + 257
长度时,第二个代码描述结束。
RFC 1951 继续说道:
HDIST + 1 code lengths for the distance alphabet,
encoded using the code length Huffman code
因此,当读取了使用第一个代码描述的霍夫曼代码的
HDIST + 1
长度时,第三个代码描述结束。