动态哈夫曼块的实际结构是什么？

Question

根据 RFC1951：

  3.2.7. Compression with dynamic Huffman codes (BTYPE=10)

     The Huffman codes for the two alphabets appear in the block
     immediately after the header bits and before the actual
     compressed data, first the literal/length code and then the
     distance code.  Each code is defined by a sequence of code
     lengths, as discussed in Paragraph 3.2.2, above.  For even
     greater compactness, the code length sequences themselves are
     compressed using a Huffman code.

我真的很抱歉，但我所理解的大部分内容不是来自 RFC1951，而是来自这里的答案。上面的陈述并没有为初学者提供清晰的图片，也没有为数据的外观提供任何视觉背景。

我从 zlib 的 doc Algorithm.txt 中读到它们是两棵树：

Literals or match lengths are compressed with one Huffman tree, and
match distances are compressed with another tree. The trees are stored
in a compact form at the start of each block.

大家可以看到，后者比RFC1951指出得更清楚，这就带来了问题。

(HCLEN+4)*3 位之后是什么？

我知道您将 4 添加到 4 位 HCLEN 的十进制值中，然后从位流中提取该数字的确切总数，其中每次提取都是 3 位数字（0 到 7），并感谢 M. Adler 清除了位的含义从右向左阅读。

=====================================
bitstream data
_____________________________________
|17|57|????????

第一棵树的尽头在哪里？

我的目标是确定为什么共享前缀的 2 个输入没有共享压缩流中某处的几个字节。我知道它是因为块头，但由于动态块根据输入数据构造树，所以理论上必须相同，因为它是编码而不是散列，所以头本身应该有类似的东西：

raw block data at (19, 3)
Dynamic Huffman tree: length codes: 13, distances codes: 29, code_lengths_length: 16
lengths: [4, 0, 6, 4, 5, 4, 3, 3, 5, 6, 5, 2, 0, 0, 0, 0, 3, 5, 5]

raw block data at (19, 3)
Dynamic Huffman tree: length codes: 9, distances codes: 29, code_lengths_length: 16
lengths: [4, 0, 6, 5, 3, 3, 3, 3, 5, 6, 5, 3, 0, 0, 0, 0, 3, 5, 5]

raw block data at (20, 3) 
Dynamic Huffman tree: length codes: 9, distances codes: 30, code_lengths_length: 16
lengths: [3, 0, 5, 0, 6, 3, 4, 5, 6, 6, 6, 2, 0, 0, 0, 0, 2, 5, 5]
                          
deflate-blocktype 2 dyna huff beginning at (16, 0)
raw block data at (16, 3)
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 6, 6, 4, 4, 3, 3, 5, 5, 5, 2, 0, 0, 0, 0, 3, 5, 5]

raw block data at (17, 3)  
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 6, 5, 4, 4, 3, 3, 5, 6, 5, 2, 0, 0, 0, 0, 3, 5, 5]

raw block data at (17, 3)
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 5, 5, 4, 3, 3, 3, 4, 5, 5, 3, 0, 0, 0, 0, 3, 5, 5]


4C 9D D5 CE 35 3D 19 40 CF 49
6C 9C D7 B2 F3 4A 15 84 EF A9
4C 9C D7 B2 F3 BA 0D 85 EF 33
94 9D D7 B2 EB 4A 11 86 EF A9
94 9D D7 92 EB BA 11 45 DF 5D

Answer 1

(HCLEN+4)*3 位之后是什么？

RFC 1951 后面的内容是：

HLIT + 257 code lengths for the literal/length alphabet,
   encoded using the code length Huffman code

第一棵树的尽头在哪里？

它不是一棵树，我认为你指的是第二个霍夫曼代码描述。第一个霍夫曼代码描述是

(HCLEN+4)*3

位。

当使用第一个代码描述的霍夫曼代码读取了

HLIT + 257

长度时，第二个代码描述结束。

RFC 1951 继续说道：

HDIST + 1 code lengths for the distance alphabet,
   encoded using the code length Huffman code

因此，当读取了使用第一个代码描述的霍夫曼代码的

HDIST + 1

长度时，第三个代码描述结束。

动态哈夫曼块的实际结构是什么？

问题描述投票：0回答：1

1个回答

最新问题

动态哈夫曼块的实际结构是什么？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1