动态哈夫曼块的实际结构是什么?

问题描述 投票:0回答:1

根据 RFC1951:

  3.2.7. Compression with dynamic Huffman codes (BTYPE=10)

     The Huffman codes for the two alphabets appear in the block
     immediately after the header bits and before the actual
     compressed data, first the literal/length code and then the
     distance code.  Each code is defined by a sequence of code
     lengths, as discussed in Paragraph 3.2.2, above.  For even
     greater compactness, the code length sequences themselves are
     compressed using a Huffman code.

我真的很抱歉,但我所理解的大部分内容不是来自 RFC1951,而是来自这里的答案。 上面的陈述并没有为初学者提供清晰的图片,也没有为数据的外观提供任何视觉背景。

我从 zlib 的 doc Algorithm.txt 中读到它们是两棵树:

Literals or match lengths are compressed with one Huffman tree, and
match distances are compressed with another tree. The trees are stored
in a compact form at the start of each block.

大家可以看到,后者比RFC1951指出得更清楚,这就带来了问题。

(HCLEN+4)*3 位之后是什么?

我知道您将 4 添加到 4 位 HCLEN 的十进制值中,然后从位流中提取该数字的确切总数,其中每次提取都是 3 位数字(0 到 7),并感谢 M. Adler 清除了位的含义从右向左阅读。

=====================================
bitstream data
_____________________________________
|17|57|????????

第一棵树的尽头在哪里?

我的目标是确定为什么共享前缀的 2 个输入没有共享压缩流中某处的几个字节。 我知道它是因为块头,但由于动态块根据输入数据构造树,所以理论上必须相同,因为它是编码而不是散列,所以头本身应该有类似的东西:

raw block data at (19, 3)
Dynamic Huffman tree: length codes: 13, distances codes: 29, code_lengths_length: 16
lengths: [4, 0, 6, 4, 5, 4, 3, 3, 5, 6, 5, 2, 0, 0, 0, 0, 3, 5, 5]

raw block data at (19, 3)
Dynamic Huffman tree: length codes: 9, distances codes: 29, code_lengths_length: 16
lengths: [4, 0, 6, 5, 3, 3, 3, 3, 5, 6, 5, 3, 0, 0, 0, 0, 3, 5, 5]

raw block data at (20, 3) 
Dynamic Huffman tree: length codes: 9, distances codes: 30, code_lengths_length: 16
lengths: [3, 0, 5, 0, 6, 3, 4, 5, 6, 6, 6, 2, 0, 0, 0, 0, 2, 5, 5]
                          
deflate-blocktype 2 dyna huff beginning at (16, 0)
raw block data at (16, 3)
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 6, 6, 4, 4, 3, 3, 5, 5, 5, 2, 0, 0, 0, 0, 3, 5, 5]

raw block data at (17, 3)  
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 6, 5, 4, 4, 3, 3, 5, 6, 5, 2, 0, 0, 0, 0, 3, 5, 5]

raw block data at (17, 3)
Dynamic Huffman tree: length codes: 18, distances codes: 30, code_lengths_length: 16
lengths: [4, 0, 5, 5, 4, 3, 3, 3, 4, 5, 5, 3, 0, 0, 0, 0, 3, 5, 5]


4C 9D D5 CE 35 3D 19 40 CF 49
6C 9C D7 B2 F3 4A 15 84 EF A9
4C 9C D7 B2 F3 BA 0D 85 EF 33
94 9D D7 B2 EB 4A 11 86 EF A9
94 9D D7 92 EB BA 11 45 DF 5D
deflate
1个回答
0
投票

(HCLEN+4)*3 位之后是什么?

RFC 1951 后面的内容是:

HLIT + 257 code lengths for the literal/length alphabet,
   encoded using the code length Huffman code

第一棵树的尽头在哪里?

它不是一棵树,我认为你指的是第二个霍夫曼代码描述。第一个霍夫曼代码描述是

(HCLEN+4)*3
位。

当使用第一个代码描述的霍夫曼代码读取了

HLIT + 257
长度时,第二个代码描述结束。

RFC 1951 继续说道:

HDIST + 1 code lengths for the distance alphabet,
   encoded using the code length Huffman code

因此,当读取了使用第一个代码描述的霍夫曼代码的

HDIST + 1
长度时,第三个代码描述结束。

© www.soinside.com 2019 - 2024. All rights reserved.