Perl: How do I get the "bytes read" count from Digest::MD5 addfile()?

Question    Votes: 1    Answers: 2

I am using Digest::MD5 to compute the MD5 of a data stream, namely a gzipped file (or, to be precise, 3000 of them) that is too big to fit in RAM. So I do this:

 use Digest::MD5 qw(md5_base64);

 my ($filename) = @_;                # this is in a sub
 my $ctx = Digest::MD5 -> new;

 $openme = $filename;        # Usually, it's a plain file
 $openme = "gunzip -c '$filename' |" if ($filename =~ /\.gz$/); # is gz

 open (FILE, $openme); # gunzip to STDOUT
 binmode(FILE);
 $ctx -> addfile(*FILE);   # passing filehandle
 close(FILE);

This works. addfile neatly slurps up gunzip's output and gives the correct MD5.

However, I would really like to know the size of the slurped data (in this case, the gunzipped "file").

I could add an extra

  $size = 0 + `gunzip -c very/big-file.gz | wc -c`;

but that would mean reading the file twice.

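Is there any way to extract the number of bytes that Digest::MD5 slurped? I tried capturing the result with $result = $ctx -> addfile(*FILE); and running Data::Dumper on both $result and $ctx, but nothing interesting turned up.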

EDIT: The files are usually not gzipped. Added code to show what I really do.

perl md5
2 Answers
3
votes

I would do it all in Perl, without relying on an external program for the decompression:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use IO::Uncompress::Gunzip qw/$GunzipError/;
use Digest::MD5;

my $filename = shift or die "Missing gzip filename!\n";

my $md5 = Digest::MD5->new;
# Allow for reading both gzip format files and uncompressed files.
# This is the default behavior, but might as well be explicit about it.
my $z = IO::Uncompress::Gunzip->new($filename, Transparent => 1)
  or die "Unable to open $filename: $GunzipError\n";
my $len = 0;

while ((my $blen = $z->read(my $block)) > 0) {
  $len += $blen;
  $md5->add($block);
}
die "There was an error reading the file: $GunzipError\n" unless $z->eof;

say "Total uncompressed length: $len";
say "MD5: ", $md5->hexdigest;

If you want to use gunzip instead of the core IO::Uncompress::Gunzip module, you can do something similar, using read to fetch one chunk of data at a time:

#!/usr/bin/perl
use warnings;
use strict;
use autodie; # So we don't have to explicitly check for i/o related errors
use feature qw/say/;
use Digest::MD5;

my $filename = shift or die "Missing gzip filename!\n";

my $md5 = Digest::MD5->new;
# Note use of lexical file handle and safer version of opening a pipe
# from a process that eliminates shell shenanigans. Also uses the :raw
# perlio layer instead of calling binmode on the handle (which has the
# same effect)
open my $z, "-|:raw", "gunzip", "-c", $filename;
# Non-compressed version
# open my $z, "<:raw", $filename;
my $len = 0;

while ((my $blen = read($z, my $block, 4096)) > 0) {
  $len += $blen;
  $md5->add($block);
}

say "Total uncompressed length: $len";
say "MD5: ", $md5->hexdigest;
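
Since the question's edit says most files are not gzipped, the two open calls above could be folded into one helper that picks the right one per file. The following is only a sketch under that assumption: md5_and_size is a made-up name, the 64 KiB chunk size is arbitrary, and it checks errors by hand rather than through autodie. It also closes the handle explicitly, because for a pipe close returns false when gunzip exits with a non-zero status.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Digest::MD5;

# Hypothetical helper (not part of Digest::MD5): returns the hex digest
# and the number of bytes read for one file. Files whose names end in
# .gz are read through gunzip; everything else is read directly.
sub md5_and_size {
    my ($filename) = @_;

    my $fh;
    if ($filename =~ /\.gz$/) {
        # List form of pipe open: the filename never passes through a shell.
        open($fh, "-|", "gunzip", "-c", $filename)
            or die "Cannot run gunzip on $filename: $!\n";
    } else {
        open($fh, "<", $filename)
            or die "Cannot open $filename: $!\n";
    }
    binmode($fh);

    my $md5 = Digest::MD5->new;
    my $len = 0;
    while (1) {
        my $blen = read($fh, my $block, 65536);
        die "Read error on $filename: $!\n" unless defined $blen;
        last if $blen == 0;          # end of file
        $len += $blen;               # running byte count
        $md5->add($block);
    }

    # For a pipe, close fails if gunzip reported an error.
    close($fh) or die "Error closing $filename (status $?): $!\n";

    return ($md5->hexdigest, $len);
}

for my $file (@ARGV) {
    my ($digest, $size) = md5_and_size($file);
    say "$file: $size bytes, MD5 $digest";
}
```

Run it with one or more filenames as arguments; with no arguments it prints nothing.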

2
votes

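You can read the content yourself and feed it to $ctx->add($data), keeping a running count of the data you have passed through. Whether you add all the data in a single call or spread it across several calls makes no difference to the underlying algorithm. The documentation says: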

    All these lines will have the same effect on the state of the $md5 object:

        $md5->add("a"); $md5->add("b"); $md5->add("c");
        $md5->add("a")->add("b")->add("c");
        $md5->add("a", "b", "c");
        $md5->add("abc");

This shows that you can feed the data in one piece at a time.
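
As a small illustration of the idea, here is a sketch that adds data piece by piece while counting it, then compares the result with a one-shot digest. The helper name add_counted is invented for this example, and it assumes the pieces are byte strings (for instance, read from a handle in binary mode), so that length gives a byte count.

```perl
use strict;
use warnings;
use feature qw/say/;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: feeds each piece to the digest context and
# returns how many bytes were added in total.
sub add_counted {
    my ($ctx, @pieces) = @_;
    my $bytes = 0;
    for my $piece (@pieces) {
        $ctx->add($piece);
        $bytes += length $piece;   # byte count, given byte strings
    }
    return $bytes;
}

my $ctx       = Digest::MD5->new;
my $count     = add_counted($ctx, "a", "b", "c");
my $piecewise = $ctx->hexdigest;
my $one_shot  = md5_hex("abc");

say "bytes added: $count";
say "digests match: ", ($piecewise eq $one_shot ? "yes" : "no");
```

In real use the pieces would come from a read loop over the file or pipe rather than from literal strings.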
