I'm using Digest::MD5 to compute the MD5 of a data stream: a gzipped file (or, to be precise, 3000 of them) that is too large to fit in RAM. So I do this:
use Digest::MD5 qw(md5_base64);
my ($filename) = @_; # this is in a sub
my $ctx = Digest::MD5 -> new;
my $openme = $filename; # Usually, it's a plain file
$openme = "gunzip -c '$filename' |" if ($filename =~ /\.gz$/); # is gz
open(FILE, $openme) or die "Cannot open $openme: $!"; # gunzip to STDOUT
binmode(FILE);
$ctx -> addfile(*FILE); # passing filehandle
close(FILE);
This works. addfile neatly slurps up the output of gunzip and gives the correct MD5.
However, I'd really like to know the size of the slurped data (in this case, the gunzipped "file").
I could add an extra
$size = 0 + `gunzip -c very/big-file.gz | wc -c`;
but that would mean reading the file twice.
Is there a way to extract the number of bytes slurped by Digest::MD5? I tried capturing the result:
$result = $ctx->addfile(*FILE);
and running Data::Dumper on both $result and $ctx, but nothing interesting turned up.
Edit: The files are usually not gzipped. I've added code to show what I'm really doing.
I'd do it all in Perl, without relying on an external program for decompression:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use IO::Uncompress::Gunzip qw/$GunzipError/;
use Digest::MD5;
my $filename = shift or die "Missing gzip filename!\n";
my $md5 = Digest::MD5->new;
# Allow for reading both gzip format files and uncompressed files.
# This is the default behavior, but might as well be explicit about it.
my $z = IO::Uncompress::Gunzip->new($filename, Transparent => 1)
or die "Unable to open $filename: $GunzipError\n";
my $len = 0;
while ((my $blen = $z->read(my $block)) > 0) {
    $len += $blen;
    $md5->add($block);
}
die "There was an error reading the file: $GunzipError\n" unless $z->eof;
say "Total uncompressed length: $len";
say "MD5: ", $md5->hexdigest;
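Since the question computes the digest inside a sub for about 3000 files, the loop above can be wrapped in a helper that returns both numbers per file. This is a minimal sketch, assuming one call per file; the sub name md5_and_length is hypothetical and not part of the original answer. Because of Transparent => 1, plain files are read as-is and only gzip files are decompressed.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use IO::Uncompress::Gunzip qw/$GunzipError/;
use Digest::MD5;

# Hypothetical helper: returns (uncompressed length, hex MD5) for one file.
# Gzip input is decompressed on the fly; non-gzip input passes through
# unchanged because Transparent => 1 is set.
sub md5_and_length {
    my ($filename) = @_;
    my $z = IO::Uncompress::Gunzip->new($filename, Transparent => 1)
        or die "Unable to open $filename: $GunzipError\n";
    my $md5 = Digest::MD5->new;
    my $len = 0;
    # read() returns the number of bytes placed in $block, 0 at EOF, -1 on error
    while ((my $blen = $z->read(my $block)) > 0) {
        $len += $blen;
        $md5->add($block);
    }
    die "Error reading $filename: $GunzipError\n" unless $z->eof;
    $z->close;
    return ($len, $md5->hexdigest);
}

# One line per file given on the command line: name, length, digest
for my $file (@ARGV) {
    my ($len, $hex) = md5_and_length($file);
    print "$file\t$len\t$hex\n";
}
```

Each file is read exactly once, and the length and digest come out of the same pass.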
If you want to use gunzip rather than the core IO::Uncompress::Gunzip module, you can do something similar, using read to fetch a chunk of data at a time:
#!/usr/bin/perl
use warnings;
use strict;
use autodie; # So we don't have to explicitly check for i/o related errors
use feature qw/say/;
use Digest::MD5;
my $filename = shift or die "Missing gzip filename!\n";
my $md5 = Digest::MD5->new;
# Note use of lexical file handle and safer version of opening a pipe
# from a process that eliminates shell shenanigans. Also uses the :raw
# perlio layer instead of calling binmode on the handle (which has the
# same effect)
open my $z, "-|:raw", "gunzip", "-c", $filename;
# Non-compressed version
# open my $z, "<:raw", $filename;
my $len = 0;
while ((my $blen = read($z, my $block, 4096)) > 0) {
    $len += $blen;
    $md5->add($block);
}
close $z; # under autodie, this also dies if gunzip exited with an error
say "Total uncompressed length: $len";
say "MD5: ", $md5->hexdigest;
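Since most of the files in the question are not gzipped, the choice between the pipe and a plain open can be made per file from the extension, as the question's own code does, instead of commenting one open out by hand. This is a sketch, assuming the .gz suffix reliably marks gzip files; md5_and_length is a hypothetical name and the 64 KiB read size is an arbitrary choice.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use autodie;    # open, read and close throw on failure
use Digest::MD5;

# Hypothetical helper: pick the open mode from the file name, then count
# bytes while feeding the digest, all in a single pass.
sub md5_and_length {
    my ($filename) = @_;
    my $fh;
    if ($filename =~ /\.gz\z/) {
        # List-form pipe open: no shell is involved, so quoting is not an issue
        open $fh, "-|:raw", "gunzip", "-c", $filename;
    }
    else {
        open $fh, "<:raw", $filename;
    }
    my $md5 = Digest::MD5->new;
    my $len = 0;
    while ((my $blen = read($fh, my $block, 65536)) > 0) {
        $len += $blen;
        $md5->add($block);
    }
    # For the pipe case, autodie makes this throw if gunzip exited non-zero
    close $fh;
    return ($len, $md5->hexdigest);
}

for my $file (@ARGV) {
    my ($len, $hex) = md5_and_length($file);
    print "$file\t$len\t$hex\n";
}
```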
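You can read the content yourself, feed it to $ctx->add($data), and keep a running count of how much data you've passed in. Whether you add all the data in a single call or across many calls makes no difference to the underlying algorithm. The documentation says: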
All these lines will have the same effect on the state of the $md5 object:
    $md5->add("a"); $md5->add("b"); $md5->add("c");
    $md5->add("a")->add("b")->add("c");
    $md5->add("a", "b", "c");
    $md5->add("abc");
which shows that you can feed it one piece at a time.
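To make the point concrete, this small script feeds the same 10,000 bytes to Digest::MD5 once in a single call and once in 7-byte chunks (the chunk size is arbitrary), counting bytes along the way, and compares the two digests.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);

# 10,000 bytes of test data (every character is below 256, so length = bytes)
my $data = join '', map { chr($_ % 256) } 0 .. 9999;

# One-shot digest for comparison
my $whole = md5_hex($data);

# Chunked digest: feed 7-byte pieces and keep a running byte count,
# the same way you would when reading a stream in blocks
my $ctx   = Digest::MD5->new;
my $count = 0;
for (my $pos = 0; $pos < length $data; $pos += 7) {
    my $chunk = substr($data, $pos, 7);
    $count += length $chunk;
    $ctx->add($chunk);
}
my $chunked = $ctx->hexdigest;

print "bytes: $count\n";
print "same digest: ", ($whole eq $chunked ? "yes" : "no"), "\n";
```

Running it prints "bytes: 10000" and "same digest: yes".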