Binary grep on Linux?

Question

Say I have generated the following binary file:

# generate file:
python -c 'import sys;[sys.stdout.write(chr(i)) for i in (0,0,0,0,2,4,6,8,0,1,3,0,5,20)]' > mydata.bin

# get file size in bytes
stat -c '%s' mydata.bin

# 14

And say I want to find the locations of all zeroes (0x00), using a grep-like syntax.

The best I can do so far is:

$ hexdump -v -e "1/1 \" %02x\n\"" mydata.bin | grep -n '00'

1: 00
2: 00
3: 00
4: 00
9: 00
12: 00

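However, this implicitly converts each byte of the original binary file into a multi-byte ASCII representation, on which grep then operates; not exactly a prime example of optimization :)

Is there something like a binary grep for Linux? Possibly also something that supports a regular-expression-like syntax, but for byte "characters"; that is, I could write something like 'a(\x00*)b' and match 'zero or more' occurrences of byte 0 between the bytes 'a' (97) and 'b' (98)?

EDIT: The context is that I'm working on a driver where I capture 8-bit data; something goes wrong in the data, which can range from kilobytes to megabytes, and I'd like to check for particular signatures and where they occur. (So far I'm working with kilobyte snippets, so optimization is not that important; but if I start getting errors in megabyte-long captures and need to analyze those, my guess is I'd want something more optimized :). In particular, I'd like something where I can "grep" for a byte as a character; hexdump forces me to search a string per byte.)

EDIT2: Same question, different forum :) grepping through a binary file for a sequence of bytes

EDIT3: Thanks to the answer by @tchrist, here is also an example of 'grepping' and matching, and displaying the results (although not quite the same question as the OP):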

$ perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin

ca000000cb000000cc000000cd000000ce     # Matched data (hex)
66357                                  # Offset (dec)

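To have the matched data grouped as one byte (two hex characters) each, "H2 H2 H2 ..." needs to be specified for as many bytes as there are in the matched string; as my match '.....\0\0\0\xCC\0\0\0.....' covers 17 bytes, I can write '"H2"x17' in Perl. Each of these "H2" returns a separate value (as in a list), so join also needs to be used to add spaces between them; eventually: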

$ perl -ln0777e 'print join(" ", unpack("H2 "x17,$1)), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin

ca 00 00 00 cb 00 00 00 cc 00 00 00 cd 00 00 00 ce
66357
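
For readers who prefer Python, roughly the same search can be sketched with the re module, which accepts bytes patterns. This is only an illustrative sketch and not part of the original answer: the name find_signature is made up here, and re.DOTALL is an added assumption so that '.' matches every byte value (the Perl one-liner above has no /s flag, so its '.' does not match a newline byte). Like perl -0777, it expects the whole capture in memory.

```python
import re

# 5 arbitrary bytes, then 00 00 00 CC 00 00 00, then 5 more arbitrary bytes (17 total).
SIGNATURE = re.compile(rb".{5}\x00{3}\xCC\x00{3}.{5}", re.DOTALL)

def find_signature(data):
    """Return (end_offset, spaced_hex) per match; end_offset mirrors Perl's pos()."""
    results = []
    for m in SIGNATURE.finditer(data):
        # Iterating over bytes yields integers, formatted here as two hex digits each.
        hex_bytes = " ".join(f"{b:02x}" for b in m.group())
        results.append((m.end(), hex_bytes))
    return results

# Usage (the path is a placeholder, as in the Perl example):
# with open("/path/to/myfile.bin", "rb") as f:
#     for end, hex_bytes in find_signature(f.read()):
#         print(hex_bytes)
#         print(end)
```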

Well... Perl is indeed a very nice 'binary grepping' tool, I must admit :) As long as one learns the syntax properly :)

Answer 1

One-Liner Input

Here's the shorter one-liner version:

% perl -ln0e 'print tell' < inputfile

And here's a slightly longer one-liner:

% perl -e '($/,$\) = ("\0","\n"); print tell while <STDIN>' < inputfile

The way to connect those two one-liners is by decompiling the first one's program:

% perl -MO=Deparse,-p -ln0e 'print tell'
BEGIN { $/ = "\000"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
    chomp($_);
    print(tell);
}

Programmed Input

If you want to put that in a file instead of calling it from the command line, here's a somewhat more explicit version:

#!/usr/bin/env perl

use English qw[ -no_match_vars ];

$RS  = "\0";    # input  separator for readline, chomp
$ORS = "\n";    # output separator for print

while (<STDIN>) {
    print tell();
}

And here's the really long version:

#!/usr/bin/env perl

use strict;
use autodie;  # for perl5.10 or better
use warnings qw[ FATAL all  ];

use IO::Handle;

IO::Handle->input_record_separator("\0");
IO::Handle->output_record_separator("\n");

binmode(STDIN);   # just in case

while (my $null_terminated = readline(STDIN)) {
    # this is just *past* the null we just read:
    my $seek_offset = tell(STDIN);
    print STDOUT $seek_offset;  

}

close(STDIN);
close(STDOUT);
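
As a point of comparison, the same scan can be sketched in Python. This is an assumption-laden illustration rather than a translation of the Perl above: the name null_offsets and the 64 KiB chunk size are invented here, and the sketch reports the zero-based offset of each null byte itself, whereas tell() in the Perl versions reports the position just past it. Reading in chunks means a long capture does not have to fit in memory.

```python
import sys

def null_offsets(stream, chunk_size=65536):
    """Yield the zero-based offset of every 0x00 byte read from a binary stream."""
    base = 0  # offset of the current chunk within the whole stream
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pos = chunk.find(b"\0")
        while pos != -1:
            yield base + pos
            pos = chunk.find(b"\0", pos + 1)
        base += len(chunk)

# Usage: python3 this_script.py < inputfile
if __name__ == "__main__":
    for offset in null_offsets(sys.stdin.buffer):
        print(offset)
```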

One-Liner Output

By the way, to create the test input file, I didn't use your big, long Python script; I just used this simple Perl one-liner:

% perl -e 'print 0.0.0.0.2.4.6.8.0.1.3.0.5.20' > inputfile

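You'll find that Perl often winds up being two to three times shorter than Python for the same job. And you don't have to compromise on clarity; what could be simpler than the one-liner above?

Programmed Output

I know, I know. If you don't already know the language, this might be clearer: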

#!/usr/bin/env perl
@values = (
    0,  0,  0,  0,  2,
    4,  6,  8,  0,  1,
    3,  0,  5, 20,
);
print pack("C*", @values);

although this works, too:

print chr for @values;

as does

print map { chr } @values;

Although for those who like everything rigorous and careful, this might be more what you would see:

#!/usr/bin/env perl

use strict;
use warnings qw[ FATAL all ];
use autodie;

binmode(STDOUT);

my @octet_list = (
    0,  0,  0,  0,  2,
    4,  6,  8,  0,  1,
    3,  0,  5, 20,
);

my $binary = pack("C*", @octet_list);
print STDOUT $binary;

close(STDOUT);
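
For completeness, a Python 3 counterpart to pack("C*", ...) might look like the sketch below. It is not from the original answer; the helper name write_octets is hypothetical, and it relies on bytes() accepting a list of integers in the range 0..255.

```python
OCTETS = [0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20]

def write_octets(path, values):
    """Write each integer in values (0..255) to path as one raw byte."""
    with open(path, "wb") as f:  # binary mode: no newline or encoding translation
        f.write(bytes(values))

# Usage:
# write_octets("inputfile", OCTETS)
```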

TMTOWTDI

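Perl supports more than one way to do things so that you can pick the one you're most comfortable with. If this were something I planned to check in as a school or work project, I would certainly choose the longer, more careful versions; or at least put a comment in the shell script if I were using the one-liners.

You can find documentation for Perl on your own system. Just type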

% man perl
% man perlrun
% man perlvar
% man perlfunc

at your shell prompt, and so on. If you want prettier versions on the web instead, get the manpages for perl, perlrun, perlvar, and perlfunc from http://perldoc.perl.org.

Answer 2

This seems to work for me:

grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>

Short form:

grep -obUaP "<\x-hex pattern>" <file>

Example:

grep -obUaP "\x01\x02" /bin/grep

Output (Cygwin binary):

153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>

So you can grep this output again to extract the offsets. But don't forget to use binary mode again.
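
As an alternative to a second grep pass, the offsets could be pulled out in Python. The sketch below is an assumption, not part of the original answer: it supposes the output of grep -ob has been captured as raw bytes, that each line has the form offset:match, and that the matched bytes contain no newline. The name offsets_from_grep is invented for illustration.

```python
def offsets_from_grep(output):
    """Extract the leading decimal offsets from captured 'grep -ob' output (bytes)."""
    offsets = []
    for line in output.split(b"\n"):
        head, sep, _match = line.partition(b":")
        # Keep only lines that really start with "<digits>:".
        if sep and head.isdigit():
            offsets.append(int(head))
    return offsets
```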

Answer 3

Someone else appears to have been similarly frustrated and wrote their own tool for this (or at least something similar): bgrep.

Answer 4

The bbe program is a sed-like editor for binary files. See the documentation.

bbe example:

bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin

11:x00 x00 xcc x00 x00 x00 xcd x00 x00 x00 xce

Explanation

-b search pattern between //. Each byte is written as \x followed by two hex digits (hexadecimal notation).
   -b works like this: /pattern/:length (in bytes) after the matched pattern
-s similar to 'grep -o': suppress unmatched output
-e similar to 'sed -e': give commands
-e 'F d' display offsets before each result (here: '11:')
-e 'p h' print results in hexadecimal notation
-e 'A \n' append an end-of-line to each result

You can also pipe it to sed for cleaner output:

bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin | sed -e 's/x//g'

11:00 00 cc 00 00 00 cd 00 00 00 ce

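The Perl solution from EDIT3 gives me an "Out of memory" error with large files.

The same problem occurs with bgrep.

The only downside to bbe is that I don't know how to print the context that precedes a matched pattern.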
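
One possible way around the memory limit, sketched here in Python rather than with bbe: memory-map the file and run a bytes regex over the mapping, so the operating system pages the data in on demand instead of the program reading it all at once. This is a hedged illustration only; the name search_mapped is hypothetical, an empty file cannot be mapped, and a 32-bit process may still run out of address space on very large captures.

```python
import mmap
import re

def search_mapped(path, pattern):
    """Return (start_offset, matched_bytes) for each match of a bytes regex in a file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [(m.start(), m.group()) for m in re.finditer(pattern, mm)]

# Usage:
# for offset, matched in search_mapped("mydata.bin", rb"\x00+"):
#     print(offset, matched.hex())
```

Context before a match can be included by widening the pattern itself, for example with a leading .{0,5} and the re.DOTALL flag.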

Answer 5

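One way to solve your immediate problem using only grep is to create a file containing a single null byte. After that, grep -abo -f null_byte_file target_file will produce the following output.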

0:
1:
2:
3:
8:
11:

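That is of course each byte offset, as requested by "-b", followed by a null byte, as requested by "-o".

I'd be the first to advocate Perl, but in this case there's no need to bring in the extended family.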

Answer 6

What about grep -a? I'm not sure how it works on truly binary files, but it works well on text files that the OS thinks are binary.
