grep -f file: print matches in the same order as the pattern file

Question

I need to grep for patterns listed in a file, but I need the matches printed in the order of the patterns.

$ cat patt.grep
name1
name2

$ grep -f patt.grep myfile.log
name2:some xxxxxxxxxx
name1:some xxxxxxxxxx

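The output lists name2 first because it is found first in the log, followed by name1. My requirement is to get name1 first, following the order of the `patt.grep` file.

The output I expect is: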

name1:some xxxxxxxxxx
name2:some xxxxxxxxxx

Answer 1

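You can pipe `patt.grep` to `xargs`, which passes the patterns to `grep` one at a time.

By default, `xargs` appends the arguments at the end of the command. In this case, however, `grep` needs `myfile.log` as the last argument, so use the `-I{}` option to tell `xargs` to replace `{}` with each argument.

cat patt.grep | xargs -I{} grep {} myfile.log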

Answer 2

Read `patt.grep` line by line and apply each regular expression in the order it appears:

while IFS= read -r ptn; do grep -e "$ptn" myfile.log; done < patt.grep
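As a rough illustration of what the loop does, here is a minimal Python sketch of the same idea: one full scan of the input per pattern, in pattern order. The function name `grep_in_pattern_order` is made up for this example, and Python's `re` module stands in for grep, so the regular expression dialect is not identical to grep's.

```python
import re

def grep_in_pattern_order(patterns, lines):
    """Return matching lines grouped by pattern, in the order the patterns are given."""
    result = []
    for pattern in patterns:
        regex = re.compile(pattern)
        # One full scan of the input per pattern, like one grep call per pattern.
        result.extend(line for line in lines if regex.search(line))
    return result

# Sample data taken from the question.
patterns = ["name1", "name2"]
log_lines = ["name2:some xxxxxxxxxx", "name1:some xxxxxxxxxx"]

ordered = grep_in_pattern_order(patterns, log_lines)
print("\n".join(ordered))
```

As with the shell loop, a line that matches several patterns is emitted once per matching pattern, and the input is scanned as many times as there are patterns.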

Answer 3

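I tried the same scenario and solved it easily with the command below. I think it will work for you as long as your data has the same format as in your example.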

grep -f patt.grep myfile.log | sort


Answer 4

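A simple workaround is to `sort` the log file before running `grep` on it:

grep -f patt.grep <(sort -t: myfile.log)

However, if `patt.grep` is not sorted, this may not produce the results in the desired order.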
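To make the caveat concrete, the following Python sketch contrasts plain sorting with ordering by position in the pattern list. The unsorted pattern list is a hypothetical example, not taken from the question, and the sketch assumes the pattern equals the text before the first `:`.

```python
log_matches = ["name2:some xxxxxxxxxx", "name1:some xxxxxxxxxx"]
patterns = ["name2", "name1"]  # hypothetical pattern file that is NOT sorted

# Plain sorting ignores the pattern file entirely.
sorted_output = sorted(log_matches)

# Ordering by the pattern's position in the pattern file instead.
pattern_rank = {p: i for i, p in enumerate(patterns)}
desired_output = sorted(
    log_matches,
    key=lambda line: pattern_rank[line.split(":", 1)[0]],
)

print(sorted_output)
print(desired_output)
```

Here the sorted output starts with the name1 line, while the pattern order calls for the name2 line first.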

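To preserve the order specified in the pattern file, you can use `awk` instead. Read the log file first, indexed by the field before the first `:`, then print the stored line for each pattern in turn (this assumes each name occurs at most once in the log):

awk -F: 'NR==FNR{a[$1]=$0;next} $0 in a{print a[$0]}' myfile.log patt.grep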

Answer 5

This should do it:

awk -F":" 'NR==FNR{a[$1]=$0;next}{ if ($1 in a) {print a[$1]} else {print $1, $1} }' myfile.log patt.grep > z

Answer 6

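This cannot be done with `grep` alone.

For a simple and pragmatic, but inefficient solution, see owlman's answer. It invokes `grep` once for each pattern in `patt.grep`.

If that is not an option, consider the following approach: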

grep -f patt.grep myfile.log |
 awk -F: 'NR==FNR { l[$1]=$0; next } $1 in l {print l[$1]}' - patt.grep

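This passes all patterns to `grep` at once, then uses `awk` to sort the results according to the order of the patterns in `patt.grep`:
- First, all output lines are read (via stdin, `-`, i.e. through the pipe) into an associative array, using the first `:`-delimited field as the key.
- Then `awk` loops over the lines of `patt.grep` and prints the corresponding output line, if any.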

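Limitations:

- Assumes that every pattern in `patt.grep` matches the first `:`-delimited token in the log file, as implied by the sample output in the question.
- Assumes that each pattern matches only once. If multiple matches are possible, the `awk` solution would have to become more complex.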
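One way the multiple-match case could be handled is to keep a list of lines per key rather than a single line. The Python sketch below shows that idea under the same assumption that each pattern equals the first `:`-delimited field. The function name `order_by_patterns` and the sample lines are hypothetical.

```python
from collections import defaultdict

def order_by_patterns(patterns, lines, sep=":"):
    """Group lines by their first field, then emit the groups in pattern order."""
    by_key = defaultdict(list)
    for line in lines:
        key = line.split(sep, 1)[0]
        # Keep every line for a key, in input order, instead of only the last one.
        by_key[key].append(line)

    ordered = []
    for pattern in patterns:
        # Patterns with no match contribute nothing.
        ordered.extend(by_key.get(pattern, []))
    return ordered

# Hypothetical sample data in which name2 matches twice.
patterns = ["name1", "name2"]
lines = ["name2:first", "name1:only", "name2:second"]

for line in order_by_patterns(patterns, lines):
    print(line)
```

The input is read only once, and lines sharing a key keep their relative order from the log.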

Answer 7

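Here is a Python script that wraps grep to do this. Features:

- Prints multiple occurrences of a pattern consecutively
- Uses grep's `--only-matching` option
- Prints a warning if a pattern is not found
- Reasonably fast, but not suitable for regular expressions, so it is best used with `grep -Fw`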

#!/usr/bin/env python3

# grep -f in order of pattern file.
# If a pattern occurs multiple times in the input, all matches are printed thereunder.

import argparse
import sys
import subprocess
from collections import defaultdict

def eprint(*args, **kwargs):
    print('kgrep.py', *args, file=sys.stderr, **kwargs)


class FileHelper:
    def __init__(self, filepath):
        self.file = open(filepath, "rb", buffering=1024*1024)
        self.line_nb = 0

    # Loop through our file until the specified line number
    def readline(self, line_nb):
        if self.line_nb == line_nb:
            # already got that one
            return None
        assert line_nb > self.line_nb
        line = None
        while self.line_nb < line_nb:
            line = self.file.readline()
            self.line_nb += 1
        if not line:
            # readline() returns b'' (not None) at end of file
            eprint("line_nb", line_nb, "not found")
            sys.exit(1)
        # we use the \n later anyway, so do not line.rstrip()
        return line


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', '-f' , help="", required=True)
    parser.add_argument('--only-matching', '-o', action='store_true', help="")

    args, unknown_args = parser.parse_known_args()
    input_file = None
    for arg in unknown_args:
        if arg.startswith('-'):
            continue
        if input_file is not None:
            eprint('multiple input files not supported:', input_file, arg)
            exit(1)
        input_file = arg

    if input_file is None:
        eprint('missing input file')
        exit(1)
    grep_args = 'grep -f - -o -n'.split(' ')
    grep_args.extend(unknown_args)

    proc = subprocess.Popen(grep_args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        stderr=sys.stderr, bufsize=1024*1024)

    # First pass all needles to grep (but remember them)
    input_ = sys.stdin.buffer if args.file == '-' else open(args.file, "rb")
    needles = []
    while True:
        line = input_.readline()
        if not line:
            break
        proc.stdin.write(line)
        needles.append(line.rstrip())

    proc.stdin.flush()
    proc.stdin.close() # close stdin to signal end of input

    only_m = args.only_matching
    helper_file = FileHelper(input_file)
    matches_dict = defaultdict(list)
    # Read grep's line-number prefixed output and extract the full line
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        line_nb, grep_match = line.split(b':', 1)
        full_line = grep_match if only_m else helper_file.readline(int(line_nb))
        if full_line is not None:
            matches_dict[grep_match.rstrip()].append(full_line)

    for needle in needles:
        line = matches_dict.get(needle)
        if line is None:
            eprint("warning: needle not found:", needle.decode())
            continue

        # we remember that we already printed a match by setting the first el to None
        if line[0] is None:
            continue
        for m in line:
            sys.stdout.buffer.write(m)
        line[0] = None

    exit(proc.wait())


if __name__ == '__main__':
    try:
        main()
    except (BrokenPipeError, KeyboardInterrupt) as e:
        # avoid additional broken pipe error. s. https://stackoverflow.com/a/26738736
        sys.stderr.close()
        # KeyboardInterrupt has no errno attribute, so fall back to exit code 1
        sys.exit(getattr(e, 'errno', None) or 1)
