Always include the first line in grep output

Question

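I often grep CSV files whose column names are on the first line. I therefore want grep's output to always include the first line (to get the column names) plus any lines that match the grep pattern. What is the best way to do this?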

Answer 1

sed:

sed '1!{/pattern/!d;}' input.txt

awk:

awk 'NR==1 || /pattern/' input.txt

grep1 (as a shell function):

grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:-/dev/stdin}"; }

Answer 2

You can add one of the column names as an alternative in the pattern. If one column is called COL, then this works:

$ grep -E 'COL|pattern' file.csv
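One caveat with this approach: an unanchored COL also matches any data row that happens to contain that text. A minimal sketch, assuming a header that starts with the hypothetical column name id and using made-up sample rows, anchors the header alternative with ^ so it can only match at the start of a line:

```shell
# Sample CSV (made-up data); the header starts with the column name "id".
csv=$(mktemp)
printf 'id,comment\n1,valid row\n2,other\n3,pattern here\n' > "$csv"

# Unanchored: "id" also matches "valid" in row 1, so that row leaks in.
grep -E 'id|pattern' "$csv"

# Anchored: only a line that begins with "id," counts as the header.
grep -E '^id,|pattern' "$csv"

rm -f "$csv"
```

This still assumes you know how the header line begins; when you do not, a line-number based approach such as awk's NR == 1 avoids the guesswork.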

Answer 3

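grep doesn't really have a notion of line numbers, but awk does, so here is an example that outputs the lines containing "Incoming", plus the first line, whatever it is: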

awk 'NR == 1 || /Incoming/' foo.csv

You could turn this into a script (a bit excessive, but still). I created a file named grep+1 and put this in it:

#!/bin/sh
pattern="$1" ; shift
exec awk 'NR == 1 || /'"$pattern"'/' "$@"

Now you can run:

./grep+1 Incoming

Edit: removed the "{print;}", since printing is awk's default action.
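Because the script pastes $pattern into the awk program text, a pattern containing a / (or a quote) changes the program itself. A possible variant, sketched here as a shell function with the hypothetical name grep_plus1 and a made-up environment variable GREP1_PATTERN, hands the pattern to awk through the environment and matches it dynamically:

```shell
# Hypothetical function form of the grep+1 script above.
# The pattern travels via the environment (GREP1_PATTERN is a made-up name),
# so characters such as "/" never become part of the awk source.
grep_plus1() {
    pattern=$1
    shift
    GREP1_PATTERN=$pattern awk 'NR == 1 || $0 ~ ENVIRON["GREP1_PATTERN"]' "$@"
}

# Demo with made-up data: the pattern contains slashes.
printf 'path,size\n/usr/bin,10\n/etc,2\n' | grep_plus1 '/usr/'
```

The pattern is still interpreted as an awk regular expression, so grep-specific options and syntax are not available here.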

Answer 4

Another option:

$ cat data.csv | (read line; echo "$line"; grep SEARCH_TERM)

Example:

$ printf 'title\nvalue1\nvalue2\nvalue3\n' | (read line; echo "$line"; grep value2)

Output:

title
value2
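A small refinement, offered as a sketch rather than a correction: a plain read trims leading and trailing whitespace and treats backslashes as escapes, and echo may interpret backslash sequences in some shells. Using IFS= read -r together with printf passes the header through unchanged. The sample data below is made up.

```shell
# Header with leading spaces and a backslash (made-up data).
printf '  na\\me,value\nrow1,1\nrow2,2\n' |
{
    IFS= read -r line          # read the header without trimming or unescaping
    printf '%s\n' "$line"      # print it exactly as read
    grep row2                  # grep sees only the remaining lines
}
```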

Answer 5

You can use sed instead of grep to do this:

sed -n -e '1p' -e '/pattern/p' < "$FILE"

However, if the first line happens to contain the pattern, this prints the first line twice.

-n tells sed not to print every line by default.

-e '1p' prints the first line.

-e '/pattern/p' prints every line that matches the pattern.
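If the duplicate header is a problem, one possible variation (a sketch, not part of the original answer) is to print line 1 and then end processing for that line with d, so the pattern test only ever sees the remaining lines. The sample file and the pattern id are made up for illustration.

```shell
# Made-up sample where the header itself matches the pattern "id".
FILE=$(mktemp)
printf 'id,name\n1,idle\n2,busy\n' > "$FILE"

# 1{p;d;}  prints the first line, then d starts the next cycle,
#          so the second expression never runs for line 1.
# /id/p    prints every later line that matches.
sed -n -e '1{p;d;}' -e '/id/p' < "$FILE"

rm -f "$FILE"
```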

Answer 6

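This is a very general solution, useful for example if you want to sort a file while keeping the first line in place. Basically: "pass the first line through as-is, then do whatever I want (awk/grep/sort/whatever) with the rest of the data."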

Try this in a script, perhaps calling it keepfirstline (don't forget to chmod +x keepfirstline and to put it in your PATH):

#!/bin/bash
IFS='' read -r JUST1LINE
printf "%s\n" "$JUST1LINE"
exec "$@"

It can be used as follows:

cat your.data.csv | keepfirstline grep SearchTerm > results.with.header.csv

Or perhaps, if you want to filter with awk:

cat your.data.csv | keepfirstline awk '$1 < 3' > results.with.header.csv

I often like to sort a file but keep the header on the first line:

cat your.data.csv | keepfirstline sort

keepfirstline runs the given command (grep SearchTerm in the first example), but only after reading and printing the first line.
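For quick interactive use, the same idea also fits in a shell function instead of a separate script on the PATH. This is a minimal sketch with made-up sample data; it reuses the name keepfirstline only for illustration.

```shell
# Function form of the keepfirstline script: pass the first line through
# untouched, then run the given command on the rest of stdin.
keepfirstline() {
    # Stop if there is no input at all (a lone line without a newline still counts).
    IFS= read -r first_line || [ -n "$first_line" ] || return
    printf '%s\n' "$first_line"
    "$@"
}

# Sort the rows but keep the header on top (made-up data).
printf 'name,score\nzed,3\namy,9\nbob,5\n' | keepfirstline sort
```

Unlike the script, the function does not exec the command, so it keeps running in the current shell process (or in a subshell when used in a pipeline).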

Answer 7

Just do head -1 <filename> and then run grep.
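Put together, that could look like the sketch below. The function name grep_with_header and the sample file are assumptions for illustration; tail -n +2 is used so that grep only sees the lines after the header and the header cannot be printed twice.

```shell
# Hypothetical helper: grep_with_header PATTERN FILE
grep_with_header() {
    head -n 1 "$2"                      # always print the header line
    tail -n +2 "$2" | grep -e "$1"      # search only the lines after it
}

# Demo with a made-up file whose header also matches the pattern.
data=$(mktemp)
printf 'id,val\n1,id\n2,x\n' > "$data"
grep_with_header id "$data"
rm -f "$data"
```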

Answer 8

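So, I posted a completely different, short answer above a while ago.

However, for those pining for a command that looks like grep, in that it takes all the same options (though this script requires you to use the long options whenever an optarg is involved) and can handle weird characters in filenames and so on: have fun pulling this apart.

Essentially it is a grep that always emits the first line. If you think a file with no matching lines should skip emitting the first (header) line, well, that is left as an exercise for the reader. I saved this as grep+1.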

#!/bin/bash
# grep+1 [<option>...] [<regex>] [<file>...]
# Emits the first line of each input and ignores it otherwise.
# For grep options that have optargs, only the --forms will work here.

declare -a files options
regex_seen=false
regex=

double_dash_seen=false
for arg in "$@" ; do
    is_file_or_rx=true
    case "$arg" in
        -*) is_file_or_rx=$double_dash_seen ;;
    esac
    if $is_file_or_rx ; then
        if ! $regex_seen ; then
            regex="$arg"
            regex_seen=true
        else
            files[${#files[*]}]="$arg"     # append the value
        fi
    else
        options[${#options[*]}]="$arg"     # append the value
    fi
done

# We could open all the files at once in the shell and pass the handles into
# one grep call, but that would limit how many we can process to the fd limit.
# So instead, here's the simpler approach with a series of grep calls.

if $regex_seen ; then
    if [ ${#files[@]} -gt 0 ] ; then
        for file in "${files[@]}" ; do
            head -n 1 "$file"
            tail -n +2 "$file" | grep --label="$file" "${options[@]}" "$regex"
        done
    else
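        # stdin: pass the header through, then grep the rest
        IFS= read -r header && printf '%s\n' "$header"
        grep "${options[@]}" "$regex"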
    fi
else
    grep "${options[@]}"   # probably --help
fi

#--eof
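For the exercise mentioned above (no header when nothing matches), one possible approach is sketched here in awk rather than as a change to the script: hold the header back and print it just before the first matching line. The function name grep_header_if_match is made up, it reads standard input only, and the pattern is an awk regular expression, not a grep one.

```shell
# Hypothetical helper: print the header only if at least one row matches.
grep_header_if_match() {
    awk -v pat="$1" '
        NR == 1 { header = $0; next }          # remember the header, print nothing yet
        $0 ~ pat {
            if (!shown) { print header; shown = 1 }   # emit header once, on first match
            print
        }
    '
}

printf 'id,name\n1,foo\n2,bar\n' | grep_header_if_match bar    # header + matching row
printf 'id,name\n1,foo\n2,bar\n' | grep_header_if_match nope   # prints nothing
```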

Answer 9

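All the answers above are correct. For the case of grepping the output of a command (rather than a file) while keeping the first line, another idea is this ;-)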

df -h | grep -E '(^Filesystem|/mnt)'  # <<< returns usage of devices, with mountpoint '/mnt/...'
ps aux | grep -E '(^USER|grep)'       # <<< returns all grep-process

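grep's -E option enables extended regular expressions. The | in the pattern acts as an "or", so in the df example we look for lines that either start with Filesystem (the leading "^" in the first subexpression means "line starts with") or contain /mnt.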

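Another approach could be to redirect the output into a temporary file and grep its contents as shown in the other posts. This helps if you do not know the content of the first line.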

head -1 <file> && grep ff <file>

Answer 10

For files:

head -n 1 file.csv ; grep MyValue file.csv

For commands:

ps -aux | (head -n 1 ; grep index) | grep -v grep

With watch:

watch "ps -aux | (head -n 1 ; grep index) | grep -v grep"
