How do I get a list of certain strings from a list of files using bash?

Question

The title may not describe the problem very well, but I could not find a more concise way to put it.

I have a directory containing various files whose names look like this:

{some text}2019Q2{some text}.pdf

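So somewhere in the file name there is a year followed by a capital Q and then another digit. The rest of the text can be anything, but it will not contain anything else that matches the year-Q-number format. There will also be no digit directly before or after this pattern.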

I can work out how to get this information from a single file name, but what I actually need is a "list" so that I can run a for loop over it in bash.

So, if my directory contains these files:

costumerA_2019Q2_something.pdf
costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerB_2019Q3_something.pdf
costumerC_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerD2020Q2something.pdf

I want a for loop that goes through 2019Q2, 2019Q3, 2020Q1, and 2020Q2.

EDIT: This is what I have so far. It extracts the substrings, but the output still contains duplicates. Since I am already inside the loop, I do not know how to remove them.

find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
   echo $line | grep -oP '[0-9]{4}Q[0-9]'
done

Answer 1

# list the names of all files ending in .pdf in the folder original
find original -maxdepth 1 -name '*.pdf' -type f -printf '%f\n' |
# extract the pattern (-n and the p flag skip names that do not match)
sed -n 's/.*\([0-9]\{4\}Q[0-9]\).*/\1/p' |
# iterate
while IFS= read -r file; do
    echo "$file"
done

I used -printf '%f\n' to print only the file name rather than the full path. GNU sed has a -z option, which you can use together with -print0 (or -printf '%f\0').

If your file names contain no newlines, there is no need to loop over the list in bash at all (as a rule of thumb, try to avoid while read line, it is very slow).

find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'

Or with a zero-separated stream:

find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'

If you want to remove duplicate elements from the list, pipe it to sort -u.
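
Putting those pieces together, a minimal sketch of the loop the question asks for might look like the following. It assumes GNU find and GNU grep (for -printf and -P) and file names without newlines. The function name list_quarters, the variable quarters and the temporary demo directory are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct year-quarter token found in the
# names of the PDF files directly inside the given directory.
list_quarters() {
    find "$1" -maxdepth 1 -name '*.pdf' -type f -printf '%f\n' |
        grep -oP '[0-9]{4}Q[0-9]' |
        sort -u
}

# Demo on a throwaway directory holding the file names from the question.
demo_dir=$(mktemp -d)
touch "$demo_dir/costumerA_2019Q2_something.pdf" \
      "$demo_dir/costumerB_2019Q2_something.pdf" \
      "$demo_dir/costumerA_2019Q3_something.pdf" \
      "$demo_dir/costumerB_2019Q3_something.pdf" \
      "$demo_dir/costumerC_2019Q3_something.pdf" \
      "$demo_dir/costumerA_2020Q1_something.pdf" \
      "$demo_dir/costumerD2020Q2something.pdf"

# Capture the de-duplicated list once, then loop over it.
quarters=$(list_quarters "$demo_dir")

# Unquoted on purpose: the tokens contain only digits and Q, so word
# splitting yields exactly one item per token.
for q in $quarters; do
    echo "processing $q"
done

rm -rf "$demo_dir"
```

Inside the loop, $q takes the values 2019Q2, 2019Q3, 2020Q1 and 2020Q2, one per iteration.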

Answer 2

Try this, in bash:

~ > $ ls
costumerA_2019Q2_something.pdf  costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf  other.pdf
costumerA_2020Q1_something.pdf  someother.file.txt

~ > $ for x in *; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo "$x"; done;
costumerA_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerB_2019Q2_something.pdf

~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u
2019Q2
2019Q3
2020Q1
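
The last command leaves the de-duplication to an external sort -u. As a sketch of an alternative that stays entirely inside bash, the tokens already seen can be tracked in an associative array (this assumes bash 4 or later). The function name unique_quarters and the array seen are hypothetical names, and the output follows the glob order of the file names rather than a sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each year-quarter token once, in the order the
# glob yields the file names, without calling external commands.
unique_quarters() {
    local dir=$1 path name
    local -A seen=()          # tokens that have already been printed
    for path in "$dir"/*.pdf; do
        name=${path##*/}      # strip the directory part
        if [[ $name =~ ([0-9]{4}Q[1-4]) ]] &&
           [[ -z ${seen[${BASH_REMATCH[1]}]:-} ]]; then
            seen[${BASH_REMATCH[1]}]=1
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}

# Demo on a throwaway directory with the file names from the listing above.
demo_dir=$(mktemp -d)
touch "$demo_dir/costumerA_2019Q2_something.pdf" \
      "$demo_dir/costumerB_2019Q2_something.pdf" \
      "$demo_dir/costumerA_2019Q3_something.pdf" \
      "$demo_dir/costumerA_2020Q1_something.pdf" \
      "$demo_dir/other.pdf" \
      "$demo_dir/someother.file.txt"

result=$(unique_quarters "$demo_dir")
echo "$result"

rm -rf "$demo_dir"
```

Because nothing is piped through a subshell here, the same loop body could also do the per-quarter work directly instead of printing the token.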
