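I have a file containing a list of filename prefixes, and I want to run grep over the corresponding files to pull out a specific set of lines. Those lines contain the coordinates for a later root-mean-square deviation calculation, which I will do with an awk one-liner. I have verified that the regex syntax works on a single file, but when I use a bash variable from a for loop, grep does not recognize the variable. I have tried several permutations of double quotes, escape characters, and curly braces, but none of them produced the desired output. Here is the initial grep piped into a second grep, followed by the raw data it returns, cleaned up for the awk step: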
grep " [A-Z] " ../glide-dock_SP_8CHM/ligands/3579831839.sdf | grep " 'F\|S\|C\|O\|N "
2.7118 4.0281 21.0125 S 0 0 0 0 0 0
4.0921 3.8708 21.4967 O 0 0 0 0 0 0
1.8208 2.8602 20.9648 O 0 0 0 0 0 0
1.9954 5.2598 21.8979 N 0 0 1 0 0 0
2.8079 4.6789 19.3978 N 0 0 0 0 0 0
2.3264 6.6613 21.7269 C 0 0 0 0 0 0
0.5679 5.5328 21.8880 C 0 0 0 0 0 0
3.8805 4.1396 18.5518 C 0 0 0 0 0 0
1.5591 4.8798 18.6409 C 0 0 0 0 0 0
0.9197 6.9883 22.3042 C 0 0 0 0 0 0
0.8840 7.1094 23.8539 C 0 0 0 0 0 0
0.1007 8.1332 21.6760 C 0 0 0 0 0 0
-0.1257 6.7465 24.4613 O 0 0 0 0 0 0
1.9559 7.6325 24.4759 N 0 0 0 0 0 0
0.9264 9.3760 21.3740 C 0 0 0 0 0 0
2.1242 7.8085 25.9208 C 0 0 1 0 0 0
1.5512 9.5358 20.1194 C 0 0 0 0 0 0
1.0887 10.3742 22.3570 C 0 0 0 0 0 0
3.4722 7.2469 26.3919 C 0 0 0 0 0 0
1.7964 9.2425 26.3963 C 0 0 0 0 0 0
2.3648 10.6567 19.8673 C 0 0 0 0 0 0
1.8942 11.5002 22.1018 C 0 0 0 0 0 0
3.4888 6.2142 27.0561 O 0 0 0 0 0 0
4.6188 7.8580 26.0568 N 0 0 0 0 0 0
2.6369 10.3760 25.7768 C 0 0 0 0 0 0
2.5397 11.6366 20.8599 C 0 0 0 0 0 0
4.8138 9.1019 25.3276 C 0 0 0 0 0 0
4.1501 10.2988 26.0247 C 0 0 0 0 0 0
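I have a list file containing the prefixes of all the files I want to run this search and extraction on, so I figured a for statement was the best choice. Here is my initial attempt: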
for i in $(cat ligand-list.txt)
> do
> grep " [A-Z] " ../glide-dock_SP_8CHM/ligands/$i.sdf | grep " 'F\|S\|C\|O\|N " | awk '{print $1,$2,$3}' > $i_8CHM.xyz
> done
grep: ../glide-dock_SP_8CHM/ligands/ligand-list.txt.sdf: No such file or directory
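I know I need some kind of character escaping around the variable in the grep command, so I tried double quotes, but that produced the same output as above. Adding an escape character along with the double quotes makes $i be treated as a literal string: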
for i in $(cat ligand-list.txt)
> do
> grep " [A-Z] " ../glide-dock_SP_8CHM/ligands/"\$i".sdf | grep " 'F\|S\|C\|O\|N " | awk '{print $1,$2,$3}' > $i_8CHM.xyz
> done
.
.
.
grep: ../glide-dock_SP_8CHM/ligands/$i.sdf: No such file or directory
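Adding curly braces is the closest I have gotten, since my for variable is now passed to grep as a string, but grep now treats the escape character and the braces as part of the string from the bash variable: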
for i in $(cat ligand-list.txt)
> do
> grep " [A-Z] " ../glide-dock_SP_8CHM/ligands/"\{$i}".sdf | grep " 'F\|S\|C\|O\|N " | awk '{print $1,$2,$3}' > $i_8CHM.xyz
> done
.
.
.
grep: ../glide-dock_SP_8CHM/ligands/\{3579831839}.sdf: No such file or directory
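Any other combination or repositioning of \, ", and {} yields the same result. From my reading on SO and the grep documentation, I thought all I needed was double quotes to have my bash variable passed as a literal string, but I now think that only applies to the regex grep searches for, not to the file grep is searching. Some clarification on this would be much appreciated, as I have spent most of the day banging my head against the keyboard over this.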
Setup:
$ mkdir -p ../glide-dock_SP_8CHM/ligands
$ cat ligand-list.txt
12345
78900
ABCDE
$ head ../glide-dock_SP_8CHM/ligands/*sdf
==> ../glide-dock_SP_8CHM/ligands/12345.sdf <==
99.9999 99.9999 12345 A 0 0 0 0 0 0
2.7118 4.0281 12345 F 0 0 0 0 0 0
99.9999 99.9999 12345 G 0 0 0 0 0 0
4.0921 3.8708 12345 S 0 0 0 0 0 0
==> ../glide-dock_SP_8CHM/ligands/78900.sdf <==
99.9999 99.9999 78900 B 0 0 0 0 0 0
2.7118 4.0281 78900 C 0 0 0 0 0 0
99.9999 99.9999 78900 M 0 0 0 0 0 0
4.0921 3.8708 78900 O 0 0 0 0 0 0
==> ../glide-dock_SP_8CHM/ligands/ABCDE.sdf <==
99.9999 99.9999 ABCDE L 0 0 0 0 0 0
2.7118 4.0281 ABCDE O 0 0 0 0 0 0
99.9999 99.9999 ABCDE Z 0 0 0 0 0 0
4.0921 3.8708 ABCDE F 0 0 0 0 0 0
Modifying the OP's bash code (while loop, quoted "$i", simplified grep pattern):
while read -r i
do
grep '[[:upper:]]' ../glide-dock_SP_8CHM/ligands/"$i".sdf | grep " [FSCON] " | awk '{print $1,$2,$3}' > "$i"_8CHM.xyz
done < ligand-list.txt
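To illustrate why the redirection target is written as "$i"_8CHM.xyz rather than $i_8CHM.xyz: underscores, letters and digits are all legal in shell variable names, so bash reads $i_8CHM as one (unset) variable. A minimal sketch, using the value 12345 from the sample list above; the variable names wrong, right_braces and right_quotes are only for illustration:

```shell
#!/usr/bin/env bash
# Shows how bash parses $i_8CHM versus ${i}_8CHM.
i=12345

# bash looks up a variable named "i_8CHM" (unset here), which expands to "".
wrong="$i_8CHM.xyz"

# Braces or a closing quote end the variable name right after "i".
right_braces="${i}_8CHM.xyz"
right_quotes="$i"_8CHM.xyz

printf 'wrong:  %s\n' "$wrong"
printf 'braces: %s\n' "$right_braces"
printf 'quotes: %s\n' "$right_quotes"
```

With the unquoted form every iteration would redirect to the same file, .xyz, so only the last ligand's coordinates would survive.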
Combining the two grep calls:
while read -r i
do
grep " [FSCON] " ../glide-dock_SP_8CHM/ligands/"$i".sdf | awk '{print $1,$2,$3}' > "$i"_8CHM.xyz
done < ligand-list.txt
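As a rough illustration of why the bracket expression is a safer pattern than the alternation in the question: in a basic regular expression, \| splits the whole pattern, so ' F\|S\|C\|O\|N ' means ' F' or 'S' or 'C' or 'O' or 'N ' anywhere on the line. The sketch below assumes GNU grep (\| is an extension to basic regular expressions), and the third input line is a made-up header, not taken from a real .sdf file:

```shell
#!/usr/bin/env bash
# Compare the loose alternation with the bracket expression on three lines.
tmp=$(mktemp)
printf '%s\n' \
  '2.7118 4.0281 21.0125 S 0 0 0 0 0 0' \
  '9.9999 9.9999 99.9999 A 0 0 0 0 0 0' \
  'SOME HEADER LINE' > "$tmp"

# Alternation: any line containing a bare S, C or O also matches.
loose=$(grep -c ' F\|S\|C\|O\|N ' "$tmp")

# Bracket expression: exactly one of F, S, C, O, N with a space on each side.
strict=$(grep -c ' [FSCON] ' "$tmp")

printf 'loose=%s strict=%s\n' "$loose" "$strict"
rm -f "$tmp"
```

Here the alternation also picks up the header line, while the bracket expression keeps only the S atom line.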
Removing the grep call:
while read -r i
do
awk '$4 ~ /^[FSCON]$/ {print $1,$2,$3}' ../glide-dock_SP_8CHM/ligands/"$i".sdf > "$i"_8CHM.xyz
done < ligand-list.txt
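One practical difference, sketched below under the assumption that some input might be tab-separated (the sample data above is not): grep ' [FSCON] ' needs literal spaces around the element symbol, while awk compares the fourth whitespace-separated field whatever the separator is. The two input lines are hypothetical:

```shell
#!/usr/bin/env bash
# A tab-separated atom line is missed by the grep pattern but caught by awk.
tmp=$(mktemp)
printf '2.7118\t4.0281\t21.0125\tS\t0\n' > "$tmp"
printf '1.0000 2.0000 3.0000 A 0\n' >> "$tmp"

# grep -c still prints 0 on no match but exits non-zero, hence "|| true".
grep_hits=$(grep -c ' [FSCON] ' "$tmp" || true)

# awk splits on any run of blanks or tabs, then tests field 4 exactly.
awk_out=$(awk '$4 ~ /^[FSCON]$/ {print $1,$2,$3}' "$tmp")

printf 'grep hits: %s\n' "$grep_hits"
printf 'awk output: %s\n' "$awk_out"
rm -f "$tmp"
```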
Eliminating the while loop and using a single awk script:
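awk '
FNR==1 { close(fname)                       # first line of each input file: close the previous output file
         n = split(FILENAME, a, /[\/.]/)    # split the path on "/" and "." (slash escaped for portability)
         fname = a[n-1] "_8CHM.xyz"         # a[n-1] is the prefix just before "sdf"
       }
$4 ~ /^[FSCON]$/ { print $1, $2, $3 > fname }
' $(printf '../glide-dock_SP_8CHM/ligands/%s.sdf\n' $(cat ligand-list.txt))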
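To see how the output filename is derived, the split() step can be tried on its own. This is only a sketch on a hard-coded path from the setup above; the slash inside the bracket expression is written escaped here, since some awk implementations may not accept an unescaped / inside a regex literal:

```shell
#!/usr/bin/env bash
# Split a sample path on "/" and "." and rebuild the output filename.
result=$(awk 'BEGIN {
  path = "../glide-dock_SP_8CHM/ligands/12345.sdf"
  n = split(path, a, /[\/.]/)      # the leading ".." yields empty pieces
  print n, a[n-1] "_8CHM.xyz"      # a[n] is "sdf", a[n-1] is the prefix
}')
printf '%s\n' "$result"
```

Note that this relies on the prefix itself containing no dots; a prefix such as 12345.v2 would be cut at its last dot.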
These all generate:
$ head *_8CHM.xyz
==> 12345_8CHM.xyz <==
2.7118 4.0281 12345
4.0921 3.8708 12345
==> 78900_8CHM.xyz <==
2.7118 4.0281 78900
4.0921 3.8708 78900
==> ABCDE_8CHM.xyz <==
2.7118 4.0281 ABCDE
4.0921 3.8708 ABCDE