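So I have gene files numbered 1, 2, ... 19500 (gene_1.fa, gene_2.fa, and so on) and want to sort them into folders 200, 400, 600 ... 19600 for a downstream pipeline. I know how to do this, but it's horrible: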
for file in "${files[@]}"; do
    base_name=$(basename "$file")
    gene_number=$(echo "$base_name" | cut -d'_' -f2 | cut -d'.' -f1)
    to_path=...   # path to the folder containing the 200, 400, ... 19600 folders
    # if it's gene_200.fa, gene_400.fa etc., copy into that dir
    if (( gene_number % 200 == 0 )); then
        cp "$file" "$to_path/$gene_number/$base_name"
    elif (( gene_number < 200 )); then
        cp "$file" "$to_path/200/$base_name"
    elif (( gene_number > 19400 )); then
        cp "$file" "$to_path/19600/$base_name"
    # the endless pain of 200-400, 400-600, 600-800 ... 19200-19400
    elif (( gene_number > 200 && gene_number < 400 )); then
        cp "$file" "$to_path/400/$base_name"
    elif ....
My question: is there a less tedious way to do this that does not copy any one file into multiple folders? (For example, if I only checked gene number < folder name, a file named gene_3.fa would be copied into all folders.)
You can do it like this; just change the delta value to 200 and add cp or mv as needed:
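#!/usr/bin/env bash
delta=5    # bucket width; set to 200 for the real data
for file in gene_{1..20}.fa; do
    # Pull the number out of the file name; skip names that do not match.
    [[ "$file" =~ _([0-9]+)\.fa$ ]] || continue
    gene_number="${BASH_REMATCH[1]}"
    # Integer division rounds down, so this yields the next multiple of delta
    # strictly greater than gene_number (5 -> 10, not 5 -> 5).
    bucket=$(( gene_number / delta ))
    bucket=$(( bucket * delta + delta ))
    echo "$file -> $bucket"    # replace with cp or mv as needed
done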
$ ./tst.sh
gene_1.fa -> 5
gene_2.fa -> 5
gene_3.fa -> 5
gene_4.fa -> 5
gene_5.fa -> 10
gene_6.fa -> 10
gene_7.fa -> 10
gene_8.fa -> 10
gene_9.fa -> 10
gene_10.fa -> 15
gene_11.fa -> 15
gene_12.fa -> 15
gene_13.fa -> 15
gene_14.fa -> 15
gene_15.fa -> 20
gene_16.fa -> 20
gene_17.fa -> 20
gene_18.fa -> 20
gene_19.fa -> 20
gene_20.fa -> 25
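
One thing to watch in the output above: an exact multiple lands in the next bucket (gene_5.fa -> 10), whereas the code in the question keeps gene_200.fa in folder 200. If that behaviour is what you want, a ceiling-style division should do it. The following is only a sketch under a few assumptions: the helper name bucket_for and the destination ./sorted are made up for illustration, file names look like gene_N.fa, and the numbers have no leading zeros. It prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: round UP to a multiple of delta, so exact multiples stay put
# (gene_200.fa -> 200, gene_201.fa -> 400), matching the rule in the question.

delta=200
to_path="./sorted"   # hypothetical destination root; replace with the real path

# bucket_for N DELTA: print the smallest multiple of DELTA that is >= N.
# (Hypothetical helper; assumes N is a plain decimal without leading zeros.)
bucket_for() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

for file in gene_*.fa; do
    # Strip the assumed "gene_" prefix and ".fa" suffix to get the number.
    n=${file#gene_}
    n=${n%.fa}
    # Skip anything that is not purely digits (including an unmatched glob).
    case $n in
        ''|*[!0-9]*) continue ;;
    esac
    bucket=$(bucket_for "$n" "$delta")
    # Dry run: remove the leading "echo" to actually create folders and copy.
    echo mkdir -p "$to_path/$bucket"
    echo cp -- "$file" "$to_path/$bucket/"
done
```

Because each file gets exactly one computed bucket, no file can end up in more than one folder. Swap cp for mv if you want to move rather than copy.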