Shell script - skipping files to be processed


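I am trying to process multiple files present in a folder. My requirement is to process ALL the files, but at most 15 in parallel. I wrote the script below to achieve this. However, it isn't working as expected.

The script processes all the files in the first batch (i.e. 15 in this case), but once the first 15 are done, it processes only every other file. So if a folder has, say, 27 files, it processes the first 15 and then only 6 of the remaining 12.

What am I doing wrong and how can I correct it?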


# Path to the folder containing the files
INPUT_FILES_FOLDER="/mnt/data/INPUT"
OUTPUT_FILES_FOLDER="/mnt/data/OUTPUT"

# Path to the Docker image
DOCKER_IMAGE="your_docker_image"

# Number of parallel instances of Docker to run
MAX_PARALLEL=15

# Counter for the number of parallel instances
CURRENT_PARALLEL=0

# Function to process files
process_files() {
    for file in "$INPUT_FILES_FOLDER"/*; do
        input_file=`basename $file`
        output_file="PROCESSED_${input_file}"

        input_folder_file="/data/INPUT/${input_file}"
        output_folder_file="/data/OUTPUT/${output_file}"

        echo "Input File: $input_file"
        echo "Output File: $output_file"

        echo "Input Folder + File: $input_folder_file"
        echo "Output Folder + File: $output_folder_file"

        # Check if the current number of parallel instances is less than the maximum allowed
        if [ "$CURRENT_PARALLEL" -lt "$MAX_PARALLEL" ]; then
            # Increment the counter for the number of parallel instances
            ((CURRENT_PARALLEL++))

            # Run Docker container in the background, passing the file as input
            # docker run hello-world
            docker run --rm -v /mnt/data/:/data my-docker-image:v5.1.0 -i $input_folder_file -o $output_folder_file &
            
            # Print a message indicating the file is being processed
            # echo "Processing $file"
        else
            # If the maximum number of parallel instances is reached, wait for one to finish
            wait -n && ((CURRENT_PARALLEL--))
        fi
    done
    
    # Wait for all remaining Docker instances to finish
    wait
}

# Call the function to process files
process_files

linux docker shell
1 Answer

There is a bug in the inner loop. Written out as pseudocode, your script is essentially doing:

For each file:
  If there are fewer containers running than the maximum:
    Increment the counter
    Start a new container for the file
  Otherwise:
    Wait for a container to finish

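Note that in the else case you never docker run a container. You wait for a previous container to exit, but then do nothing with the current file, so that file is skipped.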
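The skipping can be reproduced without Docker. The following is a minimal sketch, assuming bash 4.3 or later (needed for `wait -n`); `sleep` is a stand-in for the real `docker run` command, the numbers 1 to 27 stand in for the input files, and the `started` and `skipped` counters are added purely for illustration.

```shell
#!/usr/bin/env bash
# Simulation of the original loop. Assumptions: `sleep` replaces
# `docker run`, and the numbers 1..27 replace the input files.
MAX_PARALLEL=15
CURRENT_PARALLEL=0
started=0   # illustrative counter: files that were actually launched
skipped=0   # illustrative counter: files that fell into the else branch

for file in {1..27}; do
    if [ "$CURRENT_PARALLEL" -lt "$MAX_PARALLEL" ]; then
        CURRENT_PARALLEL=$((CURRENT_PARALLEL + 1))
        sleep 0.2 &                     # placeholder for the container
        started=$((started + 1))
    else
        # A slot is freed here, but the current file is never launched.
        wait -n && CURRENT_PARALLEL=$((CURRENT_PARALLEL - 1))
        skipped=$((skipped + 1))
    fi
done

# Wait for the remaining background jobs
wait

echo "started=$started skipped=$skipped"
```

With 27 simulated files this prints `started=21 skipped=6`, which matches the behavior described in the question: the first 15 files are launched, then only every other one of the remaining 12.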

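You should be able to restructure this so that, if you are at the limit, you wait for a container to finish, and then start a new container outside the conditional.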

for file in "$INPUT_FILES_FOLDER"/*; do
  # ... set up variables ...
  if [ "$CURRENT_PARALLEL" -ge "$MAX_PARALLEL" ]; then
    wait -n
    CURRENT_PARALLEL=$((CURRENT_PARALLEL - 1))
  fi
  CURRENT_PARALLEL=$((CURRENT_PARALLEL + 1))
  docker run --rm -v /mnt/data/:/data my-docker-image:v5.1.0 -i "$input_folder_file" -o "$output_folder_file" &
done
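For a version that can be run end to end without Docker, here is a rough sketch of the same restructured loop, again assuming bash 4.3 or later. The `process_one` function is a hypothetical stand-in for the `docker run` command, the numbers 1 to 27 stand in for the input files, and the trailing `wait` is carried over from the script in the question so that the last batch finishes before the script exits.

```shell
#!/usr/bin/env bash
MAX_PARALLEL=15
CURRENT_PARALLEL=0
started=0   # illustrative counter, not part of the original script

# Hypothetical stand-in for the real `docker run ...` command.
process_one() {
    sleep 0.1
}

for file in {1..27}; do
    # At the limit: block until any one background job exits.
    if [ "$CURRENT_PARALLEL" -ge "$MAX_PARALLEL" ]; then
        wait -n
        CURRENT_PARALLEL=$((CURRENT_PARALLEL - 1))
    fi

    # Every file reaches this point, so none is skipped.
    CURRENT_PARALLEL=$((CURRENT_PARALLEL + 1))
    process_one "$file" &
    started=$((started + 1))
done

# Wait for the jobs that are still running after the loop ends
wait

echo "started=$started"
```

Because the launch happens outside the conditional, all 27 simulated files are started, while the counter keeps the number of tracked background jobs at or below `MAX_PARALLEL`.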