Waiting for bash background jobs in a script to finish

Question

To maximize CPU usage (I run things on Debian Lenny in EC2), I have a simple script that launches jobs in parallel:

#!/bin/bash

for i in apache-200901*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200902*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200903*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200904*.log; do echo "Processing $i ..."; do_something_important; done &
...

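I'm quite satisfied with this working solution; however, I couldn't figure out how to write further code that executes only after all of the loops have completed.

Is there a way to do this?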

Answer 1

There's a bash builtin command for that.

wait [n ...]
      Wait for each specified process and return its termination  sta‐
      tus.   Each  n  may be a process ID or a job specification; if a
      job spec is given, all processes  in  that  job’s  pipeline  are
      waited  for.  If n is not given, all currently active child pro‐
      cesses are waited for, and the return  status  is  zero.   If  n
      specifies  a  non-existent  process or job, the return status is
      127.  Otherwise, the return status is the  exit  status  of  the
      last process or job waited for.
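
Applied to the script in the question, a bare wait placed after the backgrounded loops blocks until every one of them has finished. The following is a minimal sketch, assuming bash; process_month is a hypothetical stand-in for the question's per-month loops and do_something_important, and the temporary results file exists only so the sketch has something to count.

```shell
#!/bin/bash

# Hypothetical stand-in for one of the question's per-month loops.
process_month() {
    sleep 1
    echo "month $1 done"
}

results=$(mktemp)

# Launch the jobs in parallel, as in the question.
for month in 01 02 03 04; do
    process_month "$month" >> "$results" &
done

# With no arguments, wait blocks until all active child processes have exited.
wait

# Code placed here runs only after every background job has finished.
finished=$(wc -l < "$results" | tr -d ' ')
echo "All $finished jobs finished"
rm -f "$results"
```

Because wait without arguments always returns zero, this form does not tell you whether any of the jobs failed.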

Answer 2

Using GNU Parallel will make your script shorter and possibly more efficient:

parallel 'echo "Processing "{}" ..."; do_something_important {}' ::: apache-*.log

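This runs one job per CPU core and keeps doing so until all files have been processed.

Your solution basically splits the jobs into groups before running them. Here are 32 jobs in 4 groups: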

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]

To learn more:

Watch the intro videos for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

Answer 3

I recently had to do this and ended up with the following solution:

while true; do
  wait -n || {
    code="$?"
    ([[ $code = "127" ]] && exit 0 || exit "$code")
    break
  }
done;

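It works as follows:

wait -n returns as soon as one of the (possibly many) background jobs exits. As long as jobs exit successfully it evaluates to true, and the loop continues until one of the following happens:

1. Exit code 127: there are no background jobs left to wait for, so the last one exited successfully. In this case we ignore the exit code and exit the subshell with code 0.
2. Any background job fails. We simply exit the subshell with that job's exit code.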

With set -e, this guarantees that the script terminates early and passes through the exit code of any failed background job.
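
The same logic can also be sketched as a function that returns the status instead of exiting a subshell, which may be easier to reuse. This is only an illustration: wait_all_or_fail is a hypothetical name, the two background jobs are made up, and wait -n assumes bash 4.3 or newer.

```shell
#!/bin/bash

# Hypothetical helper: wait for background jobs one at a time. Returns the
# first non-zero exit status seen, or 0 once no jobs are left (wait -n
# reports 127 in that case).
wait_all_or_fail() {
    local code
    while true; do
        wait -n
        code=$?
        if [ "$code" -eq 127 ]; then
            return 0          # no child left to wait for
        elif [ "$code" -ne 0 ]; then
            return "$code"    # a background job failed
        fi
    done
}

# Made-up jobs: the first succeeds, the second fails with status 3.
(sleep 1; exit 0) &
(sleep 2; exit 3) &

wait_all_or_fail
status=$?
echo "first failure: $status"
```

One caveat of this approach: a job that itself exits with status 127 (for example, because a command was not found) cannot be told apart from "no jobs left".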

Answer 4

A minimal example with wait $(jobs -p):

  for i in {1..3}
  do
    (echo "process $i started" && sleep 5 && echo "process $i finished")&
  done  

  sleep 0.1 # For sequential output
  echo "Waiting for processes to finish" 
  wait $(jobs -p)
  echo "All processes finished"

Example output:

process 1 started
process 2 started
process 3 started
Waiting for processes to finish
process 2 finished
process 1 finished
process 3 finished
All processes finished
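
Note that wait with several arguments returns only the exit status of the last process waited for. If the status of each job matters, one option is to record $! after every launch and wait on the PIDs one at a time. A sketch along those lines, with the job bodies made up for illustration:

```shell
#!/bin/bash

pids=()
for i in 1 2 3; do
    # Made-up job: job 2 fails with status 2, the others succeed.
    (sleep 1; [ "$i" -ne 2 ] || exit 2) &
    pids+=("$!")    # $! is the PID of the most recent background job
done

failed=0
for pid in "${pids[@]}"; do
    # wait PID returns the exit status of that particular process.
    if ! wait "$pid"; then
        echo "job $pid failed"
        failed=$((failed + 1))
    fi
done
echo "$failed job(s) failed"
```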

Answer 5

Here is my crude solution:

function run_task {
        cmd=$1
        output=$2
        concurrency=$3
        if [ -f "${output}.done" ]; then
                # experiment already run
                echo "Command already run: $cmd. Found output $output"
                return
        fi
        # number of background jobs before launching the new one
        count=$(jobs -p | wc -l)
        echo "New active task #$count:  $cmd > $output"
        # $cmd is left unquoted on purpose so it splits into command and arguments
        $cmd > "$output" && touch "$output.done" &
        stop=$(($count >= $concurrency))
        # block while more than $concurrency jobs are active
        while [ $stop -eq 1 ]; do
                echo "Waiting for $count worker threads..."
                sleep 1
                count=$(jobs -p | wc -l)
                stop=$(($count > $concurrency))
        done
}

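The idea is to use "jobs" to see how many children are active in the background, and to wait until this number drops (a child exits). Once a child exits, the next task can be started.

As you can see, there is also a bit of extra logic to avoid running the same experiments/commands multiple times. It does the job for me. However, this logic could be either skipped or further improved (e.g., by checking file creation timestamps, input parameters, etc.).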

Answer 6

If you just want to wait for all jobs and return, use the following one-liner.

while wait -n; do : ; done; # wait until it's possible to wait for bg job

NB. wait -n returns as soon as any one of multiple jobs completes.
