Child process of a systemd service is becoming a zombie (C child process of a Python 3 parent)


Environment

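I have made a service that launches the monitoring program for my application. On its first run, the monitoring program starts the rest of the application and then begins monitoring all the individual processes. One of those processes is a C program; the rest, including the main monitoring program, are written in Python 3.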

P.S. This is my first time writing a service file. The OS is CentOS 7.9.

Service file

[Unit]
Description = My-Health service that checks and runs all other my-services for a machine
Requires = mysqld.service
Requires = redis.service

[Service]
User = root
Group = root
# Bind DPDK Interfaces
ExecStartPre = /root/bind_int.sh
ExecStart = /usr/local/bin/python3 /root/My-project/python-scripts/health.py
Restart = always
KillSignal = SIGKILL

[Install]
WantedBy = multi-user.target

Forking mechanism

subprocess.Popen (shlex.split(cmd))

where cmd takes values such as:

cmd = 'python3.8 /root/My-project/python-scripts/process1.py'    # Example Python sub-process
cmd = '/root/My-project/src/build/c-process'    # Example C sub-process

Problem

The problem is that the C child process is becoming a zombie:

[root@MyMachine My-project]# systemctl status my-health
● my-health.service - My-Health service that checks and runs all other my-services for a machine
   Loaded: loaded (/usr/lib/systemd/system/my-health.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-02-20 19:16:42 PKT; 7s ago
  Process: 20969 ExecStartPre=/root/bind_int.sh (code=exited, status=0/SUCCESS)
 Main PID: 21024 (health)
    Tasks: 3
   CGroup: /system.slice/my-health.service
           ├─18391 process1
           ├─18393 process2
           └─21024 process3

Feb 20 19:16:42 AUQ-02 systemd[1]: Started My-Health service that checks and runs all other my-services for a machine.

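systemctl status shows that the Python child processes are running correctly, but it does not list the C child process. ps aux shows that the C child process was started but has become <defunct>. My monitoring script detects this on its next iteration and restarts the process.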

ps auxc | grep c-process

root     21073  0.0  0.0      0     0 ?        Z    19:16   0:00 c-process <defunct>

It keeps restarting it, and each new instance ends up defunct as well:

ps auxc | grep c-process

root     23873  0.0  0.0      0     0 ?        Z    20:35   0:00 c-process <defunct>

Interestingly, if I run health.py manually, the C process runs smoothly, just as I expect:

systemctl stop my-health
cd /root/My-project/python-scripts
python3 health.py &
ps auxc | grep c-process

root      4632  502  0.0 269082728 9492 pts/13 Rl   20:44   1:05 c-process

What I have tried

  1. Instead of subprocess.Popen, I tried:
    subprocess.run (shlex.split(cmd))
    os.startfile(cmd)
    os.system(cmd)
    These either did not work or gave the same result as Popen.
  2. I tried appending an ampersand to the end of the cmd for c-process, like this:
    cmd = '/root/My-project/src/build/c-process &'
    
    This fixed it on my own machine, but not on other people's machines.
  3. I tried other options in the service file, such as:
    Type = forking
    Type = simple
    KillMode = process
    
    Although these behave exactly as the documentation says, none of them solved my problem.
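
A note on the ampersand attempt in item 2: subprocess.Popen(shlex.split(cmd)) does not go through a shell, so a trailing & is not interpreted as "run in the background". shlex.split returns it as one more token, and that token is handed to the program as an ordinary argument. A minimal sketch of what the split produces, using the path from this post (nothing is executed):

```python
import shlex

cmd = '/root/My-project/src/build/c-process &'
argv = shlex.split(cmd)

# The ampersand survives as a plain argument; without shell=True nothing
# interprets it as "run in the background".
print(argv)  # ['/root/My-project/src/build/c-process', '&']
```

Whether c-process ignores or rejects an unexpected "&" argument depends on its own argument handling, which is not shown here.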

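I am new to turning my programs into Linux services and need help with how to keep the child from being killed. I don't even know whether the problem is in the service file, in the parent Python file, or in the C child.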

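Edit

More context on the parent and child processes

As requested, here is more context on how my main health application monitors and forks the other processes. Essentially, I need all of my processes to run indefinitely, for as long as the machine is up.

Parent health.py (simplified)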

import logging
import shlex
import subprocess
from time import sleep

import psutil

class ProcessMonitor():
    def __init__(self):
        # Logger setup is not shown in the simplified listing; a module-level logger is assumed here.
        self.logger = logging.getLogger(__name__)

    # Return True if a process whose name contains processName is running, False otherwise.
    def checkIfProcessRunning (self, processName):
        # Iterate over all running processes
        for proc in psutil.process_iter():
            try:
                # Check if the process name contains the given name string.
                if processName in proc.name():
                    return True
            except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess) as exc:
                self.logger.error(f"{processName}: {exc}")
        return False

    # Restart a process/service by executing the given shell command
    def restart (self, name, cmd):
        try:
            subprocess.Popen (shlex.split(cmd))
            self.logger.info("restarted %s" % name)
            return True
            return True

        except Exception as exc:
            self.logger.error(str(exc))
            return False

    def monitorProcess (self, processName, restart_cmd):
        running = self.checkIfProcessRunning(processName)
        if not running:
            self.restart (processName, restart_cmd)

        return running

class HealthMonitor():
    def __init__(self, conf_file):
        self.pm = ProcessMonitor()
        self.process = self.parse_start_cmds(conf_file)
        self.wait_time = self.parse_interval(conf_file)

    def getStatus(self, processList):
        statusList = dict()
        for p in processList:
            statusList[p] = self.pm.monitorProcess(p, self.process[p]['Restart-Command'])
        return statusList

    def monitor(self, processList):
        while True:
            statusList = self.getStatus(processList)
            yield statusList
            sleep(self.wait_time)

    def monitor_machine_type_1(self):
        processes = ['process1', 'c-process', 'process2']

        for statusList in self.monitor(processes):
            self.insert_stats_into_db(statusList)

if __name__ == "__main__":
    hm = HealthMonitor(config_file)
    hm.monitor_machine_type_1()
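
For comparison, here is a minimal sketch of a monitor that keeps the Popen handle of each child instead of searching for it by process name. Calling poll() on the handle reaps a child that has exited, so it does not linger as <defunct>, and it yields the exit status, which can be logged before restarting. The class name ChildSupervisor and its methods are hypothetical and are not part of health.py.

```python
import logging
import shlex
import subprocess

logger = logging.getLogger("health")


class ChildSupervisor:
    """Hypothetical helper: tracks children by Popen handle, not by name."""

    def __init__(self):
        self.children = {}  # name -> subprocess.Popen

    def start(self, name, cmd):
        """Launch cmd and remember its handle under the given name."""
        self.children[name] = subprocess.Popen(shlex.split(cmd))
        logger.info("started %s (pid %d)", name, self.children[name].pid)

    def check(self, name, cmd):
        """Return True if the child is alive; otherwise log its status and restart it."""
        proc = self.children.get(name)
        if proc is not None:
            returncode = proc.poll()  # None while running; also reaps an exited child
            if returncode is None:
                return True
            logger.error("%s exited with return code %d", name, returncode)
        self.start(name, cmd)
        return False
```

Under these assumptions, the monitoring loop would call check(name, restart_cmd) for each process on every iteration, and the logged return code would show why a child stopped.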

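Edit #2

I have tried asking elsewhere, including ChatGPT. It suggested adding more error logging and including the library path in the service's environment file. With those changes, the code now runs on another development machine, but still not on all of the client machines where it needs to run.

My improved health service file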
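
The extra error logging could look roughly like the sketch below: the child's stdout and stderr are appended to a log file, and LD_LIBRARY_PATH is passed explicitly so the child does not depend on whatever environment the service happens to inherit. The function name, the log path, and the library directory are placeholders, not values from this post.

```python
import os
import shlex
import subprocess


def launch_with_logging(cmd, log_path, lib_dir=None):
    """Start cmd with stdout/stderr appended to log_path (hypothetical helper)."""
    env = os.environ.copy()
    if lib_dir is not None:
        # Prepend the assumed library directory to any existing search path.
        existing = env.get("LD_LIBRARY_PATH")
        env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    with open(log_path, "ab") as log_file:
        # The child gets its own copy of the descriptor, so the parent's
        # handle can be closed as soon as Popen returns.
        return subprocess.Popen(
            shlex.split(cmd),
            stdout=log_file,
            stderr=subprocess.STDOUT,
            env=env,
        )
```

Any loader error the C program prints on startup, such as a missing shared library, would then end up in the log file instead of being lost.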

[Unit]
Description = My-Health service that checks and runs all other my-services for a type-1 machine
Requires = mysqld.service
Requires = redis.service

[Service]
User = root
Group = root
# Contains LD_LIBRARY_PATH
EnvironmentFile = /etc/my-program/c-process.env
WorkingDirectory = /root/My-project
ExecStartPre = /root/bind_int.sh
ExecStart = /usr/local/bin/python3 python-scripts/health.py
Restart = always
KillSignal = SIGKILL
KillMode = process
Type = idle
OOMPolicy = kill
StandardOutput = syslog
StandardError = syslog
SyslogIdentifier = My-project

[Install]
WantedBy = multi-user.target

In addition, I found that when I run it through the service, calling wait() on the subprocess.Popen() object for c-process returns the code -11.
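
For reference, a negative return code from Popen.wait() means the child was terminated by the signal with that number, so -11 corresponds to signal 11, which is SIGSEGV (segmentation fault) on Linux. A small sketch that turns a return code into a readable message; the helper name describe_returncode is hypothetical:

```python
import signal


def describe_returncode(returncode):
    """Translate a Popen return code into a human-readable message."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # subprocess reports death by signal N as the return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


print(describe_returncode(-11))  # killed by signal 11 (SIGSEGV)
```

A child that dies this way remains <defunct> until its parent collects the exit status with wait() or poll().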

c centos7 systemd python-3.8