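I have made a service for my programs that calls a monitoring service. On its first run, the monitoring service starts the rest of the programs and then begins monitoring each individual process. One of the processes is a C program, while the rest, including the main monitoring program, are written in Python 3.
P.S. This is my first time writing a service file. The OS is CentOS 7.9.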
[Unit]
Description = My-Health service that checks and runs all other my-services for a machine
Requires = mysqld.service
Requires = redis.service
[Service]
User = root
Group = root
ExecStartPre = /root/bind_int.sh # Bind DPDK Interfaces
ExecStart = /usr/local/bin/python3 /root/My-project/python-scripts/health.py
Restart = always
KillSignal = SIGKILL
[Install]
WantedBy = multi-user.target
The sub-processes are started with subprocess.Popen(shlex.split(cmd)), where cmd takes values such as:
cmd = 'python3.8 /root/My-project/python-scripts/process1.py' # Example Python sub-process
cmd = '/root/My-project/src/build/c-process' # Example C sub-process
The problem is that the C sub-process is becoming a zombie.
[root@MyMachine My-project]# systemctl status my-health
● my-health.service - My-Health service that checks and runs all other my-services for a machine
Loaded: loaded (/usr/lib/systemd/system/my-health.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2023-02-20 19:16:42 PKT; 7s ago
Process: 20969 ExecStartPre=/root/bind_int.sh (code=exited, status=0/SUCCESS)
Main PID: 21024 (health)
Tasks: 3
CGroup: /system.slice/my-health.service
├─18391 process1
├─18393 process2
└─21024 process3
Feb 20 19:16:42 AUQ-02 systemd[1]: Started My-Health service that checks and runs all other my-services for a machine.
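systemctl status shows that the Python sub-processes are running correctly, but it does not list the C sub-process. ps aux shows that the C sub-process was executed but has become <defunct>, which my monitoring script detects on its next iteration and then restarts.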
ps auxc | grep c-process
root 21073 0.0 0.0 0 0 ? Z 19:16 0:00 c-process <defunct>
It keeps restarting it:
ps auxc | grep c-process
root 23873 0.0 0.0 0 0 ? Z 20:35 0:00 c-process <defunct>
Interestingly, if I run health.py manually, the C process runs as smoothly as I would expect.
systemctl stop my-health
cd /root/My-project/python-scripts
python3 health.py &
ps auxc | grep c-process
root 4632 502 0.0 269082728 9492 pts/13 Rl 20:44 1:05 c-process
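Besides subprocess.Popen, I have tried subprocess.run(shlex.split(cmd)), os.startfile(cmd) and os.system(cmd), but those either did not work or gave the same result as Popen.
I also tried appending & to cmd, like:
cmd = '/root/My-project/src/build/c-process &'
This fixed it on my own machine but not on anyone else's.
I also tried the following settings in the service file:
type = forking
type = simple
KillMode = process
However, although these behave exactly as the documentation describes, none of them solved my problem. I am new to turning my programs into Linux services and need help with how to keep the child from being killed. I do not even know whether the problem is in the service file, in the parent Python file, or in the C child.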
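One detail about the & attempt may be worth checking. shlex.split only tokenizes a string and does not interpret shell operators, so with Popen(shlex.split(cmd)) and no shell=True, the & is passed to c-process as an ordinary argument instead of putting it in the background. A minimal sketch that shows the resulting argument list, using the path from the question:

```python
import shlex

cmd = '/root/My-project/src/build/c-process &'

# shlex.split only splits the string into tokens; '&' gets no shell meaning.
args = shlex.split(cmd)
print(args)  # ['/root/My-project/src/build/c-process', '&']
```

Whether c-process tolerates a stray '&' argument depends on how it parses its command line, which is not shown here.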
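As requested, here is more context on how my main health application monitors and forks the other processes. Essentially, I need all of my processes to run indefinitely, for as long as the machine is running.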
import shlex
import subprocess
from time import sleep

import psutil


class ProcessMonitor():
    # This function returns True if the given process processName is running, False otherwise.
    def checkIfProcessRunning(self, processName):
        # Iterate over all the running processes
        for proc in psutil.process_iter():
            try:
                # Check if process name contains the given name string.
                if processName in proc.name():
                    return True
            except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess) as exc:
                self.logger.error(f"{processName}: {exc}")
        return False

    # This function is intended to restart a process/service, by executing the given shell cmd
    def restart(self, name, cmd):
        try:
            subprocess.Popen(shlex.split(cmd))
            # The original used "restarted" % name, which raises TypeError
            self.logger.info(f"restarted {name}")
            return True
        except Exception as exc:
            self.logger.error(str(exc))
            return False

    def monitorProcess(self, processName, restart_cmd):
        running = self.checkIfProcessRunning(processName)
        if not running:
            self.restart(processName, restart_cmd)
        return running


class HealthMonitor():
    def __init__(self, conf_file):
        self.pm = ProcessMonitor()
        self.process = self.parse_start_cmds(conf_file)
        self.wait_time = self.parse_interval(conf_file)

    def getStatus(self, processList):
        statusList = dict()
        for p in processList:
            statusList[p] = self.pm.monitorProcess(p, self.process[p]['Restart-Command'])
        return statusList

    def monitor(self, processList):
        while True:
            statusList = self.getStatus(processList)
            yield statusList
            sleep(self.wait_time)

    def monitor_machine_type_1(self):
        processes = ['process1', 'c-process', 'process2']
        for statusList in self.monitor(processes):
            self.insert_stats_into_db(statusList)


if __name__ == "__main__":
    hm = HealthMonitor(config_file)
    hm.monitor_machine_type_1()
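A <defunct> entry means the child has already exited and its exit status has not yet been collected by the parent. The restart method above discards the Popen object, so the monitor never explicitly calls wait() or poll() on the child and never sees its return code. The following is a minimal sketch, not the code from health.py, of keeping the handles and polling them. The class name ChildTracker and its methods are hypothetical, and the demo child is a stand-in Python one-liner rather than c-process.

```python
import shlex
import subprocess
import sys


class ChildTracker:
    """Hypothetical helper: remembers Popen handles so exits can be reaped."""

    def __init__(self):
        self.children = {}  # name -> subprocess.Popen

    def start(self, name, cmd):
        # Keep the handle instead of discarding it.
        self.children[name] = subprocess.Popen(shlex.split(cmd))

    def check(self, name):
        """Return None while the child is running, else its return code."""
        # poll() collects the exit status if the child has exited,
        # so the child does not linger as a zombie.
        return self.children[name].poll()


if __name__ == "__main__":
    tracker = ChildTracker()
    # Stand-in child that exits immediately with status 3.
    demo_cmd = shlex.quote(sys.executable) + ' -c "import sys; sys.exit(3)"'
    tracker.start("demo", demo_cmd)
    tracker.children["demo"].wait()
    print(tracker.check("demo"))
```

With the handle available, the monitor could log the return code before restarting, which says more than a name match in the process table does.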
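I have tried asking elsewhere, including ChatGPT. It suggested adding more error logging and including the library path in the service's environment file. With that change, the code now works on another development machine, but still not on all of the client machines it needs to run on.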
[Unit]
Description = My-Health service that checks and runs all other my-services for a type-1 machine
Requires = mysqld.service
Requires = redis.service
[Service]
User = root
Group = root
EnvironmentFile = /etc/my-program/c-process.env # Contains LD_LIBRARY_PATH
WorkingDirectory = /root/My-project
ExecStartPre = /root/bind_int.sh
ExecStart = /usr/local/bin/python3 python-scripts/health.py
Restart = always
KillSignal = SIGKILL
KillMode = process
Type = idle
OOMPolicy = kill
StandardOutput = syslog
StandardError = syslog
SyslogIdentifier = My-project
[Install]
WantedBy = multi-user.target
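Since the unit now relies on an environment file, it may help to record what the service environment actually contains and what the child prints before it dies. A hedged sketch follows; the function name launch_with_log and the log path passed to it are hypothetical, and it assumes the child reports problems on stdout or stderr.

```python
import os
import shlex
import subprocess


def launch_with_log(cmd, log_path):
    """Start cmd with stdout/stderr appended to log_path; return the Popen."""
    log = open(log_path, "ab")
    # Record the library path the child will inherit from the service.
    value = os.environ.get("LD_LIBRARY_PATH", "<unset>")
    log.write(f"LD_LIBRARY_PATH={value}\n".encode())
    log.flush()
    proc = subprocess.Popen(
        shlex.split(cmd),
        stdout=log,
        stderr=subprocess.STDOUT,  # merge stderr into the same log file
    )
    log.close()  # the child keeps its own copy of the file descriptor
    return proc
```

Comparing the log from a systemd run with one from a manual run would show whether the two environments differ.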
In addition, I found that when I run it through the service, the c-process started by subprocess.Popen() returns code -11 on wait().
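On POSIX, a negative return code from subprocess means the child was terminated by a signal, and the absolute value is the signal number. On Linux, signal 11 is SIGSEGV, which suggests that c-process is crashing with a segmentation fault when started under the service. A small sketch for turning such codes into readable log messages; the function name describe_returncode is made up for illustration.

```python
import signal


def describe_returncode(returncode):
    """Turn a Popen return code into a short human-readable string."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # Negative value: terminated by signal number -returncode (POSIX).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


print(describe_returncode(-11))  # killed by SIGSEGV
```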