I'm running a Node.js application on an Amazon Linux 2 worker instance that is connected to SQS.
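Everything works fine, except that for technical reasons I need to reboot the server periodically. To do this, I set up a cron job that runs the following at night:
/sbin/shutdown -r now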
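For reference, a nightly reboot like this is commonly installed as a system crontab file under /etc/cron.d. This is a sketch, not the asker's actual setup: the file name nightly-reboot, the 03:00 schedule and the function name are hypothetical, and the target directory is a parameter so the function can be tried outside /etc.

```shell
#!/usr/bin/env bash
# Sketch: write a system crontab entry that reboots the instance every night.
# Hypothetical choices: file name "nightly-reboot", schedule 03:00.
# Entries in /etc/cron.d need a user field ("root") between schedule and command.
install_reboot_cron() {
  local cron_dir="${1:-/etc/cron.d}"   # target directory, defaults to /etc/cron.d
  local cron_file="$cron_dir/nightly-reboot"
  mkdir -p "$cron_dir"
  printf '%s\n' '0 3 * * * root /sbin/shutdown -r now' > "$cron_file"
  chmod 0644 "$cron_file"              # conventional permissions for cron.d files
  echo "$cron_file"                    # report where the entry was written
}
```

Called with no argument it writes /etc/cron.d/nightly-reboot, which requires root.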
When the instance comes back up, I get an error about the SQS daemon service:
[INFO] Executing instruction: configureSqsd
[INFO] get sqsd conf from cfn metadata and write into sqsd conf file ...
[INFO] Executing instruction: startSqsd
[INFO] Running command /bin/sh -c systemctl show -p PartOf sqsd.service
[INFO] Running command /bin/sh -c systemctl is-active sqsd.service
[INFO] Running command /bin/sh -c systemctl start sqsd.service
[ERROR] An error occurred during execution of command [self-startup] - [startSqsd].
Stop running the command. Error: startProcess Failure: starting process "sqsd" failed:
Command /bin/sh -c systemctl start sqsd.service failed with error exit status 1.
Stderr:Job for sqsd.service failed because the control process exited with error code.
See "systemctl status sqsd.service" and "journalctl -xe" for details.
The instance then gets stuck in a loop: initialization runs until it hits the sqsd.service error, and then starts over.
Running systemctl status sqsd.service doesn't seem to show any more information than we already have, only that the process exited with status 1:
● sqsd.service - This is sqsd daemon
Loaded: loaded (/etc/systemd/system/sqsd.service; enabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) (Result: exit-code)
Process: 2748 ExecStopPost=/bin/sh -c (code=exited, status=0/SUCCESS)
Process: 2745 ExecStopPost=/bin/sh -c rm -f /var/pids/sqsd.pid (code=exited, status=0/SUCCESS)
Process: 2753 ExecStart=/bin/sh -c /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd start (code=exited, status=1/FAILURE)
CGroup: /system.slice/sqsd.service
└─2789 /opt/elasticbeanstalk/lib/ruby/bin/ruby /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd start
The most interesting part of journalctl -xe is this:
sqsd[9704]: /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:58:in `initialize': No such file or directory @ rb_sysopen - /var/run/aws-sqsd/default.pid (Errno::ENOENT)
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:58:in `open'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:58:in `start'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:83:in `launch'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:111:in `<top (required)>'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `load'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `<main>'
systemd[1]: sqsd.service: control process exited, code=exited status=1
systemd[1]: Failed to start This is sqsd daemon.
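According to the log, the file /var/run/aws-sqsd/default.pid does not exist after the server reboots. It does exist after a rebuild, and it contains the application's process ID.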
If I create the file manually, the setup process gets further, until it fails on another, similar missing file.
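To confirm this state right after a reboot, one can check whether the runtime directory and PID file that aws-sqsd opens are present. The function below is a hypothetical helper, not part of Elastic Beanstalk; the optional root prefix is an assumption added so it can be run against a scratch directory instead of the live filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the paths aws-sqsd expects at startup exist.
# Returns 1 if any path is missing, 0 if all are present.
check_sqsd_runtime_paths() {
  local root="${1:-}"   # optional prefix; empty means the real filesystem
  local missing=0
  local path
  for path in /var/run/aws-sqsd /var/run/aws-sqsd/default.pid; do
    if [ -e "$root$path" ]; then
      echo "present: $path"
    else
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Run without arguments after a reboot, a "missing" line for both paths would match the Errno::ENOENT in the journal above.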
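Has anyone run into this? I don't understand why starting sqsd.service fails after a normal reboot but works fine after the initial deployment and after rebuilding the environment... It almost looks like it's looking for a configuration file that doesn't exist...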
Is there any other way to safely reboot the instance that I should try?
I have the same problem. I'm not posting a solution, just more data about the issue. I found errors in /var/log/messages indicating that the SQSd daemon ran out of memory.
Apr 28 15:43:05 ip-172-31-121-3 sqsd: /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:42:in `fork': Cannot allocate memory - fork(2) (Errno::ENOMEM)
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:42:in `start'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:83:in `launch'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:111:in `<top (required)>'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `load'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `<main>'
Apr 28 15:43:05 ip-172-31-121-3 systemd: sqsd.service: control process exited, code=exited status=1
Apr 28 15:43:05 ip-172-31-121-3 systemd: Failed to start This is sqsd daemon.
Apr 28 15:43:05 ip-172-31-121-3 systemd: Unit sqsd.service entered failed state.
Apr 28 15:43:05 ip-172-31-121-3 systemd: sqsd.service failed.
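After switching to a larger instance class everything went smoothly, but I'm not sure whether that was just because the instance was fresh (as david.emilsson mentioned) or because of the extra memory.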
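To tell whether an instance is hitting the same out-of-memory condition, one option is to count the "Cannot allocate memory" errors from sqsd in the system log and look at the kernel's MemAvailable figure. A sketch, assuming the log location from this answer (/var/log/messages) and the standard /proc/meminfo layout; the function name is hypothetical and both paths are parameters so it can be pointed at copies.

```shell
#!/usr/bin/env bash
# Sketch: count sqsd "Cannot allocate memory" errors and print available memory.
# Defaults: /var/log/messages (where the error above was found) and /proc/meminfo.
sqsd_memory_report() {
  local log_file="${1:-/var/log/messages}"
  local meminfo="${2:-/proc/meminfo}"
  local hits avail_kb
  # grep -c prints 0 and exits 1 on no match; an unreadable log yields no output.
  hits=$(grep -c 'sqsd.*Cannot allocate memory' "$log_file" 2>/dev/null) || true
  # MemAvailable is reported by the kernel in kB.
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
  echo "enomem_errors=${hits:-0}"
  echo "mem_available_mb=$(( ${avail_kb:-0} / 1024 ))"
}
```

Matching on the error text rather than the full trace keeps the check independent of the aws-sqsd version (3.0.3 and 3.0.4 both appear in this thread). On a live instance it likely needs root, since /var/log/messages is usually not world-readable.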
I found a solution for Amazon Linux 2: add the following to the file /etc/tmpfiles.d/aws-sqsd.conf:
d /var/run/aws-sqsd 0755 root root
f /var/run/aws-sqsd/default.pid 0644 sqsd sqsd -
This creates the directory /var/run/aws-sqsd and the file /var/run/aws-sqsd/default.pid.
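On systemd-based systems such as Amazon Linux 2, /var/run is a symlink to /run, a tmpfs that is emptied on every boot, which would explain why the directory exists after a fresh deployment but is gone after a plain reboot. systemd-tmpfiles reads /etc/tmpfiles.d/*.conf early in boot and recreates the listed entries. The sketch below writes the two lines from this answer; the function name and the optional target directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: install the tmpfiles.d rule from this answer.
# Type "d" creates the directory; type "f" creates the file if it does not exist.
write_sqsd_tmpfiles_conf() {
  local conf_dir="${1:-/etc/tmpfiles.d}"   # defaults to the system tmpfiles.d directory
  local conf="$conf_dir/aws-sqsd.conf"
  mkdir -p "$conf_dir"
  cat > "$conf" <<'EOF'
d /var/run/aws-sqsd 0755 root root
f /var/run/aws-sqsd/default.pid 0644 sqsd sqsd -
EOF
  echo "$conf"
}
```

After running it as root with no argument, the rule can be applied without waiting for a reboot with systemd-tmpfiles --create /etc/tmpfiles.d/aws-sqsd.conf.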
I added this configuration from the application side using an .ebextensions config file.
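The answer does not show the extension file itself. One way to ship the same rule with the application is the files key of an .ebextensions config, which writes a file onto each instance during deployment. The sketch below generates such a config in the source bundle; the file name aws-sqsd-tmpfiles.config and the function name are hypothetical, and the content is an assumption rather than the author's actual file.

```shell
#!/usr/bin/env bash
# Sketch: generate an .ebextensions config that installs /etc/tmpfiles.d/aws-sqsd.conf
# on every instance. The config file name is hypothetical.
write_sqsd_ebextension() {
  local app_dir="${1:-.}"   # root of the application source bundle
  local ext_dir="$app_dir/.ebextensions"
  local cfg="$ext_dir/aws-sqsd-tmpfiles.config"
  mkdir -p "$ext_dir"
  # "files" writes the file at deploy time; mode/owner/group apply to that file.
  cat > "$cfg" <<'EOF'
files:
  "/etc/tmpfiles.d/aws-sqsd.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      d /var/run/aws-sqsd 0755 root root
      f /var/run/aws-sqsd/default.pid 0644 sqsd sqsd -
EOF
  echo "$cfg"
}
```

The generated file is committed with the application and deployed as usual; the rule itself only matters at the next boot, when systemd-tmpfiles recreates the directory and PID file before sqsd starts.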
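I ran into the same problem on Elastic Beanstalk, caused by a permissions issue (which was not obvious because of a bug in the old systemd version that Amazon Linux uses). It's not an ideal solution, but since I only have one instance, rebuilding the environment fixed it for me. Otherwise, follow Jeka's solution.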