I set up a new Docker HEALTHCHECK instruction for my Node/Express container, as shown below:
HEALTHCHECK --interval=5m --timeout=5s \
CMD $ROOT_APPLICATION/healthcheck.sh localhost $PORT || exit 1
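For context, Docker documents exit status 0 as healthy, 1 as unhealthy, and 2 as reserved, which is why the instruction ends with `|| exit 1`. A minimal sketch of that pattern in plain shell, where `to_health_status` is a hypothetical helper name used only for illustration:

```shell
# Hypothetical wrapper (not part of the original setup): run a command and
# collapse any failure to exit status 1, mirroring `CMD ... || exit 1`.
to_health_status() {
    "$@" || return 1
    return 0
}

# A command that fails with status 7 is still reported as plain "unhealthy".
if to_health_status sh -c 'exit 7'; then
    echo "healthy"
else
    echo "unhealthy"
fi
```

This only shows how the exit status is normalized; it says nothing about whether Docker can start the check process in the first place.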
The healthcheck.sh script does the following:
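function healthcheck () {
    # Fall back to port 3000 when no port argument is given.
    pt=$2
    port=${pt:-"3000"}
    # Use $port (not $2) so the default above actually takes effect.
    RES=$(curl -s "http://$1:$port/status" | jq '.status.online')
    if [ "$RES" == "true" ]; then
        return 0
    else
        return 1
    fi
}
healthcheck "$@"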
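As a side note, the string comparison against "true" can be avoided with jq's `-e` (`--exit-status`) flag, which makes jq exit non-zero when the last output is `false` or `null`. A minimal sketch, assuming jq is installed in the image; `is_online` is a hypothetical helper name, not part of the original script:

```shell
# Hypothetical helper: succeeds only when the JSON on stdin
# has .status.online equal to true.
is_online() {
    jq -e '.status.online == true' >/dev/null 2>&1
}

# Possible use inside healthcheck.sh (host and port as in the script above;
# -f makes curl fail on HTTP error responses):
# curl -fs "http://$1:${2:-3000}/status" | is_online
```

With this variant a missing or malformed response also counts as a failure, because jq produces no truthy result.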
The Node API's /status endpoint returns the following JSON:
{
"status":
{
"online": true
}
}
I am sure the health check itself is correct, because I can run it inside the container:
# $ROOT_APPLICATION/healthcheck.sh localhost $PORT
# status=$?
# [ $status -eq 0 ] && echo "success" || echo "failed"
success
I can also call the health check endpoint manually:
# curl -s http://localhost:3000/ws/1.0/status| jq '.status.online'
true
However, when I check the container status, it is reported as unhealthy:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d8539fa50a40 my_container "node --max_old_sp..." 35 minutes ago Up 35 minutes (unhealthy) 0.0.0.0:3000-3001->3000-3001/tcp my_image
[UPDATE]
Here is the output of docker container inspect:
"FinishedAt": "0001-01-01T00:00:00Z",
"Health": {
"Status": "unhealthy",
"FailingStreak": 9,
"Log": [
{
"Start": "2020-03-25T10:11:17.687207876Z",
"End": "2020-03-25T10:11:17.829283718Z",
"ExitCode": -1,
"Output": "rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"invalid environment 'NODE_ID'\"\n"
To reproduce the problem, define NODE_ID in the environment section of the docker-compose file without assigning it a value.
Minimal YAML file:
version: '2'
services:
  my_service:
    environment:
      - NODE_ID
      - NODE_ENV=production
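Inspecting the container shows the result of the health check (and the full inspect output also reveals the problem with the NODE_ID variable):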
starting container process caused \"invalid environment 'NODE_ID'\"\n
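The message points at an environment entry that has a name but no value. A rough way to spot such entries is to list the container's environment and keep only the lines without an `=`. The sketch below assumes the unassigned variable shows up in `.Config.Env` as a bare name; `find_unassigned_env` is a hypothetical helper, and `my_image` is the container name from the `docker ps` output above:

```shell
# Hypothetical helper: read environment entries (one per line) from stdin
# and print those that have no "=", i.e. a name without a value.
find_unassigned_env() {
    while IFS= read -r entry; do
        case "$entry" in
            "")  ;;                       # skip blank lines
            *=*) ;;                       # NAME=value is fine
            *)   printf '%s\n' "$entry" ;;
        esac
    done
}

# Possible use against a running container (assumption: name is my_image):
# docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' my_image \
#     | find_unassigned_env
```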
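The unassigned NODE_ID in the compose file you provided assumes that this variable is already defined in the environment where you run the docker-compose up or docker stack deploy command. You can adjust the file to give it a default value: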
version: '2'
services:
  my_service:
    environment:
      - NODE_ID=${NODE_ID:-undefined}
      - NODE_ENV=production
For more details on this syntax, see https://docs.docker.com/compose/compose-file/#variable-substitution
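The `${NODE_ID:-undefined}` form follows the same rules as shell parameter expansion: the default is used when the variable is unset or empty. A small shell sketch of that behaviour, where `resolve_node_id` is a hypothetical function name used only for illustration:

```shell
# Hypothetical helper: print NODE_ID, falling back to "undefined"
# when the variable is unset or set to the empty string.
resolve_node_id() {
    printf '%s\n' "${NODE_ID:-undefined}"
}

unset NODE_ID
resolve_node_id      # prints: undefined

NODE_ID="node-7"
resolve_node_id      # prints: node-7
```

If an empty value should be passed through as-is, the variant without the colon, `${NODE_ID-undefined}`, applies the default only when the variable is unset.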