I did not show these log entries initially, but right before the dind container starts, I see these log entries:
Using Docker executor with image adoptopenjdk:11-jdk ...
WARNING: Container based cache volumes creation is disabled. Will not create volume for "/cache"
WARNING: Container based cache volumes creation is disabled. Will not create volume for "/certs/client"
This led me to try disabling TLS entirely by adding the following to my .gitlab-ci.yml:
default:
  image: adoptopenjdk:11-jdk
  services:
    - docker:dind
variables:
  # Instruct Testcontainers to use the daemon of DinD.
  DOCKER_HOST: tcp://docker:2375
  #Disable TLS communication to docker daemon
  DOCKER_TLS_CERTDIR: ""
  # Improve performance with overlayfs.
  DOCKER_DRIVER: overlay2
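Including DOCKER_TLS_CERTDIR: "" actually makes everything work now!
I don't have an explanation for why, though. If anyone can give me that explanation, I will happily accept their answer; otherwise I'll add an answer myself.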
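My end goal is to run a client library's test suite (using TestContainers) against a live API instance pulled from our docker registry. So we have to do a docker pull within the job to fetch the container. As far as I can tell, pre-fetching with docker pull is the recommended way for TestContainers to use private images.
We have set up autoscaling docker runners on EC2. It works quite well, but I have run into an issue where I need to pull an image from a private ECR registry within a job.
The runner manager config.toml is as follows: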
# limit of the jobs that can be run concurrently across all runners
concurrent = 10
check_interval = 0
[[runners]]
name = "Build-manager(Instance : i-REDACTED)"
url = "https://our.gitlab.server/"
token = "REDACTED"
token_obtained_at = "REDACTED"
token_expires_at = "REDACTED"
executor = "docker+machine"
# maximum number of machines (running and idle) that this runner will spawn
limit = 5
[runners.docker]
image = "adoptopenjdk:11-jdk"
privileged = true
pull_policy = "if-not-present"
tls_verify = false
volumes = [
# "/var/run/docker.sock:/var/run/docker.sock",
"/cache",
"/certs/client"
]
#disable the Docker executor’s inner cache mechanism since we will use the distributed cache mode
disable_cache = true
services_limit = -1
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
BucketName = "our-gitlab-cache-bucket"
BucketLocation = "REDACTED"
[runners.machine]
IdleCount = 1
IdleTime = 1800
MaxBuilds = 100
MachineDriver = "amazonec2"
MachineName = "ci-build-runner-%s"
MachineOptions = [
"amazonec2-region=eu-west-1",
"amazonec2-zone=a",
"amazonec2-ami=REDACTED",
"amazonec2-iam-instance-profile=Build-manager-runner-InstanceProfile",
"amazonec2-vpc-id=REDACTED",
"amazonec2-subnet-id=subnet-REDACTED",
"amazonec2-private-address-only=true",
"amazonec2-tags=CostId,Build-Runners,CostIdDetail,Build-Runners.Generic,InstanceType,Gitlab-runner,Project,Build-Runners,runner-manager-name,gitlab-aws-autoscaler",
"amazonec2-instance-type=m5.xlarge",
"amazonec2-security-group=Build-manager-runners_ACCESS",
"amazonec2-request-spot-instance=true",
"amazonec2-spot-price=0.1",
"amazonec2-volume-type=gp3",
"amazonec2-root-size=100",
"amazonec2-userdata=/etc/gitlab-runner/runner-startup.sh",
"amazonec2-volume-encrypted=true",
]
[[runners.machine.autoscaling]]
Periods = ["* * * * * mon-fri *"]
IdleCount = 0
IdleTime = 1800
Timezone = "UTC"
[[runners.machine.autoscaling]]
Periods = ["* * 8-18 * * mon-fri *"]
IdleCount = 0
IdleTime = 3600
Timezone = "UTC"
[[runners.machine.autoscaling]]
Periods = ["* * * * * sat,sun *"]
IdleCount = 0
IdleTime = 1800
Timezone = "UTC"
Let's look at a simplified version of the .gitlab-ci.yml file:
default:
  image: adoptopenjdk:11-jdk
  services:
    - docker:dind
variables:
  # Instruct Testcontainers to use the daemon of DinD.
  DOCKER_HOST: tcp://docker:2375
  # Improve performance with overlayfs.
  DOCKER_DRIVER: overlay2
  #Keep this up to date
  API_VERSION: v2.1.1
test_only:
  stage: build
  script:
    #install docker,etc
    - ./scripts/bash/install_os_dependencies.sh
    #install aws-cli
    - ./scripts/bash/install_aws.sh
    # - echo "PAUSING"
    # - sleep 600
    #pre-fetch the API docker image, so it's available in gradle build for testcontainers
    - ./scripts/bash/pre_fetch_docker_images.sh
    - ./scripts/bash/test.sh
pre_fetch_docker_images.sh contains the following commands:
#!/usr/bin/env bash
set -e
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin $PRIVATE_ECR_REGISTRY_URL_BASE
#error out if env var API_VERSION is undefined
: ${API_VERSION:?"Need to set API_VERSION"}
echo "**************************************************************************"
echo "* Using version $API_VERSION for tests"
echo "* Please make sure the environment variable API_VERSION is up to date!"
echo "**************************************************************************"
#pre-fetch the image so that TestContainers does not have to
docker pull $PRIVATE_ECR_REGISTRY_URL_BASE/api:$API_VERSION
With config.toml set up like this (note the docker volumes), I can see the following logs:
Starting service docker:dind ...
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:88e4c241e05bc46acc203ff700199934e57307d05a26b5c408e2fba5b99ee178 for docker:dind with digest docker@sha256:7ff986c816ccc8af25c9f560ca0cba45de2ca2ea2d7099c63099f5539e0d0359 ...
Waiting for services to be up and running (timeout 30 seconds)...
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:fd22b579185389e40922764c514a3a996f264479b85877b9392ca2f5039d94fd for adoptopenjdk:11-jdk with digest adoptopenjdk@sha256:0f081fe6de07a0a97d74768f512e2a2f2493cb5f383d7d4fa9f46a6d689b6850 ...
Preparing environment 00:00
Running on runner-zcq4javy-project-494-concurrent-0 via runner-zcq4javy-ci-build-runner-1712671477-f5171009...
We can see that the docker:dind service is started and the CI job proceeds.
When pre_fetch_docker_images.sh executes, I see the following logs:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
**************************************************************************
* Using version v2.1.1 for tests
* Please make sure the environment variable API_VERSION is up to date!
*************************************************************************
Cannot connect to the Docker daemon at tcp://docker:2375. Is the docker daemon running?
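The peculiar thing is that the docker login command succeeds, but the docker pull command fails to connect to the daemon.
If I modify the volumes attached to the docker runner in the config.toml file as follows (mounting the docker.sock special file):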
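To tell "the daemon is not up yet" apart from "the daemon is not listening where DOCKER_HOST points", a small probe before the pull can help. The following is only a sketch: wait_for_daemon is a hypothetical helper that is not part of the scripts above, and it assumes docker info exits non-zero when the daemon is unreachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original scripts).
# Usage: wait_for_daemon ATTEMPTS DELAY_SECONDS
# Returns 0 as soon as `docker info` succeeds against $DOCKER_HOST,
# or 1 once ATTEMPTS tries have failed.
wait_for_daemon() {
  local attempts="${1:-10}"
  local delay="${2:-3}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if docker info >/dev/null 2>&1; then
      echo "Docker daemon reachable at ${DOCKER_HOST:-the default socket} (attempt $i)"
      return 0
    fi
    echo "Docker daemon not reachable yet (attempt $i/$attempts)" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Called as wait_for_daemon 10 3 just before the docker pull line, it would make the job fail with an explicit message if the daemon never answers on the configured address, rather than failing inside the pull itself.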
volumes = [
"/var/run/docker.sock:/var/run/docker.sock",
"/cache",
"/certs/client"
]
then this is what I see when docker:dind starts:
Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:88e4c241e05bc46acc203ff700199934e57307d05a26b5c408e2fba5b99ee178 for docker:dind with digest docker@sha256:7ff986c816ccc8af25c9f560ca0cba45de2ca2ea2d7099c63099f5539e0d0359 ...
Waiting for services to be up and running (timeout 30 seconds)...
*** WARNING: Service runner-zcq4javy-project-494-concurrent-0-72e2651efc1535ef-docker-0 probably didn't start properly.
Health check error:
service "runner-zcq4javy-project-494-concurrent-0-72e2651efc1535ef-docker-0-wait-for-service" timeout
Health check container logs:
2024-04-09T17:15:06.149248253Z waiting for TCP connection to 172.17.0.2 on [2375 2376]...
2024-04-09T17:15:06.149382718Z dialing 172.17.0.2:2376...
2024-04-09T17:15:06.149458061Z dialing 172.17.0.2:2375...
2024-04-09T17:15:07.149771628Z dialing 172.17.0.2:2375...
2024-04-09T17:15:07.149794328Z dialing 172.17.0.2:2376...
2024-04-09T17:15:08.150110255Z dialing 172.17.0.2:2375...
2024-04-09T17:15:08.150153846Z dialing 172.17.0.2:2376...
Service container logs:
2024-04-09T17:15:06.499698081Z Certificate request self-signature ok
2024-04-09T17:15:06.499731896Z subject=CN = docker:dind server
2024-04-09T17:15:06.514639701Z /certs/server/cert.pem: OK
2024-04-09T17:15:07.164341862Z Certificate request self-signature ok
2024-04-09T17:15:07.164359294Z subject=CN = docker:dind client
2024-04-09T17:15:07.179270874Z /certs/client/cert.pem: OK
2024-04-09T17:15:07.181725611Z cat: can't open '/proc/net/ip6_tables_names': No such file or directory
2024-04-09T17:15:07.182213651Z cat: can't open '/proc/net/arp_tables_names': No such file or directory
2024-04-09T17:15:07.184062522Z iptables v1.8.10 (nf_tables)
2024-04-09T17:15:07.253190858Z time="2024-04-09T17:15:07.253040060Z" level=info msg="Starting up"
2024-04-09T17:15:07.253842654Z failed to load listeners: can't create unix socket /var/run/docker.sock: device or resource busy
So the docker:dind service does not start.
When the job runs, I can see the following logs:
$ ./scripts/bash/pre_fetch_docker_images.sh
error during connect: Post "http://docker:2375/v1.24/auth": dial tcp: lookup docker on 172.31.0.2:53: no such host
So when the docker:dind service is not running, even docker login does not succeed.
This seems to be caused by mounting the docker.sock special file on the docker:dind container.
Bonus:
These are the logs of the docker:dind container when it is set up to start properly:
time="2024-04-10T08:13:16.742248889Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
time="2024-04-10T08:13:16.742278990Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2024-04-10T08:13:16.742416138Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2024-04-10T08:13:16.742435311Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2024-04-10T08:13:16.742527880Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2024-04-10T08:13:16.742583398Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2024-04-10T08:13:16.742596458Z" level=info msg="metadata content store policy set" policy=shared
time="2024-04-10T08:13:16.750261307Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2024-04-10T08:13:16.750300776Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2024-04-10T08:13:16.750323461Z" level=info msg="loading plugin \"io.containerd.lease.v1.manager\"..." type=io.containerd.lease.v1
time="2024-04-10T08:13:16.750348102Z" level=info msg="loading plugin \"io.containerd.streaming.v1.manager\"..." type=io.containerd.streaming.v1
time="2024-04-10T08:13:16.750371909Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2024-04-10T08:13:16.750526792Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2024-04-10T08:13:16.750762774Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2024-04-10T08:13:16.750894466Z" level=info msg="loading plugin \"io.containerd.runtime.v2.shim\"..." type=io.containerd.runtime.v2
time="2024-04-10T08:13:16.750910812Z" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1
time="2024-04-10T08:13:16.750923657Z" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1
time="2024-04-10T08:13:16.750942770Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.750961241Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.750981835Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751003981Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751023494Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751043017Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751062786Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751084010Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751118648Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751141720Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751158954Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751179967Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751199232Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751220168Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751243364Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751263313Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751288533Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751310306Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751329237Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751353281Z" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751383850Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751407966Z" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1
time="2024-04-10T08:13:16.751433608Z" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751684149Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751845865Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2024-04-10T08:13:16.752041985Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2024-04-10T08:13:16.752140429Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2024-04-10T08:13:16.752272441Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2024-04-10T08:13:16.752382023Z" level=info msg="skipping tracing processor initialization (no tracing plugin)" error="no OpenTelemetry endpoint: skip plugin"
time="2024-04-10T08:13:16.752580900Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.752603424Z" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1
time="2024-04-10T08:13:16.752615609Z" level=info msg="NRI interface is disabled by configuration."
time="2024-04-10T08:13:16.752925145Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
time="2024-04-10T08:13:16.752996676Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
time="2024-04-10T08:13:16.753060550Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
time="2024-04-10T08:13:16.753132170Z" level=info msg="containerd successfully booted in 0.035738s"
time="2024-04-10T08:13:17.729206072Z" level=info msg="Setting the storage driver from the $DOCKER_DRIVER environment variable (overlay2)"
time="2024-04-10T08:13:17.729228449Z" level=info msg="[graphdriver] trying configured driver: overlay2"
time="2024-04-10T08:13:17.755365665Z" level=info msg="Loading containers: start."
time="2024-04-10T08:13:17.878372710Z" level=info msg="Loading containers: done."
time="2024-04-10T08:13:17.889604357Z" level=info msg="Docker daemon" commit=8b79278 containerd-snapshotter=false storage-driver=overlay2 version=26.0.0
time="2024-04-10T08:13:17.889734278Z" level=info msg="Daemon has completed initialization"
time="2024-04-10T08:13:17.930324430Z" level=info msg="API listen on /var/run/docker.sock"
time="2024-04-10T08:13:17.930343488Z" level=info msg="API listen on [::]:2376"
2024/04/10 08:13:17 http: TLS handshake error from 172.17.0.3:50382: EOF
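Has anyone else achieved my original goal in a different way, or solved this issue?
Docker version 19 seems to have introduced some breaking changes regarding DinD and certificate creation in GitLab CI, which could explain why TLS has to be turned off. See this post: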
https://about.gitlab.com/blog/2019/07/31/docker-in-docker-with-docker-19-dot-03/
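As a hedged illustration of the two modes that post describes: the docker:dind image serves TLS on port 2376 when DOCKER_TLS_CERTDIR is non-empty, and plain TCP on port 2375 when it is empty. The sketch below prints the matching variables for each mode. dind_client_env is a hypothetical name, and the values are taken from the image defaults and GitLab's DinD documentation as I understand them, so treat them as assumptions to verify.

```shell
# Hypothetical helper: prints the variables for one DinD mode.
# Assumes the service is reachable under the hostname "docker".
# Usage: dind_client_env plain|tls
dind_client_env() {
  case "$1" in
    plain)
      # Empty DOCKER_TLS_CERTDIR: dind generates no certificates and
      # listens unencrypted on 2375.
      echo 'DOCKER_TLS_CERTDIR='
      echo 'DOCKER_HOST=tcp://docker:2375'
      ;;
    tls)
      # Non-empty DOCKER_TLS_CERTDIR: dind generates certificates and
      # listens with TLS on 2376; the client needs the client certs.
      echo 'DOCKER_TLS_CERTDIR=/certs'
      echo 'DOCKER_HOST=tcp://docker:2376'
      echo 'DOCKER_TLS_VERIFY=1'
      echo 'DOCKER_CERT_PATH=/certs/client'
      ;;
    *)
      echo "usage: dind_client_env plain|tls" >&2
      return 2
      ;;
  esac
}
```

This would be consistent with the logs above: the daemon reported "API listen on [::]:2376" followed by a TLS handshake error while the job was pointed at tcp://docker:2375. TLS mode also needs /certs/client to be shared between the service and job containers, and the "Will not create volume for "/certs/client"" warning suggests that is not happening with disable_cache = true. I have not confirmed that this is the cause.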