Unable to set up docker:dind with the autoscaling docker+machine executor

Question · Votes: 0 · Answers: 1

Update (full initial description below)

Update 1: DOCKER_TLS_CERTDIR

I didn't show these log entries at first, but before the dind container starts, I see these log entries:

Using Docker executor with image adoptopenjdk:11-jdk ...
WARNING: Container based cache volumes creation is disabled. Will not create volume for "/cache"
WARNING: Container based cache volumes creation is disabled. Will not create volume for "/certs/client"

This led me to try disabling TLS entirely by adding the following to my .gitlab-ci.yml:

default:
  image: adoptopenjdk:11-jdk
  services:
    - docker:dind

variables:
  # Instruct Testcontainers to use the daemon of DinD.
  DOCKER_HOST: tcp://docker:2375
  #Disable TLS communication to docker daemon
  DOCKER_TLS_CERTDIR: ""
  # Improve performance with overlayfs.
  DOCKER_DRIVER: overlay2

Including DOCKER_TLS_CERTDIR: "" now actually makes everything work!

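I don't have an explanation for why, though. If someone can give me one, I'll gladly mark it as the accepted answer; otherwise I'll add an answer myself.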

Initial description

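My ultimate goal is to run a client library's test suite (using TestContainers) against a live API instance pulled from our docker registry, so the job has to run docker pull to fetch the image. As far as I know, pre-fetching with docker pull is the recommended way to make private images available to TestContainers.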

We have set up autoscaling docker runners on EC2. They work well, but I've hit a problem: I need to pull an image from a private ECR registry inside a job.

The runner manager's config.toml is as follows:

# limit of the jobs that can be run concurrently across all runners
concurrent = 10
check_interval = 0

[[runners]]
  name = "Build-manager(Instance : i-REDACTED)"
  url = "https://our.gitlab.server/"
  token = "REDACTED"
  token_obtained_at = "REDACTED"
  token_expires_at = "REDACTED"
  executor = "docker+machine"
  # maximum number of machines (running and idle) that this runner will spawn
  limit = 5
  [runners.docker]
    image = "adoptopenjdk:11-jdk"
    privileged = true
    pull_policy = "if-not-present"
    tls_verify = false
    
    volumes = [
#              "/var/run/docker.sock:/var/run/docker.sock",
               "/cache",
               "/certs/client"
            ]
    
    #disable the Docker executor’s inner cache mechanism since we will use the distributed cache mode
    disable_cache = true
    services_limit = -1
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      BucketName = "our-gitlab-cache-bucket"
      BucketLocation = "REDACTED"
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MaxBuilds = 100
    MachineDriver = "amazonec2"
    MachineName = "ci-build-runner-%s"
    MachineOptions = [
      "amazonec2-region=eu-west-1",
      "amazonec2-zone=a",
      "amazonec2-ami=REDACTED",
      "amazonec2-iam-instance-profile=Build-manager-runner-InstanceProfile",
      "amazonec2-vpc-id=REDACTED",
      "amazonec2-subnet-id=subnet-REDACTED",
      "amazonec2-private-address-only=true",
      "amazonec2-tags=CostId,Build-Runners,CostIdDetail,Build-Runners.Generic,InstanceType,Gitlab-runner,Project,Build-Runners,runner-manager-name,gitlab-aws-autoscaler",
      "amazonec2-instance-type=m5.xlarge",
      "amazonec2-security-group=Build-manager-runners_ACCESS",
      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=0.1",
      "amazonec2-volume-type=gp3",
      "amazonec2-root-size=100",
      "amazonec2-userdata=/etc/gitlab-runner/runner-startup.sh",
      "amazonec2-volume-encrypted=true",
    ]
    [[runners.machine.autoscaling]]
      Periods = ["* * * * * mon-fri *"]
      IdleCount = 0
      IdleTime = 1800
      Timezone = "UTC"
    [[runners.machine.autoscaling]]
      Periods = ["* * 8-18 * * mon-fri *"]
      IdleCount = 0
      IdleTime = 3600
      Timezone = "UTC"
    [[runners.machine.autoscaling]]
      Periods = ["* * * * * sat,sun *"]
      IdleCount = 0
      IdleTime = 1800
      Timezone = "UTC"

Here is a simplified version of the .gitlab-ci.yml file:

default:
  image: adoptopenjdk:11-jdk
  services:
    - docker:dind

variables:
  # Instruct Testcontainers to use the daemon of DinD.
  DOCKER_HOST: tcp://docker:2375
  # Improve performance with overlayfs.
  DOCKER_DRIVER: overlay2
  #Keep this up to date
  API_VERSION: v2.1.1

test_only:
  stage: build
  script:
    #install docker,etc
    - ./scripts/bash/install_os_dependencies.sh 
    #install aws-cli
    - ./scripts/bash/install_aws.sh 
#    - echo "PAUSING"
#    - sleep 600
    #pre-fetch the API docker image, so it's available in gradle build for testcontainers
    - ./scripts/bash/pre_fetch_docker_images.sh
    - ./scripts/bash/test.sh

pre_fetch_docker_images.sh contains the following commands:

#!/usr/bin/env bash
# pipefail: fail the job if `aws ecr get-login-password` fails, not only docker login
set -eo pipefail

aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin "$PRIVATE_ECR_REGISTRY_URL_BASE"

#error out if env var API_VERSION is undefined
: "${API_VERSION:?Need to set API_VERSION}"

echo "**************************************************************************"
echo "* Using version $API_VERSION for tests"
echo "* Please make sure the environment variable API_VERSION is up to date!"
echo "**************************************************************************"
#pre-fetch the image so that TestContainers does not have to
docker pull "$PRIVATE_ECR_REGISTRY_URL_BASE/api:$API_VERSION"
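
If the daemon is merely slow to come up, one option is to wait until it answers before running docker pull. The sketch below is a hypothetical helper (retry_until is not from the original scripts) that retries any command a fixed number of times. It will not fix a client/daemon mismatch in port or TLS settings, which is what Update 1 above points to.

```shell
#!/usr/bin/env bash
# retry_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds or ATTEMPTS runs out.
retry_until() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    echo "Attempt $i/$attempts failed: $*" >&2
    # Do not sleep after the final attempt
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Assumed usage in pre_fetch_docker_images.sh (not part of the original script):
#   retry_until 15 2 docker info > /dev/null
#   docker pull "$PRIVATE_ECR_REGISTRY_URL_BASE/api:$API_VERSION"
```

Because CMD runs in the current shell, shell functions work as the command too, which makes the helper easy to reuse for other readiness checks.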

With config.toml set up like this (pay attention to the docker volumes), I see the following logs:

Starting service docker:dind ...
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:88e4c241e05bc46acc203ff700199934e57307d05a26b5c408e2fba5b99ee178 for docker:dind with digest docker@sha256:7ff986c816ccc8af25c9f560ca0cba45de2ca2ea2d7099c63099f5539e0d0359 ...
Waiting for services to be up and running (timeout 30 seconds)...
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:fd22b579185389e40922764c514a3a996f264479b85877b9392ca2f5039d94fd for adoptopenjdk:11-jdk with digest adoptopenjdk@sha256:0f081fe6de07a0a97d74768f512e2a2f2493cb5f383d7d4fa9f46a6d689b6850 ...
Preparing environment 00:00
Running on runner-zcq4javy-project-494-concurrent-0 via runner-zcq4javy-ci-build-runner-1712671477-f5171009...

As you can see, the docker:dind service starts and the CI job proceeds.

When pre_fetch_docker_images.sh runs, I see the following logs:

WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
**************************************************************************
* Using version v2.1.1 for tests
* Please make sure the environment variable API_VERSION is up to date!
*************************************************************************
Cannot connect to the Docker daemon at tcp://docker:2375. Is the docker daemon running?

The odd thing is that the docker login command succeeds, but the docker pull command fails to connect to the daemon.
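
To see which endpoint the dind daemon actually exposes, the job container can probe both ports, much like the runner's health check does (it dials 2375 and 2376 in the logs further down). A minimal sketch, assuming bash and coreutils timeout are available in the job image; it relies on bash's /dev/tcp redirection, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeeds if a TCP connection opens within 2 seconds.
# Uses bash's /dev/tcp pseudo-device (bash-specific, not POSIX sh).
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# probe_dind [HOST]: report which dind ports answer (HOST defaults to the
# "docker" service alias used in DOCKER_HOST).
probe_dind() {
  local host=${1:-docker} port
  for port in 2375 2376; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

If probe_dind prints "docker:2376 open" and "docker:2375 closed", the daemon is listening with TLS while DOCKER_HOST points at the plain-text port, which would match the "API listen on [::]:2376" line in the dind logs below.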

If I modify the volumes attached to the docker runner in the config.toml file as follows (mounting the docker.sock special file):

    volumes = [
              "/var/run/docker.sock:/var/run/docker.sock",
               "/cache",
               "/certs/client"
            ]

then I see this when docker:dind starts:

Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:88e4c241e05bc46acc203ff700199934e57307d05a26b5c408e2fba5b99ee178 for docker:dind with digest docker@sha256:7ff986c816ccc8af25c9f560ca0cba45de2ca2ea2d7099c63099f5539e0d0359 ...
Waiting for services to be up and running (timeout 30 seconds)...
*** WARNING: Service runner-zcq4javy-project-494-concurrent-0-72e2651efc1535ef-docker-0 probably didn't start properly.
Health check error:
service "runner-zcq4javy-project-494-concurrent-0-72e2651efc1535ef-docker-0-wait-for-service" timeout
Health check container logs:
2024-04-09T17:15:06.149248253Z waiting for TCP connection to 172.17.0.2 on [2375 2376]...
2024-04-09T17:15:06.149382718Z dialing 172.17.0.2:2376...
2024-04-09T17:15:06.149458061Z dialing 172.17.0.2:2375...
2024-04-09T17:15:07.149771628Z dialing 172.17.0.2:2375...
2024-04-09T17:15:07.149794328Z dialing 172.17.0.2:2376...
2024-04-09T17:15:08.150110255Z dialing 172.17.0.2:2375...
2024-04-09T17:15:08.150153846Z dialing 172.17.0.2:2376...
Service container logs:
2024-04-09T17:15:06.499698081Z Certificate request self-signature ok
2024-04-09T17:15:06.499731896Z subject=CN = docker:dind server
2024-04-09T17:15:06.514639701Z /certs/server/cert.pem: OK
2024-04-09T17:15:07.164341862Z Certificate request self-signature ok
2024-04-09T17:15:07.164359294Z subject=CN = docker:dind client
2024-04-09T17:15:07.179270874Z /certs/client/cert.pem: OK
2024-04-09T17:15:07.181725611Z cat: can't open '/proc/net/ip6_tables_names': No such file or directory
2024-04-09T17:15:07.182213651Z cat: can't open '/proc/net/arp_tables_names': No such file or directory
2024-04-09T17:15:07.184062522Z iptables v1.8.10 (nf_tables)
2024-04-09T17:15:07.253190858Z time="2024-04-09T17:15:07.253040060Z" level=info msg="Starting up"
2024-04-09T17:15:07.253842654Z failed to load listeners: can't create unix socket /var/run/docker.sock: device or resource busy

So the docker:dind service does not start.

When the job runs, I see the following logs:

$ ./scripts/bash/pre_fetch_docker_images.sh
error during connect: Post "http://docker:2375/v1.24/auth": dial tcp: lookup docker on 172.31.0.2:53: no such host

So docker login does not even succeed when the docker:dind service isn't running. This appears to be caused by mounting the docker.sock special file into the docker:dind container.

Bonus:

These are the logs of the docker:dind container when it is set up to start correctly:

time="2024-04-10T08:13:16.742248889Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
time="2024-04-10T08:13:16.742278990Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2024-04-10T08:13:16.742416138Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2024-04-10T08:13:16.742435311Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2024-04-10T08:13:16.742527880Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2024-04-10T08:13:16.742583398Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2024-04-10T08:13:16.742596458Z" level=info msg="metadata content store policy set" policy=shared
time="2024-04-10T08:13:16.750261307Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2024-04-10T08:13:16.750300776Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2024-04-10T08:13:16.750323461Z" level=info msg="loading plugin \"io.containerd.lease.v1.manager\"..." type=io.containerd.lease.v1
time="2024-04-10T08:13:16.750348102Z" level=info msg="loading plugin \"io.containerd.streaming.v1.manager\"..." type=io.containerd.streaming.v1
time="2024-04-10T08:13:16.750371909Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2024-04-10T08:13:16.750526792Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2024-04-10T08:13:16.750762774Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2024-04-10T08:13:16.750894466Z" level=info msg="loading plugin \"io.containerd.runtime.v2.shim\"..." type=io.containerd.runtime.v2
time="2024-04-10T08:13:16.750910812Z" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1
time="2024-04-10T08:13:16.750923657Z" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1
time="2024-04-10T08:13:16.750942770Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.750961241Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.750981835Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751003981Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751023494Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751043017Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751062786Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751084010Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2024-04-10T08:13:16.751118648Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751141720Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751158954Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751179967Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751199232Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751220168Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751243364Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751263313Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751288533Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751310306Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751329237Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751353281Z" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751383850Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751407966Z" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1
time="2024-04-10T08:13:16.751433608Z" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751684149Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.751845865Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2024-04-10T08:13:16.752041985Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2024-04-10T08:13:16.752140429Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2024-04-10T08:13:16.752272441Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2024-04-10T08:13:16.752382023Z" level=info msg="skipping tracing processor initialization (no tracing plugin)" error="no OpenTelemetry endpoint: skip plugin"
time="2024-04-10T08:13:16.752580900Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2024-04-10T08:13:16.752603424Z" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1
time="2024-04-10T08:13:16.752615609Z" level=info msg="NRI interface is disabled by configuration."
time="2024-04-10T08:13:16.752925145Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
time="2024-04-10T08:13:16.752996676Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
time="2024-04-10T08:13:16.753060550Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
time="2024-04-10T08:13:16.753132170Z" level=info msg="containerd successfully booted in 0.035738s"
time="2024-04-10T08:13:17.729206072Z" level=info msg="Setting the storage driver from the $DOCKER_DRIVER environment variable (overlay2)"
time="2024-04-10T08:13:17.729228449Z" level=info msg="[graphdriver] trying configured driver: overlay2"
time="2024-04-10T08:13:17.755365665Z" level=info msg="Loading containers: start."
time="2024-04-10T08:13:17.878372710Z" level=info msg="Loading containers: done."
time="2024-04-10T08:13:17.889604357Z" level=info msg="Docker daemon" commit=8b79278 containerd-snapshotter=false storage-driver=overlay2 version=26.0.0
time="2024-04-10T08:13:17.889734278Z" level=info msg="Daemon has completed initialization"
time="2024-04-10T08:13:17.930324430Z" level=info msg="API listen on /var/run/docker.sock"
time="2024-04-10T08:13:17.930343488Z" level=info msg="API listen on [::]:2376"
2024/04/10 08:13:17 http: TLS handshake error from 172.17.0.3:50382: EOF

Has anyone else achieved my original goal in a different way, or solved this problem?

gitlab gitlab-ci-runner testcontainers
1 Answer
0 votes

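It seems that Docker 19.03 introduced some breaking changes to how DinD creates certificates in GitLab CI, which would explain why TLS had to be turned off. Have a look at this post: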

https://about.gitlab.com/blog/2019/07/31/docker-in-docker-with-docker-19-dot-03/
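
The behaviour described in that post can be sketched as a mapping from DOCKER_TLS_CERTDIR to client settings. The docker:dind image defaults DOCKER_TLS_CERTDIR to /certs; when it is non-empty, the entrypoint generates certificates there and the daemon listens with TLS on 2376, and when it is set to an empty string the daemon listens on 2375 without TLS, which is what DOCKER_HOST: tcp://docker:2375 expects. The function below is a hypothetical helper that assumes the service alias "docker".

```shell
#!/usr/bin/env bash
# docker_client_env: print client-side settings matching how docker:dind
# will start, based on DOCKER_TLS_CERTDIR (passed to the service as a CI variable).
docker_client_env() {
  local certdir
  if [ -z "${DOCKER_TLS_CERTDIR+set}" ]; then
    # Unset in CI: dind falls back to its image default (/certs) and enables TLS.
    certdir=/certs
  else
    certdir=$DOCKER_TLS_CERTDIR
  fi

  if [ -n "$certdir" ]; then
    # TLS mode: the client needs the certs dind writes to $certdir/client.
    echo "DOCKER_HOST=tcp://docker:2376"
    echo "DOCKER_TLS_VERIFY=1"
    echo "DOCKER_CERT_PATH=$certdir/client"
  else
    # Empty string: no certificates, plain TCP on 2375.
    echo "DOCKER_HOST=tcp://docker:2375"
  fi
}
```

In a job script, export $(docker_client_env) would apply the settings (the values contain no spaces). The TLS variant only works if the job container can read the client certificates, which the runner normally shares through the "/certs/client" volume. The warning in Update 1 ("Will not create volume for /certs/client") suggests that volume was not created here, apparently because disable_cache = true turns off container-based volumes, so the empty DOCKER_TLS_CERTDIR is the simpler fix for this setup.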

© www.soinside.com 2019 - 2024. All rights reserved.