Error accessing the Dagster user code server on ECS?


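I have a problem with my user code server. The webserver cannot reach it, and I can't get it to work even after systematically trying every combination of the variables I could think of.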

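The code evolved from the original deploy_ecs example, which I had to adapt because Docker dropped its ECS support. I created a Terraform setup for the ECS infrastructure and the ECR registry, and adapted the docker-compose file to the new structure. The user code server is running, but the webserver cannot reach it. The error message points to DNS resolution. The user code server listens on port 4000, and the webserver tries to reach it at dagster-usercode.dagster.local:4000. That is the Cloud Map service name, which should resolve to the user code server's IP address. It does not.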
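The DNS symptom described above can be checked independently of Dagster by running a lookup from inside the webserver container. The following is a minimal sketch using only the Python standard library; the helper name resolve_host is hypothetical and not part of Dagster or gRPC:

```python
import socket


def resolve_host(host: str, port: int) -> list[str]:
    """Return the sorted unique IP addresses `host` resolves to, or [] if lookup fails."""
    try:
        # getaddrinfo goes through the system resolver configured in /etc/resolv.conf.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Raised when the name cannot be resolved (e.g. "Domain name not found").
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_host("dagster-usercode.dagster.local", 4000) from the webserver task should return the private IP of the user code task if the Cloud Map name resolves; an empty list would match the "Domain name not found" error in the trace.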

Distributed installation in ECS:

  • 1 webserver
  • 1 daemon server
  • 1 user code server

workspace.yaml on the webserver:

load_from:
  - python_file: job2.py
  - python_package:
      location_name: "webserver-jobs"
      package_name: job3
  - grpc_server:
      host: dagster-usercode.dagster.local
      port: 4000
      location_name: "usercode-jobs"

Cloud Map in AWS:

  • Namespace dagster.local
  • Cloud Map service dagster-usercode, pointing to the ECS service instance dagster_usercode

ECS in AWS:

  • The service dagster_usercode runs a task that references the usercode server container in ECR; according to the logs, the task is found and running.

The startup command on the user code server is:

dagster api grpc -h 0.0.0.0 -p 4000 -f sample_jobs.py
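Since the server binds to 0.0.0.0 on port 4000, one way to tell a DNS failure apart from a blocked port (for example a security group rule) is to try a plain TCP connection from the webserver task, first by name and then by the task's private IP. A minimal sketch using only the standard library; the helper name can_connect and the 3-second default timeout are illustrative choices, not anything Dagster provides:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections, and timeouts.
        return False
```

If can_connect succeeds with the private IP but fails with dagster-usercode.dagster.local, the problem is name resolution rather than the gRPC server or the network path. A successful TCP connect does not prove that the gRPC service itself is healthy.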

When workspace.yaml is loaded on the webserver, I see the following:

  • job2 is loaded and can be invoked
  • job3: error
dagster._core.errors.DagsterInvariantViolationError: No repositories, jobs, pipelines, graphs, or asset definitions found in "job3".
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server.py", line 408, in __init__
    self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server.py", line 242, in __init__
    loadable_targets = get_loadable_targets(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/utils.py", line 60, in get_loadable_targets
    else loadable_targets_from_python_package(package_name, working_directory)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 51, in loadable_targets_from_python_package
    return loadable_targets_from_loaded_module(module)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 116, in loadable_targets_from_loaded_module
    raise DagsterInvariantViolationError(
  • grpc: error
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/workspace/context.py", line 614, in _load_location
    else origin.create_location(self.instance)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/host_representation/origin.py", line 373, in create_location
    return GrpcServerCodeLocation(self, instance=instance)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/host_representation/code_location.py", line 632, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
  File "/usr/local/lib/python3.10/site-packages/dagster/_api/list_repositories.py", line 20, in sync_list_repositories_grpc
    api_client.list_repositories(),
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 250, in list_repositories
    res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 173, in _query
    self._raise_grpc_exception(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 156, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.UNAVAILABLE
  details = "DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=A name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"
  debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-02-12T16:21:15.104749061+00:00", grpc_status:14, grpc_message:"DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=A name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"}">
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 171, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 141, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable

The above exception occurred during handling of the following exception:
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server_watcher.py", line 119, in watch_grpc_server_thread
    watch_for_changes()
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server_watcher.py", line 82, in watch_for_changes
    new_server_id = client.get_server_id(timeout=REQUEST_TIMEOUT)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 233, in get_server_id
    res = self._query("GetServerId", api_pb2.Empty, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 173, in _query
    self._raise_grpc_exception(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 156, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.UNAVAILABLE
  details = "DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=AAAA name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"
  debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-02-12T16:21:04.309598593+00:00", grpc_status:14, grpc_message:"DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=AAAA name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"}">
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 171, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 141, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable

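What is missing? From the ECS side, everything looks fine. The jobs in the package are not found, but that is not the main issue. My main concern is the user code server, and I am at a loss about what to try next.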

Any help is greatly appreciated.

docker dns amazon-ecs dagster aws-cloudmap
1 Answer

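The problem was actually the AWS VPC setup: the AWS Cloud Map namespace is not compatible with self-managed DNS. Once the VPC was set up correctly, all machines had to be rebuilt, because their resolv.conf had been tainted by the wrong settings.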
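A quick way to see which resolver a container actually uses is to read its /etc/resolv.conf. The sketch below is an illustration only, and the function names are hypothetical. It assumes the VPC is meant to use the Amazon-provided DNS resolver, which is reachable at the VPC network base address plus two (for example 10.0.0.2 in a 10.0.0.0/16 VPC) or at 169.254.169.253; Cloud Map private DNS namespaces are backed by Route 53 private hosted zones, which that resolver answers for.

```python
from pathlib import Path


def parse_resolv_conf(text: str) -> dict[str, list[str]]:
    """Collect the `nameserver` and `search` entries from resolv.conf content."""
    result: dict[str, list[str]] = {"nameserver": [], "search": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments, which start with '#' or ';'.
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "nameserver":
            result["nameserver"].append(parts[1])
        elif parts[0] == "search":
            # If several `search` lines are present, the last one wins.
            result["search"] = parts[1:]
    return result


def read_resolv_conf(path: str = "/etc/resolv.conf") -> dict[str, list[str]]:
    """Parse the resolver configuration of the current machine or container."""
    return parse_resolv_conf(Path(path).read_text())
```

If the nameserver list shows a self-managed DNS server rather than the VPC resolver, names under dagster.local would not be expected to resolve unless that server forwards the zone to the VPC resolver. That would be consistent with the answer's remark that the machines had to be rebuilt after the VPC settings were fixed.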
