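I have a problem with my user code server. The webserver cannot reach it, and I can't get it to work even after systematically working through every combination of settings I could think of.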
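The code evolved from the original deploy_ecs example, which I had to adapt because Docker dropped its ECS support. I created a Terraform setup for the ECS infrastructure and the ECR registry, and adapted the docker-compose file to the new structure. The user code server is running, but the webserver cannot reach it. The error message points to DNS resolution. The user code server listens on port 4000, and the webserver tries to reach it at dagster-usercode.dagster.local:4000. That is the Cloud Map service name, which should resolve to the user code server's IP address. It does not.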
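One way to confirm the symptom independently of Dagster is to run a small resolution check from inside the webserver container. The following is a minimal sketch using only the Python standard library; `resolve_host` is a hypothetical helper name, not part of Dagster or the deploy_ecs example.

```python
import socket


def resolve_host(host: str, port: int) -> list[str]:
    """Return the sorted, unique IP addresses that `host` resolves to.

    Returns an empty list when the resolver cannot find the name, which is
    the situation the gRPC client reports as "Domain name not found".
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        print(f"DNS resolution failed for {host}:{port}: {exc}")
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


# Example call, to be run inside the webserver container:
# resolve_host("dagster-usercode.dagster.local", 4000)
```

An empty result here means the failure is in name resolution itself, before any connection to port 4000 is attempted.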
Distributed setup in ECS:
workspace.yaml:
load_from:
  - python_file: job2.py
  - python_package:
      location_name: "webserver-jobs"
      package_name: job3
  - grpc_server:
      host: dagster-usercode.dagster.local
      port: 4000
      location_name: "usercode-jobs"
Cloud Map in AWS:
ECS in AWS:
The user code server is started with this command:
dagster api grpc -h 0.0.0.0 -p 4000 -f sample_jobs.py
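Because the server binds to 0.0.0.0 on port 4000, a plain TCP probe against the task's IP address can help separate a DNS problem from a networking or security group problem. This is a minimal sketch with the standard library; `can_connect` is a hypothetical helper, and the IP address to test has to be taken from the running ECS task.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example call, with the private IP of the user code task filled in:
# can_connect("<task private IP>", 4000)
```

If the probe succeeds by IP address but the hostname still fails to resolve, the gRPC server and the network path are fine and only the name lookup is broken.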
When the webserver loads workspace.yaml, I see the following errors:
dagster._core.errors.DagsterInvariantViolationError: No repositories, jobs, pipelines, graphs, or asset definitions found in "job3".
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server.py", line 408, in __init__
    self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server.py", line 242, in __init__
    loadable_targets = get_loadable_targets(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/utils.py", line 60, in get_loadable_targets
    else loadable_targets_from_python_package(package_name, working_directory)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 51, in loadable_targets_from_python_package
    return loadable_targets_from_loaded_module(module)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 116, in loadable_targets_from_loaded_module
    raise DagsterInvariantViolationError(

dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/workspace/context.py", line 614, in _load_location
    else origin.create_location(self.instance)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/host_representation/origin.py", line 373, in create_location
    return GrpcServerCodeLocation(self, instance=instance)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/host_representation/code_location.py", line 632, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
  File "/usr/local/lib/python3.10/site-packages/dagster/_api/list_repositories.py", line 20, in sync_list_repositories_grpc
    api_client.list_repositories(),
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 250, in list_repositories
    res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 173, in _query
    self._raise_grpc_exception(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 156, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=A name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"
    debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-12T16:21:15.104749061+00:00", grpc_status:14, grpc_message:"DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=A name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"}"
>
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 171, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 141, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable

The above exception occurred during handling of the following exception:
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server_watcher.py", line 119, in watch_grpc_server_thread
    watch_for_changes()
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/server_watcher.py", line 82, in watch_for_changes
    new_server_id = client.get_server_id(timeout=REQUEST_TIMEOUT)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 233, in get_server_id
    res = self._query("GetServerId", api_pb2.Empty, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 173, in _query
    self._raise_grpc_exception(
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 156, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=AAAA name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"
    debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-12T16:21:04.309598593+00:00", grpc_status:14, grpc_message:"DNS resolution failed for dagster-usercode.dagster.local:4000: C-ares status is not ARES_SUCCESS qtype=AAAA name=dagster-usercode.dagster.local is_balancer=0: Domain name not found"}"
>
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 171, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/client.py", line 141, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
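What am I missing? On the ECS side everything looks fine. The jobs in the package are not found, but that is not the main problem. My main concern is the user code server, and I am at a loss about what to try next.
Any help is much appreciated.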
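The problem turned out to be the AWS VPC setup: AWS Cloud Map namespaces are not compatible with self-managed DNS. Once that was configured correctly, all machines had to be rebuilt, because their resolv.conf had been corrupted by the wrong settings.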
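For anyone hitting the same symptom, one thing worth checking is whether the container's resolv.conf points at the Amazon-provided VPC resolver, since private Cloud Map namespaces are backed by Route 53 private hosted zones that this resolver answers. The sketch below is a rough check under stated assumptions: the Amazon resolver is reachable at 169.254.169.253 and at the base of the VPC's primary IPv4 CIDR plus two, and the helper names and the 10.0.0.0/16 CIDR are illustrative rather than taken from the setup above.

```python
import ipaddress

# Link-local address of the Amazon-provided DNS resolver inside a VPC.
LINK_LOCAL_RESOLVER = "169.254.169.253"


def parse_nameservers(resolv_conf_text: str) -> list[str]:
    """Extract the nameserver addresses from the text of a resolv.conf file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("#", ";")):
            continue  # comment line
        parts = stripped.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def amazon_resolvers(vpc_cidr: str) -> set[str]:
    """Addresses of the Amazon resolver for a VPC with this primary CIDR (assumption)."""
    network = ipaddress.ip_network(vpc_cidr)
    return {str(network.network_address + 2), LINK_LOCAL_RESOLVER}


def uses_amazon_dns(resolv_conf_text: str, vpc_cidr: str) -> bool:
    """True if every configured nameserver is an Amazon-provided resolver address."""
    servers = parse_nameservers(resolv_conf_text)
    allowed = amazon_resolvers(vpc_cidr)
    return bool(servers) and all(server in allowed for server in servers)


# Illustrative input; in a real task, read /etc/resolv.conf instead.
sample = "search dagster.local\nnameserver 10.0.0.2\n"
print(uses_amazon_dns(sample, "10.0.0.0/16"))
```

Inside a running task, the same check can be applied to the contents of /etc/resolv.conf together with the VPC's actual primary CIDR. A result of False suggests the task is using a self-managed DNS server that may not know about the Cloud Map namespace.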