Activating a Conda environment during Ray setup


I am trying to launch a local Ray cluster, but the initialization and setup commands are raising errors and I am not sure what they mean.

For each command, the following messages appear after it executes (the full log is shown further down):

bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell

They do not appear to stop the commands from completing, but I am unable to activate the conda environment on each node with the following setup command:

# List of shell commands to run to set up each node.
setup_commands:
    - conda activate pytorch-dev
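
(For context: conda activate only becomes available after conda's shell hook has been sourced, which interactive shells normally do through the conda init block in ~/.bashrc; the non-interactive shells Ray opens never run that block. A minimal reproduction of the failure, assuming conda is installed under ~/anaconda3:)

# Without the hook, a one-off non-interactive shell cannot find the
# activate function:
bash -c 'conda activate pytorch-dev'
# -> CommandNotFoundError: Your shell has not been properly configured
#    to use 'conda activate'. To initialize your shell, run conda init ...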

Any help or explanation would be much appreciated.

My cluster configuration file (cluster_config_local.yaml) contains:

# A unique identifier for the head node and workers of this cluster.
cluster_name: default

## NOTE: Typically for local clusters, min_workers == initial_workers == max_workers.

# The minimum number of worker nodes to launch in addition to the head
# node. This number should be >= 0.
# Typically, min_workers == initial_workers == max_workers.
min_workers: 12
# The initial number of worker nodes to launch in addition to the head node.
# Typically, min_workers == initial_workers == max_workers.
initial_workers: 12

# The maximum number of worker nodes to launch in addition to the head node.
# This takes precedence over min_workers.
# Typically, min_workers == initial_workers == max_workers.
max_workers: 12

# Autoscaling parameters.
# Ignore this if min_workers == initial_workers == max_workers.
autoscaling_mode: default
target_utilization_fraction: 0.8
idle_timeout_minutes: 5

# This executes all commands on all nodes in the docker container,
# and opens all the necessary ports to support the Ray cluster.
# Empty string means disabled. Assumes Docker is installed.
docker:
    image: "" # e.g., tensorflow/tensorflow:1.5.0-py3
    container_name: "" # e.g. ray_docker
    run_options: []  # Extra options to pass into "docker run"

# Local specific configuration.
provider:
    type: local
    head_ip: cs19090bs  # Lab 3, machine 311
    worker_ips: [
        cs19091bs, cs19093bs, cs19094bs, cs19095bs, cs19096bs,
        cs19103bs, cs19102bs, cs19101bs, cs19100bs, cs19099bs, cs19098bs, cs19097bs
    ]

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: user
    ssh_private_key: ~/.ssh/id_rsa

# Leave this empty.
head_node: {}

# Leave this empty.
worker_nodes: {}

# Files or directories to copy to the head and worker nodes. The format is a
# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
file_mounts: {
#    "/path1/on/remote/machine": "/path1/on/local/machine",
#    "/path2/on/remote/machine": "/path2/on/local/machine",
}

# List of commands that will be run before `setup_commands`. If docker is
# enabled, these commands will run outside the container and before docker
# is setup.
initialization_commands: []

# List of shell commands to run to set up each node.
setup_commands:
    - conda activate pytorch-dev

# Custom commands that will be run on the head node after common setup.
head_setup_commands: []

# Custom commands that will be run on worker nodes after common setup.
worker_setup_commands: []

# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - ray stop
    - ulimit -c unlimited && ray start --head --redis-port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - ray stop
    - ray start --redis-address=$RAY_HEAD_IP:6379
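
(A commonly suggested variant of the setup_commands above, sketched here on the assumption that conda lives under ~/anaconda3 — adjust the path for miniconda or a system-wide install: source conda's shell hook inside the same command, because Ray runs each entry in a fresh non-interactive shell and ~/.bashrc is never consulted.)

# Sketch: load the conda shell function before activating. Each entry runs
# in its own shell, so the activation also does not persist into
# head_start_ray_commands / worker_start_ray_commands; prefix those the
# same way if they need the environment.
setup_commands:
    - source ~/anaconda3/etc/profile.d/conda.sh && conda activate pytorch-dev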

The full log printed when I run ray up cluster_config_local.yaml is:

2019-11-11 10:18:06,930 INFO node_provider.py:41 -- ClusterState: Loaded cluster state: ['cs19091bs', 'cs19093bs', 'cs19094bs', 'cs19095bs', 'cs19096bs', 'cs19090bs', 'cs19103bs', 'cs19102bs', 'cs19101bs', 'cs19100bs', 'cs19099bs', 'cs19098bs', 'cs19097bs']
This will create a new cluster [y/N]: y
2019-11-11 10:18:08,413 INFO commands.py:201 -- get_or_create_head_node: Launching new head node...
2019-11-11 10:18:08,414 INFO node_provider.py:85 -- ClusterState: Writing cluster state: ['cs19091bs', 'cs19093bs', 'cs19094bs', 'cs19095bs', 'cs19096bs', 'cs19090bs', 'cs19103bs', 'cs19102bs', 'cs19101bs', 'cs19100bs', 'cs19099bs', 'cs19098bs', 'cs19097bs']
2019-11-11 10:18:08,416 INFO commands.py:214 -- get_or_create_head_node: Updating files on head node...
2019-11-11 10:18:08,417 INFO updater.py:356 -- NodeUpdater: cs19090bs: Updating to 345f31e4c980153f1c40ae2c0be26b703d4bbfde
2019-11-11 10:18:08,419 INFO node_provider.py:85 -- ClusterState: Writing cluster state: ['cs19091bs', 'cs19093bs', 'cs19094bs', 'cs19095bs', 'cs19096bs', 'cs19090bs', 'cs19103bs', 'cs19102bs', 'cs19101bs', 'cs19100bs', 'cs19099bs', 'cs19098bs', 'cs19097bs']
2019-11-11 10:18:08,419 INFO updater.py:398 -- NodeUpdater: cs19090bs: Waiting for remote shell...
2019-11-11 10:18:08,420 INFO updater.py:210 -- NodeUpdater: cs19090bs: Waiting for IP...
2019-11-11 10:18:08,429 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Got IP [LogTimer=9ms]
2019-11-11 10:18:08,442 INFO updater.py:262 -- NodeUpdater: cs19090bs: Running uptime on 132.181.15.173...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
 10:18:10 up 4 days, 22:41,  1 user,  load average: 1.14, 0.56, 0.38
2019-11-11 10:18:10,178 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Got remote shell [LogTimer=1759ms]
2019-11-11 10:18:10,181 INFO node_provider.py:85 -- ClusterState: Writing cluster state: ['cs19091bs', 'cs19093bs', 'cs19094bs', 'cs19095bs', 'cs19096bs', 'cs19090bs', 'cs19103bs', 'cs19102bs', 'cs19101bs', 'cs19100bs', 'cs19099bs', 'cs19098bs', 'cs19097bs']
2019-11-11 10:18:10,182 INFO updater.py:262 -- NodeUpdater: cs19090bs: Running mkdir -p ~ on 132.181.15.173...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-11-11 10:18:11,640 INFO updater.py:460 -- NodeUpdater: cs19090bs: Syncing /tmp/ray-bootstrap-aomvoo_d to ~/ray_bootstrap_config.yaml...
sending incremental file list
ray-bootstrap-aomvoo_d

sent 120 bytes  received 47 bytes  111.33 bytes/sec
total size is 1,063  speedup is 6.37
2019-11-11 10:18:12,147 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Synced /tmp/ray-bootstrap-aomvoo_d to ~/ray_bootstrap_config.yaml [LogTimer=1964ms]
2019-11-11 10:18:12,147 INFO updater.py:262 -- NodeUpdater: cs19090bs: Running mkdir -p ~ on 132.181.15.173...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-11-11 10:18:13,610 INFO updater.py:460 -- NodeUpdater: cs19090bs: Syncing /home/cosc/student/atu31/.ssh/id_rsa to ~/ray_bootstrap_key.pem...
sending incremental file list

sent 60 bytes  received 12 bytes  48.00 bytes/sec
total size is 3,243  speedup is 45.04
2019-11-11 10:18:14,131 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Synced /home/cosc/student/atu31/.ssh/id_rsa to ~/ray_bootstrap_key.pem [LogTimer=1984ms]
2019-11-11 10:18:14,133 INFO node_provider.py:85 -- ClusterState: Writing cluster state: ['cs19091bs', 'cs19093bs', 'cs19094bs', 'cs19095bs', 'cs19096bs', 'cs19090bs', 'cs19103bs', 'cs19102bs', 'cs19101bs', 'cs19100bs', 'cs19099bs', 'cs19098bs', 'cs19097bs']
2019-11-11 10:18:14,134 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Initialization commands completed [LogTimer=0ms]
2019-11-11 10:18:14,134 INFO updater.py:262 -- NodeUpdater: cs19090bs: Running conda activate pytorch-dev on 132.181.15.173...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-11-11 10:18:15,740 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Setup commands completed [LogTimer=1605ms]
2019-11-11 10:18:15,740 INFO updater.py:262 -- NodeUpdater: cs19090bs: Running ray stop on 132.181.15.173...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-11-11 10:18:17,809 INFO updater.py:262 -- NodeUpdater: cs19090bs: Running ulimit -c unlimited && ray start --head --redis-port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml on 132.181.15.173...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019-11-11 10:18:19,923 INFO scripts.py:303 -- Using IP address 132.181.15.173 for this node.
2019-11-11 10:18:19,924 INFO resource_spec.py:205 -- Starting Ray with 7.62 GiB memory available for workers and up to 3.81 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2019-11-11 10:18:20,169 INFO scripts.py:333 -- 
Started Ray on this node. You can add additional nodes to the cluster by calling

    ray start --redis-address 132.181.15.173:6379

from the node you wish to add. You can connect a driver to the cluster from Python by running

    import ray
    ray.init(redis_address="132.181.15.173:6379")

If you have trouble connecting from a different machine, check that your firewall is configured properly. If you wish to terminate the processes that have been started, run

    ray stop
2019-11-11 10:18:20,221 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Ray start commands completed [LogTimer=4480ms]
2019-11-11 10:18:20,222 INFO log_timer.py:21 -- NodeUpdater: cs19090bs: Applied config 345f31e4c980153f1c40ae2c0be26b703d4bbfde [LogTimer=11804ms]
2019-11-11 10:18:20,224 INFO node_provider.py:85 -- ClusterState: Writing cluster state: ['cs19091bs', 'cs19093bs', 'cs19094bs', 'cs19095bs', 'cs19096bs', 'cs19090bs', 'cs19103bs', 'cs19102bs', 'cs19101bs', 'cs19100bs', 'cs19099bs', 'cs19098bs', 'cs19097bs']
2019-11-11 10:18:20,226 INFO commands.py:281 -- get_or_create_head_node: Head node up-to-date, IP address is: 132.181.15.173
To monitor auto-scaling activity, you can run:

  ray exec cluster/cluster_config_local.yaml 'tail -n 100 -f /tmp/ray/session_*/logs/monitor*'

To open a console on the cluster:

  ray attach cluster_config_local.yaml

To get a remote shell to the cluster manually, run:

  ssh -i ~/.ssh/id_rsa [email protected]
1 Answer
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell

This error message is harmless (and can safely be ignored when running Ray). See How to tell bash not to issue warnings "cannot set terminal process group" and "no job control in this shell" when it can't assert job control?
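
(For reference, the warning can be reproduced outside Ray: it appears whenever bash is started in interactive mode without a controlling terminal, which is how Ray's NodeUpdater runs remote commands over ssh. The hostname below is illustrative:)

# ssh without -t allocates no TTY, so bash -i cannot take over a terminal
# process group; it prints the warnings, then runs the command anyway:
ssh cs19090bs 'bash -i -c uptime'
# bash: cannot set terminal process group (-1): Inappropriate ioctl for device
# bash: no job control in this shell

# Forcing pseudo-terminal allocation makes the warnings disappear:
ssh -tt cs19090bs 'bash -i -c uptime'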
