Ray error when trying to deploy Llama3 70b with vLLM and Vertex AI

Question

Using Vertex AI online prediction with a custom container, I am trying to deploy meta-llama/Meta-Llama-3-70B-Instruct with vllm 0.4.1 on 8 NVIDIA_L4 GPUs, and I get:

/tmp/ray is over 95% full, available space: 5031063552; capacity: 101203873792. Object creation will fail if spilling is required.

This is the last log line I see; after it, the deployment fails.
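For reference, the two numbers in that warning are byte counts, and they work out to slightly under 5% free on the filesystem holding /tmp/ray. A minimal sketch of the arithmetic follows; the helper name `used_fraction` is made up for illustration and is not part of Ray.

```python
def used_fraction(available_bytes: int, capacity_bytes: int) -> float:
    """Return the fraction of a filesystem that is in use (0.0 to 1.0)."""
    return 1.0 - available_bytes / capacity_bytes


# Values copied from the Ray warning in the question.
available = 5031063552
capacity = 101203873792

print(f"capacity: {capacity / 2**30:.1f} GiB")
print(f"free:     {available / 2**30:.1f} GiB")
print(f"used:     {used_fraction(available, capacity):.1%}")
```

With only a few GiB left on that disk, there is little room for Ray to spill objects, which matches the wording of the warning.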

Running the same custom container on a VM works without any problems.

To create the model I am using `aiplatform` from the Google Cloud SDK:

model_resource = aiplatform.Model.upload(
    serving_container_image_uri=serving_container_image_uri,
    serving_container_shared_memory_size_mb=16384,
    ...
    )

And the model is loaded with vllm (this is the code run by the container):

from vllm import LLM
self.model = LLM(
    model=model_config.model_hf_name,
    dtype="auto",
    tensor_parallel_size=model_config.tensor_parallel_size,
    enforce_eager=model_config.enforce_eager,
    disable_custom_all_reduce=model_config.disable_custom_all_reduce,
    worker_use_ray=bool(model_config.tensor_parallel_size > 1),
    enable_prefix_caching=False,
    max_model_len=model_config.max_seq_len,
)

Answer 1

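Apparently, Vertex AI online prediction with custom containers has a storage limit, so you need to allocate enough shared memory for both the vLLM GPU communication and the model storage. That is about 140 GB; to be safe, I set 240 GB:

model_resource = aiplatform.Model.upload(
    serving_container_image_uri=serving_container_image_uri,
    serving_container_shared_memory_size_mb=240000,
    ...
    )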
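The 140 GB figure is consistent with a back-of-the-envelope estimate, assuming roughly 70 billion parameters stored at 2 bytes each (fp16/bf16). The helper below is hypothetical, treats 1 MB as 10^6 bytes, and covers only the weights, not Ray's temp files or any other scratch data:

```python
def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~70e9 parameters at 2 bytes each (16-bit weights).
weights_gb = weights_size_gb(70e9)

# Value passed as serving_container_shared_memory_size_mb in the answer.
shm_mb = 240_000
headroom_gb = shm_mb / 1000 - weights_gb

print(f"weights:  ~{weights_gb:.0f} GB")
print(f"headroom: ~{headroom_gb:.0f} GB left for Ray and other files")
```

A quantized checkpoint would need less space; changing `bytes_per_param` gives a rough idea of how much.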

In the container code, I explicitly load the model from /dev/shm, which now has enough capacity:

    import os

    import ray
    from vllm import LLM

    model_path = "/dev/shm/new_model_path"
    ray_tmp_dir = "/dev/shm/tmp/ray"
    os.makedirs(ray_tmp_dir, exist_ok=True)
    # Keep Ray's temp files on /dev/shm instead of the default /tmp/ray
    ray.init(_temp_dir=ray_tmp_dir, num_gpus=model_config.tensor_parallel_size)

    # download_llm is a helper that is not shown in this answer
    download_llm(
        src="some remote storage where the model is kept",
        dest=model_path,
    )

    self.model = LLM(
        model=model_path,
        quantization=model_config.quantization or None,
        dtype="auto",
        tensor_parallel_size=model_config.tensor_parallel_size,
        enforce_eager=model_config.enforce_eager,
        disable_custom_all_reduce=model_config.disable_custom_all_reduce,
        worker_use_ray=bool(model_config.tensor_parallel_size > 1),
        enable_lora=bool(self.lora_adapter_path is not None),
        enable_prefix_caching=False,
        max_model_len=model_config.max_seq_len,
    )

Ray also has to be initialized with a temp directory that points under /dev/shm.
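Since both the weights and Ray's temp files end up on /dev/shm, it may be worth checking its free space at container startup, before the download begins. A minimal sketch using only the standard library; the function name `has_free_space` and the 140 GB threshold are assumptions for illustration:

```python
import os
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


shm = "/dev/shm"
required = 140 * 10**9  # assumed size of the model weights, in bytes

# /dev/shm only exists on Linux, so guard the check.
if os.path.isdir(shm):
    if has_free_space(shm, required):
        print(f"{shm} has room for the model")
    else:
        free_gb = shutil.disk_usage(shm).free / 1e9
        print(f"{shm} has only {free_gb:.1f} GB free, need {required / 1e9:.0f} GB")
```

Failing fast here gives a clearer error than a deployment that stops after the Ray warning.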
