I'm trying to run llama-index with llama-cpp inside a Docker container, following the installation docs. This repo installs llama_cpp_python==0.2.6 using the Dockerfile below.
Dockerfile:

```dockerfile
# Use the official Python image for Python 3.11
FROM python:3.11

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# ARG FORCE_CMAKE=1
# ARG CMAKE_ARGS="-DLLAMA_CUBLAS=on"

# Install project dependencies
RUN FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" python -m pip install -r requirements.txt

# Command to run the server
CMD ["python", "./server.py"]
```
Run commands:

```shell
docker build -t llm_server ./llm
docker run -it -p 2023:2023 --gpus all llm_server
```
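One thing worth ruling out first: if a CPU-only wheel of llama-cpp-python was ever built on this machine, pip may silently reuse it from its cache, so `CMAKE_ARGS` never reaches CMake. A hedged sketch of forcing a fresh source build (the version pin matches the one in the question; the flags are standard pip options, not from the original post):

```shell
# Sketch: force pip to rebuild llama-cpp-python from source instead of
# reusing a previously cached CPU wheel, so CMAKE_ARGS actually takes effect.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --no-cache-dir --force-reinstall llama-cpp-python==0.2.6
```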
Problem: for some reason, the environment variables from the llama-cpp-python docs do not take effect inside the Docker container.

Expected behavior: BLAS = 1 (the LLM uses the GPU).
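The `BLAS = 1` flag appears in llama.cpp's system-info string, which llama-cpp-python exposes via the low-level binding `llama_cpp.llama_print_system_info()`. A minimal sketch of checking it (the sample strings below are illustrative, not output from my container):

```python
# Sketch (not from the original post): parse llama.cpp's system-info string
# to see whether the installed wheel was compiled with BLAS/cuBLAS support.

def built_with_blas(system_info: str) -> bool:
    """Return True if the llama.cpp system-info string reports BLAS = 1."""
    return "BLAS = 1" in system_info

# Inside the container you would feed it the real string, e.g.:
#   import llama_cpp
#   built_with_blas(llama_cpp.llama_print_system_info().decode())
print(built_with_blas("AVX = 1 | BLAS = 0 | SSE3 = 1"))  # CPU-only build -> False
print(built_with_blas("AVX = 1 | BLAS = 1 | SSE3 = 1"))  # cuBLAS build   -> True
```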
nvidia-smi output inside the container:
```
# nvidia-smi
Thu Nov 23 05:48:30 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.01              Driver Version: 546.01       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti     On  | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P8               4W /  80W |   1257MiB /  6144MiB |      7%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                   N/A    |
|    0   N/A  N/A        20      G   /Xwayland                                   N/A    |
|    0   N/A  N/A       392      G   /Xwayland                                   N/A    |
+---------------------------------------------------------------------------------------+
```
I also tried the commented-out ARG/ENV variants before the install step:

```dockerfile
# ARG FORCE_CMAKE=1
# ARG CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# ENV FORCE_CMAKE=1
# ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"

# Install project dependencies
RUN FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" python -m pip install -r requirements.txt
```

For reference, the install command from the llama-cpp-python docs:

```shell
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
On Windows I use this image:

```dockerfile
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04
```

This is how I set the necessary variables before installing:

```dockerfile
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=ON"
RUN pip install llama-cpp-python
```

That works for me. Again, this is with Docker Desktop on Windows!
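A likely reason the base image matters: building with `-DLLAMA_CUBLAS=on` needs the CUDA toolkit (nvcc, cuBLAS headers) at pip-install time, and the plain `python:3.11` image does not ship it; the `--gpus all` runtime flag only provides the driver, not the compiler. A sketch to check (image tag as in the answer above):

```shell
# Sketch: verify the base image ships the CUDA compiler needed to build
# llama-cpp-python with -DLLAMA_CUBLAS=on. The -devel images include nvcc;
# plain python:3.11 does not.
docker run --rm nvidia/cuda:11.7.1-devel-ubuntu22.04 nvcc --version
docker run --rm python:3.11 nvcc --version   # fails: nvcc not found
```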