Problem running the Azure Spatial Analysis container

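I need to create and run the Azure Spatial Analysis container on my desktop machine using WSL. I followed this tutorial: https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/spatial-analysis-container?tabs=desktop-machine. According to IoT Hub, everything should be running fine. However, I get no output, and when I look at the logs of the Spatial Analysis module I repeatedly see these errors: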

2024-03-06T19:46:45.429562642Z <warning> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Error: Failed to allocate shared buffer. Skipping frame. 

2024-03-06T19:46:45.501593175Z <warning> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Failed to get CUDA handle: cudaIpcGetMemHandle failed with error 2 

2024-03-06T19:46:45.502484545Z <error> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Cannot create cuda shared buffer. Size: 6684672

I don't know what to do. Here is the report from nvidia-smi as well:

| NVIDIA-SMI 530.30.02              Driver Version: 527.99       CUDA Version: 12.0     |
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3060 L...    On | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P8               16W / 115W|   2609MiB /  6144MiB |     28%      Default |
|                                         |                      |                  N/A |


I tried restarting the modules and the whole IoT Edge runtime. I also checked connectivity to IoT Hub, and that appears to be fine as well.

Thanks in advance for your help!
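The error is related to GPU memory: the module fails to allocate a shared CUDA buffer of about 6.7 MB (6,684,672 bytes). If the free memory on your NVIDIA GeForce RTX 3060 is limited, allocating this buffer can fail. Make sure the system has enough free resources and meets the Spatial Analysis container requirements.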
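One way to check whether free GPU memory is really the limiting factor is to compare the buffer size from the log with the memory figures nvidia-smi reports. The following is a minimal sketch, not part of the tutorial: the helper names free_mib, bytes_to_mib, and buffer_fits are hypothetical, and the numbers fed to them are the ones from the post above.

```shell
#!/usr/bin/env bash
# Sketch: compare free GPU memory with a requested buffer size.
# All helper names are hypothetical and exist only for this illustration.

# free_mib USED_MIB TOTAL_MIB -> prints free memory in MiB
free_mib() {
  echo $(( $2 - $1 ))
}

# bytes_to_mib BYTES -> prints BYTES rounded up to whole MiB
bytes_to_mib() {
  echo $(( ($1 + 1048575) / 1048576 ))
}

# buffer_fits USED_MIB TOTAL_MIB BUFFER_BYTES -> exit status 0 if the buffer fits
buffer_fits() {
  [ "$(bytes_to_mib "$3")" -le "$(free_mib "$1" "$2")" ]
}

# Figures taken from the post: 2609MiB / 6144MiB in use, buffer size 6684672 bytes.
if buffer_fits 2609 6144 6684672; then
  echo "fits: $(bytes_to_mib 6684672) MiB needed, $(free_mib 2609 6144) MiB free"
else
  echo "does not fit"
fi

# Live figures, printed only when nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits || true
fi
```

With the figures from the post, nvidia-smi reports far more free memory than the buffer needs, so if a live query shows similar headroom it is worth reviewing the remaining container requirements as well as raw memory capacity.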


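Install the NVIDIA CUDA Toolkit and NVIDIA graphics driver:

  • Run the following bash commands to install the required driver and the CUDA Toolkit.
  • Reboot the machine after installation.
# Download the repository pin file that the next command moves into place
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Trust the NVIDIA repository key and register the CUDA repository
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
sudo reboot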

Install Docker CE and nvidia-docker2:

  • Install the Docker CE and nvidia-docker2 packages.
# Docker CE
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
# nvidia-docker2
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y docker-ce nvidia-docker2
sudo systemctl restart docker
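  • Enable NVIDIA MPS:

    • Configure the GPU for the NVIDIA Multi-Process Service (MPS) for better performance.
    # Set the GPU compute mode to EXCLUSIVE_PROCESS
    sudo nvidia-smi --compute-mode=EXCLUSIVE_PROCESS
    # Cron entry that re-applies the compute mode on every boot
    echo "SHELL=/bin/bash" > /tmp/nvidia-mps-cronjob
    echo "PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin" >> /tmp/nvidia-mps-cronjob
    echo "@reboot   root    nvidia-smi --compute-mode=EXCLUSIVE_PROCESS" >> /tmp/nvidia-mps-cronjob
    sudo chown root:root /tmp/nvidia-mps-cronjob
    sudo mv /tmp/nvidia-mps-cronjob /etc/cron.d/
    # /tmp/nvidia-mps.service must already exist: a systemd unit that starts the MPS control daemon
    sudo chown root:root /tmp/nvidia-mps.service
    sudo mv /tmp/nvidia-mps.service /etc/systemd/system/
    sudo systemctl --now enable nvidia-mps.service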
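The commands above move a file named /tmp/nvidia-mps.service into place but never create it. The following is a minimal sketch of how that file could be written before running them. The unit contents are an assumption modeled on a typical service for the MPS control daemon (nvidia-cuda-mps-control run in the foreground with -f), and the UNIT_FILE variable is hypothetical, so adjust both to your setup.

```shell
#!/usr/bin/env bash
# Sketch: write the systemd unit that the chown/mv commands expect to find.
# The unit contents are an assumption, not taken from the post.
UNIT_FILE="${UNIT_FILE:-/tmp/nvidia-mps.service}"

cat > "$UNIT_FILE" <<'EOF'
[Unit]
Description=NVIDIA MPS control service
After=cron.service

[Service]
Restart=on-failure
ExecStart=/usr/bin/nvidia-cuda-mps-control -f

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $UNIT_FILE"
```

Run this before the chown and mv steps so that systemctl has a unit to enable.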

Configure Azure IoT Edge on the host:

  • Create an Azure IoT Hub instance:

    • Use the Azure CLI to create an instance of Azure IoT Hub.
sudo az login
sudo az account set --subscription "<name or ID of Azure Subscription>"
sudo az group create --name "<resource-group-name>" --location "<your-region>"
# The hub name given here must match the --hub-name used in the following commands
sudo az iot hub create --name "<iothub-name>" --sku S1 --resource-group "<resource-group-name>"
sudo az iot hub device-identity create --hub-name "<iothub-name>" --device-id "<device-name>" --edge-enabled
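  • Install Azure IoT Edge:

    • Download and install Azure IoT Edge. The command below pins the 1.1 release line.
    # Fetch the Microsoft package source list that the next command copies into place
    curl https://packages.microsoft.com/config/ubuntu/18.04/multiarch/prod.list > ./microsoft-prod.list
    sudo cp ./microsoft-prod.list /etc/apt/sources.list.d/
    curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
    sudo cp ./microsoft.gpg /etc/apt/trusted.gpg.d/
    sudo apt-get update
    sudo apt-get install iotedge=1.1* libiothsm-std=1.1*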
  • Register the IoT Edge device:

    • Get the connection string from the IoT Edge device created earlier and update the configuration file.
    • Restart the IoT Edge service.
sudo az iot hub device-identity connection-string show --device-id "<device-name>" --hub-name "<iothub-name>"
sudo nano /etc/iotedge/config.yaml  # Replace ADD DEVICE CONNECTION STRING HERE with the connection string
sudo systemctl restart iotedge
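If you would rather not edit the file by hand, the placeholder can be replaced with sed. The following is a minimal sketch: the function name set_connection_string is hypothetical, the placeholder text is assumed to be exactly <ADD DEVICE CONNECTION STRING HERE>, and the demo runs against a temporary file with a made-up connection string instead of the real /etc/iotedge/config.yaml.

```shell
#!/usr/bin/env bash
# set_connection_string FILE CONNECTION_STRING   (hypothetical helper)
# Replaces the placeholder <ADD DEVICE CONNECTION STRING HERE> in FILE.
set_connection_string() {
  local file="$1" conn="$2" escaped
  # Escape characters that are special in a sed replacement (&, |, backslash).
  escaped=$(printf '%s' "$conn" | sed 's/[&|\\]/\\&/g')
  # Use '|' as the sed delimiter because connection strings contain '/' and '='.
  sed -i "s|<ADD DEVICE CONNECTION STRING HERE>|${escaped}|" "$file"
}

# Demo on a temporary file with a made-up connection string.
demo=$(mktemp)
printf 'provisioning:\n  source: "manual"\n  device_connection_string: "<ADD DEVICE CONNECTION STRING HERE>"\n' > "$demo"
set_connection_string "$demo" "HostName=example-hub.azure-devices.net;DeviceId=example-device;SharedAccessKey=abc/def+ghi="
cat "$demo"
```

On the real machine the target would be /etc/iotedge/config.yaml and the function would need root privileges. The connection string itself can probably be captured by appending --query connectionString -o tsv to the az command above, assuming its JSON output exposes a connectionString field.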

  • Deploy the container:

    • Use the Azure CLI to deploy the container as an IoT Edge module on the host.
    sudo az login
    sudo az extension add --name azure-iot
    sudo az iot edge set-modules --hub-name "<iothub-name>" --device-id "<device-name>" --content DeploymentManifest.json --subscription "<name or ID of Azure Subscription>"
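After redeploying, it helps to confirm whether the CUDA buffer errors from the question still appear in the module log. The following is a minimal sketch: count_cuda_errors is a hypothetical helper that counts log lines matching the three messages quoted in the question, and the sample log below exists only to demonstrate it.

```shell
#!/usr/bin/env bash
# count_cuda_errors (hypothetical helper): reads log text on stdin and prints
# the number of lines that match the CUDA buffer errors quoted in the question.
count_cuda_errors() {
  # grep -c exits non-zero when nothing matches; '|| true' keeps the status clean.
  grep -c -E 'Failed to allocate shared buffer|cudaIpcGetMemHandle failed|Cannot create cuda shared buffer' || true
}

# Demo with sample lines shaped like the log excerpts in the question.
sample_log='<warning> Error: Failed to allocate shared buffer. Skipping frame.
<warning> Failed to get CUDA handle: cudaIpcGetMemHandle failed with error 2
<error> Cannot create cuda shared buffer. Size: 6684672
<info> unrelated line'

printf '%s\n' "$sample_log" | count_cuda_errors
```

On the device, the input would come from the module log, for example sudo iotedge logs "<module-name>" --tail 200 2>&1 | count_cuda_errors, where <module-name> is a placeholder for the Spatial Analysis module name in your deployment manifest. A count of 0 over a recent window suggests the allocation problem is gone.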
