Problem running the Azure Spatial Analysis container

Question  Votes: 0  Answers: 1

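I need to create and run an Azure Spatial Analysis container on my desktop machine using WSL. I followed this tutorial: https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/spatial-analysis-container?tabs=desktop-machine. According to IoT Hub, everything should be running fine. However, I get no output, and when I look at the logs of the spatial analysis module, I frequently see this error: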

2024-03-06T19:46:45.429562642Z <warning> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Error: Failed to allocate shared buffer. Skipping frame. 

2024-03-06T19:46:45.501593175Z <warning> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Failed to get CUDA handle: cudaIpcGetMemHandle failed with error 2 

2024-03-06T19:46:45.502484545Z <error> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Cannot create cuda shared buffer. Size: 6684672

I don't know what to do. Here is also the report from nvidia-smi:


| NVIDIA-SMI 530.30.02              Driver Version: 527.99       CUDA Version: 12.0     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 L...    On | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P8               16W / 115W|   2609MiB /  6144MiB |     28%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

I tried restarting the modules and the whole IoT Edge runtime. I also checked connectivity to IoT Hub, and that appears to be fine as well.

Thanks in advance for your help!
azure computer-vision iot azure-cognitive-services azure-iot-edge
1 Answer
0 votes

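The error points to GPU memory: the module fails to allocate a shared CUDA buffer of about 6.7 MB (6,684,672 bytes in the log), and CUDA runtime error code 2 is cudaErrorMemoryAllocation. If your NVIDIA GeForce RTX 3060 has limited free memory, allocating this buffer can fail. Make sure the system has enough free GPU memory and meets the Spatial Analysis container requirements.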
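To put a number on that, the following sketch converts the buffer size from the log into MiB and, if `nvidia-smi` is on the PATH, prints the current GPU memory usage and the compute processes holding it. It only reads information and changes nothing. Under WSL the per-process list may be incomplete, so treat an empty list with caution.

```shell
# Buffer size reported in the log line "Cannot create cuda shared buffer".
buffer_bytes=6684672

# Round up to whole MiB (1 MiB = 1048576 bytes).
buffer_mib=$(( (buffer_bytes + 1048575) / 1048576 ))
echo "Shared buffer size: ${buffer_bytes} bytes (about ${buffer_mib} MiB)"

if command -v nvidia-smi >/dev/null 2>&1; then
  # Overall memory usage per GPU.
  nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader
  # Compute processes currently holding GPU memory.
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
else
  echo "nvidia-smi not found on PATH; skipping GPU query"
fi
```

For comparison, the nvidia-smi snapshot in the question shows 2609 MiB of 6144 MiB in use.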

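Here is a step-by-step guide to installing and running the Spatial Analysis container:

Install the NVIDIA CUDA Toolkit and NVIDIA graphics drivers:

  • Run the following bash script to install the required drivers and the CUDA Toolkit.
  • Reboot the machine after installation.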
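# Download the repository pin file that the next command moves into place
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Trust NVIDIA's signing key and add the CUDA repository (Ubuntu 18.04)
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
# Reboot so the new driver is loaded
sudo reboot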

Install Docker CE and nvidia-docker2:

  • Install the Docker CE and nvidia-docker2 packages.
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y docker-ce nvidia-docker2
sudo systemctl restart docker
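
After the packages are installed, it can help to confirm that Docker actually registered the NVIDIA runtime. The sketch below defines a small helper, `has_nvidia_runtime` (a hypothetical name, not part of any tool), that inspects the output of `docker info`. It assumes the current user can talk to the Docker daemon; otherwise prefix the `docker` call with `sudo`.

```shell
# Succeeds if `docker info` lists a runtime named "nvidia".
has_nvidia_runtime() {
  docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}

if has_nvidia_runtime; then
  echo "NVIDIA runtime is registered with Docker"
else
  echo "NVIDIA runtime not found; check the nvidia-docker2 installation"
fi
```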
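  • Enable NVIDIA MPS:

    • Configure the GPU for NVIDIA Multi-Process Service (MPS) for better performance.
    # Set the GPU compute mode to EXCLUSIVE_PROCESS
    sudo nvidia-smi --compute-mode=EXCLUSIVE_PROCESS

    # Cron job that reapplies the compute mode on every boot
    echo "SHELL=/bin/bash" > /tmp/nvidia-mps-cronjob
    echo "PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin" >> /tmp/nvidia-mps-cronjob
    echo "@reboot   root    nvidia-smi --compute-mode=EXCLUSIVE_PROCESS" >> /tmp/nvidia-mps-cronjob
    sudo chown root:root /tmp/nvidia-mps-cronjob
    sudo mv /tmp/nvidia-mps-cronjob /etc/cron.d/

    # Install and start the MPS service. This assumes /tmp/nvidia-mps.service
    # (a systemd unit that runs the MPS control daemon) already exists.
    sudo chown root:root /tmp/nvidia-mps.service
    sudo mv /tmp/nvidia-mps.service /etc/systemd/system/
    sudo systemctl --now enable nvidia-mps.service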
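
The commands above move /tmp/nvidia-mps.service into place, but nothing shown creates that file. As a minimal sketch, it could be written as below before running the `chown` and `mv` steps. The unit contents are an assumption modeled on a typical systemd service for the MPS control daemon (`nvidia-cuda-mps-control -f` keeps it in the foreground), and the binary path `/usr/bin/nvidia-cuda-mps-control` is also assumed. Check the linked tutorial for the exact file it recommends.

```shell
# Sketch: write a systemd unit for the NVIDIA MPS control daemon.
# The unit contents and the binary path are assumptions, not taken from the answer.
unit_file=/tmp/nvidia-mps.service

cat > "$unit_file" <<'EOF'
[Unit]
Description=NVIDIA MPS control service
After=cron.service

[Service]
Restart=on-failure
ExecStart=/usr/bin/nvidia-cuda-mps-control -f

[Install]
WantedBy=multi-user.target
EOF

echo "Wrote $unit_file"
```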

Configure Azure IoT Edge on the host:

  • Create an Azure IoT Hub instance:

    • Use the Azure CLI to create an instance of Azure IoT Hub.
sudo az login
sudo az account set --subscription "<name or ID of Azure Subscription>"
sudo az group create --name "<resource-group-name>" --location "<your-region>"
sudo az iot hub create --name "<iothub-name>" --sku S1 --resource-group "<resource-group-name>"
sudo az iot hub device-identity create --hub-name "<iothub-name>" --device-id "<device-name>" --edge-enabled
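  • Install Azure IoT Edge:

    • Download and install Azure IoT Edge (the commands below pin the 1.1 packages).
    # Register the Microsoft package repository (Ubuntu 18.04 feed, matching the CUDA repository used earlier)
    curl https://packages.microsoft.com/config/ubuntu/18.04/multiarch/prod.list > ./microsoft-prod.list
    sudo cp ./microsoft-prod.list /etc/apt/sources.list.d/
    # Install the Microsoft GPG public key
    curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
    sudo cp ./microsoft.gpg /etc/apt/trusted.gpg.d/
    sudo apt-get update
    sudo apt-get install iotedge=1.1* libiothsm-std=1.1*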
    
  • Register the IoT Edge device:

    • Get the connection string from the IoT Edge device created earlier and update the configuration file.
    • Restart the IoT Edge service.
sudo az iot hub device-identity connection-string show --device-id "<device-name>" --hub-name "<iothub-name>"
sudo nano /etc/iotedge/config.yaml  # Replace ADD DEVICE CONNECTION STRING HERE with the connection string
sudo systemctl restart iotedge
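
One thing worth verifying at this step is that the placeholder in the config file was really replaced. The sketch below uses a hypothetical helper, `placeholder_present`, to look for the placeholder text mentioned above, and then runs the `iotedge` CLI's `check` and `list` commands if the CLI is installed. The config path is the one used in the commands above.

```shell
# Succeeds if the given config file still contains the placeholder text.
placeholder_present() {
  grep -q 'ADD DEVICE CONNECTION STRING HERE' "$1"
}

config=/etc/iotedge/config.yaml
if [ ! -r "$config" ]; then
  echo "Cannot read $config (try running as root)"
elif placeholder_present "$config"; then
  echo "Connection string placeholder is still present in $config"
else
  echo "Placeholder has been replaced in $config"
fi

if command -v iotedge >/dev/null 2>&1; then
  sudo iotedge check   # configuration and connectivity checks
  sudo iotedge list    # modules currently running on the device
fi
```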


Deploy the Spatial Analysis container:

  • Deploy the container:

    • Use the Azure CLI to deploy the container as an IoT Edge module on the host.
    sudo az login
    sudo az extension add --name azure-iot
    sudo az iot edge set-modules --hub-name "<iothub-name>" --device-id "<device-name>" --content DeploymentManifest.json --subscription "<name or ID of Azure Subscription>"
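
Once the module is deployed, the errors from the question can be counted directly in the module's logs. In this sketch, `count_buffer_errors` is a hypothetical helper that reads log text on standard input, and the module name is a placeholder to replace with the name from your DeploymentManifest.json. It assumes the module runs as a Docker container with that same name.

```shell
# Count log lines that report the shared-buffer allocation failure.
count_buffer_errors() {
  grep -c 'Failed to allocate shared buffer'
}

# Placeholder: replace with the module name from DeploymentManifest.json.
module_name="<spatial-analysis-module-name>"

if command -v docker >/dev/null 2>&1; then
  # docker logs sends the container's stderr to stderr, so merge both streams.
  sudo docker logs --tail 500 "$module_name" 2>&1 | count_buffer_errors
fi
```

A count of 0 over a recent window of log lines suggests the allocation failure is no longer occurring.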