LiDAR depth + Vision hand tracking for 3D hand tracking


I want to use Vision's 2D hand tracking input, combined with ARKit > People Occlusion > Body Segmentation With Depth (which leverages LiDAR), to get the 3D world coordinates of the tip of the index finger.

The steps I'm taking:

1 - The 2D screen position of the index fingertip provided by Vision works (a rough sketch of this step follows the list)

2 - The depth data from the CVPixelBuffer also seems correct

3 - The unprojection from 2D screen coordinates + depth data to 3D world coordinates is wrong
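
For step 1, here is a minimal sketch of how the index fingertip can be read with Vision. The VNDetectHumanHandPoseRequest API is standard Vision, but the orientation value and the confidence threshold are assumptions about this particular setup:

import Vision
import CoreGraphics

// Minimal sketch (assumed setup): run a hand-pose request on the ARKit camera image
// and read the normalized index-fingertip location in Vision coordinates.
func indexTipLocation(in pixelBuffer: CVPixelBuffer) -> CGPoint? {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    // capturedImage is in its native landscape orientation; .right is a common choice
    // for a portrait UI, but this is an assumption - verify against your own session.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: [:])
    do {
        try handler.perform([request])
        guard let observation = request.results?.first else { return nil }
        let indexTip = try observation.recognizedPoint(.indexTip)
        guard indexTip.confidence > 0.3 else { return nil }
        return indexTip.location   // normalized, (0,0) at the bottom-left
    } catch {
        return nil
    }
}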

Ideally, I would get a result similar to Josh Caspersz's LiDAR Lab app.

Here is my code, which turns the 2D point coordinates + depth into a 3D world coordinate:

// Result from Vision framework
// Coordinates top right of the screen with Y to the left, X down
indexTip = CGPoint(x:(indexTipPoint.location.x) * CGFloat(arView.bounds.width),
                           y:(1 - indexTipPoint.location.y) * CGFloat(arView.bounds.height))
        
if let segmentationBuffer:CVPixelBuffer = frame.estimatedDepthData {

      let segmentationWidth = CVPixelBufferGetWidth(segmentationBuffer)
      let segmentationHeight = CVPixelBufferGetHeight(segmentationBuffer)
            
      let xConverted:CGFloat = indexTip.x * CGFloat(segmentationWidth) / CGFloat(arView.bounds.width)
      let yConverted:CGFloat = indexTip.y * CGFloat(segmentationHeight) / CGFloat(arView.bounds.height)

      if let indexDepth:Float = segmentationBuffer.value(column: Int(xConverted), row: Int(yConverted)) {

           if indexDepth != 0 {
                 let cameraIntrinsics = frame.camera.intrinsics

                 var xrw: Float = (Float(indexTip.x) - cameraIntrinsics[2][0]) * indexDepth
                 xrw = xrw / cameraIntrinsics[0][0]
                 var yrw: Float = (Float(indexTip.y) - cameraIntrinsics[2][1]) * indexDepth
                 yrw = yrw / cameraIntrinsics[1][1]
                 let xyzw: SIMD4<Float> = SIMD4<Float>(xrw, yrw, indexDepth, 1.0)
                 let vecResult = frame.camera.viewMatrix(for: .portrait) * xyzw
                    
                 resultAnchor.setPosition(SIMD3<Float>(vecResult.x, vecResult.y, vecResult.z), relativeTo: nil)
           }
      }
 }

Here is a video of what it looks like when running; the result always seems to end up in a particular region of space: video

The calculation is basically the one from the sample code Displaying a Point Cloud Using Scene Depth.
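
For reference, below is a rough Swift sketch of the unprojection that sample performs. The inverse-intrinsics step follows the sample; the Y/Z flip and the use of .landscapeRight (matching the capturedImage's native orientation) are assumptions, and the pixel coordinate is expected in capturedImage / depth-map pixel space rather than in screen points:

import ARKit
import UIKit
import simd

// Rough sketch (assumptions noted in comments): unproject a depth sample to world space.
// pixel is a coordinate in capturedImage pixel space, depth is in meters.
func unprojectToWorld(pixel: simd_float2, depth: Float, camera: ARCamera) -> simd_float3 {
    // Inverse intrinsics maps pixel coordinates onto the normalized image plane;
    // scaling by depth gives the point in camera coordinates.
    let localPoint = camera.intrinsics.inverse * simd_float3(pixel.x, pixel.y, 1) * depth

    // Camera-to-world is the *inverse* of the view matrix. The sample also flips the
    // Y and Z axes to go from image conventions to ARKit's camera space; that flip
    // (and the use of .landscapeRight, matching capturedImage) is assumed here.
    let flipYZ = simd_float4x4(diagonal: simd_float4(1, -1, -1, 1))
    let cameraToWorld = camera.viewMatrix(for: .landscapeRight).inverse * flipYZ

    let world = cameraToWorld * simd_float4(localPoint, 1)
    return simd_float3(world.x, world.y, world.z)
}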

Finally, here is the full zip file if you want to try it yourself: ZIP

Any idea what is wrong with my calculations?

swift computer-vision arkit lidar cvpixelbuffer
2 Answers


@oscar-falmer Yes, I wrote that answer on the Apple Developer Forums and made the body tracking package. I tried linking to them here as well, but someone came along and removed my links because they were nothing more than links. Here is the solution, copied here.

Vision results come in Vision coordinates: normalized, with (0,0) at the bottom-left and (1,1) at the top-right. AVFoundation coordinates have (0,0) at the top-left and (1,1) at the bottom-right. To convert Vision coordinates to AVFoundation coordinates, you flip the Y axis like this:

public extension CGPoint {
    func convertVisionToAVFoundation() -> CGPoint {
        return CGPoint(x: self.x, y: 1 - self.y)
    }
}
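
For example, applied to the fingertip point from the question (indexTipPoint is the recognized Vision point):

// indexTipPoint.location is the normalized Vision-space point from the hand-pose request.
let avFoundationPosition = indexTipPoint.location.convertVisionToAVFoundation()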

This AVFoundation coordinate is then used to index into the depth buffer, like this:

public extension CVPixelBuffer {

    /// The input point must be in normalized AVFoundation coordinates,
    /// i.e. (0,0) is the top-left and (1,1) the bottom-right.
    func value(from point: CGPoint) -> Float? {
        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        let colPosition = Int(point.x * CGFloat(width))
        let rowPosition = Int(point.y * CGFloat(height))
        return value(column: colPosition, row: rowPosition)
    }

    func value(column: Int, row: Int) -> Float? {
        guard CVPixelBufferGetPixelFormatType(self) == kCVPixelFormatType_DepthFloat32 else { return nil }
        CVPixelBufferLockBaseAddress(self, .readOnly)
        if let baseAddress = CVPixelBufferGetBaseAddress(self) {
            let width = CVPixelBufferGetWidth(self)
            let index = column + (row * width)
            let offset = index * MemoryLayout<Float>.stride
            let value = baseAddress.load(fromByteOffset: offset, as: Float.self)
            CVPixelBufferUnlockBaseAddress(self, .readOnly)
            return value
        }
        CVPixelBufferUnlockBaseAddress(self, .readOnly)
        return nil
    }
}
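
Putting the two pieces together, sampling the depth under the fingertip might look like the sketch below (frame is the current ARFrame; which depth map you read depends on the frame semantics discussed at the end of this answer):

// Sketch: sample the depth, in meters, under the fingertip.
let avFoundationPosition = indexTipPoint.location.convertVisionToAVFoundation()
if let depthMap = (frame.smoothedSceneDepth ?? frame.sceneDepth)?.depthMap,
   let depthAtPoint = depthMap.value(from: avFoundationPosition) {
    print("Fingertip depth: \(depthAtPoint) m")
}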

That is all you need to get the depth for a given position from a Vision request.

If you want a location on screen that you can use with UIKit or ARView.ray(through:), then further conversion is required. The Vision request is performed on arView.session.currentFrame.capturedImage. From the documentation for ARFrame.displayTransform(for:viewportSize:):

Normalized image coordinates range from (0,0) in the upper left corner of the image to (1,1) in the lower right corner. This method creates an affine transform representing the rotation and aspect-fit crop operations necessary to adapt the camera image to the specified orientation and to the aspect ratio of the specified viewport. The affine transform does not scale to the viewport's pixel size. The capturedImage pixel buffer is the original image captured by the device camera, and thus not adjusted for device orientation or view aspect ratio.

So the image rendered on screen is a cropped version of the frame the camera captures, and a conversion from AVFoundation coordinates to display (UIKit) coordinates is needed. To convert from AVFoundation coordinates to display (UIKit) coordinates:

public extension ARView {

    /// Convert from normalized AVFoundation coordinates ((0,0) top-left, (1,1) bottom-right)
    /// to screen-space coordinates.
    func convertAVFoundationToScreenSpace(_ point: CGPoint) -> CGPoint? {
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }
        let transform = arFrame.displayTransform(for: interfaceOrientation, viewportSize: frame.size)
        let normalizedCenter = point.applying(transform)
        let center = normalizedCenter.applying(CGAffineTransform.identity.scaledBy(x: frame.width, y: frame.height))
        return center
    }
}

And in the opposite direction, from UIKit display coordinates to AVFoundation coordinates:

public extension ARView {

    /// Convert from screen-space UIKit coordinates to normalized
    /// AVFoundation coordinates ((0,0) top-left, (1,1) bottom-right).
    func convertScreenSpaceToAVFoundation(_ point: CGPoint) -> CGPoint? {
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }
        let inverseScaleTransform = CGAffineTransform.identity.scaledBy(x: frame.width, y: frame.height).inverted()
        let invertedDisplayTransform = arFrame.displayTransform(for: interfaceOrientation, viewportSize: frame.size).inverted()
        let unScaledPoint = point.applying(inverseScaleTransform)
        let normalizedCenter = unScaledPoint.applying(invertedDisplayTransform)
        return normalizedCenter
    }
}

To get the world-space coordinate from a UIKit screen coordinate and the corresponding depth value:


    /// Get the world-space position from a UIKit screen point and a depth value
    /// - Parameters:
    ///   - screenPosition: A CGPoint representing a point on screen in UIKit coordinates.
    ///   - depth: The depth at this coordinate, in meters.
    /// - Returns: The position in world space of this coordinate at this depth.
    private func worldPosition(screenPosition: CGPoint, depth: Float) -> simd_float3? {
        guard
            let rayResult = arView.ray(through: screenPosition)
        else { return nil }
        // rayResult.direction is a normalized (1 meter long) vector pointing in the correct
        // direction, and we want to go the length of depth along this vector.
        let worldOffset = rayResult.direction * depth
        let worldPosition = rayResult.origin + worldOffset
        return worldPosition
    }

To set an entity's position in world space for a given point on screen:


    guard
        let currentFrame = arView.session.currentFrame,
        let sceneDepth = (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap,
        let depthAtPoint = sceneDepth.value(from: avFoundationPosition),
        let worldPosition = worldPosition(screenPosition: uiKitPosition, depth: depthAtPoint)
    else { return }

    trackedEntity.setPosition(worldPosition, relativeTo: nil)
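
Here avFoundationPosition is the converted Vision point from earlier, and uiKitPosition can be derived from it with the extension above, for example:

// Sketch: screen-space (UIKit) position for the same fingertip point.
guard let uiKitPosition = arView.convertAVFoundationToScreenSpace(avFoundationPosition) else { return }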

And don't forget to set the correct frame semantics on your ARConfiguration:

    func runNewConfig(){

        // Create a session configuration
        let configuration = ARWorldTrackingConfiguration()

        //Goes with (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap
        let frameSemantics: ARConfiguration.FrameSemantics = [.smoothedSceneDepth, .sceneDepth]

        //Goes with currentFrame.estimatedDepthData
        //let frameSemantics: ARConfiguration.FrameSemantics = .personSegmentationWithDepth


        if ARWorldTrackingConfiguration.supportsFrameSemantics(frameSemantics) {
            configuration.frameSemantics.insert(frameSemantics)
        }

        // Run the view's session

        session.run(configuration)
    }