LiDAR depth + Vision hand tracking for 3D hand tracking


I want to use Vision's 2D hand tracking input, combined with ARKit > People Occlusion > Body Segmentation With Depth (which leverages LiDAR), to get the 3D world coordinates of the tip of the index finger.

The steps I'm taking:

1 - The 2D screen position of the index fingertip provided by Vision works (a rough sketch of this step follows the list)

2 - The depth data from the CVPixelBuffer also seems correct

3 - The unprojection from 2D screen coordinates + depth data to 3D world coordinates is wrong
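
For step 1, here is a minimal sketch of how the index fingertip can be read with Vision. The VNDetectHumanHandPoseRequest API is standard Vision, but the orientation value and the confidence threshold are assumptions about this particular setup:

import Vision
import CoreGraphics

// Minimal sketch (assumed setup): run a hand-pose request on the ARKit camera image
// and read the normalized index-fingertip location in Vision coordinates.
func indexTipLocation(in pixelBuffer: CVPixelBuffer) -> CGPoint? {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    // capturedImage is in its native landscape orientation; .right is a common choice
    // for a portrait UI, but this is an assumption - verify against your own session.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: [:])
    do {
        try handler.perform([request])
        guard let observation = request.results?.first else { return nil }
        let indexTip = try observation.recognizedPoint(.indexTip)
        guard indexTip.confidence > 0.3 else { return nil }
        return indexTip.location   // normalized, (0,0) at the bottom-left
    } catch {
        return nil
    }
}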

Ideally, I would get a result similar to Josh Caspersz's LiDAR Lab app.

Here is my code, which turns the 2D point coordinates + depth into a 3D world coordinate:

// Result from Vision framework
// Coordinates top right of the screen with Y to the left, X down
indexTip = CGPoint(x:(indexTipPoint.location.x) * CGFloat(arView.bounds.width),
                           y:(1 - indexTipPoint.location.y) * CGFloat(arView.bounds.height))
        
if let segmentationBuffer:CVPixelBuffer = frame.estimatedDepthData {

      let segmentationWidth = CVPixelBufferGetWidth(segmentationBuffer)
      let segmentationHeight = CVPixelBufferGetHeight(segmentationBuffer)
            
      let xConverted:CGFloat = indexTip.x * CGFloat(segmentationWidth) / CGFloat(arView.bounds.width)
      let yConverted:CGFloat = indexTip.y * CGFloat(segmentationHeight) / CGFloat(arView.bounds.height)

      if let indexDepth:Float = segmentationBuffer.value(column: Int(xConverted), row: Int(yConverted)) {

           if indexDepth != 0 {
                 let cameraIntrinsics = frame.camera.intrinsics

                 var xrw: Float = (Float(indexTip.x) - cameraIntrinsics[2][0]) * indexDepth
                 xrw = xrw / cameraIntrinsics[0][0]
                 var yrw: Float = (Float(indexTip.y) - cameraIntrinsics[2][1]) * indexDepth
                 yrw = yrw / cameraIntrinsics[1][1]
                 let xyzw: SIMD4<Float> = SIMD4<Float>(xrw, yrw, indexDepth, 1.0)
                 let vecResult = frame.camera.viewMatrix(for: .portrait) * xyzw
                    
                 resultAnchor.setPosition(SIMD3<Float>(vecResult.x, vecResult.y, vecResult.z), relativeTo: nil)
           }
      }
 }

Here is a video of what it looks like when running; the result always seems to end up in a particular region of space: video

The calculation is basically the one from the sample code Displaying a Point Cloud Using Scene Depth.
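
For reference, below is a rough Swift sketch of the unprojection that sample performs. The inverse-intrinsics step follows the sample; the Y/Z flip and the use of .landscapeRight (matching the capturedImage's native orientation) are assumptions, and the pixel coordinate is expected in capturedImage / depth-map pixel space rather than in screen points:

import ARKit
import UIKit
import simd

// Rough sketch (assumptions noted in comments): unproject a depth sample to world space.
// pixel is a coordinate in capturedImage pixel space, depth is in meters.
func unprojectToWorld(pixel: simd_float2, depth: Float, camera: ARCamera) -> simd_float3 {
    // Inverse intrinsics maps pixel coordinates onto the normalized image plane;
    // scaling by depth gives the point in camera coordinates.
    let localPoint = camera.intrinsics.inverse * simd_float3(pixel.x, pixel.y, 1) * depth

    // Camera-to-world is the *inverse* of the view matrix. The sample also flips the
    // Y and Z axes to go from image conventions to ARKit's camera space; that flip
    // (and the use of .landscapeRight, matching capturedImage) is assumed here.
    let flipYZ = simd_float4x4(diagonal: simd_float4(1, -1, -1, 1))
    let cameraToWorld = camera.viewMatrix(for: .landscapeRight).inverse * flipYZ

    let world = cameraToWorld * simd_float4(localPoint, 1)
    return simd_float3(world.x, world.y, world.z)
}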

Finally, here is the full zip file if you want to try it yourself: ZIP

Any idea what is wrong with my calculations?

swift computer-vision arkit lidar cvpixelbuffer
2 Answers


@oscar-falmer Yes, I wrote that answer on the Apple Developer Forums and made the body tracking package. I tried linking to them here as well, but someone came along and removed my links because they were nothing more than links. Here is the solution, copied here.

Vision results come in Vision coordinates: normalized, with (0,0) at the bottom-left and (1,1) at the top-right. AVFoundation coordinates have (0,0) at the top-left and (1,1) at the bottom-right. To convert Vision coordinates to AVFoundation coordinates, you flip the Y axis like this:

public extension CGPoint {
    func convertVisionToAVFoundation() -> CGPoint {
        return CGPoint(x: self.x, y: 1 - self.y)
    }
}
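
For example, applied to the fingertip point from the question (indexTipPoint is the recognized Vision point):

// indexTipPoint.location is the normalized Vision-space point from the hand-pose request.
let avFoundationPosition = indexTipPoint.location.convertVisionToAVFoundation()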

This AVFoundation coordinate is then used to index into the depth buffer, like this:

public extension CVPixelBuffer {

    /// The input point must be in normalized AVFoundation coordinates,
    /// i.e. (0,0) is the top-left and (1,1) the bottom-right.
    func value(from point: CGPoint) -> Float? {
        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        let colPosition = Int(point.x * CGFloat(width))
        let rowPosition = Int(point.y * CGFloat(height))
        return value(column: colPosition, row: rowPosition)
    }

    func value(column: Int, row: Int) -> Float? {
        guard CVPixelBufferGetPixelFormatType(self) == kCVPixelFormatType_DepthFloat32 else { return nil }
        CVPixelBufferLockBaseAddress(self, .readOnly)
        if let baseAddress = CVPixelBufferGetBaseAddress(self) {
            let width = CVPixelBufferGetWidth(self)
            let index = column + (row * width)
            let offset = index * MemoryLayout<Float>.stride
            let value = baseAddress.load(fromByteOffset: offset, as: Float.self)
            CVPixelBufferUnlockBaseAddress(self, .readOnly)
            return value
        }
        CVPixelBufferUnlockBaseAddress(self, .readOnly)
        return nil
    }
}
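
Putting the two pieces together, sampling the depth under the fingertip might look like the sketch below (frame is the current ARFrame; which depth map you read depends on the frame semantics discussed at the end of this answer):

// Sketch: sample the depth, in meters, under the fingertip.
let avFoundationPosition = indexTipPoint.location.convertVisionToAVFoundation()
if let depthMap = (frame.smoothedSceneDepth ?? frame.sceneDepth)?.depthMap,
   let depthAtPoint = depthMap.value(from: avFoundationPosition) {
    print("Fingertip depth: \(depthAtPoint) m")
}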

That is all you need to get the depth for a given position from a Vision request.

If you want a location on screen that you can use with UIKit or ARView.ray(through:), then further conversion is required. The Vision request is performed on arView.session.currentFrame.capturedImage. From the documentation for ARFrame.displayTransform(for:viewportSize:):

Normalized image coordinates range from (0,0) in the upper left corner of the image to (1,1) in the lower right corner. This method creates an affine transform representing the rotation and aspect-fit crop operations necessary to adapt the camera image to the specified orientation and to the aspect ratio of the specified viewport. The affine transform does not scale to the viewport's pixel size. The capturedImage pixel buffer is the original image captured by the device camera, and thus not adjusted for device orientation or view aspect ratio.

So the image rendered on screen is a cropped version of the frame the camera captures, and a conversion from AVFoundation coordinates to display (UIKit) coordinates is needed. To convert from AVFoundation coordinates to display (UIKit) coordinates:

public extension ARView {

    /// Convert from normalized AVFoundation coordinates ((0,0) top-left, (1,1) bottom-right)
    /// to screen-space coordinates.
    func convertAVFoundationToScreenSpace(_ point: CGPoint) -> CGPoint? {
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }
        let transform = arFrame.displayTransform(for: interfaceOrientation, viewportSize: frame.size)
        let normalizedCenter = point.applying(transform)
        let center = normalizedCenter.applying(CGAffineTransform.identity.scaledBy(x: frame.width, y: frame.height))
        return center
    }
}

And in the opposite direction, from UIKit display coordinates to AVFoundation coordinates:

public extension ARView {

    /// Convert from screen-space UIKit coordinates to normalized
    /// AVFoundation coordinates ((0,0) top-left, (1,1) bottom-right).
    func convertScreenSpaceToAVFoundation(_ point: CGPoint) -> CGPoint? {
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }
        let inverseScaleTransform = CGAffineTransform.identity.scaledBy(x: frame.width, y: frame.height).inverted()
        let invertedDisplayTransform = arFrame.displayTransform(for: interfaceOrientation, viewportSize: frame.size).inverted()
        let unScaledPoint = point.applying(inverseScaleTransform)
        let normalizedCenter = unScaledPoint.applying(invertedDisplayTransform)
        return normalizedCenter
    }
}

To get the world-space coordinate from a UIKit screen coordinate and the corresponding depth value:


    /// Get the world-space position from a UIKit screen point and a depth value
    /// - Parameters:
    ///   - screenPosition: A CGPoint representing a point on screen in UIKit coordinates.
    ///   - depth: The depth at this coordinate, in meters.
    /// - Returns: The position in world space of this coordinate at this depth.
    private func worldPosition(screenPosition: CGPoint, depth: Float) -> simd_float3? {
        guard
            let rayResult = arView.ray(through: screenPosition)
        else { return nil }
        // rayResult.direction is a normalized (1 meter long) vector pointing in the correct
        // direction, and we want to go the length of depth along this vector.
        let worldOffset = rayResult.direction * depth
        let worldPosition = rayResult.origin + worldOffset
        return worldPosition
    }

To set an entity's position in world space for a given point on screen:


    guard
        let currentFrame = arView.session.currentFrame,
        let sceneDepth = (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap,
        let depthAtPoint = sceneDepth.value(from: avFoundationPosition),
        let worldPosition = worldPosition(screenPosition: uiKitPosition, depth: depthAtPoint)
    else { return }

    trackedEntity.setPosition(worldPosition, relativeTo: nil)
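
Here avFoundationPosition is the converted Vision point from earlier, and uiKitPosition can be derived from it with the extension above, for example:

// Sketch: screen-space (UIKit) position for the same fingertip point.
guard let uiKitPosition = arView.convertAVFoundationToScreenSpace(avFoundationPosition) else { return }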

And don't forget to set the correct frame semantics on your ARConfiguration:

    func runNewConfig(){

        // Create a session configuration
        let configuration = ARWorldTrackingConfiguration()

        //Goes with (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap
        let frameSemantics: ARConfiguration.FrameSemantics = [.smoothedSceneDepth, .sceneDepth]

        //Goes with currentFrame.estimatedDepthData
        //let frameSemantics: ARConfiguration.FrameSemantics = .personSegmentationWithDepth


        if ARWorldTrackingConfiguration.supportsFrameSemantics(frameSemantics) {
            configuration.frameSemantics.insert(frameSemantics)
        }

        // Run the view's session

        session.run(configuration)
    }