I want to use Vision 2D hand-tracking input, combined with ARKit > People Occlusion > Body Segmentation With Depth (which uses LiDAR), to get the 3D world coordinates of the tip of the index finger.

Steps I am taking:

1 - The 2D screen position of the fingertip provided by Vision works
2 - The depth data from the CVPixelBuffer also appears to be correct
3 - The unprojection from 2D screen coordinates + depth data to 3D world coordinates is wrong

Ideally I would get a result similar to Josh Caspersz's LiDAR Lab app:

Here is my code, which processes the 2D point coordinates + depth into 3D world coordinates:
```swift
// Result from Vision framework
// Coordinates top right of the screen with Y to the left, X down
indexTip = CGPoint(x: (indexTipPoint.location.x) * CGFloat(arView.bounds.width),
                   y: (1 - indexTipPoint.location.y) * CGFloat(arView.bounds.height))

if let segmentationBuffer: CVPixelBuffer = frame.estimatedDepthData {

    let segmentationWidth = CVPixelBufferGetWidth(segmentationBuffer)
    let segmentationHeight = CVPixelBufferGetHeight(segmentationBuffer)

    let xConverted: CGFloat = indexTip.x * CGFloat(segmentationWidth) / CGFloat(arView.bounds.width)
    let yConverted: CGFloat = indexTip.y * CGFloat(segmentationHeight) / CGFloat(arView.bounds.height)

    if let indexDepth: Float = segmentationBuffer.value(column: Int(xConverted), row: Int(yConverted)) {

        if indexDepth != 0 {
            let cameraIntrinsics = frame.camera.intrinsics
            var xrw: Float = (Float(indexTip.x) - cameraIntrinsics[2][0]) * indexDepth
            xrw = xrw / cameraIntrinsics[0][0]
            var yrw: Float = (Float(indexTip.y) - cameraIntrinsics[2][1]) * indexDepth
            yrw = yrw / cameraIntrinsics[1][1]

            let xyzw: SIMD4<Float> = SIMD4<Float>(xrw, yrw, indexDepth, 1.0)
            let vecResult = frame.camera.viewMatrix(for: .portrait) * xyzw

            resultAnchor.setPosition(SIMD3<Float>(vecResult.x, vecResult.y, vecResult.z), relativeTo: nil)
        }
    }
}
```
Here is a video of what it looks like when running; the result always seems to end up in one particular region of space: video

The calculation is basically the one from the sample code Displaying a Point Cloud Using Scene Depth.

Finally, here is the complete zip file if you want to try it yourself: ZIP.

Any idea what is wrong with my calculations?
@oscar-falmer Yes, I wrote that answer on the Apple Developer Forums and made the Body Tracking package. I tried linking to them from here as well, but someone came along and deleted my links because they were "nothing more than links". Here is the solution, copied over.

Vision results come in Vision coordinates: normalized, with (0,0) at the bottom-left and (1,1) at the top-right. AVFoundation coordinates have (0,0) at the top-left and (1,1) at the bottom-right. To convert from Vision coordinates to AVFoundation coordinates, you flip the Y axis like this:
```swift
public extension CGPoint {
    func convertVisionToAVFoundation() -> CGPoint {
        return CGPoint(x: self.x, y: 1 - self.y)
    }
}
```
This AVFoundation coordinate is then used as the input for indexing into the depth buffer, like this:
```swift
public extension CVPixelBuffer {

    /// The input point must be in normalized AVFoundation coordinates, i.e. (0,0) is in the top-left, (1,1) in the bottom-right.
    func value(from point: CGPoint) -> Float? {
        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        let colPosition = Int(point.x * CGFloat(width))
        let rowPosition = Int(point.y * CGFloat(height))
        return value(column: colPosition, row: rowPosition)
    }

    func value(column: Int, row: Int) -> Float? {
        guard CVPixelBufferGetPixelFormatType(self) == kCVPixelFormatType_DepthFloat32 else { return nil }
        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        guard (0..<width).contains(column), (0..<height).contains(row) else { return nil }

        CVPixelBufferLockBaseAddress(self, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(self, .readOnly) }
        guard let baseAddress = CVPixelBufferGetBaseAddress(self) else { return nil }

        // Index by bytesPerRow rather than width * stride: rows can be padded.
        let bytesPerRow = CVPixelBufferGetBytesPerRow(self)
        let offset = row * bytesPerRow + column * MemoryLayout<Float>.stride
        return baseAddress.load(fromByteOffset: offset, as: Float.self)
    }
}
```
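Put together, sampling the depth under a Vision fingertip observation looks roughly like this. This is a sketch, not code from the package: `indexTipPoint` is assumed to come from your `VNDetectHumanHandPoseRequest` handler and `frame` from your session delegate.

```swift
// Sketch: sample scene depth under a Vision fingertip observation.
// Assumes `indexTipPoint: VNRecognizedPoint` and `frame: ARFrame` exist.
let visionPoint = indexTipPoint.location                  // normalized, (0,0) bottom-left
let avPoint = visionPoint.convertVisionToAVFoundation()   // normalized, (0,0) top-left

if let depthMap = (frame.smoothedSceneDepth ?? frame.sceneDepth)?.depthMap,
   let depthAtFingertip = depthMap.value(from: avPoint) {
    print("Fingertip depth: \(depthAtFingertip) m")
}
```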
That is all you need to get the depth for a given position from a Vision request.

If you want to find the position on screen for use with UIKit or ARView.ray(through:), further conversions are required.
The Vision request is performed on arView.session.currentFrame.capturedImage. From the documentation for ARFrame.displayTransform(for:viewportSize:):

Normalized image coordinates range from (0,0) in the upper left corner of the image to (1,1) in the lower right corner. This method creates an affine transform representing the rotation and aspect-fit crop operations necessary to adapt the camera image to the specified orientation and to the aspect ratio of the specified viewport. The affine transform does not scale to the viewport's pixel size. The capturedImage pixel buffer is the original image captured by the device camera, and thus not adjusted for device orientation or view aspect ratio.
So the image rendered on screen is a cropped version of the frame the camera captures, which means a conversion from AVFoundation coordinates to display (UIKit) coordinates is needed:
```swift
public extension ARView {
    func convertAVFoundationToScreenSpace(_ point: CGPoint) -> CGPoint? {
        // Convert from normalized AVFoundation coordinates (0,0 top-left, 1,1 bottom-right)
        // to screen-space coordinates.
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }

        let transform = arFrame.displayTransform(for: interfaceOrientation, viewportSize: frame.size)
        let normalizedCenter = point.applying(transform)
        let center = normalizedCenter.applying(CGAffineTransform.identity.scaledBy(x: frame.width, y: frame.height))
        return center
    }
}
```
And the opposite direction, from UIKit display coordinates to AVFoundation coordinates:
```swift
public extension ARView {
    func convertScreenSpaceToAVFoundation(_ point: CGPoint) -> CGPoint? {
        // Convert from screen-space UIKit coordinates
        // to normalized pixel coordinates (0,0 top-left, 1,1 bottom-right).
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }

        let inverseScaleTransform = CGAffineTransform.identity.scaledBy(x: frame.width, y: frame.height).inverted()
        let invertedDisplayTransform = arFrame.displayTransform(for: interfaceOrientation, viewportSize: frame.size).inverted()
        let unScaledPoint = point.applying(inverseScaleTransform)
        let normalizedCenter = unScaledPoint.applying(invertedDisplayTransform)
        return normalizedCenter
    }
}
```
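As a quick sanity check, the two conversions should be approximate inverses of each other. A sketch, assuming a running session and a live `arView`:

```swift
// Sketch: verify the two conversions round-trip.
let screenPoint = CGPoint(x: 100, y: 200)
if let avPoint = arView.convertScreenSpaceToAVFoundation(screenPoint),
   let backToScreen = arView.convertAVFoundationToScreenSpace(avPoint) {
    // backToScreen should be approximately (100, 200), modulo floating-point error.
    print(backToScreen)
}
```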
To get a world-space coordinate from a UIKit screen coordinate and the corresponding depth value:
```swift
/// Get the world-space position from a UIKit screen point and a depth value
/// - Parameters:
///   - screenPosition: A CGPoint representing a point on screen in UIKit coordinates.
///   - depth: The depth at this coordinate, in meters.
/// - Returns: The position in world space of this coordinate at this depth.
private func worldPosition(screenPosition: CGPoint, depth: Float) -> simd_float3? {
    guard
        let rayResult = arView.ray(through: screenPosition)
    else { return nil }

    // rayResult.direction is a normalized (1 meter long) vector pointing in the
    // correct direction, and we want to go the length of depth along this vector.
    let worldOffset = rayResult.direction * depth
    let worldPosition = rayResult.origin + worldOffset
    return worldPosition
}
```
To set the position of an entity in world space for a given point on screen:
```swift
guard
    let currentFrame = arView.session.currentFrame,
    let sceneDepth = (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap,
    let depthAtPoint = sceneDepth.value(from: avFoundationPosition),
    let worldPosition = worldPosition(screenPosition: uiKitPosition, depth: depthAtPoint)
else { return }

trackedEntity.setPosition(worldPosition, relativeTo: nil)
```
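All of the pieces above can be combined into one per-frame update, roughly like this. This is a sketch: `indexTipPoint`, `arView`, and `trackedEntity` are assumed to exist in your class, and the helper names are the ones defined earlier in this answer.

```swift
// Sketch: full pipeline from a Vision observation to a world-space entity position.
func updateTrackedEntity(with indexTipPoint: VNRecognizedPoint) {
    // Vision (0,0 bottom-left) -> AVFoundation (0,0 top-left)
    let avFoundationPosition = indexTipPoint.location.convertVisionToAVFoundation()
    guard
        let currentFrame = arView.session.currentFrame,
        let sceneDepth = (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap,
        // Depth is sampled in AVFoundation coordinates...
        let depthAtPoint = sceneDepth.value(from: avFoundationPosition),
        // ...but the ray is cast from a UIKit screen point.
        let uiKitPosition = arView.convertAVFoundationToScreenSpace(avFoundationPosition),
        let worldPosition = worldPosition(screenPosition: uiKitPosition, depth: depthAtPoint)
    else { return }
    trackedEntity.setPosition(worldPosition, relativeTo: nil)
}
```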
And don't forget to set the correct frame semantics on your ARConfiguration:
```swift
func runNewConfig() {
    // Create a session configuration
    let configuration = ARWorldTrackingConfiguration()

    // Goes with (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap
    let frameSemantics: ARConfiguration.FrameSemantics = [.smoothedSceneDepth, .sceneDepth]

    // Goes with currentFrame.estimatedDepthData
    //let frameSemantics: ARConfiguration.FrameSemantics = .personSegmentationWithDepth

    if ARWorldTrackingConfiguration.supportsFrameSemantics(frameSemantics) {
        configuration.frameSemantics.insert(frameSemantics)
    }

    // Run the view's session
    session.run(configuration)
}
```