Does indexed instancing in the D3D12Bundles sample actually improve performance?

First question

The code snippet in the documentation calls DrawIndexedInstanced inside a for loop:

for (UINT i = 0; i < m_cityRowCount; i++) {
    for (UINT j = 0; j < m_cityColumnCount; j++) {
        pCommandList->DrawIndexedInstanced(numIndices, 1, 0, 0, 0);
    }
}

However, the API signatures

void DrawIndexedInstanced(
  [in] UINT IndexCountPerInstance,
  [in] UINT InstanceCount,
  [in] UINT StartIndexLocation,
  [in] INT  BaseVertexLocation,
  [in] UINT StartInstanceLocation
);
void DrawInstanced(
  [in] UINT VertexCountPerInstance,
  [in] UINT InstanceCount,
  [in] UINT StartVertexLocation,
  [in] UINT StartInstanceLocation
);

have StartInstanceLocation and InstanceCount parameters, which I assume are affected by an InstanceIndex*StartInstanceLocation offset.

So are the following equivalent?

DrawIndexedInstanced(100, 2, 0,   0, 100);
//vs
DrawIndexedInstanced(100, 1,   0, 0,   0);
DrawIndexedInstanced(100, 1, 100, 0,   0);

DrawInstanced(100, 2,   0, 100);
//vs
DrawInstanced(100, 1,   0,   0);
DrawInstanced(100, 1, 100,   0);

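Second question

How does instancing improve the performance of the D3D12Bundles sample that the documentation references? It calls SetPipelineState between every instance. The constant buffer that supplies g_mWorldViewProj to the vertex shader also changes for every instance. How can anything be reused?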

for (UINT i = 0; i < m_cityRowCount; i++) {
    for (UINT j = 0; j < m_cityColumnCount; j++) {
        // Alternate which PSO to use; the pixel shader is different on 
        // each just as a PSO setting demonstration.
        pCommandList->SetPipelineState(usePso1 ? pPso1 : pPso2);
        usePso1 = !usePso1;

        // Set this city's CBV table and move to the next descriptor.
        pCommandList->SetGraphicsRootDescriptorTable(2, cbvSrvHandle);
        cbvSrvHandle.Offset(cbvSrvDescriptorSize);

        pCommandList->DrawIndexedInstanced(numIndices, 1, 0, 0, 0);
    }
}

1 answer

The canonical sample for instancing is InstancingFX11 (rather than D3D12Bundles, which the docs for DrawIndexedInstanced() link to).

The author of the InstancingFX11 sample wrote the following about how to use instancing correctly:

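Pay attention to the code in Instancing.cpp that defines the buffers, because it essentially implements two vertex buffers: one for the geometry and one for the instance data (matrices in this case). Adding the second buffer is like adding another for loop around the draw call (but more efficient).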

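Your instancing example only discusses adding the instanceid system variable. Instancing requires a second vertex buffer, attached to the draw context, that contains the unique data such as the world translation matrix. You then update your input signature with the definition of the second buffer, and declare in the HLSL code that the shader will also receive instance data. Your example is the single-buffer version, where you use a constant buffer and the instance id to look into it. That is less efficient.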

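Looking the data up in the vertex shader means that the data cannot be inlined by the driver. Any pre-caching/setup of the GPU wavefronts is wasted. For every vertex you access, the hardware now looks up the relevant array entry, instead of the data being loaded once by the hardware and passed in as a parameter to the vertex shader.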

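The key to performance is that the first vertex buffer (containing the actual vertices) stays the same for every instance, while only the second vertex buffer (containing the different world translation matrices) is stepped through between instances.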
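To make the "same vertex stream, stepped instance stream" idea concrete, here is a minimal CPU-side sketch of which buffer slots a draw reads. It is only a model, not D3D12 code: ModelDraw, OneInstancedDraw, LoopOfDraws and Fetch are hypothetical names, BaseVertexLocation and the instance id system value are left out, and the instance buffer is assumed to advance by one element per instance.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// (index-buffer slot, instance-data slot) read for one vertex shader invocation.
using Fetch = std::pair<std::uint32_t, std::uint32_t>;

// Hypothetical CPU-side model of the slots one DrawIndexedInstanced call reads.
// Assumption: the instance-rate buffer steps by one element per instance.
void ModelDraw(std::vector<Fetch>& out,
               std::uint32_t indexCountPerInstance,
               std::uint32_t instanceCount,
               std::uint32_t startIndexLocation,
               std::uint32_t startInstanceLocation)
{
    for (std::uint32_t instance = 0; instance < instanceCount; ++instance) {
        for (std::uint32_t k = 0; k < indexCountPerInstance; ++k) {
            // Geometry stream: restarts at the same slot for every instance.
            const std::uint32_t indexSlot = startIndexLocation + k;
            // Instance stream: moves forward once per instance.
            const std::uint32_t instanceSlot = startInstanceLocation + instance;
            out.emplace_back(indexSlot, instanceSlot);
        }
    }
}

// A single draw that covers every instance.
std::vector<Fetch> OneInstancedDraw(std::uint32_t indexCount,
                                    std::uint32_t instanceCount)
{
    std::vector<Fetch> out;
    ModelDraw(out, indexCount, instanceCount, 0, 0);
    return out;
}

// A for loop of single-instance draws, each advancing startInstanceLocation.
std::vector<Fetch> LoopOfDraws(std::uint32_t indexCount,
                               std::uint32_t instanceCount)
{
    std::vector<Fetch> out;
    for (std::uint32_t i = 0; i < instanceCount; ++i) {
        ModelDraw(out, indexCount, 1, 0, i);
    }
    return out;
}
```

Under these assumptions, OneInstancedDraw(n, m) and LoopOfDraws(n, m) read exactly the same slots in the same order, which is the "extra for loop around the draw call" described above, minus the per-draw overhead. Advancing startIndexLocation instead of startInstanceLocation reads different index-buffer slots with the same instance data, so in this model it is not the same draw.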
