OpenGL to FFmpeg encoding

Question

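I have an OpenGL buffer that I need to forward directly to ffmpeg for nvenc-based h264 encoding.

My current approach is to call glReadPixels to get the pixels out of the framebuffer and then pass that pointer into ffmpeg, so that it can encode the frame into H264 packets for RTSP. However, this is bad because I have to copy the bytes from GPU memory to CPU memory, only to copy them back to the GPU for encoding.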
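For a rough sense of scale, the cost of that round trip can be estimated with a little arithmetic. The sketch below is only a back-of-the-envelope calculation; the function name and the example frame size, pixel size, and frame rate are assumptions for illustration, not figures from the question.

```cpp
#include <cassert>
#include <cstdint>

// Bytes moved across the bus per second when every frame is read back to
// the CPU (glReadPixels) and then uploaded again for the encoder.
// All parameters are illustrative assumptions.
std::uint64_t roundTripBytesPerSecond(std::uint32_t width, std::uint32_t height,
                                      std::uint32_t bytesPerPixel, std::uint32_t fps)
{
    assert(bytesPerPixel > 0);
    const std::uint64_t frameBytes =
        static_cast<std::uint64_t>(width) * height * bytesPerPixel;
    // One copy GPU -> CPU plus one copy CPU -> GPU per frame.
    return frameBytes * 2 * fps;
}
```

For an assumed 1920x1080 stream at 4 bytes per pixel and 60 fps, `roundTripBytesPerSecond(1920, 1080, 4, 60)` gives 995,328,000 bytes per second, i.e. close to 1 GB/s of traffic that a GPU-only path would avoid.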

Answer 1

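If you compare the date of the question with the date of this answer, you will notice that I spent a lot of time on this. (It was my full-time job for the past 4 weeks.)

Since I had such a hard time getting this to work, I will write up a short guide that will hopefully help whoever finds this.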

Synopsis

The basic flow I have is: OGL framebuffer object color attachment (texture) → nvenc (NVIDIA encoder)

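Caveats

Some things to note:
1) The NVIDIA encoder can accept YUV or RGB type images.
2) FFMPEG 4.0 and below cannot pass RGB images to nvenc.
3) FFMPEG was updated to accept RGB as input, as a result of the issues I raised.

There are a few different things to know about:
1) AVHWDeviceContext - think of this as ffmpeg's device abstraction layer.
2) AVHWFramesContext - think of this as ffmpeg's hardware frame abstraction layer.
3) cuMemcpy2D - the method required to copy a cuda-mapped OGL texture into a cuda buffer created by ffmpeg.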

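Comprehensiveness

This guide is in addition to the standard software encoding guides. It is NOT complete code and should only be used on top of the standard flow.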

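Code details

Setup

You first need to get your GPU name. For this I found some code (I can't remember where I got it from) that makes a few cuda calls and retrieves the GPU name: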

int getDeviceName(std::string& gpuName)
{
    // Initialize the cuda driver API so the GPU can be queried for hardware encoding with ffmpeg.
    // ck() is an error-checking wrapper that is not shown here.
    int iGpu = 0;
    ck(cuInit(0));
    int nGpu = 0;
    ck(cuDeviceGetCount(&nGpu));
    if (iGpu < 0 || iGpu >= nGpu)
    {
        std::cout << "GPU ordinal out of range. Should be within [" << 0 << ", "
                  << nGpu - 1 << "]" << std::endl;
        return 1;
    }
    CUdevice cuDevice = 0;
    ck(cuDeviceGet(&cuDevice, iGpu));
    char szDeviceName[80];
    ck(cuDeviceGetName(szDeviceName, sizeof(szDeviceName), cuDevice));
    gpuName = szDeviceName;
    // epLog is a project-specific logger
    epLog::msg(epMSG_STATUS, "epVideoEncode:H264Encoder", "...using device \"%s\"", szDeviceName);

    return 0;
}
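The `ck` wrapper used above is not defined in this answer. A minimal sketch of such a helper follows, assuming it simply logs a failed call and reports success as a bool. To keep it compilable without the CUDA headers it takes a plain `int` where the real code would take a `CUresult` (for which 0 is `CUDA_SUCCESS`); the names `checkResult` and `CK` are hypothetical.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for the `ck` helper. `int` replaces CUresult here
// so the sketch builds without CUDA; 0 plays the role of CUDA_SUCCESS.
inline bool checkResult(int result, const char* expr, const char* file, int line)
{
    assert(expr != nullptr && file != nullptr);
    if (result != 0)
    {
        std::cerr << file << ':' << line << ": " << expr
                  << " failed with error code " << result << std::endl;
        return false;
    }
    return true;
}

// Captures the text of the call and the source location at the call site.
#define CK(call) checkResult((call), #call, __FILE__, __LINE__)
```

Wrapping every driver call this way makes a failure point at the exact call that produced it, which is useful here because several of the later calls store their result in a variable that is never inspected.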

Next you need to set up the hardware device context and the hardware frames context:

    getDeviceName(gpuName);
    ret = av_hwdevice_ctx_create(&m_avBufferRefDevice, AV_HWDEVICE_TYPE_CUDA, gpuName.c_str(), NULL, NULL);
    if (ret < 0) 
    {
        return -1;
    }

    //Example of casts needed to get down to the cuda context
    AVHWDeviceContext* hwDevContext = (AVHWDeviceContext*)(m_avBufferRefDevice->data);
    AVCUDADeviceContext* cudaDevCtx = (AVCUDADeviceContext*)(hwDevContext->hwctx);
    m_cuContext = &(cudaDevCtx->cuda_ctx);

    //Create the hwframe_context
    //  This is an abstraction of a cuda buffer for us. This enables us to, with one call, setup the cuda buffer and ready it for input
    m_avBufferRefFrame = av_hwframe_ctx_alloc(m_avBufferRefDevice);

    //Setup some values before initialization 
    AVHWFramesContext* frameCtxPtr = (AVHWFramesContext*)(m_avBufferRefFrame->data);
    frameCtxPtr->width = width;
    frameCtxPtr->height = height;
    frameCtxPtr->sw_format = AV_PIX_FMT_0BGR32; // There are only certain supported types here, we need to conform to these types
    frameCtxPtr->format = AV_PIX_FMT_CUDA;
    frameCtxPtr->device_ref = m_avBufferRefDevice;
    frameCtxPtr->device_ctx = (AVHWDeviceContext*)m_avBufferRefDevice->data;

    //Initialization - This must be done to actually allocate the cuda buffer. 
    //  NOTE: This call will only work for our input format if the FFMPEG library is >4.0 version..
    ret = av_hwframe_ctx_init(m_avBufferRefFrame);
    if (ret < 0) {
        return -1;
    }

    //Cast the OGL texture/buffer to cuda ptr
    CUresult res;
    CUcontext oldCtx;
    m_inputTexture = texture;
    res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
    res = cuCtxPushCurrent(*m_cuContext);
    res = cuGraphicsGLRegisterImage(&cuInpTexRes, m_inputTexture, GL_TEXTURE_2D, CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY);
    res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL

    //Assign some hardware accel specific data to AvCodecContext 
    c->hw_device_ctx = m_avBufferRefDevice;//This must be done BEFORE avcodec_open2()
    c->pix_fmt = AV_PIX_FMT_CUDA; //Since this is a cuda buffer, although its really opengl with a cuda ptr
    c->hw_frames_ctx = m_avBufferRefFrame;
    c->codec_type = AVMEDIA_TYPE_VIDEO;
    c->sw_pix_fmt = AV_PIX_FMT_0BGR32;

    // Setup some cuda stuff for memcpy-ing later
    m_memCpyStruct.srcXInBytes = 0;
    m_memCpyStruct.srcY = 0;
    m_memCpyStruct.srcMemoryType = CUmemorytype::CU_MEMORYTYPE_ARRAY;

    m_memCpyStruct.dstXInBytes = 0;
    m_memCpyStruct.dstY = 0;
    m_memCpyStruct.dstMemoryType = CUmemorytype::CU_MEMORYTYPE_DEVICE;

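Keep in mind that although a lot is done above, the code shown is IN ADDITION to the standard software encoding code. Make sure to include all of those calls and object initializations as well.

Unlike the software version, all that is needed for the input AVFrame object is to get the buffer AFTER your alloc call: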

// allocate RGB video frame buffer
    ret = av_hwframe_get_buffer(m_avBufferRefFrame, rgb_frame, 0);  // 0 is for flags, not used at the moment

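Notice that it takes the hwframe_context as an argument; this is how it knows which device, size, format, etc. to allocate on the GPU.

Call to encode each frame

Now we are set up and ready to encode. Before each encode we need to copy the frame from the texture into a cuda buffer. We do this by mapping a cuda array to the texture and then copying that array to a CUdeviceptr (allocated by the av_hwframe_get_buffer call above):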

//Perform cuda mem copy for input buffer
CUresult cuRes;
CUarray mappedArray;
CUcontext oldCtx;

//Get context
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
cuRes = cuCtxPushCurrent(*m_cuContext);

//Get Texture
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);

//Map texture to cuda array
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0); // Nvidia says its good practice to remap each iteration as OGL can move things around

//Setup for memcopy
m_memCpyStruct.srcArray = mappedArray;
m_memCpyStruct.dstDevice = (CUdeviceptr)rgb_frame->data[0]; // Make sure to copy devptr as it could change, upon resize
m_memCpyStruct.dstPitch = rgb_frame->linesize[0];   // Linesize is generated by hwframe_context
m_memCpyStruct.WidthInBytes = rgb_frame->width * 4; //* 4 needed for each pixel
m_memCpyStruct.Height = rgb_frame->height;          //Vanilla height for frame

//Do memcpy while the texture is still mapped, so that mappedArray is valid
cuRes = cuMemcpy2D(&m_memCpyStruct); 

//Release texture (only after the copy; an unmapped resource may not be accessed by cuda)
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);

//release context
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
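The pitch and width fields are the easy ones to get wrong. As a CPU-side illustration of what a pitched 2D copy does (this is not the CUDA implementation, and `copy2D` is a hypothetical name), each row copies `widthInBytes` bytes and then advances by the pitch of its own buffer, which may be larger than the row width because of alignment padding:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a widthInBytes x height region between two buffers that each
// have their own pitch (bytes from the start of one row to the next).
void copy2D(std::uint8_t* dst, std::size_t dstPitch,
            const std::uint8_t* src, std::size_t srcPitch,
            std::size_t widthInBytes, std::size_t height)
{
    assert(dstPitch >= widthInBytes && srcPitch >= widthInBytes);
    for (std::size_t row = 0; row < height; ++row)
    {
        // Only the visible part of the row is copied; padding is untouched.
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthInBytes);
    }
}
```

In the copy above the source is a CUDA array, so no source pitch is supplied; only the destination pitch (`rgb_frame->linesize[0]`) and the visible row width (`width * 4` for a 4-byte pixel format) matter. Using the width in place of the pitch would shear the image whenever ffmpeg pads its rows.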

Now we can simply call send_frame and it all works!

        ret = avcodec_send_frame(c, rgb_frame);

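Note: I left out most of my code since it is not for the public. I may have some details wrong; this is how I was able to make sense of all the data I gathered over the past month, so feel free to correct anything that is incorrect. Also, fun fact: during all of this my computer crashed and I lost all of my initial investigation (everything I had not checked in to source control), which included all the various example code I found around the internet. So if you see something that is yours, please call it out. This may help others reach the conclusion I came to.

Shout-out

A big shout-out to BtbN at https://webchat.freenode.net/#ffmpeg; I wouldn't have gotten any of this working without their help.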

Answer 2

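The first thing to check: it may be "bad", but does it run fast enough? It is always nice to be more efficient, but if it works, don't break it.

If there really is a performance problem...

1 Use FFMPEG software encoding only, without hardware assistance. Then you only copy from GPU to CPU once. (If the video encoder is on the GPU and you are sending the packets out over RTSP, there is a second GPU to CPU copy after encoding.)

2 Look for NVIDIA (I assume that is the GPU, given that you talk about nvenc) GL extensions for texture formats and/or commands that will perform on-GPU H264 encoding directly into OpenGL buffers.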
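To answer the "is it fast enough" question with a number rather than a feeling, the existing readback path can be timed. This is a generic timing sketch, not anything specific to OpenGL or ffmpeg; `averageMillis` is a hypothetical helper, and the callable passed to it would be whatever per-frame work you want to measure.

```cpp
#include <cassert>
#include <chrono>

// Average wall-clock milliseconds per call of `work` over `iterations` runs.
template <typename Work>
double averageMillis(Work&& work, int iterations)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```

Compare the result against the frame budget (about 16.7 ms per frame at 60 fps). Because OpenGL commands can be queued asynchronously, the callable should include whatever synchronisation is needed (for example `glFinish()`) for the measurement to be meaningful.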

Answer 3

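There is no need to manually set the device_ref and/or device_ctx fields of AVHWFramesContext, as they are already set from the reference passed to av_hwframe_ctx_alloc(). More importantly, the way it is done here breaks the reference counting provided by the AVBufferRef class. New references to these should be made with av_buffer_ref().

Surprisingly, this may be the most cohesive example I have managed to find of adding cuda to an encoding pipeline, so even though this post is 6 years old I felt obliged to add this comment.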
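To illustrate why copying the pointer is a problem, here is a toy model of a reference-counted buffer. It is deliberately NOT FFmpeg's AVBufferRef; `ToyBuffer`, `toyRef` and `toyUnref` are hypothetical names that only mimic the idea of taking and dropping references.

```cpp
#include <cassert>

// Toy model of a reference-counted buffer; NOT FFmpeg's AVBufferRef.
struct ToyBuffer
{
    int refCount = 1;    // the creator holds the first reference
    bool freed = false;  // set once the last reference is dropped
};

// Analogue of taking a new reference (the role av_buffer_ref() plays).
ToyBuffer* toyRef(ToyBuffer* buf)
{
    assert(!buf->freed);
    ++buf->refCount;
    return buf;
}

// Analogue of dropping a reference; the buffer is released at zero.
void toyUnref(ToyBuffer* buf)
{
    assert(buf->refCount > 0);
    if (--buf->refCount == 0)
    {
        buf->freed = true;  // a real implementation would free memory here
    }
}
```

With a plain pointer copy the count stays at 1, so the first holder to release the buffer frees it while the other holder still points at it. Taking a new reference for each holder keeps the buffer alive until all of them are done.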
