What does OpenCL execution time depend on?


The function that prints the queued, submitted, start, and end times is as follows:

void PrintProfilingInfo(cl_event event)
{
    cl_int err_num = -1;

    cl_ulong t_queued;
    cl_ulong t_submitted;
    cl_ulong t_started;
    cl_ulong t_ended;
    cl_ulong t_completed;

    // queued time
    err_num = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_QUEUED,
                                      sizeof(cl_ulong), &t_queued, NULL);

    // submit time
    err_num = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_SUBMIT,
                                      sizeof(cl_ulong), &t_submitted, NULL);

    // start time
    err_num = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                                      sizeof(cl_ulong), &t_started, NULL);

    // end time
    err_num = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                                      sizeof(cl_ulong), &t_ended, NULL);

    // complete time
    err_num = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_COMPLETE,
                                      sizeof(cl_ulong), &t_completed, NULL);

    printf("queue  -> submit : %fus\n", (t_submitted - t_queued) * 1e-3);
    printf("submit -> start  : %fus\n", (t_started - t_submitted) * 1e-3);
    printf("start  -> end    : %fus\n", (t_ended - t_started) * 1e-3);
    printf("end    -> finish : %fus\n", (t_completed - t_ended) * 1e-3);
}
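As a side note on the arithmetic above: the profiling counters are reported in nanoseconds, which is why multiplying by 1e-3 yields microseconds. Below is a minimal sketch of that conversion, written without the OpenCL headers so it builds on its own. The `ProfilingStamps` struct and the helper names are hypothetical, and `uint64_t` stands in for `cl_ulong` (a 64-bit unsigned type).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the five profiling timestamps, in nanoseconds. */
typedef struct {
    uint64_t queued, submitted, started, ended, completed;
} ProfilingStamps;

/* Converts the interval from..to from nanoseconds to microseconds.
   Returns 0.0 if the timestamps are out of order, so the unsigned
   subtraction cannot wrap around to a huge value. */
static double interval_us(uint64_t from, uint64_t to)
{
    return (to >= from) ? (double)(to - from) * 1e-3 : 0.0;
}

/* Device-side total from enqueue to completion, in microseconds. */
static double total_us(const ProfilingStamps *s)
{
    return interval_us(s->queued, s->completed);
}

/* Prints the same four intervals as PrintProfilingInfo, plus their total. */
static void print_breakdown(const ProfilingStamps *s)
{
    printf("queue  -> submit : %fus\n", interval_us(s->queued, s->submitted));
    printf("submit -> start  : %fus\n", interval_us(s->submitted, s->started));
    printf("start  -> end    : %fus\n", interval_us(s->started, s->ended));
    printf("end    -> finish : %fus\n", interval_us(s->ended, s->completed));
    printf("queue  -> finish : %fus\n", total_us(s));
}
```

The total printed here covers only what the device timer sees between enqueue and completion, so it can be compared directly with a host-side `gettimeofday` measurement in the same unit.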

The code that measures the total execution time is as follows:

struct timeval t_start;
long long time_diff;
struct timeval end;

gettimeofday(&t_start, NULL);

err_code = clEnqueueNDRangeKernel(cl_cmd_queue,
                                    kernel,
                                    2,
                                    NULL,
                                    global_work_size,
                                    local_work_size,
                                    0,
                                    NULL,
                                    &kernel_event);
err_code = clWaitForEvents(1, &kernel_event);

gettimeofday(&end, NULL);
time_diff = 1000000LL * (end.tv_sec - t_start.tv_sec) + end.tv_usec - t_start.tv_usec;
printf(" ==>> OpenCL Gaussian Blur average cost: %lld us\n", time_diff);

The results are as follows:

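My questions are:

  1. Why is the total execution time so long?
  2. Why does the sum of all the device-side times not equal the total execution time? And where is the time between queued, submitted, started, ended, and completed mainly spent?

I read the official documentation, but it does not seem to explain in detail what happens between queue and submit.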

c arm opencl
1 Answer

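For kernels with a small range, the bulk of the measured runtime may not be the kernel execution itself but PCIe data transfer. If there is a non-blocking CPU->GPU memory copy before the kernel call, your clock measures that as well. To avoid this, call clFinish on the command queue before starting the clock.

A non-blocking enqueue returns almost immediately. Only the blocking calls clFinish and clWaitForEvents tell you how long the queued memory copies and/or kernels actually took.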
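The ordering described above (drain the queue first, then start the clock, then run and wait for the work) can be sketched as follows. This is only an illustration under stated assumptions: callbacks stand in for the OpenCL calls so the snippet builds without the OpenCL headers, and `step_fn`, `elapsed_us`, and `time_after_drain` are hypothetical names, not part of any API.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical callback type; ctx would carry the queue, kernel, and so on. */
typedef void (*step_fn)(void *ctx);

/* Microseconds elapsed between two gettimeofday() samples. */
static long long elapsed_us(const struct timeval *start,
                            const struct timeval *end)
{
    return 1000000LL * (end->tv_sec - start->tv_sec)
         + (end->tv_usec - start->tv_usec);
}

/* Times only `work`: `drain` runs before the clock starts, so commands
   that were already pending in the queue are not charged to the result.
   With OpenCL, drain would call clFinish(queue), and work would call
   clEnqueueNDRangeKernel followed by clWaitForEvents. */
static long long time_after_drain(step_fn drain, step_fn work, void *ctx)
{
    struct timeval start, end;

    drain(ctx);                  /* block until earlier commands are done */
    gettimeofday(&start, NULL);
    work(ctx);                   /* must itself block until the work is done */
    gettimeofday(&end, NULL);

    return elapsed_us(&start, &end);
}
```

If `work` only enqueued the kernel without waiting on it, the measured interval would reflect little more than the cost of the enqueue call, which is the near-instant return the answer mentions.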
