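I'm running into a performance problem with matrix multiplication in C++ using the Eigen and ViennaCL libraries. I'm comparing how these operations perform on my system's integrated GPU and on its CPU.

My system has an integrated Intel GPU, and I'm running the code on an 8th-generation Intel Core i5. To my surprise, the matrix multiplications take about 200 seconds on the GPU with ViennaCL, but only about 20 seconds on the CPU with Eigen.

I'm puzzled by this difference and would like to understand what causes it. Is an integrated GPU really slower than the CPU at matrix multiplication?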
```cpp
#include <Eigen/Dense>
#include <chrono>
#include <cstdlib>  // rand, RAND_MAX
#include <iostream>
#include <viennacl/backend/memory.hpp>  // viennacl::backend::finish
#include <viennacl/linalg/prod.hpp>     // viennacl::linalg::prod
#include <viennacl/matrix.hpp>

int main() {
    const int size = 1000;  // Size of the matrices

    // Creating two large matrices using ViennaCL
    viennacl::matrix<float> matrix1_viennacl(size, size);
    viennacl::matrix<float> matrix2_viennacl(size, size);

    // Initializing the matrices with random values
    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            matrix1_viennacl(i, j) = rand() / static_cast<float>(RAND_MAX);
            matrix2_viennacl(i, j) = rand() / static_cast<float>(RAND_MAX);
        }
    }

    // Performing intensive computation with the matrices using ViennaCL and
    // measuring the execution time
    auto start_viennacl = std::chrono::steady_clock::now();
    for (int i = 0; i < 100; ++i) {
        // Performing a matrix-matrix multiplication operation with ViennaCL
        viennacl::matrix<float> result_viennacl =
            viennacl::linalg::prod(matrix1_viennacl, matrix2_viennacl);
    }
    // With the OpenCL or CUDA backend, kernels run asynchronously: wait for
    // all queued work to complete before stopping the clock
    viennacl::backend::finish();
    auto end_viennacl = std::chrono::steady_clock::now();
    std::chrono::duration<double> time_viennacl = end_viennacl - start_viennacl;

    // Printing the execution time with ViennaCL
    std::cout << "Execution time with ViennaCL: " << time_viennacl.count()
              << " seconds" << std::endl;

    // Creating two large matrices using Eigen
    Eigen::MatrixXf matrix1_eigen(size, size);
    Eigen::MatrixXf matrix2_eigen(size, size);

    // Initializing the matrices with the same random values
    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            matrix1_eigen(i, j) = matrix1_viennacl(i, j);
            matrix2_eigen(i, j) = matrix2_viennacl(i, j);
        }
    }

    // Performing intensive computation with the matrices using Eigen and
    // measuring the execution time
    auto start_eigen = std::chrono::steady_clock::now();
    for (int i = 0; i < 100; ++i) {
        // Performing a matrix-matrix multiplication operation with Eigen
        Eigen::MatrixXf result_eigen = matrix1_eigen * matrix2_eigen;
    }
    auto end_eigen = std::chrono::steady_clock::now();
    std::chrono::duration<double> time_eigen = end_eigen - start_eigen;

    // Printing the execution time with Eigen
    std::cout << "Execution time with Eigen: " << time_eigen.count()
              << " seconds" << std::endl;
    return 0;
}
```
I build with premake:
```lua
workspace "Project"
    configurations { "Debug", "Release" }
    location "build"

project "Project"
    kind "ConsoleApp"
    language "C++"
    targetdir "build/bin/%{cfg.buildcfg}"
    objdir "build/obj/%{cfg.buildcfg}"
    files { "src/*.cpp", "include/*.hpp" }
    includedirs { "include", "vendor/*" }

    filter "configurations:Debug"
        symbols "On"
        optimize "On"

    filter "configurations:Release"
        symbols "Off"
        optimize "On"

    filter {}
```
```console
@:~/repos/cpp-projct$ tree -L 1
.
├── build
├── cr.sh
├── include
├── premake.lua
├── src
└── vendor (eigen and viennaCL here, just $ wget and $ tar)
@:~/repos/cpp-lkdmw$ bash cr.sh (compile and run)
Execution time with ViennaCL: 78.3986 seconds
Execution time with Eigen: 7.76729 seconds
```
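Thanks to Ted Lyngmo, I remembered that I should benchmark without compiler optimizations, especially because I never use the matrix results anywhere in the code, so the optimizer is very likely doing something significant, particularly for Eigen, which doesn't involve the GPU at all. So the question in this thread is basically solved: the answer is to turn off compiler optimizations. To give a more complete answer, I'm pasting some test runs here. They now clearly show that, as the matrices grow, Eigen's run time increases much faster than ViennaCL's, and from size 250 on Eigen is the slower of the two.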
| Matrix size | ViennaCL execution time (s) | Eigen execution time (s) |
|---|---|---|
| 10 | 0.370761 | 0.00420438 |
| 100 | 1.91319 | 1.60117 |
| 200 | 14.0389 | 12.7632 |
| 250 | 14.2478 | 25.0458 |
| 400 | 72.917 | 100.017 |