Why is this program much slower on Linux than on Windows?


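I tried running this code on Linux, but found that it runs 2 to 3 times slower on Linux than on Windows. With its official sample input, data/BS_1000_torus.xyz, it takes about 20 seconds on Windows but about 60 seconds on Linux. I'm trying to figure out how to configure the build so that it performs the same on Linux. Let me explain in detail.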

Changes unrelated to performance (made on both platforms):

First, I made the following changes (these should not affect performance):

  • Changed `int main()` to `int main(int argc, char* argv[])` in MAIN/main.cpp
  • Changed the definitions of `modelpath` and `modelname` to `string modelpath(argv[1]); string modelname(argv[2]);` in MAIN/main.cpp
  • Changed `cout << OpenmpCnt <<" "<< neighboor[ii].size()<< '\n';` to `if (OpenmpCnt % 1 == 0) cout << OpenmpCnt <<" "<< neighboor[ii].size()<< '\n';` in MAIN/myRPD.hpp
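For reference, the argument handling described in the first two changes could look roughly like the sketch below. The names `ModelArgs`, `parse_model_args` and `xyz_file` are hypothetical (the question does not show the surrounding code), and the `path + name + ".xyz"` convention is only inferred from the file name the program prints. Unlike reading `argv[1]` and `argv[2]` directly, it checks `argc` first, so a missing argument is reported instead of constructing a `std::string` from an invalid pointer.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the two command-line arguments used by MAIN.
struct ModelArgs {
    std::string modelpath;  // e.g. "../../data/"
    std::string modelname;  // e.g. "BS_1000_torus"
};

// Copies argv[1] and argv[2] into `out`; returns false if either is missing.
inline bool parse_model_args(int argc, char* argv[], ModelArgs& out) {
    if (argc < 3) {
        return false;
    }
    out.modelpath = argv[1];
    out.modelname = argv[2];
    return true;
}

// Builds the input file name the way the program's log suggests: path + name + ".xyz".
inline std::string xyz_file(const ModelArgs& args) {
    return args.modelpath + args.modelname + ".xyz";
}
```

In `main`, one would call `parse_model_args(argc, argv, args)` and print a usage message when it returns false; for the Linux command line in the question, `xyz_file(args)` yields `../../data/BS_1000_torus.xyz`.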

On Windows:

I compiled the project with vcpkg and VS2022, following the exact steps in the README. The output of running the BS_1000_torus model is:

C:\Users\Admin\3dlab\GCNO-master\build\MAIN\Release>MAIN.exe ..\..\..\data\ BS_1000_torus
BS_1000_torus
..\..\..\data\BS_1000_torus.xyz
Read point cloud.
make Regular_triangulation .
1000
1 24
maxwd:   1.0544
minwd:   -0.0224343
Compute 3D Voronoi DONE>>
====================WindingNumLBFGSTest
Total Query points : 27358 After QuChong n: 7387
Start opt...
0       0       0.467   34.8834 -1874.41
1       1       0.93    30.8527 -1901.45
2       5       2.229   272.353 -4946.91
3       12      4.068   286.53  -4950.46
4       15      4.872   272.801 -9440.19
5       18      5.66    343.159 -10486.8
6       19      5.923   205.406 -12655.6
7       20      6.177   142.322 -13870.6
8       21      6.442   88.3575 -14319.2
9       22      6.711   88.2494 -14543.1
10      23      6.977   40.6635 -14694.6
11      24      7.236   27.3445 -14749.8
12      25      7.501   37.5665 -14794.3
13      27      8.023   43.3708 -14805.6
14      28      8.29    22.8125 -14839
15      29      8.554   12.3813 -14858.5
16      30      8.828   12.7212 -14868.1
17      31      9.083   10.5348 -14873.1
18      32      9.353   10.6349 -14879.8
19      33      9.616   13.8896 -14885.9
20      34      9.883   11.9295 -14890.9
21      35      10.148  6.47527 -14895.3
22      36      10.401  5.40127 -14897.1
23      37      10.671  4.67613 -14898.3
24      38      10.93   4.81827 -14899.9
25      39      11.196  4.73649 -14900.6
26      40      11.462  3.04003 -14901.3
27      41      11.731  2.97842 -14902.1
28      42      11.987  3.96565 -14902.9
29      43      12.253  5.25759 -14903.3
30      44      12.534  2.75646 -14904
31      45      12.811  3.03649 -14904.6
32      46      13.078  3.65436 -14905.3
33      47      13.365  5.22664 -14906
34      48      13.639  4.31341 -14906.3
35      49      13.903  2.22393 -14906.8
36      50      14.167  1.87306 -14907.1
37      51      14.439  1.705   -14907.4
38      53      14.959  3.01899 -14907.4
39      54      15.222  1.1483  -14907.6
40      55      15.498  1.17605 -14907.7
41      56      15.764  1.90729 -14908
42      57      16.015  2.48752 -14908.3
43      59      16.543  3.24821 -14908.4
44      60      16.82   1.45012 -14908.6
45      61      17.087  0.917486        -14908.7
reach the gradient tolerance
Time: 21.544
successful!

On Linux:

On Linux, I first made the following exact changes to get it to compile:

  • Removed the vcpkg toolchain specification from CMakeLists.txt
  • Added `set(CMAKE_BUILD_TYPE RELEASE)` after `set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2 -fopenmp -pthread -Ofast")` in CMakeLists.txt
  • Changed both occurrences of `isinf` to `std::isinf`
  • Changed all 7 occurrences of `#include<io.h>` to `#include<sys/io.h>`
  • Changed all 3 occurrences of `#include<Eigen\dense>` to `#include<Eigen/Dense>`
  • Changed both occurrences of `gamma` in MyRPD.hpp to `gamma_in_myrpd`, because of a name conflict with another `gamma` at /usr/include/x86_64-linux-gnu/bits/mathcalls.h:241:1
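Two of these porting fixes can be illustrated in isolation. The sketch below is a hypothetical example, not code from the project: it calls `std::isinf` (with `<cmath>`, only the `std::` name is guaranteed to be declared), and it keeps a renamed constant inside a namespace, since glibc's `<math.h>` declares a function `double gamma(double)`, which is what a global variable named `gamma` clashes with. The value 0.5 is a placeholder. Separately, `<sys/io.h>` on Linux declares x86 port I/O functions such as `inb`/`outb`; it is not a general replacement for MSVC's `<io.h>`, whose functions (for example `_access`) roughly correspond to POSIX functions in `<unistd.h>`.

```cpp
#include <cassert>
#include <cmath>    // std::isinf; with g++ on glibc this typically also declares ::gamma(double)
#include <limits>   // std::numeric_limits, used in the usage example
#include <vector>

// Hypothetical stand-in for the renamed variable from MyRPD.hpp. A global
// `double gamma = ...;` would conflict with glibc's gamma() declaration.
namespace myrpd {
constexpr double gamma_in_myrpd = 0.5;  // placeholder value, not from the project
}  // namespace myrpd

// Returns true if any element is +inf or -inf.
inline bool any_infinite(const std::vector<double>& values) {
    for (double v : values) {
        if (std::isinf(v)) {
            return true;
        }
    }
    return false;
}
```

For example, `any_infinite({1.0, std::numeric_limits<double>::infinity()})` returns true, while `any_infinite({1.0, 2.0})` returns false.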

Then I compiled it with cmake and make:

mkdir build
cd build
cmake ..
make -j

Output of cmake:

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- 3.3.9
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found version "1.71.0")  
-- BOOST FOUNDED
-- Using header-only CGAL
-- Targeting Unix Makefiles
-- Using /usr/bin/c++ compiler.
-- Found GMP: /usr/lib/x86_64-linux-gnu/libgmp.so  
-- Found MPFR: /usr/lib/x86_64-linux-gnu/libmpfr.so  
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found suitable version "1.71.0", minimum required is "1.66")  
-- Boost include dirs: /usr/include
-- Boost libraries:    
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Using gcc version 4 or later. Adding -frounding-math
-- Build type: RELEASE
-- USING CXXFLAGS = ' -mavx2 -fopenmp -pthread -Ofast -O3 -DNDEBUG'
-- USING EXEFLAGS = ' '
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Configuring done (1.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/user/3dlab/GCNO-master/build

Program output:

user@hostname:~/3dlab/GCNO-master/build/MAIN$ time ./MAIN ../../data/ BS_1000_torus
BS_1000_torus
../../data/BS_1000_torus.xyz
Read point cloud.
make Regular_triangulation .
1000
1 24
maxwd:   1.0544
minwd:   -0.0224343
Compute 3D Voronoi DONE>> 
====================WindingNumLBFGSTest
Total Query points : 27358 After QuChong n: 7387
Start opt...
0       0       22.0036 38.5047 -1850.45
1       1       43.0174 32.7596 -1883.99
2       7       169.859 267.366 -4697.31
3       9       212.268 325.436 -6697.75
4       14      336.544 365.834 -8079.5
5       15      366.901 272.077 -11049.1
6       17      409.494 218.65  -12689.6
7       18      439     164.219 -13535.9
8       19      468.766 87.8456 -14130.7
9       20      498.674 83.124  -14445.5
10      21      520.337 77.9374 -14609.5
11      22      541.77  42.1958 -14759
12      23      563.054 25.4194 -14813.8
13      24      583.856 19.4954 -14834.5
14      25      605.325 22.7484 -14857.1
15      26      626.445 15.3582 -14870.6
16      27      647.956 11.2956 -14881.5
17      28      669.702 10.6789 -14887.7
18      29      690.506 11.0623 -14894
19      30      712.203 10.2341 -14898.2
20      31      733.47  6.42271 -14901.3
21      32      754.479 4.74715 -14903.1
22      33      776.028 3.54571 -14904.3
23      34      797.442 4.74518 -14905
24      35      818.401 2.57567 -14905.5
25      36      839.537 2.52228 -14905.8
26      37      860.522 3.10824 -14906.4
27      39      903.594 3.77916 -14906.8
28      40      925.124 2.3472  -14907.2
29      41      946.461 1.73657 -14907.6
30      42      967.828 2.35503 -14907.7
31      43      989.876 1.32562 -14907.9
32      44      1011.1  1.20013 -14908
33      45      1032.04 1.58659 -14908.1
34      46      1053.61 1.68829 -14908.2
35      47      1075.17 1.06202 -14908.4
36      48      1097.96 1.03016 -14908.4
37      49      1120.53 1.20543 -14908.5
38      51      1163.89 1.33721 -14908.5
39      52      1186.12 0.958506        -14908.6
reach the gradient tolerance
Time: 1191.18
successful!

real    1m4.763s
user    19m50.042s
sys     0m1.148s

System specs:

  • Both tests were run on the same computer (dual boot) with an Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz (10 cores, 2 threads per core). I set `omp_set_num_threads(20);` at the beginning of `int main`.
  • Windows system: Windows 10
  • Linux system: 5.15.0-88-generic #98~20.04.1-Ubuntu
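A minimal sketch of the thread setup described above, assuming the code is built with `-fopenmp` as in the CMake flags. `request_threads` is a hypothetical helper: it calls `omp_set_num_threads` and then reads back `omp_get_max_threads()`, the upper bound on the number of threads the next parallel region will use. On the i9-10900X, 20 equals the number of hardware threads (10 cores × 2). The `#ifdef _OPENMP` guard lets the same file compile without OpenMP, in which case it reports 1.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Requests `n` OpenMP threads and returns how many the runtime will use at most
// for the next parallel region. Without OpenMP support it returns 1.
inline int request_threads(int n) {
#ifdef _OPENMP
    omp_set_num_threads(n);
    return omp_get_max_threads();
#else
    (void)n;  // unused when OpenMP is disabled
    return 1;
#endif
}
```

Printing `request_threads(20)` at the start of `main` on both systems is a quick way to confirm that both builds really run with 20 threads.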

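Questions:

  • The program output has 5 columns: the first is the iteration number, and the third is the running time computed with `const clock_t time = clock();`. Why is the difference between Windows and Linux so large? I know this is CPU time, not wall-clock time, but why does it depend on the platform?
  • Even though I enabled all the optimization flags on Linux, why is the wall-clock time still so different (20 seconds on Windows vs. 60 seconds on Linux)?
  • How should I configure the build so that it runs as fast on Linux as on Windows?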
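One way to see the platform dependence raised in the first question is to time the same work with both `std::clock()` and `std::chrono::steady_clock`. With glibc, `clock()` returns the CPU time consumed by the whole process, summed over all threads, while Microsoft documents its `clock()` as returning wall-clock time elapsed since process start (a known deviation from ISO C). The Linux numbers above fit this: the final `Time: 1191.18` is close to `user 19m50.042s` (about 1190 s), which is roughly 18 times the `real 1m4.763s`. The sketch below uses a made-up workload (`busy_sum`) and is not taken from the project.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <ctime>

struct Timings {
    double cpu_seconds;   // std::clock(): process CPU time on glibc, elapsed time on MSVC
    double wall_seconds;  // std::chrono::steady_clock: elapsed (wall-clock) time everywhere
};

// Runs `work` once and reports both clocks so they can be compared on each platform.
template <typename F>
Timings time_both(F&& work) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();
    work();
    const std::clock_t c1 = std::clock();
    const auto w1 = std::chrono::steady_clock::now();
    return {
        static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
        std::chrono::duration<double>(w1 - w0).count(),
    };
}

// Hypothetical workload: a parallel loop. Without OpenMP the pragma is ignored
// and the loop runs serially.
inline double busy_sum(long n) {
    double total = 0.0;
#pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += std::sqrt(static_cast<double>(i));
    }
    return total;
}
```

Built with `-fopenmp` on Linux, `cpu_seconds` should come out at several times `wall_seconds` while several threads are busy; with MSVC the two values should be similar. For progress logging that means the same thing on both platforms, `steady_clock` is the portable choice.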
c++ linux windows performance compiler-optimization
1 Answer

-2 votes

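I'm not sure how the program you ran on Windows was compiled, but on Linux you compiled in Debug mode. If you want it to be faster, you should switch to Release.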
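Whether a binary was really built as Release can be checked at runtime rather than guessed. The CMake output in the question already reports `Build type: RELEASE` with `-O3 -DNDEBUG`; the sketch below reports the relevant macros from inside the program. `build_info` is a hypothetical helper; `__OPTIMIZE__` is predefined by GCC and Clang (not MSVC) in any optimizing compilation, `_OPENMP` is defined when compiling with `-fopenmp`, and `__AVX2__` when compiling with `-mavx2`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: lists build-related macros seen by the compiler for this file.
inline std::string build_info() {
    std::string s;
#ifdef NDEBUG
    s += "NDEBUG=on ";         // set by CMake's Release configuration
#else
    s += "NDEBUG=off ";
#endif
#ifdef __OPTIMIZE__
    s += "__OPTIMIZE__=on ";   // GCC/Clang: any optimizing compilation
#else
    s += "__OPTIMIZE__=off ";
#endif
#ifdef _OPENMP
    s += "_OPENMP=on ";        // compiled with OpenMP enabled (-fopenmp)
#else
    s += "_OPENMP=off ";
#endif
#ifdef __AVX2__
    s += "__AVX2__=on";        // compiled with -mavx2 or an -march that implies it
#else
    s += "__AVX2__=off";
#endif
    return s;
}
```

Printing `build_info()` at the start of `main` should show `__OPTIMIZE__=on` on Linux if the `-O3`/`-Ofast` flags from the CMake log were actually applied to this translation unit.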
