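I tried to run this code on Linux, but found that it runs 2x to 3x slower on Linux than on Windows. With its official example input `data/BS_1000_torus.xyz`, it takes about 20 seconds on Windows but about 60 seconds on Linux. I am trying to figure out how to set up the build so that it performs the same on Linux. Let me explain in detail.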
First, I made the following changes (which should not affect performance):
- Changed `int main()` to `int main(int argc, char* argv[])` in `MAIN/main.cpp`.
- Changed the definitions of `modelpath` and `modelname` to `string modelpath(argv[1]); string modelname(argv[2]);` in `MAIN/main.cpp`.
- Changed `cout << OpenmpCnt <<" "<< neighboor[ii].size()<< '\n';` to `if (OpenmpCnt % 1 == 0) cout << OpenmpCnt <<" "<< neighboor[ii].size()<< '\n';` in `MAIN/myRPD.hpp`.

I followed the exact steps in the README to compile the project with vcpkg and VS2022. The output of running the `BS_1000_torus` model is:
C:\Users\Admin\3dlab\GCNO-master\build\MAIN\Release>MAIN.exe ..\..\..\data\ BS_1000_torus
BS_1000_torus
..\..\..\data\BS_1000_torus.xyz
Read point cloud.
make Regular_triangulation .
1000
1 24
maxwd: 1.0544
minwd: -0.0224343
Compute 3D Voronoi DONE>>
====================WindingNumLBFGSTest
Total Query points : 27358 After QuChong n: 7387
Start opt...
0 0 0.467 34.8834 -1874.41
1 1 0.93 30.8527 -1901.45
2 5 2.229 272.353 -4946.91
3 12 4.068 286.53 -4950.46
4 15 4.872 272.801 -9440.19
5 18 5.66 343.159 -10486.8
6 19 5.923 205.406 -12655.6
7 20 6.177 142.322 -13870.6
8 21 6.442 88.3575 -14319.2
9 22 6.711 88.2494 -14543.1
10 23 6.977 40.6635 -14694.6
11 24 7.236 27.3445 -14749.8
12 25 7.501 37.5665 -14794.3
13 27 8.023 43.3708 -14805.6
14 28 8.29 22.8125 -14839
15 29 8.554 12.3813 -14858.5
16 30 8.828 12.7212 -14868.1
17 31 9.083 10.5348 -14873.1
18 32 9.353 10.6349 -14879.8
19 33 9.616 13.8896 -14885.9
20 34 9.883 11.9295 -14890.9
21 35 10.148 6.47527 -14895.3
22 36 10.401 5.40127 -14897.1
23 37 10.671 4.67613 -14898.3
24 38 10.93 4.81827 -14899.9
25 39 11.196 4.73649 -14900.6
26 40 11.462 3.04003 -14901.3
27 41 11.731 2.97842 -14902.1
28 42 11.987 3.96565 -14902.9
29 43 12.253 5.25759 -14903.3
30 44 12.534 2.75646 -14904
31 45 12.811 3.03649 -14904.6
32 46 13.078 3.65436 -14905.3
33 47 13.365 5.22664 -14906
34 48 13.639 4.31341 -14906.3
35 49 13.903 2.22393 -14906.8
36 50 14.167 1.87306 -14907.1
37 51 14.439 1.705 -14907.4
38 53 14.959 3.01899 -14907.4
39 54 15.222 1.1483 -14907.6
40 55 15.498 1.17605 -14907.7
41 56 15.764 1.90729 -14908
42 57 16.015 2.48752 -14908.3
43 59 16.543 3.24821 -14908.4
44 60 16.82 1.45012 -14908.6
45 61 17.087 0.917486 -14908.7
reach the gradient tolerance
Time: 21.544
successful!
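For reference, the command-line change to `MAIN/main.cpp` can be sketched as follows. This is a minimal illustration, not the project's code: the helper names `build_input_path` and `input_path_from_args` are hypothetical, and the `path + name + ".xyz"` rule is only inferred from the path echoed in the log above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not taken from the project: joins the directory and
// the model name the way the log suggests (modelpath + modelname + ".xyz").
std::string build_input_path(const std::string& modelpath,
                             const std::string& modelname) {
    return modelpath + modelname + ".xyz";
}

// Hypothetical helper: checks argc before argv[1] and argv[2] are read,
// because building a std::string from a missing argument is undefined
// behaviour.
std::string input_path_from_args(int argc, char* argv[]) {
    if (argc < 3) {
        throw std::invalid_argument("usage: MAIN <modelpath> <modelname>");
    }
    const std::string modelpath(argv[1]);
    const std::string modelname(argv[2]);
    return build_input_path(modelpath, modelname);
}
```

In `main`, calling `input_path_from_args(argc, argv)` inside a `try` block would turn a missing argument into a usage message instead of a crash.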
On Linux, I first made the following exact changes so that it compiles successfully:
- In `CMakeLists.txt`, added `set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2 -fopenmp -pthread -Ofast")` after `set(CMAKE_BUILD_TYPE RELEASE)`.
- Changed `isinf` to `std::isinf`.
- Changed `#include<io.h>` to `#include<sys/io.h>`.
- Changed `#include<Eigen\dense>` to `#include<Eigen/Dense>`.
- Changed all 2 occurrences of `gamma` in `MyRPD.hpp` to `gamma_in_myrpd`, because of a name conflict with another `gamma` declared at `/usr/include/x86_64-linux-gnu/bits/mathcalls.h:241:1`.
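The `isinf` and `gamma` portability fixes can be illustrated with a small sketch. It assumes that the project's `gamma` was a global variable, which the question does not state; `is_usable`, `scale`, the namespace `rpd_sketch`, and the value `0.5` are hypothetical and chosen only for illustration.

```cpp
#include <cmath>
#include <limits>

// On glibc, <cmath> also exposes a legacy C function named `gamma` in the
// global namespace, so a global `double gamma = ...;` here would not compile.
// A distinct name (the question's choice) avoids the clash.
double gamma_in_myrpd = 0.5;  // illustrative value, not the project's

// Alternative: a namespace keeps the short name without colliding.
namespace rpd_sketch {
double gamma = 0.5;  // rpd_sketch::gamma is distinct from ::gamma
}

// <cmath> is guaranteed to declare std::isinf; an unqualified isinf is not
// guaranteed to be visible, which is why the qualified form is portable.
bool is_usable(double w) {
    return !std::isinf(w) && !std::isnan(w);
}

// Uses the renamed variable exactly as the old name would have been used.
double scale(double w) {
    return gamma_in_myrpd * w;
}
```

Neither change should affect run time: both only alter name lookup at compile time.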
Then I compiled it with cmake and make:
mkdir build
cd build
cmake ..
make -j
Output of cmake:
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- 3.3.9
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found version "1.71.0")
-- BOOST FOUNDED
-- Using header-only CGAL
-- Targeting Unix Makefiles
-- Using /usr/bin/c++ compiler.
-- Found GMP: /usr/lib/x86_64-linux-gnu/libgmp.so
-- Found MPFR: /usr/lib/x86_64-linux-gnu/libmpfr.so
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found suitable version "1.71.0", minimum required is "1.66")
-- Boost include dirs: /usr/include
-- Boost libraries:
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Using gcc version 4 or later. Adding -frounding-math
-- Build type: RELEASE
-- USING CXXFLAGS = ' -mavx2 -fopenmp -pthread -Ofast -O3 -DNDEBUG'
-- USING EXEFLAGS = ' '
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Configuring done (1.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/user/3dlab/GCNO-master/build
Output of the program:
user@hostname:~/3dlab/GCNO-master/build/MAIN$ time ./MAIN ../../data/ BS_1000_torus
BS_1000_torus
../../data/BS_1000_torus.xyz
Read point cloud.
make Regular_triangulation .
1000
1 24
maxwd: 1.0544
minwd: -0.0224343
Compute 3D Voronoi DONE>>
====================WindingNumLBFGSTest
Total Query points : 27358 After QuChong n: 7387
Start opt...
0 0 22.0036 38.5047 -1850.45
1 1 43.0174 32.7596 -1883.99
2 7 169.859 267.366 -4697.31
3 9 212.268 325.436 -6697.75
4 14 336.544 365.834 -8079.5
5 15 366.901 272.077 -11049.1
6 17 409.494 218.65 -12689.6
7 18 439 164.219 -13535.9
8 19 468.766 87.8456 -14130.7
9 20 498.674 83.124 -14445.5
10 21 520.337 77.9374 -14609.5
11 22 541.77 42.1958 -14759
12 23 563.054 25.4194 -14813.8
13 24 583.856 19.4954 -14834.5
14 25 605.325 22.7484 -14857.1
15 26 626.445 15.3582 -14870.6
16 27 647.956 11.2956 -14881.5
17 28 669.702 10.6789 -14887.7
18 29 690.506 11.0623 -14894
19 30 712.203 10.2341 -14898.2
20 31 733.47 6.42271 -14901.3
21 32 754.479 4.74715 -14903.1
22 33 776.028 3.54571 -14904.3
23 34 797.442 4.74518 -14905
24 35 818.401 2.57567 -14905.5
25 36 839.537 2.52228 -14905.8
26 37 860.522 3.10824 -14906.4
27 39 903.594 3.77916 -14906.8
28 40 925.124 2.3472 -14907.2
29 41 946.461 1.73657 -14907.6
30 42 967.828 2.35503 -14907.7
31 43 989.876 1.32562 -14907.9
32 44 1011.1 1.20013 -14908
33 45 1032.04 1.58659 -14908.1
34 46 1053.61 1.68829 -14908.2
35 47 1075.17 1.06202 -14908.4
36 48 1097.96 1.03016 -14908.4
37 49 1120.53 1.20543 -14908.5
38 51 1163.89 1.33721 -14908.5
39 52 1186.12 0.958506 -14908.6
reach the gradient tolerance
Time: 1191.18
successful!
real 1m4.763s
user 19m50.042s
sys 0m1.148s
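`omp_set_num_threads(20);` is set at the beginning of `int main`. The running time is computed with `const clock_t time = clock();`.

Why is the difference between Windows and Linux so large?

I know this is CPU time, not wall-clock time. But why does it depend on the platform?

I am not sure how the program you ran on Windows was compiled, but on Linux you compiled in debug mode; if you want it to be faster, you should switch to release.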
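A note on the measurement itself: the Linux log reports `Time: 1191.18`, while `time` reports `real 1m4.763s` and `user 19m50.042s`. That suggests the `clock()`-based figure there is CPU time summed over the 20 threads, not elapsed time. The MSVC runtime's `clock()`, by contrast, is documented to return elapsed wall-clock time, so the two logs may not be reporting the same quantity. The sketch below is not part of the project; it records both values around an arbitrary workload so they can be compared on each platform. `Timings`, `measure`, and `busy_sum` are hypothetical names.

```cpp
#include <chrono>
#include <ctime>
#include <functional>

// Hypothetical result type: both readings for one call.
struct Timings {
    double wall_seconds;   // elapsed real time from std::chrono::steady_clock
    double clock_seconds;  // std::clock() delta divided by CLOCKS_PER_SEC
};

// Runs `work` once and records elapsed time with both clocks.
Timings measure(const std::function<void()>& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t clock_start = std::clock();
    work();
    const std::clock_t clock_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timings t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.clock_seconds =
        static_cast<double>(clock_end - clock_start) / CLOCKS_PER_SEC;
    return t;
}

// Small CPU-bound stand-in workload; volatile keeps the loop from being
// optimized away.
double busy_sum(long n) {
    volatile double acc = 0.0;
    for (long i = 0; i < n; ++i) {
        acc = acc + static_cast<double>(i) * 0.5;
    }
    return acc;
}
```

Calling `measure([] { busy_sum(200000000); })` on a single thread should give two similar numbers. Wrapping the optimization loop instead would show whether `clock_seconds` grows with the thread count on Linux while `wall_seconds` stays near the `real` value. Inside OpenMP code, `omp_get_wtime()` is another way to read wall-clock time.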