Speed tests
The scripts used for these tests can be found in \tests\speed\
The following computers have been used:
Note
The tests here use fewer rays/samples than real calculations so that they run reasonably quickly. Longer calculations would show an even bigger difference between the slowest and the fastest cases, as the overheads (job distribution, collection of histograms and plotting) would become relatively less important.
The tables below show execution times in seconds. Some cells have two values: for Python 2 and for Python 3.
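The overhead argument in the note above can be illustrated with a small model. This is only a sketch: the function names and all numbers are hypothetical, chosen to show the trend, and are not measurements from these tests. It assumes the run time splits into a fixed overhead and a part that an accelerated configuration speeds up by a constant factor.

```python
def total_time(work, overhead, speedup):
    """Wall time when only the `work` part is accelerated by `speedup`."""
    return overhead + work / speedup


def slow_to_fast_ratio(work, overhead, speedup):
    """Ratio of the unaccelerated run time to the accelerated run time."""
    slow = total_time(work, overhead, 1.0)
    fast = total_time(work, overhead, speedup)
    return slow / fast


# Hypothetical values: 20 s of fixed overhead, 4x acceleration of the rest.
OVERHEAD = 20.0
SPEEDUP = 4.0

short_run = slow_to_fast_ratio(100.0, OVERHEAD, SPEEDUP)   # reduced test
long_run = slow_to_fast_ratio(1000.0, OVERHEAD, SPEEDUP)   # longer calculation

print(f"short run: slowest/fastest = {short_run:.2f}")
print(f"long run:  slowest/fastest = {long_run:.2f}")
```

With a larger amount of work the fixed overhead matters less, so the ratio between the slowest and the fastest case approaches the assumed acceleration factor.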
Multithreading and multiprocessing in ray tracing
\tests\speed\1_SourceZCrystalThetaAlpha_speed.py
\examples\withRaycing\07_AnalyzerBent2D\01BD_SourceZCrystalThetaAlpha.py
This script calculates diffraction of a 2D-bent diced crystal analyzer from three types of geometric source.
system | 1 | multithreading, 2 | multithreading, 4 | multiprocessing, 2 | multiprocessing, 4 |
---|---|---|---|---|---|
Windows 10 64 bit | 510.2, 539.2 | 306.0, 318.5 | 220.1, 220.6 | 458.8, 573.4 | 338.8, 441.2 |
Ubuntu 16.04 64 bit | 436.1, 446.7 | 257.2, 263.3 | 183.3, 188.6 | 246.5, 250.0 | 157.1, 158.7 |
Windows 10 64 bit | 359.9, 546.9 | 290.8, 292.0 | 218.7, 219.7 | 362.2, 436.4 | 290.8, 363.6 |
Ubuntu 16.10 64 bit | 293.3, 293.1 | 172.3, 173.6 | 169.5, 168.3 | 173.3, 172.4 | 156.1, 154.6 |
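The timings above are easier to compare as speedups relative to the single-process run. The following sketch does this for the first two rows of the table, using the first value of each cell. The dictionary layout and the `speedups` helper are illustrative choices, and the efficiency (speedup divided by the number of threads or processes) is a derived quantity, not something reported by the tests.

```python
# Execution times in seconds, copied from the first two rows of the table
# (first value of each cell). Keys are (mode, number of threads/processes).
times = {
    "Windows 10 64 bit": {
        ("single", 1): 510.2,
        ("multithreading", 2): 306.0,
        ("multithreading", 4): 220.1,
        ("multiprocessing", 2): 458.8,
        ("multiprocessing", 4): 338.8,
    },
    "Ubuntu 16.04 64 bit": {
        ("single", 1): 436.1,
        ("multithreading", 2): 257.2,
        ("multithreading", 4): 183.3,
        ("multiprocessing", 2): 246.5,
        ("multiprocessing", 4): 157.1,
    },
}


def speedups(row):
    """Return {(mode, n): (speedup, efficiency)} relative to the single run."""
    base = row[("single", 1)]
    return {
        key: (base / t, base / t / key[1])
        for key, t in row.items()
        if key[1] > 1
    }


for system, row in times.items():
    print(system)
    for (mode, n), (speedup, efficiency) in speedups(row).items():
        print(f"  {mode:15s} x{n}: speedup {speedup:.2f}, "
              f"efficiency {efficiency:.0%}")
```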
OpenCL performance with Undulator source
\tests\speed\2_synchrotronSources_speed.py
\examples\withRaycing\01_SynchrotronSources\synchrotronSources.py
This script calculates characteristics of an undulator source at energies around one harmonic.
system | no OpenCL | OpenCL on CPU | OpenCL on GPU |
---|---|---|---|
Windows 10 64 bit | 1471, 1385 | 36.0, 34.1 | 25.7, 23.9 |
Ubuntu 16.04 64 bit | 950, 950 | 34.6, 35.4 | 20.6, 21.0 |
Windows 10 64 bit | 1801, 1909 | 61.0, 60.3 | 126, 123 |
Ubuntu 16.10 64 bit | 1245, 1255 | 57.6, 60.2 | 122, 127 |

system | execution time |
---|---|
local | 30.0 |
ZMQ 1Gb | 182.9 |
OpenCL performance with wave propagation
\tests\speed\3_Softi_CXIw2D_speed.py
\examples\withRaycing\14_SoftiMAX\Softi_CXIw2D.py
This script calculates several consecutive wave propagation integrals from the source down to the focus at the sample. Here, each wave is represented by 2·10⁵ samples and thus each integral considers 4·10¹⁰ scattering paths. Such calculations are impossible in numpy and have to be carried out with the help of OpenCL. Even with OpenCL, these calculations are not feasible on low-end graphics cards and are therefore not exemplified here on a laptop. Note also that for a real calculation a larger number of samples, > 10⁶ (i.e. > 10¹² scattering paths), should be chosen for better convergence.
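The path counts quoted above follow from pairing every sample of one wave with every sample of the next, assuming both waves have the same number of samples. The short calculation below reproduces them and adds a rough memory estimate. The estimate rests on an illustrative assumption: a naive, fully vectorised evaluation that stores one complex128 value per path. It is not a description of how the library organises the computation.

```python
# Each propagation integral pairs every sample on the source wave with
# every sample on the destination wave (equal sample counts assumed).
n_samples = 2 * 10**5
n_paths = n_samples ** 2

# Illustrative assumption: one complex128 value (16 bytes) per path, as a
# naive fully vectorised array would require.
BYTES_PER_COMPLEX128 = 16
naive_array_gb = n_paths * BYTES_PER_COMPLEX128 / 1e9

print(f"paths per integral: {n_paths:.1e}")
print(f"naive array size:   {naive_array_gb:.0f} GB")

# Path count for the larger sample number suggested for real calculations.
n_paths_real = (10**6) ** 2
print(f"paths with 1e6 samples: {n_paths_real:.1e}")
```

Under this assumption a single integral would need hundreds of gigabytes if held in memory at once, which gives a feel for why the work has to be split into pieces and run on an accelerator.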
system | OpenCL on CPU | OpenCL on GPU |
---|---|---|
Windows 10 64 bit | 637, 635 | 76.8, 76.4 |
Ubuntu 16.04 64 bit | 602, 605 | 71.1, 71.0 |

system | execution time |
---|---|
1×K80 | 196 |
4×K80 | 76.5 |
1×P100 | 53.0 |
4×P100 | 25.8 |
Xeon E5-2650 v3 | 321 |
Xeon E5-2650 v4 | 251 |
Xeon Gold 6130 | 162 |
1×A100 | 17.5 |
2×A100 | 11.5 |
local | 40.4 |
ZMQ 1Gb | 48.4 |
Summary
Except for the case of computing with OpenCL on GPU, calculations in Linux are usually significantly faster than in Windows. Especially when using multithreading or multiprocessing, the execution in Linux is dramatically faster.
There is no significant difference in speed between Python 2 and Python 3, except for multiprocessing in Windows, where Python 2 performs better.
For geometric ray tracing a decent laptop can be a reasonable choice.
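The Python 2 versus Python 3 statement can be checked against the first two rows of the ray tracing table. The sketch below computes the ratio of the second value to the first in each cell, reading the first value as Python 2 and the second as Python 3, in the order given in the introduction. The data layout and variable names are illustrative.

```python
# (first value, second value) pairs in seconds from the first two rows of
# the ray tracing table, read as (Python 2, Python 3).
columns = ["1", "mt 2", "mt 4", "mp 2", "mp 4"]
pairs = {
    "Windows 10 64 bit": [(510.2, 539.2), (306.0, 318.5), (220.1, 220.6),
                          (458.8, 573.4), (338.8, 441.2)],
    "Ubuntu 16.04 64 bit": [(436.1, 446.7), (257.2, 263.3), (183.3, 188.6),
                            (246.5, 250.0), (157.1, 158.7)],
}

# Ratio above 1 means the Python 3 run took longer than the Python 2 run.
ratios = {
    system: {col: py3 / py2 for col, (py2, py3) in zip(columns, cells)}
    for system, cells in pairs.items()
}

for system, row in ratios.items():
    print(system)
    for col, ratio in row.items():
        print(f"  {col:5s} Python 3 / Python 2 = {ratio:.2f}")
```

In these two rows the ratio stays close to 1 everywhere except in the two multiprocessing columns on Windows.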