Speed tests
The scripts used for these tests can be found in \tests\speed\.
The following computers have been used:
[1] A workstation with an Intel Core i7-7700K @ 4.20 GHz×4(8), 16 GB RAM and an AMD FirePro W9100 GPU. Windows 10 with Python 2.7.10 64 bit and 3.6.1 64 bit; Ubuntu 16.04 LTS with Python 2.7.12 64 bit and 3.5.2 64 bit.
[2] An ASUS UX430UQ laptop with an Intel Core i7-7500U CPU @ 2.70 GHz×2(4), 16 GB RAM and an NVIDIA GeForce 940MX GPU. Windows 10 with Python 2.7.10 64 bit and 3.6.1 64 bit; Ubuntu 16.10 with Python 2.7.12 64 bit and 3.5.2 64 bit.
[3] A DELL GPU node with 2 Nvidia Tesla K80 GPUs (double-chip each), CentOS 7, run by Zdeněk Matěj (MAX IV).
[4] A DELL GPU node with 4 Nvidia Tesla P100 GPUs, CentOS 7, run by Zdeněk Matěj (MAX IV).
[5] A CPU node with Intel Xeon E5-2650 v3 @ 2.30 GHz×20(40), run by Zdeněk Matěj (MAX IV).
[6] A CPU node with Intel Xeon E5-2650 v4 @ 2.20 GHz×24(48), run by Zdeněk Matěj (MAX IV).
[7] A CPU node with Intel Xeon Gold 6130 @ 2.10 GHz×32(64), run by Zdeněk Matěj (MAX IV).
Note
The number of rays/samples in these tests was reduced compared to real calculations so that the tests run reasonably quickly. Longer calculations would show an even bigger difference between the slowest and the fastest cases, as the overheads (job distribution, collection of histograms and plotting) would become relatively less important.
The tables below show execution times in seconds. Where a cell has two values, the first is for Python 2 and the second for Python 3.
Multithreading and multiprocessing in ray tracing
Test script: \tests\speed\1_SourceZCrystalThetaAlpha_speed.py
Example script: \examples\withRaycing\07_AnalyzerBent2D\01BD_SourceZCrystalThetaAlpha.py
This script calculates diffraction of a 2D-bent diced crystal analyzer from three types of geometric sources.
system | OS | 1 thread/process | multithreading, 2 threads | multithreading, 4 threads | multiprocessing, 2 processes | multiprocessing, 4 processes
---|---|---|---|---|---|---
[1] | Windows 10 64 bit | 510.2, 539.2 | 306.0, 318.5 | 220.1, 220.6 | 458.8, 573.4 | 338.8, 441.2
[1] | Ubuntu 16.04 64 bit | 436.1, 446.7 | 257.2, 263.3 | 183.3, 188.6 | 246.5, 250.0 | 157.1, 158.7
[2] | Windows 10 64 bit | 359.9, 546.9 | 290.8, 292.0 | 218.7, 219.7 | 362.2, 436.4 | 290.8, 363.6
[2] | Ubuntu 16.10 64 bit | 293.3, 293.1 | 172.3, 173.6 | 169.5, 168.3 | 173.3, 172.4 | 156.1, 154.6
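As a reading aid, the speedup and the parallel efficiency can be derived from the timings in the table above. The following minimal sketch uses the Python 2 timings of workstation [1] under Ubuntu 16.04; the helper names `speedup`, `efficiency` and `summarize` are illustrative only and are not part of xrt.

```python
# Python 2 timings in seconds for workstation [1], Ubuntu 16.04,
# copied from the table above.
T_SERIAL = 436.1
TIMES = {
    ("multithreading", 2): 257.2,
    ("multithreading", 4): 183.3,
    ("multiprocessing", 2): 246.5,
    ("multiprocessing", 4): 157.1,
}


def speedup(t_serial, t_parallel):
    """Ratio of the single-worker time to the parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Speedup per worker; 1.0 would correspond to ideal scaling."""
    return speedup(t_serial, t_parallel) / n_workers


def summarize(t_serial, times):
    """Return {(mode, n_workers): (speedup, efficiency)} for all entries."""
    return {
        key: (speedup(t_serial, t), efficiency(t_serial, t, key[1]))
        for key, t in times.items()
    }


if __name__ == "__main__":
    for (mode, n), (s, e) in sorted(summarize(T_SERIAL, TIMES).items()):
        print("{0:15s} x{1}: speedup {2:.2f}, efficiency {3:.2f}".format(
            mode, n, s, e))
```

For this row, neither mode reaches a threefold speedup with four workers, which is consistent with the overheads mentioned in the note above being noticeable in such short runs.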
OpenCL performance with Undulator source
Test script: \tests\speed\2_synchrotronSources_speed.py
Example script: \examples\withRaycing\01_SynchrotronSources\synchrotronSources.py
This script calculates characteristics of an undulator source at energies around one harmonic.
system | OS | no OpenCL | OpenCL on CPU | OpenCL on GPU
---|---|---|---|---
[1] | Windows 10 64 bit | 1471, 1385 | 36.0, 34.1 | 25.7, 23.9
[1] | Ubuntu 16.04 64 bit | 950, 950 | 34.6, 35.4 | 20.6, 21.0
[2] | Windows 10 64 bit | 1801, 1909 | 61.0, 60.3 | 126, 123
[2] | Ubuntu 16.10 64 bit | 1245, 1255 | 57.6, 60.2 | 122, 127
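The gain from OpenCL, and which device is faster on a given machine, can be read off the table above with a few lines of Python. This is only a sketch over the published timings; the dictionary layout and the function names `opencl_gain` and `fastest_device` are assumptions made for illustration and do not come from xrt.

```python
# Timings in seconds from the table above; each tuple is (Python 2, Python 3).
UNDULATOR_TIMES = {
    ("[1]", "Windows 10"): {
        "none": (1471, 1385), "cpu": (36.0, 34.1), "gpu": (25.7, 23.9)},
    ("[1]", "Ubuntu 16.04"): {
        "none": (950, 950), "cpu": (34.6, 35.4), "gpu": (20.6, 21.0)},
    ("[2]", "Windows 10"): {
        "none": (1801, 1909), "cpu": (61.0, 60.3), "gpu": (126, 123)},
    ("[2]", "Ubuntu 16.10"): {
        "none": (1245, 1255), "cpu": (57.6, 60.2), "gpu": (122, 127)},
}


def opencl_gain(row, device, python_index=0):
    """Time without OpenCL divided by the time with OpenCL on `device`."""
    return row["none"][python_index] / float(row[device][python_index])


def fastest_device(row, python_index=0):
    """Return 'cpu' or 'gpu', whichever OpenCL device has the shorter time."""
    return min(("cpu", "gpu"), key=lambda dev: row[dev][python_index])


if __name__ == "__main__":
    for (system, os_name), row in sorted(UNDULATOR_TIMES.items()):
        dev = fastest_device(row)
        print("{0} {1}: fastest OpenCL device is {2}, gain {3:.1f}x".format(
            system, os_name, dev.upper(), opencl_gain(row, dev)))
```

On the laptop [2] the sketch selects the CPU as the faster OpenCL device, whereas on the workstation [1] it selects the GPU, as the table shows.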
OpenCL performance with wave propagation
Test script: \tests\speed\3_Softi_CXIw2D_speed.py
Example script: \examples\withRaycing\14_SoftiMAX\Softi_CXIw2D.py
This script calculates several consecutive wave propagation integrals from the source down to the focus at the sample. Here, each wave is represented by 2·10⁵ samples, so each integral considers 4·10¹⁰ scattering paths. Such calculations are impossible in numpy and have to be carried out with the help of OpenCL. Even with OpenCL, these calculations are not feasible on low-end graphics cards and are therefore not exemplified here on a laptop. Note also that a real calculation should use a larger number of samples, > 1·10⁶, for better convergence.
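The path count quoted above follows from pairing every sample of one wave with every sample of the next. The sketch below reproduces that arithmetic and adds a rough memory estimate under a stated assumption: that a fully vectorized numpy evaluation would hold one complex128 value (16 bytes) per path at once. This assumption is for illustration only and says nothing about how xrt actually organizes the calculation; the function names are hypothetical.

```python
N_SAMPLES = 2 * 10**5        # samples per wave, as stated in the text
BYTES_PER_COMPLEX128 = 16    # two 64-bit floats per complex amplitude


def n_paths(n_samples):
    """Number of sample-to-sample paths between two consecutive waves."""
    return n_samples ** 2


def dense_matrix_bytes(n_samples, itemsize=BYTES_PER_COMPLEX128):
    """Memory needed if one value per path were stored simultaneously
    (assumption: a naive, fully vectorized evaluation)."""
    return n_paths(n_samples) * itemsize


if __name__ == "__main__":
    print("paths per integral: {0:.0e}".format(n_paths(N_SAMPLES)))
    print("dense storage: {0:.0f} GB".format(
        dense_matrix_bytes(N_SAMPLES) / 1e9))
```

Under this assumption a single integral would need 640 GB, far more than the 16 GB of machines [1] and [2], which illustrates why the paths have to be processed piecewise on an OpenCL device.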
system | OS / device | OpenCL on CPU | OpenCL on GPU
---|---|---|---
[1] | Windows 10 64 bit | 637, 635 | 76.8, 76.4
[1] | Ubuntu 16.04 64 bit | 602, 605 | 71.1, 71.0
[3] | 1×K80 | | 196
[3] | 4×K80 | | 76.5
[4] | 1×P100 | | 53.0
[4] | 4×P100 | | 25.8
[5] | Xeon E5-2650 v3 | 321 | |
[6] | Xeon E5-2650 v4 | 251 | |
[7] | Xeon Gold 6130 | 162 | |
Summary
- Except when computing with OpenCL on GPU, calculations in Linux are usually significantly faster than in Windows. With multithreading or multiprocessing in particular, execution in Linux is dramatically faster.
- There is no significant difference in speed between Python 2 and Python 3, except for multiprocessing in Windows, where Python 2 performs better.
- For geometric ray tracing, a decent laptop can be a reasonable choice.