# Speed tests

The scripts used for these tests can be found in \tests\speed\.

The following computers have been used:

 [1] A workstation with Intel Core i7-7700K @4.20 GHz×4(8), 16 GB and with AMD FirePro W9100 GPU. Windows 10 with Python 2.7.10 64 bit and 3.6.1 64 bit, Ubuntu 16.04 LTS with Python 2.7.12 64 bit and 3.5.2 64 bit.
 [2] An ASUS UX430UQ laptop with Intel Core i7-7500U CPU @ 2.70 GHz×2(4), 16 GB and with NVIDIA GeForce 940MX GPU. Windows 10 with Python 2.7.10 64 bit and 3.6.1 64 bit, Ubuntu 16.10 with Python 2.7.12 64 bit and 3.5.2 64 bit.
 [3] A DELL GPU node with 2 Nvidia Tesla K80 GPUs (double-chip each), CentOS 7 – run by Zdeněk Matěj (MAX IV)
 [4] A DELL GPU node with 4 Nvidia Tesla P100 GPUs, CentOS 7 – run by Zdeněk Matěj (MAX IV)
 [5] A CPU node with Intel Xeon E5-2650 v3 @ 2.30GHz×20(40) – run by Zdeněk Matěj (MAX IV)
 [6] A CPU node with Intel Xeon E5-2650 v4 @ 2.20GHz×24(48) – run by Zdeněk Matěj (MAX IV)
 [7] A CPU node with Intel Xeon Gold 6130 @ 2.10GHz×32(64) – run by Zdeněk Matěj (MAX IV)

Note

The number of rays/samples in these tests was reduced compared to real calculations so that the tests run reasonably quickly. Longer calculations would show an even bigger difference between the slowest and the fastest cases, as the overheads (job distribution, collecting of histograms and plotting) would become relatively less important.
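The effect of fixed overheads described in the note can be illustrated with a simple timing model. This is only a sketch: the functions `run_time` and `slow_to_fast_ratio` and all numbers in it are hypothetical and are not measured values.

```python
def run_time(n_rays, per_ray, overhead):
    """Total time: a fixed overhead plus a cost proportional to n_rays."""
    return overhead + n_rays * per_ray


# Hypothetical numbers chosen only for illustration (not measurements).
OVERHEAD = 10.0       # s: job distribution, histogram collection, plotting
SLOW_PER_RAY = 4e-5   # s per ray in the slowest configuration
FAST_PER_RAY = 1e-5   # s per ray in the fastest configuration


def slow_to_fast_ratio(n_rays):
    """Ratio of the slowest to the fastest run time for a given problem size."""
    slow = run_time(n_rays, SLOW_PER_RAY, OVERHEAD)
    fast = run_time(n_rays, FAST_PER_RAY, OVERHEAD)
    return slow / fast


for n_rays in (10**5, 10**6, 10**7, 10**8):
    print("{0:.0e} rays: ratio {1:.2f}".format(n_rays, slow_to_fast_ratio(n_rays)))
```

In this model the ratio grows with the number of rays and approaches `SLOW_PER_RAY / FAST_PER_RAY` once the fixed overhead becomes negligible.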

The tables below show execution times in seconds. Some cells have two values: for Python 2 and for Python 3.

## Multithreading and multiprocessing in ray tracing

Script: \tests\speed\1_SourceZCrystalThetaAlpha_speed.py.
The test is based on the example \examples\withRaycing\07_AnalyzerBent2D\01BD_SourceZCrystalThetaAlpha.py.

This script calculates diffraction of a 2D-bent diced crystal analyzer from three types of geometric source.

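| system | OS | no multithreading or multiprocessing | 2 threads | 4 threads | 2 processes | 4 processes |
|---|---|---|---|---|---|---|
| [1] | Windows 10 64 bit | 510.2, 539.2 | 306.0, 318.5 | 220.1, 220.6 | 458.8, 573.4 | 338.8, 441.2 |
| | Ubuntu 16.04 64 bit | 436.1, 446.7 | 257.2, 263.3 | 183.3, 188.6 | 246.5, 250.0 | 157.1, 158.7 |
| [2] | Windows 10 64 bit | 359.9, 546.9 | 290.8, 292.0 | 218.7, 219.7 | 362.2, 436.4 | 290.8, 363.6 |
| | Ubuntu 16.10 64 bit | 293.3, 293.1 | 172.3, 173.6 | 169.5, 168.3 | 173.3, 172.4 | 156.1, 154.6 |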
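The timings can be converted to speedup factors relative to the run without parallelization. The sketch below does this for computer [1] under Ubuntu 16.04 with Python 2. It assumes that the five data columns of the table are, in order: no parallelization, 2 and 4 threads, 2 and 4 processes. The helper `speedups` and the dictionary keys are illustrative names, not part of the test scripts.

```python
# Execution times in seconds for computer [1], Ubuntu 16.04, Python 2,
# copied from the table above (column meaning as assumed in the text).
times = {
    "baseline": 436.1,
    "2 threads": 257.2,
    "4 threads": 183.3,
    "2 processes": 246.5,
    "4 processes": 157.1,
}


def speedups(times, reference="baseline"):
    """Return the speedup of every entry relative to the reference entry."""
    ref = times[reference]
    return {name: ref / t for name, t in times.items()}


result = speedups(times)
for name in sorted(result, key=result.get):
    print("{0}: {1:.2f}x".format(name, result[name]))
```

The same helper can be applied to any other row of the table by replacing the values in `times`.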

## OpenCL performance with Undulator source

Script: \tests\speed\2_synchrotronSources_speed.py.
The test is based on the example \examples\withRaycing\01_SynchrotronSources\synchrotronSources.py.

This script calculates characteristics of an undulator source at energies around one harmonic.

| system | OS | no OpenCL | OpenCL on CPU | OpenCL on GPU |
|---|---|---|---|---|
| [1] | Windows 10 64 bit | 1471, 1385 | 36.0, 34.1 | 25.7, 23.9 |
| | Ubuntu 16.04 64 bit | 950, 950 | 34.6, 35.4 | 20.6, 21.0 |
| [2] | Windows 10 64 bit | 1801, 1909 | 61.0, 60.3 | 126, 123 |
| | Ubuntu 16.10 64 bit | 1245, 1255 | 57.6, 60.2 | 122, 127 |

## OpenCL performance with wave propagation

Script: \tests\speed\3_Softi_CXIw2D_speed.py.
The test is based on the example \examples\withRaycing\14_SoftiMAX\Softi_CXIw2D.py.

This script calculates several consecutive wave propagation integrals from the source down to the focus at the sample. Here, each wave is represented by 2·10⁵ samples and thus each integral considers 4·10¹⁰ scattering paths. Such calculations are impossible in numpy and have to be carried out with the help of OpenCL. Even with OpenCL, these calculations are not feasible on low-end graphics cards and are therefore not exemplified here on a laptop. Note also that a real calculation should use a larger number of samples, > 1·10⁶, for better convergence.
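The path count follows from pairing every sample of one wave with every sample of the next. The sketch below also estimates the memory a naive dense evaluation would need. It assumes, purely for illustration, that one double-precision complex value (16 bytes) is stored per path; the actual implementation is not described here.

```python
n_samples = 2 * 10**5        # samples per wave in this test
n_paths = n_samples ** 2     # every source sample paired with every target sample
bytes_per_value = 16         # assumption: one complex128 value stored per path

total_bytes = n_paths * bytes_per_value
print("paths per integral: {0:.1e}".format(n_paths))
print("dense array size: {0:.0f} GB".format(total_bytes / 1e9))
```

Under this assumption a single integral would need about 640 GB for the intermediate array alone, far more than the 16 GB of memory in computers [1] and [2].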

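| system | configuration | OpenCL on CPU | OpenCL on GPU |
|---|---|---|---|
| [1] | Windows 10 64 bit | 637, 635 | 76.8, 76.4 |
| | Ubuntu 16.04 64 bit | 602, 605 | 71.1, 71.0 |
| [3] | 1×K80 | | 196 |
| | 4×K80 | | 76.5 |
| [4] | 1×P100 | | 53.0 |
| | 4×P100 | | 25.8 |
| [5] | Xeon E5-2650 v3 | 321 | |
| [6] | Xeon E5-2650 v4 | 251 | |
| [7] | Xeon Gold 6130 | 162 | |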
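How well the calculation scales from one to four GPUs can be estimated from the times for nodes [3] and [4]. The sketch below is an illustration only: the helper `scaling` is a hypothetical name, and "efficiency" is taken here as the measured speedup divided by the increase in GPU count.

```python
# OpenCL-on-GPU times in seconds, copied from the table above,
# keyed by the number of GPUs used.
gpu_times = {
    "K80": {1: 196.0, 4: 76.5},
    "P100": {1: 53.0, 4: 25.8},
}


def scaling(times_by_count):
    """Return (speedup, efficiency) from the smallest to the largest GPU count."""
    n_min, n_max = min(times_by_count), max(times_by_count)
    speedup = times_by_count[n_min] / times_by_count[n_max]
    efficiency = speedup / (float(n_max) / n_min)
    return speedup, efficiency


for name in sorted(gpu_times):
    s, e = scaling(gpu_times[name])
    print("{0}: speedup {1:.2f}, efficiency {2:.0%}".format(name, s, e))
```

In this reduced test, four GPUs are about 2.6 times (K80) and 2.1 times (P100) faster than one, well below the ideal factor of 4.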

## Summary

• Except for the case of computing with OpenCL on GPU, calculations in Linux are usually significantly faster than in Windows. Especially when using multithreading or multiprocessing, the execution in Linux is dramatically faster.
• There is no significant difference in speed between Python 2 and Python 3, except for multiprocessing in Windows, where Python 2 performs better.
• For geometric ray tracing a decent laptop can be a reasonable choice.