Calculations on GPU¶
A GPU can be used for several types of calculations, e.g. those of an undulator source. The gain is enormous; you should try it. Even on an “ordinary” Intel processor, execution with OpenCL becomes significantly faster.
Here are some benchmarks on a system with Intel Core i7-3930K 3.20 GHz CPU, ASUS Radeon R9 290 GPU, Python 2.7.6 64-bit, Windows 7 64-bit. Script examples/withRaycing/01_SynchrotronSources/synchrotronSources.py, 1 million rays, execution times in seconds:
CPU, 1 process: 5172 (one CPU process loaded)
CPU, 10 processes: 1245 (heavily loaded system)
OpenCL with CPU: 163 (highly loaded system)
OpenCL with GPU: 132 (almost idle CPU)
You will need AMD/NVIDIA drivers if you have a GPU (a GPU is not a must; without one you can use a CPU-only OpenCL runtime), plus pytools and pyopencl.
Note
When using OpenCL, no further parallelization is possible by means of multithreading or multiprocessing. You should turn them off by using the default values processes=1 and threads=1 in the run properties.
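Assuming the conventions of xrt's runner (a hypothetical sketch; check the parameter names against your installed xrt version), the run properties would look like:

```python
# Hypothetical fragment: turn off CPU-side parallelization when OpenCL
# does the computation; "plots" and "beamLine" stand for your own objects.
import xrt.runner as xrtrun

xrtrun.run_ray_tracing(
    plots=plots,        # your list of plots
    processes=1,        # default value: no extra CPU processes
    threads=1,          # default value: no extra threads
    beamLine=beamLine)  # your beamline
```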
Please run the script tests/raycing/info_opencl.py to get information about your OpenCL platforms and devices. You will then pass the proper indices in the lists of platforms and devices as parameters to the pyopencl methods or, alternatively, pass ‘auto’ to targetOpenCL.
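A minimal sketch in the spirit of info_opencl.py (this is not the actual script) that enumerates the OpenCL platforms and devices together with their indices:

```python
# Illustrative sketch: print each OpenCL platform and device with its
# index; these are the indices referred to in the paragraph above.
try:
    import pyopencl as cl
except ImportError:
    cl = None
    print("pyopencl is not installed")

if cl is not None:
    for ip, platform in enumerate(cl.get_platforms()):
        print("platform {0}: {1}".format(ip, platform.name))
        for idev, device in enumerate(platform.get_devices()):
            print("    device {0}: {1}".format(idev, device.name))
```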
Important
Consider the warnings and tips on using xrt with GPUs.
Hint
Consider also Speed tests for a few selected cases.
OpenCL Server¶
This example contains a Python script designed to run on a GPU server, leveraging ZeroMQ (ZMQ) for efficient data transfer. The script acts as a remote accelerator device, receiving data from a client Python script, performing calculations on the GPU, and returning the results for plotting on a local computer.
Users can seamlessly execute their scripts in their favorite IDE, offloading resource-intensive calculations to a remote server over the network. The only trade-off is the delay introduced by data transfer, which pays off whenever the local computation would take longer than the remote computation plus the transfer. Furthermore, the local graphical user interface (GUI) remains responsive, without freezes caused by high GPU/CPU loads. The script supports all acceleration scenarios:
synchrotron sources,
wave propagation,
bent crystals,
multilayer reflectivity.
Script Components¶
The GPU accelerator script comprises two files located at tests/raycing/RemoteOpenCLCalculation:
zmq_server.py: The server script is the main component, responsible for receiving data and kernel names from the client. It listens on a predefined port, processes the received package, executes the specified kernel on the GPU and sends the computed data back to the client. The server script can be executed independently or in conjunction with the queue manager.

queue_device.py: The queue manager script handles multiple user requests and distributes computational tasks across multiple servers, providing scalability and load balancing. It can be executed on the same machine as the server or on a dedicated node; note that running it on a separate node may increase data transfer times.
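The request-reply flow between client and server can be sketched with Python's standard library alone (the real scripts use pyzmq sockets and execute OpenCL kernels; all names below are illustrative):

```python
# Illustrative request-reply round trip: a toy "server" receives a pickled
# request naming a kernel and carrying data, "computes", and replies.
import pickle
import socket
import threading

def serve_once(sock):
    conn, _ = sock.accept()
    with conn:
        request = pickle.loads(conn.recv(4096))
        # Stand-in for dispatching to a GPU kernel by name:
        if request["kernel"] == "square":
            result = [x * x for x in request["data"]]
        conn.sendall(pickle.dumps(result))

server = socket.socket()
server.bind(("127.0.0.1", 0))       # any free port
server.listen(1)
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

client = socket.socket()
client.connect(server.getsockname())
client.sendall(pickle.dumps({"kernel": "square", "data": [1, 2, 3]}))
reply = pickle.loads(client.recv(4096))
print(reply)  # [1, 4, 9]
client.close()
server.close()
```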
Running the Script¶
To execute the GPU accelerator script, follow these steps:
Set up the GPU server environment with the necessary dependencies: pyzmq, xrt and its underlying dependencies (numpy, scipy, matplotlib, pyopencl).
Start the server script, either as a standalone process or in conjunction with the queue manager, depending on your requirements.
Ensure that the client Python script is configured to connect to the correct server (or queue manager) address and port:
targetOpenCL="GPU_SERVER_ADDRESS:15559"
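In a client script this could look like the following hypothetical fragment (the address, port and the use of an undulator source are placeholders; any object accepting targetOpenCL can be configured this way):

```python
# Hypothetical fragment: direct the OpenCL calculations of a source to a
# remote server instead of a local device.
source = rsources.Undulator(
    bl,                               # your beamline
    targetOpenCL="192.0.2.10:15559")  # zmq_server or queue_device endpoint
```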