Calculations on GPU

The GPU can be used for several types of calculations, e.g. those of an undulator source. The gain is enormous; you should try it. Even on an “ordinary” Intel processor, the execution with OpenCL becomes significantly faster.

Here are some benchmarks on a system with Intel Core i7-3930K 3.20 GHz CPU, ASUS Radeon R9 290 GPU, Python 2.7.6 64-bit, Windows 7 64-bit. Script examples/withRaycing/01_SynchrotronSources/synchrotronSources.py, 1 million rays, execution times in seconds:

    mode                time [s]   notes
    CPU, 1 process      5172       1 CPU process loaded
    CPU, 10 processes   1245       with heavily loaded system
    openCL with CPU      163       with highly loaded system
    openCL with GPU      132       with almost idle CPU

You will need AMD/NVIDIA drivers (only if you have a GPU; having one is not a must), a CPU-only OpenCL runtime, pytools and pyopencl.

Note

When using OpenCL, no further parallelization is possible by means of multithreading or multiprocessing. You should turn them off by using the default values processes=1 and threads=1 in the run properties.
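As an illustration only, assuming that plots, beamLine and the repeats value are defined elsewhere in your script, the call to the runner with these settings may look like this:

    import xrt.runner as xrtr

    # With an OpenCL-enabled source, keep the default serial settings
    # processes=1 and threads=1, so that OpenCL is the only parallelization layer.
    xrtr.run_ray_tracing(
        plots,              # your list of xrt plots, defined elsewhere
        repeats=40,         # number of ray-tracing iterations (example value)
        beamLine=beamLine,  # your beamline object, defined elsewhere
        processes=1,        # default; do not increase when OpenCL is used
        threads=1)          # default; do not increase when OpenCL is used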

Please run the script tests/raycing/info_opencl.py to get information about your OpenCL platforms and devices. You can then pass the proper indices in the lists of platforms and devices as parameters to pyopencl methods or, alternatively, pass ‘auto’ to targetOpenCL.
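A minimal sketch of such an enumeration with pyopencl, similar in spirit to what info_opencl.py reports, could look like this:

    import pyopencl as cl

    # Print the available OpenCL platforms and their devices together with
    # the (platform, device) indices that can be given to targetOpenCL.
    for iplatform, platform in enumerate(cl.get_platforms()):
        print("platform {0}: {1}".format(iplatform, platform.name))
        for idevice, device in enumerate(platform.get_devices()):
            print("    device {0}: {1}".format(idevice, device.name))

With the indices known, they can be passed to targetOpenCL, e.g. targetOpenCL=(0, 0), or you can simply use targetOpenCL='auto'.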

Important

Consider the warnings and tips on using xrt with GPUs.

Hint

Consider also Speed tests for a few selected cases.

OpenCL Server

This example contains a Python script designed to run on a GPU server, leveraging ZeroMQ (ZMQ) for efficient data transfer. The script acts as a remote accelerator device, receiving data from a client Python script, performing calculations on the GPU, and returning the results for plotting on a local computer.

Users can seamlessly execute their scripts in their favorite IDE, offloading resource-intensive calculations to a remote server over the network. The only trade-off is the delay due to data transfer, which pays off whenever the local computation would take longer than the transfer itself. Furthermore, the local graphical user interface (GUI) remains responsive, without freezes or issues caused by high GPU/CPU loads. This script now supports all acceleration scenarios:

  • synchrotron sources,

  • wave propagation,

  • bent crystals,

  • multilayer reflectivity.

Script Components

The GPU accelerator script consists of two files located at tests/raycing/RemoteOpenCLCalculation:

  1. zmq_server.py: The server script is the main component, responsible for receiving data and kernel names from the client. It listens on a predefined port, processes the received package, executes the specified kernel on the GPU and sends the computed data back to the client (a minimal sketch of this request-reply loop is given after this list). This server script can be executed independently or in conjunction with the queue manager.

  2. queue_device.py: The queue manager script facilitates the handling of multiple user requests and the distribution of computational tasks across multiple servers. It provides scalability and load balancing capabilities. The queue manager can be executed on the same machine as the server or on a dedicated node. However, when running the queue manager on a separate node, data transfer times may increase.
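The following is only an illustrative sketch of the request-reply pattern that such a server builds on, not the actual zmq_server.py; the port number matches the client configuration shown below, and compute_on_gpu is a hypothetical placeholder for the OpenCL dispatch:

    import zmq

    def compute_on_gpu(kernel_name, arrays):
        """Placeholder for the actual OpenCL computation selected by kernel name."""
        raise NotImplementedError

    context = zmq.Context()
    socket = context.socket(zmq.REP)   # reply socket: one answer per request
    socket.bind("tcp://*:15559")       # port used in the client configuration below

    while True:
        request = socket.recv_pyobj()  # pickled dict with the kernel name and input arrays
        result = compute_on_gpu(request['kernel'], request['arrays'])
        socket.send_pyobj(result)      # return the computed arrays to the client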

Running the Script

To execute the GPU accelerator script, follow these steps:

  1. Set up the GPU server environment with the necessary dependencies: pyzmq, xrt and its underlying dependencies (numpy, scipy, matplotlib, pyopencl).

  2. Start the server script, either as a standalone process or in conjunction with the queue manager, depending on your specific requirements.

  3. Ensure that the client Python script is configured to connect to the correct server (or queue manager) address and port:

targetOpenCL="GPU_SERVER_ADDRESS:15559"
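As an illustration only, the undulator parameters below are placeholders; the essential point is that the server (or queue manager) address and port are passed via targetOpenCL in the source definition:

    import xrt.backends.raycing.sources as rsources

    # Illustrative sketch: eMin/eMax are example values; targetOpenCL carries
    # the remote server address and port instead of a local device selection.
    source = rsources.Undulator(
        beamLine, 'undulator',
        eMin=9000, eMax=9100,
        targetOpenCL="GPU_SERVER_ADDRESS:15559")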