Manish Agrawal


Week 2 And 3

16 Jun 2019

GSoC 2019 :: gprMax Weekly Documentation


By the end of week 3, I was able to run a basic model with a source, a receiver and E/H field updates on OpenCL. Getting to this first working implementation of the solver involved many minor failures and errors along the way. The work from both weeks resulted in a merged pull request (#210) in the gprMax repository.

The PyOpenCL solver aims to generalize over any device or computing architecture. However, generalizing and optimizing for every device is beyond the scope of this early stage of the project. To show that the FDTD algorithm for computing the E and H fields can be simulated with OpenCL, I adapted the same kernel indexing techniques used in the CUDA FDTD solver. The PyOpenCL solver class is initialized by choosing an OpenCL platform and device. For now, only single-device computation is supported, i.e. one host and one OpenCL compute device.
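A minimal sketch of single-device selection with PyOpenCL, assuming the platform and device are chosen by index. The function names list_devices, format_menu and create_single_device_queue are hypothetical illustrations, not the API of the gprMax solver class. pyopencl is imported inside the functions that need it, so the menu formatting can run without OpenCL drivers.

```python
def list_devices():
    """Return [(platform_name, [device_name, ...]), ...] for every OpenCL platform."""
    import pyopencl as cl  # lazy import: format_menu() works without OpenCL installed
    return [(p.name, [d.name for d in p.get_devices()]) for p in cl.get_platforms()]


def format_menu(inventory):
    """Number every (platform, device) pair as "[p.d] platform :: device"."""
    lines = []
    for p_idx, (platform_name, device_names) in enumerate(inventory):
        for d_idx, device_name in enumerate(device_names):
            lines.append(f"[{p_idx}.{d_idx}] {platform_name} :: {device_name}")
    return lines


def create_single_device_queue(platform_index=0, device_index=0):
    """Create a context and command queue on exactly one device (single-device mode)."""
    import pyopencl as cl
    platform = cl.get_platforms()[platform_index]
    device = platform.get_devices()[device_index]
    context = cl.Context(devices=[device])
    queue = cl.CommandQueue(context, device)
    return context, queue
```

On a machine with OpenCL drivers, printing "\n".join(format_menu(list_devices())) shows the numbered choices, and create_single_device_queue(p, d) then builds the context and command queue for the chosen pair.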

In the CUDA solver, the kernel template is filled in and compiled into kernel code with PyCUDA's SourceModule. Since PyOpenCL doesn't have a native template rendering tool, I have used Jinja2, a popular template engine widely used in web development. In our case, Jinja2 takes the .cl kernel template and renders it into kernel source code, which is then passed to PyOpenCL's Program class to be built. To pass var1 as a value into a Jinja2 template, {{ var1 }} is used, and to add comments to a Jinja2 template, {# comment #} is used.
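A small sketch of the rendering step, assuming a toy kernel rather than one of the actual gprMax .cl templates (the template text, the kernel name scale and the render_kernel helper are hypothetical). It shows {{ ... }} substituting values and {# ... #} being dropped from the output.

```python
from jinja2 import Template

# Toy kernel template (hypothetical, not a gprMax template).
# {{ ... }} inserts a value at render time; {# ... #} is a Jinja2 comment
# that does not appear in the rendered output.
KERNEL_TEMPLATE = """\
{# Scale every element of a 1D field by a constant factor #}
#define N {{ n }}
__kernel void scale(__global {{ real }} *field, const {{ real }} factor)
{
    int idx = get_global_id(0);
    if (idx < N) {
        field[idx] *= factor;
    }
}
"""


def render_kernel(n, real="float"):
    """Render the template into plain OpenCL C source code."""
    return Template(KERNEL_TEMPLATE).render(n=n, real=real)


source = render_kernel(n=16)
print(source)
```

In a solver the template would more likely be loaded from a .cl file (for example through a jinja2.Environment with a FileSystemLoader), and the rendered string passed to pyopencl.Program(context, source).build().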

Constant Global Variables

Another major struggle was passing the updatecoeffsE and updatecoeffsH arrays into the device's global constant memory. OpenCL kernels are written in OpenCL C, which is based on C99 with some restrictions, and a constant can be declared at program scope by adding the __constant qualifier to the variable. Since updatecoeffsE/H are not fixed but vary with the model file, their values cannot be hard-coded into the kernel files. Nor can they be declared with dummy values with the intention of writing the actual updatecoeffsE/H values later by calling some function from the host: the variables are already declared constant, so a compilation error is raised. To resolve this, the updatecoeffsE/H values were passed through Jinja2 template rendering, so that the constant arrays are initialized with the actual values at declaration. I find this a simple and effective way of resolving the issue, and I would be happy to learn if you know a newer or simpler approach.
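A minimal sketch of rendering the coefficients into a program-scope __constant array, assuming the coefficients arrive as a 2D table (one row per material) that is flattened row-major. The template string and the render_constant_array helper are hypothetical; only the name updatecoeffsE comes from the text above.

```python
from jinja2 import Template

# Hypothetical program-scope declaration. A __constant variable at program
# scope must be initialised where it is declared, so the values are written
# into the source before the program is built.
COEFFS_TEMPLATE = Template(
    "__constant {{ real }} {{ name }}[{{ size }}] = { {{ values }} };"
)


def render_constant_array(name, rows, real="float"):
    """Flatten a 2D coefficient table in row-major order and render it as an
    OpenCL C __constant array initialiser."""
    flat = [float(v) for row in rows for v in row]
    # repr() gives a valid C floating literal (e.g. 0.5, 1e-05) for finite
    # values; the f suffix keeps the literals single precision.
    suffix = "f" if real == "float" else ""
    values = ", ".join(repr(v) + suffix for v in flat)
    return COEFFS_TEMPLATE.render(real=real, name=name, size=len(flat), values=values)


print(render_constant_array("updatecoeffsE", [[1.0, 0.5], [0.25, 1e-05]]))
# __constant float updatecoeffsE[4] = { 1.0f, 0.5f, 0.25f, 1e-05f };
```

Because the values become part of the program source, the program is built separately for each model file, which is the price of keeping the arrays truly __constant.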

Setting Work-Item and Work-Group Sizes

Running the model file user_models/cylinder_Ascan_2D.in requires updating the E/H fields across 12600 cells. Although the cells are laid out in three dimensions, flattening them into a single dimension is a simple and effective way of understanding how the work-items behave. In OpenCL, each work-item is responsible for updating the E/H field values of a single cell, and a collection of work-items forms a work-group. Every device has fixed limits, MAX_WORK_GROUP_SIZE and MAX_WORK_ITEM_SIZES; in this case they were 256 and (256, 256, 256) respectively. This means that a single work-group can contain at most 256 work-items in total, and at most 256 work-items along any one dimension. This determines the global_work_size and local_work_size that must be passed when calling the kernel function: global_work_size is (nx*ny*nz, 1, 1), and local_work_size can be set to None, which leaves the choice of work-group size to the OpenCL implementation.
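The flattened indexing can be sketched in plain Python, assuming row-major (C) order with k varying fastest; the helper names global_work_size and cell_index are hypothetical. The grid 120 x 105 x 1 is used only as one factorization that gives 12600 cells.

```python
def global_work_size(nx, ny, nz):
    """One work-item per cell, with the 3D grid flattened into one dimension."""
    return (nx * ny * nz, 1, 1)


def cell_index(gid, ny, nz):
    """Map a flat global id back to (i, j, k), assuming row-major order
    (k fastest). A kernel would do the same integer arithmetic on
    get_global_id(0)."""
    i = gid // (ny * nz)
    j = (gid % (ny * nz)) // nz
    k = (gid % (ny * nz)) % nz
    return i, j, k


print(global_work_size(120, 105, 1))  # (12600, 1, 1)
print(cell_index(12599, 105, 1))      # (119, 104, 0)

# With PyOpenCL the launch would look like
#   program.update_e(queue, global_work_size(nx, ny, nz), None, *kernel_args)
# where None leaves the work-group size to the OpenCL implementation;
# update_e and kernel_args are placeholder names.
```

If a local size is given explicitly instead, OpenCL 1.x requires the global size to be a multiple of it, so the global size is usually rounded up and the kernel guards its body with a bounds check such as if (idx < N).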

Finally, after working through numerous other minor errors, the model file was successfully simulated. The output was validated against the corresponding reference output, and the simulation is acceptable since the difference between the two is less than 0 dB. I have tested the PyOpenCL solver on Intel CPUs, Intel graphics cards and NVIDIA GPUs on Google Colab, and the code runs smoothly on all of these platforms.
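How the difference in dB is computed is not spelled out above. One common way to express it, assumed here rather than taken from gprMax's own test scripts, is the largest absolute difference relative to the peak of the reference trace, 20 log10(max|test - ref| / max|ref|). Under that definition, a value below 0 dB means the worst-case error is smaller than the reference's peak amplitude, and more negative values mean closer agreement. The helper max_difference_db is hypothetical.

```python
import math


def max_difference_db(test, ref):
    """Largest absolute sample-wise difference between two equal-length
    traces, in dB relative to the peak absolute value of the reference."""
    if len(test) != len(ref):
        raise ValueError("traces must have the same length")
    peak = max(abs(r) for r in ref)
    if peak == 0:
        raise ValueError("reference trace is all zeros")
    worst = max(abs(t - r) for t, r in zip(test, ref))
    if worst == 0:
        return float("-inf")  # identical traces
    return 20 * math.log10(worst / peak)


reference = [0.0, 1.0, -2.0, 0.5]
candidate = [0.0, 1.002, -2.0, 0.5]
print(round(max_difference_db(candidate, reference), 1))  # -60.0
```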