Manish Agrawal


Week 9 And 10

04 Aug 2019 »

GSoC 2019 :: gprMax Weekly Documentation

Very happy for passing second evaluation :D. Received some constructive appreciation from my mentor for the work that is being carried out.

Week 10

Since beginning and prior to this project, the issue of memory profiling and time profiling is a continuing issue. Be it for CPU accelerated or CUDA based accelerated simulations, the estimatation of memory usage has been a priority. In this article, I will try to reflect on what tools I have come across and the methods that are being used currently for making the profiling of the OpenCL based acceleration efficient.

Time Profiling

OpenCl supports event profiling through CommandQueue properties. CommandQueues are used as context managers which are initiated by triggering PyOpenCl.command_queue_properties.PROFILING_ENABLE. Find here a simple example for time profiling a simple kernel executions. For every kernel executions that are triggered in the PyOpenCl Solver return an event which can be used for taking note of time consumed across that kernel executions. If event is the returned Event PyOpenCl class instance than

event.profile.end - event.profile.start

gives the device specific time taken for kernel executions.

Memory Profiling

Unlike PyCUDA, PyOpenCl doesn’t have any support for memory profiling. There are no inbuilt tools which can be used to get the information of device memory used at any instance. PyOpenCl author, Andreas Kloeckner advised to monkeypatch the cl.Buffer constructor to instrument the same. But with discussions with my mentor we decided to only instrument memory allocation for PyOpenCl’s Array. It is quite simply to take into account the total device memory which is being allocated from the host side. Practically, these are those arrays which are transferred to Device from Host using cl_array.to_device() function. Since we know the type and size of every array we can iteratively add all these transfers and come up with estimated memory allocation that will take place in the device side.

Doing so, we come up with the OpenCl Time and Memory Profiling. Since, the work was an abstract method of estimating variables, there was a need for profiling the Solver with some external softwares. Till some time ago, nvvp NVIDIA Visual Profiler used to support profiling of OpenCl based softwares. But they have removed this support in their newer versions of application. So there are no such softwares which can profile a Nvidia GPU based OpenCl softwares. Running nvidia-smi at a refresh rate of 2/sec we were able to make account that the OpenClSolver used additional of ~42 MB of memory in the Nvidia GPU device over the estimated memory usage. So, accordingly the estimated memory usage is reported. CodeXL is available for AMD OpenCl software profilings, and Intel too have their OpenCl software profiler.

The PR #219 is the work being accepted for time and memory profiling feature into OpenCl Solver.