I am currently an Assistant Professor in the Department of Computer Science at the University of Western Ontario. I am a member of the Ontario Research Center for Computer Algebra (ORCCA). My PhD is from the University of Western Ontario and was supervised by Professor Marc Moreno Maza. I also received a BSc (Hons) in Computer Science (Software Engineering) from Memorial University of Newfoundland.
I am also currently the coach of Western's varsity heavyweight men's rowing team. Go Mustangs!
My research interests focus on software engineering and software performance engineering for high-performance and scientific computing.
I am currently working on the Basic Polynomial Algebra Subprograms (BPAS) library.
This library provides fast, optimized symbolic (and symbolic-numeric) polynomial algebra routines, including polynomial arithmetic, polynomial system solving, and related operations. The library is designed to exploit shared-memory multiprocessors and data locality. Moreover, template metaprogramming is used to provide compile-time type safety for mathematical and algebraic objects.
I am researching models of computation, concurrency platforms, and software design patterns which better support irregular parallelism. A large class of often overlooked applications exhibit irregular parallelism, where the opportunities for concurrency are not known ahead of time and must be dynamically found and exploited. Further, irregular parallel applications often experience dynamic data generation. When that generated data grows extremely large, we may consider the application to be "dynamically data-intensive". Therefore, my work considers irregular parallelism simultaneously with data locality and cache complexity.
The development and optimization of parallel programs is a challenging endeavor, requiring substantial expertise, ad hoc knowledge, and intuition. The programmer must understand the runtime dynamics of the parallel program and carefully choose program parameters. This includes the number of threads, their affinity, and their priority. In GPU programming, this includes the grid and thread block configurations.
We are developing tools and profilers to measure, understand, and diagnose the performance of irregular parallel applications. The resulting data can be used to inform an auto-tuning algorithm toward improved performance. To ease the optimization of GPU programs, we are developing KLARAPTOR. This tool uses compile-time analysis to dynamically decide the best kernel launch parameters (e.g., thread block configuration) for a particular kernel invocation, taking into consideration the target device as well as the dynamic data sizes of that invocation.
Last Modified November 10, 2022