CS4435b - CS9624b - High Performance Computing with a Focus on Hardware Acceleration Technologies

University of Western Ontario
Computer Science Department
Date: January 4, 2010

Current hardware improvements focus on increasing the number of computations that can be performed in parallel rather than on increasing clock speed alone. This change has brought multi-processor workstations to the desktop, expanding interest in parallel algorithms and software capable of exploiting these computing resources. At the same time, these new hardware acceleration technologies stress the need for a deeper understanding of performance issues in software design.

The aim of this course is to introduce you to the design and analysis of algorithms and software programs capable of taking advantage of these new computing resources. The following concepts will guide our quest for high performance: parallelism, scalability, granularity, locality, cache complexity, synchronization, scheduling and load balancing.

By the end of the course, you are expected to have an in-depth understanding of the following subjects:

A quarter of the course will give an overview of other hot topics in high performance computing, including the following:

Prerequisites for undergraduate students

CS 210, 211, 305. A good familiarity with linear algebra and complexity analysis of algorithms is recommended.


The course outline presents the contents of the course, its assignments, quizzes and projects: outline.html

Schedule of Lectures

Week Jan. 4-10 Introduction to software performance
Week Jan. 11-17 Introduction to multicore programming
Week Jan. 18-24 Multithreading parallelism and performance
Week Jan. 25-31 Analysis of multithreaded algorithms
Week Feb. 1-7 Practical issues in parallelism
  Case Study
Week Feb. 8-14 Synchronizing without locks
Week Feb. 22-28 Cache memories
Week Mar. 1-7 Practical issues with locality
Week Mar. 8-14 Cache complexity
Week Mar. 15-21 Bit Hacks
Week Mar. 22-28 Space-time tradeoffs
Week Mar. 29-Apr. 4 Experiences with coding high-performance numerical libraries
Week Apr. 19-26 Project presentations

Student evaluation

The course is strongly oriented toward quizzes, assignments and projects. There is no midterm examination and no final examination. However, there will be at least four quizzes, which constitute 20% of the course mark for both CS 4435 and CS 9624. For CS 4435, assignments and projects each constitute 40% of the course mark. For CS 9624, projects constitute the remaining 80% of the course mark.

Some keywords (to be completed)

These are links to the Wikipedia pages of some keywords for this course.

Some other courses on HPC and parallelism

These are links to courses on HPC and parallelism which have inspired CS4435b - CS9624b. In particular, I am grateful to Saman P. Amarasinghe (MIT), Matteo Frigo (Intel), Charles E. Leiserson (MIT) and Markus Pueschel (CMU) for sharing with me the sources of their course notes and other documents.

Concurrency platforms

These are links to programming languages for multi-threaded parallelism.

Some HPC libraries in scientific computing

These are links to some HPC libraries in scientific computing.
This software makes use of auto-tuning techniques.

Performance analyzers and debuggers

These are links to some performance analyzers and debuggers.

Some links to hardware architecture pages

These are links to hardware architecture pages related to this course.

Some conferences on HPC and parallelism