CS 9624 and CS 4435 - High Performance Computing: From Models of Computation to Applications

University of Western Ontario
Computer Science Department
Date: January 3, 2011



Once upon a time, everything was slow on a computer. Today, hierarchical memories, multi-cores, instruction-level parallelism have change the game.

In this course we will present theoretical models that permit to achieve high-performance on today's computers. This will be illustrated by various applications.

Today's parallel hardware architectures (multicores, graphics processing units, etc.) and computer memory hierarchies (from processor registers to hard disks via successive cache memories) enforce revisiting many fundamental algorithms which were often designed with algebraic complexity as the main complexity measure and with sequential running time as the main performance counter.

Indeed, on modern computers minimizing the number of arithmetic operations is no longer the primary factor affecting performance. Effective utilization of the underlying architecture (increasing the amount of instruction level parallelism, better utilizing the memory hierarchy, etc.) can be much more important. Many examples exist where algorithms that use more arithmetic operations outperform algorithms with fewer arithmetic operations.

In this course, we will present different theoretical tools (cache complexity, multithreaded parallelism, space-time tradeoffs, code optimization for parallelism and locality, etc.) that are adapted for taking best advantage of parallel architectures and hierarchical memories.

We will discuss a large variety of applications (finite-difference methods, compression algorithms, artificial intelligence, metabolic networks, etc.) and compare the implementation techniques of a given algorithm on different hardware architectures (multicores, GPGPUs).

To have a more detailed idea about this course, please visit the web site of a course I taught last year and from which I will borrow part of the materials for this year's course: CS 4435 and CS 9624 - High Performance Computing with a Focus on Hardware Acceleration Technologies.

Follow this link for various resources (software tools and tutorials, hardware documentation, conferences, other HPC course web sites, etc.) regarding this course and HPC in general.

Prerequisites for undergraduate students

CS 210, 211, 3305. A good familiarity with linear algebra and complexity analysis of algorithms is recommended.

Outline

This presents the contents of the course, its assignments, quizzes and projects. outline.html

Lecture notes:

  • An Introduction to Software Performannce Engineering: slides and handouts.
  • Multithreaded Parallelism and Performance Measures slides and handouts.
  • Analysis of Multithreaded Algorithms slides and handouts.
  • Cache memories: complexity analysis and practical issues. slides and handouts.
  • Space - time tradeoffs. slides and handouts.
  • Parallel Generation of Transversal Hypergraphs (with Charles E. Leiserson, Liyun Li and Yuzhen Xie) slides
  • Problem sets:

  • Problem set 1
  • Problem set 2