**
**

**
University of Western Ontario
Computer Science Department
**

**Date:** September 7, 2009

*
*

*
*

*
*

**Current hardware improvements have focused on increasing
the number of computations that can be performed
in parallel rather than on increasing clock speed alone.
This change has brought multi-core workstations
to the desktop, expanding interest in parallel algorithms
and software capable of exploiting these computing resources.
**

**The aim of the course is to introduce you to the art of
analyzing and designing parallel algorithms. The following
concepts will guide our quest for high performance:
finding parallelism, scalability,
granularity, locality, synchronization,
scheduling and load balance.
**

**Out of the course, you are anticipated to
have an in depth understanding of
some important parallel machine models,
parallel complexity theory,
fundamental parallel algorithms
(for sorting, linear algebra, FFT)
and parallel programming environments
and tools such as MPI, OpenMP, Cilk, Kaapi, UPC, Linpack.
**

**Prerequisites.**- CS 210a/b, 211a/b, 305a/b.
A good familiarity with
*linear algebra*and*complexity analysis of algorithms*is recommended. **Outlines.**- This presents the contents of the course, its assignments, quizzes and projects.
`outline.html` **Lectures (tentative schedule).**Week Jan. 7-13 `Overview of the course``Performance and scalability of parallel algorithms`Week Jan. 14-20 `Parallel machine model`Week Jan. 21-27 `Parallel Complexity Theory (I)`Week Jan. 28-3 `Parallel Complexity Theory (II)`Week Feb. 4-10 `Parallel Sorting`Week Feb. 18-24 `Parallel Matrix Operations`Week Mar. 3-9 `Solving systems of linear equations in parallel`Week Mar. 3-9 Parallel programming environments (I) Week Mar. 10-16 Scheduling, load balancing (I) Week Mar 17-23 Scheduling, load balancing (II) Week Mar 24-30 Code optimization for parallelism and locality Week Mar 31-6 Parallel programming environments (II) Week Apr. 7-13 Project presentations **Student Evaluation.**- The course is very oriented toward assignments and projects. Assignments and projects constitute 40% and 40% of the course mark, respectively. There is no midterm examination and no final examination. However, there will be at least four quizzes. Quizzes constitute 20% of the course mark.
**Assignments and projects for Winter 2008.**`Assignment 1 (compressed postscript).``Assignment 1 (html pages).`Posted Feb. 7 Due Feb. 21 **Software and Courses.**- These are links to some books and software related to the course.
`MPICH-A Portable Implementation of MPI``Development Tools for MPICH``CS 402/CS 535: Parallel and Distributed Computing`at UWO`CS 424b - CS 556b Foundations of Computational Algebra`at UWO`Parallel Algorithms (WISM 459, 2005/2006)`by Rob Bisseling`Topics in Parallel Computing (CS 838, Univ. of Wisconsin-Madison, Spring 1999)`by Pavel Tvrdik`Parallel Algorithms (CS 662, San Diego Univ., Spring 1996)`by Roger Whitney`U.C. Berkeley CS267/EngC233 Home Page`by James Demmel`MIT 6.895 Fall 2003`by Charles E. Leiserson

**Conferences.**`ACM Workshop on Parallel Symbolic Computation 2007``20th ACM Symposium on Parallelism in Algorithms and Architectures``PARALLEL PROCESSING AND APPLIED MATHEMATICS``SIAM Conference on Parallel Processing for Scientific Computing``Some Compiler and Parallel Computing Conferences``IEEE International Parallel & Distributed Processing Symposium``ACM Symposium on Parallelism in Algorithms and Architectures``Some Theoretical Computer Science Conferences`

**Papers on Parallel Scientific Computing.***Estimating Execution Time of Distributed Applications*by Maciej Drozdowski*pARMS: A Package for Solving General Sparse Linear Systems on Parallel Computers*by Y. Saad and M. Sosonkina*Fast Scheduling and Partitioning Algorithm in the Multi-processor System with Redundant Communication Resources*by Eryk Laskowski*Evaluation of Parallel Programs by Measurement of Its Granularity*by Jan Kwiatkowski*Parallel Triangular Decompositions of an Oil Refining Simulation*by Xiaodong Zhang*Parallel Algorithms for Arrangements*by Richard Anderson, Paul Beame and Erik Brisson*Efficient parallel computation of arrangements of hyperplanes in d dimensions*by Torben Hagerup, Hermann Jung and Emo Welzl*Recursive Array Layouts and Fast Parallel Matrix Multiplication*by Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala and Mithuna Thottethodi*A Quantitative Comparison of Parallel Computation Models*by Ben H.H. Juurlink and Harry A.G. Wijshoff*Towards Efficiemlcy and Portability: Programming with the BSP Model*by Mark Gouclreau, Kevin Lang Satish Rao, Torsten Suel and Thanasis Tsantilas*A BSP Parallel Model for the Göttfert Algorithm over F2*by Fatima Abu Salem*Fast Rectangular Matrix Multiplications and Improving Parallel Matrix Computations*by Xiaohan Huang and Victor Y. Pan*NC*by T. Gautier and J.L. Roth^{2}Computation of Gcd-free Basis and Application to Parallel Algebraic Numbers Computation*Processor-Efficient Parallel Matrix Inversion over .Abstract Fields: Two Extensions*by W. Eberly*Complexity of the Wu-Ritt decomposition*by Á. Szántó*Distributed Partial Evaluation*by Michael Sperber, Herbert Klaeren and Peter Thiemann*Processor Efficient Parallel Solution of Linear Systems over an Abstract Field*by E. Kaltofen and V. Pan*On the Parallel Cbmplexity of Matrix Factorization Algorithms*byMauro Leoncini, Giovanni Manzi and Luciano Margara*Multidimensional, Multiprocessor, Out-of-Core FFTs with Distributed Memory and Parallel Disks*by Lauren M. Baptist and Thomas H. Cormen*Multithreaded Algorithms for the Fast Fourier Transform*by Parimala Thulasiraman, Kevin B. Theobald, Ashfaq A. Khokhar and Guang R. Gao*How much can we speedup Gaussian Elimination with Pivoting?*by M. Leoncini*Communication Efficient Matrix Multiplication on Hypercubes*by Himanshu Gupta and P. Sadayappan*Distributed Data Structures and Algorithms for Gröbner Basis Computation*by SOUMEN CHAKRABARTI and KATHERINE YELICK*On the Correctness of a Distributed Memory Gröbner Basis Algorithm*by SOUMEN CHAKRABARTI and KATHERINE YELICK*Extended Parallelism in the Gröbner Basis Algorithm*by Stephen A. Schwab*A Fine-Grained Parallel Completion Procedure*by Reinhard Bündgen, Manfred Göbel and Wolfgang Küchlin*Parallelization of The Sparse Modular GCD Algorithm for Multivariate Polynomials on Shared Memory Multiprocessors*by Mohamed Omar Rayes, Paul S. Wang and Kenneth Weber*Parallel Computation of Modular Multivariate Polynomial Resultants on a Shared Memory Machine*by H. Hong and H. W. Loidl*A Parallel Completion Procedure for Term Rewriting Systems*by Katherine Yelick and Stephen Garland*Programming Models for Irregular Applications*by Katherine Yelick*Data Structures for Irregular Applications*by Katherine Yelick*Strategy-Accurate Parallel Buchberger Algorithms*by G. Attardi and C. Traverso*Work Efficient Parallel Solution of Toeplitz Systems and Polynomial GCD*by John H. Reif*Processor-Efficient Parallel Matrix Inversion over .Abstract Fields: Two Extensions*by W. Eberly*A New Approach to Parallel Computation of Polynomial GCD and to Related Parallel Computations over Fields and Integer Rings*by Victor Y. Pan*Processor-Efficient Parallel Computation of Polynomial Greatest Common Divisors*by Erich Kaltofen