The continual exponential speed increases that resulted from the realization of Moore's Law have come to an end, at least for the foreseeable future. As a result, attention has shifted to techniques for parallelizing algorithms as the best current way to achieve massive speedups. Two common approaches are multicore parallelization (either in a single machine or across a networked collection of machines) and the use of Graphics Processing Units (GPUs) for general-purpose parallel computing.
We will begin this course by studying basic concepts of single-machine multithreaded programming. We will then extend these ideas to multicore programming across a network of machines. Finally, we will turn our attention to GPUs, first studying the key hardware architectural concepts needed to understand not only how GPU programming works but, more importantly, how to achieve optimal performance on GPUs. Our study of hardware architectures will be at a fairly high functional level, sufficient to understand how software must be designed to exploit the potential the hardware offers.
We will study and/or implement algorithms using all of these approaches. We will use common software APIs, languages, and tools including (but not necessarily limited to):
Prerequisite: EECS 448
Textbook: Gerassimos Barlas, Multicore and GPU Programming: An Integrated Approach (Morgan Kaufmann).