2013 Short Course on Parallel Programming


Homework Assignments
Although accounts on the parallel server for the hands-on activity are only available to on-site attendees, online attendees are welcome to do the homework on their own platforms.
To view the hands-on lab assignments:

-For those who have registered with XSEDE:

-Those who have not registered with XSEDE may view the assignments via the local site:


Monday, August 19

*9:00 - 9:30- Introduction and Welcome (Jim Demmel, UCB)
Slides and Video
Greeting, overview, and logistics.

*9:30 - 12:00- Introduction to Parallel Architectures and Pthreads (John Kubiatowicz, UCB)

Slides and Video

Why parallelism is our future, and what programmers need to know about the hardware in order to write efficient programs. We also introduce parallel programming with Pthreads. (Includes a 30-minute break.)

*12:00 - 1:15- Lunch

*1:15 - 2:15- Shared Memory Programming with OpenMP- Basics (Tim Mattson, Intel)

Slides and Video

We introduce OpenMP, an industry-standard API for programming shared-memory computers. OpenMP provides a simple path for programmers to get started with parallel programming. In this lecture, we'll focus on the core features of the original versions of OpenMP.
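As an illustration of the kind of core feature this session covers, here is a minimal sketch of an OpenMP parallel loop with a reduction. It is not taken from the lecture materials; the function name approx_pi and the choice of numerical integration as the workload are assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical example: approximate pi by midpoint-rule integration
 * of 4/(1+x^2) over [0,1]. The loop iterations are independent, so
 * OpenMP can split them across threads; the reduction clause gives
 * each thread a private partial sum and combines them at the end. */
double approx_pi(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = ((double)i + 0.5) * step;  /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

When compiled without OpenMP support, the pragma is ignored and the same function runs serially, which is one reason OpenMP is considered an incremental path into parallel programming.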

*2:15 - 3:00- More about OpenMP- New Features (Tim Mattson, Intel)

Slides and Video

Since its introduction in 1997, OpenMP has grown beyond simple parallel loops. In this lecture we'll explore the more recent features, with an emphasis on the tasking model added in OpenMP 3.0 and the classes of algorithms this model supports.
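To show what the tasking model looks like in practice, here is a minimal sketch using recursive Fibonacci, a common teaching example for irregular, recursive parallelism. It is not taken from the lecture materials; the names fib_task and fib and the cutoff value of 20 are assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical example: each recursive call becomes an OpenMP task.
 * The if() clause stops deferring tasks for small n, where the work
 * is too little to justify the scheduling overhead. */
long fib_task(int n)
{
    assert(n >= 0);
    if (n < 2)
        return n;

    long a, b;
    /* a and b must be shared so the child tasks write to the
     * parent's variables rather than to private copies. */
    #pragma omp task shared(a) if(n > 20)
    a = fib_task(n - 1);
    #pragma omp task shared(b) if(n > 20)
    b = fib_task(n - 2);
    #pragma omp taskwait  /* wait for both child tasks to finish */
    return a + b;
}

/* Creates the thread team; one thread starts the recursion and the
 * others pick up the tasks it generates. */
long fib(int n)
{
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Calling fib(25) returns 75025. The pattern of one thread generating tasks inside a single region while the rest of the team executes them applies to other recursive and irregular algorithms, such as tree traversals.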

*3:00 - 3:30- Break

*3:30 - 4:30- Talk and Demo: Performance Tuning Random Slowdowns in Recurring Functionalities (Gary Carleton, Intel)


This session discusses Frame Analysis, one of the lesser-known features of VTune™ Amplifier XE, which can be used to find out why recurring application operations occasionally slow down at random times. Examples include video frame rates, internet search engine response times, … In the demo we will profile a commercial application to determine why its video frame rate sometimes drops as the app runs.

*4:30 - 5:00- Break/ Transition to Rooms

*5:00 - 6:00- Parallel Sessions:
1. Introduction to NERSC Tools (Room TBD)
2. Introduction to OpenMP (Room TBD)

*6:00 - 7:00- Informal Meet & Greet Reception (Soda Hall, 5th Floor)

Tuesday, August 20

*8:45 - 9:45- Programming Distributed Memory Systems with MPI (Tim Mattson, Intel)

Slides and Video

*9:45 - 10:45- Sources of Parallelism and Locality in Simulation (Jim Demmel, UCB)

Slides and Video

We show how to recognize recurring opportunities to exploit parallelism in simulating real or artificial "worlds", as well as opportunities to minimize data movement.

*10:45 - 11:15- Break

*11:15 - 12:15- Architecting Parallel Software with Patterns (Kurt Keutzer, UCB)

Slides and Video

We give an overview of design patterns and how complex parallel software systems can be architected with them.

*12:15 - 1:30- Lunch

*1:30 - 2:30- An Introduction to GPU, CUDA, and OpenCL (Bryan Catanzaro, NVIDIA Research)

Slides and Video

GPUs (Graphics Processing Units) have evolved into programmable manycore parallel processors. We will discuss the CUDA and OpenCL programming models, GPU architecture, and how to write high performance code on GPUs.

*2:30 - 3:00- Break/ Transition to Rooms (373, 380, and 405) for OpenMP and NERSC Tools (all rooms are located in Soda Hall)

*3:00 - 6:00- Hands-on Activities (NERSC and OpenMP)

Wednesday, August 21

*8:45 - 9:45- Partitioned Global Address Space Programming with Unified Parallel C (UPC) (Kathy Yelick, UCB and LBL)

Slides and Video

The largest and highest-performance computers have distributed memory instead of shared memory, and are programmed using message passing (MPI) or new languages like UPC.

*9:45 - 10:15- Break

*10:15 - 12:15- Computational Patterns and Autotuning (Jim Demmel, UCB)


We discuss several recurring computational patterns (e.g., linear algebra and stencils) whose fastest implementations are written automatically by other programs called autotuners.
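For readers unfamiliar with the stencil pattern, here is a minimal sketch of a one-dimensional 3-point averaging stencil. It is not taken from the lecture materials; the function name stencil3 and the boundary handling are assumptions made for this example. An autotuner would generate and time many variants of a kernel like this (for instance, different blocking or unrolling choices) and keep the fastest; that search is not shown here.

```c
#include <assert.h>

/* Hypothetical example: each interior output point is the average of
 * the corresponding input point and its two neighbors. The two
 * boundary points are copied through unchanged. */
void stencil3(const double *in, double *out, int n)
{
    assert(n >= 1);
    out[0] = in[0];
    out[n - 1] = in[n - 1];

    /* Every output depends only on the input array, so the
     * iterations are independent and can run in parallel. */
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }
}
```

Because each point reads only a fixed, small neighborhood, performance is usually limited by data movement rather than arithmetic, which is why tuning the memory access order matters for this pattern.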

*12:15 - 1:30- Lunch

*1:30 - 2:30- Performance Debugging: Methods and Tools (David Skinner, LBL)

Slides and Video

When a parallel program runs slower than expected, "performance debugging" may be done most effectively using a variety of tools that automatically instrument and display performance data.

*2:30 - 3:30- Cloud Computing using MapReduce, Hadoop, Spark (Matei Zaharia, UCB)

Slides and Video

Cloud computing allows users to easily exploit large commercial compute clusters available at many companies. We discuss programming tools (e.g., Hadoop, MapReduce) that make them easy to use.

*3:30 - 4:00- Break

*4:00 - 5:00 - ParLab Applications: Browsers, Vision, & Music (Matt Torok, UCB; Michael Anderson, UCB; David Wessel, UCB)



Browsers:
We will describe parallel algorithms for computing the layout of web documents and of data visualizations. We will demonstrate a data visualization running on a GPU.

Vision: Title: "GPU-Accelerated Computer Vision"
We will describe parallel algorithms and implementations for two computer vision applications: image contour detection and optical flow. Both applications will be demonstrated live, running on a GPU.

Music: Title: "A Couple Music Applications"
An introduction to work being done to improve quality of service for real-time audio processing tasks, such as partitioned convolution, and to increase throughput for batch music information retrieval applications, such as drum detection.