Fundamentals of Parallel Programming
Course Highlights:
This is a foundational two-day course (with an optional third day) on the concepts and techniques of parallel programming. It covers the two basic approaches to parallel programming (shared memory and message passing), techniques for identifying parallelism at various levels (loop level and algorithm level), techniques for determining which types of parallelism (data parallelism, function parallelism, and pipeline parallelism) suit a given case, and an overview of multicore architectures. The course is designed to equip professionals and code developers who have experience in C/C++ programming with the ability to identify and exploit the parallelism and concurrency offered by multicore or multiprocessor systems, and to tailor their code to run well on such systems.
Participants will be introduced to the concepts and given hands-on experience in parallel programming on multicore systems. Roughly 30-40% of the course is devoted to hands-on programming, solving several problems drawn from real-world scenarios. The hands-on programming is based on the industry-standard OpenMP. However, while most parallel programming courses focus only on OpenMP, this course covers parallelization concepts that go beyond it. By learning these concepts, participants will learn how to exploit parallelism that cannot be directly expressed in OpenMP, thereby bypassing its limitations.
A complimentary copy of the textbook “Fundamentals of Parallel Computer Architecture” is included in the registration.
Objective of the course:
Future processors will pack more and more cores on a single die. Unless code is written specifically to exploit such multicore architectures, it will not enjoy the performance scaling across processor generations that many code developers have relied on over the past 25 years. In the multicore era, mastery of parallel programming techniques and knowledge of the architectural features of multicore processors are essential to unlocking the performance potential of multicore processors.
- Do you know how differently you should program on multicore processors versus single-core processors?
- Do you know how to create parallel programs that are efficient on a multicore processor?
- Do you know how to exploit parallelism at all levels (loop level, code level, algorithm level) and various types of parallelism?
- Do you know which type of parallelism is suitable for a given problem and platform?
- Do you know how to specify parallelism using Google’s MapReduce programming model?
If you answered “no” to one or more of the above questions, this course is for you. Code developers who know how to exploit the potential of multicore processors will have a significant competitive advantage over those who do not.
The course objective is to equip code developers and other professionals with the foundational concepts and techniques of parallel programming, and with the practical skills and experience to apply them efficiently on multicore and multiprocessor systems.
Who Should Attend:
Code developers or other professionals who have experience in C/C++ programming. The course does not assume prior knowledge of multicore architecture or parallel programming. To acquire a complete skill set in multicore programming, participants are encouraged to register for the follow-up course “Advanced Programming on Multicore: Writing and Tuning for Performance and Scalability”, which focuses on how to structure and tune programs for maximum performance and scalability to a large number of cores.
Course Outline:
1. Perspectives
- Why parallel programming?
- Classes of parallel architectures
- Landscape of parallel programming and architecture
- Trends and their implications for software development on current and future multicore processors
2. Parallel Programming Models
- Shared memory model
- Message passing model
- Google MapReduce programming model
- Merits and drawbacks of different models
3. Shared Memory Parallel Programming Techniques
- Steps in parallel programming
- Identifying Loop-Level Parallelism
- Iteration-space Traversal Graph and Loop-Carried Dependence Graph
- Finding Parallel Tasks Across Iterations
- Identifying Algorithm-Level Parallelism
- Determining the Scope of Variables
- Privatization
- Reduction Variables and Operation
- Synchronization primitives and their uses
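To give a flavor of the loop-level analysis in this module, here is a minimal sketch (not taken from the course materials). The function name `sweep_columns` and the array sizes are hypothetical. Each element depends on the element above it, so there is a loop-carried dependence along `i`, but no dependence between different columns `j`. Interchanging the loops so that `j` is outermost exposes independent tasks, one per column.

```c
#include <assert.h>

#define ROWS 4   /* hypothetical sizes, chosen only for illustration */
#define COLS 4

/* a[i][j] depends on a[i-1][j]: iterations along i must run in order,
 * while different columns j never touch each other's data. */
void sweep_columns(double a[ROWS][COLS])
{
    assert(a != 0);
    /* Each column is an independent task. The pragma is ignored by
     * compilers built without OpenMP support, and the result is the same. */
    #pragma omp parallel for
    for (int j = 0; j < COLS; j++) {
        for (int i = 1; i < ROWS; i++) {   /* sequential: loop-carried dependence */
            a[i][j] = a[i - 1][j] + 1.0;
        }
    }
}
```

Placing the pragma on the `i` loop instead would be incorrect, because a thread could read `a[i - 1][j]` before another thread has written it.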
4. Parallel programming with OpenMP
- How various types of parallelism are expressed
- How scopes of variables are expressed
- How tasks are scheduled and balanced
- Tasking feature in OpenMP 3.0
- Limitations of OpenMP for parallel programming
- Case studies in OpenMP programming
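As an illustration of how data parallelism and variable scope can be expressed in OpenMP, here is a minimal sketch, assuming a hypothetical `dot_product` function that is not part of the course materials. The accumulator `sum` is declared a reduction variable, so each thread works on its own copy and the copies are combined when the loop ends. The loop index `i` and the temporary `prod` are private to each thread.

```c
#include <assert.h>

/* Hypothetical example: dot product of two arrays of length n. */
double dot_product(const double *x, const double *y, int n)
{
    assert(n >= 0);
    double sum = 0.0;
    /* reduction(+:sum): every thread accumulates into a private copy of
     * sum, and the copies are added together at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        double prod = x[i] * y[i];   /* declared in the loop body: private */
        sum += prod;
    }
    return sum;
}
```

Without the `reduction` clause, `sum` would be shared and the unsynchronized updates would race. With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag; without it the pragma is ignored and the loop runs sequentially.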
5. Correctness and Performance Issues in Shared Memory Parallel Programming
- Common Correctness Pitfalls
- Result Preservation
- Incorrect or Missing Synchronization
- Incorrect Variable Scope
- Compiler Limitations
- A Case Study of a Parallelizing Compiler
- Performance Considerations
- Amdahl’s Law
- Parallel Thread Granularity
- Synchronization Granularity
- Inherent and Artifactual Communication
- Scheduling and Load Balancing
- Memory Hierarchy Considerations
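Amdahl's Law, listed above, bounds the speedup of a program by its sequential part: if a fraction f of the execution time can be parallelized over p cores, the speedup is 1 / ((1 - f) + f / p). The small helper below is a sketch of that formula; the function name `amdahl_speedup` is hypothetical.

```c
#include <assert.h>

/* Speedup predicted by Amdahl's Law.
 * parallel_fraction: share of sequential run time that can be parallelized (0..1)
 * num_cores:         number of cores the parallel part is spread over */
double amdahl_speedup(double parallel_fraction, int num_cores)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(num_cores >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / num_cores);
}
```

For example, with 90% of the run time parallelizable, 8 cores give a speedup of about 4.7, and no number of cores can push it past 10.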
6. Parallel Programming for Linked Data Structures (LDS)
- Overview of how LDS (lists, hash tables, trees, graphs, etc.) are accessed
- Parallelization Challenges in LDS
- Why Loop-Level Parallelization is Insufficient
- Approaches to Parallelization of LDS
- Challenges in LDS Parallelization
- Parallelization Techniques for Linked Lists
- Parallelization Among Readers
- Global Lock Approach
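The global lock approach can be sketched as follows. This is an illustrative assumption, not the course's own code: the type `node_t`, the function `list_insert`, and the use of a named OpenMP critical section as the single lock are all choices made for this example. Every insertion traverses and modifies the list while holding the one lock, which is simple and safe but serializes all updates.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *next;
} node_t;

/* Insert key into a list kept in ascending order. */
void list_insert(node_t **head, int key)
{
    node_t *n = malloc(sizeof *n);   /* allocate outside the lock */
    assert(n != NULL);
    n->key = key;

    /* One named critical section acts as the global lock: at most one
     * thread at a time may traverse or modify the list in here. */
    #pragma omp critical(list_lock)
    {
        node_t **p = head;
        while (*p != NULL && (*p)->key < key)
            p = &(*p)->next;
        n->next = *p;
        *p = n;
    }
}
```

Because a named critical section is shared program-wide, this sketch also serializes insertions into unrelated lists, which is one reason finer-grained locking is worth studying.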
7. Exploiting Alternative Parallelism (Optional third day material)
- Exploiting function parallelism
- Exploiting pipelined parallelism
- Comparing data, function, and pipelined parallelism
- Loop transformations that expose parallelism
- Exploiting DOACROSS parallelism
- Exploiting DOPIPE parallelism across loop statements
8. Parallel Programming with Google’s MapReduce (Optional third day material)
- Perspectives on MapReduce as an alternative parallel programming model
- Specifying map() and reduce() functions
- Case studies of programs that can be translated well into the MapReduce model
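The division of labor between map() and reduce() can be sketched in plain C. This is a sequential illustration of the programming model only, not Google's MapReduce API. All names (`map_record`, `reduce_values`, `count_by_key`) and the constants are hypothetical. The map step turns each input record into a (key, value) pair, the driver groups pairs by key, and the reduce step combines the values of each group.

```c
#include <assert.h>

#define NUM_KEYS 3       /* hypothetical: keys are 0, 1 and 2 */
#define MAX_RECORDS 64   /* hypothetical bound that keeps buffers fixed-size */

typedef struct { int key; int value; } pair_t;

/* map(): one input record -> one intermediate (key, value) pair. */
pair_t map_record(int record)
{
    assert(record >= 0);
    pair_t p = { record % NUM_KEYS, 1 };
    return p;
}

/* reduce(): combine all values that share a key. */
int reduce_values(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Driver: map every record, group by key, reduce each group. */
void count_by_key(const int *records, int n, int counts[NUM_KEYS])
{
    assert(n >= 0 && n <= MAX_RECORDS);
    pair_t pairs[MAX_RECORDS];
    for (int i = 0; i < n; i++)               /* map phase */
        pairs[i] = map_record(records[i]);

    for (int k = 0; k < NUM_KEYS; k++) {
        int values[MAX_RECORDS];
        int count = 0;
        for (int i = 0; i < n; i++)           /* group values by key */
            if (pairs[i].key == k)
                values[count++] = pairs[i].value;
        counts[k] = reduce_values(values, count);   /* reduce phase */
    }
}
```

Each call to `map_record` is independent of the others, as is each call to `reduce_values`, which is what allows a MapReduce runtime to distribute both phases across many workers.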
9. Advanced Programming for Linked Data Structures (Optional third day material)
- Fine-grain locking techniques: what to lock and when to lock
- How to analyze the need for read consistency and how to achieve it
- How to avoid deadlocks
- Parallelization of irregular LDS such as trees and graphs
- How to schedule and perform garbage collection safely
Notes:
- Students can take this course as a two-day course covering Modules 1-6.
- Students can also take this course as a three-day course covering Modules 1-9.
- Each module includes Q&A sessions.
- Modules 3, 4, 5, 6, 7, 8, and 9 involve hands-on parallel programming exercises.