Quality Course Development

Materials Outline

Textbook

Programming Massively Parallel Processors - 3rd Edition

Lecture 1

PPT

Lecture-1-cuda-introduction

Paper

1 An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems

2 The concurrency challenge

3 Software and the concurrency revolution

4 Some computer organizations and their effectiveness

5 MCUDA: an Efficient Implementation of CUDA Kernels for Multi-Core CPUs

Related Materials

1 MPI: A Message-Passing Interface Standard, Version 2.2

2 Algorithms and theory of computation handbook

3 NVIDIA CUDA C Programming Guide

4 Introduction to computing systems: from bits and gates to C and beyond

5 First Draft of a Report on the EDVAC

6 Computational thinking

Lecture 2

PPT

Lecture-2-kernel-multidimension

Lecture 3

PPT

Lecture-3-Memory and Data Locality

Lecture 4

PPT

Lecture-4-Performance considerations

Paper

1 Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

2 Program optimization space pruning for a multithreaded GPU

Related Materials

1 CUDA C Best Practices Guide v. 4.2

2 CUDA Occupancy Calculator (search the web for “CUDA Occupancy Calculator”).

Lecture 5

PPT

Lecture-5-histogram

Related Materials

1 Merrill, D. (2015). Using compression to improve the performance response of parallel histogram computation, NVIDIA Research Technical Report.

Lecture 6

PPT

Lecture-6-Scan

Paper

1 A regular layout for parallel adders

2 Fast scan algorithms on graphics processors

3 A study of persistent threads style GPU programming for GPGPU Workloads

4 A parallel algorithm for the efficient solution of a general class of recurrence equations

5 Single-pass parallel prefix scan with decoupled look-back

6 StreamScan: fast scan algorithms for GPUs without global barrier synchronization

Related Materials

1 Parallel prefix sum with CUDA

Lecture 7

PPT

Lecture-7-Joint CUDA-MPI Programming

Related Materials

1 Gropp, W., Lusk, E., & Skjellum, A. (1999). Using MPI: Portable parallel programming with the message-passing interface (2nd ed.). Cambridge, MA: MIT Press, Scientific and Engineering Computation Series. ISBN 978-0-262-57132-6.

Lecture 8

PPT

Lecture-8-Sparse-matrix

Paper

1 Implementing sparse matrix–vector multiplication on throughput-oriented processors

2 Methods of conjugate gradients for solving linear systems

Related Materials

1 Rice, J. R., & Boisvert, R. F. (1984). Solving Elliptic Problems Using ELLPACK. Springer-Verlag. 497 pages.

Lecture 9

PPT

Lecture-9-Parallel patterns

Paper

1 Efficient MPI implementation of a parallel, stable merge algorithm

Lecture 10

PPT

Lecture-10-Computational-Thinking

Paper

1 Rodrigues, C. I., Stone, J., Hardy, D., & Hwu, W. W. (2008). GPU acceleration of cutoff-based potential summation. In: ACM Computing Frontiers Conference 2008, Italy, May.

Related Materials

1 MPI: A Message-Passing Interface Standard, Version 2.2

2 Computational thinking