PMPP Learning Map (Programming Massively Parallel Processors, MOC)

Overview

Topic Map

Introduction

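| Topic | Concept Notes | Practice |
| --- | --- | --- |
| Ch.1 Introduction | Heterogeneous Parallel Computing and the Demand for Speed; Speedup, Parallel Programming Challenges, and Related Interfaces; Overall Goals and Organization of the Book | Practice |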

Part I: Fundamental Concepts

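| Topic | Concept Notes | Practice |
| --- | --- | --- |
| Ch.2 Heterogeneous Data Parallel Computing | Data Parallelism and CUDA C Program Structure; Vector Addition Host Code: Device Global Memory and Data Transfer; Kernel Functions and Threading; Calling Kernels, Compilation, and Chapter Summary | Practice |
| Ch.3 Multidimensional Grids And Data | Multidimensional Grid Organization; Mapping Threads to Multidimensional Data and Linearization; Image Blur Kernel; Matrix Multiplication Kernel | Practice |
| Ch.4 Compute Architecture And Scheduling | GPU Architecture, Block Scheduling, and Synchronization; Warps, SIMD Hardware, Control Divergence, and Warp Scheduling; Resource Partitioning, Occupancy, and Querying Device Properties | Practice |
| Ch.5 Memory Architecture And Data Locality | Memory Access Efficiency and CUDA Memory Types; Tiling and the Tiled Matrix Multiplication Kernel; Boundary Checks and the Impact of Memory Usage on Occupancy | Practice |
| Ch.6 Performance Considerations | Memory Coalescing; Hiding Memory Latency; Thread Coarsening; Optimization Checklist and Performance Bottlenecks | Practice |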

Part II: Parallel Patterns

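| Topic | Concept Notes | Practice |
| --- | --- | --- |
| Ch.7 Convolution | Convolution Basics and the Basic Parallel Kernel; Constant Memory and the Cache Hierarchy; Tiled Convolution and Handling Halo Cells | Practice |
| Ch.8 Stencil | Stencil Basics and the Basic Parallel Kernel; Shared Memory Tiling for Stencils; Thread Coarsening and Register Tiling | Practice |
| Ch.9 Parallel Histogram | Atomic Operations and the Basic Histogram Kernel; Histogram Optimization: Privatization, Coarsening, and Aggregation | Practice |
| Ch.10 Reduction | Reduction Basics and a Simple Kernel; Optimizing the Single-Block Reduction Kernel; Extending Reduction: Hierarchical Reduction and Thread Coarsening | Practice |
| Ch.11 Prefix Sum Scan | Scan Basics and the Kogge-Stone Algorithm; Work-Efficient Scan: Brent-Kung and Thread Coarsening; Arbitrary-Length Inputs and Single-Pass Scan | Practice |
| Ch.12 Merge | Merge Basics and the Co-rank Concept; Co-rank Function Implementation and the Basic Parallel Merge Kernel; Tiled and Circular Buffer Merge Kernels; Thread Coarsening and Chapter Summary | Practice |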

Part III: Advanced Patterns & Applications

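| Topic | Concept Notes | Practice |
| --- | --- | --- |
| Ch.13 Sorting | Sorting Basics and the Parallel Radix Sort Kernel; Radix Sort Performance Optimization: Memory Coalescing, Radix Size, and Thread Coarsening; Parallel Merge Sort and Other Parallel Sorting Methods | Practice |
| Ch.14 Sparse Matrix Computation | Sparse Matrices and Basic Storage Formats; Regularized Storage Formats | Practice |
| Ch.15 Graph Traversal | Graph Representations and Breadth-First Search; Vertex-centric and Edge-centric BFS Parallelization; Frontiers, Privatization, and Other BFS Optimizations | Practice |
| Ch.16 Deep Learning | Machine Learning Basics: Perceptron and Backpropagation; Convolutional Neural Networks; GPU Convolutional Layer: CUDA Kernel and GEMM Formulation; cuDNN Library and Chapter Summary | Practice |
| Ch.17 Iterative MRI Reconstruction | MRI Background and Formulating the Iterative Reconstruction Problem; Parallelization Structure of the FHD Kernel: Scatter vs Gather and Loop Transformations; Memory Bandwidth Optimization of the FHD Kernel: Registers and Constant Memory; Hardware Trigonometric Functions, Accuracy Validation, and Performance Tuning | Practice |
| Ch.18 Electrostatic Potential Map | The DCS Algorithm and Scatter vs Gather Kernel Design; Thread Coarsening and Memory Coalescing; Cutoff Binning and Data-Size Scalability | Practice |
| Ch.19 Parallel Programming And Computational Thinking | Goals of Parallel Computing and Amdahl's Law; Algorithm Selection and Trade-offs; Problem Decomposition: Output-centric and Input-centric; Computational Thinking and Parallelization Strategies | Practice |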

Part IV: Advanced Practices

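| Topic | Concept Notes | Practice |
| --- | --- | --- |
| Ch.20 Heterogeneous Computing Cluster | MPI Cluster Background, Stencil Running Example, and MPI Basics; MPI Point-to-Point Communication and Data Distribution; Overlapping Computation and Communication; MPI Collective Communication and CUDA-aware MPI | Practice |
| Ch.21 CUDA Dynamic Parallelism | Dynamic Parallelism Basics and Overview; Example: Bezier Curves and Dynamic Workloads; Recursive Example: Quadtree; Important Execution Considerations and Summary | Practice |
| Ch.22 Advanced Practices And Future Evolution | Host/Device Interaction Model and Memory Evolution; Kernel Execution Control; Memory Bandwidth and Compute Throughput; Programming Environment and Future Outlook | Practice |
| Ch.23 Conclusion And Outlook | Goals Revisited: The Book's Four-Step Learning Path; Future Outlook: The Evolution of Massively Parallel Computing | Practice |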

Appendix: Numerical Considerations

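| Topic | Concept Notes | Practice |
| --- | --- | --- |
| Ch.A Numerical Considerations | Floating-Point Data Representation; Representable Numbers, Precision, and Arithmetic Accuracy; Parallel Algorithm Considerations and Numerical Stability | Practice |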

Practice Notes

| Topic | Link |
| --- | --- |
| Ch.1 Introduction | Introduction Practice Questions |
| Ch.2 Heterogeneous Data Parallel Computing | Heterogeneous Data Parallel Computing Practice Questions |
| Ch.3 Multidimensional Grids And Data | Multidimensional Grids And Data Practice Questions |
| Ch.4 Compute Architecture And Scheduling | Compute Architecture And Scheduling Practice Questions |
| Ch.5 Memory Architecture And Data Locality | Memory Architecture And Data Locality Practice Questions |
| Ch.6 Performance Considerations | Performance Considerations Practice Questions |
| Ch.7 Convolution | Convolution Practice Questions |
| Ch.8 Stencil | Stencil Practice Questions |
| Ch.9 Parallel Histogram | Parallel Histogram Practice Questions |
| Ch.10 Reduction | Reduction Practice Questions |
| Ch.11 Prefix Sum Scan | Prefix Sum Scan Practice Questions |
| Ch.12 Merge | Merge Practice Questions |
| Ch.13 Sorting | Sorting Practice Questions |
| Ch.14 Sparse Matrix Computation | Sparse Matrix Computation Practice Questions |
| Ch.15 Graph Traversal | Graph Traversal Practice Questions |
| Ch.16 Deep Learning | Deep Learning Practice Questions |
| Ch.17 Iterative MRI Reconstruction | Iterative MRI Reconstruction Practice Questions |
| Ch.18 Electrostatic Potential Map | Electrostatic Potential Map Practice Questions |
| Ch.19 Parallel Programming And Computational Thinking | Parallel Programming And Computational Thinking Practice Questions |
| Ch.20 Heterogeneous Computing Cluster | Heterogeneous Computing Cluster Practice Questions |
| Ch.21 CUDA Dynamic Parallelism | CUDA Dynamic Parallelism Practice Questions |
| Ch.22 Advanced Practices And Future Evolution | Advanced Practices And Future Evolution Practice Questions |
| Ch.23 Conclusion And Outlook | Conclusion And Outlook Practice Questions |
| Ch.A Numerical Considerations | Numerical Considerations Practice Questions |

Study Tools

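| Tool | Description | Link |
| --- | --- | --- |
| Quick Reference | Formulas / cheat sheet | Cheat Sheet |
| Exam Traps | Common pitfalls and easily confused points | Trap Questions |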

Tag Index

Tags follow a 5-level hierarchy: area (domain) → pattern (parallel pattern) → technique (optimization technique) → concept (specific concept) → note-type (type of note)

| Level | Tags |
| --- | --- |
| area | #cuda-programming, #gpu-architecture, #memory-and-performance, #parallel-pattern, #application-case-study, #advanced-practice, #computational-thinking, #numerical-considerations |
| pattern | #convolution, #deep-learning, #dynamic-parallelism, #electrostatic-potential, #graph-traversal, #histogram, #image-blur, #matrix-multiplication, #merge-sort, #mpi-cluster, #mri-reconstruction, #ordered-merge, #radix-sort, #reduction, #scan, #sparse-matrix, #stencil |
| technique | #aggregation, #atomic-operations, #barrier-synchronization, #computation-communication-overlap, #constant-memory, #control-divergence, #corner-turning, #cuda-streams, #cutoff-binning, #halo-exchange, #im2col-unrolling, #load-balancing, #loop-transformation, #memory-coalescing, #memory-linearization, #pinned-memory, #privatization, #register-optimization, #register-tiling, #scatter-vs-gather, #shared-memory, #thread-coarsening, #tiling, #work-efficiency |
| note-type | #concept, #practice, #dashboard |
Concept-Level Tags

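The concept level holds the specific concept and algorithm tags for each topic (88 in total), for example #kogge-stone, #brent-kung, #radix-sort, #csr-format, #warps-simd, and #roofline-model. Pick the ones that match the note's content.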

Tag Rules

Every note must carry 1 area tag, plus 2–4 pattern/technique/concept tags chosen by content (a detail tag is always co-attached with its area). All tags are English kebab-case and may only come from this registry.
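The tag rules above can be checked mechanically. The following is a minimal sketch, not part of the vault's tooling: the function name `check_tags` and the set names are hypothetical, the pattern, technique, and concept sets are only a subset of the registry (the full list of 88 concept tags is not given in this note), and "1 area tag" is read here as "at least one". It does not verify that a detail tag is paired with its own area, because this note does not define that mapping.

```python
import re

# Assumption: partial registry copied from the Tag Index. Only AREA and
# NOTE_TYPE are complete; the other sets are illustrative subsets.
AREA = {
    "cuda-programming", "gpu-architecture", "memory-and-performance",
    "parallel-pattern", "application-case-study", "advanced-practice",
    "computational-thinking", "numerical-considerations",
}
PATTERN = {"convolution", "histogram", "reduction", "scan", "stencil"}
TECHNIQUE = {"memory-coalescing", "shared-memory", "thread-coarsening", "tiling"}
CONCEPT = {"kogge-stone", "brent-kung", "csr-format", "warps-simd", "roofline-model"}
NOTE_TYPE = {"concept", "practice", "dashboard"}

# Lowercase alphanumeric words joined by single hyphens.
KEBAB = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_tags(tags):
    """Return a list of rule violations for one note's tags (no leading '#')."""
    detail = PATTERN | TECHNIQUE | CONCEPT
    registry = AREA | detail | NOTE_TYPE
    problems = []
    for tag in tags:
        if not KEBAB.match(tag):
            problems.append(f"not kebab-case: {tag}")
        elif tag not in registry:
            problems.append(f"not in registry: {tag}")
    n_area = sum(tag in AREA for tag in tags)
    n_detail = sum(tag in detail for tag in tags)
    if n_area < 1:
        problems.append("missing area tag")
    if not 2 <= n_detail <= 4:
        problems.append(f"expected 2-4 detail tags, found {n_detail}")
    return problems
```

For example, `check_tags(["parallel-pattern", "scan", "kogge-stone", "concept"])` returns an empty list, since the note has one area tag and two detail tags, all drawn from the registry. Note-type tags are accepted but not counted toward the 2–4 limit, which is also an assumption of this sketch.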

Weak Areas

Recommended priority review (error-prone / frequently tested topics)

Non-core Topic Policy

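| Source | Content | Handling |
| --- | --- | --- |
| programming-massively-parallel-processors.pdf | All 23 chapters + Appendix A | Fully covered; no chapters excluded |
| Same as above: each chapter's Exercises / References | End-of-chapter exercise answers, reference lists | Not recorded item by item; the spirit of the exercises has been carried over into the Practice questions in each folder |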