計算思維與平行化策略 (Computational Thinking)

#computational-thinking #algorithm-selection #problem-decomposition #cutoff-binning #application-case-study

重點總覽 (Overview)

主題	核心重點
定義 (Definition)	Computational thinking = 把 domain problem formulate 成 computation steps 與 algorithms 的思考過程 (Wing, 2006)
本質 (Nature)	是一門「藝術 (art)」，靠 iterative（實作 ↔ 抽象概念來回）與 bottom-up 學習培養
三大基礎技能	① Computer architecture ② Programming interfaces & compilers ③ Domain knowledge
平行程式三步驟	algorithm selection → problem decomposition → optimization & tuning
三種攻略 (good/better/best)	難度、複雜度、報酬「遞增」：good=accelerate；better=rewrite；best=holistic rethink
終極目標	讓科學「better, not just faster」— 用新方法解新問題

Important

本筆記聚焦 §19.4 Computational Thinking 與 §19.5 Summary。三步驟中的前兩步細節分別在 19-Parallel-Programming-And-Computational-Thinking/02-Algorithm-Selection-and-Tradeoffs 與 19-Parallel-Programming-And-Computational-Thinking/03-Problem-Decomposition-Output-vs-Input-Centric；本篇講「整體思維」與「攻略層級」。

計算思維的定義 (Definition of Computational Thinking)

定義：computational thinking 是「formulating domain problems in terms of computation steps and algorithms」的思考過程 (Wing, 2006)。
它是「parallel application development 中最重要的一環」。
強的計算思維者不只「分析」更會「transform」問題結構：辨識
- 哪些部分本質上是 serial（無法平行）。
- 哪些部分適合 high-performance parallel execution。
- 以及把前者搬到後者所牽涉的 domain-specific tradeoffs。
學習方式：是一門 art，最好用 iterative 方式教 — 在「practical experience」與「abstract concepts」之間來回。
bottom-up：先在「某個具體 programming model（如 CUDA C）」中學概念，打穩基礎後再 generalize 到其他 model（甚至非 GPU 的平行模型）。

Tip

書本刻意用 CUDA C 實作來「示範」這些技巧，但目的是建立「通用」的 computational thinking，而非只會寫 CUDA。

計算思維的基礎技能 (Foundational Skills)

要成為有效的 computational thinker，需橫跨三大知識領域：

技能領域	涵蓋內容	為何需要
Computer architecture	memory organization、caching & locality、memory bandwidth、SIMT vs SPMD vs SIMD 執行模型、floating-point precision vs accuracy	理解 algorithms / decompositions / optimizations 之間的 tradeoffs
Programming interfaces & compilers	parallel execution models、可用的 memory 種類、synchronization 支援、array data layout、thread granularity transformation（即 thread coarsening）	思考「資料結構與迴圈結構」如何安排以獲得更好效能
Domain knowledge	problem formulation、hard vs soft constraints、numerical methods、precision/accuracy、numerical stability	懂這些「ground rules」才能在套用演算法技巧時更有創意

SIMT vs SPMD vs SIMD 別混為一談

SIMD：硬體層級，一條指令同時操作多筆資料（vector lane）。
SIMT：GPU 的執行模型，warp 內 thread 各有自己的 PC/狀態但「鎖步」執行（容許 control divergence）。
SPMD：程式設計層級，所有 thread/process 跑「同一份程式」但處理不同資料（含 MPI ranks）。
三者層級不同，常被當成同義詞而答錯。

Tip

三大領域分別對應全書三大區塊：architecture（Ch.4-6）、interfaces/compilers（Ch.2-3）、domain（Ch.16-18 案例 + Ch.24 numerical）。最好的精進方式是「持續用優秀的計算解法解困難問題」。

平行程式設計的三步驟 (The Three Steps of Parallel Programming)

平行程式設計流程可拆成三步（§19.1 末提出，§19.5 總結）：

   ┌──────────────────┐   ┌──────────────────────┐   ┌──────────────────────────┐
   │ 1. Algorithm     │ → │ 2. Problem           │ → │ 3. Optimization & Tuning │
   │    Selection     │   │    Decomposition     │   │   (performance)          │
   ├──────────────────┤   ├──────────────────────┤   ├──────────────────────────┤
   │ complexity vs    │   │ output-centric       │   │ coalescing, tiling,      │
   │ parallelism;     │   │ (gather) vs          │   │ coarsening, occupancy... │
   │ generality vs    │   │ input-centric        │   │                          │
   │ efficiency;      │   │ (scatter);           │   │  ← Ch.6 一般性處理         │
   │ complexity vs    │   │ parallelism / load   │   │                          │
   │ accuracy         │   │ balance              │   │                          │
   └──────────────────┘   └──────────────────────┘   └──────────────────────────┘
        §19.2 / Note 02         §19.3 / Note 03            Ch.6 / Note 04 (Ch.6)

步驟	關鍵抉擇	詳見
Algorithm selection	在多個演算法間取得「best compromise」（complexity / parallelism / generality / accuracy）	19-Parallel-Programming-And-Computational-Thinking/02-Algorithm-Selection-and-Tradeoffs
Problem decomposition	output-centric (gather) vs input-centric (scatter)；考量 parallelism、load balance	19-Parallel-Programming-And-Computational-Thinking/03-Problem-Decomposition-Output-vs-Input-Centric
Optimization & tuning	coalescing、tiling、coarsening、privatization…	06-Performance-Considerations/04-Optimization-Checklist-and-Bottlenecks

Important

三步驟中，「optimization & tuning」是前面各章的主軸（Ch.6 做一般性處理）；本章 Ch.19 補足前兩步「algorithm selection」與「problem decomposition」的通用、深入討論。

平行化策略：Good / Better / Best (Parallelization Approaches)

攻擊「computation-hungry applications」有三種方法，難度、複雜度、報酬「同步遞增」：

  payoff / 報酬
   ▲
   │                                          ┌────────────┐
   │                                          │   BEST     │  holistic: 三步驟全做
   │                          ┌────────────┐  │ + rethink  │  + 重新設計 numerical
   │                          │  BETTER    │  │  numerics  │    methods/algorithms
   │       ┌────────────┐     │  rewrite   │  └────────────┘
   │       │   GOOD     │     │  / new     │   需 CS + domain（跨領域）
   │       │ accelerate │     │  code      │
   │       │ (recompile)│     └────────────┘
   │       └────────────┘      需 CS、少 domain
   │        少 CS、少 domain
   └────────────────────────────────────────────────────────►  難度 / 複雜度

層級	做法	需要的步驟	適合誰	限制 / 報酬
Good	「accelerate」legacy code：最基本是 recompile 換平台；可再用 optimized libraries/tools/directives（CuBLAS、CuFFT、Thrust、MATLAB、OpenACC）	「不需要」algorithm selection / decomposition / tuning	domain scientists（最少 CS 知識）	立即見效、decent speedup，但無法發揮平行運算全部潛力
Better	用 parallel programming 技巧 rewrite 現有 code，或從頭寫新 code	problem decomposition + optimization selection	nondomain computer scientists（最少 domain 知識）	仍無法達全部潛力 — 缺 domain 知識，無法做有效的 algorithm selection
Best	holistic：三步驟全做（algorithm selection + decomposition + optimization/tuning），並 rethink numerical methods & algorithms	全部三步驟 + 重新設計演算法	跨領域（interdisciplinary）：CS + domain insight	最大效能優勢，可能帶來 fundamental new discoveries（如做出傳統方法做不到的 high-fidelity 模擬）

「Best」的代表範例：Cutoff Binning

Ch.18 的 cutoff binning 是 best approach 的典範：

需 domain expertise → 願意「以少量 accuracy 換取 algorithmic complexity 大幅下降」。
需 decomposition + optimization 技巧 → 用 grid-centric (output-centric) 分解 + computer scientist 設計的 binning 技巧。
詳見 18-Electrostatic-Potential-Map/03-Cutoff-Binning-for-Scalability。

複雜度量化直覺：full-accuracy 的 DCS 計算量隨體積「平方」成長（work ∝ volume²，量級 O(N²)）；cutoff summation 利用「遠處粒子貢獻微小」改用 implicit method，把複雜度壓到接近 O(N) — 這正是 best approach 才能達成的演算法層級突破。

「Make science better, not just faster」

不會有人因為「just recompile」或「同樣演算法只是多開幾條 thread」拿 Nobel Prize 或 Turing Award。真正重大的科學發現來自 fresh computational thinking — 用這份運算紅利去「用新方法解新問題」。

總結 (Summary §19.5)

平行程式設計三主步：algorithm selection、problem decomposition、optimization & tuning。
Algorithm selection：常需在多演算法間取捨；有些在「同精度」下取得不同 tradeoff，有些則「犧牲精度」換取更 scalable 的執行時間。
Problem decomposition：不同分解造成不同的 thread 間干擾 (interference)、暴露的 parallelism、load imbalance 與其他效能因素。
Computational thinking 技能讓設計者「繞過 roadblocks」抵達好的解法。

考試/面試重點 (Exam / Test Patterns)

情境 / 關鍵字	答案 / 技巧
「Computational thinking 的定義」	把 domain problem formulate 成 computation steps 與 algorithms 的思考過程 (Wing, 2006)
「平行程式設計的三步驟」	algorithm selection → problem decomposition → optimization & tuning
「computational thinker 需要哪三大基礎技能」	① computer architecture ② programming interfaces & compilers ③ domain knowledge
「SIMT / SPMD / SIMD 差異」	SIMD=硬體 vector；SIMT=GPU warp 鎖步但各有狀態；SPMD=同程式跑不同資料（含 MPI）
「good approach 是什麼？用什麼工具」	recompile + optimized libraries：CuBLAS、CuFFT、Thrust、MATLAB、OpenACC；不需 algorithm selection/decomposition/tuning
「better vs best 差在哪」	better=rewrite/新寫，做 decomposition+optimization 但缺 domain 知識；best=holistic，連 numerical methods/algorithms 一起 rethink，需 CS+domain
「best approach 的書中範例」	cutoff binning（Ch.18）：domain 換 accuracy 降 complexity + grid-centric decomposition + binning
「為何 good/better 達不到全部潛力」	good 無平行洞見；better 缺 domain knowledge → 無法做有效 algorithm selection
「make science better, not just faster」	重大發現來自 fresh computational thinking，非「just recompile / more threads」
「DCS vs cutoff 複雜度」	DCS work ∝ volume²（O(N²)）；cutoff 近 O(N)，犧牲少量 accuracy

重點總覽 (Overview)

計算思維的定義 (Definition of Computational Thinking)

計算思維的基礎技能 (Foundational Skills)

平行程式設計的三步驟 (The Three Steps of Parallel Programming)

平行化策略：Good / Better / Best (Parallelization Approaches)

總結 (Summary §19.5)

考試/面試重點 (Exam / Test Patterns)

Related Notes