What is OpenMP?
Open specification for Multi-Processing
-> method for portable programming of shared memory systems
industry standard, maintained by the OpenMP ARB (Architecture Review Board)
What is the fork-join model?
start with single thread (initial thread)
parallel regions create additional threads (team threads)
team threads disappear at the end of the parallel region
implementations may keep team threads alive -> efficiency
implicit barrier at end of parallel region
number of threads may change between parallel regions
What are ways to privatize a variable?
Loop variable: automatically privatized
Private clause: on parallel / worksharing directive
#pragma omp parallel private(x)
Local variables: declared within the parallel region
#pragma omp parallel
What are variations of private?
firstprivate: private and initialized with the value of the shared copy before the region
#pragma omp parallel firstprivate(x)
lastprivate: private, and the value from the thread executing the last iteration is copied to the variable outside the region
-> private copies are not initialized
-> firstprivate copies are initialized with the outside value
What are some Loop scheduling strategies?
static: chunks of iterations distributed among threads in round-robin fashion
dynamic: threads request chunks of a specific size from the runtime; when finished, a thread requests a new chunk
guided: like dynamic, but chunk size is proportional to the remaining work
runtime: schedule is selected at run time (OMP_SCHEDULE environment variable)
What is the difference between Profiling and Tracing?
Profiling
what happened and how much time it took
aggregates events and timings for execution as a whole
no chronology of events
low overhead and perturbation of the program
Tracing
when and where events took place
chronology with timestamps
extensive in time and data
high overhead and perturbation of the program
What is the inclusive and exclusive time for main?
int main()
{ /* takes 100 secs */
f1(); /* takes 20 secs */
/* other work */
f2(); /* takes 50 secs */
f3(); /* takes 20 secs */
…
inclusive time: 100 secs
exclusive time: 100-20-50-20 = 10 secs
What is the typical 5-stage pipeline?
Clock cycle: 1 2 3 4 5 6 7
Instr N (pipeline stage): IF ID EX MEM WB (cycles 1 to 5)
IF = Instruction Fetch
ID = Instruction Decode
EX = Execute/Calculate Address
MEM = Memory Access
WB = Write Back
What is the max clock frequency?
How to improve?
stage latencies:
IF = 200 ps
ID = 100 ps
EX = 300 ps
MEM = 200 ps
WB = 100 ps
max clock frequency
clock period is limited by the slowest pipeline stage: 300 ps
max frequency = 1 / 300 ps ≈ 3.33 GHz
Improve
reduce latency of pipeline stages
divide pipeline stages (add more stages)
What is a profiling tool for OpenMP?
ompP
based on source code instrumentation (preprocessor) and direct measurement
(mostly) independent of compiler and runtime
What does ompP's profiling report contain?
Header (date, time, …)
Region Overview (number of OpenMP regions)
Flat Region Profile
Callgraph and Profiles
Overhead Analysis Report
Performance Property Detection Report
What are Overhead Categories defined in ompP?
Imbalance (I)
Synchronization (S)
Limited Parallelism (L)
Thread Management (M)
What kinds of tasks are there in OpenMP tasking?
Initial Task
corresponds to whole program
executed by initial thread
Implicit Task
generated implicitly when parallel region encountered
executed by a thread of the parallel team
Explicit Task
created explicitly by user (#pragma omp task)
eventually executed:
deferred (execution decoupled from the generating task)
undeferred (execution of the generating task is suspended until the undeferred task has completed)
What is the difference between taskwait and taskgroup?
#pragma omp taskwait
= shallow - synchronizes only immediate child tasks (stand-alone directive)
#pragma omp taskgroup
= deep - synchronizes all descendant tasks (in structured code block)
What are tied tasks, and what untied tasks?
What is final task?
tied tasks
same thread must resume execution (default)
untied tasks
any thread in the team can resume execution
controlled by user using untied clause
final task
forces all child tasks to be included tasks and also final tasks
controlled by user using final clause