Phase | Step | Action
Baseline | 1 | Select performance metrics from the VTune performance analyzer
Baseline | 2 | Select suitable test code/data and run the baseline case
Baseline | 3 | Record baseline metric values of all counters in the VTune performance analyzer
Serial | 4 | Generate a routine calling tree
Serial | 5 | Profile and rank the routines by decreasing CPU time usage
Serial | 6 | In the top-ranking routine, analyze the loop structure
Serial | 7 | Optimize the top-ranking routine using code modifications or compiler options
Serial | 8 | Repeat steps 2-3 for the modified test code and compare to the baseline case
Serial | 9 | Repeat steps 4-8 (for each new top-ranking routine)
Parallel | 10 | Present the serial optimized test code to the vendor auto-parallel preprocessor
Parallel | 11 | Study the output source and replace vendor directives with OpenMP directives (modify as needed)
Parallel | 12 | Repeat steps 2-3 for the parallel test code and compare to the baseline case
Parallel | 13 | If the parallel code produces incorrect numerical results, go to step 15
Parallel | 14 | Optimize the OpenMP parallel code using the Intel Thread Checker™
Parallel | 15 | Validate the parallel code using the Intel Thread Checker™
Parallel | 16 | Repeat steps 2-3 for the modified test code and compare to the baseline case
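
As an illustration of steps 11 to 13, the sketch below shows what a loop might look like once it carries an OpenMP directive, together with a check of the parallel result against the serial baseline. It is a minimal sketch under stated assumptions, not the output of any vendor preprocessor: the dot-product kernel and the names `dot_serial`, `dot_openmp`, and `results_match` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial kernel: stands in for the baseline test code
   whose result is recorded in steps 2-3. */
double dot_serial(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Step 11 (sketch): the same loop carrying an OpenMP directive.
   The reduction clause gives each thread a private partial sum and
   combines the partial sums when the loop ends. If the compiler is
   not run in OpenMP mode, the pragma is ignored and the loop simply
   executes serially. */
double dot_openmp(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    long i; /* signed index, accepted by older OpenMP versions too */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Steps 12-13 (sketch): compare the parallel result with the baseline.
   A relative tolerance is used because a parallel reduction may add
   the terms in a different order than the serial loop does.
   Returns 1 when the results agree within rel_tol, otherwise 0. */
int results_match(double baseline, double candidate, double rel_tol)
{
    double diff  = candidate - baseline;
    double scale = baseline < 0.0 ? -baseline : baseline;
    if (diff < 0.0)
        diff = -diff;
    if (scale < 1.0)
        scale = 1.0; /* fall back to an absolute check near zero */
    return diff <= rel_tol * scale;
}
```

With GCC the OpenMP version is enabled by compiling with `-fopenmp`; other compilers use their own switch. If `results_match` reports a mismatch, the table's flow sends the code to validation (step 15) before any further tuning.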