```
<!--

What it usually does is swap the branches so that the more likely one goes immediately after the jump (recall that the "don't jump" branch is taken by default). The performance gain is usually rather small because, for most hot spots, hardware branch prediction works just fine.

-->
There are many other cases like this where you need to point the compiler in the right direction, but we will get to them later when they become more relevant.

### Profile-Guided Optimization
Adding all this metadata to the source code is tedious. People already hate writing C++ even without having to do it.
It is also not always obvious whether certain optimizations are beneficial or not. To make a decision about branch reordering, function inlining, or loop unrolling, we need answers to questions like these:
- How often is this branch taken?
- How often is this function called?
- What is the average number of iterations in this loop?
Luckily for us, there is a way to provide this real-world information automatically.
*Profile-guided optimization* (PGO, also called "pogo" because it's easier and more fun to pronounce) is a technique that uses [profiling data](/hpc/profiling) to improve performance beyond what can be achieved with just static analysis. In a nutshell, it involves adding timers and counters to the points of interest in the program, compiling and running it on real data, and then compiling it again, but this time supplying additional information from the test run.
The whole process is automated by modern compilers. For example, the `-fprofile-generate` flag will let GCC instrument the program with profiling code:

After we run the program — preferably on input that is as representative of the real use case as possible — it will create a bunch of `*.gcda` files that contain log data for the test run, after which we can rebuild the program, but now adding the `-fprofile-use` flag:

It usually improves performance by 10-20% for large codebases, and for this reason it is commonly included in the build process of performance-critical projects. One more reason to invest in solid benchmarking code.
<!--
We will study how profiling works more deeply in the [next chapter](../../profiling).
-->