Commit ab8ab8d

Update profiling on linux
1 parent 1d93218 commit ab8ab8d

1 file changed: +3 -5 lines changed
org/implementation-details.org

Lines changed: 3 additions & 5 deletions
@@ -54,7 +54,7 @@ You can profile Clasp compiled code and C++ together. On linux we use the ~perf~
 
 ** Profiling on linux
 
-Clasp generates just in time (JIT) code for which we need symbol information. Use the ~(ext:generate-perf-map)~ command within clasp to generate a ~/tmp/perf-<pid>.map file that ~perf~ will use to convert addresses to symbols. Do this just prior to evaluating the code that you want to profile so that you get the maximum coverage of JITted symbols.
+Clasp generates just in time (JIT) code for which we need symbol information. Use the ~(ext:generate-perf-map)~ function within clasp to generate a ~/tmp/perf-<pid>.map~ file that ~perf~ will use to convert addresses to symbols. Do this just prior to evaluating the code that you want to profile so that you get the maximum coverage of JITted symbols.
 
 #+BEGIN_SRC lisp
 COMMON-LISP-USER> (ext:getpid)
@@ -84,10 +84,8 @@ COMMON-LISP-USER> (time (dotimes (i 10) (fibonacci 41)))
 
 In another window, within the FlameGraph directory do the following..
 
-The ~perf~ profiling stack is only 127 frames deep by default and that is not enough for many profiling rungs. Use the following command to increase the depth.
-
 #+BEGIN_SRC sh
-$ sudo sysctl -w kernel.perf_event_max_stack=2048 # (1)
+$ sudo sysctl -w kernel.perf_event_max_stack=1024 # (1)
 $ perf record -F 99 -p 33159 -g -o /tmp/perf.data -- sleep 10 # (2)
 [ perf record: Woken up 4 times to write data ]
 [ perf record: Captured and wrote 0.873 MB perf.data (983 samples) ]
@@ -96,7 +94,7 @@ $ ./flamegraph.pl /tmp/out.perf-folded >/tmp/perf.svg # (4)
 #+END_SRC
 
 1. The profiling stack is only 127 frames deep by default. This will increase it to 2048.
-2. Record the perf data for our process.
+2. Record the perf data for our process. This needs to run during the time consuming process in Clasp.
 3. Generate the backtraces and fold them according to the flame graph instructions.
 4. Generate the flame graph.
 
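
The changed text describes an ordering constraint: the perf map has to exist before recording starts, and recording has to overlap the slow evaluation in Clasp. A minimal shell sketch of a pre-flight check for that ordering follows. The function names `perf_map_path`, `perf_map_ready`, and `perf_record_command` are hypothetical helpers invented for illustration and are not part of Clasp, perf, or FlameGraph. The map path pattern and the `perf record` flags are copied from the diff above.

```sh
#!/bin/sh
# Hypothetical helpers (not part of Clasp or perf); names are illustrative only.

# Print the perf map path that the documentation says
# (ext:generate-perf-map) writes for a given PID.
perf_map_path() {
    printf '/tmp/perf-%s.map\n' "$1"
}

# Succeed only if the perf map for the given PID exists and is non-empty.
perf_map_ready() {
    [ -s "$(perf_map_path "$1")" ]
}

# Print (without running) the recording command shown in the diff,
# with the PID substituted in.
perf_record_command() {
    printf 'perf record -F 99 -p %s -g -o /tmp/perf.data -- sleep 10\n' "$1"
}
```

One possible use is to call `perf_map_ready "$pid"` with the value returned by `(ext:getpid)` and only run the command printed by `perf_record_command "$pid"` when it succeeds. If the check fails, evaluate `(ext:generate-perf-map)` in Clasp first.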
