|
1 | 1 | # Lesson 07 |
2 | 2 |
|
3 | | -Here we will be fuzzing [libxml2]. During this lesson we will: |
4 | | -* see the importance of dictionaries |
5 | | -* learn how to minimize the corpus |
6 | | -* generate a coverage report |
7 | | -* catch Out-of-Memory errors and memory leaks |
8 | | - |
9 | | - |
10 | | -### Build the library |
11 | | - |
12 | | -```bash |
13 | | -tar xzf libxml2.tgz |
14 | | -cd libxml2 |
15 | | - |
16 | | -./autogen.sh |
17 | | - |
18 | | -export FUZZ_CXXFLAGS="-O2 -fno-omit-frame-pointer -g -fsanitize=address \ |
19 | | - -fsanitize-coverage=edge,indirect-calls,8bit-counters,trace-cmp,trace-div,trace-gep" |
20 | | - |
21 | | -CXX="clang++ $FUZZ_CXXFLAGS" CC="clang $FUZZ_CXXFLAGS" \ |
22 | | - CCLD="clang++ $FUZZ_CXXFLAGS" ./configure |
23 | | -make -j$(nproc) |
24 | | -``` |
25 | | - |
26 | | -### Build the first fuzzer |
27 | | - |
28 | | -Take a look at the following fuzzer. Note the `xmlSetGenericErrorFunc` call. It |
29 | | -is there to disable logging of error messages like "Incorrect XML document". |
30 | | -These messages are very noisy, given the number of invalid inputs generated by |
31 | | -the fuzzer: |
32 | | - |
33 | | -```cpp |
34 | | -#include "libxml/parser.h" |
35 | | - |
36 | | -void ignore (void* ctx, const char* msg, ...) { |
37 | | - // Error handler to avoid spam of error messages from libxml parser. |
38 | | -} |
39 | | - |
40 | | -extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { |
41 | | - xmlSetGenericErrorFunc(NULL, &ignore); |
42 | | - |
43 | | - if (auto doc = xmlReadMemory(reinterpret_cast<const char*>(data), |
44 | | - static_cast<int>(size), "noname.xml", NULL, 0)) { |
45 | | - xmlFreeDoc(doc); |
46 | | - } |
47 | | - |
48 | | - return 0; |
49 | | -} |
50 | | -``` |
51 | | -
|
52 | | -Then build it: |
53 | | -
|
54 | | -```bash |
55 | | -cd .. |
56 | | -clang++ -std=c++11 xml_read_memory_fuzzer.cc $FUZZ_CXXFLAGS -I libxml2/include \ |
57 | | - libxml2/.libs/libxml2.a ../../libFuzzer/libFuzzer.a -lz \ |
58 | | - -o xml_read_memory_fuzzer |
59 | | -``` |
60 | | - |
61 | | -### Run the fuzzer with and without a dictionary |
62 | | - |
63 | | -Run the fuzzer on an empty corpus for 5 minutes (`-max_total_time=300`): |
64 | | - |
65 | | -```bash |
66 | | -mkdir corpus1 |
67 | | -./xml_read_memory_fuzzer -max_total_time=300 -print_final_stats=1 corpus1 |
68 | | -``` |
69 | | - |
70 | | -Open a new terminal and run the fuzzer on an empty corpus again, but also add a |
71 | | -dictionary (`-dict=`): |
72 | | - |
73 | | -```bash |
74 | | -mkdir corpus2 |
75 | | -./xml_read_memory_fuzzer -dict=./xml.dict -max_total_time=300 \ |
76 | | - -print_final_stats=1 corpus2 |
77 | | -``` |
78 | | - |
79 | | -Compare the output of both processes while they are running. You should see that |
80 | | -the second process reaches the same coverage as the first one and then overtakes |
81 | | -it very quickly. This is the effect of the dictionary. |
82 | | - |
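To see why a dictionary helps, consider the following toy mutation. It is a simplified illustration only, not libFuzzer's actual mutator, and the helper name `InsertToken` and the token `<!DOCTYPE` are assumptions chosen for the example. Byte-level mutations would have to guess all nine characters of such a keyword in the right order, while a dictionary-based mutation can place the whole token in a single step:

```cpp
#include <cstddef>
#include <string>

// Toy mutation: splice a dictionary token into the input at `pos`
// (clamped to the input length). Hypothetical helper for illustration;
// libFuzzer's real dictionary mutations are more elaborate.
std::string InsertToken(const std::string& input, const std::string& token,
                        std::size_t pos) {
  if (pos > input.size()) pos = input.size();
  std::string out = input;
  out.insert(pos, token);
  return out;
}
```

For example, `InsertToken("<a>", "<!DOCTYPE", 0)` produces an input that already starts with a complete XML keyword, which the parser can act on immediately.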
83 | | - |
84 | | -### Corpus and coverage |
85 | | - |
86 | | -The first process terminates somewhere at: |
87 | | - |
88 | | -``` |
89 | | -#1975901 DONE cov: 1736 ft: 5795 corp: 1544/75Kb exec/s: 6564 rss: 494Mb |
90 | | -``` |
91 | | - |
92 | | -Let's minimize its corpus (using the `-merge=1` flag): |
93 | | - |
94 | | -```bash |
95 | | -mkdir corpus1_min |
96 | | -./xml_read_memory_fuzzer -merge=1 corpus1_min corpus1 |
97 | | -``` |
98 | | - |
99 | | -The output looks like: |
100 | | - |
101 | | -```bash |
102 | | -INFO: Seed: 1508800405 |
103 | | -INFO: Loaded 1 modules (79184 guards): [0xd017e0, 0xd4ed20), |
104 | | -INFO: -max_len is not provided, using 1048576 |
105 | | -Loaded 1024/1539 files from corpus1 |
106 | | -=== Merging extra 1539 units |
107 | | -#1539 MIN0 cov: 1723 ft: 5810 units: 1008 exec/s: 0 rss: 95Mb |
108 | | -#2547 MIN1 cov: 1724 ft: 5764 units: 987 exec/s: 0 rss: 125Mb |
109 | | -#3534 MIN2 cov: 1724 ft: 5765 units: 975 exec/s: 0 rss: 154Mb |
110 | | -#4509 MIN3 cov: 1724 ft: 5763 units: 971 exec/s: 0 rss: 183Mb |
111 | | -=== Merge: written 971 units |
112 | | -``` |
113 | | -
|
114 | | -That means that libFuzzer reduced the corpus from `1539` test cases to `971` at the |
115 | | -same code coverage. |
116 | | -
|
117 | | -To get some understanding of inputs generated by the fuzzer from scratch, let's |
118 | | -briefly go through the corpus: |
119 | | -
|
120 | | -```bash |
121 | | -strings corpus1_min/* | more |
122 | | -``` |
123 | | -
|
124 | | -The second process terminates somewhere at: |
125 | | -
|
126 | | -``` |
127 | | -#2317811 DONE cov: 2873 ft: 8005 corp: 2359/121Kb exec/s: 7700 rss: 438Mb |
128 | | -``` |
129 | | -
|
130 | | -The coverage is significantly higher than in the first process (`cov: 2873` versus `cov: 1736`). |
131 | | -
|
132 | | -Let's minimize its corpus as well: |
133 | | -
|
134 | | -```bash |
135 | | -mkdir corpus2_min |
136 | | -./xml_read_memory_fuzzer -merge=1 corpus2_min corpus2 |
137 | | -``` |
138 | | -
|
139 | | -The output: |
140 | | -
|
141 | | -```bash |
142 | | -INFO: Seed: 2449634923 |
143 | | -INFO: Loaded 1 modules (79184 guards): [0xd017e0, 0xd4ed20), |
144 | | -INFO: -max_len is not provided, using 1048576 |
145 | | -Loaded 1024/2356 files from corpus2 |
146 | | -Loaded 2048/2356 files from corpus2 |
147 | | -=== Merging extra 2356 units |
148 | | -#2356 MIN0 cov: 2829 ft: 8012 units: 1571 exec/s: 0 rss: 126Mb |
149 | | -#3927 MIN1 cov: 2830 ft: 7970 units: 1516 exec/s: 0 rss: 169Mb |
150 | | -#5443 MIN2 cov: 2830 ft: 7969 units: 1503 exec/s: 0 rss: 210Mb |
151 | | -#6946 MIN3 cov: 2830 ft: 7968 units: 1496 exec/s: 6946 rss: 250Mb |
152 | | -#8442 MIN4 cov: 2830 ft: 7967 units: 1494 exec/s: 8442 rss: 291Mb |
153 | | -=== Merge: written 1494 units |
154 | | -``` |
155 | | -
|
156 | | -And quickly go through the inputs generated by the fuzzer with a dictionary: |
157 | | -
|
158 | | -```bash |
159 | | -strings corpus2_min/* | more |
160 | | -``` |
161 | | -
|
162 | | -### Generate coverage report |
163 | | -
|
164 | | -```bash |
165 | | -ASAN_OPTIONS=coverage=1 ./xml_read_memory_fuzzer corpus1_min -runs=0 |
166 | | -``` |
167 | | -
|
168 | | -This command should generate a `.sancov` file in your working directory: |
169 | | -
|
170 | | -```bash |
171 | | -$ ls *.sancov |
172 | | -xml_read_memory_fuzzer.26851.sancov |
173 | | -``` |
174 | | -
|
175 | | -Then we need to convert that binary file to a symbolized `.symcov` file: |
176 | | -
|
177 | | -```bash |
178 | | -sancov -symbolize xml_read_memory_fuzzer xml_read_memory_fuzzer.26851.sancov \ |
179 | | - > xml_read_memory_fuzzer.symcov |
180 | | -``` |
181 | | -
|
182 | | -To see the coverage report in a user-friendly interface, let's launch a local |
183 | | -[coverage report server]: |
184 | | -
|
185 | | -```bash |
186 | | -python3 coverage-report-server.py --symcov xml_read_memory_fuzzer.symcov \ |
187 | | - --srcpath libxml2 |
188 | | -``` |
189 | | -
|
190 | | -Open [localhost:8001](http://localhost:8001/) in your browser to see the report. |
191 | | -
|
192 | | -
|
193 | | -Let's generate a coverage report for the second corpus (generated with the dictionary) |
194 | | -and compare both reports by eye. Open a new terminal and repeat the same steps: |
195 | | -
|
196 | | -```bash |
197 | | -ASAN_OPTIONS=coverage=1 ./xml_read_memory_fuzzer corpus2_min -runs=0 |
198 | | - |
199 | | -sancov -symbolize xml_read_memory_fuzzer <NEW_.SANCOV_FILE_PATH> \ |
200 | | - > xml_read_memory_fuzzer_2.symcov |
201 | | - |
202 | | -python3 coverage-report-server.py --symcov xml_read_memory_fuzzer_2.symcov \ |
203 | | - --srcpath libxml2 --port 8002 |
204 | | -``` |
205 | | -
|
206 | | -Go to [localhost:8002](http://localhost:8002/). |
207 | | -
|
208 | | -The second report shows a higher percentage of coverage for the same files |
209 | | -and covers more source code files as well. |
210 | | -
|
211 | | -
|
212 | | -### Build the second fuzzer |
213 | | -
|
214 | | -The second fuzzer targets the `xmlRegexpCompile` function of the libxml2 library: |
215 | | -
|
216 | | -```cpp |
217 | | -#include "libxml/parser.h" |
218 | | -#include "libxml/tree.h" |
219 | | -#include "libxml/xmlversion.h" |
220 | | - |
221 | | -void ignore (void * ctx, const char * msg, ...) { |
222 | | - // Error handler to avoid spam of error messages from libxml parser. |
223 | | -} |
224 | | - |
225 | | -// Entry point for LibFuzzer. |
226 | | -extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { |
227 | | - xmlSetGenericErrorFunc(NULL, &ignore); |
228 | | - |
229 | | - std::vector<uint8_t> buffer(size + 1, 0); |
230 | | - std::copy(data, data + size, buffer.data()); |
231 | | - |
232 | | - xmlRegexpPtr x = xmlRegexpCompile(buffer.data()); |
233 | | - if (x) |
234 | | - xmlRegFreeRegexp(x); |
235 | | - |
236 | | - return 0; |
237 | | -} |
238 | | -``` |
239 | | -
|
240 | | -Let's build the fuzzer and run it: |
241 | | -
|
242 | | -```bash |
243 | | -clang++ -std=c++11 xml_compile_regexp_fuzzer.cc $FUZZ_CXXFLAGS \ |
244 | | - -I libxml2/include libxml2/.libs/libxml2.a ../../libFuzzer/libFuzzer.a -lz \ |
245 | | - -o xml_compile_regexp_fuzzer |
246 | | -
|
247 | | -mkdir corpus3 |
248 | | -./xml_compile_regexp_fuzzer -dict=./xml.dict corpus3 |
249 | | -``` |
250 | | -
|
251 | | -You will quickly get an out-of-memory crash: |
252 | | -
|
253 | | -```bash |
254 | | -#796 NEW cov: 289 bits: 845 indir: 49 corp: 54/1518b exec/s: 0 rss: 43Mb L: 64 MS: 4 CrossOver-PersAutoDict-CrossOver-ChangeByte- DE: " xml:id=\"1\""- |
255 | | -#800 NEW cov: 289 bits: 855 indir: 49 corp: 55/1556b exec/s: 0 rss: 43Mb L: 38 MS: 3 PersAutoDict-ChangeBit-CrossOver- DE: "%a"- |
256 | | -==27928== ERROR: libFuzzer: out-of-memory (used: 2100Mb; limit: 2048Mb) |
257 | | - To change the out-of-memory limit use -rss_limit_mb=<N> |
258 | | -
|
259 | | -Live Heap Allocations: 1003258238 bytes from 30527559 allocations; showing top 95% |
260 | | -732653304 byte(s) (73%) in 30527221 allocation(s) |
261 | | - #0 0x4c2a0c in __interceptor_malloc (/home/mmoroz/projects/libfuzzer-workshop/lessons/07/xml_compile_regexp_fuzzer+0x4c2a0c) |
262 | | - #1 0x5d8506 in xmlRegNewRange /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:719:28 |
263 | | - #2 0x5d8506 in xmlRegAtomAddRange /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:1251 |
264 | | - #3 0x5d717e in xmlFAParseCharRange /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5066:9 |
265 | | - #4 0x5d717e in xmlFAParsePosCharGroup /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5084 |
266 | | - #5 0x5d4c40 in xmlFAParseCharGroup /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5125:6 |
267 | | - #6 0x5d2f89 in xmlFAParseCharClass /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5145:2 |
268 | | - #7 0x5d2f89 in xmlFAParseAtom /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5299 |
269 | | - #8 0x5d2f89 in xmlFAParsePiece /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5316 |
270 | | - #9 0x5d25e4 in xmlFAParseBranch /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5351:8 |
271 | | - #10 0x5b03ad in xmlFAParseRegExp /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5377:5 |
272 | | - #11 0x5af8f4 in xmlRegexpCompile /home/mmoroz/projects/libfuzzer-workshop/lessons/07/libxml2/xmlregexp.c:5473:5 |
273 | | - #12 0x4f14d0 in LLVMFuzzerTestOneInput /home/mmoroz/projects/libfuzzer-workshop/lessons/07/xml_compile_regexp_fuzzer.cc:27:20 |
274 | | - <...> |
275 | | -``` |
276 | | -
|
277 | | -In some cases, the out-of-memory error can be caused by a memory leak. To detect leaks, |
278 | | -enable the `detect_leaks=1` option of AddressSanitizer and run the fuzzer again: |
279 | | -
|
280 | | -```bash |
281 | | -ASAN_OPTIONS=detect_leaks=1 ./xml_compile_regexp_fuzzer -dict=./xml.dict corpus3 |
282 | | -``` |
283 | | -
|
284 | | -That option enables LeakSanitizer (a part of AddressSanitizer), which reports memory |
285 | | -leaks and aborts the run in the same way as other crashes. |
286 | | -
|
287 | | -[coverage report server]: http://llvm.org/svn/llvm-project/llvm/trunk/tools/sancov/coverage-report-server.py |
288 | | -[libxml2]: http://www.xmlsoft.org/ |
| 3 | +This is a theoretical lesson, see the slides.