Program instrumentation
The goal is to instrument the fuzzed program to obtain metrics during fuzzing. These metrics are either to guide mutation of inputs or detecting "dangerous" behaviour. Programs needs to be instrumented to give this kind of info. Instrumentation can be done at different levels:
- at source code
- during compilation, usually AST
- binary
Bug oracles
Metrics that tells the fuzzer that it has detected a potential bug:
- segfaults and signals
- memory sanitizer
- Google sanitizers for LLVM
- ASAN, MSAN
- assertions in the code
- different behaviour in differential fuzzing
- memory state
- message passing
Metrics to improve fuzzing
Metrics that are collected and use to improve the selection of future inputs:
- code coverage
- code targeting: how fast it is to access specific code
- "distance" to certain type of vulnerabilities
- logging for better understanding of the program
- power consumption leaks
Code coverage
Multiple possible granularities:
- Basic block
- def: maximal sequence of consecutive statements that are always executed together
- measure which basic block get executed
- this provides least granularity since the coverage does not cover basic block order of execution
- Branch/Edge coverage
- measure the pair of consecutive blocks executed
- a pair of basic block is called an edge
- more precise and try to execute all conditional branches
- algo:
- Give a unique label to all basic block
- Store any data related edge coverage to a global var
- At the beginning of each label xor current label and previous one
- This value is the edge label and is used as index for map coverage
- At the end of a basic block rightshift current label
- this is to prevent 0-value label if basic block jumps on itself
- Store the rightshifted value as "previous visited block"
- At the end of program exec, print/send/store map coverage feedback
Tools for runtime instrumentation/tracing
This is used for blackbox fuzzing where you don't have access to the source code.
Mainly from afl++ doc:
- Frida: dynamic code instrumentation toolkit
- you can inject JS script into your native apps
- debug and script live process
- usable on many platforms: Windows, Mac, Linux, iOS, Android, QNX
- Qemu: dynamic code injection using hooks
- emulator
- TinyInst: runtime dynamic instrumentation library
- more lightweight than other tools
- easier to use but does not fit every usecase
- MacOS and Windows only
- Nyx: only on Linux
- Unicorn: fork of Qemu
- Wine + Qemu: to run Win32 binaries
- Unicorn: fork of Qemu
- Tracing at runtime
- Pintool: Intel x32/x64 on Linux, MacOS and Windows
- Dynamorio
- Intel x32/x64 on Linux, MacOS and Windows
- Arm and AArch64
- faster than Pintool but still slow
- Intel-PT
- use intels processor trace
- downsides: buffer is small and debug info is complex
- two AFL implementations: afl-pt and ptfuzzer
- Coresight: ARM processor trace
Binary instrumentation
Also for blackbox fuzzing. Instrumentation is done only once, having better performance than with runtime instrumentation.
Mainly from AFL++ documentation
- Dyninst
- instruments the target at load time
- save the binary with instrumentations
- Retrowrite: x86 binaries, decompiles to ASM which can be instrumented with afl-gcc
- Zafl: x86 binaries, decompiles to ASM which can be instrumented with afl-gcc
Compile-time instrumentation
Multiple advantages:
- speed: compiler can still optimize code after instrumentation
- portability: the instrumentation is architecture independent
Rust options
Two code coverage options:
- a GCC-compatible, gcov-based coverage implementation, enabled with
-Z profile
, which derives coverage data based on DebugInfo - a source-based code coverage implementation, enabled with
-C instrument-coverage
, which uses LLVM's native, efficient coverage instrumentation to generate very precise coverage data
Rust Source-based coverage
cargo-fuzz
uses this techniquecargo-fuzz
is not a fuzzer but a framework to call a fuzzer- the only supported fuzzer is
libFuzzer
- through the
libfuzzer-sys
crate
- done on MIR
- based on llvm source-based code coverage
rustc -C instrument-coverage
does:- insert
llvm.instrprof.increment
at control-flows - add a map in each library and binary to keep track of coverage information
- use symbol mangling v0
- insert
- uses the Rust profiler runtime
- enabled by default on the
+nightly
channel
- enabled by default on the
- needs to use a Rust demangler:
rustfilt
- can be provided to llvm options
Using it
- Compile with
cargo
RUSTFLAGS="-C instrument-coverage" cargo build
- may be necessary to use the profiler runtime:
RUSTC=$HOME/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc
- Run the binary compiled
- it should produce a file
default_*.profraw
- or name it with
LLVM_PROFILE_FILE="toto.profraw"
- it should produce a file
- Process coverage data with
llvm-profdata
- can be installed with
rustup
llvm-profdata merge -sparse toto.profraw -o toto.profdata
- can be installed with
- Create reports with
llvm-cov
- can be installed with
rustup
- create a report when combining profdata with the binary
llvm-cov show -Xdemangler=rustfilt target/debug/examples/toto \ -instr-profile=toto.profdata \ -show-line-counts-or-regions \ -show-instantiations \ -name=add_quoted_string
- can be installed with
LLVM options
LLVM has multiple options to instrument program during compilation
- Source Based Coverage
- Sanitizer Coverage
gcov
: A GCC-compatible coverage implementation which operates on DebugInfo. This is enabled by-ftest-coverage
or--coverage
Source-Based Coverage
Operates on AST and preprocessor information directly
- better to map lines of Rust source code to coverage reports
-fprofile-instr-generate -fcoverage-mapping
Sanitizer Coverage
- operates on LLVM IR
-fsanitize-coverage=trace-pc-guard
to trace with guards/closures- will insert a call to
__sanitizer_cov_trace_pc_guard(&guard_variable)
on every edge __sanitizer_cov_trace_pc_guard(&guard_variable)
can be- implemented by user
- defaulted to a counter with
-fsanitize-coverage=inline-8bit-counters
- defualted to a boolean flag with
-fsanitize-coverage=inline-bool-flag
- will insert a call to
- partial instrumentation with
-fsanitize-coverage-allowlist=allowlist.txt
and-fsanitize-coverage-ignorelist=blocklist.txt
- these lists are filled with function names
LibAFL tools
LibAFL project has directories such as:
libafl_targets
that can be used for instrumentationlibafl_cc
a library that provide facilities to wrap compilers
cargo-libafl
This is a replacement to cargo-fuzz
which went into maintenance.
cargo-libafl
is just a framework to prepare fuzzing.
The actual fuzzer is libfuzzer-sys
that is maintained in libafl_targets
.
cargo libafl init
- create a directory
fuzz_targets
- create a directory
cargo libafl run <fuzz target name>
exec_build
givesRUSTFLAGS="-Cpasses=sancov-module -Cllvm-args=-sanitizer-coverage-level=4 -Cllvm-args=-sanitizer-coverage-inline-8bit-counters -Cllvm-args=-sanitizer-coverage-pc-table -L /home/adang/.local/share/cargo-libafl/rustc-1.70.0-90c5418/cargo-libafl-0.1.8/cargo-libafl -lcargo_libafl_runtime -Cllvm-args=-sanitizer-coverage-trace-compares --cfg fuzzing -Clink-dead-code -Cllvm-args=-sanitizer-coverage-stack-depth -Cdebug-assertions -C codegen-units=1" "cargo" "build" "--manifest-path" "/home/adang/boum/fuzzy/playground/rust-url/fuzz/Cargo.toml" "--target" "x86_64-unknown-linux-gnu" "--release" "--bin" "fuzz_target_1"
AFL tools
Recommendation from AFL Guide to Fuzzing in Depth
LTO > LLVM > gcc plugin > gcc mode
- Important checking the coverage of the fuzzing
- use
afl-showmap
- Section 3.g of AFL Guide to Fuzzing in Depth
- use
LTO mode
- called
afl-clang-lto/afl-clang-lto++
- works with llvm11 or newer
- instrumentation at link time
- autodictionary feature
- while compiling a dictionary is generated
- in fuzzing a dictionary are base inputs that will help improve code coverage
- improve efficiency of the fuzzer by avoiding basic block label collision
- classic coverage labels blocks randomly
- lto-mode instrument the files and avoid block label collision
- only con is that is has long compile time
LLVM mode
gcc plugin mode
Called afl-gcc-fast/afl-g++-fast
Instrument the target with the help of gcc
plugins.
gcc/clang mode
The base version without any special features.
afl-gcc/afl-g++
andafl-clang/afl-clang++
AFL options
CmpLog
Log comparison operands in shared memory for the different mutators. Many comparison operands check whether part of the state are equal to a preset byte range. These byte ranges are called magic bytes.
CmpLog
guide mutations by linking the inputs to the states magic bytes.
laf-intel
"De-optimize" certain LLVM passes to increase code coverage.
persistent-mode
Fuzz a program in a single forked process instead of forking a new process for each fuzz execution. Really important if we fuzz with fork server, speed improvements x10-x20
partial instrumentation
Select which part of the code you want to instrument