Program instrumentation

The goal is to instrument the fuzzed program so that metrics can be collected during fuzzing. These metrics are used either to guide the mutation of inputs or to detect "dangerous" behaviour. A program needs to be instrumented to provide this kind of information. Instrumentation can be done at different levels:

in the source code
during compilation, usually on the AST or an intermediate representation
on the binary

Bug oracles

Metrics that tell the fuzzer that it has detected a potential bug:

segfaults and signals
memory sanitizers
- Google's sanitizers for LLVM
- ASan (AddressSanitizer), MSan (MemorySanitizer)
assertions in the code
different behaviour in differential fuzzing
- memory state
- message passing
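
A differential oracle can be sketched as below. This is a minimal illustration, not taken from any fuzzer: `reference_parse`, `candidate_parse`, `Verdict` and `differential_oracle` are hypothetical names, and the "bug" (wrapping arithmetic in the candidate) is planted on purpose so that the two implementations diverge on some inputs.

```rust
/// Outcome of running both implementations on the same input.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Agree,
    Diverge {
        reference: Option<u32>,
        candidate: Option<u32>,
    },
}

/// Hypothetical reference implementation: parses ASCII decimal digits
/// and rejects values that do not fit in a u32.
pub fn reference_parse(input: &[u8]) -> Option<u32> {
    if input.is_empty() {
        return None;
    }
    let mut acc: u32 = 0;
    for &b in input {
        if !b.is_ascii_digit() {
            return None;
        }
        acc = acc.checked_mul(10)?.checked_add(u32::from(b - b'0'))?;
    }
    Some(acc)
}

/// Hypothetical implementation under test: same parser, but it silently
/// wraps on overflow (the planted bug).
pub fn candidate_parse(input: &[u8]) -> Option<u32> {
    if input.is_empty() {
        return None;
    }
    let mut acc: u32 = 0;
    for &b in input {
        if !b.is_ascii_digit() {
            return None;
        }
        acc = acc.wrapping_mul(10).wrapping_add(u32::from(b - b'0'));
    }
    Some(acc)
}

/// The oracle: any difference in observable behaviour is reported as a
/// potential bug.
pub fn differential_oracle(input: &[u8]) -> Verdict {
    let reference = reference_parse(input);
    let candidate = candidate_parse(input);
    if reference == candidate {
        Verdict::Agree
    } else {
        Verdict::Diverge { reference, candidate }
    }
}
```

In a fuzz harness, a `Diverge` verdict would typically be turned into a panic so that the fuzzer records the input as a crash.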

Metrics to improve fuzzing

Metrics that are collected and used to improve the selection of future inputs:

code coverage
code targeting: how quickly specific code is reached
"distance" to certain types of vulnerabilities
logging for better understanding of the program
power consumption leaks
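
How a metric such as code coverage feeds back into input selection can be sketched as follows. This is a simplified illustration under stated assumptions: `run_target` is a hypothetical, hand-instrumented target, `MAP_SIZE` is arbitrary, and real fuzzers use richer criteria (hit-count buckets, input size, execution time) than "touched a new slot".

```rust
/// Number of slots in the coverage map (arbitrary for this sketch).
pub const MAP_SIZE: usize = 64;

/// Returns true, and updates `global`, when `run` touched at least one
/// slot that no earlier input had reached.
pub fn merge_if_new(global: &mut [bool; MAP_SIZE], run: &[u8; MAP_SIZE]) -> bool {
    let mut found_new = false;
    for (seen, &hits) in global.iter_mut().zip(run.iter()) {
        if hits > 0 && !*seen {
            *seen = true;
            found_new = true;
        }
    }
    found_new
}

/// Hypothetical target, instrumented by hand: each branch bumps its own slot.
pub fn run_target(input: &[u8], map: &mut [u8; MAP_SIZE]) {
    map[0] = map[0].saturating_add(1);
    if input.first() == Some(&b'F') {
        map[1] = map[1].saturating_add(1);
        if input.get(1) == Some(&b'U') {
            map[2] = map[2].saturating_add(1);
        }
    }
}

/// Keeps only the inputs that increased coverage; these are the ones a
/// coverage-guided fuzzer would mutate further.
pub fn select_inputs(candidates: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut global = [false; MAP_SIZE];
    let mut corpus = Vec::new();
    for &input in candidates {
        let mut run = [0u8; MAP_SIZE];
        run_target(input, &mut run);
        if merge_if_new(&mut global, &run) {
            corpus.push(input.to_vec());
        }
    }
    corpus
}
```

Given the candidates `A`, `B`, `F`, `FX`, `FU` in that order, only `A`, `F` and `FU` are kept, since `B` and `FX` reach nothing that an earlier input had not already reached.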

Code coverage

Multiple possible granularities:

Basic block
- def: maximal sequence of consecutive statements that are always executed together
- measures which basic blocks get executed
- this is the coarsest granularity, since it does not record the order in which basic blocks are executed
Branch/Edge coverage
- measures which pairs of consecutive basic blocks get executed
- a pair of consecutive basic blocks is called an edge
- more precise, and aims to exercise every conditional branch
- algo:
  1. Give a unique label to every basic block
  2. Store all edge coverage data in a global variable (the coverage map)
  3. At the beginning of each basic block, XOR the current label with the previous one
  4. This value is the edge label and is used as the index into the coverage map
  5. At the end of the basic block, right-shift the current label
    - this prevents a 0-value edge label when a basic block jumps to itself
    - it also makes the edge A -> B distinct from B -> A
  6. Store the right-shifted value as the "previous visited block"
  7. At the end of program execution, print/send/store the coverage map as feedback
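
The edge coverage steps above can be sketched in a few lines. This is an illustration in the style of AFL's scheme, not the code any particular fuzzer emits: `EdgeCoverage` and `visit` are hypothetical names, the labels are passed in by hand instead of being assigned by a compiler pass, and the map size of 65536 entries is an assumption.

```rust
/// Size of the coverage map; edge labels are reduced modulo this value.
pub const MAP_SIZE: usize = 1 << 16;

/// Global coverage state (steps 2 and 6 of the algorithm).
pub struct EdgeCoverage {
    map: Vec<u8>,
    prev: usize,
}

impl EdgeCoverage {
    pub fn new() -> Self {
        Self { map: vec![0; MAP_SIZE], prev: 0 }
    }

    /// Called when execution enters the basic block labelled `label`.
    /// Returns the edge index that was updated.
    pub fn visit(&mut self, label: usize) -> usize {
        // Steps 3 and 4: XOR with the previous block to get the edge index.
        let edge = (label ^ self.prev) % MAP_SIZE;
        self.map[edge] = self.map[edge].wrapping_add(1);
        // Steps 5 and 6: remember the right-shifted label, so that a block
        // jumping to itself does not yield index 0 and A -> B differs
        // from B -> A.
        self.prev = label >> 1;
        edge
    }

    /// Step 7: a summary of the map that could be reported to the fuzzer.
    pub fn covered_edges(&self) -> usize {
        self.map.iter().filter(|&&count| count > 0).count()
    }
}
```

With labels `0x1234` and `0x5678`, visiting A, B, A, A in sequence yields four distinct indices: the entry edge, A -> B, B -> A and the self-loop A -> A.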

Tools for runtime instrumentation/tracing

This is used for blackbox fuzzing where you don't have access to the source code.

Mainly from the AFL++ documentation:

Frida: dynamic code instrumentation toolkit
- lets you inject JavaScript into native apps
- debug and script live processes
- usable on many platforms: Windows, macOS, Linux, iOS, Android, QNX
QEMU: dynamic code injection using hooks
- emulator
TinyInst: runtime dynamic instrumentation library
- more lightweight than other tools
- easier to use but does not fit every use case
- macOS and Windows only
Nyx: only on Linux
Unicorn: fork of QEMU
Wine + QEMU: to run Win32 binaries
Tracing at runtime
- Pintool: Intel 32-bit and 64-bit (x86/x64) on Linux, macOS and Windows
- DynamoRIO
  - Intel 32-bit and 64-bit (x86/x64) on Linux, macOS and Windows
  - Arm and AArch64
  - faster than Pintool but still slow
- Intel PT
  - uses Intel's Processor Trace
  - downsides: the trace buffer is small and the debug information is complex
  - two AFL implementations: afl-pt and ptfuzzer
- CoreSight: Arm's processor trace

Binary instrumentation

Also for blackbox fuzzing. The binary is instrumented only once, ahead of time, which gives better performance than runtime instrumentation.

Mainly from the AFL++ documentation:

Dyninst
- instruments the target at load time
- saves the instrumented binary
RetroWrite: x86-64 binaries, decompiled to assembly which can then be instrumented with afl-gcc
ZAFL: static rewriting platform for x86-64 binaries

Compile-time instrumentation

Multiple advantages:

speed: compiler can still optimize code after instrumentation
portability: the instrumentation is architecture independent

Rust options

Two code coverage options:

a GCC-compatible, gcov-based coverage implementation, enabled with -Z profile, which derives coverage data based on DebugInfo
a source-based code coverage implementation, enabled with -C instrument-coverage, which uses LLVM's native, efficient coverage instrumentation to generate very precise coverage data

Rust Source-based coverage

cargo-fuzz uses this technique for its coverage reports (cargo fuzz coverage)
- cargo-fuzz is not a fuzzer but a framework to call a fuzzer
- the only supported fuzzer is libFuzzer
- through the libfuzzer-sys crate
operates on MIR
based on LLVM source-based code coverage
rustc -C instrument-coverage does the following:
- inserts llvm.instrprof.increment intrinsic calls at control-flow points
- adds a map in each library and binary to keep track of coverage information
- uses symbol mangling v0
uses the Rust profiler runtime
- enabled by default on the +nightly channel
needs a Rust demangler such as rustfilt
- it can be passed to the LLVM tools as an option (e.g. -Xdemangler=rustfilt for llvm-cov)

Using it

Compile with cargo
- RUSTFLAGS="-C instrument-coverage" cargo build
- it may be necessary to point RUSTC at a compiler built with the profiler runtime: RUSTC=$HOME/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc
Run the compiled binary
- it should produce a file default_*.profraw
- or choose the file name with LLVM_PROFILE_FILE="toto.profraw"
Process the coverage data with llvm-profdata
- can be installed with rustup (llvm-tools-preview component)
- llvm-profdata merge -sparse toto.profraw -o toto.profdata
Create reports with llvm-cov
- can be installed with rustup (llvm-tools-preview component)
- creates a report by combining the profdata file with the binary
- llvm-cov show -Xdemangler=rustfilt target/debug/examples/toto -instr-profile=toto.profdata -show-line-counts-or-regions -show-instantiations -name=add_quoted_string

LLVM options

LLVM has multiple options to instrument programs during compilation:

Source Based Coverage
Sanitizer Coverage
gcov: A GCC-compatible coverage implementation which operates on DebugInfo. This is enabled by -ftest-coverage or --coverage

Source-Based Coverage

operates directly on the AST and preprocessor information

better at mapping coverage reports back to lines of Rust source code
enabled in clang with -fprofile-instr-generate -fcoverage-mapping

Sanitizer Coverage

operates on LLVM IR
-fsanitize-coverage=trace-pc-guard to trace with guard variables
- inserts a call to __sanitizer_cov_trace_pc_guard(&guard_variable) on every edge
- __sanitizer_cov_trace_pc_guard(&guard_variable) is a callback that can be implemented by the user
- instead of a callback, the instrumentation can be inlined:
  - as a counter increment with -fsanitize-coverage=inline-8bit-counters
  - as a boolean flag with -fsanitize-coverage=inline-bool-flag
partial instrumentation with -fsanitize-coverage-allowlist=allowlist.txt and -fsanitize-coverage-ignorelist=blocklist.txt
- these lists contain patterns matching source files (src:) or function names (fun:)
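
A user-provided pair of callbacks for -fsanitize-coverage=trace-pc-guard could look roughly like the sketch below. The C interface expects two symbols, `__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop)` and `__sanitizer_cov_trace_pc_guard(uint32_t *guard)`. The Rust functions here mirror those signatures but use the shorter, hypothetical names `trace_pc_guard_init` and `trace_pc_guard`, and are deliberately not exported under the real symbol names so that the sketch can be compiled and tested on its own. The counting logic (number each guard, report each edge once) is one possible choice, not the only one.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

/// Next guard identifier to hand out.
static NEXT_ID: AtomicU32 = AtomicU32::new(0);
/// Number of distinct edges reported so far.
static EDGES_HIT: AtomicUsize = AtomicUsize::new(0);

/// Mirrors `__sanitizer_cov_trace_pc_guard_init`: called once per module
/// with the range [start, stop) of its guard variables.
/// To be picked up by instrumented code it would have to be exported
/// under that exact symbol name; that is left out of this sketch.
pub unsafe extern "C" fn trace_pc_guard_init(start: *mut u32, stop: *mut u32) {
    if start == stop {
        return;
    }
    unsafe {
        // A non-zero first guard means this module was already initialised.
        if *start != 0 {
            return;
        }
        let mut guard = start;
        while guard < stop {
            // Give every guard a unique, non-zero identifier.
            *guard = NEXT_ID.fetch_add(1, Ordering::Relaxed) + 1;
            guard = guard.add(1);
        }
    }
}

/// Mirrors `__sanitizer_cov_trace_pc_guard`: called on every instrumented
/// edge with a pointer to that edge's guard.
pub unsafe extern "C" fn trace_pc_guard(guard: *mut u32) {
    unsafe {
        if *guard == 0 {
            // Edge already reported (or guard disabled).
            return;
        }
        // Zero the guard so that this edge is only counted once.
        *guard = 0;
    }
    EDGES_HIT.fetch_add(1, Ordering::Relaxed);
}

/// Number of distinct edges seen so far.
pub fn edges_hit() -> usize {
    EDGES_HIT.load(Ordering::Relaxed)
}
```

Zeroing the guard after the first hit keeps the callback cheap on hot edges. A fuzzer that needs hit counts would instead use the guard value as an index into a counter map.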

LibAFL tools

The LibAFL project has directories such as:

libafl_targets: can be used for instrumentation
libafl_cc: a library that provides facilities to wrap compilers

`cargo-libafl`

This is a replacement for cargo-fuzz, which went into maintenance mode. cargo-libafl is only a framework that prepares the fuzzing. The actual fuzzer is libfuzzer-sys, which is maintained in libafl_targets.

cargo libafl init
- creates a fuzz_targets directory
cargo libafl run <fuzz target name>
- exec_build gives RUSTFLAGS="-Cpasses=sancov-module -Cllvm-args=-sanitizer-coverage-level=4 -Cllvm-args=-sanitizer-coverage-inline-8bit-counters -Cllvm-args=-sanitizer-coverage-pc-table -L /home/adang/.local/share/cargo-libafl/rustc-1.70.0-90c5418/cargo-libafl-0.1.8/cargo-libafl -lcargo_libafl_runtime -Cllvm-args=-sanitizer-coverage-trace-compares --cfg fuzzing -Clink-dead-code -Cllvm-args=-sanitizer-coverage-stack-depth -Cdebug-assertions -C codegen-units=1" "cargo" "build" "--manifest-path" "/home/adang/boum/fuzzy/playground/rust-url/fuzz/Cargo.toml" "--target" "x86_64-unknown-linux-gnu" "--release" "--bin" "fuzz_target_1"

AFL tools

Recommendations from the AFL Guide to Fuzzing in Depth:

Preferred instrumentation mode: LTO > LLVM > gcc plugin > gcc/clang mode
It is important to check the coverage achieved by the fuzzing
- use afl-showmap
- Section 3.g of AFL Guide to Fuzzing in Depth

LTO mode

called afl-clang-lto/afl-clang-lto++
works with LLVM 11 or newer
instrumentation happens at link time
autodictionary feature
- a dictionary is generated during compilation
- in fuzzing, a dictionary is a set of tokens that the fuzzer can insert into inputs to help improve code coverage
improves the efficiency of the fuzzer by avoiding basic block label collisions
- classic coverage labels blocks randomly
- LTO mode instruments the files in a way that avoids block label collisions
the only downside is the long compile time
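
Why label collisions matter can be seen with a small example. The sketch below is only an illustration and does not reproduce how afl-clang-lto works internally: `edge_index` is a hypothetical helper implementing the classic "current XOR (previous >> 1)" scheme, and `sequential_ids` is a hypothetical stand-in for any scheme that hands out one identifier per distinct edge.

```rust
use std::collections::HashMap;

/// Classic scheme: the edge index is derived from the two block labels,
/// so two unrelated edges may end up on the same index.
pub fn edge_index(prev: u16, cur: u16) -> u16 {
    cur ^ (prev >> 1)
}

/// Collision-free alternative for illustration: every distinct
/// (prev, cur) edge gets the next unused identifier.
pub fn sequential_ids(edges: &[(u16, u16)]) -> HashMap<(u16, u16), usize> {
    let mut ids = HashMap::new();
    for &edge in edges {
        let next = ids.len();
        ids.entry(edge).or_insert(next);
    }
    ids
}
```

With block labels 4, 7, 8 and 1, the edges 4 -> 7 and 8 -> 1 both map to index 5 under the classic scheme, so the fuzzer cannot tell them apart. With one identifier per edge they stay distinct.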

LLVM mode

gcc plugin mode

Called afl-gcc-fast/afl-g++-fast. Instruments the target with the help of gcc plugins.

gcc/clang mode

The base version without any special features.

afl-gcc/afl-g++ and afl-clang/afl-clang++

tauri-fuzz documentation