SBE was standardized by the FIX Trading Community as a wire-encoding layer for financial messages, optimized around a single dominant constraint: deterministic, minimal-latency serialization and deserialization with zero heap allocation. It targets the sub-microsecond encoding budgets that matter in co-located HFT systems.
The design rejects the flexibility of self-describing formats (FIX tag-value, JSON, Protobuf) in exchange for three hard properties:
- In-place access — fields are read directly from the wire buffer via pointer arithmetic; no copy, no parse tree.
- Flyweight codec pattern — the "decoder" is a thin wrapper that holds a pointer and offset; it owns nothing.
- Schema-driven, not self-describing — the schema (an XML file, compiled offline) fully describes layout, so zero type metadata travels on the wire.
Every message on the wire is prefixed by a Message Header, typically 8 bytes (though schema-configurable):
┌────────────────┬──────────────┬────────────────┬──────────────┐
│ blockLength │ templateId │ schemaId │ version │
│ (uint16) │ (uint16) │ (uint16) │ (uint16) │
└────────────────┴──────────────┴────────────────┴──────────────┘
blockLength: byte size of the root block (fixed fields only).templateId: identifies the message type (e.g.,NewOrderSingle = 1).schemaId/version: schema identity for compatibility gating.
Immediately following the header is a flat, contiguous block of fixed-size primitive fields laid out without padding inserted by the codec — alignment is explicit in the schema, not inferred by the compiler. This means the codec emits exactly blockLength bytes for fixed fields, regardless of the CPU's natural alignment preferences.
┌──────────┬────────────┬──────────┬─────────────┬─────────────┐
│ Price │ Quantity │ Side │ OrderType │ TimeInForce │
│ (int64) │ (int32) │ (uint8) │ (uint8) │ (uint8) │
└──────────┴────────────┴──────────┴─────────────┴─────────────┘
offset 0 offset 8 offset 12 offset 13 offset 14
All multi-byte integers are little-endian by default (configurable to big-endian per schema). This matches x86-64 hardware natively, eliminating bswap instructions on the hot path for Intel/AMD co-location servers.
After the root block, optional repeating groups encode variable-length sequences (e.g., legs of a multi-leg order):
┌──────────────────────────────────┐
│ Group Header │
│ blockLength (uint16) │ ← size of each group entry's fixed block
│ numInGroup (uint8 or uint16) │ ← count of entries
└──────────────────────────────────┘
│ Entry 0 fixed block │
│ Entry 0 nested groups / vardata │
│ Entry 1 fixed block │
│ ... │
Groups nest recursively, enabling multi-level structures (e.g., a basket of legs each with their own fills). The codec walks the group via counted iteration with no dynamic allocation: each entry's decoder is positioned by advancing a raw pointer by blockLength.
varData fields (raw bytes or UTF-8 strings) are encoded at the end of a message or group entry:
┌─────────────────┬─────────────────────────┐
│ length (uint8) │ data (length bytes) │
└─────────────────┴─────────────────────────┘
varData is always last, because it breaks the constant-offset property that makes fixed-field access O(1).
SBE's generated C++ code does not deserialize into a struct. Instead, it generates a flyweight — a view object wrapping a char* buffer:
// Generated decoder (simplified)
class NewOrderSingleDecoder {
char* m_buffer;
size_t m_offset;
size_t m_actingBlockLength;
public:
NewOrderSingleDecoder& wrap(char* buffer, size_t offset,
size_t actingBlockLength, size_t bufferLength)
{
m_buffer = buffer;
m_offset = offset;
// no copy, no allocation
return *this;
}
// Field access: direct memory read at compile-time-known offset
std::int64_t price() const noexcept {
std::int64_t val;
std::memcpy(&val, m_buffer + m_offset + 0, sizeof(val));
return val; // compiler emits a single MOV on x86-64
}
std::int32_t orderQty() const noexcept {
std::int32_t val;
std::memcpy(&val, m_buffer + m_offset + 8, sizeof(val));
return val;
}
};Key properties:
std::memcpyfor unaligned reads: avoids UB fromreinterpret_cast<int64_t*>on potentially unaligned buffers. With-O2/-O3, compilers reduce this to a singleMOVinstruction on x86 with unaligned memory operands.- No virtual dispatch: every field accessor is a non-virtual inline function. The compiler inlines and eliminates all call overhead.
- Compiler-visible offsets: since offsets are compile-time constants (or small constant expressions), the compiler can constant-fold the pointer arithmetic and emit optimal load sequences.
SBE schemas define composite types — named structs of primitives — reused across messages. A canonical example is the price composite:
<composite name="Decimal">
<type name="mantissa" primitiveType="int64"/>
<type name="exponent" primitiveType="int8" presence="constant" valueRef="Exponents.E4"/>
</composite>presence="constant" means the exponent is not encoded on the wire — the decoder reconstructs it from the schema. This is a critical space-saving technique: on a market data feed encoding millions of price updates per second, saving even 1 byte per message multiplies across the entire feed.
SBE enumerations encode to a single primitive (e.g., uint8), avoiding string comparison:
<enum name="SideEnum" encodingType="uint8">
<validValue name="Buy">0</validValue>
<validValue name="Sell">1</validValue>
</enum>Bit sets encode multiple boolean flags into a single integer:
<set name="ExecInstSet" encodingType="uint8">
<choice name="NotHeld">0</choice>
<choice name="AllOrNone">1</choice>
<choice name="Pegged">2</choice>
</set>The generated C++ decoder exposes type-safe accessors:
bool notHeld = decoder.execInst().notHeld(); // tests bit 0
bool allOrNone = decoder.execInst().allOrNone(); // tests bit 1The schema is a contract — the source of truth for message layout. The SBE reference implementation (sbe-tool) compiles the schema XML into generated C++ headers.
SBE supports schema evolution through:
sinceVersionattribute on fields: a decoder at version N can safely skip fields added at version N+1 by usingactingVersionchecks.actingBlockLengthpassed at decode time allows a new decoder to skip unknown trailing fields in the root block.
This is important in a multi-venue HFT context where a venue may upgrade its SBE schema mid-deployment: receiving a message with version=5 when your decoder expects version=4 is handled gracefully by skipping unknown fields.
sbe-tool also emits a binary Intermediate Representation (IR) of the schema. This IR is itself SBE-encoded and can be transmitted on the wire ahead of messages, enabling dynamic decoding (e.g., in a monitoring tool or a traffic recorder) without recompiling. In live trading this is rarely used, but it is critical for ops tooling.
On modern x86-64 hardware (e.g., Intel Xeon, DDR5), a typical SBE NewOrderSingle encode/decode roundtrip is 10–60 ns depending on message complexity. This compares favorably to:
| Format | Encode+Decode (approx.) |
|---|---|
| SBE | ~20–60 ns |
| Protobuf | ~500–2000 ns |
| FIX tag-value | ~2000–10000 ns |
| JSON | ~5000–50000 ns |
SBE is architected to integrate with kernel-bypass networking (DPDK, OpenOnload/Solarflare, Mellanox/RDMA). The flyweight pattern is essential here: the NIC DMA-writes a frame directly into a pre-allocated ring buffer, and the SBE decoder wraps that buffer pointer in place — no copy, no syscall boundary crossing.
NIC DMA write → Ring buffer slot → SBE flyweight wraps slot → field access
The total path from packet arrival to parsed field value involves zero copies and zero heap allocations.
Because the schema is fixed at compile time, if (field.presence == optional) checks that would exist in a dynamic decoder are resolved at compile time and eliminated. The generated code for a required field contains no branch on the presence sentinel value.
The primary SBE message set mirrors FIX semantics but with binary compactness:
NewOrderSingle— Submit a new order. Encodes price, quantity, side, order type, TIF, client order ID, and exchange-specific fields in ~40–80 bytes.OrderCancelRequest/OrderCancelReplaceRequest— Cancel or amend; latency here directly impacts fill quality.ExecutionReport— Acknowledgment, fill, or rejection from exchange. The SBEExecTypeenum (single byte) distinguishes states that FIX encodes as ASCII characters with semantic overhead.
SBE market data feeds (CME MDP 3.0, Eurex Enhanced, ICE) encode:
- Incremental refresh messages: a repeating group of
MDEntryitems (bid/ask level updates) per market data packet. - Security definition messages: option chains, futures specs — typically complex nested structures that SBE handles via nested groups.
The incremental refresh inner loop in a market data handler looks like:
auto& mdIncrementalRefresh = decoder.wrapAndApplyHeader(buf, offset, bufLen);
for (auto& entry : mdIncrementalRefresh.noMDEntries()) {
const auto px = entry.mDEntryPx().mantissa(); // no copy
const auto qty = entry.mDEntrySize();
const auto type = entry.mDEntryType(); // enum, uint8
book.apply(px, qty, type);
}This loop compiles to a tight inner loop with no heap traffic and no branch mispredictions from dispatch on type strings.
Exchanges publishing SBE achieve message rates of 1–5 million messages/second per instrument group on a single multicast group. SBE's compact encoding directly reduces NIC buffer pressure and the probability of packet drops in high-load scenarios — a critical concern for quote stuffing detection and burst handling.
HFT protocols encode timestamps as uint64 nanoseconds since UNIX epoch — 8 bytes, no parsing, directly comparable. SBE schemas typically define a UTCTimestampNanos composite. Contrast with FIX SendingTime which encodes as YYYYMMDD-HH:MM:SS.sss (21 ASCII bytes, requiring sscanf-style parsing).
SBE messages are sized to fit within a fixed-size LMAX Disruptor-style ring buffer slot (typically 512 or 1024 bytes). Because SBE encodes into a pre-allocated char[] slot directly, the encoder never calls new, which would introduce non-deterministic latency from the allocator.
The generated SBE headers can be combined with std::variant and std::visit for dispatch:
using AnyMessage = std::variant
NewOrderSingleDecoder,
ExecutionReportDecoder,
OrderCancelRequestDecoder
>;
AnyMessage dispatchMessage(char* buf, size_t len) {
MessageHeaderDecoder hdr;
hdr.wrap(buf, 0, hdr.encodedLength(), len);
switch (hdr.templateId()) {
case NewOrderSingleDecoder::sbeTemplateId():
return NewOrderSingleDecoder{}.wrapAndApplyHeader(buf, 0, len);
// ...
}
}With [[likely]] / [[unlikely]] annotations on the hot-path variant, branch predictor hints align with market microstructure (e.g., ExecutionReport is the most frequent message in a fill-heavy session).
A typical SBE NewOrderSingle (40–80 bytes) fits in a single cache line (64 bytes). The flyweight decoder accesses all hot fields within that single line, resulting in one L1 cache miss on first access and zero on subsequent field reads. This is a deliberate protocol design choice, unlike Protobuf where varint chains cause sequential cache-line fetches.
| Venue / Feed | SBE Dialect |
|---|---|
| CME Group | MDP 3.0 (SBE, schema 9+) |
| Eurex | Enhanced Transaction Interface |
| ICE | iMpact / SBE |
| B3 (Brazil) | UMDF / SBE |
| Deutsche Börse | Xetra SBE |
| Moscow Exchange | FAST + SBE hybrid |
CME's MDP 3.0 adoption in 2015 was a watershed moment: it replaced FAST (which required stateful, LZW-derived compression, introducing context-dependent latency) with SBE's stateless encoding, reducing peak decode latency variance by an order of magnitude.
- Schema coupling: both encoder and decoder must agree on schema version. Mismatches are silent data corruption risks if version gating is not rigorous.
- Not self-describing: a packet capture is opaque without the schema file. Ops and debugging tooling must be schema-aware.
- Variable-length fields break O(1) access: a message with many
varDatafields degrades toward sequential scan behavior. HFT message designs mitigate this by replacing variable fields with fixed-size char arrays padded to maximum length. - No built-in compression: unlike FAST, SBE has no delta encoding. For extremely bandwidth-constrained links (cross-datacenter replication), a thin delta layer is sometimes applied atop SBE externally.
- Code generation discipline: schema changes require regenerating headers and recompiling all consumers. In a large C++ monorepo, this can trigger wide rebuild cascades.
SBE sits at the intersection of protocol design, hardware-aware programming, and financial market microstructure. Its success in HFT comes from ruthlessly sacrificing generality for the one property that matters most in co-located trading: predictable, minimal-latency, allocation-free serialization that maps directly onto how modern CPUs, caches, and kernel-bypass NICs work.
Note
Generated by Claude.ai
Model: Sonet 4.6
Prompt: Provide a thorough description of the SBE (Simple Binary Encoding) protocol in a high-frequency trading environment. This description is intended for a computer science expert fluent in C++. Describe the important technical and business features of the protocol.