⚠️ Requires a Niobium SDK install, provided to licensed Niobium customers (the SDK is already present on FOG terminals). The source in this repository is Apache 2.0, but the examples link the proprietary Niobium compiler runtime (libnbcc) and will not build or run without the SDK. If you do not have access, the source is still useful as a reference for the FHE compile/runtime workflow, but `./run.sh` will fail. To explore FHE without the Niobium runtime, look at the OpenFHE examples, HElib's tutorial, or Microsoft SEAL.
Short, runnable examples of computing on encrypted data with the Niobium FHE accelerator. Clone it on a FOG terminal (Niobium's customer workstation, SDK pre-installed), run the examples, then edit them with your own operations and watch the result.
No prior FHE experience needed. If you have a FOG terminal, you already have everything you need.
Your data leaks at every layer of the modern stack — hospital records, banking transactions, model providers, ad networks. Anywhere data sits in plaintext, something or someone can read it.
Fully Homomorphic Encryption (FHE) lets a third party compute on your data without ever decrypting it. The accelerator adds, multiplies, and rotates encrypted numbers; only the key holder ever sees the result. Hospitals can run analytics on patient records they never decrypt. Banks can score fraud on transactions they cannot read. A cloud can serve an ML model on inputs it physically cannot see.
The math has existed since 2009. What changed is performance: until recently, FHE was orders of magnitude too slow to be practical. The Niobium accelerator narrows that gap enough for real workloads. This kit is the shortest path to using it — run a real FHE computation in 90 seconds, then start writing your own.
You've got this. If you've never touched FHE before, that's the whole point of this repository. Every example is short, runnable, and gets you one step closer to writing something you could hand to a customer.
A few things you'll naturally reach for that behave differently than in regular code. Knowing these upfront saves an hour of head-scratching:
- `if (a > b)` doesn't work. CKKS has no native comparison or branching on encrypted data. People approximate "is `a > b`?" with a polynomial (a sigmoid-like curve evaluated on `a − b`). Out of scope for this kit, but reachable via `EvalChebyshev` / `EvalPoly` once you're comfortable.
- Variable-length data doesn't work. A ciphertext is a fixed-size packed vector. You always operate on the full vector; mask irrelevant slots with multiplications by 0/1.
- Exact integer equality doesn't work. CKKS is approximate. Ask "is it close to 42?" not "is it exactly 42?" (Other FHE schemes — BGV/BFV — are exact for integers but don't do CKKS's efficient real-number arithmetic; not in this kit.)
- Random slot access doesn't work. `slot[3] += 1` is not a thing. Rotation + masking is how you isolate one slot.
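Here is the rotation + masking idea modelled on plain doubles (no encryption, no OpenFHE), so the slot bookkeeping is easy to follow. This is a minimal sketch: `Slots`, `rotate_left`, `mul`, and `isolate_slot` are hypothetical helpers, and it assumes a positive shift rotates left (slot i receives slot i + k). Double-check that convention against `EvalRotate` before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plaintext stand-in for a packed ciphertext: one double per slot.
using Slots = std::vector<double>;

// Cyclic left rotation: out[i] = v[(i + k) mod n] (assumed convention).
Slots rotate_left(const Slots& v, int k) {
    const int n = static_cast<int>(v.size());
    Slots out(v.size());
    for (int i = 0; i < n; ++i) {
        out[i] = v[((i + k) % n + n) % n];  // also wraps negative k
    }
    return out;
}

// Slotwise product: models multiplying by a plaintext 0/1 mask.
Slots mul(const Slots& a, const Slots& b) {
    assert(a.size() == b.size());
    Slots out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
    return out;
}

// Isolate slot j: rotate it into slot 0, then zero every other slot with a
// one-hot mask. The result is still a full-length vector, as in FHE.
Slots isolate_slot(const Slots& v, int j) {
    assert(j >= 0 && j < static_cast<int>(v.size()));
    Slots mask(v.size(), 0.0);
    mask[0] = 1.0;
    return mul(rotate_left(v, j), mask);
}
```

On real ciphertexts the same two steps would be one rotation (whose shift must appear in `rotation_indices`) and one plaintext multiply, which in CKKS typically costs a level. The `slot[3] += 1` case needs no rotation at all. Instead, add a plaintext vector that holds 1 in slot 3 and 0 elsewhere.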
If your problem fits the "add / subtract / multiply / rotate / polynomial"
shape, the rest of this kit will get you to a working program. If it
needs comparisons or branching, you'll be approximating with polynomials,
and EXPLORE.md
has more.
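To see what "approximating with polynomials" looks like, here is a plaintext sketch of one standard trick. It is used here only for illustration and is not necessarily what the kit or EXPLORE.md uses. The sketch iterates the odd polynomial f(x) = (3x − x³)/2, which pushes any nonzero value in [−1, 1] toward −1 or +1, and reads (f(a − b) + 1)/2 as a soft "a > b". It assumes inputs are pre-scaled so |a − b| ≤ 1. Each iteration needs x² and then x³, so on ciphertexts it would cost at least two multiplicative levels.

```cpp
#include <cassert>

// One step of the odd polynomial f(x) = (3x - x^3) / 2.
// Only additions and multiplications: operations CKKS supports natively.
double sign_step(double x) {
    return 1.5 * x - 0.5 * x * x * x;
}

// Approximate sign(x) for x in [-1, 1] by composing sign_step `iters` times.
// More iterations sharpen the curve, but each one consumes depth.
double approx_sign(double x, int iters) {
    assert(x >= -1.0 && x <= 1.0);
    for (int i = 0; i < iters; ++i) x = sign_step(x);
    return x;
}

// Soft "is a > b?": near 1 when a > b, near 0 when a < b, exactly 0.5 when
// a == b. Assumes |a - b| <= 1 (inputs pre-scaled).
double approx_greater(double a, double b, int iters) {
    return 0.5 * (approx_sign(a - b, iters) + 1.0);
}
```

Differences close to zero converge slowly (f(x) ≈ 1.5x near 0). Telling nearly equal numbers apart therefore needs more iterations, and so more depth.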
FHE (Fully Homomorphic Encryption) lets you compute on data while it stays encrypted. The accelerator never sees your plaintext — it adds, multiplies, and rotates ciphertext, and only the key holder can decrypt the result.
| Term | Meaning |
|---|---|
| ciphertext | an encrypted number (or a packed list of numbers) |
| slot | one lane of a packed list inside a single ciphertext |
| packing | putting many numbers into one ciphertext, one per slot |
| rotation | shift the slots inside a ciphertext left or right (a building block for reductions) |
| CKKS | the FHE scheme used by this kit; approximate arithmetic on real numbers (lossy by design) |
| multiplicative depth | how many multiplications deep your computation may go — a budget set up front |
| noise | a small numerical error every ciphertext carries; multiplies grow it, depth budget bounds it |
| Estimated precision: N bits | OpenFHE's printed estimate of usable bits remaining in the decrypted result (lower = closer to the noise floor) |
| key switching | the cryptographic step that follows a multiply or a rotation; needs evaluation keys |
| compiled program | the cached device-runnable instructions produced from your code |
| compile / runtime | first run = build + cache the compiled program; later runs = just execute it |
The first run compiles your example on the CPU (the device isn't touched yet, so no math result is shown). The device only runs on the second invocation, which is also where the kit prints the result and verifies it against a reference written during compile.
# First run — COMPILES (CPU only). No math output; the device hasn't run yet.
./run.sh 01_add
# ...
# [compile] OK -- cached compiled program ready. Run again to execute on the device
# (or pass --full to do both in one call).

# Second run — RUNS on the device, prints results, verifies against compile-time reference.
./run.sh 01_add
# ...
# Results:
# sum = (11, 22, 33, 44, 55, 66, 77, 88, ... )
# [verify] ALL OUTPUTS CORRECT.

When an FPGA is available, the same pattern works on real hardware:

./run.sh 01_add --fpga   # cold: COMPILES on CPU (no FPGA work yet)
./run.sh 01_add --fpga   # hot: RUNS on the FPGA + [verify] ALL OUTPUTS CORRECT

Want both phases in one invocation? Pass `--full`: `./run.sh 01_add --full` compiles and runs back-to-back, with results printed once at the end.
That's the whole loop. run.sh auto-detects your Niobium SDK install,
builds the example when needed, and handles simulator/FPGA setup. Each
example is short and heavily commented --
open it, change the math, run again.
| Path | What it is |
|---|---|
| `examples/` | The lessons: one file per FHE operation. This is what you edit. |
| `common/starter_kit.h` | Shared helper that hides the compile/runtime plumbing. Don't touch unless curious. |
| `run.sh` | The only script you run. Builds (if needed), sets up the environment, then runs the example. |
| `Makefile` | What `run.sh` uses under the hood. You don't invoke it directly. |
| `LICENSE` / `NOTICE` | Apache 2.0. |
| `build/` (auto) | Compiled binaries. Safe to delete; `run.sh` rebuilds as needed. |
| `.runs/` (auto) | Per-backend compiled-program caches and keys. Safe to delete; running re-compiles. |
These examples are designed to be edited. Try this in 30 seconds:
- Open `examples/01_add.cpp`.
- Find the line `auto sum = s.cc()->EvalAdd(a, b);`.
- Change `EvalAdd` to `EvalSub`. Save.
- Run `./run.sh 01_add` to compile against your edit, then `./run.sh 01_add` again to run and see the math. (Or `./run.sh 01_add --full` to do both in one shell call.)
You'll see the output flip from sums (11, 22, 33...) to differences
(-9, -18, -27...). Notice you didn't run --reset or rebuild anything by
hand — the kit detects the source change, throws away the old compiled
program, and re-compiles against your edit. The same loop works for any
operation — EvalSub, EvalMult, EvalSquare, and beyond.
(EvalRotate also needs its shift declared in rotation_indices, and
polynomial eval like EvalChebyshev is a later step — see
EXPLORE.md.)
Want more? EXPLORE.md has tiered challenges (easy / medium / explore-the-limits), an operation cheat sheet, and an error-message reference. If something you tried didn't behave as expected, the Troubleshooting table below translates the common OpenFHE/Niobium messages.
Three knobs you'll touch when your math gets more interesting:

| Knob | Where | When you change it |
|---|---|---|
| `mult_depth` | 4th arg of `Session(...)` (after `argc`, `argv`, and the name) | Each chained multiply needs +1. |
| `rotation_indices` | 5th arg of `Session(...)` | List every shift you pass to `EvalRotate`. |
| Inputs / outputs | `s.encrypt(...)` and `s.output(...)` | Whatever vectors and names your computation needs. |
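A back-of-the-envelope way to reason about `mult_depth` is to count the longest chain of ciphertext × ciphertext multiplications. The plaintext sketch below compares computing xⁿ as a left-to-right chain with computing it as a balanced product tree. It is a simplification: it assumes each ciphertext multiply costs exactly one level and ignores levels that constant multiplications or rescaling may consume. Treat its numbers as a rough lower bound to sanity-check against, not a setting to copy.

```cpp
#include <cassert>

// Depth of x^n computed as ((x*x)*x)*...: every multiply waits on the last.
int depth_chain(int n) {
    assert(n >= 1);
    return n - 1;
}

// Depth of x^n computed as a balanced product tree of n copies of x
// (repeated squaring when n is a power of two): ceil(log2(n)) levels.
int depth_tree(int n) {
    assert(n >= 1);
    int depth = 0;
    int reach = 1;      // largest exponent reachable at the current depth
    while (reach < n) {
        reach *= 2;     // one more level doubles the reachable exponent
        ++depth;
    }
    return depth;
}
```

Under this count, x⁸ costs 7 levels as a chain but 3 as a tree. The order you multiply in therefore changes the depth budget you need.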
Work through them in order; each introduces one new idea.

| # | Example | Tier | Teaches |
|---|---|---|---|
| 01 | `01_add` | 1 | add two encrypted numbers; encrypt / decrypt basics |
| 02 | `02_subtract` | 1 | subtraction |
| 03 | `03_multiply_by_constant` | 1 | multiply a ciphertext by a public number |
| 04 | `04_multiply` | 1 | multiply two ciphertexts → multiplicative depth |
| 05 | `05_square` | 1 | x·x, a common building block |
| 06 | `06_rotate` | 1 | shift the slots of a packed vector (packing, rotation keys) |
| 07 | `07_dot_product` | 1 | a real algorithm: multiply + rotate-and-add to sum a vector |
| 08 | `08_polynomial` | 1 | x² + 2x + 1: your first chained formula, real depth budgeting |
| 09 | `09_average` | 1 | mean of an encrypted vector; combines rotate-and-add + scalar mult |
| 10 | `10_external_inputs` | 2 | inputs move out of source into a readable `.input.txt`. Edit the data → straight to runtime. |
| 11 | `11_pre_encrypted_inputs` | 3 | the production shape: inputs live as pre-encrypted `.bin` files (3-arg `tag_input`). The cache stays valid across data changes. |
| 12 | `12_variance` | 1 | a worked from-scratch example: how to plan a new computation, budget multiplicative depth, and verify by hand before trusting the device. |
| 13 | `13_raw_api` | n/a | graduation bridge: same encrypted add as example 01, but using the Niobium runtime API directly without the `Session` helper. Read this once you can write a kit-style workload and want to peek at what the helper hides. |
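Examples 07, 09, and 12 all lean on the rotate-and-add reduction. Below is a plaintext replay of it on ordinary doubles, useful for checking the slot arithmetic by hand before writing the encrypted version. It is a sketch under stated assumptions: a power-of-two slot count, cyclic left rotation, and population variance computed as E[x²] − (E[x])². The kit's own examples may structure these computations differently, and all function names here are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plaintext stand-in for one packed ciphertext: one double per slot.
using Slots = std::vector<double>;

// Cyclic left rotation: out[i] = v[(i + k) mod n].
Slots rotate(const Slots& v, std::size_t k) {
    Slots out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[(i + k) % v.size()];
    return out;
}

// Rotate-and-add: shifts n/2, n/4, ..., 1 (log2(n) rotations) leave the
// total of all slots in every slot.
Slots rotate_and_add_sum(Slots v) {
    const std::size_t n = v.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two slot count
    for (std::size_t shift = n / 2; shift >= 1; shift /= 2) {
        const Slots r = rotate(v, shift);
        for (std::size_t i = 0; i < n; ++i) v[i] += r[i];
    }
    return v;
}

// Dot product: one slotwise multiply, then the reduction.
double dot(const Slots& a, const Slots& b) {
    assert(a.size() == b.size());
    Slots prod(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) prod[i] = a[i] * b[i];
    return rotate_and_add_sum(prod)[0];
}

// Mean: the reduction, then a multiply by the public constant 1/n.
double mean(const Slots& x) {
    return rotate_and_add_sum(x)[0] * (1.0 / static_cast<double>(x.size()));
}

// Population variance as E[x^2] - (E[x])^2.
double variance(const Slots& x) {
    Slots sq(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) sq[i] = x[i] * x[i];
    const double m = mean(x);
    return mean(sq) - m * m;
}
```

For 8 slots the reduction uses shifts 4, 2, and 1. An encrypted version would therefore need those three amounts in `rotation_indices`.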
Tier 1 examples can also be run in tier-3 cache mode via --pre-encrypted — see Phase 2.
When you've worked through these, EXPLORE.md is the next stop: tiered challenges (easy / medium / break-it), an operation cheat sheet, and an error-message reference.
Every example in this kit has the same four-step skeleton. Once you can recognise these four steps you can write your own.
// 1. SETUP — depth budget + any rotation amounts you'll use.
Session s(argc, argv, "<name>", /*mult_depth=*/1, /*rotation_indices=*/{});
// 2. INPUTS — encrypted vectors, each tagged with a name.
auto a = s.encrypt("a", {1, 2, 3, 4, 5, 6, 7, 8});
// 3. COMPUTE — your FHE math goes inside this lambda. Call s.output(...)
// for every result you want the accelerator to return.
s.compute([&] {
auto y = s.cc()->EvalAdd(a, a);
s.output("y", y);
});
// 4. RESULTS — decrypt, print, and (on a re-run) verify device vs. CPU.
s.print_and_verify();

What goes inside the lambda is plain OpenFHE: `EvalAdd`, `EvalSub`, `EvalMult`, `EvalRotate`, … See EXPLORE.md for the operation cheat sheet.
Examples 01–09 and 12 use the simplest possible input shape (C++ literals), and the kit takes a deliberate shortcut: any source edit invalidates the cache. That's learning mode, not the real compiler contract. The real compiler reuses the cached program across pure data changes; only changing the operations recompiles. You'll see the real behavior at examples 10 and 11, and you can promote any earlier example to it with `--pre-encrypted` (see Phase 2 below).
| Tier | Examples | Where the inputs live | What changes recompile vs. run |
|---|---|---|---|
| 1. Literals | 01–09, 12 | C++ vectors inside the `.cpp` | Kit shortcut: any edit recompiles (data and ops share a file). |
| 2. Plaintext sidecar | `10_external_inputs` | A readable `.input.txt` next to the example | Edit the `.txt` → RUNTIME. Edit the `.cpp` → COMPILING. |
| 3. Pre-encrypted sidecar | `11_pre_encrypted_inputs` | `.bin` files written by a "client" step | Replace the `.bin` files → RUNTIME. Edit the `.cpp` → COMPILING. Production shape. |
Tier 1 keeps the lesson visible (you see the operation, the data, and the
result in one file) and intentionally over-invalidates so "edit → save →
rerun" feels instant. Tier 2 separates the two without hiding the data —
open the .input.txt and read the numbers. Tier 3 is what real Niobium
workloads ship: encrypted bytes on disk, the server never sees plaintext,
and the cached compiled program is reused across thousands of queries.
If you only remember one thing: source is operations, data lives outside. The kit takes you from "literals" to "production-real" without hand-waving.
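To picture "source is operations, data lives outside", here is a hypothetical toy cache in plain C++. It is not how the Niobium compiler keys its cache; the real mechanism isn't documented here. It only shows the consequence. A key built from operations plus inline data misses whenever the data changes, while a key built from operations alone keeps hitting. `ProgramCache`, `tier1_key`, and `tier3_key` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical cache: key -> "compiled program" (just a string here).
struct ProgramCache {
    std::map<std::size_t, std::string> programs;
    int compiles = 0;

    // Return the cached program, "compiling" (and counting) only on a miss.
    const std::string& get(std::size_t key, const std::string& ops) {
        assert(!ops.empty());
        auto it = programs.find(key);
        if (it == programs.end()) {
            ++compiles;  // cold path: compile once
            it = programs.emplace(key, "compiled(" + ops + ")").first;
        }
        return it->second;  // hot path: reuse
    }
};

// Tier-1 style: data is baked into the source, so the key depends on both.
std::size_t tier1_key(const std::string& ops, const std::string& data) {
    return std::hash<std::string>{}(ops + "|" + data);
}

// Tier-3 style: data arrives as separate files, so only the ops matter.
std::size_t tier3_key(const std::string& ops) {
    return std::hash<std::string>{}(ops);
}
```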
When you have your own problem to compute on encrypted data:
- Copy any example to a new file in `examples/`. Pick the one closest to what you want to do: `07_dot_product` for vector math, `08_polynomial` for chained scalar math, `06_rotate` for slot manipulation.
- Rename the `name=` argument in `Session(...)` to match your new filename (no `.cpp`). Example: `examples/my_thing.cpp` → `Session s(argc, argv, "my_thing", ...)`. (This name is the cache key, so each example gets its own compiled-program cache under `.runs/<backend>/my_thing_a_0/`.)
- Run it: `./run.sh my_thing`.
That's it. The Makefile globs `examples/*.cpp`, so there is nothing to register. If you bump `mult_depth` or add a new rotation amount, just edit the `mult_depth` / `rotation_indices` arguments of `Session(...)` (the two after the name); the kit re-compiles automatically.
If you outgrow C++ literals as inputs, switch to the pattern in
examples/10_external_inputs.cpp
(plaintext sidecar) or
examples/11_pre_encrypted_inputs.cpp
(pre-encrypted .bin files — the production shape).
The kit teaches FHE on Niobium through examples. docs/ covers
the SDK surface a customer interacts with at runtime — the timing banner,
the nb-run / nb-summary CLI, the bundled FBS workloads, and the
CryptoContext parameters the SDK validates against.
| Doc | When to read |
|---|---|
| `docs/banner.md` | After your first hot run: turn the four-row banner into something you can reason about. |
| `docs/cli.md` | When you want to understand how the kit wraps each example, or wrap a binary of your own. |
| `docs/workloads.md` | When you want to run the SDK's bundled FBS workloads at production scale. |
| `docs/crypto-params.md` | Background for anyone eventually writing FHE code outside the kit's helper. |
Start with docs/ once the kit's examples feel small.
Same code, two devices:

| Flag | Backend | What it is |
|---|---|---|
| (default) / `--fhe-sim` | FHE_SIM | real FHE math, executed on the CPU. Runs anywhere; great for developing. |
| `--fpga` | FPGA | real FHE math, executed on the Niobium accelerator. |
./run.sh 04_multiply          # simulator
./run.sh 04_multiply --fpga   # real hardware

Each backend caches its own compiled program (under `.runs/<backend>/`), so switching backends starts fresh.
First FPGA run does a one-time setup. Before the first device run the runtime does a one-time setup step; later runs skip it.
Niobium works in two phases, like compiling a program once and then running the binary many times:
- Compile (first run) — the Niobium compiler compiles and optimizes your program, then caches a device-runnable program. Happens once per source version.
- Run (every run after) — the cached compiled program is sent straight to the device with your current input data. This is the production path.
Run any example twice and watch what each invocation does. The cold run is pure compile -- no math output, because the device hasn't run. The hot run prints results and verifies them.
$ ./run.sh 04_multiply
[COMPILE] Recording on CPU. No FPGA work.
...
[compile] OK -- cached compiled program ready. Run again to execute on the device
(or pass --full to do both in one call).
==============================================================
Niobium Run Summary
target: -
==============================================================
Total time : 512.0 ms
Compilation time : 512.0 ms
==============================================================
$ ./run.sh 04_multiply
[RUNTIME] Cache hit. Replaying on device.
...
Results:
product = (2, 4, 6, 8, 15, 18, 21, 24, ...)
[verify] ALL OUTPUTS CORRECT.
==============================================================
Niobium Run Summary
target: FHE_SIM
==============================================================
Total time : 387.0 ms
Runtime : 387.0 ms
FHE Execution Time (CPU) : 21.6 ms
==============================================================

The first run only compiles; it never touches the device. The second run only runs the cached compiled program on the device. This mirrors how production Niobium applications work: compile cold, replay hot, across separate process invocations.
(Editing the source automatically invalidates the cache so the next run re-compiles; you never need to remember `--reset`. In examples 01–09 and 12 this is more aggressive than the real compiler's cache; see Three tiers of input shape above for why and how to opt in to the production behavior.)
Want both phases in a single shell call? Pass `--full`: `./run.sh 04_multiply --reset --full` runs compile then replay back-to-back and populates all 4 banner rows in one shot.
The first run isn't just translating your OpenFHE calls for the device: the Niobium compiler optimizes the program before caching it, so the compiled program that runs on every later invocation does much less work than the code you wrote. That's why, on production-shaped workloads, the per-invocation Runtime is meant to be a small fraction of the one-time Compilation cost; on the tiny examples in this kit, fixed host overhead narrows that gap (see the note on example size below).
This is the Niobium value proposition in one line: FHE math is expensive; compiling once and reusing the optimized program makes the per-run path practical.
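The compile-once argument is simple arithmetic: one compile plus N replays, versus paying the compile on every invocation. The sketch below plugs in the Compilation time and Runtime figures from the sample banners above (512.0 ms and 387.0 ms) purely as placeholders. Those figures come from a tiny 8-slot example dominated by host overhead, so substitute your own banner numbers.

```cpp
#include <cassert>

// Total time for one cold compile followed by `runs` hot replays.
double cached_total_ms(double compile_ms, double runtime_ms, int runs) {
    assert(runs >= 0);
    return compile_ms + runtime_ms * runs;
}

// Total time if every invocation recompiled before running.
double no_cache_total_ms(double compile_ms, double runtime_ms, int runs) {
    assert(runs >= 0);
    return (compile_ms + runtime_ms) * runs;
}

// Average cost per invocation under the compile-once model.
double amortized_ms(double compile_ms, double runtime_ms, int runs) {
    assert(runs >= 1);
    return cached_total_ms(compile_ms, runtime_ms, runs) / runs;
}
```

With those placeholders, 100 invocations cost 39,212 ms with the cache versus 89,900 ms if each one recompiled.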
By default, examples 01–09 and 12 embed inputs as C++ literals (tier 1 — optimised for readability of the FHE math). That's the right shape for learning operations. But it hides what the Niobium compiler's cache actually does on real customer workloads — where data lives outside source and changing it does NOT invalidate the cache.
Once you've worked through the examples, you can re-run any of them in production cache mode without touching the source:
./run.sh 01_add --pre-encrypted --reset
# Mode: --pre-encrypted (data externalized to .runs/fhe-sim/.inputs/)
# COMPILES — same operations, but the literal {1,2,...,8} is now
# encrypted into .runs/fhe-sim/.inputs/01_add.a.bin (and .b.bin).
./run.sh 01_add --pre-encrypted
# RUNTIME — loads .bin from disk, cache hit.
./run.sh 01_add --pre-encrypted --regen-input=a:100,200,300,400,500,600,700,800
# RUNTIME — same compiled program, but 'a' has been re-encrypted with new
# values. The cache STAYED VALID even though the data changed. This is
# the production "compile once, run many" model in action.
./run.sh 01_add --pre-encrypted
# RUNTIME — keeps running against the regen'd values until you reset
# or regenerate again.

What the flag actually does (one sentence): in `--pre-encrypted` mode, the helper's `Session::encrypt(name, {...})` calls route through the same path as the `Session::encrypt_or_load_bin(...)` that example 11 uses, so the ciphertext lives on disk between runs, marked with a filename via 3-arg `tag_input`, and the cached program is reused across data changes instead of being invalidated by them.
This is the actual cache discipline of a production Niobium application. Same examples you just learned on, now running against the same cache contract real production code uses.
| Mode | When | What you see |
|---|---|---|
| Default (no flag) | Phase 1: learning the FHE operations | Tier-1 cache (any edit recompiles). Math is visible in source; `[verify] ALL OUTPUTS CORRECT` on hot runs. |
| `--pre-encrypted` | Phase 2: seeing how the real compiler cache behaves | Tier-3 cache (data outside source; only operation changes recompile). Verify line skipped because data may have been regen'd. |
Two small gotchas exist once you're using `--pre-encrypted`: edit-source-still-recompiles and mode-switching-needs-`--reset`.
They're documented in
EXPLORE.md → "--pre-encrypted gotchas";
read that section once you've tried the four commands above and noticed
something unexpected.
Why this matters: the kit's binary-mtime workaround makes tier-1 examples easier to read but more aggressive than the real compiler's cache. `--pre-encrypted` + `--regen-input` is the kit's faithful demonstration of the production cache discipline. A customer who graduates from the kit and writes their own production-shaped code will be in this mode by default, with no flag needed, because the code itself externalizes data.
After each run the kit prints a summary banner from the SDK's nb-summary
tool — the standard FOG / Niobium run-summary telemetry. It's the same
banner you'll see on real customer workloads; the kit just wires it up so
you see it from day one.
| Row | What it measures |
|---|---|
| Total time | End-to-end wall-clock — from ./run.sh starting to it exiting. |
| Compilation time | First-time setup: compiling your computation, optimizing it, caching the device-runnable program. Populated on the cold run (cache miss); omitted on hot runs (nothing is being compiled). |
| Runtime | Per-invocation hot path: load the cached compiled program, push current inputs, kick off the device, pull results. Populated on hot runs; omitted on the cold run (it doesn't touch the device). |
| FHE Execution Time (CPU) (on `--fhe-sim`) / FHE Execution Time (FPGA) (on `--fpga`) | Just the device math itself, excluding host setup. Populated on hot runs only. On `--fpga` this is microseconds; on `--fhe-sim` it's plain OpenFHE-on-CPU wall-clock. |
This mirrors the real Niobium workflow — compile once, run many. Every production Niobium application splits these into separate process invocations: a cold-start compile, then any number of hot replays. The kit teaches exactly that pattern.
The cold run answers "how long was the one-time compile?" The hot run answers "what does every additional invocation cost?" These two numbers are what customers most often ask us about.
About `[verify] ALL OUTPUTS CORRECT`: it means the decrypted device output matched a CPU-computed reference. Examples 10 and 11 skip it because their inputs can change between runs.
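Because CKKS results are approximate (see "Exact integer equality doesn't work"), a check like this can only mean "close enough". A minimal sketch of such a comparison follows. The kit's actual verification rule and threshold are not documented here, so the mixed absolute/relative test, the 1e-3 defaults, and the function name are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// True if every decrypted slot is within tolerance of the CPU reference,
// using |got - want| <= abs_tol + rel_tol * |want| so that both small and
// large values are judged sensibly. Tolerances are illustrative only.
bool all_outputs_close(const std::vector<double>& got,
                       const std::vector<double>& want,
                       double abs_tol = 1e-3, double rel_tol = 1e-3) {
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);
    if (got.size() != want.size()) return false;
    for (std::size_t i = 0; i < got.size(); ++i) {
        if (std::fabs(got[i] - want[i]) > abs_tol + rel_tol * std::fabs(want[i]))
            return false;
    }
    return true;
}
```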
Want both numbers in one call? By default each ./run.sh runs one
phase — compile on the first invocation, replay on subsequent ones. Pass
--full to run both in a single call (this matches how the SDK's
make small-fpga style targets orchestrate their phases):
./run.sh 01_add --reset --full # banner shows all 4 rows in one shot
./run.sh 01_add --fpga --reset --full   # same, on real hardware

`--full` is a no-op on a cache hit (it just runs the hot replay, same as a plain `./run.sh`).
What the FHE_SIM Execution row measures.
FHE_SIM is a functional simulator: it runs your computation through
OpenFHE on the CPU to verify the math, not to predict device timing. It
never touches the device, so it does no per-invocation device setup
(bitstream load, device configuration, PCIe transfers). That's why the
Execution row on --fhe-sim says FHE Execution Time (CPU), not
FHE Execution Time (FPGA) — it's a CPU figure, not a device measurement.
These examples are deliberately tiny — 8 slots, a handful of operations — to keep the focus on the model and the workflow. Their wall-clock times are dominated by fixed host overhead (program loading, encryption, device setup), so they are not representative of production-shaped workloads: matrix-vector products over thousands of rows, ML inference, similarity search over large databases.
Use this kit to learn the model and the workflow. Size up to production-shaped workloads to see representative performance.
| You see… | What's going on | Fix |
|---|---|---|
| `Cannot find $NIOBIUM_SDK_DIR/include` | The kit can't find your SDK (normally auto-detected on a FOG terminal). | `export NIOBIUM_SDK_DIR=<path>` if you know where it's installed; otherwise ask whoever set up your terminal. |
| `--fpga` can't find the device or bitstream | Your FOG terminal's FPGA is configured by your administrator. | Keep going with `--fhe-sim`; contact your FOG admin if you need the device. |
| `approximation error is too high` after editing | A stale compiled program or keys from before your edit. | `./run.sh <example> --reset` to wipe the compiled program + keys and re-compile. |
| First `--fpga` run pauses before output | Normal: a one-time device setup; see the "First FPGA run does a one-time setup" callout above. | Let it finish; subsequent runs skip the setup. |
- A Niobium SDK install with `bin/ include/ lib/ share/` underneath (`run.sh` auto-detects it under `$HOME`).
- A C++20 compiler (`g++` or `clang++`) and `make`.
This repository is released under the Apache License 2.0 (see
LICENSE). Running the examples additionally requires a Niobium
SDK install, which is provided to licensed Niobium customers under its own
license and comes pre-installed on FOG terminals. Without the SDK the
source is still useful as a reference, but ./run.sh will not build or run.
This repository is licensed under the Apache License 2.0 (see
LICENSE). The examples build against the Niobium SDK (libnbcc,
OpenFHE), which are provided under their own licenses.