NiobiumInc/niobium-starter-kit

Niobium Starter Kit

⚠️ Requires a Niobium SDK install -- provided to licensed Niobium customers (the SDK is already present on FOG terminals). The source in this repository is Apache 2.0, but the examples link the proprietary Niobium compiler runtime (libnbcc) and will not build or run without the SDK.

If you do not have access, the source is still useful as a reference for the FHE compile/runtime workflow, but ./run.sh will fail. To explore FHE without the Niobium runtime, look at the OpenFHE examples, HElib's tutorial, or Microsoft SEAL.

Short, runnable examples of computing on encrypted data with the Niobium FHE accelerator. Clone it on a FOG terminal (Niobium's customer workstation, SDK pre-installed), run the examples, then edit them with your own operations and watch the result.

No prior FHE experience needed. If you have a FOG terminal, you already have everything you need.


Why this exists

Your data leaks at every layer of the modern stack — hospital records, banking transactions, model providers, ad networks. Anywhere data sits in plaintext, something or someone can read it.

Fully Homomorphic Encryption (FHE) lets a third party compute on your data without ever decrypting it. The accelerator adds, multiplies, and rotates encrypted numbers; only the key holder ever sees the result. Hospitals can run analytics on patient records they never decrypt. Banks can score fraud on transactions they cannot read. A cloud can serve an ML model on inputs it physically cannot see.

The math has existed since 2009. What changed is performance: until recently, FHE was orders of magnitude too slow to be practical. The Niobium accelerator narrows that gap enough for real workloads. This kit is the shortest path to using it — run a real FHE computation in 90 seconds, then start writing your own.

You've got this. If you've never touched FHE before, that's the whole point of this repository. Every example is short, runnable, and gets you one step closer to writing something you could hand to a customer.


What FHE isn't — read this before writing your own

A few things you'll naturally reach for that behave differently than in regular code. Knowing these upfront saves an hour of head-scratching:

  • if (a > b) doesn't work. CKKS has no native comparison or branching on encrypted data. People approximate "is a > b?" with a polynomial (a sigmoid-like curve evaluated on a − b). Out of scope for this kit, but reachable via EvalChebyshev / EvalPoly once you're comfortable.
  • Variable-length data doesn't work. A ciphertext is a fixed-size packed vector. You always operate on the full vector; mask irrelevant slots with multiplications by 0/1.
  • Exact integer equality doesn't work. CKKS is approximate. Ask "is it close to 42?" not "is it exactly 42?" (Other FHE schemes — BGV/BFV — are exact for integers but don't do CKKS's efficient real-number arithmetic; not in this kit.)
  • Random slot access doesn't work. slot[3] += 1 is not a thing. Rotation + masking is how you isolate one slot.
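The last three bullets can be pictured with a plaintext model of a packed vector. The sketch below is not FHE and does not use the kit or OpenFHE; `Slots`, `rotate`, `mask`, `isolate_to_front`, and `close_to` are hypothetical names for illustration, and the convention that a positive shift rotates left is an assumption about how rotations are indexed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plaintext stand-in for the slots of one packed ciphertext (hypothetical).
using Slots = std::vector<double>;

// Cyclic rotation: out[i] = in[(i + k) mod n]. Negative k rotates right.
Slots rotate(const Slots& v, int k) {
    assert(!v.empty());
    const int n = static_cast<int>(v.size());
    Slots out(v.size());
    for (int i = 0; i < n; ++i) {
        const int src = ((i + k) % n + n) % n;
        out[static_cast<std::size_t>(i)] = v[static_cast<std::size_t>(src)];
    }
    return out;
}

// Slot-wise multiply by a 0/1 mask: keeps selected slots, zeroes the rest.
Slots mask(const Slots& v, const Slots& m) {
    assert(v.size() == m.size());
    Slots out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] * m[i];
    return out;
}

// "Read slot[index]" without random access: rotate it to slot 0, then mask.
Slots isolate_to_front(const Slots& v, int index) {
    Slots one_hot(v.size(), 0.0);
    one_hot[0] = 1.0;
    return mask(rotate(v, index), one_hot);
}

// Approximate equality, the only kind that makes sense for lossy results.
bool close_to(double x, double target, double tol) {
    return std::fabs(x - target) <= tol;
}
```

With `v = {1, 2, ..., 8}`, `isolate_to_front(v, 3)` yields 4 in slot 0 and zeros elsewhere. In an encrypted program the rotation and the mask multiply would each be one homomorphic operation, and the mask multiply would spend multiplicative depth.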

If your problem fits the "add / subtract / multiply / rotate / polynomial" shape, the rest of this kit will get you to a working program. If it needs comparisons or branching, you'll be approximating with polynomials, and EXPLORE.md has more.
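To make "approximating with polynomials" concrete, here is a plaintext sketch of one well-known approach: repeatedly applying the odd polynomial f(x) = (3x - x^3) / 2, which pushes any nonzero value in [-1, 1] toward -1 or +1. This is an illustration, not the kit's method and not what EvalChebyshev does internally; the function names are hypothetical, and the inputs are assumed to be pre-scaled so that |a - b| <= 1.

```cpp
#include <cassert>
#include <cmath>

// One round of f(x) = (3x - x^3) / 2. Uses only multiply and subtract,
// so it has the shape an FHE program can evaluate.
double sign_step(double x) {
    return 0.5 * x * (3.0 - x * x);
}

// Compose f with itself `rounds` times. Assumes |x| <= 1 on entry.
double approx_sign(double x, int rounds) {
    assert(std::fabs(x) <= 1.0);
    for (int i = 0; i < rounds; ++i) x = sign_step(x);
    return x;
}

// Soft "a > b": near 1 when a > b, near 0 when a < b, exactly 0.5 on a tie.
double approx_greater(double a, double b, int rounds) {
    return 0.5 * (approx_sign(a - b, rounds) + 1.0);
}
```

Each round costs roughly two multiplicative levels (one for x * x, one for the outer product), so the number of rounds is limited by the depth budget. Values of a - b close to zero need more rounds to separate, which is why encrypted comparisons are approximate and comparatively expensive.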


30-second background — glossary

FHE (Fully Homomorphic Encryption) lets you compute on data while it stays encrypted. The accelerator never sees your plaintext — it adds, multiplies, and rotates ciphertext, and only the key holder can decrypt the result.

Term | Meaning
ciphertext | an encrypted number (or a packed list of numbers)
slot | one lane of a packed list inside a single ciphertext
packing | putting many numbers into one ciphertext, one per slot
rotation | shift the slots inside a ciphertext left or right (a building block for reductions)
CKKS | the FHE scheme used by this kit; approximate arithmetic on real numbers (lossy by design)
multiplicative depth | how many multiplications deep your computation may go (a budget set up front)
noise | a small numerical error every ciphertext carries; multiplies grow it, depth budget bounds it
Estimated precision: N bits | OpenFHE's printed estimate of usable bits remaining in the decrypted result (lower = closer to the noise floor)
key switching | the cryptographic step that follows a multiply or a rotation; needs evaluation keys
compiled program | the cached device-runnable instructions produced from your code
compile / runtime | first run = build + cache the compiled program; later runs = just execute it

Quick start (90 seconds)

The first run compiles your example on the CPU (the device isn't touched yet, so no math result is shown). The device only runs on the second invocation, which is also where the kit prints the result and verifies it against a reference written during compile.

# First run — COMPILES (CPU only). No math output; the device hasn't run yet.
./run.sh 01_add
# ...
# [compile] OK -- cached compiled program ready. Run again to execute on the device
# (or pass --full to do both in one call).
# Second run — RUNS on the device, prints results, verifies against compile-time reference.
./run.sh 01_add
# ...
# Results:
#   sum = (11, 22, 33, 44, 55, 66, 77, 88, ... )
# [verify] ALL OUTPUTS CORRECT.

When an FPGA is available, the same pattern works on real hardware:

./run.sh 01_add --fpga          # cold: COMPILES on CPU (no FPGA work yet)
./run.sh 01_add --fpga          # hot:  RUNS on the FPGA + [verify] ALL OUTPUTS CORRECT

Want both phases in one invocation? Pass --full: ./run.sh 01_add --full compiles and runs back-to-back, with results printed once at the end.

That's the whole loop. run.sh auto-detects your Niobium SDK install, builds the example when needed, and handles simulator/FPGA setup. Each example is short and heavily commented -- open it, change the math, run again.


What's in this repo

Path | What it is
examples/ | The lessons: one file per FHE operation. This is what you edit.
common/starter_kit.h | Shared helper that hides the compile/runtime plumbing. Don't touch unless curious.
run.sh | The only script you run. Builds (if needed), sets up the environment, then runs the example.
Makefile | What run.sh uses under the hood. You don't invoke it directly.
LICENSE / NOTICE | Apache 2.0.
build/ (auto) | Compiled binaries. Safe to delete; run.sh rebuilds as needed.
.runs/ (auto) | Per-backend compiled-program caches and keys. Safe to delete; running re-compiles.

Try editing — see the change live

These examples are designed to be edited. Try this in 30 seconds:

  1. Open examples/01_add.cpp.
  2. Find the line auto sum = s.cc()->EvalAdd(a, b);.
  3. Change EvalAdd to EvalSub. Save.
  4. Run ./run.sh 01_add to compile against your edit, then ./run.sh 01_add again to run and see the math. (Or ./run.sh 01_add --full to do both in one shell call.)

You'll see the output flip from sums (11, 22, 33...) to differences (-9, -18, -27...). Notice you didn't run --reset or rebuild anything by hand — the kit detects the source change, throws away the old compiled program, and re-compiles against your edit. The same loop works for any operation — EvalSub, EvalMult, EvalSquare, and beyond. (EvalRotate also needs its shift declared in rotation_indices, and polynomial eval like EvalChebyshev is a later step — see EXPLORE.md.)

Want more? EXPLORE.md has tiered challenges (easy / medium / explore-the-limits), an operation cheat sheet, and an error-message reference. If something you tried didn't behave as expected, the Troubleshooting table below translates the common OpenFHE/Niobium messages.

Three knobs you'll touch when your math gets more interesting:

Knob | Where | When you change it
mult_depth | 2nd arg of Session(...) after argc, argv (the /*mult_depth=*/ value) | Each chained multiply needs +1.
rotation_indices | 3rd arg of Session(...) after argc, argv (the /*rotation_indices=*/ list) | List every shift you pass to EvalRotate.
Inputs / outputs | s.encrypt(...) and s.output(...) | Whatever vectors and names your computation needs.
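The "+1 per chained multiply" rule for mult_depth can be worked out on paper before touching the code. The sketch below counts the depth of computing x^n two ways; it is plain arithmetic, with hypothetical function names, and it assumes only ciphertext-by-ciphertext multiplies count toward the budget.

```cpp
#include <cassert>

// Depth when x^n is built as one long chain: ((x * x) * x) * x ...
// Every multiply sits on top of the previous one.
int chain_depth(int n) {
    assert(n >= 1);
    return n - 1;
}

// Depth when x^n is built by repeated squaring / a balanced tree:
// the smallest d with 2^d >= n.
int tree_depth(int n) {
    assert(n >= 1);
    int depth = 0;
    for (int reach = 1; reach < n; reach *= 2) ++depth;
    return depth;
}
```

For x^8 the chain needs depth 7 while squaring three times needs depth 3, so regrouping the same formula can shrink the budget you have to request. The same reasoning applies to x² + 2x + 1, which equals (x + 1)² and so needs only one ciphertext multiply.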

The examples (a learning path)

Work through them in order — each introduces one new idea.

# Example Tier Teaches
01 01_add 1 add two encrypted numbers; encrypt / decrypt basics
02 02_subtract 1 subtraction
03 03_multiply_by_constant 1 multiply a ciphertext by a public number
04 04_multiply 1 multiply two ciphertexts → multiplicative depth
05 05_square 1 x·x, a common building block
06 06_rotate 1 shift the slots of a packed vector (packing, rotation keys)
07 07_dot_product 1 a real algorithm: multiply + rotate-and-add to sum a vector
08 08_polynomial 1 x² + 2x + 1 — your first chained formula, real depth budgeting
09 09_average 1 mean of an encrypted vector — combines rotate-and-add + scalar mult
10 10_external_inputs 2 inputs move out of source into a readable .input.txt. Edit the data → straight to runtime.
11 11_pre_encrypted_inputs 3 the production shape: inputs live as pre-encrypted .bin files (3-arg tag_input). The cache stays valid across data changes.
12 12_variance 1 a worked from-scratch example: how to plan a new computation, budget multiplicative depth, and verify by hand before trusting the device.
13 13_raw_api graduation bridge — same encrypted add as example 01, but using the Niobium runtime API directly without the Session helper. Read this once you can write a kit-style workload and want to peek at what the helper hides.

Tier 1 examples can also be run in tier-3 cache mode via --pre-encrypted — see Phase 2.

When you've worked through these, EXPLORE.md is the next stop: tiered challenges (easy / medium / break-it), an operation cheat sheet, and an error-message reference.
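The rotate-and-add pattern behind examples 07 and 09 is easier to follow on plain numbers first. The sketch below is a plaintext model, not the kit's code: the names are hypothetical, the vector length is assumed to be a power of two equal to the number of slots in use, and rotation is assumed to be cyclic to the left.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Slots = std::vector<double>;  // plaintext stand-in for packed slots

// Cyclic left rotation: out[i] = in[(i + k) mod n].
Slots rotate_left(const Slots& v, std::size_t k) {
    Slots out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[(i + k) % v.size()];
    return out;
}

// Shifts needed to sum n slots: 1, 2, 4, ..., n/2. These are the values
// a rotation_indices list would have to contain.
std::vector<std::size_t> reduction_shifts(std::size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);  // power of two only
    std::vector<std::size_t> shifts;
    for (std::size_t s = 1; s < n; s *= 2) shifts.push_back(s);
    return shifts;
}

// After log2(n) rotate-and-add steps, every slot holds the total.
Slots sum_all_slots(Slots v) {
    for (std::size_t s : reduction_shifts(v.size())) {
        const Slots r = rotate_left(v, s);
        for (std::size_t i = 0; i < v.size(); ++i) v[i] += r[i];
    }
    return v;
}

// Dot product: one slot-wise multiply, then the reduction above.
Slots dot_product(const Slots& a, const Slots& b) {
    assert(a.size() == b.size());
    Slots prod(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) prod[i] = a[i] * b[i];
    return sum_all_slots(prod);
}

// Mean: the reduction, then a multiply by the public constant 1/n.
Slots mean_all_slots(const Slots& v) {
    Slots s = sum_all_slots(v);
    const double inv_n = 1.0 / static_cast<double>(v.size());
    for (double& x : s) x *= inv_n;
    return s;
}
```

For 8 slots the reduction uses shifts 1, 2, and 4: three rotations instead of seven. A reference like this is also a convenient way to compute the expected answer by hand before trusting a device result.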


Anatomy of an example

Every example in this kit has the same four-step skeleton. Once you can recognise these four steps you can write your own.

//   1. SETUP     — depth budget + any rotation amounts you'll use.
Session s(argc, argv, "<name>", /*mult_depth=*/1, /*rotation_indices=*/{});

//   2. INPUTS    — encrypted vectors, each tagged with a name.
auto a = s.encrypt("a", {1, 2, 3, 4, 5, 6, 7, 8});

//   3. COMPUTE   — your FHE math goes inside this lambda. Call s.output(...)
//                  for every result you want the accelerator to return.
s.compute([&] {
    auto y = s.cc()->EvalAdd(a, a);
    s.output("y", y);
});

//   4. RESULTS   — decrypt, print, and (on a re-run) verify device vs. CPU.
s.print_and_verify();

What goes inside the lambda is plain OpenFHE: EvalAdd, EvalSub, EvalMult, EvalRotate, and so on. See EXPLORE.md for the operation cheat sheet.


Three tiers of input shape

Examples 01–09 and 12 use the simplest possible input shape (C++ literals), and the kit takes a deliberate shortcut: any source edit invalidates the cache. That's learning mode, not the real compiler contract. The real compiler reuses the cached program across pure data changes; only changing the operations recompiles. You'll see the real behavior at examples 10 and 11, and you can promote any earlier example to it with --pre-encrypted (see Phase 2 below).

Tier | Examples | Where the inputs live | What changes recompile vs. run
1. Literals | 01–09, 12 | C++ vectors inside the .cpp | Kit shortcut: any edit recompiles (data and ops share a file).
2. Plaintext sidecar | 10_external_inputs | A readable .input.txt next to the example | Edit the .txt → RUNTIME. Edit the .cpp → COMPILING.
3. Pre-encrypted sidecar | 11_pre_encrypted_inputs | .bin files written by a "client" step | Replace the .bin files → RUNTIME. Edit the .cpp → COMPILING. Production shape.

Tier 1 keeps the lesson visible (you see the operation, the data, and the result in one file) and intentionally over-invalidates so "edit → save → rerun" feels instant. Tier 2 separates the two without hiding the data — open the .input.txt and read the numbers. Tier 3 is what real Niobium workloads ship: encrypted bytes on disk, the server never sees plaintext, and the cached compiled program is reused across thousands of queries.

If you only remember one thing: source is operations, data lives outside. The kit takes you from "literals" to "production-real" without hand-waving.
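The difference between the tiers comes down to what the cache is keyed on. The sketch below models that idea only: `Workload`, `Tier`, `cache_key`, and `ProgramCache` are hypothetical, and the kit's real invalidation mechanism is not reproduced here.

```cpp
#include <cassert>
#include <set>
#include <string>

// A workload reduced to two strings: what it computes and what it computes on.
struct Workload {
    std::string operations;
    std::string data;
};

// Literals: data and operations share a file (tier 1).
// External: data lives outside the source (tiers 2 and 3).
enum class Tier { Literals, External };

// Tier 1 keys on everything in the source; tiers 2 and 3 key on operations only.
std::string cache_key(const Workload& w, Tier tier) {
    assert(!w.operations.empty());
    return tier == Tier::Literals ? w.operations + "\n" + w.data : w.operations;
}

class ProgramCache {
public:
    // Returns true when this run has to compile (cache miss),
    // false when a cached program can be replayed.
    bool needs_compile(const Workload& w, Tier tier) {
        return compiled_.insert(cache_key(w, tier)).second;
    }

private:
    std::set<std::string> compiled_;
};
```

In this model, changing `data` on a `Tier::Literals` workload forces a compile, while the same change on a `Tier::External` workload is a cache hit; only a change to `operations` misses in both.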


Make your own example

When you have your own problem to compute on encrypted data:

  1. Copy any example to a new file in examples/. Pick the one closest to what you want to do — 07_dot_product for vector math, 08_polynomial for chained scalar math, 06_rotate for slot manipulation.
  2. Change the name argument in Session(...) to match your new filename (no .cpp). Example: examples/my_thing.cpp → Session s(argc, argv, "my_thing", ...). (This name is the cache key, so each example gets its own compiled-program cache under .runs/<backend>/my_thing_a_0/.)
  3. Run it: ./run.sh my_thing.

That's it. The Makefile globs examples/*.cpp, so there is nothing to register. If you bump mult_depth or add a new rotation amount, just edit the mult_depth or rotation_indices argument of Session(...); the kit re-compiles automatically.

If you outgrow C++ literals as inputs, switch to the pattern in examples/10_external_inputs.cpp (plaintext sidecar) or examples/11_pre_encrypted_inputs.cpp (pre-encrypted .bin files — the production shape).


Beyond the kit: SDK reference

The kit teaches FHE on Niobium through examples. docs/ covers the SDK surface a customer interacts with at runtime — the timing banner, the nb-run / nb-summary CLI, the bundled FBS workloads, and the CryptoContext parameters the SDK validates against.

Doc | When to read
docs/banner.md | After your first hot run: turn the four-row banner into something you can reason about.
docs/cli.md | When you want to understand how the kit wraps each example, or wrap a binary of your own.
docs/workloads.md | When you want to run the SDK's bundled FBS workloads at production scale.
docs/crypto-params.md | Background for anyone eventually writing FHE code outside the kit's helper.

Start with docs/ once the kit's examples feel small.


Two backends

Same code, two devices:

Flag | Backend | What it is
(default) / --fhe-sim | FHE_SIM | real FHE math, executed on the CPU. Runs anywhere; great for developing.
--fpga | FPGA | real FHE math, executed on the Niobium accelerator.

./run.sh 04_multiply           # simulator
./run.sh 04_multiply --fpga    # real hardware

Each backend caches its own compiled program (under .runs/<backend>/), so switching backends starts fresh.

First FPGA run does a one-time setup: the runtime performs it before the first device run, and later runs skip it.


Run-twice: compile once, run many times

Niobium works in two phases, like compiling a program once and then running the binary many times:

  1. Compile (first run) — the Niobium compiler compiles and optimizes your program, then caches a device-runnable program. Happens once per source version.
  2. Run (every run after) — the cached compiled program is sent straight to the device with your current input data. This is the production path.

Run any example twice and watch what each invocation does. The cold run is pure compile -- no math output, because the device hasn't run. The hot run prints results and verifies them.

$ ./run.sh 04_multiply
[COMPILE] Recording on CPU. No FPGA work.
...
[compile] OK -- cached compiled program ready. Run again to execute on the device
(or pass --full to do both in one call).

==============================================================
 Niobium Run Summary
  target: -
==============================================================
  Total time                :   512.0 ms
  Compilation time          :   512.0 ms
==============================================================

$ ./run.sh 04_multiply
[RUNTIME] Cache hit. Replaying on device.
...
Results:
  product = (2, 4, 6, 8, 15, 18, 21, 24, ...)

[verify] ALL OUTPUTS CORRECT.

==============================================================
 Niobium Run Summary
  target: FHE_SIM
==============================================================
  Total time                :   387.0 ms
  Runtime                   :   387.0 ms
  FHE Execution Time (CPU)  :    21.6 ms
==============================================================

The first run only compiles — it never touches the device. The second run only runs the cached compiled program on the device. This mirrors how production Niobium applications work: compile cold, replay hot, across separate process invocations.

(Editing the source automatically invalidates the cache so the next run re-compiles; you never need to remember --reset. In examples 01–09 and 12 this is more aggressive than the real compiler's cache; see Three tiers of input shape above for why and how to opt in to the production behavior.)

Want both phases in a single shell call? Pass --full: ./run.sh 04_multiply --reset --full runs compile then replay back-to-back and populates all 4 banner rows in one shot.


Why the runtime is fast

The first run isn't just translating your OpenFHE calls for the device: the Niobium compiler optimizes the program before caching it, so the compiled program that runs on every later invocation does much less work than the code you wrote. That's why, on production-shaped workloads, the per-invocation Runtime is a small fraction of the one-time Compilation cost. On the tiny examples in this kit both numbers are dominated by fixed host overhead, so the gap looks smaller.

This is the Niobium value proposition in one line: FHE math is expensive; compiling once and reusing the optimized program makes the per-run path practical.


Phase 2: production cache mode for any example

By default, examples 01–09 and 12 embed inputs as C++ literals (tier 1 — optimised for readability of the FHE math). That's the right shape for learning operations. But it hides what the Niobium compiler's cache actually does on real customer workloads — where data lives outside source and changing it does NOT invalidate the cache.

Once you've worked through the examples, you can re-run any of them in production cache mode without touching the source:

./run.sh 01_add --pre-encrypted --reset
#  Mode: --pre-encrypted (data externalized to .runs/fhe-sim/.inputs/)
#  COMPILES — same operations, but the literal {1,2,...,8} is now
#  encrypted into .runs/fhe-sim/.inputs/01_add.a.bin (and .b.bin).

./run.sh 01_add --pre-encrypted
#  RUNTIME — loads .bin from disk, cache hit.

./run.sh 01_add --pre-encrypted --regen-input=a:100,200,300,400,500,600,700,800
#  RUNTIME — same compiled program, but 'a' has been re-encrypted with new
#  values. The cache STAYED VALID even though the data changed. This is
#  the production "compile once, run many" model in action.

./run.sh 01_add --pre-encrypted
#  RUNTIME — keeps running against the regen'd values until you reset
#  or regenerate again.

What the flag actually does (one sentence): in --pre-encrypted mode, the helper's Session::encrypt(name, {...}) calls route through the same path as Session::encrypt_or_load_bin(...) that example 11 uses — the ciphertext lives on disk between runs, marked with a filename via 3-arg tag_input, so the cached program is reused across data changes, not invalidated by them.

This is the actual cache discipline of a production Niobium application. Same examples you just learned on, now running against the same cache contract real production code uses.

When to use which mode

Mode | When | What you see
Default (no flag) | Phase 1: learning the FHE operations | Tier-1 cache (any edit recompiles). Math is visible in source; [verify] ALL OUTPUTS CORRECT on hot runs.
--pre-encrypted | Phase 2: seeing how the real compiler cache behaves | Tier-3 cache (data outside source, only operation changes recompile). Verify line skipped because data may have been regen'd.

Two small gotchas exist once you're using --pre-encrypted: editing the source still recompiles, and switching modes needs --reset. They're documented in EXPLORE.md → "--pre-encrypted gotchas"; read that section once you've tried the four commands above and noticed something unexpected.

Why this matters: the kit's binary-mtime workaround makes tier-1 examples easier to read but more aggressive than the real compiler's cache. --pre-encrypted + --regen-input is the kit's faithful demonstration of the production cache discipline. A customer who graduates from the kit and writes their own production-shaped code will be in this mode by default — no flag needed there, because the code itself externalizes data.


What these timings show

After each run the kit prints a summary banner from the SDK's nb-summary tool — the standard FOG / Niobium run-summary telemetry. It's the same banner you'll see on real customer workloads; the kit just wires it up so you see it from day one.

Row | What it measures
Total time | End-to-end wall-clock, from ./run.sh starting to it exiting.
Compilation time | First-time setup: compiling your computation, optimizing it, caching the device-runnable program. Populated on the cold run (cache miss); omitted on hot runs (nothing is being compiled).
Runtime | Per-invocation hot path: load the cached compiled program, push current inputs, kick off the device, pull results. Populated on hot runs; omitted on the cold run (it doesn't touch the device).
FHE Execution Time (CPU) on --fhe-sim / FHE Execution Time (FPGA) on --fpga | Just the device math itself, excluding host setup. Populated on hot runs only. On --fpga this is microseconds; on --fhe-sim it's plain OpenFHE-on-CPU wall-clock.

This mirrors the real Niobium workflow — compile once, run many. Every production Niobium application splits these into separate process invocations: a cold-start compile, then any number of hot replays. The kit teaches exactly that pattern.

The cold run answers "how long was the one-time compile?" The hot run answers "what does every additional invocation cost?" These two numbers are what customers most often ask us about.
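Those two numbers combine into a simple amortized cost. The sketch below is only arithmetic, with hypothetical function names; the 512 ms and 387 ms figures in the note are the sample banner values shown earlier for 04_multiply on the simulator, not a benchmark.

```cpp
#include <cassert>

// Total wall-clock for one cold compile plus `runs` hot replays, in ms.
double total_ms(double compile_ms, double runtime_ms, int runs) {
    assert(runs >= 0);
    return compile_ms + runtime_ms * runs;
}

// Average cost per hot run once the one-time compile is spread across them.
double amortized_ms(double compile_ms, double runtime_ms, int runs) {
    assert(runs >= 1);
    return total_ms(compile_ms, runtime_ms, runs) / runs;
}
```

With a 512 ms compile and a 387 ms runtime, one run costs 899 ms in total, while 100 runs average about 392 ms each. As the run count grows, the amortized cost approaches the Runtime row alone.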

About [verify] ALL OUTPUTS CORRECT. It means the decrypted device output matched a CPU-computed reference. Examples 10 and 11 skip it because their inputs can change between runs.

Want both numbers in one call? By default each ./run.sh runs one phase — compile on the first invocation, replay on subsequent ones. Pass --full to run both in a single call (this matches how the SDK's make small-fpga style targets orchestrate their phases):

./run.sh 01_add --reset --full          # banner shows all 4 rows in one shot
./run.sh 01_add --fpga --reset --full   # same, on real hardware

--full is a no-op on a cache hit (just runs the hot replay, same as a plain ./run.sh).
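The phase rules described above fit in a few lines. This is a model of the documented behavior, not run.sh's actual logic; `Phase`, `phases_for`, and `banner_rows` are hypothetical names, and the row counts follow the cold and hot banners shown earlier.

```cpp
#include <cassert>
#include <vector>

enum class Phase { Compile, Run };

// Which phases one invocation performs, given the cache state and --full.
std::vector<Phase> phases_for(bool cache_hit, bool full) {
    if (cache_hit) return {Phase::Run};             // --full is a no-op on a hit
    if (full) return {Phase::Compile, Phase::Run};  // cold run with --full
    return {Phase::Compile};                        // plain cold run
}

// Banner rows populated: Total time always, Compilation time when compiling,
// Runtime plus FHE Execution Time when running.
int banner_rows(const std::vector<Phase>& phases) {
    assert(!phases.empty());
    int rows = 1;
    for (Phase p : phases) rows += (p == Phase::Compile) ? 1 : 2;
    return rows;
}
```

A plain cold run fills two rows, a hot run three, and a cold run with --full all four.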

What the FHE_SIM Execution row measures. FHE_SIM is a functional simulator: it runs your computation through OpenFHE on the CPU to verify the math, not to predict device timing. It never touches the device, so it does no per-invocation device setup (bitstream load, device configuration, PCIe transfers). That's why the Execution row on --fhe-sim says FHE Execution Time (CPU), not FHE Execution Time (FPGA) — it's a CPU figure, not a device measurement.

These examples are deliberately tiny — 8 slots, a handful of operations — to keep the focus on the model and the workflow. Their wall-clock times are dominated by fixed host overhead (program loading, encryption, device setup), so they are not representative of production-shaped workloads: matrix-vector products over thousands of rows, ML inference, similarity search over large databases.

Use this kit to learn the model and the workflow. Size up to production-shaped workloads to see representative performance.


Troubleshooting

You see… | What's going on | Fix
Cannot find $NIOBIUM_SDK_DIR/include | The kit can't find your SDK (normally auto-detected on a FOG terminal). | export NIOBIUM_SDK_DIR=<path> if you know where it's installed; otherwise ask whoever set up your terminal.
--fpga can't find the device or bitstream | Your FOG terminal's FPGA is configured by your administrator. | Keep going with --fhe-sim; contact your FOG admin if you need the device.
approximation error is too high after editing | A stale compiled program or keys from before your edit. | ./run.sh <example> --reset to wipe the compiled program + keys and re-compile.
First --fpga run pauses before output | Normal: a one-time device setup; see the "First FPGA run does a one-time setup" callout above. | Let it finish; subsequent runs skip the setup.

Requirements

  • A Niobium SDK install with bin/ include/ lib/ share/ underneath (run.sh auto-detects it under $HOME).
  • A C++20 compiler (g++ or clang++) and make.

Getting access

This repository is released under the Apache License 2.0 (see LICENSE). Running the examples additionally requires a Niobium SDK install (libnbcc, OpenFHE), which is provided to licensed Niobium customers under its own licenses and comes pre-installed on FOG terminals. Without the SDK the source is still useful as a reference, but ./run.sh will not build or run.
