Skip to content

clang-tidy: enable a bunch of checks#15501

Draft
lovesegfault wants to merge 60 commits intoNixOS:masterfrom
lovesegfault:tidy-ultra
Draft

clang-tidy: enable a bunch of checks#15501
lovesegfault wants to merge 60 commits intoNixOS:masterfrom
lovesegfault:tidy-ultra

Conversation

@lovesegfault
Copy link
Copy Markdown
Member

@lovesegfault lovesegfault commented Mar 17, 2026

this is a demo of what it'd look like if we enabled a lot more clang-tidy checks

60 commits off master. four tiers, each a clean cut point:

tier range +checks NOLINTs bugs
base 9078eb912...1c3e50be3 22 bugprone re-enabled, hicpp-exception-baseclass, 2 cppcoreguidelines, 2 custom (nix-*) 63 11
minimal 1c3e50be3...a1b65ad6d assert-side-effect, starts-ends-with 0 4
medium a1b65ad6d...8004578fd exception-escape, branch-clone, narrowing-conversions (tuned) 59
large 8004578fd...c21c54274 nix-foreign-exceptions 147

base — the bulk of the check enables. 22 previously-disabled bugprone checks turned on (implicit-widening, sizeof-expression, reserved-identifier, unsafe-functions, etc), c-style cast replacements across libutil/libstore/libfetchers, 2 checks ported from lix. 11 bugs: user-input regex throwing raw std::regex_error, over-broad catch(std::exception&), missing Error format args, string_view length not passed to sqlite, unsigned char cast in UTF-8 decode, control-flow exceptions not deriving from std::exception, dead variables.

minimal — 4 std::terminate risks: ~FdLock() throws during unwind, AutoCloseFD move-assign throws mid-move (also wasn't noexcept), two noexcept Callback overrides without try/catch (legacy-ssh one terminates on every call). plus a 5-line fix to our own check (hasType doesn't desugar using aliases) and 18 substr(0,N)==starts_with() auto-fixes.

medium — exception-escape found the 4 bugs above; remaining NOLINTs are operator==/operator<=> where only bad_alloc is possible. branch-clone all intentional. narrowing tuned to float↔int only (29 vs 148 raw).

large — foreign-exceptions found its 11 bugs in the base tier; this is the NOLINT bill. 56 compile-time regex literals, 20 nlohmann/boost/toml wrap boundaries, 12 C/thread/process boundaries, 10 control-flow (Exit), rest headers/tests/templates.

deferred

  • empty-catch — 50 sites tuned, no bugs, just missing // reason comments
  • unchecked-optional-access — 175 sites, one documented-never-null invariant explains 46

The %s placeholder was never given an argument. Because HintFmt disables
boost::format's too_few_args exception (see fmt.hh:42) and the single-arg
HintFmt(const std::string&) overload treats its input as a literal, the
'%s' appeared verbatim in the error message instead of the file path.

Found by the new nix-format-string-arity clang-tidy check.
The %s placeholder was never given an argument. printError expands to
fmt(), whose single-arg overload returns the string unchanged — the '%s'
appeared verbatim in the log message instead of the store URI. Line 294
right above this call already passes storeUri correctly.

Found by the new nix-format-string-arity clang-tidy check.
`string_view::operator[]` returns `char`, which is signed on Linux/macOS.
UTF-8 lead bytes ≥0x80 sign-extend to 0xFFFFFF80+ when assigned to
`uint32_t`. The subsequent `& 0x1fU` etc. masks happen to save us today,
but the intermediate value is wrong and fragile under refactoring.

Prepares for enabling bugprone-signed-char-misuse.
The single warning site (terminal.cc:24) was fixed in the previous
commit.
All warning sites are false positives where the macro argument is a
type name or template-head, NOLINT'd with rationale:

- value.hh, error.hh, variant-wrapper.hh: macro args used as types in
  declarations (`K & val`, `class newClass`). Parens would change
  meaning — `(K) & val` is cast+address-of.

- {common,serve,worker}-protocol-impl.hh: TEMPLATE expands to a
  template-head (`template<...>`), T is a type argument.

- globals.cc: DEFAULT_SOCKET_PATH relies on the expansion context
  providing a std::filesystem::path left operand for operator/.
  Parens would force `"a" / "b"` (no such operator).

- posix-fs-canonicalise.hh, path-info.cc (tests): expression args in
  designated-init / member-call positions with no precedence hazard.
The single warning was vfork() in processes.cc. Rather than NOLINT,
use ReportMoreUnsafeFunctions=false: vfork is deliberate (performance-
critical fork for the sandbox), carefully made stack-clobber-safe via
__attribute__((noinline)) — see the comment at processes.cc:210.

This CheckOption keeps the default POSIX Annex K coverage (gets, sprintf,
etc.) while dropping the "more unsafe" extended list that includes vfork.
All 3 warning sites are intentional grandparent dispatch, NOLINT'd:

- uds-remote-store.hh, ssh-store.cc: narFromPath() calls Store::narFromPath
  (uses local FS accessor) instead of RemoteStore::narFromPath (opens a
  daemon connection). Diamond-inheritance resolution — these stores have
  a local mount and don't need to round-trip through the daemon.

- https-store.cc (test-support): init() calls BinaryCacheStore::init to
  skip HttpBinaryCacheStore::init's network setup in tests.
The single warning site is a false positive: memchr() is bounded by
size(), not by a null terminator — it's searching *for* embedded nulls,
not relying on one.
The single warning site is idiomatic `while (++it != end() && ...)` —
pre-increment has well-defined sequencing relative to the rest of the
condition via &&'s left-to-right evaluation.
Detects boost::format arity mismatches in nix::BaseError-derived
exception constructors and nix::fmt() calls (including printError/debug/
notice/etc. macros, which all expand to fmt()).

Why this check matters: fmt.hh explicitly disables boost::format's
too_few_args_bit and too_many_args_bit exceptions, so arity mismatches
are 100% silent at runtime. Worse, with zero extra arguments the
single-parameter overloads of fmt()/HintFmt are selected, which treat
the string as a literal — '%s' appears verbatim in the output with
no error whatsoever.

The check counts placeholders (%s, %d, %1%, %2%, etc. — %% is literal)
and compares against the number of supplied arguments. It fires only
when the format string is a compile-time string literal; dynamic format
strings can't be checked.

This check already found and fixed two live bugs (see the preceding
fix(libfetchers) and fix(build-remote) commits).
Passing -1 as the length tells sqlite to find the NUL terminator,
but std::string_view is not guaranteed to be NUL-terminated. Pass
value.size() explicitly instead.
The local variable was assigned from getStrAttr but never used;
getAbsPath(input) on the next line is what is actually consumed.
In each case the target (return type, parameter type, or assignee)
already accepts std::optional<T>, so dereferencing the optional just
unwraps and immediately rewraps. Drop the dereference and let the
optional pass through directly.
std::move on a forwarding reference (auto&&) is incorrect when the
reference binds to an lvalue. Use std::forward<decltype(x)>(x) to
preserve value category.
free() takes void*, but passing char** triggers the multi-level
implicit pointer conversion warning. An explicit static_cast<void*>
documents intent and silences the warning.
Each .release() intentionally drops ownership without deletion:
ownership was already transferred (curl slist, realloc) or the
destructor would crash in the child process (logger after fork).
Cast to void to document intent and satisfy unused-return-value.
ASSERT_STREQ on string_view::data() relies on NUL termination
which string_view does not guarantee. Compare the string_view
directly with ASSERT_EQ instead.
Multiplying int literals and then widening the result to size_t,
uint64_t, or time_t risks intermediate overflow. Add a suffix to
the first operand so the multiplication itself is performed in the
wider type.

The profiles.cc case is a real latent bug: *days * 24 * 3600 would
overflow int at days > 24855 before widening to time_t.
Setting<T> has operator const T&() for implicit conversion, but
clang-tidy cannot see through the conversion operator when checking
for enum exhaustiveness. Call .get() explicitly so the check sees
the underlying enum type.
Zero warnings on clang 21. The prior concern about pointers to
aggregates no longer applies.
All three intentional .release() calls cast to void in a86e39c.
The AllowCastToVoid option remains set.
All three sites converted to std::forward in 7f1deeb.
The remaining warning in parser-tab.hh is bison-generated code,
already excluded via -exclude-header-filter in the CI runner.
…n-result

All 33 sites fixed with literal suffixes in 03d3b5b.
The single site in value.hh mkFailed() is intentional: Boehm GC
aborts on OOM rather than throwing, and std::terminate is the
intended behavior in a noexcept function anyway. NOLINT at the site.
The single warning in Magenta<T>::operator<< fires when T is
unsigned char, but the generic colorizer correctly passes through
whatever value was wrapped. NOLINT at the site.
One false positive (Setting<enum> conversion operator) fixed in
c8decae. The remaining char-scanner switch has no meaningful
default, so NOLINT at the site.
The two sites in derivation-goal.cc rely on moved-from containers
being empty, which holds for both libstdc++ and libc++. NOLINT
both sites so the check stays active for future code.
…erop

libsodium takes unsigned char*, libarchive uses void* callbacks.
reinterpret_cast for the char*/unsigned char* conversions,
static_cast for the void* callback data.
sqlite3_column_text returns const unsigned char*; curl callbacks
pass void* userdata. reinterpret_cast for the text conversions,
static_cast for the callback thunks.
- static_cast for void* callback data (GC allocator, socket addresses)
- reinterpret_cast for type-punning and C-API opaque handles
- const_cast where APIs demand non-const (POSIX exec argv, libgit2)
- Downcasts now use static_cast to document the assumption
- nix_api_store.hh: tmpl.data() replaces the cast entirely (C++17)
All 89 sites converted to explicit casts in the preceding four
commits. The two sites in value.hh remain C-style: isBlackhole()
and mkBlackhole() are inline for performance and cast &eBlackHole
(ExprBlackHole) to Expr*, but ExprBlackHole's inheritance from
Expr is only forward-declared at this point in the header. NOLINT
both; reinterpret_cast would work but would lie about the intent.
…tream> in headers

Including <iostream> in a header drags the std::cout/cerr static
initializers into every translation unit that transitively includes
it. Use <iosfwd> for forward declarations and <ostream> where the
full type is needed.

The .cc files that actually use std::cout/cin/cerr now include
<iostream> directly instead of relying on transitive includes.
@github-actions github-actions Bot added new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority store Issues and pull requests concerning the Nix store repl The Read Eval Print Loop, "nix repl" command and debugger fetching Networking with the outside (non-Nix) world, input locking c api Nix as a C library with a stable interface labels Mar 17, 2026
@lovesegfault lovesegfault changed the title clang-tidy round 3: bug-hunting via tuned checks clang-tidy: enable a bunch of checks Mar 17, 2026
lovesegfault and others added 15 commits March 17, 2026 07:05
…ns checks

nix-forbidden-includes flags #include <iostream> in .hh files.
Including <iostream> in a header drags the std::cout/cerr static
initializers into every TU that transitively includes it. The check
fires via a PPCallbacks hook; simplified from Lix's collect-then-
replay to diagnose inline (matching upstream clang-tidy's pattern).

nix-foreign-exceptions flags throwing or catching exception types
that do not derive from nix::BaseError, which bypasses the trace
and hint-formatting machinery. std::bad_alloc and filesystem_error
are allowlisted. Also flags non-trivial std::basic_regex ctors
since they can throw std::regex_error.

Both checks ported from Lix with adaptations:
- Policy: forbidden-includes bans <iostream> in headers rather
  than nlohmann (Nix has no wrapper-header policy)
- Naming: BaseException -> BaseError
- Allowlist: added filesystem_error (caught legitimately at
  syscall boundaries and immediately converted to SysError)

Based on Lix changes:
  I3b71c32ecd16afc2246c946472f5629a1fa31f2c (ForeignExceptions)
  Icc7c578ff488ba520efac5d898572ccf4486e9a8 (ForeignExceptions: regex_error)
  I54ecc14f5bec5e2177729b41c3703216e76cc6a3 (ForbiddenIncludes)

Co-authored-by: eldritch horrors <pennae@lix.systems>
Three sites catch std::exception around nlohmann::json::parse, but
the only thing parse/get throw is nlohmann::json::exception. Narrow
so we do not accidentally swallow Error-derived exceptions.

build-remote.cc catches around openStore()->connect(), which only
throw Error-derived types. Narrow to Error so Interrupted, OOM,
etc. propagate correctly.

Found by nix-foreign-exceptions dry run.
Several CLI-facing commands accept regex patterns from users but
construct std::regex without wrapping, so an invalid pattern throws
raw std::regex_error with no context:

- nix-env -qa '(' (via DrvName::matches)
- nix profile remove --regex '('
- nix search '('
- nix-shell --exclude '('
- github: ref with regex metacharacters (via refs/(heads|tags)/%s)
- local-overlay-store: store path with regex metacharacters

Wrap each construction site and throw UsageError/Error/BadURL with
the offending pattern in the message.

The nix-build --exclude case also fixes a minor inefficiency: the
regex was being recompiled for every input derivation instead of
once per pattern.

Found by nix-foreign-exceptions dry run.
The throwsAllowed matcher used hasType(allowedExceptions) which does not
desugar type aliases. NativeSysError is `using NativeSysError = SysError`,
so throws of NativeSysError were flagged as foreign even though SysError
derives from BaseError.

Wrap in hasCanonicalType(recordType(hasDeclaration(...))) to canonicalize
before matching. Verified with clang-query against file-system.cc:
  has(expr(hasType(cxxRecordDecl(...)))):                    14 matches
  has(expr(hasType(hasCanonicalType(recordType(...))))):     22 matches

Eliminates all 15 NativeSysError false positives in file-system.cc.
lockFile() throws SysError when flock() fails with an unexpected errno.
If ~FdLock() runs during stack unwinding (e.g. an exception was thrown
while the lock was held), a second throw from lockFile() calls
std::terminate.

Move the destructor out-of-line to pathlocks.cc and wrap in try/catch +
ignoreExceptionInDestructor(), matching PathLocks::~PathLocks().

Found by bugprone-exception-escape.
The move-assignment operator called close() which throws SysError on
failure. If this throws mid-move, the object is left in a broken state
and that.fd is never reset (potential double-close). Worse, without
noexcept, containers like std::vector<AutoCloseFD> fall back to
copying on resize — but the copy constructor is deleted, so this
silently disables move optimizations in any container holding these.

Wrap close() in try/catch + ignoreExceptionInDestructor() (close()
only throws SysError, never Interrupted, so the destructor variant is
correct). Mark noexcept to match the move constructor.

Found by bugprone-exception-escape.
0 sites. Nix does not compile out asserts (NDEBUG is never set), so
side effects in assert() conditions would not be bugs today — but
enabling the check keeps it that way if build flags ever change.
Auto-fixed by modernize-use-starts-ends-with. 18 sites:
- 12 substr(0, N) == "..." allocates a temporary string
- 4 compare(0, N, "...") == 0
- 2 find("...") == 0

The substr cases were a real (minor) perf cost: allocating a temporary
std::string on every call just to compare the first N chars.

Notably hasPrefix() in util.cc was a hand-rolled starts_with.
0 sites after the preceding auto-fix commit.
queryPathInfoUncached (DummyStore) and queryRealisationUncached
(LegacySSHStore) are declared noexcept by the base-class contract but
called throwing code without a try/catch. Any exception escaping a
noexcept function calls std::terminate.

The established pattern (LocalStore, RemoteStore) wraps the body in
try { ... } catch (...) { callback.rethrow(); } to route exceptions
through the callback's promise instead of crashing.

DummyStore::queryPathInfoUncached: hashPath() does I/O and can throw.
LegacySSHStore::queryRealisationUncached: unsupported() always throws
Error; this method would terminate on every call.

Found by bugprone-exception-escape.
Found 4 real bugs, all fixed in earlier commits:
- FdLock::~FdLock() throwing during stack unwind -> std::terminate
- AutoCloseFD move-assign throwing mid-move -> broken state + double-close
- DummyStore::queryPathInfoUncached noexcept without try/catch
- LegacySSHStore::queryRealisationUncached: unsupported() always throws

Remaining 14 sites NOLINTed — all comparison operators on types with
map/set/variant/ref<> members where the only possible throw is
std::bad_alloc or ref<>::operator* null-check (never null by invariant):

- GENERATE_CMP_EXT/GENERATE_EQUAL macros in {built,derived}-path.cc:
  dereference ref<> which has a null check in operator*; ref<> is never
  null by construction invariant
- value.hh data()/size(): std::visit on a 2-alternative variant; variant
  never becomes valueless-by-exception in practice
- Defaulted operator==/operator<=> on BuildResult, VariantT: comparing
  variant/map members; only bad_alloc possible, which everything can throw
- attr-set.hh iterator++: copies a trivial iterator struct
- Input::operator==: compares Attrs (std::map); only bad_alloc possible
16 sites, all intentional. Branches have identical bodies but conditions
carry semantic meaning:

- daemon.cc/remote-store.cc/store-api.cc: FileIngestionMethod switch
  where NixArchive and Git both use NAR serialization (Git is not a
  wire format). Explicit comment at each site.
- logging.cc: lvlDebug and lvlVomit both map to ANSI '7' (no finer
  gradation in the terminal escape)
- unix/file-system.cc: nlink==1 and nlink==2 both add st_size to
  bytesFreed; 5-line comment explains the hardlink math
- names.cc: version comparison ladder with return true/false repeated;
  conditions document the ordering
- util.cc: X-macro expansion, all cases expand to `return suffix`
- print.cc: escape char chain; each condition documents which char
- filetransfer.cc: HTTP status classifier; err=Misc repeated but
  conditions document status ranges
- build.cc: outputPath=null for not-found vs found-but-empty-optional
- build-remote.cc: machine selection tiebreaker (best=true ladder)
- search.cc: recurse() called for different package tree shapes
- store-registration.cc, git-utils.cc, daemon.cc:285: classifier chains

No bugs found; the check stays useful for catching accidental copy-paste
in new code where the conditions ARE meant to differ.
Tuned with WarnOnIntegerNarrowingConversion=false. Integer narrowing
(int64->int32, unsigned->signed) is pervasive in this codebase and
harmless in practice — suppressing it leaves only float<->int
conversions where precision loss is a real concern.

29 sites at the tuned level, all intentional:

- eval.cc (6), primops.cc (2): Nix language int->float coercion is
  by design; the language spec allows 2^53 < v < 2^63 precision loss
- primops.cc ceil/floor (4): double->long after explicit range check
  against numeric_limits<NixInt>::min()
- progress-bar.cc (10): uint64_t->double for display percentages;
  byte counts never approach 2^53
- gc.cc (2): threshold ratio comparison (avail * 0.97)
- current-process.cc (1): cgroup CPU quota ratio
- build-remote.cc (4): load/speedFactor comparison; job counts are
  small, well under float mantissa precision

No bugs found at this tuning level. The check guards future code
doing unguarded double->int truncation.
147 sites across 69 files, all triaged as intentional:

Regex ctors with compile-time literals (56 sites): pattern is baked into
the binary; can't throw at runtime. Clusters in nix-channel.cc (8),
hilite.cc tests (8), nix-build.cc (6), url-name.cc (5).

Regex ctors wrapped in try/catch (5 sites): user-input regex already
wrapped per 8553748; check can't see the surrounding try.

Third-party wrap boundaries (20 sites): catch nlohmann/boost/toml/
regex_error and convert to nix::Error. These sites ARE the wrapping.

Control-flow exceptions (10 sites): Exit, GCLimitReached, BailOut are
deliberate non-BaseError control flow carrying only exit status.

C/thread/process boundaries (12 sites): libarchive callbacks, worker
threads, child processes, std::terminate handler, C API entry points.
Last-resort catch before crossing into foreign code.

Rethrow-and-wrap dispatch (8 sites): bare throw; in rethrowExceptionAsError,
ignoreException* implementations, handleExceptions top-level.

Stdlib-contract headers (7 sites): ref.hh, fun.hh, checked-arithmetic.hh,
static-string-data.hh, goal.hh throw std::invalid_argument; can't include
error.hh without circular deps or header bloat.

Type-dependent templates (3 sites): throw T where T always derives from
BaseError (EvalErrorBuilder, MakeError).

Test code (12 sites): tests can throw/catch stdlib exceptions freely.

Logic-error markers (1 site): std::logic_error for unreachable code.

All 11 real bugs found by this check were fixed in earlier commits
(8553748 user-input regex wraps, cbb3ac5 over-broad catches).
All 147 sites NOLINTed in the previous commit. The check now guards
against new code throwing/catching non-BaseError exceptions without
explicit justification.

Future violations must either wrap in nix::Error (preferred) or NOLINT
with a rationale explaining the boundary (C API, third-party lib,
control-flow exception, compile-time regex literal).
@xokdvium
Copy link
Copy Markdown
Contributor

YES! Thank you! <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c api Nix as a C library with a stable interface fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command repl The Read Eval Print Loop, "nix repl" command and debugger store Issues and pull requests concerning the Nix store with-tests Issues related to testing. PRs with tests have some priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants