Tags: g-cqd/apple-docs
docs(rfc): Swift-native transition strategy (repo-internal, not on the docs site)
rfcs/0001: goals and hard constraints (deps from apple/swiftlang/pointfreeco only, vetted-exception process with Vapor as the named P6 candidate, system sqlite3/zstd/harfbuzz allowed), full current-state inventory, the bun:ffi + @_cdecl bridge with a per-module APPLE_DOCS_NATIVE kill switch and CI parity gates (golden NDCG/MRR, benchmarks, snapshot determinism), pinned toolchains with static-musl Linux cross-compilation, and phases P0–P7 from leaf hot functions and the from-scratch model2vec embedder through renderers, content, storage, servers, and the final single static binary. Lives in rfcs/ outside the VitePress srcDir so the production docs site never builds or indexes it; pointer added to ARCHITECTURE.md.
fix(snapshot): ship only the active pinned model; the models dir is also a download cache
The bake-off left a 1.2GB gemma + 128MB bge in resources/models; the wholesale copyTreeFast shipped them and pushed beta.3 to 2.30GiB, over the 2GiB GitHub asset ceiling (publish guard caught it). Snapshots now copy resolveActiveSpec().hfId's subtree only, keeping the ADMX filter.
docs(rfcs): RFC 0002 - Swift-native embedder contract
The P2 sketch graduates to its own RFC with hard gates: throughput (>=2x transformers.js / >=5x WASM-Intel), mmap'd weights with a peak-RSS budget and zero hot-loop allocation, vDSP-vs-portable-SIMD behind one internal API with tolerance-based (1e-5) vector parity + bit-exact quantized codes + unchanged golden NDCG/MRR, and a tokenizer-parity spike gating everything else. Packaging: internal ADEmbed target first, extraction criteria for a standalone package recorded. Weights artifact leans raw-matrix re-export at snapshot build (sha256-pinned) over an ONNX reader. RFC 0001: P2 points here; P1 marks the archiver shipped.
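The two parity gates named above can be illustrated with a minimal sketch. It assumes the 1e-5 tolerance is an absolute per-component bound and that quantized codes are stored as signed 8-bit integers; the RFC summary states neither, and both function names are hypothetical.

```typescript
// Assumption: tolerance is an absolute per-component difference.
// The RFC summary only says "tolerance-based (1e-5)".
function vectorsWithinTolerance(
  reference: Float32Array,
  candidate: Float32Array,
  tolerance = 1e-5,
): boolean {
  if (reference.length !== candidate.length) return false;
  for (let i = 0; i < reference.length; i++) {
    // Written as !(x <= tol) so that NaN components fail the check.
    if (!(Math.abs(reference[i] - candidate[i]) <= tolerance)) return false;
  }
  return true;
}

// Assumption: quantized codes are Int8Array; bit-exact means every element matches.
function codesBitExact(reference: Int8Array, candidate: Int8Array): boolean {
  if (reference.length !== candidate.length) return false;
  for (let i = 0; i < reference.length; i++) {
    if (reference[i] !== candidate[i]) return false;
  }
  return true;
}
```

Keeping the two checks separate reflects the split in the gate: floating-point output from vDSP and portable SIMD may differ in the last bits, while the quantized codes that get stored must not differ at all.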
docs: refresh size, weight, and document-count claims from a clean-room audit
Measured by installing snapshot-20260611 (stable) and snapshot-20260610-beta.3 into fresh data dirs, all three storage profiles each, on an M2 Pro: archive 1.89/1.90 GB (was 1.62), profiles ~4.6/~7.1/~10.5 GB on disk (were ~3/~5.5/~8.6), 353,325 documents, 361,823 rendered files per format, semantic chunk index ~0.7 GB (was ~0.5), DB 5.2 -> 2.7 GB under compact. Also documents the fresh-install beta-channel resolution rule.
fix(storage): recycle() respawns reader workers serially with retry
CI (macos-26) hit "malformed database schema (sqlite_master)" in the reader-pool recycle test: the exact Darwin WAL/SHM bring-up race that start() already guards against by booting slots serially with per-slot retry. recycle() still spawned all workers in parallel without retry, reintroducing the race on slow runners. It now reuses startSlot().
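The serial-boot-with-retry pattern described above might look roughly like the following sketch. The `StartSlot` signature, the attempt count, and the linear backoff are assumptions for illustration; the repository's actual startSlot() may take different arguments and use a different retry policy.

```typescript
// Hypothetical signature: the real startSlot() in the repo may differ.
type StartSlot = (slot: number) => Promise<void>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Try one slot up to `attempts` times, pausing a little longer after each failure.
async function startSlotWithRetry(
  startSlot: StartSlot,
  slot: number,
  attempts: number,
  delayMs: number,
): Promise<void> {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await startSlot(slot);
      return;
    } catch (error) {
      lastError = error;
      if (attempt < attempts) await sleep(delayMs * attempt);
    }
  }
  throw lastError;
}

// Boot slots one at a time, so no two workers open the database concurrently
// during bring-up. A Promise.all over the slots would reintroduce the race.
async function respawnSerially(
  slotCount: number,
  startSlot: StartSlot,
  attempts = 3,
  delayMs = 5,
): Promise<void> {
  for (let slot = 0; slot < slotCount; slot++) {
    await startSlotWithRetry(startSlot, slot, attempts, delayMs);
  }
}
```

Sharing one helper between the start and recycle paths, as the commit does by reusing startSlot(), means the two cannot drift apart again.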
feat(scope): opt-in corpus scoping - scope.json + prune + scoped sync (#7)
A scope.json in the data directory now names what to KEEP: sources, apple-docc frameworks, and whether fonts/SF Symbols stay. `apple-docs prune` trims an existing corpus to it without re-crawling (documents, FTS rows, vectors, raw payloads, relationships, and on-disk files all go; VACUUM reclaims the space; --dry-run previews). `sync` reads the same file, filtering adapters and apple-docc roots so refreshes never grow past the scope, and skips the Xcode enrichment merge that would flood excluded pages back in. No scope.json → byte-identical full-coverage behavior everywhere; the framework narrowing applies only to the apple-docc adapter so other sources' root namespaces are never filtered by it.
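The keep rule described above can be sketched as a single predicate. The `Scope` and `DocRef` shapes and their field names are hypothetical, since the commit message does not give the scope.json schema; only the behavior (no scope means keep everything, framework narrowing applies to apple-docc only) comes from the text.

```typescript
// Hypothetical shapes: the real scope.json schema is not shown in the commit.
interface Scope {
  sources: string[];    // sources to keep
  frameworks: string[]; // apple-docc frameworks to keep
}

interface DocRef {
  source: string;
  framework: string; // root namespace of the document
}

// `null` stands for "no scope.json present": full coverage, nothing filtered.
function inScope(scope: Scope | null, doc: DocRef): boolean {
  if (scope === null) return true;
  if (!scope.sources.includes(doc.source)) return false;
  // Framework narrowing applies only to the apple-docc adapter;
  // other sources' root namespaces are never filtered by it.
  if (doc.source === "apple-docc") return scope.frameworks.includes(doc.framework);
  return true;
}
```

Under these assumptions, prune and sync could both call the same predicate, which is one way to keep a refresh from growing the corpus past what prune would keep.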
fix(web): hash overlong path segments so every page builds and links
Forty corpus keys carry a Swift init-signature segment past the 255-byte filesystem component limit (max observed: 319 bytes), so the static build's mkdir failed with ENAMETOOLONG and those pages silently never shipped. The canonical web path now maps any segment over 200 bytes to truncate(180)~sha1-12 (the same scheme keyPath has always used for storage files), applied at every emit site (list/tree hrefs, body links, breadcrumbs, canonicals, sitemap, the .md variant, and the build output path) so live and static URLs are identical. The live route resolves hashed paths back to raw keys via a lazy per-db map of the ~100 long keys, and still serves the raw overlong URL directly. /api/search additionally carries webPath on the rare hit whose site URL differs from its corpus key (web-only; MCP/CLI keep the raw path that read_doc accepts). dist-beta/ build artifacts are gitignored.
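A minimal sketch of the truncate(180)~sha1-12 mapping follows. It assumes that truncation counts UTF-8 bytes without splitting a code point, that "sha1-12" means the first 12 hex characters of the SHA-1 digest of the full segment, and that segments are separated by "/". The function names are hypothetical and not taken from the repository.

```typescript
import { Buffer } from "node:buffer";
import { createHash } from "node:crypto";

const MAX_SEGMENT_BYTES = 200; // threshold stated in the commit message
const KEEP_BYTES = 180;        // truncate(180)
const HASH_HEX_CHARS = 12;     // assumed reading of "sha1-12"

// Truncate to at most `limit` UTF-8 bytes without cutting a code point in half.
function truncateUtf8(text: string, limit: number): string {
  const bytes = Buffer.from(text, "utf8");
  if (bytes.length <= limit) return text;
  let end = limit;
  // Back up while the first excluded byte is a continuation byte (10xxxxxx).
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return bytes.subarray(0, end).toString("utf8");
}

// Short segments pass through unchanged; long ones become prefix~hash.
function hashSegment(segment: string): string {
  if (Buffer.byteLength(segment, "utf8") <= MAX_SEGMENT_BYTES) return segment;
  const digest = createHash("sha1").update(segment, "utf8").digest("hex");
  return `${truncateUtf8(segment, KEEP_BYTES)}~${digest.slice(0, HASH_HEX_CHARS)}`;
}

// Apply the mapping to every path segment of a corpus key.
function toWebPath(key: string): string {
  return key.split("/").map(hashSegment).join("/");
}
```

With these numbers the longest possible output segment is 180 + 1 + 12 = 193 bytes, comfortably under the 255-byte component limit. The mapping is one-way, which is why the live route needs a lookup from hashed paths back to raw keys.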
test(setup): advertise .tar.zst assets in the download mock
The main setup download test served .tar.zst bytes (from snapshotBuild) under
.tar.gz asset names, so setup routed to the gzip validator. macOS bsdtar was
lenient; GNU tar on ubuntu hard-failed ("not in gzip format"). The checksum-
rejection tests are untouched — they reject before validation.
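The mismatch described above (zstd bytes served under a gzip name) can be caught by comparing the format implied by the asset name with the format implied by the leading magic bytes. The sketch below is illustrative only: the function names are hypothetical, and nothing here claims the setup code sniffs magic bytes. The magic numbers themselves are the standard ones for gzip (1f 8b) and a zstd frame (28 b5 2f fd).

```typescript
type ArchiveFormat = "gzip" | "zstd" | "unknown";

// Format implied by the asset name, which is what name-based routing sees.
function formatFromName(assetName: string): ArchiveFormat {
  if (assetName.endsWith(".tar.zst")) return "zstd";
  if (assetName.endsWith(".tar.gz")) return "gzip";
  return "unknown";
}

// Format implied by the first bytes of the payload.
function formatFromMagic(bytes: Uint8Array): ArchiveFormat {
  if (bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b) return "gzip";
  if (
    bytes.length >= 4 &&
    bytes[0] === 0x28 && bytes[1] === 0xb5 && bytes[2] === 0x2f && bytes[3] === 0xfd
  ) {
    return "zstd";
  }
  return "unknown";
}

// A download mock is consistent when its name and its payload agree.
function mockIsConsistent(assetName: string, bytes: Uint8Array): boolean {
  return formatFromName(assetName) === formatFromMagic(bytes);
}
```

A check like this in the test fixture would fail on every platform, rather than only where the tar implementation is strict.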