Problem
Two functions contain nearly identical input-normalization logic with no shared implementation:
core.ts validateParameters (lines 202–226) and getPdfMetadata.ts (lines 41–63) both contain:
- An identical switch (true) block converting Buffer | string | Uint8Array | URL into DocumentInitParameters.data or DocumentInitParameters.url, including the non-obvious ordering constraint that Buffer.isBuffer must precede instanceof Uint8Array because Buffer extends Uint8Array
- The same four pdfjs performance flags, set identically: verbosity = VerbosityLevel.ERRORS, disableAutoFetch = true, disableStream = true, disableRange = true
Bug fixes or new input type support (e.g., ArrayBuffer) require two coordinated edits. The ordering hazard in the switch is silently duplicated. The four pdfjs flags are a library-level policy decision but appear scattered across two files.
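The ordering hazard is easy to reproduce in isolation. The following is a minimal sketch, not code from the repository: classifyBufferFirst and classifyUint8ArrayFirst are hypothetical helpers that only report which branch would be taken, and they use if-chains in place of the switch (true) form.

```typescript
import { Buffer } from 'node:buffer';

type InputKind = 'buffer' | 'uint8array' | 'other';

// Safe order: the more specific Buffer check runs before the Uint8Array check.
function classifyBufferFirst(input: unknown): InputKind {
  if (Buffer.isBuffer(input)) return 'buffer';
  if (input instanceof Uint8Array) return 'uint8array';
  return 'other';
}

// Hazardous order: every Buffer is also a Uint8Array, so the Buffer
// branch below can never be reached.
function classifyUint8ArrayFirst(input: unknown): InputKind {
  if (input instanceof Uint8Array) return 'uint8array';
  if (Buffer.isBuffer(input)) return 'buffer';
  return 'other';
}

const sample = Buffer.from('%PDF-1.7');
console.log(classifyBufferFirst(sample)); // prints: buffer
console.log(classifyUint8ArrayFirst(sample)); // prints: uint8array
```

With the hazardous order nothing throws and nothing warns; the Buffer-specific branch simply stops running, which is why keeping two copies of the dispatch is risky.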
Proposed Interface
A new file src/inputNormalizer.ts with one function and one type alias:
// src/inputNormalizer.ts
export type PdfInput = Buffer | string | Uint8Array | URL;

export async function buildDocumentInitParameters(
  input: PdfInput,
): Promise<DocumentInitParameters>
// Returns DocumentInitParameters with data/url set + the 4 standard pdfjs flags.
// Password is NOT included: each caller handles it differently by design.
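One possible body for this function is sketched below, under stated assumptions. So that it runs without pdfjs-dist installed, VerbosityLevel and DocumentInitParameters are local stand-ins rather than the library's own exports. The dispatch is written as an if-chain, which narrows types on any TypeScript version, instead of the switch (true) form used in the existing code. Copying a Buffer into a plain Uint8Array and the text after "Invalid source type:" are assumptions, not confirmed behavior.

```typescript
import { Buffer } from 'node:buffer';
import { readFile } from 'node:fs/promises';

// Stand-ins for illustration only; the real module would import these from pdfjs-dist.
const VerbosityLevel = { ERRORS: 0 } as const; // assumed to mirror pdfjs-dist's value
interface DocumentInitParameters {
  data?: Uint8Array;
  url?: URL;
  password?: string;
  verbosity?: number;
  disableAutoFetch?: boolean;
  disableStream?: boolean;
  disableRange?: boolean;
}

export type PdfInput = Buffer | string | Uint8Array | URL;

export async function buildDocumentInitParameters(
  input: PdfInput,
): Promise<DocumentInitParameters> {
  // The four standard flags are set once, here, for every input type.
  const params: DocumentInitParameters = {
    verbosity: VerbosityLevel.ERRORS,
    disableAutoFetch: true,
    disableStream: true,
    disableRange: true,
  };

  // Buffer must be checked before Uint8Array because Buffer extends Uint8Array.
  if (Buffer.isBuffer(input)) {
    params.data = new Uint8Array(input); // assumption: copy into a plain Uint8Array
  } else if (typeof input === 'string') {
    params.data = new Uint8Array(await readFile(input)); // string is treated as a file path
  } else if (input instanceof Uint8Array) {
    params.data = input; // passed through as-is
  } else if (input instanceof URL) {
    params.url = input;
  } else {
    // The message suffix is hypothetical; the source only fixes the prefix.
    throw new Error(`Invalid source type: ${typeof input}`);
  }

  return params;
}
```

The password field is deliberately left unset, so each caller can layer it on in its own way.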
core.ts — validateParameters becomes:
import { buildDocumentInitParameters, type PdfInput } from '#afpp/src/inputNormalizer';

const validateParameters = async (input: PdfInput, options?: AfppParseOptions) => {
  const documentInitParameters = await buildDocumentInitParameters(input);
  documentInitParameters.password = options?.password; // layered on after
  const scale = options?.scale ?? 1.0;
  // ... scale validation
  const concurrency = ...;
  // ... concurrency validation
  const encoding = options?.imageEncoding ?? 'png';
  // ... encoding validation
  return { concurrency, documentInitParameters, encoding, scale };
};
getPdfMetadata.ts — switch block deleted entirely:
import { buildDocumentInitParameters, type PdfInput } from '#afpp/src/inputNormalizer';

export async function getPdfMetadata(
  input: PdfInput,
  options?: Pick<AfppParseOptions, 'password'>,
): Promise<PdfMetadata> {
  const documentInitParameters = await buildDocumentInitParameters(input);
  // password handled via loadingTask.onPassword callback (unchanged)
  let isEncrypted = false;
  const loadingTask = getDocument(documentInitParameters);
  // ... rest unchanged
}
getPdfMetadata.ts also loses its readFile, DocumentInitParameters, and VerbosityLevel imports — all three move into inputNormalizer.ts.
The PdfInput type alias replaces the repeated inline Buffer | string | Uint8Array | URL union across all six public-facing function signatures and should be re-exported from index.ts.
Dependency Strategy
In-process — pure async computation with one I/O call (readFile). No network, no external service.
inputNormalizer.ts sits at the bottom of the dependency graph:
- Imports only: node:fs/promises, pdfjs-dist (for VerbosityLevel and the DocumentInitParameters type)
- No imports from any other afpp source file
- No @napi-rs/canvas, no p-limit
This keeps getPdfMetadata's module graph lean — it currently imports only a type from core.ts. Placing the function in core.ts instead would transitively pull @napi-rs/canvas and p-limit into getPdfMetadata's module graph unnecessarily.
buildDocumentInitParameters is not exported from index.ts — it is an internal implementation detail. Only PdfInput crosses the public API boundary.
Testing Strategy
New boundary tests to write (in a new test/inputNormalizer.test.ts):
- String path → data: Uint8Array (mocking readFile)
- Buffer → data: Uint8Array (verifies the Buffer.isBuffer branch, not instanceof Uint8Array)
- Uint8Array → data: Uint8Array (passes through as-is)
- URL → url: URL
- Invalid input → throws Error with message matching Invalid source type: ...
- All cases: returned object has verbosity, disableAutoFetch, disableStream, disableRange set to the expected values
Existing tests: No changes needed. The existing input-type coverage in getPdfMetadata.test.ts, pdf2string.test.ts, and pdf2image.test.ts continues to serve as integration-level verification that the refactor didn't break behavior. The new unit tests are additive.
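The boundary cases listed above could be sketched as a framework-agnostic check, shown here with node:assert. This is an illustration under assumptions, not the project's test file. runBoundaryChecks, Normalizer, and NormalizedParams are hypothetical names, and the expected verbosity is passed in rather than hard-coded. The string-path case is left out because it depends on how the project's test runner mocks readFile. The Buffer check assumes the Buffer branch produces a plain Uint8Array, which is what makes a misordered dispatch observable.

```typescript
import { strict as assert } from 'node:assert';
import { Buffer } from 'node:buffer';

// Minimal shape the checks rely on; the real type comes from pdfjs-dist.
interface NormalizedParams {
  data?: Uint8Array;
  url?: URL;
  verbosity?: number;
  disableAutoFetch?: boolean;
  disableStream?: boolean;
  disableRange?: boolean;
}
type Normalizer = (input: unknown) => Promise<NormalizedParams>;

function assertStandardFlags(params: NormalizedParams, expectedVerbosity: number): void {
  assert.equal(params.verbosity, expectedVerbosity);
  assert.equal(params.disableAutoFetch, true);
  assert.equal(params.disableStream, true);
  assert.equal(params.disableRange, true);
}

export async function runBoundaryChecks(
  build: Normalizer,
  expectedVerbosity: number,
): Promise<void> {
  // Buffer: must take the Buffer.isBuffer branch, so the result is no longer a Buffer.
  const fromBuffer = await build(Buffer.from('%PDF-'));
  assert.ok(fromBuffer.data instanceof Uint8Array);
  assert.equal(Buffer.isBuffer(fromBuffer.data), false);
  assertStandardFlags(fromBuffer, expectedVerbosity);

  // Uint8Array: the same object is passed through.
  const bytes = new Uint8Array([0x25, 0x50, 0x44, 0x46]);
  const fromBytes = await build(bytes);
  assert.equal(fromBytes.data, bytes);
  assertStandardFlags(fromBytes, expectedVerbosity);

  // URL: lands on url, not data.
  const url = new URL('https://example.com/sample.pdf');
  const fromUrl = await build(url);
  assert.equal(fromUrl.url, url);
  assert.equal(fromUrl.data, undefined);
  assertStandardFlags(fromUrl, expectedVerbosity);

  // Invalid input: rejects with the documented message prefix.
  await assert.rejects(() => build(42), /Invalid source type/);
}
```

Because Normalizer accepts unknown, the real function would need a thin wrapper when passed in, for example (input) => buildDocumentInitParameters(input as PdfInput), together with VerbosityLevel.ERRORS as the expected verbosity.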
Implementation Recommendations
What buildDocumentInitParameters should own:
- The switch (true) input-type dispatch (including the Buffer-before-Uint8Array ordering)
- The readFile call for string paths
- The four pdfjs performance flags as a library invariant
What it should hide:
- The Buffer.isBuffer / instanceof Uint8Array ordering constraint
- The readFile async step
- The pdfjs flag names and values
What it should expose:
- A DocumentInitParameters object ready to pass to getDocument()
- No password; callers set this themselves, since core.ts sets it eagerly and getPdfMetadata.ts handles it reactively via onPassword
Caller migration:
- validateParameters in core.ts: replace the switch block + 4 flag lines with one await buildDocumentInitParameters(input) call, then add documentInitParameters.password = options?.password
- getPdfMetadata.ts: replace the switch block + 4 flag lines with one await buildDocumentInitParameters(input) call; remove readFile, DocumentInitParameters, and VerbosityLevel imports
- All public function signatures: replace Buffer | string | Uint8Array | URL with PdfInput