Conversation

@thomas-zahner (Member) commented Oct 29, 2025

Addresses #1672

I've created a new repository for testing and documenting compatibility with other file formats: https://github.com/thomas-zahner/lychee-all
We might want to merge this information into the official docs later.

As mentioned in the issue, this PR is heavily inspired by ripgrep's preprocessor.

@thomas-zahner thomas-zahner requested a review from mre October 29, 2025 11:15
@katrinafyi (Contributor) left a comment

Looks good! The most significant comment is the one about the Skip concept. I think a different collection of values would make more sense.

The rest of this comment is just my own commentary. Feel free to ignore it, and I certainly don't expect anything in this PR to change because of it.

When reading this PR, I noticed that the new preprocess value has to be handled in many places and passed repeatedly before it reaches the point where it's actually used. I think this is due to a deeply hierarchical architecture. Conceptually, it looks something like this:

inputs
|> collector(basic_auth, skip, include_verbatim, client, preprocess, ...)

Here, collector calls other helper functions and has to amalgamate all their arguments. It is responsible for a lot of functionality, from resolving inputs all the way to link extraction and request building.

If the architecture were more like a flat pipeline, it would reduce the need for this argument injection. Instead of one big "collector", it might look like this:

inputs
|> resolve_inputs(skip, glob_ignore_case)
|> preprocess_inputs(pre_cmd)
|> get_input_contents(basic_auth, retries, max_redirect)
|> extract_links(root_dir, base_url)

Hopefully, you can see how this reduces the parameters needed: each step only needs the parameters for its own functionality. A clear pipeline also makes it much easier to implement features like --dump or --dump-inputs, which amount to stopping at certain points in the pipeline (I started thinking about this because of the dumping issues). It also makes testing easier.
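To make the idea concrete, here is a minimal Rust sketch of the flat pipeline. All names, types, and behaviors here are hypothetical illustrations of the shape, not lychee's actual API: each stage takes only the parameters it needs, so nothing has to be threaded through one big collector.

```rust
// Hypothetical flat-pipeline sketch. `Input`, `resolve_inputs`,
// `preprocess_inputs`, and `extract_links` are illustrative stand-ins,
// not real lychee types or functions.
#[derive(Debug, Clone, PartialEq)]
struct Input(String);

// Stage 1: resolving inputs only needs resolution-related flags.
fn resolve_inputs(inputs: Vec<Input>, skip_hidden: bool) -> Vec<Input> {
    inputs
        .into_iter()
        .filter(|i| !(skip_hidden && i.0.starts_with('.')))
        .collect()
}

// Stage 2: preprocessing only needs the preprocessor command.
// The real feature would run an external command per file; here we
// just tag the name to show where the stage sits in the pipeline.
fn preprocess_inputs(inputs: Vec<Input>, pre_cmd: Option<&str>) -> Vec<Input> {
    match pre_cmd {
        Some(cmd) => inputs
            .into_iter()
            .map(|i| Input(format!("{}!{}", cmd, i.0)))
            .collect(),
        None => inputs,
    }
}

// Stage 3: link extraction needs no knowledge of the earlier stages.
fn extract_links(inputs: Vec<Input>) -> Vec<String> {
    inputs.into_iter().map(|i| format!("link-from:{}", i.0)).collect()
}

fn main() {
    let inputs = vec![Input(".hidden.md".into()), Input("README.md".into())];
    // Each stage receives only its own parameters; --dump-inputs would
    // simply stop after resolve_inputs, --dump after extract_links.
    let links = extract_links(preprocess_inputs(
        resolve_inputs(inputs, true),
        Some("pandoc"),
    ));
    println!("{:?}", links);
}
```

Note how dumping features fall out naturally: they just return the intermediate value of whichever stage they stop at, and each stage can be unit-tested in isolation.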

Anyway, this is all theoretical at the moment. I don't know if this is possible or how hard it would be. There is Chain in the codebase, but it's limited to homogeneous pipeline functions. As I said, though, nothing that needs to affect this PR right now.
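As a side note on the homogeneity limitation: a chain whose stages all map T -> T cannot express a pipeline whose stages change the type (e.g. inputs to contents to links). The sketch below illustrates the limitation in general terms; it is not lychee's Chain implementation.

```rust
// A homogeneous chain: every stage must have the same input and
// output type, so stages can be stored in one collection and folded.
fn chain<T>(stages: Vec<Box<dyn Fn(T) -> T>>, init: T) -> T {
    stages.into_iter().fold(init, |acc, f| f(acc))
}

fn main() {
    // Fine while every stage maps String -> String...
    let stages: Vec<Box<dyn Fn(String) -> String>> = vec![
        Box::new(|s| s.to_uppercase()),
        Box::new(|s| format!("{s}!")),
    ];
    let out = chain(stages, "hi".to_string());
    println!("{out}");
    // ...but a stage `String -> Vec<String>` (like link extraction)
    // could not be added to `stages`, because it would break the
    // `Fn(T) -> T` shape. A heterogeneous pipeline needs plain
    // function composition or a trait with distinct In/Out types.
}
```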

@thomas-zahner thomas-zahner changed the title File preprocessing feat: file preprocessing Oct 31, 2025
@thomas-zahner (Member, Author) commented Oct 31, 2025

@katrinafyi Thanks for your thoughts.

If the architecture were more like a flat pipeline, it would reduce the need for this argument injection. Instead of one big "collector", it might look like this

I really do like this idea and I totally agree. It would probably simplify things quite a lot. IMO we could open an issue to tackle that separately.

Edit: opened up #1898

@thomas-zahner thomas-zahner merged commit 8011ef0 into lycheeverse:master Nov 4, 2025
7 checks passed
@mre mre mentioned this pull request Nov 4, 2025