@katrinafyi
Contributor

@katrinafyi katrinafyi commented Sep 5, 2025

for the parts of the link checking pipeline including and downstream of InputContent, this PR replaces InputSource fields with ResolvedInputSource. specifically, this affects the structs InputContent, Request, and Response.

this makes sense for two reasons:

  • conceptually, once you are able to read the content of a source, you must have resolved globs to a real path, so using ResolvedInputSource is more accurate.
  • in practice, InputContent's source field was never set to a FsGlob variant, it always used variant cases which are shared with ResolvedInputSource. this is because InputContent was constructed by pattern matching out of a ResolvedInputSource - also see the *_content methods in that code snippet. so there is no loss of information in making this PR, it is just a more precise type.
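As a rough illustration of the two points above, here is a minimal sketch of the relationship between the two types. The enums are simplified stand-ins and not the actual lychee definitions: the payload types are plain `String`/`PathBuf`, every variant name other than `FsGlob` is an assumption, and `try_narrow` is a hypothetical helper used only to show that the glob case is the single one that cannot be carried over.

```rust
use std::path::PathBuf;

// Simplified stand-in for the unresolved input type (assumed shape).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum InputSource {
    RemoteUrl(String),
    FsGlob(String),
    FsPath(PathBuf),
    Stdin,
    String(String),
}

// Simplified stand-in for the resolved type: the same variants minus FsGlob.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ResolvedInputSource {
    RemoteUrl(String),
    FsPath(PathBuf),
    Stdin,
    String(String),
}

// Hypothetical helper: every variant maps across unchanged except the glob,
// which has no resolved counterpart.
fn try_narrow(source: InputSource) -> Option<ResolvedInputSource> {
    match source {
        InputSource::RemoteUrl(url) => Some(ResolvedInputSource::RemoteUrl(url)),
        InputSource::FsGlob(_) => None,
        InputSource::FsPath(path) => Some(ResolvedInputSource::FsPath(path)),
        InputSource::Stdin => Some(ResolvedInputSource::Stdin),
        InputSource::String(text) => Some(ResolvedInputSource::String(text)),
    }
}

// Hypothetical stand-in for InputContent: content that has been read always
// comes from a resolved source, so the field can use the narrower type.
struct InputContent {
    source: ResolvedInputSource,
    content: String,
}
```

With a field of the narrower type, code that consumes an `InputContent` no longer needs an unreachable `FsGlob` arm when matching on `source`.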

in particular, the second point above means this can be done with no changes to external behaviour (there is a breaking API change, of course). in fact, this PR is almost a direct text replacement, with the gain in line count only happening because of new multi-line expressions.

the motivation for this PR is that, when working with relative URLs and base URLs, it would be nice to know that the FsGlob case is excluded, as it would be impossible to construct a URL relative to a glob.
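To make the motivation concrete, here is a small sketch of how a base for relative links might be derived from a resolved source. Everything in it is an assumption for illustration: the enum is a simplified stand-in, and `Base` and `base_of` are hypothetical names that do not come from lychee. The point is only that the match is exhaustive without a glob arm.

```rust
use std::path::{Path, PathBuf};

// Simplified stand-in for the resolved type (assumed variants, no FsGlob).
enum ResolvedInputSource {
    RemoteUrl(String),
    FsPath(PathBuf),
    Stdin,
    String(String),
}

// Hypothetical description of what relative links can be resolved against.
#[derive(Debug, PartialEq)]
enum Base<'a> {
    Url(&'a str),
    Dir(&'a Path),
    Unknown,
}

// Hypothetical helper: no arm is needed for globs, because a glob can never
// reach this function.
fn base_of(source: &ResolvedInputSource) -> Base<'_> {
    match source {
        ResolvedInputSource::RemoteUrl(url) => Base::Url(url.as_str()),
        ResolvedInputSource::FsPath(path) => match path.parent() {
            Some(dir) => Base::Dir(dir),
            None => Base::Unknown,
        },
        ResolvedInputSource::Stdin | ResolvedInputSource::String(_) => Base::Unknown,
    }
}
```

If the parameter were the wider InputSource type, the same function would need an extra arm for globs with no sensible value to return.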

i've tried to make this PR reasonably thorough in its replacement of usages but, notably, the stats maps are still HashMap<InputSource, ...> rather than keyed by ResolvedInputSource. i tried to change this and it opened a can of worms, because InputSource implements Deserialize but ResolvedInputSource doesn't, so i abandoned it. instead, we simply convert ResolvedInputSource back to InputSource before building the statistics.
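A minimal sketch of that back-conversion, assuming simplified stand-in enums and a `From` impl; the function `count_by_source` and the `usize` counter are hypothetical and only illustrate the idea of keeping the map keyed by the wider type.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Simplified stand-ins; the variant names other than FsGlob are assumptions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum InputSource {
    RemoteUrl(String),
    FsGlob(String),
    FsPath(PathBuf),
    Stdin,
    String(String),
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ResolvedInputSource {
    RemoteUrl(String),
    FsPath(PathBuf),
    Stdin,
    String(String),
}

// Widening is infallible: every resolved variant has an unresolved twin.
impl From<ResolvedInputSource> for InputSource {
    fn from(source: ResolvedInputSource) -> Self {
        match source {
            ResolvedInputSource::RemoteUrl(url) => InputSource::RemoteUrl(url),
            ResolvedInputSource::FsPath(path) => InputSource::FsPath(path),
            ResolvedInputSource::Stdin => InputSource::Stdin,
            ResolvedInputSource::String(text) => InputSource::String(text),
        }
    }
}

// Hypothetical stats step: the map stays keyed by InputSource, so each
// resolved source is widened just before it is counted.
fn count_by_source(sources: Vec<ResolvedInputSource>) -> HashMap<InputSource, usize> {
    let mut stats = HashMap::new();
    for source in sources {
        *stats.entry(InputSource::from(source)).or_insert(0) += 1;
    }
    stats
}
```

Because the conversion only ever widens, no information is lost; the stats map simply never contains an `FsGlob` key in this sketch.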

one other outstanding todo might be to use Cow in ResolvedInputSource, as InputSource now does.

this PR is just some old changes I had lying around. I'm not too attached to this PR and could get by without it. it is a breaking API change, so feel free to leave it if you wish.

not needed due to the way `InputSource` was constructed in
`InputContent`. previously, this was constructed by pattern
matching out of ResolvedInputSource, so the InputSource variants
were guaranteed to only be ResolvedInputSource variants. so, there is no
loss of information and no change in functionality.
@mre
Member

mre commented Sep 5, 2025

Yes, I think that makes sense.

and adds Cow to ResolvedInputSource

Conflicts:
lychee-lib/src/types/input/content.rs
lychee-lib/src/types/request.rs
lychee-lib/src/utils/request.rs
@katrinafyi
Contributor Author

katrinafyi commented Sep 6, 2025

ResolvedInputSource now uses Cow as well and the conflicts are fixed.

I find that the 'static lifetime used on the Cow forces clones in situations that should really be a borrow. For instance, resolve_input has lifetime parameters and should be able to return a ResolvedInputSource containing a Cow::Borrowed with lifetime 'a. Instead, we have to .clone().

But this was already a clone before this PR, so it's probably fine for now. From a quick try, hoisting the lifetime to ResolvedInputSource<'a> forces it to appear in a lot more places ;-; This can be fixed in another PR if needed.
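A small sketch of the trade-off described above, under stated assumptions: `Input`, `ResolvedStatic`, `ResolvedBorrowed`, and both functions are hypothetical names, and only the string variant is modelled. It is not the actual `resolve_input` code, just the shape of the lifetime problem.

```rust
use std::borrow::Cow;

// Hypothetical input that owns (or statically borrows) its text.
struct Input {
    source: Cow<'static, str>,
}

// Variant with a 'static Cow, mirroring the approach taken in this PR.
#[derive(Debug, Clone, PartialEq)]
enum ResolvedStatic {
    String(Cow<'static, str>),
}

// Variant with the lifetime hoisted onto the type.
#[derive(Debug, Clone, PartialEq)]
enum ResolvedBorrowed<'a> {
    String(Cow<'a, str>),
}

// With 'static, a reference of lifetime 'a cannot be stored, so the Cow is
// cloned. For an Owned Cow that means a new allocation.
fn resolve_static(input: &Input) -> ResolvedStatic {
    ResolvedStatic::String(input.source.clone())
}

// With a lifetime parameter, the result can simply borrow from the input.
fn resolve_borrowed<'a>(input: &'a Input) -> ResolvedBorrowed<'a> {
    ResolvedBorrowed::String(Cow::Borrowed(input.source.as_ref()))
}
```

The cost of the second form is visible in the signature: every struct or function that stores a `ResolvedBorrowed<'a>` has to carry the lifetime as well.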

@mre mre merged commit 438e9b2 into lycheeverse:master Sep 9, 2025
6 checks passed
@mre
Member

mre commented Sep 9, 2025

Great progress. Thanks @katrinafyi .

And yes, we could try a shorter lifetime. I personally wouldn't mind introducing an 'input lifetime but we'll have to see how noisy it gets. To be fair, raw string inputs are also a bit of an exception, so not sure if it's worth it.

@mre mre mentioned this pull request Sep 9, 2025
katrinafyi added a commit to rina-forks/lychee that referenced this pull request Oct 5, 2025
it looks like this:
```
$ lychee 'non-existing/*' '*.fdsamifdsa' 'no-matches?????' empty-dir
   [WARN ] *.fdsamifdsa: No files found for this input source
   [WARN ] no-matches?????: No files found for this input source
   [WARN ] non-existing/*: No files found for this input source
   [WARN ] empty-dir: No files found for this input source
  0/0 ━━━━━━━━━━━━━━━━━━━━ Finished extracting links                                                                                                                                        🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors
```

this is implemented as a sneaky `log::warn!` which is probably
undesirable. however, the code isn't set up yet to properly pass errors
from the input resolving stage up to the top level.

even if it was, the "error on empty input" error should probably be put
behind a flag. it would also be tricky because the reporting expects a
ResolvedInputSource rather than InputSource, and these warnings arise in
the process of resolving. (maybe we could slightly wind back lycheeverse#1840 to
make this work)

anyway, much to think about.
thomas-zahner pushed a commit that referenced this pull request Oct 9, 2025
* feat: print warning if input source matches no files
This was referenced Oct 21, 2025