@thomas-zahner
Member

@thomas-zahner thomas-zahner commented Jun 11, 2025

Fixes #1433

We want to report redirects at the end of the link check process, the same way we already do for --suggest. The Markdown output format, for example, should then also contain a section listing all redirects.

Summary bug

There is a category Redirected in the summary. However, redirects were never counted. This is fixed with 6b28e2b.
Previously, echo "https://http.codes/301 " | cargo run - --format detailed yielded

📝 Summary
---------------------
🔍 Total............1
✅ Successful.......1
⏳ Timeouts.........0
🔀 Redirected.......0
👻 Excluded.........0
❓ Unknown..........0
🚫 Errors...........0

now it yields

📝 Summary
---------------------
🔍 Total............1
✅ Successful.......0
⏳ Timeouts.........0
🔀 Redirected.......1
👻 Excluded.........0
❓ Unknown..........0
🚫 Errors...........0

Questions

  1. I think it makes sense to show a Redirects section in the end just as we currently do with Suggestions. But I think we should show them by default without introducing a new feature/flag. Tying the feature to --suggest wouldn't make sense in my opinion. Do you agree?
  2. As you can see in this PR, I introduced a redirect_map. It keeps track of the redirects that lychee followed, which allows us to create the new summary section (not yet implemented). The map is wrapped in an Arc<Mutex<_>>. Do you think this is okay? I could probably try to use channels instead. The reason for the Arc<Mutex<_>> is that redirect::Policy::custom takes Fn(Attempt) -> Action + Send + Sync + 'static as its argument, so we can't mutate captured data directly in there. I also briefly considered getting rid of redirect::Policy::custom and implementing our own mechanism, but that turns out to be way more effort than necessary.
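
To make the constraint in point 2 concrete, here is a minimal std-only sketch. It does not use reqwest: `make_policy`, `Action`, `RedirectMap` and the `(from, to, previous_hops)` arguments are hypothetical stand-ins for `redirect::Policy::custom` and its `Attempt`/`Action` types. It only shows that a closure bounded by `Fn + Send + Sync + 'static` can still record data when the map sits behind an `Arc<Mutex<_>>`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for reqwest's redirect `Action`.
#[derive(Debug, PartialEq)]
pub enum Action {
    Follow,
    Stop,
}

/// Shared map from a requested URL to the URL it redirected to (assumed shape).
pub type RedirectMap = Arc<Mutex<HashMap<String, String>>>;

/// Boxed policy callback: (from, to, previous_hops) -> Action.
pub type Policy = Box<dyn Fn(&str, &str, usize) -> Action + Send + Sync>;

/// Hypothetical stand-in for `redirect::Policy::custom`: it only accepts
/// closures that are `Fn + Send + Sync + 'static`.
pub fn make_policy<F>(f: F) -> Policy
where
    F: Fn(&str, &str, usize) -> Action + Send + Sync + 'static,
{
    Box::new(f)
}

/// Builds a policy that records every followed hop and stops once
/// `max_redirects` hops have already been followed.
pub fn recording_policy(map: RedirectMap, max_redirects: usize) -> Policy {
    make_policy(move |from, to, previous_hops| {
        if previous_hops >= max_redirects {
            return Action::Stop;
        }
        // An `Fn` closure cannot mutate captured state directly;
        // the mutex provides the interior mutability needed to record the hop.
        if let Ok(mut guard) = map.lock() {
            guard.insert(from.to_string(), to.to_string());
        }
        Action::Follow
    })
}
```

Replacing the `Arc<Mutex<_>>` with a plain `HashMap` captured by the closure would turn it into an `FnMut`, which the `Fn` bound rejects. That is the reason some form of shared, synchronised state (or a channel) is needed here.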

@thomas-zahner thomas-zahner requested a review from mre June 11, 2025 13:53
@mre
Member

mre commented Jun 11, 2025

Glad you're tackling this.

  1. I think it makes sense to show a Redirects section in the end just as we currently do with Suggestions. But I think we should show them by default without introducing a new feature/flag. Tying the feature to --suggest wouldn't make sense in my opinion. Do you agree?

Yeah, that's fine. Although the naming of the flag is a bit confusing then. Maybe --show-suggestions to make it more explicit? I know that it's another breaking change, but it might be worth it in the long run to change the naming now.

  2. As you can see in this PR, I introduced a redirect_map. It keeps track of the redirects that lychee followed, which allows us to create the new summary section (not yet implemented). The map is wrapped in an Arc<Mutex<_>>. Do you think this is okay? I could probably try to use channels instead. The reason for the Arc<Mutex<_>> is that redirect::Policy::custom takes Fn(Attempt) -> Action + Send + Sync + 'static as its argument, so we can't mutate captured data directly in there. I also briefly considered getting rid of redirect::Policy::custom and implementing our own mechanism, but that turns out to be way more effort than necessary.

Looking at planned features (like per-host rate-limiting and recursion), I believe we'll end up adding more metadata to requests and responses in the future anyway.

The question is if we should start with this PR already and instead of a redirect-map, we add a redirects field, which is a Vec<Url> or so. It could serve as a sort of "audit log" for a request, and my hope is that it fits in nicely with the request chain. There might be constraints I'm not aware of yet, though.

@thomas-zahner
Member Author

Sure, no problem. Thanks for the idea in your second point, I'll take a look at it.

Maybe --show-suggestions to make it more explicit?

So you think we should rename the feature flag even though we don't change its functionality?

@mre
Member

mre commented Jun 11, 2025

I think we should probably separate redirect tracking from the existing --suggest flag, though I'm finding it tricky to land on a definitive answer here. The core issue seems to be that --suggest was designed for "show alternative URLs you should use instead" but redirect tracking feels different - it's more about "what actually happened during requests."

I initially thought redirects could be considered suggestions since you might want to update links to point directly to the final destination, but the reality is more nuanced. Many redirects are intentional and shouldn't be "fixed" - URL shorteners, CDN routing, domain migrations with proper redirects, etc. Treating every redirect as a "suggestion to fix" might create noise and dilute the actionable nature of actual suggestions. If we at some point introduce an --apply flag, we could no longer distinguish between fixing broken links and fixing redirects, and the latter might not be an issue at all.

The use cases feel somewhat different too. Suggestions tend to answer "what's broken and needs fixing" while redirects answer "what path did this request take." Broken links generally force action - your site is broken until you fix them. Redirects are more about inefficiency - your site works fine, you're just making extra HTTP requests. Perhaps some users are also just curious about the redirect chain (similar to traceroute), but don't care about archive links?

I'm leaning toward keeping --suggest focused on actionable fixes for broken links and adding another flag, maybe --redirects, for redirect tracking. This would give users clearer expectations about what each flag does and let people choose exactly the information they need. But I can see arguments for keeping them together too - it's not as clear-cut as I would like it to be.

@thomas-zahner
Member Author

Yes I agree with that. That was basically my initial proposal, maybe you misunderstood me.

Tying the feature to --suggest wouldn't make sense in my opinion. Do you agree?

--suggest can also be slow or unreliable which is another reason not to tie the features together into a single CLI flag.

But I think we should show them by default without introducing a new feature/flag.

What do you think about this? I'm also okay with your proposal of a new flag called --redirects. But maybe we want to always show the redirects, only depending on the verbosity, output format and mode.

@mre
Member

mre commented Jun 12, 2025

What do you think about this? I'm also okay with your proposal of a new flag called --redirects. But maybe we want to always show the redirects, only depending on the verbosity, output format and mode.

Yes! Good idea to show them by default. Or alternatively with --verbose? But by default is fine, of course.

@mre
Member

mre commented Jun 12, 2025

Yes I agree with that. That was basically my initial proposal, maybe you misunderstood me.

Yes, probably. Sorry about that.

@thomas-zahner
Member Author

The question is if we should start with this PR already and instead of a redirect-map, we add a redirects field, which is a Vec<Url> or so. It could serve as a sort of "audit log" for a request, and my hope is that it fits in nicely with the request chain. There might be constraints I'm not aware of yet, though.

Yes, it probably makes sense to do that at some point. However, this will not resolve the Arc<Mutex<_>> situation; these are two separate concerns. The problem is that we build a single reqwest Client (which makes sense) for all requests and pass a single redirect::Policy argument for handling redirects. Given this situation, we need to somehow keep track of the redirects, which I did with Arc<Mutex<_>> for now. I could try another approach, such as channels. The only other alternative I see is to not use reqwest's redirection handling and implement our own mechanism (your suggestion might also assume this). But looking at the code, I think we should stick with reqwest's battle-tested redirection handling, as implementing our own mechanism would be too much (unnecessary) effort.

@mre
Member

mre commented Jun 20, 2025

Makes sense, yeah; reqwest::redirect::Policy is pretty nice.

The draft looks good.
Perhaps we would like to introduce an actual type for tracking the redirects? How about this?

use std::{collections::HashMap, sync::{Arc, Mutex}};
use url::Url;

#[derive(Debug, Clone)]
pub struct RedirectTracker(Arc<Mutex<HashMap<Url, Url>>>);

impl RedirectTracker {
    pub fn new() -> Self {
        Self(Arc::new(Mutex::new(HashMap::new())))
    }

    pub fn record_redirect(&self, original: Url, resolved: Url) {
        if let Ok(mut map) = self.0.lock() {
            map.insert(original, resolved);
        }
    }

    pub fn get_resolved(&self, original: &Url) -> Option<Url> {
        self.0.lock().ok()?.get(original).cloned()
    }

    pub fn all_redirects(&self) -> HashMap<Url, Url> {
        // Return a snapshot of the map; fall back to an empty map if the mutex is poisoned
        self.0
            .lock()
            .map(|map| map.clone())
            .unwrap_or_default()
    }

    pub fn clear(&self) {
        if let Ok(mut map) = self.0.lock() {
            map.clear();
        }
    }
}

impl Default for RedirectTracker {
    fn default() -> Self {
        Self::new()
    }
}

I also thought of a simple type alias, but I believe this is a bit more declarative and encapsulates the actual tracking.
However, I don't want us to lose the ability to refactor the code later, so feel free to keep it simple.
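
As a rough illustration of how such a tracker could be shared between concurrent checks, here is a std-only variant. It is a sketch under assumptions: `String` replaces `url::Url` so the example compiles without external crates, and `record_concurrently` is a hypothetical helper that is not part of the proposal above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Std-only variant of the proposed tracker; `String` stands in for `url::Url`.
#[derive(Debug, Clone, Default)]
pub struct RedirectTracker(Arc<Mutex<HashMap<String, String>>>);

impl RedirectTracker {
    /// Records that `original` was redirected to `resolved`.
    pub fn record_redirect(&self, original: &str, resolved: &str) {
        if let Ok(mut map) = self.0.lock() {
            map.insert(original.to_string(), resolved.to_string());
        }
    }

    /// Looks up the recorded target for `original`, if any.
    pub fn get_resolved(&self, original: &str) -> Option<String> {
        self.0.lock().ok()?.get(original).cloned()
    }
}

/// Hypothetical helper: records each hop from its own worker thread.
pub fn record_concurrently(tracker: &RedirectTracker, hops: Vec<(String, String)>) {
    let handles: Vec<_> = hops
        .into_iter()
        .map(|(from, to)| {
            // Cloning the tracker clones the `Arc`, not the underlying map.
            let tracker = tracker.clone();
            thread::spawn(move || tracker.record_redirect(&from, &to))
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}
```

Because `Clone` only bumps the reference count, every clone observes the same map. This is what lets one handle live inside the redirect policy while another is read when the summary is built.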

@thomas-zahner
Member Author

@mre Thanks, I like the RedirectTracker type 👍

@thomas-zahner thomas-zahner force-pushed the redirect-reporting branch 2 times, most recently from 59d8919 to 7513537 Compare July 4, 2025 14:45
@thomas-zahner
Member Author

FYI

In 7513537 I removed Status::Redirected. It was not used anyways and conceptually it should not exist. Status represents the final state after having checked a link. So after following redirects the status should resolve normally.

When the redirect limit is reached, the resulting status should be treated just like any normal status code. Depending on the accept config, this will be either Error(ErrorKind::RejectedStatusCode(3XX)) or Ok(3XX) when --accept is configured as such.

This is the best and simplest approach in my opinion because of the configurability with --accept. I've also removed ErrorKind::TooManyRedirects because it's not used. It was only ever produced by converting from a reqwest::Error when e.is_redirect. But this only happens when something really is wrong with HTTP redirection (e.g. with the headers), so converting that into ErrorKind::NetworkRequest(e) makes more sense. If we didn't override the redirection policy and used the default policy, then it might have made sense to keep it. But this is no longer the case, and the behaviour is covered by the new test test_max_redirects.
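
The rule described above (a 3XX left over after hitting the redirect limit is judged like any other status code) can be sketched as follows. This is not lychee's code: `Status`, `resolve_status` and `default_accept` are simplified, hypothetical names, and the 2XX default is an assumption made only for the example.

```rust
use std::collections::HashSet;

/// Simplified, hypothetical stand-in for the final status of a checked link.
#[derive(Debug, PartialEq)]
pub enum Status {
    /// The status code is in the accepted set.
    Ok(u16),
    /// The status code is not accepted (compare `ErrorKind::RejectedStatusCode`).
    Rejected(u16),
}

/// Assumed default for this sketch only: accept every 2XX code.
pub fn default_accept() -> HashSet<u16> {
    (200..300).collect()
}

/// Judges the last received status code purely by the accept set,
/// regardless of whether it is a 3XX left over from an unfollowed redirect.
pub fn resolve_status(code: u16, accept: &HashSet<u16>) -> Status {
    if accept.contains(&code) {
        Status::Ok(code)
    } else {
        Status::Rejected(code)
    }
}
```

With the assumed default set, a 301 is rejected. Once 301 is added to the set, which is what passing it via --accept amounts to in this model, the same response resolves to `Status::Ok(301)`.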

@mre
Member

mre commented Jul 4, 2025

Fully agree. Great explanation. 👌

@thomas-zahner thomas-zahner marked this pull request as ready for review September 4, 2025 13:51
@thomas-zahner
Member Author

@mre Thanks for taking a look already! I've just now finished some final changes and the PR is now in a state I'm happy with, apart from this one other comment. Will reply to your comments later.

@thomas-zahner
Member Author

In 7513537 I removed Status::Redirected. It was not used anyways and conceptually it should not exist.

I've reconsidered this statement and realised that this is not the path we should follow. I've reverted the mentioned commit, so Status::Redirected still exists. It is true that Status::Redirected was previously useless, because a link check could never resolve to that status.

Summary

Updated default behaviour

echo 'http://google.com/jobs' | lychee -

🔍 1 Total (in 0s) ✅ 1 OK 🚫 0 Errors

Now we list the Redirects category separately. A redirected link no longer counts as OK; however, lychee still exits with 0.
This makes it much easier for users to realise that their links might be outdated.

echo 'http://google.com/jobs' | cargo run -

🔍 1 Total (in 0s) ✅ 0 OK 🚫 0 Errors 🔀 1 Redirects

Max redirects

Previously, the following led to a hard error.

echo 'http://google.com/jobs' | lychee - --max-redirects 0

Issues found in 1 input. Find details below.

[stdin]:
   [ERROR] http://google.com/jobs | Too many redirects: error following redirect for url (http://google.com/jobs)

🔍 1 Total (in 0s) ✅ 0 OK 🚫 1 Error

Now it no longer results in a "hard" error. Instead, the redirect is simply not followed any further, so the result will normally be a 3XX status code.

echo 'http://google.com/jobs' | cargo run - --max-redirects 0

Issues found in 1 input. Find details below.

[stdin]:
     [301] http://google.com/jobs | Rejected status code (this depends on your "accept" configuration): Moved Permanently

🔍 1 Total (in 0s) ✅ 0 OK 🚫 1 Error

Detailed output and JSON

Previously it was impossible to discern redirects.

echo 'http://google.com/jobs' | lychee - --format json --verbose

     [200] http://google.com/jobs

{
  "total": 1,
  "successful": 1,
  "unknown": 0,
  "unsupported": 0,
  "timeouts": 0,
  "redirects": 0,
  "excludes": 0,
  "errors": 0,
  "cached": 0,
  "success_map": {
    "stdin": [
      {
        "url": "http://google.com/jobs",
        "status": {
          "text": "200 OK",
          "code": 200
        }
      }
    ]
  },
  "error_map": {},
  "suggestion_map": {},
  "excluded_map": {},
  "duration_secs": 0,
  "detailed_stats": true
}

Now it is possible, and users get useful new information.

echo 'http://google.com/jobs' | cargo run - --format json --verbose

     [200] http://google.com/jobs | Redirect: Followed 2 redirects resulting in OK.

{
  "total": 1,
  "successful": 0,
  "unknown": 0,
  "unsupported": 0,
  "timeouts": 0,
  "redirects": 1,
  "excludes": 0,
  "errors": 0,
  "cached": 0,
  "success_map": {},
  "error_map": {},
  "suggestion_map": {},
  "redirect_map": {
    "stdin": [
      {
        "url": "http://google.com/jobs",
        "status": {
          "text": "Redirect",
          "code": 200,
          "redirects": [
            "http://google.com/jobs",
            "https://www.google.com/jobs",
            "https://jobs.google.com/about/"
          ]
        }
      }
    ]
  },
  "excluded_map": {},
  "duration_secs": 0,
  "detailed_stats": true
}
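
To relate the `redirects` array to the verbose line above it, here is a small sketch. `RedirectChain` and its methods are hypothetical names; the sketch assumes, based on the example output only, that the array starts with the requested URL and ends with the final destination, so the number of redirects followed is one less than its length.

```rust
/// Hypothetical model of the `redirects` array from the JSON output:
/// first entry is the requested URL, last entry is the final destination.
#[derive(Debug, Clone, PartialEq)]
pub struct RedirectChain(pub Vec<String>);

impl RedirectChain {
    /// Number of redirects followed: one fewer than the URLs in the chain.
    pub fn hops(&self) -> usize {
        self.0.len().saturating_sub(1)
    }

    /// The URL the request finally resolved to, if the chain is non-empty.
    pub fn final_url(&self) -> Option<&str> {
        self.0.last().map(String::as_str)
    }

    /// Builds a message shaped like the verbose output shown above.
    pub fn describe(&self, outcome: &str) -> String {
        let noun = if self.hops() == 1 { "redirect" } else { "redirects" };
        format!("Followed {} {} resulting in {}.", self.hops(), noun, outcome)
    }
}
```

For the three URLs in the example, `hops()` is 2 and `final_url()` is the `https://jobs.google.com/about/` entry, which matches the "Followed 2 redirects" message.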

@thomas-zahner thomas-zahner requested a review from mre September 5, 2025 13:28
@mre
Member

mre commented Sep 9, 2025

[301] http://google.com/jobs | Rejected status code (this depends on your "accept" configuration): Moved Permanently

This error message sounds a bit misleading to me given that changing the accept configuration would not improve the situation. Maybe we should change it to "Too many redirects"?

Member

@mre mre left a comment

What's the benefit of switching to macros? Or was it necessary because of the changes?

pub(crate) suggestion_map: HashMap<InputSource, HashSet<Suggestion>>,
/// Map to store excluded responses (if `detailed_stats` is enabled)
/// Store redirected responses with their redirection list (if `detailed_stats` is enabled)
pub(crate) redirect_map: HashMap<InputSource, HashSet<ResponseBody>>,
Member

Do we really store the redirection list here as the comment suggests?

Member Author

Yes, see the section "Detailed output and JSON" in my previous comment. It's the new redirect_map field. I've added it for a more detailed JSON output. Maybe "with their redirection list" was a bit misleading? I've removed it now.

..
} = request.try_into()?;

// Allow filtering based on element and attribute
Member

This was dead code but can we replace it with a TODO? I think we still want to do that. We could even create an issue.

Member Author

Yeah, creating an issue for that definitely makes sense. Could you do that? Because I'm not quite sure what the comment and disabled code tried to convey. To me, the comment does not seem to relate to the code below.

Is the comment requesting a feature like: Allow filtering based on HTML elements and HTML attributes? If so, the comment is in the wrong place.

@thomas-zahner
Member Author

thomas-zahner commented Sep 11, 2025

[301] http://google.com/jobs | Rejected status code (this depends on your "accept" configuration): Moved Permanently

This error message sounds a bit misleading to me given that changing the accept configuration would not improve the situation. Maybe we should change it to "Too many redirects"?

@mre That was the behaviour before this PR: it always resulted in Too many redirects. Now we simply stop following further redirects when the limit is reached, so providing --accept 301 actually does make the error go away, as indicated by the error message. So I think the error message is perfectly accurate.

What's the benefit of switching to macros? Or was it necessary because of the changes?

The helpers mock_server and redirecting_mock_server were previously duplicated in the tests of lychee-lib and lychee-bin. Now we share a single definition (mock_server was a macro already). I then ported all functions from lychee-lib/src/test_utils.rs to the new test-utils crate to make them reusable across both crates. I chose to turn the functions into macros because otherwise we would have dependency cycles, which are rightfully prevented by the compiler. For example, get_mock_client_response uses ClientBuilder. If it were a function rather than a macro, test-utils would have to depend on lychee-lib. For testing, we add test-utils as a dev-dependency of lychee-lib, which would create an unresolvable cycle. I've now updated the doc comment to explain this.
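
The reasoning about macros can be illustrated in a single file. This is only a sketch: `mock_client!`, `ClientBuilder`, `Client` and `client_for_test` are hypothetical and merely mimic the crate split. The key property is that a `macro_rules!` body may name a type that is only resolved where the macro is expanded, so the place that defines the macro does not need to know the type.

```rust
/// Imagine this macro living in a separate `test-utils` crate. Its body names
/// `ClientBuilder`, but that path is only resolved at the expansion site, so
/// the defining crate needs no dependency on the crate that owns the type.
macro_rules! mock_client {
    ($max_redirects:expr) => {
        ClientBuilder::default()
            .max_redirects($max_redirects)
            .build()
    };
}

/// Imagine the following types living in the main library crate.
#[derive(Debug, Default)]
pub struct ClientBuilder {
    max_redirects: usize,
}

impl ClientBuilder {
    /// Sets the maximum number of redirects to follow.
    pub fn max_redirects(mut self, n: usize) -> Self {
        self.max_redirects = n;
        self
    }

    /// Finishes the builder and returns the client.
    pub fn build(self) -> Client {
        Client {
            max_redirects: self.max_redirects,
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct Client {
    pub max_redirects: usize,
}

/// A call site inside the library's own tests: `ClientBuilder` resolves here.
pub fn client_for_test() -> Client {
    mock_client!(3)
}
```

A plain function with the same body would need `ClientBuilder` in scope where the function is defined, which is what would force the helper crate to depend on the library.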

Member

@mre mre left a comment

Great! 👌

@mre mre merged commit 83565bb into lycheeverse:master Sep 11, 2025
6 checks passed
@mre mre mentioned this pull request Sep 11, 2025
This was referenced Oct 21, 2025
Development

Successfully merging this pull request may close these issues.

Show redirected URLs with --suggest

2 participants