
Conversation

@provokateurin (Member) commented Nov 15, 2024

Summary

On my personal instance I'm currently facing the problem that a folder shared from another instance is no longer available because that server went down. Unfortunately the remote server doesn't terminate the connection immediately; the request is only stopped by the client timeout.
This makes literally every request to my own Nextcloud server 10 seconds longer, as the timeout is hit every time.

Decreasing the timeout would help a little, but it could lead to unintended consequences in other scenarios, so it isn't a good way to work around the problem.
When the remote server has a problem we should cache that result as well, but for a much shorter duration (e.g. 1h), just to prevent every request from trying to contact the remote server again.
This way we still check once in a while whether the server is back, but no longer on every request that uses the Filesystem, which was slowing everything down dramatically.
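A minimal sketch of the idea (illustrative Python; the actual change is in the PHP OCM discovery service, and the names, the `fetch` callable and the TTL values below are assumptions, not the real implementation):

```python
import time

# Illustrative TTLs; the real values live in the Nextcloud OCM discovery service (PHP).
SUCCESS_TTL = 24 * 60 * 60   # successful discovery results stay cached for a long time
ERROR_TTL = 60 * 60          # failures are cached much shorter, e.g. 1h

_cache: dict[str, tuple[float, object]] = {}

def discover(remote: str, fetch) -> object:
    """Return cached discovery data for `remote`; only contact the remote when
    the cached entry (success or failure) has expired."""
    now = time.time()
    entry = _cache.get(remote)
    if entry is not None and entry[0] > now:
        result = entry[1]
        if isinstance(result, Exception):
            raise result        # fail fast from cache instead of waiting for the timeout
        return result
    try:
        result = fetch(remote)  # network call, e.g. the OCM discovery request
        _cache[remote] = (now + SUCCESS_TTL, result)
        return result
    except Exception as error:
        # Remember the failure so subsequent requests don't each hang for the
        # 10 second client timeout against the dead remote.
        _cache[remote] = (now + ERROR_TTL, error)
        raise
```

The key point is that a cached failure answers immediately instead of hanging for the full client timeout, while the much shorter error TTL keeps the remote from being treated as dead forever.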

Checklist

@provokateurin added the bug and 3. to review (Waiting for reviews) labels on Nov 15, 2024
@provokateurin added this to the Nextcloud 31 milestone on Nov 15, 2024
@provokateurin requested review from a team, ArtificialOwl, nfebe and sorbaugh and removed request for a team on November 15, 2024 13:27
@provokateurin (Member, Author)

/backport to stable30

@provokateurin (Member, Author)

/backport to stable29

@provokateurin (Member, Author)

/backport to stable28

@ArtificialOwl (Member)

Not a huge fan of caching for so long; it might be interesting (and not too much overkill) to have a background process that checks the status of faulty remote instances on its own and resets the faulty cache entries on success.
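Roughly like this sketch (illustrative Python, reusing the `_cache` from the sketch in the description; `probe` is a hypothetical stand-in for a cheap request with a short timeout, not a real Nextcloud API):

```python
def recheck_faulty_remotes(probe) -> None:
    """Background-job sketch: run periodically, probe remotes whose discovery is
    cached as failed, and drop the error entry as soon as they respond again."""
    for remote, (_expiry, result) in list(_cache.items()):
        if not isinstance(result, Exception):
            continue             # only re-check remotes cached as faulty
        try:
            probe(remote)        # lightweight request with a short timeout
        except Exception:
            continue             # still down, keep the cached error
        del _cache[remote]       # recovered: the next request re-discovers normally
```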

@provokateurin (Member, Author)

I can also reduce the error caching to 5m or even lower, but we need to prevent this lookup from happening on every request, which can lead to this kind of self-DoS.

@provokateurin force-pushed the fix/ocmdiscoveryservice/cache-errors branch from 746094e to cc8e69c on November 25, 2024 09:29
@skjnldsv (Member)

I now experienced the opposite: while working in that area, one of my instances was pending an update and therefore returning a 503.
I couldn't understand why this was still throwing even after I had finished the update.

I think some errors should be excluded from caching.

@provokateurin (Member, Author)

True, maybe it could be more precise. But the previous behavior was also completely unacceptable for end users, with the entire instance being extremely slow on every request.

@skjnldsv mentioned this pull request on Jan 7, 2025