[lakeFSFS] Add retries and configurable timeouts to lakeFS API calls #5664

@arielshaqed

Add retries and configurable timeouts to lakeFS API calls in lakeFSFS, as we did in the Spark metadata client for GC. If an API call times out and the exception leaks, it can be really expensive on Spark! First the entire job is retried, which can cause partitions to be recomputed. And if it times out enough times, the entire job is aborted and pretty much all work is lost. Note that when lakeFS is under load this gets worse with more partitions rather than better :-/
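
To make the request concrete, here is a minimal sketch of the kind of retry wrapper this could use. It is not the existing lakeFSFS or Spark metadata client code: `RetryingCaller` and its parameters (`maxAttempts`, `initialBackoff`, `maxBackoff`) are hypothetical names, and treating every `IOException` (which includes `SocketTimeoutException`) as retryable is an assumption. The real client may surface timeouts through a different exception type. Per-call timeouts would be configured on the underlying HTTP client and are not shown here.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of lakeFSFS: retries a call with
// capped exponential backoff and random jitter between attempts.
final class RetryingCaller {
    private final int maxAttempts;
    private final long initialBackoffMs;
    private final long maxBackoffMs;

    RetryingCaller(int maxAttempts, Duration initialBackoff, Duration maxBackoff) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        if (initialBackoff.isNegative() || maxBackoff.isNegative()) {
            throw new IllegalArgumentException("backoff durations must be non-negative");
        }
        this.maxAttempts = maxAttempts;
        this.maxBackoffMs = maxBackoff.toMillis();
        this.initialBackoffMs = Math.min(initialBackoff.toMillis(), this.maxBackoffMs);
    }

    // Runs apiCall, retrying on IOException (assumed transient) up to maxAttempts times.
    // Any other exception, or the last IOException, propagates to the caller.
    <T> T call(Callable<T> apiCall) throws Exception {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return apiCall.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the failure surface
                }
                // Jitter spreads out retries so many Spark tasks do not
                // hit an already loaded server at the same instant.
                long sleepMs = ThreadLocalRandom.current().nextLong(backoffMs + 1);
                Thread.sleep(sleepMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```

A caller would wrap each API request, e.g. `new RetryingCaller(5, Duration.ofMillis(200), Duration.ofSeconds(10)).call(() -> someApiRequest())`, with the three values read from configuration; `someApiRequest` is a placeholder. The jitter matters for the load case above: with many partitions retrying at once, fixed delays would just resend the same burst.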
