[graveler] A single failed range and metarange operation can fail a commit or merge

[Observed by a customer]

## What happened

A [DNS error](https://isitdns.com/) on a major cloud provider caused a large commit at a user site to fail.  The failed operation could have been retried, and the retry would have succeeded.

The user received the error below.  A human can deduce that an unresponsive DNS server (UDP port 53) caused an operation to fail.  A machine cannot, however, and has no idea what to do with an Internal Server Error (ISE).

```
[2023-10-99 99:99:99.999] {foo.py} ERROR - Internal Server Error: {"message":"commit: close writer ns=s3://AWS-BUCKET/ metarange id=fff555: failed closing metarange writer: sstable store (fff555): adapter put s3://AWS-BUCKET/ fff555: Put \"https://AWS-BUCKET.s3.amazonaws.com/_lakefs/fff555\": dial tcp: lookup AWS-BUCKET.s3.amazonaws.com on 172.31.99.99:53: read udp 10.0.99.99:48835-\u003e172.31.99.99:53: i/o timeout"}
```

## What we should do

We should either:
* Retry operations; or
* Give sufficient information in the API response to allow the caller to retry the operation.
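
To make the first option concrete, here is a minimal sketch of a bounded retry with exponential backoff around a single failing operation, such as one metarange write. It is written in Python for brevity and is not lakeFS code; `TransientError` and `retry_with_backoff` are hypothetical names, and deciding which failures count as transient is assumed to happen elsewhere.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a DNS timeout)."""


def retry_with_backoff(op, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call op() until it succeeds or `attempts` calls have failed.

    Only TransientError is retried; any other exception propagates at once.
    The delay doubles after each failed attempt: base_delay, 2*base_delay, ...
    """
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `attempts` and `base_delay` defaults are arbitrary. Picking them is exactly the "how hard to try" question that a server cannot answer well on its own.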

## Preferences

I prefer adding information to the API response: an optional response field (JSON body or header), as well as a readable message, that user code can use to trigger a retry.  As a server, lakeFS has no idea how hard it should try vs. how quickly it should fail; that trade-off belongs to the caller.  By passing the problem on to the user, we give them more information.  They can then handle it automatically, by wrapping expensive commit operations in a backoff, or manually, by reading the message and deciding whether the operation is worth retrying.
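
As a rough illustration of the automatic case, user code could look something like the sketch below. The `retryable` field in the JSON error body is an assumption made for this example; lakeFS does not return it today, and the final shape (body field or header) is still open. `is_retryable`, `commit_with_backoff`, and `do_commit` are hypothetical names; `do_commit` stands in for whatever call performs the commit and returns an HTTP status and response body.

```python
import json
import time


def is_retryable(status, body):
    """Return True if the error response says a retry may help.

    Assumes a hypothetical optional boolean field "retryable" in the JSON
    error body; this is a proposal, not an existing lakeFS field.
    """
    if status < 500:
        return False  # client errors are not expected to succeed on retry
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # unparseable body: no hint, so do not retry
    return isinstance(payload, dict) and payload.get("retryable") is True


def commit_with_backoff(do_commit, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call do_commit() -> (status, body), retrying only flagged failures."""
    for attempt in range(attempts):
        status, body = do_commit()
        if status < 400 or not is_retryable(status, body):
            return status, body  # success, or a failure not worth retrying
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay
    return status, body  # still failing after the last attempt
```

Without such a field, the only alternative for a caller is to pattern-match on the message text (for example `i/o timeout`), which is brittle.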


[graveler] A single failed range and metarange operation can fail a commit or merge #6766
