Refactor replicate execution method to not iterate the origin bucket#2906
Refactor replicate execution method to not iterate the origin bucket#2906bwplotka merged 3 commits intothanos-io:masterfrom
Conversation
GiedriusS
left a comment
There was a problem hiding this comment.
Two comments but overall LGTM!
| metas, partials, err := rs.fetcher.Fetch(ctx) | ||
| if err != nil && metas == nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Perhaps we should log the error if metas != nil && err != nil?
There was a problem hiding this comment.
It is good to log the error here, but kind of duplicate.
I just check the code, only here returns metas and error at the same time. In this case, partials != nil, so we will log the partial meta errors later.
There was a problem hiding this comment.
hm the if err != nil && metas == nil { feels like code smell to me though. It just looks wrong from outside which can lead to bugs in future:
- Inside implementation can change, and metas can be produced without partial for some reason - then we will miss an error.
- Even if we say it makes sense, maybe another person will come here and refactor it anyway as it looks like error can be missed.
On other hand, shouldn't replicate FAIL on any error spotted? Even if we failed to read one block? Sometimes we heavily rely on replication status so this has to be heavily tested and controlled, ideally with failed metric being somewhere (:
|
CI failure is unrelated. Please take a look at this pr again |
eb9fe91 to
2d6d058
Compare
GiedriusS
left a comment
There was a problem hiding this comment.
AFAICT the CI failure seems related to the changes
af3a2a8 to
7eafd1c
Compare
Thanks, the previous problem was caused by the |
7eafd1c to
e5f0737
Compare
e5f0737 to
4b15ae5
Compare
|
Can I get a review for this? I think it is close. |
4b15ae5 to
f89065a
Compare
f89065a to
8d2e154
Compare
8d2e154 to
a75bef5
Compare
bwplotka
left a comment
There was a problem hiding this comment.
Nice, some comments though, before merging (:
| metas, partials, err := rs.fetcher.Fetch(ctx) | ||
| if err != nil && metas == nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
hm the if err != nil && metas == nil { feels like code smell to me though. It just looks wrong from outside which can lead to bugs in future:
- Inside implementation can change, and metas can be produced without partial for some reason - then we will miss an error.
- Even if we say it makes sense, maybe another person will come here and refactor it anyway as it looks like error can be missed.
On other hand, shouldn't replicate FAIL on any error spotted? Even if we failed to read one block? Sometimes we heavily rely on replication status so this has to be heavily tested and controlled, ideally with failed metric being somewhere (:
pkg/replicate/scheme.go
Outdated
| return nil | ||
| } | ||
| for id, partialError := range partials { | ||
| level.Info(rs.logger).Log("msg", "failed to fetch block meta. Skipping.", "block_uuid", id.String(), "err", partialError) |
There was a problem hiding this comment.
Is this failure? This is actually valid state of blocks being potentially uploaded, no?
There was a problem hiding this comment.
Something like level.Info(rs.logger).Log("msg", "block meta not uploaded yet. Skipping.", "block_uuid", id.String()) r
There was a problem hiding this comment.
PartialError has 2 cases, one is meta.json not found, another one is meta.json corrupted. What about the corrupted case, is it an error or not?
For different obj store implementation, is it possible that one uploaded file is corrupted? If it is this case, how can we differentiate the case that one file is totally corrupted or still being uploaded?
There was a problem hiding this comment.
corrupted case, is it an error or not?
For current flow it's not error. We consider corrupted meta same as no meta.json as the reason is usually same - partial upload (:
ddc233c to
c8140be
Compare
bwplotka
left a comment
There was a problem hiding this comment.
Responded to comment, that still needs to be addressed, otherwise LGTM!
Thanks!
pkg/replicate/scheme.go
Outdated
| return nil | ||
| } | ||
| for id, partialError := range partials { | ||
| level.Info(rs.logger).Log("msg", "failed to fetch block meta. Skipping.", "block_uuid", id.String(), "err", partialError) |
There was a problem hiding this comment.
corrupted case, is it an error or not?
For current flow it's not error. We consider corrupted meta same as no meta.json as the reason is usually same - partial upload (:
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
c8140be to
f1079be
Compare
bwplotka
left a comment
There was a problem hiding this comment.
Thanks!
Changelog entry could be better, but we can fix it later.
|
|
||
| ### Changed | ||
|
|
||
| - [#2906](https://github.com/thanos-io/thanos/pull/2906) Tools: Refactor Bucket replicate execution. Removed all `thanos_replicate_origin_.*` metrics. |
There was a problem hiding this comment.
Technically not need to mention refactor, just metric changes (:
|
Whoop 🥳 🎊 I'll work on #2979. |
|
Nice work @yeya24 |
|
💪 |
…hanos-io#2906) * Refactor replicate execution Signed-off-by: Ben Ye <yb532204897@gmail.com> * return error when fetcher gets an error Signed-off-by: Ben Ye <yb532204897@gmail.com> * address feedback Signed-off-by: Ben Ye <yb532204897@gmail.com> Signed-off-by: Oghenebrume50 <raphlbrume@gmail.com>

Signed-off-by: Ben Ye yb532204897@gmail.com
Changes
In the previous implementation, we are iterating the origin bucket, and each time, we use metaFetcher to fetch all meta files. But this seems not necessary, we can just fetch meta files once at the beginning.
Verification