Skip to content

Conversation

@MichaHoffmann
Copy link
Contributor

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Consider the hash of the full chunk in deduplication

Verification

Added unittest case

Supposed to fix #6661

@MichaHoffmann MichaHoffmann force-pushed the mhoffm-fix-dedup-heap-multiple-chunks branch 4 times, most recently from fd27b6b to 7d324ef Compare August 26, 2023 19:03
@MichaHoffmann MichaHoffmann force-pushed the mhoffm-fix-dedup-heap-multiple-chunks branch from afc144f to 4eda600 Compare August 26, 2023 19:24
@MichaHoffmann
Copy link
Contributor Author

The e2e tests work for me locally 🤔

}
_, _ = hasher.Write(buf)
}
aggrChunkHash := hasher.Sum64()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I get it right, that this actually calculates a hash of hashes of all the fields?
I wonder how it differs from calculating the hash of all the fields data in terms of possible hash collisions...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its the hash of field hashes ( mainly to reuse the field.Hash ).

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix itself LGTM but now that we know about this bug, the Hash field in *storepb.Chunk becomes kind of useless. Perhaps we could add a Hash field to storepb.AggrChunk in a separate PR before merging this? That hash field would encompass of the different fields. It's needed so that the querier wouldn't have to hash everything again 😄

@MichaHoffmann
Copy link
Contributor Author

The fix itself LGTM but now that we know about this bug, the Hash field in *storepb.Chunk becomes kind of useless. Perhaps we could add a Hash field to storepb.AggrChunk in a separate PR before merging this? That hash field would encompass of the different fields. It's needed so that the querier wouldn't have to hash everything again 😄

Storing hash in AggrChunk sounds correct! for backwards compatibility we need to do it in the querier for a bit right? I worry about the failing e2e test in CI that pass for me locally. Do you have an idea why that could be ?

@GiedriusS
Copy link
Member

The fix itself LGTM but now that we know about this bug, the Hash field in *storepb.Chunk becomes kind of useless. Perhaps we could add a Hash field to storepb.AggrChunk in a separate PR before merging this? That hash field would encompass of the different fields. It's needed so that the querier wouldn't have to hash everything again 😄

Storing hash in AggrChunk sounds correct! for backwards compatibility we need to do it in the querier for a bit right? I worry about the failing e2e test in CI that pass for me locally. Do you have an idea why that could be ?

I'll try to take a look but just adding a new Hash field shouldn't trigger them so I think we can do that change safely as it looks like we will need it either way

@MichaHoffmann MichaHoffmann force-pushed the mhoffm-fix-dedup-heap-multiple-chunks branch from 6d6c81d to 4d50e46 Compare August 28, 2023 18:30
@MichaHoffmann
Copy link
Contributor Author

Note: e2e tests worked for me previously because i didnt rebuild the docker image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

store: downsampled chunk deduplication bug in proxy_heap.go?

4 participants