v1.18: Add in metrics for detecting Redundant Pulls (backport of #199) #251
Problem
We previously added a metric for tracking gossip push messages through the network in PR #32725. However, that metric does not account for Redundant Pulls.
Redundant Pull: A node receives a message via `PullResponse` and then receives the same message via `Push`.

Redundant Pulls prevent us from accurately calculating how well messages are propagating via `Push`.

Summary of Changes
- Add a metric to report when we receive a `Push` for a message we already (and first) received via `PullResponse`.
- `num_push_dups` changed to `num_push_recv` and now tracks the number of times we have received a push message.
- Add a metric to `cluster_info_crds_stats` called `num_redundant_pull_responses`. It counts the number of times a unique message is received via a Redundant Pull.

Identifying redundant Pulls:
- On a `PullRequest` that successfully updates `crds.table`, set the `num_push_recv` of this message to 0. Set `num_push_recv` to 1 if it is a `PushMessage`.
- If the same message is later received via `Push`, it will fail to insert. Since the already existing entry has `num_push_recv == 0`, we know this is a Redundant Pull.
- Increment `CrdsStats.num_redundant_pull_responses`.

Calculating fraction of messages received via Redundant Pull:
Monogon Simulation: % of redundant pulls in a 100 validator cluster
We see a mean Redundant Pull percentage of ~0.2%, though it does seem to increase with the number of validators.
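
The steps listed under "Identifying redundant Pulls" can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `Table`, `Entry`, `Route`, `Stats`, and the `insert` signature are hypothetical stand-ins for the real CRDS types, and "same message" is approximated here as the same label with the same wallclock. Only the names `num_push_recv` and `num_redundant_pull_responses` come from the description above.

```rust
use std::collections::HashMap;

/// How a gossip value reached this node (hypothetical enum for this sketch).
#[derive(Clone, Copy, PartialEq, Eq)]
enum Route {
    PullResponse,
    Push,
}

/// Hypothetical table entry holding only the fields this sketch needs.
struct Entry {
    wallclock: u64,
    /// 0 if the value was first received via pull; otherwise the number of
    /// times it has been received via push.
    num_push_recv: u8,
}

#[derive(Default)]
struct Stats {
    num_redundant_pull_responses: u64,
}

#[derive(Default)]
struct Table {
    entries: HashMap<String, Entry>,
    stats: Stats,
}

impl Table {
    /// Returns true if the value updated the table, false if the insert failed
    /// because an equal or newer value is already stored.
    fn insert(&mut self, label: &str, wallclock: u64, route: Route) -> bool {
        if let Some(entry) = self.entries.get_mut(label) {
            if entry.wallclock >= wallclock {
                // Failed insert. Assumption: equal wallclock means "same message".
                if route == Route::Push && entry.wallclock == wallclock {
                    if entry.num_push_recv == 0 {
                        // First arrived via pull, now seen via push: Redundant Pull.
                        self.stats.num_redundant_pull_responses += 1;
                    }
                    entry.num_push_recv = entry.num_push_recv.saturating_add(1);
                }
                return false;
            }
        }
        // Successful update: record which route delivered the value first.
        let num_push_recv = match route {
            Route::PullResponse => 0,
            Route::Push => 1,
        };
        self.entries
            .insert(label.to_string(), Entry { wallclock, num_push_recv });
        true
    }
}
```

Because `num_push_recv` becomes non-zero after the first push, a given message is counted as a Redundant Pull at most once in this sketch, which matches the "unique message" wording above.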
