Fix light client synchronization on master #3301
(will reopen once tests are fixed)
After discussing this elsewhere, we decided to go with the following approach of changing the Epoch struct:
struct Epoch {
    /// ...
    /// The authorities for this Epoch
    pub authorities: Vec<AuthorityId>,
    /// The weights for authorities of previous Epoch (which should be starting when this one is announced).
    pub weights: Vec<BabeAuthorityWeight>,
}
/// Finalize a node in the tree and all its ancestors. The given function
/// `is_descendent_of` should return `true` if the second hash (target) is
/// a descendent of the first hash (base).
pub fn finalize_with_ancestors<F, E>(
I think this is very similar to what we do in the prune method.
Hmm. If I call prune(), for example, for block N in the test storage, it leaves the M node. If it is called for H, the H node is left in the storage. This isn't what is expected from finalize_with_ancestors(). It could probably be refactored to support both cases (or we could call prune() from within finalize_with_ancestors()), but I haven't found a straightforward way to do that.
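For readers following the thread, the `is_descendent_of(base, target)` contract from the doc comment above can be illustrated with a minimal sketch. This is not the fork-tree code from the PR: the `u32` hashes, the child-to-parent `HashMap`, and the choice that a block is not a descendent of itself are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns `true` if `target` is a descendent of `base`.
/// `parents` maps a block hash to its parent hash (simplified to `u32`).
/// Assumption: a block is not considered a descendent of itself.
pub fn is_descendent_of(parents: &HashMap<u32, u32>, base: u32, target: u32) -> bool {
    let mut current = target;
    // Walk from `target` towards the root, looking for `base` on the way.
    while let Some(&parent) = parents.get(&current) {
        if parent == base {
            return true;
        }
        current = parent;
    }
    false
}
```

Note the argument order: the first argument is the base and the second is the target, matching the doc comment.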
So after another discussion, the original patch (which stores the authorities for the next session in storage) has been applied.
andresilva
left a comment
lgtm
I believe I cleaned up all redundant changes. The only suspicious change is about preventing double
andresilva
left a comment
LGTM!
depends on #3413
closes #3289 (please ignore the comments there, because it is fixed in another, more appropriate way)
WARNING: it won't sync with FF, because its genesis block is built in a way that is incompatible with this implementation. If we need a testnet that supports light clients, let's restart FF.
So it turned out that the validators we're using at session0 and session1 are always the same, so we could use the same set of BABE authorities in both epoch#0 and epoch#1 (thanks to @andresilva for mentioning that). The fix is simply to use the same data (except for the epoch index) for the first two epochs. But there have been several issues and controversial decisions (in my impl) that I wanted to mention:
1. … `[Authority0, Authority1]`, but at epoch1 it is `[Authority1, Authority0]`. That is because the session module mutates the validators passed using config here. Since this mutation could also mutate storage, like here, it is impossible to use it twice to reproduce the same mutation in the babe module (again, thanks to Andre for mentioning that). The decision is to use temporary in-memory storage as a way to pass the mutated set of authorities from the session module to the babe module. This leads to three other issues that I've run into (2, 3, 4);
2. … `on_initialize()` calls. And since the session module calls BABE's `on_initialize()` from its own `on_initialize()` AND session initialization now happens before BABE => BABE was determining epoch boundaries incorrectly. The solution was to explicitly call BABE's `on_initialize()` from `should_end_session()`. So now it could be called twice => I've added an additional check so that it won't mutate the same data several times;
3. … `OpaqueKeys` associated type to the babe module to be able to decode the mutated set of session validators at the genesis block. The problem is that session doesn't know who's going to use that mutated set => it should encode all the available keys;
4. … `MaybeSpanEpoch` instead of simply `Epoch` in the consensus cache. There's a special `Genesis` entry in this enum that holds data for two epochs (epoch0 and epoch1). When we're dealing with this entry AND header verification using epoch0 has failed, we retry with epoch1. This way we allow verification of epoch0 headers on light clients;
5. … `::AUTHORITIES` cache from Babe, because it seems unnecessary. It was only used to inform Grandpa that it should emit/expect a justification for the imported block. But imo a Babe epoch change isn't that different from an Aura authorities change => Grandpa will decide whether a justification is required for the block by checking if anything has been inserted into the consensus cache (in Aura it'll be the new authorities set, in Babe the new epoch data).
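To make the epoch0/epoch1 retry from item 4 easier to picture, here is a minimal sketch of the idea. It is not the code from this PR: the simplified `Epoch` fields, the `Regular` variant name, the `genesis` constructor, and the `verify_with_retry` helper are all hypothetical, and header verification is reduced to a caller-supplied closure.

```rust
/// Hypothetical, heavily simplified stand-in for BABE epoch data.
#[derive(Clone, Debug, PartialEq)]
pub struct Epoch {
    pub epoch_index: u64,
    pub authorities: Vec<u32>, // authority ids reduced to plain integers
}

/// Cache entry: either a regular epoch, or the special genesis entry
/// that holds data for both epoch0 and epoch1.
/// (`Regular` is an assumed name; only `Genesis` is mentioned in the PR.)
#[derive(Clone, Debug, PartialEq)]
pub enum MaybeSpanEpoch {
    Genesis(Epoch, Epoch),
    Regular(Epoch),
}

impl MaybeSpanEpoch {
    /// Build the genesis entry: epoch1 reuses epoch0's data,
    /// except for the epoch index.
    pub fn genesis(epoch0: Epoch) -> Self {
        let mut epoch1 = epoch0.clone();
        epoch1.epoch_index = epoch0.epoch_index + 1;
        MaybeSpanEpoch::Genesis(epoch0, epoch1)
    }
}

/// Run `check` against the entry. For the genesis entry, try epoch0 first
/// and retry with epoch1 if that fails. Returns the index of the epoch
/// that verified the header.
pub fn verify_with_retry<F>(entry: &MaybeSpanEpoch, check: F) -> Result<u64, String>
where
    F: Fn(&Epoch) -> Result<(), String>,
{
    match entry {
        MaybeSpanEpoch::Regular(epoch) => check(epoch).map(|_| epoch.epoch_index),
        MaybeSpanEpoch::Genesis(epoch0, epoch1) => check(epoch0)
            .map(|_| epoch0.epoch_index)
            .or_else(|_| check(epoch1).map(|_| epoch1.epoch_index)),
    }
}
```

In this sketch, a header that belongs to epoch1 fails the first check against epoch0 and succeeds on the retry, while a header that matches neither epoch still yields the error from the second attempt.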