Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@svyatonik
Copy link
Contributor

@svyatonik svyatonik commented Aug 5, 2019

depends on #3413

closes #3289 (please ignore comments there, because it is fixed in other, more appropriate way)
WARNING: it won't sync with FF, because it's genesis block is build in the incompatible (with this implementation) way. If we need a testnet that supports light clients, let's restart FF.

So it turned out that validators we're using at session0 and session1 are always the same. And we could use the same set of BABE authorities in both epoch#0 and epoch#1 (thanks to @andresilva for mentioning that). So the fix is simply to use the same data (except for epoch index) for first two epochs. But then there have been several issues and controversial decisions (in my impl) that I wanted to mention:

  1. in current implementation the set of authorities at epoch0 and epoch1 is the same, but order differs. I.e. at epoch0 it is [Authority0, Authority1], but at epoch1 it is [Authority1, Authority0]. That is because session module mutates validators passed using config here. Since this mutation could also mutate storage like here, it is impossible to use it twice to reproduce the same mutation in the babe module (again - thanks to Andre for mentioning that). The decision is to use temporary in-memory storage as a way to pass mutated set of authorities from session module to babe module. This leads to three other issues that I've ran into (2, 3, 4);
  2. in current implementation babe module is initialized before session module => we haven't yet computed mutated set when babe is initialized. The solution is to change order of initialization here. But then we have another issue (3);
  3. the order of modules initialization also determines the order of their on_initialize() calls. And since session module from its on_initialize() calls BABE' on_initialize() AND session initialization now happens before BABE => BABE was incorrectly determined epoch boundaries. The solution was to explicitly call BABE' on_initialize() from should_end_session(). So now it could be called twice => I've added additional check so that it won't mutate the same data several times;
  4. I had to add OpaqueKeys associated type to the babe module to be able to decode mutated set of session validators at the genesis block. The problem is that session doesn't know who's going to use that mutated set => it should encode all the available keys;
  5. core babe code now holds new enum MaybeSpanEpoch instead of simply Epoch in the consensus cache. There's special Genesis entry in this enum that holds data for two epochs (epoch0 and epoch1). When we're dealing with this entry AND we have failed header verification using epoch0, then we retry with epoch1. This way we allow epoch0 headers verification on light clients;
  6. I've also removed insertion to ::AUTHORITIES cache from Babe, because it seems unnecessary. It was only used to inform Grandpa that it should emit/expect justification for imported block. But imo Babe epoch change isn't that different to Aura authorities change => Grandpa will consider if justification required for the block by checking if anything has been inserted into consensus cache (in Aura it'll be new authorities set, in Babe - new epoch data).

@svyatonik svyatonik added A0-please_review Pull request needs code review. M4-core labels Aug 5, 2019
@svyatonik svyatonik added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Aug 5, 2019
@svyatonik
Copy link
Contributor Author

(will reopen once tests are fixed)

@svyatonik svyatonik closed this Aug 5, 2019
@svyatonik svyatonik reopened this Aug 5, 2019
@svyatonik svyatonik removed the A3-in_progress Pull request is in progress. No review needed at this stage. label Aug 5, 2019
@andresilva
Copy link
Contributor

After discussing this elsewhere we decided to go with the following approach of changing the Epoch digest:

struct Epoch {
	/// ...

	/// The authorities for this Epoch
	pub authorities: Vec<AuthorityId>,
	/// The weights for authorities of previous Epoch (which should be starting when this one is announced).
	pub weights: Vec<BabeAuthorityWeight>,
}

/// Finalize a node in the tree and all its ancestors. The given function
/// `is_descendent_of` should return `true` if the second hash (target) is
// a descendent of the first hash (base).
pub fn finalize_with_ancestors<F, E>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very similar to what we do in prune method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. If I call prune(), for example, for block N in test storage, it leaves M node. If called for H, the H node is left in the storage. This isn't what expected from finalize_with_ancestors(). Probably it could be refactored to support both cases (or we could call prune() from within finalize_with_ancestors()), but I haven't found a straightforward way to do that.

@svyatonik
Copy link
Contributor Author

So after another discussion, original patch (which stores authorities for the next session in the storage) has been applied.

@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A4-gotissues labels Aug 15, 2019
Copy link
Contributor

@andresilva andresilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@svyatonik svyatonik added A1-onice and removed A0-please_review Pull request needs code review. labels Aug 16, 2019
@svyatonik
Copy link
Contributor Author

I believe I cleaned up all redundant changes. The only suspicious change is about preventing double on_intialize() in session module, but it'll cleaned in #3380 anyway.

@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A1-onice labels Aug 16, 2019
Copy link
Contributor

@andresilva andresilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@gavofyork gavofyork merged commit 20ae487 into master Aug 16, 2019
@gavofyork gavofyork deleted the fix_flaming_fir_light_sync branch August 16, 2019 09:51
@gnunicorn gnunicorn modified the milestones: 2.1, Polkadot Mar 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Light client fails to sync flaming fir

8 participants