Fix OOM error in large parentchain syncs with sidechain feature. #1493

clangenb · 2023-11-15T02:31:36Z

The OOM error was caused by syncing the whole parentchain before importing any of its blocks into the light client. This PR changes the behaviour so that the parentchain blocks are imported after each chunk in the sync_parentchain_and_import_blocks function. The behaviour of sync_parentchain has not been changed.

This is because sync_parentchain is only used in two situations: by non-primary workers, which only perform small syncs because they receive a light-client db from the primary worker at startup, or in the teeracle/offchain-worker modes, which import the parentchain blocks directly as they are synced.
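The chunked approach can be sketched as follows. This is a minimal, self-contained Rust illustration, not the worker's actual sync_parentchain_and_import_blocks: Block, Importer, RecordingImporter, fetch_chunk and sync_and_import_in_chunks are hypothetical names, and fetching is simulated. The point is that each chunk is handed to the importer before the next one is fetched, so memory use is bounded by the chunk size rather than by the length of the parentchain.

```rust
/// Hypothetical stand-in for a parentchain block; only the number matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub number: u64,
}

/// Hypothetical import interface, standing in for the light-client import.
pub trait Importer {
    fn import_parentchain_blocks(&mut self, blocks: Vec<Block>);
}

/// Test double that records how many blocks it receives per call.
#[derive(Default)]
pub struct RecordingImporter {
    pub calls: usize,
    pub total: usize,
    pub max_chunk: usize,
}

impl Importer for RecordingImporter {
    fn import_parentchain_blocks(&mut self, blocks: Vec<Block>) {
        self.calls += 1;
        self.total += blocks.len();
        self.max_chunk = self.max_chunk.max(blocks.len());
    }
}

/// Hypothetical fetcher: "downloads" blocks `from..=min(to, from + chunk_size - 1)`.
fn fetch_chunk(from: u64, to: u64, chunk_size: u64) -> Vec<Block> {
    let end = to.min(from + chunk_size - 1);
    (from..=end).map(|number| Block { number }).collect()
}

/// Syncs blocks `from..=to`, importing each chunk before the next one is fetched,
/// so at most `chunk_size` blocks are held by this loop at any time.
pub fn sync_and_import_in_chunks<I: Importer>(
    importer: &mut I,
    mut from: u64,
    to: u64,
    chunk_size: u64,
) -> u64 {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut imported = 0;
    while from <= to {
        let chunk = fetch_chunk(from, to, chunk_size);
        from += chunk.len() as u64;
        imported += chunk.len() as u64;
        // Ownership moves to the importer, so synced blocks never accumulate here.
        importer.import_parentchain_blocks(chunk);
    }
    imported
}
```

With a chunk size of 1000, syncing blocks 1 to 2500 results in three import calls, none of which receives more than 1000 blocks.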

However, fixing this naively led to another suboptimal behaviour: during the initial sync of the parentchain, we always put 1000 blocks and their corresponding events into the queue, only for them to be popped immediately afterwards by a subsequent FFI ecall. This was so inefficient that syncing became very slow. Hence, I introduced an is_syncing flag into the sync_parentchain ecall, which bypasses the queue and imports the blocks directly. This also made the trigger_parentchain_block_import ecall obsolete, so I removed it.
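A minimal sketch of the difference the flag makes, assuming blocks can be identified by their number and that queued blocks are drained later by some trigger, modelled here by a hypothetical import_queued method. Dispatcher and its methods are illustrative names only; the real dispatcher is not shown in this PR description.

```rust
use std::collections::VecDeque;

/// Hypothetical dispatcher: queues blocks unless told to import them immediately.
#[derive(Default)]
pub struct Dispatcher {
    /// Block numbers waiting for a later, triggered import.
    queue: VecDeque<u64>,
    /// Block numbers already imported into the (simulated) light client.
    imported: Vec<u64>,
}

impl Dispatcher {
    pub fn dispatch(&mut self, blocks: Vec<u64>, is_syncing: bool) {
        if is_syncing {
            // Initial sync: bypass the queue and import right away.
            self.imported.extend(blocks);
        } else {
            // Normal operation: keep the blocks until an import is triggered.
            self.queue.extend(blocks);
        }
    }

    /// Stand-in for the later trigger that drains the queue.
    pub fn import_queued(&mut self) {
        self.imported.extend(self.queue.drain(..));
    }

    pub fn queued(&self) -> usize {
        self.queue.len()
    }

    pub fn imported(&self) -> &[u64] {
        &self.imported
    }
}
```

During the initial sync nothing is pushed onto the queue, so there is no extra round trip just to pop the same blocks again.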

Closes sidechain oom & preserve db #1462.

This PR also includes a minor fix for the following issue:

Fixes startup error: spam the loop to get parentchain blocks until it has finalized the sync target. #1465
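The idea behind the startup fix can be pictured as a loop that keeps syncing until the light client has reached the sync target, instead of stopping after a single pass while the parentchain is still finalizing blocks. This is a hedged sketch: sync_step, sync_until_target and the iterator of observed finalized heads are hypothetical stand-ins, and a real loop would wait between rounds rather than read from an iterator. Returning None when no further finalized head is observed stands in for giving up.

```rust
/// Hypothetical single sync pass: imports up to `max_chunk` finalized blocks
/// beyond `synced` and returns the new last-imported block number.
pub fn sync_step(synced: u64, finalized: u64, max_chunk: u64) -> u64 {
    if finalized <= synced {
        synced
    } else {
        synced + (finalized - synced).min(max_chunk)
    }
}

/// Repeats sync passes until `target` is reached.
/// `finalized_heads` yields the finalized head observed in each round.
/// Returns the last imported block and the number of rounds, or None if the
/// finalized heads run out before the target is reached.
pub fn sync_until_target<I: Iterator<Item = u64>>(
    mut synced: u64,
    target: u64,
    finalized_heads: &mut I,
    max_chunk: u64,
) -> Option<(u64, usize)> {
    let mut rounds = 0;
    while synced < target {
        let finalized = finalized_heads.next()?;
        synced = sync_step(synced, finalized, max_chunk);
        rounds += 1;
    }
    Some((synced, rounds))
}
```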


brenzi

LGTM

I manually tested our privacy sidechain demo (two rococo-local parachains, one sidechain worker) with this, and everything works.

brenzi · 2023-12-01T10:24:35Z

core/parentchain/block-import-dispatcher/src/triggered_dispatcher.rs

+				parentchain_id
+			);
+			self.block_importer
+				.import_parentchain_blocks(blocks, events)


I believe this simple is_syncing bool flag is not enough.

It needs to be clear to other validateers which parentchain block is the first one that must be considered for indirect invocations (in SCV and OCW mode). Therefore, that "first relevant parentchain block" should be stored in the STF state. Most likely, we should define it as the block in which the primary worker enclave was registered for the first time, because the shard state will only be written to once the primary worker is in sync and starts processing parentchain blocks. Caveat: if the shard ID differs from the MRENCLAVE and the shard ID gets registered much later, we will be importing too many blocks, but I don't think this is critical.
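A minimal sketch of this suggestion, assuming the STF state can be modelled as a key-value map and that the stored value is the block in which the primary worker enclave was first registered. StfState, FIRST_RELEVANT_BLOCK_KEY and must_execute_invocations are hypothetical names. The value is written once, so every validateer reading the shard state agrees on where indirect invocations start.

```rust
use std::collections::HashMap;

/// Hypothetical key under which the first relevant parentchain block is stored.
const FIRST_RELEVANT_BLOCK_KEY: &str = "first_relevant_parentchain_block";

/// Hypothetical STF state, modelled as a simple key-value store.
#[derive(Default)]
pub struct StfState {
    kv: HashMap<&'static str, u64>,
}

impl StfState {
    /// Stores the first relevant block once; later calls keep the original value.
    pub fn init_first_relevant_block(&mut self, block: u64) -> u64 {
        *self.kv.entry(FIRST_RELEVANT_BLOCK_KEY).or_insert(block)
    }

    pub fn first_relevant_block(&self) -> Option<u64> {
        self.kv.get(FIRST_RELEVANT_BLOCK_KEY).copied()
    }
}

/// Indirect invocations are only executed for blocks at or after the stored block.
pub fn must_execute_invocations(state: &StfState, block: u64) -> bool {
    match state.first_relevant_block() {
        Some(first) => block >= first,
        // Assumption: while the value is unset, no block is considered relevant yet.
        None => false,
    }
}
```

Treating an unset value as "nothing is relevant yet" is itself an assumption; it mirrors the point that the shard state is only written once the primary worker is in sync.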

Calling it "is_syncing" is misleading: we're always syncing. I think it should be called "is_initial_sync" or similar.

During the initial sync, the events and indirect invocations shouldn't even be searched for, and they shouldn't be validated using proofs, in order to accelerate the process.
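The suggested fast path could look roughly like this. It is a sketch under stated assumptions: Block, ImportStats and import_blocks are hypothetical, proof validation is reduced to counting events, and indirect invocations are assumed to be recognisable by an "invoke" prefix. The flag is named is_initial_sync as proposed above; ignore_invocations or verify_only would work the same way.

```rust
/// Hypothetical parentchain block carrying its events as plain strings.
pub struct Block {
    pub events: Vec<&'static str>,
}

/// Counters that show which work was actually done.
#[derive(Default, Debug, PartialEq)]
pub struct ImportStats {
    pub headers_imported: usize,
    pub events_verified: usize,
    pub invocations_found: usize,
}

pub fn import_blocks(blocks: &[Block], is_initial_sync: bool) -> ImportStats {
    let mut stats = ImportStats::default();
    for block in blocks {
        // Assumption: the header always goes into the light client.
        stats.headers_imported += 1;
        if is_initial_sync {
            // Skip event proof checks and the search for indirect invocations.
            continue;
        }
        // Stand-in for validating the events against their proofs.
        stats.events_verified += block.events.len();
        // Stand-in for searching the events for indirect invocations.
        stats.invocations_found += block
            .events
            .iter()
            .filter(|e| e.starts_with("invoke"))
            .count();
    }
    stats
}
```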

clangenb · 2023-12-02

I agree with all points. I think there was a reason why it was implemented the way it is now, but I have forgotten it, and I currently don't see why it should stay like that.

Substrate introduced the term is_major_sync, which is true whenever a node is far behind the best block. This concept does not really apply in our case, as we will send the light-client db to an outdated worker. Still, what I am trying to say is that using is_initial_sync keeps us somewhat in line with Substrate jargon.

brenzi · 2023-12-02

Wouldn't it be more descriptive to call the flag ignore_invocations?

clangenb · 2023-12-03

Hmm, yes, if this is the only thing the flag is going to be used for, but I am not 100% sure that it is. Regardless, as long as we keep it like that, I agree. It could also be something like verify_only.

clangenb · 2023-12-03

Ah, we could maybe also use this at some point for non-authoring sidechain RPC nodes, if we ever want to have them.

clangenb added 6 commits November 15, 2023 11:16

improve log levels

986d2d5

[service] sidechain_start_untrusted_rpc_server downgrade tokyo handle arg to a borrow

d84bcd1

[service/main] distinguish better between different setups

24522cd

[service/main] extract duplicate code

c2fac9f

[service/main] fix clippy

bc77a5b

[service/main] fix clippy with teeracle

ee4bf37

clangenb mentioned this pull request Nov 15, 2023

Fix OOM error in large parentchain syncs #1468

Closed

clangenb changed the title ~~Cl/fix sidechain oom~~ Fix OOM error in large parentchain syncs with sidechain feature. Nov 15, 2023

clangenb added A0-core Affects a core part A3-sidechain B1-releasenotes C3-medium 📣 Elevates a release containing this PR to "medium priority" E0-breaksnothing labels Nov 15, 2023

clangenb added 8 commits November 29, 2023 12:19

Merge branch 'master' into cl/fix-sidechain-oom

9c6abe8

# Conflicts: # service/src/parentchain_handler.rs

introduce is_syncing in sync_parentchain

3acdc2e

remove obsolete trigger_parentchain_block ffi

1ab60ee

remove unused import

fb3db5e

fix clippy

9c25d88

fix test compilation

d907646

Merge branch 'master' into cl/fix-sidechain-oom

4d65577

# Conflicts: # core/parentchain/block-import-dispatcher/src/lib.rs # core/parentchain/block-import-dispatcher/src/triggered_dispatcher.rs # service/src/main_impl.rs

[service/main] extract init_proxied_shard_vault

d49ec56

clangenb requested a review from brenzi November 30, 2023 00:21

Merge branch 'master' into cl/fix-sidechain-oom

54f378e

brenzi approved these changes Nov 30, 2023

brenzi merged commit 597c787 into master Nov 30, 2023

brenzi reviewed Dec 1, 2023

clangenb deleted the cl/fix-sidechain-oom branch December 18, 2023 15:20
