Automatic Example Collator #67

coriolinus · 2020-02-21T13:21:57Z

This PR automates the checklist in the README, adding docker files and scripting sufficient to automatically run two relay chain nodes, a collator chain node, extract the genesis state and wasm file, use those files to register the parachain, and verify the collator is continuing to produce new blocks over time.
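As a rough illustration of the final verification step, here is a minimal sketch that compares two block numbers sampled some time apart. It assumes the numbers arrive as hex strings (as in the `number` field of a header returned over the node's JSON-RPC interface); the function name `blocks_advanced` is hypothetical, and the check the PR actually performs may differ.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the second sampled block number is
# strictly greater than the first. Both arguments are assumed to be hex
# strings such as 0x1a.
blocks_advanced() {
    # Shell arithmetic understands the 0x prefix, so this converts hex to decimal.
    local before=$(( $1 ))
    local after=$(( $2 ))
    [ "$after" -gt "$before" ]
}
```

A caller would sample the chain head, sleep for a few block times, sample again, and pass both values to the function.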

Demonstrated to produce blocks, but as of right now there's still trouble getting it to respond to external queries on its ports.

Also shrink the build context by excluding some extraneous data.

Also set default branch appropriately, and have the stop command clean itself up more thoroughly.

…llator

- Exclude the docker/ directory from build context because we're never going to build recursively, and this prevents spurious cache misses
- build the parachain collator in three stages. The build stage is discarded; the collator stage has a wrapper script to simplify generating the right bootnodes flags, and the default stage has just the binary in a small runtime.
- build_collator.sh collects appropriate build flags for the dockerfile
- inject_bootnodes.sh discovers the testnet node IDs and inserts them into the arguments list for cumulus-test-parachain-collator

- Ignore the scripts directory to reduce spurious cache misses.
- Move inject_bootnodes.sh from the scripts directory into the root: It can't stay in the scripts directory, because that's ignored; I didn't want to invent _another_ top-level subdirectory for it. That decision could certainly be appealed, though.
- Move docker-compose.yml, add dc.sh, modify *_collator.sh: by taking docker-compose.yml out of the root directory, we can further reduce cache misses. However, docker-compose normally has a strong expectation that docker-compose.yml exist in the project root; it takes a moderately complicated invocation to override that expectation. That override is encoded in dc.sh; the updates to the other scripts are just to use the override.

The expectation as of now is that scripts/run_collator.sh runs both chain nodes and the collator, generates the genesis state into a volume with a transient container, and runs the collator as specified in the repo README.

Upcoming work: Steps 5 and 6 from the readme.
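The dc.sh override described above might look roughly like the sketch below. It assumes the compose file lives at docker/docker-compose.yml and that the wrapper is invoked from the repository root. `-f` and `--project-directory` are standard docker-compose options; the `DC_BIN` variable and the function name `dc` are hypothetical, added only so the command can be swapped out for a dry run. The real dc.sh may be written differently.

```sh
#!/usr/bin/env bash
# Hypothetical wrapper: run docker-compose against a compose file that does
# not live in the project root.
DC_BIN="${DC_BIN:-docker-compose}"   # assumption: overridable for dry runs

dc() {
    # -f selects the compose file; --project-directory keeps relative paths
    # resolving against the repository root rather than docker/.
    "$DC_BIN" -f docker/docker-compose.yml --project-directory . "$@"
}
```

With this in place, `dc up -d` would stand in for a plain `docker-compose up -d` run from the root.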

The biggest change here is adding the testing_net network to the collator node's networks list. This lets it successfully connect to the alice and bob nodes, which in turn lets it get their node IDs, which was the blocker for a long time.

Remove httpie in favor of curl: makes for a smaller docker image, and has fewer weird failure modes within docker.

Unfortunately this doesn't yet actually connect to the relay chain nodes; that's the next area to figure out.

- Manually enumerate the set of source directories to copy when building. This bloats the cache a bit, but means that rebuilds on script changes don't bust that cache, which saves a _lot_ of time.
- Un-.dockerignore the scripts directory; it's small and will no longer trigger cache misses.
- Move inject_bootnodes.sh back into scripts directory for better organization.
- inject_bootnodes.sh: use rpc port for rpc call and p2p port for generating the bootnode string. I'm not 100% sure this is correct, but upwards of 80% at least.
- docker-compose.yml: reorganize the launch commands such that alice and bob still present the same external port mapping to the world, but within the docker-compose network, they both use the same (standard) p2p, rpc, and websocket ports. This makes life easier for inject_bootnodes.sh.

The collator node still doesn't actually connect, but I think this commit still represents real progress in that direction.
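To make the rpc-port versus p2p-port distinction concrete, here is a minimal sketch of the string-building half of that job. It assumes the node's peer ID has already been fetched over the RPC port, and it only formats the multiaddr that a `--bootnodes` flag expects. The function name `bootnode_addr` and the peer ID in the usage note are hypothetical; the actual inject_bootnodes.sh may be structured differently.

```sh
#!/usr/bin/env bash
# Hypothetical helper: build a bootnode multiaddr from a host address,
# the node's p2p port (not its RPC port), and its peer ID.
bootnode_addr() {
    local host="$1" p2p_port="$2" peer_id="$3"
    printf '/ip4/%s/tcp/%s/p2p/%s\n' "$host" "$p2p_port" "$peer_id"
}
```

For example, `bootnode_addr 172.28.1.1 30333 QmExamplePeerId` prints `/ip4/172.28.1.1/tcp/30333/p2p/QmExamplePeerId`, where `QmExamplePeerId` is a placeholder rather than a real node ID.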

In the end, it was four characters: -- and two = signs in the launch arguments. They turn out to be critical characters for correct operation, though! Next up: automating step 5.

We can't just copy the blob in the builder stage because the volumes aren't available at that point. Rewrite build_collator.sh into build_docker.sh and update for generality.

…collator

This is likely to be discarded; the Python library in use is 3rd party and not well documented, while the official polkadot-js repo has a CLI tool: https://github.com/polkadot-js/tools/tree/master/packages/api-cli

Doesn't work at the moment because it depends on two api-cli features which I added today, which have not yet made it out into a published release. Next up: figure out how to add the `api-cli` at its `master` branch, then run tests to ensure the collator is producing blocks. Then, automate the block production tests.

This is a really weird bug. After running `scripts/run_collator.sh`, which brings everything up, it's perfectly possible to get into a state very much like what the registrar is in, and communicate with the blockchain without issue:

```sh
$ docker run --rm --net cumulus_testing_net para-reg:latest polkadot-js-api --ws ws://172.28.1.1:9944 query.sudo.key
Thu 20 Feb 2020 12:19:20 PM CET
{
  "key": "5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY"
}
```

However, the registrar itself, doing the same thing from within `register_para.sh`, is failing to find the right place in the network:

```
/runtime/cumulus_test_parachain_runtime.compact.wasm found after 0 seconds
/genesis/genesis-state found after 0 seconds
2020-02-20 10:43:22 API-WS: disconnected from ws://172.28.1.1:9944 code: '1006' reason: 'connection failed'
_Event {
  type: 'error',
  isTrusted: false,
  _yaeti: true,
  target: W3CWebSocket {
    _listeners: {},
    addEventListener: [Function: _addEventListener],
    removeEventListener: [Function: _removeEventListener],
    dispatchEvent: [Function: _dispatchEvent],
    _url: 'ws://172.28.1.1:9944',
    _readyState: 3,
    _protocol: undefined,
    _extensions: '',
    _bufferedAmount: 0,
    _binaryType: 'arraybuffer',
    _connection: undefined,
    _client: WebSocketClient {
      _events: [Object: null prototype] {},
      _eventsCount: 0,
      _maxListeners: undefined,
      config: [Object],
      _req: null,
      protocols: [],
      origin: undefined,
      url: [Url],
      secure: false,
      base64nonce: 'aJ6J3pYDz8l5owVWHGbzHg==',
      [Symbol(kCapture)]: false
    },
    onclose: [Function (anonymous)],
    onerror: [Function (anonymous)],
    onmessage: [Function (anonymous)],
    onopen: [Function (anonymous)]
  },
  cancelable: true,
  stopImmediatePropagation: [Function (anonymous)]
}
```

They should be connected to the same network, running the same image, doing the same call. The only difference is the file existence checks, which really shouldn't be affecting the network state at all.
Pushing this commit to ask for outside opinions on it, because this is very weird and I clearly don't understand some part of what's happening.

The problem was that the registrar container was coming up too fast, so the Alice node wasn't yet ready to receive connections. Using a well-known wait script fixes the issue. Next up: verify that the collator is in fact building blocks.
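The idea behind such a wait script can be sketched as a simple retry loop: keep probing until the dependency answers or a timeout expires, and only then start the dependent step. The sketch below is not the well-known script mentioned above; `wait_until` is a hypothetical name, and the timeout is approximate because it counts one-second sleeps rather than wall-clock time.

```sh
#!/usr/bin/env bash
# Hypothetical helper: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds. Returns 0 on success,
# or 1 once roughly TIMEOUT_SECONDS have passed without a success.
wait_until() {
    local timeout="$1"
    shift
    local elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

A registrar entrypoint could then gate its work on the Alice node's websocket port, for instance `wait_until 30 nc -z 172.28.1.1 9944` before any API call, assuming a netcat build that supports `-z` is present in the image.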

It didn't take much! The biggest issue was that the genesis state was previously being double-encoded.

parity-cla-bot · 2020-02-21T13:22:00Z

It looks like @coriolinus signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

bkchr

CI integration comes in the next pr?

scripts/build_polkadot.sh

docker/docker-compose.yml

docker/test-parachain-collator.dockerfile

Co-Authored-By: Bastian Köcher

Pro: future-proofing against the time we add or remove a directory Con: changing any file in the workspace busts Rust's build cache, which takes a long time.

yangmiok · 2020-09-03T07:24:41Z

> This PR automates the checklist in the README, adding docker files and scripting sufficient to automatically run two relay chain nodes, a collator chain node, extract the genesis state and wasm file, use those files to register the parachain, and verify the collator is continuing to produce new blocks over time.

Is there a detailed description of the steps?

JoshOrndorff · 2020-09-03T14:11:24Z

@Zhangtianai There is a tutorial that takes you through spinning up the relay chain and parachains locally. substrate.dev/cumulus-workshop


coriolinus added 21 commits February 6, 2020 12:18

add polkadot build script

c0c35ba

Add scripting to bring up a simple alice-bob example net

32d9303

Demonstrated to produce blocks, but as of right now there's still trouble getting it to respond to external queries on its ports.

enable external rpc access to the nodes

35cd700

Also shrink the build context by excluding some extraneous data.

Ensure external RPC access works

cd19677

Also set default branch appropriately, and have the stop command clean itself up more thoroughly.

Merge remote-tracking branch 'origin/master' into prgn-collator-script

78a4e60

enable external websocket access to indexer nodes

d244115

Get the collator talking to the indexer nodes

e698987

In the end, it was four characters: -- and two = signs in the launch arguments. They turn out to be critical characters for correct operation, though! Next up: automating step 5.

Add runtime stage to collect runtime wasm blob into volume

8968c5a

We can't just copy the blob in the builder stage because the volumes aren't available at that point. Rewrite build_collator.sh into build_docker.sh and update for generality.

Fix broken parachain registrar

54e98d4

The problem was that the registrar container was coming up too fast, so the Alice node wasn't yet ready to receive connections. Using a well-known wait script fixes the issue. Next up: verify that the collator is in fact building blocks.

Merge remote-tracking branch 'origin/master' into prgn-collator-script

d2b0905

fixes which cause the collator to correctly produce new parachain blocks

ba59157

It didn't take much! The biggest issue was that the genesis state was previously being double-encoded.

add documentation for running the parachain automatically

d53f004

Add health check to collator

f260c00

minor scripting improvements

0ff7fb5

coriolinus requested a review from bkchr February 21, 2020 13:21

coriolinus self-assigned this Feb 21, 2020

bkchr approved these changes Feb 21, 2020


coriolinus and others added 2 commits February 21, 2020 16:01

Apply suggestions from code review

55d9701

Co-Authored-By: Bastian Köcher

Docker: copy the whole workspace in one go

e202ffc

Pro: future-proofing against the time we add or remove a directory Con: changing any file in the workspace busts Rust's build cache, which takes a long time.

coriolinus merged commit 28ad999 into master Feb 21, 2020

coriolinus deleted the prgn-collator-script branch February 21, 2020 15:20

coriolinus mentioned this pull request Aug 21, 2020

Get v1 Rococo running on a localnet paritytech/polkadot#1620

Closed

5 tasks

Maharacha pushed a commit to Maharacha/cumulus that referenced this pull request May 10, 2023

[runtimes] minor pallet tweaks from statemine; fixes flaky block-prod…

56dd6fe

…uction (paritytech#67)
