Add per-chain aggregate software/hardware telemetry #464
Conversation
    .Stats-category {
      text-align: left;
      background-color: #fff;
      margin-bottom: 2.5rem;
I'm not a huge fan of the thick black margin between each category, but I'm also not bothered enough to suggest an alternative! It'll be easier to see what it's like I think when it's deployed with real node data!
Yeah, I did it to make it easier to see where one category ends and another starts, but in general I fully expect that we might tweak this once we can see how it looks with real data. (The number of entries in the tables will also probably need tweaking, etc.)
@jsdw Thanks for the review! I've applied all of your suggestions, and I've also improved the TS typing. Since I've split up some of the code into their own separate files, you might want to review the newest changes commit-by-commit to make the review easier (the commits which moved the code into their own files contain only the changes necessary to facilitate the move and nothing else).
jsdw
left a comment
Thanks; that's great, I'm happy with this now! When it's merged to master we can deploy it in a staging env to give it a better workout (we may need to prod to get the Rococo nodes updated).
TarikGul
left a comment
Read through all the past comments and suggestions, and it all looks great. Thanks for the contribution, curious to also see how it looks with live data :)
👍
Thanks! I took the liberty of merging it in. So I guess now it'd be a good idea to try and test it out in a staging environment? Who should I talk to to help me set that up? (:
@koute So it was actually deployed to staging already and passed: https://gitlab.parity.io/parity/mirrors/substrate-telemetry/-/jobs/1526662. Each time a PR is merged into master it automatically gets deployed, but I don't actually know where to access it off the top of my head. cc: @ArshamTeymouri Any idea what the address is?
This is a companion for the following PR: paritytech/substrate#11062
This adds a new tab to the telemetry GUI which aggregates and displays the software/hardware telemetry generated by Substrate.

System information telemetry:

Hardware benchmark telemetry:
Obviously, for actual production chains these tables will be more richly filled out; these screenshots were taken on my local tiny testnet with 3~4 nodes.
This was roughly inspired by the Steam Hardware Survey.
There are two types of tables here:
A top-N table.
These will show the top N (currently top 10) most common values, sorted according to how common they are. Those kinds of tables are currently: "Version", "Operating System", "CPU Architecture", "CPU", "CPU Cores", "Linux Distribution" and "Linux Kernel".
So for example, if across all of the nodes there are 100 different CPU models being used, only the top 10 most popular ones will be shown, with all of the other models summed into a single "Other" row at the end of the table.
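As a rough sketch of the idea (the names here are illustrative, not the PR's actual code), the top-N aggregation boils down to counting each value across the nodes, keeping the N most common, and collapsing the rest into a trailing "Other" row:

```typescript
// Illustrative sketch only. `topN` counts how often each value of a
// metric occurs across nodes, keeps the `n` most common values, and
// sums everything else into a single "Other" row.
type Row = { value: string; count: number };

function topN(values: string[], n: number): Row[] {
  const counts = new Map<string, number>();
  for (const v of values) {
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  // Most common values first.
  const sorted = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  const rows: Row[] = sorted
    .slice(0, n)
    .map(([value, count]) => ({ value, count }));
  // Sum the remaining values into one trailing "Other" row.
  const other = sorted.slice(n).reduce((sum, [, count]) => sum + count, 0);
  if (other > 0) {
    rows.push({ value: "Other", count: other });
  }
  return rows;
}
```

For example, `topN(["a", "a", "b", "b", "b", "c"], 2)` yields rows for "b" (3), "a" (2), and "Other" (1).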
Ordered tables.
These will show values in a preset order, with a preset number of possible buckets. Those kinds of tables are currently: "Is Virtual Machine?", "Memory", "CPU Speed", "Memory Speed", "Disk Speed (sequential writes)" and "Disk Speed (random writes)".
So for example, the CPU speed benchmark has only a preset number of rows, e.g. "Less than 0.1x", "0.1x", "0.3x", ..., "0.7x", "Baseline", "1.1x", "1.3x", ..., "2.0x", "3.0x", ..., "At least 5.0x". Each node gets assigned to one of these buckets.
If a node didn't send a given metric it will be counted in a separate "Unknown" row at the end of the table. If a table has only the unknown row (e.g. because none of the nodes running on a given chain support a given metric) then the table won't be shown at all.
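The bucketing described above can be sketched like this (the boundaries and labels below are made up for illustration; the real tables use the finer-grained set of multipliers listed earlier). A node's benchmark score, expressed as a multiple of the baseline, lands in the first bucket whose upper bound it doesn't exceed, and nodes that didn't report the metric are counted under "Unknown":

```typescript
// Illustrative sketch only -- bucket boundaries and labels are invented
// for this example, not taken from the PR.
const CPU_SPEED_BUCKETS: { max: number; label: string }[] = [
  { max: 0.1, label: "Less than 0.1x" },
  { max: 0.9, label: "0.1x - 0.9x" },
  { max: 1.1, label: "Baseline" },
  { max: 5.0, label: "1.1x - 5.0x" },
  { max: Infinity, label: "At least 5.0x" },
];

// A score of `null` means the node never sent the metric.
function bucketFor(score: number | null): string {
  if (score == null) {
    return "Unknown";
  }
  for (const { max, label } of CPU_SPEED_BUCKETS) {
    if (score <= max) {
      return label;
    }
  }
  return "Unknown"; // unreachable: the last bucket is unbounded
}
```

With this scheme, `bucketFor(0.05)` gives "Less than 0.1x", `bucketFor(1.0)` gives "Baseline", and `bucketFor(null)` gives "Unknown"; if every node on a chain maps to "Unknown", the table is simply not rendered.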
This is compatible with the current nodes running in the wild. Without my Substrate PR (paritytech/substrate#11062), only these tables will be shown, since that's what's already being sent in telemetry: "Version", "Operating System", "CPU Architecture".
This will most likely still require some tweaking. We might also consider adding some sort of filtering options (e.g. show only the stats for validator nodes), adding the ability to download this data as a .csv, or making it available through Grafana. Nevertheless, this should be a solid MVP.

cc @emostov @ggwpez @paritytech/sdk-node