Skip to content

Conversation

jgallagher
Copy link
Contributor

This should prevent #7299 from occurring on new invocations of RSS; I'll add a followup PR (with more testing) that adds the CHECK constraint mentioned in that issue and a migration that does cleanup of any bad dataset address rows.

While I was in here, fixed #7115. It was smaller than I thought; should've done that a while ago.

let mut datasets = BTreeMap::new();
for d in sled_config.datasets.datasets.values() {
// Only the "Crucible" dataset needs to know the address
let address = sled_config.zones.iter().find_map(|z| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change to how we determine address is the actual bugfix. Everything else in the PR is just refactoring machinery or test stuff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check my understanding: the essence of the fix is the new if *d.name.kind() == DatasetKind::Crucible check, right? Thus, we'll definitely never try to set the address of anything other than a Crucible dataset. But further, when we later find a Crucible zone on the same pool, we know it's the zone for the same dataset, not some other zone (which was another problem here before).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Previously, this was: if there is a Crucible zone on the same zpool as this dataset (which should always be true, since we create a Crucible zone on every zpool), then set the address to the address of that Crucible zone. Now, it's: if this is a Crucible dataset, find the Crucible zone on this dataset's zpool and use its address (or bail! if we couldn't find it, which should never happen); if this is not a Crucible dataset, always provide None.

}

#[test]
fn only_crucible_datasets_have_addresses() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good.

I also wonder if blippy should check this and we should have a test here that runs the RSS blueprint through blippy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! Added the check to blippy in dabf6d4 and reworked this test in 9283a4a. If I revert the address fix, this test fails as expected:

blippy report for blueprint f132be37-704b-4abb-9909-7b9ef21d681e: 77 notes
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (Cockroach 00b274ad-b10a-4cb7-a665-aa618655d215) has an address: [fd00:1122:3344:102::9]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (Debug 0900ff91-91b7-4f6e-9fb6-68523453f690) has an address: [fd00:1122:3344:102::9]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_ba91f10a-922e-4537-83d3-32d00ff87962" } 112ba78d-3e85-41cd-b2b4-a6049d786a5a) has an address: [fd00:1122:3344:102::9]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_74f3c7d3-a74f-4e15-a53e-46ba3c787b1a" } 153aa310-1ff9-4fe9-bf81-8dca4be0f251) has an address: [fd00:1122:3344:102::8]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (Cockroach 17f26a1f-56a8-4f2d-a4a7-100a4dfb0820) has an address: [fd00:1122:3344:102::8]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZoneRoot 2723669c-8b65-4db5-9ee2-a9583fe4f5b9) has an address: [fd00:1122:3344:102::a]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_7ef01de0-7f9a-40eb-b1f6-78b12d82e090" } 3b038a4d-02d9-452a-ba18-33f35f373061) has an address: [fd00:1122:3344:102::c]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (Debug 408d9ed9-efb2-4413-b215-df70b9808b63) has an address: [fd00:1122:3344:102::a]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZoneRoot 4a0e404e-6a1c-42f3-b573-a370f6659952) has an address: [fd00:1122:3344:102::9]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZoneRoot 6bbe366a-5179-467d-856f-ae6f9f0b89a7) has an address: [fd00:1122:3344:102::8]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_cockroachdb_920e6b36-778d-4c6d-969c-3a15adeccfb2" } 76c329ef-74af-4466-b870-b7047fec4fae) has an address: [fd00:1122:3344:102::8]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (Debug 7ee7d8fd-074d-4ee5-af35-89234f53c58b) has an address: [fd00:1122:3344:102::b]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZoneRoot 85b1b231-7095-44da-a15b-b42190b159ec) has an address: [fd00:1122:3344:102::b]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_internal_dns_0f81c906-5837-46a0-9783-0aa017985f49" } a38f9842-446d-48d2-89c8-7332507e85cd) has an address: [fd00:1122:3344:102::8]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_pantry_f30b3acd-1880-4ad4-b302-f29ace8701bf" } a4d0876b-ef12-4e06-9023-18324263fecb) has an address: [fd00:1122:3344:102::a]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZoneRoot ab51ae7b-9109-430d-9811-65fcdd8b7239) has an address: [fd00:1122:3344:102::c]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_nexus_dee1f67f-ae8d-4fda-bf16-65617e8a7d56" } c2042440-8b9a-4e11-807b-9560a7f4544c) has an address: [fd00:1122:3344:102::9]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (InternalDns c5405bc9-d7e3-4023-85fc-221e532a9644) has an address: [fd00:1122:3344:102::8]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_43342b20-07e9-4227-996d-8c764b647095" } c6f8c19f-0d3a-418b-8d7c-e2187fa214e9) has an address: [fd00:1122:3344:102::b]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (Debug cf9ba08b-1de2-44a8-aaa2-0ecc341666c3) has an address: [fd00:1122:3344:102::c]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_ntp_09261117-8013-414f-bf02-9a8297bb5143" } d443dac7-9185-48c3-a70b-2675e0a6df68) has an address: [fd00:1122:3344:102::c]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_oximeter_ab58ca7e-b889-48d9-9286-6d8fd75607ed" } e7cc1d74-2717-4e27-b971-430dedee3ba1) has an address: [fd00:1122:3344:102::c]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_e78618ec-511f-482c-96ab-34bc614c366f" } f6987d50-0970-485b-ba23-1948225f4eae) has an address: [fd00:1122:3344:102::a]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_cockroachdb_73466355-e404-47a5-b50d-e61d0774a5de" } f90b3592-88b8-440e-abf8-8889e6bfe974) has an address: [fd00:1122:3344:102::9]:32345 (only Crucible datasets should have addresses)
  sled a74bc74d-b7eb-4172-9887-de2e0cd393c4: FATAL note: non-Crucible dataset (Debug fca5e24f-6334-4e4e-a2fa-19f656310c52) has an address: [fd00:1122:3344:102::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_ntp_bee306ef-ab29-4301-8570-fe1f79283d10" } 0c270e5c-c934-47c0-b5ff-8fa16c32b091) has an address: [fd00:1122:3344:101::c]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_bfac5ba3-9fd4-4e0d-ba19-1b46e7a59352" } 11b1470c-cae1-4de2-a255-0b55b72679df) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_f9df425f-ef23-4668-8b73-8898f630efd8" } 15d477da-58e4-41e3-bb48-6c32c8718f38) has an address: [fd00:1122:3344:101::a]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_pantry_60289a43-6d95-48f6-803b-eb02fcbb121d" } 26c96f6f-f91f-4a8a-9a0b-7e629852cc8a) has an address: [fd00:1122:3344:101::c]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_46949634-e5a2-4f07-b4b8-111f48bea916" } 30dcdd5a-6cbf-4a04-9ce4-b97d1e2c8c23) has an address: [fd00:1122:3344:101::c]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_cc472ff6-b6eb-43a2-962b-533a2e29d0cf" } 3170c516-ecb7-4d3b-9d34-396c31ae8bd7) has an address: [fd00:1122:3344:101::9]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (InternalDns 34b94a26-4597-464c-90ea-fc643c414ca2) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZoneRoot 48c08afc-5e2a-4d82-956a-f5fa8b966c2e) has an address: [fd00:1122:3344:101::a]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_nexus_e25c1efa-9c17-4ade-a47c-3711a2823c52" } 62e317b9-5ce7-4712-afba-12efe71b068a) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_cockroachdb_01bbe830-8672-480b-92b9-aa9eec55f82f" } 7d356d78-bdef-4c0e-978e-97358b49dd6e) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (Debug 87863294-5d4e-4080-827b-b8c3297c16ae) has an address: [fd00:1122:3344:101::a]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (ExternalDns 927a54f8-c53b-47ce-967a-ac6a8408b251) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_cockroachdb_a87059c8-8233-4372-971f-7d0e938b7578" } 92911e44-23e2-4adf-afe5-d461cef3a4d5) has an address: [fd00:1122:3344:101::9]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZoneRoot 95649f0a-ffb8-40bf-8648-aafac489d9b4) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_60ce7e72-cc05-493a-b125-0c940ea46e43" } 9ea290a2-7277-48c1-9a99-ad55e7828461) has an address: [fd00:1122:3344:101::b]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (Debug a631f7a9-21fc-4323-8885-edef63979f3b) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZoneRoot a7709f1d-14be-4d6f-9e90-7655e658668b) has an address: [fd00:1122:3344:101::9]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZoneRoot a9ad84c0-80d0-429c-bc77-b85a6a28cf15) has an address: [fd00:1122:3344:101::b]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (Cockroach ac51fbae-336c-41f2-9f5a-4e597b4ddc18) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (Debug b0575ad9-ed37-49fc-9a13-3b780f49dbe1) has an address: [fd00:1122:3344:101::c]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZoneRoot bfe85ef7-863d-4310-88a0-079876f675ad) has an address: [fd00:1122:3344:101::c]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_external_dns_454abc91-c201-46b6-9a66-f6cf5742f3a0" } d9480eb0-da92-4512-8525-913b2e477160) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (Cockroach e48af89e-ccfa-4522-956b-a3fcdd527e68) has an address: [fd00:1122:3344:101::9]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (Debug e4dc72dd-2962-40fd-946b-ad85d23651dc) has an address: [fd00:1122:3344:101::b]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (Debug e4e82575-198f-4825-a148-9deaead1ba2b) has an address: [fd00:1122:3344:101::9]:32345 (only Crucible datasets should have addresses)
  sled b7f401d0-3f61-41a6-a8a9-131ea006eefd: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_internal_dns_3281c065-bc0f-4f07-8045-461f8a42ccd7" } ef45078c-a158-449f-b574-5c547c698f05) has an address: [fd00:1122:3344:101::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (Debug 0b30ff3e-eb12-410d-bb28-48bdf2125f89) has an address: [fd00:1122:3344:103::9]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (ExternalDns 15c17ad4-2239-4a47-abb5-f2a3a6125a24) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (Clickhouse 1dd6bd3b-1307-4b97-bcc3-2554c433a3e2) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_33dbcc43-e514-4b43-9d50-3c77b6985fe5" } 243ba625-4803-4826-a34e-c707f8dd9b3a) has an address: [fd00:1122:3344:103::9]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZoneRoot 24b235ef-7a0f-4ea2-86c4-ba6dd98fed4d) has an address: [fd00:1122:3344:103::a]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (Debug 2c8c08ba-4fd3-4a39-b371-d2db631b507e) has an address: [fd00:1122:3344:103::b]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_clickhouse_8fc54d94-22b8-428d-9408-ac4913c99e08" } 41a3d90f-61c5-40f0-a956-17abe935006a) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_3470f0cd-ebee-4701-88ae-e90c244f4449" } 4e253672-4ec4-4388-9abd-2e4d3488e7ed) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_pantry_42913bc8-76c8-4ac5-a990-b107e275413d" } 65a1be78-122d-48d4-b175-4789b3154702) has an address: [fd00:1122:3344:103::c]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (InternalDns 6a50ad6a-2da9-42a7-9de6-4d9a9721669f) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (Cockroach 79d6e9cd-cf95-4581-9735-db8c7effe4a1) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_cockroachdb_350b225f-5c5e-435b-ba7b-1ad77a10d058" } 7de59cd3-30ec-4699-a1bf-65241c1b42ef) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_d05d8d3c-5dea-46bb-bf7b-21a7f85492c5" } 894fabe8-1de6-483c-9042-d92757281088) has an address: [fd00:1122:3344:103::c]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_external_dns_30ca1b46-a8ff-4a0e-b367-671a2f76ec89" } a65ead7e-5dc2-45d7-a72b-7e78c68b3ec9) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (Debug a6d53ff9-1c29-4fdd-845d-ea1ce05757ae) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZoneRoot a7563bc4-a6fe-4892-9e94-0e1fd5611ea8) has an address: [fd00:1122:3344:103::c]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (Debug b4c0869c-6949-458e-bd44-1d326262de7e) has an address: [fd00:1122:3344:103::a]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_ntp_8f88f820-da03-445b-a7fc-2d43c3d45736" } b6279fa5-3dee-4e2b-957d-7a662d4cf00d) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_878c9489-fb0f-4d72-a4d8-7ac554f0d26a" } bf2aa708-ea27-4c6e-b656-ee0cbe14c1c2) has an address: [fd00:1122:3344:103::b]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_crucible_2210ce5a-865f-4e91-ac6c-e42a4e07e996" } c01134bc-7e2e-459b-b729-a1a54ad92678) has an address: [fd00:1122:3344:103::a]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZoneRoot cce94596-2688-47d6-9c2b-ead379481cda) has an address: [fd00:1122:3344:103::b]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_internal_dns_9ffb59d2-4ab4-4364-b82c-4e3bdf70e5f5" } e4cdf4a0-feb7-4dd9-a18c-cf6355bd062d) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZoneRoot e6a1e900-8d3e-49c3-8e5a-dc2f47793f65) has an address: [fd00:1122:3344:103::8]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (Debug e6ada2d4-e2c8-4d51-ba86-1134dd659c7d) has an address: [fd00:1122:3344:103::c]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZoneRoot ec2c9353-837d-4bcf-8315-88a179ded392) has an address: [fd00:1122:3344:103::9]:32345 (only Crucible datasets should have addresses)
  sled c2f594e4-e093-4d27-b8c1-d3bbbde8ef0c: FATAL note: non-Crucible dataset (TransientZone { name: "oxz_nexus_1863fab5-738c-4ac4-a198-5994005ce084" } f79ac290-5416-4998-8825-6ade31fc2cb5) has an address: [fd00:1122:3344:103::9]:32345 (only Crucible datasets should have addresses)

thread 'rack_setup::service::test::rss_blueprint_is_blippy_clean' panicked at sled-agent/src/rack_setup/service.rs:1991:13:
RSS blueprint should have no blippy notes
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

let mut datasets = BTreeMap::new();
for d in sled_config.datasets.datasets.values() {
// Only the "Crucible" dataset needs to know the address
let address = sled_config.zones.iter().find_map(|z| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check my understanding: the essence of the fix is the new if *d.name.kind() == DatasetKind::Crucible check, right? Thus, we'll definitely never try to set the address of anything other than a Crucible dataset. But further, when we later find a Crucible zone on the same pool, we know it's the zone for the same dataset, not some other zone (which was another problem here before).

@jgallagher
Copy link
Contributor Author

I had to add a third fake sled in 5acca22 to avoid test flakes that looked like this:

blippy report for blueprint a0907961-221e-4580-9ea9-203739fab190: 1 note
  sled 0886a108-a9db-4051-b5b7-2b863a153ce4: FATAL note: zpool oxp_96b84e97-2e61-4502-9011-8b15e5c7bd62 has two zone filesystems of the same kind (CruciblePantry 276ea0ef-aa46-4cd6-b1fa-c1aadd01c5c0 and CruciblePantry 5ddf375a-edef-4efb-87e7-c08eaf8537fc)

(I saw similar failures with kind Nexus instead of CruciblePantry.)

I believe the cause here is:

  • RSS cycles through sleds when placing services. With only 2 sleds and a redundancy of 3 for both Nexus, and the pantry, one sled has to run 2 instances of Nexus / Pantry.
  • When choosing the filesystem zpool for Nexus, Pantry, Oximeter, and NTP, RSS randomly selects a zpool. This does not consider whether we might have already placed a zone of a given kind on that zpool, so has a chance to double up, which blippy complains about.

Technically blippy is overly-conservative here; unlike durable datasets, transient zone root datasets for zones of the same kind do not have name collisions, because their name includes the ID of the zone. So from sled-agent's point of view, it's okay to have two transient zone roots on the same zpool. I'm inclined to leave the check in blippy, at least for now, because it seems more understandable to say "at most 1 dataset of a given kind on a given zpool"? That's a weak argument, though.

We could change RSS to take prior placement into account to avoid this, but it's a little messy, and we would get it for free with #5272 which we do still want to do. Given this only shows up if running RSS on a 1- or 2-sled system, @davepacheco suggested the straightforward fix of making the test use 3 sleds, both because it avoids this failure and because it's a more realistic test: we want to know that the blueprints RSS generates on real systems are valid, and don't care as much about the blueprints it generates on 1- or 2-sled dev systems.

@davepacheco
Copy link
Collaborator

Technically blippy is overly-conservative here; unlike durable datasets, transient zone root datasets for zones of the same kind do not have name collisions, because their name includes the ID of the zone. So from sled-agent's point of view, it's okay to have two transient zone roots on the same zpool. I'm inclined to leave the check in blippy, at least for now, because it seems more understandable to say "at most 1 dataset of a given kind on a given zpool"? That's a weak argument, though.

For what it's worth: I think it's perfectly fine from a correctness perspective to have multiple instances of a service on a disk and I think that's actually pretty valuable in development because otherwise you need much bigger systems to do basic redundancy testing. But I still think blippy should warn about this because we wouldn't want a production system to have an inordinate number of instances of one service on a single-point-of-failure (disk or sled).

@jgallagher
Copy link
Contributor Author

Technically blippy is overly-conservative here; unlike durable datasets, transient zone root datasets for zones of the same kind do not have name collisions, because their name includes the ID of the zone. So from sled-agent's point of view, it's okay to have two transient zone roots on the same zpool. I'm inclined to leave the check in blippy, at least for now, because it seems more understandable to say "at most 1 dataset of a given kind on a given zpool"? That's a weak argument, though.

For what it's worth: I think it's perfectly fine from a correctness perspective to have multiple instances of a service on a disk and I think that's actually pretty valuable in development because otherwise you need much bigger systems to do basic redundancy testing. But I still think blippy should warn about this because we wouldn't want a production system to have an inordinate number of instances of one service on a single-point-of-failure (disk or sled).

Sounds good to me. We'll need to downgrade the severity of the blippy note for this, I think. Right now "multiple datasets of the same kind on the same pool" is a Fatal note, but I guess it's okay for it to be "multiple durable datasets on the same zpool" is Fatal and "multiple transient zone roots of the same zone kind on the same zpool" is just Warn?

@jgallagher jgallagher merged commit 0603f0b into main Jan 7, 2025
17 checks passed
@jgallagher jgallagher deleted the john/rss-dataset-address branch January 7, 2025 20:48
davepacheco added a commit that referenced this pull request Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename DatasetName::dataset() to DatasetName::kind()
3 participants