Skip to content

Conversation

@guillaumemichel
Copy link
Collaborator

@guillaumemichel guillaumemichel commented Aug 26, 2025

Connectivity State Machine

Benefits

If a node isn't bootstrapped, or is offline for a prolonged period, new provide operations should error rather than adding cids to the queue, and consuming memory.

It helps in the case the client is offline when adding a lot of data, or in the case of the LAN DHT if no DHT server is on the LAN.

State Machine

Connectivity package:

// State Machine starting in OFFLINE state (when `Start()` is called)
//  1. OFFLINE state:
//     - Calls `checkFunc` with exponential backoff until node is found ONLINE.
//     - Calls to `TriggerCheck()` are ignored while OFFLINE.
//     - When `checkFunc` returns true, state changes to ONLINE and
//     `onOnline()` callback is called.
//  2. ONLINE state:
//     - Calls to `TriggerCheck()` will call `checkFunc` only if at least
//     `onlineCheckInterval` has passed since the last check.
//     - If `TriggerCheck()` returns false, switch state to DISCONNECTED.
//  3. DISCONNECTED state:
//     - Calls `checkFunc` with exponential backoff until node is found ONLINE.
//     - Calls to `TriggerCheck()` are ignored while DISCONNECTED.
//     - When `checkFunc` returns true, state changes to ONLINE and
//     `onOnline()` callback is called.
//     - After `offlineDelay` has passed in DISCONNECTED state, state changes
//     to OFFLINE and `onOffline()` callback is called.

What the callbacks do:

  • onOnline():
    • make a few GCP requests to estimate the prefix length to be used with the schedule
    • refresh schedule with keys that may have been added while the node was OFFLINE
    • resume any pending work (provide and reprovide queue)
  • onOffline():
    • clear provide queue
    • invalidate cached prefix length (set to -1) to signal that ProvideOnce, StartProviding, RefreshSchedule and AddToSchedule should return an error.

It feels like this PR is adding complexity :(

Notes:

  • This PR changes the Provide interface by returning errors for ProvideOnce, StartProviding, and StopProviding (as well as RefreshSchedule and AddToSchedule).
    • This seems fine since the operation remains async
    • Error is returned only if supplied keys cannot be queued
    • Returning errors is better than silently failing
  • connectivity tests
    • AI generated
    • exponential backoff isn't tested
    • tests will be rewritten when migrating to quartz

To do before merging:

@guillaumemichel guillaumemichel requested review from a team as code owners August 26, 2025 20:35
@guillaumemichel guillaumemichel mentioned this pull request Aug 26, 2025
56 tasks
@guillaumemichel guillaumemichel force-pushed the provider-connectivity-mgmt branch from 6093296 to 02d60e0 Compare August 26, 2025 20:41
@guillaumemichel guillaumemichel force-pushed the provider-connectivity-mgmt branch from 02d60e0 to eedd08e Compare August 26, 2025 20:54
Copy link
Contributor

@gammazero gammazero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few minor comments.

@guillaumemichel guillaumemichel merged commit e903f4c into provider Sep 4, 2025
9 checks passed
@guillaumemichel guillaumemichel deleted the provider-connectivity-mgmt branch September 4, 2025 13:18
guillaumemichel added a commit to guillaumemichel/go-libp2p-kad-dht that referenced this pull request Sep 17, 2025
* update ConnectivityChecker

* ai tests

* update connectivity state machine

* remove useless connectivity funcs

* connectivity: get rid of internal state

* docs and tests

* fix(dual/provider): don't prevent providing if a DHT returns an error

* address review
guillaumemichel added a commit to guillaumemichel/go-libp2p-kad-dht that referenced this pull request Sep 17, 2025
* update ConnectivityChecker

* ai tests

* update connectivity state machine

* remove useless connectivity funcs

* connectivity: get rid of internal state

* docs and tests

* fix(dual/provider): don't prevent providing if a DHT returns an error

* address review
guillaumemichel added a commit that referenced this pull request Sep 17, 2025
* update ConnectivityChecker

* ai tests

* update connectivity state machine

* remove useless connectivity funcs

* connectivity: get rid of internal state

* docs and tests

* fix(dual/provider): don't prevent providing if a DHT returns an error

* address review
guillaumemichel added a commit that referenced this pull request Sep 17, 2025
* update ConnectivityChecker

* ai tests

* update connectivity state machine

* remove useless connectivity funcs

* connectivity: get rid of internal state

* docs and tests

* fix(dual/provider): don't prevent providing if a DHT returns an error

* address review
guillaumemichel added a commit that referenced this pull request Sep 18, 2025
* update ConnectivityChecker

* ai tests

* update connectivity state machine

* remove useless connectivity funcs

* connectivity: get rid of internal state

* docs and tests

* fix(dual/provider): don't prevent providing if a DHT returns an error

* address review
guillaumemichel added a commit that referenced this pull request Sep 18, 2025
* update ConnectivityChecker

* ai tests

* update connectivity state machine

* remove useless connectivity funcs

* connectivity: get rid of internal state

* docs and tests

* fix(dual/provider): don't prevent providing if a DHT returns an error

* address review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants