Multi-node / distributed setup (scaling to huge datasets / multi-tenant) #213

@joepio

Atomic-server has been designed to run on low-end hardware, as a self-hosted server. This is why I decided to use an embedded database (sled) and search engine (tantivy).

However, this introduces a problem. What do you do when demand exceeds the physical constraints of a single machine? For example, when a single CPU is not fast enough to serve your data, when RAM is bottlenecking performance, or when the disk is too small? This is where distributed setups come in handy, but the current architecture is not designed to deal with this.

This thread is for exploring what might need to happen to facilitate a multi-node setup.

Thoughts

  • I think it makes sense to send all incoming Commits to all nodes.
  • I think using actix + websockets and sending binary objects over the network should work pretty well.
  • We need new messages for registering new nodes. They should probably tell the network that they would like to receive some range of resources that they are responsible for.
  • Both the subject-property-value store and the value-property-subject store should probably be distributed. Should nodes specialize in one or the other?
  • I have no idea how to distribute tantivy, but the docs suggest that it's been designed to allow for distributed search. Quickwit might be a nice example of how to approach this.
  • We could utilize the fact that Drives are stored on different subdomains. We could have one node that selects where to redirect traffic, depending on the subdomain.
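
To make the node registration idea a bit more concrete, here is a rough sketch of what a registration message and a range-based lookup could look like. None of this exists in atomic-server: `RegisterNode`, `Router`, and `node_for_subject` are hypothetical names, and it assumes that ranges are split by the lexicographic order of the subject URL, which is only one of several possible partitioning schemes.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical message a node sends when it joins the network.
/// It claims all subjects from `range_start` (inclusive) up to the
/// next registered range start.
#[derive(Debug, Clone, PartialEq)]
pub struct RegisterNode {
    pub node_id: String,
    pub range_start: String,
}

/// Hypothetical routing table: maps the start of a subject range to
/// the id of the node responsible for that range.
#[derive(Debug, Default)]
pub struct Router {
    ranges: BTreeMap<String, String>,
}

impl Router {
    /// Record that a node is responsible for the range starting at
    /// `msg.range_start`. A later registration with the same start
    /// replaces the earlier one.
    pub fn register(&mut self, msg: RegisterNode) {
        self.ranges.insert(msg.range_start, msg.node_id);
    }

    /// Find the node responsible for `subject`: the owner of the
    /// greatest range start that is <= the subject (string ordering).
    /// Returns `None` if no registered range covers the subject.
    pub fn node_for_subject(&self, subject: &str) -> Option<&str> {
        let bounds: (Bound<&str>, Bound<&str>) =
            (Bound::Unbounded, Bound::Included(subject));
        self.ranges
            .range::<str, _>(bounds)
            .next_back()
            .map(|(_, node_id)| node_id.as_str())
    }
}
```

A node that registers with an empty `range_start` would act as a catch-all for every subject below the next range. Because Drives live on different subdomains and the subdomain is near the front of the subject URL, a split like this would tend to keep a Drive's resources on the same node, but that is an assumption of this sketch and not something the current architecture guarantees.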

Interesting tools
