Multi-node / distributed setup (scaling to huge datasets / multi-tenant)

Atomic-server has been designed to run on low-end hardware, as a self-hosted server. This is why I decided to use an embedded database (sled) and an embedded search engine (tantivy).

However, this introduces a problem: what do you do when the demands exceed the physical constraints of a single machine? For example, when a single CPU is not fast enough to serve your data, when RAM is bottlenecking performance, or when the disk is too small? This is where distributed setups come in handy, but the current architecture is not designed to deal with this.

This thread is for exploring what might need to happen to facilitate a multi-node setup.

## Thoughts

- I think it makes sense to send all incoming `Commits` to all nodes.
- I think using actix + websockets to send binary objects over the network should work pretty well.
- We need new messages for `registering` new nodes. They should probably tell the network that they would like to receive some range of resources that they are responsible for.
- Both the `subject-property-value` store and the `value-property-subject` store should probably be distributed. Should nodes specialize in one or the other?
- I have no idea how to distribute tantivy, but the docs suggest that it's been designed to allow for distributed search. [QuickWit search might be a nice example of how to approach this](https://github.com/quickwit-inc/quickwit/tree/main/quickwit-search).
- We could utilize the fact that Drives are stored on different subdomains. We could have one node that selects where to redirect the traffic to, depending on the subdomain.
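
To make the `registering` idea a bit more concrete, here is a minimal sketch of how a coordinator could track which node is responsible for which range of resources. Everything in it is hypothetical and not part of atomic-server: the `RegisterNode` message, the `Registry` type, and the assumption that ranges are defined by the lexicographic order of subject URLs, with each node owning everything from its `range_start` up to the next registered start. It only uses the Rust standard library and leaves out networking, rebalancing and node failure entirely.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical message a node sends when it joins the network.
#[derive(Debug, Clone, PartialEq)]
pub struct RegisterNode {
    /// Identifier of the joining node (assumed to be unique).
    pub node_id: String,
    /// Inclusive lower bound of the subject range this node wants to own.
    /// The range implicitly ends where the next registered range starts.
    pub range_start: String,
}

/// Hypothetical registry kept by whichever node routes the traffic.
#[derive(Debug, Default)]
pub struct Registry {
    /// Maps the start of a range to the node that owns it.
    ranges: BTreeMap<String, String>,
}

impl Registry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registers a node for a range. Returns the previous owner of that
    /// exact range start, if there was one.
    pub fn register(&mut self, msg: RegisterNode) -> Option<String> {
        self.ranges.insert(msg.range_start, msg.node_id)
    }

    /// Finds the node responsible for a subject: the node with the
    /// greatest `range_start` that is less than or equal to the subject.
    pub fn node_for(&self, subject: &str) -> Option<&str> {
        self.ranges
            .range::<str, _>((Bound::Unbounded, Bound::Included(subject)))
            .next_back()
            .map(|(_, node_id)| node_id.as_str())
    }
}
```

With a node registered for `""` and another for `"https://m"`, `node_for("https://alice.example.com/doc")` would resolve to the first node and `node_for("https://zoe.example.com/doc")` to the second. Because subjects start with the Drive's subdomain, the same lookup could also serve the subdomain-based redirect idea, although a hash-based scheme might spread load more evenly than plain lexicographic ranges.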

## Interesting tools

- [SeaweedFS](https://github.com/chrislusf/seaweedfs), virtual filesystem / distributed storage system
- [Firecracker](https://github.com/firecracker-microvm/firecracker) for running multiple VMs on one device
- [tikv](https://github.com/tikv/tikv) distributed KV store (used as an [alternative storage solution to sled](https://github.com/TommyCpp/monolith/blob/945c4ea73eeb93f975053e977455cd1cc3b4bcc5/src/storage/tikv_storage.rs) in the `monolith` project, might be inspiring)
