[ADR] JMAP: Avoid ElasticSearch on critical reads #259
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,14 +23,14 @@ for this component. | |
|
|
||
| Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts. | ||
|
||
|
|
||
| Also we should mention our ElasticSearch implementation in Distributed James suffer the following flows: | ||
| - Updates of flags leads to updates of the all Email object, leading to sparse segments | ||
| Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws: | ||
| - Updates of flags lead to updates of the whole Email object, leading to sparse segments | ||
| - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position) | ||
| - We noticed some very slow traces against ElasticSearch, even for simple queries. | ||
|
Member
Any clue why?
Contributor
Author
And clue why. But ElasticSearch's slow performance would likely require its own ADR; that's a lengthy topic. Paging is one cause, and there are many others. I described scrolling and data mutability above. |
||
|
|
||
| Regarding Distributed James data-stores responsibilities: | ||
| - Cassandra is the source of truth for metadata, its storage need to be adapted to known access patterns. | ||
| - ElasticSearch allows resolution of arbitrary queries, and perform full text search. | ||
| - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns. | ||
|
Member
I don't understand this sentence.
Contributor
Author
"Cassandra is the source of truth for metadata" -> I think you have no problem understanding this.
"its storage needs to be adapted to known access patterns" -> This comes from Cassandra storage constraints: you need to plan your reads ahead (or allow filtering and kill your cluster).
It seems pretty clear to me as it is, but please do not hesitate to suggest enhancements.
Member
Oh yes, I now understand. The ambiguity comes from the fact that I expect responsibilities in this list, not details about how Cassandra works.
Contributor
Author
Its responsibility is to handle known, common data access patterns; that's not mutually exclusive. |
||
| - ElasticSearch allows resolution of arbitrary queries, and performs full text search. | ||
|
|
||
| ## Decision | ||
|
|
||
|
|
@@ -47,9 +47,100 @@ Administrators would be offered a configuration option to turn this view on and | |
|
|
||
| If enabled, administrators would no longer need to ensure high availability and good performance for ElasticSearch. | ||
|
||
| We thus expect a decrease in overall ElasticSearch load, allowing savings compared to current deployments. | ||
| Furthermore, we expected better performances by resolving such queries against Cassandra. | ||
| Furthermore, we expect better performance by resolving such queries against Cassandra. | ||
|
||
|
|
||
| ## Alternatives | ||
|
|
||
| Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep | ||
|
||
| resolving all `Email/query` against ElasticSearch. | ||
|
||
|
|
||
| ## Example of optimized JMAP requests | ||
|
|
||
| ### A: Email list sorted by sentAt, with limit | ||
|
||
|
|
||
| RFC-8621: | ||
|
|
||
| ``` | ||
| ["Email/query", | ||
| { | ||
| "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6", | ||
| "filter": { | ||
| "inMailbox":"abcd" | ||
| }, | ||
| "comparator": [{ | ||
| "property":"sentAt", | ||
| "isAscending": false | ||
| }] | ||
| }, | ||
| "c1"] | ||
| ``` | ||
|
|
||
| Draft: | ||
|
|
||
| ``` | ||
| [["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]] | ||
| ``` | ||
|
|
||
| ### B: Email list sorted by sentAt, with limit, after a given receivedAt date | ||
|
||
|
|
||
| RFC-8621: | ||
|
|
||
| ``` | ||
| ["Email/query", | ||
| { | ||
| "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6", | ||
| "filter": { | ||
| "inMailbox":"abcd", | ||
| "after": "aDate" | ||
| }, | ||
| "comparator": [{ | ||
| "property":"sentAt", | ||
| "isAscending": false | ||
| }] | ||
| }, | ||
| "c1"] | ||
| ``` | ||
|
||
|
|
||
| ### C: Email list sorted by sentAt, with limit, after a given sentAt date | ||
|
|
||
| Draft: | ||
|
||
|
|
||
| ``` | ||
| [["getMessageList", {"filter":{"after":"aDate", "inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]] | ||
| ``` | ||
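To make the request shapes above concrete, here is a minimal Python sketch of how a server might recognise the RFC-8621 forms of A and B before choosing a backend. The function name `classify_email_query` is hypothetical, the key names (`filter`, `comparator`, `inMailbox`, `after`) simply follow the examples above, and treating every other shape as "not optimized" is an assumption rather than something this ADR states.

```python
def classify_email_query(arguments):
    """Return "A" or "B" when the arguments match an example shape, else None.

    Hypothetical helper: None stands for "resolve against ElasticSearch".
    """
    comparators = arguments.get("comparator", [])
    sorted_by_sent_at_desc = (
        len(comparators) == 1
        and comparators[0].get("property") == "sentAt"
        and comparators[0].get("isAscending") is False
    )
    if not sorted_by_sent_at_desc:
        return None

    filter_keys = set(arguments.get("filter", {}))
    if filter_keys == {"inMailbox"}:
        return "A"  # mailbox listing sorted by sentAt
    if filter_keys == {"inMailbox", "after"}:
        return "B"  # same listing, restricted by receivedAt
    return None


request_a = {
    "filter": {"inMailbox": "abcd"},
    "comparator": [{"property": "sentAt", "isAscending": False}],
}
print(classify_email_query(request_a))
```

Any request that adds another filter condition, such as a text search, falls through to `None` in this sketch.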
|
|
||
|
||
| ## Cassandra table structure | ||
|
|
||
| Several tables are required in order to implement this view on top of Cassandra. | ||
|
|
||
| Eventual denormalization consistency can be enforced by using BATCH statements. | ||
|
|
||
| A first table sorts the messages of a mailbox by sentAt, which allows answering A and C: | ||
|
|
||
| ``` | ||
| TABLE email_query_view_sent_at | ||
| PRIMARY KEY mailboxId | ||
| CLUSTERING COLUMN sentAt | ||
| CLUSTERING COLUMN messageId | ||
| ``` | ||
|
|
||
| A table allows filtering emails after a receivedAt date. Given a limited number of results, soft sorting and limits can | ||
| be applied using the sentAt column. This allows answering B: | ||
|
|
||
| ``` | ||
| TABLE email_query_view_received_at | ||
| PRIMARY KEY mailboxId | ||
| CLUSTERING COLUMN receivedAt | ||
| CLUSTERING COLUMN messageId | ||
| COLUMN sentAt | ||
| ``` | ||
|
|
||
| Finally, upon deletes, receivedAt and sentAt must be known in order to target the corresponding rows of the two tables above. Thus we need to provide a lookup table: | ||
|
|
||
| ``` | ||
| TABLE email_query_view_date_lookup | ||
| PRIMARY KEY mailboxId | ||
| CLUSTERING COLUMN messageId | ||
| COLUMN sentAt | ||
| COLUMN receivedAt | ||
| ``` | ||
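As an illustration of how the three tables cooperate, the following Python sketch models them in memory. It is not the James implementation: the class `EmailQueryView` and its method names are hypothetical, dates are plain integers for brevity, and "after" is assumed to be inclusive. Each dictionary key plays the role of the `mailboxId` key, and each sorted list stands for the clustering order.

```python
from bisect import insort
from collections import defaultdict


class EmailQueryView:
    """In-memory stand-in for the three tables (illustrative only)."""

    def __init__(self):
        self.sent_at = defaultdict(list)      # mailboxId -> sorted [(sentAt, messageId)]
        self.received_at = defaultdict(list)  # mailboxId -> sorted [(receivedAt, messageId, sentAt)]
        self.date_lookup = {}                 # (mailboxId, messageId) -> (sentAt, receivedAt)

    def save(self, mailbox_id, message_id, sent_at, received_at):
        # In Cassandra these three writes would be grouped in one BATCH statement.
        insort(self.sent_at[mailbox_id], (sent_at, message_id))
        insort(self.received_at[mailbox_id], (received_at, message_id, sent_at))
        self.date_lookup[(mailbox_id, message_id)] = (sent_at, received_at)

    def delete(self, mailbox_id, message_id):
        # The lookup table supplies the dates needed to target the other rows.
        dates = self.date_lookup.pop((mailbox_id, message_id), None)
        if dates is None:
            return
        sent_at, received_at = dates
        self.sent_at[mailbox_id].remove((sent_at, message_id))
        self.received_at[mailbox_id].remove((received_at, message_id, sent_at))

    def list_by_sent_at(self, mailbox_id, limit, after=None):
        # Requests A and C: walk the sentAt order, newest first.
        rows = self.sent_at.get(mailbox_id, [])
        newest_first = [
            message_id
            for sent_at, message_id in reversed(rows)
            if after is None or sent_at >= after
        ]
        return newest_first[:limit]

    def list_after_received_at(self, mailbox_id, after, limit):
        # Request B: filter on receivedAt, then sort by sentAt in the application.
        matching = [
            (sent_at, message_id)
            for received_at, message_id, sent_at in self.received_at.get(mailbox_id, [])
            if received_at >= after
        ]
        matching.sort(reverse=True)
        return [message_id for _, message_id in matching[:limit]]


view = EmailQueryView()
view.save("abcd", "m1", sent_at=10, received_at=12)
view.save("abcd", "m2", sent_at=20, received_at=11)
view.save("abcd", "m3", sent_at=30, received_at=35)
print(view.list_by_sent_at("abcd", limit=2))
print(view.list_after_received_at("abcd", after=12, limit=10))
```

The application-side sort in `list_after_received_at` is only reasonable when the receivedAt filter keeps the number of matching rows small, which is the condition the ADR states for request B.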