[ADR] JMAP: Avoid ElasticSearch on critical reads #259
chibenwa wants to merge 13 commits into apache:master
https://issues.apache.org/jira/browse/JAMES-3440 is the JIRA entry for this...
mbaechler left a comment
Lots of small comments, but I agree with this feature.
However, please ensure that paging actually works with these new projections.
| So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
| for this component.
|
| Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
What do you expect? If you lose any service you lose James availability: S3, Cassandra, RabbitMQ, ElasticSearch.
Why would we want to support unavailability of highly available services in the first place?

If I lose ES, given the content of this ADR, I only lose advanced search.
My customers will complain waaaay less about "not having search" than about "not being able to read their emails".
> Why would we want to support unavailability of highly available services in the first place?
The people I work with and I are human, we write software, and there will be unavailability of some of those services.
The question now is how we deal with it.
| Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
| - Updates of flags lead to updates of the whole Email object, leading to sparse segments
| - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
| - We noticed some very slow traces against ElasticSearch, even for simple queries.
Any clue why?
But ElasticSearch's slow performance would likely require its own ADR. That's a lengthy topic.
Paging is one cause, there are many others. I described scrolling & data mutability above.
| Regarding Distributed James data-stores responsibilities:
| - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
I don't understand this sentence.

> Cassandra is the source of truth for metadata
-> I think you have no problem understanding this
> its storage needs to be adapted to known access patterns.
-> This comes from Cassandra storage constraints. You need to plan your reads ahead (or allow filtering and kill your cluster)
It seems pretty clear to me as it is, please do not hesitate to suggest enhancements.

Oh yes, I now understand. The ambiguity comes from the fact that I expected responsibilities in this list, not details about how Cassandra works.

Its responsibility is to handle known, common data access patterns, that's not mutually exclusive.
rouazana left a comment
The problem of position and limit is hard, it could have consequences on Cassandra.
ElasticSearch is meant to have some native position & limit capabilities. I know we don't use them, essentially because of rights management, but maybe we are making a mistake here by expecting Cassandra to behave better in this use case.
Anyway I'm OK to experiment with it, but we should really take care of the global performance in this case.
| ```
|
| Note that to handle position & limit, we need to fetch `position + limit` ordered items, then remove the first `position` items.
So if I scroll quickly n times, I will generate 1+2+...+n = n*(n+1)/2 Cassandra requests, which is O(n²).
That's pretty bad, no? Couldn't it be a cause of ElasticSearch slowness? Could it slow down Cassandra?

True for Cassandra.
True for ElasticSearch.
JMAP includes some limits (concurrent calls, rate limiting) that can help mitigate these concerns in the future.
> Couldn't it be a cause of ElasticSearch slowness?
Maybe for some.
I managed to clearly link some to reindexing as well, thanks to @tuanlc.
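To make the cost discussed in this thread concrete, here is a minimal sketch of the `position + limit` approach and of the total work done while scrolling. It is not the James implementation: `page` and `total_rows_read` are hypothetical names, the store is simulated with a plain Python range, and the sketch counts rows read (not requests), assuming that reads dominate the cost.

```python
from itertools import islice


def page(ordered_ids, position, limit):
    """Return one page: read position + limit items, then drop the first `position`."""
    fetched = list(islice(ordered_ids, position + limit))  # rows the store has to read
    return fetched[position:], len(fetched)


def total_rows_read(page_count, limit, mailbox_size):
    """Rows read when a client scrolls through `page_count` consecutive pages."""
    ids = range(mailbox_size)  # stand-in for an ordered projection
    return sum(page(ids, k * limit, limit)[1] for k in range(page_count))
```

With a page size of 10, scrolling 4 pages reads 10 + 20 + 30 + 40 = 100 rows, i.e. `limit * n*(n+1)/2`, which is the quadratic growth raised above.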
Co-authored-by: Matthieu Baechler <matthieu.baechler@gmail.com>
We are returning a full list of metadata on every IMAP synchronisation (that does a full fetch because we do not support QRESYNC). Clients trigger this every 15 minutes or so, and it gets executed (with extra metadata on mutable data) in 1-2 seconds for mailboxes of around 200,000 mails. This is a VERY rare operation in JMAP. I'm not scared ;-) If you are (or other people are), this can be turned off. If users run into issues on production platforms, they can disable this. Of course, if that turns out to be a bad idea, it can be removed from the code base and this ADR abandoned. But let's give this experimental feature a chance first, because I really believe that it is the best decision we can take about ElasticSearch.

For solving the scrolling issue, we can design a (git-like) DAG to store entries and associate a DAG node with a scrolling state by using the state feature of JMAP.

What is a DAG? Or we can wait for scrolling to become a problem before over-engineering it. So far, we just don't know whether the current proposal is good enough or not.

The problem is, if you don't include the needed complexity from the start, you won't know how it will behave once you include the complexity, and thus you may lose your time. A DAG is a directed acyclic graph, like git. Whatever the implementation (a DAG may not be the best idea), the idea is to have a "persistent structure" (every change creates a new immutable state) so that a scroll is bound to a given structure. RDBMSs usually implement that using MVCC. JMAP state maps to this concept. I don't know what the best implementation for that in Cassandra is, to be honest.

I take the risk. This proposal is a small implementation effort. Discarding it when needed won't be a problem.
Thanks for the explanation.
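As an illustration of the "persistent structure" idea mentioned above (every change creates a new immutable state, and a scroll is bound to one state), here is a minimal in-memory sketch. `VersionedMailbox` and its methods are hypothetical names, states are plain integers, and each state is stored as a full copy; a real persistent structure would share data between states, and nothing here says how this would map onto Cassandra.

```python
class VersionedMailbox:
    """Keeps every state as an immutable tuple so a scroll can pin one of them."""

    def __init__(self):
        self._states = [()]  # state 0 is the empty mailbox

    @property
    def current_state(self):
        return len(self._states) - 1

    def add(self, message_id):
        # A change never mutates an existing state; it appends a new one (newest first).
        self._states.append((message_id,) + self._states[-1])
        return self.current_state

    def remove(self, message_id):
        self._states.append(tuple(m for m in self._states[-1] if m != message_id))
        return self.current_state

    def page(self, state, position, limit):
        # Reads are bound to the state the scroll started from.
        return list(self._states[state][position:position + limit])


box = VersionedMailbox()
for message_id in ["m1", "m2", "m3", "m4"]:
    box.add(message_id)

pinned = box.current_state
first_page = box.page(pinned, 0, 2)
box.add("m5")  # a new mail arrives while the client is scrolling
second_page_pinned = box.page(pinned, 2, 2)
second_page_live = box.page(box.current_state, 2, 2)
```

Reading the second page against the pinned state yields the two remaining mails, while reading it against the live state repeats `m3`, which is the kind of inconsistency a state-bound scroll avoids.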
Exactly why writing an ADR before doing a PoC may not be the best idea.
And doing a PoC without a proper ADR is often misunderstood. Here we have some kind of feature flag, so it can be easily tried and removed if not conclusive. The ADR is interesting because without it the first question I would have asked would have been "why do you want to do this?", and the second one "how do you handle pagination?". Hence long debates, which are really better explained here.

@mbaechler I would be curious to know why you think that. Can you elaborate a bit? I think that, before starting complicated developments, having a flat, ordered list of changes, served from oldest to newest, is way easier to implement than the "from newest to oldest using some intermediate temporary states" approach documented as an optimization by the spec. Would that be what you refer to as a DAG?
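For comparison, a flat, ordered list of changes served from oldest to newest could look like the sketch below. This is only an illustration under stated assumptions: `ChangeLog` is a hypothetical class, a state is simply the number of entries seen so far, and the response is a simplified dictionary. The field names `oldState`, `newState` and `hasMoreChanges` follow the JMAP `/changes` response, but the flat `changes` list is a simplification of its `created` / `updated` / `destroyed` lists.

```python
class ChangeLog:
    """Append-only list of changes; a state is the count of entries already seen."""

    def __init__(self):
        self._entries = []  # (kind, message_id), oldest first

    def record(self, kind, message_id):
        self._entries.append((kind, message_id))
        return len(self._entries)  # the new state

    def changes(self, since_state, max_changes):
        # Serve the oldest unseen changes first, at most `max_changes` of them.
        window = self._entries[since_state:since_state + max_changes]
        new_state = since_state + len(window)
        return {
            "oldState": since_state,
            "newState": new_state,
            "hasMoreChanges": new_state < len(self._entries),
            "changes": window,
        }


log = ChangeLog()
log.record("created", "m1")
log.record("created", "m2")
log.record("destroyed", "m1")

first = log.changes(0, 2)
second = log.changes(first["newState"], 2)
```

The client simply calls `changes` again with the returned `newState` until `hasMoreChanges` is false; no intermediate temporary state has to be computed on the server side.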
Merged |