Skip to content
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Adjust formatting
  • Loading branch information
Techassi committed Aug 25, 2023
commit 70b9f12f5848d84873d817b64dde21ba2cea9293
128 changes: 84 additions & 44 deletions modules/contributor/pages/adr/ADR028-discovery-revision.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -16,28 +16,33 @@ v0.1, 2023-03-30

== Context and Problem Statement

// Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to articulate the problem in form of a question.
// Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to
// articulate the problem in form of a question.

This ADR is written with specific problems in mind, but the goal is to reach a generic mechanism for the discovery of products.
The current discovery mechanism is described https://docs.stackable.tech/home/stable/concepts/service_discovery.html[in our docs] (make sure to pick the 23.1 version).
This ADR is written with specific problems in mind, but the goal is to reach a generic mechanism for the discovery of
products. The current discovery mechanism is described
https://docs.stackable.tech/home/stable/concepts/service_discovery.html[in our docs] (make sure to pick the 23.1
version).

Basically when clients connect to products managed by Stackable, they need to have certain information about how to connect to these products.
Currently we expose some of these information, but not all (e.g. which ca cert the exposed product uses).
On the other hand it could be the case that users have external services which Stackable services should use, e.g.
an non-stackable HDFS where an Stackable Trino should connect to.
Basically when clients connect to products managed by Stackable, they need to have certain information about how to
connect to these products. Currently we expose some of these information, but not all (e.g. which ca cert the exposed
product uses). On the other hand it could be the case that users have external services which Stackable services should
use, e.g. an non-stackable HDFS where an Stackable Trino should connect to.

Question 1: How do we want to make products discoverable?
Question 2: How should clients verify the identity of the server (e.g. the cert of the server)
Question 3: How should clients authenticate themselves against the server?
*Question 1:* How do we want to make products discoverable?
*Question 2:* How should clients verify the identity of the server (e.g. the cert of the server)
*Question 3:* How should clients authenticate themselves against the server?

== Decision Drivers

We have some common use-cases that we need to express via the discovery mechanism:

1. Trino cluster
* Currently we don't write any discovery CM
* We need at least
** Coordinator address (currently only single coordinator is supported)
*** Ideally split it into the following attributes, so that clients can construct the URI. (trino python client e.g. needs all of these attributes separately and I don't want to rely on parsing and extracting the URI)
*** Ideally split it into the following attributes, so that clients can construct the URI. (trino python client e.g.
needs all of these attributes separately and I don't want to rely on parsing and extracting the URI)
**** protocol (http/https)
**** host
**** port
Expand Down Expand Up @@ -91,15 +96,16 @@ spec:

==== Pros

* Easier to use for consuming applications outside of Stackable, as they can simply mount the CM or download it to a file.
On the other hand when we start to put in yaml objects that need to be parsed we would loose the benefit.
* Easier to use for consuming applications outside of Stackable, as they can simply mount the CM or download it to a
file. On the other hand when we start to put in yaml objects that need to be parsed we would loose the benefit.
* Single API call to retrieve all running products

==== Cons

* No complex structure such as enums (we can mitigate this by sticking in custom yaml into the CM).
Users don't have any form of validation when creating their own discovery CM e.g. pointing to their existing HDFS.
* Cannot have two products with the same name, as the discovery CM name clashes. One solution could be to prefix the product name (e.g. trino-simple), This can impose other problems such as too long CM names.
* No complex structure such as enums (we can mitigate this by sticking in custom yaml into the CM). Users don't have any
form of validation when creating their own discovery CM e.g. pointing to their existing HDFS.
* Cannot have two products with the same name, as the discovery CM name clashes. One solution could be to prefix the
product name (e.g. trino-simple), This can impose other problems such as too long CM names.

=== [1] Discovery Object: Use dedicated CRD object for every product

Expand Down Expand Up @@ -130,15 +136,23 @@ spec:
==== Pros

* Validation by using e.g. complex enums
* Commons structure can be shared between all operators, such as `Listener` endpoints or tls server certificate information
* Commons structure can be shared between all operators, such as `Listener` endpoints or tls server certificate
information

==== Cons

* Operator A needs to compile against operator b to have access to it's discovery struct. An alternative would be to put the Discovery CRDs in operator-rs.
* Operator versioning hell. On the other hand we have the same problem with ConfigMaps, as e.g. a newly introduced key is missing because of an older hdfs operator version.
* Dependant Pods (such as hbase on hdfs) can not simply mount a CM containing the hdfs-site and core-site. Instead the hbase-operator needs to read the HdfsClusterDiscovery, copy the hdfs-site and core-site into a CM and than mount that into the hbase Pods. This can be solved by the HdfsClusterDiscovery to point to a CM that contains hdfs-site and core-site xmls.
* Multiple API calls need to retrieve all running Stackable service (in stackablectl or cockpit). This would be a single API call in case of discovery CM or a shared CRD for all product discoveries.
* Side-Note: `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are external systems, where we don't know anything about.
* Operator A needs to compile against operator b to have access to it's discovery struct. An alternative would be to put
the Discovery CRDs in operator-rs.
* Operator versioning hell. On the other hand we have the same problem with ConfigMaps, as e.g. a newly introduced key
is missing because of an older hdfs operator version.
* Dependant Pods (such as hbase on hdfs) can not simply mount a CM containing the hdfs-site and core-site. Instead the
hbase-operator needs to read the HdfsClusterDiscovery, copy the hdfs-site and core-site into a CM and than mount that
into the hbase Pods. This can be solved by the HdfsClusterDiscovery to point to a CM that contains hdfs-site and
core-site xmls.
* Multiple API calls need to retrieve all running Stackable service (in stackablectl or cockpit). This would be a single
API call in case of discovery CM or a shared CRD for all product discoveries.
* Side-Note: `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are
external systems, where we don't know anything about.

=== [1] Discovery Object: Use dedicated CRD object for every product - in combination with ConfigMap

Expand Down Expand Up @@ -234,27 +248,34 @@ spec:
==== Pros

* Only one struct in operator-rs => No cross-operator dependencies.
* Single API call to retrieve all stackable products. Question is if this really helps a lot, as callers probably also are interested in the status of the product, which needs further API calls (irrelevant - see Cons).
* Single API call to retrieve all stackable products. Question is if this really helps a lot, as callers probably also
are interested in the status of the product, which needs further API calls (irrelevant - see Cons).

==== Cons

* All product discoveries are versioned together. E.g. a new mandatory field for hdfs requires all operators to bump the Discovery CRD to `v2`. We hope that this does not happen too often.
* All product discoveries are versioned together. E.g. a new mandatory field for hdfs requires all operators to bump the
Discovery CRD to `v2`. We hope that this does not happen too often.
* Names can collide
* `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are external systems, where we don't know anything about. So in case we want to introduce a `Stacklet` object listing anyway, so the `Pro` regarding the API calls is irrelevant.
* `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are external
systems, where we don't know anything about. So in case we want to introduce a `Stacklet` object listing anyway, so
the `Pro` regarding the API calls is irrelevant.

=== [1] Discovery Object: Write the discovery to Product CR status

Instead of writing discovery information to dedicated objects - such as CM or custom CR - we "simply" write the discovery information to the status of the Cluster CR.
Instead of writing discovery information to dedicated objects - such as CM or custom CR - we "simply" write the
discovery information to the status of the Cluster CR.

==== Pros

==== Cons

* It does not enable users to bring their own product and talk to it from Stackable, e.g. a user-provided HDFS.
* It does not allow things such as a ZNode for Zookeeper as we either use the Zookeeper CR for discovery or we use a ZNode but than can't use a Zookeeper CR.
Currently we have the freedom of either connection to a Zookeeper root dir or a ZNode transparently.
* It does not allow things such as a ZNode for Zookeeper as we either use the Zookeeper CR for discovery or we use a
ZNode but than can't use a Zookeeper CR. Currently we have the freedom of either connection to a Zookeeper root dir or
a ZNode transparently.

=== [2] TLS: Discovery config contains SecretClass

The discovery includes the SecretClass used to obtain the ca.crt used to validate the *server* certificate

Trino discovery:
Expand Down Expand Up @@ -290,9 +311,10 @@ backends: # Don't look at the Superset CRD structure, we are only interested in

=== [2] TLS: Client needs to specify SecretClass
---
The discovery does *not* include the SecretClass used to obtain the *server* certificate.
Instead the client must specify which SecretClass should be used to verify the *server* certificate.
For usability reasons it can be omitted and defaults to the SecretClass the client uses for itself.

The discovery does *not* include the SecretClass used to obtain the *server* certificate. Instead the client must
specify which SecretClass should be used to verify the *server* certificate. For usability reasons it can be omitted and
defaults to the SecretClass the client uses for itself.

Trino discovery:
[source,yaml]
Expand Down Expand Up @@ -323,13 +345,16 @@ backends: # Don't look at the Superset CRD structure, we are only interested in

==== Pros

* Operator does not need to read/look at the DiscoveryConfig (as we can statically set up the secret-op tls secretClass volumes rather than retrieving them from the DiscoveryConfig).
* Some clients only support a single pki, in that case we could not give the ability to overwrite the secretClass coming from the product itself.
* Operator does not need to read/look at the DiscoveryConfig (as we can statically set up the secret-op tls secretClass
volumes rather than retrieving them from the DiscoveryConfig).
* Some clients only support a single pki, in that case we could not give the ability to overwrite the secretClass coming
from the product itself.

==== Cons

* The client has to know what pki/secretClass the server is using.
* Superset TrinoConnection could not only say "Connect this Superset and this Trino", but need to say "using this ca.crt to validate the Trino server"
* Superset TrinoConnection could not only say "Connect this Superset and this Trino", but need to say "using this ca.crt
to validate the Trino server"

=== [2] TLS: Include caCert in Discovery config

Expand All @@ -353,18 +378,22 @@ endpoint:

==== Pros

* Easier for external clients to use as they don't need to know the concept of SecretClasses and don't even need to run withing k8s.
* Easier for external clients to use as they don't need to know the concept of SecretClasses and don't even need to run
withing k8s.
* The client has to *not* know what pki/secretClass the server is using.

==== Cons

* BIG QUESTION: How should the product operator get the ca cert from the SecretClass it uses to get the *server* cert from?
** The secret-op could e.g. offer an HTTP api to fetch the ca.crt of a given SecretClass or e.g. write the ca.crt into the status of a SecretClass
* BIG QUESTION: How should the product operator get the ca cert from the SecretClass it uses to get the *server* cert
from?
** The secret-op could e.g. offer an HTTP api to fetch the ca.crt of a given SecretClass or e.g. write the ca.crt into
the status of a SecretClass


=== [2] TLS: Include SecretClass in discovery, user can override it

Trino discovery:

[source,yaml]
----
apiVersion: trino.stackable.tech/v1alpha1
Expand Down Expand Up @@ -413,12 +442,18 @@ authentication:
----

==== Pros
* IMPORTANT: This is the only thing the server can know (how he is verifying client identities). He can not recommend an SecretClass used to obtain the client credentials. E.g. he uses an LDAP AuthenticationClass, there is no way it can now what SecretClass provides credentials accepted by LDAP. (Most cases it will be a user logging into a WebUI and the LDAP credentials of the user are not even stored anywhere but just remembered by the user)
* IMPORTANT: This is the only thing the server can know (how he is verifying client identities). He can not recommend an
SecretClass used to obtain the client credentials. E.g. he uses an LDAP AuthenticationClass, there is no way it can
now what SecretClass provides credentials accepted by LDAP. (Most cases it will be a user logging into a WebUI and the
LDAP credentials of the user are not even stored anywhere but just remembered by the user)

==== Cons

* Operator has to read the AuthenticationClass to determine its type (pw/tls/keytab) and set up the needed volumes and commands.
// * The AuthenticationClass is meant to describe "how should a server verify connecting clients" and re-purpose it to mean "how a client should authenticate itself". Image a user creates a Secret `trino-users` with *only* a ca.crt and a SecretClass `trino-users` on top. The connecting client than has no way of knowing how to get a client cert.
* Operator has to read the AuthenticationClass to determine its type (pw/tls/keytab) and set up the needed volumes and
commands.
// * The AuthenticationClass is meant to describe "how should a server verify connecting clients" and re-purpose it to
// mean "how a client should authenticate itself". Image a user creates a Secret `trino-users` with *only* a ca.crt
// and a SecretClass `trino-users` on top. The connecting client than has no way of knowing how to get a client cert.

=== [3] Authentication: Add SecretClass to Discovery Config

Expand All @@ -434,7 +469,8 @@ authentication:
==== Cons

* Operator has to read the SecretClass to determine its type (pw/tls/keytab) and set up the needed volumes and commands.
* Image then SecretClass is of type `k8sSearch`. The connection client (e.g. controlled via superset-operator) than has no idea if he should expect a tls.crd or a keytab when mounting the SecretClass.
* Image then SecretClass is of type `k8sSearch`. The connection client (e.g. controlled via superset-operator) than has
no idea if he should expect a tls.crd or a keytab when mounting the SecretClass.

=== [3] Authentication: Add needed details

Expand All @@ -454,7 +490,8 @@ authentication:

==== Pros

* Operator has *not* to read the SecretClass to determine its type (pw/tls/keytab), as the type is already encoded in the Discovery config.
* Operator has *not* to read the SecretClass to determine its type (pw/tls/keytab), as the type is already encoded in
the Discovery config.

==== Cons

Expand All @@ -464,7 +501,8 @@ Trino discovery does not provide any information on how to authenticate

==== Cons

* Not viable, as users need to know how to connect, and are not expected to try 50 different auth methods. We need to give them a AuthenticationClass, that says them e.g. what LDAP or PKI is used.
* Not viable, as users need to know how to connect, and are not expected to try 50 different auth methods. We need to
give them a AuthenticationClass, that says them e.g. what LDAP or PKI is used.

== Decision Outcome

Expand All @@ -473,7 +511,9 @@ Trino discovery does not provide any information on how to authenticate
[3] Authentication: `Authentication: Add AuthenticationClass to Discovery Config`

=== Appendix A
Let's model a kerberos secured HDFS with the Options "TLS: Include caCert in Discovery config" and "Authentication: Add needed details"

Let's model a kerberos secured HDFS with the Options "TLS: Include caCert in Discovery config" and "Authentication:
Add needed details"

[source,yaml]
----
Expand Down