Reported in https://github.com/orgs/stackabletech/discussions/35
Currently we hard-code svc.cluster.local in a lot of places.
This breaks installations that use a non-default cluster domain (see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/), e.g. by setting --cluster-domain=<default-local-domain>.
We need users to be able to configure the Service DNS suffix.
Possible solutions:
- Add a CLI flag (or better: an env var) to the product operators which overrides the svc.cluster.local default.
- More ideal: Somehow let operators detect the DNS suffix of the k8s cluster and use that
- Maybe listener-operator can help us here
- [...]
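Research Tasks
- [...] cluster.local and let the DNS lookup do its thing? What about secret op certs? (timeboxed 2h) -> does not work with secret op
- [...] KUBERNETES_SERVICE_DNS_DOMAIN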
Refinement
Option 1: Use ENV var only
- The operator uses an ENV var (e.g. CLUSTER_DNS_SUFFIX) deployed via Helm (OpenShift will differ)
- Will default to cluster.local if not set
Pro
- Easy to implement
- Straightforward, no need to parse resolv.conf
- Foundation for Option 2 that can be extended
- Implementation does not differ for Kubernetes / OS
Con
- OpenShift/OLM: We cannot set the CLUSTER_DNS_SUFFIX var for the secret and listener operators (due to their special DaemonSet deployment). We can, however, edit the DaemonSet afterwards and add the env var (cumbersome but possible).
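To make Option 1 concrete, here is a minimal sketch in Python (the operators themselves may well be written in another language). The env var name CLUSTER_DNS_SUFFIX and the cluster.local default come from the description above; the helper names, and treating an empty value like an unset one, are assumptions for illustration only.

```python
import os

# The env var name and default are taken from the option description.
ENV_VAR = "CLUSTER_DNS_SUFFIX"
DEFAULT_CLUSTER_DOMAIN = "cluster.local"


def cluster_dns_suffix(env=None):
    """Return the configured cluster domain, or the default if unset.

    Assumption: an empty value is treated like an unset variable, and a
    trailing dot is dropped.
    """
    env = os.environ if env is None else env
    value = env.get(ENV_VAR, "").strip().strip(".")
    return value or DEFAULT_CLUSTER_DOMAIN


def service_fqdn(service, namespace, env=None):
    """Build <service>.<namespace>.svc.<cluster domain> (hypothetical helper)
    instead of hard-coding svc.cluster.local."""
    return f"{service}.{namespace}.svc.{cluster_dns_suffix(env)}"
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.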
Option 2: Use ENV var + kubernetes + dns suffix auto detection
- The operator reads an env var, e.g. CLUSTER_DNS_SUFFIX (containing e.g. my-cluster.local)
  - If it exists, use the suffix provided there and return
- If CLUSTER_DNS_SUFFIX does not exist, determine whether we run in Kubernetes or not via
  - Checking e.g. the KUBERNETES_SERVICE_HOST variable
  - Checking e.g. the KUBERNETES_SERVICE_PORT variable
- If we run in Kubernetes, read and parse the resolv.conf:

      $ cat /etc/resolv.conf
      search sble-operators.svc.cluster.local svc.cluster.local cluster.local
      nameserver 10.243.21.53
      options ndots:5

  We need to pick the shortest entry of the last "search" line (here: cluster.local).
- If we do not run in Kubernetes, we default to cluster.local and return
- If this did not result in a proper DNS suffix, we do not default but error out. There won't be any working deployment in that case.
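The steps above can be sketched roughly as follows, in Python for brevity. This is only an illustration under a few assumptions: both KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be present to count as "running in Kubernetes", an empty CLUSTER_DNS_SUFFIX is treated as unset, and all function and class names are hypothetical.

```python
import os

ENV_VAR = "CLUSTER_DNS_SUFFIX"
DEFAULT_CLUSTER_DOMAIN = "cluster.local"
RESOLV_CONF = "/etc/resolv.conf"


class DnsSuffixError(Exception):
    """Raised when no usable DNS suffix could be determined."""


def suffix_from_resolv_conf(text):
    """Return the shortest entry of the last `search` line, or None."""
    last_search = None
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            last_search = fields[1:]  # a later search line replaces earlier ones
    if not last_search:
        return None
    return min(last_search, key=len).rstrip(".") or None


def running_in_kubernetes(env):
    # Assumption: both variables must be set.
    return "KUBERNETES_SERVICE_HOST" in env and "KUBERNETES_SERVICE_PORT" in env


def read_resolv_conf(path=RESOLV_CONF):
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def detect_dns_suffix(env=None, reader=read_resolv_conf):
    env = os.environ if env is None else env
    # 1. An explicit setting always wins.
    explicit = env.get(ENV_VAR)
    if explicit:
        return explicit
    # 2. Outside Kubernetes there is nothing to detect: use the default.
    if not running_in_kubernetes(env):
        return DEFAULT_CLUSTER_DOMAIN
    # 3. Inside Kubernetes: parse resolv.conf, and fail instead of guessing.
    suffix = suffix_from_resolv_conf(reader())
    if suffix is None:
        raise DnsSuffixError("could not determine the cluster DNS suffix")
    return suffix
```

The `reader` parameter exists only so the file access can be swapped out in tests.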
Pro
- Non-breaking (unless we act upon positive research results from research task 1)
- Definite improvement for Kubernetes and Openshift (the resolv.conf parsing)
- Implementation does not differ for Kubernetes / OS
Con
- OpenShift/OLM: We cannot set CLUSTER_DNS_SUFFIX for the secret and listener operators (due to their special DaemonSet deployment). We can, however, edit the DaemonSet afterwards and add the env var (cumbersome but possible).
- The auto-detection is only a 95% solution, since there may be edge cases we have not considered yet. Users can still set their DNS suffix explicitly using the env var.
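One possible edge case, shown here as a hypothetical example rather than something reported in the discussion: the search list may contain extra domains (for instance ones inherited from the node), and such an entry can be shorter than the cluster domain. The sketch below uses a made-up search line to show the shortest-entry heuristic picking the wrong value, and the env var overriding it. Function names are illustrative.

```python
def shortest_search_entry(resolv_conf_text):
    """Shortest entry of the last `search` line (the heuristic from Option 2)."""
    entries = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            entries = fields[1:]
    return min(entries, key=len) if entries else None


def pick_suffix(env, resolv_conf_text):
    """An explicitly configured CLUSTER_DNS_SUFFIX wins over the heuristic."""
    return env.get("CLUSTER_DNS_SUFFIX") or shortest_search_entry(resolv_conf_text)


# Hypothetical resolv.conf content with an additional short search domain.
TRICKY = "search ns.svc.cluster.local svc.cluster.local cluster.local lan\n"

heuristic_only = pick_suffix({}, TRICKY)  # picks "lan", not the cluster domain
with_override = pick_suffix({"CLUSTER_DNS_SUFFIX": "cluster.local"}, TRICKY)
```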
Option 3: DNS operator (OS) / Custom object (Kubernetes) containing the DNS suffix read by all operators
Pro
- Generic solution utilizing OS DNS operator or kube-dns / coredns
Con
- Operators will differ for kubernetes / openshift
- Currently operators do not "know" if they run on Kubernetes or OS (it's just our templating around that)
- More implementation effort
Option 4: Provide a config map in the namespace of the operators containing common shared settings
- We deploy an additional configmap in the operator namespace containing shared settings like cluster domain
Pro
- Implementation does not differ for Kubernetes / OS
- Simple solution
- Possible preparation for other shared settings
Con
- Which namespace? Possible name collisions if installed in default?
- Not sure if that is possible / acceptable in OS
- Deployed by us or by user?
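A rough idea of how an operator could consume Option 4, assuming the shared ConfigMap is mounted into the operator pod as a volume (Kubernetes then exposes each key as a file). The mount path, the key name clusterDomain, and the fallback to cluster.local are all assumptions for this sketch; none of them are decided in the discussion.

```python
from pathlib import Path

# Hypothetical mount path and key name.
SHARED_SETTINGS_DIR = Path("/stackable/shared-settings")
CLUSTER_DOMAIN_KEY = "clusterDomain"
DEFAULT_CLUSTER_DOMAIN = "cluster.local"


def cluster_domain_from_configmap(mount_dir=SHARED_SETTINGS_DIR):
    """Read the cluster domain from a ConfigMap mounted as a volume.

    Each key of a mounted ConfigMap appears as a file, so the setting is
    the content of <mount_dir>/<key>. Assumption: a missing or empty key
    falls back to the default.
    """
    key_file = Path(mount_dir) / CLUSTER_DOMAIN_KEY
    try:
        value = key_file.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return DEFAULT_CLUSTER_DOMAIN
    return value or DEFAULT_CLUSTER_DOMAIN
```

Reading the ConfigMap through the Kubernetes API instead would avoid the volume mount, but needs additional RBAC permissions for each operator.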
Outcome