Check that an upgrade can be performed on an existing cluster without data loss (cycling demo) #752

@razvan

Description

Test the upgrade of the cycling demo from 25.3 to 25.7 works and the data is preserved.

Summary

  • The demo needs to be patched to give the HBase master more memory.
  • Upgrading the SDP works.
  • HBase fails to start because the 25.3 image and the 25.7 configurations are incompatible.
  • Deleting and recreating the HBase cluster fixes it.
  • The cycling-tripdata table contains the same data after the upgrade.

Protocol

Test SDP release upgrade with the cycling demo

Install the 25.3 demo version

❯ stackablectl demo install --release 25.3 hbase-hdfs-load-cycling-data

HBase shell errors out

❯ kubectl exec -it hbase-master-default-0 -- bin/hbase shell
command terminated with exit code 137

Exit code 137 means the process was killed with SIGKILL. Increasing the HBase master memory limit from 1Gi to 2Gi fixed it, after which the shell worked.

hbase:001:0> describe 'cycling-tripdata'
Table cycling-tripdata is ENABLED
cycling-tripdata, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}
COLUMN FAMILIES DESCRIPTION
...
{NAME => 'started_at', INDEX_BLOCK_ENCODING => 'NONE', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536 B (64KB)'}

12 row(s)
Quota is disabled
Took 0.7271 seconds
hbase:002:0>
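
The memory increase described above can be applied as a merge patch on the HbaseCluster resource. This is a minimal sketch, not the exact command used in this test. It assumes the stacklet is named `hbase` (inferred from the pod name `hbase-master-default-0`) and that the limit lives under `spec.masters.config.resources.memory.limit`; the helper name `hbase_master_memory_patch` is made up for illustration. Check the CRD of the installed operator version before relying on the field path.

```shell
set -eu

# Build a JSON merge patch that sets the HBase master memory limit.
# ASSUMPTION: the field path follows the usual Stackable role resource layout.
hbase_master_memory_patch() {
  # $1 = memory limit, e.g. 2Gi
  printf '{"spec":{"masters":{"config":{"resources":{"memory":{"limit":"%s"}}}}}}\n' "$1"
}

# Print the patch so it can be reviewed before applying it.
hbase_master_memory_patch 2Gi
```

To apply it, pass the generated patch to `kubectl patch hbasecluster hbase --type merge -p "$(hbase_master_memory_patch 2Gi)"`.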

Uninstall the 25.3 operators

demos on  main [$] took 1m42s
❯ stackablectl release uninstall 25.3

Uninstalled release "25.3"

Use "stackablectl release list" to list available releases.

Patch the CRDs (commands copied from the release notes). The NotFound errors concern CRDs that were not installed on this cluster.

Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/airflow-operator/25.7.0/deploy/helm/airflow-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "airflowclusters.airflow.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/authenticationclasses.authentication.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/s3connections.s3.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/s3buckets.s3.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/druid-operator/25.7.0/deploy/helm/druid-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "druidclusters.druid.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/hbaseclusters.hbase.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/hdfsclusters.hdfs.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/hive-operator/25.7.0/deploy/helm/hive-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "hiveclusters.hive.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/kafka-operator/25.7.0/deploy/helm/kafka-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "kafkaclusters.kafka.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/listenerclasses.listeners.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/listeners.listeners.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/podlisteners.listeners.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/nifi-operator/25.7.0/deploy/helm/nifi-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "nificlusters.nifi.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/opa-operator/25.7.0/deploy/helm/opa-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "opaclusters.opa.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/secretclasses.secrets.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/secret-operator/25.7.0/deploy/helm/secret-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "truststores.secrets.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/truststores.secrets.stackable.tech created
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/stackabletech/secret-operator/25.7.0/deploy/helm/secret-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "secretclasses.secrets.stackable.tech" already exists
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/25.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "sparkapplications.spark.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/25.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "sparkhistoryservers.spark.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/25.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "sparkconnectservers.spark.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/sparkapplications.spark.stackable.tech created
customresourcedefinition.apiextensions.k8s.io/sparkhistoryservers.spark.stackable.tech created
customresourcedefinition.apiextensions.k8s.io/sparkconnectservers.spark.stackable.tech created
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/superset-operator/25.7.0/deploy/helm/superset-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "supersetclusters.superset.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/superset-operator/25.7.0/deploy/helm/superset-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "druidconnections.superset.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/trino-operator/25.7.0/deploy/helm/trino-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "trinoclusters.trino.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/trino-operator/25.7.0/deploy/helm/trino-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "trinocatalogs.trino.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/zookeeperclusters.zookeeper.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/zookeeperznodes.zookeeper.stackable.tech replaced
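
Most of the NotFound noise above can be avoided by replacing only the CRDs of operators that are actually installed. The sketch below builds the URLs from the pattern visible in the output above and prints the `kubectl replace` commands instead of running them. The operator list and the helper name `crd_url` are assumptions for illustration, not the commands from the release notes.

```shell
set -eu

# Build the CRD manifest URL for one operator.
# The URL pattern is copied from the kubectl output above.
crd_url() {
  # $1 = operator name without the "-operator" suffix, $2 = release tag
  printf 'https://raw.githubusercontent.com/stackabletech/%s-operator/%s/deploy/helm/%s-operator/crds/crds.yaml\n' "$1" "$2" "$1"
}

# ASSUMPTION: these are the operators this demo needs; adjust to the cluster.
for op in commons secret listener zookeeper hdfs hbase; do
  # Dry run: print the command for review instead of executing it.
  echo "kubectl replace -f $(crd_url "$op" 25.7.0)"
done
```

CRDs that are new in 25.7 cannot be replaced because they do not exist yet. The output above shows this for `truststores.secrets.stackable.tech`, which had to be created instead.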

Install SDP release 25.7

stackablectl release install 25.7

The HBase pods crash loop.

Master logs

2025-07-24T09:43:36,934 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.NumberFormatException: For input string: "${HBASE_SERVICE_PORT}"
    at java.lang.NumberFormatException.forInputString(Unknown Source) ~[?:?]
    at java.lang.Integer.parseInt(Unknown Source) ~[?:?]
    at java.lang.Integer.parseInt(Unknown Source) ~[?:?]
    at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1534) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1270) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.MasterRpcServices.<init>(MasterRpcServices.java:424) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:737) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:670) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:474) ~[hbase-server-2.6.1.jar:2.6.1]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:?]
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3403) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:248) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3423) ~[hbase-server-2.6.1.jar:2.6.1]
2025-07-24T09:43:37,018 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3412) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:248) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3423) ~[hbase-server-2.6.1.jar:2.6.1]
Caused by: java.lang.NumberFormatException: For input string: "${HBASE_SERVICE_PORT}"
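
The stack trace above shows HBase parsing the literal string `${HBASE_SERVICE_PORT}` as an integer, which suggests the placeholder reached the process without being substituted. A quick way to confirm this is to scan the rendered configuration for leftover `${VAR}` placeholders. The following is a minimal sketch; the function name `unresolved_placeholders` and the sample file content are made up for illustration and are not taken from the real hbase-site.xml.

```shell
set -eu

# Print every unresolved ${VAR} placeholder found in the given file,
# one per line, deduplicated. Prints nothing if there are none.
unresolved_placeholders() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | sort -u
}

# Hypothetical sample standing in for a rendered config file.
sample="$(mktemp)"
printf '<value>${HBASE_SERVICE_PORT}</value>\n<value>16000</value>\n' > "$sample"

unresolved_placeholders "$sample"   # prints ${HBASE_SERVICE_PORT}
rm -f "$sample"
```

Because the pod is crash looping, the file to scan would more likely come from the rendered ConfigMap (for example via `kubectl get configmap ... -o yaml`) than from `kubectl exec`.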

Patched the HBase StatefulSet to update the image, but the master then failed with a different error:

+ HBASE_ROLE_NAME=master
+ HBASE_ROLE_SERVICE_PORT=hbase-master-default.default.svc.cluster.local
+ HBASE_PORT_NAME=16000
/stackable/hbase/bin/hbase-entrypoint.sh: line 19: $4: unbound variable
stream closed EOF for default/hbase-master-default-0 (hbase) 

Deleting the stacklet and recreating it with the same HBase version worked.
