Deploy MySQL-MGR Without TopoLVM

This guide provides instructions for deploying MySQL Group Replication (MGR) instances on clusters where TopoLVM is not available, such as clusters running MicroOS (SUSE Linux Enterprise Micro) or other environments that use alternative storage provisioners.

Background

The Challenge

MySQL-MGR typically relies on TopoLVM for dynamic volume provisioning. However, some environments do not support TopoLVM:

  • MicroOS clusters: The root filesystem is read-only, and TopoLVM may not be installed.
  • Bare-metal clusters: Some clusters use local storage with manual PV provisioning.
  • Test/evaluation environments: Lightweight setups that don't include TopoLVM.

Without proper storage configuration, MySQL pods will fail to start due to PVC binding failures or permission issues.

The Solution

Use a local StorageClass with manually provisioned PersistentVolumes (PVs). This guide covers:

  • Dependency installation: Ensure all required operators are installed before deploying MySQL-MGR.
  • StorageClass creation: Set up a local StorageClass with WaitForFirstConsumer binding.
  • PV provisioning: Create local PVs with correct paths, permissions, and reclaim policies.
  • ClickHouse storage fix (optional): Avoid the read-only file system error caused by the query-analytics plugin's hardcoded host path.
  • Instance creation: Deploy MySQL-MGR using the local StorageClass.

Prerequisites

Required Operators

Before deploying MySQL-MGR, ensure the following core operators are installed on the cluster:

Operator                    Purpose
application-services-core   Core platform services (includes etcd-sync, log-agent, monitoring stack, etc.)
rds-operator                Relational database service operator

If you need query analytics features (slow query logging, query analysis), the following operators are also required:

Operator                    Purpose
clickhouse-operator         ClickHouse database operator (used by query analytics)
query-analytics-operator    Query analytics and slow query logging

WARNING

On read-only filesystems (e.g., MicroOS), you must configure the ClickHouse storage path in the RdsInstaller CR before installing the query-analytics-operator. Otherwise, the operator will immediately create a ClickHouse PV at the default /cpaas/ck host path, which will fail with read-only file system. See Step 3 for details.

INFO

If your cluster uses TopoLVM, the topolvm-operator is also required. Since this guide covers non-TopoLVM deployments, it is not needed here.

You can verify that the operators are installed by checking their CSVs:

kubectl get csv -A | grep -E "application-services-core|rds-operator|clickhouse-operator|query-analytics-operator"

All operators should show Succeeded in the PHASE column.

Cluster Requirements

  • At least 3 worker nodes are recommended for production deployments (MySQL-MGR requires 3 members for group replication quorum). Single-member deployments are possible for testing.
  • Sufficient CPU and memory resources on each node.
  • A writable directory path available on all nodes (e.g., /opt/local-pv/ on MicroOS).
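A quick way to confirm the last requirement is a small write probe, run on each node (for example over SSH). This is a sketch; the PV_BASE variable is just a convenience I've added so the path is easy to adapt:

```shell
# Probe whether the PV base path is writable on this node.
# Defaults to /opt/local-pv; override by exporting PV_BASE.
base="${PV_BASE:-/opt/local-pv}"
mkdir -p "$base" \
  && touch "$base/.write-probe" \
  && rm "$base/.write-probe" \
  && echo "$base is writable"
```

If the probe fails with read-only file system, pick a different base path before continuing.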

Step 1: Create a Local StorageClass

If your cluster does not already have a suitable StorageClass, create one:

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mgr-local-pv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF
WARNING

volumeBindingMode: WaitForFirstConsumer is required so that PVs are bound only after a pod is scheduled to a specific node. This ensures data locality.

INFO

The StorageClass name is arbitrary. This guide uses mgr-local-pv to distinguish it from the Rancher local-path dynamic provisioner. You can use any name, but must reference it consistently in subsequent steps.

Step 2: Provision Local PersistentVolumes

Each MySQL-MGR member requires one PV. For a 3-member cluster, create at least 3 PVs, distributed across different nodes.

Create Directories on Nodes

On MicroOS, the root filesystem is read-only. Use /opt/local-pv/ as the base path:

# Run on each worker node, or use a privileged pod
for i in $(seq 1 9); do
  mkdir -p /opt/local-pv/pv$i
  chmod 777 /opt/local-pv/pv$i
done
WARNING
  • Path: On MicroOS, do not use /data or other root-level paths — they will fail with read-only file system. Use /opt/local-pv/ instead.
  • Permissions: chmod 777 is a convenience workaround because MySQL containers run as a non-root user with a UID/GID that may not match the host. Without write permissions, mysqld --initialize will fail with Permission denied. For production environments, consider setting ownership to the specific UID used by the MySQL container (typically 999:999) instead of using 777.
  • Stale data: If reusing PV directories, remove all existing files first. Note that rm -rf /opt/local-pv/pvN/* does not remove hidden dot-files; MySQL initialization fails if the data directory contains any file at all.
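If you prefer not to use chmod 777, a tighter setup hands the directories to the container user instead. This is a sketch under one assumption: that the MySQL image runs as UID/GID 999, which you should confirm (for example with kubectl exec <mysql-pod> -- id) before relying on it. The PV_BASE variable is my convenience for adapting the path:

```shell
# Restrict PV directories to the MySQL container user (assumed UID/GID 999).
# chown requires root; run this as root on each node.
base="${PV_BASE:-/opt/local-pv}"
for i in $(seq 1 9); do
  mkdir -p "$base/pv$i"
  chown 999:999 "$base/pv$i" 2>/dev/null || echo "chown $base/pv$i requires root"
  chmod 700 "$base/pv$i"   # owner-only access for the data directory
done
```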

Create PV Resources

First, identify your node names:

kubectl get nodes -o wide

Then create PVs, mapping each to a specific node. For a 3-member cluster across 3 nodes:

# Define your node names
NODE1=<node1-hostname>
NODE2=<node2-hostname>
NODE3=<node3-hostname>

# Create 3 PVs per node (9 total) to allow for scaling and reprovisioning
nodes=("$NODE1" "$NODE2" "$NODE3")
for n in "${!nodes[@]}"; do
  node="${nodes[$n]}"
  for i in 1 2 3; do
    idx=$(( n * 3 + i ))
    kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-$idx
spec:
  capacity:
    storage: 20Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: mgr-local-pv
  local:
    path: /opt/local-pv/pv$idx
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - $node
EOF
  done
done
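If you'd rather inspect the manifests before applying them, the same loop can write everything to a single file first. This is an equivalent sketch of the loop above; the node names are placeholders to substitute:

```shell
# Generate all nine PV manifests into one file for review before applying.
out=local-pvs.yaml
: > "$out"
idx=0
for node in node-a node-b node-c; do   # placeholders: substitute your node names
  for i in 1 2 3; do
    idx=$((idx + 1))
    cat >> "$out" <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-$idx
spec:
  capacity:
    storage: 20Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: mgr-local-pv
  local:
    path: /opt/local-pv/pv$idx
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - $node
EOF
  done
done
echo "wrote $idx manifests to $out"
```

Inspect the file, then apply it with kubectl apply -f local-pvs.yaml.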

Verify that all PVs are created and available:

# The label selector excludes PVs that carry the pv-is-managed label
kubectl get pv -l '!pv-is-managed' | grep mgr-local-pv

All PVs should show Available status before proceeding.

INFO
  • Use persistentVolumeReclaimPolicy: Delete instead of Retain. With Retain, PVs go to Released state after PVC deletion and cannot be reused without manual cleanup.
  • Note that for manually provisioned local PVs, deleting the PV object does not automatically clean up the data on disk. You may still need to manually clear directories before reuse.
  • Create more PVs than the minimum required to accommodate scaling and reprovisioning. However, extra PVs do not help with failover: if a node is lost, the PVs bound to it cannot be migrated. Recovery from node loss requires manual reprovisioning and data resync.

Step 3 (Optional): Configure ClickHouse Storage Path on Read-Only Filesystems

INFO

This step is only needed if you plan to use query analytics features (slow query logging, query analysis). If you do not use query analytics, skip this step — MySQL-MGR instances will function normally without it.

The query-analytics plugin deploys ClickHouse using a host directory PV. By default, the host path is /cpaas/ck. On MicroOS or other systems with a read-only root filesystem, this causes ClickHouse pods to fail with:

mkdir /cpaas: read-only file system
WARNING

You must configure the hostPath before installing the query-analytics-operator. Once installed, the operator immediately reconciles and creates the ClickHouse PV with the default /cpaas/ck path, which will fail on read-only filesystems.

Configure hostPath before installing query-analytics-operator

  1. Install the rds-operator first (if not already installed). The RdsInstaller CR is created by the rds-operator during its initialization.

  2. Find the RdsInstaller resource:

    kubectl get rdsinstaller -A
  3. Edit the RdsInstaller to override the ClickHouse host path:

    kubectl edit rdsinstaller <name> -n <namespace>

    Set spec.slowSQLCK.hostPath to a writable path on the node:

    spec:
      slowSQLCK:
        hostPath: /opt/local-pv/ck
    INFO

    On MicroOS, use a path under /opt/ (e.g., /opt/local-pv/ck). The directory will be created automatically with DirectoryOrCreate type.

  4. Now install the clickhouse-operator and query-analytics-operator. The ClickHouse PV will be created using the configured path instead of the default /cpaas/ck.

Fix an existing deployment with the wrong hostPath

If the query-analytics-operator was already installed with the default path and ClickHouse pods are failing, update the hostPath as above, then delete the existing PV and recreate the ClickHouse pods:

# Find and delete the old PV with the wrong hostPath
kubectl get pv | grep ck
kubectl delete pv <pv-name>

# Find ClickHouse pods
kubectl get pod -n <namespace> | grep clickhouse
# Delete them to trigger recreation with the new path
kubectl delete pod -n <namespace> <clickhouse-pod-names>

Step 4: Create a MySQL-MGR Instance

  1. Create the password secret:

    kubectl -n ${namespace} create secret generic mgr-${instance_name}-password \
      --from-literal=clusterchecker=${password} \
      --from-literal=exporter=${password} \
      --from-literal=manage=${password} \
      --from-literal=root=${password}
  2. Create the MySQL CR with the local StorageClass:

    kubectl apply -n ${namespace} -f - <<EOF
    apiVersion: middleware.alauda.io/v1
    kind: Mysql
    metadata:
      labels:
        mysql/arch: mgr
      name: ${instance_name}
    spec:
      mgr:
        enableStorage: true
        members: 3
        monitor:
          enable: true
        resources:
          server:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: "2"
              memory: 4Gi
        router:
          replicas: 2
          resources:
            limits:
              cpu: 800m
              memory: 640Mi
            requests:
              cpu: 800m
              memory: 640Mi
          svcRO:
            type: ClusterIP
          svcRW:
            type: ClusterIP
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: 20Gi
            storageClassName: mgr-local-pv
      params:
        mysql:
          mysqld:
            character_set_server: utf8mb4
            default_storage_engine: InnoDB
            default_time_zone: "+08:00"
      version: "8.0"
    EOF
    INFO
    • The password secret must follow the naming convention mgr-${instance_name}-password. The operator discovers it automatically by this name.
    • Set storageClassName: mgr-local-pv (or your StorageClass name) in the volumeClaimTemplate section. This replaces the default sc-topolvm.
    • The version field is required. Use "8.0" for MySQL 8.0.
  3. Monitor the instance status:

    kubectl get mysql ${instance_name} -n ${namespace} -w

    Wait until the STATE field shows ready.

  4. Verify PVC binding and pod placement:

    kubectl get pvc -n ${namespace}
    kubectl get pod -n ${namespace} -o wide | grep ${instance_name}

    Confirm that each PVC is bound to a PV and that pods are distributed across different nodes.
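For the password value used in the secret from step 1, any sufficiently strong string works. One way to generate a random one (purely illustrative, not a product requirement):

```shell
# Generate a 20-character alphanumeric password for the secret in step 1.
password="$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 20)"
echo "generated a ${#password}-character password"
```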

Troubleshooting

PVCs Stuck in Pending

Symptom: PVCs remain in Pending state and pods are not scheduled.

Cause: No available PVs match the StorageClass name or node affinity, or PVs are in Released state.

Fix:

# Check PV status
kubectl get pv | grep mgr-local-pv
# Check PVC events for details
kubectl describe pvc -n <namespace> <pvc-name>
# Verify StorageClass name matches between PV and PVC
kubectl get pv <pv-name> -o jsonpath='{.spec.storageClassName}'

MySQL Pods Stuck in CrashLoopBackOff

Symptom: MySQL pods crash with Permission denied during initialization.

Cause: PV directories do not have write permissions for the non-root MySQL user.

Fix:

# On the node where the pod is scheduled
chmod 777 /opt/local-pv/pvN

MySQL Pods Fail with "data directory has files in it"

Symptom: mysqld --initialize fails because the data directory is not empty.

Cause: PV directory contains stale data from a previous deployment.

Fix:

# On the node, or via a privileged pod
rm -rf /opt/local-pv/pvN/*

PVs Stuck in Released State

Symptom: PVs show Released status and cannot be bound to new PVCs.

Cause: PVs were created with persistentVolumeReclaimPolicy: Retain.

Fix: Delete the Released PVs and recreate them with persistentVolumeReclaimPolicy: Delete:

kubectl delete pv <pv-name>
# Then recreate following Step 2

Instance Stuck in ErrorReconcile

Symptom: The MySQL instance shows ErrorReconcile condition and the STATE field shows available instead of ready, even though all pods are running.

Cause: The operator's reconcile loop hit a conflict error updating the CR status.

Fix: Annotate the MySQL CR to trigger a fresh reconciliation:

kubectl annotate mysql ${instance_name} -n ${namespace} \
  force-reconcile="$(date +%s)" --overwrite

ClickHouse Pods Fail with "read-only file system"

Symptom: ClickHouse pods fail to start with mkdir /cpaas: read-only file system.

Cause: The query-analytics plugin's default ClickHouse host path (/cpaas/ck) does not exist on read-only filesystems.

Fix: Follow Step 3 to set spec.slowSQLCK.hostPath in the RdsInstaller CR to a writable path.