Customize High Availability
High availability designs follow three main principles:
- Elimination of single points of failure
- Fault detection
- Reliable failover points
Verrazzano provides a means to eliminate single points of failure among critical Verrazzano components. This is accomplished by increasing replica counts, anti-affinity rules, and implementing replicated data for components that rely on MySQL and OpenSearch.
ha.yaml file illustrates how the
prod profile can be extended to configure a highly available Verrazzano installation. The increased replica counts, along with the anti-affinity rules inherited from the
prod profile, ensure that the pods of each component are distributed across the Kubernetes cluster nodes.
MySQL and OpenSearch are configured to replicate data among replicas to avoid data loss.
ha-oci-ccm.yaml file configures a highly available Verrazzano installation on OCNE.
ha-ext-lb.yaml file configures a highly available Verrazzano installation using external load balancers.
Fault detection is managed natively by using Kubernetes
Services and Istio
VirtualServices that detect failed pods and route traffic to the remaining replicas.
MySQL and OpenSearch provide reliable failover points for the replicated data.
The result of these measures would be no loss of service if a cluster node became unavailable. For more information regarding node failure and recovery, read the Node Failure Guide.
When using the
ha.yaml file, consider the following:
- It does not ensure a fault-tolerant environment. Your applications still must be designed and implemented as highly available.
- Running additional replicas of components will increase resource requirements. At least four CPUs, 100 GB disk storage, and 64 GB RAM available on the Kubernetes worker nodes is required.
- Additional customizations may be required for your environment, including other customizations described in individual sections.
Follow these best practices for a highly available Verrazzano installation:
- Size your Kubernetes cluster according to your node failure tolerance and workload requirements.
- Set the default
Storage Classto one with a
WaitForFirstConsumer. This is important for being able to recover from an
Availability Domainor zone failure.
- Set the replica counts to values that correspond to your node failure tolerance.
To install the example high availability configuration using the Verrazzano CLI:
$ vz install -f https://raw.githubusercontent.com/verrazzano/verrazzano/release-1.6/examples/ha/ha.yaml
Using the Verrazzano CLI, install the example high availability configuration on OCNE as follows:
$ vz install -f https://raw.githubusercontent.com/verrazzano/verrazzano/release-1.6/examples/ha/ha-oci-ccm.yaml
Using the Verrazzano CLI, install the example high availability configuration with external load balancers as follows:
$ vz install -f https://raw.githubusercontent.com/verrazzano/verrazzano/release-1.6/examples/ha/ha-ext-lb.yaml
An OKE in-place upgrade scales the cluster to N-1 nodes, where N is the original number of nodes, so you must scale the cluster back to N. This will start a new replacement node using the new node pool. For more information on in-place upgrades, see Performing an In-Place Worker Node Kubernetes Upgrade by Updating an Existing Node Pool.
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.