Fault injection testing

You can test the fault tolerance of your cluster by deleting a VM to inject a fault. After the VM is deleted, you can monitor the availability and recovery of the cluster.

Requirements

Ensure you meet the following requirements before using fault injection testing:

  • You have connected your BigAnimal cloud account with your Azure subscription. See Setting up your Azure Marketplace account for more information.
  • You have permissions in your Azure subscription to view and delete VMs.
  • You have PGD CLI installed. See Installing PGD CLI for more information.
  • You have created a pgd-cli-config.yml file in your home directory. See Configuring PGD CLI for more information.

Fault injection testing steps

Fault injection testing consists of the following steps:

  1. Verifying cluster health
  2. Determining the write leader node for your cluster
  3. Deleting a write leader node from your cluster
  4. Monitoring cluster health

Verifying cluster health

Use the following commands to check your cluster's health, node information, Raft status, replication lag, and write leaders.

pgd check-health -f pgd-cli-config.yml
pgd verify-cluster -f pgd-cli-config.yml
pgd show-nodes -f pgd-cli-config.yml
pgd show-raft -f pgd-cli-config.yml
pgd show-replslots --verbose -f pgd-cli-config.yml
pgd show-subscriptions -f pgd-cli-config.yml
pgd show-groups -f pgd-cli-config.yml

You can use pgd help for more information on these commands.

To list the supported commands, enter:

pgd help

For help with a specific command and its parameters, enter pgd help <command_name>. For example:

pgd help show-nodes
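The health commands listed above can be run in one pass with a small wrapper. The following is a minimal sketch, not part of PGD CLI: the function name run_health_checks and the PGD_BIN override are hypothetical, and the function only reports whether each command exited successfully. It doesn't interpret the output, so you still need to read the results yourself.

```shell
# Run each PGD CLI command from this section against one config file.
# PGD_BIN is a hypothetical override (not a PGD CLI feature), used only
# so the binary can be swapped out, for example when dry-running.
run_health_checks() {
  config=${1:-pgd-cli-config.yml}
  pgd_bin=${PGD_BIN:-pgd}
  failed=0
  for args in "check-health" "verify-cluster" "show-nodes" "show-raft" \
              "show-replslots --verbose" "show-subscriptions" "show-groups"; do
    echo "== pgd $args =="
    # $args is intentionally unquoted so that "--verbose" becomes its own argument.
    if ! "$pgd_bin" $args -f "$config"; then
      echo "command failed: pgd $args" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every command exited with status 0.
  [ "$failed" -eq 0 ]
}
```

For example, run_health_checks pgd-cli-config.yml runs all seven commands and returns a nonzero status if any of them failed to execute.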

Determining the write leader node for your cluster

Use the show-groups command to list the groups in your cluster and the write leader of each group:

pgd show-groups -f pgd-cli-config.yml
Output
Group           Group ID    Type    Parent Group  Write Leader
-----           --------    ----    ------------  ------------
world           3239291720  global                p-x67kjp3fsq-d-1
p-x67kjp3fsq-a  2456382099  data    world         p-x67kjp3fsq-a-1
p-x67kjp3fsq-c  4147262499  data    world
p-x67kjp3fsq-d  3176957154  data    world         p-x67kjp3fsq-d-1

In this example, the write leader node of the data group p-x67kjp3fsq-a is p-x67kjp3fsq-a-1.
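If you want to pick the write leader out of the show-groups output in a script, the lookup can be sketched with awk. This is an illustration only: the function name write_leader_for_group is hypothetical, and it assumes data-group rows have exactly the five whitespace-separated fields shown in the example output (group, group ID, type, parent group, write leader). If your PGD CLI version prints additional columns, adjust the field numbers.

```shell
# Print the write leader of a data group, reading `pgd show-groups` text on stdin.
# Assumption: a data-group row with a write leader has exactly five fields,
# as in the example output above. A row with no write leader has only four.
write_leader_for_group() {
  awk -v group="$1" '
    $1 == group && $3 == "data" && NF == 5 { print $5; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (requires PGD CLI and a valid config file):
#   pgd show-groups -f pgd-cli-config.yml | write_leader_for_group p-x67kjp3fsq-a
```

The function exits with a nonzero status when the group isn't found or has no write leader listed, as is the case for p-x67kjp3fsq-c in the example.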

Deleting a write leader node from your cluster

To delete a write leader node from the cluster:

  1. Log in to BigAnimal.

  2. In a separate browser window, log in to your Microsoft Azure subscription.

  3. In the left navigation of the BigAnimal portal, choose Clusters.

  4. Choose the cluster to test fault injection with and copy the string value from the URL. The string value is located after the underscore.

  5. In your Azure subscription, paste the string into the search box and prefix it with dp- to search for the data plane.

    • From the results, choose the Kubernetes service from the Azure region that your cluster is deployed in.

  6. Identify the Kubernetes service for your cluster.

Note

Don't delete the Azure Kubernetes VMSS or its sub-resources directly here.

  7. Browse to the Data Plane, choose Workloads, and locate the Kubernetes resources for your cluster to delete a chosen node.
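The two string-handling steps above (copying the value after the underscore in the cluster URL, then prefixing it with dp- for the Azure search) can be sketched with shell parameter expansion. The function name data_plane_search_term and the URL in the usage comment are hypothetical. Only the "value after the underscore" and "dp- prefix" rules come from the steps above, so check the result against what you see in your own browser.

```shell
# Build the Azure search term from a BigAnimal cluster URL.
# Assumption: the value of interest is everything after the last underscore,
# up to the next "/", "?", or "#" (if any).
data_plane_search_term() {
  dp_url=$1
  case $dp_url in
    *_*) ;;
    *) echo "no underscore found in URL: $dp_url" >&2; return 1 ;;
  esac
  dp_id=${dp_url##*_}        # strip everything up to and including the last "_"
  dp_id=${dp_id%%[/?#]*}     # drop any trailing path, query, or fragment
  printf 'dp-%s\n' "$dp_id"
}

# Usage with a made-up URL:
#   data_plane_search_term 'https://portal.example.com/clusters/p_abc123'
```

Paste the printed value into the Azure search box to find the data plane.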

Monitoring cluster health

After deleting a cluster node, you can monitor the health of the cluster using the same PGD CLI commands that you used to verify cluster health.
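To watch for recovery without rerunning a command by hand, you can wrap it in a polling loop. The following is a generic sketch: the function name wait_until_ok is hypothetical, and it treats a zero exit status as success. Whether a given PGD CLI command returns a nonzero status for an unhealthy cluster is an assumption you should verify for your version. If it doesn't, inspect the command output instead.

```shell
# Retry a command until it exits with status 0 or the attempts run out.
# Arguments: <max attempts> <seconds between attempts> <command> [args...]
wait_until_ok() {
  attempts=$1
  interval=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "succeeded on attempt $i"
      return 0
    fi
    echo "attempt $i of $attempts failed; retrying in ${interval}s" >&2
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Usage (assumes check-health exits nonzero while the cluster is unhealthy):
#   wait_until_ok 30 10 pgd check-health -f pgd-cli-config.yml
```

Once the loop succeeds, rerun the show-groups command to see which node has taken over as write leader.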