Customize OpenSearch

Customize your OpenSearch cluster configuration

Verrazzano supports two cluster topologies for an OpenSearch cluster:

  • A single-node cluster: master, ingest, and data roles performed by a single node.
  • A multi-node cluster configuration with separate master, data, and ingest nodes.

For information about the default OpenSearch cluster configurations provided by Verrazzano, see Installation Profiles.

Plan cluster topology

Start with an initial estimate of your hardware needs. The following recommendations provide educated initial estimates; for ideal sizing, test them with representative workloads, monitor their performance, and then iterate.

Storage requirements

Input | Description | Value
\(s\) | Stored data size in GiB (log size per day * days to retain). | User defined
\(sr\) | Shard replica count per index. | User defined
\(o\) | Overall overhead, which is a constant. | 1.45

Minimum storage requirement = \( ( s * ( 1 + sr ) ) * o \)

Example

Suppose that \(s\) = 66 GiB (6 GiB of logs per day * 11 days of retention) and that you choose one shard replica per index, so \(sr\) = 1.

Then, minimum storage requirement = \((66 * (1 + 1)) * 1.45\) = 191.4 GiB, which rounds up to 192 GiB.

Overhead, which is defined in the previous table, can be further explained as follows.

Input | Description | Value
\(io\) | Indexing overhead: extra space used beyond the actual data, generally 10% (0.1) of the index size. | 1 + 0.1 = 1.1
\(lrs\) | Linux reserved space: Linux reserves 5% of the file system for the root user for some OS operations. | 1 - 0.05 = 0.95
\(oo\) | OpenSearch overhead: OpenSearch keeps a maximum of 20% of the instance for segment merges, logs, and other internal operations. | 1 - 0.2 = 0.8

Overall overhead \(o\) = \( io / lrs / oo \) = \( 1.1 / 0.95 / 0.8 \) ≈ 1.45
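
The storage formula and its overhead constant can be checked with a few lines of Python. This is a minimal sketch, not a Verrazzano tool: the function and variable names are illustrative assumptions, and rounding the result up to a whole GiB follows the example above.

```python
import math

# Overhead factors from the table above.
INDEXING_OVERHEAD = 1 + 0.1        # io
LINUX_RESERVED_SPACE = 1 - 0.05    # lrs
OPENSEARCH_OVERHEAD = 1 - 0.2      # oo

# o = io / lrs / oo, which is about 1.447 and is rounded to 1.45 in this guide.
overall_overhead = INDEXING_OVERHEAD / LINUX_RESERVED_SPACE / OPENSEARCH_OVERHEAD


def minimum_storage_gib(stored_gib, shard_replicas, overhead=1.45):
    """Return (s * (1 + sr)) * o, rounded up to a whole GiB."""
    return math.ceil(stored_gib * (1 + shard_replicas) * overhead)


print(round(overall_overhead, 2))
print(minimum_storage_gib(66, 1))
```

With the example inputs (66 GiB of stored data, one replica), this prints 1.45 and 192.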

Memory

For every 100 GiB of your storage requirement, you should have 8 GiB of memory.

With reference to the Example:

A storage requirement of 192 GiB rounds up to two 100 GiB increments, so you need 16 GiB of memory.

Number of data nodes

Input | Description | Value
\(ts\) | Total storage in GiB. | User defined
\(mem\) | Memory per data node in GiB. | User defined
\(md\) | Memory:data ratio (a 1:30 ratio means that the node has 30 times more storage than RAM; the value used would be 30). | User defined
\(fc\) | One data node for failover capacity, which is a constant. | 1

Number of data nodes = ROUNDUP \( ts / mem / md + fc \)

With reference to the Example:

\(ts\) = 192 GiB, \(mem\) = 8 GiB, \(md\) = 10 (a 1:10 ratio), and \(fc\) = 1

Then, number of data nodes = ROUNDUP \( 192 / 8 / 10 + 1 \) = ROUNDUP \( 3.4 \) = 4
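
The data node formula can be evaluated directly. The following is a small sketch with illustrative names, not part of Verrazzano. For the example inputs, the value before rounding is 192 / 8 / 10 + 1 = 3.4, so rounding up gives 4.

```python
import math


def data_node_count(total_storage_gib, memory_per_node_gib, memory_data_ratio, failover_nodes=1):
    """Return ROUNDUP(ts / mem / md + fc)."""
    unrounded = total_storage_gib / memory_per_node_gib / memory_data_ratio + failover_nodes
    return math.ceil(unrounded)


# Example inputs: ts = 192 GiB, mem = 8 GiB, md = 10 (a 1:10 ratio), fc = 1.
print(data_node_count(192, 8, 10))
```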

JVM heap memory

The heap size is the amount of RAM allocated to the JVM of an OpenSearch node. The OpenSearch process is memory intensive, so allocate close to 50% of the memory available on a node to the JVM, which uses it for indexing and search operations. The other 50% is required for the file system cache, which keeps regularly accessed data in memory. As a general rule, set -Xms and -Xmx to the same value: 50% of your total available RAM, subject to a maximum of approximately 31 GiB.
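
As a rough illustration of this rule, the following sketch derives matching -Xms and -Xmx values from a node's RAM. The helper name is hypothetical, and how the resulting flags are passed to the OpenSearch pods is outside the scope of this sketch.

```python
MAX_HEAP_MIB = 31 * 1024  # the approximate 31 GiB ceiling mentioned above


def jvm_heap_flags(node_ram_gib):
    """Return identical -Xms/-Xmx flags sized to half of the node's RAM, capped at 31 GiB."""
    heap_mib = min(int(node_ram_gib * 1024 / 2), MAX_HEAP_MIB)
    return f"-Xms{heap_mib}m -Xmx{heap_mib}m"


print(jvm_heap_flags(16))   # half of 16 GiB
print(jvm_heap_flags(128))  # capped at the 31 GiB ceiling
```

A 16 GiB node yields -Xms8192m -Xmx8192m, and a 128 GiB node is capped at -Xms31744m -Xmx31744m.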

CPU

Hardware requirements vary dramatically by workload, but two vCPU cores for every 100 GiB of your storage requirement are typically sufficient.

With reference to the Example:

For 192 GiB of storage, the vCPU cores required are four.
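
The per-100-GiB rules of thumb for vCPU cores and for memory share the same shape. The sketch below assumes that storage is rounded up to whole 100 GiB increments, which is inferred from the worked examples in this guide (192 GiB gives four vCPU cores and 16 GiB of memory); the helper name is illustrative.

```python
import math


def per_100_gib(storage_gib, units_per_100_gib):
    """Scale a per-100-GiB rule of thumb, rounding storage up to whole 100 GiB increments."""
    return math.ceil(storage_gib / 100) * units_per_100_gib


storage_gib = 192
print(per_100_gib(storage_gib, 2))  # vCPU cores
print(per_100_gib(storage_gib, 8))  # GiB of memory
```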

Shard size

For logging, shard sizes between 10 GiB and 50 GiB typically perform well. For search-intensive operations, 10-25 GiB is typically a good shard size. As a best practice, keep each OpenSearch shard at or below 50 GiB. When shards exceed 50 GiB, you will have to reindex your data.

Primary shards count

Input | Description | Value
\(s\) | Stored data size in GiB (log size per day * days to retain). | User defined
\(sh\) | Desired shard size in GiB. | User defined
\(io\) | Indexing overhead: extra space used beyond the actual data, generally 10% of the index size. | 0.1

Primary shards = \( ( s * (1 + io) ) / sh \)

With reference to the Example:

\(s\) = 66 GiB and you choose a shard size of \(sh\) = 30 GiB

Then, primary shards count = \( ( 66 * 1.1 ) / 30 \) ≈ 2.42, which rounds to 2
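
The same calculation as a short sketch. Rounding to the nearest whole shard (with a minimum of one) is an assumption based on the example result; the function name is illustrative. The second printed value is the resulting size per primary shard, which can be compared against the 10-50 GiB guidance.

```python
def primary_shard_count(stored_gib, shard_size_gib, indexing_overhead=0.1):
    """Return (s * (1 + io)) / sh, rounded to the nearest whole shard (minimum 1)."""
    exact = stored_gib * (1 + indexing_overhead) / shard_size_gib
    return max(1, round(exact))


shards = primary_shard_count(66, 30)
print(shards)
print(round(66 * 1.1 / shards, 1))  # GiB per primary shard
```

For the example inputs, this prints 2 and 36.3.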

You can customize Prometheus to enable Alertmanager and configure recommended alerts (by adding alert rules) so that you gain insight into your OpenSearch cluster and can act before problems occur.

Use the OSDataNodeFilesystemSpaceFillingUp alert to indicate that the OpenSearch average disk usage has exceeded the specified threshold. Adjust the alert thresholds according to your needs.

kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-operator-os
  namespace: verrazzano-monitoring
spec:
  groups:
    - name: os
      rules:
        - alert: OSDataNodeFilesystemSpaceFillingUp
          annotations:
            runbook_url: <link to runbook>
            summary: OpenSearch average disk usage exceeded 75%.
          expr: |-
            1 - (es_fs_total_available_bytes{node=~".*data.*"} / es_fs_total_total_bytes) > 0.75
          for: 30m
          labels:
            severity: warning
EOF
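
To make the arithmetic in the alert expression concrete, the following sketch applies the same comparison to sample byte counts. It only mirrors the expression; the function names and sample values are illustrative, and the real alert additionally requires the condition to hold for 30 minutes.

```python
def disk_usage_ratio(available_bytes, total_bytes):
    """Fraction of the file system in use: 1 - available / total."""
    return 1 - available_bytes / total_bytes


def exceeds_threshold(available_bytes, total_bytes, threshold=0.75):
    """Apply the same comparison as the alert expression."""
    return disk_usage_ratio(available_bytes, total_bytes) > threshold


GIB = 1024 ** 3
print(exceeds_threshold(20 * GIB, 100 * GIB))  # 80% used
print(exceeds_threshold(40 * GIB, 100 * GIB))  # 60% used
```

The first call prints True and the second prints False. To alert earlier or later, change the threshold in the rule's expr accordingly.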

Configure cluster topology

You can customize the node characteristics of your OpenSearch cluster by using the spec.components.opensearch.nodes field in the Verrazzano custom resource. When installing or upgrading Verrazzano, you can use this field to define an OpenSearch cluster using node groups.

The following example overrides the OpenSearch configuration of the dev installation profile (a single-node cluster with 1Gi of memory and ephemeral storage) to use a multi-node cluster (three master nodes and three combined data/ingest nodes) with persistent storage.

apiVersion: install.verrazzano.io/v1beta1
kind: Verrazzano
metadata:
  name: custom-opensearch-example
spec:
  profile: dev
  components:
    opensearch:
      nodes:
        - name: master
          replicas: 3
          roles:
            - master
          storage:
            size: 50Gi
          resources:
            requests:
              memory: 1.5Gi
        - name: data-ingest
          replicas: 3
          roles:
            - data
            - ingest
          storage:
            size: 100Gi
          resources:
            requests:
              memory: 1Gi
        # Override the default node groups because we are providing our own topology.
        - name: es-master
          replicas: 0
        - name: es-data
          replicas: 0
        - name: es-ingest
          replicas: 0

Listing the pods and persistent volumes in the verrazzano-system namespace for the previous configuration shows that the expected nodes are running with the appropriate data volumes.

$ kubectl get pvc,pod -l verrazzano-component=opensearch -n verrazzano-system

# Sample output
NAME                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/elasticsearch-master-vmi-system-master-0   Bound    pvc-9ace042a-dd68-4975-816d-f2ca0dc4d9d8   50Gi       RWO            standard       5m22s
persistentvolumeclaim/elasticsearch-master-vmi-system-master-1   Bound    pvc-8bf68c2c-235e-4bd5-8741-5a5cd3453934   50Gi       RWO            standard       5m21s
persistentvolumeclaim/elasticsearch-master-vmi-system-master-2   Bound    pvc-da8a48b1-5762-4669-98f0-8479f30043fc   50Gi       RWO            standard       5m21s
persistentvolumeclaim/vmi-system-data-ingest                     Bound    pvc-7ad9f275-632b-4aac-b7bf-c5115215937c   100Gi      RWO            standard       5m23s
persistentvolumeclaim/vmi-system-data-ingest-1                   Bound    pvc-8a293e51-2c20-4cae-916b-1ce46a780403   100Gi      RWO            standard       5m23s
persistentvolumeclaim/vmi-system-data-ingest-2                   Bound    pvc-0025fcef-1d8c-4307-977c-3921545c6730   100Gi      RWO            standard       5m22s

NAME                                                   READY   STATUS     RESTARTS   AGE
pod/coherence-operator-6ffb6bbd4d-bpssc                1/1     Running    1          8m2s
pod/fluentd-ndshl                                      2/2     Running    0          5m51s
pod/oam-kubernetes-runtime-85cfd899d8-z9gv6            1/1     Running    0          8m14s
pod/verrazzano-application-operator-5fbcdf6655-72tw9   1/1     Running    0          7m49s
pod/verrazzano-authproxy-5f9d479455-5bvvt              2/2     Running    0          7m43s
pod/verrazzano-console-5b857d7b47-djbrk                2/2     Running    0          5m51s
pod/verrazzano-monitoring-operator-b4b446567-pgnfw     2/2     Running    0          5m51s
pod/vmi-system-data-ingest-0-5485dcd95d-rkhvk          2/2     Running    0          5m21s
pod/vmi-system-data-ingest-1-8d7db6489-kdhbv           2/2     Running    1          5m21s
pod/vmi-system-data-ingest-2-699d6bdd9c-z7nzx          2/2     Running    0          5m21s
pod/vmi-system-grafana-7947cdd84b-b7mks                2/2     Running    0          5m21s
pod/vmi-system-kiali-6c7bd6658b-d2zq9                  2/2     Running    0          5m37s
pod/vmi-system-osd-7d47f65dfc-zhjxp                    2/2     Running    0          5m21s
pod/vmi-system-master-0                                2/2     Running    0          5m21s
pod/vmi-system-master-1                                2/2     Running    0          5m21s
pod/vmi-system-master-2                                2/2     Running    0          5m21s
pod/weblogic-operator-666b548749-lj66t                 2/2     Running    0          7m48s

Running the command kubectl describe pod -n verrazzano-system vmi-system-data-ingest-0-5485dcd95d-rkhvk shows the requested amount of memory.

Containers:
  es-data:
    ...
    Requests:
      memory:   1Gi

Default Index State Management policies

Index State Management (ISM) policies configure OpenSearch to manage the data in your indices. You can use policies to automatically roll over and prune old data, which prevents your OpenSearch cluster from running out of disk space.

To help you manage issues, such as low disk space, the following two ISM policies are created by default:

  • vz-system: Manages the data in the Verrazzano system index.

  • vz-application: Manages the data in the application-related indices that match the pattern verrazzano-application*.

Both ISM policies have three states:

  • Hot: This is the default state. If the primary shard size is greater than the defined size (5 GB for vz-system and 1 GB for vz-application) or the index age is greater than the defined number of days (30 days for vz-system and 7 days for vz-application), then the index will be rolled over.
  • Cold: In this state, the index will be closed if the index age is greater than the defined number of days (30 days for vz-system and 7 days for vz-application). A closed index is blocked for read and write operations and does not allow any of the operations that open indices allow.
  • Delete: In this state, the index will be deleted if the index age is greater than the defined number of days (35 days for vz-system and 12 days for vz-application).
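
The thresholds of the two default policies can be summarized as data. The sketch below is a simplified model that only restates the numbers listed above; it does not reproduce how OpenSearch evaluates ISM states and transitions, and all names in it are illustrative.

```python
# Thresholds restated from the state descriptions above.
DEFAULT_POLICY_THRESHOLDS = {
    "vz-system": {"rollover_gb": 5, "rollover_days": 30, "close_days": 30, "delete_days": 35},
    "vz-application": {"rollover_gb": 1, "rollover_days": 7, "close_days": 7, "delete_days": 12},
}


def thresholds_crossed(policy_name, index_age_days, primary_shard_gb):
    """List which of the policy's thresholds an index has already exceeded."""
    t = DEFAULT_POLICY_THRESHOLDS[policy_name]
    crossed = []
    if primary_shard_gb > t["rollover_gb"] or index_age_days > t["rollover_days"]:
        crossed.append("rollover")
    if index_age_days > t["close_days"]:
        crossed.append("close")
    if index_age_days > t["delete_days"]:
        crossed.append("delete")
    return crossed


print(thresholds_crossed("vz-application", 8, 0.5))
print(thresholds_crossed("vz-system", 2, 6))
```

An eight-day-old application index has passed the rollover and close thresholds, while a two-day-old system index with a 6 GB primary shard has passed only the rollover threshold.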

Override default ISM policies

The vz-system and vz-application policies are immutable and any change to these policies will be reverted immediately. However, the following two methods will override this behavior:

  • Disable default policies: You can disable the use of these default policies by setting the flag spec.components.opensearch.disableDefaultPolicy to true in the Verrazzano CR. This will delete the default ISM policies.
  • Override default policies: Both default policies have a priority of zero (0). You can override them by creating policies with policy.ism_template.priority greater than 0. To create or configure your own policies, see Configure ISM policies.

Configure ISM policies

Verrazzano lets you configure OpenSearch ISM policies using the Verrazzano custom resource. The ISM policy created by Verrazzano will contain two states: ingest and delete. The ingest state can be configured only for the rollover action. The rollover action for the ingest state will be configured based on the rollover configuration provided in the Verrazzano custom resource.

The following policy example configures OpenSearch to manage indices matching the pattern my-app-*. Indices are automatically deleted when they are 14 days old, and are rolled over if an index meets at least one of the following criteria:

  • Is three or more days old
  • Contains 1,000 documents or more
  • Is 10 GB in size or larger

apiVersion: install.verrazzano.io/v1beta1
kind: Verrazzano
metadata:
  name: custom-opensearch-example
spec:
  profile: dev
  components:
    opensearch:
      policies:
        - policyName: my-app
          indexPattern: my-app-*
          minIndexAge: 14d
          rollover:
            minIndexAge: 3d
            minDocCount: 1000
            minSize: 10gb

The previous Verrazzano custom resource will generate the following ISM policy.

{
  "_id" : "my-app",
  "_version" : 17,
  "_seq_no" : 16,
  "_primary_term" : 1,
  "policy" : {
    "policy_id" : "my-app",
    "description" : "__vmi-managed__",
    "last_updated_time" : 1671096525963,
    "schema_version" : 12,
    "error_notification" : null,
    "default_state" : "ingest",
    "states" : [
      {
        "name" : "ingest",
        "actions" : [
          {
            "rollover" : {
              "min_size" : "10gb",
              "min_doc_count" : 1000,
              "min_index_age" : "3d"
            }
          }
        ],
        "transitions" : [
          {
            "state_name" : "delete",
            "conditions" : {
              "min_index_age" : "14d"
            }
          }
        ]
      },
      {
        "name" : "delete",
        "actions" : [
          {
            "delete" : { }
          }
        ],
        "transitions" : [ ]
      }
    ],
    "ism_template" : [
      {
        "index_patterns" : [
          "my-app-*"
        ],
        "priority" : 1,
        "last_updated_time" : 1671096525963
      }
    ]
  }
}
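
The relationship between the custom resource fields and the generated policy can be expressed as a small mapping. This sketch only illustrates the field correspondence visible in the two examples above; it is not Verrazzano's implementation, the function name is hypothetical, and server-generated metadata (such as _version, schema_version, and last_updated_time) is omitted.

```python
import json


def to_ism_policy(policy_name, index_pattern, min_index_age, rollover):
    """Build the 'policy' body that corresponds to one policies entry in the custom resource."""
    # Map the camelCase rollover fields in the custom resource to ISM's snake_case names.
    field_names = {
        "minIndexAge": "min_index_age",
        "minDocCount": "min_doc_count",
        "minSize": "min_size",
    }
    rollover_action = {field_names[key]: value for key, value in rollover.items()}
    return {
        "policy_id": policy_name,
        "description": "__vmi-managed__",
        "default_state": "ingest",
        "states": [
            {
                "name": "ingest",
                "actions": [{"rollover": rollover_action}],
                "transitions": [
                    {"state_name": "delete", "conditions": {"min_index_age": min_index_age}}
                ],
            },
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
        "ism_template": [{"index_patterns": [index_pattern], "priority": 1}],
    }


policy = to_ism_policy(
    "my-app", "my-app-*", "14d",
    {"minIndexAge": "3d", "minDocCount": 1000, "minSize": "10gb"},
)
print(json.dumps(policy, indent=2))
```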

NOTE: The ISM policy created using the Verrazzano custom resource contains a minimal set of configurations. To create a more detailed ISM policy, you can also use the OpenSearch REST API. To create a policy using the OpenSearch API, do the following:

$ PASS=$(kubectl get secret \
    --namespace verrazzano-system verrazzano \
    -o jsonpath='{.data.password}' | base64 \
    --decode; echo)

$ HOST=$(kubectl get ingress \
    -n verrazzano-system vmi-system-os-ingest \
    -o jsonpath='{.spec.rules[0].host}')

$ curl -ik -X PUT --user "verrazzano:$PASS" "https://$HOST/_plugins/_ism/policies/policy_3" \
    -H 'Content-Type: application/json' \
    --data-binary @- << EOF
{
  "policy": {
    "description": "ingesting logs",
    "default_state": "ingest",
    "states": [
      {
        "name": "ingest",
        "actions": [
          {
            "rollover": {
              "min_doc_count": 5
            }
          }
        ],
        "transitions": [
          {
            "state_name": "search"
          }
        ]
      },
      {
        "name": "search",
        "actions": [],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "5m"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ]
  }
}
EOF

To view existing policies, do the following:

$ curl -ik \
    --user "verrazzano:$PASS" "https://$HOST/_plugins/_ism/policies"

Override the default index template

Verrazzano provides a default index template, verrazzano-data-stream. For creating an index, the default index template has a few predefined settings, such as the number of shards and replicas and dynamic mappings for fields. However, you can override the default index template and use your own preferred index template.

To do that, you need to copy the contents of the default index template and change the settings, as desired, and then create your index template with a higher priority so that the new template will override the default one.

You can use the OpenSearch Dev Tools Console to send queries to OpenSearch. To open the console, select Dev Tools on the main OpenSearch Dashboards page and write your queries in the editor pane on the left side of the console.

To get the existing, default template:

GET /_index_template/verrazzano-data-stream

Override default number of shards and replicas

In new installations of Verrazzano v1.5 (not upgrades), the default index template creates one shard and one replica for each index. (In earlier releases and in upgraded installations, it creates five shards and one replica.) To change the default number of shards and replicas, get the default index template, change the number of shards and replicas to the desired values, and create a new index template with a higher priority.

Here is an example that creates a new index template and changes the number of shards to 3 and replicas to 2.

PUT _index_template/my-template
    {
        "index_patterns" : [
          "verrazzano-application-myapp*"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "mapping" : {
                "total_fields" : {
                  "limit" : "2000"
                }
              },
              "refresh_interval" : "5s",
              "number_of_shards" : "3",
              "auto_expand_replicas" : "0-1",
              "number_of_replicas" : "2"
            }
          },
          "mappings" : {
            "dynamic_templates" : [
              {
                "message_field" : {
                  "path_match" : "message",
                  "mapping" : {
                    "norms" : false,
                    "type" : "text"
                  },
                  "match_mapping_type" : "string"
                }
              },
              {
                "object_fields" : {
                  "mapping" : {
                    "type" : "object"
                  },
                  "match_mapping_type" : "object",
                  "match" : "*"
                }
              },
              {
                "all_non_object_fields" : {
                  "mapping" : {
                    "norms" : false,
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "ignore_above" : 256,
                        "type" : "keyword"
                      }
                    }
                  },
                  "match" : "*"
                }
              }
            ],
            "properties" : {
              "@timestamp" : {
                "format" : "strict_date_time||strict_date_optional_time||epoch_millis",
                "type" : "date"
              }
            }
          }
        },
        "priority" : 201,
        "data_stream" : {
          "timestamp_field" : {
            "name" : "@timestamp"
          }
        }
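    }

With this example, new indices that match the verrazzano-application-myapp* index pattern will be created with three shards and two replicas, and other indices that do not match will continue to be created with the default number of shards and replicas. Note that the example keeps the auto_expand_replicas setting of 0-1; when auto-expansion is enabled, OpenSearch adjusts the replica count within that range, so remove or widen this setting if you need two replicas. For more information, see Index templates in the OpenSearch documentation.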
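
When more than one index template matches, the template with the highest priority is applied. The following sketch models that selection. The custom template's pattern and priority come from the example above, whereas the default template's pattern and priority shown here are assumed placeholders; check the real values with GET /_index_template/verrazzano-data-stream.

```python
from fnmatch import fnmatchcase

# The default template's pattern and priority are assumed placeholders for illustration.
TEMPLATES = [
    {"name": "verrazzano-data-stream", "index_patterns": ["verrazzano-*"], "priority": 200},
    {"name": "my-template", "index_patterns": ["verrazzano-application-myapp*"], "priority": 201},
]


def winning_template(target_name, templates=TEMPLATES):
    """Return the matching template with the highest priority, or None if nothing matches."""
    matches = [
        t for t in templates
        if any(fnmatchcase(target_name, pattern) for pattern in t["index_patterns"])
    ]
    return max(matches, key=lambda t: t["priority"], default=None)


print(winning_template("verrazzano-application-myapp")["name"])
print(winning_template("verrazzano-application-other")["name"])
```

Under these assumptions, the first data stream name resolves to my-template and the second falls back to verrazzano-data-stream.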

Override default mappings and field types

The default index template uses dynamic mapping to store all fields as text and keyword. For your application, if you want to store a field as a different type, get the default index template, change the mappings for the desired fields, and then create a new index template with a higher priority.

Here is an example that creates a new index template for applications in the myapp* namespace. It dynamically maps all long fields to integer and explicitly maps the age and ip_address fields as integer and ip, respectively.

PUT _index_template/my-template
    {
        "index_patterns" : [
          "verrazzano-application-myapp*"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "mapping" : {
                "total_fields" : {
                  "limit" : "2000"
                }
              },
              "refresh_interval" : "5s",
              "number_of_shards" : "1",
              "auto_expand_replicas" : "0-1",
              "number_of_replicas" : "0"
            }
          },
          "mappings" : {
            "dynamic_templates" : [
              {
                "long_as_int" : {
                  "mapping" : {
                    "type" : "integer"
                  },
                  "match_mapping_type" : "long"
                }
              },
              {
                "message_field" : {
                  "path_match" : "message",
                  "mapping" : {
                    "norms" : false,
                    "type" : "text"
                  },
                  "match_mapping_type" : "string"
                }
              },
              {
                "object_fields" : {
                  "mapping" : {
                    "type" : "object"
                  },
                  "match_mapping_type" : "object",
                  "match" : "*"
                }
              },
              {
                "all_non_object_fields" : {
                  "mapping" : {
                    "norms" : false,
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "ignore_above" : 256,
                        "type" : "keyword"
                      }
                    }
                  },
                  "match" : "*"
                }
              }
            ],
            "properties" : {
              "@timestamp" : {
                "format" : "strict_date_time||strict_date_optional_time||epoch_millis",
                "type" : "date"
              },
              "age" : {
                "type" : "integer"
              },
              "ip_address" : {
                "type" : "ip",
                "ignore_malformed" : true
              }
            }
          }
        },
        "priority" : 201,
        "data_stream" : {
          "timestamp_field" : {
            "name" : "@timestamp"
          }
        }
    }

With this example, new indices that match the verrazzano-application-myapp* index pattern will store age and ip_address fields as integer and ip instead of text. Also, long data fields will be stored as integer. For more information, see Mappings and field types in the OpenSearch documentation.

Configure pre-existing indices after overriding the default index template

For your application, if you already have indices created by OpenSearch that are based on the default index template, then complete the steps in the following sections to configure them.

Rollover data stream

The mappings for existing indices cannot be changed, so you will need to roll over the data stream for your application to create a new index. Then, OpenSearch will start indexing data based on the newer template that you created.

To roll over the data stream:

POST /verrazzano-application-myapp/_rollover

NOTE: The default ISM policy that Verrazzano provides regularly rolls over the index after meeting certain conditions, so you might not need to roll over the index manually.

Refresh the index pattern

To see the updated mappings for your fields on the Discover page, you need to refresh the index pattern for your application.

To refresh the index pattern:

  1. On the main OpenSearch Dashboards page, under the Management section, navigate to Stack Management in the Dock.
  2. Then, go to Index Pattern > verrazzano-application*. If you have created a separate index pattern for your application, then select that.
  3. Click the Refresh field list icon in the upper, right-hand side of the page.

Reindex indices

After refreshing the field list, if you see a warning about a mapping conflict, you need to reindex your previous indices. The mapping conflict arises because the previous indices have different mappings for fields than the newer indices, which were created based on the new index template with different mappings.

To reindex previous indices:

POST _reindex
{
  "conflicts" : "proceed",
  "source" : {
    "index" : [
      ".ds-verrazzano-application-myapp-000001"
    ]
  },
  "dest" : {
    "index" : "verrazzano-application-myapp",
    "op_type" : "create"
  }
}

Under source, list all the previous indices that were created based on the default index template. After reindexing is complete, refresh the index pattern again. For more information, see Reindex data in the OpenSearch documentation.
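
If there are several previous backing indices, the request body can be generated instead of edited by hand. This is a minimal sketch: the helper name is illustrative, and the second index name is a hypothetical example. Paste the printed JSON after POST _reindex in the Dev Tools Console.

```python
import json


def build_reindex_body(source_indices, destination):
    """Build a _reindex request body that copies the given backing indices."""
    return {
        "conflicts": "proceed",
        "source": {"index": list(source_indices)},
        "dest": {"index": destination, "op_type": "create"},
    }


old_indices = [
    ".ds-verrazzano-application-myapp-000001",
    ".ds-verrazzano-application-myapp-000002",  # hypothetical second backing index
]
body = build_reindex_body(old_indices, "verrazzano-application-myapp")
print(json.dumps(body, indent=2))
```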

Install OpenSearch and OpenSearch Dashboards plug-ins

Verrazzano supports installing OpenSearch and OpenSearch Dashboards plug-ins that you list in the Verrazzano custom resource. To install plug-ins for OpenSearch, define the field spec.components.opensearch.plugins in the Verrazzano custom resource.

The following Verrazzano custom resource example installs the analysis-stempel and opensearch-anomaly-detection plug-ins for OpenSearch:

apiVersion: install.verrazzano.io/v1beta1
kind: Verrazzano
metadata:
  name: custom-opensearch-example
spec:
  profile: dev
  components:
    opensearch:
      plugins:
        enabled: true
        installList:
          - analysis-stempel
          - https://repo1.maven.org/maven2/org/opensearch/plugin/opensearch-anomaly-detection/2.2.0.0/opensearch-anomaly-detection-2.2.0.0.zip

Pre-built plug-ins for OpenSearch

Here are some pre-built plug-ins that are bundled with the OpenSearch image:

  • analysis-icu
  • analysis-kuromoji
  • analysis-phonetic
  • analysis-smartcn
  • ingest-attachment
  • mapper-murmur3
  • mapper-size
  • opensearch-alerting
  • opensearch-index-management
  • opensearch-job-scheduler
  • opensearch-notifications
  • opensearch-notifications-core
  • prometheus-exporter
  • repository-s3

There are three ways to specify a plug-in in the plugins.installList:

  • Specify a plug-in by name:

    Only the pre-built additional plug-ins can be installed by name.

    installList:
      - analysis-icu

  • Specify a plug-in from a remote ZIP file:

    Provide the URL to a remote ZIP file that contains the required plug-in.

    installList:
      - https://repo1.maven.org/maven2/org/opensearch/plugin/opensearch-anomaly-detection/2.2.0.0/opensearch-anomaly-detection-2.2.0.0.zip

  • Specify a plug-in using Maven coordinates:

    Provide the Maven coordinates for the available artifacts and versions hosted on Maven Central.

    installList:
      - org.opensearch.plugin:opensearch-anomaly-detection:2.2.0.0
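
To sanity-check an installList before applying the custom resource, you could classify each entry by the three forms described above. The following is a sketch under the assumption that a URL starts with http:// or https:// and that Maven coordinates have the form group:artifact:version. It is not the logic that Verrazzano or OpenSearch uses, and it does not verify that a named plug-in actually exists.

```python
import re

# group:artifact:version, for example org.opensearch.plugin:opensearch-anomaly-detection:2.2.0.0
MAVEN_COORDINATES = re.compile(r"^[^:/\s]+:[^:/\s]+:[^:/\s]+$")


def classify_plugin_entry(entry):
    """Classify an installList entry as 'url', 'maven', or 'name'."""
    if entry.startswith(("http://", "https://")):
        return "url"
    if MAVEN_COORDINATES.match(entry):
        return "maven"
    return "name"


for entry in [
    "analysis-icu",
    "https://repo1.maven.org/maven2/org/opensearch/plugin/opensearch-anomaly-detection/2.2.0.0/opensearch-anomaly-detection-2.2.0.0.zip",
    "org.opensearch.plugin:opensearch-anomaly-detection:2.2.0.0",
]:
    print(classify_plugin_entry(entry), entry)
```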
    

For OpenSearch Dashboards, you can provide the plug-ins by defining the field spec.components.opensearchDashboards.plugins in the Verrazzano custom resource.

Pre-built plug-ins for OpenSearch Dashboards

Here are the pre-built plug-ins that are bundled with the OpenSearch Dashboards image:

  • alertingDashboards
  • indexManagementDashboards
  • notificationsDashboards

Here is a Verrazzano custom resource example that installs plug-ins for OpenSearch Dashboards:

apiVersion: install.verrazzano.io/v1beta1
kind: Verrazzano
metadata:
  name: custom-opensearch-example
spec:
  profile: dev
  components:
    opensearchDashboards:
      plugins:
        enabled: true
        installList:
          - <URL to OpenSearch Dashboard plugin ZIP file>