ElasticSearch performance #

Performance tuning is an art. Many factors influence performance, and it is difficult to know up front which of them will prove decisive for your Intelligence Center. Often, tweaking an apparently insignificant setting has a decisive effect. Nevertheless, the following areas most commonly deserve attention:

Your requirements
Query crafting
Cluster architecture

Your requirements #

Data storage #

If you are going to use EclecticIQ Intelligence Center for analysis only, and will regularly clean up your data store, then your storage needs will be minimal (as a rough indicator, fewer than one million entities and observables). In that case, a single server will probably be sufficient.

However, if you are also going to use EclecticIQ Intelligence Center to share data, then the following should provide you with an indication of how many servers you need:

Number of entities and observables	Number of servers
Up to 10 million	1
More than 10 million, up to 40 million	5
More than 40 million	11

Tip

Example 5-server architecture

Tip

Example 11-server architecture
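
As a rough illustration, the sizing table in this section can be expressed as a simple lookup. The function name recommended_servers is hypothetical and not part of any EclecticIQ tooling; the thresholds are copied from the table and remain indicative only.

```python
def recommended_servers(entity_count: int) -> int:
    """Return the indicative server count for a number of entities and observables.

    Thresholds mirror the sizing table: up to 10 million -> 1 server,
    up to 40 million -> 5 servers, anything larger -> 11 servers.
    """
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    if entity_count <= 10_000_000:
        return 1
    if entity_count <= 40_000_000:
        return 5
    return 11
```

For example, recommended_servers(25_000_000) falls in the middle band of the table.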

Availability #

ElasticSearch uses replicas of primary shards to provide failover and to improve performance. To provide enough capacity for shard replication, consider increasing the number of servers to three or more, depending on your availability requirements.
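
To make the link between replicas and server count concrete: a replica is never allocated on the same node as its primary shard, so each additional replica needs at least one more data node. The sketch below checks that condition and builds a request body for the index.number_of_replicas setting. Both function names are hypothetical, and the code only builds the JSON; it does not contact a cluster.

```python
import json


def can_allocate_all_replicas(data_nodes: int, replicas: int) -> bool:
    """Return True if every copy of a shard can sit on a different node.

    A primary plus `replicas` copies needs at least `replicas + 1` data nodes.
    """
    if data_nodes < 1 or replicas < 0:
        raise ValueError("data_nodes must be >= 1 and replicas must be >= 0")
    return data_nodes >= replicas + 1


def replica_settings_body(replicas: int) -> str:
    """Build the JSON body for an index settings update that changes the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})
```

With a single server, can_allocate_all_replicas(1, 1) is False, which is why a single-node cluster with one configured replica cannot allocate that replica.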

Performance #

A lot of research has been done to determine what response time users find acceptable when querying a database. What counts as “slow” is subjective, but as a rule of thumb the industry treats a query that takes longer than two seconds as slow. Use this as a benchmark when sizing your capacity to meet your users’ needs.
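
One way to apply the two-second benchmark is to measure how many of your queries exceed it. The sketch below assumes you have collected query durations in milliseconds, for example from the took field that Elasticsearch returns in search responses. The names SLOW_QUERY_THRESHOLD_MS and slow_query_ratio are hypothetical.

```python
# Two-second benchmark from the text, expressed in milliseconds.
SLOW_QUERY_THRESHOLD_MS = 2000


def slow_query_ratio(took_ms_samples):
    """Return the fraction of sampled queries that took longer than the threshold."""
    samples = list(took_ms_samples)
    if not samples:
        return 0.0
    slow = sum(1 for took_ms in samples if took_ms > SLOW_QUERY_THRESHOLD_MS)
    return slow / len(samples)
```

A ratio that rises over time suggests that current capacity no longer meets your users' needs.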

Hardware budget #

Although this may seem obvious, adding more and more virtual servers will not help if your underlying hardware capacity is inadequate. You may want to invest in faster CPUs.

Query crafting #

Data in a database can be accessed in different ways, through different data structures, and in different orders, and each way typically requires a different processing time. Consequently, the way you craft your queries can have a significant impact on their performance. We recommend you check out ElasticSearch’s own documentation for tips on how to craft efficient queries.
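
As one example of such a tip, Elasticsearch does not compute relevance scores for clauses in the filter section of a bool query and can cache them, so exact-match and range conditions are usually cheaper there than in must. The sketch below builds such a request body. The function name build_filtered_query and the field names title, type and created_at are hypothetical and do not reflect the actual Intelligence Center index mapping.

```python
def build_filtered_query(text: str, entity_type: str, since: str) -> dict:
    """Build a search body that scores only the full-text clause.

    The term and range conditions go into filter context, where they are
    not scored and are eligible for caching.
    """
    return {
        "query": {
            "bool": {
                # Scored: full-text relevance matters here.
                "must": [{"match": {"title": text}}],
                # Not scored: simple yes/no conditions.
                "filter": [
                    {"term": {"type": entity_type}},
                    {"range": {"created_at": {"gte": since}}},
                ],
            }
        },
        # Ask only for as many hits as you intend to display.
        "size": 20,
    }
```

A call such as build_filtered_query("phishing", "indicator", "now-7d") returns a dictionary that can be serialized to JSON and sent to the search API.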

Cluster architecture #

Tip

EclecticIQ offers an ElasticSearch diagnostics tool that provides you with a number of useful metrics to help gauge the health of your ElasticSearch configuration.

File system cache #

Make sure that there is enough memory available for the filesystem cache because Elasticsearch relies on it to speed up queries, as well as to buffer I/O operations during indexing.

EclecticIQ’s ElasticSearch diagnostics tool includes a metric (JVM heap size) that helps determine if you have enough cache memory.
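
The JVM heap size matters here because memory given to the heap is not available to the filesystem cache. General Elasticsearch guidance is to give the heap no more than half of a server's RAM. The sketch below is a back-of-the-envelope check under that guideline; the function names are hypothetical, and the calculation ignores memory used by the operating system and other processes.

```python
def filesystem_cache_headroom_gb(total_ram_gb: float, jvm_heap_gb: float) -> float:
    """Return the RAM left over for the filesystem cache after the JVM heap."""
    if total_ram_gb <= 0 or jvm_heap_gb <= 0:
        raise ValueError("memory sizes must be positive")
    if jvm_heap_gb > total_ram_gb:
        raise ValueError("heap cannot exceed total RAM")
    return total_ram_gb - jvm_heap_gb


def heap_within_guideline(total_ram_gb: float, jvm_heap_gb: float) -> bool:
    """Return True if the heap uses at most half of the available RAM."""
    return jvm_heap_gb <= total_ram_gb / 2
```

For a server with 16 GB of RAM and a 12 GB heap, heap_within_guideline(16, 12) is False, which indicates that the filesystem cache is likely to be starved.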

Cluster health #

Cluster health depends on the extent to which shards have been allocated to the nodes in a cluster. In a healthy cluster, all the necessary shards have been allocated.

EclecticIQ’s ElasticSearch diagnostics tool includes a cluster health metric.
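
Elasticsearch itself reports this state through its cluster health API as a status of green (all primary and replica shards allocated), yellow (all primaries allocated, some replicas not) or red (at least one primary unallocated). The sketch below interprets a response that has already been parsed into a dictionary. The function name summarize_cluster_health is hypothetical; status and unassigned_shards are fields of the cluster health response.

```python
def summarize_cluster_health(health: dict) -> str:
    """Turn a parsed cluster health response into a one-line summary."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "healthy: all primary and replica shards are allocated"
    if status == "yellow":
        return (
            "degraded: all primaries are allocated, "
            f"{unassigned} replica shard(s) unassigned"
        )
    if status == "red":
        return (
            "unhealthy: at least one primary shard is unassigned "
            f"({unassigned} unassigned shard(s) in total)"
        )
    raise ValueError(f"unexpected cluster status: {status!r}")
```

A single-server deployment with replicas configured typically reports yellow, because the replicas have no second node to be allocated to.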

Shard and cluster size #

The recommended shard size is between 10 and 50 GB. To keep to this recommendation, you may need to either split the index, or re-index.
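
To see what the 10 to 50 GB recommendation implies for a given index, the sketch below estimates how many primary shards keep each shard inside that range. It assumes data is spread evenly across shards, which is only approximately true in practice, and the function name primary_shard_range is hypothetical.

```python
import math
from typing import Tuple

MIN_SHARD_GB = 10  # lower bound of the recommended shard size
MAX_SHARD_GB = 50  # upper bound of the recommended shard size


def primary_shard_range(index_size_gb: float) -> Tuple[int, int]:
    """Return (fewest, most) primary shards that keep shards at 10 to 50 GB.

    Indices smaller than 10 GB always get (1, 1): a single shard is the
    closest you can get to the recommendation.
    """
    if index_size_gb < 0:
        raise ValueError("index_size_gb must be non-negative")
    fewest = max(1, math.ceil(index_size_gb / MAX_SHARD_GB))
    most = max(fewest, math.floor(index_size_gb / MIN_SHARD_GB))
    return fewest, most
```

If an index currently has fewer shards than the lower number, splitting or re-indexing is the way to bring shard sizes back into range.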

As your data volume grows, you could also delete data that has become obsolete. A hot-warm architecture can help here (and reduce hardware costs too).

Unfortunately, there is no “one-size-fits-all” solution to Elasticsearch sizing. You need to monitor the cluster constantly and reevaluate whether the current architecture still fits your data volume, expected response times, and ingestion volume. See the Introduction to ElasticSearch sizing webinar.

EclecticIQ’s ElasticSearch diagnostics tool includes a shard size metric.