ElasticSearch performance#
Performance tuning is an art. Many factors influence performance, and it is difficult to know up front which of them will be decisive for your Intelligence Center. Often, tweaking an apparently insignificant setting has a large effect. Nevertheless, the following areas most commonly deserve attention:
Your requirements
Query crafting
Cluster architecture
Your requirements#
Data storage#
If you are going to use EclecticIQ Intelligence Center for analysis only, and will regularly clean up your data store, then your storage needs will be minimal (as a rough indicator, fewer than one million entities and observables). In that case, a single server will probably be sufficient.
However, if you are also going to use EclecticIQ Intelligence Center to share data, then the following should provide you with an indication of how many servers you need:
| Number of entities and observables | Number of servers |
|---|---|
| <= 10 million | 1 |
| > 10 million and <= 40 million | 5 |
| > 40 million | 11 |
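As a quick illustration, the sizing table above can be written as a small lookup. This is only a sketch: the function name `recommended_servers` is hypothetical, and the thresholds are copied directly from the table rather than derived from any measurement.

```python
def recommended_servers(entity_count: int) -> int:
    """Map a combined entity and observable count to the server count in the sizing table."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    if entity_count <= 10_000_000:
        return 1
    if entity_count <= 40_000_000:
        return 5
    return 11


if __name__ == "__main__":
    # Example counts are arbitrary illustrations, not recommendations.
    for count in (2_000_000, 25_000_000, 60_000_000):
        print(count, "->", recommended_servers(count), "server(s)")
```

Treat the result as a starting point only; availability and performance requirements, discussed next, may call for more servers.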
Availability#
ElasticSearch uses replicas of primary shards to improve failover and performance. To provide enough capacity for shard replication, consider increasing the number of servers to three or more, depending on your availability requirements.
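The reason extra servers matter is that ElasticSearch does not place a replica on the same node as its primary, or two copies of the same shard on one node. The sketch below estimates how many replica shards would stay unassigned for a given node count. The function name is hypothetical, and the calculation deliberately ignores other allocation rules such as disk watermarks and allocation filtering.

```python
def unassigned_replicas(nodes: int, primaries: int, replicas: int) -> int:
    """Estimate replica shards that cannot be placed because there are too few nodes.

    Assumes only the rule that copies of the same shard never share a node.
    """
    if nodes < 1 or primaries < 1 or replicas < 0:
        raise ValueError("nodes and primaries must be >= 1, replicas must be >= 0")
    copies_per_shard = 1 + replicas            # one primary plus its replicas
    placeable = min(copies_per_shard, nodes)   # at most one copy per node
    return primaries * (copies_per_shard - placeable)


if __name__ == "__main__":
    # Shard and replica counts below are illustrative only.
    print(unassigned_replicas(nodes=1, primaries=5, replicas=1))
    print(unassigned_replicas(nodes=3, primaries=5, replicas=2))
```

With a single node, every replica stays unassigned, so the cluster has no redundancy; with three nodes, up to two replicas per primary can be placed.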
Performance#
A lot of research has been done to determine what response time users find acceptable when performing queries on a database. What counts as “slow” is subjective, but as a rule of thumb the industry treats a query that takes longer than two seconds as slow. Use this as a benchmark when sizing your capacity to meet your users’ needs.
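One way to apply the two-second benchmark is to look at the `took` field that ElasticSearch returns with each search response, which reports the time spent in milliseconds. The sketch below flags responses above the threshold. The helper name and the sample responses are hypothetical and only illustrate the check.

```python
SLOW_QUERY_MS = 2000  # two-second benchmark from the text above


def slow_responses(responses: list[dict]) -> list[dict]:
    """Return the search responses whose `took` time exceeds the benchmark."""
    return [r for r in responses if r["took"] > SLOW_QUERY_MS]


if __name__ == "__main__":
    # Hypothetical, trimmed search responses; real ones contain more fields.
    sample = [
        {"took": 120, "timed_out": False},
        {"took": 2350, "timed_out": False},
        {"took": 2000, "timed_out": False},
    ]
    slow = slow_responses(sample)
    print(f"{len(slow)} of {len(sample)} queries exceeded {SLOW_QUERY_MS} ms")
```

If a noticeable share of queries exceeds the threshold under normal load, that is a sign that the current capacity does not meet your users' needs.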
Hardware budget#
Although this may seem obvious, adding more and more virtual servers will not help if your underlying hardware capacity is inadequate. You may want to invest in faster CPUs.
Query crafting#
The data collected from a database can be accessed in different ways, through different data-structures, and in different orders. Each way typically requires different processing time. Consequently, the way you craft your queries can have a significant impact on their performance. We recommend you check out ElasticSearch’s own documentation for tips on how to craft efficient queries.
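As one example of how query structure affects cost, ElasticSearch distinguishes between query context, which computes relevance scores, and filter context, which only answers yes or no and can be cached. The sketch below builds a `bool` query that keeps exact-match conditions in `filter`. The field names `title`, `entity_type`, and `ingested_at` are hypothetical and do not reflect the Intelligence Center index mapping.

```python
import json


def build_filtered_query(text: str, entity_type: str, since: str) -> dict:
    """Build a search body that scores only the full-text clause.

    Exact-match and range conditions go in `filter`, where no score is computed.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [
                    {"term": {"entity_type": entity_type}},
                    {"range": {"ingested_at": {"gte": since}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    body = build_filtered_query("phishing", "indicator", "now-7d")
    print(json.dumps(body, indent=2))
```

Whether this helps in a specific case depends on the data and the mapping, so measure before and after changing a query.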
Cluster architecture#
Tip
EclecticIQ offers an ElasticSearch diagnostics tool that provides you with a number of useful metrics to help gauge the health of your ElasticSearch configuration.
Resource sharing#
If possible, do not run other processes on the ElasticSearch host server, because they will compete for resources and performance will suffer as a result. If you do need to run ElasticSearch together with other processes on the same server, provide separation and guaranteed access to resources, for example by using Docker containers.
File system cache#
Make sure that there is enough memory available for the filesystem cache because Elasticsearch relies on it to speed up queries, as well as to buffer I/O operations during indexing.
EclecticIQ’s ElasticSearch diagnostics tool includes a metric (JVM heap size) that helps determine if you have enough cache memory.
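Memory given to the JVM heap is not available to the file system cache, so the two have to be balanced. Elastic's general guidance is to give the heap no more than half of the machine's memory and to keep it below the point where the JVM stops using compressed object pointers, which is a little under 32 GB. The sketch below applies that guidance. The function name and the 31 GB cap are assumptions chosen for illustration, not values taken from the diagnostics tool.

```python
def split_memory_gb(total_ram_gb: float, heap_cap_gb: float = 31.0) -> tuple[float, float]:
    """Return (heap_gb, remaining_gb) following the 'at most half of RAM' guidance.

    The remainder is what the operating system can use for the file system
    cache and for other processes. The default cap of 31 GB is an assumption.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    heap_gb = min(total_ram_gb / 2, heap_cap_gb)
    return heap_gb, total_ram_gb - heap_gb


if __name__ == "__main__":
    # Memory sizes below are illustrative only.
    for ram in (16, 64, 128):
        heap, rest = split_memory_gb(ram)
        print(f"{ram} GB RAM: heap {heap} GB, left for cache and OS {rest} GB")
```

On hosts that also run other processes, the remainder is shared with them, which is another reason to keep ElasticSearch on a dedicated server.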
Cluster health#
Cluster health depends on the extent to which shards have been allocated to a cluster. In a healthy cluster, all the necessary shards have been allocated.
EclecticIQ’s ElasticSearch diagnostics tool includes a cluster health metric.
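For reference, ElasticSearch's own cluster health API (`GET _cluster/health`) reports shard allocation as a `status` of `green`, `yellow`, or `red`, together with counters such as `unassigned_shards`. The sketch below interprets such a response. The helper name and the sample response are hypothetical, and the sample is trimmed to the fields used here.

```python
def describe_health(health: dict) -> str:
    """Summarise a cluster health response in terms of shard allocation."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are allocated"
    if status == "yellow":
        return f"yellow: all primaries allocated, {unassigned} replica shard(s) unassigned"
    if status == "red":
        return f"red: at least one primary shard is unallocated ({unassigned} shard(s) unassigned)"
    raise ValueError(f"unexpected status: {status!r}")


if __name__ == "__main__":
    # Hypothetical response from a single-node cluster with one replica configured.
    sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}
    print(describe_health(sample))
```

A `yellow` status on a single server is expected whenever replicas are configured, because a replica cannot be placed on the same node as its primary.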