Sync the search database

Run the eiq-platform search sync-data command to sync the Elasticsearch indexing and search database with the PostgreSQL main database.

About syncing the search database

Over time, the Elasticsearch indexing and search database may go out of sync with the PostgreSQL main database. Possible causes include large re-ingestion operations, database migrations following product upgrades, malformed source data, and operational wear and tear.

For example:

  • An entity may exist in Elasticsearch, but not in PostgreSQL.
    It is possible to search for the entity, but the entity data no longer exists in the PostgreSQL database.

  • An entity may exist in PostgreSQL, but not in Elasticsearch.
    It is not possible to search for the entity, even though the entity data does exist in the PostgreSQL database.

  • An entity in Elasticsearch may have a different timestamp value than the same entity in PostgreSQL.

If you notice that the same entity has different last_updated_at timestamp values in the indexing and search database and in the main database, run the eiq-platform search sync-data command to sync the indexing and search database with the main one.

Sync the search database

The eiq-platform search sync-data command iterates over all entities in the Elasticsearch indexing and search database, and in the PostgreSQL main database.
For each entity, it compares the value of the last_updated_at timestamp field in both databases to assess whether the entity in Elasticsearch is in sync with the same entity in PostgreSQL.

Since PostgreSQL is the single source of truth for the platform data, any discrepancies are resolved by copying data from PostgreSQL to Elasticsearch.
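The comparison described above can be illustrated with a minimal Python sketch. It is not the platform implementation: the function name classify_entities and the dictionary inputs (entity ID mapped to a last_updated_at value) are hypothetical stand-ins for the two databases. The category names mirror the n_* counters that the command reports.

```python
from datetime import datetime


def classify_entities(postgres, elasticsearch):
    """Group entity IDs by sync status.

    Both arguments are hypothetical in-memory stand-ins for the two
    databases: dicts mapping an entity ID to its last_updated_at value.
    """
    result = {"ok": [], "outdated": [], "strange": [], "missing": [], "not_deleted": []}
    for entity_id, pg_ts in postgres.items():
        es_ts = elasticsearch.get(entity_id)
        if es_ts is None:
            # In PostgreSQL, but not in Elasticsearch
            result["missing"].append(entity_id)
        elif es_ts < pg_ts:
            # Elasticsearch copy is older than the PostgreSQL one
            result["outdated"].append(entity_id)
        elif es_ts > pg_ts:
            # Elasticsearch copy is newer: an anomaly
            result["strange"].append(entity_id)
        else:
            result["ok"].append(entity_id)
    for entity_id in elasticsearch:
        if entity_id not in postgres:
            # In Elasticsearch, but no longer in PostgreSQL
            result["not_deleted"].append(entity_id)
    return result


pg = {"a": datetime(2019, 11, 29), "b": datetime(2019, 11, 29), "c": datetime(2019, 11, 29)}
es = {"a": datetime(2019, 11, 29), "b": datetime(2019, 11, 1), "d": datetime(2019, 11, 29)}
status = classify_entities(pg, es)
print(status)
```

In this sketch, the missing, outdated, and strange entities would be re-indexed from PostgreSQL, and the not_deleted ones would be removed from Elasticsearch.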

Depending on system load and the total number of entities in the platform, the command can take several hours to complete.
The eiq-platform search sync-data command has a low impact on system load, so it is safe to run while the platform is in active use.

Run inside venv

The eiq-platform command line interface offers a set of commands and utilities to diagnose and to address common operational issues that may occur over time.
The eiq-platform command line interface is available in the platform Python virtual environment: activate it and, if necessary, import the platform environment variables.

  • Open a terminal session.

  • In the terminal, log in to the platform through SSH:

    # The platform user name must have admin access rights
    ssh ${platform_user_name}@${platform_host_name}
     
    # Or:
    ssh ${platform_user_name}@${platform_ip_address}

  • Obtain root-level access by running sudo -i:

    # Root-access login shell
    sudo -i

    To open a login shell or to run a command as a different user, append -u:

    # Start a root login shell (root is the default target user)
    sudo -i
     
    # Start a login shell as a different user
    sudo -i -u ${user_name}
     
    # Run a single command as a different user, in that user's login environment
    sudo -i -u ${user_name} ${command} ${options}

  • Activate the platform Python virtual environment:

    source /opt/eclecticiq-platform-backend/bin/activate
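After activating the virtual environment, a quick way to confirm that the command line interface is reachable is to look it up on the PATH. The following Python sketch does this with the standard library. The helper name find_cli is hypothetical, and the check assumes that activation puts the eiq-platform executable on the PATH.

```python
import shutil


def find_cli(name="eiq-platform"):
    """Return the full path of the executable, or None if it is not on the PATH."""
    return shutil.which(name)


path = find_cli()
if path is None:
    print("eiq-platform not found: is the virtual environment active?")
else:
    print(f"eiq-platform found at {path}")
```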

Run the sync command

eiq-platform search sync-data compares the value of each entity's last_updated_at timestamp field to determine whether the entity needs syncing.
For more information about the command, append --help to it, and then press ENTER.

Command

To check whether the Elasticsearch indexing and search database and the PostgreSQL main database are in sync:

eiq-platform search sync-data

Parameters

Each parameter is listed with its type, whether it is required, and its default value, followed by its description.

  • --check-results-file
    Type: String. Required: Yes. Default: -

    Specify the path and the file name of the report that stores the summary and the out-of-sync entities.

    The report file format is JSON (.json).
    Optionally, you can also save the report as a plain text file (.txt).

  • --batch-size
    Type: Integer. Required: No. Default: 1000

    The command analyzes entities in batches.
    Specify the maximum number of entities to group in a batch.
    Very large batches may slow down the process, or produce timeouts.

  • --changed-after
    Type: Date. Required: No. Default: -

    Checks only entities whose last_updated_at timestamp is more recent than the date you specify here.

    If you specify date and time, wrap the value in quotes.

    Allowed formats:

      • "yyyy-mm-dd hh:mm:ss"

      • "yyyy-mm-dd hh:mm"

      • yyyy-mm-dd

  • --changed-before
    Type: Date. Required: No. Default: -

    Checks only entities whose last_updated_at timestamp is older than the date you specify here.

    If you specify date and time, wrap the value in quotes.

    Allowed formats:

      • "yyyy-mm-dd hh:mm:ss"

      • "yyyy-mm-dd hh:mm"

      • yyyy-mm-dd

  • --dry-run
    Type: Boolean. Required: No. Default: False

    Checks syncing inconsistencies between PostgreSQL and Elasticsearch.
    It returns a report with the results.
    It does not sync the databases.

  • --entity-types
    Type: String. Required: No. Default: -

    To limit the sync check to selected entity types, specify one or more entity types.
    If you specify multiple entity types, separate them with a comma (",").

    To check and, if necessary, sync entity relationships, specify relation.

    The entity type names correspond to the data.type values.
    Allowed values:

      • campaign

      • course-of-action

      • eclecticiq-sighting

      • exploit-target

      • incident

      • indicator

      • package

      • report

      • threat-actor

      • ttp

  • --print-diffs
    Type: Boolean. Required: No. Default: False

    Includes in the results an overview of the differences detected between PostgreSQL and Elasticsearch for outdated and strange entities.
    If you turn on this option, the operation may take longer.
    Use it for debugging purposes.
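When the command is launched from a script, the parameters above can be assembled programmatically. The following Python sketch builds the argument list and checks date values against the allowed formats before use. The helper names build_sync_command and is_allowed_date are hypothetical, and the date check is only an approximation of what the command itself accepts.

```python
import shlex
from datetime import datetime

# Python equivalents of the allowed date formats listed above
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d")


def is_allowed_date(value):
    """Return True if the value parses with one of the allowed formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def build_sync_command(results_file, dry_run=False, entity_types=None,
                       changed_after=None, changed_before=None, batch_size=None):
    """Build the argument list for eiq-platform search sync-data."""
    args = ["eiq-platform", "search", "sync-data", "--check-results-file", results_file]
    if dry_run:
        args.append("--dry-run")
    if entity_types:
        # Multiple entity types are separated by commas
        args += ["--entity-types", ",".join(entity_types)]
    for flag, value in (("--changed-after", changed_after),
                        ("--changed-before", changed_before)):
        if value is not None:
            if not is_allowed_date(value):
                raise ValueError(f"{flag}: unsupported date format: {value!r}")
            args += [flag, value]
    if batch_size is not None:
        args += ["--batch-size", str(batch_size)]
    return args


command = build_sync_command(
    "result.txt",
    dry_run=True,
    entity_types=["indicator", "ttp", "relation"],
    changed_after="2019-11-01 00:00",
)
# shlex.join quotes values that contain spaces, such as date and time values
print(shlex.join(command))
```

Passing the list to subprocess.run avoids shell quoting issues altogether; shlex.join is used here only to print a command line that can be pasted into a terminal.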

Result

By default, command execution events are output to stdout:

{
  "event":"Starting data check",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_commons",
  "timestamp":"2019-11-29T15:34:10.294127Z"
}
{
  "event":"Data check finished",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_commons",
  "n_missing":0,
  "n_not_deleted":0,
  "n_ok":0,
  "n_outdated":0,
  "n_strange":0,
  "timestamp":"2019-11-29T15:34:10.441502Z"
}
{
  "event":"Writing check_results file",
  "filename":"/tmp/search-sync.txt",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_commons",
  "timestamp":"2019-11-29T15:34:10.441910Z"
}
{
  "event":"Deleting 0 entities that should not be in Elasticsearch",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_in_elasticsearch",
  "timestamp":"2019-11-29T15:34:10.442565Z"
}
{
  "event":"Triggering reindexing of batch 1 of 0 in Elasticsearch",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_in_elasticsearch",
  "timestamp":"2019-11-29T15:34:10.442789Z"
}
{
  "event":"reindexing.tasks_to_run",
  "index":"stix",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_in_elasticsearch",
  "tasks":0,
  "timestamp":"2019-11-29T15:34:10.443004Z"
}
{
  "event":"sync-data.done for entities",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_in_elasticsearch",
  "timestamp":"2019-11-29T15:34:10.443531Z",
  "took":"1.677e-05m",
  "total":0
}
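If you capture stdout, the events can be processed programmatically. The following Python sketch reads a stream of consecutive JSON objects and extracts the counters from the "Data check finished" event. It assumes that the output consists only of JSON objects like the ones shown above; the helper name iter_json_objects and the shortened sample are illustrative.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object from a string of consecutive JSON objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened sample based on the events shown above
sample_output = """
{"event": "Starting data check", "level": "info"}
{
"event": "Data check finished",
"level": "info",
"n_missing": 0,
"n_not_deleted": 0,
"n_ok": 0,
"n_outdated": 0,
"n_strange": 0
}
"""

events = list(iter_json_objects(sample_output))
summary = next(e for e in events if e["event"] == "Data check finished")
counters = {key: value for key, value in summary.items() if key.startswith("n_")}
print(counters)
```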

The report file is a JSON object:

{
  "entity_ids_to_delete": [],
  "entity_ids_to_reindex": [],
  "entity_types": [
    "relation"
  ],
  "n_missing": 0,
  "n_not_deleted": 0,
  "n_ok": 0,
  "n_outdated": 0,
  "n_strange": 0
}

The report contains the following fields:

  • entity_ids_to_delete
    Type: Array

    Lists the IDs of all the entities in the Elasticsearch indexing and search database that do not exist in the PostgreSQL main database.

    Entity IDs are alphanumeric strings corresponding to the data.id and id field values in the entity JSON structure.

    These entities are deleted from the indexing and search database.

  • entity_ids_to_reindex
    Type: Array

    Lists the IDs of all the entities that need re-indexing in the Elasticsearch indexing and search database because they are inconsistent with the same data in the PostgreSQL main database.

    Entity IDs are alphanumeric strings corresponding to the data.id and id field values in the entity JSON structure.

    These entities are synced in Elasticsearch by retrieving the corresponding data from PostgreSQL.

  • n_ok
    Type: Integer

    Returns the total number of entities that are in sync between the two databases.

  • n_outdated
    Type: Integer

    Returns the total number of entities whose last_updated_at timestamp is older in Elasticsearch than in PostgreSQL.

    These entities are synced in Elasticsearch by retrieving the corresponding data from PostgreSQL.

  • n_missing
    Type: Integer

    Returns the total number of entities that are stored in PostgreSQL, but not in Elasticsearch.

    These entities are synced in Elasticsearch by retrieving the corresponding data from PostgreSQL.

  • n_not_deleted
    Type: Integer

    Returns the total number of entities that are stored in Elasticsearch, but not in PostgreSQL.
    These are entities that were deleted from the main database, and that were left behind in the indexing and search database.

    These entities are deleted from the Elasticsearch indexing and search database.

  • n_strange
    Type: Integer

    Returns the total number of entities whose last_updated_at timestamp is more recent in Elasticsearch than in PostgreSQL.
    This is an anomaly that should not occur under normal operational circumstances.

    These entities are synced in Elasticsearch by retrieving the corresponding data from PostgreSQL.
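To act on a report in a script, you can reduce these fields to a short summary. The following Python sketch is one way to do it. The helper name summarize_report and the shape of its return value are hypothetical, and the sample is the report shown above, embedded as a string to keep the example self-contained.

```python
import json


def summarize_report(report):
    """Summarize a sync-data report loaded from the JSON report file."""
    out_of_sync = (report["n_missing"] + report["n_not_deleted"]
                   + report["n_outdated"] + report["n_strange"])
    return {
        "in_sync": out_of_sync == 0,
        "out_of_sync": out_of_sync,
        "to_delete": len(report["entity_ids_to_delete"]),
        "to_reindex": len(report["entity_ids_to_reindex"]),
    }


# In practice, load the file with json.load(); a string is used here for brevity
report = json.loads("""
{
  "entity_ids_to_delete": [],
  "entity_ids_to_reindex": [],
  "entity_types": ["relation"],
  "n_missing": 0,
  "n_not_deleted": 0,
  "n_ok": 0,
  "n_outdated": 0,
  "n_strange": 0
}
""")

summary = summarize_report(report)
print(summary)
```

Combined with --dry-run, a summary like this can help you decide whether a full sync is needed before you start one.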

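Example

To check indicators, TTPs, and entity relationships without syncing them, and to save the report to result.txt:

eiq-platform search sync-data --check-results-file result.txt --dry-run --entity-types indicator,ttp,relation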