Sync the graph database
Tip
Neo4j is disabled by default from IC 2.12.0 onward. Graphs use a new graph API based on PostgreSQL and Elasticsearch instead. For more information, see Update platform_settings.py.
Run the `eiq-platform graph sync-data` command to sync the Neo4j graph database with the PostgreSQL main database.
About syncing the graph database
Over time, the Neo4j graph database may go out of sync with the PostgreSQL main database. Possible causes include large re-ingestion operations, database migrations following product upgrades, malformed source data, and operational wear and tear.
For example:

- An entity may exist in Neo4j, but not in PostgreSQL: the entity is loaded in the graph, but its data no longer exists in the PostgreSQL database.
- An entity may exist in PostgreSQL, but not in Neo4j: the entity cannot be loaded in the graph, even though its data does exist in the PostgreSQL database.
- An entity in Neo4j may have a different timestamp value than the same entity in PostgreSQL.
If you notice that the same entity has different `last_updated_at` timestamp values in the graph database and in the main database, run the `eiq-platform graph sync-data` command to sync the graph database with the main one.
Sync the graph database
The `eiq-platform graph sync-data` command checks the value of each entity's `last_updated_at` timestamp field to assess whether the entity in the Neo4j graph database is in sync with the same entity in the PostgreSQL main database.
The command iterates over all entities in both the Neo4j graph database and the PostgreSQL main database.
Since PostgreSQL is the single source of truth for the platform data, any discrepancies are resolved by copying data from PostgreSQL to Neo4j.
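To make the comparison concrete, the following is an illustrative sketch, not the platform's implementation. It classifies a single entity from its `last_updated_at` value in each database, and the function name `classify_entity` is hypothetical. The category names mirror the counters in the report file (`n_ok`, `n_missing`, `n_not_deleted`, `n_outdated`), but the exact rules the command applies are an assumption. It also assumes that both databases store the timestamp in the same string format.

```shell
#!/usr/bin/env bash
# Illustration only: classify one entity by comparing its last_updated_at
# value in PostgreSQL (source of truth) and in Neo4j.
# An empty argument means "the entity does not exist in that database".
# The category names mirror the report counters; the exact rules are an assumption.
classify_entity() {
  local pg_ts="$1" neo4j_ts="$2"
  if [[ -z "$pg_ts" && -z "$neo4j_ts" ]]; then
    echo "absent"        # in neither database: nothing to do
  elif [[ -z "$neo4j_ts" ]]; then
    echo "missing"       # only in PostgreSQL: copy it to Neo4j
  elif [[ -z "$pg_ts" ]]; then
    echo "not_deleted"   # only in Neo4j: delete it from Neo4j
  elif [[ "$pg_ts" == "$neo4j_ts" ]]; then
    echo "ok"            # same timestamp: in sync
  else
    echo "outdated"      # timestamps differ: copy the data again from PostgreSQL
  fi
}

classify_entity "2019-11-29T14:51:24Z" "2019-11-28T09:00:00Z"   # prints: outdated
```

In every out-of-sync case, the resolution goes in one direction only, from PostgreSQL to Neo4j.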
Depending on system load and the total number of entities in the platform, the command can take several hours to complete.
The `eiq-platform graph sync-data` command has a low impact on system load; therefore, it is safe to run it while the platform is in active use.
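Because a run can last several hours, you may want it to survive the end of your SSH session. A minimal sketch, assuming you use `nohup` rather than a terminal multiplexer such as `tmux`: the function `start_graph_sync` and the `EIQ_CLI` variable are hypothetical helpers for this sketch, and the platform does not read `EIQ_CLI`.

```shell
#!/usr/bin/env bash
# Sketch: start a long-running sync detached from the terminal and keep its
# output in a log file, so that closing the SSH session does not stop it.
# EIQ_CLI is a hypothetical override for this sketch (defaults to eiq-platform).
EIQ_CLI="${EIQ_CLI:-eiq-platform}"

start_graph_sync() {
  local log_file="$1"; shift
  # nohup ignores SIGHUP; "$@" passes any extra sync-data options through.
  nohup "$EIQ_CLI" graph sync-data "$@" >"$log_file" 2>&1 &
  echo "$!"   # PID of the background job, for example to check on it later
}
```

For example, run `pid=$(start_graph_sync /tmp/graph-sync.log --check-results-file /tmp/graph-sync.json)` from the activated virtual environment, and then follow progress with `tail -f /tmp/graph-sync.log`.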
Run inside venv

The `eiq-platform` command line interface offers a set of commands and utilities to diagnose and address common operational issues that may occur over time.

The `eiq-platform` command line interface is available in the platform Python virtual environment: activate it and, if necessary, import the platform environment variables.
Open a terminal session.
In the terminal, log in to the platform through SSH:
```shell
# The platform user name must have admin access rights
ssh ${platform_user_name}@${platform_host_name}
# Or:
ssh ${platform_user_name}@${platform_ip_address}
```
Obtain root-level access by running `sudo -i`:

```shell
# Root-access login shell
sudo -i
```
To access resources as a user other than the currently active one, append `-u`:

```shell
# Open a root login shell for the currently logged in user
sudo -i
# Open a login shell as a different user
sudo -i -u ${user_name}
# Run a command as a different user
sudo -i -u ${user_name} ${command} ${options}
```
Activate the platform Python virtual environment:

```shell
source /opt/eclecticiq-platform-backend/bin/activate
```
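Before you run any graph command, you can confirm that activation worked and that `eiq-platform` is on your `PATH`. A minimal sketch: the function name `check_eiq_cli` is hypothetical, and the default path is the one used in the previous step. Pass a different path if your installation differs.

```shell
#!/usr/bin/env bash
# Sketch: activate the platform virtual environment and confirm that the
# eiq-platform CLI is available. The default path is the one from this page.
check_eiq_cli() {
  local venv="${1:-/opt/eclecticiq-platform-backend}"
  if [[ ! -f "$venv/bin/activate" ]]; then
    echo "Virtual environment not found: $venv" >&2
    return 1
  fi
  # shellcheck source=/dev/null
  source "$venv/bin/activate"
  if ! command -v eiq-platform >/dev/null 2>&1; then
    echo "eiq-platform is not on PATH after activation" >&2
    return 1
  fi
  echo "Using $(command -v eiq-platform)"
}
```

Call it in your current shell (for example, `check_eiq_cli`), not in a subshell, so that the activation stays in effect for the commands that follow.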
Run the sync command

`eiq-platform graph sync-data` checks the value of each entity's `last_updated_at` timestamp field to determine whether the entity needs syncing.

For more information about the command, append `--help` to it, and then press ENTER.
Command

To start checking whether the Neo4j graph database and the PostgreSQL main database are in or out of sync:

```shell
eiq-platform graph sync-data
```
Parameters

| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| `--check-results-file` | String | You must specify a file name and a path to the location where the report with the summary and the out-of-sync entities is saved. The report file format is JSON. Optionally, you can also save the report as a plain text file. | Yes | - |
| | Integer | The command analyzes entities in batches. Specify the maximum number of entities to group in a batch. Very large batches may slow down the process or produce timeouts. | No | 1000 |
| | Date | Checks only entities whose … Allowed formats: … | No | - |
| `--dry-run` | Boolean | Checks syncing inconsistencies between PostgreSQL and Neo4j, and returns a report with the results. It does not sync the databases. | No | False |
| `--entity-types` | String | To limit the sync check to selected entity types, specify one or more entity types. If you specify multiple entity types, separate them with a comma. The entity type names correspond to the `data.type` values. Allowed values: … | No | - |
| | Boolean | Includes in the results an overview of the detected differences in PostgreSQL and Neo4j for outdated … If you turn on this option, the operation may take longer. Use it for debugging purposes. | No | False |
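A common pattern is to run a check-only pass first and inspect the report before you let the command change Neo4j. The sketch below uses only the `--check-results-file`, `--dry-run`, and `--entity-types` options. The function name `graph_sync_check` and the `EIQ_CLI` override are hypothetical helpers for this sketch, not part of the platform.

```shell
#!/usr/bin/env bash
# Sketch: run a check-only pass (--dry-run) and keep the JSON report.
# EIQ_CLI is a hypothetical override for this sketch (defaults to eiq-platform).
EIQ_CLI="${EIQ_CLI:-eiq-platform}"

graph_sync_check() {
  local report_file="$1" entity_types="${2:-}"
  local -a opts=(--check-results-file "$report_file" --dry-run)
  # --entity-types takes a comma-separated list, for example indicator,ttp
  if [[ -n "$entity_types" ]]; then
    opts+=(--entity-types "$entity_types")
  fi
  "$EIQ_CLI" graph sync-data "${opts[@]}"
}
```

For example, `graph_sync_check /tmp/graph-sync-check.json indicator,ttp` checks only indicator and TTP entities. To apply the changes afterwards, run the same command without `--dry-run`.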
Result

By default, the command writes execution events to `stdout`:

```
{
  "event":"Starting data check",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_commons",
  "timestamp":"2019-11-29T14:51:24.304016Z"
}
{
  "event":"Data check finished",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_commons",
  "n_missing":0,
  "n_not_deleted":0,
  "n_ok":0,
  "n_outdated":0,
  "n_strange":0,
  "timestamp":"2019-11-29T14:51:24.558769Z"
}
{
  "event":"Writing check_results file",
  "filename":"/tmp/graph-sync.txt",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_commons",
  "timestamp":"2019-11-29T14:51:24.559070Z"
}
{
  "event":"Deleting 0 entities that should not be in Neo4J",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_in_neo4j",
  "timestamp":"2019-11-29T14:51:24.559624Z"
}
{
  "event":"sync-data.done",
  "level":"info",
  "logger":"eiq.platform.scripts.sync_data_in_neo4j",
  "timestamp":"2019-11-29T14:51:24.559813Z",
  "took":"4.327e-06m",
  "total":0
}
```
The report file is a JSON object:

```json
{
  "entity_ids_to_delete": [],
  "entity_ids_to_reindex": [
    "393c0d67-89ac-4a41-9d50-853c9bd0f034"
  ],
  "n_missing": 0,
  "n_not_deleted": 0,
  "n_ok": 2287,
  "n_outdated": 1,
  "n_strange": 0
}
```
| Field | Type | Description |
|---|---|---|
| `entity_ids_to_delete` | Array | Lists the IDs of all entities in the Neo4j graph database that do not also exist in the PostgreSQL main database. Entity IDs are alphanumeric strings corresponding to the `data.id` and `id` field values in the entity JSON structure. These entities are deleted from the graph database. |
| `entity_ids_to_reindex` | Array | Lists the IDs of all entities that need re-indexing in the Neo4j graph database because their data is inconsistent with the same data in the PostgreSQL main database. Entity IDs are alphanumeric strings corresponding to the `data.id` and `id` field values in the entity JSON structure. These entities are synced in Neo4j by retrieving the corresponding data from PostgreSQL. |
| `n_ok` | Integer | Returns the total number of entities that are in sync between the two databases. |
| | Integer | Returns the total number of entities whose … These entities are synced in Neo4j by retrieving the corresponding data from PostgreSQL. |
| `n_missing` | Integer | Returns the total number of entities that are stored in PostgreSQL, but not in Neo4j. These entities are synced in Neo4j by retrieving the corresponding data from PostgreSQL. |
| `n_not_deleted` | Integer | Returns the total number of entities that are stored in Neo4j, but not in PostgreSQL. These entities were deleted from the main database, but were left behind in the graph database. These entities are deleted from the Neo4j graph database. |
| | Integer | Returns the total number of entities whose … These entities are synced in Neo4j by retrieving the corresponding data from PostgreSQL. |
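Example

To check for sync inconsistencies in indicator and TTP entities only, without syncing the databases, and to save the report to `result.txt`:

```shell
eiq-platform graph sync-data --check-results-file result.txt --dry-run --entity-types indicator,ttp
```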