Backup guidelines CentOS#
Back up platform data to restore it when you upgrade or reinstall the platform, and as a disaster recovery mitigation strategy.
As a best practice, we recommend you implement a backup strategy for platform data.
Backups are crucial in scenarios such as:
Platform upgrade to a newer release.
Platform reinstallation.
Disaster recovery.
Before starting a platform backup, verify the following points:
No users are signed in to the platform.
No read/write activity is in progress on the PostgreSQL, Elasticsearch or Neo4j databases.
No pending queues: this means that no read/write activity is in progress on the Redis database.
No running Celery tasks.
The platform is not running.
The core data you should include in a platform backup are:
Configuration files.
Databases.
Any SSL keys associated with the platform and its dependencies.
The exact steps to back up platform data may vary, depending on your specific environment hardware, software, configuration, and setup.
Therefore, consider the following as a set of generic guidelines on backing up platform data.
Platform configuration#
The following is a full list of configuration file locations. Back up these files before performing an upgrade:
# General
- /etc/environment
- /etc/yum.repos.d/eclecticiq-ic.repo
# Platform
- /etc/eclecticiq/platform_settings.py
- /etc/eclecticiq/opentaxii.yml
- /etc/eclecticiq/proxy_url
- /etc/default/eclecticiq-platform
- /etc/default/eclecticiq-platform-backend-worker-outgoing-transports
- /etc/default/eclecticiq-platform-backend-worker-common
- /etc/default/eclecticiq-platform-backend-worker-outgoing-transports-priority
- /etc/default/eclecticiq-platform-backend-worker-discovery
- /etc/default/eclecticiq-platform-backend-worker-reindexing
- /etc/default/eclecticiq-platform-backend-worker-discovery-priority
- /etc/default/eclecticiq-platform-backend-worker-retention-policies
- /etc/default/eclecticiq-platform-backend-worker-entity-rules-priority
- /etc/default/eclecticiq-platform-backend-worker-retention-policies-priority
- /etc/default/eclecticiq-platform-backend-worker-extract-rules-priority
- /etc/default/eclecticiq-platform-backend-worker-utilities
- /etc/default/eclecticiq-platform-backend-worker-incoming-transports
- /etc/default/eclecticiq-platform-backend-worker-utilities-priority
- /etc/default/eclecticiq-platform-backend-worker-incoming-transports-priority
- /lib/systemd/system/eclecticiq-platform-backend-graphindex.service
- /lib/systemd/system/eclecticiq-platform-backend-ingestion.service
- /lib/systemd/system/[email protected]
- /lib/systemd/system/eclecticiq-platform-backend-opentaxii.service
- /lib/systemd/system/eclecticiq-platform-backend-scheduler.service
- /lib/systemd/system/eclecticiq-platform-backend-searchindex.service
- /lib/systemd/system/eclecticiq-platform-backend-services.service
- /lib/systemd/system/eclecticiq-platform-backend-web.service
- /lib/systemd/system/[email protected]
- /lib/systemd/system/eclecticiq-platform-backend-workers.service
- /lib/systemd/system/eclecticiq-secrets-setter.service
- /opt/eclecticiq-platform-backend/alembic.ini
# ElasticSearch
- /etc/eclecticiq-elasticsearch/elasticsearch.yml
- /etc/eclecticiq-elasticsearch/jvm.options
- /etc/eclecticiq-elasticsearch/log4j2.properties
- /etc/elasticsearch/elasticsearch-plugins.example.yml
- /etc/elasticsearch/elasticsearch.keystore
- /etc/elasticsearch/elasticsearch.yml
- /etc/elasticsearch/jvm.options
- /etc/elasticsearch/log4j2.properties
- /etc/elasticsearch/role_mapping.yml
- /etc/elasticsearch/roles.yml
- /etc/elasticsearch/users
- /etc/elasticsearch/users_roles
- /etc/systemd/system/elasticsearch.service.d/20-eclecticiq.conf
- /etc/sysconfig/elasticsearch
- /media/elasticsearch/nodes
- /media/elasticsearch/tmp
# Kibana
- /etc/eclecticiq-kibana/kibana.yml
- /etc/kibana/kibana.yml
- /etc/kibana/node.options
- /etc/systemd/system/kibana.service.d/20-eclecticiq_es_hosts.conf
# Logstash
- /etc/logstash/logstash.yml
- /etc/logstash/jvm.options
- /etc/logstash/log4j2.properties
- /etc/logstash/logstash-sample.conf
- /etc/logstash/pipelines.yml
- /etc/logstash/startup.options
- /etc/logstash/conf.d/eclecticiq.conf
- /etc/default/logstash
- /etc/systemd/system/logstash.service.d/20-eclecticiq-env-vars.conf
# Neo4j
- /etc/eclecticiq-neo4j/neo4j.conf
- /etc/eclecticiq-neo4j/template-neo4j.conf
# Neo4jbatcher, together with platform conf.
- /etc/eclecticiq-neo4jbatcher/neo4jbatcher.conf
- /lib/systemd/system/eclecticiq-neo4jbatcher.service
- /etc/systemd/system/eclecticiq-neo4jbatcher.service.d/20-eclecticiq.conf
# statsite
- /opt/statsite/etc/statsite.conf
- /opt/statsite/etc/elasticsearch_template.json
- /opt/statsite/etc/statsite.service
- /etc/systemd/system/statsite.service.d/override.conf
# Redis
- /etc/eclecticiq-redis/redis.conf
- /etc/eclecticiq-redis/local.conf
# - /etc/redis/redis.conf
- /etc/systemd/system/redis.service.d/20-eclecticiq_data_dir.conf
- /etc/sysctl.d/10-eclecticiq_overcommit_memory.conf
# Nginx
- /etc/eclecticiq-nginx/locations.conf.d/neo4jbatcher.conf
- /etc/eclecticiq-nginx/locations.conf.d/platform-frontend.conf
- /etc/eclecticiq-nginx/locations.conf.d/tip-backend.conf
- /etc/eclecticiq-nginx/nginx.centos.conf
- /etc/eclecticiq-nginx/nginx.common.conf
- /etc/eclecticiq-nginx/nginx.conf
- /etc/eclecticiq-nginx/nginx.rhel.conf
- /etc/eclecticiq-nginx/nginx.ubuntu.conf
- /etc/eclecticiq-nginx/proxy_params.conf
- /etc/eclecticiq-nginx/sites.conf.d/eclecticiq-default.conf
- /etc/systemd/system/nginx.service.d/20-eclecticiq.conf
# Postgres
- /etc/eclecticiq-postgres/eclecticiq-postgres.conf
- /etc/eclecticiq-postgres/listen-addresses.conf
- /etc/eclecticiq-postgres/pg_hba.conf
- /etc/systemd/system/postgresql-11.service.d/eclecticiq-postgres.conf
- /media/pgsql/11/data/postgresql.conf
# Postfix
- /etc/postfix/main.cf
# Logrotate
- /etc/logrotate.d/eclecticiq
# Rsyslog
- /opt/eclecticiq-rsyslog-forwarder
- /etc/rsyslog.d/eclecticiq.conf
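The list above can be archived in one pass. The following is a minimal sketch, not an official procedure: it assumes you saved the paths to a plain-text list file, one per line, and the function name `backup_config_files` and the list-file format are illustrative. Paths that do not exist on the host are skipped with a warning instead of aborting the archive.

```shell
#!/bin/sh
# Sketch only: archive the configuration paths named in a list file.
# The function name and the list-file format are illustrative assumptions.

backup_config_files() {
    list_file=$1   # text file with one path per line, optionally prefixed with "- "
    archive=$2     # destination .tar.gz file
    existing=$(mktemp) || return 1

    while IFS= read -r line || [ -n "$line" ]; do
        path=${line#- }                 # drop a leading "- " bullet, if any
        case $path in
            ''|'#'*) continue ;;        # skip blank lines and comment lines
        esac
        if [ -e "$path" ]; then
            printf '%s\n' "$path" >> "$existing"
        else
            printf 'skipping missing path: %s\n' "$path" >&2
        fi
    done < "$list_file"

    # -T reads the names of the files to archive from the filtered list
    tar -czf "$archive" -T "$existing"
    status=$?
    rm -f "$existing"
    return "$status"
}
```

A possible invocation, run as a user that can read all listed files, is `backup_config_files config-paths.txt /backup/eiq-config.tar.gz`; both file names are placeholders.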
Platform databases#
The platform uses the following databases:
Database | Description |
---|---|
PostgreSQL | Intel database. It stores all the information about entities, observables, relationships, taxonomy, and so on. |
Elasticsearch | Indexing and search database. It stores document data and search queries as JSON. |
Neo4j | Graph database. It stores node, edge, and property information to build and represent data relationships. |
It is best to include all databases in your backup strategy.
If for any reason it is not possible, make sure that at least the PostgreSQL database is backed up on a regular basis.
Backup guidelines#
Shut down the platform#
Note
If you back up the PostgreSQL database by creating an SQL dump with the pg_dump or the pg_dumpall commands, you do not need to stop the platform backend services.
It is possible to run pg_dump or pg_dumpall while the database is in use.
If you are shutting down the platform before performing an upgrade or a database backup, stop platform components in the order described below to make sure that:
No Celery tasks are left over in the queue.
No read/write activity is in progress in Redis.
This prevents hanging tasks in the queue from interfering with the upgrade or backup procedures.
To stop systemd-managed platform services through the command line:
systemctl stop eclecticiq-platform-backend-services
Check Celery queues. They should be empty:
# Launch redis-cli
$ redis-cli
$ > llen enrichers
$ > llen integrations
$ > llen priority_enrichers
$ > llen priority_providers
$ > llen priority_utilities
$ > llen providers
$ > llen reindexing
$ > llen utilities
To delete a non-empty Celery queue:
# Launch redis-cli
$ redis-cli
# Delete the entity ingestion queue
$ > del "queue:ingestion:inbound"
# Delete the graph ingestion queue
$ > del "queue:graph:inbound"
# Delete the search indexing queue
$ > del "queue:search:inbound"
Stop the remaining Celery workers:
systemctl stop eclecticiq-platform-backend-worker*.service
Check that there are no leftover PID files.
First, make sure that no platform-related process is running:
ps auxf | grep beat
If any platform-related processes are running, terminate them with the kill command.
Manually remove any leftover PID files with the rm command.
Usually, PID files are stored in /var/run.
As a final inspection, you may want to get a snapshot overview of the currently running processes:
ps auxf
If you suspect that a specific process may be hanging or that it may still be running, look for it by searching for its name:
ps auxf | grep ${process_name}
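The Celery queue check described in this section can also be scripted. The sketch below is illustrative rather than part of the platform tooling: the function name `queues_are_empty` and the `REDIS_CLI` override variable are assumptions, and the queue names are the ones listed above. It relies on `redis-cli` printing a bare integer for `llen` when its output is not a terminal.

```shell
#!/bin/sh
# Sketch only: report whether the Celery queues listed in this section are empty.
# REDIS_CLI is a hypothetical override, handy if redis-cli needs a wrapper.

CELERY_QUEUES="enrichers integrations priority_enrichers priority_providers priority_utilities providers reindexing utilities"

queues_are_empty() {
    result=0
    for queue in $CELERY_QUEUES; do
        # llen prints the number of items in the list stored at the key
        length=$(${REDIS_CLI:-redis-cli} llen "$queue") || return 2
        if [ "$length" != "0" ]; then
            printf 'queue %s still holds %s item(s)\n' "$queue" "$length" >&2
            result=1
        fi
    done
    return "$result"
}
```

The function returns 0 when every queue is empty, 1 when at least one queue still holds items, and 2 when Redis cannot be queried, so it can gate a backup script, for example `queues_are_empty && echo "safe to continue"`.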
Back up the PostgreSQL database#
You can back up a PostgreSQL database in several ways:
SQL dump#
Note
If you back up the PostgreSQL database by creating an SQL dump with the pg_dump or the pg_dumpall commands, you do not need to stop the platform backend services.
It is possible to run pg_dump or pg_dumpall while the database is in use.
The quickest way to back up a PostgreSQL database is to perform an SQL dump.
To generate an SQL dump of the database, run the pg_dump or the pg_dumpall command.
To create an .sql dump file, run the following command:
sudo -i -u postgres /usr/pgsql-11/bin/pg_dumpall > /tmp/db_dump.sql
To restore a backed up database from an .sql dump file, run the following command:
psql -U postgres < ./db_dump.sql
Note
About restoring an SQL dump
When restoring a backup copy to an empty cluster, set postgres as the user. Make sure the selected user for the restore operation has root-level access rights.
An easy way to restore a database dump created with pg_dumpall is to load it to an empty cluster.
If you try to load the backup copy to an existing copy of the same database, the process may return error messages because it tries to create relations that already exist in the target database.
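Before relying on a dump file, it can help to confirm that it was written completely. The sketch below assumes the dump is a plain-text SQL file: `pg_dump` and `pg_dumpall` end such files with a "dump complete" comment, so a truncated file can be spotted by checking its last lines. The function name `dump_looks_complete` is illustrative, and the check does not validate the SQL itself.

```shell
#!/bin/sh
# Sketch only: a quick sanity check for a plain-text SQL dump.
# It looks for the trailing completion comment written by pg_dump
# ("PostgreSQL database dump complete") or by pg_dumpall
# ("PostgreSQL database cluster dump complete").

dump_looks_complete() {
    dump_file=$1
    # The file must exist and must not be empty
    [ -s "$dump_file" ] || return 1
    # The completion comment sits within the last few lines of the file
    tail -n 5 "$dump_file" | grep -q 'PostgreSQL database .*dump complete'
}
```

For example, `dump_looks_complete /tmp/db_dump.sql || echo "dump may be truncated"` flags a file that ends before the completion comment.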
File system level backup#
File system level backup creates backup copies of the files PostgreSQL uses to store the database data corpus. The files to back up are stored in the database cluster directory specified with the initdb command when the database storage location was initialized.
This approach requires database downtime.
To back up a database by archiving the files PostgreSQL uses to store data in the database:
Go to the PostgreSQL data directory, typically ../pgsql/${version}/data/:
cd /var/lib/pgsql/11/data/
Create a .tar archive containing the entire content of the data directory:
tar -cvf db_backup.tar *
To restore a database by copying the files PostgreSQL uses to store data in the database:
Copy the .tar archive you just created to the target environment.
In the target environment, delete any content in the data directory:
rm -rf /var/lib/pgsql/11/data/*
Go to the PostgreSQL data directory where you want to restore the database data, typically ../pgsql/${version}/data/:
cd /var/lib/pgsql/11/data/
Extract the .tar archive:
tar -xvf db_backup.tar
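The archive and restore steps above can be wrapped in two small helpers. This is a sketch under stated assumptions: the function names are illustrative, PostgreSQL must already be stopped, and paths should be absolute. Unlike `tar -cvf db_backup.tar *`, it archives `.` through `-C`, so hidden files are included and the archive is written outside the directory it is archiving.

```shell
#!/bin/sh
# Sketch only: archive and restore a stopped PostgreSQL data directory.
# Function names are illustrative; pass absolute paths.

archive_data_dir() {
    data_dir=$1    # for example /var/lib/pgsql/11/data
    archive=$2     # destination .tar file, outside data_dir
    [ -d "$data_dir" ] || { printf 'no such directory: %s\n' "$data_dir" >&2; return 1; }
    case $archive in
        "$data_dir"/*)
            printf 'archive must not be inside %s\n' "$data_dir" >&2
            return 1 ;;
    esac
    # Create the archive, then list it once to confirm it is readable
    tar -cf "$archive" -C "$data_dir" . && tar -tf "$archive" > /dev/null
}

restore_data_dir() {
    archive=$1
    target_dir=$2  # empty data directory in the target environment
    mkdir -p "$target_dir" && tar -xf "$archive" -C "$target_dir"
}
```

A possible usage is `archive_data_dir /var/lib/pgsql/11/data /backup/db_backup.tar` on the source host, followed by `restore_data_dir /backup/db_backup.tar /var/lib/pgsql/11/data` on the target host; the `/backup` location is a placeholder.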
Continuous archiving#
[Continuous archiving](https://www.postgresql.org/docs/11/continuous-archiving.html) allows backing up and restoring a snapshot of the database in the state it was at a given point in time.
It combines file system level backup with write-ahead logging (WAL).
These are the main steps you need to carry out to set up this backup strategy:
Configure the write-ahead log behavior.
Usually, a section in the postgresql.conf file contains the WAL parameters you need to define.
For example, you would typically set wal_level to archive (PostgreSQL 11 still accepts this value and maps it to replica), archive_mode to on, and you may wish to set an archive_command, as well as an archive_timeout.
Perform a base backup by running pg_basebackup.
As an alternative, you can manage backups with pgbarman, an open source tool.
Back up the Elasticsearch database#
The Elasticsearch official documentation includes sections with explanations of key concepts, as well as step-by-step tutorials on the following topics:
Use the snapshot API to create snapshots of an index or a whole cluster.
Elasticsearch database backup example#
Key and parameter names, as well as values in the code examples are placeholders. Replace them with appropriate names and values, depending on your environment and system configuration.
Example of an Elasticsearch database backup procedure:
Create a directory to save the Elasticsearch backup/snapshot to.
Make sure the elasticsearch user has read and write access to it.
cd /db-backup
# Create Elasticsearch backup dir
mkdir elasticsearch
# Make sure the 'elasticsearch' user can access it
chown elasticsearch:elasticsearch elasticsearch/
Add the database backup path to the /etc/eclecticiq-elasticsearch/elasticsearch.yml configuration file:
path.repo: ["/db-backup/elasticsearch"]
Restart Elasticsearch:
systemctl restart elasticsearch
Define a repository for the backup:
curl -X PUT "http://localhost:9200/_snapshot/backup" -H "Content-Type: application/json" -d '{"type": "fs","settings":{"location":"/db-backup/elasticsearch"}}'
# Example of a successful response
{"acknowledged": true}
Verify that the newly created repository is correctly defined:
curl -X GET "localhost:9200/_snapshot/_all?pretty=true"
{
  "backup" : {
    "type" : "fs",
    "settings" : {
      "location" : "/db-backup/elasticsearch"
    }
  }
}
Make a snapshot of the Elasticsearch database:
curl -X PUT "http://localhost:9200/_snapshot/backup/snapshot-201707281255?wait_for_completion=true"
# Check if the backup was successful:
ls -al /db-backup/elasticsearch/*
# Example of a successful Elasticsearch database backup:
-rw-r--r--. 1 elasticsearch elasticsearch 39 Jul 28 10:58 /db-backup/elasticsearch/index
-rw-r--r--. 1 elasticsearch elasticsearch 4273 Jul 28 10:55 /db-backup/elasticsearch/meta-snapshot-201707281255.dat
-rw-r--r--. 1 elasticsearch elasticsearch 907 Jul 28 10:58 /db-backup/elasticsearch/snap-snapshot-201707281255.dat
/db-backup/elasticsearch/indices:
total 8
drwxr-xr-x. 41 elasticsearch elasticsearch 4096 Jul 28 10:55 .
drwxr-xr-x. 3 elasticsearch elasticsearch 4096 Jul 28 10:58 ..
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:57 audit_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 documents_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 draft-entities_v4
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:56 extracts_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 -kibana
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:57 logstash-2017.06.23
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 logstash-2017.06.29
...
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 logstash-2017.07.27
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:56 logstash-2017.07.28
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:57 .meta_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 .scripts
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 stix_v4
To restore a backed up snapshot:
curl -X POST 'http://localhost:9200/_snapshot/backup/snapshot-201707281255/_restore'
To monitor the snapshot restore process:
curl -X GET 'http://localhost:9200/_recovery/'
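The example above hard-codes the snapshot name. For recurring backups, the name can be generated from the current date and time. The following sketch assumes the same repository name (backup) and the same timestamp layout as the example; the helper names and the `ES_URL` variable are illustrative, not platform settings.

```shell
#!/bin/sh
# Sketch only: build a timestamped snapshot name and the matching request URL.
# ES_URL, snapshot_name, and snapshot_url are illustrative names.

ES_URL="${ES_URL:-http://localhost:9200}"

# Prints a name such as snapshot-201707281255 (year, month, day, hour, minute)
snapshot_name() {
    printf 'snapshot-%s\n' "$(date +%Y%m%d%H%M)"
}

# Prints the snapshot URL for a repository ($1) and a snapshot name ($2)
snapshot_url() {
    printf '%s/_snapshot/%s/%s?wait_for_completion=true\n' "$ES_URL" "$1" "$2"
}
```

With these helpers, a snapshot request could look like `curl -X PUT "$(snapshot_url backup "$(snapshot_name)")"`.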
Back up the Neo4j database#
The Neo4j official documentation includes sections with explanations of backup concepts and procedures.
Neo4j Community does not include a backup tool.
The following examples provide simple suggestions to manually start a backup operation.
Before you start backing up data, stop all backend services, including Neo4j.
To stop systemd-managed platform services through the command line:
systemctl stop eclecticiq-platform-backend-services
Make sure Neo4j has stopped:
systemctl status neo4j
Once Neo4j is stopped, browse to a working directory with enough free space to store the Neo4j graph.db database backup copy.
Avoid using /tmp.
For example, create the backup in /media/neo4j:
tar -cvf graph.db-backup-${backup_date}.tar /media/neo4j/databases/graph.db/
The Neo4j data location is defined in the /etc/eclecticiq-neo4j/neo4j.conf configuration file:
# Name of the active database to mount
dbms.active_database=graph.db
# Path to the data dir containing graph.db
dbms.directories.data=${data_dir} # ex.: neo4j/data
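To avoid hard-coding the database location in a backup script, the two settings above can be read from the configuration file. This is a minimal sketch: the function name `neo4j_setting` is illustrative, it prints the first uncommented match for a key, and it returns the value exactly as written, including any trailing inline comment.

```shell
#!/bin/sh
# Sketch only: print the value of one key from a key=value configuration file.
# The function name is illustrative; comment lines are ignored.

neo4j_setting() {
    conf_file=$1   # for example /etc/eclecticiq-neo4j/neo4j.conf
    key=$2         # for example dbms.directories.data
    grep -v '^[[:space:]]*#' "$conf_file" |
        awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }'
}
```

For example, `neo4j_setting /etc/eclecticiq-neo4j/neo4j.conf dbms.active_database` prints the name of the active database, which can then be combined with the data directory to locate the files to archive.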
Restart Neo4j#
Start all backend services, including Neo4j:
To start systemd-managed platform services through the command line:
systemctl start eclecticiq-platform-backend-services
Run this command to start platform services, and to start the platform instance through the command line.
Make sure that Neo4j is running:
systemctl status neo4j
Data recovery#
To restore backed up data, follow the standard recommendations and procedures for PostgreSQL, Elasticsearch, and Neo4j.
Reindex Elasticsearch#
Before you start (re)indexing or migrating Elasticsearch, do the following:
Make sure that ingestion, indexing, and core platform processes are not running.
If they are running, stop them:
systemctl stop eclecticiq-platform-backend-services
Specify the platform settings environment variable by exporting it.
Run eiq-platform search reindex to index or to reindex the Elasticsearch database.
search reindex takes one argument: index-name.
Its value is the name of an existing Elasticsearch index.
Default Elasticsearch indices for the platform:
audit
documents
draft-entities*
extracts*
logstash*
statsite* (This index works with both StatsD and Statsite)
stix*
search reindex copies data from the PostgreSQL database to the Elasticsearch database, which is then indexed.
sudo -i -u eclecticiq EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py /opt/eclecticiq-platform-backend/bin/eiq-platform search reindex --index=${name_of_the_index}
Migrate Elasticsearch indices#
To make sure you are applying the latest Elasticsearch schema, migrate Elasticsearch indices.
The migration process is idempotent. It sets up and builds the required/specified indices with aliases and mappings, and it updates the index mapping templates, if necessary.
Make sure that ingestion, indexing, and core platform processes are not running. If they are running, stop them.
To stop systemd-managed platform services through the command line:
systemctl stop eclecticiq-platform-backend-services
Specify the platform settings environment variable by exporting it.
Run eiq-platform search upgrade to migrate Elasticsearch indices.
The command runs in the background.
In case of an SSH disconnection, the process should keep running normally.
This upgrade action is idempotent:
First, it tries to update the Elasticsearch index mappings in-place.
If it is not possible, it proceeds to reindex the existing Elasticsearch indices.
sudo -i -u eclecticiq EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py /opt/eclecticiq-platform-backend/bin/eiq-platform search upgrade
Index migration log messages, if any, are printed to the terminal.
If no messages are printed to the terminal, the process completed successfully.
Check for failed packages#
During a data restore operation, you may wish to check if the scenario that created the need for a data backup restore also caused some packages to be partially or incorrectly ingested into the platform.
The blobs associated with problematic packages are not marked as successful.
To retrieve a list with these packages, execute the following SQL query against the PostgreSQL database:
SELECT id, processing_status FROM blob WHERE processing_status NOT IN ('success', 'pending');
The query returns all packages whose status is not success or pending, as well as additional information on the packages, when available.
Examine the response to evaluate whether you want to try ingesting the packages again.