Backup guidelines CentOS#

Back up platform data to restore it when you upgrade or reinstall the platform, and as a disaster recovery mitigation strategy.

As a best practice, we recommend you implement a backup strategy for platform data.

Backups are crucial in scenarios such as:

Platform upgrade to a newer release.
Platform reinstallation.
Disaster recovery.

Before starting a platform backup, verify the following points:

No users are signed in to the platform.
No read/write activity is in progress on the PostgreSQL, Elasticsearch or Neo4j databases.
No pending queues: this means that no read/write activity is in progress on the Redis database.
No running Celery tasks.
The platform is not running.

The core data you should include in a platform backup are:

Configuration files.
Databases.
Any SSL keys associated with the platform and its dependencies.

The exact steps to back up platform data may vary, depending on your specific environment hardware, software, configuration, and setup.

Therefore, consider the following as a set of generic guidelines on backing up platform data.

Platform configuration#

The following is a full list of configuration file locations. Back up these files before performing an upgrade:

# General
- /etc/environment
- /etc/yum.repos.d/eclecticiq-ic.repo
# Platform
- /etc/eclecticiq/platform_settings.py
- /etc/eclecticiq/opentaxii.yml
- /etc/eclecticiq/proxy_url

- /etc/default/eclecticiq-platform
- /etc/default/eclecticiq-platform-backend-worker-outgoing-transports
- /etc/default/eclecticiq-platform-backend-worker-common
- /etc/default/eclecticiq-platform-backend-worker-outgoing-transports-priority
- /etc/default/eclecticiq-platform-backend-worker-discovery
- /etc/default/eclecticiq-platform-backend-worker-reindexing
- /etc/default/eclecticiq-platform-backend-worker-discovery-priority
- /etc/default/eclecticiq-platform-backend-worker-retention-policies
- /etc/default/eclecticiq-platform-backend-worker-entity-rules-priority
- /etc/default/eclecticiq-platform-backend-worker-retention-policies-priority
- /etc/default/eclecticiq-platform-backend-worker-extract-rules-priority
- /etc/default/eclecticiq-platform-backend-worker-utilities
- /etc/default/eclecticiq-platform-backend-worker-incoming-transports
- /etc/default/eclecticiq-platform-backend-worker-utilities-priority
- /etc/default/eclecticiq-platform-backend-worker-incoming-transports-priority

- /lib/systemd/system/eclecticiq-platform-backend-graphindex.service
- /lib/systemd/system/eclecticiq-platform-backend-ingestion.service
- /lib/systemd/system/[email protected]
- /lib/systemd/system/eclecticiq-platform-backend-opentaxii.service
- /lib/systemd/system/eclecticiq-platform-backend-scheduler.service
- /lib/systemd/system/eclecticiq-platform-backend-searchindex.service
- /lib/systemd/system/eclecticiq-platform-backend-services.service
- /lib/systemd/system/eclecticiq-platform-backend-web.service
- /lib/systemd/system/[email protected]
- /lib/systemd/system/eclecticiq-platform-backend-workers.service
- /lib/systemd/system/eclecticiq-secrets-setter.service

- /opt/eclecticiq-platform-backend/alembic.ini

# ElasticSearch
- /etc/eclecticiq-elasticsearch/elasticsearch.yml
- /etc/eclecticiq-elasticsearch/jvm.options
- /etc/eclecticiq-elasticsearch/log4j2.properties
- /etc/elasticsearch/elasticsearch-plugins.example.yml
- /etc/elasticsearch/elasticsearch.keystore
- /etc/elasticsearch/elasticsearch.yml
- /etc/elasticsearch/jvm.options
- /etc/elasticsearch/log4j2.properties
- /etc/elasticsearch/role_mapping.yml
- /etc/elasticsearch/roles.yml
- /etc/elasticsearch/users
- /etc/elasticsearch/users_roles
- /etc/systemd/system/elasticsearch.service.d/20-eclecticiq.conf
- /etc/sysconfig/elasticsearch

- /media/elasticsearch/nodes
- /media/elasticsearch/tmp

# Kibana
- /etc/eclecticiq-kibana/kibana.yml
- /etc/kibana/kibana.yml
- /etc/kibana/node.options
- /etc/systemd/system/kibana.service.d/20-eclecticiq_es_hosts.conf

# Logstash
- /etc/logstash/logstash.yml
- /etc/logstash/jvm.options
- /etc/logstash/log4j2.properties
- /etc/logstash/logstash-sample.conf
- /etc/logstash/pipelines.yml
- /etc/logstash/startup.options
- /etc/logstash/conf.d/eclecticiq.conf
- /etc/default/logstash
- /etc/systemd/system/logstash.service.d/20-eclecticiq-env-vars.conf

# Neo4j
- /etc/eclecticiq-neo4j/neo4j.conf
- /etc/eclecticiq-neo4j/template-neo4j.conf

# Neo4jbatcher, together with platform conf.
- /etc/eclecticiq-neo4jbatcher/neo4jbatcher.conf
- /lib/systemd/system/eclecticiq-neo4jbatcher.service
- /etc/systemd/system/eclecticiq-neo4jbatcher.service.d/20-eclecticiq.conf

# statsite
- /opt/statsite/etc/statsite.conf
- /opt/statsite/etc/elasticsearch_template.json
- /opt/statsite/etc/statsite.service
- /etc/systemd/system/statsite.service.d/override.conf

# Redis
- /etc/eclecticiq-redis/redis.conf
- /etc/eclecticiq-redis/local.conf
# - /etc/redis/redis.conf
- /etc/systemd/system/redis.service.d/20-eclecticiq_data_dir.conf
- /etc/sysctl.d/10-eclecticiq_overcommit_memory.conf

# Nginx
- /etc/eclecticiq-nginx/locations.conf.d/neo4jbatcher.conf
- /etc/eclecticiq-nginx/locations.conf.d/platform-frontend.conf
- /etc/eclecticiq-nginx/locations.conf.d/tip-backend.conf
- /etc/eclecticiq-nginx/nginx.centos.conf
- /etc/eclecticiq-nginx/nginx.common.conf
- /etc/eclecticiq-nginx/nginx.conf
- /etc/eclecticiq-nginx/nginx.rhel.conf
- /etc/eclecticiq-nginx/nginx.ubuntu.conf
- /etc/eclecticiq-nginx/proxy_params.conf
- /etc/eclecticiq-nginx/sites.conf.d/eclecticiq-default.conf
- /etc/systemd/system/nginx.service.d/20-eclecticiq.conf

# Postgres
- /etc/eclecticiq-postgres/eclecticiq-postgres.conf
- /etc/eclecticiq-postgres/listen-addresses.conf
- /etc/eclecticiq-postgres/pg_hba.conf
- /etc/systemd/system/postgresql-11.service.d/eclecticiq-postgres.conf
- /media/pgsql/11/data/postgresql.conf

# Postfix
- /etc/postfix/main.cf

# Logrotate
- /etc/logrotate.d/eclecticiq

# Rsyslog
- /opt/eclecticiq-rsyslog-forwarder
- /etc/rsyslog.d/eclecticiq.conf
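
The list above can be archived in one pass. The following is a minimal sketch, assuming the list has been saved as-is to a manifest file, one path per line. The manifest file, the archive location, and the function names are hypothetical and not part of the platform. Paths that do not exist on the host are skipped, because not every component is necessarily installed.

```shell
#!/bin/sh
# Sketch: archive every existing path named in a manifest file.
# The manifest and archive locations are assumptions, not platform defaults.

# Print the manifest entries that exist on this host, one per line.
# Blank lines and "#" comments are skipped; a leading "- " is stripped.
existing_paths() {
    sed -e 's/^[[:space:]]*-[[:space:]]*//' \
        -e '/^[[:space:]]*#/d' \
        -e '/^[[:space:]]*$/d' "$1" |
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        else
            printf 'skipping missing path: %s\n' "$path" >&2
        fi
    done
}

# backup_config <manifest> <archive>: tar the existing manifest entries.
backup_config() {
    list=$(mktemp) || return 1
    existing_paths "$1" > "$list"
    tar -cf "$2" -T "$list"
    status=$?
    rm -f "$list"
    return "$status"
}

# Example call (both paths are placeholders):
#   backup_config ./config-manifest.txt /db-backup/platform-config.tar
```

Directories in the list, such as /media/elasticsearch/nodes, are archived recursively, so the resulting archive can be large.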

Platform databases#

The platform uses the following databases:

Database	Description
PostgreSQL	Intel database. It stores all the information about entities, observables, relationships, taxonomy, and so on.
Elasticsearch	Indexing and search database. It stores document data and search queries as JSON.
Neo4j	Graph database. It stores node, edge, and property information to build and represent data relationships.

It is best to include all databases in your backup strategy.

If that is not possible for any reason, make sure that at least the PostgreSQL database is backed up on a regular basis.

Backup guidelines#

Shut down the platform#

Note

If you back up the PostgreSQL database by creating an SQL dump with the pg_dump or the pg_dumpall commands, you do not need to stop the platform backend services.

It is possible to run pg_dump or pg_dumpall while the database is in use.

If you are shutting down the platform before performing an upgrade or a database backup, stop platform components in the order described below to make sure that:

No Celery tasks are left over in the queue.
No read/write activity is in progress in Redis.

This prevents hanging tasks in the queue from interfering with the upgrade or backup procedures.

To stop systemd-managed platform services through the command line:
```
systemctl stop eclecticiq-platform-backend-services
```

Check Celery queues. They should be empty:

# Launch redis-cli
$ redis-cli

$ > llen enrichers
$ > llen integrations
$ > llen priority_enrichers
$ > llen priority_providers
$ > llen priority_utilities
$ > llen providers
$ > llen reindexing
$ > llen utilities
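
The same check can be scripted instead of typed into the interactive prompt. This is a minimal sketch, assuming redis-cli can reach Redis with its default connection settings. The REDIS_CLI and QUEUES variables and the check_queues function are hypothetical names introduced for illustration; the queue names are the ones listed above.

```shell
#!/bin/sh
# Sketch: report Celery queues that still hold messages.
# REDIS_CLI can be overridden; it defaults to the redis-cli client used above.
REDIS_CLI=${REDIS_CLI:-redis-cli}
QUEUES="enrichers integrations priority_enrichers priority_providers priority_utilities providers reindexing utilities"

# Print "<queue> <length>" for every non-empty queue.
# Returns 0 when all queues are empty, 1 when at least one is not,
# and 2 when a queue length could not be read.
check_queues() {
    pending=0
    for queue in $QUEUES; do
        length=$($REDIS_CLI llen "$queue") || {
            echo "cannot query queue: $queue" >&2
            return 2
        }
        case $length in
            ''|*[!0-9]*)
                echo "unexpected reply for $queue: $length" >&2
                return 2 ;;
        esac
        if [ "$length" -gt 0 ]; then
            printf '%s %s\n' "$queue" "$length"
            pending=1
        fi
    done
    return "$pending"
}

# Example call:
#   check_queues && echo "all queues are empty"
```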

To delete a non-empty Celery queue:

# Launch redis-cli
$ redis-cli

# Delete the entity ingestion queue
$ > del "queue:ingestion:inbound"

# Delete the graph ingestion queue
$ > del "queue:graph:inbound"

# Delete the search indexing queue
$ > del "queue:search:inbound"

Stop the remaining Celery workers:

systemctl stop eclecticiq-platform-backend-worker*.service

Check that there are no leftover PID files:
- First, make sure that no platform-related process is running:
```
ps auxf | grep beat
```
- If any platform-related processes are running, terminate them with the kill command.
- Manually remove any leftover PID files with the rm command. Usually, PID files are stored in /var/run.

As a final inspection, you may want to get a snapshot overview of the currently running processes:
```
ps auxf
```
If you suspect that a specific process may be hanging or that it may still be running, look for it by searching for its name:
```
ps auxf | grep ${process_name}
```
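
The PID file inspection described above can be sketched as follows. The example assumes that PID files use the .pid extension and contain a single process ID; both are assumptions, and report_pid_files is a hypothetical helper. Run it as root, otherwise processes owned by other users may be reported as stale.

```shell
#!/bin/sh
# Sketch: list PID files in a directory and say whether each process is alive.
# The default directory follows the text above; the *.pid pattern is an assumption.

# report_pid_files [dir]: print "<file> running" or "<file> stale".
report_pid_files() {
    dir=${1:-/var/run}
    for file in "$dir"/*.pid; do
        [ -f "$file" ] || continue          # the pattern matched nothing
        pid=$(tr -cd '0-9' < "$file")
        # kill -0 sends no signal; it only tests whether the process exists.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            printf '%s running\n' "$file"
        else
            printf '%s stale\n' "$file"
        fi
    done
}

# Example call:
#   report_pid_files /var/run
```

The sketch only reports. Deciding which processes to terminate and which files to remove stays a manual step.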

Back up the PostgreSQL database#

You can back up a PostgreSQL database in several ways:

SQL dump #

Note

If you back up the PostgreSQL database by creating an SQL dump with the pg_dump or the pg_dumpall commands, you do not need to stop the platform backend services.

It is possible to run pg_dump or pg_dumpall while the database is in use.

The quickest way to back up a PostgreSQL database is to perform an SQL dump.

To generate an SQL dump of the database, run the pg_dump or the pg_dumpall command.

To create an .sql dump file, run the following command:

sudo -i -u postgres /usr/pgsql-11/bin/pg_dumpall > /tmp/db_dump.sql

To restore a backed up database from an .sql dump file, run the following command:
```
psql -U postgres < ./db_dump.sql
```

Note

About restoring an SQL dump

When restoring a backup copy to an empty cluster, run the restore as the postgres user.
Make sure the user selected for the restore operation has superuser (root-level) access rights.
An easy way to restore a database dump created with pg_dumpall is to load it to an empty cluster.
If you try to load the backup copy to an existing copy of the same database, the process may return error messages because it tries to create relations that already exist in the target database.
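
The dump command shown earlier in this section can be wrapped so that each run produces a date-stamped file and a failed run leaves nothing behind. This is a minimal sketch: the DUMP_CMD variable, the dump_database function, the file naming, and the target directory are hypothetical.

```shell
#!/bin/sh
# Sketch: write a date-stamped SQL dump and discard it if the dump fails.
# DUMP_CMD defaults to the pg_dumpall call shown above; the target directory
# and the file naming are assumptions.
DUMP_CMD=${DUMP_CMD:-sudo -i -u postgres /usr/pgsql-11/bin/pg_dumpall}

# dump_database <dir>: dump into <dir> and print the path of the new file.
dump_database() {
    target="$1/db_dump-$(date +%Y%m%d%H%M).sql"
    # Write to a temporary name first, so an earlier good dump is never
    # overwritten by a failed run.
    if $DUMP_CMD > "$target.part" && [ -s "$target.part" ]; then
        mv "$target.part" "$target"
        printf '%s\n' "$target"
    else
        rm -f "$target.part"
        echo "dump failed, nothing kept" >&2
        return 1
    fi
}

# Example call (the directory is a placeholder):
#   dump_database /db-backup/postgres
```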

File system level backup #

A file system level backup creates backup copies of the files PostgreSQL uses to store the database data. The files to back up are stored in the database cluster directory that was specified with the initdb command when the database storage location was initialized.

This approach requires database downtime.

To back up a database by archiving the files PostgreSQL uses to store data in the database:

Go to the PostgreSQL data directory, typically ../pgsql/${version}/data/:
```
cd /var/lib/pgsql/11/data/
```
Create a .tar archive containing the entire content of the data directory:
```
tar -cvf db_backup.tar *
```

To restore a database by copying the files PostgreSQL uses to store data in the database:

Copy the .tar archive you just created to the target environment.
In the target environment, delete any content in the data directory:
```
rm -rf /var/lib/pgsql/11/data/*
```
Go to the PostgreSQL data directory where you want to restore the database data, typically ../pgsql/${version}/data/:
```
cd /var/lib/pgsql/11/data/
```
Decompress the .tar archive:
```
tar -xvf db_backup.tar
```
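
As a variation on the steps above, the archive can be written outside the data directory, so that the archive never ends up inside the directory it is copying and is not removed when the target directory is emptied. This is a minimal sketch: archive_dir and restore_dir are hypothetical helper names, and the paths in the example calls are placeholders. Stop PostgreSQL before running either function, because this approach requires database downtime.

```shell
#!/bin/sh
# Sketch: archive a data directory into a tarball stored outside of it,
# then restore that archive into an empty target directory.
# Both paths are arguments; nothing here is specific to the platform.

# archive_dir <data_dir> <archive>
archive_dir() {
    # -C switches into the data directory first, so member names are relative.
    tar -cf "$2" -C "$1" .
}

# restore_dir <archive> <data_dir>
restore_dir() {
    # -p keeps the file permissions recorded in the archive.
    mkdir -p "$2" && tar -xpf "$1" -C "$2"
}

# Example calls (placeholder paths):
#   archive_dir /var/lib/pgsql/11/data /db-backup/db_backup.tar
#   restore_dir /db-backup/db_backup.tar /var/lib/pgsql/11/data
```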

Continuous archiving #

[Continuous archiving](https://www.postgresql.org/docs/11/continuous-archiving.html) allows backing up and restoring a snapshot of the database in the state it was at a given point in time.

It combines file system level backup with write-ahead logging (WAL).

These are the main steps you need to carry out to set up this backup strategy:

Configure the write-ahead log behavior. Usually, a section in the postgresql.conf file contains the WAL parameters you need to define. For example, you would typically set wal_level to replica (PostgreSQL 11 still accepts the older archive value and maps it to replica) and archive_mode to on, and you may wish to set an archive_command as well as an archive_timeout.
Perform a base backup by running pg_basebackup.

As an alternative, you can manage backups with pgbarman, an open source tool.

Back up the Elasticsearch database#

The Elasticsearch official documentation includes sections with explanations of key concepts, as well as step-by-step tutorials on the following topics:

Back up an Elasticsearch cluster
Use the snapshot API to create snapshots of an index or a whole cluster.

Elasticsearch database backup example#

Key and parameter names, as well as values in the code examples are placeholders. Replace them with appropriate names and values, depending on your environment and system configuration.

Example of an Elasticsearch database backup procedure:

Create a directory to save the Elasticsearch backup/snapshot to.

Make sure the elasticsearch user has read and write access to it.

cd /db-backup

# Create Elasticsearch backup dir
mkdir elasticsearch

# Make sure the 'elasticsearch' user can access it
chown elasticsearch:elasticsearch elasticsearch/

Add the database backup path to the /etc/eclecticiq-elasticsearch/elasticsearch.yml configuration file:
```
path.repo: ["/db-backup/elasticsearch"]
```
Restart Elasticsearch:
```
systemctl restart elasticsearch
```

Define a repository for the backup:

curl -X PUT "http://localhost:9200/_snapshot/backup" \
     -H "Content-Type: application/json" \
     -d '{"type": "fs","settings":{"location":"/db-backup/elasticsearch"}}'

# copy-paste version:
$ curl -X PUT "http://localhost:9200/_snapshot/backup" -H "Content-Type: application/json" -d '{"type": "fs","settings":{"location":"/db-backup/elasticsearch"}}'

# Example of a successful response
{"acknowledged": true}

Verify that the newly created repository is correctly defined:

curl -X GET "localhost:9200/_snapshot/_all?pretty=true"
{
  "backup" : {
    "type" : "fs",
    "settings" : {
      "location" : "/db-backup/elasticsearch"
    }
  }
}

Make a snapshot of the Elasticsearch database:

curl -X PUT "http://localhost:9200/_snapshot/backup/snapshot-201707281255?wait_for_completion=true"

# Check if the backup was successful:
ls -al /db-backup/elasticsearch/*

# Example of a successful Elasticsearch database backup:
-rw-r--r--.  1 elasticsearch elasticsearch   39 Jul 28 10:58 /db-backup/elasticsearch/index
-rw-r--r--.  1 elasticsearch elasticsearch 4273 Jul 28 10:55 /db-backup/elasticsearch/meta-snapshot-201707281255.dat
-rw-r--r--.  1 elasticsearch elasticsearch  907 Jul 28 10:58 /db-backup/elasticsearch/snap-snapshot-201707281255.dat

/db-backup/elasticsearch/indices:
total 8
drwxr-xr-x. 41 elasticsearch elasticsearch 4096 Jul 28 10:55 .
drwxr-xr-x.  3 elasticsearch elasticsearch 4096 Jul 28 10:58 ..
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:57 audit_v2
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:58 documents_v2
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:58 draft-entities_v4
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:56 extracts_v2
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:58 .kibana
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:57 logstash-2017.06.23
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:58 logstash-2017.06.29
...
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:58 logstash-2017.07.27
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:56 logstash-2017.07.28
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:57 .meta_v2
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:58 .scripts
drwxr-xr-x.  3 elasticsearch elasticsearch   51 Jul 28 10:58 stix_v4

To restore a backed up snapshot:

curl -X POST 'http://localhost:9200/_snapshot/backup/snapshot-201707281255/_restore'

To monitor the snapshot restore process:

curl -X GET 'http://localhost:9200/_recovery/'
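
The snapshot step can be scripted so that the snapshot name is generated from the current date and the reply is checked. This is a minimal sketch, assuming the repository is named backup as in the example above and that Elasticsearch reports the snapshot state in the reply when wait_for_completion=true is set. ES_URL, CURL, snapshot_name, and take_snapshot are hypothetical names.

```shell
#!/bin/sh
# Sketch: take a date-stamped snapshot and check the reported state.
# ES_URL and the repository name "backup" follow the example above;
# CURL can be overridden.
ES_URL=${ES_URL:-http://localhost:9200}
CURL=${CURL:-curl}

# Build a snapshot name such as snapshot-201707281255.
snapshot_name() {
    date +snapshot-%Y%m%d%H%M
}

# Create the snapshot, wait for completion, and succeed only when the
# reply reports the state SUCCESS. Prints the snapshot name on success.
take_snapshot() {
    name=$(snapshot_name)
    reply=$($CURL -s -X PUT "$ES_URL/_snapshot/backup/$name?wait_for_completion=true") || return 1
    # Drop whitespace so the match works for compact and pretty-printed JSON.
    compact=$(printf '%s' "$reply" | tr -d ' \n\t')
    case $compact in
        *'"state":"SUCCESS"'*)
            printf '%s\n' "$name" ;;
        *)
            printf 'snapshot %s not successful: %s\n' "$name" "$reply" >&2
            return 1 ;;
    esac
}

# Example call:
#   take_snapshot
```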

Back up the Neo4j database#

The Neo4j official documentation includes sections with explanations of backup concepts and procedures:

Configure Neo4j to enable backups
Back up the Neo4j database

Neo4j Community does not include a backup tool.

The following examples provide simple suggestions to manually start a backup operation.

Before you start backing up data, stop all backend services, including Neo4j.

To stop systemd-managed platform services through the command line:
```
systemctl stop eclecticiq-platform-backend-services
```
Make sure Neo4j has stopped:
```
systemctl status neo4j
```
Once Neo4j is stopped, browse to a working directory with enough free space to store the Neo4j graph.db database backup copy.

Avoid using /tmp.

For example, create the backup in /media/neo4j:
```
tar -cvf graph.db-backup-${backup_date}.tar /media/neo4j/databases/graph.db/
```
The Neo4j data location is defined in the /etc/eclecticiq-neo4j/neo4j.conf configuration file:
```
# Name of the active database to mount
dbms.active_database=graph.db

# Path to the data dir containing graph.db
dbms.directories.data=${data_dir}  # ex.: neo4j/data
```
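
The two settings above can be used to locate the graph store before archiving it. This is a minimal sketch, assuming the store lives in the databases subdirectory of the data directory, which matches the /media/neo4j/databases/graph.db example above, and that the data directory is configured as an absolute path. neo4j_setting and backup_graph are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: derive the graph store path from neo4j.conf and archive it.
# Assumes the layout <dbms.directories.data>/databases/<dbms.active_database>.

# neo4j_setting <conf_file> <key>: print the last uncommented value of a key.
neo4j_setting() {
    key=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for sed
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" |
        sed 's/[[:space:]]*#.*$//' | tail -n 1
}

# backup_graph <conf_file> <backup_dir>: print the archive it created.
backup_graph() {
    data_dir=$(neo4j_setting "$1" dbms.directories.data)
    database=$(neo4j_setting "$1" dbms.active_database)
    if [ ! -d "$data_dir/databases/$database" ]; then
        echo "graph store not found: $data_dir/databases/$database" >&2
        return 1
    fi
    archive="$2/$database-backup-$(date +%Y%m%d).tar"
    tar -cf "$archive" -C "$data_dir/databases" "$database" &&
        printf '%s\n' "$archive"
}

# Example call (the backup directory is a placeholder):
#   backup_graph /etc/eclecticiq-neo4j/neo4j.conf /db-backup/neo4j
```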

Restart Neo4j#

Start all backend services, including Neo4j:

To start systemd-managed platform services through the command line:
```
systemctl start eclecticiq-platform-backend-services
```
This command starts the platform services and the platform instance.

Make sure that Neo4j is running:
```
systemctl status neo4j
```

Data recovery#

To restore backed up data, follow the standard recommendations and procedures for PostgreSQL, Elasticsearch, and Neo4j.

Reindex Elasticsearch#

Before you start (re)indexing or migrating Elasticsearch, do the following:

Make sure that ingestion, indexing, and core platform processes are not running.

If they are running, stop them:
```
systemctl stop eclecticiq-platform-backend-services
```
Specify the platform settings environment variable by exporting it.
Run eiq-platform search reindex to index or to reindex the Elasticsearch database.

search reindex takes one argument, --index. Its value is the name of an existing Elasticsearch index.

Default Elasticsearch indices for the platform:
- audit
- documents
- draft-entities*
- extracts*
- logstash*
- statsite* (this index works with both StatsD and Statsite)
- stix*

search reindex copies data from the PostgreSQL database to the Elasticsearch database, which is then indexed.

sudo -i -u eclecticiq EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py /opt/eclecticiq-platform-backend/bin/eiq-platform search reindex --index=${name_of_the_index}
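
When several indices need reindexing, the command can be run in a loop. This is a minimal sketch built on the command above: EIQ_RUN and reindex_all are hypothetical names, and the index names are passed in by the caller rather than hard-coded.

```shell
#!/bin/sh
# Sketch: run "search reindex" once per index name.
# EIQ_RUN holds the command prefix from the example above and can be overridden.
EIQ_RUN=${EIQ_RUN:-sudo -i -u eclecticiq EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py /opt/eclecticiq-platform-backend/bin/eiq-platform}

# reindex_all <index>...: stop at the first index that fails.
reindex_all() {
    for index in "$@"; do
        echo "reindexing $index"
        $EIQ_RUN search reindex --index="$index" || {
            echo "reindex failed for $index" >&2
            return 1
        }
    done
}

# Example call (replace the arguments with index names from your environment):
#   reindex_all audit documents
```

Stopping at the first failure keeps the output easy to read, at the cost of leaving the remaining indices untouched until the problem is fixed.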

Migrate Elasticsearch indices#

To make sure you are applying the latest Elasticsearch schema, migrate Elasticsearch indices.

The migration process is idempotent. It sets up and builds the required/specified indices with aliases and mappings, and it updates the index mapping templates, if necessary.

Make sure that ingestion, indexing, and core platform processes are not running. If they are running, stop them.

To stop systemd-managed platform services through the command line:
```
systemctl stop eclecticiq-platform-backend-services
```
Specify the platform settings environment variable by exporting it.
Run eiq-platform search upgrade to migrate Elasticsearch indices.

The command runs in the background.

In case of an SSH disconnection, the process should keep running normally.

This upgrade action is idempotent:

First, it tries to update the Elasticsearch index mappings in-place.
If it is not possible, it proceeds to reindex the existing Elasticsearch indices.

sudo -i -u eclecticiq EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py /opt/eclecticiq-platform-backend/bin/eiq-platform search upgrade

Index migration log messages, if any, are printed to the terminal.

If no messages are printed to the terminal, the process completed successfully.

Check for failed packages#

During a data restore operation, you may want to check whether the incident that made the restore necessary also caused some packages to be ingested partially or incorrectly.

The blobs associated with problematic packages are not marked as successful.

To retrieve a list with these packages, execute the following SQL query against the PostgreSQL database:

SELECT id, processing_status FROM blob WHERE processing_status NOT IN ('success', 'pending');

The query returns all packages whose status is not success or pending, as well as additional information on the packages, when available.

Examine the response to evaluate whether you want to try ingesting the packages again.
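
For a long result list, a per-status count can make the review easier. This is a minimal sketch that summarizes the query output; it assumes the query is run with psql in unaligned, tuples-only mode (psql -At), which prints one id|processing_status pair per line. summarize_statuses is a hypothetical helper, and the database name in the example call is a placeholder.

```shell
#!/bin/sh
# Sketch: count problematic blobs per processing status.
# Reads "id|processing_status" lines from standard input and prints
# "<status> <count>" lines, sorted by status.
summarize_statuses() {
    awk -F'|' 'NF >= 2 { count[$2]++ }
               END { for (status in count) print status, count[status] }' | sort
}

# Example call (DB_NAME is a placeholder for the platform database name):
#   psql -U postgres -d "$DB_NAME" -At \
#     -c "SELECT id, processing_status FROM blob WHERE processing_status NOT IN ('success', 'pending');" \
#     | summarize_statuses
```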
