Backup guidelines CentOS
Back up platform data to restore it when you upgrade or reinstall the platform, and as a disaster recovery mitigation strategy.
As a best practice, we recommend you implement a backup strategy for platform data.
Backups are crucial in scenarios such as:
Platform upgrade to a newer release.
Platform reinstallation.
Disaster recovery.
Before starting a platform backup, verify the following points:
No users are signed in to the platform.
No read/write activity is in progress on the PostgreSQL, Elasticsearch or Neo4j databases.
No pending queues: this means that no read/write activity is in progress on the Redis database.
No running Celery tasks.
The platform is not running.
The core data you should include in a platform backup are:
Configuration files.
Databases.
Any SSL keys associated with the platform and its dependencies.
The exact steps to back up platform data may vary, depending on your specific environment hardware, software, configuration, and setup.
Therefore, consider the following as a set of generic guidelines on backing up platform data.
Platform configuration
The following is a full list of configuration file locations. Back up these files before performing an upgrade:
# General
- /etc/environment
# Platform
- /etc/eclecticiq/platform_settings.py
- /etc/eclecticiq/opentaxii.yml
- /etc/eclecticiq/proxy_url
- /etc/default/eclecticiq-platform
- /etc/default/eclecticiq-platform-backend-worker-outgoing-transports
- /etc/default/eclecticiq-platform-backend-worker-common
- /etc/default/eclecticiq-platform-backend-worker-outgoing-transports-priority
- /etc/default/eclecticiq-platform-backend-worker-discovery
- /etc/default/eclecticiq-platform-backend-worker-reindexing
- /etc/default/eclecticiq-platform-backend-worker-discovery-priority
- /etc/default/eclecticiq-platform-backend-worker-retention-policies
- /etc/default/eclecticiq-platform-backend-worker-entity-rules-priority
- /etc/default/eclecticiq-platform-backend-worker-retention-policies-priority
- /etc/default/eclecticiq-platform-backend-worker-extract-rules-priority
- /etc/default/eclecticiq-platform-backend-worker-utilities
- /etc/default/eclecticiq-platform-backend-worker-incoming-transports
- /etc/default/eclecticiq-platform-backend-worker-utilities-priority
- /etc/default/eclecticiq-platform-backend-worker-incoming-transports-priority
- /lib/systemd/system/eclecticiq-platform-backend-graphindex.service
- /lib/systemd/system/eclecticiq-platform-backend-ingestion.service
- /lib/systemd/system/eclecticiq-platform-backend-opentaxii.service
- /lib/systemd/system/eclecticiq-platform-backend-scheduler.service
- /lib/systemd/system/eclecticiq-platform-backend-searchindex.service
- /lib/systemd/system/eclecticiq-platform-backend-services.service
- /lib/systemd/system/eclecticiq-platform-backend-web.service
- /lib/systemd/system/eclecticiq-platform-backend-workers.service
- /lib/systemd/system/eclecticiq-secrets-setter.service
- /opt/eclecticiq-platform-backend/alembic.ini
# Elasticsearch
- /etc/eclecticiq-elasticsearch/elasticsearch.yml
- /etc/eclecticiq-elasticsearch/jvm.options
- /etc/eclecticiq-elasticsearch/log4j2.properties
- /etc/elasticsearch/elasticsearch-plugins.example.yml
- /etc/elasticsearch/elasticsearch.keystore
- /etc/elasticsearch/elasticsearch.yml
- /etc/elasticsearch/jvm.options
- /etc/elasticsearch/log4j2.properties
- /etc/elasticsearch/role_mapping.yml
- /etc/elasticsearch/roles.yml
- /etc/elasticsearch/users
- /etc/elasticsearch/users_roles
- /etc/systemd/system/elasticsearch.service.d/20-eclecticiq.conf
- /etc/sysconfig/elasticsearch
- /media/elasticsearch/nodes
- /media/elasticsearch/tmp
# Kibana
- /etc/eclecticiq-kibana/kibana.yml
- /etc/kibana/kibana.yml
- /etc/kibana/node.options
- /etc/systemd/system/kibana.service.d/20-eclecticiq_es_hosts.conf
# Logstash
- /etc/logstash/logstash.yml
- /etc/logstash/jvm.options
- /etc/logstash/log4j2.properties
- /etc/logstash/logstash-sample.conf
- /etc/logstash/pipelines.yml
- /etc/logstash/startup.options
- /etc/logstash/conf.d/eclecticiq.conf
- /etc/default/logstash
- /etc/systemd/system/logstash.service.d/20-eclecticiq-env-vars.conf
# Neo4j
- /etc/eclecticiq-neo4j/neo4j.conf
- /etc/eclecticiq-neo4j/template-neo4j.conf
# Neo4jbatcher, together with platform conf.
- /etc/eclecticiq-neo4jbatcher/neo4jbatcher.conf
- /lib/systemd/system/eclecticiq-neo4jbatcher.service
- /etc/systemd/system/eclecticiq-neo4jbatcher.service.d/20-eclecticiq.conf
# statsite
- /opt/statsite/etc/statsite.conf
- /opt/statsite/etc/elasticsearch_template.json
- /opt/statsite/etc/statsite.service
- /etc/systemd/system/statsite.service.d/override.conf
# Redis
- /etc/eclecticiq-redis/redis.conf
- /etc/eclecticiq-redis/local.conf
- /etc/redis/redis.conf
# Nginx
- /etc/eclecticiq-nginx/locations.conf.d/neo4jbatcher.conf
- /etc/eclecticiq-nginx/locations.conf.d/platform-frontend.conf
- /etc/eclecticiq-nginx/locations.conf.d/tip-backend.conf
- /etc/eclecticiq-nginx/nginx.centos.conf
- /etc/eclecticiq-nginx/nginx.common.conf
- /etc/eclecticiq-nginx/nginx.conf
- /etc/eclecticiq-nginx/nginx.rhel.conf
- /etc/eclecticiq-nginx/nginx.ubuntu.conf
- /etc/eclecticiq-nginx/proxy_params.conf
- /etc/eclecticiq-nginx/sites.conf.d/eclecticiq-default.conf
- /etc/systemd/system/nginx.service.d/20-eclecticiq.conf
# Postgres
- /etc/eclecticiq-postgres/eclecticiq-postgres.conf
- /etc/eclecticiq-postgres/listen-addresses.conf
- /etc/eclecticiq-postgres/pg_hba.conf
- /media/pgsql/11/data/postgresql.conf
# Postfix
- /etc/postfix/main.cf
# Logrotate
- /etc/logrotate.d/eclecticiq
# Rsyslog
- /opt/eclecticiq-rsyslog-forwarder
- /etc/rsyslog.d/eclecticiq.conf
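One possible way to archive the configuration files listed above is to save the list to a text file and pass it to a small helper. The following is a minimal sketch, not an official platform tool: the function name backup_config_files and both file names in the usage note are hypothetical. It accepts one path per line, with or without the leading "- ", ignores blank lines and # comments, and skips paths that do not exist in your installation.

```shell
# Hypothetical helper: archive every existing path named in a list file.
# Usage: backup_config_files <list_file> <archive.tar>
backup_config_files() {
    local list_file archive path
    list_file=$1
    archive=$2
    shift 2
    # Existing paths are collected in the positional parameters.
    while IFS= read -r path || [ -n "$path" ]; do
        # Accept list entries written as "- /path" as well as "/path".
        path=${path#- }
        case $path in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if [ -e "$path" ]; then
            set -- "$@" "$path"
        else
            echo "skipping missing path: $path" >&2
        fi
    done < "$list_file"
    if [ "$#" -eq 0 ]; then
        echo "no existing paths to back up" >&2
        return 1
    fi
    # -p keeps file permissions; tar stores the paths without the leading '/'.
    tar -cpf "$archive" "$@"
}
```

For example, assuming the list is saved as config-files.txt, you could run backup_config_files config-files.txt /db-backup/platform-config.tar as root, so that protected files such as keystores are readable.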
Platform databases
The platform uses the following databases:
Database | Description
PostgreSQL | Intel database. It stores all the information about entities, observables, relationships, taxonomy, and so on.
Elasticsearch | Indexing and search database. It stores document data and search queries as JSON.
Neo4j | Graph database. It stores node, edge, and property information to build and represent data relationships.
It is best to include all databases in your backup strategy.
If for any reason it is not possible, make sure that at least the PostgreSQL database is backed up on a regular basis.
Backup guidelines
Shut down the platform
If you back up the PostgreSQL database by creating an SQL dump with the pg_dump or the pg_dumpall command, you do not need to stop the platform backend services.
It is possible to run pg_dump or pg_dumpall while the database is in use.
If you are shutting down the platform before performing an upgrade or a database backup, stop platform components in the order described below to make sure that:
No Celery tasks are left over in the queue.
No read/write activity is in progress in Redis.
This prevents hanging tasks in the queue from interfering with the upgrade or backup procedures.
To stop systemd-managed platform services through the command line:
systemctl stop eclecticiq-platform-backend-services
Check Celery queues. They should be empty:
# Launch redis-cli
$ redis-cli
> llen enrichers
> llen integrations
> llen priority_enrichers
> llen priority_providers
> llen priority_utilities
> llen providers
> llen reindexing
> llen utilities
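The same check can be scripted instead of typed interactively. The sketch below loops over the queue names listed above and reports any queue whose length is not 0. The function name celery_queues_empty and the REDIS_CLI override variable are assumptions made for this example. It also assumes that redis-cli can reach the local Redis instance without extra connection options.

```shell
# Command used to query Redis; override it to point at another binary.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Hypothetical helper: returns 0 only if every Celery queue is empty.
celery_queues_empty() {
    local queue length status
    status=0
    for queue in enrichers integrations priority_enrichers priority_providers \
                 priority_utilities providers reindexing utilities; do
        # When not attached to a terminal, redis-cli prints the bare number.
        length=$("$REDIS_CLI" llen "$queue")
        if [ "$length" != "0" ]; then
            # An empty or unexpected answer is also treated as "not empty".
            echo "queue $queue is not empty: $length" >&2
            status=1
        fi
    done
    return "$status"
}
```

A call such as celery_queues_empty && echo "all queues are empty" can then act as a gate before you stop the remaining workers.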
To delete a non-empty Celery queue:
# Launch redis-cli
$ redis-cli
# Delete the entity ingestion queue
> del "queue:ingestion:inbound"
# Delete the graph ingestion queue
> del "queue:graph:inbound"
# Delete the search indexing queue
> del "queue:search:inbound"
Stop the remaining Celery workers:
systemctl stop eclecticiq-platform-backend-worker*.service
Check that there are no leftover PID files
First, make sure that no platform-related process is running:
ps auxf | grep beat
If any platform-related processes are running, terminate them with the kill command.
Manually remove any leftover PID files with the rm command.
Usually, PID files are stored in /var/run.
As a final inspection, you may want to get a snapshot overview of the currently running processes:
ps auxf
If you suspect that a specific process may be hanging or that it may still be running, look for it by searching for its name:
ps auxf | grep ${process_name}
Back up the PostgreSQL database
You can back up a PostgreSQL database in several ways:
SQL dump (recommended)
File system level backup
Continuous archiving backup
SQL dump
If you back up the PostgreSQL database by creating an SQL dump with the pg_dump or the pg_dumpall command, you do not need to stop the platform backend services.
It is possible to run pg_dump or pg_dumpall while the database is in use.
The quickest way to back up a PostgreSQL database is to perform an SQL dump.
To generate an SQL dump of the database, run the pg_dump or the pg_dumpall command.
To create an .sql dump file, run the following command:
sudo -i -u postgres /usr/pgsql-11/bin/pg_dumpall > /tmp/db_dump.sql
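If you create dumps regularly, a date-stamped file name prevents a new dump from overwriting the previous one. The following sketch wraps the pg_dumpall call shown above. The function name dump_all_databases, the PG_DUMPALL override variable, and the .partial suffix are assumptions made for this example. The function is meant to be run from a shell opened as the postgres user.

```shell
# Path to pg_dumpall, as used in the command above; override if it differs.
PG_DUMPALL=${PG_DUMPALL:-/usr/pgsql-11/bin/pg_dumpall}

# Hypothetical helper: write a date-stamped dump into the given directory
# and print the name of the resulting file.
dump_all_databases() {
    local target_dir stamp outfile
    target_dir=$1
    stamp=$(date +%Y%m%d-%H%M%S)
    outfile="$target_dir/db_dump-$stamp.sql"
    # Dump to a temporary name first, so that an interrupted dump is never
    # mistaken for a complete one.
    if "$PG_DUMPALL" > "$outfile.partial"; then
        mv "$outfile.partial" "$outfile"
        echo "$outfile"
    else
        rm -f "$outfile.partial"
        return 1
    fi
}
```

For example, dump_all_databases /db-backup prints the path of the new dump file, which you can then copy to off-host storage.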
To restore a backed up database from an .sql dump file, run the following command:
psql -U postgres < ./db_dump.sql
About restoring an SQL dump
When restoring a backup copy to an empty cluster, run the restore as the postgres user.
Make sure the user you select for the restore operation has superuser access rights on the database cluster.
An easy way to restore a database dump created with pg_dumpall is to load it to an empty cluster.
If you try to load the backup copy to an existing copy of the same database, the process may return error messages because it tries to create relations that already exist in the target database.
File system level backup
File system level backup creates backup copies of the files PostgreSQL uses to store the database data.
The files to back up are stored in the data directory of the database cluster, whose location is set with the initdb command when the cluster is initialized.
This approach requires database downtime.
To back up a database by archiving the files PostgreSQL uses to store data in the database:
Go to the PostgreSQL data directory, typically ../pgsql/${version}/data/:
cd /var/lib/pgsql/11/data/
Create a .tar archive containing the entire content of the data directory:
tar -cvf db_backup.tar *
To restore a database by copying the files PostgreSQL uses to store data in the database:
Copy the .tar archive you just created to the target environment.
In the target environment, delete any content in the data directory:
rm -rf /var/lib/pgsql/11/data/*
Go to the PostgreSQL data directory where you want to restore the database data, typically ../pgsql/${version}/data/:
cd /var/lib/pgsql/11/data/
Extract the .tar archive:
tar -xvf db_backup.tar
Continuous archiving
Continuous archiving allows backing up and restoring a snapshot of the database in the state it was at a given point in time.
It combines file system level backup with write-ahead logging (WAL).
These are the main steps you need to carry out to set up this backup strategy:
Configure the write-ahead log behavior.
Usually, a section in the postgresql.conf file contains the WAL parameters you need to define.
For example, you would typically set wal_level to replica (archive in PostgreSQL versions earlier than 9.6), archive_mode to on, and you may wish to set an archive_command, as well as an archive_timeout.
Perform a base backup by running pg_basebackup.
As an alternative, you can manage backups with Barman (pgbarman), an open source backup and recovery manager for PostgreSQL.
Back up the Elasticsearch database
The Elasticsearch official documentation includes sections with explanations of key concepts, as well as step-by-step tutorials on the following topics:
Use the snapshot API to create snapshots of an index or a whole cluster.
Elasticsearch database backup example
Key and parameter names, as well as values in the code examples are placeholders.
Replace them with appropriate names and values, depending on your environment and system configuration.
Example of an Elasticsearch database backup procedure:
Create a directory to save the Elasticsearch backup/snapshot to.
Make sure the elasticsearch user has read and write access to it.
cd /db-backup
# Create Elasticsearch backup dir
mkdir elasticsearch
# Make sure the 'elasticsearch' user can access it
chown elasticsearch:elasticsearch elasticsearch/
Add the database backup path to the /etc/eclecticiq-elasticsearch/elasticsearch.yml configuration file:
path.repo: ["/db-backup/elasticsearch"]
Restart Elasticsearch:
systemctl restart elasticsearch
Define a repository for the backup:
curl -X PUT "http://localhost:9200/_snapshot/backup" -H "Content-Type: application/json" -d '{"type": "fs","settings":{"location":"/db-backup/elasticsearch"}}'
# Example of a successful response
{
  "acknowledged": true
}
Verify that the newly created repository is correctly defined:
curl -X GET "localhost:9200/_snapshot/_all?pretty=true"
{
  "backup": {
    "type": "fs",
    "settings": {
      "location": "/db-backup/elasticsearch"
    }
  }
}
Make a snapshot of the Elasticsearch database:
curl -X PUT "http://localhost:9200/_snapshot/backup/snapshot-201707281255?wait_for_completion=true"
# Check if the backup was successful:
ls -al /db-backup/elasticsearch/*
# Example of a successful Elasticsearch database backup:
-rw-r--r--. 1 elasticsearch elasticsearch 39 Jul 28 10:58 /db-backup/elasticsearch/index
-rw-r--r--. 1 elasticsearch elasticsearch 4273 Jul 28 10:55 /db-backup/elasticsearch/meta-snapshot-201707281255.dat
-rw-r--r--. 1 elasticsearch elasticsearch 907 Jul 28 10:58 /db-backup/elasticsearch/snap-snapshot-201707281255.dat
/db-backup/elasticsearch/indices:
total 8
drwxr-xr-x. 41 elasticsearch elasticsearch 4096 Jul 28 10:55 .
drwxr-xr-x. 3 elasticsearch elasticsearch 4096 Jul 28 10:58 ..
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:57 audit_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 documents_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 draft-entities_v4
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:56 extracts_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 .kibana
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:57 logstash-2017.06.23
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 logstash-2017.06.29
...
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 logstash-2017.07.27
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:56 logstash-2017.07.28
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:57 .meta_v2
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 .scripts
drwxr-xr-x. 3 elasticsearch elasticsearch 51 Jul 28 10:58 stix_v4
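The snapshot name in the example above encodes the date and time of the backup. Assuming you want to generate such names automatically, the following sketch builds the name from the current time and sends the same PUT request. The function names snapshot_name and create_snapshot, and the ES_URL, ES_REPO, and CURL variables are hypothetical; adjust the defaults to your environment.

```shell
# Defaults match the placeholder values used in the examples above.
ES_URL=${ES_URL:-http://localhost:9200}
ES_REPO=${ES_REPO:-backup}
CURL=${CURL:-curl}

# Hypothetical helper: print a name such as snapshot-201707281255.
snapshot_name() {
    date +snapshot-%Y%m%d%H%M
}

# Hypothetical helper: create a snapshot in the repository and wait for it
# to complete. The optional argument overrides the generated name.
create_snapshot() {
    local name
    name=${1:-$(snapshot_name)}
    "$CURL" -sS -X PUT "$ES_URL/_snapshot/$ES_REPO/$name?wait_for_completion=true"
}
```

Calling create_snapshot without arguments creates a snapshot named after the current minute; Elasticsearch returns an error if a snapshot with the same name already exists in the repository.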
To restore a backed up snapshot:
curl -X POST 'http://localhost:9200/_snapshot/backup/snapshot-201707281255/_restore'
To monitor the snapshot restore process:
curl -X GET 'http://localhost:9200/_recovery/'
Back up the Neo4j database
The Neo4j official documentation includes sections with explanations of backup concepts and procedures:
Configure Neo4j to enable backups
Back up the Neo4j database
Use neo4j-backup, the Neo4j backup tool (shipped with the Neo4j Enterprise edition only).
Neo4j Community does not include a backup tool.
The following examples provide simple suggestions to manually start a backup operation.
Before you start backing up data, stop all backend services, including Neo4j.
To stop systemd-managed platform services through the command line:
systemctl stop eclecticiq-platform-backend-services
Make sure Neo4j has stopped:
systemctl status neo4j
Once Neo4j is stopped, browse to a working directory with enough free space to store the Neo4j graph.db database backup copy.
Avoid using /tmp.
For example, create the backup in /media/neo4j:
tar -cvf graph.db-backup-${backup_date}.tar /media/neo4j/databases/graph.db/
The Neo4j data location is defined in the /etc/eclecticiq-neo4j/neo4j.conf configuration file:
# Name of the active database to mount
dbms.active_database=graph.db
# Path to the data dir containing graph.db
dbms.directories.data=${data_dir}
# ex.: neo4j/data
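As a rough sketch of the manual procedure above, the following helper refuses to run while Neo4j is active and then archives the graph.db directory under a date-stamped name. The function name backup_graph_db and the SYSTEMCTL override variable are assumptions made for this example, and the paths in the usage note are the example paths from this section.

```shell
# Command used to query the service state; override it if needed.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Hypothetical helper: archive a stopped graph.db directory.
# Usage: backup_graph_db <graph_db_dir> <target_dir>
backup_graph_db() {
    local graph_dir target_dir archive
    graph_dir=$1
    target_dir=$2
    # Refuse to archive a live database: the copy could be inconsistent.
    if "$SYSTEMCTL" is-active --quiet neo4j; then
        echo "neo4j is still running; stop it first" >&2
        return 1
    fi
    archive="$target_dir/graph.db-backup-$(date +%Y%m%d).tar"
    # -C changes to the parent directory, so the archive holds a relative
    # graph.db/ tree instead of absolute paths.
    tar -cf "$archive" -C "$(dirname "$graph_dir")" "$(basename "$graph_dir")" || return 1
    echo "$archive"
}
```

For example, backup_graph_db /media/neo4j/databases/graph.db/ /media/neo4j writes the archive to /media/neo4j and prints its path.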
Restart Neo4j
Start all backend services, including Neo4j:
To start systemd-managed platform services through the command line:
systemctl start eclecticiq-platform-backend-services
This command starts the platform services and the platform instance.
Make sure that Neo4j is running:
systemctl status neo4j
Data recovery
To restore backed up data, follow the standard recommendations and procedures for PostgreSQL, Elasticsearch, and Neo4j.
Reindex Elasticsearch
Before you start (re)indexing or migrating Elasticsearch, do the following:
Make sure that ingestion, indexing, and core platform processes are not running.
If they are running, stop them:
systemctl stop eclecticiq-platform-backend-services
Specify the platform settings environment variable by exporting it.
Run eiq-platform search reindex to index or to reindex the Elasticsearch database.
search reindex takes one argument: index-name.
Its value is the name of an existing Elasticsearch index.
Default Elasticsearch indices for the platform:
audit
documents
draft-entities*
extracts*
logstash*
statsite* (This index works with both StatsD and Statsite)
stix*
search reindex copies data from the PostgreSQL database to the Elasticsearch database, which is then indexed.
sudo -i -u eclecticiq EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py /opt/eclecticiq-platform-backend/bin/eiq-platform search reindex --index=${name_of_the_index}
Migrate Elasticsearch indices
To make sure you are applying the latest Elasticsearch schema, migrate Elasticsearch indices.
The migration process is idempotent. It sets up and builds the required/specified indices with aliases and mappings, and it updates the index mapping templates, if necessary.
Make sure that ingestion, indexing, and core platform processes are not running. If they are running, stop them.
To stop systemd-managed platform services through the command line:
systemctl stop eclecticiq-platform-backend-services
Specify the platform settings environment variable by exporting it.
Run eiq-platform search upgrade to migrate Elasticsearch indices.
The command runs in the background.
In case of an SSH disconnection, the process should keep running normally.
This upgrade action is idempotent:
First, it tries to update the Elasticsearch index mappings in-place.
If it is not possible, it proceeds to reindex the existing Elasticsearch indices.
sudo -i -u eclecticiq EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py /opt/eclecticiq-platform-backend/bin/eiq-platform search upgrade
Index migration log messages, if any, are printed to the terminal.
If no messages are printed to the terminal, the process completed successfully.
Check for failed packages
During a data restore operation, you may wish to check if the scenario that created the need for a data backup restore also caused some packages to be partially or incorrectly ingested into the platform.
The blobs associated with problematic packages are not marked as successful.
To retrieve a list with these packages, execute the following SQL query against the PostgreSQL database:
SELECT id, processing_status
FROM blob
WHERE processing_status NOT IN ('success', 'pending');
The query returns the ID and the processing status of all packages whose status is neither success nor pending.
Examine the response to evaluate whether you want to try ingesting the packages again.
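To run the query above from the command line, you could wrap it in a psql call. This is a sketch under stated assumptions: the function name list_failed_packages and the PSQL override variable are hypothetical, the database name is passed as an argument because it depends on your installation, and the postgres user is the one used in the restore example earlier in this document.

```shell
# psql client to use; override it if it is not on the PATH.
PSQL=${PSQL:-psql}

# Hypothetical helper: print one "id,processing_status" line per package
# whose blob is neither successful nor pending.
# Usage: list_failed_packages <database_name>
list_failed_packages() {
    local database
    database=$1
    # -A: unaligned output, -t: rows only, -F: field separator.
    "$PSQL" -U postgres -d "$database" -A -t -F ',' -c \
        "SELECT id, processing_status FROM blob WHERE processing_status NOT IN ('success', 'pending');"
}
```

The comma-separated output can be redirected to a file and reviewed before you decide which packages to ingest again.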