Monitor system health

About monitoring system health

System administrators can use tools such as Celery and Supervisor to monitor Intelligence Center tasks, to check day-to-day operations, and to investigate issues.

Monitor the Intelligence Center to ensure normal operation, to research and identify the root cause of an issue, and to inspect the status of key Intelligence Center processes such as incoming and outgoing feeds, enrichers, ingestion queues, and tasks.

In this topic, monitoring covers on/off status only: the tasks and commands described here verify whether a task, a process, or a component is running.

Metrics and other types of measurements are outside the scope of this topic.

System administrators and DevOps engineers can run quick checks to inspect Intelligence Center operation, identify issues, and review errors, so that they can address them promptly.

About root-level access

To run the commands in this topic from the command line or a terminal, you may need root-level access.

  • Obtain root-level access by running sudo -i:

    # Root-access login shell
    sudo -i
  • To access resources as a user other than the currently active one, append -u:

    # Open a root login shell (same as the previous example)
    sudo -i
     
    # Open a login shell as a different user
    sudo -i -u ${user_name}
     
    # Run a command as a different user
    sudo -i -u ${user_name} ${command} ${options}
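If you script the checks in this topic, it can help to stop early when the script lacks root-level access, instead of failing partway through. The following is a minimal sketch for a POSIX-compatible shell; `require_root` is a hypothetical helper name, not part of the Intelligence Center:

```shell
# require_root (hypothetical helper): succeed only when the effective user is root.
# "id -u" prints the numeric effective user ID; root is always UID 0.
require_root() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "This check requires root-level access. Run 'sudo -i' first." >&2
    return 1
  fi
}

# Example: stop a monitoring script early if it lacks root-level access.
#   require_root || exit 1
```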

Tools

Celery

The task runner.

It manages task execution and scheduling.

Redis

The message broker.

It handles background task processing by managing message queues based on the pub-sub pattern.

systemd

The initialization system to bootstrap processes and start services.

The process manager to start and stop processes.

Core components

Component                 Address               Port
elasticsearch             localhost             9200
kibana                    localhost             5601
logstash                  localhost             6755
neo4j                     localhost             7474, 7473
eclecticiq-neo4jbatcher   127.0.0.1             4008
nginx                     ${web_server_name}    80, 443
opentaxii                 127.0.0.1             9000
platform                  localhost             8008
postfix                   ${postfix_host_name}  25, 587
postgresql                localhost             5432
redis                     localhost             6379
statsite                  127.0.0.1             8125
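To check the core components table quickly, you can test whether each component accepts TCP connections on its listed address and port. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; all helper names are hypothetical. An open port only shows that something is listening, not that the component works correctly. nginx and postfix are left out because their host names depend on your installation, and statsite is left out because statsd-style collectors often receive metrics over UDP, which a TCP connection test does not cover.

```shell
# port_open (hypothetical helper): succeed if a TCP connection to host $1, port $2
# opens within 2 seconds. Relies on bash's /dev/tcp redirection and coreutils' timeout.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" </dev/null 2>/dev/null
}

# check_endpoints (hypothetical helper): read "component host port" lines from
# stdin and print one OK/DOWN line per entry. Blank lines are skipped.
check_endpoints() {
  local name host port
  while read -r name host port; do
    [ -z "$name" ] && continue
    if port_open "$host" "$port"; then
      echo "OK    ${name} ${host}:${port}"
    else
      echo "DOWN  ${name} ${host}:${port}"
    fi
  done
}

# check_core_components (hypothetical helper): check the local endpoints listed
# in the core components table (nginx, postfix, and statsite omitted, see above).
check_core_components() {
  check_endpoints <<'EOF'
elasticsearch localhost 9200
kibana localhost 5601
logstash localhost 6755
neo4j localhost 7474
neo4j localhost 7473
eclecticiq-neo4jbatcher 127.0.0.1 4008
opentaxii 127.0.0.1 9000
platform localhost 8008
postgresql localhost 5432
redis localhost 6379
EOF
}
```

Run check_core_components on the Intelligence Center host; any DOWN line points to a component worth inspecting with the systemd commands in this topic.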

Monitoring

Intelligence Center monitoring covers two main areas:

Services

platform-api

The web application implementing the Intelligence Center API and the API endpoints.

The endpoints expose services that can be consumed by making API calls and by passing arguments.

nginx

Web server.

postfix

Email server.

opentaxii

TAXII server responsible for STIX data transport.

postgresql-11

PostgreSQL (main database).

redis

Redis (message broker).

elasticsearch

Elasticsearch search and indexing database.

kibana

Generates dashboard graphs.

logstash

Log and data aggregation, data pipeline and funneling.

neo4j

Neo4j graph database.

statsite

Gathers stats such as counters, timers, and discovered entities, and sends aggregates to Kibana through Elasticsearch.

Processes

graph-ingestion

Data funnel to the Neo4j graph database.

Handles data updates for Neo4j.

intel-ingestion

Intel ingestion through feeds and enrichers.

Consumes incoming data and saves it to PostgreSQL, Neo4j, and Elasticsearch.

The Intelligence Center runs one intel-ingestion process per processor core.

Running processes are numbered sequentially, starting from 0.

For example, an Intelligence Center instance running on a quad-core machine normally runs 4 such processes, numbered from intel-ingestion:0 to intel-ingestion:3.

eclecticiq-neo4jbatcher

Neo4j graph database batch processing application.

It runs on the same server that hosts the Neo4j database.

It prepares data for ingestion into the Neo4j database.

search-ingestion

Search indexer.

Handles Elasticsearch data updates.

task

Celery-managed tasks such as enrichers, feed integrations, incoming feed data providers, and utilities.

Feeds

Incoming and outgoing feeds.

Enrichers

Enricher tasks.

Celery tasks

Other miscellaneous Celery tasks.
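Because the Intelligence Center runs one intel-ingestion process per processor core, numbered from 0, you can derive the process names to expect on a given host. The sketch below prints them, assuming the core count equals the output of nproc (which reports the processing units available to the current process and may count hyperthreads); the helper name is hypothetical. Compare the result against the intel-ingestion processes that your process manager reports as running.

```shell
# expected_ingestion_workers (hypothetical helper): print the intel-ingestion process
# names expected on this host, one per processor core, numbered from 0.
# Pass a core count as $1 to override the value reported by nproc.
expected_ingestion_workers() {
  local cores="${1:-$(nproc)}"
  local i=0
  while [ "$i" -lt "$cores" ]; do
    echo "intel-ingestion:${i}"
    i=$((i + 1))
  done
}

# Example: on a quad-core machine, print intel-ingestion:0 to intel-ingestion:3.
#   expected_ingestion_workers 4
```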

Monitor components with systemd

Tool: systemd helps you inspect Intelligence Center components to verify whether their services are running normally.

Use it to check the following components:

elasticsearch
  Description: Elasticsearch search and indexing database.
  If it is not running: Data search and indexing are not available.

kibana
  Description: Generates dashboard graphs.
  If it is not running: The dashboard does not load correctly, and the /kibana/ API endpoint returns an HTTP 502 error.

logstash
  Description: Log and data aggregation, data pipeline and funneling.
  If it is not running: Data is not aggregated, deduplicated, or normalized.

neo4j
  Description: Neo4j graph database.
  If it is not running: Graph data queries stop working, and it is not possible to poll the graph database.

eclecticiq-neo4jbatcher
  Description: Neo4j graph database batch processing application. It runs on the same server that hosts the Neo4j database, and it pre-processes and queues data for ingestion into the Neo4j database.
  If it is not running: Graph ingestion may slow down and eventually hang.

nginx
  Description: Web server.
  If it is not running: There is no network connectivity to the Intelligence Center, and requests to the web server return an HTTP 500 error.

opentaxii
  Description: TAXII server responsible for STIX data transport.
  If it is not running: It is not possible to send or receive data through the TAXII transport protocol.

postfix
  Description: Email server.
  If it is not running: Automatic email notifications are not sent.

postgresql-11
  Description: PostgreSQL (main database).
  If it is not running: It is not possible to access Intelligence Center data.

redis
  Description: Redis (message broker).
  If it is not running: Tasks and processes may hang or behave unexpectedly.

statsite
  Description: Gathers stats such as counters, timers, and discovered entities, and sends aggregates to Kibana through Elasticsearch.
  If it is not running: No metrics are collected about discovered entities, feed updates, and similar events.

systemctl is systemd’s command line interface utility.

systemctl commands have the following format, where ${options} is optional:

systemctl ${options} ${command} ${component_name}

For a complete list of supported commands and options, see the systemd documentation.

To list all the units that systemd currently has loaded, including running services, run the following command(s):

systemctl

The response is displayed in the following format:

UNIT             LOAD              ACTIVE            SUB                JOB DESCRIPTION
${service_name}  ${loaded_or_not}  ${active_or_not}  ${running_or_not}  ${description_of_the_job}

To verify if Nginx is running, run the following command(s):

systemctl status -l nginx.service

To verify if PostgreSQL is running, run the following command(s):

systemctl status -l postgresql-11.service

To verify if Redis is running, run the following command(s):

systemctl status -l redis.service

To verify if Logstash is running, run the following command(s):

systemctl status -l logstash.service

To verify if Elasticsearch is running, run the following command(s):

systemctl status -l elasticsearch.service

To retrieve status information about all these services at once, run the following command(s):

systemctl status -l nginx.service postgresql-11.service redis.service logstash.service elasticsearch.service

To list only the systemd units whose names contain a specific search string, run the following command(s):

systemctl | grep "${search_string}"
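systemctl status is verbose when you only need on/off status. `systemctl is-active --quiet UNIT` prints nothing and exits with status 0 only when the unit is active, which makes it convenient in scripts. The following sketch checks several components in one pass; the helper name is hypothetical, and the unit names other than the five used above (nginx, postgresql-11, redis, logstash, elasticsearch) are assumptions to verify with systemctl | grep on your installation.

```shell
# check_units (hypothetical helper): print the on/off status of each systemd service.
# "systemctl is-active --quiet UNIT" prints nothing and exits with 0 only if UNIT is active.
# Returns 1 if at least one unit is not running, so scripts can react to failures.
check_units() {
  local unit failed=0
  for unit in "$@"; do
    if systemctl is-active --quiet "${unit}.service"; then
      echo "running      ${unit}"
    else
      echo "NOT running  ${unit}"
      failed=1
    fi
  done
  return "$failed"
}

# Example: check the components from the systemd components list.
#   check_units elasticsearch kibana logstash neo4j eclecticiq-neo4jbatcher \
#     nginx opentaxii postfix postgresql-11 redis statsite
```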

Monitor processes

Monitor ingestion queues with Redis

Tool: Redis acts as a message broker for Celery-managed tasks.

redis-cli is Redis’s command line interface utility:

# Launch redis-cli
redis-cli
 
> ${command} ${item_name}

For a complete list of supported commands and options, see the redis-cli command reference.

Within the Intelligence Center, Redis manages task queues.

For monitoring, the command you are most likely to need is llen, which returns the length of a queue.

To inspect the Intelligence Center data ingestion queue length:

# Launch redis-cli
redis-cli
 
> llen "queue:ingestion:inbound"

To inspect the graph database queue length:

> llen "queue:graph:inbound"

To inspect the Elasticsearch data update queue length:

> llen "queue:search:inbound"
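redis-cli also accepts a command and its arguments directly on the command line, which is handy for scripts. The sketch below prints the length of the three queues listed above in one go; the helper name is hypothetical, and the sketch assumes Redis listens on the default localhost:6379 shown in the core components table. LLEN returns 0 for a key that does not exist, so a length of 0 means the queue is either empty or absent.

```shell
# queue_lengths (hypothetical helper): print each Intelligence Center ingestion
# queue name followed by its current length, without an interactive session.
# --raw makes redis-cli print a bare number instead of "(integer) N".
queue_lengths() {
  local queue len
  for queue in queue:ingestion:inbound queue:graph:inbound queue:search:inbound; do
    len=$(redis-cli --raw llen "$queue")
    echo "${queue} ${len}"
  done
}

# Example:
#   queue_lengths
```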

Monitor running tasks with Celery

Tool: Celery is the task runner that manages task execution and scheduling.

To request task information from Celery, set the following environment variable:

export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py

Then run Celery commands in the same shell session.

Celery commands have the following format:

celery -A ${module_name} ${command}
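Because every Celery command in this topic repeats the same settings file, binary path, and -A module, a small shell function can keep the invocations short. The following is a sketch; eiq_celery and the EIQ_CELERY override variable are hypothetical names, and the paths are copied from the examples in this section.

```shell
# Celery binary and app module used throughout this topic. EIQ_CELERY can be set
# beforehand to point at a different binary (hypothetical override variable).
EIQ_CELERY="${EIQ_CELERY:-/opt/eclecticiq/platform/api/bin/celery}"
EIQ_CELERY_APP="eiq.platform.taskrunner.app"

# eiq_celery (hypothetical helper): run a Celery command against the Intelligence
# Center task runner, passing EIQ_PLATFORM_SETTINGS to this invocation only.
eiq_celery() {
  EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py \
    "$EIQ_CELERY" -A "$EIQ_CELERY_APP" "$@"
}

# Examples:
#   eiq_celery inspect ping
#   eiq_celery inspect active
#   eiq_celery status
```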

Ping Celery to see which workers are up and listening.

This is the easiest way to check task worker status. Every active worker replies with pong.

export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py
/opt/eclecticiq/platform/api/bin/celery -A eiq.platform.taskrunner.app inspect ping
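Since every responding worker replies with pong, a script can count the replies and compare the total with the number of workers you expect. The sketch below is a filter that reads the ping output on stdin; it assumes each reply contributes exactly one line containing pong, and the helper name is hypothetical.

```shell
# count_pong_replies (hypothetical helper): read "celery inspect ping" output on
# stdin and print the number of lines that contain "pong".
# grep -c exits with status 1 when nothing matches; "|| true" keeps that from
# being treated as an error, and grep still prints 0.
count_pong_replies() {
  grep -c 'pong' || true
}

# Example (same environment variable and paths as the command above):
#   export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py
#   /opt/eclecticiq/platform/api/bin/celery -A eiq.platform.taskrunner.app inspect ping \
#     | count_pong_replies
```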

To inspect active tasks, run the following command(s):

export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py
/opt/eclecticiq/platform/api/bin/celery -A eiq.platform.taskrunner.app inspect active

To inspect the queues that active workers consume tasks from, run the following command(s):

export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py
/opt/eclecticiq/platform/api/bin/celery -A eiq.platform.taskrunner.app inspect active_queues

To inspect scheduled tasks, run the following command(s):

export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py
/opt/eclecticiq/platform/api/bin/celery -A eiq.platform.taskrunner.app inspect scheduled

To check overall status (the list of Celery workers that are online), run the following command(s):

export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py
/opt/eclecticiq/platform/api/bin/celery -A eiq.platform.taskrunner.app status

To request task statistics (exhaustive, but it can be verbose), run the following command(s):

export EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py
/opt/eclecticiq/platform/api/bin/celery -A eiq.platform.taskrunner.app inspect stats

(For further details, see the Celery documentation on ping, workers, worker statistics, and monitoring.)
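When investigating an issue, it can be useful to capture the output of all the inspection commands above in one file at a single point in time. The sketch below runs them in sequence and appends their output, including errors and non-zero exit statuses, to a timestamped file; celery_snapshot and EIQ_CELERY are hypothetical names, and the output directory defaults to the current directory.

```shell
# celery_snapshot (hypothetical helper): run the Celery inspection commands from
# this topic and append their output to one timestamped log file.
# Usage: celery_snapshot [output_directory]   (defaults to the current directory)
celery_snapshot() {
  local celery_bin="${EIQ_CELERY:-/opt/eclecticiq/platform/api/bin/celery}"
  local out_file cmd
  out_file="${1:-.}/celery-$(date +%Y%m%dT%H%M%S).log"
  for cmd in "inspect ping" "inspect active" "inspect active_queues" \
             "inspect scheduled" "inspect stats" "status"; do
    echo "=== celery ${cmd} ===" >>"$out_file"
    # ${cmd} is left unquoted on purpose so "inspect ping" becomes two arguments.
    EIQ_PLATFORM_SETTINGS=/etc/eclecticiq/platform_settings.py \
      "$celery_bin" -A eiq.platform.taskrunner.app ${cmd} >>"$out_file" 2>&1 \
      || echo "(exited with status $?)" >>"$out_file"
  done
  # Print the file path so the caller can open or attach it.
  echo "$out_file"
}
```

For example, celery_snapshot /tmp prints the path of the new log file, which you can then review or attach to a support request.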