Check ingestion performance

To identify performance issues as part of a troubleshooting process, Intelligence Center administrators can inspect ingestion metrics to understand how long background ingestion tasks wait before execution, how long they take to execute, and whether they complete successfully.

About ingestion performance metrics

As part of their monitoring and maintenance tasks, Intelligence Center administrators check ingestion performance to retrieve information that can help them answer questions such as:

  • Is new data coming in?

  • How fast is the Intelligence Center ingesting data?

  • How long does it take to process all queues and to complete ingestion?

  • Are incoming feeds working as expected, or are any slow feeds creating potential bottlenecks?

  • Is it advisable to increase the number of concurrent ingestion workers to boost data throughput and to reduce ingest time?

  • Are system resources coping with the workload, or is it advisable to scale the system up or out?

Ingestion metrics provide data that can help answer questions such as these, and help you make informed decisions about which course of action to pursue.

In the Intelligence Center, you can inspect these metrics in Kibana, where you can submit search queries to the statsite* Elasticsearch indices.

To measure ingestion performance, you can query the statsite* indices with specific metrics:

Search for this metric…

…to answer this question

run_time

How long does it take to execute tasks, based on:

  • Task name

  • Incoming feed

  • Incoming feed priority

wait_time

How long are tasks pending, based on:

  • Task name

  • Incoming feed

  • Incoming feed priority

batch_size

What is the size of batches for tasks that support batching, based on:

  • Task name

  • Incoming feed

  • Incoming feed priority

enqueued

How many packages does a queue hold, based on:

  • Task name

  • Incoming feed

  • Incoming feed priority

deduplicated

How many times are tasks deduplicated, based on:

  • Task name

  • Incoming feed

  • Incoming feed priority

failure

How many tasks failed or were retried over time, based on:

  • Task name

  • Incoming feed

  • Incoming feed priority

success

How many tasks completed successfully or were retried over time, based on:

  • Task name

  • Incoming feed

  • Incoming feed priority

View Statsite logs in Kibana

To view statsite* index logs, access Kibana.

Access Kibana

To access Kibana:

  • In the web browser address bar, enter a URL in the following format:

    https://${platform_host_name}/private/kibana/app/kibana#

    Keep the trailing # character.

    Example: https://eclecticiq.platform.org/private/kibana/app/kibana#
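The URL format above can be sketched as a small helper. This is a minimal illustration, not part of the product: the function name kibana_url is hypothetical, and the host name is the one used in the example above.

```python
def kibana_url(platform_host_name: str) -> str:
    """Return the Kibana URL for a given Intelligence Center host name."""
    # Tolerate surrounding whitespace and a trailing slash in the input.
    host = platform_host_name.strip().rstrip("/")
    # The documented format ends with '#', so the helper always appends it.
    return f"https://{host}/private/kibana/app/kibana#"


print(kibana_url("eclecticiq.platform.org"))
```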

Select the statsite* index

  1. In Kibana select Discover.

  2. In the Discover view, select the statsite* index.

    It collects metrics about ingested and received packages, invalid or malformed lines in the ingested packages, as well as ingestion speed and performance.

    You can search for specific subsets by entering key/value pairs in the search input field.

    Example:

    grp:packets_received; grp:bad_lines_seen

    images/download/attachments/82474640/statsite_index_kibana.png

  3. To adjust the time interval, click the clock icon in the top-right corner, and choose an appropriate time range for the search.

Search the statsite* metrics

In Kibana, ingestion-related Statsite metrics live in the ingestion namespace.

Within this namespace, you can narrow a search with filters such as group (grp), target (tgt), and action (act). Together they identify a specific metric (the action) that measures a task (the target), which controls ingestion for an incoming feed with a designated priority level (the group).

Search filters

Description

ns:"ingestion"

Filters the scope.

The ingestion namespace groups logs and metrics related to Intelligence Center ingestion.

grp:"f${feed_id}-p${priority_level}"

Filters functionality areas inside the namespace.

Inside ingestion, this filter groups incoming feeds and their designated priority level.

tgt:"${task name}"

Filters tasks, workers, or services responsible for the functionality areas specified in the grp filter.

For example, tasks that ingest and process packages, process observables and enrichments, and finalize the ingestion process.

act:"${metric name}"

Filters the type of metrics used to measure an outcome.

Available act metrics types:

  • batch_size

  • blob_bytes

  • create_entity_time

  • enqueued

  • entities_per_blob

  • graph_index_time

  • lock_acquisition_time

  • lock_held_time

  • new_eclecticiq-sighting

  • new_ttp

  • relations_per_blob

  • run_time

  • search_index_time

  • success

  • wait_time

type:"${metric type}"

Filters how metrics are expressed.

Allowed values:

  • gauge: represents the metrics as a continuous variable value.

  • counter: represents the metrics as the total item count in a 30-second time interval.

  • timer_data: represents time metrics, along with percentile, rate, and sample rate values, measured in a 30-second time interval.

    They give an indication of the time spent processing a feed based on indicators such as mean and median values, percentiles (p + integer value; for example, p95 is the 95th percentile), time to process a web request, and so on.

    Available timer_data metrics types:

    • count

    • lower

    • mean

    • median

    • p50

    • p95

    • p99

    • rate

    • sample_rate

    • stdev

    • sum

    • sum_sq

    • upper
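To make some of the timer_data fields more concrete, the following sketch computes a few of them from task durations collected in a single interval. It is an illustration only: the function name summarize_timer and the sample values are hypothetical, milliseconds are an assumed unit, and Statsite may compute or approximate some of these values differently.

```python
import statistics


def summarize_timer(samples_ms):
    """Summarize timer samples collected during one reporting interval."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    return {
        "count": len(samples_ms),                       # number of samples
        "lower": min(samples_ms),                       # smallest sample
        "upper": max(samples_ms),                       # largest sample
        "sum": sum(samples_ms),                         # total time
        "sum_sq": sum(x * x for x in samples_ms),       # sum of squares
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
    }


# Hypothetical durations of four task runs in one interval.
summary = summarize_timer([120, 80, 100, 300])
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sketch, the sum value corresponds to the total time spent in the interval, which is the kind of value you would add up when charting processing time.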

As a rule of thumb, start with a relatively loose filter, and then drill down as needed, based on the search results Kibana returns.
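The drill-down approach can also be expressed as a query string for the Kibana search input field. The sketch below assembles one from the filters described above. The helper name build_query is hypothetical, and combining clauses with AND assumes that the search field accepts Lucene query syntax. The helper does not escape quotation marks inside values.

```python
def build_query(ns="ingestion", grp=None, tgt=None, act=None, metric_type=None):
    """Build a query string from the ns, grp, tgt, act, and type filters."""
    filters = [
        ("ns", ns),
        ("grp", grp),
        ("tgt", tgt),
        ("act", act),
        ("type", metric_type),
    ]
    # Skip filters that are not set, so that the query stays as loose as needed.
    clauses = [f'{field}:"{value}"' for field, value in filters if value is not None]
    return " AND ".join(clauses)


# Start loose, then narrow the search step by step.
print(build_query())
print(build_query(act="wait_time"))
print(build_query(act="wait_time", tgt="ingest_blob_task", grp="f2-p100"))
```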

Examples

Add a filter to search the statsite* index for any ingestion metrics:

  1. Click Add a filter +.

  2. In the filter editor, select the following filter, operator, and value: ns is ingestion

  3. Click Save.

images/download/attachments/82474640/statsite_index_kibana_ingestion_ns.png

Add a filter to search the statsite* index for ingestion metrics about waiting time in ingestion queues:

  1. Click Add a filter +.

  2. In the filter editor, select the following filter, operator, and value: ns is ingestion

  3. Click Add a filter +.

  4. In the filter editor, select the following filter, operator, and value: act is wait_time

  5. Click Save.

images/download/attachments/82474640/statsite_index_kibana_ingestion_ns_act.png

Add a filter to search the statsite* index for ingestion metrics about waiting time in ingestion queues related to the task that is responsible for package ingestion:

  1. Click Add a filter +.

  2. In the filter editor, select the following filter, operator, and value: ns is ingestion

  3. Click Add a filter +.

  4. In the filter editor, select the following filter, operator, and value: act is wait_time

  5. Click Add a filter +.

  6. In the filter editor, select the following filter, operator, and value: tgt is ingest_blob_task

  7. Click Save.

images/download/attachments/82474640/statsite_index_kibana_ingestion_ns_act_tgt.png

Add a filter to search the statsite* index for ingestion metrics about waiting time in ingestion queues related to the task that is responsible for package ingestion, and a specific feed with ID 2 and priority level set to 100:

  1. Click Add a filter +.

  2. In the filter editor, select the following filter, operator, and value: ns is ingestion

  3. Click Add a filter +.

  4. In the filter editor, select the following filter, operator, and value: act is wait_time

  5. Click Add a filter +.

  6. In the filter editor, select the following filter, operator, and value: tgt is ingest_blob_task

  7. Click Add a filter +.

  8. In the filter editor, select the following filter, operator, and value: grp is f2-p100

  9. Click Save.

images/download/attachments/82474640/statsite_index_kibana_ingestion_ns_act_tgt_grp.png
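If you prefer to query Elasticsearch directly, the filter combinations in the examples above can be approximated as a request body. This is a sketch under assumptions: the function name filters_to_query is hypothetical, and using match_phrase clauses inside a bool filter is one plausible translation of the "is" operator, not necessarily the exact query Kibana generates.

```python
import json


def filters_to_query(filters):
    """Convert {field: value} filters into an Elasticsearch bool query body."""
    return {
        "query": {
            "bool": {
                # All clauses must match; filter clauses do not affect scoring.
                "filter": [
                    {"match_phrase": {field: value}}
                    for field, value in filters.items()
                ]
            }
        }
    }


# The last example above: wait time for ingest_blob_task, feed 2, priority 100.
body = filters_to_query(
    {"ns": "ingestion", "act": "wait_time", "tgt": "ingest_blob_task", "grp": "f2-p100"}
)
print(json.dumps(body, indent=2))
```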

Visualize metrics

Metrics indicators actually stored and available for lookup may vary depending on the host environment and its configuration.

Create metrics visualizations

To explore metrics currently available in your system:

  1. In Kibana select Visualize.

  2. In the Visualize view, click the plus icon to create a new visualization.

  3. In the New Visualization dialog click the Data Table visualization type.

  4. In the Choose search source view click the statsite* search index.

  5. In the New Visualization, statsite* view, Data tab, under Buckets, Select buckets type click Split Rows.

  6. From the Aggregation drop-down menu select Terms.

  7. From the Field drop-down menu select a field from the selected search index.

    For example, select ns.

  8. Click the Apply changes icon.

  9. To adjust the time interval, click the clock icon in the top-right corner, and choose an appropriate time range for the search.

The table is populated with the top 5 ns values in the statsite* search index.

By default, the table view includes:

  • The top 5 values for the selected ns namespace term.

  • The total number of metrics available for each namespace value.

To narrow down the search filter and to start drilling down to explore more specific metrics, add sub-buckets; that is, additional filters based on other JSON fields.

To list more specific metrics currently available in your system:

  1. In the current statsite* view, Data tab, under Buckets click Add sub-buckets.

  2. Under Select buckets type click Split Rows.

  3. From the Aggregation drop-down menu select Terms.

  4. From the Field drop-down menu select a field from the selected search index.

    For example, select grp.

  5. Click the Apply changes icon.

The table is populated with the top 5 grp values for each of the top 5 ns values in the statsite* search index.

By default, the table view includes:

  • The top 5 values for the selected ns and grp terms.

  • The total number of metrics available for each ns + grp value pair.
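The data table with sub-buckets corresponds roughly to nested terms aggregations in Elasticsearch. The sketch below builds such a request body. The function name and the by_ prefix for aggregation names are hypothetical, and depending on the index mapping, the field names may need a suffix such as .keyword.

```python
import json


def nested_terms_aggregation(fields, size=5):
    """Build nested terms aggregations, with the first field outermost."""
    aggs = {}
    # Build from the innermost field outwards.
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if aggs:
            level["aggs"] = aggs
        aggs = {f"by_{field}": level}
    # size 0: return only aggregation results, no individual documents.
    return {"size": 0, "aggs": aggs}


# Top 5 grp values for each of the top 5 ns values.
print(json.dumps(nested_terms_aggregation(["ns", "grp"]), indent=2))
```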

Visualize ingestion metrics

Create a dedicated visualization in Kibana to view ingestion-specific metrics.

To explore all available metrics that measure data ingestion in the Intelligence Center:

  1. In Kibana select Visualize.

  2. In the Visualize view, click the plus icon to create a new visualization.

  3. In the New Visualization dialog click the Data Table visualization type.

  4. In the Choose search source view click the statsite* search index.

  5. In the New Visualization, statsite* view, Data tab, under Buckets, Select buckets type click Split Rows.

  6. From the Aggregation drop-down menu select Terms.

  7. From the Field drop-down menu select act.

  8. In the Size input field enter 100.

  9. Click the Options tab.

  10. In the Per Page input field, set the pagination limit to 100.

  11. Click the Apply changes icon.

  12. To adjust the time interval, click the clock icon in the top-right corner, and choose an appropriate time range for the search.

  13. In the search input field enter ns:ingestion.

  14. Click Refresh or press ENTER.

The table is populated with up to 100 types of metrics that measure data ingestion performance in the Intelligence Center.

The table view includes:

  • Up to 100 names defining the metric types. Currently, there are fewer than 100 metric types measuring ingestion.

  • The total number of metrics available for each act metric type.
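A table like this one maps onto the buckets of a terms aggregation response, where each bucket holds a key and a document count. The sketch below flattens such a response into rows. The function name, the aggregation name by_act, and the sample counts are hypothetical.

```python
def buckets_to_rows(response, agg_name):
    """Return (metric name, total) rows from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical response fragment; real counts depend on your system.
sample_response = {
    "aggregations": {
        "by_act": {
            "buckets": [
                {"key": "run_time", "doc_count": 1200},
                {"key": "wait_time", "doc_count": 1180},
            ]
        }
    }
}

for name, total in buckets_to_rows(sample_response, "by_act"):
    print(f"{name:<12} {total}")
```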

Visualize processing time per incoming feed

Create a dedicated visualization in Kibana to review how much time incoming feed ingestion workers spend on ingesting feed data.

To review how much time ingestion workers spend on incoming data overall, and on specific incoming feeds:

  1. In Kibana select Visualize.

  2. In the Visualize view, click the plus icon to create a new visualization.

  3. In the New Visualization dialog click the Data Table visualization type.

  4. In the Choose search source view click the statsite* search index.

  5. In the New Visualization, statsite* view, Data tab, under Buckets, Select buckets type click X-Axis.

  6. From the Aggregation drop-down menu select Date Histogram.

  7. From the Field drop-down menu select @timestamp.

  8. From the Interval drop-down menu select the desired time interval to use as a data bucket for the axis.

    For example: Auto.

  9. In the New Visualization, statsite* view, Data tab, under Metrics click Y-Axis.

  10. From the Aggregation drop-down menu select Sum.

  11. From the Field drop-down menu select sum.

  12. Click the Apply changes icon.

  13. To adjust the time interval, click the clock icon in the top-right corner, and choose an appropriate time range for the search.

  14. In the search input field enter act:"run_time".

  15. Click Refresh or press ENTER.

The histogram area is populated with a visual representation of the time spent ingesting data – Y axis – for the time interval bucket defined for the X axis.
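The same view can be approximated with a date histogram aggregation that sums the sum field per time bucket. This is a sketch, not the query Kibana actually sends: the function and aggregation names are hypothetical, and the interval parameter applies to older Elasticsearch versions, while newer versions expect fixed_interval or calendar_interval instead.

```python
import json


def run_time_histogram(interval="1h"):
    """Build a request body that sums run time per time bucket."""
    return {
        "size": 0,  # return only aggregation results
        "query": {"query_string": {"query": 'act:"run_time"'}},
        "aggs": {
            "over_time": {
                # X axis: one bucket per time interval.
                "date_histogram": {"field": "@timestamp", "interval": interval},
                # Y axis: total of the 'sum' field in each bucket.
                "aggs": {"total_run_time": {"sum": {"field": "sum"}}},
            }
        },
    }


print(json.dumps(run_time_histogram("30m"), indent=2))
```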

You can further refine the histogram view to display the time spent ingesting data per incoming feed:

  1. In the New Visualization, statsite* view, Data tab, under Buckets click Add sub-buckets.

  2. Under Select buckets type select Split Series.

  3. From the Sub Aggregation drop-down menu select Terms.

  4. From the Field drop-down menu select grp.

  5. Set the Size value depending on the number of incoming feeds that are ingesting data concurrently, and whose processing time you want to inspect.

    For example: 20

  6. Click the Apply changes icon.

The histogram area is populated with a visual representation of the time spent ingesting data by each active incoming feed – Y axis – for the time interval bucket defined for the X axis.

  • Incoming feeds on the histogram are color-coded.

  • Feed labels include the incoming feed ID and the processing priority assigned to the feed.

    Format: f${feed ID}-p${priority value}

    Example: f42-p100 refers to an incoming feed with ID 42 and with priority 100.
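If you export or post-process these labels, you can split them back into a feed ID and a priority value. The sketch below parses the label format described above. The names FeedGroup and parse_feed_group are hypothetical.

```python
import re
from typing import NamedTuple


class FeedGroup(NamedTuple):
    feed_id: int
    priority: int


# Matches labels such as f42-p100: 'f', feed ID, '-p', priority value.
_GROUP_PATTERN = re.compile(r"^f(\d+)-p(\d+)$")


def parse_feed_group(label: str) -> FeedGroup:
    """Split a feed label into its feed ID and priority value."""
    match = _GROUP_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a feed label: {label!r}")
    return FeedGroup(feed_id=int(match.group(1)), priority=int(match.group(2)))


print(parse_feed_group("f42-p100"))
```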

Retrieve a feed ID

To retrieve an incoming feed ID:

  1. In the side navigation bar, go to Data configuration > Incoming feeds.

  2. In the Incoming feeds overview, click anywhere in the row corresponding to the incoming feed whose ID you want to retrieve.

    In the web browser address bar, the URL of the active Intelligence Center view is similar to the following example:

    https://${platform_host_name}/main/configuration/incoming-feeds?detail=42

    In the URL, the detail URL parameter holds the feed ID.

    In the example, the ID value is 42.
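Reading the feed ID from such a URL can be sketched with the Python standard library. The function name feed_id_from_url is hypothetical, and the host name in the usage line is only an example.

```python
from urllib.parse import parse_qs, urlparse


def feed_id_from_url(url: str) -> int:
    """Return the feed ID held by the 'detail' URL parameter."""
    params = parse_qs(urlparse(url).query)
    values = params.get("detail")
    if not values:
        raise ValueError("the URL has no 'detail' parameter")
    return int(values[0])


url = "https://eclecticiq.platform.org/main/configuration/incoming-feeds?detail=42"
print(feed_id_from_url(url))
```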

About incoming feed priority

Incoming feeds can be assigned a priority level to allocate more or fewer Intelligence Center resources during the ingestion process.

Feed priority is an integer within a range:

  • The lowest value is zero.

  • The highest value is set in /etc/eclecticiq/platform_settings.py:

    INGESTION_FEED_PRIORITY = 100

    By default, the ingestion feed priority value is set to 100.
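The priority range can be sketched as a simple check. This is an illustration only: the function name is_valid_priority is hypothetical, and the constant below merely mirrors the default value of the setting shown above. It does not read the actual settings file.

```python
# Mirrors the documented default of the setting; adjust it to match your system.
INGESTION_FEED_PRIORITY = 100


def is_valid_priority(priority, highest=INGESTION_FEED_PRIORITY):
    """Return True if priority is an integer between zero and the highest value."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        return False
    return 0 <= priority <= highest


print(is_valid_priority(100))
print(is_valid_priority(101))
```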