Address graph ingestion issues

Neo4j is disabled by default from IC 2.12.0 onward. Graphs use a new graph API based on PostgreSQL and Elasticsearch instead. For more information, see Update the settings.


During ingestion, incoming packages that contain key values larger than 4036 bytes fail to ingest, because native-btree-1.0, the default index provider in Neo4j 3.5.x, cannot index keys larger than 4036 bytes.

Issue

The platform may fail to ingest packages with very long key values. This can happen, for example, with long URI strings that contain concatenated URI parameters such as tokens or queries.
The error traceback contains the following error message:

Exception: Ingestion Exception:
Property value size: ${integer} of ${key value} is too large to index into this particular index.
Please see index documentation for limitations.

The error message originates from the Neo4j graph database, and the platform includes it in the traceback as is.
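To confirm that a failed ingestion was caused by this limit, you can extract the size reported in the error message and compare it with 4036 bytes. The following POSIX shell sketch reads traceback text on standard input. The function name `oversized_value_sizes` is hypothetical, and where you copy the traceback from depends on your deployment.

```shell
# Hypothetical helper (not part of the platform): print the value size, in bytes,
# reported by each "too large to index" error in traceback text read from stdin.
oversized_value_sizes() {
  # Keep only the error lines, then capture the integer after "Property value size:".
  grep 'too large to index' |
    sed -n 's/.*Property value size: \([0-9][0-9]*\) of .*/\1/p'
}
```

For example, `oversized_value_sizes < traceback.txt`, where `traceback.txt` is a hypothetical file containing the saved traceback, prints one size per matching error. Sizes above 4036 point to this issue.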

Impact

  • EclecticIQ Platform 2.7.x and earlier with Neo4j 3.5.x.

  • EclecticIQ Platform 2.7.x and earlier with Neo4j 3.5.x, before upgrading to EclecticIQ Platform 2.8.x.

The issue is solved in EclecticIQ Platform 2.8.x and later.

Cause

The default index provider for Neo4j 3.5.x is native-btree-1.0.
The native B+Tree index has a key size limit of 4036 bytes.

Mitigation

To ingest packages with key values larger than 4036 bytes, you need to enable support for larger key sizes by setting a different index provider for Neo4j 3.5.x.
An alternative index provider is lucene-1.0, whose key size limit is 32766 bytes.
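To check which provider can hold a given value, you can compare its byte length with the two limits above. The sketch below is hypothetical, and the raw byte length is only an approximation: the index key that Neo4j builds may include some encoding overhead, so treat values close to a limit with caution. Note that the limits are in bytes, not characters, so non-ASCII characters count as more than one byte in UTF-8.

```shell
# Key size limits quoted in this article, in bytes.
BTREE_LIMIT=4036    # native-btree-1.0
LUCENE_LIMIT=32766  # lucene-1.0

# Hypothetical helper: report which Neo4j 3.5.x index provider can index a value,
# based on its raw byte length (an approximation of the actual index key size).
key_size_status() {
  # wc -c counts bytes, not characters; $(( )) drops any padding wc adds.
  size=$(( $(printf '%s' "$1" | wc -c) ))
  if [ "$size" -le "$BTREE_LIMIT" ]; then
    echo "$size bytes: fits native-btree-1.0"
  elif [ "$size" -le "$LUCENE_LIMIT" ]; then
    echo "$size bytes: requires lucene-1.0"
  else
    echo "$size bytes: too large for both providers"
  fi
}
```

For example, `key_size_status "$(printf 'https://example.com/?token=%5000s' '' | tr ' ' 'a')"` prints `5027 bytes: requires lucene-1.0`.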

EclecticIQ Platform 2.8.x and later solve the issue by setting lucene-1.0 as the default index provider for Neo4j 3.5.x.
To change native-btree-1.0 to lucene-1.0 as the default index provider for Neo4j 3.5.x in EclecticIQ Platform 2.7.x and earlier, you need to carry out a manual procedure to:

  1. Stop backend services.

  2. Change the index provider in Neo4j.

  3. Reindex the graph database with the new index provider.

  4. Start backend services.

  5. Reingest failed packages, so that the new index provider can correctly process them.

How to set a different index provider for Neo4j

The procedure to change native-btree-1.0 to lucene-1.0 as the default index provider for Neo4j 3.5.x applies to EclecticIQ Platform 2.7.x and earlier.

Get root-level access

To complete the procedure, you must have root-level access on the server hosting the platform and on the server hosting Neo4j.

  • Obtain root-level access by running sudo -i:

    # Root-access login shell
    sudo -i


    To start the login shell, or run a command, as a specific user instead of root, add -u:

    # Start a login shell as root (the default target user)
    sudo -i
     
    # Start a login shell as a different user
    sudo -i -u ${user_name}
     
    # Run a command in a login shell as a different user
    sudo -i -u ${user_name} ${command} ${options}

Stop backend services and retrieve the Neo4j credentials

On the server hosting the platform:

  1. Stop all EclecticIQ backend services:

    systemctl stop $(systemctl list-units 'eclecticiq*' --plain --no-legend | awk '{print $1}')

  2. Retrieve the user name and password that the platform uses to connect to the Neo4j database.
    This information is stored in /etc/eclecticiq/platform_settings.py on the server hosting the platform:

    grep 'NEO4J_URL\|NEO4J_USER\|NEO4J_PASSWORD' /etc/eclecticiq/platform_settings.py
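To capture the credentials in shell variables instead of copying them by hand, a sketch such as the following can extract a quoted value from the settings file. It assumes that each setting is a single-line Python assignment with a quoted string value, for example `NEO4J_USER = "neo4j"`; check the actual format of your platform_settings.py before relying on it. The function name `read_setting` is hypothetical.

```shell
# Hypothetical helper: print the quoted string value of a setting such as
#   NEO4J_USER = "neo4j"
# from a Python settings file. Assumes one assignment per line, a single- or
# double-quoted value, and no trailing comment.
read_setting() {
  # $1: setting name, $2: settings file; print only the first match.
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*[\"']\(.*\)[\"'][[:space:]]*\$/\1/p" "$2" |
    head -n 1
}
```

For example, `neo4j_username=$(read_setting NEO4J_USER /etc/eclecticiq/platform_settings.py)` stores the user name for the Cypher Shell step, if you run that step in the same shell session.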

Back up and edit the Neo4j configuration

On the server hosting Neo4j:

  1. Back up the current /etc/eclecticiq-neo4j/neo4j.conf Neo4j configuration file:

    cd /etc/eclecticiq-neo4j/
    cp -p neo4j.conf neo4j.conf.orig

  2. Edit /etc/eclecticiq-neo4j/neo4j.conf to enable the Bolt network protocol and Neo4j Cypher Shell CLI:

    # Open Neo4j config file in Vim:
    vi /etc/eclecticiq-neo4j/neo4j.conf
     
    # In neo4j.conf enable Bolt and Cypher Shell:
    dbms.connector.bolt.enabled=True
    dbms.shell.enabled=True
     
    # Save and exit
    :wq!

  3. Restart the Neo4j service:

    systemctl restart neo4j
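As a non-interactive alternative to editing neo4j.conf in Vim, you can set the options from the shell after taking the backup, and then restart Neo4j as described above. The following sketch is an assumption, not an EclecticIQ-provided tool: the function name `set_neo4j_option` is hypothetical, and it rewrites any existing line for the key, including a commented-out one, or appends the key if it is missing.

```shell
# Hypothetical helper: set KEY=VALUE in a Neo4j configuration file.
# An existing line for KEY, commented out or not, is rewritten in place;
# otherwise the setting is appended. VALUE must not contain |, & or backslashes.
set_neo4j_option() {
  key="$1"; value="$2"; file="$3"
  # Escape the dots in the key so that they match literally in the regex.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[[:space:]]*#?[[:space:]]*${key_re}[[:space:]]*=" "$file"; then
    tmp=$(mktemp)
    sed -E "s|^[[:space:]]*#?[[:space:]]*${key_re}[[:space:]]*=.*|${key}=${value}|" "$file" > "$tmp"
    # Copy back with cat so that the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Run it once per key, for example `set_neo4j_option dbms.connector.bolt.enabled True /etc/eclecticiq-neo4j/neo4j.conf`, and then the same for `dbms.shell.enabled`. Passing `False` instead disables the options again at the end of the procedure.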

Change Neo4j index provider

On the server hosting Neo4j:

  1. Open a Cypher Shell instance, and connect to the Neo4j database:

    /bin/cypher-shell -u ${neo4j_username} -p ${neo4j_password}

  2. In Cypher Shell, run the following commands to drop the :Extract(value) index and the :Extract(id) uniqueness constraint, and to re-create both with lucene-1.0 as the index provider. Re-creating them reindexes the graph database with the new index provider:

    DROP INDEX ON :Extract(value);
    CALL db.createIndex(":Extract(value)", "lucene-1.0");
    DROP CONSTRAINT ON ( extract:Extract ) ASSERT extract.id IS UNIQUE;
    CALL db.createUniquePropertyConstraint(":Extract(id)", "lucene-1.0");

    As a rule of thumb, reindexing the graph database using the new index provider can take about 3-5 minutes per million entities.

  3. After reindexing completes, list the indexes and check that :Extract(value) and :Extract(id) now use the lucene 1.0 provider:

    CALL db.indexes();

    Response example:

    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | description | label | properties | state | type | provider | failureMessage |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "INDEX ON :Extract(kind)" | "Extract" | ["kind"] | "ONLINE" | "node_label_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :Extract(platform_id)" | "Extract" | ["platform_id"] | "ONLINE" | "node_label_property" | {version: "2.0", key: "lucene+native"} | "" |
    | "INDEX ON :Extract(value)" | "Extract" | ["value"] | "ONLINE" | "node_label_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :IntelEntity(meta.source)" | "IntelEntity" | ["meta.source"] | "ONLINE" | "node_label_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :IntelEntity(meta.stix_id)" | "IntelEntity" | ["meta.stix_id"] | "ONLINE" | "node_label_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :IntelEntity(sources)" | "IntelEntity" | ["sources"] | "ONLINE" | "node_label_property" | {version: "2.0", key: "lucene+native"} | "" |
    | "INDEX ON :IntelEntity(stix_id)" | "IntelEntity" | ["stix_id"] | "ONLINE" | "node_label_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :IntelEntity(subtype)" | "IntelEntity" | ["subtype"] | "ONLINE" | "node_label_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :IntelEntity(type)" | "IntelEntity" | ["type"] | "ONLINE" | "node_label_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :Extract(id)" | "Extract" | ["id"] | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :Extract(uid)" | "Extract" | ["uid"] | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :IntelEntity(id)" | "IntelEntity" | ["id"] | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"} | "" |
    | "INDEX ON :Migration(name)" | "Migration" | ["name"] | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"} | "" |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
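To check the result without reading the whole table, you can filter the `CALL db.indexes();` output for a specific index. The sketch below is hypothetical and assumes that the output shows the provider as a map such as `{version: "1.0", key: "lucene"}`, as in the response example. It prints the provider in the key-version form used by `db.createIndex`, so a successful change shows `lucene-1.0`.

```shell
# Hypothetical helper: read CALL db.indexes() output on stdin and print the
# provider of the index whose description is "INDEX ON $1", as key-version.
index_provider() {
  # Select the row for the index, then rebuild "lucene-1.0" from the
  # {version: "1.0", key: "lucene"} provider map.
  grep -F "\"INDEX ON $1\"" |
    sed -n 's/.*{version: "\([^"]*\)", key: "\([^"]*\)"}.*/\2-\1/p'
}
```

For example, `/bin/cypher-shell -u ${neo4j_username} -p ${neo4j_password} --format verbose 'CALL db.indexes();' | index_provider ':Extract(value)'` should print `lucene-1.0`. The output layout can vary between Cypher Shell versions, so compare with the table above if the function prints nothing.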
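To plan the maintenance window, you can turn the rule of thumb of 3-5 minutes per million entities into a rough time range. The sketch below is only arithmetic on that rule of thumb. The function name is hypothetical, and it assumes that the node count, which Cypher Shell returns for `MATCH (n) RETURN count(n);`, is a reasonable proxy for the number of entities.

```shell
# Hypothetical helper: estimate reindexing time from the rule of thumb of
# 3-5 minutes per million entities, using integer arithmetic rounded up.
estimate_reindex_minutes() {
  entities="$1"
  low=$(( (entities * 3 + 999999) / 1000000 ))
  high=$(( (entities * 5 + 999999) / 1000000 ))
  echo "${low}-${high} minutes"
}
```

For example, `estimate_reindex_minutes 12500000` prints `38-63 minutes`. Rounding up keeps the estimate from understating the downtime.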

Disable Bolt and Cypher Shell

On the server hosting Neo4j:

  1. After you finish changing the index provider, edit /etc/eclecticiq-neo4j/neo4j.conf to disable the Bolt network protocol and Neo4j Cypher Shell CLI:

    # Open Neo4j config file in Vim:
    vi /etc/eclecticiq-neo4j/neo4j.conf
     
    # In neo4j.conf disable Bolt and Cypher Shell:
    dbms.connector.bolt.enabled=False
    dbms.shell.enabled=False
     
    # Save and exit
    :wq!

  2. Restart the Neo4j service:

    systemctl restart neo4j
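To confirm that Bolt and Cypher Shell are disabled again, you can print the active value of each option. The following sketch is hypothetical: it ignores commented-out lines and prints the value of every remaining occurrence of the key, one per line, so more than one line of output means the key is set more than once and should be cleaned up.

```shell
# Hypothetical helper: print the value of every uncommented KEY=VALUE line
# for the given key in a Neo4j configuration file.
neo4j_option_value() {
  # Escape the dots in the key so that they match literally in the regex.
  key_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -E "^[[:space:]]*${key_re}[[:space:]]*=" "$2" |
    sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//'
}
```

For example, `neo4j_option_value dbms.connector.bolt.enabled /etc/eclecticiq-neo4j/neo4j.conf` should print `False` after this step.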

Start backend services and reingest failed packages

On the server hosting the platform:

  1. Start the platform backend services:

    systemctl start eclecticiq-platform-backend-services

  2. Sign in to the platform GUI, open the incoming feeds whose packages failed to ingest, and reingest the failed packages using the options available in the GUI.