Address graph ingestion issues#

Caution

This document applies only to EclecticIQ Platform 2.7.x and earlier.

Tip

Neo4j is disabled by default from IC 2.12.0 onward. Graphs use a new graph API based on PostgreSQL and Elasticsearch instead. For more information, see Update platform_settings.py.

During ingestion, incoming packages containing key values exceeding 4036 bytes in size fail to be ingested because the default Neo4j 3.5.x native-btree-1.0 index provider cannot process key sizes larger than 4036 bytes.

Issue#

The platform may fail to ingest packages with very long key values. This scenario can occur with long URI strings containing concatenated URI parameters such as tokens or queries.

The error traceback contains the following error message:

Exception: Ingestion Exception:
Property value size: ${integer} of ${key value} is too large to index into this particular index.
Please see index documentation for limitations.

The error message originates from Neo4j, and it is included in the platform traceback as is.
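
To find out whether a traceback or log excerpt contains this error, you can match it against the message template above. The following is a minimal sketch: the function name has_key_size_error and the regular expression are assumptions derived from the quoted message, and the log path in the usage comment is hypothetical.

```shell
# Return success (exit code 0) if the text on stdin contains the
# Neo4j key-size error quoted above. The pattern is an assumption
# based on that message template, not an official format definition.
has_key_size_error() {
    grep -Eq 'Property value size: [0-9]+ of .* is too large to index into this particular index'
}

# Usage (hypothetical log path):
#   has_key_size_error < /path/to/ingestion.log && echo "key size limit hit"
```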

Impact#

  • EclecticIQ Platform 2.7.x and earlier with Neo4j 3.5.x.

  • EclecticIQ Platform 2.7.x and earlier with Neo4j 3.5.x, before upgrading to EclecticIQ Platform 2.8.x.

Note

The issue is solved in EclecticIQ Platform 2.8.x and later.

Cause#

The default index provider for Neo4j 3.5.x is native-btree-1.0. The native B+Tree index has a key size limit of 4036 bytes.

Mitigation#

To ingest packages with key values larger than 4036 bytes, you need to enable support for larger key sizes by setting a different index provider for Neo4j 3.5.x.

An alternative index provider is lucene-1.0, whose key size limit is 32766 bytes.
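
The two limits can be compared against the byte size of a given key value. The following sketch assumes that both limits are inclusive, and it measures bytes rather than characters, because multi-byte characters count more than once. The names key_size_bytes and classify_key are hypothetical helpers.

```shell
# Index key size limits in bytes, as stated in this document.
BTREE_LIMIT=4036     # native-btree-1.0
LUCENE_LIMIT=32766   # lucene-1.0

# Print the size of a string in bytes (not characters).
key_size_bytes() {
    printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# Print the index provider with the smallest limit that can still
# hold the given key value, or "too-large" if neither can.
classify_key() {
    size=$(key_size_bytes "$1")
    if [ "$size" -le "$BTREE_LIMIT" ]; then
        echo "native-btree-1.0"
    elif [ "$size" -le "$LUCENE_LIMIT" ]; then
        echo "lucene-1.0"
    else
        echo "too-large"
    fi
}
```

For example, classify_key "http://example.com" prints native-btree-1.0, while a 5000-byte URI prints lucene-1.0.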

EclecticIQ Platform 2.8.x and later solve the issue by setting lucene-1.0 as the default index provider for Neo4j 3.5.x.

To change the default index provider for Neo4j 3.5.x from native-btree-1.0 to lucene-1.0 in EclecticIQ Platform 2.7.x and earlier, carry out the manual procedure described in the following sections.

Get root-level access#

To complete the procedure, you must have root-level access on the server hosting the platform, and on the server hosting Neo4j.

  • Obtain root-level access by running sudo -i:

    # Open a login shell as root
    sudo -i
    
  • To open a login shell or to run a command as a user other than root, append -u:

    # Open a login shell as a different user
    sudo -i -u ${user_name}
    
    # Run a command as a different user, in that user's login environment
    sudo -i -u ${user_name} ${command} ${options}
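
Before you continue, you may want to confirm that the current shell really runs as root. The following is a small sketch of such a check; the function name is_root is a hypothetical helper, and the check relies only on the standard id command.

```shell
# Return success if the given numeric user ID is root (UID 0).
# Without an argument, check the effective UID of the current user.
is_root() {
    uid=${1:-$(id -u)}
    [ "$uid" -eq 0 ]
}

if is_root; then
    echo "Running as root: OK"
else
    echo "Not running as root: run 'sudo -i' first"
fi
```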
    

Stop backend services and retrieve the Neo4j credentials#

In the server hosting the platform:

  1. Stop all EclecticIQ backend services:

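    # --no-legend and --plain drop the header, the footer, and the status
    # markers, so that awk prints only unit names
    systemctl stop $(systemctl list-units --no-legend --plain 'eclecticiq*' | awk '{print $1}')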
    
  2. Retrieve the user name and password credentials the platform uses to connect to the Neo4j database.

    This information is stored in /etc/eclecticiq/platform_settings.py, on the server hosting the platform:

    grep 'NEO4J_URL\|NEO4J_USER\|NEO4J_PASSWORD' /etc/eclecticiq/platform_settings.py
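
If you prefer to capture a single value instead of reading the grep output, a sketch like the following may help. It assumes that each setting is a single-line assignment with a quoted string value, for example NEO4J_USER = "neo4j", without trailing comments; the actual layout of platform_settings.py may differ. The function name get_setting is a hypothetical helper.

```shell
# Print the quoted value assigned to a setting in a Python-style
# settings file. If the setting occurs more than once, print the last
# occurrence.
# Assumption: single-line assignments such as NEO4J_USER = "neo4j".
get_setting() {
    # $1 = setting name, $2 = settings file
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*[\"']\\(.*\\)[\"'][[:space:]]*\$/\\1/p" "$2" | tail -n 1
}

# Usage:
#   neo4j_username=$(get_setting NEO4J_USER /etc/eclecticiq/platform_settings.py)
```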
    

Back up and edit the Neo4j configuration#

In the server hosting Neo4j:

  1. Back up the current /etc/eclecticiq-neo4j/neo4j.conf Neo4j configuration file:

    cd /etc/eclecticiq-neo4j/
    cp -p neo4j.conf neo4j.conf.orig
    
  2. Edit /etc/eclecticiq-neo4j/neo4j.conf to enable the Bolt network protocol and Neo4j Cypher Shell CLI:

    # Open Neo4j config file in Vim:
    vi /etc/eclecticiq-neo4j/neo4j.conf
    
    # In neo4j.conf enable Bolt and Cypher Shell:
    dbms.connector.bolt.enabled=True
    dbms.shell.enabled=True
    
    # Save and exit
    :wq!
    
  3. Restart the Neo4j service:

    systemctl restart neo4j
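
As an alternative to editing the file interactively in Vim, the same change can be scripted. The following is a minimal sketch, not a supported tool: set_conf is a hypothetical helper that replaces an existing (possibly commented-out) line for a key, or appends the key if it is missing. It assumes that values do not contain the characters |, &, or backslash.

```shell
# Set "key=value" in a properties-style file such as neo4j.conf.
# Replaces every existing line for the key, including commented-out
# ones, or appends a new line if the key is absent.
set_conf() {
    key=$1
    value=$2
    file=$3
    # Escape dots so that the key is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^#?[[:space:]]*${pattern}=" "$file"; then
        # Write through a temporary file, then copy the content back
        # so that the original file keeps its owner and permissions.
        sed "s|^#\{0,1\}[[:space:]]*${pattern}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage, after backing up neo4j.conf:
#   set_conf dbms.connector.bolt.enabled True /etc/eclecticiq-neo4j/neo4j.conf
#   set_conf dbms.shell.enabled True /etc/eclecticiq-neo4j/neo4j.conf
```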
    

Change Neo4j index provider#

In the server hosting Neo4j:

  1. Open a Cypher Shell instance, and connect to the Neo4j database:

    /bin/cypher-shell -u ${neo4j_username} -p ${neo4j_password}
    
  2. In Cypher Shell run the following commands to replace the current index provider with lucene-1.0, and to reindex the graph database with lucene-1.0 as the new index provider:

    DROP INDEX ON :Extract(value);
    CALL db.createIndex(":Extract(value)", "lucene-1.0");
    DROP CONSTRAINT ON ( extract:Extract ) ASSERT extract.id IS UNIQUE;
    CALL db.createUniquePropertyConstraint(":Extract(id)", "lucene-1.0");
    

    As a rule of thumb, reindexing the graph database using the new index provider can take about 3-5 minutes per million entities.

  3. After completing reindexing, list the new indices:

    CALL db.indexes();
    

    Response example:

    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | description                           | label         | properties       | state    | type                   | provider                               | failureMessage |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "INDEX ON :Extract(kind)"             | "Extract"     | ["kind"]         | "ONLINE" | "node_label_property"  | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :Extract(platform_id)"      | "Extract"     | ["platform_id"]  | "ONLINE" | "node_label_property"  | {version: "2.0", key: "lucene+native"} | ""             |
    | "INDEX ON :Extract(value)"            | "Extract"     | ["value"]        | "ONLINE" | "node_label_property"  | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :IntelEntity(meta.source)"  | "IntelEntity" | ["meta.source"]  | "ONLINE" | "node_label_property"  | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :IntelEntity(meta.stix_id)" | "IntelEntity" | ["meta.stix_id"] | "ONLINE" | "node_label_property"  | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :IntelEntity(sources)"      | "IntelEntity" | ["sources"]      | "ONLINE" | "node_label_property"  | {version: "2.0", key: "lucene+native"} | ""             |
    | "INDEX ON :IntelEntity(stix_id)"      | "IntelEntity" | ["stix_id"]      | "ONLINE" | "node_label_property"  | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :IntelEntity(subtype)"      | "IntelEntity" | ["subtype"]      | "ONLINE" | "node_label_property"  | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :IntelEntity(type)"         | "IntelEntity" | ["type"]         | "ONLINE" | "node_label_property"  | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :Extract(id)"               | "Extract"     | ["id"]           | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :Extract(uid)"              | "Extract"     | ["uid"]          | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :IntelEntity(id)"           | "IntelEntity" | ["id"]           | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"}        | ""             |
    | "INDEX ON :Migration(name)"           | "Migration"   | ["name"]         | "ONLINE" | "node_unique_property" | {version: "1.0", key: "lucene"}        | ""             |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
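
The steps above can be supported with two small helpers, sketched here under stated assumptions. index_provider reads CALL db.indexes(); table output in the layout of the response example and prints the provider column for one index, so that you can confirm that :Extract(value) and :Extract(id) now use lucene 1.0. reindex_minutes turns the rule of thumb of 3-5 minutes per million entities into a rough range, rounded up to whole minutes. Both function names are hypothetical.

```shell
# Print the provider column for one index, reading db.indexes() table
# output on stdin. Assumes the column order of the response example:
# description | label | properties | state | type | provider | failureMessage
index_provider() {
    # $1 = index target, for example ":Extract(value)"
    awk -F'|' -v target="\"INDEX ON $1\"" '
        {
            desc = $2
            gsub(/^[ \t]+|[ \t]+$/, "", desc)    # trim padding
            if (desc == target) {
                prov = $7
                gsub(/^[ \t]+|[ \t]+$/, "", prov)
                print prov
            }
        }'
}

# Print an estimated reindexing time range in minutes for a given
# number of entities (3-5 minutes per million, rounded up).
reindex_minutes() {
    low=$(( ($1 * 3 + 999999) / 1000000 ))
    high=$(( ($1 * 5 + 999999) / 1000000 ))
    echo "${low}-${high}"
}
```

For example, saving the table output to a file and running index_provider ':Extract(value)' on it should print {version: "1.0", key: "lucene"} once reindexing has completed.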
    

Disable Bolt and Cypher Shell#

In the server hosting Neo4j:

  1. After completing the operation, edit /etc/eclecticiq-neo4j/neo4j.conf to disable the Bolt network protocol and Neo4j Cypher Shell CLI:

    # Open Neo4j config file in Vim:
    vi /etc/eclecticiq-neo4j/neo4j.conf
    
    # In neo4j.conf disable Bolt and Cypher Shell:
    dbms.connector.bolt.enabled=False
    dbms.shell.enabled=False
    
    # Save and exit
    :wq!
    
  2. Restart the Neo4j service:

    systemctl restart neo4j
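
To double-check that both settings are switched off again, you can read the values back from the configuration file. The following is a sketch only: conf_get and is_disabled are hypothetical helpers, and the sketch assumes that the last uncommented occurrence of a key is the one that takes effect.

```shell
# Print the value of a key in a properties-style file, taken from the
# last uncommented "key=value" line. Prints nothing if the key is absent.
conf_get() {
    # $1 = key, $2 = file
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }    # skip comment lines
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) { v = substr($0, index($0, "=") + 1) }
        }
        END { if (v != "") print v }' "$2"
}

# Return success only if the key is explicitly set to false
# (case-insensitive).
is_disabled() {
    value=$(conf_get "$1" "$2" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
    [ "$value" = "false" ]
}

# Usage:
#   is_disabled dbms.connector.bolt.enabled /etc/eclecticiq-neo4j/neo4j.conf \
#       && echo "Bolt is disabled"
```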
    

Start backend services and reingest failed packages#

In the server hosting the platform:

  1. Start the platform backend services:

    systemctl start eclecticiq-platform-backend-services
    
  2. Sign in to the platform GUI, and then reingest the failed packages from the corresponding incoming feeds.