Saving data

Finally, the platform saves the ingested entities to its databases in the following order:

  1. Entity store (PostgreSQL).

  2. Search store (Elasticsearch).

  3. Graph store (Neo4j).

Saving data URIs and raw artifacts

The extracted entity data stored in observables ranges from short, simple values, such as email addresses, domain names, and IP addresses, to binary data.

When an entity contains binary data — for example, a file, a memory region, or packet capture (PCAP) data — the data can be represented as either a data URI or a CybOX raw artifact element.

During ingestion, the extraction logic handles binary data URIs and raw artifact objects embedded in CybOX objects as follows:

Data URIs are extracted and stored as entity attachments and new hash values:

  • A new hash, uri-hash-sha256, is calculated from the data URI value.

    The SHA-256 hash value for uri-hash-sha256 is calculated over the UTF-8 encoding of the data URI string.

    Because identical data URIs produce the same uri-hash-sha256 value, this hash makes it possible to correlate entities that contain the same data URI.

  • The binary data/raw content embedded in the data URI is decoded and processed:

    • The extracted binary data content is stored as an entity attachment similar to the name="Raw_Artifact" type="ArtifactObj:RawArtifactType" CybOX object.

    • The extracted content is hashed using SHA-512, SHA-256, SHA-1, and MD5.

      Each resulting hash is added to the relevant entities as an observable.
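The hashing steps above can be sketched in Python as follows. This is a minimal illustration, not the platform's implementation: the function name `extract_data_uri_hashes` and the returned dictionary layout are hypothetical, and only the hash algorithms and the UTF-8 rule are taken from the description above.

```python
import base64
import hashlib
from urllib.parse import unquote_to_bytes


def extract_data_uri_hashes(data_uri: str) -> dict:
    """Hypothetical helper: hash a data URI and its decoded content."""
    if not data_uri.startswith("data:") or "," not in data_uri:
        raise ValueError("not a data URI")

    # Split "data:<mediatype>[;base64],<payload>" into header and payload.
    header, payload = data_uri[len("data:"):].split(",", 1)

    # Decode the payload: base64 if flagged, percent-encoded otherwise.
    if "base64" in header.split(";")[1:]:
        content = base64.b64decode(payload)
    else:
        content = unquote_to_bytes(payload)

    return {
        # Hash of the URI string itself, over its UTF-8 encoding.
        "uri-hash-sha256": hashlib.sha256(data_uri.encode("utf-8")).hexdigest(),
        # Hashes of the decoded binary content.
        "hash-sha512": hashlib.sha512(content).hexdigest(),
        "hash-sha256": hashlib.sha256(content).hexdigest(),
        "hash-sha1": hashlib.sha1(content).hexdigest(),
        "hash-md5": hashlib.md5(content).hexdigest(),
    }
```

Two entities that carry the same data URI string yield the same uri-hash-sha256 value in this sketch, which is the property that makes the hash usable for correlation.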

Example

A data URI with image content nested inside a CybOX object generates the following output:

  • 1 uri-hash-sha256 hash to facilitate entity correlation.

  • 4 calculated hash observables: hash-sha512, hash-sha256, hash-sha1, and hash-md5.

  • 1 embedded JSON entity attachment (raw-artifact) with the extracted binary data representing the image content.

The following example shows a sample input along with the corresponding output.

dataUriExtractionSample(

      input={
          'data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=='
      },

      output={

          # SHA-256 hash of the original data URI string:
          ('uri-hash-sha256:'
            'd16ae5d51dda6f58995171aa23c0fa5e'
            '6dcd9c777cf9c251c4be3b1d62fdf670'),

          # Multiple hashes of the decoded content:
          'hash-md5:3eacd0132310ea44cad756b378a3bc07',

          'hash-sha1:e2216a7e9b73f5cb0279351c78ce61c33475cea7',

          ('hash-sha256:'
            'bb229a48bee31f5d54ca12dc9bd960c6'
            '3a671f0d4be86a054c1d324a44499d96'),

          ('hash-sha512:'
            'bd9ab35dde3a5242b04c159187732e13'
            'b0a6da50ddcff7015dfb78cdd68743e1'
            '91eaf5cddedd49bef7d2d5a642c21727'
            '2a40e5ba603fe24ca676a53f8c417c5d'),

          # (Attachment) Raw artifact as embedded JSON with the content:
          ('raw-artifact:{"content": '
            '"R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==", '
            '"content_encoding": "base64", "type": "image/gif"}'),
      }),
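
The raw-artifact attachment in the output can be approximated with a short Python sketch. The helper name `build_raw_artifact` is hypothetical, and taking the `type` field from the media type declared in the data URI is an assumption about how the platform fills it in. The field names follow the sample output.

```python
import base64
import json
from urllib.parse import unquote_to_bytes


def build_raw_artifact(data_uri: str) -> str:
    """Hypothetical helper: build the embedded JSON for a raw-artifact attachment."""
    if not data_uri.startswith("data:") or "," not in data_uri:
        raise ValueError("not a data URI")

    # Split "data:<mediatype>[;base64],<payload>" into header and payload.
    header, payload = data_uri[len("data:"):].split(",", 1)
    parameters = header.split(";")

    # RFC 2397 defaults to text/plain when no media type is declared.
    media_type = parameters[0] or "text/plain"

    # Decode the payload: base64 if flagged, percent-encoded otherwise.
    if "base64" in parameters[1:]:
        content = base64.b64decode(payload)
    else:
        content = unquote_to_bytes(payload)

    # Re-encode as base64 so arbitrary binary content fits in a JSON string.
    artifact = {
        "content": base64.b64encode(content).decode("ascii"),
        "content_encoding": "base64",
        "type": media_type,
    }
    return "raw-artifact:" + json.dumps(artifact)
```

Calling `build_raw_artifact` with the data URI from the sample input returns a string with the same `content` and `content_encoding` fields as the attachment shown above.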