About datasets#
Datasets are generic containers that help you organize and manage sets of entities and observables around shared characteristics, context, and themes.
Datasets are arbitrary data collections: you can edit and delete their contents at any time.
You can create datasets to group entities for reference, for further analysis, or to set them aside temporarily and pick them up again later.
Datasets help you organize your intelligence. You can create datasets to group information based on any criteria that matter to you. For example, you can create datasets to group entities based on:
Entity type.
A specific threat scenario you are analyzing.
An incident.
A threat actor.
A targeted victim, and so on.
Or you can create datasets based on themes, for example:
Countries.
APT groups.
Vulnerability types.
Targeted infrastructure.
Subdividing a heterogeneous cyber threat intelligence corpus into smaller, more consistent, and more manageable chunks brings structure and clarity. This helps you see the forest for the trees, so that you can identify what matters to you more quickly and efficiently.
About dataset access control#
To control user access to datasets, save them to workspaces.
Like graphs, datasets inherit their access control rights from the workspace(s) they belong to.
Only workspace owners and collaborators can access datasets that belong to a workspace.
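The inheritance described above can be pictured with a minimal sketch. It is not the platform's implementation: the Workspace and Dataset classes, their fields, and the can_access function are hypothetical names introduced for illustration. The sketch also assumes that membership in any one of a dataset's workspaces is enough to grant access.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    # Hypothetical model: class and field names are illustrative only.
    name: str
    owners: set[str] = field(default_factory=set)
    collaborators: set[str] = field(default_factory=set)

    def members(self) -> set[str]:
        # Owners and collaborators are the only users with access.
        return self.owners | self.collaborators


@dataclass
class Dataset:
    name: str
    workspaces: list[Workspace] = field(default_factory=list)


def can_access(user: str, dataset: Dataset) -> bool:
    # Assumption: being a member of any workspace the dataset belongs to
    # is sufficient. With no workspaces this sketch returns False; the
    # source does not say how the platform treats that case.
    return any(user in ws.members() for ws in dataset.workspaces)


soc = Workspace("soc", owners={"alice"}, collaborators={"bob"})
apt_dataset = Dataset("apt-tracking", workspaces=[soc])
print(can_access("bob", apt_dataset))    # member of the workspace
print(can_access("carol", apt_dataset))  # not a member
```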
Static and dynamic datasets#
Static or dynamic?
Summary: static and dynamic datasets have different computational costs. Static datasets are more expensive than dynamic datasets.
About static datasets
As a general guideline, it is better to avoid applying rules to static datasets.
Static datasets are defined in the PostgreSQL database.
Each time data are added to or removed from static datasets, the database tables need to be updated accordingly. This process can be expensive, and as a consequence performance can degrade.
If you apply rules to static datasets, an entity with the most recent timestamp replaces the same entity with an older timestamp in the static dataset.
This can be a newer version of the entity, as well as the same version of the entity with changes only in its meta content section:
Changes to the data section of an entity create a new version of the entity. They also add a new log entry to the entity history to record the changes.
Changes to the meta section of an entity do not create a new version of the entity. However, they do update the timestamp value of the last_updated_at database field.
Update strategies rely on the last_updated_at database field to identify entities whose timestamp value was updated since the previous execution of the outgoing feed. Entities with a more recent timestamp value compared to the previous execution of the outgoing feed are packaged and included in the published content of the outgoing feed.
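The difference between data and meta changes, and how an update strategy might pick entities by timestamp, can be sketched as follows. This is an illustrative in-memory model, not the platform's code: the Entity class, its update_data and update_meta methods, and the select_for_feed function are hypothetical. Only the last_updated_at field name comes from the text above. The sketch also assumes that a data change refreshes last_updated_at, which the text implies but does not state outright.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Entity:
    # Hypothetical in-memory stand-in for a stored entity.
    entity_id: str
    data: dict
    meta: dict
    last_updated_at: datetime
    version: int = 1
    history: list = field(default_factory=list)

    def update_data(self, changes: dict, now: datetime) -> None:
        # A data change creates a new version and logs it in the history.
        self.data.update(changes)
        self.version += 1
        self.history.append((now, f"version {self.version}"))
        # Assumption: the new version also carries a fresh timestamp.
        self.last_updated_at = now

    def update_meta(self, changes: dict, now: datetime) -> None:
        # A meta change keeps the version but still refreshes the timestamp.
        self.meta.update(changes)
        self.last_updated_at = now


def select_for_feed(entities, previous_run: datetime) -> list:
    # Package only entities updated since the previous feed execution.
    return [e for e in entities if e.last_updated_at > previous_run]


t0 = datetime(2024, 1, 1)
a = Entity("a", {"title": "old"}, {}, t0)
b = Entity("b", {"title": "same"}, {}, t0)
c = Entity("c", {}, {}, t0)

a.update_data({"title": "new"}, t0 + timedelta(hours=2))  # new version
b.update_meta({"tags": ["apt"]}, t0 + timedelta(hours=2))  # same version

picked = select_for_feed([a, b, c], previous_run=t0 + timedelta(hours=1))
print([e.entity_id for e in picked])
```

In this sketch both the data-changed and the meta-changed entity are selected, because selection looks only at the timestamp and not at the version number.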
About dynamic datasets
Dynamic datasets are more rule-friendly.
If you apply rules to dynamic datasets, a more recent version of an entity that a rule retrieves replaces the previous version in the dynamic dataset only if the new version satisfies the search query criteria.
This is computationally cheaper and faster.
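The conditional replacement for dynamic datasets can be sketched like this. The EntityVersion class, the apply_rule_to_dynamic function, and the matches_query predicate are hypothetical names used for illustration, and the dataset is modelled as a plain dictionary keyed by entity ID. When the new version does not match the query, this sketch leaves the stored version untouched; the text above does not say what the platform does in that case.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EntityVersion:
    # Hypothetical stand-in for one version of an entity.
    entity_id: str
    version: int
    data: dict


def apply_rule_to_dynamic(
    dataset: dict,
    retrieved: EntityVersion,
    matches_query: Callable[[EntityVersion], bool],
) -> bool:
    """Replace the stored version only if the retrieved one is newer
    and satisfies the dataset's search query. Returns True on replacement."""
    current = dataset.get(retrieved.entity_id)
    if current is None:
        # This sketch models replacement only, not adding new entities.
        return False
    if retrieved.version > current.version and matches_query(retrieved):
        dataset[retrieved.entity_id] = retrieved
        return True
    return False


def is_indicator(entity: EntityVersion) -> bool:
    # Example query: keep only entities of type "indicator".
    return entity.data.get("type") == "indicator"


dataset = {"e1": EntityVersion("e1", 1, {"type": "indicator"})}

# Newer version that still matches the query: replaced.
apply_rule_to_dynamic(dataset, EntityVersion("e1", 2, {"type": "indicator"}), is_indicator)

# Newer version that no longer matches: not replaced in this sketch.
apply_rule_to_dynamic(dataset, EntityVersion("e1", 3, {"type": "report"}), is_indicator)

print(dataset["e1"].version)
```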