Configure deduplication#
Key Terminology#
Unified Entity An Entity that aggregates information from multiple Source Entities that have been identified as duplicates. Unified Entities display combined data from all their contributing Entities and cannot be edited directly.
Contributing Entity An Entity that has been merged into a Unified Entity. Contributing Entities retain all their original data and cannot be merged into another Unified Entity. They remain accessible for viewing and editing, and any changes automatically update the parent Unified Entity.
Standalone Entity An Entity that has not been merged into a Unified Entity and exists independently in the system.
Unified Entity source The deduplication process assigns this source to all Unified Entities created. Users with access to this source can view and access the Unified Entities. The source name appears on all Unified Entities.
Preferred source An optional source designation in the deduplication configuration that takes precedence when contributing Entities contain conflicting property values. When specified, the Preferred source’s values are used in the Unified Entity for fields where conflicts exist.
Matching criteria The configured rules that determine which Entities should be deduplicated, including Entity types, sources, and property consolidation settings. Each matching criterion defines a specific scope for deduplication.
Deduplication logic The system’s matching algorithm and consolidation rules that identify duplicate Entities and determine how to merge their properties into Unified Entities.
Permissions#
To view Unified Entities created through deduplication, users require the read consolidation-policies permission.
To modify the deduplication rule, users require the modify consolidation-policies permission.
Create Deduplication rule#
From the navigation menu, select Data configuration > Deduplication rule.
Under Unification Group and access, select a Group from the dropdown. This Group will own all Unified Entities created by Deduplication.
Under Rule execution, toggle the Status switch to on to enable Deduplication. When first enabled, no Unified Entities exist in the system. You must run Deduplication on existing data or wait for new Entities to be ingested.
(Optional) To process existing Entities immediately, select RUN NOW next to Run on existing data. The system will perform an initial bulk merge of existing data based on the configured Deduplication logic.
Under Deduplication logic, configure the matching criteria:
Match Entities criteria#
Entities will be unified if they have the same type and similar name or alias, and are from the selected sources.
Under Entity types, select which Entity types should be deduplicated under this criterion. By default, all supported Entity types from all sources are included in deduplication.
Under Sources, select which Sources should be included in this deduplication criterion.
Only Entities ingested from the selected sources will be considered for deduplication. Do not select sources if you do not want their data to be merged.
Unified Entity properties
Define how the system should populate properties when Entities are unified. You can prioritize values from a selected source, or let the system default logic determine which information to include.
(Optional) Under Preferred source, select a source whose information should take priority when the same Entity from different sources contains conflicting property values. When multiple sources report different information for the same property, the Preferred source’s value will be used in the Unified Entity. If left blank, Deduplication will use system logic to determine which property values to include in the Unified Entity.
Under TLP, select how to determine the TLP marking for Unified Entities:
Highest value from source: Apply the most restrictive TLP value from contributing Entities (e.g., if sources have TLP:GREEN and TLP:AMBER, the Unified Entity will be TLP:AMBER)
Lowest value from source: Apply the least restrictive TLP value from contributing Entities (e.g., if sources have TLP:GREEN and TLP:AMBER, the Unified Entity will be TLP:GREEN)
Pick from preferred source: Inherit the TLP value from the Preferred source selected above
Custom: Select a specific TLP value to assign to all Unified Entities created under this criterion
Under Half-life, select how to determine the Half-life for Unified Entities:
Highest value from source: Apply the highest Half-life value from contributing Entities
Lowest value from source: Apply the lowest Half-life value from contributing Entities
Pick from preferred source: Inherit the Half-life value from the Preferred source selected above
Custom: Select a specific Half-life value to assign to all Unified Entities created under this criterion
Select Save.
(Optional) To create additional matching criteria with different Entity types or deduplication settings:
Select + ADD CONFIGURATION.
Follow the steps above to configure the new matching criterion.
Important: Each Entity type can only be included in one configuration. Once an Entity type has been selected in a configuration, it cannot be selected again in any additional configuration.
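The TLP and Half-life options in the steps above can be read as one small selection function. The following is a minimal sketch, not the product's implementation: the function name `resolve`, the strategy keywords, the simplified TLP scale, and the Half-life numbers are all hypothetical and chosen only for illustration.

```python
# Assumed TLP scale, ordered from least to most restrictive.
# This simplified list is an illustration, not taken from the product.
TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "RED"]


def resolve(strategy, values, rank=None, preferred=None, custom=None):
    """Pick one value for the Unified Entity (hypothetical helper).

    strategy  -- "highest", "lowest", "preferred", or "custom"
    values    -- values found on the contributing Entities
    rank      -- optional function mapping a value to a sortable rank
    preferred -- value carried by the Preferred source, if any
    custom    -- fixed value chosen in the configuration, if any
    """
    rank = rank or (lambda v: v)
    if strategy == "highest":
        return max(values, key=rank)
    if strategy == "lowest":
        return min(values, key=rank)
    if strategy == "preferred":
        # What happens when the Preferred source has no value is not
        # described in this text; the sketch simply returns what it is given.
        return preferred
    if strategy == "custom":
        return custom
    raise ValueError(f"unknown strategy: {strategy!r}")


# TLP values are ranked by restrictiveness, Half-life values numerically.
tlp = resolve("highest", ["GREEN", "AMBER"], rank=TLP_ORDER.index)
half_life = resolve("lowest", [30, 90, 182])
```

With the values above, `tlp` is the more restrictive marking (`"AMBER"`), which mirrors the Highest value from source example in the steps.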
Entity matching#
When Deduplication is enabled, the system automatically identifies and unifies Entities that match across multiple sources based on configurable matching criteria. Entities are matched based on their identifiers. The matching logic varies by Entity type:
For Threat Actor, Intrusion Set, Malware, and Tool:
Matching uses both title and alias fields
Valid matches occur when:
title matches title
title matches alias
alias matches alias
For all other Entity types:
Matching uses only the title field
Valid matches occur when:
title matches title
Matching rules:
All string comparisons are case-insensitive and apply character normalization:
Whitespace, hyphens (-), underscores (_), periods (.), and colons (:) are normalized
For example: “APT29”, “APT 29”, “APT-29”, “APT_29”, “APT.29”, and “APT:29” are all considered equivalent
For Threat Actor, Intrusion Set, Malware, and Tool entities, the system automatically removes common prefixes before comparison:
“Threat Actor: “, “Threat Actor - “
“Intrusion Set: “, “Intrusion Set - “
“Malware: “, “Malware - “
“Tool: “, “Tool - “
For example, “Threat Actor: APT29” matches with “APT29” after prefix removal.
Special constraint for Malware:
Family entities are only merged with other family entities
Non-family entities are only merged with other non-family entities
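The matching rules above can be approximated in a few lines. This is a sketch under stated assumptions, not the system's actual algorithm: the `Entity` class, `normalize`, `match_keys`, and `is_match` are hypothetical names; "normalized" is interpreted as removing the listed separator characters; prefix removal is treated as case-insensitive; and the source filter from the configuration is left out.

```python
import re
from dataclasses import dataclass

# Entity types that match on title and alias, as listed above.
ALIAS_TYPES = {"Threat Actor", "Intrusion Set", "Malware", "Tool"}

# Leading "<Type>: " or "<Type> - " prefixes that are removed before comparison.
PREFIX_RE = re.compile(
    r"^(?:threat actor|intrusion set|malware|tool)(?:: | - )", re.IGNORECASE
)
# Whitespace, hyphens, underscores, periods, and colons.
SEPARATORS_RE = re.compile(r"[\s\-_.:]+")


@dataclass
class Entity:
    # Hypothetical minimal shape; real Entities carry many more fields.
    type: str
    title: str
    aliases: tuple = ()
    is_family: bool = False  # only meaningful for Malware


def normalize(name, entity_type):
    """Return a comparison key for one title or alias."""
    text = name.strip()
    if entity_type in ALIAS_TYPES:
        text = PREFIX_RE.sub("", text)
    # Assumption: separators are dropped entirely, then case is folded.
    return SEPARATORS_RE.sub("", text).casefold()


def match_keys(entity):
    """Collect the normalized names that take part in matching."""
    names = [entity.title]
    if entity.type in ALIAS_TYPES:
        names.extend(entity.aliases)
    return {normalize(n, entity.type) for n in names} - {""}


def is_match(a, b):
    """True when two Entities would be considered duplicates."""
    if a.type != b.type:
        return False
    # Malware: family entities only merge with family entities.
    if a.type == "Malware" and a.is_family != b.is_family:
        return False
    return bool(match_keys(a) & match_keys(b))


apt = Entity("Threat Actor", "Threat Actor: APT29")
bear = Entity("Threat Actor", "Cozy Bear", aliases=("apt-29",))
matched = is_match(apt, bear)  # title matches alias after normalization
```

Comparing sets of normalized keys covers the title-to-title, title-to-alias, and alias-to-alias cases with one intersection, which keeps the type-specific rules confined to `match_keys`.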
Merge constraints#
Once an Entity is merged into a Unified Entity, it becomes a contributing Entity. Contributing Entities cannot:
Be merged into another Unified Entity
Be manually merged with other Entities
Property consolidation#
The consolidation process runs automatically after any merge or when a contributing Entity is updated. When Entities are unified, the system consolidates their properties using the following logic:
General consolidation rules:
For most fields, the system follows this priority order:
If a Preferred source is specified, use the value from that source.
If no Preferred source is specified, use the value from the entity with the highest source reliability (A→F)
If reliability is equal or unavailable, use the value from the most recently updated entity
Field-specific behavior:
Text fields (title, descriptions): Use the priority order above
Multi-value fields (tags, aliases, motivations, capabilities): Aggregate values from all contributing entities
Date fields:
Start time: Use the earliest (oldest) date
End time, Observed time: Use the latest (newest) date
Confidence: If a Preferred source is specified, use its value; otherwise, use the lowest confidence value from the contributing Entities
TLP and Half-life: Follow the configuration selection (highest/lowest from source, preferred source, or custom value)
Source fields:
Source name: The Unification Group name
Source reliability: The Unification Group’s reliability value
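The consolidation rules above can be illustrated with a short sketch. It is not the product's code: the `Contribution` class and the functions `pick_winner` and `consolidate` are hypothetical, only a few fields are modeled, and two gaps in the text are filled with explicit assumptions (a missing reliability ranks below F, and the rule falls through to reliability when the Preferred source contributed nothing). Confidence, TLP, and Half-life are omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Source reliability scale, best first (A -> F).
RELIABILITY = ["A", "B", "C", "D", "E", "F"]


@dataclass
class Contribution:
    # Hypothetical, reduced view of a contributing Entity.
    source: str
    reliability: Optional[str]
    updated_at: datetime
    title: str
    tags: tuple = ()
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def pick_winner(entities, preferred_source=None):
    """Apply the priority order: Preferred source, reliability, recency."""
    if preferred_source is not None:
        preferred = [e for e in entities if e.source == preferred_source]
        if preferred:
            return max(preferred, key=lambda e: e.updated_at)

    def rank(e):
        # Assumption: an unavailable reliability sorts after "F".
        if e.reliability in RELIABILITY:
            return RELIABILITY.index(e.reliability)
        return len(RELIABILITY)

    best = min(rank(e) for e in entities)
    candidates = [e for e in entities if rank(e) == best]
    return max(candidates, key=lambda e: e.updated_at)


def consolidate(entities, preferred_source=None):
    """Build the consolidated properties for a Unified Entity."""
    winner = pick_winner(entities, preferred_source)
    starts = [e.start_time for e in entities if e.start_time]
    ends = [e.end_time for e in entities if e.end_time]
    return {
        "title": winner.title,                                # text field
        "tags": sorted({t for e in entities for t in e.tags}),  # aggregated
        "start_time": min(starts, default=None),              # earliest
        "end_time": max(ends, default=None),                  # latest
    }


a = Contribution("Vendor A", "B", datetime(2024, 1, 1), "APT29",
                 ("russia",), datetime(2020, 5, 1), datetime(2023, 1, 1))
b = Contribution("Vendor B", "A", datetime(2023, 6, 1), "Cozy Bear",
                 ("espionage",), datetime(2019, 3, 1), datetime(2022, 1, 1))
unified = consolidate([a, b])
```

Here the title comes from the contribution with the higher reliability, while tags and dates draw on both contributions regardless of which one wins the priority order.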
Disabling the Deduplication Rule#
Disable the deduplication rule by toggling the Status switch to off. The following behavior occurs:
Existing Unified Entities remain intact and searchable. No Unified Entities are deleted or affected by disabling the rule.
Automatic merging of new Entities stops. Newly ingested Entities are no longer automatically deduplicated based on the configured matching criteria.
Manual merge actions remain available. Users can still manually merge and un-merge Entities through the user interface.