Search | Using Tokenizers#
You can apply a tokenizer to your searches on EclecticIQ Intelligence Center.
Ingested data is indexed in Elasticsearch. Elasticsearch analyzes incoming data streams, and it breaks up data into tokens.
Tokens are smaller meaningful bits of information. The tokenization process is based on predefined rule sets.
If a data field is not mapped in the Elasticsearch index mapping, Elasticsearch also stores a non-analyzed version of the analyzed and tokenized data.
This version holds the original, non-analyzed and non-tokenized value of the data.
Elasticsearch can apply multiple tokenizers to text fields. This enables searching for and retrieving content using different search strategies:
Search based on the Elasticsearch standard tokenizer.
Search based on the Elasticsearch pattern tokenizer.
Search based on an alphanumeric tokenizer that uses any non-alphanumeric characters as token separators ([^a-zA-Z0-9_]).
Search for non-tokenized data.
Search for non-tokenized data spelled backward (reverse text).
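The following Python sketch approximates three of the strategies listed above on a local string. It is an illustration only, not the Elasticsearch implementation: the function names and the sample value are hypothetical, and only the character class `[^a-zA-Z0-9_]` comes from the description above.

```python
import re

# Character class described above for the alphanumeric tokenizer:
# every character that is not a letter, digit, or underscore separates tokens.
SEPARATORS = re.compile(r"[^a-zA-Z0-9_]+")


def alphanumeric_tokens(value: str) -> list[str]:
    """Split on runs of non-alphanumeric characters and drop empty tokens."""
    return [token for token in SEPARATORS.split(value) if token]


def non_tokenized(value: str) -> str:
    """Non-tokenized data: the value is kept exactly as received."""
    return value


def non_tokenized_reversed(value: str) -> str:
    """Non-tokenized data spelled backward (reverse text)."""
    return value[::-1]


sample = "user_42@example.com"  # hypothetical sample value
print(alphanumeric_tokens(sample))     # ['user_42', 'example', 'com']
print(non_tokenized(sample))           # user_42@example.com
print(non_tokenized_reversed(sample))  # moc.elpmaxe@24_resu
```

Note that the underscore is part of the character class, so `user_42` stays a single token while `@` and `.` act as separators.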
Search for tokens and keywords#
You can search for analyzed and tokenized data, as well as for non-analyzed and non-tokenized data.
Elasticsearch analyzes and tokenizes ingested content using its grammar-based standard tokenizer: it splits content into text elements, based on the Unicode Text Segmentation algorithm.
Example: A search for data.city_name.tokens:"King's Landing" returns [ King's, Landing ]
You can also search for indexed content based on different tokenization criteria.
To do so, append the following parameters to the JSON paths pointing to the JSON data field names whose values you want to look up:
Parameter | Description |
---|---|
tokens | Apply the alphanumeric tokenizer. Use any non-alphanumeric characters as token separators ([^a-zA-Z0-9_]). This is useful when searching alphanumeric IDs that should not be split into multiple tokens. Token delimiters include white space, punctuation, hyphen, apostrophe, and quotes. Example: A search for |
keyword | Apply the Elasticsearch keyword tokenizer. It returns the data exactly as it was received. The output data is the same as the corresponding input. This is useful when searching text where words are joined together by characters such as hyphens, underscores, or other characters that the other tokenizers would interpret as token separators. Example: A search for |
keyword_r | Apply the Elasticsearch reverse token filter. It reverses the order of the original input data. Example: A search for |
Examples
Add tokens, keyword, or keyword_r to the JSON data field names whose values you want to search and retrieve.
The following examples search for observable values and enrichment observable values.
Field | Description |
---|---|
| | Non-alphanumeric characters are token separators. Non-alphanumeric characters in the observable value are replaced and then split by whitespace to create tokens. |
| | The original observable value is returned as is, without any modifications. |
| | The original observable value is returned spelled backward (reverse text). |
| | Non-alphanumeric characters are token separators. Non-alphanumeric characters in the enrichment observable value are replaced and then split by whitespace to create tokens. |
| | The original enrichment observable value is returned as is, without any modifications. |
| | The original enrichment observable value is returned spelled backward (reverse text). |
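As a rough sketch, the tokens, keyword, and keyword_r suffixes can be appended to a JSON path programmatically. The helper names below are hypothetical and not part of any EclecticIQ API; the `field:"value"` expression shape mirrors the data.city_name.tokens example earlier on this page, and the value is inserted verbatim without escaping.

```python
# Suffixes named in this section; anything else is rejected.
ALLOWED_SUFFIXES = {"tokens", "keyword", "keyword_r"}


def field_with_suffix(json_path: str, suffix: str) -> str:
    """Append a tokenizer suffix to a JSON path (hypothetical helper)."""
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return f"{json_path}.{suffix}"


def search_expression(json_path: str, suffix: str, value: str) -> str:
    """Build a field:"value" search expression; the value is not escaped."""
    return f'{field_with_suffix(json_path, suffix)}:"{value}"'


print(search_expression("data.city_name", "tokens", "King's Landing"))
# data.city_name.tokens:"King's Landing"
```

Validating the suffix up front keeps typos such as keyword_reverse from silently producing a field name that matches nothing.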
Search for raw field values#
You can bypass tokenization and search for raw, non-tokenized field values.
To do so, append a trailing .raw element to the JSON path representing the field name.
Format: ${field.namejson.path}.raw
Example
Field | Description |
---|---|
| | Enables accessing the indexed, tokenized field value. You can retrieve the field value by searching for any of its constituent tokens. Any search literal or data pattern that matches at least one word in the title returns the whole title content. In the example, the field returns an entity name or its alias, if any; otherwise, its STIX title. |
| | Enables accessing the indexed, non-tokenized field value. You can retrieve the field value by searching for the whole field value as a string. In the example, the field returns an entity name or its alias, if any; otherwise, its STIX title. |
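The difference between the tokenized and the raw field can be modeled locally with a simplified Python sketch. This is not how Elasticsearch evaluates queries: the word splitting with `\w+`, the case-insensitive comparison, and the sample title are all assumptions made for illustration.

```python
import re


def matches_tokenized(title: str, query: str) -> bool:
    """True if at least one word of the query equals a token of the title.

    Assumption: tokens are runs of word characters, compared case-insensitively.
    """
    title_tokens = {t.lower() for t in re.findall(r"\w+", title)}
    query_tokens = {t.lower() for t in re.findall(r"\w+", query)}
    return bool(title_tokens & query_tokens)


def matches_raw(title: str, query: str) -> bool:
    """True only if the query equals the whole, non-tokenized field value."""
    return title == query


title = "Emotet campaign 2021"  # hypothetical entity title

print(matches_tokenized(title, "emotet"))          # True: one token matches
print(matches_raw(title, "emotet"))                # False: not the whole value
print(matches_raw(title, "Emotet campaign 2021"))  # True: exact string match
```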
Search in root elements other than data#
To specify selection criteria pointing to entity data outside the predefined data root JSON object, you can define a different root element than data.
For example, you may want a rule to return matches based on specific tags, metadata, or observable attributes.
To set a JSON path defining a field name other than data as a root field, prefix the field name with raw.:
raw. must be the first element in the JSON path defining the field name.
The second element in the JSON path after raw. becomes the designated JSON path root element for the specified path.
Example
| | Custom root field | Targeted entity data |
---|---|---|
| | tags | Enables accessing entity tag field values through searching, filtering, and rules. |
| | extracts.kind | Enables accessing observable type field values through searching, filtering, and rules. |
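The two rules for the raw. prefix can be sketched in Python as follows. The helper names are hypothetical; the resulting paths are derived only from the rules stated above (raw. comes first, and the next element becomes the root), applied to the custom root fields tags and extracts.kind.

```python
RAW_PREFIX = "raw"


def with_custom_root(field_path: str) -> str:
    """Prefix a field path with raw. so that its first element becomes the root."""
    if field_path.split(".")[0] == RAW_PREFIX:
        return field_path  # already prefixed
    return f"{RAW_PREFIX}.{field_path}"


def root_element(path: str) -> str:
    """Return the root element designated by a raw.-prefixed path."""
    parts = path.split(".")
    if len(parts) < 2 or parts[0] != RAW_PREFIX or not parts[1]:
        raise ValueError("path must start with 'raw.' followed by a root element")
    return parts[1]


for field in ("tags", "extracts.kind"):
    path = with_custom_root(field)
    print(path, "->", root_element(path))
# raw.tags -> tags
# raw.extracts.kind -> extracts
```

The raw. prefix described here is distinct from the trailing .raw element used to bypass tokenization: the prefix changes the root element of the path, while the suffix selects the non-tokenized value of a field.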