Regular expressions for extracting observables#

The browser extension uses a set of pre-defined regular expressions to extract observables from web documents. These regular expressions are used when you extract observables using the context menu.

To view these regular expressions:

  1. Click the EclecticIQ Browser Extension icon on your browser toolbar.

  2. In the open browser extension window, select the gear icon (Settings) to open the Settings page.

  3. With the Settings page open, select the Regular Expressions tab.

    Regular expressions in the browser extension.


Customize regular expressions#

To customize the regular expressions used by the browser extension:

  1. Click the EclecticIQ Browser Extension icon on your browser toolbar.

  2. In the open browser extension window, select the gear icon (Settings) to open the Settings page.

  3. With the Settings page open, select the Regular Expressions tab.

  4. Edit the field for the observable type you need a custom expression for.

  5. Select Save Options to save your settings. The next time you use the context menu to extract observables from a web document, your custom regular expressions will be applied.

Defanged observables#

When collecting IoCs (Indicators of Compromise) from threat intelligence sources, you may find that the source has “defanged” the published IoCs, that is, altered them so that they cannot be clicked or resolved by accident.

The browser extension can extract observables from most kinds of defanged IoCs, such as URIs written as:

# Example of 'defanged' IoC
hxxp://www.example.com

If the browser extension cannot find and extract the defanged IoC, you can customize the regular expression the browser extension uses to extract that observable type.
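As a rough illustration of what a customized URI expression could look like, the sketch below uses Python's re module. This is an assumption made for illustration only: the extension's own regular expression engine and its default expressions may differ. The pattern name DEFANGED_URI is hypothetical. It accepts the regular http and https schemes as well as their defanged hxxp and hxxps forms.

```python
import re

# Hypothetical pattern, not the extension's default expression:
# the scheme may be written as http(s) or as the defanged hxxp(s).
DEFANGED_URI = re.compile(r"(h(?:tt|xx)ps?://[^\s<>]+)", re.IGNORECASE)

text = "Payload hosted at hxxp://www.example.com and http://example.org/a"

# findall returns the text of the single capturing group for each match.
matches = DEFANGED_URI.findall(text)
print(matches)
```

Testing a candidate expression against a few sample strings like this before pasting it into the Regular Expressions tab can help catch patterns that match too much or too little.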

Note

The browser extension does not include the square brackets when it extracts an IPv4 observable from defanged IPv4 addresses. It does this using non-capturing groups.

For more information, see Capturing and non-capturing groups.
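The bracket handling described in the note can be sketched as follows. This is a minimal illustration in Python, not the extension's actual expression: the names OCTET, DOT, DEFANGED_IPV4 and extract_ipv4 are hypothetical, and joining the captured pieces into one string is an assumption about how the extracted value is assembled.

```python
import re

# Hypothetical pattern: octets and dots are captured, while the optional
# square brackets around each dot sit in non-capturing groups.
OCTET = r"(\d{1,3})"
DOT = r"(?:\[)?(\.)(?:\])?"
DEFANGED_IPV4 = re.compile(OCTET + DOT + OCTET + DOT + OCTET + DOT + OCTET)

def extract_ipv4(text):
    """Return the captured pieces joined together, or None if nothing matches."""
    match = DEFANGED_IPV4.search(text)
    if match is None:
        return None
    # Assumption: only text matched by capturing groups is kept.
    return "".join(match.groups())

print(extract_ipv4("C2 server at 1[.]1[.]1[.]1"))
```

Because the brackets are matched but never captured, the same expression handles both defanged and regular addresses.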

Multiple matches in a single IoC#

You can extract more than one observable from a single IoC if it matches more than one of the regular expressions the browser extension is configured with.

By default, the browser extension uses regular expressions that avoid matching a single IoC more than once. To have the browser extension extract more than one observable from a single IoC, you have to rewrite the regular expressions it uses.

For example, to have the browser extension extract both a domain and e-mail address from <user@example.com>, set the browser extension to use the following regular expressions:

Observable type    Regular expression

Domain             ((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+)([a-z0-9][a-z0-9-]{0,61}[a-z])

E-mail             ([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?))
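To check that both expressions match the same IoC, you can try them outside the extension. The sketch below does this with Python's re module, which is used here only for illustration; the extension's own regular expression engine may behave differently in details. The variable names are hypothetical.

```python
import re

# The two expressions from the table above, copied unchanged.
DOMAIN = re.compile(
    r"((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+)([a-z0-9][a-z0-9-]{0,61}[a-z])"
)
EMAIL = re.compile(
    r"([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?))"
)

ioc = "<user@example.com>"

# Each expression is applied independently to the same text.
domain = DOMAIN.search(ioc).group(0)
email = EMAIL.search(ioc).group(1)
print(domain)  # example.com
print(email)   # user@example.com
```

Neither expression excludes text that the other one matches, which is why a single IoC yields two observables.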

Capturing and non-capturing groups#

The browser extension only extracts text that matches capturing groups. This means that the browser extension:

  • Extracts text that matches a capturing group, written as (<expression_here>);

  • Does not extract text that matches a non-capturing group, written as (?:<expression_here>);

  • Does not extract text that is not written in a capturing group. For example, in the expression:

    \w+\@(\w+\.\w+)
    

    The \w+\@ portion of the regular expression is matched, but the browser extension does not include it in the extracted result.

    Note

    By default, the browser extension does this with IPv4 addresses: it excludes the square brackets in defanged IPv4 addresses from extraction. For example, 1[.]1[.]1[.]1 is extracted as 1.1.1.1.

For example, to match only the domain portion of an email address, we can write:

(?:\w+\@)(\w+\.\w+)

We can break this expression down into:

  • A non-capturing group, written as (?:\w+\@);

  • A capturing group, written as (\w+\.\w+).

Capturing and non-capturing groups.


Using this regular expression, the browser extension matches simple email addresses such as user@example.com, but extracts only the domain portion (example.com).
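The difference between what an expression matches and what its capturing group holds can be seen in a short Python sketch. Python's re module is used here for illustration only, and the variable names are hypothetical; the extension's own engine may differ in details.

```python
import re

# Same expression as above: a non-capturing group followed by a capturing group.
pattern = re.compile(r"(?:\w+\@)(\w+\.\w+)")

match = pattern.search("Contact: user@example.com")
whole_match = match.group(0)  # everything the expression matched
captured = match.group(1)     # only the text of the capturing group

print(whole_match)  # user@example.com
print(captured)     # example.com
```

The local part and the @ sign must be present for the expression to match, but only the captured text corresponds to what the browser extension extracts.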