Filters Reference

Use the following filters to focus only on the data you need.

Escaping reserved characters

If you need to use any of the characters that function as operators as literal characters in your query, escape each of them with a leading backslash. For instance, to search for (1+1)=2, write your query as \(1\+1\)\=2.

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

Failing to escape these characters can cause a syntax error that prevents your query from running.
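Escaping by hand is error-prone when values come from user input or scraped data. The following is a minimal Python sketch of a helper that applies the rule above. The name escape_query is hypothetical and not part of the API. It escapes & and | individually, on the assumption (mirroring Lucene's own QueryParser.escape) that a backslash before a lone & or | is accepted and that this neutralizes the && and || operators.

```python
# Hypothetical helper, not part of the API: escapes the reserved characters
# listed above so they are matched literally.
# && and || are covered by escaping each & and | individually
# (an assumption about the parser, mirroring Lucene's QueryParser.escape).
RESERVED = set('+-=&|><!(){}[]^"~*?:\\/')


def escape_query(text: str) -> str:
    """Return text with a backslash in front of every reserved character."""
    escaped = []
    for ch in text:
        if ch in RESERVED:
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


print(escape_query("(1+1)=2"))  # prints \(1\+1\)\=2
```

Escape only the literal value, never the operators you intend to use: keep the field name, its colon, and any trailing wildcard * unescaped, as in extended.external_link:https\:\/\/www.linkedin.com*.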

Parameter
Description
Example

site.domain

Limit the results to a specific site or sites

site.domain:pbbnzshcgemf3d5y.onion

site.type

Filter posts based on the site type. Possible site types:

  • News
  • Blogs
  • Discussions
  • Chats
  • Marketplaces

site.type:blogs

site.name

Filter posts based on the site name (in some cases there are multiple domains with the same name)

site.name:(Tochka Free Market)

site.is_live

A Boolean field (true/false) stating if the site is online or offline

Return pages from domains that are now online:
site.is_live:true

title

A textual Boolean query describing the keywords that should (or shouldn’t) appear in the thread title

title:(0dayz OR 0days)

text

A textual Boolean query describing the keywords that should (or shouldn’t) appear in the text

text:(0dayz OR 0days)

language

The language of the post. By default, posts in any language are returned

Find posts in French or Italian:
(language:french OR language:italian)
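When a filter should match any of several values, the parenthesized OR pattern above can be generated programmatically. A minimal sketch; any_of is a hypothetical helper name, and it assumes the values contain no reserved characters (escape them first otherwise).

```python
from typing import Sequence


# Hypothetical helper: builds an OR group such as
# (language:french OR language:italian) for a single field.
# Assumes each value is already free of reserved characters.
def any_of(field: str, values: Sequence[str]) -> str:
    if not values:
        raise ValueError("at least one value is required")
    clauses = [f"{field}:{value}" for value in values]
    return "(" + " OR ".join(clauses) + ")"


print(any_of("language", ["french", "italian"]))
# prints (language:french OR language:italian)
```

The same helper works for any single-valued filter in this table, for example any_of("site.type", ["blogs", "news"]).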

author

Return posts written by a specific author or actor

Find posts written by Thewiseguys:
author:Thewiseguys

published

A timestamp (Unix epoch, in milliseconds) that enables you to filter pages that were published before or after a certain date/time

Note: this field is populated only when our system automatically detects the publication date

Return posts published after Thu, 30 Mar 2017 09:16:28 GMT:
published:>1490865388000

crawled

A timestamp (in milliseconds) that enables you to filter pages that were crawled before or after a certain date/time

Return pages crawled after Thu, 30 Mar 2017 09:16:28 GMT:
crawled:>1490865388000
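The published, crawled, and thread.published filters all compare against Unix epoch milliseconds (1490865388000 is Thu, 30 Mar 2017 09:16:28 GMT). A minimal Python sketch for producing such values from a date; to_epoch_millis and newer_than are hypothetical helper names, and the sketch requires timezone-aware datetimes so that the machine's local time cannot shift the result.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch milliseconds (exact integer math)."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime, e.g. tzinfo=timezone.utc")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def newer_than(field: str, moment: datetime) -> str:
    """Build a filter such as crawled:>1490865388000 (hypothetical helper)."""
    return f"{field}:>{to_epoch_millis(moment)}"


cutoff = datetime(2017, 3, 30, 9, 16, 28, tzinfo=timezone.utc)
print(newer_than("crawled", cutoff))  # prints crawled:>1490865388000
```

Dividing one timedelta by another with // yields an int, which avoids the floating-point rounding that multiplying datetime.timestamp() by 1000 can introduce.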

extended.network

Filter posts by network

Possible values are: tor, zeronet, i2P, openbazaar, telegram, discord, irc, openweb

extended.external_link

Search for pages that include links to another site

Search for pages that link to LinkedIn (note that the colon and both slashes are each prefixed by a backslash):

extended.external_link:https\:\/\/www.linkedin.com*

extended.external_image

Search for pages that include links to image files

Search for pages that contain links to JPEG images.

extended.external_image:*.jpg

extended.external_video

Search for pages that include links to video files

Search for pages that contain links to AVI video files.

extended.external_video:*.avi

extended.file_link

Search for pages that include links to files (limited to the file types listed under extended.file_type)

Search for pages that include links to PDF files.

extended.file_link:*.pdf

extended.file_type

Filter posts based on the file type crawled. Possible file types:

  • html
  • zip
  • rar
  • tar
  • 7z
  • pdf
  • txt
  • xls
  • xlsx
  • doc
  • docx
  • sql

Search only SQL documents:

extended.file_type:sql

extended.required_login

A Boolean field (true/false) stating whether the content is password-protected

Return content posted on forums and marketplaces that require authentication:

extended.required_login:true

extended.rating

The star rating of the review, a floating-point number between 0.0 and 5.0.

Note: only for reviews on marketplaces

Return all posts with a rating greater than 0:

extended.rating:>0

enriched.category

Filter posts (English only) that fall into one of the following 7 categories:

  • drugs
  • pii
  • hacking
  • terror
  • weapons
  • sexual
  • financial

Return posts that were categorized as related to drugs:

enriched.category:drugs

enriched.email.value

Filter by full or partial email address entity

Search for all posts that include Gmail addresses.

enriched.email.value:*@gmail.com

enriched.email.count

Filter by the number of email address mentions per post

Search for all posts that include more than 40 email addresses.

enriched.email.count:>40

enriched.ssn.value

Filter by full or partial social security number (SSN)

Search for all posts that include the following SSN:

enriched.ssn.value:"630-52-2919"

enriched.ssn.count

Filter by number of social security number (SSN) mentions per post

Search for all posts that include more than 2 SSNs.

enriched.ssn.count:>2

enriched.credit_card.value

Filter by full or partial credit card (CC) number entity

Search for all posts that include credit card numbers that start with "4580".

enriched.credit_card.value:4580*

enriched.credit_card.count

Filter by number of credit card mentions per post

Search for all posts that include more than 20 credit card numbers.

enriched.credit_card.count:>20

enriched.phone.value

Filter by full or partial phone number entity

Search for all posts that include phone numbers that start with "+1212" (+ is a reserved character, so it is escaped):

enriched.phone.value:\+1212*

enriched.phone.count

Filter by the number of phone number mentions per post

Search for all posts that include more than 10 phone numbers.

enriched.phone.count:>10

enriched.wallet_id.value

Filter by full or partial crypto-currency wallet ID entity

enriched.wallet_id.count

Filter by the number of crypto-currency wallet mentions per post

Search for all posts that include more than 10 wallet IDs.

enriched.wallet_id.count:>10

enriched.person.value

Filter by full or partial person name entity

Search for all posts that include the person name "dan".

enriched.person.value:dan

enriched.person.count

Filter by the number of person name mentions per post

Search for all posts that include more than 3 person names.

enriched.person.count:>3

enriched.organization.value

Filter by full or partial organization entity name

Search for all posts that include the organization name "cnn".

enriched.organization.value:cnn

enriched.organization.count

Filter by the number of organization mentions per post

Search for all posts that include more than 10 organizations.

enriched.organization.count:>10

enriched.location.value

Filter by full or partial location entity name

Search for all posts that include the location name "israel".

enriched.location.value:israel

enriched.location.count

Filter by the number of location mentions per post

Search for all posts that include more than 10 locations.

enriched.location.count:>10

image_label

Find posts containing images with a certain object in them. The object is represented by a label such as person, wedding, or car.

image_label:person

Thread Filters

A thread contains global information about the whole page and its content. A thread can contain multiple posts grouped together.

Parameter
Description
Example

thread.title

A textual Boolean query describing the keywords that should (or shouldn’t) appear in the thread title

Search for posts containing the word "glass" and not "metal" in their title:

thread.title:glass -thread.title:metal

thread.section_title

A textual Boolean query describing the keywords that should (or shouldn’t) appear in the site’s section where the post was published

Search for posts containing the word "food" only in sections whose title contains the word "restaurants":

food AND thread.section_title:restaurants

thread.url

Get all the posts of a specific thread (note that you must escape the reserved characters in the URL; for example, http:// must be written as http\:\/\/).
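A full thread URL usually contains several reserved characters (the colon, every slash, and sometimes - or ?), so escaping it programmatically is less error-prone than doing it by hand. A minimal sketch; thread_url_query is a hypothetical helper, and the URL in the usage line is a placeholder rather than a real thread.

```python
# Reserved characters from the "Escaping reserved characters" list
# (& and | cover the && and || operators).
RESERVED = set('+-=&|><!(){}[]^"~*?:\\/')


def thread_url_query(url: str) -> str:
    """Build a thread.url filter, backslash-escaping every reserved character in the URL."""
    escaped = "".join("\\" + ch if ch in RESERVED else ch for ch in url)
    return "thread.url:" + escaped


print(thread_url_query("http://example.com/forum/thread/42"))
# prints thread.url:http\:\/\/example.com\/forum\/thread\/42
```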

thread.spam_score

A score between 0 and 1 indicating how spammy the thread text is

Return threads with a spam score less than or equal to 0.8:

thread.spam_score:<=0.8

thread.published

A timestamp (in milliseconds) that enables you to filter posts that were published before or after a certain date/time

Return threads published after Thu, 30 Mar 2017 09:16:28 GMT:
thread.published:>1490865388000

