Data Breach Detection

Data Breach Detection: uncover potential Personally Identifiable Information (PII) leaks affecting organizations and people using

Leaked Data Processing

As the diagram below shows, our main data sources are database dumps and leaked snippets from any source on the web or the darknet. We start by cutting each dump into documents of up to 100 records, and then sanitize the data so that sensitive values, such as credit card numbers and social security numbers (SSN), stay protected.
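The chunking and sanitization steps can be sketched as follows. This is a minimal illustration, not the production pipeline: it assumes each record is a plain text line, and the regular expressions and the "keep only the last four digits" masking rule are hypothetical choices made for the example.

```python
import re
from typing import Iterator

# Documents hold at most 100 records, as described above.
MAX_RECORDS_PER_DOC = 100

# Hypothetical detection patterns; real-world detection needs to be far more robust.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}(\d{4})\b")  # 16-19 digit card numbers
SSN_RE = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")          # US SSN in NNN-NN-NNNN form


def mask_value(text: str) -> str:
    """Replace card numbers and SSNs, keeping only their last four digits."""
    text = CARD_RE.sub(lambda m: "*" * 12 + m.group(1), text)
    text = SSN_RE.sub(lambda m: "***-**-" + m.group(1), text)
    return text


def chunk_records(rows: list[str], size: int = MAX_RECORDS_PER_DOC) -> Iterator[list[str]]:
    """Cut a dump into documents of up to `size` sanitized records each."""
    for start in range(0, len(rows), size):
        yield [mask_value(row) for row in rows[start:start + size]]
```

For example, a dump of 250 rows yields three documents holding 100, 100 and 50 records.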

PII Leaks Data Process

Data Breach Document - Main Sections

Each Data Breach Document is composed of the following main sections:

  • root - general fields of the document, such as UUID (the document's unique identifier in our repository), author, referring_url, language, text and crawled
  • file - the leaked physical file properties
  • leak - the leak metadata, such as its name and the compromised fields
  • records - list of one or more records found in the document that are related to the query
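The section layout above can be modelled as a small data class on the client side. This is a sketch under assumptions: the root fields are taken to sit at the top level of the parsed JSON next to `file`, `leak` and `records`, all root fields are treated as strings, and `file` and `leak` are kept as free-form maps because their exact keys are not listed here. The class name and the `from_json` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# Root field names as listed above (assumed to be top-level JSON keys).
ROOT_FIELDS = ("uuid", "author", "referring_url", "language", "text", "crawled")


@dataclass
class DataBreachDocument:
    # root section (all values assumed to be strings)
    uuid: str
    author: str
    referring_url: str
    language: str
    text: str
    crawled: str
    # file and leak sections, kept as free-form maps (assumption)
    file: dict[str, Any] = field(default_factory=dict)
    leak: dict[str, Any] = field(default_factory=dict)
    # records: the rows of this document that matched the query
    records: list[dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: dict[str, Any]) -> "DataBreachDocument":
        """Build a document from a parsed response; missing root fields default to ""."""
        return cls(
            **{name: raw.get(name, "") for name in ROOT_FIELDS},
            file=raw.get("file", {}),
            leak=raw.get("leak", {}),
            records=raw.get("records", []),
        )
```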

Below is an example of a leak found by querying an email address:

Example of a PII leaks result

Data Breaches - Data Consumption Model & Expected Records

As explained above, we process each leaked file as follows:
  • The leaked file is split into a list of Documents.
  • Each Document can contain up to 100 Records.
  • A Record is a row in the leaked file.
This means that for each query you receive, per page, 1 Document with 1 to 100 Records, depending on how many records match the queried entity.
The next page leads to the next Document, again with 1 to 100 Records; the number of Records is determined by the entities matched by the query.
In most cases, each Document is expected to contain a single record of the leaked entity: the Documents are parts of a bigger leak file, and the chance that the same entity (for example, an email address or a credit card number) reappears within the same Document is low.
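To make the per-page model concrete, the following local simulation splits a leak file into Documents of up to 100 Records and returns one page per Document that contains at least one match. It only mimics the behaviour described above: the plain substring match and the function names are assumptions for illustration, not part of the API.

```python
from typing import Iterator

DOC_SIZE = 100  # up to 100 Records per Document


def split_into_documents(rows: list[str], size: int = DOC_SIZE) -> list[list[str]]:
    """Split the rows of a leaked file into consecutive Documents."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def pages_for_query(rows: list[str], entity: str) -> Iterator[list[str]]:
    """Yield one page per Document that has at least one matching Record.

    Each page carries only the matching Records (1 to 100 of them).
    Matching is a plain substring test here, which is an assumption.
    """
    for document in split_into_documents(rows):
        matched = [row for row in document if entity in row]
        if matched:
            yield matched
```

With a 250-row file in which the queried email appears in row 5 and in rows 150 and 151, the query returns two pages: the first with 1 Record, the second with 2.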
The consumption model is similar to the Cyber API: each user is limited to a monthly quota of API queries. Please contact Sales for more information.

As can be seen below, the field moreDocsAvailable holds the number of remaining documents that include leaked information.
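A client can use moreDocsAvailable to decide when to stop paging, while also respecting the monthly query quota. The sketch below assumes a hypothetical `fetch_page` callable that performs one API call and returns the parsed JSON, and it assumes the Documents of each page are listed under a `docs` key; both are illustrative assumptions, so check the Output Reference for the actual response layout and pagination mechanism.

```python
from typing import Any, Callable

# Hypothetical stand-in for one API call: page index in, parsed JSON out.
FetchPage = Callable[[int], dict[str, Any]]


def collect_records(fetch_page: FetchPage, max_queries: int) -> list[dict[str, Any]]:
    """Follow pages until moreDocsAvailable reaches 0 or the query budget runs out."""
    records: list[dict[str, Any]] = []
    page = 0
    while page < max_queries:
        response = fetch_page(page)
        page += 1  # every call counts against the monthly quota
        # Assumption: each page lists its Documents under "docs".
        for document in response.get("docs", []):
            records.extend(document.get("records", []))
        if response.get("moreDocsAvailable", 0) == 0:
            break  # no remaining Documents with leaked information
    return records
```

Capping the loop with `max_queries` keeps a single search from silently consuming the whole monthly quota.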

Permission Model

Access to the service is restricted for security reasons.
There are two modes of access to the API:

Exact Mode -

Only the full value of the searched entity (for example, a complete email address) can be used in the query.
Query -[email protected]

Exact Email Search

Partial Mode -

Partial text queries are allowed, subject to the Webhose approval process.
Query -*
Query -*
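The difference between the two modes can be illustrated with a small client-side check. This is a sketch under assumptions: partial queries are taken to use `*` as a wildcard, exact queries are taken to be complete email addresses, and `partial_approved` stands in for the outcome of the approval process. The helper names and the loose email pattern are hypothetical; the actual enforcement happens on the server.

```python
import re
from fnmatch import fnmatchcase

# Deliberately loose email check, used only for this illustration.
EMAIL_RE = re.compile(r"^[^@\s*]+@[^@\s*]+\.[^@\s*]+$")


def query_mode(query: str) -> str:
    """Return "partial" if the query contains a * wildcard, else "exact"."""
    return "partial" if "*" in query else "exact"


def is_allowed(query: str, partial_approved: bool) -> bool:
    """Exact queries must be a complete email; partial queries need approval."""
    if query_mode(query) == "exact":
        return EMAIL_RE.match(query) is not None
    return partial_approved


def matches(query: str, value: str) -> bool:
    """Case-insensitive match: equality in exact mode, wildcard match in partial mode."""
    q, v = query.lower(), value.lower()
    if query_mode(query) == "exact":
        return q == v
    return fnmatchcase(v, q)
```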

For more information about the permissions, please refer to the Domain Authorization API section.

More information on the fields is provided in the Output Reference Section.
