Data Breach Detection: uncover potential Personally Identifiable Information (PII) leaks for organizations and people using Webhose.io
As you can see in the diagram below, our main data sources are database dumps and leaked snippets from any source on the web or on darknets. We start by splitting each dump into documents of up to 100 records, then we sanitize the data to protect sensitive compromised values, for example credit card numbers, social security numbers (SSNs) and more.
PII Leaks Data Process
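The sanitization step can be sketched roughly as follows. This is a minimal illustration only, assuming a policy of masking every digit except the last four in credit card numbers and US-format SSNs; the actual patterns and masking rules used by Webhose are not documented here, and the regexes and function names below are hypothetical.

```python
import re

# Hypothetical patterns: 13-19 digit card numbers (optionally separated by
# spaces or dashes) and US SSNs written as 123-45-6789.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_digits(match: re.Match) -> str:
    """Replace every digit except the last four with '*', keeping separators."""
    text = match.group(0)
    total = sum(ch.isdigit() for ch in text)
    seen = 0
    out = []
    for ch in text:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - 4 else "*")
        else:
            out.append(ch)
    return "".join(out)


def sanitize(record: str) -> str:
    """Mask SSNs first, then card numbers, in one leaked record."""
    record = SSN_RE.sub(mask_digits, record)
    return CARD_RE.sub(mask_digits, record)


print(sanitize("cc=4580 1234 5678 9012, ssn=123-45-6789"))
# cc=**** **** **** 9012, ssn=***-**-6789
```

Keeping the last four digits is a common compromise that still lets an owner recognize their own card or SSN without exposing the full value.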
Each Data Breach document consists of the following main sections:
- root - general fields related to the document, such as UUID (the unique identifier of the document in our repository), author, referring_url, language, text and crawled
- file - the properties of the leaked physical file
- leak - the leak metadata, such as its name and the compromised fields
- records - a list of one or more records found in the document that are related to the query
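The four sections above can be read from a parsed response roughly like this. It is a sketch under stated assumptions: the root fields are assumed to sit at the top level of the JSON document, with "file", "leak" and "records" as nested keys; the keys inside "file" and "leak" (such as "compromised_fields") and all sample values are hypothetical, so check the Output Reference section for the real field names.

```python
from typing import Any, Dict

# Hypothetical sample document; only the root field names come from the docs.
SAMPLE_DOC: Dict[str, Any] = {
    "uuid": "hypothetical-uuid-0001",
    "author": None,
    "referring_url": "http://example.com/dump",
    "language": "english",
    "text": "raw leaked text",
    "crawled": "2020-01-01T00:00:00.000+02:00",
    "file": {"name": "dump.sql"},  # hypothetical file properties
    "leak": {"name": "Example leak", "compromised_fields": ["email", "password"]},
    "records": [{"email": "email@example.com"}],
}


def summarize_doc(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Pull a short summary out of one Data Breach document."""
    leak = doc.get("leak", {})
    return {
        "uuid": doc.get("uuid"),
        "leak_name": leak.get("name"),
        "compromised_fields": leak.get("compromised_fields", []),
        "record_count": len(doc.get("records", [])),
    }


print(summarize_doc(SAMPLE_DOC))
```

Using `.get` with defaults keeps the summary from failing on a document where a section is absent.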
Below is an example of a leak found by a query on an email value:
Example of a PII leaks result
As explained above, we process each leaked file as follows:
- The leaked file is split into a list of Documents
- Each Document can contain up to 100 Records
- A Record is a row in the leaked file
This means each query returns one Document with up to 100 Records, and the next page returns the next Document with up to 100 Records.
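The splitting rule can be sketched as follows. This only illustrates the documented 100-record limit; the function names are hypothetical and say nothing about how the service stores Documents internally. Because each query returns one Document, a file of N rows yields at most ceil(N / 100) pages for that file.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")
RECORDS_PER_DOC = 100  # documented upper bound on Records per Document


def split_into_documents(rows: Sequence[T], size: int = RECORDS_PER_DOC) -> List[List[T]]:
    """Split the rows of one leaked file into Documents of at most `size` Records."""
    return [list(rows[i:i + size]) for i in range(0, len(rows), size)]


def max_pages_for_file(row_count: int, size: int = RECORDS_PER_DOC) -> int:
    """One query returns one Document, so one file needs at most ceil(N / size) pages."""
    return math.ceil(row_count / size)


docs = split_into_documents(list(range(250)))
print([len(d) for d in docs])   # [100, 100, 50]
print(max_pages_for_file(250))  # 3
```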
The consumption model is similar to the Cyber API: each user is limited to a monthly quota of API queries. Please contact Sales for more information.
As can be seen below, the field moreDocsAvailable holds the number of remaining documents that include leaked information.
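A client can use moreDocsAvailable to decide when to stop paging. The sketch below assumes, hypothetically, that each response lists its documents under a "docs" key and carries a relative next-page URL in a "next" field; neither name is confirmed by this page. The `max_pages` cap is a design choice to protect the monthly quota, since every page fetched is a separate query.

```python
import json
from typing import Any, Callable, Dict, Iterator
from urllib.request import urlopen

Page = Dict[str, Any]


def http_fetch(url: str) -> Page:
    """Download one result page and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_documents(
    first_url: str,
    fetch: Callable[[str], Page] = http_fetch,
    base: str = "http://webhose.io",
    max_pages: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield documents page by page while moreDocsAvailable is positive.

    Hypothetical response layout: documents in a "docs" list and a relative
    next-page URL in a "next" field. max_pages caps the number of requests,
    because every page fetched counts against the monthly query quota.
    """
    url = first_url
    for _ in range(max_pages):
        page = fetch(url)
        yield from page.get("docs", [])
        if page.get("moreDocsAvailable", 0) <= 0 or not page.get("next"):
            return
        url = base + page["next"]
```

Passing a different `fetch` callable (for example a dictionary's `__getitem__` holding canned pages) lets the paging logic be exercised without spending any quota.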
Access to the service is restricted for security reasons.
We have two models of access to the API:
In the first model, only the full value of the searched entity can be provided in the query:
Query - http://webhose.io/piiFilter?token=XXX&format=json&q=email.value:email@example.com
Exact Email Search
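The exact query above can be built programmatically. This is a minimal sketch: `pii_query_url` is a hypothetical helper, the endpoint and parameters are taken from the example URL, and `urlencode` percent-escapes characters such as `:` and `@`, assuming the endpoint decodes standard URL encoding as HTTP servers normally do.

```python
from urllib.parse import urlencode

PII_FILTER_ENDPOINT = "http://webhose.io/piiFilter"


def pii_query_url(token: str, query: str, fmt: str = "json") -> str:
    """Build a piiFilter request URL; urlencode percent-escapes the query value."""
    params = {"token": token, "format": fmt, "q": query}
    return f"{PII_FILTER_ENDPOINT}?{urlencode(params)}"


print(pii_query_url("XXX", "email.value:email@example.com"))
# http://webhose.io/piiFilter?token=XXX&format=json&q=email.value%3Aemail%40example.com
```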
In the second model, partial-text (wildcard) queries are allowed, subject to the Webhose approval process:
Query - http://webhose.io/piiFilter?token=xxx&format=json&q=email.value:*@webhose.io
Query - http://webhose.io/piiFilter?token=xxx&format=json&q=cc.value:4580*
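To see which queries fall under the partial model, and roughly what a wildcard covers, the helpers below give a local illustration. They are hypothetical: they assume `*` matches any run of characters, which matches the examples above, but `fnmatchcase` is case-sensitive and also treats `?` and `[...]` as wildcards, so the service's actual matching may differ.

```python
from fnmatch import fnmatchcase
from typing import Tuple


def split_query(query: str) -> Tuple[str, str]:
    """Split 'field:value' into its field name and value."""
    field, _, value = query.partition(":")
    return field, value


def needs_partial_access(query: str) -> bool:
    """Treat any '*' in the value as a partial (wildcard) query."""
    return "*" in split_query(query)[1]


def local_match(query: str, field: str, value: str) -> bool:
    """Check locally whether (field, value) would satisfy the query pattern."""
    q_field, pattern = split_query(query)
    return q_field == field and fnmatchcase(value, pattern)


print(needs_partial_access("cc.value:4580*"))                                  # True
print(local_match("email.value:*@webhose.io", "email.value", "a@webhose.io"))  # True
```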
For more information about the permissions, please refer to the Domain Authorization API section.
More information on the fields is provided in the Output Reference section.