Crawling Tor v3: we now automatically discover and scan Tor v3 onion services, which are more anonymous and secure than the older v2 onions.
Crawling external file content: the Webhose data crawler downloads and scans the content of external files, even if they are compressed. The following formats are supported: zip, rar, tar, 7z, pdf, txt, xls, xlsx, doc, docx, sql.
Dark web only: our system automatically crawls content in password-protected forums and marketplaces. Captchas and passwords are no longer a barrier.
Saving structured data from news & blogs: we now supply content from news sites and blogs on the dark web, not only forums and marketplaces.
Text Categorization: using machine learning, Webhose is now trained to identify relevant posts and assign them to the following categories: Drugs, IDs & Passwords, Passports (PII), Hacking, Terror, Weapons, Pirate, Sexual, and Financial. This lets you consume relevant data by category.
New field - rating: populated only for reviews on marketplaces.
New field - files_links: list of the URLs of all files (internal & external) that we found in the current post.
New field - external_videos: list of links to video files.
New field - referring_url: the URL that referred our crawler to the current document's URL.
New field - file_type: The format of the file that contains the content crawled (zip, rar, tar, 7z, pdf, txt, xls, xlsx, doc, docx, sql).
New field - required_login: Boolean field that specifies whether the current content was found on a password-protected site.
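
As a rough illustration of how the new fields might be consumed, here is a minimal Python sketch. Only the field names (rating, files_links, external_videos, referring_url, file_type, required_login) come from the list above. The record layout, the value types (for example, a numeric rating), the sample URLs, and the helper names select_posts and collect_file_links are assumptions made for the example, not the actual Webhose response format or client API.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

# Hypothetical records: only the field names are taken from the release notes.
posts: List[Dict[str, Any]] = [
    {
        "referring_url": "http://example.onion/forum",
        "file_type": "pdf",
        "required_login": True,
        "files_links": ["http://example.onion/a.pdf"],
        "external_videos": [],
        "rating": None,  # assumed empty when the post is not a marketplace review
    },
    {
        "referring_url": "http://example.onion/market",
        "file_type": "txt",
        "required_login": False,
        "files_links": ["http://example.onion/a.pdf", "http://example.onion/b.txt"],
        "external_videos": [],
        "rating": 4.5,  # assumed numeric; the notes do not state the type
    },
]


def select_posts(
    records: Iterable[Dict[str, Any]],
    required_login: Optional[bool] = None,
    file_types: Optional[Set[str]] = None,
) -> List[Dict[str, Any]]:
    """Keep records matching the given required_login flag and file types.

    A filter left as None is not applied.
    """
    selected = []
    for record in records:
        if required_login is not None and record.get("required_login") != required_login:
            continue
        if file_types is not None and record.get("file_type") not in file_types:
            continue
        selected.append(record)
    return selected


def collect_file_links(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Flatten files_links across records, dropping duplicates but keeping order."""
    seen: Set[str] = set()
    links: List[str] = []
    for record in records:
        for link in record.get("files_links", []):
            if link not in seen:
                seen.add(link)
                links.append(link)
    return links


behind_login = select_posts(posts, required_login=True)
text_files = select_posts(posts, file_types={"txt"})
all_links = collect_file_links(posts)
```

Treating each filter as optional keeps the helper usable for a single condition (only content found behind a login, say) or for several combined, without assuming anything about how the records were fetched.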