
Where Did My Compromised Data Go? Crawling in the Deep Web

Written By

Michael Neher

Target in 2013, Equifax in 2017, Marriott in 2018, and Capital One earlier this year; the news is replete with stories about companies being hacked, compromised, or otherwise falling victim to data breaches. But what happens to these appropriated hauls of data?

While it is impossible to say what malicious actors may have done with all the data from a given hack, there are several “usual suspects.” Although hackers’ reasons for compromising a system are myriad, the most common motivation is monetary gain. Some hackers cash in by holding data hostage, extorting victims for ransom or cover-up money. But how do those who simply exfiltrate information profit from their plundered data? Actors seeking to exploit data for profit target the most valuable data, and the most valuable types today are Personally Identifiable Information (“PII”) and Protected Health Information (“PHI”). One venue in particular distinguishes itself as the most convenient forum for hackers to offer valuable, protected, and ill-gotten data for sale: the Deep Web.

Often called by its misnomer, the “Dark Web,” the Deep Web is a hidden network that is only accessible through specialized software, rather than a standard Internet connection. The Deep Web offers its sites some protection by randomizing website names, such as hXXp://hss3uruhjo2xfogfq.Xnion (characters transposed with ‘X’ for user protection) instead of hXXp://www.notevil.com. These sites are often featured in the news for hosting the sale of illegal items, ranging from drugs to Social Security numbers to stolen software. Thus, one method for locating leaked data is to crawl the Deep Web to see if that information is for sale.
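Those randomized names follow a fixed public format: Tor hidden-service addresses are 16 (older “v2”) or 56 (“v3”) base32 characters followed by “.onion”. A minimal sketch of a format check is below; the helper name is ours, and the sample address is illustrative, not a real site endorsement.

```python
import re

# Matches the public Tor hidden-service naming scheme:
# 16 base32 chars (v2), or 56 base32 chars (v3), ending in ".onion".
ONION_RE = re.compile(r"^[a-z2-7]{16}(?:[a-z2-7]{40})?\.onion$")

def looks_like_onion(hostname: str) -> bool:
    """Return True if hostname matches the v2 or v3 .onion naming scheme."""
    return bool(ONION_RE.match(hostname.lower()))

# A 16-character candidate matches; an ordinary clearnet domain does not.
looks_like_onion("expyuzz4wqqyqhjn.onion")
looks_like_onion("www.notevil.com")
```

A check like this is useful when scraping index pages (discussed below): it separates genuine hidden-service links from clearnet links before any crawling begins.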

“Crawling” is a specialized way of searching the Deep Web, as regular indexing search engines (such as Google or Bing) cannot find Deep Web pages: no links point to them, or the pages sit behind logins and captchas (task-based tests that websites use to tell computers and humans apart, such as typing the letters of a distorted image or matching photos). Effective crawling therefore combines several techniques, most of which must be performed manually to avoid automation detection.

As you might imagine, this takes far more time. Often, sites require logins just to browse their contents, which necessitates the manual creation of both temporary email addresses and PGP keys (PGP stands for “Pretty Good Privacy,” a public-key encryption program that has become the most popular standard for email encryption). Even once you can access a Deep Web site, searching it can be prohibitive. Many Deep Web sites omit search functions entirely; those that do offer search often impose time restrictions to limit the speed of queries. Given these restrictions, crawling the Deep Web involves both automated and manual techniques to find information effectively.
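The pacing aspect of this can be sketched briefly. The snippet below shows one way to keep automated fetches at a human browsing speed and to point traffic at Tor’s default local SOCKS proxy (port 9050); the delay window is our assumption, and the proxy settings would be plugged into whatever HTTP client is in use (for example, requests with SOCKS support).

```python
import random
import time

# Tor's default local SOCKS proxy; "socks5h" resolves DNS through Tor as well,
# so lookups of .onion names never leave the Tor network.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def human_pause(min_delay: float = 8.0, max_delay: float = 25.0) -> float:
    """Sleep for a randomized, human-scale interval before the next fetch.

    The 8-25 second window is an illustrative assumption, chosen to stay
    under typical per-account rate limits; returns the delay actually used.
    """
    delay = random.uniform(min_delay, max_delay)
    time.sleep(delay)
    return delay
```

Randomizing the interval (rather than sleeping a fixed amount) matters: a perfectly regular request cadence is itself a strong automation signal.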

To find information on the Deep Web, start by combing through indexes. Indexes sort sites into categories, not unlike the directory-style search engines of the early Internet: one clicks through the various categories to see which sites are available on the Deep Web. Although indexes are great for finding the new locations of Deep Web sites that have moved after government takedowns or hacks, they require a user to manually review each category to find the sought-after information. When looking for black-market PII or PHI, indexes can help locate the Deep Web’s most up-to-date marketplaces and forums.

Marketplaces are akin to eBay or Amazon, in that products are listed and sold; on the Deep Web, however, these marketplaces host illegal sales. Approximately 80% of transactions are for drugs, while the remainder comprises stolen credit cards, Social Security numbers, passports, IDs, pirated software, document dumps, and other miscellaneous items. This landscape is ever-changing due to the many government initiatives to remove such illegal sites, so addresses, names, and details are always in flux and never permanent (hence the need to identify them via indexes).

Forums are a great source of information on the Deep Web for investigative purposes. Deep Web forums function much like Reddit or other online discussion boards, except that the discussions are mostly focused on illegal topics and criminality: instructions for selling stolen data, making drugs, or even hiring hitmen. These forums often have deliberately minimal features, which keeps Deep Web search engines from exploring them too quickly. Again, most forums require a unique user account, which also allows an investigator to act as a trusted source for requests on the forums.

Next, comb the forums and marketplaces to see if anyone is selling relevant data. We search for listings that would match the breached information, and check whether anyone is asking for help with hacking a similar domain name. These searches are done mostly manually, using non-identifiable but unique information: rare last names, document names, business location names, or other details likely to be referenced in a sale. It must be noted that, given the unsavory nature of the Deep Web, no Social Security numbers, full names, or any other PII or PHI should ever be used as search terms, in order to protect the data.
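The “never search with PII” rule can be enforced mechanically before any query is typed. The sketch below screens a candidate term list with a few illustrative patterns; the regexes cover obvious cases (SSN-style digits, email addresses, long digit runs) and are an assumption of ours, not a complete PII detector.

```python
import re

# Illustrative PII screens applied to candidate search terms.
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")       # 123-45-6789 style
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")  # email addresses
DIGITS_RE = re.compile(r"\d{7,}")                      # account/phone numbers

def safe_search_terms(candidates):
    """Keep only terms free of SSN-, email-, and account-number-like patterns."""
    return [t for t in candidates
            if not (SSN_RE.search(t)
                    or EMAIL_RE.search(t)
                    or DIGITS_RE.search(t))]

# Invented examples: a rare surname and a document name pass the screen;
# the SSN-like and email-like entries are dropped before any search is run.
terms = safe_search_terms([
    "Van Arsdale",
    "123-45-6789",
    "patient@clinic.example",
    "Q3_payroll_export",
])
```

A screen like this is a safety net, not a substitute for judgment: the investigator still chooses terms that are distinctive enough to surface a sale without identifying any individual.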

Recently, Capsicum was informed of a ransomware attack at a client’s medical offices and was asked to determine whether a data breach had occurred, among other remediation services. To accomplish the data breach task, Capsicum interviewed key personnel, analyzed a backup image of the breached server, searched the relevant data-traffic and access logs, confirmed that PHI had been de-identified, and conducted a search of the Deep Web to opine on whether the data had been exfiltrated.

Capsicum learned that our client’s data had indeed been transferred from the client’s server. Because the client had a strong cyber defense in place, Social Security numbers, addresses, and phone numbers were encrypted; however, names, birthdays, and locations were exfiltrated in plain text. Although the nature of the attack indicated that the hackers were focused on the client’s payroll system, out of an abundance of caution Capsicum searched the Deep Web for evidence of patient data that could have resulted from the breach incident.

After searching through over 200 sites on the Deep Web (such as TORch, NotEvil, ParaZite, SilkRoad, and many others), and thousands of pages and forums within those sites, we found nothing indicating a breach of the client’s data: no signs of data for sale, and no discussions targeting the client or referencing the hack. No one can crawl the entire Deep Web, especially because many sites are constantly changing, and an attacker could always wait to sell the data. Nonetheless, it gives the client confidence to know that the stolen data has not been widely broadcast across illicit sites for anyone to find.
About Us

From day one, Capsicum Group, LLC has provided clients in multiple sectors across the globe with complex legal, tech, and regulatory programs. For over twenty years, we’ve perfected our team and our services, allowing us to tackle digital forensics, e-discovery, data recovery, cybersecurity, and regulatory compliance projects of any size, for clients that range from local government agencies to multi-billion-dollar corporations. No matter the level of complexity, Capsicum Group’s team of tech and legal experts is equipped with leading-edge technology and intensive strategies for your success. Learn more about our forensic recovery and cybersecurity services in California, Florida, New York, Philadelphia, and Texas by connecting with us at www.capsicumgroup.com.

If you have additional questions regarding any of our services, please feel free to contact us by phone or email.