leak_search

Differences

This shows you the differences between two versions of the page.

leak_search [2024/11/20 13:48]
kaduuwikiadmin
leak_search [2025/03/06 13:48] (current)
kaduuwikiadmin [How up to date is the data?]
Line 11: Line 11:
 Monitoring whether your organization’s name appears in Dark Web forums, Onion, I2P, and paste sites can help you detect potential insider threats, enabling you to prevent data leaks and other incidents that may cause damage to your organization. Dark Web monitoring involves actively searching and tracking the Dark Web for information about your organization, including leaked or stolen data, compromised passwords, breached credentials, intellectual property, and other sensitive data. Please note: This functionality refers to a DB search. If you are looking for a [[https://kaduu.io/live-breach-threat-intelligence/|Live Search]], please consult [[hacker_forum_search_-_surface_web|this]] article.
  
-==== How up to date is the data? ====+==== How do we find the leaks? ==== 
 + 
 +Our team of full-time analysts conducts daily monitoring of various platforms, including hacker forums on the surface web, darknet, and Telegram channels. Each analyst is assigned a clearly defined area of focus, ensuring comprehensive coverage across different sources. 
 + 
 +When we discover a data breach being offered for free, we promptly download and thoroughly investigate it. For breaches listed for sale, we acquire sample data whenever possible to notify our clients if their sensitive information might be at risk of exposure. 
 + 
 +All collected information that could hold value for our clients is meticulously indexed. This includes a variety of formats, such as database files (e.g., SQL dumps), office documents (e.g., Word, Excel, PDF), or text-based leak data (e.g., CSV or TXT files). Each breach undergoes a strict internal verification process by our team to confirm its authenticity and relevance. 
 + 
 +Once verified, we enrich the data with metadata to provide essential context. Metadata includes details such as the source of the leak, the date of the breach, the type of data involved, and other relevant information. After this process, the verified and indexed data is uploaded to our system, making it searchable and accessible to all clients for further investigation. 
 + 
 +**Key Explanations for Potentially Unclear Terms:** 
 +  * Surface web: The publicly accessible portion of the internet that standard search engines like Google can index. 
 +  * Darknet: A part of the internet that requires specific software (e.g., Tor) to access. It's often associated with anonymity and illegal activities but is also used for privacy purposes. 
 +  * Telegram channels: Groups or channels on the Telegram messaging platform, commonly used for communication and sharing information, including by malicious actors. 
 +  * SQL dumps: Copies of entire databases, often leaked during breaches. 
 +  * CSV/TXT files: Common formats for storing text data, typically containing structured information like lists, logs, or tables. 
 +  * Metadata: Additional information about a file or data set that helps to describe, organize, and manage the data more effectively (e.g., time, location, type). 
 + 
 +==== How up to date and accurate is the data? ==== 
 + 
 +Our credential database is updated daily by a dedicated team of analysts who actively monitor and extract data from hacker forums, Telegram channels, and various darknet sources. The credentials available in our database search are those that have already been publicly leaked—often because hackers failed to sell them and instead chose to distribute them for free. 
 + 
 +If you are searching for newer, actively traded credentials, you should use our live search or the hacker forum database search on the deep web. These tools provide real-time insights into fresh leaks before they become widely available. 
 + 
 +**Data Accuracy and Duplicate Entries** 
 + 
 +Credential leaks often get repackaged and redistributed in collections and archives, leading to duplicate entries. While our system works to filter out redundancies, users may still encounter repeated data across different breaches. 
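The effect of repackaged leaks can be illustrated with a minimal Python sketch. It is not Kaduu's actual filtering logic: the `Credential` record shape, the leak names, and the choice to compare email addresses case-insensitively are all assumptions made for this example.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Credential(NamedTuple):
    """Hypothetical record shape; the real schema is not documented here."""
    email: str
    password: str
    leak_id: str


def deduplicate(records: Iterable[Credential]) -> Iterator[Credential]:
    """Yield only the first occurrence of each (email, password) pair."""
    seen: Set[Tuple[str, str]] = set()
    for record in records:
        # Assumption: emails are compared case-insensitively, passwords exactly.
        key = (record.email.strip().lower(), record.password)
        if key not in seen:
            seen.add(key)
            yield record


records = [
    Credential("alice@example.com", "Summer2024", "leak-a"),
    # The same pair again, repackaged in a later collection.
    Credential("Alice@Example.com", "Summer2024", "collection-1"),
    Credential("alice@example.com", "Winter2023", "leak-b"),
]
unique = list(deduplicate(records))
```

A filter like this only catches exact repeats. Records that differ in formatting or in additional fields would still show up as separate results.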
 + 
 +Furthermore, due to the age of many datasets, a significant portion of credentials—often exceeding 90%—may no longer be valid. This occurs because: 
 + 
 +  * Users change their passwords after a breach is exposed. 
 +  * Accounts may be deleted or suspended by the service provider. 
 +  * Credentials become obsolete as new security measures are implemented. 
 + 
 +The older the dataset, the higher the probability that the credentials are no longer functional. Since these credentials are publicly available, they are accessible to anyone, diminishing their immediate value to attackers. 
 + 
 +**Why Monitoring is More Important than Retrospective Analysis** 
 + 
 +Rather than relying solely on static historical reports, continuous monitoring of leaked credentials is essential. A one-time report covering an extended period is not as effective as ongoing surveillance: even if 99% of leaked credentials are outdated, the remaining 1% of active credentials still pose a security risk. 
 + 
 +Leaked credentials provide critical intelligence beyond just direct access, such as: 
 + 
 +  * Employee usage of third-party services with company accounts (e.g., logging into Netflix or other non-business platforms using corporate credentials). 
 +  * Password patterns that reveal predictable behavior. For example, if a user previously used Summer2024, there's a chance their next password could be Summer2025. 
 +  * Cross-service password reuse, which allows attackers to map out vulnerabilities across multiple platforms. 
 +  * Exposure assessment, measuring how frequently an employee's email appears in different leaks, making them more susceptible to phishing and targeted attacks.
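Two of the points above, exposure assessment and password reuse, can be sketched in a few lines of Python. The input format (tuples of email, password, and leak name) is an assumption for this example and does not reflect the platform's actual export format.

```python
from collections import defaultdict

# Hypothetical search results: (email, password, leak name).
results = [
    ("bob@example.com", "Summer2024", "shop-leak"),
    ("bob@example.com", "Summer2024", "forum-leak"),
    ("bob@example.com", "Tr0ub4dor", "stream-leak"),
    ("carol@example.com", "x9!kQ", "forum-leak"),
]

leaks_per_email = defaultdict(set)
leaks_per_credential = defaultdict(set)
for email, password, leak in results:
    leaks_per_email[email].add(leak)
    leaks_per_credential[(email, password)].add(leak)

# Exposure: number of distinct leaks each address appears in.
exposure = {email: len(leaks) for email, leaks in leaks_per_email.items()}

# Reuse: the same address and password seen in more than one leak.
reused = {
    credential: sorted(leaks)
    for credential, leaks in leaks_per_credential.items()
    if len(leaks) > 1
}
```

Here `exposure` ranks addresses by how often they appear, and `reused` lists the credentials worth checking first because they occur in several sources.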
  
-The database is updated daily from our analysts. We use different [[how_do_we_find_the_data_in_kaduu|discovery methods]] (manual and automated).  
  
 ===== What is a leak? =====
Line 28: Line 73:
  
  
-===== What details are provided within the leaks? =====+===== Leak Details and Downloads =====
  
 You will find basic metadata such as the date of discovery and publication (1). This is the date when the leak was discovered by our team. However, the data may have been stolen earlier. The leak may also include a website reference (2) if the leak originated from a hacked website.
Line 41: Line 86:
  
 {{::leak4.png?900|}}
 +
 +If you want to investigate the details of a leak but can't download the file because of size restrictions, you can use the leak ID to investigate its content. Click on the leak to see the leak ID:
 +
 +{{::leakid.png?800|}}
 +
 +Then you can use that ID to search for the content you are interested in (sample query: "jpmorgan.com AND leakId:423a5128-a40b-3050-88be-79b5e07ffaa6"):
 +
 +{{::leakid2.png?900|}}
 +
 +
 +If you want to see all the data in a leak, you can also just query the ID itself like "leakId:423a5128-a40b-3050-88be-79b5e07ffaa6" and export all the data.
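If you build such queries often, a small helper can assemble them. The following Python sketch only produces the query strings shown above. The function name `leak_query` is hypothetical, and the UUID check assumes that all leak IDs have the same shape as the sample ID, which this page does not confirm.

```python
import uuid
from typing import Optional


def leak_query(leak_id: str, term: Optional[str] = None) -> str:
    """Build a search query limited to one leak, optionally with a search term."""
    # Assumption: leak IDs are UUID-shaped, like the sample on this page.
    # uuid.UUID raises ValueError for a malformed ID.
    uuid.UUID(leak_id)
    clause = f"leakId:{leak_id}"
    return f"{term} AND {clause}" if term else clause


sample_id = "423a5128-a40b-3050-88be-79b5e07ffaa6"
scoped = leak_query(sample_id, "jpmorgan.com")
everything = leak_query(sample_id)
```

The first call reproduces the sample query for a single domain, the second the query that returns all data in the leak.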
  
  
Line 47: Line 103:
 Leak results often contain "Tags" assigned for the leak, which briefly show what information is contained within the leak. The system supports the following tags at the moment:
  
-  Address: "Includes address details from individuals or organizations." +  Address: "Includes address details from individuals or organizations." 
-  Company: "Contains information specifically related to business or organizational addresses." +  Company: "Contains information specifically related to business or organizational addresses." 
-  Credit-Card: "Encompasses credit card data. Note: This applies to the overall leak, and specific search results may not always include credit card information." +  Credit-Card: "Encompasses credit card data. Note: This applies to the overall leak, and specific search results may not always include credit card information." 
-  CSV: "The data in the leak is formatted as a CSV (Comma Separated Values) file." +  CSV: "The data in the leak is formatted as a CSV (Comma Separated Values) file." 
-  DOB: "Contains date of birth, contributing to personal information details." +  DOB: "Contains date of birth, contributing to personal information details." 
-  Email: "Includes one or more email addresses." +  Email: "Includes one or more email addresses." 
-  Hash: "User passwords within the leak are encrypted or hashed." +  Hash: "User passwords within the leak are encrypted or hashed." 
-  Identity: "Involves information that can be used to identify a person or entity, such as ID numbers or unique identifiers." +  Identity: "Involves information that can be used to identify a person or entity, such as ID numbers or unique identifiers." 
-  IP: "Contains IP addresses within the leaked data." +  IP: "Contains IP addresses within the leaked data." 
-  JSON: "The leak is formatted in JSON (JavaScript Object Notation)." +  JSON: "The leak is formatted in JSON (JavaScript Object Notation)." 
-  Log: "Originates from stealer logs, typically containing a variety of extracted user data." +  Log: "Originates from stealer logs, typically containing a variety of extracted user data." 
-  Mix: "A combo list combining various types of data, often usernames and passwords." +  Mix: "A combo list combining various types of data, often usernames and passwords." 
-  Name: "Includes the names of individuals." +  Name: "Includes the names of individuals." 
-  Paper: "Contains scanned images of physical documents or papers." +  Paper: "Contains scanned images of physical documents or papers." 
-  Password: "Features clear text passwords without encryption." +  Password: "Features clear text passwords without encryption." 
-  Phone: "Contains one or more phone numbers." +  Phone: "Contains one or more phone numbers." 
-  SQL: "Data is in the form of an SQL (Structured Query Language) dump." +  SQL: "Data is in the form of an SQL (Structured Query Language) dump." 
-  URL: "Includes web addresses or URLs." +  URL: "Includes web addresses or URLs." 
-  Username: "Contains usernames that may be associated with various accounts." +  Username: "Contains usernames that may be associated with various accounts." 
-  Business: "The source of the leak is a business-related app, organization, or service." +  Business: "The source of the leak is a business-related app, organization, or service." 
-  Private: "The source of the leak is from private use apps, organizations, or services." +  Private: "The source of the leak is from private use apps, organizations, or services." 
-  Account: "Includes account information, potentially across various platforms." +  Account: "Includes account information, potentially across various platforms." 
-  Adult: "Associated with adult content or services." +  Adult: "Associated with adult content or services." 
-  PII: "Contains Personally Identifiable Information, which can be used to identify an individual." +  PII: "Contains Personally Identifiable Information, which can be used to identify an individual." 
-  Politics: "The source or content of the leak is associated with political entities or activities."+  Politics: "The source or content of the leak is associated with political entities or activities."
  
  
Line 111: Line 167:
  
 So what is the best search strategy? The answer is: it doesn't exist. Every customer has different domain names or brands. If a customer uses a very generic word or a very short word, the number of search results will be enormous. In such a case, you have to approach it slowly with targeted manual queries. Therefore, each customer should first be individually examined manually by an analyst before creating automated alerts.
- 
- 
  
  
Line 186: Line 240:
 ==== Reasons for Duplicate Leak Results ====
  
-=== Repacking of Data ===+=== Repacking of Data (Combolists) ===
  
-Many hackers repurpose existing leaks by combining them into new data archives, often referred to as combolists. These are collections of login credentials repackaged for distribution.+Many hackers repurpose existing leaks by combining them into new data archives, often referred to as combolists. These are collections of login credentials repackaged for distribution. In the past, the best-known combolists had names like "collection I" or "collection II". In these cases, a cyber security analyst collected existing leaks and saved them in one large archive called "collection", which is a so-called combolist.
  
 To maintain database integrity, the Kaduu team ensures that combolists with a similarity index of 100% are excluded. A similarity index of 100% means that all the data in the leak already exists in Kaduu's database.
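
One way to picture the similarity index is as the share of a leak's records that are already known. The Python sketch below rests on that assumption: the page only defines the 100% case, so the formula for partial overlap, the function name, and the `user:password` record format are illustrative, not Kaduu's implementation.

```python
from typing import AbstractSet


def similarity_index(leak_records: AbstractSet[str], known_records: AbstractSet[str]) -> float:
    """Return the percentage of the leak's records that are already known."""
    if not leak_records:
        # Convention for this sketch: an empty leak has no overlap.
        return 0.0
    overlap = len(leak_records & known_records)
    return 100.0 * overlap / len(leak_records)


known = {"a@example.com:pw1", "b@example.com:pw2", "c@example.com:pw3"}
repackaged = {"a@example.com:pw1", "b@example.com:pw2"}
partly_new = {"a@example.com:pw1", "d@example.com:pw4"}

full = similarity_index(repackaged, known)
partial = similarity_index(partly_new, known)
```

Under this definition, `repackaged` scores 100% and would be excluded, while `partly_new` still contains unseen records and would be kept.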
Line 209: Line 263:
 This discrepancy is a technical limitation that prevents automated detection of duplicates.
  
-===== Planned Improvements by Kaduu =====+==== Planned Improvements by Kaduu ====
  
 To address these challenges, Kaduu is enhancing its capabilities through the upcoming platform darknetsearch.com. The following measures are planned:
leak_search.1732106884.txt.gz · Last modified: 2024/11/20 13:48 by kaduuwikiadmin