Differences

This shows you the differences between two versions of the page.

--- leak_search [2024/04/24 17:11]
kaduuwikiadmin [API Skript to extract accounts from leaks]
+++ leak_search [2025/05/21 09:38] (current)
kaduuwikiadmin
@@ Line 11: / Line 11: @@
 Monitoring whether your organization’s name appears in Dark Web forums, Onion-, I2P and paste sites can help you detect potential insider threats, enabling you to prevent data leaks and other incidents that may cause damage to your organization. Dark Web monitoring involves actively searching and tracking the Dark Web for information about your organization, including leaked or stolen data, compromised passwords, breached credentials, intellectual property, and other sensitive data. Please note: This functionality refers to a DB search. If you look for a [[https://kaduu.io/live-breach-threat-intelligence/|Live Search]], please consult [[hacker_forum_search_-_surface_web|this]] article.
-==== How up to date is the data? ====
+==== How do we find the leaks? ====
+Our team of full-time analysts conducts daily monitoring of various platforms, including hacker forums on the surface web, darknet, and Telegram channels. Each analyst is assigned a clearly defined area of focus, ensuring comprehensive coverage across different sources.
+When we discover a data breach being offered for free, we promptly download and thoroughly investigate it. For breaches listed for sale, we acquire sample data whenever possible to notify our clients if their sensitive information might be at risk of exposure.
+All collected information that could hold value for our clients is meticulously indexed. This includes a variety of formats, such as database files (e.g., SQL dumps), physical documents (e.g., Word, Excel, PDF), or text-based leak data (e.g., CSV or TXT files). Each breach undergoes a strict internal verification process by our team to confirm its authenticity and relevance.
+Once verified, we enrich the data with metadata to provide essential context. Metadata includes details such as the source of the leak, the date of the breach, the type of data involved, and other relevant information. After this process, the verified and indexed data is uploaded to our system, making it searchable and accessible to all clients for further investigation.
+**Key Explanations for Potentially Unclear Terms:**
+  * Surface web: The publicly accessible portion of the internet that standard search engines like Google can index.
+  * Darknet: A part of the internet that requires specific software (e.g., Tor) to access. It's often associated with anonymity and illegal activities but is also used for privacy purposes.
+  * Telegram channels: Groups or channels on the Telegram messaging platform, commonly used for communication and sharing information, including by malicious actors.
+  * SQL dumps: Copies of entire databases, often leaked during breaches.
+  * CSV/TXT files: Common formats for storing text data, typically containing structured information like lists, logs, or tables.
+  * Metadata: Additional information about a file or data set that helps to describe, organize, and manage the data more effectively (e.g., time, location, type).
+==== How many forums do we cover? ====
+The Ecosystem of Cybercriminal Forums and Channels
+The dark web and deep web contain a complex ecosystem of websites where various types of stolen data are exchanged. These platforms include:
+  * Hacker forums
+  * Credit card shops
+  * Stealer log markets
+  * Document forgery hubs
+  * Bank credential resale forums
+  * Telegram channels and groups
+These platforms vary in accessibility and intent. Some are public, but most require registration or even invitation. On these platforms, actors either sell or give away data, depending on its freshness, quality, and strategic value.
+**Why Some Data is Free and Others Are Sold**
+  * Free leaks: Often older data, reused credentials, public breaches, or data given away to gain reputation.
+  * For-sale data: Usually fresh stealer logs, newly acquired credit card dumps, banking credentials, PII, checks, or synthetic identities.
+**Data Types Monitored**
+Kaduu focuses on the following categories:
+  * Leaked account credentials (email-password combos)
+  * Stealer logs (logins, browser sessions, cookies)
+  * Bank logins (online banking access)
+  * Credit card data (dumps, fullz)
+  * Checks and cash-out materials (US, EU, UK)
+  * Fake or stolen documents (passports, IDs, utility bills)
+  * Personally Identifiable Information (PII) (name, SSN, address, DOB)
+**Our Coverage as of February 2025**
+Kaduu monitors a broad range of sources across the darknet and deep web. We distinguish between automated crawling and manual investigations by our analyst team:
+  * For well-structured sites such as forums where credit card data is traded, or paste sites, we use automated scrapers that visit these platforms at predefined intervals. This process is fully automated, and the extracted data is stored directly in our database.
+  * Our analyst team manually visits a curated list of forums and Telegram channels on a daily basis to identify potential data leaks. After thorough inspection, any relevant findings are manually labeled and uploaded to our system for further analysis.
+The following statistics provide insight into our infrastructure:
+**1. Forums Specialized in Credit Cards, Accounts, and Checks**
+  * Total monitored: 154
+  * Require authentication: 151
+  * Tor-based (dark web): 64
+  * Clearnet (deep web): 90
+These forums are often highly specialized and structured. Our tools focus on extracting listings of items for sale such as credit card batches or fullz packages.
+**2. Hacker Forums**
+  * Total monitored: 303
+  * Visited manually daily: 41
+  * Crawled daily by tools: 23
+  * Occasionally visited: 239
+Manual visits target forums with irregular structures or where members share free leaks. This enables human analysts to filter, extract, and describe valuable data that might otherwise be missed.
+**3. Telegram Channels**
+  * Total monitored: 538
+  * Parsed daily by tools: 534
+  * Manually checked daily: 4
+Telegram has become a major hub for distributing stealer logs, combo lists, and free leaks. Parsing tools extract relevant messages and attachments. Manual visits focus on groups with obfuscated or irregular data drops.
+**4. Paste Sites**
+  * Total monitored: 34
+  * Parsed daily by tools: 34
+  * Manually checked daily: 0
+Paste sites are used to exchange information anonymously. We scrape them daily and save the data in our database.
+**Manual vs. Automated Monitoring**
+  * Manual Review: Crucial for detecting free leaks and irregular formats, and wherever human interpretation is needed. Analysts download and inspect content, match it against existing data, and classify it for clients.
+  * Automated Scraping: Ideal for structured data listings, especially in well-organized shops. These tools collect sale offers with metadata (e.g., date, price, type of data) and push them into the database for client search and alerting.
+==== How up to date and accurate is the data? ====
+Our credential database is updated daily by a dedicated team of analysts who actively monitor and extract data from hacker forums, Telegram channels, and various darknet sources. The credentials available in our database search are those that have already been publicly leaked—often because hackers failed to sell them and instead chose to distribute them for free.
+If you are searching for newer, actively traded credentials, you should use our live search or the hacker forum database search on the deep web. These tools provide real-time insights into fresh leaks before they become widely available.
+**Data Accuracy and Duplicate Entries**
+Credential leaks often get repackaged and redistributed in collections and archives, leading to duplicate entries. While our system works to filter out redundancies, users may still encounter repeated data across different breaches.
+Furthermore, due to the age of many datasets, a significant portion of credentials—often exceeding 90%—may no longer be valid. This occurs because:
+  * Users change their passwords after a breach is exposed.
+  * Accounts may be deleted or suspended by the service provider.
+  * Credentials become obsolete as new security measures are implemented.
+The older the dataset, the higher the probability that the credentials are no longer functional. Since these credentials are publicly available, they are accessible to anyone, diminishing their immediate value to attackers.
+**Why Monitoring is More Important than Retrospective Analysis**
+Rather than relying solely on static historical reports, continuous monitoring of leaked credentials is essential. A one-time report over an extended period is not as effective as ongoing surveillance because:
+  * Even if 99% of leaked credentials are outdated, the remaining 1% of active credentials still poses a security risk.
+  * Leaked credentials provide critical intelligence beyond just direct access, such as:
+    * Employee usage of third-party services with company accounts (e.g., logging into Netflix or other non-business platforms using corporate credentials).
+    * Password patterns that reveal predictable behavior. For example, if a user previously used Summer2024, there's a chance their next password could be Summer2025.
+    * Cross-service password reuse, which allows attackers to map out vulnerabilities across multiple platforms.
+    * Exposure assessment, measuring how frequently an employee's email appears in different leaks, making them more susceptible to phishing and targeted attacks.
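The exposure assessment mentioned above can be approximated by counting the distinct leaks in which each address appears. This is a minimal sketch, not Kaduu's implementation: the function name `exposure_by_email` and the `(email, leak_id)` input shape are assumptions made for illustration, for example rows taken from an exported search result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def exposure_by_email(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count the distinct leaks in which each email address appears.

    `records` is a hypothetical iterable of (email, leak_id) pairs.
    """
    leaks_per_email: Dict[str, Set[str]] = defaultdict(set)
    for email, leak_id in records:
        # Normalize case and whitespace so "Alice@..." and "alice@..." match.
        leaks_per_email[email.strip().lower()].add(leak_id)
    # A set per address means repeated rows from the same leak count once.
    return {email: len(leaks) for email, leaks in leaks_per_email.items()}
```

Addresses with the highest counts are reasonable first candidates for a password reset or phishing-awareness follow-up.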
-The database is updated daily from our analysts. We use different [[how_do_we_find_the_data_in_kaduu|discovery methods]] (manual and automated).
 ===== What is a leak? =====
@@ Line 28: / Line 155: @@
-===== What details are provided within the leaks? =====
+===== Leak Details and Downloads =====
 You will find basic metadata such as the date of discovery and publication (1). This is the date when the leak was discovered by our team. However, the data may have been stolen earlier. The leak may also include a website reference (2) if the leak originated from a hacked website.
@@ Line 41: / Line 168: @@
 {{::leak4.png?900|}}
+If you want to investigate the details of a leak but can't download the file because of the size restrictions, you can use the leak ID to query its content instead. Click on the leak to see the leak ID:
+{{::leakid.png?800|}}
+Then you can use that ID to search for the content you are interested in (sample query "jpmorgan.com AND leakId:423a5128-a40b-3050-88be-79b5e07ffaa6"):
+{{::leakid2.png?900|}}
+If you want to see all the data in a leak, you can also just query the ID itself like "leakId:423a5128-a40b-3050-88be-79b5e07ffaa6" and export all the data.
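Both query shapes described above (a search term combined with a leak ID, or the leak ID alone) can be assembled with a small helper. This is a minimal sketch: `build_leak_query` is a hypothetical name, not part of Kaduu, and the UUID check is an assumption based only on the shape of the sample ID.

```python
import uuid
from typing import Optional


def build_leak_query(leak_id: str, term: Optional[str] = None) -> str:
    """Return a search string restricted to a single leak (hypothetical helper)."""
    # Assumption: leak IDs are formatted like UUIDs, as in the sample query.
    # uuid.UUID raises ValueError on malformed input, catching copy/paste errors.
    canonical_id = str(uuid.UUID(leak_id.strip()))
    if term and term.strip():
        return f"{term.strip()} AND leakId:{canonical_id}"
    return f"leakId:{canonical_id}"
```

Calling it with `"jpmorgan.com"` as the term reproduces the sample query; omitting the term yields the query that lists all data in the leak.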
@@ Line 47: / Line 185: @@
 Leak results often contain "Tags" assigned for the leak, which briefly show what information is contained within the leak. The system supports the following tags at the moment:
-  * account - contains account data (login and password pair)
+  - Address: "Includes address details from individuals or organizations."
-  * address - physical address information
+  - Company: "Contains information specifically related to business or organizational addresses."
-  * company - company name
+  - Credit-Card: "Encompasses credit card data. Note: This applies to the overall leak, and specific search results may not always include credit card information."
-  * credit-card - credit card details: a number and an expiry date
+  - CSV: "The data in the leak is formatted as a CSV (Comma Separated Values) file."
-  * csv - the leak contains files with comma-separated values
+  - DOB: "Contains date of birth, contributing to personal information details."
-  * dob - date of birthday
+  - Email: "Includes one or more email addresses."
-  * email - the leak contains email addresses
+  - Hash: "User passwords within the leak are encrypted or hashed."
-  * hash - hashed passwords
+  - Identity: "Involves information that can be used to identify a person or entity, such as ID numbers or unique identifiers."
-  * identity - identity information - passport numbers, SSNs, etc
+  - IP: "Contains IP addresses within the leaked data."
-  * ip - IP addresses
+  - JSON: "The leak is formatted in JSON (JavaScript Object Notation)."
-  * json - JSON file
+  - Log: "Originates from stealer logs, typically containing a variety of extracted user data."
-  * log - application log file
+  - Mix: "A combo list combining various types of data, often usernames and passwords."
-  * mix - mixed data format
+  - Name: "Includes the names of individuals."
-  * name - person's first and last names
+  - Paper: "Contains scanned images of physical documents or papers."
-  * paper - the leak contains company's internal papers and documents - PDFs, Excel sheets, etc
+  - Password: "Features clear text passwords without encryption."
-  * password - clear text password data
+  - Phone: "Contains one or more phone numbers."
-  * phone - phone number
+  - SQL: "Data is in the form of an SQL (Structured Query Language) dump."
-  * sql - the leak contains SQL dump files
+  - URL: "Includes web addresses or URLs."
-  * url - links to websites
+  - Username: "Contains usernames that may be associated with various accounts."
-  * username - the leak contains user names or user aliases, which are different from email addresses
+  - Business: "The source of the leak is a business-related app, organization, or service."
+  - Private: "The source of the leak is from private use apps, organizations, or services."
+  - Account: "Includes account information, potentially across various platforms."
+  - Adult: "Associated with adult content or services."
+  - PII: "Contains Personally Identifiable Information, which can be used to identify an individual."
+  - Politics: "The source or content of the leak is associated with political entities or activities."
@@ Line 106: / Line 249: @@
 So what is the best search strategy? The answer is: it doesn't exist. Every customer has different domain names or brands. If a customer uses a very generic word or a very short word, the number of search results will be enormous. In such a case, you have to approach it slowly with targeted manual queries. Therefore, each customer should first be individually examined manually by an analyst before creating automated alerts.
@@ Line 144: / Line 285: @@
 ===== Leak Dates =====
-Every leak or entry in kaduu has different dates. Example:
+=== Definition ===
+Each leak or entry in Kaduu has multiple associated dates, which help to understand its timeline and relevance. Here’s an example:
+Leak XXX; Publish Date: 2021-08-18; Discover Date: 2021-10-20; Creation Time: 2022-02-04 10:45:30
+  * Publish Date: This is the estimated date when the leak occurred or possibly first appeared on the darknet. It signifies the initial exposure of the information, often marking the beginning of its journey through the dark web.
+  * Discover Date: This is when our CTI solution first identified the leak. At this point, our analysts or automated tools detected the breach during routine scans of dark web marketplaces, forums, and private channels.
+  * Creation Time: This is when the leak was officially indexed in our database. Once a breach is verified, tagged, and cataloged by our system, it becomes accessible for further analysis.
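The gaps between the three dates can be computed directly from the example values. This is a minimal sketch: `leak_timeline_gaps` and the dictionary keys are hypothetical names chosen for illustration, and only the date values come from the example.

```python
from datetime import date, datetime
from typing import Dict


def leak_timeline_gaps(publish: date, discover: date, created: datetime) -> Dict[str, int]:
    """Return the delay in whole days between the stages of a leak's timeline."""
    return {
        # Time the data circulated before it was detected.
        "publish_to_discover_days": (discover - publish).days,
        # Time between detection and indexing in the database.
        "discover_to_index_days": (created.date() - discover).days,
    }


# Values from the example entry "Leak XXX".
gaps = leak_timeline_gaps(
    publish=date(2021, 8, 18),
    discover=date(2021, 10, 20),
    created=datetime(2022, 2, 4, 10, 45, 30),
)
```

A large first gap suggests the data circulated for a while before detection; a large second gap reflects the verification and indexing work described in this section.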
+=== Are Old Leaks Still Valid? ===
+In assessing the relevance of a leak, it’s essential to distinguish between paid and free leaks. Hackers typically attempt to monetize data by offering it for sale on specialized forums. If the data doesn’t sell, the price often decreases over time, and the data eventually becomes available for free a few months post-breach. As a result, there can be a time gap of up to six months between when a leak is first offered for sale and when it appears in the free leaks section of Kaduu.
+For real-time alerts on data that is still actively being sold, Kaduu provides a live hacker forum query feature. Learn more here.
+=== Do Leaks Retain Their Exploitability Over Time? ===
+Surprisingly, even older leaks can retain a 5% success rate for credential validity. This might seem small, but with thousands or even millions of credentials in each leak, this percentage can result in a significant number of active accounts. Given the sheer volume of credentials released daily, it’s impractical for attackers to exploit them all in real time. Hackers usually focus on data that is easy to monetize, such as access to streaming platforms or shopping sites, rather than corporate applications like SAP or webmail.
+Another crucial point is password patterns. Many users follow predictable patterns, such as "PaSSw0rd2024June," which means an expired password may still be relevant, as similar credentials can be anticipated for future periods (e.g., July). Additionally, breached credentials may highlight corporate policy violations—like a company email being used on personal websites (e.g., dating sites), signaling a need for awareness training.
+=== Reasons for Discrepancies Between Publish and Creation Dates ===
+Different dates associated with a leak can often reflect varying stages of its lifecycle, from initial breach to public disclosure. Here’s why discrepancies may occur:
+  * Short Delays (1-2 Days): The discovery and indexing of leaks involve a series of manual steps. Our team of analysts actively monitors hundreds of dark web channels and groups daily, manually downloading and inspecting leak files. Some links may be temporarily unavailable, forums may experience downtime, or files may need verification. After inspection, the files are tagged, labeled, and added sequentially to our database, creating a standard delay of around 10-12 hours due to the manual process. While this time gap exists, leaks published for free tend to remain available for months, minimizing immediate exploitation risks. For clients requiring real-time monitoring, daily or live monitoring of for-sale leaks is recommended over free leak tracking.
+  * Extended Delays (Weeks or Months): Sometimes, the publish date reflects the actual date of the breach, while the data itself may not be immediately available. Hackers often exploit the data for their own purposes before releasing it publicly. For example, a hacker might use the data for fraudulent transactions or identity theft before attempting to monetize it. In cases like these, the press may report a security breach early on, but the data might not appear on dark web forums until a year later. To help clients understand the full context, we always include the publish date to reflect the initial breach timing.
+===== Leak Result Duplicates =====
+In Kaduu, users might notice that some leaks are reported multiple times. This duplication can occur due to various factors, including the way data is handled by hackers and the inherent challenges of processing leaked information. Below is an explanation of why such duplicates appear and how they are managed:
+==== Reasons for Duplicate Leak Results ====
+=== Repacking of Data (Combolists) ===
+Many hackers repurpose existing leaks by combining them into new data archives, often referred to as combolists. These are collections of login credentials repackaged for distribution. In the past, the most famous combolists had names like "collection I" or "collection II". In those cases, a cyber security analyst collected existing leaks and saved them in one large archive called a "collection", which was a so-called combolist.
+To maintain database integrity, the Kaduu team ensures that combolists with a similarity index of 100% are excluded. A similarity index of 100% means that all the data in the leak already exists in Kaduu's database.
+However, combolists with a lower similarity index, such as 90%, are retained. This is because such leaks may still contain new and relevant data. Unfortunately, it is not possible to filter out the 90% overlaps programmatically.
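The similarity index can be read as the share of a leak's records that are already known. This is a minimal sketch under that assumption; how Kaduu actually computes the index is not described here, and `similarity_index` and `should_index` are hypothetical names. Records are compared as exact strings, so differently formatted copies of the same credential do not match.

```python
from typing import Iterable, Set


def similarity_index(leak_records: Iterable[str], known_records: Set[str]) -> float:
    """Percentage of distinct leak records that already exist in `known_records`."""
    leak_set = set(leak_records)
    if not leak_set:
        # An empty leak contributes nothing new, so treat it as fully known.
        return 100.0
    return 100.0 * len(leak_set & known_records) / len(leak_set)


def should_index(leak_records: Iterable[str], known_records: Set[str]) -> bool:
    """Keep a combolist unless every record in it is already known."""
    return similarity_index(leak_records, known_records) < 100.0
```

With this rule, a combolist at 90% similarity is kept because the remaining 10% may be new, which is also why its overlapping records show up as duplicates in search results.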
+=== Reuse of Credentials Across Platforms ===
+Sometimes, the same credentials appear in multiple leaks because users often reuse their login information across different platforms or websites.
+For example, the same username and password might be discovered on various servers, indicating poor security practices.
+This is considered significant because it provides insights into a user's vulnerability across platforms.
+=== Inconsistent Formatting of Data ===
+Leaks often vary in their formatting and syntax, which can make it challenging to identify duplicate information. For example, one leak might present credentials as https://website.com:username:password, while another might use the format username:pwd:website.com.
+Due to the differing order of elements, these entries are treated as distinct in Kaduu's system, even though they represent the same credentials.
+This discrepancy is a technical limitation that prevents automated detection of duplicates.
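To illustrate the problem, the sketch below maps exactly the two layouts named above to one canonical tuple. It is not how Kaduu processes leaks: `normalize_credential` is a hypothetical name, and the parsing assumes the URL layout has no port and that the username contains no colon. Real leaks use many more layouts, and guessing the field order in general is unreliable, which is the limitation described here.

```python
from typing import Tuple


def normalize_credential(line: str) -> Tuple[str, str, str]:
    """Return (site, username, password) for two known layouts only.

    Supported (assumed) layouts:
      https://website.com:username:password
      username:password:website.com
    Raises ValueError if the line does not have three fields.
    """
    line = line.strip()
    for scheme in ("https://", "http://"):
        if line.lower().startswith(scheme):
            # URL-first layout: strip the scheme, then split into three fields.
            site, user, password = line[len(scheme):].split(":", 2)
            return site.lower(), user, password
    # Site-last layout: the site is everything after the final colon.
    rest, site = line.rsplit(":", 1)
    user, password = rest.split(":", 1)
    return site.lower(), user, password
```

Two entries that normalize to the same tuple could then be compared directly, but only when the layout is known in advance.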
+==== Planned Improvements by Kaduu ====
+To address these challenges, Kaduu is enhancing its capabilities through the upcoming platform darknetsearch.com. The following measures are planned:
-Publish Date	2021-08-18
+=== Duplicate Filtering in Alerts ===
-Discover Date	2021-10-20
-Creation Time	2022-02-04 10:45:30
+The new platform will include features to filter duplicate entries in alerts. This will help clients manage the redundancy in leak reports more effectively.
-The publish date is when we think the leak happened. The Discover date is when we discovered the leak. The creation date is when the leak was indexed in our DB. Thats why you should only filter by publish date to associate the leak with the correct date!
+=== Optional Duplicate Visibility ===
+Since some clients find value in tracking where a user’s credentials appear (e.g., in which forums or combolists), the duplicate filtering feature will not be activated by default.
+This allows users to choose whether they want to view all instances of a user's credentials or only unique occurrences.
 ===== Asterisks in leak details =====
