| Source | Est. Size | Data Types | Why Prohibited | Status |
|---|---|---|---|---|
| Facebook 2021 Leak | 533M records (~32M US) | Name, phone, email, DOB, location, gender, employer | API vulnerability exploit. Meta sued scrapers. GDPR fines in EU. | BANNED |
| LinkedIn 2021 Scrape | 700M profiles | Name, email, phone, employer, title, LinkedIn URL | API abuse. Scraping litigated in hiQ v. LinkedIn. Includes non-public profile data. | BANNED |
| National Public Data 2024 | 2.9B records | Name, SSN, DOB, address, phone | Contains SSNs. Jerico Pictures breach. Identity theft liability. | BANNED |
| Equifax 2017 | 147M records | Name, SSN, DOB, address, DL numbers | Credit bureau breach. Chinese military indicted. SSN+DL data. | BANNED |
| T-Mobile Breaches (2021-23) | 76M+ records | Name, SSN, DOB, phone, IMEI, DL | Carrier data with SSN/device IDs. Criminal prosecution ongoing. | BANNED |
| Optum/Change Healthcare 2024 | 100M+ records | PHI: name, DOB, SSN, medical records, insurance | HIPAA-protected health data. Federal criminal liability. | BANNED |
| India Aadhaar Leaks | 1.1B+ records | Name, Aadhaar#, biometrics, address | Foreign gov ID. Not US residents. Biometric data. Multi-jurisdiction. | BANNED |
| Exactis 2018 / Apollo.io 2018 | 540M combined | Name, address, phone, email, interests, employer | Marketing DB leak + B2B sales breach. Includes scraped LinkedIn. | BANNED |
| Clearview AI Face Database | 30B+ images | Facial images, source URLs, biometric vectors | Scraped social media. Banned/fined globally. Biometric = extreme risk. | BANNED |
| Collection #1-5 / Combo Lists | 2.7B+ combos | Email + password pairs, credential dumps | Aggregated dark web credential dumps. Zero legitimate use. | BANNED |
| MOVEit 2023 (Cl0p ransomware) | Tens of millions | Varies by org: PII, financial, HR data | Supply chain attack. Stolen from 2,500+ orgs. Ransomware origin. | BANNED |

| Source | Est. Size | Data Types | Why Risky | Status |
|---|---|---|---|---|
| Credit Header Data | ~250M adults | Name, SSN, DOB, phone, all historical addresses | CFPB actively closing “non-FCRA” loophole. Requires CRA relationship. | AVOID |
| DMV Records (Bulk) | ~230M drivers | Name, address, DOB, DL#, vehicle registration | DPPA requires permissible purpose. Federal violation without auth. | AVOID |
| USPS NCOA (Unlicensed) | 160M moves | Name, old address, new address, move date | Full license $360K/yr. Reseller use violates USPS terms. | AVOID |
| App SDK Location Data | Billions of pings | Device ID, GPS lat/lon, timestamp, app source | FTC enforcement (X-Mode/InMarket). State privacy laws expanding. | AVOID |
| RTB / Ad-Tech Bid Stream | Billions of events | Device IDs, IPs, location, browsing history | EU found RTB violates GDPR. US enforcement catching up. | AVOID |
| Scraped Social Media Profiles | Billions | Name, bio, photos, connections, posts | Platform ToS prohibit scraping. Clearview-style lawsuits. hiQ v. LinkedIn leaves a grey area. | AVOID |
| Mugshot Websites (Bulk) | Millions | Arrest photo, name, charges, booking date | ~20 states passed anti-mugshot laws. Ethical and presumption-of-innocence concerns. | AVOID |
| Genealogy DNA / 23andMe | 30M+ profiles | DNA data, ethnicity, family trees, health traits | GINA + state genetic privacy laws. 23andMe bankruptcy data risk. | AVOID |
| Utility Connection Records | ~130M households | Name, address, connection date, utility type | Privacy unclear. Some states restrict utility data sharing. | AVOID |
| Rental / Tenant Screening | Millions | Eviction records, rental history, credit | FCRA-regulated. No permissible purpose. Eviction sealing trend. | AVOID |

| Source | Est. Size | Requirement | Status |
|---|---|---|---|
| DEA Registrant Database | ~1.6M | NTIS subscription ($$$) | LOCKED |
| FINRA BrokerCheck | 600K+ brokers | No bulk download; ToS prohibits scraping | LOCKED |
| OpenCorporates | 100M+ companies | Commercial API license ($$$) | LOCKED |
| L2 Political Voter Data | 213M+ enhanced | ~$0.25/record (prohibitive at scale) | $$$ |
| CA Voter File | ~22M voters | ~$17,000 (most expensive state) | $$$ |

| Source | Status | Notes | Records Lost |
|---|---|---|---|
| FEC API + Bulk (all datasets) | Complete | All ZIPs extracted then deleted | 0 |
| NC Voter Registration | Complete | Full statewide file (9.1M) | 0 |
| Cook County Property Tax | Partial | Parse killed at 2.8M to free disk space | Minimal |
| NPPES NPI Registry (re-run) | Complete | Full re-download after EBS expansion: ~7.86M providers | 0 |
| Texas TDLR Licenses | Complete | 921,527 records via updated API URL | 0 |
| FAA Airmen Database | Complete | 965,761 pilots + mechanics from CSV bulk | 0 |
| FCC Amateur Radio (ULS) | Complete | 1,635,720 licenses from pipe-delimited bulk | 0 |
| Ohio Voter Registration | Complete | 7.9M records ingested via Wave 12 | 0 |
| CA DCA Professional Licenses | Complete | 2,657,646 records from 15 boards via Box.com bulk download | 2,657,646 |
| CA Unclaimed Property | In Progress | 22GB CSV streaming parse; 10M+ records so far | 0 |
| FL DBPR Licenses (23 categories) | Complete | ~1M records from CSV extracts (RE, contractors, cosmetology, etc.) | 0 |
| ATF FFL Monthly Lists | Complete | 42,388 unique licensees from 59 monthly files via Playwright browser | 0 |
| IRS 990-N e-Postcard | Complete | 1,420,475 nonprofit officers from bulk download | 0 |
| USPTO PatentsView | Complete | 4,243,267 unique inventors from TSV bulk | 0 |
| CMS Open Payments | Complete | 984,034 unique physicians deduped from 16M+ payment rows | 0 |
| Echovita + Legacy Obituaries | Complete | All 51 states (Echovita) + 20 states (Legacy) | 0 |
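
Several rows above mention streaming parses of large delimited files and deduplicating many rows down to unique individuals. The following is a minimal sketch of that general pattern, not the pipeline's actual code: the function name `unique_rows`, the column names `physician_id` and `amount`, and the sample data are all hypothetical. It reads rows one at a time, so only the set of keys already seen is held in memory rather than the whole file.

```python
import csv
import io
from typing import Iterable, Iterator


def unique_rows(lines: Iterable[str], key_column: str, delimiter: str = ",") -> Iterator[dict]:
    """Yield the first row seen for each distinct value of key_column.

    Rows are consumed lazily, so memory grows with the number of
    distinct keys, not with the size of the input.
    """
    seen = set()
    reader = csv.DictReader(lines, delimiter=delimiter)
    for row in reader:
        key = row[key_column]
        if key in seen:
            continue  # duplicate key: skip this row
        seen.add(key)
        yield row


# Tiny in-memory stand-in for a large pipe-delimited bulk file (hypothetical data).
sample = io.StringIO(
    "physician_id|amount\n"
    "101|25.00\n"
    "102|40.00\n"
    "101|12.50\n"
)
unique = list(unique_rows(sample, key_column="physician_id", delimiter="|"))
print(len(unique))  # 2
```

For a real file, an open file handle (for example `open(path, newline="")`) can be passed in place of the `StringIO` object. Keeping the first occurrence per key is an assumption made here; a real pipeline might prefer the most recent row instead.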