Conversational speech data for training voice AI — sourced the legal way.
Studio-grade, multi-speaker audio licensed directly from professional podcasters who've signed model-training releases. Aligned transcripts, full speaker metadata, and a documented chain of consent for every second of audio.
Catalog scaling to 10,000+ hours through our partner network of physical podcast studios and distribution companies worldwide.
The speech data your model is trained on is becoming a legal liability.
For years, the easiest way to get conversational speech data was to scrape podcasts, audiobooks, and YouTube. That era is ending. The EU AI Act now requires providers of general-purpose models to publish a summary of their training data. The New York Times is suing OpenAI. Universal Music is suing Anthropic. Voice actors are suing ElevenLabs. Any production speech model trained on scraped audio is one disclosure request away from a discovery problem.
The alternatives haven't been much better. Crowd-sourced datasets are noisy, accent-poor, and recorded on laptop mics. Generic data vendors resell repackaged corpora with hand-wave consent language and no contactable speakers. Custom collections take six to nine months and still require your legal team to write the releases.
There's a third path. We built it.
Studio-grade conversational speech, licensed for AI training.
Real podcasters. Real studios. Real conversations. Every hour comes with a signed release that explicitly permits use for training generative speech and voice AI models.
Off-the-shelf catalog
Buy from our existing licensed catalog. 350+ hours of conversational English available today, growing weekly.
- Multi-speaker conversations & interviews
- 48 kHz / 24-bit WAV
- Word-level aligned transcripts
- Full speaker metadata
- Min order: 10 hours · Lead time: <48 hours
Custom collection
We commission new recordings from our network of professional podcasters to your spec. You write the brief, we deliver the audio.
- Any of our supported languages
- Targeted demographics, accents, domains
- 100–10,000+ hours per project
- Lead time: 2–6 weeks
Exclusive licensing
Need the data to be yours and only yours? Exclusive and time-windowed exclusive licenses on both catalog and custom collections.
- Perpetual or time-limited exclusivity
- Per-buyer data rooms
- Provenance certificates
- Optional escrow & destruction terms
Specs your engineers will actually ask about.
Audio
- Format: 48 kHz / 24-bit WAV (downsampled on request)
- Channels: Mono per speaker (multi-track) or stereo mixdown
- Loudness: Normalized to -16 LUFS
- Environment: Treated home studios and professional rooms
- Microphones: Shure SM7B, Rode NT1, Sennheiser MKH 416 (per file)
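As a quick acceptance check on delivered files, a buyer could confirm that each WAV header matches the audio spec listed here. The sketch below uses only Python's standard `wave` module. The function name `check_wav` and the `max_channels` parameter are illustrative, not part of any delivered tooling, and loudness (LUFS) is not measured because that needs a third-party library.

```python
import wave

# Expected values taken from the audio spec on this page.
# Adjust them if you asked for downsampled files.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 3  # 24-bit PCM


def check_wav(path, max_channels=2):
    """Return a list of header problems; an empty list means the header matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"bit depth is {8 * wav.getsampwidth()}-bit")
        # Mono per-speaker tracks or a stereo mixdown are both in spec.
        if not 1 <= wav.getnchannels() <= max_channels:
            problems.append(f"unexpected channel count: {wav.getnchannels()}")
    return problems
```

Calling `check_wav("episode_001_speaker_a.wav")` (a hypothetical file name) returns `[]` for an in-spec file. Note that the `wave` module only reads uncompressed PCM, and files with WAVE_FORMAT_EXTENSIBLE headers need Python 3.12 or later.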
Transcripts
- Timestamps: Word-level
- Diarization: Per-speaker labels
- Optional: Punctuation, casing, disfluencies retained
- Formats: JSON, WebVTT, SRT, TextGrid
- QA: Optional human verification pass
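To show what word-level timestamps with per-speaker labels make possible, here is a minimal sketch that totals talk time per speaker from a JSON transcript. The schema is an assumption made for illustration (a top-level `words` list whose items carry `word`, `start`, `end`, and `speaker`); the actual delivered JSON layout may differ, so check the sample before relying on these field names.

```python
import json
from collections import defaultdict


def talk_time_by_speaker(transcript_json):
    """Sum the duration of every word, grouped by speaker label.

    Assumes a hypothetical schema:
    {"words": [{"word": str, "start": float, "end": float, "speaker": str}, ...]}
    with times in seconds.
    """
    doc = json.loads(transcript_json)
    totals = defaultdict(float)
    for word in doc["words"]:
        totals[word["speaker"]] += word["end"] - word["start"]
    return dict(totals)


example = json.dumps({
    "words": [
        {"word": "So", "start": 0.0, "end": 0.25, "speaker": "S1"},
        {"word": "right", "start": 0.25, "end": 0.75, "speaker": "S1"},
        {"word": "Mm-hm", "start": 0.5, "end": 1.0, "speaker": "S2"},
    ]
})
print(talk_time_by_speaker(example))
```

The same loop is a convenient place to detect overlapping speech, since a word from one speaker that starts before another speaker's word ends (as in the example) marks an interruption or backchannel.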
Speaker metadata
- Demographics: Age range, gender, L1, accent region
- Native flag: Native vs. non-native
- Recording: Environment & equipment
- Consent: Record ID linked to provenance vault
Languages
- Available: English (US, UK, AU)
- In collection: Spanish (LATAM, EU), Portuguese (BR), French, German, Japanese, Korean, Hindi, Arabic, Mandarin
- By request: Any language with podcast infrastructure
Delivery
- Targets: S3, GCS, Azure Blob (your bucket or ours)
- Integrity: Signed manifests, SHA-256 checksums
- Per-file: Consent ID and license SKU
- Optional: API access for incremental delivery
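On the receiving side, the SHA-256 checksums can be verified after the files land in your bucket or on local disk. The sketch below assumes a hypothetical JSON manifest with a `files` list whose entries have `path` and `sha256` fields; the real manifest format is not documented on this page, and verifying the manifest's signature is a separate step not shown here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large WAV files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, root_dir):
    """Return (path, reason) pairs for every file that fails verification.

    Assumes a hypothetical manifest layout:
    {"files": [{"path": "audio/ep1.wav", "sha256": "<hex digest>"}, ...]}
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    failures = []
    for entry in manifest["files"]:
        file_path = Path(root_dir) / entry["path"]
        if not file_path.is_file():
            failures.append((entry["path"], "missing"))
        elif sha256_of(file_path) != entry["sha256"].lower():
            failures.append((entry["path"], "checksum mismatch"))
    return failures
```

An empty list from `verify_manifest("manifest.json", "delivery/")` (both names hypothetical) means every listed file is present and unmodified since the manifest was produced.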
Order sizes
- Minimum: 10 hours (catalog) · 100 hours (custom)
- Typical: 1,000–5,000 hours per project
- Domains: Long-form interviews, panels, monologue, scripted, roleplay
The legal page enterprise buyers actually need.
Every hour we license is backed by a signed, individually named model-training release: not a hand-wave clickwrap, not a Terms of Service amendment, not "implied consent." Here's how that compares to the alternatives.
| | aipodcast | Scraped web audio | Generic crowd vendors |
|---|---|---|---|
| Signed model-training release per speaker | ✓ | ✕ | ~ Often vague |
| Speakers individually named & contactable | ✓ | ✕ | ✕ |
| Right-to-revoke handled (price-protected for buyers) | ✓ | ✕ | ~ |
| Per-file provenance audit trail | ✓ | ✕ | ~ |
| GDPR Art. 6(1)(f) lawful basis documented | ✓ | ✕ | ~ |
| CPRA & California consumer rights aligned | ✓ | ✕ | ~ |
| EU AI Act Article 53 training-data transparency ready | ✓ | ✕ | ✕ |
| C2PA-compatible provenance manifests | ✓ | ✕ | ✕ |
| CGL + IP indemnification in MSA | ✓ | ✕ | ~ |
Need a sample MSA, DPA, or release form for your legal review? Request the legal pack →
A single accountable vendor. Five steps. No surprises.
Scoping call
30 minutes. We learn your model, your data gaps, and your legal constraints. You leave with a written proposal and a price range.
Sample delivery
Within 48 hours of NDA, you get a representative sample (audio + transcripts + metadata + a release excerpt) you can run through your pipeline.
Contracting
Standard MSA + SOW. Typical turnaround is 5–10 business days. We handle speaker releases on our side.
Collection & QA
We record (or pull from catalog), transcribe, diarize, and run automated + human QA. Weekly progress updates.
Delivery
Audio, transcripts, metadata, manifests, consent vault references, signed provenance certificate. To your bucket. Done.
Most projects ship in 2 to 6 weeks. Catalog orders ship in under 48 hours.
Built from a real network of working podcasters.
We didn't scrape the internet. We started with 350+ hours of long-form interviews and conversations recorded in professional studios over the past several years — and built outward from there.
We partner with a global network of physical podcast studios and podcast distribution companies to license back catalogs and commission new recordings, paying creators directly for content they already own. Every speaker in our catalog is a working professional, recording in a real studio, on real equipment, having real conversations.
That's why the audio quality is studio-grade by default — and why our consent paperwork is real, named, and contactable.
- Owned: 350+ hours of original conversational English in catalog today
- Network: A global network of partner podcast studios and distribution companies
- Growing: Active outreach to professional podcasters worldwide for language and demographic expansion
What teams use our data for.
ASR / Speech recognition
Train and benchmark transcription models on real-world conversational audio with diverse accents, overlapping speech, and natural disfluencies — the things your users actually do.
TTS / Speech synthesis
Studio-grade source audio with rich speaker metadata, consistent room tone, and full prosodic variety. Per-speaker exclusivity available for voice cloning products.
Conversational voice agents
Multi-speaker dialogue with turn-taking, interruptions, backchanneling, and natural pacing. Train agents that sound like people, not IVR menus.
Multilingual & accent expansion
Targeted collection in any of our supported languages, with native-speaker recording and demographic targeting. Skip the six-month RFP cycle.
Straightforward pricing.
No "contact us for everything." Here's the model.
Catalog hours
- Non-exclusive license
- Min order: 10 hours
- Delivered in <48 hours
Custom collection
- Built to your spec
- Min order: 100 hours
- 2–6 week delivery
Exclusive licensing
- Full or windowed exclusivity
- Per-buyer data rooms
- Provenance certificates
Volume discounts at 500 / 2,000 / 10,000 hours. All licenses include an MSA with IP indemnification. Open-source-friendly licenses available for academic research.
Questions enterprise buyers ask us.
Can your data be used to train commercial generative models?
Yes. Every release explicitly grants the right to use the audio for training generative speech and voice AI, including for commercial deployment.
What happens if a speaker revokes consent after delivery?
We notify you in writing within 5 business days and log the revocation in the provenance vault. Revocation is forward-looking: already-trained model weights are not affected, and you're price-protected against catalog churn under our standard MSA.
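Because each file carries a consent ID, a buyer can honor a forward-looking revocation by excluding the affected files from any future training run. This is a minimal sketch of that filter, assuming file records shaped like `{"path": ..., "consent_id": ...}`; the record layout, the example IDs, and the function name are all hypothetical, and your MSA governs what you are actually required to do.

```python
def split_by_revocation(files, revoked_consent_ids):
    """Partition file records into (still_licensed, revoked).

    `files` is an iterable of dicts with a "consent_id" key (hypothetical
    layout); `revoked_consent_ids` is any iterable of revoked IDs.
    """
    revoked = set(revoked_consent_ids)
    keep, drop = [], []
    for record in files:
        (drop if record["consent_id"] in revoked else keep).append(record)
    return keep, drop


catalog = [
    {"path": "ep1_s1.wav", "consent_id": "C-001"},
    {"path": "ep1_s2.wav", "consent_id": "C-002"},
    {"path": "ep2_s1.wav", "consent_id": "C-001"},
]
keep, drop = split_by_revocation(catalog, ["C-002"])
print([r["path"] for r in keep])  # files still usable in future training runs
```

Keeping the `drop` list alongside the written notice gives you a record of exactly which files were pulled and when.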
Do you offer exclusive licenses?
Yes, both perpetual and time-windowed. Talk to sales.
Is the data SOC 2 / ISO 27001 / HIPAA compliant?
SOC 2 Type II is in progress and ISO 27001 is on the roadmap. The audio itself does not contain PHI. Ask us for the current security pack.
What languages do you support?
English (US/UK/AU) is in catalog today. Spanish, Portuguese, French, German, Japanese, Korean, Hindi, Arabic, and Mandarin are in active collection. Other languages are commissioned to spec.
Can I get a free sample?
Yes — request a sample and we'll send you 30 minutes of audio with full transcripts, metadata, and a release excerpt within 48 hours of NDA.
How is this different from Common Voice or LibriSpeech?
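Common Voice and LibriSpeech are primarily read-aloud speech recorded by volunteers, often on laptop microphones, rather than spontaneous multi-speaker conversation. Their licenses (CC0 and CC BY 4.0, respectively) do permit commercial use, but they come without signed, individually named model-training releases. Our data is studio-grade conversational audio with explicit commercial training rights.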
How does this compare to Scale AI or Surge?
Scale and Surge are general-purpose data labeling and collection vendors. We're vertically focused on conversational speech from professional podcasters, with a specific consent and provenance model. We're often used alongside them for the audio portion of a larger data program.
Can you index my own podcast back-catalog?
Yes — if you control the rights and want to license to AI buyers, we run a creator program. Learn more →
Tell us what you're building. We'll send a sample within 48 hours.
Real audio, real transcripts, real metadata, real consent paperwork. NDA on request. No sales call required to see the data — we want you to put it through your pipeline first.