What's available now.
Real hours, in real languages, ready to license today. Updated weekly. Each row links to a Dataset detail page with a downloadable sample, full spec, and a request-quote button.
Browse what's licensable today.
“In collection” means recording is in flight and the dataset will be available within 4–6 weeks. “On request” means we have the supplier network in place and begin collection on a per-order basis.
| Dataset | Language | Accent | Domain | Hours | Status |
|---|---|---|---|---|---|
| Conversational EN-US (long-form interviews) | English | US General American | Interview | 350+ | Available |
| Conversational EN-US (panel discussions) | English | US Mixed | Panel / multi-speaker | — | In collection |
| Conversational EN-UK | English | UK RP, regional | Interview | — | In collection |
| Conversational EN-AU | English | Australian | Interview | — | In collection |
| Conversational ES-LATAM | Spanish | LATAM Mixed | Interview | — | On request |
| Conversational ES-EU | Spanish | EU Castilian | Interview | — | On request |
| Conversational PT-BR | Portuguese | Brazilian | Interview | — | On request |
| Conversational FR-FR | French | Metropolitan | Interview | — | On request |
| Conversational DE-DE | German | High German | Interview | — | On request |
| Conversational JA-JP | Japanese | Tokyo Standard | Interview | — | On request |
| Conversational KO-KR | Korean | Seoul Standard | Interview | — | On request |
| Conversational HI-IN | Hindi | Standard, regional | Interview | — | On request |
| Conversational AR | Arabic | MSA, Egyptian, Gulf, Levantine | Interview | — | On request |
| Conversational ZH-CN | Mandarin | Standard, regional | Interview | — | On request |
A closer look at three catalogs we ship today.
These are the datasets most teams start with. Each one ships under a single master agreement with the per-file provenance manifest, signed releases, and a named contact for the lifetime of the deal.
Conversational EN-US
- 350+ hours · available today
- 2,400+ unique named speakers
- 2- and 3-speaker, studio-recorded
- Word-level CTM + diarized JSON
- 48 kHz / 24-bit WAV, −23 LUFS
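For teams that want to sanity-check delivered audio against the format listed above, here is a minimal sketch using Python's standard `wave` module. It only reads the WAV header (sample rate and bit depth); it does not measure loudness, so the −23 LUFS target is not checked. The function name `check_wav_format` is hypothetical, and the sketch assumes plain PCM WAV files that `wave` can open.

```python
import wave

EXPECTED_RATE_HZ = 48_000   # 48 kHz, as listed in the spec
EXPECTED_SAMPLE_BYTES = 3   # 24-bit PCM is 3 bytes per sample


def check_wav_format(path):
    """Return a list of mismatches between the WAV header and the expected spec.

    An empty list means the header matches 48 kHz / 24-bit.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        if rate != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
        if width != EXPECTED_SAMPLE_BYTES:
            problems.append(f"bit depth is {width * 8}-bit, expected 24-bit")
    return problems
```

Running this over a sample pack before a larger ingest is a cheap way to catch files that were resampled or converted somewhere along the way.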
Multilingual Expansion
- 20–80 hrs per locale typical
- Native-speaker verified
- Per-language transcripts + metadata
- Same studio quality bar as EN catalog
- 4–6 week ramp on new locales
Custom Commission
- Scoped to brief
- 4–8 week turnaround
- Exclusive licensing available
- Direct studio and casting access
- Same provenance pipeline
From sample pack to signed contract.
Sample pack
Tell us the catalog and use case. We send a 60-second WAV, the matching datasheet, and a short note on which package fits, all within one business day.
NDA + 5-minute sample
Sign a one-page NDA and we deliver a 5-minute representative sample with full transcript, metadata, and a draft provenance manifest for legal review.
MSA + first delivery
Master agreement, DPA, and security questionnaire are pre-built. First delivery ships within one to two weeks of signature, on the format you specify.
What every file ships with.
Per-file manifest
Every WAV arrives with a SHA-256 checksum, signed release ID, and speaker metadata. Audit any file in your training set back to a named speaker at any time.
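As an illustration of how a checksum audit might look on the customer side, the sketch below recomputes each file's SHA-256 and compares it with the value from the manifest. The manifest layout is an assumption: here it is reduced to a hypothetical mapping of file name to hex digest, and the function names are illustrative only. The actual manifest format is described in the dataset's spec.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large WAVs are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_checksum_mismatches(audio_dir, expected):
    """Return file names that are missing or whose SHA-256 differs.

    `expected` is a hypothetical mapping: file name -> hex SHA-256
    taken from the manifest.
    """
    mismatches = []
    for name, want in expected.items():
        path = Path(audio_dir) / name
        if not path.is_file() or sha256_of_file(path) != want.lower():
            mismatches.append(name)
    return mismatches
```

An empty result means every listed file is present and byte-identical to what the manifest describes.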
Named contact for life
One person owns your account end-to-end — sourcing, contracts, deliveries, audits. No ticket queues, no rotating success managers, no escalation paths.
Revocation SLA
Speakers retain ownership and can revoke. We honor revocations on a defined SLA and re-issue manifests so your training set stays clean.
We source under-represented languages too.
Through our podcaster network we've sourced audio for several smaller-population languages. Talk to us about Swahili, Tagalog, Vietnamese, Bengali, Yoruba, or any language you don't see listed — most languages with a working podcast scene are reachable.
How to read "Hours available"
Hours currently in catalog and licensable today. "In collection" ships in 4–6 weeks. "On request" ships as a custom collection in 3–6 weeks.
How to read "Speakers"
The number of distinct named speakers contributing to the dataset. Every speaker has a signed model-training release on file in the consent vault.
How to read "Sample"
A 5-minute representative sample with transcript and metadata, delivered under NDA so you can evaluate fit before requesting a full quote.
Catalog questions.
How often is the catalog updated?
Weekly. New hours land as our partner studios deliver back catalog and as commissioned recordings clear QC. Major catalog releases (new languages, new domains) ship monthly.
Can I license a single language without taking the whole multilingual bundle?
Yes. Every locale is licensable individually. Most multilingual customers start with one or two locales and expand once their training pipeline is proven.
Do you offer evaluation-only licenses?
Yes — short-term evaluation licenses are available for benchmark and ablation work with a smaller commitment than a full training license.
What happens when a speaker revokes?
We notify you within the contractual SLA, re-issue the provenance manifest with the affected files flagged, and provide replacement audio of equivalent length and demographics where possible.
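On the customer side, applying a re-issued manifest could be as simple as the sketch below, which splits a training file list into files to keep and files to drop. The entry fields `file` and `revoked` are hypothetical stand-ins for however the flagged files are marked in the real manifest.

```python
def split_by_revocation(training_files, manifest_entries):
    """Split training files into (keep, drop) using a re-issued manifest.

    `manifest_entries` is assumed to be a list of dicts with hypothetical
    keys "file" (file name) and "revoked" (True when the speaker revoked).
    """
    revoked = {entry["file"] for entry in manifest_entries if entry.get("revoked")}
    keep = [name for name in training_files if name not in revoked]
    drop = [name for name in training_files if name in revoked]
    return keep, drop


# Usage: one of three files has been flagged in the re-issued manifest.
manifest = [
    {"file": "ep01.wav", "revoked": False},
    {"file": "ep02.wav", "revoked": True},
    {"file": "ep03.wav"},
]
keep, drop = split_by_revocation(["ep01.wav", "ep02.wav", "ep03.wav"], manifest)
```

Entries without a revocation flag are treated as still licensed, which matches the idea that only affected files are flagged.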
Can I get the source releases for an audit?
Yes. Signed releases are stored in our consent vault and available for inspection under NDA — typically as part of a Fortune-500 vendor review or a government procurement audit.
Do you ship to on-prem / air-gapped environments?
Yes. Datasets can be delivered to S3, GCS, Azure Blob, or shipped on encrypted physical media for air-gapped training environments.
Need a custom collection?
We can collect in any language with podcast infrastructure. Tell us the language, accent mix, demographics, and hours, and we'll come back with a scoped proposal within 2 business days.