Audio transcription
Transcribe and extract data from audio files
Upload audio files — meeting recordings, interviews, podcasts, voicemails — and get structured transcriptions with speaker diarization, timestamps, and schema-based extraction. Pull out action items, decisions, or any fields you define.
How it works
Send your MP3 / WAV file
Upload via the API or pass a URL. The API auto-detects the format.
Define your schema
Describe the fields you want as a JSON schema. The API maps the content of your recording to that structure.
Get structured JSON
Receive typed data with confidence scores and citations back to the source recording.
Example request
curl -X POST https://dev.thedrive.ai/api/v1/extract \
-H "X-API-Key: your_key" \
-F "file=@recording.mp3" \
-F 'schema={"transcript": "string", "speakers": ["string"], "action_items": ["string"], "duration_seconds": "number"}'
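As a schema grows, quoting it inline becomes error-prone. The sketch below is one way to keep the same schema in a file and pass its contents as the schema field, using curl's `name=<file` form. The helper name `extract_audio` and the environment variable `DRIVE_API_KEY` are illustrative assumptions, not part of the API. The endpoint, header, and field names are copied from the example request.

```shell
# Write the schema from the example request to a temporary file so it is easy to edit.
schema_file="$(mktemp)"
cat > "$schema_file" <<'EOF'
{
  "transcript": "string",
  "speakers": ["string"],
  "action_items": ["string"],
  "duration_seconds": "number"
}
EOF

# Hypothetical helper: upload one recording together with the schema file.
# "file=@path" uploads the file itself; "schema=<path" sends the file's
# contents as a plain text field (both are standard curl -F forms).
# DRIVE_API_KEY is an assumed variable name for your API key.
extract_audio() {
  curl -X POST https://dev.thedrive.ai/api/v1/extract \
    -H "X-API-Key: ${DRIVE_API_KEY:?set DRIVE_API_KEY first}" \
    -F "file=@$1" \
    -F "schema=<$schema_file"
}
```

With the key exported, `extract_audio recording.mp3` would send the same request as the example above.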
MP3 / WAV processing features
Multi-format support
MP3, WAV, M4A, FLAC, OGG, AAC, WMA, and more.
Speaker diarization
Identifies and labels different speakers throughout the recording.
Timestamped segments
Each segment includes start and end timestamps for precise reference.
Schema-based extraction
Define a schema to extract specific fields — topics, decisions, action items — not just raw text.
Long recording support
Handles hour-long recordings. Billed at 1 credit per minute.
Language detection
Auto-detects the spoken language and transcribes accordingly.
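To estimate cost at the stated rate of 1 credit per minute, a small calculation is enough. The sketch below assumes that partial minutes round up to a whole credit. This page does not say how rounding works, so treat that as an assumption. The function name `credits_for_seconds` is illustrative.

```shell
# Estimate credits for a recording of a given length in seconds.
# Assumption: any partial minute is billed as a full credit (rounds up).
credits_for_seconds() {
  echo $(( ($1 + 59) / 60 ))
}

credits_for_seconds 3600   # one-hour recording: prints 60
credits_for_seconds 90     # 1.5 minutes: prints 2 under the round-up assumption
```

At this rate, the 100 credits in the free tier cover up to 100 minutes of audio per month.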
Start extracting from MP3 / WAV files
Free tier includes 100 credits/month. No credit card required.