Audio transcription

Transcribe and extract data from audio files

Upload audio files — meeting recordings, interviews, podcasts, voicemails — and get structured transcriptions with speaker diarization, timestamps, and schema-based extraction. Pull out action items, decisions, or any fields you define.

How it works

Send your MP3 / WAV file

Upload via the API or pass a URL. The API auto-detects the format.

Define your schema

Describe the fields you want as a JSON schema. The API maps the content of your recording to that structure.

Get structured JSON

Receive typed data with confidence scores and citations back to the source recording.

Example request

curl -X POST https://dev.thedrive.ai/api/v1/extract \
  -H "X-API-Key: your_key" \
  -F "file=@recording.mp3" \
  -F 'schema={"transcript": "string", "speakers": ["string"], "action_items": ["string"], "duration_seconds": "number"}'
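The same request can be assembled from Python. This is a minimal sketch, not an official client: the endpoint, the `X-API-Key` header, and the `file` and `schema` form fields come from the curl example above, while the helper name `build_extract_request` and the file name `recording.mp3` are hypothetical. The commented-out call assumes the third-party `requests` package is installed.

```python
import json

API_URL = "https://dev.thedrive.ai/api/v1/extract"  # endpoint from the curl example


def build_extract_request(api_key: str, schema: dict) -> dict:
    """Assemble the URL, headers, and form fields used by the curl example.

    Hypothetical helper for illustration; it only builds the request parts
    and does not contact the API.
    """
    return {
        "url": API_URL,
        "headers": {"X-API-Key": api_key},
        # The schema travels as a JSON string in a multipart form field.
        "data": {"schema": json.dumps(schema)},
    }


schema = {
    "transcript": "string",
    "speakers": ["string"],
    "action_items": ["string"],
    "duration_seconds": "number",
}
request = build_extract_request("your_key", schema)

# Sending it (requires the third-party `requests` package):
# import requests
# with open("recording.mp3", "rb") as audio:
#     response = requests.post(
#         request["url"],
#         headers=request["headers"],
#         data=request["data"],
#         files={"file": audio},
#     )
```

Building the schema as a Python dict and serializing it with `json.dumps` avoids the shell-quoting issues that come with hand-written JSON on the command line.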

MP3 / WAV processing features

Multi-format support

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, and more.

Speaker diarization

Identifies and labels different speakers throughout the recording.

Timestamped segments

Each segment includes start and end timestamps for precise reference.
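As an illustration of how diarized, timestamped segments might be rendered for reading, here is a short sketch. The response shape is an assumption: the keys `start`, `end` (in seconds), `speaker`, and `text` are hypothetical and may not match what the API actually returns, so adjust them to the real payload.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once the value reaches an hour."""
    total = int(seconds)  # drop fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def format_segment(segment: dict) -> str:
    # Assumed (hypothetical) keys: "start", "end", "speaker", "text".
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} - {end}] {segment['speaker']}: {segment['text']}"


# Made-up sample data, for illustration only.
segments = [
    {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Let's get started."},
    {"start": 3725.0, "end": 3731.5, "speaker": "Speaker 2", "text": "One last item."},
]
lines = [format_segment(s) for s in segments]
```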

Schema-based extraction

Define a schema to extract specific fields — topics, decisions, action items — not just raw text.

Long recording support

Handles hour-long recordings. Billed at 1 credit per minute.
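To budget credits before uploading, the per-minute rate can be turned into a quick estimate. The sketch below assumes partial minutes round up to a full minute; the page does not state the rounding rule, so treat that as an assumption. The function name `estimate_credits` is hypothetical.

```python
import math

CREDITS_PER_MINUTE = 1  # rate stated above


def estimate_credits(duration_seconds: float) -> int:
    """Estimate credits for a recording.

    Assumption: partial minutes are rounded up to a full minute.
    """
    if duration_seconds <= 0:
        return 0
    return math.ceil(duration_seconds / 60) * CREDITS_PER_MINUTE


one_hour = estimate_credits(3600)
```

Under this assumption, a 60-minute recording would use 60 credits.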

Language detection

Auto-detects the spoken language and transcribes accordingly.

Start extracting from MP3 / WAV files

Free tier includes 100 credits/month. No credit card required.
