Document extraction

Extract structured data from Word documents

Upload Word documents and get structured data back. Define the fields you need — parties, dates, clauses, terms — and the API extracts them with confidence scores and citations.

How it works

1. Send your DOCX file

Upload the file directly or pass a URL that points to it. The API auto-detects the format.

2. Define your schema

Describe the fields you want as a JSON object that maps each field name to its expected type. The API maps the document's content onto that structure.

3. Get structured JSON

Receive typed data with confidence scores and citations back to the source document.
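
As a rough sketch of how the confidence scores from step 3 might be used, the Python below routes low-confidence fields to manual review. The response layout (one entry per field, with `value`, `confidence`, and `citation` keys), the sample values, and the `split_by_confidence` helper are all assumptions made for illustration; the actual response format may differ.

```python
# Hypothetical response shape and values, assumed for illustration only.
# The real API response may be structured differently.
sample_response = {
    "effective_date": {
        "value": "2024-01-15",
        "confidence": 0.97,
        "citation": "Section 1.1",
    },
    "terms": {
        "value": "Net 30",
        "confidence": 0.62,
        "citation": "Section 4.2",
    },
}


def split_by_confidence(fields, threshold=0.8):
    """Separate extracted fields into accepted ones and ones needing review."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # Fields at or above the threshold are accepted; the rest are flagged.
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field
    return accepted, needs_review


accepted, needs_review = split_by_confidence(sample_response)
print(sorted(accepted))      # field names at or above the threshold
print(sorted(needs_review))  # field names below the threshold
```

The 0.8 threshold is arbitrary; a sensible value would depend on how costly a wrong field is in your workflow.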

Example request

curl -X POST https://dev.thedrive.ai/api/v1/extract \
  -H "X-API-Key: your_key" \
  -F "file=@document.docx" \
  -F 'schema={"parties": ["string"], "effective_date": "string", "terms": "string"}'
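
The `schema` form field in the request above is a JSON string. If you build requests from code, a minimal sketch of producing that string in Python could look like this; it uses only the field names and type placeholders shown in the curl example, and the variable names are illustrative.

```python
import json

# Same field names and type placeholders as the schema in the curl example.
schema = {
    "parties": ["string"],
    "effective_date": "string",
    "terms": "string",
}

# The request sends the schema as a single form field,
# so serialize the dict to a JSON string first.
schema_field = json.dumps(schema)
print(schema_field)
```

Building the schema as a dictionary and serializing it avoids hand-escaping quotes, which gets error-prone as the number of fields grows.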

DOCX processing features

DOCX and DOC support

Handles both the modern .docx format and the legacy .doc format.

Heading-aware extraction

Understands document structure — sections, headings, numbered lists, and nested content.

Table extraction

Tables in Word documents are parsed with row/column structure preserved.

Start extracting from DOCX files

Free tier includes 100 credits/month. No credit card required.