Document intelligence API

Send any file or URL.
Get structured data back.

One API for documents, images, spreadsheets, and websites. Define a schema in plain English — get typed JSON back with confidence scores and source citations. Or ask questions and get computed answers with full reasoning traces.

Free tier included. No credit card required.

# Schema in, structured data out. Any file, any URL.
import json
import requests

with open("contract.pdf", "rb") as f:
    resp = requests.post(
        "https://dev.thedrive.ai/api/v1/extract",
        headers={"X-API-Key": "tda_live_..."},
        files={"file": f},
        data={"schema": json.dumps({
            "parties": "All parties involved",
            "effective_date": "Start date",
            "liability_cap": "Maximum liability amount"
        })}
    )

# → {"data": {"parties": ["Acme Corp", "Globex Inc"],
#            "effective_date": "2025-01-15",
#            "liability_cap": 500000},
#    "confidence": {"parties": "high", "liability_cap": "high"},
#    "citations": {"liability_cap": "...not exceed $500,000"}}

# Same schema format, but it reasons, computes, and cites sources
with open("earnings_q3.pdf", "rb") as f:
    resp = requests.post(
        "https://dev.thedrive.ai/api/v1/analyze",
        headers={"X-API-Key": "tda_live_..."},
        files={"file": f},
        data={"schema": json.dumps({
            "revenue_growth": "Calculate YoY revenue growth rate for Q3",
            "strongest_segment": "Which business segment grew the fastest?"
        })}
    )

# → {"answers": {
#      "revenue_growth": {
#        "answer": -0.23,
#        "reasoning": "Q3 2024: $92,995M vs Q3 2023: $93,210M ...",
#        "sources": ["Total revenues 92,995 93,210"],
#        "steps": [{"tool": "search"}, {"tool": "read_pages"}, {"tool": "compute"}]
#      }}}
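The revenue-growth answer can be checked by hand from the two figures quoted in its reasoning trace. A minimal sketch, assuming `answer` is expressed in percent: the quoted inputs give a change of about -0.23%, not -23%. The helper name is hypothetical.

```python
def yoy_growth_pct(current: float, prior: float) -> float:
    """Year-over-year growth in percent: (current - prior) / prior * 100."""
    if prior == 0:
        raise ValueError("prior-period value must be non-zero")
    return (current - prior) / prior * 100

# Q3 total revenues ($M) as quoted in the reasoning trace above
q3_2024, q3_2023 = 92_995, 93_210
print(round(yoy_growth_pct(q3_2024, q3_2023), 2))  # → -0.23
```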

107+ file formats
1000+-page documents
Files + URLs, same endpoint
<3s average response

The interface

One schema. Two modes.

Extract pulls what's literally in the document — names, dates, amounts, clauses. Analyze derives what isn't — computes totals, calculates rates, assesses risk, cross-checks numbers. Same schema format for both.

Extract

POST /api/v1/extract

Pulls literal values from the source. Your schema describes what fields to look for — the API finds them and returns typed data.

Returns typed JSON — strings, numbers, arrays, booleans, nulls
Confidence scores per field (high / medium / low)
Source citations — the exact text that was used
Returns null with high confidence when a field genuinely isn't there
schema: {
  "vendor": "Company name",
  "total": "Total amount due",
  "line_items": "All items with prices"
}
→ vendor: "AWS", total: 971.73,
  line_items: [{...}, {...}]
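Because every field carries a confidence level, a client can route uncertain values to human review while trusting the rest. A minimal sketch over the `data` / `confidence` / `citations` response shape shown on this page; the helper name, the review policy (anything not "high"), and the confidence values in the example are hypothetical.

```python
def fields_needing_review(response: dict) -> list[str]:
    """Return field names whose confidence is missing or not 'high'.

    A null value with 'high' confidence means the field is genuinely absent,
    so it is not flagged.
    """
    confidence = response.get("confidence", {})
    return [
        field
        for field in response.get("data", {})
        if confidence.get(field) != "high"
    ]

# Illustrative response; the confidence values are made up for the example
example = {
    "data": {
        "vendor": "AWS",
        "total": 971.73,
        "line_items": [{"description": "EC2", "amount": 412.38}],
    },
    "confidence": {"vendor": "high", "total": "high", "line_items": "medium"},
    "citations": {"total": "Total Due: $971.73"},
}
print(fields_needing_review(example))  # → ['line_items']
```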

Analyze

POST /api/v1/analyze

Reasons over the document. A multi-step agent navigates pages, extracts tables, runs Python in a sandbox, and returns computed answers.

Sandboxed code execution — sums, rates, averages are calculated, not guessed
Full reasoning trace — every page read, every search, every computation
Source citations tied to specific document locations
Navigates 1000+ page documents using structural awareness
schema: {
  "revenue_growth": "YoY growth rate",
  "auto_renews": "Does this auto-renew?",
  "line_items_match_total": "Do they add up?"
}
→ revenue_growth: -0.23% (computed)
  auto_renews: true (Section 8.2)
  line_items_match_total: false ($100 gap)
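The `line_items_match_total` question is the kind of check that is computed rather than guessed. A minimal client-side equivalent, not the sandbox's actual code, using `Decimal` so currency sums avoid binary floating-point rounding. The function name and the amounts below are hypothetical.

```python
from decimal import Decimal

def reconcile(line_amounts: list[str], stated_total: str) -> tuple[bool, Decimal]:
    """Compare the sum of line items with the stated total.

    Amounts are passed as strings so Decimal represents them exactly.
    Returns (matches, gap), where gap = stated_total - sum(line_amounts).
    """
    computed = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    gap = Decimal(stated_total) - computed
    return gap == 0, gap

# Hypothetical invoice whose stated total is $100 more than its lines
matches, gap = reconcile(["400.00", "250.00", "150.00"], "900.00")
print(matches, gap)  # → False 100.00
```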

See it work

Same interface, different domains

The schema changes. The endpoint doesn't. Here's what extraction looks like across industries.

Invoices, receipts, bank statements

Extract vendor, totals, dates, and line items from any invoice format. PDF, scan, or phone photo — works with any layout or language. Process thousands with the same schema.

schema: {
  "vendor": "Company name",
  "invoice_number": "Invoice or reference number",
  "date": "Invoice date",
  "total": "Total amount due",
  "line_items": "All items with description and amount"
}

Response

{
  "data": {
    "vendor": "Amazon Web Services",
    "invoice_number": null,
    "date": "2024-11-01",
    "total": 971.73,
    "line_items": [
      {"description": "EC2", "amount": 412.38},
      {"description": "S3", "amount": 89.12}, ...]
  },
  "confidence": {
    "vendor": "high", "invoice_number": "high",
    "date": "high", "total": "high"
  },
  "citations": {
    "vendor": "Amazon Web Services, Inc.",
    "total": "Total Due: $971.73"
  }
}
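Processing a folder of invoices is a loop over the same call with the same schema. A minimal sketch: the endpoint URL, `X-API-Key` header, multipart `file`/`schema` fields, and `data` response key come from the examples on this page, while `extract_all`, `INVOICE_SCHEMA`, and the injected `post` parameter are hypothetical. Passing `requests.post` as `post` makes the real HTTP call; a stub makes it testable offline.

```python
import json
from pathlib import Path
from typing import Any, Callable

EXTRACT_URL = "https://dev.thedrive.ai/api/v1/extract"

INVOICE_SCHEMA = {
    "vendor": "Company name",
    "invoice_number": "Invoice or reference number",
    "date": "Invoice date",
    "total": "Total amount due",
    "line_items": "All items with description and amount",
}

def extract_all(paths: list[Path], api_key: str,
                post: Callable[..., Any]) -> dict[str, dict]:
    """Send each file to /extract with one shared schema.

    Returns the `data` object of each response, keyed by file name.
    `post` must accept the same arguments as requests.post.
    """
    schema = json.dumps(INVOICE_SCHEMA)  # serialize once, reuse for every file
    results = {}
    for path in paths:
        with open(path, "rb") as f:
            resp = post(
                EXTRACT_URL,
                headers={"X-API-Key": api_key},
                files={"file": f},
                data={"schema": schema},
            )
        resp.raise_for_status()  # surface HTTP errors instead of bad JSON
        results[path.name] = resp.json()["data"]
    return results
```

Usage: `extract_all(sorted(Path("invoices").glob("*")), "tda_live_...", requests.post)`. PDFs, JPGs, and HEIC scans go through the same loop unchanged.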

Why not just use GPT / Claude directly?

You could. Here's what you'd have to build.

Sending a PDF to an LLM works for simple cases. Production agents hit edge cases fast — scanned documents, 500-page filings, tables that break in markdown, math that hallucinates. We handle all of it.

Progressive reading

Large documents are read in batches. A 1000-page SEC filing returns in seconds if the answer is on page 3 — the API stops reading once your schema is filled.
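The stop-once-filled idea can be illustrated in a few lines. This is a hypothetical sketch of the general technique, not the API's internal implementation: pages go to an extractor in fixed-size batches, non-null values are kept, and reading stops as soon as every schema field has a value. `extract_batch`, the toy extractor, and the batch size of 10 are assumptions.

```python
from typing import Callable, Optional

def progressive_extract(
    pages: list[str],
    fields: list[str],
    extract_batch: Callable[[list[str], list[str]], dict],
    batch_size: int = 10,
) -> tuple[dict, int]:
    """Read pages in batches until every field is filled.

    extract_batch(batch_pages, missing_fields) returns any fields it found.
    Returns (results, pages_read).
    """
    results: dict[str, Optional[object]] = {f: None for f in fields}
    pages_read = 0
    for start in range(0, len(pages), batch_size):
        missing = [f for f in fields if results[f] is None]
        if not missing:
            break  # schema filled: the remaining pages are never read
        batch = pages[start:start + batch_size]
        found = extract_batch(batch, missing)
        for field in missing:
            if found.get(field) is not None:
                results[field] = found[field]
        pages_read += len(batch)
    return results, pages_read

def toy(batch: list[str], missing: list[str]) -> dict:
    """Toy extractor: finds lines of the form 'field: value'."""
    return {f: p.split(":", 1)[1].strip()
            for p in batch for f in missing if p.startswith(f + ":")}

# A 1000-page document whose answer sits on page 3
pages = ["intro", "contents", "total: 42"] + ["filler"] * 997
res, n = progressive_extract(pages, ["total"], toy)
print(res, n)  # → {'total': '42'} 10
```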

Sandboxed computation

The analysis agent runs real Python for any math — sums, growth rates, averages. Every number is calculated, never generated. Deterministic results.

Table-aware parsing

PDFs are parsed with table detection. Rows and columns stay structured instead of collapsing into garbled text. The agent extracts and computes over table data directly.

OCR + vision proofreading

Scanned documents go through OCR, then a vision model compares against the original image to fix misread characters. Handles stamps, handwriting, and phone photos.

Document structure mapping

The agent builds a table of contents on upload — sections, page ranges, table locations — and navigates directly to the right page instead of reading everything sequentially.
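Navigating by structure means turning a question into a page range before reading anything. A hypothetical sketch of that lookup, not the agent's real data model: `Section`, `pages_for`, and the example table of contents are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Section:
    title: str
    first_page: int
    last_page: int  # inclusive

def pages_for(toc: list[Section], keyword: str) -> list[int]:
    """Return the pages of every section whose title contains keyword."""
    keyword = keyword.lower()
    pages: list[int] = []
    for section in toc:
        if keyword in section.title.lower():
            pages.extend(range(section.first_page, section.last_page + 1))
    return pages

# Hypothetical table of contents for a long annual report
toc = [
    Section("Risk Factors", 12, 30),
    Section("Consolidated Balance Sheets", 47, 48),
    Section("Notes to Financial Statements", 49, 90),
]
print(pages_for(toc, "balance sheet"))  # → [47, 48]
```

Reading two pages instead of scanning from page 1 is what keeps large-document questions fast.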

Adaptive model routing

Simple extractions use a fast model. Complex documents with tables, scans, or dense layouts route to a more capable one. Or set model: "accurate" to force it.
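Forcing the more capable model is a single extra field on the request. The page names the `model: "accurate"` option but not where it is sent, so passing it as a multipart form field next to `schema` is an assumption; `build_extract_form` is a hypothetical helper.

```python
import json
from typing import Optional

def build_extract_form(schema: dict, model: Optional[str] = None) -> dict:
    """Build the form fields for POST /api/v1/extract.

    ASSUMPTION: `model` travels as a plain form field beside `schema`.
    Leaving it out lets the API's adaptive routing choose the model.
    """
    form = {"schema": json.dumps(schema)}
    if model is not None:
        form["model"] = model
    return form

form = build_extract_form({"total": "Total amount due"}, model="accurate")
print(form)  # → {'schema': '{"total": "Total amount due"}', 'model': 'accurate'}
```

The resulting dict is passed as `data=` to the same `requests.post` call used in the first example.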

Website intelligence

Pass a URL instead of a file. Same endpoint.

The API renders the page in a headless browser, runs JavaScript, then extracts signals from the live DOM — logos from markup, brand colors from CSS custom properties, fonts from computed styles, social links from structured data. Not scraped from HTML. Parsed from the rendered page.

Enable follow_links to automatically crawl subpages — /about, /pricing, /contact — and fill gaps in your schema from multiple pages in one call.

Request

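import json
import requests

resp = requests.post(
    "https://dev.thedrive.ai/api/v1/extract",
    headers={"X-API-Key": "tda_live_..."},
    data={
        "url": "https://linear.app",
        "follow_links": True,  # also crawl subpages such as /about and /pricing
        "schema": json.dumps({
            "name": "Company name",
            "pricing": "All plan names and prices",
            "logo": "Logo URL",
            "colors": "Brand colors",
            "founders": "Founder names"
        })
    }
)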

Brand extraction

Logo URL, brand colors, fonts, social profiles, contact email — parsed from the live DOM and CSS, not guessed from page text. Works on JS-heavy SPAs.

Company research

With follow_links, the API crawls relevant subpages to fill your schema. Pricing tiers, team bios, HQ location, tech stack — one call.

Lead enrichment

Turn a prospect URL into CRM-ready structured data. Industry, employee signals, pricing model, social profiles — whatever your schema asks for.

Competitive monitoring

Run the same schema against competitor URLs on a schedule. Track pricing changes, new features, positioning shifts — structured diffs over time.
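Because each run returns the same schema, change tracking reduces to comparing two `data` objects field by field. A minimal client-side sketch; it does not use the separate `diff` endpoint listed under pricing, whose interface isn't described here. `diff_snapshots` and the snapshot contents are hypothetical.

```python
def diff_snapshots(old: dict, new: dict) -> dict[str, tuple]:
    """Return {field: (old_value, new_value)} for every field that changed.

    A field present in only one snapshot shows None on the other side.
    """
    changes = {}
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes

# Hypothetical results from two scheduled runs of the same pricing schema
monday = {"plans": ["Free", "Pro $10"], "free_tier": True}
friday = {"plans": ["Free", "Pro $12"], "free_tier": True,
          "enterprise": "Contact sales"}
print(diff_snapshots(monday, friday))
# → {'enterprise': (None, 'Contact sales'), 'plans': (['Free', 'Pro $10'], ['Free', 'Pro $12'])}
```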

Use it anywhere

One code path for every format

PDF, scanned image, DOCX, spreadsheet, website — same call, same schema, same response shape. You write one integration. We handle format detection, OCR, rendering, and extraction.

Document processing pipelines

Invoices, contracts, claims, applications — define the schema once. The API normalizes any format into the same structure. Feed directly into downstream systems.

invoice.pdf → {"vendor": "...", "total": 971.73}

receipt.jpg → {"vendor": "...", "total": 42.50}

scan.heic → {"vendor": "...", "total": 188.00}

Research and due diligence

Point the analyze endpoint at a 10-K filing and ask it to compute revenue growth, find risk factors, or cross-check stated totals. Full reasoning trace for audit.

"Calculate debt-to-equity ratio"

→ 1.34 (computed from balance sheet, p.47)
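For audit, each analyze answer can be flattened into one log line showing the value, the tool sequence, and how many sources back it. A minimal sketch over the `answers` / `steps` / `sources` shape from the analyze example at the top of this page; `audit_log` and the line format are hypothetical.

```python
def audit_log(response: dict) -> list[str]:
    """Flatten an /analyze response into one audit line per answer."""
    lines = []
    for field, ans in response.get("answers", {}).items():
        tools = " -> ".join(step.get("tool", "?") for step in ans.get("steps", []))
        n_sources = len(ans.get("sources", []))
        lines.append(f"{field}={ans.get('answer')!r} via [{tools}] ({n_sources} source(s))")
    return lines

# Response shape copied from the analyze example on this page
example = {"answers": {"revenue_growth": {
    "answer": -0.23,
    "reasoning": "Q3 2024: $92,995M vs Q3 2023: $93,210M ...",
    "sources": ["Total revenues 92,995 93,210"],
    "steps": [{"tool": "search"}, {"tool": "read_pages"}, {"tool": "compute"}],
}}}
for line in audit_log(example):
    print(line)
# → revenue_growth=-0.23 via [search -> read_pages -> compute] (1 source(s))
```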

Web data collection

Build enrichment pipelines, monitor competitors, or collect structured data from any website. The DOM extractor gets what scrapers miss — rendered CSS, JS-loaded content, structured data.

"competitor.com" + pricing schema

→ {"plans": [...], "free_tier": true}

Formats

107+ file types

Send any file. The API detects the type, picks the right parser, and handles OCR if needed.

Documents

PDF DOCX DOC ODT RTF PAGES EPUB TXT

Spreadsheets

XLSX XLS ODS CSV TSV NUMBERS

Presentations

PPTX PPT ODP KEY

Images

JPG PNG GIF WebP SVG TIFF BMP HEIC

Video & Audio

MP4 MOV WebM AVI MP3 WAV

Code & Data

JSON XML YAML HTML PY JS TS GO + 30 more

Pricing

Pay per call

Usage-based. Free tier to start. No minimum commitment.

extract

1 credit/page

analyze

2 credits/page

diff, markdown, screenshots

1 credit each

Free

$0

forever

100 credits/month

30 requests/min

All endpoints

Get started

Pro

$0.01

per credit

Pay as you go

120 requests/min

Priority support

Start building

Enterprise

Custom

volume pricing

600 requests/min

SLA guarantee

Dedicated support

Contact us
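Cost scales with pages and mode, so a budget estimate is one multiplication. A minimal sketch using the rates listed above (extract 1 credit/page, analyze 2 credits/page, Pro at $0.01 per credit, Free at 100 credits/month). It assumes every page of the document is billed; if progressive reading bills only the pages actually read, real costs may be lower. `estimate_cost` is a hypothetical helper.

```python
CREDITS_PER_PAGE = {"extract": 1, "analyze": 2}
PRO_USD_PER_CREDIT = 0.01
FREE_CREDITS_PER_MONTH = 100

def estimate_cost(mode: str, pages: int) -> tuple[int, float]:
    """Return (credits, USD at Pro pricing) for one call over `pages` pages.

    ASSUMPTION: every page is billed, even if reading stops early.
    """
    credits = CREDITS_PER_PAGE[mode] * pages
    return credits, round(credits * PRO_USD_PER_CREDIT, 2)

print(estimate_cost("extract", 20))   # → (20, 0.2)
print(estimate_cost("analyze", 500))  # → (1000, 10.0)
# Pages of analyze the free tier covers each month
print(FREE_CREDITS_PER_MONTH // CREDITS_PER_PAGE["analyze"])  # → 50
```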

One endpoint. Any source. Structured output.

Get an API key and extract data from your first document in under a minute.