Document intelligence API
Send any file or URL.
Get structured data back.
One API for documents, images, spreadsheets, and websites. Define a schema in plain English — get typed JSON back with confidence scores and source citations. Or ask questions and get computed answers with full reasoning traces.
Free tier included. No credit card required.
# Schema in, structured data out. Any file, any URL.
import json
import requests

resp = requests.post(
    "https://dev.thedrive.ai/api/v1/extract",
    headers={"X-API-Key": "tda_live_..."},
    files={"file": open("contract.pdf", "rb")},
    data={"schema": json.dumps({
        "parties": "All parties involved",
        "effective_date": "Start date",
        "liability_cap": "Maximum liability amount"
    })}
)
# → {"data": {"parties": ["Acme Corp", "Globex Inc"],
# "effective_date": "2025-01-15",
# "liability_cap": 500000},
# "confidence": {"parties": "high", "liability_cap": "high"},
# "citations": {"liability_cap": "...not exceed $500,000"}}
# Same schema format — but it reasons, computes, and cites sources
resp = requests.post(
    "https://dev.thedrive.ai/api/v1/analyze",
    headers={"X-API-Key": "tda_live_..."},
    files={"file": open("earnings_q3.pdf", "rb")},
    data={"schema": json.dumps({
        "revenue_growth": "Calculate YoY revenue growth rate for Q3",
        "strongest_segment": "Which business segment grew the fastest?"
    })}
)
# → {"answers": {
# "revenue_growth": {
# "answer": -0.23,
# "reasoning": "Q3 2024: $92,995M vs Q3 2023: $93,210M ...",
# "sources": ["Total revenues 92,995 93,210"],
# "steps": [{"tool": "search"}, {"tool": "read_pages"}, {"tool": "compute"}]
# }}}
107+ file formats
1000+ page documents
Files + URLs, same endpoint
<3s average response
The interface
One schema. Two modes.
Extract pulls what's literally in the document — names, dates, amounts, clauses. Analyze derives what isn't — computes totals, calculates rates, assesses risk, cross-checks numbers. Same schema format for both.
Extract
POST /api/v1/extract
Pulls literal values from the source. Your schema describes which fields to look for; the API finds them and returns typed data.
schema: {
"vendor": "Company name",
"total": "Total amount due",
"line_items": "All items with prices"
}
→ vendor: "AWS", total: 971.73,
line_items: [{...}, {...}]
Analyze
POST /api/v1/analyze
Reasons over the document. A multi-step agent navigates pages, extracts tables, runs Python in a sandbox, and returns computed answers.
schema: {
"revenue_growth": "YoY growth rate",
"auto_renews": "Does this auto-renew?",
"line_items_match_total": "Do they add up?"
}
→ revenue_growth: -0.23% (computed)
auto_renews: true (Section 8.2)
line_items_match_total: false ($100 gap)
See it work
Same interface, different domains
The schema changes. The endpoint doesn't. Here's what extraction looks like across industries.
Invoices, receipts, bank statements
Extract vendor, totals, dates, and line items from any invoice format. PDF, scan, or phone photo — works with any layout or language. Process thousands with the same schema.
schema: {
"vendor": "Company name",
"invoice_number": "Invoice or reference number",
"date": "Invoice date",
"total": "Total amount due",
"line_items": "All items with description and amount"
}
Response
{
"data": {
"vendor": "Amazon Web Services",
"invoice_number": null,
"date": "2024-11-01",
"total": 971.73,
"line_items": [
{"description": "EC2", "amount": 412.38},
{"description": "S3", "amount": 89.12}, ...]
},
"confidence": {
"vendor": "high", "invoice_number": "high",
"date": "high", "total": "high"
},
"citations": {
"vendor": "Amazon Web Services, Inc.",
"total": "Total Due: $971.73"
}
}
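One way to consume a response shaped like the one above is to split fields by confidence before writing them downstream. This is a minimal sketch, not part of the API: `split_by_confidence` is a hypothetical helper, and since "high" is the only confidence label shown on this page, treating every other label, a missing label, or a null value as "needs review" is an assumption.

```python
def split_by_confidence(response, trusted=("high",)):
    """Split extracted fields into accepted values and fields needing review."""
    data = response.get("data", {})
    confidence = response.get("confidence", {})
    accepted, review = {}, {}
    for field, value in data.items():
        # Fields without a trusted confidence entry, or with a null value, go to review.
        if confidence.get(field) in trusted and value is not None:
            accepted[field] = value
        else:
            review[field] = value
    return accepted, review


# Trimmed copy of the sample invoice response (line_items omitted for brevity).
sample = {
    "data": {
        "vendor": "Amazon Web Services",
        "invoice_number": None,
        "date": "2024-11-01",
        "total": 971.73,
    },
    "confidence": {
        "vendor": "high", "invoice_number": "high",
        "date": "high", "total": "high",
    },
}

accepted, review = split_by_confidence(sample)
print(accepted)
print(review)
```

Here the null `invoice_number` lands in `review` even though its confidence is "high", which is usually what a human-in-the-loop queue wants.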
Why not just use GPT / Claude directly?
You could. Here's what you'd have to build.
Sending a PDF to an LLM works for simple cases. Production agents hit edge cases fast — scanned documents, 500-page filings, tables that break in markdown, math that hallucinates. We handle all of it.
Progressive reading
Large documents are read in batches. A 1000-page SEC filing returns in seconds if the answer is on page 3 — the API stops reading once your schema is filled.
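The early-stop idea can be sketched in a few lines. This is an illustration of the general technique, not the service's implementation: `read_progressively`, `toy_extractor`, and the batch size of 10 are all hypothetical.

```python
def read_progressively(pages, schema, extract_batch, batch_size=10):
    """Read pages in batches and stop once every schema field has a value."""
    found = {}
    pages_read = 0
    for start in range(0, len(pages), batch_size):
        batch = pages[start:start + batch_size]
        pages_read += len(batch)
        # Only ask the extractor for fields that are still missing.
        missing = [field for field in schema if field not in found]
        found.update(extract_batch(batch, missing))
        if all(field in found for field in schema):
            break  # Schema is filled; skip the rest of the document.
    return found, pages_read


def toy_extractor(batch, missing_fields):
    """Stand-in for a real extractor: looks for 'field: value' lines."""
    hits = {}
    for page in batch:
        for line in page.splitlines():
            key, sep, value = line.partition(": ")
            if sep and key in missing_fields and key not in hits:
                hits[key] = value
    return hits


# A 1000-page document whose only relevant line is on page 3.
pages = ["cover page", "intro", "effective_date: 2025-01-15"] + ["filler"] * 997
found, pages_read = read_progressively(pages, ["effective_date"], toy_extractor)
print(found, pages_read)
```

With these inputs only the first batch is read; the remaining 990 pages are never touched.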
Sandboxed computation
The analysis agent runs real Python for any math — sums, growth rates, averages. Every number is calculated, never generated. Deterministic results.
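For a sense of what "calculated, never generated" means in practice, here is the kind of arithmetic such a sandbox would run for the revenue growth example. It is a sketch: `yoy_growth_percent` is a hypothetical name, and the inputs are the two figures quoted in the sample reasoning trace.

```python
def yoy_growth_percent(current, prior):
    """Year-over-year growth as a percentage of the prior-period value."""
    if prior == 0:
        raise ValueError("prior-period value must be non-zero")
    return (current - prior) / prior * 100


# Figures from the sample reasoning trace, in millions of dollars.
growth = yoy_growth_percent(92_995, 93_210)
print(round(growth, 2))
```

Rounded to two decimals this gives -0.23, matching the sample answer when that answer is read as a percentage.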
Table-aware parsing
PDFs are parsed with table detection. Rows and columns stay structured instead of collapsing into garbled text. The agent extracts and computes over table data directly.
OCR + vision proofreading
Scanned documents go through OCR, then a vision model compares against the original image to fix misread characters. Handles stamps, handwriting, and phone photos.
Document structure mapping
The agent builds a table of contents on upload — sections, page ranges, table locations — and navigates directly to the right page instead of reading everything sequentially.
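A structure map like this can be pictured as a list of sections with page ranges. The sketch below is illustrative only: the `Section` type, the `pages_for` helper, and the sample table of contents are all made up, and a real navigator would match on more than section titles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    title: str
    first_page: int
    last_page: int


def pages_for(query, toc):
    """Return the page numbers of every section whose title mentions the query."""
    needle = query.lower()
    pages = []
    for section in toc:
        if needle in section.title.lower():
            pages.extend(range(section.first_page, section.last_page + 1))
    return pages


# Hypothetical table of contents for a long filing.
toc = [
    Section("Business overview", 1, 12),
    Section("Risk factors", 13, 40),
    Section("Consolidated balance sheet", 47, 48),
]
print(pages_for("balance sheet", toc))
```

Only the two balance sheet pages come back, so a question about balance sheet figures never has to touch the other pages.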
Adaptive model routing
Simple extractions use a fast model. Complex documents with tables, scans, or dense layouts route to a more capable one. Or set model: "accurate" to force it.
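A small helper can make the override explicit in client code. This is a sketch under one assumption worth flagging: the text above names the `model: "accurate"` setting but not how it is transmitted, so sending it as a form field next to `schema` is a guess. `build_form` is a hypothetical helper.

```python
import json


def build_form(schema, model=None):
    """Build the form fields for a request, adding `model` only when overriding."""
    form = {"schema": json.dumps(schema)}
    if model is not None:
        # Assumption: the override travels as a form field alongside the schema.
        form["model"] = model
    return form


schema = {"total": "Total amount due"}
print(build_form(schema))                    # default: let the API route
print(build_form(schema, model="accurate"))  # force the more capable model
```

The resulting dict would be passed as `data=` to the same `requests.post` call shown in the examples at the top of the page.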
Website intelligence
Pass a URL instead of a file. Same endpoint.
The API renders the page in a headless browser, runs JavaScript, then extracts signals from the live DOM — logos from markup, brand colors from CSS custom properties, fonts from computed styles, social links from structured data. Not scraped from HTML. Parsed from the rendered page.
Enable follow_links to automatically crawl subpages — /about, /pricing, /contact — and fill gaps in your schema from multiple pages in one call.
Request
import json
import requests

requests.post(
    "https://dev.thedrive.ai/api/v1/extract",
    headers={"X-API-Key": "tda_live_..."},
    data={
        "url": "https://linear.app",
        "follow_links": True,
        "schema": json.dumps({
            "name": "Company name",
            "pricing": "All plan names and prices",
            "logo": "Logo URL",
            "colors": "Brand colors",
            "founders": "Founder names"
        })
    }
)
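Conceptually, filling gaps from multiple pages is a merge in which earlier pages win and later pages only supply what is still missing. The sketch below shows that idea under stated assumptions: `merge_page_results` is hypothetical, the per-page results are placeholder values rather than real output, and the service's actual merge rules are not documented here.

```python
def merge_page_results(page_results, schema_fields):
    """Fill each schema field from the first page that provides a non-empty value."""
    merged = {field: None for field in schema_fields}
    for result in page_results:
        for field in schema_fields:
            if merged[field] is None and result.get(field) not in (None, "", []):
                merged[field] = result[field]
    return merged


# Placeholder per-page results: home page first, then /pricing, then /about.
page_results = [
    {"name": "Example Co", "pricing": None, "founders": None},
    {"pricing": ["Free", "Pro"]},
    {"name": "Example Company Inc.", "founders": ["A. Founder"]},
]
merged = merge_page_results(page_results, ["name", "pricing", "founders"])
print(merged)
```

The home page's name is kept, while pricing and founders come from the subpages that actually mention them.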
Brand extraction
Logo URL, brand colors, fonts, social profiles, contact email — parsed from the live DOM and CSS, not guessed from page text. Works on JS-heavy SPAs.
Company research
With follow_links, the API crawls relevant subpages to fill your schema. Pricing tiers, team bios, HQ location, tech stack — one call.
Lead enrichment
Turn a prospect URL into CRM-ready structured data. Industry, employee signals, pricing model, social profiles — whatever your schema asks for.
Competitive monitoring
Run the same schema against competitor URLs on a schedule. Track pricing changes, new features, positioning shifts — structured diffs over time.
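If you store each run's output, a field-level diff between two snapshots is a few lines of client code. This sketch assumes you keep the extracted `data` dicts yourself; `diff_snapshots` is a hypothetical helper and the plan prices are invented.

```python
def diff_snapshots(old, new):
    """Return the fields whose values changed between two extraction runs."""
    changes = {}
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes[field] = {"before": before, "after": after}
    return changes


# Invented snapshots from two scheduled runs with the same pricing schema.
last_week = {"plans": {"Pro": 20, "Team": 50}, "free_tier": True}
this_week = {"plans": {"Pro": 25, "Team": 50}, "free_tier": True}
print(diff_snapshots(last_week, this_week))
```

Unchanged fields such as `free_tier` drop out, leaving only the pricing change to alert on.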
Use it anywhere
One code path for every format
PDF, scanned image, DOCX, spreadsheet, website — same call, same schema, same response shape. You write one integration. We handle format detection, OCR, rendering, and extraction.
Document processing pipelines
Invoices, contracts, claims, applications — define the schema once. The API normalizes any format into the same structure. Feed directly into downstream systems.
invoice.pdf → {"vendor": "...", "total": 971.73}
receipt.jpg → {"vendor": "...", "total": 42.50}
scan.heic → {"vendor": "...", "total": 188.00}
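A pipeline built on this is mostly a loop. The sketch below keeps the HTTP call behind an injected `post` function so it runs offline; in practice `post` would wrap the `requests.post` call to `/api/v1/extract` shown earlier. `extract_all` and `fake_post` are hypothetical names, and the fake response carries placeholder values.

```python
import json

INVOICE_SCHEMA = {"vendor": "Company name", "total": "Total amount due"}


def extract_all(paths, schema, post):
    """Run one schema over many files and collect one flat row per file."""
    rows = []
    for path in paths:
        # `post` sends the file plus form fields and returns the parsed JSON body.
        body = post(path, {"schema": json.dumps(schema)})
        rows.append({"source": path, **body.get("data", {})})
    return rows


def fake_post(path, form):
    """Offline stand-in so the sketch runs without an API key."""
    return {"data": {"vendor": "...", "total": 0.0}}


rows = extract_all(["invoice.pdf", "receipt.jpg", "scan.heic"], INVOICE_SCHEMA, fake_post)
for row in rows:
    print(row)
```

Because every format returns the same response shape, each row has the same keys whether the source was a PDF, a JPEG, or a HEIC scan.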
Research and due diligence
Point the analyze endpoint at a 10-K filing and ask it to compute revenue growth, find risk factors, or cross-check stated totals. Full reasoning trace for audit.
"Calculate debt-to-equity ratio"
→ 1.34 (computed from balance sheet, p.47)
Web data collection
Build enrichment pipelines, monitor competitors, or collect structured data from any website. The DOM extractor gets what scrapers miss — rendered CSS, JS-loaded content, structured data.
"competitor.com" + pricing schema
→ {"plans": [...], "free_tier": true}
API
Extract
POST /api/v1/extract
Schema in, structured data out. Any file or URL. Typed values, confidence scores, source citations.
1 credit/page · 5 credits/site
Analyze
POST /api/v1/analyze
Multi-step reasoning over documents. Computes answers with sandboxed Python, returns full reasoning traces.
2 credits/page · 10 credits/site
Diff
POST /api/v1/diff
Compare two documents. Block-level and word-level diffs with optional AI annotations explaining what changed and why it matters.
Markdown
GET /md/{url}
Convert any URL or document to clean markdown. JavaScript rendered, boilerplate stripped. Feed directly into LLM context.
Screenshot
GET /{url}
JPEG, GIF, or MP4 of any URL. Dark mode, full page, custom viewports.
Thumbnails
POST /api/v1/thumbnails
Preview images from 107+ file types. PDFs, spreadsheets, presentations, code files.
Formats
107+ file types
Send any file. The API detects the type, picks the right parser, and handles OCR if needed.
Documents
PDF DOCX DOC ODT RTF PAGES EPUB TXT
Spreadsheets
XLSX XLS ODS CSV TSV NUMBERS
Presentations
PPTX PPT ODP KEY
Images
JPG PNG GIF WebP SVG TIFF BMP HEIC
Video & Audio
MP4 MOV WebM AVI MP3 WAV
Code & Data
JSON XML YAML HTML PY JS TS GO + 30 more
Pricing
Pay per call
Usage-based. Free tier to start. No minimum commitment.
extract
1 credit/page
analyze
2 credits/page
diff, markdown, screenshots
1 credit each
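To budget a job before running it, the listed rates can be turned into a quick estimator. This is a sketch: `estimate_credits` is a hypothetical helper, the rates are the ones listed on this page, and it assumes credits add up linearly with no rounding, minimums, or volume discounts.

```python
# Rates as listed on this page.
PAGE_RATES = {"extract": 1, "analyze": 2}    # credits per document page
SITE_RATES = {"extract": 5, "analyze": 10}   # credits per website
FLAT_RATES = {"diff": 1, "markdown": 1, "screenshot": 1}  # credits per call


def estimate_credits(endpoint, pages=0, sites=0, calls=0):
    """Estimate credits for a job, assuming simple linear pricing."""
    if endpoint in FLAT_RATES:
        return FLAT_RATES[endpoint] * calls
    if endpoint not in PAGE_RATES:
        raise ValueError(f"no listed rate for endpoint: {endpoint}")
    return PAGE_RATES[endpoint] * pages + SITE_RATES[endpoint] * sites


print(estimate_credits("analyze", pages=120))         # one 120-page filing
print(estimate_credits("extract", pages=40, sites=3)) # 40 pages plus 3 sites
print(estimate_credits("screenshot", calls=25))
```

Under these assumptions the three jobs come to 240, 55, and 25 credits respectively.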
One endpoint. Any source. Structured output.
Get an API key and extract data from your first document in under a minute.