File & URL intelligence API
Send a file. Get answers.
One API call.
Send any file or URL with a schema describing what you need. Get structured answers back — computed, verified, cited. Cross-reference multiple documents. 107+ formats.
Free tier included. No credit card required.
npm install @thedriveai/sdk
pip install thedriveai
// Send a file + schema. Get verified, computed answers back.
import { TheDriveAI } from "@thedriveai/sdk";
const client = new TheDriveAI({ apiKey: "tda_live_..." });
const result = await client.analyze({
  file: "invoice.pdf",
  schema: {
    math_checks_out: { type: "boolean", description: "Do line items add up to the stated total?" },
    discrepancy: { type: "number", description: "Dollar difference, if any" },
    correct_total: { type: "number", description: "Recalculated total from line items + tax" },
  },
});
// → result.data.math_checks_out → false
// → result.reasoning.math_checks_out → "$3,200 + $1,550 + $890 = $5,640, not $5,740"
// → result.data.discrepancy → 100
// → result.data.correct_total → 6091.20
# Send a file + schema. Get verified, computed answers back.
from thedriveai import TheDriveAI
client = TheDriveAI(api_key="tda_live_...")
result = client.analyze(
    file="invoice.pdf",
    schema={
        "math_checks_out": {"type": "boolean", "description": "Do line items add up to the stated total?"},
        "discrepancy": {"type": "number", "description": "Dollar difference, if any"},
        "correct_total": {"type": "number", "description": "Recalculated total from line items + tax"},
    },
)
# → result.data["math_checks_out"] → False
# → result.reasoning["math_checks_out"] → "$3,200 + $1,550 + $890 = $5,640, not $5,740"
# → result.data["discrepancy"] → 100
# → result.data["correct_total"] → 6091.20
# Send a file + schema. Get verified, computed answers back.
curl -X POST https://dev.thedrive.ai/api/v1/analyze \
  -H "X-API-Key: tda_live_..." \
  -F file=@invoice.pdf \
  -F 'schema={
    "math_checks_out": {"type": "boolean", "description": "Do line items add up to the stated total?"},
    "discrepancy": {"type": "number", "description": "Dollar difference, if any"},
    "correct_total": {"type": "number", "description": "Recalculated total from line items + tax"}
  }'
# → {"data": {"math_checks_out": false, "discrepancy": 100, "correct_total": 6091.20},
#    "reasoning": {"math_checks_out": "$3,200 + $1,550 + $890 = $5,640, not $5,740", ...},
#    "confidence": {"math_checks_out": 0.97, "discrepancy": 0.97, "correct_total": 0.99}}
// Pull literal values from any file or URL. Fast path extraction.
import { TheDriveAI } from "@thedriveai/sdk";
const client = new TheDriveAI({ apiKey: "tda_live_..." });
const result = await client.extract({
  file: "contract.pdf",
  schema: {
    parties: { type: "array", description: "All parties" },
    effective_date: { type: "string", description: "Start date (ISO 8601)" },
    liability_cap: { type: "number", description: "Maximum liability", required: true },
  },
});
// → result.data.parties → ["Acme Corp", "Globex Inc"]
// → result.data.liability_cap → 500000
// → result.confidence.parties → 0.96
// → result.citations.liability_cap → "...not exceed $500,000"
# Pull literal values from any file or URL. Fast path extraction.
from thedriveai import TheDriveAI
client = TheDriveAI(api_key="tda_live_...")
result = client.extract(
    file="contract.pdf",
    schema={
        "parties": {"type": "array", "description": "All parties"},
        "effective_date": {"type": "string", "description": "Start date (ISO 8601)"},
        "liability_cap": {"type": "number", "description": "Maximum liability", "required": True},
    },
)
# → result.data["parties"] → ["Acme Corp", "Globex Inc"]
# → result.data["liability_cap"] → 500000
# → result.confidence["parties"] → 0.96
# → result.citations["liability_cap"] → "...not exceed $500,000"
# Pull literal values from any file or URL. Fast path extraction.
curl -X POST https://dev.thedrive.ai/api/v1/extract \
  -H "X-API-Key: tda_live_..." \
  -F file=@contract.pdf \
  -F 'schema={
    "parties": {"type": "array", "description": "All parties"},
    "effective_date": {"type": "string", "description": "Start date (ISO 8601)"},
    "liability_cap": {"type": "number", "description": "Maximum liability", "required": true}
  }'
# → {"data": {"parties": ["Acme Corp", "Globex Inc"],
#             "effective_date": "2025-01-15",
#             "liability_cap": 500000},
#    "confidence": {"parties": 0.96, "liability_cap": 0.93},
#    "citations": {"liability_cap": "...not exceed $500,000"}}
107+ file formats
1000+ page documents
Files + URLs, same endpoint
<3s average response
Two modes, one schema
Extract data. Or compute answers. Same schema.
Analyze reads documents, computes answers, cross-checks numbers, and shows its reasoning. It can also cross-reference multiple documents in one call. Extract is the fast path — pull literal values when you just need fields. Same schema format for both.
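Because both modes accept the same schema format, a schema can be written once and sent to either endpoint. The sketch below assembles the request pieces that the curl examples on this page use (the `X-API-Key` header and a JSON-encoded `schema` form field). `build_request`, `BASE_URL`, `ENDPOINTS`, and `INVOICE_SCHEMA` are hypothetical names for illustration and are not part of the SDK; nothing is sent over the network.

```python
import json

# Base URL and endpoint paths are copied from the curl examples on this page.
BASE_URL = "https://dev.thedrive.ai/api/v1"
ENDPOINTS = {"extract": "/extract", "analyze": "/analyze"}

# One schema, written once, in the format used throughout this page.
INVOICE_SCHEMA = {
    "vendor": {"type": "string", "description": "Company name"},
    "total": {"type": "number", "description": "Total amount due"},
}


def build_request(mode: str, schema: dict, api_key: str) -> dict:
    """Hypothetical helper: assemble URL, headers, and form fields for one call."""
    if mode not in ENDPOINTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "url": BASE_URL + ENDPOINTS[mode],
        "headers": {"X-API-Key": api_key},
        # The curl examples send the schema as a JSON string in a form field.
        "data": {"schema": json.dumps(schema)},
    }


extract_req = build_request("extract", INVOICE_SCHEMA, "tda_live_...")
analyze_req = build_request("analyze", INVOICE_SCHEMA, "tda_live_...")
print(extract_req["url"])
print(analyze_req["url"])
```

Only the URL differs between the two requests. The file itself would be attached as a multipart upload by whichever HTTP client you use.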
Analyze
POST /api/v1/analyze
The full reasoning pipeline in one API call: document navigation, sandboxed computation, cross-referencing, and vision for charts and scans. Define what you need. Get computed, verified answers back.
schema: {
  "revenue_growth": {"type": "number", "description": "YoY growth rate"},
  "auto_renews": {"type": "boolean", "description": "Does this auto-renew?"},
  "line_items_match": {"type": "boolean", "description": "Do line items add up to total?"}
}
→ revenue_growth: -0.23 (computed from p.12 + p.47)
  auto_renews: true (Section 8.2, "shall automatically renew")
  line_items_match: false ($100 gap, computed)
Extract
POST /api/v1/extract
The fast path. Pulls literal values (names, dates, amounts, clauses) directly from the source. Use it when you just need fields, not reasoning.
schema: {
  "vendor": {"type": "string", "description": "Company name"},
  "total": {"type": "number", "description": "Total amount due"},
  "status": {"type": "string", "enum": ["paid", "unpaid"]}
}
→ vendor: "AWS", total: 971.73, status: "paid"
See it work
Extract data. Verify it. Cross-reference it.
Same API call regardless of industry. Extract pulls literal values. Analyze computes, cross-checks, and catches what extraction misses.
Invoice verification
POST /api/v1/analyze
Don't just read the total — verify it. The API sums line items, checks the tax calculation, and catches discrepancies that extraction alone would miss.
schema: {
  "math_checks_out": {
    "type": "boolean",
    "description": "Do line items add up to the stated total?"
  },
  "discrepancy": {
    "type": "number",
    "description": "Dollar difference between computed and stated total"
  },
  "tax_correct": {
    "type": "boolean",
    "description": "Is the tax calculated correctly at the stated rate?"
  }
}
Response
{
  "data": {
    "math_checks_out": false,
    "discrepancy": 100.00,
    "tax_correct": false
  },
  "confidence": {
    "math_checks_out": 0.99,
    "discrepancy": 1.0,
    "tax_correct": 0.98
  },
  "reasoning": {
    "math_checks_out": "$3,200 + $1,550 + $890 = $5,640. Stated subtotal: $5,740. Difference: $100.",
    "discrepancy": "Computed sum $5,640 vs stated $5,740.",
    "tax_correct": "Tax should be $451.20 (8% of $5,640), not $459.20. The $100 subtotal error cascades."
  }
}
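The figures in that reasoning trace can be reproduced by hand. The sketch below is a local re-check in plain Python using exact decimal arithmetic; it is not how the API computes its answers, and the variable names are illustrative. The line items, the stated subtotal, and the 8% rate are taken from the response above.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures taken from the reasoning trace in the response above.
line_items = [Decimal("3200"), Decimal("1550"), Decimal("890")]
stated_subtotal = Decimal("5740")
tax_rate = Decimal("0.08")

CENT = Decimal("0.01")

# Sum the line items and compare against what the invoice states.
computed_subtotal = sum(line_items, Decimal("0"))
discrepancy = stated_subtotal - computed_subtotal

# Tax on the correct subtotal versus tax on the stated (wrong) subtotal.
correct_tax = (computed_subtotal * tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)
stated_tax = (stated_subtotal * tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)
correct_total = computed_subtotal + correct_tax

print(computed_subtotal)  # 5640
print(discrepancy)        # 100
print(correct_tax)        # 451.20
print(stated_tax)         # 459.20
print(correct_total)      # 6091.20
```

Using `Decimal` rather than floats keeps the cent values exact, which matters when the point of the check is to catch small discrepancies.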
What you'd otherwise build
Months of engineering. Or one API call.
Reliable document intelligence isn't a weekend project. It's agent design, prompt engineering, parser selection across hundreds of formats, accuracy benchmarking, sandboxed computation, vision fallbacks, retry logic, and ongoing maintenance as models change. We've done the research and engineering. You get the result.
Agent architecture, handled
Multi-step reasoning, tool selection, progressive document reading, retry strategies — we designed and benchmarked the agent so you don't have to.
Math that's computed, not guessed
Totals, growth rates, cross-checks — every number is calculated deterministically in a sandboxed environment. Not guessed by an LLM.
107+ formats, one call
PDF, DOCX, XLSX, images, audio, video, websites — each format has its own parsing path, benchmarked for accuracy. You just send the file.
OCR + vision proofreading
Scanned contracts, phone photos, faxed forms — OCR extracts text, vision models proofread it. Not one or the other. Both, verified against each other.
1000+ pages, multi-page tables
Progressive reading navigates long documents without losing context. Tables that span pages are reassembled correctly. Returns in seconds.
Every answer is auditable
Confidence scores, source citations, and full reasoning traces for every field. Know exactly how each answer was derived and where it came from.
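One way to put per-field confidence scores to work is to route uncertain fields to a human reviewer. The sketch below uses the sample extract response shown earlier on this page. The helper name `fields_needing_review` and the 0.95 threshold are assumptions for illustration; a field with no confidence entry is treated as needing review.

```python
# Sample response copied from the extract curl example on this page.
response = {
    "data": {
        "parties": ["Acme Corp", "Globex Inc"],
        "effective_date": "2025-01-15",
        "liability_cap": 500000,
    },
    "confidence": {"parties": 0.96, "liability_cap": 0.93},
    "citations": {"liability_cap": "...not exceed $500,000"},
}


def fields_needing_review(resp: dict, threshold: float = 0.95) -> list[str]:
    """Return field names whose confidence is missing or below the threshold."""
    confidence = resp.get("confidence", {})
    return [
        name
        for name in resp.get("data", {})
        if confidence.get(name, 0.0) < threshold
    ]


flagged = fields_needing_review(response)
print(flagged)  # ['effective_date', 'liability_cap']
```

The right threshold depends on the cost of a wrong value in your workflow, so treat 0.95 as a placeholder.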
Website intelligence
Works on URLs too. Same API call.
Pass a URL instead of a file. The API launches a headless browser, executes JavaScript, waits for dynamic content to load, then extracts structured data from the live DOM. React SPAs, server-rendered pages, dynamic dashboards — same result.
Enable follow_links to automatically discover and crawl relevant subpages — /about, /pricing, /contact — and fill gaps in your schema from multiple pages in one call. The API decides which links are worth following based on your schema.
const result = await client.extract({
  url: "https://linear.app",
  followLinks: true,
  schema: {
    name: { type: "string", description: "Company name" },
    pricing: { type: "array", description: "Plan names and prices" },
    logo: { type: "string", description: "Logo URL" },
    colors: { type: "array", description: "Brand colors" },
    founders: { type: "array", description: "Founder names" },
  },
});
result = client.extract(
    url="https://linear.app",
    follow_links=True,
    schema={
        "name": {"type": "string", "description": "Company name"},
        "pricing": {"type": "array", "description": "Plan names and prices"},
        "logo": {"type": "string", "description": "Logo URL"},
        "colors": {"type": "array", "description": "Brand colors"},
        "founders": {"type": "array", "description": "Founder names"},
    },
)
curl -X POST https://dev.thedrive.ai/api/v1/extract \
  -H "X-API-Key: tda_live_..." \
  -F "url=https://linear.app" \
  -F "follow_links=true" \
  -F 'schema={
    "name": {"type": "string", "description": "Company name"},
    "pricing": {"type": "array", "description": "Plan names and prices"},
    "logo": {"type": "string", "description": "Logo URL"},
    "colors": {"type": "array", "description": "Brand colors"},
    "founders": {"type": "array", "description": "Founder names"}
  }'
Brand extraction
Logo URL, brand colors, fonts, social profiles — extracted from the rendered DOM, not guessed from raw HTML. Handles JS-heavy SPAs, lazy-loaded content, and dynamically injected elements.
Multi-page research
With follow_links, the API intelligently discovers subpages relevant to your schema — /pricing, /about, /team — and aggregates data across them in one call.
Lead enrichment
Turn a prospect URL into CRM-ready structured data. The API renders the full site, follows relevant pages, and returns exactly the fields your schema defines.
Competitive monitoring
Run the same schema against competitor URLs on a schedule. Track pricing changes, new features, positioning shifts — get structured diffs, not page screenshots.
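This page does not show how the diffs are produced. One possible client-side approach, assuming you store the `data` object from each scheduled run, is a field-by-field comparison. The function `diff_results` and the sample values below are hypothetical and not part of the API.

```python
def diff_results(previous: dict, current: dict) -> dict:
    """Compare two extraction results field by field and report what changed."""
    changes = {}
    for field in sorted(previous.keys() | current.keys()):
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = {"old": old, "new": new}
    return changes


# Hypothetical data from two scheduled runs of the same schema.
last_week = {"name": "ExampleCo", "pricing": ["Free: $0", "Pro: $10"]}
this_week = {"name": "ExampleCo", "pricing": ["Free: $0", "Pro: $12"]}

print(diff_results(last_week, this_week))
```

Because every run returns the same fields, the comparison needs no page-specific parsing; a field that appears or disappears shows up with `None` on one side.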
How it works
One API. Three capabilities.
Define a schema describing what you need. The API handles parsing, reasoning, computation, and verification. Call it directly from your code, or wire it into an agent framework as a tool.
Extract from any file
One schema, any format. The API selects the right parsing pipeline — text extraction, table reconstruction, OCR, vision — and returns the same typed JSON regardless of input.
invoice.pdf → {"vendor": "...", "total": 971.73}
receipt.jpg → {"vendor": "...", "total": 42.50}
scan.heic → {"vendor": "...", "total": 188.00}
Compute and verify
The API runs multi-step reasoning — navigating documents, executing math in a sandboxed environment, cross-checking values — and returns the full reasoning trace for every answer.
"Do line items match the total?" → false ($100 gap)
"Is contract still active?" → true (216 days left)
"Debt-to-equity ratio?" → 1.34 (computed)
Cross-reference documents
Send multiple files in one call. Validate an invoice against a contract. Check that a report's numbers match a source spreadsheet. Reconcile data from different sources.
invoice.pdf + contract.pdf
→ "rates_match": false ($150/hr vs $125/hr)
→ "correct_vendor": true
API
Analyze
POST /api/v1/analyze
Multi-step reasoning with sandboxed computation. Navigates long documents, cross-checks numbers, catches discrepancies, and returns every answer with sources, reasoning traces, and confidence scores.
2 credits/page · 10 credits/site
Extract
POST /api/v1/extract
Schema in, structured data out. Any file or URL. Confidence scores, source citations. The fast path when you just need fields.
1 credit/page · 5 credits/site
Cross-Analyze
POST /api/v1/analyze/cross
Send 2-5 documents. The API reasons across all of them — validating invoices against contracts, reconciling reports against source data. Each document gets its own index.
5 credits/doc + 3 credits/page
Markdown
GET /md/{url}
Convert any URL, document, or audio/video to clean markdown. Audio transcribed via Whisper. JavaScript rendered, boilerplate stripped.
Screenshot
GET /{url}
JPEG, GIF, or MP4 of any URL. Dark mode, full page, custom viewports.
Thumbnails
POST /api/v1/thumbnails
Preview images from 107+ file types. PDFs, spreadsheets, presentations, code files.
Formats
107+ file types
Each format has its own parsing pipeline — content detection, format-specific extraction, table reconstruction, OCR with vision proofreading for scans. You just send the file.
Documents
PDF DOCX DOC ODT RTF PAGES EPUB TXT
Spreadsheets
XLSX XLS ODS CSV TSV NUMBERS
Presentations
PPTX PPT ODP KEY
Images
JPG PNG GIF WebP SVG TIFF BMP HEIC
Video & Audio
MP4 MOV WebM AVI MP3 WAV M4A FLAC OGG
Code & Data
JSON XML YAML HTML PY JS TS GO + 30 more
Use cases
What people build with this
From invoice processing to competitor monitoring — one API for files, URLs, and cross-document validation.
Invoice extraction
Vendor, line items, totals, tax — from PDF, image, or email invoices.
Contract analysis
Parties, dates, clauses, obligations — from agreements and NDAs.
Receipt scanning
Merchant, items, tax, total — from photos and digital receipts.
Medical documents
Patient info, diagnoses, procedure codes — from claims and lab reports.
Real estate
Property details, pricing, terms — from listings and appraisals.
Website extraction
Company info, products, pricing — from any rendered webpage.
Competitor monitoring
Track pricing, features, and hiring signals from competitor sites.
Lead enrichment
Company details, tech stack, contacts — from prospect websites.
Pricing
Pay per call
Usage-based. Free tier to start. No minimum commitment.
extract: 1 credit/page
analyze: 2 credits/page
cross-analyze: 5 credits/doc + 3 credits/page
audio/video transcription: 1 credit/min
markdown, screenshots: 1 credit each
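For budgeting, the per-page rates above can be turned into a quick estimate. This is a rough sketch covering file-based calls only. The function `estimate_credits` is a hypothetical helper, and it assumes the cross-analyze per-page rate applies to the total page count across all documents, which this page does not spell out.

```python
# Per-unit rates as listed in the pricing section above.
RATES = {
    "extract": {"page": 1},
    "analyze": {"page": 2},
    "cross-analyze": {"doc": 5, "page": 3},
}


def estimate_credits(mode: str, pages: int, docs: int = 1) -> int:
    """Estimate credits for one file-based call (assumes pages are summed across docs)."""
    rate = RATES[mode]
    return rate.get("doc", 0) * docs + rate["page"] * pages


print(estimate_credits("extract", pages=12))                # 12
print(estimate_credits("analyze", pages=12))                # 24
print(estimate_credits("cross-analyze", pages=20, docs=2))  # 70
```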
Send a file. Get answers. Start now.
Get an API key and extract structured data from your first document in under a minute.