What You Can Build with PeterParser
Real scenarios. Actual API calls. Specific features that solve specific problems — not vague “AI-powered document processing” promises.
Automate Accounts Payable & Receivable
Invoice preset + webhooks = zero manual data entry
The Problem
Your AP team manually keys invoices into your ERP. Each invoice takes 3-5 minutes. At 500 invoices/month, that's roughly 25 to 42 hours of data entry, plus a 4% error rate that causes payment disputes.
How PeterParser Solves It
Send invoice PDFs to the `/v2/documents` endpoint with the `invoice` preset. PeterParser extracts vendor name, line items, totals, tax, PO numbers, and payment terms into clean JSON. Set a `webhook_url` and the results are POSTed to your system automatically when processing finishes.
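If you are calling the endpoint from application code rather than the command line, the request body can be assembled along these lines. This is a minimal Python sketch: the field names are the ones used in this section's API call, while the helper name `build_invoice_request` and the stand-in PDF bytes are illustrative assumptions, not part of PeterParser's tooling.

```python
import base64
import json


def build_invoice_request(pdf_bytes: bytes, webhook_url: str) -> str:
    """Return the JSON body for an async invoice extraction request.

    Hypothetical helper: only the field names come from this page.
    """
    body = {
        # The PDF travels inline as base64 text.
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "document_type": "invoice",
        "extraction_preset": "invoice",
        "mode": "async",
        "webhook_url": webhook_url,
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Stand-in bytes; in practice read them with open(path, "rb").read().
    payload = build_invoice_request(
        b"%PDF-1.7 example", "https://yourapp.com/api/invoices/ingest"
    )
    print(len(payload), "bytes of JSON ready to POST")
```

The resulting string is what you would send as the POST body, together with the `X-API-Key` and `Content-Type: application/json` headers.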
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"base64": "<invoice_pdf_base64>",
"document_type": "invoice",
"extraction_preset": "invoice",
"mode": "async",
"webhook_url": "https://yourapp.com/api/invoices/ingest"
}'
What You Get
- 16-field invoice preset: vendor, customer, line items, tax, totals, PO numbers
- 99.5% table accuracy on line item extraction
- Char-level grounding: click any amount to see where it appears in the PDF
- Async processing with webhook delivery and HMAC signature verification
3-5 min/invoice → 2 seconds. 99.5% accuracy vs 96% manual.
LlamaParse extracts text but doesn't provide structured JSON with custom schemas. Nanonets requires GPU infrastructure. PeterParser gives you a preset + webhook in one API call.
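On the receiving side, the HMAC signature check mentioned above might look like the sketch below. This page does not specify the signing scheme, so everything here is an assumption to confirm against the API reference: HMAC-SHA256 over the raw request body, a hex-encoded signature, and a shared secret that you hold. The names `sign_body` and `is_valid_signature` are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: bytes, raw_body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed scheme, not confirmed)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, raw_body: bytes, received_signature: str) -> bool:
    """Check a received signature against the one we compute ourselves."""
    expected = sign_body(secret, raw_body)
    # compare_digest runs in constant time, so the check leaks no timing hints.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


if __name__ == "__main__":
    secret = b"whsec_example"            # hypothetical shared secret
    body = b'{"status": "completed"}'    # hypothetical webhook payload
    good = sign_body(secret, body)
    print(is_valid_signature(secret, body, good))         # True
    print(is_valid_signature(secret, body + b" ", good))  # False
```

Verify against the raw bytes of the request body, before any JSON parsing, because re-serializing the payload can change whitespace and break the comparison.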
RAG Pipeline Document Ingestion
Parse → chunk → embed in one call
The Problem
You're building a RAG system and need to ingest thousands of PDFs into your vector store. Raw text extraction loses table structure. Chunking by character count breaks mid-sentence. And you need metadata for filtering.
How PeterParser Solves It
PeterParser preserves table structure and reading order. Enable `chunking.enabled: true` with semantic or sentence-based splitting. Each chunk comes with char offsets for precise retrieval. Use the fast lane (`pre_processing: false`) for text-heavy docs where layout doesn't matter.
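To make the idea of sentence-based splitting with char offsets concrete, here is a small local illustration in Python. It is not PeterParser's implementation: the regex sentence splitter, the function names, and the omission of overlap are all simplifying assumptions, and the tiny chunk size in the demo is only there to force a split.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the source text


def sentence_spans(text: str) -> List[Span]:
    """Offsets of each sentence, splitting on whitespace after '.', '!' or '?'."""
    spans = []
    start = 0
    for match in re.finditer(r"(?<=[.!?])\s+", text):
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def chunk_by_sentence(text: str, max_chunk_size: int) -> List[Span]:
    """Greedily pack whole sentences into chunks of at most max_chunk_size chars.

    A single sentence longer than the limit is kept whole as its own chunk.
    """
    chunks: List[Span] = []
    current = None
    for start, end in sentence_spans(text):
        if current is None:
            current = (start, end)
        elif end - current[0] <= max_chunk_size:
            current = (current[0], end)   # sentence still fits: extend the chunk
        else:
            chunks.append(current)        # close the chunk at a sentence boundary
            current = (start, end)
    if current is not None:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = ("Tables keep their rows. Chunks end on sentence boundaries. "
           "Offsets point back into the source.")
    for start, end in chunk_by_sentence(doc, max_chunk_size=70):
        print(start, end, repr(doc[start:end]))
```

Because every chunk is an offset pair, `doc[start:end]` always recovers the exact source text, which is what makes offset-based retrieval and highlighting possible.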
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/whitepaper.pdf",
"output_format": "markdown",
"chunking": {
"enabled": true,
"max_chunk_size": 1500,
"overlap": 200,
"strategy": "semantic"
},
"classify": { "enabled": true },
"summarize": true
}'
What You Get
- Three chunking strategies: semantic, fixed, sentence-based
- Configurable chunk size (100-10,000 chars) and overlap (0-500 chars)
- Auto document classification for metadata filtering in your vector store
- AI-generated summary for each document
- Fast lane for text-heavy docs: 10x faster, lower cost
1,000 docs/hour with the full pipeline. 5,000/hour on fast lane.
Unstructured offers chunking but with lower table precision. LlamaParse doesn't chunk natively — you need LlamaIndex. PeterParser handles parse + chunk + classify + summarize in one API call.
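Since the chunk size and overlap have documented ranges, it can be convenient to validate them client-side before sending a request. The Python sketch below assumes the limits listed above. The strategy identifiers `"fixed"` and `"sentence"` are guesses (only `"semantic"` appears in the API call on this page), and the rule that overlap must be smaller than the chunk size is a sanity check added here, not a documented constraint.

```python
# "semantic" appears in the example request; the other two names are assumed.
ASSUMED_STRATEGIES = {"semantic", "fixed", "sentence"}


def build_chunking_config(max_chunk_size: int, overlap: int,
                          strategy: str = "semantic") -> dict:
    """Return a `chunking` object for the request body, or raise ValueError."""
    if strategy not in ASSUMED_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not 100 <= max_chunk_size <= 10_000:
        raise ValueError("max_chunk_size must be between 100 and 10,000 chars")
    if not 0 <= overlap <= 500:
        raise ValueError("overlap must be between 0 and 500 chars")
    if overlap >= max_chunk_size:
        # Extra sanity check (an assumption): overlap should not swallow the chunk.
        raise ValueError("overlap must be smaller than max_chunk_size")
    return {
        "enabled": True,
        "max_chunk_size": max_chunk_size,
        "overlap": overlap,
        "strategy": strategy,
    }


if __name__ == "__main__":
    print(build_chunking_config(1500, 200))
```

Failing fast on an out-of-range value keeps a bad setting from turning into a rejected request in the middle of a large ingestion batch.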
Legal Case Chronology Extraction
Any legal document → structured timeline JSON with char-level grounding
The Problem
Your litigation support team manually reads complaints, motions, and case files to build case chronologies. Each document takes hours to review. Approximate dates get lost, Bates number references are inconsistent, and there's no structured format for the timeline — just Word docs and sticky notes.
How PeterParser Solves It
The `legal_timeline` document type extracts a full structured chronology: a case summary with parties and jurisdiction, plus a timeline array containing every datable event. Each event carries a date with its precision (exact, approximate, range, or unknown), an event classification, a display category (court, communication, evidence, medical, financial, witness, or other), legal concept tags, a confidence level, the parties involved, monetary amounts, and statute citations. For provenance, each event also includes a full citation with page numbers, a source snippet, and an AI-generated summary. Use async mode for large filings.
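One way to work with events whose dates vary in precision is to model them explicitly on your side. The Python sketch below is illustrative only: the four precision values come from the description above, but the class and field names (`TimelineEvent`, `precision`, `start`, `end`) are hypothetical and do not describe the actual response schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TimelineEvent:
    """Hypothetical client-side model of one timeline entry."""
    description: str
    precision: str                 # "exact", "approximate", "range" or "unknown"
    start: Optional[date] = None   # None when the date is unknown
    end: Optional[date] = None     # only set for ranges


def chronological(events: List[TimelineEvent]) -> List[TimelineEvent]:
    """Sort by start date, placing undated events after all dated ones."""
    return sorted(events, key=lambda e: (e.start is None, e.start or date.max))


if __name__ == "__main__":
    events = [
        TimelineEvent("Undated phone call", "unknown"),
        TimelineEvent("Complaint filed", "exact", date(2023, 3, 14)),
        TimelineEvent("Contract negotiations", "range",
                      date(2022, 6, 1), date(2022, 9, 30)),
        TimelineEvent("First meeting, circa early 2022", "approximate",
                      date(2022, 1, 15)),
    ]
    for event in chronological(events):
        print(event.start, event.precision, event.description)
```

Keeping the precision alongside the date means an approximate or unknown date is never silently presented as exact when the timeline is rendered.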
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"base64": "<complaint_pdf_base64>",
"document_type": "legal_timeline",
"mode": "async",
"webhook_url": "https://yourapp.com/api/timelines/ready",
"grounding": { "enabled": true },
"summarize": true
}'
What You Get
- Structured timeline with case summary, parties, jurisdiction, and chronological events
- Approximate date handling: circa dates, date ranges, and unknown precision tracked separately
- Char-level grounding with confidence scores for every extracted event
- Event classification: filing, hearing, deposition, order, judgment, settlement, and more
- Bates numbers, statute citations, and case references extracted automatically
- Async batch processing for large case files with webhook delivery
$0.10/page. A 50-page complaint processed in ~2 minutes (async).
CaseFleet, DISCO, and Everchron are closed SaaS platforms with no API. PeterParser is the only REST API that returns structured timeline JSON with char-level grounding, approximate date handling, and async/webhook support.
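For budgeting a batch of filings, the per-page price quoted above turns into a quick estimate. This Python sketch assumes the $0.10 rate applies uniformly to every page, with no volume tiers or free credits; the function names are illustrative.

```python
from typing import Iterable

PRICE_PER_PAGE_CENTS = 10  # $0.10 per page, as quoted on this page


def estimate_cost_cents(page_counts: Iterable[int]) -> int:
    """Total cost in cents for documents with the given page counts."""
    counts = list(page_counts)
    if any(count < 0 for count in counts):
        raise ValueError("page counts cannot be negative")
    # Integer cents avoid floating-point rounding in money arithmetic.
    return sum(counts) * PRICE_PER_PAGE_CENTS


def format_dollars(cents: int) -> str:
    """Render a non-negative amount in cents as dollars."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    print(format_dollars(estimate_cost_cents([50])))           # $5.00
    print(format_dollars(estimate_cost_cents([50, 120, 7])))   # $17.70
```

At that rate, the 50-page complaint in the example comes to $5.00.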
100 Free Credits. No Credit Card.
Parse your first document in under 60 seconds. Every preset, every feature — available immediately.
Get Your API Key