What You Can Build with PeterParser
Real scenarios. Actual API calls. Specific features that solve specific problems — not vague “AI-powered document processing” promises.
Automate Accounts Payable & Receivable
Invoice preset + webhooks = zero manual data entry
The Problem
Your AP team manually keys invoices into your ERP. Each invoice takes 3-5 minutes. At 500 invoices/month, that's 25-40 hours of data entry — plus a 4% error rate that causes payment disputes.
How PeterParser Solves It
Send invoice PDFs to the /v2/documents endpoint with the `invoice` preset. PeterParser extracts vendor name, line items, totals, tax, PO numbers, and payment terms into clean JSON. Set a webhook_url and results POST to your system automatically when done.
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"base64": "<invoice_pdf_base64>",
"document_type": "invoice",
"extraction_preset": "invoice",
"mode": "async",
"webhook_url": "https://yourapp.com/api/invoices/ingest"
}'
What You Get
- 16-field invoice preset: vendor, customer, line items, tax, totals, PO numbers
- 99.5% table accuracy on line item extraction
- Char-level grounding — click any amount to see where it appears in the PDF
- Async processing with webhook delivery and HMAC signature verification
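The HMAC verification mentioned above can be sketched in plain Python; the signature header name and hex-encoded SHA-256 scheme are assumptions here, so check your webhook settings for the exact format.

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature_header: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    to the signature sent with the webhook (scheme is an assumption)."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing attacks.
    return hmac.compare_digest(expected, signature_header)

# Simulate a delivery with a known secret and payload:
body = b'{"status": "completed", "document_id": "doc_123"}'
sig = hmac.new(b"whsec_demo", body, hashlib.sha256).hexdigest()
print(verify_webhook("whsec_demo", body, sig))  # True
```

Verify against the raw bytes of the request body, not the re-serialized JSON, or the digests won't match.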
3-5 min/invoice → 2 seconds. 99.5% accuracy vs 96% manual.
LlamaParse extracts text but doesn't provide structured JSON with custom schemas. Nanonets requires GPU infrastructure. PeterParser gives you a preset + webhook in one API call.
KYC & Identity Verification
ID extraction + PII redaction in a single API call
The Problem
Your onboarding flow requires users to upload government IDs. You need to extract name, DOB, expiry, and document number — but you also need to mask sensitive data before storing it in your logs or audit trail.
How PeterParser Solves It
Use the `identity_document` preset with `pii.detect: true` and `pii.mask: true`. PeterParser extracts all ID fields and returns both the raw extraction AND a PII-masked version. SSN, DOB, and address are automatically detected and redacted with your chosen mask character.
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"base64": "<drivers_license_base64>",
"document_type": "driver_license",
"extraction_preset": "identity_document",
"pii": {
"detect": true,
"mask": true,
"mask_char": "█",
"types": ["ssn", "date_of_birth", "address"]
},
"grounding": { "enabled": true }
}'
What You Get
- Supports driver's licenses, passports, national ID cards, and other identity documents
- 9 PII types detected: SSN, credit card, phone, email, address, name, DOB, bank account, IP
- Masked output with configurable mask character — store safely in logs
- Source grounding proves where each field was found on the document
$0.10/document. PII detection adds $0.002/page.
Most parsing APIs extract OR redact, not both. PeterParser returns structured data with grounding AND masks PII in a single pass. No need for a separate PII service.
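Building the `base64` payload from the curl example above is a few lines of stdlib Python; this sketch only constructs the request body, sending it is left to your HTTP client.

```python
import base64
import json

def build_kyc_payload(pdf_bytes: bytes) -> str:
    """Assemble the JSON body shown in the curl example above."""
    payload = {
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "document_type": "driver_license",
        "extraction_preset": "identity_document",
        "pii": {
            "detect": True,
            "mask": True,
            "mask_char": "█",
            "types": ["ssn", "date_of_birth", "address"],
        },
        "grounding": {"enabled": True},
    }
    return json.dumps(payload)

body = build_kyc_payload(b"%PDF-1.4 demo bytes")
print(json.loads(body)["document_type"])  # driver_license
```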
RAG Pipeline Document Ingestion
Parse → chunk → embed in one call
The Problem
You're building a RAG system and need to ingest thousands of PDFs into your vector store. Raw text extraction loses table structure. Chunking by character count breaks mid-sentence. And you need metadata for filtering.
How PeterParser Solves It
PeterParser preserves table structure and reading order. Enable `chunking.enabled: true` with semantic or sentence-based splitting. Each chunk comes with char offsets for precise retrieval. Use the fast lane (`pre_processing: false`) for text-heavy docs where layout doesn't matter.
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/whitepaper.pdf",
"output_format": "markdown",
"chunking": {
"enabled": true,
"max_chunk_size": 1500,
"overlap": 200,
"strategy": "semantic"
},
"classify": { "enabled": true },
"summarize": true
}'
What You Get
- Three chunking strategies: semantic, fixed, sentence-based
- Configurable chunk size (100-10,000 chars) and overlap (0-500 chars)
- Auto document classification for metadata filtering in your vector store
- AI-generated summary for each document
- Fast lane for text-heavy docs — 10x faster, lower cost
1,000 docs/hour with the full pipeline. 5,000/hour on fast lane.
Unstructured offers chunking but with lower table precision. LlamaParse doesn't chunk natively — you need LlamaIndex. PeterParser handles parse + chunk + classify + summarize in one API call.
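Each returned chunk maps naturally onto a vector-store record. This sketch assumes a response shape with a `chunks` array carrying `text` plus char offsets; the exact field names are assumptions for illustration.

```python
def chunks_to_records(response: dict) -> list[dict]:
    """Flatten a parsed document into records ready for embedding.
    Response field names here are assumptions for illustration."""
    doc_meta = {
        "document_type": response.get("classification", {}).get("type"),
        "summary": response.get("summary"),
    }
    records = []
    for i, chunk in enumerate(response.get("chunks", [])):
        records.append({
            "id": f"{response['document_id']}#{i}",
            "text": chunk["text"],
            # Offsets enable precise retrieval back into the source doc.
            "char_start": chunk["char_start"],
            "char_end": chunk["char_end"],
            **doc_meta,
        })
    return records

sample = {
    "document_id": "doc_42",
    "summary": "A short whitepaper.",
    "classification": {"type": "report"},
    "chunks": [{"text": "Intro...", "char_start": 0, "char_end": 8}],
}
print(chunks_to_records(sample)[0]["id"])  # doc_42#0
```

The `classification` and `summary` fields become filterable metadata on every chunk, which is exactly what vector stores need for scoped retrieval.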
Bank Statement Reconciliation at Scale
1000-page statements → structured transactions in minutes
The Problem
Your lending platform processes bank statements for underwriting. Statements range from 2 to 1,000+ pages. Manual extraction is impossible at scale, and most APIs choke on documents over 50 pages.
How PeterParser Solves It
PeterParser automatically routes large documents through chunked parallel extraction. A 1000-page statement gets split into ~34 chunks, each processed in parallel. Transactions are merged, deduplicated, and returned as a single JSON array. The `bank_statement` preset captures account info, balances, and every transaction.
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://secure.bank.com/statement.pdf",
"document_type": "bank_statement",
"extraction_preset": "bank_statement",
"mode": "async",
"webhook_url": "https://yourapp.com/api/statements/ready"
}'
# Monitor all your jobs in real-time:
curl -N -H "X-API-Key: pp_live_..." \
"https://api.peterparser.com/v2/events?ttl=600"
What You Get
- Chunked parallel extraction — 1000+ pages handled automatically
- Bank statement preset: account holder, account number, period, opening/closing balance, every transaction with date/description/amount/type/category
- SSE real-time events — one connection monitors all your async jobs
- Transactions deduplicated across chunks automatically
1000-page statement processed in ~3 minutes (async). $0.75/document flat rate.
Most APIs have a 50-page limit or timeout on large documents. Reducto handles large docs but charges per page. PeterParser auto-chunks, merges results, and charges a flat per-document rate for bank statements.
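The SSE stream above delivers `event:`/`data:` frames separated by blank lines. A minimal stdlib parser for those frames looks like this; the event name and payload shape are assumptions.

```python
import json

def parse_sse(stream: str):
    """Yield (event, data) pairs from raw SSE text.
    Frames are separated by a blank line per the SSE format."""
    event, data_lines = "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # Blank line terminates the frame; emit and reset.
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []

raw = 'event: document.completed\ndata: {"document_id": "doc_7"}\n\n'
for name, payload in parse_sse(raw):
    print(name, payload["document_id"])  # document.completed doc_7
```

In production you would feed this incrementally from the `curl -N` style streaming response rather than a complete string.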
Contract Analysis with Source Grounding
Extract clauses and prove where every value came from
The Problem
Your legal team reviews 200 vendor contracts per quarter. They need to extract payment terms, liability limits, termination clauses, and governing law — and the extraction must be auditable, showing exactly where each value was found.
How PeterParser Solves It
The `contract` preset extracts parties, key terms, dates, and signatures. Enable `grounding.enabled: true` to get char-level source references for every extracted field. Each grounding ref includes the field name, extracted value, source text with context, character positions, page number, and confidence score.
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"base64": "<contract_pdf_base64>",
"document_type": "contract",
"extraction_preset": "contract",
"grounding": {
"enabled": true,
"include_source_text": true,
"include_confidence": true
}
}'
# Response includes:
# "grounding": [
# {
# "field": "key_terms.payment_amount",
# "value": 150000,
# "source_text": "...total compensation of $150,000 per annum...",
# "char_start": 4521,
# "char_end": 4529,
# "page": 3,
# "confidence": 1.0
# }
# ]
What You Get
- Contract preset: parties, roles, effective/expiry dates, auto-renewal, payment terms, termination, liability, governing law, signatures
- Char-level grounding with page number, character offsets, and surrounding context
- Confidence scores for each extracted value
- Handles multi-party contracts with nested party arrays
$0.05/page. A 30-page contract costs $1.50 with full grounding.
No other parsing API offers char-level grounding out of the box. LlamaParse gives you text. Docsumo gives you key-value pairs. PeterParser gives you structured extraction with an audit trail showing exactly where every value came from.
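Grounding refs like the response above let you recover the exact source span, assuming you keep the full extracted text alongside the result. A review-UI highlighter is then just string slicing:

```python
def highlight(full_text: str, ref: dict, context: int = 20) -> str:
    """Return the grounded span in brackets with surrounding context,
    using char offsets from a grounding ref like the response above."""
    start, end = ref["char_start"], ref["char_end"]
    span = full_text[start:end]
    before = full_text[max(0, start - context):start]
    after = full_text[end:end + context]
    return f"...{before}[{span}]{after}..."

# Synthetic document text placing "$150,000" at offset 4521:
text = "x" * 4521 + "$150,000" + " per annum payable monthly"
ref = {"field": "key_terms.payment_amount", "char_start": 4521, "char_end": 4529}
print(highlight(text, ref))
```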
Bulk Tax Form Processing (W2 / 1099)
Process thousands of tax forms during filing season
The Problem
During tax season, your accounting firm receives thousands of W2s and 1099s from clients. Each needs employer info, employee info, all wage boxes, and state tax info extracted into your tax software.
How PeterParser Solves It
Use `document_type: auto` to let PeterParser detect whether each document is a W2 or 1099. The `w2_tax` and `1099_tax` presets extract all IRS fields including employer EIN, SSN (last four only), all wage boxes, federal/state/local tax withholding, and control numbers. Process in bulk with async mode and SSE to monitor progress.
API Call
# Submit batch of tax forms
for file in tax_forms/*.pdf; do
curl -X POST https://api.peterparser.com/v2/documents/upload \
-H "X-API-Key: pp_live_..." \
-F "file=@$file" \
-F "document_type=auto" \
-F "pii_detect=true" \
-F "mode=async"
done
# Monitor all completions from one SSE stream
curl -N -H "X-API-Key: pp_live_..." \
"https://api.peterparser.com/v2/events?ttl=3600"
What You Get
- Auto-detection distinguishes W2s from 1099s with no manual sorting
- W2 preset: all wage boxes (1-6), state/local info, employer EIN, control number
- 1099 preset: payer/recipient TIN, nonemployee compensation, withholding
- PII detection masks SSN to last-four only in output
- SSE stream monitors entire batch from one connection
$0.30/document flat rate. 1000 W2s = $300, processed in ~20 minutes.
Google Document AI charges per page and requires GCP setup. Nanonets needs training data. PeterParser works out of the box with zero training — the W2 preset knows every IRS field.
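The flat-rate math above ($0.30/document, 1000 W2s in ~20 minutes, i.e. roughly 50 docs/minute) makes batch budgeting a one-liner; the throughput constant is derived from that quote, not a guaranteed SLA.

```python
def batch_estimate(num_docs: int, rate_per_doc: float = 0.30,
                   docs_per_minute: float = 50.0) -> tuple[float, float]:
    """Estimate cost (USD) and wall-clock time (minutes) for a batch.
    Defaults come from the figures quoted above."""
    return num_docs * rate_per_doc, num_docs / docs_per_minute

cost, minutes = batch_estimate(1000)
print(f"${cost:.2f} in ~{minutes:.0f} min")  # $300.00 in ~20 min
```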
Website Scraping to Structured Data
Any URL → structured JSON with CSS selectors
The Problem
You need to extract product data, pricing, or content from competitor websites or public listings. Traditional scraping gives you raw HTML. You need structured, typed data.
How PeterParser Solves It
The website parsing endpoint fetches any URL, extracts content, and returns structured data. Use CSS selectors for precision extraction. Crawl depth 1-3 for multi-page scraping. Output as JSON, Markdown, or text — ready for your database or LLM context.
API Call
curl -X POST https://api.peterparser.com/v2/documents/website \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/products/widget",
"extract_links": true,
"extract_images": true,
"extract_metadata": true,
"output_format": "json",
"custom_selectors": {
"price": ".product-price",
"title": "h1.product-title",
"specs": ".specifications li"
},
"max_depth": 2
}'
What You Get
- CSS selector-based custom extraction — target exactly the data you need
- Crawl depth 1-3 for multi-page scraping
- Extracts links, images, and meta tags automatically
- Output as JSON, Markdown, or plain text
- $0.005/page — scrape 10,000 pages for $50
$0.005/page. Sub-second response for single pages.
Firecrawl and Jina focus on scraping but don't offer document parsing. PeterParser handles both websites AND documents through the same API, same key, same billing.
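Selector results presumably come back keyed by the names you chose in `custom_selectors`. A typical post-processing step coerces the raw strings into typed fields; the response shape assumed here is an illustration, not the documented schema.

```python
import re

def normalize_product(extracted: dict) -> dict:
    """Turn raw selector matches into typed fields. Assumes one
    string (or list of strings) per selector key in the response."""
    price_text = extracted.get("price", "")
    # Pull the first numeric token out of e.g. "$1,299.99".
    match = re.search(r"[\d,]+(?:\.\d+)?", price_text)
    return {
        "title": extracted.get("title", "").strip(),
        "price": float(match.group().replace(",", "")) if match else None,
        "specs": [s.strip() for s in extracted.get("specs", [])],
    }

raw = {"price": "$1,299.99", "title": " Widget Pro ", "specs": ["5 kg ", "Steel"]}
print(normalize_product(raw)["price"])  # 1299.99
```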
Medical Records Processing (HIPAA-Ready)
Extract clinical data with automatic PII masking
The Problem
Your healthtech platform needs to digitize patient intake forms, lab reports, and clinical notes. You must extract structured medical data while ensuring all PHI (Protected Health Information) is properly handled.
How PeterParser Solves It
The `medical_record` preset extracts patient info, visit details, vitals, diagnoses (with ICD codes), medications, allergies, lab results with reference ranges, and care plans. Enable PII masking to automatically redact patient names, DOBs, MRNs, and addresses before the data hits your storage.
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"base64": "<medical_record_base64>",
"document_type": "medical",
"extraction_preset": "medical_record",
"pii": {
"detect": true,
"mask": true,
"types": ["name", "date_of_birth", "address", "phone", "ssn"]
},
"grounding": { "enabled": true }
}'
What You Get
- Medical preset: patient demographics, visit type, vitals (BP, HR, temp, weight, height), diagnoses with codes, medications with dosage/frequency/route, allergies, lab results with reference ranges, plan, follow-up
- PII masking for HIPAA compliance — mask PHI before storage
- Source grounding for clinical audit trails
- OCR handles scanned clinical documents and faxed records
$0.05/page. A 10-page patient record costs $0.50.
AWS Textract Medical exists but requires AWS infrastructure and separate PII services. PeterParser handles extraction + PII + grounding in one call with no cloud vendor lock-in.
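The masking step replaces each detected PII span in place, and its effect can be reproduced locally. The entity shape with `char_start`/`char_end` offsets assumed below mirrors the grounding format; it is an illustration, not the documented schema.

```python
def apply_masks(text: str, entities: list[dict], mask_char: str = "█") -> str:
    """Replace each detected PII span with the mask character,
    working right-to-left so earlier offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e["char_start"], reverse=True):
        start, end = ent["char_start"], ent["char_end"]
        text = text[:start] + mask_char * (end - start) + text[end:]
    return text

note = "Patient Jane Doe, DOB 01/02/1980, presents with..."
entities = [
    {"type": "name", "char_start": 8, "char_end": 16},
    {"type": "date_of_birth", "char_start": 22, "char_end": 32},
]
print(apply_masks(note, entities))
# Patient ████████, DOB ██████████, presents with...
```

Masking preserves string length, so any grounding offsets computed against the original text remain valid on the masked copy.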
Legal Case Chronology Extraction
Any legal document → structured timeline JSON with char-level grounding
The Problem
Your litigation support team manually reads complaints, motions, and case files to build case chronologies. Each document takes hours to review. Approximate dates get lost, Bates number references are inconsistent, and there's no structured format for the timeline — just Word docs and sticky notes.
How PeterParser Solves It
The `legal_timeline` document type extracts a full structured chronology: case summary with parties and jurisdiction, plus a timeline array with every datable event. Each event includes date handling (exact, approximate, range, unknown), event classification, display category (court, communication, evidence, medical, financial, witness, other), legal concept tags, confidence levels, party involvement, monetary amounts, statute citations, and full citation with page numbers, source snippets, and AI-generated summaries for provenance. Use async mode for large filings.
API Call
curl -X POST https://api.peterparser.com/v2/documents \
-H "X-API-Key: pp_live_..." \
-H "Content-Type: application/json" \
-d '{
"base64": "<complaint_pdf_base64>",
"document_type": "legal_timeline",
"mode": "async",
"webhook_url": "https://yourapp.com/api/timelines/ready",
"grounding": { "enabled": true },
"summarize": true
}'
What You Get
- Structured timeline with case summary, parties, jurisdiction, and chronological events
- Approximate date handling — circa dates, date ranges, and unknown precision tracked separately
- Char-level grounding with confidence scores for every extracted event
- Event classification: filing, hearing, deposition, order, judgment, settlement, and more
- Bates numbers, statute citations, and case references extracted automatically
- Async batch processing for large case files with webhook delivery
$0.10/page. A 50-page complaint processed in ~2 minutes (async).
CaseFleet, DISCO, and Everchron are closed SaaS platforms with no API. PeterParser is the only REST API that returns structured timeline JSON with char-level grounding, approximate date handling, and async/webhook support.
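Approximate dates need care when ordering the timeline. A sorting sketch, assuming each event carries an ISO `date` string (possibly null) and a `date_precision` flag; the field names are assumptions based on the precision levels described above.

```python
from datetime import date

PRECISION_RANK = {"exact": 0, "approximate": 1, "range": 2, "unknown": 3}

def sort_timeline(events: list[dict]) -> list[dict]:
    """Order events chronologically; undated events sink to the end,
    and exact dates win ties over approximate ones."""
    def key(ev):
        d = ev.get("date")
        parsed = date.fromisoformat(d) if d else date.max
        return (parsed, PRECISION_RANK.get(ev.get("date_precision"), 3))
    return sorted(events, key=key)

events = [
    {"event": "Hearing", "date": "2023-06-01", "date_precision": "exact"},
    {"event": "Injury", "date": "2023-01-15", "date_precision": "approximate"},
    {"event": "Lost receipt", "date": None, "date_precision": "unknown"},
]
print([e["event"] for e in sort_timeline(events)])
# ['Injury', 'Hearing', 'Lost receipt']
```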
100 Free Credits. No Credit Card.
Parse your first document in under 60 seconds. Every preset, every feature — available immediately.
Get Your API Key