What You Can Build with PeterParser

Real scenarios. Actual API calls. Specific features that solve specific problems — not vague “AI-powered document processing” promises.

Automate Accounts Payable & Receivable

Invoice preset + webhooks = zero manual data entry

The Problem

Your AP team manually keys invoices into your ERP. Each invoice takes 3-5 minutes. At 500 invoices/month, that's roughly 25 to 42 hours of data entry, plus a 4% error rate that causes payment disputes.

How PeterParser Solves It

Send invoice PDFs to the `/v2/documents` endpoint with the `invoice` preset. PeterParser extracts vendor name, line items, totals, tax, PO numbers, and payment terms into clean JSON. Set a `webhook_url` and PeterParser POSTs the results to your system when processing finishes.

API Call

curl -X POST https://api.peterparser.com/v2/documents \
  -H "X-API-Key: pp_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "base64": "<invoice_pdf_base64>",
    "document_type": "invoice",
    "extraction_preset": "invoice",
    "mode": "async",
    "webhook_url": "https://yourapp.com/api/invoices/ingest"
  }'
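The same request can be assembled from Python. The sketch below mirrors the curl call above using only the standard library. The helper name `build_invoice_request` is an illustrative assumption, and the request is built but not sent.

```python
import base64
import json
import urllib.request

API_URL = "https://api.peterparser.com/v2/documents"


def build_invoice_request(
    pdf_bytes: bytes, api_key: str, webhook_url: str
) -> urllib.request.Request:
    """Build (but do not send) the async invoice request from the curl call."""
    payload = {
        # The PDF travels as a base64 string inside the JSON body.
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "document_type": "invoice",
        "extraction_preset": "invoice",
        "mode": "async",
        "webhook_url": webhook_url,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. Because the mode is async, the extracted fields arrive later at the webhook rather than in this response.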

What You Get

  • 16-field invoice preset: vendor, customer, line items, tax, totals, PO numbers
  • 99.5% table accuracy on line item extraction
  • Char-level grounding — click any amount to see where it appears in the PDF
  • Async processing with webhook delivery and HMAC signature verification

Impact

3-5 min/invoice → 2 seconds. 99.5% accuracy vs 96% manual.

vs Alternatives

LlamaParse extracts text but doesn't provide structured JSON with custom schemas. Nanonets requires GPU infrastructure. PeterParser gives you a preset + webhook in one API call.
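On the receiving side, the HMAC signature verification listed under What You Get usually amounts to recomputing a keyed hash of the raw webhook body and comparing it with the signature sent alongside the request. The following is a minimal sketch that assumes a hex-encoded HMAC-SHA256 over the raw body. The actual algorithm, encoding, and header name are not specified on this page, and `verify_webhook_signature` is a hypothetical helper.

```python
import hashlib
import hmac


def verify_webhook_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Return True if signature_hex equals HMAC-SHA256(secret, raw_body) in hex.

    ASSUMPTION: hex-encoded HMAC-SHA256 over the unmodified request body.
    Confirm the real scheme in the PeterParser webhook documentation.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest takes time independent of where the inputs differ,
    # which avoids leaking the expected signature through timing.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

Verify against the raw bytes exactly as received, before any JSON parsing, since re-serializing the body can change whitespace and invalidate the signature.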

RAG Pipeline Document Ingestion

Parse → chunk → embed in one call

The Problem

You're building a RAG system and need to ingest thousands of PDFs into your vector store. Raw text extraction loses table structure. Chunking by character count breaks mid-sentence. And you need metadata for filtering.

How PeterParser Solves It

PeterParser preserves table structure and reading order. Enable `chunking.enabled: true` with semantic or sentence-based splitting. Each chunk comes with char offsets for precise retrieval. Use the fast lane (`pre_processing: false`) for text-heavy docs where layout doesn't matter.

API Call

curl -X POST https://api.peterparser.com/v2/documents \
  -H "X-API-Key: pp_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/whitepaper.pdf",
    "output_format": "markdown",
    "chunking": {
      "enabled": true,
      "max_chunk_size": 1500,
      "overlap": 200,
      "strategy": "semantic"
    },
    "classify": { "enabled": true },
    "summarize": true
  }'
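When ingesting thousands of documents, it can help to build this payload in code and catch out-of-range chunking values before they reach the API. The sketch below uses the chunk size and overlap limits listed on this page. The function name `build_rag_payload` is hypothetical, and the check that overlap stays below the chunk size is a local sanity check, not a documented API rule.

```python
import json

CHUNK_SIZE_RANGE = (100, 10_000)  # chars, limits as listed on this page
OVERLAP_RANGE = (0, 500)          # chars, limits as listed on this page


def build_rag_payload(
    url: str,
    max_chunk_size: int = 1500,
    overlap: int = 200,
    strategy: str = "semantic",
) -> str:
    """Return the JSON body for a parse + chunk + classify + summarize call."""
    lo, hi = CHUNK_SIZE_RANGE
    if not lo <= max_chunk_size <= hi:
        raise ValueError(f"max_chunk_size must be between {lo} and {hi}")
    lo, hi = OVERLAP_RANGE
    if not lo <= overlap <= hi:
        raise ValueError(f"overlap must be between {lo} and {hi}")
    # Local sanity check (assumption): an overlap as large as the chunk
    # itself would leave no new text in each successive chunk.
    if overlap >= max_chunk_size:
        raise ValueError("overlap must be smaller than max_chunk_size")
    payload = {
        "url": url,
        "output_format": "markdown",
        "chunking": {
            "enabled": True,
            "max_chunk_size": max_chunk_size,
            "overlap": overlap,
            "strategy": strategy,
        },
        "classify": {"enabled": True},
        "summarize": True,
    }
    return json.dumps(payload)
```

The defaults reproduce the body of the curl call above, so `build_rag_payload("https://example.com/whitepaper.pdf")` can be posted as-is.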

What You Get

  • Three chunking strategies: semantic, fixed, sentence-based
  • Configurable chunk size (100-10,000 chars) and overlap (0-500 chars)
  • Auto document classification for metadata filtering in your vector store
  • AI-generated summary for each document
  • Fast lane for text-heavy docs: 10x faster, lower cost

Impact

1,000 docs/hour with the full pipeline. 5,000/hour on fast lane.

vs Alternatives

Unstructured offers chunking but with lower table precision. LlamaParse doesn't chunk natively — you need LlamaIndex. PeterParser handles parse + chunk + classify + summarize in one API call.
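To make the `max_chunk_size` and `overlap` parameters concrete, here is a minimal local sketch of fixed-size chunking with character offsets. It illustrates the general technique only and is not PeterParser's implementation. The `Chunk` type and `fixed_chunks` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int  # offset of the first character in the source text
    end: int    # offset one past the last character


def fixed_chunks(text: str, max_chunk_size: int, overlap: int) -> list[Chunk]:
    """Split text into windows of max_chunk_size chars sharing `overlap` chars."""
    if max_chunk_size <= 0 or not 0 <= overlap < max_chunk_size:
        raise ValueError("need max_chunk_size > 0 and 0 <= overlap < max_chunk_size")
    step = max_chunk_size - overlap  # how far each window advances
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + max_chunk_size, len(text))
        chunks.append(Chunk(text[start:end], start, end))
        if end == len(text):
            break  # the window reached the end; skip a redundant tail chunk
        start += step
    return chunks
```

With a chunk size of 1,500 and an overlap of 200, as in the call above, each window advances 1,300 characters, and the stored offsets let a retrieved chunk be mapped back to its exact position in the source text.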

100 Free Credits. No Credit Card.

Parse your first document in under 60 seconds. Every preset, every feature — available immediately.

Get Your API Key