PeterParser Docs: v2 Public API

Parse any document into structured data with a single API call. Base URL: https://api.peterparser.ai/v2

Quickstart

Step 1: Get your API key

Sign up at peterparser.ai and you get 100 free credits instantly. API keys have the form pp_live_...

Step 2: Parse your first document

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/invoice.pdf",
    "documentType": "invoice"
  }'
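The same request can be sent from Python using only the standard library. This is a minimal sketch, not an official SDK: the function names build_parse_request and parse_document are our own, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://api.peterparser.ai/v2"


def build_parse_request(api_key: str, url: str, document_type: str = "auto") -> urllib.request.Request:
    """Build (without sending) the POST /v2/documents request from the curl example."""
    body = json.dumps({"url": url, "documentType": document_type}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/documents",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def parse_document(api_key: str, url: str, document_type: str = "auto") -> dict:
    """Send the request and decode the JSON response (network access required)."""
    request = build_parse_request(api_key, url, document_type)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a real key):
# result = parse_document("pp_live_your_key", "https://example.com/invoice.pdf", "invoice")
# print(result["data"]["invoice_number"])
```

Building the request separately from sending it makes it easy to inspect the headers and body before anything goes over the network.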

Step 3: Get structured data back

The response includes the extracted data, document metadata, content in several formats, enrichments, processing details, and usage. Enable the summarize, chunking, or pii add-ons to populate the corresponding enrichment fields.

json
{
  "success": true,
  "data": {
    "invoice_number": "INV-2024-001",
    "vendor": { "name": "Acme Corp", "address": "123 Main St" },
    "total": 1500.00,
    "tax": 135.00,
    "line_items": [
      { "description": "Widget Pro", "quantity": 10, "unit_price": 150.00 }
    ]
  },
  "document": {
    "type": "invoice",
    "pages": 2,
    "language": "en",
    "filename": "invoice.pdf"
  },
  "content": {
    "format": "json",
    "json": "{ \"invoice_number\": \"INV-2024-001\", ... }",
    "markdown": "# Invoice INV-2024-001\n...",
    "text": "Invoice INV-2024-001\nVendor: Acme Corp\nTotal: $1,500.00"
  },
  "enrichments": {
    "summary": null,
    "chunks": null,
    "grounding": [{ "field": "total", "value": 1500.00, "sourceText": "Total: $1,500.00", "page": 1, "confidence": 1.0 }],
    "pii": null
  },
  "processing": {
    "premiumModel": false,
    "preprocessing": false,
    "timeMs": 1250,
    "cached": false
  },
  "usage": {
    "pages": 2,
    "documentType": "invoice",
    "pricePerUnit": 0.01,
    "totalCredits": 0.02,
    "displayCost": "0.0200 credits",
    "breakdown": { "base": 0.02, "premiumModelSurcharge": 0, "piiSurcharge": 0, "summarizeFee": 0 }
  }
}

Add-on fields: enrichments.summary, enrichments.chunks, and enrichments.pii are null by default and populate only when you enable the corresponding option (summarize, chunking, or pii). enrichments.grounding is always populated, because grounding is enabled on every request.
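Because add-on fields stay null until you enable them, client code should check for None before reading them. A small sketch, with a hypothetical helper name, that lists which enrichment fields a decoded response actually contains:

```python
from typing import Any

ENRICHMENT_FIELDS = ("summary", "chunks", "grounding", "pii")


def populated_enrichments(response: dict[str, Any]) -> list[str]:
    """Return the names of enrichment fields that are not null in a parse response."""
    enrichments = response.get("enrichments") or {}
    # JSON null becomes None after json.loads, so "is not None" separates enabled add-ons.
    return [name for name in ENRICHMENT_FIELDS if enrichments.get(name) is not None]
```

For the quickstart response above, this returns only ["grounding"].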

Authentication

The v2 public API uses API key authentication: include your key in the X-API-Key header. There are no CORS restrictions, so you can call the API from any origin.

Endpoint           Auth                      CORS
/health/*          None                      Open
/v2/pricing/*      None                      Open
/v2/documents/*    X-API-Key: pp_live_...    Open
/v2/events/*       X-API-Key: pp_live_...    Open

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf"}'

IP Whitelisting

Each API key supports an IP whitelist. The default, ["0.0.0.0"], allows all IPs. To restrict access, list specific IPs or CIDR ranges, for example ["10.0.0.0/8"].
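The server-side matching logic is not documented here, but the sketch below shows how a whitelist of single IPs and CIDR ranges is typically evaluated, which can help you sanity-check a list before saving it. It treats "0.0.0.0" as allow-all, as described above; the function name ip_allowed is hypothetical.

```python
import ipaddress


def ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """Return True if client_ip matches any whitelist entry (a single IP or a CIDR range)."""
    address = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        # The docs define ["0.0.0.0"] as "allow all IPs".
        if entry == "0.0.0.0":
            return True
        # ip_network accepts "10.0.0.0/8" as well as a bare address (treated as a /32 or /128).
        # Membership is always False when IPv4 and IPv6 are mixed.
        if address in ipaddress.ip_network(entry, strict=False):
            return True
    return False
```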

Document Types

Set documentType to a specific type, or use "auto" to let the API detect the type automatically.

Tip: Use GET /v2/documents/{type}/sample to preview output and GET /v2/documents/{type}/meta to see expected fields.

Legal Timeline (legal_timeline)

Extracts a structured chronology from legal documents. The response contains a case_summary with the parties, jurisdiction, and key dates, plus a timeline array of events. Each event carries normalized date information, an event classification, a display category, legal tags, a confidence_level, and a full citation (page numbers, a source snippet, and a source summary) for provenance. Async mode is recommended for large filings. This type requires preprocessing: set "preprocessing": true so that Gemini vision extracts the text first and the resulting chunks are then processed for structured extraction.

json
// Response — legal_timeline data structure
{
  "case_summary": {
    "case_name": "Smith v. Acme Industries",
    "case_number": "2024-CV-04521",
    "court": "U.S. District Court, SDNY",
    "parties": {
      "plaintiff": ["John Smith"],
      "defendant": ["Acme Industries, Inc."],
      "counsel": ["Jane Doe, Esq."]
    },
    "timeline_span": { "start": "2023-06-01", "end": "2024-11-20" }
  },
  "timeline": [
    {
      "event_id": "evt_001",
      "date": {
        "raw": "around June 2023",
        "iso": "2023-06-01",
        "precision": "approximate",
        "circa": true
      },
      "title": "Product malfunction incident",
      "event_type": "incident",
      "category": "other",
      "legal_significance": "critical",
      "confidence_level": "medium",
      "tags": ["negligence", "liability", "causation", "damages"],
      "parties_involved": [
        { "name": "John Smith", "role": "plaintiff", "entity_type": "person" }
      ],
      "amounts": [{ "value": 45000, "currency": "USD", "description": "Medical expenses" }],
      "document_numbers": ["SMITH-00001"],
      "citation": {
        "page_number": 3,
        "page_range": [3, 4],
        "section": "Statement of Facts",
        "paragraph": "¶ 14",
        "source_snippet": "On or about June 2023, Plaintiff sustained injuries when the Acme Model X device malfunctioned during normal use at plaintiff's residence...",
        "source_summary": "Describes the initial product malfunction incident that caused plaintiff's injuries at his Brooklyn residence.",
        "bates_range": "SMITH-00045 to SMITH-00046"
      },
      "grounding": {
        "charStart": 245,
        "charEnd": 412,
        "sourceText": "On or about June 2023, Plaintiff sustained injuries...",
        "confidence": 0.98,
        "page_number": 3,
        "page_range": [3, 4],
        "context_before": "...had used the device without issue for six months.",
        "context_after": "Plaintiff was transported to Brooklyn Methodist Hospital..."
      },
      "flags": { "is_key_event": true, "is_disputed": false, "needs_review": false }
    }
  ]
}
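A sketch of consuming this structure: list the events flagged as key events in chronological order, with the cited page for each. It relies only on fields shown in the example response; the helper name key_events is hypothetical, and it assumes date.iso may be missing for undated events.

```python
from typing import Any


def key_events(data: dict[str, Any]) -> list[str]:
    """Return key events in date order as 'ISO date | title | page N' lines."""
    events = [e for e in data.get("timeline", []) if e.get("flags", {}).get("is_key_event")]
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as strings; undated events sort first.
    events.sort(key=lambda e: e.get("date", {}).get("iso") or "")
    lines = []
    for event in events:
        date = event.get("date", {})
        # Prefix approximate dates with "~" so they are not read as exact.
        marker = "~" if date.get("circa") else ""
        page = event.get("citation", {}).get("page_number")
        lines.append(f"{marker}{date.get('iso')} | {event['title']} | page {page}")
    return lines
```

For the example response above, this returns a single line: ~2023-06-01 | Product malfunction incident | page 3.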

Dynamic Endpoint Config via /meta

GET /v2/documents/{type}/meta returns configuration that admins set for each document type: asyncOnly, defaultOptions, mandatoryOptions, and processingModes. The response also includes pricing, expectedFields, and recommendedOptions.

json
// GET /v2/documents/legal_timeline/meta
{
  "documentType": "legal_timeline",
  "pricing": { "pricePerUnit": 0.10, "unit": "page" },
  "expectedFields": ["case_summary", "timeline", "parties", "jurisdiction", "key_dates", "timeline_span", "category", "tags", "confidence_level", "citation", "grounding"],
  "asyncOnly": false,
  "defaultOptions": {},
  "processingModes": ["sync", "async"],
  "recommendedOptions": { "outputFormat": "json", "summarize": true, "premiumModel": true },
  "mandatoryOptions": { "preprocessing": true }
}
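One way a client might combine these settings with its own request options is sketched below. The precedence (defaultOptions, then your options, then mandatoryOptions) and the choice to set forceAsync when asyncOnly is true are our assumptions, not documented server behavior; recommendedOptions are left for you to apply explicitly. effective_options is a hypothetical name.

```python
from typing import Any


def effective_options(meta: dict[str, Any], user_options: dict[str, Any]) -> dict[str, Any]:
    """Merge /meta settings with caller options (assumed precedence: defaults < user < mandatory)."""
    options = dict(meta.get("defaultOptions", {}))
    options.update(user_options)
    # Mandatory options always win, e.g. {"preprocessing": true} for legal_timeline.
    options.update(meta.get("mandatoryOptions", {}))
    # Assumption: for async-only types, request async explicitly.
    if meta.get("asyncOnly"):
        options["forceAsync"] = True
    return options
```

For legal_timeline, this turns a request with "preprocessing": false into one with "preprocessing": true, matching the mandatory option above.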

Input Methods

There are three ways to send a document. All three go through the same parsing pipeline.

URL: POST /v2/documents

Pass a public URL. PeterParser downloads and parses it server-side.

json
{ "url": "https://example.com/document.pdf", "documentType": "auto" }

Base64: POST /v2/documents

Encode the file as base64. Include filename for better type detection.

json
{
  "base64": "JVBERi0xLjQK...",
  "filename": "invoice.pdf",
  "contentType": "application/pdf",
  "documentType": "invoice"
}
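A minimal Python sketch for building this body from a local file. The helper name base64_payload is hypothetical, and guessing contentType from the file extension with mimetypes is our choice, not an API requirement.

```python
import base64
import mimetypes
from pathlib import Path


def base64_payload(path: str, document_type: str = "auto") -> dict[str, str]:
    """Read a local file and build the JSON body for a base64 POST /v2/documents request."""
    file_path = Path(path)
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    # Guess the MIME type from the extension; fall back to a generic binary type.
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    return {
        "base64": encoded,
        "filename": file_path.name,
        "contentType": content_type,
        "documentType": document_type,
    }
```

For example, a file that starts with the bytes %PDF-1.4 followed by a newline encodes to JVBERi0xLjQK, the same prefix as in the example body.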

File upload: POST /v2/documents/upload

Send the file as multipart/form-data. This is the most direct option when the file is on local disk.

curl -X POST https://api.peterparser.ai/v2/documents/upload \
  -H "X-API-Key: pp_live_your_key" \
  -F "file=@invoice.pdf" \
  -F "documentType=invoice" \
  -F "outputFormat=json"
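curl -F assembles the multipart/form-data body for you. To send the same upload from Python without third-party packages, a minimal sketch of the encoding follows. The helper name multipart_body is hypothetical; the form field names file, documentType, and outputFormat come from the curl example.

```python
import uuid


def multipart_body(fields: dict[str, str], file_field: str, filename: str,
                   file_bytes: bytes, file_type: str = "application/octet-stream") -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex  # Random boundary, very unlikely to appear inside the file.
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8") + file_bytes + b"\r\n")
    # The closing boundary ends with two extra hyphens.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Sending it (network access and a real key required):
# import urllib.request
# with open("invoice.pdf", "rb") as f:
#     body, content_type = multipart_body({"documentType": "invoice", "outputFormat": "json"},
#                                         "file", "invoice.pdf", f.read(), "application/pdf")
# req = urllib.request.Request("https://api.peterparser.ai/v2/documents/upload", data=body, method="POST",
#                              headers={"X-API-Key": "pp_live_your_key", "Content-Type": content_type})
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```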

Output Formats

The content object in the response always contains json, markdown, and text representations. Set outputFormat to choose the primary format, which is reported in content.format. Supported values:

json: Structured JSON with typed fields (default).
markdown: Clean Markdown that preserves tables and headings.
text: Plain text with no formatting.
html: HTML that preserves document structure.
csv: CSV for tabular data extraction.
xml: XML with structured tags.

json
// The "content" object in every response:
"content": {
  "json": "{ \"invoice_number\": \"INV-001\", ... }",
  "markdown": "# Invoice INV-001\n\n| Item | Amount |\n|---|---|\n| Widget | $500 |",
  "text": "Invoice INV-001\nItem: Widget — $500"
}

Custom Output Templates

Define a custom extraction schema with outputTemplate, mapping each field name to a type hint. The API returns data that matches your structure exactly.

json
{
  "outputTemplate": {
    "vendor_name": "string",
    "total_amount": "number",
    "due_date": "string (YYYY-MM-DD)",
    "line_items": [{ "description": "string", "amount": "number" }]
  }
}
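The API is responsible for matching the template, but a light client-side check can catch surprises before data reaches downstream systems. A hypothetical sketch that interprets hints starting with "string" as text, "number" as a numeric value, and a one-element list as the shape of every item:

```python
from typing import Any


def matches_template(data: Any, template: Any) -> bool:
    """Loosely check that data follows an outputTemplate (hypothetical client-side helper)."""
    if isinstance(template, dict):
        return isinstance(data, dict) and all(
            key in data and matches_template(data[key], hint) for key, hint in template.items()
        )
    if isinstance(template, list):
        # A one-element list describes the shape of every item.
        return isinstance(data, list) and all(matches_template(item, template[0]) for item in data)
    if isinstance(template, str) and template.startswith("string"):
        return isinstance(data, str)
    if template == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(data, (int, float)) and not isinstance(data, bool)
    return True  # Unknown hints are accepted rather than rejected.
```

This check is strict: a null value or a missing key counts as a mismatch. Relax it if your documents can legitimately omit fields.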

PII Detection & Masking

Detect and optionally mask PII in a single pass. Surcharge: $0.002 per page. When pii.detect is true, the enrichments.pii field is populated in the response.

Supported PII types

ssn, credit_card, phone, email, address, name, date_of_birth, bank_account, ip_address

json
// Request
{
  "pii": {
    "detect": true,
    "mask": true,
    "maskChar": "*",
    "types": ["ssn", "credit_card"]
  }
}
json
// Response — enrichments.pii populates when pii.detect is true
"enrichments": {
  "pii": [
    { "type": "ssn", "value": "***-**-1234", "page": 1, "confidence": 0.95 },
    { "type": "credit_card", "value": "****-****-****-5678", "page": 2, "confidence": 0.95 }
  ]
}
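A small request builder that rejects PII types outside the supported list above before the request is sent. The helper name pii_options is hypothetical, and the single-character check on maskChar is our assumption based on the example.

```python
SUPPORTED_PII_TYPES = {
    "ssn", "credit_card", "phone", "email", "address",
    "name", "date_of_birth", "bank_account", "ip_address",
}


def pii_options(types: list[str], mask: bool = True, mask_char: str = "*") -> dict:
    """Build the "pii" request object, rejecting types the docs do not list."""
    unknown = sorted(set(types) - SUPPORTED_PII_TYPES)
    if unknown:
        raise ValueError(f"unsupported PII types: {unknown}")
    if len(mask_char) != 1:
        raise ValueError("maskChar is expected to be a single character")
    return {"pii": {"detect": True, "mask": mask, "maskChar": mask_char, "types": list(types)}}
```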

Chunking (RAG)

Split documents into chunks for vector store ingestion. When chunking.enabled is true, the enrichments.chunks array is populated in the response. Three strategies are available:

semantic: Topic-based splitting that respects content boundaries.
fixed: Fixed character-count chunks with configurable overlap.
sentence: Splits on sentence boundaries for natural breaks.

json
// Request
{
  "chunking": {
    "enabled": true,
    "maxChunkSize": 1500,
    "overlap": 200,
    "strategy": "semantic"
  }
}

// Response — enrichments.chunks populates when chunking.enabled is true
"enrichments": {
  "chunks": [
    { "id": "chunk_0", "content": "Invoice INV-2024-001 issued by Acme Corp...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "header" } },
    { "id": "chunk_1", "content": "Line items: Widget Pro x10 at $150.00...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "line_items" } }
  ]
}
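The server's chunking code is not published. To build intuition for maxChunkSize and overlap with the fixed strategy, here is a sketch of plain fixed-size character windows, where each chunk starts maxChunkSize minus overlap characters after the previous one. It illustrates the idea only and is not guaranteed to reproduce the API's chunk boundaries; fixed_chunks is a hypothetical name.

```python
def fixed_chunks(text: str, max_chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than max_chunk_size")
    if not text:
        return []
    step = max_chunk_size - overlap  # Distance between the starts of consecutive chunks.
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + max_chunk_size >= len(text):
            return chunks
        start += step
```

With the values in the request above (1500 and 200), a 3,000-character document yields three chunks starting at characters 0, 1300, and 2600.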

Source Grounding

Source grounding is always enabled. Every response includes an enrichments.grounding array that links extracted values to their source text, with a page number and a confidence score for each reference.

json
// Response — enrichments.grounding is always populated
"enrichments": {
  "grounding": [
    {
      "field": "total",
      "value": 1500.00,
      "sourceText": "Total: $1,500.00",
      "page": 3,
      "confidence": 1.0
    },
    {
      "field": "vendor.name",
      "value": "Acme Corp",
      "sourceText": "issued by Acme Corp, Inc.",
      "page": 1,
      "confidence": 0.98
    }
  ]
}
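Grounding makes it straightforward to route uncertain values to human review. A sketch, with hypothetical helper names, that flags fields whose confidence falls below a threshold or whose grounded value does not match the extracted data. It assumes field uses dot-separated paths like "vendor.name", as in the example, and does not handle array indices.

```python
from typing import Any


def resolve(data: dict[str, Any], dotted: str) -> Any:
    """Follow a dot-separated field path such as "vendor.name" into the extracted data."""
    value: Any = data
    for key in dotted.split("."):
        value = value[key]
    return value


def fields_needing_review(response: dict[str, Any], threshold: float = 0.9) -> list[str]:
    """Return grounded fields with low confidence or a value that disagrees with response["data"]."""
    flagged = []
    for ref in response["enrichments"]["grounding"]:
        try:
            extracted = resolve(response["data"], ref["field"])
        except (KeyError, TypeError):
            extracted = None  # Path not found in the extracted data.
        if ref["confidence"] < threshold or extracted != ref["value"]:
            flagged.append(ref["field"])
    return flagged
```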

Batch Processing

Submit up to 50 files in a single request. Batch jobs are always processed asynchronously — poll the batch status endpoint or use webhooks to get results.

Submit a Batch

POST /v2/documents/batch — Send multiple files with shared parsing options. Returns a batchId for tracking.

curl -X POST https://api.peterparser.ai/v2/documents/batch \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      { "url": "https://example.com/invoice1.pdf", "documentType": "invoice" },
      { "url": "https://example.com/invoice2.pdf", "documentType": "invoice" },
      { "url": "https://example.com/receipt.png", "documentType": "receipt" }
    ],
    "outputFormat": "json",
    "summarize": true,
    "webhookUrl": "https://your-server.com/api/webhooks/batch"
  }'

Response

json
{
  "batchId": "batch_abc123",
  "status": "queued",
  "totalFiles": 3,
  "createdAt": "2026-03-07T12:00:00Z"
}

Check Batch Status

GET /v2/documents/batch/{batchId} — Poll for progress. Each file reports its own status. When all files finish, the batch status becomes completed.

bash
curl https://api.peterparser.ai/v2/documents/batch/batch_abc123 \
  -H "X-API-Key: pp_live_your_key"

Response

json
{
  "batchId": "batch_abc123",
  "status": "processing",
  "totalFiles": 3,
  "completed": 2,
  "failed": 0,
  "createdAt": "2026-03-07T12:00:00Z",
  "files": [
    {
      "index": 0,
      "status": "completed",
      "jobId": "job_001",
      "result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
    },
    {
      "index": 1,
      "status": "completed",
      "jobId": "job_002",
      "result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
    },
    {
      "index": 2,
      "status": "processing",
      "jobId": "job_003",
      "result": null
    }
  ]
}
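A sketch for turning a status response into a progress figure and a map of finished results. It counts failed files as finished (our assumption) and reads only fields shown in the example; batch_progress is a hypothetical name.

```python
from typing import Any


def batch_progress(status: dict[str, Any]) -> tuple[float, dict[int, Any]]:
    """Return (percent of files finished, {file index: result}) from a batch status response."""
    total = status["totalFiles"]
    finished = status["completed"] + status["failed"]  # Assumption: failed files count as finished.
    percent = 100.0 * finished / total if total else 100.0
    # Only completed files carry a result; files still processing have "result": null.
    results = {f["index"]: f["result"] for f in status["files"] if f["status"] == "completed"}
    return percent, results
```

For the example response above, this reports about 66.7% with results for files 0 and 1.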

Limits & Notes

  • Maximum 50 files per batch request.
  • Each file in the batch supports the same options as a single POST /v2/documents request.
  • Credits are charged per file as each one completes.
  • Set webhookUrl on the batch to receive a single callback when the entire batch finishes.
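To process more than 50 files, split them across several batch requests. A sketch that builds the request bodies; batch_payloads is a hypothetical helper, and it applies one documentType to every file for brevity. If you set webhookUrl, expect one callback per batch.

```python
from typing import Any

MAX_FILES_PER_BATCH = 50  # Documented per-request limit.


def batch_payloads(urls: list[str], document_type: str = "auto", **shared: Any) -> list[dict[str, Any]]:
    """Split file URLs into POST /v2/documents/batch bodies of at most 50 files each."""
    payloads = []
    for start in range(0, len(urls), MAX_FILES_PER_BATCH):
        chunk = urls[start:start + MAX_FILES_PER_BATCH]
        files = [{"url": url, "documentType": document_type} for url in chunk]
        # Shared options (outputFormat, summarize, webhookUrl, ...) are copied into every batch body.
        payloads.append({"files": files, **shared})
    return payloads
```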

Async Processing & Real-Time Events

Documents longer than 10 pages switch to async processing automatically. To force async processing for any document, set "forceAsync": true. In async mode, enrichments.summary is still populated when "summarize": true.

Webhooks (recommended)

Set webhookUrl, and PeterParser POSTs the completed result to that endpoint.

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/report.pdf",
    "forceAsync": true,
    "summarize": true,
    "webhookUrl": "https://your-server.com/api/webhooks/pp"
  }'
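A minimal receiver for local testing, using Python's built-in http.server. The webhook payload format is not specified beyond "the completed result", so this sketch simply parses the body as JSON and stores it. The path /api/webhooks/pp matches the example; a production endpoint should sit behind HTTPS and hand work off to a queue rather than processing inline.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received: list[dict] = []  # Parsed webhook bodies, in arrival order.


class WebhookHandler(BaseHTTPRequestHandler):
    """Accept JSON POSTs at /api/webhooks/pp, the path used in the curl example."""

    def do_POST(self) -> None:
        if self.path != "/api/webhooks/pp":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        received.append(json.loads(self.rfile.read(length) or b"{}"))
        # Acknowledge immediately; do any slow processing outside the request handler.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # Silence the default per-request logging to stderr.


# To run locally: HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```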

Server-Sent Events (SSE)

A single persistent connection streams events for all of your jobs. Event types: job_completed, job_failed, heartbeat (every 15 seconds), and stream_end.

bash
curl -N -H "X-API-Key: pp_live_your_key" \
  "https://api.peterparser.ai/v2/events?ttl=600"

# Catch up on missed events:
curl -H "X-API-Key: pp_live_your_key" \
  "https://api.peterparser.ai/v2/events/history?limit=50"
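To consume the stream from code rather than curl, you need an SSE parser. A sketch assuming standard text/event-stream framing (event: and data: lines, with a blank line ending each event) and JSON data where possible; the event payload shapes are not documented here, and parse_sse is a hypothetical name. As in the SSE specification, events with no data lines are skipped.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, object]]:
    """Yield (event name, data) pairs from text/event-stream lines, decoding JSON data when possible."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event; events without data lines are skipped.
            if data_lines:
                data = "\n".join(data_lines)
                try:
                    payload: object = json.loads(data)
                except json.JSONDecodeError:
                    payload = data
                yield event, payload
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # One leading space after the colon is not part of the value.
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)


# With a live connection (network access and a real key required):
# import urllib.request
# req = urllib.request.Request("https://api.peterparser.ai/v2/events?ttl=600",
#                              headers={"X-API-Key": "pp_live_your_key"})
# with urllib.request.urlopen(req) as resp:
#     for name, data in parse_sse(line.decode("utf-8") for line in resp):
#         print(name, data)
```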

Polling

Check job status with the jobId.

bash
curl https://api.peterparser.ai/v2/documents/jobs/{job_id} \
  -H "X-API-Key: pp_live_your_key"
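A polling loop with exponential backoff. The job response shape is not shown on this page, so the sketch assumes a top-level status field and treats "completed" (as in the batch example) and "failed" (our assumption) as terminal. The fetch and sleep callables are injected so the loop can be tested without network access; wait_for_job is a hypothetical name.

```python
import time
from typing import Any, Callable


def wait_for_job(fetch: Callable[[], dict[str, Any]], initial_delay: float = 1.0,
                 max_delay: float = 30.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll until the job reaches a terminal status, doubling the delay between polls."""
    delay, waited = initial_delay, 0.0
    while True:
        job = fetch()
        # Assumed terminal statuses.
        if job.get("status") in ("completed", "failed"):
            return job
        if waited >= timeout:
            raise TimeoutError("job did not finish in time")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


# Example fetch for a real job (network access and a real key required):
# import json, urllib.request
# def fetch():
#     req = urllib.request.Request(f"https://api.peterparser.ai/v2/documents/jobs/{job_id}",
#                                  headers={"X-API-Key": "pp_live_your_key"})
#     with urllib.request.urlopen(req) as resp:
#         return json.loads(resp.read())
```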

Summary add-on

When "summarize": true, enrichments.summary is populated with an AI-generated summary of the document. Surcharge: $0.005 flat (not per page).

json
// Response — enrichments.summary populates when summarize is true
"enrichments": {
  "summary": "This invoice from Acme Corp totals $1,500.00 for 10 Widget Pro units at $150 each, with $135 tax. Due date is March 15, 2026. Payment terms: Net 30."
}
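The surcharges on this page can be combined into a pre-flight cost estimate that mirrors the usage.breakdown fields. This sketch assumes the dollar surcharges map one-to-one onto credits, which this page does not state, and takes the premium model surcharge as an input because its rate is not documented here; estimate_cost is a hypothetical name.

```python
PII_SURCHARGE_PER_PAGE = 0.002  # From the PII section.
SUMMARIZE_FEE = 0.005           # Flat fee from the summary section.


def estimate_cost(pages: int, price_per_unit: float, pii: bool = False,
                  summarize: bool = False, premium_surcharge: float = 0.0) -> dict[str, float]:
    """Estimate a usage breakdown before submitting (the premium surcharge must be supplied)."""
    breakdown = {
        "base": pages * price_per_unit,
        "premiumModelSurcharge": premium_surcharge,
        "piiSurcharge": pages * PII_SURCHARGE_PER_PAGE if pii else 0.0,
        "summarizeFee": SUMMARIZE_FEE if summarize else 0.0,
    }
    # The right-hand side is evaluated before "total" is added, so it sums only the parts.
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

Under these assumptions, the two-page invoice from the quickstart (pricePerUnit 0.01) goes from 0.02 to 0.029 with both PII detection and summarize enabled.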

Vision Preprocessing

With "preprocessing": true, Gemini vision first extracts the text, and the resulting chunks are then processed for structured extraction. This two-step pipeline improves accuracy on complex documents. Preprocessing is mandatory for legal_timeline.

json
{
  "url": "https://example.com/legal-filing.pdf",
  "documentType": "legal_timeline",
  "preprocessing": true
}

Errors & Status Codes

All errors return JSON with a detail field.

json
{ "detail": "Insufficient credits. Required: 0.10, available: 0.05" }

Code  Meaning            What to do
200   Success            Process the response
201   Created            Resource created
400   Bad Request        Check request body and parameters
401   Unauthorized       Verify your API key
402   Payment Required   Top up credits
403   Forbidden          Check IP whitelist
404   Not Found          Verify endpoint URL or resource ID
422   Validation Error   Check required fields and types
429   Rate Limited       Back off and retry
500   Server Error       Retry with backoff
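The table implies a simple client policy: retry 429 and 500 with exponential backoff, and fail fast on everything else. A sketch in which send is a hypothetical callable returning (status code, decoded JSON body); with urllib.request, you would catch urllib.error.HTTPError and read its code and body to produce that pair. The 1-second base delay and five attempts are arbitrary defaults, and ApiError and call_with_retry are hypothetical names.

```python
import time
from typing import Any, Callable

RETRYABLE_STATUSES = {429, 500}  # "Back off and retry" / "Retry with backoff" in the table.


class ApiError(Exception):
    """An error response, carrying the HTTP status and the "detail" message."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def call_with_retry(send: Callable[[], tuple[int, dict[str, Any]]], attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Call send() until it succeeds, backing off exponentially on 429 and 500."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status in (200, 201):
            return body
        if status not in RETRYABLE_STATUSES or attempt == attempts - 1:
            # 400, 401, 402, 403, 404, and 422 need a fix on the caller's side, so fail fast.
            raise ApiError(status, body.get("detail", ""))
        sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise AssertionError("unreachable: the loop always returns or raises")
```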

Rate Limits

Defaults: 100 requests per minute per API key and 1 concurrent SSE connection. These limits are configurable per key.
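To stay under the default request limit proactively, a client can throttle itself. A sketch of a sliding-window limiter set to 100 calls per 60 seconds; the class name RateLimiter is hypothetical, and the clock and sleep functions are injectable for testing.

```python
import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Client-side sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.calls: deque[float] = deque()  # Timestamps of calls inside the current window.

    def acquire(self) -> None:
        """Block until another request fits inside the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
```

Call acquire() before each request; with a single shared instance per API key, bursts are smoothed out instead of triggering 429 responses.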

Ready to integrate?

See the full API reference for every v2 endpoint, parameter, and response.