v2 Public API

PeterParser Docs

Parse any document into structured data with a single API call. Base URL: https://api.peterparser.ai/v2

Quickstart

Get your API key

Parse your first document

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/invoice.pdf",
    "documentType": "invoice"
  }'
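
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_parse_request and parse_document are hypothetical, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://api.peterparser.ai/v2"


def build_parse_request(api_key: str, document_url: str,
                        document_type: str = "auto") -> urllib.request.Request:
    """Build (but do not send) the POST /v2/documents request."""
    body = json.dumps({"url": document_url, "documentType": document_type}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/documents",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_document(api_key: str, document_url: str,
                   document_type: str = "auto") -> dict:
    """Send the request and decode the JSON response body."""
    request = build_parse_request(api_key, document_url, document_type)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling parse_document("pp_live_your_key", "https://example.com/invoice.pdf", "invoice") would issue the same call as the curl command above.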

Get structured data back

The response includes extracted data, content formats, enrichments, processing info, and usage. Enable summarize or chunking add-ons to populate enrichment fields.

json

{
  "success": true,
  "data": {
    "invoice_number": "INV-2024-001",
    "vendor": { "name": "Acme Corp", "address": "123 Main St" },
    "total": 1500.00,
    "tax": 135.00,
    "line_items": [
      { "description": "Widget Pro", "quantity": 10, "unit_price": 150.00 }
    ]
  },
  "document": {
    "type": "invoice",
    "pages": 2,
    "language": "en",
    "filename": "invoice.pdf"
  },
  "content": {
    "format": "json",
    "json": "{ \"invoice_number\": \"INV-2024-001\", ... }",
    "markdown": "# Invoice INV-2024-001\n...",
    "text": "Invoice INV-2024-001\nVendor: Acme Corp\nTotal: $1,500.00"
  },
  "enrichments": {
    "summary": null,
    "chunks": null,
    "grounding": [{ "field": "total", "value": 1500.00, "sourceText": "Total: $1,500.00", "page": 1, "confidence": 1.0 }],
    "pii": null
  },
  "processing": {
    "premiumModel": false,
    "preprocessing": false,
    "timeMs": 1250,
    "cached": false
  },
  "usage": {
    "pages": 2,
    "documentType": "invoice",
    "pricePerUnit": 0.01,
    "totalCredits": 0.02,
    "displayCost": "0.0200 credits",
    "breakdown": { "base": 0.02, "premiumModelSurcharge": 0, "piiSurcharge": 0, "summarizeFee": 0 }
  }
}

Add-on fields: enrichments.summary, enrichments.chunks, and enrichments.pii are null by default. They populate when you enable the corresponding option. enrichments.grounding is always populated (grounding is forced on all requests).
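
Since disabled add-ons come back as null, client code may want to check which enrichments are actually present before reading them. A small sketch, assuming the response has already been decoded into a Python dict (the function name is hypothetical):

```python
def populated_enrichments(response: dict) -> list[str]:
    """Return the names of enrichment fields that are not null."""
    enrichments = response.get("enrichments") or {}
    return sorted(name for name, value in enrichments.items() if value is not None)
```

For the sample response above, only grounding would be reported.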

Authentication

The v2 public API uses API key authentication. Include your key in the X-API-Key header. No CORS restrictions — call from any origin.

Endpoints	Auth	CORS
`/health/*`	None	Open
`/v2/pricing/*`	None	Open
`/v2/documents/*`	`X-API-Key: pp_live_...`	Open
`/v2/events/*`	`X-API-Key: pp_live_...`	Open

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf"}'

IP Whitelisting

Each API key supports IP whitelisting. The default, ["0.0.0.0"], allows all IPs. To restrict access, list specific IPs or CIDR ranges, for example ["10.0.0.0/8"].
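
To check locally whether a given client address would fall inside a whitelist, the matching rule can be sketched with Python's ipaddress module. This is an illustration of the rule as described here, not the server's implementation; treating "0.0.0.0" as a wildcard follows the default above, and the function name is hypothetical.

```python
import ipaddress


def ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """Return True if client_ip matches any entry in the whitelist."""
    # "0.0.0.0" is documented as "allow all IPs".
    if "0.0.0.0" in whitelist:
        return True
    address = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        # A bare IP such as "203.0.113.7" becomes a single-address network.
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False
```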

Document Types

Use "auto" to let the API detect the type automatically.

Tip: Use GET /v2/documents/{type}/sample to preview output and GET /v2/documents/{type}/meta to see expected fields.

Legal Timeline (`legal_timeline`)

Extracts a structured chronology from legal documents. The response contains a case_summary (parties, jurisdiction, and dates) and a timeline array. Each timeline event carries date handling, event classification, a display category, legal tags, a confidence_level, and a full citation with page numbers, source snippets, and summaries for provenance. Async mode is recommended for large filings. This type requires "preprocessing": true: Gemini vision extracts text first, then chunks are processed for structured extraction.

json

// Response — legal_timeline data structure
{
  "case_summary": {
    "case_name": "Smith v. Acme Industries",
    "case_number": "2024-CV-04521",
    "court": "U.S. District Court, SDNY",
    "parties": {
      "plaintiff": ["John Smith"],
      "defendant": ["Acme Industries, Inc."],
      "counsel": ["Jane Doe, Esq."]
    },
    "timeline_span": { "start": "2023-06-01", "end": "2024-11-20" }
  },
  "timeline": [
    {
      "event_id": "evt_001",
      "date": {
        "raw": "around June 2023",
        "iso": "2023-06-01",
        "precision": "approximate",
        "circa": true
      },
      "title": "Product malfunction incident",
      "event_type": "incident",
      "category": "other",
      "legal_significance": "critical",
      "confidence_level": "medium",
      "tags": ["negligence", "liability", "causation", "damages"],
      "parties_involved": [
        { "name": "John Smith", "role": "plaintiff", "entity_type": "person" }
      ],
      "amounts": [{ "value": 45000, "currency": "USD", "description": "Medical expenses" }],
      "document_numbers": ["SMITH-00001"],
      "citation": {
        "page_number": 3,
        "page_range": [3, 4],
        "section": "Statement of Facts",
        "paragraph": "¶ 14",
        "source_snippet": "On or about June 2023, Plaintiff sustained injuries when the Acme Model X device malfunctioned during normal use at plaintiff's residence...",
        "source_summary": "Describes the initial product malfunction incident that caused plaintiff's injuries at his Brooklyn residence.",
        "bates_range": "SMITH-00045 to SMITH-00046"
      },
      "grounding": {
        "charStart": 245,
        "charEnd": 412,
        "sourceText": "On or about June 2023, Plaintiff sustained injuries...",
        "confidence": 0.98,
        "page_number": 3,
        "page_range": [3, 4],
        "context_before": "...had used the device without issue for six months.",
        "context_after": "Plaintiff was transported to Brooklyn Methodist Hospital..."
      },
      "flags": { "is_key_event": true, "is_disputed": false, "needs_review": false }
    }
  ]
}

Dynamic Endpoint Config via `/meta`

The GET /v2/documents/{type}/meta endpoint returns dynamic configuration set by admins: asyncOnly, defaultOptions, mandatoryOptions, and processingModes.

json

// GET /v2/documents/legal_timeline/meta
{
  "documentType": "legal_timeline",
  "pricing": { "pricePerUnit": 0.10, "unit": "page" },
  "expectedFields": ["case_summary", "timeline", "parties", "jurisdiction", "key_dates", "timeline_span", "category", "tags", "confidence_level", "citation", "grounding"],
  "asyncOnly": false,
  "defaultOptions": {},
  "mandatoryOptions": { "preprocessing": true },
  "processingModes": ["sync", "async"],
  "recommendedOptions": { "outputFormat": "json", "summarize": true, "premiumModel": true }
}
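
One way a client might use this configuration is to fold it into the request body before submitting. The sketch below assumes a precedence that the documentation does not spell out: defaults first, then the caller's options, then mandatory options last so they always win. The function name and the decision to set forceAsync when asyncOnly is true are likewise assumptions.

```python
def apply_meta_options(payload: dict, meta: dict) -> dict:
    """Merge /meta configuration into a request payload (assumed precedence)."""
    merged = dict(meta.get("defaultOptions") or {})
    merged.update(payload)                             # caller overrides defaults
    merged.update(meta.get("mandatoryOptions") or {})  # mandatory overrides caller
    if meta.get("asyncOnly"):
        merged["forceAsync"] = True
    return merged
```

With the legal_timeline metadata above, a payload that omits preprocessing, or sets it to false, would come out with "preprocessing": true.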

Input Methods

Three ways to send documents. All go through the same pipeline.

URL: POST /v2/documents

Pass a public URL. PeterParser downloads and parses it server-side.

json

{ "url": "https://example.com/document.pdf", "documentType": "auto" }

Base64: POST /v2/documents

Encode the file as base64. Include filename for better type detection.

json

{
  "base64": "JVBERi0xLjQK...",
  "filename": "invoice.pdf",
  "contentType": "application/pdf",
  "documentType": "invoice"
}
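
Building this payload from raw file bytes might look like the following sketch. The helper name is hypothetical, and guessing contentType from the filename via the standard mimetypes module is a convenience chosen here, not something the API requires.

```python
import base64
import mimetypes


def build_base64_payload(file_bytes: bytes, filename: str,
                         document_type: str = "auto") -> dict:
    """Encode file bytes into the JSON body for the base64 input method."""
    content_type, _ = mimetypes.guess_type(filename)
    payload = {
        "base64": base64.b64encode(file_bytes).decode("ascii"),
        "filename": filename,
        "documentType": document_type,
    }
    if content_type:
        payload["contentType"] = content_type
    return payload
```

In practice the bytes would come from something like open("invoice.pdf", "rb").read().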

File Upload: POST /v2/documents/upload

Multipart upload. Best for direct file uploads.

curl -X POST https://api.peterparser.ai/v2/documents/upload \
  -H "X-API-Key: pp_live_your_key" \
  -F "file=@invoice.pdf" \
  -F "documentType=invoice" \
  -F "outputFormat=json"

Output Formats

The content object in the response always contains json, markdown, and text representations. Set outputFormat to control the primary format.

json

Structured JSON with typed fields. Default.

markdown

Clean Markdown preserving tables and headings.

text

Plain text, no formatting.

html

HTML preserving document structure.

csv

CSV for tabular data extraction.

xml

XML with structured tags.

json

// The "content" object in every response:
"content": {
  "json": "{ \"invoice_number\": \"INV-001\", ... }",
  "markdown": "# Invoice INV-001\n\n| Item | Amount |\n|---|---|\n| Widget | $500 |",
  "text": "Invoice INV-001\nItem: Widget — $500"
}

Custom Output Templates

Define a custom extraction schema. The API returns data matching your structure exactly.

json

{
  "outputTemplate": {
    "vendor_name": "string",
    "total_amount": "number",
    "due_date": "string (YYYY-MM-DD)",
    "line_items": [{ "description": "string", "amount": "number" }]
  }
}
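
A client can sanity-check that returned data follows the template it sent. The checker below is a rough sketch under assumptions: it reads a leaf beginning with "string" or "number" as a type hint (so "string (YYYY-MM-DD)" counts as a string), and a one-element list as the shape of every item. The API's own validation rules are not documented here and may differ.

```python
def matches_template(data, template) -> bool:
    """Check that data has the keys and basic types described by template."""
    if isinstance(template, dict):
        return isinstance(data, dict) and all(
            key in data and matches_template(data[key], sub)
            for key, sub in template.items()
        )
    if isinstance(template, list):
        if not isinstance(data, list):
            return False
        if not template:
            return True
        # A one-element list describes the shape of every item.
        return all(matches_template(item, template[0]) for item in data)
    if not isinstance(template, str):
        return False
    if template.startswith("number"):
        return isinstance(data, (int, float)) and not isinstance(data, bool)
    if template.startswith("string"):
        return isinstance(data, str)
    return False
```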

PII Detection & Masking

Detect and mask PII in a single pass. Surcharge: $0.002/page. When enabled, the enrichments.pii field populates in the response.

Supported PII types

ssn, credit_card, phone, email, address, name, date_of_birth, bank_account, ip_address

json

// Request
{
  "pii": {
    "detect": true,
    "mask": true,
    "maskChar": "*",
    "types": ["ssn", "credit_card"]
  }
}

json

// Response — enrichments.pii populates when pii.detect is true
"enrichments": {
  "pii": [
    { "type": "ssn", "value": "***-**-1234", "page": 1, "confidence": 0.95 },
    { "type": "credit_card", "value": "****-****-****-5678", "page": 2, "confidence": 0.95 }
  ]
}
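
To avoid sending a misspelled type, the pii request block can be built through a small helper that validates against the supported list above. The helper name is hypothetical, and how the API itself responds to unknown types is not covered here.

```python
SUPPORTED_PII_TYPES = {
    "ssn", "credit_card", "phone", "email", "address",
    "name", "date_of_birth", "bank_account", "ip_address",
}


def build_pii_options(types: list[str], mask: bool = True, mask_char: str = "*") -> dict:
    """Build the "pii" request block, rejecting unsupported type names."""
    unknown = sorted(set(types) - SUPPORTED_PII_TYPES)
    if unknown:
        raise ValueError(f"Unsupported PII types: {', '.join(unknown)}")
    return {
        "pii": {
            "detect": True,
            "mask": mask,
            "maskChar": mask_char,
            "types": list(types),
        }
    }
```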

Chunking (RAG)

Split documents into chunks for vector store ingestion. When enabled, the enrichments.chunks array populates in the response. Three strategies:

semantic

Topic-based splitting that respects content boundaries

fixed

Fixed character-count chunks with configurable overlap

sentence

Split on sentence boundaries for natural breaks

json

// Request
{
  "chunking": {
    "enabled": true,
    "maxChunkSize": 1500,
    "overlap": 200,
    "strategy": "semantic"
  }
}

// Response — enrichments.chunks populates when chunking.enabled is true
"enrichments": {
  "chunks": [
    { "id": "chunk_0", "content": "Invoice INV-2024-001 issued by Acme Corp...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "header" } },
    { "id": "chunk_1", "content": "Line items: Widget Pro x10 at $150.00...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "line_items" } }
  ]
}
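
To build intuition for how maxChunkSize and overlap interact under the fixed strategy, here is a local sketch of character-count chunking. It illustrates the general technique only; the service's actual splitting logic is not documented here and may differ, for example in how it handles page boundaries.

```python
def fixed_chunks(text: str, max_chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be in [0, max_chunk_size)")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # this chunk reached the end of the text
        start += max_chunk_size - overlap
    return chunks
```

With maxChunkSize 1500 and overlap 200, each new chunk starts 1300 characters after the previous one, so a 3000-character text yields three chunks.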

Source Grounding

Source grounding is always enabled. The enrichments.grounding array is populated for every request with references linking extracted values to their source text.

json

// Response — enrichments.grounding is always populated
"enrichments": {
  "grounding": [
    {
      "field": "total",
      "value": 1500.00,
      "sourceText": "Total: $1,500.00",
      "page": 1,
      "confidence": 1.0
    },
    {
      "field": "vendor.name",
      "value": "Acme Corp",
      "sourceText": "issued by Acme Corp, Inc.",
      "page": 1,
      "confidence": 0.98
    }
  ]
}
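
One possible use of the grounding array is to flag extracted fields whose confidence falls below a threshold for manual review. The sketch below assumes the response is already decoded; the function name and the 0.99 default threshold are arbitrary choices for illustration.

```python
def low_confidence_fields(grounding: list[dict], threshold: float = 0.99) -> list[str]:
    """Return the field names whose grounding confidence is below threshold."""
    return [
        entry["field"]
        for entry in grounding
        if entry.get("confidence", 0.0) < threshold
    ]
```

For the example above, only vendor.name (confidence 0.98) would be flagged.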

Batch Processing

Submit up to 50 files in a single request. Batch jobs are always processed asynchronously — poll the batch status endpoint or use webhooks to get results.

Submit a Batch

POST /v2/documents/batch — Send multiple files with shared parsing options. Returns a batchId for tracking.

curl -X POST https://api.peterparser.ai/v2/documents/batch \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      { "url": "https://example.com/invoice1.pdf", "documentType": "invoice" },
      { "url": "https://example.com/invoice2.pdf", "documentType": "invoice" },
      { "url": "https://example.com/receipt.png", "documentType": "receipt" }
    ],
    "outputFormat": "json",
    "summarize": true,
    "webhookUrl": "https://your-server.com/api/webhooks/batch"
  }'

Response

json

{
  "batchId": "batch_abc123",
  "status": "queued",
  "totalFiles": 3,
  "createdAt": "2026-03-07T12:00:00Z"
}

Check Batch Status

GET /v2/documents/batch/{batchId} — Poll for progress. Each file reports its own status. When all files finish, the batch status becomes completed.

bash

curl https://api.peterparser.ai/v2/documents/batch/batch_abc123 \
  -H "X-API-Key: pp_live_your_key"

Response

json

{
  "batchId": "batch_abc123",
  "status": "processing",
  "totalFiles": 3,
  "completed": 2,
  "failed": 0,
  "createdAt": "2026-03-07T12:00:00Z",
  "files": [
    {
      "index": 0,
      "status": "completed",
      "jobId": "job_001",
      "result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
    },
    {
      "index": 1,
      "status": "completed",
      "jobId": "job_002",
      "result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
    },
    {
      "index": 2,
      "status": "processing",
      "jobId": "job_003",
      "result": null
    }
  ]
}
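
A status response like this can be reduced to a short progress summary when polling. In the sketch below the function name is hypothetical, and treating "failed" as a terminal per-file status is an assumption inferred from the failed counter.

```python
def batch_progress(batch: dict) -> dict:
    """Summarize a batch status response."""
    files = batch.get("files", [])
    pending = [
        f["index"] for f in files
        if f["status"] not in ("completed", "failed")  # "failed" value is assumed
    ]
    return {
        "done": batch["status"] == "completed",
        "finished": batch.get("completed", 0) + batch.get("failed", 0),
        "total": batch["totalFiles"],
        "pending_indexes": pending,
    }
```

For the response above, this reports 2 of 3 files finished, with the file at index 2 still pending.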

Limits & Notes

Maximum 50 files per batch request.
Each file in the batch supports the same options as a single POST /v2/documents request.
Credits are charged per file as each one completes.
Set webhookUrl on the batch to receive a single callback when the entire batch finishes.

Async Processing & Real-Time Events

Documents over 10 pages switch to async automatically. To force async for any document, set "forceAsync": true. If the request also sets "summarize": true, the enrichments.summary field is populated in the result.

Webhooks (recommended)

Set webhookUrl — PeterParser POSTs the completed result to your endpoint.

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/report.pdf",
    "forceAsync": true,
    "summarize": true,
    "webhookUrl": "https://your-server.com/api/webhooks/pp"
  }'
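
On the receiving side, a minimal webhook endpoint could be sketched with Python's built-in HTTP server. The exact webhook payload is not documented here; this sketch assumes the body is the same JSON object as a synchronous response (with a success field). The port, the handler name, and the 204 reply are illustrative choices, and a production receiver would also need to authenticate the caller.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes) -> dict:
    """Decode a webhook body, requiring a JSON object."""
    result = json.loads(raw.decode("utf-8"))
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    return result


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", "0"))
        try:
            result = parse_webhook_body(self.rfile.read(length))
        except ValueError:
            # Covers invalid UTF-8, invalid JSON, and non-object bodies.
            self.send_response(400)
            self.end_headers()
            return
        print("received result, success =", result.get("success"))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (hypothetical port)."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```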

Server-Sent Events (SSE)

One persistent connection streams events for all your jobs. Events: job_completed, job_failed, heartbeat (15s), stream_end.

bash

curl -N -H "X-API-Key: pp_live_your_key" \
  "https://api.peterparser.ai/v2/events?ttl=600"

# Catch up on missed events:
curl -H "X-API-Key: pp_live_your_key" \
  "https://api.peterparser.ai/v2/events/history?limit=50"

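A client reading the stream needs to split it into events. The parser below is a sketch that assumes the endpoint uses standard Server-Sent Events framing ("event:" and "data:" lines, with a blank line ending each event). The JSON payloads shown in the usage are hypothetical, since the event body format is not documented here.

```python
def parse_sse(lines):
    """Yield (event_name, data) tuples from an iterable of decoded SSE lines."""
    event_name, data_lines = "message", []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if it carried any data.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # drop the single optional leading space
            data_lines.append(value)
```

Fed the lines of a streamed response body, this yields pairs such as ("job_completed", '{"jobId": "job_001"}'), which the caller can then decode with json.loads.
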
Polling

Check job status with the jobId.

bash

curl https://api.peterparser.ai/v2/documents/jobs/{job_id} \
  -H "X-API-Key: pp_live_your_key"
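
A polling loop around this endpoint might look like the sketch below. The shape of the job status response is not shown here, so the code assumes a status field that ends in "completed" or "failed", mirroring the per-file statuses in batch responses. fetch_status is a caller-supplied function (hypothetical) that performs the GET and returns the decoded JSON.

```python
import time


def poll_job(fetch_status, job_id: str, interval_s: float = 2.0,
             max_attempts: int = 30, sleep=time.sleep) -> dict:
    """Poll until the job reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_status(job_id)
        if job.get("status") in ("completed", "failed"):  # assumed terminal values
            return job
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Injecting fetch_status and sleep keeps the loop easy to test without network access or real delays.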

Summary add-on

When "summarize": true, the enrichments.summary field populates with an AI-generated summary. Surcharge: $0.005 flat.

json

// Response — enrichments.summary populates when summarize is true
"enrichments": {
  "summary": "This invoice from Acme Corp totals $1,500.00 for 10 Widget Pro units at $150 each, with $135 tax. Due date is March 15, 2026. Payment terms: Net 30."
}

Vision Preprocessing

When "preprocessing": true, Gemini vision extracts text first, then chunks are processed for structured extraction. This 2-step pipeline improves accuracy on complex documents. Mandatory for legal_timeline.

json

{
  "url": "https://example.com/legal-filing.pdf",
  "documentType": "legal_timeline",
  "preprocessing": true
}

Errors & Status Codes

All errors return JSON with a detail field.

json

{ "detail": "Insufficient credits. Required: 0.10, available: 0.05" }

Code	Meaning	What to do
`200`	Success	Process the response
`201`	Created	Resource created
`400`	Bad Request	Check request body and parameters
`401`	Unauthorized	Verify your API key
`402`	Payment Required	Top up credits
`403`	Forbidden	Check IP whitelist
`404`	Not Found	Verify endpoint URL or resource ID
`422`	Validation Error	Check required fields and types
`429`	Rate Limited	Back off and retry
`500`	Server Error	Retry with backoff
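
The "back off and retry" guidance for 429 and 500 can be sketched as exponential backoff with jitter. The base delay, cap, and function names below are illustrative assumptions; the API does not prescribe specific values here.

```python
import random

RETRYABLE_STATUS = {429, 500}


def should_retry(status_code: int) -> bool:
    """Only rate-limit and server errors are worth retrying."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0,
                  jitter=random.random) -> float:
    """Delay before retry number `attempt` (0-based): a random fraction of a capped exponential."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return ceiling * jitter()
```

Errors such as 400, 401, 402, and 422 indicate a problem with the request or account, so retrying them unchanged is unlikely to help.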

Rate Limits

Default: 100 requests/min per API key, 1 concurrent SSE connection. Configurable per key.

Ready to integrate?

See the full API reference for every v2 endpoint, parameter, and response.
