v2 public api

PeterParser Docs

Parse any document into structured data with a single API call. Base URL: https://api.peterparser.ai/v2

Quickstart

1. Get your API key

Sign up at peterparser.ai to get 100 free credits instantly. Your key has the form pp_live_...

2. Parse your first document

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/invoice.pdf",
    "documentType": "invoice"
  }'

3. Get structured data back

The response includes extracted data, content formats, enrichments, processing info, and usage. Enable summarize or chunking add-ons to populate enrichment fields.

json
{
  "success": true,
  "data": {
    "invoice_number": "INV-2024-001",
    "issue_date": "2024-01-15",
    "vendor": {
      "name": "Acme Corp",
      "address": { "raw": "123 Main St, Austin, TX 78701, USA", "street": "123 Main St", "city": "Austin", "state": "TX", "postal_code": "78701", "country": "USA", "country_code": "US" },
      "phone": "+15125550142"
    },
    "customer": {
      "name": "John Doe",
      "address": { "raw": "789 Customer Rd, Austin, TX 78702, USA", "street": "789 Customer Rd", "city": "Austin", "state": "TX", "postal_code": "78702", "country": "USA", "country_code": "US" }
    },
    "subtotal": 1500.00,
    "tax_amount": 135.00,
    "total": 1635.00,
    "line_items": [
      { "description": "Widget Pro", "quantity": 10, "unit_price": 150.00, "total": 1500.00 }
    ]
  },
  "document": {
    "type": "invoice",
    "pages": 2,
    "language": "en",
    "filename": "invoice.pdf"
  },
  "content": {
    "format": "json",
    "json": "{ \"invoice_number\": \"INV-2024-001\", ... }",
    "markdown": "# Invoice INV-2024-001\n...",
    "text": "Invoice INV-2024-001\nVendor: Acme Corp\nTotal: $1,635.00"
  },
  "enrichments": {
    "summary": null,
    "chunks": null,
    "grounding": [{ "field": "total", "value": 1635.00, "sourceText": "Total: $1,635.00", "page": 1, "confidence": 1.0 }],
    "pii": null
  },
  "processing": {
    "premiumModel": false,
    "preprocessing": false,
    "timeMs": 1250,
    "cached": false
  },
  "usage": {
    "pages": 2,
    "documentType": "invoice",
    "pricePerUnit": 0.01,
    "totalCredits": 0.02,
    "displayCost": "0.0200 credits",
    "breakdown": { "base": 0.02, "premiumModelSurcharge": 0, "piiSurcharge": 0, "summarizeFee": 0 }
  }
}

Add-on fields: enrichments.summary, enrichments.chunks, and enrichments.pii are null by default. They populate when you enable the corresponding option. enrichments.grounding is always populated (grounding is forced on all requests).
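As a rough sketch of how a client might consume this response in Python, the snippet below reads one extracted field and lists the enrichments that are actually populated. The `populated_enrichments` helper and the trimmed `sample` payload are illustrative and not part of the API.

```python
import json

# Trimmed copy of the sample response above; only the fields used below.
sample = json.loads("""
{
  "success": true,
  "data": {"invoice_number": "INV-2024-001", "total": 1635.00},
  "enrichments": {"summary": null, "chunks": null,
                  "grounding": [{"field": "total", "page": 1}], "pii": null},
  "usage": {"pages": 2, "totalCredits": 0.02}
}
""")


def populated_enrichments(response: dict) -> list[str]:
    """Return the names of enrichment fields that are not null."""
    enrichments = response.get("enrichments") or {}
    return [name for name, value in enrichments.items() if value is not None]


if sample["success"]:
    print(sample["data"]["invoice_number"])
    print(populated_enrichments(sample))
```

With no add-ons enabled, only grounding is reported as populated.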

Authentication

The v2 public API uses API key authentication. Include your key in the X-API-Key header. No CORS restrictions — call from any origin.

Endpoints         Auth                      CORS
/health/*         None                      Open
/v2/pricing/*     None                      Open
/v2/documents/*   X-API-Key: pp_live_...    Open
/v2/events/*      X-API-Key: pp_live_...    Open

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf"}'
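The same authenticated call can be sketched in Python with only the standard library. The helper name `build_parse_request` is hypothetical. The snippet builds the request without sending it, so it can be inspected offline.

```python
import json
import urllib.request

BASE_URL = "https://api.peterparser.ai/v2"


def build_parse_request(api_key: str, document_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/documents request."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/documents",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_parse_request("pp_live_your_key", "https://example.com/doc.pdf")
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```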

IP Whitelisting

Each API key supports IP whitelisting. The default, ["0.0.0.0"], allows all IPs. To restrict access, list specific IPs or CIDR ranges, for example ["10.0.0.0/8"].
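To test a whitelist before saving it, the semantics described above can be approximated client-side with Python's standard `ipaddress` module. This is a sketch under the assumption that "0.0.0.0" acts as a wildcard and every other entry is an IP or CIDR range. The function name `ip_allowed` is illustrative, and the server's actual matching logic is not documented here.

```python
import ipaddress


def ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """Check a client IP against a whitelist of IPs and CIDR ranges."""
    # Per the docs, "0.0.0.0" means "allow all IPs".
    if "0.0.0.0" in whitelist:
        return True
    address = ipaddress.ip_address(client_ip)
    # strict=False lets a bare IP such as "203.0.113.7" act as a single-host network.
    return any(
        address in ipaddress.ip_network(entry, strict=False)
        for entry in whitelist
    )
```

For example, `ip_allowed("10.1.2.3", ["10.0.0.0/8"])` is true, while `ip_allowed("192.168.1.1", ["10.0.0.0/8"])` is false.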

Document Types

Use "auto" to let the API detect the type automatically.

Tip: Use GET /v2/documents/{type}/sample to preview output and GET /v2/documents/{type}/meta to see expected fields.

Purchase Order (purchase_order)

Extracts all purchase order fields including line items, vendor and buyer details with structured addresses, totals, payment terms, and approval status. Structured addresses include street, city, state, postal_code, country, and country_code. All six output formats (JSON, Markdown, Text, HTML, XML, CSV) are supported.

json
// Response — purchase_order data structure
{
  "purchase_order_number": "PO-2026-00481",
  "po_date": "2026-05-12",
  "delivery_date": "2026-06-02",
  "vendor": {
    "name": "Acme Industrial Supplies",
    "address": { "raw": "123 Vendor Way, Houston, TX 77002, USA", "street": "123 Vendor Way",
      "city": "Houston", "state": "TX", "postal_code": "77002", "country": "USA", "country_code": "US" },
    "tax_id": "12-3456789", "email": "ar@acme.example", "phone": "+17135551234"
  },
  "buyer": {
    "name": "UTS Consult LLC",
    "address": { "raw": "456 Buyer Ave, Austin, TX 78701, USA", "street": "456 Buyer Ave",
      "city": "Austin", "state": "TX", "postal_code": "78701", "country": "USA", "country_code": "US" },
    "tax_id": "98-7654321", "email": "ap@utsconsult.example", "phone": "+15125555678"
  },
  "ship_to": {
    "name": "UTS Dallas Warehouse",
    "address": { "raw": "789 Warehouse Blvd, Dallas, TX 75201, USA", "street": "789 Warehouse Blvd",
      "city": "Dallas", "state": "TX", "postal_code": "75201", "country": "USA", "country_code": "US" },
    "attention": "Receiving Dock 4"
  },
  "shipping_method": "FedEx Ground",
  "payment_terms": "Net 30",
  "currency": "USD",
  "line_items": [
    { "line_number": 1, "product_code": "SKU-44918", "description": "Industrial-grade 6\" pipe fitting",
      "quantity": 50, "unit_of_measure": "EA", "unit_price": 18.90, "tax_rate": 0.0825,
      "tax_amount": 77.96, "line_total": 945.00, "required_by": "2026-06-02" },
    { "line_number": 2, "product_code": "SKU-44921", "description": "Stainless steel coupler",
      "quantity": 120, "unit_of_measure": "EA", "unit_price": 7.25, "tax_rate": 0.0825,
      "tax_amount": 71.78, "line_total": 870.00, "required_by": "2026-06-02" }
  ],
  "subtotal": 1815.00, "tax_amount": 149.74, "discount": 0, "shipping_cost": 0,
  "total_amount": 1964.74,
  "status_code": "Approved", "sign_of_approval": true,
  "notes": "Deliver to Dock 4 between 8am-4pm M-F.",
  "contract_reference": "MSA-2025-ACME",
  "department_or_cost_center": "OPS-DAL-INVENTORY"
}
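Because the payload carries both line-level and document-level amounts, a client may want to sanity-check the arithmetic before posting to an ERP. The sketch below assumes that total_amount equals subtotal plus tax_amount minus discount plus shipping_cost, which matches the sample but is not stated by the API. The function name `check_po_totals` is hypothetical.

```python
from decimal import Decimal


def check_po_totals(po: dict) -> list[str]:
    """Return a list of arithmetic mismatches found in a purchase_order payload."""

    def dec(value) -> Decimal:
        # Go through str() so 18.9 becomes Decimal("18.9") rather than a binary expansion.
        return Decimal(str(value))

    problems = []
    items = po["line_items"]
    for item in items:
        if dec(item["quantity"]) * dec(item["unit_price"]) != dec(item["line_total"]):
            problems.append(f"line {item['line_number']}: quantity * unit_price != line_total")
    if sum(dec(item["line_total"]) for item in items) != dec(po["subtotal"]):
        problems.append("subtotal != sum of line totals")
    # Assumed formula; adjust if your documents treat discount or shipping differently.
    expected_total = (dec(po["subtotal"]) + dec(po["tax_amount"])
                      - dec(po["discount"]) + dec(po["shipping_cost"]))
    if expected_total != dec(po["total_amount"]):
        problems.append("total_amount != subtotal + tax - discount + shipping")
    return problems
```

An empty list means every check passed.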

Bill of Lading (bill_of_lading)

Extracts all bill of lading fields including cargo line items, shipper and consignee with structured addresses, container number, Incoterms, vessel/voyage info, and signatures. Incoterms are validated against ICC Incoterms 2020; container numbers are checked against ISO 6346 — invalid values appear in validation_warnings but do not fail the job.

json
// Response — bill_of_lading data structure
{
  "bol_number": "MSKU-2026-BL-00412",
  "carrier_name": "Maersk Line",
  "carrier_scac": "MAEU",
  "container_number": "MSKU1234565",
  "incoterms": "FOB",
  "issue_date": "2026-05-10",
  "shipper": {
    "name": "Shanghai Precision Parts Co., Ltd.",
    "address": { "raw": "88 Pudong Ave, Shanghai", "city": "Shanghai", "country_code": "CN" },
    "port_of_loading": "Port of Shanghai", "port_of_loading_code": "CNSHA"
  },
  "consignee": {
    "name": "UTS Consult LLC",
    "address": { "raw": "456 Buyer Ave, Austin, TX 78701", "city": "Austin", "state": "TX", "country_code": "US" },
    "port_of_discharge": "Port of Los Angeles", "port_of_discharge_code": "USLAX"
  },
  "line_items": [
    { "quantity": 200, "description": "Precision machined steel fittings, Grade 316L",
      "hs_code": "7307.99.5060", "gross_weight": 1850, "gross_weight_unit": "KG" }
  ],
  "gross_cargo_weight": 1850, "gross_cargo_weight_unit": "KG",
  "freight_terms": "Prepaid", "freight_charges": 1250, "currency": "USD"
}
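For reference, the ISO 6346 check-digit rule mentioned above can be reproduced client-side. The sketch below implements the standard algorithm (letter values starting at A=10 and skipping multiples of 11, position weights of 2^i, sum modulo 11). It is not PeterParser's own validator, and the function names are illustrative.

```python
import string


def _letter_values() -> dict[str, int]:
    """ISO 6346 letter values: A=10 upward, skipping multiples of 11."""
    values, n = {}, 10
    for letter in string.ascii_uppercase:
        if n % 11 == 0:
            n += 1
        values[letter] = n
        n += 1
    return values


LETTER_VALUES = _letter_values()


def iso6346_check_digit(first_ten: str) -> int:
    """Compute the check digit for the first 10 characters of a container number."""
    total = 0
    for position, char in enumerate(first_ten.upper()):
        value = LETTER_VALUES[char] if char.isalpha() else int(char)
        total += value * (2 ** position)
    return total % 11 % 10  # a remainder of 10 maps to check digit 0


def is_valid_container_number(number: str) -> bool:
    """True if the number has 4 letters, 7 digits, and a matching check digit."""
    number = number.strip().upper()
    if len(number) != 11 or not number.isascii():
        return False
    if not number[:4].isalpha() or not number[4:].isdigit():
        return False
    return iso6346_check_digit(number[:10]) == int(number[10])
```

The commonly cited example CSQU3054383 passes this check. Changing its last digit makes it fail.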

Resume / CV (resume)

Extracts candidate details, contact info, work experience, education, and skills from resumes and CVs. Names are split into first/middle/last while keeping the original ordering in name.raw; addresses are structured; phone numbers are normalized to E.164 in contact.phones[].e164; and skills are classified by category in skills_normalized. Priced per document ($0.05), not per page.

json
// Response — resume data structure
{
  "candidate": {
    "name": { "first": "Hugo", "last": "Christensen", "raw": "Hugo Christensen" },
    "headline": "Technical Leader, Online Solution team",
    "summary": "Technical leader with 11 years of experience…",
    "location": { "raw": "Melbourne, VIC, Australia", "city": "Melbourne", "state": "VIC", "country_code": "AU" }
  },
  "contact": {
    "emails": ["hhchristensen@outlook.com"],
    "phones": [{ "raw": "+61 458 023 928", "e164": "+61458023928", "type": "mobile" }],
    "websites": [{ "url": "https://linkedin.com/in/hugochristensen", "type": "linkedin" }]
  },
  "work_experience": [
    { "organization": "Bank of Melbourne", "job_title": "Technical Leader",
      "start_date": "2014-01", "end_date": null, "is_current": true,
      "highlights": ["Lead developer team", "Lead transition from WCM to AEM"] }
  ],
  "education": [
    { "institution": "Monash University", "degree": "BS, Computer science and technology",
      "degree_level": "Bachelor", "start_date": "2001", "end_date": "2005" }
  ],
  "skills": ["AngularJS", "React", "C#", ".NET Framework", "SQL Server", "AWS"],
  "skills_normalized": [
    { "raw": "React", "id": null, "label": "React", "category": "framework" }
  ],
  "total_years_experience": 11
}

Legal Timeline (legal_timeline)

Extracts a structured chronology from legal documents. Returns a case_summary with parties, jurisdiction, and dates, plus a timeline array of events with date handling, event classification, display category, legal tags, confidence_level, and full citation with page numbers, source snippets, and summaries for provenance. Recommended: async mode for large filings. Requires preprocessing: true — Gemini vision extracts text first, then chunks are processed for structured extraction.

json
// Response — legal_timeline data structure
{
  "case_summary": {
    "case_name": "Smith v. Acme Industries",
    "case_number": "2024-CV-04521",
    "court": "U.S. District Court, SDNY",
    "parties": {
      "plaintiff": ["John Smith"],
      "defendant": ["Acme Industries, Inc."],
      "counsel": ["Jane Doe, Esq."]
    },
    "timeline_span": { "start": "2023-06-01", "end": "2024-11-20" }
  },
  "timeline": [
    {
      "event_id": "evt_001",
      "date": {
        "raw": "around June 2023",
        "iso": "2023-06-01",
        "precision": "approximate",
        "circa": true
      },
      "title": "Product malfunction incident",
      "event_type": "incident",
      "category": "other",
      "legal_significance": "critical",
      "confidence_level": "medium",
      "tags": ["negligence", "liability", "causation", "damages"],
      "parties_involved": [
        { "name": "John Smith", "role": "plaintiff", "entity_type": "person" }
      ],
      "amounts": [{ "value": 45000, "currency": "USD", "description": "Medical expenses" }],
      "document_numbers": ["SMITH-00001"],
      "citation": {
        "page_number": 3,
        "page_range": [3, 4],
        "section": "Statement of Facts",
        "paragraph": "¶ 14",
        "source_snippet": "On or about June 2023, Plaintiff sustained injuries when the Acme Model X device malfunctioned during normal use at plaintiff's residence...",
        "source_summary": "Describes the initial product malfunction incident that caused plaintiff's injuries at his Brooklyn residence.",
        "bates_range": "SMITH-00045 to SMITH-00046"
      },
      "grounding": {
        "charStart": 245,
        "charEnd": 412,
        "sourceText": "On or about June 2023, Plaintiff sustained injuries...",
        "confidence": 0.98,
        "page_number": 3,
        "page_range": [3, 4],
        "context_before": "...had used the device without issue for six months.",
        "context_after": "Plaintiff was transported to Brooklyn Methodist Hospital..."
      },
      "flags": { "is_key_event": true, "is_disputed": false, "needs_review": false }
    }
  ]
}
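A typical first step after receiving this payload is to order the events and pull out the ones worth a reviewer's attention. The sketch below relies only on fields shown in the sample (date.iso, date.circa, and the flags object). The helper names are hypothetical.

```python
def key_events_in_order(payload: dict) -> list[dict]:
    """Return key events sorted by ISO date (YYYY-MM-DD strings sort chronologically)."""
    events = [
        event for event in payload.get("timeline", [])
        if event.get("flags", {}).get("is_key_event")
    ]
    return sorted(events, key=lambda event: event["date"]["iso"])


def needs_human_review(event: dict) -> bool:
    """Flag events whose date is approximate or that are marked disputed / needs_review."""
    flags = event.get("flags", {})
    return bool(
        event["date"].get("circa", False)
        or flags.get("is_disputed", False)
        or flags.get("needs_review", False)
    )
```

The sample event above has "circa": true, so it would be flagged for review even though needs_review is false.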

Dynamic Endpoint Config via /meta

The GET /v2/documents/{type}/meta endpoint now returns dynamic configuration set by admins: asyncOnly, defaultOptions, mandatoryOptions, and processingModes.

json
// GET /v2/documents/legal_timeline/meta
{
  "documentType": "legal_timeline",
  "pricing": { "pricePerUnit": 0.10, "unit": "page" },
  "expectedFields": ["case_summary", "timeline", "parties", "jurisdiction", "key_dates", "timeline_span", "category", "tags", "confidence_level", "citation", "grounding"],
  "asyncOnly": false,
  "defaultOptions": {},
  "mandatoryOptions": { "preprocessing": true },
  "processingModes": ["sync", "async"],
  "recommendedOptions": { "outputFormat": "json", "summarize": true, "premiumModel": true }
}
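One way a client could honor this configuration is to merge it into the request options before submitting. The precedence used below (defaultOptions, then the caller's options, then mandatoryOptions) and the mapping of asyncOnly to forceAsync are assumptions, not documented behavior. `resolve_options` is a hypothetical helper.

```python
def resolve_options(meta: dict, user_options: dict) -> dict:
    """Merge request options with the admin-set configuration from /meta.

    Assumed precedence: defaultOptions < user options < mandatoryOptions.
    """
    options = {
        **meta.get("defaultOptions", {}),
        **user_options,
        **meta.get("mandatoryOptions", {}),
    }
    # Assumption: an async-only type should always be submitted with forceAsync.
    if meta.get("asyncOnly"):
        options["forceAsync"] = True
    return options
```

With the legal_timeline meta above, a caller passing "preprocessing": false would still end up sending "preprocessing": true.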

Input Methods

Three ways to send documents. All go through the same pipeline.

URL: POST /v2/documents

Pass a public URL. PeterParser downloads and parses it server-side.

json
{ "url": "https://example.com/document.pdf", "documentType": "auto" }

Base64: POST /v2/documents

Encode the file as base64. Include filename for better type detection.

json
{
  "base64": "JVBERi0xLjQK...",
  "filename": "invoice.pdf",
  "contentType": "application/pdf",
  "documentType": "invoice"
}

File Upload: POST /v2/documents/upload

Multipart upload. Best for direct file uploads.

curl -X POST https://api.peterparser.ai/v2/documents/upload \
  -H "X-API-Key: pp_live_your_key" \
  -F "file=@invoice.pdf" \
  -F "documentType=invoice" \
  -F "outputFormat=json"
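For the Base64 method, the request body can be assembled as in the following sketch. The helper name `build_base64_payload` is illustrative. The field names are the ones shown in the Base64 example above.

```python
import base64
import json


def build_base64_payload(file_bytes: bytes, filename: str,
                         content_type: str, document_type: str = "auto") -> str:
    """Return the JSON body for a base64 submission to POST /v2/documents."""
    return json.dumps({
        "base64": base64.b64encode(file_bytes).decode("ascii"),
        "filename": filename,
        "contentType": content_type,
        "documentType": document_type,
    })


# In practice file_bytes would come from open("invoice.pdf", "rb").read().
body = build_base64_payload(b"%PDF-1.4\n", "invoice.pdf", "application/pdf", "invoice")
```

The PDF header bytes used here encode to "JVBERi0xLjQK", the same prefix as in the example request.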

Output Formats

The content object in the response always contains json, markdown, and text representations. Set outputFormat to control the primary format.

  • json: Structured JSON with typed fields. Default.
  • markdown: Clean Markdown preserving tables and headings.
  • text: Plain text, no formatting.
  • html: HTML preserving document structure.
  • csv: CSV for tabular data extraction.
  • xml: XML with structured tags.

json
// The "content" object in every response:
"content": {
  "json": "{ \"invoice_number\": \"INV-001\", ... }",
  "markdown": "# Invoice INV-001\n\n| Item | Amount |\n|---|---|\n| Widget | $500 |",
  "text": "Invoice INV-001\nItem: Widget, $500"
}

Custom Output Templates

Define a custom extraction schema. The API returns data matching your structure exactly.

json
{
  "outputTemplate": {
    "vendor_name": "string",
    "total_amount": "number",
    "due_date": "string (YYYY-MM-DD)",
    "line_items": [{ "description": "string", "amount": "number" }]
  }
}
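A client may still want to verify that returned data has the shape of its template. The checker below is a minimal sketch that treats "string..." and "number..." as type hints, a one-element list as "every item looks like this", and null as an acceptable value for a field the parser could not find. All three rules are assumptions about the template language, not documented behavior.

```python
def matches_template(value, template) -> bool:
    """Recursively check that value has the shape described by template."""
    if isinstance(template, dict):
        return isinstance(value, dict) and all(
            key in value and matches_template(value[key], sub)
            for key, sub in template.items()
        )
    if isinstance(template, list):
        # A one-element list describes the shape of every item.
        return isinstance(value, list) and all(
            matches_template(item, template[0]) for item in value
        )
    if value is None:
        return True  # assumption: missing values come back as null
    if template.startswith("number"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if template.startswith("string"):
        return isinstance(value, str)
    return True  # unknown type hints are not checked
```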

PII Detection & Masking

Detect and mask PII in a single pass. Surcharge: $0.002/page. When enabled, the enrichments.pii field populates in the response.

Supported PII types

ssn, credit_card, phone, email, address, name, date_of_birth, bank_account, ip_address

json
// Request
{
  "pii": {
    "detect": true,
    "mask": true,
    "maskChar": "*",
    "types": ["ssn", "credit_card"]
  }
}

json
// Response — enrichments.pii populates when pii.detect is true
"enrichments": {
  "pii": [
    { "type": "ssn", "value": "***-**-1234", "page": 1, "confidence": 0.95 },
    { "type": "credit_card", "value": "****-****-****-5678", "page": 2, "confidence": 0.95 }
  ]
}
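The masked values in the sample keep separators and the last four digits. That output shape can be reproduced locally with the sketch below, for example to build test fixtures. It illustrates the format only and is not the API's masking logic. `mask_keep_last4` is a hypothetical helper.

```python
def mask_keep_last4(value: str, mask_char: str = "*") -> str:
    """Mask every digit except the last four, leaving separators untouched."""
    digits_total = sum(char.isdigit() for char in value)
    seen, out = 0, []
    for char in value:
        if char.isdigit():
            seen += 1
            # Keep a digit only if it is among the final four digits.
            out.append(char if seen > digits_total - 4 else mask_char)
        else:
            out.append(char)
    return "".join(out)
```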

Chunking (RAG)

Split documents into chunks for vector store ingestion. When enabled, the enrichments.chunks array populates in the response. Three strategies:

  • semantic: Topic-based splitting that respects content boundaries
  • fixed: Fixed character-count chunks with configurable overlap
  • sentence: Split on sentence boundaries for natural breaks

json
// Request
{
  "chunking": {
    "enabled": true,
    "maxChunkSize": 1500,
    "overlap": 200,
    "strategy": "semantic"
  }
}

// Response — enrichments.chunks populates when chunking.enabled is true
"enrichments": {
  "chunks": [
    { "id": "chunk_0", "content": "Invoice INV-2024-001 issued by Acme Corp...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "header" } },
    { "id": "chunk_1", "content": "Line items: Widget Pro x10 at $150.00...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "line_items" } }
  ]
}
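To make the interaction between maxChunkSize and overlap concrete, the "fixed" strategy can be approximated as a sliding character window. This is a sketch of the general technique, not the service's implementation. The function name `fixed_chunks` is illustrative.

```python
def fixed_chunks(text: str, max_chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into character windows; each chunk repeats the last
    `overlap` characters of the previous one."""
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than max_chunk_size")
    step = max_chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the request above (1500 and 200), each new chunk starts 1300 characters after the previous one.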

Source Grounding

Source grounding is always enabled. The enrichments.grounding array is populated for every request with references linking extracted values to their source text.

json
// Response — enrichments.grounding is always populated
"enrichments": {
  "grounding": [
    {
      "field": "total",
      "value": 1500.00,
      "sourceText": "Total: $1,500.00",
      "page": 3,
      "confidence": 1.0
    },
    {
      "field": "vendor.name",
      "value": "Acme Corp",
      "sourceText": "issued by Acme Corp, Inc.",
      "page": 1,
      "confidence": 0.98
    }
  ]
}
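Grounding entries can be used to decide which extracted fields deserve a second look. The sketch below assumes that `field` is a dot-separated path into the data object, as "vendor.name" suggests. How array items are addressed is not shown in the docs and is not handled here. The helper names and the 0.99 threshold are illustrative.

```python
def get_by_path(data: dict, path: str):
    """Resolve a dotted field path such as "vendor.name" inside the data object."""
    current = data
    for part in path.split("."):
        current = current[part]
    return current


def review_grounding(data: dict, grounding: list[dict], min_confidence: float = 0.99) -> dict:
    """List fields with low grounding confidence or a value that differs from data."""
    low_confidence = [g["field"] for g in grounding if g["confidence"] < min_confidence]
    mismatched = [
        g["field"] for g in grounding
        if get_by_path(data, g["field"]) != g["value"]
    ]
    return {"low_confidence": low_confidence, "mismatched": mismatched}
```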

Batch Processing

Submit up to 50 files in a single request. Batch jobs are always processed asynchronously — poll the batch status endpoint or use webhooks to get results.

Submit a Batch

POST /v2/documents/batch — Send multiple files with shared parsing options. Returns a batchId for tracking.

curl -X POST https://api.peterparser.ai/v2/documents/batch \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      { "url": "https://example.com/invoice1.pdf", "documentType": "invoice" },
      { "url": "https://example.com/invoice2.pdf", "documentType": "invoice" },
      { "url": "https://example.com/receipt.png", "documentType": "receipt" }
    ],
    "outputFormat": "json",
    "summarize": true,
    "webhookUrl": "https://your-server.com/api/webhooks/batch"
  }'

Response

json
{
  "batchId": "batch_abc123",
  "status": "queued",
  "totalFiles": 3,
  "createdAt": "2026-03-07T12:00:00Z"
}

Check Batch Status

GET /v2/documents/batch/{batchId} — Poll for progress. Each file reports its own status. When all files finish, the batch status becomes completed.

bash
curl https://api.peterparser.ai/v2/documents/batch/batch_abc123 \
  -H "X-API-Key: pp_live_your_key"

Response

json
{
  "batchId": "batch_abc123",
  "status": "processing",
  "totalFiles": 3,
  "completed": 2,
  "failed": 0,
  "createdAt": "2026-03-07T12:00:00Z",
  "files": [
    {
      "index": 0,
      "status": "completed",
      "jobId": "job_001",
      "result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
    },
    {
      "index": 1,
      "status": "completed",
      "jobId": "job_002",
      "result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
    },
    {
      "index": 2,
      "status": "processing",
      "jobId": "job_003",
      "result": null
    }
  ]
}

Limits & Notes

  • Maximum 50 files per batch request.
  • Each file in the batch supports the same options as a single POST /v2/documents request.
  • Credits are charged per file as each one completes.
  • Set webhookUrl on the batch to receive a single callback when the entire batch finishes.
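When webhooks are not an option, polling the batch status endpoint can be sketched as follows. `fetch_status` is a caller-supplied function (for example, one that issues the GET request shown above and returns the parsed JSON). The interval and polling budget are arbitrary defaults, not API recommendations.

```python
import time
from typing import Callable


def wait_for_batch(fetch_status: Callable[[], dict], interval_s: float = 2.0,
                   max_polls: int = 150,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the batch status is "completed", then return the final payload."""
    for _ in range(max_polls):
        batch = fetch_status()
        if batch["status"] == "completed":
            return batch
        sleep(interval_s)
    raise TimeoutError("batch did not complete within the polling budget")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.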

Async Processing & Real-Time Events

Documents over 10 pages auto-switch to async. Force it with "forceAsync": true. If the request also sets "summarize": true, the enrichments.summary field populates in the async result.

Webhooks (recommended)

Set webhookUrl — PeterParser POSTs the completed result to your endpoint.

curl -X POST https://api.peterparser.ai/v2/documents \
  -H "X-API-Key: pp_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/report.pdf",
    "forceAsync": true,
    "summarize": true,
    "webhookUrl": "https://your-server.com/api/webhooks/pp"
  }'

Server-Sent Events (SSE)

One persistent connection streams events for all your jobs. Events: job_completed, job_failed, heartbeat (15s), stream_end.

bash
curl -N -H "X-API-Key: pp_live_your_key" \
  "https://api.peterparser.ai/v2/events?ttl=600"

# Catch up on missed events:
curl -H "X-API-Key: pp_live_your_key" \
  "https://api.peterparser.ai/v2/events/history?limit=50"

Polling

Check job status with the jobId.

bash
curl https://api.peterparser.ai/v2/documents/jobs/{job_id} \
  -H "X-API-Key: pp_live_your_key"

Summary add-on

When "summarize": true, the enrichments.summary field populates with an AI-generated summary. Surcharge: $0.005 flat.

json
// Response — enrichments.summary populates when summarize is true
"enrichments": {
  "summary": "This invoice from Acme Corp totals $1,500.00 for 10 Widget Pro units at $150 each, with $135 tax. Due date is March 15, 2026. Payment terms: Net 30."
}

Vision Preprocessing

When "preprocessing": true, Gemini vision extracts text first, then chunks are processed for structured extraction. This 2-step pipeline improves accuracy on complex documents. Mandatory for legal_timeline.

json
{
  "url": "https://example.com/legal-filing.pdf",
  "documentType": "legal_timeline",
  "preprocessing": true
}

Errors & Status Codes

All errors return JSON with a detail field.

json
{ "detail": "Insufficient credits. Required: 0.10, available: 0.05" }

Code   Meaning             What to do
200    Success             Process the response
201    Created             Resource created
400    Bad Request         Check request body and parameters
401    Unauthorized        Verify your API key
402    Payment Required    Top up credits
403    Forbidden           Check IP whitelist
404    Not Found           Verify endpoint URL or resource ID
422    Validation Error    Check required fields and types
429    Rate Limited        Back off and retry
500    Server Error        Retry with backoff

Rate Limits

Default: 100 requests/min per API key, 1 concurrent SSE connection. Configurable per key.
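A client that exceeds the limit receives 429, which the status table pairs with "back off and retry" (as it does 500). One common way to do that is exponential backoff with full jitter, sketched below. The base delay, cap, and retry count are arbitrary choices rather than values recommended by the API. The function names are illustrative.

```python
import random
from typing import Optional

# Status codes the table above marks as worth retrying.
RETRYABLE_STATUS = {429, 500}


def should_retry(status_code: int) -> bool:
    """True for responses that are worth retrying after a delay."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_retries: int = 5, base_s: float = 0.5, cap_s: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Full-jitter backoff: delay n is uniform in [0, min(cap_s, base_s * 2**n)]."""
    rng = rng or random.Random()
    return [
        rng.uniform(0, min(cap_s, base_s * 2 ** attempt))
        for attempt in range(max_retries)
    ]
```

Errors such as 400, 401, 402, and 422 are left out on purpose, because retrying will not fix them without changing the request, key, or credit balance.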

Ready to integrate?

See the full API reference for every v2 endpoint, parameter, and response.