PeterParser Docs
Parse any document into structured data with a single API call. Base URL: https://api.peterparser.ai/v2
Quickstart
Get your API key
Sign up at peterparser.ai — you get 100 free credits instantly. Your key: pp_live_...
Parse your first document
curl -X POST https://api.peterparser.ai/v2/documents \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/invoice.pdf",
"documentType": "invoice"
}'
Get structured data back
The response includes extracted data, content formats, enrichments, processing info, and usage. Enable summarize or chunking add-ons to populate enrichment fields.
{
"success": true,
"data": {
"invoice_number": "INV-2024-001",
"vendor": { "name": "Acme Corp", "address": "123 Main St" },
"total": 1500.00,
"tax": 135.00,
"line_items": [
{ "description": "Widget Pro", "quantity": 10, "unit_price": 150.00 }
]
},
"document": {
"type": "invoice",
"pages": 2,
"language": "en",
"filename": "invoice.pdf"
},
"content": {
"format": "json",
"json": "{ \"invoice_number\": \"INV-2024-001\", ... }",
"markdown": "# Invoice INV-2024-001\n...",
"text": "Invoice INV-2024-001\nVendor: Acme Corp\nTotal: $1,500.00"
},
"enrichments": {
"summary": null,
"chunks": null,
"grounding": [{ "field": "total", "value": 1500.00, "sourceText": "Total: $1,500.00", "page": 1, "confidence": 1.0 }],
"pii": null
},
"processing": {
"premiumModel": false,
"preprocessing": false,
"timeMs": 1250,
"cached": false
},
"usage": {
"pages": 2,
"documentType": "invoice",
"pricePerUnit": 0.01,
"totalCredits": 0.02,
"displayCost": "0.0200 credits",
"breakdown": { "base": 0.02, "premiumModelSurcharge": 0, "piiSurcharge": 0, "summarizeFee": 0 }
}
}
Add-on fields: enrichments.summary, enrichments.chunks, and enrichments.pii are null by default. They populate when you enable the corresponding option. enrichments.grounding is always populated (grounding is forced on all requests).
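For readers calling the API from Python rather than curl, the quickstart request can be sketched with the standard library alone. The helper names build_parse_request and parse_document are illustrative and not part of any PeterParser SDK; the endpoint, header, and body fields are taken from the curl example.

```python
import json
import urllib.request

BASE_URL = "https://api.peterparser.ai/v2"  # base URL stated at the top of this page


def build_parse_request(api_key: str, url: str, document_type: str = "auto") -> urllib.request.Request:
    """Build the same POST /v2/documents request as the curl example."""
    body = json.dumps({"url": url, "documentType": document_type}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/documents",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def parse_document(api_key: str, url: str, document_type: str = "auto") -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_parse_request(api_key, url, document_type)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling parse_document("pp_live_your_key", "https://example.com/invoice.pdf", "invoice") should return a dictionary shaped like the sample response, so result["data"] holds the extracted fields and result["usage"] the credit cost.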
Authentication
The v2 public API uses API key authentication. Include your key in the X-API-Key header. No CORS restrictions — call from any origin.
| Endpoints | Auth | CORS |
|---|---|---|
| /health/* | None | Open |
| /v2/pricing/* | None | Open |
| /v2/documents/* | X-API-Key: pp_live_... | Open |
| /v2/events/* | X-API-Key: pp_live_... | Open |
curl -X POST https://api.peterparser.ai/v2/documents \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/doc.pdf"}'
IP Whitelisting
Each API key supports IP whitelisting. ["0.0.0.0"] allows all IPs (default). Restrict to specific IPs or CIDR ranges: ["10.0.0.0/8"]
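To sanity-check a whitelist before saving it, the matching rule described here can be modeled locally. This is a sketch of the described semantics, not PeterParser's server-side code. Treating "0.0.0.0" as a wildcard follows the text above, and is_ip_allowed is a hypothetical helper name.

```python
import ipaddress


def is_ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """Return True if client_ip matches any whitelist entry."""
    address = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        if entry == "0.0.0.0":  # documented "allow all" value
            return True
        # strict=False accepts single IPs as well as CIDR ranges with host bits set
        network = ipaddress.ip_network(entry, strict=False)
        if address.version == network.version and address in network:
            return True
    return False
```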
Document Types
Use "auto" to let the API detect the type automatically.
Tip: Use GET /v2/documents/{type}/sample to preview output and GET /v2/documents/{type}/meta to see expected fields.
Legal Timeline (legal_timeline)
Extracts a structured chronology from legal documents. Returns a case_summary with parties, jurisdiction, and dates, plus a timeline array of events with date handling, event classification, display category, legal tags, confidence_level, and full citation with page numbers, source snippets, and summaries for provenance. Recommended: async mode for large filings. Requires preprocessing: true — Gemini vision extracts text first, then chunks are processed for structured extraction.
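Once a legal_timeline result arrives, a common first step is to order events chronologically and pull out the ones flagged as key. The sketch below assumes only field names that appear in the sample response for this type (timeline, date.iso, title, flags.is_key_event); key_events is a hypothetical helper.

```python
from datetime import date


def key_events(data: dict) -> list[tuple[date, str]]:
    """Return (date, title) pairs for key events, oldest first."""
    events = [
        (date.fromisoformat(event["date"]["iso"]), event["title"])
        for event in data.get("timeline", [])
        if event.get("flags", {}).get("is_key_event")
    ]
    return sorted(events)
```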
// Response — legal_timeline data structure
{
"case_summary": {
"case_name": "Smith v. Acme Industries",
"case_number": "2024-CV-04521",
"court": "U.S. District Court, SDNY",
"parties": {
"plaintiff": ["John Smith"],
"defendant": ["Acme Industries, Inc."],
"counsel": ["Jane Doe, Esq."]
},
"timeline_span": { "start": "2023-06-01", "end": "2024-11-20" }
},
"timeline": [
{
"event_id": "evt_001",
"date": {
"raw": "around June 2023",
"iso": "2023-06-01",
"precision": "approximate",
"circa": true
},
"title": "Product malfunction incident",
"event_type": "incident",
"category": "other",
"legal_significance": "critical",
"confidence_level": "medium",
"tags": ["negligence", "liability", "causation", "damages"],
"parties_involved": [
{ "name": "John Smith", "role": "plaintiff", "entity_type": "person" }
],
"amounts": [{ "value": 45000, "currency": "USD", "description": "Medical expenses" }],
"document_numbers": ["SMITH-00001"],
"citation": {
"page_number": 3,
"page_range": [3, 4],
"section": "Statement of Facts",
"paragraph": "¶ 14",
"source_snippet": "On or about June 2023, Plaintiff sustained injuries when the Acme Model X device malfunctioned during normal use at plaintiff's residence...",
"source_summary": "Describes the initial product malfunction incident that caused plaintiff's injuries at his Brooklyn residence.",
"bates_range": "SMITH-00045 to SMITH-00046"
},
"grounding": {
"charStart": 245,
"charEnd": 412,
"sourceText": "On or about June 2023, Plaintiff sustained injuries...",
"confidence": 0.98,
"page_number": 3,
"page_range": [3, 4],
"context_before": "...had used the device without issue for six months.",
"context_after": "Plaintiff was transported to Brooklyn Methodist Hospital..."
},
"flags": { "is_key_event": true, "is_disputed": false, "needs_review": false }
}
]
}
Dynamic Endpoint Config via /meta
The GET /v2/documents/{type}/meta endpoint returns dynamic configuration set by admins: asyncOnly, defaultOptions, mandatoryOptions, and processingModes.
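A client can use this metadata to build a request that the endpoint will accept. The sketch below is one possible way to apply it, under two assumptions that this page does not state: that mandatoryOptions should override caller-supplied values, and that asyncOnly implies forceAsync. apply_meta is a hypothetical helper.

```python
def apply_meta(meta: dict, options: dict) -> dict:
    """Merge caller options with the defaults and mandatory options from /meta."""
    merged = {**meta.get("defaultOptions", {}), **options}
    # Assumption: mandatory options always win over caller choices.
    merged.update(meta.get("mandatoryOptions", {}))
    # Assumption: an async-only type should be submitted with forceAsync.
    if meta.get("asyncOnly"):
        merged["forceAsync"] = True
    merged["documentType"] = meta["documentType"]
    return merged
```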
// GET /v2/documents/legal_timeline/meta
{
"documentType": "legal_timeline",
"pricing": { "pricePerUnit": 0.10, "unit": "page" },
"expectedFields": ["case_summary", "timeline", "parties", "jurisdiction", "key_dates", "timeline_span", "category", "tags", "confidence_level", "citation", "grounding"],
"asyncOnly": false,
"defaultOptions": {},
"processingModes": ["sync", "async"],
"recommendedOptions": { "outputFormat": "json", "summarize": true, "premiumModel": true },
"mandatoryOptions": { "preprocessing": true }
}
Input Methods
Three ways to send documents. All go through the same pipeline.
Pass a public URL. PeterParser downloads and parses it server-side.
{ "url": "https://example.com/document.pdf", "documentType": "auto" }
Encode the file as base64. Include filename for better type detection.
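For the base64 method, the request body can be assembled from a local file with the standard library. In this sketch the helper name base64_body is hypothetical, and guessing contentType from the filename via mimetypes is a convenience of the example rather than documented API behavior.

```python
import base64
import mimetypes
from pathlib import Path


def base64_body(path: str, document_type: str = "auto") -> dict:
    """Build a JSON-serializable body for the base64 input method."""
    file_path = Path(path)
    content_type, _ = mimetypes.guess_type(file_path.name)
    return {
        "base64": base64.b64encode(file_path.read_bytes()).decode("ascii"),
        "filename": file_path.name,
        "contentType": content_type or "application/octet-stream",
        "documentType": document_type,
    }
```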
{
"base64": "JVBERi0xLjQK...",
"filename": "invoice.pdf",
"contentType": "application/pdf",
"documentType": "invoice"
}
Multipart upload. Best for direct file uploads.
curl -X POST https://api.peterparser.ai/v2/documents/upload \
-H "X-API-Key: pp_live_your_key" \
-F "file=@invoice.pdf" \
-F "documentType=invoice" \
-F "outputFormat=json"
Output Formats
The outputs object in the response always contains json, markdown, and text representations. Set outputFormat to control the primary format.
- json: Structured JSON with typed fields. Default.
- markdown: Clean Markdown preserving tables and headings.
- text: Plain text, no formatting.
- html: HTML preserving document structure.
- csv: CSV for tabular data extraction.
- xml: XML with structured tags.
// The "outputs" object in every response:
"outputs": {
"json": "{ \"invoice_number\": \"INV-001\", ... }",
"markdown": "# Invoice INV-001\n\n| Item | Amount |\n|---|---|\n| Widget | $500 |",
"text": "Invoice INV-001\nItem: Widget — $500"
}
Custom Output Templates
Define a custom extraction schema. The API returns data matching your structure exactly.
{
"outputTemplate": {
"vendor_name": "string",
"total_amount": "number",
"due_date": "string (YYYY-MM-DD)",
"line_items": [{ "description": "string", "amount": "number" }]
}
}
PII Detection & Masking
Detect and mask PII in a single pass. Surcharge: $0.002/page. When enabled, the enrichments.pii field populates in the response.
Supported PII types
// Request
{
"pii": {
"detect": true,
"mask": true,
"maskChar": "*",
"types": ["ssn", "credit_card"]
}
}
// Response: enrichments.pii populates when pii.detect is true
"enrichments": {
"pii": [
{ "type": "ssn", "value": "***-**-1234", "page": 1, "confidence": 0.95 },
{ "type": "credit_card", "value": "****-****-****-5678", "page": 2, "confidence": 0.95 }
]
}
Chunking (RAG)
Split documents into chunks for vector store ingestion. When enabled, the enrichments.chunks array populates in the response. Three strategies:
- semantic: Topic-based splitting that respects content boundaries
- fixed: Fixed character-count chunks with configurable overlap
- sentence: Split on sentence boundaries for natural breaks
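To make the size and overlap settings concrete, the fixed strategy can be approximated locally as a sliding window over characters. This illustrates the general technique only; it is not PeterParser's implementation, the service may treat boundaries differently, and fixed_chunks is a hypothetical name.

```python
def fixed_chunks(text: str, max_chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of max_chunk_size characters that share `overlap` characters."""
    if max_chunk_size <= 0 or not 0 <= overlap < max_chunk_size:
        raise ValueError("need max_chunk_size > 0 and 0 <= overlap < max_chunk_size")
    step = max_chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With a chunk size of 1500 and an overlap of 200, each window in this sketch starts 1300 characters after the previous one.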
// Request
{
"chunking": {
"enabled": true,
"maxChunkSize": 1500,
"overlap": 200,
"strategy": "semantic"
}
}
// Response — enrichments.chunks populates when chunking.enabled is true
"enrichments": {
"chunks": [
{ "id": "chunk_0", "content": "Invoice INV-2024-001 issued by Acme Corp...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "header" } },
{ "id": "chunk_1", "content": "Line items: Widget Pro x10 at $150.00...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "line_items" } }
]
}
Source Grounding
Source grounding is always enabled. The enrichments.grounding array is populated for every request with references linking extracted values to their source text.
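Grounding entries are useful for flagging extracted values that deserve human review. The sketch below assumes the entry shape shown in the sample responses (field, confidence); the 0.9 threshold and the name fields_needing_review are arbitrary choices for illustration.

```python
def fields_needing_review(response: dict, min_confidence: float = 0.9) -> list[str]:
    """List grounded fields whose confidence falls below min_confidence."""
    grounding = (response.get("enrichments") or {}).get("grounding") or []
    return [
        entry["field"]
        for entry in grounding
        if entry.get("confidence", 0.0) < min_confidence
    ]
```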
// Response — enrichments.grounding is always populated
"enrichments": {
"grounding": [
{
"field": "total",
"value": 1500.00,
"sourceText": "Total: $1,500.00",
"page": 1,
"confidence": 1.0
},
{
"field": "vendor.name",
"value": "Acme Corp",
"sourceText": "issued by Acme Corp, Inc.",
"page": 1,
"confidence": 0.98
}
]
}
Batch Processing
Submit up to 50 files in a single request. Batch jobs are always processed asynchronously — poll the batch status endpoint or use webhooks to get results.
Submit a Batch
POST /v2/documents/batch — Send multiple files with shared parsing options. Returns a batchId for tracking.
curl -X POST https://api.peterparser.ai/v2/documents/batch \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"files": [
{ "url": "https://example.com/invoice1.pdf", "documentType": "invoice" },
{ "url": "https://example.com/invoice2.pdf", "documentType": "invoice" },
{ "url": "https://example.com/receipt.png", "documentType": "receipt" }
],
"outputFormat": "json",
"summarize": true,
"webhookUrl": "https://your-server.com/api/webhooks/batch"
}'
Response
{
"batchId": "batch_abc123",
"status": "queued",
"totalFiles": 3,
"createdAt": "2026-03-07T12:00:00Z"
}
Check Batch Status
GET /v2/documents/batch/{batchId} — Poll for progress. Each file reports its own status. When all files finish, the batch status becomes completed.
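A status response can be reduced to a short progress summary on the client. The sketch relies only on fields that appear in the sample response for this endpoint (status, files, and each file's index, status, and result). summarize_batch is a hypothetical helper, and "failed" as a per-file status is an assumption inferred from the failed counter.

```python
def summarize_batch(batch: dict) -> dict:
    """Summarize a GET /v2/documents/batch/{batchId} response."""
    files = batch.get("files", [])
    results = {
        entry["index"]: entry["result"]
        for entry in files
        if entry.get("status") == "completed" and entry.get("result") is not None
    }
    # Assumption: any file status other than completed or failed is still in flight.
    pending = [
        entry["index"]
        for entry in files
        if entry.get("status") not in ("completed", "failed")
    ]
    return {
        "done": batch.get("status") == "completed",
        "results": results,
        "pending": pending,
    }
```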
curl https://api.peterparser.ai/v2/documents/batch/batch_abc123 \
-H "X-API-Key: pp_live_your_key"
Response
{
"batchId": "batch_abc123",
"status": "processing",
"totalFiles": 3,
"completed": 2,
"failed": 0,
"createdAt": "2026-03-07T12:00:00Z",
"files": [
{
"index": 0,
"status": "completed",
"jobId": "job_001",
"result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
},
{
"index": 1,
"status": "completed",
"jobId": "job_002",
"result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
},
{
"index": 2,
"status": "processing",
"jobId": "job_003",
"result": null
}
]
}
Limits & Notes
- Maximum 50 files per batch request.
- Each file in the batch supports the same options as a single POST /v2/documents request.
- Credits are charged per file as each one completes.
- Set webhookUrl on the batch to receive a single callback when the entire batch finishes.
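When a job has more than 50 files, the list has to be split across several batch requests. A minimal sketch, assuming the body layout from the submit example (a files array plus shared options at the top level); batch_payloads is a hypothetical helper.

```python
MAX_FILES_PER_BATCH = 50  # documented limit per batch request


def batch_payloads(files: list[dict], **shared_options) -> list[dict]:
    """Split files into request bodies of at most 50 entries each."""
    return [
        {"files": files[start:start + MAX_FILES_PER_BATCH], **shared_options}
        for start in range(0, len(files), MAX_FILES_PER_BATCH)
    ]
```

Each returned dictionary can be sent as the JSON body of its own POST /v2/documents/batch request.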
Async Processing & Real-Time Events
Documents over 10 pages switch to async automatically. To force async for any document, set "forceAsync": true. If the request also sets "summarize": true, the enrichments.summary field populates in the async result.
Webhooks
Set webhookUrl — PeterParser POSTs the completed result to your endpoint.
curl -X POST https://api.peterparser.ai/v2/documents \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/report.pdf",
"forceAsync": true,
"summarize": true,
"webhookUrl": "https://your-server.com/api/webhooks/pp"
}'
Server-Sent Events (SSE)
One persistent connection streams events for all your jobs. Events: job_completed, job_failed, heartbeat (15s), stream_end.
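The stream can be consumed with any SSE client. The parser below is a sketch that assumes the endpoint uses standard text/event-stream framing (event: and data: lines, with a blank line ending each event) and that data usually carries JSON; neither detail is confirmed on this page. iter_sse_events is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, object]]:
    """Yield (event_name, payload) pairs from decoded SSE lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line dispatches the buffered event
            if data_lines:
                data = "\n".join(data_lines)
                try:
                    payload = json.loads(data)
                except json.JSONDecodeError:
                    payload = data  # keep non-JSON data as plain text
                yield event_name, payload
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].removeprefix(" "))
```

Events that arrive without a data line are dropped by this parser, which matches standard SSE dispatch rules; a heartbeat sent that way would not be yielded.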
curl -N -H "X-API-Key: pp_live_your_key" \
"https://api.peterparser.ai/v2/events?ttl=600"
# Catch up on missed events:
curl -H "X-API-Key: pp_live_your_key" \
"https://api.peterparser.ai/v2/events/history?limit=50"
Polling
Check job status with the jobId.
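A polling loop usually wants a delay between checks and an upper bound on attempts. The sketch below takes the status lookup as a callable, so it stays independent of any HTTP client. It assumes the job response has a status field that ends in "completed" or "failed", which mirrors the batch file statuses but is not confirmed for the jobs endpoint; wait_for_job is a hypothetical helper.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until the job reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") in ("completed", "failed"):  # assumed terminal statuses
            return job
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job still running after {max_attempts} checks")
```

Passing the sleep function as a parameter keeps the loop easy to test and lets callers swap in their own scheduling.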
curl https://api.peterparser.ai/v2/documents/jobs/{job_id} \
-H "X-API-Key: pp_live_your_key"
Summary add-on
When "summarize": true, the enrichments.summary field populates with an AI-generated summary. Surcharge: $0.005 flat.
// Response — enrichments.summary populates when summarize is true
"enrichments": {
"summary": "This invoice from Acme Corp totals $1,500.00 for 10 Widget Pro units at $150 each, with $135 tax. Due date is March 15, 2026. Payment terms: Net 30."
}
Vision Preprocessing
When "preprocessing": true, Gemini vision extracts text first, then chunks are processed for structured extraction. This 2-step pipeline improves accuracy on complex documents. Mandatory for legal_timeline.
{
"url": "https://example.com/legal-filing.pdf",
"documentType": "legal_timeline",
"preprocessing": true
}
Errors & Status Codes
All errors return JSON with a detail field.
{ "detail": "Insufficient credits. Required: 0.10, available: 0.05" }
| Code | Meaning | What to do |
|---|---|---|
| 200 | Success | Process the response |
| 201 | Created | Resource created |
| 400 | Bad Request | Check request body and parameters |
| 401 | Unauthorized | Verify your API key |
| 402 | Payment Required | Top up credits |
| 403 | Forbidden | Check IP whitelist |
| 404 | Not Found | Verify endpoint URL or resource ID |
| 422 | Validation Error | Check required fields and types |
| 429 | Rate Limited | Back off and retry |
| 500 | Server Error | Retry with backoff |
Rate Limits
Default: 100 requests/min per API key, 1 concurrent SSE connection. Configurable per key.
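The "back off and retry" advice for 429 and 500 responses can be sketched as a small policy. Retrying only those two codes follows the status code table; the exponential schedule, the 30-second cap, and the helper names are illustrative choices, not documented limits.

```python
RETRYABLE_STATUS = {429, 500}  # the two codes the status table says to retry


def should_retry(status_code: int) -> bool:
    """Return True for status codes worth retrying."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(retries: int, base_s: float = 1.0, cap_s: float = 30.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(retries)]
```

A caller would sleep for each delay in turn while should_retry returns True, and surface the error detail once the list is exhausted.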
Ready to integrate?
See the full API reference for every v2 endpoint, parameter, and response.