PeterParser Docs
Parse any document into structured data with a single API call. Base URL: https://api.peterparser.ai/v2
Quickstart
Get your API key
Sign up at peterparser.ai — you get 100 free credits instantly. Your key: pp_live_...
Parse your first document
curl -X POST https://api.peterparser.ai/v2/documents \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/invoice.pdf",
"documentType": "invoice"
}'
Get structured data back
The response includes extracted data, content formats, enrichments, processing info, and usage. Enable summarize or chunking add-ons to populate enrichment fields.
{
"success": true,
"data": {
"invoice_number": "INV-2024-001",
"issue_date": "2024-01-15",
"vendor": {
"name": "Acme Corp",
"address": { "raw": "123 Main St, Austin, TX 78701, USA", "street": "123 Main St", "city": "Austin", "state": "TX", "postal_code": "78701", "country": "USA", "country_code": "US" },
"phone": "+15125550142"
},
"customer": {
"name": "John Doe",
"address": { "raw": "789 Customer Rd, Austin, TX 78702, USA", "street": "789 Customer Rd", "city": "Austin", "state": "TX", "postal_code": "78702", "country": "USA", "country_code": "US" }
},
"subtotal": 1500.00,
"tax_amount": 135.00,
"total": 1635.00,
"line_items": [
{ "description": "Widget Pro", "quantity": 10, "unit_price": 150.00, "total": 1500.00 }
]
},
"document": {
"type": "invoice",
"pages": 2,
"language": "en",
"filename": "invoice.pdf"
},
"content": {
"format": "json",
"json": "{ \"invoice_number\": \"INV-2024-001\", ... }",
"markdown": "# Invoice INV-2024-001\n...",
"text": "Invoice INV-2024-001\nVendor: Acme Corp\nTotal: $1,635.00"
},
"enrichments": {
"summary": null,
"chunks": null,
"grounding": [{ "field": "total", "value": 1635.00, "sourceText": "Total: $1,635.00", "page": 1, "confidence": 1.0 }],
"pii": null
},
"processing": {
"premiumModel": false,
"preprocessing": false,
"timeMs": 1250,
"cached": false
},
"usage": {
"pages": 2,
"documentType": "invoice",
"pricePerUnit": 0.01,
"totalCredits": 0.02,
"displayCost": "0.0200 credits",
"breakdown": { "base": 0.02, "premiumModelSurcharge": 0, "piiSurcharge": 0, "summarizeFee": 0 }
}
}
Add-on fields: enrichments.summary, enrichments.chunks, and enrichments.pii are null by default. They populate when you enable the corresponding option. enrichments.grounding is always populated (grounding is forced on all requests).
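As a rough illustration of how a client might read this response, the sketch below pulls out the extracted data, the credits charged, and the names of the enrichments that came back populated. The helper name `summarize_response` is hypothetical; the field names follow the sample response, not an official client library.

```python
from typing import Any


def summarize_response(resp: dict[str, Any]) -> dict[str, Any]:
    """Collect the parts of a parse response most callers look at first.

    Hypothetical helper: field names are taken from the sample response,
    not from an official SDK.
    """
    if not resp.get("success"):
        raise ValueError("parse request did not succeed")

    enrichments = resp.get("enrichments") or {}
    return {
        "data": resp.get("data", {}),
        "credits": resp.get("usage", {}).get("totalCredits"),
        # Add-on fields stay null (None) unless the matching option was enabled.
        "populated_enrichments": sorted(
            name for name, value in enrichments.items() if value is not None
        ),
    }


# Trimmed copy of the sample response, used only for illustration.
sample = {
    "success": True,
    "data": {"invoice_number": "INV-2024-001", "total": 1635.00},
    "enrichments": {"summary": None, "chunks": None,
                    "grounding": [{"field": "total"}], "pii": None},
    "usage": {"pages": 2, "totalCredits": 0.02},
}
summary = summarize_response(sample)
print(summary["credits"], summary["populated_enrichments"])
```

With no add-ons enabled, only `grounding` should appear in the populated list.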
Authentication
The v2 public API uses API key authentication. Include your key in the X-API-Key header. No CORS restrictions — call from any origin.
| Endpoints | Auth | CORS |
|---|---|---|
| /health/* | None | Open |
| /v2/pricing/* | None | Open |
| /v2/documents/* | X-API-Key: pp_live_... | Open |
| /v2/events/* | X-API-Key: pp_live_... | Open |
curl -X POST https://api.peterparser.ai/v2/documents \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/doc.pdf"}'
IP Whitelisting
Each API key supports IP whitelisting. ["0.0.0.0"] allows all IPs (default). Restrict to specific IPs or CIDR ranges: ["10.0.0.0/8"]
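To sanity-check a whitelist before saving it on a key, a local test along these lines may help. This is only a sketch of the matching rule as described above (the server's actual implementation is not documented here), and `ip_allowed` is a hypothetical helper built on Python's standard `ipaddress` module.

```python
import ipaddress


def ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """Return True if client_ip matches any whitelist entry.

    Assumption: "0.0.0.0" means "allow all", as the text describes;
    every other entry is read as a single IP or a CIDR range.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        if entry == "0.0.0.0":
            return True
        # strict=False tolerates entries such as "10.1.2.3/8" with host bits set.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False


print(ip_allowed("10.1.2.3", ["10.0.0.0/8"]))
print(ip_allowed("192.168.1.5", ["10.0.0.0/8"]))
```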
Document Types
Use "auto" to let the API detect the type automatically.
Tip: Use GET /v2/documents/{type}/sample to preview output and GET /v2/documents/{type}/meta to see expected fields.
Purchase Order (purchase_order)
Extracts all purchase order fields including line items, vendor and buyer details with structured addresses, totals, payment terms, and approval status. Structured addresses include street, city, state, postal_code, country, and country_code. All six output formats (JSON, Markdown, Text, HTML, XML, CSV) are supported.
// Response — purchase_order data structure
{
"purchase_order_number": "PO-2026-00481",
"po_date": "2026-05-12",
"delivery_date": "2026-06-02",
"vendor": {
"name": "Acme Industrial Supplies",
"address": { "raw": "123 Vendor Way, Houston, TX 77002, USA", "street": "123 Vendor Way",
"city": "Houston", "state": "TX", "postal_code": "77002", "country": "USA", "country_code": "US" },
"tax_id": "12-3456789", "email": "ar@acme.example", "phone": "+17135551234"
},
"buyer": {
"name": "UTS Consult LLC",
"address": { "raw": "456 Buyer Ave, Austin, TX 78701, USA", "street": "456 Buyer Ave",
"city": "Austin", "state": "TX", "postal_code": "78701", "country": "USA", "country_code": "US" },
"tax_id": "98-7654321", "email": "ap@utsconsult.example", "phone": "+15125555678"
},
"ship_to": {
"name": "UTS Dallas Warehouse",
"address": { "raw": "789 Warehouse Blvd, Dallas, TX 75201, USA", "street": "789 Warehouse Blvd",
"city": "Dallas", "state": "TX", "postal_code": "75201", "country": "USA", "country_code": "US" },
"attention": "Receiving Dock 4"
},
"shipping_method": "FedEx Ground",
"payment_terms": "Net 30",
"currency": "USD",
"line_items": [
{ "line_number": 1, "product_code": "SKU-44918", "description": "Industrial-grade 6\" pipe fitting",
"quantity": 50, "unit_of_measure": "EA", "unit_price": 18.90, "tax_rate": 0.0825,
"tax_amount": 77.96, "line_total": 945.00, "required_by": "2026-06-02" },
{ "line_number": 2, "product_code": "SKU-44921", "description": "Stainless steel coupler",
"quantity": 120, "unit_of_measure": "EA", "unit_price": 7.25, "tax_rate": 0.0825,
"tax_amount": 71.78, "line_total": 870.00, "required_by": "2026-06-02" }
],
"subtotal": 1815.00, "tax_amount": 149.74, "discount": 0, "shipping_cost": 0,
"total_amount": 1964.74,
"status_code": "Approved", "sign_of_approval": true,
"notes": "Deliver to Dock 4 between 8am-4pm M-F.",
"contract_reference": "MSA-2025-ACME",
"department_or_cost_center": "OPS-DAL-INVENTORY"
}
Bill of Lading (bill_of_lading)
Extracts all bill of lading fields including cargo line items, shipper and consignee with structured addresses, container number, Incoterms, vessel/voyage info, and signatures. Incoterms are validated against ICC Incoterms 2020; container numbers are checked against ISO 6346 — invalid values appear in validation_warnings but do not fail the job.
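For readers who want to pre-check container numbers on their side, the ISO 6346 check digit can be computed as sketched below. This follows the published ISO 6346 rule rather than PeterParser's own validator, whose internals are not documented here; the function names are hypothetical.

```python
import re


def iso6346_check_digit(prefix_and_serial: str) -> int:
    """Compute the ISO 6346 check digit for the first 10 characters."""
    if len(prefix_and_serial) != 10:
        raise ValueError("expected 4 letters followed by 6 digits")
    # Letter values start at A=10 and skip multiples of 11 (11, 22, 33).
    letter_values = {}
    value = 10
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if value % 11 == 0:
            value += 1
        letter_values[letter] = value
        value += 1

    total = 0
    for position, char in enumerate(prefix_and_serial.upper()):
        number = int(char) if char.isdigit() else letter_values[char]
        total += number * (2 ** position)  # weights 1, 2, 4, ... 512
    return total % 11 % 10  # a remainder of 10 maps to check digit 0


def is_valid_container_number(number: str) -> bool:
    """True when the 11th character matches the computed check digit."""
    number = number.upper()
    if not re.fullmatch(r"[A-Z]{4}\d{7}", number):
        return False
    return iso6346_check_digit(number[:10]) == int(number[10])


print(is_valid_container_number("CSQU3054383"))
```

`CSQU3054383` is a commonly cited valid example; changing its last digit should make the check fail.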
// Response — bill_of_lading data structure
{
"bol_number": "MSKU-2026-BL-00412",
"carrier_name": "Maersk Line",
"carrier_scac": "MAEU",
"container_number": "MSKU1234568",
"incoterms": "FOB",
"issue_date": "2026-05-10",
"shipper": {
"name": "Shanghai Precision Parts Co., Ltd.",
"address": { "raw": "88 Pudong Ave, Shanghai", "city": "Shanghai", "country_code": "CN" },
"port_of_loading": "Port of Shanghai", "port_of_loading_code": "CNSHA"
},
"consignee": {
"name": "UTS Consult LLC",
"address": { "raw": "456 Buyer Ave, Austin, TX 78701", "city": "Austin", "state": "TX", "country_code": "US" },
"port_of_discharge": "Port of Los Angeles", "port_of_discharge_code": "USLAX"
},
"line_items": [
{ "quantity": 200, "description": "Precision machined steel fittings, Grade 316L",
"hs_code": "7307.99.5060", "gross_weight": 1850, "gross_weight_unit": "KG" }
],
"gross_cargo_weight": 1850, "gross_cargo_weight_unit": "KG",
"freight_terms": "Prepaid", "freight_charges": 1250, "currency": "USD"
}
Resume / CV (resume)
Extracts candidate details, contact info, work experience, education, and skills from resumes and CVs. Names are split into first/middle/last while keeping the original ordering in name.raw; addresses are structured; phone numbers are normalized to E.164 in contact.phones[].e164; and skills are classified by category in skills_normalized. Priced per document ($0.05), not per page.
// Response — resume data structure
{
"candidate": {
"name": { "first": "Hugo", "last": "Christensen", "raw": "Hugo Christensen" },
"headline": "Technical Leader, Online Solution team",
"summary": "Technical leader with 11 years of experience…",
"location": { "raw": "Melbourne, VIC, Australia", "city": "Melbourne", "state": "VIC", "country_code": "AU" }
},
"contact": {
"emails": ["hhchristensen@outlook.com"],
"phones": [{ "raw": "+61 458 023 928", "e164": "+61458023928", "type": "mobile" }],
"websites": [{ "url": "https://linkedin.com/in/hugochristensen", "type": "linkedin" }]
},
"work_experience": [
{ "organization": "Bank of Melbourne", "job_title": "Technical Leader",
"start_date": "2014-01", "end_date": null, "is_current": true,
"highlights": ["Lead developer team", "Lead transition from WCM to AEM"] }
],
"education": [
{ "institution": "Monash University", "degree": "BS, Computer science and technology",
"degree_level": "Bachelor", "start_date": "2001", "end_date": "2005" }
],
"skills": ["AngularJS", "React", "C#", ".NET Framework", "SQL Server", "AWS"],
"skills_normalized": [
{ "raw": "React", "id": null, "label": "React", "category": "framework" }
],
"total_years_experience": 11
}
Legal Timeline (legal_timeline)
Extracts a structured chronology from legal documents. Returns a case_summary with parties, jurisdiction, and dates, plus a timeline array of events with date handling, event classification, display category, legal tags, confidence_level, and full citation with page numbers, source snippets, and summaries for provenance. Recommended: async mode for large filings. Requires preprocessing: true — Gemini vision extracts text first, then chunks are processed for structured extraction.
// Response — legal_timeline data structure
{
"case_summary": {
"case_name": "Smith v. Acme Industries",
"case_number": "2024-CV-04521",
"court": "U.S. District Court, SDNY",
"parties": {
"plaintiff": ["John Smith"],
"defendant": ["Acme Industries, Inc."],
"counsel": ["Jane Doe, Esq."]
},
"timeline_span": { "start": "2023-06-01", "end": "2024-11-20" }
},
"timeline": [
{
"event_id": "evt_001",
"date": {
"raw": "around June 2023",
"iso": "2023-06-01",
"precision": "approximate",
"circa": true
},
"title": "Product malfunction incident",
"event_type": "incident",
"category": "other",
"legal_significance": "critical",
"confidence_level": "medium",
"tags": ["negligence", "liability", "causation", "damages"],
"parties_involved": [
{ "name": "John Smith", "role": "plaintiff", "entity_type": "person" }
],
"amounts": [{ "value": 45000, "currency": "USD", "description": "Medical expenses" }],
"document_numbers": ["SMITH-00001"],
"citation": {
"page_number": 3,
"page_range": [3, 4],
"section": "Statement of Facts",
"paragraph": "¶ 14",
"source_snippet": "On or about June 2023, Plaintiff sustained injuries when the Acme Model X device malfunctioned during normal use at plaintiff's residence...",
"source_summary": "Describes the initial product malfunction incident that caused plaintiff's injuries at his Brooklyn residence.",
"bates_range": "SMITH-00045 to SMITH-00046"
},
"grounding": {
"charStart": 245,
"charEnd": 412,
"sourceText": "On or about June 2023, Plaintiff sustained injuries...",
"confidence": 0.98,
"page_number": 3,
"page_range": [3, 4],
"context_before": "...had used the device without issue for six months.",
"context_after": "Plaintiff was transported to Brooklyn Methodist Hospital..."
},
"flags": { "is_key_event": true, "is_disputed": false, "needs_review": false }
}
]
}
Dynamic Endpoint Config via /meta
The GET /v2/documents/{type}/meta endpoint returns dynamic configuration set by admins: asyncOnly, defaultOptions, mandatoryOptions, and processingModes.
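A client could fetch this configuration once per document type and fold it into each request. The sketch below assumes a precedence order that the docs do not confirm (defaults first, then the caller's options, then mandatory options) and uses the camelCase keys as they appear in the /meta response; `apply_meta` is a hypothetical helper.

```python
from typing import Any


def apply_meta(request: dict[str, Any], meta: dict[str, Any]) -> dict[str, Any]:
    """Fold /meta configuration into a request body.

    Assumed precedence (not confirmed by the docs): defaults are applied
    first, the caller's own options override them, and mandatory options
    override everything.
    """
    body = {
        **meta.get("defaultOptions", {}),
        **request,
        **meta.get("mandatoryOptions", {}),
    }
    # Assumption: an async-only type should always be submitted with forceAsync.
    if meta.get("asyncOnly"):
        body["forceAsync"] = True
    return body


meta = {
    "asyncOnly": False,
    "defaultOptions": {"outputFormat": "json"},
    "mandatoryOptions": {"preprocessing": True},
}
request = {
    "url": "https://example.com/legal-filing.pdf",
    "documentType": "legal_timeline",
    "preprocessing": False,  # overridden by the mandatory option
}
body = apply_meta(request, meta)
print(body)
```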
// GET /v2/documents/legal_timeline/meta
{
"documentType": "legal_timeline",
"pricing": { "pricePerUnit": 0.10, "unit": "page" },
"expectedFields": ["case_summary", "timeline", "parties", "jurisdiction", "key_dates", "timeline_span", "category", "tags", "confidence_level", "citation", "grounding"],
"asyncOnly": false,
"defaultOptions": {},
"processingModes": ["sync", "async"],
"recommendedOptions": { "outputFormat": "json", "summarize": true, "premiumModel": true },
"mandatoryOptions": { "preprocessing": true }
}
Input Methods
Three ways to send documents. All go through the same pipeline.
Pass a public URL. PeterParser downloads and parses it server-side.
{ "url": "https://example.com/document.pdf", "documentType": "auto" }
Encode the file as base64. Include filename for better type detection.
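A base64 request body can be assembled with the standard library, roughly as follows. The helper name `base64_payload` and the `application/octet-stream` fallback are assumptions made for this sketch; the field names match the request shape used in this section.

```python
import base64
import mimetypes


def base64_payload(file_bytes: bytes, filename: str,
                   document_type: str = "auto") -> dict[str, str]:
    """Build a JSON-serializable body for the base64 input method."""
    content_type, _ = mimetypes.guess_type(filename)
    return {
        "base64": base64.b64encode(file_bytes).decode("ascii"),
        "filename": filename,
        # Fallback content type is an assumption, not documented behavior.
        "contentType": content_type or "application/octet-stream",
        "documentType": document_type,
    }


# In practice file_bytes would come from open(path, "rb").read().
payload = base64_payload(b"%PDF-1.4\n", "invoice.pdf", "invoice")
print(payload["base64"], payload["contentType"])
```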
{
"base64": "JVBERi0xLjQK...",
"filename": "invoice.pdf",
"contentType": "application/pdf",
"documentType": "invoice"
}
Multipart upload. Best for direct file uploads.
curl -X POST https://api.peterparser.ai/v2/documents/upload \
-H "X-API-Key: pp_live_your_key" \
-F "file=@invoice.pdf" \
-F "documentType=invoice" \
-F "outputFormat=json"
Output Formats
The content object in the response always contains json, markdown, and text representations. Set outputFormat to control the primary format.
json: Structured JSON with typed fields. Default.
markdown: Clean Markdown preserving tables and headings.
text: Plain text, no formatting.
html: HTML preserving document structure.
csv: CSV for tabular data extraction.
xml: XML with structured tags.
// The "content" object in every response:
"content": {
"json": "{ \"invoice_number\": \"INV-001\", ... }",
"markdown": "# Invoice INV-001\n\n| Item | Amount |\n|---|---|\n| Widget | $500 |",
"text": "Invoice INV-001\nItem: Widget — $500"
}
Custom Output Templates
Define a custom extraction schema. The API returns data matching your structure exactly.
{
"outputTemplate": {
"vendor_name": "string",
"total_amount": "number",
"due_date": "string (YYYY-MM-DD)",
"line_items": [{ "description": "string", "amount": "number" }]
}
}
PII Detection & Masking
Detect and mask PII in a single pass. Surcharge: $0.002/page. When enabled, the enrichments.pii field populates in the response.
Supported PII types
// Request
{
"pii": {
"detect": true,
"mask": true,
"maskChar": "*",
"types": ["ssn", "credit_card"]
}
}
// Response — enrichments.pii populates when pii.detect is true
"enrichments": {
"pii": [
{ "type": "ssn", "value": "***-**-1234", "page": 1, "confidence": 0.95 },
{ "type": "credit_card", "value": "****-****-****-5678", "page": 2, "confidence": 0.95 }
]
}
Chunking (RAG)
Split documents into chunks for vector store ingestion. When enabled, the enrichments.chunks array populates in the response. Three strategies:
semantic: Topic-based splitting that respects content boundaries
fixed: Fixed character-count chunks with configurable overlap
sentence: Split on sentence boundaries for natural breaks
// Request
{
"chunking": {
"enabled": true,
"maxChunkSize": 1500,
"overlap": 200,
"strategy": "semantic"
}
}
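To make the interaction between maxChunkSize and overlap concrete, here is a local sketch of what a fixed character-count strategy with overlap typically does. It illustrates the general technique only; the service's own chunker may differ in details, and `fixed_chunks` is a hypothetical name.

```python
def fixed_chunks(text: str, max_chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks where neighbors share `overlap` chars."""
    if max_chunk_size <= 0 or not 0 <= overlap < max_chunk_size:
        raise ValueError("need max_chunk_size > 0 and 0 <= overlap < max_chunk_size")
    step = max_chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


print(fixed_chunks("abcdefghij", 4, 1))
```

Each chunk starts `max_chunk_size - overlap` characters after the previous one, so a larger overlap produces more chunks for the same text.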
// Response — enrichments.chunks populates when chunking.enabled is true
"enrichments": {
"chunks": [
{ "id": "chunk_0", "content": "Invoice INV-2024-001 issued by Acme Corp...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "header" } },
{ "id": "chunk_1", "content": "Line items: Widget Pro x10 at $150.00...", "pageStart": 1, "pageEnd": 1, "metadata": { "section": "line_items" } }
]
}
Source Grounding
Source grounding is always enabled. The enrichments.grounding array is populated for every request with references linking extracted values to their source text.
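One possible use of the grounding array is to flag extracted fields whose confidence falls below a threshold for manual review. The threshold value and the helper name `fields_needing_review` are assumptions chosen for illustration; the entry shape follows the grounding entries shown in this documentation.

```python
from typing import Any


def fields_needing_review(grounding: list[dict[str, Any]],
                          threshold: float = 0.99) -> list[str]:
    """Return field names whose grounding confidence is below the threshold."""
    return [
        entry["field"]
        for entry in grounding
        # A missing confidence is treated as 0.0 so it is always flagged.
        if entry.get("confidence", 0.0) < threshold
    ]


grounding = [
    {"field": "total", "value": 1500.00, "sourceText": "Total: $1,500.00",
     "page": 3, "confidence": 1.0},
    {"field": "vendor.name", "value": "Acme Corp",
     "sourceText": "issued by Acme Corp, Inc.", "page": 1, "confidence": 0.98},
]
print(fields_needing_review(grounding))
```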
// Response — enrichments.grounding is always populated
"enrichments": {
"grounding": [
{
"field": "total",
"value": 1500.00,
"sourceText": "Total: $1,500.00",
"page": 3,
"confidence": 1.0
},
{
"field": "vendor.name",
"value": "Acme Corp",
"sourceText": "issued by Acme Corp, Inc.",
"page": 1,
"confidence": 0.98
}
]
}
Batch Processing
Submit up to 50 files in a single request. Batch jobs are always processed asynchronously — poll the batch status endpoint or use webhooks to get results.
Submit a Batch
POST /v2/documents/batch — Send multiple files with shared parsing options. Returns a batchId for tracking.
curl -X POST https://api.peterparser.ai/v2/documents/batch \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"files": [
{ "url": "https://example.com/invoice1.pdf", "documentType": "invoice" },
{ "url": "https://example.com/invoice2.pdf", "documentType": "invoice" },
{ "url": "https://example.com/receipt.png", "documentType": "receipt" }
],
"outputFormat": "json",
"summarize": true,
"webhookUrl": "https://your-server.com/api/webhooks/batch"
}'
Response
{
"batchId": "batch_abc123",
"status": "queued",
"totalFiles": 3,
"createdAt": "2026-03-07T12:00:00Z"
}
Check Batch Status
GET /v2/documents/batch/{batchId} — Poll for progress. Each file reports its own status. When all files finish, the batch status becomes completed.
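A polling loop might look like the sketch below. It takes the HTTP call as a function argument so the loop itself stays testable offline; `wait_for_batch`, the interval, and the poll limit are all assumptions, and only the `completed` status described above is treated as terminal.

```python
import time
from typing import Any, Callable


def wait_for_batch(fetch_status: Callable[[], dict[str, Any]],
                   interval_s: float = 2.0,
                   max_polls: int = 30,
                   sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Call fetch_status until the batch reports status "completed".

    fetch_status is expected to perform GET /v2/documents/batch/{batchId}
    and return the decoded JSON body.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") == "completed":
            return status
        sleep(interval_s)
    raise TimeoutError(f"batch not completed after {max_polls} polls")


# Offline demonstration with canned responses instead of real HTTP calls.
canned = iter([
    {"status": "queued"},
    {"status": "processing", "completed": 2},
    {"status": "completed", "completed": 3},
])
final = wait_for_batch(lambda: next(canned), interval_s=0, sleep=lambda s: None)
print(final)
```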
curl https://api.peterparser.ai/v2/documents/batch/batch_abc123 \
-H "X-API-Key: pp_live_your_key"
Response
{
"batchId": "batch_abc123",
"status": "processing",
"totalFiles": 3,
"completed": 2,
"failed": 0,
"createdAt": "2026-03-07T12:00:00Z",
"files": [
{
"index": 0,
"status": "completed",
"jobId": "job_001",
"result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
},
{
"index": 1,
"status": "completed",
"jobId": "job_002",
"result": { "success": true, "data": { ... }, "document": { ... }, "content": { ... } }
},
{
"index": 2,
"status": "processing",
"jobId": "job_003",
"result": null
}
]
}
Limits & Notes
- Maximum 50 files per batch request.
- Each file in the batch supports the same options as a single POST /v2/documents request.
- Credits are charged per file as each one completes.
- Set webhookUrl on the batch to receive a single callback when the entire batch finishes.
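When a job list exceeds the 50-file limit, one approach is to split it into several batch requests that share the same options. The sketch below only builds the request bodies; `batch_payloads` is a hypothetical helper, and sending each body to POST /v2/documents/batch is left to the caller.

```python
from typing import Any, Iterator


def batch_payloads(files: list[dict[str, Any]],
                   shared_options: dict[str, Any],
                   max_files: int = 50) -> Iterator[dict[str, Any]]:
    """Yield one batch request body per group of at most max_files files."""
    for start in range(0, len(files), max_files):
        yield {"files": files[start:start + max_files], **shared_options}


files = [
    {"url": f"https://example.com/invoice{i}.pdf", "documentType": "invoice"}
    for i in range(120)
]
payloads = list(batch_payloads(files, {"outputFormat": "json", "summarize": True}))
print([len(p["files"]) for p in payloads])
```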
Async Processing & Real-Time Events
Documents over 10 pages auto-switch to async. Force it with "forceAsync": true. The enrichments.summary field populates in the async result when "summarize": true is set.
Webhooks
Set webhookUrl — PeterParser POSTs the completed result to your endpoint.
curl -X POST https://api.peterparser.ai/v2/documents \
-H "X-API-Key: pp_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/report.pdf",
"forceAsync": true,
"summarize": true,
"webhookUrl": "https://your-server.com/api/webhooks/pp"
}'
Server-Sent Events (SSE)
One persistent connection streams events for all your jobs. Events: job_completed, job_failed, heartbeat (15s), stream_end.
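On the client side, the stream can be consumed with a small parser for the standard SSE wire format (`event:` and `data:` lines, with a blank line ending each event). The sketch below assumes the data payloads are JSON, which this page does not confirm, and falls back to raw text otherwise; `parse_sse` is a hypothetical name and the parser is simplified compared with the full SSE specification.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_name, payload) pairs from an iterable of SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                text = "\n".join(data)
                try:
                    yield event, json.loads(text)
                except ValueError:
                    yield event, text  # payload was not JSON; pass it through
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())


# Hand-written sample stream; real payload fields may differ.
stream = [
    "event: job_completed\n",
    'data: {"jobId": "job_001"}\n',
    "\n",
    "event: heartbeat\n",
    "data: {}\n",
    "\n",
]
events = list(parse_sse(stream))
print(events)
```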
curl -N -H "X-API-Key: pp_live_your_key" \
"https://api.peterparser.ai/v2/events?ttl=600"
# Catch up on missed events:
curl -H "X-API-Key: pp_live_your_key" \
"https://api.peterparser.ai/v2/events/history?limit=50"
Polling
Check job status with the jobId.
curl https://api.peterparser.ai/v2/documents/jobs/{job_id} \
-H "X-API-Key: pp_live_your_key"
Summary add-on
When "summarize": true, the enrichments.summary field populates with an AI-generated summary. Surcharge: $0.005 flat.
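The surcharges stated in this documentation can be combined into a rough cost estimate. The sketch below assumes per-page pricing, treats dollar amounts and credits as the same unit, and leaves out the premium model surcharge because its amount is not given here; `estimate_cost` is a hypothetical helper, and types priced per document (such as resume) would need a different base calculation.

```python
from decimal import Decimal

PII_SURCHARGE_PER_PAGE = Decimal("0.002")  # from the PII section
SUMMARIZE_FEE = Decimal("0.005")           # flat fee from the summary add-on


def estimate_cost(pages: int, price_per_page: str,
                  pii: bool = False, summarize: bool = False) -> Decimal:
    """Estimate the charge for one per-page-priced document."""
    total = Decimal(price_per_page) * pages
    if pii:
        total += PII_SURCHARGE_PER_PAGE * pages
    if summarize:
        total += SUMMARIZE_FEE
    return total


print(estimate_cost(2, "0.01"))
print(estimate_cost(2, "0.01", pii=True, summarize=True))
```

Decimal is used instead of float so that small per-page amounts add up without rounding drift.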
// Response — enrichments.summary populates when summarize is true
"enrichments": {
"summary": "This invoice from Acme Corp totals $1,500.00 for 10 Widget Pro units at $150 each, with $135 tax. Due date is March 15, 2026. Payment terms: Net 30."
}
Vision Preprocessing
When "preprocessing": true, Gemini vision extracts text first, then chunks are processed for structured extraction. This 2-step pipeline improves accuracy on complex documents. Mandatory for legal_timeline.
{
"url": "https://example.com/legal-filing.pdf",
"documentType": "legal_timeline",
"preprocessing": true
}
Errors & Status Codes
All errors return JSON with a detail field.
{ "detail": "Insufficient credits. Required: 0.10, available: 0.05" }
| Code | Meaning | What to do |
|---|---|---|
| 200 | Success | Process the response |
| 201 | Created | Resource created |
| 400 | Bad Request | Check request body and parameters |
| 401 | Unauthorized | Verify your API key |
| 402 | Payment Required | Top up credits |
| 403 | Forbidden | Check IP whitelist |
| 404 | Not Found | Verify endpoint URL or resource ID |
| 422 | Validation Error | Check required fields and types |
| 429 | Rate Limited | Back off and retry |
| 500 | Server Error | Retry with backoff |
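The table suggests retrying only on 429 and 500. A minimal sketch of that policy with exponential backoff follows; the base delay, the cap, and the function names are assumptions, and production code would usually add random jitter to the delays.

```python
RETRYABLE_STATUS = {429, 500}  # the two codes the table says to retry


def should_retry(status_code: int) -> bool:
    """True for status codes where the table recommends backing off and retrying."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(attempts: int, base_s: float = 1.0,
                   cap_s: float = 30.0) -> list[float]:
    """Delays in seconds before each retry: base, 2x, 4x, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** n) for n in range(attempts)]


print(should_retry(429), should_retry(402))
print(backoff_delays(5))
```

Errors such as 400, 401, 402, and 422 are not retried because repeating the same request would fail the same way until the request, key, or credit balance changes.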
Rate Limits
Default: 100 requests/min per API key, 1 concurrent SSE connection. Configurable per key.