February 27, 2026 · 10 min read

How to Build a RAG Pipeline with PeterParser in 10 Minutes

Most RAG tutorials skip the hardest part: getting clean, structured text from PDFs. They assume you have text. You don't. You have scanned invoices, multi-column reports, and tables that break every naive text extractor.

This tutorial builds a production RAG pipeline in 10 minutes, using PeterParser for parsing and chunking, and ChromaDB (or any vector store) for retrieval.

Step 1: Parse Your Documents

import httpx

API_KEY = "pp_live_your_key"
BASE = "https://api.peterparser.ai/v1"

def parse_document(file_path: str) -> dict:
    """Parse a PDF and get chunked output."""
    with open(file_path, "rb") as f:
        files = {"file": (file_path, f, "application/pdf")}
        data = {
            "document_type": "auto",
            "output_format": "markdown",
            "chunking_enabled": "true",
            "chunk_size": "1500",
        }
        resp = httpx.post(
            f"{BASE}/parse/upload",
            headers={"X-API-Key": API_KEY},
            files=files,
            data=data,
            timeout=60,
        )
    resp.raise_for_status()  # surface HTTP errors instead of silently returning an error body
    return resp.json()

PeterParser returns chunks with char offsets, plus document metadata (type, page count, language) that you can store alongside your embeddings for filtering.
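For reference, the parsed response looks roughly like this. The values here are made up, but the field names match what the ingestion code below reads:

```python
# Illustrative response shape -- values are invented, field names match
# what the ingestion step consumes.
parsed = {
    "metadata": {
        "document_type": "invoice",
        "filename": "q3_invoice.pdf",
        "page_count": 3,
        "language": "en",
    },
    "chunks": [
        {
            "id": "chunk_0",
            "content": "## Invoice Summary\n\n| Item | Amount |",
            "char_start": 0,
            "char_end": 1482,
        },
    ],
}

# Each chunk carries its position in the original document's text.
first = parsed["chunks"][0]
chunk_length = first["char_end"] - first["char_start"]  # length in characters
```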

Step 2: Embed and Store

from openai import OpenAI
import chromadb

openai = OpenAI()
chroma = chromadb.PersistentClient(path="./vectordb")
collection = chroma.get_or_create_collection("documents")

def ingest(parsed: dict):
    """Embed chunks and store in ChromaDB."""
    chunks = parsed.get("chunks", [])
    doc_meta = parsed.get("metadata", {})
    if not chunks:
        return  # nothing to embed; the embeddings API rejects empty input

    texts = [c["content"] for c in chunks]
    ids = [c["id"] for c in chunks]

    # Batch embed
    resp = openai.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    embeddings = [e.embedding for e in resp.data]

    # Store with metadata for filtered retrieval
    metadatas = [{
        "doc_type": doc_meta.get("document_type", "unknown"),
        "filename": doc_meta.get("filename", ""),
        "char_start": c["char_start"],
        "char_end": c["char_end"],
    } for c in chunks]

    collection.add(
        ids=ids,
        embeddings=embeddings,
        documents=texts,
        metadatas=metadatas,
    )

Step 3: Query

def query_rag(question: str, n_results: int = 5) -> str:
    """Query the RAG pipeline."""
    # Embed the question
    q_resp = openai.embeddings.create(
        model="text-embedding-3-small",
        input=[question],
    )
    q_embedding = q_resp.data[0].embedding

    # Retrieve relevant chunks
    results = collection.query(
        query_embeddings=[q_embedding],
        n_results=n_results,
    )

    context = "\n\n---\n\n".join(results["documents"][0])

    # Generate answer
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content

# Usage
answer = query_rag("What was the total revenue in Q3?")
print(answer)
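The flat `---` join works, but retrieval is easier to debug when each chunk is labeled, and a labeled context lets the model cite which source it used. A minimal variant (the `[source N]` convention is just an illustration, not part of the PeterParser API):

```python
def build_context(documents: list[str]) -> str:
    """Join retrieved chunks, labeling each one so the model can cite it."""
    return "\n\n".join(
        f"[source {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )

context = build_context(["Revenue was $4.2M.", "Q3 margins improved."])
# context begins with "[source 1]" followed by the first chunk
```

You can then instruct the model in the system prompt to reference sources by number, and map citations back to chunk metadata.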

Why PeterParser for RAG

  • Table preservation — PeterParser maintains table structure in markdown output. LLMs can reason about properly formatted tables.
  • Built-in chunking — No need for LangChain text splitters. Chunks respect document structure.
  • Document classification — Auto-detected doc type becomes metadata for filtered retrieval.
  • Fast lane option — Set pre_processing: false for text-heavy docs. 10x faster, lower cost.
  • One API call — Parse + chunk + classify + summarize. No chaining three different tools.
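The classification bullet is worth making concrete: because the detected doc type is stored as chunk metadata, you can restrict retrieval to, say, invoices only. ChromaDB supports this natively via `collection.query(..., where={"doc_type": "invoice"})`; conceptually the filter is just:

```python
def filter_by_doc_type(documents, metadatas, doc_type):
    """Keep only the chunks whose stored doc_type matches."""
    return [
        doc for doc, meta in zip(documents, metadatas)
        if meta.get("doc_type") == doc_type
    ]

docs = ["Invoice total: $1,200", "Annual report intro"]
metas = [{"doc_type": "invoice"}, {"doc_type": "report"}]
invoices = filter_by_doc_type(docs, metas, "invoice")  # ['Invoice total: $1,200']
```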

Production Tips

  1. Use async mode for batches. Submit all docs with mode: async, then monitor via SSE stream.
  2. Use the fast lane for text-heavy docs. Annual reports, whitepapers, and text-heavy PDFs don't need advanced table detection.
  3. Set chunk overlap to around 200 characters. This ensures context isn't lost at chunk boundaries.
  4. Store char offsets. When your LLM cites a chunk, you can link back to the exact position in the original document.
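Tip 4 pays off at answer time: with `char_start`/`char_end` stored per chunk, a cited chunk maps back to an exact span of the source. A sketch, assuming you keep the full extracted text of each document around:

```python
def locate_citation(full_text: str, chunk_meta: dict) -> str:
    """Recover the exact original passage a retrieved chunk came from."""
    return full_text[chunk_meta["char_start"]:chunk_meta["char_end"]]

full_text = "Q1 revenue was $2M. Q3 revenue was $5M, up 40% year over year."
meta = {"char_start": 20, "char_end": 62}
passage = locate_citation(full_text, meta)
# -> "Q3 revenue was $5M, up 40% year over year."
```

In a UI, this is what lets you highlight the cited passage in the original PDF viewer rather than showing the chunk out of context.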