February 27, 2026 · 10 min read
How to Build a RAG Pipeline with PeterParser in 10 Minutes
Most RAG tutorials skip the hardest part: getting clean, structured text from PDFs. They assume you have text. You don't. You have scanned invoices, multi-column reports, and tables that break every naive text extractor.
This tutorial builds a production RAG pipeline in 10 minutes using PeterParser for parsing + chunking, and any vector store for retrieval.
Step 1: Parse Your Documents
```python
import httpx

API_KEY = "pp_live_your_key"
BASE = "https://api.peterparser.ai/v1"

def parse_document(file_path: str) -> dict:
    """Parse a PDF and get chunked output."""
    with open(file_path, "rb") as f:
        files = {"file": (file_path, f, "application/pdf")}
        data = {
            "document_type": "auto",
            "output_format": "markdown",
            "chunking_enabled": "true",
            "chunk_size": "1500",
        }
        resp = httpx.post(
            f"{BASE}/parse/upload",
            headers={"X-API-Key": API_KEY},
            files=files,
            data=data,
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()
```

PeterParser returns chunks with char offsets, plus document metadata (type, page count, language) that you can store alongside your embeddings for filtering.
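To make the rest of the pipeline concrete, here is a hypothetical parsed response. The field names match what we use below (chunks with `id`, `content`, `char_start`, `char_end`, plus a `metadata` object); the exact schema and values are illustrative, so check the API reference for the real shape.

```python
# Hypothetical response shape — values are illustrative, not real API output.
parsed = {
    "metadata": {
        "document_type": "financial_report",
        "filename": "q3-report.pdf",
        "page_count": 12,
        "language": "en",
    },
    "chunks": [
        {"id": "chunk-0", "content": "## Q3 Revenue\n...", "char_start": 0, "char_end": 1480},
        {"id": "chunk-1", "content": "| Region | Revenue |\n...", "char_start": 1480, "char_end": 2910},
    ],
}

# Offsets are contiguous over the extracted text, so any chunk can be
# mapped back to its exact position in the source document.
for c in parsed["chunks"]:
    print(c["id"], c["char_start"], c["char_end"])
```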
Step 2: Embed and Store
```python
from openai import OpenAI
import chromadb

openai = OpenAI()
chroma = chromadb.PersistentClient(path="./vectordb")
collection = chroma.get_or_create_collection("documents")

def ingest(parsed: dict):
    """Embed chunks and store in ChromaDB."""
    chunks = parsed.get("chunks", [])
    doc_meta = parsed.get("metadata", {})
    texts = [c["content"] for c in chunks]
    ids = [c["id"] for c in chunks]

    # Batch embed
    resp = openai.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    embeddings = [e.embedding for e in resp.data]

    # Store with metadata for filtered retrieval
    metadatas = [{
        "doc_type": doc_meta.get("document_type", "unknown"),
        "filename": doc_meta.get("filename", ""),
        "char_start": c["char_start"],
        "char_end": c["char_end"],
    } for c in chunks]
    collection.add(
        ids=ids,
        embeddings=embeddings,
        documents=texts,
        metadatas=metadatas,
    )
```

Step 3: Query
```python
def query_rag(question: str, n_results: int = 5) -> str:
    """Query the RAG pipeline."""
    # Embed the question
    q_resp = openai.embeddings.create(
        model="text-embedding-3-small",
        input=[question],
    )
    q_embedding = q_resp.data[0].embedding

    # Retrieve relevant chunks
    results = collection.query(
        query_embeddings=[q_embedding],
        n_results=n_results,
    )
    context = "\n\n---\n\n".join(results["documents"][0])

    # Generate answer
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content

# Usage
answer = query_rag("What was the total revenue in Q3?")
print(answer)
```

Why PeterParser for RAG
- → Table preservation — PeterParser maintains table structure in markdown output. LLMs can reason about properly formatted tables.
- → Built-in chunking — No need for LangChain text splitters. Chunks respect document structure.
- → Document classification — Auto-detected doc type becomes metadata for filtered retrieval.
- → Fast lane option — Set pre_processing: false for text-heavy docs. 10x faster, lower cost.
- → One API call — Parse + chunk + classify + summarize. No chaining three different tools.
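The document-classification point is worth spelling out: because each chunk carries a doc_type in its metadata, retrieval can be scoped to one document class. With ChromaDB you would pass a `where` filter to `collection.query`; conceptually the filter behaves like this pure-Python sketch (data and names are illustrative):

```python
# Pure-Python sketch of metadata-filtered retrieval. With ChromaDB you'd
# pass where={"doc_type": "invoice"} to collection.query; here we filter a
# toy metadata list the same way to show what the filter selects.
metadatas = [
    {"doc_type": "invoice", "filename": "inv-001.pdf"},
    {"doc_type": "financial_report", "filename": "q3-report.pdf"},
    {"doc_type": "invoice", "filename": "inv-002.pdf"},
]

def filter_by_type(metas, doc_type):
    """Return indices of chunks whose auto-detected type matches."""
    return [i for i, m in enumerate(metas) if m["doc_type"] == doc_type]

print(filter_by_type(metadatas, "invoice"))  # → [0, 2]
```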
Production Tips
- Use async mode for batches. Submit all docs with mode: async, then monitor via SSE stream.
- Use the fast lane for text-heavy docs. Annual reports, whitepapers, and text-heavy PDFs don't need advanced table detection.
- Set chunk overlap to 200. This ensures context isn't lost at chunk boundaries.
- Store char offsets. When your LLM cites a chunk, you can link back to the exact position in the original document.
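The char-offset tip is cheap to implement. If you keep the full extracted text alongside the parse, a cited chunk's stored offsets recover the exact source span; a minimal sketch (the text and offsets here are made up for illustration):

```python
# Recover the exact source span for a cited chunk using the char offsets
# stored in ChromaDB metadata (text and offsets are illustrative).
full_text = "Q3 revenue grew 12% year over year. Operating margin held at 21%."

cited_chunk = {"char_start": 36, "char_end": 65}  # from the chunk's metadata

span = full_text[cited_chunk["char_start"]:cited_chunk["char_end"]]
print(span)  # → Operating margin held at 21%.
```

This is what lets a UI highlight the cited passage in the original document instead of just naming the file.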