PDF Pal (2024): Chat with PDFs using RAG

A SaaS web app that lets users upload PDFs and ask questions in natural language using Retrieval-Augmented Generation (RAG) over document chunks.


Problem

PDFs are hard to search semantically, time-consuming to read, and don't support interactive Q&A grounded in the document.

Solution

A SaaS application where users (1) upload a PDF, (2) ask questions in natural language, and (3) receive contextual answers grounded in the PDF using RAG.

Tech Stack

Frontend

Next.js 16 (App Router)

React 19

TypeScript

Tailwind CSS

Radix UI

Backend

Next.js API Routes

tRPC

Database

PostgreSQL + Prisma

Auth

Clerk

AI

OpenAI (GPT-3.5-turbo + embeddings)

LangChain

Pinecone

File storage

UploadThing

Payments

PayPal Subscriptions API

Core Data Flows

Upload & Processing Pipeline

UploadThing receives the PDF
onUploadComplete triggers:
(1) Create file record in DB with PROCESSING status
(2) Extract text pages via PDFLoader
(3) Generate one embedding per page via OpenAI (page-level vectors)
(4) Store embeddings in Pinecone (namespace per file)
(5) Update status to SUCCESS or FAILED
Each PDF uses its own Pinecone namespace for isolation
Page-level vectors keep every answer traceable to a specific source page
Status tracking supports real-time UI feedback
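
The upload steps above can be sketched as a single function with injected dependencies. This is a minimal illustration, not the app's actual code: the `PipelineDeps` interface, the `processUpload` name, the vector id format, and the use of the file id as the namespace name are all assumptions made for the example. The real roles are played by Prisma, PDFLoader, OpenAI embeddings, and Pinecone.

```typescript
// Status values come from the UploadStatus enum described in this write-up.
type UploadStatus = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

// One vector per PDF page, tagged with its 1-based page number.
interface PageVector {
  id: string;
  values: number[];
  page: number;
}

// Hypothetical dependency interface, so the flow can run without real services.
interface PipelineDeps {
  setStatus(fileId: string, status: UploadStatus): Promise<void>;
  extractPages(fileUrl: string): Promise<string[]>;
  embed(text: string): Promise<number[]>;
  upsert(namespace: string, vectors: PageVector[]): Promise<void>;
}

async function processUpload(
  fileId: string,
  fileUrl: string,
  deps: PipelineDeps
): Promise<UploadStatus> {
  // Stands in for "create file record with PROCESSING status".
  await deps.setStatus(fileId, "PROCESSING");
  try {
    const pages = await deps.extractPages(fileUrl);
    // Embed every page; each vector remembers which page it came from.
    const vectors: PageVector[] = await Promise.all(
      pages.map(async (text, index) => ({
        id: `${fileId}-page-${index + 1}`,
        values: await deps.embed(text),
        page: index + 1,
      }))
    );
    // Assumption: the file id doubles as the namespace name.
    await deps.upsert(fileId, vectors);
    await deps.setStatus(fileId, "SUCCESS");
    return "SUCCESS";
  } catch {
    // Any failure in extraction, embedding, or storage marks the file FAILED.
    await deps.setStatus(fileId, "FAILED");
    return "FAILED";
  }
}
```

Injecting the dependencies keeps the status transitions testable with in-memory fakes, which is one way to check that a failed embedding call really ends in FAILED rather than leaving the file stuck in PROCESSING.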

Chat / RAG Pipeline

User asks a question
POST /api/message:
(1) Save user message to DB
(2) Embed question
(3) Similarity search Pinecone (top 4 pages)
(4) Build prompt including retrieved context + recent message history + current question
(5) Stream OpenAI response to client
(6) Save AI response to DB
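
The prompt-building step (retrieved context + recent history + current question) might look roughly like the sketch below. The function name, the section labels, the instruction wording, and the default of six history messages are assumptions for illustration; the write-up does not state the actual prompt template or history window.

```typescript
// Mirrors the Message fields named in this write-up.
interface ChatMessage {
  isUserMessage: boolean;
  text: string;
}

// A page returned by the similarity search (hypothetical shape).
interface RetrievedPage {
  pageNumber: number;
  text: string;
}

function buildPrompt(
  question: string,
  pages: RetrievedPage[],
  history: ChatMessage[],
  maxHistory: number = 6 // assumed window size
): string {
  // Label each page so the model can point back to its source.
  const context = pages
    .map((p) => `[page ${p.pageNumber}]\n${p.text}`)
    .join("\n\n");

  // Keep only the most recent messages to bound prompt length.
  const recent = maxHistory > 0 ? history.slice(-maxHistory) : [];
  const conversation = recent
    .map((m) => `${m.isUserMessage ? "User" : "Assistant"}: ${m.text}`)
    .join("\n");

  return [
    "Answer using only the context below. If the answer is not in the context, say you don't know.",
    "PREVIOUS CONVERSATION:",
    conversation,
    "CONTEXT:",
    context,
    "QUESTION:",
    question,
  ].join("\n\n");
}
```

Tagging each context block with its page number is what lets an answer be checked against the PDF viewer on the other side of the split pane.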

Database Design

Three core entities: User (email + PayPal subscription fields), File (uploadStatus + URL/key + relations), and Message (text + isUserMessage + relations). UploadStatus enum tracks the pipeline: PENDING, PROCESSING, SUCCESS, FAILED.
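
As a rough picture of these entities, the TypeScript types below mirror the fields named above. The real schema lives in Prisma; the exact field names (in particular the PayPal fields and the relation keys) and the set of allowed status transitions are assumptions inferred from the pipeline description, not taken from the project's schema.

```typescript
type UploadStatus = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

interface User {
  id: string;
  email: string;
  // Hypothetical names for the PayPal subscription fields.
  paypalSubscriptionId: string | null;
  paypalPlanId: string | null;
}

interface FileRecord {
  id: string;
  uploadStatus: UploadStatus;
  url: string;
  key: string;
  userId: string; // relation to User (assumed key name)
}

interface Message {
  id: string;
  text: string;
  isUserMessage: boolean;
  userId: string; // relation to User (assumed key name)
  fileId: string; // relation to FileRecord (assumed key name)
}

// Assumed legal transitions: a file moves forward through the pipeline
// and never leaves a terminal state.
const NEXT_STATUS: Record<UploadStatus, UploadStatus[]> = {
  PENDING: ["PROCESSING"],
  PROCESSING: ["SUCCESS", "FAILED"],
  SUCCESS: [],
  FAILED: [],
};

function canTransition(from: UploadStatus, to: UploadStatus): boolean {
  return NEXT_STATUS[from].indexOf(to) !== -1;
}
```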

Architecture Highlights

End-to-end type safety via tRPC
Vector isolation via Pinecone namespace per file
Streaming UX for responsiveness
Upload status polling with automatic UI updates
PostgreSQL for relational data + Pinecone for similarity search
Edge-ready Vercel deployment configuration
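
Upload status polling can be reduced to one small decision: keep polling while the file is still in flight, stop once it settles. The sketch below is illustrative only. The 500 ms interval and the function name are assumptions, and the `number | false` return shape is chosen because it matches how TanStack Query's `refetchInterval` option expresses "poll again after N ms" versus "stop"; whether PDF Pal uses that option is not stated in the write-up.

```typescript
type UploadStatus = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

// Assumed polling interval; the real value is not documented here.
const POLL_INTERVAL_MS = 500;

// Returns the delay before the next status check, or false to stop polling.
function nextPollDelay(status: UploadStatus | undefined): number | false {
  // SUCCESS and FAILED are terminal, so there is nothing left to wait for.
  if (status === "SUCCESS" || status === "FAILED") {
    return false;
  }
  // undefined covers the first render, before any status has been fetched.
  return POLL_INTERVAL_MS;
}
```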

Key Features

PDF Viewer

Page navigation
Zoom (100%-300%)
Rotation
Fullscreen
Responsive split-pane layout

Chat

Streaming responses
Infinite scroll history
Markdown rendering
Optimistic updates
Context-aware answers

Subscription Tiers

Free: 5 files, 50 pages/PDF
Basic: $4.99/mo, 20 files, 100 pages/PDF
Standard: $9.99/mo, 50 files, 400 pages/PDF
Premium: $19.99/mo, 120 files, 1000 pages/PDF
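
The tier limits above translate naturally into a lookup table plus a pre-upload check. This is a sketch under assumptions: the names `PLANS` and `checkUpload` are hypothetical, and it assumes the file limit counts files already stored (so a Free user with 5 files cannot add a sixth) and that the page limit is inclusive.

```typescript
type PlanName = "Free" | "Basic" | "Standard" | "Premium";

interface PlanLimits {
  priceUsdPerMonth: number;
  maxFiles: number;
  maxPagesPerPdf: number;
}

// Values copied from the tier list in this write-up.
const PLANS: Record<PlanName, PlanLimits> = {
  Free: { priceUsdPerMonth: 0, maxFiles: 5, maxPagesPerPdf: 50 },
  Basic: { priceUsdPerMonth: 4.99, maxFiles: 20, maxPagesPerPdf: 100 },
  Standard: { priceUsdPerMonth: 9.99, maxFiles: 50, maxPagesPerPdf: 400 },
  Premium: { priceUsdPerMonth: 19.99, maxFiles: 120, maxPagesPerPdf: 1000 },
};

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUpload(
  plan: PlanName,
  currentFileCount: number,
  pageCount: number
): UploadCheck {
  const limits = PLANS[plan];
  // Assumption: the quota counts files the user already has stored.
  if (currentFileCount >= limits.maxFiles) {
    return { ok: false, reason: `${plan} plan allows ${limits.maxFiles} files` };
  }
  // Assumption: a PDF with exactly maxPagesPerPdf pages is accepted.
  if (pageCount > limits.maxPagesPerPdf) {
    return { ok: false, reason: `${plan} plan allows ${limits.maxPagesPerPdf} pages per PDF` };
  }
  return { ok: true };
}
```

For example, `checkUpload("Free", 4, 60)` is rejected on page count, while the same 60-page PDF passes on the Basic plan.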

Key Design Decisions

Used one Pinecone namespace per document instead of a single shared namespace. Eliminates cross-document leakage and lets a document's vectors be deleted without index-wide operations
Chose tRPC over REST for the API layer. End-to-end type safety from database to UI eliminates an entire class of integration bugs
Implemented streaming responses from OpenAI rather than waiting for completion. Improves perceived latency for long answers
Built a split-pane layout (chat + PDF viewer with zoom, rotate, fullscreen). Users need to verify AI answers against the source document
Embedded one vector per PDF page instead of character-chunking. Keeps every retrieved answer traceable to a source page and makes per-file deletion trivial
Chose Clerk over custom auth. Authentication is not a differentiator for this product
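
To make the namespace and page-level decisions concrete, here is a small in-memory stand-in for a namespaced vector store. It is not Pinecone's API: the class, its method names, and the use of cosine similarity are assumptions chosen to illustrate the idea that a query only ever sees vectors from one document and that deleting a document is a single namespace drop.

```typescript
interface PageVector {
  page: number;
  values: number[];
}

interface Match {
  page: number;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  // Treat zero vectors as having no similarity to anything.
  return normA === 0 || normB === 0 ? 0 : dot / (normA * normB);
}

class NamespacedVectorStore {
  private readonly namespaces = new Map<string, PageVector[]>();

  // Insert vectors, replacing any existing vector for the same page.
  upsert(namespace: string, vectors: PageVector[]): void {
    const existing = this.namespaces.get(namespace) ?? [];
    const kept = existing.filter(
      (old) => !vectors.some((incoming) => incoming.page === old.page)
    );
    this.namespaces.set(namespace, kept.concat(vectors));
  }

  // Search only inside one namespace; other documents are never scanned.
  query(namespace: string, queryVector: number[], topK: number = 4): Match[] {
    const vectors = this.namespaces.get(namespace) ?? [];
    return vectors
      .map((v) => ({ page: v.page, score: cosineSimilarity(queryVector, v.values) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }

  // Deleting a document is one operation on its namespace.
  deleteNamespace(namespace: string): boolean {
    return this.namespaces.delete(namespace);
  }
}
```

Because each match carries a page number, the top results map directly onto pages in the viewer. The default `topK` of 4 follows the "top 4 pages" retrieval described in the chat pipeline.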

Tradeoffs

Per-document Pinecone namespaces rule out cross-document querying but prevent a common RAG failure mode: context bleed between unrelated documents
tRPC couples frontend and backend tightly, making it harder to expose a public API later, but the type safety benefits outweigh this for a SaaS product
Streaming adds complexity to error handling and message persistence, but the UX improvement justifies it
PayPal over Stripe limits payment method flexibility but provides broader international coverage for the target user base
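
The streaming tradeoff above (more complex error handling and message persistence) can be illustrated with a short sketch. It forwards each chunk to the client as it arrives and persists the accumulated text once, even if the stream fails midway. The function name, the callback shapes, and the choice to save a partial answer with a `complete` flag are assumptions for the example; the write-up does not say how PDF Pal handles interrupted streams.

```typescript
async function streamAndPersist(
  chunks: AsyncIterable<string>,
  onChunk: (chunk: string) => void,
  save: (text: string, complete: boolean) => Promise<void>
): Promise<string> {
  let fullText = "";
  let complete = false;
  try {
    for await (const chunk of chunks) {
      fullText += chunk;
      onChunk(chunk); // forward to the client immediately
    }
    complete = true;
  } finally {
    // Runs on success and on failure. Assumption: a partial answer is
    // still worth saving, flagged as incomplete.
    if (fullText.length > 0) {
      await save(fullText, complete);
    }
  }
  return fullText;
}
```

If the stream throws, the error still propagates to the caller after the partial text is saved, so the route handler can decide how to report it.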

Outcome

Production SaaS live at pdfpal.enkambale.com. Users upload a PDF and get streaming, context-grounded AI answers within seconds. Per-document vector isolation prevents a common RAG failure (cross-document context bleed). Four subscription tiers, from Free through $19.99/mo, are billed via PayPal.

Lessons Learned

RAG quality depends more on chunking and isolation strategy than on model selection
Streaming significantly improves perceived responsiveness. The difference between an app that feels responsive and one that feels broken is often just progressive rendering
Vector namespaces per document are non-negotiable for multi-tenant RAG. Cross-document leakage destroys user trust
Shipping a full SaaS (auth, billing, file storage, AI, UX) reveals system constraints that no amount of prototyping can surface
