PDF Pal (2024): Chat with PDFs using RAG
A SaaS web app that lets users upload PDFs and ask questions in natural language using Retrieval-Augmented Generation (RAG) over document chunks.
Problem
PDFs are hard to search semantically, time-consuming to read, and don't support interactive Q&A grounded in the document.
Solution
A SaaS application where users: (1) Upload a PDF, (2) Ask questions in natural language, (3) Receive contextual answers grounded in the PDF using RAG.
Tech Stack
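Frontend: split-pane chat + PDF viewer UI, deployed on Vercel
Backend: tRPC API layer, plus a streaming POST /api/message route
Database: PostgreSQL (relational data), Pinecone (vector embeddings)
Auth: Clerk
AI: OpenAI (embeddings and streamed answers)
File storage: UploadThing
Payments: PayPal subscriptions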
Core Data Flows
Upload & Processing Pipeline
- UploadThing receives the PDF
- onUploadComplete triggers: (1) Create file record in DB with PROCESSING status, (2) Extract text pages via PDFLoader, (3) Generate one embedding per page via OpenAI (page-level vectors), (4) Store embeddings in Pinecone (namespace per file), (5) Update status to SUCCESS or FAILED
- Each PDF uses its own Pinecone namespace for isolation
- Page-level vectors keep every answer traceable to a specific source page
- Status tracking supports real-time UI feedback
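The upload pipeline above can be sketched as one async handler. This is a minimal sketch under stated assumptions, not the app's actual code: `FileStore`, `Embedder`, and `VectorStore` are hypothetical interfaces standing in for the database, the OpenAI embeddings call, and Pinecone. Text extraction (PDFLoader in the real pipeline) is assumed to have already produced one string per page.

```typescript
type UploadStatus = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

// Hypothetical interfaces standing in for the database, OpenAI embeddings, and Pinecone.
interface FileStore {
  create(id: string, url: string, status: UploadStatus): void;
  setStatus(id: string, status: UploadStatus): void;
}
interface Embedder {
  embed(text: string): Promise<number[]>;
}
interface PageVector {
  id: string;
  values: number[];
  metadata: { page: number; text: string };
}
interface VectorStore {
  upsert(namespace: string, vectors: PageVector[]): Promise<void>;
}

// Processes one uploaded PDF. Step (2), text extraction, is assumed done: `pages` holds it.
async function processUpload(
  fileId: string,
  url: string,
  pages: string[],
  db: FileStore,
  embedder: Embedder,
  store: VectorStore,
): Promise<UploadStatus> {
  db.create(fileId, url, "PROCESSING"); // (1) the record starts in PROCESSING
  try {
    // (3) one embedding per page; the page number rides along as metadata
    const vectors: PageVector[] = [];
    for (let i = 0; i < pages.length; i++) {
      vectors.push({
        id: `${fileId}-page-${i + 1}`,
        values: await embedder.embed(pages[i]),
        metadata: { page: i + 1, text: pages[i] },
      });
    }
    // (4) the file id doubles as the namespace, isolating this PDF's vectors
    await store.upsert(fileId, vectors);
    db.setStatus(fileId, "SUCCESS"); // (5) success path
    return "SUCCESS";
  } catch {
    db.setStatus(fileId, "FAILED"); // (5) any error marks the file as failed
    return "FAILED";
  }
}

// In-memory fakes so the sketch runs without external services.
const statuses = new Map<string, UploadStatus>();
const fakeDb: FileStore = {
  create: (id, _url, status) => {
    statuses.set(id, status);
  },
  setStatus: (id, status) => {
    statuses.set(id, status);
  },
};
const fakeEmbedder: Embedder = {
  embed: async (text) => [text.length, 1], // toy 2-d vector, not a real embedding
};
const namespaces = new Map<string, PageVector[]>();
const fakeStore: VectorStore = {
  upsert: async (ns, vectors) => {
    namespaces.set(ns, vectors);
  },
};
```

Using the file id as the namespace is an assumption; any stable per-file key gives the same isolation.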
Chat / RAG Pipeline
- User asks a question
- POST /api/message: (1) Save the user message to the DB, (2) Embed the question, (3) Run a similarity search in the file's Pinecone namespace (top 4 pages), (4) Build a prompt from the retrieved context, recent message history, and the current question, (5) Stream the OpenAI response to the client, (6) Save the AI response to the DB
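Steps (2) to (4) of the chat pipeline can be illustrated in memory. Pinecone performs the similarity search server-side, so the cosine ranking below only shows what "top 4 pages" means. The 6-message history window, the prompt wording, and all function names are assumptions for this sketch.

```typescript
interface PageMatch {
  page: number;
  text: string;
  score: number;
}
interface ChatMessage {
  text: string;
  isUserMessage: boolean;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks one file's page vectors against the question embedding and keeps the top k.
function topKPages(
  question: number[],
  pages: { page: number; text: string; values: number[] }[],
  k = 4,
): PageMatch[] {
  return pages
    .map((p) => ({ page: p.page, text: p.text, score: cosineSimilarity(question, p.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Assembles retrieved context, recent history, and the current question into one prompt.
function buildPrompt(
  question: string,
  context: PageMatch[],
  history: ChatMessage[],
  historyLimit = 6, // hypothetical window size
): string {
  const contextText = context.map((c) => `[Page ${c.page}] ${c.text}`).join("\n");
  const historyText = history
    .slice(-historyLimit)
    .map((m) => `${m.isUserMessage ? "User" : "Assistant"}: ${m.text}`)
    .join("\n");
  return [
    "Answer using only the PDF context below. If the answer is not there, say so.",
    "CONTEXT:",
    contextText,
    "PREVIOUS CONVERSATION:",
    historyText,
    `USER QUESTION: ${question}`,
  ].join("\n\n");
}
```

Tagging each context block with its page number is what lets an answer point back to a source page.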
Database Design
Three core entities: User (email + PayPal subscription fields), File (uploadStatus + URL/key + relations), and Message (text + isUserMessage + relations). UploadStatus enum tracks the pipeline: PENDING, PROCESSING, SUCCESS, FAILED.
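The data model can be written down as TypeScript types. This is a sketch only: ids, timestamps, `name`, and the exact PayPal subscription columns are hypothetical field names, and `File` is renamed `PdfFile` here to avoid clashing with the DOM's built-in `File` type.

```typescript
// String-valued status set, mirroring the UploadStatus enum described above.
const UploadStatus = {
  PENDING: "PENDING",
  PROCESSING: "PROCESSING",
  SUCCESS: "SUCCESS",
  FAILED: "FAILED",
} as const;
type UploadStatus = (typeof UploadStatus)[keyof typeof UploadStatus];

interface User {
  id: string;
  email: string;
  // Hypothetical PayPal subscription fields; null while on the Free tier.
  paypalSubscriptionId: string | null;
  paypalPlanId: string | null;
  currentPeriodEnd: Date | null;
  files: PdfFile[];
  messages: Message[];
}

interface PdfFile {
  id: string;
  name: string;
  uploadStatus: UploadStatus;
  url: string; // public URL of the stored PDF
  key: string; // storage key, used to locate or delete the object
  userId: string;
  messages: Message[];
}

interface Message {
  id: string;
  text: string;
  isUserMessage: boolean; // true for the user's question, false for the AI answer
  userId: string;
  fileId: string;
  createdAt: Date;
}

// A file is chat-ready only after the upload pipeline has finished successfully.
function isChatReady(file: Pick<PdfFile, "uploadStatus">): boolean {
  return file.uploadStatus === UploadStatus.SUCCESS;
}
```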
Architecture Highlights
- End-to-end type safety via tRPC
- Vector isolation via Pinecone namespace per file
- Streaming UX for responsiveness
- Upload status polling with automatic UI updates
- PostgreSQL for relational data + Pinecone for similarity search
- Edge-ready Vercel deployment configuration
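Upload status polling boils down to re-reading the file's status until it reaches a terminal state. With tRPC's React Query integration this is usually expressed through a query's `refetchInterval` option; the framework-free loop below shows the same logic. `pollUploadStatus`, the 500 ms interval, and the 60-attempt cap are assumptions, and the injectable `sleep` exists only to make the sketch testable.

```typescript
type UploadStatus = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

// Polls until the upload is SUCCESS or FAILED, or until maxAttempts reads have been made.
async function pollUploadStatus(
  getStatus: () => Promise<UploadStatus>,
  intervalMs = 500,
  maxAttempts = 60,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<UploadStatus> {
  let status: UploadStatus = "PENDING";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    status = await getStatus();
    if (status === "SUCCESS" || status === "FAILED") return status; // terminal: stop polling
    await sleep(intervalMs);
  }
  return status; // still PENDING or PROCESSING after maxAttempts
}
```

Capping attempts keeps a stuck pipeline from polling forever; the UI can treat a non-terminal result as a timeout.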
Key Features
PDF Viewer
- Page navigation
- Zoom (100%-300%)
- Rotation
- Fullscreen
- Responsive split-pane layout
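The viewer controls can be modeled as a small state reducer. A sketch under assumptions: only the 100%-300% zoom range comes from the list above; the reducer shape, the action names, and the 90-degree rotation step are hypothetical.

```typescript
interface ViewerState {
  page: number; // 1-based current page
  numPages: number;
  zoom: number; // percent, clamped to 100-300
  rotation: number; // degrees: 0, 90, 180, or 270
}

type ViewerAction =
  | { type: "goTo"; page: number }
  | { type: "zoom"; percent: number }
  | { type: "rotate" };

const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v));

// Applies one user action, keeping page, zoom, and rotation within valid bounds.
function viewerReducer(state: ViewerState, action: ViewerAction): ViewerState {
  switch (action.type) {
    case "goTo":
      return { ...state, page: clamp(action.page, 1, state.numPages) };
    case "zoom":
      return { ...state, zoom: clamp(action.percent, 100, 300) };
    case "rotate":
      return { ...state, rotation: (state.rotation + 90) % 360 };
  }
}
```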
Chat
- Streaming responses
- Infinite scroll history
- Markdown rendering
- Optimistic updates
- Context-aware answers
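Infinite scroll over chat history is commonly backed by cursor pagination. A minimal in-memory sketch, assuming messages are served newest first in pages of 10 and the cursor is the id of the first message on the next page; the project's actual paging scheme is not described here, so `getMessagePage` and its fields are hypothetical.

```typescript
interface StoredMessage {
  id: string;
  text: string;
  isUserMessage: boolean;
  createdAt: number; // epoch millis
}

interface MessagePage {
  messages: StoredMessage[];
  nextCursor: string | null; // null when there is nothing older to load
}

// Returns up to `limit` messages starting at the cursor, newest first.
function getMessagePage(all: StoredMessage[], cursor: string | null, limit = 10): MessagePage {
  const sorted = [...all].sort((a, b) => b.createdAt - a.createdAt);
  const start = cursor === null ? 0 : sorted.findIndex((m) => m.id === cursor);
  if (start < 0) return { messages: [], nextCursor: null }; // unknown cursor
  // Fetch one extra row: if it exists, its id becomes the cursor for the next page.
  const slice = sorted.slice(start, start + limit + 1);
  const nextCursor = slice.length > limit ? slice[limit].id : null;
  return { messages: slice.slice(0, limit), nextCursor };
}
```

Fetching `limit + 1` rows answers "is there another page?" without a separate count query; the scroll handler simply requests the next page with the returned cursor until it is `null`.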
Subscription Tiers
- Free: 5 files, 50 pages/PDF
- Basic: $4.99/mo, 20 files, 100 pages/PDF
- Standard: $9.99/mo, 50 files, 400 pages/PDF
- Premium: $19.99/mo, 120 files, 1000 pages/PDF
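The tier limits translate naturally into a lookup table plus a guard. This is a sketch: the numbers come from the list above, while the plan keys, `checkUpload`, and the assumption that the page count is checked once text extraction has counted the pages are hypothetical.

```typescript
type Plan = "free" | "basic" | "standard" | "premium";

interface PlanLimits {
  priceUsdPerMonth: number;
  maxFiles: number;
  maxPagesPerPdf: number;
}

// Limits copied from the subscription tiers listed above.
const PLANS: Record<Plan, PlanLimits> = {
  free: { priceUsdPerMonth: 0, maxFiles: 5, maxPagesPerPdf: 50 },
  basic: { priceUsdPerMonth: 4.99, maxFiles: 20, maxPagesPerPdf: 100 },
  standard: { priceUsdPerMonth: 9.99, maxFiles: 50, maxPagesPerPdf: 400 },
  premium: { priceUsdPerMonth: 19.99, maxFiles: 120, maxPagesPerPdf: 1000 },
};

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Checks a new upload against the plan's file-count and page-count limits.
function checkUpload(plan: Plan, existingFiles: number, pageCount: number): UploadCheck {
  const limits = PLANS[plan];
  if (existingFiles >= limits.maxFiles) {
    return { ok: false, reason: `File limit reached (${limits.maxFiles})` };
  }
  if (pageCount > limits.maxPagesPerPdf) {
    return { ok: false, reason: `PDF exceeds ${limits.maxPagesPerPdf} pages` };
  }
  return { ok: true };
}
```

Keeping the limits in one table means a pricing change touches a single object rather than scattered checks.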
Key Design Decisions
- Used a separate Pinecone namespace per document instead of one shared namespace for all vectors. Eliminates cross-document leakage and lets a file's vectors be deleted by clearing its namespace, without index-wide operations
- Chose tRPC over REST for the API layer. End-to-end type safety from database to UI eliminates an entire class of integration bugs
- Implemented streaming responses from OpenAI rather than waiting for completion. Improves perceived latency for long answers
- Built a split-pane layout (chat + PDF viewer with zoom, rotate, fullscreen). Users need to verify AI answers against the source document
- Embedded one vector per PDF page instead of character-chunking. Keeps every retrieved answer traceable to a source page and makes per-file deletion trivial
- Chose Clerk over custom auth. Authentication is not a differentiator for this product
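The namespace and page-level vector decisions can be illustrated with a toy in-memory index. `NamespacedIndex` is hypothetical and is not Pinecone's client API. It scores by dot product, which equals cosine similarity only for unit-length vectors, and it demonstrates just the two properties claimed above: a query never sees another file's vectors, and deleting a file is a single namespace drop.

```typescript
interface PageVector {
  id: string;
  values: number[];
  page: number; // source page, so every hit maps back to a page in the PDF
}

// Toy vector index partitioned into namespaces (one per file).
class NamespacedIndex {
  private namespaces = new Map<string, PageVector[]>();

  // Inserts vectors; a vector with an existing id in the same namespace is overwritten.
  upsert(namespace: string, vectors: PageVector[]): void {
    const byId = new Map<string, PageVector>();
    for (const v of this.namespaces.get(namespace) ?? []) byId.set(v.id, v);
    for (const v of vectors) byId.set(v.id, v);
    this.namespaces.set(namespace, Array.from(byId.values()));
  }

  // Queries only ever see vectors inside the given namespace.
  query(namespace: string, vector: number[], topK: number): { page: number; score: number }[] {
    const dot = (a: number[], b: number[]): number => a.reduce((sum, x, i) => sum + x * b[i], 0);
    return (this.namespaces.get(namespace) ?? [])
      .map((v) => ({ page: v.page, score: dot(v.values, vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }

  // Deleting a file drops its namespace; no other file's vectors are touched.
  deleteNamespace(namespace: string): void {
    this.namespaces.delete(namespace);
  }
}
```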
Tradeoffs
- Per-document Pinecone namespaces limit cross-document querying but prevent the most common RAG failure mode: context bleed between unrelated documents
- tRPC couples frontend and backend tightly, making it harder to expose a public API later, but the type safety benefits outweigh this for a SaaS product
- Streaming adds complexity to error handling and message persistence, but the UX improvement justifies it
- PayPal over Stripe limits payment method flexibility but provides broader international coverage for the target user base
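The streaming tradeoff shows up when a response has to be forwarded and persisted at the same time. A minimal sketch, assuming the model output arrives as an `AsyncIterable<string>`. `streamAndPersist`, its `send`/`save` callbacks, and the choice to persist a partial answer when the stream fails are all assumptions, not the app's confirmed behavior.

```typescript
// Forwards streamed chunks to the client while buffering them, then persists the answer once.
async function streamAndPersist(
  chunks: AsyncIterable<string>,
  send: (chunk: string) => void,
  save: (text: string) => Promise<void>,
): Promise<"completed" | "interrupted"> {
  let full = "";
  try {
    for await (const chunk of chunks) {
      full += chunk;
      send(chunk); // progressive rendering on the client
    }
  } catch {
    // Policy choice: keep whatever arrived so the history is not silently lost.
    if (full.length > 0) await save(full);
    return "interrupted";
  }
  await save(full); // single write after the stream completes
  return "completed";
}

// Fake model stream for demonstration; optionally fails before yielding part `failAt`.
async function* fakeStream(parts: string[], failAt?: number): AsyncGenerator<string> {
  for (let i = 0; i < parts.length; i++) {
    if (failAt !== undefined && i === failAt) throw new Error("stream dropped");
    yield parts[i];
  }
}
```

Writing to the database once, after the loop, avoids a row update per token; the cost is that a crash mid-stream needs the explicit error path shown in the `catch`.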
Outcome
Production SaaS live at pdfpal.enkambale.com. Users upload a PDF and get streaming, context-grounded AI answers within seconds. Per-document vector isolation prevents the most common RAG failure (cross-document context bleed). Four subscription tiers, from Free to $19.99/mo, are billed via PayPal.
Lessons Learned
- RAG quality is determined by chunking and isolation strategy, not model selection
- Streaming significantly improves perceived UX quality. The difference between an app that feels responsive and one that feels broken is often just progressive rendering
- Vector namespaces per document are non-negotiable for multi-tenant RAG. Cross-document leakage destroys user trust
- Shipping a full SaaS (auth, billing, file storage, AI, UX) reveals system constraints that no amount of prototyping can surface