7. 10 HANDS-ON PROJECTS
Each project builds your skills progressively, from simple RAG to complex hybrid systems.
(These projects are where learning happens. Reading about RAG is easy. Building a system that actually works is hard. Each project will break in ways you didn't expect - PDFs with weird encoding, queries that return garbage, graphs that are too slow to query. That's the point. Fix the breakage and you'll understand the material. Skip the projects and you'll forget everything in a week.)
Project 1: Simple PDF RAG Chatbot
Goal: Build a basic RAG system that answers questions about PDF documents.
Skills Required:
- PDF text extraction
- Chunking
- Embeddings
- Vector search
- Basic prompting
Architecture:
PDF → Extract Text → Chunk → Embed → Store in ChromaDB
User Query → Embed → Retrieve Chunks → LLM → Answer
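The two flows above can be tried end to end before any real services are wired in. The following is a minimal sketch under stated assumptions, not the project's implementation: `embed` is a toy word-count stand-in for a real embedding model, `ToyVectorStore` is a hypothetical in-memory stand-in for ChromaDB, and the sample texts and file names are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for ChromaDB."""

    def __init__(self):
        self.rows = []  # each row: (vector, text, metadata)

    def add(self, text: str, metadata: dict) -> None:
        self.rows.append((embed(text), text, metadata))

    def query(self, question: str, top_k: int = 5):
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:top_k]]


# Indexing flow: chunks (already extracted and split) -> embed -> store
store = ToyVectorStore()
store.add("Transformers use self-attention to relate tokens.",
          {"file_name": "paper_a.pdf", "page_number": 2})
store.add("Gradient descent updates weights using the loss gradient.",
          {"file_name": "paper_b.pdf", "page_number": 5})
store.add("ChromaDB stores embeddings alongside metadata.",
          {"file_name": "paper_c.pdf", "page_number": 1})

# Query flow: embed the query -> retrieve the closest chunks
hits = store.query("How does self-attention work?", top_k=2)
```

In the real project, `embed` would call the embedding model and `ToyVectorStore` would be replaced by a ChromaDB collection; the retrieved `hits` are what gets passed to the LLM.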
Step-by-Step Tasks:
- Extract text from 3-5 PDF documents (use PyPDF2, or its maintained successor pypdf)
- Chunk text into 500-token segments with 50-token overlap
- Generate embeddings using text-embedding-3-small
- Store in ChromaDB with metadata (file_name, page_number)
- Implement query function: embed query → retrieve top 5 chunks → generate answer
- Add citation: include source file and page number
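The chunking step is where overlap arithmetic tends to go wrong, so here is a minimal sketch of it. It assumes whitespace-separated words as an approximation of tokens (a real tokenizer would count differently), and the function names `chunk_tokens` and `chunk_page` are hypothetical helpers, not part of any library.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def chunk_page(text, file_name, page_number, chunk_size=500, overlap=50):
    """Chunk one page of text and attach the metadata needed for citations."""
    tokens = text.split()  # assumption: whitespace words approximate tokens
    return [
        {"text": " ".join(piece), "file_name": file_name, "page_number": page_number}
        for piece in chunk_tokens(tokens, chunk_size, overlap)
    ]


# Usage: a 1200-token page yields windows starting at 0, 450, and 900
demo_chunks = chunk_tokens(list(range(1200)))
demo_records = chunk_page("a b c", "example.pdf", 1, chunk_size=2, overlap=1)
```

Chunking per page, as `chunk_page` does, keeps the page number exact for each chunk, at the cost of never letting a chunk span a page boundary.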
Evaluation Criteria:
- ✅ Correctly extracts text from PDFs
- ✅ Answers questions with relevant context
- ✅ Includes citations (file + page)
- ✅ Responds "I don't know" when the answer is not in the documents
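The citation and "I don't know" criteria are mostly a matter of how the prompt is assembled. One possible approach is sketched below; the prompt wording and the `build_prompt` helper are assumptions for illustration, and the resulting string would be sent to whichever LLM you use.

```python
def build_prompt(question, hits):
    """Assemble a grounded prompt from (text, metadata) pairs returned by retrieval."""
    context_lines = [
        f"[{i}] ({meta['file_name']}, page {meta['page_number']}) {text}"
        for i, (text, meta) in enumerate(hits, start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below.\n"
        "Cite each claim as (file_name, page N).\n"
        'If the context does not contain the answer, reply exactly: "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Usage with one retrieved chunk (sample data, made up for illustration)
sample_prompt = build_prompt(
    "What is X?",
    [("X is Y.", {"file_name": "a.pdf", "page_number": 3})],
)
empty_prompt = build_prompt("What is X?", [])
```

To test the refusal behavior, ask a question whose answer is in none of your PDFs and check that the model returns the refusal string rather than an invented answer.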
Dataset: Use 3-5 research papers from arXiv or your domain