7. 10 HANDS-ON PROJECTS

Each project builds your skills progressively, from simple RAG to complex hybrid systems.

(These projects are where learning happens. Reading about RAG is easy. Building a system that actually works is hard. Each project will break in ways you didn't expect - PDFs with weird encoding, queries that return garbage, graphs that are too slow to query. That's the point. Fix the breakage and you'll understand the material. Skip the projects and you'll forget everything in a week.)

Project 1: Simple PDF RAG Chatbot

Goal: Build a basic RAG system that answers questions about PDF documents.

Skills Required:

PDF text extraction
Chunking
Embeddings
Vector search
Basic prompting

Architecture:

PDF → Extract Text → Chunk → Embed → Store in ChromaDB
User Query → Embed → Retrieve Chunks → LLM → Answer
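
Before wiring in real services, it can help to see both flows running end to end with stand-ins. The sketch below is only an illustration: `toy_embed` (bag-of-words counts) and `InMemoryStore` are hypothetical placeholders for the embedding model and for ChromaDB, not their actual APIs.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: bag-of-words counts. NOT a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for the vector store (ChromaDB in this project)."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []  # each row: (vector, chunk text, metadata)

    def add(self, text, metadata):
        # Ingestion flow: chunk text -> embed -> store
        self.rows.append((self.embed(text), text, metadata))

    def query(self, question, k=5):
        # Query flow: embed the question -> rank stored chunks -> return top k
        q = self.embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]
```

Once this shape works, replace `toy_embed` with a real embedding call and `InMemoryStore` with a ChromaDB collection; the add/query structure stays the same.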

Step-by-Step Tasks:

Extract text from 3-5 PDF documents (use PyPDF2, or pypdf, its maintained successor)
Chunk text into 500-token segments with 50-token overlap
Generate embeddings using text-embedding-3-small
Store in ChromaDB with metadata (file_name, page_number)
Implement query function: embed query → retrieve top 5 chunks → generate answer
Add citation: include source file and page number
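
The chunking step is where off-by-one bugs tend to hide, so here is a minimal sketch of 500-token chunks with a 50-token overlap. It assumes whitespace-separated words as a rough proxy for tokens (a real tokenizer will count differently), and the function names and record layout are illustrative, not part of any library.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def chunk_pages(pages, file_name, size=500, overlap=50):
    """Chunk each page separately so every chunk keeps an exact page number.

    `pages` is a list of page texts; index 0 is page 1.
    """
    records = []
    for page_number, text in enumerate(pages, start=1):
        tokens = text.split()  # assumption: whitespace words approximate tokens
        for i, chunk in enumerate(chunk_tokens(tokens, size, overlap)):
            records.append({
                "id": f"{file_name}-p{page_number}-c{i}",
                "text": " ".join(chunk),
                "metadata": {"file_name": file_name, "page_number": page_number},
            })
    return records
```

Chunking page by page keeps citations simple, at the cost of never letting a chunk span a page break. If your documents have sentences that run across pages, you may prefer to chunk the whole document and track page offsets instead.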

Evaluation Criteria:

✅ Correctly extracts text from PDFs
✅ Answers questions with relevant context
✅ Includes citations (file + page)
✅ Responds with "I don't know" when the answer is not in the documents
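
The last two criteria mostly come down to how you build the prompt. The sketch below shows one possible approach: label each retrieved chunk with its file and page, tell the model to refuse when the sources fall short, and skip the LLM call entirely when retrieval returns nothing. The prompt wording, the `REFUSAL` string, and the `llm` callable (any function that takes a prompt string and returns text) are assumptions for illustration.

```python
REFUSAL = "I don't know. The provided documents do not cover this."


def format_context(hits):
    """Render retrieved (text, metadata) pairs as numbered, citable sources."""
    blocks = []
    for i, (text, meta) in enumerate(hits, start=1):
        blocks.append(f"[{i}] {meta['file_name']}, p. {meta['page_number']}\n{text}")
    return "\n\n".join(blocks)


def build_prompt(question, hits):
    """Build a grounded prompt that asks for citations and allows refusal."""
    return (
        "Answer the question using only the numbered sources below.\n"
        "Cite each claim as (file_name, p. page_number).\n"
        f'If the sources do not contain the answer, reply exactly: "{REFUSAL}"\n\n'
        f"Sources:\n{format_context(hits)}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, hits, llm):
    """Return the refusal directly if nothing was retrieved; otherwise ask the LLM."""
    if not hits:
        return REFUSAL
    return llm(build_prompt(question, hits))
```

To test the refusal path, ask a question your PDFs clearly do not cover and check that the reply matches `REFUSAL` instead of a confident guess.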

Dataset: Use 3-5 research papers from arXiv or documents from your own domain
