

Chapter 8 - Practical Engineering Skills

7. 10 HANDS-ON PROJECTS

Each project builds your skills progressively, from simple RAG to complex hybrid systems.

(These projects are where learning happens. Reading about RAG is easy. Building a system that actually works is hard. Each project will break in ways you didn't expect - PDFs with weird encoding, queries that return garbage, graphs that are too slow to query. That's the point. Fix the breakage and you'll understand the material. Skip the projects and you'll forget everything in a week.)

Project 1: Simple PDF RAG Chatbot

Goal: Build a basic RAG system that answers questions about PDF documents.

Skills Required:

  • PDF text extraction
  • Chunking
  • Embeddings
  • Vector search
  • Basic prompting

Architecture:

PDF → Extract Text → Chunk → Embed → Store in ChromaDB
User Query → Embed → Retrieve Chunks → LLM → Answer
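The two flows above can be sketched end to end in plain Python. This is a minimal illustration under loud assumptions, not the project solution: `embed` is a toy bag-of-words stand-in for a real embedding model, and `InMemoryStore` is a hypothetical class standing in for ChromaDB. Only the shape of the pipeline (embed on write, embed on read, rank by similarity) carries over to the real build.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for the vector store (ChromaDB in the project)."""

    def __init__(self):
        self.rows = []  # each row: (vector, chunk_text, metadata)

    def add(self, text: str, metadata: dict) -> None:
        # Indexing flow: chunk text -> embed -> store with metadata.
        self.rows.append((embed(text), text, metadata))

    def query(self, question: str, k: int = 5) -> list:
        # Query flow: embed the question -> score every chunk -> keep the top k.
        q = embed(question)
        scored = [(cosine(q, vec), text, meta) for vec, text, meta in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


store = InMemoryStore()
store.add("Chunk overlap keeps sentences that straddle a boundary retrievable.",
          {"file_name": "a.pdf", "page_number": 2})
store.add("ChromaDB stores vectors alongside metadata.",
          {"file_name": "b.pdf", "page_number": 1})
store.add("Bananas are yellow.",
          {"file_name": "c.pdf", "page_number": 7})

hits = store.query("Where are vectors and metadata stored?", k=2)
for score, text, meta in hits:
    print(f"{score:.2f}  {meta['file_name']} p.{meta['page_number']}  {text}")
```

The retrieved chunks, together with their metadata, are what gets passed to the LLM in the final step of the query flow.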

Step-by-Step Tasks:

  1. Extract text from 3-5 PDF documents (use PyPDF2 or its maintained successor, pypdf)
  2. Chunk text into 500-token segments with 50-token overlap
  3. Generate embeddings using OpenAI's text-embedding-3-small model
  4. Store in ChromaDB with metadata (file_name, page_number)
  5. Implement query function: embed query → retrieve top 5 chunks → generate answer
  6. Add citation: include source file and page number
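Steps 2, 4, and 6 can be sketched as a sliding window that carries page metadata through to a citation string. The helper names (`chunk_tokens`, `chunk_page`, `format_citation`) are hypothetical, and whitespace splitting is an assumption standing in for the embedding model's real tokenizer, so token counts here are only approximate.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def chunk_page(text, file_name, page_number, size=500, overlap=50):
    """Chunk one page of extracted text, keeping the metadata needed for citations."""
    tokens = text.split()  # assumption: whitespace tokens, not model tokens
    return [
        {"text": " ".join(chunk), "file_name": file_name, "page_number": page_number}
        for chunk in chunk_tokens(tokens, size, overlap)
    ]


def format_citation(chunk):
    """Render the source file and page number for a retrieved chunk."""
    return f"[{chunk['file_name']}, p. {chunk['page_number']}]"


tokens = [f"t{i}" for i in range(1000)]
print([len(c) for c in chunk_tokens(tokens)])  # [500, 500, 100]

page_chunks = chunk_page("word " * 600, "paper.pdf", 4)
print(format_citation(page_chunks[0]))  # [paper.pdf, p. 4]
```

Chunking per page, as here, keeps page numbers exact at the cost of never letting a chunk span a page break; chunking the whole document and recording a page range is the alternative trade-off.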

Evaluation Criteria:

  • ✅ Correctly extracts text from PDFs
  • ✅ Answers questions with relevant context
  • ✅ Includes citations (file + page)
  • ✅ Responds "I don't know" when the answer is not in the documents
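The last criterion is usually the one that fails first. One possible approach, sketched below, combines two guards: skip the LLM entirely when no retrieved chunk is similar enough, and otherwise instruct the model to refuse when the context lacks the answer. Everything here is an assumption for illustration: the function names, the `generate` callable (whatever wraps your LLM call), and the `min_score` threshold of 0.3, which would need tuning for your embedding model and similarity metric.

```python
IDK = "I don't know."


def build_prompt(question, chunks):
    """Assemble a grounded prompt; each chunk is prefixed with its citation."""
    context = "\n\n".join(
        f"[{c['file_name']}, p. {c['page_number']}]\n{c['text']}" for c in chunks
    )
    return (
        "Answer the question using only the context below and cite the "
        "source file and page for each claim. If the context does not "
        f"contain the answer, reply exactly: {IDK}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question, scored_chunks, generate, min_score=0.3):
    """Return an answer, or IDK when retrieval found nothing relevant.

    scored_chunks: list of (similarity, chunk_dict) pairs from retrieval.
    generate: callable taking a prompt string and returning the LLM's text.
    min_score: illustrative threshold, not a recommended value.
    """
    relevant = [chunk for score, chunk in scored_chunks if score >= min_score]
    if not relevant:
        return IDK  # guard 1: nothing relevant retrieved, so skip the LLM call
    return generate(build_prompt(question, relevant))  # guard 2 lives in the prompt
```

To check the criterion, ask a few questions whose answers are deliberately absent from the PDFs and confirm the system returns the refusal rather than a plausible guess.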

Dataset: Use 3-5 research papers from arXiv or your domain

