Chapter 6 - Modern AI Systems: RAG, Agents, and Glue Code

The Crux

Models alone are useless. Real AI systems are models + data pipelines + retrieval + guardrails + monitoring + glue code. This chapter is about engineering AI into production, not just training models.

Why Models Alone Are Useless

You've trained a great model. Congratulations. Now what?

Reality:

The model needs to integrate with existing systems (databases, APIs, user interfaces)
Users don't send perfectly formatted inputs
The model drifts as the world changes
You need to monitor failures, log predictions, and retrain periodically
You need to handle errors gracefully (what if the API is down?)

As a rough rule of thumb, the model is about 10% of the system. The other 90% is infrastructure.
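To make one item from the list concrete, here is a minimal sketch of handling errors gracefully. The function name `call_with_fallback`, its parameters, and the fallback message are hypothetical, and `predict` stands in for whatever model or API call your system makes.

```python
import time
from typing import Callable


def call_with_fallback(
    predict: Callable[[str], str],
    text: str,
    fallback: str = "Sorry, I can't answer right now.",
    retries: int = 2,
    delay_s: float = 0.0,
) -> str:
    """Call the model a few times; return a safe default if it keeps failing."""
    cleaned = text.strip()
    if not cleaned:
        # Users don't send perfectly formatted inputs: reject empty ones early.
        return fallback
    for attempt in range(retries + 1):
        try:
            return predict(cleaned)
        except Exception:
            # A real system would log the failure here for monitoring.
            if attempt < retries:
                time.sleep(delay_s)
    return fallback
```

The point is not this particular retry policy but that the wrapper, not the model, decides what the user sees when something breaks.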

RAG: Retrieval-Augmented Generation

LLMs hallucinate in part because they answer from what they memorized during training, which can be incomplete or out of date. What if we give them access to external knowledge?

The Idea

Instead of asking the LLM to answer directly, run three steps:

Retrieve relevant documents from a database
Augment the prompt with retrieved information
Generate the answer based on retrieved context

Example:

User: "What's the return policy?"
System retrieves: Company policy doc mentioning "30-day returns"
Prompt: "Based on this policy: [retrieved text], answer: What's the return policy?"
LLM: "We offer 30-day returns."
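The retrieve, augment, generate flow in this example can be sketched as follows. The names `build_prompt` and `answer_with_rag` are hypothetical, and `retrieve` and `generate` are placeholders for your own search function and LLM client, passed in as plain callables so the sketch does not assume any particular library.

```python
from typing import Callable


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Augment: put the retrieved text in front of the user's question."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Based on this policy:\n{context}\nanswer: {question}"


def answer_with_rag(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    docs = retrieve(question, k)            # 1. Retrieve relevant documents
    prompt = build_prompt(question, docs)   # 2. Augment the prompt
    return generate(prompt)                 # 3. Generate from retrieved context
```

Keeping the three steps as separate functions makes it easy to log the exact prompt that was sent, which helps when an answer turns out to be wrong.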

Why It Works

The LLM doesn't need to memorize every fact. It just needs to read context and extract answers, which is something LLMs are good at.

Architecture

Document store: Database of knowledge (vector database, Elasticsearch, etc.)
Embedding model: Convert queries and documents to vectors
Retrieval: Find the top-k documents most similar to the query (commonly by cosine similarity between embedding vectors)
LLM: Generate answer given query + retrieved docs
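The embedding and retrieval components above can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions: `embed` uses word counts as a stand-in for a real embedding model, the list `DOCS` stands in for the document store, and all names here are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical document store contents.
DOCS = [
    "Our return policy allows returns within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Gift cards never expire.",
]

best = top_k("What's the return policy?", DOCS, k=1)
```

In a production system the word counts would be replaced by dense vectors from an embedding model, and the linear scan by a vector database or search index, but the ranking logic is the same idea.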

When to Use RAG vs Fine-Tuning

RAG:

Knowledge changes frequently (e.g., product docs updated weekly)
You need to cite sources
You have limited GPU resources

Fine-tuning:

Knowledge is stable
You want the model to internalize a style or domain-specific reasoning
You have labeled data and compute

Often, you use both: fine-tune for style/domain, RAG for up-to-date facts.
