
How to Build an AI Agent: Start Here

By Addy · April 30, 2026

Everyone is building agents. Most of them are building chatbots with extra steps.

The difference matters. A chatbot answers a question. An agent completes a job. You ask ChatGPT to draft an email - that is a chatbot. An agent reads your inbox, identifies which leads need follow-up, drafts personalized replies based on your past communication style, logs everything in your CRM, and flags anything that needs your actual attention. Same underlying technology. Completely different category of output.

The global agentic AI market crossed $9 billion in 2026. Enterprise deployments are returning an average 171% ROI according to Deloitte's 2026 State of AI report. 79% of US executives are already adopting agents in some form. The infrastructure has matured. The frameworks exist. The models are capable enough. What most people lack is a clear map of where to start and what each path actually involves.

This guide is that map.

What Every Agent Is Made Of

Before choosing a tool or framework, understand what you are actually building. Every AI agent - regardless of how it is built - has four components. These are not optional. They are the definition.

The first is the brain: a large language model that reasons, plans, and decides what to do next. Claude Sonnet 4.6, GPT-5.4 Mini, Gemini 3.1 Flash, DeepSeek V4-Flash - these are the models doing the thinking. The choice of model determines how well the agent handles ambiguity, how reliably it calls tools, and how much it costs per task.

The second is memory. Without memory, an agent has no context beyond the current conversation. Short-term memory is the context window - everything the agent can see right now. Long-term memory is an external store - a database the agent can read from and write to, so it remembers that a user prefers formal emails, or that a codebase uses a specific naming convention, or that a customer issue was escalated last Tuesday.
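
The long-term store does not need to be exotic. A sketch of the idea with SQLite - the schema and the keys are illustrative, not a prescribed design:

import sqlite3

db = sqlite3.connect("agent_memory.db")
db.execute("CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)")

def remember(key: str, value: str) -> None:
    db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", (key, value))
    db.commit()

def recall(key: str) -> str | None:
    row = db.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

remember("email_style", "prefers formal emails")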

The third is tools. Tools are how an agent does things rather than just saying things. A tool can be a web search function, a database query, a code executor, a calendar API, a payment gateway, or a custom function you write yourself. Without tools, an agent is a very smart text generator. With tools, it is an actor in the world.

The fourth is a runtime: the loop that keeps the agent running until the task is complete. The agent receives a task, decides what to do, calls a tool, observes the result, decides what to do next, and continues until an exit condition is met. That loop is the agent. Everything else is configuration.

Think of it like hiring an employee. The model is their intelligence and judgment. Memory is their ability to remember your preferences and history. Tools are the software they have access to. The runtime is the working hours you give them to complete the task.

Method 1: Direct API Calls - Anthropic and OpenAI

The fastest way to understand agents is to build one with nothing but an API key and a few lines of code. No framework, no abstraction layer. Just the raw loop.

Anthropic's API is the cleanest place to start if you want to understand what is actually happening. The model supports native tool calling - you define a function in JSON, pass it to the API alongside your message, and Claude decides whether to call it based on the task. If it calls the tool, you execute the function in your own code, return the result to Claude, and the loop continues.

The simplest possible agent looks like this in Python:

import anthropic

client = anthropic.Anthropic()

def get_weather(city: str) -> str:
    # Placeholder -- swap in a real weather API call
    return f"Sunny, 22°C in {city}"

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name"}
            },
            "required": ["city"]
        }
    }
]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

    if response.stop_reason == "end_turn":
        print(response.content[0].text)
        break

    # Claude wants to call a tool: record its turn once, then return every result
    messages.append({"role": "assistant", "content": response.content})

    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            # Your tool execution logic here
            result = get_weather(block.input["city"])
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result
            })

    messages.append({"role": "user", "content": tool_results})

That while loop is the agent runtime. The model runs until it decides it is done. The tool call and result are added to the message history so the model sees what happened. This pattern - observe, decide, act, repeat - is the foundation of every agent regardless of how many abstraction layers are built on top of it.

OpenAI's Agents SDK takes this further with a higher-level interface. Instead of managing the loop yourself, you define agents, tools, and handoffs, and the SDK handles the execution. The Agent class wraps a model with a set of instructions and tools. Handoffs let one agent delegate to another - a triage agent that receives all requests and routes to a specialist, for example.

from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Placeholder -- swap in a real weather API call
    return f"Sunny, 22°C in {city}"

weather_agent = Agent(
    name="Weather Agent",
    instructions="You help users get weather information. Use the weather tool.",
    tools=[get_weather]
)

result = Runner.run_sync(weather_agent, "What's the weather in Tokyo?")
print(result.final_output)
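
A handoff is a one-line addition on top of this. A sketch of the triage pattern described above, assuming a billing_agent defined the same way as weather_agent:

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route weather questions to the Weather Agent and billing questions to the Billing Agent.",
    handoffs=[weather_agent, billing_agent]  # billing_agent is hypothetical
)

result = Runner.run_sync(triage_agent, "What's the weather in Tokyo?")
print(result.final_output)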

The direct API approach is the right starting point for any developer. It forces you to understand what the agent is actually doing at each step. When something breaks in production - and it will - you will know exactly where in the loop to look.

When to use this: You are a developer. You want full control and visibility. You are building something custom that no framework template will cover.

Cost reality: You pay per token consumed across the entire loop. A complex multi-step task might use 10,000 to 50,000 tokens. At Claude Sonnet pricing of $0.30 per million input tokens and $1.50 per million output tokens, this is manageable. At Opus pricing, budget carefully.
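
A back-of-envelope check makes that concrete. Assuming a task near the top of that range, split 40,000 input tokens to 10,000 output tokens:

input_tokens, output_tokens = 40_000, 10_000
cost = input_tokens / 1e6 * 0.30 + output_tokens / 1e6 * 1.50
print(f"${cost:.3f} per task")  # ~$0.027 at these assumed rates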

Method 2: LangChain and LangGraph - The Framework Approach

LangChain is the most widely used agent framework in production. LangGraph, its graph-based extension, handles multi-agent workflows and stateful execution. If you search for AI agent tutorials, most of them use one of these two.

LangChain abstracts the tool-calling loop into a ReAct agent - Reason and Act. The model reasons about what to do, acts by calling a tool, observes the result, and reasons again. You define tools with a decorator and the framework handles the rest.

from langchain import hub
from langchain_anthropic import ChatAnthropic
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information about a topic."""
    # Your search implementation here
    return f"Results for: {query}"

llm = ChatAnthropic(model="claude-sonnet-4-6")
prompt = hub.pull("hwchase17/react")  # the standard ReAct prompt template
agent = create_react_agent(llm, [search_web], prompt)
executor = AgentExecutor(agent=agent, tools=[search_web])

result = executor.invoke({"input": "What happened in AI this week?"})
print(result["output"])

LangGraph extends this to stateful multi-agent systems. Instead of a linear chain, you build a graph where nodes are agents or functions, and edges define the flow between them. This is the right architecture for complex workflows - a research agent that gathers information, a writing agent that drafts content, a review agent that checks quality, and a coordinator that manages the whole pipeline.

from langgraph.graph import StateGraph, START, END

def should_revise(state: AgentState) -> str:
    # Route on a flag the writer node sets in shared state (field name illustrative)
    return "revise" if state["needs_revision"] else "approve"

workflow = StateGraph(AgentState)  # AgentState: a TypedDict defining the shared state
workflow.add_node("researcher", research_agent)
workflow.add_node("writer", writing_agent)
workflow.add_node("reviewer", review_agent)

workflow.add_edge(START, "researcher")
workflow.add_edge("researcher", "writer")
workflow.add_conditional_edges("writer", should_revise, {
    "revise": "writer",
    "approve": "reviewer"
})
workflow.add_edge("reviewer", END)

app = workflow.compile()

LangChain's strength is its library of pre-built integrations - databases, search engines, APIs, vector stores - and the size of the community around it. Its weakness is the abstraction layer. When something goes wrong, the error is often one level removed from where you need to look. Anthropic's own engineering team recommends starting with direct API calls before reaching for a framework, specifically because frameworks hide the underlying behavior that breaks in production.

When to use this: You want to move fast and your use case fits one of LangChain's existing patterns. You are building a RAG pipeline, a research agent, or a customer support bot. You have time to learn the framework's conventions.

Not ideal for: Highly custom workflows, performance-sensitive production systems, or anything where you need to fully understand every step.

Method 3: CrewAI - Multi-Agent Without the Graph

CrewAI takes a different approach from LangGraph. Instead of building a graph of agents, you define a crew - a team of agents with roles, goals, and a set of tasks to complete. The framework handles the coordination.

The mental model is deliberately human. You define a researcher, a writer, and an editor the same way you would define job descriptions for a small team. Each agent has a role, a goal, and a backstory that shapes how the model interprets its instructions. Tasks are assigned to agents. The crew runs until all tasks are complete.

from crewai import Agent, Task, Crew

researcher = Agent(
    role="AI Research Analyst",
    goal="Find accurate, up-to-date information on AI developments",
    backstory="You are a meticulous researcher who verifies every claim.",
    tools=[web_search_tool],
    llm="claude-sonnet-4-6"
)

writer = Agent(
    role="Technical Writer",
    goal="Turn research into clear, engaging articles",
    backstory="You write for developers who value precision over hype.",
    llm="claude-sonnet-4-6"
)

research_task = Task(
    description="Research the latest developments in AI agents this week",
    agent=researcher,
    expected_output="A structured summary of key developments with sources"
)

writing_task = Task(
    description="Write a 500-word article based on the research",
    agent=writer,
    expected_output="A complete article ready to publish"
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
result = crew.kickoff()

CrewAI has become particularly popular for content pipelines, research workflows, and business process automation because the role-based framing maps naturally to how teams already think about their work. The SAP AI Core developer challenge uses CrewAI specifically for this reason - the framework is approachable without being shallow.

When to use this: You are automating a workflow that maps to human roles and handoffs. Content creation, research pipelines, customer service escalation flows. You want multi-agent coordination without building the coordination logic yourself.

Not ideal for: Highly dynamic workflows where the task structure is not known in advance. CrewAI works best when the shape of the work is predictable.

Method 4: Google Agent Development Kit (ADK)

Google's Agent Development Kit is the framework purpose-built for agents that run on Google Cloud infrastructure and integrate with Google's model and tool ecosystem. It shipped in 2025 and has matured into the recommended path for teams building on Vertex AI.

ADK's distinguishing feature is its native support for the Model Context Protocol (MCP) and Agent-to-Agent (A2A) communication standards. Where other frameworks require custom wiring for agent-to-agent handoffs and external tool connections, ADK standardizes these through Agent Cards - structured definitions that allow agents to discover each other's capabilities dynamically.

from google.adk.agents import Agent
from google.adk.tools import google_search, code_execution

research_agent = Agent(
    name="research_agent",
    model="gemini-3-flash",
    description="Researches topics and summarizes findings",
    instruction="""You are a research agent. When given a topic,
    search for recent information and produce a structured summary.""",
    tools=[google_search]
)

coding_agent = Agent(
    name="coding_agent",
    model="gemini-3-pro",
    description="Writes and executes code",
    instruction="You write clean Python code and verify it executes correctly.",
    tools=[code_execution],
    sub_agents=[research_agent]  # Can delegate research tasks
)
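
The Agent Card behind that delegation is just structured metadata that one agent publishes and another reads. A minimal sketch of the idea - the field names below are illustrative, not the full A2A schema:

agent_card = {
    "name": "research_agent",
    "description": "Researches topics and summarizes findings",
    "url": "https://agents.example.com/research",  # hypothetical endpoint
    "capabilities": {"streaming": True},
    "skills": [{"id": "summarize", "description": "Summarize recent sources"}],
}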

ADK's integration with Google Cloud means built-in access to Vertex AI model deployment, Google Search grounding, Cloud Storage for long-term memory, and BigQuery for structured data retrieval. For teams already in the Google ecosystem, this eliminates significant infrastructure work.

The multimodal Live API integration is ADK's strongest differentiator. Google's Agent Bake-Off found that agents handling vision tasks - processing images alongside text, generating visual outputs - performed dramatically better when vision was treated as a native input rather than bolted on. ADK makes this native rather than optional.

When to use this: You are building on Google Cloud. You need tight integration with Google Workspace, Google Search, or Vertex AI models. Your agent needs native multimodal capability.

Not ideal for: Teams not in the Google ecosystem. ADK's advantages are ecosystem-specific - outside that context, LangChain or direct API access will often be simpler.

Method 5: n8n - Agents Without Code

n8n is a workflow automation platform that has become one of the most capable no-code agent builders in production use. If you have used Zapier or Make, n8n is that category with one significant difference: it is open-source, self-hostable, and deep enough to handle complex multi-agent workflows that would break most no-code tools.

An n8n workflow is a visual graph. You drag nodes onto a canvas, connect them, and configure each node to perform an action - call an API, query a database, send an email, execute an AI model call. The AI Agent node wraps a language model with memory and tool calling without requiring you to write the loop yourself.

The platform supports over 600 pre-built integrations. An agent that monitors your inbox, qualifies leads against a scoring rubric, updates your CRM, and sends a Slack message when a high-value lead arrives can be built in n8n in under two hours with no code. Developers who have built the same pipeline in Python report that the equivalent took around three days.

n8n also supports MCP servers as a tool source, which means agents built in n8n can access any MCP-compatible tool - including the growing ecosystem of developer tools, database connectors, and API wrappers being built around the protocol.

The self-hosting option is significant for privacy-sensitive use cases. Your agent workflows and the data they process can run entirely on your own infrastructure, with no data leaving your environment.

When to use this: You are not a developer, or you want to move faster than code allows. Business operations, content pipelines, CRM automation, internal tools. You need something a non-technical team member can maintain.

Not ideal for: Highly custom logic that does not fit visual workflow patterns. Performance-critical systems where n8n's overhead matters.

Method 6: Local Agents via Ollama and LangChain

Every method above sends your data to an external API. If that is a problem - because of cost at scale, because of privacy requirements, because of latency constraints, or because you want to run without internet - local agents are the answer.

Ollama is the simplest path to running a capable language model on your own machine. It handles model downloads, quantization, and serving through a local API that mimics OpenAI's format. Once Ollama is running, you point LangChain at localhost instead of an external endpoint.

# First, in a shell: install Ollama, then run `ollama pull qwen3.6:27b`
# Ollama serves on localhost:11434 by default

from langchain import hub
from langchain_community.llms import Ollama
from langchain.agents import create_react_agent, AgentExecutor

llm = Ollama(model="qwen3.6:27b")

prompt = hub.pull("hwchase17/react")    # the standard ReAct prompt template
tools = [search_tool, calculator_tool]  # tools defined as in Method 2

agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "Research and summarize today's AI news"})
print(result["output"])

The models worth using locally in 2026: Qwen3.6-27B for reasoning and coding tasks (55GB, requires 64GB RAM for comfortable inference), Qwen3.6-35B-A3B for efficiency-first workloads (3B active parameters out of 35B total, runs faster), DeepSeek V4-Flash via vLLM for tool-heavy agents once the GGUF quantization lands, and Gemma 4 E2B for resource-constrained environments (under 1.5GB memory).

The honest tradeoff is response time. A cloud API returns a response in one to three seconds. A local model on consumer hardware takes five to thirty seconds depending on the model size and your hardware. For batch processing, research pipelines, or any workflow where latency is not the constraint, local agents are economically superior at scale. For user-facing real-time applications, the latency is usually unacceptable.

When to use this: Privacy is a hard requirement. Cost at scale is the constraint. You are building something that needs to run offline. Your data cannot leave your infrastructure.

Not ideal for: Real-time user-facing applications. Teams without the hardware to run the models.

Method 7: MCP - The Standard That Connects Everything

Model Context Protocol is not a framework or a platform. It is a standard - an open protocol developed by Anthropic that defines how models connect to external tools and data sources. Understanding MCP matters because it is becoming the common language that different agent systems use to share tools.

Before MCP, every tool integration required custom code. A LangChain tool was not a CrewAI tool. An n8n integration was not usable in a direct API call. MCP changes this. An MCP server exposes tools in a standardized format. Any client that speaks MCP - Claude Code, n8n, LangChain, ADK, or your own custom agent - can use those tools without custom integration code.

# Using MCP tools in Claude's API directly, via the MCP connector beta
from anthropic import Anthropic

client = Anthropic()
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    mcp_servers=[{
        "type": "url",
        "url": "https://your-mcp-server.com",
        "name": "news-tools"
    }],
    betas=["mcp-client-2025-04-04"],
    messages=[{"role": "user", "content": "Search for AI news from today"}]
)
print(response.content[0].text)

The MCP ecosystem is growing rapidly. Servers exist for GitHub, Slack, Google Drive, Notion, Linear, Postgres, and dozens of other tools. Building your own MCP server exposes your internal tools to any MCP-compatible agent, which means a tool you build once works everywhere.
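
Building that server is a few lines with the official Python SDK. A minimal sketch using FastMCP - the server name and the tool are illustrative:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-tools")

@mcp.tool()
def lookup_customer(customer_id: str) -> str:
    """Look up a customer record by ID."""
    # Placeholder -- query your internal systems here
    return f"Customer {customer_id}: active, premium tier"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default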

For anyone building agents that need to integrate with multiple external services, MCP is the most important architectural decision to make early. Retrofitting it later is expensive.

Choosing the Right Method

The question is not which method is best. The question is which is right for your current situation.

If you are a developer trying to understand how agents work: start with the direct API. Build the loop yourself. Break it intentionally. Understand every part before you add abstraction.

If you are a developer trying to ship something fast: LangChain for single-agent workflows, LangGraph for multi-agent pipelines, CrewAI if your workflow maps to human roles.

If you are on Google Cloud and need native multimodal support: ADK.

If you are non-technical or need something a non-technical team can maintain: n8n.

If privacy or cost at scale is the constraint: local agents via Ollama.

If you are building something that needs to integrate with multiple tools and potentially work across multiple frameworks: design around MCP from the start.

Most production systems end up combining these. A direct API call for the core reasoning loop. LangGraph for orchestration. MCP for tool connections. n8n for the business workflow layer. Ollama for the batch processing pipeline where latency does not matter and cost does. The methods are not mutually exclusive. They are layers that compose.

The Mistakes That Break Agents in Production

The framework you choose matters less than the decisions you make inside it. Three failure modes account for most agent breakdowns in production.

The first is tool design. Anthropic's own engineering team, while building their SWE-bench agent, reported spending more time optimizing tool definitions than optimizing prompts. A tool with ambiguous parameters, missing documentation, or relative file path inputs instead of absolute ones will cause the model to make mistakes that look like model failures but are actually tool failures. Every tool needs a clear name, a precise description, and explicit constraints.
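
Concretely, compare a vague definition against a constrained one. A sketch of the difference, using the same JSON schema format as the Method 1 example:

# Vague: the model must guess what operations exist and what format "path" takes
{"name": "file_op", "description": "Does file operations",
 "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}}}

# Precise: one action, explicit constraints, an example in the description
{"name": "read_file",
 "description": "Read a UTF-8 text file. Always pass an absolute path, "
                "e.g. /home/user/project/main.py. Fails on directories.",
 "input_schema": {"type": "object",
                  "properties": {"path": {"type": "string",
                                          "description": "Absolute file path, never relative"}},
                  "required": ["path"]}}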

The second is context management. Agents accumulate context across long tasks. At some point that context exceeds the model's reliable attention window, and performance degrades in ways that are hard to diagnose. Production agents need explicit context management - summarizing earlier steps, storing completed work in external memory, and keeping the active context focused on what the model needs right now.
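
The mechanics vary by framework, but the core move is the same: compress older turns before the transcript grows past the model's reliable range. A hand-rolled sketch against the Method 1 loop - the threshold and summary prompt are illustrative:

def compact_context(messages, client, keep_recent=10):
    """Collapse older turns into one summary message once history grows long."""
    if len(messages) <= 40:  # threshold is illustrative; tune per model
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(str(m["content"]) for m in old)
    summary = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content":
                   f"Summarize this agent transcript, keeping decisions and results:\n{transcript}"}]
    ).content[0].text
    # Note: you may need to adjust the split so roles still alternate correctly
    return [{"role": "user", "content": f"Summary of earlier steps: {summary}"}] + recent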

The third is cost modeling. LLM calls are cheap at ten requests a day and potentially disqualifying at one hundred thousand. Before committing to an architecture, model the token consumption of a full agent run at your expected production volume. The agent that costs $0.02 per task in development may cost $200 per thousand tasks in production, because real inputs run longer loops, trigger retries, and consume far more tokens than your test cases. That math should be done before you build, not after.
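
A rough volume model is a few lines. Every number below is an assumption to replace with traces from your own runs:

tasks_per_day = 5_000
tokens_per_task = 30_000          # input + output combined, measured from tracing
cost_per_m_tokens = 0.50          # blended rate across your model mix
monthly = tasks_per_day * 30 * tokens_per_task / 1e6 * cost_per_m_tokens
print(f"${monthly:,.0f}/month")   # $2,250/month at these assumptions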

Go Deeper

Related guide: RAG Works in Theory. Here's Why It Fails in Production. - the retrieval layer that most agents need and most developers implement wrong
