⚙️Module 6 of 12

Building Agents with Frameworks

⏱ 8–10 hours

📘 Intermediate

🔧 Python, CrewAI, LangGraph

What you'll learn

→Implement function calling (tool use) with the Anthropic API directly
→Build a single-agent tool-use loop
→Implement a reflection agent that self-critiques
→Create a 2-agent CrewAI pipeline

Function Calling — The Foundation of Tool Use

Every agent framework sits on top of the same primitive: function calling (also called tool use). Before using any framework, you should understand exactly what's happening at the API level. This knowledge will serve you well when frameworks fail or when you need to debug unexpected behavior.

Here is how function calling works with the Anthropic API:

1. You define tools as JSON schemas: name, description, and input parameter types
2. You pass these tool definitions with your API call
3. The model decides whether to respond normally or call a tool
4. If calling a tool, stop_reason is "tool_use" and the response contains the tool name and arguments
5. Your code executes the actual function
6. You add the result as a tool_result message and call the API again
7. The model now generates its final response, informed by the tool result
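
To make the middle of that exchange concrete, here is a minimal sketch of the message shapes involved, written as plain Python dictionaries with no network call. The helper name build_tool_result and the ID toolu_example_123 are illustrative assumptions; the block fields (type, id, name, input, tool_use_id, content) follow the Messages API format used in the full example later in this section.

```python
import json

# What the model returns when it wants a tool (stop_reason == "tool_use").
# The id value here is made up for illustration.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {
            "type": "tool_use",
            "id": "toolu_example_123",
            "name": "get_weather",
            "input": {"city": "Tokyo"},
        },
    ],
}


def build_tool_result(tool_use_block: dict, result: dict) -> dict:
    """Wrap a tool's return value in the user message that answers a tool_use block."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                # Must echo the id of the tool_use block being answered
                "tool_use_id": tool_use_block["id"],
                "content": json.dumps(result),
            }
        ],
    }


tool_use_block = next(b for b in assistant_turn["content"] if b["type"] == "tool_use")
user_turn = build_tool_result(tool_use_block, {"temperature": "28C", "conditions": "Sunny"})
print(json.dumps(user_turn, indent=2))
```

The tool_use_id is what lets the model match each result to the call it made, which matters when it requests several tools in one turn.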

ℹ️

The Model Never Executes Code

This is a crucial point: the model never runs your functions. It only generates structured JSON specifying which function to call and with what arguments. Your application code executes the actual function and returns the result. The model is requesting; you are executing.

import anthropic
import json
from datetime import datetime
 
client = anthropic.Anthropic()
 
# ── TOOL DEFINITIONS ─────────────────────────────────────────────
 
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'San Francisco' or 'Tokyo'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Defaults to celsius."
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "get_current_time",
        "description": "Get the current date and time.",
        "input_schema": {
            "type": "object",
            "properties": {},
            "required": []
        }
    }
]
 
 
# ── TOOL IMPLEMENTATIONS ──────────────────────────────────────────
 
def get_weather(city: str, units: str = "celsius") -> dict:
    """Simulated weather API — replace with real API call in production."""
    mock_data = {
        "san francisco": {"temp_c": 16, "conditions": "Partly cloudy", "humidity": 78},
        "tokyo":         {"temp_c": 28, "conditions": "Sunny",          "humidity": 65},
        "london":        {"temp_c": 12, "conditions": "Overcast",       "humidity": 85},
    }
    data = mock_data.get(city.lower(), {"temp_c": 20, "conditions": "Unknown", "humidity": 60})
    
    if units == "fahrenheit":
        data["temp"] = (data["temp_c"] * 9/5) + 32
        data["unit"] = "F"
    else:
        data["temp"] = data["temp_c"]
        data["unit"] = "C"
    
    return {"city": city, "temperature": f"{data['temp']}{data['unit']}", 
            "conditions": data["conditions"], "humidity": f"{data['humidity']}%"}
 
def get_current_time() -> dict:
    now = datetime.now()
    return {"datetime": now.strftime("%Y-%m-%d %H:%M:%S"), "timezone": "local"}
 
TOOL_REGISTRY = {
    "get_weather": get_weather,
    "get_current_time": get_current_time,
}
 
 
# ── AGENT LOOP ────────────────────────────────────────────────────
 
def run_agent(user_message: str) -> str:
    """Run a tool-use agent loop until the model returns a final answer."""
    
    messages = [{"role": "user", "content": user_message}]
    
    while True:
        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        
        # Case 1: Model is done, so return the final text response
        if response.stop_reason == "end_turn":
            # Join every text block; return "" instead of looping again if there is none
            return "".join(block.text for block in response.content if block.type == "text")
        
        # Case 2: Model wants to call a tool
        elif response.stop_reason == "tool_use":
            # Add the model's (tool-requesting) response to history
            messages.append({"role": "assistant", "content": response.content})
            
            # Execute each requested tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_args = block.input
                    
                    print(f"  [Calling tool: {tool_name}({json.dumps(tool_args)})]")
                    
                    fn = TOOL_REGISTRY.get(tool_name)
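                    if fn:
                        try:
                            result_str = json.dumps(fn(**tool_args))
                        except Exception as exc:
                            # Report the failure to the model instead of crashing the loop
                            result_str = f"Error: {type(exc).__name__}: {exc}"
                    else:
                        result_str = f"Error: Tool '{tool_name}' not found."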
                    
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result_str
                    })
            
            # Add tool results and loop back to get the model's next response
            messages.append({"role": "user", "content": tool_results})
        
        else:
            # Unexpected stop reason
            return f"Agent stopped unexpectedly: {response.stop_reason}"
 
 
# Test it
result = run_agent("What's the weather like in Tokyo right now, and what time is it?")
print(f"\nFinal answer:\n{result}")

Running this, you'll see the agent call both tools, then synthesize a natural response combining both results.
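
Two guards are worth considering for a loop like the one above: reporting tool failures to the model as flagged error results, and capping the number of turns. The sketch below shows the first in isolation. The names execute_tool_call and MAX_TURNS and the divide stand-in tool are hypothetical; is_error is an optional field of the API's tool_result block.

```python
import json
from typing import Any, Callable

MAX_TURNS = 10  # hypothetical cap; tune for your use case


def execute_tool_call(
    name: str,
    args: dict[str, Any],
    tool_use_id: str,
    registry: dict[str, Callable[..., Any]],
) -> dict:
    """Run one tool and always return a tool_result block, even on failure."""
    fn = registry.get(name)
    if fn is None:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": f"Error: tool '{name}' not found.", "is_error": True}
    try:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": json.dumps(fn(**args))}
    except Exception as exc:  # bad arguments, network failures, etc.
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": f"Error: {type(exc).__name__}: {exc}", "is_error": True}


# Stand-in tool for demonstration
def divide(a: float, b: float) -> dict:
    return {"quotient": a / b}


registry = {"divide": divide}
ok = execute_tool_call("divide", {"a": 6, "b": 3}, "toolu_1", registry)
failed = execute_tool_call("divide", {"a": 1, "b": 0}, "toolu_2", registry)
missing = execute_tool_call("nope", {}, "toolu_3", registry)
print(ok["content"], failed["content"], missing["content"], sep="\n")
```

For the second guard, replacing `while True` in run_agent with `for _ in range(MAX_TURNS)` and returning a fallback message after the loop bounds the cost of a model that keeps requesting tools.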

CrewAI — Multi-Agent Made Easy

CrewAI is a framework for orchestrating role-playing agents. The core concepts map cleanly to the multi-agent design pattern from Module 5:

Agent: an LLM with a role, goal, backstory, and optionally a set of tools
Task: a discrete piece of work with a description, expected output, and assigned agent
Crew: the collection of agents and tasks, with a process (sequential or hierarchical)
Process: how tasks are executed — sequential (one after another) or hierarchical (manager assigns work)

pip install crewai crewai-tools

Here is a complete research-and-writing pipeline using two agents:

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool  # Web search tool
import os
 
# Set your API keys
os.environ["ANTHROPIC_API_KEY"] = "your-key"
os.environ["SERPER_API_KEY"] = "your-key"  # serper.dev for web search
 
search_tool = SerperDevTool()
 
# ── DEFINE AGENTS ────────────────────────────────────────────────
 
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find and synthesize accurate, up-to-date information on any given topic",
    backstory="""You are a meticulous research analyst with 15 years of experience 
    in technology journalism. You excel at finding credible sources, separating 
    signal from noise, and presenting findings in a clear, structured format.
    You always cite your sources.""",
    tools=[search_tool],
    verbose=True,
    llm="anthropic/claude-haiku-4-5"   # Cheaper model for research (provider-prefixed model string)
)
 
writer = Agent(
    role="Content Strategist and Writer",
    goal="Transform research findings into engaging, accurate content for a technical audience",
    backstory="""You are a skilled technical writer who has written for publications 
    like Wired, MIT Technology Review, and The Verge. You turn complex topics into 
    clear, engaging prose without sacrificing accuracy. You write with authority.""",
    verbose=True,
    llm="anthropic/claude-sonnet-4-5"  # Better model for the actual writing (provider-prefixed model string)
)
 
# ── DEFINE TASKS ─────────────────────────────────────────────────
 
research_task = Task(
    description="""Research the current state of {topic}.
    
    Your research should cover:
    1. What it is and how it works (technically accurate, but accessible)
    2. Current real-world applications (with specific examples and companies)
    3. Key limitations and challenges
    4. Where things are heading in the next 2 years
    
    Find at least 3 credible sources. Note publication dates — prefer recent.""",
    expected_output="""A structured research brief with:
    - Executive summary (3-4 sentences)
    - Key findings (bulleted, with sources)
    - Limitations and challenges
    - Future outlook
    - Sources (title, URL, date)""",
    agent=researcher
)
 
writing_task = Task(
    description="""Using the research brief provided, write a compelling blog post about {topic}.
    
    Requirements:
    - Length: 600-800 words
    - Audience: software engineers curious about AI
    - Tone: authoritative but accessible, not hype-y
    - Structure: hook → context → substance → implications → conclusion
    - Include 1-2 concrete examples that a developer can relate to
    - No buzzwords: avoid 'revolutionary', 'game-changing', 'unprecedented'""",
    expected_output="""A complete, publication-ready blog post with:
    - Compelling headline
    - Full body (600-800 words)
    - All claims grounded in the research brief""",
    agent=writer,
    context=[research_task]  # Writer receives researcher's output
)
 
# ── ASSEMBLE AND RUN THE CREW ─────────────────────────────────────
 
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,  # Research first, then writing
    verbose=True
)
 
result = crew.kickoff(inputs={"topic": "AI agents in production"})
print(result.raw)

The context=[research_task] parameter tells CrewAI to pass the researcher's output to the writer as context. In a sequential process, each task already receives the previous task's output, so the parameter is redundant here; it makes the dependency explicit and becomes necessary when a task depends on a non-adjacent one. In a hierarchical process, a manager agent coordinates the work.
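
As a mental model only, sequential context passing can be sketched in a few lines of plain Python. This is not CrewAI's implementation: ToyTask and run_sequential are hypothetical names, and the lambdas stand in for agents making LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    name: str
    run: Callable[[str], str]  # stand-in for an agent: context in, output out
    context: list["ToyTask"] = field(default_factory=list)
    output: str = ""


def run_sequential(tasks: list[ToyTask]) -> str:
    """Run tasks in order, feeding each one the outputs it depends on."""
    previous = None
    for task in tasks:
        # Explicit context wins; otherwise fall back to the previous task's output
        sources = task.context or ([previous] if previous else [])
        task.output = task.run("\n".join(t.output for t in sources))
        previous = task
    return tasks[-1].output


research = ToyTask("research", lambda ctx: "brief: agents need guardrails")
writing = ToyTask("writing", lambda ctx: f"post based on [{ctx}]", context=[research])

print(run_sequential([research, writing]))
```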

Building a Reflection Agent

Reflection is the pattern where an agent generates output, critiques it, and then improves it. This loop runs until the output passes quality checks or hits a maximum iteration count.

import anthropic
 
client = anthropic.Anthropic()
 
def reflection_agent(
    task: str,
    max_iterations: int = 3,
    quality_threshold: float = 8.0  # out of 10
) -> str:
    """
    Generate → critique → improve loop.
    Returns the most recent draft once it is approved or max_iterations is reached.
    """
    
    # Step 1: Generate initial draft
    print(f"=== Generating initial draft ===")
    draft_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="You are an expert software engineer. Complete the given task thoroughly.",
        messages=[{"role": "user", "content": task}]
    )
    current_draft = draft_response.content[0].text
    print(f"Draft generated ({len(current_draft)} chars)")
    
    for iteration in range(max_iterations):
        # Step 2: Critique the draft
        print(f"\n=== Critique round {iteration + 1} ===")
        critique_response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system="""You are a strict technical reviewer. Your job is to critique code and writing.
            
Be specific about problems. Focus on:
- Correctness: will it actually work? Edge cases handled?
- Completeness: does it fully address the task?
- Quality: clarity, efficiency, best practices
 
End your critique with:
SCORE: [0-10]
VERDICT: [APPROVE if score >= 8, REVISE if score < 8]""",
            messages=[
                {"role": "user", "content": f"Original task:\n{task}\n\nDraft to review:\n{current_draft}"}
            ]
        )
        
        critique = critique_response.content[0].text
        print(f"Critique:\n{critique}")
        
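        # Parse score and verdict (tolerating indentation and formats like "SCORE: 8/10")
        lines = [l.strip() for l in critique.split("\n")]
        score_line = [l for l in lines if l.startswith("SCORE:")]
        verdict_line = [l for l in lines if l.startswith("VERDICT:")]
        
        if score_line and verdict_line:
            raw_score = score_line[0].split(":", 1)[1].split("/")[0].strip(" []")
            verdict = verdict_line[0].split(":", 1)[1].strip(" []").upper()
            try:
                score = float(raw_score)
            except ValueError:
                score = 0.0  # Unparseable score: treat as failing so the draft gets revised
            
            if verdict.startswith("APPROVE") or score >= quality_threshold:
                print(f"\n Quality threshold met (score: {score}/10). Done.")
                return current_draft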
        
        if iteration == max_iterations - 1:
            print(f"\nMax iterations reached. Returning best draft.")
            return current_draft
        
        # Step 3: Improve based on critique
        print(f"\n=== Improving based on critique ===")
        improve_response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            system="You are an expert software engineer. Revise the draft based on reviewer feedback.",
            messages=[
                {"role": "user", "content": f"""Original task:
{task}
 
Current draft:
{current_draft}
 
Reviewer feedback:
{critique}
 
Produce an improved version that addresses all the reviewer's concerns."""}
            ]
        )
        
        current_draft = improve_response.content[0].text
        print(f"Draft improved ({len(current_draft)} chars)")
    
    return current_draft
 
 
# Test: code review task
result = reflection_agent(
    task="""Write a Python function that safely reads a JSON file and returns the parsed 
data. Handle all common failure cases gracefully. Include type hints and docstring.""",
    max_iterations=3
)
 
print(f"\n=== FINAL OUTPUT ===\n{result}")

✅

Reflection Improves Quality Measurably

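Reflection with 2–3 iterations is often reported to improve output quality over a single generation pass, with gains in the 20–40% range commonly cited, though results vary by task and by how quality is measured. The cost is more API calls: in the loop above, each round adds a critique call and, unless the draft is approved, a revision call, so max_iterations=3 means up to 6 calls instead of 1. For tasks where quality matters more than speed (code generation, document drafting, complex analysis), the tradeoff is usually worth it.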
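
The call count for the reflection loop shown earlier can be worked out directly. The sketch below mirrors that loop's control flow: one draft, one critique per round, and one revision after every round that neither approves the draft nor hits the limit. reflection_call_count is a hypothetical helper written for this illustration.

```python
from typing import Optional


def reflection_call_count(max_iterations: int, approved_at: Optional[int] = None) -> int:
    """Count API calls made by a generate -> critique -> improve loop.

    approved_at is the 1-based critique round that approves the draft,
    or None if no round approves before max_iterations is reached.
    """
    calls = 1  # initial draft
    for round_number in range(1, max_iterations + 1):
        calls += 1  # critique
        if approved_at == round_number or round_number == max_iterations:
            return calls
        calls += 1  # revision
    return calls


print(reflection_call_count(3))                 # never approved
print(reflection_call_count(3, approved_at=1))  # approved on the first critique
print(reflection_call_count(3, approved_at=2))  # approved on the second critique
```

With max_iterations=3 this gives 6 calls in the worst case and 2 in the best, compared with 1 for a single generation pass.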

LangGraph — For Complex Control Flow

CrewAI is excellent for role-based multi-agent pipelines. When you need more precise control over the flow of information — conditional branching, loops, shared state, complex error recovery — LangGraph is the better choice.

LangGraph models your agent as a directed graph:

Nodes are functions (or LLM calls) that process state
Edges are transitions between nodes
Conditional edges allow branching based on the current state
State is a typed dictionary passed to each node; a node returns the state (or just the keys it changed), and LangGraph merges that into the shared state

LangGraph: Researcher Agent with Tool Routing

START
  │
  ▼
[researcher_node]  ← LLM decides: call a tool or answer?
  │
  ├──(tool_call)──► [tool_node]  ← executes the tool
  │                     │
  │                     └──────► [researcher_node] (loop back)
  │
  └──(final_answer)──► END
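
The diagram can be mimicked in plain Python to show what nodes, edges, and conditional edges amount to. This is a toy executor under simplifying assumptions, not LangGraph's API: run_graph, the node functions, and the scripted stand-in for the LLM's decision are all hypothetical.

```python
from typing import Callable

END = "END"  # sentinel for the terminal state in this toy


def researcher_node(state: dict) -> dict:
    # Stand-in for the LLM: request a tool until one result is available
    state["wants_tool"] = len(state["tool_results"]) < 1
    state["steps"].append("researcher")
    return state


def tool_node(state: dict) -> dict:
    state["tool_results"].append("mock search result")
    state["steps"].append("tools")
    return state


def route_from_researcher(state: dict) -> str:
    # Conditional edge: branch on the current state
    return "tools" if state["wants_tool"] else END


NODES: dict[str, Callable[[dict], dict]] = {"researcher": researcher_node, "tools": tool_node}
EDGES: dict[str, Callable[[dict], str]] = {
    "researcher": route_from_researcher,  # conditional edge
    "tools": lambda state: "researcher",  # fixed edge: always loop back
}


def run_graph(entry: str, state: dict, max_steps: int = 20) -> dict:
    current = entry
    for _ in range(max_steps):  # hard cap so a bad router cannot loop forever
        if current == END:
            break
        state = NODES[current](state)
        current = EDGES[current](state)
    return state


final = run_graph("researcher", {"tool_results": [], "steps": []})
print(final["steps"])
```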

from typing import TypedDict
from langgraph.graph import StateGraph, END
import anthropic
 
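# Define the state schema: what gets passed between nodes
class AgentState(TypedDict):
    messages: list
    iteration_count: int
    _stop_reason: str  # Declared so the router can read it; keys missing from the schema are not kept in graph state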
 
# Define tools (same format as raw API)
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    }
]
 
client = anthropic.Anthropic()
 
# Node: LLM call
def researcher_node(state: AgentState) -> AgentState:
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1024,
        tools=tools,
        messages=state["messages"]
    )
    
    state["messages"].append({"role": "assistant", "content": response.content})
    state["_stop_reason"] = response.stop_reason
    state["iteration_count"] = state.get("iteration_count", 0) + 1
    
    return state
 
# Node: tool execution
def tool_node(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    tool_results = []
    
    for block in last_message["content"]:
        if hasattr(block, "type") and block.type == "tool_use":
            # Execute the tool (simplified — use your actual implementations)
            result = f"Search results for '{block.input['query']}': [mock results]"
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result
            })
    
    state["messages"].append({"role": "user", "content": tool_results})
    return state
 
# Routing function: decide which node comes next
def router(state: AgentState) -> str:
    # Check the safety limit first so a model that keeps requesting tools cannot loop forever
    if state.get("iteration_count", 0) >= 5:
        return END
    if state.get("_stop_reason") == "tool_use":
        return "tools"
    return END
 
# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher_node)
workflow.add_node("tools", tool_node)
 
workflow.set_entry_point("researcher")
workflow.add_conditional_edges("researcher", router, {
    "tools": "tools",
    END: END
})
workflow.add_edge("tools", "researcher")  # After tools, go back to researcher
 
graph = workflow.compile()
 
# Run the graph
initial_state = {
    "messages": [{"role": "user", "content": "What are the latest developments in AI agents?"}],
    "iteration_count": 0
}
 
result = graph.invoke(initial_state)

LangGraph's strength is its explicitness. The graph structure makes control flow visible, testable, and debuggable. When something goes wrong, you can inspect exactly which node failed and what the state looked like at that point.
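
Because routing is an ordinary function of the state, it can be unit-tested without calling a model or compiling a graph. The sketch below uses a stand-alone router with a local END placeholder (an assumption standing in for the constant imported from langgraph.graph) and a hypothetical MAX_ITERATIONS limit.

```python
END = "__end__"     # local placeholder for langgraph.graph.END
MAX_ITERATIONS = 5  # hypothetical safety limit


def route(state: dict) -> str:
    """Return the name of the next node for a given state."""
    if state.get("iteration_count", 0) >= MAX_ITERATIONS:
        return END  # stop even if the model still wants a tool
    if state.get("stop_reason") == "tool_use":
        return "tools"
    return END


# Each case is (state, expected next node)
cases = [
    ({"stop_reason": "tool_use", "iteration_count": 1}, "tools"),
    ({"stop_reason": "end_turn", "iteration_count": 1}, END),
    ({"stop_reason": "tool_use", "iteration_count": 5}, END),
    ({}, END),
]
for state, expected in cases:
    assert route(state) == expected, (state, expected)
print("all routing cases pass")
```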

Framework Selection Guide

Choosing a framework should be a deliberate decision based on your requirements:

| Framework | Best For | Learning Curve | Control | Notes |
|-----------|----------|----------------|---------|-------|
| Raw Anthropic API | Learning, custom architectures | Low | Maximum | No abstractions: you see everything |
| CrewAI | Role-based multi-agent pipelines | Low | Medium | Fast to prototype, opinionated |
| LangGraph | Complex state machines, custom control flow | Medium | High | Explicit graph is great for debugging |
| AutoGen | Conversational multi-agent (Microsoft) | Medium | Medium | Strong for debate/discussion patterns |
| OpenAI Agents SDK | OpenAI ecosystem | Low | Medium | Tied to OpenAI, less framework overhead |

✅

Learn the Pattern Before the Framework

The biggest mistake when starting with agents is reaching for a framework before understanding the underlying pattern. Build your first agent with the raw API. When you feel the pain of managing the loop yourself, you'll understand exactly what the framework is buying you. Then choose a framework based on that understanding.

Rule of thumb: Start with CrewAI for speed. Graduate to LangGraph when you need explicit control over complex state. Write raw API code when frameworks add more complexity than they remove.

💻Build 3 Agents — Escalating Complexity

Goal: Build three agents from scratch, each demonstrating a different pattern.

Agent 1: Raw API Tool-Use Agent

Build a calculator + date agent using the raw Anthropic API (no frameworks).

Tools to implement:

calculate(expression: str) -> float — evaluates a math expression safely
get_date_info(offset_days: int) -> dict — returns date info (today + offset days)

Test questions:

"If today is March 15 and I add 47 days, what date is that?"
"What is (sqrt(144) + 15^2) / 3?"
"A project starts today and takes 90 days. What day of the week does it end?"

import ast
import operator
from datetime import datetime, timedelta
 
def safe_calculate(expression: str) -> float:
    """Safely evaluate a math expression without using eval()."""
    allowed_ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow,
        ast.USub: operator.neg,
    }
    
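    def eval_node(node):
        # Accept numeric literals only; ast.Constant also covers strings, None, and booleans
        if isinstance(node, ast.Constant):
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                raise ValueError(f"Unsupported constant: {node.value!r}")
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in allowed_ops:
            return allowed_ops[type(node.op)](eval_node(node.left), eval_node(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in allowed_ops:
            return allowed_ops[type(node.op)](eval_node(node.operand))
        else:
            # Reached for anything outside the whitelist, including function calls
            # such as sqrt() and operators such as ^ (bitwise XOR in Python)
            raise ValueError(f"Unsupported operation: {type(node).__name__}")
    
    tree = ast.parse(expression, mode='eval')
    return eval_node(tree.body)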
 
# TODO: implement the agent loop
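
As a starting point for the second tool, here is one possible sketch of get_date_info. The optional today parameter and the keys of the returned dictionary are assumptions made for testability; the exercise only specifies the offset_days argument and a dict return value.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def get_date_info(offset_days: int, today: Optional[date] = None) -> dict:
    """Return the date offset_days from today (or from an injected start date)."""
    start = today or date.today()
    target = start + timedelta(days=offset_days)
    return {
        "start_date": start.isoformat(),
        "target_date": target.isoformat(),
        "weekday": WEEKDAYS[target.weekday()],  # date.weekday(): Monday is 0
        "offset_days": offset_days,
    }


# Injecting a fixed start date makes the result reproducible
info = get_date_info(47, today=date(2025, 3, 15))
print(info["target_date"], info["weekday"])
```

In the tool schema exposed to the model, only offset_days needs to appear; the today parameter exists so the function can be tested against a known date.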

Agent 2: Reflection Code Reviewer

Build a reflection agent that:

Takes a Python function as input
Generates a code review
Scores the review (is it thorough? specific? actionable?)
If score < 8, improves the review and tries again
Returns the final review after max 3 iterations

Test it on these two functions:

# Function A — many issues
def proc(d):
    r = []
    for x in d:
        if x > 0:
            r.append(x * 2)
    return r
 
# Function B — well-written
def double_positives(numbers: list[float]) -> list[float]:
    """Return a new list with each positive number doubled.
    
    Args:
        numbers: Input list of numbers (may be empty)
        
    Returns:
        List containing 2x each positive number, in original order.
        Empty list if no positive numbers exist.
    """
    return [n * 2 for n in numbers if n > 0]

Agent 3: CrewAI Research + Writing Pipeline

Install CrewAI and build the Researcher + Writer crew from the examples above.

Customize it for a topic relevant to your work. Run it and evaluate:

Did the researcher find relevant information?
Did the writer stay grounded in the research (no hallucinations)?
Is the output actually publication-quality, or does it need human editing?

Document what you'd need to change to make it production-ready.

🧪

Knowledge Check


Q1. When stop_reason is "tool_use", what should you do?

Q2. In CrewAI, what is a Task?

Q3. What is the reflection pattern in agents?
