Building Agents with Frameworks
- Implement function calling (tool use) with the Anthropic API directly
- Build a single-agent tool-use loop
- Implement a reflection agent that self-critiques
- Create a 2-agent CrewAI pipeline
Function Calling: The Foundation of Tool Use
Every agent framework sits on top of the same primitive: function calling (also called tool use). Before using any framework, you should understand exactly what's happening at the API level. This knowledge will serve you well when frameworks fail or when you need to debug unexpected behavior.
Here is how function calling works with the Anthropic API:
- You define tools as JSON schemas: name, description, and input parameter types
- You pass these tool definitions with your API call
- The model decides whether to respond normally or call a tool
- If it calls a tool, stop_reason is "tool_use" and the response contains the tool name and arguments
- Your code executes the actual function
- You add the result as a tool_result message and call the API again
- The model now generates its final response, informed by the tool result
This is a crucial point: the model never runs your functions. It only generates structured JSON specifying which function to call and with what arguments. Your application code executes the actual function and returns the result. The model is requesting; you are executing.
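Because the model only emits JSON, the arguments it produces are untrusted input until your code checks them. The following is a minimal sketch of such a check, assuming a hand-rolled test of the `required` and `enum` fields only (a full JSON Schema validator would cover far more). `validate_tool_input` is a hypothetical helper for illustration, not part of the Anthropic SDK.

```python
def validate_tool_input(input_schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look usable."""
    problems = []
    properties = input_schema.get("properties", {})
    # Every name listed under "required" must be present.
    for name in input_schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be declared and respect its type/enum.
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


# Same shape as a tool's input_schema: one required string, one optional enum.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

ok_problems = validate_tool_input(weather_schema, {"city": "Tokyo"})
bad_problems = validate_tool_input(weather_schema, {"units": "kelvin"})
print(ok_problems)
print(bad_problems)
```

If the list is non-empty, one reasonable design is to send the problems back to the model as the tool result rather than calling the function, so the model can correct its own request.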
import anthropic
import json
from datetime import datetime

client = anthropic.Anthropic()

# --- TOOL DEFINITIONS ---
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'San Francisco' or 'Tokyo'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Defaults to celsius."
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "get_current_time",
        "description": "Get the current date and time.",
        "input_schema": {
            "type": "object",
            "properties": {},
            "required": []
        }
    }
]

# --- TOOL IMPLEMENTATIONS ---
def get_weather(city: str, units: str = "celsius") -> dict:
    """Simulated weather API. Replace with a real API call in production."""
    mock_data = {
        "san francisco": {"temp_c": 16, "conditions": "Partly cloudy", "humidity": 78},
        "tokyo": {"temp_c": 28, "conditions": "Sunny", "humidity": 65},
        "london": {"temp_c": 12, "conditions": "Overcast", "humidity": 85},
    }
    data = mock_data.get(city.lower(), {"temp_c": 20, "conditions": "Unknown", "humidity": 60})
    if units == "fahrenheit":
        data["temp"] = (data["temp_c"] * 9/5) + 32
        data["unit"] = "F"
    else:
        data["temp"] = data["temp_c"]
        data["unit"] = "C"
    return {"city": city, "temperature": f"{data['temp']}{data['unit']}",
            "conditions": data["conditions"], "humidity": f"{data['humidity']}%"}

def get_current_time() -> dict:
    now = datetime.now()
    return {"datetime": now.strftime("%Y-%m-%d %H:%M:%S"), "timezone": "local"}

# Maps the tool names the model can request to the functions that implement them
TOOL_REGISTRY = {
    "get_weather": get_weather,
    "get_current_time": get_current_time,
}

# --- AGENT LOOP ---
def run_agent(user_message: str, max_turns: int = 10) -> str:
    """Run a tool-use agent loop until the model returns a final answer."""
    messages = [{"role": "user", "content": user_message}]

    # Cap the number of turns so a confused model cannot loop forever
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Case 1: Model is done, so return the final text response
        if response.stop_reason == "end_turn":
            return "".join(block.text for block in response.content if block.type == "text")

        # Case 2: Model wants to call a tool
        elif response.stop_reason == "tool_use":
            # Add the model's (tool-requesting) response to history
            messages.append({"role": "assistant", "content": response.content})

            # Execute each requested tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_args = block.input
                    print(f"  [Calling tool: {tool_name}({json.dumps(tool_args)})]")
                    fn = TOOL_REGISTRY.get(tool_name)
                    if fn:
                        result = fn(**tool_args)
                        result_str = json.dumps(result)
                    else:
                        result_str = f"Error: Tool '{tool_name}' not found."
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result_str
                    })

            # Add tool results and loop back to get the model's next response
            messages.append({"role": "user", "content": tool_results})

        else:
            # Unexpected stop reason
            return f"Agent stopped unexpectedly: {response.stop_reason}"

    return f"Agent stopped after {max_turns} turns without a final answer."

# Test it
result = run_agent("What's the weather like in Tokyo right now, and what time is it?")
print(f"\nFinal answer:\n{result}")

Running this, you'll see the agent call both tools, then synthesize a natural response combining both results.
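A tool function can raise, and an unhandled exception would end the loop before the model has a chance to recover. The sketch below shows one way to wrap a single tool call so that every outcome becomes a tool_result block, using the optional `is_error` flag that tool_result blocks accept. `execute_tool_call` and the `divide` tool are hypothetical names used only for illustration.

```python
import json


def execute_tool_call(registry: dict, name: str, args: dict, tool_use_id: str) -> dict:
    """Run one requested tool and always return a tool_result block."""
    fn = registry.get(name)
    if fn is None:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": f"Error: tool '{name}' not found.", "is_error": True}
    try:
        result = fn(**args)
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop.
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": f"Error: {type(exc).__name__}: {exc}", "is_error": True}
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": json.dumps(result)}


demo_registry = {"divide": lambda a, b: a / b}

ok = execute_tool_call(demo_registry, "divide", {"a": 6, "b": 3}, "toolu_demo_1")
failed = execute_tool_call(demo_registry, "divide", {"a": 1, "b": 0}, "toolu_demo_2")
missing = execute_tool_call(demo_registry, "multiply", {"a": 1, "b": 2}, "toolu_demo_3")
print(ok["content"])
print(failed["content"])
print(missing["content"])
```

Returning the error text lets the model decide what to do next, for example retrying with different arguments or explaining the failure to the user.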
CrewAI: Multi-Agent Made Easy
CrewAI is a framework for orchestrating role-playing agents. The core concepts map cleanly to the multi-agent design pattern from Module 5:
- Agent: an LLM with a role, goal, backstory, and optionally a set of tools
- Task: a discrete piece of work with a description, expected output, and assigned agent
- Crew: the collection of agents and tasks, with a process (sequential or hierarchical)
- Process: how tasks are executed, either sequential (one after another) or hierarchical (a manager assigns work)
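To see how these four concepts relate, here is a plain-Python sketch of a sequential process with no LLM calls. The `SketchAgent` and `SketchTask` classes and the `run_sequential` function are hypothetical and written only for this illustration; they are not CrewAI's classes and do not reflect how CrewAI is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchAgent:
    role: str
    # Stand-in for an LLM call: (task description, context) -> output text
    act: Callable[[str, str], str]


@dataclass
class SketchTask:
    description: str
    agent: SketchAgent
    context: list = field(default_factory=list)  # upstream tasks whose output is needed
    output: str = ""


def run_sequential(tasks: list) -> str:
    """Run tasks in order, handing each one the outputs of its context tasks."""
    for task in tasks:
        context = "\n".join(upstream.output for upstream in task.context)
        task.output = task.agent.act(task.description, context)
    return tasks[-1].output


researcher = SketchAgent("Researcher", lambda desc, ctx: f"notes on {desc}")
writer = SketchAgent("Writer", lambda desc, ctx: f"article based on [{ctx}]")

research = SketchTask("AI agents", researcher)
writing = SketchTask("write a post", writer, context=[research])

final = run_sequential([research, writing])
print(final)
```

The crew here is just the ordered task list plus the loop; a hierarchical process would replace the fixed order with a manager that decides which task runs next.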
pip install crewai crewai-tools

Here is a complete research-and-writing pipeline using two agents:
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool  # Web search tool
import os

# Set your API keys (in real projects, load these from the environment or a secrets manager)
os.environ["ANTHROPIC_API_KEY"] = "your-key"
os.environ["SERPER_API_KEY"] = "your-key"  # serper.dev for web search

search_tool = SerperDevTool()

# --- DEFINE AGENTS ---
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find and synthesize accurate, up-to-date information on any given topic",
    backstory="""You are a meticulous research analyst with 15 years of experience
    in technology journalism. You excel at finding credible sources, separating
    signal from noise, and presenting findings in a clear, structured format.
    You always cite your sources.""",
    tools=[search_tool],
    verbose=True,
    llm="anthropic/claude-haiku-4-5"  # Cheap model for research; provider-prefixed model name
)

writer = Agent(
    role="Content Strategist and Writer",
    goal="Transform research findings into engaging, accurate content for a technical audience",
    backstory="""You are a skilled technical writer who has written for publications
    like Wired, MIT Technology Review, and The Verge. You turn complex topics into
    clear, engaging prose without sacrificing accuracy. You write with authority.""",
    verbose=True,
    llm="anthropic/claude-sonnet-4-5"  # Better model for the actual writing
)

# --- DEFINE TASKS ---
research_task = Task(
    description="""Research the current state of {topic}.
    Your research should cover:
    1. What it is and how it works (technically accurate, but accessible)
    2. Current real-world applications (with specific examples and companies)
    3. Key limitations and challenges
    4. Where things are heading in the next 2 years
    Find at least 3 credible sources. Note publication dates and prefer recent ones.""",
    expected_output="""A structured research brief with:
    - Executive summary (3-4 sentences)
    - Key findings (bulleted, with sources)
    - Limitations and challenges
    - Future outlook
    - Sources (title, URL, date)""",
    agent=researcher
)

writing_task = Task(
    description="""Using the research brief provided, write a compelling blog post about {topic}.
    Requirements:
    - Length: 600-800 words
    - Audience: software engineers curious about AI
    - Tone: authoritative but accessible, not hype-y
    - Structure: hook -> context -> substance -> implications -> conclusion
    - Include 1-2 concrete examples that a developer can relate to
    - No buzzwords: avoid 'revolutionary', 'game-changing', 'unprecedented'""",
    expected_output="""A complete, publication-ready blog post with:
    - Compelling headline
    - Full body (600-800 words)
    - All claims grounded in the research brief""",
    agent=writer,
    context=[research_task]  # Writer receives researcher's output
)

# --- ASSEMBLE AND RUN THE CREW ---
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,  # Research first, then writing
    verbose=True
)

# The inputs dict fills the {topic} placeholder in the task descriptions
result = crew.kickoff(inputs={"topic": "AI agents in production"})
print(result.raw)

The context=[research_task] parameter makes the data flow explicit: it tells CrewAI to pass the researcher's output to the writer as context. In a sequential process, this already happens automatically for adjacent tasks, so here the parameter mainly documents the dependency; it becomes necessary when a task depends on a non-adjacent one. In a hierarchical process, a manager agent coordinates the work.
Building a Reflection Agent
Reflection is the pattern where an agent generates output, critiques it, and then improves it. This loop runs until the output passes quality checks or hits a maximum iteration count.
import anthropic

client = anthropic.Anthropic()

def reflection_agent(
    task: str,
    max_iterations: int = 3,
    quality_threshold: float = 8.0  # out of 10
) -> str:
    """
    Generate -> critique -> improve loop.
    Returns the most recent draft produced within max_iterations.
    """
    # Step 1: Generate initial draft
    print("=== Generating initial draft ===")
    draft_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="You are an expert software engineer. Complete the given task thoroughly.",
        messages=[{"role": "user", "content": task}]
    )
    current_draft = draft_response.content[0].text
    print(f"Draft generated ({len(current_draft)} chars)")

    for iteration in range(max_iterations):
        # Step 2: Critique the draft
        print(f"\n=== Critique round {iteration + 1} ===")
        critique_response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system="""You are a strict technical reviewer. Your job is to critique code and writing.
Be specific about problems. Focus on:
- Correctness: will it actually work? Edge cases handled?
- Completeness: does it fully address the task?
- Quality: clarity, efficiency, best practices
End your critique with:
SCORE: [0-10]
VERDICT: [APPROVE if score >= 8, REVISE if score < 8]""",
            messages=[
                {"role": "user", "content": f"Original task:\n{task}\n\nDraft to review:\n{current_draft}"}
            ]
        )
        critique = critique_response.content[0].text
        print(f"Critique:\n{critique}")

        # Parse score and verdict (tolerates forms like "SCORE: 8/10" or "SCORE: [8]")
        lines = [l.strip() for l in critique.split("\n")]
        score_line = [l for l in lines if l.startswith("SCORE:")]
        verdict_line = [l for l in lines if l.startswith("VERDICT:")]
        if score_line and verdict_line:
            try:
                score = float(score_line[0].split(":", 1)[1].split("/")[0].strip(" []"))
            except ValueError:
                score = 0.0  # Unparseable score: fall back to the verdict alone
            verdict = verdict_line[0].split(":", 1)[1].strip(" []").upper()
            if verdict.startswith("APPROVE") or score >= quality_threshold:
                print(f"\n Quality threshold met (score: {score}/10). Done.")
                return current_draft

        if iteration == max_iterations - 1:
            print("\nMax iterations reached. Returning latest draft.")
            return current_draft

        # Step 3: Improve based on critique
        print("\n=== Improving based on critique ===")
        improve_prompt = (
            f"Original task:\n{task}\n\n"
            f"Current draft:\n{current_draft}\n\n"
            f"Reviewer feedback:\n{critique}\n\n"
            "Produce an improved version that addresses all the reviewer's concerns."
        )
        improve_response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            system="You are an expert software engineer. Revise the draft based on reviewer feedback.",
            messages=[{"role": "user", "content": improve_prompt}]
        )
        current_draft = improve_response.content[0].text
        print(f"Draft improved ({len(current_draft)} chars)")

    return current_draft

# Test: code review task
result = reflection_agent(
    task="""Write a Python function that safely reads a JSON file and returns the parsed
    data. Handle all common failure cases gracefully. Include type hints and docstring.""",
    max_iterations=3
)
print(f"\n=== FINAL OUTPUT ===\n{result}")

Reflection with 2-3 iterations is commonly reported to improve output quality, with figures of 20-40% over a single generation pass often cited, although the gain depends on the task and on how quality is measured. The cost is roughly 3x the API calls or more: the loop above makes one draft call plus a critique call per round and a revision call between rounds, so up to 6 calls with max_iterations=3. For tasks where quality matters more than speed (code generation, document drafting, complex analysis), the tradeoff is usually worth it.
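The loop depends on the reviewer ending its critique in a machine-readable form, and models do not always follow the format exactly (for example writing "SCORE: 8/10" or wrapping the value in brackets). As a hedged sketch, a regular-expression parser can tolerate those variations and report a missing field as None instead of raising. `parse_critique` is a hypothetical helper name, and the tolerated variations are assumptions about likely output rather than an exhaustive list.

```python
import re
from typing import Optional

# Match "SCORE: 8", "SCORE: 7.5", "SCORE: 8/10", "SCORE: [8]" at the start of a line.
SCORE_RE = re.compile(r"^\s*SCORE:\s*\[?\s*(\d+(?:\.\d+)?)", re.MULTILINE)
# Match "VERDICT: APPROVE" or "VERDICT: [REVISE]", case-insensitively.
VERDICT_RE = re.compile(r"^\s*VERDICT:\s*\[?\s*(APPROVE|REVISE)",
                        re.MULTILINE | re.IGNORECASE)


def parse_critique(critique: str) -> tuple[Optional[float], Optional[str]]:
    """Extract (score, verdict) from a critique; missing fields come back as None."""
    score_match = SCORE_RE.search(critique)
    verdict_match = VERDICT_RE.search(critique)
    score = float(score_match.group(1)) if score_match else None
    verdict = verdict_match.group(1).upper() if verdict_match else None
    return score, verdict


plain = parse_critique("Looks solid.\nSCORE: 8/10\nVERDICT: APPROVE")
bracketed = parse_critique("Needs work.\nSCORE: [7.5]\nVERDICT: [revise]")
absent = parse_critique("The reviewer forgot the footer.")
print(plain, bracketed, absent)
```

When either value is None, a cautious choice is to treat the round as "revise" so that a malformed critique never approves a draft by accident.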
LangGraph: For Complex Control Flow
CrewAI is excellent for role-based multi-agent pipelines. When you need more precise control over the flow of information (conditional branching, loops, shared state, complex error recovery), LangGraph is the better choice.
LangGraph models your agent as a directed graph:
- Nodes are functions (or LLM calls) that process state
- Edges are transitions between nodes
- Conditional edges allow branching based on the current state
- State is a typed dictionary passed through and mutated at each node
START
  |
  v
[researcher_node]   <- LLM decides: call a tool or answer?
  |
  +--(tool_call)--> [tool_node]   <- executes the tool
  |                     |
  |                     +--> [researcher_node] (loop back)
  |
  +--(final_answer)--> END
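The graph model can be illustrated without any framework. The sketch below is a minimal plain-Python executor, assuming nodes are functions from state to state and each node has one edge function that names the next node. `run_graph`, the `END` sentinel, and the `pending` counter are hypothetical names for this illustration and are not LangGraph's API.

```python
from typing import Callable, Dict

END = "__end__"  # sentinel for this sketch only

State = dict
Node = Callable[[State], State]
Edge = Callable[[State], str]  # returns the name of the next node, or END


def run_graph(nodes: Dict[str, Node], edges: Dict[str, Edge],
              entry: str, state: State, max_steps: int = 20) -> State:
    """Run one node at a time, asking the current node's edge where to go next."""
    current = entry
    for _ in range(max_steps):
        state = nodes[current](state)
        current = edges[current](state)
        if current == END:
            return state
    raise RuntimeError("graph did not reach END within max_steps")


def researcher(state: State) -> State:
    return {**state, "log": state["log"] + ["researcher"]}


def tools(state: State) -> State:
    return {**state, "pending": state["pending"] - 1, "log": state["log"] + ["tools"]}


nodes = {"researcher": researcher, "tools": tools}
edges = {
    # Conditional edge: branch on the current state
    "researcher": lambda s: "tools" if s["pending"] > 0 else END,
    # Plain edge: always loop back to the researcher
    "tools": lambda s: "researcher",
}

final_state = run_graph(nodes, edges, "researcher", {"pending": 2, "log": []})
print(final_state["log"])
```

The log records the same researcher/tools loop as the diagram: the researcher runs, the tool node runs while work is pending, and the graph ends once the researcher has nothing left to request.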
from typing import TypedDict
from langgraph.graph import StateGraph, END
import anthropic

# Define the state schema: what gets passed between nodes
class AgentState(TypedDict):
    messages: list
    iteration_count: int
    stop_reason: str  # Declared in the schema so the router can read it from state

# Define tools (same format as raw API)
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    }
]

client = anthropic.Anthropic()

# Node: LLM call
def researcher_node(state: AgentState) -> AgentState:
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1024,
        tools=tools,
        messages=state["messages"]
    )
    state["messages"].append({"role": "assistant", "content": response.content})
    state["stop_reason"] = response.stop_reason
    state["iteration_count"] = state.get("iteration_count", 0) + 1
    return state

# Node: tool execution
def tool_node(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    tool_results = []
    for block in last_message["content"]:
        if hasattr(block, "type") and block.type == "tool_use":
            # Execute the tool (simplified; use your actual implementations)
            result = f"Search results for '{block.input['query']}': [mock results]"
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result
            })
    state["messages"].append({"role": "user", "content": tool_results})
    return state

# Routing function: decide which node comes next
def router(state: AgentState) -> str:
    # Check the safety limit first so it also stops a model that keeps requesting tools
    if state.get("iteration_count", 0) >= 5:
        return END
    if state.get("stop_reason") == "tool_use":
        return "tools"
    return END

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher_node)
workflow.add_node("tools", tool_node)
workflow.set_entry_point("researcher")
workflow.add_conditional_edges("researcher", router, {
    "tools": "tools",
    END: END
})
workflow.add_edge("tools", "researcher")  # After tools, go back to researcher
graph = workflow.compile()

# Run the graph
initial_state = {
    "messages": [{"role": "user", "content": "What are the latest developments in AI agents?"}],
    "iteration_count": 0,
    "stop_reason": ""
}
result = graph.invoke(initial_state)

LangGraph's strength is its explicitness. The graph structure makes control flow visible, testable, and debuggable. When something goes wrong, you can inspect exactly which node failed and what the state looked like at that point.
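One practical consequence of that explicitness is that routing logic is an ordinary function of state, so it can be unit-tested with plain dictionaries and no LLM calls. The sketch below uses its own copy of a routing function; the `END` value, the `stop_reason` and `iteration_count` keys, and the limit of 5 are assumptions chosen for this illustration.

```python
END = "__end__"  # stand-in for the graph's end marker in this sketch
MAX_ITERATIONS = 5


def route(state: dict) -> str:
    """Pick the next node from the current state."""
    # Safety limit comes first so it wins even when a tool was requested.
    if state.get("iteration_count", 0) >= MAX_ITERATIONS:
        return END
    if state.get("stop_reason") == "tool_use":
        return "tools"
    return END


# Each case is (state, expected next node).
cases = [
    ({"stop_reason": "tool_use", "iteration_count": 1}, "tools"),
    ({"stop_reason": "end_turn", "iteration_count": 1}, END),
    ({"stop_reason": "tool_use", "iteration_count": 5}, END),
    ({}, END),
]
results = [route(state) == expected for state, expected in cases]
print(all(results))
```

The third case is the one worth having: it confirms that the iteration limit still applies when the model keeps asking for tools.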
Framework Selection Guide
Choosing a framework should be a deliberate decision based on your requirements:
| Framework | Best For | Learning Curve | Control | Notes |
|-----------|----------|----------------|---------|-------|
| Raw Anthropic API | Learning, custom architectures | Low | Maximum | No abstractions: you see everything |
| CrewAI | Role-based multi-agent pipelines | Low | Medium | Fast to prototype, opinionated |
| LangGraph | Complex state machines, custom control flow | Medium | High | Explicit graph is great for debugging |
| AutoGen | Conversational multi-agent (Microsoft) | Medium | Medium | Strong for debate/discussion patterns |
| OpenAI Agents SDK | OpenAI ecosystem | Low | Medium | Tied to OpenAI, less framework overhead |
The biggest mistake when starting with agents is reaching for a framework before understanding the underlying pattern. Build your first agent with the raw API. When you feel the pain of managing the loop yourself, you'll understand exactly what the framework is buying you. Then choose a framework based on that understanding.
Rule of thumb: Start with CrewAI for speed. Graduate to LangGraph when you need explicit control over complex state. Write raw API code when frameworks add more complexity than they remove.
Goal: Build three agents from scratch, each demonstrating a different pattern.
Agent 1: Raw API Tool-Use Agent
Build a calculator + date agent using the raw Anthropic API (no frameworks).
Tools to implement:
- calculate(expression: str) -> float: evaluates a math expression safely
- get_date_info(offset_days: int) -> dict: returns date info (today + offset days)
Test questions:
- "If today is March 15 and I add 47 days, what date is that?"
- "What is (sqrt(144) + 15^2) / 3?"
- "A project starts today and takes 90 days. What day of the week does it end?"
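As a starting point for the date tool, here is one possible shape for get_date_info. It is a sketch only: the optional `today` parameter is an assumption added so the function can be tested against a fixed date (and so the model can supply "today is March 15"), and the returned field names are illustrative rather than required by the exercise. The year 2025 in the usage line is also an assumption, since the test question does not give one.

```python
from datetime import date, timedelta
from typing import Optional

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def get_date_info(offset_days: int, today: Optional[date] = None) -> dict:
    """Return the date that is offset_days after today (or after a given base date)."""
    base = today or date.today()
    target = base + timedelta(days=offset_days)
    return {
        "base_date": base.isoformat(),
        "target_date": target.isoformat(),
        "weekday": WEEKDAYS[target.weekday()],
        "offset_days": offset_days,
    }


info = get_date_info(47, today=date(2025, 3, 15))
print(info["target_date"], info["weekday"])
```

Returning the weekday alongside the date means the third test question can be answered from a single tool call.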
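import ast
import math
import operator
from datetime import datetime, timedelta

def safe_calculate(expression: str) -> float:
    """Safely evaluate a math expression without using eval()."""
    allowed_ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow,
        ast.USub: operator.neg,
    }
    # Whitelisted functions; sqrt is needed for the second test question
    allowed_funcs = {"sqrt": math.sqrt}

    def eval_node(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in allowed_ops:
            return allowed_ops[type(node.op)](eval_node(node.left), eval_node(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in allowed_ops:
            return allowed_ops[type(node.op)](eval_node(node.operand))
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id in allowed_funcs and not node.keywords):
            return allowed_funcs[node.func.id](*[eval_node(arg) for arg in node.args])
        else:
            raise ValueError(f"Unsupported operation: {type(node).__name__}")

    # Treat ^ as exponentiation, as written in the test questions (Python parses ^ as XOR)
    tree = ast.parse(expression.replace("^", "**"), mode="eval")
    return eval_node(tree.body)

# TODO: implement the agent loop

Agent 2: Reflection Code Reviewer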
Build a reflection agent that:
- Takes a Python function as input
- Generates a code review
- Scores the review (is it thorough? specific? actionable?)
- If score < 8, improves the review and tries again
- Returns the final review after max 3 iterations
Test it on these two functions:
# Function A: many issues
def proc(d):
    r = []
    for x in d:
        if x > 0:
            r.append(x * 2)
    return r

# Function B: well-written
def double_positives(numbers: list[float]) -> list[float]:
    """Return a new list with each positive number doubled.

    Args:
        numbers: Input list of numbers (may be empty)

    Returns:
        List containing 2x each positive number, in original order.
        Empty list if no positive numbers exist.
    """
    return [n * 2 for n in numbers if n > 0]

Agent 3: CrewAI Research + Writing Pipeline
Install CrewAI and build the Researcher + Writer crew from the examples above.
Customize it for a topic relevant to your work. Run it and evaluate:
- Did the researcher find relevant information?
- Did the writer stay grounded in the research (no hallucinations)?
- Is the output actually publication-quality, or does it need human editing?
Document what you'd need to change to make it production-ready.
Q1: When stop_reason='tool_use', what should you do?
Q2: In CrewAI, what is a 'Task'?
Q3: What is the reflection pattern in agents?