🔭Module 12 of 12

What Comes Next

⏱ 2–3 hours

📘 Beginner

🔧 All levels

What you'll learn

→ Understand emerging AI capabilities worth tracking
→ Build a personal system for staying current
→ Know the career paths in AI engineering
→ Have a plan for your first post-course project

You've Come a Long Way

Think back to where you started. You could probably use an AI chatbot. You might have written a simple prompt. Maybe you'd experimented with the API.

Now look at what you can do. You can explain how tokenization, attention, and next-token prediction actually work. You can build a RAG pipeline from scratch — chunking strategy, embedding, retrieval, reranking. You can write multi-agent systems where specialized agents collaborate on complex tasks. You can build MCP servers that expose tools to any compatible host. You can design tools with proper JSON Schema, implement them robustly, and wire them into agentic loops. You can evaluate LLM systems with unit evals, LLM-as-judge, and groundedness checking. You can ship production-ready AI with caching, PII redaction, structured logging, and cost monitoring.

✅

A Rare Skill Set

Most engineers know AI exists. A smaller number have used an AI API. Very few can design RAG systems, ship MCP servers, build multi-agent pipelines, and evaluate AI systems rigorously. You now can. That combination is genuinely uncommon and genuinely valuable.

This final module is not another technical deep-dive. It's a map for where to go next — which emerging capabilities to watch, how to stay current without drowning in hype, which career path fits your goals, and how to build a personal system for continued learning.

Emerging Capabilities to Watch

The AI landscape moves fast. Most announcements are incremental. A few represent genuine shifts in what you can build. Here are the capabilities worth understanding now.

Extended Thinking / Reasoning Models

Standard models generate responses by predicting tokens as fast as possible. Reasoning models take a different approach: before generating the final answer, they spend extra compute on internal chain-of-thought — working through the problem step by step in a scratchpad the user doesn't see.

The practical result: reasoning models often substantially outperform standard models on tasks requiring multi-step logic, complex math, and careful planning. They're slower and more expensive per call, but for the right tasks (architecture decisions, complex debugging, mathematical proofs, multi-step planning) the quality improvement can justify the cost.

When to use a reasoning model versus a standard model:

Use reasoning: complex coding tasks, algorithmic problems, tasks where you've seen standard models make subtle logic errors
Use standard: conversational tasks, simple classification, RAG retrieval, any task where response speed matters

As you build systems, think about routing: most tasks go to a fast model, complex reasoning tasks route to a reasoning model. The eval harness from Module 9 will tell you which tasks actually benefit.
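The routing idea can be sketched as a small dispatch function. This is a minimal sketch under stated assumptions: the model names, the task categories, and the `route_model` helper are hypothetical placeholders. In practice the category sets would come from your eval results rather than being hard-coded.

```python
# Hypothetical model identifiers; substitute the models you actually use.
FAST_MODEL = "fast-model"
REASONING_MODEL = "reasoning-model"

# Illustrative task categories (assumptions, loosely based on the lists above).
REASONING_TASKS = {"complex_coding", "algorithmic", "multi_step_planning"}
FAST_TASKS = {"chat", "classification", "rag_answer"}


def route_model(task_type: str, latency_sensitive: bool = False) -> str:
    """Pick a model for a task: fast by default, reasoning only when it pays off."""
    # Response speed wins over reasoning quality when the caller says so.
    if latency_sensitive or task_type in FAST_TASKS:
        return FAST_MODEL
    if task_type in REASONING_TASKS:
        return REASONING_MODEL
    # Unknown task types go to the cheaper model until evals say otherwise.
    return FAST_MODEL
```

Defaulting unknown task types to the fast model keeps costs predictable. Promote a category to the reasoning set only after your evals show a measurable gain.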

Computer Use Agents

Computer use gives AI agents the ability to interact with a computer interface the same way a human does: take a screenshot, identify what's on screen, click a button, type text, navigate a browser. Claude supports this through a dedicated computer use tool in its API.

This opens up automation for any GUI-based task — workflows that previously required a human because there was no API. Filling out web forms, navigating legacy internal tools, automating desktop applications, testing software by actually clicking through the UI.

The practical engineering challenge: computer use is slower and more expensive than API-based tool calls. Design systems that prefer APIs when they exist and fall back to computer use only for interfaces with no API. Computer use agents also need careful sandboxing: run them in an isolated VM or container, not in a browser session that is logged in to your own accounts.
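The "API first, computer use as fallback" rule can be written as a small decision function. This is only a sketch: the `Integration` dataclass, its fields, and `choose_channel` are hypothetical names introduced for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Integration:
    """How a target system can be reached. All fields are illustrative assumptions."""
    name: str
    api_call: Optional[Callable[[dict], dict]] = None  # None when no API exists
    gui_sandboxed: bool = False  # True only if an isolated GUI session is available


def choose_channel(integration: Integration) -> str:
    """Return 'api', 'computer_use', or 'unavailable' for a target system."""
    if integration.api_call is not None:
        return "api"  # cheaper and faster than driving a GUI
    if integration.gui_sandboxed:
        return "computer_use"  # fall back only inside a sandbox
    return "unavailable"  # refuse rather than run an unsandboxed GUI agent
```

Returning "unavailable" instead of silently driving an unsandboxed GUI makes the safety requirement explicit in code rather than leaving it to convention.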

Multimodal AI

Claude handles images natively — you can pass screenshots, diagrams, photos, and documents directly as input. This changes what "document processing" means: instead of extracting text from a PDF and losing all formatting, you can send the PDF page as an image and ask Claude to interpret tables, charts, and layout.

Practical applications you can build today: invoice processing (pass the invoice image, extract structured data), diagram analysis (explain this architecture diagram), accessibility tooling (describe this image for a screen reader), code screenshot debugging (why does this UI look wrong?).
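As a rough illustration of the invoice case, the helper below builds a user message that pairs an image with an instruction. It follows the base64 image content-block shape used by the Anthropic Messages API as I understand it; `build_image_message` is a hypothetical helper name, and the exact field names should be checked against the current API reference.

```python
import base64


def build_image_message(image_bytes: bytes, media_type: str, instruction: str) -> dict:
    """Build a user message containing one base64-encoded image and a text instruction."""
    encoded = base64.standard_b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,  # e.g. "image/png" or "image/jpeg"
                    "data": encoded,
                },
            },
            {"type": "text", "text": instruction},
        ],
    }
```

The returned dict can be placed in the `messages` list of a `client.messages.create(...)` call, with an instruction such as "Extract the invoice number, date, and total as JSON."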

Audio and video models are emerging but less production-ready as of 2025. Watch this space — the combination of vision, audio, and language in a single model will change interface design substantially.

Long-Context Models

Context windows have grown from about 2K tokens in the original GPT-3 to 1M+ tokens in some 2025 models. This changes the calculus for when to use RAG versus when to just load everything into context.

RAG is still necessary when: your document corpus is larger than even a 1M context window, you need fresh retrieval for high-precision answers, or cost per call matters at scale (filling a 1M context window costs significantly more than a targeted retrieval).

But for tasks where your knowledge base fits in a large context window and you want the model to reason across the whole corpus simultaneously — code understanding, comprehensive document analysis — long-context directly is now a viable option. It simplifies the architecture significantly.
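The cost side of this tradeoff is simple arithmetic. The sketch below compares daily input-token cost for the two approaches. The function names and every number passed in are assumptions for illustration. It also ignores output tokens and any prompt-caching discounts.

```python
from typing import Optional


def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Input-token cost of a single call."""
    return tokens / 1_000_000 * price_per_million_usd


def compare_strategies(
    corpus_tokens: int,
    retrieved_tokens: int,
    context_window: int,
    price_per_million_usd: float,
    calls_per_day: int,
) -> dict:
    """Compare daily input cost of loading the whole corpus vs. targeted retrieval."""
    fits = corpus_tokens <= context_window
    long_context: Optional[float] = None
    if fits:
        long_context = input_cost_usd(corpus_tokens, price_per_million_usd) * calls_per_day
    rag = input_cost_usd(retrieved_tokens, price_per_million_usd) * calls_per_day
    return {
        "fits_in_context": fits,
        "long_context_daily_usd": long_context,  # None when the corpus does not fit
        "rag_daily_usd": rag,
    }
```

With a hypothetical price of $3 per million input tokens and 1,000 calls per day, an 800K-token corpus loaded in full costs roughly $2,400 per day in input tokens, while retrieving 4K tokens per call costs roughly $12.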

MCP Ecosystem Growth

When you learned MCP in Module 7, it was less than a year old. By mid-2025, it had become a widely adopted protocol for AI tool integration. Most major IDEs and code editors support it. Database providers ship official MCP servers. Cloud platforms publish MCP servers for their services. Developer tools of all kinds are building MCP integrations.

This means two things for you:

As a consumer: there are now MCP servers for most tools you use. Before building a custom integration, check if an MCP server already exists.
As a producer: building MCP servers is a high-value skill. Teams need engineers who can expose internal systems through MCP for use by AI assistants. The pattern you learned in Module 7 is directly applicable.

Your Personal Learning System

The hardest part of staying current in AI is separating signal from noise. Every week brings new model releases, papers, and tools. Most are incremental. A few matter a lot. Here's a reading stack calibrated for signal-to-noise:

| Resource | Format | Cadence | What you get |
|---|---|---|---|
| Anthropic Blog | Blog | When released | Model capabilities, safety research, API updates |
| Simon Willison's Blog | Blog | Daily | Hands-on exploration of new tools and techniques |
| Latent Space Podcast | Podcast | Weekly | Deep technical interviews with AI researchers and practitioners |
| AI Explained | YouTube | Weekly | Clear explanations of recent papers and capabilities |
| Hugging Face Papers | Papers | Daily | Community-curated important AI research papers |
| r/LocalLLaMA | Community | Daily | Practitioners running models locally, finding real-world capabilities and limits |

A sustainable reading practice: 20 minutes per day. Spend it on Simon Willison and r/LocalLLaMA. When something seems important, read the Anthropic blog post or paper. Listen to one Latent Space episode per week during a commute or workout.

The trap to avoid: reading about AI instead of building with AI. An hour of reading is valuable, but ten hours of reading and zero hours of building means you're accumulating knowledge you can't apply. Spend at least as much time building as you spend reading.

Evaluating New Model Releases

New models are released every few months, always described as "state of the art." How do you decide whether to migrate your production system?

A four-step protocol:

Step 1: Check the benchmarks they don't show. Every model release highlights the benchmarks where it excels. Find the ones they don't lead with. If the model is released with math and coding benchmarks, look for its scores on instruction following and factual accuracy.

Step 2: Test on YOUR tasks. Run the new model through your eval harness from Module 9. Your golden dataset of 10–20 representative inputs tells you more about performance on your specific use case than any public benchmark. Benchmark tasks often don't match production tasks.

Step 3: Check context window and pricing. A model with twice the capability but three times the cost may not be a win at your volume. Calculate the cost at your expected call volume. Check whether your use cases fit the context window.

Step 4: Wait 2–4 weeks. The community will find failure modes the benchmark designers didn't test. Practitioners using models at scale in real applications surface issues that controlled evaluations miss. Check r/LocalLLaMA, Hacker News AI threads, and technical forums 2–4 weeks after release before committing to a migration.
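Steps 2 and 3 reduce to two numbers per model: a pass rate on your golden dataset and a cost at your call volume. The sketch below combines them into a go/no-go check. `ModelReport`, `should_migrate`, and the default thresholds are hypothetical and should be tuned to your own tolerance.

```python
from dataclasses import dataclass


@dataclass
class ModelReport:
    """Eval and cost summary for one model. You supply every number."""
    name: str
    passed: int               # golden cases passed
    total: int                # golden cases run
    cost_per_call_usd: float  # measured or estimated average cost per call

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total

    def monthly_cost(self, calls_per_month: int) -> float:
        return self.cost_per_call_usd * calls_per_month


def should_migrate(
    current: ModelReport,
    candidate: ModelReport,
    calls_per_month: int,
    min_gain: float = 0.05,        # assumed: require at least +5 points of pass rate
    max_extra_usd: float = 500.0,  # assumed: monthly budget headroom
) -> bool:
    """Migrate only if quality improves enough and the extra cost fits the budget."""
    gain = candidate.pass_rate - current.pass_rate
    extra = candidate.monthly_cost(calls_per_month) - current.monthly_cost(calls_per_month)
    return gain >= min_gain and extra <= max_extra_usd
```

The same candidate can pass this check at 100,000 calls per month and fail it at 1,000,000, which is why the cost calculation has to use your own volume.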

⚠️

Every Release Is 'State of the Art'

Model releases are marketing events as well as technical milestones. The benchmarks presented are curated to show best performance on tasks the model was optimized for. On your specific use case, the new model may perform better, worse, or the same as the previous one. Test before you switch.

Career Paths in AI Engineering

AI engineering is broad enough that "AI engineer" covers very different roles. Here's an honest map of the main paths and what this course prepares you for:

AI Engineer — builds products and features using LLMs as a component. Designs RAG pipelines, builds agents, integrates models into applications, evaluates quality, maintains production systems. This is the role this course most directly prepares you for. Key skills: prompt engineering, RAG, tool use, evals, API integration, basic MLOps. Demand is high and growing. No PhD required.

ML Engineer — trains, fine-tunes, and optimizes models. Understands loss functions, architectures, distributed training, model compression, and inference optimization. More mathematical foundation required. This course gives you context for working with ML engineers but doesn't train you to be one. Path: requires linear algebra, calculus, and ML fundamentals beyond this curriculum.

AI Safety Researcher — works on alignment, interpretability, and robustness. Understanding what models actually do internally, how to make them reliably safe, and how to evaluate safety properties. Almost entirely research-oriented. Path: typically requires an ML research background and/or a graduate degree. This course gives you intuition for safety considerations in product contexts.

AI Product Manager — defines what AI features to build and why, translates between business requirements and technical capability, evaluates model quality from a product perspective. Requires no code but benefits from technical literacy. This course gives an AI PM more technical depth than most non-engineers have.

The role with the highest opportunity-to-supply ratio right now is AI Engineer. Companies need engineers who can ship working AI features — not PhDs, not model trainers, but engineers who understand the API surface deeply and can build reliable production systems. That's what this course built.

Build Your AI Sandbox

The best investment you can make in the next week is building a personal AI experimentation repository. A place where you run experiments, track what you learn, and accumulate golden eval datasets that compound in value over time.

Here's a directory structure that works well:

ai-sandbox/
├── experiments/
│   ├── 2025-05-prompting/       # Prompt engineering explorations
│   ├── 2025-05-rag-chunking/    # Testing chunking strategies
│   └── 2025-05-mcp-weather/     # MCP server experiments
├── evals/
│   ├── golden_qa.jsonl          # Your accumulated golden test cases
│   ├── sentiment_cases.py       # Reusable eval cases by domain
│   └── run_evals.py             # Single script to run all evals
├── tools/
│   ├── model_compare.py         # Compare two models side by side
│   ├── cost_calculator.py       # Estimate costs before production
│   └── prompt_lab.py            # Interactive prompt testing
└── notes/
    └── model_observations.md    # Notes on what you've found
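One way to create this skeleton is a short script. This is a sketch under assumptions: the `scaffold` function is a hypothetical helper, and it creates only the top-level directories and a few empty placeholder files from the layout above, leaving the dated experiment folders for you to add.

```python
from pathlib import Path

# Directories and placeholder files mirror the layout above; adjust freely.
DIRECTORIES = ["experiments", "evals", "tools", "notes"]
PLACEHOLDER_FILES = [
    "evals/golden_qa.jsonl",
    "evals/run_evals.py",
    "tools/model_compare.py",
    "notes/model_observations.md",
]


def scaffold(root: Path) -> list[Path]:
    """Create the sandbox skeleton under root and return the files that were created."""
    created = []
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for relative in PLACEHOLDER_FILES:
        path = root / relative
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold(Path("ai-sandbox"))` from the directory where you keep projects is enough to get started. Running it again is safe because existing files are left untouched.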

Here's a model comparison script to add to your sandbox immediately:

# tools/model_compare.py
"""
Compare two Claude models side by side on your test cases.
Run: python tools/model_compare.py
Requires the anthropic package and an ANTHROPIC_API_KEY environment variable.
"""
import time

import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

TEST_CASES = [
    "Explain the difference between concurrency and parallelism in Python.",
    "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes.",
    "What are the tradeoffs between PostgreSQL and SQLite for a small web app?",
    # Add your own test cases here
]

# Model IDs change over time; check the current model list before running.
MODELS = [
    "claude-haiku-4-5",
    "claude-sonnet-4-5",
]


def compare_models(prompt: str, models: list[str]) -> dict:
    """Send the same prompt to each model and record output, token usage, and latency."""
    results = {}
    for model in models:
        # perf_counter is a monotonic clock, better suited to timing than time.time()
        start = time.perf_counter()
        response = client.messages.create(
            model=model,
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        latency_ms = (time.perf_counter() - start) * 1000
        # Join all text blocks rather than assuming the first block is text
        output = "".join(
            block.text for block in response.content if block.type == "text"
        )
        results[model] = {
            "output": output,
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
            "latency_ms": round(latency_ms, 1),
        }
    return results


def main():
    print("Model Comparison Report")
    print("=" * 60)

    for i, test_case in enumerate(TEST_CASES, 1):
        # Only add an ellipsis when the prompt was actually shortened
        title = test_case if len(test_case) <= 60 else test_case[:60] + "..."
        print(f"\nTest Case {i}: {title}")
        print("-" * 60)

        results = compare_models(test_case, MODELS)

        for model, result in results.items():
            print(f"\n[{model}] ({result['latency_ms']}ms, {result['output_tokens']} output tokens)")
            # Show a preview so long answers don't flood the terminal
            print(result["output"][:400])
            if len(result["output"]) > 400:
                print("... [truncated]")

        print("\n" + "=" * 60)


if __name__ == "__main__":
    main()

Run this on new model releases. Add your golden eval datasets as TEST_CASES. Over months, this script becomes your personal benchmark — tuned to your actual tasks, more relevant than any public leaderboard.
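To feed a golden dataset into the script instead of hard-coding prompts, a small loader is enough. This sketch assumes `evals/golden_qa.jsonl` holds one JSON object per line with a `question` field. Both the field name and the `load_prompts` helper are assumptions, so adapt them to however your cases are stored.

```python
import json
from pathlib import Path


def load_prompts(path: Path, field: str = "question") -> list[str]:
    """Read one JSON object per line and return the chosen text field from each."""
    prompts = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, 1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if field not in record:
                raise KeyError(f"line {line_number} of {path} has no '{field}' field")
            prompts.append(record[field])
    return prompts
```

In the comparison script, `TEST_CASES = load_prompts(Path("evals/golden_qa.jsonl"))` would then replace the hard-coded list.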

Congratulations

You started this course knowing AI exists. You're finishing it knowing how to build AI systems that work.

That's not a small thing. Building a RAG pipeline that actually retrieves the right information requires understanding embeddings, chunking strategy, similarity search, and reranking. Building a multi-agent system that doesn't loop forever requires understanding agent design patterns, failure modes, and eval harnesses. Building an MCP server that's worth using requires understanding the protocol, the three primitives, and how to write tool descriptions that make models reliable collaborators.

You've done all of this. You have the code. You have the patterns. You have the eval framework to verify your systems are working.

There is one thing left: ship something this week. Not next month when it's perfect. This week. Take one piece of what you've learned — a tool use agent, an MCP server, a RAG pipeline over your own documents — and deploy it somewhere real where someone (even just you) will actually use it.

✅

Go Ship Something

Completing this course means you can design RAG systems, build MCP servers, create multi-agent pipelines, and evaluate AI systems rigorously. These skills are in high demand and still relatively rare among working engineers. The capstone project you just built is evidence of that capability. Go ship something this week — the gap between knowing and doing is closed by shipping, not by more reading.

The field will keep moving. New models will release. New patterns will emerge. New tools will appear. But the fundamentals you've built — understanding tokens and context, designing retrievals, writing robust tools, evaluating rigorously, shipping safely — those don't expire. They compound.

The best time to start was when you signed up for this course. The second best time is now.

🧪

Knowledge Check

Q1. What is a 'reasoning model' like Claude with extended thinking?

Q2. What does 'computer use' mean for AI agents?

Q3. Before switching to a newly released 'state of the art' model, what should you do?
