Why Matt Pocock Is Right About Making Codebases AI Agents Love
05 Mar 2026
Matt Pocock is once again on the money in his video “How To Make Codebases AI Agents Love”.
Here's why I agree with him.
Read More →
Read More →

AI coding agents can generate code faster than you can reason about it.
There is one pattern that works for both new and legacy codebases because you can adopt it as a non-breaking change.

Every function has exactly two inputs: data (args) and capabilities (deps).
Without a clear constraint, generated code becomes the wild west: you can't reliably reason about what a function depends on, what it does, or how to compose it.
fn(args, deps) is that constraint.
For existing code, start with:
functionName(args, deps = defaultDeps)
Dependencies are visible, not implicit.
There’s no framework and no package to install.
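As a minimal sketch of the pattern (the function and type names here are mine, not from the video):

```typescript
// Capabilities the function needs, passed explicitly as `deps`.
type Deps = {
  now: () => Date;
  log: (msg: string) => void;
};

// Production wiring lives in one place and can be overridden in tests.
const defaultDeps: Deps = {
  now: () => new Date(),
  log: (msg) => console.log(msg),
};

// Every function takes exactly two inputs: data (args) and capabilities (deps).
function greet(args: { name: string }, deps: Deps = defaultDeps): string {
  deps.log(`greeting ${args.name}`);
  return `Hello ${args.name}, it is ${deps.now().toISOString()}`;
}
```

Production callers pass nothing extra; tests pass fake deps and assert on them. No mocking framework required.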
Read More →

You have an Awaitly workflow: a few steps, some dependencies, typed results. It works. When someone asks "what does this do?" or you need to debug a run, you're left tracing through code.
What if you could see the same workflow as a diagram? awaitly-visualizer plugs into your workflow's events and turns them into that picture. For a checkout that runs fetchCart, validateCart, processPayment, then completeOrder, you get output like this:
┌── checkout ──────┐
│ ✓ fetchCart      │
│ ✓ validateCart   │
│ ✓ processPayment │
│ ✓ completeOrder  │
│ Completed        │
└──────────────────┘
Same idea as Mermaid flowcharts: steps, order, success and failure. This post walks through adding it step by step. All of the code below lives in a test in the repo so you can run it yourself.
Read More →

As of today, Opus 4.5 is the best coding model I've used. That is not praise by vibes; it comes after building libraries and utilities that fixed problems I could not solve with the tools I was using before.
The progress is impressive.
However, it’s not all sunshine and rainbows, as people on social media and YouTube claim.


We've all written this code:
const lambdaHandler = async (event) => {
  try {
    const { taskId } = event;
    const db = await connectToDb();
    const result = await processTask({ taskId }, { db });
    return { statusCode: 200, body: { message: 'Success', task: result } };
  } catch (error) {
    return { statusCode: 500, body: { message: 'Error' } };
  }
};
That catch (error) swallows everything. Was it a "task not found"? A database connection failure? A permissions issue? Who knows.
Throwing exceptions for expected failures is like using GOTO. You lose the thread.
Awaitly fixes this by treating errors as data, not explosions. This guide teaches the patterns one concept at a time.
Read More →

The OneUptime team is spot on in their Instrument Message Queues with OpenTelemetry post.
Inject trace context on the producer, extract on the consumer; use PRODUCER and CONSUMER span kinds; set semantic conventions (messaging.system, messaging.destination.name, messaging.operation, Kafka partition/offset/consumer group).
They show the raw OpenTelemetry code. It's comprehensive. It's also verbose. Every team ends up re-implementing the same patterns: inject, extract, span kinds, semantic attributes, error handling.
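The inject/extract dance itself is small. Here's a dependency-free sketch of the idea using W3C-style traceparent headers; the real OpenTelemetry API wraps this in propagation.inject and propagation.extract:

```typescript
type TraceContext = { traceId: string; spanId: string };

// Producer side: inject the current trace context into message headers
// before publishing, so the trace survives the queue hop.
function inject(ctx: TraceContext, headers: Record<string, string>): void {
  headers["traceparent"] = `00-${ctx.traceId}-${ctx.spanId}-01`;
}

// Consumer side: extract the context from headers so the CONSUMER span
// joins the same trace as the PRODUCER span.
function extract(headers: Record<string, string>): TraceContext | undefined {
  const value = headers["traceparent"];
  if (!value) return undefined;
  const [, traceId, spanId] = value.split("-");
  return { traceId, spanId };
}
```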
We've all been there: copying "best practice" code from blog posts and adapting it for our broker.
Their key insight:
For batch processing, use a batch span with links or child spans to contributing traces.
But there's still a gap...
Read More →

The Signadot team is spot on in their Testing Event-Driven Architectures with OpenTelemetry post.
Message isolation using a shared queue: propagate tenant ID in Kafka message headers; consumers use tenant ID for selective message consumption.
They make the case that infrastructure duplication is expensive. Instead of separate Kafka clusters per environment, use tenant ID filtering on a shared queue. Instrument producers and consumers for context propagation.
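A consumer-side filter on a shared topic might look like this (a sketch; the header name is illustrative, not Signadot's):

```typescript
type Message = { headers: Record<string, string>; payload: string };

// Each sandbox environment consumes from the shared topic but only
// processes messages stamped with its own tenant ID.
function shouldConsume(message: Message, myTenantId: string): boolean {
  const tenant = message.headers["x-tenant-id"];
  // Unstamped messages fall back to the baseline environment.
  return tenant === myTenantId || (tenant === undefined && myTenantId === "baseline");
}
```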
We've all been there: maintaining four "identical" Kafka setups that slowly drift apart.
Their key insight:
Requires modifying consumers and using OpenTelemetry for context propagation.
But there's still a gap...
Read More →

The CNCF team is spot on in their Testing Asynchronous Workflows using OpenTelemetry and Istio post.
Request-level isolation is the most cost-effective approach.
They make the case against duplicating infrastructure for testing. Instead of spinning up separate Kafka clusters per tenant, use OpenTelemetry Baggage to propagate tenant ID through async flows. Consumers filter by tenant ID. Istio handles routing.
We've all been there: every team has their own "staging Kafka" and costs balloon.
Their key insight:
Use OpenTelemetry Baggage to propagate tenant ID through sync and async. When publishing to Kafka, producers inject trace context (including baggage) into message headers; consumers extract and make routing decisions.
But there's still a gap...
Read More →

The OSO team is spot on in their End-to-End Tracing in Event Driven Architectures post.
Traces break at queues unless you extract the context from message headers and restore it into the active context.
They walk through the real pain: stateful processing loses trace context in caches, Kafka Connect can only do batch-level tracing, and every team ends up writing custom interceptors and state store wrappers.
We've all been there.
Their key insight:
In Kafka Streams and Kafka Connect this often means manual work: interceptors, state stores, batch spans, or extending tracing logic to extract from headers.
But there's still a gap...
Read More →

Boris is spot on in his Logging Sucks post:
logs are optimised for writing, not querying
He explains why debugging in production feels like archaeology.
You grep for user-123, find it logged 47 different ways, then spend an hour correlating timestamps across services.
We've all been there.
His wide event example nails it:
{
"user": {"id": "user_456", "subscription": "premium", "lifetime_value_cents": 284700},
"cart": {"item_count": 3, "total_cents": 15999, "coupon_applied": "SAVE20"},
"payment": {"method": "card", "provider": "stripe", "latency_ms": 1089},
"error": {"type": "PaymentError", "code": "card_declined", "stripe_decline_code": "insufficient_funds"}
}
One event. High-cardinality keys (user.id, traceId). High dimensionality. Queryable.
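One way to get there is to accumulate context onto a single event object during the request and emit it exactly once at the end (a sketch of the technique, not Boris's code):

```typescript
type WideEvent = Record<string, unknown>;

// Accumulate everything you learn about the request onto one object...
function buildCheckoutEvent(): WideEvent {
  const event: WideEvent = { traceId: "trace_abc123" };
  event["user"] = { id: "user_456", subscription: "premium" };
  event["cart"] = { item_count: 3, total_cents: 15999 };
  event["error"] = { type: "PaymentError", code: "card_declined" };
  return event;
}

// ...then emit exactly one structured, queryable line per request,
// instead of 47 scattered log statements.
function emit(event: WideEvent): string {
  return JSON.stringify(event);
}
```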
But there’s still a gap…
Read More →

Reranking improves search relevance by reordering documents based on their relevance to a query. Unlike embedding-based similarity search, reranking models are specifically trained to understand the relationship between queries and documents, often producing more accurate relevance scores.
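In code, reranking is a reorder step between retrieval and generation. The scoring function below is a naive stand-in for a real cross-encoder model, just to show the shape:

```typescript
type Doc = { id: string; text: string };

// Stand-in relevance scorer: counts query terms present in the document.
// A real reranker would call a trained cross-encoder model here.
function score(query: string, doc: Doc): number {
  const terms = query.toLowerCase().split(/\s+/);
  const text = doc.text.toLowerCase();
  return terms.filter((t) => text.includes(t)).length;
}

// Rerank: score every retrieved candidate against the query, sort best-first.
function rerank(query: string, docs: Doc[]): Doc[] {
  return [...docs].sort((a, b) => score(query, b) - score(query, a));
}
```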

I wrote previously about why AI-generated code needs rules, not rituals.
AI coding agents aren't going anywhere. They're excellent at exploring ideas, generating boilerplate, and moving fast. But speed without reliability just ships bugs faster. And without constraints, AI-generated code is unreliable by default.
Prompting is a ritual. Linting is a rule.
Here is the proof.

Coding agents are here to stay, and I know I’m absolutely right about that. While we're all getting used to workflows using AI-powered coding agents, we now live in the world of dark arts and rituals.
We spend hours tweaking prompts, creating elaborate Claude.md, Agents.md and other files formatted in a particular way and stored in a particular way, essentially performing black magic in the hope that our LLM agent adheres to our team's best practices and coding patterns.
On a good day, it works. On others, the behaviour is random.
Here's the problem: Random is not good enough for production.
We're trying to force a non-deterministic, generative tool to be a deterministic rule-follower. This is the wrong approach.
Instead, we should let AI do what it does best: be creative and generative, helping us achieve and realise our desired outcomes while following instructions and examples for how, what, and where it should generate.
My advice? Stop relying on hope-driven prompting. Start using linters to guarantee your standards.
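The difference in practice: a lint rule runs on every commit, regardless of what the model felt like doing that day. A minimal flat-config sketch (ESLint 9+; the specific rules here are illustrative, not a recommendation):

```typescript
// eslint.config.js — rules the agent cannot "forget", unlike a prompt
export default [
  {
    files: ["src/**/*.ts"],
    rules: {
      // No stray debug logging in generated code.
      "no-console": "error",
      // Steer generation away from banned dependencies.
      "no-restricted-imports": [
        "error",
        { paths: [{ name: "lodash", message: "Use native array methods." }] },
      ],
    },
  },
];
```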

For most of us, AI still feels like a black box. We send it a prompt and we get back a blob of text. Maybe we write some code to call a tool; maybe we juggle a few callbacks. We tell ourselves that this is just how things work: a model can only generate tokens, and tools can only run in our code.
But what if this mental model is the problem?
In this post I want to argue that the Agent pattern in the AI SDK is as revolutionary for AI development as useState and useEffect were for React. Just like React's client/server directives annotate where code runs across the network, the Agent API annotates where logic runs across the AI/model boundary.

Having recently built an AI Guardrails library for the AI SDK, I wanted to share what I learned along the way. This post will walk you through how you can write your own middleware, and why it's such a game-changer for building robust AI applications.
Design AI features that are safer, faster, and easier to evolve by layering language model middleware. This guide explains how to use AI SDK middleware to transform inputs, post-process outputs, enforce safety rules, cache results, observe performance, and handle streaming using a clean, composable approach aligned with official guidance.
Read More →

This is the second post in a series of posts about building reliable AI agents.
When building AI agents, where do your prompts live? If they're hidden inside frameworks or scattered across configuration files, you're missing a fundamental principle of maintainable AI systems: treating prompts as first-class code citizens.
Read More →
⚽️ Dropped from the England squad in 2022 by Sarina Wiegman, the manager then as now, amid concerns over her attitude and conduct, she remained outside the national setup for almost two years.
⚽️ Born with strabismus and challenged by depth‑perception issues, she was told by doctors she should not play football.
⚽️ Rebuilt her career at Chelsea, playing an integral role in winning the treble last season.
⚽️ Chosen as England's first-choice goalkeeper ahead of the great and brilliant Mary Earps.
⚽️ Delivered under pressure in the quarter‑final versus Sweden, even with a bloody nose. Named Player of the Match.
Tonight, she starts in a European final for England.
Even when things don’t go your way, never give up.
Read More →

This is the first post in a series of posts about building reliable AI agents.
Unlock reliable, testable AI agents by treating your LLM as a parser, not an executor. Learn how converting natural language into structured tool calls leads to predictable, scalable systems.
Read More →

After building production AI systems over the past few years, thanks to HumanLayer, I’ve learned that most agent failures aren’t about the LLM, they’re about architecture.
That’s why I’m creating a series of posts sharing the 12-Factor Agents methodology using Mastra.
In each part, I’ll break down one principle that transforms fragile prototypes into robust, production-ready AI agents.
Read More →

Last month, I had the fantastic opportunity to run a hands-on Model Context Protocol (MCP) workshop. It was a chance to mentor and coach colleagues, from engineers and testers to product owners, as we explored what MCP is, how it works, and why it is generating so much momentum across AI and developer communities.
My goal was not just to share knowledge. It was to guide the learning journey and build a shared, foundational understanding of this powerful emerging standard.
We covered everything from core protocol basics and prompt crafting to real world patterns such as:
Designing distributed systems has never been more challenging. As teams embrace microservices and event‑driven architectures, a persistent myth has arisen: commands equal orchestration, and events equal choreography. This tidy equivalence often becomes a mental shortcut but conceals the deeper truth of control‑flow patterns versus messaging semantics. As many practitioners have observed, collapsing these separate dimensions can restrict your system's flexibility and resilience.

In this article, we'll demystify these concepts and show how to apply them independently. You'll discover how separating semantics (commands vs. events) from control flow (orchestration vs. choreography) grants you greater architectural freedom and clearer, more maintainable workflows.
Read More →

Last month, our company hackathon became a vivid illustration of a broader shift in how software gets built. Across teams, from sales to support, product to engineering, domain experts huddled around laptops, experimenting with AI powered platforms such as V0, Lovable and Bolt. Within hours, they'd fashioned interactive prototypes complete with navigation flows and validation rules, all without writing a single line of traditional code.
Although these early demos relied on mock data, the fact that non-engineers could conjure usable software unaided was striking.
As a full-stack engineer accustomed to crafting CRUD apps from the ground up, I found myself asking a new question: How might we empower these citizen developers to build more often, more securely, and with live data? Their deep problem domain knowledge meant they moved swiftly, iterated boldly and learned faster than any handoff-laden process could permit.
An isometric illustration of a futuristic highway under construction, where software engineers in hard hats are laying down glowing code-shaped road segments.
Over my two decades in software, I've discovered my highest leverage isn't in writing every screen or endpoint myself, but in architecting robust APIs, infrastructure and tooling so that others can deliver user value. In this two-part series, I'll share how engineers can transition from gatekeeping code to enabling creation at scale. In this first instalment, we'll explore the mindset shifts and guiding principles. Part 2 will dive into concrete patterns and architectural strategies for secure, sustainable enablement.
Read More →

Two users withdraw money from the same account at exactly the same moment. Your system processes both requests, your database balance goes negative, and you wake up to an incident report. Sound familiar?
What if you could combine Redis's millisecond response times with PostgreSQL's bulletproof consistency? This dual-layer locking pattern does exactly that, giving you both speed and safety.

In this post, we'll explore a pattern that combines Redis's speed with PostgreSQL's reliability to prevent race conditions at scale.
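Here's the shape of the dual-layer idea with in-memory stand-ins: a fast lock (Redis in production, via SET NX PX) rejects concurrent requests early, while the database layer (SELECT ... FOR UPDATE in production) remains the final authority on the balance. The class and function names are mine:

```typescript
// Layer 1: fast lock. In production this is a Redis SET NX with a TTL;
// here an in-memory Set stands in for it.
class FastLock {
  private locks = new Set<string>();
  acquire(key: string): boolean {
    if (this.locks.has(key)) return false; // another request holds it
    this.locks.add(key);
    return true;
  }
  release(key: string): void {
    this.locks.delete(key);
  }
}

function withdraw(
  accountId: string,
  amount: number,
  lock: FastLock,
  balances: Map<string, number>,
): { ok: boolean; balance: number } {
  if (!lock.acquire(accountId)) {
    return { ok: false, balance: balances.get(accountId) ?? 0 };
  }
  try {
    // Layer 2: in production this check-and-update runs inside a DB
    // transaction holding a row lock, so it stays safe even if the
    // fast lock fails open.
    const balance = balances.get(accountId) ?? 0;
    if (balance < amount) return { ok: false, balance };
    balances.set(accountId, balance - amount);
    return { ok: true, balance: balance - amount };
  } finally {
    lock.release(accountId);
  }
}
```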
Read More →

In my career, I have realised that the best companies celebrate each success in a genuine, informal way. They avoid contrived rituals that can feel like an artificial façade.
In these organisations, individuals willingly help colleagues reach shared goals.
Take Janet from my local Parkrun, for example.

Although she does not compete, she chooses to find the time to cheer on each participant to achieve their goals... even on cold Saturday mornings.
Read More →

MCP standardises how AI applications communicate with external systems using JSON‑RPC 2.0 over stateful, bidirectional connections. Unlike standard JSON‑RPC, which treats every request independently, MCP augments the protocol by embedding session tokens and context IDs into each message.
This enhancement provides state, enabling advanced, multi‑step interactions and dynamic module loading at runtime. Such a stateful design is essential for modern, agile AI systems.
Imagine an AI‑powered payments system that dynamically integrates a new fraud detection module during operation. MCP allows the system to load this module on the fly without a full redeployment, provided the server supports dynamic module loading.
In this post, we'll explore the MCP protocol, its core components, and how it compares to traditional REST and GraphQL APIs.
Read More →
AI applications are shifting from monolithic large language models to modular, multi-agent systems—a transformation that enhances performance, flexibility, and maintainability.
In my talk about AI Agents last September, I said AI agents would become increasingly popular. Today, we see this shift happening across industries. By breaking down complex tasks into specialised components, engineers can design smarter, more scalable AI workflows.
Analogy: Think of multi-agent systems like a well-coordinated orchestra. Each musician (agent) has a specific role, and together, they create a harmonious performance. In software, this means dividing complex problems into manageable, specialised parts that work in concert.
In this guide, we'll explore four key multi-agent patterns, using travel booking as an example. You'll learn how to choose the right pattern for your application, implementation strategies, and error-handling techniques to build robust multi-agent AI systems.
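As a tiny illustration of the simplest of these patterns, routing, the agents below are plain functions and the router is a stand-in for an LLM classification step:

```typescript
type Agent = (request: string) => string;

// Specialised agents: each handles one slice of the travel-booking domain.
const agents: Record<string, Agent> = {
  flights: (req) => `Searching flights for: ${req}`,
  hotels: (req) => `Searching hotels for: ${req}`,
};

// Router: in a real system an LLM classifies the request and picks an
// agent; naive keyword matching stands in for it here.
function route(request: string): string {
  const agent = request.toLowerCase().includes("hotel") ? agents.hotels : agents.flights;
  return agent(request);
}
```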
Read More →

Agentic systems, where multiple AI agents collaborate through decision-making and handoffs, shine in specific scenarios but add operational complexity.
In this post, we'll explore the scenarios where agentic systems are most effective and the challenges you may face when using them.
Read More →

Imagine: you run a busy online shop where every moment matters.
Your customers expect the fastest page loads, yet you still require dependable analytics to understand user behaviour and drive informed business decisions.
Relying solely on client-side tracking with tools like PostHog can be fraught with challenges. Many users have ad blockers or privacy extensions that prevent tracking scripts from running, and even when these scripts do execute, they can slow down page rendering and affect conversion rates.
In this post I'll go over how you can use Next.js 15’s next/after API to handle analytics and events without slowing down your site.
Read More →

Getting event names right in event-driven architecture and Domain-Driven Design (DDD) is essential for clarity, consistency, and scalability. A key decision is using singular or plural terms in event names.
Here's how I approach it, with examples and reasoning to help you make the best choice.
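One common convention, which the post weighs in more detail: name each event after a single fact that happened to a single aggregate, so singular nouns in past tense, with plural only when the fact is genuinely about a collection. These type names are illustrative:

```typescript
// Singular aggregate + past-tense verb: one event describes one fact.
type OrderEvent =
  | { type: "OrderPlaced"; orderId: string }
  | { type: "OrderShipped"; orderId: string }
  // Plural only when the fact itself concerns a collection:
  | { type: "OrderItemsBackordered"; orderId: string; skus: string[] };

function describe(event: OrderEvent): string {
  return `${event.type} for order ${event.orderId}`;
}
```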
Read More →
Building a TypeScript library that is both maintainable and optimised for modern bundlers requires careful consideration of exporting functions, types, and other constructs from your modules.
With several strategies available, from wildcard re-exports to namespaced exports, explicitly named re-exports have emerged as the clear winner. This post explores the alternatives with examples based on an order management system.
We'll discuss the benefits of exporting types, how to organise multiple entry routes (such as in /src/orders and /src/users), and review the necessary TypeScript configuration options (such as verbatimModuleSyntax) and package.json tweaks. Additionally, we'll explain how this approach helps prevent circular dependency challenges by enforcing a clear dependency graph.
I love Next.js. It's been my framework of choice since my first tiny contribution in 2017.
Yesterday, I joined a team discussion about which React framework to choose. In 2025, that decision is harder than ever, with so many excellent alternatives available.
While Next.js is recommended by React themselves, and is a very popular choice, the not-so-good parts are often left out on platforms like LinkedIn and YouTube.
I plan to write about the good things about Next.js 15 in the future, but here are some challenges I still face using it today.
Read More →

It's like a new diet pill has hit the market, promising instant weight loss with no effort, and now everyone's scrambling to get their hands on it because it's all over their TikTok.
Some folks are talking about DeepSeek as if it's the second coming of AI.
The clamour might be because it's the first serious open non-US model with reasoning capabilities from China.

Most people are confused about which version of DeepSeek they're using.
Most providers offer a distilled, watered-down version that'll run anywhere, so you're not getting the actual full-fat version people are talking about. To really see its magic, you'd need a small fortune in computing power.
It's like being promised a rare vintage white wine, only to find the bottle filled with slightly grape-scented water. Same brand, same label, but all the depth and character stripped away, leaving you with a hollow imitation.
The hype suggests it can keep pace with anything OpenAI does, which might sound thrilling if you're already tired of whatever ChatGPT or its siblings spit out.
But here's where the alarm bells should start ringing.
Read More →

Building AI products with stakeholders requires a fundamentally different approach than traditional software development.
From my experience working with AI, success depends not only on technical implementation but also on bridging the gap between AI capabilities and stakeholder expectations.
In this post, I’ll share lessons from guiding stakeholders through AI’s possibilities and limitations.
Read More →

Defining clear boundaries is essential to building clean, scalable, and reliable architectures.
In my experience, organisational demands often override the focus on boundaries and domain-driven design. This reflects the tension between following technical best practices and delivering business outcomes quickly. While theoretical approaches are widely discussed at conferences, in books, and in videos, the practical implementation of these ideas is often shaped by cultural dynamics, resource constraints, tight deadlines, and internal politics.
In this post, we'll explore the challenges of maintaining boundaries and potential solutions, using a payment system as an example. This system facilitates transactions between clients and payment providers, such as PayPal or Stripe, highlighting the distribution of responsibilities within its architecture.
Read More →

The goal of creating something "predictable," reliable, and consistent is a shared principle across all the teams I've worked with throughout my career.
Knowing that the same code would always return the same output when given the same inputs was the foundation of everything we built.
We aimed for no surprises, no matter how complex a workflow might be. Whether implicitly or explicitly using finite state machines, this determinism enabled us to build testable, monitorable, maintainable, and, most importantly, predictable workflows.
We read and shared ideas at conferences, promoting patterns and principles like SOLID and DRY to create functional, composable, and extensible software.
Read More →

Having lived through the era of a "new JavaScript framework every week," we now find ourselves in the gold rush of the AI agent framework space.
New frameworks appear daily, each claiming to be the 'ultimate' solution for building AI agents, often backed by YouTubers enthusiastically promoting demoware and usually their own library, framework, or SaaS offering. Unfortunately, this enthusiasm can lead companies to uncritically adopt these tools without considering the long-term implications.

It's a well-known adage that naming things is hard.
In event-driven architectures, a consistent naming convention is essential for scalability, communication, collaboration and maintainability.
At the moment, I'm architecting a new event-driven solution, and I'm planning to adopt a Structured Naming approach for event types.
In this post I'll share my thoughts on this approach and its benefits.
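As a sketch of what Structured Naming can look like, here's a dotted domain.entity.action.version scheme; the exact segments I settle on in the post may differ:

```typescript
// Compose event type names from fixed segments so they sort, filter,
// and route predictably across the whole system.
function eventType(domain: string, entity: string, action: string, version: number): string {
  return `${domain}.${entity}.${action}.v${version}`;
}

// The same structure makes names machine-parseable for routing and metrics.
function parseEventType(name: string): { domain: string; entity: string; action: string; version: string } {
  const [domain, entity, action, version] = name.split(".");
  return { domain, entity, action, version };
}
```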
Read More →

This is my AWS Step Functions Mapping and Patterns cookbook, covering data manipulation, concurrency patterns, error handling, and advanced workflows.
Read More →

As I stood in the rain as a volunteer race marshal at my local park run, it occurred to me that I wanted the same thing when running applications in production, and that's absolutely nothing to happen.
The last thing I wanted to do was be a hero.

🎸 "Always keep on the right side of the path" – the Parkrun version of Monty Python's song.
I'd rather everyone enjoy a safe, smooth race where I can cheer and encourage people on as they pass, reminding them to keep to the right so they don't collide with runners coming the other way.
It's the same in the land of IT. The only thing I want to see when viewing Grafana dashboards is a sea of green, 200 status codes and steady traffic patterns.
Read More →

Saturday's Oasis ticket sales left thousands of fans disheartened after spending hours in virtual queues. As reported by the BBC and echoed by fans on X (formerly Twitter) and Reddit, the experience could be aptly described using a quote from Liam Gallagher himself:
Shite
— Liam Gallagher (@liamgallagher) April 20, 2023
Like many others, I have long been frustrated with platforms like Ticketmaster, See Tickets, and Gigs and Tours. This weekend, their reputations took another hit as fans faced inflated prices and technical glitches.
This made me wonder whether I could design a better, more reliable ticketing system that prioritises fairness and user experience.
Let's not make Sally wait any longer than she has to.
Read More →

I've been fortunate to work with some incredible leaders throughout my career.
These individuals have inspired and challenged their teams and driven them to achieve their full potential.
What truly sets a leader apart, however, is their ability to motivate and their willingness to fight for their team.
There's something special about a leader who fights for their team.
In this post, I'll explain what happens when a leader truly stands up for their team, sharing my firsthand experiences from a transformative project at Cambridge Assessment in 2007.

As an England football fan, I know the feeling all too well. The heartbreak of watching your team come so close, only to fall short at the final hurdle once again. It's now 58 years of hurt.
It's a mixture of emotions – frustration, disappointment, and knowing the team could have done more.
I've experienced a similar feeling in my work as a consultant.
As the English Football League nears the climax of another demanding season, leaders are confronted with the challenge of motivating weary players and handling the intense mental and physical pressures of chasing titles, securing promotions, or avoiding relegation.
As a Barcelona supporter, it might come as a surprise that I'm using Jose Mourinho, a figure often mired in controversy, as my example. He exemplified empathetic leadership during his reign at Inter Milan, leading to their historic treble win.

Inspect and adapt loops are the heart of agile development. They enable continuous learning, improvement, and evolution.

In this post, I discuss the limitations of the "fail fast" approach and propose a more impactful alternative: "Learn Fast". This mindset embraces the inevitability of failure in innovation and transforms every challenge into a learning opportunity.
Read More →

Structured logging focuses on capturing data in a consistent and machine-readable format.
Unlike traditional text-based logs, structured logs are more straightforward to query and analyse, making extracting insights and debugging issues simpler.

In this post, we'll take a look at an example of structured logging with Pino.
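The core idea is independent of Pino: emit one JSON object per line with fixed keys, rather than an interpolated string. A dependency-free sketch (Pino's actual output is similar, with level, time and msg fields):

```typescript
// Structured log entry: fixed, machine-readable keys instead of free text.
// Tools can then filter on `level` or `userId` without regexes.
function logLine(level: string, msg: string, fields: Record<string, unknown>): string {
  return JSON.stringify({ level, time: Date.now(), msg, ...fields });
}
```

Compare `logLine("info", "user logged in", { userId: "user-123" })` with `console.log("user user-123 logged in")`: only the former is reliably queryable.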
Read More →

This post celebrates and thanks Joe Parry as he takes a well-deserved bow and steps down from his role as the head of the JavaScript & NodeJS Cambridge Meetup Group.
He has been an extraordinary leader who has dedicated the past ten years to fostering a vibrant community of Javascriptors in Cambridge.

Companies must be agile and respond quickly to changing customer needs in today's fast-paced and constantly evolving technology landscape. That's why DevOps practices that emphasise collaboration and communication between development and operations teams to deliver software rapidly, reliably, and at scale have become increasingly popular.
Shifting left, a core principle of DevOps, can significantly benefit companies of all sizes. By empowering engineers to take on more operations responsibilities and promoting a culture of experimentation and innovation, companies can improve collaboration, increase reliability, and deliver high-quality software at scale.
In this post, I'll discuss how and why Cambridge University Press adopted a shift left culture.
Pino is a popular and fast Node.js logging library that is designed for high-performance and low-overhead logging. It has many useful features, including support for structured logging, log levels, and log redaction.
Pino logging redaction allows you to easily redact sensitive information from logs, ensuring applications remain secure and compliant with regulations.

In this post, we'll take a closer look at Pino logging redaction functionality, what it is, and how it can be used with examples.
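Pino takes a list of key paths via its redact option; the effect is roughly what this sketch does (top-level and one-level-deep paths only, for illustration):

```typescript
// Replace values at the given paths (e.g. "password", "user.token")
// with a censor string before the object is serialised to a log line.
function redact(
  obj: Record<string, any>,
  paths: string[],
  censor = "[Redacted]",
): Record<string, any> {
  const copy = structuredClone(obj); // don't mutate the caller's object
  for (const path of paths) {
    const [head, tail] = path.split(".");
    if (tail === undefined) {
      if (head in copy) copy[head] = censor;
    } else if (copy[head] && tail in copy[head]) {
      copy[head][tail] = censor;
    }
  }
  return copy;
}
```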
Read More →

Logging is an essential part of any application, providing insight into what's happening behind the scenes. However, as your codebase grows, it can be challenging to keep track of all the different log statements and where they're coming from. This is where child loggers come in.
Child loggers are a feature of many Node.js logging libraries, such as Pino, Bunyan and Winston, that allows you to create a new logger that inherits the configuration of its parent logger.
This means you can create child loggers that are pre-configured with specific options, making it easier to log messages without repeating the same configuration over and over again.
In this blog post, we'll take a look at an example of how child loggers can help cut down on repetition in TypeScript and Pino.
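The mechanism behind child loggers is small: merge the parent's bound fields into every entry the child emits. A sketch (Pino's logger.child() works along these lines):

```typescript
// A logger carries bound fields; child() returns a new logger whose
// bindings extend the parent's, so shared context is set exactly once.
class Logger {
  constructor(private bindings: Record<string, unknown> = {}) {}

  child(extra: Record<string, unknown>): Logger {
    return new Logger({ ...this.bindings, ...extra });
  }

  info(msg: string): string {
    return JSON.stringify({ level: "info", ...this.bindings, msg });
  }
}
```

Typical use: create one root logger per service, then `root.child({ requestId })` per request, so every line in that request carries the ID without repeating it at each call site.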
Read More →