Monday, 15 June 2026

Well-Structured AI Development Practices

How's AI working out for your team?

Not "what's the vision." Actually. In practice. Are you getting consistent, trustworthy output (the kind you'd stake a production release on) or are you mostly getting confident-looking code that half-works and then quietly breaks something at 4pm on a Friday?

There's a gap between teams getting serious results with AI and teams getting occasional lucky ones. The gap isn't which tools they've licensed. It's whether anyone has sat down and thought carefully about how those tools fit together, and made it someone's job to keep that thinking current.

Someone has to own this


Every org needs at least one person (a human person) who goes genuinely deep on AI-native development. Not "I've skimmed the documentation" deep. Deep enough to write guidance that stops every other engineer from reinventing the wheel, to maintain the high-level context files that teach your agents how your organisation actually works, and to have a considered, documented answer ready when a new tool appears and everyone looks around waiting for someone else to evaluate it.

Without this person, you get well-intentioned chaos. Everyone experimenting independently. Nobody sharing what works. The wheel reinvented in every corner of the codebase. Coding agents that behave completely differently across repos because nobody's maintaining the context files and they've drifted into contradiction.

One responsible person, the go-to guy, fixes most of that. It's one of those solutions that's almost embarrassingly simple.

Context is the actual product

Prompt engineering is what you do for a single conversation. Context engineering is the discipline of building the right information environment for every conversation. It's the difference between briefing a contractor well and just handing them a laptop and pointing at the codebase.

Imagine hiring a contractor who's genuinely skilled. Worked with a hundred teams, knows the patterns, no ramp-up required. But they've walked through the door cold. They don't know you're on Python 3.14. They don't know the auth layer is a shared service. They don't know you spent six months migrating away from the ORM they're about to suggest. That contractor will make expensive mistakes in the first week, not because they're bad at their job, but because they don't know your job.

Context files are the briefing.

They're organised in a hierarchy: org-level at the top (your stack, your architecture principles, the things that don't change project to project), project-level below that, and feature or component level below that for focused work. The AI lead owns the top. Everyone else contributes to the levels below.
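Here's a minimal sketch of how that hierarchy might be resolved. It assumes each level keeps its briefing in a file called CONTEXT.md (a hypothetical name; your agent tooling will have its own convention) and that the directory layout mirrors the levels. It walks from the org root down to wherever the agent is working and returns the files most general first, so the narrowest guidance is read last.

```python
from pathlib import Path

# Hypothetical filename; swap in whatever your agent tooling expects.
CONTEXT_FILENAME = "CONTEXT.md"


def collect_context_files(org_root: Path, working_dir: Path) -> list[Path]:
    """Return the context files from org level down to working_dir, most general first."""
    org_root = org_root.resolve()
    working_dir = working_dir.resolve()
    # relative_to raises ValueError if working_dir is not inside org_root.
    relative = working_dir.relative_to(org_root)

    # Build the chain of directories: org root, project, component, ...
    levels = [org_root]
    for part in relative.parts:
        levels.append(levels[-1] / part)

    # Keep only the levels that actually have a context file.
    return [
        level / CONTEXT_FILENAME
        for level in levels
        if (level / CONTEXT_FILENAME).is_file()
    ]
```

A level without a context file is simply skipped, which means a missing briefing fails quietly. Whether that's the behaviour you want is worth a retro question of its own.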

Treat context as code. Version control it. Put it through pull requests. Iterate on it. Run a retro at the end of each sprint and ask: did the coding agents do anything surprising or wrong? If they did, consider whether the answer points to a gap or an error in a context file.

Fix it. Ship the change. Check again next sprint.

A fast workflow for big changes

The default way to work is: open chat window, describe the problem, accept or reject what comes back. That works for small, isolated tasks. Bigger or riskier changes need more structure (see Managing risk at the bottom of this post).

The following idea comes from https://tylerburleigh.com/blog/2026/02/22/. I don't have evidence that it works perfectly; I'm just sharing what I've found. It's a staged approach with review cycles between stages. Errors caught at the planning stage cost almost nothing. Errors caught after they've been faithfully implemented across multiple stages cost quite a lot more. So you front-load the thinking.

Research first. Before any code is written, the coding agent researches the codebase and/or the web, and writes its findings to RESEARCH.md. You read it. You correct any misunderstandings. Only when that's solid do you move on. Start a fresh session. Model performance degrades noticeably once you've used around 40% of its context capacity. Fresh sessions keep it sharp.
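If you want that rule of thumb as a check rather than a feeling, something like the sketch below would do. The 40% default is just the figure from this post, not a measured constant, and it assumes your tooling can tell you how many tokens the session has used and how big the window is.

```python
def should_start_fresh_session(
    tokens_used: int,
    context_window: int,
    threshold: float = 0.40,  # the rule of thumb from this post; tune it yourself
) -> bool:
    """True once the session has consumed at least `threshold` of its context window."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return tokens_used / context_window >= threshold
```

With an illustrative 200,000-token window, 70,000 tokens used comes out under the line and 90,000 comes out over it.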

Then plan. The agent reads RESEARCH.md and produces PLAN.md: a breakdown of the work into stages, plus a checklist. A separate session then reviews the plan from the perspective of a senior engineer. You read that review. Accept or override its recommendations. Revise until it's ready.

This is the highest-leverage point in the entire process. A bad assumption that survives planning gets implemented faithfully, and you'll discover it somewhere around stage 4, trying to explain why the feature is broken in a way that requires unwinding three phases to fix.

Then implement, one phase at a time. Each phase in its own session. Committed when clean. Reviewed before moving on. Use different models for implementation and review. A model that just wrote the code is poorly placed to find problems with it. It'll tell you it's fine.

It's not always fine.

LSP plugins give the coding agent live type information from a language server, the same source your IDE uses, so it catches type errors and missing imports without waiting for the compiler.

Consider Git worktrees if you're running parallel features: separate agents, separate branches, separate directories, no collision, merge as normal.
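A small sketch of the worktree setup, assuming a naming scheme I've made up for illustration: a feature/<name> branch checked out into a sibling directory called <repo>-<name>. The function only builds the git worktree add command; running it is left to you.

```python
from pathlib import Path


def worktree_command(repo: Path, feature: str) -> list[str]:
    """Build a `git worktree add` command giving one feature its own branch and directory."""
    branch = f"feature/{feature}"                    # hypothetical branch naming scheme
    target = repo.parent / f"{repo.name}-{feature}"  # sibling directory next to the repo
    # -C runs git as if started in `repo`; -b creates the new branch in the new worktree.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)]
```

Pass the result to subprocess.run(cmd, check=True) once per feature, point one agent at each directory, and merge the branches as you normally would.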

When it's running well, consider automating the middle


Once this is working reliably you can remove human involvement from the implementation loop. Not from the edges, only from the middle.

The human still approves the plan before anything is built. The human still reviews the PR at the end. What changes is what happens between those two points: an implementer agent executes against PLAN.md, automated gates run (tests, lint, type checks), a reviewer agent evaluates against the plan, and they iterate until clean. No human required in that loop.
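As a rough sketch, the middle of that loop might look like the code below. The implementer, the gates and the reviewer are passed in as plain callables because I'm not assuming any particular agent framework; every name here is hypothetical. The round limit is my own addition, so a loop that never converges gets handed back to a human.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def run_phase(
    implement: Callable[[str], None],   # implementer agent; receives the latest feedback
    gates: list[Callable[[], bool]],    # tests, lint, type checks; each returns True on pass
    review: Callable[[], Review],       # reviewer agent, judging against PLAN.md
    max_rounds: int = 5,                # assumption: stop and escalate after this many tries
) -> bool:
    """Iterate implement -> gates -> review until clean. False means a human should step in."""
    feedback = ""
    for _ in range(max_rounds):
        implement(feedback)
        if not all(gate() for gate in gates):
            feedback = "automated gates failed"
            continue  # no point asking for a review of code that fails the gates
        verdict = review()
        if verdict.approved:
            return True
        feedback = verdict.feedback
    return False
```

The gates run before the reviewer on purpose: they're cheap and deterministic, so there's no reason to spend a review on code that doesn't pass them.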

The PR is self-documenting. RESEARCH.md, PLAN.md, and the commit history are the description. The engineer can see exactly what the agent understood, what it planned, and what it did.

Nothing's a black box, because "the AI did it" (which is as tempting to say as eating a warm, golden, honey-glazed doughnut oozing with rich, velvety Marmite custard) is not an acceptable answer in a code review. Not yet, anyway.

Managing risk


Not every change carries the same risk, and it helps to think about this on two dimensions: likelihood of something going wrong, and impact if it does. For likelihood, ask how complex the change is, how well-tested the affected code paths are, and how reversible the outcome is. For impact, ask how many users or services are affected and whether you can recover quickly if something breaks. A rough mental multiplication of the two tells you how much caution is warranted.
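That mental multiplication, written down. The 1 to 5 scales and the cut-offs are assumptions I've picked for illustration, not anything standard; adjust them to your own appetite for risk.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Multiply a 1-5 likelihood rating by a 1-5 impact rating."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact


def caution_level(score: int) -> str:
    """Map a score to a way of working. The thresholds are illustrative only."""
    if score >= 15:
        return "micromanage: small steps, human in every loop"
    if score >= 6:
        return "staged workflow with review between phases"
    return "chat window, accept or reject"
```

A copy change on an internal admin page might be a 1 for likelihood and a 2 for impact, which lands in the bottom band. A change to a core payment path could easily be 4 and 5, which lands firmly in the top one.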

When I work in core services in financial applications, I micromanage Claude. What do you do?


Wednesday, 20 May 2026

TDD, or should Claude just write everything?

Let's start with a question you've been avoiding: do you actually do TDD, or do you nod along at the conference talk, then go home and write the implementation first like everyone else? Be honest. The tests aren't listening. The uncomfortable truth is that TDD has always demanded a very particular flavour of masochism: sit down, write code that deliberately fails, resist every instinct to just make the thing work, and trust that the scaffolding of failure will eventually produce something beautiful. Most developers respect this in the same way they respect flossing. And now here comes Claude, ready to help. What could go wrong?

Quite a lot, as it happens, and in a rather specific way. Left to its own devices, Claude will skip the red phase entirely. It's too helpful. You ask it to write a test for a feature that doesn't exist yet, and, like an eager intern who's read ahead, it writes the test and the implementation in the same breath, hands them to you simultaneously, and waits for praise. The tests pass, obviously, because Claude just wrote both sides of the conversation. This is the AI equivalent of marking your own homework, then putting a gold star on it. SD Times put it this way: AI doesn't eliminate TDD, it exposes whether you understand it. Which is a polite way of saying it will cheerfully help you do it wrong if you let it.

The good news is that TDD and AI are actually a spectacular match, provided you hold the leash. The cognitive barrier that kills TDD in practice is that writing a test requires defining the interface, which requires understanding the architecture, which requires mentally sketching the implementation anyway; the circle eats itself before you've typed a single assertion. AI dissolves that barrier completely. Let Claude draft the test, you review it, then let Claude implement against it, one cycle at a time, not in bulk.

In other words, use this exact prompt:

"Sit. Good boy. Now, write one failing test for the login function. Just one. Don't implement anything. Don't even think about the implementation. I can see your little cursor twitching. Stop that. Write the test, run it, show me it's red, and then sit back down. If you've written only the test and nothing else, you'll get a biscuit."
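If you'd rather not rely on the biscuit, you can check the red phase yourself. The sketch below is a deliberately simple stand-in for a real test runner: login and the test function are hypothetical, and the only point is the gate, which refuses to call a test "red" unless it has actually been run and seen to fail.

```python
from typing import Callable


def login(username: str, password: str) -> dict:
    """Hypothetical function under test; deliberately not implemented yet."""
    raise NotImplementedError


def check_login_returns_session() -> None:
    """The one failing test: a valid login should return a session for that user."""
    session = login("ada", "correct-horse")
    assert session["user"] == "ada"


def is_red(test: Callable[[], None]) -> bool:
    """Return True if the test fails or errors, which is what the red phase requires."""
    try:
        test()
    except Exception:
        return True
    return False
```

If is_red comes back False on a test that was only just written, either the implementation sneaked in alongside it or the test isn't asserting anything. Note that this sketch can't tell a test that fails for the right reason from one that fails because of a typo, so you still have to read the failure.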

Now, some of you will have noticed a tension and are feeling very pleased with yourselves about it. If we're telling Claude to stop after every single test like an overexcited Labrador, when exactly does the *plan* happen? Surely we want Claude to think before it acts? Yes, probably. The thing is, plan mode and TDD aren't quite solving the same problem (or at least, that's how it seems to me). Plan mode is about *what you're building*. TDD is about whether you actually built it. One is a map. The other is the slightly neurotic habit of checking you're still on the road every five minutes, which feels excessive right up until the moment you aren't.

The temptation is treating a good plan as a green light to let Claude implement the whole thing in one uninterrupted sprint. My suspicion is that this works right up until it doesn't, and when it doesn't, you're several pull requests deep into something that's coherent, confident, and subtly wrong in ways that are deeply tedious to unpick. The plan tells Claude where it's going. TDD, if I'm being honest, is just the thing that keeps it from quietly lying to you about whether it got there. You probably want both: plan mode to decide what you're building, TDD to keep Claude honest while it builds it. But I'll admit I'm still working out exactly where the seams should be, and anyone who tells you they've fully figured this out is either very clever or writing a different blog.

