The Mischievous Nerd's Guide to becoming a Master of Programming: Well-Structured AI Development Practices

We've all tried AI by now.

Some promote a build-as-quickly-as-you-can approach, while others look at me in horror, as if using AI extensively melts your brain (it does).

The question is whether you're getting consistent, trustworthy output (the kind you'd stake a production release on) or confident-looking code that half-works and then breaks something at 4:59pm on a Friday.

There's a gap between teams getting serious results with AI and teams getting occasional lucky ones. The gap isn't which tools they've licensed. It's whether anyone has sat down and thought carefully about how those tools fit together, and made it someone's job to keep that thinking current.

Someone has to own this

Every org needs at least one person (a human person) who goes deep on AI-native development. Not "I've skimmed the documentation" deep, but deep enough to write guidance that stops every other engineer from reinventing the wheel, to maintain the high-level context files that teach the agents how the organisation actually works, and to have a considered, documented answer when a new tool appears and everyone looks around waiting for someone else to evaluate it.

Without this person, we get well-intentioned chaos. Everyone experimenting independently. Nobody sharing what works. The wheel reinvented in every corner of the codebase. Coding agents that behave completely differently across repos because nobody's maintaining the context files and they've drifted into contradiction.

One responsible person, the go-to guy, fixes most of that.

Context is the actual product

Prompt engineering is what you do for a single conversation. Context engineering is the discipline of building the right information environment for every conversation. It's the difference between briefing a contractor well and just handing them a laptop and pointing at the codebase.

Imagine hiring a contractor who's genuinely skilled. Worked with a hundred teams, knows the patterns, no ramp-up required. But they've walked through the door cold. They don't know you're on Python 3.14. They don't know the auth layer is a shared service. They don't know you spent six months migrating away from the ORM they're about to suggest. That contractor will make expensive mistakes in the first week, not because they're bad at their job, but because they don't know your job.

Context files are the briefing.

Organise them in a hierarchy: org-level at the top (your stack, your architecture principles, the things that don't change project to project), project-level below that, and feature or component level below that for focused work. The AI lead owns the top. Everyone else contributes to the levels below.
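Here's a minimal sketch of how that hierarchy could be assembled into a single briefing, assuming each level keeps its guidance in a file called CONTEXT.md. That file name and both function names are made up for illustration. Many coding agents have their own convention for this, so treat it as a picture of the idea rather than how any particular tool does it.

```python
from pathlib import Path

# Hypothetical file name; the post does not prescribe one.
CONTEXT_FILENAME = "CONTEXT.md"


def collect_context(org_root: Path, work_dir: Path) -> list[Path]:
    """Return context files ordered from the most general level to the most specific."""
    org_root = org_root.resolve()
    work_dir = work_dir.resolve()
    if not work_dir.is_relative_to(org_root):
        raise ValueError(f"{work_dir} is not inside {org_root}")

    # Walk from the org root down to the working directory, one level at a time.
    levels = [org_root]
    for part in work_dir.relative_to(org_root).parts:
        levels.append(levels[-1] / part)

    # Levels without a context file are simply skipped.
    return [d / CONTEXT_FILENAME for d in levels if (d / CONTEXT_FILENAME).is_file()]


def build_briefing(org_root: Path, work_dir: Path) -> str:
    """Concatenate the context files so that more specific guidance appears last."""
    sections = []
    for path in collect_context(org_root, work_dir):
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"# Source: {path}\n{body}")
    return "\n\n".join(sections)
```

Putting the org-level file first and the feature-level file last is a design choice: the narrowest guidance is the last thing the agent reads before it starts work.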

Treat context as code. Version control it. Put it through pull requests. Iterate on it. Run a retro at the end of each sprint and ask: did the coding agents do anything surprising or wrong? If they did, consider whether it points to a gap or an error in a context file.

Fix it. Ship the change. Check again next sprint.

A fast workflow for big changes

The default way to work is: open a chat window, describe the problem, accept or reject what comes back. That works for small, isolated tasks (see Managing risk at the bottom of this post for judging which tasks those are).

The following idea comes from https://tylerburleigh.com/blog/2026/02/22/. I don't have evidence that it works perfectly - just sharing what I've found. It's a staged approach with review cycles between stages. Errors caught at planning stage cost almost nothing. Errors caught after they've been faithfully implemented across multiple stages cost quite a lot more. So you front-load the thinking.

Research first. Before any code is written, the coding agent researches the codebase and / or the web, and writes its findings to RESEARCH.md. You read it. You correct any misunderstandings. Only when that's solid do you move on. Start a fresh session. Model performance degrades noticeably once you've used around 40% of its context capacity. Fresh sessions keep it sharp.
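The fresh-session rule can be turned into a tiny check. This is a sketch only: the 40% figure is the rough number quoted above, not a measured constant, and how you get the token counts depends on your tool. The function names are hypothetical.

```python
# The rough figure quoted in the post; tune it to what you observe.
FRESH_SESSION_THRESHOLD = 0.40


def context_usage(tokens_used: int, context_window: int) -> float:
    """Return the fraction of the context window already consumed."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used cannot be negative")
    return tokens_used / context_window


def should_start_fresh_session(
    tokens_used: int,
    context_window: int,
    threshold: float = FRESH_SESSION_THRESHOLD,
) -> bool:
    """True once usage reaches the threshold, i.e. time to hand over via RESEARCH.md."""
    return context_usage(tokens_used, context_window) >= threshold
```

For example, with a 200,000-token window, `should_start_fresh_session(90_000, 200_000)` is true (45% used) and `should_start_fresh_session(50_000, 200_000)` is false (25% used).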

Then plan. The agent reads RESEARCH.md and produces PLAN.md: a breakdown of the work into stages, plus CHECKLIST.md. A separate session then reviews the plan from the perspective of a senior engineer. You read that review. Accept or override its recommendations. Revise until it's ready.

This is the highest-leverage point in the entire process. A bad assumption that survives planning gets implemented faithfully, and you'll discover it somewhere around phase 4, trying to explain why the feature is broken in a way that requires unwinding three phases to fix.

Then implement, one phase at a time. Each phase in its own session. Committed when clean. Reviewed before moving on. Use different models for implementation and review. A model that just wrote the code is poorly placed to find problems with it. It'll tell you it's fine.

It's not always fine.

LSP plugins give the coding agent live type information from a language server, the same source your IDE uses, so it catches type errors and missing imports without waiting for the compiler.

Consider Git worktrees if you're running parallel features: separate agents, separate branches, separate directories, no collision, merge as normal.
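A small sketch of the worktree setup, wrapped in Python so it can sit in a helper script. The branch prefix and directory naming are my own choices for illustration. The git command itself, `git worktree add -b <branch> <path>`, is standard.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, feature: str) -> list[str]:
    """Build the git command that creates a new branch in its own directory."""
    branch = f"feature/{feature}"                      # hypothetical naming scheme
    worktree_dir = repo.parent / f"{repo.name}-{feature}"  # sibling of the main checkout
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", branch,
        str(worktree_dir),
    ]


def create_worktree(repo: Path, feature: str) -> None:
    """Run the command; raises CalledProcessError if git reports a failure."""
    subprocess.run(worktree_add_command(repo, feature), check=True)
```

Each agent then gets its own directory and its own branch, and the branches are merged back the usual way.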

When it's running well, consider automating the middle

Once this is working reliably you can remove human involvement from the implementation loop. Not from the edges, only from the middle.

The human still approves the plan before anything is built. The human still reviews the PR at the end. What changes is what happens between those two points: an implementer agent executes against PLAN.md, automated gates run (tests, lint, type checks), a reviewer agent evaluates against the plan, and they iterate until clean. No human required in that loop.
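The loop described above can be sketched as follows. The implementer, the gates, and the reviewer are passed in as plain callables because the real ones depend entirely on your tooling. Every name here is hypothetical, and the iteration cap is my addition so that a stuck loop goes back to a human instead of spinning forever.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    clean: bool          # True when gates pass and the reviewer has no findings
    iterations: int      # how many implement/check rounds were run
    feedback: list[str]  # outstanding findings if the loop gave up


def run_middle_loop(
    implement: Callable[[list[str]], None],
    gates: dict[str, Callable[[], bool]],
    review: Callable[[], list[str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Iterate implement -> gates -> review until clean or the cap is reached."""
    feedback: list[str] = []
    for iteration in range(1, max_iterations + 1):
        # The implementer works against the plan plus any feedback from the last round.
        implement(feedback)

        # Automated gates (tests, lint, type checks) run before the reviewer is involved.
        failures = [name for name, gate in gates.items() if not gate()]
        if failures:
            feedback = [f"gate failed: {name}" for name in failures]
            continue

        # The reviewer returns a list of findings; an empty list means clean.
        feedback = review()
        if not feedback:
            return LoopResult(clean=True, iterations=iteration, feedback=[])

    # Not clean after the cap: hand the outstanding feedback to a human.
    return LoopResult(clean=False, iterations=max_iterations, feedback=feedback)
```

Running the cheap gates before the reviewer is a design choice: there is little point spending a review pass on code that doesn't pass its tests.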

The PR is self-documenting. RESEARCH.md, PLAN.md, and the commit history are the description. The engineer can see exactly what the agent understood, what it planned, and what it did.

Nothing's a black box, because "the AI did it" (which is as tempting to say as eating a warm, golden, honey-glazed doughnut oozing with rich, velvety Marmite custard) is not an acceptable answer in a code review. Not yet, anyway.

Managing risk

Not every change carries the same risk, and it helps to think about this on two dimensions: likelihood of something going wrong, and impact if it does. For likelihood, ask how complex the change is, how well-tested the affected code paths are, and how reversible the outcome is. For impact, ask how many users or services are affected and whether you can recover quickly if something breaks. A rough mental multiplication of the two tells you how much caution is warranted.
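The mental multiplication can be written down as a rough sketch. The 1 to 5 scales, the thresholds, and the labels are all invented for illustration; pick whatever matches your own appetite for risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRisk:
    likelihood: int  # 1 = simple, well-tested, reversible ... 5 = complex, untested, hard to undo
    impact: int      # 1 = few users, quick recovery ... 5 = many users or services, slow recovery

    def __post_init__(self) -> None:
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Likelihood multiplied by impact, so the range is 1 to 25."""
        return self.likelihood * self.impact


def caution_level(risk: ChangeRisk) -> str:
    """Map a score to a working style. Thresholds are hypothetical."""
    if risk.score <= 4:
        return "chat and accept"
    if risk.score <= 12:
        return "staged workflow"
    return "micromanage"
```

A typo fix in an internal script might be `ChangeRisk(1, 1)`, while a change to a core payment path with thin test coverage might be `ChangeRisk(4, 5)`.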

When I work in core services in financial applications, I micromanage Claude. What do you do?

Monday, 15 June 2026