How's AI working out for your team?
Not "what's the vision." Actually. In practice. Are you getting consistent, trustworthy output (the kind you'd stake a production release on) or are you mostly getting confident-looking code that half-works and then quietly breaks something in staging at 4pm on a Friday?
If it's the second thing: the model isn't the problem. The workflow is.
There's a gap between teams getting serious results with AI and teams getting occasional lucky ones. The gap isn't which tools they've licensed. It's whether anyone has sat down and thought carefully about how those tools fit together, and made it someone's job to keep that thinking current.
Someone has to own this
Every org needs at least one person (a human person) who goes genuinely deep on AI-native development. Not "I've skimmed the documentation" deep. Deep enough to write guidance that stops every other engineer from reinventing the wheel, to maintain the high-level context files that teach your agents how your organisation actually works, and to have a considered answer when a new tool appears and everyone looks around waiting for someone else to evaluate it.
Without this person, you get well-intentioned chaos. Everyone experimenting independently. Nobody sharing what works. The wheel reinvented in every corner of the codebase. Agents that behave completely differently across repos because nobody's maintaining the context files and they've drifted into contradiction.
One responsible person, the go-to guy, fixes most of that. It's one of those solutions that's almost embarrassingly simple.
Context is the actual product
Here's the distinction that matters more than it sounds: prompt engineering is what you do for a single conversation. Context engineering is the discipline of building the right information environment for every conversation. Systematically, at team level. It's the difference between briefing a contractor well and just handing them a laptop and pointing at the codebase.
Imagine hiring a contractor who's genuinely skilled. Worked with a hundred teams, knows the patterns, no ramp-up required. But they've walked through the door cold. They don't know you're on Python 3.14. They don't know the auth layer is a shared service. They don't know you spent six months migrating away from the ORM they're about to suggest. That contractor will make expensive mistakes in the first week, not because they're bad at their job, but because they don't know your job.
Context files are the briefing.
Organised in a hierarchy: org-level at the top (your stack, your architecture principles, the things that don't change project to project), project-level below that, feature or component level below that for focused work. The AI lead owns the top. Everyone else contributes to the levels below.
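A rough sketch of how that hierarchy could be stitched together, under some loud assumptions: the CONTEXT.md file name and the idea that the levels map onto nested directories are mine, for illustration only, and not a convention any particular agent tool mandates. You walk from the org root down to wherever you're working and concatenate what you find, broadest first.

```python
from pathlib import Path

# Hypothetical file name; substitute whatever your agent tooling actually reads.
CONTEXT_FILE = "CONTEXT.md"


def collect_context(org_root: Path, work_dir: Path) -> str:
    """Concatenate context files from org_root down to work_dir, broadest first."""
    org_root = org_root.resolve()
    work_dir = work_dir.resolve()
    # Raises ValueError if work_dir is not inside org_root.
    relative = work_dir.relative_to(org_root)

    # Build the chain of directories: org root, then each level on the way down.
    chain = [org_root]
    for part in relative.parts:
        chain.append(chain[-1] / part)

    sections = []
    for directory in chain:
        candidate = directory / CONTEXT_FILE
        if candidate.is_file():
            sections.append(candidate.read_text(encoding="utf-8").strip())
    return "\n\n".join(sections)
```

Ordering broadest-first means the more specific levels get the last word, which is usually what you want when a component has a local exception to an org-wide rule.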
Treat context as code. Version control it. Put it through pull requests. Iterate on it. Run a retro at the end of each sprint and ask: did the agents do anything surprising or wrong? Consider if the answer points to a gap or an error in a context file.
Fix it. Ship the change. Check again next sprint.
A fast workflow for big changes
The default way to work is: open chat window, describe the problem, accept or reject what comes back. That works for small, isolated tasks.
For bigger changes, the idea is a staged approach with review cycles between stages. Errors caught at the planning stage cost almost nothing. Errors caught after they've been faithfully implemented across multiple stages cost quite a lot more. So you front-load the thinking.
Research first. Before any code is written, the agent researches the codebase and/or the web, and writes its findings to RESEARCH.md. You read it. You correct any misunderstandings. Only when that's solid do you move on. Start a fresh session. Model performance degrades noticeably once you've used around 40% of its context capacity. Fresh sessions keep it sharp.
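If you want that rule of thumb as something you can check rather than feel, a minimal sketch might look like this. The 40% figure is the rough number from above, not a measured constant, and the function name is hypothetical; it assumes your tooling can tell you how many tokens the session has used and how large the model's context window is.

```python
def should_start_fresh_session(
    tokens_used: int,
    context_window: int,
    threshold: float = 0.40,  # rule-of-thumb value, tune for your model
) -> bool:
    """Return True once the session has consumed `threshold` of the window."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return tokens_used / context_window >= threshold
```

For example, `should_start_fresh_session(90_000, 200_000)` says it's time to move on, while `should_start_fresh_session(50_000, 200_000)` says carry on.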
Then plan. The agent reads RESEARCH.md and produces PLAN.md: a breakdown of the work into stages, plus a checklist. A separate session then reviews the plan from the perspective of a senior engineer. You read that review. Accept or override its recommendations. Revise until it's ready.
This is the highest-leverage point in the entire process. A bad assumption that survives planning gets implemented faithfully, and you'll discover it somewhere around stage 4, trying to explain why the feature is broken in a way that requires unwinding three phases to fix.
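A small sketch of how the checklist in PLAN.md could be read back by a script, assuming (my assumption, not a requirement) that the agent writes it in the common Markdown task-list style of `- [ ]` and `- [x]`. The function name is made up for illustration.

```python
import re

# Matches Markdown-style task list lines such as "- [ ] do thing" or "* [x] done".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def unfinished_items(plan_text: str) -> list[str]:
    """Return the text of every unchecked checklist item, in document order."""
    remaining = []
    for line in plan_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            remaining.append(match.group(2))
    return remaining
```

An empty result is a cheap signal that every stage has been ticked off. It says nothing about whether the plan was any good, which is still your job.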
Then implement, one phase at a time. Each phase in its own session. Committed when clean. Reviewed before moving on. A different model handles review than handled implementation: a model that just wrote the code is not well placed to find problems with it. It'll tell you it's fine.
It's not always fine.
LSP plugins give the agent live type information from the language server, the same diagnostics your IDE shows, so it catches type errors and missing imports without waiting for the compiler.
Consider Git worktrees if you're running parallel features: separate agents, separate branches, separate directories, no collision, merge as normal.
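As a sketch of the worktree setup: `git worktree add -b <branch> <path>` creates a new branch and checks it out in its own directory. The helper names and the sibling-directory naming scheme below are my own assumptions for illustration.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, branch: str) -> list[str]:
    """Build the git command that creates `branch` in its own sibling directory."""
    # Hypothetical naming scheme: ../<repo>-<branch>, with slashes flattened.
    target = repo.parent / f"{repo.name}-{branch.replace('/', '-')}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)]


def add_worktree(repo: Path, branch: str) -> None:
    """Run the command; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_command(repo, branch), check=True)
```

Each agent then gets pointed at its own directory, and the branches merge like any others.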
When it's running well, consider automating the middle
Once this is working reliably (I'd give it a full sprint before you call it working, maybe two thousand) you can remove human involvement from the implementation loop. Not from the edges. Just the middle.
The human still approves the plan before anything is built. The human still reviews the PR at the end. What changes is what happens between those two points: an implementer agent executes against PLAN.md, automated gates run (tests, lint, type checks), a reviewer agent evaluates against the plan, and they iterate until clean. No human required in that loop.
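Roughly, that middle loop could be sketched like this. Everything here is hypothetical scaffolding: the `implement`, `gates`, and `review` callables stand in for whatever actually drives your agents and your test, lint, and type-check commands, and the retry cap is my addition so the loop can't spin forever.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def run_phase(
    implement: Callable[[str], None],       # implementer agent; receives feedback
    gates: dict[str, Callable[[], bool]],   # e.g. tests, lint, type checks
    review: Callable[[], Review],           # reviewer agent, checks against the plan
    max_rounds: int = 5,                    # assumed cap, not from any tool
) -> bool:
    """Iterate implement -> gates -> review until clean or out of rounds."""
    feedback = ""
    for _ in range(max_rounds):
        implement(feedback)
        failures = [name for name, gate in gates.items() if not gate()]
        if failures:
            # Don't bother the reviewer until the automated gates pass.
            feedback = "Failed gates: " + ", ".join(failures)
            continue
        verdict = review()
        if verdict.approved:
            return True
        feedback = verdict.feedback
    return False
```

A `False` return is the point where a human gets pulled back in, rather than letting the agents argue indefinitely.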
The PR is self-documenting. RESEARCH.md, PLAN.md, and the commit history are the description. The engineer can see exactly what the agent understood, what it planned, and what it did.
Nothing's a black box, because "the AI did it", which is as tempting as a warm, golden, honey-glazed doughnut oozing with rich, velvety Marmite custard, is not an acceptable answer in a code review. Not yet, anyway.
But the above is just a hypothetical ramble, and in reality I just micromanage Claude.



