Most of the productivity gains I’ve seen from coding agents had nothing to do with clever prompts. They came from shaping the environment in which the agent works.
This piece is for senior engineers, tech leads, and DevEx/platform teams who want agents to behave predictably in real-world repos: not demos or toy scripts, but the messy, multi-year systems we actually ship and maintain.
And here’s the promise:
By the end, you’ll know how to restructure a single slice of your codebase so an agent becomes reliably useful, instead of drifting or guessing.
Over the past year of using Claude Code, Cursor and Junie on large enterprise systems, one pattern has stood out. The teams getting the most from agents weren’t the ones endlessly tuning prompts. They were the ones with:
- A clear, predictable code structure
- Explicit types and schemas
- Readable naming
- Tests that express intent
- A small set of conventions that everyone actually follows
Vendors are converging on the same conclusion: structure and context matter far more than prompt gymnastics. Context engineering, not prompt engineering, is where the real leverage lives.
What I Mean By Context Engineering
Context engineering is everything that shapes what the agent can see and rely on at any point in its work.
It includes:
- How the repository is organised
- Which files are easiest for the agent to find
- How consistent your types and schemas are
- The clarity of your naming
- Your rules files (CLAUDE.md, .cursorrules)
- The examples the agent encounters first
- The constraints it hits (types, linters, tests)
Prompt engineering is what you tell the agent for this turn.
Context engineering is the environment in which the agent reasons across every turn.
It influences not just what the agent sees, but how that context evolves as it moves through a task.
If prompts are the note on a piece of paper, context engineering is the room: the tools, examples, layout and norms that make the correct behaviour obvious.
Repository ergonomics vs runtime context
Two things matter:
- Repository ergonomics: How easy it is for an agent to discover the right patterns. This is the folder structure, naming, types and tests.
- Runtime context selection: How much information the agent takes into the window at each step, and how that context evolves (retrieval, compaction, memory). This matters even more for long-running tasks.
Both fall under the category of “context engineering.”
One shapes the static environment.
The other shapes the live context window.

A More Formal, Anthropic-Aligned Definition
Context engineering is the discipline of deliberately curating what information an AI agent sees at each step. It’s the total set of tokens the model can access: files, types, tests, rules, memory, retrieved snippets, tool outputs and recent history.
Good context engineering:
- Selects only high-signal information
- Manages context over time in long-running or multi-step tasks
Where prompt engineering shapes what you say, context engineering shapes the environment the agent works within, and how that environment evolves.
The Forces That Actually Shape Agent Behaviour
Once you’ve used a coding agent inside a real project, you notice something fast: its behaviour is dictated less by your instructions and more by the environment it encounters.
In practice, five forces determine how predictably an agent behaves inside your repo.
1. Examples inside a coherent structure (the strongest force)
Agents pattern-match relentlessly.
If the codebase shows:
- One clear approach to error handling
- One way to fetch data
- One folder structure for features
- One naming convention for entities
…the agent will follow those patterns naturally.
But if it sees three different patterns for the same task, it doesn’t pick the “right” one.
It guesses.
This is why “codebase ergonomics” is the foundation of context engineering.
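For example, if the repo’s one blessed data-fetching pattern looks like the sketch below (a hypothetical fetchJson helper with a Result shape), the agent will copy it in the next feature it touches rather than inventing its own.

```typescript
// src/shared/http/fetchJson.ts: the one blessed way to fetch data.
// Hypothetical exemplar; the names are illustrative, the consistency is the point.
export type Result<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

export async function fetchJson<T>(url: string): Promise<Result<T>> {
  try {
    const response = await fetch(url);
    if (!response.ok) {
      return { ok: false, error: `HTTP ${response.status}` };
    }
    return { ok: true, value: (await response.json()) as T };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```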
2. Types and schemas as constraints
Agents thrive when the system tells them what’s allowed.
Strong constraints (TypeScript types, PHPStan/Psalm, JSON schemas, Zod) do more than validate output. They act as guardrails, reducing the model’s degrees of freedom.
A good type or schema catches and constrains entire classes of hallucinations so they fail fast instead of shipping.
A weak or implicit one invites them.
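A minimal sketch of what that looks like in a TypeScript codebase, using Zod (the schema and its fields are illustrative):

```typescript
import { z } from "zod";

// The schema is the contract: anything the agent generates or parses
// must satisfy it, or the code fails immediately at the boundary.
const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  role: z.enum(["admin", "member", "viewer"]),
});

type User = z.infer<typeof UserSchema>;

export function parseUser(input: unknown): User {
  // Throws with a precise error if the shape is wrong: a hallucinated
  // field or a missing property fails fast instead of shipping.
  return UserSchema.parse(input);
}
```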
3. Tests as executable intent
Agents don’t understand your architecture diagrams or your mental models.
They understand:
- What the tests do
- What the inputs and outputs look like
- Which behaviours are valid vs invalid
This is why test-driven workflows pair so naturally with agents: they give the model a concrete target rather than a vague hope.
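A small, intention-revealing test gives the model exactly that target. A sketch in Vitest style, against a hypothetical applyDiscount function:

```typescript
import { describe, it, expect } from "vitest";
import { applyDiscount } from "./applyDiscount"; // hypothetical module under test

describe("applyDiscount", () => {
  it("applies a percentage discount to the order total", () => {
    expect(applyDiscount({ totalCents: 10_000 }, { percent: 10 })).toEqual({
      totalCents: 9_000,
    });
  });

  it("never produces a negative total", () => {
    expect(applyDiscount({ totalCents: 500 }, { percent: 200 })).toEqual({
      totalCents: 0,
    });
  });
});
```

The agent doesn’t need your architecture diagram; it needs these two assertions.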
4. Rules files as persistent onboarding
Short, local rules files like:
- CLAUDE.md
- .cursorrules
- Aider’s repo rules
…give the agent a stable onboarding surface.
They don’t replace patterns in the code (nothing does), but they disambiguate:
- Preferred patterns
- Deprecated patterns
- Conventions that matter
- Boundaries to respect
Think of them as the “start here first” guidance that the model reads before touching the codebase.
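A sketch of what that guidance might contain (the paths and helper names here are hypothetical, and the exact format is up to you):

```markdown
# CLAUDE.md

## Start here
- Error handling: return the Result type from src/shared/result.ts; never throw from services.
- Data fetching: use fetchJson from src/shared/http; do not call fetch directly in features.
- Folder layout: one folder per feature under src/features/, grouped by domain, not by layer.

## Deprecated
- Anything under src/legacy/ exists for reference only. Do not copy its patterns.
```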
5. Feedback loops: compilers, linters, tests
Agents correct themselves more quickly when the environment provides instant, unambiguous feedback.
- Compiler errors
- Type failures
- Schema violations
- Failing tests
- Lint rule breaks
These provide a grounding for the agent to iterate against. This is why pairing with an agent inside a strongly typed, well-tested slice feels dramatically better than pairing inside a loosely typed, inconsistent one.
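As a small illustration, a discriminated union plus an exhaustiveness check turns “you forgot a case” into a compile error the agent sees immediately (illustrative types):

```typescript
type PaymentState =
  | { kind: "pending" }
  | { kind: "settled"; settledAt: Date }
  | { kind: "failed"; reason: string };

function describePayment(state: PaymentState): string {
  switch (state.kind) {
    case "pending":
      return "Awaiting confirmation";
    case "settled":
      return `Settled at ${state.settledAt.toISOString()}`;
    case "failed":
      return `Failed: ${state.reason}`;
    default: {
      // If a new kind is added and not handled above, this assignment
      // becomes a compile error: instant, unambiguous feedback.
      const exhaustive: never = state;
      return exhaustive;
    }
  }
}
```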
Why these forces matter more than prompts
Prompt engineering affects one turn.
These forces affect every turn.
They shape:
- What the agent sees
- What it prioritises
- What it ignores
- What it assumes
- What it can immediately rule out
- How it corrects itself when wrong
If prompts are the steering wheel, these forces are the entire road system.
And if the road system is chaotic, no steering wheel technique will save you.
A Quick Reality Check: How Much Control Do We Actually Have?
IDE-based agents decide what to read using their own heuristics: directory names, imports, dependency graphs, recent edits and files you explicitly open or mention. We often can’t control the exact tokens in their context window.
But we can shape where the gravity pulls.
Three levers give us real influence:
- Structure: A coherent folder layout and consistent naming make good patterns easy to find and bad patterns hard to stumble into. Agents follow what’s most obvious.
- Rules files: Project-level rules (CLAUDE.md, .cursorrules) act as persistent onboarding. The agent reads them before reasoning.
- Constraints: Types, schemas, static analysis and tests form a tight feedback loop that the agent learns from. Strong constraints reduce drift better than long instructions.
We aren’t choosing context manually.
We’re designing the environment so the agent’s natural heuristics work in our favour.
The Instruction Soup Loop

A failure pattern I’ve seen repeatedly, and that many agent case studies echo, is what I call the Instruction Soup Loop.
- The codebase has multiple patterns for the same task.
Example: four competing error-handling styles across the repo.
- The agent bounces between them.
- You add more rules to the prompt.
- Output improves briefly.
- Drift returns as soon as the agent touches older files.
- You write more rules.
- The problem persists.
The issue was never the instructions.
The issue was ambiguity.
Most “agent inconsistency” failures trace back to unclear boundaries, mixed patterns and legacy baggage, not bad prompts.
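To make that ambiguity concrete, here is the kind of mix an agent might find in one repo (condensed, hypothetical snippets). Faced with all three, it has no principled way to choose:

```typescript
// Style found in features/billing: throw domain errors
async function getInvoiceA(id: string) {
  const res = await fetch(`/api/invoices/${id}`);
  if (!res.ok) throw new Error(`Invoice ${id} not found`);
  return res.json();
}

// Style found in features/accounts: return null on failure
async function getInvoiceB(id: string) {
  const res = await fetch(`/api/invoices/${id}`);
  return res.ok ? res.json() : null;
}

// Style found in features/reporting: Result objects
async function getInvoiceC(id: string) {
  const res = await fetch(`/api/invoices/${id}`);
  return res.ok
    ? { ok: true as const, value: await res.json() }
    : { ok: false as const, error: `HTTP ${res.status}` };
}
```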
Treat the Agent Like a Junior Teammate
This is the most useful mental model I know.
When onboarding a junior engineer, you don’t hand them a style guide and disappear. You:
- Pair on a small feature slice
- Walk them through one golden-path example
- Explain the boundaries and the weird edge-cases
- Show where the tests live and what they assert
- Give them a few architectural notes to orient them
Over time, they internalise the patterns.
Agents benefit from the same approach. More teams are now explicitly onboarding agents:
- Provide a clean example of the exact pattern you want.
- Give types and schemas they can trust.
- Use tests as the behavioural source of truth.
- Add 5–10 clear principles in a rules file.
- Reduce surprises in the environment.
Prompts set direction.
The environment does the teaching.
Design Principles For a Teachable Codebase
Make the right thing the easy thing
One error-handling pattern.
One folder structure for features.
One data-fetching approach.
One naming convention for core entities.
Prefer clarity over cleverness
Descriptive names.
Explicit types.
Stable patterns.
Nothing that requires telepathy to understand.
Reduce ambiguity relentlessly
Every “it depends” creates a guess.
Deprecate old patterns loudly.
Document transitional states.
Hide or clearly mark legacy areas.
Let examples carry the load
One clean exemplar file teaches more than any rules document.
If you want the agent to follow a pattern, show it one.
Use constraints as communication
Types, schemas, tests and lint rules aren’t red tape; they’re how you communicate boundaries to both humans and agents.
Practical Places To Start
Add a project rules file
Use CLAUDE.md, .cursorrules, or similar. Keep it short.
Example principles:
- “Use a Result type ({ ok: true, value } | { ok: false, error }) for all async operations.”
- “Feature folders group logic by domain, not by layer.”
- “Legacy patterns in /old/ are not to be copied.”
Stabilise one slice of the codebase
Pick one area where an agent would be helpful. Normalise it in this order:
- Types: define the shapes clearly.
- Structure: create a coherent folder and naming pattern.
- Tests: add a few high-signal behaviours.
- Rules file: add the principles and constraints.
Then let the agent work inside this slice.
You’ll feel the difference quickly.
Treat compilers and tests as part of the loop
Feed compiler errors, lint warnings and test failures back to the agent.
This gives it concrete, grounded feedback.
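One lightweight way to wire that up, assuming a TypeScript project with ESLint and Vitest (adapt the commands to your stack), is a small script that gathers the output of every check so you can hand it straight back to the agent:

```typescript
// scripts/check-report.ts: gather compiler, lint and test output in one place.
import { execSync } from "node:child_process";

function runCheck(command: string): string {
  try {
    execSync(command, { stdio: "pipe" });
    return `PASS ${command}`;
  } catch (error) {
    const e = error as { stdout?: Buffer; stderr?: Buffer };
    return `FAIL ${command}\n${e.stdout ?? ""}${e.stderr ?? ""}`;
  }
}

const report = ["npx tsc --noEmit", "npx eslint .", "npx vitest run"]
  .map(runCheck)
  .join("\n\n");

// Paste or pipe this report into the agent's next turn.
console.log(report);
```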
Capture decisions as you go
Every time you fix something the agent struggled with, write down why.
Add it to your rules file or an ADR.
You’re building institutional memory that the agent can lean on.
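These notes can be tiny. A captured decision might be a single rules-file entry like this (hypothetical names):

```markdown
## Decisions
- Use apiClient from src/shared/http, not the legacy axiosClient.
  Reason: the agent kept importing the old client from examples in src/legacy/.
```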
What This Means For Teams
For teams embracing AI-assisted development, the core question isn’t:
“How do we write a better prompt?”
It’s:
“How do we design a system where a reasonably smart agent can’t help but do the right thing?”
This shifts the work from prompt tinkering to architecture, conventions, tooling and DevEx.
It becomes an organisational investment, not a prompt hack.
The good news is that everything that improves the environment for agents also improves it for humans: clearer architecture, tighter types, consistent patterns, readable naming and tests that show intent.
Context engineering is calm engineering with an AI teammate in mind.
Prompts still matter for framing and steering, but the main bottleneck now is the environment in which we drop agents.