What Solution Partners should know before recommending “autonomous” dev to clients
TL;DR
Gen‑AI coding agents are force multipliers: they amplify clarity or chaos, depending on what you feed them. This post maps the common failure modes, shares a lightweight framework you can implement in Confluence today, and shows how an AI‑powered add‑on can automate the heavy lifting.
Pairing ChatGPT with a code‑generation plug‑in, I shipped a clickable demo in four hours. This used to cost two sprints, six stand‑ups, and dozens of Slack pings.
Magic… until the next morning, when the cracks started to show.
Lesson #1: If you give sloppy requirements to a human, you lose days. Give them to a swarm of LLM agents, and you lose days at machine speed.
Well‑known agents (Devin, Lovable, V0) promise self‑driving software. Reality bites on three fronts:
| Failure point | What actually happens |
| --- | --- |
| Undefined goal | Agent optimises for the wrong outcome, then proudly ships it. |
| Scope creep | Every clarifying prompt shifts the target; scope expands, complexity grows. |
| Missing validation | “Works for me” hides regressions that explode in prod. |
So what do coding agents need? The same things junior engineers do:
✅ Clear goals
✅ Well-scoped tasks
✅ Thorough validation
And just like junior engineers, when they don’t have this context, you get code that technically “works”… but breaks the moment you scale, tweak the scope, or hit an edge case.
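A toy illustration of that failure shape (hypothetical code, not from a real agent run):

```python
def average_rating(ratings: list[float]) -> float:
    """Looks correct in every demo, where users always have ratings."""
    return sum(ratings) / len(ratings)

average_rating([4, 5, 3])  # 4.0: "works for me"
average_rating([])         # ZeroDivisionError: the edge case production finds first
```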
After I discovered that one of my interns had spent days building entirely the wrong functionality, I put guardrails in place to prevent it from happening again:
📍Daily meetings and detailed documents to share context
📍Encouraging interns to ask questions whenever they were unsure
📍Small, easy‑to‑review PRs that surfaced gaps early
From that point on, I was confident that my interns were always on the right track.
But it doesn’t scale.
You can mentor 2 interns.
You can’t mentor 200 AI agents the same way.
Step 1 — Headline Test
Write one sentence: “We’ll know this works when metric X moves from A → B.”
Step 2 — Reveal Context
Ask the agent to list information gaps; route unanswered ones to the PM/Tech Lead.
Step 3 — Failure Modes
Write three ways it could break. Prompt the LLM for two more.
Step 4 — Self‑Review
Before a PR, the agent answers: “Which assumption am I least sure about, and what test would prove it wrong?”
Partners can paste this four‑question block into any Confluence template today. One healthcare company reduced implementation time by 52% just by asking the right questions.
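For partners who prefer to script those checks, here is a minimal sketch of Steps 2–4 driven by an LLM API. It assumes the official OpenAI Python client; the spec, model name, and prompts are purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1 (Headline Test) lives in the spec itself; this one is hypothetical.
SPEC = """Goal: checkout conversion moves from 2.1% to 2.6%.
Feature: a one-click reorder button on the order-history page."""


def ask(question: str) -> str:
    """Run one pre-flight question about the spec through the model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model works here
        messages=[
            {"role": "system",
             "content": "You review feature specs before coding starts."},
            {"role": "user", "content": f"Spec:\n{SPEC}\n\n{question}"},
        ],
    )
    return response.choices[0].message.content


# Step 2 (Reveal Context): surface gaps to route to the PM/Tech Lead.
gaps = ask("List the information gaps in this spec, one per line.")

# Step 3 (Failure Modes): extend three human-written modes with two more.
human_failures = [
    "Reorder succeeds for an item that is now out of stock",
    "A double-click creates duplicate orders",
    "The saved payment method has expired",
]
more_failures = ask(
    "Here are three ways this could break:\n"
    + "\n".join(f"- {f}" for f in human_failures)
    + "\nSuggest two more."
)

# Step 4 (Self-Review): surface the riskiest assumption before a PR.
self_review = ask(
    "Which assumption are you least sure about, "
    "and what test would prove it wrong?"
)

print(gaps, more_failures, self_review, sep="\n\n")
```

Each answer lands back in the Confluence page as an open question or a test to write; the loop stays human‑reviewed.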
If you’d rather not police templates manually, an AI coach inside Confluence can run these checks for you automatically. Wisary is one example: it doesn’t write code; it creates clarity, making any agent, script, or human contributor far more predictable.
| Before | After |
| --- | --- |
| POC ships fast, dies in production | POC evolves into stable prod code |
| Every standup identifies more new tasks than it closes | 80% of ambiguity resolved upfront, reducing scope creep |
| Trust erodes with every rollback | Context‑driven loop → faster cycles → higher trust |
Curious to see a live demo of the automated approach—or want to share your own experiments? Drop a comment below. Always happy to trade notes.
AI agents amplify whatever you already have. If that’s clarity, you’ll ship faster than ever. If it’s confusion, you’ll reach chaos sooner. The thinking layer is ours to fix.
About the author
Ala Stolpnik is a former Google engineering leader turned founder whose team builds Wisary, an AI‑powered Confluence app that helps product and engineering teams think clearly at scale.