Solving Tool-Dependency Failures with Phase-Based Missions
Most agent writeups focus on prompts. This one is about orchestration.
In Goose, we had a recurring failure mode in scheduled missions: the model would issue multiple tool calls in one response, including calls whose arguments depended on results that did not exist yet.
For larger model tiers with stronger tool discipline, this was less visible. For 14B models, it was a consistent production issue.
Context and constraints
The product need was straightforward: run scheduled, multi-step missions that gather data, compose content, and then execute actions safely.
The operating constraints were harder:
- 14B models (including `qwen3:14b` and `qwen2.5:14b`) tend to batch tool calls in a single turn.
- Several mission flows required hard dependencies across steps.
- Some phases needed dangerous action tools, while others should never have access to them.
- We needed a fix in scheduler architecture, not another fragile prompt patch.
This made the core requirement architectural: preserve step order and data dependency boundaries even when model behaviour is imperfect.
The problem in system terms
The old mission loop treated a mission as one continuous task. That allowed the model to generate all planned tool calls at once.
If tool call 5 depended on output from tool call 2, both were still composed simultaneously. The result was predictable: placeholder arguments, guessed values, or literal instruction text being posted instead of real outputs.
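The failure mode can be sketched in miniature. The shapes and names below are illustrative, not Goose's actual wire format: a batched single-turn response where a later call's argument is a placeholder standing in for output that does not exist yet.

```typescript
// Illustrative shape of a single-turn batched tool-call response.
// Call 3 depends on the output of calls 1 and 2, which has not been
// produced yet, so the model invents a placeholder argument.
type ToolCall = { id: number; name: string; args: Record<string, string> };

const batchedResponse: ToolCall[] = [
  { id: 1, name: "search_web", args: { query: "competitor pricing" } },
  { id: 2, name: "read_page", args: { url: "https://example.com/pricing" } },
  { id: 3, name: "post_update", args: { text: "<insert summary of search results here>" } },
];

// The check the old runtime never made: is any argument a placeholder
// rather than a real value produced by an earlier call?
const hasPlaceholder = batchedResponse.some((call) =>
  Object.values(call.args).some((value) => /<insert .*>/i.test(value)),
);
// hasPlaceholder is true: call 3 was composed before its input existed
```

No amount of prompt wording reliably prevents the third call from being emitted alongside the first two; the runtime has to refuse to compose them in the same turn.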
At that point, this was no longer a prompt quality issue. It was a control-flow issue in the runtime.
My role as Solutions Architect
My job was to translate this into explicit execution boundaries and safer defaults:
- enforce dependency order across mission steps
- scope capability access by phase, not by whole mission
- keep existing non-phase missions backwards compatible
- reduce operational risk without adding cross-plugin coupling
That meant redesigning mission execution around deterministic scheduler stages.
The architecture decision: phase-based missions
I introduced a phase model where each mission defines sequential phases, and each phase runs as its own LLM call.
At runtime:
- The scheduler runs phase 1 and captures its text response.
- If configured, the scheduler injects that response into phase 2 as explicit context.
- This repeats until all phases complete.
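The runtime loop above can be sketched as follows. This is a minimal sketch under assumed names (`Phase`, `runMission`, the prompt-concatenation format), not Goose's actual scheduler code; the point is that the scheduler, not the model, decides what context flows forward.

```typescript
// Each phase is its own LLM call; phase N+1 only ever sees phase N's
// *captured* text, never a promise of future tool output.
type Phase = { prompt: string; injectPreviousResult?: boolean };
type LLM = (prompt: string) => Promise<string>;

async function runMission(phases: Phase[], llm: LLM): Promise<string[]> {
  const results: string[] = [];
  let previous = "";
  for (const phase of phases) {
    // The dependency boundary: injection is an explicit scheduler decision,
    // applied after the previous phase has fully completed.
    const prompt = phase.injectPreviousResult
      ? `${phase.prompt}\n\nPrevious phase result:\n${previous}`
      : phase.prompt;
    previous = await llm(prompt);
    results.push(previous);
  }
  return results;
}
```

Because each iteration awaits the previous phase before building the next prompt, there is no turn in which a dependent argument can be guessed.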
Each phase can now carry its own control policy:
- `injectPreviousResult` to chain phase outputs
- `noTools` for text-only phases (for example composition)
- `allowDangerous` scoped to only the execute phase
- `maxIterations` tuned per phase to prevent drift or loops
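Put together, a phase definition might look like the following. The field names come from the post; the surrounding interface and mission structure are assumptions for illustration.

```typescript
// Hypothetical per-phase policy carrying the four controls above.
interface PhaseConfig {
  name: string;
  prompt: string;
  injectPreviousResult?: boolean; // chain the previous phase's text into this one
  noTools?: boolean;              // text-only phase: no tool schemas exposed
  allowDangerous?: boolean;       // expose action tools in this phase only
  maxIterations?: number;         // cap tool-call loops within the phase
}

// A gather -> compose -> execute mission under this shape:
const phases: PhaseConfig[] = [
  { name: "gather", prompt: "Collect the inputs.", maxIterations: 5 },
  { name: "compose", prompt: "Draft the content.", injectPreviousResult: true, noTools: true },
  { name: "execute", prompt: "Post the draft.", injectPreviousResult: true, allowDangerous: true, maxIterations: 2 },
];
```

Note that dangerous access is declared on exactly one phase, which is what makes it auditable at a glance.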
This pattern became the new mission boundary primitive.
Why this solved the failure mode
The design works because each phase creates a hard LLM boundary.
- Gather phase: static tool arguments, batching is acceptable.
- Compose phase: tools disabled, model focuses on content synthesis only.
- Execute phase: action arguments come from injected prior output, not imagined future state.
In other words, dependency management moved out of prompt hope and into scheduler control.
Operational outcome
Phase-based execution shipped in v1.2.0 and was production-tested across multiple missions, including:
- `twitter-marketing` (gather -> compose -> post)
- `competitor-research` (search -> analyse -> save)
- `self-improvement` (search -> read-source -> reflect -> save -> create-task)
Observed impact:
- dependent tool-call chains became stable and repeatable
- composition quality improved with `noTools: true`
- dangerous tool access became narrower and easier to reason about
- legacy tactical workarounds (two-mission splits and cross-plugin coupling) were removed
Tradeoffs and what I would evolve next
Phase-based missions solved the dependency problem, but they also made one tradeoff explicit: more deterministic orchestration means more scheduler responsibility.
The next architecture step is direct-execution missions for workflows that do not need an LLM at all (for example single-tool operational jobs like backups). That removes model latency and hallucination risk entirely for those paths.
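For those no-LLM paths, the shape is even simpler. The sketch below is hypothetical, describing the intended direction rather than shipped code: the scheduler invokes one tool with static arguments and no model in the loop.

```typescript
// Hypothetical direct-execution mission: one tool call, static arguments,
// no prompt and no tool-call parsing. The only failure modes left are the
// tool's own.
type Tool = (args: Record<string, string>) => Promise<string>;

async function runDirectMission(tool: Tool, args: Record<string, string>): Promise<string> {
  return tool(args);
}
```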
The broader lesson is simple: in agent systems, reliability comes from explicit control-flow boundaries, not increasingly strict prompt wording.