Solving Tool-Dependency Failures with Phase-Based Missions
Most agent writeups focus on prompts. This one is about orchestration.
In Goose, we had a recurring failure mode in scheduled missions: the model would issue multiple tool calls in one response, including calls whose arguments depended on results that did not exist yet.
For larger model tiers with stronger tool discipline, this was less visible. For 14B models, it was a consistent production issue.
Context and constraints
The product need was straightforward: run scheduled, multi-step missions that gather data, compose content, and then execute actions safely.
The operating constraints were harder:
- 14B models (including `qwen3:14b` and `qwen2.5:14b`) tend to batch tool calls in a single turn.
- Several mission flows required hard dependencies across steps.
- Some phases needed dangerous action tools, while others should never have access to them.
- We needed a fix in scheduler architecture, not another fragile prompt patch.
This made the core requirement architectural: preserve step order and data dependency boundaries even when model behaviour is imperfect.
The problem in system terms
The old mission loop treated a mission as one continuous task. That allowed the model to generate all planned tool calls at once.
If tool call 5 depended on output from tool call 2, both were still composed simultaneously. The result was predictable: placeholder arguments, guessed values, or literal instruction text being posted instead of real outputs.
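The failure mode can be sketched in miniature. The shapes and names below are illustrative, not Goose's actual wire format: a batched single-turn response where a later call's argument is a placeholder standing in for output that does not exist yet.

```typescript
// Illustrative shape of a single-turn batched tool-call response.
// Call 3 depends on the output of calls 1 and 2, which has not been
// produced yet, so the model invents a placeholder argument.
type ToolCall = { id: number; name: string; args: Record<string, string> };

const batchedResponse: ToolCall[] = [
  { id: 1, name: "search_web", args: { query: "competitor pricing" } },
  { id: 2, name: "read_page", args: { url: "https://example.com/pricing" } },
  { id: 3, name: "post_update", args: { text: "<insert summary of search results here>" } },
];

// The check the old runtime never made: is any argument a placeholder
// rather than a real value produced by an earlier call?
const hasPlaceholder = batchedResponse.some((call) =>
  Object.values(call.args).some((value) => /<insert .*>/i.test(value)),
);
// hasPlaceholder is true: call 3 was composed before its input existed
```

No amount of prompt wording reliably prevents the third call from being emitted alongside the first two; the runtime has to refuse to compose them in the same turn.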
At that point, this was no longer a prompt quality issue. It was a control-flow issue in the runtime.
My role as Solutions Architect
My job was to translate this into explicit execution boundaries and safer defaults:
- enforce dependency order across mission steps
- scope capability access by phase, not by whole mission
- keep existing non-phase missions backwards compatible
- reduce operational risk without adding cross-plugin coupling
That meant redesigning mission execution around deterministic scheduler stages.
The architecture decision: phase-based missions
I introduced a phase model where each mission defines sequential phases, and each phase runs as its own LLM call.
At runtime:
- The scheduler runs phase 1 and captures its text response.
- If configured, the scheduler injects that response into phase 2 as explicit context.
- This repeats until all phases complete.
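The runtime loop above can be sketched as follows. This is a minimal sketch under assumed names (`Phase`, `runMission`, the prompt-concatenation format), not Goose's actual scheduler code; the point is that the scheduler, not the model, decides what context flows forward.

```typescript
// Each phase is its own LLM call; phase N+1 only ever sees phase N's
// *captured* text, never a promise of future tool output.
type Phase = { prompt: string; injectPreviousResult?: boolean };
type LLM = (prompt: string) => Promise<string>;

async function runMission(phases: Phase[], llm: LLM): Promise<string[]> {
  const results: string[] = [];
  let previous = "";
  for (const phase of phases) {
    // The dependency boundary: injection is an explicit scheduler decision,
    // applied after the previous phase has fully completed.
    const prompt = phase.injectPreviousResult
      ? `${phase.prompt}\n\nPrevious phase result:\n${previous}`
      : phase.prompt;
    previous = await llm(prompt);
    results.push(previous);
  }
  return results;
}
```

Because each iteration awaits the previous phase before building the next prompt, there is no turn in which a dependent argument can be guessed.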
Each phase can now carry its own control policy:
- `injectPreviousResult` to chain phase outputs
- `noTools` for text-only phases (for example composition)
- `allowDangerous` scoped to only the execute phase
- `maxIterations` tuned per phase to prevent drift or loops
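Put together, a phase definition might look like the following. The field names come from the post; the surrounding interface and mission structure are assumptions for illustration.

```typescript
// Hypothetical per-phase policy carrying the four controls above.
interface PhaseConfig {
  name: string;
  prompt: string;
  injectPreviousResult?: boolean; // chain the previous phase's text into this one
  noTools?: boolean;              // text-only phase: no tool schemas exposed
  allowDangerous?: boolean;       // expose action tools in this phase only
  maxIterations?: number;         // cap tool-call loops within the phase
}

// A gather -> compose -> execute mission under this shape:
const phases: PhaseConfig[] = [
  { name: "gather", prompt: "Collect the inputs.", maxIterations: 5 },
  { name: "compose", prompt: "Draft the content.", injectPreviousResult: true, noTools: true },
  { name: "execute", prompt: "Post the draft.", injectPreviousResult: true, allowDangerous: true, maxIterations: 2 },
];
```

Note that dangerous access is declared on exactly one phase, which is what makes it auditable at a glance.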
This pattern became the new mission boundary primitive.
Why this solved the failure mode
The design works because each phase creates a hard LLM boundary.
- Gather phase: static tool arguments, batching is acceptable.
- Compose phase: tools disabled, model focuses on content synthesis only.
- Execute phase: action arguments come from injected prior output, not imagined future state.
In other words, dependency management moved out of prompt hope and into scheduler control.
Operational outcome
Phase-based execution shipped in v1.2.0 and was production-tested across multiple missions, including:
- `twitter-marketing` (gather -> compose -> post)
- `competitor-research` (search -> analyse -> save)
- `self-improvement` (search -> read-source -> reflect -> save -> create-task)
Observed impact:
- dependent tool-call chains became stable and repeatable
- composition quality improved with `noTools: true`
- dangerous tool access became narrower and easier to reason about
- legacy tactical workarounds (two-mission splits and cross-plugin coupling) were removed
Tradeoffs and what I would evolve next
Phase-based missions solved the dependency problem, but they also made one tradeoff explicit: more deterministic orchestration means more scheduler responsibility.
The next architecture step is direct-execution missions for workflows that do not need an LLM at all (for example single-tool operational jobs like backups). That removes model latency and hallucination risk entirely for those paths.
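For those no-LLM paths, the shape is even simpler. The sketch below is hypothetical, describing the intended direction rather than shipped code: the scheduler invokes one tool with static arguments and no model in the loop.

```typescript
// Hypothetical direct-execution mission: one tool call, static arguments,
// no prompt and no tool-call parsing. The only failure modes left are the
// tool's own.
type Tool = (args: Record<string, string>) => Promise<string>;

async function runDirectMission(tool: Tool, args: Record<string, string>): Promise<string> {
  return tool(args);
}
```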
The broader lesson is simple: in agent systems, reliability comes from explicit control-flow boundaries, not increasingly strict prompt wording.