The fundamental shift replaces traditional sequential planning with dynamic feedback loops. The Manager creates high-level tasks, the Executor takes one concrete action toward the first task, then the Manager immediately reassesses.
Manager creates a high-level task
Executor takes one concrete action
System observes environment changes
Manager reassesses and replans
Key innovations that enable state-of-the-art performance on AndroidWorld.
Routes text-intensive tasks to a dedicated agent with Python shell access. Receives accessibility trees plus current text context and can atomically clear and replace content.
Enhanced through device date injection, 0.5-second screen stabilization waits, disabled pointer visualization, differential state tracking, and automatic app capability extraction.
Executors output three components — thought process, chosen action, and description — all injected into Manager context for full decision rationale.
Guidance scattered throughout system prompts with repeated context in multiple sections for consistent availability. Injections into both system prompt and final user message.
Eight action primitives covering the full interaction surface of mobile devices, from simple taps to complex clipboard operations.
Iterative refinement across system prompts through strategic distribution rather than concentration of instructions. Model-specific optimization patterns.
Eight operations covering the full interaction surface of mobile devices.
Tight feedback loops, task-specific routing, rich state observability, and dynamic replanning outperform rigid plan-then-execute models for mobile UI automation.
see the results.
View the full benchmark results with 91.4% success rate across 116 AndroidWorld tasks.