
Accelerating Multi-Agent Orchestration with Speculative Dispatching

Authors: J. Ye, S. Islam, J. Cernuda, X.-H. Sun, A. Kougkas

Date: May, 2026

Venue: 13th Greater Chicago Area Systems Research Workshop

Type: Poster

Abstract

Scientific computing workflows are increasingly being restructured around autonomous AI agents. These agents coordinate complex tasks across heterogeneous tools and infrastructure, ranging from molecular simulations spanning quantum chemistry solvers and machine learning potentials to cross-facility manufacturing experiments orchestrated by multi-agent teams. However, existing orchestration approaches either follow static execution graphs defined at design time or delegate routing to an LLM that greedily selects the next agent at each step, without considering resource availability, heterogeneous model and provider selection, or infrastructure constraints. Achieving resource-aware, dynamic, and deterministic orchestration that reasons about task decomposition, agent capabilities, and infrastructure state demands complex planning and dispatching decisions at runtime, which can introduce significant latency before any agent can begin execution. We propose a speculative dispatch mechanism to accelerate multi-agent orchestration. We are the first to apply speculative execution to the dispatch decision. Specifically, our approach dispatches agents speculatively while the full orchestration decision is still being computed. Since speculative dispatch can mispredict, a reconciliation engine resolves the differences between the speculative and optimal dispatch plans. Unlike single-agent speculation approaches that accept or discard results entirely, our framework commits correct work, salvages partial matches, and flushes mispredictions, reducing the cost of misprediction. As workload patterns evolve over time, a learner progressively refines speculation accuracy from reconciliation outcomes, enabling the system to improve its dispatch predictions. Preliminary results show that the speculative planner produces partially correct dispatch plans (0.50-0.78 similarity) at 4-10x lower latency than the optimal planner, confirming the latency-quality gap that speculative dispatch exploits.
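
The commit, salvage, and flush outcomes described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Assignment` type, the `reconcile` function, the rule that "same agent, different resource" counts as a partial match, and all task, agent, and resource names are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    agent: str     # hypothetical agent name
    resource: str  # hypothetical model, provider, or node label


def reconcile(speculative, optimal):
    """Classify each speculative assignment against the optimal plan.

    Assumed rules (illustrative only):
      commit   - speculative assignment equals the optimal one
      salvage  - same agent chosen, but on a different resource
      flush    - task absent from the optimal plan, or wrong agent
      dispatch - task in the optimal plan that was never speculated
    """
    outcome = {"commit": [], "salvage": [], "flush": [], "dispatch": []}
    for task, spec in speculative.items():
        opt = optimal.get(task)
        if opt == spec:
            outcome["commit"].append(task)
        elif opt is not None and opt.agent == spec.agent:
            outcome["salvage"].append(task)
        else:
            outcome["flush"].append(task)
    # Work the speculative planner missed must still be dispatched.
    outcome["dispatch"] = [t for t in optimal if t not in speculative]
    return outcome


# Hypothetical plans: task name -> assignment.
speculative = {
    "parse": Assignment("parser", "small-model"),
    "simulate": Assignment("solver", "cpu-node"),
    "report": Assignment("writer", "small-model"),
}
optimal = {
    "parse": Assignment("parser", "small-model"),
    "simulate": Assignment("solver", "gpu-node"),
    "analyze": Assignment("analyst", "large-model"),
}

result = reconcile(speculative, optimal)
for kind, tasks in result.items():
    print(kind, tasks)
```

In this toy run, `parse` is committed, `simulate` is salvaged, `report` is flushed, and `analyze` is dispatched fresh. A real reconciliation engine would also need to decide how much partially completed work is reusable, which this sketch does not model.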

Tags

Multi-Agents, Speculative Dispatching
