Map your task into a graph of dependencies. Fire off independent steps in parallel. Cut latency dramatically by never waiting when you don't have to.
When you ask AI to do something complex, most agent patterns work through it one step at a time: think, act, observe, think, act, observe. Even patterns like ReWOO that plan ahead still execute tool calls strictly one after another.
LLMCompiler thinks like a real compiler. It reads your request, identifies all the tasks needed, maps out which ones depend on which, and then runs everything that can happen simultaneously. Three independent searches? Fire them all at once. A comparison that needs all three results? It waits only for those three, then runs immediately.
Even better, it starts executing tasks while still planning the rest. The moment the first independent task is identified, it's already running — no need to wait for the full plan to finish.
This composition builds on Plan-and-Execute and ReWOO. LLMCompiler takes Plan-and-Execute's separation of planning and execution, borrows ReWOO's placeholder variables and batch approach, and adds true parallel execution plus streaming overlap between planning and execution.
AI generates a numbered list of tasks with explicit dependencies. Streams tasks out as they're generated — execution starts before planning finishes.
Monitors which tasks have all their dependencies satisfied. Moves ready tasks into the execution queue the moment they're unblocked.
Runs all ready tasks simultaneously. Multiple tool calls fire at once. No waiting in line when tasks are independent.
Reviews all results when execution is done. Either returns the final answer or triggers a new round of planning if more information is needed.
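The four stages above can be sketched as a small dependency-aware scheduler. This is an illustrative toy, not the actual LLMCompiler API: the `Task` shape, the `"$N"` placeholder convention, and the `schedule` function are all assumptions made for the sketch. It shows the core idea — a task becomes runnable the moment every task it depends on has finished.

```python
# Toy sketch of dependency-graph scheduling (hypothetical names,
# not the real LLMCompiler interface).
from dataclasses import dataclass, field

@dataclass
class Task:
    id: int
    tool: str             # which tool to call
    args: dict            # may contain "$N" placeholders for task N's result
    deps: set = field(default_factory=set)  # task ids this task waits on

def schedule(tasks):
    """Group tasks into batches: every task in a batch has all its
    dependencies satisfied, so the whole batch could run in parallel."""
    done, batches = set(), []
    pending = {t.id: t for t in tasks}
    while pending:
        ready = [t for t in pending.values() if t.deps <= done]
        if not ready:
            raise ValueError("cycle in task graph")
        batches.append([t.id for t in ready])
        for t in ready:
            done.add(t.id)
            del pending[t.id]
    return batches

# Three independent searches (1-3) and a comparison (4) that needs all three:
tasks = [Task(1, "search", {"q": "weather New York"}),
         Task(2, "search", {"q": "weather Los Angeles"}),
         Task(3, "search", {"q": "weather London"}),
         Task(4, "compare", {"inputs": ["$1", "$2", "$3"]}, {1, 2, 3})]
print(schedule(tasks))  # → [[1, 2, 3], [4]]
```

The first batch contains all three lookups — they can fire at once — and the comparison waits only for that batch, exactly the behavior described above.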
Question: "Compare the current weather in New York, Los Angeles, and London."
Tasks 1, 2, and 3 start running immediately as the planner streams them — no waiting for the full plan.
All three finish in ~1 second total instead of ~3 seconds sequential.
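The latency win is easy to demonstrate with plain `asyncio`. Here `get_weather` is a mock stand-in (an assumption for the sketch) that simulates a ~1-second API call; firing all three lookups with `asyncio.gather` finishes in roughly the time of one call.

```python
# Toy demonstration of the parallel latency win, using a mock
# get_weather tool that simulates ~1s of network latency per call.
import asyncio
import time

async def get_weather(city):
    await asyncio.sleep(1.0)          # simulate a slow API call
    return f"{city}: (mock forecast)"

async def main():
    start = time.perf_counter()
    # Fire all three independent lookups at once; a compare step
    # would wait only for these results.
    results = await asyncio.gather(get_weather("New York"),
                                   get_weather("Los Angeles"),
                                   get_weather("London"))
    elapsed = time.perf_counter() - start
    print(results)                     # all three mock answers
    print(f"~{elapsed:.1f}s total")    # roughly 1s, not 3s

asyncio.run(main())
```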
Sequential (ReAct-style)
Total: ~4 seconds
Parallel (LLMCompiler)
Total: ~1.5 seconds — nearly 3x faster
Think → act → observe → repeat. Fully sequential. Every step waits for the previous one. Most flexible, but slowest and most expensive.
Plan all at once → execute sequentially → synthesize. Saves on AI calls, but tools still run one at a time. Cost-optimized but not speed-optimized.
Plan as a dependency graph → execute independent tasks in parallel → synthesize. Saves both cost and time. Up to 3.7x faster, 6.7x cheaper than ReAct.
Most multi-step tasks contain hidden parallelism. "Compare weather in three cities" requires three independent lookups — there's no reason to do them one at a time. By expressing the plan as a dependency graph rather than a flat list, LLMCompiler discovers exactly which tasks can overlap.
The streaming trick adds another layer of speed: the planner doesn't even need to finish writing the full plan before execution begins. As soon as it emits a task with no dependencies, that task is already running. Planning and execution happen simultaneously.
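The streaming overlap can also be sketched in a few lines. `plan_stream` below is a pretend planner (an assumption for the sketch) that emits one task every 0.3 seconds, the way tokens stream out of an LLM; the executor launches each task the instant it appears, so task 1 is already running while tasks 2-4 are still being planned.

```python
# Sketch of streaming overlap between planning and execution.
# plan_stream and run_task are illustrative names, not a real API.
import asyncio

async def plan_stream():
    # Pretend LLM planner: emits (task_id, dependencies) every ~0.3s.
    for tid, deps in [(1, set()), (2, set()), (3, set()), (4, {1, 2, 3})]:
        await asyncio.sleep(0.3)
        yield tid, deps

async def run_task(tid, deps, done):
    for d in deps:
        await done[d].wait()          # wait only on this task's dependencies
    await asyncio.sleep(1.0)          # simulate the tool call (~1s)
    done[tid].set()
    print(f"task {tid} finished")

async def main():
    done, running = {}, []
    async for tid, deps in plan_stream():
        done[tid] = asyncio.Event()
        # Launch the moment the task is emitted: no waiting for the
        # planner to finish writing the rest of the plan.
        running.append(asyncio.create_task(run_task(tid, deps, done)))
    await asyncio.gather(*running)

asyncio.run(main())
```

With planning taking ~1.2s and each tool call ~1s, the whole run finishes in roughly 2.9s; a plan-then-execute-sequentially version of the same workload would take over 5s.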
Turn every request into a dependency graph. Fire independent tasks in parallel. Start executing before planning finishes. The result: dramatically faster and cheaper than sequential approaches.
LLMCompiler is a speed-and-cost-optimized evolution of Plan-and-Execute and ReWOO. Plan-and-Execute separates planning from execution but still runs steps sequentially. ReWOO reduces AI calls by using placeholders but executes tools one at a time. LLMCompiler adds the final piece: true parallel execution of independent tasks.
For tasks with lots of independent sub-components (like searching multiple sources), the speedup can be dramatic — nearly linear with the number of parallel tasks. For tasks that are inherently sequential, it gracefully falls back to sequential execution, behaving much like Plan-and-Execute.