Let AI reason about the problem and write the logic. Let a code interpreter handle the math. Each does what it's best at.
AI is surprisingly good at understanding problems and mapping out solution steps. It's surprisingly bad at arithmetic. Ask it to multiply 847 × 294 (the correct product is 249,018) and it might say 248,918... or 249,218... or something else that looks plausible but is wrong.
Program of Thoughts gives each part of the job to the component that does it best. The AI reads the problem, reasons about the approach, and writes a program with meaningful variable names and clear step-by-step comments. A code interpreter then executes that program exactly as written: no arithmetic slips, no "carry the one" failures, and no rounding beyond what the program's number type imposes.
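To make the split concrete, here is a minimal sketch of the execution half. It assumes the model's output is already available as a string; `generated_program`, `run_program`, and the convention of storing the result in a variable called `answer` are hypothetical choices for illustration, not part of any particular library.

```python
# Stand-in for text returned by a model; no real model call is made here.
generated_program = """
# Step 1: record the two factors from the question
first_factor = 847
second_factor = 294

# Step 2: multiply them
answer = first_factor * second_factor
"""


def run_program(source: str) -> object:
    """Execute program text and return whatever it bound to `answer`."""
    namespace: dict = {}
    # exec() is unsafe for untrusted code; a real system would sandbox this step.
    exec(source, namespace)
    return namespace["answer"]


result = run_program(generated_program)
print(result)  # 249018
```

The model only has to get the two factors and the operation right. The multiplication itself is done by the interpreter.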
This composition builds on:
Let Code Do It, Think Step by Step
Program of Thoughts combines step-by-step reasoning (expressed as code comments and variable names) with code execution for the computation, disentangling the reasoning from the calculation.
"$1,000 at 5% for 10 years with compound interest..."
"Year 1: $1,050. Year 2: $1,102.50. Year 3: $1,157.63..."
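Every step is rounded to the nearest cent, so by year 5 small rounding errors have started to compound. By year 10, the answer has drifted away from the exact value, and any single arithmetic slip along the way makes it worse.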
AI writes: 1000 * (1 + 0.05) ** 10
Interpreter returns: $1,628.89
One expression, evaluated at full machine precision and rounded once at the end. No accumulated rounding errors.
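The contrast can be checked with a short sketch. It uses Python's `decimal` module with half-up rounding to cents, which is an assumption made here to keep the per-year rounding explicit; the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

principal = Decimal("1000")
annual_rate = Decimal("0.05")
years = 10

# Program of Thoughts style: one expression, rounded once at the end.
direct_balance = (principal * (1 + annual_rate) ** years).quantize(
    CENT, rounding=ROUND_HALF_UP
)

# Step-by-step style: round to the nearest cent after every year.
stepwise_balance = principal
for _ in range(years):
    stepwise_balance = (stepwise_balance * (1 + annual_rate)).quantize(
        CENT, rounding=ROUND_HALF_UP
    )

print(direct_balance)    # 1628.89
print(stepwise_balance)  # 1628.91
```

Under these assumptions the year-by-year version ends up two cents off from rounding alone, before counting any arithmetic slips a model might make along the way.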
Question: "A store has a 25% off sale. You have an additional 10% member discount applied after the sale price. If the original price is $240, what do you pay?"
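A program for this question might look like the following sketch. The names `original_price`, `after_sale`, and `final_price` come from the text; `sale_discount` and `member_discount` are added here for illustration.

```python
# Facts taken from the question
original_price = 240.00
sale_discount = 0.25     # 25% off sale
member_discount = 0.10   # extra 10% for members, applied after the sale price

# Step 1: apply the sale discount to the original price
after_sale = original_price * (1 - sale_discount)

# Step 2: apply the member discount to the sale price
final_price = after_sale * (1 - member_discount)

print(round(final_price, 2))  # 162.0
```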
Notice how the variable names tell the story: original_price, after_sale, final_price. The code is the reasoning.
Question: "What is the sum of the squares of the integers from 1 to 100?"
Chain-of-thought would struggle here — manually squaring and summing 100 numbers is error-prone. Program of Thoughts handles it effortlessly:
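A minimal sketch of such a program, reading the question as the sum of n² for every integer n from 1 to 100 (variable names are illustrative):

```python
# Step 1: square every integer from 1 to 100 inclusive
squares = [n * n for n in range(1, 101)]

# Step 2: add the squares together
sum_of_squares = sum(squares)

print(sum_of_squares)  # 338350
```

The result agrees with the closed form n(n + 1)(2n + 1) / 6 for n = 100, which gives a quick independent check.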
No human or AI could reliably do this mentally. But expressing it as code makes it trivial.
AI and code interpreters have complementary strengths. AI understands natural language, grasps context, and knows how to approach problems. Code interpreters execute calculations perfectly, handle iteration, and never make arithmetic errors.
The key insight of Program of Thoughts is disentangling these two skills. By writing code with meaningful variable names and comments, the AI shows its reasoning just as clearly as chain-of-thought — while getting perfect computation for free.
AI reasons about the problem and writes code with meaningful variable names. A code interpreter executes it perfectly. You get the best of both: clear reasoning and flawless computation.
Program of Thoughts is closely related to Let Code Do It (PAL), which also has AI write code for execution. The difference is emphasis: Program of Thoughts focuses on the code as a reasoning trace — the comments and variable names are as important as the computation, making the AI's thinking visible and auditable.
It pairs naturally with Self-Consistency — generate multiple programs for the same problem, execute all of them, and take the majority answer. This combination can add another 2–6% accuracy on top of Program of Thoughts alone.
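The voting step can be sketched as follows. The candidate programs are hard-coded stand-ins for several model samples, and `run_program`, `majority_answer`, and the `answer` variable convention are hypothetical names chosen for this illustration.

```python
from collections import Counter

# Stand-ins for several sampled programs answering the same question.
candidate_programs = [
    "answer = sum(n * n for n in range(1, 101))",
    "answer = 100 * 101 * 201 // 6",
    "answer = sum(n * n for n in range(1, 100))",  # off-by-one bug
]


def run_program(source):
    """Execute program text and return whatever it bound to `answer`."""
    namespace = {}
    # exec() is unsafe for untrusted code; sandbox this in a real system.
    exec(source, namespace)
    return namespace["answer"]


def majority_answer(programs):
    """Run every program and return the most common result."""
    answers = []
    for source in programs:
        try:
            answers.append(run_program(source))
        except Exception:
            continue  # a crashing program simply does not get a vote
    if not answers:
        raise ValueError("no program produced an answer")
    return Counter(answers).most_common(1)[0][0]


print(majority_answer(candidate_programs))  # 338350
```

Two of the three candidates agree, so the buggy third program is outvoted.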