Building the AI Agent Pipeline
In Part 1, I covered why we're building DIY Helper and the stack behind it. Now let's get into the piece I'm most proud of: the agent pipeline that turns "I want to build a deck" into a full project plan with building codes, step-by-step instructions, materials, cost estimates, and tutorial videos -- in under two minutes.
The Original 4-Phase Pipeline (and Why We Killed It)
The first version had four sequential phases, each making its own Claude API call:
Research -- search building codes, local codes, best practices
Design -- take research findings, produce steps + materials + tools
Sourcing -- check user inventory, search local stores for prices
Report -- compile everything into a formatted report
Each phase used Claude's tool-use to call external tools (web search, code lookups, store searches), then submitted structured results via a designated output tool. The pipeline worked. It was also slow and expensive.
The problem was serialization. Four sequential Claude API calls, four rounds of tool execution, four context windows' worth of tokens. Total wall time: roughly 4 minutes. Total API cost: around $0.08-0.12 per report.
Two insights killed this design. First, Research and Design don't need to be separate -- Claude can search codes and design the plan in a single pass with the right tools and prompts. Second, the Report phase was just reformatting existing data. It didn't need AI at all.
The 2-Phase Pipeline
The current architecture has two phases:
Phase 1 (Plan): A single Claude call with parallel tool execution that combines research and design. Claude searches building codes, local codes, and the web in one tool-use turn, then synthesizes everything into a structured plan.
Phase 2 (Report): Pure TypeScript. No Claude call. Deterministic template assembly from the plan data.
This cut wall time to under 2 minutes and reduced API costs by roughly 60%.
Phase 1: One Claude Call, Parallel Tools, Structured Output
The plan phase gives Claude five tools: search_building_codes, search_local_codes, web_search, search_project_videos, and calculate_wire_size. Plus one special tool: submit_plan_results.
The prompt is explicit about the execution contract:
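The production prompt isn't reproduced here; a paraphrased sketch of the contract it spells out (wording is assumed, not quoted):

```
You have five research tools and one output tool, submit_plan_results.
- Issue every research tool call you need in a SINGLE turn so they run in parallel.
- Do not research one topic at a time across multiple turns.
- When research is complete, call submit_plan_results exactly once with the full plan.
- Never return the plan as free text.
```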
This matters because of how Claude's tool-use works. When the model returns multiple tool_use blocks in a single response, we execute them concurrently with Promise.all:
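A minimal sketch of that dispatch step, with a hypothetical runTool() helper standing in for the real web-search and code-lookup integrations:

```typescript
// Simplified shapes; the real Anthropic SDK types are richer.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };

// Placeholder dispatcher: the production version routes to web search,
// building-code lookups, store searches, etc.
async function runTool(block: ToolUseBlock): Promise<string> {
  return `results for ${block.name}`;
}

async function executeToolBlocks(blocks: ToolUseBlock[]): Promise<ToolResult[]> {
  // Every tool_use block from one model turn runs concurrently, so
  // total latency is the slowest call, not the sum of all calls.
  return Promise.all(
    blocks.map(async (block) => ({
      type: "tool_result" as const,
      tool_use_id: block.id,
      content: await runTool(block),
    }))
  );
}
```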
Three requests fire simultaneously -- the slowest one determines latency, not the sum of all three.
Structured Output via Tool-Use
The submit_plan_results tool is the key architectural decision. It's not a "real" tool -- it doesn't call any external service. It's a JSON schema that forces Claude to return structured data through the tool-use mechanism instead of free text.
Here's a condensed look at the schema (14 required fields spanning research and design):
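The field names below are illustrative stand-ins, not the full 14-field production schema:

```typescript
// Condensed, hypothetical version of the output tool definition.
const submitPlanResults = {
  name: "submit_plan_results",
  description: "Submit the completed research and design plan. Call exactly once.",
  input_schema: {
    type: "object" as const,
    properties: {
      project_summary: { type: "string" },
      applicable_codes: { type: "array", items: { type: "string" } },
      steps: {
        type: "array",
        items: {
          type: "object",
          properties: {
            title: { type: "string" },
            instructions: { type: "string" },
          },
          required: ["title", "instructions"],
        },
      },
      materials: { type: "array", items: { type: "object" } },
      estimated_cost_usd: { type: "number" },
    },
    required: [
      "project_summary",
      "applicable_codes",
      "steps",
      "materials",
      "estimated_cost_usd",
    ],
  },
};
```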
Why tool-use instead of asking for JSON in a text response? Reliability. Claude's tool-use constrains output to the declared schema. No markdown preambles, no missing fields. The runner just reads block.input as a typed object -- zero parsing. When the runner detects the output tool, it captures the input and breaks the loop immediately.
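That capture-and-break logic can be sketched like this, again with simplified message shapes rather than the real SDK types:

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

function capturePlanOutput(blocks: ContentBlock[]): unknown | null {
  for (const block of blocks) {
    // The output tool short-circuits the agent loop: its input IS the plan,
    // already schema-shaped, so there is nothing to parse.
    if (block.type === "tool_use" && block.name === "submit_plan_results") {
      return block.input;
    }
  }
  // No output tool yet -- the caller executes the research tools
  // and continues the loop.
  return null;
}
```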
Inventory Pre-Fetching
One more trick: we don't wait for Phase 1 to finish before fetching the user's inventory. The database query fires in parallel with the plan phase setup:
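A sketch of that pattern, assuming a hypothetical fetchInventory() database helper:

```typescript
type InventoryItem = { name: string; quantity: number };

// Placeholder for the real database query.
async function fetchInventory(userId: string): Promise<InventoryItem[]> {
  return [{ name: "cordless drill", quantity: 1 }];
}

// Resolve with a fallback value if the promise rejects or the timeout fires,
// so a slow or failing query never blocks the pipeline.
function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  const timeout = new Promise<T>((resolve) => setTimeout(() => resolve(fallback), ms));
  return Promise.race([p.catch(() => fallback), timeout]);
}

// Fired alongside plan-phase setup; awaited later, just before prompt assembly.
const inventoryPromise = withTimeout(fetchInventory("user-123"), 3_000, []);
```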
The prefetch has a 3-second timeout and is non-fatal -- if the query is slow or fails, the pipeline continues without inventory data. The inventory gets injected into the user prompt so Claude factors owned items into its recommendations, and TypeScript cross-references after the plan phase to calculate savings.
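The cross-referencing step might look something like this; the shapes and matching logic are assumptions, not the production code:

```typescript
type Material = { name: string; unitCost: number; quantity: number; owned?: boolean };

// Mark plan materials the user already owns and tally the savings.
function applyInventory(
  materials: Material[],
  inventory: Set<string>
): { materials: Material[]; savings: number } {
  let savings = 0;
  const marked = materials.map((m) => {
    const owned = inventory.has(m.name.toLowerCase());
    if (owned) savings += m.unitCost * m.quantity;
    return { ...m, owned };
  });
  return { materials: marked, savings };
}
```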
Phase 2: Deterministic Report Assembly
The report phase is 300 lines of TypeScript that transforms PlanOutput into five ReportSection objects: Overview, Step-by-Step Plan, Materials & Tools, Cost Estimate, and Resources & Timeline.
No AI call. No token usage. No latency variance. The buildReport function runs in single-digit milliseconds.
The original report phase paid for a full Sonnet API call just to reformat data. Worse, Claude would occasionally hallucinate costs or reorder steps. Deterministic assembly eliminated both the cost and the reliability problem. The cost section calculates subtotals with a 10% contingency. The materials section groups by category and strikes through owned items. The resources section generates a weekend-by-weekend timeline. All predictable, testable, and fast.
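The cost section illustrates the pattern; PlanOutput and ReportSection here are simplified stand-ins for the real types:

```typescript
type PlanOutput = {
  materials: { name: string; unitCost: number; quantity: number; owned?: boolean }[];
};
type ReportSection = { title: string; body: string };

function buildCostSection(plan: PlanOutput): ReportSection {
  // Owned items contribute nothing to the subtotal.
  const subtotal = plan.materials
    .filter((m) => !m.owned)
    .reduce((sum, m) => sum + m.unitCost * m.quantity, 0);
  const contingency = subtotal * 0.1; // 10% buffer, per the report spec
  const lines = plan.materials.map((m) =>
    m.owned
      ? `~~${m.name}~~ (owned)`
      : `${m.name}: $${(m.unitCost * m.quantity).toFixed(2)}`
  );
  return {
    title: "Cost Estimate",
    body: [
      ...lines,
      `Subtotal: $${subtotal.toFixed(2)}`,
      `Contingency (10%): $${contingency.toFixed(2)}`,
      `Total: $${(subtotal + contingency).toFixed(2)}`,
    ].join("\n"),
  };
}
```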
SSE Streaming and the Heartbeat Problem
The pipeline streams progress to the client over Server-Sent Events. The route handler creates a ReadableStream, encodes events as data: {json}\n\n, and returns it with text/event-stream headers.
Five event types flow through the stream: agent_progress (phase, status, 0-100 progress), agent_complete (with the full report), agent_error, heartbeat, and done.
The heartbeat solves a real deployment problem. Vercel's serverless functions and CDN proxies will close connections that go quiet for too long. The plan phase can take 30-60 seconds during the Claude API call with nothing to send. A heartbeat every 15 seconds keeps the connection alive:
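A minimal sketch of the stream with its heartbeat timer; the event shapes follow the list above, but the helper names are assumptions:

```typescript
const encoder = new TextEncoder();

// SSE frame format: "data: {json}\n\n"
function encodeEvent(event: Record<string, unknown>): Uint8Array {
  return encoder.encode(`data: ${JSON.stringify(event)}\n\n`);
}

function createProgressStream(): ReadableStream<Uint8Array> {
  let heartbeat: ReturnType<typeof setInterval>;
  return new ReadableStream({
    start(controller) {
      // Every 15s, emit a heartbeat so proxies don't close the
      // connection while the long Claude call produces nothing.
      heartbeat = setInterval(() => {
        controller.enqueue(encodeEvent({ type: "heartbeat", ts: Date.now() }));
      }, 15_000);
    },
    cancel() {
      clearInterval(heartbeat);
    },
  });
}
```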
The maxDuration is set to 120 seconds at the route level -- enough headroom for the plan phase plus report assembly, with margin for retries.
The Numbers
Before and after the optimization:
Wall time: ~4 minutes down to <2 minutes
Claude API calls per report: 4 down to 1
Estimated API cost per report: ~$0.08-0.12 down to ~$0.02-0.05
Report phase latency: 5-15 seconds (Sonnet call) down to <10 milliseconds (TypeScript)
Reliability: Eliminated report-phase hallucinations entirely
The old phase files -- research.ts, design.ts, sourcing.ts, and the AI-based report.ts -- are still in the codebase. They work fine. We just don't call them anymore.
What's Next
The pipeline now produces good plans fast and cheap. But it treats every user the same -- a licensed electrician and a first-time homeowner get identical responses. In the next post, we'll dig into the intelligence layer: how intent classification, skill profiling, and prompt calibration make the AI read the room before it speaks.