I’ve been building AI agents non-stop since November 2024, and I’d like to share some of the things I’ve learned and how I approach building them.
I’m the kind of person who prefers doing things from scratch, especially when working with technologies that are new to me. That means I don’t rely on popular libraries—other than the standard SDKs from LLM providers. So, if you’re expecting fancy frameworks, there’s nothing to see here. This is all about pure logic.
My agent architecture: IO <-> Agent Router <-> Business-Specific Agent <-> LLM Service <-> Providers
IO: This is the standard input/output layer, handling request validation, authentication, etc.
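To make the hand-off concrete, here is a rough sketch of what the IO layer does before anything reaches an agent. The function names (validate_token, handle_request) and the placeholder checks are illustrative, not my actual code:

```python
# Rough sketch of the IO layer: validate and authenticate, then hand off to the router.
# validate_token and route are stand-ins for the real pieces described below.
def validate_token(token: str) -> bool:
    return token == "secret-token"  # placeholder auth check

def route(payload: dict) -> str:
    return f"routed '{payload['message']}' for project status '{payload['status']}'"

def handle_request(payload: dict) -> str:
    if not validate_token(payload.get("token", "")):
        raise PermissionError("authentication failed")
    if "message" not in payload or "status" not in payload:
        raise ValueError("missing required fields")
    return route(payload)

print(handle_request({"token": "secret-token", "message": "plan my project", "status": "new"}))
```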
Agent Router:
Depending on the business logic, I either route requests to agents based on project/task status or, in some cases, let a primary agent (e.g., a planner agent) handle routing through tool calls.
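As a rough sketch, status-based routing can be as simple as a lookup table; the agent names here are illustrative:

```python
# Illustrative sketch of status-based routing (agent names are made up).
# Tool-call-based routing works similarly: the planner agent picks the target agent instead.
ROUTES = {
    "planning": "planner_agent",
    "in_progress": "execution_agent",
    "review": "review_agent",
}

def route_by_status(project_status: str) -> str:
    # Fall back to the planner when the status is unknown.
    return ROUTES.get(project_status, "planner_agent")

print(route_by_status("in_progress"))  # -> execution_agent
```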
Business-Specific Agent:
This is where the agent’s core logic lives. It iterates (loops) until the task is completed or until it needs to send a message back to the user. In this layer, I define which model strategies to use, which prompts and tools to provide, and what context to maintain across iterations. To avoid runaway loops, I also enforce a maximum iteration limit; if the agent reaches it, it pauses and prompts the user for approval before continuing.
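A stripped-down sketch of that loop is below. The helpers call_llm and run_tool are stand-ins for the LLM Service and the tool layer, not my real code:

```python
# A minimal sketch of the agent loop with a safety iteration limit.
# call_llm and run_tool are stand-ins for the LLM Service and the tool layer.
MAX_ITERATIONS = 10

def call_llm(context: list[dict]) -> dict:
    # Stub: a real implementation would go through the LLM Service.
    return {"type": "message", "content": "done"}

def run_tool(name: str, args: dict) -> str:
    return f"result of {name}"

def run_agent(task: str) -> str:
    context = [{"role": "user", "content": task}]
    for _ in range(MAX_ITERATIONS):
        reply = call_llm(context)
        if reply["type"] == "tool_call":
            # Feed the tool result back into the context and keep looping.
            context.append({"role": "tool", "content": run_tool(reply["name"], reply["args"])})
            continue
        return reply["content"]  # task finished, or a message for the user
    # Safety valve: pause and ask the user before continuing.
    return "Iteration limit reached; waiting for user approval to continue."

print(run_agent("summarize the project status"))
```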
Context Handling:
For agents that primarily work with tools, I usually provide the 2-3 most recent messages in raw form and prior messages in summarized form to keep context efficient. Agents also have access to a context set/get tool that allows them to save multi-step work and retrieve it later when needed. This is crucial for handling workflows that span multiple steps or require persistent memory during a task.
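Here is a simplified sketch of both ideas, with summarize standing in for whatever summarization step is used and a plain dict standing in for the real context store:

```python
# Sketch of trimming context: keep the last few messages raw, summarize the rest.
# summarize() is a stand-in; in practice this could be another LLM call.
def summarize(messages: list[str]) -> str:
    return f"[summary of {len(messages)} earlier messages]"

def build_context(messages: list[str], keep_recent: int = 3) -> list[str]:
    if len(messages) <= keep_recent:
        return messages
    return [summarize(messages[:-keep_recent])] + messages[-keep_recent:]

# A simple set/get store the agent can call as a tool to persist multi-step work.
_agent_memory: dict[str, str] = {}

def context_set(key: str, value: str) -> None:
    _agent_memory[key] = value

def context_get(key: str) -> str | None:
    return _agent_memory.get(key)

print(build_context([f"msg {i}" for i in range(6)]))
```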
LLM Service:
Acts as a router for different LLM providers. Here, I handle API request logging, error logging, token usage tracking, and retries. Based on error codes, I might retry with the same model strategy or switch to a different one. Some agents support multiple models, so retrying with alternate strategies is common.
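A minimal sketch of the fallback logic, with call_provider as a stand-in for the real provider SDK calls and a simulated rate-limit error:

```python
# Sketch of retry-with-fallback across model strategies (illustrative, not the real service).
import logging

logging.basicConfig(level=logging.INFO)

class ProviderError(Exception):
    def __init__(self, code: int):
        super().__init__(f"provider error {code}")
        self.code = code

def call_provider(strategy: str, prompt: str) -> str:
    # Stub: a real implementation would call the provider SDK and track token usage.
    if strategy == "gemini-flash":
        raise ProviderError(429)  # simulate a rate limit
    return f"{strategy}: response to {prompt!r}"

def complete(prompt: str, strategies: list[str]) -> str:
    # Depending on the error code, you might retry the same strategy before moving on;
    # this sketch simply falls through to the next one.
    for strategy in strategies:
        try:
            return call_provider(strategy, prompt)
        except ProviderError as err:
            logging.warning("strategy %s failed with %s, trying next", strategy, err.code)
    raise RuntimeError("all model strategies failed")

print(complete("hello", ["gemini-flash", "gpt-4o-mini"]))
```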
Providers:
I primarily use Google Gemini and OpenAI models because they’re free during development (via Gemini’s free tier and the GitHub Models API). Claude models are hard to beat for coding agents, but other than that, I stick with Google and OpenAI models.
Mock Provider:
I also use a mock provider that returns predefined responses based on the user’s message count or specific keywords. This is extremely helpful for developing workflow agents—you get predictable, fast responses and can focus on the application side without waiting for actual LLM outputs. Highly recommended.
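A mock provider can be as simple as this sketch (the class and field names are illustrative):

```python
# Sketch of a mock provider keyed on message count or keywords.
class MockProvider:
    def __init__(self, scripted: list[str], keyword_replies: dict[str, str]):
        self.scripted = scripted
        self.keyword_replies = keyword_replies
        self.calls = 0

    def complete(self, message: str) -> str:
        # Keyword matches win; otherwise walk through the scripted replies in order.
        for keyword, reply in self.keyword_replies.items():
            if keyword in message.lower():
                return reply
        reply = self.scripted[min(self.calls, len(self.scripted) - 1)]
        self.calls += 1
        return reply

mock = MockProvider(
    scripted=["First I will plan the task.", "Task complete."],
    keyword_replies={"error": "Something went wrong, retrying."},
)
print(mock.complete("start the workflow"))
print(mock.complete("there was an error"))
```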
Human Provider:
There’s also a “manual provider” where I send responses manually through a special route, effectively acting in place of the AI agent. This is useful for reviewing tool call results and the context provided to the LLM. Building AI agents is all about context and tools—if the tools you give your agents aren’t effective, they’ll fail no matter which model you use. This one’s also highly recommended.
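Conceptually it’s just a provider whose responses come from a human instead of a model. Here is a simplified, queue-based sketch; in my setup the response actually arrives via a dedicated route, and the names below are illustrative:

```python
# Sketch of a human-in-the-loop provider: the "response" is supplied manually
# (in practice, posted through a special route) instead of coming from a model.
import queue

class HumanProvider:
    def __init__(self):
        self._responses: "queue.Queue[str]" = queue.Queue()

    def submit_response(self, text: str) -> None:
        # In practice this would be called by the dedicated HTTP route.
        self._responses.put(text)

    def complete(self, context: list[dict]) -> str:
        # Show exactly what the LLM would have seen, then wait for the manual reply.
        print("context sent to 'model':", context)
        return self._responses.get()

human = HumanProvider()
human.submit_response("Call the search tool with query 'Q3 revenue'.")
print(human.complete([{"role": "user", "content": "find Q3 revenue"}]))
```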
Evals:
I treat evals like test cases in traditional software engineering. I group eval sets into levels: easy, difficult, edge cases, etc. Each eval defines the task for the agent, success criteria, expected scores, and the minimum/maximum tool calls. I don’t do anything fancy here—I run them manually to see if the agents behave as expected. (There might be better processes I haven’t explored yet.)
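A sketch of how such an eval case could be structured; the field names are illustrative and simply mirror what I track:

```python
# Sketch of an eval case and a simple pass/fail check.
from dataclasses import dataclass

@dataclass
class EvalCase:
    level: str            # "easy", "difficult", "edge"
    task: str
    success_criteria: str
    expected_score: float
    min_tool_calls: int
    max_tool_calls: int

def check(case: EvalCase, score: float, tool_calls: int) -> bool:
    return (score >= case.expected_score
            and case.min_tool_calls <= tool_calls <= case.max_tool_calls)

case = EvalCase(
    level="easy",
    task="Create a project plan from the user brief",
    success_criteria="Plan contains milestones and owners",
    expected_score=0.8,
    min_tool_calls=1,
    max_tool_calls=4,
)
print(check(case, score=0.9, tool_calls=2))  # True
```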
This is the way I build AI agents today.
Thank you.