Building an LLM Agent from Scratch
There’s a lot of framework churn in the agent space right now. LangChain, LlamaIndex, AutoGen, CrewAI — they all have opinions about how to structure an agent. Before picking one, it’s worth understanding what they’re all implementing under the hood. It’s simpler than you’d think.
The basic loop
An LLM agent is just a loop:
1. Give the model a system prompt, a list of tools it can call, and the conversation history
2. The model either produces a final response or calls a tool
3. If it called a tool, run it and append the result to the conversation
4. Go to step 1
That’s it. The magic is entirely in steps 1 and 2 — writing a system prompt that makes the model reason well, and defining tools the model can actually use reliably.
Here’s a minimal implementation in Python using the Anthropic SDK:
import anthropic
import json

client = anthropic.Anthropic()

# Define tools the model can call
tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file at a given path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute or relative file path"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file, creating it if it doesn't exist.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["path", "content"]
        }
    }
]

def run_tool(name: str, inputs: dict) -> str:
    """Dispatch a tool call by name and return its result as a string."""
    if name == "read_file":
        with open(inputs["path"]) as f:
            return f.read()
    if name == "write_file":
        with open(inputs["path"], "w") as f:
            f.write(inputs["content"])
        # len() of a str counts characters, not bytes
        return f"Wrote {len(inputs['content'])} characters to {inputs['path']}"
    raise ValueError(f"Unknown tool: {name}")

def agent(user_message: str, system: str = "You are a helpful assistant.") -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages,
        )
        # Append assistant's response to history
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Extract the final text response
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        if response.stop_reason == "tool_use":
            # Run each tool call and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = run_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            # Append tool results as a user turn and continue
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (for example "max_tokens") would otherwise
        # spin the loop without making progress, so fail loudly instead.
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
A few things worth noting:
- The conversation history is the entire state of the agent. There’s no hidden memory — just the message list.
- Tool results go back in as a user turn with type: "tool_result". The model sees this and continues.
- stop_reason == "tool_use" means the model wants to call tools. "end_turn" means it’s done.
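To make the first two points concrete, here is roughly what the message list looks like after a single tool call. The file name, its contents, and the id toolu_example_1 are made up for illustration; real ids are generated by the API, and the SDK hands back content blocks as objects rather than the plain dicts shown here.

```python
# Illustrative history after one read_file call (all values are hypothetical).
messages = [
    # 1. The original task
    {"role": "user", "content": "What does notes.txt say?"},
    # 2. The model's turn: some text plus a tool_use block
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "Let me read that file."},
            {
                "type": "tool_use",
                "id": "toolu_example_1",
                "name": "read_file",
                "input": {"path": "notes.txt"},
            },
        ],
    },
    # 3. The tool result, sent back as a user turn
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_example_1",  # must match the tool_use id
                "content": "Buy milk.",
            }
        ],
    },
]

tool_use_block = messages[1]["content"][1]
tool_result_block = messages[2]["content"][0]
```

Everything the agent "knows" at the next step is in this list, which is why trimming or rewriting it changes the agent's behavior.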
Where it breaks
The loop above works for simple cases. Here’s where it starts to fall apart:
Context length
Every tool call and result gets appended to the message list. Long-running agents hit the context limit. You need some form of summarization or sliding window — there’s no magic solution.
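A sliding window can be sketched as follows. This is one simple policy rather than a recommendation: it assumes the single-task loop above, where every message after the first alternates between an assistant turn and a user turn of tool results. The names trim_history and max_pairs are invented for this example.

```python
def trim_history(messages: list[dict], max_pairs: int) -> list[dict]:
    """Keep the original task plus the most recent assistant/tool-result pairs."""
    head, tail = messages[:1], messages[1:]
    start = max(0, len(tail) - 2 * max_pairs)
    # Even offsets in `tail` are assistant turns. Starting the window on one
    # keeps every tool_result next to the tool_use it answers.
    start -= start % 2
    return head + tail[start:]


# Toy history: strings stand in for real content blocks.
history = [
    {"role": "user", "content": "task"},
    {"role": "assistant", "content": "tool call 1"},
    {"role": "user", "content": "tool result 1"},
    {"role": "assistant", "content": "tool call 2"},
    {"role": "user", "content": "tool result 2"},
    {"role": "assistant", "content": "tool call 3"},
    {"role": "user", "content": "tool result 3"},
]
trimmed = trim_history(history, max_pairs=2)
```

In the loop, this would be applied to messages just before each client.messages.create call. The cost is that the model forgets whatever the dropped turns contained, which is where summarization comes in.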
Tool reliability
Models are surprisingly bad at using tools when the input schema is ambiguous or the description is underspecified. Half the work of building a good agent is writing good tool descriptions. Treat them like API documentation.
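As an illustration, compare the read_file definition from the minimal implementation with a more documentation-like version. The wording below is hypothetical, and it describes behavior the tool would have to actually implement: the 'ERROR:' prefix and the UTF-8 assumption are invented for this example.

```python
# Terse: leaves the model guessing about paths, encodings, and failures.
read_file_terse = {
    "name": "read_file",
    "description": "Read the contents of a file at a given path.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

# Closer to API docs: says when to use it, what comes back, and how it fails.
read_file_documented = {
    "name": "read_file",
    "description": (
        "Read a UTF-8 text file and return its full contents as a string. "
        "Use this before editing a file so you can see its current contents. "
        "If the file does not exist, the result is a message starting with 'ERROR:'. "
        "Do not use this for binary files."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path relative to the project root, for example 'src/main.py'.",
            }
        },
        "required": ["path"],
    },
}
```

The schema is identical in shape; only the prose changed. That prose is the only documentation the model ever sees for the tool.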
Infinite loops
Nothing stops the model from calling the same tool in a loop. You need a maximum iteration count, full stop.
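MAX_ITERS = 20

def agent(user_message: str, system: str = "You are a helpful assistant.") -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_ITERS):
        # Same loop body as before: call the model, return on "end_turn",
        # run tools and append their results on "tool_use".
        ...
    return "Agent hit iteration limit without completing."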
Parallelism
The loop above is strictly sequential: it runs one tool call, waits for the result, then runs the next. For tasks that involve many independent operations (read 10 files, make 3 API calls), that is wasted time. The model will sometimes emit several tool_use blocks in a single response; when it does, you can run those calls concurrently, as long as you return every result in a single user turn with one tool_result block per tool_use id.
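One way to do this is a thread pool, sketched below under a few assumptions: the tools are I/O-bound and safe to run at the same time, and run_tool here is a stand-in so the example runs on its own. The helper name run_tools_concurrently and the SimpleNamespace blocks are made up for the example; in the real loop the blocks would come from response.content.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_tool(name: str, inputs: dict) -> str:
    # Stand-in for the real run_tool so this sketch is self-contained.
    return f"{name} ok: {inputs}"


def run_tools_concurrently(content_blocks, max_workers: int = 8) -> list[dict]:
    """Run every tool_use block in a response and build the tool_result list."""
    calls = [b for b in content_blocks if b.type == "tool_use"]
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `calls`.
        outputs = list(pool.map(lambda b: run_tool(b.name, b.input), calls))
    return [
        {"type": "tool_result", "tool_use_id": b.id, "content": out}
        for b, out in zip(calls, outputs)
    ]


# Fake response content with two independent tool calls.
blocks = [
    SimpleNamespace(type="text", text="Reading both files."),
    SimpleNamespace(type="tool_use", id="a", name="read_file", input={"path": "x.txt"}),
    SimpleNamespace(type="tool_use", id="b", name="read_file", input={"path": "y.txt"}),
]
tool_results = run_tools_concurrently(blocks)
```

The returned list drops into the existing loop as the content of the next user message. Note that an exception in any tool propagates out of pool.map, so tools that can fail would need their errors caught and reported per call.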
What frameworks add
Once you understand the basic loop, frameworks start making sense as ergonomic wrappers around it. They typically add:
- Structured tool output parsing — so your tool results come back as typed objects
- Memory systems — vector stores, conversation summarization, entity extraction
- Streaming — showing partial output as the model generates it
- Observability — tracing tool calls, latencies, token counts
Whether you need these depends entirely on what you’re building. For most agent prototypes, the loop above is enough to get started. Add complexity when you hit a specific wall, not before.
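To give a sense of scale for the observability item, here is a hand-rolled sketch that times each tool call. The names traced and trace_log, and the stand-in run_tool, are hypothetical; real frameworks record far more (token counts, nested spans) and export it somewhere useful.

```python
import time
from functools import wraps


def traced(tool_fn, trace_log: list):
    """Wrap a run_tool-style function so each call appends a timing record."""
    @wraps(tool_fn)
    def wrapper(name: str, inputs: dict) -> str:
        start = time.perf_counter()
        ok = False
        try:
            result = tool_fn(name, inputs)
            ok = True
            return result
        finally:
            # Runs on success and on failure, so failed calls are logged too.
            trace_log.append({
                "tool": name,
                "ok": ok,
                "seconds": time.perf_counter() - start,
            })
    return wrapper


def run_tool(name: str, inputs: dict) -> str:
    # Stand-in for the real run_tool so this sketch is self-contained.
    return f"ran {name}"


trace_log: list[dict] = []
run_tool_traced = traced(run_tool, trace_log)
output = run_tool_traced("read_file", {"path": "notes.txt"})
```

A dozen lines like this are often enough for a prototype, which fits the advice above: reach for a framework's tracing once this stops being sufficient.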
Next steps
The natural extension is multi-agent systems: one coordinator model that delegates to specialist subagents. The pattern is the same loop, but some “tools” are just calls to other model instances. I’ll write about that next.