Building an LLM Agent from Scratch

There’s a lot of framework churn in the agent space right now. LangChain, LlamaIndex, AutoGen, CrewAI — they all have opinions about how to structure an agent. Before picking one, it’s worth understanding what they’re all implementing under the hood. It’s simpler than you’d think.

The basic loop

An LLM agent is just a loop:

1. Give the model a system prompt, a list of tools it can call, and the conversation history
2. The model either produces a final response or calls a tool
3. If it called a tool, run it and append the result to the conversation
4. Go to step 1

That’s it. The magic is almost entirely in what goes into step 1: writing a system prompt that makes the model reason well, and defining tools the model can actually use reliably.

Here’s a minimal implementation in Python using the Anthropic SDK:

import anthropic

client = anthropic.Anthropic()

# Define tools the model can call
tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file at a given path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute or relative file path"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file, creating it if it doesn't exist.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["path", "content"]
        }
    }
]

def run_tool(name: str, inputs: dict) -> str:
    if name == "read_file":
        with open(inputs["path"]) as f:
            return f.read()
    if name == "write_file":
        with open(inputs["path"], "w") as f:
            f.write(inputs["content"])
        return f"Wrote {len(inputs['content'])} characters to {inputs['path']}"
    raise ValueError(f"Unknown tool: {name}")

def agent(user_message: str, system: str = "You are a helpful assistant.") -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages,
        )

        # Append assistant's response to history
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Join all text blocks into the final response
            return "".join(
                block.text for block in response.content if block.type == "text"
            )

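        if response.stop_reason == "tool_use":
            # Run each tool call and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    try:
                        result, is_error = run_tool(block.name, block.input), False
                    except Exception as exc:
                        # Report the failure to the model instead of crashing the loop
                        result, is_error = f"{type(exc).__name__}: {exc}", True
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                        "is_error": is_error,
                    })
            # Append tool results as a user turn and continue
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (e.g. "max_tokens") would otherwise loop forever
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")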

A few things worth noting:

The conversation history is the entire state of the agent. There’s no hidden memory — just the message list.
Tool results go back in as a user turn with type: "tool_result". The model sees this and continues.
stop_reason == "tool_use" means the model wants to call tools. "end_turn" means it’s done.
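
To make the first two points concrete, here is roughly what the message list looks like after one tool round trip. This is only an illustration: the id "toolu_01", the file name, and the text are made up, and the assistant content is shown as plain dicts rather than the SDK's block objects.

```python
# Illustrative history after one read_file round trip.
# The id "toolu_01" and the file contents are made-up placeholders.
messages = [
    {"role": "user", "content": "What does notes.txt say?"},
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "I'll read the file."},
            {
                "type": "tool_use",
                "id": "toolu_01",
                "name": "read_file",
                "input": {"path": "notes.txt"},
            },
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_01",  # must match the tool_use id above
                "content": "Buy milk.",
            }
        ],
    },
]
```

Everything the model knows on its next turn is in that list, which is why the tool_use_id pairing matters: it is the only link between a call and its result.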

Where it breaks

The loop above works for simple cases. Here’s where it starts to fall apart:

Context length

Every tool call and result gets appended to the message list. Long-running agents hit the context limit. You need some form of summarization or sliding window — there’s no magic solution.
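
A sliding window can be sketched in a few lines. The helper below is hypothetical (the name trim_history and the keep_last parameter are mine, not part of any SDK) and assumes the history alternates user and assistant turns, as it does in the loop above. It keeps the original task plus the most recent messages, and starts the kept tail on an assistant turn so that no tool_result is left without the tool_use it answers.

```python
def trim_history(messages: list, keep_last: int = 6) -> list:
    """Keep the first user message plus at most the last `keep_last` messages.

    Hypothetical helper: assumes messages alternate user/assistant turns.
    """
    if len(messages) <= keep_last + 1:
        return list(messages)

    start = len(messages) - keep_last
    # Start the tail on an assistant turn. A tail that began with a user
    # tool_result turn would reference a tool_use that was just dropped.
    while start < len(messages) and messages[start]["role"] != "assistant":
        start += 1

    return [messages[0]] + messages[start:]
```

Calling this on the message list before each model request bounds the history size, at the cost of the model forgetting everything in the dropped turns. A summarization approach would replace those turns with a short summary instead of discarding them.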

Tool reliability

Models are surprisingly bad at using tools when the input schema is ambiguous or the description is underspecified. Half the work of building a good agent is writing good tool descriptions. Treat them like API documentation.
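
As an illustration of the difference, here is the same hypothetical log-search tool defined twice. The tool, its parameter names, and the described behavior are invented for this example; the point is only how much guesswork the second version removes.

```python
# Hypothetical tool, defined twice with the same schema shape as the tools list above.

# Underspecified: what is "q"? What does "n" limit? What comes back?
vague_tool = {
    "name": "search",
    "description": "Search logs.",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}, "n": {"type": "integer"}},
        "required": ["q"],
    },
}

# Documented like an API: behavior, return value, and every parameter spelled out.
specific_tool = {
    "name": "search_logs",
    "description": (
        "Search application logs for lines containing a substring. "
        "Returns at most `limit` matching lines, newest first. "
        "Returns the string 'NO MATCHES' if nothing is found."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Case-sensitive substring to look for, e.g. 'TimeoutError'",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of lines to return (1-100)",
                "minimum": 1,
                "maximum": 100,
            },
        },
        "required": ["query", "limit"],
    },
}
```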

Infinite loops

Nothing stops the model from calling the same tool in a loop. You need a maximum iteration count, full stop.

MAX_ITERS = 20

def agent(user_message: str, system: str = "You are a helpful assistant.") -> str:
    messages = [{"role": "user", "content": user_message}]
    iters = 0

    while iters < MAX_ITERS:
        iters += 1
        # ... rest of loop
    
    return "Agent hit iteration limit without completing."
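
An iteration cap stops the loop eventually, but it will not notice that the model has made the identical call five times in a row. One possible complement, sketched here with a hypothetical RepeatGuard helper (not part of any SDK), is to count identical tool calls and push back once a threshold is crossed.

```python
import json
from collections import Counter


class RepeatGuard:
    """Counts identical (tool name, inputs) calls. Hypothetical helper."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def check(self, name: str, inputs: dict) -> bool:
        """Record a call; return True while it is within the allowed repeat count."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call
        key = (name, json.dumps(inputs, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats
```

In the loop, one option when check returns False is to skip the tool and send back a result telling the model it has already made this exact call and should try something else. Whether that is better than simply stopping depends on the task.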

Parallelism

The loop above runs tool calls strictly sequentially: one tool call, wait for the result, next tool call. For tasks that involve many independent operations (read 10 files, make 3 API calls), you can run the tool_use blocks from a single response concurrently. The model will sometimes naturally emit multiple tool calls in one response; make sure you run all of them and return every result in a single user turn, with one tool_result block per tool_use_id.
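
A minimal sketch of that, using a thread pool from the standard library. The function name run_tools_concurrently and the max_workers default are my own choices; it takes the run_tool function as a parameter and assumes each block has the id, name, and input attributes used in the loop above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tools_concurrently(tool_use_blocks, run_tool, max_workers: int = 8) -> list:
    """Run every tool_use block in a thread pool; return tool_result dicts in input order."""

    def run_one(block) -> dict:
        try:
            content, is_error = run_tool(block.name, block.input), False
        except Exception as exc:
            # A failed tool becomes an error result rather than aborting the batch
            content, is_error = f"{type(exc).__name__}: {exc}", True
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": content,
            "is_error": is_error,
        }

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as its inputs
        return list(pool.map(run_one, tool_use_blocks))
```

The returned list can be appended as the content of a single user message. Threads suit I/O-bound tools such as file reads and HTTP calls; tools with side effects on shared state (two writes to the same file, say) are not safe to run this way without extra coordination.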

What frameworks add

Once you understand the basic loop, frameworks start making sense as ergonomic wrappers around it. They typically add:

Structured tool output parsing — so your tool results come back as typed objects
Memory systems — vector stores, conversation summarization, entity extraction
Streaming — showing partial output as the model generates it
Observability — tracing tool calls, latencies, token counts

Whether you need these depends entirely on what you’re building. For most agent prototypes, the loop above is enough to get started. Add complexity when you hit a specific wall, not before.
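
To give a sense of how thin some of these layers can be, here is a rough sketch of tool-call tracing without a framework. The traced wrapper and the shape of its log entries are hypothetical; it wraps any function with the run_tool(name, inputs) signature used earlier.

```python
import time


def traced(run_tool, log: list):
    """Wrap a run_tool(name, inputs) function so every call is recorded in `log`."""

    def wrapper(name: str, inputs: dict) -> str:
        start = time.perf_counter()
        error = None
        try:
            return run_tool(name, inputs)
        except Exception as exc:
            error = repr(exc)
            raise  # tracing should not change behavior, so re-raise
        finally:
            # Runs on both success and failure
            log.append({
                "tool": name,
                "inputs": inputs,
                "seconds": time.perf_counter() - start,
                "error": error,
            })

    return wrapper
```

Swapping run_tool for traced(run_tool, log) in the loop gives a per-call record of latency and failures. A real observability layer would add token counts and model latency on top, but the idea is the same.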

Next steps

The natural extension is multi-agent systems: one coordinator model that delegates to specialist subagents. The pattern is the same loop, but some “tools” are just calls to other model instances. I’ll write about that next.
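
As a preview, here is a rough sketch of what "a tool that is another model instance" could look like. Everything here is an assumption for illustration: make_delegate_tool is a hypothetical helper, and agent_fn stands in for any function with the same (user_message, system) signature as the agent function above.

```python
from typing import Callable


def make_delegate_tool(name: str, system: str, agent_fn: Callable[[str, str], str]):
    """Return (tool definition, handler) that exposes a subagent as a tool.

    Hypothetical helper: `agent_fn(user_message, system)` runs a full agent loop
    and returns its final text.
    """
    definition = {
        "name": name,
        "description": (
            f"Delegate a self-contained task to the {name} subagent "
            "and return its final answer."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "task": {
                    "type": "string",
                    "description": "Full task description; the subagent sees nothing else",
                }
            },
            "required": ["task"],
        },
    }

    def handler(inputs: dict) -> str:
        # The subagent starts with a fresh history containing only the task
        return agent_fn(inputs["task"], system)

    return definition, handler
```

The coordinator would list the definition among its tools and route matching tool calls to the handler. From the coordinator's point of view, the subagent is indistinguishable from read_file.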