Layman's Guide to Computing

Season 14

Issue 179: Agents

Published:

Previously: Thinking/reasoning models are those that have been trained on examples of how to think about different problems in different domains, or plan and execute complex tasks. They often use tools to aid them in goal tracking and updating. The full thinking trace from the model may be removed or hidden to present a more legible response to the user.

Let’s review the ingredients we have so far:

  1. A large language model (Issue 170) or multimodal model (Issue 177): a next-token predictor that takes input tokens and keeps generating output tokens which feed back to the input
  2. Training data, which the model is trained on to pick up general patterns through unsupervised learning (Issue 171), and then steered to avoid harmful output and generate useful output through the use of labelled training data through supervised learning (Issue 174)
  3. A runtime (Issue 175), which handles multiple responsibilities:
    • parsing the model output to block it if found to be harmful
    • formatting the text for display to the user
    • separating and executing tool calls (typically in an isolated container), and injecting the results back into the input (Issue 175)
    • processing thinking tokens, removing or hiding them (Issue 178)
  4. Other optional runtime extensions, such as those that add retrieval-augmented generation (RAG) capabilities (Issue 176), or add information that the model remembered about the signed-in user

What does an agent do?

AI Agents

agent(n.)
late 15c., “one who acts,” from Latin agentem (nominative agens) “effective, powerful,” present participle of agere “to set in motion, drive forward; to do, perform; keep in movement” (from PIE root *ag- “to drive, draw out or forth, move”).

The term “agent” means “one who acts”. So agents are software applications, comprising a trained model and a runtime. We can broadly think of the model as the “brains” of the partnership, and the runtime as the “body”.

Because agents need a computer (physical or virtual) to “act”, these software applications are typically installed on a computer, although they may also include a web interface to allow users to control them remotely.

The model has remained conceptually similar as I went from Issue 170 to here, but the runtime is picking up more and more responsibilities. So as not to muddy the terms, I’ll keep the runtime focused on the model: processing the output, executing tool calls and injecting results, re-invoking the model if it has not reached a stop token, and any RAG if implemented. Everything else that we are adding today, that makes the agent an effective partner and piece of software, I’ll explain under the label harness.

The model

Some harnesses make it easy to swap out the underlying model, allowing the model to run the agent harness with different models. Many model providers have standardized on OpenAI’s API (Issue 4) so as to make their models easily accessible to programmers.

While state-of-the-art models are capable enough to not require a more specialized version for agentic use, the agent harness usually provides a special system prompt for this purpose. This special prompt includes information on the use context, on the tools available to the model, and other pertinent information to guide the model and keep it on task.

The runtime

A runtime used within a harness needs to include additional features: the ability to pause or stop the model, to understand access control configuration (which tool calls require user approval) and route matching tool calls to the user for permission grants, and introspectability: allowing the harness program to check the state of the runtime and model.

The harness

When a user uses agentic software, the harness is what they see. That means the harness handles typical software responsibilities:

What an agent does

… I don’t know what to say here. By itself, a model can do nothing besides generate text. When embedded in a harness+runtime, what it can do is limited by the tools it has available—remember that the model relies on the runtime executing its tool calls to have any effect on the world.

With simple toolsets (primarily a commandline tool), the agent can plausibly:

If given the appropriate tools and permissions from the user, the agent can also:

With more advanced tools or MCP servers that handle the complex details, an agent can even:

When provided with detailed explanations of how to perform complex tasks (typically through a skill file that the agent can read), the agent can plausibly:

… Why haven’t they taken over the world yet?

Because most people aren’t using them!

… Just kidding, there are other reasons too. For example:

Agent capabilities tend to be emergent. That means researchers and frontier labs can train a model to carry out tasks A, B, and C, and a user giving the agent a different kind of task discovers that it is also effective at task D but not task E.

Generally, a question can “can an agent do F?” can’t be answered definitively prior to actually asking the model to do F. And even if one person fails to get the agent to execute the task successfully, another person might succeed, because they asked differently, because they are familiar with the terminology required to instruct the agent, or for some other reason.

All of this is still ongoing research work: agents only really took off in 2025, when Anthropic released Claude Code which became the first generally capable agent. Since then, every day users are discovering new things that it can do. The things that it can’t, Anthropic and other frontier labs are still training it to be able to do them.

Issue summary: Agents are software applications that comprise a harness, a runtime, and a model (typically accessed through an API instead of directly executed on the computer). They enable a user to type in a request or send it by other means and thus instruct the agent to carry out a task on the computer until completion. The capabilities of agents are limited by the tools available to them.


You now have a pretty good idea of all the pieces involved in getting an AI agent to do things. The part I can’t authoritatively tell you about is what they can or can’t do, because that is still changing every week as frontier labs continue to train more capable models and agent harnesses continue to add more tools and features.

If you’re curious, consider trying them out. You could search for an online guide, or let ChatGPT/Claude help get you started.

What I’ll be covering next

In ten issues, I’ve walked you through the key concepts that help you understand what AI agents do. With three issues left to go, what else should I cover?

Some questions I’m anticipating, or have fielded some variant of:

  1. Can I run my own AI model?
  2. Why can’t the AI do <thing>?

Question 2 has a boring answer and an interesting one. The boring answer is “because it hasn’t been trained yet”. The interesting answer is … not really suitable for a newletter titled Layman’s Guide to Computing, because it’ll be rooted in philosophy and cognitive science. In a different publication perhaps.

So let’s tackle question 1, which will draw on computing concepts I’ve covered in earlier issues and give you an idea of the kind of compute and memory capacity needed to run a model.

Next issue: Issue 180: Running a model