Personal Systems: Agent Beliefs

11 Jun 2026

personal-systems

It is June 2026, and memes about agents saying “you told me not to … and I did anyway” are already cliché. If you use agents long enough, you already know that their post-hoc rationalizations don’t always match what you observe. And honestly? That’s valid. People do that too.

The difference is we can run agents through evaluation tests (evals). We can inspect their training data. And we can design personal information systems to record a trace of agent actions, not primarily to hold them to account, but to enable them (and humans) to audit their thought stream against the tool calls and decisions they made.

This is already happening; it is how agents deployed in companies are trained. But I’m more concerned about personal information systems, which are less about accountability and more about orientation. We need some terms for thinking about what guides agent thought and behavior.

My understanding of agents

Here’s my current understanding of agents, from background reading and personal use (see my Layman’s Guide to Computing newsletter, Season 14, for a crash course):

agents are models wrapped in a runtime.
the model is a next-token predictor that generates thoughts (raw token stream) and actions (tokens formatted as an action, e.g. ReadFile(README.html))
the runtime intercepts the token stream, separating and executing actions (beyond the model’s control), displaying results to the user, and passing user input back to the model
models are trained on data: unsupervised training on unlabelled data gives them general knowledge, while supervised training on labelled data makes them more reliable at following instructions, using tools, thinking, and planning.
models are invoked with an initial set of tokens, called the prompt. The prompt typically gives the model an identity and guiding instructions, and scopes its behavioral boundaries
models can learn from examples given in the prompt, and the user’s initial input is typically tacked on after the built-in prompt (known as the system prompt)
while the model can follow instructions in the prompt pretty well, as the context grows longer, contradictory or ambiguous context may confuse it, or its post-trained tendencies may end up overriding its instructions
as a result, models can get stuck in loops, go against instructions, carry out an underspecified request in a way the user does not expect, etc.
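The model/runtime split described above can be sketched as a small loop. This is a minimal illustration under my own assumptions, not how any particular agent framework works: `run_agent`, `scripted_model`, the `RESULT:` prefix, and the one-action-per-output format are all hypothetical, and the "model" here is a scripted stand-in rather than a real next-token predictor.

```python
import re
from typing import Callable, Dict, List

# Hypothetical action syntax, modelled on the ReadFile(README.html) example above.
ACTION_PATTERN = re.compile(r"^(\w+)\((.*)\)$")


def run_agent(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 5,
) -> List[str]:
    """Alternate between model output and tool execution; return the transcript."""
    context = [prompt]
    for _ in range(max_steps):
        # The model only ever sees text and only ever produces text.
        output = model("\n".join(context)).strip()
        context.append(output)
        match = ACTION_PATTERN.match(output)
        if match is None:
            # Plain text: treat it as the model's answer and stop.
            break
        # The runtime, not the model, decides whether and how to execute.
        name, argument = match.groups()
        tool = tools.get(name)
        result = tool(argument) if tool else f"error: unknown tool {name}"
        context.append(f"RESULT: {result}")
    return context


def scripted_model(context: str) -> str:
    """Stand-in for a real model: asks for the file once, then answers."""
    if "RESULT:" not in context:
        return "ReadFile(README.html)"
    return "The README is a one-line greeting."


transcript = run_agent(
    scripted_model,
    {"ReadFile": lambda path: "<h1>Hello</h1>"},
    "SYSTEM: You are a file assistant.\nUSER: Summarize the README.",
)
for line in transcript:
    print(line)
```

The `max_steps` cap is there because, as noted in the list, models can get stuck in loops; the runtime is the only place such a limit can be enforced.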

An ontology for agent belief

“What’s a belief? What do beliefs do? What’s the difference between a belief and a claim? What’s the relationship between a belief and a fact?” Start to ask too many of these questions and you find yourself thinking about or building an ontology.

So that’s where I’m at today. While the terms below have everyday definitions, in the context of a personal information system they take on more specific, scoped, consistent meanings. To make this distinction more noticeable, I’ll format such terms in monospace, like this.

First, we have facts: events recorded in the personal information system. When a message was sent, by whom, referencing what. When a document was created or edited. When an AI agent took an action, and in response to what. These events, once written, are not meant to be easily modified; they serve as a written record of what happened, made fact.

To an agent, which has no senses and cannot immediately verify the contents of its context, the tokens there are effectively claims. Claims are free-form text or information, which may or may not be linked to underlying facts. We use the term “claims” here rather than “beliefs”, so that the semantics make it clear that the agent does not necessarily act on this information—but it may!
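One way to picture the distinction between facts and claims is as two record types, where a claim may or may not point at a fact. The sketch below is only an assumption about how this could be modelled: `Fact`, `Claim`, `FactLog`, and `is_grounded` are hypothetical names, and an in-memory list stands in for whatever storage a real system would use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """An event that was recorded; frozen so it cannot be edited afterwards."""
    fact_id: int
    recorded_at: datetime
    actor: str
    event: str


@dataclass(frozen=True)
class Claim:
    """Free-form text that may (or may not) be linked to an underlying fact."""
    text: str
    fact_id: Optional[int] = None


class FactLog:
    """Append-only log: facts can be added and read, never edited or deleted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def record(self, actor: str, event: str) -> Fact:
        fact = Fact(len(self._facts), datetime.now(timezone.utc), actor, event)
        self._facts.append(fact)
        return fact

    def facts(self) -> Tuple[Fact, ...]:
        # Return a tuple so callers cannot mutate the underlying list.
        return tuple(self._facts)

    def is_grounded(self, claim: Claim) -> bool:
        """True if the claim points at a fact that exists in this log."""
        return claim.fact_id is not None and 0 <= claim.fact_id < len(self._facts)


log = FactLog()
sent = log.record("user", "sent message #1")
grounded = Claim("the user sent a message", fact_id=sent.fact_id)
ungrounded = Claim("messages are stored in a database")
```

Here `grounded` can be audited against the log, while `ungrounded` is just text in the context: the agent may still act on it, but nothing in the system backs it up.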

There are many ways we could redefine “belief”, but we’ll try to keep close to our intuitive understanding of it: a belief is a statement like “agent D believes it should obey all instructions from humans”, conceived by humans. These statements are typically based on observed behavior over a period of time.

In this ontology, agents do not have beliefs. They have priors, a term we borrow from Bayesian probability, describing the agent’s tendencies in token generation. These priors come from the weights of its underlying model: the gigantic set of numbers that represent its encoded patterns. We can probe agent priors by passing incomplete statements like “messages are stored in _____” to the agent’s model and seeing how it completes the sentence.
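Probing a prior in this way amounts to sampling completions of the same stem several times and looking at the spread. A rough sketch follows, assuming the model is exposed as a plain function from a text stem to a completion; `probe_prior` and `make_stub_model` are hypothetical, and the stub simply cycles through canned answers so that the example runs without a real model.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Dict


def probe_prior(
    complete: Callable[[str], str], stem: str, samples: int = 20
) -> Dict[str, float]:
    """Sample completions of one stem and return their relative frequencies."""
    counts = Counter(complete(stem).strip().lower() for _ in range(samples))
    return {completion: n / samples for completion, n in counts.most_common()}


def make_stub_model() -> Callable[[str], str]:
    """Stand-in for a real model: cycles through a fixed set of completions."""
    canned = cycle(["a database", "a database", "A database ", "plain files"])
    return lambda stem: next(canned)


distribution = probe_prior(make_stub_model(), "messages are stored in")
print(distribution)
```

With a real model, the interesting part is how lopsided the distribution is: a completion that dominates the samples is a strong prior that instructions in the prompt will have to compete with.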

While priors describe the agent’s knowledge, grain describes the agent’s behavioral tendencies: the tool calls it makes in response to situations, the specific triggers that cause it to make them, and also the situations in which it does not use them. “Grain” is analogous to the “grain of the wood”: wood is easy to split along its grain, not against it. Likewise, working “against an agent’s grain” (asking it to use tools in a way that runs counter to what its priors will generate) is much more difficult than working “along its grain” (giving it a higher-level goal and letting it use tool calls following its priors). We can only work along the grain with agents that are trained to use tools in a way that aligns with our goals, that is, agents whose grain is in the correct orientation.
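If the system already records a trace of agent actions, a first approximation of grain could be read off that trace: for each kind of situation, how often did the agent reach for each tool? The sketch below assumes the trace has already been reduced to (situation, tool) pairs, which is a big simplification; `summarize_grain` and the situation and tool labels are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarize_grain(trace: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each situation to the share of times each tool was called in it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for situation, tool in trace:
        counts[situation][tool] += 1
    return {
        situation: {tool: n / sum(tools.values()) for tool, n in tools.items()}
        for situation, tools in counts.items()
    }


# Hypothetical trace: each entry is (situation label, tool the agent called).
trace = [
    ("find a note", "Search"),
    ("find a note", "Search"),
    ("find a note", "ReadFile"),
    ("find a note", "Search"),
    ("edit a note", "WriteFile"),
]
grain = summarize_grain(trace)
print(grain)
```

A summary like this says nothing about why the agent prefers a tool, only that it does; asking it to do otherwise in the same situation is what working against the grain would look like.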

The customary way of orienting an agent’s grain is called post-training or fine-tuning: using supervised learning on labelled data. The data is labelled as input:output pairs, with the input being passed to the model, and its output checked against the labelled output. The model’s weights are adjusted until it gives the correct output. This is a computationally heavy process, usually not carried out on user devices; but it may become feasible in future, or available through APIs such as Thinking Machines Lab’s Tinker.

In a personal information system, the agent’s knowledge is easier to update or correct through tool calls that let it query the information system or read from files. The purpose of most fine-tuning will be to re-orient the agent’s grain, to have it use tools correctly and in the correct context. The labelled data will likely take the format “<context>:<action>”. To distinguish this from the generic term of “training data”, we call such context:action data pairs lessons.
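To make the shape of a lesson concrete, here is a minimal sketch of context:action pairs, a JSON Lines serialization, and a check of how often an agent's chosen action matches the labelled one. Everything here is an assumption for illustration: the `Lesson` type, the JSONL layout, the `agreement` score, and the `QueryMessages` tool name are hypothetical, and no actual fine-tuning happens in this code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Lesson:
    """One labelled pair: the context the agent saw, and the action we wanted."""
    context: str
    action: str


def to_jsonl(lessons: Iterable[Lesson]) -> str:
    """Serialize lessons as one JSON object per line."""
    return "\n".join(json.dumps(asdict(lesson)) for lesson in lessons)


def agreement(policy: Callable[[str], str], lessons: List[Lesson]) -> float:
    """Fraction of lessons where the policy's action equals the labelled action."""
    if not lessons:
        return 0.0
    hits = sum(policy(lesson.context) == lesson.action for lesson in lessons)
    return hits / len(lessons)


lessons = [
    Lesson("user asks what the README says", "ReadFile(README.html)"),
    Lesson("user asks who sent the last message", "QueryMessages(latest)"),
]


def always_read(context: str) -> str:
    """Stand-in for an agent whose grain is to read the README no matter what."""
    return "ReadFile(README.html)"


score = agreement(always_read, lessons)
print(to_jsonl(lessons))
print(score)
```

A score like this could be tracked before and after fine-tuning to see whether the lessons actually moved the agent's grain, rather than just its stated intentions.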

Open research questions

Who selects and triages the lessons that the agent learns?
How do we strike a balance between the heavy computational resources needed for full grain re-orientation (e.g. 100% accuracy on tool calls), and the adaptability the agent might need (e.g. as the user’s situation changes, the agent might need to use different tools in similar contexts)?
What is a feasible, cost-effective way to re-orient agents using fine-tuning?