Layman's Guide to Computing

Season 14

Issue 182: Running a model, part 2

Published:

Previously: Quantization trades parameter precision for a smaller memory footprint and faster inference, making many models feasible to run on user devices. A model’s capabilities depend on its parameter count and its training data: a higher parameter count lets the model represent more patterns, while training on well-labeled data is what adds capabilities.

Last issue, we discovered that some quantized models can actually run on laptops. (You can also run GPT-1 and GPT-2 on a laptop, but you would likely be disappointed in their performance today, given the leaps-and-bounds improvements in AI capability since 2022.)

Besides gemma-4-12b, what else can we run?

Open-weight model options

Open-weight enthusiasts have a number of well-known options available to them (sizes are unquantized):

There are also many lesser-known models, whose capabilities are still increasing every few months.

I won’t give you a comprehensive low-down on what each model is good for, because:

Model capabilities

In Issue 181 I mentioned that more parameters let the model represent more patterns in its weights, while better training data determines the model’s capabilities. That is useful to know as a general pattern, but difficult to apply when deciding on a specific model to run. Should we just run the largest model that our device is capable of running?

As of June 2026:

Specialized non-LLM models include:

Models are still improving through post-training (such as supervised fine-tuning) and distillation, a process in which small models are trained on output from larger, more capable models. A 9B model today already matches capabilities that GPT-3 (175B) offered in 2022. So you should expect a different set of capability tiers this time next year.

Hardware options

The sweet spot for “value-for-money” sits around 12B–35B for now. Smaller models run faster and use less memory; as model size grows, speed drops and memory use rises.
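To put rough numbers on the memory side, here is a back-of-envelope sketch. It assumes an unquantized model stores each parameter in 16 bits and a quantized one in 4 bits (both are my assumptions for illustration), and it counts the weights only. Real memory use will be higher once you add the context cache and runtime overhead, and real 4-bit formats usually use slightly more than 4 bits per weight.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    total_bytes = params_billions * 1e9 * bytes_per_weight
    return total_bytes / 1e9


# Model sizes from the ranges mentioned in this issue; bit widths are assumptions.
for size in (12, 35, 70):
    fp16 = weight_memory_gb(size, 16)  # assumed unquantized precision
    q4 = weight_memory_gb(size, 4)     # assumed 4-bit quantization
    print(f"{size}B: {fp16:.0f} GB at 16-bit, {q4:.1f} GB at 4-bit")
```

Treat the result as a floor, not a forecast: if the weights alone do not fit in your device's memory, the model will not run comfortably.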

With this in mind, these are some popular options for running models on-device (local deployment) as of June 2026 (prices are Singapore retail):

If you already have an existing laptop/PC and want to know how it will manage different model sizes, you can ask ChatGPT or Claude; they are pretty up-to-date with hardware capabilities and can give you an estimate. Alternatively, try to download and run the models and see for yourself—ground truth doesn’t care about your estimates.
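If you do go the "see for yourself" route, the number worth measuring is tokens per second. The sketch below is a minimal timing harness, not tied to any particular tool: `generate` is a hypothetical placeholder for whatever function your chosen runtime gives you to produce tokens, and `fake_generate` is a stand-in I made up so the example runs without a model.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[str], List[str]], prompt: str) -> float:
    """Time one call to `generate` and return generated tokens per second."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed if elapsed > 0 else float("inf")


def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for a real model: emits 20 tokens, ~5 ms each."""
    tokens = []
    for i in range(20):
        time.sleep(0.005)  # pretend the model is thinking
        tokens.append(f"tok{i}")
    return tokens


speed = tokens_per_second(fake_generate, "Hello")
print(f"{speed:.0f} tokens/second")
```

Swap `fake_generate` for a real call into your runtime and run it with a prompt you would actually use; short toy prompts tend to flatter the hardware.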

Cloud options

Wow, that’s a lot of zeros. Besides, owning hardware comes with its own maintenance needs and headaches. Enter the cloud, i.e. pay-per-use.

If you don’t want to manage the hardware that runs these models, don’t plan to run a model long-term, or want to run a model larger than what your hardware can handle, these are currently the most user-friendly options:

There are other options that require more technical expertise to use, but if you reach that point you shouldn’t be relying on a layman’s guide anymore :)
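One way to frame the buy-versus-rent decision is a break-even estimate: how many hours of cloud use would cost as much as the hardware? The sketch below uses the SGD6,000 ceiling mentioned in this issue as the hardware price; the hourly cloud rate is a made-up placeholder, not a real quote, so plug in the actual price from whichever provider you are considering. It also ignores electricity, maintenance, and resale value.

```python
def break_even_hours(hardware_price: float, cloud_rate_per_hour: float) -> float:
    """Hours of cloud use that would cost as much as buying the hardware."""
    return hardware_price / cloud_rate_per_hour


hardware_price_sgd = 6000.0  # upper bound on device prices mentioned in this issue
cloud_rate_sgd = 2.0         # HYPOTHETICAL hourly rate, for illustration only

hours = break_even_hours(hardware_price_sgd, cloud_rate_sgd)
print(f"Break-even after about {hours:.0f} hours "
      f"({hours / 24:.0f} days of continuous use)")
```

If you expect to use far fewer hours than the break-even figure, renting is probably the cheaper path; if far more, owning starts to look better.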

Issue summary: Open-weight models range in size from sub-1B to 100+B. A range of devices below SGD6,000 can already run these models, from the humble Raspberry Pi for running harness support to the Mac Studio M3 for running 70B models. For larger models, or short-term workloads, cloud options for deploying and running open-weight models also exist.


This is the most tentative issue for this season, and probably for the entire newsletter so far. I try not to write issues that I will have to retroactively edit as the frontier shifts, but I’ll make this one an exception. I think expounding on available open-weight models illustrates three things: how the ecosystem resembles open-source software, which allows the (sufficiently educated) public to experiment and provide feedback; how advances in AI over the past 3–4 years have made these models feasible to run on consumer-class devices; and how cloud infrastructure has made larger models accessible to those who don’t own sufficiently powerful hardware.

The Layman’s Guide to Computing archive

Buttondown still does not have a very browseable archive, so I’ve made the newsletter content available on a static site. You can browse past seasons more easily at https://ngjunsiang.github.io/laymansguide/categories.

I may add more seasons in future, as computing technology stabilizes enough for me to write about it in a static newsletter. If you’d like to receive future issues, do subscribe below: