Layman's Guide to Computing

Issue 182: Running a model, part 2

2026-08-31T08:00:00+08:00

Previously: Quantization trades parameter precision for a smaller memory footprint and faster inference, making many models feasible for running on user devices. Model capabilities depend on their parameter count and training data. Models with higher parameter counts can represent more patterns, while model capabilities are added by training them on well-labeled data.

Last issue, we discovered that there are quantized models that can actually run on laptops. (You can also run GPT-1 and GPT-2 on a laptop, but you would likely be disappointed in their performance today given the leaps-and-bounds improvement in AI capability that have happened since 2022.)

Besides gemma-4-12b, what else can we run?

Open-weight model options

Open-weight enthusiasts have a number of well-known options available to them (sizes are unquantized):

Google’s Gemma 3 series, available in sizes 1B to 27B
Google’s Gemma 4 series, available in sizes 5B to 31B
Microsoft’s Phi series, available in sizes 2.7B to 14B
Meta’s Llama series, available in sizes 8B to 405B
Alibaba’s Qwen series, available in sizes 0.5B to 72B
Deepseek’s titular Deepseek series, available in sizes 16B to 72B
Mistral’s Mistral series, available in sizes 7B to 176B

There are also many lesser-known models, whose capabilities are still increasing every few months.

I won’t give you a comprehensive low-down on what each model is good for, because:

the models can be fine-tuned by those who know how, and may have variants that are better at specific task categories,
the models are updated every few months, and see new capabilities added through post-training (supervised learning),
the agent harness and runtime do play a part: some models are useful “out-of-the-box”, some work best within a particular harness or with a particular set of tools

Model capabilities

In Issue 181 I mentioned that more parameters lets the model represent more patterns in its weights, while better training data determines the model’s capabilities. Useful to know as a general pattern, but difficult to apply when deciding on a specific model to run. Should we just run the largest model that our device is capable of running?

As of June 2026:

0.5B–3B models can handle classification, extraction, summarization tasks and are generally good for single request-response purposes.
7B–9B models are useful assistants (in a harness) that can hold short conversations, handle basic Q&A, do simple coding or tool calls, and otherwise generally match GPT-3’s capabilities.
12B–15B models can follow instructions consistently, generate code that mostly works (and do some debugging if necessary), generate tool calls more reliably, making them capable tool-using agents.
27B–35B models can handle most tasks, even across longer contexts: analyze documents and write reports, generate and debug code, execute requests involving multiple steps. With a well-designed harness and accurate task documentation, these become capable general-purpose agents.
70B models can handle what previous tiers can do, but better: fewer hallucinations and mistakes, better answers, better general understanding, more consistent planning, and over longer context windows—smaller models sometimes see a sharp performance drop when the context window extends past a certain length. Some users report better reasoning performance as well.
100B+ Frontier models—GPT-5, Claude Opus, Kimi K2.5, et al—can do all of the above, with state-of-the-art reasoning and thinking, knowledge, error recovery, ambiguity handling, and more

Specialized non-LLM models include:

OpenAI Whisper (0.4B–1.5B) for speech-to-text transcription, text-to-speech generation
Stable Diffusion (0.9B–8B) for text-to-image generation
FLUX.1 (12B) also for text-to-image generation
CLIP (0.4B) for image-to-text understanding
Stable Audio 3 (0.6B–2B) for text-to-audio generation

Models are still improving through post-training (supervised learning) and distillation—a process by which small models are trained on output from larger, more capable models. A 9B model today already exhibits capabilities that GPT-3 (175B) was capable of in 2022. So you should expect a different set of capability tiers this time next year.

Hardware options

The sweet spot for “value-for-money” sits around 12B–35B for now. Smaller models are faster and use less memory. Speed decreases and memory use increases as model size increases.

With this in mind, these are some popular options for running models on-device (local deployment) as of June 2026 (prices are Singapore retail):

Raspberry Pi (8–16GB RAM): popular for tiny models (2B or smaller), used to generate document embeddings for search, OCR documents and clean up the OCRed text, etc. These form the support system for the agent harness, and usually are not used directly for the agent models.
Mini-PCs with a sufficiently capable CPU, no dedicated GPU are a decent budget option.
- AMD Ryzen AI 300 CPUs, 12 CPU cores, 8–12 GPU compute units & 64GB RAM: this can run 7B–13B models capably (if slowly), and 34B quantized models at a crawl. [~SGD2,000]
- AMD Ryzen AI MAX+ (Strix Halo) CPUs, 16 CPU cores, 40–48 GPU compute units & 256GB RAM: this bundles a much more capable integrated GPU (Issue 123) and can run 34B models capably, 70B models at a crawl. [~SGD4,800]
- Mac Mini M4, 12 CPU cores, 10 GPU compute units & 24GB RAM: In a similar category as the Ryzen AI 300. [SGD1,299]
- Mac Mini M4 Pro, 14 CPU cores, 20 GPU compute units & 48GB RAM: In a similar category as the Ryzen AI MAX+. [SGD2,659]
- Mac Studio M3 Ultra, 28 CPU cores, 60 GPU compute units & 96GB RAM: With the highest memory bandwidth of all the units in this category, this can run everything mentioned above, and even run 70B models decently well. That’s what most folks would be buying this for.
  A higher-end 32 CPU core, 80 GPU compute unit configuration exists if you add SGD2,025—doesn’t add new capabilities, makes everything a little faster. [SGD5,199]
Full PCs with a capable CPU & dedicated GPU
- Many options exist here, none below SGD6,000, most above SGD10,000. Dedicated GPUs capable of running AI models already have prices in the thousands.

If you already have an existing laptop/PC and want to know how it will manage different model sizes, you can ask ChatGPT or Claude; they are pretty up-to-date with hardware capabilities and can give you an estimate. Alternatively, try to download and run the models and see for yourself—ground truth doesn’t care about your estimates.

Cloud options

Wow that’s a lot of zeros. Besides, owning hardware comes with its own maintenance needs and headaches. Enter the cloud, i.e. pay-per-use.

If you don’t want to have to manage the hardware that runs these models, don’t plan to be running a model long-term, or want to run a model larger than what your hardware can handle, these are the current most user-friendly options:

HuggingFace not only catalogues model weights, it also automates inference hosting (provided by AWS or Google Cloud underneath). Caveat: not all models are supported; you need a model that lists “HF Inference API” as an Inference Provider. The HuggingFace link in this bullet point links you to models that do. On the model card page, click Deploy > HF Inference Endpoints
Replicate provides an even simpler interface, but for a smaller catalogue of models. Try out the models directly on the model card page, or create an account for deployment options.
Fireworks AI is where you go once you’ve decided on a (supported) model and want reliable hosting. Browse their model list and click Try In Playground or Deploy On Demand (requires registration).

There are other options that require more technical expertise to use, but if you reach that point you shouldn’t be relying on a layman’s guide anymore :)

Issue summary: Open-weight models range in size from sub-1B to 100+B. A range of device options below SGD6,000 are already capable of running these models, ranging from the humble Raspberry Pi for running harness support to the Mac Studio M3 for running 70B models. For larger models, or short-term workloads, cloud options for deploying and running open-weight models also exist.

This is the most tentative issue for this season, and probably for the entire newsletter so far. I try not to write issues that I will have to retroactively edit as the frontier shifts, but I’ll make this an exception: I think expounding on available open-weight models illustrates how the ecosystem is similar to open-source software, that allows the (sufficiently educated) public to experiment and provide feedback, how advances in AI over the past 3–4 years have made them feasible to run on consumer-class devices, and how cloud infrastructure has made larger models accessible to those who don’t own sufficiently powerful hardware.

The Layman’s Guide to Computing archive

Buttondown still does not have a very browseable archive, so I’ve made the newsletter content available on a static site. You can browse past seasons more easily at https://ngjunsiang.github.io/laymansguide/categories.

I may add more seasons in future, as computing technology stabilizes enough for me to write about them in a static newsletter. If you’d like to receive future issues, do subscribe below:

Issue 181: Quantization

2026-08-24T08:00:00+08:00

Previously: Proprietary models do not have their weights published publicly, while open-weight models do. Various runtimes are available for download, and can run models that have a compatible file format. But models are extremely compute- and memory-intensive, requiring extremely high-end hardware and capacious memory to run.

Great, so a 12B model takes up 24GB of disk space, uses 24GB of RAM, and up to 96GB for the KV cache (model’s calculated representation of input tokens). That’s out of reach for most consumers without AI-grade GPUs, which currently cost tens of thousands per unit.

Enter quantization.

Parameter representation

Models are typically trained with full precision, allowing them to store each parameter using 16 bits (2 bytes). This is necessary because the training process results in multiple adjustments to the weights. If the intermediate values are not stored with full precision, subsequent adjustments to those values are not accurately represented, and may result in inaccurate training results.

However, once the model is trained and its weights released, they are effectively “frozen”: the weights do not change as the model is used for inference (Issue 173).

Quantizing parameters

Can we reduce the model size and memory footprint by reducing the precision? Yes. Experiments have shown that models lose some accuracy as their parameters are quantized: represented using 8 bits (twofold reduction), or even 4 bits (fourfold reduction!). Below that range, running the model at 2 bits often results in unacceptable performance.

This inaccuracy shows up in models not following instructions as well, potentially making mistakes more noticeably, especially on complex tasks, or being less accurate with tool call syntax. However, compared to the alternative of not running the model at all, this is usually an acceptable tradeoff for users running the model on their own computers.

Running a quantized model

Okay, let’s run those numbers on a quantized Gemma 4 12B model. We don’t even need to do the quantization ourselves usually: other enthusiasts have already done it, providing the models on HuggingFace as well (they can be identified through the “Q4” in the model naming scheme; 8-bit quantized models are labelled “Q8”).

We already see immediate benefits: the 4-bit quantized model weights are only 7GB, a stark contrast to the 24GB of full-precision weights.

The KV cache requirement now drops to ~6GB for 32K tokens, and ~50GB for 256K tokens. Very uncomfortable for a Macbook, which means we would have to limit ourselves to a 128K or even 64K token context length. Annoying, but not show-stopping.

The inference speed now increases to ~60 tokens/sec, about as responsive as ChatGPT or other chatbots!

What do we gain from larger models?

Unlike programs or data files, which store data as-is (perhaps compressing them for a smaller filesize), models represent information: the training process produces a highly compressed set of numbers that are able to approximately reproduce the training contents (not 100% accurately, but quite close), and more importantly generate tokens following the same pattern for inputs that it was not trained on.

What if we try to break the laws of physics, taking GPT or Claude’s training corpus, and training it into a 1B model? What happens?

1B parameters means the model only has 1 billion numbers to try to represent everything. If the training data is repetitive and largely similar, 1B might even be sufficient since there just isn’t that much variation in the data.

But if the data is highly varied, the model might not be able to adjust the weights to represent everything. It will end up storing one additional data point at the expense of worse representation for other data points. This might show up as a plateau in benchmark scores: the model can’t improve further. Or it might show up as the model not “remembering” data that shows up less frequently.

What do frontier models, often with parameter counts running into trillions, gain? With so many parameters, they can represent more patterns: more thinking scaffolds and reasoning frameworks, more sentence/paragraph patterns from more books and articles, etc. And not just more patterns, but higher-order patterns: writing styles, writing intents, idea development, longform writing structure, etc.

Google’s Gemma 4 12B model will end up not being able to represent everything. Our running model might give less nuanced answers, consider fewer perspectives in its answer, and otherwise give worse answers.

But hey, it runs! Give it a spin, see what you can do with 12B parameters.

Model capabilities

Even frontier models with poor training data will disappoint. 1 trillion parameters won’t necessarily make a model much smarter if the training data is poor.

Most new capabilities are added through additional training, usually supervised learning. If we can’t train the underlying model, we might be able to create skill files explaining how to do something, let the harness read it and add it into the input context, and lean on the model’s pattern-following capabilities to tackle the task.

Either way, if you have the hardware to support it and manage to get a local agent running, try it with different questions and tasks to get a feel for what it can and cannot handle. That beats any amount of reading on what these models are supposed to be able to do.

Issue summary: Quantization trades parameter precision for a smaller memory footprint and faster inference, making many models feasible for running on user devices. Model capabilities depend on their parameter count and training data. Models with higher parameter counts can represent more patterns, while model capabilities are added by training them on well-labeled data.

12 issues in, that’s a wrap! At this point I think what I’ve written is what’s unlikely to change in the next couple of years, and still useful for layfolks to know about the ongoing AI development. Anything newer is still in active development.

What I’ll be covering next

Next issue: Issue 182: Running a model, part 2

In the last issue, I’ll explore other options for running a model on your device (called local deployment in parlance): running smaller models, and other feasible hardware options.

Issue 180: Running a model

2026-08-17T08:00:00+08:00

Previously: Agents are software applications that comprise a harness, a runtime, and a model (typically accessed through an API instead of directly executed on the computer). They enable a user to type in a request or send it by other means and thus instruct the agent to carry out a task on the computer until completion. The capabilities of agents are limited by the tools available to them.

As of June 2026, OpenAI and Anthropic charge about $20/mth for their Pro/Plus plan, and about $200/mth for their Max plan. For those of us who like to stay on free tiers, it can be pretty annoying to hit the dreaded “You have reached the limit for Free plan”, but what can we do short of shelling out for a higher tier?

Wait—if a language model is a bunch of numbers, and a runtime is just a program, why can’t I run it on my own computer instead?

Proprietary models and open-weight models

For starters, you can’t download the GPT-5 or Claude models. They are proprietary models, and their weights (the file containing the model’s parameters) are a guarded trade secret; a leak of the weights would be disastrous for OpenAI or Anthropic.

Okay, fine you say, then let’s run something I can actually download. As of 2026, that typically means you would go to HuggingFace (yes, that is their actual name), currently the world’s largest platform for hosting open-weight models. An open-weight model, analogous to open-source software, means the model’s weights are publicly available and you can download them.

The parts: downloading weights

Let’s download the currently top-trending model, Google’s gemma-4-12B-it. The model card says that this is a multimodal model (Issue 177) with 11.95 billion (12B) parameters (Issue 170). It has a context length of 256K tokens (Issue 172)—important when deciding what kind of tasks it can plausibly take on, since the context length dictates what the total output length (including the input tokens) cannot exceed.

Under Files and versions, we see a whole bunch of files, most of them metadata, configuration information, and other data (such as the token list). The model weights are easy to tell: they are by far the largest file of the collection, weighing in at 23.9GB. We can calculate this: 11.95 billion parameters, with each parameter taking up 16 bits (Issue 40), means 2 bytes per parameter, and thus 23.9 billion bytes for all the parameters. 23.9GB.

The runtime

You have a few options here, listed from easiest to most difficult:

LM Studio – Comes with a graphical user interface (GUI), so click to load the model and you get a chat interface. Great for getting started ASAP, not great if you actually eventually want to use it as an agent.
Ollama – A commandline program, requiring some terminal chops. Sets up an API server that you can use with many other programs.
Hugging Face Transformers – A Python library for working with models, which means it’s programmers-only. Great if you are building or customizing your own agent harness, but definitely not ready-to-run as-is.
llama.cpp – The most low-level, close-to-the-metal option. Gives you a commandline program for using the model, but you have to manage all other technical detail on your own. Not for the faint-hearted.
vLLM – A GPU-only library for serving models over an API. Presumably we do not have four thousand bucks to spend on an entry-level GPU for models, such as the RTX 4090 with 24GB of GPU memory, and are running the model on a CPU, so this option is automatically disqualified for us.

Hardware requirements

Great. So we’ve downloaded and installed LM Studio, launched it, and then selected our gemma-4-12B-it model for loading.

A screenshot of LM Studio
Source: LM Studio

The first thing that would probably happen is your system will complain about insufficient memory and stop. You see, to run this model, we would need to read the model weights (23.9GB) into memory, immediately using up 24GB of memory. Even assuming no other apps are running, we still need more memory for the following: - operating system overhead (~1-2GB) - memory used by the runtime (1-3GB)

Oh? It didn’t crash for you? I see, you had the Macbook Pro with 64GB memory, or something in that weight class. Great, let’s start prompting your model then. It won’t work as quickly as ChatGPT, but it should manage a comfortable ~20–30 tokens/sec, slightly slower than reading speed but useable.

Unfortunately, as you ask more and more questions within the same session, it will run more and more slowly, and eventually it will crash. You see, the model generates a representation of the entire input, called the KV cache, which stores its computed values for how each token in the input relates to other tokens in the input. This is estimated to take up ~12GB for 32K tokens, so ~96GB if using the full 256K context length.

Yeah, this isn’t for the faint-hearted.

Issue summary: Proprietary models do not have their weights published publicly, while open-weight models do. Various runtimes are available for download, and can run models that have a compatible file format. But models are extremely compute- and memory-intensive, requiring extremely high-end hardware and capacious memory to run.

This is the pessimistic view. Next issue, we look at some optimizations that are available even to newcomers to enable models to run faster and with a smaller memory footprint.

What I’ll be covering next

Next issue: Issue 181: Quantization

Issue 179: Agents

2026-08-10T08:00:00+08:00

Previously: Thinking/reasoning models are those that have been trained on examples of how to think about different problems in different domains, or plan and execute complex tasks. They often use tools to aid them in goal tracking and updating. The full thinking trace from the model may be removed or hidden to present a more legible response to the user.

Let’s review the ingredients we have so far:

A large language model (Issue 170) or multimodal model (Issue 177): a next-token predictor that takes input tokens and keeps generating output tokens which feed back to the input
Training data, which the model is trained on to pick up general patterns through unsupervised learning (Issue 171), and then steered to avoid harmful output and generate useful output through the use of labelled training data through supervised learning (Issue 174)
A runtime (Issue 175), which handles multiple responsibilities:
- parsing the model output to block it if found to be harmful
- formatting the text for display to the user
- separating and executing tool calls (typically in an isolated container), and injecting the results back into the input (Issue 175)
- processing thinking tokens, removing or hiding them (Issue 178)
Other optional runtime extensions, such as those that add retrieval-augmented generation (RAG) capabilities (Issue 176), or add information that the model remembered about the signed-in user

What does an agent do?

AI Agents

agent(n.)
late 15c., “one who acts,” from Latin agentem (nominative agens) “effective, powerful,” present participle of agere “to set in motion, drive forward; to do, perform; keep in movement” (from PIE root *ag- “to drive, draw out or forth, move”).

The term “agent” means “one who acts”. So agents are software applications, comprising a trained model and a runtime. We can broadly think of the model as the “brains” of the partnership, and the runtime as the “body”.

Because agents need a computer (physical or virtual) to “act”, these software applications are typically installed on a computer, although they may also include a web interface to allow users to control them remotely.

The model has remained conceptually similar as I went from Issue 170 to here, but the runtime is picking up more and more responsibilities. So as not to muddy the terms, I’ll keep the runtime focused on the model: processing the output, executing tool calls and injecting results, re-invoking the model if it has not reached a stop token, and any RAG if implemented. Everything else that we are adding today, that makes the agent an effective partner and piece of software, I’ll explain under the label harness.

The model

Some harnesses make it easy to swap out the underlying model, allowing the model to run the agent harness with different models. Many model providers have standardized on OpenAI’s API (Issue 4) so as to make their models easily accessible to programmers.

While state-of-the-art models are capable enough to not require a more specialized version for agentic use, the agent harness usually provides a special system prompt for this purpose. This special prompt includes information on the use context, on the tools available to the model, and other pertinent information to guide the model and keep it on task.

The runtime

A runtime used within a harness needs to include additional features: the ability to pause or stop the model, to understand access control configuration (which tool calls require user approval) and route matching tool calls to the user for permission grants, and introspectability: allowing the harness program to check the state of the runtime and model.

The harness

When a user uses agentic software, the harness is what they see. That means the harness handles typical software responsibilities:

it handles installation and initial setup, allowing the user to select a directory that the agent will begin working from
it handles extensions/plugins that the user may wish to install, making the tools/MCPs (Issue 175) available to the runtime
it handles file uploads (and any necessary format conversion or resizing), request customisation (e.g. enabling extended thinking), other request-related settings
it handles the model output through the runtime, displaying to the user tool calls and their results, any visible thinking traces, and any permission requests which come from the runtime (remember that the model remains unaware of these). If the API supports it, the harness streams these to the user, allowing them to see tokens as the model outputs them, without having to wait for the model to finish the entire response
it provides an interrupt mechanism for the user to halt the runtime if the model is going off-track, or to queue up more messages for the runtime to inject into the request at an appropriate juncture
some harnesses may support agent memory features, giving the agent tools to write information to its internal memory, and retrieve the information when required
harnesses for continuously running agents may include features for setting the wake-up interval of the agent, e.g. invoking the agent every 30 seconds with standard instructions to check for outstanding tasks and complete them
harnesses that integrate with external services will include features for receiving requests via email, WhatsApp, Telegram, or other channels, passing them to the agent and returning the response when it is ready.

What an agent does

… I don’t know what to say here. By itself, a model can do nothing besides generate text. When embedded in a harness+runtime, what it can do is limited by the tools it has available—remember that the model relies on the runtime executing its tool calls to have any effect on the world.

With simple toolsets (primarily a commandline tool), the agent can plausibly:

read, edit, and delete text files on the computer
search through files on the computer
check the computer’s stats, such as memory usage, free space on disk, CPU usage
troubleshoot or diagnose computer issues
perform a web search or retrieve a web page

If given the appropriate tools and permissions from the user, the agent can also:

install or uninstall software on the computer (through the commandline)
download source code, compile it, install it, and run it
run a server on the computer, handle web requests, return responses
read, write, and test code
push code to a code repository
add bug reports or issues to a task board, or read existing ones from it
send requests to an API (if authenticated by the user), and thus execute any supported action through the APIs of Google Drive, Dropbox, Notion, and other services (Issue 6)

With more advanced tools or MCP servers that handle the complex details, an agent can even:

be registered as a plugin in Adobe or Microsoft Office software, reading and editing documents
work with PDF files
fix bugs

When provided with detailed explanations of how to perform complex tasks (typically through a skill file that the agent can read), the agent can plausibly:

analyze large datasets
follow company workflows
scan software or APIs for vulnerabilities

… Why haven’t they taken over the world yet?

Because most people aren’t using them!

… Just kidding, there are other reasons too. For example:

Most complex tasks aren’t described in skill files that are agent-readable, or are not well described
Many of the advanced tools or MCP servers that are needed don’t exist, e.g. those for editing PDF files reliably. If they exist they aren’t always reliable
The really effective tools might be hyper-customized for the tool author and not as useful for others
Most users are used to doing things themselves, and don’t have enough experience with an agent harness to be accustomed to instructing one
Users might not know that it is possible to do something, and have not considered asking an agent to do it
Agent models still have limited context windows (even a context window of 1 million tokens can fill up quickly with a sufficiently complex task), and ways to enable a model to keep relevant task details in context while removing irrelevant details are still being studied
The model might not have been trained on a particular task, and its general reasoning capabilities might not be sufficient to carry out the task effectively
Agent harnesses tend to run in the commandline, or be designed primarily for programmer use, thus scaring layfolks away
…

Agent capabilities tend to be emergent. That means researchers and frontier labs can train a model to carry out tasks A, B, and C, and a user giving the agent a different kind of task discovers that it is also effective at task D but not task E.

Generally, a question can “can an agent do F?” can’t be answered definitively prior to actually asking the model to do F. And even if one person fails to get the agent to execute the task successfully, another person might succeed, because they asked differently, because they are familiar with the terminology required to instruct the agent, or for some other reason.

All of this is still ongoing research work: agents only really took off in 2025, when Anthropic released Claude Code which became the first generally capable agent. Since then, every day users are discovering new things that it can do. The things that it can’t, Anthropic and other frontier labs are still training it to be able to do them.

Issue summary: Agents are software applications that comprise a harness, a runtime, and a model (typically accessed through an API instead of directly executed on the computer). They enable a user to type in a request or send it by other means and thus instruct the agent to carry out a task on the computer until completion. The capabilities of agents are limited by the tools available to them.

You now have a pretty good idea of all the pieces involved in getting an AI agent to do things. The part I can’t authoritatively tell you about is what they can or can’t do, because that is still changing every week as frontier labs continue to train more capable models and agent harnesses continue to add more tools and features.

If you’re curious, consider trying them out. You could search for an online guide, or let ChatGPT/Claude help get you started.

What I’ll be covering next

In ten issues, I’ve walked you through the key concepts that help you understand what AI agents do. With three issues left to go, what else should I cover?

Some questions I’m anticipating, or have fielded some variant of:

Can I run my own AI model?
Why can’t the AI do <thing>?

Question 2 has a boring answer and an interesting one. The boring answer is “because it hasn’t been trained yet”. The interesting answer is … not really suitable for a newletter titled Layman’s Guide to Computing, because it’ll be rooted in philosophy and cognitive science. In a different publication perhaps.

So let’s tackle question 1, which will draw on computing concepts I’ve covered in earlier issues and give you an idea of the kind of compute and memory capacity needed to run a model.

Next issue: Issue 180: Running a model

Issue 178: Model thinking and reasoning

2026-08-03T08:00:00+08:00

Previously: Multimodal models represent text, image, and audio tokens alongside each other in their embedding space. The model uses the input tokens, regardless of type, to calculate the next output token. Multimodal models typically only output text tokens in their response, delegating to more specialized models for image and audio generation if necessary.

In this issue we fill in the last piece of the puzzle needed to “unlock untold economic value”, if the AI labs are to be believed. Let’s talk about how models “think”.

Making thinking happen

You’re in a lesson. The teacher asks a question, something innocuous really: “What’s the value of X?” All eyes are on you. You reply with the first answer off the top of your head. Wrongly, it turns out.

Your teacher could mock you at this point, but if they decide to get you to think harder instead, what do they say?

As it happens, this trick works on LLMs too. The ways we try to get people to think harder appear to be well-represented in books, on the internet, and in other media that the models are trained on.

What this means is that you add any of the following:

“think step by step.”
“think carefully.”
“check your assumptions before you answer.”

And it influences the model’s next token. It begins to output phrases like:

“Let’s break this down.”
“First, let’s identify what’s being asked.”
“One way to approach this is…”
“Before answering, let’s consider…”
“Let’s work through the problem systematically.”

It begins to imitate the patterns of careful thinking that it picked up during training. Surprisingly (or perhaps unsurprisingly), this improves the model’s answer in many cases! It generates a much longer answer, taking more time and using more compute in the process—this is what AI folks call “spending compute for intelligence”. If you don’t have a large LLM, you can have a smaller LLM “think harder” and come up with a better answer.

Where thinking breaks down: insufficient examples

When this trick was first discovered, early adopters experimented with different prompt patterns, trying to get models to generate longer responses that led to better answers. But thinking doesn’t always succeed. We’ve all had the experience of trying to think through some difficult math problem, writing lots of working that ultimately led nowhere.

GPT-3 may have been trained on a really large dataset, but most webpages and books are not showcases of how to solve difficult problems through clear thinking.

So it’s back to supervised learning again. Look for examples of how to solve difficult problems. Recruit experts and have them write down their chain of thought for different kinds of problems. Then train the model on this labelled data, so that it doesn’t require users to be clever with prompts to extract this thinking. Train the model to differentiate between requests for a quick answer, and requests requiring deeper thinking.

Thinking vs. planning

A model that is able to think longer and in a more disciplined way to produce a better answer is able to tackle harder questions. These are the models that were solving olympiad questions that humans struggled to solve.

But this isn’t enough for another kind of challenge: long-horizon tasks that involve multiple tool calls, putting together information and feedback from multiple sources, maintaining task coherence and a consistent goal orientation throughout the process, and finally producing output in the correct format.

For example, filing tax returns involves digging through a large number of financial documents, remaining aware of legal requirements for filing, extracting relevant information, and putting it together following those requirements. None of the steps along the way involve extreme intelligence or genius insight, it’s just a lot of tedious steps and details to keep track of. Along the way, detours and failed tool calls threaten to derail the model; it can get stuck researching an edge case rule, debugging a failing tool call, or get distracted by other things.

This requires the model to plan. It has to take an end-goal, break it down into phases and steps, think about immediate steps, execute them and observe the result, decide next steps, repeat, …. Along the way, it has to keep track of goals and sub-goals (usually aided by task management tools), be able to tell when they are met and check them off the list.

Books and websites seldom contain detailed worked examples of how to do this, so the model has to be trained with labelled data (again!), given examples of planning steps through supervised learning until it is able to reproduce them reliably.

Hidden vs visible thinking

Frontier labs found that showing the full thinking process to users isn’t always beneficial. For example, the full thinking trace—tokens that constitute the analysis and are not part of the final answer—could be really lengthy. Users tend not to like that; they want to see the key steps for a quick check, and then the final answer.

Perhaps the full thinking trace includes mistakes the model made and corrected later, erroneous tool calls that it subsequently fixed, search tool calls which the user does not need to see the full contents of, etc. In other cases, frontier labs may have found ways for the model to output a more efficient form of thinking with tokens that is not human-readable.

This means one more step in the runtime: detecting and processing thinking tokens. If the model is trained to demarcate thinking tokens with a special start and end sequence, e.g. <thinking>...</thinking>, the runtime may look for it.

Once detected, this hidden thinking may be removed, summarized (with a different model), or collapsed to take up less space in the user interface.

Issue summary: Thinking/reasoning models are those that have been trained on examples of how to think about different problems in different domains, or plan and execute complex tasks. They often use tools to aid them in goal tracking and updating. The full thinking trace from the model may be removed or hidden to present a more legible response to the user.

This really is the primary concept behind thinking/reasoning models: more supervised training to output a sequence of tokens that lead the model to a useful answer.

If this sounds simple, that’s because most of the magic is in the model training: crafting and labelling training examples, and then training the model on them, is a much more complicated process than it sounds, and I’m excluding it from this issue because it is very technical and not suited for a newsletter named Layman’s Guide.

Now you know what a model is doing when you activate a feature named “Extended Thinking”, or switch to a model that is described as a thinking/reasoning model.

What I’ll be covering next

Next issue: Issue 179: Agents

Finally we can talk about this term, “agents”, and what differentiates them from a model. If you’ve heard this term before and wondered what goes into one, subscribe to be notified when I lay it bare ;)

Issue 177: Multimodal models

2026-07-27T08:00:00+08:00

Previously: In retrieval-augmented generation (RAG), the runtime performs a search with the user’s request to retrieve relevant chunks from a set of documents from a knowledge base. The chunks may be further re-ranked by the runtime before finally being included in the LLM’s input. One alternative to RAG, where information lookup happens outside of LLM generation, is to provide the LLM with search tools instead, and rely on its judgement to use them well.

Multimodal models. Try saying that three times quickly. It’s quite a mouthful, but if you’ve managed to keep up so far, it’s really not complicated, so I don’t expect this to be a long issue.

Multimodal models

While a large language model works only with text tokens, a multimodal model can work with other types of tokens as well. We’ve previously covered what text tokens are and how LLMs use them (Issue 172), so let’s focus on image and audio tokens.

The approach is similar, really: text gets broken up into common repeating patterns. Image and audio likewise gets broken up into common repeating patterns. Each common repeating pattern is represented by a number, or set of numbers, and located in an embedding space (Issue 172).

Image tokens

There are a variety of approaches for tokenizing images. A common way to do this is to break it up into 16×16-pixel patches. Each pixel has three values representing red+green+blue (Issues 43 & 44), so each patch is a sequence of 16×16×3 = 768 values.

Each unique combination of 768 values constitutes an image token. During training, these image tokens appear alongside other tokens (text, image, audio), and the model adjusts its embedding parameters to locate semantically similar tokens in close proximity.

During inference (Issue 173), hidden layers represent more abstract patterns that the model identifies: lower layers may encode information about edges, while higher layers capture information about shapes, textures, and even objects.

Audio tokens

While intuitively it seems natural to chunk audio into 1-second or even sub-second samples, in reality 1 second of audio contains 44,100 samples (Issue 45) which is still far too large.

Instead, audio is usually converted from waveform representation (amplitude vs time) into spectrum representation (frequency vs amplitude at a snapshot in time). The spectrogram gets split into shorter windows of a few milliseconds each (a few thousand samples per window). The values of each frequency in that window then naturally form an audio token, which appear alongside other tokens in training and get represented in embedding space the same way as other tokens.

Multimodal models need supervised training

Supervised learning plays a big part here. Images, audio, and text seldom appear together in unlabelled training data (except in video), so associating images and audio with text relies heavily on manual labelling. This is why multimodal models took so long to emerge.

During inference, all tokens regardless of type are represented as embeddings, and the model uses the input tokens to calculate the output token.

Multimodal models vs image/audio generation models

An app like ChatGPT can take user-uploaded image files, reference them in their response to the user, and then generate an image, or even convert the response from text to audio. But this seamlessness is an illusion; at the backend, these do not use the same model.

Multimodal models can take input tokens of multiple types, but typically only generate text in response; users do not expect image patches or audio snippets in the response, and would not know how to interpret them.

Instead, image and audio generation use different kinds of (non-Transformer) models, which might be worth exploring briefly in a future issue, but not this one.

Issue summary: Multimodal models represent text, image, and audio tokens alongside each other in their embedding space. The model uses the input tokens, regardless of type, to calculate the next output token. Multimodal models typically only output text tokens in their response, delegating to more specialized models for image and audio generation if necessary.

There you go. Multimodal models demystified: once you figure out how to tokenize something alongside text, and give the model lots of labelled data to associate it with text tokens during training, you can create another modality for your model. This sentence hides months of complexity that AI labs tackle, because that’s what you’re reading Layman’s Guide for, isn’t it?

What I’ll be covering next

Next issue: Issue 178: Model thinking and reasoning

We’ve covered retrieval-augmented generation (RAG), and now we’ve covered multimodal models. Text, images, audio: Check check checked. Tools? You bet.

We’ve got almost all the ingredients to assemble an AI to scare the economic labor pool, but we are still lacking one final piece of the puzzle: how do LLMs “think”?

Issue 176: Retrieval-Augmented Generation (RAG)

2026-07-20T08:00:00+08:00

Previously: LLMs can be trained to make tool calls, using the same training data used to train code assistants. The tool specifications are injected into the system prompt that is passed to the model, along with guidance on when to use a tool. Tool calls generated by the model are interpreted by a runtime that detects and executes them, then passes the results of the tool call back to the LLM in the next input.

We mentioned hallucination—mentioning non-existent publications, stories, or facts as though they were real—as one of the pitfalls of GPT-3, and mentioned how reinforcement learning with human feedback (RLHF) helps to combat some of these tendencies in general use.

These days, ChatGPT, Claude, and other chatbots also allow you to upload documents. The runtime supporting these chatbots helps to extract text from the documents with supporting context and include them in the system prompt, allowing the chatbot to answer from the document’s contents to combat hallucination.

In some cases, the document may be too large. In other cases, a company may have a large set of documents the LLM should answer from, but they are too large to all be included in the system prompt.

In such cases, retrieval-augmented generation (RAG) provides an alternative way to inject relevant information into the LLM’s system prompt.

Retrieval-Augmented Generation (RAG)

Like other LLM capabilities, this one comes from the runtime. The LLM plays no part in this and has no control over the process.

The source documents are chunked, and each chunk analyzed to create an embedding. Parts of the document that are closely related have embeddings located more closely.

Before the user’s input is passed to the LLM, it is parsed by the runtime and analyzed into an embedding. This embedding is used to retrieve relevant parts of documents; other information may be used to determine relevant portions as well.

Instead of embedding entire documents, only these relevant portions are included in the system prompt for the LLM to answer the user’s query. In more advanced implementations, the chunks may be further re-ranked by importance and other criteria.

All of this happens in the runtime, beyond the LLM’s token generation loop.

Limitations

When it works well, it works really well: the LLM doesn’t hallucinate, quotes from the source, and if the source is well-tagged, it can even cite from the correct page and paragraph.

But there are ways it can make mistakes too. If no matching documents are found and the LLM isn’t aware, it may hallucinate unless the runtime handles this well. On the opposite end of the spectrum, it may find too many results and not know how to select the most relevant ones. The documents themselves may be contradictory, incomplete, or require too much unwritten context. And lastly, it may miss important nuance found elsewhere in the document, or in other documents, that did not surface in the embedding search.

Alternatives

Still, in cases where you can’t fit entire source documents in the LLM context, what other alternatives do you have?

Then it’s back to a set of tools for your LLM to use for searching the company knowledge base, read documents, and manually extract relevant portions. Naturally, your LLM will need to be trained on a dataset of positive examples of tool usage (Issue 175). In contrast to RAG, where retrieval is automatic and built into the runtime, here you are relying on the LLM’s judgement of which tool to use, and when to use it.

Issue summary: In retrieval-augmented generation (RAG), the runtime performs a search with the user’s request to retrieve relevant chunks from a set of documents from a knowledge base. The chunks may be further re-ranked by the runtime before finally being included in the LLM’s input. One alternative to RAG, where information lookup happens outside of LLM generation, is to provide the LLM with search tools instead, and rely on its judgement to use them well.

Okay, that’s RAG de-mystified. It’s a program that runs a search on the user’s request and injects relevant chunks from the knowledge base into the LLM’s input, beyond the LLM’s control. Now you can speak about RAG a little more informatively.

I avoided discussing RAG’s performance, because results vary. For every detractor you can also find a supporter! Is it going to work well for you? You probably have to try it yourself, or find a consultant who can better advise you.

What I’ll be covering next

Next issue: Issue 177: Multimodal models

Many chatbot models accept image and even audio alongside text. How does this work? De-mystifying in the next issue, so stay tuned!

Issue 175: LLM tools

2026-07-13T08:00:00+08:00

Previously: Through reinforcement learning with human feedback (RLHF), the LLM is trained on labelled data until it can reliably follow instructions, avoid harmful output, and follow other desired behavior. A system prompt provides guidelines for output. The user’s prompt is inserted into a templated prompt and passed to the LLM, which generates text in a markup format that a display system can understand. A chat interface wraps the entire system to create the illusion of a responsive chatbot.

A chatbot is fun to use for a while, but if all it could do was talk we wouldn’t use it for very long. For starters, it would hallucinate a lot, or give outdated information, because it couldn’t access the internet or do a web search. What would it take for models to be able to use computers to do that?

While this problem was being actively worked on, LLMs were also being trained to generate programming code. It turned out that code, being text-based, was fertile training ground for LLMs. They were improving at it too; while early versions still failed at producing large yet coherent programs, many were able to generate boilerplate code with correct syntax already.

LLMs as tool-using models

For a LLM to use a tool, it needs to be trained to:

state the tool to use
pass the appropriate options
interpret the result, when passed back to the LLM (in the next request)

for example, to use a weather tool, the LLM needs to be able to:

say “use the weather tool”
pass options: “location: Singapore, SG, show me the temperature and humidity as well”
interpret the result: mostly self-explanatory, but e.g. it may need to understand if the location provided in the output may be the nearest known location and not the user’s actual location

It turns out that the first two problems are already known and solved: the first text-based interfaces were invented in the 1970s after all, and programmers have always needed a way to invoke programs through a text-based interface. They already had one, in the form of the command line (Issue 15) and the rich syntax that was already built around it. And they had another, in the form of the function call syntax that almost all programming languages had standardized on, like check_weather(location="Singapore, SG", show_temperature=True, show_humidity=True). And training data already existed for both of these, in the form of open-source code readily available online in code repositories (Issue 19).

The structure of a tool-using LLM

For a LLM to be able to output tool calls, you need:

tool specifications, usually injected through the system prompt, telling the LLM the available tools and their options
guidance on when to use each tool, typically through further instructions in the system prompt, through RLHF (Issue 174), or both
familiarity with the tool call syntax used, typically trained into the model through RLHF.

In the system prompt, you would include:

[...]

## Tools available

- `check_weather(location: text, show_temperature: boolean, show_humidity: boolean)
  Check the weather at the given location. Example: "Singapore, SG"
  Pass show_temperature=True and show_humidity=True if temperature and humidity are required in the output
- ...
- ...

Providing a rich set of tools without using up too many tokens is a tricky design balance that requires regular tweaking. In any case, the model is then trained to output the tool calls in a specially marked section of their output.

Invoking the tools

At the point when the model outputs the stop token and the program stops using it to calculate more output tokens, its involvement stops. The program interprets the model’s output, separating the tool calls out, and passes them to another system.

You see, tool calls can be pretty dangerous, especially if they enable the model to carry out destructive actions. A shell command like rm -rf / on Linux or Mac could delete the entire operating system, or important subdirectories. A delete_database tool could do what it says, but with the wrong target specified. So it’s common to have a system that examines the tool call and attempts to determine if it is safe. In a code assistant, this tool call might be shown to the user for explicit approval. In a web-based chatbot like ChatGPT, tool safety is usually handled by another system instead.

Once validated, the tool needs to be executed on a computer system. This computer system needs to have the necessary programs installed. It should also be isolated against potentially destructive actions. We’ve covered how containerization (Issue 149) enables this to be done; an isolated container for each session where necessary.

Finally, the result of the tool call, whether success or failure, is captured and then added to the token sequence which is fed back into the LLM.

This all sounds pretty neat, but with one caveat: only the chatbot provider (OpenAI for ChatGPT, or Anthropic for Claude) can pass these tools to the LLM. Third-party integrations, such as with GitHub or Google Drive, would be tricky for OpenAI/Anthropic to design on their own, yet unsafe for external parties to inject into the system prompt.

Integrating third-party tools

So in Nov 2024, Anthropic proposed another standard: the Model Context Protocol, a way for external parties to specify a set of tools that work together to enable access to other web-based or software-based systems.

When the user registers a MCP server through a graphical or text-based interface, the system reads the tool specifications from the MCP server, injects them into the system prompt, and from there they work like other tools accessible to the LLM.

The runtime

Notice that none of this is mediated or controlled by the LLM. It follows instructions, generates tool calls with the correct syntax in its output, then sees the result in the next input, seemingly by magic. The LLM is operating in a virtualized environment controlled by an external system that doesn’t have a standardized name yet. For now we’ll call it the runtime.

Tools and toolsets make or break a LLM-based assistant. They are the only way a LLM can take actions, get data, and otherwise make sense of the external world. A LLM without any tools is analogous to a human in a sensory deprivation tank—without information from the outside world, even human beings quickly begin to hallucinate.

Issue summary: LLMs can be trained to make tool calls, using the same training data used to train code assistants. The tool specifications are injected into the system prompt that is passed to the model, along with guidance on when to use a tool. Tool calls generated by the model are interpreted by a runtime that detects and executes them, then passes the results of the tool call back to the LLM in the next input.

From here it’s another 3 issues before we get to the topic of the year: AI agents. Before I get there I want to cover three more buzzphrases: retrieval-augmented generation (RAG), multimodal models, and reasoning/thinking models.

By now I hope you’re starting to see that LLMs really are next-token predictors underneath, and all their actual capabilities—the ones that let them know what is happening in real-time and change things in the world—are provided through the runtime. As the runtime grows more powerful and capable, LLMs must also be post-trained (using reinforcement learning a.k.a. RLHF) to use them well.

What I’ll be covering next

Next issue: Issue 176: Retrieval-Augmented Generation (RAG)

Issue 174: Reinforcement Learning

2026-07-06T08:00:00+08:00

Previously: OpenAI discovered, through models GPT-1 to GPT-3, that scaling compute and (training) data alone was sufficient to sharply increase the capabilities of a LLM: the transformer architecture and unsupervised learning together resulted in a model that was alarmingly intelligent.

Mechanistically, a LLM is a next-token predictor: from a set of parameters, and an input sequence of tokens, a program continually calculates the next token, which gets appended to the input sequence, and the new sequence gets fed in as the input again, until a stop token is generated.

OpenAI had discovered that by training GPT-3 (with over a hundred billion parameters) on a very large dataset (hundreds of billions of tokens), they ended up with a next-token predictor that appeared to generate readable, sensible text.

But that doesn’t mean that GPT-3 was ready for public use yet: what about those hallucinations, that toxic output, the prompt injections that caused it to ignore OpenAI’s instructions?

Reinforcement learning

Unsupervised learning may have created a genius model, but now OpenAI had to fall back on supervised learning to make it useful.

In 2022, OpenAI researchers submitted a paper titled “Training language models to follow instructions with human feedback”:

Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.

It was back to painstaking human labelling of data again, getting humans to write desired outputs and label toxic content to train the model on. Through this process of reinforcement learning with human feedback (RLHF), InstructGPT was born.

RLHF was necessary to adjust the model parameters so that instructions like “explain …” were treated as guiding instructions rather than starting text that the model would steer away from.

Data cleaning and labelling

Prompt injections would continue to be an issue, but in the meantime OpenAI could address toxic content by first cleaning up the dataset to remove toxic, low-quality content and add other high-quality data sources.

This need for new, novel data sources still drives frontier machine learning labs today, who pay for high-quality data sources they can use to train their models.

Creating a chat assistant

InstructGPT was ready to take instructions. But … how do we get instructions from the user? How do we pass the responses back to them? The model was trained, the API was ready … but OpenAI needed a graphical interface, a familiar mental model of interaction that the public could use intuitively.

One already existed: chat apps like WhatsApp were popular at the time, and users intuitively understood a chat input when they saw one. But how could OpenAI get InstructGPT to respond reliably like a chat assistant with a consistent personality and style?

It turned out the answer was already in the training data.

Prompt framing

There was a lot of training data in the form of interviews, movie scripts, things that look like:

Alice: Why do cats like to jump on furniture?
Bob: …

And in many cases, arranging the user’s question along with a system prompt like so was enough to have the LLM roleplay a helpful assistant:

# System Prompt

You are ChatGPT, a large language model trained by OpenAI. [...]
Knowledge cutoff: 2024-06
Current date: 2025-09-03

Personality: Engage warmly yet honestly with the user. [...]

User: <user's input>
Assistant:

Pass the above prompt to InstructGPT and it helpfully follows the pattern, demonstrating GPT-3’s capabilities token after token, until it reaches a stop token. The program then takes the tokens generated after the prompt and displays them to the user.

What if the output is toxic, hallucinatory, or otherwise unacceptable? Back to RLHF again.

The ChatGPT wrapper

Even with the API in place, some window dressing is still needed. The LLM, being a language model, can only generate text, not format it. Most LLMs are RLHF-trained to generate text in a markup format (such as HTML or Markdown). The display system takes the LLM’s output, interprets the markup, and displays it as something the user can understand, making headers bold and larger, adding bullets or numbers to lists, formatting code accordingly, and so on.

The wrapper can also do some helpful things, like filter the LLM’s output for harmful text and block it from appearing, as a kind of last-layer defence against offensive output. Add a login screen, a way for users to access past chats, a few other niceties …

Finally, OpenAI launched ChatGPT in November 2022. And the world as we knew it changed forever.

Issue summary: Through reinforcement learning with human feedback (RLHF), the LLM is trained on labelled data until it can reliably follow instructions, avoid harmful output, and follow other desired behavior. A system prompt provides guidelines for output. The user’s prompt is inserted into a templated prompt and passed to the LLM, which generates text in a markup format that a display system can understand. A chat interface wraps the entire system to create the illusion of a responsive chatbot.

ChatGPT was the beginning of many other features to follow. Among them: multimodal models, and tool calls. The former is easy to understand, so let’s unpack how LLM tools work in the next issue.

What I’ll be covering next

Next issue: Issue 175: LLM tools

Issue 173: Training, Inference, and Scaling

2026-06-29T08:00:00+08:00

Previously: A model does not see letters or words, only tokens. These tokens are typically generated from user input through a pre-tokenizer program. Tokens are represented in the model as embeddings, a sequence of numbers representing the token’s position in the embedding matrix. The model uses each token’s embedding, and its surrounding tokens, to infer its meaning in context.

Model Training

In issue 171, I explained a little about how model training happens:

we pass tokens generated from text to the input
we pass the expected output (in supervised training), or the subsequent tokens (in unsupervised training)
the model generates output from input
we compare the model’s output to the expected output
we adjust model parameters
we repeat from step 3, attempting to adjust parameters to have the model generate output that is closer to the expected output

Notice that there’s a “forward” step: step 3, where the input “feeds forward” to each hidden layer. Here, the computer calculates the values for the next layer based on the values of the previous layer and on the model’s parameters between the two layers. This is repeated for each layer until we get to the output.

Notice also that there’s a “backward” step: step 5, where we could adjust model parameters randomly—inefficient! Instead, the mathematical technique of gradient descent gives us a more optimized way to adjust the last hidden layer based on how it would affect the output. The second-to-last hidden layer is then adjusted with the same technique, based on how it would affect the last hidden layer. And this is repeated all the way to the first hidden layer. This “backward trickling” is called backpropagation, or “backprop” more informally.

The above steps are repeated for each input:output data pair (supervised training) or for each token sequence run (unsupervised training). That’s a lot of repeated steps; researchers often have some shortcuts they take to speed up the process. Even then, it is still too many for a typical CPU to complete in a reasonable time; the big labs use specialized GPUs instead (Issue 123), resulting in training runs that take weeks to months to complete on multiple GPUs for today’s state-of-the-art LLMs.

This is not a cheap hobby.

Inference

Fortunately, using a model is a different affair, involving only steps 1 and 3 of the above. No backpropagation, no repeated runs. Just pass the input in, run one forward step per output token, repeat until done. This process is called inference, and is what happens when we users send a request to ChatGPT or Claude.

(Hang on, how does a model “know” when it is “done generating text”? In model training, a special token, e.g. <EOS> for end-of-sequence, is inserted at the end of text. When this token is detected in the program, it stops invoking the model.)

Needless to say, inference is much cheaper than training, which is why we are able to enjoy many of these models for free.

Scaling up to GPT-2

GPT-1 had 117 million parameters, was trained on ~7,000 books (about 5GB), took a few days to complete training on 8 GPUs, costing $0.5 mil or less.

In Nov 2019, OpenAI released GPT-2, which was the first large language model to capture some public attention. GPT-2 had 1.5 billion parameters (1.5B), was trained on ~40GB of text from the web, and took a few weeks to train on hundreds of GPUs, costing OpenAI $1 mil to $5 mil to train.

GPT-2 was the same architecture that GPT-1 used, only with a larger model (tenfold) and with more training data (eightfold). What they got was a model that:

could perform tasks it was never explicitly trained on (zero-shot learning): answer questions, understand text, summarize, translate (rudimentarily)
could generalize from examples given in user input (one-shot/few-shot learning) without needing supervised learning
showed emerging ability on non-language tasks: counting, basic arithmetic, even some attempts at simple proofs

These are capabilities we take for granted today, but in early 2019 this was cutting-edge performance never demonstrated by any other machine learning model, and certainly not with so little human supervision. This discovery was scary enough that it took OpenAI nine months to fully release GPT-2’s weights, fearing how its capabilities might be misused. The Verge reported: “OpenAI has published the text-generating AI it said was too dangerous to share”, but fortunately in the same article “the lab says it’s seen ‘no strong evidence of misuse so far’”.

The bitter lesson, and GPT-3

These findings prompted Rich Sutton, an influential machine learning researcher, to write a blog post published on 13 March 2019 where he summed up this finding in a single sentence: “The bitter lesson is that general methods that leverage computation are ultimately the most effective, and by a large margin.” Elaborating, he adds “seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.”

A tenfold increase in model parameters and training data led to a surprising leap in capability. OpenAI and other researchers wondered: What if we pushed this to its logical conclusion, and threw more compute and more data into machine learning training?

In Jun 2020, OpenAI released GPT-3, available through their web API (Issue 4). GPT-3 had 175 billion parameters (175B, a hundredfold increase in model size), was trained on a mix of books and websites totalling 300 billion tokens, took weeks to train on hundreds of GPUs, and cost OpenAI up to $12 mil to train.

GPT-3 could:

take instructions given in natural language
reliably tackle many tasks zero-shot (with no examples)
reliably adapt examples given in the user input, and generalize from patterns

It had reached a level of capability that took the focus away from training data and placed it on the user input, called the prompt: without further training, the model could give you a response, the quality of which depended on the quality of your prompt.

Alarming behavior

LLMs had finally reached a point where they were easy enough to use by the general public. But before it could actually launch for public use, there were some concerns to be addressed.

For one, GPT-3 was extremely prone to hallucinations—making up things that never happened, papers that were never written, academic journals that never existed. It also readily reproduced toxic outputs from its data source—the internet (especially reddit and 4chan). It was extremely steerable through the prompt—a little too steerable for OpenAI’s liking, when some users got GPT-3 to leak its system prompt—the instructions that OpenAI prepended to every request guiding GPT-3’s response style and guardrails.

It would be some time before ChatGPT could even launch without dragging OpenAI down with it.

Issue summary: OpenAI discovered, through models GPT-1 to GPT-3, that scaling compute and (training) data alone was sufficient to sharply increase the capabilities of a LLM: the transformer architecture and unsupervised learning together resulted in a model that was alarmingly intelligent.

We are getting closer to the LLMs we know and love/hate today. This issue covered the miracle story of GPTs 1 to 3. If GPT-3 was a child genius, ChatGPT is GPT-3 dressed up for work. Let’s talk about what OpenAI had to do to it for public release—next issue.

What I’ll be covering next

Next issue: Issue 174: Reinforcement Learning

Issue 172: Tokens, the currency of LLMs

2026-06-22T08:00:00+08:00

Previously: The Transformer architecture, unlike previous machine learning model architectures, could generate its next item while processing all previous items at the same time. The technique of unsupervised learning trained models on unlabelled data, letting the model pick up patterns in underlying data instead of having it learn correct answers only, and was much faster than supervised learning. OpenAI applied both these ideas at scale, producing GPT-1, a model that beat best-performing models while requiring relatively little human supervision during training.

Wait—what exactly does a large language model (LLM) work with? Individual letters? Entire words? No, they work with—

Tokens

Tokens are clusters of letters that make up the training data. The large language model (LLM) does not “see” letters or words, only tokens.

Tokens are … quite unlike phonemes, syllables, or other word-fragments you and I are familiar with. They are typically programmatically generated by a separate program (not a model), based on letter-clusters that appear most frequently in the data.

For example, using OpenAI’s Tokenizer tool to visualize the above paragraph gives us this:

OpenAI Tokenizer - text view

OpenAI Tokenizer - token ID view

There is little human-discernible pattern as to what definitively constitutes a token: it could be a single punctuation mark, a letter or two (and sometimes including their preceding space, sometimes not), or an entire word.

Whatever the case, what we see as " you and I", a LLM sees as [481, 326, 357]. A pre-tokenizer program tokenizes all input into numerical values.

Now you understand a little better why ChatGPT struggles to count Rs in “strawberry”, or in any other fruit really.

Embeddings

How does the model tell 481, 326, and 357 apart? How does it store or represent them within itself? Here, I am going to need you to use your imagination. You are familiar with the concept of a scatter plot, yes? A graph that looks like this:

A scatterplot with 2 dimensions
Source: EmbeddedSource

Now imagine a scatterplot with as many data points as tokens. In GPT-1’s case, that’s approx. 40,000 tokens—its vocabulary size. Yes, I know that’s a lot of points, but you can roughly visualize that, yes? Good, that’s the easy part.

Now I need you to imagine the scatterplot with … *checks notes*—768 dimensions. No, that is not a typo, we are talking about a scatterplot with 768 dimensions. Oh, that’s too difficult to imagine? Yeah. Sorry, that’s why I don’t have an image attached. Just try your best 🙏

Essentially that is what a LLM generates as a result of its training. Each token in its vocabulary becomes a data point, and each data point is represented in this 768-dimensional space using 768 decimal numbers ranging from 0 to 1.0. This positional representation using many decimal numbers is called an embedding.

Other uses for embeddings

Embeddings are also not a new idea: they precede GPT by decades, having been conceptualized as early as the 1980s.

Because they’re such a handy and intuitive mathematical way to represent or visualize tokens and semantics, they’re also used often in semantic search engines (which try to infer what you mean instead of what you said), recommendation engines (suggesting similar things based on what you bought or liked), relevance scoring, etc.

How a LLM represents semantics

There’s more to a LLM than this collection of 40,000 embeddings; it forms only a tiny fraction of the entire model. But it is critical to how the LLM “learns” information from the text. Based on where the tokens appear relative to each other in the text, and the higher-order patterns that the model detects through its hidden layers, the model adjusts the embedding for each token, placing semantically similar ones closer to each other and dissimilar tokens farther away from each other.

And because this is a mathematical space with direction (in 768 dimensions), the model can also pick up on analogy to some extent: if you draw a (768-dimensional) arrow pointing king → queen and another arrow pointing father → mother within this embedding matrix, they end up almost parallel. This means the model can solve SAT vocab pairs, giving you “mother” when you give it “king:queen, father:?”

If an LLM relied only on this embedding matrix, it would not be able to distinguish “bat” as a warm flying mammal from “bat” as a piece of sporting equipment. The rest of the model—using the Transformer architecture, you’ll recall from issue 171—uses the tokens surrounding it and their positions to infer the context that “bat” is being used in.

Model pricing and limits

Most ChatGPT/Claude users are familiar with those products as subscriptions, where they pay a certain price per month to use ChatGPT/Claude for some arbitrary amount, and if they use too much too quickly they hit a usage limit and have to wait for it to reset.

But if you are a business, and using the API instead, you’ll be looking at a different page, such as the API pricing page for OpenAI’s API. Notice that prices are typically quoted in units of “1M tokens”, standing for “1 million tokens”. Now you know what those tokens are referring to.

Likewise, when Anthropic explains how usage and length limits work, and tell you that “Claude’s context window is 200K tokens”, you now know what they are referring to. More importantly, you know it doesn’t mean 200 characters or 200 words.

Issue summary: A model does not see letters or words, only tokens. These tokens are typically generated from user input through a pre-tokenizer program. Tokens are represented in the model as embeddings, a sequence of numbers representing the token’s position in the embedding matrix. The model uses each token’s embedding, and its surrounding tokens, to infer its meaning in context.

I would have gone on longer, but I think tokens are a pretty novel concept for most layfolks and deserve their own issue to sit with and digest before we talk about what a model does.

What I’ll be covering next

Next issue: Issue 173: Training, Inference, and Scaling

Issue 171: The first Generative Pre-Training model, GPT-1

2026-06-15T08:00:00+08:00

Previously: Models simplify and represent a relationship between input values and output values. The more complex the relationship, the more parameters the model needs to learn. Models are simplifications of reality, and their performance depends on how well they capture underlying patterns in the data, as well as the quality and quantity of the dataset.

We are going to set aside image and audio models for today, and narrow down to focus on language models in particular, because that’s what sparked off the AI craze.

The pre-2018 machine learning paradigm

I am not a machine learning researcher and can’t tell you what the prevailing research paradigm at that point was. But in open-source and consumer applications, it seemed machine learning models were bespoke:

You started with something specific you needed, like image classification, optical character recognition (OCR), speech recognition, translation, …
You collected a huuuuuge dataset of input-output pairs for that specific task: images and their labels, scanned documents and their text, audio, etc. And by huuuuuge I mean tens of thousands to millions of examples.
After collecting the data you often have to clean it up (remove duplicates, remove outliers, etc.) and label it (e.g. label images with their correct labels).
You then trained a model on part of the dataset, tweaking parameters and trying different architectures (ways of arranging parameters).
You tested the model on the other part of the dataset, passing each input through the model and comparing the output to the expected output, and measuring how well it performed.
You repeated steps 4 and 5 until you were satisfied with the model’s performance, and then you deployed it for use.

This technique of using labelled data to train the model is called supervised learning, because of the need to tweak the model’s parameters (under human supervision) to match the expected output.

There were (and still are) many machine learning models trained this way and used. For example, tesseract is an open-source OCR engine that was first released in 2005. It was trained on a dataset of scanned documents and their corresponding text, and has been used in various applications for OCR tasks. Another example is the ResNet architecture for image classification, which was introduced in 2015 and has been widely used for image recognition tasks.

The Transformer architecture

Before Google’s 2017 paper on the attention mechanism, the prevailing machine learning models had two problematic limitations:

they “looked” at input data one item at a time to produce the output, resulting in slow output generation
because of the above, data that was processed earlier seldom made it through to the end of the model, resulting in a recency bias: the model tended to focus on the most recent input data and ignore earlier input data

The attention mechanism introduced in Google’s 2017 paper allowed models to “look” at all input data at once, speeding up output generation. The same mechanism also computed which parts of the input data were most relevant for producing the output.

Attention was not a new mechanism in machine learning: prior models had used them, but in separate stages, and alongside other mechanisms. Google’s paper was the first to ask: “what if we only used attention everywhere?” The resulting architecture, which they called the “Transformer”, was a breakthrough in speed and simplicity.

Unsupervised learning

Besides the Transformer architecture, another breakthrough was already making its rounds: instead of task-specific datasets, researchers wondered why they needed so many task-specific datasets. Since the data represented different subsets of reality (from different tasks), what if they just trained a single model on a really, really large dataset of text to produce a base model? Then they could fine-tune it on smaller task-specific datasets to produce task-specific models.

This technique, called unsupervised learning, did not require data to be labelled—the model “learns” patterns in the underlying data without human correction, simply trying to predict the next word in the training data given the previous words.

Generative Pre-trained Transformer (GPT)

A few researchers at OpenAI then had the idea to try this pre-training approach on the Transformer architecture. OpenAI built the first Generative Pre-trained Transformer (GPT) model, which they released in 2018. Generative means the model generates output based on input, producing one output item at a time (but processing all inputs simultaneously). Pre-trained means the model was largely trained through unsupervised learning. Transformer refers to the underlying architecture.

They went big on scale: GPT-1 trained on a dataset of 7,000 self-published books comprising 985 million words, representing this data using 117 million parameters—an unheard-of scale at the time (but now considered paltry). It attracted attention from the research community not only by improving on best-performing models on various language tasks, but by improving on all of them, with minimal task-specific training.

Due to the unprecedented number of parameters used, GPT-1 was considered a large language model (LLM), to distinguish it from smaller models that came before. However, this was a research idea, with code that was far from release-ready, and nobody except research-minded folks knew how to get GPT-1 running. And thus, this went unnoticed by the public.

Still, this was a breakthrough: no research lab before OpenAI had the kind of resources that enabled them to try this idea. It did require resources that most labs didn’t have at the time: 8 GPUs, when most labs ran their training on a single GPU.

Issue summary: The Transformer architecture, unlike previous machine learning model architectures, could generate its next item while processing all previous items at the same time. The technique of unsupervised learning trained models on unlabelled data, letting the model pick up patterns in underlying data instead of having it learn correct answers only, and was much faster than supervised learning. OpenAI applied both these ideas at scale, producing GPT-1, a model that beat best-performing models while requiring relatively little human supervision during training.

We’re almost at the meaty part! I kinda snuck in 2 ideas today: the Transformer architecture (a minor part of this series actually) and unsupervised learning. I don’t think you would have wanted to wait a week in between before hearing how OpenAI combined the two, haha … so there you go.

What I’ll be covering next

Next issue: Issue 172: Tokens, the currency of LLMs

Wait—what exactly does a large language model (LLM) work with? Individual letters? Entire words? Find out next issue!

Issue 170: Machine learning models

2026-06-08T11:30:00+08:00

Previously: By better understanding how search bots categorise pages, a website owner can use keywords and other techniques to optimise the ranking of their page for specific search terms.

[Editor’s Note] Layman’s Guide to Computing went on hiatus after its 13th season, because my promise when I began was to write only things widespread enough that I thought layfolks should have an accessible-yet-useful introduction to.

As I wrapped up Season 13 in 2022, the trend at that time was cloud computing. I tackled emulation and virtualization in Season 12, then the internet and online services in Season 13. ChatGPT launched in November 2022 that year. In 2024, I was first asked if I would continue Layman’s Guide again to write about AI. I said no; less than half my colleagues were using ChatGPT or had heard of it, and I didn’t think there would be enough common knowledge for me to usefully write about AI yet.

But now, in 2026, even my employers are actively promoting genAI, my students are using ChatGPT, and by the end of this year it would likely be difficult to find someone who hasn’t heard of Claude Code or Gemini Pro or Codex. I suppose it’s time to add one more season.

There are many explainers out there; I’ve read a large number of them, many very good! But this is Layman’s Guide to Computing, and something I noticed talking to laypeople is confusion: where did this AI come from? Why hadn’t it been invented earlier? How does it work? What can it do? What can’t it do?

So let’s rewind time: I started writing Layman’s Guide to Computing in 2018. A year before that, eight machine learning engineers at Google had published “Attention is All You Need,” the paper that introduced the transformer architecture that underpins most of today’s genAI. In mid-2018, before I started writing, OpenAI was still a non-profit research lab founded by Elon Musk and Sam Altman, and had just released the first version of GPT, a language model that was not yet large enough to generate coherent text. Following Google’s whitepaper on the attention mechanism, they had just released a paper, “Improving Language Understanding by Generative Pre-Training”, that described the architecture and training process for GPT, their first large language model.

It’s a little hard to mentally reconstruct the tech culture and public awareness of the field of artificial intelligence and machine learning at that point in time. So let’s start by understanding: what is a model? How were they used then?

Models

You may not know it, but you were already using models in your daily life in 2017. When the iPhone launched, it had intelligent autocorrect and touch auto-adjustment features. For these features to work, Apple had to train machine learning models on large datasets of text and touch interactions. These models were then deployed on the iPhone to provide the autocorrect and touch adjustment functionality.

What are these models? You would likely have used them in a stats course, perhaps even in high school. If you were ever asked to sketch a best-fit line, a trendline, or a linear regression, you were already drawing a model. To do that, you:

Hypothesized a linear relationship between an input variable x and an output variable y.
Collected data points (x, y) through an experiment.
Represented the relationship between x and y using a mathematical formula (y = mx + b).
Determined the parameters m and b that best fit the data points.

You compressed the data—multiple sets of points (which we call a dataset)—into two parameters, m and b, a simpler representation that captures the underlying relationship. This representation is a model. (We sometimes call it a mental model when we don’t have it formally represented as a mathematical relationship, just a conceptual description.)

Apple’s machine learning models do something similar. An autocorrect model takes a dataset of incorrect words/phrases and their actual words/phrases, and compresses it into a text correction model. A touch auto-adjustment model takes a dataset of touch interactions and their intended targets, and compresses it into a model that can predict the intended touch target based on the touch input.

tl;dr A model takes in input values and produces output values based on patterns it has learned from training data.

More complex models

Of course, more complex models do not use a linear equation or a simple mathematical formula anymore. Machine learning researchers first represent more complex relationships using more complex formulas, such as polynomials or decision trees, which use more parameters.

But for other purposes the input may not be a single variable and the output may not be a single variable either. For example, in image recognition, the input is an image (which can be represented as a grid of pixel values), and the output is a label (e.g., “cat”, “dog”, “car”). An image classifier may have 64 input values (one for each pixel in an 8×8 image) and 10 output values (one for each possible label). The model would learn to map the input pixel values to the correct label based on patterns in the training data. That’s 640 parameters (64 input values x 10 output values) that the model would learn to adjust during training.

This direct mapping of input to output can only take us so far. Perhaps output 1 doesn’t just depend on inputs 1 to 10, but on some intermediate value calculated from them. Now we have to add intermediate layers between input and output, which researchers call “hidden layers”. These layers allow the model to learn and represent more complex relationships between input and output. Each layer can have its own parameters, and the model learns to adjust these parameters during training to improve its performance.

tl;dr More complex models use more parameters to represent the relationship between input values, intermediate values, and output values. Each parameter represents a relationship between two values. The more parameters, the more complex the relationships the model can learn.

Limitations of models

Models sound like mathematical dark magic, and often feel like it too. But like the mathematical models we learned in school, they have limitations.

If you’ve seen how far some of your data points deviate from your best-fit line or trendline, you already know that the model cannot accurately represent all the data points—it is only a simplification. Likewise, all machine learning models are simplifications of reality.

Their performance depends on how well they capture underlying patterns in the data: pick an inappropriate representation for the feature, e.g. a linear formula instead of a polynomial, and the model will perform poorly.

It is also possible to go to the other extreme, adding a complex model with many parameters that fits the training data perfectly, but does not predict other data points well—an overfitted model. You can have a computer come up with a sine-decay formula that fits your first 6 data points perfectly, but wildly overshoot a 7th data point.

Also, their performance depends on the quality and quantity of the dataset. If your data does not represent the underlying reality well enough, missing important patterns or exceptions, or not covering a sufficient variety of cases, the model can pick out the wrong features and learn the wrong patterns. In the early days of machine learning, some researchers found that when training image classifiers on images of dogs and cats, the model began identifying any brown creature sitting on grass as a dog, because the training dataset had many images of dogs sitting on grass, but few images of cats sitting on grass. The model had learned to associate grass with dogs, which was not the intended pattern.

Issue summary: Models simplify and represent a relationship between input values and output values. The more complex the relationship, the more parameters the model needs to learn. Models are simplifications of reality, and their performance depends on how well they capture underlying patterns in the data, as well as the quality and quantity of the dataset.

After experiencing the magic of ChatGPT and other genAI tools, it’s easy to forget, or perhaps not even realise, that fundamentally they are powered by the same underlying principles that we apply in simpler experiments.

But between y = mx + b and ChatGPT, there is still … such a huge gulf of complexity. We still have quite a way to go.

What I’ll be covering next

Next issue: Issue 171: The first Generative Pre-Training model, GPT-1

What was the fundamental insight that made GPT and other LLMs possible? Find out next season ;)

Issue 169: Search engine optimisation

2022-04-23T08:00:00+08:00

Previously: A search engine uses bots to build up a database of URLs and their contents. The search engine uses various algorithms to determine the most relevant results for a search request.

Let’s get to it: why are search results so bad so often?

PageRank

While PageRank is no longer the only or even the dominant algorithm for ranking search results, it is probably the most familiar one to most people and is easy to understand.

PageRank ranks your page higher if there are many other pages (on other domains) linking to it, so one way to raise its rankings is to make other websites with links that point to your page.

Advertising

Naturally, Google runs an online advertising business and it puts advertised results first, so this obviously skews the rankings.

Issue summary: By better understanding how search bots categorise pages, a website owner can use keywords and other techniques to optimise the ranking of their page for specific search terms.

Not very detailed, because I am not an SEO master! This issue is probably the least helpful for more savvy readers, but everyone finds out about page keywords at some point and I hope this issue is that first time for some readers :)

What I’ll be covering next

I am … done! With the main part of Layman’s Guide to Computing at least.

Going even slower

What this means is I still have bits of content to post, but they are no longer substantial enough to form seasons of 13 issues. Instead, I’m going to be posting in dribs and drabs, guided by current affairs and any new technologies that have gone mainstream. This also means my posting frequency is going to be going way down; expect 1–2 issues a month, instead of the current 4–5. And I’m going to be taking the next 2 weekends off for a little celebration of my own, plus some thinking and reflecting.

Better archives

I also realised that Buttondown’s archive page no longer provides a browseable listing of all issues. I am looking into setting up my own issue index instead through Github Pages, on https://ngjunsiang.github.io/laymansguide/. Fortunately I have been using Github to archive past issues, so now it’s simply a matter of editing links for past issues and providing some kind of navigation. More updates on this when I make some kind of progress.

Refactoring

Code refactoring is when developers reorganise and rewrite their code in a way that makes it easier for them moving forward.

You might have noticed how often I am linking to past issues, because so many concepts in computing build on each other. I knew this was going to be a problem when trying to explain computing to anyone, and I had initially envisioned some kind of wiki where pages will link to each other so you can theoretically start exploring from any topic and unpack technical terms as you go.

The main bottleneck for this had been content: it’s rather overwhelming to think about where and even how to start. And that was how this newsletter was born. Thanks to Layman’s Guide, I now have a steady base of content for publishing Layman’s Wiki, and work on it has been slow but progressing. I’ll put up a link through this newsletter once it is ready for more eyes :)

What I’ll be covering next

Next issue: Issue 170: Machine learning models

After a long hiatus, Layman’s Guide returns with Season 14! But first, let’s understand what models are and how they work.

As always, thank you for reading!

Issue 168: Search engines

2022-04-16T08:00:00+08:00

Previously: Fragmentation is likely a contributor of system slowdown, particularly for mobile devices: the database used by most mobile apps tend to store data in many small chunks rather than fewer big chunks, which slows down data search operations. The most effective measure for improving device responsiveness is usually to clear the app cache, so the app does not attempt to read previous data from storage.

Last issue, we shed a little light on the mystery of why phone and laptop systems slow down over time—apparently the way a file database works is to blame?

This week, we switch topics, to look at something we definitely take for granted: search engines!

What is a search engine?

Forgive me if you think I am belabouring the obvious. There is the user definition, which is something like “a search box that answers my questions”. Then there is the developer definition which is more like “an indexed database of URLs, descriptions, and ranks”. Let’s unpack that.

Indexing and arachnids

Another obvious point: at the point when you submit your search query, there is no way the search engine could have trawled the entire internet so quickly to give you the results. It must have know about these pages beforehand … but how?

Search engines run “bots”, also know as web spiders, or web crawlers. These are programs that retrieve pages, makes a note of the content (text, images, links, …) on that page, determine keywords for that page, and then follows the links to other pages and repeats the process. Like following the links of a (very messy) web!¹

When we “search the internet”, we are really searching the database that has been built up by these bots.

Ranking results

So there’s a huge database … how does the search engine determine which are the most relevant results?

Google’s PageRank algorithm is by now well known for disrupting the old directory-based method of organising information, and built for Larry Page and Sergey Brin a sizable empire. This is one way of determining how important/useful a page is: by seeing how many other pages link to it. There are other ways, but this is a layman’s newsletter and I don’t want to dive into a technical analysis and comparison of different ranking algorithms 😬

Instead, consider that there are by now many different algorithms for determining the relevance of database entries for each search, even within Google itself. A search engine is essentially a way to access this database, use one or more appropriate algorithms to determine the most relevant results, and return them to the user.

Issue summary: A search engine uses bots to build up a database of URLs and their contents. The search engine uses various algorithms to determine the most relevant results for a search request.

I know … nobody really cares how a search engine works. We just want to know why it’s not giving us the results we want!

What I’ll be covering next

Next issue: [LMG S13] Issue 169: Search engine optimisation

I am not a consultant for search engine optimisation (SEO), but we can look at some of the ways people attempt to “game the system” and exploit features of the algorithms that search engines use. More next week!

Yep, we all know spiders don’t actually do this. ↩

Issue 167: Database fragmentation

2022-04-09T08:00:00+08:00

Previously: There are easy and quick ways to check the validity of the most common advice for resolving system slowdown. But it still seems to happen even after these tips have been tried.

Last issue, we talked about caches and why they are no longer as effective as a performance-boosting measure.

This issue, let’s look into a solved problem that is not-as-solved on Android: file fragmentation.

Storage fragmentation on mobile devices

I mentioned in Issue 119) that fragmentation is not an issue for faster SSDs; the much lower latency of SSDs makes the retrieval of multiple file chunks from multiple locations trivially easy. But for slower storage devices, whether they are hard disk drives or SSDs over a UFS interface, fragmentation is very real. Unfortunately, its not just a simple matter of defragmenting a disk regularly¹. the source of this fragmentation comes from the way apps themselves store data.

SQLite, a blessing and a curse

SQLite is a popular file-based relational database (Issue 87)) used by many Android apps (including WhatsApp!). This means that unlike other databases, which require a separate database program to run, SQLite just requires the programmer to bundle an appropriate library for their programming language instead of running a separate program. It makes management of data much easier for the app developers, and it is also fast—usually.

However, a 2016 study found that it contributes to fragmentation on the Android filesystem. Even after clearing the app’s data, once SQLite starts storing data again, it tends to store them in many small chunks instead of fewer but larger chunks. On Android devices, which use the slower UFS interface, searching through the database involves reading each database chunk to see if the requested data is there; this gets slower as the database grows, and SQLite adds more chunks to the storage device.

The study ends on a pretty pessimistic note: there doesn’t seem to be any feasible mitigation other than making upstream code changes to SQLite, or designing a different filesystem that overcomes this limitation. But that was five years ago; with any luck some positive developments might have been made into this area. I’m still looking into it!

Issue summary: Fragmentation is likely a contributor of system slowdown, particularly for mobile devices: the databases used by most mobile apps tend to store data in many small chunks rather than fewer big chunks, which slows down data search operations. The most effective measure for improving device responsiveness is usually to clear the app cache, so the app does not attempt to read previous data from storage.

That’s all I’ve managed to find out from reading; further research will either involve detailed tinkering and experimentation, or deeper reading, both of which are time-consuming.

Short issue this time, because if I made it any longer I think it would be more technical than I would really like. Enjoy the brain-break!

What I’ll be covering next

Next issue: [LMG S13] Issue 168: Search engines

How do search engines … know everything?

Defragmentation is discouraged on solid-state devices, as each read/write operation causes the storage medium to degrade. Under typical usage, a solid-state device can be expected to last many years. Defragmentation involves lots of read/write (as you are reading files and storing them elsewhere on the disk), hastens the degradation, and will shorten the lifespan of solid-state devices unnecessarily. ↩

Issue 166: A cause of system slowdown: caches

2022-04-02T08:00:00+08:00

Previously: There are easy and quick ways to check the validity of the most common advice for resolving system slowdown. But it still seems to happen even after these tips have been tried.

Last issue, I walked through common causes of system slowdown suggested by generic tech websites, and explained simple ways of checking if these are really the cause. Quite often, they are not, especially if you are the kind who is careful about internet usage and does regular system maintenance.

So what is going on?

Caches, caches, and more caches

In Issue 39), I explained what caches are: places where you (temporarily) store the result of lookups, so you don’t have to keep performing the lookup again. In context, this referred to DNS lookups: operations that translate a domain name (such as google.com) to an IP address (such as 173.194.217.100).

But caches are everywhere, not just in DNS.

Examples

When you open any Microsoft Office application, it shows you your most recently accessed documents: that’s a cache! (The info has to be stored somewhere, right?).

Windows caches your old installation files “just in case”.

Your browser definitely caches your browsing data (they are called “Temporary internet files”).

And most apps have a cache of some kind or other to hold data which they think you will want to access again soon.

These are generally harmless uses of caches, albeit possibly annoying (when Adobe Premiere caches video files that take up gigabytes of space …). Caches are predicated on the notion that access from a storage disk (on the same device), while slow, is still faster than its alternative (fetching the data from its source, over the internet or through a computationally expensive calculation).

App caches: a short history

Before the internet went mainstream, a lot of software came on compact discs or diskettes. These were slow to spin up, slow to read, and slow to swap out. They were nonetheless necessary in the days when hard disks were still low on storage, and expensive.

As hard disks increased in capacity and decreased in price, it made sense to copy the information from these diskettes or compact discs into the hard disk, and subsequently access the data from disk. This process is what we know as software installation (bet you saw that one coming!).

These days, the disk-is-faster assumption is less true across a spectrum of uses. Especially when it comes to mobile devices.

Mobile storage and internet

A quick note on mobile storage, which I unfortunately missed out on Issue 119) on laptop and desktop solid-state disks (SSDs).

Mobile devices generally do not use the same kinds of SSDs that laptops use. Those are bulkier, use more power (not good for mobile battery life), and run hotter. Laptop SSDs use an interface called NVMe (which iPhones use as well), while most Android devices’ storage use an interface called UFS which is slower (but uses less energy).

At the same time, internet access on phones is speeding up. 4G/5G technology has increased throughput, while maintaining more or less the same latency. On the other hand, with the cloud becoming a staple of everyday life, tech companies have poured immense resources into increasing their servers’ responsiveness, resulting in lower latency for internet access.

Disk caching is no longer king.

Caches on mobile devices

On Android, a tip not mentioned in the previous issue is to clear the app cache. For many apps, the app cache can grow to tens or hundreds of megabytes. As Android device storage slows down over time, the app cache gets slower and no longer serves to speed up the app. So clearing the cache regularly can help keep performance from degrading too much.

However, Android itself uses caches for system processes as well, and these are usually not accessible to the user. That means the only practical way for most users to clear this cache is usually to perform a factory reset.

Issue summary: Caches speed up app operations by storing temporary data on the device’s storage. This assumes that access to storage is much faster than access to the file’s original source. On Android, users can clear an app’s cache, but not the system cache.

What I’ll be covering next

Next issue: [LMG S13] Issue 167: Database fragmentation

This issue, I explained why caches no longer work as well as expected.

Next issue, let’s poke a little deeper: why does clearing the cache work?

Issue 165: The myths of system slowdown

2022-03-26T08:00:00+08:00

Previously: Linux software is distributed through Linux distros. The maintainers of distros maintain repositories of software that have been tested with the distro. Most users will access software in the distro’s repositories through a program called a package manager. So users have full control over when updates and new software should be installed.

Once your laptop hits the magical 1-year window, it somehow seems to … get slower. And slower. Everything takes just a fraction longer. What used to happen near-instantaneously now seems to take a split-second pause. The loading spinner animation feels like it plays just a little longer. And it just gets worse from there with age.

Google search results have a number of things to say about why it happens:

Programs starting up when booting

This is primarily an issue when you are booting up your laptop (Issue 112)) and logging in. Once you log in to your operating system (OS), your OS will run the startup programs (which you can disable), so if you are trying to use your laptop right after logging in, this may cause some slowdown.

If your laptop is still slow about 10 minutes after OS login, this is probably not the cause.

Programs running in background

You can check this easily: open Task Manager (in Windows), and see if CPU, Memory, or Disk are significantly high. You can click on those columns to put the highest-usage processes at the top. Often it is some kind of antivirus or malware scanner that is hitting the disk and causing things to be slow. Wait for these programs to finish (if legitimate), then see if your computer still feels slow.

Insufficient memory

This is easy to check in Task Manager too. On the performance charts on the right, see if memory usage is near 100%. If it is, try closing some applications until usage drops below 100%, and see if system performance improves after a few minutes.

Malware

Malware that slows down your laptop usually does so by taking up a lot of CPU time, or using a lot of memory. You can test this using the above tip too.

If your computer fan is spinning noisily even though CPU/Memory/Disk are low, it’s either malware that is smart enough to hide itself, or blocked computer vent (which is causing your CPU fan to be ineffective). These are easy checks to make too.

OS updates

Just update your OS. Most of the time this does not fix system slowness.

Disk full

Open My Computer (in Windows), which should show how full your system partition is. If it is below 80% full, this probably isn’t the issue, regardless of whether your system is on a hard disk or solid state disk (SSD, see Issue 119)).

File fragmentation

You can run Drive Optimizer on your system partition. Yes, there is a lot of discussion about how defragmenting an SSD is bad; fortunately Drive Optimizer is pretty smart about this and will only carry out defragmentation on hard disks; the option is called Optimize if your system partition is on a solid state disk.

This usually helps a little, especially if the hard disk is quite badly fragmented. If it is only lightly fragmented, or if you are using a solid state disk, this usually won’t resolve any slowdown.

These are some quick ways to check on the validity of these myths about the causes of system slowdown. In many cases they do help! But there seems to be this general malaise that affects even the most cared-for and most well-maintained laptop, after an operating system has been used for years without a reinstall.

Issue summary: There are easy and quick ways to check the validity of the most common advice for resolving system slowdown. But it still seems to happen even after these tips have been tried.

Knowing what we know, this far into this layman’s guide, I’ll look into some other possible causes next issue!

What I’ll be covering next

Next issue: [LMG S13] Issue 166: A cause of system slowdown: caches

Issue 164: Linux, the universal operating system

2022-03-19T08:00:00+08:00

Previously: Software that we use usually comes from the OS makers, or from third-party developers. These two groups of developers are not the same, and might even have conflicting intentions and goals.

Last issue, we looked at the following categories of software that an end-user might need:

System updates
Software by the OS maker (first-party software)
Software from other developers (third-party software)

In general, “trusted” software comes from other sources: compact discs or the internet. While “trusted” software comes from a central, authorised source: usually some kind of app store.

This leads to a lot of conflict over who gatekeeps the list of “trusted” software. This privilege gives app store owners a lot of power, which they claim to earn by investing capital into making the OS possible, and implementing screening and filtering processes to ensure only secure software makes its way into the list.

I will not contest those claims here, but instead invite you to consider: what if it were possible for other app store makers to join in curating lists of software for that operating system?

What is Linux?

Today the term “Linux” refers to a lot of things, which is why we need to clarify here: the term originally referred to the Linux kernel, which is the core of the operating system.

The kernel by itself doesn’t do anything without all the other programs that make it actually useable by an informed user: to create, edit, and browse files and folders, run service programs, etc. This collection of programs, along with the kernel itself, is officially referred to as GNU/Linux (but s’okay, you can just say “Linux” and we understand you mean GNU/Linux).

So, GNU/Linux is similar to Windows and MacOS: they are base operating systems, capable of doing stuff but not actually useful yet.

Where all our software at?

Software for Linux

In the very early days, because there were a number of different hardware configurations that weren’t as compatible as they are now, Linux software was distributed as source code. You downloaded a zip file containing programming code, you ran a compiler program to compile the code (Issue 54)) into an executable program. Then there is usually an installer, a shell script that puts the compiled program in the correct place so the OS can find it, and creates other helper files (like configuration information).

And then you had to check their website (or even email them, in the days before the mainstream Internet) regularly to see if there are any bugfixes, and if yes, you downloaded the new source code and repeated the process …

This was obviously tedious, so people started to compile the useful programs into their own list. This effort expanded, and became automated, so that it was not only a list, it was a collection of different versions of compiled programs for different CPUs (each version of the program is called a package). If you were a developer for one of the programs in that list, once you made a new version of your package you could submit it to these guys, they would incorporate the required changes and then add a listing for the newest version.

These projects came to be called Linux distributions, or Linux distros in short. Distros maintained repositories of software for their specific distribution—the people involved are called maintainers. Maintainers check new versions of software to ensure that it will work as expected on their distro. Another category of software, called package managers, were created for users to be able to use these repositories easily: to check for updates, install them, and perform any other required maintenance.

More and more distros started in the 80s and 90s, as groups of Linux users and developers decided to branch off based on differing principles and philosophies for managing a Linux computer and its software. Today, you have distros focused on reliability and stability, distros focused on simplicity, distros focused on user-friendliness, distros focused on scientific computing, distros focused on hackability, …

It is important to note that the Linux kernel development team itself does not maintain any distros. Any updates to the kernel are for distro maintainers to incorporate into their respective repositories.

Linux software distribution

This means that for the bulk of users, software distribution on Linux is centred around the distro’s repository. Through the package manager, users can search for software, install it, and update it. They can also add the URLs of external repositories to access software from them.

At the same time, if users wish, they are still able to download compiled executables from the internet and run them (with the usual caveats, of course). They can also download source code, compile it themselves, and then run it. These options do not offer the same ease of maintainability as software installed through a package manager, since there is no repository to check for updates.

This is what many install pages look like on Linux, for software with Linux versions.
The Install button usually downloads a compiled executable, which can be run on the computer.
But installing through the distro’s package manager is recommended.

No system updates sneaking up on you from the OS maker; updates and new software all come from the repository through the package manager. Unless you decide otherwise.

Issue summary: Linux software is distributed through Linux distros. The maintainers of distros maintain repositories of software that have been tested with the distro. Most users will access software in the distro’s repositories through a program called a package manager. So users have full control over when updates and new software should be installed.

This is as far as I’ll go for technical detail on Linux. I meant this to pick up from Season 8) on apps. It’s easier to go into the bird’s-eye view of how this works on Linux, because I’m more familiar with it; on the Windows and MacOS side of things it tends to be more esoteric and proprietary.

And it’s instructive to know this because … so far this season, we have been talking about distribution—content, code, and software. Notice how all of them involve infrastructure: worldwide clusters of servers for content distribution, a history of code changes for code distribution (and collaboration), and now we have repositories and package managers for software distribution.

What happens as this infrastructure ages?

What I’ll be covering next

Next issue: [LMG S13] Issue 165: The myths of system slowdown

We won’t dive into topics as broad as digital infrastructure aging; that’s beyond the scope for a layperson’s newsletter I think!

Let’s bring it back to personal scale: what happens as your system ages? The predominant symptom that manifests itself is general slowdown: your computer takes longer to switch on, open any app, save any file, and even to shut down. For years I have been googling for reasons why this happens, and satisfying/useful/sensible answers are almost non-existent. The pithy, vague answers about background services (“bloatware”), outdated apps, malware etc are almost insulting; I have relatives who use the same handful of apps, and even after clearing old/large files their phone is still slow!

Next issue, I attempt to look into these stated reasons and see if they make sense.

Issue 163: System & software ecosystems

2022-03-12T08:00:00+08:00

Previously: Typeface families consist of multiple fonts for each style in the typeface. Each font consists of glyphs, which are mathematical shapes described by curves joining points. These shapes need to be rasterised for display on a computer screen, or for printing on paper. Font files usually come in .ttf, .otf, or .woff formats.

Brief recap of the past few issues:

Content distribution: Images and other media are distributed with the help of content distribution networks (CDNs, Issue 160)), which have regional servers closer to users.
Code distribution: Webpage documents and web scripts (in Javascript) are distributed from the host server (which may comprise more than one computer).

And all of these takes place over the World Wide Web, often through the HTTP protocol (Issue 7)). That is how data gets to us when we use the internet.

What about the software we use, and the oft-dreaded Windows Updates? How does that get to us?

Software distribution on Windows and MacOS

Okay there’s not actually anything new to say here, just checking if we have the same understanding of how to find new software:

You’ve got your system updates and whatnot, inconveniences that sometimes force themselves on you. These generally come from a secure server from the operating system (OS) maker, which is Microsoft or Apple.

You’ve got software made by Microsoft/Apple, which you either download from their website or install through your browser.

Then there’s the software in their app stores. These app stores are listings of software which developers pay to have their software listed in. A big selling point of app stores is their supposed security: app stores usually have a screening process to ensure that submitted apps are not doing Evil Things™ which harm their users. So when you download an app it is assumed that this app has passed some kind of rigorous screening process.

Developers often pay a proportion of their revenue to the app stores for this “privilege”.

And then there is … all the other software you can download from the internet.

I’m guessing the internet is where most of the desktop/laptop software you use comes from.

Software distribution on Android and iPhone

On the mobile side of things, it looks remarkably similar, but with the weightage somewhat different.

System updates, coming from the OS maker — check.

Software made by the OS maker — check, but coming through app store instead of internet.

App stores — check, still the same.

Software from the internet — Android allows installing software from “unknown sources”, but you’ll have to enable a system setting to allow that. It is off by default. On iPhone, this is just not possible.¹

I’m guessing the app store is where most of the mobile apps you use come from.

Software as an ecosystem

Notice that in neither case is finding software like foraging for berries: there is an entire ecosystem that goes into making these actions possible!

The OS makers obviously have to distribute their OS, supporting software, and updates reliably and securely, so that other software can rely on its continued existence. An OS by itself doesn’t usually do much for users; they need software to create and manage their files, and access the internet. And much of this software is going to be created by other developers, not the OS makers.

The developers need documentation and sample code to understand how to write the software, and these usually come from the OS makers. But it can also come from a thriving community of other developers who are writing software for the same system.

And then there are the discovery mechanisms that users need to find useful software; Google is the fallback when this doesn’t exist, but you usually want users to have a better experience than googling for installers and potentially installing malware.

Even this understanding is incomplete, but it’s the beginning of a more nuanced model of software development that will help you understand why software often does not do what it should. We are talking about factions in software development.

Far too often I see users who have no awareness of this divide, and seem to operate on an assumption that software comes from “programmers”, a hallowed, unreachable group of entities that blesses users with features or curses them with bugs at their whim.

Issue summary: Software that we use usually comes from the OS makers, or from third-party developers. These two groups of developers are not the same, and might even have conflicting intentions and goals.

If I keep going on in this vein I’m going to bring in politics, and I don’t want to do that. Instead, I’ll introduce a slightly different kind of software ecosystem, which non-Linux users are likely not familiar with. My hope is for you to see that *handwaves at above paragraphs* this isn’t the only way for software distribution to work!

What I’ll be covering next

Next issue: [LMG S13] Issue 164: Linux, the universal operating system

What does a model look like when there is no central app store, controlled by the OS makers? That is how the Linux distribution system works!

Okay not true, you can replace the OS on your phone through a process called jailbreaking, but this is a layperson’s newsletter and I do not recommend this without much more extensive reading and careful consideration. ↩

Issue 162: Fonts

2022-03-05T08:00:00+08:00

Previously: Cross-site scripting attacks occur when a webpage loads malicious code from a third-party, usually carried out by a script in the page. Today, websites are protected from loading unauthorised scripts through cross-origin resource sharing (CORS) policy implemented in browsers, which only allows a website to load scripts from authorised domains.

This is the issue that doesn’t really fit anywhere, but this season is about lots of things we take for granted and fonts are one of them.

I could probably fill at least half a season talking about fonts and typesetting, but let’s stick to the basics here.

What is a font?

I’m going with Source Sans, an open-source typeface designed in-house by Adobe. Let’s open one up one of its fonts¹, Source Sans Pro Regular, in a font editor² and see:

Source Sans Pro Regular, in Fontforge.

You might already be aware that fonts contain glyphs, which are the shapes of each separate character. What makes up those glyphs?

Uppercase Q from Source Sans Pro Regular.

These glyphs are mathematical shapes. They are stored as a series of points, joined by curves³ or straight lines⁴, which are stored as numbers. The diagrams we draw in Microsoft Word are much the same!

Font variants

When you bold or italicise text, the operating system swaps in the appropriate glyphs from the bold or italic fonts from the same typeface family. A lot of careful work goes into ensuring that these fonts are recognised as belonging to the same typeface, or this feature would not work.

This means that when you install fonts, remember to grab the bold and italic fonts as well!

Many applications, if they are unable to find the bold/italic fonts, will artificially “bold” or “italicise” the regular font by thickening the glyph, or slanting it. Any graphic designer worth their salt will notice this immediately; even without scrutinising the font, it will feel “off” in some vaguely inscrutable way until you take a closer look and notice the proportions are wrong.

Font display

For an application to be able to support text formatting, it must have a program called a text rendering engine. This program takes a single long string of text and determines the appropriate places to chop it up with line breaks. To do that, it first needs to convert the glyph shapes from mathematical formulas into actual real lit pixels or real inky droplets (in a process called rasterisation, Issue 122)).

Then their dimensions have to be considered in the line of text, to know where the line breaks should be placed. Because glyphs are not actually rectangular boxes and they protrude in different ways, optically they need some horizontal adjustment (called kerning) to look evenly spaced optically, so that needs to be done too⁵.

Kerning for some common glyph pairs in Source Sans Pro Regular.

A kern value of 0 or blank means no kerning is required. A negative value means the letters need to be brought closer, and a positive value means they need to be spaced further (quite rare).

Advanced renderers might even do other things, like avoiding too many terminal hyphens on consecutive lines (looks ugly), or making microadjustments to letter spacing. But, oops! This changes the line length, so the engine needs to go back to re-check the line breaks. This is an iterative process.

Text rendering engines are an art in themselves, and we are not going to go in depth here.

Font formats

The classic file extension is .ttf, which stands for Truetype font, a font format created by Apple in the late 1980s and subsequently adopted by other systems.

These days, you might also see .otf, which stands for Opentype font. This is a more modern font format, co-developed in the mid-90s by Microsoft and Adobe. This adds much more functionality and new features, which after some deliberation I have decided not to write about—it is simply not a layperson topic!

If you do web development, you might also see .woff, the Web Open Font Format, co-developed by Mozilla and other type organisations. It shares some features in common with TTF and OTF, but adds other features for licensing information, which is usually more important for the web, where these font files need to be downloaded to the users’ computers.

Issue summary: Typeface families consist of multiple fonts for each style in the typeface. Each font consists of glyphs, which are mathematical shapes described by curves joining points. These shapes need to be rasterised for display on a computer screen, or for printing on paper. Font files usually come in .ttf, .otf, or .woff formats.

This difference in representation vs display, fonts-as-mathematical-shapes vs fonts-as-pixels-or-dots, has been and continues to be the cause of much weeping and gnashing of teeth. But I’ve decided it’s not worth delving into that for a layman’s guide to computing—perhaps in a separate publication!

What I’ll be covering next

Next issue: [LMG S13] Issue 163: System & software ecosystems

With this diversion over, let’s return to talking about distribution. Content distribution, code distribution, and next issue I’ll move on to software distribution!

With this I have also cleared my backlog of questions, and will be closing the below section as well.

Sometime in the future: What is:

~~OpenType? And what are fonts anyway? [Issue 42]~~

A font is a single style in a typeface family. The full family will usually have regular/bold/italic styles. More advanced typefaces may have small caps, display, and caption fonts. ↩
The one I use is called Fontforge, and it is open-source. ↩
For the curious, the specific type of curve used in fonts are Bézier curves. ↩
Don’t be alarmed if the mathematicians among us casually remark that a straight line is a trivial example of a curve. ↩
The kerning information is created by the font designer and embedded in the font file. Applications usually pass this information to the text rendering engine, although some might not actually use it. ↩

Issue 161: Security and XSS

2022-02-26T08:59:00+08:00

Previously: A content delivery network comprises multiple servers around the world that are able to quickly distribute static content (typically images and video) to viewers that request it. This avoids overloading the hosting server, which would otherwise have to serve data over the network, possibly through many intermediary hops.

When you load a modern webpage with all its bells and whistles, it is usually loading its content from a content delivery network (CDN; see previous issue). At the same time, it is running scripts that came with the webpage. These scripts may load other scripts on the same server (first-party scripts), or scripts on other servers (third-party scripts).

What could go wrong?

First-party scripts

These are scripts you trust, because you host them on your own server (or a server you administer). Ideally you also have network security measures in place and other ways to ensure those scripts are not modified by malicious actors.

It’s usually safe to load them in the webpage because they are from the same site (i.e. same server).

Third-party scripts

These are scripts that are loaded from a remote server. You’d usually do this to load scripts from service providers: for analytics, to serve online ads, or to use libraries and frameworks (Issue 17), Issue 18)). This is mightily convenient: as a third-party service provider, you have the flexibility of updating this script and immediately benefiting your client without them having to do anything. Can’t beat that for convenience!

But once you open the door to third-party scripts, they could be loaded from anywhere. And without some mechanism for verification, the client won’t actually know if they are loading your script, or someone else’s.

Cross-site scripting (XSS)

If a webpage is insecurely scripted, e.g. by inserting data directly from a request without verification, a malicious actor might be able to subvert the source of the request and inject malicious code into the page. This malicious code, though sandboxed by the browser, still has access to data that is on the page.

This is called a cross-site scripting attack.

The weakest link

The chain of security is only as strong as its weakest link, so even if your own security is tight, a malicious actor would look at your tech stack (the set of hardware/software/services your company uses), notice that your webpage is loading scripts from a third party, and attempt to hack the third-party’s servers (which might be less secure). When you draw on features from multiple libraries, you are in effect setting up a circle of trust that is only as secure as the least secure library/third-party in your web.

Protections

This is why all browsers today have cross-origin resource sharing (CORS) active by default. This prevents scripts from loading other third-party scripts; only loading of first-party scripts are supported by default. To enable loading of scripts from other sources, your server must include special data in the HTTP response header (Issue 8)) that specify these sources explicitly. It’s tedious, but it is much more secure, and it is also why learning web programming is much harder today than it was a few years ago.

Issue summary: Cross-site scripting attacks occur when a webpage loads malicious code from a third-party, usually carried out by a script in the page. Today, websites are protected from loading unauthorised scripts through cross-origin resource sharing (CORS) policy implemented in browsers, which only allows a website to load scripts from authorised domains.

Ughh, this is already getting more tedious and bureaucratic to write about. As you can see, even in the world of programming there is a load of red-tape to cut through, all to protect our works from malicious actors and ensure there is a chain of authorisation running through everything. I guess there’s a life lesson in here somewhere.

What I’ll be covering next

Next issue: [LMG S13] Issue 162: Fonts

I’ve covered content distribution, code distribution (for the web), and now I think we can go a little wider: let’s talk about software distribution!

But before that I want to slot in a little issue about fonts: just what are they? And how do they work?

// Hopefully all my readers have had the experience of searching for and actually downloading software. What kind of system is behind this? And how do system and software updates actually get to our devices?

Get ready for a deeper dive!

Sometime in the future: What is:

~~XSS? [Issue 8]~~
OpenType? And what are fonts anyway? [Issue 42]

Issue 160: CDNs and content distribution

2022-02-19T08:00:00+08:00

Previously: Instead of GPS satellites, smartphones can also use wifi points and cell towers to determine their position (if enabled in the OS).

All businessmen know that distribution is everything. How good your product is, is secondary to how you get your product to the customer. This act of getting things to your customer—it’s called distribution, and entire businesses have been built around excellent distribution.

In Issue 157), I described how time is synchronised from time source to server and on to other servers, down the strata of the hierarchy tree of time servers. Whereas GPS/wifi location (Issue 158)) has a much shallower distribution system: everybody gets their location directly from a GPS satellite if there’s nothing else available, otherwise they get it from the nearest wifi point or cell tower.

What about content?

The difficulties of content

You make a website, type in the headers and body text, upload the images and videos … and it just works right?

Let’s think through the distribution of that content. Text is generally small in size and easy to pass around, even through multiple hops (Issue 36)) from server to client.

Server load

What about the heavy stuff, like hi-res images and videos? Thousands or millions of clients all requesting the same large video file from your hosting server. That server is going to be spending many CPU cycles (Issue 58)) receiving requests, retrieving the data, splitting and encapsulating it into data packets to be sent out. All that processing adds to the server load. If there are too many clients waiting for the same data … they gonna wait. And that adds to latency (Issue 36)); those viewers are going to be seeing loading spinners for a while.

Some of that processing can be mitigated with techniques such as caching (Issue 39)), but not enough; you will eventually need to add more servers.

Bandwidth and transfer fees

Your hosting provider is going to be paying lots of egress fees to transfer your data out of their servers (imagine sending the same 4GB video to a few thousand Youtube viewers), and they’ll likely pass on the fees to you as well.

Latency again

If the client is geographically far away from the server, possibly even on the other side of the world, the data is going to go through a lot of hops from server to server. And if any of the servers along the way drop the packet, it is going to need to be resent.

Improving content distribution

So how do we lighten the server load on the hosting company’s servers, reduce the amount of data to transfer from that server, and improve latency for the clients?

You place distribution servers as close as possible to the clients, wherever they may be. This usually means you have your servers globally distributed, with regional clusters.

You place the most often requested files from that region in its distribution server, so that it can serve those files without the request hitting the hosting server. Because the distribution server is so near the client, the data goes through fewer hops to get to the client.

The main document data is still served from the hosting provider, so that any changes you make to the page get served to clients almost immediately. Otherwise every little change you make has to be reflected in every distribution server that has a copy of that data. For this reason, distribution servers are typically used for static data: data that doesn’t change frequently, or at all—images and videos especially.

A content delivery network (CDN)

These distribution servers, along with their supporting infrastructure, are collectively known as a content delivery network (CDN). Sometimes, when you are grabbing the URL of an image, you might see “cdn” in the URL domain—now you know what it means!

Issue summary: A content delivery network comprises multiple servers around the world that are able to quickly distribute static content (typically images and video) to viewers that request it. This avoids overloading the hosting server, which would otherwise have to serve data over the network, possibly through many intermediary hops.

What I’ll be covering next

Next issue: [LMG S13] Issue 161: Security and XSS

With this piece of info, our mental picture of the loading of a webpage is getting more and more complex. No longer can we assume that the web document itself, its images, videos, and other content are all being loaded from the same server: static content might be coming from a CDN, and other content (e.g. ads) might be loaded from elsewhere.

Let’s talk about this from a security standpoint next issue.

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

Issue 159: Wifi & cell tower location tracking

2022-02-12T08:00:00+08:00

Previously: To get your location using GPS, your phone requests information from four overhead GPS satellites: their location, and the distance between them and your phone. With this information, your phone can calculate its location.

Okay, so what happens when you are in a tunnel or building and can’t get GPS? How are you still able to use Google Maps to navigate that new sprawl of a mall?

Wifi Positioning System (WPS)

The principles of triangulation still work within a building, thank math 🙏 but now we need other landmarks to replace GPS satellites.

What is something with a known (and ideally fixed) location, is electrically powered to receive and respond to signals, and there are enough of them to provide a sufficient number of landmarks for triangulation? If you are in a building with wifi, the wifi access points scattered throughout the building can probably provide this.

No protocol is involved in wifi positioning, largely because most routers do not carry a precise hardware clock and do not have any way to know their location precisely, and therefore cannot communicate this information meaningfully. Instead, wifi positioning is a collection of techniques for guessing your location. Your smartphone uses these techniques (usually through its operating system) in conjunction with available wifi networks around you to determine its own location.

Wifi positioning techniques

One way to figure out your location to proximal wifi points is to use the signal strength as a weak analogue for your distance from them. You can do a very rough position estimate with this.

Another common technique is to look up the hardware address, or even IP address of the wifi point you are connected to, and just use it directly (with the assumption that wifi signals get too weak outside of a 10 m radius, so you have your location accurate to within ±10 m).

Wifi location databases

One way to keep track of wifi access points and their locations is through a global, public database. A number of these are available, such as the Mozilla Location Service.

Cell tower triangulation

What happens when you are outdoors, far from any wifi point? As long as you have mobile data enabled and are not in airplane mode, you are still going to be getting your cell signal from a cell tower … which also meet the three basic criteria for device-based triangulation 😉

Your smartphone can thus triangulate its location from cell towers that it is able to reach. Again, there is no protocol for this, since your smartphone does not communicate with the towers for the express purpose of obtaining location; it is a set of similar techniques, often implemented in the operating system.

Issue summary: Instead of GPS satellites, smartphones can also use wifi points and cell towers to determine their position (if enabled in the OS).

What I’ll be covering next

Next issue: [LMG S13] Issue 160: CDNs and content distribution

Coincidentally, starting with time turned out to be a good idea: time information is sort of like content. It has an origin, and it gets distributed to “consumers” who want that information. As with all distribution systems, you have cascades of product that flow outward from this origin.

We have covered time and space (I mean, location). Let’s move on to data: how does data get around the world from a few central sources?

Back in Issue 73), when I explained how online advertising works, I mentioned that advertising content is served from a content delivery network (CDN). What is this and how does it work?

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

Issue 158: GPS

2022-02-05T08:00:00+08:00

Previously: Time is synchronised from higher-precision sources through a protocol called Network Time Protocol (NTP). A public pool of time servers is available for synchronisation at pool.ntp.org.

Ah, GPS. The only topic that actually has almost nothing to do with computing … and yet the mobile computers in our pocket rely on it so much.

A short history

The Global Positioning System (GPS) was born of the space age, in 1973, before computers even went mainstream. It was originally used for military applications, particularly for navigation. It was first widely used in a political conflict in the Gulf War (1990–1991). The public finally had access to it in 1996, after US President Bill Clinton issued a policy directive for it to be dual-use (used for both military and civilian purposes).

Principles

The principle of triangulation far predates GPS. Triangles have been used to estimate distance since antiquity; there is evidence of such techniques being used from sources as far back as 6th century BC.

In essence, if you know the location of two reference points, then with those two reference points and your own location as the third point, you can draw a triangle and solve a little geometry puzzle to figure out your own location.

If you were hiking or just taking a walk, you need to have at least two landmarks with locations marked on a map. As long as you stop somewhere with good visibility, you can get your bearings to those landmarks on the map, draw a line backward from each landmark following your bearing, and the intersection of the lines from each landmark will show your location.

But digital compasses are not ubiquitous in all devices yet; typically they are only included in high-end phones. A simpler way for devices to get their location is to estimate their distance to the two landmarks. This they can do using radio waves, which travel at (close to) the speed of light: 300 million metres per second. Provided the landmarks contain devices that can receive this signal and send it back, the time delay can be used to calculate the distance between the device and landmark.

The GPS network

For this to work globally, you are going to need such landmarks positioned all over the world, within receiving range of any device. These landmarks need to be:

electrically powered, so they can broadcast signals
tall, very tall, since radio waves do not follow Earth’s curvature. A short tower would not be able to receive signals for devices that are too far away: the signal would be blocked by the Earths curved surface!
aware of their own position, and synchronised at regular intervals

Let’s solve problem 2 first: instead of building millions of towers worldwide (and how would we do that on the oceans?), we can just launch satellites into space to serve as landmarks for triangulating position. GPS satellites orbit at an altitude of 20,200 km, almost twice of Earth’s diameter, allowing any of them to be reached from almost half of the Earth’s surface.

Solving problem 3: if we launch enough satellites, they can continually synchronise their clocks with each other, and triangulate their own position relative to other satellites.

Solving problem 1: satellites all face this common problem of needing electrical power. They have largely resolved it with the use of solar panels.

Getting your location using GPS

When your smartphone tries to triangulate its location using GPS, it gets its distance from four overhead satellites, along with their location (remember that there is an altitude component!). It then determines its position from this information using triangulation.

The triangulation calculation here is rather more complicated, given that the satellites are in constant motion and not geostationary (above the same spot on earth all the time). At an orbit altitude of 20,200 km, even radio waves, travelling at lightspeed, still take about 0.07 seconds to reach your phone from a GPS satellite. But it is possible, and your phone does it each time you get your location from GPS—to a precision of about 10m!

Issue summary: To get your location using GPS, your phone receives information from four overhead GPS satellites: their location, and the distance between them and your phone. With this information, your phone can calculate its location.

What I’ll be covering next

Next issue: [LMG S13] Issue 159: Wifi & cell tower location tracking

Receiving radio signals all the time requires the receiving unit to be on all the time; if you use GPS heavily you will find your battery draining quickly!

The reason our smartphones can maintain such good battery life is that they don’t use GPS most of the time. After all, it is not possible to get GPS in a lot of places: in basements, tunnels, anywhere you can’t get a solid signal from four overhead satellites. There are less energy-costly ways to get your location these days, especially if high precision isn’t necessary.

Next issue, let’s look at how wifi and cell towers come in!

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

Issue 157: NTP and time-syncing

2022-01-29T08:00:00+08:00

Previously: To speed up execution and avoid translation overhead, some systems employ ahead-of-time translation, storing the translated instructions to be executed in future. But many systems employ a mix of just-in-time (JIT) and ahead-of-time (AOT) techniques.

This season, I’ll attempt to plug the gaps in the layperson’s working knowledge of Internet-related services. Time, location, wifi and mobile data … almost all will be covered this season!

Global time information

Frequent fliers would no doubt be familiar with the existence of timezones: geographical bands stretching from the North to South pole, within which all locations are assumed to be running on the same regional time. These timezones used to be manually synchronised, by phone or telegram, via operators all over the globe.

Today, timezone information and other time information required for global coordination are maintained by the Internet Corporation for Assigned Names and Numbers (ICANN), which also maintains other lists of names and numbers owned by the Internet Assigned Numbers Authority, which we first met back in Issue 27). These lists are used by programmers worldwide when writing programs that require time coordination.

Time synchronisation

What about syncing time? Intuitively this process involves communication between computers, and anytime computers need to communicate, you can be sure a protocol is involved. We have seen a few protocols mentioned so far:

HTTP, the Hypertext Transfer Protocol, for sending web documents (aka webpages) and requests
DNS, the Domain Name Service protocol, to translate domain names (like google.com) to IP addresses (like 142.250.64.110)
DHCP, the Dynamic Host Configuration Protocol, used by routers to assign IP addresses to its client devices
USB, the Universal Serial Bus set of protocols, used for data transfer between a host computer and another device

… I hope that’s enough for an idea of where protocols are involved.

The protocol for time synchronisation is called the Network Time Protocol (NTP), in use since 1985—that makes it as old as me!

Time sources

In the past, people would look at a common time source—the town square clock tower, Big Ben, church bells, and so on—to get the time. When watches were invented, people who had one would synchronise their watches to these common time sources.

But watches, clock towers, and other sources of time can get out of sync: one second as measured by each device does not accurately match the scientific definition of a second:

The second is defined as being equal to the time duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the fundamental unperturbed ground-state of the caesium-133 atom.

That’s from The International System of Units¹, which I am not inclined to argue with in a layman’s newsletter. Needless to say, synchronising to a universal standard is not something the average layperson does. Only select organisations have the need to keep such accurate time on their own, typically using such precise instruments as atomic clocks.

Trickle-down time

How does the rest of the world get its time from these high-precision devices? It doesn’t; most of the world has no need for the sixteen zeroes of precision provided by an atomic clock. Instead, another cluster of servers synchronise their time to within microseconds of precision of these devices. And another cluster of servers synchronise their time to these microsecond-precision servers.

Each “layer of precision” is called a stratum in NTP. Time “trickles down” from higher-precision sources to lower-precision sources, down the stratum. As we descend the stratum, there are more and more devices providing time at that precision. If a time server synchronises to a stratum 1 server, it becomes a stratum 2 server; if it synchronises to a stratum 7 server, it becomes a stratum 8 server. The upper limit for stratum numbers is 15; a stratum 16 device is considered unsynchronised.

The stratum number is not an indication of quality or reliability, it only indicates distance from the reference time source.

Operating a time server

NTP is an open protocol, which means the protocol is readily available online, and anyone can run their own server implementing this protocol. If you don’t want to write your own software, you can also use the open-source ntp distribution, and compile it to make your own time server. Some large companies do this for their own large network, to improve the response time from time servers—public time servers, you can imagine, are under pretty heavy load!

So is this how our smartphones synchronise their time?

More or less, yes. (They actually use a slightly simpler protocol, in the interest of preserving battery life.)

Hypothetically, if you are designing your own operating system and having to provide a “set time automatically” feature, you can let your user connect to a public pool of time servers, pool.ntp.org (yes that is the actual name, and also the web address). This pool is further subdivided by continent and region, down to individual countries, since you will probably need specific time for your location. For instance, the specific time server pool for Singapore is sg.pool.ntp.org.

Issue summary: Time is synchronised from higher-precision sources through a protocol called Network Time Protocol (NTP). A public pool of time servers is available for synchronisation at pool.ntp.org.

I have been waiting ten seasons to write this, and it is finally out of my system!

What I’ll be covering next

Next issue: [LMG S13] Issue 158: GPS

This issue is about time. Next issue is about location. With these two issues I would have explained time and space! 🤭

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

If you need a term to google for more of this kind of geekery, it is called metrology, the scientific study of measurement. ↩

Issue 156: Translation

2022-01-22T08:00:00+08:00

Previously: Translating a set of instructions before executing it will always lead to a slowdown, although sometimes this may not be noticeable to users.

So, just-in-time (JIT) compilation is really cool and mostly works. Feed in enough instructions to fill a buffer, and execute them. Keep your fingers crossed and hope the buffer doesn’t empty. That’s kind of how our global supply chain works too.

But sometimes it doesn’t go smoothly. The program hits a code branch, new instructions have to be unpredictably injected. The emulation layer halts temporarily. The program stutters.

We can’t really avoid that, not without rewriting the program anyway. But we can at least decide when to carry out the translation.

Ahead-of-time translation

What if we translated whatever we could ahead of time, and stored the native instructions? Then, whenever we need that chunk, instead of translating the original program chunk, we just load the already-translated instructions?

This is called ahead-of-time (AOT) translation, and is what Apple Rosetta 2 does with MacOS programs compiled for Intel x86-64. While installing those applications, it also carries out translation into native ARM instructions that the M1 later uses for execution.

Android AOT translation

The Java virtual machine (VM), also called the Java Runtime (Issue 151)), is the interpreter that carries out the Java bytecode that a Java program comprises.

Android apps, though themselves Java programs, are run not by the Java Runtime, but by the Android Runtime¹.

Whenever your Android phone finishes installing an Android update, there is always a significant block of time that it takes up “optimising” your apps. What it is doing is actually AOT translation, of the app’s Java bytecode into ARM instructions.

Compilers vs interpreters

Back in Issue 54), I mentioned in a footnote that

Purists will argue with me that Python technically runs through an interpreter, not a compiler. At this point, the distinction between the two terms for layfolks is not critical, and I choose clarity over accuracy at this point until I can delve into more detail in a future issue.

Here is where I can draw the distinction more clearly.

A compiler compiles a programming language into another language; usually this means translating a programming language into machine instructions, or virtual machine (VM) bytecode (Issue 150)). But it is also not unheard of for compilers to translate one programming language into another, e.g. translating to Javascript for use in a webpage.

An interpreter interprets a programming language, and executes it to bring about the intended effect (creating/changing/deleting a file, producing a sound, displaying an image, …).

It may be helpful to think of a compiler as carrying out AOT translation, and an interpreter as carrying out JIT translation. But many systems will use a mix of both. The Python interpreter, for instance, actually translates Python code into intermediate Python bytecode, and then executes the bytecode. When you run the same Python script, it executes the bytecode if the Python code has not changed. If it has changed, the interpreter will recompile the bytecode first, then execute it.

Issue summary: To speed up execution and avoid translation overhead, some systems employ ahead-of-time translation, storing the translated instructions to be executed in future. But many systems employ a mix of just-in-time (JIT) and ahead-of-time (AOT) techniques.

What I’ll be covering next

Next issue: [LMG S13] Issue 157: NTP and time-syncing

In the season finale, I’ll wrap up with a mishmash of things that are not the Internet per-se, but very much a part of the Internet and our lives. First up: how do our phones always know the actual time?

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

Recall from Issue 151) that one needs a distribution license to distribute the Java runtime environment (JRE); Google would have to pay hefty licensing fees to Oracle to bundle the JRE with each copy of Android. Instead, they decided to write their own compatible runtime: the Android Runtime. Is this legal? They already fought that battle in court (Google LLC v. Oracle America, Inc), and it seems the answer is “yes” (with a heap of caveats). ↩

Issue 155: Emulation performance

2022-01-15T08:00:00+08:00

Previously: Programs that were not compiled for the instruction set of the host OS have to go through an emulation layer program. This program translates the instructions of that program into compatible instructions that its own processor can execute.

The Apple M1 is an ARM processor that executes 64-bit ARM instructions. MacOS programs that were compiled for Intel 64-bit x86-64 processors go through the Apple Rosetta 2 emulation layer to run on the M1.

Yes, that’s what I said last issue. But if that were all the Apple Rosetta 2 emulation layer did, The M1 Macbook would not have gotten its rave reviews.

The act of translation

Think about it: the ARM and x86-64 instruction sets are very different! They have different lengths, different instruction names and bit values, different concepts of operation, … they’re very different. While the Intel Core and ARM processor architectures share some similarities, translating instructions from one set to another, though possible, is still no simple feat.

Keep in mind that in addition to translating the x86-64 instructions to ARM, the M1 also has to execute them—the program must go on. It should not be surprising at all that there is a performance hit compared with executing native ARM instructions.

If you were given a set of execution instructions in your native language, you would have little difficulty carrying out the required task. But if the execution instructions are now in alien language? Suppose you were given translation instructions, in your native language, for understanding the alien language. You would definitely be doing it slower than if the execution instructions were in your native language.

Just-in-time translation

These days, our processors are fast enough that if they are not too bogged down, they can actually read in instructions slightly ahead of time, translate them, and store the translated instructions for a short period of time before they are executed. This way, the operating system remains ever so slightly ahead of the program being translated, in a way reminiscent of our currently strained global supply chain.

Naturally, it doesn’t make sense to keep doing translation work. As much as possible, we want to minimise it! But how?

Issue summary: Translating a set of instructions before executing it will always lead to a slowdown, although sometimes this may not be noticeable to users.

What I’ll be covering next

Next issue: [LMG S12] Issue 156: Translation

In the last issue of season 12, we talk about why app installation or system updates can take so long.

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

Issue 154: Emulation

2022-01-08T08:00:00+08:00

Previously: The cloud offers standard digital business services, accessible through a web interface and API, which any developer (with a credit card) can use. Developers don’t have to reinvent the wheel, so long as they know how to use web APIs.

Virtualisation, particularly system virtualization, is a real game-changer for those of us who like to have our apps all running in the same operating system, instead of switching operating systems all the time through dual-booting (or Apple Parallels).

But what is stopping us from allowing them to run near-natively in the desktop, their windows directly showing up in the taskbar, without the distracting abstraction of the virtual machine?

Introduction to Emulation

What you are seeking is a feature known as emulation, in which your operating system (OS), which we shall again call the host, emulates the instruction set (Issue 53)) that the application is compiled for. In other words, the host OS:

presents itself as the “correct” machine type to the application (“hello program, I listen to x86 instructions and respond to x86 instructions, so please treat me like an x86 processor”),
transparently interprets its machine code into its instruction set’s machine code (through a program called an emulation layer),
executes the interpreted version, producing its intended effects, and returning any intended output back to the application.

Depending on how different the two instruction sets are, the complexity of this task differs greatly. Not much point going into detail here in a layman’s newsletter, so instead I’ll briefly illustrate some instances of emulation in the wild.

The three main instruction sets discussed here are x86 (32-bit), x86-64 (64-bit), and ARM¹ (Issue 53)).

Windows-to-Windows emulation (WOW64)

Do you remember the great 32-to-64-bit schism of the late 2000s (Issue 55))? There was a period of time when people got confused whether a Windows program they had could run on a 32-bit x86 processor or a 64-bit x86-64 processor: programs compiled for the latter could not run on the former, but programs compiled for the former could run on the latter.

Some download sites still ask you to make this choice between downloading the 32-bit or 64-bit version, usually for users who for whatever reason have opted not to upgrade to 64-bit processors.

That was a lie. Programs compiled for Windows on x86 cannot run natively on x86-64, and vice-versa. x86 and x86-64, while looking similar, are different instruction sets. x86 instructions have to be translated into x86-64 instructions to run on a 64-bit processor.

What happened was Microsoft developed the WOW64 subsystem, an emulation layer that translated 32-bit x86 instructions into 64-bit x86-64 instructions. When users tried to run a 32-bit application, Windows plugged the instruction stream into WOW64, executing the interpreted instructions and allowing it to run near-natively².

And so the 32-to-64-bit transition took place more smoothly than it would otherwise have.

Windows ARM emulation for x86

In 2019, Microsoft released the Surface Pro X, its second³ ARM-powered laptop. That’s right, it’s Windows not running on an Intel chip. Microsoft does actually have a version of Windows, called Windows ARM, which runs on ARM chips. But what about all the programs you know and love?

WOW64 to the rescue again! This emulation layer also translates 64-bit x86-64 instructions to 64-bit ARM instructions, allowing them to run on Windows ARM (with a performance penalty due to the translation required).

Game console emulation

If you find any “reborn” retro gaming products floating around, these are guaranteed to be emulators in disguise: the original hardware that the consoles used are no longer in production. (If you don’t find any, you can also just google “console emulation” to find a whole collection of them.)

These emulators are usually hobby projects by skilled amateurs, who attempt to reverse-engineer the workings of the original hardware. They then write programs to emulate these processors on modern hardware, allowing you to “boot” a digital copy of the games that worked on those platforms.

Apple Rosetta

The Apple M1 is an ARM processor that executes 64-bit ARM instructions. MacOS programs compiled for Intel 64-bit x86-64 processors must go through the Apple Rosetta 2 emulation layer to run on the M1. This works like WOW64 but in the reverse direction: it takes in an x86-64 instruction stream, and produces interpreted instructions for Apple ARM processors.

Issue summary: Programs that were not compiled for the instruction set of the host OS have to go through an emulation layer program. This program translates the instructions of that program into compatible instructions that its own processor can execute.

Simple to describe, much more difficult to execute … 😩

What I’ll be covering next

Next issue: [LMG S12] Issue 155: Emulation performance

Is this how emulation actually works? I wish! Don’t want to go too deep, but I think it is instructive to briefly discuss performance issues, and why it leads to some things you might have observed.

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

ARM actually has a 32-bit instruction set—AArch32 and a 64-bit instruction set—AArch64, which are incompatible. But since Apple switched to AArch64 starting from the iPhone 5S (2013), other mobile device manufacturers have followed suit, and AArch64 is now the main instruction set used on mobile. In this issue, I use ARM to refer to AArch64. ↩
You can still see this happening with some old programs; this is usually indicated in the title bar as compatibility mode. ↩
Their first was the ill-fated Surface RT, which these days is only whispered about. ↩

Issue 153: Using the cloud

2022-01-01T08:00:00+08:00

Previously: Actually making a web application requires you to set up lots of supporting software and carry out lots of steps to create a suitable app environment.

Last issue, I described the whole host of things that need to be done just to make a web application work on another server, different from where you did your programming.

How do people deploy web services so quickly if there is so much tedium involved?

Birth of the Cloud

You could set up the environment, containerise it, and deploy it through a container … that’s one way to solve for distribution.

But somebody smart realised that this is likely a common problem. 99%¹ of applications are going to need the same building blocks: one or more databases (Season 7)), remote storage servers for storing large files (Issue 106)), web servers (Issue 8)), and of course the application itself.

All these pieces must themselves run on a machine (physical or virtual). The hardware, the network resources, the type of database, the type of storage, … these are usually not key differentiators for the business. They do not derive their business value by managing it differently from other companies.

Amazon was the first to realise that whatever they were doing to scale Amazon’s business globally, their competitors and other businesses would eventually need as well.

The Cloud is born

The Cloud is a collection of services that can be plugged in to an application, in lieu of writing your own code. Instead of implementing your own storage server, you could use Google Cloud Storage, or Amazon S3, or Microsoft Azure Storage, etc. You access these and other services typically through a web API (Issue 4)), and have the option to manage them through a web interface.

Google Cloud Storage web interface

Besides virtual machines (Issue 147)), cloud offerings span multiple layers of abstraction.

Google Cloud main offerings

Here are Google’s mainstay offerings:

Compute Engine: virtual machines
App Engine: a pre-configured, setup-free virtual machine that runs code which you upload, through a web API or their web interface
Kubernetes Engine: a hypervisor (Issue 148)) that runs your containers
Cloud Storage: a storage service for files (see earlier screenshot)
Cloud SQL: a relational database service (Issue 87))
Cloud Bigtable: a NoSQL database service (Issue 90))

Then there are variants for running big data queries, using machine learning nodes, and rebranded services for running the backends of mobile apps, … the key common factor here is that using these services is much simpler than rolling and maintaining your own version!² And it lets you speed up development by not having to reinvent the wheel that cloud services have implemented for you.

The Cloud grows

As web applications got larger and larger, beyond the capacity of even a single high-end server to manage, they had to be redesigned so that they could run on multiple servers while maintaining data synchronicity. As businesses standardise on ways to do that, cloud providers add these tools as part of their offerings.

For example, sending/receiving messages between servers is a key engineering problem. Data packets sometimes get dropped en-route, or when a server gets overloaded. Sometimes they get held up at a server, time out, and then they get resent by the client (Issue 9)); the server receives two identical requests, and sends two identical responses, resulting in a duplicate response.

If you don’t want to write your own software for managing communication between servers, the cloud lets you write code for your machines to communicate easily, without having to crack your head thinking about how to make it happen.

Issue summary: The cloud offers standard digital business services, accessible through a web interface and API, which any developer (with a credit card) can use. Developers don’t have to reinvent the wheel, so long as they know how to use web APIs.

There is, of course, much more to developing an application than just gluing services together. For commercial applications, you still have compliance requirements, logging, monitoring, and other things to set up. But these are not new needs, and not really worth going into detail in a layman’s newsletter.

I hope the gist of what the cloud does is at least clearer!

What I’ll be covering next

Next issue: [LMG S12] Issue 154: Emulation

I am done with virtualisation and the cloud at this point. You’ve learned about hardware virtualisation (through drivers), system virtualisation (through system VMs), process virtualisation (through process VMs), and service virtualisation (through APIs) so far this season.

I’m going to use the last three issues to talk about a related and current thing: instruction translation and emulation. Let’s start with a question: How is the Apple M1, an ARM chip, able to run MacOS programs compiled for the Intel x86-64 chips? Aren’t they two very different instruction sets (Issue 53))?

Yes, yes they are. More next issue ;)

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

Illustrative but not accurate estimation! ↩
Provided your use case is in the 99-percentile! If yours is an edge case, you might find that rolling your own version is better value for money in the long run. ↩

Issue 152: Getting started with programming

2021-12-25T08:00:00+08:00

Previously: The Java Runtime Environment (JRE) bundles the Java VM and supporting libraries. The JRE has to be installed on the user’s system for Java programs to work, unless the program bundles the supporting libraries. Solo programmers can start programming with OpenJDK for free with fewer features and less support, while commercial companies can license Oracle JDK for better support and features.

So you started taking up programming. Maybe you went to a class, where everything was set up for you and you didn’t have to worry about installing and configuring necessary software. Or you took an online course, where step-by-step instructions were provided and you mostly didn’t have to spend time scratching your head. That’s how it should be; you paid to learn programming, not to learn how to configure a software development environment.

Once you actually have to start writing code though …

Setting up a development environment

Let’s suppose you are writing code for a web application. You start by installing the software for the compiler/interpreter program, which either executes your code directly or compiles it into an executable binary (Issue 54)).

Now you begin writing your code. Along the way, you begin to install various libraries (Issue 17)) and frameworks (Issue 18)) that your code relies on. These are provided in things called packages, which are basically zipped files containing all the files and metadata for the library/framework. You install these packages using another program, called a package manager, through the command line terminal (Issue 15)).

You write more code. And one day … you’re done! The real pain has just begun!

Deploying code

Thus far, you have been programming on your own laptop. But your laptop can’t handle a full webserver load once people start using your app, so you wisely decided to lease a virtual machine (VM: see Issue 147)) from a cloud provider instead. You boot up the VM, it goes through its bootup process (Issue 112)), and finally completes. You are greeted with a familiar command line, the text cursor blinking cheerfully.

How are you going to get your code on that machine?

Maybe you set up a code repository (Issue 19)) on another server, and then download your code onto the VM with some commands.

Maybe you decide to turn your app into a package instead: you write software instructions (more code!) to tell the package manager (mentioned earlier) how to install the package, and how to configure everything. You add some files with metadata, a file manifest containing all the files used by the package, and then pack it up. You install yet more software (called build tools) to help you automate this part. Then you set up another file server, upload the package onto it from your laptop, download it from the VM, and install it on the VM using the same package manager software.

You test it, and after many hours of cursing, confused pacing and mumbling, and much hair-tearing, it finally works. Phew!

Expanding the app

Unfortunately, the tiny toy server that you used to test your web app doesn’t hold up to real-world network loads. You’ll need to put the app behind a Real™ web server; as more and more users use it, you may even need to deploy your app to multiple servers to handle the load, all managed by a load balancer. The balancer receives the web requests (Issue 9)), decides which of the multiple servers has the lowest load so far, and directs the request to it so it can serve a web response (Issue 8)).

Deploying more servers … does that mean you have to do the above all over again?!

Issue summary: Actually making a web application requires you to set up lots of supporting software and carry out lots of steps to create a suitable app environment.

What I’ll be covering next

Next issue: [LMG S12] Issue 153: Using the cloud

I hope this issue adequately describes the problem that the cloud attempted to resolve! When people talk about “using the cloud”, what is that actually like?

Next issue, I’ll give a peek. Short issue guaranteed.

Sometime in the future: What is:

XSS? [Issue 8]
OpenType? And what are fonts anyway? [Issue 42]

Issue 151: the Java VM

2021-12-18T08:00:00+08:00

Previously: System VMs provide a set of virtualised hardware that the OS interacts with. Process VMs provide a set of libraries that a program (written in that programming language) interacts with.

If the Java VM lets us write programs that work across multiple Oses, why don’t we write everything in Java then?

Actually a lot of enterprises do! But there are some tradeoffs to make this work.

What’s bundled

The code required to make a Java program run on every operating system is not simple; somehow all this complexity needs to make its way to the target computer, whether it uses it or not.

Ideally, we have one part: the program you want to distribute, bundled and delivered to the user. The other part, like an adapter (the Java VM), allows your program to work on the target computer.

Up to this point I have given the impression that the Java VM is all that is needed for the adapter to work. This is inaccurate; the Java VM interprets the program instructions (in an intermediate set of instructions called bytecode). It still needs a whole set of supporting libraries to enable interfacing with the OS.

The Java VM and these supporting libraries are installed in a software package called the Java Runtime Environment (JRE). If you hear the term “runtime environment”, this is what it refers to: the program plus the libraries it needs to run.

If you want to write Java programs, you’ll need more than the JRE though. you will need supporting tools especially for debugging, and these are provided in the Java Development Kit (JDK). This is not only true for Java, but for many other languages as well; if you are getting into programming, and asked to pick a download, you usually want the one that says “development kit”.

Tradeoffs in distribution

If you want to keep the software bundle size small, you’ll have to ask the user to install the JRE separately (about ~80MB downloaded, more when installed). If you don’t want your users to face that hassle, you’ll have to take on the work of bundling the required libraries into your program yourself. Are you going to bundle libraries for all targeted OSes, or just for one particular platform? The more you bundle the larger the size …

Nobody said multiplatform support is easy!

The legalese

Java is free to use—but only on a personal basis. Once you intend to distribute your program, and maybe even make money, the issue of licensing rears its head.

Java was originally created at Sun Microsystems in 1995, but was acquired by Oracle in 2010. They do enforce their licensing pretty strictly, so be prepared to pay for the convenience!

Java and open-source

If the only way to distribute programs with multiplatform support is to pay a licensing fee, Java would see much lower takeup, which would hurt long-term profits … so a big portion of the Java core is open-sourced as OpenJDK. This is free to use and extend, and many businesses have been successfully releasing software based on it. But it is going to be a lot more work carrying out testing and writing code for features which are not provided.

On top of OpenJDK, Oracle JDK—Oracle’s commercial release of the JDK—adds some proprietary code, plus lots of enterprise support and testing. In general you’re going to have a much easier time writing your code with Oracle JDK instead of OpenJDK—that is how Oracle makes money!

If you just want to get familiar with the language, or are working on a personal project, OpenJDK lets you do so for free, legally. If you are a corporation trying to get your engineers to write code that runs on multiple platforms so as to simplify your systems, Oracle JDK helps you to save time doing that, for a fee.

It’s the best of both worlds.

Issue summary: The Java Runtime Environment (JRE) bundles the Java VM and supporting libraries. The JRE has to be installed on the user’s system for Java programs to work, unless the program bundles the supporting libraries. Solo programmers can start programming with OpenJDK for free with fewer features and less support, while commercial companies can license Oracle JDK for better support and features.

What I’ll be covering next

Next issue: [LMG S12] Issue 152: Getting started with programming

Hold on, I thought—

Nah, Layman’s Guide to Computing has not suddenly switched to being a Guide to Programming! We are still on the topic of the cloud and the history of commercial computing. Where I last stopped, I was talking about how co-hosting started from virtual-hardware VM rental, and then added containerisation to its bag of tools. Then I segued into talking about process virtualisation and the Java VM (last issue & this issue).

A short note though: process virtualisation isn’t a co-hosting tool, but a programming tool. We use process virtualisation while writing programs, not while deploying programs to the cloud. And I do want to get back to talking about the cloud.

But before I do that, there’s one thing that the average layperson would not be familiar with, and which I need to talk about:

Why is it so difficult to do programming, even with experience? Yes, do programming, not merely learn it!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 150: System VMs vs Process VMs

2021-12-11T08:00:00+08:00

Previously: Containers are one layer of virtualisation above virtual machines: containerisation systems virtualise access to the operating system, presenting a virtual interface that provides software with the resources it needs, without being aware of software running in other containers on the same system.

Recap

If I need to configure an entire machine to install and configure my own operating system (OS), I can rent a virtual machine — this is system virtualisation.

If I just need to run a set of software on a particular OS but don’t want the hassle of managing the rest of the OS, I can containerise them for the OS — this is containerisation.

What if I want to write software that will work the same on multiple OSes? Is that possible?

Before 1995, no!

OS-level details are tricky

OSes mediate access to resources through libraries (Issue 17)). Instead of having to deal with storage disk sectors and blocks, you can just use library features to ask the OS to help you create a file and write data to it. The OS and filesystem (Issue 106)) take care of the details.

I don’t know how to explain this more clearly without showing some programming code; the OS controls a lot of things! Which is a problem when it comes to multi-OS support, because different OSes control different things differently.

The library features for lots of things — storage, searching through directories, networking, etc — are different between Windows, Linux, MacOS, …

Abstracting away the details

Can’t we program in a higher-level language, and use another program (the code interpreter) to break it down to different types of operations on different operating systems? E.g. python.exe on Windows will try to make my Python code work in Windows by using Windows libraries, python on Linux will do so using Linux libraries, and python on MacOS will do so using macOS libraries. Javascript, Ruby, Perl, VBScript, and other interpreted programming languages work this way too¹.

To a large extent, this is possible. But there remain some irreconcilable differences:

if you need to use Python to run a command-line program, such as a shell script (Issue 16)), those will depend on terminal availability: not all terminals have cross-OS support!
If you need to specify a file location, Windows and Linux do not use the same path separators (the character that separates folder names in a folder hierarchy); Windows uses \ to separate directory names, while Linux and MacOS uses /. This is often a source of bugs and headaches.
Important advanced features, such as multiprocessing (on multiple CPU cores simultaneously), are handled differently in different OSes in ways that may be incompatible within the same program.

Virtualising processes vs virtualising hardware

So it looks like we need something more. We need a programming language that can be interpreted into intermediate instructions (these are called bytecode), and we need a program for each OS that can carry out these intermediate instructions.

That program is also called a virtual machine (VM), but this is not hardware virtualisation. It is process virtualisation.

The OS interacts with virtual hardware without directly accessing the underlying hardware. Similarly, we want our code to interact with a virtual process instead of using OS libraries directly.

This is how the Java programming language works. It provides a (process) VM, called the Java VM, for each operating system. Your Java program interacts with the Java VM only; it mediates all access to the operating system.

Issue summary: System VMs provide a set of virtualised hardware that the OS interacts with. Process VMs provide a set of libraries that a program (written in that programming language) interacts with.

What I’ll be covering next

Next issue: [LMG S12] Issue 151: the Java VM

The idea sounds pretty cool … so why don’t we write more programs this way? And what is the Java VM anyway?

Light discussion on code-adjacent issues next issue!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

As opposed to compiled programming languages (Issue 54)), where code is compiled into CPU instructions for one hardware platform for one OS only. ↩

Issue 149: History of commercial computing - containerisation

2021-12-04T08:00:00+08:00

Previously: Renting out virtual hardware instead of physical hardware meant that instead of having to move hardware around and manage it, you could send the data for running an OS to the hosting company and have them be responsible for hardware operations.

Business concerns

Every business computer you have encountered likely runs an operating system (OS). And yet, what value does managing the OS have for the business? They have business software to run—point-of-sale systems, accounting systems, communication systems e.g. email—but the OS is no big concern for them, as long as it runs the software!

If I have point-of-sale software that only runs in Windows, and I’m paying for a company to provide it as a service, I don’t care if the software actually runs on Windows with direct hardware access, or if it is doing it through a virtual machine, so long as it works.

If I am going to help other companies run point-of-sale systems, one way to do it securely (so that data from different terminals/companies do not mix) would be to run a separate virtual machine (VM) for each system. I would do this through a hypervisor, a specialised thing (software/firmware/hardware) meant for running VMs.

Virtualising OSes

This is kind of wasteful; I am running multiple versions of the same OS to support as many copies of the point-of-sale system, even though they can all run on the same OS. I am just loathe to do so for security reasons.

What if we could apply virtualisation one layer up: instead of just virtualising hardware, we virtualise the OS?

To recap: when we virtualise the hardware, we provide virtual drivers that the OS can accept as valid hardware.

But programs don’t need real or virtual hardware; they need OS libraries which provide common resources and services: network, storage, compute, memory, windowing/display. If we can provide virtual libraries which respond like the actual OS libraries would, the programs would be able to run as normal.

The technology that enables OSes to let programs think they are running exclusively, protected from other programs, is called OS-level virtualisation, but more widely referred to as containerisation.

An OS (with the appropriate software & support) can run multiple containers, each container acting like a sandbox (Issue 92)) for the software within. In each container, software has access to OS features, but are unable to affect software outside of the container. Each container appears to have exclusive access to (a portion of) the system’s network, storage, compute, and memory resources.

Managing containers

The word containers may seem like a misnomer, for what are effectively software wrappers. But these do work almost like shipping containers: set up a container in an OS, install the required software, configure it, and now you have a container ready to meet business needs. You could send this container to any virtualisation service, they drop that container into their hosting system, and it runs like you expect.

Issue summary: Containers are one layer of virtualisation above virtual machines: containerisation systems virtualise access to the operating system, presenting a virtual interface that provides software with the resources it needs, without being aware of software running in other containers on the same system.

What I’ll be covering next

Next issue: [LMG S12] Issue 150: System vs Process VMs

Up to this point, we have been looking at two different kinds of virtualisation: system virtualisation (virtualising hardware), and containerisation (virtualising operating system environments).

Next issue, we examine a third kind: process virtualisation.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 148: History of commercial computing - cohosting

2021-11-27T08:00:00+08:00

Previously: Running a virtual machine is like running a physical machine, but within a window in your OS.

Co-located hosting

A not-so-long time ago, to run a website, you literally just ran a webserver on your desktop, connected it to the internet, and gave your IP address to other people. This is a pretty unreliable way to host a business website though. A big company would make business arrangements to procure a reliable internet connection, set up the infrastructure (power, cooling, mounting hardware) required to run multiple computers, and then manage their multiple systems with a full IT management team (hardware & software).

Not every company can afford this. Smaller companies would therefore co-locate their computers (called colo boxes) with bigger companies, enjoying service support and infrastructure for a monthly/yearly fee. Some companies decided to just provide these services as their full-time business, and the hosting business was born.

The difficulties of troubleshooting remotely

Running your computer on someone else’s premises is no joke. If something went wrong, there was no way to do troubleshooting remotely. You had to drive down and do the troubleshooting onsite, usually wasting at least half a day in the process. Unsurprisingly, this was a problem many companies were happy to abstract away by paying more money. Soon, hosting companies offered to rent your their computers, configured to standard specs, and provide basic onsite troubleshooting. You would email/send them the software you wanted to run on those computers, with instructions, and they would do it for you. A huge timesaver; most hardware issues are now out of mind for business owners.

You still had to worry about OS issues though. If you need to have multiple pieces of software set up and configured, this was something a hosting company could not do for you. And this was where virtual machines (VMs) came in handy. What if you could set up a virtual machine, with virtual CPU and virtual memory, install your OS on a virtual disk, install all your required software in that OS, and then send that virtual disk (as data) to a hosting company? They would then run that disk through their hypervisor, a piece of software that manages virtual machines.

The promise of virtualisation

This was the promise that virtualisation companies offered to businesses. You could manage your virtual machines remotely, choosing when to boot them up or shut them off, paying only for virtual hardware you requested, without affecting the virtual machines of other companies. Hosting companies could “collect rent” for multiple VMs running on a single computer.

After all, as long as you have an interface to manage it, and you are able to set up your software on it, does it really matter whether it is a physical or virtual machine?

Issue summary: Renting out virtual hardware instead of physical hardware meant that instead of having to move hardware around and manage it, you could send the data for running an OS to the hosting company and have them be responsible for hardware operations.

What I’ll be covering next

Next issue: [LMG S12] Issue 149: History of commercial computing - containerisation

Can we push this further? Could we get hosting companies to not only help us run the hardware, but the operating system as well? Yes, yes we can!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 147: Operating systems on virtual hardware

2021-11-20T08:00:00+08:00

Previously: Virtual hardware can be created in the form of drivers that respond to a program’s requests for hardware resources. If a bootup program enumerates hardware devices and receives a response, then as long as it continues to receive valid and correct responses, it can work with the virtual hardware to run an operating system.

So … what is it like to run an operating system (OS) on virtual hardware? I promised screenshots, but they probably won’t be as exciting as you expect—it looks quite normal!

Creating a virtual machine

I don’t want to purchase a VMware license, so I will be using an alternative virtual machine product instead: Oracle’s free Virtualbox. This is what it looks like, running on Arch Linux on my laptop:

Virtualbox main interface

Let’s create a new virtual machine. It asks which OS I intend to use, presumably so it can pick the right virtual hardware to use (MacOS virtual machines may be more picky? I’m not too sure here either). I don’t have a valid Windows license to use, so I’m just going to demo with Arch Linux instead.

Whichever OS you pick, you are going to have to find a way to install that OS into your virtual machine—whether you are using actual boot media (such as a DVD drive), or virtual media.

Creating a virtual machine

Cool, you actually get to configure how much of your computer’s memory the virtual machine will get to use.

Configuring memory size

Without a storage disk, a computer isn’t much. Virtualbox needs something it can use as a disk; the usual way is to create a file on your system that Virtualbox uses as a virtual disk. The space occupied by this virtual disk can be preallocated up front (so that your virtual machine doesn’t accidentally “run out of storage space” before the disk is full), or it can be dynamically allocated, only taking up as much space as is actually used.

I don’t have an existing virtual disk file, so I will ahead and create one here (not shown). If I have one from a previous installation, I can use it here instead.

Configuring hard disk

The machine is created; the main interface now looks like this. Note that all the “hardware” you see there is virtual!

Virtualbox main interface, with one virtual machine

If you are doing this on your own computer, at this point you might want to go into Settings and see what else you can toy around with: number of (virtual) CPUs, sharing some folders on your system with the virtual machine (they show up as shared network folders), adding more disks or even a virtual optical drive, etc.

Setting up boot media

I could try to boot it now, but I already know it won’t work; there is nothing in the hard disk to boot from. With a physical computer, at this point we will attempt to install the OS from a DVD drive or flash drive. You could allow your virtual machine to access the DVD drive or flash drive in order to do this, or you can do it virtually. Most operating systems (including Windows) provide virtual boot media for installation: an ISO file is a virtual optical disk.

I can download the ISO boot file for Arch Linux, but it seems they have gotten savvier lately and actually provide virtual boot media in the form of virtual hard disks! Let’s use that instead. I add it as a second virtual disk in Virtualbox:

Virtualbox hard disk configuration. packer-virtualbox.vmdk is the virtual boot media for Arch Linux

Once I start the virtual machine, it begins its boot sequence, and I interrupt it by pressing F12 to go to the boot menu (otherwise it will attempt to boot from the main virtual hard disk, and fail to detect any OS). Interesting to see here that it isn’t actually connected to a monitor: virtualbox presents a virtual display device, to which the virtual machine sends its video signal. Virtualbox captures these signals and displays them within the window instead. So you can flexibly configure the window size, and the virtual machine just thinks the display device it is connected to has been resized.

Virtualbox boot menu. Through the virtual (AHCI) disk controller, the virtual machine detects two (virtual) disks: the new disk I created, and the Arch Linux boot media that I loaded.

Running a virtual machine

I select the second disk to boot up Arch Linux for installation, and the login prompt appears:

The Arch Linux login prompt, in Virtualbox. Yes, this is how Arch Linux gets installed the first time.

I wont go any further at this point, because then I’d just be showing you how to set up Arch Linux. But I trust this is enough to give you an idea: running a virtual machine feels just like running a physical machine, but in a window!

Okay hold on, how do we shut this thing down? It is not recommended for a computer to be unplugged without a proper shutdown, so … how do we do that with a virtual machine?

Even the powerdown and reset buttons on a computer are actually hardware signals which the OS receives and uses to trigger a series of actions. We can send these signals virtually too, through Virtualbox’s Machine menu:

The Virtualbox Machine menu.

And with that, let’s shut it down.

Issue summary: Running a virtual machine is like running a physical machine, but within a window in your OS.

What I’ll be covering next

Next issue: [LMG S12] Issue 148: History of commercial computing - cohosting

Right, so this has been really cool and all, but not something a layperson would use on a daily basis usually. And the setup still seems … rather technical? So why does this deserve its own season?

Remember that this happened before the turn of the century, so the tech industry has had two decades to figure out how to make money out of this. And two decades is the equivalent of a whole lifetime in this industry. How did virtualisation change the landscape of commercial computing? Time to take another walk in recent history.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 146: Virtual hardware

2021-11-13T08:00:00+08:00

Previously: Programs do not usually deal with the gnarly details of hardware, but instead access it through an interface. They access storage devices through a filesystem, and access hardware through drivers.

How does one trick an operating system (OS) into coexisting with other operating systems on a single machine? By virtualising hardware into virtual drivers!

Virtual network hardware

Let’s take an example. Take a look at your network devices: there’s one for your LAN port, there’s one for your wifi card, and these days there may be one each for your Bluetooth chip and 4G/5G modem too.

If your workplace requires a VPN, you may have noticed that it adds a new network device. How is this possible if you don’t actually have any new network hardware?

Remember that in an operating system, access to hardware is mediated through drivers. A driver decides how to present its interfaces to the operating system. In essence, nothing prevents a driver from presenting multiple interfaces to the OS, provided the driver is able to receive requests and respond to them.

A VPN uses its own driver to present an additional interface to the OS, and that is how we end up with “virtual hardware”.

Virtual storage

We are used to thinking of storage as referring to a hard disk, or solid state disk. But technically anything that is capable of representing bits can be used as storage—with an appropriate driver.

Some operating systems/programs provide drivers for RAM disks—a storage disk that uses computer memory. These appear as a normal disk drive (in Windows) or mountpoint (in Linux). Managing files in a RAM disk is speedy, because computer memory is much faster than a storage device.

Virtual memory

In Issue 55), I explained how the operating system offers and controls access to computer memory, the pagefile (Issue 117)), as well as hardware devices through a single addressing interface: virtual memory.

When a program requests access to the printer and the OS responds with “here, you can send your request to memory address 0x35a4b2ff”, how is it to know if the data is going to a physical printer, or to a virtual one¹?

Virtual hardware

Take a look at your Device Manager in Control Panel. What do you see?

Device Manager in Windows 10
Note that what you are seeing are not the actual hardware (which the OS cannot possibly know). These are interfaces to the hardware.

A whole set of drivers and interfaces which the OS uses to carry out its work.

Many of these were initialised during bootup (Issue 112)), when the OS kernel (the core of the OS) enumerates the available hardware by sending out signals and seeing what hardware responds.

So a bunch of engineers at VMware thought: what if we … made drivers to present virtual hardware emulating the CPU, memory, storage devices, … and even the chipset? What if we then we booted the BIOS (the bootup program loaded on a computer’s mainboard; see Issue 112)), got the virtual hardware to respond when the BIOS enumerates hardware, and then basically simulated all the signals that hardware would actually send?

We end up with a virtual machine—one that you can actually install an OS on!

Issue summary: Virtual hardware can be created in the form of drivers that respond to a program’s requests for hardware resources. If a bootup program enumerates hardware devices and receives a response, then as long as it continues to receive valid and correct responses, it can work with the virtual hardware to run an operating system.

What I’ll be covering next

Next issue: [LMG S12] Issue 147: Operating systems on virtual hardware

So … what is it like to run an operating system on virtual hardware? Screenshots incoming!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

These virtual printers do, in fact, exist. It is why some OSes offer a “Print to PDF” printer device: the program effectively sends print commands to another program, which interprets the commands to produce a PDF file. This is possible because both printers and the PDF format share a common language: Postscript (see Issue 51)). ↩

Issue 145: What an app wants, what an app needs

2021-11-06T08:00:00+08:00

Previously: In 1999, VMware launched VMware Workstation, which allowed multiple operating systems to run off a single machine.

In Season 5 (Issue 53)), I went into some detail on how our programs work. The programming language they are written in gets compiled into CPU instructions, which get carried out by the CPU.

But the program does not handle everything on its own. In fact, it does not usually have direct access to hardware (unless requested from and provided by the operating system (OS)). All such access is abstracted and mediated through the operating system. The program requests and receives memory space, reads and writes files, and processes keyboard/mouse input—through the operating system.

How does the program know it is not living in a simulation? If the environment it operates in responds to its requests, the program continues running without a care. A keypress event is a keypress event, and the program can respond to it, whether it really came from a keyboard or not. A file that has data to read can be treated like a file, whether it is really stored on the hard disk or actually streamed from cloud storage.

Computing interfaces

This paradigm is really powerful, because it enables us to pipe data from place to place. It enables us to build interfaces: instead of writing a program that has to grapple with the gnarly details of files in binary format, I can write a program that deals with a file interface instead. The interface lets me read and write data, while the gnarly details are handled one layer down, by the filesystem.

And this is how we can open documents from a flash drive without even realising that the flash drive uses a different filesystem from our system disk.

The OS is just another special set of programs that mediate access to the hardware. Yet, it is easy to forget that even the OS’s many programs do not actually deal with the gnarly details of hardware, but with an interface to the hardware. This interface are the drivers (Issue 120)) we keep hearing about, and which our OS keeps telling us could not be found or are outdated.

As long as the “drivers” respond in the right way, the operating system continues to carry out its instructions as programmed.

How does one trick an operating system into coexisting with other operating systems on a single machine?

Issue summary: Programs do not usually deal with the gnarly details of hardware, but instead access it through an interface. They access storage devices through a filesystem, and access hardware through drivers.

What I’ll be covering next

Next issue: [LMG S12] Issue 146: Virtual hardware

Examples incoming!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 144: Programs-in-a-vat

2021-10-30T08:00:00+08:00

Previously: The Apple M1 is a souped-up iPhone processor, with unified memory.

I want to circle back to talking about processors again in this season, because there are a couple of pretty world-shaking ideas I haven’t fully fleshed out in Layman’s Guide yet.

One of them is—hmm where do I begin. As early as 1641, in Meditations on First Philosophy, Descartes proposes that “All that up to the present time I have accepted as most true and certain I have learned either from the senses or through the senses; but it is sometimes proved to me that these senses are deceptive, and it is wiser not to trust entirely to anything by which we have once been deceived.” In other words, Descartes isn’t always sure that he believes what he sees; his senses sometimes deceive him about the nature of reality.

More than three centuries later, in 1999, the Wachowski brothers translate this idea into a more modern form: what if the world as we know it is a simulation running on some other cosmic, otherworldly hardware? Is it possible to signal to our senses so convincingly that a simulacrum may be thought of as real?

Hold that thought, because this is the Layman’s Guide to Computing, not Philosophy (although there is plenty of that in Computing as well!). For millennials like me who came of age in the early noughties, The Matrix defined the zeitgeist for the next couple of decades, whatever one may think of its aesthetics. It is difficult to overcredit it with this.

VMware is founded

But allow me to put a little dent in that reputation: the year is 1998, in Palo Alto, California. A company named VMware had just been founded. One year later, in 1999, they demoed and launched VMware Workstation, their first product. In that demo, engineers demonstrated how VMware Workstation could allow users to run MS-DOS, Linux, FreeBSD, and multiple versions of Windows—that’s multiple operating systems (OS)—off a single machine.

The release of VMware Workstation probably happened too late to influence the direction of The Matrix, but no doubt there was talk in the air of virtualising OSes in the years leading up to it. To be clear, this is not dual-booting, in which a user can choose, through a boot menu, which OS to boot into (Issue 112). We are talking about multiple OSes, running simultaneously, off a single machine.

How does such a thing happen?

Issue summary: In 1999, VMware launched VMware Workstation, which allowed multiple operating systems to run off a single machine.

What I’ll be covering next

Next issue: [LMG S12] Issue 145: What an app wants, what an app needs

What enables an operating system to function, convinced it has full control of hardware? For that matter, how would it even know if the hardware … is really hardware?

… 👻

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 143: Implications (Part 2) – Future Goals

2021-10-23T08:00:00+08:00

Previously: Using the same hardware for both smartphones and laptops would make it much easier to write apps for both platforms. The closer they are in features, hardware, and software support, the easier things will be for developers.

So, let’s get some Likely-Asked-Questions (LAQs) out of the way in this last issue.

If developing for a single chip makes things much easier for developers, can we just decide to develop everything for the M1 chip and somehow force Apple to share the chip design?

You know as well as I do that the answer is no. Besides, Apple doesn’t care about the hardware needs of devices other than its own. You want other features they don’t care about? Too bad.

So only Apple users get to enjoy unified memory?

Yes, for now. We’ll need to wait a few years for other chipmakers to figure out their own hardware implementations.

Who’s likely to achieve it first?

This’ll have to be a company that designs its own CPUs and GPUs.

The incentives don’t align for Intel. Their main business has never relied on capable graphics, and they are much more concerned with saving the server market from ARM’s and AMD’s encroachment.

What about AMD? After all, they were one of the earliest companies to push for a similar idea: Heterogeneous System Architecture. And they achieved it to a lesser extent, with their Accelerated Processing Units (see Issue 142); a fancy term for CPU+GPU).

AMD has made this possible in software; that means as a programmer, you can command the CPU to store data at a memory address, pass that address to the GPU, and then get the GPU to retrieve data from that address. But in practice, benchmarks show that passing data this way falls short of the actual throughput that would be expected; possibly the hardware support is just not there yet, and not easy enough to use.

But the incentives line up quite well for AMD. If they achieve it, the performance of their APUs, their mid-range product, will see a significant boost. But they will need significant influence with developers to develop software development kits (SDKs) that developers can use to take advantage of unified memory, and that’s a big investment of resources.

Nvidia is putting a lot of effort into catching up on the CPU side of things, and they have been pushing lots of ARM chip designs to complement their strength in graphics cards. They have also recently bought ARM, so they also seem like a strong contender to implement unified memory. My gut sense is that it is not high on their priority list, as their primary business is still parallel computing and related applications, such as machine learning and scientific computing.

The work for this will have to be ongoing, of course, and likely started since 2015 or so; starting in 2021 is way too late!

What does this mean for Apple?

They are now almost fully in control of their own hardware and software. The main limitations where their control does not reach is their cloud computing (where iCloud happens), and the manufacturing (likely still TSMC in the near future). Their concerns now will be much more international than before.

Should we expect to see unified memory on non-Apple chips?

Yes, definitely, it’s something the industry has been working towards, just way too slowly … and hopefully the M1’s existence will put some pressure on those development timelines.

I suspect the main cause of inertia is all the legacy software that still has to be supported. Because Intel and AMD have a lot of business riding on keeping compatibility with past hardware, they can’t make sweeping changes across their entire range of products, unlike Apple. Every change that is made to an existing line of chips has to still keep it working when customers run their existing software.

How does this affect consumers?

Probably not much effect, beyond the gradual speed gains from generation to generation that we are already seeing.

The more significant effect is, I think, the miniaturisation of mobile systems. Already the mainboard for a laptop like the Macbook has shrunk to a narrow rectangle; most of the space for devices is now taken up by energy storage (i.e. batteries). The limiting factor now seems to be energy density: how many grams of batteries we will need per hour of laptop use. I suspect this is going to keep laptops more or less at the same size; the laptop is a mature form factor at this point and will gradually age.

What’s more exciting is when unified memory architectures can be miniaturised sufficiently for wearables. We are going to need that if we want augmented reality (AR) systems, e.g. graphics projected directly on a lens in front of our eyes, in a compact form factor. Many virtual reality (VR) and AR systems currently come in bulky designs that sit heavily on the body; there is much room for improvement here.

Wrapping up

This somehow ended up as a crash course in CPUs and GPUs, all in one season. I didn’t mean to carry out an industry analysis here, and this is definitely not a forecast to be relied on! It’s just a very interesting story to follow and I can’t help but think about what’s happening on multiple levels.

What I’ll be covering next

Next issue: [LMG S12] Issue 144: Programs-in-a-vat

How does a program on the computer know if it is in a simulation?

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 142: Implications (Part 1) - Software

2021-10-16T08:00:00+08:00

Previously: The Apple A14 and Apple M1 are essentially the same chip architecture: they use almost the same building blocks, just with different numbers of them. On top of that, the Apple M1 implements unified memory, allowing the CPU and GPU (and other SoC components) to share the same system memory, greatly facilitating intra-chip communication.

So, before 2020: smartphones are smartphones, laptops are laptops. They use different types of CPUs with different architectures (Issue 141)) and even different instruction sets (Issue 53)). Never the twain shall meet.

After 2020: It turns out that smartphone chips can be upgraded and used in laptops, while remaining essentially the same architecture? Its power consumption dial can be turned down to almost zero but also turned all the way up?

That opens up the possibility that smartphones and laptops can run on the same hardware, and there’s nothing technically stopping apps compiled (Issue 54)) for that instruction set to run on both!¹

Hmm, where has something like this happened before?

The big console alignment

Sometime in mid-2013, Microsoft announced the Xbox One (henceforth XB1), the successor to the Xbox 360. The 360 ran on a PowerPC CPU made by IBM—different from smartphone chips that used the ARM instruction set, and also different from laptops that use the x86 instruction set.²

The Xbox One, on the other hand, uses a CPU+GPU made by AMD³, following the x86 instruction set.

The Xbox One essentially uses a custom laptop chip!

This was interesting news because earlier that year, in Feb 2013, Sony had announced the PlayStation 4 (PS4), which was … also running on an AMD CPU+GPU! The previous iteration, the PlayStation 3 (PS3), was running on an interesting custom architecture that used PowerPC cores and a completely original GPU.

At this point it would be oh-so-tempting, for a tech nerd, to descend into point-by-point comparisons of the hardware specifications of both consoles. We will fortunately not be doing that.

What’s more important is what this meant for the video games.

Alignment in game development

If you wanted to write a game for the Xbox 360, you had to learn its API: which functions to call to make it do what you want, how to store data into its storage, and so on. It’s a lot of time and effort to look at your options and figure out the limitations, and how to work around them to achieve what you want in your game.

And if you wanted to make the same game for the PS3, you now had to learn a completely different API, running on hardware with completely different limitations, and figuring out completely different approaches to achieve the same end. While the game might feel the same, the time and effort is almost as much as what it would take for a new game!

The XB1 and PS4, on the other hand, are much more similar. They both use AMD CPU+GPUs with similar architecture. While Microsoft and Sony may add their own features on top of the chips and the software, the API is ultimately guided by hardware decisions. If you made a game for XB1 and wanted to port it to PS4, the effort of learning a new API is greatly lessened.

The gulf between smartphones and laptops

Back to smartphones vs laptops. Running on two different types of chips, using different architectures and instruction sets.

We have seen some forays from one into the other: Intel’s doomed Medfield chip was an attempt to bring the x86 architecture to smartphones, while Google has been trying to get ARM chips into Chromebooks, with limited but increasing success.

But now that we have an iPhone 12 using the A14 chip, a Macbook using the M1 chip, and we know that the A14 and M1 are essentially the same architecture and the same instruction set … it does suggest that the challenges of making software for both devices now primarily exist on the software side. The gulf of hardware incompatibility has been closed.

Feature alignment

The M1 chip is capable of power standby (i.e. screen off with the CPU in a low-power state), in a way that most laptop chips aren’t. This is a key feature for smartphone software and operating systems, and the M1 paves the way for laptop chip-makers to introduce this feature into their processors as well.

And the M1, being living proof that unified memory is possible, would also likely push existing companies to speed up development towards that goal.

Issue summary: Using the same hardware for both smartphones and laptops would make it much easier to write apps for both platforms. The closer they are in features, hardware, and software support, the easier things will be for developers.

I was looking for a way to sneak in the XB1-and-PS4 story, and I think I found just the right place for it. It really does excite me to think that one day a developer could write software for a smartphone, and it would work on laptops with minimal modification, and vice-versa. And perhaps a decade from now, we’d be scratching our heads why we even had to choose between the two!

What I’ll be covering next

Next issue: [LMG S10] Issue 143: Implications (Part 2) – Future Goals

So what’s next? Is unified memory the holy grail for hardware, and is there any further room for improvement? I’ll share some thoughts in the next issue.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Nothing, that is, besides all the software workarounds that will need to be written … ↩
In an interesting narrative twist, PowerPC was the architecture that Macbooks used before Apple switched them to Intel processors. And now Xbox did the same thing. ↩
AMD calls it an Accelerated Processing Unit (APU). Doesn’t matter for us. ↩

Issue 141: The Apple A14 and M1

2021-10-09T08:00:00+08:00

Previously: Shared memory is easier to implement when a company has control over the designs of both CPU and GPU.

So, to recap:

Most companies design either CPUs or GPUs, but seldom are well-positioned¹ to be excellent in both.

Among the companies that design both CPUs and GPUs, almost none of them² make CPUs for both mobile (smartphones + tablets) as well as laptops (including low- to mid-range desktops).³

Which leaves Apple in the (current) position of being the only chip company with a design for both mobile as well as laptop.

The difficulties of power scaling

It’s not that other companies have not attempted this.

They have tried to scale down laptop chips to achieve smartphone-like power consumption, but found that laptop chips can’t power down the way smartphone chips can, and require more circuitry to achieve that.

They have also tried to scale smartphone chips up to achieve laptop-like computational capacity, but found that simply pushing more electrical power doesn’t help all that much. Beyond a certain frequency limit, you simply need more bandwidth and more units, and designing a chip that easily accommodates more units like this just requires a very different design.

It seems that designing a chip that can go from 4W all the way to 65W (and possibly higher) requires intentional engineering, not simply modifying an existing smartphone/laptop chip design or bolting on/removing features.

The Apple A14 vs the Apple M1: similarities

Apple has managed to do just this with the Apple A14 and M1. They are, at heart, the same chip design! (In processor parlance, we say they have the same chip architecture.)

Let’s see:

The Apple A14’s key hardware.
Source: Apparently an online Arabic image gallery site (I have no idea why this picture is so hard to find!)

The Apple M1’s key hardware.
Source: TechBuzzPro

We can also compare these features via Wikipedia:

Apple A14

6-core CPU (4 low-power⁴ cores “Icestorm”, 2 high-performance “Firestorm” cores)
4-core GPU
8-core NPU
4GB memory (iPhone 12) / 6GB memory (iPhone 12 Pro)

Apple M1

8-core CPU (4 low-power⁴ cores “Icestorm”, 4 high-performance “Firestorm” cores)
8-core GPU
16-core NPU
8GB memory / 16GB memory

Notice that at heart, they are using the same building blocks: ~~low-power~~ high-efficiency cores, high-performance cores, GPU cores, and NPU cores (I suspect these are GPU-like cores but optimised for machine learning, i.e. they probably power Siri and other parts of the OS which lean on AI features); the A14 and M1 just has different numbers of them.

So one really amazing thing about the Apple M1 is that it is actually a boosted Apple A14: almost double the hardware!

It’s like when Magnemites join together and evolve into a Magneton …

The Apple A14 vs the Apple M1: differences

What’s different between the A14 and M1, besides the number of key chips? Apple isn’t forthcoming with the details, but we can guess about minor details like the image processor (for camera imaging), storage controller (the M1 can use high-power solid-state disks (SSDs) which the A14 can’t).

The major difference announced between the A14 and M1 launch is that the M1 has unified memory.

Unified memory vs CPU–GPU transfers

Back in Issue 139), I mentioned that unified memory needs really high bandwidth to support access by the SoC components.

Today, laptop processors use an interface called PCIe to connect CPUs to GPUs. PCIe has a bandwidth of up to 16 GB/s⁵.

The M1’s unified memory has a bandwidth of up to 58 GB/s reading from memory, and 36 GB/s writing to memory. Definitely an improvement.

Unified memory: what’s yours is also mine

The 8GB/16GB of system memory is used by both CPU and GPU. It is not partitioned at boot; both the CPU and GPU (and other parts of the SoC, such as the NPU) have full access to all system memory.

This greatly simplifies intra-chip communication, as all subchips in the SoC can request access to memory! The GPU no longer needs to keep its own (power-guzzling) memory. This reduces the motherboard space that is needed, lowers power consumption, and decreases latency for data transfer between CPU and GPU (Issue 139)): a triple-compounding win.

Issue summary: The Apple A14 and Apple M1 are essentially the same chip architecture: they use almost the same building blocks, just with different numbers of them. On top of that, the Apple M1 implements unified memory, allowing the CPU and GPU (and other SoC components) to share the same system memory, greatly facilitating intra-chip communication.

Some implications of the Apple A14–Apple M1 familial connection: the Apple M1 is truly capable of smartphone-like standby, a feature that Intel’s and AMD’s laptop chips have been striving for but not quite achieved.

It’s a lot to detail here, so instead I will do so—in a separate issue.

What I’ll be covering next

Next issue: [LMG S10] Issue 142: Implications (Part 1) - Software

Besides the reported fact that the M1 is really very fast (and yes I will spend a little time explaining just how fast), what else does this herald for expectations in the software on devices? Coming up next issue :)

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Companies that achieve both CPU and GPU excellence generally have business incentives that align with that goal (as opposed to, say, making low-power or cheap processors) ↩
Intel had a short-lived but ultimately doomed attempt at a smartphone chip (it was named Medfield). ↩
I‘m going to ignore servers here because I can! And because they’re not really relevant to a discussion on low-power consumer chips. ↩
These are the same ones labelled “high-efficiency”, which is marketing speak for “designed to use very little power” ↩↩
Bandwidth of 16 GB/s is for PICe 3.0; PCIe 4.0 will support up to 32 GB/s, but graphics cards won’t use that much bandwidth to communicate with the CPU. ↩

Issue 140: The shared memory dream

2021-10-02T08:00:00+08:00

Previously: Around 2015, the high-performance computer industry quickly realised that this would be much more efficient if the CPU and GPU could share the same memory. This idea was labelled heterogeneous systems architecture (HSA).

Let’s rewind a bit further from last issue. That was in 2015.

Circa 2009, changes were happening on the desktop motherboard, as the memory controller hub (MCH) came on-board the CPU to reduce latency when communicating with memory (Issues 134)–135)). But the memory chips themselves remained on the motherboard, and this was the case even in 2018, in Apple’s Macbook Air (Issue 136)).

Bringing memory on-board

Smartphones can’t afford to do that; every bit of mainboard space is precious! The Apple A-series processors have been gradually moving more and more memory into the CPU, where it enjoys lower latency communicating with the CPU.

In 2013, Apple released the iPhone 5S, using the Apple A7 SoC. This was Apple’s first 64-bit SoC (Issue 55)), and by this point Apple had managed to bring 1GB of memory onto the SoC package. By 2018, With the Apple A12 SoC, the on-board memory had increased up to 4GB on high-end iPhone X models.

So in 2015, the high-performance folks (working with workstations and servers) were dreaming of the CPU and GPU sharing memory, while from 2013, in smartphones, the CPU, GPU, and system memory were already cohabiting in the same chip package! CPU, GPU, and memory all living in the same space … how does this work?

Memory: yours or mine?

Remember this diagram?

An Intel Core i-series ATX system chipset diagram.
The MCH is merged into the CPU, but still a discrete unit.
DDR refers to computer memory, while GDDR refers to graphics card memory (Issue123))
Source: Ars

Apple is pretty tight-lipped about the technical details of its products, but if the industry standard is anything to go by, the GPU will usually have its own memory, separate from the CPU.

After all, CPUs and GPUs don’t do the same work, or even work the same way (Issue 123)). They use different memory, they use memory differently, they store data differently, and if they accidentally overwrote each other’s data … well, your device would just crash.

So … that on-board memory, whose is it? CPU’s, or GPU’s?

Successful sharing looks like …

One thing that makes it difficult to share memory is that the CPU and GPU have to “speak the same language”; they need a common shared understanding of the workflow involved in passing data through shared memory.

This is easier to develop when a single company has control over both CPU and GPU designs. This is not always the case; many smartphones have CPU designs from one company and GPU designs from another!

For instance, the Apple A-series processors initially used GPUs from a graphics company called Imagination Technologies, designed by their PowerVR division. With a CPU and GPU from different teams, working in different ways, shared memory is not likely to happen¹.

But in the A10 SoC, released in 2016, Apple had subtly started to replace parts of the GPU with their own in-house designs. The A10 would be the last in the line of the “Fusion” SoC series.

When the A11 SoC was released in late 2017—first in the “Bionic” series of SoCs—PowerVR’s GPU had been replaced by Apple’s own design².

Apple is finally in the position of working towards shared memory with their Bionic-series SoCs, with the A14 being the fourth “Bionic” SoC.

Issue summary: Shared memory is easier to implement when a company has control over the designs of both CPU and GPU.

The story which began in Issue 138) is coming to a close soon! Next issue, the curtain falls, the A14 and M1 are released, and Apple (probably) pulls the chip industry in a new direction again.

What I’ll be covering next

Next issue: [LMG S10] Issue 141: The Apple A14 and M1

And finally I can geek out over the A14 and M1 😎 don’t worry, I’ll keep it on-topic.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

What about other companies that had control over the CPU and GPU designs? Such as AMD, Samsung, Qualcomm, …? It’s a long story, and not really suitable for a layman newsletter. Sorry. ↩
The design is technically Apple’s, but they had been learning from many generations of working with PowerVR’s GPU, so the early initial designs are very likely heavily influenced by it. ↩

Issue 139: What’s before this line is mine, what’s after this line is yours

2021-09-25T08:00:00+08:00

Previously: A system-on-chip (SoC) combines the core functionality of a system—processing, graphics, memory, and control—into a single chip package.

I am eager to dig into the meat of the A14 and M1! But first I must set up a story.

The hUMA race

Circa 2015 (actually even a couple of years before that), the industry suddenly seemed to wake up and realise that graphics cards could do a lot more than just play video games. The nature of how they work (Issue 121) & 122)) makes them very amenable to solving problems in scientific computing, particularly in simulations, which use up computational resources by the petaflop, and energy by the megawatt.

In a nutshell, the problem the industry now faces is this:

The GPU is massively powerful … at doing a small subset of things. You can solve scientific equations but can’t run a computer with only a GPU.
The CPU is nimble, and much more suited for everyday tasks, like starting up a computer and connecting to multiple peripherals, and basically creating a useable digital environment for humans.
It thus makes the best sense to use the CPU to set up the heavy-lifting for the GPU, and have the GPU return the results after computation.

Remember this diagram from Issue 134)?

Think about how information would flow here:

The CPU requests data from the hard disks, which get put into system memory (DDR; left-most side).
It carries out some pre-processing on that data to set up the task for the GPU, reading from memory, and storing the results back in memory.
The data is copied from system memory to the GPU, which stores that data in GPU memory (GDDR; right-most side).
The GPU carries out the task, storing the results in GPU memory.
The CPU requests the data from GPU memory, copying it back into system memory.

Are you seeing lots of wasted effort there? I bolded it in case you missed it. So much copying of information!

The high-performance computer industry quickly realised that it could be much more efficient if the CPU and GPU could share the same memory.

The information flow in this hypothetical memory-sharing system would be simplified to this:

The CPU requests data from the hard disks, which get put into shared memory.
The CPU pre-processes the data, storing it back into shared memory.
The CPU sends the location of the data to the GPU, which then reads from shared memory and carries out the task, storing the results back into shared memory
The CPU retrieves the results directly from shared memory.

We save time, bandwidth, and resources without having to copy data between CPU and GPU, twice! The only drawback is that with so many components (CPU, GPU, and others) accessing memory at the same time, you are going to need memory with really high bandwidth.

The industry gave this dream a name. They called it heterogeneous system architecture (HSA), using a heterogeneous unified memory architecture (hUMA) i.e. shared memory.

Nvidia’s heterogeneous unified memory architecture (HUMA) dream
Source: WCCFtech

It turns out that this is a pretty difficult task—consider the amount of bandwidth needed to support CPU and GPU access. Today no product from any company (besides Apple) fully implements this in its SoCs (Issue 138)) yet.

Issue summary: Around 2015, the high-performance computer industry quickly realised that this would be much more efficient if the CPU and GPU could share the same memory.

I should stop here with this issue, and summarise the struggles of these companies in the next issue. They will make Apple’s success with the A14 & M1 a much more compelling read :)

What I’ll be covering next

Next issue: [LMG S10] Issue 140: The shared memory dream

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 138: System-on-Chip (SoC)

2021-09-18T08:00:00+08:00

Previously: The M1 goes one step further: not only does it make do with fewer chips, it does so with passive cooling.

In Issue 136), I showed the miniaturisation of the Macbook mainboard through a series of pictures. While the laptop has remained the same size mostly (apart from getting slimmer), that is not the case with its components. The bigger components, like memory and storage, changed from being separate discrete parts to being another component soldered directly to the mainboard.

But that only gets us so far; even in the M1 Macbook Air, the mainboard is still almost the entire length of a phone. There’s got to be something else.

Today, let’s see how the iPhone has evolved.

What’s in a smartphone: 2008

Rewind to 2008: one year after the first-generation iPhone was launched, the iPhone 3G was released. These early smartphones let us see every little chip that was required to run a smartphone:

iPhone 3G mainboard, with parts labelled
There are lots of small, auxiliary processors around the CPU.
Source: iFixit

In spirit and form, the early smartphones were a lot like the early desktop mainboards (Issue 132)): lots of chips performing highly specific functions.

After all, a smartphone has no need (or space) for a peripheral controller hub (PCH) (Issue 134)) when it does not have add-on peripherals, and no need for a memory controller hub (MCH) when it can put the memory directly on the same chip as the CPU.

At this point, Apple was still using a CPU based on a design by ARM, and manufactured by Samsung. 2 years later, Apple had its own in-house processor: the Apple A4, their own design.

What’s in a smartphone: 2010

This time, Apple had switched to an internal layout distinctly different from the iPhone 3G, and the basic layout (mainboard beside battery) would become a pattern for subsequent iPhone generations: battery taking up almost half the space, charging and audio circuitry at the bottom near the charging port, camera and antennas near the top, and everything else beside the battery.

iPhone 4 on the left, iPhone 12 Pro on the right
The basic layout of the iPhone has been preserved over a decade.
Source: iFixit and iFixit

What’s the difference between this layout and the 3G? Let’s have a look at the iPhone 4’s mainboard:

iPhone 4 mainboard. I got lazy with the labelling because, well, there’s nothing to label!
The CPU is the huge chip labelled “A4”, and there’s memory and the 3G chip on the back.
Source: iFixit

Similar to the transition from AT to ATX motherboards (Issue 132) and 134)), the iPhone underwent a great miniaturisation—in a single generation!

System-on-Chip

What happened to all those separate chips? Most of them got moved onboard, into the A4 chip, or other auxiliary chips. The great consolidating brought all their functionality under one roof.

The A4 chip carries:

CPU¹
GPU
MCH
PCH

This chip is responsible for:

performing calculations
rendering graphics (Issue 123))
managing the flow of information between CPU, GPU, and memory (previously the job of the MCH)
managing the flow of information between storage, network, and the MCH (previously the job of the PCH)

It is literally an entire system on a chip: a system-on-chip (SoC)!

More and more functionality would gradually be migrated into the SoC itself, with fewer auxiliary chips required: sensors, gyroscopes, image processors for the camera, etc. More educational perhaps would be to look at what’s not included in the SoC, particularly by the time we get to the the iPhone 12’s SoC, called the A14.

Not on the A14 SoC:

solid state disk (only part of it is in the SoC)
power management
4G & 5G
audio

The above functionality is highly specialised, especially in modern² smartphones. It manages the remaining parts of the phone: camera & mic/speakers, wifi & bluetooth, and telecommunications (4G/5G). Telecommunications in particular require a lot of power and would have contributed to unnecessary heating in the small CPU package.

And this is how we shrink a laptop mainboard even further.

Issue summary: A system-on-chip (SoC) combines the core functionality of a system—processing, graphics, memory, and control—into a single chip package.

What I’ll be covering next

The M1’s design shares a lot more in common with the A14 on the iPhone and iPad than it does with the laptop CPUs that came before it. I want to go into a bit more detail about this in the next issue, so that it’s easier to see just how different it is from a typical laptop.

First question: what exactly does “unified memory” mean? Why is Apple making it such a big deal?

Next issue: [LMG S10] Issue 139: What’s before this line is mine, what’s after this line is yours

Next issue, we look at a trend that started being reported on in 2015: the high-performance computing industry realised that the CPU and GPU need to have much more integrated memory sharing.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

The CPU carries (some) onboard memory for itself, but the main bulk of memory is still on the mainboard. ↩
I know it’s strange to differentiate older vs newer smartphones when the technology is only 1.5 decades old. But the evolution of smartphone designs over the course has been significant enough that yes, I am going to make this distinction :) ↩

Issue 137: The M1 Macbook Air

2021-09-11T08:00:00+08:00

Previously: Slim laptops have been undergoing a gradual transition: more and more of their chips are no longer available as a replaceable card, but instead soldered directly to the mainboard. Since 2017/2018, most slim laptops pretty much have CPU, memory, storage, and network chips all soldered directly to the mainboard.

Let’s get to it: Intel vs M1 Macbook Air!

The 2020 Macbook Air: passing the torch

Here’s the Macbook Air in 2020. There was one in early 2020 using an Intel Core CPU, and one in late 2020 using the Apple M1 CPU.

Macbook Air, early-2020 (Intel, left) vs late-2020 (M1, right)

Source: iFixit

They look almost identical, but some parts are noticeably different … can you spot the differences?

The cooling fan (upper left) is there in the early 2020 (Intel) model, but gone in the late 2020 (M1) model.
Besides the CPU (upper centre in both models, under a heatsink), the Intel model has a mysterious-looking chip (upper right, covered in black shrouding)

You may have spotted other differences in the hardware, but since this issue is focused on the mainboard and CPU, let’s zoom in on those. Let’s have a closer look at their mainboards:

2020 Intel Macbook Air mainboard, front and back
The Intel CPU unfortunately sits under the huge heatsink, shown with its 4 securing screws
Memory and solid state disk are on separate chips (most likely on the back)
Source: iFixit Store

2020 M1 Macbook Air mainboard, front and back
There are fewer big chips, but the single biggest chip there is much bigger, and Apple-branded
Memory is integrated into the CPU, but the solid state disk sits on a separate pair of chips
Source: iFixit

The M1 Macbook Air: all aboard

Overall, it looks like the M1 has “swallowed” a number of chips. Compared to the 2020 Intel model, the M1 has brought on-board computer memory (the two black chips on the M1), and Apple’s T2 chip (the back shrouded chip on the 2020 Intel Macbook Air). These are major components for computer operation.

So not only does the M1 incorporate more components, it does so while drawing less power—the lack of a cooling fan implies it is passively cooled. From Issue 129), this suggests the M1 Macbook Air also uses less power (8–12W) for its tasks. And reviews for the Macbook M1 Air suggest it is not being thermally throttled except under the heaviest of loads.

How did Apple manage to design a processor like this?

The Apple M1: evolved from a smartphone chip

To get into that story, I’ll have to go even more mobile, and look at smartphone CPUs. After all, the Apple M1 actually evolved from the Apple A-series CPUs for their iPhone and iPad. That starts next issue.

What about other Intel Core laptops running Windows?

They are largely undergoing the same transition, just more slowly. This is the Microsoft Surface Laptop in 2017:

Microsoft Surface Laptop mainboard
CPU(red), memory (orange), solid state disk (yellow), and network card (green) are all soldered on. (Outlined in cyan are the display control chips)
Source: iFixit

Larger-sized laptops that can afford the space may still have solid state storage on a separate card.

Issue summary: The M1 goes one step further: not only does it make do with fewer chips, it does so with passive cooling!

The last issue simply went on too long, especially with all the images, so I figured this issue would stand better as a Core-vs-M1 comparison, instead of being the tail of an evolution-of-Air issue. So it’s short.

What I’ll be covering next

Next issue: [LMG S10] Issue 138: System-on-Chip (SoC)

If smartphones are even smaller than laptops, how do they do it? Laptops seem to have exhausted all the tricks, and those boards still look pretty big.

Next issue, I’ll talk about the next step in the evolution of shrinking mainboards.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 136: The mobile workstation – laptops

2021-09-04T08:00:00+08:00

Previously: A modern CPU is manufactured through a process called photolithography, by which the CPU components are etched onto the silicon substrate by successive layers of chemicals, masking, and laser exposure. When the CPU components could be made small enough, the MCH and CPU were designed onto the same chip, and this is the design used by the Intel Core i7 (1st-gen).

In the last 4 issues, I walked through the general evolution of desktop computers. Let’s go more mobile, and look at laptops. How does something as big as a desktop shrink down to the size of a laptop? And what are the tradeoffs involved?

I addressed the power part of the formula in Issue 130), on power limits; laptops are slimmer in part because part of them—the AC adapter—lies outside the system.

Let’s look at the rest of it.

Laptops use slimmer components

Laptops use slimmer memory than desktops:

Desktop memory (DIMM) vs laptop memory (SODIMM)
Source: Quora

On a desktop mainboard, desktop memory sticks out perpendicularly from the mainboard, all the better to cram memory sticks together and maximise the use of space.

On a laptop mainboard, laptop memory sticks lie parallel to the mainboard, to reduce the mainboard height and allow a slim laptop profile.

As far as I know … there aren’t any other significant differences to highlight (besides size). Unless you’re overclocking, just get the kind of memory your computer/laptop needs. These days, small-form-factor desktops use laptop memory (SODIMM) as well!

Laptops use slimmer hard drives compared to desktops as well:

Desktop hard drive (3.5″) vs Laptop hard drive (2.5″)
Source: M2WifiCards

Desktop hard drives are larger, use larger platters (3.5″ diameter), and hence draw more power (at both 12V and 5V voltages). Laptop hard drives are smaller, use smaller platters (2.5″ diameter), and draw less power (at 5V voltage only). This is why smaller external hard drives, which use laptop hard drives, can be powered over USB, but larger external hard drives, which use desktop hard drives, need an external AC adapter.

These days, laptops have mostly made the transition to solid state disks, and you are much less likely to see hard drives in laptops.

The slim laptop in 2010: Macbook Air

Let’s examine how a characteristic slim laptop, the Macbook Air, has changed in the past 10 years.

In 2010, the Macbook Air had its solid state disk and wifi network card on separate (replaceable) cards. But the CPU, GPU, and memory were all soldered directly onto the motherboard.

The solid state disk in the 2010 Macbook Air
Source: iFixit

The wifi network card on the 2010 Macbook Air
Source: iFixit

The CPU (left) and GPU (right) on the 2010 Macbook Air
Source: iFixit

Another view. CPU (red), GPU (orange), and memory (yellow) are directly soldered onto the mainboard
Where is the chipset? I don’t know; the Macbook Air does not seem to use the same chipset as Intel-powered desktops
Source: iFixit

The slim laptop in 2018: also Macbook Air

By 2018, while the outside of the Macbook Air still looks much the same, the insides are rather different:

CPU (red) and solid state disk (yellow) are directly soldered onto the mainboard
Where’s the chipset? Notice that the CPU seems to have 2 chips on it? They are the CPU and chipset; two chips in one CPU package!
The next image shows the reverse side. Other chips are ignored here, see the iFixit article for full identification
Source: iFixit

Memory (red) and wifi network chip (orange) are also directly soldered onto the mainboard
This laptop only had integrated graphics; Intel had upped its integrated graphics performance sufficiently by this point
Other chips are ignored here, see the iFixit article for full identification
Source: iFixit

Components which in the 2010 Air were on separate cards are now all soldered directly to the mainboard! On the one hand, this saves space, which can be used for other features, or just for larger batteries. It also reduces the cost of manufacturing; connectors are costly to engineer and manufacture. On the other hand, it means upgradeability goes out the window.

Issue summary: Slim laptops have been undergoing a gradual transition: more and more of their chips are no longer available as a replaceable card, but instead soldered directly to the mainboard. Since 2017/2018, most slim laptops pretty much have CPU, memory, storage, and network chips all soldered directly to the mainboard.

Sorry about the image dump, I figured it would still be more convenient than having to click-through to see the images :)

What I’ll be covering next

Next issue: [LMG S10] Issue 137: The M1 Macbook Air

The M1 goes even further than the 2018 Macbook Air, in one pretty significant way. Next issue, we compare how the Intel and M1 Macbooks Air differ!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 135: Part 2 – Unifying the CPU and MCH (post-2008)

2021-08-28T08:00:00+08:00

Previously: Light takes 0.3 ns to travel 10 cm, approximately the distance by wire between the CPU and the MCH. This potentially causes operations between the CPU and MCH to slow down by one cycle, at frequencies above 3 GHz. One way the Intel Core i-series resolves this conundrum is to move the MCH into the CPU.

Time to close up some open plot points from last issue:

The number of pins on 1st-gen Core i7 is almost triple that of the Pentium 4; what are all those pins for?
The MCH has been moved into the CPU to improve latencies, but how is it possible to make it small enough to do that?
Are there any disadvantages?

I’ll answer the second question first. It’s quite simple really.

You see, for circuit components, size doesn’t always benefit performance. A large transistor does essentially the same thing as a smaller transistor. So making them smaller is advantageous really; you can fit more into a single chip!

Making a modern CPU

Modern CPUs are manufactured through a process called photolithography—literally it means “etching with light” (Greek; photo- “light” + litho- “stone” + -graphie “to draw”). By layering chemicals over the silicon base, putting a mask over them, and exposing them to light, a series of chemical reactions are induced to create the circuit pattern on the CPU.

Multiple CPUs are created on a single die this way, then individually cut and processed, in multiple steps spanning several months¹. The precision and fineness of the etching laser determine how small we can create components on this substrate. As the manufacturing process improves, semiconductor manufacturing companies are able to create CPUs that can cram more and more transistors into each square mm (or inch) of silicon die.

Besides being able to cram more transistors into the same space, it turns out that smaller components also use much less power! So we not only get performance gains, we get power efficiency gains as well—two birds with one stone.

CPU diagram of the Intel Core i7 (1st-gen)
The memory controller, misc. IO, and QPI areas perform the role that the MCH used to take up
Source: AnandTech

Moving in

Over multiple generations of process improvements, the MCH and the CPU could finally be made small enough that they could both reasonably fit into the same die. There are, of course, implications.

Previously, the CPU only needed pins to communicate with the MCH. Now, the combined chip needs more pins than before to communicate with the computer memory, graphics processing unit (GPU), and PCH.

So that answers the first question of what the additional pins are for.

Working as one unit

Which leaves the third question: besides latency improvements, are there any other advantages?

Mainboard manufacturers save on the cost of the MCH chipset, which works out to about $40. Pretty significant when a mid-range mainboard costs $80–$160.

With the MCH and its requisite wires gone, the mainboard can be shrunk further; motherboards gradually shrank from ATX and microATX form factor sizes, to smaller form factors, such as ITX and the current popular NUC form factors.

Mainboard form factors

ATX: 30×24cm (12×9.6 in)
microATX: 24×24cm (9.6×9.6 in)
ITX: 17×17cm (7×7 in)
NUC: 10×10cm (4×4 in)

And the disadvantages … well, none on the consumer side actually. It seems to be positive all around!

Well actually, complexity rears its ugly head in power-saving features.

Previously, when the computer is in standby (Issue 115)), the CPU could be safely shut down (i.e. cut power to CPU), leaving only the MCH minimally powered so the computer memory retains its information.

With the MCH and the CPU now sharing the same chip, they have to be put in separate power zones so that the MCH portion remains powered while in standby, while the CPU can be shut down safely, making the chip more complicated than its predecessors.

But that is of little concern for us layfolks.

Issue summary: A modern CPU is manufactured through a process called photolithography, by which the CPU components are etched onto the silicon substrate by successive layers of chemicals, masking, and laser exposure. When the CPU components could be made small enough, the MCH and CPU were designed onto the same chip, and this is the design used by the Intel Core i7 (1st-gen).

This is where the story stops with Intel for this season; their current-gen Core series still uses much the same chipset diagram, and a similar basic architecture, so there’s little new info of relevance for me to add here.

What I’ll be covering next

Next issue: [LMG S10] Issue 136: The mobile workstation – laptops

To continue the story towards the Apple M1, it’s time to switch our lens to the mobile world: tablets, smartphones, and things smaller than a laptop. How are these things designed? What are their CPUs like? We’ll examine the evolution of the iconic Macbook Air, from 2010 to 2020 (warning: image-heavy!)

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

See Three months, 700 steps: Why it takes so long to produce a computer chip (WashPo) for a more comprehensive description of the process ↩

Issue 134: Part 1 – the Intel Core i-series launches!

2021-08-21T08:00:00+08:00

Previously: The ATX form factor also brought with it a new breed of computers with more specialised chipsets: the memory controller hub (MCH) and peripheral controller hub (PCH). The MCH specialises in high-throughput requirements, such as computer memory and graphics. The PCH specialises in lower-throughput needs.

Last issue, we looked at the ATX form factor by Intel, which replaced the AT form factor by IBM. While the AT could get by with a smattering of chips, which worked fine for mostly text-only computers, the ATX has much higher throughput requirements. To help the CPU focus on serving the user’s applications, two chipsets—the memory controller hub (MCH) and peripheral controller hub (PCH), take charge of managing the data throughput. The MCH manages data between CPU, computer memory, the graphics processor unit (GPU), and the PCH, while the PCH manages data between the peripherals (audio, storage, network, USB, …) and the MCH.

An Intel pre-Core i-series ATX system chipset diagram.
The MCH and PCH (labelled ICH here for unimportant reasons) support the CPU in its data operations
DDR refers to computer memory, while GDDR refers to graphics card memory (Issue123))
Source: Ars

There are terms for each of the connections between chips, which I won’t get into because it largely won’t concern us until we have to design performant systems.

The evolution of Intel ATX

The technical geeks are probably fuming at this point because ATX is a motherboard standard, while I’m talking about the evolution of processors which have little to do with the motherboards, at least not directly … but that’s of little importance at this point. Because we first need to talk about CPU pins.

From Issue 131), I gave a simple model of the limitations of data transfer:

There is a max frequency they can operate at, and a limit to the number of wires they can be connected to (throughput = no. of wires × frequency)

The number of pins on processors have been steadily increasing up to this point, and so have the frequencies of processors. The Pentium 4 was succeeded by the Pentium D, then the Pentium Dual Core, then the Core 2. This Core processor preceded the Core i3/i5/i7 processors we know today; I’ll refer to this family of processors as the pre-i Core (rather than the more technical LGA775 series).

Pentium 4: 478 pins
Core, Core 2 (pre-i Core): 775 pins
Core (i7, first-gen): 1155 pins

Yup, the number of pins have almost tripled since the Pentium 4! Remember that more pins does not make the CPU itself calculate faster, it just helps it to transfer data faster.

What are all those pins for, if there is the MCH to manage data flow?

Let’s talk about the limitations of the pre-i Core setup.

Communication at a distance

Wait … don’t electrical signals travel at speeds near the speed of light?¹ How would disappearing the MCH improve latency?

Consider some numbers:

The typical distance between the CPU and the MCH is about 5 cm (2 in). Since the wires between them are not straight, let’s approximately double that to 10 cm (4 in). Light would take 0.3 ns to travel that distance. Which is roughly one clock cycle on a 3 GHz processor—at 3 billion cycles per second, each cycle takes a third of a billionth of a second!¹

Remember that everything in a computer needs to happen like clockwork: for data to sync up, when the CPU sets a bit to one, the other party has to detect the bit signal before the clock cycle ends. If not, it will have to wait for the next clock cycle, causing the operation to slow down and take two clock cycles instead of one.

It’s like when you don’t manage to post the mail by 5pm, the postman has emptied the mailbox, and now you have to wait for 5pm the next day for your mail to be picked up instead.

It’s all about throughput … but also latency

If light is taking one clock cycle to get out of the CPU, you have a problem. Raise the frequency higher than 3GHz, and you can cause a one-cycle lag just waiting for data to come in from the MCH, and to go out again to the MCH. That would counter-intuitively slow down the CPU.

Solution: move the MCH into the CPU!

Squeezing more tenants into the building

Wait … you can just do that?

I will need many more issues to lay out the mechanics of this, so I won’t—I think it’s way beyond the scope of a layman’s guide at that point!—but let’s see what I can come up with in the next issue.

Issue summary: Light takes 0.3 ns to travel 10 cm, approximately the distance by wire between the CPU and the MCH. This potentially causes operations between the CPU and MCH to slow down by one cycle, at frequencies above 3 GHz. One way the Intel Core i-series resolves this conundrum is to move the memory controller into the CPU.

This is what I spent years reading and thinking about to explain, and I finally get to lay it out in text. Incredibly excited to get to the next few issues!

What I’ll be covering next

Next issue: [LMG S10] Issue 135: Part 2 – Unifying the CPU and MCH (post-2008)

Next issue: how the ATX form factor evolved to eliminate the MCH. Sorry to end on a cliffhanger!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

I want to just make a note here that while I believe my choice of analogy is justified, the numbers are wildly off: RealWorldTech here puts the 1st-gen Core at approx 30 ns, for technical reasons that will take at least half a season to unpack (definitely not layman content!). But he also notes that latency for remote memory (i.e. memory not on the CPU, but on the motherboard) is “roughly 30 ns slower than local [memory]” (i.e. memory residing directly on the CPU). So the remote-vs-local latency gap is real and significant! ↩↩

Issue 133: the ATX form factor (post-1995)

2021-08-14T08:00:00+08:00

Previously: Chipsets served as go-betweens in the AT form factor by IBM.

In 1993, Intel launched its Pentium line of processors; barely two years later, in 1995, Intel launched the ATX form factor. This was the beginning of Intel’s dominance in the desktop space, and they could well afford to dictate most of the standards for this form factor.

Chipset diagram

Mainboards at this point were complicated enough that as part of the marketing, tech publications had taken to staring at diagrams of how the chips were connected. These diagrams are called chipset diagrams.

This is the chipset diagram of a typical mainboard for the Pentium 4:

I tried to find a chipset diagram that used human terms instead of acronyms.
This is the best I could do. Annotations my own.
Source: Hexus

The memory controller hub (MCH) now takes on a much bigger role; it is managing data transfer between the CPU, graphics card, computer memory, and the PCH.

The peripheral controller hub (PCH), while managing connections to many more devices, actually has less work to do; these are all low-throughput devices that don’t send much data to the CPU.

ATX mainboard

And this is where the components are found on the motherboard:

A motherboard for the Pentium 4, with key components outlined.
Annotations are my own.

The CPU clearly draws the most power and produces the most heat here. But notice now that the MCH is no longer bare; it now produces so much heat (4–10 W) that it needs to be passively cooled with a heatsink (Issue 129)). the PCH, on the other hand, is still chill enough to get by bare naked (4 W or less).

3D graphics at this point is a rapidly growing industry, especially for videogames. Graphics cards needed much more throughput to the CPU and memory, so the MCH grew to fit into this role as the mediator between these throughput-hungry components

Issue summary: The ATX form factor also brought with it a new breed of computers with more specialised chipsets: the memory controller hub (MCH) and peripheral controller hub (PCH). The MCH coordinates high-throughput components, such as computer memory and graphics. The PCH specialises in lower-throughput needs.

Much as I try to avoid using acronyms, here they are really just easier to read.

What I’ll be covering next

Next issue: [LMG S10] Issue 134: Part 1 – the Intel Core i-series launches!

I don’t know if you noticed, but there seem to be fewer chips here than on the AT board. That’s misleading though; the components that were on the AT board are also on the ATX board, but greatly shrunk. Some of the functionality that used to require multiple chips on AT have been replaced by a single chip in ATX, hence the appearance of simplicity. In reality, the ATX mainboard is more complex!

Next issue, onward with the integration!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 132: the AT form factor (pre-1995)

2021-08-07T08:00:00+08:00

Previously: CPUs have limited throughput, since there is a max frequency they can operate at, and a limit to the number of wires they can be connected to (throughput = no. of wires × frequency). Later designs of early computers increased the capability of computers by delegating more work to secondary chips.

When computers began hitting the mainstream market, they were designed to be able to use interchangeable parts so as to reduce cost and inventory. To support this effort, manufacturers came up with standards for how to lay out computer components on a mainboard; the different patterns came to be known as form factors.

The AT form factor, by IBM, is one of the early ones. An AT motherboard looks something like this:

The AT mainboard

An AT motherboard, with key components outlined.
Annotations are my own.
Original: Wikipedia

Graphics cards, usually added as an expansion card, communicated with the CPU (under the heatsink) through a chipset, while the CPU communicated with memory through another chipset.

At this point, graphics were still barely powerful enough to run 3D graphics (this was before Windows 95!), and the chipsets mainly served as go-betweens between memory, expansion slots (called buses), and the CPU.

After 1995, this would change.

Issue summary: Chipsets served as go-betweens in the AT form factor by IBM.

What I’ll be covering next

Next issue: [LMG S10] Issue 133: the ATX form factor (post-1995)

Short issue here, just to introduce the idea of chipsets! You can see the chips on the AT board look very similar. On the ATX form factor, they will begin to differentiate and specialise.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 131: What do early CPUs and startup founders have in common?

2021-07-31T08:00:00+08:00

Previously: AC power from the wall uses electric current that alternates directions, while DC power from batteries uses electric current that flows in one direction only. All electronics are DC-only, and require an AC-DC adapter to be powered from the wall. The AC-DC conversion produces a significant amount of heat; AC-DC adapters are usually external unless the device has sufficient space or cooling capacity for it.

This season, let’s open up that computer case and see what’s inside. Where does everything fit, and how does all that information get around? More importantly, how are computers able to cover such a large range of sizes, from towering desktops to tiny smartphones?

What a computer wants, what a computer needs

The common model of a computer is that it … computes. It calculates. It takes in numbers, and spits out more numbers.

That’s not quite right.

While a computer does carry out compute operations, these are far outnumbered by load/store operations (Issue 58)). Why so much loading and storing of data?

Moving data

The CPU itself has precious little storage (<20 MB of cache storage); it is only a wee little chip! Most of the data in a computer is stored in a hard drive or solid state drive; let’s just call them storage drives for now.

So CPUs have to read data from a storage drive. This is a slow operation, because storage drives are slow; writing to storage drives is even slower than reading from them.

In the meantime, the CPU needs a place to dump working data; this is computer memory (2–32GB). Memory is slower than the CPU’s cache, but much faster than a storage drive.

That’s 3 places to stash data so far: storage drives, CPU cache, and computer memory. You with me so far?

Pipelines

The next place that often requires lots of data is the graphics card (Issue 123)). For you to play a video game, the computer has to:

Load game data from the storage disk,
Store most of it in memory while it’s doing some number crunching in its cache,
Get the crunched numbers to the graphics card for rendering graphics (Issue 122)),
Load more data from memory while crunching more numbers, and passing them to the graphics card.

This involves far more loading and storing than computation. And there are limitations to how quickly data can be transferred.

Throughput

How does data get transferred? Through very fine wires usually. One side (e.g. the CPU) applies a voltage to the wire, the other side (e.g. memory) checks the voltage on the wire. No applied voltage = 0, applied voltage = 1.

How does the CPU know when to apply the voltage, and the memory know when to check it? These operations are synchronised through cycles, like a highly coordinated factory. A CPU operates on a frequency of up to billions of cycles per second, each cycle potentially transferring one bit of data (Issue 40)) if there are no delays.

Typically, the transfer rate is somewhat slower; how do we transfer more data per second? By adding more wires! With two wires, we can transfer two bits per cycle; four wires = four bits per second, eight wires = 8 bits per second … at some point, we run into a different problem. The CPU is a small chip, and there is only so much surface area for us to connect wires to.

An Intel Skylake desktop CPU.
Each gold contact on the under-surface connects to a pin on the motherboard when the CPU is seated properly in its socket
Source: Wikipedia

Well, that just sucks.

The limits of one chip

Come to think of it, humans are much the same; we only have two hands and two legs, there are limits to how fast we can do things, and limits to how long we can stay awake working. We mostly get around these limitations by learning to delegate.

In the same way, computer designs evolved to delegate more work to secondary chips, leaving the CPU to focus on computation. We’ll explore the gradual evolution of these architectures, so you can better appreciate the elegance of the Apple M1’s design ;)

Issue summary: CPUs have limited throughput, since there is a max frequency they can operate at, and a limit to the number of wires they can be connected to (throughput = no. of wires × frequency). Later designs of early computers increased the capability of computers by delegating more work to secondary chips.

The more I learned about computer architecture, the more I see parallels to startups and organisational culture in general. I was really looking at ways of organising information flows, and observing how the computational limitations of different parts influence the design of the whole chip! This is a constant work in progress, which is why we keep seeing new CPU designs emerge each year.

What I’ll be covering next

Next issue: [LMG S10] Issue 132: the AT form factor (pre-1995)

Let’s start from—nah, I wont go all the way back to the beginning, just to the point where computer architecture was already recognisable in its early modern form. Next issue, a big welcome for the AT form factor!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 130: Power limits

2021-07-24T08:00:00+08:00

Previously: The larger the surface area, the faster an object loses heat. The larger the temperature difference between an object and its surroundings, the faster the object loses heat. Heat is bad for computers, and CPUs will need cooling to be able to process computations quickly. A mobile phone thus typically uses no more than 4 W of power, a laptop can use 25–45 W, and a desktop can usually use 65 W and more. Two popular ways of increasing the cooling capacity of a device is to attach a larger piece of metal to the chip (passive cooling), or use a fan to force air over the heatsink (active cooling).

Point 1: A powerful device produces lots of heat.
Point 2: A device that produces lots of heat needs a large surface area (directly in contact with the heat source) to stay (relatively) cool.

These are two of the primary factors determining how tiny a computer can be. Can something the size of an iPhone be as powerful as something the size of a Macbook? It depends on how much cooling is available to it!

One more factor to add in this issue: power. Without power, none of your devices would work … and that is one more source of heat to be dissipated, incidentally.

Batteries and DC power

Some devices need to carry their own stored energy (in the form of batteries); the devices are powered by direct current (DC) from batteries. In this form of power transmission, electric current only flows one way—this is why it is important to put batteries in the right way! Electric current comes out from one end, and re-enters from the other end, marked with + and – symbols.

Batteries seldom provide power at the voltage required by devices and CPUs. For example, smartphone batteries usually have a voltage of about 3.7 V, even though the CPU usually requires about 1 V to operate. This allows the batteries to power the device with a low current¹, so as to minimise the heating effect of current in the wires.

That means some power conversion has to take place on the smartphone’s mainboard. Fortunately, DC-to-DC conversion is highly efficient (though not 100%), so it doesn’t contribute much heat in the device.

Wall sockets and AC power

On the other hand, alternating current (AC) from wall sockets has electric current flowing both ways—the current switches directions 50 or 60 times a second (see this PDF of Worldwide AC Voltages & Frequencies). Connecting this directly to a device that needs DC is looking for trouble! This AC power source has to be converted to DC through an AC-DC converter (a.k.a. “power brick”, “power supply”, “AC adapter”, …), and that process currently only goes up to 90% peak efficiency².

That means if you have a desktop running at 100W (maybe while gaming or encoding video files), the AC-DC converter alone draws 111W at the wall socket, wastes 11W (in the form of heat), and provides 100W of power to the desktop.

And heat is the enemy of CPUs.

External vs internal power supplies

Ughh, power bricks … so many different types, with different connectors, and we never quite know if we can use one laptop’s power brick on another laptop (at least, until USB Type-C power for laptops came along; more in Issue 127))

In a laptop, you really do not want the power supply dumping this heat into the laptop! The laptop already has enough work to do getting heat from the CPU out of that cramped space into the surroundings. You don’t want to give it more heat to remove, and risk throttling the CPU’s performance (Issue 129)). It’s better that the AC converter/power brick remains external to the laptop, dumping that heat into the surroundings directly without heating up the laptop’s internal space.

The internals of the Apple AC converter (i.e. power adapter)
Source: Ken Shiriff

In larger devices—the Mac Mini, game consoles (e.g. PS4 or Xbox One), and larger desktops, there’s plenty of space in the device’s internals, and they have sufficiently powerful cooling systems that can remove this heat. In desktops especially, the power supply may be large enough that it has its own cooling fan!

The internals of a desktop power supply.
Notice the cooling fan mounted on the back with 4 silver screws, and the two silver heatsinks mounted vertically
Source: Wikipedia

So for large devices, it makes sense to hide the power supply within the device for a sleeker look.

In fact, for high-power devices, a power adapter is a poor option. Since power adapters don’t have their own cooling fans, they have limited cooling ability, and are liable to overheat easily if they have to provide >100 W to a device (remember that this means they release at least 11 W of heat, while passive cooling can typically dissipate up to 8 W).

Device categories

Putting together the information from Issue 129) and this issue, we can deduce that devices seem to sort themselves into form-factor categories depending on how much power they draw, and how much heat they put out:

Devices drawing >100W at peak, and putting out >80 W of heat:

Generally large, directly powered from the wall (by AC), with power supply within the device.

In rare cases they do use external AC adapters (such as high-power gaming laptops).

Devices drawing 12–65W at peak, putting out 25–45W of heat:

These devices cannot be passively cooled, and thus require active cooling (i.e. a cooling fan and heatsink).

To avoid adding heat to the device from the AC-DC conversion process, they usually use external AC adapters.

Even in devices as small as the Nintendo Switch, you can usually spot the cooling vents where the cooling fan blows warm air from the device into the surroundings.

These may be safely powered by USB Type-C.

Devices drawing <12W at peak, putting out <10W of heat:

These devices can be passively cooled.

Large devices, such as tablets, have a larger surface area to dissipate heat and can afford to draw as much as 8–10W, while smaller devices such as smartphones typically have to remain under 5W.

These are usually powered by USB (at a voltage of 5V), though some may draw power at 9V.

The M1 Macbook Air is passively cooled and thus in this category because its M1 processor is configured to limit its maximum heat output to approx. 10W or less; the M1 Macbook Pro has a cooling fan and a higher max heat output configuration, which allows it to perform at greater capacity.

If you spot a device in the wild that claims to have performance much greater than its form factor—its shape, size, footprint—suggests, you would be wise to suspect over-optimism or a scam! At least until it is clear how they plan to provide that power and get rid of that heat …

Issue summary: AC power from the wall uses electric current that alternates directions, while DC power from batteries uses electric current that flows in one direction only. All electronics are DC-only, and require an AC-DC adapter to be powered from the wall. The AC-DC conversion produces a significant amount of heat; AC-DC adapters are usually external unless the device has sufficient space or cooling capacity for it.

I’m noticing a pattern: issues where I explain concepts tend to be shorter than issues where I explain limits due to engineering and the physics of reality. I hope I can shorten the latter without sacrificing practical knowledge. Let me know how you’re finding these issues :)

What I’ll be covering next

Next issue: [LMG S11] Issue 131: What do early CPUs and startup founders have in common?

This season was focused on firmware and computer components; it is part 1 of a set of concepts I need, so as to explain why the Apple M1 processor is a game-changer for personal computers. I explained what a graphics card is and what it does, I explained why some laptops are upgradeable and why some are not, I explained why some devices need cooling fans and others don’t, and I summarised the relationship between device form factors and their power limits.

Part 2 will extend this exploration inside the computer. I noticed that layfolks’ mental concept of a computer typically includes the idea that there is a CPU, memory, a hard disk/solid state drive, and maybe a graphics card inside a computer. That’s plenty good enough for everyday life; it’s like understanding that all the employees of a company are in a particular building. But it is insufficient for understanding why the M1 is so much faster; you’re going to need to know where the employees are situated, and what their workflow is like!

It’s a tempting but misleading story to imagine that Apple simply has much better engineers; I would say that their engineers were instead under the influence of incentives that allowed them to imagine a more coherent architecture. Let’s get into it starting next issue, again beginning from first principles: how exactly does the CPU, memory, and storage disk work together?

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Power = Voltage × Current, so to provide the same power with a lower current, you have to provide it at a higher voltage ↩
AC-DC converters typically have a range of input power they can convert (e.g. 0–65 W for laptop adapters, 0–500 W for desktops). The efficiency is highest at about 50% of that load, and efficiency drops as the load increases or decreases from that point. ↩

Issue 129: Cooling

2021-07-17T08:00:00+08:00

Previously: Upgradable parts need a slot or socket to be inserted into; these slots/sockets need to be made robust enough, causing them to take up more space than a soldered part. Devices which were designed to be small and portable generally eliminate these as far as possible, opting to have parts directly soldered to the board instead.

Why do computers need power?

Other home appliances I can understand. They need to heat up air/water, move air/water around, or extract heat from air/water to move it elsewhere. These things all need energy. But a computer … all it does is move electrons around! All the information in a computer that changes is just electrons moving; that should not need so much power, should it?

As it turns out, energy-information equivalence theories posit that manipulating information does increase entropy which does involve energy—but this is the Layman’s Guide to Computing, not a physics newsletter. Let’s just say that managing information, in its abstract sense, needs very little energy.

Energy usage in computers

What happens to all the energy that a computer uses, then? Some miniscule amount of it goes to manipulating information. A tiny amount goes to lighting up LEDs (these devices somehow always have LEDs), and maybe running the cooling fans. The rest of it is wasted as heat.

The history of computing is also a history of less and less wasted heat. The Cray 2 supercomputer, in 1985, needed 195 kW to produce 1.9 gigaflops (1.9 billion floating-point operations per second; more context in Issue 123)) of computational performance. The iPhone XS, in 2016, needed less than 1 W to produce 1 gigaflop.

No, I’m not going into the environmental concerns and carbon footprint of computing. We have more tangible and immediate concerns here.

Because heat is bad for computers. When microprocessor chips heat up, the semiconductor material they are made of no longer behaves as it should; it gets “leaky”, allowing electrons to go where they shouldn’t. The data it handles starts to get corrupted, and it eventually crashes.

Thermal throttling

In the past, microprocessors were simpler, and had no thermal controls whatsoever; if you allowed them to run as-is, without any cooling assistance, they would just run until they began smoking and sometimes even catch fire (enjoy this 2005 video of a CPU doing just that).

Today, CPUs are somewhat more sophisticated. Once they start heating up to their thermal limit, onboard thermal control circuitry will attempt to throttle the CPU’s performance to keep it at a safe temperature. So you will have a laggy and unresponsive CPU, but at least it’s not on fire!

Still, most CPUs are going to need some sort of heatsink that helps to draw heat away from it, and dissipate the heat into the surroundings.

Getting rid of heat

There are two things you need to know about heat loss to the surroundings:

The larger the surface area, the faster an object loses heat.
The larger the temperature difference between object and surroundings, the faster the object loses heat.

CPUs are kind of at a disadvantage here.

They are really small; the part that is in contact with the heatsink is usually about 400 sq mm (approx. 2 cm by 2 cm, or 0.64 sq in)¹
The thermal limit for most CPUs is only about 100 °C (212 F), compared to most metals which have theirs in the hundreds or even over a thousand °C

Heatsinks

In practice, this means that a bare chip can only run about 4 W before it starts to run into its thermal limits (we call this overheating). Anything more powerful than a basic home router or smart device (such as the Amazon Echo or Google Home Assistant) will need some sort of heatsink to avoid overheating.

The simplest way to cool a chip is to slap a piece of metal on it to increase the surface area (factor 2). This is known as passive cooling. A paste called thermal paste is applied between the heatsink and chip to improve the transfer of heat. To pack as much surface area as possible into a tiny space, this piece of metal usually has long, thin fins, giving the characteristic look of heatsinks:

A passive heatsink on the northbridge chip of a computer mainboard
Source: found on Superuser

The effectiveness of passive heatsinks depends on the ambient airflow around it. Some creative setups that manage to get the metal case itself in contact with the CPU can readily cool up to 45 W, with zero fan noise!

For mobile phones, tablets, and laptops, such heatsinks would add too much to the device thickness. Instead, the CPUs are usually in direct contact with a larger metal surface, sometimes even the metal back of the device; this is why they feel warm to you in the first place. This allows tablets to use up to approx. 8 W of power.

The logic board cover in an iPad Pro 11 has copper inside; it helps to spread heat to the rest of the device instead of concentrating it all in one spot
Source: iFixit

Where there isn’t this luxury of space, another option is to use a slim heatsink, and increase its cooling ability by forcing air through it. This form of cooling is called active cooling, and usually done with a fan of some sort, a popular option for thicker laptops. This allows laptops to run between 25–45 W, and desktop computers to run 65 W and hotter (with larger heatsinks, of course)

An active heatsink on the CPU of a computer mainboard
Source: found on Superuser

These numbers are for stereotypical mobile devices, laptops, and desktops; stranger or hybrid designs may have different cooling capacities (e.g. a tiny cube desktop might only have 25–35 W of cooling capacity).

Issue summary: The larger the surface area, the faster an object loses heat. The larger the temperature difference between object and surroundings, the faster the object loses heat. Heat is bad for computers, and CPUs will need cooling to be able to process computations quickly. A mobile phone thus typically uses no more than 4 W of power, a laptop can use 25–45 W, and a desktop can usually use 65 W and more. Two popular ways of increasing the cooling capacity of a device is to attach a larger piece of metal to the chip (passive cooling), or use a fan to force air over the heatsink (active cooling).

If you walk away with only one thing from this issue, it’s that heat dissipation constrains the max performance for any device, and heat dissipation is usually constrained by the surface area available for cooling.

What I’ll be covering next

Next issue: [LMG S10] Issue 130: Power limits

Computers turn electrical power to heat, drawing a miniscule amount off for computation. But where does that power come from? In the last issue of Season 10, I’ll touch on a highly underrated hardware component: the power supply.

This and more, next issue!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Multiple CPUs are produced from a single semiconductor wafer. Keeping CPUs small maximises the yield from manufacturing, and reduces the chance of a manufacturing error on any single CPU. This is going to take a whole ’nother season to explain … ↩

Issue 128: Upgradeability

2021-07-10T08:00:00+08:00

Previously: USB Power Delivery is a specification that describes how much voltage and current can be supplied by different categories of USB cables. It allows power delivery at different levels for all kids of connected devices, up to 100W. This should help to simplify cable setups that otherwise require multiple kinds of cables between two closely interconnected devices (such as a laptop and an external monitor).

At some point in the past, computers could be upgraded with all kinds of parts: you could upgrade to a better network card, a better processor, or add more memory, without changing out the entire computer!

The short history of personal computers

After IBM released the IBM 360, the first mainframe that was meant to be used in both large-scale and small-scale applications, it quickly realised that providing service support for it was going to be a nightmare if each version required its own specialised support. In the 1970s, it was already thinking of a family-of-parts concept that allowed its products to use a set of interchangeable parts to reduce the number of these parts.

But Intel, founded in 1971, had made a major breakthrough in manufacturing low cost microprocessors. The price of computers dropped steadily in response, allowing the personal computer (PC) to become affordable to home users, and the family-of-parts concept came along for the ride.

The early computers, when purchased, needed a network card to connect to a network, needed a sound card to produce sound, needed a graphics card to view graphics … you basically needed a card for everything!

Upgradeable parts

On a desktop, you could upgrade a number of components.

Desktop CPUs were purchased separately, and had to be installed into a socket on a mainboard (also called a motherboard).

A CPU socket on a mainboard
Source: Digital Trends

Desktop computer memory was likewise purchased separately, and slotted in:

Memory slots on a desktop mainboard
Source: Bleuwire

On top of that, you still had to buy your own DVD/Bluray drives, storage disks, and other peripherals, to be plugged in laboriously to the mainboard. Many of these features made it into laptops as well.

Laptop upgrades

Because of the smaller space available, the CPU socket had to go. Most laptops have their CPUs soldered directly to the mainboard, and the CPU cannot be upgraded separately. Buying a laptop with a better CPU often means buying it with a different configuration, mainboard and all.

But laptops retained the ability to upgrade memory through the use of memory slots. These were oriented flat along the board to reduce space.

Memory slots on a laptop
Source: Laptopmag

And they also retained the ability to swap out or replace their disks, whether hard disk or solid state drive.

The hard disk slot on a laptop
Source: TweakTown

The SSD slots on a laptop
Source: EaseUS

Limitations of upgradable parts

A common limitation here is that replaceable components take up more space than directly-soldered ones, because the sockets and slots cannot be manufactured too thin if they are to be robust. As the industry moves towards thin-and-light designs, and hardware support moves to a system that replaces the whole device rather than individual parts, these replaceable parts are already on their way out.

On the Apple Macbook, Apple has already soldered down the CPU, memory, and SSD, allowing for no upgrades. This is also the case for many other thin-and-light laptop manufacturers.

Issue summary: Upgradable parts need a slot or socket to be inserted into; these slots/sockets need to be made robust enough, causing them to take up more space than a soldered part. Devices which were designed to be small and portable generally eliminate these as far as possible, opting to have parts directly soldered to the board instead.

There are of course other reasons for this transition, economic as well as financial, which I will not tackle here. I just wanted the chance to show some upgrade slots so if you ever get to see an old laptop opened up, you know what they are for!

What I’ll be covering next

Next issue: [LMG S10] Issue 129: Cooling

Next issue, another dive into another issue: why is my laptop running so hot? Why is it so loud? Why can’t I get a more powerful processor in this tiny PC? Why is my i7 slower than my friend’s i5?

This and more, next issue!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 127: USB Type-C Power Delivery

2021-07-03T08:00:00+08:00

Previously: USB is a (licensed) technical standard that describes how devices connect to each other through a cable. USB Type-C is a new connector standard that supports USB 3, DisplayPort, HDMI, and Thunderbolt. It is able to carry multiple types of data simultaneously, in limited combinations. In a USB connection, one device acts as the host while the other acts as the device; the host initiates all communication.

Last week, I differentiated the USB Type-C cable specification from the USB3 data specification; the former describes how the cable and connector should be, while the latter describes how to transmit USB data over a Type-C cable. Remember that in addition to USB3 data, the Type-C cable can also transmit HDMI or DisplayPort video data!

What else can a Type-C cable do? Oh, right—provide power to devices. In other words, charge them.

What’s wrong with USB2 charging?

Absolutely nothing! At this point, it was not expected that USB would be needed for powering anything more than some basic peripherals, like keyboards, mice, and anything less power-consuming than a small external hard disk. 5V (volts) of voltage with 0.5A (amperes) could provide 2.5W of power, and that was considered plenty enough.

The USB Implementers’ Forum (USB-IF) didn’t want to set too high a standard for USB devices and peripherals (including USB-certified cables); original equipment manufacturers (OEMs), the ones who actually had to manufacture the goods, would complain about the high cost. Nobody likes being undercut by cheap knockoffs that don’t bother applying for the license and following the specs (do you notice the official USB logo when buying?). So for a long time, we had up to 2.5W of power from USB ports.

And then tablets came along, drawing 8–12W of power to do whatever they have to. Aside from the iPad, which used its own connector, these were charged over USB. So their manufacturers had to come up with kludges to work around USB limitations. They had Quick Charge, Dash Charge, and all kinds of other standards which were not approved by the USB-IF, just to allow their cables to provide up to 12W of power (3A of current, more in some cases) to their tablets while charging.

USB 3.0 bumped the limit up to 0.9A (providing 4.5W), which was nice but far from enough. The hard limit was the cable itself though: drawing anything more than 5A over the usual thin USB cables would cause them to heat up to unsafe levels. Clearly, something more was needed.

USB Power Delivery

From 2012, the USB-IF finally added a Power Delivery (PD) specification, which allowed power to be delivered over USB cables at different voltages.

In addition to 5V, which is used by phones, tablets, and their power banks, the PD spec also allows charging at 9V (fast charging for some devices), 15V (for higher-power devices like the Nintendo Switch), and 20V (what most laptops use for PD charging). With a current of up to 5A, this technically allows up to 100W of power to be delivered—sufficient for pretty much all laptops.

Which voltage is used?

The actual voltage to be delivered by the charging host is negotiated with the host. When a charging device is connected, it communicates the voltages it can support, the host compares it with the voltages it can supply, and power at a supported voltage is delivered.

Can any cable be used?

Greater current requires a thicker cable to be used, as thinner cables have more resistance and will heat up to unsafe levels. A cable following the USB-PD specification will negotiate the correct voltage and current in any case, so if your cable is not charging at a level you know is supported by both your devices, do check the cable rating. You may have to buy a higher-rated cable.

Are there any advantages to buying a more expensive cable?

Users of external monitor screens and docking bays often have to connect multiple cables from those devices to their laptops for power, (USB) data, and video output. With USB Type-C and USB-PD specifications unifying these three requirements into one cable, we will (eventually) be able to connect a laptop to a Type-C monitor using a Type-C cable, and this cable will supply power plus allow the laptop to use all devices connected to the monitor.

It’s supposed to simplify the physical cable mess, at the cost of having to manage a more complicated specification. Let’s see how that plays out in the next decade.

Issue summary: USB Power Delivery is a specification that describes how much voltage and current can be supplied by different categories of USB cables. It allows power delivery at different levels for all kinds of connected devices, up to 100W. This should help to simplify cable setups that otherwise require multiple kinds of cables between two closely interconnected devices (such as a laptop and an external monitor).

What I’ll be covering next

Next issue: [LMG S10] Issue 128: Upgradeability

Millennials and other older computer users may remember the glory days of the desktop, when almost any component could be removed and swapped out for another. Laptops used to enjoy this upgradability to a lesser extent; the laptop memory and hard disk came as separate slotted cards that could be replaced with upgraded versions for increased performance.

What happened to that trend today? The reason mainly lies in the realm of economics, but I figured I’d use the chance to dig a little deeper and explain what is going on with the hardware that no longer allows this to happen.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 126: USB Type-C

2021-06-26T08:00:00+08:00

Previously: Analog formats such as VGA mostly contain the control signals that the CRT needs to operate, while digital formats such as HDMI and DisplayPort contain image data that the device must convert to control signals. Analog signals need a digital-analog-conversion (DAC) chip to be converted to digital signals, hence VGA-HDMI adapters tend to be more costly than DisplayPort-HDMI adapters. Dedicated graphics cards generally support more simultaneous output video streams than integrated graphics cards.

This week, I attempt to untangle the confusion around USB Type-C, informally also referred to as USB-C.

What is USB Type-C?

It is a connector standard. It sets standards for this connector:

USB-C plug
Image via Wikipedia

What does a connector standard do? It determines how many pins the connector should have, and what each of the pins should be used for, how the connector should be shaped, how the docking port (where the cable gets plugged into) should be designed, and other similar details. It’s all about the docking.

Pinout diagram of a USB-C plug
Image via Wikipedia

But won’t somebody think about the data?!

Ah, now we’re going back in history …

Universal Serial Bus (USB) is a (set of) industry standards that sets requirements and protocols for—well, how data is transferred from one device to another. It is maintained by the USB Implementers Forum (USB-IF). The first version of the standard was released in 1996, second version (USB 2.0) in 2000, and third version (USB 3.0) in 2008.

While USB 2.0 (or Hi-Speed USB) supported a transfer rate of up to 60 MB/s, USB 3.0 supports up to 625 MB/s, allowing for much faster transfers from external disks and other devices.

We don’t have to worry so much about these versions, because USB is designed to be backward-compatible. That means all devices that support USB2 also support USB1, and all devices supporting USB3 also support USB2. The primary advantage that each successive USB version has over previous versions is higher throughput, more supported features, and more connectors to confuse (okay, that last isn’t an advantage 😛).

Comparison of USB connector plugs, excluding USB Type-C plugs
Image via Wikipedia

As you can see, this makes for a lot of confusion, especially when compatibility is mixed: USB2 ‘A’ connectors are meant to go into USB3 ‘A’ receptacles, but USB2 ‘B’ connectors aren’t meant to go into USB3 ‘B’ receptacles …!

So USB Type-C is meant to be the one connector standard to rule them all. It even has rotational symmetry, so it shouldn’t matter which way you plug it in!

So what is a USB device?

Technically speaking, it is a device whose manufacturer has paid for a USB license, sent their device for certification, and passed the USB-IF’s certification requirements, allowing the manufacturer to put the USB logo on the packaging.

Practically speaking, it is any device that has a USB port, allows USB devices to be connected to it, or allows itself to be connected to other USB devices, and basically behaves like a USB device. (If it walks like a duck and talks like a duck …)

Furthermore, it is important to differentiate between a USB host and a USB device. The host acts as the controller of the device, and initiates all communication between the two. For instance, if you attempt to connect two Android phones to each other with a USB cable, one must act as the host and the other as a device, even if both are capable of acting as hosts. The host decides what can be done through the connection.

This helps to explain why a USB-charging battery pack cannot also be an external storage device at the same time, i.e. you cannot combine a phone battery pack with an external hard disk and hope to charge your phone + access the external hard disk at the same time. When your phone charges from the battery pack, it acts as the USB device (in charging mode); when it accesses a hard disk, it is acting as the USB host. It cannot do both simultaneously!

Data supported over USB Type-C cables

In addition to USB3.0 data (and later versions of USB 3), the USB Type-C specification also allows the Type-C connector to carry other kinds of data, if supported by the device:

DisplayPort video data for monitors and computer display devices (Issue 124))
HDMI video data for monitors and consumer electronics devices (also covered in Issue 124)
Thunderbolt 3 data, for high-data-transfer devices

The Type-C specification even supports modes that allow a Type-C cable to carry multiple types of data simultaneously. For instance, a Type-C connector that connects a laptop to a monitor can:

carry DisplayPort video data, enabling the laptop to use the monitor as an external display,
carry USB 3 data, enabling the laptop to use USB ports and other features (such as audio inputs/outputs) on the monitor

… All this comes with the caveat that one must read the manufacturers’ fine print to see if these features are supported on the respective devices. Just because the USB spec allows it, doesn’t mean that a particular device implements it!

For example, some laptops may have two Type-C ports, but only one of those ports will support Thunderbolt and DisplayPort; the other port sometimes only supports USB3 data.

Issue summary: USB is a (licensed) technical standard that describes how devices connect to each other through a cable. USB Type-C is a new connector standard that supports USB 3, DisplayPort, HDMI, and Thunderbolt. It is able to carry multiple types of data simultaneously, in limited combinations. In a USB connection, one device acts as the host while the other acts as the device; the host initiates all communication.

What I’ll be covering next

Next issue: [LMG S10] Issue 127: USB Type-C Power Delivery

This issue was about how data is handled over USB Type-C. Next issue, how power is handled over USB Type-C. After all, every day, millions of devices are getting powered over Type-C: from smartphones, to Internet-of-Things (IoT) devices, to full-size laptops. How is a single type of cable able to cover such a wide range, when earlier cable types could not?

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 125: Analog and digital conversion

2021-06-19T08:00:00+08:00

Previously: The VGA video format originated in the time of cathode-ray televisions (CRTs). It was superseded by HDMI, a video format standardised by consumer electronics companies. DisplayPort, on the other hand, is a video format standardised by computer display companies.

The bulk of the story has been written in Issue 123), so this issue will be short.

Why two digital formats? HDMI vs DisplayPort

HDMI is a consumer electronics standard, and is thus heavily focused on broadcast and home video needs. HDMI primarily supports video and audio data. It also carries some control signals through CEC (for Consumer Electronics Control) capability, enabling a video game console or set-top box to send remote-control commands to a television set via the HDMI connection.

DisplayPort is a computer display standard, focused on computing needs. DisplayPort supports video data, optionally audio data, and additional data (such as USB). Since 2014, compatible devices can also transmit DisplayPort signal format over USB-C, provided both the transmitting and receiving devices support it.

Analog vs digital formats: a recap

Digital formats differ from analog formats, because they do not carry the raw signal for the device. Instead, they carry information about the image, encoding the image data (Issue 43)) into video form; after all, video is just a series of moving images! The device takes on the responsibility of figuring out how to make the images appear on-screen, which is why digital TVs require significantly more electronics than CRT TVs.

Analog to digital conversion

An analog signal does not easily convert to a digital signal! Analog-to-digital converters, such as the VGA-HDMI adapters that seem to be needed universally, have to figure out how to process a wave-like (analog) signal, and convert it into the pixel data that constitute an image. These adapters need a digital-analog conversion (DAC) chip to carry out that conversion.

In contrast with analog signals, digital signals usually carry uncompressed video data. Digital-to-digital converters thus do not need to carry out any conversion—it’s the same image! Most of these converters merely need pins to be mapped to each other, which makes them cheaper (e.g. DisplayPort-HDMI converters).

A chip that is able to handle multiple formats and produce a robust output is costly, and it is expected that a good adapter will cost quite a bit. That is no guarantee, however, that a costly adapter will always be a good adapter.

Annddddd … back to graphics cards

The graphics card is in charge of converting the final rasterised signal (Issue 122)) to a video signal, depending on the video format that is required. Naturally, this requires additional chips. Most integrated graphics chips support VGA and HDMI, while DisplayPort support is usually reserved for higher-end devices.

Higher-end graphics cards offer support for more video formats. Furthermore, they also have the capability to rasterise and output video streams for multiple screens, enabling multi-screen support for those who need it. If you find that you need more than two screens for work or play, you are likely going to need a dedicated video card that supports three or more simultaneous video output ports.

Issue summary: Analog formats such as VGA mostly contain the control signals that the CRT needs to operate, while digital formats such as HDMI and DisplayPort contain image data that the device must convert to control signals. Analog signals need a digital-analog-conversion (DAC) chip to be converted to digital signals, hence VGA-HDMI adapters tend to be more costly than DisplayPort-HDMI adapters. Dedicated graphics cards generally support more simultaneous output video streams than integrated graphics cards.

I hope this sufficiently explains a question I hear so often: why do VGA-HDMI adapters cost so much? I’m also glad this issue ended up much shorter than I expected.

In general, if your adapter/cable needs a chip to carry out signal conversion, it’s going to cost more than a plain cable.

What I’ll be covering next

Next issue: [LMG S10] Issue 126: USB Type-C

Next issue, let’s zap some common questions about the latest USB standard! This is going to stretch over two issues. First up: USB-C for data.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 124: Video formats

2021-06-12T08:00:00+08:00

Previously: Graphics cards contain lots of tiny cores that are much better at performing the same calculation for lots of decimal numbers. These cores are organised into compute units; a graphics card with more compute units can perform more calculations every second. Graphics cards have their own onboard memory, separate from the CPU. GPU memory is different from computer memory; it is configured for much higher data throughput. Integrated graphics are GPUs that are integrated into a CPU chip; these do not have their own onboard memory, and share memory with the CPU.

Ah, the esoteric, tricky, complicated art of shooting electromagnetic radiation into the eyes of humans … entire tomes have been written about this. And I will attempt to summarise the pertinent parts into a single newsletter issue. The hubris!

It’s really something when you suddenly remember that television has been around since the 1930s, while computers in some recognisable form were a 1970s invention. The first part of the computer to be invented was the screen!¹

How did screens work if computers weren’t invented? A crash course:

Cathode-ray tube (CRT) televisions

Early colour television screens had primary-colour (red, blue, and green) phosphor dots embedded in the user-facing portion of the screen. These dots emit coloured light when struck by electrons. At the back of the television, cathodes made of barium oxide are heated, causing them to emit electrons.

These electrons, when emitted, fly in all directions, but they are shaped into a beam by an electric field (hence the name “cathode ray tube”, which you might have seen in the form of the acronym CRT).

1: Barium oxide cathode, which is heated to emit electrons (labelled “2”), which are shaped into a a beam by an electric field (labelled “3”)
4: Deflecting coils, discussed in the next paragraph
8: Coloured phosphor dots, arranged on a flat screen (labelled “7”).
Image via Wikipedia

This electron beam could be aimed at any of the phosphor dots by a set of electromagnetic deflecting coils mounted along the sides of the TV, on the inside surface. One set, oriented vertically (mounted left-right), controlled the horizontal deflection of the electron beam, while another set, oriented horizontally (mounted top-bottom), controlled the vertical deflection.

The electron beam produced by the cathode is deflected by (electromagnetic) deflecting coils.
Image via Wikipedia

To produce an image, the electron beam is manipulated to scan across the screen, one line at a time. Each pass across the screen causes it to strike phosphor dots, emitting light in a line. A variety of techniques (microdeflections, masks, and filters) are used to ensure the correct dots are struck.

Each line of the screen is laboriously scanned with this technique, about 60 times a second. This means that the screen “updates” with a refresh rate of 60 Hz.

To make this happen, a varying voltage is applied across the two sets of deflecting coils. The required pattern for the deflecting coils has to come from the television signal source; the television signal from the broadcasting station, therefore, closely resembles the pattern required by the deflecting coils. The television itself applies almost no processing to this signal! (Remember that the chips used to do this kind of processing had not been invented yet.)

These are what we call analog signals. Phonographs and early telephones also used analog signals.

Video Graphics Array (VGA)

So when the computer was first invented, and these screens were widely available, there was no need to reinvent the screen. Graphics cards (Issue 123)) simply had to figure out how to emit analog signals that would work with CRT screens.

The graphics standard for doing so is called VGA (Video Graphics Array), and was first released by IBM in 1986. An organisation, the Video Electronics Standards Association (VESA), was quickly formed in 1989, spearheaded by Nippon Electric Company (NEC), to extend this standard and allow it to support higher resolutions (up to 1080p!).

LCDs replaced CRTs

As CRTs grow larger, they ran into a few problems. CRTs were big, bulky, and heavy. The larger you made them, the longer you had to make the cathode ray tube, which made them immensely heavy!

By this point, LCD technology had been developed. Instead of using a scanning electron beam, it consisted of a backlight² behind a liquid crystal layer (hence the term liquid crystal display, LCD).

The liquid crystal layer consisted of pixels of each primary colour. Each pixel had an adjustable transparency, which depended on the voltage applied across it (high voltage = transparent, low voltage = opaque); a cluster of red, green, and blue pixels formed a single image pixel on screen. By applying different voltages across each primary-colour pixel, we can put an image together.

Digital signals

CRTs controlled the voltage plates directly to deflect an electron beam, through an analog signal. But LCDs use an internal processor to determine what voltage to apply across each liquid crystal pixel. As the technology improved, lower voltages could be used to reduce power usage. So LCDs need a different kind of signal: a digital one, consisting of the raw image data (Issue 43)).

The exodus to digital formats

As digital television became more feasible due to decreasing microprocessor and LCD screen costs, digital formats sprung up to replace VGA.

An early competitor, Digital Visual Interface (DVI), was launched by a working group convened by some computer makers (Intel, Silicon Image, Compaq, Fujitsu, HP, IBM, NEC). It was very quickly superceded by High-Definition Multimedia Interface (HDMI), an interface which implemented standards set by consumer electronics companies (Hitachi, Sanyo, Silicon Image, Sony, Technicolor, Toshiba).

HDMI and mini-HDMI connectors displayed from top to bottom. Both support the HDMI video standard
Image via DataPro

This was followed by the DisplayPort (DP) standard, developed by PC and chip manufacturers and standardised by VESA, and released in in 2006.

DisplayPort, mini-DisplayPort, and Thunderbolt Type-C connectors displayed from left to right. All three support the DisplayPort video standard
Image via DataPro

Issue summary: The VGA video format originated in the time of cathode-ray televisions (CRTs). It was superseded by HDMI, a video format standardised by consumer electronics companies. DisplayPort, on the other hand, is a video format standardised by computer display companies.

Phew. This issue is much longer than I would like; there is so much history to these things! The HDMI-vs-DisplayPort question/complaint I keep hearing is one that only made sense for me in the context of the respective industries they sprung from, and this is something I think most layfolks could definitely understand.

What I’ll be covering next

Next issue: [LMG S10] Issue 125: Analog and digital conversion

Now that I’ve laid out the key differences between VGA, HDMI, and DisplayPort, we can talk about … video adapters.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

To be pedantic, the first part of the computer to be invented was actually the keyboard, but they were non-electronic and were called typewriters then. ↩
Older LCDs used cold-cathode fluorescent lamps (CCFLs) for backlights, but today LEDs are used instead. ↩

Issue 123: Graphics cards: The Pixel Factory

2021-06-05T08:00:00+08:00

Previously: Computers are general-purpose machines that usually process integer calculations. The graphics pipeline requires more specialised hardware that can process decimal number calculations. This is why high-performance graphics usually requires a graphics card.

So why are gamers so agog over graphics cards (also known as video cards)? That’s because they do one thing really well! Unlike CPUs which often have to process an unpredictable workload, the graphics pipeline involves performing the same categories of calculations over and over again.

Graphics compute units

These calculations, which I gave an overview of in Issue 122), take in tables of numbers, crunch them mathematically, and spit out another table of numbers. Since the calculations are predictable, we don’t need very complicated hardware that enables switching instructions based on the input. We can used specialised cores—clusters of transistors that are custom-fit for the purpose, cram lots of them into a circuit board, and end up with much better performance for the graphics pipeline compared to the CPU.

These cores are organised into groups known as compute units, and graphics card companies often differentiate the lower-end and higher-end cards by the number of compute units they have, which indicate the computation rate (measured in flops, which stand for floating-point¹ operations per second). Better graphics require more computation, so more flops correlate with better graphics.

Graphics memory

Graphics cards often come with their own memory chips, based on a different technology (GDDR, standing for “graphics double data rate”, vs DDR for CPUs). Graphics memory chips (GDDR) are optimised for higher bandwidth (more gigabytes transferred per second), while CPU memory chips (DDR) are optimised for lower latency (lower time to response). These are soldered directly onto the graphics card to keep memory readily accessible by the compute units.

Power and heat

Unlike CPUs, which (for desktops) seldom draw more than 100W by themselves, graphics cards (for desktops) can draw up to 300W. Correspondingly, more of the space on graphics cards are taken up by components that try to keep this immense power under control. Voltage regulation modules help to adjust input voltages to what the compute units and memory requires.

A graphics card, here exposed without its shroud and cooler.
The graphics chip is in the middle, surrounded by graphics memory chips (smaller, black squares) close to the edges of the board. The larger protruding gray rectangles are voltage regulation modules.
Image from Gamersnexus

Other uses

The graphics card excels at carrying out predictable decimal calculations in a pipeline. Besides graphics, what else can it be used for?

In research, they have been purposed to perform calculations for simulations, which often involve processing the same calculation in bulk.

For consumer purposes, they have also been used for video decoding (decompressing videos for playback), and lately even video encoding (compressing videos to a smaller size). These also involve performing the same types of calculations in a predictable pipeline.

Most recently, they are being used for machine learning (also known as “artificial intelligence”) models, again because those involve predictable pipelines (Do you are see a pattern here?)

Integrated graphics

Not all computers need a full-size GPU to render graphics on screen. Intel processors, and some of the newer AMD ones, contain what is known as integrated graphics. That means that these CPUs have a graphics processor unit (GPU) integrated into the same chip. This integrated GPU provides basic capabilities which allow typical users to use a computer, and even run low-end graphics programs, without needing to buy a higher-end graphics card.

Integrated GPUs do not have their own memory. They share computer memory with the CPU. That means that they reserve a (configurable) amount of computer memory for graphics use (typically up to 128 MB), and video card drivers (Issue 120)) enable the GPU to use more system memory (up to 50% for Intel integrated graphics). This means that integrated graphics use slower memory compared to dedicated graphics cards that have their own memory.

Issue summary: Graphics cards contain lots of tiny cores that are much better at performing the same calculation for lots of decimal numbers. These cores are organised into compute units; a graphics card with more compute units can perform more calculations every second. Graphics cards have their own onboard memory, separate from the CPU. GPU memory is different from computer memory; it is configured for much higher data throughput. Integrated graphics are GPUs that are integrated into a CPU chip; these do not have their own onboard memory, and share memory with the CPU.

What I’ll be covering next

Next issue: [LMG S10] Issue 124: Video formats

I haven’t talked about the last part, because this issue is long enough already, and because the next part can fill a whole issue by itself. Next up, the last stage: actually displaying pixels on a screen.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
~~a video card? [Issue 113]~~

“Floating-point” is a fancy term for decimal, so-called because the way they are represented allows the decimal point to be placed differently, unlike with integers. ↩

Issue 122: The great flattening

2021-05-29T08:00:00+08:00

Previously: 3D models are represented with vertices (points), edges (line segments between points), and faces in a computer. Images known as textures can be mapped to faces to give the impression of detail.

Having a model represented in a computer as a large set of numbers is cool, but nobody does 3D modelling like that. We need something to look at! We need a way to convert our model into a flat picture, ideally displayed on our monitors. And this conversion process needs to be fast enough that as we rotate or change the view of our model, the computer can keep up, displaying the changes in real-time.

This process is called rendering.

A pipeline for rendering pictures

During rendering, data from the modelling program is processed in a series of steps; these steps together are known as the graphics pipeline. If you’ve done perspective drawing or tried to figure out how to create clever camera tricks, you already have a sense of what the computer needs to do here.

The 3D cube on screen looks different depending on where our eye is, and which direction it is looking. Our distance to the cube affects how much distortion the view undergoes. Lighting also affects how the cube appears, by making shaded areas appear darker, and lit areas appear brighter.

Finally, this 3D model needs to be “distorted” into a 2D view so it can be displayed on a screen. (We don’t have 3D holo-projectors yet … 😢)

Lastly, since our screens display images as a grid of pixels, we need to figure out the best way to convert the distorted 2D view into a pixel grid. This part of the pipeline is known as rasterisation. Here, the computer figures out what colour each pixel should be, based on which part of the model actually gets projected here. Hidden parts do not need to be rasterised, and neither do parts of the model which are outside the screen.

All these steps take place in the graphics pipeline.

The hardware

The pipeline used to be carried out by the CPU (Issue 53)), but that isn’t ideal. The CPU’s hardware is optimised for general-purpose processing: keeping track of integers (i.e. natural numbers like 1, 2, 3, …), adding or subtracting them, and resetting them. It has many more computational units that carry out this calculation.

Graphics processing, on the other hand, requires a different kind of calculation. The position of vertices do not fit nicely into integers; we have to carry this out using decimal numbers (1.46776, 2.58704, –3.57514, …). The CPU does not encounter these often, and therefore does not have many of these computational units.

For graphics, we need a different kind of processor, one that is jam-packed with decimal computational units. We need a graphics card.

Issue summary: Computers are general-purpose machines that usually process integer calculations. The graphics pipeline requires more specialised hardware that can process decimal number calculations. This is why high-performance graphics usually requires a graphics card.

What I’ll be covering next

Next issue: [LMG S10] Issue 123: Graphics cards: The Pixel Factory

So what does a graphics card do, just pop out pixels like nobody’s business?

Yep! Next week, a quick intro to graphics cards, and why they are so amazing. And, because this is recent news, some coverage on how the M1 differs from most laptops in the way it manages the CPU and GPU (graphics processing unit).

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
a video card? [Issue 113]

Issue 121: In graphic detail

2021-05-22T08:00:00+08:00

Previously: Driver files provide information about the driver, and instructions on how to receive information from the device, and encode information to be passed to the device. The operating system may come with generic driver files for the device, but custom driver files might provide better performance or additional features.

This issue, let’s start from scratch with graphics: how does a machine that only processes 1s and 0s work with graphics? For starters, let’s think: what can we represent about graphics if we only have numbers?

Representing graphics using numbers

Numbers can be used to represent points (a vertex, in graphics-speak). In 3D space, a point can be represented with 3 numbers, usually written in math as (x, y, z).

A line (edge, in graphics-speak) can be represented as a segment between two points. So we can represent straight-line segments using two points. Curved-line segments are trickier, but for now a lazy way to represent them is just … using a series of points 😛 like connect-the-dots puzzles!

Surfaces are … a bit trickier. What is the minimum number of points we need to represent a flat surface? Turns out the answer is 3 points: if we pick any 3 points that are not along the same line, we can join them along their edges and they form a triangle, which is a flat surface!

In graphics-speak, we call these surfaces faces. Four points which form a flat four-pointed face are called quads, and we can do the same for polygons with more points too. But most graphics hardware just deals with triangles and quads, because everything else can be represented using triangles and quads anyway.

Making 3D models

3D modelling software (also referred to as computer-aided design software, or CAD software) helps us with the process of creating vertices (plural for vertex), edges, and faces. Any object we model digitally is just a collection of vertices, edges, and faces: we call such collections a mesh.

Textures

With just meshes, we quickly run into the limits of what can be represented. For simplicity of calculation, each face can only have one colour. To make really detailed and realistic models, we need very finely detailed meshes. These are problematic because a lot of calculation is needed to make these models appear on screen; the more faces it has, the more calculation is needed!

One way to reduce the number of faces in the mesh while still creating a decent model is to use images on the face. (I explained how images are represented in computers in Issue 43)) We will need additional information to describe the scale and rotation of the image on the face, but at least we can use the same image across multiple faces if necessary. Instead of having to model a hundred thousand blades of grass, I could just model a few stalks, and use a grassy texture to complete the impression.

Scene modelling

Just having models is pretty boring. We will usually be putting multiple models in a scene. Besides models, a scene needs to have lighting, a way to model the sky as the background, and other niceties.

We can use numbers to describe the position, luminosity (i.e. brightness), hue (i.e. colour), and other properties of lighting in our scene.

Issue summary: 3D models are represented with vertices (points), edges (line segments between points), and faces in a computer. Images known as textures can be mapped to faces to give the impression of detail.

Great, we have a way to use numbers to describe a 3D model; that’s nice progress. But how do we turn those numbers into a picture on screen?

What I’ll be covering next

Next issue: [LMG S10] Issue 122: The great flattening

Numbers, numbers, and more numbers … isn’t the CPU great at numbers? Why do gamers and 3D modellers have the hots for graphics cards instead? Stay tuned next week!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
a video card? [Issue 113]

Issue 120: Drivers, the glue between hardware and firmware

2021-05-15T08:00:00+08:00

Previously: Solid-state disks are much faster than hard disks because they have no moving parts, so no time is wasted waiting for parts to get into the right position. However, they are more expensive than hard disk drives.

So this issue, we finally get to a question that most folks would have asked at some point after buying a new device:

“What is a (device) driver and why do I need one?”

This issue is going to focus on USB devices, since that is by far the largest category of devices that people use. But driver files are needed for all hardware, including disks, monitors, hardware timers, controller chips, …

USB Devices

The USB specification was first released in 1996, and today it contains 21 categories of devices, each containing even more subcategories.

Each category of device has its own protocol for sending information to the computer, and receiving information from the computer.

I don’t think I am over-exaggerating things to say that there are an uncountable number of USB devices in the world today. For this reason, it makes no sense to complain to the USB Implementers’ Forum (USB-IF), which is responsible for designing the USB specification. Device manufacturers themselves have to be the one responsible for making their device work with our computers.

They do so by providing driver files with their devices. In a not-so-distant past, every device you bought came with drivers on a CD-ROM or DVD-ROM, which you had to install before use. Today, it is more likely that you’ll download these drivers from the manufacturer’s website. If this is a widely used device (e.g. monitors or input devices like keyboards or mice), it might even make its drivers (and updates to those drivers) available through Windows Update.

What is a driver file?

Simply put, a driver file tells the computer:

information about the device (its name, category, and available features)
how to interpret signals coming from the device (through the USB cable),
how to encode signals to be sent to the device so that it can understand them.

The USB-IF maintains a database of vendors and their products. Companies that wish to have their products recognised should get a vendor ID through the USB-IF; this also allows them to use the USB-IF logo (for USB-certified™) on their product packaging if the product passes certification.

When a device is inserted, it passes information including its vendor ID and product ID to the computer. Each driver file also includes the vendor ID and product ID it is meant for use with. This allows the computer to verify that the correct driver is installed for the device.

If you insert a device and Windows says it does not recognise it, you can be sure the problem has something to do with the device file (if the device is otherwise working properly).

On your computer, you can view a list of your devices and inspect their driver details using Device Manager.

Device Manager in Windows 10

Generic drivers

When you first set up your computer, it is not going to have driver files for devices that are already connected. How are you supposed to use your keyboard, mouse, and monitor, among other things?

The major operating systems come with generic driver files pre-packaged. These generic driver files support a class of devices (e.g. input devices, pointing devices, display devices, …) for basic features only. Manufacturers that wish their devices to work upon plugging in (this capability is also known as Plug-and-Play) should aim to support these basic features through the use of generic drivers.

They can then encourage the customer to install custom drivers which may improve the device performance (e.g. for wifi or LAN networking devices), or provide additional features (e.g. storing mouse button configuration settings, or programmable mouse/keyboard buttons).

Issue summary: Driver files provide information about the driver, and instructions on how to receive information from the device, and encode information to be passed to the device. The operating system may come with generic driver files for the device, but custom driver files might provide better performance or additional features.

Driver files were one of those mysterious things that made perfect sense once I took the time to think about why they are needed. I hope this issue does that for you too.

What I’ll be covering next

Next issue: [LMG S10] Issue 121: In graphic detail

One part I’ve always wanted to tackle with this season is explaining the process of getting graphics onto your monitor screen. The explanations I’ve found online are either too vague, trying to paper over the details with metaphors, or far too technical, going into exhaustive detail about the graphics rendering pipeline.

I want to strike more of a middle ground, to help those who have read the former and are trying to bridge the gap to the latter. The next two issues are going to talk about the processes involved in going from model (the computer’s view of things) to graphics (the final rendered display).

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
~~a driver file and why do I need one? [Issue 98]~~
a video card? [Issue 113]

Issue 119: Solid-state disks, an upgrade from hard disks

2021-05-08T08:00:00+08:00

Previously: A hard disk consists of a read arm, and a set of magnetic platters which store data. To read or write data, the read arm must move to the appropriate track of the rotating platter, and detect the magnetic field (for reading), or attempt to magnetise the domains on the platter (for writing). Operations that require the read arm to access different parts of the magnetic platters intermittently result in slower read speeds.

Hard disks were, for a long time, the best affordable storage technology we had. But then something better came along: the solid state disk¹ (SSD).

What is a solid state disk?

It is, to put it simply, a small circuit board with lots of chips, that plugs into your laptop.

It has multiple storage chips, quite similar to the ones in your thumb drives/flash drives but much faster, and one controller chip to rule em all.

A solid state disk, plugged into a laptop slot (but not secured)

Hmm … like computer memory?

Computer memory also consists of chips on a circuit board, right? But memory gets wiped after the computer loses power … but that doesn’t happen with SSDs; why?

Computer memory sticks, inserted into the memory slot of a computer motherboard

Computer memory uses capacitors, which are like micro-sized batteries. They hold a charge when powered, and store either a 1 or 0 state by being charged or uncharged, respectively.

Solid state storage, on the other hand, uses gated transistors instead of capacitors. They lock a bunch of electrons behind a gate to fill a storage cell (storing a 1), and empty it by forcing the electrons out (strong a 0). This is slower than charging/discharging a capacitor, but hey you don’t lose your data when the power goes out!

Solid state disks have no moving parts

As you can see, there are no read arms or magnetic platters involved. No waiting for a platter to spin up, no waiting for a read arm to move back and forth … access is almost instantaneous².

This is a big deal when reading data from multiple locations; the response is hundreds of times quicker than a hard disk!

What’s the drawback?

The biggest drawback for now is of course price. Solid state disks cost much more, per GB, than a hard disk drive (HDD).

Another potential drawback is that SSDs have a limited lifespan. This lifespan is not measured in months or years, but in the amount of data written.

You see, each time electrons are forced through the gate (called a program-erase cycle), the gate gets weaker. Do it tens of thousands of times, and eventually the gate gets too weak to hold electrons. A budget SSD typically has a lifespan of 30,000 program-erase cycles. For a 256GB SSD, that’s 7.68 million GBs of data-writing before it fails!

This was a big concern for early SSDs, which had program-erase cycles in the low thousands. Today, with technology improving the program-erase cycle lifespan of SSDs, most users would not come close to reaching the end-of-life of an SSD.

Issue summary: Solid-state disks are much faster than hard disks because they have no moving parts, so no time is wasted waiting for parts to get into the right position. However, they are more expensive than hard disk drives.

I managed to write this issue without once mentioning write amplification or NAND, and I consider that a significant personal achievement.

What I’ll be covering next

Next issue: [LMG S10] Issue 120: Drivers, the glue between hardware and firmware

So … hard disk drives and solid state disks work pretty differently under the hood, yet when you plug them into a computer they just … work? How does that happen?

For that matter, how do devices just work when we plug them into a computer?

This next week, when I finally talk about … drivers!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
a video card? [Issue 113]

The name has to do with the fact that its working principles are based on solid state physics, and not on the solidity of the disk itself. ↩
Feels almost instantaneous to humans … but this still takes a few microseconds, which is considered slow for a computer! ↩

Issue 118: When I run two file-copy processes at the same time, why are they much slower?

2021-05-01T08:00:00+08:00

Previously: Operating systems use a page file on the storage disk as a complement to physical memory. This allows OSes to behave more performantly than they would if they did not have a page file. Data that is rarely accessed is moved to the pagefile (“paged out”), and can be paged in when it is needed later, albeit with a performance hit.

It’s kind of funny the moment you realise how much of what an operating system (OS) does is try to mitigate the slowness of hard disks. But why are they so slow? There’s an actual explanation for that, and along the way it will help us understand a few things about why OSes sometimes behave funny.

What is a hard disk?

A hard disk is a magnetic platter that stores data. This platter contains trillions of magnetic atoms; yep, they’re atom-sized magnets! These atoms are grouped into clusters called magnetic domains; they are so tiny that 500 of these placed in a straight line would stretch the diameter of a human hair! These domains can align themselves in one of two different ways: let’s call them “up” and “down”. Each bit (as in, 8 bits = 1 byte) is stored as the alignment of a magnetic domain: up represents 1, and down represents 0.

To read data, all you need to do is move a tiny electromagnet over each domain, and use it to see which way the domain is aligned. This can be done by measuring the current flowing through the electromagnet, a detailed explanation of which is beyond the scope of this newsletter (perhaps in a future newsletter titled “Layman’s Guide to Physics” or something).

To write data, you pass a current through the electromagnet to magnetise the domain below it whichever way you want; just spin the platter and keep changing the current to write a series of 1s or 0s.

Put 3-5 platters together, attach the electromagnet to a moving arm (called the read arm), control the whole thing with some microchips, and you have a hard disk.

A picture of an opened hard disk, showing the read arms and magnetic platters

Characteristics of a hard disk

A hard disk spins at a constant speed, because it is much more complicated to figure out how to read/write stuff when the speed can vary. The larger hard disks, which go into desktops and use 3.5-inch platters, spin at 7200rpm, while the smaller hard disks, which go into laptops and use 2.5-inch platters, spin at 5400rpm.

Data stored near the circumference of a magnetic platter is faster to read than data stored near its centre. For this reason, OSes that are installed on hard disks usually attempt to partition the storage space so that the operating system is stored on domains closer to the circumference.

The read arm moves really close to the platters during disk operation! The gap between them (called the head gap or flying height) is half the thickness of a human hair. This is why you do not want to drop hard disks while they are in operation; the slightest movement of the read arm towards the magnetic platter causes it to gouge the platter surface and damage it permanently: this is called a head crash.

Read and write operations

The hard disk is ultimately a mechanical device; each operation involves moving parts.

Reading from or writing to a domain involves:

Spinning up the platter (if it isn’t already spinning)
Moving the read arm to the correct position
Measuring or inducing a current

This means that each time the hard disk needs to access data from a different region of the disk, there is significant lag time (~5ms; see Issue 57)). This is the time needed for all those movement described above. It is thus advantageous to try to put all the data you need in contiguous domains¹, to minimise read arm movement.

So when an OS tries to perform two data operations at the same time (instead of sequentially), the read arm has to move a lot more to access data from different regions. And this is why, if you have multiple data operations to perform on a hard disk, you should try to do them sequentially instead of simultaneously!

Issue summary: A hard disk consists of a read arm, and a set of magnetic platters which store data. To read or write data, the read arm must move to the appropriate track of the rotating platter, and detect the magnetic field (for reading), or attempt to magnetise the domains on the platter (for writing). Operations that require the read arm to access different parts of the magnetic platters intermittently result in slower read speeds.

This was something that took me a while to figure out; first, to notice that it was actually happening, and second, to read up enough about hard drives and take some (nonworking) ones apart to understand. And now I can explain it to you :)

What I’ll be covering next

Next issue: [LMG S10] Issue 119: Solid-state disks, an upgrade from hard disks

Next, we look at the technology that has been steadily replacing hard disks as system disks in most desktops and laptops. These have managed to eliminate the latency due to moving parts, and enable much higher read and write speeds.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
a video card? [Issue 113]

This process is what millennials might remember as defragmentation, or defragging. ↩

Issue 117: Swap space

2021-04-24T08:00:00+08:00

Previously: Hibernation mode causes the computer to store the data configuration into a hibernation file on disk. When powered up, the OS reads the data configuration from the file back into memory. This lets the system avoid having to do a full shutdown and bootup; it performs a shorter version of these two sequences instead.

In Issue 57), I laid out the transfer speeds and latencies for a few places in the computer where data can be stored:

Hard disk drive (HDD): ≈5 ms response latency, 100 MB/s transfer speed
Solid state disk (SSD): up to 0.1 ms response latency, 0.5–1+ GB/s transfer speed
Physical memory (RAM): 0.1 µs (0.0001 ms) response latency, 20GB/s transfer speed
CPU register: <1 ns (<0.000001 ms) response latency

Apart from the CPU itself, physical memory is one of the fastest places to store and retrieve data. It’s just such a pity that there’s a limited amount of it, and you really can’t get more than 32GB of it in a single computer (for future readers, this is written in early 2021). If you bought a cheaper computer that only has 8GB or only 4GB, this quickly limits the number of apps you can have open at any single moment.

What happens when you run out of memory?

Running out of memory

When a program needs more memory, it requests it from the operating system (OS), and waits for the OS to give that memory. After all, it can’t do anything else until the memory is available.

The OS, on the other hand, will give that memory if it is available. If not, it will wait for other programs to free up memory before passing that memory to the waiting program. If no program is willing to release memory …

I guess the computer just hangs 🤷

Mitigating out-of-memory problems

If memory is limited, what can we do to increase it?

As it turns out, not all parts of memory are constantly being written to or read from. Lots of it is just sitting there, waiting for that one moment when the data is needed. Kind of wasteful if out of 8GB of memory, only 2GB is actively changing in any hour.

The OS could write that mostly-static 6GB to disk and free it up for other programs, maybe? Then when the data is needed, the OS reads it back from disk before passing it to the program. Yes, this would mean a slower response time for programs that have been idle for a while, but surely better than hanging because it ran out of memory?

This is what all modern OSes do. They write unused in-memory data to a file known as the page file. In Windows, the page file is C:\pagefile.sys. When data is moved from physical memory to the page file, it is said to be paged out, and when moved from page file to physical memory, it is paged in.

In older systems or software, you may also see this page file referred to as a swap file, or swap space. On Linux, a subregion of storage space can be set aside as a swap partition.

Why is it called the page file?

Computer memory is organised into pages, each page typically being 4 KiB (notice that disk sectors are also typically 4 KiB (Issue 106)) …). The computer may have only 8GB of physical memory, but present 16GB of virtual memory (Issue 56)) to programs. Kind of like how banks only hold some liquidity but present their assets as being much more …

So the 8GB “shortfall” is actually in the pagefile, not in memory. The pagefile essentially acts like (much slower) memory! The OS reads pages from it and writes pages to it, through virtual memory accesses.

This sounds like a waste of space, can I choose not to use it?

Though most OSes are configured to use a page file by default, you can configure it to not do so, although this is ill-advised (and I won’t explain why here; this is a newsletter aimed at layfolks). Modern OSes are pretty smart at managing page files, understanding that it is much slower than physical computer memory. They have their own algorithms to decide when to move data to a pagefile, and these days they even forecast when data that was paged out will soon be needed by the program.

OSes need page space for all kinds of reasons: as a kind of “emergency space” when physical memory is full, as working space for optimising data layout in memory, etc. So that disk space is definitely not going to waste!

Issue summary: Operating systems use a page file on the storage disk as a complement to physical memory. This allows OSes to behave more performantly than they would if they did not have a page file. Data that is rarely accessed is moved to the pagefile (“paged out”), and can be paged in when it is needed later, albeit with a performance hit.

Pagefiles are pretty amazing, even though they used to be a huge pain in older systems, when they were stored on slowwww hard disks. These days, page files stored on solid state disks are pretty fast! While the performance hit of a page miss is still noticeable (like when you switch to a Chrome tab that you haven’t touched in a while), it is far from the groan-inducing wait it used to be.

What I’ll be covering next

Next issue: [LMG S10] Issue 118: When I run two file-copy processes at the same time, why are they much slower?

Next week, a new season! With operating systems much more fleshed out for you (I hope), I can finally delve into the nuts and bolts of the machine itself: the hardware.

Since this is a layman’s guide, I’m going to focus on the interesting bits, the parts that actually answer long-standing questions :)

Let’s start with a bang: sometimes you are copying a file from one flash drive to your external hard disk. And you need to copy another file from your external hard disk to the same flash drive (or even a different one). You’re scared; can the computer handle it? But this is taking so long; let’s just try it anyway.

And you notice the speed suddenly plummets. Why does that happen? Even with 117 issues of Layman’s Guide written, I don’t have enough background laid down to explain this. So let’s delve into the hard disk in the first issue of Season 10.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
a video card? [Issue 113]

Issue 116: Hibernation

2021-04-17T08:00:00+08:00

Previously: When you shut a computer down, it sends an exit signal to all running programs to get them to do their exit routine. This process can sometimes take a long time. To preserve the data configuration in memory while minimising power draw, a computer can go into standby mode: all hardware except the memory gets powered down, until the computer is woken up from standby.

Okay, so standby/sleep mode lets you keep the data configuration in memory by putting the computer into a sort of low-power mode, where only the memory still has power while the rest of the computer is shut down.

If you’re not just going for a short jaunt; you’re maybe getting on a flight and don’t have the laptop charger with you and want your laptop to still be in the same configuration (windows open, Spotify playing, etc) without drawing any power … is that possible?

By now, you know the answer is yes! You just put the computer in hibernation mode!

Hibernation

How is it possible to preserve data configuration in memory without any power at all? Any time you notice magic like this happening, you should immediately suspect that data storage is involved somehow.

When you put a computer into hibernate, Windows dumps the contents of computer memory into a hibernation file on your storage disk (check if you have C:\hiberfil.sys on your computer disk; that’s the one), then shuts down your computer. Yup, it’s a shutdown, except the programs don’t have to exit (because remember, the data configuration is already preserved). So this is much faster than a normal shutdown, but not as fast as standby which involves no shutdown at all.

When you press the power button again to bring the system back, the computer still has to go through the bootup sequence (memory was completely wiped, remember). But at the point where control is handed over to the OS (Issue 112)), the OS does not carry out its usual preparations. Instead, it reads the contents of the hibernation file into memory, then acts as though the system is already booted up! You just log in, and everything is the way it was just before hibernation (keeping fingers crossed …)

Issue summary: Hibernation mode causes the computer to store the data configuration into a hibernation file on disk. When powered up, the OS reads the data configuration from the file back into memory. This lets the system avoid having to do a full shutdown and bootup; it performs a shorter version of these two sequences instead.

Okay, so that’s one more mystery explained.

One thing I didn’t think was critical to explain above, but which eventually crops up for most people, is that the hibernation file takes up a lot of space! It needs to contain all the data in memory, so if you have a laptop with 16GB of memory, your hibernation file will be around that size too. If you’re wondering why Windows is taking up so much space, this is usually one of those reasons.

If you found C:\hiberfil.sys, you might have also found C:\pagefile.sys, another huge file, though not as huge as the hibernation file. What is this one used for? I’ll explain that in the season finale :)

What I’ll be covering next

Next issue: [LMG S9] Issue 117: Swap space

Sometimes, you get a laptop with 2 or 4 GB of memory, and you wonder how it’s able to run Chrome with 46 tabs open; shouldn’t you have run out of memory by now? In the next issue, I’ll explain this trick that lets computers pretend they have more memory!

I wanted to write this issue earlier in the season, but it just never presented an opportune moment. So let’s wrap up with it.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
a video card? [Issue 113]

Issue 115: Shutdown & standby

2021-04-10T08:00:00+08:00

Previously: Embedded operating systems are unlike user operating systems. They are designed to run the software needed for an appliance’s operation, and are not meant to be used by users directly. Since they are considered somewhere between software and hardware, they are usually referred to as firmware.

In Issue 112), I described the bootup process, and what a process it is! We have sayings in the English language that talk about things taking years to build up, but only seconds to destroy, and that’s certainly the case here. It takes so many steps to get data into computer memory in a way that makes it useable for us humans, and the mere press of a power switch wipes that data configuration once computer memory loses power.

This is obviously undesirable; your work in progress might not have been saved, data that is being transferred to disks might not be written properly and end up getting corrupted (Issue 109)), and you should never switch off a computer at the power socket this way!

Forced shutdowns

But sometimes, the computer gets itself so stuck that there’s just no other way to do it. On a laptop or smartphone, which has a built-in battery and cant be shut off immediately at the power socket, this is usually done by holding the power button for more than 10 seconds. This will literally cut off power to the mainboard immediately.

What does a proper shutdown do differently?

Shutdown

To avoid the Problems described above, we need to give the computer time to wind down. Most programs have a proper exit routine (y’know, the red dot on macOS or the red cross in Windows). This gets the program to write any last bits of data, release any file or database locks (Issue 82)), quit, and then release its resources back to the OS. When shutting down, all running programs and services have this exit routine invoked, and they usually quit pretty quickly.

The exceptions to this usually have to do with disk operations, the slowest of operations on a computer. If a program has lots of data to save to disk, or is waiting for a disk operation to complete, it will remain open while attempting to exit, and prevent the computer from shutting down. If you’re wondering why a folder seems to be stopping the computer from shutting down, this is usually the reason!

“Wait, I’m not copying anything …” Yep, but File Explorer in Windows does more than just let you copy, paste, and delete. It also generates image and video thumbnails for you, create file indexes to speed up searching, and so on. And these are disk operations too!

On rare occasions, these disk operations can mean your computer takes up to an hour to shutdown! My best advice, if you’re not in a hurry, is to just leave it alone and let it do its thing. Much safer than having to scan for errors later.

Standby/Sleep

On older computers, shutting down and booting up took more than 10 seconds, which is quite a killjoy when you are getting ready for a presentation and just need to stuff your laptop into a bag, get up to the conference room, and switch it on again. That feeling when you’re looking at a room full of people looking back at you, waiting for your laptop to boot up … awkwardddddd

Some of us just keep our laptops open (because closing the lid causes it to shut down, but you can change that in the settings by the way), and hold it that way while running around. And the purpose of that is just to keep the computer memory powered so we don’t lose the precious data configuration of the computer memory!

What if there was a way to keep power going to computer memory, but have the rest of the computer system powered off, to save on power? That would be perfect for hopping from coffee joint to coffee joint!

This is exactly what happens when you put a computer into standby (some systems also refer to this as sleep). The computer processor goes into a power state where only the computer memory remains powered (Issue 112)). Everything else is powered off—screen, network hardware, storage disks, … until you power the system on again. The memory configuration remains intact, and everything is exactly the way you left it at the moment you put the laptop into standby. Best of all, you didn’t have to go through the long and tedious shutdown-bootup sequence again!

Issue summary: When you shut a computer down, it sends an exit signal to all running programs to get them to do their exit routine. This process can sometimes take a long time. To preserve the data configuration in memory while minimising power draw, a computer can go into standby mode: all hardware except the memory gets powered down, until the computer is woken up from standby.

Lots of people don’t know about computer sleep, and resort to workarounds to keep their computer running while they have to move around. But now you know.

What I’ll be covering next

Next issue: [LMG S9] Issue 116: Hibernation

Sleep and hibernate; what’s the difference? As it turns out, a lot!

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
a video card? [Issue 113]

Issue 114: In the beginning (firmware)

2021-04-03T08:00:00+08:00

Previously: If you can’t get to a BIOS screen, it is likely a hardware problem and has to be solved by a technician. If you can’t get the the OS loading screen, it’s a bootloader problem and needs to be solved with more geekery. If something goes wrong with OS loading, and fails to fix itself on subsequent reboots, it’s probably time for a system refresh or reinstall.

In Issue 111), I described the bootup process. These days, anything more complex than a calculator typically has an operating system (OS) running it. Your TV set-top box, your router, even your (high-end) printer and your car have an operating system running them these days!

Embedded operating systems

That OS is quite different from the one running on your laptop. It’s not meant for users to interact with directly. It primarily runs services which the appliance provides; you’re not allowed to install your own apps on it unless it has a manufacturer-approved app store or you are hacking it.

We call these embedded OSes.

Hardware and Software

We often see the term “hardware” used to refer to physical tools and implements that are needed for a job; they’re “hard” in the sense that they often come fixed into a certain configuration, are not easily changeable, and remain that way for the rest of their lifetime.

“Software”, on the other hand, is … fuzzier, more malleable, less clearly defined. Almost anything goes for software! This is usually the user-facing side of a system.

Firmware

We usually consider laptop OSes to be software, because it has an installation process, you get small regular updates on it, you can add and remove features … if it looks like software, talks like software, and behaves like software, you might as well call it software.

Embedded OSes are kinda different. They come preinstalled; if they don’t, you won’t know how to install it yourself on a brand new, unprogrammed appliance. You don’t get small hotfixes and updates, only big version changes. And you can’t control what features it has.

If software sits at the layer nearest the user, and hardware sits at the layer nearest the machine, then I guess we’ll call the middle layer firmware.

So when you see the word firmware used in an interface, I guess you can just think of it as the “OS of the device” :)

Issue summary: Embedded operating systems are unlike user operating systems. They are designed to run the software needed for an appliance’s operation, and are not meant to be used by users directly. Since they are considered somewhere between software and hardware, they are usually referred to as firmware.

Okay, that’s one thing out of the way. Now we can move on to shutdowns, sleeps, and hibernates!

What I’ll be covering next

Next issue: [LMG S9] Issue 115: Shutdown & standby

I often hear a lot of confusion about this in my workplaces, and from folks too. This is often the cause of many problems-that-happen-when-I-press-the-power-button. Let’s see what we can uncover.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
~~firmware? [Issue 34]~~
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
a video card? [Issue 113]

Issue 113: A computer’s existential crisis (boot failure)

2021-03-27T08:00:00+08:00

Previously: When a computer is booted up, it runs the BIOS from a chip on the motherboard. The chip checks that core parts are present, checks for a storage disk containing a bootloader, loads it into memory, and hands over control. The bootloader loads the operating system kernel. The operating system kernel then does whatever it needs to do to get the system ready for use.

In this issue, doom and gloom.

Nothing happens

Sometimes, you press the power button and nothing happens. Absolute stillness.

This is not a software problem; something is wrong with the computer’s power source (the power supply unit of a desktop, or the battery/charging system of a laptop), and you have to bring it to a technician.

Something happens, and then nothing happens

If you hear fans spinning, the power source works. But if nothing appears on screen after that, and then the system seems to reboot again, it may mean the BIOS has failed; if you attempted to upgrade the BIOS recently, that might be a possible cause. A friendly geek might be able to fix this for you.

If you hear fans spinning for a long time, and things seem to be happening, but nothing is showing on screen, your video card (future issue, I promise) may be borked. Can’t do anything other than sending it in to a technician.

If you have attempted a driver upgrade recently (again, future issue), that might be a possible cause.

When the BIOS works, you should see the startup screen—always a comforting sight.

Bootloader issues

Bootloaders are reliable as anything, so seldom anything goes wrong here.

The most common bootloader problem is that it does not detect any OSes to boot—may be the case the first time you attempt dual-boot, or get advanced enough to start playing with bootloaders (Windows and Mac don’t let you do that, so these are almost always Linux users). This is … something that can only be solved with more geekery, and is beyond the scope of this newsletter. Sorry 😬

OS booting problems

The OS, being the most complex part of the bootup process, is often where things go wrong.

If nothing has changed in the system since the last bootup, usually things go smoothly. So it’s almost always changes to the system that cause this.

Sometimes you installed a program that dabbles in the system innards—a system-optimising program maybe—and it attempts to change some settings but without propagating them to all the required places. This leaves the system in an inconsistent state.

Sometimes you attempted a Windows Upgrade¹, possibly a major one, and it did not go as hoped. Windows usually will attempt to undo the damage it wrought, and may or may not succeed in leaving your system in a bootable state.

Sometimes some system files get corrupted—remember that this can happen if you force a computer to power off before it has finished its shutdown properly. If the OS can’t find that file, it sometimes initiates a search for it in the system disk, which can take … really long.

In my opinion, the best way to resolve these issues, especially if they are recurring, is with a system refresh or reinstall.

Issue summary: If you can’t get to a BIOS screen, it is likely a hardware problem and has to be solved by a technician. If you can’t get the OS loading screen, it’s a bootloader problem and needs to be solved with more geekery. If something goes wrong with OS loading, and fails to fix itself on subsequent reboots, it’s probably time for a system refresh or reinstall.

I told you, gloom and doom. Modern OSes are getting more sensible at not requiring human intervention, so when they fail to resolve their own issues, there is often little a layperson can do on their own.

The best thing you can do for yourself and your data is to ensure you have a copy of your critical data elsewhere. Always have backups of things you are working on.

What I’ll be covering next

Next issue: [LMG S9] Issue 114: In the beginning (firmware)

I tried to find a place to talk about firmware in this and the previous issue, but couldn’t do so without taking the story in weird directions. So we’ll take a short detour to do that next issue, before I talk about shutdowns, sleeps, and hibernates.

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
a video card? [Issue 113]

These days it’s more like you attempted to stop a Windows Upgrade, but did not manage to do so or it is unskippable. ↩

Issue 112: Bootstrapping into existence (bootup)

2021-03-20T08:00:00+08:00

Previously: Moving a file (within the same disk region) merely updates its file table record, and this happens really quickly. Copying a file, or moving it to a different disk/region, involves copying the contents and then updating the file table record, and is considerably slower. Deleting a file only requires that its file table record be removed, and is a very fast operation (if it does not involve the Recycle Bin).

We humans wake up in the morning and magically there is information in our heads. We open our eyes, trust that all our limbs and body parts are there, and we just get up and make coffee.

Computers, they are different in that regard. After they light up the power LED, there is just … nothing. Computer memory needs power to hold its information¹, so once you really power down your computer, all data in memory is gone. Your operating system (OS), your open programs, your file data … all that now resides only in your hard disk. Whatever was not written to hard disk would have been lost by now (Issue 108)).

Before anything can be done, the computer needs to read all that information back in to memory. But there’s nothing in memory … you need a program loaded in memory to do that, but that program is still residing in file storage—it’s a Catch-22! The OS has to bootstrap itself into existence first, in a process known as the bootup.

Bootup

The way to sidestep the bootup paradox is to have a very tiny chip perform the bootstrap. This tiny chip is already wired up in a particular way from birth; it contains a program called the BIOS (short for Basic Input/Output System)², which does only three things:

Check that the core parts are present: CPU, memory, video card, keyboard, storage disk.
At the BIOS screen³, you can press a button to configure BIOS settings.
The BIOS checks the available disks to see if they are “bootable”, i.e. contain information in a specific part of the disk that contains a bootloader. In modern BIOSes, you can override this process by specifying a valid disk containing the bootloader.
The BIOS loads the bootloader into memory, and runs it.

The bootloader

The BIOS is not optimised—it’s only a basic system after all—and would take forever to try to get the OS running on its own. So, with its limited resources, the BIOS calls in bigger guns: the bootloader.

The bootloader is not the OS! It has only one job: to load the OS during the boot process, and carry out whatever is necessary before that.

Usually, the first thing that needs to be loaded is the filesystem (Issue 106)). Without that, no program will know how to read in data from the storage disks! At this point, if the bootloader detects disk errors or uncompleted operations in the journal (Issue 110)), it may attempt to scan for errors or complete those operations before proceeding with the rest of the bootup process.

Some systems may contain multiple OSes: Mac users may want to run Windows using Parallels Desktop, Windows users may want to dabble in Linux, and many Linux users dual-boot Windows as well. The bootloader, with the help of the filesystem, detects other operating systems on the storage disks, and offers the user a choice of which one to boot. If there is only one OS found, this step might be skipped.

Once an operating system is selected, the bootloader loads the OS kernel, which contains its core instructions, and hands control over. The bootup sequence is not yet complete, but the bootloader has completed its job.

The operating system startup

The operating system is not ready to accept user input at this point yet. It still has to mount other storage disks and their subregions, check that other hardware is available and working properly (such as hardware timers, which are very important for an operating system), start up various necessary services, start up programs which asked to be started up when the system boots (like the annoying Adobe Updater, or your sound card drivers and utility), etc.

Finally, when it is ready, it displays the login screen.

Issue summary: When a computer is booted up, it runs the BIOS from a chip on the motherboard. The chip checks that core parts are present, checks for a storage disk containing a bootloader, loads it into memory, and hands over control. The bootloader loads the operating system kernel. The operating system kernel then does whatever it needs to do to get the system ready for use.

This a prelude to the next issue, which is where I attempt to explain, as simply as possible, the usual things that go wrong in the bootup process.

What I’ll be covering next

Next issue: [LMG S9] Issue 113: A computer’s existential crisis (boot failure)

We’ve all been there. The computer doesn’t get to the login screen. We are shocked that our daily driver, our churner of spreadsheets, can do this to us. Worse, it has left us in an environment we are wholly unfamiliar with: a blinking cursor in a sea of black, cryptic text explaining nothing, and the unbearable pressure of a system silently begging us: fix me.

Sometime in the future: What is:

~~booting up? [Issue 15]~~
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]

Memory comprises lots of capacitors wired together, and capacitors slowly leak charge when they are not powered. Memory chips need periodic “refreshing” every few milliseconds to avoid losing data due to this charge leakage. ↩
Modern BIOSes are no longer stored permanently on a chip, but written to flash storage (similar to a thumbdrive) that is soldered directly to the motherboard. ↩
Today, this process is hidden by a branded startup screen on machines that come with Windows pre-installed. You can usually disable that screen if you like to see this bootstrap process happening. ↩

Issue 111: Copying, moving, and deleting files

2021-03-13T08:00:00+08:00

Previously: Filesystem journals are a record of changes made to the disk, so as to enable those changes to be rolled back, or to be completed properly in case of sudden interruption.

Okay, that’s enough of a look at the nitty-gritty of moving data around. In this issue, let’s turn our attention to the file table, where interesting things happen as well.

What happens when we copy a file?

A copy of the data is made, and moved to an available location (series of sectors) on the disk, and a new file table record is created. No magic yet.

What happens when we move a file?

Okay, this is slightly more complex, but not difficult.

First of all, if we are moving the data within the same disk … do we really need to move the data? In layman terms, when we “move” a file, we are not actually changing its physical location in the disk. We are basically updating a file so that its stated location, which used to be something like C:\Users\myusername\Desktop\video.mp4, will become C:\Users\myusername\Videos\video.mp4. This is information about the data, and is not stored with the data—it is actually stored in the file table!

So when we “move” a file within the same disk region (i.e. within the warehouse, in our earlier analogy), we are actually just updating the file table record, without copying or writing any of its contents. This is why it can seem so amazingly fast to “move” a multi-gigabyte file.

But if we are “moving” the data to a different disk, or to a different region on the same disk (e.g. moving something from the C:\ region to the D:\ region), it is like moving cargo from one warehouse to another. Each warehouse keeps its own file table, so we can’t merely update the file table and keep the contents on the same pallet on the rack. The cargo needs to be moved to a different warehouse, and the file table record copied over as well. For data, this means the contents are copied to the new location, the file table is updated, and if both happen without error, the original contents and file table record are deleted.

What happens when we delete a file?

Ah, fun times. Okay, so I suppose most of you keep your computers on the default setting, where pressing the Delete key doesn’t actually delete a file irretrievably, but simply puts it in the Recycle Bin.

And when you are deleting a lot of files … you notice this can take a long time. Obviously the contents are still around (so that you can still restore them), but the file table records are now hidden from the user, so that they appear to be deleted. If you are deleting 100,000 files, that is 100,000 different file table records to update.

You can, of course, change this setting (and I won’t teach you how to do that here, but it’s just a Google search away). If you disable Recycle Bin, or use the Shift+Delete hotkey combination to force a permanent deletion, something different happens instead. Intuitively, it seems like the OS should actually clear out the (digital) space previously occupied by the file contents. But this offers no advantage; to change a 1 to a 0, you have to write a 0 to it; there is no erasing in the world of storage!

The faster way is to simply tell the filesystem that this file no longer exists, and the space it used to take up can now be overwritten. And that is done … by simply deleting the file table record. This is a simpler operation than updating, which is why permanent deletes are so much faster than Recycle-Bin deletes.

This is also why, if you accidentally permanently deleted a file, you still have a chance of recovering it with file recovery software! The recovery software ignores the file table, and instead scans the entire disk for its actual contents, trying to see if there are any valid fragments of files left. If the space has not been overwritten by anything else, you just might be lucky enough to recover its contents. The chances of this decrease as you use the disk and write over parts of the disk which contained the file data, so this has to be done as soon as possible.

Issue summary: Moving a file (within the same disk region) merely updates its file table record, and this happens really quickly. Copying a file, or moving it to a different disk/region, involves copying the contents and then updating the file table record, and is considerably slower. Deleting a file only requires that its file table record be removed, and is a very fast operation (if it does not involve the Recycle Bin).

I’ve been waiting a long time to explain this, and can finally get it out on screen. Phew!

What I’ll be covering next

Next issue: [LMG S9] Issue 112: Bootstrapping into existence

Now I can move on to something closer to the hardware, and talk about what happens when the power button is pressed.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]

Issue 110: Safeguarding against data corruption with a journal

2021-03-06T08:00:00+08:00

Previously: Fast writes dump the data to a write cache (in computer memory), then update the file table to look like the file is already written to disk. However, if power is cut before all data is properly moved from the write cache to disk, the data in memory is lost, and file corruption usually results.

Last issue, I showed how a write cache can “speed up” file write operations by allowing data to be dumped to a write cache in memory first. The OS “completes” the operation, making it look the file has been successfully written to disk, when in actuality parts of it are still in memory waiting to be written. This works fine as long as we don’t suffer catastrophic power loss before everything gets put on disk; once power is cut, all the contents of the write cache in memory are lost!

All right, we are into worst-case scenarios here. How do we recover from something like this?

The first thing we’ll need to know is what had happened; what data made it through and was written, and what data didn’t make it. We need a record of changes to the disk.

File journals

This record, in a filesystem, is called a journal. Keeping a journal requires more CPU cycles and can slow down file write operations somewhat, but as disk performance increases, this is increasingly considered a worthwhile tradeoff for data reliability.

The default filesystems on major OSes today are all journalled, and NTFS is no exception. I’m not going into details of how this is done, and instead will list the things that a journal enables Windows 10 to do.

Disk recovery with a journal

So what happens when power is cut and a computer reboots? It goes into recovery mode, where it checks the log for incomplete operations, and attempts to complete them. Any files that were “lost” (i.e. file table records that have been separated or desynced from the actual data) get moved to a separate folder, and the disk is considered okay for use again.

Disk transaction rollbacks

When installing a new application, sometimes you encounter that odd screen where the installation fails because something that needed to happen couldn’t happen. Oh great, the application could not complete installing, and now we have to remove it … how do we undo all the things that were done?

The journal lets you see a list of all the changes that were made to the disk from the start of the installation, so that you can perform the equivalent steps to reverse their effect.

System restore with journals

The same thing happens with some Windows updates. Windows stores a lot of the old data as backup, in case you ever want to roll back some system changes, or return to an older version of Windows (minus some breaking updates, for example). And again, the filesystem journal lets you do that, by identifying which changes were responsible for the update.

Issue summary: Filesystem journals are a record of changes made to the disk, so as to enable those changes to be rolled back, or to be completed properly in case of sudden interruption.

Journals are serious magic, and are what enable OSes to recover gracefully from a crash. They are kind of divorced from everyday experience, because it is usually unwise to mess with them directly, but I hope the past few issues explain why it is important to not get impatient with your computer when it seems a little slow moving data round.

What I’ll be covering next

Next issue: [LMG S9] Issue 111: Copying, moving, and deleting files

I am done with the more technical stuff! Now we can really move on to more everyday experiences, like copying, deleting, and moving files :)

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]

Issue 109: Speeding up data operations

2021-02-27T08:00:00+08:00

Previously: Safe writes ensure that all the data is written to disk sectors properly first before updating the file table. The result is that write operations take a longer time to complete.

If there’s anything to take away from the previous issue, it’s that doing things the right way takes time. And sometimes we are okay with taking shortcuts to get something done faster, the left way.

The write cache

If you have a shipment coming in that urgently needs to be dumped into the warehouse quickly (e.g. because the ship needs to leave quickly for another delivery, or to free up space on the docks), how do we speed this up?

What usually happens is the goods will be offloaded onto some empty space outside the warehouse. The ship will leave first, and assume that the warehouse will be able to get the goods in just fine.

What is the OS equivalent of this? Hard disk writes are actually pretty slow; they can get up to 100 MB/s, but usually hit a sustained speed of 60 MB/s for system disks, and about 20 MB/s over USB. The operating system therefore sets up a “dumping space” in memory; files can be read from other disks at up to 200 MB/s, and written to memory at a few GB/s!

This space is known as the write cache. (I’ve previously covered the idea of caches in Issues 39) and 57), and the write cache works on a similar idea.)

Speeding up file access … and its drawbacks

So how do we speed up the process of copying data to a disk? We could dump the data into the write cache (assuming there is enough space), then update the file table first, and let the OS slowly copy the data from the write cache into the disk. Anyone who needs access to the file records will then be able to go about their merry way, and if they need data from the file, they can just copy it from the disk cache instead of the disk. Easy peasy!

This was what happened in Windows XP. File copying appears to be speeded up, and users are happy.

But remember what happens when power to the computer is accidentally cut, or if a program suddenly hangs, or the computer sometimes runs out of memory.

Yup … it shuts down (or reboots), and the contents of memory are wiped clean. And any data that has not been written to disk is lost forever. And … the file table has already been updated to look like there is valid data!

This is when things look really bad.

Issue summary: Fast writes dump the data to a write cache (in computer memory), then update the file table to look like the file is already written to disk. However, if power is cut before all data is properly moved from the write cache to disk, the data in memory is lost, and file corruption usually results.

Well okay … it’s not a lost cause right? Yup, we do have ways to mitigate this kind of damage, and one such way is called a journal.

What I’ll be covering next

Next issue: [LMG S9] Issue 110: Safeguarding against data corruption with a journal

Woot, answered another sometime-in-the-future question, albeit in a slightly different fashion from how I thought I would do it.

Next issue, I explain how we can still do things the fast way, carefully.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
~~why does computer memory exist when apps can read directly from the hard disk? [Issue 105]~~

Issue 108: Safeguarding data operations

2021-02-20T08:00:00+08:00

Previously: When write operations are interrupted prematurely, filesystem corruption often results.

When a batch of data (shipment of cargo)) arrives at our warehouse, there are two things that need doing:

1) The data needs to be written into sectors (the smallest unit of storage that the disk handles), 2) The file table needs to be updated.

Writing data the safe way

If the write operation gets interrupted, it is preferable that our file table is not updated; this way, we will not find any reference to the data in the file table, and we can attempt the write operation again.

This will appear strange in a warehouse analogy; the cargo, damaged or intact, is still occupying space on the racks! But remember that when we are talking about a hard disk storing data, there is no physical cargo; the data exists as a specific arrangement of electrons/atoms. We can override the existing arrangement of electrons/atoms without having to reset it first.

Drawbacks of safe writing

This way of storing data first before updating the file table is advantageous in its security; if the write operation is interrupted halfway, we are less likely to suffer filesystem corruption.

But for large batches of data (or large shipments of cargo), this means a long wait … and in the meantime, nothing else can happen! Hard drives only have one writing needle, which is like a warehouse only having one forklift. If you have other applications waiting for that file (similar to other employees waiting for the file record to appear in the file table), they will be twiddling their thumbs until the last sector of data is written, the last pallet of cargo loaded onto the racks.

But this is the right thing to do, isn’t it? It doesn’t matter; Windows Vista did it this way, waiting for write operations to complete before updating the file metadata in the file table, and the result is that people complained that it was slower. People do not like things slow!

Why did Windows XP feel faster, then? Because it did it the other way round!

Issue summary: Safe writes ensure that all the data is written to disk sectors properly first before updating the file table. The result is that write operations take a longer time to complete.

Actually, Windows 10 uses a system similar to this for mounting portable devices (USB hard drives, flash drives, etc) by default. They sped it up in other ways. So if you accidentally unplug a drive before it is completely done writing … usually it won’t completely screw things up.

What I’ll be covering next

Next issue: [LMG S9] Issue 109: Speeding up data operations

If you want, there are settings you can adjust to make Windows 10 access a portable device the fast way. I don’t teach it here, because this is not the newsletter for it. But we’ll see how the fast way works in the next issue.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
why does computer memory exist when apps can read directly from the hard disk? [Issue 105]

Issue 107: The challenges of storage

2021-02-13T08:00:00+08:00

Previously: A hard disk is organised into sectors, which are the smallest unit of storage. The OS’s filesystem determines how and where to store each file on the hard disk. The filesystem manages the file metadata in a file table, separate from the actual contents of the file.

Last issue I painted the picture of a warehouse, where cargo gets stored in racks, their contents and details are stored in the file table (in the warehouse office), and we can easily look up the details of each shipment of cargo without having to inspect rack by rack.

Let’s screw things up here, shall we?

Read/write failure

As mechanical devices, hard disks are prone to failure; I’ll go into detail why in a future season on hardware. Remember that when this hard drive is housed in a computer, that computer can get disturbed by shocks or jolts. If it is housed in an external USB enclosure, sometimes we do terrible things to that enclosure … like when we drop them (usually accidentally 😬).

This is like the warehouse experiencing an earthquake. Sometimes, a small earthquake may be uneventful … other times, we might even get away with a moderate earthquake, if nothing is happening in the warehouse. But if it happens while a forklift is carrying out a precarious loading operation … 🙈

Well, that’s terrible. We’ll have to write off that shipment. Luckily, we can analogy here; data can always be re-copied again. The problem is … when a shipment comes in, we have to load the shipment onto the rack, and update the file table, so that its contents match whatever is on the racks. Which should we do first? If the loading is uneventful, it doesn’t really matter; after 5 minutes or so, both the file table and the racks will be in sync. But if the earthquake happens during this loading, they won’t be!

We don’t need an earthquake to screw things up. Remember that a hard drive, or even a flash drive, needs power to run. When we unplug a hard disk or flash drive while it is still operating, it’s like cutting power to the entire warehouse (including its forklifts) at the same time. If nothing is happening in the warehouse, it is safe to do so. But if there are ongoing operations, well you don’t need me to tell you that’s not a good idea.

Reads are usually safe

Okay, we’ll have to step away from the warehouse analogy for a bit when we talk about reading data from a hard drive; this is usually safe, because unlike cargo, reading data does not destroy the existing copy. We can mount a filesystem (i.e. gain access to it) for reading only, which will protect it from accidentally having its data overwritten.

It is the writes that always get us. When a write operation does not go smoothly, the result is usually filesystem corruption; the file table is no longer up to date, or worse, some parts of it might be improperly written, resulting in the company no longer knowing what is on the warehouse racks.

Issue summary: When write operations are interrupted prematurely, filesystem corruption often results.

There are two ways we can try to get data onto the disk (get cargo into the warehouse): Write the data first, then update the file table, or vice-versa. Which way is better? We’ll compare the pros and cons in the next issue.

What I’ll be covering next

Next issue: [LMG S9] Issue 108: Safeguarding data operations

That remove-safely thing we all hate to do when we unplug our hard drives? Yeah I’m getting to that part.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
a driver file and why do I need one? [Issue 98]
why does computer memory exist when apps can read directly from the hard disk? [Issue 105]

Issue 106: Organising storage

2021-02-06T08:00:00+08:00

Previously: The OS takes care of booting up, login and user management, window management, memory allocation, storage interfaces, background services, peripheral management, and much more. Access to these services, where allowed, is provided in the form of software libraries that developers can use.

So far, what we understand about storage is that apps send data to an operating system (OS), which then stores it on storage devices (such as hard disks, solid state disks, etc) for later retrieval. And in Issues 99) & 100), I explained that eventually everything gets stored as a file.

What exactly is a file?

Files in operating systems

Let’s start from what we know about files. When we open File Explorer in Windows and open any folder (aka directory), we see a bunch of things with colourful icons, that we know of as files. And we also see things with folder icons that we know as folders (or directories).

They also have additional details displayed alongside them, such as the date/time (henceforth referred to as datetime) it was created, the last datetime it was modified, and the last datetime it was accessed. You might also get additional information, such as what type of file it is, its size, the user who owns that file, etc—in other words, the metadata of the file. That’s a fancy word to refer to data (datetime, filetype, size, …) about the data (the files).

Now, a brief introduction to the hardware, where the actual files get stored.

Introducing storage

When you buy a hard drive (or solid state drive) and connect it up to your computer (or laptop), the computer does not have fine-grained control over every bit of storage. Instead, the hard drive gets presented as a huge volume of space, like a warehouse.

This volume is organised into sectors (these days, a modern hard drive sector is 4KiB), which are the smallest unit of storage on a hard disk. If you have to store 1KiB of data, it will still take up an entire sector; the rest of the space is filled with zeroes. This is similar to the way a warehouse is managed by pallets, and not by single cardboard boxes. If your shipment does not fill the entire pallet, the rest of the space is “wasted”.

Each sector has an address (yes, just like memory!). To store data on the hard disk, the OS has to “tell” the hard disk a) what data to store, and b) the address at which to store the data.

The hard drive itself does not inherently have any system for managing your files or folders; it can’t tell you where draft1.docx is stored, nor tell you what the size is. It only takes care of storing data at addresses, and retrieving data from addresses!

Managing storage

And this is where the operating system comes in. An operating system installing itself is like a company occupying a warehouse. You’ve got to impose some kind of order on the space!

In an operating system, this job is delegated to the file system, a sort of facility director who manages the storage space. Windows uses NT File System (NTFS) for its own space, MacOS uses High Performance File System (HPFS), while portable storage devices (e.g. USB drives) often use File Allocation Table (FAT).

An operating system that is not “aware” of other filesystems will not be able to read storage devices formatted with those filesystems. This is why you can’t just take out a disk from a Mac system and expect it to open in Windows when plugged in. Windows can read HPFS disks, but cannot write data to them. Both OSes can handle FAT (phew!), which is why it is possible to pass files from a Windows user to a Mac user with a portable flash disk.

How filesystems work

Intuitively, we might imagine that the metadata for each file gets stored with its data. This is like storing the shipment details of each package with the package itself. But when you need to find a particular shipment in the warehouse, you can’t be checking every single rack!

Instead, you would store shipment records in a master file, usually in some kind of separate office, which would list the shipment details alongside the location of the cargo (e.g. by rack number and level). The equivalent of this master file on a filesystem is the file table.

In NTFS, this is the Master File Table (MFT). The MFT lists all the files on the disk (by their full path, e.g. C:\Windows\system32\notepad.exe), along with its metadata. This makes it easy for Windows to show you the data quickly when you open any folder in File Explorer; it can get all this data from the MFT easily!

Sounds peachy. What can go wrong with this picture?

Issue summary: A hard disk is organised into sectors, which are the smallest unit of storage. The OS’s filesystem determines how and where to store each file on the hard disk. The filesystem manages the file metadata in a file table, separate from the actual contents of the file.

What I’ll be covering next

Next issue: [LMG S9] Issue 107: The challenges of storage

This is where I talk about all the ways we make life difficult for computers, but easier for ourselves. Like unplugging USB drives before removing them safely …

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
~~How do apps know where a file starts and ends? [Issue 49]~~
a driver file and why do I need one? [Issue 98]
why does computer memory exist when apps can read directly from the hard disk? [Issue 105]

Issue 105: Operating Systems

2021-01-30T08:00:00+08:00

Previously: Shared secrets allow secured access to resources, such as databases or other services. These shared secrets are typically kept on a server controlled by the app developer. For mobile apps, they are usually stored with the operating system, inaccessible to other apps.

Last season, I explained the differences between web apps, mobile apps, and laptop apps. Web apps operate in a browser environment, mobile apps operate in a mobile operating system (OS) environment, and laptop apps operate in a desktop OS environment.

The OS is that complex piece of software between the apps and the hardware that runs it all. The hardware speaks its own special code, while apps are run in machine code written for the CPU. The OS makes sure that everything runs smoothly.

What does everything entail?

What an operating system does

1. Boot-up

When the machine is powered on, how does it know what hardware is available? This hardware discovery is done by the bootloader (which for the purposes of explanation we can consider as part of the OS). It checks for the available hardware resources, and compiles a list of them for the operating system. Added some memory? Added a new hard drive? The bootloader should detect it.

2. Login and user management

While a personal computer (PC) might have only one user, many PCs are used by more than one person. Multiple login accounts allow each user to have their own space on the computer for their files.

In addition to the user accounts, an operating system also creates multiple system accounts to manage services on the OS.

3. Window management

Ever wondered how apps on a PC can have their windows styled so similarly? That’s because the OS provides a standardised window style for apps, with standardised actions that can be performed (maximise, minimise, close, resize, …). Apps can choose not to use this standardised window style, but then they are on their own when it comes to window styling, and they will have to write their own code for all of these actions.

4. Memory allocation and deallocation

Applications primarily run off computer memory, which is a hundred times faster than the hard disk (I’ll explain more in a future season). When an application is first launched, it is allocated a small amount of memory space for its data. If it wants more, it can’t just reach for the memory space and grab more chunks as needed; other apps are using memory as well, and it might inadvertently overwrite data stored by other apps.

So instead, access to memory is mediated by the OS. All requests for more memory are sent to the OS, which will return addresses for available chunks of memory. And any memory that the app frees up is released back into the common pool, for use by other apps.

5. Storage interfaces

Applications do not need to know how the user has their computer set up: what kind of disks they have, or which disk they put the OS on. But what if they need to allow the user to save a file, or load an existing file? The OS provides standard interfaces for doing so—you would likely have seen the “Select A File” interface on any OS. The OS takes care of file reading and writing, sparing the application developers from having to worry about the details.

6. Background services

Applications run by the user typically expect input only from one user; if you are running Microsoft Word at the same time as your sister, each instance of Microsoft Word expects to interact with either you or sister. However, there are special system applications that often have to deal with input from multiple users. For example, antivirus software or Windows Printer Management need to run for all users at the same time, and should not be terminated like a typical program.

These applications thus run differently: they run as background services, which are special applications which do not interact with the desktop. Background services are managed differently from applications, and typically can only be terminated by the system administrator.

7. Peripheral management

Do you use Bluetooth wireless headphones? USB devices of any sort? Did you just buy a new gamepad? The OS detects them and takes care of driver installation (usually). You can see all detected devices and installed drivers in the Device Manager, under Control Panel, in Windows 10.

I’ve just covered the main ones that most users would use on a regular basis. The OS manages much more stuff, including OS updates, per-user settings, etc—remember the Registry Editor? (Issue 100)) For a look at what else the OS manages, take a peek in the Control Panel or System Preferences of your OS.

These form a supporting network of services that applications can draw on. Instead of having to work out the full details of implementation for each of these features, they can use the OS’s provided functions to do the work for them. These functions are typically bundled in a standard library (Issue 17)) provided by the OS manufacturer. For example, .NET is a collection of libraries for Windows applications that developers can use to simplify their work.

Issue summary: The OS takes care of booting up, login and user management, window management, memory allocation, storage interfaces, background services, peripheral management, and much more. Access to these services, where allowed, is provided in the form of software libraries that developers can use.

I’ve touched on per-user data storage in Season 8, and on memory allocation in Season 5, so I will skip those for the rest of this season. Window management and background services isn’t going to be interesting for a layman’s guide, so let’s skip those too. Instead, I will try to set the stage for a next season about hardware, by explaining more about how storage and computer peripherals work, and how the operating system simplifies access to them. and then I’ll get to booting up—how a computer gets ready from the moment you press the power button.

If I still have issues left after that, I might geek out a bit and talk about fonts ;)

What I’ll be covering next

Next issue: [LMG S9] Issue 106: Organising storage

First up: How exactly is data organised in the hard disk? How does an operating system manage it all?

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
How do apps know where a file starts and ends? [Issue 49]
a driver file and why do I need one? [Issue 98]
why does computer memory exist when apps can read directly from the hard disk? [Issue 105]

Issue 104: Storing sensitive data

2021-01-23T08:00:00+08:00

Previously: A race condition happens when threads depend on instructions happening with coincidental timing for success. When instructions are not executed with appropriate timing, one or more threads can get stuck waiting on a response that never comes.

To wrap up this season on how apps work, I’m going to try answering a question I had on my mind as I was still new to computers: where do my secrets get stored? If I don’t want them to be stored, what are my options?

I will answer that from an app’s perspective in this issue.

Why would I want to keep secrets from my users?

You just wrote an app. Your app syncs data to a cloud database (Issue 90)). But the cloud database has many other app developers using it as well—how would it know it is you and not some other malicious hacker? It recognises you via a shared secret: a token or an API key that you can see after you log in to your dashboard on their website.

After your users install the app, every request sent by your app to the cloud database has to be authenticated using this shared secret. That means you are going to have to get this shared secret onto the app somehow. But this has to happen without the user being able to see it or access it, otherwise a savvy user could use that key to gain access to your cloud database.

Storing secrets on the web

The code that is loaded by the user’s browser runs under their control, so putting the shared secret anywhere in that code is a bad idea. Any savvy user who knows how to view the script’s source can potentially find it!

A much safer option is to store the secret with the code that runs on your server. But not in the server’s source code! If you are most developers, you would be using some kind of version control system (Issue 19)) that maintains a copy of your source code and all its changes. If you are using Github or some other public platform for this, you have to be very careful that the shared secret is not visible (or otherwise guessable) just by reading the source code.

For a simple shared secret, such as a short string of characters, app developers usually use environment variables. These are pieces of information that are kept in memory only, accessible by the app, and are set by the operating system whenever the app starts up. The server where you run your code will let you configure the environment variables that your app needs, keeping them out of sight of the users.

Storing secrets in a mobile app

Mobile apps are supported by a host of services provided by the operating system (OS), typically managed by Google or Apple. They each offer a way for you to store a shared secret with the OS. Your app can use this shared secret from the OS to encrypt information so that other apps are not able to access it. When your app starts up, it requests the secret from the OS, and uses it to decrypt the secret again.

Storing secrets in a laptop app

If you are developing a laptop app that does not rely on a connection to your server (i.e. a “standalone” app), your options are somewhat more limited. Since all your app code and resources will be in the user’s machine and thus accessible to the user, your best bet is to find some way to obfuscate it and hope no one finds it easily.

This is one reason why so many apps require an online connection: it is much easier to hide secrets on a machine you own and control! With a server connection, you can require the app to retrieve the secret from the server, and delete the temporary copy of the secret after use.

Storing passwords

If your users log in with an email and password (which is almost every online service ever), you don‘t actually store their passwords; that is terrible security practice, even if you do it in a database! A nifty piece of software technology, known as a hash function, takes that password (regardless of length) and turns it into a unique¹ hash with a fixed number of characters.

Examples of hashes:

661c425549bc70b98e908325b8c64f82
056cd6eb540ace37e64572c64c778d45
239b1ddbb45caf82408cb89f13816185

What you do, then, is to store the password hash instead of the password. When a user sends a username and password, you hash their attempted password, compare it with the stored hash, and see if they are the same.

The hashes are designed to be difficult to reverse. The state-of-the-art algorithm used today can generate hashes that would take millions of years to reverse using hardware currently available. But there are techniques that can reverse hashes of some older algorithms in as little as 30 minutes, so if you are a developer, please find out which one to use!

Issue summary:

Shared secrets allow secured access to resources, such as databases or other services. These shared secrets are typically kept on a server controlled by the app developer. For mobile apps, they are usually stored with the operating system, inaccessible to other apps.

Phew, we had enough issues here to cover the main parts. And I managed to answer one of the sometime-in-the-future questions! Actually, I also answered another one on software installation earlier, in issues 99) and 100), so I’m going to go ahead and strike it off.

What I’ll be covering next

Next issue: [LMG S9] Issue 105: Operating Systems

This wraps up another season of Layman’s Guide on how apps work. Next season, I am going to zoom out and look at the environment that apps operate in: the operating system. Yep, I’m going to tackle the most complex pieces of software ever to be written, and try to explain them in terms that laypeople can understand 😅

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
~~involved in installing a piece of software? [Issue 48]~~
How do apps know where a file starts and ends? [Issue 49]
~~a password hash? [Issue 63]~~
a driver file and why do I need one? [Issue 98]

In practice, there is a very low chance that two different passwords may end up giving the same hash. This technology is still being improved! ↩

Issue 103: Why apps hang even with multiple threads

2021-01-16T08:00:00+08:00

Previously: Applications are assigned a thread by the OS for running a sequence of instructions. The instructions are executed sequentially, and the app cannot proceed if it gets stuck on any instruction.

Multithreading

An app can hang if its sole thread gets stuck. In some cases, an app can be written to make use of multiple threads. This is possible when a computer has more than one processing core, or if an operating system is designed to divide computing time among multiple threads.

Race conditions

Trying to design apps to use multiple threads is hard! Apps running into an error is one thing; a more subtle form of failure is known as a race condition. This happens when the success of two or more tasks depend on near-perfect timing which the threads have little control over.

Main and secondary threads

A common pattern is to have the app’s graphical interface and main code run in its own thread, with any subsidiary tasks (such as opening files) running in a secondary thread. If the task in the secondary thread is taking too long, the main thread can still issue instructions to terminate the secondary thread’s task, and thereby restore order and control.

Sounds fair enough. How might this fail?

Lets take an example: the main thread has a task that involves sending a signal to secondary thread, and then waiting for a response from it. Secondary thread has a task that involves sending a signal to main thread, and then waiting for a response from it. Both tasks complete successfully when they are carried out independently. But what if the main and secondary threads both run those tasks near-simultaneously, before the other thread has a chance to respond? They both get stuck waiting for a response. The app has just hung!

Multiple worker threads

Another pattern is to split the job up into multiple parts, and have multiple threads each take a part of the job. When they have all completed, the completed parts are then stitched back together into the finished result.

But this has its own ways of failing too.

The threads have to coordinate their job status, and often do so by updating a common set of data. Thread 1 might request access to that data to update it. To ensure that the data doesn’t change before it is done, it will usually request a lock (Issue 82)) on it, to prevent other threads from modifying it at the same time.

One way this can fail in practice is if two or more worker threads request a lock simultaneously. They both get a lock, because at the moment their requests are processed, nothing else has locked the resource. But now they can’t proceed to modify the data because it has been locked by another thread that isn’t them.

This situation is known as a deadlock. This and similar situations are just one out of many ways that apps can hang.

Issue summary: A race condition happens when threads depend on instructions happening with coincidental timing for success. When instructions are not executed with appropriate timing, one or more threads can get stuck waiting on a response that never comes.

Yep, multithreaded programming is hard.

What I’ll be covering next

Next issue: [LMG S8] Issue 104: Storing sensitive data

To wrap up this season on apps, I’ll look at one last question: how do apps keep our data secure

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
a password hash? [Issue 63]
a driver file and why do I need one? [Issue 98]

Issue 102: Threading

2021-01-09T08:00:00+08:00

Previously: An app crashes when it encounters a situation it can’t handle, or when it attempts to perform an operation that is disallowed by the operating system.

This post is a prelude to talking about app hangs. Hangs are both simple yet complicated to talk about, but there’s a piece of the puzzle that has to come into the picture first. That piece is about how apps work.

In a computer, the operating system (OS) has to coordinate the requested actions of so many different apps. How does it know which action came from which app?

Through a mechanism known as threads.

Threading

When you run an app, the OS creates a separate thread. A thread is a sequence of programmed instructions, like a thread of thought. Or a thread of bureaucracy. The OS completes each instruction in the thread, and if it gets stuck on any single task, it cannot move on.

Sometimes, this is good and necessary, like when you need input from the user (don’t be doing anything else until I tell you what I need!). Other times, it is unnecessary waiting.

What causes threads to get stuck?

Some of us really hate math, but not computers! The math is hardly ever what causes threads to stop.

Like in the workplace, it is often other ~~people~~ devices.

When an app (running in a thread) tries to open a file to read data from it, the operating system has to look up the virtual memory address (Issue 55)), follow it to the hard disk or solid state disk, and then wait for the disk to respond with the data.

And in that moment, lots of things can go wrong.

If the disk is failing, and unable to read the sector where the data resides, it will usually keep attempting to do so. Meanwhile, back in the operating system, the thread is stuck. It cannot move on, because the previous instruction to open the file has not completed. It can’t even decide to abort the currently-running instruction—telling the app to stop is already another instruction which has to wait!

The only thing to do now is wait for the OS to realise that this thread is taking too long to do its thing, and forcibly terminate the thread. This is known as a thread timeout.

Is there any way to work around this? Yep! The termination instruction has to come from a separate thread. This means the app has to run multiple threads.

Issue summary: Applications are assigned a thread by the OS for running a sequence of instructions. The instructions are executed sequentially, and the app cannot proceed if it gets stuck on any instruction.

What I’ll be covering next

Next issue: [LMG S8] Issue 103: Why apps hang even with multiple threads

Processors today already have multiple cores, and many apps can already run on multiple threads. Why do they still hang? I’ll answer this in the next issue :)

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
a password hash? [Issue 63]
a driver file and why do I need one? [Issue 98]

Issue 101: Why apps crash

2021-01-02T08:00:00+08:00

Previously: Windows systems categorise data into two types: files, and settings. Files are stored under an appropriate subfolder in C:\, while other storage devices and network locations are stored elsewhere or given their own drive letters. Settings are managed through the Windows Registry, which is stored in C:\Windows\System32\Config\ and C:\Windows\Users\Name\.

Besides general slowness, two of the most frustrating experiences we have with computers is when they crash, and when they hang.

You mean there’s a difference? Sometimes we use the two terms interchangeably, but they are really not the same.

Remember: applications are just a list of computer instructions telling the computer what to do: where to get the data, how to process it, and what to return. When the instructions make perfect sense, everything goes well. But sometimes they don’t.

Crashing

A crash happens when the app receives (a) a response that it does not know how to handle, or (b) is not allowed to carry out.

Unhandled responses

A common error made by many programming newbies (including me) is failing to account for all the ways that things can go wrong. For example, if I am writing a simple app to read a text file and perform some calculations, an obvious step in the app is sending a request to the operating system (OS) to open the text file.

Even that simple step is fraught with many possible failures! The text file may have been locked by another app (which is writing data to the file), or the user running the app might not have permission to open the file (especially if it is in another user’s home directory), or …

Well, a whole bunch of things can go wrong. And when they do, the OS throws an error. If the app does not have any code to handle that error … game over, it cannot proceed and it crashes abruptly.

This is a lot more common than you think, even for experienced programmers, especially when a process that isn’t expected to throw an error actually does it. And sometimes it just can’t be helped: when your computer runs out of memory, and an app requests for more memory but doesn’t get it, and it just cant go on without that memory … it crashes.

Illegal instructions

Memory in the computer is managed by the OS (Issue 65)), which partitions it into different zones. The memory used by OS processes is protected from access by other apps (for your privacy and protection), and memory used by an app cannot be used by another app, unless it is shared memory space.

So when an app sends an instruction requesting to access memory space it does not have authorisation to, or when the OS itself attempts to access an address that it can’t (especially addresses that point to hardware devices) … it crashes. An app crash just brings you back to your desktop, but an OS crash usually leads to the famous Blue Screen Of Death (BSOD).

These days, OSes are better at handling crashes. If the crash occurs in the window management system (the part that lets apps create windows on screen and icons in the taskbar), Windows can often just restart it without restarting or touching the rest of the OS. But if it happens in a critical part that can’t be restarted by itself, then … BSOD :)

Issue summary: An app crashes when it encounters a situation it can’t handle, or when it attempts to perform an operation that is disallowed by the operating system.

Definitely oversimplified for ease of understanding, but I see no point going into the technical details unless a future issue calls for it.

Before going into app hangs, I’ll need to talk about threads first. If you have heard of multithreading before, yep I am going to talk about that next issue!

What I’ll be covering next

Next issue: [LMG S8] Issue 102: Threading

“Many hands make light work” is true for computers as well, and I’ll go into more detail about how a computer uses its many hands to speed up the work it does :) Before that, let’s examine the simple case of an app doing only one thing at a time: the single-threaded app.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
a password hash? [Issue 63]
a driver file and why do I need one? [Issue 98]

Issue 100: Where does all the app data go? A look at Windows systems

2020-12-26T08:00:00+08:00

Previously: MacOS, Linux, and other similar systems treat everything as a file, organised into appropriate subfolders.

Previous issue: Mac- and Linux-like systems. Since the key points are so short, let’s summarise:

On Mac-like systems, the top-level folders are

an /Applications folder for apps
a /Library folder for shared files (see Issue 17), but on Mac this extends to things like sounds, profile pics, colors, …)
a /System folder for, well, you know what.
a /Network folder for accessing resources on the network (such as shared folders)
a /Users folder for accessing user folders and files
a /Drives folder for accessing other storage devices (e.g. USB drives)

Linux systems are similarly divided, but into differently named folders.

This issue: (A rant on) Windows.

Windows-like systems (who am I kidding, there’s only Windows)

Unlike Mac-like systems, where all data comes in the form of a file, Windows systems recognise two types of data: settings, and files.

Files in Windows systems

Apps get put in C:\Program Files or C:\Program Files (x86), for 64-bit and 32-bit programs respectively¹
Library? Shared files? Ha! (I’ll talk about this further down)
System files go into C:\Windows
Network resources, well … don’t really have a … well they are a different category of location that does not start with a drive letter and instead starts with \\, unless you assign these locations to a drive letter, then they have a drive letter. Sorta.
User files go into C:\Users
Other storage devices are auto-detected and assigned a drive letter, though not always consistently.

And now we talk about settings.

Settings in Windows systems

Settings are stored in the C:\Windows\System32\Config\ and C:\Windows\Users\Name\ folder, which technically makes them system files, which … wait, how are apps supposed to access them then?

App developers are supposed to do it through a system library, which provides variables named like ApplicationData.LocalSettings, ApplicationDataCompositeValue, and RoamingSettings. These variables let developers store and retrieve settings, which all end up stored in a system known as the Windows Registry. And Administrators can edit them using something known as the Registry Editor.

The Windows Registry consists of 5 top-level areas (known as hives), each one beginning with the word HKEY_:

HKEY_CLASSES_ROOT is for storing application settings, and file extension information (e.g. which app to use to open each type of file extension)
HKEY_CURRENT_USER is for storing settings and configuration specific to the current (logged in) user
HKEY_LOCAL_MACHINE is for storing settings and configuration common to all users (e.g. default settings)
HKEY_USERS is for storing settings and configuration of each user. The HKEY_CURRENT_USER data for all users is stored here, and copied to HKEY_CURRENT_USER when they log in.
HKEY_CURRENT_CONFIG is for storing information about the computer’s configuration and resources

The usual way of finding out how to modify a particular setting for X is to google “registry setting for X” and proceed from there.

Logos, backgrounds, buttons, and other application data? They go into C:\Program Files (or C:\Program Files (x86) if still 32-bit) for traditional Windows apps, or into C:\Program Files\WindowsApps for Windows App Store apps. What if other apps also need to use them? Then they go into ~~C:\Library, just kidding, if only it were so easy~~ C:\Program Files\Common Files, but you’ll notice it’s pretty empty. Usually, they’ll be stored within the app’s folder, and you have to find out where to edit the Windows Registry so other programs know where to find them (apparently you can look in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs). Whoa, wait, what happened to apps not messing around in each other’s folders—

OKAY MOVING ON—What if a user installs a program that they don’t want other users using? It goes into C:\Users\username\AppData\. User settings? They go into—nah, they don’t go into a file, they’re supposed to be settings so they go into the Windows Registry somewhere under HKEY_CURRENT_USER. Temporary files? They go into C:\Windows\Temp; isn’t that a system folder? Well yes, but if you put it in C:\Temp folks will complain and Disk Cleanup will not find it.

So to uninstall a Windows app, you run its uninstaller. Which may or may not work perfectly. Or it might remove its files but still appear in the Program List because it did a terrible job cleaning up its settings in Windows Registry. So you reinstall the program, this time using a third-party app that helps you track app installations and registry changes, so that it detects what new files/settings it creates, and then when you uninstall the program you do it through the third-party app so that it hopefully removes all traces once and for all.

Phew. And that’s all I hope.

Issue summary: Windows systems categorise data into two types: files, and settings. Files are stored under an appropriate subfolder in C:\, while other storage devices and network locations are stored elsewhere or given their own drive letters. Settings are managed through the Windows Registry, which is stored in C:\Windows\System32\Config\ and C:\Windows\Users\Name\.

Okay so this ran much longer than I expected. In fact, it ran so long that I split it into two issues. I promised to explain computers as simply and jargon-free as possible, and I hope I have managed to do that. I am definitely biased, and that I do not apologise for, because this newsletter issue would be half its original length if *muttering* some operating systems would just follow sensible principles that other operating systems have no problem following …

What I’ll be covering next

Next issue: [LMG S8] Issue 101: Why do apps crash?

Moving on from app files and settings, the next few issues will explore common app problems. Coming up next issue: why do apps crash?

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
a password hash? [Issue 63]
a driver file and why do I need one? [Issue 98]

I really was hoping not to have to explain 32-bit vs 64-bit programs ever since Issue 55), so for now let’s just say 32-bit programs are for 32-bit CPUs and 64-bit programs are for 64-bit CPUs. Unfortunately many old 32-bit apps have not caught up with the times and converted themselves to 64-bit apps, so Windows has to do hacky stuff to make old 32-bit apps work on modern 64-bit CPUs. ↩

Issue 99: Where does all the app data go? A look at Mac-like systems

2020-12-19T08:00:00+08:00

Previously: Apps generally handle three categories of files: its own (permanent) app files, (shared) user files, and (ephemeral) temporary files.

What we are here to find out is: where do these apps keep their data, and how can we get rid of them (if we really want to)?

Web apps and mobile apps will not be discussed here, because they are much more heavily sandboxed, everything gets confined into the app’s little prison, and we generally don’t have these concerns when it comes to them.

Just one little niggle …

I am so sorry to burden you with this otherwise unrelated information, but since there are a significant number of Windows users and a significant number of MacOS users, I had to bring this up at some point.

Windows and Mac manage this differently, so we are going to have to talk about two different kinds of systems. I will spend more time on the Windows system, because it needs more time.

Let’s get the easy one out of the way first: this issue deals with …

Mac-like systems¹

On MacOS (and Linux) systems, everything is a file. All files get stashed into some kind of folder.

I personally prefer this because you have everything sorted into sensible top-level folders². Mac has:

an /Applications folder for apps
a /Library folder for shared files (see Issue 17), but on Mac this extends to things like sounds, profile pics, colors, …)
a /System folder for, well, you know what.
a /Network folder for accessing resources on the network (such as shared folders)
a /Users folder for accessing user folders and files
a /Drives folder for accessing other storage devices (e.g. USB drives)

and then, similar to Linux, it has /bin, /etc and other weird-looking folders that we don’t need to worry about at this point. Just treat them similar to system files and try not to touch them. The apps that we install generally do not clutter up these folders unnecessarily.

Logos, backgrounds, buttons, and other application data? They go into /Applications. What if other apps also need to use them? Then they go into /Library. What if a user installs a program that they don’t want other users using? It goes into /Users/username/Applications. User settings? They go into /Users/username/Library (under a subfolder for the app). Temporary files? They go into /Library/Caches.

Linux systems are similarly divided, but into differently named folders. Everything is still a file, and belongs in some folder somewhere.

So to uninstall an app, you remove its files from /Applications or /Users/username/Applications and from /Library, and that’s usually it. Apps are usually quite good at doing that themselves, so you don’t need to worry.

And then we deal with Windows systems in the next issue.

Issue summary: MacOS, Linux, and other similar systems treat everything as a file, organised into appropriate subfolders.

I’m keeping this issue short because the next issue will be much longer. *Ominous music plays*

What I’ll be covering next

Next issue: [LMG S8] Issue 100: Where does all the app data go? A look at Windows systems

The reasons for the difference between Mac-like systems and Windows systems is, again, historical, but I better prepare you because you are not going to like the next issue.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
a password hash? [Issue 63]
a driver file and why do I need one? [Issue 98]

The technical term is “Unix-like systems”, but we don’t need to know that, even if all the Unix fanfolks are pointing pitchforks at me now. ↩
There’s lots of things that make less sense once we get into more detail, but fortunately we don’t do that here. ↩

Issue 98: Temporary files

2020-12-12T08:00:00+08:00

Previously: A laptop app can do practically anything, if it is running through the Administrator/root account. Sandboxing is carried out through permission control.

Thus far, I’ve summarised the salient differences between web apps, mobile apps, and laptop apps (my own terminology). I think we can move on to talking about their similarities.

There is a category of app that needs access to your device storage. It might be Youtube trying to download (part of) a stream onto your device for playback. Or it might be Tiktok trying to help you record a video for uploading. These apps need storage access so they can stash all the data into files, rather than hogging device memory with it. The same way we stash things into drawers and cabinets when we don’t need them, so they don’t clutter the space around us. And then we forget about them until we run out of space 😅

You almost never see where those files appear. They get hidden … somewhere *gesticulates around*.

Temporary files

These files are short-lived; they typically don’t stick around for more than a few days. For that reason, they are known as temporary files.

In a webapp, the browser generally stores these files into one of its allocated folders, somewhere in C:\Users\[USERNAME]\AppData\Local\Google\Chrome\User Data\Default\Cache or the like. They get cleared when you clear your browser cache. On a mobile or laptop app, the operating system designates a space for temporary files, in C:\Windows\TEMP or AppData\Local\Temp for Windows, or /tmp on Linux. You can clear those files through Disk Cleanup in Windows.

In general, temporary files are things you are not supposed to think about too much. The operating system has ways to clear them regularly. Apps are supposed to use these standardised locations to stash temporary files, and attempting to place them anywhere else is considered impolite, like leaving your stuff lying around in an office or otherwise public space.

User files

But then, sometimes the app produces useful data that you want to keep around! Your journal which you keep in a Word document, photos of your cat or dog, and the copious, copious video files …

Webapps have no space for you to do that. You are just supposed to save them onto your phone or laptop; the browser has no way for different users to stash their own files.

Most smartphones assume they are going to be used by a single user, and you just stash those files directly into phone storage. Not the best system, but it is what it is.

Laptops are where it gets a bit more interesting. Most laptop operating systems (OSes) assume they might be used by multiple users, each on their own account (hence the login screen), and therefore allocate separate spaces where each user may keep their stuff, inaccessible to other users except Administrators (Issue 97)).

On Windows, these users each have their own folder in C:\Users\; on MacOS, that’s in /Users/; on Linux, it’s typically /home/. (Don’t ask about the \s vs /s; its one of those things that’s just the way history happened and has no real technical reason behind it.)

App files

Of course, each app needs to have its own space to keep its own files, which allow it to do what it does.

Webapps get their own folder somewhere in C:\Users\[USERNAME]\AppData\Local\Google\Chrome\..., they can only see what is in that folder, and they cannot see what is in the parent folder, or sibling folders. It’s sandboxing, again (Issue 92))!

Mobile apps get stored into /data/app or some similar folder, and you’re not supposed to think too hard about where, because of *handwaving* sandboxing. The same idea applies: The app is not supposed to know, or be able to see, where other apps store their data! Eyes on the app’s own data only, and the user’s data (you did give it permission to access storage, right?), and any temporary data which it has created.

Laptop apps get stored in C:\Program Files, and interestingly enough have some kind of civil arrangement where they agree not to delete each others’ files, although antivirus programs have this passive-aggressive low-key thing where they like to mark each others’ program files as potential malware *shrug*.

System files

These files were around ~~in the beginning of time~~ when the OS was installed; that means when you bought your laptop, they were already there, and any sensible system would prevent non-Administrators from mucking around with them. Windows stashes them in C:\Windows, while MacOS and Linux store them in /bin/, /lib, and various similarly opaque folders.

I might go into more detail about these, possibly in a future season when I talk about operating systems, but for now we are done talking about categories of files. Phew!

Issue summary: Apps generally handle three categories of files: its own (permanent) app files, (shared) user files, and (ephemeral) temporary files.

In reality, there are a whole bunch of different filetypes and other little details that apps need to worry about, but this is a newsletter for layfellas so let’s start simple.

What I’ll be covering next

Next issue: [LMG S8] Issue 99: Where does all the app data go? A look at Mac-like systems

I separated temporary files and user files in this issue. Temp files are files that come and go, like stray cats, while user files are shared with other apps as well and there’s really not very much that you can predict about them except hope users don’t do anything too crazy. System files are strictly off-limits so don’t even think about that.

But meanwhile, as an app developer, even after you exclude the above categories of files, there is still a whole bunch of questions you kinda need to worry about at some point:

where do I store my logos and backgrounds and buttons and other data?
where do I keep my settings?
where do I keep user settings?
if my program is meant to make a USB device useable, where do I drop the driver files? (Yep, that’s gonna need its own issue)
what if someone uninstalls my program but I want them to be able to keep their settings around in case they decide to reinstall and then it can feel exactly the way they left off?

Yeah I have no idea who actually thinks about that last question either, but it’s something that seems to get asked in every software uninstallation *shrug*.

So let’s get into that in the next issue.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
a password hash? [Issue 63]
a driver file and why do I need one? [Issue 98]

Issue 97: Laptop apps

2020-12-05T08:00:00+08:00

Previously: Mobile apps are sandboxed by the operating system. As a result, they have to bundle all the libraries they need, and are not allowed to share libraries with other apps. This results in mobile apps with huge filesizes.

This issue is going to be short, because laptop apps are … well, almost unlimited in what they can do.

Web apps are sandboxed by the web browser. Mobile apps are sandboxed by the mobile operating system (OS). Laptop apps are sandboxed by hte desktop OS (yup, Windows on laptop and desktop is practically the same).

The main difference here lies in the difference between a mobile and desktop OS. Mobile OSes do not allow mobile apps to share libraries with other apps, and restrict their privileges (Issue 96)). Desktop OSes, on the other hand, let you do anything that is computationally possible … if you have permission.

The Admin account

A desktop OS often has an all-powerful user, known as the Administrator (Windows/MacOS), or root user (Linux). This user does not need permission to do anything. But with great power comes great responsibility, and with an admin account it is all too easy to do something that renders the computer unuseable.

So lower-privilege accounts exist—these are the user accounts. Logging in as a user gives you limited privileges: often you cannot change OS files, install or remove apps, or do anything risky. This is, for the most part, how desktop OSes sandbox the computer environment from damage by other apps.

What an admin can do

So what happens when you run an app on an admin account?

This app can:

Edit, delete, rename OS-related files
Create new “virtual” (emulated) hardware devices, and manage drivers for it
Send data to any device, or receive data from any device
Make changes to storage devices, including the disk where the OS itself is installed (but not the partition¹ where the OS is installed)
Run programs in the background
Send data over the network to any IP address, over any port (Issue 33))
Prevent other programs from doing so (Issue 34))
Install libraries that can be used by other programs
Access OS settings and make changes that affect OS operation
… and many more things!

User Account Control

What if a user needs access to some of these permissions (but not all)? Does that mean they need to become an Admin?

Windows, and other OSes as well, usually have some way to give users limited permissions for some tasks. Windows uses User Account Control, which pops up a dialog box to alert the user. If the user gives permission for the app to proceed, then it is able to do so. If it is running on a user account, it can only perform tasks that the user account is allowed to perform. Linux uses the concept of groups; for a user to have permission to access bluetooth, for example, the linux OS often requires the user to be added to the bluetooth group in the OS.

Issue summary: A laptop app can do practically anything, if it is running through the Administrator/root account. Sandboxing is carried out through permission control.

This is the reason why you should still buy a machine with a desktop operating system if you plan to be doing anything really productive; the sandboxing systems of web and mobile apps ultimately still impose a significant limit what you can do with the device. This is intentional; it is done for your safety! But if you want your device to do more, you’ll often need to override these “safety limits”, and that is where desktop operating systems come in.

What I’ll be covering next

Next issue: [LMG S8] Issue 98: Temporary files

Next issue is going to round off this mini-arc on how different kinds of apps operate. In the process of doing whatever it is they do, apps often generate temporary files that can be safely removed. How does this work for web apps and mobile apps? And for laptop apps?

From there, I’ll expand to talking about how apps store their data on and retrieve their data from the operating system.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

I’ll talk about partitions in a future issue, when I move on to hardware devices ↩

Issue 96: Why are mobile apps so large in size?

2020-11-28T08:00:00+08:00

Previously: Mobile apps, unlike web apps, can bundle resources and libraries to be installed to a mobile device. They can also request access to storage, and typically have a higher memory limit than web apps.

In the process of creating an app, a developer often needs to use libraries (Issue 17)), which are ready-packaged chunks of code she can run without having to write the code herself. Some provide core functions (e.g. sending information via the Internet, or checking if a data connection is available), while some provide optional features (e.g. mobile payments, or user feedback systems).

Libraries that provide core functions are typically provided by the operating system (OS), which ultimately controls the device’s resources, but all else have to be pulled in by the app, somehow.

Web apps vs mobile apps

Web apps, which I covered in Issue 94), pull in any resources they need through web requests. This includes any libraries that they need. The browser allows it to do this, but prevents access to most parts of the operating system, and allows limited access to camera, sound, storage, etc. The sandboxing features of the browser make web apps generally safer to access.

A mobile app, on the other hand, is sandboxed by the operating system. Most of the resources it needs have to be present at the time of running the app, and that includes libraries. In an unsandboxed environment, commonly used libraries (e.g. mobile payment libraries) could be installed in the OS and shared by the apps. But this opens up a means of unauthorised access to multiple apps: hack this library successfully, and all other apps on the OS are also affected!

The sandboxing system in a mobile device does not allow this. Each app must bundle all the libraries it requires, to be installed into storage after downloading. This way, if an app has one or more libraries compromised, it would at least not expose the user’s data in other apps.

Libraries in a mobile app

The tradeoff to separating all these mobile apps and preventing sharing, is that each app now comes with its own copy of all the libraries it needs. And the file size can really add up—you already see it in the huge app sizes. I unbundled the installation package of a popular shopping app, Lazada, just to see what is inside it.

The list is way too long to post as an image, or even as text; it has over 300,000 code functions bundled inside! I should note that I am not an Android developer and can’t tell you very much about whether these libraries are absolutely necessary, but here are some noteworthy libraries included that I can make an educated guess about:

Libraries to translate code from one programming language to another (often to translate an easier language into a faster or better-supported language)
Compatibility fallback & device evaluation libraries (probably for devices on older Android versions)
Layout libraries (for calculating placement of window frames in devices with different screen sizes)
Graphics, media, augmented reality (AR) libraries (for graphics rendering, video playback, capturing images from camera, etc)
System, network, version detection and updating, etc
Animation libraries (I see one from AirBnB)
Analytics libraries (to track user and ad engagement, and do A/B testing)
Debug, crash reporting, logging libraries (for troubleshooting app crashes)
Integration libraries (for login using FB and other accounts)
Maps, location
Search, image search, QR code, user feed, and related services
Database access (it seems to use Google Firebase, in addition to others)
Mobile payments

and then there is the app itself, which contains code for:

Address validation, checkout, delivery
User feed and homepage, login management, recommendations
Search

and many others which I don’t know about.

Why aren’t web apps so huge then?

They are! But they don’t need many of these (e.g. code translation and compatibility), and most of the libraries in 1–13 would have been loaded separately from the main page (see Issue 78) for a visual example). Much of the functionality would not need to be loaded or installed upfront, only when it is required (e.g. map display).

For a web app, many more functions would also have been offloaded to Lazada’s servers, such as address validation. On a mobile app, this code is included upon installation to reduce data usage.

More importantly, you have little idea how large a web app really is, since you are never shown its filesize anywhere ;)

Issue summary: Mobile apps are sandboxed by the operating system. As a result, they have to bundle all the libraries they need, and are not allowed to share libraries with other apps. This results in mobile apps with huge filesizes.

It’s worth thinking about what this says whenever we hear about so much data being transmitted over the internet. Much of this data is actually duplicated data (for security of inefficiency reasons), or metadata (for data management), or overhead data (because of the way the data is packaged). Just like Amazon packaging!

What I’ll be covering next

Next issue: [LMG S8] Issue 97: Laptop apps

Finally we can move on to more fully explore the complexity of apps that integrate more closely with the operating system: laptop apps!

This should be enough of a primer before I go on to talk about where all this app data goes, and then about app installation and uninstallation (and hence strike out another “sometime in the future” question, woohoo! 🙌)

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 95: What’s in a mobile app?

2020-11-21T08:00:00+08:00

Previously: Web apps require the browser to request memory on their behalf, and thus their memory usage shows up under the browser process in the OS Task Manager. Web apps use this data to store a more convenient (but larger) representation of the webpage document, and to store the data needed by the app.

“Why use the mobile app when there’s already a website?”

“Why even have a mobile app that looks almost exactly like the website?”

I’m not going to answer from an aesthetic or user experience point of view, you’re all experts in your own preferences :) Instead, I’ll focus on whats actually under the hood in this newsletter issue.

If you haven’t read Issue 93) on web apps (a.k.a. websites), I’d recommend you do that for the full context, because in this issue I’ll highlight some key differences between web apps and mobile apps.

Resources

Web apps have to request every single image, video, non-text object on the page via a web request. Caching (Issue 39))—storing these resources for offline use—can reduce subsequent load times, but the first load will still be the most painful.

A mobile app can package the most common, unchanging resources (logos, button images, backgrounds, etc) into the mobile app itself, so they can be loaded directly in the app, without having to make a web request and wait for the response. This lets it load faster (theoretically … in practice, many apps still have to retrieve other data from the server, so the loading speed improvement is marginal)

Flexibility

The document object model, or DOM (Issue 94)) is how web apps keep track of all the elements and their contents on a page, but it is not the most efficient way to do so. A mobile app has more choice in deciding which user interface library (Issue 17)) it wants to use.

Storage access

A mobile app can request permission to access storage on the mobile device, allowing it to store files (images, data, …) on the device without having to interrupt the user each time. It is not limited only to browser storage interfaces (localstorage, sessionstorage) and browser databases (IndexedDB)—see Issue 93)—but can use other kinds of interfaces and databases, if they are available on the mobile device, or bundled into the mobile app.

Memory use

Both Android and iOS impose a memory limit on each app that is running. And they treat a mobile browser as a single app, despite all the web apps running inside it. So a web app has to share that limit with all the other web apps running in the mobile browser (which is why your tabs have to reload so often—they are also cleared often!).

On the other hand, a mobile app can have that per-app limit all to itself.

All told, a mobile app has more resources, which it requests directly from the OS instead of via the web browser, and it has more freedom in using those resources. How so? And more importantly, why are some mobile apps just so darn huge?

This and mobile app sandboxing explained next issue.

Issue summary: Mobile apps, unlike web apps, can bundle resources and libraries to be installed to a mobile device. They can also request access to storage, and typically have a higher memory limit than web apps.

Mobile apps are a bit of a weak spot for me since I haven’t had as much experience here as I had in other areas, but nonetheless the limits of sandboxing are pretty visible. For the most part, we have accepted this tradeoff between size and security since storage space became much cheaper. But this tradeoff is also apparent not only in software, but also in business management and other areas: to increase security, we often also have to increase bloat.

What I’ll be covering next

Next issue: [LMG S8] Issue 96: Why are mobile apps so large in size?

Remember the days when most apps we downloaded and installed on a laptop were 2MB or less? Today, mobile apps are many times that size. This is partly because of the way sandboxing is done for mobile apps. How so? I’ll go into more detail next week ;)

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 94: Why do web browsers take up so much memory?

2020-11-14T08:00:00+08:00

Previously: Web apps have limited access to the device’ storage, and can only store data in browser-managed databases. Progressive Web Apps (PWAs) can additionally register service workers that run in the background. Because they are so cleanly sandboxed, they can be easily removed by clearing the browser cache and storage, and deregistering any service workers manually.

Apps once lived on a computer. You double-clicked them or pressed Enter or right-clicked Run, a window pops up and a rectangle appears on your taskbar. If Task Manager is showing you that it is using up gobs of memory, you just End Task and the rectangle disappears. Later in this season I’ll say more about what it is like to live in a computer, but we are on web apps now.

Web apps are somewhat more complicated. Because they are so tidily sandboxed (Issue 92)), they cannot actually live on your computer. Instead, they live in your browser.

Living in a web browser

Living in a computer means that the operating system (OS) takes care of you; it gives you the memory and disk space you need, gives you CPU time to run your processes, and gives you access to devices (such as the screen and keyboard).

When you live in the browser, the browser takes care of you. Everything you need is requested from the operating system (OS) by the browser. The space that the web app uses in localstorage, sessionstorage, IndexedDB, and the cache, is space that the browser requested. The CPU cycles that the web app needs are cycles that the browser has requested.

When you open the OS Task Manager, where do these requests show up? Under (one of) the browser processes, naturally. If you have more than 20 browser tabs open for more than 5 apps, it shouldn’t be surprising that they are using a lot of memory; I’ll go into why shortly. More worryingly, that’s not helpful if you’re trying to figure out which browser tab to close so you can play your memory-consuming video game.

Only the browser has that information; you will have to open the browser’s Task Manager (another google away) to see that information.

A web app’s needs

I have a browser tab open now, with a Google Sheet loaded. What is the Google Sheet app doing on that page? Let’s open DevTools and find out.

DevTools has a really cool tab labelled “Memory”, and it has a nice visual depiction of what the app is doing with all that memory:

DevTools in Firefox lets you inspect the memory that web apps use.
The Memory tab shows what is stored in memory.
Most of the memory here is being taken up by javascript objects.

Javascript objects here are Javascript’s own internal representation of data, which is quite similar to a document database’s format. Altogether, they take up 32 MiB of memory space (difference between MB and MiB is covered in Issue 40)). Google Sheets is juggling a lot of data internally, data which is not stored in IndexedDB or localstorage!

other, taking up 15 MiB, seems to be pointing to a javascript library that Google Sheets is using to render the spreadsheet.

strings are simpler than objects, each one representing a snippet of text, or possibly even a number. They only take up 5MiB. scripts are the internally stored scripts that the page is executing; they take up 11 MiB.

domNode is where it gets interesting. We are used to seeing HTML documents (Issue 50)) as a plain text document with lots of formatting, but in a browser it becomes more than just text. Each part of the page, an HTML element, can have its properties changed by Javascript as the page reacts to new data, or to user input.

The HTML Document Object Model

It would be too computationally taxing to keep scanning through the text document to figure out which part of the page is meant to be changed. Instead, the browser has its own way of storing the hierarchy of elements: each menu option falls under a menu heading, which falls under the navigation bar, which falls under the header, which falls under the main document, and so on. If each browser had its own way of doing that, a web developer would have to learn all of them to make a webpage that worked across all browsers; that’s terrible!

Instead, the web standardised on one way of doing so: the HTML Document Object Model (DOM).

Internally, a web browser converts the HTML page into a DOM—a data structure that makes it easy to find the specific HTML element (or elements) that need to be modified by each function. The HTML DOM for the page I’m on takes up 14 MiB, which may sound like a lot, until you remember that each element also has associated metadata stored along with its content. And Google Sheets has lots of elements!

Issue summary: Web apps require the browser to request memory on their behalf, and thus their memory usage shows up under the browser process in the OS Task Manager. Web apps use this data to store a more convenient (but larger) representation of the webpage document, and to store the data needed by the app.

And that is how a web app uses up 84MiB of memory space. If you have multiple tabs running the same app (e.g. multiple Google Sheets open), some of the memory can be shared (Issue 84)) by these tabs (e.g. scripts), but otherwise each tab is going to have its own memory needs.

In the earlier days of the internet, when spreadsheets were still a separate app, this memory usage would have showed up in the OS Task Manager under Lotus 1-2-3, Microsoft Excel, or some other spreadsheet program. Today, it shows up under Chrome or Firefox, and the details are only inspectable through the browser’s Task Manager.

No wonder browsers get all the blame these days.

What I’ll be covering next

Next issue: [LMG S8] Issue 95: What’s in a mobile app?

This issue felt like a data dump; I know it’s a lot all to take in 😅 In my childhood, I had access to lots of books with these cutaways showing the inner mechanisms of devices of all sorts, and I loved those books. It’s rather harder to do the same with software, since there’s nothing to physically slice through (even if only in the imagination!) I hope that the screenshots in this season of LMG will help you imagine the inner mechanisms of apps. Let me know if it’s working for you, and if there’s anything you’d like to see :)

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 93: What’s in a web app?

2020-11-07T08:00:00+08:00

Previously: Sandboxing is a catch-all term for the concept of ensuring apps don’t have access to resources outside of their privileges. Sandboxed apps are generally safer than non-sandboxed apps in terms of security, and easier to manage, terminate, and uninstall.

The simplest apps we use do not generate data; think about your calculator, which simply crunches calculations and displays the result (conveniently ignoring the ones with memory slots for storing calculated values …). Or currency converters, or timezone converters … no storage needed.

So how do web apps store data?

Data storage in web apps

As a web-first programming language, Javascript programs were not expected to have to access, open, or create files on the device. That would make it really easy for a malicious script to download malware to a folder, where it could be accidentally invoked! Instead, it relies on other features to store and retrieve data for specific purposes:

To remember user logins (the “Remember me” feature you see on almost every login screen), web apps can set/unset cookies (Issue 69)) in the browser.
To obtain files for use, the web app can invoke a File Select dialog for the user to choose a file, such as for uploading to the server. The web app is not allowed to access arbitrary files this way.
If data needs to be provided to the user in the form of a file, it can be stored on the disk with the user’s permission through a download dialog.
The web app can store data through a browser feature called localstorage. This is a key-value database, managed by the browser, that allows you to store data (the value) tagged to a key. The same way a hotel lobby holds your luggage for you and lets you access it through a luggage tag, or the way you can rent a locker for storing your stuff (value) and access it through the locker key.
For data that is only needed in that tab (e.g. partially filled form data), and can be safely deleted when the tab is closed, the browser provides sessionstorage. This works similarly to localstorage.
For more significant amounts of data, web apps can use IndexedDB, a more advanced database also managed in the browser. It is a document database (Issue 88)), with each document tagged to a key in a key-object system.

DevTools in Firefox lets you inspect the data that web apps keep.
The Storage tab shows what is stored in cache, cookies, IndexedDB, localstorage, and sessionstorage.
IndexedDB, localstorage, and sessionstorage are key-value databases that store the data (value) tagged to a key.

Requesting and receiving data on a server

So a web app doesn’t much in the way of storage access, but they were not designed for that at all. Most of the heavy lifting is not meant to be done in the browser, but elsewhere, on a server. A web app would send heavy workloads to a server (typically owned by the same company) through an API (Issue 4)) through a web request (Issue 9)), and receive the results through a server response (Issue 8)).

A web app would also need resources for display: images, videos, PDFs, … these are requested and received via web requests as well.

Cool, so a web developer can just write Javascript code to get the data and resources it needs, display stuff to the user, wait for the user to interact, and then make more requests to the server to calculate stuff, or send the app more data (such as the user’s tweets or posts or other stuff).

Just one problem with this: if internet connectivity is intermittent or laggy, none of this is going to work! Even when the internet is fine, it makes for a very slow experience. How do we improve this?

One way is to cache (Issue 39)) as many things as possible: header images, logos, emojis, icons, … these can all be stored in localstorage and accessed even when the app is offline.

Some apps, such as Google Docs, will also store user data in IndexedDB for a smoother experience—imagine having to wait for a request-response round-trip to the Google servers for every word you type. The data gets modified in IndexedDB first, and then synced to the servers. If the device gets disconnected from the internet, at least you will still be able to read whatever is in IndexedDB (and if you have enabled offline access, you can even edit the data in IndexedDB, and the Docs app will attempt to sync it to the server once connectivity is restored).

Running background processes

Then what’s with these popups on some websites asking you to install them? And how are some websites actually able to send us notifications? Something has got to be running in the browser background for these to happen, and none of what we have learned so far explains that … what gives?

Enter Progressive Web Apps (PWAs). With some Googling and lots of reading on StackOverflow and other web documents, a web developer can get started meeting the various requirements needed to create a PWA.

And in exchange for that inconvenience, she can use service workers: javascript scripts that run on their own and are not dependent on the browser tab staying open. These service workers can listen for messages from the server, carry out some processing, make requests and receive responses, all independently from the app running in the tab. On mobile devices, they are gradually gaining more features as well, such as access to the Share feature (enabling users to share content with the PWA), and being able to access cameras, microphones, location services, and other things (provided the user grants permission).

Installing web apps

Okay, wait. WHAT?!

I imagine most folks would be okay with cookies, with localstorage and even IndexedDB. You want things from an app, you gotta give it space to work, right? That’s fair.

But service workers, what?! You mean if I click Install, these apps get to run stuff in the background in my browser, even after the tab is closed? If I’m not cool with that, I can just choose not to install and then these service workers won’t get installed, right?

Umm, I don’t know how to break this to you gently, but nope.

Once you visit a site (URL) with a registered service worker, your browser automatically registers it. The only thing the Install button does is to add a shortcut on your Desktop/Home Screen, and perhaps enable some features (such as mobile Share). But the service workers are already there.

Uninstalling web apps

And now the good news.

Because web apps are so cleanly sandboxed, they don’t stick tendrils into your operating system or device storage (beyond the space reserved by the browser, anyway). Removing apps and their files just involves clearing your browser cache and website storage. You will, however, have to deregister the service workers manually; please google for instructions.

Issue summary: Web apps have limited access to the device’ storage, and can only store data in browser-managed databases. Progressive Web Apps (PWAs) can additionally register service workers that run in the background. Because they are so cleanly sandboxed, they can be easily removed by clearing the browser cache and storage, and deregistering any service workers manually.

What I’ll be covering next

Next issue: [LMG S8] Issue 94: Why do web browsers take up so much memory?

I’m finally starting to answer one of the sometime-in-the-future questions below, and can’t wait to get to the meat of “What is involved in installing a piece of software?”; it’ll be a ride! :)

Before I move on to compare web apps with mobile apps, I’m going to take a short detour next issue and answer a question I hear all too often: “Why do web browsers take up so much memory?”

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 92: All about apps

2020-10-31T08:00:00+08:00

Previously: Depending on what you need a database for, there may be online database platforms that can manage and automate much of the work for you. Airtable, Smartsheet, Knack, and Zoho Creator are just 4 of many options that offer an easier way to set up and input your data, then access them through apps or other means.

There was a time when apps, short for applications, were these files that came on a CD or diskette, usually ending in .exe (if your settings enabled you to see file extensions). You double-clicked them or pressed ‘Enter’, and things happened.

Today … the idea of an app is more nebulous. The .exe files are still there, but now there are also apps that you install from the app store. And what’s up with webpages that display an app installation pop-up and create an icon on your home screen or desktop? Are they the same kind of app? If not, what’s the difference between them?

Before I answer that question, we need to talk about an important concept called sandboxing.

Why do apps need to be sandboxed?

An unrestrained app running in your operating system would have access to any and all resources on that machine. It could potentially modify or remove system files, halt running programs or accidentally overwrite their memory contents, and so on.

The operating system) takes some measures to minimise or prevent this. But even within these limits, a app developed with good intentions could still potentially cause trouble. It might accidentally hang on to audio access when it no longer needs it, make a mess of the user’s home folder, leave files in limbo, etc. Programs can leave a mess, and a way to limit this mess is very welcome.

This is why sandbox systems were already being researched as early as the 1970s, and are still an ongoing research interest at many institutions.

What is sandboxing?

If you are old enough you might remember playing in a sandbox. While there are usually no explicit rules about how to play in a sandbox, there is usually one unspoken rule:

What was in the sandbox, stays in the sandbox.

A sandbox limits the mess, yet gives you unrestrained freedom within that box.

In a computer, a sandbox system imposes restrictions on running applications. Some common restrictions include:

only being able to access/change/delete files within a particular subfolder
only having access to some OS resources (e.g. not being able to access USB devices, or audio, or webcam)
having limited privileges while the application is not active (e.g. no internet access when not being directly interacted with, and just running in the background)

These restrictions are intended to limit any damage and keep the system more stable than it might otherwise have been.

So one way to categorise laptop apps, mobile apps, and web apps is in the amount of sandboxing they are subject to.

Issue summary: Sandboxing is a catch-all term for the concept of ensuring apps don’t have access to resources outside of their privileges. Sandboxed apps are generally safer than non-sandboxed apps in terms of security, and easier to manage, terminate, and uninstall.

Keeping it vague in this issue because the details really differ between operating systems, kinds of apps, and even the way they are distributed. We’ll be digging into details next issue, starting with the most heavily sandboxed app: the web app.

What I’ll be covering next

Next issue: [LMG S8] Issue 93: What’s in a web app?

From just little snippets of script that animated buttons and counted visitors back in the 90s, Javascript now powers a huge portion of the Internet, processing payments, serving ads, and much more besides. We’ll look at how this gets packaged into a web app next issue.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 91: Commercial database alternatives

2020-10-11T11:00:00+08:00

Previously: A URI (Uniform Resource Identifier) is required to connect to a database. This URI can be provided by a hosting service provider that runs your own database for you, or by a cloud service provider that runs your database on their platform.

So you’re running up against the limits of a spreadsheet and want to do more with the data inside it. Databases sound cool and kind of like what you want right now. But writing a whole app and setting up the database yourself, or even getting someone else to do it and checking their work … it all sounds like so much.

What to do?

Airtable

Airtable is a database-like platform that lets you set up Bases (similar to databases), which can contain different tables for your data. You can specify a specific data type for each table, limit entries to a list of options, and even create lookups (match the value here with a column in another table, and return data from another column in the same row).

Just as databases don’t have a single canonical view, and everything depends on queries, Airtable also lets you create different views of your data. You can set it up as a list, a gallery, a job status board, and filter it as you like.

Interestingly, Airtable also dynamically generates an API for each of your bases, so that apps you create have a way to retrieve, modify, or delete data from the database. That saves you a lot of trouble having to set up your own database, for simple needs.

Smartsheet

Smartsheet is another platform that lets you create sheets with different views. Unlike Airtable, is leans more heavily towards workplace workflows, with built-in task management features and integration with many services. If you are already using one or more of these services, Smartsheet could be a way to store information for collaboration.

Knack

Knack is yet another database-as-a-platform, which also allows you to craft queries to extract the data you need. It has an interesting feature that lets you specify how tables relate to each other (e.g. Contact connects with one Company, Company connects with many contacts) to improve queries.

Knack also lets you create simple apps with limited access to the data, for employee or customer use. If you mainly need internal apps for disseminating or allowing field access to data, this is probably a simpler option than hiring an app programmer/company.

Zoho Creator

Zoho Creator is a database platform that is more focused on app-building (or so it appears). The database just comes bundled as part of the deal. Another option for corporate operations-focused apps.

Issue summary: Depending on what you need a database for, there may be online database platforms that can manage and automate much of the work for you. Airtable, Smartsheet, Knack, and Zoho Creator are just 4 of many options that offer an easier way to set up and input your data, then access them through apps or other means.

The best thing about these cloud services is that you probably don’t need to learn SQL or other advanced query languages to use them. A passing familiarity with spreadsheets, and time to sit down and watch tutorial videos, is probably sufficient to get started.

What I’ll be covering next

Next issue: [LMG S8] Issue 92: All about apps

I’ve spent a whole season talking about data (Season 4, Issue 40) to Issue 52)), then detoured to talking about computers, and the internet, and now back to databases. I think that’s plenty of foundation to finally move on to something more familiar: apps.

What exactly are apps and what do they do? What are they like under the surface? What makes them tick?

This and more in Season 8 … which will start after a two-week hiatus. It has been really fun putting finger to keyboard and watching everything come together, but I noticed the quality of recent issues has been sliding more than I’d like. I’m going to take a little break to reconsolidate, recuperate, and think about the next couple of seasons.

See you next issue!

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 90: Using a database

2020-10-03T08:00:00+08:00

Previously: Graph databases treat the details of things as secondary, and optimise for managing the network of relationships. A graph database can quickly look up how things are related to each other, and return the results.

At some point in the past, getting a database meant talking to a consultant or contractor, who would then sit with you to understand your requirements, then set everything up for you without letting you touch any part of it. And that is probably for the benefit of you both. But today, for SMEs with some relevant expertise, it is actually possible to get your own database up and running very quickly.

Setting up a database on a server

If you have admin rights to the workplace server (which can be both a blessing and a curse), you’ll have to find the setup instructions that came with the server software (or Google it online). I’m sorry, it is painful for layfolks (and even for many experienced database admins) and there just isn’t an easier way yet.

Registering a database in the cloud

If you do not have admin rights to the workplace server, you usually ask your friendly server administrator to help you install the database and set up a web admin panel for you. They will give you a URL and login credentials for that web admin panel, and you configure the database through the database section of the admin panel.

If your company has decided to do away with organic IT support, your next bet is to outsource that help from cloud services. Each of the major cloud providers provide multiple database types for your perusal. Some app hosting services will also host a database for you (usually intended for app use, but who’s asking?).

Relational databases - Amazon Relational Database Service - Google Cloud SQL - Microsoft Azure SQL Database

Document databases (You will see many of them referred to as NoSQL databases) - Amazon DynamoDB - Google Cloud Firestore (part of Firebase) - Microsoft Azure Cosmos Database

Graph databases - Amazon Neptune - Microsoft Azure Cosmos also has an API (Issue 4)) for graph databases

Getting the database identifier

After you have successfully registered a database (of any type), you will be given a connection URI (Uniform Resource Identifier), which is a fancy way of saying “URL to identify your database uniquely”. It can be a simple line of text, like:

mongodb://mongodb0.example.com:27017

which identifies your database as a mongodb (document) database running on the server at mongodb0.example.com on port 27017. (I covered server hostnames in Issue 29) and port numbers in Issue 33)).

or it can look like:

postgres://myusername:myverylongwindedpasswordwhichisobviouslygeneratedbyacomputerandnotahuman@ec2-52-207-124-89.compute-1.amazonaws.com:5432/d77ila0heea1lk

which identifies your database as a postgres (relational) database running on the server at ec2-56-486-386-34.compute-5.amazonaws.com on port 5432, and your particular database is named d77ila0heea1lk (you can run multiple databases on a single server).

Connecting to a database

This is where it gets a bit trickier.

If you are using another online service that integrates with your database, that service needs to know your URI and its associated information. The service will either ask your for your login/authentication credentials, hostname, and port separately, or ask for it in a single URI, or some mix of the two options.

If you are hiring your own developer (including possibly yourself), you will have to figure out which module you need to connect to the database.

For example, MongoDB in Python: MongoClient('mongodb://mongodb0.example.com:27017')

And for PostGreSQL in Python: psycopg2.connect('postgres://myusername:myverylongwindedpasswordwhichisobviouslygeneratedbyacomputerandnotahuman@ec2-52-207-124-89.compute-1.amazonaws.com:5432/d77ila0heea1lk')

Note: It is considered insecure to simply leave your login credentials in code like that. Please read up on best practices for importing sensitive information from more secure sources in your programming language of choice.

Issue summary: A URI (Uniform Resource Identifier) is required to connect to a database. This URI can be provided by a hosting service provider that runs your own database for you, or by a cloud service provider that runs your database on their platform.

Once you go through the painful process the first time, it gets easier. A lot of engineering work has been done to make this possible: connect to a database with one identifier. URIs are their own fascinating bit of information engineering, definitely not within the scope of Layman’s Guide. It is something to think about whenever you need to identify everything in your office or warehouse with a unique name (think barcode system or inventory/asset management).

What I’ll be covering next

Next issue: [LMG S7] Issue 91: Commercial database alternatives

What if we don’t want to do all of that? Next issue, to wrap up this season, I’ll give you some alternatives that sit somewhere between a full database solution, and a simple Excel/Google Sheets spreadsheet.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 89: Graph Databases

2020-09-26T08:00:00+08:00

Previously: Document databases organise data into documents, each containing a number of field-value pairs. each value can itself be a document, and multiple values/documents can be grouped under a field. Document databases do not enforce data consistency across documents, so those rules need to be managed by the application which is using the database. This allows document databases to continue operating even when partitioned, at the cost of some consistency.

In the past two issues, I laid out how relational databases primarily focus on the relations between tables, while document databases primarily focus on organising data into documents. I’ll look at one more application today.

If I’m trying to start a new social media platform today, I would have to store posts and user account data into a database. Which type of database should I use?

I could use a relational database, but joining multiple tables to get a chain of posts, Twitter-style, could get ugly and involve lots of lookups … that is going to be one laggy service at scale!

I could use a document database, but it would involve retrieving each post one at a time, searching to find posts which are linked to it, and then checking which posts are linked to those posts … that is too many searches!

Maybe I’m approaching this wrong. I don’t need to relate many different types of tables or retrieve self-contained documents here. I am actually trying to store a humongous, densely linked network of data—a graph!

What?

Okay, stay with me here, I know you are thinking of a horizontal and a vertical axis, and axis labels and bars and lines and—that’s not the kind of graph I am talking about.

“In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects.”
— Graph theory (Wikipedia)

That’s what I’m talking about. And it looks like this:

This network graph shows the co-editing patterns on Wikipedia. The size of the arrows indicate the number of Wikipedia editors for one language edition of Wikipedia, who also edited another language edition.
Source: Wikimedia Commons

Okay, phew.

Graph databases: a network of relationships

So if I’m going to make a social media platform that can retrieve chains of posts, how would a graph database make it easier?

A graph database will still need to have some data for the users and posts:

personA:User {name:"Alice"}
personB:User {name:"Bob"}
...
post001:Post {tags:"...", contents:"..."}
post002:Post {tags:"...", contents:"..."}
...

But the heart of the graph database is the data that stores the relationships between those users and posts:

(personA)-[:SAYS_TO {message:"..."}]->(personB)
(personB)-[:SAYS_TO {message:"..."}]->(personA)
...

If I want to lookup a conversation between Alice and Bob, I can search for SAYS_TO relationships with Alice and Bob at either end of the relationship arrow (-->), and sort the results in chronological order.

Graph databases put relationships first

What about posts and comments? For social media, we can treat them as the same type of data (Post), but link them with relationships:

(personA)-[:WROTE {datetime:"..."}]->(post001)
(personB)-[:WROTE {datetime:"..."}]->(post003)
(personC)-[:WROTE {datetime:"..."}]->(post005)
(personD)-[:WROTE {datetime:"..."}]->(post007)
(personA)-[:WROTE {datetime:"..."}]->(post011)
(personB)-[:WROTE {datetime:"..."}]->(post013)
(personA)-[:WROTE {datetime:"..."}]->(post017)
...
(post003)-[:REPLY_TO {datetime:"..."}]->(post001)
(post005)-[:REPLY_TO {datetime:"..."}]->(post003)
(post007)-[:REPLY_TO {datetime:"..."}]->(post003)
(post011)-[:REPLY_TO {datetime:"..."}]->(post005)
(post013)-[:REPLY_TO {datetime:"..."}]->(post011)
(post017)-[:REPLY_TO {datetime:"..."}]->(post013)
...

Because the relationships contain only the bare minimum data for figuring out the network, they are quick to search through. I don’t have to load the names, post tags, post contents, and other irrelevant detail.

Although I would still have to retrieve post001, check for replies, check those replies for replies, and so on, this is much faster with relationships between labels. A graph database optimises for this type of lookup.

Once I have figured out which users and posts are involved in this chain, I can then retrieve their information in a subsequent query. I won’t even need to load all the information at a go, since the app user is not going to see the contents of later posts until they scroll.

Issue summary: Graph databases treat the details of things as secondary, and optimise for managing the network of relationships. A graph database can quickly look up how things are related to each other, and return the results.

So there you go, three types of databases in three weeks. I picked these three because they’re the least technical to give an overview of (in my opinion), and are three different ways of thinking about data that I think you are likely to encounter.

There are, of course, other types of databases: key-value stores (used heavily in web browsers), wide column databases, search databases (very similar to document-based), … but beyond this point the differences are primarily technical, and not really suitable for this newsletter.

What I’ll be covering next

Next issue: [LMG S7] Issue 90: Using a database

I’ve been cracking my head trying to come up with 2 more topics to round up this season on databases. I suppose most layfolks would (hopefully) never ever have to start or run their own database. But it could be helpful to know what is needed to get a database up and running, and the most common ways of getting access to one. Expect a short issue next week.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 88: Document Databases

2020-09-19T08:00:00+08:00

Previously: Relational databases are designed to maintain a well-structured set of data tables through constraint rules. This makes them very useful for preventing accidental inconsistencies in data, but make any changes to the data schema difficult to implement. Changing from one schema to another involves downtime and a migration.

One problem I keep running into with Excel is when I think the data has a consistent structure, but halfway through I realise that it actually doesn’t: sometimes I might have two students with different categories of accomplishments, and that requires a big change in the way I design the columns.

Document databases bypass this problem by not enforcing a strict schema on the data. That is not to say you can’t; it is optional and up to you to enforce.

Document databases: a collection of fields and values

When we think of documents, we usually think of Office documents, or PDFs, or things that are … more associated with the way a workplace works.

These documents are not the ones I have in mind when talking about document databases. In these databases, documents are simply bits of data grouped together. Each bit of data is described by a field. For example, I might start out defining a student document this way:

{
  name: "Harry Potter",
  school: "Hogwarts School of Witchcraft and Wizardry",
  characteristics: "lightning-shaped scar on forehead"
}

I can add more fields later, if I wish:

{
  name: "Harry Potter",
  school: "Hogwarts School of Witchcraft and Wizardry",
  characteristics: "lightning-shaped scar on forehead"
  mother: Lily Potter,
  father: James Potter,
  ...
}

But what makes document databases truly document-oriented is the way they can be nested. Suppose I want to expand a bit more on this student’s education, to include the years of study. I could expand each entry in the school field to include that:

{
  name: "Harry Potter",
  school: {
    name: "Hogwarts School of Witchcraft and Wizardry",
    start: "1991",
    end: "1997"
  }
  characteristics: "lightning-shaped scar on forehead"
}

Yup, now I’ve just expanded the value of the school field into … another document! This document has a name field, a start field, and an end field. I can embed documents just about any place I want.

I can also group multiple values under a field:

{
  ...
  characteristics: ["wears glasses", "lightning-shaped scar on forehead"]
}

I can also group multiple documents under a field. It’s documents all the way down!

Collections: the only way to organise documents

While relational databases have tables for organising rows, document databases have collections for organising documents.

Each collection can contain multiple documents. There is no constraint on what kind of documents each collection can contain.

I could have a collection for teachers containing only teacher documents, a collection for students containing only student documents, a collection for subjects containing only subject documents, … or I could just have a collection for the department containing a mix of all three types of documents.

What can I do with a document database?

Just about … anything? If you can think of a way to organise the data as documents, you can put it into a document database.

A document database lets you find documents based on its fields. I can look up all documents which have a name field, or check that the word “Harry” is in the name field. I could look for students who enrolled in the year "1991" or later, or more specifically students who enrolled in "Hogwarts School of Witchcraft and Wizardry" in "1991" or later.

Drawbacks

Since this is not a relational database, you don’t have the protection of foreign keys and other features that stop you from making the data inconsistent—there’s no concept of enforced consistency here! You’ll have to write those rules into your app when it accesses the document database; the database won’t enforce them for you.

Advantages

Data organised as documents tends to be more self-contained. Since the database does not enforce consistency, it has less to worry about when edits or changes are made to the database. In a distributed document database, we thus sacrifice some consistency—unless we make pains to ensure it in our application code.

This does provide an advantage: when the distributed document database suffers a network outage, causing it to partition into multiple clusters (Issue 86)), the database can continue to operate. However, each cluster only has access to its own data, and not data on the other clusters. Over time, each cluster will become less and less consistent, since changes in each cluster are not synchronised to other clusters.

Once the network issue is resolved and the clusters are synchronised again, these changes can subsequently be merged following rules for resolving conflicts. The database remains operational throughout the ordeal, just with some desynchronisation.

Issue summary: Document databases organise data into documents, each containing a number of field-value pairs. Each value can itself be a document, and multiple values/documents can be grouped under a field. Document databases do not enforce data consistency across documents, so those rules need to be managed by the application which is using the database. This allows document databases to continue operating even when partitioned, at the cost of some consistency.

What I’ll be covering next

Next issue: [LMG S7] Issue 89: Graph Databases

Okay, relational and document databases were easy enough. They are more easily mapped to spreadsheets and file/folder hierarchies, respectively.

But now we go up the abstraction ladder, and get to more abstract ideas of data. In a social network, the user profile is usually the least significant part of the account; what often matters most is how this account is linked to other accounts (followers and following). The study of such interlinked objects is known in mathematics as graph theory (nope, not the kind of graphs we are so used to in reports). This is where terms like “social graph”, the representation of your social network on Facebook or Twitter, comes from.

What is the most intuitive way to represent, store, and modify this kind of graph data? Using a graph database, of course.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 87: Relational Databases

2020-09-12T08:00:00+08:00

Previously: To increase the performance of a distributed database, we can scale up/scale vertically by increasing the computers’ performance, or scale out/scale horizontally by adding more computers. Distributed databases can only prioritise two of the following three factors: consistency, availability, partitioning (CAP theorem).

I’ve already discussed one big strength of relational databases in Issue 84) when I illustrated how the JOIN keyword, one of many SQL commands (Issue 83)), can join our data from multiple tables into a single view. This is where we look under the surface to see what makes that possible.

Linking tables through foreign keys

From Issue 84):

To join the Customer and Sales data so that we get the sales data along with custName, we would write a SQL query like this:

SELECT salesID, orderDate, custID FROM Sales JOIN Customer ON Sales.custID = Customer.custID

Here, Sales.custID refers to the custID of the Sales table, while Customer.custID refers to the custID of the Customer table. This query effectively says “select the salesID, orderDate, and custID columns from Sales table, and add data from the Customer table where the custID column matches”. This will return:

Did you catch the fact that there were actually two custID columns? One in the Sales table, and one in the Customer table … by linking two tables like that, we actually introduce a point of potential breakage.

Suppose one day, a customer goes out of business, or changes name, and the corresponding Customer entry gets deleted. Now if we accidentally attempt to retrieve Sales to that customer, the SQL command will fail because it is unable to find the entry.

We can protect ourselves from this kind of error by declaring Sales.custID as a foreign key in Customer, thus informing the database that Sales.custID is actually a column from Customer. If we attempt to delete that customer again, the database will help to check if that entry is referenced by other tables as a foreign key. Entries can only be deleted if they are not referenced by other entries.

These and other constraints allow us to protect ourselves from inadvertent harm, but over time, they accumulate and make a relational database very hard to modify. Database administrators will tell you to think about your database tables in advance, as even attempting to add a column or change a column type is going to be a pain in future!

The tradeoff: downtime for database maintenance and migrations

To modify a relational database, we have to shut it down¹, and migrate the database from the old schema to the new schema. In essence, we are exporting our data and re-importing it again. Attempting to migrate while the database is active—known as a live migration—is strongly discouraged, as changing a database while a migration is in progress can introduce data inconsistency; a real headache with constraints!

Relational databases can also develop problems that require them to be shut down and rectified. It’s the tradeoff for having a consistent and structured way to store our data, and automated rules to enforce this structure.

Relational databases: excellent for predictable data needs

If you don’t expect to be changing your database schema often, or if you are able to design the schema to minimise such migrations, relational databases can be quite excellent for your needs. Please consult a professional database engineer if you are planning to use a database for your business needs.

Issue summary: Relational databases are designed to maintain a well-structured set of data tables through constraint rules. This makes them very useful for preventing accidental inconsistencies in data, but make any changes to the data schema difficult to implement. Changing from one schema to another involves downtime and a migration.

What I’ll be covering next

Next issue: [LMG S7] Issue 88: Document Databases

Relational databases work well for data that we can imagine as an Excel table. But often, we have data that might not share the same set of properties, or might not have a predictable structure (such as online collaboration data). Such data is more intuitively imagined as a set of documents than as a set of tables. What do databases that encourage a document-based model of data look like?

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

There are ways to avoid this, but I’ll let a real database administrator tell you about how to make it happen. ↩

Issue 86: Distributed databases

2020-09-05T08:00:00+08:00

Previously: Forms that naïvely inject user-submitted data into a SQL query template may end up sending valid SQL commands to the database, with disastrous consequences.

So far, we have been assuming that the database runs from a single computer, and all its data is stored on one as well. What happens when it outgrows this single computer?

We could add more disk space, more memory, more cores on the processor; this is called vertical scaling/scaling up (because we are increasing the performance of the computer, which usually feels like pushing up the performance bar on the vertical axis of a graph).

Or we could spread that database over two or more computers. And keep them constantly synchronised. This is called horizontal scaling/scaling out (because we are adding more computers, which is usually depicted as adding more units on a horizontal axis).

We can only take vertical scaling so far; at some point we will have the most powerful server possible and it still won’t be enough. So if we are expecting massive growth, that means we will need a distributed database.

Wait, who actually expects a database to not have to store a lot of information?

There are tiny databases out there!

These are often used in places where the task is not expected to grow beyond a single PC. For example, the database that stores your WhatsApp messages on your mobile phone, or a tiny database that stores records from a remote standalone sensor. These databases are designed to be extremely efficient at handling small amounts of data, to use very little memory, and/or to ensure that data is always written securely.

Okay, fine. Back to distributed databases

Buying more computers to run a server is similar to hiring more employees to do the company’s work. The good: you now have more help. The bad: you now have to talk to them! Regularly!

In distributed databases, there are three factors that are impossible to achieve together in full:

Consistency — reading the same data multiple times should not give us different results
Availability — we should get a response from the database quickly
Partition tolerance — If network disruptions or software/hardware failures break communication, our cluster of servers break up into smaller clusters—they get partitioned. Computers in each subcluster can communicate with each other, but not with computers outside the subcluster. Under such conditions, the system should still continue to operate.

This is known as the CAP theorem: you can only really prioritise two out of the three factors.

Consistency and Availability

The database we have been examining so far in Season 7 are known as relational databases, which handle data in the form of tables. When implemented as a distributed database, they often prioritise consistency and availability.

How does that work? When our distributed database is being hit with 100,000s of requests per second, more than one computer can handle, we need multiple computers to serve these requests. These computers had better be synchronised (to achieve consistency) so that the request will always return the same response from any of those computers.

One way to achieve this is to have a Single Source of Truth: perhaps we design it so that only one “leader” computer handles edits/changes to the database, which then get sent to all the other “follower” computers. (This assumption that reading data occurs much more frequently than writing/changing data holds up for most use cases.) What happens if the “leader” computer goes down, and our distributed database goes from a leader-follower system to a partitioned bunch of followers? No writes can happen, the system is no longer operational.

(There are multiple theorems on how to design this system to automatically/manually select a new leader, but I won’t go into that here. The fundamental problem of ensuring consistency and availability in such cases remains.)

When a partition happens

So it comes down to this: when communication failure happens in a scenario like the above, we have to choose.

If we need a workaround to ensure that updates on one computer still reaches all the computers so that the data is consistent, that is going to be slow — we lose availability.

If we want to achieve availability, we could have each computer just return or update the data it has, then worry about synchronisation later — we lose consistency.

If you find yourself in the position of having to choose a distributed database, it would be immensely helpful to know upfront which 2 factors you want to prioritise!

Examples

Consistency and Availability: Bank databases fall in this category. Financial transactions must be accurate, and people need to quickly know whether they were successful. So we have to live with these databases requiring regular maintenance (usually late at night) to minimise the risk of partitioning failure.

Partitioning and Consistency: Authentication systems are relied upon to ensure that data is only accessed by people who are authorised to do so, and cannot afford to go down for long periods of time. This requires that permissions be properly synchronised across all computers, so consistency is key. These two factors are more important than ensuring a speedy response.

Partitioning and Availability: Essential services, such as Google Maps, have to remain operational even with (recoverable) failures, and still have to respond in a reasonable amount of time (otherwise real-time navigation would fail). Roads do not change often, so it is okay if the info we are getting is slightly out of date; we might occasionally get a slower route or find ourselves at a business whose operating hours are not updated in Google Maps, but these are not critical failures.

The CAP theorem does not say we can never have the third factor! It means we have to pick 2 factors to prioritise, and live with the lowered performance of the third.

Issue summary: To increase the performance of a distributed database, we can scale up/scale vertically by increasing the computers’ performance, or scale out/scale horizontally by adding more computers. Distributed databases can only prioritise two of the following three factors: consistency, availability, partitioning (CAP theorem).

This actually ran longer than I expected; the examples were an unplanned addition that I think helps to clarify use cases for each combination.

What I’ll be covering next

Next issue: [LMG S7] Issue 87: Relational Databases

I’ll spend the next 3 issues talking about 3 major types of databases in use today. This isn’t strictly layman content, but I suspect in some non-technical conversations these terms may pop up. More importantly, I think the 3 major types cover 3 different concepts of data, and I hope that elaborating on these in a little bit more detail will help to develop a more nuanced way of thinking about data.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 85: SQL Injections

2020-08-29T08:00:00+08:00

Previously: SQL queries let you join multiple tables based on specified conditions using the JOIN keyword. This enables crafting complex queries to return only the specific data that is required.

SQL databases are really powerful; this is usually a good thing since it allows developers to do amazing things with the data inside. But it can also lead to disastrous consequences in the unsupervised hands of inexperienced developers. And matters can be even worse if these powers are not carefully granted. A malicious actor could “borrow” these powers to wreak havoc on the database!

Relevant xkcd comic

Adding data to an SQL database

Adding data to an SQL database is easy. If our Customer table looks like this (from Issue 84)):

The relevant SQL query to add another customer is:

INSERT INTO Customer
VALUES (Ernest, ernest@lmn.com, 57564986)

What could go wrong?

Interacting with an SQL database

The most direct way of managing and interacting with a database is through its commandline tool. Needless to say, this is not how you would want your users using it. It’s just a terrible user experience, and gives them waaaay too much power.

So we usually design a frontend—an app, webpage, or database form—that formats and lays out the data nicely for them, and limits the things they can do to the data. This frontend will usually only allow users to edit or delete existing data, and add new data. Then it constructs an SQL query to be sent to the database. The code to do this might look like the following:

custName = request.form['custName']
custEmail = request.form['custEmail']
custContact = request.form['custContact']
sql.execute(f'INSERT INTO Customer VALUES ({custName}, {custEmail}, {custContact})')

This code naïvely inserts data from the submitted form into the database without any checks. That’s not smart; the contact number might have the wrong number of digits, the email might not even have an ‘@’, and people often type the wrong things in the wrong fields.

What else could go wrong?

SQL Injections: sending SQL commands through an unsecured form

A malicious/clever user might attempt to submit the following form data:

Customer Name: Ernest Customer Email: ernest@lmn.com Customer Contact: 10); DROP TABLE Customers—

Why would they do that? When inserted into the template above, the full SQL query becomes:

INSERT INTO Customer VALUES (Ernest, ernest@lmn.com, 10); DROP TABLE Customers--)

Two things to explain: - the semicolon (;) indicates the end of an SQL query. It is used to write two or more queries in one line. - The database ignores everything after the --. It is a useful way to add comments to SQL queries (for human consumption) … or to make the database ignore invalid syntax (such as the standalone )), which is what happens in this case.

So the database ends up executing this:

INSERT INTO Customer VALUES (Ernest, ernest@lmn.com, 10);
DROP TABLE Customers

Goodbye, Customer table …

Data leakage through SQL injections

This app is probably going to have some kind of search or filtering feature, where we enter a name to search for and get results that match. If we were searching for a user named George, an inexperienced developer might send this as the SQL query:

SELECT * FROM Customer WHERE custName = George

If I submit the following in the search box:

Customer Name: George OR 1=1

It might get naïvely substituted to form the following query:

SELECT * FROM Customer WHERE custName = George OR 1=1

The database will attempt to parse this, and come across custName = George OR 1=1. It gets interpreted as “return all results from Customer table where the custName column matches the result of George OR 1=1”.

It will then attempt to evaluate George OR 1=1. By the unintuitive reasoning of computer logic, this always evaluates to True, and results in the database returning … all the rows in Customer.

Conclusion

If you’re going to use a database with a frontend, get an experienced developer to do it. If all you have are inexperienced developers, send them for the appropriate training. If you don’t have developers, use an established product over an untested one. If in doubt, find someone with the relevant credentials to ask for advice.

Issue summary: Forms that naïvely inject user-submitted data into a SQL query template may end up sending valid (but otherwise unathorised) SQL commands to the database, with disastrous consequences.

This would have been 3–5 times as long if I had started going into some basic ways to prevent this kind of mistake. Fortunately, this is just a layman’s guide, and I can foist that responsibility off to the rest of the internet.

On a serious note, database security is a whole field of study. If you are using a database for enterprise purposes, please give database security the resources it needs; there are just so many ways that things can go wrong!

What I’ll be covering next

Next issue: [LMG S7] Issue 86: Distributed databases

So far, we have been assuming that the database runs from a single computer, and all its data is stored on one as well. What happens when it outgrows this single computer? Why, it then gets transmitted and infects another computer … just kidding, we then have to spread that database over two or more computers. And keep them constantly synchronised. If that sounds like a pain, you are exactly right! More on this next issue.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

Issue 84: JOIN – supercharged VLOOKUP

2020-08-22T08:00:00+08:00

Previously: Structured Query Language (SQL) is a computer language for managing data in databases. It has keywords and keyphrases that let you filter rows and columns, group and order data, perform basic arithmetic on data, and more. It is complex and powerful, but astute and efficient use requires specialised training.

VLOOKUP: The bread-and-butter of spreadsheets

If I have a Customer data table that looks like this:

And a Sales data table that looks like this:

I could add a custName column to the sales table that looks up the custID, and inserts the custName info from the same row. This feature of spreadsheets is known as VLOOKUP (vertical lookup)¹. This is what the formula for each cell in custName would look like:

Let’s break down each part of that formula:

=VLOOKUP(C2,Customer!A:D,2)

This means “in columns A:D of the Customer table, look for the value from cell C2 (which is 1) in the first column of the Customer table, and return the value from the same-row cell in the 2nd column of the Customer table.”

What if you needed to insert more than one column? What if you need to “join” two or more tables? Your spreadsheet would soon be filled with VLOOKUP cells, and this really slows down the performance of the spreadsheet. This method is not suitable for data involving millions of rows, for sure.

SQL JOIN: VLOOKUP on steroids

In a database, there is no “standard view” of the data. All data you want to see has to be retrieved with a query. So it makes no sense to require cells filled with VLOOKUPS; we just need to figure out how to do the equivalent in a query. The keyword for that is called JOIN.

To join the Customer and Sales data so that we get the sales data along with custName, we would write a SQL query like this:

SELECT salesID, orderDate, custID FROM Sales
JOIN Customer ON Sales.custID = Customer.custID

Here, Sales.custID refers to the custID of the Sales table, while Customer.custID refers to the custID of the Customer table. This query effectively says “select the salesID, orderDate, and custID columns from Sales table, and add data from the Customer table where the custID column matches”. This will return:

That is much easier—once you’ve been trained in SQL syntax—than writing separate VLOOKUP formulas for each column you want, and having to maintain a whole table of formulas!

You can even join more than two tables together with a query like:

SELECT salesID, orderDate, custID, invoiceID, Customer.custName, Customer.custContact, invoiceDate, invoiceAmt FROM Sales
JOIN Customer ON Sales.custID = Customer.custID
JOIN Invoice ON Sales.invoiceID = Invoice.invoiceID

This is barely scratching the surface of what SQL can do; it has at least 4 types of JOINs, and many more ways of crafting queries to return specifically the data you want.

SQL queries are a whole different way of talking to your computer, and they can be really frustrating to write for people who are new to it. But they are behind many of the interfaces you see, which seem to seamlessly pull data from multiple sources together into a coherent view.

Issue summary: SQL queries let you join multiple tables based on specified conditions using the JOIN keyword. This enables crafting complex queries to return only the specific data that is required.

What I’ll be covering next

Next issue: [LMG S7] Issue 85: SQL injections

Databases are immensely powerful software systems when it comes to searching for information. One recurring challenge that all admins face is ensuring that only authorised use is permitted; how do we prevent malicious activity from being able to access the database?

Next week, I will introduce a common vulnerability that web developers always have to guard against: SQL injection.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]

There is an equivalent feature for columns known as HLOOKUP (horizontal lookup) that looks up info in a row and inserts data from the same column, but it is not as popular. So the VLOOKUP name is more commonly used for this kind of operation. ↩

Issue 83: Structured Query Language

2020-08-08T08:00:00+08:00

Previously: A database system follows rules that enable multiple users to send commands to the database at the same time. The system attempts to execute each action one at a time, locking data that is in use by other users, and ensuring that each user does not carry out actions that they are not permitted to. Such systems are better able to prevent data corruption compared to a text-based system.

Have you experienced the pain of having really huge tables in your spreadsheet, sometimes spanning more than a hundred columns? Then you might know how painful it can be trying to filter data from it, e.g. if your boss just wants a few columns of info from certain rows. Like if he asks for the performance numbers of employees who are up for promotion.

In a spreadsheet, you would have to apply filters for nextPromoYear to only show the appropriate rows, then you’ll have to hide all the other irrelevant columns. Or you’d just copy all more-than-a-hundred columns for those rows into another new spreadsheet, and manually delete the unnecessary columns.

Database designers don’t want to to do that. You should be able to ask the database to do this querying and filtering for you, and return you only the data you want. But how would that be designed?

Structured Query Language: the universal database language

Structured Query Language (SQL) is another computer language designed to manage data in databases. It reads almost like English, but more logical and less poetic. It has its own syntax and grammar, which are not the same as in English. And sending a proper SQL query to any database that supports it will get you what you want.

Here’s what an SQL query for the above info might look like:

SELECT employeeName, teamName, salesCount, salesTotal FROM SalesData
WHERE nextPromoYear = 2020
GROUP BY teamName
ORDER BY salesTotal;

The SELECT keyword lets you filter only the columns you want FROM a table
The WHERE keyword lets you filter only the rows you want, based on one or more criteria
The GROUP BY keyphrase lets you group the returned data based on values in a column
The ORDER BY keyphrase lets you sort the returned results according to values in a column

A database has no “main view”

One difficulty many people have in “upgrading” from a spreadsheet mindset to a database mindset is that they expect to have a “main spreadsheet” where (almost) all the data lives, and where sub-spreadsheets pull data from. In a database, all data lives in separate tables, and are joined only when a query is executed. The only way to get data from a database is to use queries!

Most websites or software you are using which retrieves data for you usually end up executing one or more queries such as the above to get that data. And the job of the database software is to interpret such commands, pull the data from the various tables together, collate it correctly, and send it to you.

A database can give you almost exactly what you want

By using these and many other keywords and keyphrases, it is possible to put together a query that gives you only the data you want. SQL has arithmetic functions such as count, average, sum, and it can even return only unique values.

The tradeoff is that you have to learn another language, and use it regularly enough to understand the ins and outs. This is why every big corporation has a data team that can do this!

Issue summary: Structured Query Language (SQL) is a computer language for managing data in databases. It has keywords and keyphrases that let you filter rows and columns, group and order data, perform basic arithmetic on data, and more. It is complex and powerful, but using it in an astute and efficient manner requires specialised training.

What I’ll be covering next

Next issue: [LMG S7] Issue 84: JOIN – supercharged VLOOKUP

I haven’t even touched on SQL’s really powerful features yet. Filtering data from a table is fine, but if my data is spread across many tables, how do I pull that data together? Excel folks have a command they rely on heavily to do this, and it is called VLOOKUP. I’ll show you the SQL version next issue.

Sometime in the future: What is:

booting up? [Issue 15]
XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
firmware? [Issue 34]
OpenType? And what are fonts anyway? [Issue 42]
What is involved in installing a piece of software? [Issue 48]
How do apps know where a file starts and ends? [Issue 49]
What is a password hash? [Issue 63]