Layman's Guide to Computing - Season 11

Issue 143: Implications (Part 2) – Future Goals

2021-10-23T08:00:00+08:00

Previously: Using the same hardware for both smartphones and laptops would make it much easier to write apps for both platforms. The closer they are in features, hardware, and software support, the easier things will be for developers.

So, let’s get some Likely-Asked-Questions (LAQs) out of the way in this last issue.

If developing for a single chip makes things much easier for developers, can we just decide to develop everything for the M1 chip and somehow force Apple to share the chip design?

You know as well as I do that the answer is no. Besides, Apple doesn’t care about the hardware needs of devices other than its own. You want other features they don’t care about? Too bad.

So only Apple users get to enjoy unified memory?

Yes, for now. We’ll need to wait a few years for other chipmakers to figure out their own hardware implementations.

Who’s likely to achieve it first?

This’ll have to be a company that designs its own CPUs and GPUs.

The incentives don’t align for Intel. Their main business has never relied on capable graphics, and they are much more concerned with saving the server market from ARM’s and AMD’s encroachment.

What about AMD? After all, they were one of the earliest companies to push for a similar idea: Heterogeneous System Architecture. And they achieved it to a lesser extent, with their Accelerated Processing Units (a fancy term for CPU+GPU; see Issue 142).

AMD has made this possible in software: as a programmer, you can command the CPU to store data at a memory address, pass that address to the GPU, and then get the GPU to retrieve data from that address. But in practice, benchmarks show that passing data this way falls short of the throughput you would expect; possibly the hardware support is just not there yet, or it is not yet easy enough to use.
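To make "pass that address" a little more concrete, here is a toy Python sketch. It is not AMD's API or any real GPU code: the function name gpu_double_in_place is hypothetical, and a bytearray merely stands in for system memory. The point is only that the "GPU" is handed a reference to the same bytes rather than a copy.

```python
# A bytearray stands in for a region of system memory (an assumption for illustration).
buffer = bytearray(b"\x01\x02\x03\x04")

def gpu_double_in_place(view: memoryview) -> None:
    """Hypothetical 'GPU' step: works directly on the memory it was pointed at."""
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256

# The 'CPU' hands over a view of the same bytes: a reference, not a copy.
view = memoryview(buffer)
gpu_double_in_place(view)

# The 'CPU' finds the results in its original buffer, with no copy back.
print(list(buffer))  # [2, 4, 6, 8]
```

Nothing here says anything about real-world throughput; it only shows what sharing by address looks like from the programmer's side.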

But the incentives line up quite well for AMD. If they achieve it, the performance of their APUs, their mid-range product, will see a significant boost. But they will need significant influence with developers, plus software development kits (SDKs) that those developers can use to take advantage of unified memory, and that’s a big investment of resources.

Nvidia is putting a lot of effort into catching up on the CPU side of things, and they have been pushing lots of ARM chip designs to complement their strength in graphics cards. They have also announced a deal to buy ARM (not yet completed), so they also seem like a strong contender to implement unified memory. My gut sense is that it is not high on their priority list, as their primary business is still parallel computing and related applications, such as machine learning and scientific computing.

The work for this will have to be ongoing, of course, and likely started around 2015 or so; starting in 2021 is way too late!

What does this mean for Apple?

They are now almost fully in control of their own hardware and software. The main areas their control does not reach are cloud computing (where iCloud happens) and manufacturing (likely still TSMC in the near future). Their concerns now will be much more international than before.

Should we expect to see unified memory on non-Apple chips?

Yes, definitely, it’s something the industry has been working towards, just way too slowly … and hopefully the M1’s existence will put some pressure on those development timelines.

I suspect the main cause of inertia is all the legacy software that still has to be supported. Because Intel and AMD have a lot of business riding on keeping compatibility with past hardware, they can’t make sweeping changes across their entire range of products, unlike Apple. Every change that is made to an existing line of chips has to still keep it working when customers run their existing software.

How does this affect consumers?

Probably not much effect, beyond the gradual speed gains from generation to generation that we are already seeing.

The more significant effect is, I think, the miniaturisation of mobile systems. Already the mainboard for a laptop like the Macbook has shrunk to a narrow rectangle; most of the space for devices is now taken up by energy storage (i.e. batteries). The limiting factor now seems to be energy density: how many grams of batteries we will need per hour of laptop use. I suspect this is going to keep laptops more or less at the same size; the laptop is a mature form factor at this point and will gradually age.

What’s more exciting is when unified memory architectures can be miniaturised sufficiently for wearables. We are going to need that if we want augmented reality (AR) systems, e.g. graphics projected directly on a lens in front of our eyes, in a compact form factor. Many virtual reality (VR) and AR systems currently come in bulky designs that sit heavily on the body; there is much room for improvement here.

Wrapping up

This somehow ended up as a crash course in CPUs and GPUs, all in one season. I didn’t mean to carry out an industry analysis here, and this is definitely not a forecast to be relied on! It’s just a very interesting story to follow and I can’t help but think about what’s happening on multiple levels.

What I’ll be covering next

Next issue: [LMG S12] Issue 144: Programs-in-a-vat

How does a program on the computer know if it is in a simulation?

Sometime in the future: What is:

XSS? [Issue 8]
a good reason developers write code and give it away for free online? [Issue 21]
OpenType? And what are fonts anyway? [Issue 42]

Issue 142: Implications (Part 1) - Software

2021-10-16T08:00:00+08:00

Previously: The Apple A14 and Apple M1 are essentially the same chip architecture: they use almost the same building blocks, just with different numbers of them. On top of that, the Apple M1 implements unified memory, allowing the CPU and GPU (and other SoC components) to share the same system memory, greatly facilitating intra-chip communication.

So, before 2020: smartphones are smartphones, laptops are laptops. They use different types of CPUs with different architectures (Issue 141) and even different instruction sets (Issue 53). Never the twain shall meet.

After 2020: It turns out that smartphone chips can be upgraded and used in laptops, while remaining essentially the same architecture? Its power consumption dial can be turned down to almost zero but also turned all the way up?

That opens up the possibility that smartphones and laptops can run on the same hardware, and there’s nothing technically stopping apps compiled (Issue 54) for that instruction set from running on both!¹

Hmm, where has something like this happened before?

The big console alignment

Sometime in mid-2013, Microsoft announced the Xbox One (henceforth XB1), the successor to the Xbox 360. The 360 ran on a PowerPC CPU made by IBM—different from smartphone chips that used the ARM instruction set, and also different from laptops that use the x86 instruction set.²

The Xbox One, on the other hand, uses a CPU+GPU made by AMD³, following the x86 instruction set.

The Xbox One essentially uses a custom laptop chip!

This was interesting news because earlier that year, in Feb 2013, Sony had announced the PlayStation 4 (PS4), which was … also running on an AMD CPU+GPU! The previous iteration, the PlayStation 3 (PS3), was running on an interesting custom architecture that used PowerPC cores and a GPU designed by Nvidia.

At this point it would be oh-so-tempting, for a tech nerd, to descend into point-by-point comparisons of the hardware specifications of both consoles. We will fortunately not be doing that.

What’s more important is what this meant for the video games.

Alignment in game development

If you wanted to write a game for the Xbox 360, you had to learn its API: which functions to call to make it do what you want, how to store data into its storage, and so on. It’s a lot of time and effort to look at your options and figure out the limitations, and how to work around them to achieve what you want in your game.

And if you wanted to make the same game for the PS3, you now had to learn a completely different API, running on hardware with completely different limitations, and figure out completely different approaches to achieve the same end. While the game might feel the same, the time and effort is almost as much as what it would take for a new game!

The XB1 and PS4, on the other hand, are much more similar. They both use AMD CPU+GPUs with similar architecture. While Microsoft and Sony may add their own features on top of the chips and the software, the API is ultimately guided by hardware decisions. If you made a game for XB1 and wanted to port it to PS4, the effort of learning a new API is greatly lessened.

The gulf between smartphones and laptops

Back to smartphones vs laptops. Running on two different types of chips, using different architectures and instruction sets.

We have seen some forays from one into the other: Intel’s doomed Medfield chip was an attempt to bring the x86 architecture to smartphones, while Google has been trying to get ARM chips into Chromebooks, with limited but increasing success.

But now that we have an iPhone 12 using the A14 chip, a Macbook using the M1 chip, and we know that the A14 and M1 are essentially the same architecture and the same instruction set … it does suggest that the challenges of making software for both devices now primarily exist on the software side. The gulf of hardware incompatibility has been closed.

Feature alignment

The M1 chip is capable of power standby (i.e. screen off with the CPU in a low-power state), in a way that most laptop chips aren’t. This is a key feature for smartphone software and operating systems, and the M1 paves the way for laptop chip-makers to introduce this feature into their processors as well.

And the M1, being living proof that unified memory is possible, would also likely push existing companies to speed up development towards that goal.

Issue summary: Using the same hardware for both smartphones and laptops would make it much easier to write apps for both platforms. The closer they are in features, hardware, and software support, the easier things will be for developers.

I was looking for a way to sneak in the XB1-and-PS4 story, and I think I found just the right place for it. It really does excite me to think that one day a developer could write software for a smartphone, and it would work on laptops with minimal modification, and vice-versa. And perhaps a decade from now, we’d be scratching our heads why we even had to choose between the two!

What I’ll be covering next

Next issue: [LMG S10] Issue 143: Implications (Part 2) – Future Goals

So what’s next? Is unified memory the holy grail for hardware, and is there any further room for improvement? I’ll share some thoughts in the next issue.

¹ Nothing, that is, besides all the software workarounds that will need to be written …
² In an interesting narrative twist, PowerPC was the architecture that Macs used before Apple switched them to Intel processors. And now Xbox did the same thing.
³ AMD calls it an Accelerated Processing Unit (APU). Doesn’t matter for us.

Issue 141: The Apple A14 and M1

2021-10-09T08:00:00+08:00

Previously: Shared memory is easier to implement when a company has control over the designs of both CPU and GPU.

So, to recap:

Most companies design either CPUs or GPUs, but are seldom well-positioned¹ to be excellent in both.

Among the companies that design both CPUs and GPUs, almost none of them² make CPUs for both mobile (smartphones + tablets) and laptops (including low- to mid-range desktops).³

Which leaves Apple in the (current) position of being the only chip company with a design for both mobile and laptop.

The difficulties of power scaling

It’s not that other companies have not attempted this.

They have tried to scale down laptop chips to achieve smartphone-like power consumption, but found that laptop chips can’t power down the way smartphone chips can, and require more circuitry to achieve that.

They have also tried to scale smartphone chips up to achieve laptop-like computational capacity, but found that simply pushing more electrical power doesn’t help all that much. Beyond a certain frequency limit, you simply need more bandwidth and more units, and designing a chip that easily accommodates more units like this just requires a very different design.

It seems that designing a chip that can go from 4W all the way to 65W (and possibly higher) requires intentional engineering, not simply modifying an existing smartphone/laptop chip design or bolting on/removing features.

The Apple A14 vs the Apple M1: similarities

Apple has managed to do just this with the Apple A14 and M1. They are, at heart, the same chip design! (In processor parlance, we say they have the same chip architecture.)

Let’s see:

The Apple A14’s key hardware.
Source: Apparently an online Arabic image gallery site (I have no idea why this picture is so hard to find!)

The Apple M1’s key hardware.
Source: TechBuzzPro

We can also compare these features via Wikipedia:

Apple A14

6-core CPU (4 low-power⁴ cores “Icestorm”, 2 high-performance “Firestorm” cores)
4-core GPU
16-core NPU
4GB memory (iPhone 12) / 6GB memory (iPhone 12 Pro)

Apple M1

8-core CPU (4 low-power⁴ cores “Icestorm”, 4 high-performance “Firestorm” cores)
8-core GPU
16-core NPU
8GB memory / 16GB memory

Notice that at heart, they are using the same building blocks: ~~low-power~~ high-efficiency cores, high-performance cores, GPU cores, and NPU cores (I suspect these are GPU-like cores but optimised for machine learning, i.e. they probably power Siri and other parts of the OS which lean on AI features); the A14 and M1 just have different numbers of them.

So one really amazing thing about the Apple M1 is that it is actually a boosted Apple A14: almost double the hardware!
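As a quick sanity check on "almost double", here is a small Python sketch that compares the CPU and GPU core counts from the two lists above. The dictionary keys are my own labels, not Apple's terminology.

```python
# CPU and GPU core counts as listed above (key names are illustrative labels).
a14 = {"efficiency_cores": 4, "performance_cores": 2, "gpu_cores": 4}
m1 = {"efficiency_cores": 4, "performance_cores": 4, "gpu_cores": 8}

# How many times more of each building block the M1 has.
ratios = {block: m1[block] / a14[block] for block in a14}

for block, ratio in ratios.items():
    print(f"{block}: A14 has {a14[block]}, M1 has {m1[block]} ({ratio:.1f}x)")
```

The high-performance and GPU cores double, while the efficiency cores stay the same, which is why "almost double" is the fairer description.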

It’s like when Magnemites join together and evolve into a Magneton …

The Apple A14 vs the Apple M1: differences

What’s different between the A14 and M1, besides the number of cores? Apple isn’t forthcoming with the details, but we can guess at minor details like the image processor (for camera imaging) and the storage controller (the M1 can use high-power solid-state disks (SSDs), which the A14 can’t).

The major difference announced between the A14 and M1 launches is that the M1 has unified memory.

Unified memory vs CPU–GPU transfers

Back in Issue 139, I mentioned that unified memory needs really high bandwidth to support access by the SoC components.

Today, laptop processors use an interface called PCIe to connect CPUs to GPUs. PCIe has a bandwidth of up to 16 GB/s⁵.

The M1’s unified memory has a bandwidth of up to 58 GB/s reading from memory, and 36 GB/s writing to memory. Definitely an improvement.
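To get a feel for these numbers, here is a back-of-the-envelope Python sketch. The bandwidth figures are the ones quoted above; the 1 GB payload is an assumed example size, and the calculation ignores every real-world overhead (latency, protocol, contention), so treat the results as ideal lower bounds rather than measurements.

```python
# Bandwidth figures quoted in the text, in GB/s.
PCIE3_GBPS = 16.0
M1_READ_GBPS = 58.0
M1_WRITE_GBPS = 36.0

def transfer_ms(size_gb: float, bandwidth_gbps: float) -> float:
    """Ideal time in milliseconds to move size_gb at bandwidth_gbps (no overheads)."""
    return size_gb / bandwidth_gbps * 1000.0

payload_gb = 1.0  # assumed example payload

times = {
    "PCIe 3.0": transfer_ms(payload_gb, PCIE3_GBPS),
    "M1 read": transfer_ms(payload_gb, M1_READ_GBPS),
    "M1 write": transfer_ms(payload_gb, M1_WRITE_GBPS),
}

for name, ms in times.items():
    print(f"{name}: {ms:.1f} ms")
# PCIe 3.0: 62.5 ms
# M1 read: 17.2 ms
# M1 write: 27.8 ms
```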

Unified memory: what’s yours is also mine

The 8GB/16GB of system memory is used by both CPU and GPU. It is not partitioned at boot; both the CPU and GPU (and other parts of the SoC, such as the NPU) have full access to all system memory.

This greatly simplifies intra-chip communication, as all subchips in the SoC can request access to memory! The GPU no longer needs to keep its own (power-guzzling) memory. This reduces the motherboard space that is needed, lowers power consumption, and decreases latency for data transfer between CPU and GPU (Issue 139): a triple-compounding win.
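Here is a toy Python sketch of why "not partitioned at boot" matters. The 6 GB / 2 GB split and the workload numbers are assumptions picked purely for illustration, and the model ignores the operating system's own memory use; it is not how any real memory manager works.

```python
TOTAL_GB = 8  # the smaller M1 configuration mentioned in the text

def fits_partitioned(cpu_need, gpu_need, cpu_share=6, gpu_share=2):
    """Fixed split decided at boot (the 6/2 split is an assumed example)."""
    return cpu_need <= cpu_share and gpu_need <= gpu_share

def fits_unified(cpu_need, gpu_need, total=TOTAL_GB):
    """One shared pool: only the combined demand matters."""
    return cpu_need + gpu_need <= total

# A graphics-heavy workload: little CPU memory, lots of GPU memory (assumed numbers).
workload = {"cpu_need": 2, "gpu_need": 5}

print(fits_partitioned(**workload))  # False: the GPU's fixed share is too small
print(fits_unified(**workload))      # True: 7 GB fits in the shared 8 GB pool
```

With a fixed split, memory sitting idle on the CPU side cannot help the GPU; with one pool, it can.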

Issue summary: The Apple A14 and Apple M1 are essentially the same chip architecture: they use almost the same building blocks, just with different numbers of them. On top of that, the Apple M1 implements unified memory, allowing the CPU and GPU (and other SoC components) to share the same system memory, greatly facilitating intra-chip communication.

Some implications of the Apple A14–Apple M1 familial connection: the Apple M1 is truly capable of smartphone-like standby, a feature that Intel’s and AMD’s laptop chips have been striving for but have not quite achieved.

It’s a lot to detail here, so I will do that in a separate issue instead.

What I’ll be covering next

Next issue: [LMG S10] Issue 142: Implications (Part 1) - Software

Besides the reported fact that the M1 is really very fast (and yes I will spend a little time explaining just how fast), what else does this herald for expectations in the software on devices? Coming up next issue :)

¹ Companies that achieve both CPU and GPU excellence generally have business incentives that align with that goal (as opposed to, say, making low-power or cheap processors).
² Intel had a short-lived but ultimately doomed attempt at a smartphone chip (it was named Medfield).
³ I’m going to ignore servers here because I can! And because they’re not really relevant to a discussion on low-power consumer chips.
⁴ These are the same ones labelled “high-efficiency”, which is marketing speak for “designed to use very little power”.
⁵ Bandwidth of 16 GB/s is for PCIe 3.0; PCIe 4.0 will support up to 32 GB/s, but graphics cards won’t use that much bandwidth to communicate with the CPU.

Issue 140: The shared memory dream

2021-10-02T08:00:00+08:00

Previously: Around 2015, the high-performance computer industry quickly realised that it would be much more efficient if the CPU and GPU could share the same memory. This idea was labelled heterogeneous system architecture (HSA).

Let’s rewind a bit further from last issue. That was in 2015.

Circa 2009, changes were happening on the desktop motherboard, as the memory controller hub (MCH) came on-board the CPU to reduce latency when communicating with memory (Issues 134–135). But the memory chips themselves remained on the motherboard, and this was the case even in 2018, in Apple’s Macbook Air (Issue 136).

Bringing memory on-board

Smartphones can’t afford to do that; every bit of mainboard space is precious! The Apple A-series processors have been gradually moving more and more memory into the CPU, where it enjoys lower latency communicating with the CPU.

In 2013, Apple released the iPhone 5S, using the Apple A7 SoC. This was Apple’s first 64-bit SoC (Issue 55), and by this point Apple had managed to bring 1GB of memory onto the SoC package. By 2018, with the Apple A12 SoC, the on-board memory had increased to 4GB on the high-end iPhone XS models.

So in 2015, the high-performance folks (working with workstations and servers) were dreaming of the CPU and GPU sharing memory, while from 2013, in smartphones, the CPU, GPU, and system memory were already cohabiting in the same chip package! CPU, GPU, and memory all living in the same space … how does this work?

Memory: yours or mine?

Remember this diagram?

An Intel Core i-series ATX system chipset diagram.
The MCH is merged into the CPU, but still a discrete unit.
DDR refers to computer memory, while GDDR refers to graphics card memory (Issue 123)
Source: Ars

Apple is pretty tight-lipped about the technical details of its products, but if the industry standard is anything to go by, the GPU will usually have its own memory, separate from the CPU.

After all, CPUs and GPUs don’t do the same work, or even work the same way (Issue 123). They use different memory, they use memory differently, they store data differently, and if they accidentally overwrote each other’s data … well, your device would just crash.

So … that on-board memory, whose is it? CPU’s, or GPU’s?

Successful sharing looks like …

One thing that makes it difficult to share memory is that the CPU and GPU have to “speak the same language”; they need a common shared understanding of the workflow involved in passing data through shared memory.

This is easier to develop when a single company has control over both CPU and GPU designs. This is not always the case; many smartphones have CPU designs from one company and GPU designs from another!

For instance, the Apple A-series processors initially used GPUs from a graphics company called Imagination Technologies, designed by their PowerVR division. With a CPU and GPU from different teams, working in different ways, shared memory is not likely to happen¹.

But in the A10 SoC, released in 2016, Apple had subtly started to replace parts of the GPU with their own in-house designs. The A10 would be the last in the line of the “Fusion” SoC series.

When the A11 SoC was released in late 2017—first in the “Bionic” series of SoCs—PowerVR’s GPU had been replaced by Apple’s own design².

Apple is finally in the position of working towards shared memory with their Bionic-series SoCs, with the A14 being the fourth “Bionic” SoC.

Issue summary: Shared memory is easier to implement when a company has control over the designs of both CPU and GPU.

The story which began in Issue 138 is coming to a close soon! Next issue, the curtain falls, the A14 and M1 are released, and Apple (probably) pulls the chip industry in a new direction again.

What I’ll be covering next

Next issue: [LMG S10] Issue 141: The Apple A14 and M1

And finally I can geek out over the A14 and M1 😎 don’t worry, I’ll keep it on-topic.

¹ What about other companies that had control over the CPU and GPU designs? Such as AMD, Samsung, Qualcomm, …? It’s a long story, and not really suitable for a layman newsletter. Sorry.
² The design is technically Apple’s, but they had been learning from many generations of working with PowerVR’s GPU, so the early initial designs are very likely heavily influenced by it.

Issue 139: What’s before this line is mine, what’s after this line is yours

2021-09-25T08:00:00+08:00

Previously: A system-on-chip (SoC) combines the core functionality of a system—processing, graphics, memory, and control—into a single chip package.

I am eager to dig into the meat of the A14 and M1! But first I must set up a story.

The hUMA race

Circa 2015 (actually even a couple of years before that), the industry suddenly seemed to wake up and realise that graphics cards could do a lot more than just play video games. The nature of how they work (Issues 121 & 122) makes them very amenable to solving problems in scientific computing, particularly in simulations, which use up computational resources by the petaflop, and energy by the megawatt.

In a nutshell, the problem the industry now faces is this:

The GPU is massively powerful … at doing a small subset of things. You can solve scientific equations but can’t run a computer with only a GPU.
The CPU is nimble, and much more suited for everyday tasks, like starting up a computer and connecting to multiple peripherals, and basically creating a useable digital environment for humans.
It thus makes the best sense to use the CPU to set up the heavy-lifting for the GPU, and have the GPU return the results after computation.

Remember this diagram from Issue 134?

Think about how information would flow here:

The CPU requests data from the hard disks, which get put into system memory (DDR; left-most side).
It carries out some pre-processing on that data to set up the task for the GPU, reading from memory, and storing the results back in memory.
The data is copied from system memory to the GPU, which stores that data in GPU memory (GDDR; right-most side).
The GPU carries out the task, storing the results in GPU memory.
The CPU requests the data from GPU memory, copying it back into system memory.

Are you seeing lots of wasted effort there? Look at the third and fifth steps, in case you missed it. So much copying of information!

The high-performance computer industry quickly realised that it could be much more efficient if the CPU and GPU could share the same memory.

The information flow in this hypothetical memory-sharing system would be simplified to this:

The CPU requests data from the hard disks, which get put into shared memory.
The CPU pre-processes the data, storing it back into shared memory.
The CPU sends the location of the data to the GPU, which then reads from shared memory and carries out the task, storing the results back into shared memory.
The CPU retrieves the results directly from shared memory.

We save time, bandwidth, and resources without having to copy data between CPU and GPU, twice! The only drawback is that with so many components (CPU, GPU, and others) accessing memory at the same time, you are going to need memory with really high bandwidth.
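The two information flows above can be sketched as a toy Python simulation. Everything here is illustrative: plain lists stand in for system, GPU, and shared memory, and the "pre-processing" (add 1) and "GPU task" (multiply by 2) are made-up placeholder operations. The only thing being counted is how many whole-buffer copies each flow needs.

```python
def discrete_memory_flow(data):
    """CPU and GPU each have their own memory; data is copied between them."""
    copies = 0
    system_memory = [x + 1 for x in data]      # CPU pre-processes in system memory
    gpu_memory = list(system_memory)           # copy 1: system memory -> GPU memory
    copies += 1
    gpu_memory = [x * 2 for x in gpu_memory]   # GPU computes in its own memory
    system_memory = list(gpu_memory)           # copy 2: GPU memory -> system memory
    copies += 1
    return system_memory, copies

def shared_memory_flow(data):
    """CPU and GPU work on one shared buffer; only its location is passed."""
    copies = 0
    shared = [x + 1 for x in data]             # CPU pre-processes in shared memory
    for i in range(len(shared)):               # 'GPU' reads and writes in place
        shared[i] *= 2
    return shared, copies                      # CPU reads results directly

print(discrete_memory_flow([1, 2, 3]))  # ([4, 6, 8], 2)
print(shared_memory_flow([1, 2, 3]))    # ([4, 6, 8], 0)
```

Same answer both ways, but the shared-memory flow gets there with zero copies instead of two.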

The industry gave this dream a name. They called it heterogeneous system architecture (HSA), using heterogeneous uniform memory access (hUMA), i.e. shared memory.

Nvidia’s heterogeneous unified memory architecture (HUMA) dream
Source: WCCFtech

It turns out that this is a pretty difficult task: consider the amount of bandwidth needed to support CPU and GPU access. Today no product from any company (besides Apple) fully implements this in its SoCs (Issue 138) yet.

Issue summary: Around 2015, the high-performance computer industry quickly realised that it would be much more efficient if the CPU and GPU could share the same memory.

I should stop here with this issue, and summarise the struggles of these companies in the next issue. They will make Apple’s success with the A14 & M1 a much more compelling read :)

What I’ll be covering next

Next issue: [LMG S10] Issue 140: The shared memory dream

Issue 138: System-on-Chip (SoC)

2021-09-18T08:00:00+08:00

Previously: The M1 goes one step further: not only does it make do with fewer chips, it does so with passive cooling.

In Issue 136, I showed the miniaturisation of the Macbook mainboard through a series of pictures. While the laptop has remained the same size mostly (apart from getting slimmer), that is not the case with its components. The bigger components, like memory and storage, changed from being separate discrete parts to being another component soldered directly to the mainboard.

But that only gets us so far; even in the M1 Macbook Air, the mainboard is still almost the entire length of a phone. There’s got to be something else.

Today, let’s see how the iPhone has evolved.

What’s in a smartphone: 2008

Rewind to 2008: one year after the first-generation iPhone was launched, the iPhone 3G was released. These early smartphones let us see every little chip that was required to run a smartphone:

iPhone 3G mainboard, with parts labelled
There are lots of small, auxiliary processors around the CPU.
Source: iFixit

In spirit and form, the early smartphones were a lot like the early desktop mainboards (Issue 132): lots of chips performing highly specific functions.

After all, a smartphone has no need (or space) for a peripheral controller hub (PCH) (Issue 134) when it does not have add-on peripherals, and no need for a memory controller hub (MCH) when it can put the memory directly on the same chip as the CPU.

At this point, Apple was still using a CPU based on a design by ARM, and manufactured by Samsung. Two years later, Apple had its own in-house processor design: the Apple A4.

What’s in a smartphone: 2010

This time, Apple had switched to an internal layout distinctly different from the iPhone 3G, and the basic layout (mainboard beside battery) would become a pattern for subsequent iPhone generations: battery taking up almost half the space, charging and audio circuitry at the bottom near the charging port, camera and antennas near the top, and everything else beside the battery.

iPhone 4 on the left, iPhone 12 Pro on the right
The basic layout of the iPhone has been preserved over a decade.
Source: iFixit and iFixit

What’s the difference between this layout and the 3G? Let’s have a look at the iPhone 4’s mainboard:

iPhone 4 mainboard. I got lazy with the labelling because, well, there’s nothing to label!
The CPU is the huge chip labelled “A4”, and there’s memory and the 3G chip on the back.
Source: iFixit

Similar to the transition from AT to ATX motherboards (Issues 132 and 134), the iPhone underwent a great miniaturisation, in a single generation!

System-on-Chip

What happened to all those separate chips? Most of them got moved onboard, into the A4 chip, or other auxiliary chips. The great consolidation brought all their functionality under one roof.

The A4 chip carries:

CPU¹
GPU
MCH
PCH

This chip is responsible for:

performing calculations
rendering graphics (Issue 123)
managing the flow of information between CPU, GPU, and memory (previously the job of the MCH)
managing the flow of information between storage, network, and the MCH (previously the job of the PCH)

It is literally an entire system on a chip: a system-on-chip (SoC)!

More and more functionality would gradually be migrated into the SoC itself, with fewer auxiliary chips required: sensors, gyroscopes, image processors for the camera, etc. More educational perhaps would be to look at what’s not included in the SoC, particularly by the time we get to the iPhone 12’s SoC, called the A14.

Not on the A14 SoC:

solid state disk (only part of it is in the SoC)
power management
4G & 5G
audio

The above functionality is highly specialised, especially in modern² smartphones. It manages the remaining parts of the phone: camera & mic/speakers, wifi & bluetooth, and telecommunications (4G/5G). Telecommunications in particular require a lot of power and would have contributed to unnecessary heating in the small CPU package.

And this is how we shrink a laptop mainboard even further.

Issue summary: A system-on-chip (SoC) combines the core functionality of a system—processing, graphics, memory, and control—into a single chip package.

What I’ll be covering next

The M1’s design shares a lot more in common with the A14 on the iPhone and iPad than it does with the laptop CPUs that came before it. I want to go into a bit more detail about this in the next issue, so that it’s easier to see just how different it is from a typical laptop.

First question: what exactly does “unified memory” mean? Why is Apple making it such a big deal?

Next issue: [LMG S10] Issue 139: What’s before this line is mine, what’s after this line is yours

Next issue, we look at a trend that started being reported on in 2015: the high-performance computing industry realised that the CPU and GPU need to have much more integrated memory sharing.

¹ The CPU carries (some) onboard memory for itself, but the main bulk of memory is still on the mainboard.
² I know it’s strange to differentiate older vs newer smartphones when the technology is only 1.5 decades old. But the evolution of smartphone designs over that time has been significant enough that yes, I am going to make this distinction :)

Issue 137: The M1 Macbook Air

2021-09-11T08:00:00+08:00

Previously: Slim laptops have been undergoing a gradual transition: more and more of their chips are no longer available as a replaceable card, but instead soldered directly to the mainboard. Since 2017/2018, most slim laptops pretty much have CPU, memory, storage, and network chips all soldered directly to the mainboard.

Let’s get to it: Intel vs M1 Macbook Air!

The 2020 Macbook Air: passing the torch

Here’s the Macbook Air in 2020. There was one in early 2020 using an Intel Core CPU, and one in late 2020 using the Apple M1 CPU.

Macbook Air, early-2020 (Intel, left) vs late-2020 (M1, right)

Source: iFixit

They look almost identical, but some parts are noticeably different … can you spot the differences?

The cooling fan (upper left) is there in the early 2020 (Intel) model, but gone in the late 2020 (M1) model.
Besides the CPU (upper centre in both models, under a heatsink), the Intel model has a mysterious-looking chip (upper right, covered in black shrouding).

You may have spotted other differences in the hardware, but since this issue is focused on the mainboard and CPU, let’s zoom in on those. Let’s have a closer look at their mainboards:

2020 Intel Macbook Air mainboard, front and back
The Intel CPU unfortunately sits under the huge heatsink, shown with its 4 securing screws
Memory and solid state disk are on separate chips (most likely on the back)
Source: iFixit Store

2020 M1 Macbook Air mainboard, front and back
There are fewer big chips, but the single biggest chip there is much bigger, and Apple-branded
Memory is integrated into the CPU, but the solid state disk sits on a separate pair of chips
Source: iFixit

The M1 Macbook Air: all aboard

Overall, it looks like the M1 has “swallowed” a number of chips. Compared to the 2020 Intel model, the M1 has brought on-board computer memory (the two black chips on the M1), and Apple’s T2 chip (the black shrouded chip on the 2020 Intel Macbook Air). These are major components for computer operation.

So not only does the M1 incorporate more components, it does so while drawing less power. The lack of a cooling fan means it is passively cooled, which (from Issue 129) suggests the M1 Macbook Air uses only about 8–12 W for its tasks. And reviews of the M1 Macbook Air suggest it is not thermally throttled except under the heaviest of loads.

How did Apple manage to design a processor like this?

The Apple M1: evolved from a smartphone chip

To get into that story, I’ll have to go even more mobile, and look at smartphone CPUs. After all, the Apple M1 actually evolved from the Apple A-series CPUs for their iPhone and iPad. That starts next issue.

What about other Intel Core laptops running Windows?

They are largely undergoing the same transition, just more slowly. This is the Microsoft Surface Laptop in 2017:

Microsoft Surface Laptop mainboard
CPU (red), memory (orange), solid state disk (yellow), and network card (green) are all soldered on. (Outlined in cyan are the display control chips)
Source: iFixit

Larger-sized laptops that can afford the space may still have solid state storage on a separate card.

Issue summary: The M1 goes one step further: not only does it make do with fewer chips, it does so with passive cooling!

The last issue simply went on too long, especially with all the images, so I figured this issue would stand better as a Core-vs-M1 comparison, instead of being the tail of an evolution-of-Air issue. So it’s short.

What I’ll be covering next

Next issue: [LMG S10] Issue 138: System-on-Chip (SoC)

If smartphones are even smaller than laptops, how do they do it? Laptops seem to have exhausted all the tricks, and those boards still look pretty big.

Next issue, I’ll talk about the next step in the evolution of shrinking mainboards.

Issue 136: The mobile workstation – laptops

2021-09-04T08:00:00+08:00

Previously: A modern CPU is manufactured through a process called photolithography, by which the CPU components are etched onto the silicon substrate by successive layers of chemicals, masking, and laser exposure. When the CPU components could be made small enough, the MCH and CPU were designed onto the same chip, and this is the design used by the Intel Core i7 (1st-gen).

In the last 4 issues, I walked through the general evolution of desktop computers. Let’s go more mobile, and look at laptops. How does something as big as a desktop shrink down to the size of a laptop? And what are the tradeoffs involved?

I addressed the power part of the formula in Issue 130, on power limits; laptops are slimmer partly because one of their components, the AC adapter, lies outside the system.

Let’s look at the rest of it.

Laptops use slimmer components

Laptops use slimmer memory than desktops:

Desktop memory (DIMM) vs laptop memory (SODIMM)
Source: Quora

On a desktop mainboard, desktop memory sticks out perpendicularly from the mainboard, all the better to cram memory sticks together and maximise the use of space.

On a laptop mainboard, laptop memory sticks lie parallel to the mainboard, to reduce the mainboard height and allow a slim laptop profile.

As far as I know … there aren’t any other significant differences to highlight (besides size). Unless you’re overclocking, just get the kind of memory your computer/laptop needs. These days, small-form-factor desktops use laptop memory (SODIMM) as well!

Laptops use slimmer hard drives compared to desktops as well:

Desktop hard drive (3.5″) vs Laptop hard drive (2.5″)
Source: M2WifiCards

Desktop hard drives are larger, use larger platters (3.5″ diameter), and hence draw more power (at both 12V and 5V voltages). Laptop hard drives are smaller, use smaller platters (2.5″ diameter), and draw less power (at 5V voltage only). This is why smaller external hard drives, which use laptop hard drives, can be powered over USB, but larger external hard drives, which use desktop hard drives, need an external AC adapter.

These days, laptops have mostly made the transition to solid state disks, and you are much less likely to see hard drives in laptops.

The slim laptop in 2010: Macbook Air

Let’s examine how a characteristic slim laptop, the Macbook Air, has changed in the past 10 years.

In 2010, the Macbook Air had its solid state disk and wifi network card on separate (replaceable) cards. But the CPU, GPU, and memory were all soldered directly onto the motherboard.

The solid state disk in the 2010 Macbook Air
Source: iFixit

The wifi network card on the 2010 Macbook Air
Source: iFixit

The CPU (left) and GPU (right) on the 2010 Macbook Air
Source: iFixit

Another view. CPU (red), GPU (orange), and memory (yellow) are directly soldered onto the mainboard
Where is the chipset? I don’t know; the Macbook Air does not seem to use the same chipset as Intel-powered desktops
Source: iFixit

The slim laptop in 2018: also Macbook Air

By 2018, while the outside of the Macbook Air still looks much the same, the insides are rather different:

CPU (red) and solid state disk (yellow) are directly soldered onto the mainboard
Where’s the chipset? Notice that the CPU seems to have 2 chips on it? They are the CPU and chipset; two chips in one CPU package!
The next image shows the reverse side. Other chips are ignored here, see the iFixit article for full identification
Source: iFixit

Memory (red) and wifi network chip (orange) are also directly soldered onto the mainboard
This laptop only had integrated graphics; Intel had upped its integrated graphics performance sufficiently by this point
Other chips are ignored here, see the iFixit article for full identification
Source: iFixit

Components which in the 2010 Air were on separate cards are now all soldered directly to the mainboard! On the one hand, this saves space, which can be used for other features, or just for larger batteries. It also reduces the cost of manufacturing; connectors are costly to engineer and manufacture. On the other hand, it means upgradeability goes out the window.

Issue summary: Slim laptops have been undergoing a gradual transition: more and more of their chips are no longer available as a replaceable card, but instead soldered directly to the mainboard. Since 2017/2018, most slim laptops pretty much have CPU, memory, storage, and network chips all soldered directly to the mainboard.

Sorry about the image dump, I figured it would still be more convenient than having to click-through to see the images :)

What I’ll be covering next

Next issue: [LMG S10] Issue 137: The M1 Macbook Air

The M1 goes even further than the 2018 Macbook Air, in one pretty significant way. Next issue, we compare how the Intel and M1 Macbooks Air differ!

Issue 135: Part 2 – Unifying the CPU and MCH (post-2008)

2021-08-28T08:00:00+08:00

Previously: Light takes 0.3 ns to travel 10 cm, approximately the distance by wire between the CPU and the MCH. This potentially causes operations between the CPU and MCH to slow down by one cycle, at frequencies above 3 GHz. One way the Intel Core i-series resolves this conundrum is to move the MCH into the CPU.

Time to close up some open plot points from last issue:

The number of pins on 1st-gen Core i7 is almost triple that of the Pentium 4; what are all those pins for?
The MCH has been moved into the CPU to improve latencies, but how is it possible to make it small enough to do that?
Are there any disadvantages?

I’ll answer the second question first. It’s quite simple really.

You see, for circuit components, size doesn’t always benefit performance. A large transistor does essentially the same thing as a smaller transistor. So making them smaller is advantageous really; you can fit more into a single chip!

Making a modern CPU

Modern CPUs are manufactured through a process called photolithography, which literally means “etching with light” (Greek; photo- “light” + litho- “stone” + -graphie “to draw”). By layering chemicals over the silicon base, putting a mask over them, and exposing them to light, a series of chemical reactions is induced to create the circuit pattern on the CPU.

Multiple CPUs are created on a single silicon wafer this way, then individually cut apart (each cut piece is called a die) and processed, in multiple steps spanning several months¹. The precision and fineness of the etching laser determine how small we can create components on this substrate. As the manufacturing process improves, semiconductor manufacturing companies are able to create CPUs that can cram more and more transistors into each square mm (or inch) of silicon die.

Besides being able to cram more transistors into the same space, it turns out that smaller components also use much less power! So we not only get performance gains, we get power efficiency gains as well—two birds with one stone.
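To put a rough number on “cram more transistors into the same space”, here is a small sketch. It assumes an idealised shrink where every component gets smaller by the same factor in both width and height; the function name and the example sizes are made up for illustration, and real chips do not scale this neatly.

```python
def density_gain(old_size_nm: float, new_size_nm: float) -> float:
    """How many times more components fit in the same area after a shrink.

    Idealised assumption: every component shrinks equally in width and height.
    """
    linear_shrink = old_size_nm / new_size_nm
    return linear_shrink ** 2  # area scales with the square of the side length


# Hypothetical example: components become half as wide and half as tall
print(density_gain(old_size_nm=100, new_size_nm=50))  # 4.0
```

In this simplified model, halving the size of each component lets four times as many fit into the same area.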

CPU diagram of the Intel Core i7 (1st-gen)
The memory controller, misc. IO, and QPI areas perform the role that the MCH used to take up
Source: AnandTech

Moving in

Over multiple generations of process improvements, the MCH and the CPU could finally be made small enough that they could both reasonably fit into the same die. There are, of course, implications.

Previously, the CPU only needed pins to communicate with the MCH. Now, the combined chip needs more pins than before to communicate with the computer memory, graphics processing unit (GPU), and PCH.
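One way to picture the change is as two small “who is wired to whom” maps. The sketch below is a simplified model based only on the chipset layout described in this series; the dictionary names and labels are my own illustrative choices, not real bus names.

```python
from collections import deque

# Simplified link maps; labels are illustrative, not real bus names.
BEFORE = {  # CPU and MCH are separate chips
    "CPU": ["MCH"],
    "MCH": ["CPU", "memory", "GPU", "PCH"],
    "memory": ["MCH"],
    "GPU": ["MCH"],
    "PCH": ["MCH"],
}
AFTER = {  # the MCH's job has moved into the CPU
    "CPU": ["memory", "GPU", "PCH"],
    "memory": ["CPU"],
    "GPU": ["CPU"],
    "PCH": ["CPU"],
}


def hops(links, start, goal):
    """Breadth-first search: number of chip-to-chip links between two chips."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        chip, distance = queue.popleft()
        if chip == goal:
            return distance
        for neighbour in links[chip]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # not connected


for target in ("memory", "GPU", "PCH"):
    print(target, hops(BEFORE, "CPU", target), "->", hops(AFTER, "CPU", target))

# Number of different chips wired directly to the CPU
print(len(BEFORE["CPU"]), "->", len(AFTER["CPU"]))
```

Every destination gets one hop closer to the CPU, but the CPU goes from talking to one neighbour to talking to three, and each of those direct connections needs its own pins.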

So that answers the first question of what the additional pins are for.

Working as one unit

Which leaves the third question: are there any disadvantages? Before getting to those, there are other advantages besides the latency improvements.

Mainboard manufacturers save on the cost of the MCH chipset, which works out to about $40. Pretty significant when a mid-range mainboard costs $80–$160.
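As a quick back-of-envelope check on “pretty significant”, the sketch below divides the quoted MCH cost by the quoted board prices. It treats cost and retail price as directly comparable, which is a simplification; the variable and function names are mine.

```python
MCH_COST = 40             # approximate MCH cost quoted above, in dollars
BOARD_PRICES = (80, 160)  # mid-range mainboard price range quoted above


def mch_share(board_price: float, mch_cost: float = MCH_COST) -> float:
    """Fraction of the board price that the MCH chipset would account for."""
    return mch_cost / board_price


for price in BOARD_PRICES:
    print(f"${price} board: {mch_share(price):.0%}")
```

That works out to between a quarter and a half of the board price.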

With the MCH and its requisite wires gone, the mainboard can be shrunk further; motherboards gradually shrank from ATX and microATX form factor sizes, to smaller form factors, such as ITX and the currently popular NUC form factors.

Mainboard form factors

ATX: 30×24cm (12×9.6 in)
microATX: 24×24cm (9.6×9.6 in)
ITX: 17×17cm (7×7 in)
NUC: 10×10cm (4×4 in)
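To get a feel for how much board area each step saves, here is a small sketch that multiplies out the dimensions listed above (in cm) and compares each form factor to ATX. The dictionary and function names are my own.

```python
# Dimensions in cm, taken from the list above
FORM_FACTORS_CM = {
    "ATX": (30, 24),
    "microATX": (24, 24),
    "ITX": (17, 17),
    "NUC": (10, 10),
}


def area_sq_cm(name: str) -> int:
    """Board area in square centimetres."""
    width, depth = FORM_FACTORS_CM[name]
    return width * depth


atx_area = area_sq_cm("ATX")
for name in FORM_FACTORS_CM:
    area = area_sq_cm(name)
    print(f"{name}: {area} sq cm ({area / atx_area:.0%} of ATX)")
```

By this measure a NUC board has roughly one seventh the area of a full ATX board.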

And the disadvantages … well, none on the consumer side actually. It seems to be positive all around!

Well actually, complexity rears its ugly head in power-saving features.

Previously, when the computer was in standby (Issue 115), the CPU could be safely shut down (i.e. power to the CPU was cut), leaving only the MCH minimally powered so the computer memory retains its information.

With the MCH and the CPU now sharing the same chip, they have to be put in separate power zones so that the MCH portion remains powered while in standby, while the CPU can be shut down safely, making the chip more complicated than its predecessors.
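For the curious, the idea of separate power zones can be sketched as a toy model. Everything here (class names, method names) is hypothetical and only mirrors the description above; real chips manage power in far more elaborate ways.

```python
class PowerZone:
    """One independently switchable power region on a chip (toy model)."""

    def __init__(self, name: str):
        self.name = name
        self.powered = True


class CombinedChip:
    """CPU cores and memory controller on one chip, in separate power zones."""

    def __init__(self):
        self.cores = PowerZone("CPU cores")
        self.memory_controller = PowerZone("memory controller")

    def enter_standby(self):
        self.cores.powered = False             # the cores can be shut down
        self.memory_controller.powered = True  # stays on so memory keeps its data

    def wake(self):
        self.cores.powered = True


chip = CombinedChip()
chip.enter_standby()
print(chip.cores.powered, chip.memory_controller.powered)  # False True
```

With two separate chips, “standby” could simply mean cutting power to one of them; on a combined chip, the two zones have to be designed in from the start.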

But that is of little concern for us layfolks.

Issue summary: A modern CPU is manufactured through a process called photolithography, by which the CPU components are etched onto the silicon substrate by successive layers of chemicals, masking, and laser exposure. When the CPU components could be made small enough, the MCH and CPU were designed onto the same chip, and this is the design used by the Intel Core i7 (1st-gen).

This is where the story stops with Intel for this season; their current-gen Core series still uses much the same chipset diagram, and a similar basic architecture, so there’s little new info of relevance for me to add here.

What I’ll be covering next

Next issue: [LMG S10] Issue 136: The mobile workstation – laptops

To continue the story towards the Apple M1, it’s time to switch our lens to the mobile world: tablets, smartphones, and things smaller than a laptop. How are these things designed? What are their CPUs like? We’ll examine the evolution of the iconic Macbook Air, from 2010 to 2020 (warning: image-heavy!)

See Three months, 700 steps: Why it takes so long to produce a computer chip (WashPo) for a more comprehensive description of the process ↩

Issue 134: Part 1 – the Intel Core i-series launches!

2021-08-21T08:00:00+08:00

Previously: The ATX form factor also brought with it a new breed of computers with more specialised chipsets: the memory controller hub (MCH) and peripheral controller hub (PCH). The MCH specialises in high-throughput requirements, such as computer memory and graphics. The PCH specialises in lower-throughput needs.

Last issue, we looked at the ATX form factor by Intel, which replaced the AT form factor by IBM. While the AT could get by with a smattering of chips, which worked fine for mostly text-only computers, the ATX has much higher throughput requirements. To help the CPU focus on serving the user’s applications, two chipsets, the memory controller hub (MCH) and the peripheral controller hub (PCH), take charge of managing the data throughput. The MCH manages data between the CPU, computer memory, the graphics processing unit (GPU), and the PCH, while the PCH manages data between the peripherals (audio, storage, network, USB, …) and the MCH.

An Intel pre-Core i-series ATX system chipset diagram.
The MCH and PCH (labelled ICH here for unimportant reasons) support the CPU in its data operations
DDR refers to computer memory, while GDDR refers to graphics card memory (Issue 123)
Source: Ars

There are terms for each of the connections between chips, which I won’t get into because it largely won’t concern us until we have to design performant systems.

The evolution of Intel ATX

The technical geeks are probably fuming at this point because ATX is a motherboard standard, while I’m talking about the evolution of processors which have little to do with the motherboards, at least not directly … but that’s of little importance at this point. Because we first need to talk about CPU pins.

In Issue 131, I gave a simple model of the limitations of data transfer:

There is a max frequency they can operate at, and a limit to the number of wires they can be connected to (throughput = no. of wires × frequency)

The number of pins on processors has been steadily increasing up to this point, and so have the frequencies of processors. The Pentium 4 was succeeded by the Pentium D, then the Pentium Dual Core, then the Core 2. This Core processor preceded the Core i3/i5/i7 processors we know today; I’ll refer to this family of processors as the pre-i Core (rather than the more technical LGA775 series).

Pentium 4: 478 pins
Core, Core 2 (pre-i Core): 775 pins
Core (i7, first-gen): 1366 pins

Yup, the number of pins has almost tripled since the Pentium 4! Remember that more pins do not make the CPU itself calculate faster; they just help it to transfer data faster.

What are all those pins for, if there is the MCH to manage data flow?

Let’s talk about the limitations of the pre-i Core setup.

Communication at a distance

Wait … don’t electrical signals travel at speeds near the speed of light?¹ How would disappearing the MCH improve latency?

Consider some numbers:

The typical distance between the CPU and the MCH is about 5 cm (2 in). Since the wires between them are not straight, let’s approximately double that to 10 cm (4 in). Light would take 0.3 ns to travel that distance. Which is roughly one clock cycle on a 3 GHz processor—at 3 billion cycles per second, each cycle takes a third of a billionth of a second!¹
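If you would like to check the arithmetic yourself, here is a minimal sketch. It uses the speed of light in a vacuum, which is the best case; signals in real copper wires are somewhat slower. The function names are my own.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in a vacuum; real signals in wires are slower


def travel_time_ns(distance_cm: float) -> float:
    """Time for light to cover the distance, in nanoseconds."""
    return (distance_cm / 100) / SPEED_OF_LIGHT_M_PER_S * 1e9


def cycle_time_ns(frequency_ghz: float) -> float:
    """Length of one clock cycle, in nanoseconds."""
    return 1 / frequency_ghz


print(f"10 cm of wire: {travel_time_ns(10):.2f} ns")   # 0.33 ns
print(f"one cycle at 3 GHz: {cycle_time_ns(3):.2f} ns")  # 0.33 ns
```

The two numbers land in the same place, which is the whole point: at 3 GHz, a single clock cycle is about as long as the signal's trip across the board.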

Remember that everything in a computer needs to happen like clockwork: for data to sync up, when the CPU sets a bit to one, the other party has to detect the bit signal before the clock cycle ends. If not, it will have to wait for the next clock cycle, causing the operation to slow down and take two clock cycles instead of one.

It’s like when you don’t manage to post the mail by 5pm, the postman has emptied the mailbox, and now you have to wait for 5pm the next day for your mail to be picked up instead.

It’s all about throughput … but also latency

If light is taking one clock cycle to get out of the CPU, you have a problem. Raise the frequency higher than 3GHz, and you can cause a one-cycle lag just waiting for data to come in from the MCH, and to go out again to the MCH. That would counter-intuitively slow down the CPU.
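The “miss this cycle, wait for the next one” rule can be sketched as rounding up to a whole number of cycles. This is a toy model that assumes the 10 cm wire estimate from the text and the speed of light in a vacuum, and ignores everything else that adds delay in a real system; the names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458
WIRE_LENGTH_M = 0.10  # the 10 cm estimate from the text


def cycles_to_arrive(frequency_hz: float, wire_length_m: float = WIRE_LENGTH_M) -> int:
    """Whole clock cycles that pass before a signal reaches the far end of the wire."""
    travel_time_s = wire_length_m / SPEED_OF_LIGHT_M_PER_S
    cycle_time_s = 1 / frequency_hz
    # A signal that arrives partway through a cycle has to wait for that cycle to end
    return math.ceil(travel_time_s / cycle_time_s)


for ghz in (1, 2, 4, 5):
    print(f"{ghz} GHz: {cycles_to_arrive(ghz * 1e9)} cycle(s)")
```

In this model the signal arrives within one cycle at 1 and 2 GHz, but needs two cycles at 4 and 5 GHz: the clock got faster, yet each trip to the MCH now costs an extra cycle.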

Solution: move the MCH into the CPU!

Squeezing more tenants into the building

Wait … you can just do that?

I will need many more issues to lay out the mechanics of this, so I won’t—I think it’s way beyond the scope of a layman’s guide at that point!—but let’s see what I can come up with in the next issue.

Issue summary: Light takes 0.3 ns to travel 10 cm, approximately the distance by wire between the CPU and the MCH. This potentially causes operations between the CPU and MCH to slow down by one cycle, at frequencies above 3 GHz. One way the Intel Core i-series resolves this conundrum is to move the memory controller into the CPU.

This is what I spent years reading and thinking about to explain, and I finally get to lay it out in text. Incredibly excited to get to the next few issues!

What I’ll be covering next

Next issue: [LMG S10] Issue 135: Part 2 – Unifying the CPU and MCH (post-2008)

Next issue: how the ATX form factor evolved to eliminate the MCH. Sorry to end on a cliffhanger!

I want to just make a note here that while I believe my choice of analogy is justified, the numbers are wildly off: RealWorldTech here puts the 1st-gen Core at approx 30 ns, for technical reasons that will take at least half a season to unpack (definitely not layman content!). But he also notes that latency for remote memory (i.e. memory not on the CPU, but on the motherboard) is “roughly 30 ns slower than local [memory]” (i.e. memory residing directly on the CPU). So the remote-vs-local latency gap is real and significant! ↩↩

Issue 133: the ATX form factor (post-1995)

2021-08-14T08:00:00+08:00

Previously: Chipsets served as go-betweens in the AT form factor by IBM.

In 1993, Intel launched its Pentium line of processors; barely two years later, in 1995, Intel launched the ATX form factor. This was the beginning of Intel’s dominance in the desktop space, and they could well afford to dictate most of the standards for this form factor.

Chipset diagram

Mainboards at this point were complicated enough that as part of the marketing, tech publications had taken to staring at diagrams of how the chips were connected. These diagrams are called chipset diagrams.

This is the chipset diagram of a typical mainboard for the Pentium 4:

I tried to find a chipset diagram that used human terms instead of acronyms.
This is the best I could do. Annotations my own.
Source: Hexus

The memory controller hub (MCH) now takes on a much bigger role; it is managing data transfer between the CPU, graphics card, computer memory, and the PCH.

The peripheral controller hub (PCH), while managing connections to many more devices, actually has less work to do; these are all low-throughput devices that don’t send much data to the CPU.

ATX mainboard

And this is where the components are found on the motherboard:

A motherboard for the Pentium 4, with key components outlined.
Annotations are my own.

The CPU clearly draws the most power and produces the most heat here. But notice now that the MCH is no longer bare; it now produces so much heat (4–10 W) that it needs to be passively cooled with a heatsink (Issue 129). The PCH, on the other hand, is still chill enough to get by bare naked (4 W or less).

3D graphics at this point was a rapidly growing industry, especially for videogames. Graphics cards needed much more throughput to the CPU and memory, so the MCH grew into the role of mediator between these throughput-hungry components.

Issue summary: The ATX form factor also brought with it a new breed of computers with more specialised chipsets: the memory controller hub (MCH) and peripheral controller hub (PCH). The MCH coordinates high-throughput components, such as computer memory and graphics. The PCH specialises in lower-throughput needs.

Much as I try to avoid using acronyms, here they are really just easier to read.

What I’ll be covering next

Next issue: [LMG S10] Issue 134: Part 1 – the Intel Core i-series launches!

I don’t know if you noticed, but there seem to be fewer chips here than on the AT board. That’s misleading though; the components that were on the AT board are also on the ATX board, but greatly shrunk. Some of the functionality that used to require multiple chips on AT has been replaced by a single chip in ATX, hence the appearance of simplicity. In reality, the ATX mainboard is more complex!

Next issue, onward with the integration!

Issue 132: the AT form factor (pre-1995)

2021-08-07T08:00:00+08:00

Previously: CPUs have limited throughput, since there is a max frequency they can operate at, and a limit to the number of wires they can be connected to (throughput = no. of wires × frequency). Later designs of early computers increased the capability of computers by delegating more work to secondary chips.

When computers began hitting the mainstream market, they were designed to be able to use interchangeable parts so as to reduce cost and inventory. To support this effort, manufacturers came up with standards for how to lay out computer components on a mainboard; the different patterns came to be known as form factors.

The AT form factor, by IBM, is one of the early ones. An AT motherboard looks something like this:

The AT mainboard

An AT motherboard, with key components outlined.
Annotations are my own.
Original: Wikipedia

Graphics cards, usually added as an expansion card, communicated with the CPU (under the heatsink) through a chipset, while the CPU communicated with memory through another chipset.

At this point, graphics cards were still barely powerful enough to run 3D graphics (this was before Windows 95!), and the chipsets mainly served as go-betweens between memory, expansion slots (called buses), and the CPU.

After 1995, this would change.

Issue summary: Chipsets served as go-betweens in the AT form factor by IBM.

What I’ll be covering next

Next issue: [LMG S10] Issue 133: the ATX form factor (post-1995)

Short issue here, just to introduce the idea of chipsets! You can see the chips on the AT board look very similar. On the ATX form factor, they will begin to differentiate and specialise.

Issue 131: What do early CPUs and startup founders have in common?

2021-07-31T08:00:00+08:00

Previously: AC power from the wall uses electric current that alternates directions, while DC power from batteries uses electric current that flows in one direction only. All electronics are DC-only, and require an AC-DC adapter to be powered from the wall. The AC-DC conversion produces a significant amount of heat; AC-DC adapters are usually external unless the device has sufficient space or cooling capacity for it.

This season, let’s open up that computer case and see what’s inside. Where does everything fit, and how does all that information get around? More importantly, how are computers able to cover such a large range of sizes, from towering desktops to tiny smartphones?

What a computer wants, what a computer needs

The common model of a computer is that it … computes. It calculates. It takes in numbers, and spits out more numbers.

That’s not quite right.

While a computer does carry out compute operations, these are far outnumbered by load/store operations (Issue 58). Why so much loading and storing of data?

Moving data

The CPU itself has precious little storage (<20 MB of cache storage); it is only a wee little chip! Most of the data in a computer is stored in a hard drive or solid state drive; let’s just call them storage drives for now.

So CPUs have to read data from a storage drive. This is a slow operation, because storage drives are slow; writing to storage drives is even slower than reading from them.

In the meantime, the CPU needs a place to dump working data; this is computer memory (2–32GB). Memory is slower than the CPU’s cache, but much faster than a storage drive.

That’s 3 places to stash data so far: storage drives, CPU cache, and computer memory. You with me so far?
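If it helps, the three places can be pictured as a list the computer checks from fastest to slowest. The sketch below is a toy model: the ordering comes from the text, but the function name and the contents of each tier are invented purely for illustration.

```python
# Fastest and smallest first, slowest and largest last (ordering from the text)
TIERS = ["CPU cache", "computer memory", "storage drive"]


def find(item, contents):
    """Return the first (fastest) tier that holds the item, or None."""
    for tier in TIERS:
        if item in contents[tier]:
            return tier
    return None


# Hypothetical contents, purely for illustration
contents = {
    "CPU cache": {"score"},
    "computer memory": {"score", "level map"},
    "storage drive": {"score", "level map", "intro video"},
}

print(find("score", contents))        # CPU cache
print(find("level map", contents))    # computer memory
print(find("intro video", contents))  # storage drive
```

The further down the list the computer has to look, the longer the CPU waits for its data.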

Pipelines

The next place that often requires lots of data is the graphics card (Issue 123). For you to play a video game, the computer has to:

Load game data from the storage disk,
Store most of it in memory while it’s doing some number crunching in its cache,
Get the crunched numbers to the graphics card for rendering graphics (Issue 122),
Load more data from memory while crunching more numbers, and passing them to the graphics card.

This involves far more loading and storing than computation. And there are limitations to how quickly data can be transferred.

Throughput

How does data get transferred? Through very fine wires usually. One side (e.g. the CPU) applies a voltage to the wire, the other side (e.g. memory) checks the voltage on the wire. No applied voltage = 0, applied voltage = 1.

How does the CPU know when to apply the voltage, and the memory know when to check it? These operations are synchronised through cycles, like a highly coordinated factory. A CPU operates on a frequency of up to billions of cycles per second, each cycle potentially transferring one bit of data (Issue 40) if there are no delays.
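Here is a minimal sketch of one wire carrying one bit per cycle. The sender turns a number into a sequence of voltage levels, and the receiver checks the wire once per cycle to rebuild it. The function names and the choice to send the most significant bit first are my own assumptions for illustration.

```python
def send_byte(value: int) -> list[int]:
    """Voltage levels (1 = applied, 0 = none) put on the wire, one per cycle."""
    # Assumption for this sketch: most significant bit goes first
    return [(value >> bit) & 1 for bit in range(7, -1, -1)]


def receive_byte(levels: list[int]) -> int:
    """Rebuild the number by checking the wire once per cycle."""
    value = 0
    for level in levels:
        value = (value << 1) | level
    return value


levels = send_byte(ord("A"))
print(levels)                     # [0, 1, 0, 0, 0, 0, 0, 1]
print(chr(receive_byte(levels)))  # A
```

Sending the letter “A” over a single wire takes eight cycles, one for each bit.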

Typically, the transfer rate is somewhat slower; how do we transfer more data per second? By adding more wires! With two wires, we can transfer two bits per cycle; four wires = four bits per cycle, eight wires = eight bits per cycle … at some point, we run into a different problem. The CPU is a small chip, and there is only so much surface area for us to connect wires to.
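The rule of thumb here (throughput = no. of wires × frequency) is easy to play with in code. The sketch below is the idealised version with no delays; the function names and the 1 GHz example are my own choices.

```python
def throughput_bits_per_second(wires: int, frequency_hz: float) -> float:
    """Idealised throughput = number of wires x frequency (no delays)."""
    return wires * frequency_hz


def seconds_to_send(size_bytes: float, wires: int, frequency_hz: float) -> float:
    """Time to move the given amount of data (8 bits per byte)."""
    return size_bytes * 8 / throughput_bits_per_second(wires, frequency_hz)


ONE_GHZ = 1e9
for wires in (1, 8, 64):
    gbit = throughput_bits_per_second(wires, ONE_GHZ) / 1e9
    print(f"{wires} wire(s) at 1 GHz: {gbit:.0f} Gbit/s")
```

Doubling the wires doubles the throughput just as doubling the frequency would, which is why pin counts matter so much.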

An Intel Skylake desktop CPU.
Each gold contact on the under-surface connects to a pin on the motherboard when the CPU is seated properly in its socket
Source: Wikipedia

Well, that just sucks.

The limits of one chip

Come to think of it, humans are much the same; we only have two hands and two legs, there are limits to how fast we can do things, and limits to how long we can stay awake working. We mostly get around these limitations by learning to delegate.

In the same way, computer designs evolved to delegate more work to secondary chips, leaving the CPU to focus on computation. We’ll explore the gradual evolution of these architectures, so you can better appreciate the elegance of the Apple M1’s design ;)

Issue summary: CPUs have limited throughput, since there is a max frequency they can operate at, and a limit to the number of wires they can be connected to (throughput = no. of wires × frequency). Later designs of early computers increased the capability of computers by delegating more work to secondary chips.

The more I learned about computer architecture, the more I saw parallels to startups and organisational culture in general. I was really looking at ways of organising information flows, and observing how the computational limitations of different parts influence the design of the whole chip! This is a constant work in progress, which is why we keep seeing new CPU designs emerge each year.

What I’ll be covering next

Next issue: [LMG S10] Issue 132: the AT form factor (pre-1995)

Let’s start from … nah, I won’t go all the way back to the beginning, just to the point where computer architecture was already recognisable in its early modern form. Next issue, a big welcome for the AT form factor!
