<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Layman's Guide to Computing - Season 07</title><link href="https://ngjunsiang.github.io/laymansguide/" rel="alternate"></link><link href="https://ngjunsiang.github.io/laymansguide/feeds/season-07.atom.xml" rel="self"></link><id>https://ngjunsiang.github.io/laymansguide/</id><updated>2020-10-11T11:00:00+08:00</updated><entry><title>Issue 91: Commercial database alternatives</title><link href="https://ngjunsiang.github.io/laymansguide/issue091.html" rel="alternate"></link><published>2020-10-11T11:00:00+08:00</published><updated>2020-10-11T11:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-10-11:/laymansguide/issue091.html</id><summary type="html">&lt;p&gt;Depending on what you need a database for, there may be online database platforms that can manage and automate much of the work for you. Airtable, Smartsheet, Knack, and Zoho Creator are just 4 of many options that offer an easier way to set up and input your data, then access them through apps or other&amp;nbsp;means.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; A &lt;span class="caps"&gt;URI&lt;/span&gt; (Uniform Resource Identifier) is required to connect to a database. This &lt;span class="caps"&gt;URI&lt;/span&gt; can be provided by a hosting service provider that runs your own database for you, or by a cloud service provider that runs your database on their&amp;nbsp;platform.&lt;/p&gt;
&lt;p&gt;So you’re running up against the limits of a spreadsheet and want to do more with the data inside it. Databases sound cool and kind of like what you want right now. But writing a whole app and setting up the database yourself, or even getting someone else to do it and checking their work &amp;#8230; it all sounds like so&amp;nbsp;much.&lt;/p&gt;
&lt;p&gt;What to&amp;nbsp;do?&lt;/p&gt;
&lt;h2&gt;Airtable&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://airtable.com/"&gt;Airtable&lt;/a&gt; is a database-like platform that lets you set up Bases (similar to databases), which can contain different tables for your data. You can set a data type for each column, limit entries to a list of options, and even create lookups (match the value here with a column in another table, and return data from another column in the same&amp;nbsp;row).&lt;/p&gt;
&lt;p&gt;Just as databases don’t have a single canonical view, and everything depends on queries, Airtable also lets you create different views of your data. You can set it up as a list, a gallery, a job status board, and filter it as you&amp;nbsp;like.&lt;/p&gt;
&lt;p&gt;Interestingly, Airtable also dynamically generates an &lt;span class="caps"&gt;API&lt;/span&gt; for each of your bases, so that apps you create have a way to retrieve, modify, or delete data from the database. That saves you a lot of trouble having to set up your own database, for simple&amp;nbsp;needs.&lt;/p&gt;
&lt;h2&gt;Smartsheet&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.smartsheet.com/platform"&gt;Smartsheet&lt;/a&gt; is another platform that lets you create sheets with different views. Unlike Airtable, it leans more heavily towards workplace workflows, with built-in task management features and integration with many services. If you are already using one or more of these services, Smartsheet could be a way to store information for&amp;nbsp;collaboration.&lt;/p&gt;
&lt;h2&gt;Knack&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.knack.com/tour"&gt;Knack&lt;/a&gt; is yet another database-as-a-platform, which also allows you to craft queries to extract the data you need. It has an interesting feature that lets you specify how tables relate to each other (e.g. Contact connects with &lt;em&gt;one&lt;/em&gt; Company, Company connects with &lt;em&gt;many&lt;/em&gt; contacts) to improve&amp;nbsp;queries.&lt;/p&gt;
&lt;p&gt;Knack also lets you create simple apps with limited access to the data, for employee or customer use. If you mainly need internal apps for disseminating or allowing field access to data, this is probably a simpler option than hiring an app&amp;nbsp;programmer/company.&lt;/p&gt;
&lt;h2&gt;Zoho&amp;nbsp;Creator&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.zoho.com/creator/"&gt;Zoho Creator&lt;/a&gt; is a database platform that is more focused on app-building (or so it appears). The database just comes bundled as part of the deal. Another option for corporate operations-focused&amp;nbsp;apps.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Depending on what you need a database for, there may be online database platforms that can manage and automate much of the work for you. Airtable, Smartsheet, Knack, and Zoho Creator are just 4 of many options that offer an easier way to set up and input your data, then access them through apps or other&amp;nbsp;means.&lt;/p&gt;
&lt;p&gt;The best thing about these cloud services is that you probably don’t need to learn &lt;span class="caps"&gt;SQL&lt;/span&gt; or other advanced query languages to use them. A passing familiarity with spreadsheets, and time to sit down and watch tutorial videos, is probably sufficient to get&amp;nbsp;started.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S8] Issue 92: All about&amp;nbsp;apps&lt;/p&gt;
&lt;p&gt;I’ve spent a whole season talking about data (Season 4, &lt;a href="https://ngjunsiang.github.io/laymansguide/issue040.html"&gt;Issue 40&lt;/a&gt; to &lt;a href="https://ngjunsiang.github.io/laymansguide/issue052.html"&gt;Issue 52&lt;/a&gt;), then detoured to talking about computers, and the internet, and now back to databases. I think that’s plenty of foundation to finally move on to something more familiar:&amp;nbsp;apps.&lt;/p&gt;
&lt;p&gt;What exactly are apps and what do they do? What are they like under the surface? What makes them&amp;nbsp;tick?&lt;/p&gt;
&lt;p&gt;This and more in Season 8 &amp;#8230; which will start after a two-week hiatus. It has been really fun putting finger to keyboard and watching everything come together, but I noticed the quality of recent issues has been sliding more than I’d like. I’m going to take a little break to reconsolidate, recuperate, and think about the next couple of&amp;nbsp;seasons.&lt;/p&gt;
&lt;p&gt;See you next&amp;nbsp;issue!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 90: Using a database</title><link href="https://ngjunsiang.github.io/laymansguide/issue090.html" rel="alternate"></link><published>2020-10-03T08:00:00+08:00</published><updated>2020-10-03T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-10-03:/laymansguide/issue090.html</id><summary type="html">&lt;p&gt;A &lt;span class="caps"&gt;URI&lt;/span&gt; (Uniform Resource Identifier) is required to connect to a database. This &lt;span class="caps"&gt;URI&lt;/span&gt; can be provided by a hosting service provider that runs your own database for you, or by a cloud service provider that runs your database on their&amp;nbsp;platform.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Graph databases treat the details of things as secondary, and optimise for managing the network of relationships. A graph database can quickly look up how things are related to each other, and return the&amp;nbsp;results.&lt;/p&gt;
&lt;p&gt;At some point in the past, getting a database meant talking to a consultant or contractor, who would then sit with you to understand your requirements, then set everything up for you without letting you touch any part of it. And that is probably for the benefit of you both. But today, for SMEs with some relevant expertise, it is actually possible to get your own database up and running very&amp;nbsp;quickly.&lt;/p&gt;
&lt;h2&gt;Setting up a database on a&amp;nbsp;server&lt;/h2&gt;
&lt;p&gt;If you have admin rights to the workplace server (which can be both a blessing and a curse), you’ll have to find the setup instructions that came with the server software (or Google it online). I’m sorry, it is painful for layfolks (and even for many experienced database admins) and there just isn’t an easier way&amp;nbsp;yet.&lt;/p&gt;
&lt;h2&gt;Registering a database in the&amp;nbsp;cloud&lt;/h2&gt;
&lt;p&gt;If you do not have admin rights to the workplace server, you usually ask your friendly server administrator to help you install the database and set up a web admin panel for you. They will give you a &lt;span class="caps"&gt;URL&lt;/span&gt; and login credentials for that web admin panel, and you configure the database through the database section of the admin&amp;nbsp;panel.&lt;/p&gt;
&lt;p&gt;If your company has decided to do away with in-house &lt;span class="caps"&gt;IT&lt;/span&gt; support, your next bet is to outsource that help to cloud services. Each of the major cloud providers offers multiple database types to choose from. Some app hosting services will also host a database for you (usually intended for app use, but who’s&amp;nbsp;asking?).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Relational databases&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Amazon Relational Database Service&lt;/li&gt;
&lt;li&gt;Google Cloud &lt;span class="caps"&gt;SQL&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Microsoft Azure &lt;span class="caps"&gt;SQL&lt;/span&gt;&amp;nbsp;Database&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Document databases&lt;/strong&gt; (You will see many of them referred to as NoSQL databases)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Amazon DynamoDB&lt;/li&gt;
&lt;li&gt;Google Cloud Firestore (part of Firebase)&lt;/li&gt;
&lt;li&gt;Microsoft Azure Cosmos&amp;nbsp;&lt;span class="caps"&gt;DB&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Graph databases&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Amazon Neptune&lt;/li&gt;
&lt;li&gt;Microsoft Azure Cosmos &lt;span class="caps"&gt;DB&lt;/span&gt; also has an &lt;span class="caps"&gt;API&lt;/span&gt; (&lt;a href="https://ngjunsiang.github.io/laymansguide/issue004.html"&gt;Issue 4&lt;/a&gt;) for graph&amp;nbsp;databases&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Getting the database&amp;nbsp;identifier&lt;/h2&gt;
&lt;p&gt;After you have successfully registered a database (of any type), you will be given a connection &lt;span class="caps"&gt;URI&lt;/span&gt; (Uniform Resource Identifier), which is a fancy way of saying “&lt;span class="caps"&gt;URL&lt;/span&gt; to identify your database uniquely”. It can be a simple line of text,&amp;nbsp;like:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mongodb://mongodb0.example.com:27017&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;which identifies your database as&amp;nbsp;a &lt;code&gt;mongodb&lt;/code&gt; (document) database running on the server&amp;nbsp;at &lt;code&gt;mongodb0.example.com&lt;/code&gt; on&amp;nbsp;port &lt;code&gt;27017&lt;/code&gt;. (I covered server hostnames in &lt;a href="https://ngjunsiang.github.io/laymansguide/issue029.html"&gt;Issue 29&lt;/a&gt; and port numbers in &lt;a href="https://ngjunsiang.github.io/laymansguide/issue033.html"&gt;Issue 33&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;or it can look&amp;nbsp;like:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;postgres://myusername:myverylongwindedpasswordwhichisobviouslygeneratedbyacomputerandnotahuman@ec2-52-207-124-89.compute-1.amazonaws.com:5432/d77ila0heea1lk&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;which identifies your database as&amp;nbsp;a &lt;code&gt;postgres&lt;/code&gt; (relational) database running on the server&amp;nbsp;at &lt;code&gt;ec2-52-207-124-89.compute-1.amazonaws.com&lt;/code&gt; on&amp;nbsp;port &lt;code&gt;5432&lt;/code&gt;, and your particular database is&amp;nbsp;named &lt;code&gt;d77ila0heea1lk&lt;/code&gt; (you can run multiple databases on a single&amp;nbsp;server). The part before the &lt;code&gt;@&lt;/code&gt; carries the username and password used to log&amp;nbsp;in.&lt;/p&gt;
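&lt;p&gt;As a rough illustration, the pieces of a connection &lt;span class="caps"&gt;URI&lt;/span&gt; can be pulled apart with Python’s standard library. This is only a sketch: the helper name &lt;code&gt;describe_uri&lt;/code&gt;, the host &lt;code&gt;db.example.com&lt;/code&gt;, and the password &lt;code&gt;secret&lt;/code&gt; are placeholders made up for this example, and real database drivers do their own parsing.&lt;/p&gt;

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a database connection URI into its named parts."""
    parts = urlsplit(uri)
    return {
        "database_type": parts.scheme,             # e.g. postgres or mongodb
        "username": parts.username,                # None if the URI has no login
        "password": parts.password,                # None if the URI has no login
        "hostname": parts.hostname,                # the server to connect to
        "port": parts.port,                        # an integer, or None if absent
        "database_name": parts.path.lstrip("/"),   # empty if not given
    }


# Placeholder URI in the same shape as the postgres example above.
example = describe_uri(
    "postgres://myusername:secret@db.example.com:5432/d77ila0heea1lk"
)
print(example["hostname"], example["port"], example["database_name"])
```

&lt;p&gt;A service that asks for the hostname, port, and login details separately is asking for these same pieces, just one field at a time.&lt;/p&gt;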
&lt;h2&gt;Connecting to a&amp;nbsp;database&lt;/h2&gt;
&lt;p&gt;This is where it gets a bit&amp;nbsp;trickier.&lt;/p&gt;
&lt;p&gt;If you are using another online service that integrates with your database, that service needs to know your &lt;span class="caps"&gt;URI&lt;/span&gt; and its associated information. The service will either ask you for your login/authentication credentials, hostname, and port separately, or ask for them in a single &lt;span class="caps"&gt;URI&lt;/span&gt;, or some mix of the two&amp;nbsp;options.&lt;/p&gt;
&lt;p&gt;If you are hiring your own developer (including possibly yourself), you will have to figure out which module you need to connect to the&amp;nbsp;database.&lt;/p&gt;
&lt;p&gt;For example, MongoDB in&amp;nbsp;Python: &lt;code&gt;MongoClient('mongodb://mongodb0.example.com:27017')&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;And for PostgreSQL in&amp;nbsp;Python: &lt;code&gt;psycopg2.connect('postgres://myusername:myverylongwindedpasswordwhichisobviouslygeneratedbyacomputerandnotahuman@ec2-52-207-124-89.compute-1.amazonaws.com:5432/d77ila0heea1lk')&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; It is considered insecure to simply leave your login credentials in code like that. Please read up on best practices for importing sensitive information from more secure sources in your programming language of&amp;nbsp;choice.&lt;/p&gt;
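&lt;p&gt;One common approach, sketched here under assumptions, is to keep the &lt;span class="caps"&gt;URI&lt;/span&gt; in an environment variable and read it when the program starts. The variable name &lt;code&gt;DATABASE_URL&lt;/code&gt; and the helper &lt;code&gt;load_database_uri&lt;/code&gt; are illustrative choices, not something your provider necessarily uses.&lt;/p&gt;

```python
import os


def load_database_uri(env=None, name="DATABASE_URL"):
    """Return the connection URI stored under `name`, or raise if it is missing."""
    if env is None:
        env = os.environ  # the real environment of the running program
    uri = env.get(name)
    if not uri:
        raise RuntimeError(f"Set the {name} environment variable first")
    return uri


# Demonstration with a stand-in dictionary instead of the real environment.
fake_env = {"DATABASE_URL": "mongodb://mongodb0.example.com:27017"}
print(load_database_uri(fake_env))
```

&lt;p&gt;The value returned can then be handed to the connection call in place of the hard-coded text, so the password never appears in the code itself.&lt;/p&gt;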
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; A &lt;span class="caps"&gt;URI&lt;/span&gt; (Uniform Resource Identifier) is required to connect to a database. This &lt;span class="caps"&gt;URI&lt;/span&gt; can be provided by a hosting service provider that runs your own database for you, or by a cloud service provider that runs your database on their&amp;nbsp;platform.&lt;/p&gt;
&lt;p&gt;Once you go through the painful process the first time, it gets easier. A lot of engineering work has been done to make this possible: connect to a database with one identifier. URIs are their own fascinating bit of information engineering, definitely not within the scope of &lt;em&gt;Layman’s Guide&lt;/em&gt;. It is something to think about whenever you need to identify everything in your office or warehouse with a unique name (think barcode system or inventory/asset&amp;nbsp;management).&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 91: Commercial database&amp;nbsp;alternatives&lt;/p&gt;
&lt;p&gt;What if we don’t want to do all of that? Next issue, to wrap up this season, I’ll give you some alternatives that sit somewhere between a full database solution, and a simple Excel/Google Sheets&amp;nbsp;spreadsheet.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 89: Graph Databases</title><link href="https://ngjunsiang.github.io/laymansguide/issue089.html" rel="alternate"></link><published>2020-09-26T08:00:00+08:00</published><updated>2020-09-26T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-09-26:/laymansguide/issue089.html</id><summary type="html">&lt;p&gt;Graph databases treat the details of things as secondary, and optimise for managing the network of relationships. A graph database can quickly look up how things are related to each other, and return the&amp;nbsp;results.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Document databases organise data into documents, each containing a number of field-value pairs. Each value can itself be a document, and multiple values/documents can be grouped under a field. Document databases do not enforce data consistency across documents, so those rules need to be managed by the application which is using the database. This allows document databases to continue operating even when partitioned, at the cost of some&amp;nbsp;consistency.&lt;/p&gt;
&lt;p&gt;In the past two issues, I laid out how relational databases primarily focus on the &lt;strong&gt;relations&lt;/strong&gt; between tables, while document databases primarily focus on organising data into &lt;strong&gt;documents&lt;/strong&gt;. I’ll look at one more application&amp;nbsp;today.&lt;/p&gt;
&lt;p&gt;If I’m trying to start a new social media platform today, I would have to store posts and user account data into a database. Which type of database should I&amp;nbsp;use?&lt;/p&gt;
&lt;p&gt;I could use a relational database, but joining multiple tables to get a chain of posts, Twitter-style, could get ugly and involve lots of lookups … that is going to be one laggy service at&amp;nbsp;scale!&lt;/p&gt;
&lt;p&gt;I could use a document database, but it would involve retrieving each post one at a time, searching to find posts which are linked to it, and then checking which posts are linked to those posts … that is too many&amp;nbsp;searches!&lt;/p&gt;
&lt;p&gt;Maybe I’m approaching this wrong. I don’t need to relate many different types of tables or retrieve self-contained documents here. I am actually trying to store a humongous, densely linked network of data—a&amp;nbsp;graph!&lt;/p&gt;
&lt;h2&gt;What?&lt;/h2&gt;
&lt;p&gt;Okay, stay with me here, I know you are thinking of a horizontal and a vertical axis, and axis labels and bars and lines and—that’s not the kind of graph I am talking&amp;nbsp;about.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects.”&lt;br /&gt;
— &lt;a href="https://en.wikipedia.org/wiki/Graph_theory"&gt;Graph theory&amp;nbsp;(Wikipedia)&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That’s what I’m talking about. And it looks like&amp;nbsp;this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Wikipedia multilingual network graph, showing circles representing languages, and arrows between pairs of circles, representing editors who edited both languages." src="https://ngjunsiang.github.io/laymansguide/issue089_01.png" /&gt;&lt;br /&gt;
&lt;em&gt;This network graph shows the co-editing patterns on Wikipedia. The size of each arrow indicates the number of Wikipedia editors for one language edition of Wikipedia who also edited another language edition.&lt;br /&gt;Source: &lt;a href="https://en.wikipedia.org/wiki/File:Wikipedia_multilingual_network_graph_July_2013.svg"&gt;Wikimedia&amp;nbsp;Commons&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Okay,&amp;nbsp;phew.&lt;/p&gt;
&lt;h2&gt;Graph databases: a network of&amp;nbsp;relationships&lt;/h2&gt;
&lt;p&gt;So if I’m going to make a social media platform that can retrieve chains of posts, how would a graph database make it&amp;nbsp;easier?&lt;/p&gt;
&lt;p&gt;A graph database will still need to have some data for the users and&amp;nbsp;posts:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;personA:User {name:&amp;quot;Alice&amp;quot;}
personB:User {name:&amp;quot;Bob&amp;quot;}
...
post001:Post {tags:&amp;quot;...&amp;quot;, contents:&amp;quot;...&amp;quot;}
post002:Post {tags:&amp;quot;...&amp;quot;, contents:&amp;quot;...&amp;quot;}
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;But the heart of the graph database is the data that stores the relationships between those users and&amp;nbsp;posts:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;SAYS_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;SAYS_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If I want to look up a conversation between Alice and Bob, I can search&amp;nbsp;for &lt;code&gt;SAYS_TO&lt;/code&gt; relationships with Alice and Bob at either end of the relationship arrow&amp;nbsp;(&lt;code&gt;--&amp;gt;&lt;/code&gt;), and sort the results in chronological&amp;nbsp;order.&lt;/p&gt;
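&lt;p&gt;To make that search concrete, here is a small sketch in plain Python rather than a real graph database. The relationships are held in an ordinary list, and the &lt;code&gt;sent_at&lt;/code&gt; property, the sample messages, and the user Carol are assumptions added so that there is something to filter and sort.&lt;/p&gt;

```python
# Each relationship: (from_user, relationship_type, to_user, properties).
relationships = [
    ("Alice", "SAYS_TO", "Bob", {"message": "Lunch?", "sent_at": 1}),
    ("Bob", "SAYS_TO", "Carol", {"message": "Busy today", "sent_at": 2}),
    ("Bob", "SAYS_TO", "Alice", {"message": "Sure, noon?", "sent_at": 3}),
    ("Alice", "SAYS_TO", "Bob", {"message": "See you then", "sent_at": 4}),
]


def conversation(relationships, person_a, person_b):
    """Return (sent_at, sender, message) tuples between two people, oldest first."""
    pair = {person_a, person_b}
    matches = [
        (props["sent_at"], sender, props["message"])
        for sender, kind, receiver, props in relationships
        # Keep only SAYS_TO arrows with the two people at either end.
        if kind == "SAYS_TO" and {sender, receiver} == pair
    ]
    return sorted(matches)  # tuples sort by sent_at first


for sent_at, sender, message in conversation(relationships, "Alice", "Bob"):
    print(sent_at, sender, message)
```

&lt;p&gt;A real graph database would answer this with a query and an index rather than by scanning a list, but the idea is the same: the search only touches the relationships.&lt;/p&gt;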
&lt;h2&gt;Graph databases put relationships&amp;nbsp;first&lt;/h2&gt;
&lt;p&gt;What about posts and comments? For social media, we can treat them as the same type of data&amp;nbsp;(&lt;code&gt;Post&lt;/code&gt;), but link them with&amp;nbsp;relationships:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;WROTE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;WROTE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post003&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;WROTE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post005&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;WROTE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post007&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;WROTE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post011&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;WROTE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post013&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;personA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;WROTE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post017&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post003&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;REPLY_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post005&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;REPLY_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post003&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post007&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;REPLY_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post003&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post011&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;REPLY_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post005&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post013&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;REPLY_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post011&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post017&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;REPLY_TO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post013&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because the relationships contain only the bare minimum data for figuring out the network, they are quick to search through. I don’t have to load the names, post tags, post contents, and other irrelevant&amp;nbsp;detail.&lt;/p&gt;
&lt;p&gt;Although I would still have to&amp;nbsp;retrieve &lt;code&gt;post001&lt;/code&gt;, check for replies, check those replies for replies, and so on, this is much faster with relationships between labels. A graph database optimises for this type of&amp;nbsp;lookup.&lt;/p&gt;
&lt;p&gt;Once I have figured out which users and posts are involved in this chain, I can then retrieve their information in a subsequent query. I won’t even need to load all the information in one go, since the app user is not going to see the contents of later posts until they&amp;nbsp;scroll.&lt;/p&gt;
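&lt;p&gt;The reply-following step can also be sketched in plain Python. This is an in-memory stand-in, not how any particular graph database works inside; the function name &lt;code&gt;thread&lt;/code&gt; is made up for the example, and the pairs are copied from the &lt;code&gt;REPLY_TO&lt;/code&gt; relationships listed earlier.&lt;/p&gt;

```python
from collections import defaultdict, deque

# (reply, original) pairs taken from the REPLY_TO relationships.
reply_to = [
    ("post003", "post001"),
    ("post005", "post003"),
    ("post007", "post003"),
    ("post011", "post005"),
    ("post013", "post011"),
    ("post017", "post013"),
]


def thread(reply_to, root):
    """Return the root post followed by all its replies, nearest replies first."""
    # Index the relationships so each post maps to its direct replies.
    replies = defaultdict(list)
    for reply, original in reply_to:
        replies[original].append(reply)

    # Walk outwards from the root, one layer of replies at a time.
    # Assumes replies never loop back on themselves.
    ordered, queue = [], deque([root])
    while queue:
        post = queue.popleft()
        ordered.append(post)
        queue.extend(replies[post])
    return ordered


print(thread(reply_to, "post001"))
```

&lt;p&gt;Only post labels are handled here; the tags and contents of each post would be fetched afterwards, and only for the posts the reader actually scrolls to.&lt;/p&gt;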
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Graph databases treat the details of things as secondary, and optimise for managing the network of relationships. A graph database can quickly look up how things are related to each other, and return the&amp;nbsp;results.&lt;/p&gt;
&lt;p&gt;So there you go, three types of databases in three weeks. I picked these three because they’re the least technical to give an overview of (in my opinion), and are three different ways of thinking about data that I think you are likely to&amp;nbsp;encounter.&lt;/p&gt;
&lt;p&gt;There are, of course, other types of databases: key-value stores (used heavily in web browsers), wide column databases, search databases (very similar to document-based), … but beyond this point the differences are primarily technical, and not really suitable for this&amp;nbsp;newsletter.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 90: Using a&amp;nbsp;database&lt;/p&gt;
&lt;p&gt;I’ve been cracking my head trying to come up with 2 more topics to round up this season on databases. I suppose most layfolks would (hopefully) never ever have to start or run their own database. But it could be helpful to know what is needed to get a database up and running, and the most common ways of getting access to one. Expect a short issue next&amp;nbsp;week.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category><category term="document"></category></entry><entry><title>Issue 88: Document Databases</title><link href="https://ngjunsiang.github.io/laymansguide/issue088.html" rel="alternate"></link><published>2020-09-19T08:00:00+08:00</published><updated>2020-09-19T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-09-19:/laymansguide/issue088.html</id><summary type="html">&lt;p&gt;Document databases organise data into documents, each containing a number of field-value pairs. Each value can itself be a document, and multiple values/documents can be grouped under a field. Document databases do not enforce data consistency across documents, so those rules need to be managed by the application which is using the database. This allows document databases to continue operating even when partitioned, at the cost of some&amp;nbsp;consistency.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Relational databases are designed to maintain a well-structured set of data tables through constraint rules. This makes them very useful for preventing accidental inconsistencies in data, but make any changes to the data schema difficult to implement. Changing from one schema to another involves downtime and a&amp;nbsp;migration.&lt;/p&gt;
&lt;p&gt;One problem I keep running into with Excel is when I &lt;em&gt;think&lt;/em&gt; the data has a consistent structure, but halfway through I realise that it actually doesn’t: sometimes I might have two students with different categories of accomplishments, and that requires a big change in the way I design the&amp;nbsp;columns.&lt;/p&gt;
&lt;p&gt;Document databases bypass this problem by not enforcing a strict schema on the data. That is not to say you can’t; it is &lt;em&gt;optional&lt;/em&gt; and up to you to&amp;nbsp;enforce.&lt;/p&gt;
&lt;h2&gt;Document databases: a collection of fields and&amp;nbsp;values&lt;/h2&gt;
&lt;p&gt;When we think of documents, we usually think of Office documents, or PDFs, or things that are … more associated with the way a workplace&amp;nbsp;works.&lt;/p&gt;
&lt;p&gt;These documents are not the ones I have in mind when talking about document databases. In these databases, &lt;strong&gt;documents&lt;/strong&gt; are simply bits of data grouped together. Each bit of data is described by a field. For example, I might start out defining a student document this&amp;nbsp;way:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;{
  name: &amp;quot;Harry Potter&amp;quot;,
  school: &amp;quot;Hogwarts School of Witchcraft and Wizardry&amp;quot;,
  characteristics: &amp;quot;lightning-shaped scar on forehead&amp;quot;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I can add more fields later, if I&amp;nbsp;wish:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;{
  name: &amp;quot;Harry Potter&amp;quot;,
  school: &amp;quot;Hogwarts School of Witchcraft and Wizardry&amp;quot;,
  characteristics: &amp;quot;lightning-shaped scar on forehead&amp;quot;,
  mother: &amp;quot;Lily Potter&amp;quot;,
  father: &amp;quot;James Potter&amp;quot;,
  ...
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;But what makes document databases truly &lt;em&gt;document-oriented&lt;/em&gt; is the way they can be nested. Suppose I want to expand a bit more on this student’s education, to include the years of study. I could expand each entry in&amp;nbsp;the &lt;code&gt;school&lt;/code&gt; field to include&amp;nbsp;that:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;{
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Harry Potter&amp;quot;&lt;/span&gt;,
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;school&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;{
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hogwarts School of Witchcraft and Wizardry&amp;quot;&lt;/span&gt;,
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;start&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;1991&amp;quot;&lt;/span&gt;,
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;end&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;1997&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;},
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;characteristics&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;lightning-shaped scar on forehead&amp;quot;&lt;/span&gt;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Yup, now I’ve just expanded the value of&amp;nbsp;the &lt;code&gt;school&lt;/code&gt; field into &amp;#8230; another document! This document has&amp;nbsp;a &lt;code&gt;name&lt;/code&gt; field,&amp;nbsp;a &lt;code&gt;start&lt;/code&gt; field, and&amp;nbsp;an &lt;code&gt;end&lt;/code&gt; field. I can embed documents just about any place I&amp;nbsp;want.&lt;/p&gt;
&lt;p&gt;I can also group multiple values under a&amp;nbsp;field:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;{
  ...
  characteristics: [&amp;quot;wears glasses&amp;quot;, &amp;quot;lightning-shaped scar on forehead&amp;quot;]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I can also group multiple &lt;em&gt;documents&lt;/em&gt; under a field. It’s documents all the way&amp;nbsp;down!&lt;/p&gt;
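&lt;p&gt;As a rough illustration of several documents grouped under one field, here is the student document written as a Python dictionary. The field name &lt;code&gt;schools&lt;/code&gt; and the second school entry are assumptions made up for this sketch.&lt;/p&gt;

```python
import json

# A student document as a Python dict. The "schools" field holds a list of
# documents; the second entry is invented to show the grouping.
student = {
    "name": "Harry Potter",
    "schools": [
        {
            "name": "Hogwarts School of Witchcraft and Wizardry",
            "start": "1991",
            "end": "1997",
        },
        {"name": "Example Primary School", "start": "1985", "end": "1991"},
    ],
    "characteristics": ["wears glasses", "lightning-shaped scar on forehead"],
}

# Nested values are reached by walking down one field at a time.
first_school = student["schools"][0]["name"]
print(first_school)

# The whole document can be printed in the same shape as the examples above.
print(json.dumps(student, indent=2))
```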
&lt;h2&gt;Collections: the only way to organise&amp;nbsp;documents&lt;/h2&gt;
&lt;p&gt;While relational databases have tables for organising rows, document databases have collections for organising&amp;nbsp;documents.&lt;/p&gt;
&lt;p&gt;Each collection can contain multiple documents. There is no constraint on what kind of documents each collection can&amp;nbsp;contain.&lt;/p&gt;
&lt;p&gt;I could have a collection for teachers containing only teacher documents, a collection for students containing only student documents, a collection for subjects containing only subject documents, … or I could just have a collection for the department containing a mix of all three types of&amp;nbsp;documents.&lt;/p&gt;
&lt;h2&gt;What can I do with a document&amp;nbsp;database?&lt;/h2&gt;
&lt;p&gt;Just about &amp;#8230; anything? If you can think of a way to organise the data as documents, you can put it into a document&amp;nbsp;database.&lt;/p&gt;
&lt;p&gt;A document database lets you find documents based on their fields. I can look up all documents which have&amp;nbsp;a &lt;code&gt;name&lt;/code&gt; field, or check that the word &amp;#8220;Harry&amp;#8221; is in&amp;nbsp;the &lt;code&gt;name&lt;/code&gt; field. I could look for students who enrolled in the&amp;nbsp;year &lt;code&gt;"1991"&lt;/code&gt; or later, or more specifically students who enrolled&amp;nbsp;in &lt;code&gt;"Hogwarts School of Witchcraft and Wizardry"&lt;/code&gt; in &lt;code&gt;"1991"&lt;/code&gt; or&amp;nbsp;later.&lt;/p&gt;
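&lt;p&gt;A real document database has its own query language, which I am not showing here. The lookups above can still be imitated with a plain Python list standing in for a collection. Everything except the Harry Potter document is made up, and &lt;code&gt;enrolled_from&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from operator import ge  # ge(a, b) is true when a is b or later

# A tiny in-memory "collection"; every document except Harry's is invented.
collection = [
    {
        "name": "Harry Potter",
        "school": {
            "name": "Hogwarts School of Witchcraft and Wizardry",
            "start": "1991",
        },
    },
    {"name": "Example Student", "school": {"name": "Example Academy", "start": "1989"}},
    {"title": "Potions"},  # a subject document with no name field
]

# Documents that have a name field at all.
has_name = [doc for doc in collection if "name" in doc]

# Documents whose name field contains the word "Harry".
harrys = [doc for doc in has_name if "Harry" in doc["name"]]


def enrolled_from(doc, year):
    """True if the document records a school start year of `year` or later."""
    start = doc.get("school", {}).get("start")
    return start is not None and ge(int(start), year)


recent = [doc["name"] for doc in collection if enrolled_from(doc, 1991)]
print(recent)
```

&lt;p&gt;Notice that the subject document simply fails the checks instead of causing an error. With no enforced schema, queries have to cope with fields that are missing.&lt;/p&gt;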
&lt;h2&gt;Drawbacks&lt;/h2&gt;
&lt;p&gt;Since this is not a relational database, you don’t have the protection of foreign keys and other features that stop you from making the data inconsistent—there’s no concept of enforced consistency here! You’ll have to write those rules into your app when it accesses the document database; the database won’t enforce them for&amp;nbsp;you.&lt;/p&gt;
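&lt;p&gt;What &amp;#8220;writing those rules into your app&amp;#8221; might look like, as a minimal sketch: the app checks a document before it is stored. The collections, the &lt;code&gt;school_id&lt;/code&gt; field, and the &lt;code&gt;add_student&lt;/code&gt; function are all assumptions for illustration, with plain Python structures standing in for the database.&lt;/p&gt;

```python
# Made-up collections: students refer to a school by its id.
schools = {"hogwarts": {"name": "Hogwarts School of Witchcraft and Wizardry"}}
students = []


def add_student(document):
    """Store a student only if it passes the rules this app cares about."""
    if "name" not in document:
        raise ValueError("student document needs a name field")
    if document.get("school_id") not in schools:
        raise ValueError("student refers to a school that does not exist")
    students.append(document)


add_student({"name": "Harry Potter", "school_id": "hogwarts"})

try:
    add_student({"name": "Example Student", "school_id": "nowhere"})
except ValueError as error:
    print("rejected:", error)
```

&lt;p&gt;The catch is that these rules only hold if every app that writes to the database goes through the same checks.&lt;/p&gt;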
&lt;h2&gt;Advantages&lt;/h2&gt;
&lt;p&gt;Data organised as documents tends to be more self-contained. Since the database does not enforce consistency, it has less to worry about when edits or changes are made to the database. In a distributed document database, we thus sacrifice some consistency, unless we take pains to ensure it in our application&amp;nbsp;code.&lt;/p&gt;
&lt;p&gt;This does provide an advantage: when the distributed document database suffers a network outage, causing it to partition into multiple clusters (&lt;a href="https://ngjunsiang.github.io/laymansguide/issue086.html"&gt;Issue 86&lt;/a&gt;), the database can continue to operate. However, each cluster only has access to its own data, and not data on the other clusters. Over time, the clusters will become less and less consistent with one another, since changes in each cluster are not synchronised to other&amp;nbsp;clusters.&lt;/p&gt;
&lt;p&gt;Once the network issue is resolved and the clusters are synchronised again, these changes can subsequently be merged following rules for resolving conflicts. The database remains operational throughout the ordeal, just with some&amp;nbsp;desynchronisation.&lt;/p&gt;
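&lt;p&gt;One possible rule for resolving conflicts is &amp;#8220;the most recent edit wins&amp;#8221;. The sketch below assumes each field is stored together with the time it was last edited (a plain number here). Real databases offer different and more careful rules, so treat this only as an illustration of the idea; the field values are made up.&lt;/p&gt;

```python
def merge(copy_a, copy_b):
    """Merge two copies of a document, keeping the newer value of each field.

    Each copy maps a field to a (value, edited_at) pair. When both copies
    have the same edited_at for a field, the value from copy_a is kept.
    """
    merged = {}
    for field in sorted(set(copy_a) | set(copy_b)):
        candidates = [copy[field] for copy in (copy_a, copy_b) if field in copy]
        # max() picks the pair with the latest edited_at.
        merged[field] = max(candidates, key=lambda pair: pair[1])
    return merged


# The same document as seen by two clusters after a partition.
cluster_a = {"name": ("Harry Potter", 1), "house": ("Gryffindor", 2)}
cluster_b = {
    "name": ("Harry J. Potter", 3),
    "house": ("Gryffindor", 2),
    "pet": ("Hedwig", 1),
}

merged = merge(cluster_a, cluster_b)
print(merged)
```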
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Document databases organise data into documents, each containing a number of field-value pairs. Each value can itself be a document, and multiple values/documents can be grouped under a field. Document databases do not enforce data consistency across documents, so those rules need to be managed by the application which is using the database. This allows document databases to continue operating even when partitioned, at the cost of some&amp;nbsp;consistency.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 89: Graph&amp;nbsp;Databases&lt;/p&gt;
&lt;p&gt;Okay, relational and document databases were easy enough. They are more easily mapped to spreadsheets and file/folder hierarchies,&amp;nbsp;respectively.&lt;/p&gt;
&lt;p&gt;But now we go up the abstraction ladder, and get to more abstract ideas of data. In a social network, the user profile is usually the least significant part of the account; what often matters most is how this account is linked to other accounts (followers and following). The study of such interlinked objects is known in mathematics as &lt;strong&gt;graph theory&lt;/strong&gt; (nope, not the kind of graphs we are so used to in reports). This is where terms like “social graph”, the representation of your social network on Facebook or Twitter, come&amp;nbsp;from.&lt;/p&gt;
&lt;p&gt;What is the most intuitive way to represent, store, and modify this kind of graph data? Using a graph database, of&amp;nbsp;course.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category><category term="document"></category></entry><entry><title>Issue 87: Relational Databases</title><link href="https://ngjunsiang.github.io/laymansguide/issue087.html" rel="alternate"></link><published>2020-09-12T08:00:00+08:00</published><updated>2020-09-12T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-09-12:/laymansguide/issue087.html</id><summary type="html">&lt;p&gt;Relational databases are designed to maintain a well-structured set of data tables through constraint rules. This makes them very useful for preventing accidental inconsistencies in data, but make any changes to the data schema difficult to implement. Changing from one schema to another involves downtime and a&amp;nbsp;migration.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; To increase the performance of a distributed database, we can scale up/scale vertically by increasing the computers’ performance, or scale out/scale horizontally by adding more computers. Distributed databases can only prioritise two of the following three factors: consistency, availability, partitioning (&lt;span class="caps"&gt;CAP&lt;/span&gt;&amp;nbsp;theorem).&lt;/p&gt;
&lt;p&gt;I’ve already discussed one big strength of relational databases in &lt;a href="https://ngjunsiang.github.io/laymansguide/issue084.html"&gt;Issue 84&lt;/a&gt;, when I illustrated how the &lt;span class="caps"&gt;JOIN&lt;/span&gt; keyword, one of many &lt;span class="caps"&gt;SQL&lt;/span&gt; commands (&lt;a href="https://ngjunsiang.github.io/laymansguide/issue083.html"&gt;Issue 83&lt;/a&gt;), can join our data from multiple tables into a single view. This is where we look under the surface to see what makes that&amp;nbsp;possible.&lt;/p&gt;
&lt;h2&gt;Linking tables through foreign&amp;nbsp;keys&lt;/h2&gt;
&lt;p&gt;From &lt;a href="https://ngjunsiang.github.io/laymansguide/issue084.html"&gt;Issue 84&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To join&amp;nbsp;the &lt;code&gt;Customer&lt;/code&gt; and &lt;code&gt;Sales&lt;/code&gt; data so that we get the sales data along&amp;nbsp;with &lt;code&gt;custName&lt;/code&gt;, we would write a &lt;span class="caps"&gt;SQL&lt;/span&gt; query like&amp;nbsp;this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SELECT salesID, orderDate, custID FROM Sales
JOIN Customer ON Sales.custID = Customer.custID&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Here, &lt;code&gt;Sales.custID&lt;/code&gt; refers to&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt; of&amp;nbsp;the &lt;code&gt;Sales&lt;/code&gt; table,&amp;nbsp;while &lt;code&gt;Customer.custID&lt;/code&gt; refers to&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt; of&amp;nbsp;the &lt;code&gt;Customer&lt;/code&gt; table. This query effectively says “select&amp;nbsp;the &lt;code&gt;salesID&lt;/code&gt;, &lt;code&gt;orderDate&lt;/code&gt;,&amp;nbsp;and &lt;code&gt;custID&lt;/code&gt; columns&amp;nbsp;from &lt;code&gt;Sales&lt;/code&gt; table, and add data from&amp;nbsp;the &lt;code&gt;Customer&lt;/code&gt; table where&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt; column matches”. This will&amp;nbsp;return:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of an INNER JOIN operation between the Sales and Customer data tables, merged using custID values." src="https://ngjunsiang.github.io/laymansguide/issue084_04.png" /&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Did you catch the fact that there were actually &lt;em&gt;two&lt;/em&gt; &lt;code&gt;custID&lt;/code&gt; columns? One in&amp;nbsp;the &lt;code&gt;Sales&lt;/code&gt; table, and one in&amp;nbsp;the &lt;code&gt;Customer&lt;/code&gt; table &amp;#8230; by linking two tables like that, we actually introduce a point of potential&amp;nbsp;breakage.&lt;/p&gt;
&lt;p&gt;Suppose one day, a customer goes out of business, or changes name, and the&amp;nbsp;corresponding &lt;code&gt;Customer&lt;/code&gt; entry gets deleted. Now if we attempt to&amp;nbsp;retrieve &lt;code&gt;Sales&lt;/code&gt; to that customer with the &lt;span class="caps"&gt;JOIN&lt;/span&gt; above, the query finds no&amp;nbsp;matching &lt;code&gt;Customer&lt;/code&gt; entry, and those sales silently drop out of the&amp;nbsp;results.&lt;/p&gt;
&lt;p&gt;We can protect ourselves from this kind of error by&amp;nbsp;declaring &lt;code&gt;Sales.custID&lt;/code&gt; as a &lt;strong&gt;foreign key&lt;/strong&gt; that&amp;nbsp;references &lt;code&gt;Customer.custID&lt;/code&gt;, thus informing the database that every value&amp;nbsp;in &lt;code&gt;Sales.custID&lt;/code&gt; must match an entry&amp;nbsp;in &lt;code&gt;Customer&lt;/code&gt;. If we attempt to delete that customer again, the database will check whether that entry is still referenced by other tables through a foreign key. By default, entries can only be deleted if they are not referenced by other&amp;nbsp;entries.&lt;/p&gt;
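&lt;p&gt;Here is a small sketch of that protection, using the SQLite database that ships with Python. The table and column names follow the example above; the column types and the sample rows are my own assumptions. Note that SQLite only enforces foreign keys after being told to.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.execute("CREATE TABLE Customer (custID INTEGER PRIMARY KEY, custName TEXT)")
conn.execute(
    "CREATE TABLE Sales ("
    " salesID INTEGER PRIMARY KEY,"
    " orderDate TEXT,"
    " custID INTEGER REFERENCES Customer(custID))"  # the foreign key
)
# Sample rows, invented for this sketch.
conn.execute("INSERT INTO Customer VALUES (1, 'Example Pte Ltd')")
conn.execute("INSERT INTO Sales VALUES (100, '2020-09-01', 1)")

try:
    conn.execute("DELETE FROM Customer WHERE custID = 1")
    deleted = True
except sqlite3.IntegrityError as error:
    # The customer is still referenced by a Sales entry, so the delete is refused.
    deleted = False
    print("delete refused:", error)

remaining = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(remaining)
```

&lt;p&gt;To actually remove the customer, the sales entries that refer to it would have to be deleted or reassigned first.&lt;/p&gt;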
&lt;p&gt;These and other constraints allow us to protect ourselves from inadvertent harm, but over time, they accumulate and make a relational database very hard to modify. Database administrators will tell you to think about your database tables in advance, as even attempting to add a column or change a column type is going to be a pain in&amp;nbsp;future!&lt;/p&gt;
&lt;h2&gt;The tradeoff: downtime for database maintenance and&amp;nbsp;migrations&lt;/h2&gt;
&lt;p&gt;To modify a relational database, we have to shut it down&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1"&gt;1&lt;/a&gt;&lt;/sup&gt;, and &lt;strong&gt;migrate&lt;/strong&gt; the database from the old schema to the new schema. In essence, we are exporting our data and re-importing it again. Attempting to migrate while the database is active—known as a &lt;strong&gt;live migration&lt;/strong&gt;—is strongly discouraged, as changing a database while a migration is in progress can introduce data inconsistency; a real headache with&amp;nbsp;constraints!&lt;/p&gt;
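&lt;p&gt;A migration can be pictured as &amp;#8220;build the new table, copy the old rows across, swap&amp;#8221;. The sketch below does this with Python&amp;#8217;s built-in SQLite on a toy table. The &lt;code&gt;country&lt;/code&gt; column and the name &lt;code&gt;Customer_new&lt;/code&gt; are made up, and a production migration would involve far more care than this.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old schema, with one invented sample row.
conn.execute("CREATE TABLE Customer (custID INTEGER PRIMARY KEY, custName TEXT)")
conn.execute("INSERT INTO Customer VALUES (1, 'Example Pte Ltd')")

# New schema: custName becomes required, and a country column is added.
conn.execute(
    "CREATE TABLE Customer_new ("
    " custID INTEGER PRIMARY KEY,"
    " custName TEXT NOT NULL,"
    " country TEXT NOT NULL DEFAULT 'unknown')"
)

# Copy the old rows across; country falls back to its default value.
conn.execute(
    "INSERT INTO Customer_new (custID, custName)"
    " SELECT custID, custName FROM Customer"
)

# Swap the new table into place.
conn.execute("DROP TABLE Customer")
conn.execute("ALTER TABLE Customer_new RENAME TO Customer")
conn.commit()

rows = conn.execute("SELECT custID, custName, country FROM Customer").fetchall()
print(rows)
```

&lt;p&gt;If an app inserted a customer between the copy and the swap, that row would be lost, which is one way to see why migrating a busy database is risky.&lt;/p&gt;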
&lt;p&gt;Relational databases can also develop problems that require them to be shut down and rectified. It’s the tradeoff for having a consistent and structured way to store our data, and automated rules to enforce this&amp;nbsp;structure.&lt;/p&gt;
&lt;h2&gt;Relational databases: excellent for predictable data&amp;nbsp;needs&lt;/h2&gt;
&lt;p&gt;If you don’t expect to be changing your database schema often, or if you are able to design the schema to minimise such migrations, relational databases can be quite excellent for your needs. Please consult a professional database engineer if you are planning to use a database for your business&amp;nbsp;needs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Relational databases are designed to maintain a well-structured set of data tables through constraint rules. This makes them very useful for preventing accidental inconsistencies in data, but makes any changes to the data schema difficult to implement. Changing from one schema to another involves downtime and a&amp;nbsp;migration.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 88: Document&amp;nbsp;Databases&lt;/p&gt;
&lt;p&gt;Relational databases work well for data that we can imagine as an Excel table. But often, we have data that might not share the same set of properties, or might not have a predictable structure (such as online collaboration data). Such data is more intuitively imagined as a set of documents than as a set of tables. What do databases that encourage a document-based model of data look&amp;nbsp;like?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="footnote"&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;There are ways to avoid this, but I’ll let a &lt;strong&gt;real&lt;/strong&gt; database administrator tell you about how to make it happen.&amp;#160;&lt;a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text"&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 86: Distributed databases</title><link href="https://ngjunsiang.github.io/laymansguide/issue086.html" rel="alternate"></link><published>2020-09-05T08:00:00+08:00</published><updated>2020-09-05T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-09-05:/laymansguide/issue086.html</id><summary type="html">&lt;p&gt;To increase the performance of a distributed database, we can scale up/scale vertically by increasing the computers’ performance, or scale out/scale horizontally by adding more computers. Distributed databases can only prioritise two of the following three factors: consistency, availability, partitioning (&lt;span class="caps"&gt;CAP&lt;/span&gt;&amp;nbsp;theorem).&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Forms that naïvely inject user-submitted data into a &lt;span class="caps"&gt;SQL&lt;/span&gt; query template may end up sending valid &lt;span class="caps"&gt;SQL&lt;/span&gt; commands to the database, with disastrous&amp;nbsp;consequences.&lt;/p&gt;
&lt;p&gt;So far, we have been assuming that the database runs from a single computer, and all its data is stored on one as well. What happens when it outgrows this single&amp;nbsp;computer?&lt;/p&gt;
&lt;p&gt;We could add more disk space, more memory, more cores on the processor; this is called &lt;strong&gt;vertical scaling&lt;/strong&gt;/&lt;strong&gt;scaling up&lt;/strong&gt; (because we are increasing the performance of the computer, which usually &lt;em&gt;feels&lt;/em&gt; like pushing up the performance bar on the vertical axis of a&amp;nbsp;graph).&lt;/p&gt;
&lt;p&gt;Or we could spread that database over two or more computers. And keep them constantly synchronised. This is called &lt;strong&gt;horizontal scaling&lt;/strong&gt;/&lt;strong&gt;scaling out&lt;/strong&gt; (because we are adding more computers, which is usually depicted as adding more units on a horizontal&amp;nbsp;axis).&lt;/p&gt;
&lt;p&gt;We can only take vertical scaling so far; at some point we will have the most powerful server possible and it still won’t be enough. So if we are expecting massive growth, that means we will need a &lt;strong&gt;distributed database&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;Wait, who actually expects a database to not have to store &lt;em&gt;a lot&lt;/em&gt; of&amp;nbsp;information?&lt;/h2&gt;
&lt;p&gt;There are tiny databases out&amp;nbsp;there!&lt;/p&gt;
&lt;p&gt;These are often used in places where the task is not expected to grow beyond a single &lt;span class="caps"&gt;PC&lt;/span&gt;. For example, the database that stores your WhatsApp messages on your mobile phone, or a tiny database that stores records from a remote standalone sensor. These databases are designed to be extremely efficient at handling small amounts of data, to use very little memory, and/or to ensure that data is always written&amp;nbsp;securely.&lt;/p&gt;
&lt;h2&gt;Okay, fine. Back to distributed&amp;nbsp;databases&lt;/h2&gt;
&lt;p&gt;Buying more computers to run a server is similar to hiring more employees to do the company’s work. The good: you now have more help. The bad: you now have to talk to them!&amp;nbsp;Regularly!&lt;/p&gt;
&lt;p&gt;In distributed databases, there are three factors that are impossible to achieve together in&amp;nbsp;full:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;C&lt;/strong&gt;onsistency — reading the same data multiple times should not give us different&amp;nbsp;results&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;vailability — we should get a response from the database&amp;nbsp;quickly&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;P&lt;/strong&gt;artition tolerance — If network disruptions or software/hardware failures break communication, our cluster of servers break up into smaller clusters—they get partitioned. Computers in each subcluster can communicate with each other, but not with computers outside the subcluster. Under such conditions, the system should still continue to&amp;nbsp;operate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is known as the &lt;strong&gt;&lt;span class="caps"&gt;CAP&lt;/span&gt; theorem&lt;/strong&gt;: you can only really prioritise two out of the three&amp;nbsp;factors.&lt;/p&gt;
&lt;h2&gt;Consistency and&amp;nbsp;Availability&lt;/h2&gt;
&lt;p&gt;The databases we have been examining so far in Season 7 are known as relational databases, which handle data in the form of tables. When implemented as a distributed database, they often prioritise consistency and&amp;nbsp;availability.&lt;/p&gt;
&lt;p&gt;How does that work? When our distributed database is being hit with 100,000s of requests per second, more than one computer can handle, we need multiple computers to serve these requests. These computers had better be synchronised (to achieve consistency) so that the request will always return the same response from any of those&amp;nbsp;computers.&lt;/p&gt;
&lt;p&gt;One way to achieve this is to have a Single Source of Truth: perhaps we design it so that only one “leader” computer handles edits/changes to the database, which then get sent to all the other “follower” computers. (This design assumes that reading data occurs much more frequently than writing/changing data, which holds up for most use cases.) What happens if the “leader” computer goes down, and our distributed database goes from a leader-follower system to a partitioned bunch of followers? No writes can happen; the system is no longer fully&amp;nbsp;operational.&lt;/p&gt;
&lt;p&gt;(There are multiple approaches to designing this system so that a new leader is selected automatically or manually, but I won’t go into that here. The fundamental problem of ensuring consistency and availability in such cases&amp;nbsp;remains.)&lt;/p&gt;
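&lt;p&gt;A toy model of the leader-follower idea, with Python objects standing in for computers. The class names, the &lt;code&gt;leader_up&lt;/code&gt; flag, and the sample data are all invented for this sketch; real systems also have to deal with delays and failures while changes are being copied, which this ignores.&lt;/p&gt;

```python
class Node:
    """One computer holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """A leader that accepts writes, plus followers that receive copies."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        self.leader_up = True

    def write(self, key, value):
        """All writes go through the leader, which copies them to followers."""
        if not self.leader_up:
            raise RuntimeError("no leader available: writes are refused")
        self.leader.data[key] = value
        for follower in self.followers:
            follower.data[key] = value

    def read(self, key, node):
        """Reads can be served by any node, since every copy matches."""
        return node.data.get(key)


cluster = Cluster(Node("leader"), [Node("follower-1"), Node("follower-2")])
cluster.write("balance", 100)
reads = [cluster.read("balance", node) for node in cluster.followers]
print(reads)

# Simulate the leader going down: followers can still answer reads,
# but nobody is allowed to accept a write.
cluster.leader_up = False
try:
    cluster.write("balance", 50)
    write_refused = False
except RuntimeError as error:
    write_refused = True
    print(error)
```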
&lt;h2&gt;When a partition&amp;nbsp;happens&lt;/h2&gt;
&lt;p&gt;So it comes down to this: when communication failure happens in a scenario like the above, we have to&amp;nbsp;choose.&lt;/p&gt;
&lt;p&gt;If we need a workaround to ensure that updates on one computer still reach all the computers so that the data is consistent, that is going to be slow — we lose&amp;nbsp;availability.&lt;/p&gt;
&lt;p&gt;If we want to achieve availability, we could have each computer just return or update the data it has, then worry about synchronisation later — we lose&amp;nbsp;consistency.&lt;/p&gt;
&lt;p&gt;If you find yourself in the position of having to choose a distributed database, it would be immensely helpful to know upfront which 2 factors you want to&amp;nbsp;prioritise!&lt;/p&gt;
&lt;h2&gt;Examples&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Consistency and Availability&lt;/strong&gt;: Bank databases fall in this category. Financial transactions must be accurate, and people need to quickly know whether they were successful. So we have to live with these databases requiring regular maintenance (usually late at night) to minimise the risk of partitioning&amp;nbsp;failure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Partitioning and Consistency&lt;/strong&gt;: Authentication systems are relied upon to ensure that data is only accessed by people who are authorised to do so, and cannot afford to go down for long periods of time. This requires that permissions be properly synchronised across all computers, so consistency is key. These two factors are more important than ensuring a speedy&amp;nbsp;response.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Partitioning and Availability&lt;/strong&gt;: Essential services, such as Google Maps, have to remain operational even with (recoverable) failures, and still have to respond in a reasonable amount of time (otherwise real-time navigation would fail). Roads do not change often, so it is okay if the info we are getting is slightly out of date; we might occasionally get a slower route or find ourselves at a business whose operating hours are not updated in Google Maps, but these are not critical&amp;nbsp;failures.&lt;/p&gt;
&lt;p&gt;The &lt;span class="caps"&gt;CAP&lt;/span&gt; theorem does not say we can never have the third factor! It means we have to pick 2 factors to prioritise, and live with the lowered performance of the&amp;nbsp;third.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; To increase the performance of a distributed database, we can scale up/scale vertically by increasing the computers’ performance, or scale out/scale horizontally by adding more computers. Distributed databases can only prioritise two of the following three factors: consistency, availability, partitioning (&lt;span class="caps"&gt;CAP&lt;/span&gt;&amp;nbsp;theorem).&lt;/p&gt;
&lt;p&gt;This actually ran longer than I expected; the examples were an unplanned addition that I think helps to clarify use cases for each&amp;nbsp;combination.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 87: Relational&amp;nbsp;Databases&lt;/p&gt;
&lt;p&gt;I’ll spend the next 3 issues talking about 3 major types of databases in use today. This isn’t strictly layman content, but I suspect in some non-technical conversations these terms may pop up. More importantly, I think the 3 major types cover 3 different concepts of data, and I hope that elaborating on these in a little bit more detail will help to develop a more nuanced way of thinking about&amp;nbsp;data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 85: SQL Injections</title><link href="https://ngjunsiang.github.io/laymansguide/issue085.html" rel="alternate"></link><published>2020-08-29T08:00:00+08:00</published><updated>2020-08-29T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-08-29:/laymansguide/issue085.html</id><summary type="html">&lt;p&gt;Forms that naïvely inject user-submitted data into an &lt;span class="caps"&gt;SQL&lt;/span&gt; query template may end up sending valid (but otherwise unauthorised) &lt;span class="caps"&gt;SQL&lt;/span&gt; commands to the database, with disastrous&amp;nbsp;consequences.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; &lt;span class="caps"&gt;SQL&lt;/span&gt; queries let you join multiple tables based on specified conditions using the &lt;span class="caps"&gt;JOIN&lt;/span&gt; keyword. This enables crafting complex queries to return only the specific data that is&amp;nbsp;required.&lt;/p&gt;
&lt;p&gt;&lt;span class="caps"&gt;SQL&lt;/span&gt; databases are really powerful; this is usually a good thing since it allows developers to do amazing things with the data inside. But it can also lead to disastrous consequences in the unsupervised hands of inexperienced developers. And matters can be even worse if these powers are not carefully granted. A malicious actor could “borrow” these powers to wreak havoc on the&amp;nbsp;database!&lt;/p&gt;
&lt;p&gt;&lt;img alt="XKCD comic: Exploits of a Mom" src="https://imgs.xkcd.com/comics/exploits_of_a_mom.png" /&gt;&lt;br/&gt;
&lt;small&gt;&lt;a href="https://xkcd.com/327/"&gt;Relevant xkcd comic&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;
&lt;h2&gt;Adding data to an &lt;span class="caps"&gt;SQL&lt;/span&gt;&amp;nbsp;database&lt;/h2&gt;
&lt;p&gt;Adding data to an &lt;span class="caps"&gt;SQL&lt;/span&gt; database is easy. If&amp;nbsp;our &lt;code&gt;Customer&lt;/code&gt; table looks like this (from &lt;a href="https://ngjunsiang.github.io/laymansguide/issue084.html"&gt;Issue 84&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Customer data table, with custID, custName, custEmail, and custContact columns." src="https://ngjunsiang.github.io/laymansguide/issue084_01.png" /&gt;&lt;/p&gt;
&lt;p&gt;The relevant &lt;span class="caps"&gt;SQL&lt;/span&gt; query to add another customer (assuming the database assigns&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt; by itself)&amp;nbsp;is:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;custName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;custEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;custContact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Ernest&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ernest@lmn.com&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;57564986&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What could go&amp;nbsp;wrong?&lt;/p&gt;
&lt;h2&gt;Interacting with an &lt;span class="caps"&gt;SQL&lt;/span&gt;&amp;nbsp;database&lt;/h2&gt;
&lt;p&gt;The most direct way of managing and interacting with a database is through its commandline tool. Needless to say, this is not how you would want your users using it. It’s just a terrible user experience, and gives them &lt;em&gt;waaaay&lt;/em&gt; too much&amp;nbsp;power.&lt;/p&gt;
&lt;p&gt;So we usually design a frontend—an app, webpage, or database form—that formats and lays out the data nicely for them, and limits the things they can do to the data. This frontend will usually only allow users to edit or delete existing data, and add new data. Then it constructs an &lt;span class="caps"&gt;SQL&lt;/span&gt; query to be sent to the database. The code to do this might look like the&amp;nbsp;following:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# request holds the submitted form; sql is an open database connection
custName = request.form[&amp;#39;custName&amp;#39;]
custEmail = request.form[&amp;#39;custEmail&amp;#39;]
custContact = request.form[&amp;#39;custContact&amp;#39;]
# Unsafe: the form values are pasted straight into the text of the query
sql.execute(f&amp;quot;INSERT INTO Customer (custName, custEmail, custContact) VALUES (&amp;#39;{custName}&amp;#39;, &amp;#39;{custEmail}&amp;#39;, {custContact})&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This code naïvely inserts data from the submitted form into the database without any checks. That’s not smart; the contact number might have the wrong number of digits, the email might not even have an &amp;#8216;@&amp;#8217;, and people often type the wrong things in the wrong&amp;nbsp;fields.&lt;/p&gt;
&lt;p&gt;What else could go&amp;nbsp;wrong?&lt;/p&gt;
&lt;h2&gt;&lt;span class="caps"&gt;SQL&lt;/span&gt; Injections: sending &lt;span class="caps"&gt;SQL&lt;/span&gt; commands through an unsecured&amp;nbsp;form&lt;/h2&gt;
&lt;p&gt;A malicious/clever user might attempt to submit the following form&amp;nbsp;data:&lt;/p&gt;
&lt;p&gt;Customer Name: Ernest&lt;br/&gt;
Customer Email: ernest@lmn.com&lt;br/&gt;
Customer Contact: &lt;code&gt;10); DROP TABLE Customer--&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Why would they do that? When inserted into the template above, the full &lt;span class="caps"&gt;SQL&lt;/span&gt; query&amp;nbsp;becomes:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;custName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;custEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;custContact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Ernest&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ernest@lmn.com&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;DROP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="c1"&gt;--)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Two things to&amp;nbsp;explain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The semicolon&amp;nbsp;(&lt;code&gt;;&lt;/code&gt;) marks the end of an &lt;span class="caps"&gt;SQL&lt;/span&gt; statement, so two or more statements can be written on one&amp;nbsp;line.&lt;/li&gt;
&lt;li&gt;The database ignores everything after&amp;nbsp;the &lt;code&gt;--&lt;/code&gt;. It is a useful way to add comments to &lt;span class="caps"&gt;SQL&lt;/span&gt; queries (for human consumption) &amp;#8230; or to make the database ignore invalid syntax (such as the&amp;nbsp;standalone &lt;code&gt;)&lt;/code&gt;), which is what happens in this&amp;nbsp;case.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So a database connection that accepts more than one statement at a time ends up executing&amp;nbsp;this:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;INTO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;custName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;custEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;custContact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Ernest&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ernest@lmn.com&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Goodbye, &lt;code&gt;Customer&lt;/code&gt; table&amp;nbsp;&amp;#8230;&lt;/p&gt;
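&lt;p&gt;The attack above can be reproduced on a throwaway database. What follows is only a minimal sketch, not the code of any real app: it assumes Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module, an in-memory database, and two hypothetical helpers (&lt;code&gt;add_customer_unsafely&lt;/code&gt; and &lt;code&gt;table_exists&lt;/code&gt;) invented for illustration. It also assumes the app hands the finished text to a call that runs several statements in a row, which is what &lt;code&gt;executescript&lt;/code&gt; does.&lt;/p&gt;

```python
import sqlite3

# Hypothetical in-memory database standing in for the real one.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE Customer ("
    "custID INTEGER PRIMARY KEY, custName TEXT, custEmail TEXT, custContact INTEGER)"
)


def add_customer_unsafely(name, email, contact):
    # The form values are pasted straight into the SQL text.
    query = (
        "INSERT INTO Customer (custName, custEmail, custContact) "
        f"VALUES ('{name}', '{email}', {contact})"
    )
    # executescript() runs every statement it is given, one after another.
    db.executescript(query)


def table_exists(name):
    # sqlite_master is SQLite's own list of tables.
    rows = db.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchall()
    return rows[0][0] == 1


add_customer_unsafely("Ernest", "ernest@lmn.com", "57564986")
exists_before = table_exists("Customer")

# The contact field carries a second statement and a comment marker.
add_customer_unsafely("Mallory", "mallory@lmn.com", "10); DROP TABLE Customer--")
exists_after = table_exists("Customer")

print(exists_before, exists_after)
```

&lt;p&gt;In this sketch the table is there after the first, honest submission and gone after the second one. Whether a real app is exposed in the same way depends on how its database connection treats multiple statements.&lt;/p&gt;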
&lt;h2&gt;Data leakage through &lt;span class="caps"&gt;SQL&lt;/span&gt;&amp;nbsp;injections&lt;/h2&gt;
&lt;p&gt;This app is probably going to have some kind of search or filtering feature, where we enter a name to search for and get results that match. If we were searching for a user named George, an inexperienced developer might send this as the &lt;span class="caps"&gt;SQL&lt;/span&gt;&amp;nbsp;query:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;SELECT * FROM Customer WHERE custName = &amp;#39;George&amp;#39;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If I submit the following in the search&amp;nbsp;box:&lt;/p&gt;
&lt;p&gt;Customer Name: &lt;code&gt;George&amp;#39; OR &amp;#39;1&amp;#39;=&amp;#39;1&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;It might get naïvely substituted to form the following&amp;nbsp;query:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;SELECT * FROM Customer WHERE custName = &amp;#39;George&amp;#39; OR &amp;#39;1&amp;#39;=&amp;#39;1&amp;#39;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The database will parse this, and read the part&amp;nbsp;after &lt;code&gt;WHERE&lt;/code&gt; as two separate conditions joined&amp;nbsp;by &lt;code&gt;OR&lt;/code&gt;: &lt;code&gt;custName = &amp;#39;George&amp;#39;&lt;/code&gt;,&amp;nbsp;and &lt;code&gt;&amp;#39;1&amp;#39;=&amp;#39;1&amp;#39;&lt;/code&gt;. It gets interpreted as “return all rows&amp;nbsp;from &lt;code&gt;Customer&lt;/code&gt; table where either condition is&amp;nbsp;true”.&lt;/p&gt;
&lt;p&gt;The second condition compares a value with itself, so it is true for every row, whatever the name in that row. The whole condition therefore always evaluates to True, and results in the database returning &amp;#8230; all the rows&amp;nbsp;in &lt;code&gt;Customer&lt;/code&gt;.&lt;/p&gt;
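&lt;p&gt;The leak, and one common way of avoiding it, can be sketched in a few lines. This is an illustration under stated assumptions rather than a recommended design: it uses Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module, a cut-down &lt;code&gt;Customer&lt;/code&gt; table, and made-up customer names. The second query passes the search text through&amp;nbsp;a &lt;code&gt;?&lt;/code&gt; placeholder, so the database treats it as a plain value instead of as part of the&amp;nbsp;query.&lt;/p&gt;

```python
import sqlite3

# Cut-down Customer table with made-up names, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Customer (custID INTEGER PRIMARY KEY, custName TEXT)")
db.executemany(
    "INSERT INTO Customer (custName) VALUES (?)",
    [("George",), ("Hannah",), ("Ivan",)],
)

search_text = "George' OR '1'='1"

# Unsafe: the search text becomes part of the SQL itself.
unsafe_rows = db.execute(
    f"SELECT custName FROM Customer WHERE custName = '{search_text}'"
).fetchall()

# Safer: the ? placeholder sends the search text as a plain value.
safe_rows = db.execute(
    "SELECT custName FROM Customer WHERE custName = ?", (search_text,)
).fetchall()

print(len(unsafe_rows))  # every row comes back
print(len(safe_rows))    # no customer has that odd name
```

&lt;p&gt;Placeholders are only one layer of protection; the advice in the conclusion still&amp;nbsp;applies.&lt;/p&gt;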
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;If you’re going to use a database with a frontend, get an experienced developer to do it. If all you have are inexperienced developers, send them for the appropriate training. If you don’t have developers, use an established product over an untested one. If in doubt, find someone with the relevant credentials to ask for&amp;nbsp;advice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Forms that naïvely inject user-submitted data into an &lt;span class="caps"&gt;SQL&lt;/span&gt; query template may end up sending valid (but otherwise unauthorised) &lt;span class="caps"&gt;SQL&lt;/span&gt; commands to the database, with disastrous&amp;nbsp;consequences.&lt;/p&gt;
&lt;p&gt;This would have been 3–5 times as long if I had started going into some basic ways to prevent this kind of mistake. Fortunately, this is just a layman’s guide, and I can foist that responsibility off to the rest of the&amp;nbsp;internet.&lt;/p&gt;
&lt;p&gt;On a serious note, database security is a whole field of study. If you are using a database for enterprise purposes, please give database security the resources it needs; there are just so many ways that things can go&amp;nbsp;wrong!&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 86: Distributed&amp;nbsp;databases&lt;/p&gt;
&lt;p&gt;So far, we have been assuming that the database runs from a single computer, and all its data is stored on one as well. What happens when it outgrows this single computer? Why, it then gets transmitted and infects another computer &amp;#8230; just kidding, we then have to spread that database over two or more computers. And keep them constantly synchronised. If that sounds like a pain, you are exactly right! More on this next&amp;nbsp;issue.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 84: JOIN – supercharged VLOOKUP</title><link href="https://ngjunsiang.github.io/laymansguide/issue084.html" rel="alternate"></link><published>2020-08-22T08:00:00+08:00</published><updated>2020-08-22T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-08-22:/laymansguide/issue084.html</id><summary type="html">&lt;p&gt;&lt;span class="caps"&gt;SQL&lt;/span&gt; queries let you join multiple tables based on specified conditions using the &lt;span class="caps"&gt;JOIN&lt;/span&gt; keyword. This enables crafting complex queries to return only the specific data that is&amp;nbsp;required.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Structured Query Language (&lt;span class="caps"&gt;SQL&lt;/span&gt;) is a computer language for managing data in databases. It has keywords and keyphrases that let you filter rows and columns, group and order data, perform basic arithmetic on data, and more. It is complex and powerful, but astute and efficient use requires specialised&amp;nbsp;training.&lt;/p&gt;
&lt;h2&gt;&lt;span class="caps"&gt;VLOOKUP&lt;/span&gt;: The bread-and-butter of&amp;nbsp;spreadsheets&lt;/h2&gt;
&lt;p&gt;If I have&amp;nbsp;a &lt;code&gt;Customer&lt;/code&gt; data table that looks like&amp;nbsp;this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Customer data table, with custID, custName, custEmail, and custContact columns." src="https://ngjunsiang.github.io/laymansguide/issue084_01.png" /&gt;&lt;/p&gt;
&lt;p&gt;And&amp;nbsp;a &lt;code&gt;Sales&lt;/code&gt; data table that looks like&amp;nbsp;this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Sales data table, with salesID, orderDate, and custID columns." src="https://ngjunsiang.github.io/laymansguide/issue084_02.png" /&gt;&lt;/p&gt;
&lt;p&gt;I could add&amp;nbsp;a &lt;code&gt;custName&lt;/code&gt; column to the sales table that &lt;em&gt;looks up&lt;/em&gt;&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt;, and inserts&amp;nbsp;the &lt;code&gt;custName&lt;/code&gt; info from the same row. This feature of spreadsheets is known as &lt;strong&gt;&lt;span class="caps"&gt;VLOOKUP&lt;/span&gt;&lt;/strong&gt; (vertical lookup)&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1"&gt;1&lt;/a&gt;&lt;/sup&gt;. This is what the formula for each cell&amp;nbsp;in &lt;code&gt;custName&lt;/code&gt; would look&amp;nbsp;like:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Sales data table, with salesID, orderDate, and custID columns." src="https://ngjunsiang.github.io/laymansguide/issue084_03.png" /&gt;&lt;/p&gt;
&lt;p&gt;Let’s break down each part of that&amp;nbsp;formula:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;=VLOOKUP(C2,Customer!A:D,2)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This means “in columns &lt;strong&gt;A:D&lt;/strong&gt; of the &lt;strong&gt;Customer&lt;/strong&gt; table, look for the value from cell &lt;strong&gt;C2&lt;/strong&gt; (which&amp;nbsp;is &lt;code&gt;1&lt;/code&gt;) in the first column of the &lt;strong&gt;Customer&lt;/strong&gt; table, and return the value from the same-row cell in the &lt;strong&gt;2&lt;/strong&gt;nd column of the &lt;strong&gt;Customer&lt;/strong&gt;&amp;nbsp;table.”&lt;/p&gt;
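&lt;p&gt;For readers who find code easier to follow than formulas, the lookup described above can be sketched in Python. This is a rough analogy only: the rows are made-up stand-ins for columns A:D of the Customer sheet, the &lt;code&gt;vlookup&lt;/code&gt; function is hypothetical, and it handles just the exact-match case rather than everything the spreadsheet function&amp;nbsp;does.&lt;/p&gt;

```python
# Made-up rows standing in for columns A:D of the Customer sheet:
# custID, custName, custEmail, custContact.
customer_rows = [
    (1, "Alice", "alice@abc.com", 91234567),
    (2, "Bob", "bob@def.com", 92345678),
    (3, "Carol", "carol@ghi.com", 93456789),
]


def vlookup(lookup_value, rows, column_number):
    """Find the first row whose first column equals lookup_value,
    then return the value in column_number (counting from 1)."""
    for row in rows:
        if row[0] == lookup_value:
            return row[column_number - 1]
    return None  # a spreadsheet would show an error here instead


# Same shape as =VLOOKUP(C2,Customer!A:D,2) when C2 holds 1.
name = vlookup(1, customer_rows, 2)
print(name)
```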
&lt;p&gt;What if you needed to insert more than one column? What if you need to “join” two or more tables? Your spreadsheet would soon be filled with &lt;span class="caps"&gt;VLOOKUP&lt;/span&gt; cells, and this really slows down the performance of the spreadsheet. This method is not suitable for data involving millions of rows, for&amp;nbsp;sure.&lt;/p&gt;
&lt;h2&gt;&lt;span class="caps"&gt;SQL&lt;/span&gt; &lt;span class="caps"&gt;JOIN&lt;/span&gt;: &lt;span class="caps"&gt;VLOOKUP&lt;/span&gt; on&amp;nbsp;steroids&lt;/h2&gt;
&lt;p&gt;In a database, there is no “standard view” of the data. All data you want to see has to be retrieved with a &lt;strong&gt;query&lt;/strong&gt;. So it makes no sense to require cells filled with &lt;span class="caps"&gt;VLOOKUPS&lt;/span&gt;; we just need to figure out how to do the equivalent in a query. The keyword for that is &lt;strong&gt;&lt;span class="caps"&gt;JOIN&lt;/span&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;To join&amp;nbsp;the &lt;code&gt;Customer&lt;/code&gt; and &lt;code&gt;Sales&lt;/code&gt; data so that we get the sales data along&amp;nbsp;with &lt;code&gt;custName&lt;/code&gt;, we would write an &lt;span class="caps"&gt;SQL&lt;/span&gt; query like&amp;nbsp;this:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;SELECT salesID, orderDate, Sales.custID, Customer.custName FROM Sales
JOIN Customer ON Sales.custID = Customer.custID
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;code&gt;Sales.custID&lt;/code&gt; refers to&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt; of&amp;nbsp;the &lt;code&gt;Sales&lt;/code&gt; table,&amp;nbsp;while &lt;code&gt;Customer.custID&lt;/code&gt; refers to&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt; of&amp;nbsp;the &lt;code&gt;Customer&lt;/code&gt; table. Both tables have&amp;nbsp;a &lt;code&gt;custID&lt;/code&gt; column, so the table name has to be spelled out to say which one is meant. This query effectively says “select&amp;nbsp;the &lt;code&gt;salesID&lt;/code&gt;, &lt;code&gt;orderDate&lt;/code&gt;,&amp;nbsp;and &lt;code&gt;custID&lt;/code&gt; columns&amp;nbsp;from &lt;code&gt;Sales&lt;/code&gt; table, and add&amp;nbsp;the &lt;code&gt;custName&lt;/code&gt; from&amp;nbsp;the &lt;code&gt;Customer&lt;/code&gt; table where&amp;nbsp;the &lt;code&gt;custID&lt;/code&gt; column matches”. This will&amp;nbsp;return:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of an INNER JOIN operation between the Sales and Customer data tables, merged using custID values." src="https://ngjunsiang.github.io/laymansguide/issue084_04.png" /&gt;&lt;/p&gt;
&lt;p&gt;That is much easier—once you’ve been trained in &lt;span class="caps"&gt;SQL&lt;/span&gt; syntax—than writing separate &lt;span class="caps"&gt;VLOOKUP&lt;/span&gt; formulas for each column you want, and having to maintain a whole table of&amp;nbsp;formulas!&lt;/p&gt;
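&lt;p&gt;To try a join without installing anything, a small sketch with Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module will do. The table layouts are cut down and the rows are made up for illustration; only the column names come from the tables shown&amp;nbsp;earlier.&lt;/p&gt;

```python
import sqlite3

# Cut-down tables with made-up rows, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Customer (custID INTEGER PRIMARY KEY, custName TEXT)")
db.execute(
    "CREATE TABLE Sales (salesID INTEGER PRIMARY KEY, orderDate TEXT, custID INTEGER)"
)
db.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
db.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(101, "2020-08-01", 1), (102, "2020-08-02", 2), (103, "2020-08-03", 1)],
)

# Each sale is matched with the customer row that has the same custID.
rows = db.execute(
    "SELECT salesID, orderDate, Sales.custID, Customer.custName FROM Sales "
    "JOIN Customer ON Sales.custID = Customer.custID "
    "ORDER BY salesID"
).fetchall()

for row in rows:
    print(row)
```

&lt;p&gt;Each printed row carries the three sales columns plus the matching customer name, with no lookup formulas to&amp;nbsp;maintain.&lt;/p&gt;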
&lt;p&gt;You can even join more than two tables together with a query&amp;nbsp;like:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;SELECT salesID, orderDate, Sales.custID, Sales.invoiceID, Customer.custName, Customer.custContact, invoiceDate, invoiceAmt FROM Sales
JOIN Customer ON Sales.custID = Customer.custID
JOIN Invoice ON Sales.invoiceID = Invoice.invoiceID
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is barely scratching the surface of what &lt;span class="caps"&gt;SQL&lt;/span&gt; can do; it has at least 4 types of JOINs, and many more ways of crafting queries to return specifically the data you&amp;nbsp;want.&lt;/p&gt;
&lt;p&gt;&lt;span class="caps"&gt;SQL&lt;/span&gt; queries are a whole different way of talking to your computer, and they can be really frustrating to write for people who are new to them. But they are behind many of the interfaces you see, which seem to seamlessly pull data from multiple sources together into a coherent&amp;nbsp;view.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; &lt;span class="caps"&gt;SQL&lt;/span&gt; queries let you join multiple tables based on specified conditions using the &lt;span class="caps"&gt;JOIN&lt;/span&gt; keyword. This enables crafting complex queries to return only the specific data that is&amp;nbsp;required.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 85: &lt;span class="caps"&gt;SQL&lt;/span&gt;&amp;nbsp;injections&lt;/p&gt;
&lt;p&gt;Databases are immensely powerful software systems when it comes to searching for information. One recurring challenge that all admins face is ensuring that only authorised use is permitted; how do we prevent malicious activity from being able to access the&amp;nbsp;database?&lt;/p&gt;
&lt;p&gt;Next week, I will introduce a common &lt;strong&gt;vulnerability&lt;/strong&gt; that web developers always have to guard against: &lt;span class="caps"&gt;SQL&lt;/span&gt;&amp;nbsp;injection.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="footnote"&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;There is an equivalent feature for columns known as &lt;span class="caps"&gt;HLOOKUP&lt;/span&gt; (horizontal lookup) that looks up info in a row and inserts data from the same column, but it is not as popular. So the &lt;span class="caps"&gt;VLOOKUP&lt;/span&gt; name is more commonly used for this kind of operation.&amp;#160;&lt;a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text"&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 83: Structured Query Language</title><link href="https://ngjunsiang.github.io/laymansguide/issue083.html" rel="alternate"></link><published>2020-08-08T08:00:00+08:00</published><updated>2020-08-08T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-08-08:/laymansguide/issue083.html</id><summary type="html">&lt;p&gt;Structured Query Language (&lt;span class="caps"&gt;SQL&lt;/span&gt;) is a computer language for managing data in databases. It has keywords and keyphrases that let you filter rows and columns, group and order data, perform basic arithmetic on data, and more. It is complex and powerful, but using it in an astute and efficient manner requires specialised&amp;nbsp;training.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; A database system follows rules that enable multiple users to send commands to the database at the same time. The system attempts to execute each action one at a time, locking data that is in use by other users, and ensuring that each user does not carry out actions that they are not permitted to. Such systems are better able to prevent data corruption compared to a text-based&amp;nbsp;system.&lt;/p&gt;
&lt;p&gt;Have you experienced the pain of having really huge tables in your spreadsheet, sometimes spanning more than a hundred columns? Then you might know how painful it can be trying to filter data from it, e.g. if your boss just wants a few columns of info from certain rows. Like if he asks for the performance numbers of employees who are up for&amp;nbsp;promotion.&lt;/p&gt;
&lt;p&gt;In a spreadsheet, you would have to apply filters&amp;nbsp;for &lt;code&gt;nextPromoYear&lt;/code&gt; to only show the appropriate rows, then you&amp;#8217;ll have to hide all the other irrelevant columns. Or you&amp;#8217;d just copy all more-than-a-hundred columns for those rows into another new spreadsheet, and manually delete the unnecessary&amp;nbsp;columns.&lt;/p&gt;
&lt;p&gt;Database designers don’t want to do that. You should be able to ask the database to do this querying and filtering for you, and return you only the data you want. But how would that be&amp;nbsp;designed?&lt;/p&gt;
&lt;h2&gt;Structured Query Language: the universal database&amp;nbsp;language&lt;/h2&gt;
&lt;p&gt;Structured Query Language (&lt;span class="caps"&gt;SQL&lt;/span&gt;) is another computer language designed to manage data in databases. It reads &lt;em&gt;almost&lt;/em&gt; like English, but more logical and less poetic. It has its own syntax and grammar, which are not the same as in English. And sending a proper &lt;span class="caps"&gt;SQL&lt;/span&gt; query to any database that supports it will get you what you&amp;nbsp;want.&lt;/p&gt;
&lt;p&gt;Here’s what an &lt;span class="caps"&gt;SQL&lt;/span&gt; query for the above info might look&amp;nbsp;like:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;SELECT employeeName, teamName, salesCount, salesTotal FROM SalesData
WHERE nextPromoYear = 2020
GROUP BY teamName
ORDER BY salesTotal;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;SELECT&lt;/code&gt; keyword lets you filter only the columns you&amp;nbsp;want &lt;code&gt;FROM&lt;/code&gt; a&amp;nbsp;table&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;WHERE&lt;/code&gt; keyword lets you filter only the rows you want, based on one or more&amp;nbsp;criteria&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;GROUP BY&lt;/code&gt; keyphrase lets you group the returned data based on values in a&amp;nbsp;column&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;ORDER BY&lt;/code&gt; keyphrase lets you sort the returned results according to values in a&amp;nbsp;column&lt;/li&gt;
&lt;/ul&gt;
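&lt;p&gt;The four keywords can be tried out with Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The sketch below is an assumption-laden illustration: the rows are made up, and the query is split in two because many databases insist that, once&amp;nbsp;a &lt;code&gt;GROUP BY&lt;/code&gt; is present, every selected column is either the grouping column or wrapped in a summarising function such&amp;nbsp;as &lt;code&gt;SUM&lt;/code&gt;.&lt;/p&gt;

```python
import sqlite3

# Made-up rows in a table shaped like the SalesData example.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE SalesData ("
    "employeeName TEXT, teamName TEXT, salesCount INTEGER, "
    "salesTotal INTEGER, nextPromoYear INTEGER)"
)
db.executemany(
    "INSERT INTO SalesData VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "North", 12, 3000, 2020),
        ("Ben", "North", 8, 1500, 2021),
        ("Cal", "South", 20, 5000, 2020),
        ("Dee", "South", 5, 1000, 2020),
    ],
)

# SELECT + WHERE + ORDER BY: one row per employee up for promotion in 2020.
per_employee = db.execute(
    "SELECT employeeName, teamName, salesCount, salesTotal FROM SalesData "
    "WHERE nextPromoYear = 2020 "
    "ORDER BY salesTotal"
).fetchall()

# GROUP BY: one row per team, with SUM() adding up each group's rows.
per_team = db.execute(
    "SELECT teamName, SUM(salesCount), SUM(salesTotal) FROM SalesData "
    "WHERE nextPromoYear = 2020 "
    "GROUP BY teamName "
    "ORDER BY teamName"
).fetchall()

print(per_employee)
print(per_team)
```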
&lt;h2&gt;A database has no “main&amp;nbsp;view”&lt;/h2&gt;
&lt;p&gt;One difficulty many people have in “upgrading” from a spreadsheet mindset to a database mindset is that they expect to have a “main spreadsheet” where (almost) all the data lives, and from which sub-spreadsheets pull data. In a database, all data lives in separate tables, and is joined only when a query is executed. The only way to get data from a database is to use&amp;nbsp;queries!&lt;/p&gt;
&lt;p&gt;Most websites and software that retrieve data for you end up executing one or more queries like the one above to get that data. The job of the database software is to interpret such commands, pull the data from the various tables together, collate it correctly, and send it to&amp;nbsp;you.&lt;/p&gt;
&lt;h2&gt;A database can give you almost exactly what you&amp;nbsp;want&lt;/h2&gt;
&lt;p&gt;By using these and many other keywords and keyphrases, it is possible to put together a query that gives you only the data you want. &lt;span class="caps"&gt;SQL&lt;/span&gt; has arithmetic functions such as count, average, sum, and it can even return only unique&amp;nbsp;values.&lt;/p&gt;
&lt;p&gt;The tradeoff is that you have to learn another language, and use it regularly enough to understand the ins and outs. This is why every big corporation has a data team that can do&amp;nbsp;this!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Structured Query Language (&lt;span class="caps"&gt;SQL&lt;/span&gt;) is a computer language for managing data in databases. It has keywords and keyphrases that let you filter rows and columns, group and order data, perform basic arithmetic on data, and more. It is complex and powerful, but using it in an astute and efficient manner requires specialised&amp;nbsp;training.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 84: &lt;span class="caps"&gt;JOIN&lt;/span&gt; – supercharged &lt;span class="caps"&gt;VLOOKUP&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;I haven’t even touched on &lt;span class="caps"&gt;SQL&lt;/span&gt;’s really powerful features yet. Filtering data from a table is fine, but if my data is spread across many tables, how do I pull that data together? Excel folks have a command they rely on heavily to do this, and it is&amp;nbsp;called &lt;code&gt;VLOOKUP&lt;/code&gt;. I’ll show you the &lt;span class="caps"&gt;SQL&lt;/span&gt; version next&amp;nbsp;issue.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 82: Multiplayer databases</title><link href="https://ngjunsiang.github.io/laymansguide/issue082.html" rel="alternate"></link><published>2020-08-01T08:00:00+08:00</published><updated>2020-08-01T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-08-01:/laymansguide/issue082.html</id><summary type="html">&lt;p&gt;A database system follows rules that enable multiple users to send commands to the database at the same time. The system attempts to execute each action one at a time, locking data that is in use by other users, and ensuring that each user does not carry out actions that they are not permitted to. Such systems are better able to prevent data corruption compared to a text-based&amp;nbsp;system.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Putting all data into one table results in unnecessary duplication of data. Making data atomic by splitting it up into multiple tables makes the data easier to work with, but requires multiple lookups and joins to get the required data. A standard database language, &lt;span class="caps"&gt;SQL&lt;/span&gt;, makes it possible to write queries that are supported by multiple&amp;nbsp;databases.&lt;/p&gt;
&lt;p&gt;This issue is going to be a short one, because it is simple enough to explain&amp;nbsp;:)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The action can’t be completed because the file is open. Close the file and try&amp;nbsp;again.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This happens in File Explorer because the operating system treats a text file as a single block of data. When a user opens this file, they do not expect data inside to change. To prevent other users inadvertently modifying it, the operating system “locks” the file, preventing any changes—including&amp;nbsp;deletion!&lt;/p&gt;
&lt;p&gt;How do we resolve this with a database? In the previous issue (&lt;a href="https://ngjunsiang.github.io/laymansguide/issue081.html"&gt;Issue 81&lt;/a&gt;), I described the process of making data atomic—breaking it up into the smallest level of detail. By splitting up one huge spreadsheet worth of data into multiple tables representing different things, we allow the database to do the heavy work of data processing for us, while we avoid the tedium of repeating the same data row after row (such as author name for different blog&amp;nbsp;posts).&lt;/p&gt;
&lt;h2&gt;Locking specific&amp;nbsp;data&lt;/h2&gt;
&lt;p&gt;Now that the data is atomic, the database is better able to figure out which data needs to be locked. If a user is requesting data from a particular table and from certain rows only, we do not need to lock the entire database and prevent other users from accessing it. Such systems are called row-locking systems, and some databases (but not all) support this&amp;nbsp;feature.&lt;/p&gt;
&lt;h2&gt;Action&amp;nbsp;deconflicting&lt;/h2&gt;
&lt;p&gt;When multiple users access a database and attempt to write data to it at the same time, the database takes these requests and puts them in a queue, processing them one by one so that no two conflicting actions end up causing the data to be&amp;nbsp;corrupted.&lt;/p&gt;
&lt;p&gt;But sometimes, conflicting actions can end up getting queued. For instance, User 1 might send a command to delete a table while User 2 sends a command to retrieve data from that table (because it had not been deleted at the point when User 2 accessed it). User 1’s command gets through first and deletes the table, and when the database reaches User 2’s command, it is no longer able to execute it. What happens&amp;nbsp;then?&lt;/p&gt;
&lt;p&gt;Well, that’s when the database throws an error. A database system is able to detect actions whose logic conflicts with other actions. With our previous text-based system, even with the table gone, the program could still continue to search for results, and finding none, return empty data instead of alerting the&amp;nbsp;user.&lt;/p&gt;
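&lt;p&gt;The scenario above can be sketched with Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module standing in for the database. The queue here is a simplification I am assuming for illustration (real databases schedule work in more sophisticated ways), and the &lt;code&gt;posts&lt;/code&gt; table is&amp;nbsp;hypothetical.&lt;/p&gt;

```python
import sqlite3
from collections import deque

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO posts (title) VALUES ('Why the world is falling apart')")

# Commands wait in a queue and are executed strictly one at a time.
queue = deque([
    ("User 1", "DROP TABLE posts"),
    ("User 2", "SELECT title FROM posts"),
])

outcomes = []
while queue:
    user, command = queue.popleft()
    try:
        db.execute(command).fetchall()
        outcomes.append((user, "ok"))
    except sqlite3.OperationalError as error:
        # The table no longer exists, so the database reports an error
        # instead of quietly returning an empty result.
        outcomes.append((user, f"error: {error}"))

for user, outcome in outcomes:
    print(user, outcome)
```

&lt;p&gt;User 1’s command succeeds, and User 2 gets an error saying the table is missing, rather than an empty (and misleading)&amp;nbsp;answer.&lt;/p&gt;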
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; A database system follows rules that enable multiple users to send commands to the database at the same time. The system attempts to execute each action one at a time, locking data that is in use by other users, and ensuring that each user does not carry out actions that they are not permitted to. Such systems are better able to prevent data corruption compared to a text-based&amp;nbsp;system.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 83: Structured Query&amp;nbsp;Language&lt;/p&gt;
&lt;p&gt;If you have an Excel maven in your workplace who&amp;nbsp;writes &lt;code&gt;VLOOKUP&lt;/code&gt;s, &lt;code&gt;INDEX-MATCH&lt;/code&gt;s and other chained functions with ease, you will know how spreadsheets can do downright amazing things. But wait till you see Structured Query Language (&lt;span class="caps"&gt;SQL&lt;/span&gt;); it will blow your mind! It &lt;em&gt;almost&lt;/em&gt; looks like Excel code, except with fewer nested parentheses, and reads a little (deceptively) more like English. I’ll show you next&amp;nbsp;issue.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 81: Data Normalisation</title><link href="https://ngjunsiang.github.io/laymansguide/issue081.html" rel="alternate"></link><published>2020-07-25T08:00:00+08:00</published><updated>2020-07-25T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-07-25:/laymansguide/issue081.html</id><summary type="html">&lt;p&gt;Putting all data into one table results in unnecessary duplication of data. Making data atomic by splitting it up into multiple tables makes the data easier to work with, but requires multiple lookups and joins to get the required data. A standard database language, &lt;span class="caps"&gt;SQL&lt;/span&gt;, makes it possible to write queries that are supported by multiple&amp;nbsp;databases.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; An index is a separate table containing key terms in the database (usually names, IDs, or some other key identifier), alongside the row numbers where they are found. An index greatly speeds up row lookups, but slows down the writing of new&amp;nbsp;rows.&lt;/p&gt;
&lt;p&gt;In this post, I will use &lt;span class="caps"&gt;CSV&lt;/span&gt; format to describe data, although if you have followed this season from the start you would be aware that in a database, this data would not be in text form. Nonetheless, at this point it would be represented&amp;nbsp;similarly.&lt;/p&gt;
&lt;p&gt;If we were constructing a database of blog posts from multiple authors of a blog, we might organise the post data like&amp;nbsp;this:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;id,author,title,content
1,knowitall,Why the world is falling apart,blahblahblah…
2,knowitall,Make the world great again,blahblahblah…
3,whatsgoingon,Why have things come to this,blahblahblah…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Later, when the authors start to add avatars and other information to their profile, the table might&amp;nbsp;grow:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;id,author,avatarURL,about,title,content
1,knowitall,http://avatars.me/knowitall.jpg,I know everything!,Why the world is falling apart,blahblahblah…
2,knowitall,http://avatars.me/knowitall.jpg,I know everything!,Make the world great again,blahblahblah…
3,whatsgoingon,http://avatars.me/whatsgoingon.jpg,Curious about the world,Why have things come to this,blahblahblah…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And when we start to add post&amp;nbsp;tags:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;id,author,avatarURL,about,title,content,tags
1,knowitall,http://avatars.me/knowitall.jpg,I know everything!,Why the world is falling apart,blahblahblah…,daily+apocalypse
2,knowitall,http://avatars.me/knowitall.jpg,I know everything!,Make the world great again,blahblahblah…,daily+ambition
3,whatsgoingon,http://avatars.me/whatsgoingon.jpg,Curious about the world,Why have things come to this,blahblahblah…,essay+apocalypse
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What problems are we going to run into with a table like&amp;nbsp;this?&lt;/p&gt;
&lt;h2&gt;Suboptimal table&amp;nbsp;forms&lt;/h2&gt;
&lt;p&gt;Even with constant-width tables and pre-determined data types, plus speeding up lookups with indexes, we will run into some issues as the number of posts&amp;nbsp;grows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Duplication of data&lt;/strong&gt;&lt;br /&gt;
   The avatar &lt;span class="caps"&gt;URL&lt;/span&gt; and About description (for the author) are repeated in each post. In a real blog, where these are often longer and you might store more information about each author (such as contact info), the amount of duplicated data is simply&amp;nbsp;wasteful.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Difficult data extraction&lt;/strong&gt;&lt;br /&gt;
   Notice that the tags are all jammed up into one column. How would we search for all posts with the &amp;#8220;apocalypse&amp;#8221; tag?&lt;br /&gt;
   We would have to retrieve each row one by one, split up the tag strings, and check if &amp;#8220;apocalypse&amp;#8221; is in there … that’s really&amp;nbsp;slow!&lt;/li&gt;
&lt;/ol&gt;
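&lt;p&gt;Here is roughly what that tag search looks like in Python, using the three sample rows from above (with the author columns left out to keep it short). The function name &lt;code&gt;posts_with_tag&lt;/code&gt; is just something I made up for this&amp;nbsp;sketch.&lt;/p&gt;

```python
rows = [
    {"id": 1, "title": "Why the world is falling apart", "tags": "daily+apocalypse"},
    {"id": 2, "title": "Make the world great again", "tags": "daily+ambition"},
    {"id": 3, "title": "Why have things come to this", "tags": "essay+apocalypse"},
]


def posts_with_tag(rows, wanted):
    """Scan every row, split its tag string, and keep the rows that match."""
    matches = []
    for row in rows:                   # every row must be visited
        tags = row["tags"].split("+")  # every tag string must be split
        if wanted in tags:
            matches.append(row["id"])
    return matches


print(posts_with_tag(rows, "apocalypse"))  # [1, 3]
```

&lt;p&gt;With three rows this is instant. With millions of rows, every search repeats all of that splitting and comparing from&amp;nbsp;scratch.&lt;/p&gt;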
&lt;h2&gt;Data normalisation: making data&amp;nbsp;atomic&lt;/h2&gt;
&lt;p&gt;When data is really complex, it makes sense to split it up and make it &lt;strong&gt;atomic&lt;/strong&gt;. When data is atomic, it means that it has been broken down to the lowest level of detail; typically this would mean individual records that avoid&amp;nbsp;duplication.&lt;/p&gt;
&lt;p&gt;For instance, we might have&amp;nbsp;an &lt;code&gt;Author&lt;/code&gt; table:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;ID,name,avatarURL,about
1,knowitall,http://avatars.me/knowitall.jpg,I know everything!
2,whatsgoingon,http://avatars.me/whatsgoingon.jpg,Curious about the world
…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And&amp;nbsp;a &lt;code&gt;Posts&lt;/code&gt; table:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;ID,authorID,title,content
1,1,Why the world is falling apart,blahblahblah…
2,1,Make the world great again,blahblahblah…
3,2,Why have things come to this,blahblahblah…
…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What to do with&amp;nbsp;the &lt;code&gt;Tags&lt;/code&gt;? Often a database designer will create&amp;nbsp;a &lt;code&gt;Tags&lt;/code&gt; table like&amp;nbsp;this:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;ID,tag
1,daily
2,apocalypse
3,ambition
4,essay
…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;and&amp;nbsp;a &lt;code&gt;PostTags&lt;/code&gt; table like&amp;nbsp;this:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;postID,tagID
1,1
1,2
2,1
2,3
3,2
3,4
…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This process of splitting up a complex data set into atomic, related data fields is known as &lt;strong&gt;data normalisation&lt;/strong&gt;. A data set that is not normalised will make it difficult to do lookups efficiently as new needs arise&amp;nbsp;later.&lt;/p&gt;
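&lt;p&gt;As a rough sketch, here are the four tables above as Python dictionaries and lists, with a hypothetical &lt;code&gt;assemble_post&lt;/code&gt; function that pulls one post back together. A real database stores and joins tables differently; this only shows how the IDs link the tables to each&amp;nbsp;other.&lt;/p&gt;

```python
authors = {
    1: {"name": "knowitall", "about": "I know everything!"},
    2: {"name": "whatsgoingon", "about": "Curious about the world"},
}
posts = {
    1: {"authorID": 1, "title": "Why the world is falling apart"},
    2: {"authorID": 1, "title": "Make the world great again"},
    3: {"authorID": 2, "title": "Why have things come to this"},
}
tags = {1: "daily", 2: "apocalypse", 3: "ambition", 4: "essay"}
post_tags = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4)]  # (postID, tagID)


def assemble_post(post_id):
    """Join the four tables back into one record for display."""
    post = posts[post_id]
    author = authors[post["authorID"]]
    tag_names = [tags[tag_id] for pid, tag_id in post_tags if pid == post_id]
    return {"title": post["title"], "author": author["name"], "tags": tag_names}


print(assemble_post(3))
```

&lt;p&gt;Each author’s details are stored once, and each post only carries a small number pointing to its&amp;nbsp;author.&lt;/p&gt;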
&lt;h2&gt;Advantages of data&amp;nbsp;normalisation&lt;/h2&gt;
&lt;p&gt;The first advantage you can see above is that retrieving author-only data, post-only data, etc is now much faster. We don’t have to pull up a whole lot of other unrelated information, incurring unnecessary data transfer&amp;nbsp;overhead.&lt;/p&gt;
&lt;p&gt;The second advantage you can see is that entities—authors, posts, tags—are now referred to by an &lt;span class="caps"&gt;ID&lt;/span&gt;. An &lt;span class="caps"&gt;ID&lt;/span&gt; is usually a number, which is represented more compactly in a computer in binary form as compared to a name or title in text form (Issue 79). This allows our program to carry out any processing on relationships between these entities much more quickly (e.g. “how many posts does this author have?” “How many posts have this tag?”), with lower data transfer&amp;nbsp;overhead.&lt;/p&gt;
&lt;h2&gt;Disadvantages: greater&amp;nbsp;complexity&lt;/h2&gt;
&lt;p&gt;The disadvantage is that pulling data together to render a blog post on a webpage now involves looking up three different tables and joining the data together. Each query is going to involve multiple lookups and joins, and is going to require many lines of code … if each programming language is going to come up with its own way of writing these lookups and joins, and each new database format also comes up with its own commands, very soon we would have a huge unmaintainable mess of syntax and commands to&amp;nbsp;learn!&lt;/p&gt;
&lt;p&gt;So programmers and database designers came together and came up with a &lt;em&gt;new language&lt;/em&gt; to do lookups and joins: Structured Query Language, or &lt;strong&gt;&lt;span class="caps"&gt;SQL&lt;/span&gt;&lt;/strong&gt;. This is the reason why today you can write &lt;span class="caps"&gt;SQL&lt;/span&gt; queries that will work, with minor differences between dialects, on a Microsoft &lt;span class="caps"&gt;SQL&lt;/span&gt; Server (&lt;span class="caps"&gt;MSSQL&lt;/span&gt;), PostgreSQL, MySQL, or MariaDB database; they all support &lt;span class="caps"&gt;SQL&lt;/span&gt;!&lt;/p&gt;
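&lt;p&gt;As a small preview, here is what one such query might look like, run through Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module (SQLite is another database that supports &lt;span class="caps"&gt;SQL&lt;/span&gt;). The table and column names follow the examples above; the exact column types are my own assumption, and other databases may need small changes to this&amp;nbsp;syntax.&lt;/p&gt;

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Posts (ID INTEGER PRIMARY KEY, authorID INTEGER, title TEXT);
    CREATE TABLE Tags (ID INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE PostTags (postID INTEGER, tagID INTEGER);
    INSERT INTO Posts VALUES
        (1, 1, 'Why the world is falling apart'),
        (2, 1, 'Make the world great again'),
        (3, 2, 'Why have things come to this');
    INSERT INTO Tags VALUES (1, 'daily'), (2, 'apocalypse'), (3, 'ambition'), (4, 'essay');
    INSERT INTO PostTags VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4);
""")

# One query looks up the tag, follows PostTags, and joins in the post titles.
query = """
    SELECT Posts.title
    FROM Posts
    JOIN PostTags ON PostTags.postID = Posts.ID
    JOIN Tags ON Tags.ID = PostTags.tagID
    WHERE Tags.tag = ?
    ORDER BY Posts.ID
"""
titles = [row[0] for row in db.execute(query, ("apocalypse",))]
print(titles)
```

&lt;p&gt;The query finds both posts tagged &amp;#8220;apocalypse&amp;#8221; without any tag strings having to be split&amp;nbsp;up.&lt;/p&gt;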
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Putting all data into one table results in unnecessary duplication of data. Making data atomic by splitting it up into multiple tables makes the data easier to work with, but requires multiple lookups and joins to get the required data. A standard database language, &lt;span class="caps"&gt;SQL&lt;/span&gt;, makes it possible to write queries that are supported by multiple&amp;nbsp;databases.&lt;/p&gt;
&lt;p&gt;I am jumping ahead of myself a little here; I’ll only talk about &lt;span class="caps"&gt;SQL&lt;/span&gt; a couple of issues later. Before I go into what &lt;span class="caps"&gt;SQL&lt;/span&gt; does, there are two features our program does not yet support: allowing multiple users to read and write data, and setting access permissions on&amp;nbsp;data.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 82: Multiplayer&amp;nbsp;databases&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The action can’t be completed because the file is open. Close the file and try&amp;nbsp;again.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;How often have you run into this error on&amp;nbsp;Windows?&lt;/p&gt;
&lt;p&gt;This makes it difficult for multiple users to work on a file at the same time. How do databases work around this? Find out in the next&amp;nbsp;issue!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 80: Indexing</title><link href="https://ngjunsiang.github.io/laymansguide/issue080.html" rel="alternate"></link><published>2020-07-18T08:00:00+08:00</published><updated>2020-07-18T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-07-18:/laymansguide/issue080.html</id><summary type="html">&lt;p&gt;An index is a separate table containing key terms in the database (usually names, IDs, or some other key identifier), alongside the row numbers where they are found. An index greatly speeds up row lookups, but slows down the writing of new&amp;nbsp;rows.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Comma-separated value (&lt;span class="caps"&gt;CSV&lt;/span&gt;) files store all data in text form. Within each row, a separator divides each chunk of data, and rows are separated by a line delimiter. To keep the data compact and read it more quickly, we have to decide beforehand what &lt;em&gt;data type&lt;/em&gt; each chunk should be, and how much space it is allowed to take up. Such a data form can no longer be opened in a simple text editor program like&amp;nbsp;Notepad.&lt;/p&gt;
&lt;p&gt;Last issue, we were still looking at how to speed up a text-based data storage solution. When we finished, we had a program that could skip the process of reading every single line and counting line breaks, but it could no longer be opened in Notepad. (That’s not a big loss really; Notepad can’t really handle text files larger than 0.5–1 &lt;span class="caps"&gt;GB&lt;/span&gt; anyway&amp;nbsp;…)&lt;/p&gt;
&lt;p&gt;No matter, so long as our program does not need to read in everything at a&amp;nbsp;go!&lt;/p&gt;
&lt;h2&gt;The search&amp;nbsp;problem&lt;/h2&gt;
&lt;p&gt;Right now, our data is still stored in a huge, continuous text block. Retrieving information from this block is easy if you already know the row number you want; our data program can quickly calculate the required row and jump to its starting&amp;nbsp;byte.&lt;/p&gt;
&lt;p&gt;Most if not all of the time, you would have no idea which row to retrieve, although you might know something to tell you what data to look for—a name, a date of birth, etc. You would need to &lt;strong&gt;search&lt;/strong&gt; for this row. And blocks are just not really optimised for such&amp;nbsp;operations.&lt;/p&gt;
&lt;p&gt;Nonetheless, this is not a new challenge. Paper books are often dense and long, especially textbooks. If you wanted to find a passage in there to quote, you would not be flipping through more than 800 pages and scanning paragraph by paragraph just to find it again! You would just flip to the &lt;strong&gt;index&lt;/strong&gt;, look up the term you were hoping to find, and simply check those page&amp;nbsp;references.&lt;/p&gt;
&lt;p&gt;Why not do that&amp;nbsp;here?&lt;/p&gt;
&lt;h2&gt;Indexes&lt;/h2&gt;
&lt;p&gt;To create an index, we would need to create another block of data. This data block would contain select pieces of data from our table (names, dates, or whatever else we expect to search by), along with the corresponding row number(s) where they are&amp;nbsp;found.&lt;/p&gt;
&lt;p&gt;Yes, that would take up more space, but it would speed up the search immensely, and that is often a worthy tradeoff. This index would be stored together with the table in our database. When the database is opened, this index would be read into memory, because accessing memory is much faster than accessing physical storage (&lt;a href="https://ngjunsiang.github.io/laymansguide/issue057.html"&gt;Issue 57&lt;/a&gt;). Our database would use it to look up the row number of the record containing the name we want, and retrieve it with the row number much more quickly than a row-by-row lookup&amp;nbsp;could.&lt;/p&gt;
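&lt;p&gt;A minimal sketch of such an index in Python might look like this. The names and dates are invented sample data, and a Python dictionary stands in for the index; real databases typically use more elaborate structures, but the idea of mapping a search term to row numbers is the&amp;nbsp;same.&lt;/p&gt;

```python
from collections import defaultdict

table = [
    ("Alice", "1990-04-12"),
    ("Bob", "1985-11-30"),
    ("Carol", "1992-07-08"),
    ("Bob", "2001-01-19"),
]

# Build the index once: each name maps to the row numbers where it appears.
name_index = defaultdict(list)
for row_number, (name, _birthday) in enumerate(table):
    name_index[name].append(row_number)


def find_by_name(name):
    """Look up row numbers in the index, then jump straight to those rows."""
    return [table[row_number] for row_number in name_index.get(name, [])]


def add_row(name, birthday):
    """Writing costs more now: the table and the index must both be updated."""
    table.append((name, birthday))
    name_index[name].append(len(table) - 1)


print(find_by_name("Bob"))  # [('Bob', '1985-11-30'), ('Bob', '2001-01-19')]
```

&lt;p&gt;Notice that &lt;code&gt;find_by_name&lt;/code&gt; never scans the table, while &lt;code&gt;add_row&lt;/code&gt; has to do two writes instead of&amp;nbsp;one.&lt;/p&gt;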
&lt;h2&gt;Tradeoffs&lt;/h2&gt;
&lt;p&gt;You can see how an index would greatly speed up searches, which do not modify the database. But what if we need to store&amp;nbsp;data?&lt;/p&gt;
&lt;p&gt;Each row we add to the database would necessitate updating the index. Instead of updating one table with our original database format, we now have two tables to update; that is definitely slower. You would not want to include an index for tables that are often written&amp;nbsp;to.&lt;/p&gt;
&lt;p&gt;Now that creates a conundrum for us: if I have customer records, should I add an index, or not? I would often have to search these records for a customer’s information, but I would also be adding to this information often. So it looks like indexes would greatly speed up the lookup, but slow down the adding of&amp;nbsp;records.&lt;/p&gt;
&lt;p&gt;I’ll examine this issue next week with &lt;strong&gt;data normalisation&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; An index is a separate table containing key terms in the database (usually names, IDs, or some other key identifier), alongside the row numbers where they are found. An index greatly speeds up row lookups, but slows down the writing of new&amp;nbsp;rows.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 81: Data&amp;nbsp;Normalisation&lt;/p&gt;
&lt;p&gt;In a spreadsheet, we sometimes love to split a page into multiple tables, with lovely table labels and such. With our database now optimised for fast access with constant-width rows and specific data types, we can no longer do&amp;nbsp;that.&lt;/p&gt;
&lt;p&gt;How should we organise our data then? More on this in the next&amp;nbsp;issue.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry><entry><title>Issue 79: A Base for Data</title><link href="https://ngjunsiang.github.io/laymansguide/issue079.html" rel="alternate"></link><published>2020-07-11T08:00:00+08:00</published><updated>2020-07-11T08:00:00+08:00</updated><author><name>J S Ng</name></author><id>tag:ngjunsiang.github.io,2020-07-11:/laymansguide/issue079.html</id><summary type="html">&lt;p&gt;Comma-separated value (&lt;span class="caps"&gt;CSV&lt;/span&gt;) files store all data in text form. Within each row, a separator divides each chunk of data, and rows are separated by a line delimiter. To keep the data compact and read it more quickly, we have to decide beforehand what &lt;em&gt;data type&lt;/em&gt; each chunk should be, and how much space it is allowed to take up. Such a data form can no longer be opened in a simple text editor program like&amp;nbsp;Notepad.&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Previously:&lt;/strong&gt; Modern webpages rely on many third-party resources for their functionality. Blocking access to some domains may cause these webpages to break and stop&amp;nbsp;working.&lt;/p&gt;
&lt;p&gt;We start a new season this issue, and now I circle back to the theme of data again. In Season 4, I laid out the broad categories of data, and showed how these basic data types get put together into more complex containers, such as video and documents. Let’s take it one step&amp;nbsp;further.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;base&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;noun&lt;/em&gt;&lt;br /&gt;
  the bottom of something considered as its support:&amp;nbsp;[foundation]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Whatever you do, I doubt your entire life can fit into a single document. Heck, even your household data, your work data … whatever it is, it is probably too complex and varied to fit into a single container. So many numbers and paragraphs, related in some ways, all interconnected … how to make sense of&amp;nbsp;it?&lt;/p&gt;
&lt;p&gt;We need a foundation for all this data, a base on which we can build our lives and our worlds. We need &lt;strong&gt;databases&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Let’s start from square 1: plain&amp;nbsp;text.&lt;/p&gt;
&lt;h2&gt;Text files and &lt;span class="caps"&gt;CSV&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Starting simple, let’s try to put our data into a text file. Inside the computer, a text file is just a long string of&amp;nbsp;text:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;This is the first line\nThis is the second line\nThis is the third line\n…
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That &lt;code&gt;\n&lt;/code&gt;? That is the newline character (&lt;a href="https://ngjunsiang.github.io/laymansguide/issue041.html"&gt;Issue 41&lt;/a&gt;), an unprintable code that tells the computer “the subsequent parts go on a new line”. Without it, everything would just be one long, continuous string. The newline character determines the limits of each line; it &lt;strong&gt;delimits&lt;/strong&gt; the&amp;nbsp;line. &lt;code&gt;\n&lt;/code&gt;, the newline character, is therefore a line &lt;strong&gt;delimiter&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Not all our data is just a single line like that. In spreadsheets, for example, we want multiple data types in the same row. How do we get the computer to understand that these data are not one big bundle, but separate pieces? We need a &lt;strong&gt;separator&lt;/strong&gt;. Commonly, commas are used to separate data, like&amp;nbsp;this:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="mf"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;bubbleSort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;1.734122735215351e-06&lt;/span&gt;
&lt;span class="mf"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;insertionSort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;1.0771698807366193e-06&lt;/span&gt;
&lt;span class="mf"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;mergeSort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;5.6086346949450675e-06&lt;/span&gt;
&lt;span class="mf"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;quickSort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;4.135697910096496e-06&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That’s some data I was compiling to compare different sorting methods. The first piece of data, a number, represents how many numbers I was sorting. The second piece of data is the sorting method, and the third piece of data is the time taken. All three are separated by&amp;nbsp;commas.&lt;/p&gt;
&lt;p&gt;This format is known as &lt;strong&gt;comma-separated values&lt;/strong&gt;, and referred to by the acronym &lt;strong&gt;&lt;span class="caps"&gt;CSV&lt;/span&gt;&lt;/strong&gt;.&lt;/p&gt;
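&lt;p&gt;Here is a small Python sketch of how a program could take that text apart, first by the line delimiter and then by the separator. It is deliberately naive: it assumes no field ever contains a comma, which real &lt;span class="caps"&gt;CSV&lt;/span&gt; files cannot promise (Python’s &lt;code&gt;csv&lt;/code&gt; module handles those&amp;nbsp;cases).&lt;/p&gt;

```python
text = (
    "5,bubbleSort,1.734122735215351e-06\n"
    "5,insertionSort,1.0771698807366193e-06\n"
    "5,mergeSort,5.6086346949450675e-06\n"
    "5,quickSort,4.135697910096496e-06\n"
)

records = []
for line in text.split("\n"):  # the line delimiter splits the text into rows
    if not line:
        continue  # skip the empty piece after the final "\n"
    count, method, seconds = line.split(",")  # the separator splits a row into pieces
    # Everything arrives as text; numbers have to be converted explicitly.
    records.append((int(count), method, float(seconds)))

print(records[1])
```

&lt;p&gt;Every piece of data starts out as text, which is why the program has to convert the first and third pieces into numbers&amp;nbsp;itself.&lt;/p&gt;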
&lt;h2&gt;Searching through data in &lt;span class="caps"&gt;CSV&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In &lt;a href="https://ngjunsiang.github.io/laymansguide/issue041.html"&gt;Issue 41&lt;/a&gt;, I mentioned that each character takes up a standard number of bytes (1 byte, in the case of the characters on your keyboard; anything outside of that, it’s complicated). That makes it easy for the computer to retrieve characters. First character, first byte. 100th character, 100th&amp;nbsp;byte.&lt;/p&gt;
&lt;p&gt;What about the 5th row? Which byte is&amp;nbsp;that?&lt;/p&gt;
&lt;p&gt;Now the computer has to start searching from byte 1 all the way, count the number of newlines&amp;nbsp;(&lt;code&gt;\n&lt;/code&gt;), and after the 4th newline it knows “this is the fifth line”. That works for a small amount of data, perhaps even for a household, but for businesses with thousands of customers and millions, even billions of lines of data, this is unworkable. What can we do about&amp;nbsp;this?&lt;/p&gt;
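&lt;p&gt;In Python, that search might be sketched like this. The function name &lt;code&gt;start_of_row&lt;/code&gt; and the sample text are made up for illustration; note that programs usually count byte positions from 0 rather than&amp;nbsp;1.&lt;/p&gt;

```python
def start_of_row(data, row_number):
    """Return the 0-based byte offset where the given row (1-based) begins."""
    newlines_seen = 0
    for offset, byte in enumerate(data):  # every byte before the row is examined
        if newlines_seen == row_number - 1:
            return offset
        if byte == ord("\n"):
            newlines_seen += 1
    raise ValueError("the file has fewer rows than requested")


data = b"row one\nrow two\nrow three\nrow four\nrow five\n"
offset = start_of_row(data, 5)
print(offset, data[offset:])
```

&lt;p&gt;The further down the row is, the more bytes have to be read just to find where it&amp;nbsp;starts.&lt;/p&gt;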
&lt;p&gt;If you recognise the themes that have been recurring so far, you probably know it subconsciously: we need &lt;em&gt;standardisation&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;If we could decide beforehand (a big &lt;em&gt;&lt;span class="caps"&gt;IF&lt;/span&gt;&lt;/em&gt;, but possible) how many data pieces each row should have, and the largest number of bytes each data piece will take up, things will be much easier. If each row has exactly 3 pieces of data, and each piece of data is allotted 8 bytes (64 bits), then each row takes up 3 × 8 = 24 bytes. The first 4 rows occupy bytes 1 to 96, so the 5th row starts from byte&amp;nbsp;97.&lt;/p&gt;
&lt;p&gt;This process is much faster for a computer. It does not need to read every single byte and count newlines anymore; it can just jump to the position of byte 97 and start reading from&amp;nbsp;there.&lt;/p&gt;
&lt;h2&gt;Reducing data&amp;nbsp;size&lt;/h2&gt;
&lt;p&gt;One more problem to resolve: 96 bytes for 4 rows is a lot of data! A chunk of data in text form, such as&amp;nbsp;“&lt;code&gt;$1,234.56&lt;/code&gt;”, is 9 characters, which means 9 bytes. If we could somehow &lt;em&gt;standardise&lt;/em&gt; this data type (say, let’s just call it &lt;em&gt;currency&lt;/em&gt;), and reduce it to just the number&amp;nbsp;form &lt;code&gt;1234.56&lt;/code&gt;, we could store it in just 4 bytes (for example, as 123456 cents in a 32-bit integer)! That’s far fewer bytes to retrieve, store, and&amp;nbsp;transfer.&lt;/p&gt;
&lt;p&gt;The tradeoff is that now we can no longer just open that file in Notepad to peek at the data. We would need a program&amp;nbsp;that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;remembers how many bytes each row and piece of data should take&amp;nbsp;up,&lt;/li&gt;
&lt;li&gt;remembers what type each piece of data&amp;nbsp;is.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This program will figure out where to start reading the file from, retrieve the data we want, and return it. Compared to managing data in &lt;span class="caps"&gt;CSV&lt;/span&gt;, the data will be more compact, and the program will be faster. And we would have taken one step away from plain text files, towards a full&amp;nbsp;database.&lt;/p&gt;
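&lt;p&gt;Here is a minimal sketch of such a program, using Python’s built-in &lt;code&gt;struct&lt;/code&gt; module to pack each row into a fixed number of bytes. The row layout, the shortened method names, and the timing values are all assumptions I made up for illustration; an in-memory file stands in for a real file on&amp;nbsp;disk.&lt;/p&gt;

```python
import io
import struct

# Hypothetical row layout: three pieces of data, 8 bytes each.
# "=" means standard sizes with no padding; q = 8-byte integer,
# 8s = 8 bytes of text, d = 8-byte floating-point number.
ROW_FORMAT = "=q8sd"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 24 bytes per row

# Illustrative sample values only.
rows = [
    (5, b"bubble", 1.7e-06),
    (5, b"insert", 1.1e-06),
    (5, b"merge", 5.6e-06),
    (5, b"quick", 4.1e-06),
    (10, b"bubble", 6.9e-06),
]

# Write every row at the same fixed width.
storage = io.BytesIO()
for row in rows:
    storage.write(struct.pack(ROW_FORMAT, *row))


def read_row(file, row_number):
    """Jump straight to a row (1-based) without reading the rows before it."""
    file.seek((row_number - 1) * ROW_SIZE)
    count, method, seconds = struct.unpack(ROW_FORMAT, file.read(ROW_SIZE))
    # Short text is padded with zero bytes to fill its 8-byte slot; strip them.
    return count, method.rstrip(b"\x00").decode("ascii"), seconds


print(read_row(storage, 5))  # (10, 'bubble', 6.9e-06)
```

&lt;p&gt;The program has to remember the row layout (&lt;code&gt;ROW_FORMAT&lt;/code&gt;) to make sense of the bytes. Without it, the stored data is unreadable, which is exactly the tradeoff described&amp;nbsp;above.&lt;/p&gt;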
&lt;p&gt;&lt;strong&gt;Issue summary:&lt;/strong&gt; Comma-separated value (&lt;span class="caps"&gt;CSV&lt;/span&gt;) files store all data in text form. Within each row, a separator divides each chunk of data, and rows are separated by a line delimiter. To keep the data compact and read it more quickly, we have to decide beforehand what &lt;em&gt;data type&lt;/em&gt; each chunk should be, and how much space it is allowed to take up. Such a data form can no longer be opened in a simple text editor program like&amp;nbsp;Notepad.&lt;/p&gt;
&lt;p&gt;For tech junkies and programmers, it is easy to get into the blind pursuit of performance. I wanted this issue to start right, by demonstrating the tradeoffs involved in increasing performance. We started from a data format so simple it can be opened in Notepad and read by a human, to a format that needs a program to&amp;nbsp;read.&lt;/p&gt;
&lt;p&gt;At least this program is simple to write; I could do it in less than fifty lines of Python code. Let’s look at more tradeoffs in the next&amp;nbsp;issue.&lt;/p&gt;
&lt;h2&gt;What I’ll be covering&amp;nbsp;next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next issue:&lt;/strong&gt; [&lt;span class="caps"&gt;LMG&lt;/span&gt; S7] Issue 80: From Blocks to&amp;nbsp;Trees&lt;/p&gt;
&lt;p&gt;We are so used to seeing data in a single &lt;strong&gt;blob&lt;/strong&gt;—as a dense spreadsheet table, as densely packed lines of text, etc—that it is difficult to see it as a loosely organised tree&amp;nbsp;structure.&lt;/p&gt;
&lt;p&gt;But in our daily lives, that is much more commonly the way data is organised. Data in organisations is never all put in a single document or place; it is loosely spread across departments, each of which manage a portion of it, and these departments send information updates to each other to update their separate&amp;nbsp;sections.&lt;/p&gt;
&lt;p&gt;In the next issue, I’ll apply this idea to the way computers manage&amp;nbsp;information.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sometime in the future:&lt;/strong&gt; What&amp;nbsp;is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;booting up? [Issue&amp;nbsp;15]&lt;/li&gt;
&lt;li&gt;&lt;span class="caps"&gt;XSS&lt;/span&gt;? [Issue&amp;nbsp;8]&lt;/li&gt;
&lt;li&gt;a good reason developers write code and give it away for free online? [Issue&amp;nbsp;21]&lt;/li&gt;
&lt;li&gt;firmware? [Issue&amp;nbsp;34]&lt;/li&gt;
&lt;li&gt;OpenType? And what are fonts anyway? [Issue&amp;nbsp;42]&lt;/li&gt;
&lt;li&gt;What is involved in installing a piece of software? [Issue&amp;nbsp;48]&lt;/li&gt;
&lt;li&gt;How do apps know where a file starts and ends? [Issue&amp;nbsp;49]&lt;/li&gt;
&lt;li&gt;What is a password hash? [Issue&amp;nbsp;63]&lt;/li&gt;
&lt;/ul&gt;</content><category term="Season 07"></category></entry></feed>