From Clocks to Categories

loopspace

2024-04-29


In the summer of 2020 I gave a talk in the Talk Math With Your Friends online series with the title From Clocks to Categories. The talk had two strands to it:

1. There is Category Theory in school level Mathematics.

2. So what?

In this post I want to go into a bit more detail on the first of these and explain the Mathematics behind the things that I mentioned in my talk. If you know about category theory already, you can think of this as a source of examples for when you are asked to explain it to someone. If you don't know about category theory, this will hopefully give you a flavour of it with non-scary examples.

One thing that needs saying at the outset is that my goal is to find examples where the categorical viewpoint is already there. Since category theory is a foundational theory of Mathematics, I could recast all of school Mathematics in a categorical framework. But most of the time that involves bringing in additional structure. For example, if I think about the field structure of the real numbers then I can consider $ℝ$ as an object in the category of fields. However, without knowledge of other fields this isn't really school Mathematics anymore. So I want to find aspects of category theory that are already present and just need a little changing of perspective or perhaps a small additional idea to reveal them.

# 1 Why Category Theory?

Let me briefly explain why I'm interested in finding category theory in school Mathematics. Many years ago, back when I was an academic, a question was asked on MathOverflow about when to teach category theory. I gave an answer, and as part of that answer I wrote:

I believe that category theory is an excellent way to understand and express mathematical concepts. I find in my own work that, time and time again, when I express my ideas using categorical language then it makes them clearer both to me and to others. Believing this, as I do, why on earth would I want to deprive my students of the same benefits?

As I said, this was when I was an academic and giving lectures to undergraduates. When I switched to teaching in a school I did think that it would be a bit of a stretch to include category theory in my day-to-day lessons. But then one day I was trying to explain about conditional probability and I drew on a board the diagram in Figure 1. It felt like a diagram straight out of a textbook on category theory.

Before I go further, I should explain that I do not use the terminology of category theory when teaching. Rather it is that I find myself, time and time again, using the language of category theory.

# 2 Category Theory: the Basics

I've described category theory at times as being Ubuntu Mathematics: a thing is what it is because of how it relates to other things. So we don't examine a thing in isolation but we examine it in a context of similar things.

Thus the aspect of category theory that I find foundational is that it puts the focus on the relationships between things rather than the things themselves. So in all of this, the goal is to lift the focus of attention ever higher in the hierarchy.

With that in mind, here are the basic definitions of terms that I'll be using in this article. I'll separate out the definitions a bit so that I can provide a running commentary, thus the early definitions are not complete.

Definition 1

A category consists of objects and arrows.

An arrow has a source (or domain) and target (or codomain), both of which are objects in the category.

A more common name for the arrows is morphisms, but as my intention is that this makes sense to someone who hasn't heard of category theory, I'll use the more normal-sounding "arrow" instead.

We think of the arrow as going from its source to its target, but it actually isn't necessarily the case that an arrow fits that mould quite so neatly. Actual examples will come later, but there are various pictures that have proven to be useful to have in mind. The "arrow" terminology comes in useful with pictures like Figure 2.

"Objects as blobs" is quite a useful point of view in category theory: we try not to look too closely at the objects themselves.

If you know a bit about the history of category theory, you might be expecting me to talk about classes and sets here. I shan't. If you know it, great. If you don't know it, you probably aren't bothered by it and I don't want to try to cover everything here.

On with the definitions.

Definition 2

Every object has a special arrow called the identity arrow, which has source and target that object. For an object $X$, we will write this as ${1}_{X}$ or ${I}_{X}$ (particularly if $1$ has other meanings in context).

In terms of pictures, this might look like Figure 3.

The one piece of structure that categories always have is that one can compose arrows providing they line up correctly.

Definition 3

If the target of one arrow is the source of a second then these two arrows can be composed (in the obvious order) to produce a third.

Figure 4 is the picture for this.

Notice that I'm writing composition in "inside-out" order. This is a convention, and others will write them in "left-to-right" order. I once wrote a paper in which I made it possible to easily switch between these just in case the referee asked for it (they didn't).

Composition satisfies two basic rules:

1. Associativity: $\gamma \circ \left(\beta \circ \alpha \right)=\left(\gamma \circ \beta \right)\circ \alpha$

2. Identities are identities: $\alpha \circ {1}_{X}={1}_{Y}\circ \alpha =\alpha$

Notice that in that second one, the identities are different since one is at the source of $\alpha$ and the other at its target. In those rules, I've tacitly assumed that the various compositions make sense, for example that the target of $\beta$ is the source of $\gamma$.
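For readers who like to see definitions run, here is a minimal sketch of Definitions 1 to 3 in Python. It is only an illustration under my own assumptions: the class `Arrow`, the helpers `identity` and `compose`, and the objects named `"X"`, `"Y"`, `"Z"`, `"W"` are all hypothetical, and the arrows happen to be modelled as functions on integers, which is just one kind of arrow among many.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Arrow:
    """An arrow with a named source and target (illustrative model only)."""
    source: str
    target: str
    apply: Callable[[int], int]  # assumption: arrows act on integers


def identity(obj: str) -> Arrow:
    """The identity arrow on the object `obj`."""
    return Arrow(obj, obj, lambda x: x)


def compose(second: Arrow, first: Arrow) -> Arrow:
    """Return `second ∘ first`, which only exists when the arrows line up."""
    if first.target != second.source:
        raise ValueError("target of the first arrow must be the source of the second")
    return Arrow(first.source, second.target,
                 lambda x: second.apply(first.apply(x)))


alpha = Arrow("X", "Y", lambda x: x + 1)
beta = Arrow("Y", "Z", lambda x: 2 * x)
gamma = Arrow("Z", "W", lambda x: x - 3)

# Associativity: both bracketings give the same arrow from X to W.
left = compose(gamma, compose(beta, alpha))
right = compose(compose(gamma, beta), alpha)

# Identities: note that the two identity arrows live on different objects.
alpha_after_id = compose(alpha, identity("X"))
id_after_alpha = compose(identity("Y"), alpha)
```

Trying `compose(alpha, beta)` raises an error, because the target of `beta` is not the source of `alpha`. That is the "providing they line up correctly" condition in action.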

# 3 Examples of Categories in School Mathematics

Here, I'll gather the examples of categories from school Mathematics that I'll use in the rest.

## 3.1 Clock Arithmetic

This was my first example in the Talk Math With Your Friends talk. There are actually several categories here depending on how finely you divide time.

Definition 4

A clock category has:

• Objects: times

• Arrows: durations

The one I gave in the talk was based on hourly divisions. More precisely, the objects were hours on a clock. So we can draw the objects of this category very neatly as a clock, as in Figure 5.

The arrows are the durations, which are a little harder to draw if we try to draw all of them. Rather, let us draw a representative few in Figure 6.

I feel that this is a simple enough example that I don't need to go into great detail as to proving that this makes a category. Rather, I'll list some half-baked thoughts on what the categorical viewpoint brings to this example.

1. It separates out the related but often conflated notions of time and duration.

Times are measures of position. By focussing on hours on the clock, we can't even talk about "earlier" or "later" for such times since we don't know if $1$ o'clock is earlier or later than $11$ o'clock. There are situations that make either one true. In particular, we can't add or subtract times and we certainly can't multiply them.

Durations are more algebraic. We can add and subtract durations. Although we can't multiply durations, we can consider multiples of the same duration (so, there's a secret $ℤ$–action here).

2. It emphasises the distinction between abstract durations and concrete durations.

By a concrete duration, I mean something like "$3$ hours after $4$ o'clock". This is a specific arrow. An abstract duration would be just "$3$ hours". So it's a duration looking for a source.¹

¹ That this makes sense is a special property of this category, not all categories have the ability to detach their arrows in this fashion.

This could be used to break down the passage from the concrete concepts of arithmetic to the abstract into smaller pieces.

3. It brings out the contextual element of arithmetic.

What is $5+9$? If we regard these as concrete durations, we have to expand this to "What do we get if we wait $5$ hours after $1$ o'clock and then $9$ hours after that?" But even that is ambiguous! There are two possible answers depending on whether what we want is the time after that wait or the total duration. If all we care about is the destination, the answer is $3$ o'clock. If all we care about is the duration, the answer is $14$ hours. But both of those are partial answers. The full answer is that we get the duration (arrow) of $14$ hours after $1$ o'clock, and the end time (target) of this arrow is $3$ o'clock.

4. It means we no longer have to make that weird statement when introducing modular arithmetic about how we regard $12$ o'clock as $0$. We don't. It is a duration of $12$ hours that has the same target as a duration of $0$ hours.
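The points in this list can be sketched in a few lines of Python. This is a rough model under my own assumptions rather than anything from the talk: the class name `ClockArrow`, the helper `then`, and the choice to label hours from $1$ to $12$ are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockArrow:
    """A concrete duration: `hours` hours starting at `source` o'clock."""
    source: int  # an hour on the clock face, from 1 to 12
    hours: int   # the duration itself, deliberately not reduced modulo 12

    @property
    def target(self) -> int:
        """The hour on the clock face at which the arrow ends."""
        return (self.source + self.hours - 1) % 12 + 1


def then(first: ClockArrow, hours: int) -> ClockArrow:
    """Wait a further `hours` hours after `first`, giving the composite arrow."""
    second = ClockArrow(first.target, hours)
    return ClockArrow(first.source, first.hours + second.hours)


# "Wait 5 hours after 1 o'clock and then 9 hours after that."
total = then(ClockArrow(source=1, hours=5), 9)
print(total.hours, total.target)  # 14 3

# A duration of 12 hours and a duration of 0 hours are different arrows
# that happen to share a target.
twelve = ClockArrow(source=4, hours=12)
zero = ClockArrow(source=4, hours=0)
```

Keeping `hours` unreduced is the design choice that matters here: the arrow remembers the full duration of $14$ hours, while the target only remembers $3$ o'clock.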

## 3.2 Sequences

There are many categories of sequences. The one that I'm going to use later on is that of arithmetic or linear sequences.

Definition 5

The category of Arithmetic Sequences has:

• Objects: arithmetic sequences

• Arrows: an arrow from one sequence to another, say from $\left({s}_{n}\right)$ to $\left({t}_{n}\right)$, is a pair of numbers $\left(a,b\right)$ with the property that for each $n\in ℕ$, ${t}_{n}=a{s}_{n}+b$.

As with the clock category, the arrows have a description that is separate to their source and target so it is tempting to talk about a pair $\left(a,b\right)$ as a generic arrow. One thing that this doesn't capture is that the behaviour of these arrows is subtly different on constant sequences versus non-constant ones. There is a single arrow from any non-constant sequence to any other given sequence. There are no arrows from constant sequences to non-constant sequences. There are many arrows between two constant sequences.

I said at the outset that I wanted to reveal structure that was already in place rather than impose new structure on an existing framework. When I started thinking about this category then I felt that the arrows were an extra and that in school we tend to think about sequences individually. Then I thought a bit more and realised that while we tend not to think about arrows between arbitrary sequences, we do think about arrows from a very specific sequence: the counting sequence.

A common question for an arithmetic sequence, say $3,5,7,\dots$, is to find a formula for its $n$th term. One strategy for finding this is as follows: start by finding the common difference (in this case, $2$); consider the sequence formed by taking the multiples of that common difference (in this case, $2,4,6,\dots$); this differs from the original sequence by a constant shift (in this case, $+1$). Put these together to get the $n$th term rule (here, $2n+1$). It is common to put these together into a table, as in Table 1.

| $n$ | $1$ | $2$ | $3$ |
|---|---|---|---|
| $2n$ | $2$ | $4$ | $6$ |
| $+1$ | $1$ | $1$ | $1$ |
| $2n+1$ | $3$ | $5$ | $7$ |

Table 1: Finding the $n$th term rule

What we are doing here is constructing an arrow from the sequence $1,2,3,\dots$ to our target sequence. Not only that, but we're doing it in two stages so we've constructed it as the composition of, in this case, $\left(2,0\right)$ followed by $\left(1,1\right)$, as shown in Figure 7.
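Here is a small Python sketch of that two-stage construction. It assumes the convention of Definition 5, so that a pair $\left(a,b\right)$ sends each term $s$ to $as+b$; under that convention the doubling step is the pair $\left(2,0\right)$ and the shift by $+1$ is the pair $\left(1,1\right)$. The function names `compose` and `apply_arrow` are my own.

```python
from typing import List, Tuple

Arrow = Tuple[int, int]  # (a, b) stands for "multiply by a, then add b"


def compose(second: Arrow, first: Arrow) -> Arrow:
    """The single pair equivalent to doing `first` and then `second`."""
    c, d = second
    a, b = first
    # c * (a*s + b) + d == (c*a) * s + (c*b + d)
    return (c * a, c * b + d)


def apply_arrow(arrow: Arrow, terms: List[int]) -> List[int]:
    """Apply the pair (a, b) to every term of (the start of) a sequence."""
    a, b = arrow
    return [a * s + b for s in terms]


counting = [1, 2, 3]
double = (2, 0)   # the row "2n" of Table 1
shift = (1, 1)    # the row "+1" of Table 1

nth_term = compose(shift, double)
print(nth_term)                         # (2, 1)
print(apply_arrow(nth_term, counting))  # [3, 5, 7]
```

The composite pair $\left(2,1\right)$ is exactly the $n$th term rule $2n+1$.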

## 3.3 Shapes

Shapes are used all over school Mathematics, and using transformations makes it into an easy category.

Definition 6

The category of Shapes has:

• Objects: subsets of ${ℝ}^{2}$

• Arrows: transformations

Within this category, we can define various subcategories. A subcategory is quite a loose concept: we can throw out both objects and arrows. The main thing that we can't change is how to compose arrows.

Subcategories in common use are:

1. Restrict the objects:

1. Polygons

2. Curvy-linear shapes (imagine polygons but with arcs as well as straight lines)

4. Triangles

2. Restrict the arrows:

1. Affine transformations

2. Shape-preserving (so, conformal affine transformations)

3. Shape and size-preserving (so, orthogonal affine)

Particularly when the arrows are restricted, we are concerned with classifying shapes that are related via an arrow. When the arrows are shape-preserving, we call these similar shapes. When the arrows are shape- and size-preserving, we call these congruent shapes.

The time when I find myself working most categorically here is when looking at similar shapes. I will often draw the two shapes with an arrow from one to the other labelled by the scale factor. This is to emphasise that the scale factor has a direction – it has a source shape (object) and a target shape (object). If we reverse the arrow, we have to take the inverse of the arrow and the scale factor changes accordingly: a scale factor of $k$ becomes $\frac{1}{k}$.

## 3.4 Functions

Often, functions are presented as a key example of arrows in various categories. In this case, I want to present functions as the objects in a category.

Definition 7

The category of functions has:

• Objects: functions $ℝ\to ℝ$

• Arrows: pre and post composition by affine functions

When we think of an affine function, say $x↦ax+b$, as an arrow then we need to include the information as to whether we are pre-composing or post-composing.

## 3.5 Numbers

There are lots of categories that involve numbers. The numbers can be objects (as here) or arrows (as in the clock category). With the numbers as objects, we can consider several different types of arrow which indicate what aspect of numbers we are considering.

Definition 8

Shifts:

• Objects: some type of numbers ($ℝ$, $ℚ$, $ℤ$, $ℕ$)

• Arrows: an arrow from $a$ to $b$ is labelled by $c$ (in the same number system) where $a+c=b$

Definition 9

Scales or divisibility:

• Objects: some type of numbers ($ℝ$, $ℚ$, $ℤ$, $ℕ$)

• Arrows: an arrow from $a$ to $b$ is labelled by $c$ (in the same number system) where $ca=b$

Definition 10

Order:

• Objects: some type of numbers ($ℝ$, $ℚ$, $ℤ$, $ℕ$)

• Arrows: there is an arrow from $a$ to $b$ if $a\le b$

In this case, the arrow is unique if it exists.

The middle one is particularly interesting when one takes natural numbers because then there is an arrow from $a$ to $b$ if $a$ divides $b$, so this is a natural place to discuss multiples, factors, and all matters relating to divisibility. This is also a situation where it is common to focus on the existence of an arrow separately to the arrow itself in that we often think about when $a$ divides $b$ without concerning ourselves as to what the multiplier is.

In all of the above categories of numbers, for almost all numbers $a$, an arrow $a\to b$ is unique if it exists (the usual exception being $0$). We will also need the following category, in which this is not true.

Definition 11

Arithmetic:

• Objects: some type of numbers ($ℝ$, $ℚ$, $ℤ$, $ℕ$)

• Arrows: an arrow from $a$ to $b$ is labelled by a pair $\left(c,d\right)$ where $b=ca+d$
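To see concretely why arrows in this last category are not unique, here is a short Python sketch that lists a few arrows between the same two numbers. The function name `arrows` and the choice of $2$ and $7$ as source and target are just for illustration.

```python
from typing import Iterable, List, Tuple


def arrows(a: int, b: int, multipliers: Iterable[int]) -> List[Tuple[int, int]]:
    """All pairs (c, d) with b == c*a + d, for c drawn from `multipliers`."""
    return [(c, b - c * a) for c in multipliers]


# Several different arrows from 2 to 7, all with natural number labels.
pairs = arrows(2, 7, range(0, 4))
print(pairs)  # [(0, 7), (1, 5), (2, 3), (3, 1)]
```

Each choice of multiplier $c$ forces the shift $d$, so there are as many arrows from $2$ to $7$ as there are allowed multipliers.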

# 4 Functors and Natural Transformations

If the aim of category theory is to shift the focus to the relationships between things, then it seems obvious that we should look at the relationships between categories themselves. Such things are called functors. These come in two types: covariant and contravariant. I'll explain these in Section 4.2.

I don't want to make this heavy on unnecessary definitions so I'll not give a formal definition of a functor. In short, it takes objects and arrows in one category to objects and arrows in a second category and "plays nicely" with the structure of each.

Now that we've introduced functors, the philosophy of category theory says that we should now think about how different functors are related. In other words, can we think of functors as objects in some new category and if so, what are the arrows? Unsurprisingly, the answer is "we can" and the arrows are called "natural transformations". I'll show these by example.

## 4.1 Sequences

A very simple source of examples of functors (and natural transformations) comes from considering sequences. Given a sequence, say $\left({s}_{n}\right)$, we can consider a particular term in that sequence, say ${s}_{4}$. Sending a sequence to a term like this defines a functor from a category of sequences to a suitable category of numbers. All we have to do is make sure that our category of numbers has arrows to match those from our sequences. So if we're starting with our category of arithmetic sequences then we need to ensure that the arrows in our category of numbers include ones of the form $\left(a,b\right):x↦ax+b$, as they do in our arithmetic category of numbers.

So we have functors like:

 $\begin{array}{rl}{🚀}_{4}:\left({s}_{n}\right)& ↦{s}_{4}\\ {🚀}_{5}:\left({s}_{n}\right)& ↦{s}_{5}\end{array}$

(While preparing the talk, I played around with getting emoji characters in the slides. They seemed to go down much better than the more usual mathfrak style in common use for functors.)

One thing that the categorical viewpoint emphasises here is the distinction between sequences and numbers, since they are in different categories. And in making that distinction then "taking the $4$th term" perhaps becomes more of a deliberate act.

I've already mentioned the $n$th term rule for a sequence as being an arrow to it from the "counting" sequence. Let's take a concrete example, the sequence $10$, $13$, $16$, $\dots$. This has $n$th term rule $3n+7$, so categorically speaking we have:

 $\left(1,2,3,\dots \right)\stackrel{3n+7}{\to }\left(10,13,16,\dots \right)$

Now let's apply some term functors to these sequences, and the arrow between them. Taking the $4$th and $5$th term functors we get two shadows of this arrow in a category of numbers:

 $\begin{array}{c}4\stackrel{3x+7}{\to }19\\ 5\stackrel{3x+7}{\to }22\end{array}$

A natural transformation means that we need an arrow between these two lines. What this turns out to mean is that we need arrows between the objects which make it a square, as in Figure 8.

The key properties of this square are as follows:

1. Each vertical edge depends only on that sequence and not on anything in the rest of the diagram.

2. Whichever route you take from top left to bottom right results in the same overall arrow.

Note that this means more than just you end up in the same place. The arrow obtained by composing the sides involved has to be the same as well.

It turns out that the right way to fill in the vertical arrows is to use the term to term rules for each sequence, as shown in Figure 9.

So term to term rules are examples of natural transformations (incidentally, there was nothing special about arithmetic sequences here).
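The second key property of the square can be checked directly. The Python sketch below assumes the pair notation of Definition 5 and reads the term to term rules as "add $1$" for the counting sequence and "add $3$" for $10,13,16,\dots$, so the vertical arrows are the pairs $\left(1,1\right)$ and $\left(1,3\right)$. The function names are my own.

```python
from typing import Tuple

Arrow = Tuple[int, int]  # (a, b) stands for x -> a*x + b


def compose(second: Arrow, first: Arrow) -> Arrow:
    """The single pair equivalent to doing `first` and then `second`."""
    c, d = second
    a, b = first
    return (c * a, c * b + d)


def apply_arrow(arrow: Arrow, x: int) -> int:
    a, b = arrow
    return a * x + b


nth_term = (3, 7)        # horizontal edges: the rule 3x + 7
step_counting = (1, 1)   # left edge: 4 -> 5 in the counting sequence
step_target = (1, 3)     # right edge: 19 -> 22 in the target sequence

across_then_down = compose(step_target, nth_term)
down_then_across = compose(nth_term, step_counting)
print(across_then_down, down_then_across)  # (3, 10) (3, 10)
print(apply_arrow(across_then_down, 4))    # 22
```

Both routes give the same pair, not merely the same end point, which is the stronger statement that the square asks for.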

## 4.2 Variance

Graphs are another example of functors and these illustrate the important concept of variance. Since arrows have a direction, it is sometimes the case that one wants to think of an arrow going in the opposite direction to its natural one. For example, with the category of natural numbers and arrows for divisibility, maybe we'd rather say that there is an arrow $a\to b$ if $a$ is a multiple of $b$. This is exactly the same as the original category except that we've reversed all the arrows. For such a simple thing, it has a remarkably broad impact.

The concept of variance comes in when we have a functor that ought to be defined by reversing the arrows in one of its categories (source or target) but we don't want to acknowledge that. In this case we say that the functor is contravariant. Ordinary functors are said to be covariant. Taking graphs of functions is a nice example of how these two behave.

We start with a category of functions on the reals. Our arrows will be composition by functions of the form $ax+b$ with $a$ non-zero, but we have a choice as to whether to take pre-composition or post-composition so actually we have two categories, one for each option.

The functor assigns to a function its graph so our target category is a category of shapes in the plane. That is, for a function $f:ℝ\to ℝ$ we end up with the shape:

 ${G}_{f}:=\left\{\left(x,y\right):y=f\left(x\right)\right\}$

What happens with arrows depends on which category we started with. Let's start with post-composition, so we have $g\left(x\right)=af\left(x\right)+b$ and $\left(a,b\right):f\to g$. Then ${G}_{g}$ is related to ${G}_{f}$ by scaling by $a$ and then translation by $b$ in the $y$–direction.

With pre-composition we have $h\left(x\right)=f\left(ax+b\right)$ and $\left(a,b\right):f\to h$. Again, we have that ${G}_{h}$ and ${G}_{f}$ are related by scaling by $a$ and translation by $b$ – this time in the $x$–direction – but it is actually ${G}_{f}$ that is obtained from ${G}_{h}$ by scaling by $a$ and then translation by $b$.

This is usually expressed as ${G}_{h}$ being obtained from ${G}_{f}$ by translation by $-b$ and then scaling by $\frac{1}{a}$. Let me give a suggestion as to why the point of view in the previous paragraph might be a useful one to introduce.

The graph of $h$ consists of all those points $\left(x,y\right)$ such that $y=h\left(x\right)$. Similarly, the graph of $f$ consists of all those points $\left(x,y\right)$ such that $y=f\left(x\right)$. Let us start with a point $\left(x,y\right)\in {G}_{h}$. Then, by definition, $y=h\left(x\right)$. Now let us scale that point by $a$ and translate it by $b$, both in the $x$–direction. This results in the point $\left(ax+b,y\right)$. I claim that this is a point on ${G}_{f}$. To show that, I need to convince you that $y=f\left(ax+b\right)$. But this is immediate from the fact that $f\left(ax+b\right)=h\left(x\right)$.
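The argument above is easy to test numerically. In the following Python sketch the function $f\left(x\right)={x}^{2}$ and the values $a=2$, $b=3$ are arbitrary choices of mine, picked only so that the arithmetic is exact.

```python
def f(x: float) -> float:
    return x * x


a, b = 2.0, 3.0


def h(x: float) -> float:
    """Pre-composition of f by the affine function x -> a*x + b."""
    return f(a * x + b)


xs = [-2.0, -1.0, 0.0, 0.5, 1.0]

# Sample points on the graph of h.
graph_h = [(x, h(x)) for x in xs]

# Scale by a and translate by b, both in the x-direction.
moved = [(a * x + b, y) for (x, y) in graph_h]

# Every moved point satisfies y == f(u), so it lies on the graph of f.
on_graph_f = all(y == f(u) for (u, y) in moved)
print(on_graph_f)  # True
```

The transformation takes points of ${G}_{h}$ to points of ${G}_{f}$, which is the opposite direction to the arrow $\left(a,b\right):f\to h$. That reversal is what makes the graph functor contravariant for pre-composition.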

I'd just like to note that there are other ways of looking at the connections between functions and graphs. Just as a line $y=\frac{2}{3}x-4$ can be rewritten as $2x-3y=12$ so the equation $y=f\left(x\right)$ can be rewritten as $y-f\left(x\right)=0$ and we can think of it rather as looking at the zero set of a function $F:{ℝ}^{2}\to ℝ$. This can make the apparent disparity between $x$ and $y$ disappear since it treats them the same: instead of pre- and post-composing $f$ we simply pre-compose by suitable functions in both $x$ and $y$. For example, if we pre-compose by an invertible affine transformation, say $T:{ℝ}^{2}\to {ℝ}^{2}$, then the zero set of $F$ is obtained by applying $T$ to the zero set of $F\circ T$. This is the same argument as above, in that if $\left[\begin{array}{c}x\\ y\end{array}\right]$ is such that $\left(F\circ T\right)\left[\begin{array}{c}x\\ y\end{array}\right]=0$ then if we set $\left[\begin{array}{c}u\\ v\end{array}\right]=T\left[\begin{array}{c}x\\ y\end{array}\right]$, we have:

 $F\left[\begin{array}{c}u\\ v\end{array}\right]=F\left(T\left[\begin{array}{c}x\\ y\end{array}\right]\right)=\left(F\circ T\right)\left[\begin{array}{c}x\\ y\end{array}\right]=0$

and so $\left[\begin{array}{c}u\\ v\end{array}\right]$ is in the zero set of $F$.

One of the features of category theory is the plethora of adjoint functors. Sometimes it seems as though any functor worth studying is part of a pair of adjoint functors. Many, many years ago I asked on MathOverflow for an intuitive view of adjoint functors. I think that the best replies had a sense of partial inverse, or "nearest" solution to an unsolvable problem.

Perhaps surprisingly – or perhaps not if you have been keeping up so far – we can find adjoint functor pairs at a very early stage of school mathematics. Our category in this case is simply $ℕ$ with an arrow $a\to b$ if $a$ divides $b$. There is just one such arrow if it exists.

For purposes of illustration I need to pick a number and I'm going to use $12$. There is nothing special about this number and everything I say would work equally well with any number (though $0$ and $1$ are a little bit special). This choice defines a subcategory of $ℕ$ consisting of all multiples of $12$ which I will write as $12ℕ$. We can now define some functors between these categories:

 $\begin{array}{rlrl}😎:12ℕ& \to ℕ,& a& ↦a\\ 😟:ℕ& \to 12ℕ,\quad & a& ↦\operatorname{lcm}\left(12,a\right)\\ 😻:ℕ& \to ℕ,& a& ↦12a\\ 🙀:ℕ& \to ℕ,& a& ↦\frac{a}{\operatorname{hcf}\left(12,a\right)}\end{array}$

Let's start with the first of these, $😎:12ℕ\to ℕ$. This takes a multiple of $12$ and simply regards it as a number so "forgets" that it was a multiple of $12$. There's clearly no inverse here as there's plenty of numbers in $ℕ$ that aren't multiples of $12$. So I want to find something that is as close to an inverse as I can find.

This means that given a number, which may or may not be a multiple of $12$, then I want to construct a multiple of $12$ out of it. If I were to start with a number that happens to already be a multiple of $12$ then I want to stick with that number. Also, this has to preserve divisibility in that if $a$ divides $b$ then whatever $a$ goes to has to divide whatever $b$ goes to. Combining these two, we get that if $b$ happens to be a multiple of $12$ and $a$ divides $b$ then the result of applying this functor to $a$ must also still divide $b$.

The solution is to send a number to its lowest common multiple with $12$, which I've called $😟$ in my list above. This has the required properties:

• If $a$ divides $b$ then $\operatorname{lcm}\left(a,12\right)$ divides $\operatorname{lcm}\left(b,12\right)$.

• If $a$ is already a multiple of $12$ then $\operatorname{lcm}\left(a,12\right)=a$.

• If $b$ is a multiple of $12$ and $a$ divides $b$ then $\operatorname{lcm}\left(a,12\right)$ still divides $b$.

The first of these is actually just saying that $😟$ is a functor. It is the latter two that establish that $😟$ is adjoint to $😎$. (For those who know a bit about adjunctions, these two functors have a few extra special properties as well, which is why the second statement is stronger than one might expect.)
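All three properties can be checked by brute force over a finite range. The Python sketch below does that for positive whole numbers only; the helper names `lcm` and `divides` and the search bounds are my own choices, and a finite check is of course evidence rather than proof.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Lowest common multiple of two positive integers."""
    return a * b // gcd(a, b)


def divides(a: int, b: int) -> bool:
    """True when a divides b, that is, when there is an arrow a -> b."""
    return b % a == 0


numbers = range(1, 150)
multiples_of_12 = range(12, 600, 12)

# Functor: divisibility is preserved.
preserves_arrows = all(
    divides(lcm(a, 12), lcm(b, 12))
    for a in numbers for b in numbers if divides(a, b)
)

# Multiples of 12 are left alone.
fixes_multiples = all(lcm(a, 12) == a for a in multiples_of_12)

# If b is a multiple of 12 then a divides b exactly when lcm(a, 12) does.
adjunction = all(
    divides(lcm(a, 12), b) == divides(a, b)
    for a in numbers for b in multiples_of_12
)

print(preserves_arrows, fixes_multiples, adjunction)  # True True True
```

The last check is phrased as an "exactly when" because the converse direction is automatic: $a$ always divides its lowest common multiple with $12$.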

Let's turn our attention to the other two functors. These are defined just on $ℕ$. We face the same problems trying to invert $😻$, multiplication by $12$, as we did with $😎$: not everything is in its image. The naïve inverse is dividing by $12$ but that only works on things that are definitely multiples of $12$. So we need something that will act as division by $12$ where that is possible but also still do something sensible when it isn't.

It turns out that the correct answer is to divide by the highest common factor of $a$ and $12$ (also known as the greatest common divisor). This is the functor $🙀$ above. What this does is to divide by as much of $12$ as it is possible to do.

The adjunction rule means that for any two numbers $a$ and $b$ the following two statements are equivalent:

1. $a$ divides $12b$,

2. $a÷\operatorname{hcf}\left(a,12\right)$ divides $b$.
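This equivalence can also be tested by brute force. The Python sketch below uses `math.gcd` for the highest common factor and checks positive whole numbers up to a bound that I picked arbitrarily; the helper name `divides` is my own.

```python
from math import gcd


def divides(a: int, b: int) -> bool:
    """True when a divides b."""
    return b % a == 0


equivalent = all(
    divides(a, 12 * b) == divides(a // gcd(a, 12), b)
    for a in range(1, 300)
    for b in range(1, 300)
)
print(equivalent)  # True

# One instance: 8 divides 12 * 2 = 24, and 8 ÷ hcf(8, 12) = 2 divides 2.
example = (divides(8, 12 * 2), divides(8 // gcd(8, 12), 2))
print(example)  # (True, True)
```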

# 5 So What?

I plan to expand on the "So What?" section of the original talk in another article, but it wouldn't be right to end this without a smidgen of "so what?".

Let me start with the "So definitely not". I am very definitely not suggesting that school students be taught category theory, nor even that teachers are. My point is much more simple than that. It is that if I can find category theory in school mathematics in an unforced way, then it is time to stop claiming that school mathematics is different to "real" mathematics. The analogy that I used in my talk was the following:

• School Mathematics: The Hobbit

• University Mathematics: The Lord of the Rings

• Research Mathematics: The Silmarillion

You cannot find every theme from LOTR in the Hobbit, nor from the Silmarillion in LOTR. Even when they are there they are often simplified. But (to a large degree, particularly given that these are fictional stories²) the stories are consistent with each other. Themes are developed in the deeper works, expanded on, and explained, but not contradicted. In the Hobbit the animosity between Elves and Dwarves is largely taken as fact. In the LOTR it is developed further through the friendship of Legolas and Gimli. In the Silmarillion it is explained through the history of the interactions of the races. But while each further work contains more than the Hobbit did, none tell us that what was in the Hobbit was actually incorrect.

² Apologies to anyone for whom that comes as a shock.

And that's what I would like to see between these branches of Mathematics.