Chain Rules Supreme

Andrew Stacey


Creative Commons License



1 A Matter of Squares

As a lead-in to the main point of this post, here's a curious fact: it's easy to see that if you know how to multiply [two numbers] then you know how to square [a number]. Did you know that the reverse is true? That is, if you know how to square [a number] then you can do multiplication as well.

It does rely on being able to do a couple of other things, namely addition and division by two[1], but nevertheless this separates out what of multiplication is due to multiplication and what comes from the underlying structure of the things being multiplied.

[1] So you need to watch your characteristic if trying to apply this to other rings, and commutativity plays a part as well.

It's not even that complicated. Here's how it's done.

  1. Start with the two numbers you want to multiply, let's say 7 and 3.

  2. Square their sum: $(7+3)^2 = 10^2 = 100$.

  3. Square them individually: $7^2 = 49$ and $3^2 = 9$.

  4. Subtract the individual squares from the square of the sum: 100-49-9=42.

  5. Halve the result: 42÷2=21.

And observe that 7×3 is, indeed, 21.
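
The five steps can be sketched in a few lines of Python. This is only an illustration: the names `square` and `multiply_via_squares` are made up for this sketch, and it assumes integer inputs so that the halving in step 5 is exact.

```python
def square(n: int) -> int:
    """The only 'multiplication' we allow ourselves: squaring one number."""
    return n ** 2


def multiply_via_squares(a: int, b: int) -> int:
    """Multiply a and b using only squaring, addition, subtraction and halving."""
    square_of_sum = square(a + b)                          # step 2
    individual_squares = square(a) + square(b)             # step 3
    doubled_product = square_of_sum - individual_squares   # step 4: equals 2ab
    return doubled_product // 2                            # step 5: exact, since 2ab is even


print(multiply_via_squares(7, 3))
```

Running it with 7 and 3 reproduces the worked example; negative integers work too, since the doubled product is always even.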

The proof is a bit of straightforward algebra. Starting with two numbers, $a$ and $b$, the claim is that the product $ab$ is equal to half of $(a+b)^2 - a^2 - b^2$. Expanding out the square of the bracket[2] yields:

\[
(a+b)^2 - a^2 - b^2 = a^2 + ab + ba + b^2 - a^2 - b^2 = 2ab
\]

and so half of that is $ab$.

[2] This is where commutativity comes in.

In and of itself, this isn't exactly an earth-shattering result. However, it does turn out to be an incredibly useful tool – finding its way into the relationship between an inner product and its norm, and more generally between a bilinear form and its quadratic form. There are situations, such as those, where squaring is viewed as a separate operation to multiplication and knowing that they determine each other is useful. It also, as alluded to above, shows a little bit of how much of multiplication belongs to itself and how much comes from the space on which it is defined.
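
As a sketch of the inner product case, assume a real inner product $\langle u, v \rangle$ whose norm is given by $\|v\|^2 = \langle v, v \rangle$ (the notation is chosen here purely for illustration). The same expansion then reads:

```latex
\[
\tfrac{1}{2}\bigl(\|u+v\|^2 - \|u\|^2 - \|v\|^2\bigr)
  = \tfrac{1}{2}\bigl(\langle u,v\rangle + \langle v,u\rangle\bigr)
  = \langle u,v\rangle
\]
```

Symmetry of the inner product plays the role here that commutativity played for numbers.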

What I want to do is use it to show how the Chain Rule reigns supreme above all else.

2 Introducing the Chain Rule

In A level Mathematics[3] there are three "rules" introduced for differentiation: the chain rule, the product rule, and the quotient rule.

[3] Studied in the last two years of school in the UK.

I doubt I'm alone in considering the chain rule to be the primary one in this list. However, explaining why that should be to school students is something that I've not previously found easy to do. It's simple enough to demonstrate that the quotient rule is redundant since one can just use the product rule with the chain rule in its place[4]. But, until recently, if asked for a similar argument to demonstrate the redundancy of the product rule then I'd've prevaricated because the one that I knew uses partial derivatives (for completeness, I've included it at the end of this post) and that takes us too far outside the curriculum.

[4] Which makes it a bit daft that the quotient rule is the only one of the three that is listed in the A level formula booklet.
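
A sketch of that redundancy argument, writing the quotient as $f(x) \cdot g(x)^{-1}$ and assuming $g(x) \neq 0$:

```latex
\[
\frac{d}{dx}\,\frac{f(x)}{g(x)}
  = \frac{d}{dx}\Bigl(f(x)\, g(x)^{-1}\Bigr)
  = f'(x)\, g(x)^{-1} + f(x) \cdot \bigl(-g(x)^{-2}\, g'(x)\bigr)
  = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2}
\]
```

The product rule supplies the split into two terms, and the chain rule differentiates $g(x)^{-1}$.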

To be clear, I'm not saying that the product rule (or quotient rule) isn't useful or shouldn't be in the curriculum. I'm just saying that there is a hierarchy of rules and the chain rule is very definitely at the top. I'm also not saying that the fact that the others can be deduced from it is the real reason for the primacy of the chain rule. The real reason is that it establishes the functoriality of differentiation and so establishes it as a categorical tool. But I'm even less likely to say that to school students!

3 Squares to the Rescue

I recently learnt of a simple way to demonstrate that the product rule can be deduced from the chain rule, and it uses the squaring trick from the first section.

Let's start by considering the square of a function. So let $h(x)$ be a differentiable function[5] and consider differentiating $h(x)^2$. This can be done by the chain rule:

\[
\frac{d}{dx}\, h(x)^2 = 2\, h(x)\, h'(x)
\]

[5] I'll work over $\mathbb{R}$, but the idea would generalise.


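Now consider two differentiable functions, $f(x)$ and $g(x)$. Our aim is to differentiate their product, $f(x)g(x)$. Inspired by the squaring idea, we'll consider differentiating the square of their sum:

\[
\frac{d}{dx} \bigl(f(x) + g(x)\bigr)^2 = 2 \bigl(f(x) + g(x)\bigr)\bigl(f'(x) + g'(x)\bigr)
\]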


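We can expand the brackets on both sides and use the linearity of differentiation to see that:

\[
\frac{d}{dx} f(x)^2 + 2\,\frac{d}{dx}\bigl(f(x)\, g(x)\bigr) + \frac{d}{dx} g(x)^2
  = 2 f(x) f'(x) + 2 f(x) g'(x) + 2 g(x) f'(x) + 2 g(x) g'(x)
\]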


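We recognise the first and last terms on each side as being equal (each is the derivative of a square, as computed by the chain rule) and so removing them leaves us with:

\[
2\,\frac{d}{dx}\bigl(f(x)\, g(x)\bigr) = 2 f(x) g'(x) + 2 g(x) f'(x)
\]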


Whence, providing we can divide by 2, the product rule drops out.
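
As a quick sanity check, the argument can be tried numerically. The sketch below is only an illustration: the helper names are invented for it, and the choice of $f = \sin$, $g = \exp$ and the point $x = 0.7$ is arbitrary. It builds the derivative of a product from nothing but the chain rule for squares, linearity, and halving, then compares against a finite-difference estimate.

```python
import math


def numerical_derivative(func, x, step=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)


def square_derivative(h, dh, x):
    """Chain rule for a square: d/dx h(x)^2 = 2 h(x) h'(x)."""
    return 2 * h(x) * dh(x)


def product_derivative_via_squares(f, df, g, dg, x):
    """Derivative of f*g from the chain rule on squares, linearity and halving."""
    # Derivative of (f + g)^2, using linearity for the derivative of the sum.
    sum_term = square_derivative(lambda t: f(t) + g(t), lambda t: df(t) + dg(t), x)
    # Subtract the derivatives of f^2 and g^2, then halve.
    return (sum_term - square_derivative(f, df, x) - square_derivative(g, dg, x)) / 2


# Arbitrary illustrative choice: f = sin, g = exp, evaluated at x = 0.7.
x0 = 0.7
via_squares = product_derivative_via_squares(math.sin, math.cos, math.exp, math.exp, x0)
direct = numerical_derivative(lambda t: math.sin(t) * math.exp(t), x0)
print(abs(via_squares - direct) < 1e-6)
```

Note that the product rule itself is never used in `product_derivative_via_squares`; it only emerges in the comparison.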

Now, I wouldn't advocate using this as an introduction to the product rule. And indeed, I might stick with my current justification (using $f(x+h) \approx f(x) + h f'(x)$) simply because it reinforces an important use of the derivative. Nevertheless, I do at last have an explanation for how to deduce the product rule from the chain rule that I can use in a classroom.

And there's something satisfying about that.

4 Attribution

I said above that I'd recently learnt of this method, implying that I learnt it from somewhere else. I didn't. I deduced it myself. However, I'd be extremely surprised to learn that it wasn't already known (and possibly well-known in some circles). It's just that I haven't come across it elsewhere.

But my phrasing "learnt of" is deliberate even beyond that because even though I deduced it for myself, I had a very definite prompt. I can't attribute the result to the prompt (and thereby the prompter) because they didn't know what they were prompting and I still had to make the connection to this result. However, it does make it feel very much like discovery rather than invention.

Just for interest's sake, the prompt was a student considering how to differentiate the square of a function; I think it was $(\ln(x))^2$. My instinct was to use the chain rule, and so that came to the forefront of my mind on seeing it. The student thought of it as $\ln(x) \times \ln(x)$ and so used the product rule. That was enough to bring to mind the link between squaring and multiplying, and to pose the question as to whether that could be used to connect the chain rule to the product rule.
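
Both routes can be written out side by side for that example (taking $x > 0$); they agree, as they must:

```latex
\[
\text{Chain rule:}\quad
\frac{d}{dx}\bigl(\ln(x)\bigr)^2 = 2\ln(x)\cdot\frac{1}{x} = \frac{2\ln(x)}{x}
\]
\[
\text{Product rule:}\quad
\frac{d}{dx}\bigl(\ln(x)\cdot\ln(x)\bigr) = \frac{1}{x}\cdot\ln(x) + \ln(x)\cdot\frac{1}{x} = \frac{2\ln(x)}{x}
\]
```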

Just to show that Pratchett is completely right with his theory of inspiration particles, I'm fully aware of the way of proving the derivative of $x^2$ using the product rule and yet had never made the connection with that starting point.

The mind is curious.

5 Postscript: Using Partial Derivatives

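To make this post complete, here's how to use partial derivatives to deduce the product rule from the chain rule. We start with the multiplication operator, $M(x,y) = xy$. It has total derivative:

\[
DM(x,y) = \begin{bmatrix} \dfrac{\partial M}{\partial x} & \dfrac{\partial M}{\partial y} \end{bmatrix} = \begin{bmatrix} y & x \end{bmatrix}
\]

Given two differentiable functions, say $f(x)$ and $g(x)$, we can view their product $f(x)g(x)$ as the composition of $M$ with the map:

\[
x \mapsto \bigl(f(x), g(x)\bigr)
\]

The derivative of this is:

\[
\begin{bmatrix} f'(x) \\ g'(x) \end{bmatrix}
\]

The chain rule then says that to find the derivative of $x \mapsto M(f(x),g(x))$ we compose these derivatives to get:

\[
\begin{bmatrix} g(x) & f(x) \end{bmatrix} \begin{bmatrix} f'(x) \\ g'(x) \end{bmatrix} = g(x)\, f'(x) + f(x)\, g'(x)
\]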