Differentiating Opinion

Andrew Stacey

2025-05-11

Creative Commons License

Contents

  1. A Bit of BlueSky Thinking

  2. A Derivative with a View

    2.1. Algebraic

    2.2. Dynamic Derivatives

    2.3. Geometrical Differentiation

  3. More Geometry

  4. Pedagogical Purposes

  5. A Geometrical Model

  6. Teaching Tangency

  7. Bonus Round

  8. Conclusion

1 A Bit of BlueSky Thinking

On the 3rd of May, 2025, I posted the following opinion on BlueSky:

Recent realisation about my understanding of calculus:

BlueSky posts are limited in length so I was slightly counting on people understanding what I meant by this, namely that the focus was on the conceptual understanding. And, since I have the space here, let me clarify that the concept I was particularly interested in was differentiating $x^2$, and perhaps more generally powers of $x$, but not differentiation in general.

Indeed, none other than Tim Gowers posted:

I think I reacted to that in exactly the way you hoped. Stage 1: Hang on, isn't differentiation easier than integration? Stage 2: Ah, I get it: the area under the graph of $y=2x$ is just the area of a triangle, which is very elementary, whereas the gradient of $y=x^2$ is not at all elementary.

There was then much discussion with people giving their intuition for differentiation.

2 A Derivative with a View

Looking through the various examples, I feel that they can be divided into algebraic, dynamic, and geometrical models.

2.1 Algebraic

Chad Topaz commented:

To me, any shortcut differentiation rule is just an artifact of algebra applied to the conceptual meaning of differentiation – the change in the output of a function occurring for an infinitesimal increase in the input.

The phrase "artifact of algebra" struck a chord[1] in that I think it perfectly sums up my feeling. Showing that the derivative of $x^2$ is $2x$ feels like a consequence of a bit of algebraic manipulation and not something where there's some intuition that says it's the obvious answer.

[1] I shall try to stay away from puns … I shan't always succeed.

A few people posted variants of looking at $(x+\Delta x)^2$, but Barbara Fantechi posted an intriguing comment:

I can differentiate every rational (ratio of polynomials) function f without using limits, and for any field of coefficients. Since F(a)=0 implies the existence of a unique function G such that G(x)(x-a)=F(x), I define f'(a) as G(a), where F(x)=f(x)-f(a). Slogan: algebra = no limits.

This is something I will return to a bit later.

Benjamin Dickman's approach was a bit more numerical in nature, and contains the wonderful phrase "By the fundamental theorem of aesthetics…". In full, he writes (emphasis mine):

For $ax^2+bx+c$, I feel the derivative should be linear (viewing derivative as function that takes input and gives as its output slope of the tangent line). Vertex is at $x=-\frac{b}{2a}$, so linear function sends that to 0. By the fundamental theorem of aesthetics, we should multiply by $2a$, then add $b$: $2ax+b$

The emphasised text is exactly what I'm trying to dig into: why does one feel that the derivative should be linear? And, more importantly, how do I instil that feeling in my students?

2.2 Dynamic Derivatives

Most people's models for differentiation involved something dynamic. This makes sense, and the concept of rate of change is possibly the most intuitive way to understand it. So all this needs is something that varies as $x^2$.

Two obvious ones are a square and a circle, as Akiva Weinberger posted with a version of the following diagram.

Figure 1: A Dynamic Square

Rate-of-change also links differentiation to a very concrete model, namely distance and speed (or, more correctly, displacement and velocity). Indeed, $x^2$ fits quite well here as the equation of motion under gravity is a quadratic.
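To make the dynamic square concrete, here is a minimal sketch. It assumes the diagram is of the usual growing-square type, where the side grows from x to x+h and the extra area splits into two strips and a corner. The function name `area_growth` is made up for illustration.

```python
from fractions import Fraction

def area_growth(x, h):
    """Split the extra area of a square whose side grows from x to x + h."""
    x, h = Fraction(x), Fraction(h)
    strips = 2 * x * h      # two thin rectangles along the sides
    corner = h * h          # the small square in the corner
    # the two pieces together account for all of the extra area
    assert (x + h) ** 2 - x ** 2 == strips + corner
    return strips, corner

for h in (Fraction(1), Fraction(1, 10), Fraction(1, 100)):
    strips, corner = area_growth(3, h)
    # average rate of change of the area over the step h
    print(h, (strips + corner) / h)
```

As h shrinks, the corner becomes negligible next to the strips and the printed rate settles towards 2x, here 6.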

2.3 Geometrical Differentiation

I was really after something geometrical, where one could look at the graph of $y=x^2$ and just see that the gradient of the tangent at $x$ was $2x$. This is to match the area of a triangle approach, as outlined above by Tim Gowers, where one can just look at the graph of $y=2x$ and see that the area under the graph is $x^2$.

There were actually quite a few insights into geometrical derivatives for $y=x^2$ but they all focussed[2] on some aspect of its being a parabola that we no longer teach (and, indeed, even when it was on the syllabus, conic sections came way after simple differentiation).

[2] What was that about puns again?

Tim Corica posted one such example:

You need a geometric property of a parabola to take the place of the "radius is perp[endicular] to [the] tangent line" for a circle. I think this might be the focus/directrix/reflection property, but it has many moving parts. I think the image does this (tho it's cryptic!). Maybe you can fix it.

This was accompanied by a diagram, partially reproduced in Figure 2, demonstrating the link between the focus-directrix property of a parabola and its tangent. In this diagram, the focus of the parabola is at $(0,\frac{1}{4})$ and the directrix is the dashed line at $y=-\frac{1}{4}$. The quadrilateral is a rhombus, and so its diagonals are perpendicular. For the point with $x$-coordinate $a$, the gradient of the line from the focus to the point $(a,-\frac{1}{4})$ on the directrix is $-\frac{1}{2a}$, so the gradient of the bisector is $2a$. This bisector is also the tangent at $(a,a^2)$.

Figure 2: A Geometrical Approach to the Tangent of a Parabola

This did get me wondering whether there was some approach to tangents along the lines of thinking of the curve as a mirrored surface and looking at reflections, but I'm not sure that would be sufficiently intuitive to justify the derivative of $x^2$ being $2x$.
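The rhombus argument for Figure 2 can be checked with exact arithmetic. The following is a sketch rather than a proof. It takes the standard focus $(0,\frac{1}{4})$ and directrix $y=-\frac{1}{4}$ of $y=x^2$, and the helper name `bisector` is invented for this example.

```python
from fractions import Fraction

FOCUS = (Fraction(0), Fraction(1, 4))
DIRECTRIX_Y = Fraction(-1, 4)

def bisector(a):
    """Perpendicular bisector of the segment from the focus to (a, -1/4).

    Returns (gradient, intercept) of the line y = gradient * x + intercept.
    """
    a = Fraction(a)
    run = a - FOCUS[0]
    rise = DIRECTRIX_Y - FOCUS[1]          # always -1/2
    gradient = -run / rise                 # perpendicular to the segment
    mid_x = (a + FOCUS[0]) / 2
    mid_y = (DIRECTRIX_Y + FOCUS[1]) / 2   # always 0
    intercept = mid_y - gradient * mid_x
    return gradient, intercept

for a in (-2, Fraction(1, 2), 3):
    gradient, intercept = bisector(a)
    assert gradient == 2 * a
    # the bisector passes through the point (a, a^2) on the parabola
    assert gradient * a + intercept == Fraction(a) ** 2
```

The check confirms the gradient $2a$ and that the bisector passes through $(a,a^2)$. It does not show that the bisector meets the parabola only there, which is the part that makes it a tangent.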

3 More Geometry

In the course of the discussion, I decided that I was specifically looking for a geometric reason. Rather, I realised that that was what I'd been looking for from the start[3].

[3] So I didn't move the goalposts, but discovered where the goalposts had been all along.

To explain this, I need to give a bit of background. My academic research was very much focussed on differential geometry. In particular, I spent a lot of time thinking about how the core concepts in differential geometry extended to situations they weren't originally designed for. One of these was a notion called a tangent space. So I have spent a lot of time thinking very specifically about just this issue, and nevertheless still have a sense of "huh?" about the fact that the derivative of $x^2$ is $2x$.

Through my research, and based on my earlier studies, I built an intuition of differentiation based very much on the concept of rate-of-change. It's a very dynamic model, and I think it is a very strong one and one that I think students would do well to internalise.

But – if you'll forgive a moment of technicality – there's a subtle difference between a tangent vector and a tangent space. And it is this that is at the heart of my unease.

Let me make it a bit clearer with an example. Let's stick with the parabola $y=x^2$ sitting inside the $xy$-plane. Imagine it is a path and you are cycling along it at night.

As you cycle along, your velocity is a vector that describes your motion. Because you are constrained to cycle on the path, it is always tangential to the path and its magnitude is your speed. You can feel this as you move.

Now, as a good cyclist you have your lights on and your front light shines in the direction you are facing. Let's also imagine that it is slightly foggy so the beam from your light is visible. This forms a line from the point at which you are at that moment and, as it is in the direction you are heading, it is also tangential to the path. But the line doesn't care how fast you are going since it stretches as far as you can see[4].

[4] Depending on what batteries you have, that is; actually if you have a dynamo then it slightly does depend on how fast you're going.

Your velocity is a tangent vector, your lights form the tangent space.

And it is this latter that I want to get some intuition for.

4 Pedagogical Purposes

During the discussion, Rob Low shared a post that he'd written seemingly on this issue. In it, he argues for the most useful interpretation of differentiation to be the rephrasing of it as:

$$f(x+h) \approx f(x) + h f'(x)$$

And I agree with his conclusion.
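As a small illustration of that rephrasing, the sketch below measures how far the approximation is from the truth for the function $x^2$, taking $2x$ as the derivative. The function name is hypothetical and the choice of $x^2$ is mine, made to match the rest of this post.

```python
from fractions import Fraction

def linear_approximation_error(x, h):
    """Error in approximating (x + h)^2 by x^2 + h * 2x."""
    return (x + h) ** 2 - (x ** 2 + h * 2 * x)

x = Fraction(3)
for h in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    error = linear_approximation_error(x, h)
    # for x^2 the error is exactly h^2, whatever x is
    assert error == h ** 2
    print(h, error)
```

The error shrinks like the square of the step, which is one way of saying that the linear term carries all the first-order information.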

However, we are discussing different times in a student's journey with calculus. I think it takes a bit of time to get to the stage where this reformulation feels right, so this is a staging post on that journey.

What I am interested in is building a student's initial intuition. The rate-of-change model is good, but I think it is useful to have a second source of intuition from a geometric model. One reason for this is that I think that the connection between the concept of function-as-process and function-as-graph is not strong enough to bear the weight of transferring the intuition between the two. So I'd like to have an intuitive notion of derivative on the graph side so that the two models can reinforce each other and also reinforce that connection.

5 A Geometrical Model

The goal is to have something that builds intuition. So, crucially, it doesn't have to be perfect[5]. More importantly, it should start from what students already know about tangents.

[5] A teacher will be aware of any imperfections and ensure that they don't set up residence in students' consciousness.

And that, in short, is most anchored in the notion of a tangent of a circle. Here, we have the simple fact that the tangent touches the circle but once.

Now, I know that this is problematic with more general curves. But let's put that to one side for the moment, particularly as it doesn't cause issues for parabolas.

The most obvious journey from this is to attempt to draw tangent lines, then give up and draw chords, do a bit of hand-waving and – as if by algebraic artifice – the usual quotient appears.

This is where I want to return to Barbara Fantechi's post. After letting it settle in my mind for a bit, I realised that there was a nice twist on the usual story that would lead to Barbara's formulation.

A Level Mathematics contains a little bit on circles. A common task is to consider the intersection of a line and a circle, where the line has some parameter, and find the values of the parameter where the line is tangent to the circle. This usually involves forming a quadratic and finding where its discriminant is zero.

There's a nice dynamism here with the line varying with the parameter from a line that doesn't intersect the circle through to one that intersects it twice, and the boundary between the two is where the line is a tangent.

Consider the special case of this where the gradient is fixed and the parameter controls the position of the line, say y=mx+c where m is fixed (but arbitrary) and c varies.

Figure 3: A Family of Parallel Lines

Let's play that game, but with the parabola $y=x^2$ instead of a circle. The line and parabola intersect when $x^2=mx+c$ and this rearranges to:

$$x^2-mx-c=0$$

For this to have a unique intersection point it must have a single root, which we can find either by setting the discriminant to zero or by completing the square:

$$\left(x-\frac{m}{2}\right)^2-\frac{m^2}{4}-c=0$$

For this to have a single root we must have $c=-\frac{m^2}{4}$ and the intersection point is at $x=\frac{m}{2}$.

So the line $y=mx+c$ is tangent to the parabola when $c=-\frac{m^2}{4}$ and the point of tangency is $\left(\frac{m}{2},\frac{m^2}{4}\right)$.
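The fixed-gradient game can be played numerically as well. This is a minimal sketch with invented function names. It uses the fact that the discriminant of $x^2-mx-c$ is $m^2+4c$.

```python
from fractions import Fraction

def discriminant(m, c):
    """Discriminant of x^2 - m*x - c = 0 (where y = x^2 meets y = m*x + c)."""
    return m * m + 4 * c

def tangent_line_with_gradient(m):
    """Intercept c and point of tangency for the tangent to y = x^2 with gradient m."""
    m = Fraction(m)
    c = -m * m / 4          # the value of c that makes the discriminant vanish
    x = m / 2               # the repeated root
    return c, (x, x * x)

for m in (-3, 0, 1, 4):
    c, (x, y) = tangent_line_with_gradient(m)
    assert discriminant(Fraction(m), c) == 0   # exactly one intersection
    assert y == m * x + c                      # the point lies on the line
    print(m, c, (x, y))
```

Nudging c above that value makes the discriminant positive (two intersections) and nudging it below makes it negative (none), which is the boundary behaviour described earlier for circles.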

The point is that we flip the narrative. Rather than asking "what is the gradient of the tangent line to the curve $y=x^2$ at $x=k$?" we ask "when is the line $y=mx+c$ tangent to the curve $y=x^2$?".

But we want the answer to the first question, not the second. Fortunately, we can take our answer to the second and re-flip it back to answer the first. Our answer was: "it happens when there is only one solution to $x^2=mx+c$". Also, our answer involved keeping $m$ fixed and varying $c$. The point of tangency came afterwards.

In our reframing, we start with the point of tangency, say $x=k$. Then we want to ask "when does $x^2=mx+c$ only have one solution?" with the extra condition that the solution happen at $x=k$.

For there to be an intersection at $x=k$ we must have $k^2=mk+c$ which rearranges to $c=k^2-mk$. So we're looking for a change in the number of solutions to:

$$x^2=mx+k^2-mk$$

and this rearranges to

$$x^2-k^2=m(x-k)$$

This has a solution at $x=k$, for any $m$, and this is "by construction". We want to ensure that there are no other solutions. So let's factorise the left-hand side to get:

$$(x+k)(x-k)=m(x-k)$$

which leads to a second solution where $x+k=m$, or $x=m-k$. As we want $x=k$ to be the only solution, we must have $k=m-k$, or $m=2k$.

Look familiar?
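For anyone who wants to watch the two solutions merge, here is a small sketch of the flipped version. The function name `intersections` is made up, and the choice $k=3$ is arbitrary.

```python
from fractions import Fraction

def intersections(m, k):
    """x-values where y = x^2 meets the line through (k, k^2) with gradient m."""
    m, k = Fraction(m), Fraction(k)
    c = k * k - m * k                 # forces the line through (k, k^2)
    roots = (k, m - k)                # from (x + k)(x - k) = m(x - k)
    for x in roots:
        assert x * x == m * x + c     # each really is an intersection
    return roots

k = 3
for m in (4, 5, 6, 7, 8):
    first, second = intersections(m, k)
    print(m, first, second, "tangent" if first == second else "crosses twice")
```

The second intersection slides along the parabola as the gradient changes and lands on the first one only when the gradient is $2k$.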

Let's unpack and generalise that a little.

For a polynomial p(x), we have the question "what is the gradient of the tangent line to the curve y=p(x) at x=k?". To answer that, we first flip it and ask "when is the line y=mx+c tangent to the curve y=p(x)?".

Our answer was: "it happens when p(x)=mx+c has a double root". Note the slight difference in language from "one solution" to "has a double root". This can be motivated by looking at cubics where there is an extra solution elsewhere.

In our reframing, we start with the point of tangency, say x=k. Then we want to ask "when does p(x)=mx+c have a double root at x=k?". Note also the additional requirement that the double root be at x=k.

This extra condition implies that p(k)=mk+c which rearranges to c=p(k)-mk. Then the intersections of this line with y=p(x) are when

p(x)=mx+p(k)-mk

and this rearranges to:

p(x)-p(k)=m(x-k)

We know that this has a solution at x=k, and we want to fix m so that there is a double root (at least).

Since p(x)-p(k) is zero at x=k we can divide by x-k to write:

p(x)-p(k)=q(x)(x-k)

So we want there to be (at least) a double root at x=k of

q(x)(x-k)=m(x-k)

which means we want (at least) a single root of q(x)=m at x=k.

And this implies that m=q(k).

Summarising, we want p(x)-p(k)=m(x-k) to have a double root at x=k so we start by dividing p(x)-p(k) by x-k to get the quotient q(x). Then we want q(x)=m to have a solution at x=k so we'd better have m=q(k).

To paraphrase Barbara:

Look, Ma! No limits!
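The divide-then-substitute recipe is short enough to sketch in code. What follows is one possible implementation under my own conventions, not a canonical one: coefficients are listed from the highest power down, division by $x-k$ is done by synthetic division, and all function names are invented for the example.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Horner evaluation; coeffs run from the highest power to the constant."""
    result = Fraction(0)
    for a in coeffs:
        result = result * x + a
    return result

def divide_by_linear(coeffs, k):
    """Synthetic division by (x - k): returns (quotient coeffs, remainder)."""
    carried = []
    acc = Fraction(0)
    for a in coeffs:
        acc = acc * k + a
        carried.append(acc)
    return carried[:-1], carried[-1]

def tangent_gradient(coeffs, k):
    """Gradient of the tangent to y = p(x) at x = k, with no limits taken."""
    k = Fraction(k)
    shifted = list(coeffs)
    shifted[-1] = shifted[-1] - evaluate(coeffs, k)    # p(x) - p(k)
    quotient, remainder = divide_by_linear(shifted, k)
    assert remainder == 0          # x = k is a root, so the division is exact
    return evaluate(quotient, k)   # m = q(k)

print(tangent_gradient([1, 0, 0], 3))        # p(x) = x^2 at k = 3
print(tangent_gradient([1, 0, -2, 1], 2))    # p(x) = x^3 - 2x + 1 at k = 2
```

Using `Fraction` keeps the arithmetic exact, so the zero remainder can be asserted rather than merely hoped for.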

6 Teaching Tangency

The previous section was written for "experts" in that it aimed to show how the familiar story of chords approaching tangents can be reframed into lines approaching tangents to produce a subtly different approach to differentiation. To students, approaching the concept for the first or nearly first time, it would look different.

Firstly, the pre-requisites. These are the notion of the tangent to a circle, and solving lots of problems about lines and circles intersecting. Then there is polynomial division and the idea of the factor theorem.

The first task would be to look at $y=x^2$ and the question "When is $y=mx+c$ a tangent to this curve?". Initially with actual values for $m$ and allowing $c$ to vary, then with an arbitrary but fixed $m$ and allowing $c$ to vary. After that, it should be a simple rearrangement to say "What if we wanted the tangent at $x=k$?".

The next step would be generalising to other quadratics.

Then we introduce cubics, and this leads to the "double root" version rather than the single point of intersection. As finding double roots of cubics leads to solving quadratics, the algebra is still very much within students' grasp to look at this from the fixed gradient perspective first.

Flipping this to looking for the tangent at a point puts the focus much more on the division.

By this stage, we should be ready to accept the following as a procedure: to find the gradient of the tangent to $y=p(x)$ at $x=k$, divide $p(x)-p(k)$ by $x-k$ and then substitute in $x=k$.

Here, I'd consider putting in a pause. I'd switch topics, do something else for a bit while they do some practice on this, and then return a little later.

On return, it's time to look for shortcuts. Here's where we observe that if, say, we divide $x^n-k^n$ by $x-k$ then we get:

$$x^{n-1}+kx^{n-2}+k^2x^{n-3}+\dots+k^{n-2}x+k^{n-1}$$

and then on substituting $x=k$ we get $nk^{n-1}$ because there are $n$ terms there, each equal to $k^{n-1}$. Notice also the nice link to sums of geometric series!

Then both division and substitution respect linearity, so once we have a pattern for $x^n$ we can extend it to all polynomials.
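The shortcut for powers can be spot-checked with integers. The sketch below (with a hypothetical helper `quotient_at`) confirms two things for a handful of small cases: that the stated quotient multiplies back to give $x^n-k^n$, and that substituting $x=k$ gives $nk^{n-1}$.

```python
def quotient_at(n, k, x):
    """Value at x of x^(n-1) + k x^(n-2) + ... + k^(n-1)."""
    return sum(k ** j * x ** (n - 1 - j) for j in range(n))

for n in range(1, 7):
    for k in (-2, 1, 3):
        # the quotient really does multiply back to x^n - k^n
        for x in (-1, 0, 2, 5):
            assert (x - k) * quotient_at(n, k, x) == x ** n - k ** n
        # substituting x = k gives n equal terms of k^(n-1)
        assert quotient_at(n, k, k) == n * k ** (n - 1)
```

A finite set of checks is of course not a proof, but it is the sort of experiment students could run before being shown the general argument.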

7 Bonus Round

In her messages, Barbara Fantechi also pointed out that it works for rational functions. So we should be able to use this to differentiate, say, $y=\frac{1}{x}$. Let's try!

We're looking for $\frac{1}{x}=mx+c$ to have a "double root" at $x=k$. As before, for there to be an intersection at $k$ we need $c=\frac{1}{k}-mk$, so we're looking for a "double root" of:

$$\frac{1}{x}-\frac{1}{k}=m(x-k)$$

Putting the left hand side over a common denominator[6] we get:

[6] Looking at you, partial fractions!

$$\frac{k-x}{xk}=m(x-k)$$

and so dividing both sides by $x-k$, we want $x=k$ to be a solution of:

$$-\frac{1}{xk}=m$$

which means that $m=-\frac{1}{k^2}$.
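The cancellation for the reciprocal can be verified exactly. This is a minimal sketch with made-up function names. It compares the chord gradient of $y=\frac{1}{x}$ with the quotient obtained after cancelling $x-k$, and then substitutes $x=k$.

```python
from fractions import Fraction

def chord_gradient(x, k):
    """Gradient of the chord of y = 1/x between x and k (requires x != k)."""
    x, k = Fraction(x), Fraction(k)
    return (1 / x - 1 / k) / (x - k)

def quotient(x, k):
    """The quotient -1/(x k) left after cancelling the factor x - k."""
    return -1 / (Fraction(x) * Fraction(k))

def reciprocal_tangent_gradient(k):
    """Substitute x = k into the quotient."""
    return quotient(k, k)

for x, k in ((2, 3), (5, -1), (Fraction(1, 2), 4)):
    assert chord_gradient(x, k) == quotient(x, k)

print(reciprocal_tangent_gradient(2))
```

The quotient agrees with the chord gradient wherever the chord exists and, unlike the chord gradient, it still makes sense at $x=k$.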

Now, in actual fact this method works even more generally and in particular for any function that students are going to meet. The difficulty is merely in dividing p(x)-p(k) by x-k. But as this is all about promoting conceptual intuition, some well chosen examples might be all that's needed.

For example, for $\sqrt{x}$ we just need to note that $x-k$ is the "difference of two squares" so:

$$\sqrt{x}-\sqrt{k}=(x-k)\times\frac{1}{\sqrt{x}+\sqrt{k}}$$

which means that the gradient, $m$, is:

$$m=\frac{1}{2\sqrt{k}}$$

This method extends to $\sqrt[n]{x}$.
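Here is one way the extension to higher roots might go; it is my reading of "extends" rather than anything spelled out above. It uses the factorisation $a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\dots+b^{n-1})$ with $a$ and $b$ the $n$th roots of $x$ and $k$. The function names are hypothetical and the arithmetic is floating point, so comparisons use a tolerance.

```python
import math

def root_quotient(x, k, n=2):
    """Quotient of x^(1/n) - k^(1/n) by x - k, for positive x and k.

    Uses a^n - b^n = (a - b)(a^(n-1) + a^(n-2) b + ... + b^(n-1))
    with a = x^(1/n) and b = k^(1/n).
    """
    a, b = x ** (1 / n), k ** (1 / n)
    return 1 / sum(a ** (n - 1 - j) * b ** j for j in range(n))

def root_gradient(k, n=2):
    """Substitute x = k into the quotient."""
    return root_quotient(k, k, n)

# square root: matches 1 / (2 sqrt(k))
assert math.isclose(root_gradient(4.0), 1 / (2 * math.sqrt(4.0)))
# cube root: n equal terms, so the gradient is 1 / (3 k^(2/3))
assert math.isclose(root_gradient(8.0, n=3), 1 / (3 * 8.0 ** (2 / 3)))
# the quotient agrees with the chord gradient away from x = k
assert math.isclose(root_quotient(9.0, 4.0), (3.0 - 2.0) / (9.0 - 4.0))
```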

8 Conclusion

I think I have a better understanding of why the derivative of $y=x^2$ is $2x$. Or, perhaps, I have a better understanding of how the tangent space of $y=x^2$ at $(k,k^2)$ is the line $y=2kx-k^2$. I think I also have a pedagogical path for differentiation that starts in a better place than the "traditional" one, ends in the same place, and doesn't deviate from it too much.

But Calculus is a huge subject and is still developing today. There is no perfect route to building a student's understanding and so it might be that I would use this approach not as an introductory one but rather as a reinforcement later on, after the students are familiar with what I usually refer to as procedural differentiation but which, inspired by Chad's characterisation, I shall now call "algebraic artifice".

I'd like to conclude by thanking everyone who took part in the discussion on BlueSky one lazy bank holiday weekend. I did try to read everything that was written, and whether I included your contribution here or not it was very much valued.