Contents
1 Introduction
Back when I was an academic Mathematician, a common question at a social gathering (after the usual opening gambit of "I was never very good at Maths at school.") was "What do you actually do all day? Add up big numbers?"1. I'd invariably answer with something about trying to invent new Mathematics, but then the conversation would get bogged down by a question about where the ideas come from. I found that hard to answer. I might say that I was working on jubjub birds because I'd come across it when studying bandersnatches, which I'd been led to from my study of slithy toves. And so on. By the time I got to anything resembling the "real world", my interlocutor had inevitably moved on.
1If the person was trying to show off, they'd say "multiply".
But there is one result of mine that I know the story of extremely well and it is short enough to tell. It recently resurfaced on the internet, and in a bit of chatter about it I considered explaining the story behind it. The result itself may not be an extremely exciting one, but the purpose of this is not to sell you the result but to show how it arose naturally and developed as I also developed as a Mathematician.
2 The Result
Here's an attempt at explaining the result. Start with Pascal's triangle, only left justify it.
$$\begin{array}{c}1\\ 1& 1\\ 1& 2& 1\\ 1& 3& 3& 1\\ 1& 4& 6& 4& 1\\ \vdots \end{array}$$ 
Now pick a square of numbers which starts somewhere on the left hand side. Empty slots are considered to be $0$. For example,
$$\begin{array}{cccc}1& 2& 1& 0\\ 1& 3& 3& 1\\ 1& 4& 6& 4\\ 1& 5& 10& 10\end{array}$$ 
Here's the result: this matrix has determinant $1$.
3 The What has What?
If that seemed fine up until the last sentence, this section is for you. If you're fine with matrices and determinants, feel free to skip to the next section (though you might find it useful to know how I'm going to talk about them so skimming through this section might be of some value).
Let's start with the basic definition.
A matrix is a rectangular grid of numbers.
We put brackets around the grid to indicate that we consider it as a single object. Examples are:
$$\left[\begin{array}{ccc}1& 2& 3\\ 4& 5& 6\end{array}\right],\phantom{\rule{thickmathspace}{0ex}}$$ 
If you read a book with matrices in it, you will probably be told that a matrix is something more impressive2. It isn't. This is it. However, a matrix can represent many different things and that's the reason why they are such an important part of Mathematics. Many problems in disparate areas of Mathematics can be phrased in terms of matrices and then solved using a set of standard techniques.
2Especially if that book is the novelisation of The Matrix.
As far as we're concerned, a rectangular grid of numbers is as deep as we need to go.
The determinant is something a little more complicated, but we don't need a deep understanding of it. It only makes sense for socalled square matrices, which are matrices with the same number of rows as columns.
The determinant of a square matrix is a number that summarises that matrix. It is calculated from all of the entries of the matrix by a series of multiplications, additions, and subtractions.
Thus the result says that if you take a square from the lefthand side of the leftjustified Pascal's triangle and "summarise" it, you always get the number $1$.
4 Why?
The story starts when I was teaching an undergraduate topic called Linear Algebra. This is the topic that studies matrices and builds the standard techniques used to study them. As part of teaching the course, I had to set problems for the students to practise the techniques. This involved devising matrices with certain properties. I could build these matrices from certain ingredients, one of which was a square matrix. If that square matrix had determinant $1$ then the steps involved in solving the problem would be numerically simpler (in that there would be no fractions) and so would be better suited for early practice of the methods. So I needed a ready source of square matrices with determinant $1$ and digging them out of Pascal's triangle was quite an easy method of obtaining them3.
3For completeness, I should say that there are better ways than using Pascal's triangle.
One important aspect to note here is that I didn't need a proof of the result at this time. Whenever I used this result in building a problem, I would take a matrix out of the side of Pascal's triangle and then verify it had determinant $1$ by putting into a computer algebra system.
So I was in the position of knowing something to be true for practical purposes, but not needing to know it to be true for all purposes.
I expect that I figured out a simple proof for small matrices ($2\times 2$ and $3\times 3$), but the fact that I don't have a firm memory of it shows that it wasn't very important to me at the time.
The story now skips forward a couple of years (and several thousand miles) to a time when I was sitting round a lunch table with a group of Mathematicians. One of these happened to remark that in their latest work they needed a particular result about determinants of matrices drawn from Pascal's triangle. My memory is a little hazy, but I think they said that they didn't have a proof but had verified it in many cases. The result was not quite the same as mine: instead of taking a contiguous block, their method needed a matrix taken from every other line. Also, the resulting determinant was a power of $2$. Nevertheless, it was sufficiently similar that it brought to mind my old result and I wondered if they were two cases of a more general result.
What is important to note here is that now someone else is interested in the same, or nearly the same, result. So it becomes not a neat trick for making problems but something that someone else would like to know about. This then gives me an incentive to figure out what's really going on, and to ask whether the result that I knew worked for practical purposes was actually always true.
At the same time, in my research then I was learning about something called numerical polynomials. This had absolutely no connection with the ideas that had led to my discovery but had come from my work in Algebraic Topology. However, it turned out to be a key ingredient in the proof of the result. As it's a key ingredient, I'll explain it in the next section.
5 Newton's Polynomials
There is a formula for the terms in Pascal's triangle. The $k$th term in the $n$th row is given by:
$$\left(\begin{array}{c}n\\ k\end{array}\right)=\frac{n!}{k!(nk)!}=\frac{n(n1)(n2)\dots (nk+1)}{k(k1)(k2)\dots 1}$$ 

(Note that we're zerobased here.) This is also the number of ways of choosing $k$ objects from a set of $n$.
We can think of this as a function by varying the inputs. It's common to fix $n$ and allow $k$ to vary, which gives us the entries across a row. But for our purposes, we will fix $k$ and allow $n$ to vary which will give us the entries down a column. We can actually make sense of this for nonintegers by using the extreme righthand side of Equation (1), so we define:
$${b}_{k}(x):\frac{x(x1)(x2)\dots (xk+1)}{k(k1)(k2)\dots 1}$$ 
When $n\ge k$ is a positive integer, we have that ${b}_{k}(n)=\left(\begin{array}{c}n\\ k\end{array}\right)$. The first few are:
$$\begin{array}{rl}{b}_{0}(x)& =1\\ {b}_{1}(x)& =x\\ {b}_{2}(x)& =\frac{1}{2}x(x1)=\frac{1}{2}{x}^{2}\frac{1}{2}x\\ {b}_{3}(x)& =\frac{1}{6}x(x1)(x2)\end{array}$$ 
These are known as Newton's polynomials4.
4Newton used them in his generalised binomial theorem.
These functions have some very nice properties. In this, and the following, we'll restrict $n$ to integers (positive and negative) and $k$ to positive integers (including $0$) while $x$ can be any real number.

For any $n$, ${b}_{k}(n)$ is an integer.

${b}_{k}(x)$ is a polynomial of degree $k$.

For $k>0$ and $0\le n<k$, ${b}_{k}(n)=0$.

For any $k$, ${b}_{k}(k)=1$.

For any $k$, the values of ${b}_{k}(n)$ for $n$ positive is the $k$th column of the leftjustified Pascal's triangle.
That last property suggests that we could consider extending Pascal's triangle upwards using the ${b}_{k}(x)$ at negative integers, and indeed this makes sense to do but we shan't pursue it here.
These properties make the family $\{{b}_{k}(x)\}$ a very special set of functions. Although they typically have fractions for coefficients, they "happen" to map integers to integers. Such polynomials are called numerical polynomials. The other properties mean that any polynomial with that property can be written as a sum of these functions, where each appears an integral number of times.
In technical terms, the $\{{b}_{k}(x)\}$ form a basis for the $\mathbb{Z}$–module $Int(\mathbb{Z}):\{p\in \mathbb{Q}[x]:p(\mathbb{Z})\subseteq \mathbb{Z}\}$.
Regardless of whether you read or skipped that last sentence, there is a crucial point: if $p(x)$ is a polynomial of degree $k$ with rational coefficients which happens to take integers to integers then there are integers ${a}_{0},\dots ,{a}_{k}$ such that:
$$p(x)={a}_{0}{b}_{0}(x)+{a}_{1}{b}_{1}(x)+\dots +{a}_{k}{b}_{k}(x)$$ 
An important point to notice here is that we don't use any ${b}_{j}(x)$ of degree higher than our original polynomial.
The reason I was learning about these polynomials is that they play a very important role in Algebraic Topology. They describe, in a natural way, a particular object that I was studying at the time. Although there were lots of connections to binomials, and so to Pascal's triangle, the context meant that I rarely thought of ${b}_{k}(x)$ as a function of numbers – I was putting in much more complicated objects in place of $x$. So although I was studying the pieces I needed to prove my result, the connection wasn't obvious until I was prompted to look at it more closely by the conversation with the other Mathematicians.
6 Assembling the Proof
I'm not going to give the full proof here, but I will sketch it out for those reading this who might be interested. I've tried to keep it elementary, but experience with determinants and matrix multiplication will be helpful in following it.
When we take a square matrix from Pascal's triangle, we are choosing particular rows and then taking the first few elements from each one. Since we're taking the first elements, each row is given by taking the polynomials ${b}_{0}(x)$ to ${b}_{k}(x)$ and evaluating them on the row number. So if our rows are ${r}_{0}$ to ${r}_{k}$, the matrix is:
$$\left[\begin{array}{cccc}{b}_{0}({r}_{0})& {b}_{1}({r}_{0})& \dots & {b}_{k}({r}_{0})\\ {b}_{0}({r}_{1})& {b}_{1}({r}_{1})& \dots & {b}_{k}({r}_{1})\\ \vdots & \vdots & & \vdots \\ {b}_{0}({r}_{k})& {b}_{1}({r}_{k})& \dots & {b}_{k}({r}_{k})\end{array}\right]$$ 
(Because we're zerobased, this matrix has $k+1$ rows and columns.)
In the cases we're examining, the ${r}_{i}$ have a very specific pattern. They are of the form ${r}_{i}=p+qi$ where $p$ and $q$ are integers. So our matrix is:
$$\left[\begin{array}{cccc}{b}_{0}(p)& {b}_{1}(p)& \dots & {b}_{k}(p)\\ {b}_{0}(p+q)& {b}_{1}(p+q)& \dots & {b}_{k}(p+q)\\ \vdots & \vdots & & \vdots \\ {b}_{0}(p+qk)& {b}_{1}(p+qk)& \dots & {b}_{k}(p+qk)\end{array}\right]$$ 
(In actual fact, we can deal with more variation than this, but this will be enough to get the idea across.)
Here's the trick. Rather than thinking of that as a matrix of numbers, we look at it as a matrix of functions. To do this, we just need to add in a judicious choice of variable in each cell:
$$\left[\begin{array}{cccc}{b}_{0}(p+q{x}_{0})& {b}_{1}(p+q{x}_{0})& \dots & {b}_{k}(p+q{x}_{0})\\ {b}_{0}(p+q{x}_{1})& {b}_{1}(p+q{x}_{1})& \dots & {b}_{k}(p+q{x}_{1})\\ \vdots & \vdots & & \vdots \\ {b}_{0}(p+q{x}_{k})& {b}_{1}(p+q{x}_{k})& \dots & {b}_{k}(p+q{x}_{k})\end{array}\right]$$ 
We get our original matrix back by setting ${x}_{i}=i$.
Now for each $j$, ${b}_{j}(p+qx)$ is still a polynomial of degree $j$ which takes integers to integers. Therefore it can be written as a sum of ${b}_{0}(x)$ up to ${b}_{j}(x)$. Let us write:
$${b}_{j}(p+qx)={a}_{0j}{b}_{0}(x)+{a}_{1j}{b}_{1}(x)+\dots {a}_{jj}{b}_{j}(x)$$ 

and note that this holds for any $x$.
There is a way of putting these numbers into matrices which results in the expression:
$$\left[\begin{array}{cccc}{b}_{0}(p+q{x}_{0})& {b}_{1}(p+q{x}_{0})& \dots & {b}_{k}(p+q{x}_{0})\\ {b}_{0}(p+q{x}_{1})& {b}_{1}(p+q{x}_{1})& \dots & {b}_{k}(p+q{x}_{1})\\ \vdots & \vdots & & \vdots \\ {b}_{0}(p+q{x}_{k})& {b}_{1}(p+q{x}_{k})& \dots & {b}_{k}(p+q{x}_{k})\end{array}\right]=\left[\begin{array}{cccc}{b}_{0}({x}_{0})& {b}_{1}({x}_{0})& \dots & {b}_{k}({x}_{0})\\ {b}_{0}({x}_{1})& {b}_{1}({x}_{1})& \dots & {b}_{k}({x}_{1})\\ \vdots & \vdots & & \vdots \\ {b}_{0}({x}_{k})& {b}_{1}({x}_{k})& \dots & {b}_{k}({x}_{k})\end{array}\right]\left[\begin{array}{cccc}{a}_{00}& {a}_{01}& \dots & {a}_{k0}\\ 0& {a}_{11}& \dots & {a}_{k1}\\ \vdots & \vdots & & \vdots \\ 0& 0& \dots & {a}_{kk}\end{array}\right]$$ 
(If you know about matrix multiplication you can verify that this is correct.)
Two crucial observations are now needed. The first is to determine the numbers ${a}_{jj}$. The simplest way to do this is to compare the coefficients of the highest power of $x$. On the righthand side of Equation 2 only the term ${a}_{jj}{b}_{j}(x)$ contributes because all of the other terms have degree strictly less than $j$. On the lefthand side, the coefficient turns out to be ${q}^{j}$ times the coefficient of ${x}^{j}$ in ${b}_{j}(x)$. Hence ${a}_{jj}={q}^{j}$.
The second observation is that when setting ${x}_{i}=i$, the terms ${b}_{j}({x}_{i})$ all vanish for $j>i$ and are equal to $1$ for $j=i$. We therefore have:
$$\left[\begin{array}{cccc}{b}_{0}(p)& {b}_{1}(p)& \dots & {b}_{k}(p)\\ {b}_{0}(p+q)& {b}_{1}(p+q)& \dots & {b}_{k}(p+q)\\ \vdots & \vdots & & \vdots \\ {b}_{0}(p+qk)& {b}_{1}(p+qk)& \dots & {b}_{k}(p+qk)\end{array}\right]=\left[\begin{array}{cccc}1& 0& \dots & 0\\ {b}_{0}({x}_{1})& 1& \dots & 0\\ \vdots & \vdots & & \vdots \\ {b}_{0}({x}_{k})& {b}_{1}({x}_{k})& \dots & 1\end{array}\right]\left[\begin{array}{cccc}{q}^{0}& {a}_{01}& \dots & {a}_{k0}\\ 0& {q}^{1}& \dots & {a}_{k1}\\ \vdots & \vdots & & \vdots \\ 0& 0& \dots & {q}^{k}\end{array}\right]$$ 
The last piece of the jigsaw relies on a couple of nice properties about determinants. I said earlier that a determinant was like a summary of a matrix. It can be quite complicated to calculate, but there are certain properties that help. In particular, when we have an equation as above then we can work out the determinant of the lefthand side by working out the determinants of the two on the righthand side and multiplying the resultant numbers. The other useful property is that when we have a lot of zeros, as in both of the matrices on the righthand side, the determinant is just the product of the terms on the diagonal from top left to bottom right.
So our determinant calculates as:
$${1}^{k+1}\cdot {q}^{0}\cdot {q}^{1}\cdot {q}^{2}\cdots {q}^{k}={q}^{\frac{1}{2}k(k+1)}$$ 
The full theorem is even more general than this, and can be found in the original paper.
7 Conclusion
Alongside explaining the idea of the actual result, what I wanted to show in this article was that theorems almost never come out of "thin air".
In this case, the result started out as something that I needed for a practical reason. Even though it was as part of my job, it was the teaching aspect rather than the pure research so I wasn't bothered about proving it I just needed something that worked.
Once someone else got interested, it is as if it switched from the teaching aspect to the research aspect. I still needed the right tools to attack the problem, though, and so it was partly the happy coincidence of that someone else getting interested at the same time as me learning of the tools I needed to prove it.
So theorems don't come out of "thin air", they arise far more naturally. Sometimes, as in this case, out of spotting a pattern and wanting to understand whether it is a genuine pattern or not.
Lastly, although the result itself is not particularly groundbreaking it was a very useful proof to work through because it taught me a lot about Newton's polynomials that I might not have learnt so easily in my research context. By taking them out of the rather complicated environment of my research, I was able to focus on learning about them.
8 Epilogue
Although I wrote this up, I never actually got round to publishing it – I was more concerned with getting my "serious research" published and the effort required to figure out a suitable journal for this, or even finding out if it was original, seemed too much. So it lingered on my website and CV, until I left academia and my academic website disappeared. Many years later, Alexander Bogomolny of Cut the Knot tweeted a result that looked like a special case of what I'd proven so I mentioned it to him. Someone dug up the paper from the wayback machine and linked it on that site. At that point, I thought about writing up the story of that paper as it felt like one where I could remember all the salient details. It's taken me a little while to get it up on this website, though!