Contents
1 Introduction
On this page I want to gather some thoughts, techniques, and resources to help with reading and writing mathematical proofs. As with all mathematics, the best way to learn about proofs is by doing it yourself, but it can be useful to have some guidance.
2 The Nature of Proof
The exact nature of a mathematical proof can be hard to pin down. At one extreme lies the systems of formal proofs that encode proofs and theorems in an almost unrecognisable language "known only to a bunch of monks that live on a mountain"; at the other extreme are the "hand waving" proofs of which "proof by pictures" is a common example (not all "proof by pictures" are handwaving). Somewhere in the middle lie "proofs by mathematicians". The important thing to note from this is the word "somewhere". "Proofs by mathematicians" do not lie at a specific point in this range, but can vary quite considerably between the two extremes.
A different way of expressing this range may help make this clearer. The purpose of a proof is to convince someone that something is true. Exactly how that proof is worded will depend on who that someone is and how much they demand before being convinced. Sometimes, it's enough just to give some evidence for a result being true. At other times, it's important to make sure that every step is clear. The extremes correspond to someone who won't believe anything unless they logically have to and someone who is prepared to believe anything not obviously false.
Mathematicians primarily write proofs for other mathematicians. And since proofs are our business, we usually want to be fairly sure that they are true. We build our results on those of others and thus our results are at most as sure as the results we build on. In turn, we want our results to be built upon by others and want to be sure that we are giving them a good foundation to build upon.
But even within this there is room for manoeuvre, mainly because we present our results in different contexts. In a paper we try to be as rigorous and careful as we can, and try to write so that the reader is convinced that we could have written a formal proof if we'd been so inclined. In a seminar, we are more lax and tend to be nearer the "just convince them" end of the spectrum.
When reading proofs it is therefore important to know who the proof is intended for. The other thing to realise when reading a proof is that when a mathematician is not sure of exactly who will be reading then they will default to the more rigorous end of the spectrum.
The issue for writing proofs is subtly different. Most mathematical proofs are written by mathematicians for other mathematicians1, and thus there is a symmetry between the author and the audience. But proofs written by students are written to be read by their lecturer, or the grader on an exam. The reader therefore already knows that the result is true and so the task of the proof is not to convince the reader of the truth of the result, but to convince the reader that the author knows that the result is true.
1This is also true of textbooks. If you think about who chooses a textbook for a course, you'll see why.
Thus when writing a proof, the essential aim is to convince the reader that you could put in all the details if asked. Note that you are not required to actually put in all these details, just sufficient to show that you could do so.
The other aspect of writing proofs that is a little different between students and mathematicians is in putting the details of how the proof was devised. The purpose, for a mathematician, of a proof is to convince another mathematician that a result is true. Exactly how the proof was thought of is largely irrelevant to that and so is often left out2. On the other hand, for students writing proofs the method is often more important than the actual proof since the purpose of the proof is to demonstrate the students' capability in doing mathematics – which involves devising proofs as well as understanding them.
2The method used may be thought to be of interest in which case it is included, but not usually as part of the proof itself.
In summary, a student studying mathematics could encounter the following types of proof. They can further vary depending on whether or not they are spoken or written, and whether the person devising the proof knows exactly who they are talking to or writing for.

By a mathematician for a mathematician.

By a mathematician for a student.

By a student for a mathematician.

By a student for a student.
3 Examples of Proofs
Let us consider a simple statement to illustrate the different types of proof, perhaps with a little exaggeration for effect.
The composition of linear transformations is again linear.

By a mathematician for a mathematician. (This is perhaps the most artificial example since a mathematician generally wouldn't consider this needed proving.)
ProofLet $T:U\to V$ and $S:V\to W$ be linear transformations. Then
$$ST(u+\lambda v)=S(T(u)+\lambda T(v))=ST(u)+\lambda ST(v)$$ hence $ST$ is linear.
Characteristics of this proof:

Definitions are not expanded out: for example, what "linear" means.

"Obvious" (i.e. to the author) assumptions are not made clear: for example, that $U$, $V$, and $W$ are vector spaces, that $u,v$ are vectors in $U$, and that $\lambda $ is a scalar (indeed, the coefficient field hasn't even been mentioned).

Only the main step is presented: it is assumed that the reader will know how to fill in the "before" and "after" pieces to complete the proof.


By a mathematician for a student. When written, this type of proof has to be both rigorous and clear. If done badly, it can easily fail on both counts. At best, it tends to be very "wordy".
ProofWe want to prove that the composition of two linear transformations is again a linear transformation. So we start with two linear transformations. As we wish to consider their composition (as functions) we must ensure that the domain of one is the codomain of the other. Thus let $T:U\to V$ and $S:V\to W$ be linear transformations where $U$, $V$, and $W$ are vector spaces over $\mathbb{R}$.
As the codomain of $T$ is the domain of $V$, the composition $S\circ T$ is welldefined as a function from $U$ to $W$. To prove that it is linear, we need to show that if $u,v\in U$ are two arbitrary vectors and $\lambda \in \mathbb{R}$ is an arbitrary scalar, then $S\circ T(u+\lambda v)=S\circ T(u)+\lambda S\circ T(v)$.
So let $u,v\in U$ be two arbitrary, but fixed, vectors and $\lambda \in \mathbb{R}$ a scalar3. We consider the lefthand side of what we want to prove: $S\circ T(u+\lambda v)$. By definition of composition, we have
3Note that the previous sentence did not make a choice of $u,v,\lambda $ but said "if …", we therefore need to choose them here. Although in this case it is fairly harmless, some proofs require more care to show that what is being chosen really does exist so it is good to get into good habits now.
$$S\circ T(u+\lambda v)=S(T(u+\lambda v))$$ As $T$ is linear, $T(u+\lambda v)=T(u)+\lambda T(v)$ and so
$$S\circ T(u+\lambda v)=S(T(u)+\lambda T(v))$$ Now $S$ is also linear, which means that for $x,y\in V$ and $\mu \in \mathbb{R}$, $S(x+\mu y)=S(x)+\mu S(y)$. We apply this taking $x=T(u)$, $y=T(v)$, and $\mu =\lambda $ to see that
$$S\circ T(u+\lambda v)=S(T(u))+\lambda S(T(v))$$ Finally, we recall that by definition $S\circ T(z)=S(T(z))$ and so
$$S\circ T(u+\lambda v)=S\circ T(u)+\lambda S\circ T(v)$$ Characteristics of this proof:

Meticulous explanation of just about everything; for example, including the definition of the composition of a function.

Sometimes includes extra explanations intended to indicate how the proof might be constructed (no particular examples of this in this case).

Extreme care taken with notation and application of definitions; for example, most people are happy with $ST$ for the composition rather than $S\circ T$.


By a student for a mathematician.
ProofTo show that the composition of two linear transformations is again linear, it is sufficient to show that this is true for two arbitrary (composable) linear transformations. So let $T:U\to V$ and $S:V\to W$ be linear transformations between vector spaces. We wish to show that the composition $ST$ is again linear. It is enough to show that
$$ST(u+\lambda v)=ST(u)+\lambda ST(v)$$ for any $u,v\in U$ and $\lambda \in \mathbb{R}$.
We can show this by applying first the linearity of $T$ and then the linearity of $S$ to the expression on the lefthand side. That is,
$$\begin{array}{rlrl}ST(u+\lambda v)& =S(T(u)+\lambda T(v))& & \phantom{\rule{thickmathspace}{0ex}}\\ & =ST(u)+\lambda ST(v)& & \phantom{\rule{thickmathspace}{0ex}}\end{array}$$ As this holds for all $u,v\in U$, and $\lambda \in \mathbb{R}$, $ST$ is thus linear.
Characteristics of this proof:

Trivial details are omitted, but replaced with "signposts" to show that the omissions are deliberate; for example the words "between vector spaces" in the first paragraph and the word "composable" in parentheses.

The important steps are done carefully.

The flow of the proof is well signposted.


By a student for a student.
ProofWell, it works for matrices, doesn't it?
The last example was, perhaps, a little facetious. The important point to realise is that before reading a proof, you should work out who is writing it and who they are trying to convince.
4 Finding Proofs
Reading and writing a proof are different skills. They can be compared to differentiation and integration. Differentiation is often fairly algorithmic and once the basic rules are learnt, it is fairly easy to differentiate many seemingly complicated functions. On the other hand, integration is an art. Even with a long list of examples, finding the integral of a complicated function (even proving that it has a closed form) can be sheer guesswork. So it can seem with proofs.
But just as with integration, there are standard techniques and examples that can guide you through the seemingly tortuous process of finding a proof.
4.1 Knowing Where to Begin
Often the most difficult part of finding a proof is knowing where to begin. Theorems are often stated quite concisely and so the first step is to expand the statement to recognisable terms. As this is "rough working", the term "recognisable" should be interpreted personally. At this stage, it is also important to separate out the pieces of the theorem into hypothesis and conclusion.
For an experienced mathematician, the theorem used in the previous section needs no expansion: it is obvious what the terms mean. However, for a student encountering linearity for the first time, it may be a little too concise.
This process of expanding out theorems generally involves the following types of expansion

Expanding out a definition.

Replacing generic statements by statements about generic objects.

Including implicit information.
Of course, this expansion can result in quite a length rephrasing in which case part of the process is finding a good way to write that rephrasing so that its structure is clear. Remember that these rewritings are purely for your benefit so a good method is a useful one.
Let us illustrate those with Theorem 1 from the previous section.

Original statement:
The composition of linear transformations is again linear.

Replace generic statements:
If $S$ and $T$ are two composable linear transformations then their composition, $ST$, is again linear.
It is important to be precise here. The word "composable" could have been left out, as the statement only makes sense if $S$ and $T$ are composable, but until you are completely familiar with this kind of process, it is better to be overly precise than otherwise. In this case, leaving in the word "composable" reminds us that there is a restriction on the domains and codomains which will be useful later. (However, one has to draw the line somewhere: even the word "composable" is not quite enough since it leaves open the question as to whether it is $ST$ or $TS$!)

Include implicit information:
If $S:V\to W$ and $T:U\to V$ are linear transformations then $ST:U\to W$ is again linear.
Here's where remembering that $S$ and $T$ are composable in the previous step helps keep things clear. As $S$ and $T$ are composable, we only need $3$ vector spaces. Then, since we explicitly have the vector spaces the fact that $S$ and $T$ are composable is plain, though some may prefer to keep that fact in the statement. Also, some may like to have the fact that $U$, $V$, and $W$ are vector spaces explicitly stated.

Expand out definitions:
If $S:V\to W$ and $T:U\to V$ are such that $S({v}_{1}+\lambda {v}_{2})=S({v}_{1})+\lambda S({v}_{2})$ and $T({u}_{1})+\mu T({u}_{2})$ for all ${v}_{1},{v}_{2}\in V$, ${u}_{1},{u}_{2}\in U$, and $\lambda ,\mu \in \mathbb{R}$, then $ST({x}_{1}+\eta {x}_{2})=ST({x}_{1})+\eta ST({x}_{2})$ for all ${x}_{1},{x}_{2}\in U$ and $\eta \in \mathbb{R}$.
Note that I have been careful not to repeat myself with the newly introduced symbols. It would be technically alright to reuse ${u}_{1}$ and ${u}_{2}$ in place of ${x}_{1}$ and ${x}_{2}$ since these are local declarations (restricted by the phrases "for all …"). However, humans are not good at differentiating between local and global declarations so it is best not to reuse symbols unless the scope is very clear.

Replace generic statements:
If $S:V\to W$ and $T:U\to V$ are such that $S({v}_{1}+\lambda {v}_{2})=S({v}_{1})+\lambda S({v}_{2})$ and $T({u}_{1})+\mu T({u}_{2})$ for all ${v}_{1},{v}_{2}\in V$, ${u}_{1},{u}_{2}\in U$, and $\lambda ,\mu \in \mathbb{R}$, then whenever ${x}_{1},{x}_{2}\in U$ and $\eta \in \mathbb{R}$, $ST({x}_{1}+\eta {x}_{2})=ST({x}_{1})+\eta ST({x}_{2})$.
Up to now, the rephrasing has not taken into account the fact that there is a conclusion and a hypothesis. This rephrasing modifies a part of the conclusion to turn it from a generic statement "$P(p)$ is true for all $p\in Q$" to a statement about a generic object "whenever $p\in Q$ then $P(p)$ is true". We do not do this for the similar statements in the hypothesis. This is because these two pieces are treated differently in the proof.

Replace generic statements, and reorganise to bring choices to the fore:
Let $S:V\to W$ and $T:U\to V$ be such that $S({v}_{1}+\lambda {v}_{2})=S({v}_{1})+\lambda S({v}_{2})$ and $T({u}_{1})+\mu T({u}_{2})$ for all ${v}_{1},{v}_{2}\in V$, ${u}_{1},{u}_{2}\in U$, and $\lambda ,\mu \in \mathbb{R}$. Let ${x}_{1},{x}_{2}\in U$ and $\eta \in \mathbb{R}$. Then $ST({x}_{1}+\eta {x}_{2})=ST({x}_{1})+\eta ST({x}_{2})$.
In this form, the distinction between hypothesis and conclusion is all the clearer. Parts of the hypothesis use the word "Let", parts of the conclusion use the word "Then".
With this formulation, the proof essentially writes itself. With all its gory details:
Let $S:V\to W$ and $T:U\to V$ be such that $S({v}_{1}+\lambda {v}_{2})=S({v}_{1})+\lambda S({v}_{2})$ and $T({u}_{1})+\mu T({u}_{2})$ for all ${v}_{1},{v}_{2}\in V$, ${u}_{1},{u}_{2}\in U$, and $\lambda ,\mu \in \mathbb{R}$. Let ${x}_{1},{x}_{2}\in U$ and $\eta \in \mathbb{R}$.4 Then:
4A shorthand for this is to write "Let … be as in the statement of the theorem."
$$ST({x}_{1}+\eta {x}_{2})=S(T({x}_{1})+\eta T({x}_{2}))$$ 
using the hypothesis on $T$ as ${x}_{1},{x}_{2}\in U$ and $\eta \in \mathbb{R}$. So:
$$ST({x}_{1}+\eta {x}_{2})=ST({x}_{1})+\eta ST({x}_{2})$$ 
using the hypothesis on $S$ as $T({x}_{1}),T({x}_{2})\in V$ and $\eta \in \mathbb{R}$. Hence the conclusion is true.
Notes:

This could be condensed, but the important thing here is how to find it, not what the final form should be.

Notice that I wrote "as ${x}_{1},{x}_{2}\in U$" rather than "with ${u}_{1}={x}_{1}$ and ${u}_{2}={x}_{2}$". This is partly style, and partly because in the statement of linearity, ${u}_{1}$ and ${u}_{2}$ are placeholders into which we put ${x}_{1}$ and ${x}_{2}$. So saying ${u}_{1}={x}_{1}$ is semantically incorrect as it equates a virtual vector with an actual vector. This is a very minor point, though.
4.2 When There Is No Beginning
Some theorems just don't seem to have a beginning. A simple example are the "There exists …" or "There doesn't exist …" type results where one has to find something satisfying certain properties, or prove that such doesn't exist. The hypotheses tend to be about the properties of the object, but this doesn't always lend itself to the style of proof in the previous example.
In these circumstances, a good strategy is to set aside questions of existence for the time being and try to find other properties of the object that it would have to have. This process of finding properties is like a proof which has a beginning but no particular end. However, that can be simpler than a proof with an end but no particular beginning!
A classic example of this is the "no largest number" theorem. Again, this is a slightly artificial example as the proof is so wellknown, and also quite obvious, that it can be hard to imagine not knowing it, but let's pretend.
There is no largest natural number.
We leave aside existence for now and simply imagine that we have such a number, say $N$. Then $N$ is the largest number. So what does it look like? What properties does it have? We start with its defining property:
It is bigger than every natural number.
and apply our expansion techniques. This leads to:
For each $n\in \mathbb{N}$, $N\ge n$.
Now as we are focussing on particular properties of $N$, we can introduce a further replacement technique: that of replacing a strong statement by a weaker one, or of replacing a generic statement by a particular occurrence of it (we actually used this in the proof of linearity: the generic statement "$T({u}_{1}+\lambda {u}_{2})=T({u}_{1})+\lambda T({u}_{2})$" for all ${u}_{1},{u}_{2}\in U$ and $\lambda \in \mathbb{R}$" was replaced by the particular statement "$T({x}_{1}+\eta {x}_{2})=T({x}_{1})+\eta T({x}_{2})$").
$N\ge 1$
Which doesn't seem very helpful: we knew that already as $N\in \mathbb{N}$. Let's try a different one.
$N\ge 2$
Which does tell us something we didn't know already. It tells us that $N\ne 1$ since $N\ge 2>1$. Since we are playing dumb, let's try the next.
$N\ge 3$
Which, by the same argument, tells us that $N\ne 2$.
But $N$ is a natural number! So if we keep going, we should eventually reach $N$ and be able to conclude that $N\ne N$.
That, however, is not a proof. But whilst examples are not proofs, they often suggest a strategy. In this case, several strategies are possible. Here are two.

The direct approach to a negative result: By assumption, if $N$ exists then $N\ge N+1$, but then $N+1>N$ so we obtain the contradiction $N>N$.

The indirect approach to a positive result: If $N$ exists, it is an element of $\mathbb{N}$. Let us prove that no element in $\mathbb{N}$ is a maximum. So let $k\in \mathbb{N}$. Then $k+1\in \mathbb{N}$ and $k<k+1$. Hence $k$ is not the maximum of $\mathbb{N}$. As this holds for all $k\in \mathbb{N}$, $\mathbb{N}$ does not have a maximum.
4.3 General Techniques
In one representation of formal mathematics, a chain of logical implications provides a path from a source (namely, the hypothesis) to a destination (the conclusion)5. The purpose of the proof, then, is to demonstrate that such a path exists. The most usual method for doing this is to construct an explicit path, namely a chain of logical implications that starts with the hypothesis and ends with the conclusion.
5The similarity with functions is not coincidental.
If one doesn't see such a path, even after expanding out the statement as indicated above, there are several techniques that can be used to try to find such a path.

Just start. If one has expanded out the statement of the theorem to the point where there are no unfamiliar terms, then there will probably be something you can deduce from the hypothesis. Even if it seems useless, it is worth making a list of these simple deductions to see if they hint at a strategy.

Just Finish. Working from the end can often be a useful strategy.

Look for related ideas. A proof consists of a chain of simple steps. Some of those steps will be applications of other theorems and definitions. So write down a list of these that might be useful.

Look for an overall proof. It can be useful to break a proof down into smaller chunks, along the lines of "If I could prove that $X$ was $Y$, then I know that $Y$ is $Z$ so if I can just show that $Z$ is $W$ then I'm done".

Look for similar results. Do you know of a result that's a bit like this one? Maybe with slightly different hypotheses or conclusion, or applies to a different thing. The proof of that one may be adaptable to the current problem.

Modify the problem. Try changing the hypotheses a little, or the conclusion. Sometimes it can be useful to weaken the hypothesis to such a point that it is obvious that the conclusion cannot hold. The reason why it doesn't hold in that case can give an indication of how to prove that it does hold in the original case. Similarly, one can sometimes weaken the conclusion to such a point that it is obvious that it does hold. Again, the reason why it holds can give an indication of how to prove it in the original case.

Look for examples. Examples are almost always not proofs. But examples can give indications as to the strategy for a proof. Figuring out why something holds in a particular instance can be easier than the general case, but usually gives some insight as to how the general case should be approached. Obviously, the more examples, the more intuition one will get.
5 Further Reading

How to solve it by George Polya. ISBN 069111966X. This is extremely accessible and extremely useful. (Google, Amazon)

The noselfdefeating object argument on the What's New webblog by Terrence Tao.