Document Revision System

loopspace

2007-12-20

Creative Commons License

Contents

  1. Introduction

  2. Stage One: Getting Started

  3. Stage Two: Line Wrapping

  4. Stage Three: Deciding on a Strategy

  5. Useful commands

1 Introduction

I've decided to start using a document revision system for my papers. My current ad-hoc method was getting seriously strained: exchanging copies with collaborators, merging their comments with my revisions, preparing different versions for different places (archives, journals, home pages). It was all getting too confusing.

So I had a look around on the web and found a paper (also available as a wikibook) promoting the use of Subversion. Subversion is actually a software revision system, but the general principle is the same. A typical LaTeX paper consists of several text files, which is exactly what a typical piece of software consists of, and so exactly what software revision systems are designed to track.

There are several different systems around and it's not clear which is the "right" one. Subversion certainly seems better than CVS, but there's also GNU Arch and its variants. I had a quick play with Subversion, but am going to give bazaar a serious go.

2 Stage One: Getting Started

My first attempt at using bazaar involves an already-existing document. I want to store it as a bazaar archive without destroying any information that I currently have. I have several versions of a paper. Schematically, they can be represented as follows.

Main Branch   Journal 1   Journal 2   Archive
Original
Updated
              Submitted               Archived
              Report
Updated                   Submitted   Archived
                          Report

So I have a main branch together with some other branches. Real updates will go in the main branch, and the other branches contain variants suitable for other purposes. I will also want to take snapshots when I do something significant, like submitting the paper to a journal.

So we start by creating a new repository. Repositories in bazaar are for projects with shared histories, so we create a new one for each paper.



bzr init-repo ~/tex/archive/mypaper
cd !:2

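(You did know that I use zsh, didn't you? The !:2 is history expansion for the second argument of the previous command, here the directory just created. Interactive bash understands it too.)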



bzr init main
cd !:2
cp ~/original/path/to/paper/mypaperv1.tex paper.tex
bzr add
bzr commit -m "Title of Paper"

So far, so good. The next step is to commit the next version to the main branch.



cp ~/original/path/to/paper/mypaperv2.tex paper.tex
bzr commit -m "Second version: ready for distribution"

Okay, now we want to branch from this version, since the public versions are minor modifications of it. So we go back up to the top directory and create a new branch (two, in fact).



cd ~/tex/archive/mypaper
bzr branch main journal1
bzr branch main archive

Now we copy the adapted versions into their branches.



cd journal1
cp ~/original/path/to/paper/mypaper_journal1.tex paper.tex
bzr commit -m "Version adapted for journal1"
cd ../archive
cp ~/original/path/to/paper/mypaper_archive.tex paper.tex
cp ~/original/path/to/paper/archive_meta.txt .
bzr add
bzr commit -m "Version adapted for archive"

As you may have noticed, the archive version has an extra file. In this case it holds some metadata that the archive requires.

We continue this process until all of the files in the original directory have been added to the bazaar archive, in the correct order and with the correct relationships between them.
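When there are lots of old versions, a small loop saves some typing. Here is a minimal sketch in sh. It assumes paper.tex is already under version control in the branch (as it is after the first commit above) and that the old versions are passed oldest first. The helper name import_versions and its file-then-message argument convention are hypothetical, not part of bazaar.

```shell
# import_versions: hypothetical helper, run inside a branch directory
# (e.g. main) where paper.tex is already under version control.
# Arguments come in pairs: a file holding one old version of the paper,
# then the commit message for it. Pairs should be given oldest first,
# so that the branch history follows the order of the original files.
import_versions() {
  while [ "$#" -ge 2 ]; do
    cp "$1" paper.tex || return 1   # overwrite with this version
    bzr commit -m "$2" || return 1  # record it as the next revision
    shift 2
  done
  # Anything left over is a file given without a commit message.
  [ "$#" -eq 0 ]
}
```

For example, running import_versions v3.tex "Third version" v4.tex "Fourth version" inside main would record two more revisions (those file names are made up). bzr commit refuses to record a revision when nothing has changed. As a result, an accidental duplicate version stops the loop rather than being silently skipped.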

When we get a version that we want to tag, we do so:



cd ~/tex/archive/mypaper/journal1
bzr tag FirstSubmission
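A tag pays off later. For instance, when a referee report arrives, I want to see everything that has changed since submission. The following is a sketch that relies on bazaar's tag: revision specifier and the bzr tags command. The name changes_since is a hypothetical helper.

```shell
# changes_since: hypothetical helper, run inside a branch or checkout.
# Lists the existing tags, then shows what has changed between the
# revision carrying the given tag and the current working tree.
changes_since() {
  if [ -z "$1" ]; then
    echo "usage: changes_since TAG" >&2
    return 2
  fi
  bzr tags                  # each tag name with its revision number
  bzr diff -r "tag:$1"      # tagged revision -> working tree
}
```

Run it as changes_since FirstSubmission inside journal1. Note that bzr diff, like diff itself, exits with status 1 when it finds differences, so a script should not treat that status as an error.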

3 Stage Two: Line Wrapping

Line wrapping is an issue that comes up with document revision, and it is mentioned in the article referenced in the introduction. Systems such as bazaar are designed primarily to work with source code. This is (usually) in the form of text files, which are somewhat constrained in their format (some more than others). To determine whether two files differ, the usual method is to compare them line by line, with each line compared as a whole. No finer distinction is made than whether two lines are the same or not. This is generally fine for source code.

For a document, however, this causes problems with line wrapping. Consider a long paragraph that begins "And so to bed." and imagine that you decide to change this to "Consequently, one retired to one's bedroom and laid oneself upon one's usual resting place." With a text editor that does hard line wrapping, this change will probably knock several words onto the next line, and the next, and probably for several more lines. All of these lines will show up as changed when doing a comparison, even though only one sentence was edited. This is clearly not what is desired.
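The effect is easy to reproduce with standard tools. In this sh sketch, fold -s stands in for an editor's hard wrapping at an arbitrary 40 columns, and the rest of the paragraph is made-up filler. diff then reports which lines changed after only the first sentence is rewritten.

```shell
#!/bin/sh
# Simulate hard line wrapping with fold and count how many lines diff
# reports after rewriting only the first sentence of a paragraph.
tmp=$(mktemp -d)
rest="The candle had burned down to a stub and the house was quiet. Tomorrow there would be letters to write and bills to pay, but those could wait until morning."
old="And so to bed."
new="Consequently, one retired to one's bedroom and laid oneself upon one's usual resting place."
# fold -s breaks each paragraph at spaces so no line exceeds 40 columns.
printf '%s %s\n' "$old" "$rest" | fold -s -w 40 > "$tmp/old.tex"
printf '%s %s\n' "$new" "$rest" | fold -s -w 40 > "$tmp/new.tex"
# diff prefixes removed lines with "<" and added lines with ">".
changed=$(diff "$tmp/old.tex" "$tmp/new.tex" | grep -c '^[<>]')
echo "lines reported as changed: $changed"
rm -r "$tmp"
```

The wrapped lines after the edit typically all show up as changed, not just the first one.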

The obvious solution is soft line wrapping. The distinction is that hard line wraps are written to the file, whereas soft line wraps appear only in the editor and are removed when the file is written. Unfortunately, this makes the problem worse! If there are no line breaks in the file, then each paragraph (or, in the extreme, the whole document) is one long line. That entire line is displayed as changed if anything in it changes at all.

So the real solution is to take note of what line breaks are for. Within a LaTeX document, a single line break is usually just whitespace, while a blank line (two consecutive line breaks) ends a paragraph. So, apart from doubled ones, line breaks are irrelevant to LaTeX and can be used for something else. Within an editor, we can use soft line wrapping to make the text easy to read and ignore hard line breaks (except for doubled ones).

So who does use line breaks? From the above, it is the version control software. Essentially, line breaks determine the context of a change. If I change "And so to bed." to "Consequently to bed.", what should I see when I later want to know what I changed? The line breaks tell the version control software how much of the surrounding text should be displayed or recorded along with the change.
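Keeping one sentence per line makes the line breaks mark units of context. Here is the "And so to bed." edit as a small sh sketch, with made-up filler for the rest of the paragraph and each sentence on its own line. This time diff reports only the sentence that was actually rewritten.

```shell
#!/bin/sh
# The "And so to bed." edit, with the source kept at one sentence per line.
tmp=$(mktemp -d)
rest="The candle had burned down to a stub and the house was quiet.
Tomorrow there would be letters to write and bills to pay, but those could wait until morning."
printf '%s\n%s\n' "And so to bed." "$rest" > "$tmp/old.tex"
printf '%s\n%s\n' "Consequently, one retired to one's bedroom and laid oneself upon one's usual resting place." "$rest" > "$tmp/new.tex"
# diff prefixes removed lines with "<" and added lines with ">".
changed=$(diff "$tmp/old.tex" "$tmp/new.tex" | grep -c '^[<>]')
echo "lines reported as changed: $changed"   # one line out, one line in
rm -r "$tmp"
```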

So when writing a LaTeX file, we should insert line breaks to separate out units of context. Incidentally, this is consistent with TeX's use of double line breaks to end a paragraph, since a new paragraph should certainly start a new context. A reasonable set of rules might be:

Note that this advice is contained in the paper where I originally got this idea. As mentioned there, there are other advantages to using such a system.

The exact set of rules is not important. What is important is to choose a set of rules and stick to it. Changing rules mid-project is a Bad Idea.

This produces a problem, though. What if I have a file that is badly formatted? Perhaps I'm doing the initial import in the manner laid out above and I didn't pay attention to line wrapping when I originally wrote it? Or perhaps I made a quick edit in an editor that doesn't understand the difference between hard and soft line wraps and I want to fix it before committing it.

The answer? A Perl script, of course, which you can download here: fmtlatex.
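fmtlatex itself is not reproduced here. As a much cruder illustration of the idea, this sh/awk sketch applies a single assumed rule: a line break after every sentence. The name one_sentence_per_line is hypothetical. The sketch rejoins each paragraph and then breaks after ., ? or ! followed by a space. Blank lines, lines containing %, and lines that start with a backslash are copied unchanged, so comments and commands are not mangled.

```shell
# one_sentence_per_line: hypothetical helper, NOT the fmtlatex script.
# Reads LaTeX on stdin and writes it to stdout with each paragraph
# rejoined and then split so that every sentence starts a new line.
one_sentence_per_line() {
  awk '
    # Print the collected paragraph, one sentence per line.
    function emit_para() {
      if (para != "") {
        gsub(/[.?!] +/, "&\n", para)   # break after . ? or ! and spaces
        gsub(/ +\n/, "\n", para)       # drop spaces left before a break
        print para
        para = ""
      }
    }
    # Blank lines, lines with a % and lines starting with a backslash
    # end the current paragraph and are copied through unchanged.
    /^[ \t]*$/ || /%/ || /^[ \t]*\\/ { emit_para(); print; next }
    # Any other line is trimmed and appended to the current paragraph.
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      sub(/[ \t]+$/, "", line)
      para = (para == "") ? line : (para " " line)
    }
    END { emit_para() }
  '
}
```

Use it as a filter, e.g. one_sentence_per_line < old.tex > new.tex (hypothetical file names). Don't redirect onto the input file, because the shell truncates it before awk reads it. The sketch will also break after abbreviations such as "e.g." and knows nothing about verbatim environments, so look over the diff before committing.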

If, like me, you use Emacs as your editor, then you should use longlines mode for editing TeX documents. It now seems to be a standard package in the Emacs Lisp collection, so all you need is



;; longlines is obsolete since Emacs 24.4; there, use visual-line-mode instead.
(add-hook 'tex-mode-hook (lambda () (longlines-mode 1)))

in your .emacs file.

4 Stage Three: Deciding on a Strategy

This probably ought to be Stage One, but I tend to find it easier to play with a system a little before deciding exactly how I'm going to use it. I've decided to base my system on the "Team collaboration, central style" workflow described in the Bazaar documentation. My reason for this is that in general I am the only person directly editing files, even in collaborative work, but I sometimes work on different computers.

So I store the repositories in a "central" location that I have read and write access to from anywhere on the internet. One advantage of bazaar here is that this central location doesn't have to have bazaar installed. At work, I have direct access to this location (via NFS) whilst elsewhere I get access via ssh. So my workflow is now as follows.

  1. Start a new project

    
    
    bzr init-repo --no-trees ~/.repository/papers/newpaper
    bzr init ~/.repository/papers/newpaper/main
    

    The central location is actually my home directory which is mounted on my office machine via NFS so I hide the repositories away to avoid the temptation of working directly in the repository and not on a checkout. This wouldn't cause any problems if it weren't for the fact that I find it easier to know what's going on if the different components are distinct. Maybe when I'm more familiar with the system I'll be less confusable.

  2. Check out the project and work on it

    
    
    bzr checkout ~/.repository/papers/newpaper/main ~/current/papers/newpaper/main
    

    The ~/current directory is where I've chosen to keep the working trees on my machine.

  3. Commit the changes

    
    
    bzr status
    bzr add <anything needing adding>
    bzr commit -m 'Fixed all errors'
    

Then repeat steps 2 and 3 until the paper (or whatever, I'm using this for lots of things now) is finished.
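Repeating step 2 on a machine that already has a checkout means bringing it up to date rather than checking out again. Here is a minimal sketch of that, assuming the ~/.repository and ~/current layout above. The name start_session is hypothetical, and the sftp:// URL in the example afterwards uses a placeholder host.

```shell
# start_session: hypothetical helper for step 2 on any machine.
# $1 is the paper name. $2 (optional) is where the central repositories
# live: by default the NFS-mounted path used at work; from elsewhere an
# sftp:// URL to the same directory can be given instead.
start_session() {
  paper=$1
  central=${2:-"$HOME/.repository/papers"}
  work="$HOME/current/papers/$paper/main"
  if [ -d "$work/.bzr" ]; then
    # A checkout already exists here: bring in commits made elsewhere.
    (cd "$work" && bzr update)
  else
    # First session on this machine: create the checkout.
    mkdir -p "$(dirname "$work")" &&
      bzr checkout "$central/$paper/main" "$work"
  fi
}
```

At work, start_session newpaper uses the NFS path. From elsewhere, something like start_session newpaper sftp://me@work.example.org/~/.repository/papers goes over ssh. This assumes bazaar's sftp transport (which needs the paramiko library locally) treats /~/ as the remote home directory. Updating first also matters, because a commit from a checkout that is behind its central branch is refused until bzr update has been run.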

5 Useful commands

Here's a list of bazaar commands that I use (or think I will use) a lot.



bzr init-repo --no-trees ~/.repository/projectbase
bzr init ~/.repository/projectbase/branch
bzr checkout ~/.repository/projectbase/branch mewhere/to/work/on/it
bzr status
bzr add <files>
bzr diff [-r revid]
bzr commit -m 'Message'
bzr branch ~/.repository/projectbase/origbranch epository/projectbase/newbranch
bzr tag name_of_tag

Tags and commits on a checkout get sent straight to the central repository.
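The first three commands above always come together when starting something new, so they can be wrapped up. This is a sketch assuming the ~/.repository layout used throughout; new_project and its argument order are hypothetical.

```shell
# new_project: hypothetical wrapper around the first three commands
# above. $1 is the project path under ~/.repository (for example
# papers/newpaper) and $2 is the directory to work in.
new_project() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: new_project PROJECT WORKDIR" >&2
    return 2
  fi
  repo="$HOME/.repository/$1"
  # Repository without working trees, a main branch inside it, and a
  # checkout of that branch where the actual work happens.
  bzr init-repo --no-trees "$repo" &&
    bzr init "$repo/main" &&
    bzr checkout "$repo/main" "$2"
}
```

For example, new_project papers/newpaper ~/current/papers/newpaper/main does steps 1 and 2 of the workflow in one go.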