The digital imaging of the Austen manuscripts has been done to the highest possible standard with the most advanced equipment available. Images for Volume the First were provided by the Bodleian Library, and for Volume the Second, Volume the Third, Persuasion, and Opinions of Mansfield Park and Emma by the British Library. The equipment and photographers used to supply images for the manuscripts held at the Morgan Library & Museum and at King's College, Cambridge, and for the manuscript on deposit at Queen Mary, University of London, were loaned to us by the Digital Image Archive of Medieval Music and were also used by the Israel Antiquities Authority during 2008 for the digitization of some of the Dead Sea Scrolls. Many of the holding libraries could not image at the high standards we established, which is why we used our own photographer and equipment.

Scanning of the Austen materials was done in 24-bit colour. In the early stages of the project we used a Phase One PowerPhase scanning back, which captures images at 144 million pixels and yields images of up to 350 MB. With this equipment, each manuscript image took up to two minutes to capture. More recently, Phase One have developed a single-shot camera which captures up to 40 million pixels, yielding images of approximately 44 MB. This became our camera of choice: it is more portable and faster (single shots were captured instantly) and sacrifices nothing in terms of quality, given that most of our materials are relatively small in size.

Transcription and metadata

Full diplomatic transcriptions of all texts were produced and marked up using an XML schema developed at the Centre for Computing in the Humanities, King’s College London.1 Austen’s handwriting and punctuation are agreed to be of great importance in the understanding of her work but have hitherto been little studied. The markup scheme records orthographic variants and punctuation symbols in minute detail for subsequent computational analysis.
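To give a sense of the kind of computational analysis such markup makes possible, here is a minimal sketch in Python. It is an illustration only: the element names (del, add, choice, orig, reg) are standard TEI, but the project's own schema is not reproduced here, and the sample sentence and the function name reading_text are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample, not a transcription of any Austen manuscript.
# A deleted spelling is followed by its replacement, and an original
# spelling is paired with a regularized form.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">It was a '
    "<del>verry</del><add>very</add> "
    "<choice><orig>chearful</orig><reg>cheerful</reg></choice>"
    " morning; and, indeed, a fine one.</p>"
)


def reading_text(elem, skip=("del", "reg")):
    """Return the text of elem, omitting the content of skipped elements.

    With the default skip list this keeps additions and original
    spellings, and drops deletions and editorial regularizations.
    """
    parts = [elem.text or ""]
    for child in elem:
        name = child.tag.replace(TEI, "")
        if name not in skip:
            parts.append(reading_text(child, skip))
        # Text following a child belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
text = reading_text(root)

# Tally punctuation marks in the text as finally written.
punctuation = Counter(ch for ch in text if ch in ",;:.!?")

# Count deletions and non-standard spellings recorded in the markup.
deletions = len(root.findall(f".//{TEI}del"))
variants = [o.text for o in root.iter(f"{TEI}orig")]

print(text)
print(dict(punctuation), deletions, variants)
```

Passing a different skip list, for example ("del", "orig"), would yield a regularized reading text from the same transcription, which is one reason to record both forms in the markup.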

Complex structural metadata for each work has been added using the METS standard within the TEI Header. Austen prepared many of her writing surfaces with special care, regularly assembling small booklets by cutting and folding large sheets of paper in a particular manner. Structural metadata allows for the online reconstruction and deconstruction of these material surfaces, which instantiate in miniature, booklet by homemade booklet, Austen’s sense, as she wrote, of the emerging novel.
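The idea of describing a booklet's physical structure can be sketched as follows. This is a simplified illustration, not the project's actual metadata: structMap, div and fptr are real METS elements, but the TYPE values ("booklet", "leaf", "page"), the file identifiers and the helper names are assumptions made for the example, and the sketch does not show how the METS is embedded in the TEI Header.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(name):
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_booklet(label, n_leaves):
    """Build a physical structMap: one booklet, its leaves, and two pages per leaf."""
    struct_map = ET.Element(q("structMap"), {"TYPE": "physical"})
    booklet = ET.SubElement(struct_map, q("div"), {"TYPE": "booklet", "LABEL": label})
    for leaf_no in range(1, n_leaves + 1):
        leaf = ET.SubElement(booklet, q("div"), {"TYPE": "leaf", "ORDER": str(leaf_no)})
        for side in ("r", "v"):  # recto, then verso
            page = ET.SubElement(
                leaf, q("div"), {"TYPE": "page", "LABEL": f"{leaf_no}{side}"}
            )
            # Hypothetical file identifier pointing at the page image.
            ET.SubElement(page, q("fptr"), {"FILEID": f"img-{leaf_no}{side}"})
    return struct_map


def page_sequence(struct_map):
    """Return page labels in document order, i.e. the reading order of the booklet."""
    return [d.get("LABEL") for d in struct_map.iter(q("div")) if d.get("TYPE") == "page"]


booklet = build_booklet("Booklet 1", 4)
print(page_sequence(booklet))
print(ET.tostring(booklet, encoding="unicode")[:120])
```

Because each page is nested inside its leaf and each leaf inside its booklet, the same structure can be read in sequence to reassemble a booklet or taken apart leaf by leaf.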


The XML encoding is based on the standard as defined by the Text Encoding Initiative (TEI), but it goes far beyond what is proposed there. The project is therefore establishing the more advanced standards that will be adopted by the TEI for encoding the complexity of modern working manuscripts, in particular the temporal or genetic nature of these documents.