Transcribe Bentham: A Collaborative Initiative

Help:Encoding

We ask volunteers to encode their transcripts in Text Encoding Initiative (TEI)-compliant XML; TEI is a de facto standard for encoding electronic texts. Encoding can be done relatively simply by clicking the buttons on your transcription toolbar.

For more information on the practicalities of encoding your transcripts, please have a look at the Transcription Guidelines.

We have included some background information below about the structure of encoding/markup, which should help you understand how it works.

Encoding / Markup

Encoding and Markup are terms which may be used interchangeably. They refer to tags that are included in the transcription in order to identify features of the text and manuscript in a manner that allows them to be processed by a computer.

We ask that users encode their transcripts in Text Encoding Initiative (TEI)-compliant XML; TEI is a de facto standard for encoding electronic texts. TEI markup uses tags to label features of Bentham's manuscripts, such as paragraphs, additions and marginal notes. Because these features are labelled explicitly, the transcripts can be preserved, understood and searched far into the future.

If you click the buttons on the transcription toolbar, markup will appear in the transcription box alongside the text that you have entered, for example:

whatever <add>just</add> remark may

It is very important that you do not delete or alter any of the markup that appears in the transcription box.

Tags

A tag is a string of characters surrounded by angle brackets, i.e. "<" and ">". Tags are used to identify part of the transcription, and usually come in pairs, known as "opening" and "closing" tags. A closing tag can be identified by a slash immediately after the "<".

If, for instance, a user wished to note that the word "utility" was deleted from a manuscript they were transcribing, it would be tagged thus: <del>utility</del>.

Users will not have to type tags into the Transcription Box: they will be automatically generated by highlighting the relevant part of the text, and clicking a button in the Transcription Toolbar.

Elements

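The element name is the core part of the tag, and occurs immediately after the "<" (or after the "</" in a closing tag). In <del>utility</del>, "del" is the element name; the element as a whole consists of the opening tag, the closing tag and everything between them.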

Attributes

Tags may also contain attributes, which describe the element in more detail. The attribute appears after the element in the opening tag (it never appears in the closing tag), and is followed by an attribute value (see below).

For example, to note the manner in which the word "utility" was deleted, the "rend" attribute may be used: <del rend="strikethrough">utility</del>. An element may have multiple attributes, separated from one another by a space.

Values

An attribute value is a word or short phrase that classifies the element in terms of a particular attribute. It is contained within quotation marks and preceded by an equals sign. In the example above, "strikethrough" is the value.
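For readers curious about how a computer sees these parts, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. It is only an illustration and not part of the Transcription Desk; the variable names are chosen here for clarity.

```python
import xml.etree.ElementTree as ET

# Parse the example tag used in the sections above.
element = ET.fromstring('<del rend="strikethrough">utility</del>')

element_name = element.tag     # the element name: "del"
attributes = element.attrib    # attribute names mapped to their values
content = element.text         # the text between the opening and closing tags

print(element_name)            # del
print(attributes)              # {'rend': 'strikethrough'}
print(content)                 # utility
```

The parser separates the element name, the attribute, its value and the enclosed text, which is what makes encoded transcripts searchable by feature.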

Nesting

In order for a computer to be able to process a TEI document effectively, it must be well-formed. This means that it must obey certain syntax rules, one of which is that tags are nested properly, and do not overlap. The best way to think about nesting is that it works on a radial principle, from the centre outwards, without overlapping.

The following example is not well-formed, because the <add> element closes before the <del> element that was opened inside it has closed; thus, the tags overlap and are not correctly nested:

<add><del></add></del>

A correctly-nested formulation of the same tags might look like either of the following:

<add><del></del></add>
<del><add></add></del>
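The difference between these formulations can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the wrapping <p> element are illustrative choices made here, not something the Transcription Desk itself uses.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        # The <p> wrapper is only there so that a fragment containing
        # loose text or several top-level tags can still be parsed.
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        # Overlapping or unclosed tags make the parser raise ParseError.
        return False

print(is_well_formed("<add><del></add></del>"))  # False: the tags overlap
print(is_well_formed("<add><del></del></add>"))  # True
print(is_well_formed("<del><add></add></del>"))  # True
```

A parser rejects the overlapping version outright, which is why altering or deleting part of a tag pair can make a whole transcript unreadable to a computer.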

Consider the following example, in which the word "direct" has been added to the manuscript, and subsequently deleted:

[Image: deleted addition]

If we consider the sequence of actions logically, "direct" must first have been added to the manuscript, and then deleted. The encoding can register this sequence implicitly by working from the centre outwards: the <add> tags are placed immediately around the word, and the <del> tags are then placed around the addition. Two sets of tags are used for this purpose, and they must be nested properly:

of immediate <del><add>direct</add></del> use,
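As a rough illustration of how such a line is read by a computer, the sketch below parses it with Python's standard xml.etree.ElementTree module. The wrapping <p> element and the variable names are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

line = "of immediate <del><add>direct</add></del> use,"

# Wrap the line in a <p> element so it forms a single XML document.
root = ET.fromstring(f"<p>{line}</p>")

deletion = root.find("del")      # the outer element: the deletion
addition = deletion.find("add")  # the inner element: the addition

before = root.text               # text before the deletion
word = addition.text             # the added, then deleted, word
after = deletion.tail            # text following the deletion
full_text = "".join(root.itertext())

print(repr(before))   # 'of immediate '
print(word)           # direct
print(repr(after))    # ' use,'
print(full_text)      # of immediate direct use,
```

Because the addition sits inside the deletion, the parser finds <add> only by looking within <del>, mirroring the order in which the two actions are recorded.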