Help:Encoding

We ask volunteers to encode their transcripts in Text Encoding Initiative (TEI)-compliant XML; TEI is a de facto standard for encoding electronic texts. This can be done relatively simply by clicking the buttons on your transcription toolbar.

For more information on the practicalities of encoding your transcripts, please have a look at the Transcription Guidelines.

We have included some background information below about the structure of encoding/markup, which should help you understand how it works.

TEI markup involves using tags to label features of Bentham's manuscripts, such as paragraphs, additions and marginal notes. This markup means that the transcripts can be preserved, understood and searched far into the future.

If you click the buttons on the transcription toolbar, markup will appear in the transcription box alongside the text that you have entered, for example:

whatever <add>just</add> remark may

It is very important that you do not delete or alter any of the markup that appears in the transcription box.

Elements

The element is the core part of the tag, and occurs after the "<". In <del>utility</del>, "del" is the element.

Attributes

Tags may also contain attributes, which describe the element in more detail. The attribute appears after the element in the opening tag (it never appears in the closing tag), and is followed by an attribute value (see below).

For example, to note the manner in which the word "utility" was deleted, the following attribute may be used: <del rend="strikethrough">utility</del>. An element may have multiple attributes, each separated by a single space.

Values

An attribute value is a word or short phrase that classifies the element in terms of a particular attribute. It is contained within quotation marks and preceded by an equals sign. In the example above, "strikethrough" is the value of the "rend" attribute.
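
For readers who would like to see how a computer reads these three parts, the following is a minimal sketch in Python. It is only an illustration and is not part of the transcription toolbar; the choice of Python and of its standard xml.etree.ElementTree module is an assumption made for this example. It takes the tag shown above and picks out the element, the attribute and the value.

```python
import xml.etree.ElementTree as ET

# The example tag from the text above.
snippet = '<del rend="strikethrough">utility</del>'

# Parse the snippet into an element object.
node = ET.fromstring(snippet)

element_name = node.tag          # the element: "del"
attributes = dict(node.attrib)   # attribute names mapped to their values
content = node.text              # the text between the opening and closing tags

print(element_name)
print(attributes)
print(content)
```

Here the attribute name "rend" and its value "strikethrough" are read as a pair, which is why the value must always follow the attribute it belongs to.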

Nesting

In order for a computer to be able to process a TEI document effectively, it must be well-formed. This means that it must obey certain syntax rules, one of which is that tags are nested properly and do not overlap. The best way to think about nesting is that it works from the centre outwards: the tag that was opened most recently must be closed first.

The following example is not well-formed, because the <add> element closes while the <del> element opened inside it is still open; thus, the tags overlap and are not correctly nested:

<add><del></add></del>

A correctly nested formulation of the same tags might look like either of the following:

<add><del></del></add>
<del><add></add></del>
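
The difference between the overlapping and the correctly nested forms can be checked mechanically. The sketch below is one possible way of doing so, assuming Python and its standard xml.etree.ElementTree parser; the function name is_well_formed and the wrapper element "p" are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a single root."""
    try:
        # "p" is an arbitrary wrapper (an assumption of this sketch) so that
        # fragments containing loose text or several tags can be parsed.
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        # The parser rejects overlapping or unclosed tags.
        return False

overlapping = is_well_formed("<add><del></add></del>")
nested_a = is_well_formed("<add><del></del></add>")
nested_b = is_well_formed("<del><add></add></del>")

print(overlapping, nested_a, nested_b)
```

The parser refuses the first form because </add> arrives while <del> is still open, and accepts the other two.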

Consider the following example, in which the word "direct" has been added to the manuscript, and subsequently deleted:

[Image: deleted addition]

If we consider the sequence of actions logically, "direct" must first have been added to the manuscript, and then deleted. The encoding can implicitly register this sequence through nesting: the <add> tags sit innermost, directly around the word, and the <del> tags wrap around them. Two sets of tags are used for this purpose, and they must be nested properly:

of immediate <del><add>direct</add></del> use,
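
As a rough illustration of how the nesting in this line is read by a computer, the following sketch parses it with Python's standard xml.etree.ElementTree module. The use of Python and the wrapper element "p" are assumptions made for the example, not part of the transcription tools.

```python
import xml.etree.ElementTree as ET

# The encoded line from the example above.
line = "of immediate <del><add>direct</add></del> use,"

# Wrap the line in an arbitrary root element ("p" is an assumption) so it can be parsed.
root = ET.fromstring(f"<p>{line}</p>")

outer = root[0]    # the outermost tag in the line
inner = outer[0]   # the tag nested inside it

text_before = root.text   # text preceding the outer tag
text_after = outer.tail   # text following the outer tag's closing tag

print(outer.tag, inner.tag, inner.text)
print(repr(text_before), repr(text_after))
```

The parser reports "del" as the outer element and "add" as the inner one, matching the order in which the addition and the deletion were made.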
