Encoding with TSX

N.B. these instructions about how to encode transcripts using TSX are largely intended for users coming new to transcription. Experienced users of Transcribe Bentham will be familiar with these instructions, but may still find them to be of use as a reference point.

The key difference between encoding in Transcribe Bentham and in TSX is that in the latter, the encoding is displayed mostly in green (and occasionally red and blue), making it much more straightforward to distinguish between transcribed text and the text-encoding. Transcribe Bentham volunteers will also notice that TSX does not use 'line break' or 'page break' tags; these are added automatically by the transcription system.

You may wish to refer to our guide to getting started with TSX, as well as our more detailed guidelines to using the platform.

Why Encode?

In the past, scholars who have transcribed the manuscripts of Jeremy Bentham have done so with a standard word-processing tool (most recently, Microsoft Word). These transcriptions were undertaken for the purpose of providing text for the editors of the various volumes of The Collected Works of Jeremy Bentham. This is still one of the intended benefits of the Transcribe Bentham Initiative, but this will result in slightly different transcriptions.

For one, because the earlier transcriptions were always produced with an eye towards their eventual publication in print format, diplomatic transcriptions (which aimed to represent faithfully every textual aspect of the manuscript) were not a priority. This was particularly the case if the editor of the volume was doing the transcription work: the editor might see a deleted passage in the manuscript, realise that it would not form part of the printed volume, and thus leave it out of the transcription.

The implicit assumption of Transcribe Bentham, and now of TSX (both of which prioritise the production of diplomatic transcriptions), is that while the transcriptions are a means of contributing to the publication of further volumes of The Collected Works of Jeremy Bentham, they are also a valuable and interesting end for research, in and of themselves. The embodiment of this assumption will come with the linking of the transcriptions produced in the Transcription Desk to UCL's digital Bentham Papers repository (www.ucl.ac.uk/library/bentham).

Not only will the future Bentham editor be supplied with fully transcribed manuscripts from which to produce new editions, but the scholar who is interested in examining Bentham's writing processes, his deletions, revisions, and marginalia, will be afforded the opportunity to pursue this interest in an unmediated fashion.

The value that the encoded transcriptions will add includes the possibility of more powerful and refined searching than a simple full-text search. Rather than simply searching for every occurrence of the word "panopticon", a user may wish to see where "panopticon" occurs only in marginal summaries - encoding, which can identify and mark features such as marginal summaries, will facilitate such a search. The transcriptions that result from the Transcribe Bentham Initiative will be encoded in TEI-compliant XML, which is the de facto standard for encoding electronic texts in the humanities academic community, and is a widely-used and -supported non-proprietary format. To learn more about TEI, visit the website of the Text Encoding Initiative.
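As a rough illustration of the kind of refined search described above, the following minimal sketch uses Python's standard XML library to count every occurrence of a word and then to find only those occurrences inside a marked-up feature. The sample sentence is invented for illustration, the function name occurrences_in is hypothetical, and the feature searched is the <note> element used for marginal notes in these Guidelines; it is not a description of how the project's own search works.

```python
import xml.etree.ElementTree as ET

# Invented sample fragment (not a real Bentham transcription).
# <note> marks a marginal note, as described in these Guidelines.
SAMPLE = (
    "<p>The panopticon plan is described here "
    "<note>see the panopticon letters</note> and elsewhere.</p>"
)

def occurrences_in(element_name, word, xml_text):
    """Return the text of every <element_name> element that contains word."""
    root = ET.fromstring(xml_text)
    hits = []
    for el in root.iter(element_name):
        text = "".join(el.itertext())
        if word.lower() in text.lower():
            hits.append(text)
    return hits

# A simple full-text search sees every occurrence, inside or outside tags.
full_text_hits = SAMPLE.lower().count("panopticon")

# A search restricted to marginal notes sees only the encoded feature.
note_hits = occurrences_in("note", "panopticon", SAMPLE)
```

Here the full-text count covers both occurrences, while the restricted search returns only the text of the marginal note.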

Core Guidelines

Transcribing

When transcribing the text, your aim should be to produce a transcription which represents the text of the manuscript as accurately as possible. Please reproduce Bentham's capitalisation and punctuation exactly as it appears on the manuscript, even if it seems incorrect to you. Bentham and his scribes frequently get diacritics on foreign letters (accents, umlauts, cedillas etc.) wrong or miss them out altogether; these should not be corrected, or added to the text if missing. Do not expand any contracted words (Mr./mister) or verbalise symbols (&/and). Changes to the text may only be made in the case of line-end hyphenation, as described below.

Emendation may be appropriate in a critical text, but the primary purpose of Transcribe Bentham is not to produce critical texts: it is to represent, in typographic form, the textual inscriptions of Bentham's manuscripts. At a future time, these transcriptions will form the basis for critical editions of Bentham's writings, at which point the editor may choose to emend material, to normalise punctuation, and so on, but such practices should not occur here. By way of example, from the start of 2015, Dr Michael Quinn of the Bentham Project began using transcriptions produced by Transcribe Bentham volunteers in creating the critical edition of Bentham's writings on the Thames River Police.

Transcription might seem so self-explanatory that it does not require a definition. Since some scholarly editing manuals have, in the past, advocated (and in some cases, encouraged) standardisation, alteration, and silent correction of certain manuscript features, it is very important to know Transcribe Bentham's policies on such matters before you begin transcribing.

Headings

If the page you are transcribing includes a title or heading, you can identify this feature by highlighting the transcribed text of the heading and clicking the button on the toolbar. This will surround the heading with <head></head> tags. Bentham occasionally provides more than one heading: in this instance, simply apply separate <head></head> tags to each heading.

Headings

<head>Annuity Notes Proposed Advertisement on proposed publication on painless fees</head>

Paragraphs

Once a paragraph from a manuscript has been transcribed, it can be identified by highlighting the text of the paragraph with the cursor and clicking the button on the toolbar. This surrounds the text with <p></p> tags. Any text not included in heading or note tags should be enclosed by <p></p> tags, even if a discrete paragraph is not physically represented on the manuscript image (i.e. even if the text is a continuation of a paragraph from the previous manuscript or if it continues to the next manuscript).

Line Breaks

As TSX delivers manuscript images which have already been segmented into lines, line-break tags are automatically applied by the transcription system. Transcribers do not need to add line-break tags.

Line-end Hyphenation

When a hyphenated word appears at the end of a line, please transcribe the word without the hyphen.

Line-end hyphenation

In the example opposite, the word 'circumstance' is hyphenated at the end of the first line. The transcription should read as follows:

customs, religion of the inhabitants, every circumstance
in which a difference in the point

If the hyphenated word is followed directly by a punctuation mark, keep the punctuation mark with the rejoined word at the end of the line, before the line break (the <lb/> tag, which TSX adds automatically).
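The line-end hyphenation rule can be sketched in a few lines of Python. This is only an illustration of the rule that transcribers apply by hand, not part of TSX. The function name join_line_end_hyphens is hypothetical, and the split point "circum-" / "stance" is an assumption, since the manuscript image is not reproduced here.

```python
def join_line_end_hyphens(lines):
    """Rejoin words hyphenated across a line end.

    The hyphen is dropped and the rest of the word (with any punctuation
    attached to it) is moved up to the end of the first line.
    """
    lines = list(lines)
    for i in range(len(lines) - 1):
        if lines[i].endswith("-"):
            # Take everything up to the first space on the next line.
            head, _, rest = lines[i + 1].partition(" ")
            lines[i] = lines[i][:-1] + head
            lines[i + 1] = rest
    return lines

# Assumed manuscript lineation, for illustration only.
manuscript_lines = [
    "customs, religion of the inhabitants, every circum-",
    "stance in which a difference in the point",
]
transcribed = join_line_end_hyphens(manuscript_lines)
```

Note that this sketch treats any hyphen at the end of a line as line-end hyphenation; a transcriber would need to judge whether a mark at the end of a line is in fact a dash that belongs to the text.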

Page Breaks

As TSX delivers manuscript images which are discrete pages, page-break tags are automatically applied by the transcription system. Transcribers do not need to add page-break tags.

Additions

In its simplest form, the button in the toolbar is used to mark interlineations added to the manuscript after the surrounding text was written; these may be substitutions or alternative words. The exception to this is marginal additions, which are described below. This method may be used to mark additions, whether they are added above or (much less frequently) below the line. Highlight the addition and click the button to surround it with <add></add> tags, as in the example below:

Addition

whatever <add>just</add> remark may

Deletions

Where a word or a sequence of words has been deleted in the manuscript, highlight the relevant text and click the button in the toolbar. This will surround the text with <del></del> tags.

Deletion

artificial: <del>tables of it's population:</del> tables of the

In some instances, entire pages or paragraphs are crossed out (e.g. JB/027/029/003), which indicates where Bentham or his scribes have used a particular passage when putting together a work. Text which is struck through in this manner should not be enclosed in deletion tags.

Complex Additions and Deletions

Transcribers will quickly become aware of instances of more complex intervention in the manuscripts, often where there is a combination of added and deleted text. One such example might be called 'substitution', where text added above the line is intended to replace text that is deleted with a strikethrough.

Substitution

The TEI provides guidelines about encoding such phenomena with the <subst> element, but for the purposes of this project, simply identifying text that is added and text that is deleted will suffice.

For example, once the relevant parts of text from the example above have been tagged, the transcription will look like this:

<del>[To bring]</del> <add>I will reduce</add> the question at once

For the sake of consistency, transcribers are advised that when ordering substitutions like this, the deleted text should be transcribed first, followed by the added text, following the implicit reading order in which the respective parts originally appeared in the manuscript.

Catchwords

A catchword is the first word of the following page inserted at the right-hand lower corner of a manuscript folio, below the last line. Catchwords appear quite frequently in Bentham's writings, and should be encoded in the same fashion as an addition, as in the example below:

Catchword

 <p>...in the <add>act</add> can not <add>be</add></p>

Illegible Text

In the course of transcribing, you may encounter text that is illegible, either because Bentham's handwriting is difficult to read, or because it has been obscured by a strikethrough. There are slightly different ways to deal with each instance.

Undeleted

If a word or sequence of words on the manuscript is illegible, but has not been deleted, it may be identified by clicking the button in the toolbar. This inserts a <gap/> tag. Insert one <gap/> tag for each illegible word, if it is possible to distinguish the number of illegible words in a sequence.

Deleted

If the word or sequence of words is illegible because it has been deleted or struck through on the manuscript, you should use the <gap/> tag in conjunction with <del> tags to indicate the reason for illegibility.

Illegible text

But of that which remained, <del><gap/></del> as not

Note that <gap/> is a milestone element, and does not have any content.

Questionable Reading

Where you have provided a transcription that you are not entirely certain about, this uncertainty may be registered by highlighting the word or sequence of words in question, and clicking the button on the toolbar. This will surround the relevant text with <unclear></unclear> tags.

Questionable reading

as <unclear>particular</unclear> as <unclear>possible</unclear>

Marginal Notes & Summaries

Bentham wrote in the margins of a manuscript for two main purposes: to add text to a portion of the manuscript that was already written, or to provide a summary of the text opposite.

Marginal Notes

In the first of these instances, Bentham often used a symbol in the main text of the manuscript to identify the point of attachment of the note: the symbol would then be reproduced at the text of the note in the margin. When this occurs, transcribe the text of the marginal note at the relevant point of attachment in the main text of the manuscript. Then, in order to identify it as a marginal note, highlight the text, and click the button. This will surround the text with <note></note> tags.

Marginal note

a former chapter be true <del><add>just</add></del>, that <note>even 
in a civilised<lb/> life</note> the whole
complement of punishment that is judged

When a symbol is not provided for the note at the point of attachment, you should encode the note at the point in the main text at which you think it is relevant.

The <note> tags will generally be nested within <p> tags. In rare circumstances, a note will apply to a heading, and will then appear nested within <head> tags.

Marginal Summaries

Marginal summaries are intended to provide a brief summary of adjacent text in Bentham's manuscripts. They are usually written in pencil, and should not be included in your transcription. If some marginalia is written in ink, and you are unclear whether it is a note or summary, you should encode it in the same fashion as marginal notes, as suggested above.

Underlined Text

When a word has been underlined in the manuscript, you may identify it by highlighting the relevant text and clicking the button on the toolbar: this will surround the text with tags containing an attribute: <hi rend="underline"></hi>

Underline

But <hi rend="underline">where</hi> and <hi rend="underline">when</hi>

You may occasionally encounter pieces of text that have double or multiple underlinings. You may simply tag these in the same fashion as single-underlined text.

Superscript

Text in superscript is distinct from text that has been added to the manuscript after the surrounding text has been written; the latter should always be marked up by using the additions button, which inserts <add></add> tags. A common example of superscript is seen in ordinal numbers, where the letters often appear above the line (3^rd).

To encode an instance of superscript, highlight the relevant text and hit the button on the transcription toolbar. This will surround the text with this piece of code: <hi rend='superscript'></hi>, as in the examples below:

Superscript 1

Superscript 2

Happ.<hi rend='superscript'>ss</hi> and Unhapp.<hi rend='superscript'>ss</hi>

a 5<hi rend='superscript'>th</hi> ingredient

Unusual Spellings

There are occasional instances in the manuscripts where Bentham employs an unusual spelling for a familiar word: these may include previously-acceptable spellings which are no longer in use, or idiosyncratic misspellings. Where they occur, they may be encoded by highlighting the relevant word and clicking the button on the toolbar. This will result in <sic></sic> tags surrounding the word.

Unusual spelling

<sic>compleat</sic> code of laws

<sic> should also be used to encode archaic contractions in words such as employ'd or suppos'd. You should not use it for familiar contractions like it's, don't, they're, and so on.

If you encounter a word that appears to have an unfamiliar spelling, you may refer to this list of unusual spellings to see whether it is one that Bentham used frequently. You may also add words to this list to benefit other transcribers.

Supplementary Guidelines

User Comments

In the event that you encounter something in the course of your transcription that is not covered by these Guidelines, you can email the Transcribe Bentham project at transcribe.bentham@ucl.ac.uk, stating the name of the manuscript (found in the web address and at the top of the page, e.g. 333/6) and describing the nature of your discovery.

It may also be useful to insert a comment in the transcription to alert Transcribe Bentham editors and other transcribers to the problem. In order to do this, you should type your comment inside these characters: <!-- -->, which are generated by clicking the button on the toolbar.

<!-- There is an unusual feature at this point in the manuscript -->

The text of your comment will not be displayed in the transcription once you have saved the page, but will remain present in the transcription box for the editors to see.

Non-English Language

While transcribing Bentham's manuscripts, you may encounter languages other than English: this may occur in isolated words, brief passages, or longer sections of writing. You may encode such instances by highlighting the relevant non-English text, and clicking the button on the toolbar. This will surround the text with <foreign></foreign> tags.

Non-English language

<foreign>d'une fantaisie contrariée</foreign>

Ampersands

Bentham uses the ampersand sign (&) quite frequently in his manuscripts. When it occurs in a manuscript you are transcribing, you should click the button on the toolbar: this will add a piece of code (&amp;) which will render the ampersand correctly in the saved transcription.

The reason you cannot simply type a '&' character on your keyboard is that in markup, '&' is an escape character which invokes an alternative interpretation on subsequent characters in a character sequence.
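The effect of a bare ampersand can be seen with a short Python sketch using the standard library's XML parser. The phrase "Lords & Commons" is invented for illustration, and the helper name is_well_formed is hypothetical; the sketch only shows the general XML behaviour, not how TSX itself processes transcriptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True

# A bare ampersand starts an escape sequence that is never completed,
# so the parser rejects the fragment.
bare = "<p>Lords & Commons</p>"

# escape() replaces the ampersand with the code that the toolbar button inserts.
escaped = "<p>" + escape("Lords & Commons") + "</p>"

# When the escaped fragment is parsed, the ampersand is restored.
rendered = ET.fromstring(escaped).text
```

The escaped fragment contains the five-character code in place of the ampersand, and the parser turns it back into a single ampersand when the text is read.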

Dashes

Occasionally, you will encounter dashes of varying lengths in the manuscripts. In general, these may correspond to the en-dash and the em-dash. In printing houses, pieces of type that held the letters 'n' and 'm' were used as units for measuring and estimating the amount of printed matter in a line or page. Thus, a dash the width of a letter 'n' became known as an en-dash, while a longer dash, the width of a letter 'm', was called an em-dash.

Use your discretion to judge whether a Bentham dash is best represented by an en- or em-dash. For an en-dash, you may simply type a hyphen (-) into the transcription box; if inserting an em-dash, you should click the button on the toolbar. This will insert a Unicode character code (&#x2014;) which will enable the representation of the em-dash in web browsers.
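As a small illustration of what the em-dash code stands for, the sketch below parses a fragment containing the hexadecimal character reference for U+2014 and checks the character that results. The sample words are invented, and Python's standard XML parser stands in here for whatever software eventually renders the transcription.

```python
import xml.etree.ElementTree as ET

# The fragment contains the character reference that the toolbar button
# inserts, written here in its standard form without an internal space.
fragment = "<p>utility &#x2014; happiness</p>"

# Parsing resolves the reference to the em-dash character itself.
parsed_text = ET.fromstring(fragment).text

# 2014 is the hexadecimal Unicode code point of the em-dash.
code_point = hex(ord(parsed_text[8]))
```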

Ligatures

Bentham occasionally uses ligatures in his writing, e.g. æ and œ. These do not occur very frequently, but should you encounter one, you should simply transcribe the individual letters of the ligature ('oe' rather than 'œ'), and insert a User Comment containing the word 'ligature' directly afterwards.

oeconomy<!-- ligature -->

Symbols

The case with symbols used in the manuscripts is similar to that of ligatures: Bentham uses a number of symbols (e.g. section sign: §), with varying regularity. If it is possible to reproduce a symbol from the keys on your keyboard, you should do so; otherwise, you should simply register its presence with a User Comment.

Brackets

In your transcription, you may represent the various types of brackets used in the manuscripts, including parentheses ( ), square brackets [ ], or braces { }. Take care not to use angle brackets < >, as these are used only for markup elements.

Terms Used in the Guidelines

The following is an explanation of some of the terms used in these Guidelines, which may be unfamiliar to some users. It is important to understand these terms in order to grasp the full import of the Guidelines.

Transcription

Transcription refers to the text that the user reads from the Bentham manuscript and then copies into the Transcription Box.

Encoding / Markup

Encoding and Markup are terms which may be used interchangeably. They refer to tags that are included in the transcription in order to identify features of the text and manuscript in a manner that allows them to be processed by a computer.

If you click the buttons on the transcription toolbar, markup will appear in the transcription box alongside the text that you have entered, for example:

whatever <add>just</add> remark may

It is very important that you do not delete or alter any of the markup that appears in the transcription box.

Elements

The element is the core part of the tag, and occurs after the "<". In <del>utility</del>, "del" is the element.

Attributes

Tags may also contain attributes, which describe the element in more detail. The attribute appears after the element in the opening tag (it never appears in the closing tag), and is followed by an attribute value (see below).

For example, to note the manner in which the word "utility" was deleted, the following attribute may be used: <del rend="strikethrough">utility</del>. An element may have multiple attributes, each separated by a single space.

Values

An attribute value is a word or short phrase that classifies the element in terms of a particular attribute. It is contained within quotation marks and preceded by an equal sign. In the example above, "strikethrough" is the value.
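The three terms (element, attribute, value) can be picked apart with Python's standard XML library. This is only a sketch to make the vocabulary concrete, using the tag from the example above; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The example tag from the Attributes section.
tag = ET.fromstring('<del rend="strikethrough">utility</del>')

element_name = tag.tag   # the element: "del"
attributes = tag.attrib  # attribute names mapped to their values
content = tag.text       # the transcribed text inside the tags

# The value of the "rend" attribute.
rend_value = tag.get("rend")
```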

Nesting

In order for a computer to be able to process a TEI document effectively, it must be well-formed. This means that it must obey certain syntax rules, one of which is that tags are nested properly, and do not overlap. The best way to think about nesting is that it works on a radial principle, from the centre outwards, without overlapping.

The following example is not well-formed, because the <add> element closes before the <del> element that was opened inside it has been closed; thus, the tags overlap and are not correctly nested:

<add><del></add></del>

A correctly-nested formulation of the same tags might look like either of the following:

<add><del></del></add>
 <del><add></add></del>

Consider the following example, in which the word "direct" has been added to the manuscript, and subsequently deleted:

Deleted addition

If we consider the sequence of actions logically, "direct" must first have been added to the manuscript, and then deleted. The encoding can implicitly register this sequence, by first registering the addition and then the deletion. Two sets of tags are used for this purpose, and they must be nested properly:

of immediate <del><add>direct</add></del> use,
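Whether a piece of markup is correctly nested can be checked mechanically. The following minimal sketch uses Python's standard XML parser, which refuses overlapping tags; the helper name is_well_formed is hypothetical, and the surrounding <p> tags are added on the assumption that the line sits inside a paragraph, since the parser needs a single enclosing element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True

# Overlapping tags: <add> closes while <del> is still open inside it.
overlapping = "<add><del></add></del>"

# Correctly nested tags, as in the "direct" example.
nested = "<p>of immediate <del><add>direct</add></del> use,</p>"

# The nesting can be followed from the outside in: <p>, then <del>, then <add>.
inner_text = ET.fromstring(nested).find("del/add").text
```

The parser rejects the overlapping version and accepts the nested one, from which the added and subsequently deleted word can be retrieved.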

Additional Advice
