Help:Transcription Guidelines

Revision as of 10:59, 7 September 2010 by Justintonra (talk | contribs)

Introduction


This page offers Guidelines for users of the Transcribe Bentham Transcription Desk. Here, you can find specific directions about how best to transcribe Bentham's writings, and how to encode specific features and phenomena of the manuscripts.

Once you have seen the Getting Started page, these Guidelines are useful as a reference resource for transcription. We recommend that you open this page in a new browser tab or window so you can refer to the Guidelines while you are transcribing.

This document was first written in May 2010, and represents the initial Guidelines for Transcribe Bentham. It is important to note that the Guidelines have evolved since May 2010 and are likely to continue to evolve as editorial discussions about transcription and encoding continue. The current Guidelines should be considered stable, however, and should be followed accordingly.

The Guidelines contain some terms related to text-encoding, which are explained here. They are also divided into two sections: Core and Supplementary.

  • Core Guidelines describe the manuscript features that users will encounter most frequently, and how to deal with them. Such features include additions, deletions, and notes.
  • Supplementary Guidelines discuss the treatment of less-frequently occurring features of the manuscripts, such as ligatures, symbols, and foreign-language words.

Other sections of note, located elsewhere on this wiki, include Palaeography Skills and Unusual Spellings.

Why Encode?


In the past, scholars who transcribed the manuscripts of Jeremy Bentham did so with a standard word-processing tool (most recently, Microsoft Word). These transcriptions were undertaken for the purpose of providing text for the editors of the various volumes of The Collected Works of Jeremy Bentham. This is still one of the intended benefits of the Transcribe Bentham Initiative, but the Initiative will produce slightly different transcriptions.

For one, because the earlier transcriptions were always produced with an eye towards their eventual publication in print format, diplomatic transcriptions (which aimed to represent faithfully every textual aspect of the manuscript) were not a priority. This was particularly the case if the editor of the volume was doing the transcription work: the editor might see a deleted passage in the manuscript, realise that it would not form part of the printed volume, and thus leave it out of the transcription.

The implicit assumption of the Transcribe Bentham Initiative (which is prioritising the production of diplomatic transcriptions) is that while the transcriptions are a means to the publication of further volumes of The Collected Works of Jeremy Bentham, they are also a valuable and interesting end, in and of themselves. The embodiment of this assumption will come with the linking of the transcriptions produced in the Transcription Desk to the Bentham Papers Database.

Not only will the future Bentham editor be supplied with fully transcribed manuscripts from which to produce new editions, but the scholar who is interested in examining Bentham's writing processes, his deletions, revisions, and marginalia, will be afforded the opportunity to pursue this interest in an unmediated fashion.

The value that the encoded transcriptions will add includes the possibility of more powerful and refined searching than a simple full-text search. Rather than simply searching for every occurrence of the word "panopticon", a user may wish to see where "panopticon" occurs only in marginal summaries - encoding, which can identify and mark features such as marginal summaries, will facilitate such a search. The transcriptions that result from the Transcribe Bentham Initiative will be encoded in TEI-compliant XML, which is the de facto standard for encoding electronic texts in the humanities academic community, and is a widely-used and -supported non-proprietary format. To learn more about TEI, visit the website of the Text Encoding Initiative.
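As a rough illustration of the kind of search that encoding makes possible, the sketch below contrasts a plain full-text count with a search restricted to marginal text. It assumes that marginal material is marked with <note> tags, as described for marginal notes later in these Guidelines; the sample text and the function name find_in_notes are invented for illustration and do not come from a Bentham manuscript or from the project's software.

```python
import xml.etree.ElementTree as ET


def find_in_notes(fragment, word):
    """Return the text of every <note> in the fragment that mentions word."""
    # Wrap the fragment in a single root so it parses as one XML document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    hits = []
    for note in root.iter("note"):
        text = "".join(note.itertext())
        if word.lower() in text.lower():
            hits.append(text)
    return hits


# Hypothetical sample text, invented for illustration only.
sample = (
    "<p>The panopticon plan is set out here "
    "<note>panopticon: see the earlier letters</note> at length.</p>"
    "<p>A second paragraph <note>on annuity notes</note> follows.</p>"
)

full_text_hits = sample.lower().count("panopticon")  # every occurrence
matches = find_in_notes(sample, "panopticon")        # marginal text only
print(full_text_hits, matches)
```

The full-text count cannot tell the main text from the margin, whereas the tag-aware search returns only the marginal occurrence.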

Core Guidelines


Transcribing

When transcribing the text, your aim should be to produce a transcription which represents the text of the manuscript as accurately as possible. Reproduce Bentham's capitalisation and punctuation exactly as it appears on the manuscript, even if it seems incorrect to you. Do not expand any contracted words (Mr./mister) or verbalise symbols (&/and). Changes to the text may only be made in the case of line-end hyphenation, as described below.

Emendation may be appropriate in a critical text, but the primary purpose of Transcribe Bentham is not to produce critical texts: it is to represent, in typographic form, the textual inscriptions of Bentham's manuscripts. At a future time, these transcriptions may form the basis for critical editions of Bentham's writings, at which point the editor may choose to emend material, to normalise punctuation, and so on, but such practices should not occur here.

Transcription might seem so self-explanatory that it does not require a definition. Since some scholarly editing manuals have, in the past, advocated (and in some cases, encouraged) standardisation, alteration, and silent correction of certain manuscript features, it is very important to know Transcribe Bentham's policies on such matters before you begin transcribing.

Headings

If the page you are transcribing includes a title or heading, you may identify this feature by highlighting the transcribed text of the heading and clicking the button on the toolbar. This will surround the heading with <head></head> tags. Bentham occasionally provides more than one heading: in this instance, simply apply separate <head></head> tags to each heading.

Headings








<head>Annuity Notes Proposed Advertisement on proposed publication on painless fees</head>

Paragraphs

Once a paragraph from a manuscript has been transcribed, it may be identified by highlighting the text of the paragraph with the cursor and clicking the button on the toolbar. This surrounds the text with <p></p> tags.

Line Breaks

In order to preserve the lineation of the manuscripts, a line break should be inserted directly after the final word or punctuation mark of each line. In order to do this, click the button on the toolbar: this inserts an <lb/> tag. It is important to note that the <lb/> tag does not have opening and closing tags, as it is a milestone element, which marks a place in a text and does not have any content.

Line-end Hyphenation

When a hyphenated word appears at the end of a line, transcribe the complete word without the hyphen, and insert the <lb/> tag after the complete word by clicking the button on the toolbar.

Line-end hyphenation

In the example opposite, the word 'circumstance' is hyphenated at the end of the first line. The transcription should read as follows:



customs, religion of the inhabitants, every circumstance<lb/>
in which a difference in the point<lb/>

If the hyphenated word is followed directly by a punctuation mark, include the punctuation mark before the <lb/> tag.
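The rule can also be expressed as a small sketch. The function below is hypothetical (it is not part of the Transcription Desk), and the point at which 'circumstance' is split across the two lines is an assumption, since the manuscript image is not reproduced here. It also assumes that any line ending in a hyphen is a word broken across lines, which a human transcriber would need to confirm by eye.

```python
def encode_line_breaks(lines):
    """Rejoin words hyphenated at a line end and append <lb/> to each line."""
    pending = [line.strip() for line in lines]
    encoded = []
    for i, line in enumerate(pending):
        if line.endswith("-") and i + 1 < len(pending):
            # Take the rest of the word, with any punctuation attached to it,
            # from the start of the next line and complete the word here.
            head, _, rest = pending[i + 1].partition(" ")
            line = line[:-1] + head
            pending[i + 1] = rest
        encoded.append(line + "<lb/>")
    return encoded


# Assumed split point for the word 'circumstance'.
manuscript_lines = [
    "customs, religion of the inhabitants, every circum-",
    "stance in which a difference in the point",
]

for encoded_line in encode_line_breaks(manuscript_lines):
    print(encoded_line)
```

Because the remainder of the word is moved together with whatever is attached to it, a punctuation mark that directly follows the hyphenated word ends up before the <lb/> tag, as the rule requires.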

Page Breaks

Like line breaks, a page break is indicated in markup with a milestone element: <pb/>. When transcribing a folio that contains more than one page (JB/027/124/001, for example), a page break should be inserted to mark the point at which one page ends and another begins.

To do this, position the cursor at the relevant point in the transcription and click the button on the toolbar: this will insert a <pb/> tag.

In JB/027/124/001, the page break would be recorded thus:

<p>...we who are not of the Profession of the Law, cannot<lb/>
positively assert</p>
<pb/>
<head>C</head>
<head>Prefat.</head>
<p>England has long been regarded...</p>

Additions

In its simplest form, the button in the toolbar is used to mark a part of the text that was added to the manuscript after the surrounding text was written. The exception to this is marginal additions, which are described below. Highlight the addition and click the button to surround it with <add></add> tags, as in the example below:

Addition







whatever <add>just</add> remark may

Deletions

Where a word or a sequence of words has been deleted in the manuscript, highlight the relevant text and click the button in the toolbar. This will surround the text with <del></del> tags.

Deletion




artificial: <del>tables of it's population:</del> tables of the

Exercise common sense when deciding on the extent of deletions. Where the strikethrough does not physically cancel a punctuation mark that is apparently part of the deletion, you may assume that it forms part of the deletion. If in doubt about a particular example, you may email the editors at transcribe.bentham@ucl.ac.uk.

Complex Additions and Deletions

Transcribers will quickly become aware of instances of more complex intervention in the manuscripts, often where there is a combination of added and deleted text. One such example might be called 'substitution', where text added above the line is intended to replace text that is deleted with a strikethrough.

Substitution

The TEI provides guidelines about encoding such phenomena with the <subst> element, but for the purposes of this project, simply identifying text that is added and text that is deleted will suffice.

For example, once the relevant parts of text from the example above have been tagged, the transcription will look like this:

<del>[To bring]</del><add>I will reduce</add> the question at once

For the sake of consistency, transcribers are advised that when ordering substitutions like this, the deleted text should be transcribed first, followed by the added text, following the implicit order in which the respective parts originally appeared in the manuscript.
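One reason it is enough to mark the deleted and added text separately is that a later reader or editor can still recover the final state of the passage from the tags alone. The sketch below is one possible way of doing so, not the project's own processing: it drops everything inside <del> tags and keeps everything else. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def visible_text(elem):
    """Collect the text of elem, skipping anything inside a <del> element."""
    if elem.tag == "del":
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(visible_text(child))
        # The tail is the text that follows the child's closing tag.
        parts.append(child.tail or "")
    return "".join(parts)


def final_reading(fragment):
    """Return the text of a tagged fragment as it stood after revision."""
    root = ET.fromstring("<root>" + fragment + "</root>")
    return visible_text(root)


tagged = "<del>[To bring]</del><add>I will reduce</add> the question at once"
print(final_reading(tagged))
```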

Illegible Text

In the course of transcribing, you may encounter text that is illegible, either because Bentham's handwriting is difficult to read, or because it has been obscured by a strikethrough. There are slightly different ways to deal with each instance.

Undeleted

If a word or sequence of words on the manuscript is illegible, but has not been deleted, it may be identified by clicking the button in the toolbar. This inserts a <gap/> tag. Insert one <gap/> tag for each illegible word, if it is possible to distinguish the number of illegible words in a sequence.

Deleted

If the word or sequence of words is illegible because it has been deleted or struck through on the manuscript, you should use the <gap/> tag in conjunction with <del> tags to indicate the reason for illegibility.

Illegible text




But of that which remained, <del><gap/><gap/></del> as not

Note that <gap/>, like <lb/>, is a milestone element, and does not have any content.
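To show how the two cases remain distinguishable once encoded, the following sketch counts illegible words inside and outside deletions. The function name is hypothetical, and the second sample fragment is invented for illustration.

```python
import xml.etree.ElementTree as ET


def count_illegible(fragment):
    """Count <gap/> tags, separating those that fall inside a <del> element."""
    root = ET.fromstring("<root>" + fragment + "</root>")
    total = len(root.findall(".//gap"))
    deleted = len(root.findall(".//del//gap"))
    return {"deleted": deleted, "undeleted": total - deleted}


# The example from the Guidelines: two illegible words, both struck through.
from_guidelines = "But of that which remained, <del><gap/><gap/></del> as not"
# Hypothetical fragment with one illegible word that was not deleted.
invented = "the <gap/> of the law"

print(count_illegible(from_guidelines))
print(count_illegible(invented))
```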

Questionable Reading

Where you have provided a transcription that you are not entirely certain about, this uncertainty may be registered by highlighting the word or sequence of words in question, and clicking the button on the toolbar. This will surround the relevant text with <unclear></unclear> tags.

Questionable reading






as <unclear>particular</unclear> as <unclear>possible</unclear>

Marginal Notes & Summaries

Bentham wrote in the margins of a manuscript for two main purposes: to add text to a portion of the manuscript that was already written, or to provide a summary of the text opposite.

Marginal Notes

In the first of these instances, Bentham often used a symbol in the main text of the manuscript to identify the point of attachment of the note: the symbol is then repeated beside the text of the note in the margin. When this occurs, transcribe the text of the marginal note at the relevant point of attachment in the main text of the manuscript. Then, in order to identify it as a marginal note, highlight the text, and click the button. This will surround the text with <note></note> tags.

Marginal note





a former chapter be true <del><add>just</add></del>, that <note>even 
in a civilised life</note> the whole<lb/>
complement of punishment that is judged

When a symbol is not provided for the note at the point of attachment, you should encode the note at the point in the main text at which you think it is relevant.

Note that you should not include line breaks within the text of the note, even if they occur in the manuscript.

Marginal Summaries

Marginal summaries are intended to provide a brief summary of adjacent text in Bentham's manuscripts. They are usually written in pencil, and should not be included in your transcription. If marginalia are written in ink, and you are unsure whether they are a note or a summary, you should encode them in the same fashion as marginal notes, as described above.

Underlined Text

When a word has been underlined in the manuscript, you may identify it by highlighting the relevant text and clicking the button on the toolbar: this will surround the text with tags containing an attribute: <hi rend="underline"></hi>

Underline







But <hi rend="underline">where</hi> and <hi rend="underline">when</hi>

You may occasionally encounter pieces of text that have double or multiple underlinings. You may simply tag these in the same fashion as single-underlined text.

Unusual Spellings

There are occasional instances in the manuscripts where Bentham employs an unusual spelling for a familiar word: these may include previously-acceptable spellings which are no longer in use, or idiosyncratic misspellings. Where they occur, they may be encoded by highlighting the relevant word and clicking the button on the toolbar. This will result in <sic></sic> tags surrounding the word.

Unusual spelling





<sic>compleat</sic> code of laws

If you encounter a word that appears to have an unfamiliar spelling, you may refer to this list of unusual spellings to see whether it is one that Bentham used frequently. You may also add words to this list to benefit other transcribers.

Supplementary Guidelines


User Comments

In the event that you encounter something in the course of your transcription that is not covered by these Guidelines, you should email the Transcribe Bentham project at transcribe.bentham@ucl.ac.uk, or post a message on the Discussion Board stating the name of the manuscript (found at the top of the page, e.g. JB/088/002/003) and describing the precise nature of your discovery.

It may also be useful to insert a comment in the transcription to alert Transcribe Bentham editors and other transcribers to the problem. In order to do this, you should type your comment inside these characters: <!-- -->, which are generated by clicking the button on the toolbar.

<!-- There is an unusual feature at this point in the manuscript -->

The text of your comment will not be displayed in the saved transcription, but it will remain present in the transcription box.

Non-English Language

While transcribing Bentham's manuscripts, you may encounter languages other than English: this may occur in isolated words, brief passages, or longer sections of writing. You may encode such instances by highlighting the relevant non-English text, and clicking the button on the toolbar. This will surround the text with <foreign></foreign> tags.

Non-English language





<foreign>d'une fantasie contrariée</foreign>

Ampersands

Bentham uses the ampersand sign (&) quite frequently in his manuscripts. When it occurs in a manuscript you are transcribing, you should click the button on the toolbar: this will add a piece of code (&amp;) which will render the ampersand correctly in the saved transcription.

The reason you cannot simply type a '&' character on your keyboard is that in markup, '&' is an escape character which invokes an alternative interpretation on subsequent characters in a character sequence.
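The effect of the escape can be seen in a short sketch using Python's standard library, offered purely as an illustration and not as the Transcription Desk's own code. The phrase being encoded is an invented example, not a reading from a manuscript.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical manuscript reading containing an ampersand.
manuscript_text = "pains & pleasures"

# escape() replaces the bare '&' with the entity that the toolbar button inserts.
encoded = escape(manuscript_text)

# Once escaped, the text can sit inside markup and be read back unchanged.
paragraph = ET.fromstring("<p>" + encoded + "</p>")

print(encoded)
print(paragraph.text)
```

When the paragraph is read back, the entity is turned into a plain ampersand again, so nothing of Bentham's text is lost by the substitution.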

Dashes

Occasionally, you will encounter dashes of varying lengths in the manuscripts. In general, these may correspond to the en-dash and the em-dash. In printing houses, pieces of type that held the letters 'n' and 'm' were used as units for measuring and estimating the amount of printed matter in a line or page. Thus, a dash the width of a letter 'n' became known as an en-dash, while a longer dash, the width of a letter 'm', was called an em-dash.

Use your discretion to judge whether a Bentham dash is best represented by an en- or em-dash. For an en-dash, you may simply type a hyphen (-) into the transcription box; if inserting an em-dash, you should click the button on the toolbar. This will insert a Unicode character code (&#x2014;) which will enable the representation of the em-dash in web browsers.

Ligatures

Bentham occasionally uses ligatures in his writing, e.g. æ and œ. These do not occur very frequently, but should you encounter one, you should simply transcribe the individual letters of the ligature ('oe' rather than 'œ'), and insert a User Comment containing the word 'ligature' directly afterwards.

oeconomy<!-- ligature -->

Symbols

The case with symbols used in the manuscripts is similar to that of ligatures: Bentham uses a number of symbols (e.g. section sign: §), with varying regularity. If it is possible to reproduce a symbol from the keys on your keyboard, you should do so; otherwise, you should simply register its presence with a User Comment: <!-- symbol -->

Brackets

In your transcription, you may represent the various types of brackets used in the manuscripts, including parentheses ( ), square brackets [ ], or braces { }. Take care not to use angle brackets < >, as these are used only for markup elements.

Terms Used in the Guidelines


The following is an explanation of some of the terms used in these Guidelines, which may be unfamiliar to some users. It is important to understand these terms in order to grasp the full import of the Guidelines.

Transcription

Transcription refers to the text that the user reads from the Bentham manuscript and then copies into the Transcription Box.

Encoding / Markup

Encoding and Markup are terms which may be used interchangeably. They refer to tags that are included in the transcription in order to identify features of the text and manuscript in a manner that allows them to be processed by a computer.

If you click the buttons on the transcription toolbar, markup will appear in the transcription box alongside the text that you have entered, for example:

whatever <add>just</add> remark may

It is very important that you do not delete or alter any of the markup that appears in the transcription box.

Tags

A tag is a string of characters surrounded by angle brackets, i.e. "<" and ">". Tags are used to identify part of the transcription, and usually come in pairs, known as "opening" and "closing" tags. A closing tag can be identified by a slash after the "<".

If, for instance, a user wished to note that the word "utility" was deleted from a manuscript they were transcribing, it would be tagged thus: <del>utility</del>.

Users will not have to type tags into the Transcription Box: they will be automatically generated by highlighting the relevant part of the text, and clicking a button in the Transcription Toolbar.

Elements

The element is the core part of the tag, and occurs after the "<". In <del>utility</del>, "del" is the element.

Attributes

Tags may also contain attributes, which describe the element in more detail. The attribute appears after the element in the opening tag (it never appears in the closing tag), and is followed by an attribute value (see below).

For example, to note the manner in which the word "utility" was deleted, the following attribute may be used: <del rend="strikethrough">utility</del>. An element may have multiple attributes, each separated by a single space.

Values

An attribute value is a word or short phrase that classifies the element in terms of a particular attribute. It is contained within quotation marks and preceded by an equal sign. In the example above, "strikethrough" is the value.
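The relationship between element, attribute, and value can be made concrete by reading the example tag with a standard XML parser. This is a minimal sketch using Python's built-in library; it is not part of the Transcription Desk.

```python
import xml.etree.ElementTree as ET

tagged = '<del rend="strikethrough">utility</del>'
element = ET.fromstring(tagged)

print(element.tag)          # the element: del
print(element.attrib)       # attributes paired with their values
print(element.get("rend"))  # the value of the rend attribute
print(element.text)         # the transcribed text between the tags
```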

Nesting

In order for a computer to be able to process a TEI document effectively, it must be well-formed. This means that it must obey certain syntax rules, one of which is that tags are nested properly, and do not overlap. The best way to think about nesting is that it works on a radial principle, from the centre outwards, without overlapping.

The following example is not well-formed, because the <del> element opens before the <add> element closes; thus, the tags are not correctly nested:

<add><del></add></del>

A correctly-nested formulation of the same tags might look like either of the following:

<add><del></del></add>
<del><add></add></del>

Consider the following example, in which the word "direct" has been added to the manuscript, and subsequently deleted:

Deleted addition

If we consider the sequence of actions logically, "direct" must first have been added to the manuscript, and then deleted. The encoding can implicitly register this sequence, by first registering the addition and then the deletion. Two sets of tags are used for this purpose, and they must be nested properly:


of immediate <del><add>direct</add></del> use,
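Well-formedness can be checked mechanically. The sketch below runs the examples from this section through Python's built-in XML parser, which rejects overlapping tags; the helper name is_well_formed is hypothetical, and the check is only an illustration of the principle, not the validation that the Transcription Desk itself performs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses once wrapped in a single root."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


overlapping = "<add><del></add></del>"
nested = "<add><del></del></add>"
deleted_addition = "of immediate <del><add>direct</add></del> use,"

for fragment in (overlapping, nested, deleted_addition):
    print(fragment, is_well_formed(fragment))
```

Only the first fragment fails, because its <add> element closes while the <del> element inside it is still open.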

Additional Advice

