Brought to you by the Massachusetts Historical Society

"I have nothing to do here, but to take the Air, enquire for News, talk Politicks and write Letters."

John Adams to Abigail Adams, 30 June 1774

Tuesday, December 23, 2008

Miscellaneous Proofreading Rules

During the first phase of work on the control file, XML files received from the vendor will undergo a character-by-character proofread against the physical slips. To speed up proofreading, please note the following rules. Formatting of bib records and transcription of lengthy handwritten material will be handled during Encoding Level 1.
  • Ignore underlining and italics
  • Ignore the Arabic number following the Roman numeral for series (II or III)
  • For a new slip, insert a post-it note on the XSL printout and place a sticky flag on the new paper slip in the chronological file
  • Transcribe handwriting if under one line, otherwise flag on XSL printout

Proofreaders' Marks

Please use the formal proofreading marks for all phases of proofreading and verification. A copy can be found at the Chicago Manual of Style Online.

Bracketed Dates

Question: Some dates are both bracketed and italicized. This is an editorial directive used in the volumes to distinguish between conjectural dates (bracketed) and editorially supplied dates (bracketed and italicized). Italics were not used in the early days of the control file (because typewriters could not create italics), but occasionally the text was underlined. Neither italics nor underlining is applied uniformly throughout the catalogue.

Resolution: Brackets will be used to indicate any uncertain date (whether conjectural, incomplete, supplied, etc.). Dates inside brackets will NOT retain any italics or underlining.

Encoding Guide. Handwriting

Many records will have handwritten material on them. Unless the handwriting is illegible, the vendor is asked to transcribe handwriting to the best of their ability, and to tag it with the <h> tag. For example, a slip that has “Dec. 1780” typewritten and is followed by “[5 Feb. 1781]” in legible handwriting should be rendered thus:

<d>Dec. 1780 <h>[5 Feb. 1781]</h></d>

Further, if there is illegible handwriting anywhere on the slip, the vendor is asked to insert three asterisks <h>***</h> wherever the handwriting occurs. For example, a slip that has “to John [illegible handwriting] Adams” should be rendered thus:

<to>to John <h>***</h> Adams</to>
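
Because handwriting tags are nested inside other tags, a mismatched closing tag is easy to miss by eye. The following is a minimal Python sketch of a well-formedness check for a single tagged fragment; the helper name `is_well_formed` is hypothetical, and the check only confirms that tags open and close correctly, not that the fragment conforms to the MHS schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        # Raised for mismatched or unclosed tags, among other syntax errors.
        return False
    return True


print(is_well_formed("<d>Dec. 1780 <h>[5 Feb. 1781]</h></d>"))
```

A fragment whose closing tag does not match its opening tag is reported as not well-formed.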

Encoding Guide. Series

Record: SERIES
Some records will have a handwritten code on the bottom right corner consisting of a Roman numeral, either “II” or “III”, followed by an Arabic number. Only the Roman numeral should be transcribed, and it should be tagged with <s>.
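
As an illustration of the series rule, here is a small Python sketch that keeps the Roman numeral and drops the Arabic number. It assumes the handwritten code has already been keyed as a plain string such as "II 34"; the function name `series_tag` and the tolerated separators are assumptions made for this example.

```python
import re

# A series code is the Roman numeral II or III, optionally followed by an
# Arabic number. The number is matched but never kept.
_SERIES = re.compile(r"^\s*(III|II)(?![A-Za-z])[\s.:-]*\d*\s*$")


def series_tag(raw):
    """Return the <s> element for a series code, or None if raw is not one."""
    match = _SERIES.match(raw)
    if match is None:
        return None
    return "<s>{}</s>".format(match.group(1))


print(series_tag("II 34"))
```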


Encoding Guide. Notes

Record: NOTES
Many records will have a line or several lines of text on the bottom half of the slip that does not conveniently fit into any other categories. All of this text should be transcribed and coded with the <n> tag. If there is a logical paragraph break between one note and another, each line may be tagged separately. There is no limit to the number of <n> tags in a record.

<n>Note: See the letter from 5 March above for enclosure</n>

<n>Original in possession of unknown Congressman, 3 February 1939</n>

Occasionally, a paragraph of text was clipped from an auction catalog and pasted to the slip. If legible, this text should be transcribed and tagged with <n>.

Encoding Guide. Printed

Many records will have a line or several lines of text following the word “Printed.” All of this text should be coded with the <pr> tag. If there is a logical paragraph break between one printed citation and another, each citation may be tagged separately. There is no limit to the number of <pr> tags in a record.

<pr>Printed: AFC vol. 2:345</pr>

<pr>Printed: Boston Columbian Centinel, 18 August 1785</pr>

Encoding Guide. Format

Record: FORMAT
Many records will include a format designation, usually on the fourth line, following the page number. The most common designations are: MS, Xpr, EnlPr, Xerox, microfilm, Photo. These abbreviations and any text following should be coded with the <f> tag. Please note that these abbreviations are not case-sensitive, but the vendor is asked to transcribe them as they appear.


<f>Photo, duplicate in Pictorial file</f>

Encoding Guide. Length

Record: LENGTH
Many records will include the document length, generally on the third line, below the author and recipient line. The convention most records will follow is an Arabic number and “p.” for “pages.” All text on this line should be coded as <l>.

<l>4 p.</l>

<l>2 p. with a 12 p. enclosure</l>

Encoding Guide. Title

Record: TITLE
Most records represent correspondence and have an author and recipient. Sometimes, a record represents a unique document with an author followed by a period and a title. The key indicator to whether a title is present is the absence of the word “to.” Any text on the second line that cannot be clearly separated into either the author or recipient categories should be coded with the title tag, <ti>.

<ti>Declaration of Independence</ti>

<ti>An Elegy</ti>

<ti>Letters from Publicola</ti>

If a name (or initials) is present and is followed by a period, it can be transcribed as the author of the title statement that follows. For example, “John Laurens. Memorandum of Agreement between John Laurens and Alexander Gillon, with certification by Thomas Paine” should be separated and tagged as:

<a>John Laurens.</a>
<ti> Memorandum of Agreement between John Laurens and Alexander Gillon, with certification by Thomas Paine</ti>

If there is any confusion in identifying an author with a title, simply tag the entire statement with <ti> tags.

Encoding Guide. Recipient

Most records will have a recipient name (or initials) that follows the word “to.” All text following—and including—the first instance of the word “to” should be coded as <to>.

<to>to Abigail Adams</to>

<to>to the President of Congress</to>

<to>to the town of Braintree</to>

<to>to the Peace Commissioners</to>

Encoding Guide. Author

Record: AUTHOR
Every record has an author name (or initials) on the second line under the date. All text that appears before the first instance of the word “to” should be considered an author and coded under one author <a> tag. If there is no clearly defined “author to recipient” statement, the second line under the date should be tagged as a title, <ti> (see example #5, below). To conform to the RelaxNG schema, every record should contain an AUTHOR, a TITLE, or both.

<a>John Adams</a>


<a>JA, Benjamin Franklin, and Thomas Jefferson</a>
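
The "first instance of the word to" rule can be sketched in Python as below. This is only an illustration: the function name `tag_second_line` is hypothetical, the input is assumed to be the already-keyed second line of a slip, and a line with no text before "to" is treated as a title (an assumption made so that every result has an author or a title). The sketch cannot tell a recipient from a title that happens to contain the word "to", so such lines would still need a human decision.

```python
import re
from xml.sax.saxutils import escape

# The first standalone word "to" separates the author from the recipient.
_FIRST_TO = re.compile(r"\bto\b")


def tag_second_line(line):
    """Tag a slip's second line as author plus recipient, or as a title."""
    line = line.strip()
    match = _FIRST_TO.search(line)
    author = line[:match.start()].strip() if match else ""
    if not author:
        # No "author to recipient" statement: fall back to a title.
        return "<ti>{}</ti>".format(escape(line))
    # The recipient text includes the word "to" itself.
    recipient = line[match.start():].strip()
    return "<a>{}</a><to>{}</to>".format(escape(author), escape(recipient))


print(tag_second_line("John Adams to Abigail Adams"))
```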

Encoding Guide. Codes

Record: CODE
About half of all records will include a code on the upper right corner of the slip. The codes are usually a combination of letters and numbers. All of this text should be tagged with <c>.



<c>Thompson MB:320</c>

Encoding Guide. Place

Record: PLACE
If a place is included on the record, it will follow the date on the top line. Place names should be recorded as they appear, either abbreviated or spelled out, coded with the <pl> tag.



<pl>Paris, France</pl>

Encoding Guide. Dates

Record: DATE
Every record has a date in the top left corner. The most common dating convention will be day/month/year. There may be a range of days, months or years. There may also be the term “Ante” or “Post” written before it. All text and punctuation in the date field should be coded with the date <d> tag.

<d>4 July 1776.</d>

<d>Ante 4 July 1776</d>

<d>[4-6] July 1776</d>


IMPORTANT: Creating new records

A new record should be created for each slip. As long as the slip has something in the date <d> category, it should be considered a new record and assigned a new record ID number (See RECORD IDs, above, for the assignment of ID numbers).

The slip file was created to be viewed just like a traditional card catalog, one slip after another. Sometimes, a document’s slip required more information than could fit on one slip of paper. In these instances, additional slips of paper were included immediately following the first slip—the slip with a date on it. These follow-up slips do not have a date but do have “cont.” or “2nd page” centered on the top line. These slips should be considered part of the previous slip and all of the content on the continuation slips should be encoded within the same <x> element with the content of the initial slip. Most of the additional text, including the phrase “cont.” or “2d page,” will fall under the NOTE category.

Any slip that cannot be determined to have a new date or to be a continuation slip should be given a unique ID number, the information transcribed and tagged, and must be flagged for review using the empty review <r/> tag.
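
The three cases above (dated slip, continuation slip, undetermined slip) can be sketched as a simple grouping pass. The Python below is a rough illustration under stated assumptions: each slip is modeled as a dictionary with hypothetical keys `date`, `top_line`, and `notes`, and the names `group_slips` and `CONTINUATION_MARKERS` are invented for this example rather than taken from the project.

```python
# Phrases that mark a continuation slip, compared in lower case.
CONTINUATION_MARKERS = ("cont.", "2nd page", "2d page")


def _new_record(reel, sequence, slip, review):
    """Start a record with a 2-digit reel plus 4-digit sequence ID."""
    return {
        "id": "{:02d}{:04d}".format(reel, sequence),
        "date": slip.get("date"),
        "notes": list(slip.get("notes", [])),
        "review": review,
    }


def group_slips(slips, reel):
    """Fold continuation slips into the preceding record, in reel order."""
    records = []
    for slip in slips:
        top_line = (slip.get("top_line") or "").strip()
        if slip.get("date"):
            # A date always starts a new record.
            records.append(_new_record(reel, len(records) + 1, slip, False))
        elif records and top_line.lower().startswith(CONTINUATION_MARKERS):
            # The "cont." phrase and the remaining text join the notes.
            records[-1]["notes"].append(top_line)
            records[-1]["notes"].extend(slip.get("notes", []))
        else:
            # Neither dated nor a continuation: new ID, flagged for review.
            records.append(_new_record(reel, len(records) + 1, slip, True))
    return records
```

Sequence numbers are taken from the number of records created so far, so a continuation slip never consumes an ID of its own.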

Encoding Guide. Record ID

Record ID
Each record should be assigned a Document ID number, recorded in the id attribute of the <x> element. Each ID should be composed of the 2-digit reel number plus a 4-digit sequential record number, assigned in the order in which the records appear on the reel.

<x id="160075">
</x> indicates reel number 16, record number 75
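
The ID format can be expressed as a pair of small Python helpers. This is a sketch only; the function names `make_record_id` and `split_record_id` are hypothetical, and the range checks assume 42 reels and at most 9999 records per reel, as implied by the 2-digit and 4-digit fields.

```python
def make_record_id(reel, sequence):
    """Build the 6-digit ID: 2-digit reel number plus 4-digit record number."""
    if not 1 <= reel <= 42:
        raise ValueError("reel number must be between 1 and 42")
    if not 1 <= sequence <= 9999:
        raise ValueError("record number must fit in 4 digits")
    return "{:02d}{:04d}".format(reel, sequence)


def split_record_id(record_id):
    """Split a 6-digit ID back into (reel number, record number)."""
    if len(record_id) != 6 or not (record_id.isascii() and record_id.isdigit()):
        raise ValueError("record ID must be exactly 6 digits")
    return int(record_id[:2]), int(record_id[2:])


print(make_record_id(16, 75))
```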

There are two optional attributes that may be included with the ID tag: the z attribute and the r attribute.

Some records may have been crossed out in a Z pattern with red pencil. These slips should be transcribed like any others, with all appropriate tags. Their <x> tag should also include the z attribute with a value of “z”.

<x id="160075" z="z">

If a record raises any confusion or questions for the vendor, it should be marked for review by adding the optional r attribute, with a value of “r”, to its <x> tag. The MHS will address these records separately and follow up with the vendor as necessary.

<x id="160075" r="r">
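
For illustration, the optional attributes can be attached when the record element is created. The Python sketch below uses the standard library's ElementTree; the function name `make_x` and its keyword arguments are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_x(record_id, cancelled=False, review=False):
    """Create an <x> element with the optional z and r attributes."""
    attrs = {"id": record_id}
    if cancelled:
        attrs["z"] = "z"  # slip crossed out in a Z pattern
    if review:
        attrs["r"] = "r"  # slip marked for review by the MHS
    return ET.Element("x", attrs)


element = make_x("160075", cancelled=True)
print(ET.tostring(element, encoding="unicode"))
```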

The guidelines for the z and r attributes represent a change from the original in the RFP process.

Encoding Guide. Root Tag

XML file
Each reel of microfilm will be represented by one XML file, coded with the root tag <rl> and a no attribute holding the reel number, from “01” to “42.”
<rl no="01"> indicates reel number “1”

Encoding Guide. Tag Set

General Record/Slip Structure
Each record is recorded on a four (4) by six (6) inch slip of paper and each slip of paper is presented on one frame of microfilm. Each record or slip can be broken down into sixteen (16) categories:

REEL NUMBER <rl no="00"> Includes an attribute of the reel number, based on the chart, ranging from no="01" to "42".

SLIP/RECORD ID <x id="000000"> 2-digit reel number, 4-digit sequential record number; Every record should be assigned an ID number and it should be tagged <x id="000000"> (“420023” would be reel #42, record #0023).

DATE <d> Includes any or all of day, month, year in characters, numerals and brackets; also may include range of dates or “n.d.” for “no date”

PLACE <pl> Includes any or all of city, state, or country

CODE <c> A combination of letters and numbers, always appearing in the top right corner

AUTHOR <a> First name, last name or initials; may also include more than one name. Anything to the left of the first instance of the word “to” should be rendered as AUTHOR.

TO <to> “to”, first name, last name or initials; and any other text following. All text following the first instance of the word “to” (and including the word “to”) should be rendered as TO

TITLE <ti> Anything on the second line that does not clearly fit into the AUTHOR or TO categories.

LENGTH <l> Number and “p.”

FORMAT <f> MS, Xpr, EnlPr, xerox, microfilm, photo, and any text that follows.

PRINTED <pr> Publication information, anything that starts with the word “Printed”

NOTES <n> Any information that does not fit into the above categories.

SERIES <s> Roman numerals “II” or “III” with Arabic number following handwritten in the bottom right corner should be rendered as either “II” or “III”--the following numbers should be disregarded--and tagged <s>

HANDWRITING <h> Any slip with hand-written notes or corrections, transcribe if legible at the place it is written; otherwise mark illegible handwriting with three asterisks (***) and tag with <h>

CANCELLED Z <z/> Any slip that has been crossed out with a zig-zag mark, include a blank <z/> tag

REVIEW <r/> Any slip that needs clarification should be tagged for review by MHS staff with <r/> tag.
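
To show how the categories fit together, the Python sketch below assembles one record inside a reel container and serializes it. Several things here are assumptions: the helper name `build_record`, the order of child elements (the RelaxNG schema is not reproduced in this guide), and the sample record itself, which is invented by combining example values from the guide and does not describe a real slip.

```python
import xml.etree.ElementTree as ET

# Assumed element order, following the order of the category list above.
FIELD_TAGS = ("d", "pl", "c", "a", "to", "ti", "l", "f", "pr", "n", "s")


def build_record(record_id, fields):
    """Build an <x> record from a mapping of tag name to a list of values."""
    record = ET.Element("x", {"id": record_id})
    for tag in FIELD_TAGS:
        for value in fields.get(tag, []):
            # Repeatable categories such as <n> and <pr> get one element each.
            ET.SubElement(record, tag).text = value
    return record


sample = build_record("010075", {
    "d": ["4 July 1776"],
    "pl": ["Paris, France"],
    "a": ["John Adams"],
    "to": ["to Abigail Adams"],
    "l": ["2 p."],
    "f": ["MS"],
    "n": ["Note: See the letter from 5 March above for enclosure"],
})
reel = ET.Element("rl", {"no": "01"})
reel.append(sample)
xml_text = ET.tostring(reel, encoding="unicode")
print(xml_text)
```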

Encoding Guide. Conversion Basics

Reel Container
Each reel will be digitized as a single XML file.
Name it according to its reel number as shown in boldface in the Microfilm Coverage Chart:
Reel 01 1639 - 1777
Reel 02 1778 - 1780
Reel 03 1781 - 1782
Reel 04 1783 - 1784
Reel 05 1785 - 1787
Reel 06 1788 - 1793
Reel 07 1794 - 1796
Reel 08 1797
Reel 09 1798
Reel 10 1799 - 1800
Reel 11 1801 - 1808
Reel 12 1809 - 1811
Reel 13 1812 - 1813
Reel 14 1814
Reel 15 1815
Reel 16 1816
Reel 17 1817
Reel 18 1818
Reel 19 1819
Reel 20 1820
Reel 21 1821
Reel 22 1822
Reel 23 1823
Reel 24 1824
Reel 25 1825
Reel 26 1826 - 1827
Reel 27 1828 - 1830
Reel 28 1831 - 1834
Reel 29 1835 - 1837
Reel 30 1838 - 1840
Reel 31 1841 - 1842
Reel 32 1843 - 1845
Reel 33 1846 - 1851
Reel 34 1852 - 1860
Reel 35 1861
Reel 36 1862
Reel 37 1863
Reel 38 1864
Reel 39 1865 - 1866
Reel 40 1867 - 1870
Reel 41 1871 - 1879
Reel 42 1880 - 1895, undated
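
The coverage chart can also serve as a sanity check while keying: the year on a slip should fall within the range of the reel being transcribed. The Python sketch below encodes the last year of each reel from the chart; the function name `reel_for_year` is hypothetical, and routing undated slips to reel 42 is an assumption based on the chart's last row.

```python
from bisect import bisect_left

# Last year covered by each reel, reels 01 through 42, from the chart above.
REEL_END_YEARS = [
    1777, 1780, 1782, 1784, 1787, 1793, 1796, 1797, 1798, 1800,
    1808, 1811, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820,
    1821, 1822, 1823, 1824, 1825, 1827, 1830, 1834, 1837, 1840,
    1842, 1845, 1851, 1860, 1861, 1862, 1863, 1864, 1866, 1870,
    1879, 1895,
]


def reel_for_year(year):
    """Return the reel number whose coverage includes the given year."""
    if year is None:
        return 42  # assumption: undated slips are filed on the last reel
    if not 1639 <= year <= 1895:
        raise ValueError("year is outside the chart's coverage")
    # The first reel whose end year is at or after the given year.
    return bisect_left(REEL_END_YEARS, year) + 1


print(reel_for_year(1776))
```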

Encoding Standard
UTF-8: Encode all characters in UTF-8. If this is not possible, use numeric Unicode character references. Do not use (X)HTML named entities.
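
Where UTF-8 output is not possible, the fallback to numeric character references can be sketched in Python with the built-in `xmlcharrefreplace` error handler. The function name `to_ascii_with_char_refs` is hypothetical. Note that this step only replaces non-ASCII characters; markup characters such as `&` and `<` in the text still need to be escaped separately.

```python
def to_ascii_with_char_refs(text):
    """Replace each non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_with_char_refs("Café"))  # prints Caf&#233;
```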

The MHS will provide a RelaxNG schema for validation of files.

Encoding Guide. Introduction

Project Goal
The goal of the Adams Papers Control File Digitization project is to convert all 108,400 paper records that make up the “Control File” into an XML database, made available at the Massachusetts Historical Society (MHS) website. The vendor will tag the data using abbreviated tags created by the MHS staff, which will be transformed later into a formal schema.

Record Content
The Control File is a paper catalogue used by the Adams Papers Editorial Project. It represents approximately 108,400 documents in the Adams Family Papers archive held by the MHS. The entire run of paper records has been photographed on 16mm microfilm and those 42 reels of microfilm will be used by the data entry vendor to convert the content into 42 XML files.

Each record (which represents a document) is recorded on a four (4) by six (6) inch slip of paper and each slip of paper is presented on one frame of microfilm. The information on each slip of paper can be broken down into 14 (fourteen) content tags. The data entry vendor will tag the data based on these fourteen categories. The project staff at the MHS will parse those categories further, as needed.

Thursday, December 18, 2008

MHS awarded NHPRC grant

This fall, the Massachusetts Historical Society was awarded a grant from the National Historical Publications and Records Commission (NHPRC) to digitize the Adams Papers item-level paper catalog. The fifty-year-old catalog represents over 100,000 documents held by the MHS, other repositories, and private owners, and reflects the known universe of the private and public correspondence of Presidents John and John Quincy Adams, their wives, children, and extended family. The two-year project will produce an online XML database of the catalog, available at the MHS website, as well as a viable strategy and methodology for other institutions to digitize their own item-level paper catalogs. This blog will serve as a resource for project staff and interested researchers to track the progress of the project and to refer to all technical documentation and project directives. We hope the blog will also provide a forum for researchers, librarians, and archivists (and any other interested parties!) to ask questions and offer feedback as the catalog grows.