Brought to you by the Massachusetts Historical Society

"I have nothing to do here, but to take the Air, enquire for News, talk Politicks and write Letters."

John Adams to Abigail Adams, 30 June 1774

Friday, November 20, 2009

Place names and attributes

The element for place includes an attribute for a specific authority-controlled location. Thus, text that appears as "35 Court Street, Philadelphia" should be tagged as

<place location="philadelphia">35 Court Street, Philadelphia</place>

In controlling the attribute names in a separate database, all locations should (when possible) be listed at the city level. The default city is assumed to be in Massachusetts and then the U.S. If there is a duplicate, then you should add a hyphen (with no spaces) and the two-letter state postal code. If it is a duplicate name in a foreign country then you should add a hyphen and the full country name.

For example, London is understood to be "london". Plymouth on its own is assumed to be in Massachusetts ("plymouth"), so Plymouth, England should read "plymouth-england" and, likewise, Plymouth, New Hampshire is "plymouth-nh".

Check the place name directory to confirm the authority spelling. When adding a new place name to the directory, use the Getty Thesaurus English spelling.

If only the county name is known, render it with a hyphen, all lower case, e.g. "suffolk-county".

Lastly, sometimes there are several locations listed. The first city name should be in the main location attribute. Subsequent place name attributes can be added as empty tags.
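
The naming rules above can be sketched in a few lines of Python. This is only an illustration, not part of our tool chain: the function name location_key and its parameters are made up for the example, and replacing spaces with hyphens in multi-word names is an assumption extrapolated from "suffolk-county".

```python
def location_key(city, state=None, country=None):
    """Build a @location value following the rules described above.

    Hypothetical helper: the name and signature are illustrative only.
    """
    def norm(name):
        # Assumption: spaces become hyphens, as in "suffolk-county".
        return "-".join(name.lower().split())

    key = norm(city)
    if state:
        # Duplicate U.S. name: hyphen plus two-letter postal code.
        key += "-" + state.lower()
    elif country:
        # Duplicate foreign name: hyphen plus full country name.
        key += "-" + norm(country)
    return key


print(location_key("London"))
print(location_key("Plymouth"))
print(location_key("Plymouth", country="England"))
print(location_key("Plymouth", state="NH"))
print(location_key("Suffolk County"))
```

Whether a given name counts as a duplicate still has to be decided by checking the place name directory; the sketch only formats the value once that decision is made.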

Tuesday, November 10, 2009

Checklist for Encoding Level 1

We are currently 35% through Encoding Level 1 which involves inputting proofreading corrections, verifying the basic code and creating the first of several authority look-up tables--this one for place names. The work is still broken down by reel. The following is the checklist for each record.

To open a new file for encoding level 1:

--Open new XML file through the Tortoise SVN Directory at C:\Repositories\slipfile\xml\proofread and confirm that the file name ends in “_level1”

--Confirm that FULL_schemaV2_MR.rng is associated through the Tortoise SVN Directory at C:\Repositories\slipfile\xml\schemas (reassociate if the red underlines don’t appear)

--Run the XSL transformation copyformat.xsl; overwrite the new file under same name.

--Commit these changes by right clicking on the slipfile folder on your C:\ drive and selecting "SVN Commit" from the drop down menu. Select the files to commit, click "OK" and then type in your password.

To open a working file for encoding level 1:

--Open XML file through the Tortoise SVN Directory at C:\Repositories\slipfile\xml\level1

--Enter changes and save periodically to the Tortoise SVN Directory at C:\Repositories\slipfile\xml\level1

--When finished with work, commit changes by right clicking on the slipfile folder on your C:\ drive and selecting "SVN Commit" from the drop down menu. Select the files to commit, click "OK" and then type in your password.
For each record:

Input proofreading file changes

--Confirm @color, enter if absent (if you delete the entire @color and hit the space bar, a drop down menu will appear with possible attributes and values). The choices are: 1pink, 2yellow, 3white, 4blue, or 5goldenrod

--Confirm <place>, remove unnecessary information from @location and confirm correct English spelling; confirm place name against the Excel spreadsheet list and add new authority names to the list, e.g. “Philadelphia, 31 South Street” should have a @location value of “Philadelphia”

--Confirm <code>, use drop down prompts to fill in attributes when necessary. Codes that are not @type=Accession, Letterbook, Miscellany or Diary should be encoded as “General” under the @type, e.g. “TS Wills and Deeds”

--Confirm <length>, enter value in @pages if absent: add multiple page numbers listed, i.e. if there is an enclosure and <length>2 p., 3 p. </length> then the total value for @pages= “5”.

--Confirm <copy>, enter value for @format. The copyformat.xsl should have populated most of these. When there are two values, one for MS and one for XPr (or the like), copy @format should have “Manuscript” as value and the subsequent XPr’s should be encoded as a note

--Confirm <date>, verify that populated dates are correct, confirm all attributes are present as necessary, enter @to for date ranges and any other appropriate @.

Most of the dates should be automatically populated, except for date ranges. A date range will have the first date entered as an @when; the encoder must enter the end date in @to as year-month-day. For unknown months or days, enter “99”. For conjectural or corrected dates, encode the corrected date, e.g. "1 January 1799 [i.e. 1800]" should be @when="1800-01-01". For questions, check the Master Encoding Guide.

--Add new slips found in paper file, create new ID number at end of reel

--Cross check any changes in the Corrections Binder (may be redundant, but important!)

Wednesday, November 4, 2009

Proofreading Complete...Finally

Yes, the proofreading phase has finally ended. Yesterday, I finished doing a paper-to-paper cross check on the 109,348th slip--and then some. While the vendor counted 109,348 records, when all is said and done, the number may be off by several hundred slips. The proofreading phase did not just check, character by character, the transcription completed by our vendor; it also served as a slip-by-slip inventory of the entire catalog. The microfilm that the vendor used to transcribe was created in 2001, and since that time the editorial staff has continued to find more documents to add to our archive. The number of additions is not yet known, but it will probably number in the hundreds. These new slips will all be added to the XML files during the first phase of encoding, which is well under way.

While the proofreading phase has been the most unpredictable aspect of the project thus far, it has been a critical component to complete. Ensuring the integrity of the database by making the content as accurate as the tagging is important not just for the editors but for all online users of the archive. A catalog is only as good as its accuracy--if we can't trust it then no amount of fancy web coding will encourage people to use it!

I have to give proper acclamation to the Control File team:
  • Jim the Proofreader/Encoder proofread 57,157 slips
  • Susan the EAD Gal proofread 14,878 slips
  • and I clocked in about 37,000 (give or take a few)
Cheers!

Monday, October 19, 2009

Encoding excitement and other things

I feel I should apologize for not posting recently, although to whom I would be apologizing I never can tell. Anyway, encoding for level 1 actually began September 1 and has been proceeding steadily. We have Jim (Connolly) the Encoder working four days a week and Susan (Martin) the EAD Gal encoding one day per week. We have moved through seven reels and are picking up steam. I will follow up with another post about the XSL transformation we ran with the help of a very clever consultant and the checklist for encoding level 1.

As to my time, I am just finishing up the proofreading and coming off the ADE Annual Meeting in Springfield, IL. Ondine LeBlanc (Director of Publications) and I presented a workshop on getting legacy content out of MS Word and into XML and I was able to share a little bit about this project as well. Overall, we were very pleased with the participation and follow-up questions. I, for one, am very interested to see what manner of digital resources come out of the documentary editing community over the next few years.

Now I hope to get down to the busy work of encoding too... stay tuned.

Wednesday, August 12, 2009

Phase 2 Timeline

Our project to digitize the Adams Papers Control File began in January 2009. We originally planned on spending a few short months on proofreading before moving into encoding. However, proofreading 109,348 slips, one by one, has taken a little longer than we anticipated. This phase of the work is vitally important, though, and we have continued doggedly pursuing our final reels. We have found important corrections and updates and have begun entering those changes into the XML files now. The input of corrections has been folded into the first phase of encoding and so far is going smoothly.

The first seven months of the project were also devoted to schema development (see Master Encoding Guide), and this summer we secured the services of an excellent XSL consultant to write an XSL transformation to convert our abbreviated vendor schema into the full schema and populate much of the consistent data automatically. The XSLTs have been very helpful and we hope to build on them to automatically generate other data as we work through the initial encoding.

Thus our schedule for 2009:
  • January-August: proofreading (project manager, proofreader, EAD coordinator)
  • March-June: schema development (project manager and web developer)
  • July-August: XSL development and contract work (project manager, web developer, and consultant)
  • August-December: encoding level 1 (project manager, encoder, EAD coordinator)
  • September-December: XSL development (project manager and web developer)

Master Encoding Guide: Record Basics

<record>
The control file contains 109,348 color-coded slips. While each slip refers to a document, the slip is not exclusive. One document may have several slips representing it. For example, one letter written from John to Abigail may exist as a letterbook copy (white), a recipient’s copy held in the Adams Family Papers archive (pink), a contemporary copy held by the National Archives (yellow), and a printed version from the Boston Chronicle (blue). This same letter may have copies in a dozen institutions, and thus garner a dozen separate yellow accession slips. The overall database, then, is a database of the slips, not of the documents. Because of this data model, all text on the slip is retained as it is written and controlled vocabulary should appear in the attributes. Thus, a recipient that reads "to JA, with 1 enclosure (copy of Lovell to Dana, 6. Jan. 1781)" will appear as it is written but the attributes will be controlled for searching and sorting purposes.

<recipient> to <person ref="JA">JA</person>, with <enclosure ref="030001"> 1 enclosure (copy of <person ref="lovelljames">Lovell</person> to <person ref="danafrancis">Dana</person>, 6. Jan. 1781)</enclosure></recipient>

Required Attributes

@id. Each slip is assigned a unique ID number. It is an attribute of the Record element and is required. To learn how the number is constructed and assigned, see Record ID.

@color. The color of the slip conveys a great deal of information in a quick, easy way, so this information will be retained. The vendor was unable to record this during data entry because they were working from black and white microfilm. The color assignments are made in the first phase of work, proofreading, and will be input during the second phase, tag refinement. There are five color choices (1pink, 2yellow, 3white, 4blue, 5goldenrod) to choose from; records will not validate without a color selected.

Optional Attributes.

@z. In the case of redating a letter or correcting authors or recipients, slips are struck through with a large Z and a reference to the newly created slip. This allows for proper tracking of a letter that may have previously been published under the incorrect information. The original slip is marked with the @z (with a value of "z") and the new cross-reference is tagged with a <zref> note tag, see Notes below. [Occasionally, editors have underlined a date and drawn a line to a note on the slip. These are not cancelled but rather cross-references.]

@r. The review tag was introduced for the vendor to flag a record that presented confusion on their part. Almost all of these questions have been resolved at the proofreading phase, however the attribute will remain an option if needed throughout the project.

@language. The large majority of documents described in the control file are written in English, however there is a need to track the number of foreign language documents. When possible, the @language will include any language other than English. The codes will mirror those used in MARC21.
Within the structure of the schema, each record contains <ref name="record-contents"/>. This is a cross-reference in the <record> element, and the contents are detailed under <define name="record-contents">.

Master Encoding Guide: Value-Added Information

Value-Add information

The value-added information will be encoded in Encoding Level 2. These will add links to the reel numbers of the Adams Papers Microfilm, and unique URLs to the Adams Papers Digital Edition, and the Adams Electronic Archives. The details will be determined at a later point.

Master Encoding Guide: Notes

Record Content: Notes

Many records have a line or several lines of text on the bottom half of the slip that does not conveniently fit into any other categories. The vendor was asked to transcribe all of this text and code with the <n> tag. If there was a logical paragraph break between one note and another, each line was tagged separately. There is no limit to the number of <n> tags in a record. In the full schema, the <n> tag will be parsed to control for different kinds of notes. This section may be expanded as we move through Encoding Level 1, but thus far we have identified ten distinct types of notes: cross-reference, see-reference, content, collection, auction, quote, z-reference, enclosure, subject, and internal. Examples will be shown as we work through encoding.

Master Encoding Guide: Printed

Record Content: Printed

Many records will have a line or several lines of text following the word “Printed.” All of this text was coded by the vendor with the <pr> tag. If there was a logical paragraph break between one printed citation and another, each citation was tagged separately. There is no limit to the number of <pr> tags in a record. In the full schema, the <pr> tag will be parsed to allow for tracking of bibliographic information, including titles and authors. This level of encoding is reserved for Level 2.

Example:
<printed>Printed: <ref target="AFC2" href="linktoAbigail">AFC vol. 2</ref>:345</printed>

<printed>Printed: <ref target="BcolCent">Boston Columbian Centinel</ref>, 18 August 1785</printed>

The @target and the @href will link to a separate database of the Adams Papers short titles and to the bibliographic record in ABIGAIL, the MHS online catalog. The section of the full schema may also change as we learn more about the capabilities of the full database.

Master Encoding Guide: Title

Record Content: Title Element

If there is no clearly defined "author to recipient" statement, the second line under the date should be tagged as a title, <ti>. To conform to the RelaxNG schema, there should always be either an AUTHOR and/or a TITLE present in each record.

<title>
As with the author and recipient elements, a title element may include child elements for <person>, <corporate>, and <office>, all with target attributes to separate directories. As with the recipient element, a title element may also include optional child elements for <enclosure>, <content>, and <ref> elements. The <ref> element includes a @target.

Example:

<title> Letter of credence from Congress <person target="huntingtonsamuel">(Samuel Huntington</person>, <office target="presidentofcon">President)</office> to the <office target="stadtholder">Stadtholder of the Netherlands.</office></title>

<legaltitle><docket>
A child element unique to <title> is the <legaltitle> element. Within <legaltitle> there is another child element, <docket>.

Example:

<title> <legaltitle>Minutes on court cases: <docket>J[oshua] Green, Admr. v. G[eorge] Green</docket> <docket>Malcom v. Mackay</docket> </legaltitle> </title>

Master Encoding Guide: Recipient

Record Content: Recipient

<recipient>
Most records consist of an author and a recipient. The author is loosely defined as any text before the first instance of the word "to." The recipient, then, is any text following the first instance of the word "to." The vendor was instructed to include all text following "to" in the recipient tag. Frequently, the text following a recipient’s name includes statements about content, enclosures or references to other letters. During Encoding Level 2, these other elements will be separately tagged. The recipient element includes the same child elements as the author element, that is, <person>, <corporate>, and <office>. The same guidelines and rules apply to these child elements as in the author element (see above). In addition, three other child elements may be used in <recipient>.

<enclosure>
A letter often includes enclosures. The text of the record will expressly state that an "enclosure" is included. These may also contain a target reference to another letter and there may be multiple references. These are tagged with <ref @target>.

<content>
Similar to an enclosure statement, the recipient text block may also include a note regarding content. This is a child element of the recipient element.

<ref>
There may also appear miscellaneous references to other documents or persons that can be tagged with a <ref> tag with a target to a separate directory or separate document.

Example:

<recipient>to <person target="washingtongeorge">George Washington</person>: <content>Opinion re salaries of American diplomats,</content> with several <enclosure>enclosures (in which are found, on p. 9-10, copy of <ref target="000000">JA to John Jay, 13 May 1785,</ref> &amp; on p. 10, copy of <ref target="00000"> John Jay to JA, 3 Aug. 1785)</ref></enclosure></recipient>

Master Encoding Guide: Author

Record Content: Author Element

<author>
Every record has an author name (or initials) on the second line under the date. The vendor was instructed to consider all text that appears before the first instance of the word "to" as an author and code under one author <a> tag. For most records, there will be one name (or initials) before the first instance of the word "to." In the full schema, there are three optional child elements to encode within the parent author tag: person, corporate, and/or office. These tags can be used together or separately. See below for examples.

<person>
Refers simply to a single individual. This element includes a target reference to a separate database of persons with birth and death dates. During encoding (Level 2), encoders will assign a target reference pulled from a list of the most commonly found names. In addition, any names not found on the list will be assigned a target reference. Construction of the target reference should be all lower case, "lastfirstmiddletitle." This will allow for cleaner sorting in the names database.

<corporate>
The corporate tag borrows from the standard cataloging rules for a corporate author. That is, a group of individuals, such as the U.S. Congress.

<office>
The office tag is used to distinguish an individual author from someone writing in the capacity of their office, such as the President of Congress, writing official documentation in his capacity as the President of Congress, but not necessarily as his own person. For example, John Adams may write to his colleague, Samuel Huntington, or he may write to the President of Congress who happens to be Samuel Huntington for several months.

Example:
<author><person target="huntingtonsamuel">Samuel Huntington,</person><office target="presidentofcon"> President of Congress,</office></author>

<author><corporate target="houseofrep">House of Representatives</corporate></author>
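
The "lastfirstmiddletitle" construction described under <person> can be sketched as a small Python helper. The function name person_target is hypothetical, and dropping every character other than the letters a-z (spaces, periods, accented letters) is an assumption; the guide only states that the value is all lower case.

```python
import re


def person_target(last, first="", middle="", title=""):
    """Join name parts in last-first-middle-title order, lower-cased.

    Hypothetical helper; stripping non a-z characters is an assumption.
    """
    joined = (last + first + middle + title).lower()
    return re.sub(r"[^a-z]", "", joined)


print(person_target("Huntington", "Samuel"))
print(person_target("Lovell", "James"))
```

Values built this way match the targets used in the examples, such as "huntingtonsamuel" and "lovelljames".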

Master Encoding Guide: Series

Record Content: Series

Some records will have a handwritten code on the bottom right corner consisting of a Roman numeral, either “II” or “III”, followed by an Arabic number. Only the Roman numbers should be transcribed and should be tagged with <series>. The encoder need only verify that this code is valid. If not, check with project manager for possible slip error.

Example:

<series>II</series>

<series>III</series>

Master Encoding Guide: Copy

Record Content: Copy

Many records will include a format designation, usually on the fourth line, following the page number. The most common text will be: MS, Xpr, EnlPr, Xerox, microfilm, Photostat. These abbreviations and any text following should be coded with the <copy> tag and the information will be controlled using @format. There are only four @formats to choose from: Photocopy, Manuscript, Microfilm, Digital Image.

Example:

<copy format="Photocopy">Xerox print</copy>

<copy format="Manuscript">MS</copy>

<copy format="Microfilm">microfilm</copy>

<copy format="Digital Image">Electronic archive scan</copy>

Master Encoding Guide: Length

Record Content: Length

Many records will include the document length, generally on the third line, below the author and recipient line. The convention most records will follow is an Arabic number and "p." for "pages." All text on this line should be coded as <length pages="3">. If there are multiple page numbers listed to signify a letter and an enclosure, the pages are added up and entered as one value.

For example:

<length pages="4">2 p., 2 p.</length>
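
As a rough illustration of the page arithmetic, the following Python sketch adds up every "N p." count in the text of a length line and wraps the result in the element. The function names total_pages and length_element are invented for this example, and the pattern assumes the usual "Arabic number plus p." convention; slips that deviate from it would still need manual review.

```python
import re
from xml.sax.saxutils import escape


def total_pages(length_text):
    """Sum every '<number> p.' count found in the text of a length line."""
    counts = re.findall(r"(\d+)\s*p\.", length_text)
    return sum(int(n) for n in counts)


def length_element(length_text):
    """Wrap the untouched slip text in <length> with a computed @pages."""
    return '<length pages="{}">{}</length>'.format(
        total_pages(length_text), escape(length_text)
    )


print(length_element("2 p., 2 p."))
print(total_pages("2 p., 3 p. "))
```

The slip text itself is left as written; only the @pages value is derived.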

Master Encoding Guide: Code



Record Content: Code

About half of all records include a code found on the upper right corner of the slip. The codes are used to track individual accession documents, letterbook documents, or diary and miscellany documents. The codes are a combination of letters and numbers. The vendor was asked to encode all of this text with <c> and we later ran an XSL transformation to parse out the distinct pieces of information. This will allow us to track the documents by their accession and letterbook numbers as well as to separately track the institutional codes for accessioned documents.

There are four types of codes and each one is also associated with slip color: letterbook=white, accession=yellow, diary=pink, and miscellany=pink.

Letterbook codes all begin with "Lb" and have up to a five-digit number following it. These slips are white (<record color="3white">). They have an optional @author as well, but this is only rarely applicable:
Example:

<code type="letterbook" number="1234">Lb1234</code>

<code type="letterbook" number="21" author="JQA">JQA/Lb/21 [end]</code>

Accession codes are made up of an institution code (where the original is housed and from whom we received a copy) and up to a six-digit number (the document tracking number). These slips are almost always yellow (<record color="2yellow">). Occasionally, an accession code is found on a blue slip. This is considered an editor error in the creation of the slip and will be reconciled later. The code should be rendered the same.
Example:

<code type="accession" repository="DNA" number="2589">DNA:2589</code>

<code type="accession" privateowner="MBSmith" number="2589">MBSmith:2589</code>

Miscellany codes are essentially shelf marks for their physical location, constructed with M for Miscellany, initials (usually JA or JQA) for the author, and a number for the volume.
Example:

<code type="miscellany" author="JA" number="78">M/JA/78</code>

Diary codes are also essentially shelf marks for their physical location, constructed with D for Diary, initials (usually JA or JQA) for the author, and a number for the volume.
Example:

<code type="diary" author="JQA" number="12">D/JQA/12</code>

General codes. Some slips, usually blue, have a code associated only with an institution but do not have a number attached because we do not have a physical copy to accession and track.
Example:

<code type="general">NN</code>
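
To show how the code patterns above might be told apart, here is a minimal Python sketch. It is not the XSL transformation we actually ran; the function name parse_code and the regular expressions are assumptions based only on the examples in this section. In particular, it cannot tell a repository from a private owner, so "MBSmith:2589" comes back with a repository value that an encoder would have to correct to @privateowner.

```python
import re

# Patterns inferred from the examples above; all are assumptions.
PATTERNS = [
    ("letterbook",
     re.compile(r"^(?:(?P<author>[A-Z]+)/)?Lb/?(?P<number>\d{1,5})\b")),
    ("miscellany", re.compile(r"^M/(?P<author>[A-Z]+)/(?P<number>\d+)\b")),
    ("diary", re.compile(r"^D/(?P<author>[A-Z]+)/(?P<number>\d+)\b")),
    ("accession",
     re.compile(r"^(?P<repository>[A-Za-z]+):(?P<number>\d{1,6})\b")),
]


def parse_code(text):
    """Return a dict of <code> attributes guessed from the slip text."""
    cleaned = text.strip()
    for code_type, pattern in PATTERNS:
        match = pattern.match(cleaned)
        if match:
            attrs = {"type": code_type}
            # Keep only the named groups that actually matched.
            attrs.update(
                {k: v for k, v in match.groupdict().items() if v is not None}
            )
            return attrs
    # Anything unrecognized falls back to the general type.
    return {"type": "general"}


for sample in ["Lb1234", "JQA/Lb/21 [end]", "DNA:2589",
               "M/JA/78", "D/JQA/12", "NN"]:
    print(sample, parse_code(sample))
```

Anything the patterns do not recognize falls through to "general", which mirrors the checklist rule for codes that are not accession, letterbook, miscellany or diary.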

Master Encoding Guide: Place

Record Content: Place

If a place is included on the slip, it will follow the date on the top line. Place names should be transcribed as they appear, either abbreviated or spelled out, tagged with <place> element. Place names are controlled through two attributes: location and axis.

Required Attributes

@location.
If a place name appears on the slip, it is usually rendered exactly as it appears in the document, therefore "The Hague" might be written in the French, "La Haye," if it appeared that way on the original manuscript. The value of the @location should have the English spelling of the city only. Any other place information, for example a street address, will be found in the text of the record. If two city locations are included, both cities will be listed as the value of @location, separated with one white space. When there is a question about the proper English spelling of a city, consult the Getty Thesaurus of Geographic Names.

Example: <place location="The Hague">La Haye</place>
<place location="Philadelphia">Philadelphia, 32 South Street</place>
<place location="Braintree Quincy">Braintree and Quincy</place>

Most of the city names have been automatically populated using XSL transformation. Encoders need to just check that the values are English city names only.

Optional Attributes

@axis.
There is an optional attribute to include the axis point for a given location. This information is provided in the Getty Thesaurus. The rules for encoding the longitude and latitude axis points will be determined during encoding level 2.

Still Proofreading... but Encoding Begins!

It has been a busy couple of months since the last post. While continuing with the proofreading, which has taken much longer than originally estimated, we have developed a full schema (also in RelaxNG) for encoding the data and have hired a consultant to develop some nifty XSL transformations to move the data from our short vendor schema to the full schema. Following this post I will begin uploading the master encoding guide that provides a detailed narrative of each element and examples of the mark-up.

Master Encoding Guide: Dates


Record Content: Dates

<date> The date element is the most important piece of information on each slip. The physical card file, as well as the physical archive collection and the microfilm, is arranged chronologically. It is the first access point to the documents. It is also the most complicated, due to the variety of date formats and the rules regarding sorting. The date section of the full schema is derived from the unique dating structure of the control file, as it was originally organized and inventoried. As noted above, all text in the date field is retained as is and the attributes control the data for sorting and searching.

Required Attributes
The <date> requires a choice of one of the following attributes:

@when.
The most common type of date found in the control file is an exact date with a known day, month, and year. These are controlled in the @when with a date parameter of year-month-day (YYYY-MM-DD). If either the month or day is uncertain, encode that part of the @when with "99", e.g. YYYY-MM-99 or YYYY-99-99. This will allow for proper sorting: complete dates come before incomplete dates.

Example: <date when="1771-12-21">21 Dec. 1771.</date>
<date when="1776-07-99">July 1776</date>
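
A short Python sketch shows why the "99" placeholder sorts the way we want. The helper name when_value is made up for this illustration; it simply pads known parts with zeros and fills unknown parts with "99", so that plain string sorting puts complete dates ahead of incomplete ones within the same month or year.

```python
def when_value(year, month=None, day=None):
    """Format an @when value, using "99" for an unknown month or day."""
    month_part = "{:02d}".format(month) if month else "99"
    day_part = "{:02d}".format(day) if day else "99"
    return "{:04d}-{}-{}".format(year, month_part, day_part)


values = [when_value(1776, 7), when_value(1776), when_value(1776, 7, 4)]
print(values)
# Plain string sorting places the complete date first.
print(sorted(values))
```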

@when and @ante.
Any date that is known to be written before a certain date using the term "ante." These are also controlled with the standard date parameter year-month-day.
Example: <date when="1790-07-11" ante="ante">ante 11 July 1790</date>

@when and @post.
Any date that is known to be written after a certain date using the term "post."
Example: <date when="1790-07-11" post="post">post 11 July 1790</date>

@when and @to.
Any range of dates with a known beginning and end. These elements must be used as a group.
Example: <date when="1790-07-11" to="1790-08-31">11 July–31 Aug. 1790</date>

Optional Attributes
The following attributes may be added to any of the required attributes above:

@circa.
Any of the above four types of dates (when, when/ante, when/post, and when/to) that are preceded by a "ca." or "circa" should also include the @circa. The value is "yes" and @circa can be combined with any other date attributes.
Example: <date when="1745-10-21" circa="yes">ca. 21 Oct. 1745</date>

@conjectural.
Any date that appears in brackets is a conjectural date. The date may be conjectural for a variety of reasons (illegible, not present on the letter, supplied by context), but the key feature is the use of brackets. The value is "yes" and @conjectural may be combined with any other date attributes.
Example: <date when="1772-12-02" conjectural="yes">2 Dec. [1772]</date>

@noDate.
Any date field that includes the phrase "n.d." should have the @noDate. This may appear alone, but more often accompanies a supplied date in brackets. The @noDate may be combined with any other date attribute.
Example: <date noDate="yes" when="1773-06-16" post="post" conjectural="yes">n.d. [post 16 June 1773?]</date>

@rank.
The rank will be added using a separate XSL transformation. The ranking will be determined based on the number of attributes to allow for proper sorting.

A note on sorting.
The sorting rules for the database will follow the filing rules used in the physical paper file. Not only are the editors already familiar with these rules, they are the most logical method for the variety of dates—certain and uncertain, complete and incomplete—found in the control file. The original directive adequately described the sorting:

The file contains all ribbon copies of all four types of control slips. In cases where several slips represent different versions of the same document, the order of filing is as follows:

Pink (CDF), White (Letterbook), Yellow (Accession), Blue (Printed or Manuscript Lead)

Arrangement of slips in this file follows strict chronological order throughout the entire span of our documents, 1639-1889. Dates which contain all three elements (day, month, and year) precede those which are incomplete. Inclusive dates are filed under the earlier of the dates, but following all other slips for that day. (An exception to this rule is the filing of accounts with inclusive dates, which go under the later of the inclusive dates, but preceding all other slips for that day.) An example may be taken from the end of a theoretical year:
Ante 15 Dec. 1800
15 Dec. 1800
15-19 Dec. 1800
31 Dec. 1800
31 Dec. 1800-21 Jan. 1801
31 Dec. 1800-18 May 1801
Dec. 1800
1800
Ca. 1800
[1800?]
1800-1809
[post 1800]
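
The example list above can be reproduced with a simple sort key, sketched here in Python. This is not the @rank transformation itself, which has not been written yet: the rank numbers in RANKS are hypothetical and were chosen only so that this one example comes out in filing order. The sketch also ignores slip color and the exception for accounts with inclusive dates.

```python
# Hypothetical ranks, chosen only to reproduce the example order above.
RANKS = {"ante": 0, "exact": 1, "circa": 2,
         "conjectural": 3, "range": 4, "post": 5}

# The example slips, listed in filing order with assumed attribute values.
SLIPS = [
    {"label": "Ante 15 Dec. 1800", "when": "1800-12-15", "flags": {"ante"}},
    {"label": "15 Dec. 1800", "when": "1800-12-15", "flags": set()},
    {"label": "15-19 Dec. 1800", "when": "1800-12-15", "to": "1800-12-19",
     "flags": set()},
    {"label": "31 Dec. 1800", "when": "1800-12-31", "flags": set()},
    {"label": "31 Dec. 1800-21 Jan. 1801", "when": "1800-12-31",
     "to": "1801-01-21", "flags": set()},
    {"label": "31 Dec. 1800-18 May 1801", "when": "1800-12-31",
     "to": "1801-05-18", "flags": set()},
    {"label": "Dec. 1800", "when": "1800-12-99", "flags": set()},
    {"label": "1800", "when": "1800-99-99", "flags": set()},
    {"label": "Ca. 1800", "when": "1800-99-99", "flags": {"circa"}},
    {"label": "[1800?]", "when": "1800-99-99", "flags": {"conjectural"}},
    {"label": "1800-1809", "when": "1800-99-99", "to": "1809-99-99",
     "flags": set()},
    {"label": "[post 1800]", "when": "1800-99-99",
     "flags": {"post", "conjectural"}},
]


def sort_key(slip):
    """Sort by @when, then by the highest-ranked qualifier, then by @to."""
    flags = set(slip["flags"])
    if "to" in slip:
        flags.add("range")
    rank = max((RANKS[f] for f in flags), default=RANKS["exact"])
    return (slip["when"], rank, slip.get("to", ""))


# Start from reversed order to show the key restores the filing order.
for slip in sorted(SLIPS[::-1], key=sort_key):
    print(slip["label"])
```

The "99" placeholders do most of the work here: they push incomplete dates after every complete date in the same month or year, and the rank only breaks ties among slips that share an @when.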

Friday, April 24, 2009

Phase 1 Continues

Since late February, we have been downloading our completed XML files from the vendor and preparing them for proofreading. During this phase of work, the focus has been to check the accuracy of the transcriptions against the good old paper slips. After some hemming and hawing, our committee decided that paper-to-paper proofreading was still the best method, so we have run the XML files through an XSL transformation, producing a fresh paper copy that closely matches the original paper files. The major difference, of course, is that our new paper copies can fit about six records to a page. So, for the past six weeks (and continuing until it's done) several of us are spending the bulk of our time proofreading about 100,000 tiny slips of paper against about 20,000 larger pieces of paper. This stage, while tedious, is an important first step before the full encoding and data improvement.

For the project manager, sitting down with hundreds of records every day has been enormously helpful in developing the full schema. After much discussion, investigation, and trial and error, the project committee decided to develop a home-grown schema. A full discussion of the evolution of the schema and its latest iteration will follow in the next post--stay tuned!

Wednesday, April 22, 2009

Data Entry Complete!

After downloading the final reels on April 6, the data entry component of the project is now complete. Our vendor, Atlis Publishing & Graphics Services, which was acquired by Data Stream Content Solutions in January, delivered 42 XML files over six weeks. These files represent the 42 reels of microfilm that we sent them in January. DSCS converted the microfilm to JPG images and transcribed and marked up the text according to the encoding guide we prepared for them. Using a combination of programmatic and manual conversion, DSCS converted 109,348 records into XML. As they were completed, the files were uploaded onto their FTP server, from which we downloaded them every few days. They also transcribed the not infrequent handwriting--all to great success. In addition to the XML files, DSCS also sent us PDF files of all the images with the corresponding unique ID number they assigned each record in the encoding. This resource has been extremely helpful in proofreading.

Overall, we are very happy with their work and would highly recommend their services.

A note on estimates:
Our extensive planning over the past two years started with a major grant request as well as an RFP process, which forced us to calculate the number of records and approximate key strokes to better assess our options. Using Statistics 101 and a little common sense sampling, we estimated a total of 108,400 records (note the actual of 109,348) and an estimated keystroke per record of 247 (the actual turned out to be 246.625). Pretty good since we only had three giant file cabinets, a ruler, and a calculator to figure it out!

Thursday, February 19, 2009

Proofreaders: Note the Slip Color

During proofreading, please note on the XSL printout the color of each slip. In the left margin, mark "p" for pink, "y" for yellow, "b" for blue, "w" for white, or "g" for goldenrod. This information will be included as a controlled attribute in the record ID tag.
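
For anyone entering these marks later, the margin letters line up with the @color values listed in the Encoding Level 1 checklist. A minimal Python sketch of that lookup follows; the names MARGIN_TO_COLOR and color_value are invented for the example, and rejecting unknown marks with an error is a design assumption rather than anything our schema tooling does.

```python
# Margin marks (this post) mapped to @color values (Level 1 checklist).
MARGIN_TO_COLOR = {
    "p": "1pink",
    "y": "2yellow",
    "w": "3white",
    "b": "4blue",
    "g": "5goldenrod",
}


def color_value(mark):
    """Translate a proofreader's margin mark into an @color value."""
    key = mark.strip().lower()
    if key not in MARGIN_TO_COLOR:
        # Assumption: an unreadable mark should be flagged, not guessed.
        raise ValueError("unrecognized margin mark: {!r}".format(mark))
    return MARGIN_TO_COLOR[key]


print(color_value("p"))
print(color_value(" W "))
```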

Friday, January 30, 2009

AP Catalog featured in MHS eNewsletter

Check out the latest edition of @MHS, the eNewsletter of the Massachusetts Historical Society for an article on the new NHPRC-funded project to digitize the Adams Papers Catalog.

Update to Initialed and Dated Post-1954

The resolution to the previous question about editors who had initialed and dated the slips they created has been updated. While in many cases, this added piece of information may ultimately not be used, several editors felt that it is worth keeping in the rare event that we would need to track the creation of a slip. This most likely would occur with a blue lead slip that was created from incomplete auction information. 

New Resolution: Since the information will all be transcribed by the data entry vendor, information that is pertinent or useful only to Adams editors will be retained in a note field that will not show up in the public online presentation.

Tuesday, January 13, 2009

Initialed and Dated Post-1954

Question: As the control file was created over the past 55 years, there have been a variety of editors who "signed" each slip with their initials and the date. For example, "LCF, 14 March 1958." Presumably, this measure was taken to track the work if questions arose at the time. Alas, we cannot follow up with Mr. LCF in 2009.

Resolution: This information, usually found at the bottom corner of a slip, will be omitted from the final records and should be marked for deletion in the proofreading process. The project manager will make the deletion to ensure that it is not pertinent to auction sales, etc.