Brought to you by the Massachusetts Historical Society

"I have nothing to do here, but to take the Air, enquire for News, talk Politicks and write Letters."

John Adams to Abigail Adams, 30 June 1774

Friday, November 20, 2009

Place names and attributes

The <place> element includes a location attribute that holds a specific authority-controlled location. Thus, text that appears as "35 Court Street, Philadelphia" should be tagged as

<place location="philadelphia">35 Court Street, Philadelphia</place>

The attribute values are controlled in a separate database, and all locations should (when possible) be listed at the city level. An unqualified city name is assumed to be in Massachusetts first, and then elsewhere in the U.S. If the name duplicates that of another U.S. city, add a hyphen (with no spaces) and the two-letter state postal code. If it duplicates a name in a foreign country, add a hyphen and the full country name.

For example, London is understood to be "london". Plymouth on its own is assumed to be Plymouth, Massachusetts, so the attribute is simply "plymouth". Plymouth, England should read "plymouth-england", and likewise Plymouth, New Hampshire is "plymouth-nh".

Check the place name directory to confirm the authority spelling. When adding a new place name to the directory, use the Getty Thesaurus English spelling.

If only the county name is known, render it in all lower case with a hyphen, e.g. "suffolk-county".
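
As a rough illustration of the naming rules above, here is a minimal Python sketch. The function name location_value and its parameters are hypothetical (they are not part of our schema or the place name directory), and joining multi-word names with hyphens is an assumption that the rules above do not address.

```python
def location_value(name, state=None, country=None, county=False):
    """Build a lower-case authority value for the location attribute.

    name    -- city (or county) name, e.g. "Plymouth"
    state   -- two-letter postal code, given only to disambiguate a U.S. duplicate
    country -- full country name, given only to disambiguate a foreign duplicate
    county  -- True when only the county name is known
    """
    def slug(text):
        # Assumption: multi-word names are joined with hyphens so the
        # value contains no spaces.
        return "-".join(text.lower().split())

    value = slug(name)
    if county:
        value += "-county"
    if state:
        value += "-" + state.lower()
    elif country:
        value += "-" + slug(country)
    return value


print(location_value("Plymouth"))                     # plymouth
print(location_value("Plymouth", state="NH"))         # plymouth-nh
print(location_value("Plymouth", country="England"))  # plymouth-england
print(location_value("Suffolk", county=True))         # suffolk-county
```

The sketch does not decide whether a name is a duplicate; that judgment still comes from checking the place name directory.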

Lastly, sometimes there are several locations listed. The first city name should be in the main location attribute. Subsequent place name attributes can be added as empty tags.

Tuesday, November 10, 2009

Checklist for Encoding Level 1

We are currently 35% through Encoding Level 1, which involves inputting proofreading corrections, verifying the basic code, and creating the first of several authority look-up tables--this one for place names. The work is still broken down by reel. The following is the checklist for each record.

To open a new file for encoding level 1:

--Open new XML file through the Tortoise SVN Directory at C:\Repositories\slipfile\xml\proofread and confirm that the file name ends in “_level1”

--Confirm that FULL_schemaV2_MR.rng is associated through the Tortoise SVN Directory at C:\Repositories\slipfile\xml\schemas (reassociate if the red underlines don’t appear)

--Run the XSL transformation copyformat.xsl; overwrite the new file under the same name.

--Commit these changes by right-clicking on the slipfile folder on your C:\ drive and selecting "SVN Commit" from the drop-down menu. Select the files to commit, click "OK" and then type in your password.

To open a working file for encoding level 1:

--Open XML file through the Tortoise SVN Directory at C:\Repositories\slipfile\xml\level1

--Enter changes and save periodically to the Tortoise SVN Directory at C:\Repositories\slipfile\xml\level1

--When finished with work, commit changes by right-clicking on the slipfile folder on your C:\ drive and selecting "SVN Commit" from the drop-down menu. Select the files to commit, click "OK" and then type in your password.

For each record:

Input proofreading file changes

--Confirm @color; enter it if absent (if you delete the entire @color and hit the space bar, a drop-down menu will appear with possible attributes and values). The choices are: 1pink, 2yellow, 3white, 4blue, or 5goldenrod

--Confirm <place>: remove unnecessary information from @location and confirm the correct English spelling; check the place name against the Excel spreadsheet list and add new authority names to the list, e.g. “Philadelphia, 31 South Street” should have a @location value of “Philadelphia”

--Confirm <code>; use the drop-down prompts to fill in attributes when necessary. Codes that are not @type=Accession, Letterbook, Miscellany or Diary should be encoded as “General” under the @type, e.g. “TS Wills and Deeds”

--Confirm <length>; enter a value in @pages if absent. When multiple page counts are listed, add them together, e.g. if there is an enclosure and <length>2 p., 3 p. </length> then the total value for @pages=“5”.

--Confirm <copy>; enter a value for @format. The copyformat.xsl should have populated most of these. When there are two values, one for MS and one for XPr (or the like), the <copy> @format should have “Manuscript” as its value and the subsequent XPr’s should be encoded as a note

--Confirm <date>: verify that populated dates are correct, confirm that all necessary attributes are present, and enter @to for date ranges and any other appropriate attributes.

Most of the dates should be automatically populated, except for date ranges. A date range will have the first date entered as an @when; the encoder must enter the end date in @to as year-month-day. For unknown months or days, enter “99”. For conjectural or corrected dates, encode the corrected date, e.g. "1 January 1799 [i.e. 1800]" should be @when="1800-01-01". For questions, check the Master Encoding Guide.

--Add new slips found in paper file, create new ID number at end of reel

--Cross check any changes in the Corrections Binder (may be redundant, but important!)
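
To make two of the checklist rules above concrete, here is a minimal Python sketch of the @pages total and the year-month-day date format. The function names (total_pages, date_value, when_from_slip) are hypothetical and are not part of copyformat.xsl or our schema. The date parser assumes a slip date written as "D Month YYYY" and handles only a corrected year in the form "[i.e. YYYY]".

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


def total_pages(length_text):
    """Sum every "N p." count in a <length> string to get the @pages value."""
    counts = re.findall(r"(\d+)\s*p\.", length_text)
    return sum(int(n) for n in counts)


def date_value(year, month=None, day=None):
    """Format year-month-day, entering 99 for an unknown month or day."""
    mm = 99 if month is None else month
    dd = 99 if day is None else day
    return f"{year:04d}-{mm:02d}-{dd:02d}"


def when_from_slip(text):
    """Parse "D Month YYYY", preferring a corrected year given as "[i.e. YYYY]"."""
    pattern = r"(\d{1,2}) (\w+) (\d{4})(?: \[i\.e\. (\d{4})\])?$"
    match = re.match(pattern, text.strip())
    if match is None:
        raise ValueError(f"unrecognized date: {text!r}")
    day, month_name, year, corrected = match.groups()
    # Encode the corrected year when one is supplied.
    return date_value(int(corrected or year), MONTHS[month_name], int(day))


print(total_pages("2 p., 3 p. "))                    # 5
print(date_value(1799))                              # 1799-99-99
print(when_from_slip("1 January 1799 [i.e. 1800]"))  # 1800-01-01
```

Anything the sketch cannot parse raises an error rather than guessing, since unusual dates still need to be checked against the Master Encoding Guide.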

Wednesday, November 4, 2009

Proofreading Complete...Finally

Yes, the proofreading phase has finally ended. Yesterday, I finished doing a paper-to-paper cross check on the 109,348th slip--and then some. While the vendor counted 109,348 records, when all is said and done, the number may be off by several hundred slips. The proofreading phase did not just check, character by character, the transcription completed by our vendor; it also served as a slip-by-slip inventory of the entire catalog. The microfilm that the vendor used to transcribe was created in 2001, and since that time the editorial staff has continued to find more documents to add to our archive. The number of additions is not yet known, but it will probably number in the hundreds. These new slips will all be added to the XML files during the first phase of encoding, which is well under way.

While the proofreading phase has been the most unpredictable aspect of the project thus far, it has been a critical component to complete. Ensuring the integrity of the database by making sure the content is as accurate as the tagging is important not just for the editors but for all online users of the archive. A catalog is only as good as its accuracy--if we can't trust it then no amount of fancy web coding will encourage people to use it!

I have to give proper acclamation to the Control File team:
  • Jim the Proofreader/Encoder proofread 57,157 slips
  • Susan the EAD Gal proofread 14,878 slips
  • and I clocked in about 37,000 (give or take a few)
Cheers!