Brought to you by the Massachusetts Historical Society

"I have nothing to do here, but to take the Air, enquire for News, talk Politicks and write Letters."

John Adams to Abigail Adams, 30 June 1774

Showing posts with label Encoding Level 2.

Friday, March 11, 2011

Document Types & Names

At the moment, we are working on entering "Document Types" into the database. This is a hybrid part of the project in which we work with both the slips and the interface. See images below. We have completed JA's document types and are now working on JQA's. I skipped AA, as I couldn't find her slips.

At the same time, the good people in the Adams Papers are doing preliminary work on cleaning up the names database. This involved printing out the entire list of names and looking closely at the Adamses, Smiths, etc. for duplicates, then deciding which can be merged and which can be better identified. Our by-now classic example is Thomas Baker Johnson, who had at least three entries (johnson-t-b, johnson-thomas-b, and johnson-thomas-baker). These have all been fixed, in that the attributes are all "johnson-thomas-baker" now. For now, "Johnson, T. B." and "Johnson, Thomas B." still appear in the drop-down list of names when searching slips, because these forms of his name do appear on the physical slips. However, selecting either of these options returns 0 results. Perhaps it would be worth removing them altogether?
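A merge like the Johnson one amounts to a lookup from variant attributes to a single canonical attribute. Below is a minimal Python sketch of that idea. It assumes each name record is a simple dict with a `name` (as written on the slip) and an `attribute`. The `MERGES` table, `canonical`, and `merge_records` are hypothetical illustrations, not the project's actual database or code.

```python
from collections import Counter

# Hypothetical merge table: variant attribute -> canonical attribute.
MERGES = {
    "johnson-t-b": "johnson-thomas-baker",
    "johnson-thomas-b": "johnson-thomas-baker",
}

def canonical(attribute: str) -> str:
    """Return the canonical attribute, following chained merges and guarding against loops."""
    seen = set()
    while attribute in MERGES and attribute not in seen:
        seen.add(attribute)
        attribute = MERGES[attribute]
    return attribute

def merge_records(records):
    """Rewrite each record's attribute in place; the name as written on the slip is kept."""
    for record in records:
        record["attribute"] = canonical(record["attribute"])
    return records

records = [
    {"name": "Johnson, T. B.", "attribute": "johnson-t-b"},
    {"name": "Johnson, Thomas B.", "attribute": "johnson-thomas-b"},
    {"name": "Johnson, Thomas Baker", "attribute": "johnson-thomas-baker"},
]
merge_records(records)
counts = Counter(r["attribute"] for r in records)
print(counts)  # Counter({'johnson-thomas-baker': 3})
```

Under this assumed layout, the slip forms of the name survive while the old attributes disappear. A search keyed on a retired attribute such as `johnson-t-b` would therefore find nothing, which mirrors the 0-result behavior described above.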

Wednesday, February 23, 2011

I Killed John Quincy Adams, or, I was only doing my job

In September 2010, while performing Encoding Level 2 tasks on reel 33 (covering the years 1846-1851), I killed John Quincy Adams. I was only doing my job! His passing was solemnly marked with a beer after work. We remember JQA in part for his voluminous diaries and correspondence, his poetry, and his service to the United States. This post looks at some milestones and metrics, including JQA's first letters to his parents and wife and the last letters he sent and received. Please keep in mind that it covers only items present in the Adams Control File, and that some of the numbers might change a bit as we clean up the data.

John Quincy Adams was born on 11 July 1767. Today, 23 February, is the anniversary of his death, 163 years ago in 1848.

The first letter he wrote held in the Adams Family Papers is to his cousin Elizabeth Cranch (Mrs. Jacob P. Norton), circa 1773.

The first letter to his father, John Adams, dates to 13 October 1774. The first letter to his mother Abigail Adams was written from Paris on 12 April 1778. The first letter JQA sent to his future wife Louisa Catherine Johnson (LCA) was from The Hague on 2 June 1796. In all, the MHS has (or knows about) 619 letters from JQA to LCA, and 451 going the other way, from LCA to JQA.

The first letter JQA received was from his father, written from Philadelphia on 18 April 1776. The first letter JQA received from his mother was written from Braintree on 21 January 1781. The first letter JQA received from Louisa Catherine Johnson was from London, dated 4 July 1796: America's 20th birthday. In all, JQA received 18,475 letters.

His last dated poem is attributed to ca. 21 February 1848 and is titled "In days of yore the Poets pen ...." This poem was eventually published in Poems (New York, 1848, p. 108) under the title "Written in an Album." Indeed, his last dated documents all seem to have been poetry.

The last letter he received while alive was a two-page letter of 13 February 1848 from Willis Baldwin of Monroe Co., N. Y. JQA did receive one letter after his death: a four-page letter dated 29 February 1848 from Boston, co-authored by Edward Brooks and Dr. John Bigelow.

The last letters he is known to have sent were written on 4 and 6 February 1848. On 4 February he sent a one-page letter from Washington to Alexander Baring, Lord Ashburton; this is a letterbook copy. On 6 February he sent a one-page letter to Julia Raymond that included the poem "Fair Lady! when at thy request These fingers trace my name..."

The original slip files were scanned onto 42 "reels." JQA's attribute "adams-john-quincy1767" appears in the author, recipient, or title field 47,581 times, across 40 of the reels. He does not appear to be in reels 37 (the year 1863) or 40 (the years 1867-1870). That averages out to 1,189.525 appearances per reel (47,581 ÷ 40) in those fields.
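A count like this can be reproduced by walking each reel's XML and tallying the `<ref>` elements whose `target` is JQA's attribute, restricted to the author, recipient, and title fields. Here is a minimal Python sketch using the standard library's ElementTree. The `<reel>`/`<record>` wrapper elements and the tiny sample reels are assumptions for illustration, not the project's real files. As in the post, the average divides by the number of reels in which the attribute actually appears.

```python
import xml.etree.ElementTree as ET

JQA = "adams-john-quincy1767"
FIELDS = ("author", "recipient", "title")

# Tiny stand-in reels; the <reel>/<record> structure is assumed for illustration.
REELS = {
    "reel-01": """<reel>
      <record><author><ref target="adams-john-quincy1767" type="person">JQA</ref></author>
              <recipient><ref target="adams-john1735" type="person">to JA</ref></recipient></record>
      <record><author><ref target="adams-john1735" type="person">JA</ref></author>
              <recipient><ref target="adams-john-quincy1767" type="person">to JQA</ref></recipient></record>
    </reel>""",
    "reel-02": """<reel>
      <record><title>Poem, <ref target="adams-john-quincy1767" type="person">JQA</ref></title></record>
    </reel>""",
    "reel-03": "<reel><record><author>Anon.</author></record></reel>",
}

def count_in_fields(xml_text: str, target: str) -> int:
    """Count <ref> elements with the given target that sit inside an author, recipient, or title."""
    root = ET.fromstring(xml_text)
    total = 0
    for field in FIELDS:
        for elem in root.iter(field):
            total += sum(1 for ref in elem.iter("ref") if ref.get("target") == target)
    return total

per_reel = {name: count_in_fields(text, JQA) for name, text in REELS.items()}
reels_with_jqa = [name for name, n in per_reel.items() if n > 0]
total = sum(per_reel.values())
average = total / len(reels_with_jqa)  # divide only by reels where he appears
print(per_reel, total, average)
print(47581 / 40)  # the post's figure: 1189.525
```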

JQA's initials, which can appear nearly anywhere and any number of times in a single record, appear 57,949 times across all 42 reels, an average of about 1,379.74 times per reel (57,949 ÷ 42).
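The initials count is a different kind of tally. Instead of checking attributes in three fields, it counts every occurrence of the string "JQA" anywhere in a record, which helps explain why it is higher and spans all 42 reels. The following minimal Python sketch uses a regular expression with word boundaries. That matching rule and the sample record strings are assumptions, not the method actually used to produce the figure above.

```python
import re

# Match "JQA" as a whole word only (an assumed rule), so e.g. "JQAX" would not count.
JQA_INITIALS = re.compile(r"\bJQA\b")

records = [
    "JQA to LCA, with enclosure by JQA",
    "JA to JQA",
    "LCA to AA",
]

per_record = [len(JQA_INITIALS.findall(text)) for text in records]
total = sum(per_record)
print(per_record, total)        # [2, 1, 0] 3
print(round(57949 / 42, 2))     # 1379.74, the post's per-reel average
```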

Tuesday, January 25, 2011

John Adams by the numbers

John Adams died on 4 July 1826 at the ripe old age of 90. His death falls slightly more than halfway through the Adams Papers slip file, but when we reached this date in Encoding Level 2 last summer, we felt we'd marked an important milestone in the project. So in honor of John Adams, I've compiled some statistics related to him.

As of now, John Adams appears in the database as an author 13,943 times. He appears as a recipient 11,782 times. These numbers reflect original letters, retained letterbook copies of outgoing correspondence, and other documents. While the MHS holds the vast majority of these items, the slip file lists all known Adams family manuscripts, including those held by other institutions and those in private hands.

The earliest extant letter from John was written at Worcester on 1 Sep. 1755 to Nathan Webb. The first letter John received was written by Richard Cranch in Oct. 1756.

Correspondence between John and Abigail Adams accounts for 1,376 records in the database. Ironically, although Abigail often complained of how little John wrote, he wrote many more letters to her than she did to him: an impressive 912 to her 464 (including letterbook copies). John's first letter to his future wife was written 4 Oct. 1762, two years before their marriage, and her first letter to him was written 11 Aug. 1763.

According to the slip file, John Adams exchanged 662 letters with his "frenemy" Thomas Jefferson: 376 to him and 286 from him. Their correspondence spans almost 50 years, beginning with Jefferson's letter from Williamsburg, Va., on 16 May 1777 and ending with Adams' of 17 Apr. 1826, less than three months before the day both men died.
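Figures like 912 + 464 = 1,376 and 376 + 286 = 662 are the two directions of a correspondence added together. The sketch below shows one way to get them in Python. It assumes each letter has already been reduced to a single (author, recipient) pair of attributes, which glosses over letters with several authors or recipients. The `letters` list and the `exchange` helper are made-up illustrations, not real counts.

```python
from collections import Counter

JA, AA, TJ = "adams-john1735", "adams-abigail1744", "jefferson-thomas"

# Made-up sample: each letter reduced to an (author, recipient) pair of attributes.
letters = [
    (JA, AA), (JA, AA), (AA, JA),
    (JA, TJ), (TJ, JA), (TJ, JA),
]

def exchange(letters, a, b):
    """Return (letters from a to b, letters from b to a, total exchanged)."""
    pairs = Counter(letters)
    out, back = pairs[(a, b)], pairs[(b, a)]
    return out, back, out + back

print(exchange(letters, JA, AA))   # (2, 1, 3)
print(exchange(letters, JA, TJ))   # (1, 2, 3)
```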

John Adams wrote his last letter on 22 June 1826 to Roger C. Weightman, at that time the mayor of Washington, D.C. Four days later, the last letter to John was written by Ebenezer Clough. It was one of only three letters Clough ever wrote to the former president.

Monday, January 10, 2011

Supporting Databases, Part 1

This sounds like an Oscar category...And the nominees for Best Supporting Database in a Digital Conversion Project are: Accessions. Institutions. People. Places.

The supporting databases allow us to regularize information and to store and retrieve it in a consistent way. At present there are four supporting databases: Accessions, Institutions, People, and Places. We also created and used additional supporting documents, such as the Microfilm Conversion Chart. Fellow Adams Slip File encoder and blogger Susan Martin worked with the Accessions and Institutions databases as well as the Microfilm Conversion Chart and MHS Collection Codes, so she will write about them.

As mentioned in the post of 15 December 2010, the People database then contained 19,454 names. This number will fluctuate a bit as digital control file staff and Adams Papers editors identify duplicate entries and/or more fully identify records for which staff have more information. Occasionally we also find names that were skipped during Encoding Level 2, generally because of the density or complexity of a record.

The Places database was the first to be built and populated, during Level 1 Encoding. In Level 2, although it was not a focus, we took the opportunity to review attributes and perform basic data clean-up where necessary. The Places database contains 3,090 records, from Abbeville to Zwolle.

The fields we populated in the Places database during Level 1 are location, city, state, country, and notes. The location field holds the controlled form of the entry: the attribute. Generally, the first time a city appeared it received a one-word attribute: "quincy", "tallahassee", and "athol", for example. However, once the country expanded, we were left with the task of differentiating between places with the same name in different states and/or countries. A good example is Burlington. We have eight different records for Burlington: "burlington", "burlington-county", "burlington-ia", "burlington-ma", "burlington-me", "burlington-nj", "burlington-ny", and "burlington-vt". We assigned the fullest known attribute to distinguish one from another. However, sometimes the address simply says Burlington. In these instances it was not always possible to determine whether it was the Burlington in Massachusetts or one in some other state.
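The Burlington attributes suggest a simple convention: the city name in lower case, optionally followed by a qualifier such as a two-letter state code or "county". The following Python sketch captures that reading. It is inferred from the examples above rather than taken from any documented rule, and the `place_attribute` and `slug` helpers are hypothetical. A bare, unqualified name is left as plain "burlington", just as it is when the address gives no state.

```python
import re

def slug(text: str) -> str:
    """Lower-case a name and join its alphanumeric words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

def place_attribute(city, region=None, kind=None):
    """Build a place attribute: the bare city, or the city qualified by a kind and/or region code."""
    parts = [slug(city)]
    if kind:
        parts.append(slug(kind))
    if region:
        parts.append(slug(region))
    return "-".join(parts)

print(place_attribute("Quincy"))                     # quincy
print(place_attribute("Burlington", "MA"))           # burlington-ma
print(place_attribute("Burlington", kind="County"))  # burlington-county
```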

This is a long way of saying we did the best we could with the information we had. As with the People database, the Adams Papers editors can use their expertise to help solidly define and identify a place if needed.

Wednesday, December 15, 2010

What's in a name tag?

We started Encoding Level 2 in late April 2010, and the process continued until November. In Level 2, the focus was on names as well as on data appearing within the <title> tag. An XSLT was created to automate much of this, so that within an author or recipient tag each instance of JA, AA, JQA, etc. was automatically converted to "adams-john1735", "adams-abigail1744", "adams-john-quincy1767", etc. For non-Adams correspondents, Thomas Jefferson was flipped to "jefferson-thomas", and so on. Would that it were this consistent the whole way through the project! While this did a lot of the work, it did not account for all the variables inherent in a collection the size of the Adams Papers. For text within the <title> tag, we added the information from scratch.

The XSLT looked for the text within an author or recipient tag and flipped it around. For recipient tags, an additional rule skipped the word "to", which always appears, and took the second and the last words within the tag. For example,
<recipient>to HA</recipient>
was converted to
<recipient><ref target="adams-henry1838" type="person">to HA</ref></recipient>.
And
<recipient>to Simeon Andinwooll</recipient>
was converted to
<recipient><ref target="andinwooll-simeon" type="person">to Simeon Andinwooll</ref></recipient>.
Note also that all attributes were automatically converted to lower case. The transformation was applied correctly to the majority of records, but there were anomalies, and so the style sheet also introduced some bad attributes. For example, "person" was the default type, so for offices, corporations, etc. we had to fix the type attribute manually. Individuals who went by their initials (E. W. Dodge) posed another set of issues. We were very literal in our transcription of the data on the original slip files (which are in turn faithful to the original documents), so unless E. W. Dodge was identified elsewhere as, for example, Eliphalet Winchester Dodge, he (presumably a he) remains confined to the anonymity of the name as he signed it.
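The style sheet itself is not reproduced here. The following Python sketch imitates the recipient rule as described: drop the leading "to", look up bare initials, otherwise combine the last word with the first remaining word, lower-case the result, and default the type to "person". It is not the project's XSLT. The `INITIALS` table is a hypothetical excerpt built from attributes quoted in these posts. The third sample shows how the rule yields an unhelpful attribute like `congress-the` for an office title, the kind of result that then had to be fixed by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of an initials table; the attributes are ones quoted in these posts.
INITIALS = {
    "JA": "adams-john1735",
    "AA": "adams-abigail1744",
    "JQA": "adams-john-quincy1767",
    "HA": "adams-henry1838",
}

def naive_attribute(text: str) -> str:
    """Mimic the described rule: skip a leading 'to', then use last word + first word, lower-cased."""
    words = text.split()
    if words and words[0] == "to":
        words = words[1:]
    if len(words) == 1:
        return INITIALS.get(words[0], words[0].lower())
    return f"{words[-1]}-{words[0]}".lower()

def wrap_recipient(xml_text: str) -> str:
    """Wrap the recipient's text in a <ref>, defaulting type to 'person' as the style sheet did."""
    recipient = ET.fromstring(xml_text)
    text = recipient.text or ""
    ref = ET.SubElement(recipient, "ref", target=naive_attribute(text), type="person")
    ref.text, recipient.text = text, None
    return ET.tostring(recipient, encoding="unicode")

for sample in ("<recipient>to HA</recipient>",
               "<recipient>to Simeon Andinwooll</recipient>",
               "<recipient>to the President of Congress</recipient>"):
    print(wrap_recipient(sample))
```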

Sounds easy, right? After the first few weeks we got into the groove and learned tricks, what to look for, etc. On average a reel took four or five days, depending on the number of records and on significant events such as deaths, wars, and the like. As I said above, we were very literal in creating name authorities, since not all of us are Adams experts. A fine example is T. B. Johnson. T. B. Johnson signed many of his letters as T. B. Johnson. He also signed them Thomas B. Johnson. And a few were signed Thomas Baker Johnson. But he is likely not the only T. B. Johnson in the history of the world, so it is difficult to determine whether these are one and the same person or different people. So currently all three variants exist in the database, which could make searching difficult and incomplete. Frequently, in a run of letters we were able to determine that T. B. was indeed Thomas Baker, and in cases like this we felt comfortable changing an instance of T. B. Johnson to the fuller Thomas Baker Johnson. However, if we could not conclusively determine that it was Thomas Baker Johnson, we left it alone so that one of the Adams editors can make that change.

More examples...

Letters with multiple authors and/or recipients were a little complicated, as were letters addressed generally to someone by title/position/office.

So, after Level 1 a sample multiple author tag appeared as:

<author>JA, B. Franklin, J. Jay, H. Laurens, and T. Jefferson.</author>


After we ran the XSLT and did some housecleaning, it was transformed to look like this:

<author> <ref target="adams-john1735" type="person">JA</ref>, <ref type="person" target="franklin-benjamin">B. Franklin</ref>, <ref type="person" target="jay-john">J. Jay</ref>, <ref type="person" target="laurens-henry">H. Laurens</ref>, and <ref type="person" target="jefferson-thomas">T. Jefferson</ref>.</author>
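The post doesn't show how the multi-author strings were split. Here is a minimal Python sketch under two assumptions: names are separated by commas and/or "and", and a final period closes the list. The `NAMES` lookup is hypothetical. It stands in for the research and clean-up that turned "B. Franklin" into `franklin-benjamin`, and any name it does not know gets a placeholder for manual review. The output puts `target` before `type`, the reverse of the sample above, but XML attribute order carries no meaning.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup from names as written on the slip to attributes (filled in during clean-up).
NAMES = {
    "JA": "adams-john1735",
    "B. Franklin": "franklin-benjamin",
    "J. Jay": "jay-john",
    "H. Laurens": "laurens-henry",
    "T. Jefferson": "jefferson-thomas",
}

def tag_authors(text: str) -> str:
    """Wrap each name in an author list in a <ref>, keeping separators and the final period."""
    body, end = (text[:-1], ".") if text.endswith(".") else (text, "")
    # The capturing group keeps the separators: odd indices are ', ', ', and ', ' and '.
    pieces = re.split(r"(,\s*and\s+|,\s*|\s+and\s+)", body)
    author = ET.Element("author")
    prev = None
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            prev.tail = piece
            continue
        prev = ET.SubElement(author, "ref", target=NAMES.get(piece, "unidentified"), type="person")
        prev.text = piece
    prev.tail = (prev.tail or "") + end
    return ET.tostring(author, encoding="unicode")

print(tag_authors("JA, B. Franklin, J. Jay, H. Laurens, and T. Jefferson."))
```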


Where a letter was addressed to someone by their title/office etc., after Level 1, a sample recipient tag looked this way:

<recipient>to the President of Congress</recipient>


After the XSLT was run, it looked like this:

<recipient><ref target="congress-the" type="person">to the President of Congress</ref></recipient>


You can see that the transformation skipped the word "to" and then combined the next word and the last word. After we reviewed the records and did a little research, it became this:

<recipient><ref target="huntington-samuel" type="person">to the President of Congress</ref></recipient>.


Simple beauty!

As of right now, the names database contains 19,454 names. We have yet to systematically clean up possible duplicates like the Johnson example above, or instances where names were spelled differently in America than in, say, the Netherlands. Fortunately these are the exception rather than the rule, so the process should go smoothly. A lot more could be said about Encoding Level 2, and I'm sure I haven't touched on many aspects of our process. But hopefully this post gives a little flavor of the goings-on at the Massachusetts Historical Society during the spring, summer, and fall of 2010.

Tuesday, April 20, 2010

Control of Control File Closing In!

Despite the lack of posting, we have all been plugging away at Encoding Level 1 and have just completed the initial phase of markup on all 110,000 records. As it stands now, we have 42 XML files that have been proofread and input to match the massive paper file down the hall. In addition, we have control (through attributes) over the date, codes, location, length, and format of each record. The big pieces that remain for Encoding Level 2 are control of names (for both authors and recipients), published references, and notes.

As we've gone through the encoding, we have also been developing supplemental databases that will enhance searchability in the final interface. These currently include locations (where letters were written) and accessioned documents (repositories other than the MHS that hold the original manuscripts, e.g., the Library of Congress). We are also building a supplemental database of all persons and short titles (published versions of documents).

Much of the work in the coming months will focus on Encoding Level 2 (with an emphasis on automating as much data entry as possible through XSLTs) and the building of the database infrastructure in eXist. As we iron out the kinks in building and managing these databases, I will post what we learn and produce. Stay tuned!