Since late February, we have been downloading our completed XML files from the vendor and preparing them for proofreading. During this phase of work, the focus has been on checking the accuracy of the transcriptions against the good old paper slips. After some hemming and hawing, our committee decided that paper-to-paper proofreading was still the best method, so we have run the XML files through an XSL transformation, producing a fresh paper copy that closely matches the original paper files. The major difference, of course, is that our new paper copies can fit about six records to a page. So, for the past six weeks (and continuing until it's done), several of us have been spending the bulk of our time proofreading about 100,000 tiny slips of paper against about 20,000 larger pieces of paper. This stage, while tedious, is an important first step before the full encoding and data improvement.
For the project manager, sitting down with hundreds of records every day has been enormously helpful in developing the full schema. After much discussion, investigation, and trial and error, the project committee decided to develop a home-grown schema. A full discussion of the evolution of the schema and its latest iteration will follow in the next post--stay tuned!
Friday, April 24, 2009
Wednesday, April 22, 2009
Data Entry Complete!
With the final reels downloaded on April 6, the data entry component of the project is now complete. Our vendor, Atlis Publishing & Graphics Services, which was acquired by Data Stream Content Solutions (DSCS) in January, delivered 42 XML files over six weeks. These files represent the 42 reels of microfilm that we sent them in January. DSCS converted the microfilm to JPG images, then transcribed and marked up the text according to the encoding guide we prepared for them. Using a combination of programmatic and manual conversion, DSCS converted 109,348 records into XML. As the files were completed, DSCS uploaded them to their FTP server, and we downloaded them every few days. They also transcribed the not infrequent handwriting--all to great success. In addition to the XML files, DSCS sent us PDF files of all the images, together with the corresponding unique ID number they assigned each record in the encoding. This resource has been extremely helpful in proofreading.
Overall, we are very happy with their work and would highly recommend their services.
A note on estimates:
Our extensive planning over the past two years started with a major grant request as well as an RFP process, which forced us to calculate the number of records and approximate keystrokes to better assess our options. Using Statistics 101 and a little common-sense sampling, we estimated a total of 108,400 records (note the actual of 109,348) and an average of 247 keystrokes per record (the actual turned out to be 246.625). Pretty good, since we only had three giant file cabinets, a ruler, and a calculator to figure it out!
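For anyone who wants to see just how close those estimates landed, here is a minimal sketch that computes the relative error from the four figures quoted above. The function name `relative_error` is only an illustrative choice, and this is not the sampling method we used, just a check on the results.

```python
def relative_error(estimate: float, actual: float) -> float:
    """Signed relative error of an estimate, as a fraction of the actual value."""
    return (estimate - actual) / actual


# Figures quoted in the post.
records_est, records_actual = 108_400, 109_348
keys_est, keys_actual = 247, 246.625

records_err = relative_error(records_est, records_actual)
keys_err = relative_error(keys_est, keys_actual)

# A negative value means the estimate was low; a positive value means it was high.
print(f"Record count estimate off by {records_err:+.2%}")
print(f"Keystrokes-per-record estimate off by {keys_err:+.2%}")
```

The record count came in a little under 1% low, and the keystrokes-per-record figure was high by less than a fifth of a percent.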
Labels: Handwriting, Record ID, Vendor Encoding Guide