UNR Campus Conferences: Big Data

NATURAL LANGUAGE PROCESSING AND OPTICAL CHARACTER RECOGNITION:
THE VIEW FROM THE HUMANITIES
Christopher Church, Assistant Professor, History
Natural Language Processing (NLP) presents numerous opportunities for humanities research, particularly in the field of history. For one, NLP allows historians to overcome their current inability to look at cultural memes in context: to see how often something was said relative to everything else that was said. Additionally, it enables historians to map linguistic and cultural change over time, or to paint a synchronic picture of a particular decade, movement, or ideology. In short, it allows historians, and humanists more generally, to obtain a bird’s-eye view of their material. However, there are real challenges to performing NLP on historical documents, chiefly those related to Optical Character Recognition (OCR). Rather than relying upon “born-digital,” “found,” or “curated” data sets, historians must often create their data themselves from spotty archives, degraded materials, or handwritten documents. This presentation will explore the importance of OCR to NLP in the humanities, while attending to the pitfalls of relying too heavily on curated data and proposing some ways to overcome the inherent messiness of the data with which humanists wrestle.
