Adding Unique Identifiers in OpenRefine

Sometimes you may want to add unique identifiers (UIDs) to your data in OpenRefine (e.g., before migrating the data into a Database Management System (DBMS) like Access or FileMaker).

It’s nice to give the identifiers a fixed number of digits by padding them with leading zeroes, especially if you’ll sort your data alphabetically (otherwise “10” sorts before “2”).

To do this, add a new column based on any existing column. From that column’s menu, choose Edit column > Add column based on this column…, which will bring up a dialogue window.


For your GREL (Google Refine Expression Language) expression, enter the following:

      "0000"[0,4-row.index.length()] + row.index


* Make sure to enter a name for the new column in the dialogue window.

* * *

Here’s what the GREL means:

  • “row.index” is a built-in variable holding the number of the row, counting from the top (beginning with 0)
  • “0000” is a string of four zeroes, part of which will be placed in front of the index.
  • row.index.length() is how many characters make up row.index (treating it as a string), so “1981” would have a length of 4, whereas “30” would have a length of 2.
  • [0,4-row.index.length()] slices the string of zeroes to take however many are needed to bring the total number of digits to 4. If the index is “13” (a length of 2 characters) and you want a total of four digits (0013), then it will take only 2 zeroes from the string.
  • finally, “+ row.index” concatenates the original index to the preceding zeroes, so in the above example it joins “00” and “13” to get “0013”.
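
The same slice-and-concatenate logic can be sketched outside OpenRefine. The Python below is only an illustration of the steps just described, not GREL, and the function name pad_index is made up for this example:

```python
def pad_index(index):
    """Mimic "0000"[0,4-row.index.length()] + row.index for a 0-based row index."""
    digits = str(index)                # treat the index as a string
    needed = 4 - len(digits)           # how many zeroes are still needed
    return "0000"[0:needed] + digits   # slice the zeroes, then concatenate


print(pad_index(13))    # 0013
print(pad_index(1981))  # 1981
```

An index of 13 has two characters, so two zeroes are sliced off and prepended. An index of 1981 already has four characters, so the slice is empty.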

You can increase the number of leading zeroes to however many you need, but you’ll need to make a few changes.

  1. First, update “0000” so that it contains as many zeroes as the total number of digits you want.
  2. Then change 4-row.index.length() to X-row.index.length(), where X is that same number of digits.

For example, if you want to increase the total number of digits to 6, change the expression to:

  • "000000"[0,6-row.index.length()] + row.index
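
The two changes above amount to making the width a parameter. As a rough sketch in Python (again not GREL; make_uid and width are names invented for this illustration), the pattern generalises as shown here. The max(..., 0) guard is an addition of this sketch, not part of the GREL expression: it keeps the slice length from going negative when an index has more digits than the chosen width.

```python
def make_uid(index, width=6):
    """Left-pad a row index with zeroes to `width` digits (illustrative helper)."""
    zeroes = "0" * width                   # e.g. "000000" when width is 6
    digits = str(index)                    # the index as text
    needed = max(width - len(digits), 0)   # zeroes still required, never negative
    return zeroes[0:needed] + digits


# Python's built-in str.zfill gives the same result for non-negative indices.
for i in (0, 13, 1981, 1234567):
    assert make_uid(i) == str(i).zfill(6)

print(make_uid(13))  # 000013
```

If your data set could ever have more rows than the width allows, it is worth checking how the identifiers come out for the longest indices.
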
Digital Humanities at Berkeley, Summer Institute


This workshop will discuss methods of data retrieval, data cleaning, and visualization. Participants will discuss how websites are structured and learn how to collect a data set with web scraping. Participants will learn how to use tools like OpenRefine for cleaning and transforming data and then visualize data using Gephi, an open source tool for network analysis.

Syllabus – Network Analysis


Christopher Church is an assistant professor of history at the University of Nevada, Reno. Before joining the history department at UNR, he was the Program Coordinator at UC Berkeley’s D-Lab. He studies colonialism, citizenship, and environmental history. He is well versed in databases, GIS, scripting, network analysis, and web design. He is tasked with developing a digital humanities curriculum at UNR.



Human Face of Big Data (April 30, 2015)


The Human Face of Big Data (April 30, 7pm)


The Human Face of Big Data: the Promise and Perils of a Planetary Nervous System

Come watch the award-winning documentary, The Human Face of Big Data.
Thursday, April 30, 7pm – Wells Fargo Auditorium (MIKC 124)

Stay for hors d’oeuvres and a panel discussion featuring UNR faculty:

  • Dr. Chris Church, Dept. of History
  • Dr. Katherine Hepworth, Dept. of Journalism
  • Kari Barber, MFA, Dept. of Journalism
  • Dr. David Alvarez, Dept. of Biology
  • Dr. Nicholas Seltzer, Dept. of Political Philosophy


Fight For Your Right to Think film festival

Thursdays in March, 7pm
MIKC 124, Wells Fargo Auditorium


Pryor’s Peoria Nominated for a DH Award

Pryor’s Peoria has been nominated for “Best Use of DH For Public Engagement.” Voting is determined by popular vote, so if you like the project, please vote!

Click here to vote!

HIST703 – Introduction to Digital History for Graduate Students



Welcome to HIST 703: Introduction to Digital History! This is where you will blog about your experience this semester. You’ll respond to the readings on the syllabus, essentially creating an online annotated bibliography of the semester’s readings. You will also blog about new tools you’ve found, other readings on digital history you’ve read (outside of the official reading list), ideas you have for projects, and your experience with learning technical skills, from databases to XML to Python.

You are encouraged to comment on each other’s posts, as well as read them. We’re all in this together as we explore what it means to do digital history.


HIST 498 (Adv Topics): Sugar, Slaves, and Revolution: Caribbean History



The modern world was forged in the coves of the Caribbean. Its pirates and smugglers built and destroyed European states; its sugar plantations started the industrial revolution; its revolutionaries changed how Western societies think about liberty and justice; and its philosophers have defined social identity in the modern world. This course will examine Caribbean history from Columbus to Blackbeard to Toussaint Louverture to CLR James to Bob Marley, looking at how the region’s constant struggle between freedom and slavery, between unity and disunity, has shaped the world we live in today.

Required Texts

Textbook: The Caribbean: A History of the Region and Its Peoples. (txtbk)

Sidney Mintz, Sweetness and Power



Week 1: The Pre-Columbus Caribbean: 7200 BC – AD 1492

Tu: Syllabus and Welcome

Th: The First Peoples and their Geography

· David Barker, “Geographies of Opportunity, Geographies of Constraint”, 25-38 (txtbk)

· L. Antonio Curet, “The Earliest Settlers,” 53-68 (txtbk)

Week 2: The Columbian Cataclysm: 1492 – 1630

Tu: First Encounters and the Columbian Exchange

Th: Discussion

· Reinaldo Funes Monzote, “The Columbian Moment: Politics, Ideology, and Biohistory,” 83-96 (txtbk)

· Lynne A. Guitar, “Negotiations of Conquest,” 115-130 (txtbk)

Week 3: A New World with the Old World’s Problems

Tu: Crusades, Millennialism, and the Caribbean

Th: Discussion

· Pauline Moffitt Watts. “Science, Religion, and Columbus’s Enterprise of the Indies.” OAH Magazine of History, 5(4). 14-17.

· William Phillips. “Old World Precedents: Sugar and Slavery in the Mediterranean.” 69-79 (txtbk)

Week 4: Plantations and the Rise of Agro-Industrial Capital: 1630 – 1770

Tu: The Sugar Revolution

Th: Discussion

· Hilary Beckles, “Servants and Slaves during the 17th Century Sugar Revolution” 205-216 (txtbk)

– Sidney Mintz, Sweetness and Power.

Week 5: Challenges to the Caribbean Order: 1630 – 1720

Tu: The Golden Age of Piracy and the Caribbean Alternative

Th: Discussion

· Isaac Curtis, “Masterless People: Maroons, Pirates, and Commoners,” 149-162. (txtbk)

Week 6: Unfree Labor, Forced Migration, and Slave Society: 1630-1770

Tu: The Slave Trade

Th: Discussion

· Carrington and Noel. “Slaves and Tropical Commodities: The Caribbean in the South Atlantic System,” 231-242. (txtbk)

Week 7: Rebels and Revolutionaries: 1770-1870

Tu: Maroons and Slave Resistance

Th: Discussion

· Philip Morgan, “Slave Cultures: Systems of Domination and Forms of Resistance,” 245-261 (txtbk)

· Richard Price, “Maroons and their Communities.” pp 1-30 in Maroon Societies.

Week 8: Rebels and Revolutionaries: 1770-1870

Tu: The Age of Atlantic Revolutions

Th: Discussion

· Laurent Dubois, “The Haitian Revolution,” 273-286 (txtbk)

· Robert Whitney, “War and Nation Building: Cuban and Dominican Experience.” 361-372. (txtbk)

Week 9: Midterm

Tu: Review

Th: Midterm

Week 10: Abolition and the Rise of Labor: 1807 – 1900

Tu: Abolition: Causes and Effects

Th: Discussion

– Eric Williams, Capitalism and Slavery

· Dale Tomich, “Econocide: From Abolition to Emancipation in the British and French Caribbean.” 303-316 (txtbk)

Week 11: What has Changed? : 1807 – 1900

Tu: There’s Revolution and then there’s Revolution

Th: Discussion

· Christopher Schmidt-Nowara. “A Second Slavery: the 19th-Century Sugar Revolutions in Cuba and Puerto Rico.” 333-345 (txtbk)

· Gad Heuman. “Peasants, Immigrants, and Workers: The British and French Caribbean after Emancipation.” 347-360 (txtbk)

Week 12: America and New Imperialism: 1898 – 1945

Tu: The US Walks Softly: Caribbean Occupation and the American Empire

Th: Discussion

· Cesar Ayala, “The American Sugar Kingdom. 1898-1934” 433-444 (txtbk)

– Hans Schmidt. “Introduction,” United States Occupation of Haiti: 1915-1934, 1-18.

Week 13: The Caribbean Since 1945

Tu: Decolonization and the Empire Strikes Back

Th: Discussion

· Humberto Garcia Muniz. “The Colonial Persuasion: Puerto Rico and the Dutch and French Antilles.” 537-551 (txtbk)

· Anne Macpherson. “Toward Decolonization: Impulses, Processes, and Consequences since the 1930s.” 475-488. (txtbk)

Week 14: The Caribbean Since 1945

Tu: The Cuban Revolution

Th: Discussion

· Michael Zeuske, “The Long Cuban Revolution.” 507-522 (txtbk)

· Marifeli Perez-Stable. “Revolution and Radical Nationalism, 1959-1961,” The Cuban Revolution. 61-81

Week 15: Independence? American and European Hegemony

Tu: Tourism and the New Plantations

Th: Discussion

· Robert Goddard. “Tourism, Drugs, Offshore Finance, and the Perils of Neoliberal Development.” 571-582. (txtbk)

· Ian Strachan. “Introduction: Paradise Discourse.” Paradise and Plantation. 1-16

Week 16: Tying it all Together

Tu: The History of a Fragmented Nationalism

Th: Review

Draft for New Course: Introduction to the Digital Humanities (UNR)

Introduction to the Digital Humanities: Syllabus

Professor Christopher Church
Department of History
University of Nevada, Reno

 Course Description

In the past three decades, our world has become increasingly digitized, and today the use of computers is unavoidable. All humanists today are digital humanists whether they realize it or not. Every day, we use digital tools that we take for granted: search engines and keyword searches, digital databases and online publications, email and scholarly collaboration, and the list goes on. Humanists need to approach the use of these tools critically, because their use has dramatically shaped the course and tenor of our research today.

With an eye to the methodological implications of digital scholarship, this course will provide students with a hands-on introduction to some of the core technologies that are necessary for conducting digital humanities research: databases, text encoding, and scripting. Being versed in these three technologies is vital to engaging with scholarship in the digital age, because all digital information is structured and manipulated using these fundamental tools. The goal of the course is not to become an expert in these technologies, which is not possible in a single 16-week course, but to be exposed to them and to think about their ramifications for the digital humanities.

This course addresses the historical methodological issues raised by digital scholarship and provides technical training in the core digital tools. The course will meet in a computer lab every week for 2 hours, and each course meeting will consist of two parts: practical training and methodological implications.

Readings (selections from below)

1. Cameron and Richardson. Using Computers in History. New York: Palgrave Macmillan, 2005.

2. A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.

3. Mark Merry. Databases for Historians: Designing Databases for Historical Research. London: Inst. for Historical Research, 2012.

4. Turkel, Crymble and MacEachern, The Programming Historian, 2nd ed. Network in Canadian History & Environment, 2009.

5. Salminen and Tompa. Communicating with XML. New York: Springer, 2011.

6. “Learn the TEI”. Text Encoding Initiative.

7. The Virtual Representation of the Past. Ed. Mark Greengrass; Lorna Hughes. Farnham, UK: Ashgate, 2008.

8. Kristen Nawrotzki; Jack Dougherty. Writing History in the Digital Age. Ann Arbor, MI: University of Michigan Press, 2013.

9. Hacking the Academy. Eds. Dan Cohen and Tom Scheinfeldt, 2013. Center for History and New Media, George Mason University.

10. Todd Presner et al. Digital_Humanities. 

Recommended Online Companion Courses


Course Assignments

1. A short paper (5-7 pages) on the methodological implications of digitization and digital tools (30%)

2. Weekly practice exercises with the digital tools (20%)

3. A final project using the digital tools to conduct primary source research (50%)


Schedule of Classes

D: Discussion
L: Lab

Week 1 – Introduction to Digital Scholarship

D) What is (digital) history?


• Sherman Dorn. “Is (Digital) History More than an argument about the Past?”
• Stefan Tanaka. “Pasts in a Digital Age”
• Lorna Hughes. “Conclusion: virtual representation of the past: new research methods, tools and communities of practice.” Virtual Representation of the Past.

Week 2 – A Critical Approach to the Digital World

D) Who gets to be an historian, and why? History and the Digital Public


• Leslie Madsen-Brooks. ““I Nevertheless Am a Historian”: Digital Historical Practice and Malpractice around Black Confederate Soldiers.”

• Robert S. Wolff. “The Historian’s Craft, Popular Memory, and Wikipedia.”

• John Unsworth. “The Crisis of Audience and the Open-Access Solution.”

L) Exploring Blogs and Online Archives

• Creating your own blog (

• Exploring Online Archives:

  • Old Bailey
  • Online Archive of California
  • French Revolution Digital Archive
  • UNR Shared History Program

Week 3 – Digital Databases and Database Management Software

D) Historians as information scientists


• Ansley T. Erickson. “Historical Research and the Problem of Categories: Reflections on 10,000 Digital Note Cards”

• Tim Hitchcock. “Digital searching and the re-formulation of historical knowledge.” Virtual Representation of the Past.

L) Building an Historical Database in Open Office Base


• Mark Merry. Databases for Historians: Designing Databases for Historical Research.

Week 4 – Digital Databases and Database Management Software

D) History as Data?


• Gibbs and Owens. “The Hermeneutics of Data and Historical Writing.”

• Donald Spaeth. “Representations of sources and data: working with exceptions to hierarchy in historical documents.” Virtual Representation of the Past.

L) Building an Historical Database in Open Office Base


• Mark Merry. Databases for Historians: Designing Databases for Historical Research.

Week 5 – Database Management Software, Data Sets, and Data Analysis

D) Interpreting Historical Data


• Fabio Ciravegna et al. “Finding needles in haystacks: data-mining in distributed historical datasets.” Virtual Representation of the Past.

• Helle Porsdam. “Digital Humanities: On Finding the Proper Balance between Qualitative and Quantitative Ways of Doing Research in the Humanities.” Digital Humanities Quarterly. Vol 7. No 3. (2013).

L) Building an Historical Database in Open Office Base


• Mark Merry. Databases for Historians: Designing Databases for Historical Research.

Week 6 – XML, Markup Languages, and Textual Data

D) Data, Metadata, and the Stuff of Digital History

• Shlomo Argamon. Mark Olsen. “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis.” Digital Humanities Quarterly. Vol 3, No. 2 (2009).

• Michael Kramer. “Going Meta on Metadata.” Journal of Digital Humanities. Vol. 3, No. 2. (2014).

L) Activity: Marking Up Text in XML

• (Video) Lynda. What is XML.

Week 7 – HTML, CSS, and the Web

L) Codecademy


L) Codecademy

• CSS: An Overview, Selectors, Positioning

Week 8 – XML and TEI

D) Markup, Historical Preservation, and Interoperability


• “Learn the TEI”. Text Encoding Initiative.

• Jerome McDonough. “XML, Interoperability and the Social Construction of Markup Languages: The Library Example.” Digital Humanities Quarterly. Vol 3, N.3 (2009).

• John Walsh. “Comic Book Markup Language: An Introduction and Rationale.” Digital Humanities Quarterly. Vol 6, No. 1.

D) History and Born-Digital Sources


• Matthew Kirschenbaum. “The .txtual Condition: Digital Humanities, Born-Digital Archives, and the Future Literary.” Digital Humanities Quarterly. Vol 7. N. 1. (2013).

• Aden Evens. “Web 2.0 and the Ontology of the Digital.” Digital Humanities Quarterly. Vol 6, no. 2.

Week 9 – Scripting Programming Languages – Python

D) Do you need to code to be a digital historian?


• Lee Ann Ghajar, I code, you code, we code…Why Code?, February 16,

• Michael Widner, Learn to Code; Learn Code Culture.

• Miriam Posner. “Some things to think about before you exhort everyone to code.”

Week 10 – Scripting Programming Languages – Python

L) Intro and Syntax: Codecademy

L) Strings and Console Output: Codecademy

Week 11 – Scripting Programming Languages – Python

L) Conditionals and Control Flow: Codecademy

L) Functions: Codecademy

Week 12 – Scripting Programming Languages – Python (Web Scraping)

L) Lists and Dictionaries: Codecademy

L) Lists and Functions: Codecademy

Week 13 – Scripting Programming Languages – Python (Web Scraping)

L) Loops: Codecademy

L) Programming Historian

Python Introduction and Installation

Working with Text Files

Code Reuse and Modularity

Working with Web Pages

Manipulating Strings in Python

From HTML to a List of Words (part 1)

From HTML to a List of Words (part 2)

Week 14 – Practical Applications of Digital Research Tools: Text Mining

D) Text and Data Mining in the Discipline of History


• Ted Underwood, “Where to start with Text Mining.”

• Justin Grimmer. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis. (2013):1-31.

• Strange et al. “Mining for the Meanings of a Murder: The Impact of OCR Quality on the Use of Digitized Historical Newspapers.” Digital Humanities Quarterly. Vol 8, No. 1 (2014).

L) Programming Historian

Normalizing Data

Counting Frequency

Creating and Viewing HTML Files with Python

Week 15 – Practical Applications of Digital Research Tools: Visualizing Data

L) Programming Historian

Output Data as an HTML File

Keywords in Context (Using n-grams)

Output Keywords in Context in HTML File

L) Visualizing Text with Voyant Tools

Brian Croxall, “Comparing Corpora in Voyant Tools”

Week 16 – Workshop on Final Projects

Update to My Struggle with OCR scans –

Follow-up to My Struggles with OCR and Microfilm Scans – No end in sight (yet)

In my quest to achieve better OCR results on newspaper scans, I have stumbled upon ImageMagick’s morphology commands (see here for more). I’ve been able to dramatically reduce the noise in my binary (black/white) scans using morphology, which compares each pixel in an image against its neighbors and then transforms it in some way (add, remove, brighten, or darken). The basic idea, as proposed by @HugoRune on the ImageMagick forums, is to close the gaps in a sliding 4×4 rectangle and then erode around the rectangle, using the original image as a mask. This will eventually fill in the letters while removing the noise around them. While the results are not yet perfect, I’m definitely making headway.
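
To make the neighbor-comparison idea concrete, here is a toy sketch in plain Python. It is not the ImageMagick command, and it rests on several simplifying assumptions: the image is a small grid of 0s and 1s (1 = ink), the neighborhood is a 3×3 window rather than the 4×4 rectangle mentioned above, pixels outside the image are simply skipped, and the helper names (window, dilate, erode, close_gaps) are invented for illustration. It also leaves out the masking step.

```python
def window(img, y, x):
    """Values in the 3x3 neighborhood of (y, x), skipping positions outside the image."""
    h, w = len(img), len(img[0])
    return [img[yy][xx]
            for yy in range(max(y - 1, 0), min(y + 2, h))
            for xx in range(max(x - 1, 0), min(x + 2, w))]


def dilate(img):
    """Turn a pixel on if ANY pixel in its neighborhood is on (grows shapes)."""
    return [[int(any(window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img):
    """Keep a pixel on only if ALL pixels in its neighborhood are on (shrinks shapes)."""
    return [[int(all(window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def close_gaps(img):
    """Morphological close: dilate, then erode. Fills small holes inside shapes."""
    return erode(dilate(img))


# A solid 5x5 blob of ink with a one-pixel hole in the middle.
blob_with_hole = [[1] * 5 for _ in range(5)]
blob_with_hole[2][2] = 0

# A blank 5x5 image with a single speck of noise in the middle.
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1

print(close_gaps(blob_with_hole)[2][2])  # 1 (the hole is filled)
print(erode(speck)[2][2])                # 0 (the speck is removed)
```

Closing fills gaps inside letter strokes, while eroding wipes out specks too small to survive the neighborhood test. Eroding also thins the letters themselves, which is presumably why the original image is brought back in as a mask.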

The code

You can also change the number of times it erodes by adding more switches. This is what I ran to do a lot of eroding.

And in case you want it in a handy-dandy script, here it is. Just put it in “” and call it using “bash infile.png outfile.png #” (e.g. bash img.png out.png 5)


And here are the results:

Before morphology dilate/erode:



Tesseract output:

.ru‘iha Jerpqpvoir usufpé.ïlèïfl‘çïlgdùnäÿouï
:Shogdu’ns, (et releyà l”qüloritfé_dùEMil{ado.-.’-
gChaque‘a‘nnéevdès fêteswiennentrràppelefî
‘ lëàrm’iyersairïe ‘de’rcetterrévb’lutîqn; Èhes’ réa”
Qu‘issa’n’c‘es 5 duren_t- _,trjoië_s” jo’upsfiqcfimrrie’,
ultvl‘ïe fioisï ch‘ez 1 n oug? Ou ‘I: 4‘19âäŒñôÎSAfi l ol-I

gfiel; Asles. ÉLçI’fpremièrç :jôïim1ée-‘e’ät_jçrä fille”;

Îp‘ar les coùias’esufle ch’evaùx‘ïlajgstagènäg-‘j
jpa’q unsrgq’ gçlfiàgçifîcçîliré ’66 i’yîldi‘hïgspllèi-l-Ç Av
‘avec’r- gelfifibswde ï’fuznéç Virp’uïtiéôlçinélfîLe 4‘;



After morphology dilate/erode:


Tesseract output:

ruina lerpouvoir usurpe de Tallxouns ou
‘Shogouns, ‘et releva l’autorite du Mikado.
.Chaque année, des l‘êtes viennent-rappeler
l’anniversaire de cette révolutionJLes re-
jou‘issances durent trois jours, comme—r
autlef‘ois chez nous pounles ‘Erois.__Glo-
rieuses. La’première journée est remplie
par les courses de chemuxplauseconde
par un ’feu d’artifice’tiré en plein,soleîl,
avec gel-Les de ‘fumce muÏticoloreË‘Le-
troisiènfie‘jour qppantient aux. «Souinôs »
ou luueurs. * t “I

