Adding Unique Identifiers in OpenRefine

Sometimes you may want to add unique identifiers (UIDs) to your data in OpenRefine (eg. migrating the data into a Database Management System (DBMS) like Access or Filemaker).

It’s nice to have a set number of leading zeroes, especially if you’ll sort your data alphabetically.

To do this, you’ll need to add a new column based on any column, which will bring up a dialogue window.  Edit column > Add column based on this column…


For your GREL (Google Refine Expression Language) expression, enter the following:

      “0000”[0,4-row.index.length()] + row.index


* Make sure to enter a column name (above circled in blue).

* * *

Here’s what the GREL means:

  • row.index” is a controlled term for the number of the row counting from the top (beginning with 0)
  • “0000” is a string of four zeroes that will be spliced into the index.
  • row.index.length() is how many characters make up row.index (treating it as a string) — so “1981” would have a length of 4, whereas “30” would have a length of 2.
  • [0,4-row.index.length()] slices the string of zeroes to match however many are needed to bring the total number of numeric places to 4. If the index is “13” (length of 2 characters) and you want a total four numbers (0013), then it will take only 2 zeros from the string.
  • finally, “+ row.index” concatenates the original index to the preceding zeros. — so in the case of the above example, it’ll add together “00” and “13” to get “0013”

You can increase the number of leading zeroes to however many you need, but you’ll need to make a few changes.

  1. First, you’ll need to update “0000” to match however many number places you want.
  2. Then you’ll need to change 4-row.index…. to X-row.index….. — where X equals the number of number places.

For example, if you want to increase the total number places to 6, change the expression to

  • “000000”[0,6-row.index.length()] + row.index
This entry was posted in Digital Humanities. Bookmark the permalink.