LSU Libraries Research Guides, Louisiana State University
Cleaning Data with OpenRefine: Tips and Further Resources
A guide to using the OpenRefine program to organize messy datasets
Tips and Best Practices
- Name your projects clearly and save frequently. By default, OpenRefine autosaves your work every 5 minutes.
- Document your cleaning steps. OpenRefine records each operation in its Undo/Redo history, which can be extracted as JSON and reused on other projects.
- Use facets to quickly spot inconsistencies or outliers.
- Use the Undo/Redo panel liberally: every recorded step can be reversed.
- Use GREL (General Refine Expression Language, OpenRefine’s expression language) to build powerful transformations.
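To show why facets and GREL work well together, here is a small Python sketch that roughly mimics both outside OpenRefine. It is an illustration only, not how OpenRefine is implemented: `text_facet` counts distinct values much like a text facet displays them, and `normalize` approximates the GREL expression `value.trim().toLowercase()`. The sample values and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical sample values, as they might appear in a "State" column
# of a messy spreadsheet (not taken from the guide's datasets).
states = ["Louisiana", "louisiana ", " LOUISIANA", "Texas", "texas", "Mississippi"]


def text_facet(values):
    """Count each distinct value, roughly what a text facet displays."""
    return Counter(values)


def normalize(value):
    """Approximate Python analog of the GREL expression value.trim().toLowercase()."""
    return value.strip().lower()


before = text_facet(states)
after = text_facet(normalize(v) for v in states)

print(len(before), "distinct values before cleaning")  # 6: every variant counted separately
print(len(after), "distinct values after cleaning")    # 3: variants collapsed
for value, count in after.most_common():
    print(f"{value}: {count}")
```

The facet before cleaning lists six "different" states, which is exactly the kind of inconsistency a facet makes visible; after normalizing, only three remain. In OpenRefine itself, the same GREL expression would be entered through a column's Edit cells > Transform... dialog.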
Tutorials and Learning Resources
- OpenRefine User Manual: The official documentation from the developers, covering both basic and advanced functions with instructions.
- OpenRefine Recipes: A collection of OpenRefine recipes (small workflows and code fragments) that show how to accomplish specific tasks with OpenRefine.
- Using OpenRefine by
  ISBN: 9781783289097. Publication Date: 2013-09-24. Styled after a cookbook, this manual guides readers in the proficient use of OpenRefine. No prior knowledge of OpenRefine is required, and links to example data are provided if you do not have your own dataset to practice with.
Datasets
- Messy Data: A spreadsheet of fictional data with various mistakes and inconsistencies. The names and locations have been auto-generated and do not represent real individuals. This data is for training purposes only.
- IMDB Data: Another messy public dataset used in some of the lessons in this guide.
- Kaggle: A repository of free public datasets for analysis and practice.