Research Tools: Clean Data
Prepare raw data for analysis by correcting errors, filling in missing values, and formatting it consistently. Clean data helps ensure accurate results and is a key first step in any research involving datasets.
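For readers who want to see what these three steps look like in practice, here is a minimal sketch using only Python's standard library. The sample data, the `STATE_FIXES` mapping, the `clean_rows` function, and the `"unknown"` placeholder are all hypothetical and chosen for illustration; real projects would pick rules that fit their own dataset.

```python
import csv
import io

# Hypothetical raw survey data: stray whitespace, inconsistent
# casing and spelling, and one missing value.
RAW = """name,state,age
 alice ,la,34
BOB,Louisiana,
Carol,LA,29
"""

# Assumed mapping from variant spellings to one canonical form.
STATE_FIXES = {"la": "LA", "louisiana": "LA"}


def clean_rows(text, default_age="unknown"):
    """Return a list of cleaned row dicts parsed from CSV text."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Format consistently: trim whitespace and normalize casing.
        name = row["name"].strip().title()
        # Correct errors: map known variants to a canonical value.
        state_raw = row["state"].strip()
        state = STATE_FIXES.get(state_raw.lower(), state_raw)
        # Fill in missing values with an explicit placeholder.
        age = row["age"].strip() or default_age
        cleaned.append({"name": name, "state": state, "age": age})
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

The tools listed below perform the same kinds of operations through a graphical interface, which is often easier for large or unfamiliar datasets.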
OpenRefine

Tool for cleaning messy data and transforming it between formats.
Type: Data Cleaning
Access: Free
Where Can I Access It?: Must be installed on personal computer
Availability: Desktop
Skill Level: Intermediate
Potential Use Cases: Students working with large CSVs or inconsistent datasets in digital humanities, survey analysis, or archival research.
Tabula

Extracts tabular data from PDFs into CSV or Excel formats.
Type: Data Cleaning
Access: Free
Where Can I Access It?: Must be installed on personal computer
Availability: Desktop
Skill Level: Basic
Potential Use Cases: Students extracting data from academic papers or public reports for use in their assignments or research. Tabula works only on text-based PDFs, not scanned documents.
EasyMorph

Tool for automating data preparation without coding.
Type: Data Cleaning
Access: Free (Limited features available)
Where Can I Access It?: Must be installed on personal computer
Availability: Desktop
Skill Level: Intermediate
Potential Use Cases: Business or social science students organizing large datasets for reports or capstone projects without needing to code.
LSU Resources
Digital Humanities Guide: LSU Library's guide to Digital Humanities tools.
Further Resources
- Best Practices in Data Cleaning
  ISBN: 9781412988018
  Publication Date: 2012-01-10
  This book provides a clear, step-by-step process for examining and cleaning data in order to decrease error rates and increase both the power and replicability of results.
- Data Wrangling with R
  ISBN: 9783319455983
  Publication Date: 2016-11-23
  This guide for practicing statisticians, data scientists, and R users and programmers teaches the essentials of data preprocessing, using the R programming language to quickly turn noisy data into usable information.
- Hands-On Data Visualization
  ISBN: 9781492086000
  Publication Date: 2021-05-18
  This introductory book teaches you how to design interactive charts and customized maps for your website, beginning with simple drag-and-drop tools such as Google Sheets, Datawrapper, and Tableau Public.
- Using OpenRefine
  ISBN: 9781783289080
  Publication Date: 2013-09-10
  This book is styled as a cookbook, with recipes combined with free datasets that aim to turn readers into proficient OpenRefine users as quickly as possible. It is aimed at anyone who handles large amounts of data. No prior knowledge of OpenRefine is required: the book starts from the very beginning and gradually introduces more advanced features, and it provides example data, so readers do not need their own dataset.
- Kaggle: A repository of free public datasets and an online community for data scientists and machine learning practitioners.
- Programming Historian: Additional online data manipulation lessons.