What is data curation?
ICPSR, one of the oldest and largest data repositories, defines data curation as follows: "Data curation is akin to work performed by an art or museum curator. Through the curation process, data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future. With the modern Web, it's increasingly easy to post and share data. Without curation, however, data can be difficult to find, use, and interpret."
Data curation is generally distinguished from data management. Data curation is performed by a third party on an existing dataset, whereas data management is performed by the data's creator or user and happens throughout all stages of the "data lifecycle." Data curation aims to add value to datasets by making them more useful, more FAIR (Findable, Accessible, Interoperable, and Reusable), and more reproducible. Depending on the situation, a data curator may review the metadata and suggest improvements, or they may make more substantial changes to the dataset directly.
People outside traditional data curator roles, such as lab members with data management responsibilities or researchers re-sharing secondary data, may also benefit from data curation workflows and approaches.
The following list covers common actions taken at each step. It is not exhaustive. For more detail and a fuller list of possible actions, please consult the CURATE(D): Checklist for Data Curation linked below.
The Data Curation Network divides data curation activities into five "levels" of depth and involvement, numbered 0 through 4. According to DCN research, most repositories perform a mixture of levels 1-4, with levels 1 and 3 being most common.
Level 0: Data at this level is deposited as submitted, with no curation performed.
Level 1: Record level curation
At level 1, a curator performs a brief check of the metadata to enhance FAIR-ness.
Level 2: File level curation
At level 2, the curator reviews file arrangement and performs or suggests file type transformations for increased accessibility.
Level 3: Document level curation
At level 3, the curator also reviews documentation and adds or requests missing information for increased reusability.
Level 4: Data level curation
At level 4, the curator opens data files and examines them for accuracy and interoperability.
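As a rough illustration of what the lighter checks at levels 1 and 2 might look like when scripted, the Python sketch below flags missing metadata fields and suggests more open file formats. The required field list, the format mapping, and the function names are all hypothetical choices made for this example; they are not part of the DCN levels or the CURATE(D) checklist, and a real repository would substitute its own metadata schema and format policy.

```python
from pathlib import PurePath

# Hypothetical required fields for a repository record (an assumption for
# illustration, not a DCN or repository standard).
REQUIRED_FIELDS = ("title", "creator", "description", "license")

# Hypothetical mapping from proprietary formats to more open alternatives.
OPEN_ALTERNATIVES = {".xlsx": ".csv", ".xls": ".csv", ".docx": ".pdf"}


def missing_metadata(record):
    """Level 1 style check: list required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def suggest_conversions(filenames):
    """Level 2 style check: map proprietary files to a suggested open format."""
    suggestions = {}
    for name in filenames:
        suffix = PurePath(name).suffix.lower()  # compare extensions case-insensitively
        if suffix in OPEN_ALTERNATIVES:
            suggestions[name] = OPEN_ALTERNATIVES[suffix]
    return suggestions


# Made-up example record and file list.
record = {"title": "Stream temperature survey", "creator": "", "license": "CC-BY 4.0"}
files = ["readme.txt", "Observations.XLSX", "codebook.docx"]

print(missing_metadata(record))    # fields the curator would ask the depositor to supply
print(suggest_conversions(files))  # files the curator might transform or flag
```

A script like this only surfaces candidates for review. Deciding whether a description is adequate or whether a conversion would lose information remains the curator's judgment, and the deeper review at levels 3 and 4 is not covered here.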
This content is adapted from:
Data Curation Network. (2023). Specialized data curation workshop [presentation]. https://docs.google.com/presentation/d/1rhal3C1UQtxxx3tjYTqH_IthbFrlwWbN-8ylbOAbYHk/
which is licensed under a Creative Commons CC-BY 4.0 license.