LibGuides: Data Management: How to Curate Your Data

CURATE(D)

What is data curation?

ICPSR, one of the oldest and largest data repositories, defines data curation as follows: "Data curation is akin to work performed by an art or museum curator. Through the curation process, data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future. With the modern Web, it's increasingly easy to post and share data. Without curation, however, data can be difficult to find, use, and interpret."

Data curation is generally distinguished from data management. Data curation is performed by a third party on an existing dataset, whereas data management is performed by the data's creator or user and happens throughout all stages of the "data lifecycle." Data curation aims to add value to datasets by making them more useful, more FAIR, and more reproducible. Depending on the situation, a data curator may review the metadata and suggest improvements, or they may make more substantial changes to the dataset directly.

Those outside traditional data curator roles, such as lab members with data management responsibilities or researchers re-sharing secondary data, may also benefit from data curation workflows and approaches.

Below is a list of common actions to take at each step. It is not exhaustive. For more detail and a fuller list of possible actions, please consult the CURATE(D): Checklist for Data Curation linked below.

Check files and metadata
- Inventory and review the contents (e.g. open and sample the files).
- Review to ensure the data is in the scope of the repository where you plan to deposit it.
- Verify metadata.
Understand and run files
- Examine the dataset closely to understand what it is, how the files interrelate, and what information is needed for reuse.
- Check for quality assurance and usability issues (e.g. missing data, ambiguous headings, code extraction failures, and data presentation concerns).
- Determine if the documentation is sufficient for others to understand and reuse the data.
Request missing information
- Generate a list of questions to help fix any issues or errors and to enrich the usability of the data.
Augment metadata
- Ensure the metadata conforms to repository and/or appropriate disciplinary standards.
- Adjust metadata to improve findability (such as by assigning a permanent identifier) and accessibility.
- Improve documentation to make data more understandable, interoperable, and reusable.
Transform file formats
- Consider converting the file formats in the dataset to ones that are more interoperable, reusable, preservation-friendly, and, when possible, non-proprietary.
- The "How to Package Your Data" page of this guide has more information about this!
Evaluate for FAIRness
- Review the dataset and companion data records and metadata against international standards such as FAIR (for all research and data), CARE (for Indigenous data governance), and FATE (for research involving AI or machine learning).
- Address any ethical concerns.
- Verify the language being used is not racist or harmful.
Document curation process throughout
- Record the significant treatments or actions applied to the dataset for archival record keeping.
- Always leave the raw data raw -- work from a copy rather than making changes to your original raw data!
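
Parts of the "Check" and "Understand" steps above can be partly automated. The following is a minimal sketch, assuming the dataset is a local folder of UTF-8 CSV files. The function names `inventory` and `check_csv` and the list of missing-value markers are illustrative assumptions, not part of the CURATE(D) checklist, and the script only reads files, so the raw data stays untouched.

```python
import csv
import hashlib
from pathlib import Path


def inventory(root):
    """List every file under root with its size and SHA-256 checksum."""
    root = Path(root)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        records.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": digest,
        })
    return records


def check_csv(path, missing_markers=("", "NA", "N/A", "null")):
    """Flag blank or duplicate headers and count missing values per column.

    The missing_markers default is an assumption; adjust it to the
    conventions documented for the dataset being reviewed.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = [h.strip() for h in next(reader, [])]
        missing = [0] * len(header)
        for row in reader:
            if not row:
                continue  # skip completely blank lines
            for i in range(len(header)):
                # A row shorter than the header counts as missing data.
                value = row[i].strip() if i < len(row) else ""
                if value in missing_markers:
                    missing[i] += 1
    return {
        "headers": header,
        "blank_headers": [i for i, h in enumerate(header) if not h],
        "duplicate_headers": sorted(
            {h for h in header if h and header.count(h) > 1}
        ),
        "missing_counts": missing,  # aligned with "headers" by position
    }
```

Calling `inventory("my_dataset")` on a hypothetical folder returns one record per file, which can be saved alongside the curation log. The output of `check_csv` is a starting point for the list of questions to send to the data's creator, not a substitute for opening and reading the files.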

CURATE(D): Checklist for Data Curation

Levels of data curation

The Data Curation Network divides data curation activities into five "levels" of depth and involvement. According to DCN research, most repositories will perform a mixture of levels 1-4, with 1 and 3 being most common.

Level 0: Data at this level is deposited as submitted, with no curation performed.

Level 1: Record level curation

At level 1, the curator performs a brief check of the metadata to enhance FAIRness.

Level 2: File level curation

At level 2, the curator reviews file arrangement and performs or suggests file type transformations for increased accessibility.

Level 3: Document level curation

At level 3, the curator also reviews documentation and adds or requests missing information for increased reusability.

Level 4: Data level curation

At level 4, the curator opens data files and examines them for accuracy and interoperability.

Attribution

This content is adapted from:

Data Curation Network. (2023). Specialized data curation workshop [presentation]. https://docs.google.com/presentation/d/1rhal3C1UQtxxx3tjYTqH_IthbFrlwWbN-8ylbOAbYHk/

which is licensed under a Creative Commons CC-BY 4.0 license.