Digital Repositories at Chapman University

Digital Commons and Figshare are open access repositories for sharing and preserving the research outputs of Chapman scholars and researchers.

Documentation

All data sets published on Chapman Figshare must be accompanied by documentation. At minimum, your data documentation should address:

  • Where and how data was obtained/generated/collected/compiled.
    • include citations to other data, tools, software, etc.
  • How files are organized and named, and what they contain.
  • How the data is structured and how variables are defined.
  • What was done to the data and files (if not sharing raw data).
    • e.g. steps or code used to clean or standardize data

Readme files

The simplest and most flexible way to document your data is through a README file: a text document that acts as a 'user manual' for your dataset. README files are most often plain text (.txt, .md) or PDF files. It's possible to include a codebook or data dictionary in a README, but it may not always be practical, for example when a dataset has many variables.
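As a rough illustration, the minimum documentation points listed above could be turned into a plain-text README skeleton with a few lines of Python. This is only a sketch: the section headings, the prompts, and the `build_readme` function are hypothetical names chosen for this example and are not an official Chapman or Figshare template.

```python
# Hypothetical section headings and prompts, paraphrasing the minimum
# documentation points; adjust them to suit your own dataset.
SECTIONS = [
    ("Data sources and collection",
     "Where and how the data was obtained, generated, collected, or compiled. "
     "Cite other data, tools, and software used."),
    ("File organization",
     "How files are organized and named, and what they contain."),
    ("Structure and variables",
     "How the data is structured and how variables are defined."),
    ("Processing",
     "What was done to the data and files, e.g. cleaning or standardizing steps."),
]


def build_readme(title, sections=SECTIONS):
    """Return a plain-text README skeleton with one block per section."""
    lines = [title, "=" * len(title), ""]
    for heading, prompt in sections:
        # Underline each heading and leave the prompt in brackets to be replaced.
        lines += [heading, "-" * len(heading), f"[{prompt}]", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the skeleton; redirect the output to README.txt to save it.
    print(build_readme("README for example_dataset"))
```

The bracketed prompts are placeholders to be overwritten with real descriptions before the README is published alongside the data.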

Readme templates


Data dictionaries and codebooks

Traditionally, a codebook defines the variables of a data set, while a data dictionary defines variables and provides extra details such as information on the origin of the data, its relationship to other data, and how to use it. However, the two terms are fairly interchangeable, so use whichever you like.

These documents are important because they explain attributes that are not within the data itself, such as what each column of a spreadsheet represents, how null and zero values differ, and what encoded data means. For example, a column titled "date" tells you only that the value is a date, not why the date matters. Likewise, a value of "1" could be a quantity, or it might mean "yes", "question 1", or a variety of other things.
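A first draft of a data dictionary can be generated from the data file itself and then completed by hand. The sketch below assumes the data is a CSV file with a header row. The function name `draft_data_dictionary`, the output field names, and the sample columns are all hypothetical and for illustration only. It counts empty cells and zero cells separately, since the difference between the two is exactly the kind of thing the documentation needs to explain.

```python
import csv
import io

# Hypothetical sample data: "consent" uses 1/0 codes, and some cells are empty.
SAMPLE_CSV = (
    "date,consent,count\n"
    "2020-01-05,1,0\n"
    "2020-01-06,,3\n"
    "2020-01-07,0,\n"
)


def draft_data_dictionary(csv_text):
    """Return one draft entry per column, to be completed by hand."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        entries.append({
            "variable": name,
            # The meaning of a column cannot be inferred from the data itself.
            "description": "TODO: describe what this column represents",
            "empty_cells": sum(1 for v in values if v == ""),
            "zero_cells": sum(1 for v in values if v == "0"),
            "distinct_values": sorted({v for v in values if v != ""}),
        })
    return entries


if __name__ == "__main__":
    for entry in draft_data_dictionary(SAMPLE_CSV):
        print(entry)
```

The script can only report what is in the file. The descriptions, the meaning of codes such as "1" and "0", and the reason a cell is empty still have to be written by someone who knows the data.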

 

Examples on this page were created for DataShare, Iowa State University's equivalent Figshare instance. They were kindly provided by Iowa State University and are reused with permission.