Metadata is often defined as "data about data," a characterization that fails to capture why it is important and what it does.
Metadata is structured information about an object, such as a data set, and it is valuable both to the original creator and to other users. Complete metadata allows researchers to locate data they created and to recall the circumstances and context under which they created and analyzed the data. It also allows researchers outside the original research team to:
For data to be interpretable and useful to others, researchers should document their research workflow, the decisions they make during the research process, and any manipulation of the data.
The UK Data Archive outlines a set of best practices for data documentation, summarized here:
Good data documentation includes information on:
At the data level, data sets should also be documented with:
Metadata is structured information that provides context for information objects of all kinds, including research data, and in doing so enables their discovery, use, exchange, and preservation. Metadata for data typically includes information about the researchers involved in creating the data, a name or title for the data set, dates associated with the creation of the data, a brief description or abstract, and the terms and conditions that apply to the data set.
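To make the typical metadata elements concrete, here is a minimal sketch of a metadata record for a data set, written as a Python dictionary and serialized to JSON. The field names loosely borrow Dublin Core element names (creator, title, date, description, rights); that choice, the helper `missing_fields`, and all of the values are hypothetical illustrations, not a required schema.

```python
import json

# Assumed minimum elements, loosely named after Dublin Core elements.
REQUIRED_FIELDS = ("creator", "title", "date", "description", "rights")

# Hypothetical example record; every value here is illustrative.
record = {
    "creator": ["Jane Doe", "John Roe"],          # researchers involved
    "title": "Stream Temperature Survey",        # name or title of the data set
    "date": "2021-06-01/2021-08-31",             # date range of data creation (ISO 8601 interval)
    "description": "Hourly water temperature readings from three stream sites.",
    "rights": "CC BY 4.0",                       # terms and conditions of use
}


def missing_fields(rec: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


# Serialize the record so it can be shared alongside the data files.
serialized = json.dumps(record, indent=2)
print(missing_fields(record))
print(serialized)
```

A completeness check like this is one simple way to catch a record that omits, for example, the terms of use before the data set is shared.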
A variety of metadata standards exist for describing data sets; they differ by discipline, by the international standards they build on, and by other characteristics of the data. Academic disciplines have supported initiatives to formalize metadata specifications within their communities. The type of resource being described and its intended uses will influence which metadata standard is most appropriate. Some widely adopted metadata standards include the following:
General
Humanities
Sciences
Social Sciences
In the context of research data, a readme file is a plain text file (.txt) that helps others understand your data and the relationships among data files. By naming the file "readme," the data creator signals that this file should be read first.
Cornell University's Research Data Management Service Group has made a useful readme file template available for download. At a minimum, the Cornell group recommends completing the following sections in the readme file template:
General information
Data set title
Name and contact information for investigators
Date (or date range) of data collection
Geographic location of data collection
Data and file overview
A short description of each file
Date that the file was created
Methodological information
Description of methods for data collection
Description of methods for data processing
Data-specific information
Variable list, with full names and definitions of column headings if tabular data
Units of measurement
Definitions for codes or symbols used to record missing information (see Cornell University, Guide to writing "readme" style metadata)
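As an illustration, the minimum sections listed above could be turned into a plain-text readme skeleton with a short script. This is a sketch under stated assumptions: the helper `build_readme`, the section layout, and the "[TO BE COMPLETED]" placeholder are hypothetical and do not reproduce Cornell's downloadable template.

```python
from pathlib import Path

# Sections and items follow the minimum list above; the layout is an assumption.
README_SECTIONS = {
    "GENERAL INFORMATION": [
        "Data set title",
        "Name and contact information for investigators",
        "Date (or date range) of data collection",
        "Geographic location of data collection",
    ],
    "DATA AND FILE OVERVIEW": [
        "A short description of each file",
        "Date that the file was created",
    ],
    "METHODOLOGICAL INFORMATION": [
        "Description of methods for data collection",
        "Description of methods for data processing",
    ],
    "DATA-SPECIFIC INFORMATION": [
        "Variable list, with full names and definitions of column headings",
        "Units of measurement",
        "Definitions for codes or symbols used to record missing information",
    ],
}


def build_readme(values: dict) -> str:
    """Render a plain-text readme, marking any item not yet filled in."""
    lines = []
    for section, items in README_SECTIONS.items():
        lines.append(section)
        for item in items:
            lines.append(f"  {item}: {values.get(item, '[TO BE COMPLETED]')}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Hypothetical usage: fill in one item and write the file as readme.txt.
text = build_readme({"Data set title": "Stream Temperature Survey"})
Path("readme.txt").write_text(text, encoding="utf-8")
```

Leaving visible placeholders, rather than omitting unfilled items, makes it easy to see which recommended sections still need to be completed.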
A data dictionary describes all the data stored in a data set or used by a database, including the data's types, attributes, structure, relationships, and usage in the database or software program. A good data dictionary can be a valuable part of the metadata describing a data set, giving users a clear understanding of the content and organization of the data and of how the data could be modified, if necessary. In the context of a database or software package, the data dictionary may be an essential component that programmers and the database management system need in order to access and use the data properly. For users, a data dictionary is usually presented as a table or spreadsheet. Dictionaries may also be embedded in XML files or other markup languages. A data dictionary does not contain the data itself; it describes the data.
A data dictionary typically contains:
These may include type (text, date, numeric, etc.), standard formats, units, field length, description, unique identifiers, default values, whether a value is required, and more, depending on the specific data.
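To show how a data dictionary describes data without containing it, the sketch below defines a small dictionary for a hypothetical tabular data set and uses it to check individual records. The field names, the type labels "text", "date", and "numeric", and the helper `validate` are illustrative assumptions; real data dictionaries are more often kept as a spreadsheet, as described above.

```python
from datetime import date

# Map the dictionary's type labels to Python types (an assumed convention).
TYPE_CHECKS = {"text": str, "date": date, "numeric": (int, float)}

# Hypothetical data dictionary: one entry per column, with type, units,
# description, and whether a value is required.
DATA_DICTIONARY = {
    "site_id": {"type": "text", "required": True,
                "description": "Unique identifier for the sampling site"},
    "sample_date": {"type": "date", "required": True,
                    "description": "Date the sample was collected"},
    "temp_c": {"type": "numeric", "required": False, "units": "degrees Celsius",
               "description": "Water temperature; None marks a missing reading"},
}


def validate(row: dict) -> list:
    """Check one record against the data dictionary and list any problems."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        value = row.get(name)
        if value is None:
            if spec["required"]:
                problems.append(f"{name}: required value is missing")
            continue
        if not isinstance(value, TYPE_CHECKS[spec["type"]]):
            problems.append(f"{name}: expected {spec['type']}")
    # Flag columns that the dictionary does not describe.
    for name in sorted(row.keys() - DATA_DICTIONARY.keys()):
        problems.append(f"{name}: not defined in data dictionary")
    return problems


good_row = {"site_id": "S01", "sample_date": date(2021, 6, 1), "temp_c": None}
bad_row = {"site_id": "S01", "temp_c": "warm", "depth": 2}
print(validate(good_row))
print(validate(bad_row))
```

Here the missing temperature in `good_row` passes because the dictionary marks that column as optional, while `bad_row` is reported for a missing required date, a non-numeric temperature, and an undocumented column.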
For some examples of data dictionaries, check the following sites: