Data Management Planning: Documenting & Structuring Your Data

Guidance and resources for research data management.

Best Practices for Documentation

Document:

  • Data collection methods
  • Context of data collection
  • Variable names and description
  • Algorithms used
  • Transformations of data from the raw data through analysis
  • Software and systems used for analysis

Use a script rather than a GUI for data analysis: a script documents each step and makes results easier to reproduce.
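
As an illustration of a scripted analysis step, the sketch below reads raw values, applies one documented transformation, and returns analysis-ready rows. The sample data, column names, and function names are hypothetical and were chosen only for this example.

```python
import csv
import io

# Hypothetical raw data; in practice this would be read from a raw file on disk.
RAW_CSV = """site,temp_f
A,68.0
B,50.0
"""


def fahrenheit_to_celsius(temp_f):
    """Transformation step: convert degrees Fahrenheit to degrees Celsius."""
    return round((temp_f - 32.0) * 5.0 / 9.0, 2)


def process(raw_text):
    """Read raw rows and return analysis-ready rows; the raw input is not modified."""
    reader = csv.DictReader(io.StringIO(raw_text))
    processed = []
    for row in reader:
        processed.append({
            "site": row["site"],
            "temp_c": fahrenheit_to_celsius(float(row["temp_f"])),
        })
    return processed


rows = process(RAW_CSV)
for row in rows:
    print(row)
```

Because every transformation is a named function in the script, the script itself serves as a record of how the raw data became the analyzed data.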

Incorporate a workflow tool such as Kepler, Taverna, or VisTrails.

Establish a Descriptive File and Dataset Naming Convention

A consistent convention helps you identify your files and their contents at a glance. Build names from abbreviated descriptive elements such as:

  • project
  • content or parameter
  • location, date and/or time (yyyymmdd for easy sorting; hhmmssTZD for time)
  • version number (establish numbering system for versions)

Use only numbers, letters, dashes, and underscores. Do not use spaces or special characters. Keep names concise so they remain practical.
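
One way to apply such a convention consistently is to generate file names with a small helper. The sketch below is a minimal example, not a prescribed scheme: the element order, the underscore separator, the "v02" version style, and the sample values are all assumptions made for illustration.

```python
import re
from datetime import date

# Allow only letters, numbers, dashes, and underscores, plus one dot before the extension.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def build_filename(project, content, location, when, version, ext="csv"):
    """Join abbreviated descriptive elements into a single file name."""
    parts = [
        project,
        content,
        location,
        when.strftime("%Y%m%d"),  # yyyymmdd sorts chronologically as plain text
        f"v{version:02d}",        # zero-padded version number (assumed style)
    ]
    name = "_".join(parts) + "." + ext
    if not ALLOWED.match(name):
        raise ValueError(f"file name contains spaces or special characters: {name!r}")
    return name


# Hypothetical project, parameter, and location abbreviations.
print(build_filename("soilsurvey", "ph", "plot3", date(2024, 5, 17), 2))
# prints: soilsurvey_ph_plot3_20240517_v02.csv
```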

Describing Your Data

Define a data dictionary:

Example Data Dictionary
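
A data dictionary can also be kept in machine-readable form so that values can be checked against it. The sketch below is one possible layout, assuming one entry per variable with its type, description, allowed codes or range, and missing-value code. The variables, codes, and the check_value helper are hypothetical.

```python
# Hypothetical data dictionary: one entry per variable.
DATA_DICTIONARY = {
    "subject_id": {
        "type": int,
        "description": "Unique identifier per participant",
    },
    "sex": {
        "type": str,
        "description": "Sex of participant",
        "codes": {"F": "female", "M": "male"},
    },
    "age_years": {
        "type": int,
        "description": "Age at enrollment, in years",
        "range": (18, 99),
        "missing": -9,  # code entered when the value is unknown
    },
}


def check_value(variable, value):
    """Return True if value is acceptable for variable according to the dictionary."""
    entry = DATA_DICTIONARY[variable]
    if "missing" in entry and value == entry["missing"]:
        return True
    if not isinstance(value, entry["type"]):
        return False
    if "codes" in entry and value not in entry["codes"]:
        return False
    if "range" in entry:
        low, high = entry["range"]
        if not low <= value <= high:
            return False
    return True


print(check_value("age_years", 42))   # within range
print(check_value("age_years", 120))  # outside range
print(check_value("sex", "X"))        # not a defined code
```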

Use discipline-specific metadata standards:

Best Practices for Using Excel

  • Use in conjunction with a "Data Dictionary" containing information about:
    • Variable names
    • Variable types
    • Codes and ranges
    • Missing values
  • Place variable names in row 1
  • Always have a unique identifier per entity
  • Keep track of changes made to the worksheet
  • Format columns to match the variable type (date, numeric, text, etc.)
  • Data entry guidelines:
    • Freeze column headings so they will not scroll off the screen
    • Enter string variables in a consistent case
    • Do not leave any blank rows in the spreadsheet
    • Do not include unessential text or fancy formatting in the spreadsheet
    • Replace formulas with their values: copy the entire spreadsheet into a new sheet using the "Values" paste option
    • Sort data with caution (always SAVE first)
  • Verify data using double data entry
  • Save as .csv for forward compatibility and interoperability
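
Double data entry can be verified with a short script once both entries are saved as .csv. The sketch below assumes each entry has a unique identifier column named "id" (a hypothetical name) and reports every cell where the two entries disagree. The sample data and function names are illustrative only.

```python
import csv
import io

# Hypothetical first and second entries of the same records.
ENTRY_1 = """id,weight_kg
1,70.5
2,82.0
3,64.2
"""
ENTRY_2 = """id,weight_kg
1,70.5
2,28.0
3,64.2
"""


def load_by_id(text, id_column="id"):
    """Index rows by their identifier, rejecting duplicate identifiers."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row[id_column]
        if key in rows:
            raise ValueError(f"duplicate identifier: {key}")
        rows[key] = row
    return rows


def compare_entries(first_text, second_text, id_column="id"):
    """Return (id, column, first value, second value) for each disagreement."""
    first = load_by_id(first_text, id_column)
    second = load_by_id(second_text, id_column)
    mismatches = []
    for key in sorted(set(first) | set(second)):
        a, b = first.get(key), second.get(key)
        if a is None or b is None:
            # The record was entered in only one of the two files.
            mismatches.append((key, "<row>",
                               "present" if a is not None else "missing",
                               "present" if b is not None else "missing"))
            continue
        for column in a:
            if a[column] != b.get(column):
                mismatches.append((key, column, a[column], b.get(column)))
    return mismatches


for mismatch in compare_entries(ENTRY_1, ENTRY_2):
    print(mismatch)
# prints: ('2', 'weight_kg', '82.0', '28.0')
```

Values are compared as text, so "82" and "82.0" would be reported as a mismatch. Whether that is desirable depends on how strictly the two entries are expected to agree.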

Resources:

  • Elliott, A. C. (2006). Preparing data for analysis using Microsoft Excel. Journal of Investigative Medicine, 54(6), 334-341.