
Research Data Management

Steps in Research Data Lifecycle

Image credit: UC Irvine Library Digital Scholarship Services

https://www.lib.uci.edu/dss

Start with Planning

  • Identify grants & funding.
  • Collect & manage preliminary assets.
  • Describe and organize assets.

Implementation

  • Collect Assets.
  • Organize Assets.
  • Describe Assets.
  • Analyze Assets.

Publishing

  • Identify open access publications.
  • Deposit work.
  • Share & cite work.

Discovery & Impact

  • Understand metrics.
  • Use social media.

Preservation

  • Migrate to sustainable formats.
  • Store reliably.

Re-use this cycle!

How to document your data

Documenting your data includes capturing sufficient metadata (descriptive information) about your data in order to make it discoverable, identifiable and usable in the future.  Information you capture should include some, if not all, of the following elements:

  • Title: Title of the dataset or research project.
  • Creator: Names of individuals or institutions responsible for creating the data.
  • Unique identifier: A number or string that identifies the data and distinguishes it from other datasets.
  • Dates: Project start and end dates, release date, and any other date of importance during the research study.
  • Subject: Keywords or phrases describing the subject or content of the data.
  • Funding agency: The agency responsible for funding the research.
  • Rights: Intellectual property rights associated with the data.
  • Language: The language(s) in which the data were generated.
  • Sources: The original sources of any data derived from other sources.
  • Coverage: Geographical location or coverage where the data were collected.
  • Methodology: Methods used for data collection.
  • Version: Version of the dataset, if it has been updated.
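Because the list above is a checklist, it can help to keep a draft metadata record next to the data and check it for gaps before depositing. The Python sketch below does that; the element names are hypothetical labels that mirror the checklist, not fields of any particular metadata standard, and the sample record is invented for illustration.

```python
# Hypothetical element names mirroring the checklist above; they are labels
# for this sketch, not fields defined by any particular metadata standard.
RECOMMENDED_ELEMENTS = [
    "title", "creator", "identifier", "dates", "subject", "funding_agency",
    "rights", "language", "sources", "coverage", "methodology", "version",
]


def missing_elements(record: dict) -> list:
    """Return the recommended elements that are absent or left blank."""
    missing = []
    for name in RECOMMENDED_ELEMENTS:
        value = record.get(name)
        if value is None:
            missing.append(name)
        elif isinstance(value, str) and not value.strip():
            # An empty or whitespace-only string counts as undocumented.
            missing.append(name)
        elif isinstance(value, (list, tuple, dict)) and not value:
            # Empty containers (e.g. no creators listed) also count as missing.
            missing.append(name)
    return missing


# Hypothetical draft record, for illustration only.
record = {
    "title": "Example grassland productivity survey",
    "creator": ["Example Lab"],
    "dates": {"start": "2001-01-01", "end": "2001-12-31"},
    "identifier": "",
}
print(missing_elements(record))
```

Since the guide asks for "some, if not all" of these elements, the output is best read as a to-do list rather than a pass/fail test; elements such as Sources may legitimately not apply.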

Using sustainable metadata standards is highly recommended to help ensure that data remain accessible in the future. Such standards are open (not proprietary), widely used, uncompressed, and use standard encoding, and they contain enough information to analyze the context, content, and structure of the record.

Metadata schema sources

Data storage and preservation

Storage

Storing data reliably is an important function of data management. Options for storing your data files include:

  • Personal computers, external hard drives, departmental or university servers.
  • LMU's ITS offers Box.com. Contact ITS about storage solutions for files or software that Box does not support.
  • Other cloud storage services that may suit your data storage/backup needs include Amazon S3, Elephant Drive, Jungle Disk, Mozy, Carbonite.
  • CDs or DVDs are not recommended because they fail frequently.

Security

  • Unencrypted storage is ideal, because it lets you and others read your data easily. If encryption is required because the data are sensitive:
    • Keep passwords and keys on paper (2 copies) and in a PGP (Pretty Good Privacy) encrypted digital file.
    • Don't rely on third-party encryption alone.
  • Uncompressed files are also ideal for storage. If you must compress files to conserve space, limit compression to your third backup copy.

To make sure your backup system is working properly, test your system periodically. Try to retrieve data files and make sure you can read them.
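One way to make the retrieval test routine is to compare checksums. The Python sketch below (standard library only) walks an original folder, reads each file's counterpart in a backup folder, and reports files that are missing from the backup or whose SHA-256 checksum differs. It assumes the backup mirrors the original's directory structure; adapt it to however your backup tool stores copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir: Path, backup_dir: Path) -> list:
    """Return relative paths that are missing from, or differ in, the backup.

    Assumes the backup mirrors the original folder's directory structure.
    """
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        copy = backup_dir / rel
        # Reading the copy end to end also confirms it can still be retrieved.
        if not copy.is_file() or sha256_of(copy) != sha256_of(src):
            problems.append(rel.as_posix())
    return problems
```

For example, `verify_backup(Path("project_data"), Path("/mnt/backup/project_data"))` returns an empty list when every copy matches; both paths are hypothetical. A matching checksum shows the bytes are intact, so opening a sample of files in their software is still worthwhile.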

The UK Data Archive provides additional guidelines on data storage, back-up, and security.

File formats

File formats used to capture, store and deliver research data are an important consideration as they influence future file/program accessibility. It is important to plan for software obsolescence.

Formats more likely to be accessible in the future are:

  • Non-proprietary.
  • Open, documented standard.
  • Common usage by research community.
  • Standard representation (ASCII, Unicode).
  • Unencrypted.
  • Uncompressed.

Examples of preferred file format choices include:

  • ODF or PDF/A, not Word.
  • ASCII, not Excel.
  • MPEG-4, not QuickTime.
  • TIFF or JPEG2000, not GIF or JPG.
  • XML or RDF, not RDBMS.

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format. Note that not all repositories are able to migrate data files to newer file formats for preservation.
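As one illustration of such a migration, the Python sketch below copies a table out of a SQLite database (a relational database file) into a plain-text, comma-separated file with a header row, while leaving the database itself in place as the original-format copy. SQLite and the table used in the example are assumptions; other database systems need their own drivers or export tools, and XML is another target format this guide lists. Text is written as UTF-8, which is byte-for-byte identical to ASCII for plain English characters.

```python
import csv
import sqlite3


def export_table_to_csv(db_path: str, table: str, csv_path: str) -> int:
    """Copy every row of `table` into a UTF-8 CSV file with a header row.

    Returns the number of data rows written. The database file is not
    modified, so the original format is kept alongside the migrated copy.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        count = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            for row in cursor:
                writer.writerow(row)
                count += 1
    finally:
        conn.close()
    return count
```

Writing a header row keeps column names with the data, so the CSV remains understandable without the database schema.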

File naming

File names should be unique, consistent, informative, and easy to sort and update. Before beginning your project, decide on a file naming hierarchy and file naming conventions. File names should make it easy to tell which project they belong to. Elements you may include in your file names are date, project name, type of data, location, and version. Other features to consider as you design your file naming plan are described in this Google Doc.
When organizing files, it's important to standardize both file names and directory names so that they are descriptive.

  • Best Practice
    • File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type.
    • When choosing a file name, check for any database management limitations on file name length and use of special characters. In general, lower-case names are also less software- and platform-dependent. Avoid using spaces and special characters in file names, directory paths, and field names, because automated processing, URLs, and other systems often use spaces and special characters to parse text strings. Instead, consider using underscores (_) or dashes (-) to separate meaningful parts of file names. Avoid $ % ^ & # | : and similar characters.
    • If versioning is desired, a date string within the file name is recommended to indicate the version.
    • Avoid using file names such as mydata.dat or 1998.dat.
  • Description Rationale
    • Clear, descriptive, and unique file names may be important when your data file is combined in a directory or FTP site with your own data files or with the data files of other investigators. File names that reflect the contents of the file and uniquely identify the data file enable precise search and discovery of particular files.
  • Examples
    • An example of a good data file name: Sevilleta_LTER_NM_2001_NPP.csv
      • Sevilleta_LTER is the project name.
      • NM is the state abbreviation.
      • 2001 is the calendar year.
      • NPP represents Net Primary Productivity data.
      • csv is the file type: ASCII comma-separated values.
    • Source: DataONE.
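A naming convention is easier to follow when names are generated and checked by a script rather than typed by hand. The Python sketch below joins descriptive parts with underscores, optionally appends a YYYYMMDD date string as the version, and rejects names containing spaces or characters other than letters, digits, "_", "-" and ".". The function names and the choice to append the date at the end are assumptions for illustration; the length limit is optional because limits vary by system.

```python
import re
from datetime import date
from typing import Iterable, List, Optional

# Letters, digits, underscore, hyphen and period only: no spaces and none of
# the characters the guidance above warns about ($ % ^ & # | : and similar).
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def check_file_name(name: str, max_length: Optional[int] = None) -> List[str]:
    """Return a list of problems with `name`; an empty list means it passes."""
    problems = []
    if not _SAFE_NAME.fullmatch(name):
        problems.append("use only letters, digits, '_', '-' and '.'")
    if max_length is not None and len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems


def build_file_name(parts: Iterable[str], extension: str,
                    version_date: Optional[date] = None) -> str:
    """Join descriptive parts with underscores, optionally appending a date."""
    pieces = [str(p) for p in parts]
    if version_date is not None:
        # YYYYMMDD (ISO 8601 basic format) sorts chronologically as plain text.
        pieces.append(version_date.strftime("%Y%m%d"))
    name = "_".join(pieces) + "." + extension
    problems = check_file_name(name)
    if problems:
        raise ValueError(f"{name!r}: " + "; ".join(problems))
    return name


print(build_file_name(["Sevilleta", "LTER", "NM", "2001", "NPP"], "csv"))
# Sevilleta_LTER_NM_2001_NPP.csv
print(build_file_name(["Sevilleta", "LTER", "NM", "2001", "NPP"], "csv",
                      version_date=date(2024, 3, 5)))
# Sevilleta_LTER_NM_2001_NPP_20240305.csv
```

Placing the date in YYYYMMDD form means a plain alphabetical directory listing also shows versions in chronological order.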

Metadata Standards

Metadata (data about data) standards help to describe data in a consistent manner. Metadata can include descriptive information, provenance, quality and access/use of data.  Here are a few standards that may be useful in describing your data for access and preservation.
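To show what a standard-based record can look like, the Python sketch below builds a simple record using the Dublin Core Metadata Element Set (version 1.1), chosen here only as a widely used, open example; it is not necessarily one of the standards this guide recommends. The wrapper element name `metadata` is an arbitrary choice, and the record values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialize `fields` as Dublin Core XML.

    Each key must be a Dublin Core element name; each value is a string or a
    list of strings, and repeated values become repeated elements.
    """
    root = ET.Element("metadata")  # wrapper element name is an arbitrary choice
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
print(dublin_core_record({
    "title": "Example grassland productivity survey",
    "creator": ["Example Lab", "Example University"],
    "date": "2001",
    "format": "text/csv",
}))
```

Rejecting unknown element names keeps records consistent with the standard; information Dublin Core does not cover, such as the funding agency, would need a richer schema or a separate note.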

Reproducibility of Data

When searching for data, whether locally on one's machine or in external repositories, one may use a variety of search terms. In addition, data are often housed in databases or clearinghouses where a query is required in order to access data. To reproduce the search and obtain similar, if not identical, results, it is necessary to document which terms and queries were used.

  • Note the location of the originating data set.
  • Document which search terms were used.
  • Document any additional parameters that were used, such as any controls that were used (pull-down boxes, radio buttons, text entry forms).
  • Document the query term that was used, where possible.
  • Note the database version and/or date, so you can exclude any data sets added since the query was last performed.
  • Note the name of the website and URL, if applicable.

Description Rationale

In order to reproduce a data set or result set, it is necessary to document which terms were originally used to capture that data. By documenting this information while the search is being conducted, one greatly enhances the chance of being able to reproduce the results at a later date.

Source: DataONE
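The search-documentation checklist in this section can be captured automatically at search time. The Python sketch below appends one record per search to a JSON Lines file (one JSON object per line), with fields that mirror the checklist: source name and URL, search terms, the query string, form parameters, and the database version, plus a UTC timestamp. The function name, field names, and log file are assumptions for illustration, not part of any repository's API.

```python
import json
from datetime import datetime, timezone


def log_search(log_path, *, source_name, source_url, terms, query=None,
               parameters=None, database_version=None):
    """Append one search record to a JSON Lines log and return the record.

    Field names are hypothetical and mirror the checklist above.
    """
    record = {
        "searched_at": datetime.now(timezone.utc).isoformat(),
        "source_name": source_name,
        "source_url": source_url,
        "terms": list(terms),
        "query": query,
        # Pull-down boxes, radio buttons, text entry forms and similar controls.
        "parameters": parameters or {},
        "database_version": database_version,
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Appending rather than overwriting keeps a complete history, so a search can be rerun later with the same terms and compared against the database version recorded at the time.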