Faster and Better Metadata Authoring using CEDAR's Value Recommendations

journal contribution
posted on 2016-11-20, 19:52 authored by Marcos Martinez Romero, Martin J. O’Connor, Ravi D. Shankar, Maryam Panahiazar, Debra Willrett, Attila L. Egyedi, John Graybeal, Mark A. Musen

In biomedicine, good metadata is crucial for finding experimental datasets, understanding how experiments were performed, and reusing data to conduct new analyses. Despite the growing number of efforts to define guidelines and standards for describing biomedical experiments, the impediments to creating accurate, complete, and consistent metadata remain considerable. Authoring good metadata is a tedious and time-consuming task that biomedical scientists tend to avoid.

The Center for Expanded Data Annotation and Retrieval (CEDAR) is developing novel methods and tools to simplify the process by which investigators annotate their experimental data with metadata. The CEDAR Workbench (cedar.metadatacenter.net) is a set of Web-based tools for the acquisition, storage, search, and reuse of metadata templates. As a step towards decreasing authoring time while increasing metadata quality, we have enhanced the CEDAR Workbench with value recommendation capabilities.

Our system identifies common patterns in the CEDAR metadata repository and generates real-time suggestions for filling out metadata acquisition forms. These suggestions are context-sensitive, meaning that the values predicted for a particular field are generated and ranked based on previously entered values. Our value recommendation approach supports both free-text values and terms from ontologies and controlled terminologies. We discuss some of the challenges that have arisen while implementing our approach and our strategies for making this capability useful to the end users of CEDAR. We demonstrate CEDAR's intelligent authoring capabilities using metadata from the Gene Expression Omnibus (GEO), and show how the technology that we are developing leverages existing metadata to make the authoring of high-quality metadata a manageable task.
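The idea of context-sensitive ranking can be illustrated with a small sketch. This is not CEDAR's implementation, which the abstract does not describe in detail; it is a minimal frequency-based stand-in, assuming that each stored metadata record is a flat dictionary of field names to string values. The function name `recommend`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

# Hypothetical, GEO-like metadata records (illustrative values only).
RECORDS = [
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "liver cancer"},
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "Homo sapiens", "tissue": "brain", "disease": "glioma"},
    {"organism": "Mus musculus", "tissue": "liver", "disease": "hepatitis"},
]


def recommend(records, target_field, context, top_k=3):
    """Rank candidate values for target_field given already-entered fields.

    context maps field names the user has already filled in to their values.
    The score of a candidate is the fraction of matching records that use it.
    """
    counts = Counter()
    for record in records:
        value = record.get(target_field)
        if value is None:
            continue
        # A record counts only if it agrees with every previously entered value.
        if all(record.get(field) == entered for field, entered in context.items()):
            counts[value] += 1

    total = sum(counts.values())
    # Most frequent first; ties are broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(value, count / total) for value, count in ranked[:top_k]]


if __name__ == "__main__":
    entered = {"organism": "Homo sapiens", "tissue": "liver"}
    for value, score in recommend(RECORDS, "disease", entered):
        print(f"{value}: {score:.2f}")
```

With the sample records, entering "Homo sapiens" and "liver" ranks "hepatitis" ahead of "liver cancer" for the disease field, while an empty context falls back to overall frequency. Ontology terms could be handled the same way by storing term identifiers as the values, although that is an assumption rather than something the abstract specifies.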

Funding

CEDAR is supported by grant U54 AI117925 awarded by the National Institute of Allergy and Infectious Diseases through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov).
