Sponsor
This work is supported by NSF award OCE-0424602.
Document Type
Poster
Publication Date
2013
Subjects
Metadata -- Analysis, Information retrieval -- Technological innovations, Information technology -- Management, Database management
Abstract
The rapid growth of scientific data shows no sign of abating. This growth has led to a new problem: with so much scientific data at hand, stored in thousands of datasets, how can scientists find the datasets most relevant to their research interests? We have addressed this problem by adapting Information Retrieval techniques, developed for searching text documents, to the world of (primarily numeric) scientific data. We propose an approach that uses a blend of automated and “semi-curated” methods to extract metadata from large archives of scientific data, then evaluates ranked searches over this metadata. We describe a challenge identified during an implementation of our approach: the large and expanding list of environmental variables captured by the archive does not match the list of environmental variables in the minds of the scientists. We briefly characterize the problem and describe our initial thoughts on resolving it.
Persistent Identifier
http://archives.pdx.edu/ds/psu/13228
Citation Details
Megler, Veronika Margaret, "Taming the Metadata Mess" (2013). Computer Science Faculty Publications and Presentations. 131.
http://archives.pdx.edu/ds/psu/13228
Description
This poster was submitted to the ICDE Brisbane Workshops (PhD Symposium), April 2013. The author was supervised by David Maier.