Published In
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15)
Document Type
Conference Proceeding
Publication Date
2015
Subjects
Information retrieval, Scientific archives -- Research, Database management, Data mining
Abstract
Prior work proposed "Data Near Here" (DNH), a data search engine for scientific archives that is modeled on Internet search engines. DNH performs a periodic, asynchronous scan of each dataset in an archive, extracting lightweight features that are combined to form a dataset summary. During a search, DNH assesses the similarity of the search terms to the summary features and returns to the user, at interactive timescales, a ranked list of datasets for further exploration and analysis. We will demonstrate the search capabilities and ancillary metadata-browsing features for an archive of observational oceanographic data. While comparing search terms to complete datasets might seem ideal, interactive search speed would be impossible with archives of realistic size. We include an analysis showing that our summary-based approach gives a reasonable approximation of such a "complete dataset" similarity measure.
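The summary-based search described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which features DNH extracts or how it scores similarity, so everything here is an assumption made for illustration: the summary is a per-variable (min, max) range, the query is a set of desired ranges, and the score decays with the gap between the query range and the summarized range. The names `Summary`, `summarize`, `interval_gap`, `score`, and `search` are hypothetical and are not taken from DNH.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical dataset summary: one (min, max) range per variable."""
    name: str
    ranges: dict


def summarize(name, rows):
    """Scan a dataset once and keep only the min and max of each variable.

    In an archive this would run periodically and offline, so that
    searches never need to touch the full data.
    """
    ranges = {}
    for row in rows:
        for var, value in row.items():
            lo, hi = ranges.get(var, (value, value))
            ranges[var] = (min(lo, value), max(hi, value))
    return Summary(name, ranges)


def interval_gap(a, b):
    """Distance between two closed intervals; 0.0 when they overlap."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


def score(query, summary):
    """Assumed similarity in [0, 1]: 1.0 when every query range overlaps
    the summary, falling toward 0 as the gap grows relative to the width
    of the query range. Variables missing from the summary contribute 0.
    """
    if not query:
        return 0.0
    total = 0.0
    for var, wanted in query.items():
        if var not in summary.ranges:
            continue
        gap = interval_gap(wanted, summary.ranges[var])
        width = max(wanted[1] - wanted[0], 1e-9)  # avoid division by zero
        total += 1.0 / (1.0 + gap / width)
    return total / len(query)


def search(query, summaries):
    """Return the summaries ranked from most to least similar."""
    return sorted(summaries, key=lambda s: score(query, s), reverse=True)
```

Because each search compares the query only against the stored summaries, its cost grows with the number of datasets rather than with the volume of data they hold, which is the property the abstract relies on for interactive response times.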
Rights
Copyright The Authors, 2015. Published here with permission.
Locate the Document
DOI
10.1145/2723372.2735360
Persistent Identifier
http://archives.pdx.edu/ds/psu/20876
Citation Details
V.M. Megler and David Maier. 2015. Demonstrating "Data Near Here": Scientific Data Search. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 1075-1080.