Published In

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15)

Document Type

Conference Proceeding

Publication Date

2015

Subjects

Information retrieval, Scientific archives -- Research, Database management, Data mining

Abstract

Prior work proposed "Data Near Here" (DNH), a data search engine for scientific archives that is modeled on Internet search engines. DNH performs a periodic, asynchronous scan of each dataset in an archive, extracting lightweight features that are combined to form a dataset summary. During a search, DNH assesses the similarity of the search terms to the summary features and returns to the user, at interactive timescales, a ranked list of datasets for further exploration and analysis. We will demonstrate the search capabilities and ancillary metadata-browsing features for an archive of observational oceanographic data. While comparing search terms to complete datasets might seem ideal, interactive search speed would be impossible with archives of realistic size. We include an analysis showing that our summary-based approach gives a reasonable approximation of such a "complete dataset" similarity measure.

Rights

Copyright The Authors, 2015. Published here with permission.

DOI

10.1145/2723372.2735360

Persistent Identifier

http://archives.pdx.edu/ds/psu/20876
