Advisor

David Maier

Date of Award

Fall 11-7-2018

Document Type

Thesis

Degree Name

Master of Science (M.S.) in Computer Science

Department

Computer Science

Physical Description

1 online resource (xiii, 123 pages)

Subjects

Time-series analysis, SQL (Computer program language), Computer science

DOI

10.15760/etd.6592

Abstract

As we continue to produce large amounts of time-series data, the need for data analysis that yields insights from this data is growing rapidly. These insights form the foundation of data-driven decisions in many aspects of life. Data annotations are information about the data, such as comments, errors and provenance. They provide context for the underlying data and aid meaningful analysis in domains such as scientific research, genomics and ECG analysis. Storing such annotations in the database alongside the data makes them available to support its analysis. In this thesis, I propose a user-friendly technique for Annotation-Enabled Analysis. With it, a user can employ annotations to query and analyze data without prior knowledge of the database schema or of any database programming language. The proposed technique receives the analysis request as a high-level specification that hides the details of the schema, joins, etc. It parses the specification, validates the input and converts it into SQL. This SQL query can then be executed in a relational database and the result returned to the user. I evaluate this technique using real-world data from a building-data platform containing data about Portland State University buildings, such as room temperature, air volume and CO2 level. This data is annotated with information such as class schedules, power outages and control modes (for example, day or night mode). I test my technique with three increasingly sophisticated levels of use cases drawn from this building-science domain: (1) retrieve data with include or exclude annotation selection; (2) correlate data with include or exclude annotation selection; (3) align data based on include annotation selection to support aggregation over multiple periods.
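The translation step described above (high-level specification, then parsing and validation, then SQL) can be sketched for use case (1) as follows. This is a minimal illustration, not the thesis's implementation. The `reading` and `annotation` tables, the `AnalysisSpec` request format, the label names and the helper functions are all hypothetical, and annotations are assumed to cover half-open time intervals `[start_ts, end_ts)`. Each include label becomes an `EXISTS` predicate and each exclude label a `NOT EXISTS` predicate, so the user never writes the join.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical annotation vocabulary; a real system would read this from the database.
KNOWN_LABELS = {"class_in_session", "power_outage", "night_mode"}


@dataclass
class AnalysisSpec:
    """Hypothetical high-level request: one sensor plus annotation labels that
    must (include) or must not (exclude) cover each returned reading."""
    sensor: str
    include: list = field(default_factory=list)
    exclude: list = field(default_factory=list)


def validate(spec: AnalysisSpec) -> None:
    """Reject unknown labels and contradictory include/exclude selections."""
    unknown = (set(spec.include) | set(spec.exclude)) - KNOWN_LABELS
    if unknown:
        raise ValueError(f"unknown annotation labels: {sorted(unknown)}")
    if set(spec.include) & set(spec.exclude):
        raise ValueError("a label cannot be both included and excluded")


def to_sql(spec: AnalysisSpec):
    """Translate the spec into a parameterized SQL query over an assumed schema."""
    validate(spec)
    # A reading is covered if its timestamp falls in [start_ts, end_ts).
    covered = ("EXISTS (SELECT 1 FROM annotation a WHERE a.label = ? "
               "AND r.ts >= a.start_ts AND r.ts < a.end_ts)")
    clauses, params = ["r.sensor = ?"], [spec.sensor]
    for label in spec.include:
        clauses.append(covered)
        params.append(label)
    for label in spec.exclude:
        clauses.append("NOT " + covered)
        params.append(label)
    sql = ("SELECT r.ts, r.value FROM reading r WHERE "
           + " AND ".join(clauses) + " ORDER BY r.ts")
    return sql, params


def run_example():
    """Build a tiny in-memory database and run one include/exclude request."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE reading(sensor TEXT, ts INTEGER, value REAL);"
        "CREATE TABLE annotation(label TEXT, start_ts INTEGER, end_ts INTEGER);"
    )
    conn.executemany("INSERT INTO reading VALUES (?, ?, ?)",
                     [("room_101_temp", t, 20.0 + t) for t in range(10)])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)",
                     [("class_in_session", 2, 8), ("power_outage", 5, 6)])
    spec = AnalysisSpec("room_101_temp",
                        include=["class_in_session"], exclude=["power_outage"])
    sql, params = to_sql(spec)
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_example())  # [(2, 22.0), (3, 23.0), (4, 24.0), (6, 26.0), (7, 27.0)]
```

Readings at timestamps 2 to 7 fall inside the class session, and timestamp 5 is dropped because it also falls inside the outage.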
I evaluate the technique by performing two kinds of tests: (1) to validate correctness, I generate synthetic datasets for which I know the expected results of these annotation-enabled analyses and compare them with the results produced by my technique; (2) I evaluate the performance of the queries generated by this service, in terms of execution time in the database, by comparing them with alternative SQL translations that I developed.
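Both kinds of tests can be illustrated in one small sketch. It generates a synthetic dataset whose exclude-selection result is known in advance, then checks and times two alternative SQL translations of the same request. The schema, the label names, the `NOT EXISTS` versus `LEFT JOIN ... IS NULL` pair and the function names are assumptions chosen for illustration, not the translations developed in the thesis. Timings on an in-memory SQLite database also say little about a production relational database.

```python
import random
import sqlite3
import time

# Two hypothetical, logically equivalent translations of "exclude readings
# covered by an annotation with this label" (half-open intervals assumed).
NOT_EXISTS = """
SELECT r.ts FROM reading r
WHERE NOT EXISTS (SELECT 1 FROM annotation a
                  WHERE a.label = ? AND r.ts >= a.start_ts AND r.ts < a.end_ts)
ORDER BY r.ts"""

LEFT_JOIN = """
SELECT r.ts FROM reading r
LEFT JOIN annotation a
       ON a.label = ? AND r.ts >= a.start_ts AND r.ts < a.end_ts
WHERE a.label IS NULL
ORDER BY r.ts"""


def make_synthetic(n_readings=2000, n_intervals=50, seed=7):
    """Generate readings plus random 'power_outage' intervals whose
    exclude-result is computed directly in Python, independently of SQL."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        start = rng.randrange(n_readings)
        intervals.append((start, start + rng.randint(1, 40)))
    expected = [t for t in range(n_readings)
                if not any(s <= t < e for s, e in intervals)]

    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE reading(ts INTEGER, value REAL);"
        "CREATE TABLE annotation(label TEXT, start_ts INTEGER, end_ts INTEGER);"
    )
    conn.executemany("INSERT INTO reading VALUES (?, ?)",
                     [(t, rng.random()) for t in range(n_readings)])
    conn.executemany("INSERT INTO annotation VALUES ('power_outage', ?, ?)",
                     intervals)
    # Distractor annotation covering every reading: a translation that ignored
    # the label predicate would wrongly return no rows.
    conn.execute("INSERT INTO annotation VALUES ('night_mode', 0, ?)",
                 (n_readings,))
    return conn, expected


def compare(translations, label="power_outage"):
    """Check each translation against the expected result and time it."""
    conn, expected = make_synthetic()
    report = {}
    for name, sql in translations.items():
        start = time.perf_counter()
        got = [row[0] for row in conn.execute(sql, (label,))]
        elapsed = time.perf_counter() - start
        report[name] = (got == expected, elapsed)
    conn.close()
    return report


if __name__ == "__main__":
    results = compare({"not_exists": NOT_EXISTS, "left_join": LEFT_JOIN})
    for name, (correct, secs) in results.items():
        print(f"{name}: correct={correct} time={secs * 1000:.2f} ms")
```

Computing the expected result outside the database keeps the correctness check independent of any SQL translation being tested.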

Persistent Identifier

https://archives.pdx.edu/ds/psu/27687
