Systems Science Friday Noon Seminar Series

The Emerging Field of Data Engineering









Organizations of all sizes are dealing with rapidly growing volumes of data and are trying to decide how best to use that data to their advantage. With the rise of AI and machine learning in recent years, data science has received a great deal of attention, both from companies wishing to apply these techniques to their own business problems and from technical professionals attracted to interesting problems and to the significant career benefits of data science-related roles. But as many organizations have moved to embrace a data-driven approach, it has become increasingly apparent that a significant data infrastructure is required to support the front-end data science and analytics that give companies insight into their data and the ability to leverage it. Engineering this back-end infrastructure means designing and deploying high-performance storage and processing systems (data pipelines) that can handle large volumes of data, often in real time, and make it available for analysis. That work is often beyond the scope of data science, which typically focuses on designing, selecting, and applying models rather than on operationalizing and scaling them. As this data infrastructure becomes increasingly important, especially with the shift to cloud computing, demand for data engineers has risen rapidly, with over 80% of companies saying they plan to hire more data engineers. That rise in demand has not yet been matched by a corresponding increase in supply: there are many more open data engineering positions than there are people with the skills to fill them. This seminar will discuss some of the reasons for this mismatch, what the day-to-day job of a data engineer looks like, why the role might be of interest to SYSC students, and possible learning paths for acquiring the necessary skills.

Biographical Information

Guy has worked in tech for twenty years in various capacities, first as an application and database developer, then as a data analyst and scientist, and now as a data engineer and instructor. He learned Python when developing experimental control software for optics and radio telescope projects at OSU. He began his graduate studies at PSU in economics, but eventually discovered Systems Science and made the switch. His academic areas of focus were alternative modeling methods, such as agent-based and discrete systems modeling, and machine and statistical learning. He received his master’s degree in Systems Science in 2018.


Engineering -- Study and teaching, Database management, Big data, Cloud computing, Information technology


Computer Engineering | Data Storage Systems | Systems Science



© Copyright the author(s)

This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

