David Montoya was instrumental in the development of key insights and ideas contained in this work. Sonja Johanson performed testing and documented the Crux tool, and Yasodha Suriyakumar provided the DroughtHPC example. PSU students Jaspar Alt, Kobe Davis, and Kristina Frye participated in group discussions. Portions of this work were conducted at the Ultrascale Systems Research Center (USRC) supported by Los Alamos National Laboratory, United States under Contract No. DE-AC52-06NA25396 with the U.S. Department of Energy. This work was supported in part by the New Mexico Consortium.
BenchCouncil Transactions on Benchmarks, Standards and Evaluations
High performance computing, Critical path analysis, Cloud computing
Current trends in HPC, such as the push to exascale, convergence with Big Data, and the growing complexity of HPC applications, have created gaps that traditional performance tools do not cover. One example is Holistic HPC Workflows: HPC workflows comprising multiple codes, paradigms, or platforms that are not developed using a workflow management system. To diagnose the performance of these applications, we define a new metric called Workflow Critical Path (WCP), a data-oriented metric for Holistic HPC Workflows. WCP constructs graphs that span the workflow codes and platforms, using data states as vertices and data mutations as edges. Using cloud-based technologies, we implement a prototype called Crux, a distributed analysis tool for calculating and visualizing WCP. Our experiments with a workflow simulator on Amazon Web Services show that Crux is scalable and capable of correctly calculating WCP for common Holistic HPC Workflow patterns. We explore the use of WCP and discuss how Crux could be used in a production HPC environment.
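As a rough illustration of the graph model described in the abstract, the sketch below treats data states as vertices and data mutations as weighted edges. It then finds the longest-duration chain through that graph, which is the classic critical path definition. This is not the authors' WCP algorithm or the Crux implementation. It assumes the graph is acyclic and that each mutation edge carries a single duration. The function name `critical_path`, the `(src_state, dst_state, duration)` tuple format, and the state names in the usage example are all hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def critical_path(mutations):
    """Return (total_duration, path) for the longest-duration chain of data states.

    ASSUMPTION (hypothetical format): `mutations` is a list of
    (src_state, dst_state, duration) tuples, one per data mutation edge,
    forming a directed acyclic graph with non-negative durations.
    """
    preds = defaultdict(list)  # dst state -> [(src state, duration), ...]
    graph = defaultdict(set)   # state -> set of predecessor states
    for src, dst, dur in mutations:
        preds[dst].append((src, dur))
        graph[dst].add(src)
        graph.setdefault(src, set())  # make sure source-only states appear

    best = {}  # state -> longest accumulated duration to reach it
    back = {}  # state -> predecessor on that longest chain
    # static_order() yields every state after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        best[node] = 0.0
        for src, dur in preds[node]:
            if best[src] + dur > best[node]:
                best[node] = best[src] + dur
                back[node] = src

    # The critical path ends at the state reached latest; walk back from it.
    end = max(best, key=best.get)
    path = [end]
    while path[-1] in back:
        path.append(back[path[-1]])
    return best[end], path[::-1]


if __name__ == "__main__":
    # Hypothetical two-branch workflow: a long simulation branch and a short
    # preprocessing branch that both feed a final result.
    edges = [
        ("raw", "sim_out", 5.0),
        ("raw", "clean", 2.0),
        ("clean", "result", 1.0),
        ("sim_out", "result", 3.0),
    ]
    print(critical_path(edges))  # (8.0, ['raw', 'sim_out', 'result'])
```

In this toy example, the simulation branch (5.0 + 3.0) dominates the preprocessing branch (2.0 + 1.0), so the chain through `sim_out` is the critical path. A topological-order pass is used because it finds the longest path in a DAG in linear time. A real cross-platform workflow would also have to assemble these edges from traces collected on several systems, which this sketch does not attempt.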
© 2022 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd.
Nguyen, D. D., & Karavanic, K. L. (2021). Workflow critical path: a data-oriented critical path metric for holistic HPC workflows. BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 1(1), 100001.