An Integrated Framework for Memory-Centric Analysis: from Trace Collection to Co-Design

Sponsor

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the U.S. DOE Office of Science, Office of Advanced Scientific Computing Research, 76125: “AMAIS - Advanced Memory to support Artificial Intelligence for Science.” The Pacific Northwest National Laboratory is operated by Battelle for the U.S. Department of Energy under contract DE-AC05-76RL01830. This work was also supported by the PNNL Data-Model Convergence (DMC) LDRD at Pacific Northwest National Laboratory, “Fixing Amdahl's Law within the Limits of Accelerated Systems” (Fallacy), 2019–22.

Published In

Frontiers in High Performance Computing

Document Type

Article

Publication Date

5-19-2026

Subjects

Cache modeling, hardware-software co-design, memory tracing, memory-centric analysis, performance optimization, processor tracing, trace analysis

Abstract

Introduction: The memory wall phenomenon, where advances in processor performance significantly outpace those in memory subsystems, poses a fundamental challenge for contemporary computing systems. In memory-bound applications, memory subsystem behavior dominates performance, yet existing analysis approaches present significant limitations: detailed microarchitectural simulators require days to weeks to simulate modest workloads; hardware performance counters provide only aggregate statistics that obscure temporal and spatial access patterns; and scaled simulation approaches face challenges in capturing contention effects, bandwidth saturation, and interference patterns that emerge at larger scales. These limitations reflect a processor-centric design philosophy, in both performance analysis tools and system co-design methodologies, that is increasingly misaligned with memory-bound workloads, where a detailed understanding of memory access patterns, cache hierarchy interactions, and contention is critical for effective optimization.

Methods: This paper presents an integrated framework for memory-centric analysis that enables effective hardware-software co-design. We describe practical trace collection techniques, including hardware-assisted processor tracing with minimal overhead and portable software-based instrumentation with statistical sampling. We present multi-perspective analysis methods that examine memory behavior from temporal, sequential, spatial, and relational viewpoints, revealing distinct optimization opportunities invisible in aggregate metrics. For example, a data structure switch from an open to a closed hash table in the miniVite graph application, guided by spatial anticipation metrics, improved hardware prefetcher utilization and delivered a 1.8× runtime improvement, a benefit invisible to aggregate performance counters. We detail an architectural modeling framework that uses sampled traces with temporal interpolation and confidence-based filtering to evaluate cache and memory configurations.

Results: Evaluation on representative benchmarks demonstrates that this framework achieves practical accuracy (L2 cache errors of 2.64%, confidence-filtered L3 errors of 9.92%, bandwidth errors of 7.33%) while providing substantial speedup (26.8×) over cycle-accurate simulation, enabling rapid design space exploration.

Discussion: We demonstrate how this integrated framework enables systematic identification of both hardware optimizations (memory controller tuning, bank partitioning, NUMA configuration) and software optimizations (data layout restructuring, prefetching strategies, and memory-aware scheduling). Through this comprehensive treatment of the memory-centric analysis pipeline, from trace collection through architectural modeling to co-design application, we provide researchers and practitioners with practical techniques for addressing memory bottlenecks in contemporary computing systems.
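As a rough illustration of what trace-driven cache modeling means in general, the sketch below replays a list of byte addresses through a small set-associative LRU cache and reports the miss rate. It is not the authors' framework: the function name simulate_cache, the cache parameters, and the synthetic traces are all assumptions chosen for the example, and it omits the sampling, temporal interpolation, and confidence-based filtering that the abstract describes.

```python
from collections import OrderedDict


def simulate_cache(trace, num_sets=64, ways=4, line_bytes=64):
    """Replay byte addresses through a set-associative LRU cache.

    Hypothetical helper for illustration only; returns the miss rate.
    """
    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_bytes          # cache-line index of this access
        cache_set = sets[line % num_sets]  # set the line maps to
        if line in cache_set:
            cache_set.move_to_end(line)    # hit: mark as most recently used
        else:
            misses += 1
            cache_set[line] = True
            if len(cache_set) > ways:
                cache_set.popitem(last=False)  # evict least recently used
    return misses / len(trace) if trace else 0.0


# Synthetic traces (assumed data, not taken from the paper).
sequential = [i * 8 for i in range(8192)]   # 8-byte stride: 8 accesses per line
strided = [i * 64 for i in range(8192)]     # 64-byte stride: new line each time

print("sequential miss rate:", simulate_cache(sequential))
print("strided miss rate:   ", simulate_cache(strided))
```

The two traces touch the same number of addresses but differ sharply in miss rate, which is the kind of spatial difference that an aggregate access count alone would not show.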

Rights

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Locate the Document

https://doi.org/10.3389/fhpcp.2026.1801169

DOI

10.3389/fhpcp.2026.1801169

Persistent Identifier

https://archives.pdx.edu/ds/psu/44728

Publisher

Frontiers Media SA

Citation Details

Gajaria, D., Challa, P., Suriyakumar, Y., Manzano, J., Tallent, N., & Márquez, A. (2026). An integrated framework for memory-centric analysis: from trace collection to co-design. Frontiers in High Performance Computing, 4.

Computer Science Faculty Publications and Presentations
