![group photo](/sites/g/files/bxs2656/files/inline-images/00_photo_1080.jpg)
By Clarissa Piatek
Special to Rice News
Rice University’s John Mellor-Crummey was honored in January with a Secretary of Energy Achievement Award as a member of the leadership team of the Department of Energy’s (DOE) Exascale Computing Project (ECP).
A collaboration among six DOE national laboratories, universities and industry partners, the $1.8 billion project created the world’s first sustainable exascale computing system.
“The 2024 Secretary’s Honor Award for the Exascale Computing Project recognizes the success of a nationwide partnership for envisioning, designing and building a complete software stack for the world’s largest supercomputers,” said Mellor-Crummey, professor of computer science and electrical and computer engineering.
“Designing and building highly parallel software to measure and analyze applications executing so many operations per second was one of the biggest challenges of my career in high-performance computing. Personally, I am proud that my research group at Rice was selected as one of the few university teams involved in this project.”
Mellor-Crummey’s contribution to ECP was HPCToolkit, a vendor-agnostic set of software tools capable of performing application analysis on exascale platforms and then providing actionable feedback to improve performance and efficiency. The HPCToolkit project was part of ECP’s Development Tools research group, which “enhanc[ed] existing widely used performance tools and develop[ed] new tools for next-generation platforms.”
What is exascale computing?
Before 2022, the fastest supercomputers in the world operated at the petascale — capable of 1 quadrillion operations per second. Then in May 2022, Oak Ridge National Laboratory’s (ORNL) Frontier supercomputer broke the exascale barrier, meaning it surpassed 1 quintillion (that’s 1 followed by 18 zeros) operations per second. In essence, exascale is 1,000 times faster than petascale. In November 2024, Lawrence Livermore National Laboratory’s El Capitan supercomputer became the fastest, achieving 1.742 “exaflops” compared to Frontier’s 1.1.
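To put those figures side by side, here is a quick back-of-the-envelope calculation in Python. It only restates the numbers cited above; the snippet is illustrative, not part of any project code.

```python
# Back-of-the-envelope comparison of the performance figures cited above.
PETAFLOP = 10**15   # 1 quadrillion floating-point operations per second
EXAFLOP = 10**18    # 1 quintillion floating-point operations per second

print(EXAFLOP // PETAFLOP)  # 1000 -> exascale is 1,000x petascale

frontier_exaflops = 1.1       # Frontier, first past the exascale barrier (May 2022)
el_capitan_exaflops = 1.742   # El Capitan, fastest as of November 2024

print(round(el_capitan_exaflops / frontier_exaflops, 2))  # ~1.58x Frontier's benchmark result
```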
And it isn’t speed for speed’s sake. Exascale computers are equipped to tackle modeling and simulation problems too complex for petascale machines, which will benefit fields such as health care, national security, economics and energy security.
“Exascale supercomputers enable extremely detailed simulations and data analysis,” said Mellor-Crummey, “enabling projects ranging from quantum chemistry simulation of molecules to simulations to aid the design and analysis of wind farms as well as the origins of matter in stellar explosions such as supernovae, to name just a few.”
ECP delivered an exascale ecosystem
Merely having exascale supercomputers isn’t enough, however, without a computing environment that can successfully support applications running at large scale. The ECP was launched in 2016 to “uplift the high-performance computing community toward capable exascale platforms, software and application codes.” In short, ECP promised “the delivery of an exascale computing ecosystem.”
Managed by the DOE Office of Science and the National Nuclear Security Administration (NNSA), ECP funded almost 3,000 researchers and staff over its seven-year duration. The project resulted in advances in modeling and simulation, software tools, analytics, machine learning and artificial intelligence, including two different GPU architectures proven to work in exascale environments and the Extreme-scale Scientific Software Stack — the first and only open-source scientific software stack that provides portable high-performance tools and libraries across all GPU and CPU architectures.
The evolution of HPCToolkit
“HPCToolkit had long been developed to address needs on the Department of Energy’s supercomputers,” Mellor-Crummey said. “It employs asynchronous sampling — measuring a representative sample of an application’s activity to provide high-confidence estimates of where an application is spending its time and why.” Mellor-Crummey said that is the only possible way “to measure such a massive amount of computation without being overwhelmed by the volume of measurement data.”
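HPCToolkit itself is native-code tooling that measures applications at the full scale of these machines, but the core idea of asynchronous sampling can be sketched in a few lines of Python. The sketch below assumes a Unix-like system where a profiling timer can periodically interrupt the program; the function names are invented for illustration and none of this is HPCToolkit’s actual code.

```python
# Toy illustration of asynchronous (timer-driven) sampling.
# A periodic timer interrupts the program; the handler records where
# execution was at that instant. Counting samples per function gives a
# statistical picture of where time is spent without tracing every call.
import collections
import signal

samples = collections.Counter()
SAMPLE_INTERVAL = 0.001  # seconds of CPU time between samples (~1 kHz)

def record_sample(signum, frame):
    """Signal handler: note which function the interrupted code was in."""
    samples[frame.f_code.co_name] += 1

def busy_work(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def lighter_work(n):
    s = 0.0
    for i in range(1, n):
        s += 1.0 / i
    return s

# Deliver SIGPROF periodically based on CPU time consumed by the process
# (Unix-only; Windows lacks setitimer/SIGPROF).
signal.signal(signal.SIGPROF, record_sample)
signal.setitimer(signal.ITIMER_PROF, SAMPLE_INTERVAL, SAMPLE_INTERVAL)

busy_work(5_000_000)
lighter_work(500_000)

signal.setitimer(signal.ITIMER_PROF, 0)  # stop sampling
total = sum(samples.values()) or 1
for name, count in samples.most_common():
    print(f"{name}: ~{100 * count / total:.0f}% of CPU samples")
```

Because only a bounded number of samples is recorded per second regardless of how much work the program does, the measurement overhead and data volume stay manageable even at enormous scale, which is the property Mellor-Crummey describes above.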
Mellor-Crummey’s HPCToolkit had already been evolving for almost two decades when it became part of ECP and was further developed to meet the needs of exascale compute environments.
Mellor-Crummey explained how HPCToolkit needed to be adapted from CPU to GPU systems: “When the HPCToolkit project began at the turn of the century, nodes in clusters and supercomputers were much different, employing a single compute core per chip. We couldn’t envision building software for GPU-accelerated systems. GPUs weren’t used for high-performance computing until 2006, and even as the Exascale Computing Project began, the conventional wisdom was that the exascale systems would be based on many-core CPUs rather than the GPU accelerators that were ultimately responsible for almost all of the computational performance of these systems.”
He added that “what distinguishes HPCToolkit is its ability to measure, analyze and attribute instruction-level performance information within and across CPUs and GPUs at the full scale of these systems.”
ExaWind as a test case
HPCToolkit found its proving ground in ExaWind, an ECP application tasked with simulating “the complex physics of an entire wind farm under various geographic and weather conditions.” High-performance computers had been used for years in wind turbine simulation, but limitations in processing speed meant those simulations could only capture general trends. An ideal simulation of an entire wind farm would capture airflow processes on a micro and macro scale simultaneously, an endeavor that would span “roughly eight orders of magnitude,” according to the project’s homepage. The ExaWind application successfully integrated “a whole wind farm simulation at scales ranging from microns to kilometers” and reduced simulation time from days to hours.
To assess whether the application was running as efficiently as possible, the ExaWind team used HPCToolkit to analyze ExaWind’s executions on thousands of nodes of the Frontier supercomputer.
“Using HPCToolkit to measure and analyze the performance of ExaWind enabled the project team to identify that inefficient communication within the application significantly degraded the performance of the application at larger scales,” said Mellor-Crummey.
“Using performance insights from HPCToolkit as a guide, the application team worked with ORNL and vendor staff to identify a better communication strategy and implement it in the code. The result of these efforts was a 28-times improvement in performance at the scale of 1,000 nodes. That enables the project team to deliver more insight into wind farm design given the fixed amount of compute hours available to them.”
Jon Rood, a computational scientist at the National Renewable Energy Laboratory, said, “We can’t overstate how complicated the ExaWind software is in general and how complicated it is to build, so learning that HPCToolkit could be easily injected into our entire application without special instrumentation steps during the build process, then be able to profile large simulations on Frontier, was really amazing to us.”
Legacy of ECP
ECP is the DOE’s largest software research, development and deployment project to date, and its developments will touch nearly every scientific field. Researchers now have a reliable exascale computing ecosystem to tackle challenges that were previously out of reach.
Mellor-Crummey acknowledged the years of work that went into his team’s contributions to ECP. “Development of highly sophisticated parallel software for tackling problems at exascale doesn’t happen overnight,” he said. “The HPCToolkit performance tools for exascale computer systems are the result of an investment of time and effort spanning more than a decade and several generations of graduate students. The software builds on ideas described in M.S. and Ph.D. theses dating back to 2008.”
He pointed to the many people who can share in the success of HPCToolkit and its inclusion in ECP. “Adapting HPCToolkit for analysis of GPU-accelerated applications on supercomputers was not an individual effort,” he said. “This involved a large team over the years, including my current staff Mark Krentel and Laksono Adhianto; past staff Matt Barnett, Scott Warren, Wil Phan and Xiaozhu Meng; current students Jonathon Anderson, Yumeng Liu, Yuning Xia and Dragana Grbic; past students Keren Zhou, Aaron Cherian, Dejan Grubisic, Ryuichi Sai and Heather McIntyre; and independent consultant Marty Itzkowitz.”