Photos by Evan Krape and courtesy of the US Department of Energy
October 18, 2022
UD Professor Sunita Chandrasekaran and students play key roles in exascale computing
From fast food to rapid COVID tests, the world has an “unrelenting need for speed.”
The fastest fast-food chain in the US this year, measured by the shortest average service time from placing your order to getting your food, was Taco Bell at 221.99 seconds.
The fastest car, the Bugatti Chiron Super Sport 300+, broke records at 304.7 mph in 2019 and, as of this writing, still holds the title.
Then there’s Frontier, the supercomputer at the US Department of Energy’s Oak Ridge National Laboratory in Oak Ridge, Tennessee. In May 2022, it was named the world’s fastest computer, clocking in at 1.1 exaflops, which is more than a quintillion calculations per second. That is a lot of math problems to solve, more than 1,000,000,000,000,000,000 of them, in the blink of an eye, a feat that earned Frontier the coveted status of first computer to achieve exascale computing power.
Scientists are eager to harness Frontier for a wide range of studies, from mapping the brain to creating more realistic climate models, exploring fusion energy, improving our understanding of new materials at the nanoscience level, enhancing national security, and achieving a clearer and deeper understanding of the universe, from particle physics to star formation. And that hardly scratches the surface.
At the University of Delaware, Sunita Chandrasekaran, associate professor and David L. and Beverly J.C. Mills Career Development Chair in the Department of Computer and Information Sciences, and her students have been working to ensure that key software is ready to run on Frontier when the exascale computer is “open for business” to the scientific community in 2023.
Since existing computer codes do not automatically transfer to exascale, she has worked with a team of researchers in the US and at HZDR in Germany to stress-test a core computer application called Particle-in-Cell (PIConGPU).
The particle-in-cell algorithm, a key tool in plasma physics, describes the dynamics of plasma, a state of matter rich in charged particles (ions and electrons), by calculating the motion of these charged particles based on Maxwell’s equations. (James Clerk Maxwell was a nineteenth-century physicist best known for using four equations to describe electromagnetic theory. Albert Einstein said Maxwell’s influence on physics was the most profound since Sir Isaac Newton.) Such tools are essential to the development of radiotherapy for cancer, as well as to expanding the use of X-rays to examine the structure of materials.
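To make the idea concrete, here is a minimal sketch of one particle-in-cell time step in Python. It is a heavily simplified, one-dimensional illustration and not PIConGPU's implementation: it assumes normalized units, a periodic grid, a uniform neutralizing background, and only Gauss's law (one of Maxwell's four equations) rather than the full electromagnetic system. All function and parameter names are hypothetical.

```python
# Illustrative 1D electrostatic particle-in-cell (PIC) step.
# Assumptions: normalized units (epsilon_0 = 1), periodic domain,
# uniform neutralizing background. Names are hypothetical.

def deposit_charge(positions, charge, n_cells, dx):
    """Add each particle's charge to the cell it sits in, then subtract
    the mean so the domain is charge-neutral overall."""
    rho = [0.0] * n_cells
    for x in positions:
        rho[int(x / dx) % n_cells] += charge / dx
    mean = sum(rho) / n_cells
    return [r - mean for r in rho]

def solve_field(rho, dx):
    """Integrate Gauss's law dE/dx = rho to get the field at cell edges,
    average neighboring edges to get cell-centered values, and remove
    the mean field."""
    n = len(rho)
    edges, running = [], 0.0
    for r in rho:
        running += r * dx
        edges.append(running)
    # edges[-1] wraps around to the last edge, matching the periodic grid
    centers = [0.5 * (edges[i - 1] + edges[i]) for i in range(n)]
    mean = sum(centers) / n
    return [e - mean for e in centers]

def push(positions, velocities, e_field, charge, mass, dx, dt, length):
    """Look up the field in each particle's cell, update its velocity,
    then move it (velocity first, position second)."""
    n_cells = len(e_field)
    new_x, new_v = [], []
    for x, v in zip(positions, velocities):
        cell = int(x / dx) % n_cells
        v = v + (charge / mass) * e_field[cell] * dt
        new_v.append(v)
        new_x.append((x + v * dt) % length)
    return new_x, new_v

def pic_step(positions, velocities, charge, mass, n_cells, length, dt):
    """One full cycle: deposit charge, solve for the field, push particles."""
    dx = length / n_cells
    rho = deposit_charge(positions, charge, n_cells, dx)
    e_field = solve_field(rho, dx)
    return push(positions, velocities, e_field, charge, mass, dx, dt, length)
```

For example, `pic_step([0.1, 0.15, 0.375], [0.0, 0.0, 0.0], -1.0, 1.0, 4, 1.0, 0.01)` starts three electrons at rest, and after one step the two crowded together drift one way while the third drifts the other, as like charges should. Production codes repeat this cycle for billions of particles in three dimensions, which is why they need machines like Frontier.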
“I tell my students, imagine your laptop connected to millions of other laptops and being able to harness all that power,” Chandrasekaran said. “But then comes exascale: a 1 followed by 18 zeros. Think how big and powerful such a huge system would be. Such a system could light up an entire city.”
Chandrasekaran explained that executing instructions on an exascale system requires a “different programming framework” than other systems because of its distinctive architecture, which combines many parallel processing units with high-performance GPUs.
Overall, Frontier has 9,408 CPUs, 37,632 GPUs and 8,730,112 cores, all connected by more than 90 miles of networking cable. All that computing power helped Frontier break through the exascale barrier, and Chandrasekaran is making sure the software makes the leap, too.
To take advantage of the system’s specialized architecture, she and her fellow researchers are working to ensure that the computer code in high-priority programs is up to speed on Frontier and free of bugs. These are key goals of the Exascale Computing Project’s SOLLVE effort, now led by Chandrasekaran. It is a collaboration among Brookhaven National Laboratory, Argonne National Laboratory, Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, Georgia Tech and UD.
“Our team has been working together since 2017 to stress-test and improve the system,” Chandrasekaran said, noting that the work involves collaboration with several compiler developers that provide implementations for Frontier.
“The machine is so new that the tools we need to operate it are also immature,” Chandrasekaran said. “Our goal is to have software ready for scientists to use. We help by reporting bugs, providing fixes, testing beta versions, and helping vendors prepare robust tools for scientists to use.”
UD Students Eliminate Errors in Vital Programming Tools
Thomas Huber, who holds a bachelor’s degree from UD, worked on the project with Chandrasekaran for more than two years before graduating with a master’s degree in computer and information sciences from the university last May. A native of Lynnwood, New Jersey, he now works as a software engineer at Cornelis Networks, a computer hardware company.
“When we started working on this a few years ago, we knew we had Frontier coming up as an exascale machine, and that required getting a lot of people together to work on 20 or so core apps that were considered very important,” Huber said. “All of these programs should work flawlessly.”
Through this unique opportunity offered by Chandrasekaran, Huber gained valuable research and real-world experience. He also trained four university students on the project, who worked together to verify that OpenMP, a popular programming tool, could run on Frontier.
As the group’s work on evaluating compilers that implement new programming features progressed, they found some bugs, and then a few more. That’s when they decided to start a GitHub repository, hosted on the developer platform of the same name, to share their findings and open-source code as part of ECP SOLLVE.
“We started the GitHub repository to validate versions of the OpenMP specification. A new version is published every few years, full of new features: 600 pages of what you can and can’t do,” Huber said. “Importantly, the section at the end explains all the differences between versions. We take a list of all the new features and go through them and create test cases for each of them. We write code that no one else has written before, and we make all our code public.”
Huber estimates that the UD team, in collaboration with Oak Ridge National Laboratory, has written about 500 tests and 50,000 lines of code so far.
“It’s all about high-performance computing,” Huber said. “Imagine you’re in a lot of traffic heading to a toll booth with only one E-ZPass lane. Parallel programming allows you to split into many E-ZPass lanes. OpenMP allows you to do this parallel work and run very quickly. What we’ve done with OpenMP ensures that scientists and others will be able to use the software on Frontier. We’re the guinea pigs for that.”
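Huber's toll-booth picture can be sketched in a few lines of Python. This is only an illustration of the work-splitting pattern, not OpenMP itself, which targets C, C++ and Fortran, where a directive such as `#pragma omp parallel for reduction(+:total)` plays a similar role. The function names are hypothetical, and because of Python's global interpreter lock, threads here show the structure of the approach rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def process_car(car_id):
    """Hypothetical per-car work; stands in for one loop iteration."""
    return car_id * car_id

def one_lane(cars):
    """Serial version: every car goes through the same lane in turn."""
    return sum(process_car(c) for c in cars)

def many_lanes(cars, n_lanes=4):
    """Parallel version: deal the cars out to n_lanes lanes, let each lane
    work through its own queue, then combine the per-lane totals."""
    lanes = [cars[i::n_lanes] for i in range(n_lanes)]
    with ThreadPoolExecutor(max_workers=n_lanes) as pool:
        partial_totals = list(pool.map(one_lane, lanes))
    return sum(partial_totals)
```

Calling `many_lanes(list(range(1000)))` returns the same total as `one_lane(list(range(1000)))`. Checking that a parallel version agrees with a serial reference is, in spirit, the kind of test case the team describes writing for each new OpenMP feature.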
Huber was drawn to research through the Vertically Integrated Projects (VIP) program in the College of Engineering, where Chandrasekaran led his project team. He stayed for a semester, worked on a research paper (“that was cool,” he said) and met colleagues who became best friends. They even won a poster contest.
Huber credits Chandrasekaran for his involvement in this field.
“She is so passionate and stresses the importance of these things in helping researchers and the real world. That made all the difference,” Huber said. “She’s a high-ranking professor of high-performance computing.”