At this point in supercomputing, it is becoming an anomaly to see an upcoming double-digit petaflops system that does not use AMD for both CPU and GPU, but the National Renewable Energy Laboratory (NREL) is taking a more traditional route with its “Kestrel” machine. Even though we might have expected the HPE Cray EX supercomputer to sport a future Intel Xeon processor such as “Granite Rapids”, NREL and HPE confirmed today that Intel “Sapphire Rapids” will be the host CPU, with acceleration coming from a forthcoming Nvidia GPU called the A100NEXT Tensor Core GPU. Settling on Sapphire Rapids is likely a matter of guaranteeing delivery of the system, which does real mission-critical work and can’t be subject to Intel-driven delays like those that hit the “Aurora” supercomputer at Argonne National Laboratory. The newly announced system will be capable of 44 petaflops peak performance when it goes live in 2023.
All of this will be delivered via an HPE Cray EX system with the Slingshot interconnect fabric, which should help NREL handle both data-intensive and AI workloads. While this is not NREL’s first dance with HPE, it is the lab’s first foray into the Cray system architecture, which HPE gained when it bought Cray. NREL’s commitment to HPE is also strong, with its previous systems based on HPE designs chosen with particular regard to power and cooling efficiency: “Peregrine” was the first supercomputer deployed with HPE’s “Apollo” warm-water cooling technology, which captured 97 percent of the datacenter’s heat and used it to keep nearby buildings warm through cold Colorado winters.
As for NREL, the Intel tradition is strong. The current 8-petaflops, Intel Skylake-based “Eagle” supercomputer, which replaced Peregrine, is an all-CPU machine. This has meant a firm commitment to vendors including Intel and HPE, although the introduction of Nvidia GPUs might build a new loyalty, depending on how many of NREL’s codes can benefit from GPU acceleration. NREL chooses to spend its innovation points, so to speak, less on novel compute and more on energy efficiency, which is no surprise given its mission within the US Department of Energy.
Aside from NREL using what will be an “older” CPU via the Sapphire Rapids choice, we also have to wonder why NREL did not go with Nvidia’s Arm-based “Grace” CPU, which will be available in the 2023 timeframe. While an Arm-based supercomputer of this magnitude would be quite the story, it could be that no one wants to go first, or at least no one with as clear a practical science mission as NREL.
This is the first we have heard of the term A100NEXT, so we assume it will be one of the GPUs in Nvidia’s next round of improvements, which will likely be announced formally next year. Nvidia CEO Jensen Huang says the company will unveil new GPUs, CPUs, and networking every two years, and it could very well be that A100NEXT is an early preview of that news, albeit without much detail.
Even though Huang says large-scale supercomputers drive only around one percent of the company’s datacenter business, these systems are still symbolically important for the company that pioneered GPU supercomputing more than a decade ago. AMD is a credible threat to Nvidia’s GPU dominance at the upper end of HPC, and Intel has a strong enough story to keep that two-year cadence a driving force.
If we were to speculate on the A100NEXT Tensor Core GPU, it will probably get a die shrink to 5 nanometers and more compute units, perhaps doubling them via a chiplet design. After all, Nvidia is already at 54 billion transistors with the A100, which starts to get tricky yield-wise. By breaking a design into four 25-billion-transistor chips, the odds of success go up, just in time for real-world exascale and ever-growing AI training.
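Why smaller chips improve the odds of success can be illustrated with the textbook Poisson die-yield model, Y = exp(-D * A), where D is the defect density and A is the die area. This is a minimal sketch under stated assumptions, not Nvidia’s or any foundry’s actual yield math: the 0.1 defects/cm² density is a hypothetical value, the ~826 mm² total is simply a roughly A100-sized design, and the helpers `poisson_yield` and `usable_silicon_fraction` are hypothetical names of our own. It also assumes each chiplet is tested and discarded individually before packaging, and it ignores packaging losses.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Probability that a die of the given area has zero defects: Y = exp(-D * A)."""
    return math.exp(-defects_per_mm2 * area_mm2)


def usable_silicon_fraction(total_area_mm2: float, n_chiplets: int,
                            defects_per_mm2: float) -> float:
    """Expected fraction of fabricated silicon that ends up in good dies.

    Assumes (hypothetically) known-good-die testing: each chiplet is tested and
    discarded on its own, so a defect wastes only the chiplet it lands on.
    """
    chiplet_area = total_area_mm2 / n_chiplets
    return poisson_yield(chiplet_area, defects_per_mm2)


# Hypothetical inputs: a roughly A100-sized design and 0.1 defects per cm^2.
D = 0.001  # defects per mm^2 (0.1 per cm^2), an assumed value
for n in (1, 2, 4):
    frac = usable_silicon_fraction(826.0, n, D)
    print(f"{n} die(s): {frac:.1%} of silicon usable")
```

Under these assumptions it prints about 43.8%, 66.2%, and 81.3% for one, two, and four dies. If all four chiplets had to be good at the same time, the yield would drop back to exactly the monolithic figure, since exp(-D * A / 4) raised to the fourth power equals exp(-D * A); the gain comes entirely from discarding only the defective pieces.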