Mark-x Posted March 7, 2020

As NVIDIA's GTC 2020 closes in, new specifications of the Ampere GA100 GPU have leaked, once again suggesting that the next-gen GPU architecture from the green team is going to be an absolute beast of a compute powerhouse.

The latest specifications come from the Stage1 Chinese forums, where a user who is known to have posted leaks before has listed key details for the flagship Ampere GPU, the GA100. NVIDIA's Ampere GPU family has been rumored for a while now, but NVIDIA has yet to reveal it to the public. Several GPUs of the Ampere family, including the GA100 itself, have appeared in various leaks, but there hasn't been any conclusive evidence that Ampere is the name of the family of GPUs which NVIDIA is going to introduce next for the HPC / Data Center segment.

According to the forum member, the flagship Ampere GPU would be the GA100 and, as expected, the full configuration would feature 128 streaming multiprocessor (SM) units or 8192 CUDA cores. It is not known which process node NVIDIA is using, but 7nm has been highlighted in previous reports. Utilizing the new process and GPU architecture, the chip is rumored to feature a maximum boost clock of up to 2.2 GHz on the GPU core. This is a huge bump in clock speed which, if true, is at least 35% faster than the GV100 GPU featured on the Quadro GV100 graphics card. The Quadro GV100 features the fastest clock for the GV100 GPU at 1627 MHz and delivers 16.6 TFLOPs of FP32 Compute performance.

At 2.2 GHz, 8192 CUDA cores would work out to roughly 36 TFLOPs of FP32 Compute, counting two FP32 operations per core per clock as the GV100 figure does. That is a 2x increase in FP32 Compute and, if these numbers are legit, we would be looking at an insane 18 TFLOPs of FP64 compute horsepower, which is far ahead of any FP64 numbers that modern GPUs can crunch out. The GPU is stated to feature a 300W TDP and HBM2e memory, and to come in two flavors, a 24 GB and a 48 GB model. These memory configurations could be for the top variant only, as we have also seen other variants with 32 GB HBM2e memory.
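The throughput figures quoted above can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the usual peak-rate convention of two FP32 operations (one fused multiply-add) per CUDA core per clock, and a 1:2 FP64-to-FP32 ratio like the one on GV100. The function name `peak_tflops` is made up for illustration, and the rumored clocks may not be final.

```python
def peak_tflops(cuda_cores: int, clock_ghz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPs (assumes one FMA = 2 FLOPs per core per clock)."""
    return cuda_cores * flops_per_core_per_clock * clock_ghz / 1000.0

# Rumored full GA100: 8192 CUDA cores at up to 2.2 GHz (figures from the leak).
ga100_fp32 = peak_tflops(8192, 2.2)
# Quadro GV100: 5120 CUDA cores at 1627 MHz.
gv100_fp32 = peak_tflops(5120, 1.627)
# Assumption: FP64 runs at half the FP32 rate, as on GV100.
ga100_fp64 = ga100_fp32 / 2
# Relative clock increase of the rumored 2.2 GHz over 1627 MHz.
clock_gain_pct = (2.2 / 1.627 - 1) * 100

print(f"GA100 FP32: {ga100_fp32:.1f} TFLOPs")   # about 36.0
print(f"GV100 FP32: {gv100_fp32:.1f} TFLOPs")   # about 16.7
print(f"GA100 FP64: {ga100_fp64:.1f} TFLOPs")   # about 18.0
print(f"Clock gain: {clock_gain_pct:.0f}%")     # about 35
```

Under these assumptions the rumored chip lands at a little over twice the Quadro GV100's FP32 rate, which matches the 2x and 18 TFLOPs FP64 claims.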
NVIDIA is also rumored to double its tensor cores on the new Ampere GPUs. The current 5120 CUDA core Volta GV100 GPU features 640 tensor cores, so at the same ratio an Ampere GPU with 8192 CUDA cores would feature 1024 cores for tensor operations. But since the rumor states that NVIDIA is likely to increase the tensor core count by 2x, we would be looking at 2048 tensor cores for an 8192 CUDA core chip.

In addition to the core and memory specifications, the GPU packs a 32 MB L2 cache, which is a 5.33x increase over the Volta GV100 GPU and its L2 cache of just 6 MB. Given the massive amount of cache, we can expect some huge performance uplifts and a huge architectural change on NVIDIA's next-generation GPU, which has been years in development. As far as performance is concerned, the GPU scores 222377 points in the OpenCL benchmark (CUDA) on Geekbench 5. The platform is running CUDA 8.0 and it is highly likely that the GPU was not fully optimized for it at the time of testing. With that said, the specifications of this card look insane, so let's get on with the other two variants.

The second GPU features a total of 118 SMs or 7552 CUDA cores. This is a 47.5% increase in CUDA cores over the Tesla V100, which packs 5120 CUDA cores in 80 SMs. The chip is listed with a total of 24 MB of L2 cache. This GPU is clocked at a maximum speed of 1.10 GHz and features 24 GB of HBM2e memory running along a 3072-bit bus at a 1200 MHz clock speed. At these speeds, this chip should deliver a total theoretical compute horsepower of around 16.7 TFLOPs, but once again, the clock speeds definitely don't look final and could end up higher. This particular GPU was tested in both the OpenCL and CUDA Compute benchmarks. In the OpenCL benchmark, the chip scored 184096 points, while in the CUDA benchmark it scored 169368 points. Both the 124 and 118 SM parts were running on CUDA 8.0, which once again shows that these GPUs aren't yet fully optimized for the Geekbench 5 benchmark.
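The tensor core and L2 cache scaling described above is simple ratio arithmetic, sketched here in Python. The constants come from the figures quoted in the text; the helper name `scaled_tensor_cores` and its `multiplier` parameter are hypothetical, and the 2x multiplier is only the rumor, not a confirmed specification.

```python
# Volta GV100 reference figures quoted in the text.
VOLTA_CUDA_CORES = 5120
VOLTA_TENSOR_CORES = 640
VOLTA_L2_MB = 6

def scaled_tensor_cores(cuda_cores: int, multiplier: int = 1) -> int:
    """Scale the tensor core count at Volta's CUDA-to-tensor ratio, times an optional multiplier."""
    cuda_per_tensor = VOLTA_CUDA_CORES // VOLTA_TENSOR_CORES  # 8 CUDA cores per tensor core
    return cuda_cores // cuda_per_tensor * multiplier

same_ratio = scaled_tensor_cores(8192)             # Volta's ratio carried over unchanged
doubled = scaled_tensor_cores(8192, multiplier=2)  # rumored 2x tensor core density
l2_ratio = 32 / VOLTA_L2_MB                        # reported 32 MB vs Volta's 6 MB

print(same_ratio)          # 1024
print(doubled)             # 2048
print(f"{l2_ratio:.2f}x")  # 5.33x
```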
There's a huge difference in score between the 124 and 118 SM parts despite just a 5% difference in core count.

Lastly, we have the 108 SM or 6912 CUDA core variant, which has a reported clock speed of 1.01 GHz, the slowest of all three GPUs. The GPU offers a 35% increase in CUDA core count over the Tesla V100 and apparently packs 46.8 GB of HBM2e memory. This could be an error in how the Geekbench benchmark reads the total memory, and it could actually be 48 GB, which makes more sense. This GPU scores 141654 points in the Geekbench 5 (CUDA) benchmark which, once again, is unlikely to be the final score given the low clock speeds.

Yesterday, AMD announced that it will be splitting its GPUs into separate Gaming and Compute segments, similar to what NVIDIA has been doing since its Pascal architecture. The new CDNA GPU family is expected to launch this year and will be based on the 7nm process node, going up against NVIDIA's HPC lineup.

The Vice President of Information Technology and Chief Information Officer at Indiana University, which will be deploying its Big Red supercomputer this summer, has revealed that NVIDIA's next-generation GPUs offer a massive 75% performance uplift over existing Volta-based GPUs. We have also heard similar reports in the past of the GPUs offering up to a 50% performance increase with twice the efficiency, which would be an incredible feat to pull off. Given that NVIDIA would be on process parity with AMD with its next-generation GPU, and with a brand new architecture too, we could see some really disruptive performance.

These are definitely some big specifications and numbers reported in the rumor for NVIDIA's next-generation GPUs, and while we would advise our readers to take them with a grain of salt, we can definitely expect a full-blown 'official' announcement of the next-gen GPUs by NVIDIA at its GTC 2020 online keynote on the 22nd of March.
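The core-count comparisons for the three Geekbench entries can be reproduced with the short sketch below. It assumes 64 CUDA cores per SM, which is what the quoted pairs (118 SMs and 7552 cores, 108 SMs and 6912 cores, 80 SMs and 5120 cores) imply. The score comparison uses the 222377 and 184096 point results quoted for the 124 and 118 SM parts, which may not come from identical benchmark modes, so treat that gap as indicative only. All function names are hypothetical.

```python
CORES_PER_SM = 64                 # assumption implied by the SM/core pairs quoted in the text
V100_CORES = 80 * CORES_PER_SM    # Tesla V100: 5120 CUDA cores in 80 SMs

def cuda_cores(sm_count: int) -> int:
    """CUDA core count for a given SM count at 64 cores per SM."""
    return sm_count * CORES_PER_SM

def uplift_pct(value: float, baseline: float) -> float:
    """Percentage increase of value over baseline."""
    return (value / baseline - 1) * 100

# Core count and uplift over the Tesla V100 for each leaked variant.
for sms in (124, 118, 108):
    cores = cuda_cores(sms)
    print(f"{sms} SMs: {cores} CUDA cores, {uplift_pct(cores, V100_CORES):.1f}% over V100")

# Gap between the 124 and 118 SM parts, in cores and in the quoted scores.
core_gap = uplift_pct(cuda_cores(124), cuda_cores(118))
score_gap = uplift_pct(222377, 184096)
print(f"Core gap: {core_gap:.1f}%, score gap: {score_gap:.1f}%")
```

On these numbers the two larger parts differ by about 5% in cores but by roughly 20% in score, which is the gap the article calls out.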