We’re only two weeks away from the arrival of Ryzen, which has been half a decade in the making. This is the AMD CPU that PC hardware enthusiasts have eagerly awaited for so long, and what a CPU it is. We’re going to take you through a deep dive into the company’s brand new high performance Zen CPU microarchitecture, its features, its specs and its performance.
Ryzen, AMD’s Most Important Product In More Than A Decade
Many Years In The Making
The journey of the Zen microarchitecture, which sits at the core of every Ryzen chip, has been a long and challenging one. It’s the company’s first attempt to compete in the high-end enthusiast CPU market since the introduction of the Bulldozer microarchitecture five years ago. Zen breaks new ground for AMD in many ways. It’s the company’s first ever CPU architecture to feature simultaneous multithreading. It’s also AMD’s first CPU since the days of the original Athlon, more than a decade ago, to be built on a process technology that’s very close to parity with Intel’s.
It means that for the very first time since the early 2000s AMD’s CPUs are no longer at an inherent disadvantage due to Intel’s process lead. From an architectural point of view Zen is a brand new clean-slate design that was led from the get-go by accomplished CPU architect Jim Keller, the very same engineer who played a pivotal role in designing the original Athlon XP and Athlon64 processors, the most successful and competitive CPU products in the history of the company.
Zen is AMD’s biggest long-term technology bet and one of the largest engineering efforts undertaken by the company. Design work on the microarchitecture began in 2012 and was completed four years later. The very first products based on the brand new CPU core design are the Ryzen processors, which are set to launch at the end of the month. However, we know that AMD is working on far more than just high performance desktop CPUs. The company also has a 32-core Zen server CPU, a 16-core Zen HPC APU and a quad-core Zen consumer APU called Raven Ridge. All of these products have been in the works since the very beginning.
The Zen Microarchitecture
Below we have a visual representation of an actual Zen core on silicon. The core is comprised of one floating point unit and one integer engine. This is a huge step away from the Bulldozer design, which featured two integer engines and one floating point unit per module. The integer cluster in each Zen core has six pipes: four ALUs (Arithmetic Logic Units) and two AGUs (Address Generation Units).
These AGUs can perform two 16-byte loads and one 16-byte store per cycle via a 32 KB 8-way set associative write-back L1 data cache. According to AMD the move from a write-through to a write-back cache has noticeably reduced stalls in several types of code paths. Load/store cache operations in Zen also reportedly exhibit lower latency compared to Excavator, the 4th generation Bulldozer core.
Bulldozer’s relatively power hungry and slow cache hierarchy was one of the key factors in its poor single threaded performance and power efficiency. A lot of work has gone into designing a new cache sub-system for Zen to minimize its power and area footprints as well as make it as fast as the silicon will allow.
The L2 and L3 caches were grouped in a very clever way to minimize the access times by any given core at any given time. The write-through cache architecture has also been forgone in favor of a more power and area efficient write-back cache.
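To get a feel for why a write-back cache cuts down on traffic, here is a minimal toy model in Python. It is only a sketch and in no way AMD’s implementation: the 32 KB size and 8-way associativity come from the figures above, while the 64-byte line size, the LRU replacement policy, the write-allocate behavior and the store workload are all our own assumptions for illustration.

```python
from collections import OrderedDict

CACHE_BYTES = 32 * 1024  # 32 KB L1 data cache (figure quoted in the article)
WAYS = 8                 # 8-way set associative (figure quoted in the article)
LINE_BYTES = 64          # ASSUMPTION: the article does not state the line size
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)


def count_next_level_writes(store_addresses, write_back):
    """Count writes sent to the next cache level for a stream of stores."""
    if not write_back:
        # Write-through: every single store is forwarded to the next level.
        return len(store_addresses)

    # Write-back: a line is only written out when it is evicted or flushed.
    # Only stores are modeled, so every resident line is dirty.
    sets = [OrderedDict() for _ in range(NUM_SETS)]  # per-set LRU order
    writes = 0
    for addr in store_addresses:
        line = addr // LINE_BYTES
        lines = sets[line % NUM_SETS]
        if line in lines:
            lines.move_to_end(line)        # hit: refresh the LRU position
        else:
            if len(lines) == WAYS:
                lines.popitem(last=False)  # evict the least recently used line
                writes += 1                # the evicted line is dirty
            lines[line] = True
    # Flush the lines that are still dirty at the end of the run.
    writes += sum(len(lines) for lines in sets)
    return writes


# Hypothetical workload: 1,000 passes over a 4 KB buffer, 8 bytes per store.
stores = [offset for _ in range(1000) for offset in range(0, 4096, 8)]
write_through_writes = count_next_level_writes(stores, write_back=False)
write_back_writes = count_next_level_writes(stores, write_back=True)
print(NUM_SETS, write_through_writes, write_back_writes)
```

Under these assumptions the buffer fits comfortably in the cache, so the write-back model only writes each of its 64 lines out once at the end, while the write-through model forwards all 512,000 stores. Real workloads will land somewhere in between.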
Another key area where Zen differentiates itself from the Bulldozer family of cores is its access to a relative abundance of L3 cache. Each Zen core has access to twice the capacity of L3 cache compared to AMD’s last 8-core chip, code named “Orochi”, which was sold under the FX 8300 and FX 8100 brand names.
AMD’s First Microarchitecture To Feature Simultaneous Multithreading
The company has done away with the CMT (clustered multi-threading) concept that was introduced with the Bulldozer family of cores in 2011 in favor of a more traditional SMT (simultaneous multi-threading) design. This means that each Zen core will be able to execute two threads simultaneously: a primary, very high throughput thread and a secondary thread with less oomph that can be used opportunistically.
In contrast, each Bulldozer module can execute two identical threads. This is achieved through two separate integer clusters with a single front-end. This approach saves area versus building two separate cores and delivers two high throughput threads. However, there are advantages that Zen’s SMT implementation holds over the Bulldozer CMT implementation. For one, it allows AMD to build a single larger integer cluster with significantly higher single threaded performance. Another advantage of this approach is that it leaves a lot of wiggle room for clever savings in area and power.
Incredible Drive For Power Efficiency
AMD’s 8-core Ryzen chip has an army of sensors buzzing away to monitor voltages, temperatures, frequency and overall power at any given moment. These sensors are part of what AMD dubs its SenseMI family of technologies. We’ll talk about these technologies in much more detail further down. It’s these little engines that bring amazingly cool technologies such as the auto-overclocking XFR feature from the realm of science fiction to reality.
A lot of the engineering effort around Zen has also gone into addressing one of Bulldozer’s major flaws. Bulldozer and Intel’s Sandy Bridge, along with subsequent Intel architectures including Skylake, had equally deep pipelines to achieve high clock speeds. The deeper the pipeline, the more latency the design will exhibit, particularly when it comes to branch mispredictions, which are quite common in such pipelines.
On the front-end each Zen core is capable of decoding four instructions per cycle, which are fed to the operations queue. The micro-op cache along with the queue have a throughput of six operations per cycle going into the schedulers.
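As a rough back-of-the-envelope illustration of what those two numbers mean, the Python sketch below blends the four-wide decode path with the six-wide micro-op cache path. It is a simplified model of our own and not something AMD has published: it assumes one instruction maps to one operation, and the hit rate is a hypothetical input.

```python
DECODE_WIDTH = 4    # instructions decoded per cycle (figure quoted in the article)
OP_CACHE_WIDTH = 6  # operations per cycle via the micro-op cache and queue (article)


def average_ops_per_cycle(op_cache_hit_rate):
    """Blend the two front-end paths by a HYPOTHETICAL micro-op cache hit rate.

    ASSUMPTION: one instruction equals one operation, and each operation is
    delivered either by the decoders or by the micro-op cache, never both.
    """
    cycles_per_op = (op_cache_hit_rate / OP_CACHE_WIDTH
                     + (1.0 - op_cache_hit_rate) / DECODE_WIDTH)
    return 1.0 / cycles_per_op


for hit_rate in (0.0, 0.5, 1.0):
    print(hit_rate, round(average_ops_per_cycle(hit_rate), 2))
```

In this simplified model the front-end delivers anywhere from four to six operations per cycle depending on how often code hits in the micro-op cache.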
The latency that results from branch mispredicts is quite significant. To combat this issue Intel introduced a micro-op cache with Sandy Bridge. It worked to a great extent in reducing mispredict penalties and was believed to be the principal reason behind Intel’s significant single threaded performance advantage over Bulldozer. AMD has finally introduced its own micro-op cache with Zen.
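The cost of mispredicts can be put into numbers with the standard textbook approximation for cycles per instruction. The Python sketch below is only that, a generic approximation, and every input value is hypothetical rather than a measured Zen, Bulldozer or Sandy Bridge figure.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Textbook estimate: base CPI plus the average mispredict stall per instruction."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# HYPOTHETICAL inputs: 20% of instructions are branches, 5% of them mispredict.
long_penalty_cpi = effective_cpi(0.5, 0.20, 0.05, penalty_cycles=20)
short_penalty_cpi = effective_cpi(0.5, 0.20, 0.05, penalty_cycles=15)
print(round(long_penalty_cpi, 3), round(short_penalty_cpi, 3))
```

With these made-up inputs, trimming the penalty from 20 to 15 cycles takes the estimate from roughly 0.70 to 0.65 cycles per instruction, which gives an idea of why shortening the recovery path matters so much in a deep pipeline.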
The Zen Microarchitecture In A Nutshell
The Zen core features a significantly wider execution engine than anything we’ve seen from AMD before, leveraging simultaneous multithreading and a micro-op queue to boost throughput and single threaded performance. This, combined with a brand new low latency cache sub-system and a new set of pre-fetch algorithms, results in a dramatic instructions per clock improvement and a doubling of throughput per core compared to AMD’s previous 8-core Piledriver FX 8300 series CPUs.
High Level Overview:
Two threads per core
8 MB shared L3 cache
Large, unified L2 cache
Micro-op Cache
Two AES units for security
14nm FinFET Transistors
Ryzen On The Desktop
New Ryzen CPU Coolers With Customizable RGB Lighting
AM4 Motherboards
Leaked Ryzen Benchmarks