AMD CPUs for CAD Workstations: Third-Generation Threadripper Makes Impressive Debut
19 Feb, 2020 By: Alex Herrera
Herrera on Hardware: Three years into the era of AMD’s Zen architecture for central processing units (CPUs), the company has climbed into a better position to compete with Intel, equipped with a high-performance workstation-caliber CPU.
It’s been a while since we checked in on AMD's progress in the market for workstation CPUs. This column first highlighted the potential of the Zen architecture to legitimately re-challenge Intel back in March of 2016, prior to its official market launch. We revisited Zen twice in 2017 after the emergence of not one but three Zen-derived product lines capable of addressing CAD computing markets, including a look at the tradeoffs of a CPU like Threadripper with a massive number of cores.
Those perspectives first hinted at — and then substantiated — a compelling case for Zen’s place on CAD workstation platforms, based on its aptitude for high-performance single-thread — and more so, multi-thread — computation. But those columns also addressed the fact that the workstation market is a tough nut to crack, for reasons that have nothing to do with the merits of a CPU architecture. In the workstation market, arguably more than in any other domain, original equipment manufacturers (OEMs) are looking for the confidence that a partner can deliver long-term reliability and longevity in the market. Those are traits that AMD had yet to prove — and in fact had failed to deliver a decade earlier with Opteron, which offered so much promise early on but failed to sustain itself over time.
Now in 2020, it’s been roughly three years since Zen CPUs entered the market. How has AMD done in its efforts to crack that workstation nut? Not great, if you’re basing success on whether it has signed up a major workstation OEM (meaning Dell, HP, or Lenovo — or DHL, for short) — it hasn’t. But AMD has made important progress toward that goal. First, the company has several high-quality partner OEMs and system integrators on board to showcase Zen, and in particular the high-performance Threadripper product line. And second, AMD has now proven the ability to deliver on three successive, timely, and highly competitive generations of CPUs. Just entering the market now is the third-generation Threadripper (TR3), which appears to me to be one of AMD's most compelling offerings to date for a high-performance workstation-caliber CPU.
The Zen 2 Core and TR3: More Than Just a Lot of Cores
Out of the chute, TR3 comes in 24-core (24C 3960X) and 32-core (32C 3970X) SKUs, priced initially at $1,400 and $2,000, respectively. Yes, that’s a lot of cores, moving well beyond the typical 6- to 8-core CPUs that dominate the workstation mainstream and encroaching on the turf of big multisocket workstation and datacenter CPUs like Intel’s Xeon Scalable and AMD’s EPYC. And AMD will keep pushing the envelope on the core counts that have been Threadripper’s claim to fame, with a massive 64C SKU expected later to fill out the TR3 product family.
The first two TR3 SKUs offer 24 and 32 cores.
But while the Threadripper brand has certainly made its mark with hefty core counts, TR3 is about much more than (eventually) doubling cores. TR3’s advancements are several, but three outweigh the rest in the context of its potential in CAD computing: the inclusion of the enhanced Zen 2 core, a higher-performance chipset, and the move to a more capable sTRX4 socket. Introduced in all third-generation Zen parts, the Zen 2 core lets TR3 boast about more than just high core counts: It allows the TR3 core to execute up to 15% faster (15% higher IPC, or instructions per cycle) than the original Zen core microarchitecture allowed. That’s as meaningful as boosts in core count — and arguably more so in CAD computing, where workflows may still be dominated by single-thread or several-thread execution.
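To see why an IPC gain can matter as much as added cores, here is a minimal sketch of the usual first-order model: throughput scales with IPC times clock frequency times the number of cores a workload can actually keep busy. The function name and the example ratios are illustrative assumptions, not AMD figures, apart from the +15% IPC claim quoted above.

```python
def relative_throughput(ipc_gain, clock_ratio, core_ratio=1.0):
    """Idealized speedup versus a baseline CPU (first-order model only).

    ipc_gain:    fractional IPC improvement (0.15 means +15%)
    clock_ratio: new clock / baseline clock
    core_ratio:  new busy cores / baseline busy cores
    """
    return (1.0 + ipc_gain) * clock_ratio * core_ratio


# Single-thread work: one busy core, same clock (assumed), +15% IPC.
single = relative_throughput(0.15, 1.0)

# Perfectly parallel work: +15% IPC, same clock (assumed), twice the cores.
parallel = relative_throughput(0.15, 1.0, 2.0)

print(f"single-thread: {single:.2f}x, perfectly parallel: {parallel:.2f}x")
```

In this model a single-threaded modeling task sees only the IPC and clock terms, which is why doubling cores alone would do nothing for it. Real workloads also depend on memory, cache, and how well they scale, so treat the result as an upper bound.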
Why the new socket? Well, neither independent hardware vendors (IHVs) like AMD nor OEMs like Dell want to change sockets frequently. They’ll only make such a change if there’s a compelling reason — and in the case of TR3, there certainly is. It’s important to note that no TR chips (or EPYCs, for that matter) are built on a single, monolithic silicon die. Rather, they are created by packaging multiple lower–core-count die (that would also individually ship as Ryzen CPUs) in multi-chiplet modules that sit in the CPU socket and interface to memory, I/O, and chipset. Now, the previous-generation Threadripper (TR2) housed up to four 8-core Zen chiplets to create the 32C 2990WX TR2, but AMD made the business decision to maintain socket compatibility with the first-generation Threadripper (TR1). After all, CPU suppliers can’t jump between sockets willy-nilly, as that would alienate both motherboard and system vendors.
So while carrying that compatibility forward for TR2 was a sensible choice, the decision came with drawbacks, or at least compromises. Constrained by the memory interface of the original Threadripper socket, two of the TR2 chiplets/die have direct access to memory, while the other two don’t and instead must reach memory indirectly through a neighboring die via AMD’s inter-chip Infinity Fabric in a Non-Uniform Memory Access (NUMA) topology. As illustrated in the diagram below, half of TR2’s chiplets (and the cores therein) incur an extra hop through a directly attached neighboring die to reach memory. At the very least, that hop means additional read latency, which can incur a performance penalty ranging from negligible to very substantial, depending on workload and dataset. AMD was able to compensate for some of the potential downside with intelligent, NUMA-aware control that (if application-supported) could best allocate threads to cores to ensure traffic priority and flow throughout the entire processor, managed by the Infinity controllers inside each die.
Second-generation Threadripper’s compatibility with the original-generation socket meant slower access to memory for half the die (aka Ryzen chiplets).
For TR2, the choice to stick with the old socket was sensible, and AMD smartly mitigated the impact of asymmetric latency in its NUMA approach. But clearly, as core and/or chiplet counts climb, a move to a new memory interface and socket would have to come eventually, and it made sense for AMD to make that move with the launch of TR3, pairing it with the new sTRX4 socket. TR3 and sTRX4 distribute one memory channel to each chiplet, giving each chiplet (and presumably each core within it) equitable access to memory, and each channel supports faster DDR4 speeds.
A switch to a new socket opens the door wide to introduce a new chipset, which AMD did as well, with the TRX40. Most notably, TR3 provides a x8 PCI Express Gen 4 link to the TRX40 chipset, quadrupling the previous I/O bandwidth. That’s on top of the 72 PCIe 4.0 lanes available directly from the CPU for add-in card (e.g., GPU/NVMe storage) slots. Add that all up, and beyond a faster CPU core, TR3 delivers higher performance, more equitable memory access, and a boatload more I/O.
Hand in hand with the sTRX4 socket comes the new TRX40 chipset, quadrupling available chipset I/O bandwidth.
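The "quadrupling" of chipset bandwidth can be checked with simple arithmetic. The sketch below uses the standard PCI Express per-lane transfer rates (8 GT/s for Gen 3, 16 GT/s for Gen 4, both with 128b/130b encoding). It assumes the previous-generation chipset link was x4 Gen 3; the article states only the x8 Gen 4 link and the 4x figure, so that earlier width is an assumption.

```python
# Raw transfer rate per lane in GT/s for PCI Express Gen 3 and Gen 4.
GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING = 128 / 130  # 128b/130b line coding, used by both generations


def link_gb_per_s(gen, lanes):
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8  # divide by 8 bits per byte


new_link = link_gb_per_s(4, 8)  # x8 Gen 4, as described for TR3 to TRX40
old_link = link_gb_per_s(3, 4)  # ASSUMPTION: x4 Gen 3 on the prior platform

print(f"new: {new_link:.2f} GB/s, old: {old_link:.2f} GB/s, "
      f"ratio: {new_link / old_link:.1f}x")
```

Doubling the lane count and doubling the per-lane rate each contribute a factor of two, which is where the fourfold figure comes from under these assumptions.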
TR3 Takes Big Step Forward in the Core vs. GHz Curve
Looking past the Zen 2 core goodness, if there’s one element that bodes particularly well for TR3, it’s the clock frequency. Physics and thermodynamics dictate that engineering faster cores or more plentiful cores tends to be an either/or proposition. All else (e.g., process and cooling technology) being equal, the more you have of one, the less you’ll typically have of the other. Populate a few cores and it’s far easier to drive the frequency up, but start piling on cores and frequency will need to come down. Given that, how TR3 matches up against other CPUs — especially its predecessors — is significant, as it pushes notably north of the recent GHz versus core count curve.
Why is that so noteworthy? Well, CAD computing, perhaps more than most any other client-side application, tends to emphasize both single-thread computing performance — for example, in modeling and interactive 3D graphics — and many-thread throughput, which is put to heavy use in rendering and engineering simulation. The ability to offer many cores at a higher GHz is a big plus, and TR3 would appear to do it as well as any CPU to date.
24C Threadripper 3 substantially improves on the standard core count vs. frequency curve.
Specs Are Nice, but the Proof Is in Performance Measurement
So yes, the metrics covered so far on TR3 would tend to indicate a much higher level of real-world performance than its predecessors and even hint at making its mark against rival processors. But as emphasized in my two recent columns on benchmarking, marketing and datasheet specs don’t necessarily correlate to commensurate performance increases in real-world CAD computing. Similarly (and also as touched on in the benchmarking topic), blanket statements like AMD’s on TR3, including “+15% IPC” and “40-60% faster than 2nd Gen Threadripper,” are not dishonest but hardly tell the whole story. Performance, especially with regard to CAD, is so workload-dependent that blanket figures might give a very rough idea but are not necessarily indicative of real-world performance with your workflow.
A better choice than either datasheet bullets or marketing one-liners is a benchmark, ideally one that tests performance for the more common and critical CAD workloads and for which previous Threadripper results are known. As luck would have it, with SPECwpc (v2.1, also covered in the previous benchmarking columns), we have both. We ran SPECwpc’s Product and Development test suite, which includes workloads stressing single- to several-thread computation as well as heavily threaded tasks, the latter of which essentially spawn as many threads as the CPU’s array of cores can handle. (Note that most modern Intel and AMD CPUs can concurrently manage two threads per physical core, which vendors often equate to having two logical or virtual cores.)
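As a quick illustration of the physical versus logical core distinction, the sketch below computes how many hardware threads the two launch SKUs would expose, assuming two threads per core with simultaneous multithreading enabled. The helper name is hypothetical. The last line reports what the operating system sees on whatever machine runs the script.

```python
import os

THREADS_PER_CORE = 2  # assumes simultaneous multithreading is enabled


def logical_cores(physical_cores, threads_per_core=THREADS_PER_CORE):
    """Number of hardware threads the OS would see for a given core count."""
    return physical_cores * threads_per_core


for name, cores in [("3960X", 24), ("3970X", 32)]:
    print(f"{name}: {cores} physical cores -> {logical_cores(cores)} logical cores")

# Logical CPUs on the machine running this script (None if undetermined).
print("logical CPUs visible to this OS:", os.cpu_count())
```

A heavily threaded renderer would typically size its thread pool from that logical count, which is why such workloads scale with core count while single-thread tasks do not.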
Now, you’d think running a higher clock at a still-hefty core count would mean TR3 can deliver significantly higher throughput for common many-threaded CAD workloads … and you’d be right. What’s more, that jump in GHz at high core count (along with improved IPC) means that not only has TR3 jumped significantly relative to TR2 and TR1, but it also looks like it will give up far less to Intel in terms of single-thread performance. On the chart below, you'll see the SPECwpc performance of all three Threadripper generations normalized to an overclocked 4.8 GHz (base, not turbo) 6C Coffee Lake with 12 MB L3 cache. The best-fit lines per Threadripper generation and SKU are based on SPECwpc results averaged across workloads by each load’s thread count.
Now a worthwhile disclaimer, even in the context of low-to-single thread counts: This is not quite an apples-to-apples comparison. The CPUs differ most notably in L3 cache, and the Intel CPU’s lower core count (and in this case, overclocked motherboard) gives it an advantage over the Threadrippers in single-thread computation. But disclaimer considered, TR3’s performance line is impressive: Not only has the TR3 performed a lot better on single-thread processing than the TR1 and TR2, it managed to best the 4.8-GHz Coffee Lake on 1-thread scores (averaged), a feat its predecessors could not manage.
(* Scores from SPECwpc 2.1 Product and Development test suite, averaged by thread count and normalized to 6C 4.8 GHz Coffee Lake CPU with 12 MB L3.)
24C Threadripper 3 makes a dramatic jump in performance on both multi-threaded and single-threaded CAD workloads.
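For readers who want to build a similar comparison from their own benchmark runs, here is a minimal sketch of the normalize-then-average step described in the chart note: each workload score is divided by the baseline CPU's score for the same workload, then the ratios are averaged per thread count. The function name, workload names, and all scores are hypothetical placeholders, not SPECwpc results, and the best-fit line step is not included.

```python
from collections import defaultdict
from statistics import mean


def normalized_by_threads(results, baseline):
    """Average baseline-normalized scores per thread count.

    results, baseline: lists of (workload, thread_count, score) tuples.
    Returns {thread_count: mean(score / baseline score for that workload)}.
    """
    base = {workload: score for workload, _, score in baseline}
    groups = defaultdict(list)
    for workload, threads, score in results:
        groups[threads].append(score / base[workload])
    return {t: mean(ratios) for t, ratios in sorted(groups.items())}


# HYPOTHETICAL scores, for illustration only.
baseline = [("model_a", 1, 2.0), ("model_b", 1, 4.0), ("render_a", 16, 5.0)]
candidate = [("model_a", 1, 2.2), ("model_b", 1, 4.0), ("render_a", 16, 15.0)]

curve = normalized_by_threads(candidate, baseline)
for threads, ratio in curve.items():
    print(f"{threads:>2} thread(s): {ratio:.2f}x baseline")
```

A value above 1.0 at a given thread count means the candidate CPU outran the baseline, on average, for workloads with that degree of threading.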
AMD’s Made Big Strides — Will Big Wins Follow?
The big takeaway for consumers of CPUs appropriate for CAD computing is this: AMD has not only delivered on three timely and competitive generations of Threadripper, but has made this third generation arguably the most competitive of all. Worth considering with that conclusion, however, is where Threadrippers tend to fit in today’s workstation market. While offering highly competitive price-to-performance (for example, relative to Intel’s Xeon W and even Xeon Scalable, especially on multi-threaded workloads), the truth is that any CPU in the $1,400 to $2,000 range will command a relatively small served segment in the workstation TAM (total available market); call it around 15% near the top end. (Consider that the 3970X’s $2,000 price for just the CPU exceeds the average price today’s workstation buyers are paying for the entire system.) But within that smaller served segment of the market, the TR3 stands out as a highly capable (and in that segment, very cost-effective) choice for heavy-duty CAD computation.
AMD’s Threadripper, the first Zen CPU line to gain a foothold in workstations, is very sensibly positioned in the Premium, single-socket (1S) segment.
But for a prospective AEC, design, or engineering professional interested in Threadripper, the question remains: How to get ahold of one? Today, those interested in a CAD workstation built on Ryzen, Threadripper, or EPYC need to look to vendors like Boxx, a premier but far smaller provider of workstations that signed on early to Zen. Recently and not surprisingly, Boxx announced an upgrade to its line, with its T3 model showcasing Threadripper 3.
Signing on Dell, HP, or Lenovo to build a TR3 workstation — or anything Zen, for that matter — is a different marketing ballgame, as DHL operates on different goals and principles. Nimble vendors like Boxx can search out market opportunities generation to generation, even if that may not sustain momentum in the long term. To gain that edge, they’re willing to take on some level of risk from the uncertain competitive longevity of both the supplier and its products — risk DHL can’t take. As I stated back in 2017 after the introduction of Zen products, DHL measures a potential supplier differently than a relatively low-volume OEM would:
By contrast, taking a risk and having faith aren’t at all what Tier 1 vendors like HP, Dell, and Lenovo are interested in, instead preferring more of a sure thing. While all place importance on competitive specifications, maximum performance is not a goal that overrides all others. A high-volume supplier needs the confidence that a supplier like AMD will have products not competitive for just one or two workstation product cycles, but for many years and cycles to come. OEMs Dell and HP will likely remember quite well that up-then-all-the-way-down chart of Opteron of the mid-2000s, and that’s a memory that won’t help lock up sockets for Zen.
So yes, Zen generation CPUs will let AMD compete again in workstations, but they represent the ante, not necessarily the winning hand. Succeeding will require AMD to convince OEMs they will be able to compete and invest in the market not for one generation but many to come. Do that, and CAD professionals will find a more vibrant competitive market for the workstations they need, and I’ll once again be talking about AMD as a fearsome vendor of workstation-caliber CPUs for years to come.
But none of this means that one of the DHL trio won’t take the Zen plunge in 2020 (or beyond), because what AMD has accomplished, especially with TR3, is delivering on that most important proof point that DHL cares about: multi-generational longevity with sustained, reliable high performance. Having successfully launched its third generation, AMD has shown it can compete with Intel over the longer haul. And I’d actually go further and argue that with TR3, AMD has its best weapon yet to crack the big time in workstations — and especially for CAD, given its compelling combination of single- and multi-thread performance.