More CPU Cores or Faster CPU Clocks?
1 Dec, 2015 By: Alex Herrera
To best address the demands of a modern CAD workflow, look for a balance of CPU core count and clock rate in your next CAD workstation.
Which do you think is better for processing common CAD workloads: more central processing unit (CPU) cores or faster CPU cores? It's a debate we never had prior to the advent of multicore CPU architectures. But now that multicore CPUs are the rule, and multithreaded applications are not, it's a debate that's worthwhile. The best combination of CPU core count and clock rate for your next CAD workstation depends on a variety of factors, from the applications you use to the tasks that most frequently tax your system (and your budget, of course).
This topic has been addressed in many places over the years, including this column, but mostly with qualitative arguments and little in the way of demonstrative results. To rectify that situation, I'll provide some concrete examples and a bit of experimental data that more clearly illustrate how common CAD computing workloads behave differently — and thereby perform differently — when run on CPUs that favor faster cores versus CPUs that favor more cores.
Why Multiple Cores Instead of Maximum Frequency?
We're about a decade into the era of multicore processors, and it's worth recapping the rationale that drove the industry to make the very intentional, dramatic shift away from pursuing big gains in clock rate (the frequency at which the processor runs) and instead focus on creating more cores per processor. There were two primary reasons: First, chasing instruction-level parallelism was becoming too difficult. For years, CPU designers added or enhanced superscalar techniques to speed the processing of a single thread of execution. By the turn of the century, all the low-hanging superscalar fruit had long been picked. Increasing throughput by architectural means had extended well beyond the point of diminishing returns, so even modest gains required big tradeoffs in complexity and cost.
Second, and more critically, with frequencies so high (in combination with related issues, such as pipeline lengths growing so deep) and silicon transistor dimensions so small, chips were simply becoming too hot to operate reliably. The faster a chip runs, the more power it consumes and the more heat it produces. With the power trajectory chips were on at the time, they would have eventually become impossible to cool. As a result, by the mid-2000s, the frequency-first design strategy for CPUs had ended. Rather than build more complex, faster-clocked monolithic processors, virtually every vendor on the planet changed course to create a more modest and manageable core, and then instantiate that core multiple times in a single processor chip.
Today, a CAD buyer looking to configure a workstation has many CPU models or stock-keeping units (SKUs) to choose from — some that offer higher clock rate and fewer cores, some with lower clock rate but more cores, and some that are a compromise between the two. For example, Intel's current-generation Core i7 brand CPU SKUs (appropriate for high-performance PCs and workstations) tend to focus on configurations with four cores, but span a range of frequencies from around 3.5 to 4.5 GHz. Meanwhile, Intel's Xeon E5 CPU SKUs that best fit workstation applications offer options with many more cores (up to 10 and beyond), but tend to do so with somewhat more modest clock frequencies for purposes of thermal management and emphasizing reliability (figure 1).
Figure 1. Intel Core i7 and Xeon E5 CPU SKUs offer a range of clock rates and core counts that are most appropriate for CAD workstations. Graph based on data from Intel.com.
Today's lengthy menu of processor options raises the question: Should a CAD user configuring a new machine select a CPU with the highest frequency, the most cores, or one that offers a balance of the two? Well, for the majority of CAD workflows, the right answer is balance, as it so often is in hardware design. But to show why, let's first take a look at the arguments for the other two options.
The Case for More Cores
That conscious switch in CPU design strategy and tactics, away from the fringe of max-GHz clock rate and toward multiple-core architectures, made perfect sense. However, it also meant a fundamental difference in how applications would benefit when running on a new generation of microprocessor. For the first time, applications didn't get the typical, automatic 30% or 40% boost in speed when running the same binary code on the next generation of CPU. Rather, the boost they got depended heavily on how well those applications could break up code into independent, parallel sequences of instructions, or threads, to run effectively in parallel on multiple cores.
Some common CAD-relevant algorithms (and resulting code) could multithread well, but the unfortunate truth is, many others couldn't. Worse, they still can't. Engineering simulations, for example, tend to offer higher degrees of parallelism for independent software vendors (ISVs) to exploit. Other tasks, however — such as modeling and real-time graphics processing unit (GPU) rendering — don't.
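To make that dependence concrete, here is a minimal sketch using Amdahl's law, a standard textbook model that the discussion above does not name but that captures the same idea: only the portion of a task that can be split into threads benefits from extra cores. The two CPU configurations and the parallel fractions below are hypothetical values chosen for illustration, not measurements, and the model ignores memory bandwidth, turbo clocks, and other real-world effects.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core when only `parallel_fraction`
    of the work can be spread across `cores` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(clock_ghz: float, cores: int,
                        parallel_fraction: float) -> float:
    """Crude figure of merit: clock rate scaled by the Amdahl speedup.
    Assumes equal work per clock on both CPUs (a simplification)."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical CPUs, not real SKUs: fewer fast cores vs. more slower cores.
    fast_quad = {"clock_ghz": 4.0, "cores": 4}
    many_core = {"clock_ghz": 3.0, "cores": 10}

    # Hypothetical workloads: mostly serial, mixed, highly parallel.
    for p in (0.10, 0.50, 0.95):
        quad = relative_throughput(parallel_fraction=p, **fast_quad)
        many = relative_throughput(parallel_fraction=p, **many_core)
        winner = "fast quad-core" if quad > many else "10-core"
        print(f"parallel fraction {p:.2f}: "
              f"quad={quad:.2f}, 10-core={many:.2f} -> {winner}")
```

Under these assumptions, the higher-clocked part comes out ahead when little of the work can be threaded, and the higher-core-count part comes out ahead only when most of it can.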
To illustrate the degree to which some common CAD computing tasks can leverage multiple CPU cores, we ran some test cases on a Lenovo ThinkStation C30 workstation outfitted with two Xeon processors, each of which contains eight physical cores. In addition, each physical core can support two concurrent threads through Intel's Hyper-Threading technology. That adds up to a total of 16 physical cores and 16 additional "logical" cores, or 32 hardware threads in all.
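The core-count arithmetic for the test machine can be sketched in a few lines. The socket, core, and thread counts come from the description above; the function names are illustrative. The final call uses Python's standard os.cpu_count(), which reports the number of logical CPUs the operating system sees on whatever machine runs the script (or None if it cannot be determined).

```python
import os


def physical_cores(sockets: int, cores_per_socket: int) -> int:
    """Total physical cores across all processor sockets."""
    return sockets * cores_per_socket


def logical_cores(sockets: int, cores_per_socket: int,
                  threads_per_core: int) -> int:
    """Total hardware threads the operating system can schedule."""
    return physical_cores(sockets, cores_per_socket) * threads_per_core


if __name__ == "__main__":
    # Test workstation as described: two Xeons, eight cores each,
    # two threads per core.
    physical = physical_cores(sockets=2, cores_per_socket=8)
    logical = logical_cores(sockets=2, cores_per_socket=8, threads_per_core=2)
    print(f"physical cores: {physical}")
    print(f"additional logical cores: {logical - physical}")
    print(f"total hardware threads: {logical}")

    # For comparison, the logical CPU count on the machine running this script.
    print(f"this machine reports: {os.cpu_count()} logical CPUs")
```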
Running a common engineering simulation workload, the computational fluid dynamics (CFD) solver Rodinia (from SPEC's SPECwpc benchmark), we see how an effectively multithreaded algorithm can keep many cores occupied (figure 2). In fact, Rodinia showed best-case efficiency, with all cores on both of our workstation's Xeon processors running at nearly 100% utilization; note the green line pinned to the top of each core's usage history chart.
Figure 2. CFD solver Rodinia (in SPECwpc) illustrates a best-case example of multicore utilization.