New Intel Xeon W Has Arrived

31 Aug, 2017 By: Jon Peddie

The bottom line is, the 28-core Skylake processor is coming to the workstation market and things will never be the same.

Intel has been powering the workstation CPU market for decades. Its previous processor generation, Broadwell, was a performance leader that pleased millions of workstation users. The company didn’t stop there, and it recently introduced its newest generation of workstation processors: the Intel Xeon SP (Scalable processor, dual-socket capable) and the Intel Xeon W processor (single-socket capable). To differentiate the workstation-class Xeon processors from server-class processors, Intel has for the first time added the W designation to the name. Built on the Skylake architecture, these new processors deliver more cores, higher frequency, and more cache, plus expanded memory management and PCIe lanes.

Workstations come in desktop, mobile, and rack versions, some powerful enough to be considered small supercomputers. Intel has successfully exploited Moore’s law for decades, and through it has developed the basic transistor miniaturization that enables processor advancements. But faster CPUs are not the only result; the processors themselves get new features and functions with every generation. When Intel x86 processors were first deployed in a Windows-based workstation back in 1997, one of the salient features was an integrated floating-point processor. Since then, expanded memory managers, security, and communications have been added, and one 32-bit core has grown to 28 64-bit cores plus a 512-bit SIMD processor and transcoder engines.

However, to offer such processors, the company first has to design them, and the current design is designated Skylake. That design will be used in several different processor forms, from laptops to supercomputers. The current-generation processors are used in a platform specifically designed for them; it includes supporting chips to provide USB 3.1 type A to C, Thunderbolt 3, gigabit Ethernet, SATA, and other ports. In the case of the Skylake platform, the supporting chip for the Intel Xeon SP is known as the Lewisburg PCH, and for the Intel Xeon W processor it is the Kaby Lake WS PCH. It can get confusing at times because various people use the different names interchangeably. It gets even more complicated because the processor, supporting chip, and platform all have arcane part numbers too, which denote frequency, core count, and other esoteric elements. And, just to make you even crazier, things are expressed in acronyms.

Jon Peddie Research compared the new Skylake-based Xeon W to a four-year-old workstation based on a 3.9-GHz Intel Xeon E5-1680 v2. Running a professional workstation workload, we found the Xeon W would provide an average of 87% more performance.

Comparison of a four-year-old workstation with a new Skylake workstation. (All charts courtesy of Jon Peddie Research unless otherwise noted.)

What would that increase mean: that you’d get the job done in a little over half the time it took before? Well, no, not exactly. If you repeatedly ran the same workload with the same dataset, then yes, it would finish much faster. But that’s not how people work. And some engineers make the joke, “When am I supposed to get a coffee if the new machine is so much faster?”
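As a rough illustration of the arithmetic, here is a minimal Python sketch that converts a throughput gain into the fraction of the original runtime that remains. It assumes the idealized case described above (the same workload and dataset, run repeatedly); the function name `time_fraction` is hypothetical and chosen only for this example.

```python
def time_fraction(perf_gain_percent: float) -> float:
    """Return the fraction of the original runtime left after a throughput gain.

    A gain of X% means the new machine does (1 + X/100) times the work
    per unit of time, so a fixed job takes 1 / (1 + X/100) of the old time.
    """
    speedup = 1.0 + perf_gain_percent / 100.0
    return 1.0 / speedup


if __name__ == "__main__":
    # 87% is the average gain measured in the comparison above.
    frac = time_fraction(87.0)
    print(f"Runtime is about {frac:.0%} of what it was")
```

Under that assumption, 87% more performance means a fixed job takes roughly 53% of the time it used to, a little better than cutting the wait in half.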

In the case of CAD, an 87% improvement in performance doesn’t translate into much by itself. The days of engineers and CAD dogs making A-B-C, 2D line drawing views of a widget are pretty much gone. Just as no one has a secretary any more, design engineers no longer have rendering grunts. Engineers today either do the rendering themselves or send it to a farm and wait for it to come back. If they do it themselves, that 87% all of a sudden starts to look pretty good, especially if the rendering is using ray-tracing. With a modern machine, you can run tasks in parallel.

The benefit of parallel processing is undeniable: Performing multiple processes simultaneously provides huge gains in productivity and accuracy. The bottleneck has been legacy software that simply couldn’t be threaded and recompiled to take advantage of more than one core. It’s taken longer than we thought, but the industry has developed new apps with threading as an integral feature. And ironically, the software companies doing that haven’t made any fanfare about it; rather, we now just assume that any new app would be multithreaded.

Numbers Don’t Lie

If you are using apps that involve math, then you are really going to like these new processors. We ran a little comparison of benchmark results from generation to generation.

The new Xeon W processors can deliver three times the performance of previous generations.

The benchmarks tell part of the story. According to Intel, the new processors can deliver a 300% performance improvement over a machine that is four years old (based on the best published two-socket SPECfp_rate_base2006 results as of 11 July 2017), or an 80% improvement from the last generation to this one, based on the same data. And that’s all true; it just might not apply to you.

However, you can get a better sense of the relationships through a block diagram like the following.

Purley platform overview (Source: Intel)

The bottom line is, the 28-core Skylake processor is coming to the workstation market and things will never be the same. With Intel’s launch, these are the new Intel Xeon processors:

  • The Intel Xeon E5-1600 product family is now branded Intel Xeon W (single socket).
  • The Intel Xeon E5-2600 product family is now branded Intel Xeon SP (Scalable processor, dual socket).

We’ve come a long way from that first Windows and Intel–based workstation in 1997. It came with a 266-MHz Pentium II processor and on a good day could hit 48.3 MFLOPS. The top-of-the-line workstations of the day had a 300-MHz Intel Pentium II processor that could deliver 62.1 MFLOPS.

A Dell workstation, circa 1997.

By comparison, in June 1997, the fastest supercomputer in the world was ASCI Red at Sandia National Laboratories, which supports the U.S. nuclear arsenal, and it could do 1.3 TFLOPS. It took up 1,600 ft², comprised 104 racks that held 9,298 CPUs and 1.2 TB RAM, required 850 kW of power, and cost $46 million (or $67 million in today’s dollars).

Today you can have a supercomputer small enough to sit under your desk that outperforms that 1997 monster. For example, Dell just announced the Precision 7920 Tower with dual Intel Xeon Platinum processors, each with up to 28 physical cores, running up to 3.8 GHz Turbo frequency and capable of 4 TFLOPS and 112 threads. That’s roughly three times as fast as the fastest supercomputer of just 20 years ago, for less than one-thousandth the cost and less than one-thousandth the power. Not only that, almost anyone can use it. Very few people could use that magnificent ASCI Red, which, by the way, stayed in service for nearly a decade, giving U.S. taxpayers a pretty good ROI.
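To put the comparison in numbers, the short Python sketch below computes the performance and cost ratios from figures quoted in this article: 1.3 TFLOPS and $46 million for the 1997 machine, 4 TFLOPS for the dual-socket workstation, and the "less than $15,000" system price mentioned later in the article. It is a back-of-the-envelope comparison only. The constant names are made up for this example, and the $15,000 figure is an upper bound that covers a whole system rather than a measured price for this exact configuration.

```python
# Figures quoted in this article (not independently measured).
ASCI_RED_TFLOPS = 1.3
ASCI_RED_COST_USD = 46_000_000      # 1997 dollars
WORKSTATION_TFLOPS = 4.0            # dual-socket, 56 cores / 112 threads
WORKSTATION_COST_USD = 15_000       # upper bound quoted later in the article


def ratio(a: float, b: float) -> float:
    """Plain ratio a / b, kept as a function so the intent is explicit."""
    return a / b


perf_ratio = ratio(WORKSTATION_TFLOPS, ASCI_RED_TFLOPS)
cost_ratio = ratio(ASCI_RED_COST_USD, WORKSTATION_COST_USD)

if __name__ == "__main__":
    print(f"Workstation is about {perf_ratio:.1f}x the 1997 machine's TFLOPS")
    print(f"The 1997 machine cost about {cost_ratio:,.0f}x as much")
```

With these inputs the workstation delivers roughly three times the floating-point throughput at about one three-thousandth of the purchase price.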

The new Dell Precision 7920 workstation features the dual-socket Intel Xeon SP. (Source: Dell)

When you take the new Intel Xeon SP Platinum series Skylake processors with 28 physical cores, running up to 3.8 GHz Turbo frequency capable of 2 TFLOPS across 56 threads, then double that in a dual-socket system, you have 112 threads at 3.8 GHz approaching 4 TFLOPS. It’s almost unbelievable. Drop in a modern add-in board such as a graphics processing unit (GPU) designed for compute, and you have a theoretical 16 TFLOPS in a system that can fit under your desk, run on conventional wall socket power, and function without extra air-conditioning. Oh, and the entire package would cost less than $15,000.
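For readers who want to see where a TFLOPS figure like this can come from, here is a minimal Python sketch of the usual theoretical-peak formula: cores × clock × floating-point operations per cycle per core. Two inputs are assumptions, not figures from this article. The first is 32 double-precision operations per cycle per core, which corresponds to two 512-bit fused multiply-add units. The second is an illustrative sustained clock of 2.2 GHz under heavy vector load, since processors generally run below their maximum Turbo frequency when every core is executing wide SIMD code. Real applications reach only a fraction of any theoretical peak.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores * GHz * FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# ASSUMPTION: two 512-bit FMA units per core
# -> 8 doubles * 2 ops (multiply + add) * 2 units = 32 FLOPs/cycle.
FLOPS_PER_CYCLE = 32

# ASSUMPTION: illustrative all-core clock under heavy SIMD load,
# not a published specification.
ASSUMED_SIMD_CLOCK_GHZ = 2.2

CORES_PER_SOCKET = 28  # from the article

single_socket = peak_gflops(CORES_PER_SOCKET, ASSUMED_SIMD_CLOCK_GHZ, FLOPS_PER_CYCLE)
dual_socket = 2 * single_socket

if __name__ == "__main__":
    print(f"One socket:  {single_socket / 1000:.2f} TFLOPS (theoretical)")
    print(f"Two sockets: {dual_socket / 1000:.2f} TFLOPS (theoretical)")
```

Under these assumptions one socket lands just under 2 TFLOPS and two sockets just under 4 TFLOPS, which is consistent with the figures quoted above. A different assumed clock would shift the result proportionally.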

Every user has a unique workload, so the best that benchmarks can do is indicate what you might achieve. However, over the years I have yet to hear anyone ever say they didn’t get their money’s worth from a new workstation. The math is simple: Do more (or better) work in the same time, and calculate the payoff based on the salary of the engineer doing the work.
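The payoff math can be sketched in a few lines of Python. Everything here is hypothetical except the $15,000 system price quoted above: the engineer's annual cost and the productivity gain are placeholder values to replace with your own, and the function name `payback_months` is invented for this example. The sketch also assumes that any time saved is turned into useful work of the same value.

```python
def payback_months(workstation_cost: float,
                   engineer_annual_cost: float,
                   productivity_gain: float) -> float:
    """Months until the extra work done pays for the workstation.

    productivity_gain is a fraction, e.g. 0.10 for 10% more work done
    in the same time.
    """
    if productivity_gain <= 0 or engineer_annual_cost <= 0:
        raise ValueError("gain and annual cost must be positive")
    value_per_month = engineer_annual_cost * productivity_gain / 12.0
    return workstation_cost / value_per_month


if __name__ == "__main__":
    # HYPOTHETICAL inputs: $120,000 fully loaded annual cost, 10% gain.
    months = payback_months(15_000, 120_000, 0.10)
    print(f"Payback in about {months:.0f} months")
```

With those placeholder inputs the machine pays for itself in about 15 months, comfortably inside a typical three-to-four-year replacement cycle. A larger gain or a more expensive engineer shortens the payback period.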

A Quick Comparison

The generational differences are impressive and illustrate the results of making billions of tiny transistors available to computer architects.


However, as mentioned above, it’s the application of all those speedy little transistors that is the real magic, and the primary benefit to users and organizations.

Workstations don’t break and aren’t cheap, so they don’t get replaced every year, or every other year. In fact, they seldom get replaced more often than every three or four years, and only then if there is a software or hardware improvement significant enough to justify the upgrade. Although Moore’s law has been fairly predictable over the past 40 years, with the move to 14-nm processes, there is more being accomplished by the hardware developers than just clock speed-ups. With a smaller feature size, more transistors can be stuffed in a chip, and when that is done, the result is more functions; faster, wider communications; and specialized capabilities like security, artificial intelligence (AI), and power management. 

Intel has always been a leader in process technology and therefore in a perfect position to recognize and exploit the inherent opportunities of compute density and throughput. The Skylake processor is the latest instantiation of that strength, and workstation users are the beneficiaries. 

About the Author: Jon Peddie
