Allocate Your Workstation Budget According to Your Workload
3 Sep, 2014 | By: Alex Shows
Uncertain about how best to spend your hardware dollars? A careful evaluation of the type of work to be done — and these guidelines — will provide the answers.
The most important criterion for configuring the right workstation is knowing how it will be used. The intended purpose determines which components are critical to performance and which are optional or unnecessary. In addition, the more you know about how the workstation will be used, the more performance you’ll be able to achieve per dollar spent. Start by identifying the various modes of use, then weigh the importance and frequency of those tasks; that way, you can more effectively determine the right workstation for the job.
Computational vs. Interactive
You should begin by considering the type of work to be done on the workstation, sorting the major tasks into two categories: computational or interactive. Computational tasks involve little user interaction and are characterized by high utilization of all available resources in an automated sequence. Rendering frames of video, integrated finite element analysis, motion simulation, and computing the downforce of a new racecar spoiler design are all examples of computational workloads. Interactive workloads, in contrast, involve heavy user interaction and are characterized by sporadic peaks of high utilization separated by idle periods where the user is thinking about the next interaction. Viewing and rotating an engine model, annotating the HVAC routing through a multi-story building, and animating a complex rigged model in a 3D modeling program are all examples of interactive workloads.
Dividing the usage model into computational and interactive buckets helps to determine the necessity of components such as dual-socket support and the number of memory channels populated, as well as the importance of particular attributes of those components, such as the peak possible central processing unit (CPU) frequency. For purely computational workloads, multi-socket platforms can provide great performance improvements by reducing the amount of time a task requires, so long as the software processing the work is able to scale in performance as the processor count increases. If the application does not scale across the available processors, whether due to architectural or licensing limitations, the additional cost and complexity of the second socket may not be justified.
For example, a CAD user who spends his or her time editing and annotating a design on a workstation, then submits simulation or rendering jobs to a separate system (perhaps in a data center), would benefit much less from a multi-socket workstation than a user who spends most of his or her time in simulation and analysis of designs. There is a wide variety of standalone and plugin-based simulation, rendering, and analysis tools available, in addition to those that may be integrated into a CAD suite. These tools, unlike interactive modeling, typically scale quite well across as many cores as possible, including those provided by additional populated sockets.
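Whether a second socket pays off depends on how well the software scales, and Amdahl's law gives a rough upper bound on the available speedup. A minimal sketch, where the parallel fractions (95% for a render job, 60% for an interactive tool) are illustrative assumptions rather than measured values:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup for a workload whose parallel
    fraction scales perfectly across n_cores; the serial
    fraction always runs on one core."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# A 95%-parallel render job keeps gaining from a second socket...
print(round(amdahl_speedup(0.95, 8), 2))   # 8 cores, one socket  -> 5.93
print(round(amdahl_speedup(0.95, 16), 2))  # 16 cores, two sockets -> 9.14

# ...but a 60%-parallel interactive tool barely benefits from
# doubling the core count.
print(round(amdahl_speedup(0.60, 16), 2))  # -> 2.29
```

The gap between 5.93x and 9.14x is what the second socket actually buys for a well-scaling job; for the 60%-parallel case, the same hardware dollars are largely wasted.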
Similar to the question of a second CPU socket, some computational workloads may scale in performance by using the graphics processing unit (GPU, familiarly called a graphics card) as a computational resource. To understand the differences between CPU and GPU, it may help to think of the GPU as a dragster and the CPU as a rally car. Given a set of data (fuel), and a straight track (predictable, repeated instructions), the GPU is incredibly fast in a straight line. On the other hand, the rally car has a navigator inside that is like the CPU’s branch prediction algorithm, providing hints to the driver about the turns coming up and how best to negotiate them, while the driver is adept at quickly responding to road conditions around a highly complex track. Many computationally intensive applications are improving performance through the use of the GPU. Thus it’s important to determine whether your application can make use of the GPU, and what type of GPU might be required.
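The dragster/rally-car distinction maps onto how the code is written: GPU-friendly work applies the same branch-free operation independently to every element of a large data set. A hedged sketch of that kernel style in plain Python (the formula and names are illustrative only; a real GPU port would express the per-element function as a CUDA or OpenCL kernel):

```python
# GPU-amenable work: identical arithmetic applied independently to
# each element, with no data-dependent branching (the "straight
# track" in the dragster analogy).
def downforce_kernel(angle_of_attack: float) -> float:
    # Illustrative formula only, not real aerodynamics.
    return 0.5 * 1.225 * (angle_of_attack ** 2)

angles = [a * 0.1 for a in range(1000)]

# On a GPU, each element would be handled by its own thread; here
# map() stands in for that per-element launch.
results_parallel = list(map(downforce_kernel, angles))

# The sequential CPU loop computes the same values one at a time.
results_serial = [downforce_kernel(a) for a in angles]

assert results_parallel == results_serial
```

Workloads that fit this shape (rendering, many solver inner loops) are the ones worth checking for GPU support; branch-heavy, interactive logic generally is not.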
When choosing a CPU, first think about how much time will be devoted to computational workloads, where all available cores will be driven for long durations at high utilization. The more time spent in these usage types, the more of the workstation budget should be spent on maximizing core count. Begin by maximizing core count in a single socket, while considering budgetary requirements for other components. If more computational performance is desired or made possible by savings on other components, move to a platform with dual CPU sockets to further increase computational performance.
It is important, however, not to jump straight to a dual-socket platform when maximizing computational performance. While these platforms provide the best computational throughput, multi-socket architectures carry a slight performance penalty (memory accesses that cross the socket interconnect add latency), and this penalty will impact interactive usage models by slightly reducing the frame rates generated by the graphics card. (See the next section for more on graphics performance.)
While it is best to maximize core counts for computational workloads, interactive usage models require the highest CPU frequency available. This is because interactivity (as measured by frames per second) is often limited by the ability of a single core to feed the GPU with instructions and data. Most modern graphics programming interfaces can only feed data and instructions to the GPU from a single thread, and even though the GPU driver is multithreaded, the performance benefit of additional cores is negligible beyond about four. Thus, the greater the amount of time spent in interactive usage models, the greater the portion of the workstation budget that should be allotted to increasing the maximum CPU frequency.
Most Intel CPUs available in Precision workstations support a feature called Turbo. When operating in Turbo mode, a CPU adjusts its frequency based on the workload distributed across its cores. When fewer cores are busy, the CPU runs at a higher frequency. The highest Turbo frequencies are possible when only a single core is active. The lowest Turbo frequencies are used when many, or even all, cores are active. This dynamic clocking allows interactive workloads to operate at peak Turbo frequencies, while computational workloads still operate above the nominal frequency of the CPU. This is important because comparing the nominal frequency of two CPUs (or their rated frequency, which is commonly quoted alongside the model name) isn’t always representative of the frequency at which they will be operating the majority of the time.
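To see why nominal frequency is a poor comparison point, consider a sketch that looks up the operating frequency for a given number of active cores. The turbo tables below are hypothetical, invented for illustration, and do not correspond to real CPU models:

```python
# Hypothetical turbo tables: maximum frequency (GHz) keyed by the
# number of active cores. Values are illustrative, not real SKUs.
CPU_A = {"nominal": 3.0, "turbo": {1: 3.9, 2: 3.8, 4: 3.6, 8: 3.3}}
CPU_B = {"nominal": 3.2, "turbo": {1: 3.5, 2: 3.5, 4: 3.4, 8: 3.3}}

def operating_freq(cpu: dict, active_cores: int) -> float:
    """Frequency the CPU can sustain with active_cores busy,
    per its (hypothetical) turbo table; falls back to the
    nominal frequency when more cores are active than listed."""
    eligible = [f for n, f in cpu["turbo"].items() if n >= active_cores]
    return max(eligible) if eligible else cpu["nominal"]

# Interactive workload (one busy core): CPU_A wins at 3.9 GHz vs.
# 3.5 GHz, despite its lower nominal frequency of 3.0 GHz.
print(operating_freq(CPU_A, 1), operating_freq(CPU_B, 1))
```

The spec-sheet comparison (3.2 GHz vs. 3.0 GHz nominal) favors CPU_B, but for an interactive, lightly threaded workload the single-core Turbo bin is what the user actually experiences.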