Allocate Your Workstation Budget According to Your Workload
4 Sep, 2014 | By: Alex Shows
Uncertain about how best to spend your hardware dollars? A careful evaluation of the type of work to be done — and these guidelines — will provide the answers.
The most important criterion for configuring the right workstation is knowing how it will be used. The intended purpose determines which components are critical to performance and which are optional or unnecessary. In addition, the more you know about how the workstation will be used, the more performance you’ll be able to achieve per dollar spent. Start by identifying the various modes of use, then weigh the importance and frequency of those tasks; that way, you can more effectively determine the right workstation for the job.
Computational vs. Interactive
You should begin by considering the type of work to be done on the workstation, sorting the major tasks into two categories: computational or interactive. Computational tasks involve little user interaction and are characterized by high utilization of all available resources in an automated sequence. Rendering frames of video, integrated finite element analysis, motion simulation, and computing the downforce of a new racecar spoiler design are all examples of computational workloads. Interactive workloads, in contrast, involve heavy user interaction and are characterized by sporadic peaks of high utilization separated by idle periods where the user is thinking about the next interaction. Viewing and rotating an engine model, annotating the HVAC routing through a multi-story building, and animating a complex rigged model in a 3D modeling program are all examples of interactive workloads.
Dividing the usage model into computational and interactive buckets helps to determine the necessity of components such as dual socket support and the number of memory channels populated, as well as the importance of particular attributes of those components, such as peak possible central processing unit (CPU) frequency. For purely computational workloads, multi-socket platforms can provide great performance improvements by reducing the amount of time a task requires, so long as the software processing the work is able to scale in performance as processor count increases. If the application does not scale across the available processors, either due to architectural or licensing limitation, the additional cost and complexity of the second socket may not be justified.
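The scaling caveat above can be made concrete with Amdahl's law: if only a fraction of a job can run in parallel, extra cores (or a second socket) have a hard ceiling on the speedup they can deliver. The sketch below uses illustrative parallel fractions, not measurements from any particular application:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a job can use extra cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# A rendering job that is 95% parallel keeps scaling onto a second socket...
print(round(amdahl_speedup(0.95, 16), 1))   # single socket, 16 cores
print(round(amdahl_speedup(0.95, 32), 1))   # dual socket, 32 cores

# ...but a 50%-parallel workload barely benefits from the extra cores,
# so the second socket's cost would not be justified.
print(round(amdahl_speedup(0.50, 16), 1))
print(round(amdahl_speedup(0.50, 32), 1))
```

Running the numbers this way makes it easy to see when the second socket pays for itself and when it does not.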
A CAD user, for example, who spends his or her time editing and annotating a design on a workstation, and then submits simulation or rendering jobs to a separate system (perhaps in a data center), would benefit much less from a multisocket workstation than a user who spends most of his or her time simulating and analyzing designs. There is a wide variety of standalone and plugin-based simulation, rendering, and analysis tools available, in addition to those that may be integrated into a CAD suite. These tools, unlike interactive modeling, typically scale quite well across as many cores as are available, including those provided by additional populated sockets.
Similar to the question of a second CPU socket, some computational workloads may scale in performance by using the graphics processing unit (GPU, familiarly called a graphics card) as a computational resource. To understand the differences between CPU and GPU, it may help to think of the GPU as a dragster and the CPU as a rally car. Given a set of data (fuel), and a straight track (predictable, repeated instructions), the GPU is incredibly fast in a straight line. On the other hand, the rally car has a navigator inside that is like the CPU’s branch prediction algorithm, providing hints to the driver about the turns coming up and how best to negotiate them, while the driver is adept at quickly responding to road conditions around a highly complex track. Many computationally intensive applications are improving performance through the use of the GPU. Thus it’s important to determine whether your application can make use of the GPU, and what type of GPU might be required.
When choosing a CPU, first think about how much time will be devoted to computational workloads, where all available cores will be driven for long durations at high utilization. The more time spent in these usage types, the more of the workstation budget should be spent on maximizing core count. Begin by maximizing core count in a single socket, while considering budgetary requirements for other components. If more computational performance is desired or made possible by savings on other components, move to a platform with dual CPU sockets to further increase computational performance.
Avoid reaching first for a dual-CPU-socket platform as the way to maximize computational performance. While these platforms do provide the best computational performance, the nature of multi-socket architectures imposes a slight performance penalty. That penalty affects interactive usage models by slightly reducing the frame rates generated by the graphics card. (See the next section for more on graphics performance.)
While it is best to maximize core counts for computational workloads, interactive usage models require the highest CPU frequency available. This is because interactivity (as measured by frames per second) is often limited by the efficiency of a single core to feed the GPU with instructions and data. Most modern graphics programming interfaces can only feed data and instructions to the GPU using a single thread, and despite the GPU driver being multithreaded, performance benefits with increasing core count are negligible beyond four. Thus, the greater the amount of time spent in interactive usage models, the greater the portion of the workstation budget that should be allotted to increasing the maximum CPU frequency.
Most Intel CPUs available in Precision workstations support a feature called Turbo. When operating in Turbo mode, a CPU adjusts its frequency based on the workload distributed across its cores. When fewer cores are busy, the CPU runs at a higher frequency. The highest Turbo frequencies are possible when only a single core is active. The lowest Turbo frequencies are used when many, or even all, cores are active. This dynamic clocking allows interactive workloads to operate at peak Turbo frequencies, while computational workloads still operate above the nominal frequency of the CPU. This is important because comparing the nominal frequency of two CPUs (or their rated frequency, which is commonly quoted alongside the model name) isn’t always representative of the frequency at which they will be operating the majority of the time.
To more precisely compare CPUs when deciding which to choose, you should compare the low-frequency mode (LFM), high-frequency mode (HFM), minimum Turbo frequency (all cores loaded), and maximum Turbo frequency (one core loaded). LFM is important to compare if you want more power efficiency at idle. If the CPU isn’t doing any work, how important is it that the CPU consumes as little power as possible? HFM is important to compare if the CPU doesn’t support Turbo. Minimum Turbo frequency is important to compare if the CPU will spend most of its time running computational workloads. And finally, the maximum Turbo frequency is important to compare if the CPU will spend most of its time running interactive workloads.
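One way to make that comparison concrete is to weight each candidate's all-core (minimum) and single-core (maximum) Turbo frequencies by the share of time you expect to spend in computational versus interactive work. The figures below are hypothetical, not real CPU specifications:

```python
# Hypothetical Turbo frequencies in GHz (illustrative only, not real CPU specs).
cpus = {
    "CPU A": {"min_turbo": 3.0, "max_turbo": 3.9},  # all-core / single-core Turbo
    "CPU B": {"min_turbo": 3.3, "max_turbo": 3.6},
}

def effective_ghz(cpu: dict, computational_share: float) -> float:
    """Weight all-core Turbo by compute time, single-core Turbo by interactive time."""
    interactive_share = 1.0 - computational_share
    return (cpu["min_turbo"] * computational_share
            + cpu["max_turbo"] * interactive_share)

# A mostly interactive user (20% compute) favors the higher single-core Turbo...
for name, cpu in cpus.items():
    print(name, round(effective_ghz(cpu, 0.2), 2))

# ...while a mostly computational user (80% compute) favors the higher all-core Turbo.
for name, cpu in cpus.items():
    print(name, round(effective_ghz(cpu, 0.8), 2))
```

The same two CPUs can rank in opposite order depending on the workload mix, which is exactly why the usage model has to come first.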
When deciding whether to maximize CPU core count, maximum frequency, or find a compromise between the two, one should always seek the latest CPU microarchitecture and generation. Newer CPU generations typically come with either a process shrink (smaller transistors) or a new architecture. Newer architectures often bring greater performance at the same frequency, and the benefits of this extend beyond just CPU performance. Because many applications spend time waiting on a single core to feed the GPU instructions and data, as the CPU’s integer performance increases, so does the graphics performance. This means that the same frequency CPU, on a newer-generation architecture, can provide higher frames per second with the same graphics card!
Graphics Performance
In general, when it comes to graphics cards (GPUs), the more you spend, the more speed you can buy. Speed in graphics is most commonly associated with real-time rendering performance, as measured in frames per second. The higher your frames per second in an application, the more fluid your interactions with the data model, and the more productive you can be. Computational capabilities aside, finding the right graphics solution for a workstation depends on the desired frames per second in the applications of greatest interest.
A good rule of thumb for graphics performance is to look for a card that is capable of delivering more than 30 frames per second in the most important applications, using data models and rendering modes most like those in your day-to-day use. While the persistence-of-vision phenomenon suggests that roughly 25 frames per second is the minimum for smooth animation, more is always better. If a particular graphics card is able to deliver more than 100 frames per second in a particular rendering method using a specific model size and type, it is reasonable to assume that you can increase the complexity and/or size of the model and still be able to interact with that model without observable stuttering.
SPECviewperf is an excellent benchmark for comparing workstation graphics cards because it measures the frames per second of several varied workloads using rendering methods that mirror those of popular workstation applications. Anyone can view the detailed frames-per-second measurements of several different methods of rendering and compare graphics card performance based on published results, as well as see representative screen captures of the image quality of these methods. If one were a user of PTC Creo, for example, one could use this data to compare how one card performs versus another, not just in Creo, but specifically with a data model and rendering mode that most closely represent a particular use of Creo.
Thus when considering graphics, weigh the amount of time in a typical day that the workstation will spend in either highly interactive work, or in computational work that utilizes the GPU. The more time spent in these usage types, the greater the portion of the workstation budget that should be spent on graphics.
The Importance of Memory
It has been said that you can never have too much random-access memory (RAM). While that adage may be true for modern multicore systems running massively multithreaded applications, it is still very important to weigh other factors when considering which type of memory to include in the workstation. For computational workloads, you’ll almost always want to maximize the amount of memory bandwidth available to the processing cores. Thus if given the choice about whether to populate eight DIMMs of an 8 GB capacity each, or four DIMMs of a 16 GB capacity each, choose the option that populates more DIMM slots. The increase in available memory bandwidth will reduce the likelihood that memory bandwidth is the bottleneck to computational workloads, shifting the computational burden back to the CPU cores, frequency, and cache.
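The reasoning behind "populate more DIMM slots" can be sketched with simple arithmetic: theoretical peak bandwidth is the number of populated memory channels times the transfer rate times 8 bytes per 64-bit transfer. This sketch assumes each additional DIMM lands in its own channel, as it would on a dual-socket board with four channels per socket:

```python
def peak_bandwidth_gbs(channels: int, transfer_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x 8 bytes per transfer."""
    bytes_per_transfer = 8  # one 64-bit transfer per channel per cycle
    return channels * transfer_rate_mts * bytes_per_transfer / 1000.0

# DDR3-1866 on a dual-socket board: four DIMMs may populate only four channels...
print(peak_bandwidth_gbs(4, 1866))   # GB/s
# ...while eight DIMMs can populate all eight channels, doubling the peak.
print(peak_bandwidth_gbs(8, 1866))
```

Actual sustained bandwidth will be lower than these theoretical peaks, but the ratio between the two configurations holds, which is the point of spreading capacity across more slots.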
Choosing the right memory frequency is also important, and the best choice varies with the workload. In applications requiring maximum memory bandwidth, populate all available DIMM slots with the highest-frequency memory. However, some applications require the lowest latency possible, irrespective of available bandwidth, and in that case you would want to populate all available DIMM slots with lower-frequency memory. An example is random access across a dataset too large to fit in the CPU cache, where the CPU has no way to predict which memory location will be accessed next. While memory bandwidth remains important, the lower latency of the slower memory speed can benefit these random reads and writes.
Lastly, when the integrity of the data used in individual computations is paramount to the end result, error-correcting code (ECC) memory should be used. ECC memory stores extra check bits computed from the data; using them, the memory controller can not only detect that stored data has been corrupted, but also correct single-bit errors. This is especially important when iterating across a large dataset where the outputs of one sequence of computations are continually fed as inputs into the next, because a single undetected error early on can have a dramatic impact on the final outcome.
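The detect-and-correct idea can be illustrated with a toy Hamming(7,4) code, which protects 4 data bits with 3 check bits. Real ECC DIMMs use a wider SECDED code over 64-bit words, but the principle is the same: the pattern of failed checks pinpoints which bit flipped, so it can be repaired.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code: list[int]) -> list[int]:
    """Detect a single flipped bit and return the repaired codeword."""
    c = code[:]
    p1, p2, d1, p3, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flip; 0 means clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return c

word = hamming74_encode([1, 0, 1, 1])
corrupted = word[:]
corrupted[4] ^= 1                     # simulate a single-bit memory error
assert hamming74_correct(corrupted) == word
```

Without the check bits, that flipped bit would silently poison every downstream computation that consumed it, which is precisely the iterative-workload risk described above.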
Storage for Various Use Cases
A variety of storage performance considerations depend completely on the usage model. For instance, is the data stored on the network or stored locally on the workstation? If the former, how frequently are updates committed to the network resource? If the latter, how much capacity is required locally? Is redundancy required on the local storage? All of these factors are important to determining the right storage components for the workstation, and due to this complexity the subject deserves much greater attention than given here.
For simplicity, we’ll assume the data is stored locally on the workstation and not concern ourselves with network bandwidth, frequency of updates, or check-in/check-out procedures. A single user of the workstation will have a blend of three common local storage use cases:
- “Office Productivity” — reading and writing small files with occasional large file transfers (common activities for a project manager, design reviewer, or approver).
- “Interactive Workstation” — opening and saving a wide variety of file sizes (frequent tasks for a 2D or 3D drafter).
- “Computational Workstation” — iterating across very large sets of data, often generating large temporary files (such as an engineer using 3D rendering, simulation, and analysis packages).
Optimizing for the Office Productivity use case is usually as simple as weighing anticipated capacity needs against the highest-performing drive class within the budget. While rotational drives have traditionally dominated this segment, in recent years the decreasing cost of multilevel cell (MLC) flash and controllers has brought solid-state drives (SSDs) and hybrid drives within reach of more users. In general, for this use case, hybrids provide the best price-to-performance ratio, while SSDs provide the best outright performance. Hybrids keep the most commonly used data in a flash cache, which is much faster to access than the drive's rotating media, and as long as the working files are relatively small, they will fit nicely in that cache.
The Interactive Workstation usage model requires greater performance, and this is where SSDs, serial-attached SCSI (SAS) drives, and redundant array of independent disks (RAID) configurations begin to play a more important role. If a single SSD meets the combined capacity needs of your office productivity and interactive workstation usages, it will be the best-performing option short of a multidrive RAID 0 array.
RAID arrays enable the creation of a large virtual drive that spans two or more physical (or logical) drives. Depending on the RAID type, features such as redundancy (keeping more than one copy of the data simultaneously) and greater performance become possible. If redundancy is at least as important as performance, an array such as RAID 1, 10, or 5 is the better choice; the decision then comes down to which matching drives are available to build the array. Moving to a RAID array can increase storage costs considerably, making it prohibitive to fill the array with high-performing drives. One way to mitigate this cost while maintaining high performance is to use an SSD boot drive to host the operating system and applications, while building the RAID array out of lower-cost rotational drives.
For the Computational Workstation usage model, where significantly large datasets are the norm, a RAID array may be the only practical option. By combining several smaller-capacity drives into a single large volume, the application can use all of that capacity as if it were one large drive. Multiple drives in RAID 0 maximize performance and capacity, but provide no redundancy. Drives in RAID 1 provide redundancy, but maximize neither performance nor capacity. RAID 10 increases performance and capacity and adds redundancy, but is the most costly in terms of the number of drives required.
Between RAID 0 and RAID 10 sits RAID 5, which increases performance and capacity and adds redundancy with fewer drives than RAID 10, but requires more overhead to manage the array because parity data must be computed and distributed across the drives. When weighing whether to add a fourth drive to an integrated storage controller to create a RAID 10, consider instead upgrading to a discrete RAID controller with onboard memory and moving to RAID 5. You're likely to see higher usable capacity, and a discrete RAID controller may also deliver higher performance in office productivity and interactive workstation usages, not to mention computational usage types.
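The capacity side of that trade-off is simple arithmetic. A minimal sketch, assuming identical drives and the textbook definitions of each RAID level:

```python
def raid_usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity for common RAID levels, assuming identical drives."""
    if level == "0":                  # striping: full capacity, no redundancy
        return drives * drive_tb
    if level == "1":                  # two-drive mirror: capacity of one drive
        assert drives == 2
        return drive_tb
    if level == "5":                  # distributed parity: lose one drive's capacity
        assert drives >= 3
        return (drives - 1) * drive_tb
    if level == "10":                 # striped mirrors: half the raw capacity
        assert drives >= 4 and drives % 2 == 0
        return drives * drive_tb / 2
    raise ValueError(f"unsupported level: {level}")

# Four 2 TB drives, the scenario discussed above:
print(raid_usable_tb("10", 4, 2.0))  # striped mirrors
print(raid_usable_tb("5", 4, 2.0))   # parity array: more usable space
```

With four 2 TB drives, RAID 5 yields 6 TB usable against RAID 10's 4 TB, which is the capacity advantage the paragraph above refers to.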
The Advantage of Application Certification
One of the key differentiators of professional workstations, as compared with conventional PCs, is platform certification for specific professional applications. Considering the complexity of the software environment, one can imagine the incredible number of possible combinations of operating system versions, application versions, hardware, firmware, and driver revisions. All of these variables can affect application stability and performance. Workstation certification addresses this by being prescriptive about the components and revisions tested and found to be compatible with the application. This mitigates the risk of purchasing a new workstation or upgrading an existing one, as users can be confident before buying that the workstation in question has been certified by the software vendor of their desired application.
Putting It All Together
To find the right workstation configuration, first identify the primary usage model, along with any secondary or tertiary usage models.
For interactive usage models, focus on maximizing CPU frequency, followed by the class of graphics. For individual CPU models, choose the latest architecture and look primarily at the peak Turbo frequency. Of the available CPU models, determine the best frequency for the price. Then look to graphics and compare frames per second using industry-standard benchmarks such as SPECviewperf, focusing on the applications and/or rendering modes that are most important to your usage. Judge which of the available GPU models offers the best frames per second for the price. Then look to memory and maximize memory bandwidth at the capacity desired. And finally, look to storage, where a single SSD might address all the interactive usage model needs, unless capacity or redundancy requires a RAID array, or spending limits dictate a single rotational disk drive.
For computational usage models, focus on maximizing core count, followed by CPU frequency. For individual CPU models, choose the latest architecture and look primarily at the lowest Turbo frequency (which reflects the lowest frequency the CPU will Turbo up to under heavy load). Look for the best core count per dollar, and if the workstation will spend more than half of its life in computational work, consider upgrading to a dual-socket workstation. If the application supports GPU compute, consider upgrading the GPU to models with more compute cores, as the performance per dollar in GPU upgrades will often be higher than the CPU (here again based on the percentage increase in core count). Upgrade memory by populating as many slots as possible first; except for a limited set of applications which are highly sensitive to latency, it is always best to upgrade to the fastest memory speed for the maximum possible computational throughput. Finally, consider the storage requirements primarily in terms of capacity and bandwidth required by the application, which is often much larger and higher than with other usage models.
Optimal performance for a particular usage model can be achieved by identifying the factors that are most important to that application: those having the highest impact on performance. Combining those selections in a workstation certified for operation with the key applications of that usage model will ensure that the user has the best experience possible.