In the literal sense, the end of Moore’s Law — which has long and accurately described the relentless downscaling of transistor area and cost on silicon — is imminent. But pushing only on the literal meaning of Moore’s Law, rather than its spirit, is a fool’s errand: even after its end, technology will continue to advance both performance and price-performance, the true end goals of what Moore’s Law delivered for decades.
Now on the surface, CAD professionals shouldn’t necessarily care whether Moore’s Law continues or not. But they will care a lot if the industry does not continue to find ways to deliver the generation-to-generation improvements in performance and price-performance that Moore’s Law so elegantly dictated over the past five decades. And really, given their stature as some of the most demanding computing professionals around, they will ultimately be affected as much as anyone should the industry fail to keep up that pace.
The good news is that Moore’s Law or not, the innovative powers of vendors are showing promise to keep progress moving forward, both in the short term and the long. One key component in the short term (at least) is chiplet technology, an approach that leverages the best of silicon at any generation to create higher-performing and better-balanced systems for the most compute-hungry users.
Moore’s Law Was Never the True End
The semantic details are often debated, but Moore’s Law essentially states that transistor counts (per unit area and cost) double around every two years. It began as an observation based on the early progression of silicon fabrication technology and has extended to today’s CMOS (complementary metal-oxide-semiconductor) processes, the technology responsible for virtually all digital ICs (integrated circuits) made for the past 25 years or more. What the law implies — and is often equated with — is the doubling of performance that often comes from the doubling of transistors. The problem, one the industry has seen coming for some time, is that continuing to shrink dimensions at that pace will eventually hit a limit — if not by exacerbating problems like current leakage and thermal dissipation, then eventually by bumping up against quantum effects and atomic dimensions. Opinions vary on how many economically viable, denser process nodes are left (each of which can accommodate many more transistors in the same silicon area), and how long it will take to leverage them. But regardless, vendors accept the coming end of Moore’s Law, at least measured in the context of conventional, ubiquitous CMOS technology.
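To put that cadence in rough numbers, the short Python sketch below compounds an assumed starting density through several doubling periods; the starting value is purely illustrative and not tied to any particular process node.

```python
# A minimal sketch of Moore's Law-style compounding: transistor density
# doubling roughly every two years. The starting density is an
# illustrative assumption, not a figure from any specific process node.

start_density = 10.0            # assumed: millions of transistors per mm^2
doubling_period_years = 2.0

for years in (2, 4, 6, 10):
    density = start_density * 2 ** (years / doubling_period_years)
    print(f"After {years:2d} years: ~{density:6.1f}M transistors/mm^2 "
          f"({density / start_density:.0f}x the starting density)")
```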
However, too often we dwell on what Moore’s Law is or isn’t — the specific semantics and if it’s alive or dead — rather than the far more important aspect: what it has enabled. Its ability to double performance at the same cost as the previous generation, or to cut cost in half for the same performance — a property no other industry can match — has directly or indirectly given rise to virtually every major technological achievement of the past half-century. Most certainly, it has enabled modern CAD, for without the advances in performance and performance-per-dollar, designers, engineers, and architects wouldn’t have their mission-critical computing tools. Ultimately, the end goal should never be about keeping Moore’s Law alive, but about finding ways to deliver in the future what Moore’s Law delivered in the past. Given that, the industry is re-positioning itself to establish new paths forward to keep a Moore’s Law-ish progression in place — that is, to continue to grow geometrically the value of what the next generation of products and technology can offer.
To do so, vendors are opening up new fronts of attack, considering just about anything (or at least they should be), from relatively conventional evolutionary steps to downright radical departures. Ideas along the evolutionary lines look to advance based on our current ecosystem of technologies, materials, and development tools, both hardware and software. What “radical” implies is just about anything else, such as quantum computing, which breaks binary limits and by its nature opens up geometric growth potential via qubit (quantum bit) processing. Though few are likely to bet the farm on it quite yet, quantum computing holds enough promise to justify substantial investment from industry and academia alike. It’s not clear which paths are the most viable, or whether some are much more than theory. Despite its promise, quantum computing, for example, has some very difficult challenges to overcome — most notably its thermal sensitivity, which currently requires operation at temperatures just a tad north of absolute zero. And regardless of whether quantum computing eventually evolves into a primary axis for extending computational scaling long term, it’s most certainly not an answer for the short term.
No, in the valley between the peak of CMOS scaling and the next technology peak beyond, the industry needs a compelling answer to bridge forward, and that answer needs to be much more evolutionary and conventional, one compatible with today’s CMOS-driven design and development infrastructure. The industry is exploring multiple conventional paths, and the answer is shaping up to be not just one tool, but a box full of them that, in aggregate, can help digital systems like CAD workstations take meaningful steps forward in performance and price-performance, generation to generation.
Chiplet Architectures Pack More Computing Power into the Same Space
While the industry has managed incredible advancements on the back of Moore’s Law, it doesn’t have to stagnate along with it. For the innovative and adventurous — of which the industry counts many — there are other avenues to explore. One coming to the fore currently is chiplet scaling, an approach being aggressively pursued by both premier providers of high-performance CPUs: Intel and AMD. The chiplet proposition is straightforward: if you can’t stuff more transistors onto a monolithic piece of silicon (or don’t want to, for economic reasons), the other option is to stuff more silicon into the same system area or volume. So rather than simply brute-forcing geometrically higher core counts on the same monolithic die, a chiplet approach packages multiple chiplets into a more compact size, consuming less physical area on the circuit board that ultimately goes into the computer. That is, if you’re struggling to pack more transistors into the same silicon area, why not change tack and try to pack more silicon chips into the same circuit board area? Though the means differs from on-silicon scaling, the result is similar: more computing power in the same space (assuming proper thermal management, of course).
The preferred way to do that and maintain high performance levels is to use multi-chip packaging. It’s a technique that AMD and Intel are both putting into overdrive, in part to battle the slowing pace of Moore’s Law. It addresses the shorter-term needs, as multi-chip packaging in its general sense is not particularly new and doesn’t require revolutionary thinking or technologies. But the manner in which AMD and Intel are investing in and extending multi-chip packaging is novel, and delivers scaling advantages to help bridge the gap opened up by the demise of Moore’s Law.
Intel’s Foveros. Intel has been attacking the multi-chip packaging front for years, and recently unveiled Foveros, its most significant and ambitious packaging technology to date. Now being implemented for the first time in a CPU-class product, Foveros pushes beyond conventional interposers and stacked memory to allow clever stacking of two high-performance logic chips, such as CPUs, GPUs, and accelerators (e.g., for compute or AI).
A multi-chiplet Foveros 3D package. Image source: Intel.
Foveros goes a step beyond existing multi-chip packaging by stacking two digital logic “chiplets” on top of each other in a scheme that can genuinely be called 3D stacking (and existing memory-stacking technologies can be employed to build up the stack height further). Intel previewed Foveros a while back, combining 10-nm processor/logic chiplets with a 22-nm base die and memory in a remarkably small 12 × 12 × 1 mm package that draws only 2 mW of standby power. Just this summer, Intel unveiled the first product incarnation in the mobile-focused Lakefield. With Foveros, Intel can argue that it can double effective transistors per unit of surface area, and thereby deliver density gains comparable to what Moore’s Law would have achieved in a monolithic die over two years, with a negligible gain in thickness.
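As a back-of-the-envelope illustration of that density argument, the sketch below assumes a per-die transistor density (only the package footprint comes from Intel’s preview) and shows how stacking a second logic die in the same footprint doubles the effective transistors available per unit of board area.

```python
# Rough illustration of why 3D stacking boosts effective density:
# two logic dies in one 12 x 12 mm footprint offer twice the transistors
# per mm^2 of board area that a single die would. The per-die density
# here is an assumed figure for illustration only.

footprint_mm2 = 12 * 12        # package footprint from the Foveros preview
per_die_density = 50.0         # assumed: millions of transistors per mm^2 per die

for stacked_logic_dies in (1, 2):
    effective_density = per_die_density * stacked_logic_dies
    package_total = effective_density * footprint_mm2
    print(f"{stacked_logic_dies} logic die(s): ~{effective_density:.0f}M transistors "
          f"per mm^2 of footprint, ~{package_total:,.0f}M in the package")
```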
Lakefield: Intel’s first Foveros-based product. Image source: Intel.
During the first half of 2020, it became obvious to people in every business that expectations for the near future would have to be seriously adjusted. Pandemics will do that.
The 2020 CAD Report from Jon Peddie Research was completed at the end of 2019. At that time — which seems so long ago — we were cautious about the prospects for 2020 and beyond. We would love to claim that we had seen the pandemic coming, but back then, we thought of viral epidemics as a seasonal challenge centered primarily in Asia.
Market share in CAD has remained relatively stable over the past ten years, but that’s a deceptive measure, because each company is carving out specific areas of expertise so that each can run the race in its own lane. There has been enormous change over the past decade, and it is serving these companies well in a recession.
Instead, we were more concerned that desperately needed infrastructure projects had been stalled for years in the US, and worldwide global manufacturing had taken a downturn due to trade issues and fracturing trade alliances. In addition, the threat from climate change loomed large over every human activity.
CAD Industry Undergoes Major Changes, with AEC at the Forefront
The AEC industry was the success story of 2018–2019, thanks to the growing acceptance of digital technologies, especially building information modeling (BIM), as well as modularization and on-site fabrication. Specifically, the construction industry re-emerged as a huge opportunity as decades of hidebound traditional practices began to modernize and reveal long pent-up demand.
The CAD industry as a whole has restructured for resiliency. It has transitioned to subscription, which offers a buffer for short-term shocks, and there is plenty of room for expansion in the steady adoption of digital twin approaches that connect designs, data, analysis, and documentation to the real-world objects they represent, including autonomous vehicles, smart cities, airplanes, power plants, industrial machines, and mobile phones.
So far, the plan has been working: CAD company revenues have been stable through the first half of 2020. But most companies are guiding down for the rest of the year as hopes fade for a fast end to the pandemic and a fast recovery. Instead, the crisis is evolving, and we’re seeing rolling outbreaks worldwide. We know now that recovery is going to take some time.
What’s also true is that in our modern age, every period of recession has been accompanied by innovation and transformation.
Room to play: The major players in the CAD market have their own areas of influence, but the digitization of industry means there is quite a bit of overlap. The successful companies in the coming decade will be those most able to make interoperability easy for customers.
As we said, the AEC industry has been ahead of the curve. Driven by a shortage of skilled construction workers and by the affordable-housing challenges seen worldwide, the construction industry has been at the forefront of digital transformation. Autodesk, Bentley, and Trimble have all been investing in new tools for construction. In 2018, Autodesk began the acquisition process for Assemble, BuildingConnected, and PlanGrid to build on BIM 360 and create the Construction Cloud; Trimble has bought Viewpoint and e-Builder; and Bentley has bought Synchro and, most recently, NoteVault. All these tools enable much better transparency into construction costs and work progress.
Continued demand for connected construction software is pretty much guaranteed. The industry has long relied on armies of low-cost workers working side by side. That was always inefficient, dangerous, and often immoral; now it’s becoming impossible. Instead, construction companies are adapting techniques from manufacturing: They’re moving to prefabricated modules that can be built elsewhere and delivered to the job site, to more automation and fabrication on-site, and to direct communication with on-site workers who can interact with more-informative 3D models.
The companies building software for mechanical design and manufacture have led the industry in digitalization for decades. They are marching forward into a new age of digital twins, which are now coming much closer to being a reality rather than an ambitious vision. For example, Siemens has defined the digital twin as a single model that grows and develops along with the development and eventual deployment of the real-world project.
In process and power, we’ve seen Siemens and Bentley forge a powerful alliance that builds on the strengths of both companies, and not coincidentally fuels continued whispers that Siemens may buy the privately owned Bentley Systems. The rumor of a Bentley–Siemens merger has been countered by Bentley’s cyclical announcements of an imminent IPO. (The one thing that possible IPO has never been is imminent.)
Editor's Note: This is the second half of a two-part article; see Part 1, "Boxx Expands into Remote Workstations with Help from Cirrascale."
Boxx Technologies’ acquisition of Cirrascale and subsequent launch of Boxx Cloud Services was not only prescient in its timing, but also unique in what it brought to the rapidly expanding cloud computing ecosystem. As explored in the first part of this article, Boxx Cloud Services is one of the most recent providers of remote desktop hosting solutions — a launch that dovetailed with the world’s urgently renewed interest in remote computing, triggered by the COVID-19 crisis.
While it’s not the first to the cloud computing party, its offerings are anything but copies of the hosted desktops available from names like Amazon Web Services and Microsoft Azure. Boxx Cloud Services’ for-rent workstations are one-to-one, dedicated hosted machines — not just comparable to Boxx’s traditional deskbound machines, but identical. Forget the slower-clocked, server-optimized CPUs and the shared memory, storage, and GPU resources of a virtualized cloud platform; Boxx Cloud Services workstations would represent the top end in performance (including overclocked CPUs) were they packaged and sold as deskside towers.
Verifying the Premise of Identical Performance
Now, while it’s theoretically solid to argue that the system throughput of the remote machine should essentially match that of an identically configured local machine, I (with Boxx’s help) went ahead and benchmarked anyway. We ran SPECwpc 3.0.4’s Product Development (focusing on common CAD compute and visual workloads), General Operations, and GPU Compute test suites. The results supported the theory, no surprise: five composite results for workloads stressing CPU, graphics, storage, and GPU compute showed tight tracking between the systems.
Differences were extremely small — in the noise — with the exception of 3D graphics performance. Overall, graphics ran about 5% slower on the remote machine, a result with an understandable explanation: PCoIP does chew up a bit of overhead, most notably in encoding the desktop screen as a video stream for return transmission. By default, the PCoIP software has to both perform that encoding in software and interrupt the GPU’s graphics processing to fetch frames from video memory, a combination that could logically account for a 5% hit. The good news is that PCoIP now also supports hardware video encoding on Nvidia RTX–class GPUs, further offloading the CPU and reducing that penalty (though this is a remedy I did not test).
No surprise, the same workstation produces essentially the same throughput (when tested with the SPECwpc 3.0.4 benchmark), no matter where it is.
Network — Perhaps Especially Latency — Is the Most Important Performance Consideration
Using SPECwpc to test that two essentially identical machines deliver the same throughput is not a particularly interesting exercise or revealing comparison (with the exception of quantifying that modest and explainable graphics performance penalty). We’re talking about the same Boxx model, just with that machine next to your desk in one case and in a rack somewhere else in the other. Rather, when we’re comparing using a local workstation under the desk to using the same machine located in a remote datacenter, we need to consider how well the network — both the local-area and wide-area networks (LAN and WAN) between you and the remote workstation — can support the display of your desktop screen and interactivity. Essentially, that comes down to bandwidth and latency. With respect to bandwidth, the network will be burdened with transporting at least one (and more likely, two or three) FullHD-resolution (again, at least) encoded streams from datacenter to client. Thankfully, with the robust improvement in the available bandwidth of mainstream LAN technologies and WAN providers, bandwidth is arguably the lesser concern, as modern Internet access will more than likely suffice in the vast majority of small business and home offices.
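For a rough sense of scale, the sketch below compares the load of one to three encoded FullHD streams against a typical broadband downlink; the per-stream bitrate and downlink speed are assumptions chosen for illustration, not Teradici or Boxx specifications.

```python
# Back-of-the-envelope bandwidth check for remote-desktop display streams.
# The per-stream bitrate and downlink speed are assumptions for
# illustration; actual PCoIP usage varies with content and settings.

mbps_per_fullhd_stream = 10.0   # assumed average for an active 1920x1080 stream
downlink_mbps = 100.0           # assumed small-office/home broadband downlink

for displays in (1, 2, 3):
    needed = displays * mbps_per_fullhd_stream
    print(f"{displays} display(s): ~{needed:.0f} Mbps, "
          f"about {needed / downlink_mbps:.0%} of a {downlink_mbps:.0f}-Mbps connection")
```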
Often, the more worthy consideration is not a network connection’s available bandwidth but its latency, which is the thing that can turn an otherwise pleasant interactive computing session into an irritating struggle. Ultimately, what matters in each specific user’s environment is the round-trip time (RTT), manifested in the delay between making a request and seeing the result of that request appear on your local screen. For example, when you click the mouse to change the model view, all of the following occurs: your local client processes the input with PCoIP (in this case) and transmits it over your LAN, through your router, onto the WAN, and eventually across the datacenter LAN to your allocated workstation. Then, for the return trip, the remote machine processes the request (exactly as it would for a mouse click on your deskside machine), creates the updated graphical view, uses the PCoIP software to encode the updated screen, and transmits it through the datacenter LAN to its router, back over the WAN to your router, then across your LAN to your client, whose PCoIP software decodes that desktop screen and displays it on your monitor. It sounds like a lot, but most of that happens in the blink of an eye on a local workstation — the incremental difference in the remote solution is roughly equal to the time spent crossing the entire network twice.
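To make the “crossing the network twice” point concrete, here is a simple latency-budget sketch; every number in it is an assumption chosen for illustration, not a measurement of Boxx’s service.

```python
# Illustrative round-trip latency budget for a single click on a remote
# workstation. All values are assumptions for illustration, not measurements.

budget_ms = {
    "client input capture + PCoIP send":   2,
    "LAN and router, outbound":            1,
    "WAN to datacenter (one way)":         30,
    "datacenter LAN to workstation":       1,
    "workstation processes and renders":   10,  # the same work a deskside machine would do
    "PCoIP encode of the updated screen":  8,
    "return trip: LAN + WAN + LAN":        32,
    "client decode and display":           6,
}

total_ms = sum(budget_ms.values())
network_ms = (budget_ms["LAN and router, outbound"]
              + budget_ms["WAN to datacenter (one way)"]
              + budget_ms["datacenter LAN to workstation"]
              + budget_ms["return trip: LAN + WAN + LAN"])
print(f"Estimated click-to-update time: ~{total_ms} ms, "
      f"of which ~{network_ms} ms is network transit")
```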
Dragging a window around the hosted desktop very rapidly — at a rate that would stress the responsiveness of the system, albeit far beyond any speed I would use in normal work — did reveal a noticeable lag between the mouse location and the window location. But while noticeable, it was certainly not irritating. Other subjective tests I used to stress the interactive round-trip response, like fast zooming in and out of a Google map and scrolling on a web page, showed a lag I could notice, but just barely. In the context of CAD, a more difficult test of response might be very rapid and continuous pan-and-zoom of models.
So yes, chances are that even with the 70-millisecond (ms) latency, you can create interactive sequences that make a remote solution noticeably less responsive than the deskside machine. But that leads to two questions: How often are you engaging in that worse- or worst-case behavior, like extremely rapid and continuous pan-and-zoom? And even if the lag is noticeable, is it annoying? If not, chances are that even if you can find perceptible lag when going out of your way, as I did with niche usage, it won’t translate to a negative overall experience.