The End of Moore’s Law. And The Coming Computing Renaissance.
How do we make progress on the exponential trajectories we have grown accustomed to when the underlying computing hardware performance has flatlined?

By Ramu Arunachalam

The computing industry is at a crossroads. Moore’s law, the engine that powered the computing revolution, has finally run out of steam.

Against this pessimistic backdrop, we are entering a period of great optimism as machine learning systems replace algorithms and human expertise. We are only scratching the surface of what machine learning can do, and the possibilities seem endless.

How do we make progress on the exponential trajectories we have grown accustomed to when the underlying computing hardware performance has flatlined? We have to start by breaking out of a computing model that dates back to 1945 — reassessing core assumptions behind the internal organization of a computer and rethinking the fundamental representations of code inside the machine, as algorithms are replaced by machine learning models.

The End of Moore's Law

Gordon Moore, a co-founder of Intel, observed in 1965 that integrated circuits were going to get cheaper, faster and smaller at an exponential rate (roughly 2x every 18 months in the commonly cited formulation). This phenomenon created the modern microprocessor (a complete computer on a single chip), which quickly became powerful enough to displace its larger rivals: first minicomputers, then mainframes and eventually supercomputers.

On the back of Moore’s law we came to think of computing hardware as a commodity. Almost like clockwork, the hardware got exponentially better, and with it the software.

No exponential force can last forever and today we have reached the limits of scaling Moore’s law. It was never a law but an observation that held up surprisingly well for 50 years, eventually succumbing to real physical laws that cannot be rewritten.

Moore’s law today looks like just another technology S-curve that has plateaued. The era of automatic improvements is over. We will have to find new creative ways to keep the progress going.

The Mixed Legacy of Moore's Law

One of the legacies of Moore’s law is that it created a dominant model for computing based on a general-purpose architecture first proposed by John von Neumann in 1945. Modern machines today are still based on the von Neumann architecture — a remarkable and enduring feat for an industry that is used to so much change and disruption.

There were many attempts to break out of the von Neumann mold. Cray Research pioneered vector-processing supercomputers that were well suited to numerically intensive applications such as weather forecasting and scientific simulation. Thinking Machines Corporation, a startup out of MIT, came to market with the Connection Machine, a computer based on a radical massively parallel architecture modeled after the human brain. Other notable attempts, such as dataflow architectures, never made it past academic projects.

All these ideas failed because of Moore’s law. Moore’s law put conventional microprocessors on a trajectory of exponential growth making it impossible for alternative designs to compete on equal footing. Even the most promising new designs quickly found their advantage erased as microprocessors got better at a faster rate. Great new technical designs were no match for the economics of mass production and commoditization.

By the mid-1990s, even the world’s fastest supercomputers were assembled from commodity microprocessors. This came to be known as “the attack of the killer micros” (a phrase popularized by Eugene Brooks of Lawrence Livermore National Laboratory).

Eventually, everyone just gave up and a whole generation of engineers came to believe in the infallibility of Moore’s law. Computer architects shifted their focus to micro-architecture optimizations, further reinforcing the status quo. Exploration of new ideas gave way to the exploitation of Moore’s law. No company exemplified this mindset more than Intel — the entire company on a mission to keep the Moore’s law gravy train running smoothly and on time.

The real tragedy of Moore’s law is that it bred a homogeneity in design, promoting inside-the-box thinking and rewarding good engineering more than great design.

With Moore’s law now history, we are back to exploration. How do we continue to make progress building ever faster and cheaper machines? A good starting point would be to study the failed computing designs of the past (of which there are many). Like the Renaissance artists who rediscovered the classics, the next generation of computer architects and programmers will have to draw inspiration from old ideas.

Nowhere is the need for a new architecture more apparent than in machine learning where the growing computational needs are exposing the limits of current microprocessors that aren’t getting any faster.

Machine Learning and its Unique Hardware Needs

Machine learning is increasingly replacing traditional algorithms. Companies like Google are well on their way to remaking themselves into machine learning factories, systematically transforming human-crafted code into machine learning based models. Jeff Dean, a highly respected early Google engineer, has gone as far as to say that if Google were built today, “much of it would not be coded but learned.”

As machine learning techniques get better, we may soon reach a point where we trust a machine learning based system more than human-crafted software. A few among us are already there!


Recent academic work by Tim Kraska at MIT shows that conventional data structures (B-trees, hash maps, etc.) can be rewritten as deep learning based models. These “learned” data structures can be as much as 70% faster and 10x more space-efficient than their traditional counterparts. This is an exciting area of research and provides a glimpse of things to come. It is entirely possible that in the near future core parts of a database or an operating system will be redesigned as machine learning systems.
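The core idea is simple enough to sketch in a few lines of Python: fit a model that predicts where a key sits in a sorted array, record the model's worst-case error, and then search only inside that error window. This is a hypothetical toy (a least-squares line standing in for the learned models in the research), not the actual system:

```python
import bisect
import numpy as np

# Hypothetical data: 100k sorted keys drawn roughly uniformly.
keys = np.sort(np.random.default_rng(1).uniform(0, 1_000_000, 100_000))
positions = np.arange(len(keys))

# "Train" the index: fit position ~ a * key + b, then record the
# worst-case prediction error so lookups stay exactly correct.
a, b = np.polyfit(keys, positions, 1)
predictions = np.rint(a * keys + b).astype(int)
max_err = int(np.max(np.abs(predictions - positions))) + 1

def lookup(key):
    """Predict the position, then binary-search only a small error window."""
    guess = int(a * key + b)
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    i = lo + bisect.bisect_left(keys[lo:hi], key)
    return i if i < len(keys) and keys[i] == key else None
```

For near-uniform keys the error window is a small fraction of the array, so the final search touches only a sliver of the data; the hard part the research tackles is keeping that window small for real-world key distributions, using hierarchies of small models.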

All this is challenging the traditional notion of software as code. Increasingly software may better be described as a set of machine learning models. Machine learning changes application execution into a numerical problem, the “algorithm” itself represented as a set of numerical weights. The act of learning is essentially a numerical optimization problem where the weights are adjusted based on new data; inference (deciding and acting) boils down to simple matrix multiplication.
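A toy example (hypothetical names and sizes, and a plain linear model rather than a deep network) makes the point concrete: the entire "program" is a weight matrix, inference is one matrix multiply, and learning is a loop of small numerical adjustments to the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4)) * 0.1   # the whole "program": a 3x4 weight matrix

def infer(x):
    # Inference: nothing more than a matrix multiply.
    return W @ x

def sgd_step(x, y_true, lr=0.01):
    # Learning: nudge the weights downhill on the squared error.
    global W
    err = infer(x) - y_true
    W -= lr * np.outer(err, x)          # gradient of 0.5 * ||W @ x - y||^2

x, y = np.ones(4), np.array([1.0, 2.0, 3.0])
before = float(np.sum((infer(x) - y) ** 2))
for _ in range(200):                    # "training": repeated numerical updates
    sgd_step(x, y)
after = float(np.sum((infer(x) - y) ** 2))
```

Nothing in this loop looks like branching, pointer-chasing code; it is arithmetic on arrays from start to finish, which is exactly why it rewards hardware built for arithmetic rather than for general-purpose control flow.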

If traditional software needs general-purpose microprocessors, machine learning systems need fast calculators that can multiply and add large sets of numbers in parallel. This key observation led Google to build custom hardware for machine learning that it calls the Tensor Processing Unit (TPU). The TPU looks nothing like a general-purpose microprocessor; it looks more like a limited but super-fast calculating engine. (Ironically, the earliest electronic computers were also specialized calculators. Another example of history coming full circle.)

The Google TPU is interesting on many levels:

  • First, it performs remarkably well. By Google’s own benchmarks, it runs machine learning inference 15-30x faster than contemporary CPUs and GPUs.
  • Second, Google has taken an agile development approach to the TPU. The first TPU was built in just 15 months by a team of only around 30 engineers. Since the first release in 2016, Google has released a new generation of TPU every year — an unprecedented pace for hardware. They can do this because the TPU has a very simple design: most of the chip is taken up by 64K multiply-accumulate units. The magic in the TPU is not the process technology that packs transistors onto the chip but a simple architectural design tailored for machine learning. In stark contrast, modern microprocessors require large teams, billions of dollars in R&D and manufacturing investment, and a release cadence of two or more years.
  • Third, architecturally the TPU borrows a number of old ideas from computer architecture, such as dataflow computing and systolic arrays, that go back to the 1970s and ’80s. What is old is becoming new again.
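A systolic array is simple enough to simulate in a few lines, which is part of its appeal: operands march through a grid of multiply-accumulate cells in lock-step, one hop per clock tick, with no instruction fetch or cache lookups in the loop. Below is a minimal cycle-by-cycle sketch of an output-stationary array (an illustrative model of the technique, not the TPU's actual design):

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an n x n output-stationary systolic array.

    Skewed rows of A stream in from the left and skewed columns of B from the
    top; on every tick each cell does one multiply-accumulate and hands its
    operands on to its right and lower neighbors.
    """
    n = A.shape[0]                 # assumes square n x n operands for brevity
    C = np.zeros((n, n))           # each cell's running partial sum
    a_reg = np.zeros((n, n))       # A operand currently held by each cell
    b_reg = np.zeros((n, n))       # B operand currently held by each cell
    for t in range(3 * n - 2):     # enough ticks to flush the whole pipeline
        for i in reversed(range(n)):
            for j in reversed(range(n)):
                # Shift step: A values move right, B values move down.
                # Boundary cells pick up the skewed input streams
                # (zeros before/after each stream is in range).
                k = t - i - j
                a_reg[i, j] = a_reg[i, j - 1] if j > 0 else (A[i, k] if 0 <= k < n else 0.0)
                b_reg[i, j] = b_reg[i - 1, j] if i > 0 else (B[k, j] if 0 <= k < n else 0.0)
                # Compute step: one multiply-accumulate per cell per tick.
                C[i, j] += a_reg[i, j] * b_reg[i, j]
    return C
```

Once the pipeline fills, all n² cells do useful work on every tick, which is why a 256x256 grid of such cells can sustain tens of thousands of multiply-adds per cycle without any of the control machinery a microprocessor needs.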

Today the TPU looks more like an accelerator to the microprocessor with most of application processing still executing inside the microprocessor. If machine learning does eat software it is conceivable this model gets flipped: the TPU (or something like it) in the center of the action with most of the application running inside it as models, while the microprocessor is relegated to the role of a helper.

Rethinking Computing Hardware in a Post-Moore's Law World

What will computing hardware look like in a post-Moore’s law world?

  1. There is no more “one size fits all.” Gone are the days when general-purpose microprocessors ran all application workloads. Going forward we will depend on a diversity of computing hardware to accomplish a range of tasks (general deep learning, video processing, image/speech recognition and many others) in a range of locations (cloud, edge, mobile). The era of domain-specific hardware is upon us. The iPhone is a good early example of this trend: besides a 6-core CPU and a 3-core GPU, it has dedicated co-processors for motion sensing, image processing and neural network computations.
  2. Hardware design will look a lot more like agile software design. The focus will be to release quickly and release often. Hardware design will move from a slow waterfall model to quick agile sprints. Architectural innovation will be the key differentiator and not process or manufacturing technology.
  3. We will see more multi-disciplinary teams building hardware. Application specific hardware will require a cross-disciplinary approach to design involving application developers, compiler writers, computer architects and circuit designers, not unlike the early days of the computer industry when vertically integrated firms competed in the marketplace.

It was said that the great computer architect Seymour Cray began every new computer design on a blank sheet of paper: each design unique, pulled together from first principles, unencumbered by the mistakes and legacies of the past. By the time he passed away in 1996, his approach seemed quaint and impractical, the world firmly in the spell of Moore’s law and a procession of Intel microprocessors, every new generation an iterative improvement over the previous. Today, 42 years after the first Cray computer came to market, we are entering a new phase of computing design that looks a lot like the early days of computing. It is a new beginning, and we could use a Seymour Cray in our midst.