Tracing NVIDIA’s Journey to the Multibillion Dollar GPU Market


The History of NVIDIA and the Rise of GPUs

We all know NVIDIA as the force to be reckoned with in the graphics card wars of the 1990s. But as the company entered the 2000s, it embarked on a journey to do more: moving toward an entirely new kind of microprocessor and the multibillion-dollar market it would unlock. In this article, we'll look at how NVIDIA turned the humble graphics card into a platform that dominates one of the most important fields - artificial intelligence.

The Graphics Pipeline

To understand what was so special or groundbreaking about this particular NVIDIA product, we need to dive into the graphics pipeline. The job of a real-time 3D graphics system is to turn a scene into images sixty times a second, sometimes more, with minimal lag. Here is a very simplified example of how it does that.

Early real-time graphics systems broke everything down into triangles. More complicated shapes like quadrilaterals? Triangles. Curved patches? Triangles. It's triangles all the way down.

At the start of the pipeline, you have a bunch of 3D coordinate data describing models and shapes. The pipeline first ingests those 3D coordinates and turns them into two-dimensional window coordinates. This is called the geometry stage, and it can be further subdivided into two sub-stages: transform and lighting, then triangle setup and clipping. In transform and lighting, the pipeline adjusts the 3D coordinate data and applies lighting effects to account for the viewer's perspective. The triangle setup and clipping stage is where the coordinate data is processed into something the rendering engine can render. This especially matters in scenes where multiple 3D objects overlap one another: the overlapped object needs to be clipped, hence the second part of the name.
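To make the geometry stage more concrete, here is a minimal sketch in plain Python of the kind of math involved: a simplified pinhole perspective projection that turns a 3D camera-space vertex into 2D window coordinates. The function name, focal length, and window size are illustrative assumptions, not any real graphics API.

```python
def project_vertex(x, y, z, focal_length=1.0, width=640, height=480):
    """Project a 3D camera-space point onto 2D window coordinates.

    A highly simplified stand-in for the geometry stage's transform
    step: perspective-divide by depth, then map to pixel space.
    """
    if z <= 0:
        raise ValueError("vertex is behind the camera; it would be clipped")
    # Perspective projection: farther points (larger z) land closer
    # to the center of the image.
    ndc_x = focal_length * x / z   # normalized device coordinate
    ndc_y = focal_length * y / z
    # Map from the [-1, 1] NDC range to window (pixel) coordinates.
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height  # flip y: window origin is top-left
    return px, py

# The same vertex, twice as far away, lands closer to the image center.
near = project_vertex(1.0, 1.0, 2.0)
far = project_vertex(1.0, 1.0, 4.0)
```

A real pipeline would first multiply each vertex by model, view, and projection matrices, which is where the matrix math below comes in.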

The Math Behind It

When the graphics pipeline processes and manipulates these coordinates, it uses a mathematical structure called a matrix: a rectangular array of numbers. You can perform operations on matrices, like multiplication or addition, in a standard, well-defined way. We don't need to know exactly how matrices figure into manipulating 3D coordinates within the overall graphics process - this is not a math class - but we do need to know that they exist, and that the graphics pipeline, and thus the graphics cards running it, were heavily optimized to run mathematical operations on them. This will come up again later.
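Since matrices will come up again, here is a tiny illustrative Python version of the core operation, matrix multiplication, applied to a 2D rotation - one of the transforms a graphics pipeline performs constantly. Real hardware does this with dedicated, massively parallel units; the helper below is just a sketch.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)             # rows of b must equal columns of a
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

# Rotating the 2D point (1, 0) by 90 degrees counter-clockwise:
rotation_90 = [[0, -1],
               [1,  0]]
point = [[1],
         [0]]
rotated = mat_mul(rotation_90, point)  # → [[0], [1]]
```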

The Rendering Stage

At the rendering stage, the pipeline fills the pixel space between the translated 2D coordinates with image pixels representing the object's surface. The setup engine passes the render engine all the information about the object's color and texture at a particular spot, as well as how those are affected by lighting conditions and perspective. The final result is then sent to the monitor for display to the user.
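As a rough illustration of what "filling the pixel space" means, here is a toy Python rasterizer that decides which pixel centers a 2D triangle covers, using edge functions. Real render engines also interpolate color, texture, and depth at every covered pixel; this sketch, with hypothetical names throughout, handles coverage only.

```python
def edge(ax, ay, bx, by, px, py):
    # Signed area test: positive when p lies to the left of edge a->b.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2):
    """Return the integer pixel centers covered by a 2D triangle.

    A toy version of the render stage's first job: deciding which
    pixels each triangle touches. Vertices are (x, y) tuples in
    counter-clockwise order.
    """
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    covered = []
    # Walk the triangle's bounding box and test each pixel center.
    for y in range(int(min(ys)), int(max(ys)) + 1):
        for x in range(int(min(xs)), int(max(xs)) + 1):
            w0 = edge(*v1, *v2, x, y)
            w1 = edge(*v2, *v0, x, y)
            w2 = edge(*v0, *v1, x, y)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:  # inside all three edges
                covered.append((x, y))
    return covered

pixels = rasterize((0, 0), (4, 0), (0, 4))
```

Each pixel test is independent of every other, which is exactly why this stage was the first to move onto dedicated, parallel hardware.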

The Rise of GPUs

As recently as 1996, graphics cards handled just a subset of the graphics pipeline - the render stage - while the CPU handled the rest. This made the most sense for the company's early hardware, since rendering was simple and repetitive: you receive the coordinates of the triangles and lines, then you draw the right lines with the right colors, many times over. But very quickly, NVIDIA's cards started to take on more of the graphics pipeline. In 1997, their cards took on triangle setup and clipping. If you recall, this stage was about computing the actual triangle coordinates for the render stage to process. The CPU previously handled this, but the work was repetitive and became a bottleneck in the pipeline. Now free of the menial work of setting up triangle coordinate data, the CPU could focus on the geometry stage - and it did this part quite well, throwing more triangles at the graphics card than ever before.

The Evolution of GPU Technology

The GPU has transformed the way we interact with computers. It has become a crucial part of our everyday lives, from playing video games to creating stunning visuals. But how did the GPU come to be, and how has it evolved over the years into the integral piece of technology it is today?

The Beginning of GPU Technology

The GPU was first introduced in 1999 with the NVIDIA GeForce 256. It was the first graphics processing unit (GPU) to combine the full rendering pipeline onto a single chip. Prior to this, parts of the pipeline - notably transform and lighting - still ran on the CPU, so data had to be transferred back and forth between the CPU and the graphics card, which was a bottleneck. To solve this, graphics engineers converted the work into a pipeline of sequential steps and added many identical pipelines, which could then work in parallel.

The GeForce 256

This was the first GPU, and it opened the door for NVIDIA to take on the last part of the graphics pipeline: transform and lighting calculations. With the GeForce 256, NVIDIA brought the entire graphics pipeline onto a single chip, no longer needing to transfer data back and forth with the CPU. This was a major achievement, and the market started to really take off. However, this generation of GPUs ran what were known as fixed-function pipelines: once the programmer sent data to the GPU, it could not be modified.

Customization and the GeForce 3

Two years after the GeForce 256, NVIDIA released the GeForce 3, which broke apart the fixed graphics pipeline. Programmers could now send the GPU custom programs called shaders. Vertex shaders replaced the transform and lighting stage of the pipeline, while pixel shaders operated at the pixel level and helped with rendering the image. These shaders were written in a low-level language and had their limits, but they opened the door for more generalized programming capabilities down the line.
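To give a flavor of what a pixel shader does, here is a hypothetical one written in Python: a small function the hardware applies independently to every pixel, in this case blending the surface color toward gray as depth increases, to fake fog. GeForce 3-era shaders were short assembly-like programs with tight limits, but the per-pixel idea is the same. All names and constants below are illustrative.

```python
import math

def fog_pixel_shader(r, g, b, depth, fog_density=0.1):
    """A toy 'pixel shader': given one pixel's surface color and depth,
    return the fogged color. Runs once per pixel, with no dependence on
    neighboring pixels - which is what makes it so parallelizable."""
    fog = 1.0 - math.exp(-fog_density * depth)  # 0 up close, approaches 1 far away
    gray = 128
    def blend(c):
        return round(c * (1.0 - fog) + gray * fog)
    return blend(r), blend(g), blend(b)

# A red surface right at the camera is untouched; far away it fades to gray.
close_color = fog_pixel_shader(200, 10, 10, depth=0.0)
distant_color = fog_pixel_shader(200, 10, 10, depth=50.0)
```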

CUDA and the GeForce 8

In 2006, NVIDIA released the GeForce 8 series GPU, as well as a new proprietary software framework called CUDA. This made it substantially easier to program a GPU, and the old fixed-function graphics pipeline was completely wiped away. With CUDA, the GPU was no longer a piece of specialized graphics hardware but a generalized processor with thousands of cores, able to manipulate large chunks of data in parallel.
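The kind of workload CUDA makes easy is data-parallel: the same small operation applied to every element of a large array. Here is a serial Python sketch of the classic SAXPY kernel; on a GPU, each loop iteration would run on its own thread, which works precisely because no iteration depends on any other.

```python
def saxpy(a, x, y):
    """Compute a*x + y elementwise - the classic data-parallel kernel.

    Serial here, but every element's computation is independent, so a
    GPU can assign each one to its own thread and run thousands at once.
    """
    return [a * xi + yi for xi, yi in zip(x, y)]

result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 10.0, 10.0])  # → [12.0, 14.0, 16.0]
```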

The Performance of GPUs

GPU performance has scaled much faster than CPU performance as of late, for several reasons. First, NVIDIA has been able to benefit from TSMC's progression in commercializing new process nodes. Because GPU processing power is almost literally proportional to the number of processing cores, NVIDIA can scale processing power by scaling transistor counts. Some recent GPUs have over 20 billion transistors. Second, NVIDIA was able to scale up frequencies, with the biggest jump coming in 2016, when its Pascal GPUs used TSMC's 16-nanometer process node for the first time. This node was TSMC's first to use a new type of 3D transistor called the FinFET, which allowed chips to run at much higher clock speeds while using less energy.

The Market Demand for GPUs

Third, and perhaps most importantly, NVIDIA found a new source of market demand for insanely powerful GPUs. In 2010, ImageNet held a competition between various computer vision models to see who had the best image classifier - for example, identifying a strawberry, blueberry, or pigeon in an image. This demand for powerful GPUs has continued to grow, and is perhaps most evident in the recent surge in GPU-based artificial intelligence (AI) technology.

In 2010 and 2011, the top-performing models in the ImageNet competition used hand-engineered computer vision methods. These models had a top-five error rate of about 25%; a normal human with a little training could probably do about 5%. Then in 2012, three researchers, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, entered the competition with something called a deep neural network, trained over five to six days on two 3-gigabyte GTX 580 GPUs. This model took first place by a mile with a 15% error rate, 10 percentage points better than the runner-up. Since then, the winning models' error rates have dropped to the point where they regularly perform better than a trained human. In 2016, four years later, deep learning hit the mainstream when Google DeepMind's AlphaGo won a five-game match of Go against 18-time world champion Lee Sedol.

Deep learning is based on a conceptual understanding of how biological neural networks work. A network's layers of neurons are represented using matrices: the inputs and weights are basically just numbers within big matrices. This takes advantage of the GPU's relative strength in handling matrices to train networks vastly faster than ever before. Neural networks have been around since the 1950s, but researchers had never gotten reliably good results with them. The reason has to do with the sheer amount of data required. For instance, when HBO's Silicon Valley trained a neural network for its Not Hotdog mobile app, the developer had to collect 3,000 photos of hot dogs and 147,000 photos of things that were not hot dogs.
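Here is a minimal sketch of why networks map so well onto matrix hardware: a single fully connected layer is just a matrix-vector product plus a bias, pushed through an activation function. All names and numbers below are illustrative.

```python
import math

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each output neuron takes a weighted
    sum of all inputs (one row of the weight matrix), adds a bias, and
    applies a sigmoid. Training repeats this, and its gradients, over
    millions of examples - which is why matrix throughput matters."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    return outputs

# Two inputs feeding two neurons: a 2x2 weight matrix and a bias vector.
out = dense_layer([1.0, 0.5], [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.3])
```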

Before GPUs, researchers had to use CPUs to train their neural networks, and this took an impractically long amount of time. CPUs are not meant for parallel operations: they have too few cores for that kind of work. Furthermore, they have limited amounts of cache memory, and neural network training is quite memory-intensive - you have to store the input data, the parameters, and more as the inputs work through the network's layers. Training on a batch of 128 images with a dual 10-core CPU from 2012 took about 124 seconds. Extrapolate that out to one million images or more, and it takes 11 days to train a single machine learning model. That is not practical. You would never be able to iterate on your work.
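The extrapolation above can be checked directly. Spelling out the arithmetic in Python, with the figures taken from the text:

```python
# 124 seconds per 128-image batch on a dual 10-core CPU from 2012,
# extrapolated to a one-million-image dataset.
seconds_per_batch = 124
batch_size = 128
images = 1_000_000

batches = images / batch_size              # ≈ 7,813 batches per pass
total_seconds = batches * seconds_per_batch
days = total_seconds / 86_400              # 86,400 seconds in a day
print(round(days, 1))  # → 11.2
```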

GPUs, on the other hand, can do it 8.5 times faster than a CPU: a million images in about a day. A massive improvement. In other use cases, the speed improvement can be even better - up to 40 times faster.

Scientists had noticed this possible application for years as GPUs' power, measured in floating-point operations per second, vastly scaled up. But prior to CUDA's release in 2006, there was no easy way to actually program these operations on a GPU. Researchers had to fit the neural network into the confines of graphical operations, which was about as hard as it sounds.

Companies quickly realized, however, that they could apply the same techniques to more financially lucrative tasks: things like autonomous driving, content moderation, revenue optimization, and more. Almost all of this work has been done on NVIDIA GPUs and coded up in CUDA. The company has invested substantial resources in developing the space, correctly believing that it would lead to more high-margin GPU sales. Its flagship GPU for enterprise-level neural network training is the A100, with some 54 billion transistors, fabbed by TSMC. These are sold to data centers at something like $20,000 a pop, and the data centers then rent access to the hardware through the cloud for as much as $33 an hour. The company's biggest revenue generators are still consumer GPUs for gaming or crypto, but starting in 2017, its data center revenue started accelerating too. In the third quarter of 2021, the company made $3.2 billion from gaming and $2.94 billion from data centers - and the data center division is growing much faster, 55% year over year compared to 42% for gaming.

GPUs helped enable the deep learning revolution. Now, competition in the hardware space is heating up.


The Machine Learning Revolution and NVIDIA

The field of machine learning has seen a dramatic upswing in recent years, and it's no surprise that many companies have jumped at the opportunity to provide hardware and software products to meet the demand. NVIDIA has become the major player in the field with its CUDA platform, while AMD, its old graphics rival, offers a competitor called ROCm. The latter is open source and offers programmers a greater amount of access to internal GPU details.

Startups and Deep Learning Chips

On another front, a variety of startups have emerged offering chips designed for one purpose only - machine learning - such as Graphcore and Cerebras. Companies like these have raised multimillion-dollar funding rounds from big VCs looking for a piece of NVIDIA's deep learning lunch. Other companies have opted to create their own deep learning hardware, mostly for economic or strategic reasons; Google, Amazon, and Tesla, for example, have the expertise and scale for this option.

The Ultimate Standard

Despite all of this competition, the machine learning revolution remains centered on NVIDIA and its ecosystem of hardware and software products, which have been integrated together to deliver the best experience and performance. Furthermore, the company's close collaboration with AI researchers in the industry has helped make its products the de facto standard.

The Power of Platforms

This evolution reminds me of how Apple evolved the iPhone. When it first came out, there was no App Store, and third-party apps could not exist. Steve Jobs imagined it as a phone with a few narrowly defined extra functions. The first iPhone sold well, but it didn't become a phenomenon until Apple opened up the hardware to outside developers, turning it into a powerful computing platform that fits in your pocket. Likewise with NVIDIA and its GPUs - the opening up took longer than it did for the iPhone, but the results have been just as ground-shaking. NVIDIA probably didn't foresee that its newly generalized hardware would eventually power the latest AI revolution. But it didn't really need to. That's why a platform can be so powerful.
