What makes a fast CPU fast?

Years ago, it was possible to figure out how fast a processor was going to be just by looking at its clock speed. Granted, there were a couple of other factors that determined a processor’s speed, such as wait states. For the most part though, you could be reasonably sure that a PC with an 8 MHz processor was going to run faster than a comparable PC with a 4.77 MHz processor.

As time went on though, a processor’s clock speed became a lot less important. It became possible for two processors to have the same clock speed, but entirely different levels of performance. The earliest example I can think of was when the Pentium was first released. Prior to that point, the fastest PC available was the 486 DX4 100, which had a clock speed of 100 MHz. When the Pentium chip was introduced, it was first made available in 75 MHz and 90 MHz clock speeds. Eventually though, there was a 100 MHz version available. Although the Pentium and the 486 both ran at 100 MHz, the Pentium was a lot faster than the 486.

I know this is ancient history, but the same basic thing is going on today. Both Intel and AMD offer several different types of processors. These processors may run at similar clock speeds, but often perform very differently. My goal in writing this article is to help you to understand what the differences are in such processors, and how those differences ultimately affect performance.

Clock Speed

As I explained earlier, for years clock speed was the single most important factor in determining a processor’s performance. Although clock speed is not as important as it used to be, it is still relatively important. That being the case, I want to begin by explaining what clock speed is and how it affects a processor’s performance.

Clock speed, as it relates to a microprocessor, is a complicated matter; so complicated, in fact, that I could probably write an entire book on the subject. Since I don’t have that kind of space to work with, I will break down the concept as simply as I possibly can.

As I’m sure you know, computers work with binary code. Binary code is nothing more than a series of zeros and ones that represents the presence or absence of electricity at a given moment. The real key to a processor being able to work with binary code is that it has to be able to understand when one instruction ends and the next instruction begins. At first this probably doesn’t seem like a big deal. However, what would happen if the CPU were to receive several identical instructions in a row?

In a situation like this, the CPU would not be able to use changes in the voltage state to determine where one instruction ends and the next one begins. If the CPU were to only look at state changes, then a series of identical instructions would appear as a single instruction. This is where the clock comes into play.

Have you ever seen one of those antique manual calculators with a crank? On those calculators, you could type in a number and a mathematical operator sign, but no calculation would be performed until you pulled the crank. Pulling the crank was a way of telling the calculator that you were done entering a number and were ready to enter the next number or to perform the calculation.

A clock inside a computer works in a similar way. Clock cycles are used to separate one instruction from another. To see how this works, let’s take a look at a very oversimplified example. Imagine for a moment that you had an eight-bit processor that you needed to send a series of instructions to. In our overly simplified example, this processor would have nine wires. Eight of those wires represent individual bits. Since binary code is really just a representation of whether power is on or off, we can represent an eight-bit binary instruction by applying or not applying power to eight of the nine wires.

So we’ve sent an instruction to our imaginary processor by applying electricity (or not) on eight of the processor’s wires that represent eight bits of data. Now it’s time to pull the crank, so to speak. To do so, the computer would apply power to the processor’s ninth wire. This would let the processor know that the instruction is complete and that it can be acted on. The processor’s ninth wire would be connected to the computer’s clock. Every time the clock ticks, power would be applied to this wire. If the clock ticked once per second, then the processor would be able to process one instruction per second. It isn’t that data can’t be sent to the processor any faster, but the clock has to keep everything in sync. Without the clock, the processor has no way of knowing where one instruction ends and the next one begins.
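The nine-wire example above can be sketched in a few lines of code. This is purely a toy model of the idea, not how real hardware works: the wires, tick pattern, and "ADD" bit pattern below are all made up for illustration.

```python
# Toy model of the nine-wire example: eight data lines plus one clock line.
# The "processor" only acts on an instruction when the clock ticks, which is
# how two identical back-to-back instructions stay distinguishable.

def run_processor(data_lines, clock_ticks):
    """data_lines: 8-bit tuples held on the wires at each moment in time.
    clock_ticks: parallel list of booleans; True = power on the ninth wire."""
    executed = []
    for bits, tick in zip(data_lines, clock_ticks):
        if tick:                        # only act when the clock "cranks"
            executed.append(bits)
    return executed

# Two identical instructions in a row: without the clock ticks, the voltage
# on the eight data wires would never change, and they'd look like one
# continuous instruction.
add = (0, 0, 0, 0, 0, 1, 1, 1)          # made-up bit pattern
wires = [add, add, add, add]             # wires held steady across time
ticks = [True, False, True, False]       # two ticks -> two instructions
print(run_processor(wires, ticks))       # two separate instructions executed
```

Note that the data lines never change between the two instructions; only the tick on the ninth wire tells the processor that a second, separate instruction has arrived.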

As I said earlier though, this example is grossly oversimplified. Today’s processors can handle 64-bit instructions, and clock speeds are measured in gigahertz. A gigahertz is a billion clock cycles per second.

Earlier I mentioned that clock speeds are not as important today as they once were. A clock speed still represents the number of clock cycles per second, just as it always has. What has changed is how much work a processor can do with each of those cycles.

For many years Intel and AMD were in a constant race with each other to produce the CPUs with the highest clock speeds. The problem is that as clock speeds began to approach 4 GHz, heat dissipation became a serious issue. Generally speaking, the higher the clock speed, the more heat the processor is going to produce. Currently, the lack of a low-cost heat dissipation mechanism is preventing Intel and AMD from mass marketing CPUs with higher clock speeds. This seems to have put an end to the clock speed wars that have gone on for so long.

Rather than focusing on higher clock speeds, both Intel and AMD have turned their focus toward doing more with existing clock cycles. The reason clock speeds are less important than they once were is that newer processor designs allow multiple instructions to be processed with each clock cycle. For example, a 2 GHz processor that processes two instructions per clock cycle could potentially be faster than a 3 GHz processor that processes one instruction per clock cycle. I say could potentially be, because it really depends on the chip architecture and on the operating system and applications that are being run.
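The arithmetic behind that comparison is simple enough to write down. The sketch below just multiplies clock speed by instructions per cycle; the specific numbers are the illustrative ones from the paragraph above, not measurements of any real chip.

```python
# Back-of-the-envelope throughput: clock cycles per second, times the number
# of instructions the chip can complete in each cycle.

def throughput(clock_ghz, instructions_per_cycle):
    """Peak instructions per second for a given clock and per-cycle rate."""
    return clock_ghz * 1e9 * instructions_per_cycle

older = throughput(3.0, 1)   # 3 GHz chip, 1 instruction per cycle
newer = throughput(2.0, 2)   # 2 GHz chip, 2 instructions per cycle

print(older)   # 3 billion instructions per second
print(newer)   # 4 billion: the slower-clocked chip wins on paper
```

On paper the 2 GHz chip comes out ahead, but as the article notes, the real outcome depends on whether the architecture, operating system, and applications can actually keep those extra per-cycle slots busy.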

Processing multiple instructions per clock cycle almost always means running parallel threads. Unless you are running a multithreaded application on an operating system that is designed to simultaneously run multiple threads, you will see no real benefit from processing multiple instructions in a single clock cycle. I will talk about multiple threads in a lot more detail in another article.

Caching

Another factor that greatly affects a processor’s performance is the processor’s cache. The processor’s cache has absolutely nothing to do with the speed at which the processor runs (at least not in terms of clock speed). However, it greatly affects the system’s overall performance.

The basic idea behind the CPU cache is that the cache consists of a small amount of very expensive, but very fast memory. This memory is integrated onto the CPU. It is much faster for the CPU to access frequently used data from the CPU cache than it is to retrieve the data from the system’s RAM.

Probably the best analogy that I’ve heard to describe how a CPU cache works is to compare it to the way that a library works. As you know, in a library there are rows and rows of shelves filled with books. Let’s pretend that the shelves represent system RAM, and that the books represent data. Another thing that libraries have is a librarian. For the purpose of this example, let’s pretend that there is only one librarian in this library. This librarian will represent the CPU (if there were multiple librarians, they could represent multiple CPUs).

Now, let’s pretend that someone comes into the library and requests a specific book. The librarian has to get up and walk across the library to the shelves containing the books. From there, the librarian has to look through the shelves to locate the book that the person is asking for. Once the librarian finds the requested book, the librarian has to travel back to the front desk where the person is waiting for it.

This is similar to how a CPU without a cache works. When an application requests data, the CPU has to retrieve data from across the memory bus from the system’s RAM. In computer terms, the memory bus is relatively slow. It therefore takes time for the CPU to retrieve the necessary data. Because the current instruction is dependent on the data that is being retrieved, the CPU is powerless to do anything but wait for the necessary data. Having to wait for data to be retrieved is not exactly conducive to fast processing.

Now let’s see how the situation can be improved by using a CPU cache. Going back to our library example, let’s pretend that the CPU cache is a small shelf right next to the librarian’s chair. When the person comes in and requests a specific book, the librarian still has to get up and walk across the library to the shelves, hunt for the book, and bring it back to the front desk. This time though, when the person is finally done with the book, the librarian places the book on the small shelf next to the desk, rather than returning the book to its place on the library shelves.

Now let’s pretend that another person comes into the library and requests a certain book. If they happen to request a different book than the one the first person requested, then the librarian will once again have to get up and go hunting for a book. Let’s say though that the book the first person requested was something really popular, and that the second person who came into the library requested the same book. Since the librarian placed the book on the small shelf next to the desk, the librarian can just pick up the book and hand it to the person who asked for it. There is no need for the librarian to go hunting through the shelves for the book.

This is basically how a cache works. The CPU works under the assumption that if data has been requested once, it will probably eventually be requested again. Therefore, after data has been initially retrieved, it is placed into the CPU’s cache in case it is needed again.
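This keep-it-in-case-it-is-needed-again behavior can be sketched as a small dictionary sitting in front of a slow lookup. This is only a sketch of the idea; the "book" names, the dictionary standing in for RAM, and the artificial delay are all made-up stand-ins.

```python
# Minimal sketch of the caching idea from the library analogy: check a small,
# fast store first, and only take the slow trip to "RAM" on a miss.
import time

SLOW_STORE = {"popular-book": "contents..."}  # stands in for system RAM
cache = {}                                    # the shelf next to the desk

def fetch(key):
    if key in cache:                  # librarian checks the nearby shelf
        return cache[key], "cache hit"
    time.sleep(0.01)                  # the slow walk across the library
    value = SLOW_STORE[key]
    cache[key] = value                # keep it close for the next request
    return value, "cache miss"

print(fetch("popular-book")[1])       # first request: cache miss
print(fetch("popular-book")[1])       # same request again: cache hit
```

The first request pays the full cost of the trip to slow storage; every later request for the same data is satisfied from the cache almost instantly.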

Of course, a cache isn’t a perfect solution. Eventually the cache is going to fill up. As I said before, the cache is extremely small in comparison to the system RAM. When the cache does fill up, some of the data is going to have to be removed from the cache to make room for new data.

What I have just described is a single-level cache. Most modern CPUs actually use a two-level cache though. The Level 1 cache is the small, extremely fast, but extremely expensive memory that I just talked about. The Level 2 cache also exists on the CPU, but it’s a little bit slower and less expensive than the memory used for the Level 1 cache. Because the memory used for the Level 2 cache is less expensive, the Level 2 cache tends to be a little bit larger in size than the Level 1 cache.

To see how this multilevel cache works, let’s go back to our library example. The first two people that I talked about walked into the library and both requested a very popular book. Let’s pretend though that over the course of the day a fairly large number of people visited the library and requested all varieties of books. Because the librarian is placing all of the books that have been requested on the small shelf next to the desk, the shelf fills up. This is the equivalent of the Level 1 cache filling up.

Does a sudden spike in activity at the library make the book that the first two people requested any less popular? Of course not. That being the case, it probably doesn’t make much sense for the librarian to return the book to the library shelves. The librarian does have to take the book off of the small shelf next to the desk to make room for other books that are being requested. Rather than return the popular book to its place on the library shelves though, the librarian places it onto another small shelf that’s just a little bit further from the desk. In computer terms, this secondary shelf would represent the Level 2 cache.

Now let’s pretend that later in the afternoon someone walks into the library and requests the popular book. The librarian first checks the small shelf next to the desk. Since the book is not there, the librarian checks the slightly larger shelf that’s a little bit further from the desk, and finds the book. It took the librarian a little bit longer to find the book than it would have if the book had been on the small shelf next to the desk. Even so, the process of retrieving the book still took far less time than it would have if the librarian had had to pull the book from its place in the stacks.

As you can see, the Level 2 cache is a temporary repository for data that has not been requested recently, but that could be requested in the near future. The Level 2 cache will eventually fill up too, and when it does, some data will have to be removed from the cache. Even so, having a Level 1 and Level 2 cache greatly improves a CPU’s efficiency.
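The whole two-shelf arrangement can be sketched as a pair of small ordered dictionaries. Everything here is an illustrative assumption: the tiny capacities, the least-recently-used demotion policy, and the book names are made up to mirror the analogy, not to describe how any real CPU manages its caches.

```python
# Sketch of a two-level cache: when the tiny L1 fills up, the oldest entry is
# demoted to the larger L2 rather than discarded; an L2 hit promotes the
# entry back into L1.
from collections import OrderedDict

L1 = OrderedDict()                  # small shelf next to the desk
L2 = OrderedDict()                  # slightly larger shelf further away
L1_CAPACITY, L2_CAPACITY = 2, 4    # deliberately tiny, for illustration

def cache_put(key, value):
    L1[key] = value
    L1.move_to_end(key)                       # mark as most recently used
    if len(L1) > L1_CAPACITY:                 # L1 full: demote oldest to L2
        old_key, old_value = L1.popitem(last=False)
        L2[old_key] = old_value
        if len(L2) > L2_CAPACITY:             # L2 full too: finally discard
            L2.popitem(last=False)

def cache_get(key):
    if key in L1:
        L1.move_to_end(key)                   # shelf right by the desk
        return "L1 hit"
    if key in L2:
        cache_put(key, L2.pop(key))           # found further away: promote
        return "L2 hit"
    return "miss"                             # the long walk to the stacks

cache_put("popular-book", "...")
cache_put("book-b", "...")
cache_put("book-c", "...")            # L1 overflows; popular-book moves to L2
print(cache_get("popular-book"))      # "L2 hit": slower shelf, then promoted
```

After that L2 hit, the popular book is back on the shelf next to the desk, so the next request for it is an L1 hit; a book that was never shelved at all is still a full miss.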

A lot of factors go into speed

As you can see, two of the things that tend to have the most impact on a CPU’s performance are its clock speed and the sizes of its Level 1 and Level 2 caches.
