Performance fact: performance in every area (CPU, disk, video) depends on quick memory access.
Performance fact: changes to anything other than the cause of your system bottleneck are wasted effort.
In the days of older PC systems, you never knew what the bottleneck was going to be. Usually, it was either the processor not going fast enough or the disk drives not keeping up. Intel has a paper called "Performance Factors in a Computer" sitting on their home page. It claims that the processor choice is responsible for 54% of the "speed" of a computer running Windows, while 25% is attributable to the memory (video gets 12% and the disk 9%). This is just plain wrong for today's computers (Intel's test system was a Pentium 60 with 8MB of RAM, totally obsolete at this point). Nowadays, processors are far faster than they used to be, and raw disk speed isn't as much of a factor. Everything is a bit different because of the widespread use of caching.
Most PC users are familiar with disk caching programs like SMARTDRV. The increased disk usage of programs like Windows made using a disk cache mandatory if you wanted your system to work well. What isn't stressed well enough is that today's processors are so fast that they too would be crippled if it weren't for caching.
Intel has been addressing this problem as their CPU lines progress. Go back a bit to when a 386 running at 16MHz was a speedy system. Typical FPM DRAM was perfectly capable of keeping up with the memory bandwidth demands of this processor. When the 486 was introduced, it included an 8KB cache inside the CPU itself because that chip could easily outrun the memory under some circumstances. That way, instructions that had been executed recently, like when your computer was in a loop, would run at full CPU speed without needing to go back to the slower memory external to the CPU.
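To make the loop effect concrete, here is a minimal sketch of a cache simulator. It is not a model of the real 486: it assumes a simple direct-mapped 8KB cache with 16-byte lines and pretends every instruction fetch is 4 bytes wide. The names `hit_rate` and `loop_fetches` are made up for this illustration.

```python
def hit_rate(addresses, cache_bytes=8 * 1024, line_bytes=16):
    """Fraction of fetches served from a simple direct-mapped cache.

    Assumed geometry (illustrative only): one line per slot, no associativity.
    """
    n_slots = cache_bytes // line_bytes
    slots = [None] * n_slots          # which memory line each slot currently holds
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // line_bytes     # memory line this address belongs to
        slot = line % n_slots         # the only slot that line may occupy
        if slots[slot] == line:
            hits += 1                 # already cached: runs at CPU speed
        else:
            slots[slot] = line        # miss: fetch from slow external memory
        total += 1
    return hits / total


def loop_fetches(body_bytes, iterations, fetch_bytes=4):
    """Addresses fetched by a loop of body_bytes, run `iterations` times."""
    body = list(range(0, body_bytes, fetch_bytes))
    return body * iterations


small = hit_rate(loop_fetches(64, 100))          # loop fits easily in 8KB
large = hit_rate(loop_fetches(16 * 1024, 100))   # loop is twice the cache size
print(f"64-byte loop: {small:.2%} hits")
print(f"16KB loop:    {large:.2%} hits")
```

In this model the small loop misses only on its first pass. The oversized loop keeps evicting its own lines, so every new line costs a trip to external memory no matter how many times the loop repeats.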
Now, when we move on to Pentium-class machines, things get a whole lot uglier. If you've got a CPU running at 100MHz, that's a clock cycle every 10ns. Even worse, the Pentium design tries to execute multiple instructions at once, so it chews through instructions even faster than that. Obviously, regular memory is nowhere near fast enough to keep up. The 8KB code cache on the chip itself helps (there's also an 8KB data cache), but as programs have gotten bigger over the last few years, it isn't as effective as it used to be. Because of this, all good Pentium motherboards include a level 2 cache (the cache inside the CPU itself is the level 1 cache). Typically, the L2 cache is 256KB or 512KB, and runs at 20ns or less. That's still not nearly fast enough to keep the CPU running constantly, but combined with the L1 cache it's an acceptable solution.
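The arithmetic behind those figures is easy to check. The sketch below only converts clock speed to cycle time and counts how many CPU cycles tick by during one access at a given speed. The function names are illustrative, and the 20ns figure is the L2 speed quoted above.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def cycles_waiting(access_ns, clock_mhz):
    """How many CPU clock cycles pass during one memory access."""
    return access_ns / cycle_time_ns(clock_mhz)


for mhz in (100, 133, 200):
    print(f"{mhz}MHz CPU: {cycle_time_ns(mhz):.1f}ns per cycle, "
          f"{cycles_waiting(20, mhz):.1f} cycles per 20ns L2 access")
```

At 100MHz a 20ns cache access already spans two full clock cycles, and the gap only widens as the CPU clock goes up.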
The point here is to give you an idea of the magnitude of the problem. Fast Pentium chips chew through memory very quickly, and if you're throwing around a lot of data, you are going to be at the mercy of the memory subsystem in your computer. It doesn't matter how fast your CPU is if it doesn't have data to work with.
I stated back at the beginning that memory access speed was the first and most important thing to optimize, and hopefully you see now why that's so. The next thing to get into is how to tell just how fast your memory is working, and what types of designs might be faster.
The easiest program for measuring memory speed is Wintune 95 from Windows magazine. It's small, and you can download a copy from the net or get it on their CD (it shows up on the magazine rack). More information on obtaining a copy was given at the end of the benchmarks article. Grab a copy, get Windows 95 or Windows NT running on your system, and you can get a very nice display that shows how the memory system on your computer is working (there's a section below describing how to get similar information for DOS users).
After you run the analyze program, switch to the Chart tab and look at Memory Write Performance. This is the first thing to check, because it's usually the big item that lets you distinguish the good motherboards from the bad. Any motherboard built with one of Intel's modern Triton-style chipsets should get about 84MB/s writing to memory no matter what size block is used. Older motherboards based on the Neptune-era chipsets used to write in the 30-40MB/s range. If you're not getting somewhere near 84MB/s, your system isn't writing memory fast enough, and it's slowing everything down significantly.
Now, switch over to Memory Read Performance. Note how performance drops as the size of the memory block accessed goes up. See where the big jumps are? They should match up with the cache sizing on your system. You should be getting upwards of 500MB/s on the 4K and 8K blocks, because they are sitting in the CPU's internal level 1 cache. The blocks from 16K up to the size of your level 2 cache should get somewhere around 180MB/s. Transfers bigger than the L2 cache need to go back to the actual DRAM itself, and this access typically happens at around 90MB/s. Look at those numbers. The internal CPU cache is well over five times as fast (maybe even close to ten times as fast) as the external memory access is. Hopefully, the L2 cache sits between these two in performance. If your motherboard doesn't implement the cache well, you should see that here. Unlike the write performance, the read performance greatly depends on the speed of the CPU itself. The level 1 cache inside the CPU is what's being tested with the smaller blocks, and that speed is totally dependent on how fast the CPU runs. There's also a copy statistic, but it's not all that useful. You can pretty much predict what it's going to be by looking at the read and write speeds, and having them summarized into one figure blurs the things you want to know.
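One way to picture the steps in that chart is as a lookup on block size. The sketch below is a rough model, not Wintune's method: it uses the ballpark speeds quoted above, assumes an 8KB L1 data cache and a 256KB L2 cache, and the function name is made up for the example.

```python
def expected_read_mbps(block_kb, l1_kb=8, l2_kb=256):
    """Rough read speed (MB/s) for a block, using the ballpark figures above."""
    if block_kb <= l1_kb:
        return 500      # fits in the CPU's internal L1 cache
    if block_kb <= l2_kb:
        return 180      # fits in the motherboard's L2 cache
    return 90           # has to come from DRAM


block_kb = 4
while block_kb <= 1024:
    print(f"{block_kb:5d}K  ~{expected_read_mbps(block_kb)} MB/s")
    block_kb *= 2

print(f"L1 vs DRAM: {500 / 90:.1f}x")
```

If your own chart steps down at different block sizes than this model predicts, that is a hint about how much cache your system really has, or whether the L2 cache is working at all.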
If you switch to the tab for Memory I/O performance, you'll find a summary of the characteristics of your system. What's good and what's bad? There are two ways to tell. First, you can move back to Wintune's database and compare your machine to others in the same class as your own. The danger with that approach is that not every one of those machines is necessarily a good performer, so you may very well be comparing your system with one that's a dog. Second, since I know what's good and bad, here's a little table I've created to summarize the average characteristics of well-performing motherboards with Intel Triton-class chipsets running Intel Pentium processors:
| Bus clock (MHz) | CPU clock (MHz) | Read (MB/s) | Write (MB/s) | Copy (MB/s) |
| --- | --- | --- | --- | --- |
| 50 | 75 | 120 | 61 | 37 |
| 66 | 100 | 177 | 83 | 54 |
| 66 | 133 | 235 | 83 | 59 |
| 60 | 150 | 243 | 75 | 53 |
| 66 | 166 | 273 | 83 | 60 |
| 66 | 200 | 310 | 84 | 61 |
One thing to notice here is that I'm including the bus clock speed in addition to the processor speed. The motherboard has a clock it runs at, typically 50, 60, or 66MHz for modern Pentium designs. The CPU itself uses a multiplier that gets it to execute multiple cycles for every tick of the bus clock. Notice that memory write speed is very much proportional to that external bus speed. If you slow down the bus that clocks the access to external memory, obviously you're not going to be able to write to it as quickly. Because of this, you can see that a Pentium 150 (2.5 x 60MHz) is in most aspects slower than a Pentium 133 (2 x 66MHz). Similar reasons make a P120 slower than a P100. You want the fastest bus speed possible, because you take a hit on both writing and reading if it's slower.
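The relationship is easy to check against the table. The sketch below copies the table rows, works out each chip's multiplier (rounded to the nearest half step, which is an assumption made here because the "66MHz" bus is a rounded figure), and divides write speed by bus clock.

```python
# (bus clock MHz, CPU clock MHz, read, write, copy) rows copied from the table above
TABLE = [
    (50, 75, 120, 61, 37),
    (66, 100, 177, 83, 54),
    (66, 133, 235, 83, 59),
    (60, 150, 243, 75, 53),
    (66, 166, 273, 83, 60),
    (66, 200, 310, 84, 61),
]


def multiplier(bus_mhz, cpu_mhz):
    """Clock multiplier, rounded to the nearest half step (assumed granularity)."""
    return round(cpu_mhz / bus_mhz * 2) / 2


def write_per_bus_mhz(bus_mhz, write_mbps):
    """Write throughput in MB/s for each MHz of bus clock."""
    return write_mbps / bus_mhz


for bus, cpu, _read, write, _copy in TABLE:
    print(f"P{cpu}: {multiplier(bus, cpu)} x {bus}MHz, "
          f"{write_per_bus_mhz(bus, write):.2f} MB/s written per bus MHz")
```

The last column stays in a narrow band for every row, which is what "proportional to bus speed" looks like in numbers: the multiplier changes from chip to chip, but write speed follows the bus.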
Read speed is almost directly proportional to CPU speed. The slight deviation from linear is because the external memory speed is also factored into this calculation, in the form of the speeds for the blocks larger than the L2 cache. These numbers tend to be fairly consistent even with different motherboards. Because the average speed of the reading is swamped by the 4K and 8K results (which are CPU based), the rest of the memory subsystem doesn't quite impact this as much as it does writing.
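To see how close to linear the read numbers are, the sketch below compares each chip's clock and read speed against the Pentium 100 row of the table. Picking the Pentium 100 as the baseline is an arbitrary choice made for this example.

```python
# (CPU clock MHz, read MB/s) pairs copied from the table above
READ_BY_CPU = [(75, 120), (100, 177), (133, 235), (150, 243), (166, 273), (200, 310)]
BASE_MHZ, BASE_READ = 100, 177   # Pentium 100 row used as the baseline


def scaling(cpu_mhz, read_mbps):
    """Return (clock ratio, read ratio) relative to the Pentium 100 row."""
    return cpu_mhz / BASE_MHZ, read_mbps / BASE_READ


for cpu_mhz, read_mbps in READ_BY_CPU:
    clock_ratio, read_ratio = scaling(cpu_mhz, read_mbps)
    print(f"P{cpu_mhz}: clock x{clock_ratio:.2f}, read x{read_ratio:.2f}")
```

In this table the Pentium 133 tracks its clock almost exactly, while the Pentium 200 doubles the clock but reads only about 1.75 times as fast, since the external memory behind it hasn't sped up at all.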
Since the first Triton release (which is the 430FX chipset), Intel has produced two more chipsets in the Triton series. During that time, the company decided to switch to numeric coding instead of continuing to use mythological names. The 430HX chipset (sometimes called the Triton 2 by motherboard makers) provides somewhat enhanced performance by streamlining the entire memory subsystem. It's aimed more at the corporate market, and usually is expandable to large amounts of memory (typically 512MB). The latest 430VX chipset (sometimes called the Triton 3) is aimed more at the home market. The enhancements on it are supposed to improve multimedia performance, at the expense of memory capacity (typically these designs only support 128MB) and a bit of performance with EDO memory. Both newer chipsets have support for the Universal Serial Bus (USB), and in some motherboard designs the 430VX lets you use the new SDRAM-style memory, which improves considerably on older DRAM parts. Performance on both these chipsets is somewhat better than on the older Triton chipset, with the improvement falling into the noticeable but not especially significant category. An across-the-board boost in performance of about 5% seems to be typical.
You know, Intel isn't the only company making motherboard components, although it does seem that way sometimes. Recently in particular, some of Intel's competitors have been releasing systems that are very competitive from a performance standpoint.
Cyrix has released a series of Pentium-compatible CPUs that claim to have better performance relative to their clock speed than Intel's chips do. The media has been somewhat confused as to why that is; after all, many of the traditional benchmarks show that the chip performs slower. Nonetheless, on real application benchmarks, Cyrix does very well. If you look at a Cyrix 686 system with Wintune, bearing in mind the discussion here, it's obvious why this is. Memory write performance across the board is considerably higher than on comparable Intel chips. In particular, the caching inside the chip itself works with writes as well as reads, so that the Cyrix chip can write 4K blocks over 3 times as fast as Pentium chips do (this is approximately half the performance of the Pentium Pro in that category). Simply by improving the whole CPU to L2 cache interface, the entire system runs considerably faster, even though everything else about the chip (like reading memory and floating point) is slower than a typical Pentium. Don't be fooled by claims that the only reason the Cyrix systems are faster is that they tend to use a better video card or disk system than the Pentium systems they were compared with. The real reason is the write performance, which as we've already discussed is critical to making video and disks work well.
I can't say I recommend Cyrix chips overall. I'm less than completely convinced that total compatibility is there (read the Cyrix 6x86 Guide for information about problems with NT 4.0). Plus, the trick of using the superior write cache doesn't quite make up in my mind for the inferior integer and floating point speed, especially considering how demanding many of the things I do are in those areas (Quake comes to mind as something you'll find Cyrix owners complaining about, because of the poor floating point).
Another company that has been keeping competitive with Intel is VIA. Their Apollo chipsets are a competitor to the Intel Triton series, and in many performance aspects can be superior. The major supplier of motherboards based on VIA's chips is FIC, in case you want to go looking for more information.
If you run through Wintune, and the results you're getting are significantly lower than the ones in the table I give, you should consider upgrading your motherboard and possibly your CPU. A motherboard based on either the 430HX or 430VX will be an extreme boost in performance for your system. If you've already got a Triton-class motherboard, the later chips aren't quite worth upgrading to in my mind. I'd wait for the next generation of motherboards (possibly optimized for things like MMX technology) before spending the cash for what is only a marginal improvement. If you already have a Triton-era motherboard, and your results are dismal, make sure you actually have L2 cache, that it's enabled, and that the rest of the settings in your BIOS are configured for good performance. This can become its own adventure.
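If you'd rather not eyeball the comparison, a check like the one below does it for you. This is a sketch under stated assumptions: the reference numbers are copied from the table earlier in this article, the 15% tolerance used for "significantly lower" is an arbitrary threshold picked for the example, and `lagging_metrics` is a made-up helper, not anything Wintune provides.

```python
# CPU clock MHz -> (read, write, copy) in MB/s, copied from the table above
REFERENCE = {
    75: (120, 61, 37),
    100: (177, 83, 54),
    133: (235, 83, 59),
    150: (243, 75, 53),
    166: (273, 83, 60),
    200: (310, 84, 61),
}


def lagging_metrics(cpu_mhz, read, write, copy, tolerance=0.15):
    """Names of the metrics that fall more than `tolerance` below the table."""
    names = ("read", "write", "copy")
    measured = (read, write, copy)
    expected = REFERENCE[cpu_mhz]
    return [name for name, got, want in zip(names, measured, expected)
            if got < want * (1 - tolerance)]


# Hypothetical Pentium 100 on an older board: reads fine, writes badly
print(lagging_metrics(100, read=175, write=35, copy=40))
```

An empty list means the machine is in line with the table. Anything listed, write speed especially, is a reason to look at the BIOS settings and the L2 cache before spending money.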
If you have anything less than a Pentium 100, now is the time to upgrade, along with that new motherboard. Pentium 100s are the price/performance leader. You get the benefits of the fastest normal clock speed around (66MHz), and the chips are very cheap at the moment. A P133 is still a good value, with a performance increase almost proportional to the additional cost. P166 and P200 are just too expensive right now to justify for most buyers, when MMX chips are just around the corner. I recommend not going over the 133 unless you're really hurting for a performance boost. The price gouging that you'll see for the faster chips is just not worth it right now, when the performance increase is so small for real applications (the faster chip often just spends more of its time waiting for memory access).
Make sure you've got a modern motherboard, and that whatever CPU you use is using the maximum bus speed possible (even if that means you have to drop the actual CPU speed down a notch). That's the advice I give to those unsatisfied with their current performance. Real performance enthusiasts will need to approach things differently, but that's a topic for another time.