AMD’s K6-2 CPUs were a pretty big deal in 1998. They were still going to use the Socket 7 platform, with no on-die or on-package L2 cache, and depend on motherboard-based cache. But AMD was aware that the new CPUs wouldn’t cut it on performance against the competition using the same old Socket 7 standard. The Pentium II was performing very well, in part due to its better L2 (Level 2) cache setup, which ran at half the core’s clock speed. Hence, even the lower-clocked 266 MHz Pentium II had its L2 cache running at 133 MHz, which is much higher than the standard Socket 7 motherboard cache. And the second-generation Celeron (Mendocino), due in August ’98, was going to bring strong performance at a lower price and likely be more of a threat to AMD, since the Pentium II was on the expensive side.
The Socket 7 and Super Socket 7 Platform
The standard Socket 7 platform has a Front Side Bus (FSB) speed of 66 MHz, which means that the Northbridge, memory controller, RAM, and Level 2 cache also run at 66 MHz. While the Celeron’s FSB was 66 MHz as well, the new Celeron was going to have 128 KB of on-die L2 cache running at the CPU’s clock speed. So every time the Celeron’s core clock increased, the L2 cache clock increased with it, giving better performance scaling. The K6-2, of course, was still going to rely on an external Level 2 cache on the motherboard like every other Socket 7 CPU before it. Unless AMD made a change, no matter how high the K6-2’s clock speed climbed, the L2 cache was going to stay at the same, very slow 66 MHz, causing the CPU to scale and perform worse than it should. So a big bottleneck was clear: the performance of the L2 cache, and by extension the memory controller, needed a boost. Since adding some kind of L2 cache to the CPU die wasn’t within AMD’s timeframe, they did the next best thing. They came out with the Super7 initiative, which increased the Front Side Bus from 66 MHz to 100 MHz, a very solid 50% increase. This in turn allows the Northbridge, memory, and the very important Level 2 cache to run 50% faster.
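To make the cache-clock comparison above concrete, here is a minimal sketch of the three L2 arrangements described so far. The function name and the arrangement labels are made up for illustration, and the clock values in the usage note are only examples.

```python
def l2_cache_clock_mhz(core_mhz, fsb_mhz, arrangement):
    """Return the L2 cache clock in MHz for a given cache arrangement.

    The arrangement labels are hypothetical names for the three cases
    described in the text.
    """
    if arrangement == "motherboard":
        # Socket 7 / Super7: the cache sits on the board and runs at the FSB clock.
        return fsb_mhz
    if arrangement == "half_speed":
        # Pentium II: the cache runs at half the core clock.
        return core_mhz / 2
    if arrangement == "on_die":
        # Mendocino Celeron: the cache runs at the full core clock.
        return core_mhz
    raise ValueError(f"unknown arrangement: {arrangement}")
```

For example, `l2_cache_clock_mhz(266, 66, "half_speed")` gives 133 MHz for the Pentium II case above. A motherboard cache stays at the FSB clock no matter what core clock is passed in, so only a faster FSB can speed it up.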
While there were other criteria included in AMD’s Super7 standard, like AGP support, it is generally the official 100 MHz FSB support that most people associate with Super7, and that will be the focus here. SDRAM was usually seen as a requirement due to being able to run at 100 MHz reliably compared to EDO RAM.
The Overview
This little review will use a pre-CXT* core 333 MHz K6-2 CPU, also referred to as the K6-3D. It runs under a few different speed configurations: the common 5 x 66 MHz bus, 3.5 x 100 MHz bus, and overclocked 3 x 112 MHz bus settings. The CPU is unfortunately a bad overclocker and didn’t seem to like the highest 5.5 x 66 MHz setting, which would give a top clock speed of 366 MHz. I had a 350 MHz CXT core CPU that would easily clock into the 400+ MHz range, but I accidentally fried it.
*The CXT core improved on the earlier core in a few areas. The more notable changes are remapping the 2x multiplier to a 6x multiplier and improving memory handling by expanding the Write Allocation limit up to 4 GB and adding an 8-byte Write Merge buffer for Write Combining to reduce utilization and stalls. The performance improvement is usually 5% or less clock for clock. I want to do some testing on the performance gains from the CXT core in a future article, though, to get a better idea.
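The three speed configurations listed above all follow from core clock = multiplier x FSB. A minimal sketch, assuming the "66 MHz" bus is nominally 66.67 MHz (which is what makes 5 x 66 land on the rated 333 MHz); the names here are purely illustrative:

```python
# Nominal bus clocks in MHz. Treating the "66 MHz" bus as 200/3 = 66.67 MHz
# is an assumption, not something stated in the review.
NOMINAL_FSB_MHZ = {"66": 200 / 3, "100": 100.0, "112": 112.0}

# (label, multiplier, FSB in MHz) for the three configurations in the review.
CONFIGS = [
    ("5 x 66", 5.0, NOMINAL_FSB_MHZ["66"]),
    ("3.5 x 100", 3.5, NOMINAL_FSB_MHZ["100"]),
    ("3 x 112", 3.0, NOMINAL_FSB_MHZ["112"]),
]


def core_clock_mhz(multiplier, fsb_mhz):
    """Core clock is simply the multiplier times the front side bus clock."""
    return multiplier * fsb_mhz


def describe(configs=CONFIGS):
    """Map each configuration label to its core clock, rounded to whole MHz."""
    return {label: round(core_clock_mhz(mult, fsb)) for label, mult, fsb in configs}
```

Calling `describe()` gives 333, 350, and 336 MHz for the three settings, so the core clocks stay within about 5% of each other while the bus speed changes by 50% or more.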
The Test System and Settings
All BIOS settings were at their best possible configuration. The desktop resolution is 1024×768 using 32-bit color. The SDRAM used Turbo timings, CAS 2, 4-Way Interleaving, etc. The system runs the AGP bus at 1x speed and uses a 7200 RPM IDE hard drive, with a CD-ROM and floppy drive attached.
I’m using an Nvidia GeForce 6200 256 MB AGP video card, which is probably not the best choice for Socket 7. I’ve had various issues, apparently compatibility related, but was able to run most of the benchmarks I intended to. This card may not be the best one to pair with Socket 7 for performance in older games either. But it was one of the first I found in my stash that booted and worked without many problems overall. I did eventually find other cards that may be a better match for Socket 7 (like an Nvidia GeForce 2 MX), but for this review I stayed with the 6200 since I was already well into the benchmarking.
Hardware
- CPU: AMD K6-2 (.25 Micron Pre-CXT Core), 333 MHz Stock
- Motherboard: FIC VA-503+ v1.2a, JE4330 Bios, 1 MB Onboard Cache (256 MB Ram Cacheable), VIA MVP3 Chipset, AGP 1.0
- BFG GeForce 6200 256 MB AGP 4/8x, BFGR62256OC
- 1x 128 MB Infineon PC100 SDRAM CAS 2-3-2-0, Double Sided
- No Sound Card
- Hitachi Deskstar 164 GB 7200 RPM 8MB Cache ATA133 (ATA33 Mode) IDE Hard Drive
- Realtek 10/100 PCI Ethernet Card
- Basic 300 Watt ATX Power Supply
Software and Settings
- Operating System: Windows 98 SE with 3.56 Unofficial Service Pack
- Desktop Resolution: 1024×768 using 32 Bit Color
- Chipset Drivers: Via 4 in 1 v4.35
- Video Card Drivers: nVidia Forceware 81.98
- Web Browsers: Firefox 2.0 and Internet Explorer 6.0
Benchmark Software
- Sandra 2004 Standard – CPU, Multimedia, Memory, and Cache Tests
- PCMark 2002
- Fritz Chess Bench 4.3
- 7-Zip 9.20
- Quake 3 Arena Demo
- Unreal Tournament with Official Patches
- Expendable Demo
DOS Benchmark Pack (PhilsComputerLab) – I used a handful of preset benchmark options from this pack. They were all run from within the Windows 98 environment.
- 3DBench 1.0c
- PC Player 320×200 8bpp
- PC Player 640×480 8bpp
- Doom 1 (Min Details)
- Doom 1 (Max Details)
- Quake 1 – Default Timedemo
- Quake 1 (360×480)
- Quake 1 – (640×480)
Application and Synthetic Performance
Synthetic benchmarks are usually best-case, highly optimized usage scenarios. But they do give a good look at the best-case potential of the different hardware configurations used in testing.
Sandra 2004 Standard Edition
The first result shown is the synthetic CPU Arithmetic test.
In the Arithmetic test core clock speed is most important. The code does not rely much on the secondary cache or memory subsystem so the higher overall clock speed shows the best results for both ALU and Floating Point operations.
The Multimedia test uses optimized MMX SIMD code for Integer operations and 3DNow! for the Floating-Point calculations. Similar to the Arithmetic results the code does not rely on the cache/memory subsystem so raw clock speed again has the best results.
This is an unbuffered memory bandwidth test. As expected, when the code reaches out to memory, the higher the front side bus and memory clock speed, the higher the theoretical amount of data you can move. The gains are nearly linear: each percent you raise the speed of the bus and memory yields about the same percent increase in MB/s of data moved.
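The near-linear scaling matches the theoretical picture, where peak bandwidth is just bus width times bus clock. A rough sketch, assuming a 64-bit (8-byte) memory bus that transfers once per clock, as is normal for PC100 SDRAM on this platform. The function names are illustrative, and these are ceilings, not the measured Sandra scores:

```python
BUS_WIDTH_BYTES = 8  # assumed 64-bit data path, one transfer per clock


def peak_bandwidth_mb_s(bus_mhz, bus_width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak bandwidth in MB/s (1 MB taken as 10**6 bytes)."""
    return bus_mhz * bus_width_bytes


def percent_gain(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100
```

With these assumptions a 100 MHz bus tops out at 800 MB/s and a 112 MHz bus at 896 MB/s. The 12% higher clock gives a 12% higher ceiling, which is the linear relationship the benchmark tracks.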
The numbers listed for the Cache/Memory result are the combined scores. Remember that the onboard cache and memory run at the same clock speed as the FSB in every test. Going from original Socket 7 to Super 7 shows an increase of 30%. The Filesystem test at 112 MHz shows that the drive is faster when the PCI bus is running out of spec at 37 MHz.
This and the memory test before it show that the potential for real performance increases is there. Let’s see how things look in more real-world scenarios.
Content Creation Winstone 2000 Version 1.0
This was a very popular, high-end test suite in the years around 2000. It simulates a wide variety of tasks using applications like Adobe Photoshop and Premiere, along with encoding, web browsing, and much more.
I decided to add Winstone later, after I had taken out the original 333 MHz CPU used in every other test (apart from a couple of tests below, which are also noted as using the newer CPU). I had a 450 MHz K6-2 installed that I had just gotten and was testing. Instead of taking out the 450 MHz CPU, I simply set it to the same FSB/multiplier settings used for this review. Even though the new one is a CXT core, Write Combining didn’t look to be enabled, and the overall performance and scaling should be similar between the two either way.
The change to the Super7 FSB brings a solid gain in performance: over 20% more with the FSB increase. These are gains in what were considered some of the most popular programs back then. Applications like Photoshop and Premiere were very popular productivity tools and would readily welcome any extra performance. The change to a 112 MHz FSB still shows almost a 2% gain, despite the 100 MHz setup having over 4% more raw CPU clock speed.
Winbench 99 v1.1
Winbench 99 is a suite of tools that calculates CPU and FPU performance and measures Business and High-End hard disk and graphics performance. As with the test above, I used the different CPU for this bench but applied the same settings for accurate testing.
The gains in this suite can be quite strong. The CPU and FPU performance is influenced more by raw processor clock speed and doesn’t look to rely on the FSB or memory subsystem much. But running the Disk and Graphics suites shows increases going from standard Socket 7 to Super7. I’m not sure how they feed their Disk bench, but the increases are 8.3% for the Business test and 13.5% for the High-End test. The Graphics bench shows very large increases of almost 30% with Business graphics and 27% for High-End. These are very impressive increases, and the 112 MHz bus continues to add more.
PC Mark 2002
This is a multi-test benchmark application that mimics real world usages like number crunching, media playback, encoding, etc. The chart below includes the CPU, Memory, and HDD scores.
These results again show that real-world scenarios can see real increases with the FSB changes. Going from the 66 MHz FSB to the 100 MHz FSB gives the CPU result a solid 11.7% increase. The memory-intensive scenarios saw a huge increase of 35%. While the overclocked FSB of 112 MHz didn’t help with the CPU score, running the PCI bus out of spec (37.3 MHz instead of 33.3 MHz) gave the HDD a boost of about 7.8%.
Fritz Chess Benchmark 4.3
Fritz Chess Benchmark is based on the engine used in Fritz Chess. This benchmark stresses the CPU’s integer capabilities and calculates the number of nodes per second, or the number of positions the CPU can push the engine to generate and evaluate each second. As a reference, the program says that a Pentium 3 with 256 KB L2 @ 1 GHz calculates 480 nodes per second.
Going from the original Socket 7 66 MHz FSB to the Super 7 100 MHz bus gives improved results even though the CPU’s overall clock speeds are within 5% of each other: a solid 11% increase in performance. After that initial increase in FSB and memory clock speed, the further 12% increase to a 112 MHz FSB does nothing to increase the score.
7-Zip 9.20
7-Zip is a very popular file compression/decompression utility, likely one of the most popular Zip-style utilities these days. I used a 16 MB dictionary size. This is a test I forgot to run originally. As mentioned in a couple of other places, I had taken out the original 333 MHz CPU and had a new 450 MHz K6-2 installed that I had just gotten and was testing. Instead of taking out the 450 MHz CPU, I simply set it to the same FSB/multiplier settings used for this review and benched it in place of the older CPU. Even though the new one is a CXT core, Write Combining didn’t look to be enabled, and the overall performance scaling should be similar between the two either way.
The Compression performance is very low. No matter what the FSB speed is, the performance does not want to scale. Even with the slight core clock speed advantage of 350 MHz over 333 MHz, there is no change. Once the Decompression code kicks in, there is a clear separation between the 66 MHz FSB and the 100 MHz FSB. The 50% higher bus speed helps to give a notable 8.8% increase in real-world performance.
3DBench 1.0c (DOS)
This is an old DOS-based benchmark utility that draws and renders a 3D scene and calculates the frames per second.
With this test, the 100 MHz FSB gives a 6.3% increase in performance over the 66 MHz FSB. Interestingly, raising the bus by just 12 MHz more gives another 3% increase. Perhaps running the other buses like PCI and AGP 12% higher also played a role in the increase over the 100 MHz bus speed. On the motherboard used for this test, the 100 MHz FSB runs the PCI and other buses within spec. But the increase to 112 MHz causes PCI to run at 37 MHz instead of 33 MHz and AGP to run at 74 MHz instead of 66 MHz.
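The out-of-spec bus clocks above are what you get if PCI and AGP are derived from the FSB by fixed ratios. A small sketch, assuming ratios of 1/3 for PCI and 2/3 for AGP at these bus speeds. The ratios are inferred from the figures quoted in this review rather than taken from the board’s documentation, and the names are illustrative:

```python
# Assumed divider ratios at 100 MHz and above, inferred from the
# 33/66 MHz (in spec) and 37/74 MHz (overclocked) figures in the text.
PCI_RATIO = 1 / 3
AGP_RATIO = 2 / 3


def derived_bus_clocks(fsb_mhz):
    """Return the PCI and AGP clocks in MHz that follow from a given FSB."""
    return {"pci": fsb_mhz * PCI_RATIO, "agp": fsb_mhz * AGP_RATIO}


def overclock_percent(fsb_mhz, in_spec_fsb_mhz=100.0):
    """How far above the in-spec FSB every derived bus ends up, in percent."""
    return (fsb_mhz / in_spec_fsb_mhz - 1) * 100
```

Under these assumptions a 112 MHz FSB puts PCI at about 37.3 MHz and AGP at about 74.7 MHz. Both are 12% over spec, the same margin as the FSB itself.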
PC Player Benchmark (DOS)
This program is similar to the previous 3DBench. The program will draw, render, and animate a 3D scene to give the score in frames per second.
Going from standard Socket 7 to Super 7, the results at the 320×200 and 640×480 resolutions show an impressive 17% and 20% increase in performance. You continue to gain as you overclock the FSB another 12 MHz: the increases are about 1.8% and 6.5% respectively.
Gaming Test
Doom 1 (DOS)
This game is DOS based and used the default options specified in the DOS Benchmark Pack. The game’s built-in bench returns its result in realtics. To get the frames per second, you divide 74690 by the number of realtics. So if you have 250 realtics, you divide 74690 by 250, which equals 298.76 frames per second.
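The conversion is easier to follow once you know where the constant comes from. Doom’s timedemo reports how many game tics the demo contains and how many realtics it took to play back, and the game runs at 35 tics per second, so the constant in the formula is the demo’s gametics count times 35. A minimal sketch with an illustrative function name, taking the gametics count as a parameter instead of hard-coding it:

```python
DOOM_TICS_PER_SECOND = 35  # Doom's fixed game tic rate


def fps_from_realtics(gametics, realtics):
    """Convert a Doom timedemo result to frames per second.

    gametics: demo length in game tics, as reported by the timedemo
    realtics: real-time tics the playback took, as reported by the timedemo
    """
    if realtics <= 0:
        raise ValueError("realtics must be positive")
    return gametics * DOOM_TICS_PER_SECOND / realtics
```

As a made-up example, a 1000-tic demo that finishes in 250 realtics works out to `fps_from_realtics(1000, 250)`, or 140 frames per second. Fewer realtics for the same demo means a faster system.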
Gaming is where performance increases would be readily expected, and those increases are there. The Minimum settings test shows an almost 10% increase going from the 66 MHz FSB to 100 MHz. Going another 12 MHz on the FSB nets over a 4% increase. The Max settings test shows a 7.3% increase from 66 to 100 MHz, and adding 12 MHz to the FSB gains another 5.4%. The big increase over 100 MHz is likely due to the other buses and devices being overclocked by 12%.
Quake 1 (DOS OpenGL)
This bench tested all 3 of the pre-configured Quake options in the DOS Benchmark Pack.
Another solid increase for gaming. The 360×480 resolution saw a 17% increase and the 640×480 resolution saw almost 21%. The final configuration looks to possibly offload certain functions to the video card and made the game run and look much better overall (if I am remembering correctly). It gained a very solid 17% in performance going to Super 7 FSB speeds. There wasn’t much to be had from the additional 12 MHz on the FSB.
Quake 3 Arena (OpenGL)
A very popular game in its day. It was a very common benchmark for testing CPU and GPU hardware. Everything was set to Default except Lighting, which was set to Vertex (GPU). The resolution was 640×480 with 16-bit color depth.
As the numbers show, there is another big increase due to the FSB settings: a very large 28.2% increase in performance changing the FSB from 66 MHz to 100 MHz. Another 12 MHz on the FSB nets another 4.8% increase.
Expendable (Direct3D)
This is a game that was considered to be pretty brutal on CPUs in its day. It uses DirectX 6 and has 3DNow! support. I used Default settings. I did try different resolutions with this game, but with the video card I used, the performance at 640×480 16-bit was similar to 1024×768 32-bit, showing a truly CPU-limited scenario. So I decided to just use the 1024×768 32-bit color setting with everything else on default.
And the good trend continues. The gain in Min FPS is 12.5%, Max is 25%, and Average is about 18.8%. These are very notable gains in a game that pushed the CPUs of the time very hard. The Minimum FPS stayed the same when going the extra 12 MHz to a 112 MHz FSB, but Max gained another 5.8% and Average got 5.7%.
Unreal Tournament (Direct3D)
A very popular game that I remember playing online many, many times. The engine uses Microsoft DirectX 7 and has 3DNow! support. This test uses what I believe is called the Flyby scene. Everything was set to default with a 640×480 resolution and 16-bit color depth.
This is the final test and it continues to show very solid gains. Going from original Socket 7 to a Super7 FSB, Minimum FPS showed an 18.6% gain, Maximum gained 25.7%, and Average got 22%. These are the kind of gains that turn a subpar gaming experience into a playable, more enjoyable one. If your motherboard supports and handles even more FSB frequency over 100 MHz, this test shows an extra 12 MHz will net you another 7% in Minimum, 3% in Maximum, and 4.5% in Average FPS.
Conclusion
By bringing out the Super7 standard, AMD obviously made its K6-2 CPUs much more performant and desirable than they otherwise would have been upon release. Many tests, especially in the gaming world, had very solid gains from raising the motherboard’s standard Socket 7 FSB of 66 MHz to the Super7 speed of 100 MHz. In fact, the gains can be noticeable and make the difference between a game being unplayable and an overall playable experience. There were very solid 20%+ increases in performance multiple times. In every case the biggest difference in CPU core clock speed was only 5%, so the overall gains really were because of the increase in Front Side Bus clock speed. The price-to-performance ratio is what made the K6-2 such a popular and successful CPU. You could usually save hundreds of dollars and get performance not far from the Pentium II CPUs of the time in multiple areas. The Socket 7 platform with a Super7 K6-2 would make for a solid, high-performing retro machine.