For those who have followed hardware news and reviews for a long time, the idea of testing a processor at low resolution is nothing new; it has almost become a habit to see CPUs benchmarked at relatively low resolutions when their gaming performance is measured.
We know why reviewers generally do it: they want to create a situation where the graphics card (GPU) is not the bottleneck, so that we can measure how the CPU performs without the GPU holding it back. While this is simple to grasp on the surface, it is harder to articulate why it is the correct way to test.
The idea of writing this article occurred to me when Neowin colleague and friend Steven Parker and I were discussing how to benchmark the CPUs in games for the reviews he does. Steve said he did not really like benchmarking games at lower resolutions as running a high-end GPU at something like 1080p or below made little or no sense since it wasn't a realistic way to test such a system.
He jokingly quipped that running games at such a low resolution on a powerful card like a 4090 was a bit like running Cinebench's single-threaded test. Since rendering is a multi-threaded workload, a single-thread Cinebench run is unrealistic; and since we don't assess Cinebench single-threaded results in our reviews for that reason, Steve argued we shouldn't be doing low-resolution game tests either, as the same logic should stand.
This made me dwell on whether Steve had a good point, and whether such testing is simply too academic to serve much real-world usefulness.
It had me thinking for a long while. Sure, I know we lower the game resolution, and sometimes the graphics settings, to take the load off the GPU when testing a CPU, but figuring out whether that makes any real-world sense outside of comparing some bar graphs was harder than I had initially fathomed.
In the next sections of the article, I break down why such testing is conducted and why it is the right way to test.
Quick links:
- Why we shouldn't base our CPU testing on 'realistic' game settings
- Realistic CPU testing is useful and relevant
- Benchmarking 101!
- How low-resolution testing exposes a weak gaming CPU
- Proving this is indeed the right way to test a CPU in games
- More tests to confirm
- The takeaway
Why we shouldn't base CPU testing on realistic game settings
This, frankly speaking, stumped me for a while, because I had a hard time figuring out why Steve was wrong, even though I sort of intuitively understood the flaw in his reasoning. Finally, though, after much thought, I was ready to put the logic into words.
You see, when we set out to run a CPU benchmark in games and choose 'realistic' settings based on the particular system we have, we are setting ourselves up for misleading measurements. Let me explain using a couple of hypothetical situations!
Let's say we have a Ryzen 7 7800X3D that we are trying to benchmark, and we have a complementing RTX 4090 in our system. If we were to measure this setup based on realistic test settings, we would likely choose 4K as our test resolution, and at these settings the PC would give us a certain frame rate.
Now let's assume in the second situation that we instead have a Radeon RX 7800 XT, which is far less powerful than a 4090. So based on realistic settings, we would likely choose 1440p in this second scenario. The frame-rate output this time with a 7800 XT at 1440p would be different than what we had previously with the RTX 4090 at 4K.
Therefore, testing the same CPU at two different realistic settings will lead to two widely different scores even though we are measuring accurately. This means the flaw is not in our testing data but rather in the methodology itself.
This methodology issue gets even more obvious when you think about all the possible hardware combinations (CPU + GPU combos) that can be built around the Ryzen 7800X3D. It is nearly impossible to estimate the 7800X3D's performance across all such systems from realistic-settings data, as every combo will produce widely different results.
Hence, while a reviewer could argue that they are only testing for their own gear, the results would hardly be representative of the 7800X3D's performance had the paired GPU been a different one.
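To make the problem concrete, below is a minimal sketch of the usual simplification that a frame takes roughly as long as the slower of the CPU's and the GPU's work on it. The fps() helper and every millisecond figure are my own illustrative assumptions, not measurements from either card:

```python
# Toy frame-time model: a frame is only done when both the CPU and the GPU
# have finished their part, so the frame time is roughly the larger of the two.
# Every number here is hypothetical and purely illustrative.

def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Approximate frame rate given per-frame CPU and GPU costs."""
    return 1000.0 / max(cpu_ms, gpu_ms)

CPU_MS = 5.0  # assumed per-frame cost of the same 7800X3D in both builds

# "Realistic" settings for each build: 4K on the RTX 4090, 1440p on the RX 7800 XT.
print(f"RTX 4090 @ 4K:      {fps(CPU_MS, gpu_ms=11.0):.0f} fps")  # ~91 fps, GPU-bound
print(f"RX 7800 XT @ 1440p: {fps(CPU_MS, gpu_ms=9.0):.0f} fps")   # ~111 fps, GPU-bound

# Same CPU, two very different "scores". Only when the GPU cost drops below
# the CPU cost do both systems converge on what the CPU can actually do:
print(f"CPU-bound ceiling:  {fps(CPU_MS, gpu_ms=2.0):.0f} fps")   # ~200 fps
```

In this toy model, the two 'realistic' runs disagree with each other and both understate the CPU's ceiling, which is exactly why they make a poor basis for judging the CPU itself.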
Realistic CPU testing is useful and relevant
Of course, this does not mean that game data at higher resolutions like 4K or 1440p is not useful. That data in itself is meaningful to determine how capable a certain setup is at those particular resolutions. After all, most people with an RTX 4090, outside of eSports players and competitive gamers, are far more likely to play at 4K or ultrawide 1440p.
However, it is not an accurate way to gauge what the CPU itself is truly capable of.
Benchmarking 101!
First, it's important to understand the basics of benchmarking and the purpose it's meant to serve.
To put it simply, benchmarking hardware means gauging the maximum performance or throughput that the hardware is capable of, and an essential criterion is that the tested hardware itself is the only limiting factor in that scenario; nothing else should be holding it back.
For example, if we are testing a graphics card, the GPU must be the limiting factor, not the CPU or the PCIe interface; if it's an SSD, the drive itself must be the limiting factor, not the SATA/M.2 interface or some other drive we are transferring data to and from.
Likewise, when we test a processor to evaluate its gaming performance, that CPU must be what's limiting the performance in that game and not the GPU (or something else).
The concept is essentially the same as bottlenecking, except that when benchmarking, the goal is to intentionally bottleneck the component under test. Therefore, for CPU game testing, it has to be the CPU that is the bottleneck.
There are two simple ways to create a CPU bottleneck in a game:
1) Lowering the resolution and graphics settings (at least some):
Resolution is typically the most intense workload for a graphics processing unit (GPU): pushing more pixels hits both the GPU core and its memory (VRAM) the hardest. Hence, once the resolution is turned down sufficiently, we establish a CPU-bound situation (a toy sketch after this list illustrates the effect).
Interestingly, and although it might seem counter-intuitive at first, we try to keep certain graphics settings maxed, since settings such as shadows, geometry/tessellation, and ray tracing have an impact on the CPU as well. Others, like textures, are turned down. The level of detail (LOD) is also lowered so that the GPU has to render fewer details at a distance.
2) Pairing with a powerful GPU, like an RTX 4090:
To further help make the CPU the bottleneck, top-tier graphics cards like the RTX 4090, the 7900 XTX, or the 4080 Super are used. A mid-tier card like an RX 7700 XT, an RTX 4060 Ti/4060, or an RX 7600 XT will generally have a hard time not becoming the bottleneck itself, since games are, in general, more demanding on the GPU than on the CPU.
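Here is a quick toy sweep of the resolution lever, again with made-up numbers: the assumption that the GPU's per-frame cost scales roughly with the pixel count while the CPU's cost stays flat is a simplification for illustration only.

```python
# Illustrative only: hypothetical per-frame costs, not measured data.
CPU_MS = 6.0            # assumed CPU cost per frame, independent of resolution
GPU_MS_AT_2160P = 14.0  # assumed GPU cost per frame at 3840x2160

RESOLUTIONS = {"2160p": (3840, 2160), "1440p": (2560, 1440),
               "1080p": (1920, 1080), "720p": (1280, 720)}

for name, (w, h) in RESOLUTIONS.items():
    # Scale the GPU cost with the pixel count relative to 4K.
    gpu_ms = GPU_MS_AT_2160P * (w * h) / (3840 * 2160)
    bound = "GPU-bound" if gpu_ms > CPU_MS else "CPU-bound"
    print(f"{name}: {1000 / max(CPU_MS, gpu_ms):4.0f} fps  ({bound})")
```

Notice how the frame rate plateaus once the CPU becomes the limiter; dropping the resolution further changes nothing, which is exactly the behaviour the real tests below show.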
How low-resolution testing exposes a weak gaming CPU
I ran multiple game tests on my rig, consisting of a Ryzen 7 5700G and an RX 6800 XT, to demonstrate this. The system runs two 16GB sticks of DDR4-3600 CL16 memory for 32GB in total. To simulate a less powerful CPU, I disabled cores and converted my octa-core, 16-thread 5700G into a dual-core, quad-thread part, making it similar to an entry-level AMD Athlon or an Intel Pentium.
In terms of testing methodology, I measured two different situations.
- To create a GPU-bound instance, I dialed all the graphics settings, including the resolution, up as far as possible (without overflowing the VRAM buffer).
- On the other end, to create a CPU-bound scenario, I lowered the resolution and settings as far as the game would allow (without changing the 16:9 aspect ratio).
Some graphics settings, like those related to shadows and geometry (tessellation), were kept maximized as these are known to be CPU-dependent. On the other hand, the object level of detail (LoD) was minimized to make the scenes as heavy on the processor as possible.
Shadow of the Tomb Raider
To become GPU-bound in 2018's Shadow of the Tomb Raider (SotTR), the game was run at 2880p with the Resolution Modifier (RM) maxed out for 100% resolution scaling. The cool thing about the Tomb Raider benchmark is that it tells us how GPU-bound the game is, and at 2880p I managed to be "100%" GPU bound.
You may notice that I have "RT shadows on" and "off" mentioned in the graphs. That's because I tested two scenarios: one where ray-traced shadows were used and another where they were disabled (the game automatically falls back to the rasterized alternative). Ray tracing is also said to require some decent CPU grunt, and this was done to account for that.
RT on
The performance of the simulated Athlon/Pentium (2c/4t) and the 5700G (8c/16t) is almost identical at 2880p. However, the octa-core is truly able to stretch its arms and legs once I lower the resolution to 634p (1128x634) and minimize the Resolution Modifier (RM), such that the benchmark becomes CPU-bound; the SotTR benchmark confirmed this, reporting the results as "0% GPU bound."
At these CPU-bound settings (634p, RM min), the 5700G comes out to be around twice as fast as the simulated Athlon, clearly exposing the weakness of the dual core and the advantage of having four times more cores. This was not evident at the higher resolution where the game was GPU-bound.
The averages are nearly double here (145 vs 73), and the gaps in the 5% lows and minimums are even bigger.
RT off
In the next situation, where the ray-traced shadows are replaced with the less CPU-demanding rasterized shadows, the dual-core manages to gain some ground on the octa-core, though the performance difference is still substantially in favour of the latter. The eight-core 5700G easily outpaces the dual-core part at the CPU-bound settings: the 8c/16t is 65% faster than the 2c/4t in terms of averages, and once again, the gap widens in the case of the 5% lows and minimums.
The SotTR charts above depict how high-resolution realistic settings can be misleading in the case of CPU benchmarking.
If one were to only look at the 2880p data in the above images, one would get the impression that a dual-core chip is just as capable as the octa-core for gaming. However, the low-res numbers reveal the full picture and demonstrate the actual performance gap between the two CPUs. The data shows the 8-core is capable of pushing up to an average of 152 and 145 frames per second (fps), meanwhile, the dual-core is limited to around 92 and 73 fps.
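Working those gaps out from the averages reported above (152 vs 92 fps with RT off, 145 vs 73 fps with RT on):

```python
# Lead of the 8c/16t over the simulated 2c/4t at the CPU-bound settings,
# computed from the average fps figures in the charts above.
for label, octa, dual in [("RT off", 152, 92), ("RT on", 145, 73)]:
    print(f"{label}: {octa / dual:.2f}x, i.e. +{(octa / dual - 1) * 100:.0f}%")
# RT off: 1.65x (+65%)   RT on: 1.99x (+99%, roughly double)
```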
The extra frames on the CPU-bound octa-core 5700G do not appear out of thin air. The higher number of cores and threads on the 8c/16t 5700G helps it keep up with the 6800 XT as the 8-core is capable of processing the game logic, physics, and draw calls much faster than the 2-core can.
This difference is not evident in the GPU-bound situation as the graphics card is under a much heavier load leading to longer GPU processing time.
Hence, the weaker dual-core is able to catch a breather and completes processing the game logic, physics, and draw calls in around the same amount of time the octa-core does.
The Assassin's Creed: Origins GPU-bound 3888p frametime chart below highlights this well, as both the dual-core and the octa-core report 33 ms of processing time for the GPU.
Proving this is indeed the right way to test a CPU in games
From 1128 x 634 (634p), I went a step further and lowered the pixel count to 800 x 600, and the results were identical to what I had got at 634p. This may appear surprising at first, since lowering the resolution typically leads to an increase in frame rate, but that doesn't happen here, indicating that I had successfully bottlenecked the CPU and no further drop in resolution would lead to any performance improvement.
On the flip side, adding a more powerful card in this situation won't add any more frames either, as the 6800 XT is nowhere near maxed out, with its usage hovering around a lowly 30-50%, meaning it still has a ton of headroom left.
So, even if I were to replace the 6800 XT with an RTX 4090 (the latter is twice as fast), the fps numbers of the two processors here (octa-core 5700G and the simulated dual-core 5700G) would not change as they won't be able to process the game at a rate faster than what they are already doing.
Hence if you think about it, this confirms that I have effectively maxed out the 5700G CPU's performance.
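A rough back-of-the-envelope check supports this. It uses the RT-on CPU-bound average of 145 fps and the observed 30-50% GPU usage from above; treating utilization as the share of each frame the GPU spends busy is only an approximation:

```python
# Rough headroom check with figures from the runs above; converting GPU
# utilization into per-frame busy time is a simplification, not a measurement.
cpu_bound_fps = 145                    # 5700G, SotTR RT on, CPU-bound settings
cpu_ms = 1000 / cpu_bound_fps          # ~6.9 ms the CPU needs per frame
gpu_busy_ms = cpu_ms * 0.40            # at ~40% usage the 6800 XT works ~2.8 ms of it

# A GPU twice as fast (think RTX 4090) only shrinks the GPU's share further:
faster_gpu_ms = gpu_busy_ms / 2        # ~1.4 ms
print(f"Frame time stays ~{max(cpu_ms, faster_gpu_ms):.1f} ms, "
      f"so still ~{1000 / max(cpu_ms, faster_gpu_ms):.0f} fps")
```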
More tests to confirm
Assassin's Creed: Origins
Aside from SotTR, I also tested more games to further illustrate the point. Up next is Assassin's Creed: Origins, which, although a slightly older title (released in 2017), is known to be fairly CPU-heavy.
Like in SotTR, I again increased the resolution to as far as I could go in order to become fully GPU-bound. I managed to do so at 3888p (6912 x 3888). At the polar opposite end, I benchmarked at 360p (640 x 360) for a CPU-bound situation.
And it's a similar story here too. At the 360p settings, the dual-core part manages to go up to 50 fps, a 67% increase compared to the 30 fps it managed at GPU-bound 3888p settings. Unsurprisingly, the octa-core gains a whole lot more, almost 200%, or nearly three times, as it goes from 30 fps up to 89.
Final Fantasy XV
Up next, I have Final Fantasy XV (FF XV) and this is one of those games that seems to scale nearly perfectly with cores and threads.
Unfortunately, since I used the FF XV Windows Edition standalone benchmark, I was unable to go higher than 2160p (4K), which meant I couldn't create a fully GPU-bound situation in which both the dual-core and the octa-core would put out similar fps figures.
Nonetheless, the numbers are still insightful, as the octa-core performs around 3.73 times as fast as the dual-core at the CPU-bound 720p resolution, which is the near-perfect scaling I mentioned before. In contrast, the dual-core only improved by about 30% when I dropped from 2160p down to 720p.
It is interesting to note that FFXV, despite being a DirectX 11 title, is able to utilize CPU resources this well, something typically associated with DirectX 12.
Far Cry 6
Finally, I have Far Cry 6 (FC 6), which I ran at 3024p (5376 x 3024). The numbers below show that I wasn't completely GPU-bound even at 3024p, as the octa-core setup manages to render significantly more frames. However, going any higher was not possible, as the game was already allocating more than 16GB of VRAM on my card (I guess I needed a 4090 or a 7900 XTX/XT to go higher).
In the total-frames-rendered metric, the dual-core goes up from 1651 to 3527 frames as I go down from 3024p to 360p, more than doubling. The octa-core, meanwhile, more than triples, reaching 6277 frames, up from 1999.
The outcome is fairly similar in FC6 as well. The average sees a bigger uplift at 360p on the octa-core (103 fps, up from 59, or +74.6%) than on the dual-core.
The minimums are, again, much more in favour of the more powerful octa-core: a 92.7% advantage for the 8c/16t at 360p (79 vs 41), as opposed to a 36.36% advantage (30 vs 22) at the more GPU-bound 3024p. This indicates that the game, despite not being heavily multi-threaded, does enjoy the benefits of those extra threads in some of the intense scenes where the dual-core is overwhelmed.
The takeaway
The numbers speak for themselves. In all the tested games, the lower resolutions expose the real difference between a more powerful processor and a less powerful one, and the 5700G in its full octa-core glory was undoubtedly faster. As such, even if I were to pair a hypothetical future super-powerful Nvidia flagship, say an RTX 9090 Ti Super, with my 5700G, I can confidently say the 8-core would easily outperform the simulated 2-core.
Ultimately, the point of benchmarking a CPU should not just be about how well the chip performs today, or how well it does with a particular GPU, but also about gauging how powerful the CPU truly is, which is what lets us estimate the processor's potential in terms of future performance. The test data above helps us understand exactly that.