Disclaimer (please read this before up- or downvoting): I'm reporting this early, based on the limited info available, to spread awareness and encourage more testing by qualified testers and analysis by qualified experts. That's why I've marked this as a rumour, even though it's neither a baseless rumour nor anywhere close to settled fact. We're nowhere near knowing the true culprit(s) for Blackwell's performance inconsistency, and this only necessitates additional work and analysis before we can draw valid conclusions.
Not trying to push an agenda here or arm copers, just highlighting a potentially serious issue with Blackwell's AMP logic that warrants investigation. If the AMP issue is real and can be fixed in software, it'll take an army of NVIDIA software engineers to rewrite application- and game-specific code and/or rework the NVIDIA driver stack.
Detailing The Performance Inconsistencies
Blackwell's overall performance consistency and application support are extremely lackluster compared to the RTX 40 series, resembling an Intel ARC launch more than a usual rock-solid NVIDIA launch where relatively uniform uplifts are observed across the board and everything usually just works. Application performance sees the wildest inconsistencies, but they extend to gaming as well. Lackluster performance at 1080p and 1440p plagues both the 5090 and 5080, and is only somewhat resolved at 4K.
On the 50 series, Delta Force and Counter-Strike 2 experience FPS regressions going from the 4080S to the 5080, as shown in Hardware Unboxed's 5080 review. In the same review, TLOU and Spider-Man manage atrocious 0% gains for the 5080 vs the 4080S at 1440p. The result is that when using upscalers, the performance gains of the 50 series over the 40 series tank. And remember, upscaling is almost mandatory for heavy RT games, a key selling point for NVIDIA graphics cards. But the worst example yet is probably TechPowerUp's Elden Ring ray tracing result at 1080p, where the 5080 trails even the 4070 Super. In the same review, at native 1080p, both the 5090 and 5080 show a 5-8 FPS regression vs the 4090 and 4080S in Elden Ring.
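To see why a frontend/scheduling bottleneck would show up at 1080p/1440p but mostly vanish at 4K, here's a minimal frame-time sketch. All the millisecond figures here are invented purely for illustration (the overhead value is an assumption, not a measurement):

```python
# Hypothetical frame-time model: why a fixed scheduling overhead hurts
# low resolutions more than 4K. All numbers are made up for illustration.

def fps(cpu_submit_ms, gpu_render_ms, sched_overhead_ms=0.0):
    # A frame can't finish faster than either the CPU-side submission
    # path (driver + context scheduling) or the GPU render work.
    frame_ms = max(cpu_submit_ms + sched_overhead_ms, gpu_render_ms)
    return 1000.0 / frame_ms

for label, gpu_ms in [("1080p", 4.0), ("1440p", 6.0), ("4K", 12.0)]:
    base = fps(cpu_submit_ms=5.0, gpu_render_ms=gpu_ms)      # direct path
    amp = fps(cpu_submit_ms=5.0, gpu_render_ms=gpu_ms,
              sched_overhead_ms=1.5)                         # extra hop (assumed)
    print(f"{label}: {base:5.1f} fps -> {amp:5.1f} fps "
          f"({(amp / base - 1) * 100:+.1f}%)")
```

With these made-up numbers, 1080p drops ~23%, 1440p ~8%, and 4K is untouched because the GPU render time hides the overhead entirely, which is exactly the pattern the reviews show.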
But the oddest thing so far has been that the RT performance uplift was consistently worse than the raster uplift in nearly every single 5080 review. This is clearly shown in TechPowerUp's review, where in the majority of games the 5080 and 5090 saw larger penalties from turning on RT than the equivalent 40 series cards.
Then there's the obvious lack of support for a ton of professional and AI applications, where reviewers had to wait for an update adding Blackwell support, which often still hadn't arrived even a week later when the 5080 launched. Maybe it's just me, but I don't recall this level of incompatibility with professional applications in any of the previous launches (20-40 series); isn't it unprecedented for an NVIDIA generation?
And when applications do work, their performance is sometimes broken, resulting in a 5090 losing to even an RTX 4080. Just watch some of the professional-workload-centric reviews and you'll see how bad it is.
The most insane performance degradation I've seen outside of professional workloads is Guru3D's 3DMark ray tracing testing. In the hybrid ray tracing benchmark they observed a 21% lead for the 5080 over the 4080S. But in the full path tracing benchmark the 5080 was 31% slower than the 4080S and could only match a 4070 Super. The 5090 has the same issue, although to a lesser degree, showing a 45% lead over the 4090 in hybrid RT vs a 24% lead in full PT.
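To put those Guru3D figures on a common scale, here's the quick math on how far each card's full-path-tracing result falls short of what its own hybrid-RT scaling would predict (ratios taken straight from the percentages above):

```python
# Sanity math on the Guru3D numbers quoted above: how far does each
# card's full-PT result fall short of its hybrid-RT scaling?

cards = {
    # card: (hybrid-RT ratio vs last gen, full-PT ratio vs last gen)
    "5080 vs 4080S": (1.21, 0.69),   # +21% hybrid, -31% full PT
    "5090 vs 4090":  (1.45, 1.24),   # +45% hybrid, +24% full PT
}

for card, (hybrid, full_pt) in cards.items():
    shortfall = 1 - full_pt / hybrid
    print(f"{card}: full PT lands {shortfall:.0%} below hybrid-RT scaling")
```

That works out to roughly a 43% shortfall for the 5080 and 14% for the 5090, i.e. the heavier the scheduling load, the bigger the gap.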
The Possible Culprit
Thanks to NVIDIA's Blackwell GPU Architecture Whitepaper we might have a likely culprit: probably not the only one, but probably the most significant. The new AI Management Processor is in fact much more than just an AI workload scheduler:
"The AI Management Processor (AMP) is a fully programmable context scheduler on the GPU designed to offload scheduling of GPU contexts from the system CPU. AMP enhances the scheduling of GPU contexts in Windows to more efficiently manage different workloads running on the GPU. A GPU context encapsulates all the state information the GPU needs to execute one or more tasks."
"The AI Management Processor is implemented using a dedicated RISC-V processor located at the front of the GPU pipeline, and it provides faster scheduling of GPU contexts with lower latency than prior CPU-driven methods. The Blackwell AMP scheduling architecture matches the Microsoft architectural model that describes a configurable scheduling core on the GPU through Windows Hardware-Accelerated GPU Scheduling (HAGS), introduced in Windows 10 (May 2020 Update)."
AMP is an on-die RISC-V context scheduler with extremely low latency and high-bandwidth access to the Gigathread Engine. It sits in front of the Gigathread Engine, offloads context scheduling from the CPU, and taps into Hardware-Accelerated GPU Scheduling (HAGS), supported in Windows 10 and 11. This tight co-integration is crucial for MFG, neural rendering, and LLM integration into video games, and beneficial to multitasking, content creation, and existing gaming experiences. But it doesn't just magically work as intended: it requires a proper code implementation and can be a double-edged sword (more on that later).
Doing the largest redesign of the GPU-wide frontend (not the GPC level) since Fermi introduced the Gigathread Engine in 2010, without significant game and/or driver code rewrites, is asking for trouble. On the 40 series and prior, the CPU communicated directly with the Gigathread Engine. But on the 50 series, assuming no code rewrites, the CPU has to communicate with the Gigathread Engine through the AMP, which adds latency and scheduling overhead, or partially or completely breaks scheduling. The result is severe performance degradation without code rewrites, as seen with Elden Ring RT and 3DMark path tracing. It's also not surprising that, when implementing a change this profound, some applications just straight up refuse to work. A toy model of the two scheduling paths is sketched below.
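Here's that toy model: the extra CPU -> AMP hop is treated as a fixed per-context-switch cost, and heavier RT/PT frames are assumed to juggle more contexts per frame. Every latency and switch count below is invented for illustration, not measured:

```python
# Toy model of the two scheduling paths. The extra CPU -> AMP hop is a
# fixed per-context-switch cost; all figures are invented, not measured.

HOP_US = {"CPU->Gigathread": 10.0,          # legacy direct path (assumed)
          "CPU->AMP->Gigathread": 15.0}     # extra forwarding hop (assumed)

def frame_time_ms(base_gpu_ms, switches_per_frame, path):
    # Each context switch pays the path's scheduling latency.
    return base_gpu_ms + switches_per_frame * HOP_US[path] / 1000.0

# Assumption: heavy RT/PT frames juggle more contexts (graphics +
# compute + denoise, etc.), so they get more switches per frame here.
for workload, switches in [("raster", 40), ("hybrid RT", 120), ("full PT", 400)]:
    old = frame_time_ms(8.0, switches, "CPU->Gigathread")
    new = frame_time_ms(8.0, switches, "CPU->AMP->Gigathread")
    print(f"{workload:10s}: {1000/old:6.1f} -> {1000/new:6.1f} fps "
          f"({(old/new - 1) * 100:+.1f}%)")
```

Even with these made-up numbers the qualitative behaviour matches the reviews: raster loses a couple of percent, hybrid RT more, and full PT the most, because the per-switch overhead scales with how much context juggling a frame does.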
There's a Twitter post by "Osvaldo Pinali Doederlein" on AMP, where people are discussing why AMP could be causing Blackwell's inconsistent performance and how big the performance impact of Blackwell's new CPU -> AMP -> Gigathread scheduling paradigm is without code rewrites. Psst: it's likely -10 to -20%.
The 5090 also seems to have worse memory and SRAM latencies and an L1 bandwidth regression vs the 4090, as reported by "harukaze5719" on Twitter. This is unrelated to AMP, but could in some instances explain part of the 5090's performance degradation (vs the mean uplift).
Conclusion
It's too early to pass definitive judgment on Blackwell's performance issues: what's causing them, how likely they are to be software and/or hardware related, and whether they can even be fixed. With that said, there's clearly a lot wrong, and the issue spans many games and many different types of compute and AI applications.
Getting 5080 gaming uplifts anywhere from below -20% (Elden Ring RT 1080p) to +40% (CP2077, Hardware Canucks 5080 review) is unprecedented for an NVIDIA launch. Blackwell has severe performance inconsistencies reminiscent of ARC Battlemage. This shouldn't be dismissed as node stagnation or something to be expected. No, there's clearly something wrong at a fundamental level, and it could be a hardware flaw, broken software, or a combination of both.
Hopefully NVIDIA can iron out the issues and improve Blackwell's overall performance and consistency over time, enough to deter AMD from doing something really stupid with RDNA 4's pricing.