AMD’s DirectX 12 Advantage Explained – GCN Architecture More Friendly To Parallelism Than Maxwell

Since the release of Ashes of the Singularity, a lot of controversy has surrounded AMD’s spectacular results and NVIDIA’s underwhelming ones. Was this DX12 benchmark gimped in order to run faster on AMD’s hardware? Apparently not, as forum member ‘Mahigan’ has shed some light on why there are such dramatic differences between AMD’s and NVIDIA’s results.

What’s also interesting here is that Mahigan has provided a number of slides to back up his claims (which is precisely why we believe this explanation is legit).

As Mahigan pointed out, Maxwell’s Asynchronous Thread Warp can queue up 31 Compute tasks and 1 Graphics task, whereas AMD’s GCN 1.1/1.2 is composed of 8 Asynchronous Compute Engines (each able to queue 8 Compute tasks, for a total of 64), coupled with 1 Graphics task handled by the Graphics Command Processor.
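To make the queue arithmetic concrete, here is a minimal sketch that simply adds up the figures quoted above. The constant and function names are our own and purely illustrative; the numbers are Mahigan’s, not independently verified.

```python
# Queue figures as quoted from Mahigan; all names here are illustrative only.
MAXWELL_COMPUTE_QUEUES = 31   # compute tasks Maxwell can queue
MAXWELL_GRAPHICS_QUEUES = 1   # graphics tasks Maxwell can queue

GCN_ACE_COUNT = 8             # Asynchronous Compute Engines on GCN 1.1/1.2
GCN_QUEUES_PER_ACE = 8        # compute tasks each ACE can queue
GCN_GRAPHICS_QUEUES = 1       # handled by the Graphics Command Processor


def total_queues(compute: int, graphics: int) -> int:
    """Return the combined number of queued tasks."""
    return compute + graphics


gcn_compute = GCN_ACE_COUNT * GCN_QUEUES_PER_ACE
maxwell_total = total_queues(MAXWELL_COMPUTE_QUEUES, MAXWELL_GRAPHICS_QUEUES)
gcn_total = total_queues(gcn_compute, GCN_GRAPHICS_QUEUES)

print(f"Maxwell:     {MAXWELL_COMPUTE_QUEUES} compute + "
      f"{MAXWELL_GRAPHICS_QUEUES} graphics = {maxwell_total}")
print(f"GCN 1.1/1.2: {gcn_compute} compute + "
      f"{GCN_GRAPHICS_QUEUES} graphics = {gcn_total}")
```

On these figures, GCN can keep roughly twice as many compute tasks queued as Maxwell, which is the gap Mahigan’s argument rests on.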

This basically means that in terms of parallelism, GCN GPUs should be able to surpass their direct Maxwell rivals, something we’ve been witnessing in the Ashes of the Singularity benchmark.

It’s been known that under DX11, NVIDIA has delivered better results than its rival. According to Mahigan, this is mainly because NVIDIA’s graphics cards handle Serial Scheduling better than Parallel Scheduling.

“nVIDIA, on the other hand, does much better at Serial scheduling of work loads (when you consider that anything prior to Maxwell 2 is limited to Serial Scheduling rather than Parallel Scheduling). DirectX 11 is suited for Serial Scheduling therefore naturally nVIDIA has an advantage under DirectX 11.”

Regarding the really curious results of DX11 and DX12 on NVIDIA’s graphics cards, Mahigan had this to say:

“People wondering why Nvidia is doing a bit better in DX11 than DX12. That’s because Nvidia optimized their DX11 path in their drivers for Ashes of the Singularity. With DX12 there are no tangible driver optimizations because the Game Engine speaks almost directly to the Graphics Hardware. So none were made. Nvidia is at the mercy of the programmers talents as well as their own Maxwell architectures thread parallelism performance under DX12. The Developers programmed for thread parallelism in Ashes of the Singularity in order to be able to better draw all those objects on the screen. Therefore what we’re seeing with the Nvidia numbers is the Nvidia draw call bottleneck showing up under DX12. Nvidia works around this with its own optimizations in DX11 by prioritizing workloads and replacing shaders. Yes, the nVIDIA driver contains a compiler which re-compiles and replaces shaders which are not fine tuned to their architecture on a per game basis. NVidia’s driver is also Multi-Threaded, making use of the idling CPU cores in order to recompile/replace shaders. The work nVIDIA does in software, under DX11, is the work AMD do in Hardware, under DX12, with their Asynchronous Compute Engines.”

As for AMD’s underwhelming DX11 results, Mahigan claimed that this is mainly due to GCN’s architecture, as the graphics cards are limited by DX11’s use of only 1-2 CPU cores for the graphics pipeline.

“But what about poor AMD DX11 performance? Simple. AMDs GCN 1.1/1.2 architecture is suited towards Parallelism. It requires the CPU to feed the graphics card work. This creates a CPU bottleneck, on AMD hardware, under DX11 and low resolutions (say 1080p and even 1600p for Fury-X), as DX11 is limited to 1-2 cores for the Graphics pipeline (which also needs to take care of AI, Physics etc). Replacing shaders or re-compiling shaders is not a solution for GCN 1.1/1.2 because AMDs Asynchronous Compute Engines are built to break down complex workloads into smaller, easier to work, workloads. The only way around this issue, if you want to maximize the use of all available compute resources under GCN 1.1/1.2, is to feed the GPU in Parallel… in comes Mantle, Vulkan and Direct X 12.”
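The difference between feeding the GPU from one thread and feeding it in parallel can be pictured with a toy model. This is only a sketch of the idea: `record_commands`, `submit_serial` and `submit_parallel` are hypothetical names of our own, no real graphics API is involved, and Python threads will not show an actual speed-up here. It only illustrates how the work is structured.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(object_ids):
    """Hypothetical stand-in for recording one command list on the CPU."""
    return [f"draw({i})" for i in object_ids]


def submit_serial(object_ids):
    # DX11-style toy: a single thread records every draw call.
    return record_commands(object_ids)


def submit_parallel(object_ids, workers=4):
    # DX12/Mantle/Vulkan-style toy: split the scene into chunks, record each
    # chunk on its own worker, then hand the lists over in their original order.
    chunk = max(1, -(-len(object_ids) // workers))  # ceiling division
    chunks = [object_ids[i:i + chunk]
              for i in range(0, len(object_ids), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        command_lists = list(pool.map(record_commands, chunks))
    return [cmd for command_list in command_lists for cmd in command_list]


scene = list(range(10))
serial = submit_serial(scene)
parallel = submit_parallel(scene)
print(serial == parallel)
```

Both paths produce the same list of draw calls; the parallel path just spreads the recording work across several workers, which is the kind of CPU-side parallelism Mahigan says GCN needs in order to be kept busy.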

This is definitely interesting and should help people understand why Ashes of the Singularity performs so well on AMD’s GPUs.

Do note that draw calls are not a game’s only possible bottleneck under DX12. Both 3DMark’s DX12 benchmark and Ashes of the Singularity use a lot of draw calls. However, a game may instead hit a Geometry or Rasterizer Operator bottleneck, in which case an NVIDIA GPU will outperform an AMD GPU.

What this ultimately means is that NVIDIA will have to redesign its graphics cards in order to handle more draw calls in parallel. A software solution sounds almost impossible at this stage, though NVIDIA’s engineers may come up with some interesting techniques to overcome this limitation. That, or some DX12 games may hit another bottleneck that favours NVIDIA’s GPUs over AMD’s.