Well, here is some good news for NVIDIA users. It appears that NVIDIA will fully implement Async Compute via an upcoming driver. As Oxide’s developer “Kollock” wrote on Overclock.net, NVIDIA has not fully implemented yet Async Compute in its driver, and Oxide is working closely with them in order to achieve that.
“We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We’ll keep everyone posted as we learn more.”
From the looks of it, NVIDIA will – at least for now – rely on a software/hardware solution for Async Compute instead of a fully hardware one.
As Overclock’s Mahigan explained:
“The Asynchronous Warp Schedulers are in the hardware. Each SMM (which is a shader engine in GCN terms) holds four AWSs. Unlike GCN, the scheduling aspect is handled in software for Maxwell 2. In the driver there’s a Grid Management Queue which holds pending tasks and assigns the pending tasks to another piece of software which is the work distributor. The work distributor then assigns the tasks to available Asynchronous Warp Schedulers. It’s quite a few different “parts” working together. A software and a hardware component if you will.
With GCN the developer sends work to a particular queue (Graphic/Compute/Copy) and the driver just sends it to the Asynchronous Compute Engine (for Async compute) or Graphic Command Processor (Graphic tasks but can also handle compute), DMA Engines (Copy). The queues, for pending Async work, are held within the ACEs (8 deep each)… and ACEs handle assigning Async tasks to available compute units.
Maxwell 2: Queues in Software, work distributor in software (context switching), Asynchronous Warps in hardware, DMA Engines in hardware, CUDA cores in hardware.
GCN: Queues/Work distributor/Asynchronous Compute engines (ACEs/Graphic Command Processor) in hardware, Copy (DMA Engines) in hardware, CUs in hardware.”