
AMD submits a new patent for an MCM “GPU chiplet” design for future RDNA architectures

It appears that AMD has filed a new patent with the US Patent and Trademark Office (USPTO), which surfaced on December 31, 2020, and outlines the company’s plans for its future GPU technology. The filing was spotted by Twitter user @davideneco25320.

The patent outlines a potential GPU chiplet design that AMD plans to implement. Its filing date appears to be 06/28/2019, and AMD has submitted similar MCM patents in the past as well.

According to AMD, conventional “monolithic” die designs are becoming increasingly expensive to manufacture, so the company plans to adopt a GPU chiplet design instead. Until now, the high latency between chiplets has meant that MCM approaches were passed over in favor of monolithic designs.

We can also expect significant yield gains from shifting to an MCM-based approach instead of a monolithic design. Because of its poor yields, a single huge die is expensive to produce, and wasted silicon is also an issue.

AMD MCM GPU chiplet design-1

The GPU programming model also maps poorly onto chiplets, because parallel work is difficult to distribute across multiple different working groups and chiplets. It is likewise expensive, design-wise, to synchronize the memory contents of shared resources throughout the entire system in order to present applications with a coherent view of memory.

Additionally, from a logical point of view, applications are written with the view that the system has only a single GPU. That is, even though a conventional GPU includes many GPU cores, applications are programmed as if addressing a single device.

So, it has been historically challenging to bring chiplet design methodology to GPU architectures.

According to AMD, such problems can be avoided by implementing a high-bandwidth “passive crosslink”, which is essentially an on-package interconnect. The first GPU chiplet would be communicably coupled directly to the CPU, while each of the other GPU chiplets in the array would be coupled to the first GPU chiplet via the passive crosslink.

To improve system performance by using GPU chiplets while preserving the current programming model, AMD illustrates systems that utilize high-bandwidth passive crosslinks for coupling GPU chiplets.

The described system includes a central processing unit (CPU) communicably coupled to the first graphics processing unit (GPU) chiplet of a GPU chiplet array. The array includes the first GPU chiplet, communicably coupled to the CPU via a bus, and a second GPU chiplet, communicably coupled to the first GPU chiplet via a passive crosslink.

The passive crosslink is a passive interposer die dedicated to inter-chiplet communication. AMD plans to use it for communication between the GPU chiplets, which all sit on a single interposer.
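To make that topology concrete, here is a minimal Python sketch of how such a chiplet array could be wired together, based only on the arrangement described above; the class and attribute names (GpuChiplet, PassiveCrosslink, ChipletArray) are hypothetical and do not come from the patent.

```python
# Illustrative model of the topology described above: the CPU talks to one
# "primary" GPU chiplet over a bus, and that chiplet reaches its siblings
# over a passive crosslink on a shared interposer. All names are hypothetical.

class GpuChiplet:
    def __init__(self, chiplet_id: int):
        self.chiplet_id = chiplet_id
        self.crosslink = None  # set when the chiplet is attached to the interposer


class PassiveCrosslink:
    """Passive interposer die dedicated to inter-chiplet communication."""

    def __init__(self):
        self.chiplets = {}  # chiplet_id -> GpuChiplet

    def attach(self, chiplet: GpuChiplet) -> None:
        self.chiplets[chiplet.chiplet_id] = chiplet
        chiplet.crosslink = self


class ChipletArray:
    """CPU <-> bus <-> first chiplet <-> passive crosslink <-> other chiplets."""

    def __init__(self, num_chiplets: int):
        self.crosslink = PassiveCrosslink()
        self.chiplets = [GpuChiplet(i) for i in range(num_chiplets)]
        for chiplet in self.chiplets:
            self.crosslink.attach(chiplet)
        self.primary = self.chiplets[0]  # the only chiplet the CPU bus connects to


array = ChipletArray(num_chiplets=4)
print(f"CPU bus -> chiplet {array.primary.chiplet_id}; "
      f"{len(array.chiplets) - 1} sibling chiplets reachable via the crosslink")
```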

This could mean that a group of GPU chiplets acts like a System-on-a-Chip (SoC) that has been divided into separate functional chips. The new chiplet design also appears to suggest that each GPU chiplet will be a GPU in its own right, fully addressable by the operating system (OS).

When it comes to the cache, current architectures already have at least one level of cache (e.g., an L3 or other last-level cache (LLC)) that is coherent across the entire conventional GPU die. With the design in this patent, the chiplet-based GPU architecture places those physical resources (e.g., the LLC) on different dies and communicably couples them so that the LLC level is unified and remains cache-coherent across all GPU chiplets.

During operation, a memory address request from the CPU to the GPU is transmitted to only a single GPU chiplet, which then communicates over the high-bandwidth passive crosslink to locate the requested data. From the CPU’s point of view, it appears to be addressing a single-die, monolithic GPU.

This allows for use of a large-capacity, multi-chiplet GPU that appears as a single device to an application.
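As a rough illustration of that request flow, the sketch below routes a CPU memory request to a single “primary” chiplet, which works out which chiplet’s LLC slice owns the cache line and, if needed, reaches it over the crosslink. The address-interleaving scheme and all names here are assumptions for illustration only; the patent does not spell out how addresses map to chiplets.

```python
# Sketch of the request flow: the CPU only ever addresses a single "primary"
# chiplet, which resolves which chiplet's LLC slice owns the cache line and,
# if needed, forwards the request over the passive crosslink. The interleave
# scheme and all names are illustrative assumptions, not patent details.

NUM_CHIPLETS = 4
CACHE_LINE_BYTES = 64

def owning_chiplet(addr: int) -> int:
    """Hypothetical interleave: spread consecutive cache lines across chiplets."""
    return (addr // CACHE_LINE_BYTES) % NUM_CHIPLETS

def service_cpu_request(addr: int, primary: int = 0) -> str:
    owner = owning_chiplet(addr)
    if owner == primary:
        return f"addr 0x{addr:04x}: served by the LLC slice on primary chiplet {owner}"
    return (f"addr 0x{addr:04x}: primary chiplet {primary} forwards the request "
            f"over the crosslink to the LLC slice on chiplet {owner}")

# From the CPU's perspective there is only one GPU to talk to.
for addr in (0x0000, 0x0040, 0x0080, 0x00C0):
    print(service_cpu_request(addr))
```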

Such a design would see the cache communicably coupled to the other physical resources so that it stays unified and remains coherent across all GPU chiplets. AMD has not officially confirmed that it is working on such a GPU chiplet design, but past rumors have suggested that the next-gen RDNA 3 architecture and its successors will be based on a multi-chip design.

AMD already has experience with a multi-chip design with the Ryzen CPU lineup and various APUs, and NVIDIA and Intel will both have their own GPU chiplet-style architectures in the coming years as well.

This is a block diagram illustrating a processing system employing high bandwidth passive crosslinks for coupling GPU chiplets in accordance with some embodiments.

AMD MCM GPU chiplet design-2

This next block diagram illustrates a sectional view of GPU chiplets and passive crosslinks in accordance with some embodiments.

AMD MCM GPU chiplet design-3

This block diagram illustrates a cache hierarchy of GPU chiplets coupled by a passive crosslink in accordance with some embodiments.

AMD MCM GPU chiplet design-4

This block diagram illustrates a floor plan view of a GPU chiplet in accordance with some embodiments.

AMD MCM GPU chiplet design-5

This block diagram illustrates a processing system utilizing a four-chiplet configuration in accordance with some embodiments.

AMD MCM GPU chiplet design-6

This is a flow diagram illustrating a method of performing inter-chiplet communications in accordance with some embodiments.

AMD MCM GPU chiplet design-7

It remains to be seen whether AMD will indeed adopt this GPU chiplet design for its future RDNA architectures. For what it’s worth, Intel has already adopted a tile-based design for its upcoming Xe-HP graphics cards, whereas NVIDIA is rumored to introduce its first MCM (multi-chip module) design with the next-gen Hopper architecture.

Stay tuned for more tech news!

18 thoughts on “AMD submits a new patent for an MCM “GPU chiplet” design for future RDNA architectures”

  1. Would be nice to see AMD be competitive on the GPU high-end again. It has been many, many years.

    1. I guess you have not seen the RT benchmarks. RDNA2 is good for legacy workloads and competes with Ampere in that respect, at least at lower resolutions. Once you use next-gen features like RT and DLSS, you see just how far behind RDNA2 is. To make matters worse, AMD is trying to match Nvidia on price with the 3080/6800 XT.

  2. I wish AMD success with this. I have believed for years that MCM will be the future of GPUs. I have heard the rumors that Nvidia will introduce MCM with Hopper GPUs. It makes sense to me.

    1. Ya, the Hopper GPU architecture should have this MCM approach to its design, but there are rumors circulating on the web that Hopper might end up only in the server and enterprise/HPC market segment.

      They might not target the consumer gaming segment.

      Before Hopper, Nvidia might also release another GPU architecture codenamed “Lovelace”, named after Ada Lovelace, an English mathematician and writer of the early-to-mid 1800s, known for her work on Charles Babbage’s mechanical general-purpose computer, the Analytical Engine.

      But again, all of this is entirely based on gossip and rumors, and with how little we know about Nvidia’s next GPU architectures, there’s a huge chance most of this info is wrong.

      Though, one part of this rumor that does make sense is the codename “Lovelace”. A few years back, at CES 2018, Nvidia CEO Jensen Huang was wearing a T-shirt with the names of several famous mathematicians.

      And given Nvidia’s love for naming architectures after mathematicians, many believe Jensen was leaking the names of future architectures on his T-shirt. One of those names was Lovelace, lol.

      https://uploads.disquscdn.com/images/23354e3ebc142b9bcff496e4660d87752b39802022b0620e67f0c8b6c2348ae6.jpg

      1. When I hear the name Lovelace I think of Linda Lovelace the p^rn actress. The star of Deep Throat.

        Jensen could probably make a better choice with another mathematician.

        I believe Nvidia will react to pressure from AMD if they do bring MCM to the consumer market and release something even if it’s not Hopper.

    1. Nvidia are well underway with chiplet design, while Intel already has working GPUs. Again it looks like AMD are dragging their heels.

      1. RDNA3 is huge!! It’s very scalable and meant to be multi-purpose: high-end gaming GPUs, APUs, mobile GPUs, consoles (PS5 Pro & Xbox Series X!!), laptops, SoCs, etc. AMD wants to develop one big architecture for multiple purposes.

    2. Nvidia already put out a white paper about MCM-type GPUs a few years ago: combine several GPUs, but have the system see them as only one. Their DGX-2 also tried to build on this idea, with 16 Volta GPUs inside that the system sees as one big GPU, and that was back in 2018. So on the software side of things, which is the most complicated part of an MCM design, Nvidia probably has it covered. Intel’s Xe is also trying to make this a reality. So it is hardly bad news for Nvidia or Intel when they already have some sort of implementation being worked on, probably even ahead of AMD at this time.

      1. The thing is, AMD has a “savoir-faire” when it comes to making small chiplets like Zen’s CCXs linked via “Infinity Fabric”, and they have excellent, skilled engineers mastering this “technique”. We’ve seen what they’ve achieved with Zen 1 through Zen 3 (and even RDNA2!). So, if we are talking about chiplet design, scalability, and connectivity (Infinity Fabric), I think AMD is way ahead!!

        1. Sometimes Nvidia does not show it, but that doesn’t mean they are behind their competitors. AMD might have Infinity Fabric, but Nvidia also has NVLink, and now they have Mellanox under their belt, which is the world’s best when it comes to interconnect tech. We have seen it a few times in the past where Nvidia seemed late in adopting certain tech but ended up doing it better going forward: tessellation, GDDR memory, and I suspect the same happened with HBM.

    3. This is old tech: the 3dfx Voodoo 2 had the same concept. I am not talking about SLI, I am talking about the fact that a single Voodoo 2 board had three chips on board. Each of the three chips had its own 64-bit RAM interface, giving the card a “total” bus width of 192 bits, or 800 MB/s per chip, and the Voodoo 2’s third chip was a second TMU for multi-texturing. Seems like the same logic being applied here…

      1. Small GCX (GPU complex) units interconnected via Infinity Fabric: it’s like having quad-CrossFire in one huge chip with very, very low latency!! This is the way AMD wants to go!! Let’s hope Intel comes out with good products to make prices low again!!

  3. This is an awesome news article! I hope AMD goes for this MCM stuff. Can’t wait to see how this chiplet design works on GPUs.
