AMD has recently filed and published a new patent application titled “Chiplet-integrated Machine Learning Accelerators”, which outlines techniques for performing machine learning operations. In it, AMD describes what the company calls a new MLA (Machine Learning Accelerator) chiplet design.
The patent application covers a machine-learning chiplet that would be integrated into a package alongside GPU cores and a cache chiplet to create what AMD calls an “APD”, or Accelerated Processing Device. The GPU cores here might refer to the upcoming RDNA 3 architecture, and the cache unit is most likely a version of AMD’s Infinity Cache design.
The main purpose of this machine learning accelerator would be to speed up machine learning workloads, such as matrix multiplication operations.
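For context, the core operation such an accelerator targets looks like this; a plain numpy sketch of one dense layer’s matrix multiply, purely illustrative and not AMD’s hardware pipeline:

```python
# The core workload MLA-style hardware accelerates: dense matrix
# multiplication, shown as a plain numpy sketch (illustrative only,
# not AMD's hardware pipeline).
import numpy as np

# A toy "layer": a batch of 64 activations with 512 features each,
# multiplied by a 512x256 weight matrix.
activations = np.random.rand(64, 512).astype(np.float32)
weights = np.random.rand(512, 256).astype(np.float32)

output = activations @ weights  # one matrix multiply = one dense layer
print(output.shape)  # (64, 256)
```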
Moreover, in the APD described in this patent, the ML accelerator and the cache sit on the same chiplet.
By including such a chiplet, AMD could add machine-learning capabilities to several of its designs in a modular way. Could this be AMD’s answer to Nvidia’s DLSS technique? We will likely find out once it gets implemented.
The patent’s description mentions different process nodes for each type of chiplet, implying that AMD might use different manufacturing technologies within this chiplet-based design.
Moreover, the memory modules of each MLA can be dynamically configured to act either as a cache (e.g., Infinity Cache) or as directly accessible memory, depending on the scenario.
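Conceptually, that dual role could be modeled like this; a minimal sketch of the idea only, since the patent does not spell out the actual mechanism, and all names here are hypothetical:

```python
# Conceptual sketch of the dual-mode memory idea (all names are
# hypothetical; the patent does not disclose the actual mechanism).
class MLAMemoryBank:
    CACHE = "cache"    # bank backs the GPU as cache (e.g., Infinity Cache)
    DIRECT = "direct"  # bank is directly addressed by the ML accelerator

    def __init__(self, size_bytes: int):
        self.size_bytes = size_bytes
        self.mode = self.CACHE  # default role

    def configure(self, mode: str) -> None:
        """Switch the bank's role depending on the workload."""
        assert mode in (self.CACHE, self.DIRECT)
        self.mode = mode

bank = MLAMemoryBank(16 * 2**20)      # a 16 MiB bank
bank.configure(MLAMemoryBank.DIRECT)  # hand it over for ML work
print(bank.mode)                      # direct
```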
The patent also describes how cache requests from the GPU die to the cache chiplet are handled. Lastly, the MLA may serve as a physical interface between the APD core and a higher tier of memory; that is, the MLA sits physically between the APD core and the GDDR/HBM memory.
Based on the patent’s description, the chiplet is embedded in the substrate and can act both as a silicon bridge and as an accelerator. As mentioned above, this design also allows the chiplet’s memory to be used on the fly either as a cache or as directly addressable memory.
As the patent Abstract reads:
“Techniques for performing machine learning operations are provided. The techniques include configuring a first portion of a first chiplet as a cache; performing caching operations via the first portion; configuring at least a first sub-portion of the first portion of the chiplet as directly-accessible memory; and performing machine learning operations with the first sub-portion by a machine learning accelerator within the first chiplet.”
The filing goes on to note that machine learning is a rapidly advancing field, and that improvements to hardware for machine learning operations, such as training and inference, are constantly being made. The methods provided can be implemented in a general-purpose computer, a processor, or a processor core.
Suitable processors include, by way of example, a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine.
Such processors can be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediary data, including netlists (such instructions capable of being stored on computer-readable media).
The results of such processing can be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements features of the disclosure.
The patent application was filed on July 20, 2020, and published on January 28, 2021.
Stay tuned for more tech news!
Hello, my name is NICK Richardson. I’ve been an avid PC and tech fan since the good old days of the RIVA TNT2 and 3dfx Interactive “Voodoo” gaming cards. I mostly play first-person shooters, and I’m a die-hard fan of the FPS genre, going back to the good old Doom and Wolfenstein days.
MUSIC has always been my passion/roots, but I started gaming “casually” when I was young, on Nvidia’s GeForce3 series of cards. I’m by no means a hardcore gamer, but I just love stuff related to PCs, games, and technology in general. I’ve been involved with many indie metal bands worldwide and have helped them promote their albums to record labels. I’m a very broad-minded, down-to-earth guy. MUSIC is my inner expression, and soul.

Good. In any case, I would love to see AMD’s answer to DLSS. The touted Smart Access Memory (SAM) feature turned out not to be AMD-exclusive, since Nvidia and Intel can also implement it.
But DLSS has been using dedicated tensor core hardware. I would love to know how AMD plans to take a similar approach. When it comes to hardware-accelerated ray tracing, AMD is far behind Nvidia.
But the new RDNA 2 cards do have ray tracing cores, if I’m not mistaken?
Tensor cores are just cores that can do 8 concurrent 4-bit operations in a single 32-bit computation unit. This is just like Rapid Packed Math from AMD’s Vega, which allows 2 concurrent 16-bit operations in a single 32-bit computation unit. There are many algorithms where you don’t need full 32-bit precision, or even 16-bit precision. A good example is the fast 4-bit convolution used by DLSS.
AMD confirmed 4-bit support on the PC and Xbox versions of RDNA 2. The PlayStation 5 supports only 16-bit floats, with no support for 8-bit or 4-bit integers.
“For larger 64-bit (or double precision) FP data, adjacent registers are combined to hold a full wavefront of data. More importantly, the compute unit vector registers natively support packed data including two half-precision (16-bit) FP values, four 8-bit integers, or eight 4-bit integers”
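To make “packed data” concrete, here is a quick Python sketch (not AMD code) of eight 4-bit integers living in one 32-bit word, the register layout the quote describes:

```python
# Sketch (not AMD code): pack eight unsigned 4-bit integers into one
# 32-bit word and unpack them again -- the layout the quote describes.

def pack_int4(values):
    """Pack eight unsigned 4-bit integers (0..15) into a 32-bit word."""
    assert len(values) == 8 and all(0 <= v <= 15 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)  # each value gets its own 4-bit slot
    return word

def unpack_int4(word):
    """Extract the eight 4-bit integers from a 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]

packed = pack_int4([1, 2, 3, 4, 5, 6, 7, 8])
print(hex(packed))          # 0x87654321
print(unpack_int4(packed))  # [1, 2, 3, 4, 5, 6, 7, 8]
```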
AMD’s version of DLSS will use DirectML for 4-bit convolution, so it should work much like DLSS. Microsoft even created its own Super Resolution 4-bit convolution algorithm in 2019 and ran it on Nvidia Tensor cores to showcase DirectML. That algorithm will work on RDNA 2, Nvidia Tensor Cores, and next-gen Intel GPUs.
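For reference, one common way to run a DirectML-backed model today is through ONNX Runtime’s DirectML execution provider. A minimal sketch follows; the model file name is hypothetical, and this is not Microsoft’s actual Super Resolution code:

```python
# Sketch: running an upscaling model via ONNX Runtime's DirectML
# execution provider (requires the onnxruntime-directml package).
# "super_resolution.onnx" is a hypothetical model file, not
# Microsoft's actual 2019 Super Resolution network.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "super_resolution.onnx",
    providers=["DmlExecutionProvider"],  # route the graph to DirectML
)

# A dummy low-resolution frame: batch of 1, RGB, 540x960.
low_res = np.random.rand(1, 3, 540, 960).astype(np.float32)

input_name = session.get_inputs()[0].name
(high_res,) = session.run(None, {input_name: low_res})
print(high_res.shape)  # e.g. (1, 3, 1080, 1920) for a 2x upscaler
```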
The problem, though, is that RDNA 2 has no dedicated hardware to run it on. That means it has to join the workload queue, running as a compute-based job.
Yes, but with support for native 4-bit compute you can reserve 4 CUs, which can execute 32 concurrent ops. The average modern GPU has 60-80 CUs, so reserving 4 CUs for resolution reconstruction is not a big problem, and you gain more performance from the lower rendering resolution. Compare that to older hardware like RDNA 1, where you would need to reserve 16 CUs for 32 concurrent ops (at 16-bit), or GCN, where you would need 32 CUs for 32 concurrent ops.
This is why low-precision concurrent ops are so important. You need very few resources to do a huge amount of work if your job doesn’t require full 32-bit precision.
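The CU arithmetic in that comment can be checked with a quick back-of-the-envelope script, counting packed ops per 32-bit ALU slot (real CUs contain many such slots, so only the ratios matter here):

```python
# Back-of-the-envelope check of the CU arithmetic above, counting
# packed ops per 32-bit ALU slot. Real CUs contain many slots, so
# absolute throughput is far higher; only the ratios matter.
TARGET_OPS = 32  # concurrent ops the commenter wants to reserve

ops_per_32bit_slot = {
    "GCN (32-bit only)": 1,
    "RDNA 1 (2x 16-bit packed)": 2,
    "RDNA 2 (8x 4-bit packed)": 8,
}

for arch, ops in ops_per_32bit_slot.items():
    print(f"{arch}: {TARGET_OPS // ops} CUs for {TARGET_OPS} concurrent ops")
# GCN (32-bit only): 32 CUs for 32 concurrent ops
# RDNA 1 (2x 16-bit packed): 16 CUs for 32 concurrent ops
# RDNA 2 (8x 4-bit packed): 4 CUs for 32 concurrent ops
```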
Yup, the AMD Radeon RX 6000 series actually features Ray Accelerators.
One official AMD slide elaborated on the hardware component that Radeon RX 6000 Series GPUs leverage for ray tracing: the Ray Accelerator (RA). Each Compute Unit carries one Ray Accelerator, as shown below:
These RA units are responsible for the hardware acceleration of ray tracing in games. The RX 6900 XT features 80 RAs, the RX 6800 XT features 72, and the RX 6800 has 60. The same Ray Accelerators can be found in the RDNA 2-based next-gen gaming consoles.
The Ray Accelerator is specialized hardware that handles ray intersection tests, providing an order-of-magnitude increase in intersection performance compared to a software implementation.
https://uploads.disquscdn.com/images/d9d63830c38ced6fadfbe8a60ce448f9a92f1a33b14d1953d97b0be95cec4e19.jpg
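To give a sense of what gets accelerated, here is a minimal software version of one test an RA performs in hardware: the slab-method ray vs. axis-aligned bounding box intersection used during BVH traversal (illustrative only; the hardware also handles ray/triangle tests):

```python
# Illustrative software version of one test Ray Accelerators perform
# in hardware: slab-method ray vs. axis-aligned bounding box (AABB)
# intersection, as used during BVH traversal.

def ray_hits_aabb(origin, inv_dir, box_min, box_max):
    """origin, inv_dir: ray origin and 1/direction per axis;
    box_min, box_max: AABB corners. Returns True on a hit."""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0       # keep near/far slab order
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False          # slabs don't overlap: a miss
    return True

# A ray along +X from the origin vs. a unit box centered at (5, 0, 0):
print(ray_hits_aabb((0.0, 0.0, 0.0), (1.0, 1e30, 1e30),
                    (4.5, -0.5, -0.5), (5.5, 0.5, 0.5)))  # True
```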
DLSS doesn’t use rays. It is a fast 4-bit convolution over trained models. It works just like AMD’s old 16-bit Rapid Packed Math, but even faster, because it uses low-precision 4-bit operations (8 concurrent ops in a single 32-bit unit).
“DLSS is a convolutional neural network that has been trained on conventionally (non-AI) rendered images from video games. The network uses lower-resolution rendered images as its input and tries to improve the image to approximate high-quality rendered images”
The decision-making is simple, but there are a lot of pixels. This is why it is written as low-precision 4-bit operations, to push as many concurrent operations as possible. Fast (concurrent) low-precision computation is the main reason Nvidia created tensor cores.
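As a toy illustration of that low-precision idea (not DLSS’s actual network or weights), here is a single convolution pass with a filter quantized to 4-bit integers:

```python
# Toy illustration of low-precision convolution (not DLSS's actual
# network or weights): quantize a filter to signed 4-bit integers,
# convolve with integer multiply-accumulates, rescale once at the end.
import numpy as np

def quantize_int4(weights):
    """Map float weights to signed 4-bit integers (-8..7) plus a scale."""
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def conv2d_int4(image, q_kernel, scale):
    """Valid 2D convolution with the 4-bit-quantized kernel."""
    kh, kw = q_kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = (patch * q_kernel).sum() * scale
    return out

sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
q, s = quantize_int4(sharpen)
frame = np.random.rand(8, 8).astype(np.float32)
print(conv2d_int4(frame, q, s).shape)  # (6, 6)
```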
Oh, obviously I know that. I wasn’t referring to DLSS here at all!
I was answering this query by simonsCC instead. I don’t think he was referring to an equivalent of DLSS here, but to ray tracing:
“But the new RDNA 2 cards do have ray tracing cores, if I’m not mistaken?”
Exactly, some confusion here.
We all know DLSS doesn’t use rays.
Whatever.
They can put unicorn blood in that sh.t; who cares if you can’t buy it.
Stop blaming companies because we can’t buy any of these cards. Do you really think AMD or even Nvidia created this situation for us? It’s that damn COVID pandemic, which has created enormous demand for gaming gear and driven up production costs, including shipping.
The situation won’t improve this whole year, so can we keep blaming companies for low availability and high prices? Nope, man.
That won’t stop companies from innovating with new hardware techniques, like what AMD is doing here, though. They won’t stop producing GPUs either, regardless of the demand or supply situation in the market.
This has nothing to do with the PLANDEMIC. Taiwan & China have been open for business because they decided not to tank their economies. Please go do your research; this is bigger than you think, and clearly known. Aren’t SSDs, HDDs, movies, and every other computer part on the market just fine? But NOT the GPUs; why do you think that is? This is AMD & Nvidia’s doing. Stop blaming this garbage so-called virus for everything. There are people pulling strings, and these strings are getting investors nice and fat. You can shop at Costco, Walmart, and all the big-box stores, but small businesses are closed? Why do you think that is? Where do you think they get their products from?
Do you believe every damn thing they tell you, dude? Use your brain, analyze & comprehend the situation, PLEASE! COVID didn’t do this to us; GOVERNMENTS & BIG BUSINESSES DID! Every time they wanna rule and control people, they tell you it’s for your safety. If they cared about your Fuqin safety, they would’ve burned down the grocery stores with all the crappy A$$ foods they have in the inner aisles. NO ONE CARES ABOUT YOU! YOU are cattle being led to your death, and you are just enjoying that single-file line.
Burning down the grocery stores would be a huge, huge mistake. Hungry people get desperate, and the crime rate would soar. In my city, the grocery stores shut down for a week and a half when Covid-19 hit, and the mayor made them open back up.
You are convinced that Covid-19 isn’t in any way to blame for the present shortages, but I don’t get how you blame only Nvidia and AMD.
What about the scalpers and miners? Aren’t they responsible in a large way for the shortages?
So you don’t blame the pandemic for any of this? Then you are being delusional.
The hint is in the name: Covid-19
19 as in 2019.
GPU prices have been insane since 2017. Can the virus travel through time?
I’d imagine this is too late for RDNA 3; perhaps RDNA 4?
I don’t care who makes the GPU. The one with the best feature set gets my money.
Metal Messiah stay killing ’em with true journalism. Respect!
I read all that and don’t understand how it translates to gaming today, or a year from now. It sounds impressive and above my intellect, like quite a few of Metal’s articles. What does it mean, as a TL;DR (but of course I read it)?
It’s not above your intellect, it’s just above your tech knowledge. I’m in the same boat as you.
They are approaching a machine learning technique which I think might be an alternative to Nvidia’s DLSS, though implemented in a different way, not fully AI-based, imo.
DirectML also seems likely. Or maybe not?
> though implemented in a different way, not fully AI-based,
What do you mean by this? In what way is DLSS “fully AI based” and this won’t be? And how do you know?