According to a recent tweet by @InstLatX64, Intel’s upcoming Alder Lake microarchitecture will support a new instruction, CLDEMOTE (“cache line demote”), which could help improve the processor’s cache performance.
Besides Alder Lake, Intel’s next-generation “Sapphire Rapids” Xeon processors and its low-power “Tremont” cores will also support this instruction. Here, though, we are mainly interested in the Alder Lake-S series of consumer desktop processors.
So why does this matter? If you are not a cache or CPU enthusiast, it may not seem very relevant. It is nonetheless a notable shift in instruction design, because it gives software a new way to influence where data sits in the cache hierarchy, and it could help overall CPU performance.
If you find the topic too technical, feel free to skip the rest of this article. By definition, the CLDEMOTE instruction hints to hardware that the cache line containing the linear address should be moved (“demoted”) from the cache(s) closest to the processor core to a level more distant from the core. This may accelerate subsequent accesses to the line by other cores in the same coherence domain, especially if the line was written by the core that demotes it.
In other words, software (the operating system or an application) tells the processor core that a specific cache line is no longer needed in the small, low-level cache close to that core and can be “demoted” to a higher cache level, for example from L1 to L2 or from L2 to L3, without being flushed back to main memory.
The instruction is only a hint: it tells the hardware that moving the line from the cache level(s) “closest” to the core to a level “further” from the core might help performance.
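To make this more concrete, here is a minimal C sketch of how a program might issue the hint for a whole buffer. It assumes GCC or Clang, which expose the instruction as the `_cldemote` intrinsic in `<immintrin.h>` when compiling with `-mcldemote`; the helper name `demote_range` and the 64-byte line size are our own assumptions, not part of Intel’s definition. When the intrinsic is not available, the macro simply skips the hint, which is acceptable because the hardware is free to ignore it anyway.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__CLDEMOTE__)
#include <immintrin.h>
/* Real instruction: available when compiled with -mcldemote (GCC/Clang). */
#define DEMOTE_LINE(p) _cldemote((void *)(p))
#else
/* Intrinsic unavailable: skip the hint entirely. */
#define DEMOTE_LINE(p) ((void)(p))
#endif

/* Assumption: 64-byte cache lines (real code should query the CPU). */
#define CACHE_LINE 64u

/* Hypothetical helper: hint that every cache line overlapping
   [p, p + len) may be moved to a cache level further from this core.
   Returns the number of lines hinted. No write-back to memory is
   guaranteed, and the hardware may ignore each hint. */
size_t demote_range(const void *p, size_t len)
{
    if (len == 0)
        return 0;

    /* Round the start down to a line boundary so partial lines count. */
    uintptr_t line = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)p + len;
    size_t hinted = 0;

    for (; line < end; line += CACHE_LINE) {
        DEMOTE_LINE((const void *)line);
        hinted++;
    }
    return hinted;
}
```

Calling `demote_range(buf, 256)` on a 64-byte-aligned buffer issues four hints, one per line; an unaligned range is rounded out to every line it touches.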
CLDEMOTE could also speed up the transfer of work (process threads) between cores, because the receiving core can retrieve the relevant data directly from the shared L3 cache. This should have a positive impact on performance, although how significant it will be remains to be seen.
Unlike the CLFLUSH, CLFLUSHOPT, and CLWB instructions, CLDEMOTE is not guaranteed to write modified data back to memory. Because it is only a hint, hardware may ignore it in certain cases. CLDEMOTE can be used at all privilege levels. In certain processor implementations, it may set the A (accessed) bit, but not the D (dirty) bit, in the page tables. If the line is not found in the cache, the instruction is treated as a NOP.
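Because CLDEMOTE is only a hint, a benchmark or diagnostic tool may still want to know whether the processor actually implements it. A minimal sketch, assuming GCC or Clang on x86 and their `<cpuid.h>` header: CPUID leaf 7, subleaf 0, reports CLDEMOTE support in bit 25 of ECX. The function name `cpu_has_cldemote` is hypothetical.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=07H, ECX=0):ECX bit 25 reports CLDEMOTE support. */
#define CPUID7_ECX_CLDEMOTE (1u << 25)

/* Hypothetical helper: true if the CPU advertises CLDEMOTE. */
bool cpu_has_cldemote(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 when leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & CPUID7_ECX_CLDEMOTE) != 0;
#else
    return false; /* not an x86 processor */
#endif
}
```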
There are two use cases:
- Temporal Locality Control: The cache line is expected to be re-used, but not so soon that it should remain in the closest/smallest cache.
- Cache-to-Cache Intervention Optimization: The cache line is expected to be accessed soon by a different core, and cache-to-cache interventions may be faster if the data is not in the closest level(s) of cache.
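The second use case maps naturally onto a producer/consumer hand-off between cores. The sketch below assumes GCC or Clang with `-mcldemote` for the real `_cldemote` intrinsic (otherwise the hint is skipped) and 64-byte cache lines. A producer fills a hypothetical single-slot `mailbox`, demotes the payload lines, and then publishes them with a C11 release store; a consumer on another core waits for the flag with an acquire load. The struct and function names are illustrative only, and any speedup depends on the hardware honoring the hint.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#if defined(__CLDEMOTE__)
#include <immintrin.h>
#define DEMOTE_LINE(p) _cldemote((void *)(p))
#else
#define DEMOTE_LINE(p) ((void)(p)) /* hint unavailable: no-op */
#endif

#define CACHE_LINE 64  /* assumption: 64-byte cache lines */
#define MSG_BYTES 256  /* hypothetical message size */

/* Hypothetical single-slot mailbox written by one core, read by another.
   The flag gets its own line so polling it does not touch the payload. */
struct mailbox {
    _Alignas(CACHE_LINE) unsigned char payload[MSG_BYTES];
    _Alignas(CACHE_LINE) atomic_int ready;
};

/* Producer: copy the message in, hint each payload line toward the
   shared cache, then publish with a release store. */
void publish(struct mailbox *mb, const unsigned char *msg, size_t len)
{
    if (len > MSG_BYTES)
        len = MSG_BYTES;
    memcpy(mb->payload, msg, len);
    for (size_t off = 0; off < len; off += CACHE_LINE)
        DEMOTE_LINE(mb->payload + off);
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: wait for the flag with an acquire load, then copy the
   message out. Returns the number of bytes copied. */
size_t consume(struct mailbox *mb, unsigned char *out, size_t len)
{
    while (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        ; /* busy-wait; real code would pause or back off */
    if (len > MSG_BYTES)
        len = MSG_BYTES;
    memcpy(out, mb->payload, len);
    return len;
}
```

The release/acquire pair is what makes the hand-off correct; CLDEMOTE only influences where the consumer finds the data, never whether it sees it. A real design would also need a way for the consumer to hand the slot back before the producer reuses it.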
CLDEMOTE offers several benefits. First, it can free up space in the lower-level caches, such as L1 and L2, which are typically private to each core, by moving cache lines to the last-level cache (L3 in most designs). Second, it can help when work moves between cores: because L3 is shared among cores, a line pushed there can be picked up by a neighboring core.
The Alder Lake CPUs could be Intel’s first 10nm desktop parts and will feature a hybrid architecture. Alder Lake-S is going to be a successor to Comet Lake-S and will use a new socket, LGA 1700, which is rumored to support at least three generations of Intel CPUs. Most importantly, Alder Lake-S is expected to adopt a new “big core / small core” design combining Golden Cove and Gracemont cores.
Alder Lake-S will be the first Intel architecture to bring an approach similar to Arm’s big.LITTLE to desktop processors. These CPUs are expected to offer up to 16 cores, 8 of them ‘big’ cores and the remaining 8 ‘small’ cores.
According to one report, these two architectures are Golden Cove (the successor to Willow Cove) and Gracemont (the successor to Tremont), respectively. Willow Cove is expected to appear in the upcoming Rocket Lake-S series of processors. Support for three CPU generations on LGA 1700 would be something new for Intel, which has so far kept its sockets short-lived. If these rumors are accurate, Intel’s LGA 1700 socket will compete with AMD’s AM5 socket in terms of platform longevity.