内核中的编译器屏障rep_nop()
最近在看Linux内核中mutex_lock()和spin_lock()实现, 发现一个很有意思的函数调用: rep_nop()
/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
static inline void rep_nop(void)
{
asm volatile("rep; nop" ::: "memory");
}
汇编指令rep的本意是重复执行rep后的指令直到ecx寄存器中的值变成0为止. 但是这里为什么要重复nop呢? 于是反汇编, 发现rep;nop其实是被翻译成了pause指令.
以下摘自Intel软件开发者手册第二卷第四章(下载地址):
Improves the performance of spin-wait loops. When executing a “spin-wait loop,” processors will suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.
An additional function of the PAUSE instruction is to reduce the power consumed by a processor while executing a spin loop. A processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.
This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
简单说, Intel推荐在spin-wait loops中使用pause指令, 因为它既可以避免性能损失, 避开内存序列冲突, 还可以减少电源消耗, 并且不会有兼容性问题.
PS, 至于为什么mutex_lock()的实现也是spin-wait loop, 则是因为内核引入了spinning mutexes的特性, 详细信息在此: https://lkml.org/lkml/2009/1/14/393