Blender's implementation of Spinlock in Windows (copy)

Percy Ross Tiglao
Hey all,

I originally posted a topic over at devtalk
(https://devtalk.blender.org/t/threading-question-why-no-mm-pause-yield-in-the-spinlock-implementation/466),
but it has only gotten 25 views after multiple days, so I'm trying this
mailing list in an attempt to get a few more eyeballs on it. Also: the first
time I submitted this email, I forgot to subscribe to this list, so my first
email is stuck somewhere in the moderator queue. That's why I'm calling
this a (copy).

I noticed that the Windows implementation of SpinLock does not call the
"YieldProcessor" macro while it waits. The spinlock implementation is in
"source/blender/blenlib/intern/threads.c".

void BLI_spin_lock(SpinLock *spin)
{
#if defined(__APPLE__)
  OSSpinLockLock(spin);
#elif defined(_MSC_VER)
  while (InterlockedExchangeAcquire(spin, 1)) {
    while (*spin) {
      /* pass */
    }
  }
#else
  pthread_spin_lock(spin);
#endif
}

I propose adding the "YieldProcessor()" macro to the inner while(*spin)
loop.

"YieldProcessor" is a Windows macro that, on x86 and x64, expands to the
_mm_pause intrinsic (for the assembly instruction "pause"). Windows
documentation
(https://msdn.microsoft.com/en-us/library/windows/desktop/ms687419(v=vs.85).aspx)
suggests that this simply gives processor resources to the hyperthread
sibling.

Intel's documentation of _mm_pause goes even further, suggesting that the
pause instruction lets the processor exit a spin-wait loop more quickly.
( https://software.intel.com/en-us/node/524249 )

"pause" is in fact a special NOP: it is encoded as "rep nop", so it
executes on every x86 processor, and processors that predate the
instruction simply treat it as a NOP. That makes it an ideal instruction
here: it has broad compatibility and improves lock performance on modern
processors.
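
For builds outside MSVC, the same hint could be wrapped in a small macro.
The following is only a sketch and not Blender code: SPIN_PAUSE and
spin_until_set are hypothetical names, the GCC/Clang branch assumes the
__builtin_ia32_pause() builtin, and other targets fall back to doing
nothing.

```c
#include <stddef.h>

/* Hypothetical wrapper: pick whatever spin-wait hint the toolchain offers. */
#if defined(_MSC_VER)
#  include <windows.h>
#  define SPIN_PAUSE() YieldProcessor()
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  define SPIN_PAUSE() __builtin_ia32_pause() /* emits "pause" (rep nop) */
#else
#  define SPIN_PAUSE() ((void)0) /* assumption: no hint on other targets */
#endif

/* Spin until *flag becomes non-zero or max_spins iterations have passed.
 * Returns the number of pause hints that were issued. */
static size_t spin_until_set(const volatile int *flag, size_t max_spins)
{
  size_t spins = 0;
  while (!*flag && spins < max_spins) {
    SPIN_PAUSE();
    spins++;
  }
  return spins;
}
```

The bounded loop is there only so the sketch can be exercised from a single
thread; a real spinlock would wait without an iteration limit.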

In short, I'm suggesting the following one-line diff:

    while (*spin) {
        YieldProcessor(); // Special "NOP" hint to processor for hyperthreads and spinlocks
    }

I would expect that this diff would improve performance.
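
To show the whole pattern in one place, here is a minimal
test-and-test-and-set lock with the hint in the read-only inner loop. It is
a sketch under stated assumptions, not the Blender implementation: it swaps
the Win32 Interlocked call for C11 <stdatomic.h>, and DemoSpinLock, the
demo_spin_* functions, and DEMO_RELAX are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin-wait hint; becomes a no-op where no builtin is known. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  define DEMO_RELAX() __builtin_ia32_pause()
#else
#  define DEMO_RELAX() ((void)0)
#endif

typedef struct DemoSpinLock {
  atomic_int locked; /* 0 = free, 1 = held */
} DemoSpinLock;

static void demo_spin_init(DemoSpinLock *lock)
{
  atomic_init(&lock->locked, 0);
}

/* One atomic exchange: returns true if this call took the lock. */
static bool demo_spin_trylock(DemoSpinLock *lock)
{
  return atomic_exchange_explicit(&lock->locked, 1, memory_order_acquire) == 0;
}

static void demo_spin_lock(DemoSpinLock *lock)
{
  while (!demo_spin_trylock(lock)) {
    /* Lock is held: wait with plain loads so the cache line is not
     * written repeatedly, and issue the pause hint on each iteration. */
    while (atomic_load_explicit(&lock->locked, memory_order_relaxed)) {
      DEMO_RELAX();
    }
  }
}

static void demo_spin_unlock(DemoSpinLock *lock)
{
  atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

The outer exchange mirrors InterlockedExchangeAcquire in the original code,
and the inner loop is where the proposed YieldProcessor() call would sit.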

Note: This "pause" may be related to: https://developer.blender.org/T53068.
I noticed that the Windows implementation of pthreads does not have a
_mm_pause() in it, so that may have been causing poor performance a few
builds ago.

-- Percy
_______________________________________________
Bf-committers mailing list
[hidden email]
https://lists.blender.org/mailman/listinfo/bf-committers

Re: Blender's implementation of Spinlock in Windows (copy)

Brecht Van Lommel
Thanks for pointing this out, I committed the change.
https://developer.blender.org/rBef502854feb6b81119954206bff414d4507f4f3c

It's probably difficult to find specific cases where this helps, but it
does seem to be the correct thing to do.

Regards,
Brecht.


On Thu, May 24, 2018 at 9:55 PM, Percy Ross Tiglao <[hidden email]>
wrote:

> [...]