Pages

Wednesday, August 22, 2012

Lock Free And SpinWait MSDN Example

Lock Free And SpinWait MSDN Example

If you have read about the new features in 4.0 then probably you stumbled on a SpinWait structure and a MSDN article and the example code it provides. Just for the sake of clarity and history should the article change in the future I present you the code: 

public class LockFreeStack<T>
{
    private volatile Node m_head;

    private class Node { public Node Next; public T Value; }

    public void Push(T item)
    {
        var spin = new SpinWait();
        Node node = new Node { Value = item }, head;
        while (true)
        {
            head = m_head;
            node.Next = head;
            if (Interlocked.CompareExchange(ref m_head, node, head) == head) break;
            spin.SpinOnce();
        }
    }

    public bool TryPop(out T result)
    {
        result = default(T);
        var spin = new SpinWait();

        Node head;
        while (true)
        {
            head = m_head;
            if (head == null) return false;
            if (Interlocked.CompareExchange(ref m_head, head.Next, head) == head)
            {
                result = head.Value;
                return true;
            }
            spin.SpinOnce();
        }
    }
}

The example presents a lock free stack that uses the new SpinWait to issue a SpinOnce() operation after a failed CAS (Interlocked.CompareExchange). This is done to supposedly improve performance of code by doing a spin wait, but in my honest opinion there are a couple of problems with this solution. In order to see them first we need to see how SpinWait actually works.

Here is the code that does all of the work:

public bool NextSpinWillYield
{
    get
    {
        return this.m_count > 10 || PlatformHelper.IsSingleProcessor;
    }
}
/// <summary>Performs a single spin.</summary>
public void SpinOnce()
{
    if (this.NextSpinWillYield)
    {
        CdsSyncEtwBCLProvider.Log.SpinWait_NextSpinWillYield();
        int num = (this.m_count >= 10) ? (this.m_count - 10) : this.m_count;
        if (num % 20 == 19)
        {
            Thread.Sleep(1);
        }
        else
        {
            if (num % 5 == 4)
            {
                Thread.Sleep(0);
            }
            else
            {
                Thread.Yield();
            }
        }
    }
    else
    {
        Thread.SpinWait(4 << this.m_count);
    }
    this.m_count = ((this.m_count == 2147483647) ? 10 : (this.m_count + 1));
}

SpinOnce uses a counter to determine if it needs to issue a Yield, on each method call this counter gets incremented. If the counter is under ten (10) then Thread.SpinWait will be issued that multiplies the number 4 by 2 to the power of the counter number, so the maximum spin iterations will be 4096 which is allot. If the counter will be grater then ten (10) then a number of context switching methods will be invoked depending on the counter value. Yield will be invoked every time except fifth (Thread.Sleep(0)) and 20th Thread.Sleep(1). This means that after ten (10) iterations SpinWait will issue a light context switch that will issue a switch to another thread on the same processor (!). Then it will try to issue a switch to a thread with the same or higher priority, and finally a switch to thread will be issued with any given priority. 

Lock free Data Structures to be performant need to be wait free or limit the number of CAS failures, this requires building the structure in such a way that should we miss the CAS instruction and leave the underlying data structure in a inconsistent or wrong state, some other thread will set it right again. In the context of our example this is not the case as we only have a single and very simple CAS instruction present, that may have a potentially high CAS failures. So to deal with that we might just let the thread spin as it is in our best intentions to let the thread finish the loop as fast as possible, or we could issue a Yield to let another thread waiting on its time slice to continue. Yielding should have a positive effect in high CAS failures as this means that threads have a high overlap ratio and therefor step on each others boots. Issuing a Yield should decrease the overlapping and in result decrease the CAS failure rate.

SpinWait

So what's wrong with SpinWait SpinOnce in this context? The problem is that up to ten (10) failed CAS we issue an Thread.SpinWait that will get expensive fast. Another thing is that we are spinning in the loop right now so there is very little value in spinning again for a increasing number of iterations. The only possible benefit from that is that we might set the processor in the active mode (provided that we aren't in that mode already due to the high load) and thus we will increase the performance but still this will not give us to much benefits (it might if the number of threads isn't to high) as we are in a high contention loop (lots of spinning threads). Another possible benefit is that each thread that failed CAS spins longer than it's previous pair thus  racing threads have a chance to align properly and finish the operation, but in reality the penalty of waiting is to high and still even without spin wait the threads should align. Doing performance tests the code without spin waits on average runned faster compared to the original code, the SpinWait solution was on par with commented out solution in a couple of tests but was generally worse.

Proposed Solution

A better solution then using a SpingWait structure would be to have a failed CAS counter that would count failed CAS operations and upon reaching a certain threshold it would issue a Thread.Yield every time to give other threads a chance in finishing their loops for reference let's change the Push() method.

private volatile Node m_head;
private const int failedCASthreshold = 2;

private class Node { public Node Next; public T Value; }

public void Push(T item)
{
    int failedCAS = 0;

    //var spin = new SpinWait();
    Node node = new Node { Value = item }, head;
    while (true)
    {
        head = m_head;
        node.Next = head;
        if (Interlocked.CompareExchange(ref m_head, node, head) == head) break;

        if (++failedCAS > failedCASthreshold)
            Thread.Yield();
    }
}

Conclusion

So does this mean that MSDN example code is now ready for production use? Actually no it just means that SpinWait was presented in wrong context and better solution existed and this article was an attempt to explain what's wrong with it to prevent developers from using it in such context.

1 comment:

Angshuman said...
This comment has been removed by the author.