Work StealingWhen working with multi threaded applications, we tend to spawn worker threads which can lead to ineffective code when not done correctly, for example threads will not get reused and will be recreated for each work items. The solution to all those problems is a ThreadPool which reuses threads and queues work items that are consumed by those threads. This sort of implementation while simple can have side effects as unless the pool Queue is immutable it will have to be locked each time an item Enqueued and Dequeued so if the pool operates on a fair amount of threads and lightweight tasks a large chunk of the time will be spent on waiting on a lock.
One solution to that problems is to reverse the logic of queuing and split the queues and assign them to worker threads, so that each thread will process it's own unique queue, and the pool will actually hold a thread queue and assign work to the threads. This improves performance of the pool as there is almost no need to lock the queues, but also has one downside, as it may happen that one thread that finished it's queue is doing nothing while the other is working hard. This is called thread starving, well in a technical since it's not as to starve we would want to do work but we would be denied the time, but as we are in managed user code we are kinda starving our threads as we are simply wasting time doing nothing. To overcome that problem we need to incorporate one simple rule, a thread that has nothing to process can steal work from other threads, this technique is called Work Stealing.
.NET 4.0 Incorporates this technique in it's thread pool (you will have the new pool in 2.0 and 3.5 if you just install the runtime) and the performance boost over the old thread pool can be 2x times! faster as some sources show, so the technique certainly pays off. With the addition of Tasks in 4.0 we do have a very powerful programming model for multithreaded applications, but what if we would want even more control over the pool and work stealing? or we simply can't use tasks? or both?
Smart ThreadFirst we need to create a thread class that implements work queues, and can post statistics and commands to the custom pool:
The two most interesting methods here are Execute and Process. Each time a thread get's a new work item (Action) to process it queues up the item and signals it's internal worker thread that processes the queue.
After each work item excution thread statistics get collected, the thread also updates the pool internal scheduler saying that the pool queue needs to be reordered to be valid. The pool uses a heap data structure to maintain threads with the lowest item count as the more preferred ones. Thread will start to steal work after it's own queue is empty, if the steal was succeeded it will spin and continue with the processing loop. If the steal was not successful the thread will wait on the even and wake itself up after 30ms and will look for work, after set amount of tries it will wait for the event to be signaled forever. In heavy load scenarios spin waiting on a signal will improve performance as waking up the thread is less expensive. When a thread is created and started by the user code it still will end up in thread pool and will participate in scheduling as well as work stealing, but it will use different queue for user code threads.
Smart Thread Pool
The heap code is a very standard implementation of a min heap with no special tricky code inside so the implementation for the purpose of this article will be skipped, but it will get posted as a full code solution.
PerformanceHaving all that before we build the Task abstractions let's test out the raw performance of this solution and the .NET 4.0 thread pool which uses almost identical concepts. The tests were run separately as running both pools in a single process could affect one another pool.
The performance of .NET 4 thread pool is:
The performance of SmartThreadPool is:
While not that big of a difference the time change is noticeable and in performance critical applications can be quite beneficial. Why this difference? The answer is not simple as we cannot analyze the threadpool implementation detail using reflector or any other decompiler as it uses unmanaged functions most of the time so the implementation is not easy to get to. One possible reason is that my code makes use of spinning and waiting on a signal which is a big performance gain over simply waiting ( but still after changing that code the performance of SmartThreadPool is still better ). Another reason might be better work stealing logic, but that's a bit of a stretch as my way of doing things can be heavily improved upon like for e.g work could be stolen in a batch.
SummaryWork stealing as a technique can be used in many places like collections, business logic (banking for eg) to greatly boost performance of concurrent code. This article just scratched the surface what could be potentially done with SmarthThreads, WorkSteling and Task based programming models, certainly there's much more to be explored and improved upon, and that I will try to do in articles that will come.
The full source code of this article can be found here.