Parallel programming in C#
This one won't actually be about the new parallel framework in 4.0; I'm not into it right now. Instead, this is a nice find for making threads go faster, even when they are not sharing data directly. I saw it for the first time at the MTS conference this year (2010) and was kind of blown away, but since I could not find any information about it on the internet I had to figure it out myself. The credit should go to Cezary Nalewajko for setting the direction.
Let's Dig In
Consider the following class:

public class Test
{
    public int value1;
    public int value2;

    public void DoSomething()
    {
        // some dummy operation of read and write
        for (int i = 0; i < 9000000; i++)
        {
            value1 = value2;
            value2++;
            value1--;
            value2 = value1;
        }
    }
}
Now let's create two threads, each owning a new instance of this class.
Test test1 = new Test();
Test test2 = new Test();

Thread thread1 = new Thread(test1.DoSomething);
Thread thread2 = new Thread(test2.DoSomething);

thread1.Start();
thread2.Start();

Stopwatch watch = new Stopwatch();
watch.Start();

thread1.Join();
thread2.Join();

watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds);
OK, nothing special here. On my CPU the results average around 360 ms.
But now let's modify the code sample a bit and see what happens.
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public class Test
{
    [FieldOffset(0)]
    public int value1;

    [FieldOffset(24)]
    public int value2;

    public void DoSomething()
    {
        // some dummy operation of read and write
        for (int i = 0; i < 9000000; i++)
        {
            value1 = value2;
            value2++;
            value1--;
            value2 = value1;
        }
    }
}
Now the results are around 110 ms! Nice, huh? So what just happened here? If you know field offsets from marshaling and talking to native OS code, the Explicit layout is typically used to simulate C unions when passing data to native methods. As it turns out, field offsets can also speed things up a bit in a purely managed environment. In a multi-processor/multi-core environment data has to move between cores over a shared bus, and the JIT in .NET enforces a sequential layout by default (this is true for value types), which is not optimal at all in a multi-threaded context. But beware: if you use a sequential layout, or manually set the field offsets to 0 and some small size, say 10, you can actually make the performance worse, and on an x86 environment you can hurt performance as well.
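To make that warning concrete, here is a minimal sketch (my own, not from the original post) of the kind of layout described above: the class name TestPacked is made up, and the second field sits at offset 10, close to the first and not aligned on a 4-byte boundary. It is expected to perform no better than the default layout, and possibly worse, but that is exactly the kind of thing that needs measuring on your own hardware.

// Hypothetical "bad" layout: fields explicitly placed close together and misaligned.
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public class TestPacked
{
    [FieldOffset(0)]
    public int value1;

    [FieldOffset(10)]   // small, non-4-byte-aligned offset; expected to be no faster, possibly slower
    public int value2;

    public void DoSomething()
    {
        // same dummy read/write loop as before
        for (int i = 0; i < 9000000; i++)
        {
            value1 = value2;
            value2++;
            value1--;
            value2 = value1;
        }
    }
}

Swap TestPacked in for Test in the timing code above to compare the two layouts directly.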
My guess for this behavior is that data gets moved over the bus in 32/64-bit cycles, and furthermore ints are padded in memory, so setting the layout like this means fewer cycles are needed to actually transfer the data.
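One way to check this guess is to define a couple of otherwise identical classes that differ only in where the second field is placed, and time them with the same two-thread setup. The sketch below is my own: the names IWork, Spaced4, Spaced64, OffsetBenchmark and Time are made up for the illustration, the 64-byte spacing is just a common cache-line size, and the numbers you get will depend entirely on your CPU.

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

public interface IWork { void DoSomething(); }

[StructLayout(LayoutKind.Explicit)]
public class Spaced4 : IWork
{
    [FieldOffset(0)] public int value1;
    [FieldOffset(4)] public int value2;   // packed, same spacing as the default layout

    public void DoSomething()
    {
        for (int i = 0; i < 9000000; i++) { value1 = value2; value2++; value1--; value2 = value1; }
    }
}

[StructLayout(LayoutKind.Explicit)]
public class Spaced64 : IWork
{
    [FieldOffset(0)]  public int value1;
    [FieldOffset(64)] public int value2;  // 64 bytes apart, a common cache-line size

    public void DoSomething()
    {
        for (int i = 0; i < 9000000; i++) { value1 = value2; value2++; value1--; value2 = value1; }
    }
}

static class OffsetBenchmark
{
    // Runs two instances of the same layout on two threads and returns the elapsed time.
    static long Time(IWork a, IWork b)
    {
        Thread t1 = new Thread(a.DoSomething);
        Thread t2 = new Thread(b.DoSomething);

        Stopwatch watch = Stopwatch.StartNew();
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("offset 4:  " + Time(new Spaced4(),  new Spaced4())  + " ms");
        Console.WriteLine("offset 64: " + Time(new Spaced64(), new Spaced64()) + " ms");
    }
}

If the guess holds, the widely spaced version should come in noticeably faster whenever the two threads end up on different cores.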
To sum it up:
Setting field offsets in memory can give a performance boost, but it can also degrade performance, so caution is needed, along with careful testing of each class/struct where this technique would be considered.