#begin
Ever since I did some really in-depth research into multi-threading, parallelism and protocol languages, I have been fascinated by the subject. It all started when I read about how updating a 64-bit integer (long) on a 32-bit machine in a multi-threaded program could lead to all kinds of awesomely interesting bugs, like torn reads and writes. For my master’s thesis I wrote a protocol language called Discourje in Clojure to monitor operations on threads. Discourje will prohibit operations if they do not comply with the protocol specification.
After finishing my Udemy course on code optimizations I found this course about multi-threading and parallelism in C#. This seemed to be right up my alley, so I quickly enrolled and started watching some of the lectures.
Now, everyone knows that multi-threading in a Unity3D project is really difficult since you cannot access much of the Unity3D API from threads other than the main thread. This really sucks since we are developers and don’t like to be treated as babies! However, Unity3D has promised to give developers other means of multi-threading and parallelism through their DOTS and ECS libraries/frameworks. Yet, these were introduced back in 2017, and they are still highly experimental and far from stable.
So, I thought this course would probably give me some new information to bridge the gap between now and the release of these new threading libraries in Unity. And yeah, I think I have a deeper understanding of C# .Net’s threading constructs and libraries after this course. I highly recommend this course to anyone interested, of any skill level. I think you only need some basic understanding of multi-threading concepts to start, but Dmitri explains everything nicely and clearly in the course. Let’s discuss the sections below!
Introduction
Yet again, a course that starts with an introduction section. This seems like the most logical place to start, so here we are again. The lecturer introduces himself. He’s called Dmitri Nesteruk, is a Microsoft MVP, and his interests lie in writing highly performant code in C# and C++.
This course is called Learn Parallel Programming with C# and .NET: Discover the core multi-threading and parallelization concepts supported by the .NET framework. It has 9 sections, including the introduction marked as section 0 (niicee) and the summary. So let’s talk about sections 1 to 7.
Section 1: Task Programming
The first subject in this course is an introduction to Task programming. He starts with this since it is the basis for the upcoming sections. A Task is a “unit of work” to be done on another thread. So all code inside a task runs on another thread and thus should be written that way. Dmitri shows the main options for starting new tasks: one through the constructor, another through a nice static function, and yet another through a task factory. So there are different ways to start tasks.
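The three creation options can be sketched in a few lines. This is a minimal sketch, not the course's exact demo; the messages are just placeholders:

```csharp
using System;
using System.Threading.Tasks;

class TaskCreationDemo
{
    static void Main()
    {
        // Option 1: constructor — the task does not run until you call Start()
        var t1 = new Task(() => Console.WriteLine("via constructor"));
        t1.Start();

        // Option 2: the static shortcut — schedules the work immediately
        var t2 = Task.Run(() => Console.WriteLine("via Task.Run"));

        // Option 3: the task factory — offers extra creation options
        var t3 = Task.Factory.StartNew(() => Console.WriteLine("via factory"));

        Task.WaitAll(t1, t2, t3); // block until all three are done
    }
}
```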
Next he also shows how you would stop a task. A task can be cancelled through a “Cancel” method, yet it won’t do anything if you do not implement some sort of cancellation mechanism in your code. You need to make sure you have implemented something called a CancellationToken. With this token you are able to define exactly in your code what happens upon cancellation or exceptions. You can even make multiple tokens for different purposes, and couple them together into one linked token source. Then you can “watch” a single token from outside the task, yet use multiple specific tokens inside the task for all kinds of detailed cancellation mechanisms.
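A small sketch of the linked-token idea, assuming a hypothetical worker that cooperatively checks the token; either the "user" source or the timeout source can cancel it:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LinkedTokenDemo
{
    static void Main()
    {
        var userCancel = new CancellationTokenSource();
        var timeout = new CancellationTokenSource(TimeSpan.FromSeconds(5));

        // One linked source fires if *either* underlying token is cancelled
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            userCancel.Token, timeout.Token);

        var worker = Task.Run(() =>
        {
            while (true)
            {
                // The task must cooperate: check the token and bail out
                linked.Token.ThrowIfCancellationRequested();
                Thread.Sleep(100);
            }
        }, linked.Token);

        userCancel.Cancel(); // simulate the user pressing "stop"

        try { worker.Wait(); }
        catch (AggregateException) { Console.WriteLine("worker cancelled"); }
    }
}
```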
The next subject was new to me, which is called spinning or spinwait. This can be used to wait for x amount of time and can be used in the same manner as Thread.Sleep(). It is simply a way to pass time in your thread. However, it works slightly differently from Thread.Sleep(). When you sleep a thread it gives up its position in the scheduler of .Net. During the time the thread sleeps, other threads are allowed to run and thus make the most out of the available resources. But with a spinning mechanism you do not give up your place in the scheduler, and thus waste resources by doing so. This can be useful in a small number of situations where you must be absolutely sure your thread runs immediately when the spinwait time has passed. But you are wasting resources by doing so, so use it with care.
Then Dmitri shows how to wait for tasks to finish. There are multiple variations of waiting for tasks. You can wait for a single task to finish, or wait for multiple tasks: for all of them to finish, for any given task in a collection, or for N tasks to finish. When you decide to wait on a collection of tasks you need to make sure you have implemented the cancellation tokens correctly, because if you don’t, the tasks will be cancelled, yet as I explained earlier, no cancellation mechanism will be implemented in the tasks, and thus they will keep running and start generating some nice exceptions for you. The nice thing here is that you can re-use the same cancellation token for multiple tasks. So if any task is cancelled, they all are.
The last video in this section is about exception handling. When you are using tasks as a means of multi-threading you will get exceptions called AggregateExceptions. These are special kinds of exceptions that contain one or more exceptions that were thrown in the tasks you ran. Then you are able to handle all of them independently and act accordingly for each. You can use a special “Handle” function on the AggregateException to specify how you would handle specific exception types.
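A minimal sketch of the Handle mechanism; the two failing tasks and their exception types are made up for illustration:

```csharp
using System;
using System.Threading.Tasks;

class HandleDemo
{
    static void Main()
    {
        var t1 = Task.Run(() => throw new InvalidOperationException("bad state"));
        var t2 = Task.Run(() => throw new ArgumentNullException("arg"));

        try
        {
            Task.WaitAll(t1, t2);
        }
        catch (AggregateException ae)
        {
            // Handle returns true for exceptions we consider dealt with;
            // anything unhandled is re-thrown in a new AggregateException
            ae.Handle(e =>
            {
                Console.WriteLine($"caught {e.GetType().Name}");
                return e is InvalidOperationException
                    || e is ArgumentNullException;
            });
        }
    }
}
```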
Section 2: Data Sharing and Synchronization
As the title explains, the second section is all about data sharing and synchronization between threads. Dmitri explains all about locking mechanisms, spinlocking and mutexes. I thought this section was pretty interesting since I only have some experience with simple locking strategies. But after watching the videos presented here, I think simple locking is a bit limited since mutexes are far more powerful.
The first video in this section defines what “critical sections” are: sections in your code that are prone to threading errors because the actions performed there are not atomic. Meaning, some actions need more than one instruction. For example, updating an integer requires two actions, not one: first you read the current value, then you write the updated value back. In between reading and writing, another thread could have updated the integer already, and thus you will lose that update.
This is where Dmitri shows how to use a lock to make sure only one thread may access the integer at a time. Once the integer is read and updated, the lock is released and another thread is allowed to perform its actions.
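The classic counter example makes the point; a minimal sketch (not the course's exact code) where the lock makes the read-modify-write pair atomic:

```csharp
using System;
using System.Threading.Tasks;

class LockDemo
{
    static int counter;
    static readonly object gate = new object();

    static void Main()
    {
        // 1000 parallel read-modify-write operations; without the lock,
        // some increments would be lost to the race described above
        Parallel.For(0, 1000, _ =>
        {
            lock (gate)
            {
                counter++; // read + write, now atomic as a pair
            }
        });
        Console.WriteLine(counter); // always 1000
    }
}
```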
Following the video about critical sections there is a video about the Interlocked class in .Net. Its static methods are handy shortcuts for doing exactly what Dmitri showed in the previous video. With them you are able to remove your manual lock objects and let the Interlocked mechanism take care of it. I think this is very useful since it removes some cumbersome objects and logic from your code and makes it cleaner and more readable. There are, for example, static implementations for incrementing and decrementing integers, which are pretty useful I think. The Interlocked class also provides logic for using a memory barrier, but I’ll get into this in a later section of the review since it’s explained in more detail there. Lastly, Dmitri talks about two other useful functions, “Exchange” and “CompareExchange”. These functions can be used to perform thread-safe assignments. So the Interlocked class gives you some useful tools to program without locks, but it won’t cover everything. For simple things you might be able to use it, but in more complex situations you might need to fall back on more intricate locking mechanisms like the ones Dmitri discusses in the next three videos.
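A quick sketch of those three Interlocked operations, under the same counter scenario as before:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class InterlockedDemo
{
    static int counter;

    static void Main()
    {
        // No lock object needed: the hardware does the atomic update
        Parallel.For(0, 1000, _ => Interlocked.Increment(ref counter));
        Console.WriteLine(counter); // 1000

        // Exchange: thread-safe assignment, returns the previous value
        int old = Interlocked.Exchange(ref counter, 0);
        Console.WriteLine(old); // 1000

        // CompareExchange: assign 42 only if the value is still 0
        Interlocked.CompareExchange(ref counter, 42, 0);
        Console.WriteLine(counter); // 42
    }
}
```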
This next video is all about spinlocks and lock recursion. He starts off explaining the difference between locks and spinlocks: using a lock will always result in the thread blocking until it is able to get the lock, while with a spinlock you are able to check whether you acquired the lock or not, and act accordingly in each situation. Dmitri also points out that the lock construct is shorthand for using the System.Threading.Monitor Enter and Exit functions. It’s like a “using” construct that disposes the object after the using clause, except the lock construct exits the monitor for you. Dmitri then explains the concept of lock recursion, where you re-acquire the same lock in a function that calls itself recursively. He demonstrates that the SpinLock he used thus far in the video does not support lock recursion; you can pass a boolean to its constructor to enable thread-owner tracking, which makes it detect and reject recursive use.
This is followed up with a video about mutexes, which were quite new to me. I can’t really remember ever using any. They have existed for a long time in the .Net framework, so that’s rather interesting; I think I always got my stuff done with simple locks. A mutex, however, is more “powerful” than a lock and can be used in several different ways. The first one Dmitri explains is the WaitOne function, where you wait for the lock, either infinitely or for a certain amount of time. Also, you need to manually release the mutex once you are done with it. Another thing that makes the mutex more interesting is that you can lock on multiple mutexes in your code. The example Dmitri gives in the video is how you would transfer money between two bank accounts. Here he uses two mutexes, waits for both of them, and when both locks are acquired he’s able to transfer the money. Oh, and don’t forget to release both mutexes afterward. The last cool thing about mutexes he shows is how you would make a global mutex, meaning a named, system-wide mutex bound to a specific identifier that can only exist once. So imagine that you have an app with some global mutex id; when you start the same app again, you can detect that the mutex is already taken and, for example, show a dialog saying the app is already running.
The last video in this section is about ReaderWriterLocks, which do exactly what you might think: they separate the locking mechanism between read and write operations, so you can create more fine-grained locking logic. Dmitri quickly shows the two variations of ReaderWriterLocks. He says to always use the “Slim” lock since it is more modern, but does not really explain why. I researched this a little bit and apparently there are bugs in the ReaderWriterLock in regard to atomic updates; this blog on InfoQ explains. So with these kinds of locks you are able to have many readers on data, yet only one writer. I think these locks work really nicely. You simply open a read or write block, do some actions, and close it again. Plus, a nice bonus is that the ReaderWriterLocks support lock recursion, which will probably be very useful in certain situations. The last cool thing about ReaderWriterLocks is that you can upgrade read-locks to write-locks when you open them as upgradeable locks. Those are just other methods besides the ones to enter and exit a read or write lock.
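The read, write and upgradeable variants look like this in practice. A minimal sketch around a hypothetical shared list, not the course's example:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class RwLockDemo
{
    static readonly ReaderWriterLockSlim rw = new ReaderWriterLockSlim();
    static readonly List<int> data = new List<int>();

    static void Write(int value)
    {
        rw.EnterWriteLock();           // exclusive: blocks readers and writers
        try { data.Add(value); }
        finally { rw.ExitWriteLock(); }
    }

    static int ReadCount()
    {
        rw.EnterReadLock();            // shared: many readers at once
        try { return data.Count; }
        finally { rw.ExitReadLock(); }
    }

    static void AddIfMissing(int value)
    {
        rw.EnterUpgradeableReadLock(); // read first, upgrade only if needed
        try
        {
            if (!data.Contains(value))
            {
                rw.EnterWriteLock();
                try { data.Add(value); }
                finally { rw.ExitWriteLock(); }
            }
        }
        finally { rw.ExitUpgradeableReadLock(); }
    }

    static void Main()
    {
        Write(1);
        AddIfMissing(1); // no duplicate added
        Console.WriteLine(ReadCount()); // 1
    }
}
```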
Section 3: Concurrent Collections
The third section is one that is more familiar to me. It’s about all the default .Net concurrent collections, which I’ve used many times before. The one I tend to use the most in Unity3D is the ConcurrentDictionary, but there are others too, like the ConcurrentStack, ConcurrentQueue and the ConcurrentBag. This section also dives deeper into the producer-consumer pattern, which resembles the channel operations in Clojure.Core.Async or something you might see in Go(lang). I think the video about the producer-consumer pattern is the most interesting of this section. But before I talk about that, let’s talk about the concurrent collections first. I won’t talk about them individually since all the collections behave in a very similar manner. Also, they behave similarly to their single-threaded siblings. So a ConcurrentDictionary has the same properties as a normal Dictionary, a ConcurrentStack behaves like a normal Stack, and a ConcurrentQueue behaves like a Queue.
The way they differ from their single-threaded versions is that they support concurrent access, of course. But also, they don’t have plain “Add”, “Pop”, “Push” etc. functions; instead they have “TryAdd”, “TryPop” etc. functions. All these functions return a boolean, so you can handle both the case where the operation succeeds and where it fails. Additionally, for updates on the ConcurrentDictionary you also supply an update lambda function that handles the actual updating of the element. I have to say this mechanism works pretty great and, as I said, I’ve used the ConcurrentDictionary many times before. I’ve never really used the ConcurrentStack, Queue or Bag much, but that’s mainly because I was never really in the position to use them, I think.
This section ends with a video about the Producer-Consumer pattern. What you need here is some collection that implements the IProducerConsumerCollection interface. The example Dmitri uses is a BlockingCollection based on a ConcurrentBag. The behaviour he simulates here is that of a channel construct you might see in more recent languages. You can simply iterate the GetConsumingEnumerable function on the blocking collection in a loop, which will block the thread if there are no messages. The producer can simply keep adding values until the buffer is full. In the video Dmitri used a buffer of 10 elements, so his collection could never contain more than 10 elements. Only after reading a value from the collection would the producer be able to add a new item. I think this pattern is really nice and I’ve never used it in C#. I’ve used channels in Clojure.core.async before in my Discourje project, so I’m really familiar with the behavior you get. If I find a nice opportunity to use it in Unity3D I most certainly will!
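A sketch of that channel-like setup, assuming a made-up producer that pushes 100 integers through a bag bounded to 10 items:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChannelDemo
{
    static void Main()
    {
        // Bounded to 10 items: Add blocks when the "channel" is full
        using var channel = new BlockingCollection<int>(
            new ConcurrentBag<int>(), boundedCapacity: 10);

        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 100; i++)
                channel.Add(i);
            channel.CompleteAdding(); // lets the consumer loop finish
        });

        var consumer = Task.Run(() =>
        {
            int sum = 0;
            // Blocks while empty, ends once adding is complete
            foreach (int item in channel.GetConsumingEnumerable())
                sum += item;
            return sum;
        });

        producer.Wait();
        Console.WriteLine(consumer.Result); // 4950 (= 0 + 1 + ... + 99)
    }
}
```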
Section 4: Task Coordination
The next section in the course is all about task coordination. It starts off with a video about continuations, which are the simplest means of coordinating tasks. A continuation between tasks is like executing two actions in sequence, yet on another thread. Continuations can be done on a single task but also on a collection of tasks. You can continue when “All” tasks have finished, or when “Any” of the tasks has finished. When you use the “Any” variant, don’t forget about cancellation tokens if you need to cancel some things gracefully.
The second video in this section is about how tasks relate to one another. Tasks are inherently independent, yet you can parent certain tasks together. When you start a task within another task, nothing special happens by default. To bind tasks you need to add some additional data in the task constructor: with the TaskCreationOptions enum you can specify AttachedToParent. So if you wait for the parent to finish, it now also automatically waits for any child tasks to finish. Then there are also specific TaskContinuationOptions that specify rules for when continuation is allowed. So, for example, you can specify that a task may only continue when the previous task has run to completion, or, as another example, only when the previous task has faulted. This is really cool, right? This way you can make some really nice fine-grained continuation mechanisms.
The next video is about the Barrier object. This is not the same as the MemoryBarrier we saw earlier in this blog; the memory barrier was about making sure the CPU does not reorder reads and writes around it. The Barrier object is designed to make sure certain tasks only continue when some threshold of participants is reached. The example Dmitri gives in the video is when you have some algorithm with a number of worker threads. Each of these threads needs to run through specific phases, but each worker may only continue if and only if all other workers have completed their current phase as well. Then, when all workers have finished their phase, they all continue to the next phase. The example in the video contains the most overused concurrency example known to man: making a cup of tea. The algorithm contains two separate threads, one for getting the kettle, boiling the water and putting the kettle back, and a second thread for getting a cup, adding the tea and pouring the water in. Each phase I just described runs in parallel, one at a time. So, for example, boiling the water takes some time, yet picking a cup is done much faster; still, we need to wait for both actions to actually make the tea. This is where the Barrier does its thing. You can set the barrier up to account for two participants, and when both have signalled, the barrier resets and you can start the next phase.
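A compact sketch of the tea example with a two-participant Barrier; the phase steps are paraphrased, not copied from the course:

```csharp
using System;
using System.Threading.Tasks;

class BarrierDemo
{
    // Two participants: both must signal before either enters the next phase
    static readonly Barrier barrier = new Barrier(2,
        b => Console.WriteLine($"--- phase {b.CurrentPhaseNumber} done ---"));

    static void Water()
    {
        Console.WriteLine("put the kettle on, boil the water");
        barrier.SignalAndWait();               // end of phase 0
        Console.WriteLine("pour the water");
        barrier.SignalAndWait();               // end of phase 1
    }

    static void Cup()
    {
        Console.WriteLine("grab a cup, add a tea bag");
        barrier.SignalAndWait();               // end of phase 0
        Console.WriteLine("enjoy the tea");
        barrier.SignalAndWait();               // end of phase 1
    }

    static void Main() => Task.WaitAll(Task.Run(Water), Task.Run(Cup));
}
```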
In the fourth video, Dmitri explains that there are a couple more Barrier-like coordination constructs. They often work with some kind of counter, like the Barrier has; when the counter reaches the number you passed as the constructor argument, the construct signals. In this short video the CountdownEvent is explained. This works very similarly to the Barrier, yet it counts down rather than up. The difference between Barrier and CountdownEvent lies in the fact that the Barrier supports multiple phases, but the CountdownEvent does not. So if you only need a single phase, you can use them interchangeably.
The fifth video is about two very similar objects: the ManualResetEventSlim and the AutoResetEvent. These can also be used to block threads. You can treat reset events like toggles: one thread Sets the event, while other threads Wait on it. With the manual reset event, once the signal is set, every waiter passes through until you manually reset it. The auto reset event, on the other hand, automatically resets after letting a single waiter through. So the auto event consumes a signal only once, while the manual event stays signalled, letting waiters through indefinitely, until the signal is changed back.
The last video in this section is about the classic Semaphore(Slim) class. I think many of us have used this before, especially if you have done some multi-threading in Java. A Semaphore also works with an internal counter, but you can both increment and decrement it manually. This makes it pretty flexible and useful in certain situations. You can wait on the semaphore, which will block the thread once all slots are taken, and you can release a certain number of slots again; the initial and maximum counts are determined in the constructor of the semaphore.
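A small sketch of SemaphoreSlim throttling access; the eight tasks and the counts are arbitrary choices for illustration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreDemo
{
    // Initially 2 slots available, at most 4 can ever be released
    static readonly SemaphoreSlim gate = new SemaphoreSlim(2, 4);

    static void Main()
    {
        var tasks = new Task[8];
        for (int i = 0; i < 8; i++)
        {
            int id = i;
            tasks[i] = Task.Run(() =>
            {
                gate.Wait();            // blocks once both slots are taken
                try
                {
                    Console.WriteLine($"task {id} inside");
                    Thread.Sleep(100);  // pretend to work
                }
                finally { gate.Release(); }
            });
        }
        Task.WaitAll(tasks);
    }
}
```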
Section 5: Parallel Loops
The fifth section of the course jumps into something really interesting, and something I never used in any Unity3D project: parallel loops. I thought this section was really interesting because it’s such a simple thing to do, and yet it might give you an immense performance increase. The only problem is that in Unity you cannot access the Unity3D API in those parallel loops. That’s why I haven’t used them much before. But if I find the opportunity, I will use them!
The first video in this section is about Parallel.Invoke, For and ForEach. The first subject of this video is Parallel.Invoke, which simply takes an array of System.Action delegates that are called in parallel. This already is really interesting: it is really just a nice, friendly shortcut to call functions in parallel. Also, these functions run within Tasks, so you can use the earlier discussed synchronization and continuation. Another important thing to note here is that Parallel.Invoke, .For and .ForEach are blocking operations! Dmitri then explains the parallel For and ForEach, which do exactly what you think: they simply run all iterations on separate threads. This is really nice when you simply need to burn through many data elements, performing the same actions. Just remember, since these iterations run in parallel, they will probably not run in order!
The next video shows how to break, cancel or handle exceptions in these parallel loops. You can’t simply break out of them since the iterations all run in parallel, so you must take care of this another way. When you start a parallel For or ForEach you can pass a special (overloaded) lambda that takes two arguments, where the first is the value being iterated and the second an object called a ParallelLoopState. With this state you can determine what state the loop is in, and you can force the loop to stop, or signal the loop to break at the next iteration. A second way to stop a parallel loop is to simply throw exceptions. Remember that these parallel iterations all run inside tasks, so when you do throw an exception, you need to catch an aggregate exception and handle the inner exceptions accordingly. Dmitri also shows that you can take the return value of a parallel loop, called a ParallelLoopResult, and see whether the loop completed or break was called at a specific iteration. The last means of cancelling a parallel loop is by simply using a CancellationToken as discussed in previous sections. You can use the token for cancelling the loops if that better suits your needs.
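A minimal sketch of the ParallelLoopState and ParallelLoopResult interplay; breaking at index 500 is an arbitrary illustrative condition:

```csharp
using System;
using System.Threading.Tasks;

class BreakDemo
{
    static void Main()
    {
        // The overload whose body receives a ParallelLoopState
        ParallelLoopResult result = Parallel.For(0, 1000, (i, state) =>
        {
            if (i == 500)
                state.Break(); // finish iterations below 500, skip the rest
        });

        Console.WriteLine(result.IsCompleted);          // False
        Console.WriteLine(result.LowestBreakIteration); // 500
    }
}
```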
The second to last video in this section is about thread-local state, which is very useful since you don’t want to lock certain data over and over again in those loops. The example in the video involves calculating the sum of a number of integers. You could do some manual locking here and keep locking and releasing the sum variable in each iteration, but this seems a bit too much, and luckily the .Net framework offers another way to solve this problem. Dmitri warns that this is a bit complex, so let’s discuss why he thinks this is so.
When you start a parallel loop you can pass in an extra lambda argument that initializes the state of the thread-local storage. In this particular example we set a value of 0 since we want the sum to start at 0. Then you need to pass another lambda that takes 3 arguments: first the counter, second the ParallelLoopState and third the current value of the thread-local storage. The last argument for the loop is an additional lambda that is executed once per thread after its iterations are done; it takes one argument, the final value of that thread’s local storage. In this last lambda you do need to use some custom locking logic. So this is why Dmitri thinks it is a rather complex setup: it’s just calling the parallel loop with multiple arguments, most of which are lambdas, so it might be confusing.
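Those three lambdas fit together like this; a minimal sketch of the summing example (the range is my own choice):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LocalStateDemo
{
    static void Main()
    {
        int total = 0;

        Parallel.For(0, 1001,
            () => 0,                          // init thread-local storage at 0
            (i, state, local) => local + i,   // accumulate, no locking needed
            local =>                          // once per thread, at the end
                Interlocked.Add(ref total, local));

        Console.WriteLine(total); // 500500 (= 1 + 2 + ... + 1000)
    }
}
```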
The last video in this section is about partitioning the parallel loop iterations. The parallel loops automatically partition these iterations themselves, but in some cases you will probably want to partition the iterations yourself since you have a deeper understanding of the given algorithm. To do this, you can use the Partitioner class to create chunks of data, which you can simply feed into a parallel For or ForEach through its method arguments. I think it is really nice that the .Net framework offers something like this out of the box. This way you don’t have to act all clever and partition data in mystical ways.
Section 6: Parallel LINQ
The sixth section of the course is all about Parallel LINQ. I think every .Net programmer is very happy with the existence of LINQ. It makes code so much more readable and you can program in a more functional way to make things less stateful. The fact that you can also use parallel versions of LINQ makes it even more performant!
To make a LINQ query run in parallel you simply call the AsParallel() function on the base IEnumerable you have. From there on you can use all the LINQ operators you are used to, but they work on a ParallelQuery&lt;T&gt; object instead of the IEnumerable you started with. You also need to remember that any parallel function often runs out of order. So if you truly want results in order, you need to call another function, specific to the ParallelQuery&lt;T&gt; object, called AsOrdered. This will run the entire thing in parallel, yet deliver the results in an ordered fashion, which is really nice! This is something I am most certainly going to use more in my Unity3D projects, because it is so darn easy and you can get a massive performance boost by doing so. But again, remember that Unity3D does not support multi-threaded access to most of its APIs.
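Side by side, the unordered and ordered variants look like this; a minimal sketch with a made-up squaring query:

```csharp
using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        var numbers = Enumerable.Range(1, 20);

        // Unordered: results may arrive in any order
        var squares = numbers.AsParallel()
                             .Select(n => n * n)
                             .ToArray();

        // AsOrdered: still computed in parallel, delivered in source order
        var orderedSquares = numbers.AsParallel()
                                    .AsOrdered()
                                    .Select(n => n * n)
                                    .ToArray();

        Console.WriteLine(orderedSquares[0]);  // 1
        Console.WriteLine(orderedSquares[19]); // 400
    }
}
```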
The next stop is exception handling and cancellation of parallel LINQ. Just as with the other parallel functions, parallel LINQ works with tasks under the hood. So exceptions that are thrown surface as an aggregate exception, which you must handle accordingly. The second way to stop a query is through simple cancellation tokens. I’ve discussed these in previous sections, so I’m not going to dive deeper into this.
Then there is a more interesting video which takes a look at merge options. Parallel LINQ’s ParallelMergeOptions is a hint you can give to the system for merging the result data. The example Dmitri shows in the video is about how certain values are produced and consumed. In the example he shows that while the algorithm is still producing, some values are already consumed by a simple WriteLine call. However, if you want to control this you can use the merge options to do so. When you set it to NotBuffered, producing and consuming happen interleaved; when you set it to FullyBuffered, the result is fully generated before you can start to consume the data.
The last thing Dmitri shows in this section is how to use Parallel LINQ’s Aggregate function. This again is a rather complex function call which takes four arguments and makes use of thread-local storage. It works somewhat the same as the parallel loop function I described in the previous section: you start with a seed value, then an operation accumulates into the thread-local storage, a combine step merges all the partial results, and lastly you do some post-processing on the result value. I don’t think you would use this function much, but if you need it, it’s there, so that’s nice :).
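The four arguments map onto the four-argument Aggregate overload like this; a minimal sketch summing a range (my own example, not the course's):

```csharp
using System;
using System.Linq;

class AggregateDemo
{
    static void Main()
    {
        int sum = Enumerable.Range(1, 1000)
            .AsParallel()
            .Aggregate(
                0,                            // seed for each partition
                (partial, n) => partial + n,  // per-element, thread-local
                (a, b) => a + b,              // combine the partitions
                result => result);            // final post-processing

        Console.WriteLine(sum); // 500500
    }
}
```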
Section 7: Asynchronous Programming (Async/Await)
The last section in this course is about something we (.Net devs) probably all know something about: async / await. Dmitri explains this really nicely, so let’s see if we can learn something new.
The first video in the section explains the basics of the async and await keywords. Dmitri creates a simple Forms app where he adds a button that does a fictional (Thread.Sleep) complex computation. This of course freezes the UI thread, and thus you cannot do anything else in the meantime. Then he changes the code to use tasks with a continuation and everything works again. But this can be made even easier by using async and await. He also shows that you can use the await keyword multiple times in the same method.
The next video is a short one about how the async and await keywords generate state machines from your code when compiled to Intermediate Language. One reason Dmitri gives for this is that you want to be able to get nice exceptions with proper stack-traces and such. This is also what allows you, as a C# programmer, to write code as you’re used to: just by adding async and await keywords here and there you can create an asynchronous program. The .Net framework handles all of this for you under the hood.
The third video in the course is about the Task.Run utility method and its sometimes complex return values. When working with tasks you can end up with return values like Task&lt;Task&lt;int&gt;&gt;, since a task may itself start and return another task. If you pass an async delegate to Task.Run it unwraps the inner task for you; if you don’t, you might get a nested return value like Task&lt;Task&gt;, which is not very useful if you need the result. Another interesting thing in this video is that you can force a task to Unwrap: you can call the Unwrap function on a Task&lt;Task&lt;int&gt;&gt; and get the Task&lt;int&gt;, from which you can read the return value properly. What is also pretty neat is that the await keyword acts the same as unwrap. You can even await a task multiple times to unwrap it multiple times. So, for example, you can “await await” a Task&lt;Task&lt;int&gt;&gt; and get the integer directly. I did not know this and it’s pretty useful to know :)
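The nesting and the two ways of flattening it can be sketched like this (the value 42 is just a placeholder):

```csharp
using System;
using System.Threading.Tasks;

class UnwrapDemo
{
    static async Task Main()
    {
        // StartNew does not unwrap nested tasks, so we get Task<Task<int>>
        Task<Task<int>> nested =
            Task.Factory.StartNew(() => Task.Run(() => 42));

        // Option 1: Unwrap flattens Task<Task<int>> into Task<int>
        int a = await nested.Unwrap();

        // Option 2: await twice — each await peels off one layer
        int b = await await Task.Factory.StartNew(() => Task.Run(() => 42));

        // Note: Task.Run itself unwraps, so this is already a Task<int>
        int c = await Task.Run(() => Task.Run(() => 42));

        Console.WriteLine($"{a} {b} {c}"); // 42 42 42
    }
}
```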
Next there is a very short video about task utility combinators, which are simple utility functions like Task.WhenAll or Task.WhenAny. These work in the same manner as the other coordination functions I discussed earlier, yet return a task you can await.
The fifth video in this section is one that I think is a bit more interesting; it’s about a pattern called the “Async Factory Method” and concerns creating objects asynchronously. In .Net you cannot have an async constructor, so you cannot do async initialization there. But with the Async Factory Method you can. This essentially works the same as the normal Factory Method design pattern, just with the use of async keywords in the construction methods. I thought this was a nice little video since he shows how to quickly transform a sequential algorithm into an async one very easily, which demonstrates the power of async / await again.
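The shape of the pattern can be sketched as follows. The `Document` class, its `CreateAsync` name and the `Task.Delay` stand-in for real I/O are all my own illustrative choices, not the course's code:

```csharp
using System;
using System.Threading.Tasks;

public class Document
{
    public string Content { get; private set; } = "";

    // Private constructor: forces creation through the factory
    private Document() { }

    private async Task<Document> InitAsync(string source)
    {
        await Task.Delay(100);        // stand-in for real async I/O
        Content = $"loaded from {source}";
        return this;
    }

    // The async factory method: the only public way to build a Document
    public static Task<Document> CreateAsync(string source) =>
        new Document().InitAsync(source);
}

class Program
{
    static async Task Main()
    {
        Document doc = await Document.CreateAsync("disk");
        Console.WriteLine(doc.Content); // loaded from disk
    }
}
```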
The next video shows a very similar pattern, called the Async Initialization Pattern, which is the same as the Async Factory Method but also takes care of asynchronously initializing any associated classes that might need to be created. So this comes down to adding some extra await keywords in the factory methods.
The second to last video is about another interesting subject, async lazy initialization. Dmitri shows how to transform a Lazy&lt;T&gt; object into an AsyncLazy&lt;T&gt; object. He first transforms the Lazy&lt;T&gt; object manually to do its work asynchronously, and then explains that there is a shortcut: the implementation of the AsyncLazy&lt;T&gt; class. I thought this was a pretty useful video. I can’t remember ever using the Lazy&lt;T&gt; class, let alone AsyncLazy&lt;T&gt;, so I learned something new :)
The last video in the course is about the concept of a ValueTask. This type of task was added in .Net Core 2.0 and is capable of wrapping either a TResult directly or a Task&lt;TResult&gt;. The ValueTask is based on a struct and thus generates less overhead for the garbage collector. You should use this new task only in a few specific cases: where you know users will await these tasks directly, or where you can effectively pool the tasks.
Conclusion
Ok, this took quite a bit longer to write than expected. The main reason is that this course, and this blog, are rather technical, and thus I wanted to describe things clearly and correctly. I’ve learned a lot from this course that I can apply directly in my day-to-day programming adventures. Especially the Parallel Loops and Parallel LINQ sections were very interesting, yet I think the other sections were very interesting too. I’ve learned new stuff from each section, so this course was well worth my time. Also, writing this blog, kind of a summary of the course, made me revisit the subjects to be able to write them down properly. So effectively, I’ve now seen the course twice, and some videos I’ve watched more than twice.
I think this was a great course, 5/5 stars! I think everyone who wants a deeper understanding of parallel and asynchronous programming in .Net can learn new things here. Some level of understanding of tasks is very helpful since you can then follow Dmitri a bit better. But I have to say, everything is explained very clearly, so I think people of any skill level should be able to enroll and finish the course with new insights.
#end
01010010 01110101 01100010 01100101 01101110