17 – Udemy Part 2: C# Performance Tricks

#begin

In my last blog about Udemy I hinted that I started on another course. I finally had the opportunity to finish it and thus, here is the review of the course. I started this relatively short course on C# performance tricks since I thought it would probably yield some interesting tips and tricks I could apply in my daily work; and yes, I was right.

C# Performance Tricks

The course I started was this one: C# Performance Tricks: How To Radically Speed Up Your Code by Mark Farragher. It is a rather short course, about 4.5 hours of content. But I think it covers a lot of ground in just this short time. This course caught my attention since it mentions Intermediate Language (IL) for the Common Language Runtime (CLR). I have only studied IL, or Java ByteCode for that matter, for a very short while. This course mentions that you are going to get some more information on how C# code is compiled to IL and how you can read this IL and find out if you can enhance the performance of the code with some simple tricks.

I thought this was fascinating since I often have, or hear people having, discussions about things like for loops vs. foreach loops, throwing exceptions vs. safeguarding your code, and using Reflection vs. class factories. This course dives into all of this.

The course has 7 sections, starting with an introduction and ending with the final words where Mark says his goodbyes. I will cover each section in its own paragraph and end this review with a conclusion explaining my opinion and providing you with some key takeaways.

Section 1 – Introduction

As with any course, anywhere on the internet, this one also starts off with a simple introduction of the lecturer. Mark explains his background from his startup and how he started teaching C# on different levels. He also helps you with setting up your IDE depending on your OS. Mark shows he is using the unholy union of macOS and VS Code for C# programming, Yuck! Sorry, I’m just biased towards JetBrains IDEs.

Section 2 – Fundamentals of the .Net Framework

The first section with some actual real content is all about the fundamental aspects of the .Net Framework. It starts with two videos about the stack and the heap, how they relate to each other and what the difference between them is. It’s been a long time since I got some in-depth information on the stack and heap. Mark clearly explains what type of data is stored on the stack and what is stored on the heap. I think there was nothing really new for me here. He also explains the concept of garbage collection when talking about the heap. But I’ll write about that in a later section since there is an entire video dedicated to the .Net garbage collector.

Next there are two videos about value and reference types. Mark hinted at these in the videos about the stack and heap, and now he explains what exactly value and reference types are and which types fall into each category. What I found rather interesting about this part is that he clearly explains how a reference-type variable lives on the stack, but points to data on the heap. He explains this really nicely, and I think this is really important for a developer to know. If you want to write high-performance algorithms in C# you most definitely need a basic understanding of the stack, the heap, and reference and value types.

Lastly, he talks about boxing and unboxing of value types, what it means and how it works. He also hints that it is really slow compared to just using value types. So if you know you are using integers, for example, don’t store them as objects. This is rather easy to remember of course, but there are certain collection types that box and unbox values automatically.
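To make boxing a bit more concrete, here is a minimal sketch of my own (not code from the course; all variable names are made up for illustration). It shows an int being boxed into an object and unboxed again, and that the box holds its own copy of the value.

```csharp
using System;

// Hypothetical illustration: the names below are not from the course.
int number = 42;

// Boxing: the int is copied into a new object on the heap.
object boxed = number;

// Unboxing: the value is copied back out, with a runtime type check.
int unboxed = (int)boxed;

// The box holds a copy, so changing the original does not affect it.
number = 100;

Console.WriteLine($"boxed={boxed}, unboxed={unboxed}, number={number}");
```

Every one of those box allocations is extra work for the runtime and for the garbage collector, which is why doing it inside a hot loop hurts.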

This section ends with a short explanation of why strings in C#, and many other languages, are immutable. I think every programmer knows this already. Surely, if you ever programmed in plain old C, you must remember the character arrays that are used to represent strings. It is still much the same in higher-level languages like C#, just with some nice utilities on top. So at the low level, a string in C# is still essentially a sequence of chars in memory, and when you access the string, you only hold a reference to where it starts.

Section 3 – A crash course in Intermediate Language

The next section is all about IL. In this section Mark talks about the different kinds of IL instructions. He points out some of the ones you will most likely see when you decompile C# code. He again talks about boxing and unboxing of objects, store and load operations, math functions and branching operations. I think this is a really great addition to the course since not much attention is given to this aspect in most education and classes. It has been a long time since I read some IL, or Java bytecode for that matter. But I know that if you understand the operations well enough, you can surely add some extra optimizations to your code. Rich Hickey, the creator of the Clojure programming language, often talked about Java bytecode optimizations in his early talks about Clojure.

The second part of the video shows a nice example of a simple for loop that sums all numbers up to a given iteration count. Mark explains the generated IL code step by step, which he did really nicely. He shows you exactly which lines of C# are compiled to which IL instructions.

Section 4 – Basic optimizations: The low-hanging fruit

This section is all about the most basic, and easy to implement optimizations you can and should do to your code. Mark starts again with a video emphasizing that boxing and unboxing is very slow. He demonstrates this by using an array of objects vs. an array of integers. He iterates through them in a for loop and shows the difference in performance in a nice graph. I think we all have some understanding about the fact you should always use the specific type you need instead of using objects and casting everything. Not just for readability, but surely in mission critical code you need that performance and thus you should always use the desired types for it.

Next there is a video about string concatenation. We already discussed that, under the hood, strings are still character arrays where you only have a pointer to the first char. Concatenating strings with the + operator simply creates a new string and leaves the old string for the garbage collector. (More on the garbage collector will be discussed later in the course, spoiler alert!) Since this video is publicly available I will talk about this part in more detail. Mark has some simple code in a for loop where he concatenates 10000 characters using the + operator vs. appending them using a StringBuilder. The results are again displayed in a nice graph, and there is something interesting going on here.

It appears that string concatenation is actually faster than the StringBuilder for up to 4 operations. Beyond that, the StringBuilder starts to outperform the + operator, and not just by a little. In the end, the StringBuilder is 240 times faster than the + operator. I think this is really what we all expected, yet it is interesting that string concatenation is faster for up to 4 operations. I have been relentlessly removing concatenation through the + operator from my code, and my colleagues’ code, for years now. But actually, it seems to be acceptable in low numbers. However, since C# 6 we always use string interpolation instead of concatenation since it has a much nicer syntax. Too bad that Mark does not compare this with the + operator and the StringBuilder. I think that in practice people will use interpolation for a low number of elements in a string since it is just practical. You will not find code that manually interpolates 10000 entries in a string; that just does not seem feasible to me. If you ever encounter such code, please show me the magic.
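The two approaches being compared look roughly like this. This is my own minimal sketch, not Mark’s benchmark code: it has no timing in it, and the count of 10000 is simply borrowed from the description above.

```csharp
using System;
using System.Text;

const int count = 10000; // same order of magnitude as the course example

// Variant 1: the + operator allocates a brand-new string on every pass.
string concatenated = "";
for (int i = 0; i < count; i++)
{
    concatenated += "a";
}

// Variant 2: StringBuilder appends into an internal buffer that grows as needed.
var builder = new StringBuilder();
for (int i = 0; i < count; i++)
{
    builder.Append('a');
}
string built = builder.ToString();

// Both variants produce the same text; only the amount of allocation differs.
Console.WriteLine(concatenated == built);
```

If you want to reproduce the graph, wrapping each loop in a System.Diagnostics.Stopwatch is probably the simplest way to get some rough numbers.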

The next video is all about the collection classes you might use in the .Net framework. He shows the difference between many collection types and it boils down to this: never, ever use the ArrayList, since it boxes value types when they are added and unboxes them when they are read back. Also, never use any collection classes from the System.Collections namespace, and always use the ones from the System.Collections.Generic namespace, since the former do boxing and unboxing and the latter do not. And last, always use the generic IEnumerable<T> instead of the non-generic IEnumerable since, again, the latter will introduce boxing and unboxing when referencing elements. These are all very, very easy fixes and everyone should implement them if they want some easy-to-grab performance increases in their code. Luckily, I think the generic collection classes come as second nature to C# programmers. I have not seen much use of the non-generic counterparts myself. I think that, since C# is strongly typed, using generic collections gives you a better programming experience since you don’t have to cast values as much.
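Here is a small sketch of my own showing the difference between the non-generic ArrayList and the generic List<int>. The variable names are made up, and it only shows where the boxing and the casts appear; it does not measure anything.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Non-generic: every element is stored as object, so each int gets boxed.
var boxedNumbers = new ArrayList();
for (int i = 0; i < 5; i++)
{
    boxedNumbers.Add(i);          // boxing happens here
}
int boxedSum = 0;
foreach (object item in boxedNumbers)
{
    boxedSum += (int)item;        // unboxing plus a cast on every read
}

// Generic: List<int> stores the ints directly, no boxing and no casts.
var numbers = new List<int>();
for (int i = 0; i < 5; i++)
{
    numbers.Add(i);
}
int sum = 0;
foreach (int item in numbers)
{
    sum += item;
}

Console.WriteLine($"{boxedSum} {sum}");
```

Both loops compute the same sum; the generic version just never has to wrap the integers in objects.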

What has become clear in this video about collections and native typed arrays (like int[]) is that arrays always outperform collection classes. Mark says to always prefer typed arrays over any collection class. The reason is that arrays are natively supported by IL: it has dedicated instructions for creating arrays and for reading and writing their elements. This makes the performance of arrays superior to any custom-made collection class. Yet the problem with arrays is that you need to know the length in advance, so it’s not always practical to use them. But if you do know the size, use a typed array since it will grant you some extra performance.

The next video is about how to implement multi-dimensional arrays and still have great performance. I think this was one of the more interesting videos in the course since I had never seen an implementation of this in code. Let me explain it. Mark explains that multi-dimensional arrays (int[,]) are the slowest, and jagged arrays (int[][]) are slow as well. This all has to do with the number of IL instructions needed to access and edit elements in these arrays, but Mark explains a third way of implementing code that needs a multi-dimensional array of some sort. You implement it yourself with a technique called array flattening. It works like this: say you need an array of 3 rows with 1000 elements each. You then instantiate a simple, 1-dimensional array of 3000 elements and keep the “row length” in a separate variable. So when you want to access the first element of the second row you simply do array[rowlength*row+index], which in this case is array[1000*1+0]. You can do the same for accessing element 400 of the third row, like this: array[1000*2+400]. Now why go through all this trouble? Well, if you want, or desperately need, the extra performance, this will outperform both the multi-dimensional and the jagged arrays. And why? Well, because, as discussed earlier, IL has native support for 1-dimensional arrays, and thus it is much faster than the other two. I think this is really nice and if I ever have the chance to implement this I will.
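The index arithmetic is easier to follow in code. This is my own minimal sketch of array flattening for three rows of 1000 elements; the helper name IndexOf is hypothetical and not something from the course.

```csharp
using System;

const int rows = 3;
const int rowLength = 1000;

// One flat block of rows * rowLength elements instead of int[3, 1000].
int[] flat = new int[rows * rowLength];

// Local helper (hypothetical name) that maps (row, column) to a flat index.
int IndexOf(int row, int column) => rowLength * row + column;

flat[IndexOf(1, 0)] = 7;      // first element of the second row  -> flat[1000]
flat[IndexOf(2, 400)] = 9;    // element 400 of the third row     -> flat[2400]

Console.WriteLine(flat[1000]);
Console.WriteLine(flat[2400]);
```

In really hot code you would probably write the multiplication inline rather than call a helper, but the helper keeps the mapping readable.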

The next video is about the performance hit you take when you throw exceptions, and I thought this was astonishing. I knew throwing exceptions was slow, but not this slow. Now, since I often work in Unity3D, it is bad practice to throw exceptions anyway since Unity will often get into a deadlock state when an exception is thrown. This is why you always take a rather defensive stance against any code that is executed. You make the code as safe as possible and add proper logging to indicate something went wrong. But anyhow, Mark shows how much slower throwing exceptions is vs. handling erroneous states silently yourself. He demonstrates this with some simple dictionary code where he needs to get values from a dictionary. He has one use case where he throws an exception when a key does not exist in the dictionary, and another where he first checks for the key, and only uses its value when it exists. (I’m not sure why he did not use Dictionary.TryGetValue(<key>, out var <value>), but yeah, it does not matter that much.) The results of this experiment are rather dramatic, so I have to align my opinion with Mark’s here and tell you to never, ever throw exceptions in (deeply nested) mission-critical code! Period…
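The three ways of doing such a lookup can be sketched like this. It is my own example with a made-up dictionary and made-up function names, and it includes the TryGetValue variant I mentioned, which is not part of Mark’s comparison.

```csharp
using System;
using System.Collections.Generic;

var ages = new Dictionary<string, int> { ["alice"] = 30 };

// Variant 1: let the indexer throw and catch the exception (the slow path).
int LookupWithException(string key)
{
    try
    {
        return ages[key];
    }
    catch (KeyNotFoundException)
    {
        return -1;
    }
}

// Variant 2: check first, then read (two lookups, but no exception).
int LookupWithCheck(string key) => ages.ContainsKey(key) ? ages[key] : -1;

// Variant 3: TryGetValue does the check and the read in one lookup.
int LookupWithTryGetValue(string key) => ages.TryGetValue(key, out int age) ? age : -1;

Console.WriteLine(LookupWithException("bob"));
Console.WriteLine(LookupWithCheck("bob"));
Console.WriteLine(LookupWithTryGetValue("alice"));
```

All three return the same answers; the difference only shows up when the missing-key case happens often, because that is when the first variant pays for an exception every time.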

This section ends with a video about the 1 million dollar question that is always subject to some debate: the dreaded for loop vs. foreach discussion! I think we all know deep down that the for loop is faster than the foreach loop, yet it depends. The results are as follows: there are slight performance differences between the two. When you need to iterate an array, you can take either the for or the foreach loop; there is not much difference. But when you are going to iterate a List<T> or an ArrayList you should use a for loop since you will gain some performance benefit by doing so. The problem lies again in boxing and unboxing, where the foreach loop uses an IEnumerable “under the hood”. A last remark here is that, if you see an ArrayList being iterated by a foreach, consider changing it into a List<T> as a minimum change to gain a slight performance increase.

Section 5 – Intermediate optimizations

Alright, now this section started with an unexpectedly interesting subject: garbage collection. There are two in-depth videos about this. In the first video the concept of garbage collection is repeated and the .Net garbage collector is explained. The second video shows some practical tips on how to write your code to match the .Net garbage collector’s algorithm. I thought this was fascinating! I had, for example, totally forgotten that the garbage collector in .Net is generational and that you can write your code to better align with the algorithms it uses to mark objects and move them across generations. This is explained really nicely in the first video of this section, and I think it is absolutely crucial that a programmer understands the concept of garbage collection well enough to write code that is optimized for it in some sense. I’m not going to explain exactly what Mark says in the first video because I want you to view the course yourself, plus you can search online for how a generational garbage collector works.

What I will do is give some small practical tips on how to “interact” with it. For this garbage collector, objects are either short-lived or long-lived, and large objects (anything over roughly 85 KB) are treated as long-lived from the start. This means that if you have some object that you only use once, but that is greater than 85 KB, you must dispose of it or set it to null once you are done with it for optimal performance. If you leave it around, the object will survive and go to the next generation. A key takeaway that Mark gives here is that the .Net garbage collector makes some assumptions about objects: that 90% of small objects are short-lived and that all large objects are long-lived. He emphasizes that you should not go against these assumptions. Another great way to optimize for the garbage collector is to simply allocate fewer objects. But when you do allocate them, discard them as soon as possible, and try to reuse large objects.
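You can peek at the generations yourself with GC.GetGeneration. This is a small sketch of my own, not from the course. The sizes are arbitrary picks on either side of the roughly 85 KB threshold, and the exact numbers printed depend on the runtime (I would expect Mono and Unity to behave differently from the regular .Net runtime), so I am deliberately not claiming a fixed output.

```csharp
using System;

// A small, freshly allocated object normally starts out in the youngest generation.
var small = new byte[100];

// An array well above the ~85 KB threshold is handled as a large object.
var large = new byte[200_000];

int smallGeneration = GC.GetGeneration(small);
int largeGeneration = GC.GetGeneration(large);

Console.WriteLine($"small: gen {smallGeneration}, large: gen {largeGeneration}");
Console.WriteLine($"highest generation on this runtime: {GC.MaxGeneration}");
```

Seeing the large array reported in an older generation right after allocation is a nice way to convince yourself that large objects really are treated as long-lived.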

Next up is a video about delegates. I never really thought about the performance of delegates in my code. Yet I use code like “public event Action<T> MyEvent;” all over (Action<T> is a delegate type under the hood). However, this is not in mission-critical, high-performance code but is often related to UI matters. And still, I never really gave much thought to how fast, or slow, this implementation would be. I just implemented it this way since I wanted some observer-like design. But Mark shows the difference between unicast and multicast delegates, and there is a performance hit. His advice is to never, ever use multicast delegates in your code, and if your mission-critical code needs optimization, to remove delegates altogether. He points out that you should use delegates in your code where it’s convenient, yet always consider removing them and coming up with another solution to gain some performance.
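For anyone who has not run into the terms before, this is what unicast and multicast look like in code. It is a minimal sketch of my own with made-up variable names, and it only shows the difference in behaviour, not the timing.

```csharp
using System;

int counter = 0;

// Unicast: the delegate wraps exactly one method.
Action unicast = () => counter += 1;

// Multicast: += chains a second target, so one call runs an invocation list.
Action multicast = () => counter += 10;
multicast += () => counter += 100;

unicast();      // counter is now 1
multicast();    // both targets run, counter is now 111

Console.WriteLine(counter);
Console.WriteLine(multicast.GetInvocationList().Length);
```

An event with several subscribers is exactly this kind of multicast delegate, which is why my observer-style code falls into the slower category.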

The last video in this section is also something really interesting; it’s about fast class factories. This is something I struggle with sometimes, especially when I really want to constrain myself to using Clean Architecture. I often write my code as generic and independent as I can to align with the concepts of Clean Architecture, yet there always is this one factory kind of class that pulls in all concrete dependencies in order to create and configure the objects a caller might expect.

In this video Mark shows two commonly used solutions and one ‘new’ solution for instantiating classes. He shows the most obvious one first: a factory class with a string input that uses a switch to instantiate the correct class. Yet this is not really flexible since you must define all switch cases beforehand, so you might end up recompiling your code all the time to add new cases. He also shows a more advanced way of instantiating new objects, and that’s through reflection and the Activator class. Yet, as we know, this is really, really slow. In this video he shows you how slow it actually is, and it’s dramatic. Then Mark shows a third way of doing this, which I found really nice and have never seen implemented anywhere: using dynamic method delegates to instantiate new objects. What this essentially is, is a simple static class that keeps a dictionary that maps a string (or a type) to a class generator, where the class generator is a delegate that creates the instance of the class. When you ask for an instance, it will check for the key in the dictionary, and if it exists it will simply call the delegate and get a new instance of the object. When it does not exist, Mark uses reflection to get information about the object’s constructors and generates code that matches it. This way, you take the performance hit for using reflection only once per type you want to construct. This is way faster than using the Activator class!
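To give an idea of what such a factory could look like, here is a rough sketch of my own. It is not Mark’s code. It assumes the class has a public parameterless constructor, it keys the cache on Type rather than on a string, it is not thread-safe, and the names creators, BuildCreator and Create are all made up. I use StringBuilder as the class to construct simply because it is always available.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;

// Cache of generated "class generator" delegates, one per type.
var creators = new Dictionary<Type, Func<object>>();

// Uses reflection once to find the constructor, then emits a tiny method
// that does nothing but "new T()" and returns the result.
Func<object> BuildCreator(Type type)
{
    ConstructorInfo constructor = type.GetConstructor(Type.EmptyTypes)
        ?? throw new InvalidOperationException($"{type} has no public parameterless constructor.");

    var method = new DynamicMethod("Create_" + type.Name, typeof(object), Type.EmptyTypes);
    ILGenerator il = method.GetILGenerator();
    il.Emit(OpCodes.Newobj, constructor);   // call the constructor
    il.Emit(OpCodes.Ret);                   // return the new instance
    return (Func<object>)method.CreateDelegate(typeof(Func<object>));
}

// Looks up the cached delegate, building it only on the first request.
object Create(Type type)
{
    if (!creators.TryGetValue(type, out var creator))
    {
        creator = BuildCreator(type);
        creators[type] = creator;
    }
    return creator();
}

var first = (StringBuilder)Create(typeof(StringBuilder));   // pays for reflection once
var second = (StringBuilder)Create(typeof(StringBuilder));  // reuses the cached delegate

Console.WriteLine(ReferenceEquals(first, second));
Console.WriteLine(creators.Count);
```

The first call for a type is slow because of the reflection and code generation, but every call after that is just a dictionary lookup and a delegate invocation.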

Section 6 – Advanced Optimizations

So this last section of the course is the one I found the least interesting. Not that it isn’t interesting per se, but the performance tips Mark shows you here are so specific to certain use cases that I will probably never need them in the context of the things I do in Unity3D. I might find myself in a position to use some of the techniques to process images like he shows in this part of the course, but the same algorithm is most likely already implemented in the Unity3D engine. Nonetheless, I watched the videos with full attention and I will discuss them below.

So most of the tricks he shows in this section live in “unsafe” blocks in the code. An “unsafe” block of code is called unsafe because the runtime can no longer verify that it is memory-safe. This means you can use pointers, stack allocation and some other low-level, C’ish kinds of functionality. And that’s exactly what this section is about:

The first video of the section talks about using stack allocation to keep arrays on the stack instead of on the heap. Mark says you might have to use stack-allocated arrays when you need to interface with existing, unmanaged code, maybe even from another programming language. In this video Mark shows clearly that using stack-allocated arrays just as a performance boost is not worth the trouble. There is only a slight increase in performance, and it is so minimal that it is often better to keep the readability higher.
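For the curious, this is roughly what a stack-allocated array looks like. Note that this is my own sketch and a substitution on my part: the course uses unsafe blocks with pointers, whereas here I use the Span<T> form of stackalloc that is available from C# 7.2 onwards, because it compiles without any unsafe switch. The buffer size and variable names are made up.

```csharp
using System;

// stackalloc into a Span<int> needs no unsafe block (C# 7.2 and later).
// The 8 ints live on the stack and disappear when the method returns.
Span<int> buffer = stackalloc int[8];

for (int i = 0; i < buffer.Length; i++)
{
    buffer[i] = i * i;
}

int total = 0;
foreach (int value in buffer)
{
    total += value;
}

Console.WriteLine(total);
```

Keep such buffers small, since the stack is limited and overflowing it will take down the whole process.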

Next there are two videos about pointers: what they are, how they work and why you would want to use them in higher-level languages like C#. They are often used for interfacing with low-level languages, yet in some very special cases you really want to use them in C# itself. Image processing is such an example. If you want to get any kind of acceptable performance for image processing you really need to use pointers to set and read pixels in an image. Mark shows the difference between using the high-level GetPixel functions vs. a low-level, custom-made, pointer-based solution. He clearly shows the pointers vastly outperform the high-level GetPixel code.

The second video about pointers discusses the fact that pointers should only be used when absolutely necessary. The syntax can be really obscure and unreadable. You should use regular functions for accessing data instead of pointers as much as you can. But if you really do need the performance, there is the option of refactoring your code to use pointers instead.

Section 7 – Final Words

This last section is a simple recap of all that has been discussed. Mark thanks you for choosing his course and wishes you good luck. The last video is one that I found really funny. It is a promotional video for other lectures that he makes and hosts himself, yet he does the promotion on Udemy. That’s pretty ballsy! Nonetheless, he promotes videos about Data Science and machine learning (ML), all in C#. He explains the rationale behind this and I have to agree with him. He says that the market is flooded with “C# programmers”, not that there is anything wrong with that, yet you can differentiate yourself by having experience in Data Science and ML. He says some new doors might open for you, and you can use his courses to unlock them.

Conclusion

So finally, I can give you my opinion of the course. I’m pretty enthusiastic about it all. It’s a nicely explained course about what I find an interesting topic. Many of the things he explains I knew about already, yet not on the scale he objectively shows here. For example: I knew string concatenation with the + operator was slow, yet not how slow. It’s nice to see some numbers so you can say “I need to refactor this”, or not; up to 4 concatenations using the + operator seems acceptable, for instance.

The video where Mark explains all about the .Net garbage collector is one I found really nice. People often don’t think about it since they take the garbage collector for granted. It is really nice to be able to optimize your code to work and align with the garbage collector for faster running code. Also, the piece about fast class factories built on delegates is one I found particularly interesting. I explained this already but I will do it again.

When I implement software using Clean Architecture I often end up having these AbstractFactories and such where I need to create and configure concrete instances of objects. In some cases I might use Reflection to do so. Mark shows a really nice alternative, which is faster than using the Activator class but still keeps the flexibility.

The last section, “Advanced Optimizations”, is one I did not find that interesting since it does not relate to the work I do. I can imagine that it will for other people. This section is explained as well as the rest, so the information given is to the point and presented in a clear manner.

So, what would be the score for this course? Well, I’m going to give it all 5 stars. I think any C# or .Net developer will gain something from this regardless of their skill level. Some information is just never talked about during work, and it is nice to have some of these things refreshed. So yeah, I recommend you watch the videos and see for yourself.

#end

01010010 01110101 01100010 01100101 01101110
