Algorithm Speed
Now, as mentioned in the introduction, we will also discuss the topic of algorithm speed. This mostly boils down to being able to estimate, with some level of precision, the resources your algorithm consumes. Pragmatic programmers should be able to say something about an algorithm's running time, its memory usage, and maybe its allocation behavior on both the stack and the heap.
David and Andrew say this kind of estimation is a crucial skill for software engineers. After all, which program would you pick: the one that runs an algorithm in 1,000,000 cycles, or the one that does it in 1,000? These kinds of questions can be answered with approximations expressed in Big-O, or O(), notation.
What Do We Mean by Estimating Algorithms?
A common trait of non-trivial algorithms is that they handle some kind of variable input; think of sorting algorithms, matrix manipulation, and cryptographic algorithms. The size of that input has an impact on the resources used by the algorithm.
It’s very useful, and in most cases necessary, to know whether the relationship between input size and resource usage is constant: O(1), logarithmic: O(log n), linear: O(n), log-linear: O(n log n), quadratic: O(n^2), exponential: O(2^n), or factorial: O(n!).
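To build some intuition for how differently these classes grow, here is a minimal sketch (the function name and the chosen input sizes are mine, not from the book) that tabulates rough operation counts for each class:

```python
import math

def growth(n):
    """Rough operation counts for each complexity class at input size n."""
    return {
        "O(1)":       1,
        "O(log n)":   math.log2(n),
        "O(n)":       n,
        "O(n log n)": n * math.log2(n),
        "O(n^2)":     n ** 2,
        "O(2^n)":     2 ** n,
        "O(n!)":      math.factorial(n),
    }

for n in (8, 16, 32):
    # even at tiny sizes, the exponential and factorial rows explode
    print(n, {k: round(v) for k, v in growth(n).items()})
```

Even between n = 8 and n = 32 the quadratic row grows 16x while the exponential row grows by a factor of millions, which is the whole point of knowing your class.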
In practice, many important algorithms are not constant, linear, or logarithmic; they are often quadratic or exponential, and sometimes factorial. But we also need to consider the resources being used. For example, if your algorithm runs in logarithmic time yet each iteration of some loop loads a 1 GB file into memory, that will drastically impact its real-world performance. But that might be a topic for another day.
The O() Notation
In The Pragmatic Programmer we zoom in a bit on the O() notation. This notation is the mathematical way of dealing with approximations. An important thing to understand about O() notation is that it expresses an upper bound, typically the worst-case scenario. So if you see a costly notation attached to some search algorithm, that cost only materializes when, say, the element you search for is the very last one in the collection. If the element is at the first location being searched, the algorithm exits early. If it’s implemented properly, of course. O() is all about upper bounds!
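That worst-case-versus-early-exit distinction is easy to see in a plain linear search; this little sketch (my own example, not the book's) exits as soon as it finds a match:

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if absent.
    Worst case O(n): target is last or missing.
    Best case O(1): target sits at the first position."""
    for i, item in enumerate(items):
        if item == target:
            return i   # early exit: far fewer than n comparisons
    return -1

print(linear_search([3, 1, 4, 1, 5], 5))  # last element: n comparisons
print(linear_search([3, 1, 4, 1, 5], 3))  # first element: 1 comparison
```

Both calls are "the same" O(n) algorithm, but only the first one actually pays the worst-case price.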
Another important aspect of O() notation is that we drop constants and lower-order terms: a notation like O(n^2/2 + 3n) is simplified to just O(n^2). And this is of course one of the weaknesses of O() notation: two algorithms with the same notation can differ greatly in actual speed, because the constants that were dropped still matter at runtime.
David and Andrew now present a graph of algorithm complexity which I strongly encourage you to check out. It’s on page 220 of the book!
Source: The Pragmatic Programmer 1999.
Common Sense Estimation
Then they provide us with some common examples like simple loops and nested loops. Again, I strongly encourage you to check the book since it explains these far better than I do. But let’s give it a go.
First off, we have simple loops: if the algorithm runs from 1 to N elements in a collection, the notation will be O(n), which is linear time, since the algorithm simply accesses each item in the collection once.
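A simple loop looks like this in practice (a trivial sketch of my own, just to anchor the idea): one pass, one visit per element, linear work.

```python
def total(xs):
    """Sum a collection with a single loop.
    The body runs exactly len(xs) times, so the work is O(n)."""
    s = 0
    for x in xs:
        s += x
    return s
```

Doubling the length of `xs` doubles the number of loop iterations, and that proportionality is all "linear time" means.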
Second, we have nested loops: if you nest one loop inside another, the notation becomes O(N*M), where N and M represent the limits of the two collections and thus the loops. Of course, when N and M are the same collection, the notation becomes O(n^2). This is commonly seen in simple sorting algorithms, for example.
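Here is what that O(N*M) shape looks like concretely, using a naive "find the common elements of two lists" sketch of my own; for every element of the first list we scan the entire second list:

```python
def common_elements(a, b):
    """Naive intersection: len(a) * len(b) comparisons, i.e. O(N*M).
    Run it on the same collection twice and it degenerates to O(n^2)."""
    found = []
    for x in a:          # outer loop: N iterations
        for y in b:      # inner loop: M iterations per outer iteration
            if x == y:
                found.append(x)
    return found
```

With a hash set the same job drops to roughly O(N+M), which is exactly the kind of improvement this sort of estimation helps you spot.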
Third, we have the binary chop: if your algorithm halves the set of things it needs to consider each time around the loop, then the algorithm is likely logarithmic, O(log n). The perfect example of this is of course binary search. But for binary search to work properly, you need an already sorted collection; it won’t work otherwise. The searching part itself is really fast though, and that’s the thing you should remember.
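A standard binary search, sketched here for illustration, shows the halving: each time around the loop the live range shrinks by half, so even a million sorted items take at most about twenty comparisons.

```python
def binary_search(sorted_items, target):
    """Return the index of target in a sorted list, or -1.
    Each iteration halves [lo, hi], so at most ~log2(n) iterations."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1
```

Feed it an unsorted list and the halving logic discards the wrong half, which is why the sorted-input precondition is not optional.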
Fourth, we have divide-and-conquer algorithms: such algorithms partition their work into two halves, process each independently, and then combine the results. These algorithms are often O(n log n). The canonical example here is quicksort, which in the end is not always as quick as you might think. Quicksort works by partitioning data and then recursively sorting each partition by dividing it again. Quicksort looks like a cool algorithm, but you can shoot yourself in the foot implementing it naively. C#, for example, doesn’t deal well with deep recursion since it’s not tail-call optimized (TCO). This means that all stack frames must be kept around, and stack exhaustion is a real problem. I won’t go into the details of tail-call optimization since it’s a topic of its own, but if you want a great explanation of it I would refer you to Uncle Bob’s latest book, Clean Functional Design, in which he describes TCO in great detail. I read it recently and I really like his approachable way of explaining it. So go check it out.
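One practical way around the recursion-depth worry is to drive quicksort with an explicit work stack instead of the call stack. This is a sketch of that idea (Python for brevity; the same structure translates to C#), using a simple last-element pivot, so it is still O(n^2) in the worst case but O(n log n) on average:

```python
def quicksort(xs):
    """Iterative quicksort: an explicit stack of (lo, hi) ranges replaces
    recursion, so deep partitions cannot exhaust the call stack."""
    xs = list(xs)                  # work on a copy, return sorted result
    stack = [(0, len(xs) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue               # ranges of 0 or 1 items are sorted
        pivot = xs[hi]             # naive last-element pivot choice
        i = lo
        for j in range(lo, hi):    # partition: smaller items to the left
            if xs[j] < pivot:
                xs[i], xs[j] = xs[j], xs[i]
                i += 1
        xs[i], xs[hi] = xs[hi], xs[i]   # pivot lands at its final spot
        stack.append((lo, i - 1))  # queue both halves for later
        stack.append((i + 1, hi))
    return xs
```

The partition work at each level is linear and the data is split roughly in half each time, which is where the n log n comes from.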
Fifth and last, we have combinatoric algorithms; David and Andrew describe these as the ones that start to look at permutations of things. Haha, well, that’s kinda vague. Combinatoric algorithms are often factorial. The example they give is the traveling salesman problem, which boils down to finding the shortest route between all given points on a map or graph without duplication of locations. So the salesman must reach every location, yet never the same one twice, in the shortest way possible. A brute-force solution has to consider every possible ordering of the locations, which is where the factorial growth comes from.
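A brute-force traveling salesman sketch makes the factorial cost visible: with the starting city fixed, it tries all (n-1)! orderings of the remaining cities. The function name and the little distance matrix below are made up for illustration.

```python
from itertools import permutations

def shortest_tour(dist):
    """Brute-force TSP over a distance matrix dist[i][j].
    Tries every ordering of cities after city 0: (n-1)! tours, O(n!)."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)                      # round trip
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

# Symmetric 4-city distance matrix (made-up numbers).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
```

Four cities means only 6 tours to check; twenty cities already means over 10^17 tours, which is why factorial algorithms stop being runnable almost immediately.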
Algorithm Speed in Practice
David and Andrew say that as a day-to-day engineer you will probably not spend a lot of time writing sorting algorithms. These algorithms have been around for a long time and are provided by libraries and such. Yet they say there are patterns that pop up time and time again. If you are writing a single loop, then you know you are dealing with an O(n) algorithm. If it’s a nested loop, then you’re looking at O(N*M).
You should be asking yourself how large these numbers can get and whether they have an upper bound. A while ago I was listening to an episode of the Coding Blocks podcast where they were talking about the coding best practices and rules used at NASA. One of these rules was to always provide an upper bound, or guard, to exit from a loop rather than run it indefinitely. I think this is also very important. I mean, how many times have you written a coroutine where all the code is wrapped in a while(true) block? You should have some means of exiting early, because that thing can run for the entire lifetime of your program, consuming resources, if you are not careful.
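The guard idea can be sketched in a few lines; instead of `while true`, cap the loop at a hard limit so it is provably finite. The limit of 10,000 here is an arbitrary illustrative number, not a NASA rule:

```python
MAX_ITERATIONS = 10_000   # arbitrary guard; pick one that fits your domain

def drain(queue):
    """Process a work queue with a bounded loop instead of while-true.
    The for-range guard guarantees termination even if the exit
    condition never fires."""
    processed = 0
    for _ in range(MAX_ITERATIONS):
        if not queue:
            break              # normal early exit: work is done
        queue.pop()
        processed += 1
    return processed
```

If the guard ever trips before the queue empties, that is itself a signal worth logging: either the bound is too low or something is feeding the loop faster than it drains.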
There are some approaches you can take to address potential problems in your algorithms. David and Andrew say that if you have an algorithm with O(n^2) complexity, maybe you should try a divide-and-conquer approach to take it down to O(n log n). That’s far better than n-squared. And if you are not sure how many resources your algorithm is consuming, you could try to measure it, although real performance measurement is notoriously hard to do. Unity3D provides a Performance Testing package you can use to measure frame times, memory consumption, and so on. So maybe give that a try.
And if that doesn’t work out, David and Andrew give us the advice to run the algorithm with varying input sizes and simply plot the results. You should be able to recognize some kind of curve or line which you can then extrapolate.
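That vary-and-plot approach can be sketched like this (my own example; I count comparisons rather than wall-clock time, since counts are steadier and show the curve just as well):

```python
def count_comparisons(xs):
    """Naive all-pairs scan: counts the comparisons an O(n^2)
    algorithm would make on input of this size."""
    comparisons = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            comparisons += 1          # one pairwise comparison
    return comparisons

# Double the input size each run and watch how the cost grows.
for n in (10, 20, 40, 80):
    print(n, count_comparisons(list(range(n))))
```

Doubling n roughly quadruples the count, and that times-four signature on a doubling schedule is how you spot quadratic behavior even without a formal analysis.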
The last tip they give on this subject is to thoroughly test your algorithm… I mean, duh. But what they mean is that you should feed it all kinds of different data sets and observe the runtime. For example, how does your algorithm perform with input that’s unordered vs. ordered? What happens with negative numbers? Zero? Or very large numbers? Empty collections or collections that contain null? You should encode these scenarios in your tests and also think about the edge cases and boundary conditions. And profilers are your best friend here, so use them. Unity3D provides a pretty nice profiler, and if you are developing for Android you can use the profiler built into Android Studio while running a development build on the actual device.
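Those edge-case data sets are easy to encode as a small reusable suite; this sketch (names and cases chosen by me) throws ordered, reversed, negative, huge, duplicated, and empty inputs at any sort function you hand it:

```python
import random

def run_suite(sort_fn):
    """Exercise sort_fn against the edge cases mentioned above.
    Raises AssertionError on the first wrong result; returns the
    number of cases that passed."""
    cases = [
        [],                            # empty collection
        [0],                           # single element
        [1, 2, 3, 4],                  # already ordered
        [4, 3, 2, 1],                  # reverse ordered
        [-5, 0, 3, -1, 0],             # negatives and zeros
        [10**9, 1, 10**9, -10**9],     # very large numbers, duplicates
        random.sample(range(1000), 50),# unordered random input
    ]
    for case in cases:
        assert sort_fn(case) == sorted(case), case
    return len(cases)
```

Plug in your own implementation (the iterative quicksort from earlier, say) and the same suite doubles as a regression harness every time you tweak the algorithm.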