#begin
Ok; so next up is yet another section on testing. David and Andrew say that most developers hate testing, and I agree. As Uncle Bob would say, testing after the fact is really, really boring. You already know the code works because you just wrote it and tested it manually, so why still write unit tests after you’re done… right? David and Andrew say that most programmers test gently and subconsciously, steering around the spots where they know the code might break. But as Pragmatic Programmers we are different. We try to cover as much code as we can with tests and try to ensure that the QA department doesn’t find anything, so they can actually enjoy the QA process and do exploratory and play testing instead of functional testing.
David and Andrew compare a test suite to fishing nets: we use small, fine nets to catch small fish; those map to the unit tests. We also have big, coarse nets to catch larger fish, which map to integration tests. Sometimes the fish escape because there are holes in the nets, so we repair the nets, cover the holes and continue fishing. We have to test early, often and automatically.
The pragmatic approach is to start testing as soon as there is any code. This makes sure we have high coverage, but more importantly, we create a habit of testing. Developers will get used to the process of writing tests and start to like the process instead of hating it. We don’t, at any point, want to have to rely on manual testing. Well, that sounds like some utopia, because every now and then you will do some manual testing anyway, for example just to increase your comprehension of the control flow of the system. This happens most often when you are tracking down some bug that escaped your test suite.
David and Andrew claim that many teams make great test plans, but rarely use them. They say that teams that have automated tests are far more successful. Automated tests simply have a much higher chance of finding bugs than manual testing. But there’s some nuance here. You’ll hear the saying “Move fast and break things” a lot in our industry. It’s mainly a saying from the web development world, where a new release of some microservice is done in mere minutes. But in game dev this process takes far longer, and we can’t really benefit from a distributed architecture in the same way. Of course we can change backend services any time we want, but we’re not really able to dynamically load DLLs into our games and swap the store module for another one. I mean, the Google Play Store and Apple App Store simply prohibit such behavior in their EULAs. You could achieve such logic by embedding a scripting language like Lua or Lisp, but I’m not so sure many teams will go that far. Interestingly, on Windows platforms you might be able to pull off DLL injection, but that comes with its own set of problems too, like security. In the web development world, though, you might be better off throwing away the code you wrote for some small cloud function and starting over instead of trying to find the bug. I’m not saying you should, but it could be the better choice, especially when there’s no documentation and the original author is not available.
But in a game development setting, I think that teams should have a trustworthy suite of tests that covers enough of the code to be able to say: if the test suite passes, we release the code. Then the QA department can come in and do exploratory and play testing. You should not have to manually test the login sequence… that’s boring as fuck and also inhumane to make someone else do. Just remember that your work doesn’t end the moment the code seems to work; you also need to test it properly and of course maintain it. Having a test suite will make maintenance easier.
What to test
David and Andrew then talk about a couple of areas you should think about in your test suite: unit testing, integration testing, validation and verification testing, resource exhaustion, errors and recovery testing, performance testing and, lastly, usability testing. And probably more.
We’ve talked about unit and integration testing before, so let’s take a look at what they have to say about these other forms of testing:
Validation and verification testing
Validation and verification testing is about making sure that the software actually satisfies the given requirements, so these tests might feel similar to integration tests. Take the requirement from the gearing and weapons system example in previous blogs: you might implement a verification test that some weapon is able to pierce some armor. You specifically encode the requirement into test cases to make sure it passes and remains passing going forward. These kinds of tests make sure that you are actually building the correct system, not just that the system works as expected. You could be coding the wrong requirements and still be functionally correct; we want to avoid that and make sure we cover the requirements properly.
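To make this concrete, here is a minimal sketch of what encoding such a requirement could look like. The `Weapon`, `Armor` and `pierces` names are hypothetical stand-ins I made up for illustration (written in Python for brevity, not the actual game code); the point is that the test states the requirement itself, not an implementation detail.

```python
# Hypothetical minimal model of the gearing/weapons requirement:
# "a weapon pierces armor when its penetration exceeds the armor rating."
class Weapon:
    def __init__(self, penetration):
        self.penetration = penetration

class Armor:
    def __init__(self, rating):
        self.rating = rating

def pierces(weapon, armor):
    return weapon.penetration > armor.rating

# Verification test: the requirement, encoded as an assertion that
# must keep passing on every run going forward.
def test_heavy_rifle_pierces_light_armor():
    assert pierces(Weapon(penetration=50), Armor(rating=30))

def test_light_pistol_does_not_pierce_heavy_armor():
    assert not pierces(Weapon(penetration=20), Armor(rating=80))
```

If a requirement changes, the test changes with it, which is exactly what keeps the suite honest about what the system is supposed to do.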
Resource Exhaustion, Errors and Recovery tests
In the next section they describe resource, errors and recovery tests. These kinds of tests are really important in the context of game development. Resource tests especially can find things that make or break your game; they check memory and disk space usage and CPU and GPU bandwidth, for example. People will not play your game if the performance sucks, that’s a fact. Another common problem you’ll encounter in gaming, mainly on the mobile market, is battery usage. In some cases you’ll find that the battery drains really quickly and the device gets really hot. In such cases the frame rate needs to be capped to introduce some headroom, so the CPU and GPU don’t run at 100%. This gives the device some air to breathe.
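The frame-rate cap boils down to sleeping away whatever is left of the frame budget instead of spinning. A minimal sketch of the idea, in Python with made-up names (`run_frames`, `simulate_work`); a real engine would do this in its main loop, often via a built-in target-frame-rate setting:

```python
import time

TARGET_FPS = 30
FRAME_BUDGET = 1.0 / TARGET_FPS  # ~33 ms per frame

def run_frames(n, simulate_work):
    """Run n frames, idling away the leftover budget instead of busy-waiting."""
    for _ in range(n):
        start = time.perf_counter()
        simulate_work()
        elapsed = time.perf_counter() - start
        if elapsed < FRAME_BUDGET:
            # Sleeping lets the CPU/GPU drop out of full load and cool down.
            time.sleep(FRAME_BUDGET - elapsed)
```

The trade-off is obvious: you give up peak smoothness for thermals and battery life, which is usually the right call on mobile.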
But I think that everyone reading this blog will open up the profiler and memory profiler in Unity every once in a while. Currently we have some CI workflows hooked up to physical devices that test resource usage like memory, disk space, CPU and GPU utilization and average frame rate. The results are then uploaded to Grafana. I really like this setup because it quickly shows when any of these metrics change; it’s a nice guard that makes memory leaks visible. When one of our thresholds is exceeded, we get notified in Slack, which triggers us to investigate what changed between the latest release and the previous one that ran on the workflow. It gives you a narrow window of changes to check when a memory leak gets introduced, which is really helpful because memory leaks are notoriously hard to find sometimes, especially when dealing with native code and marshaling to C#.
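The threshold check at the heart of such a pipeline is tiny. Here is a sketch of the shape of it, with invented metric names and limits (our actual setup is more involved, and the alerting is handled by Grafana itself):

```python
# Hypothetical per-release resource budgets; the real numbers depend on your game.
THRESHOLDS = {"peak_memory_mb": 512, "avg_frame_ms": 16.7, "disk_mb": 200}

def check_metrics(metrics):
    """Return only the metrics that exceeded their budget."""
    return {name: value for name, value in metrics.items()
            if value > THRESHOLDS.get(name, float("inf"))}

# A run that leaked memory but is otherwise fine:
violations = check_metrics({"peak_memory_mb": 540, "avg_frame_ms": 12.0, "disk_mb": 150})
# A non-empty `violations` dict is what would trigger the Slack alert.
```

The value is not in the code but in running it on every release against real devices, so the diff window stays small.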
Error testing, on the other hand, is also good to keep in mind. An example might be that you change the quality settings of your game depending on the hardware that’s available. There’s quite some difference between the mobile GPUs on the market, which forces you to check for certain capabilities in your shaders. If you don’t, they will crash or show weird visual glitches.
And then last, recovery tests. These are a bit more difficult, mainly because games often run as a monolith: once the program crashes there is usually no way to recover from it. You can do recovery tests with components that are started by the main game, but recovering the root process itself is pretty much impossible. An example you could test for recovery is a cache server that runs in, or even out of, process: once the cache server crashes, how do you recover from that and how do you restart it?
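A recovery test for such a component is basically: kill it on purpose, then assert the surrounding machinery brings it back. A minimal sketch with invented `CacheServer`/`Supervisor` classes (a real test would crash an actual child process, not a toy object):

```python
class CacheServer:
    def __init__(self):
        self.alive = True
        self.store = {}

    def crash(self):  # simulate the process dying and losing its state
        self.alive = False
        self.store = {}

class Supervisor:
    """Restarts the cache server whenever a health check finds it dead."""
    def __init__(self, factory):
        self.factory = factory
        self.server = factory()
        self.restarts = 0

    def health_check(self):
        if not self.server.alive:
            self.server = self.factory()
            self.restarts += 1
        return self.server

# Recovery test: kill the server, then assert the supervisor brings it back.
sup = Supervisor(CacheServer)
sup.server.crash()
server = sup.health_check()
assert server.alive and sup.restarts == 1
```

The interesting assertions in a real suite are about what happens to in-flight requests and cached state during the restart, not just that the process comes back.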
Performance Testing
The next subject is performance testing, or stress testing. These kinds of tests are notoriously hard to do properly. You will have to consider other processes that might be running on the device, and kill as many of them as you can. Luckily with Unity we are dealing with an AOT compiler, but with JIT compilers performance testing requires all kinds of tricks to ‘warm up’ the environment for stable measurements. Warm-up in this context means letting the code run N times so the compiler can reach its optimal execution path. If you don’t know about JIT compilation, go check it out; it’s a cool process with lots of interesting optimizations. Basically, when the compiler notices some execution path is reached often, it will try to optimize it.
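The usual harness shape is: run the workload a number of times unmeasured, then measure, then report a robust statistic like the median rather than the mean. A sketch of that pattern (the `benchmark` name and parameters are mine; CPython has no JIT, so here the warm-up mainly settles caches, but the structure is the same one JIT benchmark harnesses use):

```python
import statistics
import time

def benchmark(fn, warmup=50, runs=200):
    """Warm up `fn`, then return the median of `runs` timed executions."""
    for _ in range(warmup):
        fn()  # unmeasured: let caches/JIT (where one exists) settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # Median is less sensitive to OS scheduling hiccups than the mean.
    return statistics.median(samples)
```

Without the warm-up phase, the first few (slow, uncompiled or cold-cache) iterations dominate the numbers and you end up optimizing noise.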
Performance testing in Unity will most definitely involve opening up the profiler. When you enable deep profiling you can dive pretty deep into the code that’s being run: you can see the number of calls and the allocated memory, which gives you hints on how to optimize the code. You’ll be surprised by the dragons you’ll encounter deep down in the trenches of the profiler. It’s a great tool, so learn to use it; once you understand it, it’s really valuable to have on your toolbelt. You can, for example, inject your own user-defined profiler samples to surface more information about your system. If you’ve not used it before, give it a try next time you need to quantify something related to performance or resource usage.
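The idea behind a user-defined profiler sample is just “wrap this region and record how long it took under a name I chose”. Language-agnostic sketch of that concept as a Python context manager (the `profiler_sample` name and the dict-based sink are my own illustration, not Unity’s API; in Unity you would use its own profiler-sample mechanism from C#):

```python
import time
from contextlib import contextmanager

@contextmanager
def profiler_sample(name, sink):
    """Accumulate the wall-clock time of the wrapped block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[name] = sink.get(name, 0.0) + (time.perf_counter() - start)

timings = {}
with profiler_sample("Pathfinding.Update", timings):
    sum(range(100_000))  # stand-in for the real work you want to see in the profiler
```

Named samples like this are what turn an anonymous wall of frames into something you can actually reason about per system.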
Usability testing
Usability testing is another topic of great importance in game development. I think it includes both testing for ease of use and testing whether the game is actually fun to play. In these kinds of tests the focus lies on the human factors of the software that’s written. It also verifies whether the requirements are implemented correctly from a human point of view. So, does your loot system, which is implemented perfectly according to spec, actually spark some fun? I mean, we all know about these stupid-ass game passes or seasonal passes to earn extra rewards. That is truly atrocious money-grabbing nonsense to try and keep players hooked to a game. And often enough it even works, which baffles my mind sometimes. Same goes for loot boxes… yeah, stupid shit.
Sorry for my rant, let’s get back to the book. As with any kind of testing, usability and play testing should be done as early in the process as possible. There is no better time to start testing than right now; the earlier, the better. David and Andrew say that “failure to meet usability criteria is just as big a bug as dividing by zero.” And I agree: especially in an industry focused purely on entertainment, the fun factor and ease of use must be excellent, or you will not survive.
Regression testing
Another form of testing is regression testing. This kind of testing is more of an umbrella term for many kinds of tests. Regression testing simply means that the results of one test run are compared to the results of the previous run. This way we can ensure that the features implemented and bugs fixed yesterday still work as expected today. Regression tests form an important safety net against unpleasant surprises, and they can be of many types: unit, integration, usability or performance tests.
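The compare-against-yesterday part can be sketched as a baseline file plus a diff. All names here (`baseline.json`, `run_suite`, the result strings) are invented for illustration; in practice your CI system stores the previous run for you:

```python
import json
import pathlib

BASELINE = pathlib.Path("baseline.json")  # hypothetical stored results of the last run

def run_suite():
    # Stand-in for real test results, keyed by test name.
    return {"login_flow": "pass", "loot_drop_rates": "pass"}

def regressions(current, previous):
    """Tests that passed in the previous run but no longer pass now."""
    return [name for name, result in previous.items()
            if result == "pass" and current.get(name) != "pass"]

current = run_suite()
previous = json.loads(BASELINE.read_text()) if BASELINE.exists() else current
broken = regressions(current, previous)  # non-empty means something regressed
```

The same diff-against-previous idea works for performance numbers and screenshots, not just pass/fail results.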
Exercising GUI systems
The next topic in the book is one that is really interesting, even more so because I couldn’t really remember what it was about. The section is called Exercising GUI Systems, in regard to testing of course. This maps perfectly to games! Games can be really sophisticated in terms of presentation and UI, and don’t forget that the 2D or 3D world is also just another UI. The same practices for decoupling business code from presentation apply to your game objects just as they do to the UI that represents menus, etc.
David and Andrew mention specialized GUI testing tools like Selenium or Telerik. I’m not a big fan of such tools because these kinds of tests need lots of maintenance, especially when they rely purely on some previous recording or even screenshots. UI and presentation simply change too often for such tests to stay valuable. We do, however, use an in-house-built visual test framework at work. It simply compares one screenshot to another, so for this to work we need to keep the tests really narrow and isolated, and make the screenshot as focused on the visual as we can. We use the visual tests for really isolated bits, not for entire use cases like logging in to a system and following the whole flow. That would be insane, btw; that’s just business logic you can test with unit or integration tests, and you shouldn’t need a visual test framework for that sort of scenario.
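At its core, screenshot comparison is just a pixel diff with a tolerance. A sketch of the idea with screenshots modeled as plain 2D lists of RGB tuples (our in-house framework obviously decodes real image files and is more sophisticated; the function names and the 1% tolerance are mine):

```python
def pixel_diff_ratio(img_a, img_b):
    """Fraction of differing pixels between two equally sized 2D pixel grids."""
    total = diff = 0
    for row_a, row_b in zip(img_a, img_b):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if px_a != px_b:
                diff += 1
    return diff / total

def visual_test_passes(actual, approved, tolerance=0.01):
    # A small tolerance absorbs harmless drift (anti-aliasing, font hinting)
    # without letting real visual regressions through.
    return pixel_diff_ratio(actual, approved) <= tolerance
```

The tolerance is the whole game: too strict and every driver update breaks the suite, too loose and the test stops catching anything.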
And then there is the part of the system that seems like it simply cannot be tested: visual aspects that are non-deterministic in nature. My simplest example would be, how do you visually test a particle system? One could argue that we could start the particle system with the same random seed and then play it for n frames, where we control the stepping of those frames. And as a matter of fact, this is exactly how you should test such systems: both the randomness and the time must be under full control. Remember our discussion about temporal coupling? Temporal coupling happens when the system is coupled to time itself, say, the Unity update loop. That’s not good, at least from a testing perspective. For such scenarios you could also fall back on manual testing if there’s capacity to do so.
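The seed-plus-stepped-frames idea can be shown in a few lines. Everything here (`ParticleSystem`, `snapshot_after`, the toy update rule) is a made-up illustration of the pattern, not a real particle sim; the point is that both the RNG and the timestep are injected, never pulled from the environment:

```python
import random

class ParticleSystem:
    def __init__(self, seed, count=8):
        self.rng = random.Random(seed)  # injected randomness, never global
        self.particles = [self.rng.random() for _ in range(count)]

    def step(self, dt):
        # Deterministic update: advance each particle by seeded noise scaled by dt.
        self.particles = [p + self.rng.random() * dt for p in self.particles]

def snapshot_after(seed, frames, dt=1 / 60):
    ps = ParticleSystem(seed)
    for _ in range(frames):
        ps.step(dt)  # we control time; no coupling to a real update loop
    return ps.particles

# Determinism test: same seed + same frame count => identical state.
assert snapshot_after(42, 120) == snapshot_after(42, 120)
```

Once the state after n frames is reproducible, you can even feed it into the screenshot comparison above-style tooling, because the rendered result is reproducible too.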
Testing the tests
The next section is about a really interesting topic: testing the test code. Testception right here! David and Andrew mention that if one is really hardcore about testing, one could appoint a saboteur on the team who creates branches in which he or she changes arbitrary code to see if the test suite catches it. If not, a new test case must be added. This is exactly what mutation testing does, but automatically, so definitely check out mutation testing if this sounds interesting to you. Mutation testing frameworks act on static analysis and change, for example, mathematical or binary operators, like flipping a greater-than into a less-than, and then check whether the tests still pass. If they do, the mutant survived, indicating the coverage is not high enough. It would be interesting to see if something like Copilot could generate the missing test case.
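A toy version of the mutation loop fits in a few lines. This is a deliberately crude sketch (hand-rolled string mutations instead of real static analysis, and `is_adult`/`suite_passes` are invented), but it shows both outcomes: a killed mutant and a surviving one that reveals a missing boundary test:

```python
def source():
    # The "production" code under test, as a string so we can mutate it.
    return "def is_adult(age):\n    return age >= 18\n"

def suite_passes(src):
    """Our (deliberately incomplete) test suite: no check at the boundary, age == 18."""
    ns = {}
    exec(src, ns)
    return ns["is_adult"](30) and not ns["is_adult"](5)

# Hand-rolled mutants; real frameworks generate these from the AST.
mutants = {
    ">= flipped to <": source().replace(">=", "<"),
    ">= weakened to >": source().replace(">=", ">"),
}

for name, mutated in mutants.items():
    # A mutant that still passes the suite "survives" -> a test case is missing.
    print(name, "killed" if not suite_passes(mutated) else "SURVIVED")
```

Here the `>` mutant survives because the suite never checks age 18 exactly, which is precisely the test the saboteur would force you to add.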
Testing thoroughly
One topic that will always be brought up when talking about testing is coverage, and this time it’s no different. Interestingly, Andrew and David make the case that code coverage is not the same as test state coverage. They already knew this back in 1999, and even far before that. Still, to this day, some manager types require X% code coverage in the tests. But developers will game the metric; they will hit any number they’re given. They will write tests without assertions and add annotations to exclude code from the coverage tools. That sort of shit.
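The code-coverage-versus-state-coverage distinction is easy to demonstrate. A small invented example (`classify_damage` is mine, not from the book): two test calls execute every line, so a coverage tool reports 100%, yet most of the interesting input space is untouched:

```python
def classify_damage(amount, armored):
    # Two branches: trivially easy to hit 100% line coverage.
    if armored:
        amount = amount // 2
    return "lethal" if amount >= 100 else "survivable"

# These two calls execute every line -> coverage tools report 100%...
assert classify_damage(200, armored=False) == "lethal"
assert classify_damage(200, armored=True) == "lethal"   # 100 after halving

# ...yet whole regions of the *state* space went untested:
# the 99/100 boundary, armored values just under it (199 -> 99),
# negative damage, zero, and so on.
```

Which is exactly why a green 100% badge tells a manager far less than they think it does.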
Tightening the Net
And the last topic of this section is where Andrew and David say they will reveal the single most important concept in testing. It’s rather obvious, but the most important thing is that tests need to capture requirements. If a bug slips through, we cover it with a test so it never, ever happens again; no one should ever find the same bug twice. And although I agree, from the point of view of a customer or QA person this will still happen. A simple example would be that the user is unable to log in. There are a million reasons why a user might not be able to log in; it isn’t always the same reason, so there could be a million different bugs that all result in the user not being able to log in.
But I guess we understand what David and Andrew mean: the same bug should never find its way to the surface more than once. The user not being able to log in is a symptom that a million different bugs could cause, but each individual bug has a single root cause, and our test suite should cover that case from the moment the bug is patched.
#end
01010010 01110101 01100010 01100101 01101110