80 – Review: The Pragmatic Programmer, Duplication

#begin

In today’s blog we will start with chapter two, called A Pragmatic Approach. We will dive into code duplication, a topic that is very familiar to us since we have already discussed it at great length in previous blogs about Clean Code and A Philosophy of Software Design. But this book provides a bit more in-depth information about the different causes of duplication.

So let’s get started with the first section of this chapter: The Evils of Duplication. This section warns you not to copy-paste knowledge around, but to keep it in one single place. As programmers, we capture knowledge in specifications, which we call code. However, this knowledge is not stable: it changes rapidly and often. Requirements might change after a single meeting with your client and/or product owner.

There are also regulations. You might find your code deprecated or obsolete when the government decides to change some law(s). Sucks, eh!? But all this means that we must spend a large part of our time just maintaining code to make sure everything fits today’s standards and rules.

Andrew and David now point out something pretty interesting, and I quote: “Most people assume that maintenance begins when an application is released, that maintenance means fixing bugs and enhancing features. We think these people are wrong.” And I fully agree. I think this is a very old mindset that comes straight from the Waterfall era. Teams used to work in a very phased manner, where the very last stage was the maintenance stage. But as we found out over the past blogs, complexity creep is real. Prof. Ousterhout defined three symptoms of complexity for us: change amplification, cognitive load and unknown unknowns. If you want to know more about these issues, check out the series about A Philosophy of Software Design; the full book is covered in a previous blog series.

Uncle Bob also talks about complexity creep. He says that at some point the code becomes so complex that no one dares to change any of it anymore. At that point the code starts to rot. Uncle Bob has a nice metaphor for this phenomenon as well: he compares complexity to a tractor pull. You know, those competitions where people drive a very strong tractor that has to pull a weighted cart or trailer across a field. The further the tractor gets, the more it slows down from the weight the trailer puts on it. Whoever pulls the trailer the furthest wins the competition.

I think this is indeed very comparable to how software is sometimes written. Sometimes an immense amount of technical debt piles up, and as a consequence teams can no longer make progress and HAVE to start over or re-engineer things completely. Such a waste of time and money.

So I agree with the authors of The Pragmatic Programmer: maintenance of your application starts as soon as one or more requirements change. You might find that you need to do maintenance when a customer wants some requirement slightly different, or when the environment changes. The authors say the following, and I quote: “maintenance is not a discrete activity, but a routine part of the entire development process”. Such a nice statement, and I agree. If you have worked in the industry for even a short while, you have probably encountered a situation where the requirements kept changing right beneath your feet. It sucks, but it can happen, especially in a prototyping or exploration phase of a project. This is perfectly fine; however, the “business” or upper management must understand that this is stressful, and they should not expect a perfect product once the prototype is done. A shit ton of work will probably have to be done to transform that prototype into a production-ready product. We’ve talked about prototyping before: a prototype is often a one-time, throwaway product. You try things out and cut major corners just to explore some features. But more on this later; let’s continue with maintenance.

So when we do maintenance, we need to change code. If you have the same code copy-pasted all over the place, you will find yourself doing the same maintenance over and over again, which will burn you out. It’s easy to duplicate knowledge in specifications, processes and programs, but doing so lands you in a maintenance nightmare.

And now the authors introduce a very well-known acronym in our industry: the DRY principle. DRY stands for Don’t Repeat Yourself. I bet you have heard it before, and yes, it comes from The Pragmatic Programmer. The idea behind DRY is the following: “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”. That’s easier said than done. But the alternative is to have the same thing expressed in two or more places, and if you change one, you have to change all the others too. That’s maintenance hell. It’s not a question of whether you will remember to change all the duplicates, but of when you will forget one. Because at some point you inevitably will.
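To make DRY a bit more concrete, here is a minimal sketch. I’m using Python for brevity rather than C#, and the names (`MAX_HEALTH`, `heal`, `health_fraction`) are hypothetical. The maximum health is a piece of knowledge that lives in exactly one place; if both functions hard-coded `100` instead, changing the maximum would mean hunting down every copy and hoping you didn’t miss one.

```python
# Hypothetical game setting, defined exactly once: the single,
# authoritative representation of "how much health a player can have".
MAX_HEALTH = 100


def heal(current_health: int, amount: int) -> int:
    """Increase health, never exceeding the maximum."""
    return min(current_health + amount, MAX_HEALTH)


def health_fraction(current_health: int) -> float:
    """Fraction of maximum health left, e.g. to fill a health bar."""
    return current_health / MAX_HEALTH


print(heal(90, 20))         # 100
print(health_fraction(25))  # 0.25
```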

But how does duplication arise? Well, the book lists four reasons, so let’s take a look. The first one is called imposed duplication. The authors say that developers might feel they have no choice because the environment or framework seems to require duplication. I think this can indeed happen sometimes. I can’t really come up with an example of this in a Unity3D context, but I’m sure there are some. If you know one, let me know in the comments. I can think of cases where you endlessly implement the input or dragging interfaces on different objects and then expose events in order to notify some listener. But this is not necessarily duplication; it is a consequence of using interfaces, which I think is very clean. I bet you can think of something far more sinister.

The second reason why duplication might occur is called inadvertent duplication: developers don’t even notice that they are duplicating information. Haha, yes, this happens pretty often, I think. How often have you implemented some loop or LINQ query to search for something? Or needed to check floats for equality in Unity3D? The solution in these cases is often to write an extension function for the specific type: for the loop or search algorithm you write an extension function for IEnumerable<T>, and for the float comparison an extension function for float. This way you can battle duplication in a very clean manner.
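Here is a rough sketch of that idea. Python has no C# extension methods, so I’m assuming plain module-level helpers as a stand-in; in Unity you would write static extension methods on `float` and `IEnumerable<T>` instead (and Unity already ships `Mathf.Approximately` for the float case). The helper names are hypothetical. The point is that the tolerance and the search loop are written once and reused everywhere, instead of being retyped at every call site.

```python
import math
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def approximately_equal(a: float, b: float, tolerance: float = 1e-6) -> bool:
    """Compare floats with a tolerance instead of ==, written once and reused."""
    return math.isclose(a, b, rel_tol=tolerance, abs_tol=tolerance)


def find_first(items: Iterable[T], predicate: Callable[[T], bool]) -> Optional[T]:
    """Return the first item that matches the predicate, or None if none does."""
    for item in items:
        if predicate(item):
            return item
    return None


print(0.1 + 0.2 == 0.3)                         # False
print(approximately_equal(0.1 + 0.2, 0.3))      # True
print(find_first([3, 8, 12], lambda n: n > 5))  # 8
```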

The third reason for duplication to arise is impatient duplication, haha. Here, developers get lazy and duplicate because it seems easier. Yess… we have all succumbed to this. I certainly have… You might want to quickly wrap up some feature or fix a bug, and the path of least resistance is to duplicate some code and paste it wherever you need it. This way you can fix the bug, but you have committed the crime of code duplication. Often it will take some effort to find a clean way to reuse other code and avoid the duplication; you might need to take on some unwanted dependency, for example. And as we have talked about before, sometimes code looks the same and does the same thing, but belongs to different domains or bounded contexts. In that case the pieces are not duplicates, because they can evolve in different directions. So also be careful when taking on a dependency just to fight duplication.

The fourth and final reason why duplication might occur is inter-developer duplication. This happens when multiple people or multiple teams duplicate information. Yes, I’ve seen this too. Teams might need to implement similar features or abstractions and thus run into similar problems. They both solve the same problem, and the code ends up duplicated across teams. So always communicate properly with other teams to make sure you reach some consensus about the code. This is not as easy as it sounds. But if you are both working in the same repo on the same game, you definitely want to share abstractions, or even just a simple set of extension functions. If not, you will run into duplication at some point.

So those were the four reasons why duplication can arise in a project. The next couple of sections dive a bit deeper into all four of these subjects so let’s take a look 🙂

We’re starting with imposed duplication. Sometimes duplication is imposed by a tool, framework or standard, for example. And now they give a very nice example, and I quote: “At a coding level, we often need to have the same information represented in different forms”. This is so true, and I can give you a very concrete example that I have promoted in previous blogs: Data Transfer Objects (DTOs). DTOs are often just copies of entities, created purely for communication purposes. We use them so they can evolve separately from our business objects. So you might have a PlayerDTO, a Player business object and a Player MonoBehaviour. They all contain similar data, but they serve different bounded contexts and are therefore not duplicates in an architectural sense, even though they are duplicates in terms of the data they contain.
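A minimal sketch of that split, again in Python rather than C#. The fields and the `to_dto`/`from_dto` mapping functions are assumptions for illustration, and the MonoBehaviour is left out since it only exists inside Unity. The data looks duplicated, but only `Player` owns game rules, so each class can change for its own reasons while the mapping functions are the one place that knows how they relate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerDTO:
    """How a player travels over the wire: plain, immutable data."""
    id: str
    name: str
    health: int


@dataclass
class Player:
    """Business object: similar data, but it owns the game rules."""
    id: str
    name: str
    health: int

    def take_damage(self, amount: int) -> None:
        self.health = max(self.health - amount, 0)


def to_dto(player: Player) -> PlayerDTO:
    """Map the business object to its communication form."""
    return PlayerDTO(id=player.id, name=player.name, health=player.health)


def from_dto(dto: PlayerDTO) -> Player:
    """Map incoming data back to a business object."""
    return Player(id=dto.id, name=dto.name, health=dto.health)


player = Player(id="p1", name="Alice", health=10)
player.take_damage(15)
print(to_dto(player))  # PlayerDTO(id='p1', name='Alice', health=0)
```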

Next is a very interesting point, which will probably spark some discussion, or a rant on my end. The authors say that duplication can arise from documentation in code. They say, and I quote: “Programmers are taught to comment their code: good code has lots of comments. Unfortunately, they are never taught why code needs comments: bad code requires lots of comments”. Haha, I guess this flies in the face of Prof. Ousterhout, and I’m sure Uncle Bob agrees with Andrew and David. “The DRY principle tells us to keep the low-level knowledge in the code, where it belongs, and reserve the comments for other, high-level explanations.” Consider that whenever you update the code, you also need to update the comments. And as Prof. Ousterhout also admits in his book, A Philosophy of Software Design, bad comments often repeat implementation details. In that case we are duplicating knowledge and not adhering to the DRY principle.

So I think this is rather interesting. Uncle Bob will tell you to delete most of the comments in your code because they often lie and mislead. Code should be self-documenting. And it is pretty clear that Andrew and David are on the same page. I think Uncle Bob has such a strong opinion about comments because he also read The Pragmatic Programmer back in 1999. I’m even pretty sure of it, because if you ask Uncle Bob which books he recommends, this book always comes up in his list.

But Prof. Ousterhout has a totally different opinion. He loves comments and says they are fundamental to software design and abstraction; according to him, without comments there cannot be abstraction. Prof. Ousterhout describes different forms of comments. Interface comments describe high-level information. Other comments add low-level precision, like his example of a substring function where you need to state whether an index is inclusive or exclusive. A third form is implementation comments, which are inline or block comments that describe certain lines of code. I’ve talked about this at great length, and I’m still of the opinion that implementation comments are clutter and should be deleted.

But I think it is an interesting take on duplication. If you write a comment that simply repeats the code or describes what the code is doing, that is considered duplication as well. I like this, not because I think most comments can be removed, but because it forces you to think harder about the comment. This is also what Prof. Ousterhout teaches us in his book. He says that a comment must always be at a different level of abstraction than the code it describes: either higher or lower, but never at the same level. Also, comments should never focus on how or what the code is doing, but on why it is doing it. And even if you write comments this way, you will probably still be breaking the DRY principle according to The Pragmatic Programmer.
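To illustrate the difference, here are two hypothetical versions of the same function, sketched in Python. The first comment just restates the code, so it duplicates knowledge and has to be kept in sync with every change to the formula. The second one explains intent that the code cannot express on its own; note that the design rationale in it is made up for this example.

```python
# Version 1: the comment restates the code line by line. It is a duplicate
# that silently goes stale as soon as someone tweaks the numbers.
def spawn_interval_restating(wave: int) -> float:
    # Subtract 0.1 times the wave from 2.0 and return at least 0.5.
    return max(2.0 - 0.1 * wave, 0.5)


# Version 2: the comment sits at a different level of abstraction and
# explains why the code does this, which the code itself cannot say.
def spawn_interval(wave: int) -> float:
    # Enemies spawn faster each wave so difficulty ramps up. The 0.5 s floor
    # (a hypothetical design choice) keeps late waves from flooding the screen.
    return max(2.0 - 0.1 * wave, 0.5)
```

Both functions behave identically; only the second comment carries knowledge that is not already in the code.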

What do you think about this form of duplication? Would you consider comments duplicates of the code? Please let me know, I’m really curious. I would argue that they actually are duplicates.

The last cause of imposed duplication the authors describe is language issues. They say that some programming languages impose duplication by design. The prime example they give is that in C and C++ you have to write both header files and implementation files. The header files define the interface that the implementation needs to adhere to. Now that I think about it, this does feel a bit like duplication. On the other hand, I always liked having these header files. They give you a quick view of what a class does, and if you need more information, you open the implementation and start reading. But in some sense I guess Andrew and David are right.

So these were the causes for imposed duplication. Let’s continue with another form of duplication called inadvertent duplication.

Inadvertent duplication can be difficult to spot and to solve, because developers sometimes do not realize they are duplicating anything at all. The example in the book goes as follows: imagine you created software for a delivery system. Some driver calls in sick and now you need to update the driver. But do you change the driver on the Truck class, on the DeliveryRoute class, or both? I guess it will be both. So there’s duplication here. Hopefully this duplication is a mere reference to some driver ID or something similar, but it is still duplication. In many cases like this there is some aggregate object that combines a Driver, Truck and DeliveryRoute and keeps references to all of them. This way you can circumvent the duplication issue. I think this is the cleanest solution.
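Here is a rough sketch of that aggregate approach in Python. The class and field names are assumptions, not the book’s code. `Truck` and `DeliveryRoute` no longer store a driver at all; a hypothetical `Delivery` aggregate is the single place that links them, so swapping a sick driver is one assignment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Driver:
    id: int
    name: str


@dataclass
class Truck:
    id: int
    license_plate: str  # no driver field: a truck does not know who drives it


@dataclass
class DeliveryRoute:
    id: int
    stops: List[str]  # no driver field here either


@dataclass
class Delivery:
    """Hypothetical aggregate: the single place that links driver, truck and route."""
    driver_id: int
    truck_id: int
    route_id: int


def replace_driver(delivery: Delivery, new_driver: Driver) -> None:
    # One assignment; nothing else holds a driver, so nothing can go stale.
    delivery.driver_id = new_driver.id


sam = Driver(id=2, name="Sam")
delivery = Delivery(driver_id=1, truck_id=10, route_id=100)
replace_driver(delivery, sam)
print(delivery)  # Delivery(driver_id=2, truck_id=10, route_id=100)
```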

Another cause of inadvertent duplication can be design issues, like having a Line class with three properties: start, end and length. Length should be a get-only property or a function that calculates the length on the spot, because you can always calculate it from start and end. It’s little things like this that can make your code much more usable and simpler, and simplicity should be high on your agenda anyway.
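A small sketch of that `Line`, using Python dataclasses and 2D points as a stand-in for Unity’s `Vector3` (an assumption for brevity). Because `length` is computed on demand, it can never disagree with `start` and `end`; a stored length field would be a second copy of knowledge the endpoints already hold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Line:
    start: Point
    end: Point

    @property
    def length(self) -> float:
        # Derived from start and end on every access, never stored separately.
        return math.dist((self.start.x, self.start.y), (self.end.x, self.end.y))


line = Line(Point(0.0, 0.0), Point(3.0, 4.0))
print(line.length)  # 5.0
```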

Next up, we take a deeper dive into impatient duplication. This is the kind of duplication you might introduce when you are on a really tight schedule. Instead of doing things nicely and creating the correct abstractions, you throw all of your discipline out of the window and start copy-pasting things around to get them to work. This is not a good thing. You should stick to your discipline as long and as much as you can. We’ve talked about this in previous blogs as well. Sometimes, just to make a deadline or to please a client or manager, you might hack something in, but after releasing you go back immediately and fix it properly to limit the damage. No, this is not efficient, but business goals are important too and sometimes we need to compromise. But you should always go back and fix it so you don’t build up technical debt.

The authors also give a nice quote here: “short cuts make long delays”. Haha, how true is that in software. If you have taken some shortcuts in your development adventures, you know how much delay a shortcut can cause later on. I’ll give you a simple example in a Unity3D sense, which is not even a programming example, but still. Imagine you are designing levels for your game. You have made lots of props, trees, foliage, buildings, characters and much more. But you have not set up proper nested prefab structures for them. As a result, you need to manually change lots and lots of assets scattered across many scenes. I have made this mistake for sure. If you set up your prefabs with a nested structure from the beginning, you can swap things out really easily.

As the authors also very thoughtfully say in the book: “Impatient duplication is an easy form to detect and handle, but it takes discipline and willingness to spend time up front to save pain later.” This is so true and I hope it makes sense to you as well.

The last form of duplication we will discuss is inter-developer duplication. I think this is the kind of duplication we all have run into, or will run into eventually. It can arise when multiple teams work on a shared project and logic or data is copied around. I can give you a very simple example that I encounter very often: when you develop a Unity3D project that needs to talk to a backend service, you will have the same DTOs on both the Unity3D side and the backend side. This is unavoidable, since both sides must communicate using the same data structures. Now, you could create a shared DLL that contains these DTOs, and as a matter of fact, I tried this approach, but it works less nicely than you might expect. These DLLs can go out of date and you need to update them. You would otherwise have to update the DTOs individually as well, but I found that keeping the DLL up to date took more effort, since it wasn’t automated. Maybe I could take another approach: put the DTOs into a separate project, use that in the backend, and pull them into Unity with the package manager. Updating would then become part of my default way of working, because I regularly check the package manager for updates anyway.

But to avoid inter-developer duplication you really need proper communication and quick communication channels. Nowadays, I think communication mostly happens quickly over Slack, Discord or something similar. In the book they talk about newsgroups, which dates the book a little, haha. But design decisions and the architecture of the game should be described somewhere too. You won’t do that in Slack, of course; you need a proper way to document all of that stuff. So use something like a wiki or, as we do at work, Atlassian Confluence. Not that I like Confluence in particular, but we use that tool, and you need something or somewhere to store this kind of information. I’m also not really aware of other or better tooling for storing this kind of documentation, so if you know something better, please let me know.

But that’s enough about duplication. I think it’s pretty clear that duplication is not a great thing. What I like most is that comments at the same level of abstraction as the code are considered duplicates as well. I never really thought about them this way, but it makes total sense. I really like that, because it’s yet another argument that comments should be useful if you decide to write them.

#end

01010010 01110101 01100010 01100101 01101110
