If it's not broken, don't fix it

Part of my posts covering Golden Rules of Engineering

Posted by Sami Tikka on January 24, 2013

This is the most important rule of engineering and, in my experience, also the one that is hardest for engineers to comprehend.

Description of the system

Suppose we have a working system, already designed and engineered. The system is also up and running. "Fixing" here means changing the system in some way that is not required for the system to carry out its function successfully.

Usually a system has to run for some time after it has been brought into existence before we can judge its stability and fitness for the task. Fine-tuning the system is an ongoing process, but major design issues usually surface after a short time of operation. After the system has been tuned and modified, operating it is rather straightforward and requires only operations personnel with instructions. This kind of system can be described as mature.

Motivations for fixing the system

Motivations for "fixing" the system can come from several sources:

1) Out of curiosity

The engineer wants to update parts of the system to try out new, improved parts or methods. The engineer might also be responsible for operating the system and want to get a feel for new tools that might be needed later, perhaps with some other system. In this case we cannot justify the update by benefits to the end user, by improved (measurable) performance or by cost savings. Instead, we are under the impression that learning the tool will pay off in the future. The engineer might also feel that updating the part prepares the system for future improvements: the parts are then already up to date and don't need to be replaced all at once.

Now, stop and think about the system for a moment. Our system is mature. It doesn't need fixing, and the customer, end user and operations get no benefit from the change. A mature system has shown its design to be robust and stands to gain very little, if anything, from the update. The update might, however, introduce unexpected problems that put some functions, or the whole system, in danger of no longer working correctly. The problems also might not be readily observed, but show up only after a while. More on that in the next item.

This is a clear case of negative asymmetry. We stand to gain little, but stand to lose a lot.
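The asymmetry can be made concrete with a small expected-value calculation. The following is only a sketch: the function name, the failure probability and the gain and loss figures are made-up assumptions for illustration, not measurements from any real system.

```python
def expected_net_gain(p_failure, gain_if_ok, loss_if_broken):
    """Expected value of a change that breaks the system with probability p_failure."""
    return (1 - p_failure) * gain_if_ok - p_failure * loss_if_broken


# Hypothetical numbers, in arbitrary units of value:
small_gain = 1.0    # what we gain if the update goes well
large_loss = 50.0   # what we lose if the update breaks a mature system

# Even a modest 5 % chance of breakage makes the expected outcome negative.
net = expected_net_gain(p_failure=0.05, gain_if_ok=small_gain, loss_if_broken=large_loss)
print(f"expected net gain: {net:.2f}")
```

With a small gain and a large possible loss, the change has a positive expected value only if the chance of breakage is very low, and for a change made out of curiosity we usually have no way of knowing that it is.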

2) Updating the system by improved design

The workings of the system could be improved by a new design of some of its parts or sub-systems. The new design requires replacing some parts with new ones that fit the design.

The critical thing to consider is this: how much is the system going to improve, compared to the time and effort put into design and engineering? Measuring the improvement by performance metrics is not enough. One must also take into account the stability of the old design versus the new one, i.e. is the new design as reliable as the old one?

There are obvious cases where a design update is needed, such as when the system has clear limitations in its features or is not working at full capacity because of some sub-system. Then the gains are obvious and the new design can be justified. This rule applies to mature systems, where we don't have these kinds of problems, but some of the design is a bit outdated.

Even if all goes well and the new design seems to be working, we still have one problem with making changes to mature systems: we don't really know whether the changes have made the system better (in terms of stability and performance) until some time has passed. Not all of the gains are readily observable while the changes are being made. So we cannot really tell whether the time and energy put into the updated design has paid off until it might already be too late (the resources have been spent) or impossible to change the system back.

Applications of the 1st rule in software engineering

Especially important applications of this rule are library updates. If the new version of a library doesn't materially improve the workings of the system, the update is not necessary, and the gains from it are negligible compared to the potential harm. Refactoring should also be done with considerable care, keeping in mind that we might introduce unwanted behavior into the system.
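One way to apply that care when refactoring is to compare the old and the new code on recorded inputs before switching over, sometimes called a characterization check. The sketch below assumes a made-up pricing function; `legacy_total`, `refactored_total`, `SAMPLES` and `behavior_differences` are hypothetical names used only for illustration. Amounts are integer cents and the tax is an integer percentage.

```python
def legacy_total(prices_cents, tax_percent):
    """Existing, working behavior: tax is rounded down per item."""
    total = 0
    for price in prices_cents:
        total += price + price * tax_percent // 100
    return total


def refactored_total(prices_cents, tax_percent):
    """'Cleaner' rewrite: tax is rounded down once, on the sum."""
    return sum(prices_cents) * (100 + tax_percent) // 100


# Hypothetical sample inputs, as (prices_cents, tax_percent) pairs.
SAMPLES = [([], 24), ([1000], 24), ([199, 501], 24)]


def behavior_differences(old, new, samples):
    """Return (args, old_result, new_result) for every sample where the two disagree."""
    diffs = []
    for args in samples:
        expected, actual = old(*args), new(*args)
        if expected != actual:
            diffs.append((args, expected, actual))
    return diffs


diffs = behavior_differences(legacy_total, refactored_total, SAMPLES)
for args, expected, actual in diffs:
    print(f"{args}: legacy={expected}, refactored={actual}")
```

Here the rewrite looks equivalent, but it rounds once instead of once per item, so the third sample comes out one cent higher. That is exactly the kind of unwanted behavior that is easy to miss by reading the code alone.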