Risk management in software projects, part 2

This is the second post in my series on risk management in software projects (see the first post, which details the current state of risk management problems). In this post, I propose a new strategy for software project risk management. I believe we can significantly reduce project risks by using a simple process to identify and avoid unknown risks while building new software.

Tackling the complexity monster

Instead of making core software design decisions on a whim, we can make them methodically. It's useful to think of each design decision as carrying a certain amount of risk.

The best way to manage risk is to avoid it. To avoid risk is to avoid making decisions that carry an unknown amount of risk. Unknown risk usually comes from doing things without prior experience.

Our estimation errors in software projects are far larger than what we normally associate with our ability to estimate things. It's not uncommon for an IT project to require 2x or 3x more work than originally planned. This extra work might be the result of many small decisions that each add complexity to the system, and with complexity come unknown and surprising interactions that make the system considerably more difficult to develop and manage. Or the extra work might be the result of a single failed design decision.

More technically, our errors might be modeled with a fat-tailed distribution, in which the frequency of an estimation error falls off with its size according to a power law ¹. Under such a distribution, very large errors are far more likely than a thin-tailed model would suggest, so we have to work around the fact that the worst-case scenario might have enough impact to ruin our project. We therefore need to focus on reducing our exposure to it.
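To make the fat-tail point concrete, here is a minimal sketch that compares the chance of a large overrun under a power-law (Pareto) model and under a thin-tailed exponential model with the same mean. The tail exponent `ALPHA = 2.0` and both function names are assumptions picked for illustration; they are not measured from real project data.

```python
import math

# Assumed parameters for illustration only; not measured data.
ALPHA = 2.0   # hypothetical Pareto tail exponent
X_MIN = 1.0   # 1.0 means "actual work equals the estimate"


def pareto_tail(x: float, alpha: float = ALPHA, x_min: float = X_MIN) -> float:
    """P(overrun factor > x) under a Pareto (power-law) model."""
    return (x_min / x) ** alpha if x > x_min else 1.0


def exponential_tail(x: float, x_min: float = X_MIN) -> float:
    """P(overrun factor > x) under a shifted exponential with mean x_min + 1.

    For the default parameters this matches the Pareto mean of 2.0.
    """
    return math.exp(-(x - x_min)) if x > x_min else 1.0


for factor in (2, 3, 10):
    fat = pareto_tail(factor)
    thin = exponential_tail(factor)
    print(f"{factor:>2}x overrun: power law {fat:.4f}, thin tail {thin:.6f}")
```

With these assumed parameters the two models roughly agree on 2x and 3x overruns, but the power law makes a 10x overrun dozens of times more likely than the thin-tailed model does. That gap is the exposure we are trying to limit.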

So, to avoid unnecessary complexity and unknown risks, we need to keep track of how well we succeed in minimizing our exposure. Keeping track of our decisions and measuring their outcomes is a good way for individuals and organizations to improve their performance. Therefore, a simple chart, where you record the original risk estimate and assess its accuracy afterwards, would be a good start.

Sample project risk tracking form

A sample risk assessment could look like this. Note that it's useful to assign each design decision to a category:

CATEGORY	DESCRIPTION	DECISION	RISK ASSESSMENT	RISKS REALIZED
Architecture design	Select framework for our CMS system.	We decided in a meeting on 18.01.2015 to use Drupal.	We have extensive experience using and customizing Drupal, so we expect few surprises from it. We estimate being able to build the system well within the original estimate of 300h of work. The risk of exceeding the estimated work is deemed low. Moreover, in the worst case, we estimate the extra work to be at most 100h.	(fill this in after the project is finished)
Testing, CI and deployment	Select deployment system.	We decided to try out Docker for both development and deployment work. This lets us make sure that we can run the software without extra effort on our PaaS provider.	We have little experience with Docker so far, so setting up the development environment might take us more time than using our existing templates. We assess the risk of this as fairly low. We also have to set up our CI to run tests, build the container and make the deployment. This might turn out to be quite a lot of work. Because we have no prior experience with containers and the work they bring to our processes, the risk of the worst-case scenario is moderate or significant.	(fill this in after the project is finished)

In the example, the risks are estimated based on nothing more than an educated guess. As I said, precise estimation is often impractical, but even qualitative estimation should work towards our goals.
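If you would rather keep the form in code than in a spreadsheet, one row could be sketched as below. The class name `RiskEntry`, its fields and its methods are hypothetical. The 300h estimate and the 100h worst-case extra come from the sample form, while the 360h outcome is invented to show how the last column gets filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskEntry:
    category: str
    description: str
    decision: str
    estimated_hours: float                # original work estimate
    worst_case_extra_hours: float         # estimated extra work in the worst case
    actual_hours: Optional[float] = None  # filled in after the project is finished

    def overrun_ratio(self) -> Optional[float]:
        """Actual work divided by the estimate, or None if not yet known."""
        if self.actual_hours is None:
            return None
        return self.actual_hours / self.estimated_hours

    def within_worst_case(self) -> Optional[bool]:
        """True if the actual work stayed inside the worst-case estimate."""
        if self.actual_hours is None:
            return None
        return self.actual_hours <= self.estimated_hours + self.worst_case_extra_hours


cms = RiskEntry(
    category="Architecture design",
    description="Select framework for our CMS system.",
    decision="Use Drupal.",
    estimated_hours=300,
    worst_case_extra_hours=100,
)

# Hypothetical outcome, recorded once the project is finished.
cms.actual_hours = 360
print(cms.overrun_ratio(), cms.within_worst_case())
```

Comparing `overrun_ratio()` across many finished decisions is one way to see how accurate the original assessments were.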

So, the conclusion here is that, in order to bring the total risk to a more manageable level, we might need to reconsider decisions with very high worst-case total workloads. This is the best way to handle tail risks. Be bold when the harm done is minimal even in the worst-case scenario, but be very conservative when there's a chance of a big blow-up.
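The rule of thumb above can be sketched as a small filter over qualitative assessments. The `Risk` levels, the `Decision` tuple and the `decisions_to_reconsider` helper are all hypothetical names. The ratings follow the sample form, and the Docker entry's "moderate or significant" is read conservatively as significant.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Risk(IntEnum):
    """Qualitative worst-case risk levels, ordered from mild to severe."""
    LOW = 1
    MODERATE = 2
    SIGNIFICANT = 3


class Decision(NamedTuple):
    name: str
    worst_case: Risk


def decisions_to_reconsider(
    decisions: List[Decision], threshold: Risk = Risk.SIGNIFICANT
) -> List[str]:
    """Return the names of decisions whose worst case is at or above the threshold."""
    return [d.name for d in decisions if d.worst_case >= threshold]


decisions = [
    Decision("Use Drupal as the CMS framework", Risk.LOW),
    Decision("Use Docker for development and deployment", Risk.SIGNIFICANT),
]

flagged = decisions_to_reconsider(decisions)
print(flagged)
```

Only the Docker decision is flagged here. Lowering `threshold` to `Risk.MODERATE` makes the filter more conservative.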

Conclusion

The goal of this exercise is to

a) identify the risks
b) characterize the risks
c) potentially adjust our decisions based on the risk assessment
d) measure the outcome

When these are accounted for in the project, we have a basic risk management process, which is infinitely better than no process at all.

¹ While I don't yet have the data to back this up, I hope to publish a post later with numbers.