Archive for the 'Risk Analysis' Category

Random Numbers Out of Triangular Distribution

As the description on Brighton-Webs beautifully explains:

The Triangular Distribution is typically used as a subjective description of a population for which there is only limited sample data.  It is based on a knowledge of the minimum and maximum and an inspired guess as to what the modal value might be.  Despite being a simplistic description of a population, it is a very useful distribution for modeling processes where the relationship between variables is known, but data is scarce (possibly because of the high cost of collection).

Data about task durations is scarce and usually difficult to collect. Tasks are rarely the same, as projects are by definition never the same. So why complicate things with difficult distributions like Beta when Triangular is just as good? Most of the error will come from the distribution parameters, not from its shape.

Probability density function of Triangular Distribution (source: Wikipedia)

The triangular distribution is completely described by its minimum (a), its maximum (b) and its most likely value (c), the mode. Get all the formulas on Brighton-Webs or Wikipedia.
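To make the three parameters concrete, here is a minimal Python sketch of drawing a random number from a triangular distribution by inverting its cumulative distribution function. The function name sample_triangular and the optional argument u are made up for this illustration; Python's standard library offers the same thing as random.triangular(low, high, mode).

```python
import math
import random


def sample_triangular(a, b, c, u=None):
    """Draw one value from a triangular distribution on [a, b] with mode c.

    u is an optional uniform random number in [0, 1); passing it in
    explicitly is only a convenience for checking the formula by hand.
    """
    if not (a <= c <= b and a < b):
        raise ValueError("need a <= c <= b and a < b")
    if u is None:
        u = random.random()
    f_c = (c - a) / (b - a)  # probability mass to the left of the mode
    if u < f_c:
        # Invert the rising branch: F(x) = (x - a)^2 / ((b - a)(c - a))
        return a + math.sqrt(u * (b - a) * (c - a))
    # Invert the falling branch: F(x) = 1 - (b - x)^2 / ((b - a)(b - c))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))
```

For a task estimated at minimum 3, most likely 5 and maximum 10 days, sample_triangular(3, 10, 5) returns one possible duration; calling it many times gives values that average out near (a + b + c) / 3.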

How do we get these three points? Is the minimum equal to the optimistic or best case? Is the maximum really pessimistic enough? Is the worst case really the worst? Are we really so good that we can provide best-case and worst-case estimates for ALL tasks? I haven’t seen this yet.

In his book Software Estimation: Demystifying the Black Art, Steve McConnell asks the reader: How Good an Estimator Are You? He presents the reader with ten quiz-like questions:

  • Surface temperature of the Sun,
  • Latitude of Shanghai,
  • Area of the Asian continent,
  • The year of Alexander the Great’s birth,
  • Total value of U.S. currency in circulation in 2004,
  • Total volume of the Great Lakes,
  • Worldwide box office receipts for the movie Titanic,
  • Total length of the coastline of the Pacific Ocean,
  • Number of book titles published in the U.S. since 1776 and
  • Heaviest blue whale ever recorded.

Then he asks the reader to provide lower and upper bounds for each question, so that there is a 90% chance of including the correct value. Looking at this from the other direction, this means that out of these 10 questions, on average only one answer should fall outside the estimated range.

According to Steve McConnell, most quiz takers manage to capture the correct value in only 1 to 3 of their ranges (average 2.8), which tells us that there’s no way we can provide 100% accurate bounds for our estimates. This is really worrying.

The good thing is that we can measure this. Just check your last project where you used 3-point estimates and count how many of the actual values fell within the bounds. If you didn’t use 3-point estimates, provide them afterwards, preferably for a project you didn’t manage, and without looking at the original estimates.

So in the real world, we cannot provide useful bounds with 100% accuracy, but if we try hard, we may be able to provide the 10th and 90th percentiles, which gives us a more down-to-earth 80% accuracy. McConnell claims that even if you’re good and try hard enough, you can only go up to 70%, which then leaves you with the 15th and 85th percentiles. As said in the previous paragraph, go and measure your accuracy and use the result in estimating the distribution parameters as described below.
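Measuring your own accuracy can be as simple as the following Python sketch. The function names and the sample numbers are invented for illustration, and splitting the misses evenly between the low and the high side is an assumption, not something McConnell prescribes.

```python
def hit_rate(bounds, actuals):
    """Fraction of actual values that fell inside their (low, high) bounds."""
    if not bounds or len(bounds) != len(actuals):
        raise ValueError("need one (low, high) pair per actual value")
    hits = sum(1 for (low, high), actual in zip(bounds, actuals)
               if low <= actual <= high)
    return hits / len(actuals)


def implied_percentiles(rate):
    """Percentiles the bounds stand for, assuming misses split evenly
    between the low side and the high side (an assumption)."""
    tail = (1.0 - rate) / 2.0
    return tail, 1.0 - tail


# Hypothetical task estimates (low, high) and actual durations, in days.
bounds = [(1, 3), (2, 5), (4, 6), (1, 2)]
actuals = [2, 6, 5, 1.5]
rate = hit_rate(bounds, actuals)          # 3 of 4 actuals are inside
lower, upper = implied_percentiles(rate)  # which percentiles that implies
```

With a measured hit rate of 70%, implied_percentiles gives roughly 0.15 and 0.85, which matches the 15th and 85th percentiles mentioned above.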

Glen Alleman says that best-case and worst-case estimates are biased by a personal risk-avoidance factor, and he also proposes using the 10th and 90th percentiles.

Getting a triangular distribution out of the mode and the lower and upper percentiles requires some number crunching. Luckily, Samuel Kotz and Johan René van Dorp have done this for us in Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications (2004).

You can find an implementation of the algorithm as part of my Monte Carlo Simulation for MS Project package on SourceForge, or browse the source of TriDist.bas directly.
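For readers who want to see the number crunching, here is a rough Python sketch. It is not a copy of TriDist.bas and not a transcription of the book; it is my own reading of the idea. It searches by bisection for q, the probability mass to the left of the mode, and derives the minimum a and maximum b from it. The function name fit_triangular and its parameters are invented for this example, and it assumes the lower percentile lies strictly below the mode and the upper percentile strictly above it.

```python
import math


def fit_triangular(x_lo, mode, x_hi, p=0.10, r=0.90, iterations=100):
    """Return (a, b) of the triangular distribution whose p-quantile is x_lo,
    whose r-quantile is x_hi and whose mode is the given mode."""
    if not (x_lo < mode < x_hi):
        raise ValueError("need x_lo < mode < x_hi")
    if not (0.0 < p < r < 1.0):
        raise ValueError("need 0 < p < r < 1")

    def bounds(q):
        # From F(x_lo) = p and 1 - F(x_hi) = 1 - r, with F(mode) = q.
        k_lo = math.sqrt(p / q)
        k_hi = math.sqrt((1.0 - r) / (1.0 - q))
        a = (x_lo - mode * k_lo) / (1.0 - k_lo)
        b = (x_hi - mode * k_hi) / (1.0 - k_hi)
        return a, b

    def mismatch(q):
        # A consistent q must equal (mode - a) / (b - a).
        a, b = bounds(q)
        return (mode - a) / (b - a) - q

    # mismatch is positive just above p and negative just below r,
    # and it decreases in between, so bisection finds the single root.
    lo, hi = p + 1e-9, r - 1e-9
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return bounds(0.5 * (lo + hi))
```

A quick sanity check is to start from a known triangle, compute its 10th and 90th percentiles by hand, and confirm that fit_triangular returns the original minimum and maximum.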



Monte Carlo Simulation with Microsoft Project

Project = uncertainty. Managing a project means dealing with uncertainties. Microsoft Project doesn’t really help you there. It applies the Critical Path Method to determine which tasks will delay the project if they are delayed, and calculates how much slack time the other tasks have.

If you can elaborate a bit on task durations and provide optimistic, pessimistic and expected durations, you can use Microsoft Project’s PERT Analysis tool to enter this data and calculate estimates for tasks.

But still, life is more complex than that. If you know the optimistic, pessimistic and expected (most likely) durations, and assume that actual outcomes are triangularly distributed, you can run a Monte Carlo simulation and see how the overall project duration is distributed and what your chances are of meeting a certain deadline.
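To show the idea outside of Microsoft Project, here is a small Python sketch. The task names, estimates and dependencies are made up, and the scheduling is deliberately naive: every task starts as soon as all of its predecessors finish, with no calendars or resources. It illustrates the principle only and is not how the VBA package works internally.

```python
import random

# Hypothetical estimates in days: (optimistic, most likely, pessimistic).
TASKS = {
    "design": (3, 5, 10),
    "build": (8, 10, 20),
    "docs": (2, 3, 6),
    "test": (4, 6, 12),
}
# Hypothetical dependencies: a task starts when all predecessors finish.
PREDECESSORS = {
    "design": [],
    "build": ["design"],
    "docs": ["design"],
    "test": ["build", "docs"],
}
ORDER = ["design", "build", "docs", "test"]  # predecessors come first


def simulate_once(rng):
    """One run: sample every task duration and return the project finish."""
    finish = {}
    for name in ORDER:
        low, mode, high = TASKS[name]
        duration = rng.triangular(low, high, mode)  # note: (low, high, mode)
        start = max((finish[p] for p in PREDECESSORS[name]), default=0.0)
        finish[name] = start + duration
    return max(finish.values())


def simulate(runs=10000, seed=42):
    """Return the sorted project finish times of all runs."""
    rng = random.Random(seed)
    return sorted(simulate_once(rng) for _ in range(runs))


def chance_of_meeting(finishes, deadline):
    """Share of simulated runs that finished on or before the deadline."""
    return sum(1 for f in finishes if f <= deadline) / len(finishes)


finishes = simulate()
on_time = chance_of_meeting(finishes, 25)  # chance of finishing within 25 days
```

The sorted list of finish times is the distribution of the overall project duration; reading off, say, the value 80% of the way through the list gives a duration you would meet in about 80% of the simulated runs.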

I found a neat VBA script to run the simulation right out of Microsoft Project. It uses the same fields as Microsoft Project’s PERT Analysis tool, that is Duration1, Duration2 and Duration3, assigns a random, triangularly distributed duration to each task and, at the end of each run, exports the durations to a Microsoft Excel worksheet. As a result, you get a list of possible outcomes for each task and summary task, as well as for the whole project. Put it on a chart and you’ll know how things might turn out.

I rewrote the simulation from scratch to make it faster by storing all data in VB instead of exporting it to Microsoft Excel, and programmed analytical charts such as duration, work and cost sensitivity, criticality, project finish, and duration, work and cost distribution right into the simulation.

You can find more on the project page on SourceForge.



