Random Numbers Out of Triangular Distribution

As the description on Brighton-Webs beautifully explains:

The Triangular Distribution is typically used as a subjective description of a population for which there is only limited sample data.  It is based on a knowledge of the minimum and maximum and an inspired guess as to what the modal value might be.  Despite being a simplistic description of a population, it is a very useful distribution for modeling processes where the relationship between variables is known, but data is scarce (possibly because of the high cost of collection).

Data about task durations is scarce and usually difficult to collect. Tasks are rarely the same from one project to the next, because projects are by definition never the same. So why complicate things with difficult distributions like Beta, when Triangular is just as good? Most of the error will come from the distribution parameters and not from its shape.

Probability density function of Triangular Distribution (source: Wikipedia)

The triangular distribution is completely described by its minimum (a), maximum (b) and most likely value (c), which is the mode. Get all the formulas on Brighton-Webs or Wikipedia.
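To make the three parameters concrete, here is a minimal Python sketch of drawing random numbers from a triangular distribution by inverting its cumulative distribution function. The function names are my own and purely illustrative; this is not the code from the package mentioned later in this post. Python's standard library also offers random.triangular(low, high, mode), which does the same job.

```python
import math
import random


def triangular_inverse_cdf(u, a, b, c):
    """Map a uniform value u in [0, 1] to a value from triangular(a, b, c).

    a = minimum, b = maximum, c = mode (most likely value).
    """
    if not (a <= c <= b and a < b):
        raise ValueError("need a <= c <= b and a < b")
    f_mode = (c - a) / (b - a)  # value of the CDF at the mode
    if u < f_mode:
        # Left of the mode: invert F(x) = (x - a)^2 / ((b - a)(c - a))
        return a + math.sqrt(u * (b - a) * (c - a))
    # Right of the mode: invert F(x) = 1 - (b - x)^2 / ((b - a)(b - c))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))


def sample_triangular(a, b, c, rng=random):
    """Draw one random number from triangular(a, b, c)."""
    return triangular_inverse_cdf(rng.random(), a, b, c)


if __name__ == "__main__":
    # Hypothetical task: at least 2 days, at most 10, most likely 4.
    print([round(sample_triangular(2.0, 10.0, 4.0), 2) for _ in range(5)])
```

Splitting the inverse CDF from the sampler keeps the formula testable with fixed inputs, independent of the random number generator.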

How do we get these three points? Is the minimum equal to the optimistic or best case? Is the maximum really pessimistic enough? Is the worst case really the worst? Are we really so good that we can provide best-case and worst-case estimates for ALL tasks? I haven’t seen this yet.

Steve McConnell asks the reader of his book Software Estimation: Demystifying the Black Art: How Good an Estimator Are You? He presents the reader with ten quiz-like questions:

  • Surface temperature of the Sun,
  • Latitude of Shanghai,
  • Area of the Asian continent,
  • The year of Alexander the Great’s birth,
  • Total value of U.S. currency in circulation in 2004,
  • Total volume of the Great Lakes,
  • Worldwide box office receipts for the movie Titanic,
  • Total length of the coastline of the Pacific Ocean,
  • Number of book titles published in the U.S. since 1776 and
  • Heaviest blue whale ever recorded.

Then he asks the reader to provide lower and upper bounds for each question, so that there is a 90% chance of including the correct value. Looked at from the other direction, this means that out of these 10 questions, on average only one answer should fall outside the estimated range.

According to Steve McConnell, most quiz takers get only 1 to 3 answers inside their ranges (the average is 2.8), which tells us that there’s no way we can provide 100% accurate bounds for our estimates. This is really worrying.

The good thing is that we can measure this. Just check your last project where you used 3-point estimates and count how many of the actual values fell within the bounds. If you didn’t use 3-point estimates, provide them afterwards, preferably for a project you didn’t manage, and without looking at the original estimates.

So in the real world, we cannot provide useful bounds with 100% accuracy, but if we try hard, we may be able to provide the 10th and 90th percentiles, which gives us a more down-to-earth 80% accuracy. McConnell claims that even if you’re good and try hard enough, you can only go up to 70%, which then leaves you with the 15th and 85th percentiles. As said in the previous paragraph, go and measure your accuracy and use the result when estimating the distribution parameters as described below.
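As a rough illustration of that measurement, the Python sketch below counts how many actual values landed inside their estimated ranges and converts the resulting hit rate into a symmetric pair of percentile levels (80% gives the 10th and 90th, 70% gives the 15th and 85th). The function names and the sample numbers are made up for the example, and treating the misses as split evenly between both tails is an assumption.

```python
def hit_rate(ranges, actuals):
    """Fraction of actual values that fall inside their (low, high) range."""
    if len(ranges) != len(actuals) or not actuals:
        raise ValueError("need equally long, non-empty lists")
    hits = sum(1 for (low, high), actual in zip(ranges, actuals)
               if low <= actual <= high)
    return hits / len(actuals)


def symmetric_percentiles(rate):
    """Percentile levels of a central range covering `rate` of the outcomes.

    Assumes the misses are split evenly between the lower and upper tail.
    """
    tail = (1.0 - rate) / 2.0
    return tail, 1.0 - tail


if __name__ == "__main__":
    # Hypothetical data: estimated (low, high) durations and actual durations.
    ranges = [(2, 5), (1, 3), (4, 8), (3, 6), (5, 10)]
    actuals = [4, 3.5, 6, 7, 9]
    rate = hit_rate(ranges, actuals)
    print(rate, symmetric_percentiles(rate))
```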

Glen Alleman says that best-case and worst-case estimates are biased by a personal risk-avoidance factor, and he also proposes using the 10th and 90th percentiles.

Getting a triangular distribution out of the mode and the lower and upper percentiles requires some number crunching. Luckily, Samuel Kotz and Johan René van Dorp have done this for us in Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications (2004).

You can find an implementation of the algorithm as part of my Monte Carlo Simulation for MS Project package on SourceForge, or browse the source of TriDist.bas directly.
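For readers who want to see the idea without opening the VBA source, here is a minimal Python sketch of the same kind of calculation. It is not a transcription of TriDist.bas or of the book; it is derived directly from the triangular CDF. The unknown is q, the probability mass to the left of the mode; for any trial q the two percentile equations give the bounds a and b in closed form, and bisection finds the q for which (mode - a) / (b - a) equals q. The function name fit_triangular and its parameters are assumptions for this example, and it assumes the mode lies strictly between the two percentile estimates.

```python
import math


def fit_triangular(x_low, mode, x_high, p=0.10, r=0.90):
    """Return (a, b) of the triangular distribution with the given mode whose
    p-th percentile is x_low and whose r-th percentile is x_high."""
    if not (x_low < mode < x_high and 0.0 < p < r < 1.0):
        raise ValueError("need x_low < mode < x_high and 0 < p < r < 1")

    def bounds(q):
        # From F(x_low) = p:  (x_low - a) / (mode - a) = sqrt(p / q)
        s = math.sqrt(p / q)
        # From F(x_high) = r: (b - x_high) / (b - mode) = sqrt((1 - r) / (1 - q))
        t = math.sqrt((1.0 - r) / (1.0 - q))
        a = (x_low - s * mode) / (1.0 - s)
        b = (x_high - t * mode) / (1.0 - t)
        return a, b

    lo, hi = p, r  # q must lie strictly between p and r
    a = b = None
    for _ in range(200):
        q = 0.5 * (lo + hi)
        if q <= lo or q >= hi:
            break  # interval has shrunk to floating-point resolution
        a, b = bounds(q)
        # The implied mass left of the mode decreases as q grows,
        # so the consistent q is larger whenever the implied mass exceeds q.
        if (mode - a) / (b - a) > q:
            lo = q
        else:
            hi = q
    return a, b


if __name__ == "__main__":
    # Hypothetical estimate: 10th percentile 3 days, most likely 5, 90th percentile 12.
    print(fit_triangular(3.0, 5.0, 12.0))
```

The returned a and b are wider than the percentile estimates, which is the point: the percentiles leave room for outcomes that fall outside the estimated range.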





2 Responses to “Random Numbers Out of Triangular Distribution”

  1. R. Timm Says:

    Great Blog. I ran across this looking for VBA Macros to Export Project Data to Excel. I disagree with the example claiming that folks are bad cost estimators / PMs if they cannot create an upper and lower bound around information they know nothing about. You have to do the research first by communicating with the engineers / experts.

    If you ask a golfer to put a 3 point estimate around how far Tiger Woods will drive the ball you will likely get a decent range. If you ask someone who has never heard of the game they aren’t going to provide a very good estimate.

  2. Sašo Says:

    Timm, of course I agree that domain knowledge improves estimates, but at different stages of the project you can expect different levels of expertise. At the initial concept of a project you don’t yet know whether a task of hitting a golf ball will be included in the plan or not.

    An example: you’re planning the Olympic Games and you need to light the Olympic flame at the opening ceremony. It might be lit by a flaming golf ball hit by Tiger Woods, or in some other way. Before you get to this level of detail, there will be a hundred checkpoints where different levels of estimates are required. When you apply to host the Olympics, you need to provide a cost estimate. You don’t ask Tiger Woods how far he can drive the ball at that stage.

    Steve McConnell talks about The Cone of Uncertainty. It charts the level of uncertainty depending on the time / phase into the project.

    The bottom line is that every estimate should be made with three points. The uncertainty involved in the estimate should be expressed by the width of the range.

    Much more important than the width of the estimate is the probability that the outcome will be inside the range. Using percentile estimates expands the ranges so that they cover, at least to some extent, the “out of range” errors in estimates.
