I am concerned about the CI, median and normal distribution

Originally posted at http://www.howtomeasureanything.com, on Wednesday, February 11, 2009 2:16:38 PM, by andrey.

“Hello Douglas,

First of all let me say I have thoroughly enjoyed reading your book. I have a technical background (software engineering) and have always been surprised at how “irresponsible” some business-level decision making can be – based on gut instincts and who-knows-what. This ‘intuitive’ approach is plagued with biases and heuristics, and the effects of such an approach have been widely publicized (for example here). This is one of many reasons I found your book very stimulating and the AIE approach as a whole very promising.

However I have reservations about a few points you make. Please forgive my ignorance if my questions are silly; my math has become rusty with the years passing by.

One of my concerns is the validity of the assumption that you make when explaining ‘The Rule of Five’, the 90% CI, and especially when using Monte Carlo simulation. I can believe (although it would’ve been great to see the sources) that ‘there is a 93% chance that the median of a population is between the smallest and largest values in any random sample of five from that population’. But when you apply this to the Monte Carlo simulations, you assume that the mean (which is also the median for symmetric probability distributions) is exactly in the middle of the confidence interval. This, I think, makes a big difference to the outcome because of the shape of the normal distribution function. If you assume the median is, for example, very close to the lower or upper bound of the confidence interval by putting a different value into the =norminv(rand(), A, B) formula, the results would be different.

I am still working through your book (second reading), trying to ‘digest’ and internalise it properly. I would be very grateful if you could explain this to me.

Thank you very much,

Andrey”

Thanks for your comment.

I don’t show a source (I’m the one who coined the term “Rule of Five” in this context) but I show the simple calculation, which is easily verifiable. The chance of randomly picking one sample with a parameter value above the true population median for that parameter is, by definition, 50%. We ask, “What is the probability that I could happen to randomly choose five samples in a row that are all above the true population median?” It is the same chance as flipping five coins and getting all heads. The answer is 1/2^5, or 3.125%. Likewise, the probability that we could have just picked 5 in a row that were all below the true population median is 3.125%. That means there is a 93.75% chance that some were above and some were below – in other words, that the median is really between the min and max values in the sample. It’s not an “assumption” at all – it is a logically necessary conclusion from the meaning of the word “median”.

You can verify this experimentally as well. Try generating any large set of continuous values you like using any distribution you like (or just define a distribution function for such a set). Determine the median for the set. Then randomly select 5 from the set and see if the known population median is between the min and max values of those 5 samples. Repeat this a large number of times. You will find that 93.75% of the time the known median will be between the min and max values of the sample of 5.
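For readers who want to try this themselves, here is a minimal simulation sketch in Python (the lognormal population and the seed are arbitrary choices for illustration; any distribution will do):

```python
import random
import statistics

# Build an arbitrary "population" -- deliberately skewed (lognormal) to show
# the result does not depend on the distribution being symmetrical.
random.seed(1)
population = [random.lognormvariate(0, 1) for _ in range(100_000)]
true_median = statistics.median(population)

trials = 100_000
hits = 0
for _ in range(trials):
    sample = random.sample(population, 5)
    # Did the known population median land between the sample min and max?
    if min(sample) < true_median < max(sample):
        hits += 1

print(hits / trials)  # converges on 1 - 2 * (1/2)**5 = 0.9375
```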

I believe I also made it clear that this only applies to the median and not the mean. I further stated that if, on the other hand, you were able to determine that the distribution is symmetrical, then, of course, it applies to the mean as well. Often you may have reason to do this, and it is no different from the symmetry assumption in any application of a t-stat or z-stat based calculation of a CI (which are always assumed to be symmetrical).

Furthermore, you certainly should not use a function that generates a normally distributed random number if you know the quantity is not normally distributed, and I don’t believe I recommended otherwise. If you know the median and the mean of the distribution you intend to generate are not the same, then you can’t count on Excel’s norminv() formula to be a good approximation. For the sake of simplicity, I gave a very limited set of distribution functions for random values in Excel. But we can certainly add a lot more (triangular, beta, lognormal, etc.).

My approach is not to assume anything you don’t have to. If you don’t know that the distribution isn’t lopsided, you can simulate that uncertainty, too. Is it possible that the real distribution could actually be lognormal? Then put a probability on that and generate accordingly. Why “assume” something is true if we can explicitly model that we are uncertain about it?
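As a sketch of that last point (not a formula from the book), one way to simulate uncertainty about the shape itself is to pick the distribution type at random on each trial, according to the probability you assign to it. The bounds and the 30% lognormal probability below are made-up values:

```python
import math
import random

LB, UB = 0.02, 0.10   # hypothetical 90% CI for, say, a productivity improvement
P_LOGNORMAL = 0.3     # your assessed chance that the quantity is really lognormal

def draw():
    if random.random() < P_LOGNORMAL:
        # A lognormal whose 90% CI matches the same bounds (on the log scale).
        mu = (math.log(LB) + math.log(UB)) / 2
        sigma = (math.log(UB) - math.log(LB)) / 3.29
        return random.lognormvariate(mu, sigma)
    # Otherwise treat the stated range as the 90% CI of a normal distribution.
    return random.gauss((LB + UB) / 2, (UB - LB) / 3.29)

samples = [draw() for _ in range(10_000)]
```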

Thanks,

Doug

In Russian

Hello, [I have a] short question – where can I find the book in Russian? The site says that a translation exists.

I’m writing postgraduate research on the efficiency of IT – it would be quite interesting to take a deep look into your methodology, and a translation would significantly increase the speed of this process.

[Thanks] in advance,

Anton

Standard Deviation and Distributions

Originally posted at http://www.howtomeasureanything.com, on Tuesday, February 03, 2009 9:20:28 AM, by lascar.

“Hello.

I’ve enjoyed the book and am trying to apply AIE to some of our IT decision making. Unfortunately I don’t have a statistician’s background and my college days are quite far away. So, [I’m] trying to catch up on some basics. Wikipedia is amazingly helpful in this sense.

I have two questions. If they are too basic for this forum, I’d appreciate anyone at least directing me to some resources which might help me answer them. Direct answers, of course, are even more welcome.

1. In the Monte Carlo example in the book, the assumption is that most of the variables have a normal distribution. And if not, there are 2 more distributions mentioned – uniform and binary. I guess these are the most common? My question is: how does one quickly evaluate what type of distribution is fitting for a variable? I’d guess it is quite straightforward with a binary distribution. However, from this article (http://en.wikipedia.org/wiki/List_of_probability_distributions), it seems there is quite a choice of distributions.

The MC scenario I’m running is to evaluate the performance of a software package. I also realize that a quick proof of concept (running the software and collecting metrics) might shed more light on the distribution of some metric/variable. However, that requires acquiring an expensive license first. So the decision I’m trying to facilitate is to prove that we need a POC, and I need to calculate the value of improving on these measurements with the POC’s help – that is, that the cost of the POC is worth lowering the uncertainty of the measurements.

2. In the same Monte Carlo example a standard deviation of 3.29 is used and the statement is that it is for a 90% CI. However, I’ve stumbled on this article (http://en.wikipedia.org/wiki/Standard_deviation#Rules_for_normally_distributed_data) and it seems the standard deviation for 90% is 1.645; 3.29 is closer to a 99% CI. Can someone clarify, please?

Thank you.”

Thanks for your interest. First, yes, there are quite a few distributions to choose from. I included the three simplest. The normal distribution is a very specific type of “bell curve”. I won’t go into how this bell curve is different from other bell curves, but the difference between this distribution and a uniform or binary is simple. The normal distribution is a range of values that are more likely in the middle but go out in both directions forever, although the odds become vanishingly small at the tails. The formula I gave for converting the bounds to a normal distribution allows for values outside of your bounds – there is a 5% chance it could be higher than the upper bound and a 5% chance it is lower than the lower bound.

In a uniform distribution the values can only fall between the upper and lower bounds. Unlike the normal distribution, there is no chance the value could be outside of the bounds. Also, unlike the normal distribution, values are not more likely in the middle. Any value between the bounds of a uniform distribution is just as likely as any other value between the bounds. Use uniform distributions when you know that a variable can’t possibly be outside of the bounds. For example, if I know an uncertain variable about a productivity improvement from a new technology can’t be less than 0% and can’t be more than 10% (perhaps that’s the maximum amount of time spent on the activity being automated), then I would use a uniform distribution. However, if I’m not certain of those bounds but I think values around 5% are more likely than other values, then I might make it a normal distribution. Note that the normal distribution, then, would allow the productivity improvement to be a negative value or greater than 10%, even though values in the middle are more likely.

Binary is a simple one. It applies to events that either happen or do not. For example, if you are building a Monte Carlo simulation of a construction project and you want to model the chance of a labor strike, then you need to show that it will happen (or not) with a given probability. If there is a 10% chance of a strike then there is a 90% chance of no strike. The only values generated are either 1 or 0 and nothing in between. Of course, if there is a strike, you might want to use a normal or uniform to simulate the duration of the strike.
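To make the three types concrete, here is a minimal sketch in Python rather than Excel. The function names, bounds, and probabilities are made-up inputs for illustration, and the normal case uses the convention of treating the stated range as a 90% CI:

```python
import random

def normal_90ci(lower, upper):
    """Values cluster near the middle; about 5% fall below lower and 5% above upper."""
    return random.gauss((lower + upper) / 2, (upper - lower) / 3.29)

def uniform(lower, upper):
    """Any value between the bounds is equally likely; nothing falls outside them."""
    return random.uniform(lower, upper)

def binary(p_event):
    """The event either happens (1) or does not (0), with probability p_event."""
    return 1 if random.random() < p_event else 0

# One Monte Carlo trial using the examples discussed above:
productivity_gain = uniform(0.0, 0.10)              # cannot be outside 0% to 10%
task_minutes      = normal_90ci(30, 80)             # 90% CI of 30 to 80 minutes
strike_happens    = binary(0.10)                    # 10% chance of a labor strike
strike_weeks      = strike_happens * uniform(1, 6)  # strike duration, if it happens
```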

Regarding your second question, there is no inconsistency between the two values you mention. There are 3.29 standard deviations in a 90% CI if you subtract the lower bound from the upper bound, as I describe in the book. But there are 1.645 standard deviations from the middle of the range to either bound – which is half the distance between the bounds (1.645 x 2 = 3.29). When you use 1.645, it is because you are starting with the middle value and computing the 90% CI. In the situation in the book, we start with a 90% CI and need to compute the standard deviation so we can simulate it.
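A quick way to see that 3.29 and 1.645 are the same fact, sketched in Python with a made-up 90% CI of 50 to 70:

```python
import random

lower, upper = 50, 70            # a hypothetical 90% CI
mean  = (lower + upper) / 2      # 60, the midpoint of the range
sigma = (upper - lower) / 3.29   # identical to (upper - mean) / 1.645

# Equivalent to Excel's =norminv(rand(), mean, sigma)
draws = [random.gauss(mean, sigma) for _ in range(100_000)]
inside = sum(lower < x < upper for x in draws) / len(draws)
print(inside)   # roughly 0.90, i.e. the bounds really do enclose a 90% CI
```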

Measuring Prevention

Originally posted on http://www.howtomeasureanything.com/forums/ on Monday, December 22, 2008 4:23:05 PM, by Dynera.

“Hi Doug,

I have your book on order and I was wondering if you cover the measurement of prevention effectiveness. For instance, I was involved with a software architecture team whose purpose was to vet architecture designs. We received several feedback forms saying that our input was useful, but besides that we really didn’t see any other way to demonstrate our effectiveness.

Any feedback is appreciated.

Thanks.

Paul”

Paul,

First, my apologies for the long delay in my response. Between the holidays and site problems, it’s been quite a while since I’ve looked at the Forums.

Vetting architectural designs should have measures similar to any quality control process. Since you are a software architect, I would suggest the following:

1) Ask for more specifics in your feedback forms. Ask if any of the suggestions actually changed a design and how. Ask which suggestions, specifically, changed the design.

2) Tally these “potential cost savings identified” for each design you review.

3) Periodically go into more detail with a random sample of clients and your suggestions to them. For this smaller sample, take a suggestion that was identified as one that resulted in a specific change (as in point #1 above) and get more details about what would have changed. Was an error avoided that otherwise would have been costly and, if so, how costly? Some existing research or the calibrated estimates of your team might suffice to put a range on the costs of different problems if they had not been avoided.

4) You can estimate the percentage of total errors that your process finds. Periodically have independent reviewers separately assess the same design and compare their findings. I explain a method in Chapter 14 for using the findings from two different vetters to determine the number of errors that neither found. In short, if both vetters routinely find all of the same errors, then you probably find most of them. If each vetter finds lots of errors that the other does not find, then there are probably a lot that neither finds. You can also periodically check the process by the same method used to measure proofreaders – simply add a small set of errors yourself and see how many are found by the vetter.
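The general idea behind that two-vetter method is capture-recapture estimation; the exact formulation in Chapter 14 may differ in detail, but a minimal sketch with made-up counts looks like this:

```python
def estimate_total_errors(found_by_a, found_by_b, found_by_both):
    """Lincoln-Petersen style capture-recapture estimate of total errors.

    If the two reviewers work independently, the share of A's findings that B
    also caught estimates B's overall catch rate, which lets you extrapolate
    to the errors neither reviewer found.
    """
    if found_by_both == 0:
        raise ValueError("No overlap between reviewers; the total cannot be estimated.")
    return found_by_a * found_by_b / found_by_both

# Illustrative numbers only: reviewer A found 12 errors, B found 10, 8 in common.
total_estimate  = estimate_total_errors(12, 10, 8)   # about 15 errors in total
found_by_either = 12 + 10 - 8                        # 14 distinct errors found
missed_by_both  = total_estimate - found_by_either   # about 1 error neither found
```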

This way, you can determine whether you are catching, say, 40% to 60% of the errors instead of 92% to 98% of the errors. And of the errors you find, you might determine that 12% to 30% would have caused rework or other problems in excess of 1 person-month of effort had it not been found. Subjective customer responses like “very satisfied” can be useful, but the measures I just described should be much more informative.

Thanks for your interest.

Doug Hubbard

First Print Errata

Welcome to the Errata thread in the book discussion of the How to Measure Anything Forum. An author goes through a lot of checks with the publisher, but some errors manage to get through. Some are my fault, some were caused by the typesetter or publisher not making previously requested changes. I just got my author’s copies 2 days ago (July 20th), about 2 weeks before the book gets to the stores, and I have already found a couple of errors. None should be confusing to the reader, but they were exasperating to me. Here is the list so far.

1) Dedication: My oldest son’s name is Evan, not Even. My children are mentioned in the dedication and this one caused my wife to gasp when she saw it. I don’t know how this one slipped through any of the proofing by me, but it is a high-priority change for the next print run.

2) Preface, page XII: The sentence “Statistics and quantitative methods courses were still fresh in my mind and I in some cases when someone called something “immeasurable”; I would remember a specific example where it was actually measured.” The first “I” is unnecessary.

3) Acknowledgements, page XV: Freeman Dyson’s name is spelled wrong. Yes, this is the famous physicist. Fortunately, his name is at least spelled correctly in chapter 13, where I briefly refer to my interview with him. Unfortunately, the incorrect spelling also seems to have made it to the index.

4) Chapter 2, page 13: Emily Rosa’s experiment had a total of 28 therapists in her sample, not 21.

5) Chapter 3, Page 28: In the Rule of Five example the samples are 30, 60, 45, 80, and 60 minutes, so the range should be 30 to 80 – not 35 to 80.

6) Chapter 7: Page 91, Exhibit 7.3: In my writing, I had a habit of typing “*“ for multiplication since that is how it is used in Excel and most other spreadsheets. My instructions to my editor were to replace the asterisks with proper multiplication signs. They changed most of them throughout the book but the bottom of page 91 has several asterisks that were never changed to multiplication signs. Also, in step 4 there are asterisks next to the multiplication signs. This hasn’t seemed to confuse anyone I asked. People still correctly think the values are just being multiplied but might think the asterisk refers to a footnote (which it does not).

7) Chapter 10: Pages 177-178: There is an error in the lower bound of a 90% confidence interval at the bottom of page 177. I say that the range is “79% to 85%”. Actually, the 79% is the median of the range and the proper lower bound is 73%. On the next page there is an error in the column headings of Exhibit 10.4. I say that the second column is computed by subtracting one normdist() function in Excel from another. Actually, the order should be reversed so that the first term is subtracted from the second. As it is now, the formula would give a negative answer; taking the negative of that number gives the correct value. I don’t think this should confuse most readers unless they try to recreate the detailed table (which I don’t expect most to do). Fortunately, the downloadable example spreadsheet referred to in this part of the book corrects that error. The correction is in the spreadsheet named Bayesian Inversion, Chapter 10, available in the downloads.

8) Chapter 12: page 205; There should be a period at the end of the first paragraph.

I’ve found errors in other books but you are the only author who posted them on a website as soon as the book came out. This ought to be the standard for how books deal with it. I was going to make a couple of errata entries but I see you already have them mentioned here. I’m sure you could probably find some statistics on average errors per book somewhere out there.

Bill:

9) Chapter 5, page 69. Third line from top of page – “He has has seen…” should be “He has seen…”. One has too many.

John Chandler-Pepelnjak:

Another one pretty close to your #4 above. Chapter 2, page 13. In my edition (I don’t know if it is the first print run or the second), the half-width of your confidence interval is given as 16% and the resulting confidence interval is given as 44% to 66%. If Emily had 10 measurements on 28 therapists, the CI half-width should be 6%, giving a CI of 44% to 56%. So it looks like both the 16% and the resulting interval are typos. (If you use the corrected number of therapists, then that half-width goes up to 7% and the CI is 43% to 57%.)

Thanks for writing this book. It’s an enjoyable read and an important message.

Thanks for your input and for your interest in my book. You’ve made an astute observation. I’ve had a couple of email conversations about this so your comment gives me an opportunity to summarize the point. Other than one minor caveat, you are correct.

First, it is important to point out that this particular claim is simply about the chance of getting 44% of coin flips out of 280 on heads (i.e., about 123 out of 280) and not about the confidence interval for this particular study (which, of course, would have a mean of 44%). I only point that out because it confused a couple of other readers (apparently not you).

Of course, the relevant tool for this is the cumulative binomial distribution – specifically, 123 or fewer successes out of 280 trials with a 50% chance of success per trial. This comes out to a 2.42% chance of anything up to and including that number of successes, which is close to the lower bound of a 95% CI. Likewise, every possibility up to and including 156 heads has a 97.58% chance. So we get pretty close to a 95% CI of 44% to 56%. Of course, this (combinatoric) distribution is meant to work with integers and there are some rounding issues. You can get small differences based on whether you think the result of 123 or 156 heads should be included in the range or is just outside of the range (which I think might explain how you got a 7% half-width). But you are correct that there was definitely a typo.
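For anyone who wants to reproduce those figures, the cumulative binomial is a one-liner in Python with SciPy (or =BINOMDIST(123, 280, 0.5, TRUE) in Excel):

```python
from scipy.stats import binom

# P(123 or fewer heads in 280 fair coin flips) -- the lower tail
print(binom.cdf(123, 280, 0.5))   # about 0.0242, i.e. the 2.42% above

# P(up to and including 156 heads) -- the matching upper tail by symmetry
print(binom.cdf(156, 280, 0.5))   # about 0.9758
```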

Thanks,

Doug Hubbard

Computing the Value of Information

Originally posted at http://www.howtomeasureanything.com, on Monday, December 08, 2008 10:08:08 PM, by Unknown.

“I’m trying to quickly identify the items I need to spend time measuring. In your book, you determine what to measure by computing the value of information. You refer to a macro that you run on your spreadsheet that automatically computes the value of information and thus permits you to identify those items most worth spending extra time to refine their measurements. Once I list potential things that I might want to measure, do I estimate, using my Calibrated Estimators, a range for the chance of being wrong and the cost of being wrong, and then, using something like @RISK, multiply these two lists of probability distributions together to arrive at a list of distributions for all the things I might want to measure? Then, do I look over this list of values of information and select the few that have significantly higher values of information?

I don’t want you to reveal your proprietary macro, but am I on the right track to determining what the value of information is?”

You were on track right up to the sentence that starts “Once I list potential things that I might want to measure.” You already have calibrated estimates by that point. Remember, you have to have calibrated estimates first before you can even compute the value of information. Once the value of information indicates that you need to measure something, then it’s time to get new observations. As I mention in the book, you could also use calibrated estimates for this second round, but only if you are giving the estimators new information that will allow them to reduce their uncertainty.

So, first you have your original calibrated estimates, THEN you compute the value of information, THEN you measure the things that matter. In addition to using calibrated estimators again (assuming you are finding new data to give them to reduce their ranges), I mention several methods in the book, including decomposition, sampling methods, controlled experiments, and several others. It just depends on what you need to measure.

Also, the chance of being wrong and the cost of being wrong can already be computed from the original calibrated estimates you provided and the business case you put them in. You do not have to estimate them separately in addition to the original calibrated estimates themselves. Look at the examples I gave in the chapter on the value of information.

My macros make it more convenient to compute more complicated information values, but they are not necessary for the simplest examples. Did you see how I computed the information values in that chapter? I already had calibrated estimates on the measurements themselves. Try a particular example and ask me about that example specifically if you are still having problems.
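The macro itself is not shown in the book, but for the simplest case – a single go/no-go decision with one uncertain variable – the value of information is, roughly speaking, the expected opportunity loss you would eliminate with a perfect measurement. A minimal Monte Carlo sketch, with a made-up calibrated 90% CI for the project’s net benefit:

```python
import random

# Hypothetical calibrated 90% CI for a project's net benefit (note the lower
# bound is a loss, so approving the project can turn out to be the wrong call).
LB, UB = -100_000, 900_000
mean, sigma = (LB + UB) / 2, (UB - LB) / 3.29

trials = 100_000
total_loss = 0.0
for _ in range(trials):
    benefit = random.gauss(mean, sigma)
    # Opportunity loss of approving: the size of the loss whenever benefit < 0.
    if benefit < 0:
        total_loss += -benefit

evpi = total_loss / trials   # expected opportunity loss = value of perfect information
print(round(evpi))
```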

Thanks for your use of my book and please stay in touch.

Doug Hubbard