Originally posted at http://www.howtomeasureanything.com, on Tuesday, September 08, 2009 8:20:46 AM, by Dreichel.
This is my first post on your board. I am excited to actually be able to pose this to the author (or to others who want to chime in).
First, some background:
I work as an Analyst predicting Budget “Burn Rate”, that is to say “Here is what we forecasted we’d spend, here is what we actually spent”. A book like “How to Measure Anything” is invaluable because I am often asked to come up with ways to “measure the intangible” and more importantly predict what that value is going to be once you can measure it.
I do not always use formal statistical models to do my work. In fact, I tend to blend what I call “Common Sense” modeling into my approach. This involves using my past experience as a guideline to tell me when to use a six-point weighted moving average and when to consider a particularly unusual situation (e.g., “This holiday falls on a Saturday; even though we are not open, it WILL impact us, because people will take the Friday before or the Monday after off”). Often, this common-sense approach follows a quantitative logical pattern, but there is no set-in-stone approach to these methods.
Part of my job involves removing the statistical jargon and telling managers in plain English, “I took a 10% reduction in expected working hours for Friday and 15% for Monday, because I expect a greater-than-normal number of people to call in sick or take vacation around a Saturday holiday (July 4th),” or simplifying it further still.
The old “I just asked you what time it is, why are you telling me how to make a watch?” phrase applies here. I must keep my message simple.
Second, the question(s):
ONE: I have incorporated the concept of “90% CI” into my approach from Chapter Two of the book for Forecasting. Naturally, if you read the book you have an idea what 90% CI means, so I won’t go into it here.
However, I often deal with managers who do not understand the concept of a 90% CI. I have prepared a sort of elevator speech (30 seconds or less) for what it means, but I’d like to hear yours.
TWO: They ask for a Target Number for a Forecast; let’s say it is 10,124,556. This is created by adding up several other values that are provided to me from other sources. Unfortunately, I cannot round this to a less precise value like 10.1M when I express it. I would prefer to do this, but they are used to seeing the dollar value.
They do not want that value expressed as a range. They already have metrics in place as a Target of +/- 5% to their Target Number. How do I assign a confidence factor to a single target value? I don’t feel that 90% CI is correct when applied to a single value.
THREE: This is the big question. How do you answer, “I don’t care about 90% right, I want 100% CI. I want to know what you think it WILL be”?
Many managers get hung up on the idea that you must go with your BEST guess, not your 90% guess. What are some things you have done when that came up in the past? I am sure I am not the only one to hear “100% CI”. They want to feel they are working with the best information possible, and I try to explain that I am trying not to be overconfident that my number is perfect. They don’t seem to buy that. They want to know what I REALLY think it will be.
As an example, take the calibration tests the author gave in the book: get 9 out of 10 questions correct. The typical person to bring up “Why not 100%?” would say, “What if I happen to know all 10 answers? Are you saying to purposely get one wrong, just to throw it off?”
Let’s go back to that forecasted number, but let’s make it 10 million for the sake of simple math. If I said the forecast was going to be 10 million dollars with a CI of 90%, they would then ask, “Does that mean I can take +/- 10% of 10 million (in this case 1 million dollars), and you really think it’s going to fall between 9 million and 11 million?”
This range would be unacceptable to someone who must hit their target within +/- 5%, so if that is the case, should I be trying to get a 95% CI? And if so, what additional rigor needs to take place to get there? They would ask, if I could get a 95% CI, why not 100%?
[That’s] more than a few questions for an introductory post, so I’ll just sit back and hope to hear from you.
Thanks for your question and for giving me the chance to cover some of these important concepts again.
For starters, don’t presume how much or how little other managers might understand if the material is explained correctly. I constantly run into managers who warn me about how little “other” managers will understand about these basic concepts. Yet, when I explain it, I don’t find any of the resistance they anticipated. What I find more often is that the first manager didn’t quite understand the issues themselves and was explaining them poorly.
Some of your questions, in fact, indicate to me that we might have some confusion about the meaning and use of some of these concepts, and that some key points in the book were missed. Otherwise, responses to the kinds of questions you encounter should be fairly obvious. For example, at one point, you reference an estimate of “…10 million dollars with CI of 90%…”. 10 million dollars can’t be a CI because it isn’t an interval, but an exact point. You have to state an upper and lower bound to have an interval of any confidence. If you had presented it that way, perhaps you would have avoided the speculation about what it might mean (i.e., “Does that mean I can take +/- 10% of 10 million?”). I don’t encounter those types of questions because I give them the whole interval and don’t make them guess, as in “The 90% CI is 6 million to 14 million.” So first, I would make sure you understand the concepts very well yourself before we infer how much others, who are understandably confused by that kind of comment, would understand if it were correctly presented.
The confidence interval, of course, must be an interval (i.e., a range with an upper and lower bound) and it must have a stated confidence (e.g., 90%). My elevator pitch for a 90% CI is a range of values for which there is a 90% chance that the range will actually contain the true answer. In other words, if I go back and look at all my 90% CIs (e.g., over the last few months or years), I should find that about 90% of the intervals contained the true answer (90% of the intervals for sales contained the actual sales).
The reason why we often use a 90% CI instead of a 100% CI is because often the 100% CI can be so wide it might be useless to us. The 100% CI for the change in the next day of the Dow Jones Industrial Average, for example, could be greater than +/- 25% (since larger price changes have occurred, we know it is possible). We are effectively saying that anything outside of the 100% CI is absolutely impossible and should never occur – ever. But the 90% CI for the one-day change in the DJIA is a little less than +/- 2%. We are saying that very large changes are possible, but it is much more likely to be in this narrower range. This is a useful description of our actual uncertainty.
Regarding the calibration exams, I explain that 10 is a very small sample and you could easily get 10 right by chance alone. However, since most people are initially very overconfident (that is, they are right much less often than their stated confidence would predict), the sample of 10 is usually sufficient to demonstrate they are not well calibrated. It is common for most people in their first attempt to get less than half of the answers within their stated 90% CIs. If only 4 out of 10 of your 90% CIs contained the true answer, then you are probably very overconfident. (A little math shows that if there really were a 90% chance that each interval contained the answer, then there should be only a 1 in 6807 chance you would get less than 5 out of 10 within the ranges.)
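The “1 in 6807” figure can be checked directly from the binomial distribution; here is a quick sketch in Python using only the numbers from the paragraph above:

```python
from math import comb

# If each of 10 stated 90% CIs truly had a 90% chance of containing
# the answer, the number of "hits" follows a Binomial(10, 0.9)
# distribution. Compute the chance of fewer than 5 hits.
n, p = 10, 0.9

p_fewer_than_5 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5))

print(f"P(fewer than 5 hits) = {p_fewer_than_5:.6f}")
print(f"about 1 in {1 / p_fewer_than_5:,.0f}")  # about 1 in 6,807
```

So getting 4 or fewer hits out of 10 is overwhelming evidence of overconfidence rather than bad luck.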
We also have to make sure you understand that under-confidence and overconfidence are equally undesirable. You could put absurdly wide ranges in all of your calibration tests, but then those wouldn’t be your 90% CIs and would not represent your real knowledge about the question. A range of, say, 1 to a million miles for the air distance between NYC and LA (a range I’ve seen someone use in the calibration tests) implies the estimator believes it is possible for NYC and LA to be as little as 1 mile apart or farther apart than many times the circumference of the planet. This range does not represent their real knowledge. A well calibrated person is right just as often as they expect to be – no more, no less. They are right 90% of the time they say they are 90% confident and 75% of the time they say they are 75% confident.
Furthermore, I highly recommend calibration training for any manager who has to deal with uncertainties. I find that most of them (from a variety of industries and education backgrounds) understand it quite well. And when they get calibrated, they just don’t generate the kinds of questions you mention. Calibration puts your “common sense” to the test. Einstein said common sense is just all of the prejudices you accumulated by the age of 18. Your intuition about forecasts has a performance that can be measured and calibration is one way to measure it.
I would also recommend just collecting historical data about estimates in your organization. Apparently, they have been doing this for a while and you should have lots of historical data (and if you don’t have the data it is not too late to start tracking it). Once managers see how forecasts historically compared to actual outcomes they seem to “get” the point of ranges. At the very least, they will probably see that a perception of “+/-5%” certainty is an utter delusion.
By the way, I’m scheduling some webinars for calibration training. I’ll be covering all of these issues and more, and people will see how we overcome these problems by applying these concepts in a series of tests.
Thanks for your questions.
First, thank you very much for the insight!
Let me clarify a couple of things I may not have mentioned above:
don’t presume how much or how little other managers might understand if the material is explained correctly
In this case, I mean managers who self-describe as “Big Picture People” and often require diagrams and conceptual summaries. They avoid complexity like the plague. So when I say I must simplify my message: the higher up the chain, the fewer slides in the PowerPoint. (If you give your manager a detailed 20-slide deck, by the time it gets to his boss it must be boiled down to 10 slides, and for their boss, 1 slide.)
For example, at one point, you reference an estimate of a “…10 million dollars with CI of 90%…”.
What I mean is that they are asking for (asking is a polite word for ‘expecting’) a target number. I would prefer to express a range.
They may be operating under a pre-defined SLA of +/- 5% of the forecast, something their boss handed down to them. I am not in a position to point to the historical record showing they’ve never made their goal in the past and that therefore the range should be increased. So the +/- 5% is a fixed range off the target number.
So how can I offer a target number for them, with some sort of footnote expressing my own take on how accurate this is?
Imagine, for example, this is the data:
Jan: Forecast 10M, Actuals 10.3M
Feb: Forecast 10M, Actuals 10.5M
Mar: Forecast 11M, Actuals 10.3M
Apr: Forecast 11M, Actuals (not yet in)
In each case, a Target Number was established in their system of record (they cannot enter ranges; they must simply input how many dollars they expect by line item). In Jan and Feb they hit their target of +/- 5%, and in March they missed it.
Q: From this, can you calculate how confident you are that they are going to hit their goal in April, and express that as a value? If so, how?
My elevator pitch for a 90% CI is a range of values for which there is a 90% chance that the range will actually contain the true answer.
I was unclear here. I really meant, “why 90%, and not 100%”
What I think you are saying (to paraphrase) is that to get the 100% I’d have to increase the uncertainty by increasing the range.
I suppose the question is more of: how do you increase the confidence from 90% to 95% or 100% without increasing the range?
The reason why we often use a 90% CI instead of a 100% CI is because often the 100% CI can be so wide it might be useless to us.
I am beginning to understand where the confusion can come from. If increasing your CI causes you to increase the range, then lowering your CI (let’s say to 75%) means you decrease the range? But now you are saying, “I am 75% confident this number is correct.” So tightening up the range to give more precision is a double-edged sword, because it reduces your confidence that it is correct.
You could put absurdly wide ranges in all of your calibration tests but then those wouldn’t be your 90% CI
Yes, I get that. That is where it seems a little fuzzy to me: if I get 9 out of 10 right, I could have put wide ranges on 9 of them and a tight range on the 10th and still feel I got 90%. Is there a second value that can be used to help understand how much precision is required?
Let’s say you are measuring airplane wingspan distances.
Your book says that you had someone who had difficulty with this question. She said she had no idea and had to guess. Then you clarified by using the absurdity test to establish the upper and lower bounds (“Could it be 1,000 feet?”), and she said, “Of course not.”
Yet, if you wanted to be accurate, you COULD say 1 to 1,000 feet, and there is no quantifiable way other than judgment to call this absurd. No passenger plane has a 1-foot wingspan. However, depending on the scale of what you are doing, the degree you could be off by may not matter.
As an example, say I am building an airplane hangar. I want to know the maximum wingspan I need to store, so I don’t care about the lower limit; 1 foot is fine.
How do you establish a quantifiable metric for how wide the range should be? Is it “I need it to be within 20 feet of the actual wingspan of a 747” or “I need it to be within 50 feet”? In your system, it’s pass or fail: I answer 180-220 and the span is 200 feet, or I answer 150-250, and both are “true.” How do I determine what is a reasonable amount of precision the range must have to be considered a “true” answer? Both are correct and neither really feels ‘absurd’.
I understand that you could impose a lower CI on the more specific 180-220 answer, and I may not be asking this correctly, but I just feel like there must be some way to express that the range must fall within a certain percentage to be considered true.
I think we still need to clear up some concepts. The same probability distribution can have a 75% CI, a 90% CI and a 99.5% CI. Which interval you choose to describe it doesn’t change the underlying distribution. Consider a normal distribution with a mean of 100 and a standard deviation of 20. Which of the following are true:
1) This has a 75% CI of 77 to 123
2) This has a 90% CI of 67.1 to 132.9
3) This has a 99.5% CI of 43.9 to 156.1
The answer is that all three statements are true. They are all descriptions of the same distribution. Choosing to describe this distribution with its 99.5% CI instead of its 90% CI doesn’t change the distribution at all. It is still a normal distribution with a mean of 100 and a standard deviation of 20. If we are estimating daily demand of a rental car service and I don’t want to accept a probability of higher than 10% that I run out of cars in any given day, I set my lot size to 126 cars. If that’s my acceptable risk, it’s the same lot size regardless of whether I started with the 75% CI or the 99.5% CI because both are descriptions of a distribution of a mean of 100 and a standard deviation of 20. I think this can address some of your issues.
The reason you can’t use a 100% CI here is that, in theory, a normal distribution is infinite in both directions. There is no 100% CI on a normal distribution. Other distributions do have absolute limits – like a uniform or triangular. I just wanted to be clear that choosing to express a probability distribution with a different interval doesn’t mean the distribution changed. It’s like measuring your weight in pounds or kilograms: it doesn’t change your weight, just the method of description.
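The three intervals and the 126-car lot size above can be reproduced with Python’s standard library, assuming (as in the example) a normal distribution with a mean of 100 and a standard deviation of 20:

```python
from statistics import NormalDist

# One distribution, described by several different intervals.
d = NormalDist(mu=100, sigma=20)

for conf in (0.75, 0.90, 0.995):
    tail = (1 - conf) / 2
    lo, hi = d.inv_cdf(tail), d.inv_cdf(1 - tail)
    print(f"{conf:.1%} CI: {lo:.1f} to {hi:.1f}")

# Rental-car lot size: accept no more than a 10% chance of running
# out of cars on a given day, so stock the 90th percentile of demand.
lot_size = d.inv_cdf(0.90)
print(f"lot size: {lot_size:.0f} cars")  # about 126
```

Note that the lot size depends only on the underlying distribution, not on which confidence interval was used to describe it.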
When you ask how you increase the confidence without increasing the range (or conversely, narrowing the range without reducing the confidence) then that is where measurement comes in. As I explain in the book, that’s actually what empirical measurements do. You don’t just arbitrarily pick a different range or change the confidence level to make it seem like your uncertainty is less without actually making new measurements. You make new observations that reduce your uncertainty. That’s what measurements mean.
In your example, the +/- 5% appears to be a performance objective for an SLA, not a measurement or forecast. You have to separate the two. One is what you would like to happen. The other is what you think will happen given the information you have. You can set goals that are exact numbers (they are actually the lower bound) and still predict that there is only an 80% chance that they will be met or exceeded.
And you keep coming back to statements like “I am 75% confident this number is correct”. A single number does not a CI make. You need a range.
Regarding the calibration questions, yes, you could have put wide ranges on 9 of them and a small range on 1 to get 9 out of 10 right. But then 9 of them would be under-confident ranges and 1 would be overconfident. The objective is not just to trick the test but to actually improve your ability to assess odds and ranges subjectively. The only way to do that is to make a set of answers where you would be indifferent on betting on any of your ranges. If you answered the test questions the way you suggested, then you would clearly think it is more likely for some ranges to contain the true answer than others. That should not occur if they are all your honest 90% CI.
The wingspan example I mention in the book is actually an example for how absurdly wide ranges can be used to come down to properly calibrated ranges. It is not an argument for using absurdly wide ranges as your actual estimate. Instead, it shows how you really do know something in cases where you thought you knew nothing.
You say your managers are “Big Picture People” and avoid complexity. But, honestly, I was confused by some of your explanations so I’m not surprised they would be. And all “Big Picture” people understand the concept of making a bet (perhaps that’s why my How to Measure Anything book sells so well).
Thanks again for your posts,
“A well calibrated person is right just as often as they expect to be – no more, no less.” <- the semi-circular nature of this sentence really gave me a good belly-laugh 🙂
I see what you mean but, in fact, there is nothing remotely circular or even semi-circular about it. Most people are not right just as often as they expect to be so this is no tautology. It is not self-evident that a person is right just as often as they expect to be because, when that is measured, it turns out not to be the case. When we track how often someone says there is a 90% probability that some claim they make is true, we will find that they turn out to be correct perhaps 60% to 80% of the time. If we ask them for a range that they believe has a 90% chance of containing the true value (e.g. a project cost estimate) and we track all such ranges against actual outcomes, we find that much less than 50% of the ranges contain the true values.
A calibrated person is trained so that when we track all of the times they said there was a 90% probability of some event, we will find that the event actually occurred about 90% of the time. And when they say that an event has an 80% probability, that event will occur about 80% of the time. So, that is what we mean when we say they are right just as often as they expect to be – no more, no less.
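A minimal sketch of what such tracking looks like, using made-up numbers for illustration (a hypothetical forecaster who says “90% confident” but is actually right only about 70% of the time):

```python
import random

random.seed(0)

# Simulated log of 1,000 claims, each stated at "90%" confidence,
# where the claim actually turns out true only 70% of the time.
log = [("90%", random.random() < 0.70) for _ in range(1000)]

hits = sum(correct for _, correct in log)
rate = hits / len(log)

# Stated confidence vs. measured hit rate reveals overconfidence.
print(f"stated confidence: 90%, actual hit rate: {rate:.0%}")
```

Running a comparison like this against real tracked claims is what turns “how calibrated am I?” from a matter of opinion into a measurement.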
This is an old thread, but worth reviving for the common misconceptions and for the value of your responses.
It’s a common problem… dealing with managers who may want “best” point estimates rather than interval estimates… even when they truly do understand the concepts. In these cases, where you want to preserve some sense of uncertainty while still providing a single value, the analyst can resort to a one-sided lower confidence bound (LCB) (or an upper confidence bound, UCB, depending on the good/bad context). You may have implied this in your response to the original Q#2. Then at least the analyst can say, “I am 90% confident the value is above X million.”
Usually there is a good/bad direction on the scale, so a UCB or LCB may be sufficient. That said, ignoring uncertainty in either direction can lead to bad (or non-) decisions.
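As a sketch of the one-sided bound idea: if the forecast uncertainty were (for illustration only) normally distributed, a 90% LCB is simply the 10th percentile. The mean and standard deviation below are hypothetical, not from the thread:

```python
from statistics import NormalDist

# Hypothetical forecast distribution, in millions of dollars:
# mean 10.0, standard deviation 0.4 (assumed, e.g. estimated from
# past forecast-vs-actual errors).
forecast = NormalDist(mu=10.0, sigma=0.4)

# One-sided 90% lower confidence bound = the 10th percentile.
lcb = forecast.inv_cdf(0.10)
print(f"I am 90% confident the value is above {lcb:.2f} million")
```

This gives managers the single number they asked for while still carrying an honest statement of uncertainty in one direction.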