The Measurement Challenge

I’m reintroducing the Measurement Challenge for the blog.  I ran it for a couple of years on the old site and had some very interesting posts. 

Use this thread to post comments about the most difficult – or even apparently “impossible” – measurements you can imagine.  I am looking for truly difficult problems that might take more than a couple of rounds of query/response to resolve.  Give it your best shot!

Doug Hubbard

Can “expert” training increase confidence while making judgments worse?

I came across more interesting research about possible “placebo effects” in decision making. According to the two studies cited below, receiving formal training in lie detection (e.g., so that law enforcement officers are more likely to detect an untruthful statement by a suspect) has a curious effect. The training greatly increases the experts’ confidence in their own judgments even though it may decrease their performance at detecting lies. Such placebo effects were a central topic of The Failure of Risk Management. I’m including this in the second edition of How to Measure Anything as another example of how some methods (like formal training) may seem to work and increase the confidence of the users but, in reality, don’t work at all.

  • DePaulo, B. M., Charlton, K., Cooper, H., Lindsay, J. J., & Muhlenbruck, L., “The accuracy-confidence correlation in the detection of deception,” Personality and Social Psychology Review, 1(4), pp. 346–357, 1997.
  • Kassin, S. M., & Fong, C. T., “‘I’m innocent!’: Effects of training on judgments of truth and deception in the interrogation room,” Law and Human Behavior, 23, pp. 499–516, 1999.

Thanks to my colleague Michael Gordon-Smith in Australia for sending me these citations.

Doug Hubbard

What to say when they ask, “Why not 100% CI?”

Originally posted at http://www.howtomeasureanything.com, on Tuesday, September 08, 2009 8:20:46 AM, by Dreichel.

“Hello,

This is my first post on your board. I am excited to actually be able to pose this to the author (or to others who want to chime in).

First, some background:

I work as an analyst predicting budget “burn rate”; that is to say, “Here is what we forecasted we’d spend; here is what we actually spent.” A book like “How to Measure Anything” is invaluable because I am often asked to come up with ways to “measure the intangible” and, more importantly, to predict what that value is going to be once you can measure it.

I do not always use formal statistical models to do my work. In fact, I tend to blend what I call “common sense” modeling into my approach. This involves using my past experience as a guideline to tell me when to use a six-period weighted moving average and when to consider a particularly unusual situation (“This holiday falls on a Saturday; even though we are not open, it WILL impact us, because people will take the Friday before or the Monday after off”). Often, this common-sense approach follows a quantitative, logical pattern, but there is no set-in-stone approach to these methods.

Part of my job involves removing the statistical jargon and telling them in plain English, “I took a 10% reduction in expected working hours for Friday and subsequently 15% for Monday because I expect a greater-than-normal number of people to call in sick or take vacation around a Saturday holiday (July 4th),” or simplifying it further still.

The old “I just asked you what time it is, why are you telling me how to make a watch?” phrase applies here. I must keep my message simple.

Second, the question(s):

ONE: I have incorporated the concept of a “90% CI” into my forecasting approach from Chapter Two of the book. Naturally, if you have read the book you have an idea what a 90% CI means, so I won’t go into it here.

However, I often deal with managers who do not understand the concept of a 90% CI. I have prepared a sort of elevator speech (30 seconds or less) for what it means. However, I’d like to ask you for yours.

TWO: They ask for a target number for a forecast; let’s say it is 10,124,556. This is created by adding up several other values that are provided to me from other sources. Unfortunately, I cannot round this to a less precise value like 10.1M when I express it. I would prefer to do this, but they are used to seeing the dollar value.

They do not want that value expressed as a range. They already have metrics in place that set a target of +/-5% around their target number. How do I assign a confidence factor to a single target value? I don’t feel that a 90% CI is correct when applied to a single value.

THREE: This is the big question. How do you answer “I don’t care about being 90% right, I want a 100% CI. I want to know what you think it WILL be”?

Many managers get hung up on the idea that you must go with your BEST guess, not your 90% guess. What are some things you have done when that came up in the past, as I am sure I am not the only one to hear “100% CI”? They want to feel they are working with the best information possible, and I try to explain that I am trying not to be overconfident that my number is perfect. They don’t seem to buy that. They want to know what I REALLY think it will be.

Take, as an example, the calibration tests the author gave: get 9 out of 10 questions correct. The typical person to bring up the “Why not 100%?” question would say, “What if I happen to know all 10 answers? Are you saying to purposely get one wrong, just to throw it off?”

FOUR:

Let’s go back to that forecasted number, but let’s make it 10 million for the sake of simple math. If I said the forecast was going to be 10 million dollars with a CI of 90%, they would then ask, “Does that mean I can take +/-10% of 10 million (in this case 1 million dollars) and you really think it’s going to fall within 9 million to 11 million?”

This range would be unacceptable to someone who must hit their target within +/-5%, so if that is the case, should I be trying to get a 95% CI? And if so, what additional rigor needs to take place to get there? They would ask, if I could get a 95% CI, why not 100%?

[That’s] more than a few questions for an introductory post, so I’ll just sit back and hope to hear from you.”

Thanks for your question and for giving me the chance to cover some of these important concepts again.

For starters, don’t presume how much or how little other managers might understand if the material is explained correctly. I constantly run into managers who warn me about how little “other” managers will understand about these basic concepts. Yet, when I explain it, I don’t find any of the resistance they anticipated. What I find more often is that the first manager didn’t quite understand the issues themselves and was explaining them poorly.

Some of your questions, in fact, indicate to me that we might have some confusion about the meaning and use of some of these concepts, and that some key points in the book were missed. Otherwise, responses to the kinds of questions you encounter should be fairly obvious. For example, at one point you reference an estimate of “…10 million dollars with a CI of 90%…”. Ten million dollars can’t be a CI because it isn’t an interval; it’s an exact point. You have to state an upper and lower bound to have an interval of any confidence. If you had presented it that way, perhaps you would have avoided the speculation about what it might mean (i.e., “Does that mean I can take +/-10% of 10 million…”). I don’t encounter those types of questions because I give people the whole interval and don’t make them guess, as in “The 90% CI is 6 million to 14 million.” So first, I would make sure you understand the concepts very well yourself before we infer how much others (who are understandably confused by that kind of comment) would understand if it were correctly presented.

The confidence interval, of course, must be an interval (i.e., a range with an upper and lower bound), and it must have a stated confidence (e.g., 90%). My elevator pitch for a 90% CI is this: it is a range of values for which there is a 90% chance that the range actually contains the true answer. In other words, if I go back and look at all of my 90% CIs (e.g., over the last few months or years), I should find that about 90% of the intervals contained the true answer (90% of the intervals for sales contained the actual sales).
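If it helps to make that long-run interpretation concrete, here is a minimal sketch in Python. All the quantities are made up for illustration: the true value is drawn from a known distribution, the estimator states the central 90% of that distribution as the interval, and we count how often the interval contains the truth.

```python
import random

# A minimal sketch of the long-run meaning of a 90% CI, using made-up
# quantities: the true value is drawn from a known distribution, the
# estimator states the central 90% of that distribution as the interval,
# and we count how often the interval "catches" the truth.
random.seed(1)
MEAN, SD = 100.0, 15.0
Z90 = 1.645                      # z-score bounding the central 90%
lower, upper = MEAN - Z90 * SD, MEAN + Z90 * SD

trials = 100_000
hits = sum(lower <= random.gauss(MEAN, SD) <= upper for _ in range(trials))
print(f"{hits / trials:.1%} of the 90% intervals contained the true value")
# Prints roughly 90.0% -- being right as often as you say you are
# is exactly what "calibrated" means.
```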

The reason we often use a 90% CI instead of a 100% CI is that the 100% CI can be so wide it might be useless to us. The 100% CI for the next one-day change in the Dow Jones Industrial Average, for example, could be greater than +/-25% (since larger price changes have occurred, we know it is possible). With a 100% CI, we are effectively saying that anything outside the interval is absolutely impossible and should never occur, ever. But the 90% CI for the one-day change in the DJIA is a little less than +/-2%. We are saying that very large changes are possible, but the change is much more likely to fall in this narrower range. This is a useful description of our actual uncertainty.
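As a sketch of how one might read such an interval off historical data, the snippet below takes a list of daily percent changes and finds the empirical 5th and 95th percentiles. The numbers are invented for illustration; they are not real DJIA data.

```python
from statistics import quantiles

# A sketch of turning historical data into a 90% interval. These daily
# percent changes are invented for illustration -- they are NOT real
# DJIA figures.
daily_changes = [-3.1, -1.8, -1.2, -0.9, -0.5, -0.2, 0.0, 0.1, 0.3, 0.4,
                 0.6, 0.8, 1.0, 1.3, 1.7, 2.2, -0.7, 0.5, -0.3, 0.9]

# quantiles(..., n=20) returns the 5th, 10th, ..., 95th percentiles;
# the first and last cut points bound the middle 90% of outcomes.
cuts = quantiles(daily_changes, n=20)
print(f"Empirical 90% interval: {cuts[0]:+.1f}% to {cuts[-1]:+.1f}%")

# A "100% interval" can never be narrower than the most extreme
# observation -- and history can always produce a new extreme.
print(f"Widest observed swing: {min(daily_changes)}% to {max(daily_changes)}%")
```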

Regarding the calibration exams, I explain that 10 is a very small sample and you could easily get all 10 right by chance alone. However, since most people are initially very overconfident (that is, they are right much less often than their stated confidence would predict), the sample of 10 is usually sufficient to demonstrate that they are not well calibrated. It is common for most people on their first attempt to get fewer than half of the answers within their stated 90% CIs. If only 4 out of 10 of your 90% CIs contained the true answer, then you are probably very overconfident. (A little math shows that if there really were a 90% chance that each interval contained the answer, there would be only a 1 in 6,807 chance of getting fewer than 5 out of 10 within the ranges.)
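For anyone who wants to check that figure, the short binomial calculation below reproduces the 1-in-6,807 number.

```python
from math import comb

# Verifying the "1 in 6,807" figure: if each of 10 stated 90% CIs truly
# had a 90% chance of containing the answer, the chance of fewer than
# 5 hits is the lower binomial tail.
p = sum(comb(10, k) * 0.9**k * 0.1**(10 - k) for k in range(5))
print(f"P(fewer than 5 hits out of 10) = {p:.3g}, i.e. about 1 in {1/p:,.0f}")
# -> about 1 in 6,807
```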

We also have to make sure you understand that under-confidence and overconfidence are equally undesirable. You could put absurdly wide ranges in all of your calibration tests, but then those wouldn’t be your 90% CIs and would not represent your real knowledge about the question. A range of, say, 1 to a million miles for the air distance between NYC and LA (a range I’ve seen someone use in the calibration tests) implies the estimator believes it is possible for NYC and LA to be as little as 1 mile apart or farther apart than many times the circumference of the planet. This range does not represent their real knowledge. A well-calibrated person is right just as often as they expect to be, no more, no less. They are right 90% of the time they say they are 90% confident and 75% of the time they say they are 75% confident.

Furthermore, I highly recommend calibration training for any manager who has to deal with uncertainty. I find that most of them (from a variety of industries and educational backgrounds) understand it quite well. And once they are calibrated, they just don’t generate the kinds of questions you mention. Calibration puts your “common sense” to the test. Einstein said common sense is just the collection of prejudices you accumulated by the age of 18. Your intuition about forecasts has a performance that can be measured, and calibration is one way to measure it.

I would also recommend collecting historical data about estimates in your organization. Apparently, they have been making these forecasts for a while, so you should have lots of historical data (and if you don’t, it is not too late to start tracking it). Once managers see how forecasts historically compared to actual outcomes, they seem to “get” the point of ranges. At the very least, they will probably see that a perception of “+/-5%” certainty is an utter delusion.

By the way, I’m scheduling some webinars for calibration training. I’ll be covering all of these issues and more, and people will see how to overcome these problems by applying the methods in a series of tests.

Thanks for your questions.

Doug Hubbard

Lens Model: Negative Value

Originally posted at http://www.howtomeasureanything.com, on Sunday, September 06, 2009 9:20:07 PM, by sujoymitra17.

“While using the Lens Model (multiple regression), I am getting negative scores for a few parameters and positive scores for a few others. I am computing the score using the formula:

Score = (Coefficient of parameter 1 × Value of parameter 1) + (Coefficient of parameter 2 × Value of parameter 2) + … + Intercept

Since a few parameters show negative scores and others positive (considering a few have negative correlation coefficients and others have positive ones), how do I formulate the weights?”

I’m a little confused by your message. The coefficients in a regression model ARE the “weights”. The output of a regression analysis includes the coefficients. A regression analysis is how the weights are computed for a Lens model (the former is a tool for the latter; they are not the same thing).

Are you actually performing a least-squares best-fit linear regression analysis? Are you using the regression tool in Excel? Just making a formula with parameters and coefficients is not a linear regression.

Getting negative coefficients is not necessarily a problem, since that actually makes sense in many situations (examples of negatively correlated pairs include criminal convictions and income, body fat and life expectancy, and driving speed and mileage). If you are doing an actual regression, then getting negative values is not a problem. It can even be an expected outcome.
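To illustrate, here is a minimal sketch of computing Lens-model weights with an actual least-squares regression. The judgment data is invented, and one weight comes out negative for a perfectly sensible reason.

```python
import numpy as np

# A minimal sketch of computing Lens-model weights via least-squares
# regression. The data is invented: each row is a case that was judged,
# with two predictors (say, driving speed in mph and vehicle weight in
# tons), and y is the outcome (say, mileage in mpg).
X = np.array([[55, 1.2], [65, 1.5], [45, 1.1], [70, 2.0],
              [60, 1.4], [50, 1.3], [75, 1.8], [40, 1.0]], dtype=float)
y = np.array([32.0, 26.0, 35.0, 20.0, 28.0, 31.0, 22.0, 37.0])

# Append an intercept column and solve the least-squares problem;
# the fitted coefficients ARE the Lens-model weights.
A = np.column_stack([X, np.ones(len(X))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

print("weights:", coeffs[:-1])      # the speed weight comes out negative
print("intercept:", coeffs[-1])
# Score for a new case = weights . [speed, weight] + intercept.
# The negative weight just says mileage falls as speed rises --
# a meaningful result, not an error.
```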

Perhaps you can describe what you are attempting to do in more detail.

Thanks,

Doug Hubbard

First Print Errata

Welcome to the Errata thread in the book discussion of the How to Measure Anything Forum. An author goes through a lot of checks with the publisher, but some errors manage to get through. Some are my fault; some were caused by the typesetter or publisher not making previous changes. I just got my author’s copies two days ago (July 20th), about two weeks before the book gets to stores, but I have already found a couple of errors. None should be confusing to the reader, but they were exasperating to me. Here is the list so far.

1) Dedication: My oldest son’s name is Evan, not Even. My children are mentioned in the dedication, and this one caused my wife to gasp when she saw it. I don’t know how this one slipped through any of the proofing by me, but it is a high-priority change for the next print run.

2) Preface, page XII: In the sentence “Statistics and quantitative methods courses were still fresh in my mind and I in some cases when someone called something ‘immeasurable’; I would remember a specific example where it was actually measured,” the first “I” is unnecessary.

3) Acknowledgements, page XV: Freeman Dyson’s name is spelled wrong. Yes, this is the famous physicist. Fortunately, his name is at least spelled correctly in chapter 13, where I briefly refer to my interview with him. Unfortunately, the incorrect spelling also seems to have made it to the index.

4) Chapter 2, page 13: Emily Rosa’s experiment had a total of 28 therapists in her sample, not 21.

5) Chapter 3, page 28: In the Rule of Five example, the samples are 30, 60, 45, 80, and 60 minutes, so the range should be 30 to 80, not 35 to 80.

6) Chapter 7, page 91, Exhibit 7.3: In my writing, I had a habit of typing “*” for multiplication, since that is how it is used in Excel and most other spreadsheets. My instructions to my editor were to replace the asterisks with proper multiplication signs. They changed most of them throughout the book, but the bottom of page 91 has several asterisks that were never changed to multiplication signs. Also, in step 4 there are asterisks next to the multiplication signs. This hasn’t seemed to confuse anyone I asked. People still correctly think the values are just being multiplied but might think the asterisk refers to a footnote (it does not).

7) Chapter 10, pages 177–178: There is an error in the lower bound of a 90% confidence interval at the bottom of page 177. I say that the range is “79% to 85%”. Actually, 79% is the median of the range, and the proper lower bound is 73%. On the next page, there is an error in the column headings of Exhibit 10.4. I say that the second column is computed by subtracting one normdist() function in Excel from another. Actually, the order should be reversed so that the first term is subtracted from the second (see the sketch after this list). As it is now, the formula would give a negative answer; taking the negative of that number gives the correct value. I don’t think this should confuse most readers unless they try to recreate the detailed table (which I don’t expect most to do). Fortunately, the downloadable example spreadsheet referred to in this part of the book corrects the error. The correction is in the spreadsheet named Bayesian Inversion, Chapter 10, available in the downloads.

8) Chapter 12, page 205: There should be a period at the end of the first paragraph.
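For readers who want to see why the subtraction order in item 7 matters, here is a sketch of the same computation in Python rather than Excel. The mean, standard deviation, and bin edges are illustrative values, not the book’s actual table inputs.

```python
from statistics import NormalDist

# A sketch of why the subtraction order matters: the probability of
# landing in a bin is CDF(upper edge) minus CDF(lower edge). The mean,
# standard deviation, and bin edges here are illustrative only.
dist = NormalDist(mu=0.79, sigma=0.04)
lower_edge, upper_edge = 0.75, 0.80

correct = dist.cdf(upper_edge) - dist.cdf(lower_edge)         # positive
reversed_order = dist.cdf(lower_edge) - dist.cdf(upper_edge)  # its negative

print(f"correct order:  {correct:.4f}")
print(f"reversed order: {reversed_order:.4f} (negative, as the erratum notes)")
```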

Variation of Recatch Example

Originally posted to http://www.howtomeasureanything.com/forums/ on Monday, July 13, 2009 2:13:07 PM.

“I would love to see an example that follows up on the idea of estimating the population of all prospective clients, using a sampling method similar to the recatch example. Could you do it for me?

Best regards

Adam”

We might need more details to work out the specific mechanics of this one, but we can discuss the concept. First, it is worth pointing out that the recatch example is just a way of using two independent sampling methods and comparing the overlap. In the case of the fish in the lake, the sampling methods were sequential (one was done after the other) and the overlap of the samples was determined by the tags left with the first sample of fish. When the second sample of fish was gathered, the proportion of that sample with tags showed how many fish were caught in both samples. From this and knowledge of each sample size, the entire population could be estimated.

But we don’t have to think of this as being sequential sampling where the first sampling leaves a mark of some kind (e.g. the tags on the fish) so that we see the overlap in the second sample. We can also run samples at the same time as long as we can identify individuals. People are simple enough to identify (since they have names, unique email addresses, etc.) so we don’t have to “tag” them between samples. (This is convenient, since I find that people rarely sit still while I try to apply the tag gun to their ear lobe.)

So if we had two independent sources attempting to identify prospects out of a population pool, we could estimate the size of the prospect population. If two independent teams were using two different methods (perhaps two different phone surveyors or two different teams surveying people in malls), and if identification were captured, then the two teams could compare notes after the survey and determine how many individuals came up in both surveys.
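As a sketch of the mechanics, assuming prospects can be identified by email address, the snippet below applies the classic mark-recapture estimate (sample 1 size × sample 2 size ÷ overlap) to two invented survey lists.

```python
# A minimal sketch of the "recatch" estimate applied to prospect surveys.
# The email addresses and sample sizes are invented. The logic: if a
# fraction f of sample 2 was also found by sample 1, then sample 1
# probably caught about that fraction f of the whole population.
survey_1 = {"ann@x.com", "bob@x.com", "carol@y.com", "dan@z.com", "eve@y.com"}
survey_2 = {"bob@x.com", "eve@y.com", "frank@z.com", "gina@x.com"}

overlap = len(survey_1 & survey_2)        # prospects found by BOTH surveys
if overlap == 0:
    print("No overlap: samples too small (or not independent) to estimate")
else:
    population = len(survey_1) * len(survey_2) / overlap
    print(f"Estimated prospect population: about {population:.0f}")
# Here: 5 * 4 / 2 = about 10 total prospects.
```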

The trick would be to find sampling methods that are truly independent of each other. If the population were “prospects in the city of Houston” and both sampling methods were mall surveys, then we should consider the possibility that not all prospects are equally likely to visit malls. If both survey methods were biased in the same way (tending to sample the same small subset of the target population), then the “recatch” method would underestimate the population size. If we used two completely different sampling methods (one mall survey and one phone survey) and the two methods were biased in a way that made prospects sampled by one method less likely to be found by the other, then the method would overestimate the total population.

As you can see, there are many variations on this method, and each has its challenges. The error could be high but, as I point out in the book, if it tells you more than you knew before, it can be a useful measurement.

Thanks,

Doug Hubbard