by Douglas Hubbard | Aug 25, 2009 | Errata, How To Measure Anything Blogs, News
Welcome to the Errata thread in the book discussion of the How to Measure Anything Forum. An author goes through a lot of checks with the publisher, but some errors manage to get through. Some were my fault; some were caused by the typesetter or publisher not making previous changes. I just got my author’s copies 2 days ago (July 20th), about 2 weeks before the book gets to the stores, but I have already found a couple of errors. None should be confusing to the reader, but they were exasperating to me. Here is the list so far.
1) Dedication: My oldest son’s name is Evan, not Even. My children are mentioned in the dedication, and this one caused my wife to gasp when she saw it. I don’t know how this one slipped through any of my proofing, but it is a top-priority change for the next print run.
2) Preface, page XII: In the sentence “Statistics and quantitative methods courses were still fresh in my mind and I in some cases when someone called something ‘immeasurable’; I would remember a specific example where it was actually measured.” the first “I” is unnecessary.
3) Acknowledgements, page XV: Freeman Dyson’s name is spelled wrong. Yes, this is the famous physicist. Fortunately, his name is at least spelled correctly in chapter 13, where I briefly refer to my interview with him. Unfortunately, the incorrect spelling also seems to have made it to the index.
4) Chapter 2, page 13: Emily Rosa’s experiment had a total of 28 therapists in her sample, not 21.
5) Chapter 3, page 28: In the Rule of Five example the samples are 30, 60, 45, 80, and 60 minutes, so the range should be 30 to 80, not 35 to 80.
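An aside for readers checking the arithmetic: the Rule of Five works because the chance that all five random samples land on the same side of the population median is 2 × (1/2)^5 = 6.25%, so the min-to-max range of five samples captures the median with 93.75% probability. A minimal simulation sketch (the population here is made up purely for illustration):

```python
import random

random.seed(1)
# Hypothetical population of task durations in minutes.
population = sorted(random.gauss(55, 15) for _ in range(100_000))
true_median = population[len(population) // 2]

trials, hits = 10_000, 0
for _ in range(trials):
    sample = random.sample(population, 5)
    if min(sample) <= true_median <= max(sample):
        hits += 1

print(f"median captured in {hits / trials:.1%} of trials (theory: 93.75%)")
```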
6) Chapter 7, page 91, Exhibit 7.3: In my writing, I had a habit of typing “*” for multiplication, since that is how it is used in Excel and most other spreadsheets. My instructions to my editor were to replace the asterisks with proper multiplication signs. They changed most of them throughout the book, but the bottom of page 91 has several asterisks that were never changed to multiplication signs. Also, in step 4 there are asterisks next to the multiplication signs. This hasn’t seemed to confuse anyone I asked. People still correctly see that the values are just being multiplied, but they might think the asterisk refers to a footnote (it does not).
7) Chapter 10, pages 177-178: There is an error in the lower bound of a 90% confidence interval at the bottom of page 177. I say that the range is “79% to 85%”. Actually, the 79% is the median of the range, and the proper lower bound is 73%. On the next page there is an error in the column headings of Exhibit 10.4. I say that the second column is computed by subtracting one normdist() function in Excel from another. Actually, the order should be reversed so that the first term is subtracted from the second. As it is now, the formula would give a negative answer; taking the negative of that number gives the correct value. I don’t think this should confuse most readers unless they try to recreate the detailed table (which I don’t expect most to do). Fortunately, the downloadable example spreadsheet referred to in this part of the book corrects that error. The correction is in the spreadsheet named Bayesian Inversion, Chapter 10, available in the downloads.
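For anyone recreating the table, here is a minimal sketch of the corrected order of subtraction, using Python’s scipy in place of Excel’s normdist(); the mean, standard deviation, and bounds below are placeholders for illustration, not the book’s exact inputs:

```python
from scipy.stats import norm

# Probability that a normally distributed value falls in an interval,
# computed as the difference of two cumulative distribution functions.
mean, sd = 0.79, 0.036      # hypothetical parameters
lower, upper = 0.73, 0.85   # hypothetical interval bounds

p_as_printed = norm.cdf(lower, mean, sd) - norm.cdf(upper, mean, sd)  # negative
p_corrected = norm.cdf(upper, mean, sd) - norm.cdf(lower, mean, sd)   # positive

print(f"as printed: {p_as_printed:.4f}, corrected: {p_corrected:.4f}")
```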
8) Chapter 12: page 205; There should be a period at the end of the first paragraph.
by Douglas Hubbard | Jul 13, 2009 | How To Measure Anything Blogs, News
Originally posted to http://www.howtomeasureanything.com/forums/ on Monday, July 13, 2009 2:13:07 PM.
“I would love to see an example that builds on the idea of estimating the population of all prospective clients, using a sampling method similar to the recatch example. Could you do it for me?
Best regards
Adam”
We might need more details to work out the specific mechanics of this one, but we can discuss the concept. First, it is worth pointing out that the recatch example is just a way of using two independent sampling methods and comparing the overlap. In the case of the fish in the lake, the sampling methods were sequential (one was done after the other) and the overlap of the samples was determined by the tags that were left with the first sample of fish. When the second sample of fish was gathered, the proportion of that sample with tags showed how many fish were caught in both samples. From this and knowledge of each sample size, the entire population could be estimated.
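The arithmetic behind the fish example is the classic mark-recapture (Lincoln-Petersen) estimate. A minimal sketch, with made-up numbers:

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Estimate a total population from two samples and their overlap.

    Tag n1 individuals; later draw a second sample of n2 and count how
    many carry tags. If the samples are independent,
    overlap / n2 ~= n1 / population, so population ~= n1 * n2 / overlap.
    """
    if overlap == 0:
        raise ValueError("no overlap: the population estimate is unbounded")
    return n1 * n2 / overlap

# Hypothetical: tag 100 fish, recatch 80, and find 20 of those tagged.
print(lincoln_petersen(100, 80, 20))  # -> 400.0 fish in the lake
```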
But we don’t have to think of this as sequential sampling, where the first sampling leaves a mark of some kind (e.g., the tags on the fish) so that we see the overlap in the second sample. We can also run samples at the same time, as long as we can identify individuals. People are easy enough to identify (since they have names, unique email addresses, etc.), so we don’t have to “tag” them between samples. (This is convenient, since I find that people rarely sit still while I try to apply the tag gun to their ear lobe.)
So if we had two independent sources attempt to identify prospects out of a population pool, we could estimate the size of the prospect population. If two independent teams were using two different methods (perhaps two different phone surveyors or two different teams surveying people in malls), and if identifying information were captured, then the two teams could compare notes after the survey and determine how many individuals came up in both surveys, as in the sketch below.
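Concretely, the “tags” become unique identifiers, and the overlap is just the intersection of the two teams’ lists. A sketch with hypothetical identifiers:

```python
# Each survey team records a unique key per prospect (e.g., a
# normalized email address). The overlap plays the role of the tags.
survey_a = {"ann@example.com", "bob@example.com", "carl@example.com",
            "dana@example.com"}
survey_b = {"bob@example.com", "dana@example.com", "erik@example.com"}

overlap = survey_a & survey_b
estimate = len(survey_a) * len(survey_b) / len(overlap)
print(f"{len(overlap)} prospects found by both surveys -> "
      f"roughly {estimate:.0f} prospects in total")
```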
The trick would be to find sampling methods that are truly independent of each other relative to the target population. If the population were “prospects in the city of Houston” and both sampling methods were mall surveys, then we should consider the possibility that not all prospects are equally likely to visit malls. If both survey methods were biased in the same way (tending to sample the same small subset of the target population), then the “recatch” method would underestimate the population size. If we used two completely different sampling methods (one mall survey and one phone survey) and the two methods were biased in a way that made prospects found by one method less likely to be found by the other, then the method would overestimate the total population.
As you can see, there are many variations on this method, and each has challenges. The error could be high, but, as I point out in the book, if a measurement tells you more than you knew before, it can be useful.
Thanks,
Doug Hubbard
by Douglas Hubbard | Jun 22, 2009 | How To Measure Anything Blogs, News
Originally posted on http://www.howtomeasureanything.com/forums/ on Monday, June 22, 2009 4:33:03 AM.
“Greetings!
On page 187 there is a claim that the author has created an example of a random power law generator, but I can’t find it in the examples. Can anyone who has found the example in the downloads help me with this?
Thanks,
Markus Kantor”
Thanks for your question. I had it up briefly but discovered a flaw in the automatic histogram generation. I’m out of the country at the moment, but I’ll have the power law generator up by the end of June (much sooner if I find the time before I head back to the US).
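In the meantime, for readers who want to experiment, here is a minimal sketch of one standard way to draw power-law random values (inverse transform sampling); this is an illustrative stand-in, not the spreadsheet from the book’s downloads:

```python
import random

def power_law_sample(alpha: float, x_min: float) -> float:
    """Draw one value from p(x) proportional to x**(-alpha) for x >= x_min,
    by inverting the power-law CDF at a uniform random draw."""
    u = random.random()
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

random.seed(42)
draws = [power_law_sample(alpha=2.5, x_min=1.0) for _ in range(10_000)]
print(f"min={min(draws):.2f}  max={max(draws):.1f}")  # note the heavy right tail
```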
Thanks for your patience
by Douglas Hubbard | Jun 12, 2009 | How To Measure Anything Blogs, News
This question was originally posted on Friday, June 12, 2009 10:47:57 AM by mzaret20000 on http://www.howtomeasureanything.com/forums/.
“Hi Douglas,
I have been asked to provide the gross margin for client engagements. My company is a recruiting firm that operates like an internal recruiting team (meaning that we charge fixed rates as we do the work, regardless of outcome, rather than billing a percentage of compensation hired). Most of our engagements are small, so they aren’t profitable in the absence of other engagements. I’m wondering what your strategy would be to make this type of measurement.”
Thanks for your post. At first glance, your problem seems like it could just be a matter of accounting procedures (e.g., revenue minus expenses, divided by revenue, with consideration for issues like how you allocate marketing costs across projects), but let me presume you might mean something more complex. It might not be what I strictly call a “measurement”, since it sounds like you probably already have the accounting data you need and do not actually need to make additional observations to calculate it. This is more of a calculation based on given data, and the issue is more about what it is you really want to compute.
Since you mentioned that your projects are small and not profitable in the absence of other engagements, perhaps what you really want to compute is some kind of break-even point based on the fixed and marginal costs of doing business. But even that’s a guess on my part.
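To make that guess concrete, here is a minimal break-even sketch; every number below is hypothetical, since I don’t know your actual cost structure:

```python
# Break-even for a fixed-fee recruiting practice: how many engagements
# per month cover overhead, given each engagement's fee and its
# marginal (variable) cost of delivery. All numbers are hypothetical.
fixed_costs = 40_000.00        # monthly overhead (office, salaries, ...)
fee_per_engagement = 8_000.00  # the fixed rate charged per engagement
marginal_cost = 5_000.00       # recruiter time, sourcing tools, etc.

contribution = fee_per_engagement - marginal_cost
break_even = fixed_costs / contribution
print(f"break-even at about {break_even:.1f} engagements per month")

# Gross margin of a single engagement, before allocating overhead:
print(f"gross margin per engagement: {contribution / fee_per_engagement:.0%}")
```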
Perhaps you could describe why you need to know this. These are typical questions I ask: What decision could be different given this information? What actions of what parties would this information guide? Once you define that, I find that the measurement problem is generally much clearer.
by Douglas Hubbard | Jun 8, 2009 | Facilitating Calibrated Estimates, How To Measure Anything Blogs, News
Originally posted on http://www.howtomeasureanything.com/forums/ on Wednesday, July 08, 2009 2:46:05 PM by Hugh.
“Hi Doug,
I want to share an observation a V.P. made after doing the 10 pass/fail questions. If one were to input 50% confidence for all the questions and randomly select T/F, they would be correct half the time, and the difference would be 2.5.
The scoring would indicate that that person was probably overconfident. Can you help here?
I am considering requiring the difference between the overall series of answers (as decimals) and the correct answers to be greater than 2.5 before concluding that someone is probably overconfident.
Please advise.
Thanks in advance – Hugh”
Yes, that is a way to “game the system”, and the simple scoring method I show would indicate that the person was well calibrated (but not very informed about the topic of the questions). It is also possible to game the 90% CI questions by simply creating absurdly large ranges for 90% of the questions and ranges we know to be wrong for 10% of them. That way, the test taker would always get exactly 90% of the answers within their ranges.
If the test-takers were, say, students, who simply wanted to appear calibrated for the purpose of a grade, then I would not be surprised if they tried to game the system this way. But we assume that most people who want to get calibrated realize they are developing a skill they will need to apply in the real world. In such cases they know they really aren’t helping themselves by doing anything other than putting their best calibrated estimates on each individual question.
However, there are also ways to counter system-gaming, even in situations where the test taker has no motivation whatsoever to actually learn how to apply probabilities realistically. In the next edition of How to Measure Anything I will discuss methods like the “Brier score”, which would penalize anyone who simply flipped a coin on each true/false question and answered them all with 50% confidence. With a Brier score, the test taker gets a higher score by putting higher probabilities on questions they think they have a good chance of getting right. Simply flipping a coin to answer all the questions on a T/F test and calling them each 50% confident produces a Brier score of zero.
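For the curious, a minimal sketch of the idea in Python. The standard Brier score is the mean squared difference between stated confidence and outcome (lower is better, 0 is perfect); the rescaling below, which puts the 50/50 coin-flipper at exactly zero, is my illustrative assumption about the form, not necessarily the exact scoring used in the book:

```python
def brier_score(confidences, correct):
    """Mean squared error between stated confidence in one's own answer
    (0.5 to 1.0 on a T/F test) and the outcome (1 if correct, 0 if not)."""
    return sum((c - o) ** 2 for c, o in zip(confidences, correct)) / len(correct)

def rescaled_score(confidences, correct):
    """Rescaled so that answering every question at 50% scores 0.0
    and perfect confidence on all-correct answers scores 1.0."""
    return 1.0 - brier_score(confidences, correct) / 0.25

coin_flipper = ([0.5] * 10, [1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
thoughtful = ([0.9, 0.8, 0.6, 0.95, 0.7, 0.9, 0.8, 0.6, 0.95, 0.7],
              [1, 1, 1, 1, 0, 1, 1, 0, 1, 1])

for name, (conf, corr) in [("coin flipper", coin_flipper),
                           ("thoughtful taker", thoughtful)]:
    print(f"{name}: Brier={brier_score(conf, corr):.3f}, "
          f"rescaled={rescaled_score(conf, corr):.2f}")
```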
Thanks for your interest,
Doug Hubbard
by Douglas Hubbard | Apr 30, 2009 | How To Measure Anything Blogs, News
Originally posted on http://www.howtomeasureanything.com/forums/ on Thursday, April 30, 2009 10:42:04 AM, by djr.
“Let me start off by saying, I really appreciate this book and have found it very useful. I enjoyed the calibration exercises and decided to include them in a semester class on decision analysis I’ve just finished with graduate students. Unfortunately it didn’t work out as I hoped.
While I saw progress for the group as a whole initially, only a few of the 27 students even neared the 90% target, and when we took a break for 5 weeks, the skills slipped. Furthermore, the students who got close primarily did so by using such extreme ranges that they (and I) concluded they didn’t really know much about the questions. I sought to see whether students who felt they were comfortable working with numbers did better, but they did not. Students who described themselves as analytical did somewhat better, but it was not a strong relationship. Nevertheless, the students indicated for the most part that they liked the exercises. It helped them realize they were overconfident, and it made them think about estimation and uncertainty. However, making progress on getting more calibrated for the most part eluded them. I recognize that unlike the scenarios you described in the book, these students are not in the business of estimation, and indeed many of them are quite possibly averse to such estimating. But I argued they all would nevertheless estimate in their professional careers (public and nonprofit management).
I’m planning on doing this again but I wanted to pose two questions.
1. One strategy for “getting calibrated” at the 90% level is to choose increasingly wider ranges, even to the point where they seem ridiculous. For example, on a question about the height of Mt. Everest in miles above sea level, one student put 0.1 miles to 100,000 miles. While strictly speaking this was a range that captured the true value, its usefulness as an uncertainty range is probably approaching zero. However, from the students’ perspective, answering in this way was getting them closer to the 90% confidence target I was pushing on them. (Even with such ranges, many students were still at 50-70%.) What would your response be to this strategy if you saw it being used, and what might I as an instructor suggest to improve this? Is the conclusion simply that you don’t know anything if you have to choose wide ranges? Are there other measures we should combine with this, such as the width of the confidence intervals? Are there other mental exercises besides those in the book that might help?
2. While students did not do well on the 90% confidence interval questions, they did do fairly well on true/false questions where they estimated their degree of confidence. More than three-fourths of the class got within ten percent of their estimated level of confidence by the second true/false trial (though these came after several 90% confidence interval exercises as well). At the same time, students’ average confidence level for individual questions did not correlate at all with the percentage of students who correctly guessed true/false. In the book there was no discussion of improvements or accuracy with the true/false type estimation questions, and I wondered if you had any observations to offer on why this seemed easier and why students were better at this type of estimation. In your experience, are these types of calibration more or less effective or representative? Should they be very different from the 90% confidence intervals in terms of calibration?
Again, great book that I think could almost be a course for my students as it is.”
Thank you for this report on your experiences. Please feel free to continue to post your progress on calibration so we can all share in the feedback. I am building a “library” of findings from different people, and I would very much like to add your results to the list. I am especially interested in how you distinguished the students who described themselves as analytical from those who did not. Please feel free to send details on those results or to call or email me directly. Also, since I now have two books out discussing calibration, please let me know which book you are referring to.
I item-tested these questions on the general business, government, analyst, and management crowd. Perhaps that is one reason for the difference in perceived difficulty, but I doubt that alone would make up for the results you see. My experience is that about 70% of people achieve calibration after 5 tests. We might be emphasizing different calibration strategies. Here are my responses to your two questions:
1. We need to be sure to explain that “under-confidence” is just as undesirable for assessing uncertainty as overconfidence. I doubt the student really believed Mt. Everest is several times larger than the diameter of the Earth, but if he or she literally had no sense of scale, I suppose that is possible. It is more likely that they didn’t really believe Mt. Everest could be 100,000 miles high, or even 10,000 miles high. Remember to apply the equivalent bet. I suspect that person believed they had nearly a 100% chance of getting the answer within the range, not 90%. They should answer the questions such that they allow themselves a 5% chance that the true value is above the upper bound and a 5% chance it is below the lower bound. But if this truly is the range that best represents their honest uncertainty, then you are correct – they are telling you they have a lot of uncertainty, and the ends of that range are not really that absurd to them.
2. Yes, people always seem to get calibrated on the binary questions first. But I do discuss how to improve on the true/false questions. Remember that the “equivalent bet” can apply to true/false questions as well. Furthermore, repetition and feedback are a strategy for improving on either ranges or true/false questions. Finally, the corrective strategy against “anchoring” involves treating each range question as two binary questions (the anchoring phenomenon may be a key reason why ranges are harder to calibrate than true/false questions). When answering range questions, many people first think of one number and then add or subtract an “error” value to get a range. This tends to result in narrower – and overconfident – ranges. As an alternative strategy, ask the students to set the lower bound such that they could answer “true, with 95% confidence” to the question “Is the true value above the lower bound?” This seems to significantly reduce overconfidence in ranges.
Thanks for this information and feel free to send detailed records of your observations. I may even be able to incorporate your observations in the second edition of the How to Measure Anything book (which I’m currently writing).
Thanks,
Doug Hubbard