First Print Errata

Welcome to the Errata thread in the book discussion of the How to Measure Anything Forum. An author goes through a lot of checks with the publisher, but some errors manage to get through. Some are my fault; some were caused by the typesetter or publisher not making previous changes. I just got my author’s copies 2 days ago (July 20th), about 2 weeks before the book gets to the stores, but I have already found a couple of errors. None should be confusing to the reader, but they were exasperating to me. Here is the list so far.

1) Dedication: My oldest son’s name is Evan, not Even. My children are mentioned in the dedication and this one caused my wife to gasp when she saw it. I don’t know how this one slipped through my proofing, but it is a high-priority change for the next print run.

2) Preface, page XII: In the sentence “Statistics and quantitative methods courses were still fresh in my mind and I in some cases when someone called something “immeasurable”; I would remember a specific example where it was actually measured.” the first “I” is unnecessary.

3) Acknowledgements, page XV: Freeman Dyson’s name is spelled wrong. Yes, this is the famous physicist. Fortunately, his name is at least spelled correctly in chapter 13, where I briefly refer to my interview with him. Unfortunately, the incorrect spelling also seems to have made it to the index.

4) Chapter 2, page 13: Emily Rosa’s experiment had a total of 28 therapists in her sample, not 21.

5) Chapter 3, page 28: In the Rule of Five example the samples are 30, 60, 45, 80, and 60 minutes, so the range should be 30 to 80, not 35 to 80.

6) Chapter 7, page 91, Exhibit 7.3: In my writing, I had a habit of typing “*” for multiplication since that is how it is used in Excel and most other spreadsheets. My instructions to my editor were to replace the asterisks with proper multiplication signs. They changed most of them throughout the book, but the bottom of page 91 has several asterisks that were never changed to multiplication signs. Also, in step 4 there are asterisks next to the multiplication signs. This hasn’t seemed to confuse anyone I asked. People still correctly think the values are just being multiplied, but they might think the asterisk refers to a footnote (which it does not).

7) Chapter 10, pages 177-178: There is an error in the lower bound of a 90% confidence interval at the bottom of page 177. I say that the range is “79% to 85%”. Actually, 79% is the median of the range and the proper lower bound is 73%. On the next page there is an error in the column headings of Exhibit 10.4. I say that the second column is computed by subtracting one normdist() function in Excel from another. Actually, the order should be reversed so that the first term is subtracted from the second. As it is now, the formula would give a negative answer. Taking the negative of that number gives the correct value. I don’t think this should confuse most readers unless they try to recreate the detailed table (which I don’t expect most to do). Fortunately, the downloadable example spreadsheet referred to in this part of the book corrects that error. The correction is in the spreadsheet named Bayesian Inversion, Chapter 10, available in the downloads.

8) Chapter 12: page 205; There should be a period at the end of the first paragraph.

I’ve found errors in other books but you are the only author who posted them on a website as soon as the book came out. This ought to be the standard for how books deal with it. I was going to make a couple of errata entries but I see you already have them mentioned here. I’m sure you could probably find some statistics on average errors per book somewhere out there.

Bill:

9) Chapter 5, page 69. Third line from top of page – “He has has seen…” should be “He has seen…”. One has too many.

John Chandler-Pepelnjak:

Another one pretty close to your #4 above. Chapter 2, page 13. In my edition (don’t know if it is first print run or second), the half-width of your confidence interval is given as 16% and the resulting confidence interval is given as 44% to 66%. If Emily had 10 measurements on 28 therapists, the CI half-width should be 6%, giving a CI of 44% to 56%. So it looks like the 16% is a typo and the confidence interval has a typo. (If you use the corrected number of therapists then that half-width goes up to 7% and the CI is 43% to 57%.)
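For anyone who wants to check this arithmetic, here is a minimal sketch of the half-width calculation. It assumes the usual normal approximation for a proportion at p = 0.5 and a 95% level (z of about 1.96); the function name and the 10-trials-per-therapist figure are illustrative assumptions, not taken from the book.

```python
import math

def ci_half_width(n_trials, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n_trials)

# Assumed for illustration: 10 trials per therapist,
# so 28 therapists -> 280 trials and 21 therapists -> 210 trials.
for therapists in (28, 21):
    n = therapists * 10
    print(therapists, n, round(100 * ci_half_width(n), 1))
```

Under these assumptions the half-width comes out to roughly 6% for 280 trials and roughly 7% for 210 trials.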

Thanks for writing this book. It’s an enjoyable read and an important message.

Thanks for your input and for your interest in my book. You’ve made an astute observation. I’ve had a couple of email conversations about this so your comment gives me an opportunity to summarize the point. Other than one minor caveat, you are correct.

First, it is important to point out that this particular claim is simply about the chance of getting 44% of coin flips out of 280 on heads (i.e. about 123 out of 280) and not about the confidence interval for this particular study (which, of course, would have a mean of 44%). I only point that out because it confused a couple of other readers (apparently not you).

Of course, the relevant tool for this is the cumulative binomial distribution – specifically, 123 or fewer successes out of 280 trials with a 50% chance of success per trial. This comes out to a 2.42% chance of anything up to and including that number of successes, which is close to the lower bound of a 95% CI. Likewise, every possibility up to and including 156 heads has a 97.58% chance. So we get pretty close to a 95% CI of 44% to 56%. Of course, this (combinatoric) distribution is meant to work with integers and there are some rounding issues. You can get small differences based on whether you think the result of 123 or 156 heads should be included in the range or is just outside of the range (which I think might explain how you got a 7% half-width). But you are correct that there was definitely a typo.
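The cumulative binomial figures can be reproduced with a few lines of code. This is only a sketch: the helper name binom_cdf is my own, and it sums the exact binomial terms directly rather than relying on any spreadsheet function.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 280
lower_tail = binom_cdf(123, n)   # chance of 123 or fewer heads in 280 flips
upper_cum = binom_cdf(156, n)    # chance of 156 or fewer heads in 280 flips

print(f"P(X <= 123) = {lower_tail:.4f}")
print(f"P(X <= 156) = {upper_cum:.4f}")
print(f"P(124 <= X <= 156) = {upper_cum - lower_tail:.4f}")
```

Because the distribution is symmetric at p = 0.5, the two cumulative values add up to 1, and the last line shows how much probability falls between the two cut-offs. Moving either endpoint by one count changes the result slightly, which is the rounding issue mentioned above.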

Thanks,

Doug Hubbard

Bayesian vs. Frequentist?

Under the Errata forum in a thread I called Second Print Run Corrections, one poster replied that he believed I incorrectly applied the term confidence interval in the book. I discuss several errors in that post in a reply in that thread. But it introduces another point of confusion, apparently held by some, about the difference between Bayesian and non-Bayesian methods in statistics and the epistemological debate between the frequentist and the subjectivist. I addressed it in another thread called Bayesian vs. Frequentist in this In the Clouds forum topic.

Second Print Run Corrections

The errata and typos in the first print run that were mentioned in the first thread on this topic have all been addressed in the second print run. Fortunately, the book was selling well enough that the publisher had to go to a second print run much sooner than any of us planned. That allowed me to get those corrections in.

Thanks to everyone who posted suggestions for changes!

Doug Hubbard

Errata Statistics

Although my publisher assures me that some errors always make it through the proofing process, each one is still frustrating to the author – mostly because the author had the chance at some point to catch almost every one of the errors.

My wife teaches math at a local community college and taught math at a high school for many years. She tells me that every textbook she ever used had an errata sheet and that some had up to three pages of errata. In the proofing, we find scores – even hundreds – of errors, so it must be likely that some will get through. Can this likelihood be computed? If you read my book, you would say “Of course!” So I started thinking that if several people each find a number of errors over a period of time, I should be able to estimate the undiscovered errors.

In the book I discuss methods for problems like this, including the catch, release & recatch approach for estimating fish populations. If two independent error-finding methods find some of the same errors but they each find errors the other did not find, then we can estimate the number that they both missed. I mention in the book that this same method can apply to estimating the number of people the Census missed counting or the number of unauthorized intrusions in your network that go undetected.
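As a rough sketch of how the two-proofreader version might be computed, the code below uses the simple ratio form of the capture-recapture estimate (often called the Lincoln-Petersen estimator): estimated total = (found by A × found by B) / found by both. The function name and the error lists are hypothetical, made up purely for illustration, and the estimate assumes the two searches are independent.

```python
def estimate_errors(found_a, found_b):
    """Capture-recapture estimate of total and still-undiscovered errors.

    found_a, found_b: collections of error identifiers found by two
    independent proofreaders (hypothetical data, for illustration only).
    """
    a, b = set(found_a), set(found_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("No overlap: the simple ratio estimate is undefined.")
    estimated_total = len(a) * len(b) / overlap
    already_found = len(a | b)
    return estimated_total, estimated_total - already_found

# Hypothetical example: reader A finds 8 errors, reader B finds 6, 4 in common.
reader_a = [1, 2, 3, 4, 5, 6, 7, 8]
reader_b = [5, 6, 7, 8, 9, 10]
total, missed = estimate_errors(reader_a, reader_b)
print(total, missed)
```

With these made-up lists, the two readers found 10 distinct errors between them and the estimate suggests about 12 in total, so roughly 2 would remain undiscovered. The fewer errors the two readers have in common, the larger the estimated number both missed.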

I had another method that I considered including in the book but, in the end, decided to leave out. This method is based on the idea that if you randomly search for errors in a book (or species of insects in the rain forest, or crimes in a neighborhood) the rate at which you find new instances will follow a pattern. Generally, finding unique instances will be easy at first but as the number