How do you backtest a probability estimate?

A question asked by a subscriber:

I am sympathetic to the concept of using a Monte Carlo approach, perhaps using a cascaded series of Monte Carlo models to model inputs, to produce an estimated probability distribution. This seems like a more reasonable approach to modeling reality than to assume the distribution follows a normal or lognormal distribution.

To me, this seems like a logical extension of the budgeting approach familiar to every accountant, with incremental improvements to such a model accumulating over time. So long as someone does not become ‘wedded to the model’, this process is powerful.

To avoid becoming ‘wedded to the model’, it seems to me that it is necessary to identify the parameters for your inputs (or the environment which affects those inputs) within which you believe your model will be robust. Movements of your inputs outside of this range should trigger a re-evaluation of your model.

For those who are wedded to VaR, you can even ‘measure’ the risk associated with your model as a 5% VaR, etc., if you want to lose much of the detail of what you have done. There is sometimes a place for a summary measure, so long as it does not become the input to a subsequent calculation.

I am convinced of the importance of accurate calibration and backtesting of models.

What I am less clear about is how you can ‘backtest’ a probabilistic model on anything other than a gross basis. How do you know whether an observed event is a 1-in-100 event or a 1-in-10 event? Clearly, if there is sufficient data, then the usual probabilistic maths can be used, but what about where we are dealing with unusual, but perhaps critical, events?

Is the only answer to use traditional statistics to measure the confidence we have in our model? And if so, how can these measures be incorporated into the next iteration of our Monte Carlo model?
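One concrete form of the ‘gross basis’ backtest mentioned above is to keep a record of stated confidence intervals and count how often the actual outcomes fall inside them, then ask how surprising that hit rate would be if the intervals really were, say, 90% intervals. Below is a minimal sketch of that check; the forecasts and outcomes are entirely hypothetical.

    from math import comb

    def interval_hits(forecasts, actuals):
        """Count how many actual outcomes landed inside the stated 90% intervals."""
        return sum(lo <= x <= hi for (lo, hi), x in zip(forecasts, actuals))

    def binom_cdf(k, n, p):
        """P(X <= k) for a Binomial(n, p) variable (the 'usual probabilistic maths')."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

    # Hypothetical track record: 20 stated 90% intervals and the observed outcomes.
    forecasts = [(10, 50), (5, 25), (100, 300), (0, 8)] * 5
    actuals = [30, 40, 250, 3, 60, 12, 150, 9, 45, 20,
               120, 5, 11, 24, 280, 2, 33, 18, 310, 7]

    hits = interval_hits(forecasts, actuals)
    n = len(actuals)
    # If these really were 90% intervals, how surprising is it to see this few hits?
    p_value = binom_cdf(hits, n, 0.90)
    print(f"{hits}/{n} intervals contained the actual value")
    print(f"P(this few or fewer hits | intervals really are 90%) = {p_value:.3f}")

If that probability is very small, the intervals are probably too narrow (overconfidence). For rare but critical events the same logic applies, but the test has little power until enough of them have accumulated, which is essentially the difficulty raised in the question.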

First Print Errata

Welcome to the Errata thread in the book discussion of the How to Measure Anything Forum. An author goes through a lot of checks with the publisher, but some errors manage to get through. Some are my fault; some were caused by the typesetter or publisher not making previously requested changes. I just got my author’s copies two days ago (July 20th), about two weeks before the book gets to the stores, but I have already found a couple of errors. None should be confusing to the reader, but they were exasperating to me. Here is the list so far.

1) Dedication: My oldest son’s name is Evan, not Even. My children are mentioned in the dedication, and this one caused my wife to gasp when she saw it. I don’t know how this one slipped through any of my proofing, but it is a high-priority change for the next print run.

2) Preface, page XII: In the sentence “Statistics and quantitative methods courses were still fresh in my mind and I in some cases when someone called something ‘immeasurable’; I would remember a specific example where it was actually measured,” the first “I” is unnecessary.

3) Acknowledgements, page XV: Freeman Dyson’s name is spelled wrong. Yes, this is the famous physicist. Fortunately, his name is at least spelled correctly in chapter 13, where I briefly refer to my interview with him. Unfortunately, the incorrect spelling also seems to have made it to the index.

4) Chapter 2, page 13: Emily Rosa’s experiment had a total of 28 therapists in her sample, not 21.

5) Chapter 3, page 28: In the Rule of Five example, the samples are 30, 60, 45, 80, and 60 minutes, so the range should be 30 to 80, not 35 to 80. (A short sketch of this arithmetic appears after this list.)

6) Chapter 7, page 91, Exhibit 7.3: In my writing, I had a habit of typing “*” for multiplication, since that is how it is written in Excel and most other spreadsheets. My instructions to my editor were to replace the asterisks with proper multiplication signs. They changed most of them throughout the book, but the bottom of page 91 has several asterisks that were never changed to multiplication signs. Also, in step 4 there are asterisks next to the multiplication signs. This hasn’t seemed to confuse anyone I asked: people still correctly infer that the values are just being multiplied, but they might think the asterisk refers to a footnote (which it does not).

7) Chapter 10, pages 177-178: There is an error in the lower bound of a 90% confidence interval at the bottom of page 177. I say that the range is “79% to 85%”. Actually, 79% is the median of the range, and the proper lower bound is 73%. On the next page, there is an error in the column headings of Exhibit 10.4. I say that the second column is computed by subtracting one normdist() function in Excel from another, but the order of the two terms should be reversed; as written, the formula gives a negative answer, and taking the negative of that number gives the correct value. (See the short sketch after this list.) I don’t think this should confuse most readers unless they try to recreate the detailed table (which I don’t expect most to do). Fortunately, the downloadable example spreadsheet referred to in this part of the book corrects the error. The correction is in the spreadsheet named “Bayesian Inversion, Chapter 10” available in the downloads.

8) Chapter 12, page 205: There should be a period at the end of the first paragraph.
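For items 5 and 7 above, here is a minimal Python sketch of the arithmetic involved. It is an illustration only, not a reproduction of the book’s spreadsheets: the Rule of Five part uses the sample quoted in item 5, while the Exhibit 10.4 part assumes the cumulative (CDF) form of NORMDIST and uses made-up numbers rather than the exhibit’s actual figures.

    from math import erf, sqrt

    # Item 5: the Rule of Five sample from page 28.
    samples = [30, 60, 45, 80, 60]  # minutes
    print("range:", min(samples), "to", max(samples))  # 30 to 80
    # The population median falls outside [min, max] only if all five observations
    # land on the same side of it, which happens with probability 2 * (1/2)**5.
    print("chance the median is inside the range:", 1 - 2 * (0.5 ** 5))  # 0.9375

    # Item 7: an interval probability is the larger cumulative normal value minus
    # the smaller one; reversing the order of the terms just flips the sign.
    def norm_cdf(x, mean, sd):
        """Cumulative normal, the quantity Excel's NORMDIST(x, mean, sd, TRUE) returns."""
        return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

    lower, upper = 0.73, 0.85   # made-up interval, not the exhibit's actual bins
    mean, sd = 0.79, 0.036      # made-up distribution parameters
    correct = norm_cdf(upper, mean, sd) - norm_cdf(lower, mean, sd)
    wrong = norm_cdf(lower, mean, sd) - norm_cdf(upper, mean, sd)
    print(correct, wrong)  # wrong is just the negative of correct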

Variation of Recatch Example

Originally posted to http://www.howtomeasureanything.com/forums/ on Monday, July 13, 2009 2:13:07 PM.

“I would love to see an example, following on the idea of estimating the population of all prospective clients, that uses a similar sampling method to the recatch example. Could you do it for me?

Best regards

Adam”

We might need more details to work out the specific mechanics of this one, but we can discuss the concept. First, it is worth pointing out that the recatch example is just a way of using two independent sampling methods and comparing the overlap. In the case of the fish in the lake, the sampling methods were sequential (one was done after the other) and the overlap of the samples was determined by the tags left on the first sample of fish. Then, when the second sample of fish was gathered, the proportion of that sample with tags showed how many fish were caught in both samples. From this and the size of each sample, the entire population could be estimated.
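To make that last step concrete, here is a minimal sketch of the standard capture-recapture arithmetic: if the two samples are independent, the share of the second sample that carries tags should roughly match the share of the whole population that was caught in the first sample. All of the numbers below are made up for illustration.

    def recatch_estimate(first_sample, second_sample, overlap):
        """Estimate the total population from two sample sizes and their overlap.

        With independent samples, overlap / second_sample is roughly
        first_sample / N, so N is about first_sample * second_sample / overlap.
        """
        return first_sample * second_sample / overlap

    # Made-up fish example: tag 100 fish, later catch 80, and find 16 of them tagged.
    print(recatch_estimate(100, 80, 16))   # about 500 fish in the lake

    # The same arithmetic applies to the prospect lists discussed below: two surveys
    # identify 200 and 150 prospects, and 30 names appear on both lists.
    print(recatch_estimate(200, 150, 30))  # about 1,000 prospects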

But we don’t have to think of this as sequential sampling where the first sampling leaves a mark of some kind (e.g., the tags on the fish) so that we see the overlap in the second sample. We can also run the samples at the same time, as long as we can identify individuals. People are easy enough to identify (since they have names, unique email addresses, etc.), so we don’t have to “tag” them between samples. (This is convenient, since I find that people rarely sit still while I try to apply the tag gun to their ear lobe.)

So if we had two independent sources attempt to identify prospects out of a population pool, we could estimate the size of the prospect population. If two independent teams were using two different methods (perhaps two different phone surveyors, or two different teams surveying people in malls), and if identifying information were captured, then the two teams could compare notes after the surveys and determine how many individuals came up in both.

The trick would be to find sampling methods that are truly independent of each other with respect to the target population. If the population were “prospects in the city of Houston” and both sampling methods were mall surveys, then we should consider the possibility that not all prospects are equally likely to visit malls. If both survey methods were biased in the same way (tending to sample the same small subset of the target population), then the “recatch” method would underestimate the population size. If we used two completely different sampling methods (one mall survey and one phone survey) and the two methods were biased in ways that made prospects reached by one method less likely to be found by the other, then the method would overestimate the total population.
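To see why the bias runs in those directions, here is a small simulation sketch. The population size, the “easy to reach” split, and the capture probabilities are all invented purely for illustration.

    import random

    def average_recatch_estimate(N=10_000, trials=200, same_bias=True, seed=0):
        """Average capture-recapture estimate when the two methods share (or oppose) a bias.

        Half the population is 'easy to reach'. With same_bias=True both methods
        over-sample that half; with same_bias=False each method favors a different half.
        """
        rng = random.Random(seed)
        estimates = []
        for _ in range(trials):
            caught_a = caught_b = both = 0
            for i in range(N):
                easy = i < N // 2
                p_a = 0.04 if easy else 0.01
                p_b = p_a if same_bias else (0.01 if easy else 0.04)
                in_a = rng.random() < p_a
                in_b = rng.random() < p_b
                caught_a += in_a
                caught_b += in_b
                both += in_a and in_b
            if both:  # skip the rare trial with no overlap at all
                estimates.append(caught_a * caught_b / both)
        return sum(estimates) / len(estimates)

    print("true population:", 10_000)
    print("methods share the same bias: ", round(average_recatch_estimate(same_bias=True)))   # underestimates
    print("methods have opposite biases:", round(average_recatch_estimate(same_bias=False)))  # overestimates

The specific numbers do not matter; the point is the direction of the error: a shared bias inflates the overlap and shrinks the estimate, while opposing biases shrink the overlap and inflate it.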

As you can see, there are many variations on this method and each has its challenges. The error could be high, but, as I point out in the book, if it tells you more than you knew before, then it can be a useful measurement.

Thanks,

Doug Hubbard

Random Power Law Generator Example?

Originally posted on http://www.howtomeasureanything.com/forums/ on Monday, June 22, 2009 4:33:03 AM.

“Greetings!

On page 187 there is a claim that the author has generated an example random power law generator, but I can’t find it in the examples. Can someone who has found the example in the downloads help me with this?

Thanks,

Markus Kantor”

Thanks for your question. I had it up briefly but discovered a flaw in the automatic histogram generation. I’m out of the country at the moment, but I’ll have the power law generator up by the end of June (much sooner if I find the time before I head back to the US).

Thanks for your patience.