Originally posted at http://www.howtomeasureanything.com, on Monday, December 08, 2008 10:08:08 PM, by Unknown.
“I’m trying to quickly identify the items I need to spend time measuring. In your book, you determine what to measure by computing the value of information. You refer to a macro that you run on your spreadsheet that automatically computes the value of information and thus permits you to identify those items most worth spending extra time to refine their measurements. Once I list potential things that I might want to measure, do I estimate, using my calibrated estimators, a range for the chance of being wrong and the cost of being wrong and then, using something like @RISK, multiply these two lists of probability distributions together to arrive at a list of distributions for all the things I might want to measure? Then, do I look over this list of values of information and select the few that have significantly higher values of information?
I don’t want you to reveal your proprietary macro, but am I on the right track to determining what the value of information is?”
You were on track right up to the sentence that starts “Once I list potential things that I might want to measure.” You already have calibrated estimates by that point. Remember, you have to have calibrated estimates before you can even compute the value of information. Once the value of information indicates that you need to measure something, then it’s time to get new observations. As I mention in the book, you could also use calibrated estimates for this second round, but only if you are giving the estimators new information that will allow them to reduce their uncertainty.
So, first you have your original calibrated estimates, THEN you compute the value of information, THEN you measure the things that matter. In addition to using calibrated estimators again (assuming you are finding new data to give them to reduce their ranges), I mention several methods in the book, including decomposition, sampling methods, controlled experiments, and several others. It just depends on what you need to measure.
Also, the chance of being wrong and the cost of being wrong can already be computed from the original calibrated estimates you provided and the business case you put them in. You do not have to estimate them separately in addition to the original calibrated estimates themselves. Look at the examples I gave in the chapter on the value of information.
My macros make it more convenient to compute more complicated information values, but they are not necessary for the simplest examples. Did you see how I computed the information values in that chapter? I already had calibrated estimates on the measurements themselves. Try a particular example and ask me about that example specifically if you are still having problems.
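For readers following along, here is a minimal sketch of how such an information-value calculation can be set up without any macro. It assumes, as the book's examples do, that a calibrated 90% CI maps to a normal distribution; the specific numbers (the CI, the threshold, and the loss per unit) are hypothetical:

```python
import random

def evpi_estimate(ci_low, ci_high, threshold, loss_per_unit,
                  n=100_000, seed=42):
    """Monte Carlo estimate of the expected opportunity loss (the value of
    perfect information) for one variable, assuming its calibrated 90% CI
    maps to a normal distribution. A 90% CI spans about +/-1.645 standard
    deviations around the mean."""
    random.seed(seed)
    mean = (ci_low + ci_high) / 2
    sd = (ci_high - ci_low) / (2 * 1.645)
    total_loss = 0.0
    for _ in range(n):
        x = random.gauss(mean, sd)
        if x < threshold:                 # losses accrue below the threshold
            total_loss += (threshold - x) * loss_per_unit
    return total_loss / n

# Hypothetical numbers: units sold with a 90% CI of [150,000, 300,000],
# a break-even threshold of 200,000 units, and $25 lost per unit short.
print(f"EVPI = ${evpi_estimate(150_000, 300_000, 200_000, 25):,.0f}")
```

The result is the most you would rationally spend to eliminate the uncertainty in that one variable, which is exactly the figure used to rank candidate measurements.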
Thanks for your use of my book and please stay in touch.
Disclaimer: This was edited, as the original post caused errors. We left the content as originally posted and removed the links.
I don’t know which version I have, but the comment I have to make wasn’t discussed in the First Run errata page.
On page 53, in the beginning of chapter 5, you state: “In statistics, a range that has a particular chance of containing the correct answer is called a confidence interval (CI). A 90% CI is a range that has a 90% chance of containing the correct answer.”
I believe this is incorrect. The definition that you’ve provided is for a credible interval, not a confidence interval, and is a common misconception among many people using the statistical measure. A confidence interval does not grant you the probability that the interval you have contains the correct answer. Wikipedia states it better than I can:
More precisely, a CI for a population parameter (http://en.wikipedia.org/wiki/Population_parameter) is an interval (http://en.wikipedia.org/wiki/Interval_%28mathematics%29) with an associated probability (http://en.wikipedia.org/wiki/Probability), generated from a random sample of an underlying population, such that if the sampling were repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion of the confidence intervals would contain the population parameter in question. Confidence intervals are the most prevalent form of interval estimation (http://en.wikipedia.org/wiki/Interval_estimation).
It must be noted that a confidence interval is not in general equivalent to a Bayesian (http://en.wikipedia.org/wiki/Bayesian) credible interval (http://en.wikipedia.org/wiki/Credible_interval). The common error of equating the two is known as the prosecutor’s fallacy (http://en.wikipedia.org/wiki/Prosecutor%27s_fallacy).
The credible interval is as follows:
In Bayesian statistics (http://en.wikipedia.org/wiki/Bayesian_statistics), a credible interval is a posterior probability (http://en.wikipedia.org/wiki/Posterior_probability) interval, used for purposes similar to those of confidence intervals (http://en.wikipedia.org/wiki/Confidence_interval) in frequentist statistics (http://en.wikipedia.org/wiki/Frequentist_statistics).
For example, a statement such as “following the experiment, a 90% credible interval for the parameter t is 35–45” means that the posterior probability (http://en.wikipedia.org/wiki/Posterior_probability) that t lies in the interval from 35 to 45 is 0.9.
Distinction between a Bayesian credible interval and a frequentist confidence interval: By contrast, a frequentist confidence interval (http://en.wikipedia.org/wiki/Confidence_interval) (e.g. a 90% confidence interval of 35–45) means that with a large number of repeated samples, 90% of the calculated confidence intervals would include the true value of the parameter. The probability that the parameter is inside the given interval (say, 35–45) is either 0 or 1 (the non-random unknown parameter is either there or not). In frequentist terms, the parameter is fixed (it cannot be considered to have a distribution of possible values) and the confidence interval is random (as it depends on the random sample).
Now it does state “Since many non-statisticians intuitively interpret confidence intervals in the Bayesian credible interval sense, credible intervals are sometimes called confidence intervals, but this is an error and can often lead to incorrect usage of statistics in experimental methods.”
I think you’ve pointed out the danger of using Wikipedia entries instead of authoritative sources, especially where the Wikipedia entry offers no sources for a claim.
Wikipedia’s entry notwithstanding, I think you will find that “confidence interval” is widely used among researchers in the decision sciences and statistics just as I defined it. My career has been spent mostly on empirical methods and statistical modeling, and I’ve trained many statisticians in my methods, including the entire Statistical Services Support staff at the Environmental Protection Agency and quite a few at the Census Bureau and Argonne National Labs. No fewer than a dozen PhD statisticians (many of whom I quote in the book) have reviewed my entire manuscript. They made many comments, but not one thought to make the point you are making. For example, Sam Savage (a statistician) and J. Russo and R. Dawes (decision sciences researchers) are quoted in my book, are well-known in their respective fields, and they all use CI in their own published articles and books exactly as I do.
I think a broader search of more authoritative sources than Wikipedia would also support my use of the term. Here is one definition from A. A. Sveshnikov “Problems in Probability Theory, Mathematical Statistics and Theory of Random Functions” p 286. Definitions vary a bit from source to source but this one is a close match to the definitions one would find in the Oxford Dictionary of Statistics and several other sources. Sveshnikov states:
“A confidence interval is an interval that, with a given confidence level α, covers a parameter θ to be estimated.”
As in most authoritative sources on this issue, he makes no specific distinction about how the confidence level is computed. Russo and Schoemaker in their book “Decision Traps” also use “confidence range” in exactly this fashion, and in their index under “Confidence Interval” they say “see Confidence Range”. Throughout the book these two respected statisticians and decision scientists ask for subjective 90% confidence “ranges” just as I do in my book. In fact, the specific definition you gave is really just the definition of the interval estimate for a population mean based on a random sample. Since many sources give much broader definitions than the one in Wikipedia, I think we can say that the Wikipedia definition is really only a subset of types of confidence intervals and is unnecessarily narrow. Even a Google search on the phrase “Bayesian confidence interval” brings up many hits, many of them from reputable statistical sources. I even found the following interesting hit from the US Geological Survey site (http://www.npwrc.usgs.gov/resource/methods/statsig/whatalt.htm ):
“Bayesian confidence intervals are much more natural than their frequentist counterparts. A frequentist 95% confidence interval for a parameter θ, denoted (θ_L, θ_U), is interpreted as follows: if the study were repeated an infinite number of times, 95% of the confidence intervals that resulted would contain the true value θ. It says nothing about the particular study that was actually conducted, which led Howson and Urbach (1991:373) to comment that statisticians regularly say that one can be ’95 per cent confident’ that the parameter lies in the confidence interval. They never say why. In contrast, a Bayesian confidence interval, sometimes called a credible interval, is interpreted to mean that the probability that the true value of the parameter lies in the interval is 95%. That statement is much more natural, and is what people think a confidence interval is, until they get the notion drummed out of their heads in statistics courses.”
I find this quote interesting because when it says a confidence interval is “sometimes called” a credible interval, it not only implies that “confidence” and “credible” are effectively synonymous, but also that “credible” is the less commonly used term. It also brings up a point about how even professional statisticians could be clearer in their statements. Now, Savage, Russo and Dawes might all agree that there may be a problem with how consistently these definitions are used. But I often see their criticisms on such issues as being directed at statistics teachers and first-year stats texts, and I’m generally in agreement with those criticisms. I think there are some stats teachers who may have confused the issue with some of the distinctions they make. Savage, in particular, loves to correct things he hears from stats course instructors. So, in a way, I agree with you that there are “common misconceptions” but we disagree about who holds them.
On a related note, I decided not to discuss in my book my solution to the frequentist vs. Bayesian problem, but I think calibrated estimates make this a largely moot point. A frequentist insists that the only meaningful use of a probability is how it reflects historical distributions of some phenomenon. A Bayesian states that a probability also has meaning as an expression of subjective uncertainty. But a calibrated person who states, subjectively, that they are 90% certain that some claim is true has shown that, historically, they are right 90% of the time they said they were 90% confident. They satisfy both the Bayesian and Frequentist point of view.
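To illustrate that point, here is a minimal sketch (with an entirely invented track record) of how a calibrated estimator's subjective 90% intervals can be scored as frequencies, satisfying both interpretations at once:

```python
# Hypothetical track record: each entry is (stated 90% CI low, high, actual).
track_record = [
    (10, 50, 32), (2, 8, 6), (100, 400, 180), (0, 5, 7),
    (20, 60, 45), (1, 3, 2), (500, 900, 650), (30, 90, 95),
    (4, 12, 9), (15, 25, 21),
]

# A well-calibrated estimator's 90% intervals should contain the actual
# value about 90% of the time over a long track record.
hits = sum(low <= actual <= high for low, high, actual in track_record)
hit_rate = hits / len(track_record)
print(f"{hits}/{len(track_record)} intervals contained the actual value "
      f"({hit_rate:.0%})")
```

With a real track record of hundreds of such estimates, the subjective probability statement becomes an empirically testable frequency.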
Finally, I believe you have completely misinterpreted the prosecutor’s fallacy. Since you accept Wikipedia as an authoritative source, I will refer you to the Wikipedia entry on the prosecutor’s fallacy. It mentions two versions of the fallacy – one having to do with the use (or lack of use) of conditional probabilities and the other having to do with multiple tests in large databases. Neither mentions anything about the credible interval vs. confidence interval distinction. Perhaps you were making a different point with this but, if so, it’s not clear to me what it was.
Thanks for your comments,
The example on Threshold Values, in the problem of deciding whether to invest in a project that is likely to increase the units sold, focused on closer examination of the range on the units sold, because fewer than 200,000 additional units would result in a negative return. In this example the cost per unit was considered fixed, so that’s what was used to calculate the threshold of 200,000.
What if the problem being examined does not have any simple “go/no-go” threshold value like that? Or where the threshold of one parameter depends on what values are used for two other parameters? Is there always a threshold value? Or is it often the case that management would simply declare a threshold for a particular variable below or above which the project would be rejected?
I have a value of information (VOI) question related to robinhfoster’s. Here is a specific example from software. A project is supposed to improve the performance of our code. To estimate the positive impact this benefit will have, we decomposed it into the following pieces (using hours as the base unit): Performance Benefit time savings per year = (Number of users) x (Number of simulations/year/user) x (Run-time of typical simulation) x (% reduction in runtime due to this project).
Say we use a calibrated estimator and produce a 90% CI for each of these pieces. Then we use your spreadsheet to compute the VOI for each of them. The spreadsheet requires two additional pieces of information: a threshold for loss and a loss rate. In the example of “% reduction in runtime due to this project”, say the calibrated 90% CI is [5%,25%]. I could assign a threshold of 10% to say that this project should be canceled if the improvement is that small. Then I have to figure out the loss-rate. How much do we lose (in hours) for each percentage point under 10%? I’m not sure how to estimate this. It clearly involves the other pieces that are multiplied together to get the total benefit from the performance improvements. I could use the mean values from the 90% CI estimates to compute a loss rate of (mean number of users) x (mean number of simulations/year/user) x (mean runtime of typical simulation). It seems to me that the loss rate should have its own 90% CI.
To further complicate things, how do I compute the threshold and loss-rate for the “Number of users”? I have an estimate for the total number of users, [8,35] with 90% confidence. Where do I get a number for a threshold related to this project? If we have fewer than 10 users, then is this project not worth funding? And where do I get a loss-rate if the actual number of users is below this threshold? The loss rate is related to how much I’m investing in this project, how many simulations they run, and the runtime of a typical simulation, not to mention the anticipated performance improvement, all of which are currently estimated with 90% CIs.
I would appreciate any help you can offer.
I am thoroughly impressed with your book “How To Measure Anything” and I’m already using it to compute ROI with Monte Carlo simulations. It has already changed our conversations from “force of opinions” to “is that the right list and breakdown of benefits” which is a huge improvement already.
Thanks for your comment and sorry for the delayed response. Answering the threshold and loss-rate questions is easier if all of this is put together in a spreadsheet that computes an NPV for the investment in question. Remember, the first step is defining and modeling the decision. Create a spreadsheet model to compute the NPV just as you normally would for a business case for this code-improvement project. The simplest answer is that the threshold is the value for a variable that would make the NPV = 0 while holding all other variables at their mean values (the means of the estimated ranges). There are some reasons why this might not always be a good estimate, but it’s usually very close.
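As a sketch of that threshold step, here is a hypothetical single-period stand-in for the questioner's decomposition. The means for users and % reduction are the midpoints of the CIs given in the question; the hourly value of saved time, the project cost, and the other means are invented numbers, and a real business case would use a full NPV model:

```python
# Hypothetical model: benefit = hours saved per year, valued at an assumed
# $80/hour, against an assumed $120,000 project cost.
HOURLY_VALUE = 80
COST = 120_000
means = {"users": 21.5, "sims_per_user": 200, "runtime_hours": 10,
         "pct_reduction": 0.15}   # midpoints of the 90% CIs (some invented)

def npv(**overrides):
    v = {**means, **overrides}
    hours_saved = (v["users"] * v["sims_per_user"]
                   * v["runtime_hours"] * v["pct_reduction"])
    return hours_saved * HOURLY_VALUE - COST

def threshold(var, lo, hi, tol=1e-6):
    """Bisect for the value of `var` that makes NPV = 0 while holding
    every other variable at its mean."""
    f = lambda x: npv(**{var: x})
    assert f(lo) * f(hi) < 0, "NPV must change sign on [lo, hi]"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(lo) * f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

print(f"break-even pct_reduction: {threshold('pct_reduction', 0.0, 1.0):.2%}")
print(f"break-even users: {threshold('users', 0.0, 100.0):.1f}")
```

Under these invented numbers the break-even percentage reduction falls below the low end of the calibrated CI of [5%, 25%], which would mean that variable alone almost certainly cannot sink the project and its information value is correspondingly small.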
Then estimate your loss function by setting the variable in question equal to one unit below its threshold (or above, if the loss occurs when you are above the threshold). If you are one percentage point below the threshold and the NPV = -$10,000 (negative ten thousand dollars), then the loss rate is $10,000 per unit. (In this case, express a percentage-point range and threshold as whole numbers in the VOI spreadsheet, since it assumes a “unit” for the loss function is a “1”, not “0.01”.)
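The loss-rate step can be sketched the same way, with a hypothetical single-period stand-in for the questioner's NPV model (the hourly value, project cost, and some means are invented numbers):

```python
HOURLY_VALUE = 80        # assumed dollar value of one saved hour
COST = 120_000           # assumed project cost
means = {"users": 21.5, "sims_per_user": 200, "runtime_hours": 10,
         "pct_reduction": 0.15}   # midpoints of hypothetical 90% CIs

def npv(**overrides):
    v = {**means, **overrides}
    hours_saved = (v["users"] * v["sims_per_user"]
                   * v["runtime_hours"] * v["pct_reduction"])
    return hours_saved * HOURLY_VALUE - COST

# Under this model, NPV = 0 at pct_reduction of about 3.49% (the threshold).
# One "unit" is a whole percentage point (0.01), per the note in the text.
threshold_pct = 0.0349
loss_rate = -npv(pct_reduction=threshold_pct - 0.01)  # NPV one unit below
print(f"loss rate: about ${loss_rate:,.0f} per percentage point "
      f"below the threshold")
```

Because the other variables are held at their means, their uncertainty does not need its own CI here; the VOI calculation for each variable already treats the rest of the model this way.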
Let me know if that answers your question and thanks for your interest!