The Statistics Behind the Calibration Scores

Originally posted on http://www.howtomeasureanything.com/forums/ on Thursday, April 30, 2009 6:20:57 AM.

“Hi Douglas,

I want to thank you for your work in this area. Using the information in your book, I used Minitab 15 and created an attribute agreement analysis plot. The master has 10 correct and I then plotted 9, 8, 7, 6, 5, 4, 3, 2, 1, 0. From that I can see the overconfidence limits you refer to in the book. Based on the graph there does not appear to be an ability to state if someone is under-confident. Do you agree?

Can you assist me in the origin of the second portion of the test where you use the figure of -2.5 as part of the calculation in under-confidence?
I want to use the questionnaire as part of Black Belt training for development. I anticipate that someone will ask how the limits are generated and would like to be prepared.

Thanks in advance – Hugh”

The figure of 2.5 is based on an average of how confidently people answer the questions. We use a binomial distribution to work out the probability of just being unlucky when you answer. For example, if you are well-calibrated and you answer with an average of 85% confidence (expecting to get 8.5 out of 10 correct), then there is about a 5% chance of getting 6 or fewer correct (cumulative). In other words, at that level it is more likely that you were not just unlucky, but actually overconfident.
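As a rough check on the 5% figure, the binomial tail can be computed directly. This is a minimal sketch, assuming 10 independent questions that are all answered at the same 85% confidence; the function name binom_cdf is illustrative and does not come from the book.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Assumption: a calibrated person answers 10 questions at 85% confidence each.
chance_unlucky = binom_cdf(6, 10, 0.85)
print(f"P(6 or fewer correct out of 10) = {chance_unlucky:.3f}")
```

Under these assumptions the result comes out at roughly 0.05, which matches the "about a 5% chance" described above.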

I took a full distribution of how people answer these questions. Some say they are an average of 70% confident, some say 90%, and so on. Each one has a different level for which there is a 5% chance that the person was just unlucky as opposed to overconfident. But given the average of how most people answer these questions, a difference larger than 2.5 out of 10 between the expected and actual number correct means that there is generally less than a 5% chance a calibrated person would just be unlucky.

It’s a rule of thumb. A larger number of questions and a specific set of answered probabilities would allow us to compute this more accurately for an individual.
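To illustrate how the cutoff shifts with stated confidence, here is a minimal sketch. It assumes 10 questions and a single flat confidence level per person, which is a simplification: real test-takers mix confidence levels across questions. The helper names (binom_cdf, unlucky_cutoff) are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """Probability of k or fewer correct out of n at per-question confidence p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def unlucky_cutoff(n, p, alpha=0.05):
    """Largest score k with P(score <= k) <= alpha, or None if there is none."""
    cutoff = None
    for k in range(n + 1):
        if binom_cdf(k, n, p) <= alpha:
            cutoff = k
        else:
            break  # the CDF only grows with k, so no later k can qualify
    return cutoff


# Assumption: 10 questions, everyone answering at one flat confidence level.
for p in (0.70, 0.80, 0.85, 0.90):
    k = unlucky_cutoff(10, p)
    gap = 10 * p - k
    print(f"stated {p:.0%}: cutoff {k} correct, expected minus cutoff = {gap:.1f}")
```

For these four confidence levels the gap between the expected score and the cutoff stays between 2.5 and 3 questions, which is in line with the rule of thumb.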

Thanks,

Doug

How to Measure Innovation

Originally posted at http://www.howtomeasureanything.com, on Thursday, March 05, 2009 7:30:54 PM, by JBehling.

“I am a Six Sigma Black Belt for an IS Organization and my team has been struggling to measure the impact of “Innovation” in our company. We bring new and innovative systems to our business partners to help them streamline their practices and processes.

Any thoughts on how to develop a measurement system for innovation? Are there any standard practices for measuring IS Innovation? HELP!”

How to Measure Performance

Originally posted at http://www.howtomeasureanything.com, on Friday, March 20, 2009 9:14:48 PM, by jerry.

“Greetings,

I loved your book. Thanks for sharing such valuable information. Now I’m trying to apply it.

I am leading a project of training developers and instructional designers and am attempting to put together a meaningful way to measure their performance. I have come up with some parameters that seem evident to me, such as time to complete a lesson, number of edits recommended (to the designer), type of edits recommended (order, strategies, completeness of content), edit recommendation trends (is the number of recommended edits going up or going down).

Is there a particular part of your book I should re-read that would help me frame a thorough performance evaluation measuring framework? Or can you suggest anything that would help expand the framework or make it a more reliable measure of performance?

Thank you in advance for any direction you can point me in or for any suggestions you can provide.

Jerry”

Thanks for reading my book. I think you might find part of what you are looking for in Chapter 11 on measuring preferences and attitudes. On page 197 I show how different performance measures of a software developer could be combined into a single metric by quantifying the acceptable tradeoffs.

You might also consider more of an “end result” metric of some kind. Isn’t the ultimate success of the instructional material measured by the performance of students? Obviously, many things affect the performance of students but among those should be the design of the material. Individual students will vary but if one set of material consistently results in better student performance than another set, then I think it’s fair to attribute some of that to the material designer.

Thanks,

Doug Hubbard

Length of Calibration

Originally posted on http://www.howtomeasureanything.com/forums/ on Monday, March 09, 2009 9:14:11 AM.

“I just read your book and found it fascinating. Thanks.

On calibrated estimates, once experts are calibrated, do they stay calibrated?
Or do you repeat the calibration every time you begin a project or make an estimate?

I’m just thinking in a corporate setting – do you just do it once for a group of people that you may want estimates from, or would you do it before each project? Do it annually?

What has been your experience on how long people stay calibrated?

Thanks,

Praveen”

Measuring Leadership

Originally posted at http://www.howtomeasureanything.com on Thursday, March 05, 2009 7:30:54 PM, by Paddy.

“Hi Douglas

Firstly let me give you a huge wrap for the book – I would say it’s invaluable but that would be wrong, because as I am learning “Everything can be measured”!

I am interested in what is traditionally referred to as a ‘soft’ area, where good measurements are hard to come by – Leadership and [Organizational] Development. Seeing as you asked to be stumped/challenged, I’ll throw my biggest fish at you first…

How can you measure leadership?

To help direct the discussion that I hope will flow from this, let’s talk about two specific examples: in a sporting context (i.e., impact of player/coach leadership on scores) and in a corporate context (impact of management/employee leadership on profits).

Let’s see if this stumps you…

Paddy”

Thanks for your question. That is something I’ve been asked more than once. As with all measurement questions, I start out with “What do you mean…leadership?”. Then ask, how do you observe examples of leadership? If someone says “Leadership is better here than there” what observations are they basing that on?

Sometimes people define the observations for leadership as being some measure of performance of an organization. In that case, what they really want to measure is the performance. But sometimes they want to ask if particular “leadership styles” result in improved performance. In that case, they should think about surveying staff about leadership (to determine the type of leadership style) and correlating that result with observed performance.
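As a rough illustration of that last step, the correlation between survey results and performance could be computed along these lines. This is only a sketch: the numbers are hypothetical, the variable names are made up for the example, and a Pearson correlation is one possible choice of measure rather than a prescribed one.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data for illustration only: one entry per team.
# survey_scores: average staff rating of a leadership style (1 to 5 scale).
# performance: some observed performance measure for the same teams.
survey_scores = [3.1, 4.2, 2.8, 4.6, 3.9, 3.3]
performance = [71, 85, 64, 90, 78, 75]

print(f"correlation = {pearson_r(survey_scores, performance):.2f}")
```

A value near +1 or -1 would suggest a strong linear relationship in the sample, while a value near 0 would suggest little or none. With only a handful of teams the estimate is noisy, and a correlation alone does not show that the leadership style caused the performance.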

Perhaps they are asking about some ill-defined sense of charisma separate from the performance of the organization. In that sense, a survey of subordinates should suffice. But in that case, we want to be careful of some other effects that might get confused with charisma or leadership but that most definitions of leadership would not include. For example, physical attractiveness and even being tall are often associated with subjective perceptions of leadership. US Presidents, for example, are almost always significantly taller than average. Tom Malloy in the 1980s studied how attire affects perceptions of charisma, competence and authority. That’s the problem with the subjective sense of leadership the way it is often used. We can’t help but let such things affect our assessment of leaders even though we know they shouldn’t.

Perhaps leadership is defined by examples of particularly inspirational ideas such as President Kennedy’s decision to go to the moon. Perhaps Joan of Arc leading the charge is leadership because it so inspired her troops. If these cases are what you mean, then perhaps you should think about surveying people about how inspired they are.

Personally, I think all of this is sort of meaningless if it doesn’t lead to performance. So, as I mention in the book, you need to ask why you want to know this. Are you evaluating prospective executive staff? Are you evaluating who will run a new division better? If you can zero in on why you care, you will probably find that measuring actual leadership (or whatever that means) is not your real concern. If you are trying to predict performance, I suggest that past performance is important. Would Kennedy or Joan of Arc have been that inspirational if they failed? Does a subjective perception of leadership by subordinates matter if the leader doesn’t meet objectives? Probably not.

Thanks,

Doug Hubbard

Lens Model Example – Chapter 12

Originally posted at http://www.howtomeasureanything.com, on Sunday, March 01, 2009 1:30:45 PM, by Paddy.

“Could you please clarify in what scenarios the Lens Model can remove human inconsistency in decision making (i.e., problems that are well defined/repeatable or unstructured)? I would like to apply the Lens Model to evaluate computer interfaces.

Also, could you please clarify the variables in step 6 of the Lens Model Procedure – Perform regression analysis. For example, could you please clarify independent and dependent variables in step 6 and the end output in step 7. Diagram was great, example would be better.

Thanks,

Amran”

Originally posted at http://www.howtomeasureanything.com, on Friday, April 17, 2009 9:21:44 AM, by Paddy.

“Any help with an example would be much appreciated.

Thank you”