How do you backtest a probability estimate?

A question asked by a subscriber:

I am sympathetic to the concept of using a Monte Carlo approach, perhaps a cascaded series of Monte Carlo models for the inputs, to produce an estimated probability distribution. This seems a more reasonable way to model reality than assuming the distribution is normal or lognormal.
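
A minimal sketch of what such a cascade might look like, in Python: the parameters of each input distribution are themselves drawn at random, and the sampled inputs feed a toy profit model. Every distribution, parameter value, and variable name here is an illustrative assumption, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # number of scenarios

# Outer layer: uncertainty about the input distributions themselves.
mean_growth = rng.normal(0.03, 0.01, N)   # uncertain mean of growth
vol_growth = rng.uniform(0.05, 0.15, N)   # uncertain volatility of growth

# Inner layer: draw the inputs conditional on the sampled parameters.
growth = rng.normal(mean_growth, vol_growth)   # revenue growth
cost_ratio = rng.beta(8, 2, N)                 # costs as a share of revenue

# Combine the inputs in a toy profit model (purely illustrative).
revenue = 100.0 * (1.0 + growth)
profit = revenue * (1.0 - cost_ratio)

# The output is an empirical distribution -- no normality assumed.
print(f"median profit: {np.median(profit):.2f}")
print(f"5th percentile: {np.percentile(profit, 5):.2f}")
```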

To me, this seems like a logical extension of the budgeting approach familiar to every accountant, with incremental improvements to the model accumulating over time. So long as no one becomes ‘wedded to the model’, this process is powerful.

To avoid becoming ‘wedded to the model’, it seems to me necessary to identify the range of your inputs (or of the environment that affects those inputs) within which you believe the model will be robust. Movements of your inputs outside this range should trigger a re-evaluation of the model.
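
One way to make that trigger concrete is to record the envelope of input values the model was calibrated and tested on, and flag any new observation that falls outside it. A minimal sketch, with hypothetical input names and ranges:

```python
# Input ranges over which the model is believed robust (hypothetical).
ROBUST_RANGES = {
    "growth": (-0.10, 0.15),
    "cost_ratio": (0.50, 0.95),
}

def check_inputs(observed: dict) -> list:
    """Return the names of inputs that have moved outside the envelope."""
    breaches = []
    for name, value in observed.items():
        lo, hi = ROBUST_RANGES[name]
        if not (lo <= value <= hi):
            breaches.append(name)
    return breaches

breaches = check_inputs({"growth": 0.22, "cost_ratio": 0.80})
if breaches:
    print(f"re-evaluate the model: {breaches} outside the tested range")
```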

For those who are wedded to VaR, you can even ‘measure’ the risk associated with your model as a 5% VaR and the like, if you are willing to lose much of the detail of what you have done. There is sometimes a place for a summary measure, so long as it does not become the input to a subsequent calculation.
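
Collapsing a simulated distribution to a 5% VaR is a one-liner, which is rather the point: a single percentile of the empirical distribution, with everything else discarded. A sketch, using a stand-in array of simulated outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)
profit = rng.normal(10.0, 5.0, 100_000)  # stand-in for simulated outcomes

# 5% VaR: the loss exceeded in only 5% of scenarios.
var_5 = -np.percentile(profit, 5)
print(f"5% VaR: {var_5:.2f}")
```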

I am convinced of the importance of accurate calibration and backtesting of models.

What I am less clear about is how you can ‘backtest’ a probabilistic model on anything other than a gross basis. How do you know whether an observed event is a 1-in-100 event or a 1-in-10 event? Clearly, if there is sufficient data, the usual probabilistic maths can be used; but what about where we are dealing with unusual, but perhaps critical, events?
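
One standard answer for the ‘sufficient data’ case is the probability integral transform (PIT): record where each realized outcome fell in that period's predicted distribution, then test those percentiles for uniformity. For rare events the same idea reduces to counting tail exceedances, where a binomial test makes plain how little a handful of observations can tell you. A sketch, with a deliberately miscalibrated toy model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_periods = 500

# Each period the model predicts N(0, 1); reality here is N(0.2, 1.3),
# i.e. the model is deliberately miscalibrated for the demonstration.
realized = rng.normal(0.2, 1.3, n_periods)
pit = stats.norm.cdf(realized, loc=0.0, scale=1.0)

# If the model is well calibrated, the PIT values are uniform on [0, 1].
ks = stats.kstest(pit, "uniform")
print(f"KS statistic: {ks.statistic:.3f}, p-value: {ks.pvalue:.4f}")

# Rare-event version: out of 500 one-in-a-hundred forecasts we expect
# about 5 exceedances; a binomial test scores the observed count.
hits = int(np.sum(pit > 0.99))
p = stats.binomtest(hits, n_periods, 0.01).pvalue
print(f"1-in-100 exceedances: {hits} of {n_periods}, p-value: {p:.4f}")
```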

Is the only answer to use traditional statistics to measure the confidence we have in our model? And if so, how can those measures be fed back into the next iteration of our Monte Carlo model?
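
One possible route, sketched here under the assumption that the quantity of interest is a tail probability, is a Bayesian (Beta-Binomial) update: the backtest evidence updates the model's prior, and the next Monte Carlo run samples the posterior rather than a fixed point estimate, so the residual confidence in the model flows through the simulation. All numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Prior: the model says the event is roughly 1-in-100 (alpha/(alpha+beta)).
alpha, beta = 1.0, 99.0

# Backtest evidence: 4 exceedances observed over 200 periods.
hits, periods = 4, 200
alpha_post = alpha + hits
beta_post = beta + (periods - hits)

# Next iteration draws the event probability from the posterior instead
# of fixing it, carrying the model uncertainty into the simulation.
p_event = rng.beta(alpha_post, beta_post, 100_000)
print(f"posterior mean: {p_event.mean():.4f}")
print(f"90% credible interval: {np.percentile(p_event, [5, 95])}")
```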