Friday, May 6, 2011

How Accurate Is Your Pundit? (Hamilton.edu)

Allen Thomson, the unusually keen-eyed observer of all things odd and analytic, brought a study by a group of students at Hamilton College to my attention recently.

Titled Are Talking Heads Blowing Hot Air: An Analysis Of The Accuracy Of Forecasts In The Political Media, the study, complete with detailed annexes and statistical analysis, assessed the accuracy of the political forecasts that 26 media pundits made in 2008.

The students used something they called a Prognosticator Value Score (PVS) to rank each of the 26.  The PVS factors in how many predictions were made, how many were right, how many were wrong and on how many the prognosticators hedged.
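
I won't reproduce the students' exact formula here, but the basic idea is easy to sketch. The toy Python function below is only my own back-of-the-envelope version, assuming +1 for a correct call, -1 for an incorrect one, 0 for a hedge, and an average scaled to a roughly -10 to +10 range (the students' actual weights may well differ):

```python
# Illustrative only -- the study's actual PVS formula may weight things differently.
def pvs_sketch(correct, incorrect, hedged):
    """Toy prognosticator score: +1 per correct call, -1 per incorrect call,
    0 per hedge, averaged over all predictions and scaled to roughly -10..+10."""
    total = correct + incorrect + hedged
    if total == 0:
        return 0.0
    return 10.0 * (correct - incorrect) / total

# A pundit who goes 10 right, 3 wrong, 2 hedged scores about 4.7 under this toy scheme.
print(round(pvs_sketch(correct=10, incorrect=3, hedged=2), 1))  # 4.7
```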

The best?  Paul Krugman with a PVS of 8.2 (You can see a screenshot of his score sheet to the right.  Note:  Score sheets for each of the pundits are in the full text document).

The worst?  Cal Thomas, with a PVS of -8.7 (You read that right.  Negative eight point seven...).

The students were able to confirm much of what Philip Tetlock has already told us: many things do not matter -- age, race, gender, and employment simply had no effect on forecasting accuracy.

The students did find that liberal, non-lawyer pundits tended to be better forecasters, but the overall message of their study is that the pundits they examined, in aggregate, were no better than a coin flip.

This is more interesting than it sounds, as one of Tetlock's few negative correlations was between a forecaster's accuracy and his or her exposure to the press.  The more exposure, Tetlock found, the more likely the forecaster was to be incorrect.  Here, there may be evidence of some sort of internal "correction" made by public pundits, i.e. people who make a living, at least in part, by making forecasts in the press.

I have a few methodological quibbles with the study.  The number of predictions a pundit made, for example, did not seem to carry any weight in the PVS.  Kathleen Parker made only 6 testable predictions, got 4 right, and came away with a PVS of 6.7.  Nancy Pelosi, on the other hand, made 27 testable predictions, got 20 right, but had a PVS of only 6.2.
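
One simple way to see the problem -- my own illustration, not anything from the study -- is to score each pundit with a sample-size-aware measure such as the lower bound of the Wilson score interval on the proportion of correct predictions. Penalize short track records and Pelosi's 20-of-27 comes out well ahead of Parker's 4-of-6:

```python
import math

def wilson_lower_bound(successes, n, z=1.96):
    """Lower end of the 95% Wilson score interval for a success proportion.
    Small samples get pulled down harder, so a long, mostly-correct track
    record beats a short one with a similar hit rate."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / denom

print(round(wilson_lower_bound(4, 6), 2))    # Parker: ~0.30
print(round(wilson_lower_bound(20, 27), 2))  # Pelosi: ~0.55
```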

Despite these minor quibbles, this study is a bold attempt to hold these commentators accountable for their forecasts, and the students deserve praise for their obvious hard work and intriguing results.

Tuesday, May 3, 2011

Evaluating Analytic Methods: What Counts? What Should Count? (Global Intelligence Forum)

About a week ago, I highlighted the upcoming Global Intelligence Forum and stated that one of the things I liked most about this conference was the opportunity, indeed the inevitability, of meeting interesting people working outside one's own area of expertise.

A really good example of this was Dr. Justine Schober, a pediatric urologist, who lectured the crowd last year on the problems the medical profession had in analyzing intersexuality (I'll let you look it up...).

I will be honest with you:  Justine's presentation was not what the crowd was expecting (...to say the least).

As I listened, however, to her description of the mistakes that doctors had made in this field, how bias and tradition had allowed these mistakes to continue for decades, and how much effort it had taken to begin to understand, analyze and rectify these errors, I realized just how much her profession and my profession have in common. 

Evaluating Medical Practice -- Pyramid of Evidence
One of her most useful slides was a simple pyramid (See picture to the right) that highlighted the kinds of evidence doctors use to validate their methods and approaches to various diseases and disorders.  Evidence at the bottom of the pyramid is obviously less valuable to doctors than evidence at the top, but all of this evidence counts in one way or another. 

This led me, in turn, to think about how we in intelligence evaluate analytic methods.  There appear to me to be two strong schools of thought.  In the first are such notables as Sherman Kent and other longtime members of the intelligence community who write about how difficult it is to establish "batting averages" for intelligence estimates in general, much less for particular methods.

The other school of thought (of which I am a member) emphasizes rigorous testing of analytic methods under realistic conditions to see which are more likely to improve forecasting accuracy and under which conditions.  The recent National Research Council report, Intelligence Analysis For Tomorrow, seems to strongly support this point of view as well.

My colleague, Steve Marrin, has often pointed out in our discussions (and probably in print somewhere as well -- he is nothing if not prolific), that this is a false dichotomy, an approach that presents intelligence professionals with only extreme choices and so is not a very useful guide to action.

Justine's chart made me think the same thing.  In short, it seems foolish to focus exclusively at either the top or the bottom of the evidence hierarchy.  What makes more sense is to climb the damn pyramid! 

What do I mean?  Well, first, I think it is important to imagine what such a pyramid might look like for intelligence professionals.  You can take a look at my own first cut at it below.

Evaluating Intelligence Methods -- Pyramid of Evidence
Ideally, we should be able to select an analytic method and then match the relevant evidence, such as it is, with that method. This, in turn, allows us to know how much faith we should put in the method in question and what kind of studies might be most useful in either confirming or denying the value of the method and under what circumstances.
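
To make that concrete, here is a rough sketch of what such bookkeeping might look like in code. The tier labels and the method-to-evidence mapping below are hypothetical placeholders, not the actual contents of my pyramid or a judgment about any particular method:

```python
# Hypothetical evidence tiers, ordered from weakest to strongest; the labels
# are placeholders rather than the exact rungs of the pyramid above.
EVIDENCE_TIERS = [
    "anecdote / expert opinion",
    "case study",
    "structured survey of practitioners",
    "quasi-experiment",
    "controlled experiment under realistic conditions",
]

# Also hypothetical: each analytic method mapped to the strongest
# kind of evidence currently available for it.
best_evidence = {
    "Analysis of Competing Hypotheses": "quasi-experiment",
    "Brainstorming": "case study",
    "Red Teaming": "anecdote / expert opinion",
}

for method, evidence in best_evidence.items():
    tier = EVIDENCE_TIERS.index(evidence) + 1
    print(f"{method}: evidence tier {tier} of {len(EVIDENCE_TIERS)} -- "
          f"the obvious next study is one rung up the pyramid")
```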

Examined from this perspective, there are many, many useful and simple kinds of studies that intelligence professionals at all levels, and in all areas of the intelligence discipline, can do to make a difference in the field.  More importantly, many of these kinds of studies are tailor-made for the growing number of intel studies students in the US and elsewhere.