Thursday, January 29, 2009

Part 4 -- The Problems With Evaluating Intelligence Products (Evaluating Intelligence)

Part 1 -- Introduction
Part 2 -- A Tale Of Two Weathermen
Part 3 -- A Model For Evaluating Intelligence

The fundamental problem with evaluating intelligence products is that intelligence, for the most part, is probabilistic. Even when an intelligence analyst thinks he or she knows a fact, it is still subject to interpretation or may have been the result of a deliberate campaign of deception.

  • The problem is exacerbated when making an intelligence estimate, where good analysts never express conclusions in terms of certainty. Instead, analysts typically use words of estimative probability (or, what linguists call verbal probability expressions) such as "likely" or "virtually certain" to express a probabilistic judgment. While there are significant problems with using words (instead of numbers or number ranges) to express probabilities, using a limited number of such words in a preset order of ascending likelihood currently seems to be considered the best practice by the National Intelligence Council (see page 5).
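To make the idea of a preset ladder of estimative words concrete, here is a short illustrative sketch. The specific words and numeric ranges below are my own illustration, not the NIC's official table:

```python
# An illustrative (not official) ladder of words of estimative probability,
# listed in ascending order of likelihood with rough numeric ranges.
WEP_LADDER = [
    ("remote", (0.00, 0.05)),
    ("very unlikely", (0.05, 0.20)),
    ("unlikely", (0.20, 0.45)),
    ("even chance", (0.45, 0.55)),
    ("likely", (0.55, 0.80)),
    ("very likely", (0.80, 0.95)),
    ("virtually certain", (0.95, 1.00)),
]

def word_for(p: float) -> str:
    """Return the first estimative word whose range contains probability p."""
    for word, (lo, hi) in WEP_LADDER:
        if lo <= p <= hi:
            return word
    raise ValueError("probability must be between 0 and 1")

print(word_for(0.7))   # likely
print(word_for(0.99))  # virtually certain
```

The point of fixing the ladder in advance is that every analyst (and every reader) maps the same word to roughly the same range, rather than each inventing a private vocabulary.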

Intelligence products, then, suffer from two broad categories of error: Problems of calibration and problems of discrimination. Anyone who has ever stepped on a scale only to find that they weigh significantly more or significantly less than expected understands the idea of calibration. Calibration is the act of adjusting a value to meet a standard.

In simple probabilistic examples, the concept works well. Consider a fair, ten-sided die. Each number, one through ten, has the same probability of coming up when the die is rolled (10%). If I asked you to tell me the probability of rolling a seven, and you said 10%, we could say that your estimate was perfectly calibrated. If you said the probability was only 5%, then we would say your estimate was poorly calibrated and we could "adjust" it to 10% in order to bring it into line with the standard.
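The die example can be checked with a quick simulation (illustrative code, not part of the original argument): roll the die many times and compare each stated estimate to the observed frequency.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Simulate 100,000 rolls of a fair ten-sided die
rolls = [random.randint(1, 10) for _ in range(100_000)]
freq_of_seven = rolls.count(7) / len(rolls)

good_estimate = 0.10  # the perfectly calibrated estimate for rolling a seven
poor_estimate = 0.05  # the poorly calibrated estimate from the example

print(f"observed frequency of a seven: {freq_of_seven:.3f}")
print(f"calibration error, good estimate: {abs(good_estimate - freq_of_seven):.3f}")
print(f"calibration error, poor estimate: {abs(poor_estimate - freq_of_seven):.3f}")
```

The observed frequency settles near 10%, so the 10% estimate shows almost no calibration error while the 5% estimate is off by roughly five percentage points.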

Translating this concept into the world of intelligence analysis is incredibly complex. To have perfectly calibrated intelligence products, we would have to be able to say that, if a thing is 60% likely to happen, then it happens 60% of the time. Most intelligence questions (beyond the trivial ones), however, are unique, one of a kind. The exact set of circumstances that led to the question being asked in the first place and much of the information relevant to its likely outcome are impossible to replicate, making it difficult to keep score in a meaningful way.
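If an analyst did have a scorable track record, checking calibration would amount to grouping forecasts by their stated probability and comparing each group to how often the events actually occurred. The sketch below uses an invented track record purely for illustration:

```python
from collections import defaultdict

# Hypothetical track record: (stated probability, did the event occur?)
forecasts = [
    (0.6, True), (0.6, True), (0.6, False), (0.6, True), (0.6, False),
    (0.9, True), (0.9, True), (0.9, True), (0.9, False),
    (0.2, False), (0.2, False), (0.2, True), (0.2, False), (0.2, False),
]

# Bucket the outcomes by the probability the analyst stated
buckets = defaultdict(list)
for p, occurred in forecasts:
    buckets[p].append(occurred)

# A well-calibrated analyst's 60% calls come true about 60% of the time
for p in sorted(buckets):
    outcomes = buckets[p]
    observed = sum(outcomes) / len(outcomes)
    print(f"said {p:.0%} -> happened {observed:.0%} of the time (n={len(outcomes)})")
```

In this made-up record the 20% and 60% buckets are perfectly calibrated while the 90% bucket runs hot (events occurred only 75% of the time). The post's objection is precisely that real intelligence questions rarely repeat often enough to fill such buckets.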

The second problem facing intelligence products is one of discrimination. Discrimination is associated with the idea that the intel is either right or wrong. An analyst with a perfect ability to discriminate always gets the answer right, whatever the circumstance. While the ability to perfectly discriminate between right and wrong analytic conclusions might be a theoretical ideal, the ability to actually achieve such a feat exists only in the movies. Most complex systems are subject to a certain sensitive dependence on initial conditions which precludes any such ability to discriminate beyond trivially short time frames.

If it appears that calibration and discrimination are in conflict, they are. The better calibrated an analyst is, the less likely they are to be willing to definitively discriminate between possible estimative conclusions. Likewise, the more willing an analyst is to discriminate between possible estimative conclusions, the less likely he or she is to be properly calibrating the possibilities inherent in the intelligence problem.

For example, an analyst who says X is 60% likely to happen is still 40% "wrong" when X does happen, should an evaluator choose to focus on the analyst's ability to discriminate. Likewise, if the evaluator chooses to focus on the analyst's ability to calibrate, the analyst who flatly said X will happen is also 40% "wrong" when the objective probability of X happening was only 60% (even though X does happen).
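One standard way to make this tension visible (my own illustration; the post does not name a scoring rule) is the Brier score, which Tetlock uses in his research: the squared difference between the stated probability and the 0-or-1 outcome.

```python
def brier(p: float, occurred: bool) -> float:
    """Squared error between a stated probability and the 0/1 outcome."""
    return (p - (1.0 if occurred else 0.0)) ** 2

# Single case where X happens: the definitive analyst looks better...
print(brier(0.6, True))   # ~= 0.16, the hedged forecast is penalized
print(brier(1.0, True))   # 0.0, the bold forecast scores perfectly

# ...but if X objectively happens 60% of the time, expected scores reverse:
p_true = 0.6
expected_hedged = p_true * brier(0.6, True) + (1 - p_true) * brier(0.6, False)
expected_bold = p_true * brier(1.0, True) + (1 - p_true) * brier(1.0, False)
print(expected_hedged)  # ~= 0.24
print(expected_bold)    # ~= 0.40
```

On any single question the discriminating analyst can look sharper, but over many questions with the same objective odds the calibrated analyst scores better, which is exactly the "damned if you do, damned if you don't" trap described below.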

Failure to understand the tension between these two evaluative principles leaves the unwitting analyst open to a "damned if you do, damned if you don't" attack by critics of the analyst's estimative work. The problem only grows worse if you consider words of estimative probability instead of numbers.

All this, in turn, typically leads analysts to ask for what Philip Tetlock, in his excellent book Expert Political Judgment, called "adjustments" when being evaluated regarding the accuracy of their estimative products. Specifically, Tetlock outlines four key adjustments:

  • Value adjustments -- mistakes made were the "right mistakes" given the cost of the alternatives

  • Controversy adjustments -- mistakes were made by the evaluator and not the evaluated

  • Difficulty adjustments -- mistakes were made because the problem was so difficult or, at least, more difficult than problems a comparable body of analysts typically faced

  • Fuzzy set adjustments -- mistakes were made but the estimate was a "near miss" so it should get partial credit

This parade of horribles should not be construed as a defense of the school of thought that says that intelligence cannot be evaluated, that it is too hard to do. It is merely to show that evaluating intelligence products is truly difficult and fraught with traps to catch the unwary. Any system established to evaluate intelligence products needs to acknowledge these issues and, to the greatest degree possible, deal with them.

Many of the "adjustments", however, can also be interpreted as excuses. Just because something is difficult to do doesn't mean you shouldn't do it. An effective and appropriate system for evaluating intelligence is an essential step in figuring out what works and what doesn't, in improving the intelligence process. As Tetlock notes (p. 9), "The list (of adjustments) certainly stretches our tolerance for uncertainty: It requires conceding that the line between rationality and rationalization will often be blurry. But, again, we should not concede too much. Failing to learn everything is not tantamount to learning nothing."

Tomorrow: The Problems With Evaluating Process


dimccjf said...

I think you and Tetlock have made the problem of evaluating intelligence judgments needlessly complex, at least from the audience perspective. Much of Phil's work seems devoted to saving face for the analyst. The process is simple and we used it in the J2 JCS for a decade to improve training and performance by 120 analysts.

Kristan J. Wheaton said...

Thanks for the comment!

Can you point to a resource for learning more about this solution?


Mark said...

For dimccjf -- I'd also like to learn more about the process used in the J2 JCS. How can people learn more?

Anonymous said...

Folks, see Propaganda Analysis, by Alexander George for a simpler more usable technique for evaluating inferences.

Kristan J. Wheaton said...


Thanks for the comment regarding the George paper. I found it on Rand's website:

There are two papers there, to be precise. I am not sure to which one you are referring or if you have a different paper in mind?

Having read both papers, I am also not sure on which point you are making a comparison to this post. The papers deal with the fairly narrow issue of content analysis, a method which has its problems (as do all methods) but which has been made much easier through the use of computers (something that George did not have when he wrote these papers in the '50s).

My post, on the other hand, is about intelligence products generally, so I am not sure what the "simpler" technique you describe actually is...