Saturday, January 31, 2009

See The First Open Source Intelligence Brief In A Virtual World! (JustLeapIn.com)

About a week ago, I wrote about the Visual Short Form Analytic Report exercise I run in my intelligence communications class each year. I got a ton of interesting and creative projects this time, but two of my students attempted to use the web-based virtual world construction kit JustLeapIn.com to complete their projects, and the results are pretty cool.

One of the worlds got trashed (the whole service is still in beta...) but I will try to show it to you if we get it back up. The other, however, by Brian Gabriel, survived and is quite a clever take on the typical intel briefing.

To see what he has done, you will need a broadband connection and will have to load the JustLeapIn plug-in (it takes no time and is no more onerous on system resources than any other plug-in you are currently using). Once you have the plug-in installed you should be able to access the "briefing room" in the window below. You can also see his "research lab", another room that is adjacent, in cyberspace terms, to the briefing room. If there are other people there you can also chat with them (though you can only see them if you have registered with JustLeapIn and created an avatar).



Moving around is easy. The arrow keys control movement, and you can zoom in on an object by double-clicking on it. Hold both mouse buttons down and move the mouse to look around without changing position. To get the movie to work (Yep, Brian made a movie too. Can't have a briefing room without a briefing...), zoom in on it and then single-click to get it started.

Brian built this on his own, with no training in either the JustLeapIn program or in making movies using Windows Movie Maker. He did it all (including the research for the brief) in about a week. For sophisticated inhabitants of virtual worlds, this is going to seem pretty straightforward -- for the rest of us, it should be eye-opening.

For those of you asking, "What is the briefing about?" Who cares! This is history!

Update: I am continuing to slowly post bits and pieces of my paper on evaluating intel. The most recent post is here (with links to the previous posts). I will be wrapping it up sometime this week.

Thursday, January 29, 2009

Part 5 -- The Problems With Evaluating The Intelligence Process (Evaluating Intelligence)

Part 1 -- Introduction
Part 2 -- A Tale Of Two Weathermen
Part 3 -- A Model For Evaluating Intelligence
Part 4 -- The Problems With Evaluating Intelligence Products

There are a number of ways that the intelligence process can fail. Requirements can be vague, collection can be flimsy or undermined by deliberate deception, production values can be poor or intelligence made inaccessible through over-classification. Finally, the intelligence architecture, the system in which all the pieces are embedded, can be cumbersome, inflexible and incapable of responding to the intelligence needs of the decisionmaker. All of these are part of the intelligence process and any of these -- or any combination of these -- reasons can be the cause of an intelligence failure.

In this series of posts (and in this post in particular), I intend to look only at the kinds of problems that arise when attempting to evaluate the analytic part of the process. From this perspective, the most instructive current document available is Intelligence Community Directive (ICD) 203: Analytic Standards. Paragraph D4, the operative paragraph, lays out what makes for a good analytic process in the eyes of the Director of National Intelligence:
  • Objectivity
  • Independent of Political Considerations
  • Timeliness
  • Based on all available sources of intelligence
  • Properly describes the quality and reliability of underlying sources
  • Properly caveats and expresses uncertainties or confidence in analytic judgments
  • Properly distinguishes between underlying intelligence and analyst's assumptions and judgments
  • Incorporates alternative analysis where appropriate
  • Demonstrates relevance to US national security
  • Uses logical argumentation
  • Exhibits consistency of analysis over time or highlights changes and explains rationale
  • Makes accurate judgments and assessments

This is an excellent starting point for evaluating the analytic process. There are a few problems, though. Some are trivial. Statements such as "Demonstrates relevance to US national security" would have to be modified slightly to apply to other disciplines of intelligence, such as law enforcement and business. Likewise, the distinction between "objectivity" and "independent of political considerations" would likely bother a stricter editor, as the latter appears to be redundant (though I suspect the authors of the ICD considered this and still decided to separate the two in order to highlight the notion of political independence).

Some of the problems are not trivial. I have already discussed (in Part 3) the difficulties associated with mixing process accountability and product accountability, something the last item on the list, "Makes accurate judgments and assessments", seems to encourage us to do.

Even more problematic, however, is the requirement to "properly caveat and express uncertainties or confidence in analytic judgments." Surely the authors meant to say "express uncertainties and confidence in analytic judgments". While this may seem like hair-splitting, the act of expressing uncertainty and the act of expressing a degree of analytic confidence are quite different things. This distinction is made (though not as clearly as I would like) in the prefatory matter to all of the recently released National Intelligence Estimates. The idea that the analyst can either express uncertainties (typically through the use of words of estimative probability) or express confidence flies in the face of this current practice.

Analytic confidence is (or should be) considered a crucial subsection of an evaluation of the overall analytic process. If the question answered by the estimate is, "How likely is X to happen?" then the question answered by an evaluation of analytic confidence is "How likely is it that you, the analyst, are wrong?" These concepts are analogous to statistical notions of probability and margin of error (as in polling data that indicates that Candidate X is looked upon favorably by 55% of the electorate with a plus or minus 3% margin of error). Given the lack of a controlled environment, the inability to replicate situations important to intelligence analysts and the largely intuitive nature of most intelligence analysis, an analogy, however, is what it must remain.
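
To make the distinction concrete, here is a minimal sketch (in Python, and purely illustrative; the field names and numbers are assumptions, not a prescription) of an estimate that records the two values separately:

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    """An analytic judgment recorded with two separate values: the estimated
    probability that the event will occur and the analyst's confidence in the
    process that produced that probability."""
    question: str
    probability: float   # answers "How likely is X to happen?"
    confidence: str      # answers "How likely is it that I am wrong?" (low / moderate / high)

# Two analysts can assign the same probability with very different confidence,
# just as two polls can both report 55% but carry different margins of error.
well_sourced = Estimate("Will X happen?", probability=0.70, confidence="high")
single_source = Estimate("Will X happen?", probability=0.70, confidence="low")
print(well_sourced, single_source, sep="\n")
```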

What contributes legitimately to an increase in analytic confidence? To answer this question, it is essential to go beyond the necessary but by no means sufficient criteria set by the standards of ICD 203. In other words, analysis which is biased or late shouldn't make it through the door, but analysis that is merely unbiased and on time meets only the minimum standard.

Beyond these entry-level standards for a good analytic process, what are the elements that actually contribute to a better estimative product? The current best answer to this question comes from Josh Peterson's thesis on the topic. In it, he argued that seven elements had adequate experimental data to suggest that they legitimately contribute to analytic confidence (see the sketch after the list):
  • Use of structured methods in analysis
  • Overall Source Reliability
  • Level of Source Corroboration/Agreement
  • Subject Matter Expertise
  • Amount of Collaboration Among Analysts
  • Task Complexity
  • Time Pressure
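
To make the interplay a bit more concrete, here is the sketch promised above: a minimal, purely hypothetical checklist (this is not Peterson's method, and the element names, equal weighting and thresholds are assumptions made only for illustration) showing one way the seven elements might be rolled up into a rough confidence rating:

```python
# A hypothetical, equally weighted checklist showing how the seven elements
# might be combined into a rough confidence rating. Element names, weights
# and thresholds are invented for illustration only.

ELEMENTS = [
    "structured_methods",        # structured methods were used in the analysis
    "source_reliability",        # overall source reliability was high
    "source_corroboration",      # sources corroborated / agreed with one another
    "subject_matter_expertise",  # the analyst had relevant expertise
    "analyst_collaboration",     # analysts collaborated on the problem
    "low_task_complexity",       # the task was not unusually complex
    "low_time_pressure",         # the analysis was not rushed
]

def confidence_rating(assessment: dict) -> str:
    """Count how many elements favor the analysis and map the count to a label."""
    score = sum(1 for element in ELEMENTS if assessment.get(element, False))
    if score >= 6:
        return "high"
    if score >= 3:
        return "moderate"
    return "low"

print(confidence_rating({
    "structured_methods": True,
    "source_corroboration": True,
    "subject_matter_expertise": True,
    "low_time_pressure": True,
}))  # -> moderate
```
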
There are still numerous questions that remain to be answered. Which element is most important? Is there a positive or negative synergy between two or more of the elements? Are these the only elements that legitimately contribute to analytic confidence?

Perhaps the most important question, however, is how the decisionmaker -- the person or organization the intelligence analyst supports -- likely sees this interplay of elements that continuously impacts both the analytic product and process.

Monday: The Decisionmaker's Perspective

Part 4 -- The Problems With Evaluating Intelligence Products (Evaluating Intelligence)

Part 1 -- Introduction
Part 2 -- A Tale Of Two Weathermen
Part 3 -- A Model For Evaluating Intelligence

The fundamental problem with evaluating intelligence products is that intelligence, for the most part, is probabilistic. Even when an intelligence analyst thinks he or she knows a fact, it is still subject to interpretation or may have been the result of a deliberate campaign of deception.


The problem is exacerbated when making an intelligence estimate, where good analysts never express conclusions in terms of certainty. Instead, analysts typically use words of estimative probability (or, what linguists call verbal probability expressions) such as "likely" or "virtually certain" to express a probabilistic judgment. While there are significant problems with using words (instead of numbers or number ranges) to express probabilities, using a limited number of such words in a preset order of ascending likelihood currently seems to be considered the best practice by the National Intelligence Council (see page 5).
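
For readers who like to see such things in code, here is a minimal sketch of an ordered set of words of estimative probability mapped to probability bands. The bands shown are assumptions made for illustration only; the authoritative terms and their intended ranges are in the National Intelligence Council document cited above.

```python
# Illustrative only: an ordered set of words of estimative probability mapped
# to rough probability bands. The bands below are assumptions for the sake of
# the example, not the NIC's official ranges.
WEP_BANDS = [
    ("remote",            (0.00, 0.05)),
    ("very unlikely",     (0.05, 0.20)),
    ("unlikely",          (0.20, 0.45)),
    ("even chance",       (0.45, 0.55)),
    ("likely",            (0.55, 0.80)),
    ("very likely",       (0.80, 0.95)),
    ("virtually certain", (0.95, 1.00)),
]

def word_for(p: float) -> str:
    """Return the estimative word whose band contains probability p."""
    for word, (low, high) in WEP_BANDS:
        if low <= p <= high:
            return word
    raise ValueError("probability must be between 0 and 1")

print(word_for(0.7))  # -> "likely"
```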

Intelligence products, then, suffer from two broad categories of error: Problems of calibration and problems of discrimination. Anyone who has ever stepped on a scale only to find that they weigh significantly more or significantly less than expected understands the idea of calibration. Calibration is the act of adjusting a value to meet a standard.

In simple probabilistic examples, the concept works well. Consider a fair, ten-sided die. Each number, one through ten, has the same probability of coming up when the die is rolled (10%). If I asked you to tell me the probability of rolling a seven, and you said 10%, we could say that your estimate was perfectly calibrated. If you said the probability was only 5%, then we would say your estimate was poorly calibrated and we could "adjust" it to 10% in order to bring it into line with the standard.
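
The die example reduces to a few lines of code; nothing here goes beyond the arithmetic above, and the numbers are the same ones used in the example:

```python
# The die example in code: the "standard" is the true probability of any one
# face of a fair ten-sided die, and calibration is the gap between a stated
# probability and that standard.
TRUE_PROBABILITY = 1 / 10  # each face of a fair d10

def calibration_error(stated_probability: float) -> float:
    """How far a stated probability sits from the standard."""
    return abs(stated_probability - TRUE_PROBABILITY)

print(calibration_error(0.10))  # 0.0  -> perfectly calibrated
print(calibration_error(0.05))  # 0.05 -> poorly calibrated; adjust it up to 0.10
```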

Translating this concept into the world of intelligence analysis is incredibly complex. To have perfectly calibrated intelligence products, we would have to be able to say that, if a thing is 60% likely to happen, then it happens 60% of the time. Most intelligence questions (beyond the trivial ones), however, are unique, one of a kind. The exact set of circumstances that led to the question being asked in the first place and much of the information relevant to its likely outcome are impossible to replicate, making it difficult to keep score in a meaningful way.
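
For illustration only, here is a minimal sketch of what keeping score would look like if an analyst's probabilistic calls could be grouped and replayed in this way; as the paragraph above notes, real intelligence questions rarely allow it. The forecast history is invented for the example.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group forecasts by the probability the analyst assigned and report how
    often the forecast events actually occurred at each level.
    `forecasts` is a list of (stated_probability, event_occurred) pairs."""
    buckets = defaultdict(list)
    for stated_probability, occurred in forecasts:
        buckets[stated_probability].append(1 if occurred else 0)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}

# A perfectly calibrated analyst's "60% likely" calls come true about 60% of the time.
history = [(0.6, True)] * 6 + [(0.6, False)] * 4 + [(0.9, True)] * 9 + [(0.9, False)]
print(calibration_table(history))  # {0.6: 0.6, 0.9: 0.9}
```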

The second problem facing intelligence products is one of discrimination. Discrimination is associated with the idea that the intel is either right or wrong. An analyst with a perfect ability to discriminate always gets the answer right, whatever the circumstance. While the ability to perfectly discriminate between right and wrong analytic conclusions might be a theoretical ideal, the ability to actually achieve such a feat exists only in the movies. Most complex systems are subject to a certain sensitive dependence on initial conditions which precludes any such ability to discriminate beyond anything but trivially short time frames.

If it appears that calibration and discrimination are in conflict, they are. The better calibrated an analyst is, the less likely they are to be willing to definitively discriminate between possible estimative conclusions. Likewise, the more willing an analyst is to discriminate between possible estimative conclusions, the less likely he or she is to have properly calibrated the possibilities inherent in the intelligence problem.

For example, an analyst who says X is 60% likely to happen is still 40% "wrong" when X does happen should an evaluator choose to focus on the analyst's ability to discriminate. Likewise, the analyst who said X will happen is also 40% wrong if the objective probability of X happening was 60% (even though X does happen), if the evaluator chooses to focus on the analyst's ability to calibrate.
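
Here is a minimal sketch of that double bind, scoring the same two forecasts from each perspective. The "objective" probability is an assumption made for this example; for real intelligence problems it is rarely, if ever, knowable.

```python
# Scoring the same two forecasts from each evaluative perspective. The
# "objective" probability is an assumption made for this example only.
OBJECTIVE_PROBABILITY = 0.6  # assume X was objectively 60% likely, and X happened

def discrimination_penalty(stated_probability: float, occurred: bool) -> float:
    """Distance from the 'right answer' (1.0 or 0.0) once the outcome is known."""
    return abs((1.0 if occurred else 0.0) - stated_probability)

def calibration_penalty(stated_probability: float) -> float:
    """Distance from the objective probability, regardless of the outcome."""
    return abs(OBJECTIVE_PROBABILITY - stated_probability)

hedged_call, confident_call = 0.6, 1.0
print(discrimination_penalty(hedged_call, True), calibration_penalty(hedged_call))        # 0.4 0.0
print(discrimination_penalty(confident_call, True), calibration_penalty(confident_call))  # 0.0 0.4
```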

Failure to understand the tension between these two evaluative principles leaves the unwitting analyst open to a "damned if you do, damned if you don't" attack by critics of the analyst's estimative work. The problem only grows worse if you consider words of estimative probability instead of numbers.

All this, in turn, typically leads analysts to ask for what Philip Tetlock, in his excellent book Expert Political Judgment, called "adjustments" when being evaluated on the accuracy of their estimative products. Specifically, Tetlock outlines four key adjustments:

  • Value adjustments -- mistakes made were the "right mistakes" given the cost of the alternatives
  • Controversy adjustments -- mistakes were made by the evaluator and not the evaluated
  • Difficulty adjustments -- mistakes were made because the problem was so difficult or, at least, more difficult than problems a comparable body of analysts typically faced
  • Fuzzy set adjustments -- mistakes were made but the estimate was a "near miss" so it should get partial credit

This parade of horribles should not be construed as a defense of the school of thought that says that intelligence cannot be evaluated, that it is too hard to do. It is merely meant to show that evaluating intelligence products is truly difficult and fraught with traps to catch the unwary. Any system established to evaluate intelligence products needs to acknowledge these issues and, to the greatest degree possible, deal with them.

Many of the "adjustments", however, can also be interpreted as excuses. Just because something is difficult to do doesn't mean you shouldn't do it. An effective and appropriate system for evaluating intelligence is an essential step in figuring out what works and what doesn't, in improving the intelligence process. As Tetlock notes (p. 9), "The list (of adjustments) certainly stretches our tolerance for uncertainty: It requires conceding that the line between rationality and rationalization will often be blurry. But, again, we should not concede too much. Failing to learn everything is not tantamount to learning nothing."

Tomorrow: The Problems With Evaluating Process

Tuesday, January 27, 2009

Part 3 -- A Model For Evaluating Intelligence (Evaluating Intelligence)

Part 1 -- Introduction
Part 2 -- A Tale Of Two Weathermen

Clearly there is a need for a more sophisticated model for evaluating intelligence – one that takes not only the results into consideration but also the means by which the analyst arrived at those results. It is not enough to get the answer right; analysts must also “show their work” in order to demonstrate that they were not merely lucky.

For the purpose of this series of posts, I will refer to the results of the analysis -- the analytic estimate under consideration -- as the product of the analysis. I will call the means by which the analyst arrived at that estimate the process. Analysts, therefore, can be largely (more on this later) correct in their analytic estimate. In this case, I will define the product as true. Likewise, analysts can be largely incorrect in their analytic estimate in which case I will label the product false.

Just as important, however, is the process. If an analyst uses a flawed, invalid process (much like the bad weatherman used a rule proven to be wrong most of the time), then I would say the process is false. Likewise, if the analyst used a generally valid process, one which produced reasonably reliable results over time, then I would say the process was true or largely accurate and correct.

Note that these two spectra are independent of one another. It is entirely possible to have a true process and a false product (consider the story of the good weatherman). It is also possible to have a false process and a true product (such as with the story of the bad weatherman).

In fact, it is perhaps convenient to think of this model for evaluating intelligence as a small matrix, such as the one below:

                      True Process                          False Process
  True Product        accurate and soundly derived          accurate, but only by luck
  False Product       inaccurate despite sound analysis     inaccurate and badly derived

There are a number of examples of each of these four basic combinations. For instance, consider the use of intelligence preparation of the battlefield in the execution of combat operations in the Mideast and elsewhere. Both the product and the process by which it was derived have proven to be accurate. On the other hand, statistical sampling of voters (polling) is unquestionably a true process but has, upon occasion, generated spectacularly incorrect results (see Truman v. Dewey…)

False processes abound. Reading horoscopes, tea leaves and goat entrails are all false processes which, every once in a while, turn out to be amazingly accurate. These same methods, however, are even more likely to be false in both process and product.

What are the consequences of this evaluative model? In the first place, it makes no sense to talk about intelligence being “right” or “wrong”. Such an appraisal is overly simplistic and omits critical evaluative information. Evaluators should be able to specify if they are talking about the intelligence product or process or both. Only at this level of detail does any evaluation of intelligence begin to make sense.

Second, with respect to which is more important, product or process, it is clear that process should receive the most attention. Errors in a single product might well result in poor decisions, but are generally easy to identify in retrospect if the process is valid. On the other hand, errors in the analytic process, which are much more difficult to detect, virtually guarantee a string of failures over time with only luck to save the unwitting analyst. This truism is particularly difficult for an angry public or a congressman on the warpath to remember in the wake of a costly “intelligence failure”. This makes it all the more important to embed this principle deeply in any system for evaluating intelligence from the start when, presumably, heads are cooler.

Finally, and most importantly, it makes no sense to evaluate intelligence in isolation – to examine only one case to determine how well an intelligence organization is functioning. Only by examining both product and process systematically over a series of cases does a pattern emerge that allows for appropriate corrective action, if necessary at all, to be taken.

Tomorrow: The Problems With Evaluating Product And Process

Monday, January 26, 2009

Part 2 -- A Tale Of Two Weathermen (Evaluating Intelligence)

Part 1 -- Introduction

I want to tell you a story about two weathermen; one good, competent and diligent and one bad, stupid and lazy. Why weathermen? Well, in the first place, they are not intelligence analysts, so I will not have to concern myself with all the meaningless distinctions that might arise if I use a real example. In the second place, they are enough like intelligence analysts that the lessons derived from this thought experiment – sorry, I mean “story” – will remain meaningful in the intelligence domain.

Imagine first the good weatherman and imagine that he only knows one rule: If it is sunny outside today, then it is likely to be sunny tomorrow (I have no idea why he only knows one rule. Maybe he just got hired. Maybe he hasn’t finished weatherman school yet. Whatever the reason, this is the only rule he knows). While the weatherman only knows this one rule, it is a good rule and has consistently been shown to be correct.

His boss comes along and asks him what the weather is going to be like tomorrow. The good weatherman remembers his rule, looks outside and sees sun. He tells the boss, “It is likely to be sunny tomorrow.”

The next day the weather is sunny and the boss is pleased.

Clearly the weatherman was right. The boss then asks the good weatherman what the weather will be like the next day. “I want to take my family on a picnic,” says the boss, “so the weather tomorrow is particularly important to me.” Once again the good weatherman looks outside and sees sun and says, “It is likely to be sunny tomorrow.”

The next day, however, the rain is coming down in sheets. A wet and bedraggled weatherman is sent straight to the boss’ office as soon as he arrives at work. After the boss has told the good weatherman that he was wrong and given him an earful to boot, the good weatherman apologizes but then asks, “What should I have done differently?”

“Learn more rules!” says the boss.

“I will,“ says the weatherman, “but what should I have done differently yesterday? I only knew one rule and I applied it correctly. How can you say I was wrong?”

“Because you said it would be sunny and it rained! You were wrong!” says the boss.

“But I had a good rule and I applied it correctly! I was right!” says the weatherman.

Let’s leave them arguing for a minute and think about the bad weatherman.

This guy is awful. The kind of guy who sets low standards for himself and consistently fails to achieve them, who has hit rock bottom and started to dig, who is not so much of a has-been as a won’t-ever-be (For more of these see British Performance Evaluations). He only knows one rule but has learned it incorrectly! He thinks that if it is cloudy outside today, it is likely to be sunny tomorrow. Moreover, tests have consistently shown that weathermen who use this rule are far more likely to be wrong than right.

The bad weatherman’s boss asks the same question: “What will the weather be like tomorrow?” The bad weatherman looks outside and sees that it is cloudy and he states (with the certainty that only the truly ignorant can muster), “It is likely to be sunny tomorrow.”

The next day, against the odds, the day is sunny. Was the bad weatherman right? Even if you thought he was right, over time, of course, this weatherman is likely to be wrong far more often than he is to be right. Would you evaluate him based solely on his last judgment or would you look at the history of his estimative judgments?

There are several aspects of the weathermen stories that seem to be applicable to intelligence. First, as the story of the good weatherman demonstrates, the traditional notion that intelligence is either “right” or “wrong” is meaningless without a broader understanding of the context in which that intelligence was produced.

Second, as the story of the bad weatherman revealed, considering estimative judgments in isolation, without also evaluating the history of estimative judgments, is a mistake. Any model for evaluating intelligence needs to (at least) take these two factors into consideration.

Tomorrow: A Model For Evaluating Intelligence

Sunday, January 25, 2009

Evaluating Intelligence (Original Research)

(This is another in a series of posts that I refer to as “experimental scholarship” -- or using the medium of the internet and the vehicle of this blog as a way to put my research online for more or less real-time peer review. Earlier examples of this genre include: A Wiki Is Like A Room..., The Revolution Begins On Page 5, What Is Intelligence? and What Do Words Of Estimative Probability Mean?.

In addition, astute readers will note that some of what I write here I have previously discussed in other places, most notably in an article written with my long-time collaborator, Diane Chido, for Competitive Intelligence Magazine and in a chapter of our book on Structured Analysis Of Competing Hypotheses (written with Diane, Katrina Altman, Rick Seward and Jim Kelly). Diane and the others clearly deserve full credit for their contribution to this current iteration of my thinking on this topic.)


Evaluating intelligence is tricky.

Really tricky.

Sherman Kent, one of the foremost early thinkers regarding the analytic process in the US national security intelligence community, wrote in 1976, “Few things are asked the estimator more often than 'How good is your batting average?' No question could be more legitimate--and none could be harder to answer.” So difficult was the question that Kent reports not only the failure of a three-year effort in the 1950s to establish the validity of various National Intelligence Estimates but also the immense relief among the analysts in the Office of National Estimates (forerunner of the National Intelligence Council) when the CIA “let the enterprise peter out.”

Unfortunately for intelligence professionals, the decisionmakers that intelligence supports have no such difficulty evaluating the intelligence they receive. They routinely and publicly find intelligence to be “wrong” or lacking in some significant respect. Abbot Smith, writing for Studies In Intelligence in 1969, cataloged many of these errors in On The Accuracy Of National Intelligence Estimates. The list of failures at the time included the development of the Soviet H-bomb, the Soviet invasions of Hungary and Czechoslovakia, the Cuban Missile Crisis and the Missile Gap. The Tet Offensive, the collapse of the Soviet Union and the Weapons of Mass Destruction fiasco in Iraq would soon be added to the list of widely recognized (at least by decisionmakers) “intelligence failures”.

Nor was the US the only intelligence community to suffer such indignities. The Soviets had their Operation RYAN, the Israelis their Yom Kippur War and the British their Falklands War. In each case, after the fact, senior government officials, the press and ordinary citizens alike pinned the black rose of failure on their respective intelligence communities.

To be honest, in some cases, the intelligence organization in question deserved the criticism but, in many cases, it did not -- or at least not the full measure of fault it received. However, whether the blame was earned or not, in the aftermath of each of these cases, commissions were duly summoned, investigations into the causes of the failure conducted, recommendations made and changes, to one degree or another, ratified regarding the way intelligence was to be done in the future.

While much of the record is still out of the public eye, I suspect it is safe to say that intelligence successes rarely received such lavish attention.

Why do intelligence professionals find intelligence so difficult, indeed impossible, to evaluate while decisionmakers do so routinely? Is there a practical model for thinking about the problem of evaluating intelligence? What are the logical consequences for both intelligence professionals and decisionmakers that derive from this model? Finally, is there a way to test the model using real world data?

I intend to attempt to answer all of these questions but first I need to tell you a story…

Tomorrow: A Tale Of Two Weathermen