Saturday, February 7, 2009

Part 8 -- Batting Averages (Evaluating Intelligence)

Part 1 -- Introduction
Part 2 -- A Tale Of Two Weathermen
Part 3 -- A Model For Evaluating Intelligence
Part 4 -- The Problem With Evaluating Intelligence Products
Part 5 -- The Problem With Evaluating The Intelligence Process
Part 6 -- The Decisionmaker's Perspective
Part 7 -- The Iraq WMD Estimate And Other Iraq Pre-War Assessments

Despite good reasons to believe that the findings of the Iraq WMD National Intelligence Estimate (NIE) and the two pre-war Intelligence Community Assessments (ICAs) regarding Iraq can be evaluated as a group for insights into the quality of the analytic processes used to produce these products, several problems remain before we can determine the "batting average".
  • Assumptions vs. Descriptive Intelligence: The NIE drew its estimative conclusions from what the authors believed were the facts based on an analysis of the information collected about Saddam Hussein's WMD programs. Much of this descriptive intelligence (i.e. that information which was not proven but clearly taken as factual for purposes of the estimative parts of the NIE) turned out to be false. The ICAs, however, are largely based on a series of assumptions either explicitly or implicitly articulated in the scope notes to those two documents. This analysis, therefore, will only focus on the estimative conclusions of the three documents and not on the underlying facts.
  • Descriptive Intelligence vs. Estimative Intelligence: Good analytic tradecraft has always required analysts to clearly distinguish estimative conclusions from the direct and indirect information that supports those estimative conclusions. The inconsistencies in the estimative language, along with the grammatical structure of some of the findings, make this particularly difficult. For example, the Iraq NIE found: "An array of clandestine reporting reveals that Baghdad has procured covertly the types and quantities of chemicals and equipment sufficient to allow limited CW agent production hidden in Iraq's legitimate chemical industry." Clearly the information gathered suggested that the Iraqis had procured the chemicals. What is not as clear is whether they were likely using them for limited CW production or whether they merely could use these chemicals for such purposes. A strict constructionist would argue for the latter interpretation whereas the overall context of the Key Judgments would suggest the former. I have elected to focus on the context to determine which statements are estimative in nature. This inserts an element of subjectivity into my analysis and may skew the results.
  • Discriminative vs. Calibrative Estimates: The language of the documents uses both discriminative ("Baghdad is reconstituting its nuclear weapons program") and calibrative language ("Saddam probably has stocked at least 100 metric tons ... of CW agents"). Given the seriousness of the situation in the US at that time, the purposes for which these documents were to be used, and the discussion of the decisionmaker's perspective in part 6 of this series, I have elected to treat calibrative estimates as discriminative for purposes of evaluation.
  • Overly Broad Estimative Conclusions: Overly broad estimates are easy to spot. Typically these statements use highly speculative verbs such as "might" or "could". A good example of such a statement is the claim: "Baghdad's UAVs could threaten Iraq's neighbors, US forces in the Persian Gulf, and if brought close to, or into, the United States, the US homeland." Such alarmism seems silly today but it should have been seen as silly at the time as well. From a theoretical perspective, these types of statements tell the decisionmaker nothing useful (anything "could" happen; everything is "possible"). One option, then, is to mark these statements as meaningless and eliminate them from consideration. This, in my mind, would only encourage the bad practice, so I intend instead to count these kinds of statements as false if they turned out to have no basis in fact (and, by the same logic, as true if they turned out to be true, of course).
  • Weight of the Estimative Conclusion: Some estimates are clearly more fundamental to a report than others. Conclusions regarding direct threats to US soldiers, for example, should trump any minor and indirect consequences regarding regional instability identified in the reports. Engaging in such an exercise might be appropriate for individuals directly involved in this process and in a better position to evaluate these weights. I, on the other hand, am looking only for the broadest possible patterns (if any) in the data. I have, therefore, decided to weight all estimative conclusions equally.
  • Dealing with Dissent: There were several dissents in the Iraq NIE. While the majority opinion is, in some sense, the final word on the matter, an analytic process that tolerates formal dissent deserves some credit as well. Going simply with the majority opinion does not accomplish this. Likewise, eliminating the dissented opinion from consideration gives too much credit to the process. I have chosen to count those estimative conclusions with dissents as both true and false (for scoring purposes only).
Clearly, given the caveats and conditions under which I am attempting this analysis, I am looking only for broad patterns of analytic activity. My intent is not to spend hours quibbling about all of the various ways a particular judgment could be interpreted as true or false after the fact. My intent is merely to make the case that evaluating intelligence is difficult but that, even with those difficulties firmly in mind, it is possible to go back after the fact and, if we look at a broad enough swath of analysis, come to some interesting conclusions about the process.

Within these limits, then, by my count, the Iraq NIE contained 28 (85%) false estimative conclusions and 5 (15%) true ones. This conclusion tracks quite well with the WMD Commission's own evaluation that the NIE was incorrect in "almost all of its pre-war judgments about Iraq's weapons of mass destruction." By my count, the Regional Consequences of Regime Change in Iraq ICA fares much better with a count of 23 (96%) correct estimative conclusions and only one (4%) incorrect one. Finally, the report on the Principal Challenges in Post-Saddam Iraq nets 15 (79%) correct analytic estimates to 4 (21%) incorrect ones. My conclusions are certainly consistent with the tone of the Senate Subcommittee Report.
  • It is noteworthy that the Senate Subcommittee did not go to the same pains to compliment analysts on their fairly accurate reporting in the ICAs as the WMD Commission did to pillory the NIE. Likewise, there was no call from Congress to ensure that the process involved in creating the NIE was reconciled with the process used to create the ICAs, no laws proposed to take advantage of this largely accurate work, no restructuring of the US national intelligence community to ensure that the good analytic processes demonstrated in these ICAs would dominate the future of intelligence analysis.
The most interesting number, however, is the combined score for the three documents. Out of the 76 estimative conclusions made in the three reports, 43 (57%) were correct and 33 (43%) incorrect. Is this a good score or a bad score? Such a result is likely much better than mere chance, for example. For each judgment made there were likely many reasonable hypotheses considered. If there were only three reasonable hypotheses to consider in each case, the base rate would be 33%. On average, the analysts involved were able to nearly double that "batting average".
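These tallies are simple enough to reproduce with a short script. The sketch below uses the (true, false) counts reported above for each document; note that the dissent rule described earlier (counting dissented conclusions as both true and false) is already baked into these counts:

```python
# Counts of (true, false) estimative conclusions per document,
# as tallied in this post. Dissented conclusions have already been
# counted as both true and false per the scoring rule above.
scores = {
    "Iraq WMD NIE": (5, 28),
    "Regional Consequences ICA": (23, 1),
    "Post-Saddam Iraq ICA": (15, 4),
}

for name, (true_ct, false_ct) in scores.items():
    total = true_ct + false_ct
    print(f"{name}: {true_ct}/{total} correct ({true_ct / total:.0%})")

# The combined "batting average" across all three documents
true_all = sum(t for t, _ in scores.values())
total_all = sum(t + f for t, f in scores.values())
print(f"Combined: {true_all}/{total_all} correct ({true_all / total_all:.0%})")
# -> Combined: 43/76 correct (57%)
```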

Likewise it is consistent with both hard and anecdotal data of historical trends in analytic forecasting. Mike Lyden, in his thesis on Accelerated Analysis, calculated that, historically, US national security intelligence community estimates were correct approximately 2/3 of the time.

Former Director of the CIA, GEN Michael Hayden, made his own estimate of analytic accuracy in May of last year: "Some months ago, I met with a small group of investment bankers and one of them asked me, 'On a scale of 1 to 10, how good is our intelligence today?' I said the first thing to understand is that anything above 7 isn't on our scale. If we're at 8, 9, or 10, we're not in the realm of intelligence—no one is asking us the questions that can yield such confidence. We only get the hard sliders on the corner of the plate."

Given these standards, 57%, while a bit low by historical measures, certainly seems to be within normal limits and, even more importantly, consistent with what the US has routinely expected from its intelligence community.

Tomorrow: Final Thoughts

Friday, February 6, 2009

Part 7 -- The Iraq WMD Estimate And Other Iraq Pre-War Assessments (Evaluating Intelligence)

Part 1 -- Introduction
Part 2 -- A Tale Of Two Weathermen
Part 3 -- A Model For Evaluating Intelligence
Part 4 -- The Problems With Evaluating Intelligence Products
Part 5 -- The Problems With Evaluating Intelligence Processes
Part 6 -- The Decisionmaker's Perspective

Perhaps the most famous document leading up to the war in Iraq is the much-maligned National Intelligence Estimate (NIE) titled Iraq's Continuing Programs for Weapons Of Mass Destruction completed in October, 2002 and made public (in part) in April, 2004. Subjected to extensive scrutiny by the Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction, this NIE was judged "dead wrong" in almost all of its major estimates.

Far less well known are the two Intelligence Community Assessments (ICA) both completed in January, 2003. The first, Regional Consequences of Regime Change in Iraq, was made public in April, 2007 as was the second ICA, Principal Challenges in Post-Saddam Iraq. Both documents were part of the US Senate's Select Subcommittee on Intelligence report on Pre-War Intelligence Assessments About Post War Iraq and both (heavily redacted) documents are available as appendices to the subcommittee's final report.

The difference between an NIE and an ICA seems modest to an outsider. Both types of documents are produced by the National Intelligence Council and both are coordinated within the US national security intelligence community and, if appropriate, with cleared experts outside the community. The principal differences appear to be the degree of high level approval (NIEs are approved at a higher level than ICAs) and the intended audiences (NIEs are aimed at high level policymakers while ICAs are geared more to the desk-analyst policy level) (Thanks, Elizabeth!).

In this case, there appears to be at least some overlap in the actual drafters of the three documents. Paul Pillar, National Intelligence Officer (NIO) for the Near East and South Asia at the time, was primarily responsible for coordinating (and, presumably, drafting) both of the ICAs. Pillar also assisted Robert D. Walpole, NIO for Strategic and Nuclear Programs, in the preparation of the NIE (along with Lawrence K. Gershwin, NIO for Science and Technology, and Major General John R. Landry, NIO for Conventional Military Issues).

Despite the differences in the purposes of these documents, it is likely safe to say that the fundamental analytic processes -- the tradecraft and evaluative norms -- were largely the same. It is highly unlikely, for example, that standards such as "timeliness" and "objectivity" were maintained in NIEs but abandoned in ICAs.

Why is this important? As discussed in detail in Part 3 of this series, it is important, in evaluating intelligence, to cast as broad a net as possible: to look not only at examples where the intelligence product was false but also at cases where the intelligence product was true and, in turn, examine the process in both cases to determine if the analysts were good or just lucky, or bad or just unlucky. These three documents, prepared at roughly the same time, under roughly the same conditions, with roughly the same resources, on roughly the same target, allow the accuracy of the estimative conclusions in the documents to be compared with some assurance that doing so may help get at any underlying flaws or successes in the analytic process.

Monday: The Score

Wednesday, February 4, 2009

Part 6 -- The Decisionmaker's Perspective (Evaluating Intelligence)

Part 1 -- Introduction

Part 2 -- A Tale Of Two Weathermen
Part 3 -- A Model For Evaluating Intelligence
Part 4 -- The Problems With Evaluating Intelligence Products
Part 5 -- The Problems With Evaluating Intelligence Process

Decisionmakers are charged with making decisions. While this statement is blindingly obvious, its logical extension actually has some far reaching consequences.

First, even if the decision is to "do nothing" in a particular instance, it is still a decision. Likewise, with the authority to make a decision comes (or should come) responsibility and accountability for that decision's consequences (The recent kerfuffle surrounding the withdrawal of Tom Daschle from consideration for an appointment in the Obama cabinet is instructive in this matter -- see the video below).

Driving these decisions are typically two kinds of forces. The first is largely internal to the individual or organization making the decision. The focus here is on the capabilities and limitations of the organization itself: How well-trained are my soldiers? How competent are my salespeople? How competitive are my prices? How efficient are my logistics? How well equipped are my police units? Good decisionmakers are often comfortable here. They know themselves quite well. Oftentimes they have risen through the ranks of an organization or even started the organization on their own. The internal workings of a decisionmaker's own organization are easiest (if not easy) to see, predict and control.

The same cannot be said of external forces. The current upheaval in the global market is likely, for example, to affect even good, well-run businesses in ways that are extremely difficult to predict, much less control. The opaque strategies of state and non-state actors threaten national security plans and the changing tactics of organized criminal activity routinely frustrate law enforcement professionals. Understanding these external forces is a defining characteristic of intelligence and the complexity of these forces is often used to justify substantial intelligence budgets.

Experienced decisionmakers do not expect intelligence professionals to be able to understand external forces to the same degree that it is possible to understand internal forces. They do expect intelligence to reduce their uncertainty, in tangible ways, regarding these external forces. Sometimes intelligence provides up-to-date descriptive information, previously unavailable to the decisionmaker (such as the U2 photographs in the run-up to the Cuban Missile Crisis). Decisionmakers, however, find it much more useful when analysts provide estimative intelligence -- assessments about how the relevant external forces are likely to change.

  • Note: I do not intend to address concerns regarding bad or stupid decisionmakers in this series of posts, though both clearly exist. These concerns are outside the scope of a discussion about evaluating intelligence and fall more naturally into the realms of management studies or psychology. I have a slightly different take on inexperienced (with intel) decisionmakers, however. I teach my students that intel professionals have an obligation to teach the decisionmakers they support about what intel can and cannot do in the same way the grizzled old platoon sergeant has an obligation to teach the newly minted second lieutenant about the ins and outs of the army.
Obviously then, knowing, with absolute certainty, where the relevant external forces will be and what they will be doing is of primary importance to decisionmakers. Experienced decisionmakers also know that to expect such precision from intelligence is unrealistic. Rather, they expect that the estimates they receive will only reduce their uncertainty about those external forces, allowing them to plan and decide with greater clarity.

Imagine, for example, a battalion commander on a mission to defend a particular piece of terrain. The intelligence officer tells the commander that the enemy has two primary avenues of approach, A and B, and that it is "likely" that the enemy will choose avenue A. How does this intelligence inform the commander's decision about how to defend the objective?

For the sake of argument, let's assume that the battalion commander interprets the word "likely" as meaning "about 60%". Does this mean that the battalion commander should allocate about 60% of his forces to defending Avenue A and 40% to defending Avenue B? That is one solution, but there are many, many situations in which such a decision would make no sense at all. The worst case scenario for the battalion commander is if he only has enough forces to adequately cover one of the two avenues of approach. In this case, splitting his forces at all will guarantee failure.

Assuming an accurate analytic process and all other things being equal (and I can do that because this is a thought experiment), the commander should align his forces along Avenue A in this situation. This gives him the best chance of stopping the enemy forces. This decisionmaker, with his limited forces, is essentially forced by the situation to treat a 60% probability as 100% accurate for planning purposes. Since many decisions are (or appear to the decisionmaker to be) of this type, it is no wonder that decisionmakers, when they evaluate intelligence, tend to focus on the ability to discriminate between possible outcomes over the ability to calibrate the estimative conclusion.
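A toy expected-value calculation makes the commander's choice concrete. The sketch below takes the 60/40 probabilities and the worst-case assumption (the force only succeeds on an avenue where it is fully concentrated; any split is inadequate everywhere) from the thought experiment above; the success function itself is my own simplification:

```python
# Hypothetical sketch: enemy attacks down Avenue A with p = 0.6.
# Assumption from the thought experiment: the commander has only enough
# force to cover one avenue, so any split of forces fails on both.
P_A = 0.6  # probability the enemy uses Avenue A

def p_success(share_on_a: float) -> float:
    """Probability of stopping the enemy, given the share of force on A."""
    if share_on_a == 1.0:   # all forces on A: succeed iff enemy uses A
        return P_A
    if share_on_a == 0.0:   # all forces on B: succeed iff enemy uses B
        return 1 - P_A
    return 0.0              # any split is inadequate on both avenues

for share in (1.0, 0.6, 0.0):
    print(f"{share:.0%} of force on A -> P(success) = {p_success(share):.0%}")
```

Concentrating on A yields a 60% chance of success, concentrating on B yields 40%, and the "proportional" 60/40 split yields 0% -- which is why, under these constraints, the commander must plan as if the 60% estimate were certain.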

Tomorrow: The Iraq NIEs


Tuesday, February 3, 2009

CIA National Clandestine Service Ad On YouTube (YouTube via Got Geoint?)

The US Geospatial Intelligence Foundation's surprisingly hip Got Geoint? blog pointed to an interesting recruiting ad from the CIA (see below). The content is about what you would expect but the venue -- YouTube -- seems new. Worth a look (Another thing worth a look is Got Geoint? It is destined to be the Danger Room of the Geoint beat...)