Showing posts with label WEPs. Show all posts

Monday, May 19, 2008

Saying One Thing And Doing Another: A Look Back At Nearly 60 Years Of Estimative Language (Original Research)

US News and World Report has an interesting story about the current state of intelligence reform. According to the article, CIA Director Mike Hayden said,

  • "Some months ago, I met with a small group of investment bankers and one of them asked me, 'On a scale of 1 to 10, how good is our intelligence today?'" recalled Hayden. "I said the first thing to understand is that anything above 7 isn't on our scale. If we're at 8, 9, or 10, we're not in the realm of intelligence—no one is asking us the questions that can yield such confidence. We only get the hard sliders on the corner of the plate. Our profession deals with subjects that are inherently ambiguous, and often deliberately hidden. Even when we're at the top of our game, we can offer policymakers insight, we can provide context, and we can give them a clearer picture of the issue at hand, but we cannot claim certainty for our judgments."
(For those of you keeping score at home, Hayden said much the same thing last year during an interview with CSPAN...)

Frankly, I don't know anyone knowledgeable about the strengths and weaknesses of intelligence who doesn't agree with this statement. Certitude is impossible. That is what makes the chart below so darn interesting:


The chart is from Rachel Kesselman's recently completed thesis, Verbal Probability Expressions In National Intelligence Estimates: A Comprehensive Analysis Of Trends From The Fifties Through Post 9/11. The chart shows the number of times the word "will" has been used in an estimative sense (e.g., "X will happen") in the Key Judgments of the 120 National Intelligence Estimates (NIEs) -- 20 per decade over the last 58 years -- that she examined.

In fact, at 717 times, the word "will" was the single most commonly used estimative word, by a very large margin, in NIEs. Not only was it the single most commonly used word, it was also one of the most consistently used words across the decades (tests Rachel ran showed that the variances across the decades were not statistically significant).

So...if certitude is impossible, why does the Intelligence Community use "will" -- a word that reeks of certitude -- so often in its estimates? Such a result is absolutely inconsistent with statements, such as Hayden's above, made by virtually everyone who has ever jumped up to defend intelligence's predictive track record.

This was only one of the many fascinating results that came out of Rachel's exhaustive study of the words that analysts have used over the years to verbally express probabilities.

Rachel's lit review, for example, makes for very interesting reading. She has done a thorough search of not only the intelligence but also the business, linguistics and other literatures in order to find out how other disciplines have dealt with the problem of "What do we mean when we say something is 'likely'..." She uncovered, for example, that, in medicine, words of estimative probability such as "likely", "remote" and "probably" have taken on more or less fixed meanings due primarily to outside intervention or, as she put it, "legal ramifications". Her comparative analysis of the results and approaches taken by these other disciplines is required reading for anyone in the Intelligence Community trying to understand how verbal expressions of probability are actually interpreted.

Another of my favorite charts is the one below:


This chart examines the use of the NIC's nine currently "approved" words of estimative probability (see page 5 of this document for additional discussion) across the decades. The NIC's list only became final in the last several years, so it might be arguable whether this list of nine words really captures the breadth of estimative word usage across the decades. Any such argument collapses, though, in the face of this chart, which makes it crystal clear that the Intelligence Community has relied on just two words, "probably" and "likely", to express its estimates of probability for the last 60 years. All other words are used rarely or not at all.

Based on her research of what works and what doesn't and which words seem to have the most consistent meanings to users, Rachel even offers her own list of estimative words along with their associated probabilities:


Rachel's work tracks well with my own examination of word usage in recent NIEs and with some of the findings in Mike Lyden's thesis on Accelerated Analysis, but her thesis really stands on its own and my brief description and summary of some of the highlights does not do it justice. It is a first-of-its-kind, longitudinal study of estimative word usage by the intelligence community and has contributed significantly to my own understanding of where the Intelligence Community has been over the last 58 years. I think readers of this blog will be more than a little interested in her results and recommendations as well.

Related Posts:
The Revolution Begins On Page Five...
Accelerated Analysis: A New And Promising Intelligence Process
What Do Words Of Estimative Probability Mean?

Monday, March 24, 2008

What Do Words Of Estimative Probability Mean? (Final Version With Abstract)

Abstract:

The value of Words of Estimative Probability (WEPs) is, of course, an ongoing question both within the intelligence community and among its critics. At one end of the spectrum are those who call for numeric estimates. At the other end of the spectrum are those who believe that it doesn't matter what an analyst says; policymakers and others will interpret the analysis however they wish. The intelligence community (IC) has recently moved further in the direction of a position that is clearly on the more formal side of the spectrum as the "best practice" for effectively communicating the results of intelligence analysis to decisionmakers. Much of the reason for using WEPs instead of numbers centers around the imprecise nature of intelligence analysis in general, coupled with the misunderstandings that could arise in the minds of decisionmakers if analysts used numbers to communicate their estimative judgments. A large part of the argument against WEPs, on the other hand, has to do with the imprecise meaning of the words themselves. In other words, what exactly does "likely" mean? Exploring these ideas and how best to teach them to Intelligence Studies students is the purpose of this article.

PDF Version (Pre-pub/Complete)

HTML Version:
Part 1 -- Introduction
Part 2 -- To Kent And Beyond
Part 3 -- The Exercise And Its Learning Objectives
Part 4 -- Teaching Points
Part 5 -- A Surprise Ending

Sunday, March 23, 2008

The Revolution Begins On Page Five: The Changing Nature Of The NIE And Its Implications For Intelligence (Final Version With Abstract)

Abstract:

There has been a good bit of discussion in the press and elsewhere concerning the recently released National Intelligence Estimate (NIE) on Iran’s nuclear program.
Virtually all of this commentary has focused on the facts, sources and logic – the content – of the estimate. It is my position that, while the content is fascinating, the most interesting story behind the NIE has to do with the changes in form that this latest NIE has adopted; that what the National Intelligence Council (NIC) has said is, in many ways, less interesting than the way it has decided to say it. This shift in form implies a new, emerging theory of intelligence – what intelligence is and how to do it – that is likely to influence intelligence communities worldwide. "Emerging", however, is the key term here. As this article will highlight, the revolution may have begun but it is far from complete.

PDF Version (Pre-pub/Complete)

HTML Version:
Part 1 -- Welcome To The Revolution
Part 2 -- Some History
Part 3 -- The Revolution Begins
Part 4 -- Page Five In Detail
Part 5 -- Enough Exposition, Let's Get Down To It...
Part 6 -- Digging Deeper
Part 7 -- Looking At The Fine Print
Part 8 -- Confidence Is Not the Only Issue
Part 9 -- Waffle Words And Intel-Speak
Part 10 -- The Problem With “If”
Part 11 -- One More Thing
Part 12 -- Final Thoughts
Epilogue

Tuesday, March 4, 2008

Part 5 -- A Surprise Ending (What Do Words Of Estimative Probability Mean?)

Part 1 -- Introduction
Part 2 -- To Kent And Beyond
Part 3 -- The Exercise And Its Learning Objectives
Part 4 -- Teaching Points

So far in this series, I have discussed the issues surrounding the use of Words Of Estimative Probability as a way of communicating the results of intelligence analysis to real-world decisionmakers. I have tried to devise an exercise that can demonstrate to intelligence studies students that, while a consistent and limited series of so called "good" WEPs (like the ones the National Intelligence Council (NIC) has adopted for use in its recent National Intelligence Estimates (NIEs)) constitute the current "best practice" in communicating the results of analysis, it is far from a perfect system. Studies both within the intelligence community and from fields such as medicine, finance and meteorology have all demonstrated that people assign only roughly consistent meanings to WEPs -- that one person's "likely" is another person's "virtually certain".

As I began to look at the data from my recent round of this classroom exercise, I began to notice something interesting, though. There seemed to be a level of consistency in the data that I had not noticed before. Was it there previously and I just missed it? I don't know. I don't typically keep the data from these exercises and the only reason I had this batch of data was because it was buried in one of the many piles of paper I have in my office (I believe in that ancient organizational system -- mounding).

I decided to take a closer look at the data. I was surprised by what I saw. While some individuals were throwing the full range out of whack (and keeping the teaching points in the exercise relevant), these were clearly statistical outliers. The bulk of the students were congregating quite nicely around an approximately ideal trendline. To be sure, the results were still off in places, but the results were much closer to optimal than I expected.

I have reproduced the aggregate results in a chart below. I have used what financial analysts call a high-low-close chart that marks the average high score, the average low score and the average point value for each WEP. I have also included the idealized trendline and have connected the high and low averages so you can see how the range fluctuates as the probabilities associated with each WEP increases.


If you want to see the raw data, I have included it in the chart below:

(Notes on the chart: The "High" column represents the average high score while the "Low" column represents the average low score for each WEP. The "Odds" column represents the average point value given for each WEP. The "High-Low" column represents the range (difference between high and low score) for each WEP. The "Odds-odds" column represents the difference between the average point value from one WEP to another. N=18)
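For the curious, the aggregation behind a chart like this can be sketched in a few lines of code. This is only an illustration of the method described in the notes above; the sample numbers below are invented for demonstration, not the actual exercise data:

```python
# Aggregate raw WEP exercise responses into the three values plotted in a
# high-low-close chart: the average low score, the average point value
# ("close"), and the average high score for each WEP.

def aggregate(responses):
    """responses: {wep: [(low, point, high), ...]} ->
    {wep: (avg_low, avg_point, avg_high)}"""
    out = {}
    for wep, rows in responses.items():
        n = len(rows)
        avg_low = sum(r[0] for r in rows) / n
        avg_point = sum(r[1] for r in rows) / n
        avg_high = sum(r[2] for r in rows) / n
        out[wep] = (round(avg_low, 1), round(avg_point, 1), round(avg_high, 1))
    return out

# Invented responses from two hypothetical students:
sample = {
    "remote": [(0, 1, 5), (0, 2, 10)],
    "even chance": [(45, 50, 55), (40, 50, 60)],
}
print(aggregate(sample))
```

From the aggregated low/close/high triples, the chart itself is just a matter of plotting the three series per WEP and connecting the highs and the lows.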

While I know there are statistical nuances that I have not accounted for in the way I have calculated and displayed the data, the overall pattern seems to suggest to me that there may be something interesting going on here. We can be pretty adamant about the use of good WEPs here at Mercyhurst. The students in this exercise have been exposed to that thinking and it seems to have calibrated their use of WEPs to a certain degree.

There is, in fact, precedent for this kind of calibration. According to Rachel Kesselman's early results, the medical profession, with outside pressure from the insurance industry, has adopted a more or less "accepted" meaning for a number of WEPs (used primarily in prognostic statements to patients and their families). The same thing might well be happening here (Note: My colleague, Steve Marrin, has done a number of papers on the more general aspects of the medical analogy to the intelligence profession. All are worth checking out).

The key seems to be, in all these cases, outside pressure. In the case of our students the pressure comes from the professors. In the case of the medical profession, the pressure comes from the insurance companies. I have already argued that the potential for public exposure of the results of NIEs is one of the primary drivers behind a more consistent and rigorous approach to the communication of estimates in general. It may well be that this potential for public exposure will force the meanings of WEPs to collapse around certain estimative ranges as well.

Sunday, March 2, 2008

Part 4 -- Teaching Points (What Do Words Of Estimative Probability Mean?)

Part 1 -- Introduction
Part 2 -- To Kent And Beyond
Part 3 -- The Exercise And Its Learning Objectives

Given the withering criticism offered by Kent and Schrage and the wide range of other studies regarding the appropriate interpretation of Words of Estimative Probability (WEPs), it is fairly easy to get intelligence studies students to see the problems with using "bad" WEPs in their estimative statements. Bad WEPS, which include such words as "could", "may", "might" and "possible", convey such a broad range of probabilities that, in the best case, they do little to reduce a decisionmaker's uncertainty concerning an issue and, at worst, create the sense, in the decisionmaker's mind, that the analyst is simply trying to cover his or her backside in the event of a failed estimative conclusion.

Student analysts, then, are generally happy to see that the National Intelligence Council (NIC) has "solved" this problem with their notional scale of appropriate WEPs (the scale is available on page five of the latest Iran NIE and was discussed earlier in this series). This scale not only provides adequate gradations of probability (translated into words, of course) but also avoids the use of either numbers or bad WEPs; both of which, for different reasons, appear to be goals of the NIC in these public documents.

While there are many possible ways to explore with students the data generated by the exercise described in Part 3, my primary teaching point is to disabuse entry level analysts of the idea that the problems regarding communicating estimative conclusions to decisionmakers have been, in any way, "solved". Rather, I want my students to come away with the idea that using WEPs in a more-or-less formal way, while currently the best practice, is a system that can still be improved upon; that it is an important question of intelligence theory that deserves additional research and study.

I generally start the review of the results of the exercise by exploring how "rational" (in a classical economic sense) the students were in assigning point values and ranges to the various WEPs. I point out that the words are clearly listed in order of increasing likelihood and that it makes sense, absent other information, to assign levels of probability at equal intervals to each of the words. There are eight words and 100 possible percentage points, so a wholly "rational" person would place each word about 12.5 percentage points apart. When you ask students, however, to look at the differences between the point values of each word, they will typically see nothing that comes even close to this rational approach. The vast majority of students will have assigned probabilities intuitively, with little regard for the mathematical difference between one word and another.
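The "rational" spacing is easy to verify with a quick calculation. This sketch assumes the eight-word scale used in the exercise (with "probably" and "likely" separated into their own rows) and simply places each word at the midpoint of an equal-width bin on the 0-100 scale:

```python
# Equal-interval ("rational") point values for eight WEPs on a 0-100 scale:
# give each word the midpoint of an equal-width bin, so successive words
# sit exactly 12.5 percentage points apart.

weps = ["remote", "very unlikely", "unlikely", "even chance",
        "probably", "likely", "very likely", "virtually certain"]

width = 100 / len(weps)                       # 12.5 points per word
points = [width * (i + 0.5) for i in range(len(weps))]

for word, p in zip(weps, points):
    print(f"{word:>18}: {p:.2f}%")

# The gap between each consecutive pair of words:
gaps = [round(b - a, 2) for a, b in zip(points, points[1:])]
```

Note that this puts "even chance" at 43.75% rather than 50%, which previews the distribution problem discussed below: equal spacing and a literal 50-50 reading of "even chance" cannot both hold.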

The results are even worse when you ask students to look at the range of values for each word. Again, the rational person would have assigned equal ranges for each of the words but students typically do not. A good exercise to do at this point is to pick a word and find out who in the class had the lowest score and who in the class gave the highest score and to then ask the students to justify their decisions for doing so. This range is typically quite broad and the justifications for selecting one number over another are typically quite vague.

Inevitably, there will be a handful of students in each class who have, in fact, done the math and calculated both the point values and the ranges accordingly. This exercise offers two places to highlight the problems with this approach. First, the exercise separates out the words "probably" and "likely". That is not the case with the NIC's chart which treats the two words as synonymous. While it is quite surprising for the NIC to treat these words this way since much of the literature does not indicate that people actually see them as synonymous, the net effect in this exercise is to create a learning opportunity. It is rare for a student to have taken into account the idea that two words may be partly or largely synonymous in their mathematical calculations.

Likewise, there is an even better chance for learning in examining the results for the "even chance" WEP. "Even chance" would appear to mean exactly what it says -- an even chance, 50-50. Some students will inevitably interpret it in this literal way and assign a point probability of 50% to the WEP and also mark both its high and low scores at 50%. Other students will see the phrase more generally and, while typically giving it a point value of 50%, will also include a range of values around it such that "even chance" could mean anything from 40-60%! Of course, there is no right answer here, both sides can make valid arguments, and fomenting this discussion is the ultimate point of this part of the exercise.

The relative firmness of "even chance" coupled with the synonymity problem described earlier also lends itself to a further examination of the mathematical approach. Few of the mathematicians in the class will have noticed that there are three WEPs below even chance and four above it, creating an uneven distribution centering on the 50% (more or less) probability ascribed to the phrase "even chance". A wholly logical approach would lead to an uneven distribution of both the point values and the ranges for those WEPs below "even chance" when compared with those WEPs above it.

Students are typically confused by the end of this exercise. While they do (or should) fully understand the problems with waffle words such as "could", "may", "might" and "possible", and were willing to applaud the NIC's efforts at standardization, they now see these "approved" words as far more squishy than they had previously thought. Good. This is exactly the time to reinforce the message laid out at the beginning of this post; to bring students back full circle. As analysts, they have an obligation to communicate as effectively as possible the results of their intelligence analysis to decisionmakers. What this exercise and the learning that went on before it demonstrate is that there is not yet a perfect way to do this; there is only a best practice that tries to balance the competing concerns. In my mind, it is the degree to which students come to understand not only the best practice but also these concerns that marks the difference between a well-trained analyst and a well-educated one.

Tomorrow -- A Surprise Ending

Friday, February 29, 2008

Part 3 -- The Exercise And Its Learning Objectives (What Do Words Of Estimative Probability Mean)

Part 1 -- Introduction
Part 2 -- To Kent And Beyond

The issue of the use of Words Of Estimative Probability (WEPs) is one of the most significant theoretical issues in the intelligence profession. What is the best way to communicate the results of intelligence analysis to decisionmakers? If it is to be through WEPs, shouldn’t we know what they mean? This is why I think the work of Kent, Heuer, Rieber and, soon, Kesselman, (all referenced in the last post) is so enormously important.

At Mercyhurst, we have been teaching WEPs as the “best practice” for communicating with decisionmakers for at least as long as I have been here (2003) and probably well before that. While we teach it as a best practice, we do not avoid the controversy surrounding this practice. The classroom exercise that I am about to describe is specifically designed to highlight both the strengths and weaknesses of WEPs. My goal is to get my students to understand the limits as well as the utility of WEPs, to get them to think about the boundaries implicit in any theory and not just to “know stuff”.

Therefore, this classroom exercise does not present the meanings of WEPs as a fait accompli to the students. The exercise is designed to capture both the point value (Heuer) and the range of values (Rieber) behind a select series of WEPS. The WEPs I choose to use are those that come directly from the recent series of National Intelligence Estimates (NIEs). These NIEs, which I have discussed in detail earlier, all include a sort of scale that leaves the impression of the probabilities associated with particular words without actually mentioning any numbers. I have included a graphic (taken from the most recent NIE on Iran and its nuclear ambitions) of the scale below.

To set the stage for the exercise, I converted the scale above into the graphic below. Note that I left the two right hand column headings empty and that I separated the words "probably" and "likely" into their own rows. I did this in order to help me make some key teaching points later on.

I hand out this sheet to each student and ask them to state, in terms of a single number (as with the study reported by Heuer), what each word means in terms of probability. I usually give them an example such as: "If you think 'remote' means a 1% chance of whatever it is you are studying happening, then write '1' in the block for remote." I always choose "remote" or "virtually certain" for these examples as I know I run the risk of anchoring the students when I give such an example, and I figure it is safest to anchor at the extremes where it is less likely to influence the overall outcome.

Once all of the students have filled in the first column, I ask them to label the next two columns, "Low" and "High". I ask them to write the lowest and the highest percentage they would assign to each word in those two columns. Once they have completed this task, I ask them to calculate the difference between each word in the "odds" column (For example, if a student wrote 1 for "remote" and 20 for "very unlikely" then the difference would be 19). I also ask them to calculate the range of their answers for each word (For example, if the low score for "very unlikely" was 10 and the high score was 30, then the range would be 20). In this part of the exercise, I am clearly mirroring the study reported by Rieber.
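The two calculations the students perform can be expressed in a few lines. The numbers below are the ones from the examples in the instructions above, not real student data:

```python
# Per-student calculations from the exercise: the "odds" differences
# between successive WEPs, and the high-low range for each WEP.
# Figures follow the examples in the text: remote = 1 and
# very unlikely = 20 (difference 19); very unlikely low = 10,
# high = 30 (range 20).

point_values = {"remote": 1, "very unlikely": 20}
ranges = {"very unlikely": (10, 30)}

def odds_differences(points):
    """Difference between each consecutive pair of point values."""
    vals = list(points.values())
    return [b - a for a, b in zip(vals, vals[1:])]

def wep_range(low, high):
    """Spread between the highest and lowest percentage for one WEP."""
    return high - low

print(odds_differences(point_values))        # [19]
print(wep_range(*ranges["very unlikely"]))   # 20
```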

Handing out the forms, explaining the instructions and actually having the students fill in the sheets can take as little as 5 or as many as 15 minutes, depending on the types of students you have and their level of sophistication with WEPs in general. In the last class where I used this specific exercise, I think it took all of 5 minutes, but that class was quite bright and very used to the concept of WEPs. Once all the numbers have been entered and the calculations are complete, it is time to start making the teaching points, which I will discuss in the post on Monday.

Monday -- Teaching Points

Thursday, February 28, 2008

Part 2 -- To Kent And Beyond (What Do Words Of Estimative Probability Mean?)

Part 1 -- Introduction

The discussion of Words of Estimative Probability (WEPs) starts with Sherman Kent’s seminal essay on the topic but hardly ends there. Linguistics experts have done a large number of studies on what they refer to (among other things) as “verbal expressions of probability”, “verbally expressed uncertainties” or “verbal probability expressions”. Others, in the fields of finance, health (Thanks, Rob!) and meteorology have also wrestled with this question.

I am advising one of our graduate students, Rachel Kesselman, on her thesis which will address all these literatures at some length. She is scheduled to present her preliminary findings at the ISA conference at the end of March and will likely complete her thesis (which focuses on the historical use of WEPs in National Intelligence Estimates) sometime in May or June. I won’t steal her thunder, then, but suffice it to say that this is a well studied topic outside the IC.

Within the IC, though, there appears to be a limited number of studies on the topic. Steve Rieber presented his own paper on the meaning of WEPs a couple of years ago at the ISA conference. At the time, he cited only two studies as major research findings within the realm of intelligence analysis: One in Dick Heuer’s classic, The Psychology Of Intelligence Analysis, and one (at least part of the basis for Rieber's paper) from a study of Kent School analysts. In the study cited from Heuer, analysts gave a single numerical probability for each word. For example, one analyst might claim that the word “likely” suggests a 75% probability while another might claim that it suggests only a 60% probability. Kent School analysts, on the other hand, were asked to give a range of values for each word. The charts showing both results are below (Heuer's is on top and Rieber's is on bottom).



The conclusion from both studies was that the level of agreement was rough, to say the least. There was a distinct difference between words at either end of the spectrum (such as “highly unlikely” and “highly likely”) but differences between words that were closer together in meaning (such as “probably” and “likely”) hardly seemed to be differences at all.

Other writers have tried to establish more or less fixed statistical meanings for the words by simply declaring that certain words have certain probabilistic meanings. Kent's own attempt fell much along these lines, as does the recent attempt (Thanks, Ted!) by the authors of Joint Publication 2-0, "Joint Intelligence", Appendix A (published 22 JUN 07). The fundamental problem with dictating these intervals is that it ignores the considerable evidence (including the two studies cited above) suggesting that people don't think about these words in such rigid ways. (The problems with the Joint Pub run even deeper: it unnecessarily confuses the ideas of probability and confidence and is, as a consequence, 180 degrees out from what the National Intelligence Council was promulgating at approximately the same time! All this argues, I might add, for more research into intelligence theory and, in the interim, some standardized estimative language that reflects the current best practice.)

What is clear, however, is that decisionmakers want clarity and consistency in the language of intelligence estimates. One of our former grad students, Jen Wozny, wrote a very strong thesis on this subject a number of years ago (available, unfortunately, only through inter-library loan at Mercyhurst's Hammermill Library). She looked at what over 40 decisionmakers, from the national security, business and law enforcement fields, wanted from intelligence. Two of the items that consistently popped up were clarity and consistency in the language that intelligence analysts used to communicate the results of their analysis. Peter Butterfield, in a comment to yesterday's introductory post, indicated similar concerns on the part of his decisionmakers.

Tomorrow -- The Exercise And Its Learning Objectives

Wednesday, February 27, 2008

What Do Words Of Estimative Probability Mean? (Part 1 -- Introduction)

(Note: This is another attempt at what I call "experimental scholarship" (See this series for my first attempt). The discussion regarding the use of blogs as a way to publish scholarly works (or, in my case, more-or-less scholarly works...) is pretty hot and heavy right now. However, I found writing an article in the form of a series of blog posts extraordinarily useful the first time, if only for the comments that I received that I am sure will make any traditional journal article just that much better. It was the positive feedback I received from that experience that makes me want to give it another go.)


I was cleaning my office this week in anticipation of a new term (we are on a quarter system at Mercyhurst) and I ran across the results of a classroom exercise I conduct regarding the meaning of words of estimative probability (such as “likely” or “virtually certain”) or as they are commonly referred to around here, WEPs. I thought some discussion of the exercise I use and the results of that exercise would be of interest to intelligence studies students and educators.

The value of WEPs is, of course, an ongoing question both within the intelligence community and among its critics. At one end of the spectrum are those, like Michael Schrage, who call for numeric estimates -- x has a 75% chance of happening plus or minus 10%, that sort of thing. At the other end of the spectrum are those whom Sherman Kent called "poets", who believe that it doesn't matter what an analyst says; policymakers and others will interpret the analysis however they wish. The intelligence community (IC) has recently moved further in the direction of a position that, while not quite as extreme as Schrage's, is clearly on that side of the spectrum, adopting it as the "best practice" for effectively communicating the results of intelligence analysis to decisionmakers.

Much of the reason for using WEPs instead of numbers centers around the imprecise nature of intelligence analysis in general, coupled with the misunderstandings that could arise in the minds of decisionmakers if analysts used numbers to communicate their estimative judgments. A large part of the argument against WEPs, on the other hand, has to do with the imprecise meaning of the words themselves. In other words, what exactly does ‘likely” mean? That is where I intend to go next.

Tomorrow -- To Kent And Beyond!

Friday, January 11, 2008

Part 8 -- Confidence Is Not the Only Issue (The Revolution Begins On Page Five: The Changing Nature Of The NIE And Its Implications For Intelligence)

Part 1 -- Welcome To The Revolution
Part 2 -- Some History
Part 3 -- The Revolution Begins
Part 4 -- Page Five In Detail
Part 5 -- Enough Exposition, Let's Get Down To It...
Part 6 -- Digging Deeper
Part 7 -- Looking At The Fine Print

Part 8 -- Confidence Is Not the Only Issue

Some 29% of the sentences in the Iran National Intelligence Estimate (NIE) do contain Words of Estimative Probability (WEPs), however. As the chart below shows, this is pretty much in line with other NIEs. The chart outlines the number of uses of a particular word in an estimative sense in each of the eight NIEs I examined. Again, I only looked at the words in the Key Judgments (not in any of the prefatory matter or in any of the full text or appendices). The column on the far right shows the percent of the time a particular WEP showed up in NIEs generally. In other words, "probably" was used in 33 sentences and there were 263 sentences total in the 7 NIEs examined, so it showed up about 13% of the time. I am also well aware that such a simple review is fraught with difficulty given the complexity of the English language but, since I am only looking for broad trends, I believe that such a review is an appropriate method for analyzing the way in which these estimates were written and the way in which they are changing.
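The percentage quoted above is straightforward arithmetic; here is a sketch of the counting method, using the figures from the text (33 sentences containing "probably" out of 263 total):

```python
# Frequency of an estimative word across NIE Key Judgments:
# sentences containing the word divided by total sentences examined.
# Figures from the text: "probably" appeared in 33 of 263 sentences.

def wep_frequency(sentences_with_wep, total_sentences):
    """Percent of examined sentences in which a given WEP appeared."""
    return 100 * sentences_with_wep / total_sentences

pct = wep_frequency(33, 263)
print(f"'probably' appeared in {pct:.1f}% of sentences")  # prints 12.5%
```

Rounded, that is the "about 13% of the time" figure cited above.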





In fact, the Iran NIE is well within the range of other NIEs with respect to the percent of sentences containing WEPs. Furthermore, the Iran NIE does not use any "unauthorized" WEPs. That is to say, only WEPs specifically listed on the Explanation of Estimative Language (EEL) page are used in the Iran NIE. This was not the case in previous NIEs, which used (though not often) statements that were undefined at best and misleading at worst. Consider the use of "most likely" in the August 2007 update to "Prospects for Iraq's Stability":

  • We judge such initiatives are most likely to succeed in predominantly Sunni Arab areas, where the presence of AQI elements has been significant, tribal networks and identities are strong, the local government is weak, sectarian conflict is low, and the ISF tolerate Sunni initiatives, as illustrated by Al Anbar Province.

“Most likely” could mean many things in this context since there is no baseline probability with which to compare it. The initiatives referenced in the report could be likely to succeed or unlikely to succeed; the reader cannot know from the text. All we can know is that they are "most likely" to succeed in the predominantly Sunni areas. Other formulations, such as “much less likely” and “increasingly likely”, suffer from the same problem. “Not likely” is the only place where I am clearly quibbling, as it is obviously synonymous with “unlikely”. I just think it is silly to state that the authors intend to use “unlikely” on page 5 (the EEL page) and then ignore that and use “not likely” in the text. If the two are truly synonymous, then use the one you said you were going to use. If they aren’t synonymous, then explain the difference. You can’t have it both ways.

Beyond the mere use of WEPs, there also appears to be an issue with which WEPs predominate. Again, there is a strong pattern – a clear preference over the last six public NIEs for the word “probably”. In fact, 73% of authorized WEPs, and 62% of all WEPs, used in the last six NIEs are “probably”. It is also interesting to note that the only non-millennial NIE examined, the 1990 Yugoslavia NIE, did not use “probably” at all (whether this pattern holds, and whether it is a good thing, I will leave to other researchers).

If the analysts involved in these estimates genuinely believe that all these events are “probable” and not somewhat more or less likely, then there is little to discuss. The extreme overuse of the term suggests other explanations, however. "Probably" is arguably one of the broadest WEPs in terms of meaning (see Figure 1 in the paper linked here). Fairly clearly, it means that the odds are above even chance, but it seems open to interpretation from there.

Thus, analysts could be using "probably" as an analytic safe haven. Relatively certain that the odds are above 50% but unwilling to be more aggressive and use a phrase such as “highly likely” or “virtually certain” and unaware or unable to use expressions of confidence to appropriately nuance these more aggressive terms, these analysts default to “probably”. Since the NIE is a consensus estimate combining input from all 16 intelligence agencies, it is also possible that "probably" was the one word upon which everyone could agree; that it represents, essentially, a compromise position. Either way, such a move is “safe” in terms of getting the answer broadly correct but hurts the decisionmaker who, in the end, must take action and allocate resources. If analysts are more certain than they are willing to put in writing, the decisionmaker is deprived of the analysts’ best judgment and will arguably make less informed decisions.

(Note: The statistical analogy to the issue described above is the classic problem of calibration versus discrimination. For additional insights into this issue, I refer you to Philip Tetlock’s book Expert Political Judgment or to this site.)

Monday: Part 9 -- Waffle Words And Intel-Speak

Wednesday, January 9, 2008

Part 6: Digging Deeper (The Revolution Begins On Page Five: The Changing Nature Of The NIE And Its Implications for Intelligence)

Part 1 -- Welcome To The Revolution
Part 2 -- Some History
Part 3 -- The Revolution Begins
Part 4 -- Page Five In Detail
Part 5 -- Enough Exposition, Let's Get Down To It...

Part 6 -- Digging Deeper

There are some disturbing trends in other numbers collected from the Iran National Intelligence Estimate. For example, 71% of the sentences in the Iran NIE contain one of the three statements “we assess”, “we judge” or “we estimate”. As you will recall, this is the way the Intelligence Community (IC) indicated it would preface its estimative conclusions. Compare this with the percent of sentences containing statements of confidence: 61%. I could see there being fewer sentences beginning with these three phrases (it would get tedious to constantly see “we assess”, “we estimate” or “we judge”), but how do you get more? It means that at least some sentences marked with the very words the community said it would use to flag its estimates do not also contain statements of confidence.

Not that big of a deal, you say. OK, I agree, but consider this: Only 29% of the sentences in the Iran NIE contain Words of Estimative Probability (WEPs)! That means that there are some, perhaps many, sentences that indicate that they are estimative in nature but are missing one and perhaps both of the other two elements (WEPs or an assessment of confidence) that the Intelligence Community itself said it would use.

It makes my head hurt.

Let’s review the bidding: Up until the Iran NIE was released only several weeks ago, the IC was saying one thing and then doing another with regard to statements of confidence in its estimates. The Iran NIE dramatically reversed this trend and included statements of confidence in almost two-thirds of its sentences… but, while this is an undeniable improvement, there are still numbers that don’t add up.

Tomorrow: Part 7 -- Looking At The Fine Print

Monday, January 7, 2008

Part 4 -- Page Five In Detail (The Revolution Begins On Page Five: The Changing Nature Of The NIE And Its Implications For Intelligence)

Part 1 -- Welcome To The Revolution
Part 2 -- Some History
Part 3 -- The Revolution Begins

Part 4 -- Page Five In Detail

What, then, is so darn unique about page five? While the format and language of the “Explanation of Estimative Language” page (hereinafter the "EEL") have undergone some changes (for the better) over the last four publicly released National Intelligence Estimates (NIEs), all of the estimates that contain such a page make the same three key points:

First, the NIE is…well…an estimate. The authors intend it to be a probabilistic judgment, not a statement of "facts". This may seem obvious but, for many casual readers, there may still be a lingering impression that the CIA, NSA and the other 14 agencies that make up the Intelligence Community are omniscient. Sorry, not the case, and the authors of the NIEs at the National Intelligence Council (NIC) want us to know it.

Second, there is a discussion of Estimates of Likelihood. Specifically, this section talks about what the intelligence community commonly calls Words of Estimative Probability (WEPs -- after the Sherman Kent article of the same name) and what linguistics professionals usually refer to as Verbal Uncertainty Expressions (Thanks, Rachel!). These are words, such as "likely", "probably", or "almost certainly", that convey a sense of probability without coming right out and saying “60%” or whatever.

Noted MIT scholar Michael Schrage came out quite forcefully against this type of estimative language in a Washington Post editorial in 2005. In the same article, he spoke very favorably of using percentages and Bayesian statistical methods to derive them. Despite this kind of criticism, the NIC, in early versions of the EEL page, noted that “Assigning precise numerical ratings to such judgments would imply more rigor than we intended”. While this language was dropped in the Iran NIE (probably due to space constraints), it likely continues to represent the NIC’s position.

Regardless of its desire to avoid numbers, the NIC still effectively benchmarks its WEPs in two ways. First, it makes clear that words such as "probably", "likely", "very likely" and "almost certainly" indicate a greater than even chance (above 50%), while words like "unlikely" and "remote" indicate a less than even chance (below 50%). In addition, the NIC provides a handy scale that, while devoid of numbers, clearly rank orders the WEPs in regular increments. While the rank ordering is more important than the actual increments, early versions have five increments, implying roughly 20% intervals for each word. The most recent version, in the Iran NIE, has seven increments (see the chart below), implying intervals of approximately 14%.


The EEL page also identifies the language the authors will use for improbable but potentially important events. These words and phrases include such old standards as "possible/possibly", "may", and "might" and phrases such as "we cannot dismiss" and "we cannot rule out".
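Since the scale itself is deliberately unnumbered, any percentages attached to it are reader inference. The sketch below simply divides the 0-100 range evenly across the seven terms on the Iran NIE's scale; the resulting intervals are my own back-of-the-envelope reading, not the NIC's:

```python
# The seven terms from the Iran NIE's EEL scale, in rank order.
SCALE = ["remote", "very unlikely", "unlikely", "even chance",
         "probably/likely", "very likely", "almost certainly"]

def approx_interval(wep):
    """Approximate percentage band implied by a term's rank on the scale."""
    i = SCALE.index(wep)
    width = 100 / len(SCALE)  # about 14.3 points per increment
    return round(i * width), round((i + 1) * width)

print(approx_interval("probably/likely"))  # (57, 71)
```

Read this way, "probably/likely" sits just above even chance, which is consistent with the NIC's own statement that it indicates odds above 50%.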

I intend to write quite a bit about WEPs later on but one point is absolutely clear: This move towards consistency in the use of language is an incredibly positive step forward but the “poets” in the IC have only been defeated, not routed. Kent defined poets as the type of analysts who “… appear to believe the most a writer can achieve when working in a speculative area of human affairs is communication in only the broadest general sense. If he gets the wrong message across or no message at all-well, that is life.” There has been, as we will see in later posts in this series, either a real hesitancy or a real lack of understanding of the value of consistent terminology on the part of many analysts in the intelligence community.

Consistent terminology, however, is something decisionmakers have been requesting from intelligence professionals for decades. Mercyhurst alumna Jen Wozny wrote a wonderful thesis on the topic (currently available only through inter-library loan with the Hammermill Library at Mercyhurst), exploring what over 40 decisionmakers said they wanted from intelligence. One of the key requests, of course, was consistent terminology. I consider it likely that the potential for broader distribution brought on by the recent Congressional requests, along with the public scrutiny of these latest NIEs, essentially forced the Intelligence Community to adopt the more or less consistent set of terms described above.

While it may seem ludicrous to many (especially in the business or scientific communities) that this was a real debate in the intelligence community, it was and, based on the differences between what the EEL page says and what was actually done (which will make up the bulk of the remaining posts in this series), it still is.

Third and finally, the EEL page explains what the NIC means when it talks about “confidence in assessments”. This concept is difficult to explain to most people, and the NIC has not been very helpful with its brief discussion of the concept.

Confidence in an assessment is a very different thing than the assessment itself. Imagine two analysts working on the same problem. One is young, inexperienced, working on what is generally considered a tough problem on a tight time schedule. He is unfamiliar with a number of key sources and cannot adequately judge the reliability of the ones he does have. When pressed to make an estimate regarding this problem, he states that he thinks that “X is likely to happen”.

The second analyst is a seasoned analyst with adequate time to think about the problem and considerable experience in the subject in question. He knows where all the sources are and knows which ones are good and which ones are to be taken with a large grain of salt. He, too, states that he thinks, “X is likely to happen.” Both analysts have given the same assessment of the same problem. The level of confidence of the first analyst is likely much lower than the level of confidence of the second analyst, however.

The important thing to note is that the analyst is expressing confidence in his probabilistic assessment. In the first case, the young analyst is essentially saying, “I think X is likely but, for a number of reasons, not the least of which is my own inexperience, I think this assessment could be way off. If I knew just a little bit more, I could come back to you saying that X is anything from remote to virtually certain.” In the second case, the senior analyst would say, “I think X is likely but, because I know a lot about this problem and how to do analysis, I am fairly comfortable with that judgment. Even if I went out and did more research, my estimate would still probably be that X is likely.”

How does one determine a level of analytic confidence, though? What are the appropriate elements and how are they measured? How do you know when you have crossed the line from low to moderate and the line from moderate to high (the three levels of confidence used on the EEL page)? The discussion above suggests that there are a number of legitimate factors that analysts should consider before making a statement of analytic confidence. The EEL page, strangely, does not see it that way, preferring to tie it only to the quality of the information and the nature of the problem (presumably some sort of scale running from easy to hard).

Recent research by a Mercyhurst grad student (Thanks, Josh!) suggests that a number of things legitimately influence analytic confidence, including subject matter expertise (though it is likely not as important as some people think), time on target, the use of structured methods in the analysis, and the degree and way in which analysts collaborate on the product. I suspect that the IC is well aware of at least some of these other elements of analytic confidence (I am hard pressed to imagine, for example, senior officials in the IC stating that the subject matter expertise of their analysts doesn’t matter in their calculation of confidence, yet it is not mentioned as an element on the EEL page). I find it disingenuous that they do not list these broader elements that could impact analytic confidence.
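To make the distinction concrete, a confidence calculation drawing on those broader factors might look something like the sketch below. The factor names, weights and thresholds are entirely hypothetical; neither the EEL page nor the research assigns numbers to any of this:

```python
# Hypothetical factors and weights; nothing here comes from the EEL page.
FACTORS = {
    "source_reliability": 0.25,
    "problem_difficulty": 0.20,   # inverted: 1.0 = easy, 0.0 = intractable
    "subject_matter_expertise": 0.15,
    "time_on_target": 0.15,
    "structured_methods": 0.15,
    "collaboration": 0.10,
}

def confidence_level(scores):
    """Map weighted factor scores (each 0.0-1.0) onto the EEL's three bands."""
    total = sum(weight * scores.get(name, 0.0)
                for name, weight in FACTORS.items())
    if total < 1 / 3:
        return "low"
    if total < 2 / 3:
        return "moderate"
    return "high"

# A seasoned analyst scoring well on every factor lands in the top band.
print(confidence_level({name: 0.9 for name in FACTORS}))  # high
```

The point of the sketch is only that confidence is plausibly a function of several factors, not merely of source quality and problem difficulty.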

Despite these caveats and minor weaknesses, the EEL implies a fairly comprehensive vision of what I have begun calling a theoretically complete estimate. How might such an estimate read? Something like, “We estimate that X is likely to happen and our confidence in this assessment is high.” Translated, this might look like, “We are willing to make a rough probabilistic statement (Point 1 in the EEL) indicating that we think alternative X has about a 60-75% chance of occurring (Point 2 in the EEL). Because we have pretty good sources and this problem is not that difficult, we are comfortable that, while the actual range might be a bit broader, it is not broader by much (Point 3 in the EEL).”

Ideally, decisionmakers want to know the future with certainty. Despite what the cynics in the IC might say, realistic decisionmakers understand that intelligence professionals deal with unstructured and incomplete data, some of it deliberately deceptive, concerning difficult and even intractable problems, and that certainty, as an intelligence judgment, is impossible. Under these circumstances, the structure outlined in the EEL pages of these recent NIEs seems both reasonable and useful.

Tomorrow: Part 5 -- Enough Exposition! Let’s Get Down To It…

Sunday, December 2, 2007

Nixon Could've Invented The Internet And Other Gems (Secrecy News)

Secrecy News reports on the recent declassification and release of a previously Top Secret/Sensitive/Codeword document regarding the strengths, weaknesses and, surprisingly, the methods of preparation of the CIA's President's Daily Brief (PDB) under Nixon and Kissinger. Secrecy News highlights the inconsistencies in the CIA's position with regard to the declassification of PDBs (according to Meredith Fuchs at the National Security Archive, quoted in the Secrecy News piece, "What is most amazing is that one day they say the method of producing [the PDB] is so secret that nothing about the document can be disclosed, and then not long after they release this detailed, hour by hour explanation of how it is produced..."), but there are other golden nuggets of information in this document:

  • Policymaking vs. Intelligence. There is an extensive discussion about the relationship between the PDB and the NSC's own policy and analysis "Situation Room document" and the degree to which they overlapped and competed. Andrew Marshall, the author of the memo to Kissinger, summed up with the comment "the success of the Situation Room Product probably has driven the CIA's PDB out of the focus of the President's attention". Ouch!
  • Office Politics And Intelligence. Check out this quote: "This situation presents a number of awkward problems. The CIA is not likely to suggest stopping production of the PDB. CIA has a major institutional stake in the PDB. It will not give it up easily. Moreover, in a recent discussion with Jack Smith, he strongly expressed his view that the CIA people almost consider themselves almost as part of the President's staff. They have no other natural superior. I told him I thought that view somewhat unrealistic in organizational and bureaucratic terms. But nonetheless, it may be the view of some of them and suggestive of their likely reluctance to given up production of the PDB. Over time they are likely to find out about the current situation if it persists." The condescension is almost palpable here. It is interesting to note that this reaction was only relevant to Nixon. Apparently (according to the document) Kennedy and Johnson thought highly of the CIA product.
    • It is also worth noting the number of clear statements of likelihood in this paragraph. The intelligence community has wrestled with the question of Words of Estimative Probability for many years, and I wonder if there is a correlation between how the CIA products were being written at the time and the desires of the decisionmakers -- particularly Kissinger. If Kissinger liked, for example, documents with clear statements of likelihood (whether that preference was implicit or explicit), you would expect to find it mirrored in his staff's reports and, more importantly, in his staff's selection of reports for the President to read. Perhaps it was the way they were written that kept them off the President's desk...
  • Lack of Feedback and Information Overload. Both of these topics are covered extensively in this document. Like WEPs, these two problems have a long history with the intelligence community and it is interesting to see a senior level staffer address them so directly.
  • Nixon and the Internet. One of the most interesting discussions comes at the end of the document where the author cautiously recommends a new sort of intelligence portal for Kissinger and the President:

    • Sounds a lot like the internet to me, complete with hyperlinks, etc. Apparently it did not happen at least partly because, as the report itself notes, "the balance of experience has been that top-level executives don't like gadgets."

Monday, November 12, 2007

U.S. Intelligence: Iran Possesses Trillions Of Potentially Dangerous Atoms (The Onion)

According to a recent report on Iran by the Onion, "the Middle Eastern nation has obtained literally trillions of atoms—the same particles sometimes used to make atomic bombs—for unknown purposes".

Excellent example of WEPs ad absurdum as well: "More alarming, officials said, is the "very likely" possibility that there are more atoms inside the laboratory."

Link