Tuesday, June 22, 2010

1975 Experiments Showed Flaws Of A-F, 1-6 Rating System For Evaluating Accuracy, Reliability Of Intel Info (NTIS.gov)

A couple of weeks ago, there was a discussion on the always-interesting US Army INTELST list regarding schemes for grading sources. I pushed my own thoughts on this out to the list and published a link list on SAM containing much the same information.

One of the topics that came up as a result of that discussion was "Whatever happened to the old A-F, 1-6 method for evaluating the accuracy and reliability of a source?"  Under this system, the reliability of the source of a piece of info was graded A-E, with "A" being completely reliable and "E" being unreliable.  "F" was reserved for sources whose reliability could not be judged.

Likewise, the info itself was graded for accuracy on a scale of 1-5, where "1" indicated that the info was confirmed and "5" indicated that it was improbable.  "6" was reserved for info whose truth could not be judged.

Under this system, every piece of collected info carried a combined two-part rating (B-3, C-2, A-1 -- now you know where that expression came from!) that supposedly captured both the reliability of the source and the accuracy of the info.
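For readers who like to see the mechanics, here is a minimal sketch in Python of the two-dimensional scheme. The intermediate labels (B through D, 2 through 4) are the commonly published ones for this kind of system and are my addition for completeness, not something spelled out above:

# Minimal sketch of the A-F / 1-6 two-dimensional rating scheme described
# above. Intermediate labels are the commonly published ones; exact
# doctrinal wording varies by publication.
RELIABILITY = {
    "A": "completely reliable",
    "B": "usually reliable",
    "C": "fairly reliable",
    "D": "not usually reliable",
    "E": "unreliable",
    "F": "reliability cannot be judged",
}
ACCURACY = {
    1: "confirmed",
    2: "probably true",
    3: "possibly true",
    4: "doubtful",
    5: "improbable",
    6: "truth cannot be judged",
}

def describe(rating: str) -> str:
    """Expand a combined rating such as 'B-3' into its two components."""
    source_grade, info_grade = rating.split("-")
    return (f"{rating}: source {RELIABILITY[source_grade]}, "
            f"info {ACCURACY[int(info_grade)]}")

print(describe("B-3"))  # B-3: source usually reliable, info possibly true
print(describe("A-1"))  # A-1: source completely reliable, info confirmed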

Except that it didn't work.

In 1975, Michael G. Samet conducted a series of experiments using this system for the US Army Research Institute for the Behavioral and Social Sciences, published as Subjective Interpretation of Reliability and Accuracy Scales for Evaluating Military Intelligence.  I ran across it while doing some background research for the link list.  Unfortunately, the good people at NTIS had not yet had the time to scan this report and upload it.  Even more maddening was the fact that the abstract (the only thing available) included details about the study but not the !@#$ results.

So, I had to send away to NTIS for a hard copy.  I have uploaded it to Scribd.com to make this important piece of research more generally available.

The study asked about 60 US Army captains familiar with the scoring system to evaluate 100 comparative statements.  The results were pretty damning:
"Findings of the present study indicate that the two-dimensional evaluation should be replaced because:
1.  The accuracy rating dominates the interpretation of a joint accuracy and reliability rating and
2.  There is frequently an undesirable correlation between the two scales."
You can read the full study below or download it from here.
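To make the second finding concrete: a simple (and admittedly crude) way to check whether evaluators are using the two scales independently is to compute a rank correlation across a batch of combined ratings. This is my own sketch, not Samet's actual method, and the ratings below are invented purely for illustration:

# Hedged sketch: check whether reliability and accuracy ratings move
# together across a set of reports. Letters are mapped to ranks
# (A=1 ... E=5); the "cannot be judged" grades (F, 6) are excluded.
from scipy.stats import spearmanr

# Invented illustrative (reliability, accuracy) pairs, one per report.
ratings = [("A", 1), ("B", 2), ("B", 3), ("C", 3), ("C", 4), ("D", 4), ("E", 5)]

reliability_ranks = [ord(r) - ord("A") + 1 for r, _ in ratings]
accuracy_ranks = [a for _, a in ratings]

rho, p_value = spearmanr(reliability_ranks, accuracy_ranks)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A consistently high rho on real data would suggest the two scales are
# not being applied independently -- the coupling the study flags.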

All of this raises another issue, though.  It seems that every 20 years or so the US national security intel community takes a crack at validating its methods and processes.  Sherman Kent talks about one such effort in the 1950s, and then again in the 1970s and early 1980s there seems to have been another attempt (the report referenced here is an example).  We seem to be entering another such era, given some of the language coming out of IARPA.

For some reason, however, just when things get good, the effort peters out, and when these efforts peter out in the intel community, the results become almost impossible to find.  Not having this research on hand and, frankly, online means that the government will inevitably pay for the same research twice (the questions don't go away just because we forget what the answers are...) and that researchers will be forced to start from scratch even though they shouldn't have to.

I won't repeat my rant from a few days ago, but finding and keeping track of this kind of stuff seems to be a perfect task for academe and the kind of thing the DNI ought to fund (Hint, hint...).

[Embedded document: Subjective Interpretation of Reliability and Accuracy Scales for Evaluating Military Intelligence]