Showing posts with label Biometrics. Show all posts

Wednesday, August 7, 2013

Is Forensic Speaker Recognition The Next "Fingerprint?"

Take a fingerprint... for that matter, go ahead and take a palm print. Now, take a voiceprint. In this day and age, forensic biometric analysis is extraordinarily complex. In a world where we analyze everything from irises to earlobes, what can science tell us about voice?

One increasingly popular form of analysis is forensic speaker recognition (aka voice biometrics or biometric acoustics). Forensic speaker recognition (FSR) has unequivocal potential as a supplementary analytic methodology, with applications in both law enforcement and counterterrorism (for details, see the last section of the 2012 book on FSR applications to law enforcement and counter-terrorism).

The FSR process serves one of two functions: identification (1:N or N:1) or verification (1:1).

  • 1:N Identification -- Imagine you have a recording of a voice making threats over the phone. The speaker identification process allows you to query a database of acoustic recordings of known suspects, comparing each against your target voice to determine which suspect, if any, is the speaker.
  • N:1 Identification -- Imagine you have a bunch of voice recordings and you want to know in which of them, if any, a certain speaker participates. 
  • 1:1 Verification -- Imagine you wish to grant someone access to a building or secure location by assessing whether or not they are who they say they are (this aspect of speaker recognition is less applicable to analysis and more applicable to security). 
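Under the hood, all three modes reduce to the same operation -- scoring the similarity of two voice feature vectors -- applied in different directions. Here is a minimal sketch of that idea; the vectors, names, and acceptance threshold are toy stand-ins (real systems extract acoustic embeddings from audio and calibrate thresholds empirically):

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two speaker feature vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled, probe, threshold=0.8):
    """1:1 verification: is the probe close enough to the claimed identity's model?"""
    return cosine_score(enrolled, probe) >= threshold

def identify(database, probe):
    """1:N identification: which enrolled speaker best matches the unknown probe?"""
    best = max(database, key=lambda name: cosine_score(database[name], probe))
    return best, cosine_score(database[best], probe)

# Toy "voiceprints" -- in practice these would be embeddings extracted from audio.
db = {"suspect_a": np.array([1.0, 0.1, 0.0]),
      "suspect_b": np.array([0.0, 1.0, 0.2])}
probe = np.array([0.9, 0.2, 0.05])
name, score = identify(db, probe)
print(name, round(score, 3))
```

N:1 identification is the same scoring loop run the other way: one known speaker model compared against each recording in a pile of intercepts.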
Notably, the CIA, the NSA and the Swiss research institute IDIAP all turned to automatic speaker verification systems in 2003 to analyze the so-called Osama tapes (for details of the approach, see Graphing the Voice of Terror). This case provides an excellent opportunity to note the distinction between automatic speaker recognition, performed by an algorithm, and aural speaker recognition, performed by acoustic experts.

The cornerstone methodology supporting forensic speaker recognition is voiceprint analysis, or spectrographic analysis: a process that visually displays the acoustic signal of a voice as a function of time (seconds or milliseconds) and frequency (hertz) such that all of its components are visible (formants, harmonics, fundamental frequency, etc.).
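The display at the heart of voiceprint analysis is a short-time Fourier transform: slice the signal into overlapping windows and take the spectrum of each. A minimal numpy sketch, using a synthetic two-tone signal as a stand-in for speech (the window and hop sizes here are arbitrary choices):

```python
import numpy as np

def stft_power(x, fs, win=256, hop=128):
    """Power spectrogram: rows are time frames, columns are frequency bins."""
    frames = np.array([x[i:i + win] * np.hanning(win)
                       for i in range(0, len(x) - win + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win, 1.0 / fs)
    return freqs, power

fs = 8000                                  # sample rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic "voice": a 120 Hz fundamental plus a weaker 1200 Hz formant-like tone.
x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

freqs, power = stft_power(x, fs)
dominant = freqs[np.argmax(power.mean(axis=0))]
print(f"dominant frequency band: {dominant:.1f} Hz")
```

An examiner reads this time-frequency grid visually; automatic systems reduce it further to numeric features before scoring.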
(Note:  For those who are more acoustically inclined and would enjoy a well-written read on all things acoustic from military strategy to frog communication, Seth Horowitz's new book The Universal Sense: How Hearing Shapes the Mind comes with my highest recommendation.)
Spectrographic analysis differs from human speaker recognition in that it provides a more quantifiable comparison between two speech signals. Under favorable conditions, both approaches yield strong results: 85 percent identification accuracy (McGehee 1937), 96 percent accuracy (Espy-Wilson 2006), 98 percent accuracy (Clifford 1980), 100 percent accuracy (Bricker and Pruzansky 1966). These approaches, however, do not come without caveats.

Forensic speaker recognition has many limitations and is currently inadmissible in federal court as expert testimony. Bonastre et al (2003) summarize these limitations quite well:  
"The term voiceprint gives the false impression that voice has characteristics that are as unique and reliable as fingerprints... this is absolutely not the case."
The thing about voices is that they are susceptible to a myriad of external factors such as psychological/emotional state, age, health, weather... the list goes on. From an application standpoint, the most prominent of these factors is intentional vocal disguise. There are a number of things people can intentionally do to their voices to drastically reduce the ability of a machine or a human expert to identify them correctly (you would be amazed at how difficult it is -- nearly impossible -- to identify a whispered voice). Under these conditions, identification accuracy falls to 40-52 percent (Thompson 1987), 36 percent (Andruski 2007), 26 percent (Clifford 1980).
[Image -- Top: Osama bin Laden's "dirty" 2003 telephonic spectrogram. Bottom: Osama bin Laden's "clean" spectrogram. Source: Owl Investigations]


More problematic still is communication by telephone. Much of the input law enforcement and national security analysts have to work with comes from telephone wiretaps or calls made from jail cells. Telephones, cellphones in particular, act as a band-pass filter on the acoustic signal: all acoustic information below a certain frequency (roughly 300 Hz for a standard telephone channel) simply does not get transmitted, and within that lost range lie some of the key characteristics for voice identification.
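The effect is easy to simulate: zero out everything outside the roughly 300-3400 Hz passband of a standard telephone channel and see what survives. A sketch under those assumptions (simple FFT masking stands in for the real channel's analog filtering):

```python
import numpy as np

def telephone_band(x, fs, lo=300.0, hi=3400.0):
    """Crude telephone-channel simulation: drop spectral content outside lo..hi Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

fs = 8000
t = np.arange(0, 1.0, 1.0 / fs)
# A 120 Hz "fundamental" plus a 1000 Hz component: only the latter fits the passband.
x = np.sin(2 * np.pi * 120 * t) + np.sin(2 * np.pi * 1000 * t)
y = telephone_band(x, fs)

def tone_level(sig, f0):
    """Magnitude of the FFT bin nearest f0."""
    mags = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), 1.0 / fs)
    return mags[np.argmin(np.abs(freqs - f0))]

print(tone_level(y, 120) / tone_level(x, 120))    # fundamental: wiped out
print(tone_level(y, 1000) / tone_level(x, 1000))  # in-band tone: preserved
```

The fundamental frequency -- one of the most individually distinctive parts of a voice -- is exactly what the channel throws away.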

While forensic speaker recognition capabilities have come a long way since 2003, the consensus among the analytic community remains that FSR is not a stand-alone methodology but rather a promising supplementary tool. Biometric analysis was also brought before the Intelligence Technology panel at the 2013 Global Intelligence Forum, where discussion noted the expanding applicability and increasing capabilities of all biometric technologies.

Thus far, the Spanish Guardia Civil is the only law enforcement agency worldwide to have a fully operational acoustic biometric system (called SAIVOX, the Automatic System for the Identification of Voices). During the Spanish booking process, just as we take fingerprints, they take voice samples, which are contributed to a corpus of over 3,500 samples linked to known criminals and certain types of crime.

In 2011, the FBI commissioned NIST to launch a program on "investigatory voice biometrics." The committee's goal is to develop best practices and collection standards for an operational voice biometric system, modeled on the Spanish one, with corpora robust enough to serve as a useful tool in ongoing investigations. (This is an ongoing project, and you can read the full report here.)

FSR is not a perfect methodology, but one that can add substantial value on a case-by-case basis. It is of high interest to the US national security and law enforcement analytic communities.

Additional reading:
Andruski, J., Brugnone, N., & Meyers, A. (2007). Identifying disguised voices through speakers' vocal pitches and formants. 153rd ASA Meeting.
Bonastre, J. F., Bimbot, F., Boe, L. J., Campbell, J. P., Reynolds, D. A., & Magrin-Chagnolleau, I. (2003). Person authentication by voice: A need for caution. Eurospeech 2003.
Bricker, P. D., & Pruzansky, S. (1966). Effects of stimulus content and duration on talker identification. Journal of the Acoustical Society of America, 40, 1441-1449.
Clifford, B. R. (1980). Voice identification by human listeners: On earwitness reliability. Law and Human Behavior, 4(4), 373-394.
Espy-Wilson, C. Y., Manocha, S., & Vishnubhotla, S. (2006). A new set of features for text-independent speaker identification.
McGehee, F. (1937). The reliability of the identification of the human voice. Journal of General Psychology, 31, 53-65.
Parmar, P. (2012). Voice fingerprinting: A very important tool against crime. Journal of Indian Academy of Forensic Medicine, 34(1), 70-73.

Thursday, October 21, 2010

The Gartner Hype Cycle: An Interesting Way To Think About The "Next Big Thing" In Tech (Gartner.com)

Every year I look forward to seeing the latest editions of a number of regularly published analytic reports. The DNI's Annual Threat Assessment and Transparency International's Corruption Perception Index fall into this category. Even the Aon Terrorism Threat Map, while not an annual publication, satisfies my itch for a regular update on the state of affairs within that functional area.

When it comes to technology trends, however, the best such product I know of is Gartner's annual "Hype Cycle" chart. Gartner is a large and well respected research company that tracks all sorts of technologies.

Their experience has been that new technologies follow a more or less predictable pattern over time that is best measured by the amount of "hype" (i.e. inflated expectations) associated with a particular technology. You can see the current version of the hype cycle below (and can get more detailed information about the cycle, the methodology and additional findings at Gartner's website):


For example, if you look at the image above you can see that biometric identification has exited the "trough of disillusionment" and has entered the "slope of enlightenment". For many inside the intel community, biometric devices are old hat, but what the hype cycle seems to be saying is that these technologies are about to become old hat for all of us...

One of the surprises for me was to see predictive analytics so far out on the hype cycle. Of course, then I think about Hunch's Predict-o-matic (available only to Facebook users, unfortunately, and which scared the be-jeesus out of me...) or articles like this one and I understand exactly what they mean.

Even more interesting are those items at the top of the hype cycle; stuff like cloud computing, 3D flat panel displays and augmented reality. If Gartner is right, then, in the very near future, we should start to see mainstream news articles trashing these technologies not as the "next big thing" but as the most recent tech flop.

My favorite part of the hype cycle is the stuff entering in from the left hand side, the technologies that are just beginning to climb the first steep curve of unreasonable expectations. Here we find the way-out technologies -- autonomous vehicles and computer-brain interfaces.

I like to point out to students that these are the technologies that they will have to deal with over the course of their careers; that they will fight with their children not over whether they get earrings in their ears but whether they will get chips in their brains.