Take a fingerprint... for that matter, go ahead and take a palm print. Now, take a voiceprint. In this day and age, forensic biometric analysis is extraordinarily complex. In a world where we analyze everything from irises to earlobes, what can science tell us about voice?
One increasingly popular form of analysis is forensic speaker recognition (aka voice biometrics or biometric acoustics). Forensic speaker recognition (FSR) has clear potential as a supplementary analytic methodology, with applications in both law enforcement and counterterrorism (for details, see the last section of the 2012 book on FSR Applications to Law Enforcement and Counter-terrorism).
FSR serves one of two purposes: identification (1:N or N:1) or verification (1:1).
- 1:N Identification -- Imagine you have a recording of a voice making threats over the phone. The speaker identification process lets you compare your target voice against a database of acoustic recordings of known suspects to determine which of them, if any, made the threats.
- N:1 Identification -- Imagine you have a large collection of voice recordings and you want to know which of them, if any, feature a certain speaker.
- 1:1 Verification -- Imagine you wish to grant someone access to a building or secure location by assessing whether or not they are who they say they are (this aspect of speaker recognition is less applicable to analysis and more applicable to security).
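To make the three modes concrete, here is a minimal Python sketch. It assumes each recording has already been reduced to a fixed-length feature vector (real systems derive such vectors from the audio itself; the toy vectors below are invented) and it scores pairs with cosine similarity, which is only one common choice. All function names and threshold values are hypothetical, not the API of any particular FSR product.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical: a recording already reduced to a fixed-length feature vector.
Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_1_to_n(unknown: Vector, suspects: Dict[str, Vector],
                    threshold: float) -> Optional[Tuple[str, float]]:
    """1:N: score one unknown voice against every known suspect.

    Returns (name, score) for the best match, or None if no suspect clears the
    threshold, since the real speaker may not be in the database at all.
    """
    scores = {name: cosine_similarity(unknown, vec) for name, vec in suspects.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] >= threshold else None


def search_n_to_1(target: Vector, recordings: Dict[str, Vector],
                  threshold: float) -> List[str]:
    """N:1: list the recordings (if any) in which one known speaker appears."""
    return sorted(rec_id for rec_id, vec in recordings.items()
                  if cosine_similarity(target, vec) >= threshold)


def verify_1_to_1(enrolled: Vector, probe: Vector, threshold: float) -> bool:
    """1:1: accept or reject a single identity claim."""
    return cosine_similarity(enrolled, probe) >= threshold


if __name__ == "__main__":
    # Toy vectors, invented purely for illustration.
    suspects = {"suspect_a": [0.9, 0.1, 0.0], "suspect_b": [0.1, 0.9, 0.2]}
    threat_call = [0.85, 0.15, 0.05]
    print(identify_1_to_n(threat_call, suspects, threshold=0.9))
    calls = {"call_1": [0.88, 0.12, 0.01], "call_2": [0.1, 0.9, 0.2]}
    print(search_n_to_1(suspects["suspect_a"], calls, threshold=0.95))
    print(verify_1_to_1(suspects["suspect_b"], [0.12, 0.88, 0.21], threshold=0.95))
```

The threshold is the key design choice: raising it trades false acceptances for false rejections, so where to set it depends on which error is costlier in a given application.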
(Note: For those who are more acoustically inclined and would enjoy a well-written read on all things acoustic from military strategy to frog communication, Seth Horowitz's new book The Universal Sense: How Hearing Shapes the Mind comes with my highest recommendation.)
"The term voiceprint gives the false impression that voice has characteristics that are as unique and reliable as fingerprints... this is absolutely not the case."
The thing about voices is that they are susceptible to a myriad of external factors, such as psychological/emotional state, age, health, and weather; the list goes on. From an application standpoint, the most prominent of these factors is intentional vocal disguise. People can do a number of things to their voices that drastically reduce the ability of a machine or a human expert to identify them correctly (you would be amazed at how difficult, nearly impossible, it is to identify a whispered voice). Under these conditions, reported identification accuracy falls to 40-52 percent (Thompson 1987), 36 percent (Andruski 2007), or as low as 26 percent (Clifford 1980).
Figure: Osama bin Laden's "dirty" 2003 telephone spectrogram (top) and "clean" spectrogram (bottom). Source: Owl Investigations.
More problematic still is communication by telephone. Much of the input that law enforcement and national security analysts have to work with comes from telephone wiretaps or calls made from jail cells. Telephones, cellphones in particular, filter the acoustic signal so that all acoustic information below a certain frequency simply does not get transmitted (conventional narrowband telephony passes only roughly 300-3400 Hz). Some of the key characteristics for voice identification lie within the range that is cut off.
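The filtering effect can be illustrated with a rough Python sketch that idealizes the phone line as a brick-wall band-pass filter over roughly 300-3400 Hz (real handsets and codecs roll off gradually and add distortion of their own). The synthetic "voice" is a hypothetical 120 Hz fundamental, typical of an adult male speaker, plus a few harmonics with made-up amplitudes. A naive DFT keeps the example dependent on the standard library only; the helper names are illustrative.

```python
import cmath
import math
from typing import Dict, List

SAMPLE_RATE = 8000                   # Hz; the usual narrowband telephony rate
PASS_LOW, PASS_HIGH = 300.0, 3400.0  # approximate telephone passband, in Hz


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (fine for short examples)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT; the input is conjugate-symmetric, so keep the real part."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def telephone_filter(x: List[float], rate: int = SAMPLE_RATE,
                     low: float = PASS_LOW, high: float = PASS_HIGH) -> List[float]:
    """Idealized phone line: zero every DFT bin outside [low, high] Hz."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k) * rate / n  # bins k and n-k share one frequency
        if not low <= freq <= high:
            spectrum[k] = 0j
    return idft(spectrum)


def amplitude_at(x: List[float], freq: float, rate: int = SAMPLE_RATE) -> float:
    """Amplitude of the tone at `freq` (exact for the whole-cycle tones used here)."""
    n = len(x)
    acc = sum(x[t] * cmath.exp(-2j * math.pi * freq * t / rate) for t in range(n))
    return 2 * abs(acc) / n


def synth_voice(components: Dict[float, float], n: int,
                rate: int = SAMPLE_RATE) -> List[float]:
    """Sum of cosines: {frequency_hz: amplitude} -> n samples."""
    return [sum(a * math.cos(2 * math.pi * f * t / rate)
                for f, a in components.items())
            for t in range(n)]


if __name__ == "__main__":
    # Hypothetical voice: 120 Hz fundamental plus harmonics (made-up amplitudes).
    components = {120: 1.0, 240: 0.5, 360: 0.5, 2400: 0.25}
    voice = synth_voice(components, n=800)  # 0.1 s: every tone fits whole cycles
    phone = telephone_filter(voice)
    for f in components:
        print(f"{f:>5} Hz: before {amplitude_at(voice, f):.2f}, "
              f"after {amplitude_at(phone, f):.2f}")
```

Run as a script, the sketch should show the 120 Hz fundamental and the 240 Hz harmonic dropping to roughly zero while the 360 Hz and 2400 Hz components pass through unchanged.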
4 comments:
As a Bayesian, I'm quite happy to suggest we fuse the results of several techniques. :-)
But with FSR alone, how age-stable is current voiceprint recognition compared to current fingerprint recognition?
Oh no Kris, please don't fall for this. It's well proven that voiceprints are not accurate. Spectrographic analysis only gives you a picture of what sound is being uttered. No unique acoustic identifier exists for people's voices... there's been a ton of research done on it. You can google Dr. Harry Hollien and read some of his information/research into voiceprints, because he's the authority in the field. That so-called science is literally a quack. Check out this article for a start: http://news.google.com/newspapers?nid=888&dat=19750218&id=NEFSAAAAIBAJ&sjid=eHkDAAAAIBAJ&pg=7011,2840620
CT - Fusing the results of several techniques is the goal here. As stated, FSR is nowhere near accurate enough to be the only methodology behind a claim or the only piece of evidence behind a conviction. It is viable and extremely promising as a supplement to other techniques. One of its biggest drawbacks is that it isn't age-stable at all. Your voice changes drastically through puberty, and then continues to change in small yet significant ways with age, health, psychological/emotional state, weather, etc. There are too many external factors that influence voice. Bottom line, it's not a fingerprint.
RK - I agree that FSR is not perfect (which is why I try very hard in this post to present it as a supplementary methodology), but calling it a "quack" is definitely a stretch. The thing about FSR is that it encompasses so many different approaches that it is hard to boil down to one single thing. Spectrographic analysis alone is, as you say, highly inaccurate, because there are no clear-cut acoustic identifiers for an individual human voice. Human expert acoustic analysis, however, yields more promising results, and automated speaker recognition technology has come a LONG way since the early 2000s (just look at work in MATLAB, etc.). Ultimately, it is a methodology that, depending on how and when it is used, can contribute greatly to ongoing analysis. Recognizing when NOT to use it (when its shortcomings outweigh its potential benefit) is just as important as knowing when to use it.
http://hltcoe.jhu.edu/