Storing Millions Of People’s Voices In A Voice-Recognition Database

Slate reports on software, already being sold to governments and corporations, that makes it possible to store and identify the unique sound of everyone’s speech. The obvious question: can it be thwarted by pitch shifting or other modification?

Intercepting thousands of phone calls is easy for government agencies. But quickly analyzing the calls and identifying the callers can prove a difficult task. Now one company believes it has solved the problem—with a countrywide biometric database designed to store millions of people’s “voice-prints.”

Russia’s Speech Technology Center, which operates under the name SpeechPro in the United States, has invented what it calls “VoiceGrid Nation,” a system that uses advanced algorithms to match identities to voices. The idea is that it enables authorities to build up a huge database containing up to several million voices—of known criminals, persons of interest, or people on a watch list.

Alexey Khitrov, SpeechPro’s president, told me the company is working with a number of agencies in the United States at a state and federal level. Khitrov [also] divulge[d] that various versions of the company’s biometric technology are used in more than 70 countries and that the Americas, Europe, and Asia are its key markets.

The advance of a mass, countrywide voice recognition system raises some obvious concerns. A Russian secret services watchdog reported earlier this year that Speech Technology Center’s products have been sold to countries including Kazakhstan, Belarus, Thailand, and Uzbekistan, hardly bastions of human rights and democracy. What if the VoiceGrid Nation system were in the hands of an authoritarian government? It has the technical capacity, for example, to store a voice-print of every single citizen in a country the size of Bahrain, with a population of 1.3 million.

17 Comments on "Storing Millions Of People’s Voices In A Voice-Recognition Database"

  1. I could defeat this simply by going into Elmer Fudd mode.
    Or in extreme cases, I could break out my Porky Pig impersonation.

    “Uh, why don’t you go, uh buh dee, uh buh dee, uh, go, uh buh dee, uh…go FUCK yourselves!”


  2. Ever heard of Google Voice? Though Google says its use of your voicemail information is intended for research to improve voice recognition technology.

    • Calypso_1 | Sep 21, 2012 at 5:06 pm |

      Same with Dragon dictation.

      • I know it will be hard to credit this as anything more than an oft repeated urban legend, but I actually worked for a guy (a multi-millionaire, in fact) who was too busy to train his Dragonsoft himself and had his personal assistant do it.

        When he was trying to figure out why it still didn’t recognize his voice, even after the extensive training his secretary did with it, I helpfully suggested he try speaking with a Spanish accent like his secretary had.

    •  Yea, that “Google is your friend” meme creeps me out more by the day…

    •  fortunately for freedom
      Google voice sucks

  3. the reason for all these Big Homelander is watching stories is
    they can’t watch everybody all the time
    so they create a boogeyman to create an impression they are
    but they can’t parse the voluminous data effectively
    the intelligence they gather is obviously worthless
    they have to invent conspiracies & crimes to justify their crimes & conspiracies

  4. There is, after all, a reason that Anonymous generally uses something like Cepstral David TTS.

  5. I don’t buy it. The frequency bandwidth of the PSTN is about 300 Hz to 3.4 kHz. That’s about three and a half octaves out of the full 10 or so that human hearing and CD-quality audio range over. The human brain has trouble parsing that even from a single source, which is why, in the era before caller ID, we always had to introduce ourselves over the phone and people were always mistaking teenage sons for their fathers. So I don’t foresee mass data processing picking out these heavily band-limited examples and fingerprinting them in any reliable way.

    • Calypso_1 | Sep 24, 2012 at 11:48 am |

      First take a little tour of Mr. Fourier & his fantastic transforms.  Consider also the numerous examples of text-dependent samples we all provide while navigating various voice interaction systems that ‘may be recorded for quality purposes’.  These alone provide potentially known matches with your identity. 
      Everything else becomes statistical verification matches once you have established biometrics.
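
The octave arithmetic in comment 5 can be checked with a short sketch. The 300 Hz to 3.4 kHz telephone band comes from the comment itself; the 20 Hz to 20 kHz figure for human hearing is an assumption here (a commonly quoted nominal range, not a number given in the thread), and the helper name `octaves` is only illustrative.

```python
import math

def octaves(low_hz, high_hz):
    """Number of octaves (doublings of frequency) between two frequencies."""
    return math.log2(high_hz / low_hz)

# Telephone band quoted in comment 5.
pstn = octaves(300.0, 3400.0)

# Assumed nominal range of human hearing (not stated in the thread).
hearing = octaves(20.0, 20000.0)

print(f"PSTN band:     {pstn:.2f} octaves")
print(f"Hearing range: {hearing:.2f} octaves")
```

Under those assumptions the telephone band works out to roughly three and a half octaves against roughly ten for the full range, which matches the commenter's figures. It says nothing either way about whether that narrower band is enough for reliable speaker identification.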
