Russia’s Speech Technology Center is touting its strong score in NIST’s most recent Speaker Recognition Evaluation (SRE). The evaluation measures the accuracy of different speaker recognition algorithms and gauges their ability to identify a speaker either from a phone call or from a snippet of audio pulled from a recorded video.
The SRE also gives developers the chance to combine a voice recognition algorithm with a facial recognition algorithm to further improve identification in video feeds. The competition included Fixed and Open categories: in the Fixed category, competitors had to use the audio data provided by NIST, while in the Open category they could use audio data from any source.
“High-quality speaker recognition is essential for nationwide biometric systems,” said Speech Technology Center CEO Dmitriy Dyrmovskiy. “NIST SRE21 is the fifth competition in 2021 where Speech Technology Center solutions have been given a high score by a jury of international experts. We’re excited to take it to the next level by properly showcasing our core competencies on the global market.”
The Speech Technology Center called particular attention to its use of transformer and wav2vec machine learning models, claiming that it was one of the first teams to combine the two approaches. Transformer architectures are widely used in natural language processing and computer vision, while wav2vec is a self-supervised model that learns representations directly from raw audio and is commonly used for speech recognition. According to the Center, combining the two allowed it to minimize identification errors in its comprehensive speaker identification solution.
SRE competitors were asked to identify English, Mandarin, and Cantonese speakers recorded over the phone, and to do the same in audio recorded on modern devices with built-in microphones. The Speech Technology Center noted that such technology can be deployed in voice assistants, and in call centers to provide businesses with better data and to enable a better customer experience for end users.
Moving forward, the Speech Technology Center will enter its solutions in NIST’s Conversational Telephone Speech SRE, which covers an even broader range of languages. The company is part of the broader Sberbank ecosystem and provided the speech recognition technology for the SmartBadge recording solution from SberDevices.
January 31, 2022 – by Eric Weiss