https://ai.facebook.com/blog/wav2vec-unsupervised-speech-recognition-without-supervision/
What the research is:
Whether it’s giving directions, answering questions, or carrying out requests, speech recognition makes life easier in countless ways. But today the technology is available for only a small fraction of the thousands of languages spoken around the globe. This is because high-quality systems need to be trained with large amounts of transcribed speech audio. This data simply isn’t available for every language, dialect, and speaking style. Transcribed recordings of English-language novels, for example, will do little to help machines learn to understand a Basque speaker ordering food off a menu or a Tagalog speaker giving a business presentation.
This is why we developed wav2vec Unsupervised (wav2vec-U), a way to build speech recognition systems that require no transcribed data at all. It rivals the performance of the best supervised models from only a few years ago, which were trained on nearly 1,000 hours of transcribed speech. We’ve tested wav2vec-U with languages such as Swahili and Tatar, which do not currently have high-quality speech recognition models available because they lack extensive collections of labeled training data.
Wav2vec-U is the result of years of Facebook AI’s work in speech recognition, self-supervised learning, and unsupervised machine translation. It is an important step toward building machines that can solve a wide range of tasks just by learning from their observations. We think this work will bring us closer to a world where speech technology is available for many more people.
Comments
Post a Comment