by kseniaskriptchenko | Dec 30, 2022 | AI, BIAS, Diversity
Many industries, especially the media, are enthusiastic about using machine learning (ML) to enhance the analysis of large datasets. Annotated data is essential for machine learning and artificial intelligence (AI) training. While incorrectly annotated, poorly chosen,...
by Tugtekin Turan | Dec 13, 2022 | Speaker Diarization
Use of voice fingerprints in multi-speaker recordings: Speaker Diarization A journalist’s life is made easier if questions like these can be resolved right away: What was said when? How often does the person speak, and where exactly in the audio or video? SELMA...
by guntisbarzdins | Jun 30, 2022 | NLP
Training large neural models for speech and natural language processing (NLP) requires not only a lot of input data (read here on why and here on how SELMA is handling this) but also a lot of computing resources. Nowadays, a fair share of computing resources are...
by kseniaskriptchenko | May 26, 2022 | AI, News, HLT
Machine learning requires large quantities of labeled training data (for more insights, see this post). That means that, in order to reach acceptable performance, training current speech recognition systems demands thousands of hours of transcribed speech. For...
by Tugtekin Turan | Dec 24, 2021 | News, HLT
What does machine learning in the language field have to do with a cake, you might ask yourself? And how is it that, in the end, we can produce better subtitles? Look no further and read on! Let them be cake! Facebook’s AI Director Yann LeCun...