Making “sense” of speech with Curriculum Learning Methods

Humans need about two decades to grow into fully functional adults of our society. That is quite some time, and yet remarkably fast compared to where we stand if we want to transfer a human's learning curve to a computer. How can a machine learn like a child?

In SELMA, we are working on creating a knowledge base that is not limited by language barriers, for example when almost no training data is available. We use a direct method (a so-called “end-to-end” approach) that extracts semantic concepts straight from the speech audio signal.

Since we aim to build a multilingual platform covering more than 30 low-resource and high-resource languages, we need techniques that include all languages, no matter how small or rare they are. To overcome the lack of training data for many spoken languages, we use a transfer learning strategy (transferring knowledge learned on a high-resource language to a lower-resource one) based on the principles of curriculum learning.

Learn like a child – The Origin of Curriculum Learning

The training of a human child is highly organized: it rests on an education system and a curriculum that introduces different concepts at different times, exploiting previously learned concepts to ease the learning of new abstractions. By choosing which examples to present to the learner, and in which order, the training can be guided to increase the speed of learning remarkably.

As early as 1993, research at the intersection of cognitive science and machine learning raised the question of whether machine learning algorithms might benefit from a training strategy similar to a human child's.

Nowadays, this approach is widely accepted and established. The idea of curriculum learning for machines can be stated as follows:

Start small, learn easier aspects of the task or easier sub-tasks, and then gradually increase the difficulty level.

In essence, this approach consists of ordering the training samples from easy to hard, without modifying the neural architecture during the training process.
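As a minimal sketch (illustrative only, not the SELMA code), a curriculum can be as simple as sorting the training set by a proxy difficulty score and letting the training pool grow in stages. The difficulty measure used here (utterance length) and the tiny dataset are assumptions made for the demo:

```python
# Minimal curriculum-learning sketch: order samples from easy to hard
# by a proxy difficulty score, then grow the training pool in stages.
# The model and training loop themselves stay completely unchanged.

def difficulty(sample: dict) -> int:
    # Assumption for this demo: shorter utterances are easier to learn from.
    return len(sample["transcript"].split())

def curriculum_stages(dataset: list, num_stages: int = 3):
    ordered = sorted(dataset, key=difficulty)  # easy -> hard
    for stage in range(1, num_stages + 1):
        # Stage 1 sees the easiest third, stage 2 the easiest two thirds, ...
        yield ordered[: len(ordered) * stage // num_stages]

if __name__ == "__main__":
    data = [
        {"transcript": "hello"},
        {"transcript": "turn the living room lights off please"},
        {"transcript": "play some music"},
    ]
    for i, pool in enumerate(curriculum_stages(data), start=1):
        # In a real setup, ordinary training epochs would run on `pool` here.
        print(f"stage {i}: {len(pool)} samples")
```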

In SELMA, we adopt this approach to design a sequence of transfer learning steps, moving from a general task to the specialized target task. Normally, a large amount of data is needed to train an end-to-end spoken language understanding neural model from speech; applying transfer learning allows us to cope with the scarcity of data for the target task.
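The sketch below shows the general pattern rather than the actual SELMA or Caubrière et al. implementation; the PyTorch model, toy dimensions, empty placeholder datasets, task names, and the head-swapping scheme (one common way to switch tasks) are all assumptions for illustration:

```python
# Curriculum-based transfer learning sketch: one model is trained on a
# sequence of tasks ordered from generic to specialized, and the encoder
# weights carry over between stages. Dimensions and tasks are made up.

import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Stand-in for an end-to-end speech model; real systems are far larger."""
    def __init__(self, feat_dim=40, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Identity()  # replaced per task below

    def forward(self, x):
        out, _ = self.encoder(x)
        return self.head(out)

def train_stage(model, batches, lr=1e-3):
    """One curriculum stage: ordinary supervised training on one task."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for features, labels in batches:
        opt.zero_grad()
        logits = model(features)                       # (B, T, num_labels)
        loss = loss_fn(logits.flatten(0, 1), labels.flatten())
        loss.backward()
        opt.step()
    return model

# Tasks ordered from generic to specialized; empty lists keep the demo cheap.
model = SpeechEncoder()
tasks = [("asr", 30, []), ("named_entities", 12, []), ("semantic_concepts", 8, [])]
for name, num_labels, batches in tasks:
    # Keep the learned encoder, swap the output head for the new label set.
    model.head = nn.Linear(model.encoder.hidden_size, num_labels)
    train_stage(model, batches)
```

Because the encoder is never reset, each stage starts from everything the previous tasks taught it; only the small task-specific output layer is reinitialized.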

Do you want to know more about this method? Read: 

  • Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in International Conference on Machine Learning, Montreal, Canada, 2009, pp. 41–48.

Some more terms – SLU, NLP, ASR 

SLU stands for spoken language understanding and refers to natural language processing (NLP) tasks applied to spoken language. One such task is named entity recognition, i.e. extracting entities such as names of persons and places. Better automatic speech recognition (ASR) yields higher-quality transcriptions, which in turn improves the performance of the whole NLP system. Thanks to significant advances in ASR, driven mainly by deep neural networks for both acoustic and language modeling, the performance of SLU systems has therefore made massive progress in the last few years.
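To make the cascaded view concrete: a classic (non end-to-end) SLU system chains ASR with a text-based NER step, so any transcription error propagates into the NLP stage. The `transcribe` and `extract_entities` functions below are hypothetical placeholders, not a real API:

```python
# Classic cascaded SLU: speech -> ASR transcript -> text-based NER.
# Both components are canned placeholders for illustration; note that an
# ASR error in the transcript would surface directly in the NER output.

def transcribe(audio: bytes) -> str:
    """Placeholder ASR component returning a fixed transcript."""
    return "angela merkel visited paris"

def extract_entities(text: str) -> list:
    """Placeholder NER component over a tiny hand-made gazetteer."""
    gazetteer = {"angela merkel": "PERSON", "paris": "PLACE"}
    return [(name, label) for name, label in gazetteer.items() if name in text]

def cascaded_slu(audio: bytes) -> list:
    transcript = transcribe(audio)        # errors made here are irreversible...
    return extract_entities(transcript)   # ...and propagate into the NLP output

print(cascaded_slu(b""))  # [('angela merkel', 'PERSON'), ('paris', 'PLACE')]
```

An end-to-end approach like the one used in SELMA replaces this two-step chain with a single model that maps the audio signal directly to semantic concepts.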

Article extracted from: Caubrière, A., Tomashenko, N., Laurent, A., Morin, E., Camelin, N., & Estève, Y. (2019). Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability. In Proceedings of Interspeech 2019, Graz, Austria.