Large scale acoustic models: A new perspective
Abstract
Large Language Models (LLMs), such as ChatGPT, generate text in response to a prompt after being trained through exposure to a huge amount of text. Similar approaches are applied in Automatic Speech Recognition (ASR) systems, which are trained on unprocessed and unlabeled audio data without supervision. The resulting process recalls what a newborn might do to learn the structure of speech when immersed in an acoustic environment. By analogy with LLMs, we refer to this architecture as Large Acoustic Models (LAMs). Drawing on the psycholinguistics literature, we develop a further parallel between modern ASR and human behavior by introducing the paradigm of artificial language learning. Lastly, a new approach to ASR is presented, grounded in the linguistic theories underlying natural speech.
Keywords
- large language models
- large acoustic models
- artificial language