Luigi Maria Giordano Orsini Vincenzo Norman Vitale Francesco Cutugno

Large scale acoustic models: A new perspective

Abstract

Large Language Models (LLMs), such as ChatGPT, generate text in response to a prompt after being trained on exposure to a huge amount of text. Similar approaches are applied in Automatic Speech Recognition (ASR) systems, which are trained on unprocessed and unlabeled audio data without supervision. The resulting process recalls what a newborn might do to learn the structure of speech when immersed in an acoustic environment. In parallel with LLMs, we refer to this architecture as Large Acoustic Models (LAMs). Drawing on the psycholinguistics literature, we draw a further parallel between modern ASR and human behavior, introducing the paradigm of artificial language learning. Lastly, a new approach to ASR is presented, focusing on the linguistic theories underlying natural speech.

Keywords

  • large language models
  • large acoustic models
  • artificial language
