Lexical ambiguity in contextualized word embeddings: A case study of nominalizations
Abstract
In this paper, we investigate the extent to which contextualized word embeddings can encode lexical ambiguity. Specifically, we focus on nominalizations in French, which constitute an interesting case for the study of ambiguity because of their frequent polysemy and their relationship with polyfunctional morphological processes. Given a random sample of occurrences of 90 nouns, we compute for each word the pairwise cosine similarity (SelfSim) among its token embeddings, extracted from the pre-trained model FlauBERT, and we test it as a predictor of the degree of ambiguity of nominalizations. For the evaluation, we use a manual annotation of lexical ambiguity and test different annotation strategies: defining word senses with different semantic classifications and granularities, and annotating lexemes either in isolation or based on a sample of tokens. Our findings contribute to the understanding of (i) the lexical semantic component of contextual embeddings, enhancing their interpretability, and (ii) aspects of lexical ambiguity related to derivational semantics and to the contextual variation of meaning.
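As an illustration of the SelfSim measure described above, the following is a minimal sketch and not the authors' implementation. It assumes that SelfSim is the mean cosine similarity over all distinct pairs of token embeddings of a word; the averaging step is an assumption, since the abstract only mentions pairwise cosine similarity. The function name `self_sim` and the toy vectors are hypothetical, and in practice the rows would be token embeddings extracted from FlauBERT.

```python
import numpy as np


def self_sim(token_embeddings) -> float:
    """Mean pairwise cosine similarity among the token embeddings of one word.

    Assumption: SelfSim averages cosine similarity over all distinct pairs
    of occurrences. `token_embeddings` has shape (n_occurrences, dim).
    """
    X = np.asarray(token_embeddings, dtype=float)
    if X.ndim != 2 or X.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two token embeddings")

    norms = np.linalg.norm(X, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("zero-norm embedding: cosine similarity is undefined")

    # Normalize rows so that the dot product equals the cosine similarity.
    U = X / norms
    sims = U @ U.T

    # Keep each unordered pair once, excluding self-pairs on the diagonal.
    upper = np.triu_indices(X.shape[0], k=1)
    return float(sims[upper].mean())


# Toy vectors standing in for contextual embeddings (hypothetical data).
same_direction = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
mixed_directions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(self_sim(same_direction))
print(self_sim(mixed_directions))
```

Under this reading, a lower SelfSim indicates that a word's occurrences are more dispersed in embedding space, which is the property tested as a predictor of ambiguity.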
Keywords
- lexical ambiguity
- nominalization
- distributional semantics
- token embeddings
- semantic annotation