How do human children acquire their mother tongue, and what mechanisms lie behind it? This is an intriguing yet difficult topic among the areas of language research, because infants before and during language acquisition lack the language skills to respond verbally in experiments. Language scientists therefore need to rely on indirect measures to study language acquisition. One such measure is the age at which children produce a word, known as the word's Age of Acquisition (AoA). Knowing which words are learned when can help to categorize them and to investigate possible correlations between the linguistic characteristics of different words, and ultimately helps us better understand language learning in general. Hence, several computational language modeling studies have tried to build effective models of AoA prediction. Some propose lexical difficulty measures for isolated words as predictors of AoA, and a few recent studies also investigate the sequential predictability of words in child-directed speech as a proxy for syntactic context. However, the properties of word meaning representations (embeddings) have not been directly addressed, and we do not have a clear understanding of what makes a vector representation of a word better for predicting AoA. In this thesis, I train an LSTM language model on child-directed speech and, together with word difficulty values collected by other studies, use it to predict the AoA of a set of test words. Finally, I analyze the density of the semantic space formed by the word embeddings of the LSTM model. My results suggest, first, that the density of the word embedding space produced by an LSTM model of sequential predictability is negatively correlated with AoA prediction accuracy.
Second, the word embedding space of nouns is consistently denser than that of predicates and function words across distinct learning episodes, and nouns therefore obtain weaker AoA predictions, suggesting a clear difference between the word embedding spaces of different lexical categories. These results could help future studies in this area build more effective computational models for estimating AoA, which in turn will contribute to the understanding of language acquisition.
Published on: University of Trento
Alireza Mahmoudi Kamelabad (2020). Computational Models of Lexical Acquisition: Predicting the Age of Acquisition of Different Grammatical Classes. In Library of University of Trento. University of Trento.