Faria, P. (2019). Aprendizagem de categorias de palavras por análise distribucional: resultados adicionais para Português Brasileiro. Diacrítica, 33(2), 229-251. https://doi.org/10.21814/diacritica.415
Abstract. A child learning a language has to figure out what the syntactic (part-of-speech) categories of her language are and assign words to one or more of them. The question we aim to answer here is how much of this learning can be accomplished through distributional analysis of utterances. To this end, we reimplemented the computational model of Redington, Chater and Finch (1998) and applied it to Brazilian Portuguese input data obtained from publicly available corpora of both child-directed and adult-to-adult speech. Results from all experiments are presented and discussed. These experiments investigate several variables involved in this learning task: the types of distributional contexts, the number of target and context words, the value of distributional information for different categories, and corpus size, among others. A comparison between child-directed speech and adult-to-adult speech is also carried out. In general, our results support those of Redington et al. (1998), although we find some possibly important, and perhaps contradictory, differences. We also evaluate the cosine metric, comparing its performance with that of the Spearman rank correlation metric used in Redington et al.’s (1998) study; the Spearman metric seems to produce better performance. In this paper we focus on a quantitative analysis of our results.
Keywords: Language acquisition. Part-of-speech learning. Distributional analysis. Cognitive modelling.
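To make the distributional analysis described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It builds a context vector for each target word by counting which context words occur at fixed positions around it, then compares two target words with both the cosine metric and the Spearman rank correlation. The toy utterances, the function names, and the window of two words on either side are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Toy utterances, invented for illustration only (not taken from any corpus).
UTTERANCES = [
    "o gato dorme aqui",
    "o cachorro dorme aqui",
    "a menina come bolo",
    "o gato come peixe",
    "a menina dorme aqui",
    "o cachorro come bolo",
]

# Relative positions of context words (assumption: two words on either side).
POSITIONS = (-2, -1, 1, 2)


def context_vectors(utterances, targets, context_words, positions=POSITIONS):
    """Count, per target word, how often each context word occurs at each position."""
    counts = {t: Counter() for t in targets}
    for utterance in utterances:
        tokens = utterance.split()
        for i, token in enumerate(tokens):
            if token not in counts:
                continue
            for p in positions:
                j = i + p
                if 0 <= j < len(tokens) and tokens[j] in context_words:
                    counts[token][(p, tokens[j])] += 1
    # Flatten into fixed-order vectors: one dimension per (position, context word).
    dims = [(p, c) for p in positions for c in context_words]
    return {t: [counts[t][d] for d in dims] for t in targets}


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(u, v):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ru, rv = ranks(u), ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    var_u = sum((a - mu) ** 2 for a in ru)
    var_v = sum((b - mv) ** 2 for b in rv)
    return cov / sqrt(var_u * var_v) if var_u and var_v else 0.0


# Use every word in the toy data as both target and context word.
vocab = sorted({w for u in UTTERANCES for w in u.split()})
vectors = context_vectors(UTTERANCES, targets=vocab, context_words=vocab)

for a, b in [("gato", "cachorro"), ("gato", "dorme")]:
    print(a, b,
          round(cosine(vectors[a], vectors[b]), 3),
          round(spearman(vectors[a], vectors[b]), 3))
```

In this toy data, the two nouns share most of their contexts and so score as more similar to each other than a noun and a verb do, under both metrics. A full model would go on to cluster the target words by these similarity scores and evaluate the clusters against reference part-of-speech categories; that step is omitted here.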