Measuring the Influence of L1 on Learner English Errors in Content Words within Word Embedding Models

Abstract

Recent work in second language acquisition and corpus linguistics has shown that a person’s first language (L1) interferes with how they process words in a new language. In this work, we build on the findings of two recent studies that explore differences between the lexico-semantic models of a person’s L1 and L2 (English, in their case), and test their hypotheses within the framework of two popular word vector models. To do so, we extract content word errors from an annotated corpus of essays written by learners of English from 16 different first-language backgrounds. Specifically, we compare the vector representations of the incorrect and correct-replacement word pairs in English as well as in the person’s first language, and find a moderate correlation between L1 and English. Additionally, we find certain inconsistencies between the two word embedding models when viewed through the lens of language typology, suggesting new avenues for future work.
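
The comparison described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state which similarity measure or correlation statistic was used, so cosine similarity and Pearson correlation are assumptions here, and the words, the `l1_` translation keys, and the three-dimensional vectors are hypothetical toy data standing in for real embeddings.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_similarities(embeddings, pairs):
    """Similarity of each (incorrect word, correct replacement) pair."""
    return [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists (assumed statistic)."""
    return float(np.corrcoef(xs, ys)[0, 1])

# Hypothetical toy embeddings; real models would have hundreds of dimensions.
english_vectors = {
    "make": [1.0, 0.0, 0.0], "do":   [0.8, 0.6, 0.0],
    "say":  [0.0, 1.0, 0.0], "tell": [0.0, 0.6, 0.8],
    "big":  [0.0, 0.0, 1.0], "wide": [1.0, 0.0, 0.0],
}
# Hypothetical vectors for the L1 translations of the same words.
l1_vectors = {
    "l1_make": [1.0, 0.0, 0.0], "l1_do":   [0.6, 0.8, 0.0],
    "l1_say":  [0.0, 1.0, 0.0], "l1_tell": [0.0, 0.8, 0.6],
    "l1_big":  [0.0, 0.0, 1.0], "l1_wide": [0.0, 1.0, 0.0],
}

# Each error is an (incorrect word, correct replacement) pair.
english_pairs = [("make", "do"), ("say", "tell"), ("big", "wide")]
l1_pairs = [("l1_make", "l1_do"), ("l1_say", "l1_tell"), ("l1_big", "l1_wide")]

english_sims = pair_similarities(english_vectors, english_pairs)
l1_sims = pair_similarities(l1_vectors, l1_pairs)

# Correlate the pair similarities across the two languages.
correlation = pearson(english_sims, l1_sims)
print(f"Correlation between L1 and English pair similarities: {correlation:.3f}")
```

In practice the toy dictionaries would be replaced by lookups into pretrained embeddings for English and for each learner's L1, with the L1 words obtained by translating each error pair.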

Publication
In 17th International Conference on Cognitive Modelling
Kanishka Misra
Research Assistant Professor at Toyota Technological Institute at Chicago

My research interests include Natural Language Processing, Cognitive Science, and Deep Learning.
