A new vocabulary list reflects how learners acquire additional words
The order in which non-native learners acquire words is usually assumed to reflect word frequency in English generally. However, when Belgian researchers tested word recognition directly and compiled lists of word families reflecting what learners actually know, their lists pointed to other strong influences on which words are learned, and in what order.
There have been many attempts to compile lists of vocabulary appropriate to various levels of language proficiency. These lists are often organised into levels of word families. For example, the Common European Framework of Reference (CEFR) has six proficiency levels (A1-C2).
What these lists have in common is that they are compiled from the top down: researchers use word frequency to decide at which learning level a word or word family should be known.
The Belgian study took a very different, bottom-up approach, testing language learners directly to find out which words they actually know.
An internet test was made freely available (see below), with each testing round consisting of 70 random, real English words and 30 random, made-up but plausible words. Participants answered yes or no to indicate whether they recognised each word, and the nonsense words made it possible to weed out those who claimed to recognise words they did not know.
Participants were also asked for personal information, including a self-rating on five proficiency levels, from ‘I know a few words’ to ‘It is my mother tongue’. Each participant received a feedback score, and the test proved very popular, generating 17 million responses for the study.
Many more responses were actually collected, but only the first three tests taken by each user were used (some users took the test hundreds of times), and anyone answering ‘yes’ to more than two of the nonsense words was excluded.
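The data-cleaning rules above can be sketched in a few lines of code. This is a minimal illustration, not the researchers' own pipeline: the `Round` record and its field names are hypothetical, rounds are assumed to arrive in the order they were taken, and the nonsense-word rule is assumed to apply round by round. The `corrected_score` function is likewise a hypothetical feedback score (share of real words accepted minus share of nonsense words accepted), a common correction in yes/no tests rather than the score the website necessarily reports.

```python
from dataclasses import dataclass

# Hypothetical record for one test round; field names are illustrative only.
@dataclass
class Round:
    user_id: str
    real_yes: list[bool]       # yes/no answers to the 70 real English words
    nonsense_yes: list[bool]   # yes/no answers to the 30 made-up words

MAX_NONSENSE_YES = 2   # more than two 'yes' answers to nonsense words -> excluded
ROUNDS_PER_USER = 3    # only each user's first three rounds are kept

def usable_rounds(rounds: list[Round]) -> list[Round]:
    """Keep each user's first three rounds, minus any with too many false 'yes' answers.

    Assumes `rounds` is in the order the tests were taken and (as an
    assumption) applies the nonsense-word rule round by round.
    """
    taken: dict[str, int] = {}
    kept = []
    for r in rounds:
        n = taken.get(r.user_id, 0)
        if n >= ROUNDS_PER_USER:
            continue  # later rounds by the same user are ignored
        taken[r.user_id] = n + 1
        if sum(r.nonsense_yes) <= MAX_NONSENSE_YES:
            kept.append(r)
    return kept

def corrected_score(r: Round) -> float:
    """Hypothetical score: share of real words accepted minus share of nonsense words accepted."""
    hit_rate = sum(r.real_yes) / len(r.real_yes)
    false_alarm_rate = sum(r.nonsense_yes) / len(r.nonsense_yes)
    return hit_rate - false_alarm_rate
```

Subtracting the false-alarm rate means that a learner who simply answers ‘yes’ to everything scores close to zero rather than full marks.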
Participant profiles were very diverse, representing 150 mother tongues and all educational levels, with a mean age of 30. The most common mother tongues were Polish, Hungarian, German, Dutch and Chinese.
Analysing the responses produced a ranked list of 62,000 words. Of these, 114 were known to all participants, but they were still ranked by response time (eg nouns in ranked order: ‘coffee, water, music, radio’ etc). Some less frequent, classroom-related words also appear on this list: ‘subject, verb, vocabulary’. A further 331 words received only one ‘no’ response.
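The idea of ranking words that everyone knows by response time can be shown with a simplified sketch. As an assumption, words are ordered first by the proportion of ‘yes’ answers and then, among ties, by the mean time taken to answer ‘yes’; the study's own procedure combined accuracy and response times and may well differ. The `rank_words` function and its input format are hypothetical.

```python
from statistics import mean

def rank_words(responses: dict[str, list[tuple[bool, float]]]) -> list[str]:
    """Rank words from best known to least known.

    `responses` maps each word to (said_yes, response_time_ms) pairs.
    Higher proportion of 'yes' ranks first; ties are broken by faster
    mean 'yes' response time (a simplifying assumption).
    """
    stats = []
    for word, answers in responses.items():
        accuracy = mean(1.0 if said_yes else 0.0 for said_yes, _ in answers)
        yes_times = [rt for said_yes, rt in answers if said_yes]
        mean_rt = mean(yes_times) if yes_times else float("inf")
        stats.append((word, accuracy, mean_rt))
    stats.sort(key=lambda s: (-s[1], s[2]))
    return [word for word, _, _ in stats]

# Illustrative, made-up responses
example = {
    "water": [(True, 550.0), (True, 560.0)],
    "tadpole": [(True, 900.0), (False, 1200.0)],
    "coffee": [(True, 500.0), (True, 520.0)],
}
print(rank_words(example))  # ['coffee', 'water', 'tadpole']
```

Here ‘coffee’ and ‘water’ are both known by every (made-up) respondent, so the faster responses to ‘coffee’ place it first.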
How well some words were known could not have been predicted from their frequency in English generally, and some were even better known by the non-native learners than by native speakers. The latter tended to be academic words, such as ‘informatics’, and cognates from the mother tongue or another second language, such as ‘paracetamol’.
Words that were comparatively less well-known than expected included more informal and child-friendly vocabulary, such as ‘tadpole’ and ‘dunce’.
The list of 62,000 words was organised into 20,000 word families, which is a more useful compilation for teachers and learners (see link below). The families were ranked by the best-known member of the family and include inflections and derivations based on suffixes. For example, the family ‘correction’ includes ‘correctly’ and ‘corrective’, but not ‘incorrect’.
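The family rule (group by suffix, ignore prefixes, name the family after its best-known member) can be sketched with naive suffix stripping. This is an illustration only: the `SUFFIXES` tuple and both functions are hypothetical, and real word families are usually checked by hand rather than derived by a rule this crude.

```python
# Hypothetical, deliberately incomplete suffix list for illustration.
SUFFIXES = ("ion", "ive", "ly", "ed", "ing", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word  # prefixed forms such as 'incorrect' keep their own stem

def build_families(ranked_words: list[str]) -> dict[str, list[str]]:
    """Group words sharing a stem; `ranked_words` must be ordered best known first.

    Each family is keyed by its best-known member, and the dict keeps
    families in rank order (dicts preserve insertion order).
    """
    head_of_stem: dict[str, str] = {}
    families: dict[str, list[str]] = {}
    for word in ranked_words:
        s = stem(word)
        if s not in head_of_stem:
            head_of_stem[s] = word
            families[word] = []
        families[head_of_stem[s]].append(word)
    return families

print(build_families(["correction", "correctly", "corrective", "incorrect"]))
# {'correction': ['correction', 'correctly', 'corrective'], 'incorrect': ['incorrect']}
```

Because only suffixes are stripped, ‘incorrect’ ends up in a family of its own, matching the example above.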
Overall, word frequency in English accounted for only 46% of the variance in the likelihood that learners knew a word.
Apart from academic-related biases in vocabulary, other influences include the motivation of the learner and how interesting the vocabulary is to them. This may explain why so many learners knew words such as ‘snowboarding’ and ‘sexy’. It is also likely that sources of English outside the classroom, such as TV, film and social media, are highly influential.
These lists, being derived from what learners actually know, can help teachers and examiners to assess text difficulty more realistically and better anticipate challenges, such as otherwise unexpected gaps in vocabulary knowledge.
The vocabulary test can be found at http://vocabulary.ugent.be. Wordlists are available at OSF: ‘Word levels for English L2 speakers based on accuracy and response times in a yes/no vocabulary test with 62 thousand words’.
The lists are released under the Creative Commons licence CC BY-NC (Attribution-NonCommercial). They can be used freely for research and education, but not for commercial purposes.
REFERENCE
■ Brysbaert, M., Keuleers, E. and Mandera, P. (2021), ‘Which words do English non-native speakers know? New supernational levels based on yes/no decision’. Second Language Research https://doi.org/10.1177/0267658320934526