
Is AI poised to dominate testing?

Can computers replace human language testers? They are getting close, as John Roscoe finds out

The ascendance of artificial intelligence (AI) as the ultimate solution to assessing English language proficiency has long been anticipated. While its arrival may be imminent, it is, as yet, only at the threshold, according to the research findings of tech start-up English Without Borders.

The company has been researching AI products and services with a view to incorporating them into its platform, which is designed to assist business and industry in meeting their needs for an English-proficient workforce.

It sees AI as on the verge of conquering the final challenges to fully automated, objective evaluation of the English language proficiency of job candidates. An impressive number of Fortune 500 companies, including the likes of Microsoft, Adecco and Bombardier, are on the client lists of companies developing AI English evaluation. Even some national governments are offering the tests to employees.

The formats used in English testing via AI mirror traditional proctored, centre-based evaluations. They cover the four competencies: listening and reading comprehension, and spoken and written expression. They outshine the most widely used traditional assessments in the ease and flexibility of on-demand testing, much lower cost and the instantaneous delivery of results.

I am a project contributor to English Without Borders’ research and development, having worked for two decades with English proficiency exams, including Toefl, Toeic and Ielts, as well as preparing students for the English essay portion of the Brazilian Foreign Service Exam.

I was recruited to evaluate the accuracy of the AI tests and their user interfaces. A sample of Brazilian English speakers, ranging from basic to highly advanced, was recruited to take the tests, and native English speakers with graduate degrees also participated. Finally, project contributors took the tests themselves.

We evaluated several assessment suppliers. Privacy agreements mean I cannot name the companies, test content or mechanisms.

I can, however, report that interfaces varied. Some were slightly confusing and poorly paced, offering little orientation to presumably nervous test-takers. Others were considerate and intuitive, giving test-takers adequate instruction and practice examples, along with sufficient time to digest instructions offered in the test-takers’ L1.

The online tests took an average of one hour to complete, substantially less than most traditional assessments. While this did much to eliminate errors caused by fatigue, there was a feeling that, in some areas, the depth of testing might be barely adequate.

Some of the multiple-choice questions seemed arbitrary, offering options that were synonymous, with no discernible connotative differences. The identification of the passive voice was tested, but not its use; the value of this in screening a potential employee seems unclear.

Overall, the AI scores appeared to corroborate previous subjective assessments of the Brazilian test-takers. However, the scoring seemed inordinately harsh on all of the native speakers. Even native-speaker consultants scored just 40 percent on the essay! Project team members agreed that the AI algorithm may have interpreted a challenge to the premise of the essay question as ‘irrelevant content’, whereas a human evaluator might not have.

Grammatical assessments failed to note one mistake and did not accept the inclusion of an idiom. A second attempt, written in a more straightforward manner, scored 80 percent higher.

One company’s product offered compensatory features, including marked-up reproductions of the test submissions for review by the client (not the test-taker). Its proficiency-level matrices could also be manually adjusted by the client, meaning it was not a one-size-fits-all evaluation.

The test scores are valuable tools in a recruiter’s toolbox but shouldn’t be used as absolute determiners in choosing candidates.

That AI demonstrates deficits in assessing writing is a well-known phenomenon. Some time ago, a provider of traditional English language tests had its AI essay-evaluation algorithm ‘gamed’ by a prep instructor, resulting in outrage, backlash and a renewed emphasis on human markers.

Newer technology appears on track to eliminate this problem. As any of the 500 million daily users of Google Translate know, AI language capacity has advanced by leaps and bounds since Google moved from phrase-based machine translation to Google Neural Machine Translation in 2016.

If test providers can similarly eliminate the problems in assessing writing, the advantages of AI will far outweigh the disadvantages, for both test-takers and test-givers.

One remaining caveat is the prevention of fraud and cheating that proctored tests at registered sites provide. While a valid concern, it is worth noting that, in light of the Covid-19 quarantine, ETS is now offering off-site Toefl testing.

John Roscoe is an EFL consultant working in Brazil.