
    • English certification and assessment
    • Technology and the future

    Computer-based language assessment: the future is here

    By David Booth

    Many people are surprised at the idea of a computer program marking an exam paper. Yet computer-based testing already exists in a wide range of formats and fields, and plenty of the tests and exams that form part of our daily lives are taken on computers. If you’ve ever learned to drive, sat a citizenship test, done a training course at work, or completed a placement test for a language course, the odds are that you’ve already taken an automated test.

    Yet despite its prevalence, there is still a lack of understanding of computer-based language assessment and of how a computer can evaluate productive skills like speaking and writing.

    Computer-based testing: a closer look

    A common issue is that people have different ideas of what these tests entail. Computers fulfill several essential roles in the testing process, but these often go unacknowledged. For example, administering an exam requires a large bank of test questions along with data about each one, and computers store both the questions and the data. When a randomized exam is created, software selects the questions from the bank based on this data.

    Computers can make complex calculations far more quickly and accurately than humans. This means that processes that previously took a long time are completed in days, rather than weeks.

    Artificial intelligence (AI) technology is now capable of grading exam papers, for example. This means a shorter wait for exam results. In PTE Academic, candidates receive their results in an average of two days rather than waiting weeks for an examiner to mark their paper by hand.

    The benefits for students and teachers

    People take exams to prove their skills and abilities. Depending on their goals, the right result can open the door to many new opportunities, whether that is simply moving on to the next stage of a course, or something as life-changing as securing a place at a university in another country.

    A qualification can act as a passport to a better career or an enhanced education, and for that reason, it’s important that both students and teachers can have faith in their results.

    Computer programs have no inherent bias, which means that candidates can be confident that they will all be treated the same, regardless of their background, appearance or accent. PTE Academic, just one of Pearson’s computer-based exams, offers students the chance to score additional points on the exam with innovative integrated test items.

    This integration means that the results are a far more accurate depiction of the candidate’s abilities and provide a truer reflection of their linguistic prowess.

    More than questions on a screen

    Creating a computer-based test is not as simple as transferring the questions onto a screen. All that does is remove the need for pen and paper; it misses the opportunity to harness the precision and speed of a computer, as well as its learning potential.

    Tests that have been fully digitized, such as PTE Academic, benefit from that automation: it eliminates examiner bias, makes the test fairer and produces results more quickly. Automated testing builds on the technological tradition of opening doors for the future – not closing them.

    How technology enhances language testing

    The development of automated testing technologies doesn’t merely make the examination process quicker and more accurate – it also gives us the chance to innovate. Speaking assessments are an excellent example of this.

    Previously, this part of a language exam involved an interview, led by an examiner, who asked questions and elicited answers. But now that we have the technological capability, using a computer offers students the chance to be tested on a much wider range of speaking skills, without worrying about the inherent bias of the examiner.

    Indeed, the use of a computer-based system facilitates integrated skills testing. Traditionally, language exams had separate papers focusing on the four skills of reading, listening, speaking and writing. But the more modern concept of language testing aims to assess these linguistic skills used together, just as they are in real-life situations.

    Afterwards, the various scores are broken down to give learners insight into their strengths and weaknesses, which helps both students and teachers identify the areas that need improvement. This useful feedback is only possible because of the accuracy and detail of automated exam grading.

    The space race on paper

    Back in the 1960s, during the space race, computers were still a relatively new concept. Katherine Johnson, one of the first African-American women to work for NASA as a scientist, was a mathematician with a reputation for doing incredibly complex manual calculations. Although a computer had produced the orbital calculations for the first American orbital space flight, astronaut John Glenn refused to fly until Katherine Johnson had checked those calculations by hand.

    This anecdote reminds us that, although computer technology is an inherent part of everyday life, we still need to check now and then that these systems are working as they should. Human error still comes into play – after all, humans program these systems.

    PTE Academic – a fully digitized exam

    Every stage of PTE Academic, from registration to practice tests to results (both receiving them and sharing them with institutions), happens online. It may come as a surprise, then, that the test itself is not taken remotely. Instead, students attend one of over 295 test centers to take the exam, in conditions offering the highest levels of data security.

    This means that each student can sit the exam in an environment designed for that purpose. It also allows the receiving institutions, such as universities and colleges, to be assured of the validity of the PTE Academic result.

    The future is here

    We created computers, but they have surpassed us in many areas – exam grading being a case in point. Computers can score more accurately and consistently than humans, and they don’t get tired late in the day, or become distracted by a candidate’s accent.

    The use of AI technology to grade student responses represents a giant leap forward in language testing, leading to fairer and more accurate student results. It also means more consistency in grading, which benefits the institutions, such as universities, that rely on these scores to accurately reflect ability.

    And here at Pearson, we are invested in staying at the cutting edge of assessment. Our test developers are incorporating AI solutions now, using its learning capacity to create algorithms and build programs that can assess speaking and writing skills accurately and quickly. We’re expanding the horizons of English language assessment for students, teachers and all the other professionals involved in each stage of the language learning journey.

    • Business and employability
    • Technology and the future

    English for employability: What will jobs be like in the future?

    By Pearson Languages

    What do driverless car engineers, telemedicine physicians and podcast producers have in common? About 10 years ago, none of these positions existed. They are representative of a new technology-driven marketplace, which is evolving faster than employers, governments and education institutions can adapt.

    As new jobs appear, others fall by the wayside. Today, it’s estimated that up to 50% of occupations could be automated with currently available technology. Routine roles such as data entry specialist, proofreader and even market research analyst are especially at risk of becoming redundant within the next 5 to 10 years. Globally, that means between 400 and 800 million workers could be displaced by automation technology by 2030, according to McKinsey.

    Moreover, an estimated 65% of today’s young people will need to work in jobs that do not exist in the current market. The question is, what can we do to prepare learners for a future in which we have no idea what jobs they’ll be doing? Mike Mayor and Tim Goodier discuss this uncertain future and explain why English for employability is such a hot topic right now.

    A rising level of English and employer expectations

    Mike Mayor, Director of the Global Scale of English at Pearson, explains that while he believes employability has always been a factor in English language education, it has become more important and more of a focus for students looking to enter the workforce.

    “Expectations of employers have risen as proficiency in English language, in general, has risen around the world,” he says. “They’re now looking for more precise skills.”

    Tim Goodier, Head of Academic Development at Eurocentres, agrees. He explains that English language education is primarily about improving communication and soft skills – which is key for the jobs of 2030 and beyond.

    “There’s a convergence of skills training for the workplace and language skills training,” Tim says. “The Common European Framework of Reference (CEFR) has recognized and, in many ways, given a roadmap for looking into how to develop soft skills and skills for employability by fleshing out its existing scheme – especially to look at things like mediation skills.”

    How the Global Scale of English and CEFR have surfaced employability skills

    The Global Scale of English (GSE) is recognizing this increasing prominence of English for employability. Mike explains that it’s doing this “by taking the common European framework and extending it out into language descriptors which are specific for the workplace.”

    In developing a set of learning objectives for professional learners, Mike and his team have given teachers more can-do statements. “They are able to create curricula and lessons around specific business skills,” he says.

    Tim comments that one of the most interesting things about the GSE is that it links can-do statements to key professions, which he explains “is another extension of what these can-do statements can be used for – and viewing competencies as unlocking opportunity.”

    Showing how these skills and competencies relate to the real world of work can be a strong motivating factor for learners.

    He says that teachers need to visualize what success will look like in communication “and then from there develop activities in the classroom that are authentic.” At the same time, he says that activities should be personalized by “using the learners’ own interests and adapting the course as much as possible to their future goals.”

    Preparing students for the future workplace

    Speaking on the role of publishing in English for employability, Mike says:

    “I would say as course book creators we actually incorporate a lot of these skills into our materials, but… I think we could push it a little further.”

    In Mike’s view, educators need to do more than teach the skills; they also need to raise awareness of their context. In other words, they should explain why these skills are important and how they will help students in authentic situations, both in and out of the work environment.

    Beyond teaching the language itself, he says publishers should be helping teachers ask:

    • Are the students participating fairly in group discussions?
    • Are the students actively listening?
    • Are they interrupting politely?

    These skills “don’t come naturally, and so just to begin raising awareness would be an added value,” he says.

    Future skills: careers in 2030

    In the same way we didn’t know that driverless cars would become a reality 10 years ago, we cannot say with absolute certainty which professions will arise and which will disappear. However, using tools like the GSE teacher toolkit, we can help our students develop the language and soft skills they need to navigate an ever-shifting job market. The future is an exciting place, so let’s help our learners prepare for it!


    • English language testing
    • Technology and the future

    Explaining computerized English testing in plain English

    By Pearson Languages

    Research has shown that automated scoring can give more reliable and objective results than human examiners when evaluating a person’s mastery of English. This is because an automated scoring system is impartial, unlike humans, who can be influenced by irrelevant factors such as a test taker’s appearance or body language. Additionally, automated scoring treats regional accents equally, unlike human examiners, who may favor accents they are more familiar with. Automated scoring also allows individual features of a spoken or written response to be analyzed independently of one another, so that a weakness in one area of language does not affect the scoring of other areas.

    PTE Academic was created in response to the demand for a more accurate, objective, secure and relevant test of English. Our automated scoring system is a central feature of the test, and vital to ensuring the delivery of accurate, objective and relevant results – no matter who the test-taker is or where the test is taken.

    Development and validation of the scoring system to ensure accuracy

    PTE Academic’s automated scoring system was developed after extensive research and field testing. A prototype test was developed and administered to a sample of more than 10,000 test takers from 158 different countries, speaking 126 different native languages. This data was collected and used to train the automated scoring engines for both the written and spoken PTE Academic items.

    To do this, multiple trained human markers assess each answer. Those results are used as the training material for machine learning algorithms, similar to those used by systems like Google Search or Apple’s Siri. The model makes initial guesses as to the scores each response should get, consults the actual human scores to see how well it did, adjusts itself, then goes through the training set over and over again, adjusting and improving until its predictions come as close as possible to the set of human ratings.
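
    To make the idea concrete, here is a minimal sketch of that training loop in Python, using scikit-learn’s SGDRegressor as a stand-in for the proprietary scoring engine. The feature vectors and human scores below are synthetic; a real system derives its features from the responses themselves.

    ```python
    # A minimal sketch of the training loop described above; this is not
    # Pearson's actual engine. We assume each response is already a numeric
    # feature vector, and fit a model that is repeatedly adjusted until its
    # predictions approach the averaged human ratings.
    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical data: 1,000 responses, 20 features each, with human
    # ratings on a 0-90 scale (like PTE Academic's) as the target.
    X = rng.normal(size=(1000, 20))
    human_scores = np.clip(X @ rng.normal(size=20) * 5 + 45, 0, 90)

    X_train, X_test, y_train, y_test = train_test_split(
        X, human_scores, random_state=0)

    # SGDRegressor guesses, checks its error against the human scores,
    # nudges its weights, and repeats over the training set: the
    # iterative adjustment the paragraph describes.
    model = SGDRegressor(max_iter=1000, tol=1e-4, random_state=0)
    model.fit(X_train, y_train)

    print(f"held-out R^2: {model.score(X_test, y_test):.3f}")
    ```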

    Once trained and performing at a high level, this model is used as a marking algorithm, able to score new responses just as human markers would. Correlations between scores given by this system and trained human markers are quite high. The standard error of measurement between Pearson’s system and a human rater is less than that between one human rater and another – in other words, the machine scores are more accurate than those given by a pair of human raters, because much of the bias and unreliability has been squeezed out of them. In general, you can think of a machine scoring system as one that distills the best of the human ratings, then acts like an idealized human marker.

    Pearson conducts scoring validation studies to ensure that the machine scores are consistently comparable to ratings given by skilled human raters. Here, a new set of test-taker responses (never seen by the machine) is scored both by human raters and by the automated scoring system. Research has demonstrated that the automated scoring technology underlying PTE Academic produces scores comparable to those obtained from careful human experts. This means that the automated system “acts” like a human rater when assessing test takers’ language skills, but does so with a machine’s precision, consistency and objectivity.
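
    As a rough numeric illustration of those comparisons, the sketch below computes one common estimate of the standard error of measurement (SEM) for a human-human rater pair and for a machine-human pair. All scores are invented for the example; only the pattern, a lower machine-human SEM, mirrors the claim above.

    ```python
    # Illustrative comparison of rating agreement, with invented scores.
    import numpy as np

    def sem(a: np.ndarray, b: np.ndarray) -> float:
        """One common SEM estimate: the standard deviation of the
        rating differences divided by sqrt(2)."""
        return np.std(a - b, ddof=1) / np.sqrt(2)

    # Hypothetical 0-90 scores for the same eight responses.
    rater_1 = np.array([62, 55, 78, 41, 69, 83, 50, 74], dtype=float)
    rater_2 = np.array([58, 59, 74, 46, 72, 79, 55, 70], dtype=float)
    machine = np.array([61, 56, 77, 42, 70, 82, 51, 73], dtype=float)

    print(f"human-human SEM:   {sem(rater_1, rater_2):.2f}")
    print(f"machine-human SEM: {sem(machine, rater_1):.2f}")  # lower here
    ```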

    Scoring speaking responses with Pearson’s Ordinate technology

    The spoken portion of PTE Academic is automatically scored using Pearson’s Ordinate technology. Ordinate technology is the result of years of research in speech recognition, statistical modeling, linguistics and testing theory. It uses a proprietary speech processing system that is specifically designed to analyze and automatically score speech from fluent and second-language English speakers. The Ordinate scoring system collects hundreds of pieces of information from a test taker’s spoken responses beyond the words themselves, such as pace, timing and rhythm, as well as the power of the voice, emphasis, intonation and accuracy of pronunciation. It is trained to recognize even somewhat mispronounced words, and quickly evaluates the content, relevance and coherence of the response. In particular, the meaning of the spoken response is evaluated, making it possible for these models to assess whether or not what was said deserves a high score.
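
    Ordinate’s pipeline is proprietary, but the open-source librosa library can illustrate the kinds of prosodic measurements listed above. The file name, silence threshold and feature choices below are purely illustrative.

    ```python
    # A simplified sketch of the kinds of prosodic features such a system
    # might extract; this is not Ordinate's actual pipeline.
    import numpy as np
    import librosa

    def prosodic_features(path: str) -> dict:
        y, sr = librosa.load(path, sr=16000)  # mono recording, 16 kHz

        # Pace and timing: proportion of the recording that is speech.
        intervals = librosa.effects.split(y, top_db=30)
        speech_time = sum(end - start for start, end in intervals) / sr
        total_time = len(y) / sr

        # "Power of the voice": short-term energy.
        rms = librosa.feature.rms(y=y)[0]

        # Intonation: fundamental-frequency contour from the pYIN tracker.
        f0, voiced_flag, voiced_prob = librosa.pyin(
            y, fmin=librosa.note_to_hz("C2"),
            fmax=librosa.note_to_hz("C6"), sr=sr)

        return {
            "speech_ratio": speech_time / total_time,
            "mean_energy": float(np.mean(rms)),
            "pitch_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
        }

    # Features like these, plus the recognized words themselves, would
    # feed the trained scoring models described above.
    print(prosodic_features("spoken_response.wav"))  # hypothetical file
    ```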

    Scoring writing responses with Intelligent Essay Assessor™ (IEA)

    The written portion of PTE Academic is scored using the Intelligent Essay Assessor™ (IEA), an automated scoring tool powered by Pearson’s state-of-the-art Knowledge Analysis Technologies™ (KAT) engine. Based on more than 20 years of research and development, the KAT engine automatically evaluates the meaning of text, such as an essay written by a student in response to a particular prompt. It evaluates writing as accurately as skilled human raters using a proprietary application of the mathematical approach known as Latent Semantic Analysis (LSA). LSA evaluates the meaning of language by analyzing the patterns of word use across large bodies of relevant text. Using LSA, therefore, the KAT engine can understand the meaning of a text much as a human does.
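
    As a toy illustration of the general LSA technique (not the KAT engine itself), scikit-learn can reduce TF-IDF vectors with truncated SVD and compare documents in the resulting semantic space:

    ```python
    # A toy Latent Semantic Analysis example using scikit-learn; a sketch
    # of the general technique only, with a tiny stand-in corpus.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    corpus = [
        "The company reported strong quarterly earnings and revenue growth.",
        "Profits rose sharply as the firm beat its revenue forecasts.",
        "The striker scored twice in the final minutes of the match.",
    ]

    tfidf = TfidfVectorizer().fit_transform(corpus)

    # SVD projects the word-count space into a low-dimensional "semantic"
    # space in which related vocabulary (earnings, profits, revenue) clusters.
    vectors = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

    # Documents 0 and 1 should come out far more similar to each other
    # than either is to the football sentence.
    print(cosine_similarity(vectors))
    ```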

    What aspects of English does PTE Academic assess?

    • Technology and the future

    Can computers really mark exams? Benefits of ELT automated assessments

    By Pearson Languages

    Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking times, removes human biases, and is at least as accurate and reliable as human examiners. As innovations go, this one is a real game-changer for teachers and students.

    However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

    The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. PTE Academic and Versant tests – for example – provide unbiased, fair and fast automated scoring for speaking and writing exams – irrespective of where the test takers live, or what their accent or gender is. 

    This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

    AI versus traditional automated scoring

    First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, we generally mean the scoring of items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list, or insert a missing word – that sort of thing. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.
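
    Because selected-response items like these have a single correct answer, this kind of scoring reduces to a deterministic lookup. A minimal sketch, with invented items:

    ```python
    # Traditional automated scoring: each item has one correct answer,
    # so marking is a simple comparison against the answer key.
    answer_key = {
        "q1": "B",             # multiple choice
        "q2": "their",         # cloze (missing word)
        "q3": ["C", "A", "B"]  # sentence reordering
    }

    def score_test(responses: dict) -> int:
        """Return the number of items answered exactly as the key specifies."""
        return sum(1 for item, key in answer_key.items()
                   if responses.get(item) == key)

    # q2 is wrong ("there" vs "their"), so this candidate scores 2.
    print(score_test({"q1": "B", "q2": "there", "q3": ["C", "A", "B"]}))
    ```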

    While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

    This is where AI comes in. 

    We hear a lot about how AI is increasingly being used in areas where large amounts of unstructured data need to be processed quickly and accurately – in medical diagnostics, for example. In language testing, AI uses specialized computer software to grade written and oral tests.

    How AI is used to score speaking exams

    The first step is to build an acoustic model for each language – a model that can recognize speech and convert the sound waves into text. While this technology used to be rare, most of our smartphones can do it now.
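
    As a quick illustration of that speech-to-text step, the open-source SpeechRecognition package can transcribe a recording in a few lines; a testing engine would rely on its own purpose-built acoustic models rather than a generic web recognizer, and the file name here is hypothetical.

    ```python
    # A minimal transcription sketch using the open-source SpeechRecognition
    # package; illustrative only, not a test engine's acoustic model.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("spoken_response.wav") as source:  # hypothetical file
        audio = recognizer.record(source)  # read the whole recording

    # Convert the waveform to text (here via Google's free web API).
    print(recognizer.recognize_google(audio))
    ```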

    The engine is then trained to score every single prompt or item on a test. We do this by using expert human raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine.

    Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores correlate very highly with the human scores. If this doesn’t happen for an item, we remove it, as every item must match the standard set by human markers. We expect a correlation of between .95 and .99 – in other words, the engine’s scores track the human-marked samples almost exactly.

    This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  
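
    A sketch of that item-level check: an item is retained only if the machine-human correlation clears the bar. The scores and threshold below are illustrative.

    ```python
    # Illustrative item validation: keep an item only if the machine's
    # scores correlate highly enough with double-marked human scores.
    import numpy as np

    THRESHOLD = 0.95  # the lower end of the correlation range quoted above

    def validate_item(human_scores, machine_scores, threshold=THRESHOLD):
        """Return True if the item meets the human-machine agreement bar."""
        r = np.corrcoef(human_scores, machine_scores)[0, 1]
        return r >= threshold

    human = np.array([3, 5, 2, 4, 5, 1, 3, 4], dtype=float)
    machine = np.array([3, 5, 2, 4, 4, 1, 3, 4], dtype=float)

    print(validate_item(human, machine))  # True here: r is roughly .97
    ```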

    How AI is used to score writing exams

    Our AI writing scoring uses a technology called latent semantic analysis. LSA is a natural language processing technique that can analyze and score writing, based on the meaning behind words – and not just their superficial characteristics. 

    Similarly to our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn the patterns of how words relate to each other and are used in, for example, the English language. 

    Once the language model has been established, we train the engine to score every written item on a test. As in speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items, and check that the machine scores are very highly correlated to the human scores. 

    The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

    AI’s ability to mark multiple traits 

    One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

    In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can then mark items on any number of traits instantaneously – and without error. 
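
    A sketch of the multi-trait idea using scikit-learn: one model, trained on per-trait human scores, predicts every trait for a new response in a single pass. The features and trait names are purely illustrative.

    ```python
    # Multi-trait scoring sketch: a single trained model emits several
    # trait scores per response at once. Data here is synthetic.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.multioutput import MultiOutputRegressor

    rng = np.random.default_rng(0)
    TRAITS = ["content", "fluency", "pronunciation"]

    # Hypothetical training data: 500 responses, 12 features, with one
    # human score per trait on a 0-5 scale.
    X = rng.normal(size=(500, 12))
    y = np.clip(X @ rng.normal(size=(12, 3)) + 3, 0, 5)

    model = MultiOutputRegressor(Ridge()).fit(X, y)

    # Score a new response on all traits in one pass.
    new_response = rng.normal(size=(1, 12))
    for trait, score in zip(TRAITS, model.predict(new_response)[0]):
        print(f"{trait}: {score:.1f}")
    ```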

    AI’s lack of bias

    A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

    Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

    We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.

    The benefits of AI automated assessment

    There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, is repetitive, time-consuming and not always reliable, and it takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

    Language learning takes time, and lots of it, to progress to high levels of proficiency. The blended use of AI can:

    • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

    • allow students to practice and get instant feedback inside and outside of allocated teaching time

    • address the issue of teacher workload

    • create a virtuous combination of humans and machines, taking advantage of what humans do best and what machines do best

    • provide fair, fast and unbiased summative assessment scores in high-stakes testing

    We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. An interesting quote from Fei-Fei Li, Stanford professor and former Chief Scientist of AI at Google Cloud, describes AI like this:

    “I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

    AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

    Examples of AI assessments in ELT

    At Pearson, we have developed a range of assessments using AI technology.

    Versant

    The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed for placement tests to determine the appropriate level for the learner.

    PTE Academic

    The Pearson Test of English Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests and results are available within five days. 

    Pearson English International Certificate (PEIC)

    The Pearson English International Certificate (PEIC) also uses automated assessment technology. The two-hour test is available on demand, to take at home, at school or at a secure test center. Using a combination of advanced speech recognition, exam grading technology and the expertise of professional ELT exam markers worldwide, our patented software can measure English language ability.