Computer-based language assessment: The future is here

David Booth

Many people are surprised at the idea of a computer program marking an exam paper. However, computer-based testing already exists in many formats and across many areas, and plenty of the tests and exams that form part of our daily lives are taken on computers. If you’ve ever learned to drive, sat a citizenship test, done a training course at work, or completed a placement test for a language course, the odds are that you’ve already taken an automated test.

Yet despite it being so common, there is still a lack of understanding when it comes to computer-based language assessment and how a computer can evaluate productive skills like speaking and writing.

Computer-based testing: a closer look

A common issue is that people have different ideas of what these tests entail. Computers fulfill several essential roles in the testing process, but these often go unacknowledged. For example, administering an exam requires a large bank of test questions, along with data about each question, and computers are used to store both. When a randomized exam is created, software selects the questions from this bank based on that data.
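
To make that concrete, here is a minimal sketch of how randomized exam assembly might work. It is an illustration only: the item bank, skill labels and question counts are invented for the example and do not describe any particular testing system.

```python
import random

# Illustrative item bank: each entry holds a question ID, the skill it targets,
# and a difficulty rating. Real item banks store far richer metadata.
ITEM_BANK = [
    {"id": "Q001", "skill": "reading", "difficulty": 0.4},
    {"id": "Q002", "skill": "reading", "difficulty": 0.7},
    {"id": "Q003", "skill": "listening", "difficulty": 0.5},
    {"id": "Q004", "skill": "listening", "difficulty": 0.6},
    {"id": "Q005", "skill": "writing", "difficulty": 0.5},
    {"id": "Q006", "skill": "speaking", "difficulty": 0.8},
]

def assemble_exam(item_bank, questions_per_skill, seed=None):
    """Randomly select the requested number of questions for each skill."""
    rng = random.Random(seed)
    exam = []
    for skill, count in questions_per_skill.items():
        candidates = [item for item in item_bank if item["skill"] == skill]
        exam.extend(rng.sample(candidates, count))
    rng.shuffle(exam)  # present the selected items in a random order
    return exam

# Example: build a short form with one question per skill.
form = assemble_exam(ITEM_BANK, {"reading": 1, "listening": 1, "writing": 1, "speaking": 1}, seed=42)
print([item["id"] for item in form])
```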

Computers can make complex calculations far more quickly and accurately than humans. This means that processes that previously took a long time are completed in days, rather than weeks.

Artificial intelligence (AI) technology is now capable of grading exam papers, for example. This means a shorter wait for exam results. In PTE Academic, candidates receive their results in an average of two days rather than waiting weeks for an examiner to mark their paper by hand.

The benefits for students and teachers

People take exams to prove their skills and abilities. Depending on their goals, the right result can open the door to many new opportunities, whether that is simply moving on to the next stage of a course, or something as life-changing as taking up a place on a university course in another country.

A qualification can act as a passport to a better career or an enhanced education, and for that reason, it’s important that both students and teachers can have faith in their results.

A computer program applies the same scoring criteria to every candidate, which means candidates can be confident that they will all be treated the same, regardless of their background, appearance or accent. PTE Academic, just one of Pearson’s computer-based exams, uses innovative integrated test items, in which a single response can contribute to scores in more than one skill.

This integration means that the results give a more accurate picture of the candidate’s abilities and a truer reflection of how they actually use the language.

More than questions on a screen

Computer-based testing is not simply a matter of transferring questions onto a screen. That alone only removes the need for pen and paper, and it misses the opportunity to harness the precision and speed of a computer, as well as its learning potential.

Tests that have been fully digitized, such as PTE Academic, benefit from that automation: it eliminates examiner bias, makes the test fairer and calculates the results more quickly. Automated testing builds on the technological tradition of opening doors for the future – not closing them.

How technology enhances language testing

The development of automated testing technologies doesn’t merely make the examination process quicker and more accurate – it also gives us the chance to innovate. Speaking assessments are an excellent example of this.

Previously, this part of a language exam involved an interview, led by an examiner, who asked questions and elicited answers. But now that we have the technological capability, using a computer offers students the chance to be tested on a much wider range of speaking skills, without worrying about the inherent bias of the examiner.

Indeed, the use of a computer-based system facilitates integrated skills testing. Traditionally, language exams had separate papers focusing on the four skills of reading, listening, speaking and writing. But the more modern concept of language testing aims to assess these linguistic skills used together, just as they are in real-life situations.
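
As a simple illustration of how an integrated item can feed more than one skill score, the sketch below splits a single response’s score across the skills it assesses. The task names echo common integrated task types, but the 50/50 weights are purely an assumption for the example, not a published scoring rule.

```python
# Hypothetical integrated items: each response feeds more than one skill score.
# The task names echo common integrated task types; the 50/50 weights are an
# assumption for illustration only.
INTEGRATED_ITEM_WEIGHTS = {
    "retell_lecture": {"listening": 0.5, "speaking": 0.5},
    "summarize_written_text": {"reading": 0.5, "writing": 0.5},
}

def distribute_score(item_type, raw_score, weights=INTEGRATED_ITEM_WEIGHTS):
    """Split a single item's raw score across the skills it assesses."""
    return {skill: raw_score * share for skill, share in weights[item_type].items()}

print(distribute_score("retell_lecture", raw_score=4.0))
# {'listening': 2.0, 'speaking': 2.0}
```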

Afterwards, the various scores are categorized to give learners an insight into their strengths and weaknesses, which helps both students and teachers identify areas that need improvement. This useful feedback is only possible because of the accuracy and detail of automated exam grading.
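
The sketch below shows one way such feedback could be derived: individual item scores are grouped by skill and the weakest areas are flagged. The score scale, threshold and skill labels are hypothetical.

```python
from collections import defaultdict

def skill_profile(item_scores, weakness_threshold=0.6):
    """Average per-skill scores (on a 0-1 scale) and flag skills below a threshold.

    item_scores: list of (skill, score) pairs produced by automated grading.
    """
    totals = defaultdict(list)
    for skill, score in item_scores:
        totals[skill].append(score)

    profile = {skill: sum(scores) / len(scores) for skill, scores in totals.items()}
    weaknesses = [skill for skill, avg in profile.items() if avg < weakness_threshold]
    return profile, weaknesses

# Example: a learner who is strong in reading but weaker in speaking.
scores = [("reading", 0.9), ("reading", 0.8), ("speaking", 0.4), ("speaking", 0.55)]
profile, weaknesses = skill_profile(scores)
print(profile)      # {'reading': 0.85, 'speaking': 0.475}
print(weaknesses)   # ['speaking']
```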

The space race on paper

Back in the 1960s, during the space race, computers were still a relatively new concept. Katherine Johnson, one of the first African-American women to work for NASA as a scientist, was a mathematician with a reputation for carrying out incredibly complex calculations by hand. Although a computer had produced the orbital calculations for John Glenn’s 1962 flight, the first American orbital mission, Glenn asked for Katherine to verify those calculations herself before he would fly.

This anecdote reminds us that, although computer technology is an inherent part of everyday life, we still need to check now and then that these systems are working as they should. Human error still comes into play – after all, humans program these systems.

PTE Academic – a fully digitized exam

Every stage of PTE Academic, from registration to practice tests to results (both receiving them and sharing them with institutions), happens online. It may come as a surprise, then, that the test itself is not taken remotely: instead, students attend one of over 295 test centers, where the highest levels of data security are maintained.

This means that each student can sit the exam in an environment designed for that purpose. It also allows the receiving institutions, such as universities and colleges, to be assured of the validity of the PTE Academic result.

The future is here

We created computers, but they have surpassed us in many areas – exam grading being a case in point. Computers can score more accurately and consistently than humans, and they don’t get tired late in the day, or become distracted by a candidate’s accent.

The use of AI technology to grade student responses represents a giant leap forward in language testing, leading to fairer and more accurate student results. It also means more consistency in grading, which benefits the institutions, such as universities, that rely on these scores to accurately reflect ability.

And here at Pearson, we are invested in staying at the cutting edge of assessment. Our test developers are incorporating AI solutions now, using its learning capacity to create algorithms and build programs that can assess speaking and writing skills accurately and quickly. We’re expanding the horizons of English language assessment for students, teachers and all the other professionals involved in each stage of the language learning journey.

More blogs from Pearson

  • Dance your way to fluent language learning and enhanced wellbeing

    By Charlotte Guest
    Reading time: 5 minutes

    Language learning can often feel daunting, with its endless vocabulary lists, grammatical structures and pronunciation rules. However, incorporating dance and movement into your study routine can transform this challenge into an engaging, enjoyable experience while significantly benefiting your overall wellbeing. This unusual approach is not only effective for language learners of all ages but also enriches the learning process with fun and physical activity.

    Engaging in movement and dance can substantially impact mental health, as evidenced by various studies and academic research. For instance, a notable study published in the American Journal of Dance Therapy highlighted that dance, particularly in structured environments, can reduce anxiety and improve mood among participants. This connection between dance and mental health improvement can be attributed to the release of endorphins, often referred to as happiness hormones, which occur during physical activity.

  • AI scoring vs human scoring for language tests: What's the difference?

    By Charlotte Guest
    Reading time: 6 minutes

    When entering the world of language proficiency tests, test takers are often faced with a dilemma: Should they opt for tests scored by humans or those assessed by artificial intelligence (AI)? The choice might seem trivial at first, but understanding the differences between AI scoring and human language test scoring can significantly impact preparation strategy and, ultimately, determine test outcomes.

    The human touch in language proficiency testing and scoring

    Historically, language tests have been scored by human assessors. This method leverages the nuanced understanding that humans have of language, including idiomatic expressions, cultural references, and the subtleties of tone and writing style. Human scorers can appreciate the creative and original use of language, potentially rewarding test takers for flair and originality in their answers. They are particularly effective at evaluating progress or achievement tests, which are designed to assess a student’s language knowledge after completing a particular chapter or unit, or at the end of a course, reflecting how well the learner is progressing in their studies.

    One significant difference between human and AI scoring is how they handle context. Human scorers can understand the significance and implications of a particular word or phrase in a given context, while AI algorithms rely on predetermined rules and datasets.

    The adaptability of human scorers also contributes to the effectiveness of scoring in language tests: experienced assessors adjust their judgments as they encounter new and unexpected responses.

    Advantages:

    • Nuanced understanding: Human scorers are adept at interpreting the complexities and nuances of language that AI might miss.
    • Contextual flexibility: Humans can consider context beyond the written or spoken word, understanding cultural and situational implications.

    Disadvantages:

    • Subjectivity and inconsistency: Despite rigorous training, human-based scoring can introduce a level of subjectivity and variability, potentially affecting the fairness and reliability of scores.
    • Time and resource intensive: Human-based scoring is labor-intensive and time-consuming, often resulting in longer waiting times for results.
    • Human bias: Assessors, despite being highly trained and experienced, bring their own perspectives, preferences and preconceptions into the grading process. This can lead to variability in scoring, where two equally competent test takers might receive different scores based on the scorer's subjective judgment.

    The rise of AI in language test scoring

    With advancements in technology, AI-based scoring systems have started to play a significant role in language assessment. These systems use algorithms and natural language processing (NLP) techniques to evaluate test responses. AI scoring promises objectivity and efficiency, offering a standardized way to assess language proficiency.
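
    To give a sense of what "algorithms and NLP techniques" can mean in practice, here is a deliberately simplified sketch that scores a written response on a few surface features (length, lexical variety and keyword coverage). Real scoring engines are trained on large sets of human-rated responses and are far more sophisticated; the features and weights below are purely illustrative.

    ```python
    import re

    def score_written_response(response, prompt_keywords, max_score=5.0):
        """Toy automated scorer: combines length, lexical variety and topic coverage.

        prompt_keywords: words the prompt expects the response to address.
        Returns a score between 0 and max_score.
        """
        tokens = re.findall(r"[a-z']+", response.lower())
        if not tokens:
            return 0.0

        length_feature = min(len(tokens) / 200, 1.0)       # reward responses up to ~200 words
        variety_feature = len(set(tokens)) / len(tokens)   # type-token ratio as a crude lexical measure
        coverage_feature = sum(1 for kw in prompt_keywords if kw in tokens) / len(prompt_keywords)

        # Illustrative weights; a real system learns its model from rated sample responses.
        raw = 0.3 * length_feature + 0.3 * variety_feature + 0.4 * coverage_feature
        return round(raw * max_score, 2)

    print(score_written_response(
        "Cities should invest in public transport because it reduces traffic and pollution.",
        prompt_keywords=["transport", "traffic", "pollution"],
    ))
    ```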

    Advantages:

    • Consistency: AI scoring systems provide a consistent scoring method, applying the same criteria across all test takers, thereby reducing the potential for bias.
    • Speed: AI can process and score tests much faster than human scorers can, leading to quicker results turnaround.
    • Great for more nervous test takers: Not everyone likes having to take a test in front of a person, so AI removes that extra stress.

    Disadvantages:

    • Lack of nuance recognition: AI may not fully understand subtle nuances, creativity, or complex structures in language the way a human scorer can.
    • Dependence on data: The effectiveness of AI scoring is heavily reliant on the data it has been trained on, which can limit its ability to interpret less common responses accurately.

    Making the choice

    When deciding between tests scored by humans or AI, consider the following factors:

    • Your strengths: If you have a creative flair and excel at expressing original thoughts, human-scored tests might appreciate your unique approach more. Conversely, if you excel in structured language use and clear, concise expression, AI-scored tests could work to your advantage.
    • Your goals: Consider why you're taking the test. Some organizations might prefer one scoring method over the other, so it's worth investigating their preferences.
    • Preparation time: If you're on a tight schedule, the quicker turnaround time of AI-scored tests might be beneficial.

    Ultimately, both scoring methods aim to measure and assess language proficiency accurately. The key is understanding how each approach aligns with your personal strengths and goals.

    The bias factor in language testing

    An often-discussed concern in both AI and human language test scoring is the issue of bias. With AI scoring, biases can be ingrained in the algorithms through the data they are trained on, but if the system is well designed, these biases can be identified and reduced, providing fairer scoring.

    Conversely, human scorers, despite their best efforts to remain objective, bring their own subconscious biases to the evaluation process. These biases might be related to a test taker’s accent, dialect, or even the content of their responses, which could subtly influence the scorer’s perceptions and judgments. Efforts are continually made to mitigate these biases in both approaches to ensure a fair and equitable assessment for all test takers.
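
    One simple way such bias monitoring can work, whether the scores come from humans or machines, is to compare average scores across groups of candidates who would be expected to perform similarly. The sketch below is a minimal illustration of that idea; the group labels, scores and threshold are invented, and real fairness audits rely on more rigorous statistical methods.

    ```python
    from statistics import mean

    def group_score_gap(scored_responses, flag_threshold=5.0):
        """Compare mean scores between candidate groups and flag large gaps.

        scored_responses: list of (group_label, score) pairs on a common scale.
        Returns per-group means plus any group pairs whose gap exceeds the threshold.
        """
        by_group = {}
        for group, score in scored_responses:
            by_group.setdefault(group, []).append(score)

        means = {group: mean(scores) for group, scores in by_group.items()}
        flagged = []
        groups = list(means)
        for i, a in enumerate(groups):
            for b in groups[i + 1:]:
                if abs(means[a] - means[b]) > flag_threshold:
                    flagged.append((a, b, round(abs(means[a] - means[b]), 1)))
        return means, flagged

    # Hypothetical audit data: scores for speakers of two first-language backgrounds.
    audit = [("L1-A", 72), ("L1-A", 75), ("L1-A", 70), ("L1-B", 61), ("L1-B", 64), ("L1-B", 63)]
    print(group_score_gap(audit))  # the roughly 9.7-point gap would be flagged for review
    ```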

    Preparing for success in foreign language proficiency tests

    Regardless of the scoring method, thorough preparation remains, of course, crucial. Familiarize yourself with the test format, practice under timed conditions, and seek feedback on your performance, whether from teachers, peers, or through self-assessment tools.

    The distinctions between AI scoring and human scoring in language tests continue to blur, with many exams now incorporating a mix of both to leverage their respective strengths. Understanding and interpreting written language remains essential in preparing for language proficiency tests, especially reading tests. By understanding these differences, test takers can better prepare for their exams, setting themselves up for the best possible outcome.

    Will AI replace human-marked tests?

    The question of whether AI will replace human markers in language tests is complex and multifaceted. On one hand, the efficiency, consistency and scalability of AI scoring systems present a compelling case for their increased use. These systems can process vast numbers of tests in a fraction of the time it takes human markers, providing quick feedback that is invaluable in educational settings. On the other hand, the nuanced understanding, contextual knowledge, flexibility, and ability to appreciate the subtleties of language that human markers bring to the table are qualities that AI has yet to fully replicate.

    Both AI and human-based scoring aim to assess language proficiency accurately against recognized scales, such as the Common European Framework of Reference for Languages (CEFR) or the Global Scale of English (GSE), where a level such as C2 (roughly GSE 85-90) indicates that a learner can understand with ease virtually everything they hear or read and express themselves fluently, precisely and spontaneously.
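
    As a rough illustration of how a numeric scale can map onto CEFR bands, the sketch below converts a GSE score into a CEFR level. Only the C2 band (85-90) comes from the text above; the other boundaries are approximate assumptions made for demonstration, and Pearson’s published GSE-CEFR alignment should be treated as the authoritative reference.

    ```python
    # Approximate, illustrative GSE-to-CEFR bands. Only the C2 band (85-90) comes
    # from the article text; the other boundaries are assumptions for demonstration.
    GSE_TO_CEFR = [
        (22, 29, "A1"),
        (30, 42, "A2"),
        (43, 58, "B1"),
        (59, 75, "B2"),
        (76, 84, "C1"),
        (85, 90, "C2"),
    ]

    def cefr_band(gse_score):
        """Return the CEFR band for a GSE score, or None if outside the mapped range."""
        for low, high, band in GSE_TO_CEFR:
            if low <= gse_score <= high:
                return band
        return None

    print(cefr_band(87))  # "C2"
    print(cefr_band(65))  # "B2" under these illustrative bands
    ```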

    The integration of AI in language testing is less about replacement and more about complementing and enhancing the existing processes. AI can handle the objective, clear-cut aspects of language testing, freeing markers to focus on the more subjective, nuanced responses that require a human touch. This hybrid approach could lead to a more robust, efficient and fair assessment system, leveraging the strengths of both humans and AI.
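
    In practice, such a hybrid workflow can be as simple as routing by confidence: responses the automated scorer handles with high confidence are finalized automatically, while low-confidence or unusual responses go to a human marker. The sketch below illustrates the idea; the confidence values and threshold are hypothetical.

    ```python
    def route_response(response_id, ai_score, ai_confidence, confidence_threshold=0.85):
        """Decide whether an automated score can stand or needs human review.

        ai_confidence: the scoring engine's own estimate (0-1) of how reliable
        its score is for this response; low values trigger human marking.
        """
        if ai_confidence >= confidence_threshold:
            return {"id": response_id, "score": ai_score, "route": "auto-finalized"}
        return {"id": response_id, "score": None, "route": "human-review", "provisional": ai_score}

    # Example: one clear-cut response and one unusual response needing a human marker.
    print(route_response("resp-001", ai_score=78, ai_confidence=0.93))
    print(route_response("resp-002", ai_score=55, ai_confidence=0.61))
    ```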

    Future developments in AI technology and machine learning may narrow the gap between AI and human grading capabilities. However, the ethical considerations, such as ensuring fairness and addressing bias, along with the desire to maintain a human element in education, suggest that a balanced approach will persist. In conclusion, while AI will increasingly play a significant role in language testing, it is unlikely to completely replace markers. Instead, the future lies in finding the optimal synergy between technological advancements and human judgment to enhance the fairness, accuracy and efficiency of language proficiency assessments.

    Tests to let your language skills shine through

    Explore Pearson's innovative language testing solutions today and discover how we are blending the best of AI technology and our own expertise to offer you reliable, fair and efficient language proficiency assessments. We are committed to offering reliable and credible proficiency tests, ensuring that our certifications are recognized for job applications, university admissions, citizenship applications, and by employers worldwide. Whether you're gearing up for academic, professional, or personal success, our tests are designed to meet your diverse needs and help unlock your full potential.

    Take the next step in your language learning journey with Pearson and experience the difference that a meticulously crafted test can make.

  • Using language learning as a form of self-care for wellbeing

    By Charlotte Guest
    Reading time: 6.5 minutes

    In today’s fast-paced world, finding time for self-care is more important than ever. Among a range of traditional self-care practices, learning a language emerges as an unexpected but incredibly rewarding approach. Learning a foreign language is a key aspect of personal development that can support your mental health, while also offering benefits like improved career opportunities, enhanced creativity, and the ability to connect with people from diverse cultures.