Computer-Adaptive Assessment – is it the way forward?
Angela Hopkins, Head of Assessment Services at NFER, looks at the rise of Computer-Adaptive Assessment.
Over time, we have seen a number of proposals for reviews and alternative visions for assessment and accountability arrangements. In 2021, the Education and Skills think tank (EDSK) published its report, Making Progress: The future of assessment. A key recommendation was that England should move to online adaptive tests to track the performance and progress of primary-aged pupils.
The defining feature of a computer-adaptive test (CAT) is that it selects questions for the test taker as they go along, based on how well they have performed on the questions answered so far. By adapting in this way, the test can present each test taker with a set of questions targeted to their ability.
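As a rough illustration of that selection loop (a minimal sketch only — the item bank, step size and scoring rule here are hypothetical, not those of any real national test), the core idea can be expressed in a few lines:

```python
# Illustrative CAT loop: pick the question nearest the current ability
# estimate, then nudge the estimate up or down based on the response.
# All parameters (step size, halving rule) are made up for this sketch.

def run_cat(item_bank, answer_fn, n_items=5, start_ability=0.0, step=1.0):
    """Run a toy adaptive test.

    item_bank: list of dicts with "id" and "difficulty" keys.
    answer_fn: callable taking an item and returning True if answered
    correctly (in a real test, this is the pupil's response).
    """
    ability = start_ability
    remaining = list(item_bank)
    asked = []
    for _ in range(n_items):
        # Select the unused item whose difficulty is closest to the
        # current ability estimate.
        item = min(remaining, key=lambda it: abs(it["difficulty"] - ability))
        remaining.remove(item)
        correct = answer_fn(item)
        # Move the estimate towards the pupil's level, with a shrinking
        # step so the estimate settles down as evidence accumulates.
        ability += step if correct else -step
        step *= 0.5
        asked.append((item["id"], correct))
    return ability, asked
```

For example, simulating a pupil who can answer anything up to difficulty 1.0, the loop quickly homes in on questions around that level and returns an estimate between 1 and 2. Real systems use far more sophisticated statistical models, but the adapt-as-you-go principle is the same.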
The concept of CAT has been around for some time. Indeed, tiering, as used in many GCSE exams when they were introduced in the late 1980s, is a fairly basic form of adaptive testing. However, an algorithm-driven CAT enables a more sophisticated use of adaptive testing and the possibility of use at a national level. So, what are the benefits and challenges?
CATs have the potential to offer a more user-friendly, proportionate and reliable test, not only for primary assessment but secondary too.
One of the most important benefits is the pupil’s experience. Rather than taking a test in which they potentially cannot answer many of the questions or, conversely, find many of them too easy, the pupil is directed through a series of mainly targeted questions. This helps reduce any sense of inadequacy or boredom.
The test length can also be reduced, as the use of targeted questions will enable stakeholders to make inferences about the pupil’s knowledge and understanding of the curriculum, based on a smaller number of questions than is needed for a linear test.
Similarly, a test comprising questions which are increasingly targeted at the pupil’s ability level will provide a greater degree of precision than a test designed for pupils of all abilities.
Other benefits include automated scoring and immediacy of results, so avoiding the need for a post-test marking arrangement and checks on marker accuracy. With their higher levels of precision, CATs can also support a system where the focus is more on tracking learners’ progress over an educational phase with potentially more frequent but shorter assessments. A move to more targeted assessments also enables a move to testing when ready, rather than testing everyone at the same time. This does have implications, however, for reporting outcomes as explained later.
For a country which has grown very used to the notion of linear assessments with raw scores, CATs would require a shift to understanding outcomes on a scale, determined by an algorithm.
For a start, pupils taking the assessment would not get a raw or total score as a result. Their performance would be described as a position on an ability scale, although it could be translated to a grade or accompanied by a proficiency description.
It would be possible to introduce a revised national accountability system based on CATs, but the ‘results’ and outputs would be more complicated. For many stakeholders, including parents and pupils, it could feel less transparent than the current arrangement and lead to frustration and disengagement. Any move to an adaptive system would therefore benefit from an effective communications campaign to support the transition. In addition, no accountability system is immune to having an unintended influence on schools’ behaviours. For example, if testing windows were to become more flexible, it could create a tension between timing the test to best meet the needs of the school (in relation to published outputs) and timing it to meet individual pupil needs.
CATs could potentially make it harder for teachers to use test and question-level data to inform learning at a whole class level, because pupils will have tackled different questions. Careful consideration of how outcomes from CATs are reported is, therefore, important to ensure the information can support teaching.
A CAT could also potentially limit the learner’s opportunity to show their true ability. For example, if they were to enter uncharacteristic responses to earlier questions, they may not be presented with more demanding or more accessible questions later on. Similarly, some learners may not perform in a linear fashion. For example, they may be able to access some fairly challenging concepts in maths, yet struggle with more basic questions.
A CAT needs to comprise questions which can be scored in real time using automated scoring. This could restrict the types of questions that can be presented to learners and would require a significant change to many assessments currently in use in England, including national curriculum tests. While it is possible to test some aspects of higher order skills and understanding through selected response questions, it could limit the scope of what is assessed and, as a result, tests could provide less valid evidence of attainment and progress.
Behind the scenes, any CAT system needs a very large bank of questions to function effectively. It may seem like a straightforward task but the reality of developing large numbers of high-quality questions with fine gradations of challenge is a demanding one. It could mean the set-up costs of the assessment are very high.
It goes without saying that, for a CAT to work effectively, it must be rigorously piloted to ensure the assessment will function as intended and produce reliable data about each question and each test taker. In addition, regardless of the device used, and when or where the test is taken, the learner experience should be the same.
Such a significant change to the assessment system, and the particular challenges of a move to CAT, should not be underestimated. However, the many benefits, especially in relation to improving pupils’ experience, could make the process worthwhile.