Pearson’s Assessment business is a critical part of the company’s portfolio. This services business supports our customers by designing, building, administering, and scoring assessments, and by reporting on test-taker performance. Assessments take place in many different contexts (classrooms, workplaces, etc.) and serve different purposes, including supporting classroom instruction through progress monitoring and certifying a person’s fitness for employment in a given occupation.
Just like the students who use our courseware products in the classroom, the people who take our assessments are learners on a journey. Taking a test is not a learning experience in and of itself; rather, instructors and others may use the scores and diagnostic information from assessments to make decisions about a learner’s progress along that journey.
Therefore, the measure of efficacy for an assessment is not whether taking the test leads directly to higher achievement or to passing the course, but whether the scores and other diagnostic information provide an accurate snapshot of what the learner knows and can do. In other words, the efficacy of an assessment is its fitness for a given purpose.
The fitness of an assessment for a given purpose is, in turn, defined by three primary attributes of test scores and their use: validity, reliability, and fairness. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) define these attributes as follows:
- Validity is “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (p. 11). Simply put, the results of an assessment should be an accurate measure of the knowledge or ability the product is meant to assess.
- Reliability is “the consistency of scores across replications of a testing procedure” (p. 33). A test-taker’s scores should stay consistent even if they take the test on different occasions, or if the test is administered and rated by different people.
- Fairness suggests that “scores have the same meaning for all individuals in the intended population” (p. 50). Fairness implies that assessment products should not disadvantage any particular group of test-takers. Test results should vary only according to the different levels of knowledge or ability the product is assessing, not according to the demographics of the test-takers.
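To make the reliability definition above more concrete, the sketch below shows one common way consistency across occasions can be quantified: correlating the scores the same test-takers earn on two sittings of a test (often called a test-retest coefficient). This is an illustration only and is not drawn from the reports themselves. The function name `score_correlation` and the score lists are hypothetical, and the data are invented for the example.

```python
from math import sqrt


def score_correlation(first, second):
    """Product-moment correlation between two equal-length lists of scores.

    Values near 1.0 indicate that test-takers are ranked and spaced
    similarly on both occasions, i.e. scores are consistent.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")

    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)

    # Sum of cross-products and sums of squared deviations from each mean.
    cross = sum((a - mean_first) * (b - mean_second)
                for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)

    if ss_first == 0 or ss_second == 0:
        raise ValueError("scores must vary on both occasions")

    return cross / sqrt(ss_first * ss_second)


# Hypothetical scores for five test-takers on two sittings of the same test.
first_sitting = [52, 61, 70, 75, 88]
second_sitting = [55, 60, 72, 74, 90]

test_retest_r = score_correlation(first_sitting, second_sitting)
print(f"Test-retest correlation: {test_retest_r:.2f}")
```

A single coefficient like this addresses only one kind of replication (repeated occasions). Consistency across raters or across test forms would need its own evidence, and validity and fairness call for different analyses altogether.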
The reports on our assessment products are not externally audited because the auditing framework is organized around learner outcomes and, as discussed above, we typically do not expect our assessment products to have a direct effect on learner outcomes.