Standardized testing. What is it and how does it work?

Standardized assessment is a lens into the classroom. It sheds light on why a child might be struggling, succeeding, or accelerating on specific elements of their grade-level standards. Results from standardized tests help inform the next step in learning for our students. But it isn’t always clear to students, parents and the public how and why the tests are developed. Let’s delve into that.

As it stands, most states are still administering end-of-year tests as required by federal law under No Child Left Behind. For the most part, this means students take annual tests in English Language Arts and Mathematics in grades 3-8 and at least once in high school. Science is tested at least once each in elementary, middle and high school. Additional high school tests are often given after students complete specific courses, like Algebra or Biology, or serve as a gateway to graduation.

Each state plans the specifics of its testing program, deciding, for example, how many questions to put on a test, the dates for testing, and whether tests are given on paper or on computer. But some similarities in how the tests are created cut across the board.

Standardized tests undergo a very rigorous development process, so here’s a bit about the five major steps that go into making a test.

States Adopt Content Standards

This is where it all begins. Everything starts with the content standards developed by a state or a group of states, as seen with the Common Core State Standards. Content standards outline what a student should know and be able to do by the end of each school year. These standards are the foundation for classroom instruction as well as for the assessment.

Given the huge range of knowledge and skills each student is supposed to master by year’s end, the assessment development process includes a determination of what will be assessed on each test for each grade. Because we can’t test everything covered in a year (no one wants the test to be longer than necessary), decisions must be made.

Item Development

Here’s where we get into the nitty-gritty. Experts, most of whom are former or current teachers with experience and knowledge of the subject matter and grade level, create “items” that test the content selected in step two. These items can be multiple-choice questions, essay prompts, tasks, situations, activities, and the like.

Of note, significant time is even spent deciding which WRONG answers to make available for multiple-choice questions. Why’s that? Every item is a chance to identify what our students really know. Incorrect answers can actually tell us a lot about what students misunderstood. For instance, did they add instead of subtract? Multiply instead of divide? Every bit of data helps disentangle what kids really, truly know, which makes the assessment process complex and the final product a very powerful education tool.
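To make the idea of informative wrong answers concrete, here is a minimal sketch in Python. The item, its answer options, the mistake attached to each option, and the student responses are all made up for illustration; they are not taken from any real test.

```python
from collections import Counter

# Hypothetical item: "What is 12 - 5?" Each wrong option (distractor) is
# chosen to match one specific mistake. All values are illustrative.
ANSWER_KEY = "A"
OPTIONS = {
    "A": (7, "correct"),
    "B": (17, "added instead of subtracting"),
    "C": (60, "multiplied instead of subtracting"),
    "D": (8, "counting or arithmetic slip"),
}

def summarize_responses(responses):
    """Return, for each option, the share of students who chose it."""
    counts = Counter(responses)
    total = len(responses)
    return {option: counts.get(option, 0) / total for option in OPTIONS}

# Made-up responses from ten students.
responses = ["A", "B", "A", "A", "B", "C", "A", "B", "A", "D"]
shares = summarize_responses(responses)

for option, (value, meaning) in OPTIONS.items():
    print(f"{option} ({value}): {shares[option]:.0%} - {meaning}")
```

In this invented example, the popularity of option B would suggest that a sizable group of students added when they should have subtracted, which is more useful to a teacher than simply knowing they got the question wrong.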

Once the items are developed, teachers, content experts, higher education faculty, and the testing entity at the state level review them. This diverse group of stakeholders works together to make sure the items are fair, reliable and accurate. Lots of revisions happen at this stage, and many items are thrown out, for any number of reasons, and never see the light of day.

Field Testing or Field Trials

Now, we test the items by giving them to students. Items developed in step three are “field tested” to gauge how each works when students respond to them. Here, and I can’t stress this enough, we’re testing the item itself – not the kids. We want to know that the question itself is worthy of being used to assess skills and knowledge appropriately. Students’ scores on these field-test items are only used to evaluate the items; they are not used to calculate a student’s score for the year.

By doing these trials, we can see whether gender, ethnicity or even English proficiency affects a child’s ability to successfully perform the task at hand. All of this is done to verify that each and every question is fair. Yet again, a range of stakeholders and experts is involved in the process, reviewing the results and making decisions along the way. The reality is this: if an item doesn’t meet expectations, it’s cut.
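The kind of fairness check described above can be sketched in a few lines of Python. This is a simplified illustration, not the method any particular state uses: the records, group names, and the 0.20 gap threshold are assumptions, and operational programs typically rely on more careful statistics that compare students of similar overall performance.

```python
from collections import defaultdict

def proportion_correct_by_group(records):
    """Map (item_id, group) to the share of that group answering correctly."""
    correct = defaultdict(int)
    attempts = defaultdict(int)
    for item_id, group, is_correct in records:
        attempts[(item_id, group)] += 1
        correct[(item_id, group)] += int(is_correct)
    return {key: correct[key] / attempts[key] for key in attempts}

def flag_items(records, max_gap=0.20):
    """Return ids of items whose between-group gap exceeds max_gap."""
    rates = proportion_correct_by_group(records)
    by_item = defaultdict(list)
    for (item_id, _group), rate in rates.items():
        by_item[item_id].append(rate)
    return sorted(
        item_id for item_id, item_rates in by_item.items()
        if max(item_rates) - min(item_rates) > max_gap
    )

# Made-up field-test records: (item id, student group, answered correctly?)
records = [
    ("item_1", "group_x", True), ("item_1", "group_x", True),
    ("item_1", "group_y", True), ("item_1", "group_y", True),
    ("item_2", "group_x", True), ("item_2", "group_x", True),
    ("item_2", "group_y", False), ("item_2", "group_y", True),
]
flagged = flag_items(records)
print(flagged)
```

A flagged item is not automatically unfair; in this sketch the flag only marks it for the human review the article describes.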

Build the Test

Using field-tested and approved items, test developers systematically and thoughtfully assemble the test into its final form. Easy and hard items, tasks, and activities are incorporated. Items that assess varying skills and content areas are added. This variety helps us understand what a child really knows at the end of the assessment. As they say, variety is the spice of life. Same goes for an assessment. A mixture of challenging and easy items enables a range of knowledge and skills to be assessed.

Setting Performance Standards

Finally, states work with teachers and their testing partners to make decisions about how well students must perform to pass, or be proficient. For example, performance can be defined as basic, passing, proficient, or advanced. These “performance standards” provide a frame of reference for interpreting the test scores. They help students, parents, educators, administrators, and policymakers understand how well a student did by using a category rating.
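
As a rough illustration of how performance standards turn a score into a category rating, here is a short Python sketch. The 0-100 scale and the cut scores of 40, 60, and 80 are invented for this example; real cut scores are set by each state through the process described above.

```python
from bisect import bisect_right

# Hypothetical cut scores on a 0-100 scale. A score at or above a cut
# score moves the student into the next level up.
CUT_SCORES = [40, 60, 80]
LEVELS = ["basic", "passing", "proficient", "advanced"]

def performance_level(score):
    """Return the performance level whose score range contains `score`."""
    return LEVELS[bisect_right(CUT_SCORES, score)]

# Made-up scores for four students.
for score in (35, 40, 72, 91):
    print(score, performance_level(score))
```

The lookup itself is trivial. The hard part, as the article notes, is the human judgment that goes into deciding where the cut scores belong.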

After – and only after – this rigorous, multi-step, multi-year process involving a range of stakeholders is complete do the tests enter the classroom.