**Chapter 1: Introduction to Educational Assessment**

I. Introduction

II. The Language of Assessment

A. Tests, Measurement, & Assessment

B. Types of Tests

C. Types of Score Interpretations

III. Assumptions of Educational Assessment

A. Psychological and educational constructs exist.

B. Psychological and educational constructs can be measured.

C. While we can measure constructs, our measurement is not perfect.

D. There are different ways to measure any given construct.

E. All assessment procedures have strengths and limitations.

F. Multiple sources of information should be part of the assessment process.

G. Performance on tests can be generalized to non-test behaviors.

H. Assessment can provide information that helps educators make better educational decisions.

I. Assessments can be conducted in a fair manner.

J. Testing and assessment can benefit our educational institutions and society as a whole.

IV. Participants in the Assessment Process

A. People who develop tests.

B. People who use tests.

C. People who take tests.

D. Other people involved in the assessment process.

V. Common Applications of Educational Assessments

A. Student Evaluation

B. Instructional Decisions

C. Selection, Placement, and Classification Decisions

D. Policy Decisions

E. Counseling and Guidance Decisions

VI. What Teachers Need to Know About Assessment

A. Teachers should be proficient in selecting professionally developed assessment procedures that are appropriate for making instructional decisions.

B. Teachers should be proficient in developing assessment procedures that are appropriate for making instructional decisions.

C. Teachers should be proficient in administering, scoring, and interpreting professionally developed and teacher-made assessment procedures.

D. Teachers should be proficient in using assessment results when making educational decisions.

E. Teachers should be proficient in developing valid grading procedures that incorporate assessment information.

F. Teachers should be proficient in communicating assessment results.

G. Teachers should be proficient in recognizing unethical, illegal, and other inappropriate uses of assessment procedures or information.

VII. Educational Assessment in the 21st Century

A. Computerized Adaptive Testing (CAT) and Other Technological Advances

B. Authentic Assessments

C. Educational Accountability and High-Stakes Assessment

D. Trends in the Assessment of Students with Disabilities

VIII. Summary

Tables

A. Table 1.1: Major Categories of Tests

B. Table 1.2: Norm- and Criterion-Referenced Scores

C. Table 1.3: Assumptions of Educational Assessment

D. Table 1.4: Common Applications of Educational Assessments

E. Table 1.5: Teacher Competencies in Educational Assessment

Special Interest Topics

A. Special Interest Topic 1.1: Cognitive Diagnostic Assessment — Another Step Toward Unifying Assessment and Instruction

B. Special Interest Topic 1.2: Technology and Assessment in the Schools

C. Special Interest Topic 1.3: Princeton Review's Rankings of High-Stakes Testing Programs

D. Special Interest Topic 1.4: The “Nation's Report Card”

E. Special Interest Topic 1.5: What Does the 21st Century Hold for the Assessment Profession?

**Chapter 2: The Basic Mathematics of Measurement**

I. The Role of Mathematics in Assessment

II. Scales of Measurement

A. What is Measurement?

B. Nominal Scales

C. Ordinal Scales

D. Interval Scales

E. Ratio Scales

III. The Description of Test Scores

A. Distributions

B. Measures of Central Tendency

C. Measures of Variability

IV. Correlation Coefficients

A. Scatterplots

B. Correlation and Prediction

C. Types of Correlation Coefficients

D. Correlation and Causality

V. Summary

Tables

Table 2.1: Common Nominal, Ordinal, Interval, & Ratio Scales

Table 2.2: Distribution of Scores for 20 Students

Table 2.3: Ungrouped Frequency Distribution

Table 2.4: Grouped Frequency Distribution

Table 2.5: Calculating the Standard Deviation and Variance

Table 2.6: Calculating a Pearson Correlation Coefficient

Figures

Figure 2.1: Graph of the Homework Scores

Figure 2.2: Hypothetical Distribution of Large Standardization Sample

Figure 2.3: Negatively Skewed Distribution

Figure 2.4: Positively Skewed Distribution

Figure 2.5: Bimodal Distribution

Figure 2.6: Relationship between Mean, Median, and Mode in Normal and Skewed Distributions

Figure 2.7: Three Distributions with Different Degrees of Variability

Figure 2.8: Scatterplots of Different Correlation Coefficients

Special Interest Topics

Special Interest Topic 2.1: Population Parameters and Sample Statistics

Special Interest Topic 2.2: A Public Outrage: Physicians Overcharge Their Patients

Special Interest Topic 2.3: Is the Variance Always Larger Than the Standard Deviation?

Special Interest Topic 2.4: Caution: Drawing Conclusions of Causality

**Chapter 3: The Meaning of Test Scores**

I. Introduction

II. Norm-Referenced & Criterion-Referenced Score Interpretations

A. Norm-Referenced Interpretations

B. Criterion-Referenced Interpretations

III. Norm-Referenced, Criterion-Referenced, or Both?

IV. Qualitative Description of Scores

V. Summary

Tables

Table 3.1: Transforming Raw Scores to Standard Scores

Table 3.2: Relationship of Different Standard Score Formats

Table 3.3: Converting Standard Scores From One Format to Another

Table 3.4: Characteristics of Norm-Referenced and Criterion-Referenced Scores

Figures

Figure 3.1: Illustration of the Normal Distribution

Figure 3.2: Normal Distribution with Mean, Standard Deviation, & Percentages

Figure 3.3: Normal Distribution Illustrating the Relationship among Standard Scores

Special Interest Topics

Special Interest Topic 3.1: The “Flynn Effect”

Special Interest Topic 3.2: Whence the Normal Curve?

Special Interest Topic 3.3: Why Do IQ Tests Use a Mean of 100 and a Standard Deviation of 15?

Special Interest Topic 3.4: The History of Stanine Scores

Special Interest Topic 3.5: Every Child on Grade Level?

**Chapter 4: Reliability for Teachers**

I. Introduction

II. Errors of Measurement

A. Sources of Measurement Error

III. Methods of Estimating Reliability

A. Test-Retest Reliability

B. Alternate Form Reliability

C. Internal Consistency Reliability

D. Inter-Rater Reliability

E. Reliability of Composite Scores

F. Selecting a Reliability Coefficient

G. Evaluating Reliability Coefficients

H. How to Improve Reliability

I. Special Problems in Estimating Reliability

IV. The Standard Error of Measurement

A. Evaluating the Standard Error of Measurement

V. Reliability: Practical Strategies for Teachers

VI. Summary

Tables

Table 4.1: Major Types of Reliability

Table 4.2: Half-Test Coefficients and Corresponding Full-Test Coefficients Corrected with the Spearman-Brown Formula

Table 4.3: Calculation of KR 20

Table 4.4: Calculation of Coefficient Alpha

Table 4.5: Calculating Inter-Rater Agreement

Table 4.6: Source of Error Variance Associated with Major Types of Reliability

Table 4.7: Reliability Expected When Increasing the Numbers of Items

Table 4.8: Standard Errors of Measurement for Values of Reliability and Standard Deviations

Table 4.9: Reliability Estimates for Tests with a Mean of 80%

Figures

Figure 4.1: Partitioning the Variance

Special Interest Topics

Special Interest Topic 4.1: Generalizability Theory

Special Interest Topic 4.2: Consistency of Classification with Mastery Tests

Special Interest Topic 4.3: A Quick Way To Estimate Reliability for Classroom Exams

**Chapter 5: Validity for Teachers**

I. Introduction

A. Threats to Validity

B. Reliability & Validity

II. "Types of Validity" versus "Types of Validity Evidence"

III. Types of Validity Evidence

A. Evidence Based on Test Content

B. Evidence Based on Relations to Other Variables

C. Evidence Based on Internal Structure

D. Evidence Based on Response Processes

E. Evidence Based on Consequences of Testing

F. Integrating Evidence of Validity

IV. Validity: Practical Strategies for Teachers

V. Chapter Summary

Tables

Table 5.1: Tracing Historical Trends in the Concept of Validity

Table 5.2: Sources of Validity Evidence

Figures

Figure 5.1: Illustration of Item Relevance

Figure 5.2: Illustration of Content Coverage

Figure 5.3: Predictive and Concurrent Studies

Figure 5.4: Graph of a Regression Line

Special Interest Topics

Special Interest Topic 5.1: Regression, Prediction, and Your First Algebra Class

**Chapter 6: Item Analysis for Teachers**

I. Introduction

II. Item Difficulty Index (or Item Difficulty Level)

A. Special Assessment Situations and Item Difficulty

III. Item Discrimination

A. Item Discrimination on Mastery Tests

B. Difficulty and Discrimination on Speed Tests

IV. Distracter Analysis

A. How Distracters Influence Item Difficulty and Discrimination

V. Item Analysis: Practical Strategies for Teachers

VI. Using Item Analysis to Improve Items

VII. Item Analysis and Performance Assessments

VIII. Qualitative Item Analysis

IX. Using Item Analysis to Improve Classroom Instruction

X. Summary

Tables

Table 6.1: Optimal *p* Values for Items with Varying Numbers of Choices

Table 6.2: Guidelines for Evaluating D Values

Table 6.3: Maximum D Values at Different Difficulty Levels

Table 6.4: Two Examples of Test Scoring and Item Analysis Programs

Special Interest Topics

Special Interest Topic 6.1: Item Difficulty Indexes and Power Tests

Special Interest Topic 6.2: Item Analysis for Constructed Response Items

Special Interest Topic 6.3: Developing a Test Bank

**Chapter 7: The Initial Steps in Developing a Classroom Test: Deciding What to Test and How to Test It**

I. Introduction

II. Characteristics of Educational Objectives

III. Taxonomy of Educational Objectives

A. Cognitive Domain

B. Affective Domain

C. Psychomotor Domain

IV. Behavioral versus Nonbehavioral Educational Objectives

V. Writing Educational Objectives

VI. Developing a Table of Specifications

VII. Implementing the Table of Specifications and Developing an Assessment

A. Norm-Referenced versus Criterion-Referenced Assessment

B. Selecting Which Types of Items to Use

C. Putting the Assessment Together

VIII. Preparing Your Students and Administering the Assessment

IX. Summary

Tables

Table 7.1: Bloom's Taxonomy of Educational Objectives

Table 7.2: Krathwohl's Taxonomy of Affective Objectives

Table 7.3: Harrow's Taxonomy of Psychomotor Objectives

Table 7.4: Learning Objectives for Chapter 2: The Basic Math of Measurement

Table 7.5: Table of Specifications for Test on Chapter 2: Based on Content Areas

Table 7.6: Table of Specifications for Test on Chapter 2: Content Areas with Percentages

Table 7.7: Strengths and Weaknesses of Selected-Response Items

Table 7.8: Strengths and Weaknesses of Constructed-Response Items

Table 7.9: Practical Suggestions for Assembling an Assessment

Special Interest Topics

A. Special Interest Topic 7.1: Suggestions for Reducing Test Anxiety

B. Special Interest Topic 7.2: Strategies for Preventing Cheating

**Chapter 8: The Development and Use of Selected-Response Items**

I. Introduction

II. Multiple-choice Items

A. Guidelines for Developing Multiple-choice Items

B. Strengths and Weaknesses of Multiple-choice Items

III. True-False Items

A. Guidelines for Developing True-False Items

B. Strengths and Weaknesses of True-False Items

IV. Matching Items

A. Guidelines for Developing Matching Items

B. Strengths and Weaknesses of Matching Items

V. Summary

Tables

Table 8.1: Checklist for the Development of Multiple-choice Items

Table 8.2: Strengths and Weaknesses of Multiple-choice Items

Table 8.3: Checklist for the Development of True-False Items

Table 8.4: Strengths and Weaknesses of True-False Items

Table 8.5: Checklist for the Development of Matching Items

Table 8.6: Strengths and Weaknesses of Matching Items

Special Interest Topics

Special Interest Topic 8.1: Do Multiple-choice Items Penalize Creative Students?

Special Interest Topic 8.2: Correction for Guessing

Special Interest Topic 8.3: What Research Says About "Changing Your Answer"

**Chapter 9: The Development and Use of Constructed-Response Items**

I. Introduction

II. Oral Testing: The Oral Essay as a Precursor of Constructed-Response Items

III. Essay Items

A. Purposes of Essay Items

B. Essay Items at Different Levels of Complexity

C. Restricted-Response versus Extended-Response Essays

D. Guidelines for Developing Essay Items

E. Strengths and Weaknesses of Essay Items

F. Guidelines for Scoring Essay Items

IV. Short-Answer Items

A. Guidelines for Developing Short-Answer Items

B. Strengths and Weaknesses of Short-Answer Items

V. A Final Note: Constructed-Response versus Selected-Response Items

VI. Summary

Tables

Table 9.1: Purposes of Essay Testing

Table 9.2: Guidelines for the Development of Essay Items

Table 9.3: Strengths and Weaknesses of Essay Items

Table 9.4: Holistic Scoring Rubric

Table 9.5: Analytic Scoring Rubric

Table 9.6: Guidelines for Scoring Essay Items

Table 9.7: Guidelines for the Development of Short-Answer Items

Table 9.8: Strengths and Weaknesses of Short-Answer Items

Special Interest Topics

Special Interest Topic 9.1: Computer Scoring of Essay Items

**Chapter 10: Performance Assessments & Portfolios**

I. Introduction - What Are Performance Assessments?

II. Guidelines for Developing Effective Performance Assessments

A. Selecting Appropriate Performance Tasks

B. Developing Instructions

C. Developing Procedures for Scoring Responses

D. Implementing Procedures to Minimize Errors in Rating

III. Strengths & Weaknesses of Performance Assessments

IV. Portfolios

V. Guidelines for Developing Portfolio Assessments

VI. Strengths & Weaknesses of Portfolios

VII. Summary

Tables

Table 10.1: Guidelines for Selecting Performance Tasks

Table 10.2: Guidelines for Developing Instructions for Performance Assessments

Table 10.3: Example of a Rating Scale using Verbal Descriptions

Table 10.4: Example of a Numerical Rating Scale

Table 10.5: Example of a Graphic Rating Scale

Table 10.6: Example of a Descriptive Graphic Rating Scale

Table 10.7: Example of a Checklist Used with Preschool Children

Table 10.8: Guidelines for Developing and Implementing Scoring Procedures

Table 10.9: Strengths & Weaknesses of Performance Assessments

Table 10.10: Guidelines for Developing Portfolio Assessments

Table 10.11: Strengths and Weaknesses of Portfolio Assessments

Special Interest Topics

Special Interest Topic 10.1: Example of a Performance Assessment in Mathematics

Special Interest Topic 10.2: Reliability Issues in Performance Assessments

Special Interest Topic 10.3: Performance Assessments in High-Stakes Testing

**Chapter 11: Assigning Grades on the Basis of Classroom Assessments**

