Hypothesis Testing, Validity, and Experimental Controls in Psychological Research
Study Guide - Smart Notes
Chapter 8: Hypothesis Testing, Validity, and Threats to Validity
Hypothesis Testing in Psychological Research
Hypothesis testing is a fundamental process in psychological research, used to determine whether observed effects are likely due to the manipulation of variables or to chance. The research hypothesis is evaluated through a series of logical steps involving the null hypothesis, confounding variables, and causal inference.
Null Hypothesis (H0): The default assumption that there is no effect or relationship between variables.
Causal Hypothesis: Suggests that changes in the independent variable (IV) cause changes in the dependent variable (DV).
Confounding Variables: Factors other than the IV that may affect the DV, potentially invalidating causal conclusions.
Example: A causal hypothesis is supported only when a difference between groups is observed and confounding variables have been controlled; if no difference is found, or if confounds were left uncontrolled, the causal hypothesis is not supported.
Types of Validity in Assessment Procedures
Validity refers to the degree to which a test or procedure measures what it claims to measure. Several types of validity are important in psychological research:
Statistical Validity: The appropriateness of statistical conclusions drawn from data. Threats include unreliable measures and violations of statistical test assumptions (e.g., applying a test designed for continuous data to nominal data, or assuming a normal distribution when the data are non-normal).
Face Validity: The extent to which a measure appears, on the surface, to assess what it is supposed to. Example: Phrenology had face validity but lacked scientific validity.
Content Validity: The degree to which test items represent the entire domain of interest. Ensures representativeness and relevance.
Criterion Validity: The correlation between a test and an external criterion measured either later (predictive validity) or at the same time (concurrent validity).
Construct Validity: The extent to which a test measures the theoretical construct it is intended to measure, supported by consistency with existing literature.
Example: SAT scores predicting college performance demonstrate predictive criterion validity.
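To make the predictive case concrete, here is a minimal Python sketch that estimates a criterion validity coefficient as a Pearson correlation; the SAT and GPA values are invented purely for illustration.

```python
# Hypothetical illustration of predictive criterion validity:
# correlating an admissions test score with a later outcome (college GPA).
from scipy import stats

sat_scores = [1050, 1190, 1280, 1330, 1410, 1480, 1520, 1580]   # predictor (test)
college_gpa = [2.6, 2.9, 3.1, 3.0, 3.4, 3.5, 3.7, 3.8]          # external criterion

r, p_value = stats.pearsonr(sat_scores, college_gpa)
print(f"Criterion (predictive) validity coefficient: r = {r:.2f}, p = {p_value:.4f}")
```

A strong positive correlation between the test and the later criterion is the evidence claimed when a measure is said to have predictive criterion validity.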
Relationship Between Reliability and Validity
Reliability and validity are related but distinct concepts in measurement:
Reliability: Consistency of measurement.
Validity: Accuracy of measurement.
A measure can be reliable without being valid, but not valid without being reliable.
Example: Hitting the same spot on a target (reliable but not valid); hitting the center (reliable and valid).
Additional info: Sometimes, increasing validity may decrease interrater reliability if subjective judgment is required.
Internal and External Validity
Internal validity refers to the degree to which observed changes in the DV are due to manipulation of the IV, not confounds. External validity concerns the generalizability of findings beyond the study context.
Internal Validity: Threatened by confounds such as biased assignment, maturation, history, testing effects, instrumentation changes, regression to the mean, diffusion of treatment, sequence effects, demand characteristics, and placebo effects.
External Validity: Threatened when results cannot be generalized to other populations, settings, or times.
Example: In a study on violent video games, if participants are not randomly assigned, internal validity is compromised.
Common Threats to Internal Validity
Selection Threat: Biased assignment of participants to groups.
Maturation: Changes in participants over time unrelated to the experiment.
History: Events occurring during the study that affect outcomes.
Testing: Practice or fatigue effects from repeated testing.
Instrumentation: Changes in measurement tools or procedures.
Regression to the Mean: Extreme scores tend to move toward the average on retesting.
Diffusion of Treatment: Treatment effects spread to control group (e.g., "John Henry effect").
Sequence Effects: Influence of prior conditions on current performance.
Demand Characteristics: Participants act according to perceived expectations.
Placebo Effect: Improvement due to expectations rather than the treatment itself.
Example: In a drug study, a placebo group helps control for the placebo effect, and a one-way ANOVA can be used to compare groups.
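A minimal sketch of that comparison, assuming three invented sets of scores for drug, placebo, and no-treatment groups and using SciPy's one-way ANOVA:

```python
# One-way ANOVA comparing a drug group, a placebo group, and a no-treatment control.
# Scores are invented for illustration only.
from scipy import stats

drug    = [14, 16, 15, 18, 17, 19]
placebo = [12, 13, 14, 12, 15, 13]
control = [11, 12, 10, 13, 12, 11]

f_stat, p_value = stats.f_oneway(drug, placebo, control)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant F indicates at least one group mean differs; including the placebo
# group lets the analysis separate expectancy effects from the drug's actual effect.
```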
Chapter 9: Controls to Reduce Threats to Validity
Blinding and Deception in Experimental Design
Blinding and deception are used to minimize bias and expectancy effects in research.
Single Blind: Participants do not know their group assignment.
Double Blind: Both participants and experimenters are unaware of group assignments.
Triple Blind: Outcome assessors or data analysts are also blinded.
Deception: Providing a false rationale for the study to prevent expectancy effects.
Automation: Reducing direct contact between experimenter and participant to minimize bias.
Example: Double-blind drug trials are standard in clinical research, but harder to implement in behavioral studies.
Controlling Experimenter and Participant Effects
Experimenter Attribute Effects: Differences in experimenter behavior or personality can influence outcomes.
Control Through Selection and Assignment: Careful selection and assignment of participants helps ensure representativeness and reduces bias.
Sampling Methods
Proper sampling ensures that study results are representative of the population of interest; the code sketch after this list illustrates the basic procedures.
Random Sampling: Every member of the population has an equal chance of selection.
Stratified Random Sampling: Population divided into subgroups (strata), and samples drawn from each.
Systematic Sampling: Selecting every nth individual from a list after a random start.
Random Assignment: Randomly assigning participants to experimental conditions.
Matched Random Assignment: Participants matched on relevant variables before random assignment.
Yoked Design: Matching participants so that one’s experience determines the other’s (often used in animal studies).
Example: Stratified sampling ensures age groups are proportionally represented in a political survey.
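The sketch below illustrates simple random, systematic, and stratified sampling plus random assignment using Python's standard library; the population, strata labels, and sample sizes are hypothetical.

```python
import random

population = [f"person_{i}" for i in range(1000)]   # hypothetical sampling frame

# Simple random sampling: every member has an equal chance of selection.
simple_sample = random.sample(population, 50)

# Systematic sampling: every nth person after a random start.
n = 20
start = random.randrange(n)
systematic_sample = population[start::n]

# Stratified random sampling: draw from each stratum in proportion to its size.
strata = {"young": population[:600], "old": population[600:]}
fraction = 0.05
stratified_sample = {
    name: random.sample(group, int(len(group) * fraction))
    for name, group in strata.items()
}

# Random assignment: shuffle the selected sample, then split into conditions.
participants = simple_sample[:]
random.shuffle(participants)
experimental, control = participants[:25], participants[25:]
```

Matched random assignment and yoked designs extend the same idea by constraining which participants can end up in which condition before or during the random split.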
Chapter 10: Controls of Variance Through Research Designs
Types of Variance in Experimental Research
Variance in research refers to the variability in scores among participants. Understanding and controlling variance is crucial for valid conclusions.
Systematic (Between-Group) Variance: Variance due to the effects of the IV, confounding variables, or sampling error.
Non-Systematic (Within-Group) Variance: Variance due to random error within groups.
F-Test: Used to compare between-group and within-group variance. The F-ratio is calculated as: F = between-group variance / within-group variance = (systematic effects + error variance) / error variance.
If the null hypothesis is true, the between-group variance contains only error variance, so F is expected to be close to 1 (no systematic effects).
Example: In an ANOVA, a high F-ratio suggests significant group differences.
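For a worked illustration, the following sketch computes the F-ratio by hand from between-group and within-group sums of squares, using invented scores for three groups; the degrees of freedom used here are defined in the next section.

```python
# Hand computation of the F-ratio: MS_between / MS_within.
# Three groups of invented scores, as in a multilevel between-subjects design.
groups = [[14, 16, 15, 18], [12, 13, 14, 12], [11, 12, 10, 13]]

scores = [x for g in groups for x in g]
grand_mean = sum(scores) / len(scores)

# Between-group (systematic) variance: group means around the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1                      # number of groups minus one

# Within-group (error) variance: scores around their own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
df_within = len(scores) - len(groups)             # total N minus number of groups

f_ratio = (ss_between / df_between) / (ss_within / df_within)
# Under the null hypothesis F is expected to be near 1; a large F suggests
# systematic group differences beyond random error.
print(f"df = ({df_between}, {df_within}), F = {f_ratio:.2f}")
```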
Designing Studies to Control Variance
Maximize experimental variance (effect of IV).
Control extraneous variance (confounds).
Minimize error variance (random noise).
Degrees of Freedom (df):
Between-groups: Number of groups minus one.
Within-groups: Total sample size minus number of groups.
Manipulation Check: Procedure to confirm that the IV was successfully manipulated (e.g., anger induction worked for males but not females).
Controlling for Confounding Variables
Keep the variable constant across groups.
Randomize the variable across groups.
Statistically control for the variable (e.g., ANCOVA; see the sketch after this list).
Build the variable into the research design.
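As an example of the statistical-control option, here is a brief ANCOVA-style sketch (a regression of the DV on the IV plus a covariate) using statsmodels; the group labels, covariate, and outcome values are invented for illustration.

```python
# Statistical control of a confounding variable (ANCOVA-style analysis).
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "group":   ["treatment"] * 5 + ["control"] * 5,
    "age":     [21, 25, 30, 35, 40, 22, 26, 31, 34, 41],   # potential confound
    "outcome": [14, 15, 17, 18, 20, 11, 12, 14, 15, 17],
})

# Regressing the DV on the IV while including the covariate (age) statistically
# removes the covariate's contribution before evaluating the group difference.
model = smf.ols("outcome ~ C(group) + age", data=data).fit()
print(model.summary())
```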
Research Designs in Psychology
Research designs vary in their ability to control for confounds and establish causality.
Non-Experimental Approaches: Lack random assignment and/or manipulation of IV.
Ex-Post Facto Study: Relates current participant status to past events.
Single-Group Posttest-Only Study: Measures outcome after treatment in one group.
Pretest-Posttest Natural Control Group Study: Compares pre-existing groups before and after treatment (no random assignment).
Experimental Designs: Feature random assignment and manipulation of IV.
Randomized Posttest-Only Control Group Design: Participants randomly assigned; only posttest measured.
Randomized Pretest-Posttest Control Group Design: Adds pretest measurement.
Multilevel Completely Randomized Between-Subjects Design: More than two groups compared.
Solomon Four-Group Design: Combines pretest and posttest groups with and without pretests to control for pretest effects.
Example: Smoll et al. (1993) used a pretest-posttest design to study reinforcement in Little League, but lacked random assignment.
Summary Table: Types of Validity
| Type of Validity | Definition | Example |
|---|---|---|
| Statistical | Reasonableness of statistical conclusions | Using appropriate tests for the data type |
| Face | Superficial appearance of measuring the intended construct | Phrenology appears to measure intelligence |
| Content | Coverage of the domain of interest | Exam covers all course topics |
| Criterion | Correlation with an external criterion | SAT predicts college GPA |
| Construct | Consistency with the theoretical construct | Anxiety test fits with the literature |
Additional info: Validity types often overlap; a well-designed study aims to maximize all forms of validity.