Foundations of Statistics: The Research Process, Data, and Statistical Thinking

Study Guide - Smart Notes


Why Learn Statistics? The Role of Statistics in Research

Introduction to Statistics and Scientific Curiosity

Statistics is a fundamental tool for answering scientific questions and making sense of data. The process of scientific inquiry begins with curiosity and the desire to explain observations about the world. To move from questions to answers, researchers use statistics to collect, analyze, and interpret data, allowing them to test theories and hypotheses.

Statistics helps transform observations into evidence-based conclusions.
Both quantitative (numerical) and qualitative (language-based, such as interview transcripts or written text) data can be used in research, but this course focuses on quantitative methods.

The Research Process

Stages of the Research Process

The research process is a systematic approach to investigating questions and generating knowledge. It typically involves the following stages:

Initial Observation: Identifying a phenomenon or question that needs explanation.
Generating Theories and Hypotheses: Consulting existing theories and formulating testable explanations (hypotheses).
Collecting Data: Measuring variables and designing research to gather relevant data.
Analyzing Data: Using statistical methods to test hypotheses and evaluate theories.
Reporting Data: Sharing findings through publications, presentations, and open science practices.

Diagram of the research process

From Observation to Hypothesis

Observations, Theories, and Hypotheses

Scientific inquiry often starts with an observation that prompts a question. The next step is to consult relevant theories and generate hypotheses:

Theory: A well-substantiated explanation of a broad phenomenon, supported by repeated testing.
Hypothesis: A proposed explanation for a narrower phenomenon or set of observations, usually derived from a theory.
Prediction: A statement, derived from a hypothesis, about what should be observed if the hypothesis is correct, operationalized in terms of measurable variables so that it can be tested empirically.

It is important to distinguish between hypotheses (conceptual explanations) and predictions (observable, testable statements).


Collecting Data: Measurement and Variables

Types of Variables

Variables are characteristics that can vary between individuals, groups, or over time. In research, variables are classified as:

Independent Variable (IV): The presumed cause or predictor, manipulated or categorized by the researcher.
Dependent Variable (DV): The presumed effect or outcome, measured to assess the impact of the IV.
Predictor Variable: Another term for IV, especially in non-experimental (correlational) research.
Outcome Variable: Another term for DV, especially in non-experimental research.

Levels of Measurement

Variables can be measured at different levels, which determine the types of statistical analyses that are appropriate:

Binary Variable: Only two categories (e.g., alive/dead).
Nominal Variable: More than two categories without order (e.g., types of fruit).
Ordinal Variable: Categories with a logical order, but not equal intervals (e.g., exam grades: fail, pass, merit, distinction).
Interval Variable: Numeric scale with equal intervals but no true zero (e.g., temperature in Celsius).
Ratio Variable: Numeric scale with equal intervals and a true zero (e.g., height, weight, reaction time).
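To see why a true zero matters, here is a minimal Python sketch (the helper name is illustrative, not from the course materials) comparing two temperatures on the Celsius scale (interval: 0 degrees C is an arbitrary point) and the Kelvin scale (ratio: 0 K is absolute zero). Differences are preserved on both scales, but a statement like "twice as hot" only makes sense on the ratio scale.

```python
# Celsius is an interval scale: its zero point is arbitrary.
# Kelvin is a ratio scale: 0 K is a true zero (absolute zero).
CELSIUS_TO_KELVIN_OFFSET = 273.15


def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return celsius + CELSIUS_TO_KELVIN_OFFSET


cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences (intervals) agree on both scales: 10 degrees apart.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios disagree: "20 C is twice 10 C" is an artefact of where zero sits.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"Difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"Ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The same logic explains why ratio statements about height, weight, or reaction time are meaningful: each has a true zero.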

Illustration of levels of measurement
Illustration of self-report data as ordinal

Continuous vs. Discrete Variables

Continuous Variable: Can take any value within a range (e.g., age, height).
Discrete Variable: Can take only specific values, usually whole numbers (e.g., number of children).

Illustration of continuous and discrete variables

Measurement Error, Validity, and Reliability

Measurement Error: The difference between the measured value and the true value.
Validity: Whether an instrument measures what it is intended to measure.
Reliability: Whether an instrument yields consistent results under consistent conditions.


Research Design: Correlational and Experimental Methods

Correlational Research

Correlational research observes naturally occurring relationships between variables without manipulating them. It is useful for studying variables that cannot ethically or practically be manipulated, but it cannot establish causality: a confounding (third) variable may explain the relationship, and the direction of cause and effect is often unclear.

Illustration of correlational research

Experimental Research

Experimental research involves manipulating an independent variable to observe its effect on a dependent variable, allowing for causal inference. Key features include:

Randomization: Randomly assigning participants to conditions to minimize confounding variables.
Between-Groups Design: Different participants in each condition (also called an independent design).
Within-Subjects Design: Same participants in all conditions (also called a repeated-measures design).
Systematic Variation: Variation due to the experimental manipulation.
Unsystematic Variation: Random variation due to factors other than the manipulation (e.g., natural differences between participants).
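A minimal sketch of a between-groups experiment, assuming an invented effect size and noise level. Participants are randomly assigned to two conditions by shuffling their IDs; the manipulation adds a fixed amount to the outcome. The difference between condition means reflects systematic variation, while the spread within each condition reflects unsystematic variation (individual differences plus noise).

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

N_PARTICIPANTS = 4000
TRUE_EFFECT = 5.0  # hypothetical effect of the manipulation on the outcome

# Natural individual differences (e.g., ability): a source of unsystematic variation.
baseline = {pid: random.gauss(100.0, 15.0) for pid in range(N_PARTICIPANTS)}

# Randomization: shuffle participant IDs, then split into two equal groups
# (between-groups design: each person appears in only one condition).
ids = list(baseline)
random.shuffle(ids)
half = N_PARTICIPANTS // 2
control_ids, treatment_ids = ids[:half], ids[half:]


def score(pid: int, treated: bool) -> float:
    """Outcome = person's baseline + effect of the IV (if treated) + random noise."""
    effect = TRUE_EFFECT if treated else 0.0
    return baseline[pid] + effect + random.gauss(0.0, 5.0)


control = [score(pid, treated=False) for pid in control_ids]
treatment = [score(pid, treated=True) for pid in treatment_ids]

# Systematic variation: the difference between the condition means.
mean_diff = statistics.fmean(treatment) - statistics.fmean(control)
# Unsystematic variation: the spread of scores within each condition.
sd_control = statistics.stdev(control)
sd_treatment = statistics.stdev(treatment)

print(f"Difference in means (systematic): {mean_diff:.2f}")
print(f"Within-group SDs (unsystematic):  {sd_control:.2f}, {sd_treatment:.2f}")
```

Because assignment is random, baseline differences are spread evenly across the groups on average, so the mean difference lands close to the simulated effect of 5.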

Analyzing and Describing Data

Describing Data with Tables and Graphs

Once data are collected, they are summarized using tables and graphs. A frequency distribution shows how often each value occurs; plotted as a histogram, it helps assess the shape of the data distribution.

Normal Distribution: Symmetrical, bell-shaped curve with most scores around the center.
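Skewness: Asymmetry in the distribution. With positive skew, most scores cluster at the low end and a long tail extends toward high values; negative skew is the reverse.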
Kurtosis: The "tailedness" of the distribution (leptokurtic = heavy tails, platykurtic = light tails).
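A short sketch, using hypothetical scores, that builds a frequency distribution with collections.Counter, prints it as a text histogram, and computes simple moment-based skewness (m3 / m2^1.5) and excess kurtosis (m4 / m2^2 - 3). These formulas are one common convention; statistical packages often apply small-sample corrections, so their values can differ slightly.

```python
from collections import Counter


def frequency_table(scores):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(scores).items())


def central_moment(scores, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** k for x in scores) / n


def skewness(scores):
    """Moment-based skewness: > 0 means a longer right tail (positive skew)."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0.
    > 0 suggests heavier tails (leptokurtic); < 0 lighter tails (platykurtic)."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]

# Text histogram: one '*' per occurrence of each value.
for value, count in frequency_table(right_skewed):
    print(f"{value:>3} | {'*' * count}")

print(f"Skew (symmetric):            {skewness(symmetric):.2f}")
print(f"Skew (right-skewed):         {skewness(right_skewed):.2f}")
print(f"Excess kurtosis (symmetric): {excess_kurtosis(symmetric):.2f}")
```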

Measures of Central Tendency

Mean: The arithmetic average; sensitive to extreme values.
Median: The middle value when data are ordered; less affected by outliers.
Mode: The most frequently occurring value; a distribution with two modes is bimodal, and one with more than two is multimodal.
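A minimal sketch using Python's standard statistics module on a small, made-up set of scores. Adding a single extreme score pulls the mean up sharply while the median barely moves and the mode is unchanged; statistics.multimode returns every most-frequent value, which exposes a bimodal distribution.

```python
import statistics

scores = [2, 3, 3, 4, 5]
with_outlier = scores + [50]  # one extreme score added

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(
        f"{label:>12}: mean={statistics.mean(data):.2f}, "
        f"median={statistics.median(data)}, mode={statistics.mode(data)}"
    )

# multimode lists all values tied for the highest frequency.
bimodal = [1, 1, 2, 3, 3]
print(statistics.multimode(bimodal))  # [1, 3]
```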

Measures of Dispersion

Range: Difference between the highest and lowest values.
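Interquartile Range (IQR): Range of the middle 50% of scores (upper quartile minus lower quartile).
Variance: The average squared deviation from the mean (the sum of squared deviations divided by N; divide by N - 1 instead when estimating a population's variance from a sample).
Standard Deviation (SD): Square root of the variance; indicates how far scores typically lie from the mean, in the original units of measurement.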
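The dispersion measures can be computed by hand and checked against Python's statistics module. This sketch uses a small, made-up data set. Note that quartiles can be calculated in several ways: statistics.quantiles uses its "exclusive" method by default, and other software may interpolate differently and report a slightly different IQR.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

# Quartiles via the standard library's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Variance "by hand": square each deviance (x - mean) and sum them.
mean = statistics.fmean(data)
sum_sq = sum((x - mean) ** 2 for x in data)
pop_var = sum_sq / len(data)           # divide by N (population variance)
sample_var = sum_sq / (len(data) - 1)  # divide by N - 1 (sample estimate)
sample_sd = math.sqrt(sample_var)

# The standard library agrees with the hand calculations.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(f"Range={data_range}, IQR={iqr}, population variance={pop_var}, "
      f"sample variance={sample_var:.3f}, SD={sample_sd:.3f}")
```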

Probability and the Normal Distribution

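Probability is used to estimate how likely it is to observe particular values. For a continuous variable, the area under the probability density function (such as the normal curve) between two values equals the probability that a score falls in that range. z-scores standardize values, z = (X - mean) / SD, expressing each score in standard-deviation units so that scores from different distributions can be compared.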
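A minimal sketch of standardizing a score and reading probabilities off the normal curve with statistics.NormalDist (Python 3.8+). The scale (mean 100, SD 15) and the score of 130 are hypothetical. The cumulative distribution function (cdf) gives the area to the left of a value, so the area to the right is 1 minus that.

```python
from statistics import NormalDist

# Hypothetical scale with mean 100 and SD 15.
MEAN, SD = 100.0, 15.0


def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize x: how many SDs it lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z = z_score(130.0, MEAN, SD)

standard_normal = NormalDist(mu=0.0, sigma=1.0)
# Area to the right of z = probability of a score at least this high.
p_above = 1.0 - standard_normal.cdf(z)
# About 95% of the area lies between z = -1.96 and z = +1.96.
p_middle = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Same tail probability computed directly on the original scale.
p_above_raw = 1.0 - NormalDist(mu=MEAN, sigma=SD).cdf(130.0)

print(f"z = {z:.2f}, P(score >= 130) = {p_above:.4f}")
print(f"P(-1.96 < z < 1.96) = {p_middle:.3f}")
```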

Reporting and Interpreting Data

Scientific Communication and Open Science

Research findings are disseminated through journals and conferences.
Reporting should be clear, transparent, and follow discipline-specific guidelines (e.g., APA style).
Open science practices, such as preregistration and data sharing, enhance transparency and reproducibility.

Key Terms and Concepts

Between-groups design, Bimodal, Binary variable, Categorical variable, Central tendency, Confounding variable, Continuous variable, Correlational research, Counterbalancing, Dependent variable, Deviance, Discrete variable, Ecological validity, Experimental research, Falsification, Frequency distribution, Histogram, Hypothesis, Independent variable, Interquartile range, Interval variable, Journal, Kurtosis, Leptokurtic, Level of measurement, Longitudinal research, Mean, Measurement error, Median, Mode, Multimodal, Negative skew, Nominal variable, Normal distribution, Ordinal variable, Outcome variable, Platykurtic, Positive skew, Practice effect, Predictor variable, Probability density function (PDF), Probability distribution, Quantitative methods, Quartile, Randomization, Range, Ratio variable, Reliability, Repeated-measures design, Skew, Standard deviation, Systematic variation, Test–retest reliability, Theory, Unsystematic variance, Validity, Variance, Within-subject design, z-scores