Foundations of Statistics: The Research Process, Data, and Statistical Thinking
Study Guide - Smart Notes
Why Learn Statistics? The Role of Statistics in Research
Introduction to Statistics and Scientific Curiosity
Statistics is a fundamental tool for answering scientific questions and making sense of data. The process of scientific inquiry begins with curiosity and the desire to explain observations about the world. To move from questions to answers, researchers use statistics to collect, analyze, and interpret data, allowing them to test theories and hypotheses.
Statistics helps transform observations into evidence-based conclusions.
Both quantitative (numbers) and qualitative (language, categories) data can be used in research, but this course focuses on quantitative methods.
The Research Process
Stages of the Research Process
The research process is a systematic approach to investigating questions and generating knowledge. It typically involves the following stages:
Initial Observation: Identifying a phenomenon or question that needs explanation.
Generating Theories and Hypotheses: Consulting existing theories and formulating testable explanations (hypotheses).
Collecting Data: Measuring variables and designing research to gather relevant data.
Analyzing Data: Using statistical methods to test hypotheses and evaluate theories.
Reporting Data: Sharing findings through publications, presentations, and open science practices.

From Observation to Hypothesis
Observations, Theories, and Hypotheses
Scientific inquiry often starts with an observation that prompts a question. The next step is to consult relevant theories and generate hypotheses:
Theory: A well-substantiated explanation of a broad phenomenon, supported by repeated testing.
Hypothesis: A specific, testable explanation derived from a theory, covering a narrower set of observations.
Prediction: An observable statement derived from a hypothesis, operationalized for empirical testing.
It is important to distinguish between hypotheses (conceptual explanations) and predictions (observable, testable statements).


Collecting Data: Measurement and Variables
Types of Variables
Variables are characteristics that can vary between individuals, groups, or over time. In research, variables are classified as:
Independent Variable (IV): The presumed cause or predictor, manipulated or categorized by the researcher.
Dependent Variable (DV): The presumed effect or outcome, measured to assess the impact of the IV.
Predictor Variable: Another term for IV, especially in non-experimental (correlational) research.
Outcome Variable: Another term for DV, especially in non-experimental research.
Levels of Measurement
Variables can be measured at different levels, which determine the types of statistical analyses that are appropriate:
Binary Variable: Only two categories (e.g., alive/dead).
Nominal Variable: More than two categories without order (e.g., types of fruit).
Ordinal Variable: Categories with a logical order, but not equal intervals (e.g., exam grades: fail, pass, merit, distinction).
Interval Variable: Numeric scale with equal intervals but no true zero (e.g., temperature in Celsius).
Ratio Variable: Numeric scale with equal intervals and a true zero (e.g., height, weight, reaction time).
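The practical difference between interval and ratio scales is easiest to see with arithmetic. The sketch below is an illustration only: the two temperatures are made-up values, and `celsius_to_kelvin` is a helper defined here for the example. Differences are meaningful on both scales, but a ratio only makes sense once the scale has a true zero (Kelvin).

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

# Hypothetical temperatures for illustration.
a_c, b_c = 10.0, 20.0

# Differences agree on both scales: equal intervals are preserved.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios do not agree: Celsius has no true zero, so "twice as hot" is not meaningful.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio on Celsius scale: {ratio_c:.3f}")
print(f"ratio on Kelvin scale:  {ratio_k:.3f}")
```

The Celsius ratio of 2 is an artifact of where zero was placed on the scale; the Kelvin ratio is only slightly above 1.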


Continuous vs. Discrete Variables
Continuous Variable: Can take any value within a range (e.g., age, height).
Discrete Variable: Can take only specific values, usually whole numbers (e.g., number of children).


Measurement Error, Validity, and Reliability
Measurement Error: The difference between the measured value and the true value.
Validity: Whether an instrument measures what it is intended to measure.
Reliability: Whether an instrument yields consistent results under consistent conditions.

Research Design: Correlational and Experimental Methods
Correlational Research
Correlational research observes naturally occurring relationships between variables without manipulating them. It is useful for studying variables that cannot be ethically or practically manipulated, but it cannot establish causality, because an unmeasured confounding variable may explain the observed relationship.
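One common way to quantify a relationship observed in correlational research is Pearson's correlation coefficient, r. These notes do not name a specific statistic, so treat the following as an illustrative sketch: `pearson_r` is a helper written for this example, and the study-hours data are invented.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: cross-product of deviations, scaled by both spreads."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / (ss_x * ss_y) ** 0.5

# Hypothetical observations: nothing is manipulated, both variables are just measured.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 60, 68, 70]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

A value of r near +1 indicates a strong positive linear relationship, but it does not show that studying causes higher scores; a third variable (for example, prior interest in the subject) could drive both.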

Experimental Research
Experimental research involves manipulating an independent variable to observe its effect on a dependent variable, allowing for causal inference. Key features include:
Randomization: Randomly assigning participants to conditions so that confounding variables are, on average, spread evenly across conditions rather than systematically favoring one.
Between-Groups Design: Different participants in each condition.
Within-Subjects Design: Same participants in all conditions.
Systematic Variation: Variation due to the experimental manipulation.
Unsystematic Variation: Random variation due to other factors.
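Random assignment in a between-groups design can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `randomly_assign`, the participant IDs, and the condition labels are all hypothetical, and the fixed seed is only there to make the example repeatable.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them into conditions in turn."""
    rng = random.Random(seed)          # separate generator so the seed stays local
    shuffled = list(participants)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs P01..P12.
participants = [f"P{i:02d}" for i in range(1, 13)]
groups = randomly_assign(participants, ["control", "treatment"], seed=42)

for condition, members in groups.items():
    print(condition, members)
```

Because each participant appears in exactly one group, this corresponds to a between-groups design; a within-subjects design would instead expose every participant to every condition.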
Analyzing and Describing Data
Describing Data with Tables and Graphs
Once data are collected, they are summarized using tables and graphs. A frequency distribution, typically plotted as a histogram, shows how often each value occurs and helps assess the shape of the data distribution.
Normal Distribution: Symmetrical, bell-shaped curve with most scores around the center.
Skewness: Asymmetry in the distribution (positive or negative skew).
Kurtosis: The "tailedness" of the distribution (leptokurtic = heavy tails, platykurtic = light tails).
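The shape descriptions above can be made concrete with a small sketch. The scores are invented, and `skewness` and `excess_kurtosis` are helpers written for this example using simple moment-based formulas; statistical packages often apply slightly different sample corrections, so their values may not match exactly.

```python
from collections import Counter
from statistics import mean, pstdev

def skewness(xs):
    """Average cubed z-score: positive when the long tail is on the right."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def excess_kurtosis(xs):
    """Average fourth-power z-score minus 3 (a normal distribution scores 0)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3

# Hypothetical scores with one extreme high value.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 9]

# Frequency distribution drawn as a text histogram.
freq = Counter(scores)
for value in sorted(freq):
    print(f"{value:>2} | {'#' * freq[value]}")

print(f"skewness:        {skewness(scores):.2f}")
print(f"excess kurtosis: {excess_kurtosis(scores):.2f}")
```

Here the single high score produces a positive skew and a positive excess kurtosis (heavier tails than a normal distribution); replacing the 9 with a 5 makes the distribution symmetric and the skew zero.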
Measures of Central Tendency
Mean: The arithmetic average; sensitive to extreme values.
Median: The middle value when data are ordered; less affected by outliers.
Mode: The most frequently occurring value; can be bimodal or multimodal.
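A short sketch using Python's standard `statistics` module shows how the three measures respond to an extreme value. The scores are made up for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical scores; the final value is an outlier.
scores = [2, 3, 3, 4, 5, 5, 6, 7, 40]

print("mean:  ", round(mean(scores), 2))
print("median:", median(scores))
print("modes: ", multimode(scores))   # two values tie, so the data are bimodal

# Removing the outlier changes the mean far more than the median.
trimmed = scores[:-1]
print("mean without outlier:  ", mean(trimmed))
print("median without outlier:", median(trimmed))
```

The mean drops from about 8.33 to 4.375 once the outlier is removed, while the median only moves from 5 to 4.5, which is why the median is preferred for skewed data.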
Measures of Dispersion
Range: Difference between the highest and lowest values.
Interquartile Range (IQR): Range of the middle 50% of scores.
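Variance: Average squared deviation from the mean. For a sample, the sum of squared deviations is divided by N - 1 rather than N; the result is in squared units.
Standard Deviation (SD): Square root of the variance; expresses the typical distance of scores from the mean in the original units of measurement.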
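The measures of dispersion can be computed step by step as in the sketch below, which uses invented scores and Python's standard `statistics` module. Note that quartiles can be defined in several ways; `statistics.quantiles` uses its "exclusive" method by default, so other software may report a slightly different IQR for the same data.

```python
from statistics import mean, pvariance, quantiles, stdev, variance

# Hypothetical scores for illustration.
scores = [2, 3, 3, 4, 5, 5, 6, 7]

value_range = max(scores) - min(scores)

# Quartile cut points; the IQR spans the middle 50% of scores.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1

# Variance by hand: sum of squared deviations from the mean.
m = mean(scores)
sum_of_squares = sum((x - m) ** 2 for x in scores)
population_variance = sum_of_squares / len(scores)        # divide by N
sample_variance = sum_of_squares / (len(scores) - 1)      # divide by N - 1

print("range:", value_range)
print("IQR:", iqr)
print("population variance:", population_variance, "=", pvariance(scores))
print("sample variance:", round(sample_variance, 3), "=", round(variance(scores), 3))
print("sample SD:", round(stdev(scores), 3))
```

The hand calculation and the library functions agree: `pvariance` divides by N, while `variance` and `stdev` use N - 1.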
Probability and the Normal Distribution
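Probability is used to estimate the likelihood of observing certain values. For a normal distribution, the area under the curve between two values is the probability of a score falling in that interval. A z-score expresses a value as its distance from the mean in standard deviation units, z = (score - mean) / SD, which allows values from different distributions to be compared.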
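As a worked example, the sketch below converts a score to a z-score and reads probabilities off the normal curve using `statistics.NormalDist` from Python's standard library. The exam mean of 70 and SD of 10 are assumed values chosen purely for illustration.

```python
from statistics import NormalDist

# Hypothetical exam scores, assumed normally distributed with mean 70 and SD 10.
exam = NormalDist(mu=70, sigma=10)
score = 85

# z-score: distance from the mean in standard deviation units.
z = (score - exam.mean) / exam.stdev

# Area under the curve to the left of the score = probability of scoring lower.
p_below = exam.cdf(score)
p_above = 1 - p_below

# The same probability can be read from the standard normal (mean 0, SD 1) using z.
standard = NormalDist()
p_below_from_z = standard.cdf(z)

# Area within 1.96 SDs of the mean, which is about 0.95.
p_between = standard.cdf(1.96) - standard.cdf(-1.96)

print(f"z = {z}")
print(f"P(score < 85) = {p_below:.4f}")
print(f"P(score > 85) = {p_above:.4f}")
print(f"P(-1.96 < z < 1.96) = {p_between:.4f}")
```

Because the z-score removes the original units, a z of 1.5 means the same thing (1.5 standard deviations above the mean) whichever test or scale the raw score came from.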
Reporting and Interpreting Data
Scientific Communication and Open Science
Research findings are disseminated through journals and conferences.
Reporting should be clear, transparent, and follow discipline-specific guidelines (e.g., APA style).
Open science practices, such as preregistration and data sharing, enhance transparency and reproducibility.
Key Terms and Concepts
Between-groups design, Bimodal, Binary variable, Categorical variable, Central tendency, Confounding variable, Continuous variable, Correlational research, Counterbalancing, Dependent variable, Deviance, Discrete variable, Ecological validity, Experimental research, Falsification, Frequency distribution, Histogram, Hypothesis, Independent variable, Interquartile range, Interval variable, Journal, Kurtosis, Leptokurtic, Level of measurement, Longitudinal research, Mean, Measurement error, Median, Mode, Multimodal, Negative skew, Nominal variable, Normal distribution, Ordinal variable, Outcome variable, Platykurtic, Positive skew, Practice effect, Predictor variable, Probability density function (PDF), Probability distribution, Quantitative methods, Quartile, Randomization, Range, Ratio variable, Reliability, Repeated-measures design, Skew, Standard deviation, Systematic variation, Test–retest reliability, Theory, Unsystematic variance, Validity, Variance, Within-subject design, z-scores