
Chapter 1: Introduction to Statistics – Structured Study Notes


Introduction to Statistics

Statistical Thinking

Statistics is the science of collecting, organizing, analyzing, and interpreting data to make informed decisions. Understanding the foundational vocabulary and concepts is essential for proper statistical reasoning.

  • Data: A collection of observations, measurements, or responses.

  • Statistics: The science of collecting, organizing, analyzing, and interpreting data, and then drawing conclusions from that data.

  • Population: The complete collection of all elements (individuals, items, or data) to be studied.

  • Sample: A subset of members selected from the population.

  • Census: A collection of data from every member of the population.

Example: If your company made 2,500 truck transmissions last year and you want to study their longevity:

  • Data wanted: Lifespan of each transmission, failure rates, usage conditions, etc.

  • Population: All 2,500 transmissions produced last year.

  • Sample: A selected group (e.g., 100 transmissions) from the 2,500.

Example: Studying graduation rates of bachelor's degree-seeking students who transfer from OCC to MSU:

  • Data: Graduation status, time to degree, GPA, etc.

  • Population: All such transfer students.

  • Sample: A subset of these students.

Main Steps in Statistics

  1. Collection: Gathering data properly, considering the context and goal.

  2. Analysis: Graphing, exploring, and performing statistical tests on the data.

  3. Conclusion: Drawing valid inferences, considering statistical and practical significance.

Invalid Data Collection/Preparation

  • Bad Samples: Samples that do not reflect the population.

  • Small Sample: Too few observations to be representative.

  • Reported vs. Collected Results: Results that subjects report about themselves may be less accurate than results the researcher measures directly.

  • Loaded Questions: Questions that influence responses.

  • Order of Questions: The position of items can affect choices.

Review of Percentages

  • Percentage Formula: percentage = (part ÷ whole) × 100%

  • Fraction to Decimal: Divide numerator by denominator.

  • Decimal to Percentage: Move the decimal two places to the right.

  • Percentage to Decimal: Move the decimal two places to the left.

  • Finding a Percentage of a Number: Convert percent to decimal and multiply.

Example: If there are 23,000 OCC students and 54% are female, the number of female students is 0.54 × 23,000 = 12,420.
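
The conversion rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and were chosen for this example, and the final line works the OCC figure from the notes.

```python
def fraction_to_decimal(numerator, denominator):
    # Divide numerator by denominator.
    return numerator / denominator

def decimal_to_percent(value):
    # Moving the decimal two places right is multiplying by 100.
    return value * 100

def percent_to_decimal(percent):
    # Moving the decimal two places left is dividing by 100.
    return percent / 100

def percent_of(percent, number):
    # Convert the percent to a decimal, then multiply.
    return percent_to_decimal(percent) * number

three_quarters = decimal_to_percent(fraction_to_decimal(3, 4))
female_students = round(percent_of(54, 23_000))
print(three_quarters)   # 75.0
print(female_students)  # 12420
```

Rounding the last result guards against tiny floating-point error in the multiplication.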

Common Mistakes

  • Misleading statements (e.g., "Our deodorant lasts 30% longer" without context).

  • Impossible percentages (e.g., "Give 110% effort").

Correlation vs. Causation

  • Correlation: Two events are related or occur together.

  • Causation: One event directly causes another.

Examples:

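  • Binge-drinking and lung cancer: Even if the two occur together, that alone does not show that one causes the other.

  • Florida vs. Iowa drowning victims: A difference between the states does not prove that living in one of them causes drowning; other factors may be involved.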

  • Wealth and expensive cars: Owning an expensive car does not cause wealth.

Significance

  • Statistical Significance: The result is unlikely to have occurred by chance.

  • Practical Significance: The result has real-world importance.

Example: A drug that costs $100,000 per patient and cures athlete’s foot in only 10% of cases may produce a statistically significant result, but that result has little practical significance.
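
The gap between the two kinds of significance can be seen with made-up numbers. The sketch below assumes a one-proportion z-test, a method these notes have not introduced, and the sample size and counts are hypothetical. With a very large sample, even a 0.1 percentage point difference from 50% comes out as unlikely under chance alone.

```python
import math

def two_sided_p_value(successes, n, p0):
    # Normal-approximation (z) test of an observed proportion against p0.
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

n = 1_000_000          # hypothetical sample size
successes = 501_000    # hypothetical count: 50.1% observed
effect = successes / n - 0.5
p_value = two_sided_p_value(successes, n, 0.5)

print(f"effect = {effect:.3f}, p-value = {p_value:.3f}")
```

The p-value falls below the common 0.05 cutoff, so the result would usually be called statistically significant, yet an effect of 0.001 is unlikely to matter in practice.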

Types of Data

Parameters and Statistics

A parameter is a numerical measurement describing a population. A statistic is a numerical measurement describing a sample. A parameter describes the population exactly but is often harder to obtain, because it requires data on every member; a statistic only estimates it.

  • Example: Measuring the average height of all students in a class (statistic, if the class is treated as a sample from a larger student body).

  • Example: Calculating the GPA of all OCC students using the database (parameter).

  • Example: Counting the students seen wearing MSU apparel during an 8-hour observation period (statistic).
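
As a rough sketch of the distinction, the code below builds a synthetic population of 23,000 GPAs (the values are randomly generated for illustration, not real OCC data) and compares the population mean with the mean of a 100-student sample. The variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population: one made-up GPA per student.
population_gpas = [round(rng.uniform(2.0, 4.0), 2) for _ in range(23_000)]

# Parameter: computed from every member of the population.
parameter = statistics.mean(population_gpas)

# Statistic: computed from a sample of 100 students.
sample_gpas = rng.sample(population_gpas, 100)
statistic = statistics.mean(sample_gpas)

print(f"parameter (population mean): {parameter:.3f}")
print(f"statistic (sample mean):     {statistic:.3f}")
```

The two values are close but generally not equal, which is why a statistic is treated as an estimate of the parameter.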

Quantitative vs. Qualitative Data

  • Quantitative Data: Numerical values representing counts or measurements (e.g., weight of a car, average age).

  • Qualitative (Categorical) Data: Non-numerical labels or categories (e.g., color of shirt, month of first snowfall).

Discrete vs. Continuous Data

  • Discrete Data: Quantitative data that is countable (e.g., number of people who own a Ford, test grades).

  • Continuous Data: Quantitative data that can take any value within a range (e.g., temperature, length of hair, value of an investment).

Note: Time is always considered continuous. Money is usually treated as continuous for practical purposes.

Levels of Measurement

  1. Nominal: Data is categorized without order (e.g., ethnicity, gender).

  2. Ordinal: Data has order, but differences are not meaningful (e.g., movie ratings).

  3. Interval: Data has order and meaningful differences, but no natural zero (e.g., temperature in Fahrenheit, year in history).

  4. Ratio: Data has order, meaningful differences, and a natural zero (e.g., height, age, time to complete an exam).

Example Classification:

  • Height of students: Ratio

  • Temperature in Fahrenheit: Interval

  • Ethnicity: Nominal

  • Year in history: Interval (the gap between two years is meaningful, but year 0 does not mean "no time")

  • Time to complete SAT: Ratio
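
One way to see why Fahrenheit is interval rather than ratio is to check whether "twice as much" survives a change of units. The sketch below is illustrative only; the Kelvin conversion is brought in as an assumption, since Kelvin has a natural zero and the notes do not mention it.

```python
def fahrenheit_to_kelvin(f):
    # Kelvin starts at absolute zero, so ratios of Kelvin values are meaningful.
    return (f - 32) * 5 / 9 + 273.15

cool_f, warm_f = 40.0, 80.0

# Differences are meaningful on an interval scale.
difference_f = warm_f - cool_f

# The ratio of the raw Fahrenheit numbers is 2, but 80°F is not "twice as hot".
naive_ratio = warm_f / cool_f
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(difference_f)            # 40.0
print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 2))  # 1.08
```

For a ratio-level variable such as height, the same check holds up: 180 cm is twice 90 cm in any unit of length.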

Missing Data

Missing data can invalidate results, especially if non-participants differ systematically from participants. For example, people who refuse to participate in surveys may have different opinions than those who do.
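
A small calculation shows how this distortion can arise. The response rates below are hypothetical, and `observed_share` is a name invented for this sketch. If half the population holds an opinion but its holders respond four times as often as everyone else, the survey overstates that opinion.

```python
def observed_share(true_share, respond_if_yes, respond_if_no):
    # Share of respondents holding the opinion, given unequal response rates.
    yes_responders = true_share * respond_if_yes
    no_responders = (1 - true_share) * respond_if_no
    return yes_responders / (yes_responders + no_responders)

# Hypothetical: 50% truly agree; 80% of them respond, but only 20% of the rest do.
biased = observed_share(0.5, 0.8, 0.2)

# With equal response rates the survey matches the population.
unbiased = observed_share(0.5, 0.5, 0.5)

print(round(biased, 2))    # 0.8
print(round(unbiased, 2))  # 0.5
```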

Collecting Sample Data

Observational Studies vs. Experiments

  • Observational Study: Observing and measuring without influencing subjects.

  • Experiment: Applying a treatment and observing the effects.

Examples:

  • Giving a drug to rats and observing effects (experiment).

  • Counting how often a mother eagle leaves the nest (observational study).

Types of Studies

  • Cross-sectional Study: Data collected at one point in time.

  • Retrospective Study: Data collected from past records (case-control).

  • Prospective Study: Following groups into the future (longitudinal).

Examples:

  • Monitoring cancer survivors over time (prospective).

  • Studying effects of asbestos exposure in the 1960s (retrospective).

  • Taking blood samples from returning soldiers (cross-sectional).

Confounding

Confounding occurs when the effects of different factors cannot be distinguished. For example, if a restaurant raises prices and road construction begins at the same time, it is unclear which factor is responsible for any change in sales.

Reducing Confounding

  • Blinding: Subjects do not know whether they are receiving the treatment or a placebo.

  • Double-Blinding: Both subjects and data collectors are unaware of treatment assignments.

  • Randomization: Assigning subjects to groups randomly.

  • Blocking: Grouping subjects with similar characteristics into blocks, then assigning treatments randomly within each block.

  • Matched Pairs Design: Pairing similar subjects, one receiving treatment and one not.
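
Randomization and blocking from the list above can be sketched as follows. The function names and the subject labels are hypothetical, and splitting each group exactly in half is an assumption made to keep the example short.

```python
import random

def randomize(subjects, rng):
    # Shuffle a copy, then split it into treatment and control halves.
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def randomize_within_blocks(blocks, rng):
    # Randomize separately inside each block of similar subjects.
    treatment, control = [], []
    for members in blocks.values():
        block_treatment, block_control = randomize(members, rng)
        treatment.extend(block_treatment)
        control.extend(block_control)
    return treatment, control

rng = random.Random(1)
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = randomize(subjects, rng)

blocks = {"under_40": subjects[:10], "40_and_over": subjects[10:]}
blocked_treatment, blocked_control = randomize_within_blocks(blocks, rng)
```

Because chance alone decides each assignment, characteristics of the subjects tend to balance out between the groups; blocking guarantees that balance for the blocking characteristic.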

Sampling Methods

Random and Simple Random Sampling

  • Random Sample: Each member of the population has an equal chance of selection.

  • Simple Random Sample (SRS): Each possible sample of size n has an equal chance of being chosen.

Example: Drawing names from a hat is a simple random sample.
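
Drawing names from a hat corresponds to sampling without replacement, which Python's standard `random.sample` performs. The sketch below reuses the 2,500 transmissions from the earlier example; the ID format is made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Population: one hypothetical ID per transmission.
population = [f"T{i:04d}" for i in range(1, 2501)]

# Simple random sample: every set of 100 distinct IDs is equally likely.
sample = rng.sample(population, 100)

print(len(sample), sample[:3])
```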

Other Sampling Techniques

  • Systematic Sampling: Selecting every kth member from a list.

  • Stratified Sampling: Dividing the population into groups (strata) and sampling from each group.

  • Cluster Sampling: Dividing the population into groups (clusters), then randomly selecting entire groups.

  • Convenience Sampling: Using data that is easiest to obtain.

Examples:

  • Stratified: Split by age group, then randomly select one person from each group.

  • Cluster: Split by age group, randomly select a few groups, and include everyone in them.

  • Systematic: Pick every 4th person from a list.

  • Convenience: Pick the first 5 people who arrive.
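
The four techniques can be compared side by side on a toy population of 40 people in four age groups. This is a sketch only: the function names and the data are hypothetical, and the random starting point in the systematic version is an assumption the notes do not state.

```python
import random

def systematic_sample(items, k, rng):
    # Every k-th item, starting at a random position among the first k.
    start = rng.randrange(k)
    return items[start::k]

def stratified_sample(strata, per_stratum, rng):
    # Draw randomly from every stratum.
    picked = []
    for members in strata.values():
        picked.extend(rng.sample(members, per_stratum))
    return picked

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters and keep all of their members.
    chosen = rng.sample(sorted(clusters), n_clusters)
    picked = []
    for name in chosen:
        picked.extend(clusters[name])
    return picked

def convenience_sample(items, n):
    # Take whoever is easiest to reach: here, the first n to arrive.
    return items[:n]

rng = random.Random(7)
people = list(range(1, 41))
age_groups = {
    "18-29": people[0:10],
    "30-44": people[10:20],
    "45-64": people[20:30],
    "65+": people[30:40],
}

every_4th = systematic_sample(people, 4, rng)
one_per_group = stratified_sample(age_groups, 1, rng)
two_whole_groups = cluster_sample(age_groups, 2, rng)
first_five = convenience_sample(people, 5)
```

Stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.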

Summary Table: Sampling Methods

Sampling Method | Description                                           | Example
Simple Random   | Every possible sample of size n has an equal chance   | Drawing names from a hat
Systematic      | Select every kth member                               | Pick every 4th person on a list
Stratified      | Divide into strata, sample from each                  | Pick one from each age group
Cluster         | Divide into clusters, select entire clusters          | Pick all from selected classrooms
Convenience     | Use the data that is easiest to obtain                | First 5 people who walk in

