Chapter 1: Introduction to Statistics – Structured Study Notes
Introduction to Statistics
Statistical Thinking
Statistics is the science of collecting, organizing, analyzing, and interpreting data to make informed decisions. Understanding the foundational vocabulary and concepts is essential for proper statistical reasoning.
Data: A collection of observations, measurements, or responses.
Statistics: The science of collecting, organizing, analyzing, and interpreting data, and then drawing conclusions from that data.
Population: The complete collection of all elements (individuals, items, or data) to be studied.
Sample: A subset of members selected from the population.
Census: A collection of data from every member of the population.
Example: If your company made 2,500 truck transmissions last year and you want to study their longevity:
Data wanted: Lifespan of each transmission, failure rates, usage conditions, etc.
Population: All 2,500 transmissions produced last year.
Sample: A selected group (e.g., 100 transmissions) from the 2,500.
Example: Studying graduation rates of bachelor's-degree-seeking students who transfer from OCC to MSU:
Data: Graduation status, time to degree, GPA, etc.
Population: All such transfer students.
Sample: A subset of these students.
Main Steps in Statistics
Collection: Gathering data properly, considering the context and goal.
Analysis: Graphing, exploring, and performing statistical tests on the data.
Conclusion: Drawing valid inferences, considering statistical and practical significance.
Invalid Data Collection/Preparation
Bad Samples: Samples that do not reflect the population.
Small Sample: Too few observations to be representative.
Reported vs. Collected Results: Values that subjects report about themselves are often less accurate than values the researcher measures directly.
Loaded Questions: Questions that influence responses.
Order of Questions: The position of items can affect choices.
Review of Percentages
Percentage Formula: percent = (part ÷ whole) × 100%
Fraction to Decimal: Divide numerator by denominator.
Decimal to Percentage: Move the decimal two places to the right.
Percentage to Decimal: Move the decimal two places to the left.
Finding a Percentage of a Number: Convert percent to decimal and multiply.
Example: If there are 23,000 OCC students and 54% are female, then 0.54 × 23,000 = 12,420 students are female.
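The conversions above reduce to one division and one multiplication. A minimal Python sketch follows; the helper names `fraction_to_percent` and `percent_of` are illustrative and do not come from the original notes.

```python
def fraction_to_percent(part, whole):
    """Divide numerator by denominator, then scale to a percentage."""
    return part / whole * 100


def percent_of(percent, number):
    """Convert the percent to a decimal, then multiply."""
    return percent / 100 * number


# 54% of 23,000 OCC students (figures from the example above)
female_students = round(percent_of(54, 23_000))
print(female_students)  # 12420

# 3 out of 4 expressed as a percentage
print(fraction_to_percent(3, 4))  # 75.0
```

Rounding the student count guards against tiny floating-point error in the decimal conversion.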
Common Mistakes
Misleading statements (e.g., "Our deodorant lasts 30% longer" without context).
Impossible percentages (e.g., "Give 110% effort").
Correlation vs. Causation
Correlation: Two events are related or occur together.
Causation: One event directly causes another.
Examples:
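Binge-drinking and lung cancer: The two may occur together, but that alone does not show one causes the other; a third factor (for example, smoking) could be linked to both.
Florida vs. Iowa drowning victims: A difference in counts may reflect other factors, such as population size or access to water, rather than the state itself.
Wealth and expensive cars: The two go together, but owning an expensive car does not cause wealth.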
Significance
Statistical Significance: The result is unlikely to have occurred by chance.
Practical Significance: The result has real-world importance.
Example: A drug that costs $100,000 per patient and cures athlete’s foot in only 10% of cases may show a statistically significant effect, yet it has little practical significance.
Types of Data
Parameters and Statistics
A parameter is a numerical measurement describing a population. A statistic is a numerical measurement describing a sample. A parameter describes the population exactly, but it is often harder to obtain because it requires data from every member.
Example: Measuring the average height of the students in one class (a statistic, when the class serves as a sample of a larger student population).
Example: Calculating the GPA of all OCC students using the database (parameter).
Example: Counting the students seen wearing MSU apparel during an 8-hour period (statistic).
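The distinction can be sketched in a few lines of Python. The GPA values below are randomly generated placeholders, not real OCC data, and the population size of 2,000 and sample size of 50 are arbitrary assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: made-up GPAs for 2,000 students.
population = [round(rng.uniform(1.5, 4.0), 2) for _ in range(2000)]

# Parameter: computed from every member of the population.
parameter = statistics.mean(population)

# Statistic: computed from a sample drawn from that population.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.3f}")
print(f"sample mean (statistic):     {statistic:.3f}")
```

The two values are usually close but not equal; the statistic changes from sample to sample, while the parameter is fixed for a given population.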
Quantitative vs. Qualitative Data
Quantitative Data: Numerical values representing counts or measurements (e.g., weight of a car, average age).
Qualitative (Categorical) Data: Non-numerical labels or categories (e.g., color of shirt, month of first snowfall).
Discrete vs. Continuous Data
Discrete Data: Quantitative data that is countable (e.g., number of people who own a Ford, test grades).
Continuous Data: Quantitative data that can take any value within a range (e.g., temperature, length of hair, value of an investment).
Note: Time is always considered continuous. Money is usually treated as continuous for practical purposes.
Levels of Measurement
Nominal: Data is categorized without order (e.g., ethnicity, gender).
Ordinal: Data has order, but differences are not meaningful (e.g., movie ratings).
Interval: Data has order and meaningful differences, but no natural zero (e.g., temperature in Fahrenheit, year in history).
Ratio: Data has order, meaningful differences, and a natural zero (e.g., height, age, time to complete an exam).
Example Classification:
Height of students: Ratio
Temperature in Fahrenheit: Interval
Ethnicity: Nominal
Year in history: Interval (differences between years are meaningful, but year 0 is not a natural starting point)
Time to complete SAT: Ratio
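One way to see why the natural zero matters: ratios of ratio-level data survive a change of units, while ratios of interval-level data do not. The sketch below is illustrative only; the specific temperatures and heights are made-up values.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion between the two interval-level scales."""
    return (f - 32) * 5 / 9


def inches_to_cm(inches):
    """Standard conversion; both scales share the same natural zero."""
    return inches * 2.54


# Interval level: "80 degrees is twice 40 degrees" depends on the scale used.
ratio_f = 80.0 / 40.0
ratio_c = fahrenheit_to_celsius(80.0) / fahrenheit_to_celsius(40.0)
print(ratio_f, round(ratio_c, 2))  # 2.0 6.0

# Ratio level: "60 inches is twice 30 inches" holds in any unit.
ratio_in = 60.0 / 30.0
ratio_cm = inches_to_cm(60.0) / inches_to_cm(30.0)
print(ratio_in, round(ratio_cm, 2))  # 2.0 2.0
```

Because the Fahrenheit zero is arbitrary, "twice as hot" is not a meaningful statement, whereas "twice as tall" is.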
Missing Data
Missing data can invalidate results, especially if non-participants differ systematically from participants. For example, people who refuse to participate in surveys may have different opinions than those who do.
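A small worked calculation shows how systematic non-response can distort a result. All numbers here are assumptions chosen for illustration: a population split evenly on some question, with supporters responding to the survey at a higher rate than opponents.

```python
# Hypothetical population: 500 supporters and 500 opponents.
supporters, opponents = 500, 500

# Assumed response rates: supporters answer more often than opponents.
response_rate_support = 0.8
response_rate_oppose = 0.3

responding_support = supporters * response_rate_support
responding_oppose = opponents * response_rate_oppose

true_share = supporters / (supporters + opponents)
observed_share = responding_support / (responding_support + responding_oppose)

print(f"true support:     {true_share:.1%}")
print(f"observed support: {observed_share:.1%}")
```

Under these assumptions the survey reports roughly 73% support even though true support is 50%, purely because of who chose to respond.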
Collecting Sample Data
Observational Studies vs. Experiments
Observational Study: Observing and measuring without influencing subjects.
Experiment: Applying a treatment and observing the effects.
Examples:
Giving a drug to rats and observing effects (experiment).
Counting how often a mother eagle leaves the nest (observational study).
Types of Studies
Cross-sectional Study: Data collected at one point in time.
Retrospective Study: Data collected from past records (case-control).
Prospective Study: Following groups into the future (longitudinal).
Examples:
Monitoring cancer survivors over time (prospective).
Studying effects of asbestos exposure in the 1960s (retrospective).
Taking blood samples from returning soldiers (cross-sectional).
Confounding
Confounding occurs when the effects of different factors cannot be distinguished. For example, if a restaurant raises prices and road construction begins at the same time, it is unclear which factor affects sales.
Reducing Confounding
Blinding: Subjects do not know whether they are receiving the treatment or a placebo.
Double-Blinding: Both subjects and data collectors are unaware of treatment assignments.
Randomization: Assigning subjects to groups by a chance process.
Blocking: Grouping subjects with similar characteristics into blocks, then assigning treatments randomly within each block.
Matched Pairs Design: Pairing similar subjects, one receiving treatment and one not.
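Randomization is straightforward to sketch in Python. The function name `randomize` and the subject labels are hypothetical, and an even split into two groups is an assumption made for this example.

```python
import random


def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = randomize(subjects, seed=42)

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides group membership, characteristics such as age or health tend to balance out across the two groups, which limits confounding.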
Sampling Methods
Random and Simple Random Sampling
Random Sample: Each member of the population has an equal chance of selection.
Simple Random Sample (SRS): Each possible sample of size n has an equal chance of being chosen.
Example: Drawing names from a well-mixed hat produces a simple random sample.
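The difference between the two definitions shows up when the possible samples are enumerated. In the sketch below, a six-member population and a sample size of 2 are assumptions chosen to keep the lists short. It compares a simple random sample with a systematic sample (every kth member from a random start): both give each member the same chance of selection, but only the simple random sample makes every pair possible.

```python
from itertools import combinations

population = ["A", "B", "C", "D", "E", "F"]
n = 2

# Simple random sample: all 15 pairs are possible and equally likely.
srs_samples = list(combinations(population, n))

# Systematic sample: pick a random start, then take every kth member.
k = len(population) // n  # k = 3
systematic_samples = [tuple(population[start::k]) for start in range(k)]

print(len(srs_samples))    # 15
print(systematic_samples)  # [('A', 'D'), ('B', 'E'), ('C', 'F')]

# Each member has a 1/3 chance of selection under either method.
chance_srs = sum("A" in s for s in srs_samples) / len(srs_samples)
chance_sys = sum("A" in s for s in systematic_samples) / len(systematic_samples)
print(round(chance_srs, 3), round(chance_sys, 3))  # 0.333 0.333
```

The pair ("A", "B") can never be drawn systematically, so that design yields a random sample by the first definition but not a simple random sample.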
Other Sampling Techniques
Systematic Sampling: Selecting every kth member from a list.
Stratified Sampling: Dividing the population into groups (strata) and sampling from each group.
Cluster Sampling: Dividing the population into groups (clusters), then randomly selecting entire groups.
Convenience Sampling: Using data that is easiest to obtain.
Examples:
Stratified: Split by age group, then randomly select one person from each group.
Cluster: Randomly choose a few age groups, then select everyone in them.
Systematic: Pick every 4th person from a list.
Convenience: Pick the first 5 people who arrive.
Summary Table: Sampling Methods
| Sampling Method | Description | Example |
|---|---|---|
| Simple Random | Every possible sample of size n has an equal chance | Drawing names from a hat |
| Systematic | Select every kth member | Pick every 4th person on a list |
| Stratified | Divide into strata, sample from each | Pick one from each age group |
| Cluster | Divide into clusters, select entire clusters | Pick all from selected classrooms |
| Convenience | Use easiest data to obtain | First 5 people who walk in |
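The five methods can be compared side by side in a short Python sketch. The population of 40 people, the four age-group labels, and the sample sizes are all assumptions made for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 40 people, 10 in each of four age groups.
age_groups = ["18-29", "30-44", "45-59", "60+"]
population = [{"id": i, "age_group": age_groups[i // 10]} for i in range(40)]

# Simple random: any 8 people, every group of 8 equally likely.
simple_random = rng.sample(population, 8)

# Systematic: random start, then every 4th person on the list.
k = 4
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: split by age group, randomly pick 2 from every group.
strata = {}
for person in population:
    strata.setdefault(person["age_group"], []).append(person)
stratified = [p for members in strata.values() for p in rng.sample(members, 2)]

# Cluster: randomly pick 2 whole groups and keep everyone in them.
chosen_groups = rng.sample(sorted(strata), 2)
cluster = [p for group in chosen_groups for p in strata[group]]

# Convenience: whoever is easiest to reach, here the first 5 on the list.
convenience = population[:5]

for name, sample in [("simple random", simple_random),
                     ("systematic", systematic),
                     ("stratified", stratified),
                     ("cluster", cluster),
                     ("convenience", convenience)]:
    print(name, sorted(p["id"] for p in sample))
```

Stratified sampling takes some members from every group, while cluster sampling takes every member from some groups; the convenience sample here contains only the youngest age group, which illustrates why it is rarely representative.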