STAT 243Z Exam 1 Review: Foundations of Statistics

Ch. 1: Introduction to Statistics

Key Definitions

Statistics is the science of collecting, analyzing, interpreting, and presenting data. Understanding basic terminology is essential for interpreting statistical results.

  • Population: The entire group of individuals or items of interest.

  • Parameter: A numerical summary describing a characteristic of a population.

  • Sample: A subset of the population selected for study.

  • Statistic: A numerical summary describing a characteristic of a sample.

  • Census: A study that collects data from every member of the population.

Sampling Methods and Bias

  • Random Sample: Each member of the population has an equal chance of being selected.

  • Simple Random Sample: Every possible sample of a given size has an equal chance of being chosen.

  • Bias: Systematic error that skews results in one direction; reduced (but not guaranteed to be eliminated) by randomization and careful sampling design.

  • Sampling Error: Difference between sample statistic and population parameter due to random chance.

  • Nonsampling Error: Errors not related to sampling, such as measurement or data entry mistakes.

  • Voluntary Response Sample: Participants self-select; often leads to bias.

  • Convenience Sampling: Selecting individuals easiest to reach; may not represent the population.
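As a rough illustration of the sampling ideas above, the sketch below draws a simple random sample and measures the sampling error of its mean. The population of ID numbers, the helper name `simple_random_sample`, and the seed are all hypothetical choices made for this example.

```python
import random
import statistics

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: ID numbers 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)

# Sampling error: sample statistic minus population parameter.
parameter = statistics.mean(population)   # population mean (a parameter)
statistic = statistics.mean(sample)       # sample mean (a statistic)
sampling_error = statistic - parameter
print(sample, sampling_error)
```

Changing the seed gives a different sample and a different sampling error, which is the "random chance" component described above. A convenience or voluntary response sample would not use a random draw at all.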

Types of Data

  • Categorical Data: Data that can be grouped by categories (e.g., gender, color).

  • Quantitative Data: Numerical data; can be discrete (countable) or continuous (measurable).

  • Discrete Data: Countable values (e.g., number of students).

  • Continuous Data: Measurable values (e.g., height, weight).

Study Designs

  • Experiment: Researcher manipulates variables to observe effects.

  • Observational Study: Researcher observes without intervention.

  • Practical vs. Statistical Significance: Statistical significance means a result is unlikely to have occurred by chance alone; practical significance means the result is large enough to matter in a real-world context.

Ch. 2: Exploring Data with Tables and Graphs

Frequency Tables and Histograms

Frequency tables and histograms are foundational tools for summarizing and visualizing quantitative data.

  • Class Width: The difference between consecutive lower class limits.

  • Class Limits: The smallest and largest values in each class.

  • Relative Frequency: The proportion of data values in each class.

  • Histogram: A graph of adjacent bars whose heights show the frequency of each class; the x-axis should be labeled with class boundaries (or class midpoints) rather than with both the lower and upper class limits.
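The definitions above can be sketched as a small frequency-table builder. The function name `frequency_table`, the exam scores, and the choice of half-open classes (each class runs from its lower limit up to, but not including, the next lower limit) are assumptions made for illustration.

```python
def frequency_table(values, lower_limit, class_width, num_classes):
    """Return (lower, next_lower, frequency, relative_frequency) for each class.

    class_width is the difference between consecutive lower class limits.
    """
    n = len(values)
    table = []
    for i in range(num_classes):
        low = lower_limit + i * class_width
        next_low = low + class_width
        count = sum(1 for v in values if low <= v < next_low)
        table.append((low, next_low, count, count / n))
    return table

# Hypothetical exam scores.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 85]
table = frequency_table(scores, lower_limit=50, class_width=10, num_classes=4)

for low, next_low, freq, rel in table:
    print(f"{low} to under {next_low}: frequency {freq}, relative frequency {rel:.2f}")
```

The relative frequencies add up to 1 whenever every data value falls in one of the classes, which is a quick check on a hand-built table.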

Other Graphs

  • Stem & Leaf Plots: Show individual data values; similar to histograms but preserve original data.

  • Dotplots: Display each data point as a dot; useful for small datasets.

  • Pie Charts: Represent categorical data; show proportions. Advantages: Easy to interpret proportions. Disadvantages: Not suitable for quantitative data.

  • Bar Charts: Used for nominal or ordinal data; bars represent frequency or proportion.

  • Time Series Charts: Line graphs showing data trends over time.

Ch. 3: Describing, Exploring, and Comparing Data

Measures of Center

Measures of center describe the typical value in a dataset.

  • Mean: Arithmetic average; sensitive to outliers.

  • Median: Middle value; resistant to outliers.

  • Mode: Most frequent value.

  • Midrange: Average of the maximum and minimum values.

  • Weighted Mean: Used when values have different weights (e.g., GPA calculation).

  • 10% Trimmed Mean: Mean calculated after removing the lowest and highest 10% of values.

Skewed Distribution: In a skewed distribution, the mean, median, mode, and midrange generally differ, with the mean pulled toward the longer tail. The median is the most resistant to outliers.
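A short sketch of the measures of center, using a made-up dataset with one large outlier so the differences are visible. The helper names `trimmed_mean` and `weighted_mean` and all data values are hypothetical.

```python
import statistics

# Hypothetical data with one large outlier (53).
data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 53]

mean = statistics.mean(data)              # pulled upward by the outlier
median = statistics.median(data)          # middle of the sorted values
mode = statistics.mode(data)              # most frequent value
midrange = (min(data) + max(data)) / 2    # average of the two extremes

def trimmed_mean(values, proportion=0.10):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)    # number of values to drop from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# GPA-style example: grade points weighted by (hypothetical) credit hours.
gpa = weighted_mean([4.0, 3.0, 2.0], [3, 4, 3])

print(mean, median, mode, midrange, trimmed_mean(data), gpa)
```

Here the single outlier moves the mean and midrange well above the median, while the 10% trimmed mean stays close to the median because the outlier is dropped.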

Measures of Variation

Variation measures describe the spread of data values.

  • Range: Difference between maximum and minimum values.

  • Variance: Average squared deviation from the mean. For a sample: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

  • Standard Deviation: Square root of variance, $s = \sqrt{s^2}$; roughly the typical distance of values from the mean.

  • Symbols: $\sigma$ (population standard deviation), $\mu$ (population mean), $s$ (sample standard deviation), $\bar{x}$ (sample mean)

  • Why Standard Deviation? Standard deviation is in the same units as the data, making it easier to interpret than variance.

  • Resistant Measures: Range, variance, and standard deviation are not resistant to outliers; the interquartile range (IQR) is more robust.
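The sketch below computes the measures of variation step by step, assuming the usual convention that a sample variance divides the sum of squared deviations by n - 1 and a population variance divides by n. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n

squared_devs = [(x - x_bar) ** 2 for x in data]

sample_variance = sum(squared_devs) / (n - 1)   # treats data as a sample
sample_sd = math.sqrt(sample_variance)

population_variance = sum(squared_devs) / n     # treats data as a whole population
population_sd = math.sqrt(population_variance)

data_range = max(data) - min(data)

# The standard library gives the same sample values.
print(sample_variance, statistics.variance(data))
print(sample_sd, statistics.stdev(data))
print(population_sd, data_range)
```

The standard deviations come out in the same units as the data, while the variances are in squared units, which is why standard deviation is usually the one reported.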

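Range Rule of Thumb for Estimating Standard Deviation

The standard deviation can be roughly estimated from the range: $s \approx \frac{\text{range}}{4}$

Range Rule of Thumb for Identifying Significant Values

The Range Rule of Thumb helps identify significant values in a dataset:

  • Significantly low values: $\mu - 2\sigma$ or lower

  • Significantly high values: $\mu + 2\sigma$ or higher

  • Values not significant: Between $\mu - 2\sigma$ and $\mu + 2\sigma$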
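A minimal sketch of both uses of the Range Rule of Thumb, assuming the common textbook cutoffs of two standard deviations on either side of the mean and the estimate "standard deviation is about range divided by 4". The function names and the mean of 100 with standard deviation 15 are hypothetical.

```python
def significance_limits(mean, sd):
    """Return the (low, high) cutoffs: mean - 2*sd and mean + 2*sd."""
    return mean - 2 * sd, mean + 2 * sd

def classify(x, mean, sd):
    """Label a value using the two-standard-deviation cutoffs."""
    low, high = significance_limits(mean, sd)
    if x <= low:
        return "significantly low"
    if x >= high:
        return "significantly high"
    return "not significant"

def estimate_sd_from_range(values):
    """Rough estimate of the standard deviation: range / 4."""
    return (max(values) - min(values)) / 4

# Hypothetical scores with mean 100 and standard deviation 15.
print(significance_limits(100, 15))
print(classify(65, 100, 15), classify(110, 100, 15), classify(135, 100, 15))
print(estimate_sd_from_range([10, 18, 22, 30]))
```

The range-based estimate is only a quick approximation; when the full data are available, the computed standard deviation should be preferred.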

Empirical Rule

For bell-shaped (normal) distributions:

  • About 68% of values fall within 1 standard deviation of the mean.

  • About 95% within 2 standard deviations.

  • About 99.7% within 3 standard deviations.

Chebyshev's Theorem

Applies to any data distribution, not just normal:

  • At least $1 - \frac{1}{k^2}$ of values lie within $k$ standard deviations of the mean (for $k > 1$).

Comparison: The Empirical Rule applies only to approximately bell-shaped (normal) distributions, while Chebyshev's Theorem applies to all distributions and therefore gives a weaker, guaranteed minimum.
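To compare the two rules numerically, the sketch below assumes the standard form of the theorem, in which at least 1 - 1/k^2 of the values lie within k standard deviations of the mean for k > 1. The function names and the data are hypothetical.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (any distribution)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Approximate proportions from the Empirical Rule (bell-shaped data only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def fraction_within(values, k):
    """Observed proportion of values within k population standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = [v for v in values if abs(v - mean) <= k * sd]
    return len(inside) / len(values)

# Hypothetical data, not necessarily bell-shaped.
data = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), EMPIRICAL_RULE[k], fraction_within(data, k))
```

For k = 2 the guaranteed minimum is 75%, well below the roughly 95% that the Empirical Rule gives for bell-shaped data. The observed fraction for any dataset should never fall below the Chebyshev bound.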

Measures of Relative Standing (Position)

These measures describe the position of a value within a dataset.

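  • Standard Scores (z-scores): $z = \frac{x - \bar{x}}{s}$ for a sample, or $z = \frac{x - \mu}{\sigma}$ for a population; measures how many standard deviations a value is from the mean.

  • Percentile: The percentage of values below a given value $x$. Formula: $\text{percentile of } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$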

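  • Finding x given Percentile Rank: Sort the data and compute the locator $L = \frac{k}{100} \cdot n$, where $k$ is the percentile and $n$ is the number of values. If $L$ is a whole number, the percentile value is midway between the $L$th value and the next value; otherwise, round $L$ up and take the value in that position.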

  • Quartiles: Divide data into four equal parts.

  • Five-Number Summary: Minimum, Q1, Median, Q3, Maximum.

  • Boxplots: Visual representation of the five-number summary; modified boxplots highlight outliers.

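  • Outliers: Values more than 1.5 IQRs below Q1 or above Q3, that is, below $Q_1 - 1.5 \cdot \text{IQR}$ or above $Q_3 + 1.5 \cdot \text{IQR}$.

Example: If Q1 = 10, Q3 = 20, IQR = 10, then any value below $10 - 1.5(10) = -5$ or above $20 + 1.5(10) = 35$ is an outlier.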
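The measures of relative standing can be sketched in a few lines. The code assumes the usual definitions: a z-score is (value minus mean) divided by the standard deviation, the percentile of x is the percentage of values less than x, and the outlier cutoffs sit 1.5 IQRs below Q1 and above Q3. The function names and data are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile_of(x, values):
    """Percentage of values strictly less than x (not rounded)."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

def outlier_fences(q1, q3):
    """Cutoffs for outliers: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    low, high = outlier_fences(q1, q3)
    return x < low or x > high

# The quartiles from the example above: Q1 = 10, Q3 = 20.
print(outlier_fences(10, 20))
print(is_outlier(36, 10, 20), is_outlier(12, 10, 20))

# Hypothetical values for the other two measures.
print(z_score(130, 100, 15))
print(percentile_of(7, [2, 4, 4, 4, 5, 5, 7, 9]))
```

With Q1 = 10 and Q3 = 20 the cutoffs work out to -5 and 35, so a value of 36 is flagged and a value of 12 is not.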
