Comprehensive Study Notes for Introductory Statistics

Introduction to Statistics

Definition and Scope

Statistics is the science of collecting, organizing, presenting, analyzing, predicting, and interpreting data to make informed decisions. It encompasses various methods for handling data and is foundational to many fields.

  • Sources of Data: Data can be collected by the researcher (primary data) or obtained from existing sources (secondary data).

  • Methods of Collecting Data: Includes observation, surveys, experiments, and focus groups.

  • Types of Data: Qualitative (attributes, labels, classifications) and Quantitative (numerical measurements or counts).

  • Scope of Data: Population (entire group of interest) vs. Sample (subset of the population).

Sample vs. Population diagram

Descriptive Statistics

Frequency Distributions

A frequency distribution organizes data into intervals (classes) and records the number of data points in each interval (frequency).

  • Class: Interval defined by lower and upper limits.

  • Class Width: Difference between the lower (or upper) limits of two consecutive classes.

  • Range: Difference between maximum and minimum values.

  • Sample Size: Total number of data values.

Frequency distribution table
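
The definitions above can be sketched in a few lines of Python. This is only an illustration: the data values and the helper name `frequency_distribution` are invented here, and the sketch assumes integer data, so each upper class limit is one less than the next lower limit.

```python
# Hypothetical data set, invented for illustration only.
data = [3, 7, 12, 5, 18, 9, 14, 1, 20, 11, 6, 16]

def frequency_distribution(values, lowest_limit, class_width, num_classes):
    """Tally integer values into classes given as (lower limit, upper limit)."""
    table = []
    for i in range(num_classes):
        lower = lowest_limit + i * class_width
        upper = lower + class_width - 1  # integer data: one below the next lower limit
        frequency = sum(1 for v in values if lower <= v <= upper)
        table.append(((lower, upper), frequency))
    return table

table = frequency_distribution(data, lowest_limit=1, class_width=5, num_classes=4)
for (lower, upper), f in table:
    print(f"{lower}-{upper}: {f}")

data_range = max(data) - min(data)       # range = maximum - minimum
sample_size = sum(f for _, f in table)   # the frequencies add up to n
```

Here the class width is 5 because consecutive lower limits (1, 6, 11, 16) differ by 5.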

Graphical Representations

  • Frequency Histogram: Bar graph where bars touch, representing frequencies of quantitative classes.

  • Frequency Polygon: Line graph connecting midpoints of classes.

  • Relative Frequency Histogram: Vertical axis shows relative frequencies.

  • Cumulative Frequency Graph (Ogive): Line graph showing cumulative frequencies.

Frequency polygon for sneaker prices

Cumulative frequency graph for sneaker prices
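
The quantities these graphs plot can be computed directly from a frequency table. The sketch below uses the class limits and frequencies from the frequency distribution table in the Tables and Data section of these notes; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the frequency distribution table in these notes.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
frequencies = [5, 8, 6, 8, 5, 4]

n = sum(frequencies)                                  # sample size
midpoints = [(lo + hi) / 2 for lo, hi in classes]     # x-values for a frequency polygon
relative = [f / n for f in frequencies]               # heights for a relative frequency histogram
cumulative = list(accumulate(frequencies))            # heights for an ogive

for (lo, hi), mid, f, rel, cum in zip(classes, midpoints, frequencies, relative, cumulative):
    print(f"{lo}-{hi}: midpoint={mid}, f={f}, relative={rel:.3f}, cumulative={cum}")
```

The relative frequencies sum to 1, and the last cumulative frequency equals the sample size.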

Other Graphs

  • Pareto Chart: Bar graph with bars arranged in decreasing order of frequency, used for categorical data.

  • Pie Chart: Circle divided into sectors proportional to category frequencies.

Pareto chart for inventory shrinkage

Measures of Central Tendency

Mean, Median, Mode

  • Mean: Arithmetic average, sensitive to outliers.

  • Median: Middle value when data is ordered, less affected by outliers.

  • Mode: Most frequently occurring value(s).
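
As a quick illustration, the sketch below computes these measures with Python's standard `statistics` module. The GPA values come from the Grade Point Average table in these notes; the `ratings` list and the extra outlier value are hypothetical, added only to show the mode and the effect of an outlier.

```python
import statistics

# GPA values from the Grade Point Average table in these notes.
gpas = [3.7, 3.0, 2.8, 3.2]
mean_gpa = statistics.mean(gpas)       # about 3.175
median_gpa = statistics.median(gpas)   # average of the two middle values, about 3.1

# Hypothetical ratings, invented to show a data set with a clear mode.
ratings = [4, 5, 3, 4, 2, 4, 5]
modes = statistics.multimode(ratings)  # list of the most frequent value(s)

# Adding a hypothetical outlier moves the mean far more than the median.
with_outlier = gpas + [0.4]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean_gpa, median_gpa, modes, mean_with_outlier, median_with_outlier)
```

With the outlier included, the mean falls to about 2.62 while the median only falls to 3.0.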

Outliers and Distribution Shape

  • Outlier: Data value far removed from others.

  • Distribution Shape: Symmetric, Uniform, Skewed-Left, Skewed-Right.

Measures of Variation

Range, Variance, Standard Deviation

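  • Range: $\text{Range} = \text{maximum value} - \text{minimum value}$

  • Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ for population, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ for sample

  • Standard Deviation: $\sigma = \sqrt{\sigma^2}$ for population, $s = \sqrt{s^2}$ for sample

  • Coefficient of Variation: $CV = \frac{\sigma}{\mu} \cdot 100\%$ for population, $CV = \frac{s}{\bar{x}} \cdot 100\%$ for sample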

Histogram for Corporation A

Histogram for Corporation B
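
A minimal sketch of these measures, using Python's standard `statistics` module on a small hypothetical data set (the values are invented so that the arithmetic is easy to check by hand):

```python
import statistics

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)               # 9 - 2 = 7

population_variance = statistics.pvariance(data) # divides by N:     32 / 8 = 4
sample_variance = statistics.variance(data)      # divides by n - 1: 32 / 7

population_sd = statistics.pstdev(data)          # square root of the population variance
sample_sd = statistics.stdev(data)               # square root of the sample variance

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_population = population_sd / statistics.mean(data) * 100

print(data_range, population_variance, sample_variance, population_sd, sample_sd, cv_population)
```

Which divisor to use depends on whether the data are treated as the whole population (N) or as a sample from it (n - 1).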

Empirical Rule

For bell-shaped (normal) distributions:

  • About 68% of the data lie within 1 standard deviation of the mean

  • About 95% lie within 2 standard deviations of the mean

  • About 99.7% lie within 3 standard deviations of the mean

Empirical Rule diagram
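
These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

The computed proportions are roughly 0.6827, 0.9545, and 0.9973, which the Empirical Rule rounds to 68%, 95%, and 99.7%.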

Measures of Position

Quartiles, Percentiles, Z-Scores

  • Quartiles: Divide data into four equal parts.

  • Interquartile Range (IQR): $IQR = Q_3 - Q_1$

  • Percentiles: Divide data into 100 equal parts.

  • Z-Score: $z = \frac{x - \mu}{\sigma}$, the number of standard deviations a value $x$ lies from the mean.
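
The sketch below works through quartiles, the IQR, and a z-score on a hypothetical data set. It assumes the "median of each half" convention for quartiles, with the overall median left out of both halves when n is odd; other textbooks and software use slightly different conventions and may give slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical data, invented for illustration.
data = [2, 4, 5, 7, 8, 10, 12, 15, 18]

q1, q2, q3 = quartiles(data)   # 4.5, 8, 13.5
iqr = q3 - q1                  # 9.0

# Treat the data as a population, so use the population standard deviation.
z = z_score(18, statistics.mean(data), statistics.pstdev(data))
print(q1, q2, q3, iqr, round(z, 2))
```

A positive z-score means the value is above the mean; here 18 lies about 1.8 standard deviations above it.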

Correlation and Regression

Correlation

  • Correlation Coefficient (r): Measures strength and direction of linear relationship.

  • Scatter Plot: Visualizes types of correlation: positive, negative, none, nonlinear.

Scatter plots for types of correlation

Linear Regression

  • Regression Line: Line of best fit for predicting y from x.

  • Regression Equation: $\hat{y} = mx + b$, where $m$ is the slope and $b$ is the y-intercept.
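
As a rough sketch, the correlation coefficient and the least-squares regression line can be computed from sums of deviations. The data and variable names below are hypothetical, and the line is written as ŷ = mx + b with slope m and intercept b.

```python
import math

# Hypothetical paired data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of products and squares of deviations from the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient, between -1 and 1
m = sxy / sxx                    # slope of the least-squares line
b = y_bar - m * x_bar            # intercept: the line passes through (x_bar, y_bar)

def predict(x_value):
    """Predicted y for a given x, using y-hat = m*x + b."""
    return m * x_value + b

print(f"r = {r:.4f}, y-hat = {m:.1f}x + {b:.1f}, prediction at x = 6: {predict(6):.1f}")
```

Predictions are most trustworthy for x-values inside or near the range of the observed data.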

Probability

Basic Concepts

  • Experiment: Action yielding outcomes.

  • Sample Space: Set of all possible outcomes.

  • Event: Subset of sample space.

  • Fundamental Counting Principle: If one event can occur in m ways and another in n ways, total ways = $m \cdot n$.
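
A small sketch of these ideas: the experiment "toss a coin, then roll a die" (a hypothetical example) has a sample space that can be listed with `itertools.product`, and its size matches the Fundamental Counting Principle.

```python
from itertools import product

coin = ["H", "T"]          # 2 ways
die = [1, 2, 3, 4, 5, 6]   # 6 ways

# Sample space: every (coin, die) pair.
sample_space = list(product(coin, die))

# Event: heads together with an even number (a subset of the sample space).
event = [(c, d) for c, d in sample_space if c == "H" and d % 2 == 0]

print(len(sample_space))   # 2 * 6 = 12
print(event)
```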

Probability Rules

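  • Classical Probability: $P(E) = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in the sample space}}$, when all outcomes are equally likely.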

  • Empirical Probability: Based on observed data.

  • Subjective Probability: Based on intuition or estimates.

  • Complementary Events: $P(E') = 1 - P(E)$

  • Odds: Odds in favor of $E$ = $P(E) : P(E')$
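
The rules above can be illustrated with two fair dice (a hypothetical example). The sketch uses exact fractions, and it assumes odds in favor are defined as the ratio of the probability of the event to the probability of its complement.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 6 * 6 = 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two dice sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

p_event = Fraction(len(event), len(sample_space))   # classical probability: 6/36 = 1/6
p_complement = 1 - p_event                          # complement rule: 5/6

# Assumed definition: odds in favor = P(E) / P(E'), here 1/5, read as "1 to 5".
odds_in_favor = p_event / p_complement

print(p_event, p_complement, odds_in_favor)
```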

Probability Distributions

Discrete Probability Distributions

  • Discrete Random Variable: Takes countable values.

  • Probability Distribution: Lists each value and its probability.

  • Mean: $\mu = \sum x P(x)$

  • Variance: $\sigma^2 = \sum (x - \mu)^2 P(x)$

  • Standard Deviation: $\sigma = \sqrt{\sigma^2}$
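
A minimal sketch of these calculations, using the number of heads in two fair coin tosses as a hypothetical discrete random variable:

```python
import math

# Probability distribution: value of x -> P(x). Number of heads in two fair coin tosses.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# A valid probability distribution has probabilities that sum to 1.
total_probability = sum(distribution.values())

mean = sum(x * p for x, p in distribution.items())                    # sum of x * P(x)
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())  # sum of (x - mean)^2 * P(x)
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```

Here the mean is 1 head, the variance is 0.5, and the standard deviation is about 0.71.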

Binomial Distribution

  • Binomial Experiment: Fixed number of independent trials, two outcomes per trial (success/failure), constant probability of success.

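  • Binomial Probability Formula: $P(x) = {}_nC_x \, p^x q^{n-x}$, where $n$ is the number of trials, $p$ is the probability of success, and $q = 1 - p$.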

Binomial tree diagram and probability table
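
The binomial probability formula translates directly into code. In the sketch below, the function name `binomial_probability` and the example of four fair coin tosses are illustrative choices; `math.comb` supplies the number of combinations.

```python
from math import comb

def binomial_probability(n, x, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    q = 1 - p                                   # probability of failure
    return comb(n, x) * p ** x * q ** (n - x)   # nCx * p^x * q^(n - x)

# Hypothetical example: number of heads in 4 tosses of a fair coin.
n, p = 4, 0.5
table = [binomial_probability(n, x, p) for x in range(n + 1)]

for x, prob in enumerate(table):
    print(f"P({x}) = {prob}")
```

The probabilities for x = 0 through 4 are 0.0625, 0.25, 0.375, 0.25, and 0.0625, which sum to 1.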

Normal Distribution

  • Normal Curve: Symmetrical, bell-shaped, mean = median = mode.

  • Standard Normal Distribution: Mean 0, standard deviation 1.

  • Probability Density Function: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
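
As a sketch, the normal density can be written out by hand and compared with `statistics.NormalDist` from the Python standard library. The function name `normal_pdf` and the example mean and standard deviation are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# Peak of the standard normal curve, at the mean.
peak = normal_pdf(0)

# Hypothetical non-standard example: mean 100, standard deviation 15.
by_hand = normal_pdf(110, mu=100, sigma=15)
library = NormalDist(mu=100, sigma=15).pdf(110)

print(peak, by_hand, library)
```

The density is a curve height, not a probability; probabilities correspond to areas under the curve.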

Sampling and Central Limit Theorem

Sampling Distributions

  • Sampling Distribution: Distribution of a sample statistic over repeated samples.

  • Central Limit Theorem: For large n (commonly $n \geq 30$), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population.

  • Standard Error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Central Limit Theorem diagrams

Sampling distributions for different population shapes
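
A small simulation can make the sampling distribution concrete. The sketch below is only an illustration under stated assumptions: the population is a right-skewed exponential distribution with mean 1 and standard deviation 1, and the sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable

n = 36                    # size of each sample
num_samples = 2000        # number of repeated samples

# Draw repeated samples from the skewed population and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.mean(sample_means)   # should be close to the population mean, 1
observed_se = statistics.pstdev(sample_means)   # spread of the sample means
theoretical_se = 1 / math.sqrt(n)               # sigma / sqrt(n) = 1/6

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"observed standard error: {observed_se:.3f}, theoretical: {theoretical_se:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.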

Confidence Intervals

Point and Interval Estimates

  • Point Estimate: Single value estimate of a parameter.

  • Interval Estimate: Range of values likely to contain the parameter.

  • Margin of Error: $E = z_c \frac{\sigma}{\sqrt{n}}$ (if $\sigma$ known), $E = t_c \frac{s}{\sqrt{n}}$ (if $\sigma$ unknown)

Critical values table for confidence intervals

Decision tree for normal vs t-distribution
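
For the case where the population standard deviation is known, a confidence interval for the mean can be sketched as follows. The function name `z_interval` and the numbers in the example are hypothetical; the critical value comes from `statistics.NormalDist.inv_cdf`. The standard library has no t-distribution, so the unknown-sigma case would need a critical value from a t-table or another library.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    # Critical value z_c leaves (1 - confidence) / 2 in each tail.
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_c * sigma / math.sqrt(n)   # margin of error E
    return sample_mean - margin, sample_mean + margin, margin

# Hypothetical example: sample mean 50, sigma 10, n = 100.
low, high, margin = z_interval(sample_mean=50, sigma=10, n=100)
print(f"E = {margin:.2f}, interval = ({low:.2f}, {high:.2f})")
```

With these numbers the margin of error is about 1.96, giving an interval of roughly 48.04 to 51.96.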

Hypothesis Testing

Steps in Hypothesis Testing

  • State the null ($H_0$) and alternative ($H_a$) hypotheses.

  • Specify the significance level ($\alpha$).

  • Determine the appropriate test statistic (z or t).

  • Calculate the test statistic and p-value.

  • Make a decision: reject or fail to reject $H_0$.

  • Interpret the result in context.
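
The steps above can be sketched for one simple case: a two-tailed z-test for a mean with known population standard deviation. The function name `one_sample_z_test` and all numbers are hypothetical, and the decision rule assumed here is to reject the null hypothesis when the p-value is at most the significance level.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu_0 against Ha: mu != mu_0 (sigma known)."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))   # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # area in both tails beyond |z|
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision

# Hypothetical example: H0: mu = 100, sample mean 103, sigma 15, n = 100, alpha = 0.05.
z, p_value, decision = one_sample_z_test(sample_mean=103, mu_0=100, sigma=15, n=100)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

Here z = 2.00 and the p-value is about 0.0455, so the null hypothesis is rejected at the 0.05 level. The final step, interpreting the result in context, still has to be done in words.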

Types of Errors

  • Type I Error: Rejecting $H_0$ when it is true.

  • Type II Error: Failing to reject $H_0$ when it is false.

Hypothesis Testing with Two Samples

Independent vs. Dependent Samples

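  • Independent Samples: The sample selected from one population is unrelated to the sample selected from the other.

  • Dependent Samples: Paired or matched samples, where each member of one sample corresponds to a member of the other.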

Testing Differences

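  • Null hypothesis: No difference in means ($H_0$: $\mu_1 = \mu_2$).

  • Test statistic for difference: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ (independent samples, $\sigma_1$ and $\sigma_2$ known)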
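
A rough sketch of a two-sample test statistic, assuming independent samples and known population standard deviations (so a z-statistic applies). The function name `two_sample_z` and the numbers are hypothetical.

```python
import math

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z-statistic for the difference between two means (independent samples, sigmas known)."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Hypothetical example: under H0 the difference in population means is 0.
z = two_sample_z(mean1=75, mean2=72, sigma1=8, sigma2=6, n1=64, n2=36)
print(f"z = {z:.4f}")
```

The standard error here is the square root of 64/64 + 36/36 = 2, so z is 3 divided by about 1.414, or roughly 2.12. Dependent (paired) samples call for a different statistic based on the paired differences.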

Chi-Square Tests and F-Distribution

Chi-Square Test

  • Used for categorical data to test independence or goodness-of-fit.
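
As an illustration of a test of independence, the sketch below computes the usual chi-square statistic, the sum of (observed - expected)² / expected over all cells, for a hypothetical 2 × 2 table of counts. Expected counts are computed as (row total × column total) / grand total. The standard library has no chi-square distribution, so the statistic would then be compared with a critical value from a table.

```python
# Hypothetical observed counts: rows are one categorical variable, columns another.
observed = [
    [10, 20],
    [20, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the two variables were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_square:.4f}, degrees of freedom = {degrees_of_freedom}")
```

Every expected count here is 15, so the statistic is 4 × 25/15, about 6.67, with 1 degree of freedom.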

F-Distribution

  • Used to compare variances between two populations.

Tables and Data

Sample Tables

Tables are used to summarize and compare data, such as frequency distributions, grade point averages, and critical values.

Student    Grade Point Average
Ricky      3.7
Lucy       3.0
Fred       2.8
Ethel      3.2

Grade Point Average table

Class      Frequency, f
1-5        5
6-10       8
11-15      6
16-20      8
21-25      5
26-30      4

Frequency distribution table
