Statistics Study Notes: Descriptive Statistics, Data Visualization, and Probability

Descriptive Statistics and Data Types

Populations and Samples

Understanding the difference between a population and a sample is fundamental in statistics. A population includes all members of a defined group, while a sample is a subset of the population used to make inferences about the whole.

  • Population: The entire group of interest (e.g., all US teenagers aged 13-17).

  • Sample: A smaller group selected from the population (e.g., 1,000 US teenagers surveyed).

  • Statistic: A numerical measurement describing a characteristic of a sample.

  • Parameter: A numerical measurement describing a characteristic of a population.

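  • Example: If 60% of US states have passed a recent nutrition bill, this is a parameter, because it describes the entire population of states. If 60% of a sample of 50 US governors support the bill, this is a statistic, because it describes only the governors sampled.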

Types of Variables

Variables can be classified as quantitative or categorical; quantitative variables are further classified as continuous or discrete.

  • Quantitative Variable: Represents numerical values (e.g., grams of carbohydrates).

  • Categorical Variable: Represents categories or groups (e.g., type of car).

  • Continuous Variable: Can take any value within a range (e.g., height).

  • Discrete Variable: Can take only specific values (e.g., number of parking spaces).

  • Example Table:

| Variable | Type |
| --- | --- |
| Grams of carbohydrates in a doughnut | Quantitative, Continuous |
| Number of siblings | Quantitative, Discrete |
| Brand of car | Categorical |
| Height | Quantitative, Continuous |
| Width of a parking space | Quantitative, Continuous |

Data Visualization

Pie Charts and Bar Graphs

Pie charts and bar graphs are common methods for visualizing categorical data.

  • Pie Chart: Shows proportions of categories as slices of a circle.

  • Bar Graph: Displays frequencies or percentages of categories as bars.

  • Pareto Chart: A bar graph with bars arranged in descending order of frequency.

  • Example: Student classification breakdown by survey (see pie chart and bar graph in material).
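
As a minimal sketch of how the counts behind these charts could be tabulated, the Python snippet below tallies a small, made-up list of student classifications and orders the categories from most to least frequent, as a Pareto chart would. The responses are hypothetical and are not taken from the survey in the material.

```python
from collections import Counter

# Hypothetical survey responses; not data from the source material.
responses = ["Freshman", "Senior", "Junior", "Freshman", "Sophomore",
             "Freshman", "Junior", "Senior", "Freshman", "Junior"]

counts = Counter(responses)
total = sum(counts.values())

# Pareto order: categories sorted from most to least frequent.
pareto_order = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Share of each category, i.e. the slice sizes of a pie chart.
proportions = {category: n / total for category, n in counts.items()}

for category, n in pareto_order:
    print(f"{category:<10} {n:>2}  {proportions[category]:.0%}")
```

The same counts feed all three charts: a bar graph plots them as they are, a Pareto chart plots them in the sorted order, and a pie chart uses the proportions.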

Frequency Distributions and Histograms

Frequency tables and histograms are used to summarize and visualize quantitative data.

  • Frequency Table: Lists intervals (bins), frequencies, relative frequencies, percentages, and cumulative distributions.

  • Histogram: A bar graph representing the frequency of data within intervals.

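  • Example Table:

| Interval | Frequency (as a proportion) | Relative Frequency | Percentage Distribution | Cumulative Distribution |
| --- | --- | --- | --- | --- |
| 78 to 79 | 0.05 | 0.05 | 5% | 5% |
| 80 to 81 | 0.10 | 0.10 | 10% | 15% |
| 82 to 83 | 0.11 | 0.11 | 11% | 26% |
| 84 to 85 | 0.15 | 0.15 | 15% | 41% |
| 86 to 87 | 0.22 | 0.22 | 22% | 63% |
| 88 to 89 | 0.15 | 0.15 | 15% | 78% |
| 90 to 91 | 0.22 | 0.22 | 22% | 100% |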

  • Most common intervals: 86 to 87 and 90 to 91.

  • Least common interval: 78 to 79.
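
The percentage and cumulative columns follow mechanically from the relative frequencies. The sketch below, using the relative frequencies from the example table, recomputes both columns and picks out the most and least common intervals. Variable names are illustrative.

```python
from itertools import accumulate

intervals = ["78 to 79", "80 to 81", "82 to 83", "84 to 85",
             "86 to 87", "88 to 89", "90 to 91"]
relative_freq = [0.05, 0.10, 0.11, 0.15, 0.22, 0.15, 0.22]

# Percentage distribution: relative frequency x 100, as whole percents.
percentages = [round(rf * 100) for rf in relative_freq]

# Cumulative distribution: running total of the percentages.
cumulative = list(accumulate(percentages))

# Most and least common intervals (all ties are reported).
highest, lowest = max(relative_freq), min(relative_freq)
most_common = [iv for iv, rf in zip(intervals, relative_freq) if rf == highest]
least_common = [iv for iv, rf in zip(intervals, relative_freq) if rf == lowest]

for iv, pct, cum in zip(intervals, percentages, cumulative):
    print(f"{iv}: {pct}% (cumulative {cum}%)")
```

A quick sanity check on any such table is that the last cumulative entry reaches 100%.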

Measures of Central Tendency and Spread

Mean, Median, and Mode

These are measures used to describe the center of a data set.

  • Mean: The arithmetic average of the data.

  • Median: The middle value when data are ordered.

  • Mode: The most frequently occurring value.

  • Example: For the data set [25, 29, 31], the median is 29.
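
For a quick check of these three measures, the sketch below applies Python's standard `statistics` module to the data set from the example above.

```python
import statistics

data = [25, 29, 31]

mean = statistics.mean(data)        # (25 + 29 + 31) / 3
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # every value tied for most frequent

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
```

Because every value in this data set occurs exactly once, `multimode` returns all three values; in other words, the data set has no single mode.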

Standard Deviation and Range

These measures describe the spread or variability of the data.

  • Standard Deviation: Measures the typical distance of data points from the mean.

  • Range: Difference between the maximum and minimum values.

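  • Formula for Standard Deviation (sample): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the number of observations.

  • Formula for Range: $\text{Range} = \text{Maximum} - \text{Minimum}$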
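
A minimal sketch of both calculations, assuming the sample form of the standard deviation (dividing by n − 1) and reusing the data set [25, 29, 31] from the median example:

```python
import math

data = [25, 29, 31]
n = len(data)
mean = sum(data) / n

# Sample standard deviation: summed squared deviations divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in data]
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

print(f"standard deviation = {sample_sd:.3f}, range = {data_range}")
```

If the data were an entire population rather than a sample, the divisor would be n instead of n − 1.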

Five Number Summary and Boxplots

The five number summary provides a quick overview of the distribution of a dataset.

  • Five Number Summary: Minimum, Q1, Median, Q3, Maximum

  • Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$

  • Boxplot: A graphical representation of the five number summary.

  • Example Table:

| Statistic | Mutual Fund A | Mutual Fund B |
| --- | --- | --- |
| Mean | 3.05 | 3.41 |
| Standard Deviation | 2.99 | 5.59 |
| Median | 3.10 | 3.38 |
| Q1 | 0.6 | 1.3 |
| Q3 | 7.7 | 7.2 |
| Min | -2.3 | -4.4 |
| Max | 12.9 | 12.9 |

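  • Interpretation: Mutual Fund A has the smaller standard deviation (2.99 vs. 5.59) and the smaller range (15.2 vs. 17.3), so its returns are less variable and more consistent than Mutual Fund B's.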
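
The spread measures that a boxplot displays can be computed directly from the five number summary. The sketch below does this for the two funds, using the values from the example table. The dictionary layout is just one convenient way to hold the numbers.

```python
# Five number summaries taken from the example table.
funds = {
    "Mutual Fund A": {"min": -2.3, "q1": 0.6, "median": 3.10, "q3": 7.7, "max": 12.9},
    "Mutual Fund B": {"min": -4.4, "q1": 1.3, "median": 3.38, "q3": 7.2, "max": 12.9},
}

spread = {}
for name, s in funds.items():
    spread[name] = {
        "iqr": round(s["q3"] - s["q1"], 2),       # width of the box
        "range": round(s["max"] - s["min"], 2),   # whisker tip to whisker tip
    }

for name, measures in spread.items():
    print(f"{name}: IQR = {measures['iqr']}, range = {measures['range']}")
```

On these figures the range is wider for Fund B, while the IQR is wider for Fund A, so a comparison of variability can depend on which measure of spread is used.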

Symmetry and Skewness

Data distributions can be symmetric or skewed, affecting the relationship between mean and median.

  • Symmetric Distribution: Mean ≈ Median.

  • Right-Skewed Distribution: Mean > Median.

  • Left-Skewed Distribution: Mean < Median.

  • Example: SAT and IQ scores are approximately normally distributed (symmetric and bell-shaped), so their mean and median are close together.
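
The mean-versus-median comparison can be turned into a rough check. The sketch below is a rule of thumb only, not a formal skewness test; the function name, the tolerance parameter, and the sample data are all hypothetical.

```python
import statistics


def skew_direction(mean: float, median: float, tolerance: float = 0.0) -> str:
    """Classify skew from the mean/median relationship (rule of thumb)."""
    if abs(mean - median) <= tolerance:
        return "roughly symmetric"
    return "right-skewed" if mean > median else "left-skewed"


# Hypothetical data with one large value pulling the mean upward.
values = [1, 2, 3, 4, 100]
direction = skew_direction(statistics.mean(values), statistics.median(values))
print(direction)
```

Here the single large value raises the mean to 22 while the median stays at 3, which is the pattern described above for a right-skewed distribution.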

Standard Scores (Z-Scores)

Calculating Z-Scores

A z-score indicates how many standard deviations a value is from the mean.

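  • Formula: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.

  • Example: An ACT score of 28 with a mean of 23 and standard deviation of 4: $z = \frac{28 - 23}{4} = 1.25$, so the score is 1.25 standard deviations above the mean.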
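
A small helper makes the calculation reusable. This is a minimal sketch; the function name `z_score` is illustrative, and the inputs are the ACT figures from the example.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `value` lies from `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


act_z = z_score(28, 23, 4)  # (28 - 23) / 4
print(act_z)
```

A positive z-score means the value is above the mean, and a negative one means it is below.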

Probability Concepts

Probability Rules and Tree Diagrams

Probability quantifies the likelihood of events. Tree diagrams help visualize compound events.

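  • Probability of an event: $P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes}}$, when all outcomes are equally likely.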

  • Tree Diagram: Used to map out all possible outcomes and their probabilities.

  • Example: Probability of selecting a song you like from a playlist, or the probability of making free throws in basketball.

Conditional Probability and Independence

Conditional probability is the probability of one event given another has occurred. Independence means the occurrence of one event does not affect the probability of another.

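  • Conditional Probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.

  • Independent Events: $P(A \cap B) = P(A) \cdot P(B)$, or equivalently $P(A \mid B) = P(A)$.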

  • Example: Probability of selecting a Brownie from a box of cookies, or the probability of making two consecutive free throws.

Joint and Marginal Probabilities

Joint probability refers to the probability of two events occurring together, while marginal probability refers to the probability of a single event.

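  • Example Table:

| Rank | Lemonades | Thin Mints | Peanut Butter | Brownies | Total |
| --- | --- | --- | --- | --- | --- |
| Daisies | 1014 | 2041 | 1014 | 1014 | 5083 |
| Juniors | 1014 | 1014 | 1014 | 1014 | 4056 |
| Cadettes | 1014 | 1014 | 1014 | 1014 | 4056 |
| Total | 3042 | 4069 | 3042 | 3042 | 13195 |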

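  • Probability of selecting a Brownie: $P(\text{Brownie}) = \frac{3042}{13195} \approx 0.231$

  • Probability of selecting a Thin Mint: $P(\text{Thin Mint}) = \frac{4069}{13195} \approx 0.308$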

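  • Probability of selecting both a Brownie and Thin Mint: when a single box is selected, the cookie types are mutually exclusive, so $P(\text{Brownie} \cap \text{Thin Mint}) = 0$.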
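
Marginal, joint, and conditional probabilities can all be read off a two-way table of counts. The sketch below uses the counts from the example table and assumes one box is selected at random from all 13195. The helper names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Counts from the example table: rows are ranks, columns are cookie types.
table = {
    "Daisies":  {"Lemonades": 1014, "Thin Mints": 2041, "Peanut Butter": 1014, "Brownies": 1014},
    "Juniors":  {"Lemonades": 1014, "Thin Mints": 1014, "Peanut Butter": 1014, "Brownies": 1014},
    "Cadettes": {"Lemonades": 1014, "Thin Mints": 1014, "Peanut Butter": 1014, "Brownies": 1014},
}
grand_total = sum(sum(row.values()) for row in table.values())


def marginal_cookie(cookie: str) -> Fraction:
    """P(cookie): column total over grand total."""
    return Fraction(sum(row[cookie] for row in table.values()), grand_total)


def marginal_rank(rank: str) -> Fraction:
    """P(rank): row total over grand total."""
    return Fraction(sum(table[rank].values()), grand_total)


def joint(rank: str, cookie: str) -> Fraction:
    """P(rank and cookie): single cell over grand total."""
    return Fraction(table[rank][cookie], grand_total)


p_brownie = marginal_cookie("Brownies")
p_thin_mint = marginal_cookie("Thin Mints")
p_daisy_and_thin_mint = joint("Daisies", "Thin Mints")

# Conditional probability: P(Thin Mints | Daisies) = joint / marginal.
p_thin_mint_given_daisy = p_daisy_and_thin_mint / marginal_rank("Daisies")

# Independence check: does the joint equal the product of the marginals?
independent = p_daisy_and_thin_mint == marginal_rank("Daisies") * p_thin_mint

print(float(p_brownie), float(p_thin_mint), float(p_thin_mint_given_daisy), independent)
```

In this table the conditional probability of a Thin Mint given Daisies (2041/5083) differs from the marginal probability of a Thin Mint (4069/13195), so rank and cookie type are not independent.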

Application: Free Throw Probability

Tree diagrams can be used to calculate the probability of compound events, such as making or missing free throws in basketball.

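  • Example: Probability of missing both free throws: $P(\text{miss both}) = P(\text{miss 1st}) \cdot P(\text{miss 2nd} \mid \text{miss 1st})$. If the shots are independent and each is made with probability $p$, this equals $(1 - p)^2$.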
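
A two-shot tree diagram can be enumerated in a few lines. The sketch below assumes the shots are independent and uses a hypothetical make probability of 7/10; the source does not state the shooter's percentage, so the number is only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical make probability; not given in the source material.
p_make = Fraction(7, 10)
p_shot = {"make": p_make, "miss": 1 - p_make}

# One entry per branch of the tree; independence lets us multiply along a branch.
tree = {
    (first, second): p_shot[first] * p_shot[second]
    for first, second in product(p_shot, repeat=2)
}

p_miss_both = tree[("miss", "miss")]

for branch, prob in tree.items():
    print(branch, prob)
```

The four branch probabilities sum to 1, which is a useful check on any tree diagram.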

Additional Probability Concepts

Sample Space and Events

The sample space is the set of all possible outcomes. An event is a subset of the sample space.

  • Example: Tossing a coin three times: Sample space = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

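  • Probability of getting three heads: $P(HHH) = \frac{1}{8}$ for a fair coin, since exactly one of the eight equally likely outcomes is HHH.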
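
The sample space for three coin tosses can be generated rather than listed by hand. The sketch below assumes a fair coin, so all outcomes are equally likely, and computes event probabilities by counting. The `probability` helper is illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of H/T of length 3: HHH, HHT, ..., TTT.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]


def probability(event) -> Fraction:
    """P(event) = favorable outcomes / total outcomes (equally likely)."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))


p_three_heads = probability(lambda o: o == "HHH")
p_at_least_two_heads = probability(lambda o: o.count("H") >= 2)

print(sample_space)
print(p_three_heads, p_at_least_two_heads)
```

Any event is just a condition that selects a subset of the sample space, so other questions (for example, "at least two heads") reuse the same helper.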

Summary Table: Key Formulas

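| Concept | Formula (LaTeX) |
| --- | --- |
| Mean | $\bar{x} = \frac{\sum x_i}{n}$ |
| Standard Deviation | $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ |
| Range | $\text{Range} = \text{Maximum} - \text{Minimum}$ |
| Z-Score | $z = \frac{x - \mu}{\sigma}$ |
| Conditional Probability | $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ |
| Probability of Event | $P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes}}$ |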

Additional info: Some explanations and tables have been expanded for clarity and completeness based on standard statistics curriculum.
