Statistics Study Notes: Descriptive Statistics, Data Visualization, and Probability
Descriptive Statistics and Data Types
Populations and Samples
Understanding the difference between a population and a sample is fundamental in statistics. A population includes all members of a defined group, while a sample is a subset of the population used to make inferences about the whole.
Population: The entire group of interest (e.g., all US teenagers aged 13-17).
Sample: A smaller group selected from the population (e.g., 1,000 US teenagers surveyed).
Statistic: A numerical measurement describing a characteristic of a sample.
Parameter: A numerical measurement describing a characteristic of a population.
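Example: If 60% of all US states have passed a recent nutrition bill, that 60% is a parameter, because it describes the entire population of states. If 60% of a sample of US governors support the bill, that 60% is a statistic, because it describes only the sample.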
Types of Variables
Variables are classified as quantitative or categorical. Quantitative variables are further classified as continuous or discrete.
Quantitative Variable: Represents numerical values (e.g., grams of carbohydrates).
Categorical Variable: Represents categories or groups (e.g., type of car).
Continuous Variable: Can take any value within a range (e.g., height).
Discrete Variable: Can take only specific values (e.g., number of parking spaces).
Example Table:
| Variable | Type |
|---|---|
| Grams of carbohydrates in a doughnut | Quantitative, Continuous |
| Number of siblings | Quantitative, Discrete |
| Brand of car | Categorical |
| Height | Quantitative, Continuous |
| Width of a parking space | Quantitative, Continuous |
Data Visualization
Pie Charts and Bar Graphs
Pie charts and bar graphs are common methods for visualizing categorical data.
Pie Chart: Shows proportions of categories as slices of a circle.
Bar Graph: Displays frequencies or percentages of categories as bars.
Pareto Chart: A bar graph with bars arranged in descending order of frequency.
Example: Student classification breakdown by survey (see pie chart and bar graph in material).
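The descending-frequency ordering that defines a Pareto chart can be sketched in a few lines of Python. The class-standing labels and responses below are hypothetical, chosen only for illustration; they are not the survey data from the material.

```python
from collections import Counter

# Hypothetical survey responses (not from the source material).
responses = ["Freshman", "Senior", "Junior", "Freshman", "Sophomore",
             "Freshman", "Junior", "Senior", "Freshman", "Junior"]

counts = Counter(responses)
total = sum(counts.values())

# most_common() sorts categories by descending frequency, which is the
# bar order a Pareto chart uses.
pareto_order = counts.most_common()

for category, count in pareto_order:
    # Each line gives the bar height (count) and the pie-slice share (percent).
    print(f"{category:<10} {count:>2}  {count / total:.0%}")
```

The same counts feed all three charts: the percentages are the pie slices, the counts are the bar heights, and the sorted order is what turns a bar graph into a Pareto chart.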
Frequency Distributions and Histograms
Frequency tables and histograms are used to summarize and visualize quantitative data.
Frequency Table: Lists intervals (bins), frequencies, relative frequencies, percentages, and cumulative distributions.
Histogram: A bar graph representing the frequency of data within intervals.
Example Table:
| Interval | Relative Frequency | Percentage | Cumulative Percentage |
|---|---|---|---|
| 78 to 79 | 0.05 | 5% | 5% |
| 80 to 81 | 0.10 | 10% | 15% |
| 82 to 83 | 0.11 | 11% | 26% |
| 84 to 85 | 0.15 | 15% | 41% |
| 86 to 87 | 0.22 | 22% | 63% |
| 88 to 89 | 0.15 | 15% | 78% |
| 90 to 91 | 0.22 | 22% | 100% |
Most common intervals: 86 to 87 and 90 to 91.
Least common interval: 78 to 79.
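As a quick check, the cumulative column can be rebuilt as a running total of the percentages in the table. The sketch below uses only the table's values; the variable names are illustrative.

```python
from itertools import accumulate

intervals = ["78 to 79", "80 to 81", "82 to 83", "84 to 85",
             "86 to 87", "88 to 89", "90 to 91"]
# Percentages from the table; integers avoid floating-point drift.
percentages = [5, 10, 11, 15, 22, 15, 22]

# The cumulative distribution is the running total of the percentages.
cumulative = list(accumulate(percentages))

for interval, pct, cum in zip(intervals, percentages, cumulative):
    print(f"{interval}: {pct}% (cumulative {cum}%)")

# Most and least common intervals, allowing for ties.
most_common = [i for i, p in zip(intervals, percentages) if p == max(percentages)]
least_common = [i for i, p in zip(intervals, percentages) if p == min(percentages)]
```

The final cumulative value must be 100%, which is a useful sanity check on any frequency table.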
Measures of Central Tendency and Spread
Mean, Median, and Mode
These are measures used to describe the center of a data set.
Mean: The arithmetic average of the data.
Median: The middle value when data are ordered.
Mode: The most frequently occurring value.
Example: For the data set [25, 29, 31], the median is 29.
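The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch: the first data set is the one from the example, and the second (`with_repeat`) is a hypothetical set added only to show a clear mode.

```python
import statistics

data = [25, 29, 31]                 # data set from the example above

mean = statistics.mean(data)        # (25 + 29 + 31) / 3
median = statistics.median(data)    # middle of the sorted values

# Every value in `data` occurs once, so there is no single mode.
# multimode() returns all values tied for the highest count.
modes = statistics.multimode(data)

# Hypothetical data set with a repeated value, to show a clear mode.
with_repeat = [25, 29, 29, 31]
mode = statistics.mode(with_repeat)

print(f"mean={mean:.2f}, median={median}, modes={modes}, mode={mode}")
```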
Standard Deviation and Range
These measures describe the spread or variability of the data.
Standard Deviation: Measures how far data points typically fall from the mean.
Range: Difference between the maximum and minimum values.
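Formula for Standard Deviation (sample): $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the number of observations.
Formula for Range: $\text{Range} = x_{\max} - x_{\min}$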
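Both measures of spread can be computed by hand in a few lines. The sketch below assumes the sample form of the standard deviation (dividing by n - 1), which is the usual choice in introductory courses. The function names are illustrative, and the data set is the small one from the median example.

```python
import math
import statistics

data = [25, 29, 31]  # same small data set used in the median example

def sample_std(values):
    """Sample standard deviation: divide squared deviations by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

def data_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

s = sample_std(data)
r = data_range(data)

# statistics.stdev() also uses the n - 1 form, so the two should agree.
print(f"s = {s:.3f} (library: {statistics.stdev(data):.3f}), range = {r}")
```

For a full population rather than a sample, the divisor would be n instead of n - 1 (`statistics.pstdev` in the standard library).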
Five Number Summary and Boxplots
The five number summary provides a quick overview of the distribution of a dataset.
Five Number Summary: Minimum, Q1, Median, Q3, Maximum
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$, the spread of the middle 50% of the data.
Boxplot: A graphical representation of the five number summary.
Example Table:
| Statistic | Mutual Fund A | Mutual Fund B |
|---|---|---|
| Mean | 3.05 | 3.41 |
| Standard Deviation | 2.99 | 5.59 |
| Median | 3.10 | 3.38 |
| Q1 | 0.6 | 1.3 |
| Q3 | 7.7 | 7.2 |
| Min | -2.3 | -4.4 |
| Max | 12.9 | 12.9 |
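Interpretation: Mutual Fund A has the smaller standard deviation (2.99 vs. 5.59) and the smaller range (15.2 vs. 17.3), so its returns are less variable and more consistent overall than Mutual Fund B's, even though its IQR (7.1) is slightly larger than Fund B's (5.9).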
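The range and IQR of each fund follow directly from its five number summary. The sketch below copies the values from the table; the dictionary layout and the `spread` helper are illustrative choices.

```python
# Five number summaries copied from the table above.
fund_a = {"min": -2.3, "q1": 0.6, "median": 3.10, "q3": 7.7, "max": 12.9}
fund_b = {"min": -4.4, "q1": 1.3, "median": 3.38, "q3": 7.2, "max": 12.9}

def spread(summary):
    """Return (range, IQR) for a five number summary."""
    data_range = summary["max"] - summary["min"]
    iqr = summary["q3"] - summary["q1"]
    # Round to clear floating-point noise from the subtraction.
    return round(data_range, 2), round(iqr, 2)

range_a, iqr_a = spread(fund_a)
range_b, iqr_b = spread(fund_b)

print(f"Fund A: range={range_a}, IQR={iqr_a}")
print(f"Fund B: range={range_b}, IQR={iqr_b}")
```

Different measures of spread need not agree: the range and standard deviation are sensitive to extreme values, while the IQR describes only the middle half of the data.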
Symmetry and Skewness
Data distributions can be symmetric or skewed, affecting the relationship between mean and median.
Symmetric Distribution: Mean ≈ Median.
Right-Skewed Distribution: Mean > Median.
Left-Skewed Distribution: Mean < Median.
Example: SAT and IQ scores are often approximately normally distributed (bell-shaped and symmetric), so their mean and median are nearly equal.
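The mean-versus-median comparison is easy to try on small data sets. All three data sets below are hypothetical, and the helper name is illustrative. Note that the comparison is a rule of thumb for typical skewed data, not a formal test of skewness.

```python
import statistics

def center_relationship(values):
    """Report how the mean compares with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        relation = "mean > median"
    elif mean < median:
        relation = "mean < median"
    else:
        relation = "mean = median"
    return mean, median, relation

# Hypothetical data sets (not from the source material).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # one large value pulls the mean up
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]  # one small value pulls the mean down

for name, values in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    mean, median, relation = center_relationship(values)
    print(f"{name}: mean={mean}, median={median} -> {relation}")
```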
Standard Scores (Z-Scores)
Calculating Z-Scores
A z-score indicates how many standard deviations a value is from the mean.
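Formula: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.
Example: An ACT score of 28 with a mean of 23 and standard deviation of 4: $z = \frac{28 - 23}{4} = 1.25$, so the score is 1.25 standard deviations above the mean.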
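The z-score calculation is a one-line function. The sketch below applies it to the ACT example; the function name and the input check are illustrative additions.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

# ACT example from above: score 28, mean 23, standard deviation 4.
act_z = z_score(28, 23, 4)
print(f"z = {act_z}")  # positive z: the score is above the mean
```

A positive z-score means the value is above the mean, a negative one means it is below, and z = 0 means it equals the mean.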
Probability Concepts
Probability Rules and Tree Diagrams
Probability quantifies the likelihood of events. Tree diagrams help visualize compound events.
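Probability of an event: $P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes}}$, when all outcomes are equally likely.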
Tree Diagram: Used to map out all possible outcomes and their probabilities.
Example: Probability of selecting a song you like from a playlist, or the probability of making free throws in basketball.
Conditional Probability and Independence
Conditional probability is the probability of one event given another has occurred. Independence means the occurrence of one event does not affect the probability of another.
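Conditional Probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, for $P(B) > 0$.
Independent Events: $P(A \cap B) = P(A) \cdot P(B)$, or equivalently $P(A \mid B) = P(A)$.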
Example: Probability of selecting a Brownie from a box of cookies, or the probability of making two consecutive free throws.
Joint and Marginal Probabilities
Joint probability refers to the probability of two events occurring together, while marginal probability refers to the probability of a single event.
Example Table:
| Rank | Lemonades | Thin Mints | Peanut Butter | Brownies | Total |
|---|---|---|---|---|---|
| Daisies | 1014 | 2041 | 1014 | 1014 | 5083 |
| Juniors | 1014 | 1014 | 1014 | 1014 | 4056 |
| Cadettes | 1014 | 1014 | 1014 | 1014 | 4056 |
| Total | 3042 | 4069 | 3042 | 3042 | 13195 |
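Probability of selecting a Brownie: $P(\text{Brownie}) = \frac{3042}{13195} \approx 0.231$
Probability of selecting a Thin Mint: $P(\text{Thin Mint}) = \frac{4069}{13195} \approx 0.308$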
Probability of selecting both a Brownie and Thin Mint:
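Marginal, joint, and conditional probabilities can all be read off the table programmatically. The sketch below assumes that each of the 13,195 counted items is equally likely to be selected. The function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

cookies = ["Lemonades", "Thin Mints", "Peanut Butter", "Brownies"]
# Counts copied from the table above (rows are ranks, columns are cookies).
counts = {
    "Daisies":  [1014, 2041, 1014, 1014],
    "Juniors":  [1014, 1014, 1014, 1014],
    "Cadettes": [1014, 1014, 1014, 1014],
}
grand_total = sum(sum(row) for row in counts.values())

def marginal_cookie(cookie):
    """P(cookie): column total divided by grand total."""
    col = cookies.index(cookie)
    return Fraction(sum(row[col] for row in counts.values()), grand_total)

def marginal_rank(rank):
    """P(rank): row total divided by grand total."""
    return Fraction(sum(counts[rank]), grand_total)

def joint(rank, cookie):
    """P(rank and cookie): single cell divided by grand total."""
    return Fraction(counts[rank][cookies.index(cookie)], grand_total)

def conditional(cookie, given_rank):
    """P(cookie | rank) = P(rank and cookie) / P(rank)."""
    return joint(given_rank, cookie) / marginal_rank(given_rank)

p_brownie = marginal_cookie("Brownies")
p_thin_mint = marginal_cookie("Thin Mints")
p_daisy_and_thin_mint = joint("Daisies", "Thin Mints")
p_thin_mint_given_daisy = conditional("Thin Mints", "Daisies")

# Independence check: does P(A and B) equal P(A) * P(B)?
independent = p_daisy_and_thin_mint == marginal_rank("Daisies") * p_thin_mint

print(float(p_brownie), float(p_thin_mint), float(p_thin_mint_given_daisy), independent)
```

Because P(Thin Mint | Daisy) differs from P(Thin Mint), rank and cookie type are not independent in this table.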
Application: Free Throw Probability
Tree diagrams can be used to calculate the probability of compound events, such as making or missing free throws in basketball.
Example: Probability of missing both free throws:
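A two-level tree diagram can be enumerated in code by multiplying branch probabilities along each path. The sketch below rests on two assumptions that are not from the source: a hypothetical make probability of 0.75, and independence between the two shots.

```python
from itertools import product

p_make = 0.75  # hypothetical value; the source does not give one
branch = {"make": p_make, "miss": 1 - p_make}

# Each path through the tree is one (first shot, second shot) outcome.
# Assuming the shots are independent, a path's probability is the
# product of the probabilities on its two branches.
tree = {(first, second): branch[first] * branch[second]
        for first, second in product(branch, repeat=2)}

p_miss_both = tree[("miss", "miss")]
p_make_both = tree[("make", "make")]

for path, prob in tree.items():
    print(path, prob)
```

The four path probabilities sum to 1, which is a quick way to confirm that the tree covers every outcome.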
Additional Probability Concepts
Sample Space and Events
The sample space is the set of all possible outcomes. An event is a subset of the sample space.
Example: Tossing a coin three times: Sample space = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
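Probability of getting three heads: $P(HHH) = \frac{1}{8} = 0.125$, since exactly one of the eight equally likely outcomes has three heads (assuming a fair coin).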
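The sample space for three coin tosses can be generated rather than listed by hand. The sketch below assumes a fair coin, so all eight outcomes are equally likely. The `probability` helper is illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of H/T of length 3, in the same order as listed above.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

def probability(event):
    """P(event) = favorable outcomes / total outcomes (equally likely)."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_three_heads = probability(lambda outcome: outcome == "HHH")
p_at_least_two_heads = probability(lambda outcome: outcome.count("H") >= 2)

print(sample_space)
print(p_three_heads, p_at_least_two_heads)
```

Any event is a subset of `sample_space`, so the same helper answers other questions, such as the probability of at least two heads.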
Summary Table: Key Formulas
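| Concept | Formula (LaTeX) |
|---|---|
| Mean | $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ |
| Standard Deviation | $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$ |
| Range | $\text{Range} = x_{\max} - x_{\min}$ |
| Z-Score | $z = \frac{x - \mu}{\sigma}$ |
| Conditional Probability | $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ |
| Probability of Event | $P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes}}$ |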