Frequency Distributions, Graphs, and Correlations: Study Notes for Statistics
Frequency Distributions
Definition and Purpose
A frequency distribution is a statistical tool that organizes data into categories or classes, showing how data values are partitioned among these groups. It lists each category (or class) along with the number (frequency) of data values in each.
Class: A range of values into which data are grouped.
Category: A label or name for a class, often used for qualitative data.
Frequency: The count of data values within each class.
Example: If a dataset contains test scores, a frequency distribution might show how many students scored within each score range (e.g., 50-69 and 70-89).
Constructing a Frequency Distribution
To create a frequency distribution, follow these steps:
Select the number of classes: Typically between 5 and 20, depending on the dataset size and convenience.
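Calculate class width: Use the formula $\text{class width} = \dfrac{\text{maximum value} - \text{minimum value}}{\text{number of classes}}$. Round up to a convenient number if necessary.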
Choose the first lower class limit: Start with the minimum value or a convenient value below it.
Determine subsequent lower class limits: Add the class width to the previous lower class limit to get the next one.
List lower and upper class limits: Arrange the lower class limits vertically and identify the corresponding upper class limits.
Tally data values: For each data value, place a tally mark in the appropriate class. Sum the tallies to find the frequency for each class.
Example: For a dataset with values ranging from 50 to 139, and 5 classes, the class width is calculated as: $\dfrac{139 - 50}{5} = 17.8$ (rounded up to 20 for convenience)
The lower class limits would be 50, 70, 90, 110, and 130.
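The width and limit calculations in the example above can be sketched in a few lines of Python. The function names `class_width` and `lower_class_limits` are illustrative and do not come from the notes. The upper limits assume whole-number data, so each one is taken as one less than the next lower limit.

```python
def class_width(minimum, maximum, num_classes):
    """Raw class width before rounding: (max - min) / number of classes."""
    return (maximum - minimum) / num_classes


def lower_class_limits(first_limit, width, num_classes):
    """Add the class width repeatedly to the first lower class limit."""
    return [first_limit + i * width for i in range(num_classes)]


raw_width = class_width(50, 139, 5)   # 17.8 before rounding
width = 20                            # rounded up to a convenient number, as in the example
lowers = lower_class_limits(50, width, 5)
# Assumption: whole-number data, so each upper limit is (next lower limit - 1).
uppers = [lo + width - 1 for lo in lowers]

print(raw_width)
print(list(zip(lowers, uppers)))
```

Rounding up is done by hand here, because a "convenient" width is a judgment call rather than a fixed rule.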
Table: Frequency Distribution Example
The following table shows a sample frequency distribution for a group:
| Class Interval | Frequency |
|---|---|
| 50-69 | 2 |
| 70-89 | 33 |
| 90-109 | 35 |
| 110-129 | 7 |
| 130-149 | 1 |
Additional info: The class intervals are determined by the lower and upper class limits, and the frequencies represent the count of data values in each interval.
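The tallying step can be sketched as follows. This is a minimal illustration, not the method used to build the table above: the `sample` values are hypothetical and the helper name `tally` is invented for the example. Each value is counted in the class whose lower limit it meets or exceeds and whose next lower limit it falls below.

```python
def tally(values, lower_limits, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * len(lower_limits)
    for value in values:
        for i, lower in enumerate(lower_limits):
            if lower <= value < lower + width:
                counts[i] += 1
                break  # each value belongs to exactly one class
    return counts


# Hypothetical data values, chosen only to illustrate the tally.
sample = [52, 68, 71, 75, 88, 90, 95, 101, 115, 139]
lower_limits = [50, 70, 90, 110, 130]
frequencies = tally(sample, lower_limits, 20)

for lower, count in zip(lower_limits, frequencies):
    print(f"{lower}-{lower + 19}: {count}")
```

Values outside every class are silently skipped in this sketch, so the frequencies sum to the data count only when the classes cover the full range.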
Graphs and Visual Representations
Types of Graphs
Histograms: Bar graphs representing frequency distributions for quantitative data.
Dotplots: Each data value is shown as a dot above a number line; stacked dots indicate repeated values.
Stem-and-leaf plots: Data values are split into a "stem" (leftmost digit(s)) and a "leaf" (rightmost digit), preserving original data values and showing distribution shape.
Time-series graphs: Display data collected over time (e.g., monthly, yearly) to show trends.
Pareto charts: Bar charts for categorical data, arranged in descending order of frequency.
Pie charts: Circular charts where each slice represents a category's proportion of the total.
Example: A dotplot of pulse rates might show two dots above "50" if two individuals have a pulse rate of 50.
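As a rough text-only illustration of dotplots and stem-and-leaf plots, the sketch below counts repeated values and splits each value into a stem and a leaf. The pulse rates are hypothetical, the function names are invented for the example, and the stem-and-leaf helper assumes non-negative whole numbers with a one-digit leaf.

```python
from collections import Counter, defaultdict


def dotplot_counts(values):
    """Number of dots to stack above each distinct value, in ascending order."""
    return dict(sorted(Counter(values).items()))


def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical pulse rates; two people share a rate of 50.
pulse_rates = [50, 50, 62, 64, 68, 71, 75, 75, 80]

for value, count in dotplot_counts(pulse_rates).items():
    print(f"{value:>3} {'.' * count}")

for stem, leaves in stem_and_leaf(pulse_rates).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Both outputs keep every original data value recoverable, which is the property the notes highlight for stem-and-leaf plots.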
Relative and Cumulative Frequency Distributions
Relative Frequency Distribution
A relative frequency distribution shows the proportion or percentage of data values in each class.
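Formula: $\text{relative frequency} = \dfrac{\text{class frequency}}{\text{sum of all frequencies}}$
Percentage frequency: $\text{percentage frequency} = \dfrac{\text{class frequency}}{\text{sum of all frequencies}} \times 100\%$
Example: If a class has a frequency of 33 and the total frequency is 78, the relative frequency is $33/78 \approx 0.423$, or 42.3%.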
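The same calculation can be applied to every class at once. The sketch below uses the frequencies from the sample table earlier in these notes (total 78); the variable names are illustrative.

```python
# Frequencies taken from the sample frequency distribution table.
frequencies = {"50-69": 2, "70-89": 33, "90-109": 35, "110-129": 7, "130-149": 1}

total = sum(frequencies.values())
# Relative frequency = class frequency / sum of all frequencies.
relative = {label: count / total for label, count in frequencies.items()}

for label, proportion in relative.items():
    print(f"{label}: {proportion:.3f} ({proportion:.1%})")
```

The relative frequencies should sum to 1 (or 100%), apart from small rounding differences when they are displayed.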
Cumulative Frequency Distribution
A cumulative frequency distribution shows the sum of frequencies for a class and all previous classes.
Useful for determining how many data values fall below a certain threshold.
Helps in identifying percentiles and medians.
Example: If the cumulative frequency for "less than 110" is 70, then 70 data values are less than 110.
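A running total is all that is needed to build a cumulative frequency distribution. The sketch below uses the frequencies from the sample table earlier in these notes and Python's `itertools.accumulate`; the "less than" boundaries are each class's next lower class limit, which assumes whole-number data.

```python
from itertools import accumulate

# Frequencies from the sample table, with "less than" boundaries for each class.
upper_boundaries = [70, 90, 110, 130, 150]
frequencies = [2, 33, 35, 7, 1]

# Each entry is the sum of this class's frequency and all previous ones.
cumulative = list(accumulate(frequencies))

for boundary, running_total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {running_total}")
```

The last cumulative frequency always equals the total number of data values, which is a quick check on the arithmetic.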
Shapes of Distributions
Normal and Skewed Distributions
Understanding the shape of a distribution is crucial for interpreting data.
Normal distribution: Symmetrical, bell-shaped curve; mean, median, and mode are equal.
Skewed distribution: Asymmetrical; can be skewed to the right (positively skewed, longer right tail) or to the left (negatively skewed, longer left tail).
Example: Annual incomes are often right-skewed; human life spans may be left-skewed.
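One informal way to check for skew, not stated in these notes, is to compare the mean with the median: a long tail tends to pull the mean toward it. The sketch below applies that rule of thumb to hypothetical incomes (in thousands). It is only a hint, since the rule can fail for some datasets, and the function name `skew_hint` is invented for the example.

```python
from statistics import mean, median


def skew_hint(values):
    """Rule-of-thumb skew check based on mean versus median (not a formal test)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed (mean above median)"
    if m < med:
        return "left-skewed (mean below median)"
    return "roughly symmetric (mean equals median)"


# Hypothetical annual incomes in thousands; one large value forms a right tail.
incomes = [30, 32, 35, 38, 40, 45, 250]

print(mean(incomes), median(incomes))
print(skew_hint(incomes))
```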
Correlation and Scatter Plots
Correlation
Correlation describes the relationship between two variables. If the values of one variable are associated with the values of another, a correlation exists.
Positive correlation: As one variable increases, the other tends to increase.
Negative correlation: As one variable increases, the other tends to decrease.
No correlation: No discernible pattern between the variables.
Important: Correlation does not imply causation.
Example: Hours spent studying and grades may be positively correlated, but the correlation alone does not show that studying causes higher grades; other factors could account for the association.
Scatter Plots
A scatter plot is a graph of paired data values, with one variable on each axis. The pattern of points can reveal the type and strength of correlation.
Linear correlation: Points approximate a straight line.
No correlation: Points are scattered randomly.
Example: A scatter plot of waist and arm circumferences may show a distinct straight-line pattern, indicating correlation. A scatter plot of weights and pulse rates may show no pattern, indicating no correlation.
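The strength of a linear pattern is often summarized with the Pearson correlation coefficient r, a measure these notes do not define but which is standard in introductory statistics. The sketch below computes it from scratch for hypothetical waist and arm measurements (in cm); values of r near +1 or -1 suggest a strong linear correlation, and values near 0 suggest none.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements, chosen only for illustration.
waist_cm = [70, 80, 90, 100, 110]
arm_cm = [25, 28, 30, 33, 36]

r = pearson_r(waist_cm, arm_cm)
print(f"r = {r:.3f}")
```

The function divides by zero if either variable is constant, so real data should be checked for variation first. A value of r close to 1 still says nothing about causation.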
Summary Table: Types of Frequency Distributions and Graphs
| Type | Description | Example |
|---|---|---|
| Frequency Distribution | Counts of data values in each class | Test scores grouped by ranges |
| Relative Frequency Distribution | Proportion or percentage in each class | Percentage of students in each score range |
| Cumulative Frequency Distribution | Sum of frequencies up to each class | Number of students scoring below a threshold |
| Histogram | Bar graph for quantitative data | Distribution of heights |
| Dotplot | Dots above a number line for each value | Pulse rates of individuals |
| Stem-and-leaf plot | Data split into stems and leaves | Sorted pulse rates |
| Pareto chart | Bar chart for categorical data, descending order | Causes of accidental deaths |
| Pie chart | Circular chart showing proportions | Distribution of accident causes |
| Scatter plot | Graph of paired data values | Waist vs. arm circumference |
Additional info: This table summarizes the main types of frequency distributions and graphs used in introductory statistics.