
Exploring Data with Tables and Graphs: Structured Study Notes

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Exploring Data with Tables and Graphs

Variables and Types of Data

Understanding the nature of variables and data is fundamental in statistics. Variables are characteristics that can vary among individuals or items, and data are the values these variables take.

  • Variables: Characteristics that vary from one person or thing to another.

  • Data: Values of variables; each individual value is called an observation.

  • Dataset: Collection of all observations for a particular variable.

  • Raw Data: Data collected in its original form.

  • Qualitative Variables: Non-numerical values (e.g., sex, eye color).

  • Quantitative Variables: Numerical values (e.g., weight, height).

  • Discrete Variables: Quantitative variables with countable values (e.g., number of siblings).

  • Continuous Variables: Quantitative variables that can take any value in an interval, with no gaps between possible values (e.g., weight).

  • Qualitative Data: Values of qualitative variables.

  • Quantitative Data: Values of quantitative variables.

  • Discrete Data: Values of discrete variables.

  • Continuous Data: Values of continuous variables.

Frequency Distributions and Tables

Frequency distributions organize data by showing how values are partitioned among categories or classes. This is a key method for summarizing and visualizing data.

  • Frequency Distribution (Frequency Table): Lists distinct values or groups and their frequencies.

  • Frequency: Number of data values in each group.

IQ Score | Frequency
50-69    | 2
70-89    | 33
90-109   | 35
110-129  | 7
130-149  | 1
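To show how a frequency table like this one is built from raw data, here is a minimal Python sketch. It assumes whole-number scores and equal-width classes. The helper `frequency_table` and the sample scores are hypothetical, for illustration only; they are not the data behind the IQ table.

```python
from collections import Counter

def frequency_table(values, first_lower_limit, class_width, num_classes):
    """Hypothetical helper: count whole-number values into equal-width classes.

    With first_lower_limit=50 and class_width=20 the classes are
    50-69, 70-89, 90-109, ... (lower limit to lower limit + width - 1).
    Values outside all classes are ignored.
    """
    counts = Counter()
    for v in values:
        index = (v - first_lower_limit) // class_width  # which class v falls in
        if 0 <= index < num_classes:
            counts[index] += 1
    table = []
    for i in range(num_classes):
        lower = first_lower_limit + i * class_width
        upper = lower + class_width - 1
        table.append((f"{lower}-{upper}", counts[i]))
    return table

# Hypothetical IQ scores, for illustration only.
scores = [55, 72, 88, 91, 95, 104, 112, 131]
for label, freq in frequency_table(scores, 50, 20, 5):
    print(f"{label:<10}{freq}")
```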

Class Limits, Boundaries, and Midpoints

Classes in frequency tables are defined by their limits, boundaries, and midpoints.

  • Lower Class Limits: Smallest numbers that can belong to a class.

  • Upper Class Limits: Largest numbers that can belong to a class.

  • Class Boundaries: Numbers used to separate classes without the gaps created by class limits.

  • Class Midpoints: Average of lower and upper class limits.

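Example: For the IQ score classes, the class boundaries 49.5, 69.5, 89.5, 109.5, 129.5, and 149.5 fill the gaps between class limits (for example, 69.5 lies halfway between 69 and 70). Each class midpoint is

$$\text{class midpoint} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$

so the midpoint of the 50-69 class is (50 + 69)/2 = 59.5.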

[Figure: Class limits and boundaries for IQ scores]
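The boundary and midpoint rules can be checked with a short Python sketch. It assumes every pair of neighboring classes has the same gap (here 70 - 69 = 1), so each boundary sits half a gap beyond an upper limit. The function names are hypothetical.

```python
def class_boundaries(lower_limits, upper_limits):
    """Hypothetical helper: boundaries lie halfway between one class's upper
    limit and the next class's lower limit (assumes equal gaps)."""
    gap = lower_limits[1] - upper_limits[0]  # e.g. 70 - 69 = 1
    bounds = [lower_limits[0] - gap / 2]     # boundary below the first class
    for upper in upper_limits:
        bounds.append(upper + gap / 2)
    return bounds

def class_midpoints(lower_limits, upper_limits):
    """Midpoint = (lower class limit + upper class limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

lower = [50, 70, 90, 110, 130]
upper = [69, 89, 109, 129, 149]
print(class_boundaries(lower, upper))  # [49.5, 69.5, 89.5, 109.5, 129.5, 149.5]
print(class_midpoints(lower, upper))   # [59.5, 79.5, 99.5, 119.5, 139.5]
```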

Relative Frequency Distributions

Relative frequency distributions show the proportion of observations in each class, which makes it easier to compare datasets of different sizes.

  • Relative Frequency: Ratio of frequency to total number of observations.

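Formula:

$$\text{relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$$

For the IQ scores, the frequencies sum to 78, so the 50-69 class has relative frequency 2/78 ≈ 0.03.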

IQ Score | Frequency | Relative Frequency
50-69    | 2         | 0.03
70-89    | 33        | 0.42
90-109   | 35        | 0.45
110-129  | 7         | 0.09
130-149  | 1         | 0.01
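The relative frequencies in the table can be reproduced by dividing each frequency by the total. This minimal Python sketch uses only the frequencies listed above.

```python
iq_classes = ["50-69", "70-89", "90-109", "110-129", "130-149"]
frequencies = [2, 33, 35, 7, 1]

total = sum(frequencies)                    # 78 observations in all
relative = [f / total for f in frequencies]  # class frequency / total

for label, rf in zip(iq_classes, relative):
    print(f"{label:<10}{rf:.2f}")
```

Because every observation lands in exactly one class, the unrounded relative frequencies always sum to 1.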

Cumulative Frequency Tables

Cumulative frequency tables show a running total: the cumulative frequency of a class is its own frequency plus the frequencies of all earlier classes, so the last cumulative frequency equals the total number of observations.

Score | Frequency | Cumulative Frequency
1     | 2         | 2
2     | 5         | 7
3     | 4         | 11
4     | 2         | 13
5     | 1         | 14
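A running total is exactly what Python's `itertools.accumulate` computes, so the cumulative column above can be sketched as:

```python
from itertools import accumulate

scores = [1, 2, 3, 4, 5]
frequencies = [2, 5, 4, 2, 1]

# Each cumulative frequency = this class's frequency + all earlier frequencies.
cumulative = list(accumulate(frequencies))

for score, freq, cum in zip(scores, frequencies, cumulative):
    print(score, freq, cum)
```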

Graphs for Quantitative Data

Histograms

Histograms are graphs of quantitative data in which adjacent bars of equal width represent classes and the bar heights represent class frequencies. They display the shape, center, spread, and outliers of a dataset.

  • Height of bar: Frequency

  • Width of bar: Class width

[Figure: Histogram of IQ scores]
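Without a plotting library, the idea that bar size equals class frequency can be sketched as a sideways text histogram of the IQ table. The helper `text_histogram` is hypothetical; classes run down the page and bar length plays the role of bar height.

```python
def text_histogram(labels, frequencies, symbol="#"):
    """Hypothetical helper: one text bar per class, bar length = frequency."""
    width = max(len(label) for label in labels)  # right-align class labels
    return [f"{label:>{width}} | {symbol * freq}" for label, freq in zip(labels, frequencies)]

labels = ["50-69", "70-89", "90-109", "110-129", "130-149"]
for line in text_histogram(labels, [2, 33, 35, 7, 1]):
    print(line)
```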

Distribution Shapes

  • Unimodal: One peak

  • Bimodal: Two peaks

  • Multimodal: Multiple peaks

  • Symmetric: The left half is roughly a mirror image of the right half

  • Skewed: Longer tail on one side

[Figures: bimodal distribution histogram; normal distribution histogram; uniform vs. normal distribution; histogram skewed to the right; histogram skewed to the left; histogram of McDonald's lunch service times]

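  • Normal Distribution: Symmetric and roughly bell-shaped.

  • Skewed Distribution: Skewed left means the longer tail extends to the left; skewed right means the longer tail extends to the right.

  • Uniform Distribution: All classes have roughly equal frequencies, so the histogram looks flat.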
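Counting peaks in a frequency list gives a rough first check of whether a histogram is unimodal or bimodal. The sketch below is a hypothetical heuristic, not a standard test: it counts classes whose frequency is strictly higher than both neighbors, so small bumps or flat-topped peaks can mislead it, and the shape should still be judged by eye.

```python
def count_peaks(frequencies):
    """Hypothetical heuristic: count classes strictly higher than each neighbor."""
    peaks = 0
    for i, f in enumerate(frequencies):
        left = frequencies[i - 1] if i > 0 else float("-inf")
        right = frequencies[i + 1] if i < len(frequencies) - 1 else float("-inf")
        if f > left and f > right:
            peaks += 1
    return peaks

print(count_peaks([2, 33, 35, 7, 1]))   # 1 -> unimodal (IQ frequencies)
print(count_peaks([3, 9, 4, 2, 8, 3]))  # 2 -> bimodal (hypothetical frequencies)
```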

Dotplots

Dotplots display each data value as a dot above a horizontal scale, useful for small datasets and visualizing distribution shape.

[Figure: Dotplot of pulse rates of males]
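A dotplot is essentially one mark per observation, stacked at each value. This minimal Python sketch prints a sideways text version (values down the page, one `*` per occurrence); the helper name and the pulse rates are hypothetical.

```python
from collections import Counter

def text_dotplot(values):
    """Hypothetical helper: one row per distinct value, one '*' per occurrence."""
    counts = Counter(values)
    return [f"{value:>4} {'*' * counts[value]}" for value in sorted(counts)]

# Hypothetical pulse rates (beats per minute), for illustration only.
pulses = [60, 64, 64, 68, 68, 68, 72, 76, 76]
for row in text_dotplot(pulses):
    print(row)
```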

Stem-and-Leaf Plots

Stem-and-leaf plots separate each value into a stem (all but the last digit) and a leaf (the last digit), keeping the original data values while showing the shape of the distribution.

  • Stem: All digits except the rightmost one

  • Leaf: Rightmost digit
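Splitting a whole number into stem and leaf is a division by 10: the quotient is the stem and the remainder is the leaf. This minimal Python sketch assumes non-negative whole numbers; the helper name and sample values are hypothetical. Empty stems are kept so that gaps in the data remain visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Hypothetical helper for non-negative whole numbers (non-empty list)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 64 -> stem 6, leaf 4
        plot[stem].append(leaf)
    # Include every stem in the range, even ones with no leaves.
    stems = range(min(plot), max(plot) + 1)
    return [f"{s} | {''.join(str(leaf) for leaf in plot[s])}".rstrip() for s in stems]

# Hypothetical pulse rates, for illustration only.
for row in stem_and_leaf([52, 60, 64, 64, 68, 71, 72, 76, 90]):
    print(row)
```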

Time-Series Graphs

Time-series graphs plot quantitative data over time, with time on the x-axis and data values on the y-axis. They are useful for identifying trends and patterns over time.

Graphs for Categorical Data

Bar Graphs

Bar graphs use bars of equal width, usually separated by gaps, to represent the frequencies of categorical data, which makes categories easy to compare. Multiple bar graphs place bars for two or more datasets side by side.

[Figure: Multiple bar graph of median income by gender]

Pareto Charts

Pareto charts are bar graphs for categorical data, with bars arranged in descending order of frequency to highlight the most important categories.

[Figure: Pareto chart of stolen boats]
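The only computation behind a Pareto chart is sorting categories by descending frequency, which `collections.Counter.most_common` already does. The helper name and the boat categories below are hypothetical.

```python
from collections import Counter

def pareto_order(observations):
    """Hypothetical helper: (category, frequency) pairs, most frequent first."""
    return Counter(observations).most_common()

# Hypothetical categorical observations, for illustration only.
boats = ["motorboat", "sailboat", "motorboat", "canoe", "motorboat", "sailboat"]
print(pareto_order(boats))  # [('motorboat', 3), ('sailboat', 2), ('canoe', 1)]
```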

Pie Charts

Pie charts show categorical data as slices of a circle, with each slice's size proportional to the category's frequency, so high-percentage categories stand out. They work best with fewer than 10 categories.

[Figure: Pie chart of candy colors]
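Each slice's central angle is the category's relative frequency times 360 degrees. This minimal Python sketch uses hypothetical candy-color counts; the helper name is also hypothetical.

```python
def pie_slice_angles(frequencies):
    """Hypothetical helper: angle = (count / total) x 360 degrees."""
    total = sum(frequencies.values())
    # Multiply before dividing so these integer examples give exact results.
    return {category: count * 360 / total for category, count in frequencies.items()}

# Hypothetical candy-color counts, for illustration only.
counts = {"red": 10, "blue": 30, "green": 20}
print(pie_slice_angles(counts))  # {'red': 60.0, 'blue': 180.0, 'green': 120.0}
```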

Graphs That Enlighten and Graphs That Deceive

Misleading Graphs

Graphs should be fair and objective. Common ways graphs misrepresent data include:

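  • Nonzero Vertical Axis: The vertical axis does not start at zero, which exaggerates differences between bars.

  • Pictographs: Drawings of objects represent the data; scaling an object in two or three dimensions exaggerates differences because its area or volume grows faster than the data values.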

[Figures: bar graphs of the same data with different vertical-axis scales; a good representation compared with a pictograph]

Choose graphs that best represent the data, avoid distortion, and provide a fair representation of results.
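Both distortions come down to simple arithmetic. The sketch below uses hypothetical values (100 and 110, an axis starting at 95) and hypothetical helper names to compare the true ratio of two values with what a truncated axis or a scaled pictograph suggests.

```python
def apparent_ratio(a, b, axis_start):
    """Hypothetical helper: ratio of drawn bar heights when the axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

def pictograph_area_ratio(value_ratio):
    """Icon scaled by value_ratio in width and height: area grows by value_ratio squared."""
    return value_ratio ** 2

def pictograph_volume_ratio(value_ratio):
    """3-D icon scaled in all three dimensions: volume grows by value_ratio cubed."""
    return value_ratio ** 3

# Hypothetical values: 110 is only 10% more than 100.
print(110 / 100)                     # 1.1 (true ratio)
print(apparent_ratio(100, 110, 95))  # 3.0 (bars drawn from 95 look 3 times as different)
print(pictograph_area_ratio(2))      # 4   (twice the data, four times the area)
print(pictograph_volume_ratio(2))    # 8
```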

Scatterplots, Correlation, and Regression

Explanatory vs Response Variables

In studies, the explanatory variable (x) is used to explain or predict the response variable (y). Scatterplots visualize the relationship between two quantitative variables.

  • Explanatory Variable (x): Independent variable

  • Response Variable (y): Dependent variable

[Figure: Scatterplot of arm circumference vs. waist circumference]

Correlation

A correlation exists between two variables when values of one variable are associated with values of the other. A linear correlation exists when the plotted points approximate a straight-line pattern.

  • Positive Correlation: As x increases, y increases.

  • Negative Correlation: As x increases, y decreases.

  • No Correlation: No association between x and y.

[Figures: correlation coefficient scale; scatterplots showing positive, negative, no, and nonlinear correlation]

Linear Correlation Coefficient (r): Measures strength and direction of linear association.

  • r = 1: Perfect positive linear correlation

  • r = -1: Perfect negative linear correlation

  • r = 0: No linear correlation

Properties: Both variables must be quantitative; r always lies between -1 and 1; r is strongly affected by outliers; and a correlation, however strong, does not imply causation.
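The value of r can be computed with the standard computational formula r = [nΣxy - (Σx)(Σy)] / [√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)]. The Python sketch below implements that formula; the function name and the small datasets are hypothetical (the GPA data behind the 0.88 result is not given here). It raises an error if either variable is constant, since r is then undefined.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: linear correlation coefficient r (computational formula)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator

x = [1, 2, 3, 4]
print(pearson_r(x, [3, 5, 7, 9]))  # approximately 1.0 (perfect positive)
print(pearson_r(x, [8, 6, 4, 2]))  # approximately -1.0 (perfect negative)
```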

Example: High School GPA vs College GPA

The scatterplot and the computed value of r show a strong positive correlation between high school GPA and college GPA. High school GPA is the explanatory variable (x), and college GPA is the response variable (y).

StatCrunch output: Correlation between High school GPA and College GPA is 0.88 (strong positive correlation).

*Additional info: Academic context and explanations have been expanded for clarity and completeness.*
