Statistics Unit 1: Foundations, Data, and Descriptive Analysis (Chapters 1–3)

Study Guide - Smart Notes

Introduction to Statistics

1.1 Statistical and Critical Thinking

Statistics is the science of collecting, analyzing, presenting, and interpreting data. Critical thinking is essential in statistics to identify flaws in data presentation and interpretation.

  • Key Point: Always question how data is collected and presented. For example, survey results may be misleading if the sample is biased or the graph is not scaled properly.

  • Example: A bar graph showing survey results about hotel satisfaction may be misleading if the sample size or response options are not clear.

Important Definitions

  • Data: Collections of observations, such as measurements, genders, or survey responses.

  • Statistics: The science of planning studies and experiments, obtaining data, and then organizing, summarizing, analyzing, interpreting, and drawing conclusions based on the data.

  • Population: The complete collection of all elements (scores, people, measurements, etc.) to be studied.

  • Sample: A subset of the population, selected for study.

  • Individual: A single member of the population.

Beware of Potential Pitfalls

  • Misleading Conclusions

  • Self-reported Results

  • Loaded Questions

  • Order of Questions

  • Nonresponse

  • Percents and Percentages

Types of Data

1.2 Types of Data

Data can be classified as either quantitative (numerical) or categorical (qualitative).

  • Parameter: A numerical value summarizing a population.

  • Statistic: A numerical value summarizing a sample.

  • Quantitative Variable: Takes numerical values (e.g., height, weight).

  • Categorical Variable: Describes attributes or categories (e.g., color, type).

Examples:

  • Distance you live from university: Quantitative

  • Color of your pants: Categorical

Distinguishing Data Types

  • Discrete Variable: Countable values (e.g., number of texts sent in a month).

  • Continuous Variable: Infinite possible values within a range (e.g., temperature).

  • Nominal Level: Categories only, no order (e.g., colors).

  • Ordinal Level: Categories with order (e.g., satisfaction ratings).

  • Interval Level: Numerical, differences are meaningful, no true zero (e.g., temperature in Celsius).

  • Ratio Level: Numerical, differences and ratios are meaningful, true zero exists (e.g., height).

Collecting Sample Data

1.3 Collecting Sample Data

Data can be collected through experiments or observational studies. Sampling methods affect the reliability of results.

  • Experiment: Researcher applies a treatment and observes effects.

  • Observational Study: Researcher observes and measures without intervention.

Biased Sampling Methods

  • Convenience Sample: Easily available subjects, often biased.

  • Volunteer Sample: Subjects volunteer, may not represent population.

Probability Sampling Methods

  • Simple Random Sample: Every possible sample of the same size has an equal chance of being selected.

  • Systematic Sample: After a random starting point, every kth member is selected (for example, every 10th).

  • Stratified Sample: Population divided into subgroups (strata), random samples taken from each.

  • Cluster Sample: Population divided into clusters, entire clusters are randomly selected.
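The four probability sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of 100 numbered members, the seed, the two strata, and the cluster size of 20 are all assumptions chosen for the example.

```python
import random

# Hypothetical population: members numbered 1..100 (illustrative only).
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simple random sample: every possible group of 10 is equally likely.
simple_random = rng.sample(population, 10)

# Systematic sample: random starting point, then every k-th member.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sample: split into strata, then sample randomly within each.
strata = {
    "first_half": population[:50],
    "second_half": population[50:],
}
stratified = [m for group in strata.values() for m in rng.sample(group, 5)]

# Cluster sample: split into clusters, then keep whole randomly chosen clusters.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [m for cluster in chosen_clusters for m in cluster]

print(simple_random, systematic, stratified, cluster_sample, sep="\n")
```

Note the contrast between the last two methods: the stratified sample takes some members from every subgroup, while the cluster sample takes every member from only some subgroups.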

Exploring Data with Tables and Graphs

2.1 Frequency Distributions

Frequency distributions organize data into classes or intervals and show the number of observations in each class.

  • Lower Class Limit: Smallest value in a class.

  • Upper Class Limit: Largest value in a class.

  • Class Boundaries: Values that separate classes.

  • Class Midpoint: $\frac{\text{lower class limit} + \text{upper class limit}}{2}$

  • Class Width: Difference between consecutive lower class limits.

  • Frequency: Number of observations in a class.

  • Relative Frequency: $\frac{\text{class frequency}}{\text{sum of all frequencies}}$

Class      Frequency      Relative Frequency
40–49      1              0.025
50–59      5              0.125
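As a rough sketch of how such a table is built, the code below groups values into classes of width 10 and computes each class midpoint, frequency, and relative frequency (class frequency divided by the total count). The scores are hypothetical and are not the data behind the table above.

```python
from collections import Counter

# Hypothetical scores (not from the notes), falling in classes 40-49 ... 70-79.
scores = [45, 52, 55, 58, 61, 63, 64, 67, 70, 78]
class_width = 10  # difference between consecutive lower class limits

# Map each score to the lower limit of its class, e.g. 55 -> 50.
frequencies = Counter((s // class_width) * class_width for s in scores)
n = len(scores)

table = []
for lower in sorted(frequencies):
    upper = lower + class_width - 1          # upper class limit
    midpoint = (lower + upper) / 2           # class midpoint
    freq = frequencies[lower]
    table.append((f"{lower}-{upper}", midpoint, freq, freq / n))

for row in table:
    print(row)
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped or counted twice.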

2.2 Histograms

A histogram is a bar graph representing the frequency distribution of a quantitative variable.

  • Key Components: Bars represent classes, height shows frequency.

  • Interpretation: Histograms help visualize the center, variation, distribution, and outliers in data.

Common Distribution Shapes: Uniform, bimodal, skewed, normal.

2.3 Graphs that Enlighten and Graphs that Deceive

Graphs can clarify or mislead. Proper graph selection and scaling are crucial.

  • Pie Charts: Show proportions of categorical data.

  • Bar Graphs: Compare frequencies of categorical data.

  • Non-Zero Axis: Can exaggerate differences.

  • Pictographs: Use images, may distort perception.
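To see why a non-zero axis can exaggerate differences, compare the ratio of the drawn bar heights with the ratio of the underlying values. The sketch below uses two hypothetical values (50 and 55) and a hypothetical axis starting at 45; `visual_ratio` is an illustrative helper, not a library function.

```python
def visual_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights when the vertical axis begins at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Hypothetical values: the second is 10% larger than the first.
true_ratio = visual_ratio(50, 55)                      # axis starts at 0
truncated_ratio = visual_ratio(50, 55, axis_start=45)  # axis starts at 45

print(f"axis from 0:  second bar is {true_ratio:.2f}x as tall")
print(f"axis from 45: second bar is {truncated_ratio:.2f}x as tall")
```

With the axis starting at 45, the second bar is drawn twice as tall as the first even though the value is only 10% larger.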

Color      Number      Percent
Red        23          18.1%
Yellow     25          19.7%
Blue       19          15.0%
Green      21          16.5%
Orange     20          15.7%
Brown      19          15.0%
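The percent column in the table above can be reproduced from the counts: each percent is the count divided by the total of all counts, times 100. A minimal check in Python, with variable names chosen for this sketch:

```python
# Counts taken from the color table above.
counts = {"Red": 23, "Yellow": 25, "Blue": 19, "Green": 21, "Orange": 20, "Brown": 19}

total = sum(counts.values())
percents = {color: round(100 * n / total, 1) for color, n in counts.items()}

for color, pct in percents.items():
    print(f"{color:<7} {counts[color]:>3} {pct:>5.1f}%")
```

These are the proportions a pie chart of the data would display, and the counts are the bar heights a bar graph would use.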

Describing, Exploring, and Comparing Data

3.1 Measures of Center

Measures of central tendency locate the center of a data set. The main measures are mean, median, mode, and midrange.

  • Mean: Arithmetic average. For a sample: $\bar{x} = \frac{\sum x}{n}$; for a population: $\mu = \frac{\sum x}{N}$

  • Median: Middle value when the data are ordered; with an even number of values, the mean of the two middle values.

  • Mode: Most frequently occurring value.

  • Midrange: $\frac{\text{maximum value} + \text{minimum value}}{2}$
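As a small illustration, the four measures of center can be computed with Python's standard `statistics` module. The data set is hypothetical; the midrange is computed by hand as the average of the maximum and minimum values, since the module has no function for it.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = statistics.mean(data)            # sum of values divided by n: 30 / 6
median = statistics.median(data)        # even n: average of the two middle values
mode = statistics.mode(data)            # most frequent value
midrange = (max(data) + min(data)) / 2  # (maximum + minimum) / 2

print(mean, median, mode, midrange)
```

Here the mean (5) sits above the median (4.0) because the single large value 10 pulls the mean upward, while the median depends only on the middle of the ordered data.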

3.2 Measures of Variation

Measures of variation describe the spread of data. The most common are range, variance, and standard deviation.

  • Range: Difference between maximum and minimum values.

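  • Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ (sample), $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ (population)

  • Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ (sample), $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ (population)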
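The sketch below works through the range, variance, and standard deviation step by step on a hypothetical data set, then checks the results against the standard `statistics` module. The only difference between the sample and population versions is the divisor: n - 1 for a sample, N for a population.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; the mean is 5
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
squared_devs = sum((x - mean) ** 2 for x in data)

data_range = max(data) - min(data)
sample_var = squared_devs / (n - 1)  # treat data as a sample
pop_var = squared_devs / n           # treat data as a whole population
sample_sd = math.sqrt(sample_var)
pop_sd = math.sqrt(pop_var)

# Cross-check against the standard library.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(pop_sd, statistics.pstdev(data))

print(data_range, sample_var, pop_var, sample_sd, pop_sd)
```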

Empirical Rule for Normal Distributions

The Empirical Rule describes data within a normal (bell-shaped) distribution:

  • Approximately 68% of the data fall within 1 standard deviation of the mean

  • Approximately 95% fall within 2 standard deviations of the mean

  • Approximately 99.7% fall within 3 standard deviations of the mean
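These percentages can be checked against an exact normal model. The sketch below uses `statistics.NormalDist` (Python 3.8+) with a hypothetical mean of 100 and standard deviation of 15; the percentages are the same for any normal distribution.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical mean and standard deviation
dist = NormalDist(mu=mu, sigma=sigma)

shares = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    # Proportion of the distribution between low and high.
    shares[k] = dist.cdf(high) - dist.cdf(low)
    print(f"within {k} SD: {low} to {high} -> {shares[k]:.1%}")
```

The computed proportions are roughly 68.3%, 95.4%, and 99.7%, which the Empirical Rule rounds to 68%, 95%, and 99.7%.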

3.3 Measures of Relative Standing and Boxplots

Measures of position describe the relative location of a data value within a data set.

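  • Z-Score: $z = \frac{x - \bar{x}}{s}$ (sample) or $z = \frac{x - \mu}{\sigma}$ (population); the number of standard deviations that a value x lies above or below the mean.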

  • Percentiles: Divide the ordered data into 100 groups with about 1% of the values in each; $P_k$ is the kth percentile.

  • Quartiles: Divide data into four equal parts; Q1 (25th percentile), Q2 (median), Q3 (75th percentile).

  • Interquartile Range (IQR): $IQR = Q_3 - Q_1$
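A short sketch of these measures on a hypothetical sample follows. The z-score is computed as (x - mean) / s and the IQR as Q3 - Q1. Textbooks and software use slightly different conventions for locating quartiles, so the values from `statistics.quantiles` (default "exclusive" method) may differ from a hand calculation on other data sets.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical sample, already ordered

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation

# z-score of the value 13: how many standard deviations above the mean it lies.
z = (13 - mean) / s

# Quartiles are the cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"z = {z:.2f}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```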

5-Number Summary

  • Minimum

  • Q1 (first quartile)

  • Median (Q2)

  • Q3 (third quartile)

  • Maximum

Boxplot: A graphical representation of the 5-number summary, showing spread and outliers.
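The five values a boxplot is drawn from can be collected with a small helper. `five_number_summary` is a hypothetical name for this sketch, and the quartiles depend on the convention used by `statistics.quantiles`, so other tools may report slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return (ordered[0], q1, median, q3, ordered[-1])

# Hypothetical, unordered data.
summary = five_number_summary([13, 1, 9, 5, 11, 3, 7])
print(summary)
```

In a boxplot, the box spans Q1 to Q3, a line inside the box marks the median, and the whiskers extend toward the minimum and maximum.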

Descriptive vs. Inferential Statistics

  • Descriptive Statistics: Methods for organizing and summarizing data.

  • Inferential Statistics: Methods for making predictions or inferences about a population based on sample data.

Additional info: These notes cover the foundational concepts in statistics, including data types, sampling methods, graphical representation, and measures of center and variation, as outlined in Chapters 1–3 of a college statistics course.
