Statistics Exam 1 Study Guide: Data Collection, Summarization, and Relationships

Chapter One: Data Collection

Introduction to Data Collection

Statistics involves gathering, organizing, and summarizing information to understand variables and their relationships. Data collection is the foundational step.

  • Data: Observed values that describe characteristics of an individual or object.

  • Variable: Any property or characteristic that can vary among individuals.

  • Qualitative Variable: Classifies individuals based on an attribute or characteristic (e.g., gender, color).

  • Quantitative Variable: Provides numerical measures of individuals (e.g., height, weight).

  • Discrete Variable: Quantitative variable with countable values (e.g., number of children).

  • Continuous Variable: Quantitative variable with infinite possible values within a range (e.g., height, time).

Types of Studies

  • Designed Experiment: Individuals are assigned to groups, and the effect of an explanatory variable is measured on a response variable.

  • Observational Study: Observational units are observed without intervention; only associations can be determined.

Sampling Methods

  • Convenience Sampling: Observational units are selected based on ease of access, not randomness.

  • Random Sampling: Every member of the population has an equal chance of being selected, reducing bias.
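To make the contrast between the two sampling methods concrete, here is a minimal sketch. The population of 100 student ID numbers and the sample size of 10 are made up for illustration.

```python
import random

# Hypothetical population: ID numbers for 100 students.
population = list(range(1, 101))

# Convenience sample: take whoever is easiest to reach,
# here simply the first 10 students on the list.
convenience_sample = population[:10]

# Simple random sample: every student has the same chance of selection.
rng = random.Random(42)  # fixed seed so the example is repeatable
random_sample = rng.sample(population, k=10)

print("Convenience:", convenience_sample)
print("Random:     ", sorted(random_sample))
```

The convenience sample can only ever contain IDs 1 through 10, so it may misrepresent the population. The random sample can draw from the whole list.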

Confounding and Lurking Variables

  • Confounding: Occurs when the effects of two or more explanatory variables are mixed, making it difficult to determine causality.

  • Lurking Variable: An unmeasured variable that influences both the explanatory and response variables.

Chapter Two: Organizing and Summarizing Data

Data Summarization

Organizing data helps reveal patterns and relationships. Summaries can be graphical or numerical.

  • Frequency Distribution: Lists categories or values and their frequencies.

  • Relative Frequency: Proportion of observations in each category.

  • Bar Graphs and Pie Charts: Used for qualitative data.

  • Histograms: Used for quantitative data, showing the distribution of values.
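A frequency distribution and its relative frequencies can be built in a few lines. The sketch below uses a made-up list of eye colors. Relative frequency is taken to be each category's count divided by the total number of observations.

```python
from collections import Counter

# Hypothetical qualitative data: eye color of 10 people.
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

# Frequency distribution: count of observations in each category.
frequencies = Counter(eye_colors)

# Relative frequency: category count divided by total observations.
n = len(eye_colors)
relative_frequencies = {category: count / n
                        for category, count in frequencies.items()}

for category, count in frequencies.most_common():
    print(f"{category:<6} {count:>2}  {relative_frequencies[category]:.2f}")
```

The relative frequencies always sum to 1, which is a quick check on a hand-built table.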

Types of Data Summaries

Type | Qualitative Data Summaries | Quantitative Data Summaries
Graphical | Bar, Pie | Histogram, Dotplot, Boxplot
Numerical | Frequencies | Mean, Median, Mode, Range, Variance, Standard Deviation

Chapter Three: Numerically Summarizing Data

Measures of Center

  • Mean: Arithmetic average of data values.

  • Median: Middle value when data are ordered.

  • Mode: Most frequently occurring value.
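The three measures of center can be compared on a small example. The scores below are invented. The second list swaps in one extreme value to show which measures react to it.

```python
import statistics

# Hypothetical exam scores.
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # sum of values divided by the count
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

# Replace the top score with an extreme value to see which measures move.
scores_with_outlier = [70, 85, 85, 90, 1000]
mean_with_outlier = statistics.mean(scores_with_outlier)
median_with_outlier = statistics.median(scores_with_outlier)

print(mean, median, mode)
print(mean_with_outlier, median_with_outlier)
```

The mean jumps from 86 to 266 while the median stays at 85. This is why the median is usually preferred for skewed data or data with outliers.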

Measures of Spread

  • Range: Difference between the largest and smallest values.

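  • Standard Deviation: Roughly the typical distance of data points from the mean; computed as the square root of the variance.

  • Variance: Average of the squared deviations from the mean; equal to the square of the standard deviation.

  • Interquartile Range (IQR): Difference between the third and first quartiles ($\text{IQR} = Q_3 - Q_1$).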
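As a sketch of how the range, variance, and standard deviation relate, the example below uses a small invented data set. It assumes the sample versions of variance and standard deviation (dividing by n − 1), which is what `statistics.variance` and `statistics.stdev` compute. The population versions are `statistics.pvariance` and `statistics.pstdev`.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sample_variance = statistics.variance(data)

# Sample standard deviation: square root of the sample variance.
sample_sd = statistics.stdev(data)

print(f"range = {data_range}")
print(f"variance = {sample_variance:.3f}")
print(f"standard deviation = {sample_sd:.3f}")
```

Squaring the standard deviation gives back the variance. The standard deviation is in the same units as the data, while the variance is in squared units.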

Distribution Shapes

  • Uniform: All values are equally likely.

  • Right-Skewed: Tail extends to the right.

  • Left-Skewed: Tail extends to the left.

  • Symmetric: Both sides are mirror images.

  • Bimodal: Two peaks in the distribution.

Chapter Four: Describing the Relation Between Two Variables

Quantiles and Percentiles

  • Percentile: Value below which a given percentage of observations fall.

  • Quartiles: Divide data into four equal parts: $Q_1$ (25th percentile), $Q_2$ (median), $Q_3$ (75th percentile).
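Quartiles and percentiles can be checked numerically. Textbooks and software use slightly different conventions for locating quartiles, so hand calculations may differ a little from software output. This sketch assumes the "inclusive" method of Python's `statistics.quantiles` and uses invented scores.

```python
import statistics

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")


def percent_below(values, x):
    """Percentage of observations strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)


print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
print(f"{percent_below(scores, 80):.1f}% of scores fall below 80")
```

The second quartile returned here matches `statistics.median(scores)`, since the median is the 50th percentile.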

Boxplots and Outliers

  • Boxplot: Graphical summary using quartiles and median.

  • Interquartile Range (IQR): Used to identify outliers. Outliers are values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$.
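The outlier check can be sketched as follows, assuming the common 1.5 × IQR rule for the lower and upper fences. The data are invented. The quartiles use the "inclusive" method of `statistics.quantiles`, which is one convention among several.

```python
import statistics

# Hypothetical data with one suspiciously large value.
data = [12, 15, 16, 18, 19, 20, 22, 23, 60]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: values outside [lower_fence, upper_fence] are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
print(f"outliers = {outliers}")
```

Here the fences are 7 and 31, so only the value 60 is flagged. On a boxplot it would be drawn as a separate point beyond the whisker.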

Extreme Values

  • Outlier: Observation much higher or lower than the rest.

Chapter Five: Describing Relationships Between Variables

Univariate and Bivariate Data

  • Univariate Data: Summarizes one variable at a time.

  • Bivariate Data: Examines the relationship between two variables.

Comparing Groups

  • Side-by-Side Boxplots: Used to compare distributions of a quantitative variable across groups defined by a categorical variable.
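Side-by-side boxplots are built from a summary of each group. A minimal numerical version of that comparison is sketched below, with invented group labels and values. It reports the minimum, median, and maximum of the quantitative variable for each category.

```python
import statistics
from collections import defaultdict

# Hypothetical (group, value) records: a categorical and a quantitative variable.
records = [("A", 72), ("A", 85), ("A", 90),
           ("B", 60), ("B", 65), ("B", 88), ("B", 70)]

# Collect the quantitative values under each category.
groups = defaultdict(list)
for group, value in records:
    groups[group].append(value)

# Summarize each group with (minimum, median, maximum).
summary = {group: (min(values), statistics.median(values), max(values))
           for group, values in groups.items()}

for group, (low, mid, high) in sorted(summary.items()):
    print(f"group {group}: min = {low}, median = {mid}, max = {high}")
```

Comparing the medians and the spread of each group is the same reading one would make from the boxes and whiskers of the plot.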

Scatterplots and Correlation

  • Scatterplot: Graphical tool to determine the relationship between two quantitative variables.

  • Positive Linear Relationship: As one variable increases, so does the other.

  • Negative Linear Relationship: As one variable increases, the other decreases.

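  • Correlation Coefficient ($r$): Measures the strength and direction of a linear relationship between two quantitative variables. $-1 \le r \le 1$; values near $\pm 1$ indicate a strong linear relationship, and values near 0 indicate a weak one.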
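The correlation coefficient can be computed directly from its definition. The sketch below uses the standard Pearson formula and invented data for study hours and exam scores. The function name `pearson_r` is only illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

For these data r is close to +1, indicating a strong positive linear relationship. A perfectly decreasing line, such as x = [1, 2, 3] with y = [6, 4, 2], gives r = −1.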

Additional Notes: Calculations and General Concepts

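  • Relative Frequency: $\text{relative frequency} = \dfrac{\text{frequency}}{\text{sum of all frequencies}}$

  • Standard Deviation: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$ (sample standard deviation)

  • Variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ (sample variance)

  • Interquartile Range: $\text{IQR} = Q_3 - Q_1$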

  • Comparing Groups: Focus on comparing relative frequencies and distributions across groups.

  • Association vs. Causation: An association between two variables does not, by itself, imply causation.

Additional info: These notes cover the first five chapters of a typical introductory statistics course, focusing on data collection, organization, summarization, and the basics of describing relationships between variables.
