Foundations of Statistics: Data, Sampling, and Categorical Data Analysis

Study Guide - Smart Notes

Statistics: Introduction and Branches

Definition and Scope

Statistics is the science of collecting, organizing, summarizing, analyzing, and drawing conclusions from data. It is fundamental to understanding patterns and making informed decisions based on data.

  • Descriptive Statistics: Focuses on summarizing and presenting data in a meaningful way.

  • Inferential Statistics: Uses probability theory to make generalizations about a population based on a sample.

Descriptive Statistics

Descriptive statistics help us make sense of large datasets by summarizing them with measures such as the mean, median, and mode, and by visualizing data trends.

  • Example: Calculating the mean weight of fish over five years and plotting the results to observe trends.

  • Key Point: Visualizing data can reveal patterns and anomalies, such as a drop in mean weight in a specific year.
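
The fish-weight example can be sketched in a few lines of Python. The weights below are made up for illustration (they are not from any real dataset), and the standard-library statistics module stands in for whatever tool a course might actually use.

```python
from statistics import mean, median, mode

# Hypothetical fish weights in kilograms, keyed by year (illustrative data only).
weights_by_year = {
    2019: [2.1, 2.4, 2.4, 2.6, 2.5],
    2020: [2.2, 2.3, 2.5, 2.5, 2.5],
    2021: [2.0, 2.4, 2.4, 2.7, 2.5],
    2022: [2.2, 2.3, 2.4, 2.6, 2.4],
    2023: [1.6, 1.8, 1.8, 2.0, 1.8],
}

# Summarize each year with three measures of center.
summary = {
    year: {"mean": mean(w), "median": median(w), "mode": mode(w)}
    for year, w in weights_by_year.items()
}

# The year with the lowest mean weight stands out as a possible anomaly.
lowest_year = min(summary, key=lambda year: summary[year]["mean"])

for year, stats in summary.items():
    print(year, round(stats["mean"], 2), stats["median"], stats["mode"])
print("Lowest mean weight:", lowest_year)
```

With this invented data, 2023 has the lowest mean, which is the kind of drop a plot of yearly means would make visible at a glance.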

Inferential Statistics

Inferential statistics involve making predictions or inferences about a population based on a sample. This branch relies on probability theory to estimate population parameters.

  • Example: Using a sample of 1000 fish to infer whether the mean weight of the entire population dropped in 2023.

  • Key Point: The reliability of inferences depends on the sampling method and sample size.
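
One common way to quantify how sample size affects reliability is the estimated standard error of the sample mean, s / sqrt(n). The sketch below is an illustration under stated assumptions: the population is simulated from a normal distribution with made-up parameters, and the helper name estimate_mean is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical population of 100,000 fish weights in kg (simulated, not real data).
population = [rng.gauss(2.4, 0.5) for _ in range(100_000)]

def estimate_mean(sample):
    """Return the sample mean and its estimated standard error, s / sqrt(n)."""
    return mean(sample), stdev(sample) / sqrt(len(sample))

# Two simple random samples of very different sizes.
small_mean, small_se = estimate_mean(rng.sample(population, 25))
large_mean, large_se = estimate_mean(rng.sample(population, 1000))

print(f"n=25:   mean={small_mean:.3f}, standard error={small_se:.3f}")
print(f"n=1000: mean={large_mean:.3f}, standard error={large_se:.3f}")
```

The larger sample should give a noticeably smaller standard error, meaning its mean is expected to sit closer to the population mean.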

Populations, Samples, and Census

Definitions

  • Population: The entire group of subjects under study.

  • Sample: A subset of the population selected for analysis.

  • Sample Size: The number of subjects in the sample.

  • Census: A study that collects data from every member of the population, so that the sample is the entire population.

Sampling Techniques

Representativeness and Random Sampling

Sampling methods are crucial for ensuring that the sample accurately reflects the population. A representative sample mirrors the population's characteristics.

  • Key Point: The fraction of the sample with a certain property should be close to the fraction in the population. An exact match is not expected, because samples vary by chance.

  • Example: If 10% of the population weighs more than 200 pounds, roughly 10% of a representative sample should also have this property.

Simple Random Sampling

Simple random sampling gives every member of the population an equal chance of being selected. More precisely, every possible sample of a given size is equally likely to be chosen.

  • Example: Assigning numbers to whales and using a random number generator to select a sample.

  • Key Point: Simple random sampling is the foundation for most statistical inference.
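
The whale example could look like the following minimal sketch. The population of 500 numbered whales and the function name simple_random_sample are assumptions made for illustration; random.sample from the Python standard library draws without replacement.

```python
import random

# Hypothetical population: 500 whales identified by the numbers 1..500.
whale_ids = list(range(1, 501))

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)

sample = simple_random_sample(whale_ids, n=20, seed=42)
print(sorted(sample))
```

Passing a seed is only a convenience for reproducing the same draw; leaving it as None gives a fresh random sample on each run.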

Other Sampling Techniques

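  • Systematic Sampling: Selects every kth member from an ordered population, usually beginning at a randomly chosen starting point.

  • Stratified Sampling: Divides the population into strata (groups whose members share a characteristic) and draws a random sample from each stratum.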

  • Cluster Sampling: Divides the population into clusters, randomly selects clusters, and includes all members from selected clusters.

  • Convenience Sampling: Uses easily accessible subjects, but may not be representative.
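
The first three techniques can be sketched as small functions. Everything here is illustrative: the function names, the 100-member population, and the "juvenile"/"adult" strata and "pod" clusters are assumptions, not part of the original notes.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then take every kth."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)
population = list(range(100))  # hypothetical subjects numbered 0..99
strata = {"juvenile": list(range(0, 40)), "adult": list(range(40, 100))}
clusters = {f"pod_{i}": list(range(i * 10, (i + 1) * 10)) for i in range(10)}

sys_sample = systematic_sample(population, k=10, rng=rng)
strat_sample = stratified_sample(strata, fraction=0.1, rng=rng)
clus_sample = cluster_sample(clusters, n_clusters=2, rng=rng)
print(len(sys_sample), len(strat_sample), len(clus_sample))
```

Note the difference in what is randomized: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.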

Variables in Statistics

Types of Variables

A variable is a characteristic of the members of a population that can take different values from one member to another.

  • Categorical Variables: Allowable values are distinct categories (e.g., car models, ice cream flavors, letter grades).

  • Quantitative Variables: Allowable values are numbers that represent counts or measurements, so arithmetic on them is meaningful.

Categorical Variables

  • Nominal: No natural order (e.g., flavors, countries).

  • Ordinal: Natural order exists (e.g., letter grades).

Quantitative Variables

  • Discrete: Values are countable and have gaps (e.g., number of children).

  • Continuous: Values form a continuum without gaps (e.g., height, weight).

Observational vs. Experimental Studies

Study Designs

Statistical studies can be observational or experimental, depending on whether researchers influence the variables.

  • Observational Study: Researchers observe and record data without influencing variables.

  • Experimental Study: Researchers assign treatments to groups and compare responses.

Displaying and Describing Categorical Data

Frequency and Relative Frequency

Counting occurrences of each category and summarizing them in tables is a fundamental step in analyzing categorical data.

  • Frequency Distribution Table: Shows counts for each category.

  • Relative Frequency: Shows the proportion of each category relative to the total.

Formula: Relative frequency = (frequency of the category) / (total number of observations)
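
Relative frequency is each category's count divided by the total number of observations. A minimal sketch of building both tables, assuming a made-up list of ice cream flavor choices and using collections.Counter from the standard library:

```python
from collections import Counter

# Hypothetical categorical data: ice cream flavor chosen by 10 people.
flavors = ["vanilla", "chocolate", "vanilla", "mint", "chocolate",
           "vanilla", "mint", "vanilla", "chocolate", "vanilla"]

frequency = Counter(flavors)       # count per category
total = sum(frequency.values())    # total number of observations
relative_frequency = {c: n / total for c, n in frequency.items()}

# Print one row per category: name, frequency, relative frequency.
for category, count in frequency.most_common():
    print(f"{category:<10} {count:>3} {relative_frequency[category]:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no category was missed.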

Visual Representation: Bar Charts

Bar charts are used to visually represent the distribution of categorical data. The height of each bar corresponds to the frequency or relative frequency of each category.

Contingency Tables

Analyzing Relationships Between Categorical Variables

Contingency tables display the joint frequency distribution of two categorical variables: each cell counts the subjects that fall into one particular combination of categories. This allows for the analysis of relationships and conditional distributions.

  • Marginal Distribution: Distribution of individual variables, found in the margins of the table.

  • Conditional Distribution: Distribution of one variable given a specific value of another variable.

Example Table:

Gender   Starbucks   Tim Hortons   Other   Total
M               35            10       5      50
F               40            15       5      60
Total           75            25      10     110

Practice Questions:

  • What percent of people in our sample are men who prefer Tim Hortons?

  • Of the people who prefer Tim Hortons, what percent are men?

  • What percent of men prefer Tim Hortons?

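Formula for Percent: Percent = (count in the cell of interest) / (total of the group being described) × 100%. The group being described is the whole sample for a joint percentage, or a single row or column total for a conditional percentage.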
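
The three practice questions differ only in which total goes in the denominator. The sketch below works them through with the counts from the example table; the helper name percent and the variable names are hypothetical.

```python
# Counts from the example table: rows are gender, columns are preferred coffee shop.
table = {
    "M": {"Starbucks": 35, "Tim Hortons": 10, "Other": 5},
    "F": {"Starbucks": 40, "Tim Hortons": 15, "Other": 5},
}

def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole

# Marginal totals, which appear in the margins of the table.
row_totals = {g: sum(cols.values()) for g, cols in table.items()}
col_totals = {shop: sum(table[g][shop] for g in table) for shop in table["M"]}
grand_total = sum(row_totals.values())

# Joint: men who prefer Tim Hortons, out of everyone sampled.
joint = percent(table["M"]["Tim Hortons"], grand_total)
# Conditional on shop: men, out of those who prefer Tim Hortons.
men_given_tims = percent(table["M"]["Tim Hortons"], col_totals["Tim Hortons"])
# Conditional on gender: those who prefer Tim Hortons, out of all men.
tims_given_men = percent(table["M"]["Tim Hortons"], row_totals["M"])

print(f"{joint:.1f}% {men_given_tims:.1f}% {tims_given_men:.1f}%")
# prints: 9.1% 40.0% 20.0%
```

The same cell count (10) yields three different percentages because the denominators are 110, 25, and 50 respectively.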
