Foundations of Statistics: Data, Sampling, and Categorical Data Analysis
Study Guide - Smart Notes
Statistics: Introduction and Branches
Definition and Scope
Statistics is the science of collecting, organizing, summarizing, analyzing, and drawing conclusions from data. It provides the tools for finding patterns and making informed decisions, and it has two main branches:
Descriptive Statistics: Focuses on summarizing and presenting data in a meaningful way.
Inferential Statistics: Uses probability theory to make generalizations about a population based on a sample.

Descriptive Statistics
Descriptive statistics help us make sense of large datasets by summarizing them with measures such as the mean, median, and mode, and by visualizing data trends.
Example: Calculating the mean weight of fish for each of five years and plotting the yearly means to observe trends.
Key Point: Visualizing data can reveal patterns and anomalies, such as a drop in mean weight in a specific year.
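A minimal sketch of the fish example, assuming made-up weights: the numbers in `weights_by_year` are invented for illustration and do not come from any real dataset. It computes the mean, median, and mode for each year and then finds the year with the lowest mean.

```python
from statistics import mean, median, mode

# Hypothetical fish weights in kilograms, invented for illustration only.
weights_by_year = {
    2019: [2.1, 2.4, 2.4, 2.6, 2.5],
    2020: [2.2, 2.3, 2.5, 2.5, 2.5],
    2021: [2.0, 2.4, 2.4, 2.7, 2.5],
    2022: [2.3, 2.4, 2.6, 2.4, 2.4],
    2023: [1.6, 1.8, 1.8, 2.0, 1.8],
}

# One summary number per year makes the trend easy to scan.
yearly_means = {year: mean(w) for year, w in weights_by_year.items()}

for year, w in weights_by_year.items():
    print(year, "mean:", round(mean(w), 2), "median:", median(w), "mode:", mode(w))

# The year whose mean weight is smallest stands out as a possible anomaly.
lowest_year = min(yearly_means, key=yearly_means.get)
print("Lowest mean weight in:", lowest_year)
```

With these invented numbers the drop shows up in the last year, which is the kind of pattern a plot of the yearly means would make visible at a glance.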


Inferential Statistics
Inferential statistics involve making predictions or inferences about a population based on a sample. This branch relies on probability theory to estimate population parameters.
Example: Using a sample of 1000 fish to infer whether the mean weight of the entire population dropped in 2023.
Key Point: The reliability of inferences depends on the sampling method and sample size.
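The effect of sample size can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is simulated (normally distributed weights with an arbitrary mean and spread), and the helper `spread_of_sample_means` is a hypothetical name introduced here. It draws many simple random samples of a given size and measures how much the sample mean varies from one sample to the next.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 fish weights in grams, simulated for illustration.
population = [rng.gauss(500, 50) for _ in range(100_000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Standard deviation of the sample mean across repeated simple random samples."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return stdev(means)

spread_small = spread_of_sample_means(10)    # small samples
spread_large = spread_of_sample_means(1000)  # large samples

print("Spread of sample means, n=10:  ", round(spread_small, 2))
print("Spread of sample means, n=1000:", round(spread_large, 2))
```

The sample means from the larger samples cluster much more tightly around the population mean, which is why a larger random sample supports more reliable inferences.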
Populations, Samples, and Census
Definitions
Population: The entire group of subjects under study.
Sample: A subset of the population selected for analysis.
Sample Size: The number of subjects in the sample.
Census: A study that collects data from every member of the population, so the sample is the entire population.


Sampling Techniques
Representativeness and Random Sampling
The sampling method determines how well the sample reflects the population. A representative sample mirrors the population's characteristics.
Key Point: The fraction of the sample with a certain property should be close to the fraction in the population.
Example: If 10% of the population weighs more than 200 pounds, about 10% of the sample should also have this property.



Simple Random Sampling
Simple random sampling gives every member of the population an equal chance of being selected, and every possible sample of a given size an equal chance of being the one chosen.
Example: Assigning numbers to whales and using a random number generator to select a sample.
Key Point: Many standard methods of statistical inference assume the data come from a simple random sample.
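The whale example can be sketched as follows, assuming a hypothetical population of 500 numbered whales and a sample size of 20 (both figures are chosen only for illustration). Python's `random.sample` draws without replacement, so no whale is picked twice.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: whales labeled 1 through 500.
whale_ids = list(range(1, 501))

# Draw 20 distinct whales; each whale is equally likely to be chosen.
sample = rng.sample(whale_ids, k=20)

print(sorted(sample))
```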

Other Sampling Techniques
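Systematic Sampling: Selects every kth member from an ordered population, usually starting from a randomly chosen position.
Stratified Sampling: Divides the population into non-overlapping groups (strata) based on a shared characteristic and takes a random sample from each stratum.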
Cluster Sampling: Divides the population into clusters, randomly selects clusters, and includes all members from selected clusters.
Convenience Sampling: Uses easily accessible subjects, but may not be representative.
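The first three techniques can be compared side by side in a short sketch. Everything here is an assumption made for illustration: the population of 120 subjects, the `region` field (used both as the stratum label and as the cluster label), and the function names `systematic_sample`, `stratified_sample`, and `cluster_sample`.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 120 subjects, each tagged with a region.
regions = ["north", "south", "east"]
population = [{"id": i, "region": regions[i % 3]} for i in range(120)]

def group_by(items, key):
    """Collect items into lists keyed by the value of item[key]."""
    groups = {}
    for item in items:
        groups.setdefault(item[key], []).append(item)
    return groups

def systematic_sample(items, k):
    """Take every k-th item, starting from a random position among the first k."""
    start = rng.randrange(k)
    return items[start::k]

def stratified_sample(items, key, per_stratum):
    """Take a simple random sample of fixed size from every stratum."""
    picked = []
    for members in group_by(items, key).values():
        picked.extend(rng.sample(members, per_stratum))
    return picked

def cluster_sample(items, key, n_clusters):
    """Randomly choose whole clusters and keep every member of each one."""
    groups = group_by(items, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [item for name in chosen for item in groups[name]]

systematic = systematic_sample(population, k=10)
stratified = stratified_sample(population, "region", per_stratum=5)
clustered = cluster_sample(population, "region", n_clusters=1)

print(len(systematic), len(stratified), len(clustered))
```

Note the difference in what is randomized: stratified sampling picks individuals within every group, while cluster sampling picks entire groups and keeps all of their members.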
Variables in Statistics
Types of Variables
A variable is a property or characteristic of the individuals in a population that can take different values from one individual to another.


Categorical Variables: Allowable values are distinct categories (e.g., car models, ice cream flavors, letter grades).
Quantitative Variables: Allowable values are numbers that represent a count or a measurement (e.g., number of children, height).
Categorical Variables
Nominal: No natural order (e.g., flavors, countries).
Ordinal: Natural order exists (e.g., letter grades).
Quantitative Variables
Discrete: Values are countable and have gaps (e.g., number of children).
Continuous: Values form a continuum without gaps (e.g., height, weight).


Observational vs. Experimental Studies
Study Designs
Statistical studies can be observational or experimental, depending on whether researchers influence the variables.
Observational Study: Researchers observe and record data without influencing variables.
Experimental Study: Researchers assign treatments to groups and compare responses.
Displaying and Describing Categorical Data
Frequency and Relative Frequency
Counting occurrences of each category and summarizing them in tables is a fundamental step in analyzing categorical data.
Frequency Distribution Table: Shows counts for each category.
Relative Frequency: Shows the proportion of each category relative to the total.

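Formula: $\text{Relative frequency} = \dfrac{\text{frequency of the category}}{\text{total number of observations}}$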
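A frequency table and its relative frequencies can be built in a few lines. This is a minimal sketch: the `responses` list is hypothetical survey data invented for illustration, and the row of `#` marks is only a rough text stand-in for the height of each category's count.

```python
from collections import Counter

# Hypothetical survey responses (favorite ice cream flavor), invented for illustration.
responses = [
    "vanilla", "chocolate", "vanilla", "strawberry", "chocolate",
    "vanilla", "vanilla", "chocolate", "strawberry", "vanilla",
]

frequency = Counter(responses)   # count of each category
total = sum(frequency.values())  # total number of observations

# Relative frequency = frequency of the category / total number of observations
relative_frequency = {category: n / total for category, n in frequency.items()}

for category, count in frequency.most_common():
    share = relative_frequency[category]
    print(f"{category:<12}{count:>3}{share:>8.2f}  {'#' * count}")
```

The relative frequencies always add up to 1, which is a quick check that no observation was dropped or double counted.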
Visual Representation: Bar Charts
Bar charts are used to visually represent the distribution of categorical data. The height of each bar corresponds to the frequency or relative frequency of each category.


Contingency Tables
Analyzing Relationships Between Categorical Variables
Contingency tables display the joint frequency distribution of two categorical variables, allowing for the analysis of relationships and conditional distributions.
Marginal Distribution: Distribution of one variable on its own, read from the row or column totals in the margins of the table.
Conditional Distribution: Distribution of one variable given a specific value of another variable.
Example Table:
| Gender | Starbucks | Tim Hortons | Other | Total |
|---|---|---|---|---|
| M | 35 | 10 | 5 | 50 |
| F | 40 | 15 | 5 | 60 |
| Total | 75 | 25 | 10 | 110 |
Practice Questions:
What percent of people in our sample are men who prefer Tim Hortons?
Of the people who prefer Tim Hortons, what percent are men?
What percent of men prefer Tim Hortons?
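Formula for Percent: $\text{Percent} = \dfrac{\text{count in the group of interest}}{\text{total of the group being described}} \times 100\%$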
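The three practice questions use the same numerator (the 10 men who prefer Tim Hortons) but different denominators. The sketch below works through them using the counts from the example table; the variable and function names are hypothetical and chosen only for this illustration.

```python
# Counts copied from the example table (rows: gender, columns: coffee shop).
counts = {
    "M": {"Starbucks": 35, "Tim Hortons": 10, "Other": 5},
    "F": {"Starbucks": 40, "Tim Hortons": 15, "Other": 5},
}

def percent(part, whole):
    """Share of `whole` made up by `part`, expressed as a percent."""
    return 100 * part / whole

# Marginal totals: row totals, column totals, and the grand total.
row_totals = {gender: sum(row.values()) for gender, row in counts.items()}
col_totals = {shop: sum(counts[g][shop] for g in counts) for shop in counts["M"]}
grand_total = sum(row_totals.values())

men_tim = counts["M"]["Tim Hortons"]

# Q1: out of everyone in the sample (denominator is the grand total, 110).
joint = percent(men_tim, grand_total)

# Q2: out of people who prefer Tim Hortons (denominator is the column total, 25).
men_among_tim = percent(men_tim, col_totals["Tim Hortons"])

# Q3: out of men (denominator is the row total, 50).
tim_among_men = percent(men_tim, row_totals["M"])

print(round(joint, 1), men_among_tim, tim_among_men)
```

The first answer comes from the joint distribution (about 9.1%), while the second (40%) and third (20%) are conditional distributions: one conditions on the coffee shop and the other on gender.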