STAT 205 Midterm 1 Study Guide: Probability, Data Collection, and Descriptive Statistics

Probability and Random Experiments

Random Experiments

A random experiment is a process or action that leads to one or more possible outcomes, where the outcome cannot be predicted with certainty in advance.

Scenario Description: A random experiment is characterized by unpredictability and repeatability. Examples include tossing a coin, rolling a die, or drawing a card from a deck.
Counting Outcomes: The set of all possible outcomes is called the sample space, and listing it shows how many ways the experiment can end. For example, rolling a six-sided die has 6 possible outcomes: {1, 2, 3, 4, 5, 6}.
Counting Events: An event is a subset of the sample space. The number of ways an event can occur is the number of outcomes that satisfy its criteria; for example, the event "roll an even number" = {2, 4, 6} can occur in 3 ways.
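Listing a sample space and counting the outcomes in an event can be done by enumeration. The sketch below uses two six-sided dice and the event "the faces sum to 7"; both are illustrative choices, not taken from the course materials. `itertools.product` builds every ordered outcome.

```python
from itertools import product

# Sample space for rolling two six-sided dice: every ordered pair (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# An event is a subset of the sample space; here, the outcomes whose faces sum to 7.
sum_is_seven = [outcome for outcome in sample_space if sum(outcome) == 7]

print(len(sample_space))  # 36 possible outcomes
print(len(sum_is_seven))  # 6 outcomes satisfy the event
print(sum_is_seven)
```

Order matters here: (3, 4) and (4, 3) are different outcomes because the two dice are distinguishable.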

Empirical vs. Theoretical Probability

Probability quantifies the likelihood of an event occurring.

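Empirical Probability: Based on observed data from experiments or historical records. Calculated as $P(A) \approx \frac{\text{number of times } A \text{ occurred}}{\text{number of trials}}$.
Theoretical Probability: Based on known mathematical principles or models. When all outcomes are equally likely, calculated as $P(A) = \frac{\text{number of outcomes in } A}{\text{total number of possible outcomes}}$.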
Example: If a coin is flipped 100 times and lands heads 48 times, empirical probability of heads is 0.48; theoretical probability is 0.5.
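The two definitions can be compared in a short simulation. This is a minimal sketch assuming a fair coin; the helper names `empirical_probability` and `simulate_heads` are hypothetical, and the seed only makes the run reproducible.

```python
import random

def empirical_probability(event_count: int, trials: int) -> float:
    """Relative frequency: times the event occurred divided by the number of trials."""
    return event_count / trials

def simulate_heads(trials: int, seed: int = 1) -> int:
    """Flip a simulated fair coin `trials` times and count the heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return sum(rng.random() < 0.5 for _ in range(trials))

theoretical = 1 / 2  # fair coin: 1 favorable outcome out of 2 equally likely ones

# The example from the notes: 48 heads in 100 flips.
print(empirical_probability(48, 100))  # 0.48

# With many more flips, the empirical value tends to settle near the theoretical 0.5.
n = 100_000
print(empirical_probability(simulate_heads(n), n), theoretical)
```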

Basic Probability Rules

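Union (A or B): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Intersection (A and B): $P(A \cap B) = P(A)\,P(B \mid A)$ in general, which simplifies to $P(A \cap B) = P(A)\,P(B)$ if A and B are independent.
Conditional Probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.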
Tree Diagrams: Visual tools to map out sequences of events and calculate complex probabilities.
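A tree diagram multiplies probabilities along each branch (the intersection rule) and adds across the branches that end in the event of interest. The sketch below applies the rules above to a hypothetical example not in the notes: drawing two cards without replacement from a standard 52-card deck containing 4 aces. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

ACES, DECK = 4, 52

# First draw.
p_ace1 = Fraction(ACES, DECK)          # P(A1)
p_not_ace1 = 1 - p_ace1                # P(not A1)

# Second draw, conditional on the first branch (one card already removed).
p_ace2_given_ace1 = Fraction(ACES - 1, DECK - 1)   # P(A2 | A1)
p_ace2_given_not_ace1 = Fraction(ACES, DECK - 1)   # P(A2 | not A1)

# Intersection rule along one branch: P(A1 and A2) = P(A1) * P(A2 | A1).
p_both = p_ace1 * p_ace2_given_ace1

# Add the two branches that end in "second card is an ace".
p_ace2 = p_both + p_not_ace1 * p_ace2_given_not_ace1

# Union rule: P(A1 or A2) = P(A1) + P(A2) - P(A1 and A2).
p_at_least_one = p_ace1 + p_ace2 - p_both

# Conditional probability from its definition: P(A1 | A2) = P(A1 and A2) / P(A2).
p_ace1_given_ace2 = p_both / p_ace2

print(p_both, p_ace2, p_at_least_one, p_ace1_given_ace2)  # 1/221 1/13 33/221 1/17
```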

Mutually Exclusive and Independent Events

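Mutually Exclusive: Events that cannot occur at the same time, so $P(A \cap B) = 0$.
Independent: Events where the occurrence of one does not affect the probability of the other, so $P(A \mid B) = P(A)$ and $P(A \cap B) = P(A)\,P(B)$.
Example: Rolling a fair die: Event A (even number) = {2, 4, 6}, Event B (number greater than 4) = {5, 6}. These are not mutually exclusive (6 is both even and greater than 4). They are independent: $P(A) = 3/6 = 1/2$, $P(B) = 2/6 = 1/3$, and $P(A \cap B) = P(\{6\}) = 1/6 = P(A)\,P(B)$.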
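For a fair die, both properties of this example can be checked directly from the sets. This is a minimal sketch assuming equally likely outcomes; `prob` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # fair six-sided die, equally likely outcomes
A = {2, 4, 6}                      # even number
B = {5, 6}                         # number greater than 4

def prob(event: set) -> Fraction:
    """Theoretical probability when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

mutually_exclusive = len(A & B) == 0            # can both happen on one roll?
independent = prob(A & B) == prob(A) * prob(B)  # multiplication-rule check

print(mutually_exclusive, independent)  # False True
```

The check shows that "not mutually exclusive" and "independent" can both hold; the two ideas answer different questions.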

Revisionist Probability

Revisionist probability refers to updating (revising) the probability of an event after new information is observed. The updated value is a conditional probability, usually computed with a tree diagram or Bayes' rule.

Example: If a test result is positive, what is the updated probability that a person actually has the disease?
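For the disease-testing question, write $D$ for "has the disease", $D^c$ for "does not have the disease", and $+$ for "tests positive". Combining the definition of conditional probability with the two branches of a tree diagram gives the updated probability (Bayes' rule):

```latex
P(D \mid +) = \frac{P(D \cap +)}{P(+)}
            = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid D^c)\,P(D^c)}
```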

Diagnostic Test Statistics

Prevalence, Sensitivity, Specificity

Prevalence: Proportion of individuals in a population who have a particular disease or condition.
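Sensitivity: Probability that the test is positive for a person who has the condition, $P(\text{positive test} \mid \text{condition})$.
Specificity: Probability that the test is negative for a person who does not have the condition, $P(\text{negative test} \mid \text{no condition})$.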
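The three rates above can be computed from a table of test results. The counts below are hypothetical (1,000 people, chosen so the arithmetic is clean), and the last step applies the revisionist update to get the probability of disease given a positive test.

```python
from fractions import Fraction

# Hypothetical 2x2 table of test results for 1,000 people.
true_pos, false_neg = 90, 10    # people who have the disease
false_pos, true_neg = 45, 855   # people who do not
total = true_pos + false_neg + false_pos + true_neg

prevalence = Fraction(true_pos + false_neg, total)       # P(disease)
sensitivity = Fraction(true_pos, true_pos + false_neg)   # P(+ | disease)
specificity = Fraction(true_neg, true_neg + false_pos)   # P(- | no disease)

# Updated probability of disease after a positive test (Bayes' rule):
# P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | not D) P(not D)]
p_pos_given_no_disease = 1 - specificity
p_disease_given_pos = (sensitivity * prevalence) / (
    sensitivity * prevalence + p_pos_given_no_disease * (1 - prevalence)
)

print(prevalence, sensitivity, specificity)  # 1/10 9/10 19/20
print(p_disease_given_pos)                   # 2/3
```

The result agrees with reading the table directly: 90 of the 135 positive tests are true positives, so even a fairly accurate test leaves a one-in-three chance that a positive result is wrong when the condition is uncommon.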

Relative Risk

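Relative Risk: Compares the probability of an event in an exposed group with the probability in an unexposed group: $RR = \frac{P(\text{event} \mid \text{exposed})}{P(\text{event} \mid \text{unexposed})}$.
Interpretation: A relative risk of 2 means the event is twice as likely in the exposed group as in the unexposed group; a relative risk of 1 means the risk is the same in both groups.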
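The ratio of the two group risks can be sketched as a small function. The name `relative_risk` and the counts in the example are hypothetical.

```python
def relative_risk(events_exposed: int, n_exposed: int,
                  events_unexposed: int, n_unexposed: int) -> float:
    """Risk (proportion with the event) in the exposed group divided by risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed

# Hypothetical counts: 30 of 200 exposed vs 15 of 200 unexposed people had the event.
print(relative_risk(30, 200, 15, 200))
```

Here the exposed risk is 0.15 and the unexposed risk is 0.075, so the ratio is 2.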

Populations, Samples, and Variables

Identifying Populations and Variables

Population: The entire group of individuals or items of interest.
Sample: A subset of the population used to make inferences.
Numerical Variable: Quantitative, measured on a numeric scale (e.g., height, weight).
Categorical Variable: Qualitative, places individuals into categories (e.g., gender, color).

Parameters vs. Statistics

Parameter: A numerical summary of a population (e.g., population mean $\mu$).
Statistic: A numerical summary of a sample (e.g., sample mean $\bar{x}$).
Statistical Inference: Making conclusions about a population based on sample data.
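The difference between a parameter and a statistic can be seen by simulating a population whose mean is known. This is a sketch with an invented population of 10,000 heights (normal, mean 170 cm, SD 10 cm); none of these numbers come from the course.

```python
import random
import statistics

rng = random.Random(42)  # seeded for reproducibility

# Hypothetical population: 10,000 simulated heights in cm (illustrative only).
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # parameter: describes the whole population

# Draw a simple random sample of 100 and compute the matching statistic.
sample = rng.sample(population, 100)
x_bar = statistics.mean(sample)   # statistic: describes the sample and estimates mu

print(f"parameter mu = {mu:.2f}, statistic x-bar = {x_bar:.2f}")
```

Rerunning with a different seed changes $\bar{x}$ but not the idea: the statistic varies from sample to sample, while the parameter is a fixed (usually unknown) value.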

Data Collection Methods

Observational vs. Experimental Studies

Observational Study: Researchers observe subjects without intervention.
Experimental Study: Researchers assign treatments and control variables.
Control Group: Group that does not receive the treatment.
Placebo: Inactive treatment used to control for psychological effects.
Placebo Effect: A response (often an improvement) caused by the subject's belief that they are being treated rather than by the treatment itself.
Random Assignment: Subjects are randomly assigned to groups to reduce bias.
Blinded Experiment: Subjects do not know which group they are in.
Double-Blinded Experiment: Neither subjects nor experimenters know group assignments.
When to Use: Observational studies are used when experiments are unethical or impractical; experimental studies are used to establish causality.
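Random assignment can be sketched as shuffling the subject list and splitting it in half. The function `randomly_assign` and the subject IDs are hypothetical; a seeded `random.Random` makes the split reproducible.

```python
import random

def randomly_assign(subjects: list, seed: int = 0) -> tuple:
    """Shuffle subjects, then split them into treatment and control groups of (nearly) equal size."""
    shuffled = subjects[:]                 # copy so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs S01 through S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(subjects)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who is treated, other characteristics of the subjects tend to balance out between the groups, which is what lets an experiment support causal conclusions.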

Sampling Methods and Bias

Random Sample: Every member of the population has an equal chance of being selected.
Non-Random Sample: Selection is not random, which can produce sampling bias (the sample systematically over- or under-represents parts of the population).
Measurement Bias: Errors in data collection, including:
- Non-Response Bias: When selected individuals do not respond.
- Response Bias: When responses are influenced by wording or interviewer.
- Instrument Bias: When measurement tools are faulty or inconsistent.

Random Sampling Designs

Simple Random Sample: Every possible sample has an equal chance of selection.
Stratified Random Sample: Population divided into strata, random samples taken from each.
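Systematic Random Sample: Select every nth individual from a list, starting from a randomly chosen position among the first n.
Cluster Sample: Population divided into clusters; entire clusters are randomly selected, and every individual in the selected clusters is included.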
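The four designs differ only in how units are picked. The sketch below applies each to a hypothetical population of 100 students (25 in each of four class years, seated in 10 homerooms of 10); the function names and the population are invented for illustration.

```python
import random

rng = random.Random(7)  # seeded for reproducibility

# Hypothetical population: 100 students, class year used as strata, homeroom used as clusters.
population = [{"id": i, "year": i % 4, "homeroom": i // 10} for i in range(100)]

def simple_random_sample(pop, n):
    return rng.sample(pop, n)  # every subset of size n is equally likely

def stratified_sample(pop, key, n_per_stratum):
    strata = {}
    for unit in pop:
        strata.setdefault(unit[key], []).append(unit)
    # take a separate simple random sample inside each stratum
    return [u for group in strata.values() for u in rng.sample(group, n_per_stratum)]

def systematic_sample(pop, step):
    start = rng.randrange(step)  # random start among the first `step` units
    return pop[start::step]      # then every step-th unit

def cluster_sample(pop, key, n_clusters):
    cluster_ids = sorted({unit[key] for unit in pop})
    chosen = set(rng.sample(cluster_ids, n_clusters))
    # keep every unit in each chosen cluster
    return [unit for unit in pop if unit[key] in chosen]

srs = simple_random_sample(population, 12)
strat = stratified_sample(population, "year", 3)
syst = systematic_sample(population, 10)
clus = cluster_sample(population, "homeroom", 2)
print(len(srs), len(strat), len(syst), len(clus))  # 12 12 10 20
```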

Descriptive Statistics and Data Visualization

Histograms

Histograms are graphical representations of numerical data.

Frequency Histogram: Shows counts of data in intervals.
Relative Frequency Histogram: Shows proportions or percentages.
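Density-Scale Histogram: Bar heights are scaled so that each bar's area equals the proportion of data in that interval; the total area of all bars is 1.
Distribution Shape: Use histograms to identify if data is symmetric, skewed, unimodal, or bimodal.
Estimating Probabilities: On a density-scale histogram, the area of the bars over an interval estimates the probability that a value falls in that interval.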
StatCrunch: Software tool for creating histograms and other graphs.
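The three histogram scales differ only in what the bar height means. The sketch below tabulates all three for a small hypothetical data set with bins of width 10; it is plain Python rather than StatCrunch, which draws these graphs directly.

```python
data = [12, 15, 18, 22, 23, 25, 27, 31, 34, 45]  # hypothetical measurements
edges = [10, 20, 30, 40, 50]                    # bins [10,20), [20,30), [30,40), [40,50)
n = len(data)

rows = []
for left, right in zip(edges, edges[1:]):
    count = sum(left <= x < right for x in data)  # frequency: how many values fall in the bin
    rel_freq = count / n                          # relative frequency: proportion of the data
    density = rel_freq / (right - left)           # density: height chosen so bar area = proportion
    rows.append((left, right, count, rel_freq, density))
    print(f"[{left}, {right}): count={count}, relative={rel_freq:.2f}, density={density:.3f}")
```

With equal bin widths all three scales give the same shape; the density scale matters when bins have unequal widths, and its total bar area is always 1.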

Measures of Center and Variation

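Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample Median: Middle value when data is ordered (the average of the two middle values when n is even).
Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample Standard Deviation: $s = \sqrt{s^2}$
Interpretation: The mean and median both measure center (the median is resistant to outliers and skew; the mean is not); the standard deviation measures the typical spread of the data around the mean.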
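The formulas above translate line by line into code. This is a sketch on a hypothetical sample of 7 values; the last line cross-checks against Python's `statistics` module, whose `variance` and `stdev` also use the sample (n - 1) versions.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7, 9]  # hypothetical sample, n = 7
n = len(data)

x_bar = sum(data) / n                               # sample mean
ordered = sorted(data)
mid = n // 2
# middle value for odd n, average of the two middle values for even n
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # sample variance divides by n - 1
s = math.sqrt(s2)                                   # sample standard deviation

print(x_bar, median, s2, s)
# The standard library's sample versions give the same values.
print(statistics.mean(data), statistics.median(data), statistics.variance(data), statistics.stdev(data))
```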

Quartiles, IQR, and Boxplots

Q1 (First Quartile): 25th percentile
Q3 (Third Quartile): 75th percentile
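Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
Lower Fence: $Q_1 - 1.5 \times \text{IQR}$
Upper Fence: $Q_3 + 1.5 \times \text{IQR}$
Boxplot: Visualizes the quartiles (the box), the median (a line inside the box), the spread of the remaining data (the whiskers), and outliers (plotted as individual points).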
Outliers: Data points outside the fences are considered outliers.
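The fence rule can be sketched on a hypothetical sample. Quartiles here are taken as the medians of the lower and upper halves (excluding the overall median when n is odd); this is one common convention, and software such as StatCrunch may use a different one, so quartiles can differ slightly.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample
q1, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)  # 4.0 8.5 4.5 -2.75 15.25 [30]
```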

Normal Distribution and the 68-95-99.7 Rule

Normal Distribution: Symmetric, bell-shaped curve.
68-95-99.7 Rule: In a normal distribution, approximately:
- 68% of values lie within 1 standard deviation of the mean
- 95% lie within 2 standard deviations
- 99.7% lie within 3 standard deviations
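Z-Value: $z = \frac{x - \mu}{\sigma}$, the number of standard deviations that a value $x$ lies above (positive $z$) or below (negative $z$) the mean.
Z-Table: Used to find probabilities associated with z-values; a standard z-table lists the area under the standard normal curve to the left of $z$, that is, $P(Z \le z)$.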
StatCrunch: Can compute probabilities and z-values.
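Python's `statistics.NormalDist` can stand in for a z-table when checking hand calculations. The sketch below confirms the approximate 68%, 95%, and 99.7% areas and then standardizes a value from a hypothetical distribution (mean 100, SD 15), chosen only for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Check the empirical rule: area within k standard deviations of the mean.
for k in (1, 2, 3):
    area = standard.cdf(k) - standard.cdf(-k)
    print(f"within {k} SD: {area:.4f}")  # about 0.6827, 0.9545, 0.9973

# Hypothetical example: X is normal with mean 100 and SD 15; how unusual is x = 130?
mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma       # z-value: number of SDs above the mean
p_below = standard.cdf(z)  # area to the left of z, as a z-table gives
print(z, round(p_below, 4))
```

Calling `NormalDist(mu, sigma).cdf(x)` on the original scale gives the same probability without standardizing first.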

Preparation Recommendations

Review all assignments, practice problems, lab exercises, and pre-lab exercises corresponding to the topics above.
Ensure proficiency in both manual calculations and use of StatCrunch for statistical analysis.