STAT 205 Midterm 1 Study Guide: Probability, Data Collection, and Descriptive Statistics
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Probability and Random Experiments
Random Experiments
A random experiment is a process or action that leads to one or more possible outcomes, where the outcome cannot be predicted with certainty in advance.
Scenario Description: A random experiment is characterized by unpredictability and repeatability. Examples include tossing a coin, rolling a die, or drawing a card from a deck.
Counting Outcomes: The number of ways a random experiment can end is determined by listing all possible outcomes. For example, rolling a six-sided die has 6 possible outcomes.
Counting Events: An event is a subset of outcomes. The number of ways an event can occur depends on how many outcomes satisfy the event's criteria.
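As an illustration of counting outcomes and events, the sketch below lists every outcome of rolling two six-sided dice and counts the outcomes in the event "the sum is 7". The two-dice experiment and the variable names are assumptions chosen for illustration, not taken from the course materials.

```python
from itertools import product

# Sample space for rolling two six-sided dice: all ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Event: the two dice sum to 7 (a subset of the sample space).
sum_is_seven = [outcome for outcome in sample_space if sum(outcome) == 7]

print(len(sample_space))   # 36 possible outcomes
print(len(sum_is_seven))   # 6 outcomes in the event
print(sum_is_seven)
```

Because each of the 36 ordered pairs is equally likely (assuming fair dice), the theoretical probability of a sum of 7 is 6/36 = 1/6.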
Empirical vs. Theoretical Probability
Probability quantifies the likelihood of an event occurring.
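Empirical Probability: Based on observed data from experiments or historical records. Calculated as $P(A) = \frac{\text{number of times } A \text{ occurred}}{\text{total number of trials}}$.
Theoretical Probability: Based on known mathematical principles or models. When all outcomes are equally likely, calculated as $P(A) = \frac{\text{number of outcomes in } A}{\text{total number of possible outcomes}}$.
Example: If a coin is flipped 100 times and lands heads 48 times, the empirical probability of heads is 48/100 = 0.48; the theoretical probability for a fair coin is 0.5.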
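A simulation makes the empirical/theoretical distinction concrete. This minimal sketch assumes a fair coin (heads with probability 0.5) and uses Python's `random` module; the function name `empirical_heads` and the seed are illustrative choices.

```python
import random

random.seed(205)  # fixed seed so the run is reproducible

def empirical_heads(num_flips: int) -> float:
    """Simulate num_flips fair-coin flips and return the proportion that land heads."""
    heads = sum(random.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

theoretical = 0.5
for n in (100, 10_000, 100_000):
    print(n, empirical_heads(n), theoretical)
```

With only 100 flips the empirical value can easily land a few hundredths away from 0.5; with many more flips it tends to settle close to the theoretical value.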
Basic Probability Rules
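Union (A or B): $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$.
Intersection (A and B): $P(A \text{ and } B) = P(A) \times P(B)$ if A and B are independent; in general, $P(A \text{ and } B) = P(A) \times P(B \mid A)$.
Conditional Probability: $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$, provided $P(B) > 0$.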
Tree Diagrams: Visual tools to map out sequences of events and calculate complex probabilities.
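The rules above can be checked exactly on a standard 52-card deck. In this sketch (an assumed example, not from the course), A is "the card is a heart" and B is "the card is a face card (J, Q, K)"; `Fraction` keeps the arithmetic exact. The last line multiplies along one branch of a tree diagram for two draws without replacement.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs, all cards equally likely.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event) -> Fraction:
    """Theoretical probability that one drawn card satisfies the predicate `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

p_a = prob(is_heart)                                     # 13/52 = 1/4
p_b = prob(is_face)                                      # 12/52 = 3/13
p_a_and_b = prob(lambda c: is_heart(c) and is_face(c))   # 3/52
p_a_or_b = prob(lambda c: is_heart(c) or is_face(c))     # 22/52 = 11/26

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p_a_or_b == p_a + p_b - p_a_and_b

# Conditional probability: P(B | A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a                            # 3/13

# Tree diagram, two cards without replacement: multiply along the
# "heart on draw 1" -> "heart on draw 2" branch.
p_two_hearts = Fraction(13, 52) * Fraction(12, 51)       # 1/17

print(p_a_or_b, p_b_given_a, p_two_hearts)  # 11/26 3/13 1/17
```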
Mutually Exclusive and Independent Events
Mutually Exclusive: Events that cannot occur at the same time.
Independent: Events where the occurrence of one does not affect the probability of the other.
Example: Rolling a fair die: Event A (even number), Event B (number greater than 4). These are not mutually exclusive (6 is both even and greater than 4), but they are independent: $P(A \text{ and } B) = \frac{1}{6} = \frac{1}{2} \times \frac{1}{3} = P(A) \times P(B)$.
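A quick exact check of the die example, assuming a fair six-sided die: mutual exclusivity asks whether the events share an outcome, while independence asks whether $P(A \text{ and } B)$ equals $P(A) \times P(B)$.

```python
from fractions import Fraction

# Fair six-sided die: every outcome equally likely.
outcomes = range(1, 7)
A = {x for x in outcomes if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in outcomes if x > 4}        # greater than 4: {5, 6}

def prob(event: set) -> Fraction:
    return Fraction(len(event), len(outcomes))

mutually_exclusive = len(A & B) == 0               # do A and B share any outcome?
independent = prob(A & B) == prob(A) * prob(B)     # is P(A and B) = P(A) P(B)?

print(mutually_exclusive, independent)  # False True
```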
Revisionist Probability
Revisionist probability refers to updating the probability of an event based on new information (conditional probability).
Example: If a test result is positive, what is the updated probability that a person actually has the disease?
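Updating the disease probability after a positive test combines the law of total probability with Bayes' rule: $P(D \mid +) = \frac{P(+ \mid D)P(D)}{P(+ \mid D)P(D) + P(+ \mid \text{no } D)P(\text{no } D)}$. The sketch below uses made-up values (1% prevalence, 95% sensitivity, 90% specificity) purely for illustration; the function name is hypothetical.

```python
def prob_disease_given_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Updated P(disease | positive test) via Bayes' rule."""
    true_positive = sensitivity * prevalence                 # P(+ and disease)
    false_positive = (1 - specificity) * (1 - prevalence)    # P(+ and no disease)
    return true_positive / (true_positive + false_positive)  # divide by P(+)

# Hypothetical test: 1% prevalence, 95% sensitivity, 90% specificity.
updated = prob_disease_given_positive(0.01, 0.95, 0.90)
print(round(updated, 3))  # 0.088
```

With these assumed numbers, a positive result raises the probability of disease from 1% to under 9%, because false positives from the large healthy group outnumber true positives.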
Diagnostic Test Statistics
Prevalence, Sensitivity, Specificity
Prevalence: Proportion of individuals in a population who have a particular disease or condition.
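Sensitivity: Probability that a test correctly identifies a positive case, i.e., $P(\text{positive test} \mid \text{disease})$.
Specificity: Probability that a test correctly identifies a negative case, i.e., $P(\text{negative test} \mid \text{no disease})$.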
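These three quantities can be read directly off a two-way table of test results. The counts below are invented for illustration.

```python
# Hypothetical 2x2 table of test results (counts of people).
true_pos, false_neg = 90, 10     # people WITH the disease: test +, test -
false_pos, true_neg = 45, 855    # people WITHOUT the disease: test +, test -

total = true_pos + false_neg + false_pos + true_neg

prevalence = (true_pos + false_neg) / total        # proportion with the disease
sensitivity = true_pos / (true_pos + false_neg)    # P(test + | disease)
specificity = true_neg / (true_neg + false_pos)    # P(test - | no disease)

print(prevalence, sensitivity, specificity)  # 0.1 0.9 0.95
```

Note that sensitivity uses only the diseased row and specificity only the non-diseased row, while prevalence uses the whole table.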
Relative Risk
Relative Risk: Compares the probability of an event in two groups: $RR = \frac{P(\text{event} \mid \text{exposed})}{P(\text{event} \mid \text{unexposed})}$.
Interpretation: A relative risk of 2 means the event is twice as likely in the exposed group.
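A minimal sketch of the relative risk calculation, using hypothetical study counts and a hypothetical helper name:

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of the event probability in the exposed group to that in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed

# Hypothetical study: 30 of 200 exposed and 15 of 300 unexposed people had the event.
rr = relative_risk(30, 200, 15, 300)
print(round(rr, 2))  # 3.0
```

Here the risks are 0.15 and 0.05, so the event is three times as likely in the exposed group.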
Populations, Samples, and Variables
Identifying Populations and Variables
Population: The entire group of individuals or items of interest.
Sample: A subset of the population used to make inferences.
Numerical Variable: Quantitative, measured on a numeric scale (e.g., height, weight).
Categorical Variable: Qualitative, places individuals into categories (e.g., gender, color).
Parameters vs. Statistics
Parameter: A numerical summary of a population (e.g., population mean $\mu$).
Statistic: A numerical summary of a sample (e.g., sample mean $\bar{x}$).
Statistical Inference: Making conclusions about a population based on sample data.
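To see the parameter/statistic distinction in action, this sketch builds an invented population, computes its mean $\mu$ (a parameter, usually unknown in practice), and then estimates it with the mean $\bar{x}$ of a random sample (a statistic). The population values and seed are assumptions.

```python
import random
import statistics

random.seed(1)  # reproducible illustration

# Hypothetical population: 1,000 individuals with values 1..1000.
population = list(range(1, 1001))
mu = statistics.mean(population)     # parameter: population mean (500.5)

# Simple random sample of 50 individuals, drawn without replacement.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)      # statistic: sample mean, an estimate of mu

print(mu, x_bar)
```

Rerunning with a different seed gives a different $\bar{x}$, while $\mu$ stays fixed; that sample-to-sample variation is what statistical inference has to account for.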
Data Collection Methods
Observational vs. Experimental Studies
Observational Study: Researchers observe subjects without intervention.
Experimental Study: Researchers assign treatments and control variables.
Control Group: Group that does not receive the treatment.
Placebo: Inactive treatment used to control for psychological effects.
Placebo Effect: Improvement due to belief in treatment.
Random Assignment: Subjects are randomly assigned to groups to reduce bias.
Blinded Experiment: Subjects do not know which group they are in.
Double-Blinded Experiment: Neither subjects nor experimenters know group assignments.
When to Use: Observational studies are used when experiments are unethical or impractical; experimental studies are used to establish causality.
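Random assignment can be done by shuffling the subject list and splitting it. This is a sketch with hypothetical subject IDs and a hypothetical helper name; in practice the assignment could equally be done with StatCrunch or a random number table.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects, then split them into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

# Hypothetical subject IDs S01..S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(subjects, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in each group, lurking variables tend to be balanced between the groups, which is what lets an experiment support causal conclusions.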
Sampling Methods and Bias
Random Sample: Every member of the population has an equal chance of being selected.
Non-Random Sample: Selection is not random, which can lead to sampling bias.
Measurement Bias: Errors in data collection, including:
Non-Response Bias: When selected individuals do not respond.
Response Bias: When responses are influenced by the wording of questions or by the interviewer.
Instrument Bias: When measurement tools are faulty or inconsistent.
Random Sampling Designs
Simple Random Sample: Every possible sample has an equal chance of selection.
Stratified Random Sample: Population divided into strata, random samples taken from each.
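Systematic Random Sample: Select every kth individual from a list, starting at a randomly chosen position among the first k.
Cluster Sample: Population divided into clusters; a random sample of clusters is selected, and every individual in the selected clusters is included.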
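The four designs differ mainly in what gets randomized. This sketch applies each one to an invented population of 100 students, using class year as the strata and dorm as the clusters; the population, sample sizes, and seed are assumptions for illustration.

```python
import random

rng = random.Random(7)  # fixed seed for a reproducible illustration

# Hypothetical population: 100 students, each with a class year (stratum) and a dorm (cluster).
population = [
    {"id": i, "year": ["first", "second", "third", "fourth"][i % 4], "dorm": f"D{i // 10}"}
    for i in range(100)
]

# Simple random sample: every subset of size 12 is equally likely.
srs = rng.sample(population, 12)

# Stratified: group by class year, then take a simple random sample of 3 from each stratum.
strata = {}
for person in population:
    strata.setdefault(person["year"], []).append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, 3)]

# Systematic: random start among the first k, then every kth individual.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster: randomly choose 2 whole dorms and include everyone living in them.
dorms = sorted({p["dorm"] for p in population})
chosen_dorms = rng.sample(dorms, 2)
cluster = [p for p in population if p["dorm"] in chosen_dorms]

print(len(srs), len(stratified), len(systematic), len(cluster))  # 12 12 10 20
```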
Descriptive Statistics and Data Visualization
Histograms
Histograms are graphical representations of numerical data.
Frequency Histogram: Shows counts of data in intervals.
Relative Frequency Histogram: Shows proportions or percentages.
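Density-Scale Histogram: Bar height equals relative frequency divided by bin width, so each bar's area is the proportion of data in that interval and the total area is 1; areas can therefore be read as probabilities.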
Distribution Shape: Use histograms to identify if data is symmetric, skewed, unimodal, or bimodal.
Estimating Probabilities: Theoretical probabilities can be estimated from areas of a density-scale histogram (add the areas of the bars covering the interval of interest).
StatCrunch: Software tool for creating histograms and other graphs.
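The three histogram scales come from the same bin counts. This sketch computes them by hand for invented data with equal-width bins $[0, 10), [10, 20), \ldots, [40, 50)$ and then estimates a probability from bar areas; the data and bin edges are assumptions.

```python
# Hypothetical data and equal-width bins [0, 10), [10, 20), ..., [40, 50).
data = [3, 7, 12, 15, 18, 21, 22, 24, 27, 29, 33, 35, 38, 41, 47, 12, 19, 25, 26, 31]
bin_edges = [0, 10, 20, 30, 40, 50]
n = len(data)

rows = []
for left, right in zip(bin_edges[:-1], bin_edges[1:]):
    count = sum(left <= x < right for x in data)   # frequency
    rel_freq = count / n                           # relative frequency (proportion)
    density = rel_freq / (right - left)            # density-scale bar height
    rows.append((left, right, count, rel_freq, density))
    print(f"[{left}, {right}): count={count}, rel={rel_freq:.2f}, density={density:.3f}")

# On the density scale, area = height * width = relative frequency, so the areas sum to 1.
total_area = sum(d * (r - l) for l, r, _, _, d in rows)

# Estimated probability that a value falls in [20, 40): add the areas of those bars.
p_20_to_40 = sum(d * (r - l) for l, r, _, _, d in rows if l >= 20 and r <= 40)
print(round(total_area, 10), round(p_20_to_40, 2))  # 1.0 0.55
```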
Measures of Center and Variation
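Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample Median: Middle value when the data are ordered (the average of the two middle values when n is even).
Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample Standard Deviation: $s = \sqrt{s^2}$
Interpretation: The mean and median both describe the center of the data (the median is less affected by outliers and skewness); the standard deviation measures the typical spread of the data around the mean.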
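A by-hand computation of these formulas on a small invented sample, cross-checked against Python's `statistics` module (whose `variance` and `stdev` also divide by $n - 1$):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(data)

x_bar = sum(data) / n                                 # sample mean
median = statistics.median(data)                      # middle of the ordered data
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)    # sample variance (divide by n - 1)
s = math.sqrt(s2)                                     # sample standard deviation

print(x_bar, median, round(s2, 3), round(s, 3))  # 5.0 4.5 4.571 2.138
```

The squared deviations here sum to 32, so $s^2 = 32/7$.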
Quartiles, IQR, and Boxplots
Q1 (First Quartile): 25th percentile
Q3 (Third Quartile): 75th percentile
Interquartile Range (IQR): $IQR = Q_3 - Q_1$
Lower Fence: $Q_1 - 1.5 \times IQR$
Upper Fence: $Q_3 + 1.5 \times IQR$
Boxplot: Visualizes quartiles, median, and outliers.
Outliers: Data points outside the fences are considered outliers.
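The sketch below computes quartiles with the "median of each half" convention (the overall median is left out of both halves when n is odd), then the IQR, fences, and outliers. Quartile conventions vary between textbooks and software, so StatCrunch may report slightly different Q1 and Q3 for the same data; the data and the helper names are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def boxplot_summary(data):
    """Q1, median, Q3, IQR, fences, and outliers (needs at least 2 values)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]                                    # values below the median
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]  # values above the median
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return q1, median(xs), q3, iqr, lower_fence, upper_fence, outliers

# Hypothetical data with one unusually large value.
summary = boxplot_summary([1, 3, 4, 5, 5, 6, 7, 8, 9, 30])
print(summary)  # (4, 5.5, 8, 4, -2.0, 14.0, [30])
```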
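Normal Distribution and the 68-95-99.7% Rule
Normal Distribution: Symmetric, bell-shaped curve.
68-95-99.7% Rule: In a normal distribution, approximately:
68% of the data within 1 standard deviation of the mean
95% within 2 standard deviations
99.7% within 3 standard deviations
Z-Value: $z = \frac{x - \mu}{\sigma}$, the number of standard deviations a value $x$ lies above (positive $z$) or below (negative $z$) the mean.
Z-Table: Used to find probabilities (areas under the standard normal curve) associated with z-values; a cumulative z-table gives the area to the left of z.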
StatCrunch: Can compute probabilities and z-values.
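As an offline cross-check of z-table or StatCrunch answers, Python's `statistics.NormalDist` gives the cumulative area to the left of a value. This sketch assumes a hypothetical variable that is normal with mean 100 and standard deviation 15, and then reproduces the three percentages of the empirical rule (about 68.3%, 95.4%, and 99.7%).

```python
from statistics import NormalDist

def z_value(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist(0, 1)   # standard normal: mean 0, sd 1

# Hypothetical example: X ~ Normal(mean 100, sd 15); how unusual is x = 130?
z = z_value(130, 100, 15)              # 2.0
p_below = std_normal.cdf(z)            # area to the left of z, as in a cumulative z-table
print(z, round(p_below, 4))            # 2.0 0.9772

# Check the empirical rule: P(-k < Z < k) for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 4))
```

So about 2.3% of values in this assumed distribution would exceed 130.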
Preparation Recommendations
Review all assignments, practice problems, lab exercises, and pre-lab exercises corresponding to the topics above.
Ensure proficiency in both manual calculations and use of StatCrunch for statistical analysis.