Null Models and Probability Distributions in Biological Data

Null Models and Probability Distributions

Introduction to Null Models

Null models are statistical models that describe expected patterns in data under a hypothesis of randomness or no effect. They are essential for testing whether observed data deviate significantly from what would be expected by chance.

Null Model: A model assuming no relationship or effect, used as a baseline for comparison.
Application: Used in hypothesis testing to determine if observed results are statistically significant.
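
As a rough illustration of how a null model serves as a baseline, the sketch below simulates a simple "no effect" model and asks how often chance alone produces a result at least as extreme as the one observed. The observation (60 successes in 100 trials), the null probability of 0.5, and the function name `simulate_null_counts` are hypothetical choices for this example, not values from the notes.

```python
import random

def simulate_null_counts(n_trials, p_null, n_sims, seed=0):
    """Simulate the number of successes in n_trials when only chance (p_null) is at work."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_null for _ in range(n_trials))
        for _ in range(n_sims)
    ]

# Hypothetical observation: 60 successes in 100 trials; the null model says p = 0.5.
observed = 60
null_counts = simulate_null_counts(n_trials=100, p_null=0.5, n_sims=10_000)

# One-sided p-value: share of null simulations at least as extreme as the observation.
p_value = sum(count >= observed for count in null_counts) / len(null_counts)
print(f"Simulated p-value: {p_value:.3f}")
```

A small p-value means the observed count would be unusual if the null model were true, which is the sense in which a result "deviates significantly from chance."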

Probability Distributions in Biological Data

Types of Random Variables

Random variables are quantities whose values result from random phenomena. They are classified as discrete or continuous, each with distinct properties and applications.

Discrete Random Variables:
- Sample space: A countable set of possible outcomes (often fewer than 20 in practice).
- Probability can be determined empirically for each outcome.
- Examples: Number of siblings, rolling a die.
Continuous Random Variables:
- Sample space: Infinitely many possible values within a range.
- Probability density curve can be determined empirically.
- The probability of a range of values is the area under the density curve, found by integration.
- Examples: Height, concentration of a substance.
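
The contrast between the two types can be sketched in a few lines: discrete probabilities are summed over outcomes, while continuous probabilities are areas under a density curve. The height parameters (mean 170 cm, standard deviation 10 cm) and the helper `integrate` are assumptions made only for this illustration.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: a fair six-sided die; each outcome carries its own probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}
p_at_least_5 = sum(p for face, p in die.items() if face >= 5)

# Continuous: height modeled as normal with a hypothetical mean and sd (in cm).
height = NormalDist(mu=170, sigma=10)

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# P(160 <= height <= 180) computed two ways: numeric area and the exact CDF difference.
area = integrate(height.pdf, 160, 180)
p_160_to_180 = height.cdf(180) - height.cdf(160)

print("P(die >= 5) =", p_at_least_5)
print("P(160 <= height <= 180) ~", round(area, 4))
```

Note that for a continuous variable the probability of any single exact value is zero; only intervals have non-zero probability.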

Common Distributions in Biological Data

Biological data can follow various probability distributions, each with unique characteristics and applications.

Normal Distribution (Bell-shaped):
- Continuous, symmetric, defined by mean ($\mu$) and standard deviation ($\sigma$).
- Formula: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
- Example: Heights, measurement errors.
Log-normal Distribution (Right-skewed):
- Continuous, but only positive values.
- Taking the logarithm of data yields a normal distribution.
- Example: Concentrations, latency periods.
Uniform Distribution:
- All values within a specified range have equal probability.
- Discrete: Random digits, rolling a die.
- Continuous: Random number generation in experiments.
Poisson Distribution:
- Discrete, models count of events in a fixed interval.
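- Formula: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \ldots$, where $\lambda$ is the mean number of events per interval.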
- Example: Number of mutations, arrivals.
Multinomial Distribution:
- Discrete, more than two categories.
- Each individual falls into category $i$ with probability $p_i$, where the $p_i$ sum to 1.
- Example: Year of study, type of fruit.
Weibull Distribution:
- Continuous, only positive values.
- Shape parameter controls the skewness; scale parameter defines the spread.
- Example: Age distribution, survival times.
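
A few of the properties listed above can be checked numerically. The sketch below uses the standard normal density and Poisson probability formulas and draws a log-normal sample with Python's built-in `random` module. The parameter values ($\lambda = 3$, log-scale mean 0 and standard deviation 1) and the function names are illustrative assumptions.

```python
import math
import random
from statistics import mean, median

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count per interval is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Poisson: probabilities over the (truncated) sample space sum to ~1, and the mean is ~lam.
lam = 3.0
pmf = [poisson_pmf(k, lam) for k in range(60)]
total = sum(pmf)
poisson_mean = sum(k * p for k, p in enumerate(pmf))

# Log-normal: right-skewed on the raw scale, roughly symmetric after taking logs.
rng = random.Random(1)
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]
logged = [math.log(x) for x in lognormal_sample]

print("raw    mean vs median:", round(mean(lognormal_sample), 2), round(median(lognormal_sample), 2))
print("logged mean vs median:", round(mean(logged), 2), round(median(logged), 2))
```

In a right-skewed sample the mean sits above the median; after the log transform the two nearly coincide, which is the practical signature of log-normal data.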

Probability Rules and Sample Space

Basic Probability Rules

Probability theory provides rules for calculating the likelihood of events in a sample space.

Probabilities must be between 0 and 1.
Sum of probabilities in sample space must equal 1.
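Complement Rule: $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of event $A$.
Addition Rule (Disjoint Events): $P(A \cup B) = P(A) + P(B)$ if $A$ and $B$ are mutually exclusive.
Addition Rule (Non-disjoint Events): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Multiplication Rule (Independent Events): $P(A \cap B) = P(A)\,P(B)$
Multiplication Rule (Dependent Events): $P(A \cap B) = P(A)\,P(B \mid A)$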
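
Each rule can be verified by direct counting on a small sample space. The sketch below uses a fair six-sided die with exact fractions; the event names (`even`, `low`, `at_most_4`, `one`) are hypothetical and chosen only to make the rules easy to check.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # outcomes of a fair six-sided die

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
low = {1, 2, 3}
at_most_4 = {1, 2, 3, 4}
one = {1}

# Complement rule
assert prob(sample_space - even) == 1 - prob(even)

# Addition rule, disjoint events: {1} and the even faces share no outcomes
assert prob(one | even) == prob(one) + prob(even)

# Addition rule, non-disjoint events: subtract the overlap so it is not counted twice
assert prob(even | low) == prob(even) + prob(low) - prob(even & low)

# Multiplication rule, independent events: "even" and "at most 4" are independent
assert prob(even & at_most_4) == prob(even) * prob(at_most_4)

# Multiplication rule, dependent events: P(A and B) = P(A) * P(B | A)
p_low_given_even = Fraction(len(low & even), len(even))
assert prob(even & low) == prob(even) * p_low_given_even

print("All five rules hold on the die example.")
```

The events `even` and `low` are not independent (their joint probability is 1/6, not 1/4), which is why the conditional form of the multiplication rule is needed for them.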

Sample Space and Categories

The sample space is the set of all possible outcomes of a random phenomenon. Categories must be exhaustive and mutually exclusive.

Exhaustive: All possible values are covered.
Mutually Exclusive: Each case falls into only one category.
Relative Frequency: The probability of an event is the relative frequency of its category in large samples.
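
The link between probability and relative frequency in large samples can be seen in a quick simulation. This is a minimal sketch using a simulated fair die; the function name `relative_frequency_of_six` and the sample sizes are arbitrary choices for illustration.

```python
import random

def relative_frequency_of_six(n_rolls, seed=0):
    """Roll a fair die n_rolls times and return the share of rolls showing a six."""
    rng = random.Random(seed)
    sixes = sum(rng.randint(1, 6) == 6 for _ in range(n_rolls))
    return sixes / n_rolls

# The relative frequency drifts toward the true probability 1/6 as the sample grows.
for n in (10, 1_000, 100_000):
    print(n, round(relative_frequency_of_six(n), 4))
```

Small samples can land far from 1/6; only large samples reliably approximate the underlying probability.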

Categorical Data and Random Variables

Handling Categorical Data

Categorical data are classified into distinct groups or categories. Random variables can represent these categories, and probability rules apply to their frequencies.

Sampling Unit: The object or individual being sampled (e.g., piece of fruit).
Sample Space: List of possible outcomes for the random variable.
Count: Frequency of each category.
Relative Frequency: Proportion of each category in the sample.
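
Counts and relative frequencies are straightforward to tabulate. The sketch below uses a made-up sample of fruit (the data are hypothetical) where each list element is one sampling unit.

```python
from collections import Counter

# Hypothetical sample: each element is one sampling unit (a piece of fruit).
sample = ["apple", "banana", "apple", "orange", "banana", "apple", "apple", "orange"]

counts = Counter(sample)          # frequency of each category
n = len(sample)                   # number of sampling units
relative_frequency = {category: count / n for category, count in counts.items()}

for category in sorted(counts):
    print(category, counts[category], relative_frequency[category])
```

Because the categories are exhaustive and mutually exclusive, the counts add up to the sample size and the relative frequencies add up to 1.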

Application: West Nile Virus Case Study

Analyzing Disease Trends

Statistical analysis of disease data involves comparing observed counts to expected frequencies under null models. For West Nile Virus (WNV), age groups and outcomes (cases, deaths) are analyzed.

Scientific Question: Which age group is most likely to be diagnosed with WNV?
Observed Data: Cases and deaths by age group.
Expected Frequencies: Under the null model that age does not affect the chance of diagnosis, expected case counts are proportional to each age group's share of the population.

Example Table: WNV Cases and Population

Age Group	Cases	Relative Frequency of Cases	Population Proportion (2010 Census)
0-44	240	0.31	0.727
45-64	335	0.43	0.187
65-100	198	0.26	0.086
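
Using the table above, expected case counts under the null model are the total number of cases multiplied by each group's population proportion. The sketch below also computes a chi-square goodness-of-fit statistic; the notes do not name a specific test, so that choice is an assumption made here for illustration.

```python
# Observed WNV cases and 2010 census proportions, copied from the table.
observed = {"0-44": 240, "45-64": 335, "65-100": 198}
population_share = {"0-44": 0.727, "45-64": 0.187, "65-100": 0.086}

total_cases = sum(observed.values())

# Null model: cases are spread across age groups in proportion to population size.
expected = {group: total_cases * population_share[group] for group in observed}

# Chi-square goodness-of-fit statistic (an assumed choice of test, not from the notes).
chi_square = sum(
    (observed[group] - expected[group]) ** 2 / expected[group] for group in observed
)

for group in observed:
    print(group, observed[group], round(expected[group], 1))
print("chi-square statistic:", round(chi_square, 1))
```

With three categories there are 2 degrees of freedom, for which the 5% critical value is about 5.99. The statistic here is far larger, so the observed age pattern is not consistent with the equal-risk null model: the two older groups have more cases than their population share predicts.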

Example Table: WNV Deaths by Age Group

Age Group	Cases	Deaths
0-44	240	2
45-64	335	5
65-100	198	19
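
The death counts become easier to compare once they are expressed as proportions of cases. In this sketch, "relative risk" is taken to mean the ratio of death proportions among diagnosed cases, which is an assumption about how the term is used in these notes.

```python
# Cases and deaths by age group, copied from the table.
cases = {"0-44": 240, "45-64": 335, "65-100": 198}
deaths = {"0-44": 2, "45-64": 5, "65-100": 19}

# Proportion of diagnosed cases in each group that ended in death.
fatality = {group: deaths[group] / cases[group] for group in cases}

# Overall proportion: what every group would show if risk of death were equal.
overall = sum(deaths.values()) / sum(cases.values())

# Ratio of the senior death proportion to that of the youngest group.
relative_risk = fatality["65-100"] / fatality["0-44"]

for group in cases:
    print(group, f"{fatality[group]:.3f}")
print("overall:", f"{overall:.3f}")
print("relative risk, 65-100 vs 0-44:", round(relative_risk, 1))
```

If risk were equal across groups, each proportion would sit near the overall value; the senior group is well above it and the two younger groups are below it.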

Interpreting Results

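Relative Risk: Among diagnosed cases, seniors (65-100) died at a much higher rate (19 of 198, about 9.6%) than the 0-44 group (2 of 240, about 0.8%) or the 45-64 group (5 of 335, about 1.5%).
Proportionality: If the risk of death were equal across age groups, deaths in each group would be proportional to that group's cases (or to its population frequency).
Conditional Probability: Probability that a patient who died was a senior: $P(\text{senior} \mid \text{died}) = 19/26 \approx 0.73$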

Joint and Conditional Distributions

Joint Distributions

Joint distributions describe the probability of combinations of two categorical variables (e.g., age group and death status).

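Joint Probability: $P(A \cap B)$ is the probability that both events occur.
Independence: If $P(A \cap B) = P(A)\,P(B)$, events $A$ and $B$ are independent.
Expected Frequencies: If the variables are independent, the expected relative frequency in each cell is the product of the marginal probabilities; multiplying by the total sample size gives the expected count.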
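
Applied to the WNV data, the independence rule gives the death counts expected if age group and outcome were unrelated. The "survived" counts below are derived as cases minus deaths, which is an assumption (the notes list only cases and deaths).

```python
# Observed joint counts; "survived" = cases - deaths (derived, not given in the notes).
observed = {
    "0-44": {"died": 2, "survived": 238},
    "45-64": {"died": 5, "survived": 330},
    "65-100": {"died": 19, "survived": 179},
}

n = sum(sum(row.values()) for row in observed.values())

# Marginal totals for each variable.
row_totals = {age: sum(row.values()) for age, row in observed.items()}
col_totals = {
    outcome: sum(row[outcome] for row in observed.values())
    for outcome in ("died", "survived")
}

# Under independence: expected count = n * P(age) * P(outcome) = row total * column total / n.
expected = {
    age: {outcome: row_totals[age] * col_totals[outcome] / n for outcome in col_totals}
    for age in observed
}

for age in observed:
    print(age, "observed deaths:", observed[age]["died"],
          "expected deaths:", round(expected[age]["died"], 2))
```

Large gaps between observed and expected cell counts are evidence against independence of age group and outcome.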

Conditional Probability

Conditional probability quantifies the likelihood of an event given that another event has occurred.

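Formula: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Application: Probability that a patient who died was a senior: $P(\text{senior} \mid \text{died}) = \frac{P(\text{senior} \cap \text{died})}{P(\text{died})} = \frac{19/773}{26/773} = \frac{19}{26} \approx 0.73$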
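
The direction of conditioning matters, and a short computation makes this concrete. The sketch below builds joint probabilities from the WNV counts and conditions in both directions; the "survived" counts are derived as cases minus deaths (an assumption), and the helper `p` is a hypothetical name.

```python
from fractions import Fraction

# Joint counts by (age group, outcome); "survived" = cases - deaths (derived).
joint = {
    ("0-44", "died"): 2, ("0-44", "survived"): 238,
    ("45-64", "died"): 5, ("45-64", "survived"): 330,
    ("65-100", "died"): 19, ("65-100", "survived"): 179,
}
n = sum(joint.values())

def p(predicate):
    """Probability that a randomly chosen case satisfies predicate(age, outcome)."""
    return Fraction(sum(count for key, count in joint.items() if predicate(*key)), n)

p_senior_and_died = p(lambda age, outcome: age == "65-100" and outcome == "died")
p_died = p(lambda age, outcome: outcome == "died")
p_senior = p(lambda age, outcome: age == "65-100")

# Conditional probability = joint probability / probability of the conditioning event.
p_senior_given_died = p_senior_and_died / p_died    # among deaths, share who were seniors
p_died_given_senior = p_senior_and_died / p_senior  # among seniors, share who died

print("P(senior | died) =", p_senior_given_died, "~", round(float(p_senior_given_died), 2))
print("P(died | senior) =", p_died_given_senior, "~", round(float(p_died_given_senior), 2))
```

The two results answer different questions: most patients who died were seniors, yet most senior patients did not die.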

Summary Table: Common Distributions

Distribution	Type	Key Properties	Example
Normal	Continuous	Symmetric, bell-shaped	Height
Log-normal	Continuous	Right-skewed, positive values	Concentration
Uniform	Discrete/Continuous	Equal probability for all values	Random digits
Poisson	Discrete	Counts of events	Mutations
Multinomial	Discrete	Multiple categories	Fruit type
Weibull	Continuous	Skewness set by shape parameter, positive values	Survival time

Key Takeaways

Not all biological data follow a normal distribution; choose statistical tests and null hypotheses accordingly.
Understanding the type of random variable and distribution is crucial for proper statistical analysis.
Probability rules and joint/conditional distributions are foundational for analyzing categorical and count data.
