Null Models and Probability Distributions in Biological Data

Null Models and Probability Distributions

Introduction to Null Models

Null models are statistical models that describe expected patterns in data under a hypothesis of randomness or no effect. They are essential for testing whether observed data deviate significantly from what would be expected by chance.

  • Null Model: A model assuming no relationship or effect, used as a baseline for comparison.

  • Application: Used in hypothesis testing to determine if observed results are statistically significant.
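
As a rough illustration of the idea, a null model can be simulated directly. The sketch below assumes a hypothetical observation (18 sixes in 60 rolls of a die, not taken from these notes) and compares it with counts simulated under the null model of a fair die. The function name and numbers are illustrative only.

```python
import random

def simulate_null_counts(n_rolls, n_sims, seed=0):
    """Simulate the number of sixes in n_rolls of a fair die, n_sims times."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
        counts.append(sixes)
    return counts

# Hypothetical observation (an assumption for this sketch): 18 sixes in 60 rolls.
observed_sixes = 18
null_counts = simulate_null_counts(n_rolls=60, n_sims=5000)

# One-sided Monte Carlo p-value: share of null simulations at least as extreme
# as the observed count.
p_value = sum(c >= observed_sixes for c in null_counts) / len(null_counts)

print(f"Expected sixes under the null model: {60 / 6:.1f}")
print(f"Estimated P(count >= {observed_sixes} | fair die) = {p_value:.4f}")
```

A small estimated probability suggests the observed count would be unusual if the null model were true.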

Probability Distributions in Biological Data

Types of Random Variables

Random variables are quantities whose values result from random phenomena. They are classified as discrete or continuous, each with distinct properties and applications.

  • Discrete Random Variables:

    • Sample space: A countable set of possible outcomes (often fewer than 20 in practice).

    • Probability can be determined empirically for each outcome.

    • Examples: Number of siblings, rolling a die.

  • Continuous Random Variables:

    • Sample space: Uncountably many possible values within a range.

    • Probability density curve can be determined empirically.

    • Probability found by integration under the curve.

    • Examples: Height, concentration of a substance.
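
The contrast between the two types can be sketched in a few lines. The example below estimates the probability of each die face from relative frequencies (discrete), then approximates a probability for a continuous variable as the area under a density curve. The height model (mean 170 cm, standard deviation 10 cm) and the helper `area_under_pdf` are assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import NormalDist

rng = random.Random(1)

# Discrete: estimate P(X = k) for each face of a die from relative frequencies.
rolls = [rng.randint(1, 6) for _ in range(6000)]
freq = Counter(rolls)
empirical_pmf = {face: freq[face] / len(rolls) for face in range(1, 7)}

# Continuous: probability is the area under the density curve.
# Hypothetical height model (assumed): mean 170 cm, standard deviation 10 cm.
height = NormalDist(mu=170, sigma=10)

def area_under_pdf(dist, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of dist.pdf over [a, b]."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

p_numeric = area_under_pdf(height, 160, 180)
p_exact = height.cdf(180) - height.cdf(160)

print(empirical_pmf)
print(f"P(160 <= height <= 180): numeric {p_numeric:.4f}, exact {p_exact:.4f}")
```

For a continuous variable the probability of any single exact value is zero, which is why the calculation is done over an interval.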

Common Distributions in Biological Data

Biological data can follow various probability distributions, each with unique characteristics and applications.

  • Normal Distribution (Bell-shaped):

    • Continuous, symmetric, defined by mean ($\mu$) and standard deviation ($\sigma$).

    • Formula: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

    • Example: Heights, measurement errors.

  • Log-normal Distribution (Right-skewed):

    • Continuous, but only positive values.

    • Taking the logarithm of data yields a normal distribution.

    • Example: Concentrations, latency periods.

  • Uniform Distribution:

    • All values within a specified range have equal probability.

    • Discrete: Random digits, rolling a die.

    • Continuous: Random number generation in experiments.

  • Poisson Distribution:

    • Discrete, models count of events in a fixed interval.

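    • Formula: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$, where $\lambda$ is the mean number of events per interval.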

    • Example: Number of mutations, arrivals.

  • Multinomial Distribution:

    • Discrete, more than two categories.

    • Each individual falls into category $i$ with probability $p_i$, where the $p_i$ sum to 1.

    • Example: Year of study, type of fruit.

  • Weibull Distribution:

    • Continuous, only positive values.

    • Shape parameter controls skewness; scale parameter defines spread.

    • Example: Age distribution, survival times.
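
A few of these distributions can be checked numerically with the standard library. The sketch below writes the standard normal density and Poisson probability formulas as plain functions and illustrates the log-normal property that taking logarithms gives roughly symmetric data. The parameter values and sample size are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean, median

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# The hand-written density agrees with the standard library implementation.
pdf_gap = abs(normal_pdf(1.3, 0.5, 2.0) - NormalDist(0.5, 2.0).pdf(1.3))

# Poisson probabilities over all counts sum to 1 (terms beyond k = 49 are negligible).
poisson_total = sum(poisson_pmf(k, 3.0) for k in range(50))

# Log-normal sample: right-skewed on the raw scale, roughly symmetric after log.
rng = random.Random(2)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]
logs = [math.log(x) for x in sample]

print(f"raw scale: mean {mean(sample):.2f} vs median {median(sample):.2f}")
print(f"log scale: mean {mean(logs):.2f} vs median {median(logs):.2f}")
```

On the raw scale the mean sits well above the median, the usual signature of right skew; on the log scale the two nearly coincide.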

Probability Rules and Sample Space

Basic Probability Rules

Probability theory provides rules for calculating the likelihood of events in a sample space.

  • Probabilities must be between 0 and 1.

  • Sum of probabilities in sample space must equal 1.

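  • Complement Rule: $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of event $A$.

  • Addition Rule (Disjoint Events): $P(A \cup B) = P(A) + P(B)$ if $A$ and $B$ are mutually exclusive.

  • Addition Rule (Non-disjoint Events): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

  • Multiplication Rule (Independent Events): $P(A \cap B) = P(A)\,P(B)$

  • Multiplication Rule (Dependent Events): $P(A \cap B) = P(A)\,P(B \mid A)$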
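
These rules can be verified on a small sample space. The sketch below uses one roll of a fair die with exact fractions; the events (`even`, `five`, `low`, `high`) and helper names are chosen purely for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = frozenset(range(1, 7))

def prob(event):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(b, a):
    """P(B | A) = P(A and B) / P(A)."""
    return prob(a & b) / prob(a)

even = frozenset({2, 4, 6})
five = frozenset({5})        # disjoint from `even`
low = frozenset({1, 2})      # independent of `even`
high = frozenset({4, 5, 6})  # not independent of `even`

checks = {
    "complement": prob(sample_space - even) == 1 - prob(even),
    "addition_disjoint": prob(even | five) == prob(even) + prob(five),
    "addition_general": prob(even | high) == prob(even) + prob(high) - prob(even & high),
    "multiplication_independent": prob(even & low) == prob(even) * prob(low),
    "multiplication_dependent": prob(even & high) == prob(even) * cond_prob(high, even),
}

for rule, holds in checks.items():
    print(f"{rule}: {holds}")
```

Note that `even` and `high` are dependent: P(even and high) is 1/3, whereas P(even) times P(high) is 1/4, so only the conditional form of the multiplication rule applies to them.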

Sample Space and Categories

The sample space is the set of all possible outcomes of a random phenomenon. Categories must be exhaustive and mutually exclusive.

  • Exhaustive: All possible values are covered.

  • Mutually Exclusive: Each case falls into only one category.

  • Relative Frequency: The probability of an event is approximated by the relative frequency of its category in large samples.

Categorical Data and Random Variables

Handling Categorical Data

Categorical data are classified into distinct groups or categories. Random variables can represent these categories, and probability rules apply to their frequencies.

  • Sampling Unit: The object or individual being sampled (e.g., piece of fruit).

  • Sample Space: List of possible outcomes for the random variable.

  • Count: Frequency of each category.

  • Relative Frequency: Proportion of each category in the sample.
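
The terms above map onto a short tabulation. The sketch below assumes a hypothetical sample of ten pieces of fruit (invented for illustration), checks that every observation falls in the declared sample space, and then computes counts and relative frequencies.

```python
from collections import Counter

# Hypothetical sample space and observations (assumed for this sketch).
sample_space = {"apple", "banana", "cherry"}
observations = [
    "apple", "banana", "apple", "cherry", "apple",
    "banana", "apple", "cherry", "banana", "apple",
]  # each element is one sampling unit (one piece of fruit)

# Exhaustive: every observation must belong to a listed category.
unexpected = set(observations) - sample_space
if unexpected:
    raise ValueError(f"Categories missing from sample space: {unexpected}")

# Count: frequency of each category. Each unit is recorded exactly once,
# so the categories are mutually exclusive by construction.
counts = Counter(observations)
n = len(observations)

# Relative frequency: proportion of each category in the sample.
rel_freq = {cat: counts[cat] / n for cat in sorted(sample_space)}

for cat in sorted(sample_space):
    print(f"{cat:>7}: count {counts[cat]}, relative frequency {rel_freq[cat]:.2f}")
```

Because the categories are exhaustive and mutually exclusive, the relative frequencies sum to 1.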

Application: West Nile Virus Case Study

Analyzing Disease Trends

Statistical analysis of disease data involves comparing observed counts to expected frequencies under null models. For West Nile Virus (WNV), age groups and outcomes (cases, deaths) are analyzed.

  • Scientific Question: Which age group is most likely to be diagnosed with WNV?

  • Observed Data: Cases and deaths by age group.

  • Expected Frequencies: Under the null model that every person is equally likely to be diagnosed regardless of age, expected case counts are proportional to each age group's share of the population.

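Example Table: WNV Cases and Population

| Age Group | Cases | Relative Frequency of Cases | Population Proportion (2010 Census) |
|-----------|-------|-----------------------------|-------------------------------------|
| 0-44      | 240   | 0.31                        | 0.727                               |
| 45-64     | 335   | 0.43                        | 0.187                               |
| 65-100    | 198   | 0.26                        | 0.086                               |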
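
The comparison between observed and expected counts can be sketched as follows, using the case counts and census proportions from the table. Expected counts are the total number of cases multiplied by each group's population proportion. The chi-square goodness-of-fit statistic is one common way to summarize the discrepancy; it is used here as an assumption, since the notes do not name a specific test.

```python
age_groups = ["0-44", "45-64", "65-100"]
observed = [240, 335, 198]            # cases per age group (from the table)
census_prop = [0.727, 0.187, 0.086]   # population proportions (from the table)

total = sum(observed)

# Null model: diagnosis is equally likely for everyone, so expected cases
# are proportional to each group's share of the population.
expected = [total * p for p in census_prop]

# Chi-square goodness-of-fit statistic (assumed choice of summary statistic).
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for group, o, e in zip(age_groups, observed, expected):
    print(f"{group:>6}: observed {o}, expected {e:.1f}")
print(f"chi-square = {chi_sq:.1f} on {len(observed) - 1} degrees of freedom")
```

With 2 degrees of freedom the 5% critical value is 5.991, so a statistic far above that indicates the observed age distribution of cases is not consistent with this null model.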

Example Table: WNV Deaths by Age Group

| Age Group | Cases | Deaths |
|-----------|-------|--------|
| 0-44      | 240   | 2      |
| 45-64     | 335   | 5      |
| 65-100    | 198   | 19     |

Interpreting Results

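  • Relative Risk: Seniors (65-100) are more likely to die from WNV than younger groups: 19 of 198 cases (about 9.6%) were fatal, versus 5 of 335 (about 1.5%) for ages 45-64 and 2 of 240 (about 0.8%) for ages 0-44.

  • Proportionality: If risk were equal across age groups, deaths would be proportional to each group's number of cases (or to its population frequency).

  • Conditional Probability: Probability that a patient who died was a senior: $P(\text{senior} \mid \text{died}) = 19/26 \approx 0.73$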
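
These quantities can be computed directly from the cases and deaths table. The sketch below is one way to do it; pooling the two younger groups into a single reference group for the relative risk is an assumption made here for illustration.

```python
cases = {"0-44": 240, "45-64": 335, "65-100": 198}
deaths = {"0-44": 2, "45-64": 5, "65-100": 19}

# Proportion of cases that were fatal in each age group.
fatality = {g: deaths[g] / cases[g] for g in cases}

# Relative risk of death for seniors versus all younger patients pooled
# (pooling the two younger groups is an illustrative assumption).
younger = ["0-44", "45-64"]
younger_fatality = sum(deaths[g] for g in younger) / sum(cases[g] for g in younger)
relative_risk = fatality["65-100"] / younger_fatality

# Conditional probabilities involving seniors.
p_senior_given_died = deaths["65-100"] / sum(deaths.values())
p_senior_given_case = cases["65-100"] / sum(cases.values())

for g in cases:
    print(f"{g:>6}: fatality proportion {fatality[g]:.3f}")
print(f"relative risk (seniors vs younger): {relative_risk:.1f}")
print(f"P(senior | died) = {p_senior_given_died:.2f}")
print(f"P(senior | case) = {p_senior_given_case:.2f}")
```

If the risk of death were equal across age groups, P(senior | died) would be close to P(senior | case); the gap between the two is what signals elevated risk among seniors.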

Joint and Conditional Distributions

Joint Distributions

Joint distributions describe the probability of combinations of two categorical variables (e.g., age group and death status).

  • Joint Probability: $P(A \cap B)$ is the probability that both events occur.

  • Independence: If $P(A \cap B) = P(A)\,P(B)$, events $A$ and $B$ are independent.

  • Expected Frequencies: If independent, the expected frequency in each cell is the product of the marginal probabilities multiplied by the total sample size.
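
A joint table for age group and death status can be built from the WNV counts. The sketch below assumes that every case not recorded as a death counts as "survived" (an assumption, since the notes only list cases and deaths), then compares the observed counts with the counts expected if age group and outcome were independent.

```python
cases = {"0-44": 240, "45-64": 335, "65-100": 198}
deaths = {"0-44": 2, "45-64": 5, "65-100": 19}

# Two-way table of counts; "survived" = cases - deaths is an assumption.
table = {g: {"died": deaths[g], "survived": cases[g] - deaths[g]} for g in cases}
outcomes = ("died", "survived")
n = sum(sum(row.values()) for row in table.values())

# Marginal probabilities for each variable.
p_age = {g: sum(table[g].values()) / n for g in table}
p_outcome = {o: sum(table[g][o] for g in table) / n for o in outcomes}

# Joint probabilities: P(age group and outcome).
joint = {g: {o: table[g][o] / n for o in outcomes} for g in table}

# Expected counts under independence: n * P(age group) * P(outcome).
expected = {g: {o: n * p_age[g] * p_outcome[o] for o in outcomes} for g in table}

for g in table:
    print(
        f"{g:>6}: observed deaths {table[g]['died']:>2}, "
        f"expected under independence {expected[g]['died']:.1f}"
    )
```

Large gaps between observed and expected cell counts suggest that the two variables are not independent.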

Conditional Probability

Conditional probability quantifies the likelihood of an event given that another event has occurred.

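  • Formula: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$.

  • Application: Probability that a patient who died was a senior: $P(\text{senior} \mid \text{died}) = \frac{P(\text{senior} \cap \text{died})}{P(\text{died})} = \frac{19/773}{26/773} = \frac{19}{26} \approx 0.73$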

Summary Table: Common Distributions

| Distribution | Type                | Key Properties                                     | Example        |
|--------------|---------------------|----------------------------------------------------|----------------|
| Normal       | Continuous          | Symmetric, bell-shaped                             | Height         |
| Log-normal   | Continuous          | Right-skewed, positive values                      | Concentration  |
| Uniform      | Discrete/Continuous | Equal probability for all values                   | Random digits  |
| Poisson      | Discrete            | Counts of events                                   | Mutations      |
| Multinomial  | Discrete            | Multiple categories                                | Fruit type     |
| Weibull      | Continuous          | Shape parameter controls skewness, positive values | Survival time  |

Key Takeaways

  • Not all biological data follow a normal distribution; choose statistical tests and null hypotheses accordingly.

  • Understanding the type of random variable and distribution is crucial for proper statistical analysis.

  • Probability rules and joint/conditional distributions are foundational for analyzing categorical and count data.

Additional info: Some content inferred from context and standard statistics curriculum, such as formulas and definitions for distributions and probability rules.
