
Genetic Approaches to Identifying Risk Alleles for Complex Disease

Study Guide - Smart Notes

Strategies to Identify Risk Alleles for Complex Disease

Family-Based Linkage Analysis

Family-based linkage analysis is a classical genetic approach used to map genes associated with disease by studying their inheritance patterns within families. This method is particularly useful for identifying rare, highly penetrant alleles.

  • Definition: Linkage analysis examines the co-segregation of genetic markers with disease phenotypes in families to identify chromosomal regions likely to contain disease-causing genes.

  • Pedigree Analysis: Pedigrees are constructed to track the inheritance of alleles and disease status across generations.

  • Identity by Descent (IBD): Affected siblings are analyzed for shared chromosomal segments inherited from a common ancestor, indicating potential linkage to disease loci.

  • Example: In a pedigree, if affected siblings share a particular marker allele more often than expected by chance, the chromosomal region around that marker is likely linked to the disease.

Population Association Studies

Population association studies, including genome-wide association studies (GWAS), compare allele frequencies between affected and unaffected individuals in a population to identify genetic variants associated with disease risk.

  • Case-Control Design: Individuals with the disease (cases) are compared to those without (controls) to detect differences in allele frequencies.

  • Statistical Association: A statistically significant difference in allele frequency suggests that the allele contributes to disease susceptibility, or that it is in linkage disequilibrium with a variant that does.

  • Example: If allele 'b' is more frequent in cases than controls, it may be a risk allele for the disease.
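The case-control comparison can be sketched as a chi-square test on a 2x2 table of allele counts. This is a minimal illustration, not a full GWAS pipeline: the function name and the counts are hypothetical, and only the Python standard library is used.

```python
import math

def allelic_chi_square(case_b, case_B, ctrl_b, ctrl_B):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alleles 'b' and 'B'.
    Returns (chi-square statistic, P value, odds ratio for allele 'b').
    """
    table = [[case_b, case_B], [ctrl_b, ctrl_B]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_b * ctrl_B) / (case_B * ctrl_b)
    return chi2, p_value, odds_ratio

# Hypothetical data: 1000 cases and 1000 controls, i.e. 2000 alleles per group.
chi2, p, odds = allelic_chi_square(case_b=700, case_B=1300, ctrl_b=600, ctrl_B=1400)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}, odds ratio = {odds:.2f}")
```

An odds ratio above 1 with a small P value is the pattern described in the example: allele 'b' is enriched in cases. The function assumes every cell count is greater than zero.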

Genetic Models for Complex Traits

Two Loci Controlling a Trait with Environmental Variation

Complex traits are often influenced by multiple genetic loci and environmental factors. The combined effect of two loci can produce a range of phenotypes, often following a normal distribution due to environmental variation.

  • Genotype Combinations: For two loci (A/a and B/b), there are nine possible genotype combinations (e.g., AABB, AaBb, aabb).

  • Phenotypic Distribution: Each genotype combination contributes to the overall phenotypic variation, which is further broadened by environmental effects.

  • Example: The distribution of a quantitative trait (e.g., height) in a population can be modeled as the sum of genetic and environmental influences.
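The two-locus model can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the notes: each capital allele adds one unit to the trait (purely additive effects), both loci have allele frequency 0.5, genotypes are in Hardy-Weinberg proportions, and the loci are independent. All names are hypothetical.

```python
import itertools
import random
import statistics

# Number of trait-increasing (capital) alleles in each single-locus genotype.
LOCUS_A = {"AA": 2, "Aa": 1, "aa": 0}
LOCUS_B = {"BB": 2, "Bb": 1, "bb": 0}

def genotypic_values(effect_per_allele=1.0):
    """Additive genetic value of each of the nine two-locus genotypes."""
    return {
        a + b: effect_per_allele * (LOCUS_A[a] + LOCUS_B[b])
        for a, b in itertools.product(LOCUS_A, LOCUS_B)
    }

def simulate_phenotypes(n, env_sd=1.0, seed=0):
    """Draw n phenotypes as genetic value plus a normal environmental deviation."""
    rng = random.Random(seed)
    values = genotypic_values()
    # Single-locus genotype frequencies at allele frequency 0.5 (assumption).
    single = {2: 0.25, 1: 0.5, 0: 0.25}
    genotypes = list(values)
    weights = [single[LOCUS_A[g[:2]]] * single[LOCUS_B[g[2:]]] for g in genotypes]
    drawn = rng.choices(genotypes, weights=weights, k=n)
    return [values[g] + rng.gauss(0.0, env_sd) for g in drawn]

print(genotypic_values())  # nine genotypes, five distinct genetic values
phenotypes = simulate_phenotypes(10_000)
print(round(statistics.mean(phenotypes), 2), round(statistics.stdev(phenotypes), 2))
```

With env_sd set to 0 the phenotypes fall into five discrete classes; increasing it blurs the classes into a continuous, approximately bell-shaped distribution.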

The Threshold Model for Discrete Diseases

The threshold model explains how complex diseases with discrete outcomes (affected/unaffected) can arise from the combined effect of multiple genetic and environmental factors.

  • Liability: An underlying continuous variable (liability) represents the combined genetic and environmental risk.

  • Threshold: Individuals whose liability exceeds a certain threshold develop the disease.

  • Frequency Distribution: The model uses a normal distribution to represent liability in the population, with only those above the threshold being affected.

  • Equation: $P(\text{affected}) = \int_{T}^{\infty} f(x)\,dx$, where $T$ is the threshold and $f(x)$ is the liability distribution.
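As a worked illustration of the threshold model, disease prevalence is the area of the liability distribution above the threshold. The sketch assumes liability is normally distributed with mean 0 and standard deviation 1 unless stated otherwise; the function names are hypothetical.

```python
from statistics import NormalDist

def prevalence(threshold, mean=0.0, sd=1.0):
    """Fraction of the population whose liability exceeds the threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)

def threshold_for_prevalence(k, mean=0.0, sd=1.0):
    """Threshold that leaves a fraction k of the population above it."""
    return NormalDist(mean, sd).inv_cdf(1.0 - k)

# A disease affecting 1% of the population corresponds to a threshold
# a little over 2.3 standard deviations above the mean liability.
t = threshold_for_prevalence(0.01)
print(f"threshold = {t:.2f} SD")

# A group whose mean liability is shifted upward (for example, relatives of
# affected individuals) has a larger fraction above the same threshold.
print(f"prevalence at mean 0.0: {prevalence(t):.3f}")
print(f"prevalence at mean 0.5: {prevalence(t, mean=0.5):.3f}")
```

The second comparison shows why a modest shift in mean liability can produce a large relative increase in the proportion affected when the threshold sits far in the tail.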

Power in Association Studies

The Effect of Numbers, Effect Size, and Allele Frequency

The statistical power of an association study depends on the sample size, the effect size of the risk allele (measured as genotype relative risk, GRR), and the frequency of the risk allele in the population.

  • Genotype Relative Risk (GRR): The ratio of disease risk in individuals carrying a risk genotype to the risk in individuals without it; a GRR of 1 means the allele has no effect.

  • Power Curves: Power is highest when the risk allele frequency is intermediate and the effect size is large.

  • Equation: $\text{Power} = 1 - \beta$, where $\beta$ is the probability of a Type II error (failing to detect a true association).

  • Example: Studies with higher GRR and larger sample sizes have greater power to detect associations.
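The dependence of power on sample size, GRR, and allele frequency can be sketched with a normal approximation to a test comparing allele frequencies in cases and controls. The sketch rests on assumptions not stated in the notes: a multiplicative model (each risk allele multiplies risk by the GRR), a rare disease so that controls carry the population allele frequency, and a default significance level of 5e-8, the conventional genome-wide threshold. Function names are hypothetical.

```python
from statistics import NormalDist

def case_allele_frequency(p, grr):
    """Risk-allele frequency in cases under a multiplicative model (assumption)."""
    return p * grr / (p * grr + (1.0 - p))

def allelic_test_power(n_cases, n_controls, p, grr, alpha=5e-8):
    """Approximate power (1 - beta) of a two-sided test on allele frequencies."""
    std_normal = NormalDist()
    p_case = case_allele_frequency(p, grr)
    p_ctrl = p  # assumes controls reflect the population frequency
    # Each individual contributes two alleles.
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_ctrl * (1 - p_ctrl) / (2 * n_controls)) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_ctrl) / se
    # The negligible contribution of the opposite tail is ignored.
    return std_normal.cdf(shift - z_crit)

for n in (1000, 2000, 5000):
    for freq in (0.02, 0.3):
        power = allelic_test_power(n, n, p=freq, grr=1.3)
        print(f"n = {n:5d} per group, allele freq = {freq:.2f}: power = {power:.3f}")
```

Running the loop shows the pattern described above: for the same GRR, power rises with sample size and is much higher at an intermediate allele frequency than at a rare one.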

Identity by Descent (IBD) in Affected Sib Pairs

IBD analysis in affected sib pairs is used to determine whether siblings with the same disease share genetic segments inherited from a common ancestor, which can indicate linkage to disease loci.

  • IBD Sharing: Siblings can share 0, 1, or 2 alleles IBD at a locus.

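  • Interpretation: With no linkage, sib pairs share 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, and 1/4, so on average they share 50% of their alleles at a locus. If affected sib pairs share significantly more than 50%, this suggests linkage to a disease gene.

  • Example: In a diagram, IBD (1-share) means both siblings inherited the same allele from one parent; non-IBD (0-share) means they inherited different alleles from each parent.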
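The expected IBD sharing of a sib pair under no linkage can be derived by enumerating every way two parents can transmit alleles to two children. The sketch below does that enumeration and then computes a simple z statistic for excess sharing in a sample of affected sib pairs. The statistic shown is one common "mean sharing" test, used here as an illustration rather than as the method of any particular study; the names and counts are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def ibd_distribution():
    """Probability that two sibs share 0, 1 or 2 alleles IBD under no linkage."""
    counts = Counter()
    # Each parent passes one of two alleles (0 or 1) to each sib;
    # all 16 transmission patterns are equally likely.
    for sib1_pat, sib1_mat, sib2_pat, sib2_mat in product((0, 1), repeat=4):
        shared = int(sib1_pat == sib2_pat) + int(sib1_mat == sib2_mat)
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}

def mean_sharing_z(n0, n1, n2):
    """z statistic for excess IBD sharing, given counts of pairs sharing 0, 1, 2."""
    n = n0 + n1 + n2
    mean_prop = (n1 + 2 * n2) / (2 * n)  # proportion of alleles shared IBD
    # Under no linkage the per-pair proportion has mean 1/2 and variance 1/8.
    return (mean_prop - 0.5) / (1 / (8 * n)) ** 0.5

print(ibd_distribution())
print(round(mean_sharing_z(n0=10, n1=50, n2=40), 2))  # hypothetical counts
```

A clearly positive z value means affected pairs share more alleles IBD at the marker than the 50% expected by chance, which is the signal of linkage.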

HLA Locus and Disease Association

Structure and Function of the HLA Region

The human leukocyte antigen (HLA) region on chromosome 6 is highly polymorphic and plays a critical role in immune function. It is divided into three classes (I, II, III), each with distinct genes and functions.

  • HLA Class I: Expressed on nearly all nucleated cells; presents endogenous (intracellular) peptides to CD8+ T cells.

  • HLA Class II: Expressed on antigen-presenting cells; presents exogenous (extracellular) peptides to CD4+ T cells.

  • HLA Class III: Contains genes for complement components and other immune-related proteins.

  • Example: Certain HLA alleles are associated with increased risk for autoimmune diseases.

Linkage Disequilibrium and Haplotype Blocks

Linkage Disequilibrium (LD)

LD refers to the non-random association of alleles at different loci. It is a key concept in mapping disease genes using population data.

  • Founder Effect: A mutation in a founder population can be inherited along with neighboring alleles, creating a region of LD.

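  • LD Mapping: A genotyped marker in strong LD with an untyped disease variant will itself show association, so the region of high LD around an associated marker indicates where the disease-associated variant is likely to lie.

  • Example: A risk haplotype (e.g., aBc) is found in a chromosomal region with strong LD in the present population.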
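LD between two loci is usually quantified with the coefficient D and its normalized forms D' and r². The sketch below computes all three from haplotype and allele frequencies for two biallelic loci. The function name and the example frequencies are hypothetical, and both loci are assumed to be polymorphic (allele frequencies strictly between 0 and 1).

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    # D is the excess of the observed haplotype frequency over the frequency
    # expected if alleles at the two loci were associated at random.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Random association: the haplotype frequency equals the product of allele frequencies.
print(ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5))
# A rare allele A that occurs only on chromosomes carrying B.
print(ld_measures(p_ab=0.10, p_a=0.10, p_b=0.5))
```

In the second case D' is at its maximum because allele A is never seen without B, yet r² is low because the allele frequencies differ, which is why the two measures are usually reported together.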

Haplotype Blocks

Haplotype blocks are regions of the genome where genetic variation is limited and alleles are inherited together. These blocks can be identified using LD patterns.

  • Definition: A haplotype is a set of alleles at multiple loci that are transmitted together on the same chromosome.

  • Block Structure: The genome is organized into blocks of high LD separated by short regions where recombination is frequent (recombination hotspots).

  • Example: At the RCA locus on chromosome 1q31.3, several haplotype blocks can be identified, each with limited diversity.

| Block   | Size (kb) | Genes Included |
|---------|-----------|----------------|
| Block 1 | 63        | ASPM, F13B     |
| Block 2 | 8         | CFHL5          |
| Block 3 | 50        | CFHL2, CFHL4   |
| Block 4 | 145       | CFHL1, CFH     |

Limited Chromosome Diversity in Haplotype Blocks

Within haplotype blocks, only a few common haplotypes are typically observed, reflecting limited recombination and historical mutation events.

  • Single Nucleotide Polymorphisms (SNPs): Variants within a block can be used to define haplotypes.

  • Example: Three major haplotypes may be present in a block, each defined by a unique combination of SNPs.
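Limited diversity within a block can be seen by tallying the distinct haplotypes among phased chromosomes. The sketch below uses made-up data: ten chromosomes typed at four SNPs, each chromosome written as a string of alleles. The function name and the sequences are hypothetical.

```python
from collections import Counter

def haplotype_frequencies(phased_chromosomes):
    """Frequency of each distinct haplotype, most common first."""
    counts = Counter(phased_chromosomes)
    n = len(phased_chromosomes)
    return {hap: count / n for hap, count in counts.most_common()}

# Hypothetical block of four biallelic SNPs typed on ten chromosomes.
chromosomes = ["ACGT"] * 5 + ["GCAT"] * 3 + ["ATGC"] * 2

freqs = haplotype_frequencies(chromosomes)
n_snps = len(chromosomes[0])
print(freqs)
print(f"{len(freqs)} haplotypes observed out of {2 ** n_snps} possible")
```

Four biallelic SNPs could in principle form 16 haplotypes; seeing only three is the signature of a block with little historical recombination.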

Statistical Analysis in Association Studies

Graphical Representation of P Values

P values are used to assess the statistical significance of associations between genetic variants and disease. Graphical methods such as Manhattan plots and Q-Q plots are commonly used to visualize results from GWAS.

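  • Manhattan Plot: Plots the negative base-10 logarithm of each variant's P value, -log10(P), against its genomic position, so that regions with significant associations stand out as peaks.

  • Q-Q Plot: Compares the observed distribution of P values with the distribution expected under the null hypothesis of no association. Points that rise above the diagonal only at the smallest P values suggest true associations, while deviation along the whole line suggests systematic bias such as population stratification.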

  • Example: Peaks in a Manhattan plot indicate loci with strong evidence for association with disease.
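The quantities drawn in both plots are simple transformations of the P values, sketched below without any plotting library. The threshold of 5e-8 is the conventional genome-wide significance level and is an assumption here, as are the function names and the example P values.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff, assumed for illustration

def neg_log10(p_values):
    """Height of each point on a Manhattan plot."""
    return [-math.log10(p) for p in p_values]

def qq_points(p_values):
    """(expected, observed) pairs of -log10 P for a Q-Q plot.

    Under the null hypothesis P values are uniform on (0, 1), so the i-th
    smallest of n values is expected to be close to (i + 0.5) / n.
    """
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i + 0.5) / n for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]

def significant_indices(p_values, threshold=GENOME_WIDE_THRESHOLD):
    """Indices of variants whose P value falls below the threshold."""
    return [i for i, p in enumerate(p_values) if p < threshold]

p_values = [1e-9, 0.2, 4e-8, 1e-3, 0.7]  # hypothetical results for five variants
print(neg_log10(p_values))
print(significant_indices(p_values))
print(qq_points(p_values))
```

Plotting the first list against genomic position gives the Manhattan plot, and plotting the pairs from qq_points against the diagonal gives the Q-Q plot.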

Genotype Calling and Quality Control

Cluster Plots for Genotype Calling

Cluster plots are used to assess the accuracy of genotype calling in high-throughput genotyping experiments. Each cluster represents a genotype (e.g., AA, AB, BB) based on signal intensity.

  • Quality Control: Well-separated clusters indicate reliable genotype calls, while overlapping clusters may suggest errors.

  • Example: Signal intensity plots are reviewed to ensure accurate assignment of genotypes to samples.
