Genotyping and Linkage Mapping
Genotyping
Overview
What is genotyping ?
The analysis of DNA-sequence variation
Genotype = the genetic constitution of an individual
How much biodiversity
1.7 to 2.0 million species; estimates range up to 10 million
Important Terms
Variation: any nucleotide change in the genome
Rare polymorphism: variation found in < 1% of the population
Polymorphism: variation found in ≥ 1% of the population
Locus: chromosomal location of a gene
Allele: alternative form of a gene or DNA sequence at a specific chromosomal location (locus)
Heterozygous: the two alleles at the locus of interest differ
Homozygous: the two alleles at the locus of interest are identical
Hemizygous: only one allele exists (e.g., X-linked loci in males)
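The definitions above can be turned into a small classifier. This is a minimal sketch: the function names `zygosity` and `variation_class` are hypothetical, and a genotype is assumed to be given simply as the tuple of alleles observed at one locus.

```python
def zygosity(alleles):
    """Classify a genotype from the alleles observed at one locus.

    `alleles` is a tuple such as ("A", "G"); a single-element tuple
    stands for a locus present in only one copy (e.g. X in males).
    """
    if len(alleles) == 1:
        return "hemizygous"
    if len(set(alleles)) == 1:
        return "homozygous"
    return "heterozygous"


def variation_class(population_frequency):
    """Apply the 1% population-frequency threshold from the definitions."""
    if population_frequency >= 0.01:
        return "polymorphism"
    return "rare polymorphism"


print(zygosity(("A", "G")))      # two different alleles
print(zygosity(("A", "A")))      # two identical alleles
print(zygosity(("A",)))          # one allele only
print(variation_class(0.004))    # below the 1% threshold
```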
What are the Types of Mutations / Polymorphisms to be Genotyped?
There are six major classes of genetic variation:
1. Single base changes
2. Simple di-, tri-, tetranucleotide repeats
3. Small insertions or deletions
4. Larger, tandem repeats
5. Multi-gene (Megabase) duplication (CNV)
6. Complex rearrangements
Classes of Mutation
An example of one simple question: How much variation is there?
What are the most Informative Classes for Genotyping Studies?
Polymorphism Type | Nickname | Heterozygosity
1. Single base changes | SNP | 1-50%
2. Simple di-, tri-, tetranucleotide repeats | STR (short tandem repeats) | 50-90%
3. Small insertions or deletions | INDELs (insertions or deletions) | 1-50%
4. Larger, tandem repeats | VNTR (variable number of tandem repeats) | 50-90%
5. Multi-gene (Megabase) duplication | CNV (copy number variation) | 1-50%
6. Complex rearrangements | (none) | 1-50%
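One way to see why two-allele markers top out near 50% heterozygosity while multi-allele repeats reach higher values is the expected heterozygosity H = 1 - Σ p_i². This formula is not given in the slides; it is a standard result that assumes random mating (Hardy-Weinberg proportions). The function name and the allele frequencies below are illustrative only.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), assuming random mating."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# A SNP with two equally common alleles: the best case for a biallelic marker.
snp = expected_heterozygosity([0.5, 0.5])

# A hypothetical STR with ten equally common repeat-length alleles.
str_marker = expected_heterozygosity([0.1] * 10)

print(f"SNP: {snp:.2f}, STR: {str_marker:.2f}")
```

With only two alleles H cannot exceed 0.5, whereas a marker with many alleles can approach 1.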
How many loci should be assayed?
Two strategies for selecting markers are possible:
• Select a few highly informative markers
• Select numerous, poorly informative markers randomly distributed within the genome
To scan whole genomes…
[Figure: "not like this" versus "but like this"; labels: microcentrifuge tube, 96-well plates, 384-well plates, Affymetrix GeneChip]
Setting up the reactions
[Figures: two "not like this" versus "but like this" comparisons; images not recoverable]
Applications enabled by HTP genotyping
[Figure: applications plotted by number of genotyped individuals (10 to 100,000) against number of genotyped loci (10 to 1,000,000). Labels: diagnostics, MAS, disease-related genes, domestication traits, bar coding, industrial protection of genotypes; plant and animal breeding for GWAS-selected traits, validation and candidate gene association; candidate region fine mapping; genome-wide association studies; diagnostics, fingerprinting, whole genome scans]
High Throughput genotyping techniques
Two main suppliers for GWA: Illumina and Affymetrix
[Figure: genotyping platforms plotted by number of genotyped individuals (10 to 100,000) against number of genotyped loci (10 to 1,000,000). Labels: genome-wide association studies; iPLEX Gold (Sequenom); SNPlex; AB GenPlex; TaqMan; Invader; SNaPshot; Pyroseq; VeraCode GoldenGate; Illumina GoldenGate; TaqMan assay OpenArrays; Illumina High-Density 1M-Duo chip; Affymetrix Genome-Wide Human SNP Array 6.0; Illumina iSelect Infinium BeadChips; Affymetrix Targeted GeneChips]
5 Basic Methodologies
1) Hybridization
   – Microarrays
   – TaqMan, Molecular Beacons
2) Allele-specific PCR
   – FRET
   – Intercalating Dyes
3) Primer Extension
   – MALDI-TOF (Matrix Assisted Laser Desorption/Ionization Time-of-flight mass spectrometry)
   – SNaPshot (Single nucleotide primer extension)
4) Ligation
   – Padlock Probes
   – Rolling Circle Amplification
5) Endonuclease Cleavage
   – RFLP
   – PIRA/RFL
RFLPs (Based on Endonuclease Cleavage)
Differences in DNA sequence generate different recognition sequences and DNA cleavage sites for specific restriction enzymes.
Two different genes will therefore produce different fragment patterns when cut with the same restriction enzyme.
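The RFLP principle can be illustrated with an in-silico digest. This is a sketch only: the `digest` function and the two toy sequences are hypothetical, the molecule is treated as linear, and only one strand is searched. The default site is the EcoRI recognition sequence GAATTC, which is cut after the first G.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site.

    `cut_offset` is the position of the cut within the recognition site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Two toy alleles that differ by one base inside the recognition site.
allele_1 = "ATGC" * 3 + "GAATTC" + "ATGC" * 5   # site present
allele_2 = "ATGC" * 3 + "GAGTTC" + "ATGC" * 5   # site lost through an A>G change

print(digest(allele_1))  # two fragments
print(digest(allele_2))  # one uncut fragment
```

The single base change removes the cleavage site, so the two alleles give different fragment patterns with the same enzyme.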
Microarray (Based on Hybridization)
Purpose: multiple simultaneous measurements by hybridization of labeled probe
DNA elements may be:
• Oligonucleotides
• cDNAs
• Large insert genomic clones
Microarray technologies
Microarray chip
Affymetrix 100k chip set: entire genome with 100,000 SNPs (low density)
Affymetrix 500k chip (SNP array 5.0): entire genome with 500,000 SNPs (high density)
Affymetrix 1M chip (SNP array 6.0): entire genome with 1,000,000 SNPs (very high density)
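Organization of a DNA microarray
[Figure: microarray chip; dimension label 1.28 cm]
Hybridization of a labeled probe to the microarray
Detection of hybridization on microarray
[Figure: light from laser scanning the array]
Hybridization intensities on DNA microarray following laser scanning
[Figure: allele A signal against allele B signal; genotype clusters BB (0), AB (0.5), AA (1)]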
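The cluster values in the figure (BB = 0, AB = 0.5, AA = 1) are consistent with calling genotypes from the fraction of signal coming from allele A. The sketch below assumes that reading; the function name, the cut-offs (0.25 and 0.75) and the minimum-signal threshold are illustrative choices, not values from any vendor's calling software.

```python
def call_genotype(intensity_a, intensity_b, low=0.25, high=0.75, min_signal=100.0):
    """Call AA / AB / BB from two allele-specific hybridization intensities.

    The ratio A / (A + B) is close to 1 for AA, 0.5 for AB and 0 for BB.
    Spots with too little total signal are left uncalled.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "no call"
    ratio = intensity_a / total
    if ratio >= high:
        return "AA"
    if ratio <= low:
        return "BB"
    return "AB"


# Hypothetical intensity pairs (allele A, allele B) for four samples.
for a, b in [(950, 50), (500, 480), (30, 900), (10, 20)]:
    print(a, b, call_genotype(a, b))
```

Because most SNPs have only two alleles, three clusters are enough, which is what makes automated scoring straightforward.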
SNPs
Single Nucleotide Polymorphisms
Change one nucleotide:
• Insert
• Delete
• Replace it with a different nucleotide
Many have no phenotypic effect
Some can disrupt or affect gene function
SNP genotyping methods
Over 100 different approaches
Ideal SNP genotyping platform:
• high-throughput capacity
• simple assay design
• robust
• affordable price
• automated genotype calling
• accurate and reliable results
Overview of SNP array technology
A little more on SNPs
Most SNPs have only two alleles
Easy to automate their scoring
Becoming extremely popular
Typing Methods
• Sequencing
• Restriction site
• Hybridization
Linkage Mapping
Overview
Types of Maps
Physical Maps: complete or partially sequenced organisms
Cytogenetic Maps: breakpoints in disease; direct binding of probes to chromosome
Genetic Linkage Maps: markers
What happens in meiosis…
Leads to formation of haploid gametes from diploid cells
Assortment of genetic loci
Recombination or crossover
What is Linkage?
Linkage is defined genetically: the failure of two genes to assort independently.
Linkage occurs when two genes are close to each other on the same chromosome.
Two genes on the same chromosome are called syntenic, whether or not they are linked.
Linked genes are syntenic, but syntenic genes are not always linked. Genes far apart on the same chromosome assort independently: they are not linked.
Linkage is based on the frequency of crossing over between the two genes.
Crossing over occurs in prophase of meiosis I, when homologous chromosomes break at identical locations and rejoin with each other.
Applications/Uses of Linkage Maps
• Studying genome structure, organization and evolution
• Estimation of gene effects of important agronomic traits
• Tagging genes of interest to facilitate marker assisted selection (MAS) programs
• Map based cloning
• Identifying genes responsible for traits in plants or animals: disease resistance, meat or milk production, etc.
Genetic Linkage Mapping Steps
1. Development of the mapping population
2. Genotyping of the mapping population (selection of suitable molecular markers, MM)
3. Linkage analysis
4. Map construction
5. QTL identification (in the case of QTL mapping)
6. Marker-assisted selection
Development of The Mapping Population
Linkage analysis
Linkage: alleles from two loci segregate together in a family.
Recombination fraction (θ): the probability that recombination occurs between a marker and a susceptibility locus.
θ = 0.5: no linkage (the loci segregate independently)
θ < 0.5: the loci are linked
Reasons why alleles at different loci may not assort independently:
1. Chance
2. Preferential segregation (nonrandom segregation of nonhomologous chromosomes) - hinted at but not shown in humans
3. Linkage - the presence of loci measurably close together on the same chromosome
Types of Linkage Analysis
• Parametric LOD score
• Haseman-Elston sib-pair
• Affected sib-pair and affected relative pair
• Affected pedigree member method
• Variance components method
Recombination frequency
$$\theta = \frac{\text{total number of recombinants}}{\text{total number of recombinants} + \text{total number of non-recombinants}}$$
Parent: A B / a b
Gametes | Theta
50% non-rec and 50% rec | 0.5
90% non-rec and 10% rec | 0.1
99% non-rec and 1% rec | 0.01
100% non-rec | 0
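The recombination frequency is recombinants divided by all gametes scored, and the gamete proportions listed above can be checked directly. A minimal sketch; the function name `recombination_fraction` is hypothetical and the counts are per 100 gametes.

```python
def recombination_fraction(recombinants, non_recombinants):
    """theta = recombinants / (recombinants + non-recombinants)."""
    total = recombinants + non_recombinants
    if total == 0:
        raise ValueError("no gametes scored")
    return recombinants / total


# (recombinant, non-recombinant) counts per 100 gametes.
for rec, non_rec in [(50, 50), (10, 90), (1, 99), (0, 100)]:
    print(f"{non_rec}% non-rec, {rec}% rec -> theta = {recombination_fraction(rec, non_rec)}")
```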
In a double heterozygote:
Cis configuration = mutant alleles of both genes are on the same chromosome = ab/AB
Trans configuration = mutant alleles are on different homologues of the same chromosome = Ab/aB
Genes with recombination frequencies of less than 50 percent are on the same chromosome (= linked)
Linkage group = all known genes on a chromosome
Two genes that undergo independent assortment have a recombination frequency of 50 percent and are located on nonhomologous chromosomes or far apart on the same chromosome (= unlinked)
Recombination
Recombination between linked genes occurs at the same frequency whether alleles are in cis or trans configuration
Recombination frequency is specific for a particular pair of genes
Recombination frequency increases with increasing distances between genes
No matter how far apart two genes may be, the maximum frequency of recombination between any two genes is 50 percent.
• Cross-over frequencies can be converted into map units.
• Ex: A 5% cross-over frequency equals 5 map units.
  – gene A and gene B cross over 6.0 percent of the time
  – gene B and gene C cross over 12.5 percent of the time
  – gene A and gene C cross over 18.5 percent of the time
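The three cross-over frequencies in the example fix the gene order: the pair with the largest distance sits at the ends and the remaining gene lies between them. The sketch below assumes exactly three genes and treats 1% cross-over as 1 map unit; `order_three_genes` is a hypothetical helper.

```python
def order_three_genes(distances):
    """Infer the linear order of three genes from pairwise map distances.

    `distances` maps frozenset({gene1, gene2}) to map units.
    The two genes that are farthest apart are the outer genes.
    """
    outer = max(distances, key=distances.get)
    genes = set().union(*distances)
    (middle,) = genes - outer
    left, right = sorted(outer)
    return [left, middle, right]


distances = {
    frozenset({"A", "B"}): 6.0,
    frozenset({"B", "C"}): 12.5,
    frozenset({"A", "C"}): 18.5,
}

print(order_three_genes(distances))  # gene order along the chromosome
# The two short intervals add up to the long one: 6.0 + 12.5 = 18.5 map units.
print(distances[frozenset({"A", "B"})] + distances[frozenset({"B", "C"})])
```

In this example the distances are exactly additive; with real data the outer distance is usually somewhat smaller than the sum because double crossovers go undetected.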
Lod scores
1 cM ≈ 1 Mb (on average)
1 Mb = 1,000 kb
1 kb = 1,000 bp
so 1 cM ≈ 1,000,000 bp
Genetic Mapping
The map distance between two genes equals one half the average number of crossovers in that region per meiotic cell, expressed as a percentage (cM)
The recombination frequency between two genes indicates how much recombination is actually observed in a particular experiment
Over an interval so short that multiple crossovers are precluded (~10 percent recombination or less), the map distance equals the recombination frequency because all crossovers result in recombinant gametes
Genetic map = linkage map = chromosome map
Gene Mapping: Crossing Over
Crossovers which occur outside the region between two genes will not alter their arrangement
The result of double crossovers between two genes is indistinguishable from independent assortment of the genes
Crossovers involving three pairs of alleles specify gene order = linear sequence of genes
Genetic vs. Physical Distance
Map distances based on recombination frequencies are not a direct measurement of physical distance along a chromosome
Recombination “hot spots” overestimate physical length
Low rates in heterochromatin and centromeres underestimate actual physical length
Gene Mapping
Mapping function: the relation between genetic map distance and the frequency of recombination
Chromosome interference: crossovers in one region decrease the probability of a second crossover close by
Coefficient of coincidence = observed number of double recombinants divided by the expected number
Interference = 1 - coefficient of coincidence
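A worked example of the two definitions above. The expected number of double recombinants is taken as the product of the two single-interval recombination frequencies times the number of progeny, which assumes the two crossovers are independent; the function name and the cross data are hypothetical.

```python
def coincidence_and_interference(observed_doubles, total_progeny, rf_region_1, rf_region_2):
    """Return (coefficient of coincidence, interference) for a three-point cross."""
    # Expected doubles if crossovers in the two regions were independent.
    expected_doubles = rf_region_1 * rf_region_2 * total_progeny
    coincidence = observed_doubles / expected_doubles
    return coincidence, 1.0 - coincidence


# Hypothetical cross: 1000 progeny, 10% and 20% recombination in the two
# adjacent regions, 10 double recombinants observed (20 expected).
c, i = coincidence_and_interference(10, 1000, 0.10, 0.20)
print(f"coincidence = {c:.2f}, interference = {i:.2f}")
```

An interference of 0 means crossovers in the two regions occur independently; an interference of 1 means no double recombinants were observed at all.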
Genetic distance
Genetic distance = the genetic length over which one crossover occurs in 1% of meioses. This distance is expressed in centimorgans (cM).
1 cM = a recombination fraction of 0.01 ≈ 1 Mb of physical distance on average (assuming that the recombination frequency is uniform along the chromosomes)
Because double recombinants become more frequent the further apart two loci are, the frequency of recombination does not increase in proportion to distance.
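Mapping functions make this non-proportional relation explicit. The slides do not name a particular function; the sketch below uses two standard ones as examples: Haldane's function (no interference) and Kosambi's function (some interference), both taking map distance in cM and returning the expected recombination frequency.

```python
import math


def haldane_rf(distance_cm):
    """Haldane mapping function: theta = 0.5 * (1 - exp(-2d)), d in Morgans."""
    d = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi_rf(distance_cm):
    """Kosambi mapping function: theta = 0.5 * tanh(2d), d in Morgans."""
    d = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * d)


for cm in (1, 10, 50, 100, 200):
    print(f"{cm:>3} cM  Haldane {haldane_rf(cm):.3f}  Kosambi {kosambi_rf(cm):.3f}")
```

At short distances both functions give a recombination frequency close to the map distance (1 cM is about 1%), and at long distances both level off toward the 50 percent maximum.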
Linkage related Concepts
Interference - A crossover in one region usually decreases the probability of a crossover in an adjacent region.
CentiMorgan (cM) - 1 cM is the distance between genes for which the recombination frequency is 1%.
LOD score - the log10 of the odds that two loci are linked at a given recombination fraction rather than unlinked; used to test for linkage and to estimate the distance between genes.
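A LOD score compares the likelihood of the data at a candidate recombination fraction θ with the likelihood under no linkage (θ = 0.5), on a log10 scale. The sketch below covers only the simplest case, phase-known meioses that can each be scored as recombinant or non-recombinant; `lod_score` and the counts are illustrative.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known, fully scored meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = recombinants + non_recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = n * math.log10(0.5)
    return log_l_theta - log_l_unlinked


# Hypothetical family data: 1 recombinant among 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta = {theta:<4}  LOD = {lod_score(1, 9, theta):.2f}")
```

The θ with the highest LOD is the best estimate of the recombination fraction; by common convention a LOD of 3 or more is taken as evidence for linkage.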
Linkage vs. Association
Linkage analyses look for a relationship between a marker and disease within a family (could be a different marker in each family)
Association analyses look for a relationship between a marker and disease between families (must be the same marker in all families)
Thank you. Any questions?