Protein Evolution
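expected proportion of substitutions (K) between 2 sequences:
K = −ln(1−D)
◦ D = divergence (observed proportion of amino-acid differences)
◦ the logarithm corrects for multiple substitutions at the same site
• expected proportion of substitutions as a function of time: K = 2rt
with
r = rate of amino-acid substitution (the proportion of a.a.’s substituted per unit time)
t = time since the two sequences diverged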
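A minimal Python sketch of the correction K = −ln(1−D), assuming two aligned amino-acid sequences of equal length without gaps. The function name and the two example sequences are hypothetical and only serve as an illustration.

```python
import math

def poisson_corrected_distance(seq_a, seq_b):
    """Return (D, K) for two aligned, equal-length amino-acid sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    d = differences / len(seq_a)   # D: observed proportion of differing sites
    if d >= 1.0:
        raise ValueError("D = 1: the correction is undefined (saturated)")
    k = -math.log(1.0 - d)         # K = -ln(1 - D)
    return d, k

# Hypothetical example: 2 differences in 10 aligned sites
d, k = poisson_corrected_distance("MKTAYIAKQR", "MKTAYLAKQK")
print(f"D = {d:.3f}, K = {k:.3f}")
```

K is always slightly larger than D, and the gap widens as D grows, because more sites have been hit more than once.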
The Molecular Clock
• the rate of substitution r appears to be constant over time, so K increases linearly with time
◦ K: expected proportion of substitutions between 2 sequences
• once we know the rate of substitution for a given protein, we can use it to determine
the time of divergence of two species for which we have amino acid sequences of that protein
• molecular clock is constant with time, not with generations
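As a rough sketch of how the clock is used, K = 2rt can be rearranged to t = K / (2r). The rate and K value below are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not measurements from these notes.

```python
def divergence_time(k, rate):
    """Estimate divergence time t from K = 2 * r * t.

    k    -- expected proportion of substitutions between the two sequences
    rate -- substitutions per site per unit time (r), assumed constant
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)

# Hypothetical values: K = 0.24, r = 1.2e-9 substitutions per site per year
t_years = divergence_time(0.24, 1.2e-9)
print(f"estimated divergence time: {t_years:.3g} years")
```

The factor 2 is there because substitutions accumulate along both lineages after the split.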
Relative Rate Test
Clock Dispersion
DNA Evolution
• unlike proteins (20 amino acids), DNA has only 4 bases
◦ purines (double-ring chemical structure): A(denine), G(uanine)
◦ pyrimidines (single-ring chemical structure): C(ytosine), T(hymine)
• transitions (purine ↔ purine; pyrimidine ↔ pyrimidine)
◦ e.g. A ↔ G ; C ↔ T
• transversions (purine ↔ pyrimidine)
◦ e.g. A ↔ C, etc.
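The purine/pyrimidine rule above can be sketched in a few lines of Python. The function names and the example sequences are hypothetical; the sketch assumes aligned sequences containing only A, C, G and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(base_a, base_b):
    """Classify a pair of aligned bases as identical, transition or transversion."""
    a, b = base_a.upper(), base_b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError(f"unexpected base pair: {base_a!r}, {base_b!r}")
    if a == b:
        return "identical"
    # Same chemical class on both sides means a transition
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"

def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq_a, seq_b):
        kind = classify_substitution(a, b)
        if kind in counts:
            counts[kind] += 1
    return counts

print(count_ts_tv("AACGT", "GATGA"))
```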
DNA divergence correction (k = expected proportion of differences)
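The notes do not say which correction model is meant. As an illustration only, the sketch below assumes the Jukes-Cantor model, the simplest correction for 4 bases, in which K = −(3/4) ln(1 − (4/3) p) and p is the observed proportion of differing sites. The function name is hypothetical.

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

for p in (0.05, 0.30, 0.60):
    print(f"p = {p:.2f} -> K = {jukes_cantor_distance(p):.3f}")
```

The limit of 0.75 reflects that two random DNA sequences are expected to differ at 3 out of 4 sites, so an observed p near that value carries almost no information about the true number of substitutions.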
types of DNA:
protein-coding sequences (exons),
introns,
5’ and 3’ untranslated regions (UTRs),
5’ and 3’ flanking regions
Codons (in protein-coding sequences)
pseudogene: under no selective constraint
Synonymous/Nonsynonymous Substitutions
Codon Bias
a synonymous mutation can lead to improper splicing
→ synonymous mutations are not always neutral
Molecular Phylogenetics
• “tree building”: evolutionary relationships among organisms or genes using molecular
data (typically protein or DNA sequences) and statistical techniques
tree Nomenclature
◦ Additional notes:
OTU = Operational Taxonomic Unit. These are often (but not always!) species.
A monophyletic group is also known as a clade.
trees may also be drawn with the OTUs at the top, left, or right. It is the branching pattern (topology) that is important!
• species trees vs. gene trees
gene trees: evolutionary relationships based on single or groups of homologous genes
• homologous genes (homologs) = similar sequence due to common ancestry
◦ orthologous genes (orthologs): homologs due to speciation
◦ paralogous genes (paralogs): homologs due to gene duplication
• number of Possible Trees
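The number of possible trees grows very quickly with the number of OTUs. A small sketch, assuming strictly bifurcating trees: for n OTUs there are (2n−5)!! unrooted and (2n−3)!! rooted topologies, where !! is the double factorial over odd numbers. The function names are hypothetical.

```python
def odd_double_factorial(m):
    """Product m * (m - 2) * ... * 3 * 1 for odd m >= 1."""
    result = 1
    for k in range(m, 1, -2):
        result *= k
    return result

def n_unrooted_trees(n_otus):
    """Number of unrooted bifurcating topologies for n_otus >= 3."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return odd_double_factorial(2 * n_otus - 5)

def n_rooted_trees(n_otus):
    """Number of rooted bifurcating topologies for n_otus >= 2."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs")
    return odd_double_factorial(2 * n_otus - 3)

for n in (3, 4, 5, 10):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

With only 10 OTUs there are already more than two million unrooted topologies, which is why exhaustive searches are only feasible for small data sets.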
maximum parsimony
tree with the fewest sequence changes = minimum evolution
human - ape tree
favor tree 1 by parsimony
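The idea of choosing the tree with the fewest changes can be sketched with Fitch's counting algorithm, which gives the minimum number of changes a fixed topology requires. The toy alignment and the two candidate topologies below are hypothetical and are not the trees from the lecture; trees are written as nested 2-tuples of OTU names.

```python
def fitch_site_score(tree, states):
    """Minimum number of changes at one site for a bifurcating tree (Fitch)."""
    def visit(node):
        if isinstance(node, str):              # leaf: its observed state
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:                             # children agree: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]

def parsimony_score(tree, alignment):
    """Sum the Fitch score over all sites of an alignment (dict name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )

# Hypothetical toy alignment (3 sites)
alignment = {"human": "ACT", "chimp": "ACT", "gorilla": "GCT", "orangutan": "GCA"}
tree_1 = (("human", "chimp"), ("gorilla", "orangutan"))
tree_2 = (("human", "gorilla"), ("chimp", "orangutan"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

In this toy case tree_1 needs fewer changes than tree_2, so parsimony would favor tree_1.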
General Methods tree
• phenetics → group by overall similarity, all characters are considered
◦ allows paraphyletic groups
• cladistics → group by evolutionary relationship, use shared, derived characters
(“synapomorphies”)
◦ accepts only monophyletic groups (clades)
Specific Methods tree
Bootstrapping
• statistical method that can be applied to the above distance, parsimony, and ML methods
• bootstrap values are given for each node of a tree
• if the bootstrap percentage is high
◦ the node is supported by many different sites in the alignment
◦ we have greater confidence that it is correct
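A minimal sketch of the bootstrap idea: alignment columns are resampled with replacement, a tree is rebuilt from each pseudo-alignment, and the percentage of replicates that recover a given node is reported. Tree building itself is left out here; `has_node` is a hypothetical callback standing in for "build a tree and check whether the node is present", and the toy check below simply asks whether human and chimp are the closest pair.

```python
import random

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement (one bootstrap pseudo-alignment)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}

def bootstrap_support(alignment, has_node, replicates=1000, seed=0):
    """Percentage of bootstrap replicates in which has_node(pseudo_alignment) is True."""
    rng = random.Random(seed)
    hits = sum(bool(has_node(resample_columns(alignment, rng))) for _ in range(replicates))
    return 100.0 * hits / replicates

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def human_chimp_closest(aln):
    """Toy stand-in for tree building: are human and chimp the closest pair?"""
    hc = p_distance(aln["human"], aln["chimp"])
    return hc < p_distance(aln["human"], aln["gorilla"]) and \
           hc < p_distance(aln["chimp"], aln["gorilla"])

# Hypothetical toy alignment
toy = {"human": "ACGTACGTAC", "chimp": "ACGTACGTAT", "gorilla": "ACGAACCTGT"}
print(bootstrap_support(toy, human_chimp_closest, replicates=200, seed=1))
```

A node supported by many different sites keeps appearing no matter which columns are drawn, which is why a high percentage is read as strong support.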
Application of Molecular Phylogenetics
• example involves the domesticated dog
• shows incredible phenotypic diversity, but has a relatively short evolutionary history
fossil evidence
dog-like jaws, other bones: found in Europe (Germany) dating to 14,000 years ago
dog-like skeletons: found buried with human remains in Israel (12,000 years ago)
Molecular Studies
Domestic cats
• mitochondrial DNA from domesticated cats and wild cat species
• domestic cats have a Near Eastern origin
• most closely related to the Near Eastern wild cat
Testing the Neutral Theory
Tajima’s D - tests
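The notes only name the test, so the following is a sketch using the standard formulation of Tajima's D: it compares the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a1 = Σ 1/i for i = 1 … n−1, and divides by the estimated standard deviation of that difference. Function names and the toy sample are hypothetical; the sketch assumes aligned sequences without gaps and at least 4 sequences.

```python
import math
from itertools import combinations

def summarize(sequences):
    """Return (n, S, pi) for a list of aligned, equal-length sequences."""
    n = len(sequences)
    length = len(sequences[0])
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in sequences}) > 1)
    pair_diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(sequences, 2)]
    pi = sum(pair_diffs) / len(pair_diffs)     # mean pairwise differences
    return n, seg_sites, pi

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S and pairwise diversity pi."""
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)

# Hypothetical toy sample of 4 sequences
n, s, pi = summarize(["AAAA", "AAAT", "AATT", "ATTT"])
print(n, s, round(pi, 3), round(tajimas_d(n, s, pi), 3))
```

Negative values indicate an excess of rare variants, while positive values indicate an excess of intermediate-frequency variants, as expected under balancing selection.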
Hudson-Kreitman-Aguadé (HKA) test
McDonald-Kreitman (MK) test
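The MK test compares synonymous and nonsynonymous changes that are polymorphic within a species with those that are fixed between species. A sketch of the 2×2 summary, assuming the counts are already available: the neutrality index NI = (Pn/Ps)/(Dn/Ds) is expected to be about 1 under neutrality, and α = 1 − NI is a commonly used estimate of the fraction of nonsynonymous fixations driven by positive selection. The counts below are illustrative only, and a significance test on the table (for example Fisher's exact test) is not shown.

```python
def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn / Ps) / (Dn / Ds) from an MK table.

    pn, ps -- nonsynonymous / synonymous polymorphisms within species
    dn, ds -- nonsynonymous / synonymous fixed differences between species
    """
    if min(ps, dn, ds) <= 0:
        raise ValueError("Ps, Dn and Ds must be positive for NI to be defined")
    return (pn / ps) / (dn / ds)

def alpha(pn, ps, dn, ds):
    """Estimated fraction of adaptive nonsynonymous fixations, alpha = 1 - NI."""
    return 1.0 - neutrality_index(pn, ps, dn, ds)

# Illustrative counts: few nonsynonymous polymorphisms, many nonsynonymous fixations
ni = neutrality_index(pn=2, ps=42, dn=7, ds=17)
print(f"NI = {ni:.3f}, alpha = {alpha(2, 42, 7, 17):.3f}")
```

NI below 1 points to an excess of nonsynonymous fixed differences (positive selection), while NI above 1 points to an excess of nonsynonymous polymorphism (slightly deleterious variants).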
Haplotype tests
• distribution of segregating sites among alleles
• haplotype = unique sequence type
• Haplotype number test - tests the total number of haplotypes in a sample
• Hudson’s haplotype test - tests for subsamples with unusually low polymorphism
• problem: similar to Tajima’s D, hard to interpret reason for departures from neutrality
Genetic Hitchhiking and Selective Sweeps
• neutral theory predictions for polymorphism and divergence depend only on the effective population size and the mutation rate, but not on recombination
• experimental results indicate that there is a correlation between polymorphism and rate of recombination
• decoupling of polymorphism and divergence often leads to rejection of the neutral theory in regions of low recombination by the HKA
• genetic hitchhiking: one explanation for this deviation from neutrality
◦ positively selected mutation goes to fixation → all linked neutral mutations are carried to fixation with it
Background Selection
• alternative explanation to genetic hitchhiking
• also relies on the “hitchhiking” of neutral polymorphisms
• except here they are linked to deleterious mutations that are removed from the population by negative (purifying) selection
• this reduces the effective population size in regions of low recombination relative to those with normal recombination
Origin of humans
Origin of modern humans
Out of Africa model: accepted today
first ancient human hybrid: Denny (Denisovan father, Neanderthal mother)
Mitochondrial Eve
Genetic variation and the effective population size of humans
Geographic differentiation in humans
within subpopulations
Positive Selection in Humans
• identify genes or regions of the genome that have undergone recent positive selection
Unusual patterns of polymorphism
common inversion under selection in Europeans
Prion disease and human cannibalism
• Creutzfeldt-Jakob Disease (CJD) in humans; or Kuru
• Bovine Spongiform Encephalopathy (BSE) in cows (also known as “mad cow” disease)
• Polymorphism in the human prion protein gene (PRNP) is associated with CJD and Kuru
• strong balancing selection in the Fore population exposed to cannibalism
• Tajima’s D at PRNP is significantly positive (suggesting balancing selection)
Selective advantage of FGFR2 mutations in the male germline
CCR5 gene mutation
malaria
CCR5 mutation: prevents production of a functional CCR5 co-receptor on CD4+ T cells
The world of “-omics”
• Genomics
◦ functional Genomics: experimental identification of functional regions of genome
◦ comparative Genomics: comparison of genomes between species
◦ evolutionary Genomics: genomes change over time
Genomics
• The scale of genomes:
◦ 1,000 base pairs (bp) = 1 kilobase (kb); scale of individual genes
◦ 1,000 kb = 1 megabase (Mb); scale of bacterial genomes
◦ 1,000 Mb = 1 gigabase (Gb); scale of vertebrate genomes
C-value paradox:
C value = genome size
not strong correlation between organism complexity and genome size
sequencing
example: Drosophila melanogaster genome with Shotgun
Functional Genomics - Microarrays
• used for transcriptomics
• made by attaching many DNA sequences to a small surface
• each unique DNA sequence is placed in a “spot” of known location (1000 spots in array)
• amplify each cDNA separately by PCR
• cDNA from one sample (A) is labeled with a “red” fluorescent dye; cDNA from the other sample (B) is labeled with a “green” fluorescent dye
◦ genes with higher expression in sample A: stronger red signal
◦ genes with higher expression in sample B: stronger green signal
◦ genes with equal expression in samples A and B: equal red and green signal = yellow
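The red/green comparison is usually summarized per spot as a log2 ratio of the two signal intensities. A small sketch, assuming background-corrected, positive intensities; the function names, the threshold of 1.0 (a two-fold difference) and the example intensities are hypothetical.

```python
import math

def log2_ratio(red, green):
    """log2(red / green): positive = higher in sample A, negative = higher in sample B."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

def call_spot(red, green, threshold=1.0):
    """Label a spot using a symmetric log2-ratio threshold (1.0 = two-fold)."""
    ratio = log2_ratio(red, green)
    if ratio >= threshold:
        return "higher in A (red)"
    if ratio <= -threshold:
        return "higher in B (green)"
    return "similar in A and B (yellow)"

# Hypothetical spot intensities as (red, green)
for red, green in [(800, 200), (100, 400), (300, 300)]:
    print(red, green, round(log2_ratio(red, green), 2), call_spot(red, green))
```

Using the logarithm makes two-fold up-regulation and two-fold down-regulation symmetric around zero.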
Functional Genomics - Gene “knockouts”
• also known as reverse genetics
• knockout usually refers to homologous recombination
• a knockout DNA sequence (usually a plasmid) that shares homologous end sequences
with the target gene is constructed in vitro, then introduced into the nucleus of a cell
RNA interference (RNAi)
double-stranded RNA (dsRNA) complementary to target gene introduced into cell
dsRNA activates an innate defense pathway that leads to the degradation of the corresponding mRNA → post-transcriptional gene silencing (PTGS)
• only about 1/3rd of all mouse genes lead to inviable or infertile mice when knocked-out
• this has led some researchers to classify genes as “essential” and “dispensable”
Comparative Genomics
Evolutionary Genomics - Rates of DNA loss
◦ Laupala (Hawaiian crickets) have a genome 11 times larger than that of Drosophila
◦ rates of DNA deletion in “neutral” dead-on-arrival (DOA) transposable elements indicate that there is a much faster rate of DNA loss in Drosophila through spontaneous deletion mutations
Evolutionary Genomics - Gene duplication
central genes are under higher evolutionary constraint
Patterns of DNA sequence variation
used to identify selective sweeps and determine the location of the selected site(s)
Hox genes
The eyeless gene
• fruit flies with a mutation in the eyeless gene fail to develop eyes
• homolog of eyeless is known as Pax6; mutations lead to defects in eye development
• if the eyeless coding sequence is replaced by the human Pax6 sequence, it can still signal for normal eye development in the fly
• Eyes did not evolve independently, but were present in a common ancestor
Plant developmental biology
• homeotic genes that control floral development
• MADS-box (180bp) encodes 60aa of the DNA binding domain → transcription factors
• instead of a homeobox → MADS-box
• MADS-box: the name comes from four different genes from different species that contain this domain (MCM1, AG, DEF, and SRF)
• some Hox domains can be found in plants, but don’t play important role in development
The role of cis-regulation in morphological divergence
cis acts locally, trans globally
Drosophila yellow (wing spot): cis-regulatory change
mice → cis-regulatory change at Agouti (DD/LD/LL)
Evolution of Sex Chromosomes
Sex determination
Special regulation of the X chromosome
• two regulatory mechanisms specific to the X chromosome in male Drosophila
• X chromosome dosage compensation
◦ in male somatic tissues, expression of the single X chromosome is up-regulated approximately two-fold
◦ balances expression with autosomal genes and with the 2 copies of the X in females
◦ up-regulation occurs through the specific binding of an RNA/protein complex (Dosage
Compensation Complex, DCC)
• Suppression of X expression in the male germline
◦ expression of the X chromosome is suppressed
◦ referred to as meiotic sex chromosome inactivation (MSCI)
• suppression of the X occurs in testes, but not in other tissues
• degree of suppression depends on the expression level of the gene
expectation - aa selection
negative = purifying selection
Introns and recombination
intron length is negatively correlated with recombination rate
selection favors shorter introns