%GC
Percentage of G and C nucleotides in the genome (base pairing makes G = C and A = T, but G + C need not equal A + T)
%GC varies among species (particularly bacteria, plants, invertebrates; little variation among vertebrates)
However, vertebrates show much more %GC heterogeneity within their genomes
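The parenthetical in the %GC definition can be checked directly. Below is a minimal sketch, assuming a plain A/C/G/T string; the helper names gc_percent and reverse_complement are hypothetical. Counting both strands of a duplex makes G equal C and A equal T, but %GC is still not forced to 50%.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence (A<->T, G<->C)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq.upper()))


def gc_percent(seq: str) -> float:
    """Percentage of G + C among A, C, G, T bases; other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


top = "GGCATATA"                        # toy single strand
duplex = top + reverse_complement(top)  # both strands of the double helix
print(duplex.count("G") == duplex.count("C"))  # True: G = C by base pairing
print(gc_percent(top), gc_percent(duplex))     # 37.5 37.5: %GC is not 50
```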
Isochores
The vertebrate genome can be divided into isochores
= long stretches (100s of kb) of DNA with uniform %GC
With the completion of the human genome sequence, it was found that isochores could span >10 Mb
High %GC = heavy isochores (H)
Low %GC = light isochores (L)
Traditional isochore classification of the human genome:
L1: < 37% GC
L2: 37-42% GC
H1: 42-47% GC
H2: 47-52% GC
H3: > 52% GC
Often L1 and L2 are grouped together as a single L isochore
Heavy isochores are found only in warm-blooded vertebrates (mammals, birds), not in cold-blooded vertebrates (e.g., fish)
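The traditional five-family classification is a simple lookup on %GC. A minimal sketch follows; the function name isochore_family and the merge_light flag are hypothetical. Because the listed ranges share their endpoints (37, 42, 47, 52), treating each lower bound as inclusive (so exactly 37% falls in L2) is an assumption.

```python
from bisect import bisect_right

# Boundaries between families in %GC; a value equal to a boundary goes to the
# higher family (assumption, since the published ranges share endpoints).
CUTOFFS = [37.0, 42.0, 47.0, 52.0]
FAMILIES = ["L1", "L2", "H1", "H2", "H3"]


def isochore_family(gc_percent: float, merge_light: bool = False) -> str:
    """Assign a %GC value to L1/L2/H1/H2/H3; optionally pool L1 and L2 as 'L'."""
    family = FAMILIES[bisect_right(CUTOFFS, gc_percent)]
    if merge_light and family in ("L1", "L2"):
        return "L"
    return family


for gc in (35.0, 40.0, 44.5, 50.0, 55.0):
    print(gc, isochore_family(gc))  # L1, L2, H1, H2, H3
```

The merge_light option mirrors the common practice of grouping L1 and L2 into a single L family.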
Why Isochores?
Selectionist hypothesis
GC-pairing is stronger than AT-pairing (3 vs. 2 hydrogen bonds) and may stabilize DNA at higher temperatures.
supported by the observation that heavy isochores are found in warm-blooded vertebrates
heavy isochores are gene-rich
Mutationist hypothesis
The pool of available nucleotides changes over the course of replication (which takes 8 or more hours in mammals)
More G and C nucleotides are available early in replication, so mutations in early-replicating DNA will be biased towards G or C
Over time, regions of the genome that replicate early become GC-rich
In general, GC-rich regions have been observed to replicate early
In a sliding-window analysis of the human genome DNA sequence, isochores were defined as segments >300 kb with distinct %GC and low heterogeneity
Using this definition, the authors found that isochores covered 41% of the human genome and that most had low %GC
They suggest a four-family model with mean GC contents of 35%, 38%, 41%, and 48% and conclude:
“These findings undermine the utility of the isochore theory and seem to indicate that the theory may have reached the limits of its usefulness as a description of genomic compositional structures”
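A sliding-window scan computes %GC in fixed-length windows along the sequence; runs of windows with similar %GC can then be merged into candidate homogeneous segments. The sketch below is a simplification, not the authors' segmentation method: the greedy merge rule and the parameters window, step, max_range and min_length are hypothetical (a real scan would use a minimum length of about 300 kb, as in the definition above), and windows containing no A/C/G/T bases are assumed absent.

```python
def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, %GC) for each full window of length `window`, advancing by `step`."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        gc = w.count("G") + w.count("C")
        acgt = gc + w.count("A") + w.count("T")
        profile.append((start, 100.0 * gc / acgt if acgt else float("nan")))
    return profile


def homogeneous_segments(profile, window, max_range, min_length):
    """Greedily merge consecutive windows whose %GC spans at most `max_range` points.

    Returns (start, end, mean %GC) for merged runs at least `min_length` bp long.
    """
    segments = []
    i = 0
    while i < len(profile):
        lo = hi = profile[i][1]
        j = i
        # Extend the run while the spread of %GC stays within max_range.
        while j + 1 < len(profile):
            gc = profile[j + 1][1]
            if max(hi, gc) - min(lo, gc) > max_range:
                break
            lo, hi = min(lo, gc), max(hi, gc)
            j += 1
        start, end = profile[i][0], profile[j][0] + window
        if end - start >= min_length:
            mean_gc = sum(g for _, g in profile[i:j + 1]) / (j - i + 1)
            segments.append((start, end, mean_gc))
        i = j + 1
    return segments


# Toy sequence: an AT-only block followed by a GC-only block.
profile = gc_windows("A" * 300 + "GC" * 150, window=100, step=100)
print(homogeneous_segments(profile, window=100, max_range=5.0, min_length=200))
```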
On a much smaller scale, %GC varies among different regions of a gene
Coding regions > introns > 5’ flanking regions > 3’ flanking regions
However, there is a strong correlation in %GC among all regions of a gene.
Codon Bias
Analysis of many protein-coding sequences from a species shows that the synonymous codons for a particular amino acid are not used with the equal frequencies expected at random
This phenomenon is known as “codon bias”
Certain codons are “preferred” and are used much more frequently than “unpreferred” codons
For example, Leucine is an amino acid that shows very high codon bias
Leu can be encoded by 6 different codons, CTG, CTA, CTC, CTT, TTG, TTA
At random, we would expect each codon to be used about 1/6 (17%) of the time
However, in highly expressed E. coli genes, CTG is used ≈90% of the time
In yeast, TTG is used ≈90% of the time
The preferred codons correspond to the most abundant tRNA in each species, suggesting that selection favors the use of codons that increase the level of gene expression
Note that the preferred codons may differ from species to species
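The comparison with the random expectation of 1/6 per leucine codon can be made explicit by dividing each codon's observed share by its expected share (this ratio is often called relative synonymous codon usage, RSCU). The sketch below uses a made-up toy sequence, not real E. coli data; the helper names are hypothetical. A ratio of 1 means no bias; with CTG at 90% its ratio is about 0.9 × 6 = 5.4.

```python
from collections import Counter

LEU_CODONS = ["CTG", "CTA", "CTC", "CTT", "TTG", "TTA"]


def codon_frequencies(cds: str, synonyms: list[str]) -> dict[str, float]:
    """Share of each synonymous codon among all in-frame codons for that amino acid."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in synonyms)
    return {c: (counts[c] / total if total else 0.0) for c in synonyms}


def rscu(freqs: dict[str, float]) -> dict[str, float]:
    """Observed share divided by the equal-usage expectation 1/k (k synonymous codons)."""
    k = len(freqs)
    return {c: f * k for c, f in freqs.items()}


toy = "CTG" * 9 + "TTA"  # hypothetical gene fragment: 9 CTG, 1 TTA
freqs = codon_frequencies(toy, LEU_CODONS)
print(freqs["CTG"], rscu(freqs)["CTG"])  # 0.9 and roughly 5.4
```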
level of codon bias - measures
ENC
Fop
level of codon bias - measures: ENC
ENC = effective number of codons
the effective number of distinct codons used to encode the 20 amino acids (usually computed for a single gene)
The minimum is 20 (one codon per amino acid); the maximum is 61 (all codons except the 3 stop codons, used equally)
Low ENC = high codon bias.
ENC can be applied to any species without prior knowledge of expression or codon usage.
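ENC needs only the codon counts of the gene itself, which is why no expression data are required. Below is a sketch of the homozygosity-based formula usually attributed to Wright (1990): for each amino acid with n observed codons, F = (n Σp² − 1)/(n − 1); F is averaged within the 2-, 3-, 4- and 6-fold degeneracy classes of the standard code, and ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. Filling a missing 3-fold class (isoleucine) with the mean of F2 and F4 follows one common convention; raising an error for other missing classes is a simplification of this sketch.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group sense codons by amino acid; stop codons ('*') are excluded.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZE = {2: 9, 3: 1, 4: 5, 6: 3}


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def enc(cds: str) -> float:
    """Effective number of codons: 20 = one codon per amino acid, 61 = no bias."""
    counts = codon_counts(cds)
    f_values: dict[int, list[float]] = {}
    for aa, codons in SYNONYMS.items():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F needs at least 2 observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f_values.setdefault(k, []).append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile absent: average of F2 and F4
    if any(mean_f.get(k, 0.0) <= 0.0 for k in CLASS_SIZE):
        raise ValueError("too few codons to estimate F for every degeneracy class")
    total = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZE.items())
    return min(total, 61.0)


# Extreme bias: every amino acid always uses the same single codon.
biased = "".join(codons[0] * 10 for codons in SYNONYMS.values())
print(enc(biased))  # 20.0
```

Applied to a sequence that uses every sense codon equally often, the same function returns the ceiling of 61.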
level of codon bias - measures: Fop
Fop = frequency of optimal codons
the frequency with which the “optimal” codon is used for each amino acid
Optimal codons are defined as those used with the highest frequency in highly expressed genes
High Fop = high codon bias.
Fop is species-specific and requires that optimal codons are known. For this, one must have many gene sequences and expression information.
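Fop depends on a prior definition of optimal codons. The sketch below takes the definition above literally: the optimal codon for each amino acid is the one used most often in a reference set of highly expressed genes (ties go to the first codon in table order, an arbitrary choice). Fop is then the number of optimal codons divided by all codons for amino acids that have an optimal codon. The reference sequences are toy data and the helper names are hypothetical; published studies often identify optimal codons by contrasting highly and lowly expressed genes instead.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), grouped into synonymous codon families.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
for codon_tuple, aa in zip(product(BASES, repeat=3), AMINO):
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append("".join(codon_tuple))


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def optimal_codons(highly_expressed: list[str]) -> dict[str, str]:
    """Most frequent codon per multi-codon amino acid in a reference gene set."""
    counts = Counter()
    for cds in highly_expressed:
        counts.update(codon_counts(cds))
    optimal = {}
    for aa, codons in SYNONYMS.items():
        if len(codons) > 1:
            best = max(codons, key=lambda c: counts[c])
            if counts[best] > 0:  # skip amino acids absent from the reference set
                optimal[aa] = best
    return optimal


def fop(cds: str, optimal: dict[str, str]) -> float:
    """Optimal codons / all codons for amino acids that have a defined optimal codon."""
    counts = codon_counts(cds)
    n_opt = sum(counts[c] for c in optimal.values())
    n_total = sum(counts[c] for aa in optimal for c in SYNONYMS[aa])
    return n_opt / n_total if n_total else float("nan")


opt = optimal_codons(["CTG" * 9 + "CTA", "CTGCTT"])  # toy "highly expressed" genes
print(opt)                             # {'L': 'CTG'}
print(fop("CTGCTGCTACTT", opt))        # 0.5
```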
Some observed patterns of codon bias:
a) higher in highly-expressed genes
b) higher in short genes than in long genes
c) higher in female-expressed than in male-expressed genes
Why is there codon bias?
a) Selection
b) Mutation (neutral)
Why is there codon bias? Selection
Natural selection favors the use of optimal codons (those that correspond to the most abundant tRNA) to make translation faster and more accurate
Codon bias could be used as a way to regulate gene expression post-transcriptionally.
Evidence: Highly expressed genes have higher codon bias
Conserved protein motifs, such as DNA binding domains, have higher bias than other protein regions.
This suggests selection for accuracy of translation.
Experimental replacement of optimal codons with non-optimal codons reduces the level of protein. Example: Drosophila Alcohol dehydrogenase (ADH)
Why is there codon bias? Selection - Example leucine
Replacement of optimal leucine codons with non-optimal codons leads to a lower level of ADH protein in vivo and reduces ethanol tolerance in adult flies
Wa-F = wild-type; optimal leucine codons
1 leu = 1 leucine codon changed from optimal to non-optimal
6 leu = 6 leucine codons changed from optimal to non-optimal
10 leu = 10 leucine codons changed from optimal to non-optimal
In a comparison of ADH protein concentration, it was found that:
Wa-F > 1 leu > 6 leu > 10 leu
The wild-type flies were also more tolerant to ethanol than the mutant flies
The LD50 (the ethanol concentration at which 50% of the flies were killed within 24 hours) was 9% for wild-type flies
It was only 7.5% for 10 leu flies.
Replacing sub-optimal leucine codons in the Adh gene with optimal codons increases ADH enzymatic activity in larvae, but decreases it in adults
This suggests that there may be a trade-off in optimal codon usage (or other factors) between developmental stages
Codon bias is highest for larval-expressed genes.
Why is there codon bias? Mutation
there may be a bias in mutation
For example, if mutations from A or T to G or C are more frequent and there is no selection on synonymous sites, then these sites should become GC-rich
The reverse bias in mutation would lead to synonymous sites being AT-rich
In both cases, this would lead to non-random codon usage.
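The neutral mutation-bias argument can be made quantitative with a simple two-state model (an assumption of this sketch): if A/T→G/C mutations occur at rate u and G/C→A/T at rate v per site per generation, with no selection, the GC fraction changes by u(1 − GC) − v·GC each generation and settles at GC* = u/(u + v). The rates below are hypothetical and exaggerated so that convergence happens within a few thousand generations.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Expected neutral GC fraction with A/T->G/C rate u and G/C->A/T rate v."""
    return u / (u + v)


def iterate_gc(gc0: float, u: float, v: float, generations: int) -> float:
    """Deterministic update per generation: gain u*(1 - gc), lose v*gc."""
    gc = gc0
    for _ in range(generations):
        gc = gc + u * (1 - gc) - v * gc
    return gc


# GC-biased mutation drives neutral sites GC-rich; the reverse bias makes them AT-rich.
print(equilibrium_gc(0.02, 0.01), iterate_gc(0.5, 0.02, 0.01, 2000))  # both about 0.667
print(equilibrium_gc(0.01, 0.02), iterate_gc(0.5, 0.01, 0.02, 2000))  # both about 0.333
```

The starting value gc0 does not matter: the deviation from GC* shrinks by a factor (1 − u − v) every generation.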
Why is there codon bias? Mutation bias A,T,C,G
Most optimal codons end in G or C
Thus, the mutational hypothesis would require a mutational bias towards G or C
However, most observations indicate that the mutational bias is towards A or T, which runs counter to the prediction
However, there does appear to be biased mismatch repair in favor of G or C
This tends to be strongest in regions of high recombination (where most of the highly codon-biased genes are located)
So this process could explain at least part of the observed codon bias.
The GC content at third codon positions is correlated with local genomic GC content, suggesting an effect of mutation on codon usage.
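The GC3 correlation is computed gene by gene: GC3 for each coding sequence, %GC of its surrounding genomic region, then a correlation across genes. Below is a minimal sketch with hypothetical helper names; how large a flanking region counts as "local" is an assumption left to the caller, and a plain Pearson coefficient is used here for simplicity.

```python
from math import sqrt


def gc_fraction(seq: str) -> float:
    """Fraction of G + C among A, C, G, T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def gc3(cds: str) -> float:
    """Fraction of complete in-frame codons whose third position is G or C."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else float("nan")


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(gc3("ATGGCCAAA"))  # third positions G, C, A -> 2/3
# Across genes: pearson([gc3(cds) for cds in genes], [gc_fraction(f) for f in flanks])
```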
Why is there codon bias? - selective and neutral explanations
The strength of selection acting on a particular codon is expected to be very weak (on the order of Ns = 1, where N is the population size and s is the selection coefficient)
This means that it is often very difficult to distinguish between selective and neutral explanations for codon bias.
Typically, there is evidence for selection affecting codon usage in organisms with small genomes and large population sizes (bacteria, yeast, Drosophila)
The evidence is much weaker in humans and other vertebrates, where it appears that mutational/repair biases can explain the observed codon usage.
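Why Ns ≈ 1 is the borderline can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutant at initial frequency 1/(2N): u = (1 − e^(−2s)) / (1 − e^(−4Ns)). This sketch assumes a diploid population whose effective size equals N and additive selection; the N and s values are hypothetical. Dividing by the neutral value 1/(2N) shows how much selection changes the outcome.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's approximation for a new mutant with selective advantage s in a diploid
    population of effective size n (initial frequency 1/(2n))."""
    if s == 0:
        return 1 / (2 * n)
    # 1 - exp(-x) written as -expm1(-x) for numerical accuracy when x is small.
    return -math.expm1(-2 * s) / -math.expm1(-4 * n * s)


N = 10_000
for s in (1e-6, 1e-4, 1e-3):  # Ns = 0.01, 1, 10
    ratio = fixation_probability(N, s) * 2 * N  # relative to a neutral mutation
    print(f"Ns = {N * s:g}: fixation {ratio:.2f}x the neutral rate")
```

At Ns = 0.01 a favored codon fixes barely more often than a neutral one (about 1.02x); at Ns = 1 the advantage is only about fourfold, compared with about fortyfold at Ns = 10, so drift and selection act with similar force.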