Minimal Genome Projects
Question: What is the minimum number of protein-coding genes required for life? (at least for bacteria growing under laboratory conditions)
Bioinformatic approach
What genes are conserved in all sequenced bacterial genomes?
-> This may be the minimal core set.
First used by Mushegian and Koonin in 1996
looked for all genes conserved between the 2 complete bacterial genomes available at the time (H. influenzae and M. genitalium).
The result: 256 genes in the minimal set
A more recent check of the COG (Clusters of Orthologous Groups) database reveals:
63 genes are common to all of life (Archaea, Bacteria, Eukaryotes)
91 genes are common to Archaea and Bacteria
217 genes are common to Bacteria
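The bioinformatic approach amounts to intersecting the gene content of every genome considered. A minimal sketch, assuming each genome has already been reduced to a set of ortholog-group identifiers (the genome names and IDs below are made up for illustration, not taken from COG):

```python
# Toy data: each genome is represented by the set of ortholog-group IDs it contains.
# Genome names and group IDs are hypothetical.
genomes = {
    "genome_1": {"OG1", "OG2", "OG3", "OG4"},
    "genome_2": {"OG1", "OG2", "OG4", "OG5"},
    "genome_3": {"OG1", "OG2", "OG6"},
}


def core_gene_set(genomes):
    """Return the ortholog groups that are present in every genome."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


core = core_gene_set(genomes)
print(sorted(core))  # candidate minimal core set for the toy data
```

Adding another genome to the dictionary can only shrink the core set or leave it unchanged, never enlarge it.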
Bioinformatic approach - limitations
with distantly-related species it is hard to identify orthologs
some functions can be performed by non-orthologous genes (functional analogs)
known as NOD (Non-Orthologous gene Displacement)
results will change as new genome sequences become available
no evidence that minimal set alone is sufficient for life
First Experimental approach - facts
1999
Craig Venter + colleagues at TIGR
They began with two closely-related species of bacteria:
Mycoplasma genitalium
the smallest genome known at the time for a self-replicating organism
580 kb, 480 protein-coding genes
M. pneumoniae
816 kb
480 M. genitalium orthologs + 197 unique genes = 677 total genes
First Experimental approach (Venter) - Approach
Used transposable element (TE) insertional mutagenesis to randomly knock out genes in both species
then sequenced DNA flanking the TE to determine where it inserted and which gene was knocked out
If the TE was inserted within the first 80% of a protein-coding region and beyond nucleotide 9 of the coding sequence, it was considered a gene disruption (or a “hit”)
Only surviving cells could be detected, so “hit” genes must be non-essential
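The "hit" rule above can be written as a small predicate. This is a sketch under stated assumptions, not the authors' code: positions are 1-based offsets counted from the first nucleotide of the coding sequence in the direction of transcription, and the 80% boundary is treated as inclusive. The function name and parameters are hypothetical.

```python
def is_gene_disruption(insert_offset, cds_length):
    """Decide whether a TE insertion counts as a gene disruption ("hit").

    insert_offset: 1-based position of the insertion within the coding
                   sequence, counted from its first nucleotide (assumption).
    cds_length:    length of the coding sequence in nucleotides.
    """
    if cds_length <= 0:
        raise ValueError("cds_length must be positive")
    if not 1 <= insert_offset <= cds_length:
        return False  # insertion lies outside the coding sequence
    beyond_nucleotide_9 = insert_offset > 9
    # Integer arithmetic avoids floating-point rounding at the 80% boundary.
    within_first_80_percent = insert_offset * 10 <= cds_length * 8
    return beyond_nucleotide_9 and within_first_80_percent


# Example insertions into a hypothetical 1,000-nucleotide coding sequence
for offset in (5, 500, 900):
    print(offset, is_gene_disruption(offset, 1000))
```

Insertions in the first few nucleotides or in the last 20% of a gene are excluded because they may still leave a functional product.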
First Experimental approach (Venter) - Results
Result/hits:
93 genes in M. genitalium
150 genes in M. pneumoniae
(57 had M. genitalium orthologs, 93 were unique)
For M. genitalium, the directly observed minimal gene number is 480 - 93 = 387. However, this will be an overestimate, because this was not a saturation screen: some genes were not hit just by chance.
First Experimental approach (Venter) - Minimal genome estimation:
Assume the 197 M. pneumoniae-specific genes are non-essential
Mutations were recovered in 93 (47%) of these genes
Assume the same 47% hit rate applies to the non-essential genes that have M. genitalium orthologs. Since 57 of these were hit, 0.47N = 57 and N ≈ 121, where N is the number of non-essential genes among the 480 orthologs.
This gives 480 - 121 = 359 essential genes.
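The arithmetic of this estimate can be reproduced in a few lines. The numbers are those given above; the variable names are only illustrative, and the hit rate is rounded to 0.47 as in the text.

```python
# Numbers from the 1999 transposon screen as quoted above
mg_total_genes = 480     # protein-coding genes in M. genitalium
mg_hits = 93             # M. genitalium genes disrupted in the screen
mp_specific_genes = 197  # M. pneumoniae genes without M. genitalium orthologs
mp_specific_hits = 93    # of those, the number that were disrupted
ortholog_hits = 57       # disrupted M. pneumoniae genes with M. genitalium orthologs

# Directly observed count: an overestimate, because the screen was not saturating
observed_minimum = mg_total_genes - mg_hits

# Hit rate among genes assumed to be non-essential, rounded as in the text
hit_fraction = round(mp_specific_hits / mp_specific_genes, 2)

# Solve hit_fraction * N = ortholog_hits for N, the number of non-essential orthologs
nonessential_estimate = round(ortholog_hits / hit_fraction)
essential_estimate = mg_total_genes - nonessential_estimate

print(observed_minimum, hit_fraction, nonessential_estimate, essential_estimate)
```

The estimate depends entirely on the assumption that all 197 species-specific genes are non-essential; if some are essential, the true hit rate among non-essential genes is higher and N is smaller.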
First Experimental approach (Venter) - Minimal genome estimation - Presumption of the authors
The authors think the number is probably lower and give a final estimate with a range of 265–350 essential genes.
111 genes of unknown function were not disrupted; many of these may be required for life.
Possible problems with the minimal genome estimate
Not all 197 M. pneumoniae-specific genes may be non-essential (“new” genes can become essential)
Genes were knocked out individually -> What about synthetic lethals?
synthetic lethals = two genes that can each be knocked out individually with no effect, but the cell dies if both are knocked out together
Can a synthetic organism be engineered by combining these 359 (or fewer) genes?
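The synthetic-lethal problem can be illustrated with a toy model. This is only a sketch: the gene names and the viability rule are hypothetical, chosen so that two genes back each other up.

```python
from itertools import combinations

# Hypothetical toy genome: geneA and geneB perform the same function redundantly.
ALL_GENES = {"geneA", "geneB", "geneC"}
REDUNDANT_GROUPS = [{"geneA", "geneB"}]


def is_viable(knocked_out):
    """Toy rule: the cell dies only if every member of a redundant group is lost."""
    lost = set(knocked_out)
    return not any(group <= lost for group in REDUNDANT_GROUPS)


# A single-gene screen calls every gene non-essential...
single_nonessential = sorted(g for g in ALL_GENES if is_viable({g}))

# ...but some pairs of those "non-essential" genes cannot be removed together.
synthetic_lethal_pairs = [
    pair for pair in combinations(single_nonessential, 2) if not is_viable(pair)
]

print(single_nonessential)
print(synthetic_lethal_pairs)
```

In this toy case, a genome built only from genes found essential in single knockouts would lack both geneA and geneB and would not be viable.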
Second Experimental approach
Kobayashi et al. in 2003
used a common laboratory bacterium, Bacillus subtilis (about 4,100 genes)
individually knocked-out genes using a targeted, homologous recombination approach
These results were combined with those of previous studies.
Second Experimental approach (Kobayashi) - results
271 genes (6% of total) were essential for growth under optimal laboratory conditions
The remaining 3,830 (94%) were non-essential!
Essential genes were classified into functional categories
about 50% were involved in DNA/RNA metabolism or protein synthesis
Over half of the essential protein synthesis genes encode ribosomal proteins
Second Experimental approach (Kobayashi) - essential genes of B. subtilis
essential genes of B. subtilis are well-conserved in other bacterial species, and many are also found in Archaea and/or Eukaryotes
However, not all were completely conserved over all known genomes or even over all known bacterial genomes
Thus, these essential genes would not show up as part of the minimal genome by the bioinformatic approach
Of the essential genes, ≈30% were conserved across 54 bacterial species and ≈20% were conserved across 18 archaeal and eukaryotic species.
Second Experimental approach (Kobayashi) - Limitation
genes were knocked-out individually.
What about synthetic lethals?
Can an organism with only 271 genes be engineered?
Minimal Genome Projects: Conclusion
the number of essential genes is not strongly correlated with the total number of genes in the genome
there is a negative correlation between the percentage of essential genes and the total number of genes in the genome.
In species with few genes, such as M. genitalium, a high proportion of the genes are essential.
In species with many genes, such as B. subtilis, a low proportion of the genes are essential.
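The contrast between the two species can be checked with the figures quoted in these notes (480 genes and an estimated 265 to 350 essential genes for M. genitalium; about 4,100 genes and 271 essential genes for B. subtilis). A minimal sketch, with illustrative variable names:

```python
# Gene counts as quoted in these notes; essential counts are (low, high) estimates
species = {
    "M. genitalium": {"total": 480, "essential": (265, 350)},
    "B. subtilis": {"total": 4100, "essential": (271, 271)},
}


def essential_fraction_range(entry):
    """Return the (low, high) fraction of genes estimated to be essential."""
    low, high = entry["essential"]
    return low / entry["total"], high / entry["total"]


fractions = {name: essential_fraction_range(entry) for name, entry in species.items()}

for name, (low, high) in fractions.items():
    print(f"{name}: {low:.1%} to {high:.1%} of genes essential")
```

The absolute numbers of essential genes are similar (a few hundred in both cases), while the essential fraction falls from more than half in M. genitalium to between 6% and 7% in B. subtilis.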
More recent work has shown that it is possible to transplant entire bacterial genomes and engineer synthetic life (Gibson et al. 2008, 2010).
This approach has been used to make a synthetic strain of Mycoplasma that has only 473 genes (Hutchison et al. 2016).
2 Approaches to build a minimal genome
best results:
473 genes