Haemophilus influenzae genome size
1.8 Mb, 1 circular chromosome
Who published the Haemophilus influenzae genome, and when?
The Institute for Genomic Research (TIGR) in 1995 by Venter and Smith
What method was used to sequence the Haemophilus influenzae genome?
Whole-genome shotgun sequencing (WGS)
Total predicted protein-encoding genes in H. influenzae
Number of genes with unknown function
1,743 predicted protein-encoding genes in total
736 were unique genes of unknown function
1,007 were assigned functional roles based on homology to other bacterial proteins
Do triplets occur with different probabilities? YES or NO?
Yes, they do.
Advantages of / reasons for choosing the Haemophilus influenzae genome
Genome size very typical for bacteria
GC content (38%) similar to human genome
Physical map did not exist
(Hamilton Smith worked on this species for many years before)
Sequencing strategy for Haemophilus influenzae (steps)
Pilot test: sequenced 4,000 templates, checked all sequences against each other for overlaps and calculated the P0 value (the probability that a base is not covered by any read) -> the observed distribution was very close to the (Poisson) expectation
3 months of sequencing using WGS (early-generation automated sequencing machines: ABI 373, 8 people and 14 machines):
Genomic DNA broken into 2-Kb fragments and cloned into plasmids
Insert DNA sequenced with universal primer (16,240 reads of 485 bp), about half also with reverse primer (7,744 reads of 444 bp) to give “paired reads”
Around 300 large-insert clones (15-20 Kb inserts in phage lambda, λ) sequenced from both ends (using fluorescent detection and gels)
Total seq = 11,631,485 bp ≈ 6.5x coverage assembled by computer
Targeted closure of gaps (“finishing”, after computational assembly)
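The coverage figure and the P0 value can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that P0 is the Poisson (Lander-Waterman) probability that a base is covered by no read, P0 = e^(-c), and taking the genome size as the rounded 1.8 Mb quoted above; the variable names are illustrative only.

```python
import math

# Figures quoted in the notes
total_bases = 11_631_485   # total sequence generated, in bp
genome_size = 1_800_000    # rounded genome size (1.8 Mb), in bp

# Coverage c = total sequenced bases / genome length
coverage = total_bases / genome_size

# Assumption: reads land at random, so per-base read depth is Poisson(c)
# and the chance that a base is covered by zero reads is e^(-c)
p0 = math.exp(-coverage)

# Expected number of bases left unsequenced under that assumption
expected_unsequenced = p0 * genome_size

print(f"coverage ~ {coverage:.2f}x")
print(f"P0 ~ {p0:.5f}")
print(f"expected unsequenced bases ~ {expected_unsequenced:.0f}")
```

With these inputs the coverage comes out just under 6.5x, matching the ≈ 6.5x in the notes, and P0 is well below 1%. That is why the remaining gaps were closed by targeted finishing rather than by more random sequencing.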
Assembly strategy for Haemophilus influenzae
TIGR Assembler: pairwise comparison of all sequence reads for overlaps (build contigs = contiguous stretches of DNA sequence)
Result: 140 contigs + 140 gaps
Required that paired reads point to each other and be separated by ≈ 1 Kb
Paired-read information to connect contigs (sequence gaps: 98 gaps, filled by primer walking)
Remainder are physical gaps (no template available, total = 42 gaps)
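The idea of building contigs from pairwise read overlaps can be sketched with a toy greedy assembler. This is not the TIGR Assembler algorithm: it ignores reverse complements, sequencing errors and paired-read constraints, and the function names and example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        # Pairwise comparison of all sequences for suffix/prefix overlaps
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing overlaps any more: remaining breaks are gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads: the first three overlap end to end, the fourth stands alone
reads = ["ATGCGT", "CGTACC", "ACCTTG", "GGGAAA"]
contigs = greedy_assemble(reads)
print(sorted(contigs))
```

Here the three overlapping reads collapse into one contig and the unrelated read stays separate, so two contigs (and hence gaps between them) remain, which is the situation that paired-read information and gap closure then address.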
Name 4 oligonucleotide primer methods for closing physical gaps
DNA hybridization (Southern blotting) to develop a “fingerprint”. Genomic DNA cut with restriction enzyme(s) and pieces separated by gel electrophoresis, then hybridized with labeled primer DNA. If two contig ends are close to each other, they should show the same (or very similar) fingerprints. This filled 15 gaps.
Peptide links. Each contig end was used for a BLAST search against a protein database. If two contigs match different parts of the same protein, then they are likely to be adjacent. This filled 2 gaps.
λ clones. The λ libraries (10–15 Kb inserts) were screened with the labeled primers. If a contig end hybridized with a λ clone, then the ends of the λ clone were sequenced and this was used to bridge the gap. This filled 23 gaps.
PCR. Pairwise combinations of contig end primers used for PCR reactions. An adjacent pair of primers will give a PCR product of the size of the gap between them. This filled 37 gaps (and was used to verify other methods).
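To see what "pairwise combinations of contig end primers" means in practice, the candidate PCR reactions can be enumerated. A minimal sketch, assuming one primer per contig end and skipping pairs from the same contig; the contig names and the function are hypothetical.

```python
from itertools import combinations


def candidate_primer_pairs(contig_names):
    """All pairs of contig-end primers that come from different contigs."""
    ends = [(name, side) for name in contig_names for side in ("left", "right")]
    return [(a, b) for a, b in combinations(ends, 2) if a[0] != b[0]]


pairs = candidate_primer_pairs(["contigA", "contigB", "contigC"])
print(len(pairs))  # 3 contigs give 12 candidate reactions
for a, b in pairs[:3]:
    print(a, b)
```

The number of candidate reactions grows as 2n(n-1) for n contigs, which is one reason the other three methods were useful for narrowing down which contig ends are likely to be neighbours.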
Name the two types of gaps and their closure methods for the Haemophilus influenzae genome
a) Sequence gaps filled by primer walking: design new sequencing primer to span gap
b) Physical gaps: Specific oligonucleotide primers designed to ends of each contig.
Annotation of H. influenzae
ab initio gene prediction (program = GENEMARK):
predicted open reading frames (ORFs) based on codon frequency matrices from 122 H. influenzae coding sequences in GenBank.
Predicted coding sequences were compared with GenBank and SwissProt databases.
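Because triplets occur with different probabilities in coding DNA, a codon frequency table built from known genes can be used to score candidate ORFs. The sketch below is a much-simplified codon-usage score, not GeneMark itself; the two training sequences, the pseudocount and the function names are made up for illustration.

```python
import math
from collections import Counter

ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codons(seq):
    """Split a sequence into complete in-frame triplets (frame 0)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_frequencies(training_seqs, pseudocount=1.0):
    """Relative codon frequencies from known coding sequences."""
    counts = Counter()
    for s in training_seqs:
        counts.update(codons(s))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def log_odds(seq, freqs):
    """Sum of log(f(codon) / (1/64)); positive means 'looks like the training genes'."""
    return sum(math.log(freqs[c] * 64) for c in codons(seq))


# Hypothetical stand-in for the 122 known coding sequences in GenBank
training = ["ATGAAAGCAAAATAA", "ATGGCAAAAGCATAA"]
freqs = codon_frequencies(training)

gene_like = log_odds("ATGAAAGCA", freqs)
random_like = log_odds("CCCGGGCCC", freqs)
print(gene_like > random_like)
```

A candidate ORF that uses the codons favoured in the training set scores above zero, while one built from codons never seen in training scores below zero. The predicted coding sequences would then be compared against protein databases as described above.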