When was the completion of the human genome announced?
June 2000 (the working "draft" sequence; the finished sequence was announced in April 2003)
Name two groups who published the “draft” sequence.
1. Publicly funded International Human Genome Project (IHGP; Francis Collins, Eric Lander, et al.). Published in Nature (409: 860–921). Made sequence and annotation freely available to the public through GenBank.
2. Celera Genomics (Craig Venter). Published in Science (291: 1304–1351). Made raw sequence with minimal annotation available for free on the Celera website. Charged a subscription fee for full access to the sequence with annotation.
Where did they get the human DNA? Whose genome was sequenced?
IHGP – DNA collected from anonymous donors of both sexes and diverse ethnic backgrounds. A subset (≈10%) were used for library construction. Identity is untraceable.
Celera – 21 volunteer donors from diverse ethnic backgrounds; of these, 5 were chosen for sequencing (3 women, 2 men): 1 African, 1 Chinese, 1 Hispanic, 2 Caucasian. Later it was revealed that most of the sequenced DNA (≈70%) came from Venter himself.
In both cases, only a single “reference genome” was produced. It did not take into account differences among individuals or between the two copies of a chromosome within an individual. There was just one sequence representative of the entire species.
What was the IHGP's sequencing strategy for the human genome?
“Clone-by-clone” or hierarchical shotgun sequencing, distributed worldwide.
Used a set of BAC clones of ≈100–200 kb each.
Total clone sequence = 4.3 Gb (more than the 3 Gb genome, because the clones partly overlapped each other).
Most clones in “draft” form (3–5x coverage).
About 20% of clones in “finished” form (8–12x coverage).
Total raw sequence = 23 Gb (about 7.5x coverage of the genome).
The largest sequencing center (at MIT) could do 200,000 Sanger sequencing reactions per day using robots and automated sequencing machines. Human hands-on time per machine was about 15 minutes. Across all centers, the IHGP could produce 1 kb of sequence per second.
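The coverage figures in this card are simple ratios of sequenced bases to genome size. A minimal Python sketch, assuming a 3 Gb genome (the round figure used elsewhere in these cards): with that denominator the 23 Gb of raw sequence works out to ≈7.7x, close to the ≈7.5x quoted, so the exact value depends on the genome size assumed. It also converts the "1 kb per second" throughput into a daily figure.

```python
# Fold coverage = total sequenced bases / genome size.
GB = 1_000_000_000

def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each genome base is sequenced."""
    return total_bases / genome_size

GENOME_SIZE = 3 * GB   # assumed haploid genome size (round figure)
IHGP_RAW = 23 * GB     # total raw sequence, from the card above

# "1 kb of sequence per second" across all IHGP centers, expressed per day
BASES_PER_DAY = 1_000 * 60 * 60 * 24

if __name__ == "__main__":
    print(f"IHGP raw coverage: {fold_coverage(IHGP_RAW, GENOME_SIZE):.1f}x")
    print(f"Throughput: {BASES_PER_DAY / 1e6:.1f} Mb/day")
    print(f"Days to produce 23 Gb at that rate: {IHGP_RAW / BASES_PER_DAY:.0f}")
```

At 1 kb per second the consortium produced about 86.4 Mb per day, so 23 Gb corresponds to roughly 266 days of continuous output.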
What was Celera's assembly strategy for the human genome?
Computational assembly similar to that used for Drosophila: Screener, Unitigger, Scaffolder, Repeat Resolver (Rocks, Stones).
Compartmentalized Genome Assembly: Used the same input data as the whole genome shotgun assembly (Celera reads plus shredded IHGP bactigs), but scaffolds and bactigs were first separated into large chromosomal regions (using IHGP mapping information), and each region was assembled separately. This assembly was used for annotation and analysis.
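The compartmentalized strategy is essentially "partition, then assemble each part independently". The Python sketch below shows only that control flow. The `Scaffold` fields, the region labels and `assemble_region` are hypothetical: the placeholder assembler simply orders scaffolds by an assumed map position instead of computing overlaps, and none of this is Celera's actual software.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Scaffold:
    name: str
    region: str   # hypothetical region label derived from mapping data
    map_pos: int  # hypothetical approximate position within the region

def partition_by_region(scaffolds: list[Scaffold]) -> dict[str, list[Scaffold]]:
    """Group scaffolds into compartments by their mapped chromosomal region."""
    groups: dict[str, list[Scaffold]] = defaultdict(list)
    for s in scaffolds:
        groups[s.region].append(s)
    return dict(groups)

def assemble_region(scaffolds: list[Scaffold]) -> list[str]:
    """Placeholder assembler: orders scaffolds by map position only."""
    return [s.name for s in sorted(scaffolds, key=lambda s: s.map_pos)]

def compartmentalized_assembly(scaffolds: list[Scaffold]) -> dict[str, list[str]]:
    """Assemble each compartment independently of all the others."""
    return {region: assemble_region(group)
            for region, group in partition_by_region(scaffolds).items()}

if __name__ == "__main__":
    demo = [Scaffold("s1", "chr1_regionA", 300),
            Scaffold("s2", "chr2_regionA", 10),
            Scaffold("s3", "chr1_regionA", 100)]
    print(compartmentalized_assembly(demo))
```

Because each compartment is handled on its own, material from one region is never considered when assembling another region.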
What is Whole Genome Shotgun Assembly (WGS)?
Whole Genome Shotgun Assembly: Celera shotgun reads + shredded IHGP bactig assemblies (“faux” 550 bp shotgun reads giving perfect 2x coverage of each bactig). Bactig = a contig of assembled reads from within a BAC clone. Because the BACs overlap, this amounts to ≈3x complete (not random) coverage of the genome.
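"Shredding" a bactig into faux reads can be sketched as cutting the contig into fixed-length, error-free pieces whose start positions are evenly spaced. A minimal Python sketch, assuming evenly spaced starts (read length 550, step 550 / 2 = 275 bp, so interior bases are covered twice); the exact tiling Celera used is not given here, and the function name `shred` is hypothetical.

```python
def shred(bactig: str, read_len: int = 550, coverage: int = 2) -> list[str]:
    """Cut a contig into evenly spaced, error-free 'faux' reads.

    Start positions are read_len // coverage apart, so interior bases are
    covered `coverage` times. A final read is anchored at the contig end
    so the last bases are not dropped.
    """
    if len(bactig) <= read_len:
        return [bactig]
    step = read_len // coverage
    starts = list(range(0, len(bactig) - read_len + 1, step))
    if starts[-1] != len(bactig) - read_len:
        starts.append(len(bactig) - read_len)
    return [bactig[s:s + read_len] for s in starts]

if __name__ == "__main__":
    contig = "ACGT" * 500  # 2,000 bp toy bactig
    reads = shred(contig)
    print(len(reads), "reads of", {len(r) for r in reads}, "bp")
```

As a rough check of the ≈3x figure: shredding ≈4.3 Gb of overlapping clone sequence at 2x gives about 8.6 Gb, or 2 × 4.3 / 3 ≈ 2.9x of a 3 Gb genome.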
How much sequence was assembled, and how many gaps remained, in the IHGP and Celera versions of the human genome?
IHGP – 2.69 Gb of sequence, 145,514 gaps
Celera – 2.65 Gb of sequence, 116,442 gaps
It is hard to directly compare the two versions of the genome, but by most measures they appear to be quite similar in quality and completeness.
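One crude way to put these numbers side by side is to divide the assembled sequence by the number of gaps. A minimal Python sketch of that ratio; it treats every gap equally and ignores gap length and how each group defined a gap, which is part of why a direct comparison is hard.

```python
def sequence_per_gap(total_bp: float, n_gaps: int) -> float:
    """Crude contiguity measure: assembled bases divided by number of gaps."""
    return total_bp / n_gaps

# Figures from the card above: (assembled bp, number of gaps)
assemblies = {
    "IHGP": (2.69e9, 145_514),
    "Celera": (2.65e9, 116_442),
}

if __name__ == "__main__":
    for name, (bp, gaps) in assemblies.items():
        print(f"{name}: ~{sequence_per_gap(bp, gaps) / 1000:.1f} kb of sequence per gap")
```

This gives roughly 18.5 kb per gap for IHGP and 22.8 kb per gap for Celera, i.e. numbers of the same order.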
How was the human genome annotated by IHGP?
Trained Genscan on known human genes (splice signals, codon usage, exon and intron lengths) to predict ORFs.
Confirmed ORFs (open reading frames) with EST (expressed sequence tag) or protein matches, or with Genie predictions.
Total number of human protein-encoding genes = 31,778
We now know that the gene number is even lower; the current estimate is around 20,500.
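To make "open reading frame" concrete, here is a naive ORF scan in Python: ATG to the next in-frame stop codon, forward strand only, in all three frames. This is a deliberately simplified stand-in; Genscan is a splice-aware probabilistic model of exon/intron structure, which is needed because human genes are interrupted by introns. The function name and the `min_codons` threshold are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    stop codon too. Only the first ATG before each stop is reported, so each
    stop yields at most one (the longest) ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=2))
```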
How was the human genome annotated by Celera?
Otto, automated gene prediction software.
Uses a combination of experimental evidence and ab initio prediction (human or mouse EST, protein database match, mouse genome fragment match).
Celera had already started sequencing the mouse genome, so it had some mouse sequence for comparison.
Total number of human protein-encoding genes = 26,588
What was the IHGP's assembly strategy for the human genome?
Each large-insert clone (BAC) was assembled separately from its shotgun reads; the entire genome was then ordered according to the previously mapped order of the clones.
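The second step (ordering clones by their known map positions and joining neighbours) can be sketched in Python as: sort clone sequences by map position, then merge each clone onto the growing sequence wherever an exact suffix/prefix overlap exists, starting a new piece (a gap) when it does not. This is a simplification under stated assumptions: real clone overlaps were detected from mapping and sequence data with tolerance for errors, and the map positions and `min_overlap` value here are hypothetical.

```python
def overlap_len(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if < min_overlap."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0

def merge_in_map_order(clones: list[tuple[int, str]], min_overlap: int = 20) -> list[str]:
    """Order (map_position, sequence) clones and merge map-adjacent overlaps.

    Returns contiguous pieces; a new piece starts wherever two neighbouring
    clones do not overlap by at least `min_overlap` bases.
    """
    if not clones:
        return []
    ordered = [seq for _, seq in sorted(clones)]
    pieces = [ordered[0]]
    for seq in ordered[1:]:
        k = overlap_len(pieces[-1], seq, min_overlap)
        if k:
            pieces[-1] += seq[k:]
        else:
            pieces.append(seq)
    return pieces

if __name__ == "__main__":
    toy = [(2, "CCTAGTTTGA"), (0, "GATTACACAT"), (3, "CCCCCCCC"), (1, "ACATGGCCTA")]
    print(merge_in_map_order(toy, min_overlap=3))
```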
What was Celera's sequencing strategy for the human genome?
Whole Genome Shotgun (WGS) sequencing, done at the Celera sequencing center. Inserts of 2 kb, 10 kb and 50 kb, with paired reads for >75%.
Total sequence = 14.8 billion base pairs (about 5x coverage of genome).
The Celera sequencing center could do 175,000 Sanger sequencing reactions per day.
In total, 9 months were spent on sequencing.
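Paired reads from inserts of known size are what allow shotgun contigs to be ordered and the gaps between them sized. A minimal Python sketch, assuming every fragment has exactly the nominal insert size (real libraries have a size distribution) and a simple coordinate convention spelled out in the docstring; the function and its parameters are illustrative, not Celera's Scaffolder.

```python
from statistics import mean

def estimate_gap(contig_a_len: int, pairs: list[tuple[int, int]], insert_size: int) -> float:
    """Estimate the gap between contig A (left) and contig B (right).

    Each pair is (a_start, b_end): one read starts at 0-based offset a_start
    in A and points toward B; its mate ends at offset b_end in B (exclusive).
    If the fragment is exactly insert_size long, then
        insert_size = (contig_a_len - a_start) + gap + b_end
    A negative result would suggest the two contigs actually overlap.
    """
    estimates = [insert_size - (contig_a_len - a_start) - b_end
                 for a_start, b_end in pairs]
    return mean(estimates)

if __name__ == "__main__":
    # Two hypothetical pairs from a 10 kb library spanning a 5 kb contig and its neighbour
    print(estimate_gap(5_000, [(1_000, 2_000), (1_500, 2_400)], 10_000))
```

The longer (10 kb and 50 kb) inserts can span repeats and larger gaps that the 2 kb inserts cannot, which is one reason to mix several insert sizes.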
How many base pairs does the human genome have, and when was its sequence announced?
3 billion bp and June 2000
How much does it cost to sequence a human genome?
About $1,000
What did Celera publish in 2004?
A human genome assembly built using only its own WGS data:
Whole-genome shotgun assembly and comparison of human genome assemblies.
What happened in October 2007?
The first diploid genome sequence of an individual human (Craig Venter) was published.
This was the first time that a single genome (instead of a composite sequence) was available, and the first time that it was possible to identify both alleles (maternal and paternal) present over the whole genome.
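Having both alleles means that, at each position, the maternal and paternal copies can be compared directly, and positions where they differ are heterozygous. A minimal Python sketch, assuming two already aligned haplotype strings of equal length and single-base differences only; in practice haplotypes are inferred from sequencing reads rather than given, and the function name is hypothetical.

```python
def heterozygous_sites(maternal: str, paternal: str) -> list[tuple[int, str, str]]:
    """Return (position, maternal_base, paternal_base) wherever the two copies differ.

    Assumes both haplotypes are aligned base-for-base (no insertions/deletions).
    """
    if len(maternal) != len(paternal):
        raise ValueError("haplotypes must be aligned and of equal length")
    return [(i, m, p) for i, (m, p) in enumerate(zip(maternal, paternal)) if m != p]

if __name__ == "__main__":
    print(heterozygous_sites("ACGTAC", "ACCTAA"))
```

A single composite reference sequence cannot show such differences, because it records only one base per position.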
What else happened in 2007?
454 sequencing (a next-generation technology) was used to sequence the genome of James Watson (co-discoverer of the double-helix structure of DNA).