Definition of Next Generation Sequencing (NGS, or “massively parallel sequencing”)
= group of DNA sequencing technologies that can rapidly sequence DNA on the gigabase scale
-> largely replaced Sanger sequencing
“pre-next generation” – sequencing by hybridization
when to use it
method
pros and cons
-> used for SNP (single nucleotide polymorphism) detection or whole genome re-sequencing
-> requires previous knowledge of most of the sequence
-> can be used for humans or model organisms that already have a high-quality reference genome sequence available, to identify rare differences among individuals
Method
very similar to the Affymetrix microarray system
but here genomic DNA is hybridized to an array that has 4 probes for each position, each with a different nucleotide at the center
the center base can be “called” by determining which of the 4 probes the DNA hybridizes to
-> having enough probes makes it possible to call every single base in the genome and find the differences from the reference genome
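The base-calling idea of a resequencing array can be sketched in Python. This is a minimal illustration, not Affymetrix's actual algorithm: it assumes we already have one measured intensity for each of the 4 probes at every position, and the `min_ratio` cut-off and the function names are hypothetical.

```python
BASES = "ACGT"

def call_bases(intensities, min_ratio=1.5):
    """Call one base per position from the 4 probe intensities (A, C, G, T).

    The base whose probe hybridized most strongly is called; if the brightest
    probe is not clearly brighter than the runner-up (hypothetical min_ratio
    cut-off), or nothing hybridized at all, the position is reported as 'N'."""
    calls = []
    for quad in intensities:
        ranked = sorted(range(4), key=lambda i: quad[i], reverse=True)
        best, second = quad[ranked[0]], quad[ranked[1]]
        if best <= 0 or (second > 0 and best / second < min_ratio):
            calls.append("N")
        else:
            calls.append(BASES[ranked[0]])
    return "".join(calls)

def find_differences(reference, called):
    """List (position, reference base, called base) wherever they differ."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, called))
            if c != "N" and c != r]

# Hypothetical intensities for a 4-base stretch whose reference is "ACGT"
signals = [(900, 50, 40, 30), (40, 60, 880, 50), (30, 45, 900, 60), (20, 30, 40, 850)]
called = call_bases(signals)
print(called)                            # AGGT
print(find_differences("ACGT", called))  # [(1, 'C', 'G')]
```

At the second position the G probe hybridized best, so a C-to-G difference from the reference is reported.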
Pro
fast, inexpensive, high throughput (can re-sequence many individuals)
Con
requires a reference genome; is not good at base calling if there is more than one polymorphism (or insertion/deletion) within a 25-base region
Next Generation Method:
454 (pyrosequencing or Roche GS FLX)
Steps?
Original vs modern method?
-> first next generation method to be commercially available and the first to be applied to large-scale sequencing projects
-> uses “sequencing by synthesis” approach
DNA is broken into pieces of 500-1,000 bp, ligated to adaptors, and amplified on tiny beads by PCR (emulsion PCR)
Beads (with DNA attached) are placed into tiny wells (one bead per well) on a PicoTiterPlate that has over a million wells. Each well is connected to an optical fiber.
Sequence DNA by adding polymerase and nucleotides (dNTPs). The different bases (A, C, G, T) are added one at a time, in turn, in a flow chamber. When a base complementary to the template is added, it is incorporated and pyrophosphate is released; an enzyme cascade (ATP sulfurylase and luciferase) converts the pyrophosphate into a burst of light. The light is detected and used to call the base. If the same base occurs multiple times in a row, the light signal is proportionally stronger.
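How light intensities become bases, including the homopolymer rule in the last sentence, can be sketched as follows. Assumptions not taken from these notes: the nucleotides are flowed in the repeating order T, A, C, G, and each flow's light signal has already been normalized so that 1.0 corresponds to one incorporated base. Rounding to the nearest whole number is a deliberately simple stand-in for real 454 signal processing.

```python
FLOW_ORDER = "TACG"  # assumed repeating order in which nucleotides are flowed

def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Convert normalized per-flow light signals into a DNA sequence."""
    seq = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]  # nucleotide added in this flow
        n = max(0, round(signal))               # ~0 = no light, ~2 = twice as bright = 2 bases
        seq.append(base * n)
    return "".join(seq)

flows = [1.02, 0.05, 2.1, 0.9, 0.1, 2.95, 0.0, 1.1]
print(decode_flowgram(flows))  # TCCGAAAG
```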
Original method
read lengths were 100-150 bp
Modern method
read lengths increased to 700-800 bp
allows paired reads
1 run produces about 1 million reads in 10 hours
-> one machine can sequence >1 Gb per day
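A quick check that these numbers are consistent, as a small Python calculation. It assumes the machine runs back-to-back 10-hour runs with no downtime and uses the modern 700-800 bp read length.

```python
def gb_per_day(reads_per_run, read_length_bp, run_hours):
    """Output in gigabases per day, assuming continuous back-to-back runs."""
    bases_per_run = reads_per_run * read_length_bp
    runs_per_day = 24 / run_hours
    return bases_per_run * runs_per_day / 1e9

for length in (700, 800):
    print(length, round(gb_per_day(1_000_000, length, 10), 2))
# 700 1.68
# 800 1.92
```

That is roughly 1.7-1.9 Gb per day, in line with ">1 Gb per day".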
Illumina (Solexa)
-> uses “sequencing by synthesis” approach (similar to Sanger sequencing with fluorescently labeled terminators, except that here the terminators are reversible)
Steps
DNA is broken into small fragments and ligated to an adaptor.
The fragments are attached to the surface of a flow cell and amplified.
DNA is sequenced by adding polymerase and labeled reversible terminator nucleotides (each base with a different “color”). The incorporated base is determined by fluorescence. Then the fluorescent label is removed from the terminator and the 3’ OH is unblocked. This allows a new base to be incorporated and the process repeats.
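The cycle-by-cycle logic can be sketched for a single cluster: each cycle yields one image in which the cluster glows in the color of the one base it just incorporated. This is a simplified illustration, not Illumina's base caller; it assumes one intensity channel per base (four colors) and uses a hypothetical purity score, brightest / (brightest + second brightest), to flag uncertain calls.

```python
CHANNELS = "ACGT"  # assumed: one fluorescence channel ("color") per base

def call_cycle(intensity):
    """Return (base, purity) for one cycle's (A, C, G, T) intensities."""
    ranked = sorted(zip(intensity, CHANNELS), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    total = top + second
    purity = top / total if total > 0 else 0.0  # close to 1.0 = one clear color
    return base, purity

def call_read(cycles):
    """One intensity tuple per cycle: incorporate one blocked base, image it,
    remove the dye and unblock the 3' OH, repeat. So #cycles = read length."""
    calls = [call_cycle(c) for c in cycles]
    read = "".join(base for base, _ in calls)
    purities = [round(p, 3) for _, p in calls]
    return read, purities

cycles = [(950, 40, 30, 20), (50, 100, 40, 900), (30, 800, 700, 20)]
print(call_read(cycles))  # ('ATC', [0.96, 0.9, 0.533])
```

The third cycle is nearly a tie between C and G (purity close to 0.5), the kind of call a quality filter would treat as unreliable.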
Original Method
read lengths were 35 bp
Modern Method
read lengths increased to 150 bp or 250 bp
1 lane of the machine can give 180 million reads. A typical machine has 8 lanes that can be used simultaneously.
-> output is >1 Gb per day
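Per-run output follows directly from the lane numbers above. A small sketch, assuming all 8 lanes are used and every read reaches the full 150 bp (real yields vary):

```python
def run_output(lanes, reads_per_lane, read_length_bp):
    """Total reads and total bases from one run with all lanes in use."""
    reads = lanes * reads_per_lane
    bases = reads * read_length_bp
    return reads, bases

reads, bases = run_output(lanes=8, reads_per_lane=180_000_000, read_length_bp=150)
print(f"{reads:,} reads, {bases / 1e9:.0f} Gb")  # 1,440,000,000 reads, 216 Gb
```

The notes give no run time, so this cannot be converted to Gb per day here; but even a run lasting 10 days would average 21.6 Gb per day, well above 1 Gb.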
SOLiD
Original method and its improvements?
-> uses “sequencing by ligation”
-> instead of DNA polymerase, usage of DNA ligase for sequencing
-> used e.g. for re-sequencing human genomes (the original Human Genome Project itself relied on Sanger sequencing)
-> no longer common approach
The fragments are attached to beads and amplified by emulsion PCR. Beads are attached to the surface of a glass slide.
DNA is sequenced by adding 8mer fluorescently labeled oligonucleotides. If an oligo is complementary to the template, it will be ligated and 2 of the bases can be called. The attached oligo is then cut to remove the label and the next set of labeled oligos are added. The process is repeated from different starting points (using different universal primers) so that each base is called twice (two-base encoding). This allows for more accurate base calls.
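Two-base encoding can be made concrete with the usual color-space convention: code A=0, C=1, G=2, T=3, and the color reported for each pair of adjacent bases is the XOR of the two codes (this reproduces the standard SOLiD color table, in which e.g. AA, CC, GG and TT all give color 0). The sketch assumes the first base is known from the adaptor, since decoding needs a starting point; the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def encode(seq):
    """DNA -> colors: one color (0-3) per pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colors):
    """Colors -> DNA, starting from a known first base."""
    seq = [first_base]
    for color in colors:
        seq.append(BASES[CODE[seq[-1]] ^ color])
    return "".join(seq)

ref = "ATGGCA"
snp = "ATGCCA"                       # true single-base change at position 3
print(encode(ref))                   # [3, 1, 0, 3, 1]
print(encode(snp))                   # [3, 1, 3, 0, 1]  (two adjacent colors change)
print(decode("A", [3, 1, 1, 3, 1]))  # ATGTAC (one wrong color garbles everything after it)
```

A real SNP changes two adjacent colors in a consistent way, whereas a single measurement error changes only one color. Because every base is covered by two colors, an isolated color change can be recognized as an error rather than a variant, which is where the accuracy gain comes from.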
Original Method
read lengths were 25 bp
Newer Method
read lengths increased to 50-100 bp.
One run of the machine can give 85 million reads.
-> The output is >1 Gb per day
What is NGS used for?
What are the two applications of NGS?
Whole Genome Sequencing
Because NGS methods give relatively short reads, they are often used for re-sequencing.
Instead of doing a complete, independent genome assembly, the sequence reads can be aligned to a reference genome sequence.
-> e.g., the sequence reads from a single person can be aligned to the reference human genome.
However, NGS methods have been modified to produce “paired reads” in which both ends of a DNA fragment of known length are sequenced. This makes it possible to do de novo assemblies of genomes.
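Aligning reads to a reference instead of assembling them can be sketched with a simple seed-and-check approach: index every k-mer of the reference, look up each read's first k bases, then count mismatches over the whole read. This is a toy illustration under those assumptions (production aligners use far more efficient indexes and also handle insertions/deletions); the sequences are made up.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) of the best hit, or None if unmapped.
    The first k bases of the read must match exactly (the seed)."""
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best

reference = "ACGTTGCAAGGCTTAGCCAT"
index = build_index(reference, k=4)
print(map_read("GCAAGGATTA", reference, index, k=4))  # (5, 1)
```

The read maps at position 5 with one mismatch (reference C, read A at reference position 11), i.e. a candidate single-base difference between this individual and the reference.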
Gene expression profiling
NGS methods can also be used to sequence cDNA.
A short read is enough to map a particular cDNA to the genome, which makes it possible to measure gene expression.
Millions of cDNA fragments are sequenced in a single run and each fragment is then mapped back to its corresponding gene in the genome.
The more reads that match a gene, the higher the expression of that gene. Since this approach produces discrete counts of reads per transcript (rather than analog signal intensities), it is sometimes called “digital expression profiling”. It is more commonly referred to as “RNA-seq”. As throughput increased and cost decreased, this approach replaced microarrays as the major technique used in transcriptomics. It is also possible to assemble transcriptomes de novo (e.g. for non-model organisms) using single- or paired-end reads.
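The counting idea behind RNA-seq can be sketched in a few lines: after mapping, each read is assigned to a gene, and expression is the number of reads per gene. Scaling to counts per million (CPM), so that runs of different depth can be compared, is an added assumption here and only one common choice among several (it ignores gene length, for example); the gene names are hypothetical.

```python
from collections import Counter

def count_reads(assignments):
    """assignments: one gene ID per mapped read (None = read did not hit a gene)."""
    return Counter(gene for gene in assignments if gene is not None)

def counts_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}

assignments = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] + [None] * 2
counts = count_reads(assignments)
print(counts)                      # Counter({'geneA': 6, 'geneB': 3, 'geneC': 1})
print(counts_per_million(counts))  # {'geneA': 600000.0, 'geneB': 300000.0, 'geneC': 100000.0}
```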
Limitations of NGS
Read lengths are relatively short: usually 50-250 bp (but now up to 700-800 bp for 454 sequencing)
The error rates (the rate of calling the wrong base in a DNA sequence) are typically higher than with Sanger sequencing
Base calling is relatively slow: a base is added to the template, then the base is interrogated (called), and only then is the next base added. However, because millions of fragments are read in parallel, these methods have much higher throughput and produce many more bases per hour than Sanger sequencing.
Newer Next Generation Method:
Pacific Biosciences (PacBio)
Method?
Pros and Cons?
-> can perform Single Molecule Real Time (SMRT) sequencing
sequencing by synthesis method which uses a DNA polymerase to add nucleotides to a template
polymerase enzyme is attached to a surface at the bottom of a nanometer-scale well (known as a ZMW or Zero-Mode Waveguide)
laser is focused very precisely on the polymerase at the bottom of the ZMW
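different nucleotides are labeled with different fluorescent dyes (attached to the phosphate chain); a dye is detected only while its nucleotide is held by the polymerase during incorporation, and it is then cleaved off with the pyrophosphate and diffuses away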
A “movie” is made of each polymerase and bases are called as the fluorescence changes over time
fast (base incorporation time is less than 1 sec)
can give very long reads (up to 25,000 bp or more), which is very useful for de novo genome assembly
error rates are high (about 15%)
read lengths vary among sequences – some are very long, but most are short.
Average read length is around 900 bp.
The machine is large and relatively expensive.
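What a 15% error rate means for a single read follows from simple arithmetic. The sketch assumes errors occur independently and uniformly along the read, which is a simplification.

```python
def expected_errors(read_length_bp, error_rate):
    """Average number of wrongly called bases in one read."""
    return read_length_bp * error_rate

def prob_error_free(read_length_bp, error_rate):
    """Chance that a read contains no error at all (independent errors assumed)."""
    return (1 - error_rate) ** read_length_bp

print(round(expected_errors(900, 0.15)))     # 135  (average-length read)
print(round(expected_errors(25_000, 0.15)))  # 3750 (very long read)
print(prob_error_free(900, 0.15) < 1e-50)    # True
```

So even an average-length read is expected to contain over a hundred miscalled bases, and an error-free read of that length is practically impossible.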
Ion Torrent
method?
pros and cons?
Method:
-> similar to 454, but does not rely on light detection.
-> Instead, a semiconductor chip is used to detect very small changes in pH when protons (H+) are released during base incorporation.
machine is smaller and cheaper than other next-gen technologies. Fast run-times (7 hours), suitable for individual labs.
Uses standard chip production techniques, so should become cheaper and more efficient over time – just like other microchips.
does not have as high sequencing throughput as Illumina or SOLiD.
Read lengths are a bit shorter than 454-Roche (currently up to 400 bases).
Can have high error rate, especially when there is a repeated stretch of the same nucleotide.
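Why long runs of the same nucleotide are hard can be shown with a toy model. As with 454, a run of identical bases is read as one larger signal, so telling n bases from n+1 means resolving a relative difference of only 1/n. The sketch assumes, purely for illustration, that the measured signal is off by a fixed fraction of its true value (here 10%) and that the run length is called by rounding.

```python
def called_length(true_length, relative_error):
    """Homopolymer length called from a signal that is off by relative_error."""
    measured = true_length * (1 + relative_error)
    return round(measured)

def longest_correct_run(relative_error, limit=50):
    """Longest homopolymer (up to limit) still called correctly,
    assuming all shorter runs are also called correctly."""
    n = 1
    while n < limit and called_length(n + 1, relative_error) == n + 1:
        n += 1
    return n

print([called_length(n, 0.10) for n in range(1, 9)])  # [1, 2, 3, 4, 6, 7, 8, 9]
print(longest_correct_run(0.10))                      # 4
```

With the same 10% relative error, runs of up to 4 identical bases are called correctly, but from 5 on the absolute error exceeds half a base and the run length is miscounted.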
Oxford Nanopore (MinION)
improvements of the method
reads the bases of a single-stranded DNA molecule as it passes through a protein channel (or “pore”)
-> pores can be packed very densely and the sequencing can be very fast – on the order of milliseconds per base
-> These sequencers are small (hand-held) and can, ideally, sequence up to 1 Gb in 6 hours at low cost. Read lengths can be up to hundreds of kb.
very small and inexpensive
can be taken into the field
gives the longest read lengths
high error rate (can be up to 15-30%)
typically requires some optimization experiments to get the platform to work on particular samples or in particular environments
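The nanopore speed figures above (milliseconds per base, up to 1 Gb in 6 hours, very long reads) can be related with a little arithmetic. The sketch assumes 1 ms per base (taking "on the order of milliseconds" at its fastest) and pores that read continuously for the whole 6 hours; both are simplifications.

```python
SECONDS_PER_HOUR = 3600

def bases_per_second_needed(gigabases, hours):
    """Average rate needed to produce a given yield in a given time."""
    return gigabases * 1e9 / (hours * SECONDS_PER_HOUR)

def pores_needed(gigabases, hours, ms_per_base):
    """Pores that must read in parallel, each passing one base per ms_per_base."""
    bases_per_pore_per_second = 1000 / ms_per_base
    return bases_per_second_needed(gigabases, hours) / bases_per_pore_per_second

def read_time_seconds(read_length_bp, ms_per_base):
    """Time for one molecule to pass through a pore."""
    return read_length_bp * ms_per_base / 1000

print(round(bases_per_second_needed(1, 6)))  # 46296
print(round(pores_needed(1, 6, 1), 1))       # 46.3
print(read_time_seconds(100_000, 1))         # 100.0
```

Under these assumptions about 46 pores reading in parallel would reach 1 Gb in 6 hours, and a single 100 kb molecule passes through a pore in under two minutes.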
Improvements
In the future, the protein channel may be replaced with a solid-state nano-channel. This may be something like a “DNA transistor”.
-> faster and cheaper than protein nanopores, but technical challenges remain before such a sequencer can be manufactured.