by Tanja P.

What is SAGE (Serial Analysis of Gene Expression)?

How does it work, and what are its pros and cons?

= method that is similar to EST sequencing, but more efficient because only short “tags” of around 10–15 bases are sequenced from each cDNA


  • Before sequencing, the tags are concatenated so that many of them can be sequenced in a single Sanger sequencing reaction

    -> requires annotation of the genome, so that the tags can be accurately mapped back to their corresponding genes

  • Purify mRNA (poly-A) from sample

  • Use a biotinylated oligo-dT primer to synthesize double-stranded cDNA

  • cut cDNA with a restriction enzyme, such as NlaIII which recognizes the sequence CATG and cuts, on average, every 256 bp

  • purify only the 3' ends of the cut cDNA (the fragments carrying the oligo-dT primer) on a streptavidin column (streptavidin binds the biotin attached to the primer)

  • ligate an adapter (a short synthesized DNA sequence) to the cut end. The adapter contains a restriction site for the restriction enzyme BsmFI (recognizes GGGAC, but cuts 15 bp away from this sequence, inside the cDNA fragment). Cutting with BsmFI therefore releases the adapter together with a short tag of cDNA

  • ligate two adapter-linked tags to each other tail-to-tail to create “ditags”. PCR-amplify the ditags with primers complementary to the adapter sequence

  • cut again with NlaIII to remove the adapters, leaving a 30 bp ditag

  • ligate many ditags end-to-end (up to 1 kb total length), then sequence thousands of these concatemers (typically 30–40 tags per Sanger sequencing reaction)
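The tag-defining steps above can be mimicked in silico. The following is only a minimal sketch: the function name `extract_sage_tag` is hypothetical, the cDNA is given as the sense strand (5' to 3'), and the tag is assumed to be the 15 bases immediately downstream of the 3'-most CATG site (whether the CATG itself is counted as part of the tag varies between descriptions).

```python
def extract_sage_tag(cdna, site="CATG", tag_len=15):
    """Return the tag next to the 3'-most anchoring site, or None.

    Assumptions (not from the notes): `cdna` is the sense strand written
    5'->3', and the tag is the `tag_len` bases directly 3' of the site.
    """
    pos = cdna.rfind(site)          # 3'-most NlaIII site (CATG)
    if pos == -1:
        return None                 # no site: this transcript yields no tag
    start = pos + len(site)
    tag = cdna[start:start + tag_len]
    # A site too close to the 3' end cannot give a full-length tag.
    return tag if len(tag) == tag_len else None


# Toy cDNA with two CATG sites; only the 3'-most one defines the tag.
cdna = "GG" + "CATG" + "A" * 20 + "CATG" + "ACGTACGTACGTACG" + "TTTT" + "A" * 8
print(extract_sage_tag(cdna))       # ACGTACGTACGTACG
print(extract_sage_tag("AAAAAAAA")) # None
```

Transcripts for which the function returns `None` correspond to genes that this kind of protocol cannot detect.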

-> Each 15-bp tag should give a unique match to a transcript in the genome (a random match is very unlikely) and should always lie directly after the 3'-most NlaIII site of that transcript (if it does not, genes may be missed)
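Two of the numbers used so far follow from simple base counting, shown here as a rough sketch. It assumes all four bases are equally frequent and independent (real genomes deviate from this), and the function names and the transcriptome size of 50 million bases are purely illustrative.

```python
def expected_fragment_length(site="CATG"):
    """Average spacing of a recognition site in random sequence."""
    return 4 ** len(site)


def expected_chance_matches(tag_len, transcriptome_bases):
    """Expected number of positions matching a given tag purely by chance."""
    return transcriptome_bases / 4 ** tag_len


print(expected_fragment_length())               # 256, as for NlaIII (CATG)
print(4 ** 15)                                  # 1073741824 possible 15-bp tags
# Hypothetical transcriptome size of 50 million bases:
print(expected_chance_matches(15, 50_000_000))  # roughly 0.05
```

Under these assumptions a given 15-bp tag is expected to occur by chance far less than once, which is why a match is normally taken to identify the transcript.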

-> To quantify the expression level of a gene: count the number of times that the tag for that gene is sequenced. At least 10,000–50,000 tags should be sequenced to get an accurate estimate of expression (≈300–1000 sequencing reactions).
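Counting tags and estimating the sequencing effort can be sketched as follows. The tag-to-gene table, the gene names and the function names are hypothetical; a real table would come from the annotated genome.

```python
from collections import Counter
import math


def count_tags(tags, tag_to_gene):
    """Count sequenced tags per gene; unknown tags are pooled as 'unmapped'."""
    counts = Counter()
    for tag in tags:
        counts[tag_to_gene.get(tag, "unmapped")] += 1
    return counts


def reactions_needed(n_tags, tags_per_reaction):
    """Number of sequencing reactions needed to read n_tags tags."""
    return math.ceil(n_tags / tags_per_reaction)


# Toy example with shortened tags (hypothetical data).
tag_to_gene = {"AAA": "geneA", "CCC": "geneB"}
counts = count_tags(["AAA", "AAA", "CCC", "GGG"], tag_to_gene)
print(counts["geneA"], counts["geneB"], counts["unmapped"])  # 2 1 1

print(reactions_needed(10_000, 40))  # 250
print(reactions_needed(10_000, 30))  # 334
print(reactions_needed(50_000, 40))  # 1250
print(reactions_needed(50_000, 30))  # 1667
```

With 30–40 tags per reaction, this gives a few hundred to somewhat over a thousand reactions, the same order of magnitude as the estimate quoted above.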


Pros:

  • gives an estimate of absolute transcript abundance

  • more efficient than large-scale EST sequencing, because far fewer sequencing reactions are required


Cons:

  • still requires a lot of sequencing, which can be expensive

  • not accurate for rare transcripts

  • sometimes difficult to map tags to genes

  • must be repeated for each sample (tissue, sex, treatment, etc.)
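Why rare transcripts are hard to quantify can be illustrated with a simple sampling model. This is a sketch only: it assumes each sequenced tag is an independent draw, so that the count for a transcript is approximately Poisson distributed, and the function name `prob_detected` is hypothetical.

```python
import math


def prob_detected(n_tags, fraction, min_count=1):
    """Probability of seeing at least `min_count` tags of a transcript.

    Poisson approximation: the expected count is n_tags * fraction, where
    `fraction` is the transcript's share of all mRNA molecules.
    """
    lam = n_tags * fraction
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_count))
    return 1 - p_below


# A transcript making up 1 in 10,000 mRNAs:
print(round(prob_detected(10_000, 1e-4), 3))  # 0.632
print(round(prob_detected(50_000, 1e-4), 3))  # 0.993
```

Even when such a transcript is detected, its expected count is only about 1 to 5 tags, so the estimate of its expression level carries a large relative sampling error.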

