Define repetitive and non-repetitive eukaryotic DNA
non-repetitive: unique sequences, only one copy in haploid genome
repetitive: more than one copy:
moderately repetitive (complex repeats):
short sequences
10-1000 copies
short interspersed repeats
highly repetitive (simple repeats):
very short (<100bp)
many thousands of copies
long tandem repeats
What are the types of repetitive DNA?
Interspersed (complex)
Retroelements:
LINEs (Long Interspersed Nuclear Elements)
SINEs (Short Interspersed Nuclear Elements)
LTRs (Long Terminal Repeat retrotransposons)
DNA-Transposons
Tandem (simple)
minisatellites
microsatellites
satellite & telomeric repeats
Segmental duplications
What are the problems with tandemly repeated DNA that occur during DNA sequencing?
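can cause incorrect overlaps of fragments: reads from different copies of the repeat unit look identical, so the array can be collapsed or its copy number misjudged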
What are the problems with genome-wide repeats that occur during DNA sequencing?
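can also cause incorrect overlaps of fragments: reads from copies at different loci look alike, so distant genomic regions may be joined (misassembly)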
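A small Python illustration of both answers, using two made-up reads that lie inside a (CA)n array: an exact suffix-prefix comparison finds several overlap lengths that are all perfect matches, so the sequence alone cannot tell an assembler which one is real. The function name and the reads are hypothetical; real assemblers score overlaps with mismatches and coverage, which this sketch ignores.

```python
def suffix_prefix_overlaps(a: str, b: str, min_len: int = 3) -> list[int]:
    """Return every length L >= min_len for which the last L bases of `a`
    exactly equal the first L bases of `b`."""
    return [
        L for L in range(min_len, min(len(a), len(b)) + 1)
        if a[-L:] == b[:L]
    ]


# Hypothetical reads, both sampled from inside a (CA)n microsatellite.
read1 = "GTTCACACACACA"
read2 = "CACACACACAGGT"
print(suffix_prefix_overlaps(read1, read2))  # [4, 6, 8, 10]
```

With a genome-wide repeat the situation is the same, except that the competing overlaps come from copies at different loci rather than from shifted positions within one array.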
Explain micro- and mini-satellites.
Microsatellites:
repeat unit of 1-12 bp, e.g., (A)n, (CA)n, (CGG)n
formed by: replication slippage
Minisatellites:
1-500 bp
What disease is associated with an expansion of triplet repeats such as CAG?
Huntington’s disease
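Triplet-repeat expansions are measured as the number of consecutive copies of the triplet (disease-associated HTT alleles carry roughly 36 or more CAG units). A minimal Python sketch, assuming the region of interest is already available as a string; `longest_triplet_run` is a hypothetical helper and the example sequence is invented, not the real HTT gene.

```python
import re


def longest_triplet_run(seq: str, unit: str = "CAG") -> int:
    """Number of units in the longest uninterrupted run of `unit` in `seq`.

    Runs are found with a regex scan, so a run may start in any reading frame.
    """
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical sequence: 42 CAG units followed by an interrupting CAA.
seq = "ATG" + "CAG" * 42 + "CAACAGCCGCCA"
print(longest_triplet_run(seq))  # 42
```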
What are satellites and telomeric repeats?
limited to well-defined chromosomal regions
satellites -> centromeric
telomeric -> end of chromosome (telomere)
can span millions of bp
often species-specific
What percentage of the human genome consists of micro- and minisatellites?
3%
What are simple sequence repeats (SSR)?
perfect (or slightly imperfect) tandem repeats of k-mers
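Following the definition above, a brute-force scanner for perfect SSRs can be sketched in Python: at each position it tries unit lengths 1 to `max_unit`, counts how many exact copies follow, and keeps the run covering the most bases. `find_ssrs`, `max_unit=6` and `min_copies=3` are illustrative choices made here, not thresholds from any particular tool, and slightly imperfect repeats are not detected.

```python
def find_ssrs(seq: str, max_unit: int = 6, min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Scan for perfect tandem repeats; return (start, unit, copies) tuples.

    At each position, the unit length covering the most bases wins (ties go to
    the shorter unit), and scanning resumes after the reported repeat.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (start, unit, copies)
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # ran past the end of the sequence
                break
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])
        else:
            i += 1
    return hits


print(find_ssrs("GCACACACAGTTTTTG"))  # [(1, 'CA', 4), (10, 'T', 5)]
```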
What percentage of the human genome consists of complex (interspersed) repeats?
~45%
What are DNA-Transposons?
type of interspersed (complex) repeats
can excise itself and reintegrate elsewhere in the genome
“cut-and-paste”
take advantage of
~3% of genome
What are the three different mechanisms of transposition?
conservative: the transposon itself moves to a new location
replicative: a copy of the transposon moves to a new location
retrotransposition: RNA copy moves to new location -> reverse transcription -> integration
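The three mechanisms differ in what happens to the donor copy. A toy Python model on strings, where lowercase letters stand for host DNA and uppercase for the element; the function names are hypothetical, and real transposition (enzymes, target-site duplications, integration-site preferences) is not modeled.

```python
def conservative(genome: str, start: int, end: int, target: int) -> str:
    """Cut-and-paste: excise genome[start:end] and reinsert it at `target`,
    an index into the genome *after* excision. The element count stays the same."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]


def replicative(genome: str, start: int, end: int, target: int) -> str:
    """Copy-and-paste: the element stays put and a DNA copy is inserted at `target`."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


def retrotransposition(genome: str, start: int, end: int, target: int) -> str:
    """Copy via an RNA intermediate: transcribe, reverse-transcribe, integrate."""
    rna = genome[start:end].replace("T", "U")  # transcription (toy)
    cdna = rna.replace("U", "T")               # reverse transcription (toy)
    return genome[:target] + cdna + genome[target:]


genome = "xxxxGATTACAyyyy"  # element occupies genome[4:11]
print(conservative(genome, 4, 11, 8))         # xxxxyyyyGATTACA
print(replicative(genome, 4, 11, 15))         # xxxxGATTACAyyyyGATTACA
print(retrotransposition(genome, 4, 11, 15))  # xxxxGATTACAyyyyGATTACA
```

Only the conservative mechanism leaves the copy number unchanged; the other two add a copy each time they act.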
What are Class I transposable elements?
Move by retrotransposition
e.g.
LTRs
LINEs
SINEs
What are Long Terminal Repeats (LTRs)?
are retrotransposons: transcribed to RNA -> back to DNA -> integrated
hundreds to thousands of bp
What are LINEs?
Long interspersed nuclear elements
autonomous retrotransposons, because
contain own retrotransposition machinery:
ORF1 -> chaperone for mRNA (helps folding)
ORF2 -> protein with endonuclease and reverse transcriptase activity
What are SINEs?
Short interspersed nuclear elements
evolved from RNA genes (e.g., tRNA)
up to 1000 bp
nonautonomous, because
do not encode their own retrotransposition machinery (they use the proteins encoded by LINEs)
What are the most abundant SINEs?
Alu repeats
~300 bp
> 10% of genome
found in:
introns
3’ UTRs of genes
intergenic regions
What are Class II transposable elements?
move by conservative “cut-and-paste” mechanism
What is the connection between repeats and pseudogenes?
pseudogenes commonly arise from
retrotransposition
What are segmental duplications?
large duplicated DNA segments (1 to >200 kb) in the genome
99% identity
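A rough Python check against the two criteria on this card (length of at least 1 kb, about 99% identity), assuming the two copies are already aligned without gaps. The function names and default thresholds are hypothetical, and other sources use different cutoffs.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, gap-free segments."""
    if not a or len(a) != len(b):
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return matches / len(a)


def looks_like_segdup(a: str, b: str, min_len: int = 1000,
                      min_identity: float = 0.99) -> bool:
    """Apply the length and identity criteria to one pair of copies."""
    return len(a) >= min_len and percent_identity(a, b) >= min_identity


copy1 = "ACGT" * 300                # 1,200 bp
mutated = list(copy1)
for pos in (0, 100, 200, 300, 400):  # five substitutions (A -> T)
    mutated[pos] = "T"
copy2 = "".join(mutated)
print(percent_identity(copy1, copy2))
print(looks_like_segdup(copy1, copy2))  # True
```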
Why do we study repeats?
they are ubiquitous in eukaryotic genomes
evolution
disease
mobile elements may be coding
they complicate alignments
Describe the general approach to finding repetitive elements. List three methods.
find repeats
k-mer approach
sequence self-comparison
periodicity approach
build consensus of each sequence family
classify detected sequences
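The first step (find repeats) via sequence self-comparison can be sketched in Python: index every k-mer, report each pair of positions sharing one (the off-diagonal dots of a dot plot of the sequence against itself), then chain hits on the same diagonal into longer repeats. This exact-seed-and-chain design and the function names are assumptions for illustration; consensus building and classification are not shown, and `k` must be chosen by hand.

```python
from collections import defaultdict


def self_matches(seq: str, k: int) -> list[tuple[int, int]]:
    """All position pairs (i, j), i < j, with seq[i:i+k] == seq[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    pairs = []
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                pairs.append((positions[a], positions[b]))
    return sorted(pairs)


def merge_diagonals(pairs: list[tuple[int, int]], k: int) -> list[tuple[int, int, int]]:
    """Chain k-mer hits on the same diagonal (same offset j - i) whose start
    positions are consecutive; return (start1, start2, length) per repeat."""
    by_offset = defaultdict(list)
    for i, j in pairs:
        by_offset[j - i].append(i)
    repeats = []
    for offset, starts in by_offset.items():
        starts.sort()
        run_start = prev = starts[0]
        for s in starts[1:]:
            if s != prev + 1:  # gap on the diagonal: close the current run
                repeats.append((run_start, run_start + offset, prev - run_start + k))
                run_start = s
            prev = s
        repeats.append((run_start, run_start + offset, prev - run_start + k))
    return sorted(repeats)


seq = "GATTACACCCGATTACA"
print(merge_diagonals(self_matches(seq, k=4), k=4))  # [(0, 10, 7)]
```

Here (0, 10, 7) means the 7-bp segment starting at position 0 reappears at position 10.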
Describe the k-mer approach.
look for overrepresented k-mers
challenge:
defining k (length of repeat)
defining number of allowed mismatches
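A minimal Python sketch of the k-mer approach, assuming exact matches and a uniform random background (each base with probability 1/4) to define the expected count; `min_count` and `min_fold` are hypothetical parameters. It sidesteps both challenges listed above: k is fixed by the caller and no mismatches are allowed.

```python
from collections import Counter


def overrepresented_kmers(seq: str, k: int, min_count: int = 3,
                          min_fold: float = 10.0) -> dict[str, int]:
    """Return k-mers whose observed count is at least `min_count` and at least
    `min_fold` times the count expected under a uniform i.i.d. background."""
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if k <= 0 or n_windows <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    expected = n_windows / 4 ** k  # expected count of any single k-mer
    return {kmer: c for kmer, c in counts.items()
            if c >= min_count and c / expected >= min_fold}


print(overrepresented_kmers("GATTACA" * 4, k=7, min_count=4))  # {'GATTACA': 4}
```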
Name three programs used for the detection of repeats.
Repeat finding -> REPuter (1999)
only exact matches
Clustering -> RepeatFinder (2001)
merge and extend short exact matches
Masking -> RepeatMasker (2002)
uses precompiled libraries
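To illustrate what masking means, a toy Python version: every exact occurrence of a library sequence (on either strand) is replaced by `N`, and the hit is recorded. The library entry and function names are hypothetical; RepeatMasker itself searches curated libraries (such as Dfam or Repbase) with alignment engines that tolerate mismatches and indels, which exact string search does not.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mask_repeats(seq: str, library: dict[str, str], mask_char: str = "N"):
    """Replace exact library matches (both strands) with `mask_char`.
    Returns (masked_sequence, hits) with hits as (name, strand, start, end)."""
    upper = seq.upper()
    masked = list(seq)
    hits = []
    for name, consensus in library.items():
        forward = consensus.upper()
        for probe, strand in ((forward, "+"), (revcomp(forward), "-")):
            start = upper.find(probe)
            while start != -1:
                masked[start:start + len(probe)] = mask_char * len(probe)
                hits.append((name, strand, start, start + len(probe)))
                start = upper.find(probe, start + 1)
    return "".join(masked), sorted(hits, key=lambda hit: hit[2])


masked, hits = mask_repeats("GGGAACCGGTATCCGGTTGGG", {"toyRep": "AACCGG"})
print(masked)  # GGGNNNNNNTATNNNNNNGGG
print(hits)    # [('toyRep', '+', 3, 9), ('toyRep', '-', 12, 18)]
```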