what do we learn from FastQC?
what is a reference-based assembly?
RNA-seq reads are mapped to their corresponding genomic positions
in a perfect world, reads map only to exonic regions, and split reads identify exon-intron boundaries
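In a SAM/BAM alignment from a spliced aligner, a split read is recorded with an `N` (skipped reference region) operation in its CIGAR string. A minimal sketch, assuming 1-based SAM coordinates, of how exon-intron boundaries could be read off one such alignment; the function name `introns_from_cigar` is hypothetical.

```python
import re

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X
REF_CONSUMING = set("MDN=X")

def introns_from_cigar(ref_start, cigar):
    """Return (start, end) reference intervals skipped by 'N' operations.

    ref_start is the 1-based leftmost mapping position (SAM POS field).
    Returned intervals are 1-based and inclusive.
    """
    introns = []
    pos = ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            # the skipped region between two aligned blocks is the putative intron
            introns.append((pos, pos + length - 1))
        if op in REF_CONSUMING:
            pos += length
    return introns

# 50 aligned bases, a 300-base gap (intron), then 50 more aligned bases
print(introns_from_cigar(1000, "50M300N50M"))  # prints [(1050, 1349)]
```

The first exon block ends at 1049 and the second starts at 1350, so both exon-intron boundaries come directly from the split read.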
what is a de-novo assembly?
overlapping sequence reads with sufficient sequence similarity are collapsed into longer sequences (aka contigs)
the contigs serve as reconstructions of the original transcripts
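A toy greedy overlap merge illustrating how overlapping reads collapse into a longer contig. This assumes error-free reads from a single strand with no read contained inside another; it is not how Trinity works (Trinity uses k-mers and de Bruijn graphs rather than all-versus-all read overlaps), and the function names are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the ordered pair of sequences with the largest overlap."""
    seqs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return seqs  # no overlaps left: the remaining sequences are the contigs
        merged = seqs[i] + seqs[j][olen:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (i, j)] + [merged]

# three reads tiled across the hypothetical transcript ATGGCGTACCTTA
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # prints ['ATGGCGTACCTTA']
```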
what is a contig?
a set of reads that are related to one another by overlap of their sequences. every read belongs to one and only one contig, and each contig contains at least one read
set of reads that locally overlap with each other
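The definition above (every read in exactly one contig, every contig with at least one read) is a partition of the reads: contigs are the connected components of a read-overlap graph. A minimal sketch using union-find, assuming exact suffix-prefix overlaps of at least `min_len` bases on one strand; all names are hypothetical.

```python
def overlaps(a, b, min_len):
    """True if a suffix of one read equals a prefix of the other over >= min_len bases."""
    for x, y in ((a, b), (b, a)):
        for length in range(min_len, min(len(x), len(y)) + 1):
            if x.endswith(y[:length]):
                return True
    return False

def partition_into_contigs(reads, min_len=3):
    parent = list(range(len(reads)))  # union-find forest, one node per read

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # join any two reads that overlap
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if overlaps(reads[i], reads[j], min_len):
                parent[find(i)] = find(j)

    # each root defines one contig; every read lands in exactly one group
    groups = {}
    for i in range(len(reads)):
        groups.setdefault(find(i), []).append(reads[i])
    return list(groups.values())

print(partition_into_contigs(["ATGGCGT", "GCGTACC", "TTTTCCCC"]))
# prints [['ATGGCGT', 'GCGTACC'], ['TTTTCCCC']]
```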
what is a scaffold?
ordered and oriented (typically non-overlapping) contigs separated by gaps of approximately known length. Scaffolds are typically formed by identifying contig pairs that each contain one read of a `read pair`
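A minimal sketch of the linking step: count read pairs whose two mates fall on different contigs, and keep contig pairs supported by enough pairs. It assumes a precomputed read-to-contig mapping and deliberately ignores mate orientation and insert size, which real scaffolders use to order, orient, and size the gaps; `contig_links` and `min_links` are hypothetical names.

```python
from collections import Counter

def contig_links(pairs, read_to_contig, min_links=2):
    """Count read pairs whose mates land on different contigs.

    pairs: iterable of (mate1_id, mate2_id)
    read_to_contig: dict mapping read id -> contig id (unmapped reads absent)
    Returns {(contig_a, contig_b): n_pairs} for links with >= min_links pairs.
    """
    links = Counter()
    for m1, m2 in pairs:
        c1, c2 = read_to_contig.get(m1), read_to_contig.get(m2)
        if c1 is not None and c2 is not None and c1 != c2:
            links[tuple(sorted((c1, c2)))] += 1  # undirected contig pair
    return {edge: n for edge, n in links.items() if n >= min_links}

pairs = [("r1/1", "r1/2"), ("r2/1", "r2/2"), ("r3/1", "r3/2"), ("r4/1", "r4/2")]
mapping = {"r1/1": "A", "r1/2": "B", "r2/1": "B", "r2/2": "A",
           "r3/1": "A", "r3/2": "A", "r4/1": "B", "r4/2": "C"}
print(contig_links(pairs, mapping))  # prints {('A', 'B'): 2}
```

Requiring several supporting pairs (here two) filters out links caused by a single chimeric or mismapped pair.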
what is the toolbox for RNA-seq assembly?
de novo or reference-based
1. Reads
2. Read clean-up
3. Assembly
   3.1 De novo
       Trinity
   3.2 Reference-based
       alignment to reference genome
       transcript reconstruction
4. Post-assembly analysis
   QC
   Full-length transcript analysis
   Abundance estimation
   DGE
   Protein-coding region annotation
   Functional annotation
what is the problem with paralogs?
assembly software cannot reliably distinguish between highly similar reads, such as reads from closely related paralogs
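A small illustration of why near-identical paralogs are hard to separate: with k = 25, a single substitution changes at most 25 of a sequence's k-mers, so all other k-mers are identical in both copies and reads made only of those k-mers cannot be assigned to one paralog. The sequences are randomly generated and hypothetical, not real genes.

```python
import random

def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

rng = random.Random(0)
gene_a = "".join(rng.choice("ACGT") for _ in range(100))
# hypothetical paralog: identical to gene_a except for one substitution at index 50
sub = "A" if gene_a[50] != "A" else "C"
gene_b = gene_a[:50] + sub + gene_a[51:]

k = 25
shared = kmers(gene_a, k) & kmers(gene_b, k)
# only k-mers covering index 50 can differ; the rest are shared by both copies
print(len(kmers(gene_a, k)), len(shared))
```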
what are the four stages of Trinity?
Jellyfish
Extracts and counts k-mers (k = 25) from reads
Inchworm
Assembles initial contigs by greedily extending sequences with the most abundant k-mers
Chrysalis
Clusters overlapping Inchworm contigs, builds a de Bruijn graph for each cluster, and partitions reads among the clusters
Butterfly
Resolves alternatively spliced and paralogous transcripts independently for each cluster
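The first stage, k-mer counting, can be sketched with an in-memory dictionary. This is only a stand-in for Jellyfish, which is a specialized parallel k-mer counter; the sketch counts forward-strand k-mers only, and `count_kmers` is a hypothetical name.

```python
from collections import Counter

def count_kmers(reads, k=25):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # reads shorter than k contribute no k-mers
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

# small k only so the example is readable
counts = count_kmers(["ACGTA", "CGTAC"], k=3)
print(counts["CGT"], counts["TAC"])  # prints 2 1
```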
what are the steps of Inchworm?
extract k-mers
generate hash table
assemble contigs
begin with the most frequent entry in the hash table (the k-mer with the highest count) as the seed and mark it as used
search the table for k-mers that overlap the seed by k-1 bases (to the left and right)
use exact matching via hash table lookup, which takes constant time
if a k-mer is found, mark it as used in the hash table
extend the seed with the new nucleotide
continue until no further extension is possible
then start again with the most abundant unused k-mer as a new seed
decompose all reads into overlapping k-mers, ignoring low-complexity k-mers
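The Inchworm steps above can be sketched as a greedy extension over a k-mer count table. This is a simplified illustration, assuming error-free single-strand reads and k >= 2; it skips the low-complexity filter and other details of the real Trinity implementation, and the function name `inchworm` is hypothetical.

```python
from collections import Counter

def inchworm(reads, k):
    # steps 1-2: decompose reads into k-mers and store their counts in a hash table
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    used = set()
    contigs = []

    def best_extension(candidates):
        # constant-time dict lookups for each of the 4 possible neighbouring k-mers
        options = [(counts[c], c) for c in candidates if c in counts and c not in used]
        return max(options)[1] if options else None

    # step 3: seed with the most frequent unused k-mer, then extend greedily
    for seed, _ in counts.most_common():
        if seed in used:
            continue
        used.add(seed)
        contig = seed
        while True:  # extend to the right using the (k-1)-base suffix
            nxt = best_extension(contig[-(k - 1):] + b for b in "ACGT")
            if nxt is None:
                break
            used.add(nxt)
            contig += nxt[-1]
        while True:  # extend to the left using the (k-1)-base prefix
            prev = best_extension(b + contig[:k - 1] for b in "ACGT")
            if prev is None:
                break
            used.add(prev)
            contig = prev[0] + contig
        contigs.append(contig)
    return contigs

print(inchworm(["ATGGCGTAC", "GCGTACCTTA"], k=5))  # prints ['ATGGCGTACCTTA']
```

Because every extension marks a k-mer as used, each k-mer ends up in at most one contig and the loops always terminate.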
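what does Chrysalis do?
integrates isoforms (Inchworm contigs) into clusters via (k-1)-base overlaps, verified by “welds” (reads spanning the junction between two contigs)
builds a de Bruijn graph for each cluster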
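A minimal sketch of de Bruijn graph construction for one cluster: nodes are (k-1)-mers and each k-mer adds a directed edge from its prefix to its suffix. Edge weights and read support, which Trinity keeps for Butterfly, are omitted here, and the two input sequences are hypothetical isoforms sharing a common start.

```python
from collections import defaultdict

def de_bruijn_graph(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers it points to."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)

g = de_bruijn_graph(["AACGTT", "AACGAA"], k=4)
print(sorted(g["ACG"]))  # prints ['CGA', 'CGT']
```

The node `ACG` has two outgoing edges, marking where the two sequences diverge; resolving such branches into full transcripts is the job of Butterfly.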