Please explain the differences between the 10X platform and the SmartSeq platform? What are the differences in the SmartSeq versions?
10X platform:
UMIs integrated
higher throughput, so many more cells are sequenced
higher dropout rate, noisier data
low coverage
only 3’ data
cheaper
SmartSeq:
sorting required (e.g. FACS)
higher sequencing depth
more sensitive gene detection
good if few cells need to be deeply characterized (precious samples)
full length transcripts
Explain the way 10X cDNA libraries are generated?
Cells and barcoded beads are partitioned into oil droplets (emulsion), ideally one cell and one bead per droplet
The beads carry oligos containing the Read 1 (R1) primer sequence, the 10X barcode, a UMI and a poly(dT)VN sequence
Explain the way SmartSeq libraries are generated?
What defines the capture rate?
The capture rate is 1 - the fraction of cells that are in droplets without any beads.
What is the split rate referring to?
The split rate is the fraction of droplets with exactly one cell that have more than one bead.
What is the doublet rate?
The doublet rate is the fraction of droplets with 1 bead that have more than one cell.
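The three rates defined above can be computed directly from per-droplet counts. The following is only a minimal sketch: the function name `droplet_rates` and the input format (one `(n_cells, n_beads)` tuple per droplet) are assumptions made for illustration, and the definitions are taken literally from the answers above.

```python
def droplet_rates(droplets):
    """Return (capture_rate, split_rate, doublet_rate).

    droplets: iterable of (n_cells, n_beads) tuples, one per droplet.
    This input format is hypothetical, chosen only for illustration.
    """
    droplets = list(droplets)

    # Capture rate: 1 - fraction of cells sitting in droplets without any bead.
    total_cells = sum(cells for cells, _ in droplets)
    cells_without_bead = sum(cells for cells, beads in droplets if beads == 0)
    capture_rate = 1 - cells_without_bead / total_cells

    # Split rate: among droplets with exactly one cell, fraction with > 1 bead.
    one_cell = [beads for cells, beads in droplets if cells == 1]
    split_rate = sum(1 for beads in one_cell if beads > 1) / len(one_cell)

    # Doublet rate: among droplets with exactly one bead, fraction with > 1 cell.
    one_bead = [cells for cells, beads in droplets if beads == 1]
    doublet_rate = sum(1 for cells in one_bead if cells > 1) / len(one_bead)

    return capture_rate, split_rate, doublet_rate


# Toy data: six droplets described as (cells, beads).
toy = [(1, 1), (1, 1), (1, 0), (2, 1), (1, 2), (0, 1)]
capture, split, doublet = droplet_rates(toy)
print(capture, split, doublet)
```

In the toy data, one of six cells has no bead, one of four single-cell droplets has two beads, and one of four single-bead droplets has two cells.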
What defines the sensitivity?
The fraction of a cell's transcripts that are detected, measured per cell
What kind of fastq files are generated by bcl2fastq?
I1.fastq files - sample index (more or less an experiment/sample ID; needed because 10X barcodes are reused across samples and are only enough for roughly 700-800k cells)
R1.fastq files - cell barcode + UMI
R2.fastq files - transcript information
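To illustrate how the R1 read carries the cell barcode and the UMI, here is a small sketch that splits an R1 sequence into the two parts. The lengths are assumptions: 16 bp barcode followed by a 12 bp UMI corresponds to the 10X 3' v3 layout (v2 uses a 10 bp UMI), so adjust them to the chemistry actually used. The helper names `read_fastq` and `split_r1` are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from an uncompressed FASTQ handle."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def split_r1(seq, barcode_len=16, umi_len=12):
    """Split an R1 sequence into (cell_barcode, umi).

    barcode_len and umi_len are assumptions (10X 3' v3 layout).
    """
    if len(seq) < barcode_len + umi_len:
        raise ValueError("R1 read is shorter than barcode + UMI")
    return seq[:barcode_len], seq[barcode_len:barcode_len + umi_len]


# Toy R1 record: 16 bp barcode followed by 12 bp UMI.
toy_r1 = io.StringIO("@read1\nAAACCCAAGAAACACTGGTTCCAATTGG\n+\n" + "I" * 28 + "\n")
for read_id, seq, qual in read_fastq(toy_r1):
    barcode, umi = split_r1(seq)
    print(read_id, barcode, umi)
```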
What steps are covered by the cellranger count module?
Read Trimming:
STAR Alignment:
MAPQ filtering
filtering on uniquely mapped reads
10x barcode correction:
Count the observed frequency of every barcode on the whitelist in the dataset.
For every observed barcode in the dataset that is not on the whitelist and is at most one Hamming distance away from the whitelist sequences:
Compute the posterior probability that the observed barcode did originate from the whitelist barcode but has a sequencing error at the differing base (by base quality score).
Replace the observed barcode with the whitelist barcode that has the highest posterior probability (>0.975).
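The barcode correction steps above can be sketched as follows. This is a simplified illustration, not Cell Ranger's actual code: the function names are hypothetical, the prior is taken to be the observed whitelist frequency without any pseudocount, the likelihood is the Phred-derived error probability at the differing base, and qualities are assumed to be Phred+33 encoded.

```python
def hamming_neighbors(barcode, alphabet="ACGT"):
    """Yield (position, neighbor) for every sequence at Hamming distance 1."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield i, barcode[:i] + alt + barcode[i + 1:]


def correct_barcode(observed, qual, whitelist_counts, threshold=0.975):
    """Return the corrected barcode, or None if no confident correction exists.

    whitelist_counts: dict mapping each whitelist barcode to its observed count.
    qual: Phred+33 quality string of the observed barcode (assumed encoding).
    """
    if observed in whitelist_counts:
        return observed  # already on the whitelist, nothing to correct

    total = sum(whitelist_counts.values())
    weights = {}
    for pos, candidate in hamming_neighbors(observed):
        if candidate in whitelist_counts:
            prior = whitelist_counts[candidate] / total
            # Probability that the base at `pos` is a sequencing error.
            p_error = 10 ** (-(ord(qual[pos]) - 33) / 10)
            weights[candidate] = prior * p_error

    norm = sum(weights.values())
    if norm == 0:
        return None  # no whitelist barcode within Hamming distance 1

    best = max(weights, key=weights.get)
    posterior = weights[best] / norm
    return best if posterior > threshold else None


# Toy whitelist with observed frequencies (4 bp barcodes for readability).
counts = {"AAAA": 100, "AAAT": 1, "CCCC": 50}
print(correct_barcode("AAAG", "IIII", counts))
```

In the toy call, `AAAG` has two whitelist neighbors that differ at the same position, so the base quality cancels out and the posterior for `AAAA` is 100/101, which exceeds the 0.975 threshold.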
UMI counting:
If two groups of reads have the same barcode and gene, but their UMIs differ by a single base (i.e., are one Hamming distance apart), then one of the UMIs was likely introduced by a substitution error in sequencing. In this case, the UMI of the less-supported read group is corrected to the UMI with higher support.
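A minimal sketch of this UMI correction for a single (barcode, gene) pair is shown below. The function names are hypothetical, and the sketch makes simplifying assumptions: it works in a single pass, does not resolve chains of corrections, and leaves UMIs with equal support untouched.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_umis(umi_counts):
    """Merge each UMI into a better-supported UMI at Hamming distance 1.

    umi_counts: dict mapping UMI -> read count, for one (barcode, gene) pair.
    Returns a Counter of read counts per corrected UMI.
    """
    corrected = Counter()
    for umi, n_reads in umi_counts.items():
        # Neighbors one substitution away with strictly higher read support.
        better = [
            other
            for other, m_reads in umi_counts.items()
            if m_reads > n_reads
            and len(other) == len(umi)
            and hamming(umi, other) == 1
        ]
        # Pick the best-supported neighbor; ties are broken alphabetically.
        target = max(better, key=lambda u: (umi_counts[u], u)) if better else umi
        corrected[target] += n_reads
    return corrected


# Toy example: AACT is one substitution away from the better-supported AACG.
reads_per_umi = {"AACG": 10, "AACT": 1, "GGTT": 4}
corrected = correct_umis(reads_per_umi)
print(dict(corrected), "molecules:", len(corrected))
```

After correction, the number of distinct UMIs left for the (barcode, gene) pair is the molecule count that goes into the expression matrix.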