What kind of methods do you know to filter for empty droplets and what do you need to consider for that?
Cellranger:
Plot the UMI count per barcode (i.e. transcript molecules detected) against barcodes ranked by UMI count (barcode rank plot) to determine the number of cells
everything ______
CAVE: always provide the expected number of cells (--expect-cells)
emptyDrops:
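(1) all droplets with at most 100 UMIs (default threshold) are assumed to be empty and are used to estimate the ambient RNA profile
(2) find the knee point and inflection point of the barcode rank plot; barcodes above the knee are kept as cells
(3) for the barcodes in this range, test whether their expression profile deviates significantly from the ambient profile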
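A minimal sketch of the barcode rank plot and knee detection described above. The knee is located with a simple geometric heuristic (the point farthest from the chord of the log-log curve), which is an assumption made for illustration and not the algorithm used by Cell Ranger or emptyDrops. The function name and the toy counts are made up.

```python
import numpy as np

def barcode_rank_knee(total_umis, lower=100):
    """Return (ranks, sorted_counts, knee_count) for a barcode rank plot.

    Illustrative heuristic only: the knee is the point farthest from the
    straight line joining the two ends of the log-log rank curve.
    """
    counts = np.sort(np.asarray(total_umis))[::-1]
    counts = counts[counts > lower]  # barcodes at or below `lower` are treated as empty
    ranks = np.arange(1, counts.size + 1)
    x, y = np.log10(ranks), np.log10(counts)
    # Perpendicular distance of every point from the chord (first point to last point).
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    knee_idx = int(np.argmax(dist))
    return ranks, counts, counts[knee_idx]

# Toy data (made up): 500 cells, 5000 ambient-only droplets, 20000 near-empty barcodes.
rng = np.random.default_rng(0)
total_umis = np.concatenate([
    rng.poisson(10_000, 500),
    rng.poisson(150, 5_000),
    rng.poisson(20, 20_000),
])
ranks, counts, knee = barcode_rank_knee(total_umis)
n_cells = int((counts >= knee).sum())  # barcodes at or above the knee
```

Plotting `ranks` against `counts` on log-log axes gives the familiar curve; barcodes between the knee and the lower threshold are the ones that need a statistical test against the ambient profile.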
How can you estimate the rate of doublets?
idea: sequencing human samples/tumors in mice:
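just plot #mouse UMIs vs. #human UMIs per barcode -> double-positive fraction (only mixed-species doublets are visible, so the true doublet rate is higher)
alternative: pool samples from genetically different donors and use the SNPs in the RNA-seq reads; droplets carrying SNPs of more than one donor are doublets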
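The species-mixing idea can be turned into a rough doublet-rate estimate as sketched here. The purity cut-off of 0.9 and the correction for invisible same-species doublets (assuming cells pair at random, so only a share 2 * p_human * p_mouse of doublets is mixed) are assumptions for this example, and the function name is hypothetical.

```python
import numpy as np

def species_mixing_doublet_rate(human_umis, mouse_umis, purity=0.9):
    """Return (observed mixed fraction, estimated total doublet rate)."""
    human = np.asarray(human_umis, dtype=float)
    mouse = np.asarray(mouse_umis, dtype=float)
    total = human + mouse
    keep = total > 0
    frac_human = human[keep] / total[keep]
    is_human = frac_human >= purity       # almost only human UMIs
    is_mouse = frac_human <= 1 - purity   # almost only mouse UMIs
    is_mixed = ~(is_human | is_mouse)     # double-positive barcodes
    observed = is_mixed.mean()
    # Assumption: random pairing, so only 2 * p_h * p_m of all doublets are mixed-species.
    p_h = is_human.sum() / (is_human.sum() + is_mouse.sum())
    visible_share = 2 * p_h * (1 - p_h)
    return observed, observed / visible_share

# Toy data (made up): 480 human cells, 480 mouse cells, 40 mixed barcodes.
human = [1000] * 480 + [10] * 480 + [500] * 40
mouse = [10] * 480 + [1000] * 480 + [500] * 40
observed, estimated_total = species_mixing_doublet_rate(human, mouse)
```

With equal numbers of human and mouse cells, half of all doublets are invisible, so the estimated total rate is twice the observed double-positive fraction.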
other methods available:
scDblFinder (R package):
Scrublet: artificially creates doublets (pools the expression values of different columns), embeds the observed transcriptomes together with the synthetic doublets, and scores each cell by how many synthetic doublets are among its neighbours
CAVE: differentiate between clustered cells and continuous expression data (trajectory); for clustered data, doublet filtering might not be as important because these cells cluster away anyway
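A minimal sketch of the synthetic-doublet idea, loosely following the principle behind Scrublet but not its actual implementation. The normalisation, the number of principal components, the brute-force neighbour search, and all names are assumptions chosen to keep the example short.

```python
import numpy as np

def simulate_doublet_scores(counts, n_sim=None, n_pcs=10, k=15, seed=0):
    """counts: genes x cells matrix of raw UMI counts. Returns one score per cell."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = counts.shape
    n_sim = n_sim or n_cells
    # Pool (sum) the columns of two randomly chosen cells per synthetic doublet.
    i = rng.integers(0, n_cells, n_sim)
    j = rng.integers(0, n_cells, n_sim)
    synthetic = counts[:, i] + counts[:, j]
    combined = np.hstack([counts, synthetic]).astype(float)
    # Library-size normalisation followed by a log transform.
    combined = np.log1p(combined / combined.sum(axis=0, keepdims=True) * 1e4)
    # PCA via SVD on gene-centred data; rows of `pcs` are cells.
    centred = combined - combined.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    pcs = (vt[:n_pcs] * s[:n_pcs, None]).T
    # Brute-force k nearest neighbours (fine for small toy data only).
    d2 = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    is_synth = np.arange(n_cells + n_sim) >= n_cells
    # Score = fraction of neighbours that are synthetic doublets.
    return is_synth[nn].mean(axis=1)[:n_cells]

# Toy data (made up): two cell types with opposite marker genes plus 10 true doublets.
rng = np.random.default_rng(1)
rates_a = np.r_[np.full(10, 50.0), np.full(10, 10.0)]
rates_b = rates_a[::-1]
type_a = rng.poisson(rates_a[:, None], (20, 100))
type_b = rng.poisson(rates_b[:, None], (20, 100))
true_doublets = rng.poisson((rates_a + rates_b)[:, None], (20, 10))
counts = np.hstack([type_a, type_b, true_doublets])
scores = simulate_doublet_scores(counts)
```

In this toy setting the last 10 columns (the true doublets) sit among the mixed-type synthetic doublets and should receive higher scores than the singlets. Doublets of two cells of the same type are not separable this way, which matches the caveat about continuous data above.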
What kind of distribution would you use to model scRNAseq and what kind of parameter does it take into account? How can you then estimate the technical variation?
Poisson/Negative Binomial/Regularized Negative Binomial distribution
Poisson: models the count distribution with a single parameter, the mean gene expression across cells (variance equals the mean)
Regularized Negative Binomial: middle ground between Negative Binomial and Poisson distribution (most stable)
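technical variation: residuals (observed minus expected counts, scaled by the expected standard deviation) can be computed from the fitted Poisson distribution; variance beyond what the model expects is not technical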
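A small sketch of how residuals from a fitted Poisson (or Negative Binomial) model separate technical from additional variation. The expected count per gene and cell is taken as gene total times cell total divided by the grand total, which is a simplifying assumption for this example. The function name and `theta` (the Negative Binomial dispersion, infinite for Poisson) are illustrative.

```python
import numpy as np

def pearson_residuals(counts, theta=np.inf):
    """counts: genes x cells matrix. theta=inf gives the Poisson case."""
    counts = np.asarray(counts, dtype=float)
    gene_totals = counts.sum(axis=1, keepdims=True)
    cell_totals = counts.sum(axis=0, keepdims=True)
    # Expected count under the null model: gene abundance scaled by cell depth.
    mu = gene_totals * cell_totals / counts.sum()
    # Poisson variance is mu; the Negative Binomial adds mu^2 / theta.
    variance = mu + mu**2 / theta
    return (counts - mu) / np.sqrt(variance)

# Toy data (made up): 50 genes, 2000 cells, only gene 0 differs between two groups.
rng = np.random.default_rng(0)
gene_means = rng.uniform(1, 20, 50)
gene_means[0] = 10
size_factors = rng.uniform(0.5, 1.5, 2000)
lam = np.outer(gene_means, size_factors)
lam[0, :1000] *= 4  # gene 0 is four times higher in the first 1000 cells
counts = rng.poisson(lam)
residual_var = pearson_residuals(counts).var(axis=1)
```

Genes whose variation is purely technical have a residual variance close to 1, while gene 0 stands out with a much larger value.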