What is the ideal sc transcriptomics method?
every cell is assayed
-> 100% capture rate
every transcript of every cell is detected
-> 100% sensitivity
every transcript is identified by its full length sequence
How is scRNA currently sequenced?
since 2017: in situ barcoding with microfluidic methods
inDrop
Drop-seq
10X
How does the inDrop method work?
At 4°C: bead releases barcodes
At 50°C: cDNA synthesis with barcodes
What should droplet look like and why is that exact composition necessary?
Droplet consists of one cell and one barcoding hydrogel bead in RT/lysis reagent, sealed in oil
Beads:
more than one
-> too many barcodes -> each barcode tags only half of the cell's transcripts or fewer
none
-> no sequencing at all
Cells:
different
-> can’t tell which transcript comes from which cell
same
-> too many (double) transcripts
-> waste of resources, since only barcodes are then sequenced
What is the cell capture rate?
= probability of one or more beads in a droplet
= 1 - [probability of zero beads]
= 1 - e^(-μ), with μ = mean number of beads per droplet (Poisson-distributed bead loading)
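A minimal sketch of the formula above, assuming the number of beads per droplet is Poisson-distributed with mean μ. The function name `capture_rate` is illustrative and not from the source.

```python
import math


def capture_rate(mu: float) -> float:
    """Probability that a droplet contains at least one bead.

    Assumption: bead count per droplet ~ Poisson(mu).
    """
    if mu < 0:
        raise ValueError("mu must be non-negative")
    return 1.0 - math.exp(-mu)


# Capture rate for a few hypothetical bead loadings.
for mu in (0.1, 0.5, 1.0, 2.0):
    print(f"mu={mu}: capture rate = {capture_rate(mu):.3f}")
```

Raising μ pushes the capture rate toward 1, but it also makes droplets with several beads more likely.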
What is the cell duplication rate?
= rate at which a captured single cell is associated with two or more beads
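The source gives no formula for this rate. As a hedged sketch, if the bead count per droplet is assumed to be Poisson-distributed with mean μ, the rate is P(two or more beads | at least one bead). The name `cell_duplication_rate` is illustrative.

```python
import math


def cell_duplication_rate(mu: float) -> float:
    """P(>= 2 beads | >= 1 bead), assuming beads per droplet ~ Poisson(mu)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    p_zero = math.exp(-mu)       # no bead in the droplet
    p_one = mu * math.exp(-mu)   # exactly one bead
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)


# Duplication rate for a few hypothetical bead loadings.
for mu in (0.1, 0.5, 1.0):
    print(f"mu={mu}: duplication rate = {cell_duplication_rate(mu):.3f}")
```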
What are synthetic doublets?
= caused by barcode collision
barcode collision = when two cells are separately encapsulated with beads carrying identical barcodes
barcode collision rate = expected proportion of assayed cells not having a unique barcode
-> can be avoided with a high relative barcode diversity
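To see why relative barcode diversity matters, here is a rough sketch. It assumes every assayed cell draws its barcode independently and uniformly from a pool of `n_barcodes`, which is a simplification. The function name and the numbers are hypothetical.

```python
def barcode_collision_rate(n_cells: int, n_barcodes: int) -> float:
    """Expected proportion of cells that share their barcode with another cell.

    Assumption: barcodes are drawn independently and uniformly at random.
    """
    if n_cells < 1 or n_barcodes < 1:
        raise ValueError("n_cells and n_barcodes must be at least 1")
    # A given cell is unique if none of the other n_cells - 1 cells picks its barcode.
    p_unique = (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)
    return 1.0 - p_unique


# Same number of cells, increasing barcode diversity.
for n_barcodes in (10_000, 100_000, 1_000_000):
    rate = barcode_collision_rate(5_000, n_barcodes)
    print(f"{n_barcodes} barcodes: collision rate = {rate:.4f}")
```

The rate falls as the barcode pool grows relative to the number of cells.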
What are technical doublets and what is the technical doublet rate?
technical doublets = two or more cells in a droplet
technical doublet rate = given that at least one cell is in the droplet, the rate of having additional cells in the droplet
Having a constant flow rate, what would happen if more cells are loaded?
loading more cells, while having the same flow, increases technical and synthetic doublets
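The technical doublet rate and its dependence on loading can be sketched as follows. This assumes the number of cells per droplet is Poisson-distributed with mean `lam`, and that at a constant flow rate `lam` grows with the number of cells loaded. Both are modelling assumptions, and the names are illustrative.

```python
import math


def technical_doublet_rate(lam: float) -> float:
    """P(>= 2 cells | >= 1 cell), assuming cells per droplet ~ Poisson(lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    p_zero = math.exp(-lam)   # empty droplet
    p_one = lam * p_zero      # exactly one cell
    return 1.0 - p_one / (1.0 - p_zero)


# Hypothetical loadings: more cells at the same flow rate means a larger lam.
for lam in (0.05, 0.1, 0.2, 0.4):
    print(f"lam={lam}: technical doublet rate = {technical_doublet_rate(lam):.3f}")
```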
What are biological doublets?
= two cells form a discrete unit, which does not break apart during disruption to form a suspension
-> avoid with single-nucleus RNA-seq (snRNA-seq)
What are the pre-processing steps of sc analysis?
raw data processing
quality control
normalization
batch effect correction
visualization
PCA
tSNE
UMAP
What is PCA?
What are its pros and cons?
= Orthogonal linear transformation
Pro:
Captures as much variance as possible
Good at showing global structure
Con:
Poor at resolving local similarities
Sensitive to outliers
Not able to capture non-linear relations
However, non-linear transformations (tSNE, UMAP) resolve local structure better
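A small stdlib-only sketch of the "captures as much variance as possible" idea: it finds the first principal component as the dominant eigenvector of the covariance matrix via power iteration. Real analyses would use a library implementation. The function name and the toy data are made up for illustration.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Unit vector along the direction of maximal variance (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


# Toy data lying roughly along the line y = 2x.
points = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
pc1 = first_principal_component(points)
print("PC1 direction:", pc1)
```

Because PC1 is a single straight direction, it cannot follow curved (non-linear) structure in the data.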
What is tSNE?
How does it work?
t-distributed Stochastic Neighbor Embedding
= unsupervised and non-linear dimensionality reduction technique, which preserves local similarities
Approach:
Select neighbors with Gaussian distribution over points in high dimensional space
Select neighbors with t-distribution over points in low dimensional space
Minimize Kullback-Leibler divergence between both distributions
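The objective in the three steps above can be sketched without the optimizer. The code below only evaluates the Kullback-Leibler divergence for two fixed candidate layouts. It assumes a single fixed Gaussian bandwidth `sigma`, whereas tSNE tunes the bandwidth per point from the perplexity. All names and data are illustrative.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_affinities(points, kernel):
    """Normalize pairwise kernel values (i != j) into a joint distribution."""
    n = len(points)
    weights = {(i, j): kernel(sq_dist(points[i], points[j]))
               for i in range(n) for j in range(n) if i != j}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def kl_divergence(p, q):
    """KL(P || Q) over all point pairs."""
    return sum(p_ij * math.log(p_ij / q[pair])
               for pair, p_ij in p.items() if p_ij > 0)


def gaussian(d2, sigma=1.0):
    # Assumption: one fixed bandwidth for all points.
    return math.exp(-d2 / (2 * sigma ** 2))


def student_t(d2):
    # Student t kernel with one degree of freedom.
    return 1.0 / (1.0 + d2)


high = [[0, 0, 0], [0.5, 0, 0.2], [5, 5, 5], [5.3, 5, 4.9]]
good_low = [[0, 0], [0.4, 0], [6, 6], [6.3, 6]]   # neighbors stay together
bad_low = [[0, 0], [6, 6], [0.4, 0], [6.3, 6]]    # neighbors torn apart

p = joint_affinities(high, gaussian)
kl_good = kl_divergence(p, joint_affinities(good_low, student_t))
kl_bad = kl_divergence(p, joint_affinities(bad_low, student_t))
print("KL (good layout):", kl_good)
print("KL (bad layout): ", kl_bad)
```

The layout that keeps high-dimensional neighbors close has the lower divergence. tSNE searches for such a layout by gradient descent.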
What is UMAP?
Uniform Manifold Approximation and Projection
= unsupervised and non-linear dimensionality reduction technique, which preserves local and global similarities
-> scales better than tSNE
Approximate manifold for data in high dimensional space using simplicial complexes as neighborhood graph
-> k-simplex = convex hull of k+1 points (in k-d space)
-> Consider radius around points to get overlaps, since overlaps can be represented by simplices and create simplicial complex
(overlap = x -> x-simplex)
Initialize the low dimensional embedding using spectral embedding
Optimize low dimensional fuzzy topology to be similar to high dimensional fuzzy topology via fuzzy set cross entropy
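The last step compares two sets of fuzzy edge memberships. A minimal sketch of the fuzzy set cross entropy, assuming the memberships are already given as dictionaries from edge to a value in [0, 1]. The graph construction and the optimizer are left out, and the names and values are hypothetical.

```python
import math


def fuzzy_set_cross_entropy(high, low, eps=1e-12):
    """Sum over edges of p*log(p/q) + (1-p)*log((1-p)/(1-q)).

    high, low: dicts mapping an edge to its membership strength in [0, 1].
    """
    total = 0.0
    for edge, p in high.items():
        # Clip to avoid log(0); edges missing in `low` count as strength 0.
        p = min(max(p, eps), 1.0 - eps)
        q = min(max(low.get(edge, 0.0), eps), 1.0 - eps)
        total += p * math.log(p / q)                          # pulls true neighbors together
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))  # pushes non-neighbors apart
    return total


high = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.2}
low_similar = {("a", "b"): 0.8, ("a", "c"): 0.2, ("b", "c"): 0.2}
low_wrong = {("a", "b"): 0.1, ("a", "c"): 0.9, ("b", "c"): 0.2}

print("similar topology:", fuzzy_set_cross_entropy(high, low_similar))
print("wrong topology:  ", fuzzy_set_cross_entropy(high, low_wrong))
```

The second term, which penalizes placing non-neighbors close together, is one reason UMAP tends to retain more global structure than tSNE.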
What are the pitfalls of non-linear transformations?
meaningless cluster size
meaningless distances between clusters
misleading patterns
What is Cluster Analysis?
= downstream analysis at cell level using clustering
useful for compositional analysis and cluster annotation
clustering groups cells with similar gene expression profiles
similarity based on distance metrics
e.g. similarity scores from Euclidean distances, calculated on PC-reduced expression space
Approaches to generate cell clusters from similarity score:
clustering algorithms
community detection methods
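As one hedged example of the "clustering algorithms" route, here is a tiny k-means on Euclidean distances in a toy PC-reduced space. Community detection on a neighbor graph is the other route and is not shown. The coordinates and the choice of starting centroids are made up.

```python
import math


def kmeans(points, init_centroids, n_iter=20):
    """Plain k-means (Lloyd's algorithm) with caller-supplied starting centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy cells in a 2-PC space forming two groups.
pcs = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(pcs, init_centroids=[pcs[0], pcs[3]])
print(labels)
```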
What is trajectory analysis?
= downstream analysis at cell level with trajectory inference
-> studies how a system's state evolves over a series of discrete steps; here the steps are inferred from expression snapshots rather than simulated
useful for finding metastable states and gene expression dynamics
interprets sc data as snapshots of a continuous process via trajectory inference methods
process is reconstructed by finding paths through cellular space with minimal transcriptional changes between neighboring cells
pseudotime variable = ordering cells along these paths = proxy for developmental time
-> variable related to transcriptional distances from root cell
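One simple way to picture pseudotime is to connect each cell to its nearest neighbors and take the shortest-path distance from the root cell. This is a sketch of the idea under that assumption, not a specific published method. The function name, `k`, and the toy coordinates are hypothetical.

```python
import heapq
import math


def pseudotime(cells, root, k=2):
    """Shortest-path distance from the root cell on a symmetric kNN graph."""
    n = len(cells)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(cells[i], cells[j]))[:k]
        for j in nearest:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Dijkstra: paths only step between transcriptionally similar cells.
    dist = [math.inf] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue
        for j in neighbors[i]:
            nd = d + math.dist(cells[i], cells[j])
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return dist


# Toy cells along a curved path in a reduced expression space.
cells = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.5], [3.0, 1.5], [3.5, 3.0]]
pt = pseudotime(cells, root=0)
print([round(t, 2) for t in pt])
```

Sorting cells by this value gives the ordering that serves as a proxy for developmental time.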
How can cluster and trajectory inference be unified?
can be combined in coarse-grained graph representation, which represents both static and dynamic nature of data
-> sc clusters as nodes
-> trajectories between clusters as edges
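A rough sketch of such a coarse-grained graph: clusters become nodes, and the number of cell-to-cell neighbor links crossing two clusters becomes the edge weight. The labels, the edge list, and the function name are made up, and real methods additionally test whether a connection is stronger than expected by chance.

```python
from collections import Counter


def coarse_grained_graph(labels, cell_edges):
    """Count neighbor-graph edges that connect cells of different clusters."""
    counts = Counter()
    for i, j in cell_edges:
        a, b = labels[i], labels[j]
        if a != b:
            # Sort so (a, b) and (b, a) count as the same undirected edge.
            counts[tuple(sorted((a, b)))] += 1
    return dict(counts)


labels = ["stem", "stem", "progenitor", "progenitor", "mature", "mature"]
cell_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 3)]
graph = coarse_grained_graph(labels, cell_edges)
print(graph)
```

In this toy example no edge links "stem" directly to "mature", which is consistent with a path that passes through the progenitor cluster.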
What is RNA velocity?
= method used to infer the direction and speed of changes in gene expression levels in single cells over time
leverages the ratio of spliced and unspliced mRNA molecules to predict whether a gene's expression is increasing or decreasing in each cell
providing insights into cellular dynamics and developmental processes
helps to understand how individual cells are transitioning between different states
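The spliced/unspliced idea can be sketched with a steady-state model, ds/dt = u - γ·s, with the splicing rate normalized to 1. Here γ is fitted as a regression slope through the origin over all cells, which is a simplification chosen for illustration. The function names and counts are hypothetical.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin of unspliced vs. spliced counts."""
    return (sum(u * s for u, s in zip(unspliced, spliced))
            / sum(s * s for s in spliced))


def velocity(spliced, unspliced, gamma):
    """Per-cell velocity for one gene: positive = expression increasing."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Cells assumed to be at steady state define the expected u/s ratio.
gamma = fit_gamma(spliced=[1.0, 2.0, 4.0], unspliced=[0.5, 1.0, 2.0])

# Three new cells with the same spliced level but different unspliced levels.
v = velocity(spliced=[2.0, 2.0, 2.0], unspliced=[1.5, 1.0, 0.5], gamma=gamma)
print("gamma:", gamma)
print("velocities:", v)
```

More unspliced mRNA than the steady-state ratio predicts suggests the gene is being switched on in that cell, and less suggests it is being switched off.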
What are the types of automatic annotations?
automatic annotation can be:
marker gene database-based
correlation-based
supervised-classification-based
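As an illustration of the correlation-based variant, the sketch below assigns a cluster the label of the reference profile it correlates with best (Pearson). The reference values are toy numbers, not taken from any database, and the gene order (CD3E, CD2, CD79A, MS4A1) is only an example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def annotate(cluster_profile, references):
    """Return the reference label with the highest correlation."""
    return max(references,
               key=lambda label: pearson(cluster_profile, references[label]))


# Toy mean expression per gene: CD3E, CD2, CD79A, MS4A1.
references = {
    "T cell": [9.0, 8.0, 0.5, 0.2],
    "B cell": [0.3, 0.5, 9.0, 7.5],
}
cluster_profile = [7.0, 6.5, 1.0, 0.0]
print(annotate(cluster_profile, references))
```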