
by Pascal H.

What are the advantages and disadvantages of scRNAseq compared to Bulk-RNAseq?

Advantages

+ characterization of cellular heterogeneity within a population (identify rare cell types, distinguish different cell states, capture cell-to-cell variability in gene expression)

+ discovery of novel cell types or subpopulations

+ can capture gene expression changes along dynamic processes (e.g. differentiation trajectories)

Disadvantages

- Higher cost and lower throughput

- more complex data analysis

- Increased technical noise (e.g. dropouts due to low mRNA capture efficiency)

Given a count matrix, what do we have to normalize for before any further analysis, depending on the comparison we want to do?

sequencing depth -> Between sample comparison
gene length -> Within sample comparison

What normalization techniques can be used to normalize for sequencing depth and gene length?

TPM

divide read counts by gene length (in kb) -> reads per kilobase (RPK); then divide by the sum of all RPK values in the sample and multiply by 10^6
Use: within or between sample groups

CPM (only for depth)

divide by total number of reads, multiply by 10^6

Use: between sample groups

RPKM

divide by total number of mapped reads (in millions), then by gene length (in kb)

Use: within sample groups

FPKM

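like RPKM, but counts fragments instead of reads: with paired-end data the two reads of a pair come from one fragment and are counted once (roughly RPKM / 2 if both mates had been counted as separate reads); for single-end data FPKM = RPKM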
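A minimal sketch of these formulas in plain Python, assuming raw counts for a single sample and gene lengths in base pairs (toy numbers and function names are hypothetical). Note the order of operations: RPKM corrects for depth first and length second, TPM the other way round, which is why TPM values always sum to 10^6 per sample while RPKM values do not. FPKM uses the RPKM formula applied to fragment counts.

```python
def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: depth first, then gene length.
    Passing fragment counts instead of read counts gives FPKM."""
    millions = sum(counts) / 1e6
    return [c / millions / (length / 1e3) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: gene length first (reads per kilobase, RPK),
    then rescale so that the RPK values of the sample sum to 10^6."""
    rpk = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]


# Hypothetical toy sample with three genes
counts = [10, 20, 30]
lengths_bp = [2000, 4000, 1000]

print(cpm(counts))               # approx. [166666.7, 333333.3, 500000.0]
print(rpkm(counts, lengths_bp))  # approx. [83333.3, 83333.3, 500000.0]
print(tpm(counts, lengths_bp))   # approx. [125000.0, 125000.0, 750000.0]
```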

What is the reason for using a negative binomial rather than a poisson distribution for modeling read counts?

Why are TPM and FPKM considered within-sample measures of gene expression? Name two between-sample biases they do not capture.

they normalize each sample independently, using only that sample's own counts (and gene lengths)

RNA composition (a few very highly expressed genes in one sample shift the proportions of all other genes)
Batch effects
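A toy illustration of RNA composition bias with hypothetical counts: genes 1 and 2 are expressed equally in both samples, only gene 3 is strongly up-regulated in sample B. Because CPM (and likewise TPM) expresses every gene as a share of its own sample's total, the unchanged genes appear down-regulated in B. Between-sample methods such as DESeq2's median-of-ratios or edgeR's TMM normalization are designed to correct for this.

```python
def cpm(counts):
    """Counts per million (each sample normalized on its own)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


# Hypothetical counts: genes 1 and 2 unchanged, gene 3 strongly up in sample B
sample_a = [100, 100, 100]
sample_b = [100, 100, 800]

cpm_a, cpm_b = cpm(sample_a), cpm(sample_b)
apparent_fold_change = cpm_a[0] / cpm_b[0]
print(apparent_fold_change)  # approx. 3.33: gene 1 looks "down" in B although it did not change
```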

What are barcodes in the context of scRNAseq?

1. Cell Barcodes

label and differentiate RNA molecules originating from different cells within a mixed population.

distinguish and assign RNA reads to specific cells, enabling cell-level analysis in scRNA-seq.

2. Unique Molecular Identifier (UMI)

used to distinguish and count individual RNA molecules within a cell: every molecule is tagged with a random UMI before amplification, so PCR duplicates can be recognized, improving the accuracy of gene expression quantification.

Summary

Both barcode types are essential components of scRNA-seq library preparation and play critical roles in demultiplexing and data analysis.
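A minimal sketch of how both barcode types are used to build a count matrix, assuming reads have already been aligned and annotated as hypothetical (cell barcode, UMI, gene) tuples: reads sharing all three come from the same original molecule, so each cell/gene entry counts distinct UMIs rather than reads. Real pipelines additionally correct sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict


def umi_count_matrix(reads):
    """Collapse reads into molecule counts.
    reads: iterable of (cell barcode, UMI, gene) tuples.
    Returns {cell barcode: {gene: number of distinct UMIs}}."""
    umis_per_cell_gene = defaultdict(set)
    for cell, umi, gene in reads:
        umis_per_cell_gene[(cell, gene)].add(umi)
    matrix = defaultdict(dict)
    for (cell, gene), umis in umis_per_cell_gene.items():
        matrix[cell][gene] = len(umis)
    return {cell: dict(genes) for cell, genes in matrix.items()}


# Hypothetical reads after alignment: (cell barcode, UMI, gene)
reads = [
    ("AAACCTGA", "TTGCAT", "CD3E"),
    ("AAACCTGA", "TTGCAT", "CD3E"),   # PCR duplicate: same cell, UMI and gene
    ("AAACCTGA", "GGCATA", "CD3E"),   # second molecule of the same gene
    ("AAACCTGA", "ACGTAC", "MS4A1"),
    ("TTTGGCAC", "TTGCAT", "CD3E"),   # same UMI, but a different cell
]
print(umi_count_matrix(reads))
# {'AAACCTGA': {'CD3E': 2, 'MS4A1': 1}, 'TTTGGCAC': {'CD3E': 1}}
```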

What is the Droplet Method and how does it work?

The droplet method, also known as droplet-based scRNAseq, is a technique used in single-cell genomics to analyze the gene expression profiles of individual cells.

Steps

Cell suspension & barcoding
Droplet Generation
Emulsion
Reaction in Droplets
Reaction after Demulsification
Downstream applications, etc.

What is the basic workflow to analyze scRNAseq?

Preprocessing
- quality control, normalization, and feature selection
Visualization
- PCA, t-SNE, UMAP
Analysis
- Cluster Analysis, Trajectory Analysis
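A stdlib-only sketch of two preprocessing steps, assuming a small hypothetical cells × genes count matrix: library-size normalization followed by log(1 + x), then selection of the most variable genes as features. The target sum of 10^4 and ranking genes by plain variance are illustrative choices; toolkits such as Scanpy or Seurat use more elaborate highly-variable-gene criteria that account for the mean-variance relationship.

```python
import math
from statistics import pvariance


def normalize_log1p(counts, target_sum=1e4):
    """Scale every cell (row) to the same total count, then apply log(1 + x).
    Assumes no cell has a total count of zero (such cells are removed during QC)."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([math.log1p(x / total * target_sum) for x in row])
    return result


def highly_variable_genes(log_counts, n_top):
    """Indices of the n_top genes (columns) with the highest variance across cells."""
    n_genes = len(log_counts[0])
    variances = [pvariance([row[j] for row in log_counts]) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: variances[j], reverse=True)[:n_top]


# Hypothetical matrix: 4 cells (rows) x 4 genes (columns)
counts = [
    [10, 0, 5, 5],
    [12, 0, 4, 4],
    [0, 15, 5, 5],
    [1, 20, 4, 5],
]
log_counts = normalize_log1p(counts)
print(highly_variable_genes(log_counts, 2))  # [1, 0]: the two cell-type-specific genes
```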

scRNAseq-Analysis: What is clustering analysis?

cell clusters -> first result of any single-cell analysis
=> infer the identity of member cells
Clustering groups cells based on the similarity of their gene expression profiles
Expression profile similarity - determined by distance metrics (computed on dimensionality-reduced representations, e.g. PCA)
Two approaches to generate cell clusters from similarity scores:
- clustering algorithms
- community detection methods
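A sketch of the "clustering algorithm" route: plain k-means on hypothetical 2-D cell embeddings (imagine the first two principal components), with Euclidean distance as the similarity metric. k-means and the deterministic initialization (first k cells as starting centroids) are chosen here for illustration only; scRNA-seq toolkits more commonly use community detection (e.g. Louvain or Leiden) on a k-nearest-neighbor graph.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means; the first k points are used as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each cell joins the cluster with the nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its member cells
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Hypothetical 2-D embeddings of six cells (e.g. first two principal components)
cells = [(0.0, 0.1), (5.0, 5.2), (0.2, 0.0), (5.1, 4.9), (0.1, 0.2), (4.8, 5.0)]
print(kmeans(cells, k=2))  # [0, 1, 0, 1, 0, 1]
```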

What are doublets in context of scRNAseq?

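Barcodes that contain two (or more) cells captured in the same droplet; recognizable as "cells" with…

unexpectedly high counts
a large number of detected genes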
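A simple heuristic that turns the two signs above into a filter, assuming per-barcode QC metrics are available: flag barcodes whose total count and number of detected genes are both more than three median absolute deviations (MADs) above the median. The MAD threshold and the toy numbers are illustrative assumptions; dedicated tools such as Scrublet or DoubletFinder instead compare cells against simulated artificial doublets.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median([abs(v - m) for v in values])


def flag_doublets(total_counts, n_genes, n_mads=3.0):
    """Flag barcodes whose total count AND number of detected genes are both
    more than n_mads MADs above the median (a simple outlier heuristic)."""
    count_cut = median(total_counts) + n_mads * mad(total_counts)
    gene_cut = median(n_genes) + n_mads * mad(n_genes)
    return [c > count_cut and g > gene_cut for c, g in zip(total_counts, n_genes)]


# Hypothetical QC metrics for six barcodes; the last one looks like a doublet
total_counts = [5000, 5200, 4800, 5100, 4900, 12000]
n_genes = [1800, 1900, 1750, 1850, 1820, 3500]
print(flag_doublets(total_counts, n_genes))  # [False, False, False, False, False, True]
```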

What is the advantage of UMIs?

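help prevent PCR bias: reads with the same cell barcode, UMI, and gene come from one original molecule, so PCR duplicates are counted only once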

Differential expression analysis: What are the main causes of bias? Which methods correct for it?

Cause of Bias?

sequencing depth
library efficiency
amplification bias

How to correct?

TBD

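Explain how DESeq2 borrows information across genes.

Model dispersion vs. mean expression over all genes -> genes with similar mean expression are assumed to have similar dispersion, so each gene's dispersion estimate is stabilized by the trend over all genes

empirical Bayes estimation: gene-wise dispersion estimates are shrunk toward the fitted trend

Normalization also uses all genes (median-of-ratios method):

Step 1: create a pseudo-reference sample (row-wise geometric mean of each gene)

Step 2: calculate the ratio of each sample to the reference for every gene

Step 3: the normalization factor of each sample (size factor) is the median of these ratios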
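A minimal sketch of the three median-of-ratios steps, assuming a hypothetical genes × samples count table in which sample 2 was sequenced about twice as deeply. Genes with a zero in any sample are skipped because their geometric mean would be zero. Ratios are taken on the log scale, which gives the same median as working with the raw ratios.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.
    counts: one list per gene, holding that gene's counts in each sample."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if 0 in gene:
            continue  # geometric mean would be 0: gene is not used
        # Step 1: pseudo-reference = geometric mean of the gene across samples (log scale)
        log_ref = sum(math.log(c) for c in gene) / len(gene)
        # Step 2: ratio of each sample to the pseudo-reference (log scale)
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_ref)
    # Step 3: size factor = median of the ratios per sample
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1
counts = [
    [10, 20],
    [50, 100],
    [30, 60],
    [0, 5],   # skipped because of the zero
]
sf = size_factors(counts)
normalized = [[c / s for c, s in zip(gene, sf)] for gene in counts]
print(sf)  # approx. [0.707, 1.414]
```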

Which distribution do RNA-seq counts follow? And why?

Negative binomial distribution

-> accounts for overdispersion (variance larger than the mean)

Differential expression analysis: Negative binomial distribution vs. Poisson distribution

Read counts could be described by a Poisson distribution (mean and variance are equal)

-> Not the case for RNA-seq counts (overdispersion: variance > mean)

=> negative binomial distribution takes this into account through a dispersion parameter alpha: Var = mu + alpha * mu^2
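A numerical check of the mean-variance relationship, assuming the mean/dispersion parameterization used above (size r = 1/alpha). The sketch sums the probability mass functions directly; mu = 100 and alpha = 0.1 are hypothetical values. With the same mean, the Poisson variance stays at mu while the negative binomial variance is mu + alpha * mu^2.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial P(K = k) parameterized by mean mu and dispersion alpha."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    log_prob = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_prob)


def poisson_pmf(k, mu):
    """Poisson P(K = k) with mean (= variance) mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def mean_and_variance(pmf, k_max=5000):
    """Numerical mean and variance of a count distribution, summing over 0..k_max-1."""
    probs = [pmf(k) for k in range(k_max)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


mu, alpha = 100.0, 0.1  # hypothetical mean count and dispersion
pois_mean, pois_var = mean_and_variance(lambda k: poisson_pmf(k, mu))
nb_mean, nb_var = mean_and_variance(lambda k: nb_pmf(k, mu, alpha))
print(pois_mean, pois_var)  # both about 100
print(nb_mean, nb_var)      # about 100 and 100 + 0.1 * 100^2 = 1100
```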

Differential expression analysis - What is the source of overdispersion?

biological variability: a transcript is present at slightly different levels in each (replicate) sample, which adds variance on top of the Poisson sampling noise

What is STAR and why is it so efficient?

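STAR is a (splice-aware) alignment tool for RNA-seq data.
It uses an uncompressed suffix array to search for maximal mappable prefixes (MMPs); the binary search in the suffix array scales logarithmically with genome length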

How does STAR basically work?

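Seed search: for each read, the maximal mappable prefix (MMP) is determined, i.e. the longest prefix that exactly matches one or more locations in the genome
The MMP search is repeated for the unmapped remainder of the read, which can lie in another exon (splice junction)
Extend: if no exact match is found (e.g. due to a mismatch), the previous seed is extended allowing mismatches; if this gives no good alignment -> the suffix is soft-clipped
Stitching: seeds are stitched together based on (a) proximity to a set of anchor seeds and (b) the best alignment score of the read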
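A toy sketch of STAR's sequential seed search only (no mismatch extension, clipping, or scoring-based stitching), assuming a hypothetical two-exon genome. A naive suffix array is built, the MMP is found by binary search over the sorted suffixes, and the search is repeated on the unmapped remainder of the read. All function names are hypothetical; the real tool works on a full genome index.

```python
from bisect import bisect_left


def build_suffix_array(genome):
    """Naive suffix array: start positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def maximal_mappable_prefix(read, sa, suffixes):
    """Longest prefix of `read` that occurs exactly in the genome.
    Returns (prefix length, sorted genome start positions of that prefix)."""
    best_len, best_pos = 0, []
    for length in range(1, len(read) + 1):
        prefix = read[:length]
        # All suffixes starting with `prefix` form one contiguous block of the suffix array
        lo = bisect_left(suffixes, prefix)
        hi = bisect_left(suffixes, prefix + "\xff")
        if lo == hi:
            break  # this prefix no longer occurs in the genome
        best_len, best_pos = length, sorted(sa[lo:hi])
    return best_len, best_pos


def seed_search(read, sa, suffixes):
    """Repeat the MMP search on the unmapped remainder of the read.
    Returns a list of seeds (read offset, length, genome positions)."""
    seeds, start = [], 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(read[start:], sa, suffixes)
        if length == 0:
            start += 1  # base absent from the genome: skip it (simplification)
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Hypothetical toy genome: exon 1 + 10-nt intron + exon 2
genome = "ACGTTGCA" + "G" * 10 + "TTACCGAT"
sa = build_suffix_array(genome)
suffixes = [genome[i:] for i in sa]

read = "ACGTTGCA" + "TTACCGAT"  # spliced read spanning both exons
print(seed_search(read, sa, suffixes))
# [(0, 8, [0]), (8, 8, [18])]: the jump from genome position 8 to 18 marks the intron
```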

Why RNA-seq pseudoalignment?

Goal: Estimate transcript abundances

==> Classify diseases, understand expression changes, track cancer progression
Approximate (no base-level alignment) but accurate
faster and more efficient than alignment-based approaches
Tools: Kallisto and Salmon

How does SPONGE work?

What are the main ideas behind the steps in the analysis?

Identify likely miRNA-gene pairs
Identify ceRNA pairs (gene pairs that share miRNAs) and calculate their sensitivity correlation
Use the SPONGE null model to infer p-values for the significance of each interaction
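A minimal sketch of the sensitivity correlation for one gene pair and a single shared miRNA: scor = cor(a, b) - pcor(a, b | miRNA), i.e. how much of the gene-gene correlation disappears once the miRNA is accounted for. The expression values are hypothetical and constructed so that the miRNA explains the whole correlation. SPONGE itself generalizes this to several shared miRNAs and tests significance with its null model, which this sketch does not implement.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def sensitivity_correlation(gene_a, gene_b, mirna):
    """scor = cor(a, b) - pcor(a, b | miRNA): the part of the gene-gene
    correlation that is explained by the shared miRNA."""
    return pearson(gene_a, gene_b) - partial_correlation(gene_a, gene_b, mirna)


# Hypothetical expression across six samples: both genes are repressed by the
# same miRNA; their remaining fluctuations are uncorrelated
mirna = [1, 2, 3, 4, 5, 6]
gene_a = [9.25, 7.95, 6.8, 5.8, 4.95, 4.25]
gene_b = [18.75, 18.35, 17.2, 15.8, 14.65, 14.25]
print(round(pearson(gene_a, gene_b), 3))                         # high correlation
print(round(partial_correlation(gene_a, gene_b, mirna), 3))      # about 0
print(round(sensitivity_correlation(gene_a, gene_b, mirna), 3))  # about the full correlation
```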

Name three biases that are overcome by the null model of the SPONGE method in comparison to previous correlation-based approaches (3 points):

Gene-gene correlation
Sample size
Number of miRNAs (several miRNAs regulate many transcripts)

What are the advantages of the SPONGE method?

How does KALLISTO work, what are the inputs/outputs?

IN

Reference transcriptome
RNA-Seq reads from experiment

Steps

Indexing / Hashing of k-mers
- Construction of a hash table mapping each k-mer to its contig in the transcriptome de Bruijn graph and its position within that contig
- Skipping of redundant k-mers in the same k-compatibility class
- Intersection of constituent k-mers => k-compatibility class of read
Pseudoalignment
- lookup of the k-compatibility class of each k-mer in the kallisto index, intersecting the k-compatibility classes
- K-mer hashing is strand agnostic
- Optimization: all k-mers in a contig of the de Bruijn graph have the same k-compatibility class
- => For each k-mer lookup, find the distance to the junction at the end of the contig ==> skip k-mers up to that distance
Quantification
- EM algorithm estimates transcript abundances from the compatibility classes of all reads

OUT

Kallisto index; quantification of RNA-seq samples (estimated transcript abundances: estimated counts and TPM per transcript)
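A toy sketch of the pseudoalignment idea, assuming two hypothetical transcripts and k = 5 (kallisto's default is k = 31). Each k-mer, in both orientations, is mapped to the set of transcripts containing it (its k-compatibility class), and a read's compatibility class is the intersection over its k-mers. The contig-based skipping optimization and the EM quantification step are not implemented here.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_index(transcripts, k):
    """Map each k-mer (in both orientations) to the set of transcripts containing it,
    i.e. its k-compatibility class."""
    index = {}
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            for key in (kmer, revcomp(kmer)):  # strand-agnostic lookup
                index.setdefault(key, set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Intersect the k-compatibility classes of the read's k-mers."""
    compatible = None
    for kmer in kmers(read, k):
        if kmer not in index:
            continue  # simplification: k-mers absent from the index are ignored
        if compatible is None:
            compatible = set(index[kmer])
        else:
            compatible &= index[kmer]
    return compatible if compatible is not None else set()


# Hypothetical transcripts sharing their first 8 nt (e.g. two isoforms)
transcripts = {"tA": "ATGCGTACCTGA", "tB": "ATGCGTACGGTT"}
k = 5
index = build_index(transcripts, k)

print(sorted(pseudoalign("ATGCGTA", index, k)))            # ['tA', 'tB']: ambiguous read
print(sorted(pseudoalign("CGTACCTG", index, k)))           # ['tA']
print(sorted(pseudoalign(revcomp("CGTACCTG"), index, k)))  # ['tA']: reverse strand works too
```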

Pseudoalignment, as implemented in methods such as Kallisto, is considerably faster than classical alignment-based approaches. Which of the following steps are part of the Kallisto method? (2 points) (multiple correct answers possible)

What makes pseudoalignment so much faster than mapping approaches?

What approaches for gene quantification from RNA-seq can be used?

Which of the following terms is NOT used in pseudoalignment?

Give some examples of small and large non-coding RNAs

Large non-coding RNAs

lncRNA
eRNA
circRNA

Small non-coding RNAs

miRNA
siRNA

What are miRNA?

miRNAs are small non-coding RNA molecules, 19-22 nucleotides long
key (post-transcriptional) regulators of gene expression -> each miRNA can potentially regulate hundreds of genes

Name a hypothesis about miRNAs. What does it state?

Competing endogenous RNA (ceRNA) hypothesis == SPONGE hypothesis

RNAs can compete for binding to miRNAs through shared MREs (miRNA response elements) present in their sequences
=> influence the availability of miRNAs and affect the expression of target messenger RNAs (mRNAs)
suggests that when non-coding RNAs and coding RNAs contain similar MREs, they can act as "sponges" for miRNAs
this interaction among RNA molecules => complex regulatory network
=> the abundance of one RNA can influence the expression of other RNAs (by competing for shared miRNAs)

ceRNAs <=> proposed as the key to a "hidden RNA language" and as widespread regulators

In contrast to microarrays, RNA-seq can

What methods exist to measure gene expression?

Microarray
Nanopore & -drop
Illumina HiSeq

Which of the following methods can be used for sequence assembly?

Which statement is true?

What are desirable characteristics of single cell technologies?

What is the Barnyard plot used for?

What are UMIs and what are they used for?

What highlights issues in quality control?

What is NOT part of downstream analysis?

Which of the following proteins is NOT involved in miRNA synthesis?

Which factor is NOT considered by miRNA target prediction tools?

Which statement is NOT true about the competing endogenous RNA hypothesis?

What are applications of RNAseq?

transcript discovery

gene quantification
expression profiling

Name the three types of RNAseq reads.

Single-end
Paired-end
Multiplexing (several barcoded samples pooled and sequenced in one run)

Why is it not a good idea to only look at the log fold change when comparing gene expression between two conditions?

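LFC alone ignores the uncertainty of the estimate: genes with low counts or high variability can show large fold changes by chance -> combine with a statistical test (p-value, adjusted for multiple testing)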

Name two models to do differential expression analysis

DESeq2
edgeR

What is the idea behind LFC shrinkage in DESeq2?

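Shrink noisy log fold change estimates (genes with low counts or high dispersion) toward zero, so that LFCs are comparable across genes and can be used for ranking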

Why is it difficult to estimate dispersion in practice?

Often too few samples per group

Explain DESeq2’s dispersion shrinkage

treat each gene separately -> estimate gene-wise dispersion
fit a smooth curve of dispersion vs. mean expression over all genes
shrink gene-wise dispersion estimates towards the expected dispersion from the curve (empirical Bayes)
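A deliberately simplified sketch of the three shrinkage steps, assuming hypothetical gene-wise dispersion estimates. The trend uses the form dispersion = a0 + a1 / mean, fitted here by ordinary least squares, and shrinkage is a fixed-weight average on the log scale. This is a stand-in for illustration: DESeq2 fits its trend differently and computes a maximum a posteriori estimate whose prior strength depends on the data, so the numbers will not match DESeq2's output.

```python
import math


def fit_trend(means, dispersions):
    """Least-squares fit of dispersion = a0 + a1 / mean over all genes."""
    xs = [1.0 / m for m in means]
    n = len(xs)
    mx, my = sum(xs) / n, sum(dispersions) / n
    a1 = (sum((x - mx) * (y - my) for x, y in zip(xs, dispersions))
          / sum((x - mx) ** 2 for x in xs))
    a0 = my - a1 * mx
    return a0, a1


def shrink(means, dispersions, weight=0.5):
    """Pull each gene-wise estimate toward the trend (weighted average on log scale).
    weight=1 keeps the gene-wise value, weight=0 returns the trend."""
    a0, a1 = fit_trend(means, dispersions)
    shrunk = []
    for m, d in zip(means, dispersions):
        trend = a0 + a1 / m  # assumed positive for this toy data
        shrunk.append(math.exp(weight * math.log(d) + (1 - weight) * math.log(trend)))
    return shrunk


# Hypothetical mean counts and noisy gene-wise dispersion estimates
means = [10, 50, 100, 500, 1000]
gene_disp = [0.30, 0.05, 0.06, 0.08, 0.03]

a0, a1 = fit_trend(means, gene_disp)
shrunk = shrink(means, gene_disp)
print([round(s, 3) for s in shrunk])  # each value lies between its gene-wise estimate and the trend
```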

What is dimensionality reduction and why is it useful?

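Project the high-dimensional expression data (thousands of genes per cell) onto a few dimensions that keep most of the structure -> visualization, noise reduction, faster downstream analysis (e.g. clustering)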

Dimensionality reduction, UMAP vs tSNE.
