12. Statistics for RNA-seq

by Pia P.

Which statements regarding unsupervised learning methods for gene expression are correct?

1) Principal Component Analysis (PCA) identifies significantly differentially expressed genes -> wrong; PCA is an exploratory method, not a hypothesis test, so it does not produce p-values

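2) The number of principal components of a PCA is equal to the number of dimensions in the input data matrix -> correct (PCA only rotates the data into a new coordinate system; note that with fewer samples than genes, at most n - 1 components carry non-zero variance, where n is the number of samples)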

3) In a Principal Component Analysis (PCA), the first principal component (PC1) always explains at least as much variance as the second principal component (PC2) -> correct; components are ordered by decreasing explained variance

4) PCA, t-SNE and UMAP identify clusters and assign p-values to cluster differences -> wrong; these are exploratory methods, not hypothesis tests, so they do not produce p-values

5) An advantage of t-SNE over PCA is that local cluster structures are more clearly visible -> correct

6) Hierarchical clustering of gene expression can be done with different measures of similarity, such as the Pearson correlation coefficient or the Euclidean distance -> correct

7) Applying k-means clustering to gene expression data results in k clusters of genes and/or samples, where k is a parameter that needs to be set in advance -> correct
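
Statements 2, 3 and 6 can be checked on toy data. The following is a minimal sketch, assuming NumPy is available; the simulated matrix, the variable names and the two example gene profiles are made up for illustration and are not course data. It computes a PCA through a singular value decomposition and compares two distance measures that could be passed to a hierarchical clustering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 samples (rows) x 4 genes (columns), log-scale expression.
n_samples, n_genes = 6, 4
expr = rng.normal(size=(n_samples, n_genes))
expr[:3, 0] += 3.0  # the first three samples express gene 0 more strongly

# PCA via SVD of the column-centered matrix.
centered = expr - expr.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# One component per input dimension here (4 genes -> 4 components).
n_components = vt.shape[0]

# Variance per component; NumPy returns singular values in descending order,
# so PC1 explains at least as much variance as PC2, and so on.
explained_var = s**2 / (n_samples - 1)
explained_ratio = explained_var / explained_var.sum()

print("components:", n_components)
print("explained variance ratio:", np.round(explained_ratio, 3))

# Two distance measures for clustering genes by their expression profile.
gene_a = np.array([1.0, 2.0, 3.0, 4.0])
gene_b = 10 * gene_a  # same profile shape, ten times the magnitude

pearson_dist = 1.0 - np.corrcoef(gene_a, gene_b)[0, 1]  # close to 0: same shape
euclid_dist = np.linalg.norm(gene_a - gene_b)           # large: different scale

print("Pearson distance:", pearson_dist)
print("Euclidean distance:", euclid_dist)
```

The two example genes are identical under the correlation-based distance but far apart under the Euclidean distance, which is why the choice of similarity measure changes the resulting clusters. Nothing in this sketch produces a p-value.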

Which statements regarding pathway/gene set analyses in the context of gene expression analyses are correct?

1) In the Gene Ontology, a gene is usually annotated in several categories/GO terms -> correct

2) The Gene Ontology is a method to detect enriched pathways in differentially expressed genes -> wrong; GO is not a method but an annotation resource that stores gene annotations. The method is gene set enrichment analysis, which uses the GO annotations as gene sets

3) The Gene Ontology Consortium annotates genes according to three main domains: Molecular Function, Biological Process and Cellular Component -> correct

4) If a pathway is said to be differentially expressed between two groups, it usually means that all genes in the pathway are differentially expressed between the two groups -> wrong; an affected pathway is merely enriched, meaning that a higher fraction of its genes is differentially expressed than expected on average. It does not mean that all of its genes are differentially expressed. (For a large gene set, even a slightly higher fraction of differentially expressed genes is significant; for a small gene set, a large difference is needed.)

5) A more significant enrichment of differentially expressed genes in GO group 1 than in GO group 2 also means that a higher fraction of genes is differentially expressed in GO group 1 than in GO group 2 -> wrong, e.g. if the GO groups differ in size: for a large group, even a slightly higher fraction of differentially expressed genes is significant, whereas a small group needs a large difference
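
The size effect behind statements 4 and 5 can be illustrated with an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is one common choice and not necessarily the test used in the course. All counts and the helper `hypergeom_upper_tail` are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total, de_total, set_size, de_in_set):
    """P(X >= de_in_set) when set_size genes are drawn without replacement
    from `total` genes, of which `de_total` are differentially expressed."""
    upper = min(set_size, de_total)
    favourable = sum(
        comb(de_total, k) * comb(total - de_total, set_size - k)
        for k in range(de_in_set, upper + 1)
    )
    return favourable / comb(total, set_size)


# Hypothetical background: 20000 genes, 2000 of them (10 %) differentially expressed.
total_genes, de_genes = 20000, 2000

# Large GO group: 1000 genes, 150 differentially expressed (15 %).
frac_large = 150 / 1000
p_large = hypergeom_upper_tail(total_genes, de_genes, 1000, 150)

# Small GO group: 10 genes, 3 differentially expressed (30 %).
frac_small = 3 / 10
p_small = hypergeom_upper_tail(total_genes, de_genes, 10, 3)

print(f"large group: fraction {frac_large:.2f}, p = {p_large:.2e}")
print(f"small group: fraction {frac_small:.2f}, p = {p_small:.2e}")
```

In this made-up example the large group has the lower fraction of differentially expressed genes but the smaller p-value, so a more significant enrichment does not imply a higher fraction. Neither group has all of its genes differentially expressed.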
