What is high-dimensional data?
Data with more features than samples.
e.g. Omics data
What are the standard variable names?
n: number of samples
p, q: number of variables / predictors
For high-dimensional data, p >> n
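A minimal sketch of the n vs. p notation, assuming a data matrix stored as a list of n rows (samples) with p columns (features). The helper names simulate_data and is_high_dimensional are hypothetical and only serve the illustration; the values are simulated, not real omics data.

```python
import random

def simulate_data(n, p, seed=0):
    """Return an n x p matrix (list of rows) of simulated feature values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

def is_high_dimensional(X):
    """True if the matrix has more features (columns) than samples (rows)."""
    n = len(X)     # number of samples
    p = len(X[0])  # number of variables / predictors
    return p > n

# Hypothetical study: 20 samples, 5000 measured features (p >> n).
X = simulate_data(n=20, p=5000)
n, p = len(X), len(X[0])
print(n, p, is_high_dimensional(X))
```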
What are CNVs?
“Copy number variations”
Variation between individuals in the number of copies of a DNA segment (copies are often repeated back to back)
Variations arise from deletions / insertions (duplications) of the segment
What are some X (input) variables in high-dimensional biological data?
CNVs (copy number variations) / X = number of copies / values 1, 2, 3, …
Microbial amplicon data / X_ij = relative counts / 0 < X_ij < 1
Transcriptomic gene expression data
Proteomic data
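As a rough illustration of how relative counts can arise from amplicon data, the sketch below divides each raw count by its sample total, so every row holds values between 0 and 1 that sum to 1. The function name to_relative_abundance and the tiny count table are made up for this example.

```python
def to_relative_abundance(counts):
    """Row-normalize a samples x taxa count table to relative counts."""
    rel = []
    for row in counts:
        total = sum(row)  # total reads in this sample
        if total == 0:
            raise ValueError("sample has zero total reads")
        rel.append([c / total for c in row])
    return rel

# Hypothetical raw read counts: 2 samples (rows) x 3 taxa (columns).
counts = [[10, 30, 60],
          [5, 5, 90]]
rel = to_relative_abundance(counts)
for row in rel:
    print(row, sum(row))
```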
What are some Y (output) variables?
Disease status
Disease stage / sub-type
Phenotype
Disease outcome
Experimental condition (wild type vs. knockout)
What are common distributions for discrete X / Y?
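Bernoulli distribution for a single binary trial
Binomial distribution for the number of successes in n independent binary trials
Poisson distribution for counts; approximates the binomial when n is large and the success probability is small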
Multinomial distribution for categorical data (discrete variables with more than 2 outcomes)
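The Poisson approximation to the binomial can be checked numerically. The sketch below compares the two probability mass functions, assuming a Poisson rate of lambda = n * p; the values n = 1000 and p = 0.003 are arbitrary choices for illustration (here p is the success probability, not the number of variables).

```python
import math

def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003  # many trials, small success probability
lam = n * p         # matching Poisson rate

# Print k, binomial probability, Poisson probability side by side.
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

The two columns should agree closely; the approximation gets worse as p grows for a fixed n.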