Give a potential statistical problem with high-dimensional biological data.
Predicting BMI from gut microbiome counts
p >> n, e.g., counts for 2000 species but only 100 subjects
Name the general principle for a simple statistical task.
Define a statistical regression model
Define a penalized likelihood function
Estimate the parameters by solving the resulting optimization problem
Determine the tuning parameter in a data-driven way
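The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the notes do not name a model or a penalty, so the sketch assumes a linear model with Gaussian noise, an L1 (lasso) penalty, coordinate descent as the solver, and a simple hold-out split as the data-driven way to pick the tuning parameter. All names, sizes, and the lambda grid are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Closed-form solution of the one-dimensional lasso problem."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; equals y because b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def mse(X, y, b):
    """Mean squared prediction error of coefficients b on (X, y)."""
    preds = (sum(x * bj for x, bj in zip(row, b)) for row in X)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

# Hypothetical simulated data with more features than training subjects (p > n)
rng = random.Random(0)
n_train, n_val, p = 40, 40, 60
true_b = [2.0, -1.5, 1.0] + [0.0] * (p - 3)  # only three features matter

def simulate(n):
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, true_b)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y

X_tr, y_tr = simulate(n_train)
X_va, y_va = simulate(n_val)

# Data-driven tuning: fit on the training set, score each lambda on held-out data
grid = [1.0, 0.3, 0.1, 0.03, 0.01]
fits = {lam: lasso_cd(X_tr, y_tr, lam) for lam in grid}
val_err = {lam: mse(X_va, y_va, b) for lam, b in fits.items()}
best_lam = min(val_err, key=val_err.get)
best_b = fits[best_lam]
n_selected = sum(1 for b in best_b if b != 0.0)
print(f"best lambda: {best_lam}, nonzero coefficients: {n_selected} of {p}")
```

In practice, k-fold cross-validation is a common alternative to a single hold-out split; the hold-out version is used here only to keep the sketch short.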
Name some common questions on statistical data.
How do I normalize my data?
What does the data look like?
What would unsupervised learning look like on this kind of data?
How can I detect changes across conditions?
What's the most crucial step at the beginning of a statistical project?
Exploratory data analysis
Describe a statistical task on 16S sequencing data.
How can species-species associations be inferred from the data?
E.g., which species are probably closely related or in a symbiotic relationship, based on the sequencing data?
How can the issue of pseudo-correlation be shown?
Draw 2000 samples with 600 features each from a multivariate normal distribution with independent features
=> We know the true correlation among the features is zero
However, if we use only the first 200 samples, some feature pairs will show sizeable pseudo-correlations
These pseudo-correlations shrink toward zero as more samples are used
Without penalization, the estimated correlations are unreliable
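The effect can be reproduced with a small simulation. The sketch below is a scaled-down version of the experiment (60 features and a subset of 20 samples, rather than 600 and 200) so that it runs quickly with the standard library alone. Those sizes, the seed, and the function name are assumptions for illustration.

```python
import random
from itertools import combinations
from math import sqrt

def max_abs_correlation(rows):
    """Largest absolute Pearson correlation between any two columns of rows."""
    n = len(rows)
    cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        centered = [v - mean for v in col]
        norm = sqrt(sum(v * v for v in centered))
        cols.append([v / norm for v in centered])  # centred, unit-length column
    # For centred unit-length columns, the dot product is the Pearson correlation
    return max(abs(sum(a * b for a, b in zip(u, v))) for u, v in combinations(cols, 2))

rng = random.Random(42)
n_samples, n_features, n_subset = 2000, 60, 20  # scaled-down, hypothetical sizes

# Independent standard normal features: the true correlation of every pair is zero
data = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]

max_small = max_abs_correlation(data[:n_subset])  # few samples relative to features
max_full = max_abs_correlation(data)              # all samples

print(f"max |r| with {n_subset} samples:  {max_small:.2f}")
print(f"max |r| with {n_samples} samples: {max_full:.2f}")
```

The largest sample correlation in the small subset should come out far from zero even though every true correlation is zero, and it should drop sharply once all samples are used.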
Describe a typical statistical workflow.
Get the data
Normalize the data
Parametric association estimation
Model selection
Dependency graph / covariance matrix
Explain the Poisson distribution.
λ = expected number of events in a given time interval
P(X = k) is the probability of seeing exactly k occurrences in that interval
λ can be computed as the average rate r times the interval length t: λ = r * t
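As a worked example, the Poisson probability mass function is P(X = k) = λ^k * e^(-λ) / k!. The sketch below evaluates it for a hypothetical rate of 2 events per hour observed over 3 hours; the numbers are made up for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with expected count lam."""
    return lam ** k * exp(-lam) / factorial(k)

rate, t = 2.0, 3.0   # hypothetical: 2 events per hour, observed for 3 hours
lam = rate * t       # expected number of events in the interval

probs = [poisson_pmf(k, lam) for k in range(50)]
total = sum(probs)                               # close to 1: the tail beyond 49 is negligible
mean = sum(k * pk for k, pk in enumerate(probs))  # close to lam

print(f"P(X = 4) = {probs[4]:.4f}")
print(f"sum of probabilities = {total:.6f}, mean = {mean:.4f}")
```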
Explain the multinomial distribution.
Generalization of the binomial distribution
Applies to data with more than 2 categories, e.g., the nucleotides A, T, C, G
It gives the probability of observing a specific count for each category, given the category probabilities and the total number of draws
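For counts x_1, ..., x_k summing to n and category probabilities p_1, ..., p_k, the multinomial probability is n! / (x_1! * ... * x_k!) * p_1^x_1 * ... * p_k^x_k. The sketch below applies this to a hypothetical read of 10 bases with equally likely nucleotides; the counts and probabilities are made up for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly these counts per category."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # stays an exact integer at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))

# Hypothetical example: 10 bases, each of A, T, C, G equally likely
counts = [3, 2, 3, 2]            # observed A, T, C, G counts
probs = [0.25, 0.25, 0.25, 0.25]
p_counts = multinomial_pmf(counts, probs)
print(f"P(A=3, T=2, C=3, G=2) = {p_counts:.4f}")

# With two categories the formula reduces to the binomial distribution
p_binomial = multinomial_pmf([1, 1], [0.5, 0.5])
```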
Explain the distribution best suited for 16S rRNA data.
Negative binomial
Models the number of failures before a set number of successes is reached
Unlike the Poisson, it allows the variance of the counts to exceed the mean (overdispersion)
Intuition with dice rolls: 6 = success, 1-5 = failure => the probability of seeing a given number of failures before the third success
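The dice intuition can be checked numerically. With success probability p and r required successes, the probability of exactly k failures before the r-th success is C(k + r - 1, k) * p^r * (1 - p)^k. The sketch below uses p = 1/6 and r = 3 from the dice example; the function name and the truncation point of 500 are assumptions for illustration.

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """P(exactly k failures occur before the r-th success)."""
    return comb(k + r - 1, k) * p ** r * (1 - p) ** k

p_success = 1 / 6  # rolling a six
r = 3              # stop at the third six

probs = [neg_binomial_pmf(k, r, p_success) for k in range(500)]
total = sum(probs)                                # close to 1: the tail is negligible
mean = sum(k * pk for k, pk in enumerate(probs))  # close to r * (1 - p) / p

print(f"P(no failures before the third six) = {probs[0]:.5f}")
print(f"expected number of failures = {mean:.2f}")
```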