Give an example of a trial, an outcome, and a sample space
Trial: Flip a coin
Outcome: heads
Sample space: S = {heads, tails} (all possible outcomes)
How do I write hypotheses for paired data?
H0 : μp − μn = 0
HA : μp − μn > 0
What does d0 mean when checking paired data?
Hypothesized mean difference (usually 0)
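A minimal sketch of where d0 enters the paired test, assuming the usual statistic t = (mean of differences − d0) / (SD of differences / √n). The data values and the function name paired_t are invented for illustration.

```python
import math
import statistics

# Hypothetical paired measurements (made up for illustration).
p = [12.1, 11.4, 13.0, 12.7, 11.9]
n_vals = [11.5, 11.0, 12.2, 12.9, 11.1]

def paired_t(first, second, d0=0.0):
    """Return (t, degrees of freedom) for the pairwise differences."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return (d_bar - d0) / (s_d / math.sqrt(n)), n - 1

t_obs, df = paired_t(p, n_vals)
print(f"t = {t_obs:.3f} with {df} degrees of freedom")
```

The observed t is then compared with the critical value from the t-table with n − 1 degrees of freedom.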
What does b0 mean?
The hypothesized value of the true coefficient (from the null hypothesis)
How should I structure my answer to a 10-point hypothesis testing question?
Hypotheses
Test statistic: name the test (e.g. "This is a test for a proportion") and give the formula
Critical value (from the table)
Decision Rule
Calculating the observed test statistic
Conclusion
Explain p-value
The p-value is the probability of getting a test statistic this extreme (here
2.181818) or more extreme in favor of the alternative hypothesis if the null hypothesis
is true.
What are the key assumptions of a matched pairs (paired) design?
Independence of observations between subjects
Normality of the differences
No carryover or order effects (random assignment of what is done in the groups)
Controlled or matched confounding variables
Give an interpretation of the prediction interval when MIDPARENTAL is 175 cm
(159.9, 178.8)
A 95% prediction interval means that for a child whose MIDPARENTAL height
is 175 cm, we can be 95% confident that this child’s height as an adult will fall between
159.9 cm and 178.8 cm.
What does normal distribution mean in a linear regression model?
In linear regression, the normality assumption refers to the residuals from the
model (i.e., the differences between observed and predicted values), not the original data.
We assume that the residuals are normally distributed with mean zero for every value
predicted by the model.
Briefly explain what homoscedasticity is and how you can use plots to
investigate if this assumption may be violated.
Homoskedasticity means that the error terms in a linear regression
model have constant variance. If the variance of the error term
instead depends on the x-values, we have heteroskedasticity, which is a
violation of one of the assumptions of the linear regression model.
We can check whether homoskedasticity holds by plotting the
residuals against the fitted y-values. If there is no clear pattern, we are
good. If we see a pattern with much larger spread among the residuals
in one part of the plot compared to other parts, this is a sign of
heteroskedasticity.
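A rough numeric stand-in for the residual plot, as a sketch only: it fits a line by least squares to simulated data whose error spread grows with x, then compares the residual spread for low and high fitted values. The data, the seed and the half-and-half split are assumptions made for illustration; in practice you would look at the plot itself.

```python
import random
import statistics

random.seed(1)

# Simulated data (hypothetical): the error spread grows with x.
x = [i / 10 for i in range(1, 201)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.2 * xi) for xi in x]

# Least-squares fit of y = b0 + b1 * x.
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Compare the residual spread for small vs. large fitted values.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
sd_low = statistics.stdev([r for _, r in pairs[:half]])
sd_high = statistics.stdev([r for _, r in pairs[half:]])
print(f"residual SD, low fitted values:  {sd_low:.2f}")
print(f"residual SD, high fitted values: {sd_high:.2f}")
```

A clearly larger spread in one half corresponds to the fan shape you would see in the residual plot.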
What is an event in probability theory?
A collection of outcomes (a subset of the sample space)
What is theoretical probability? (example: rolling a die)
Theoretical (logical) probability: consider the
physical properties of a die. If all sides are equally probable,
we get P(six) = 1/6 ≈ 0.1667
What is empirical probability? (example: rolling a die)
The proportion of sixes if I roll the die
an infinite number of times.
What is subjective probability?
My previous experience of die rolls
and my belief about the properties of the die tell me that my probability of rolling a six is 1/6 ≈ 0.1667
What are disjoint events?
Events with no common elements
What are intersecting events?
Events with common elements
How do you write the intersection of A and B?
A ∩ B
What does the union of A and B mean?
A or B occurs (at least one of them)
A ∪ B
What is the complement of A?
A^c (complement)
The event that A does not occur
Which axioms does P(A) fulfill?
0 ≤ P(A) ≤ 1
P(S) = 1 (S = sample space)
P(A ∪ B) = P(A) + P(B) (if A and B are disjoint)
In which case are A and B independent?
P(A∩ B) = P(A) * P(B) (Events A and B are independent if the knowledge that B has taken place does not affect the probability of A, and vice versa)
What is the general addition rule for P(A ∪ B)?
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
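A small check of the general addition rule on a fair die. The events A (even number) and B (at least four) are chosen here purely as an illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of a fair die
A = {2, 4, 6}            # event: even number
B = {4, 5, 6}            # event: at least four

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                          # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(A & B)      # P(A) + P(B) − P(A ∩ B)
print(lhs, rhs)  # both are 2/3
```

Without subtracting P(A ∩ B), the outcomes 4 and 6 would be counted twice.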
What does n! mean?
n! = n(n − 1)(n − 2) · · · 2 · 1
What does 0! equal?
1
What does P(B|A) mean?
The probability of B given that A has happened
How do you calculate P(B|A)?
P(B|A) = (P(A ∩ B))/P(A)
What are the joint, marginal and conditional probabilities of A and B?
Joint: P(A ∩ B)
Marginal: P(A) or P(B)
Conditional Probability: P(B|A)
What is Bayes' rule?
P(B|A) = P(A|B)P(B)/ P(A)
What does prevalence mean?
Prevalence: P(Covid) - share of population that is infected
What is sensitivity?
Sensitivity: P(pos|Covid) - how sensitive is the test for
detecting Covid?
What is specificity?
Specificity: P(neg|not Covid) - is the test specific for Covid,
or is it often triggered when Covid is not present?
Spe-ci-fi-ci-ty
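Prevalence, sensitivity and specificity combine through Bayes' rule to give P(Covid | pos). The sketch below assumes the denominator P(pos) is expanded with the law of total probability; the function name and the input numbers are hypothetical and not taken from any real test.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(Covid | pos) from prevalence, sensitivity and specificity."""
    p_pos_given_covid = sensitivity
    p_pos_given_healthy = 1 - specificity   # false positive rate
    # P(pos) = P(pos | Covid) P(Covid) + P(pos | not Covid) P(not Covid)
    p_pos = (p_pos_given_covid * prevalence
             + p_pos_given_healthy * (1 - prevalence))
    return p_pos_given_covid * prevalence / p_pos

# Hypothetical numbers, chosen only to illustrate the calculation.
result = posterior_positive(prevalence=0.05, sensitivity=0.90, specificity=0.95)
print(round(result, 3))  # 0.486
```

Even with a fairly accurate test, a low prevalence keeps P(Covid | pos) well below the sensitivity.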
What do capital X and lowercase x mean?
We write random variables with capital letters, X, Y, ... and
their numerical outcomes with lowercase letters, x, y, ...
Example: The random variable X is the number of dots on a die. We roll a die and get x = 3.
When is a variable discrete?
A random variable is discrete if you can count the possible
outcomes, even if the number of outcomes is infinite
When is a variable continuous?
A random variable is continuous if the outcomes are
uncountable
Example: The body temperature of a patient with fever
Which letter is used to represent E(X)?
We often use the greek letter μ to represent E[X]
Where is the expected value in a distribution?
The expected value is the center of the probability distribution
How is the variance denoted?
• The variance is often denoted by the symbol σ^2.
What are the parameters of a normal distribution?
Normal Distribution X ∼ N(μ, σ)
• Expected Value E(X) = μ
• Standard Deviation SD(X) = σ
What is the area under a density function f(x)?
The total area under f(x) is 1
For what do you use a density function?
The density function is used to calculate probabilities:
P(a ≤ X ≤ b) = area under f (x) between a and b.
How do you switch from a general normal distribution to the standard normal distribution?
Z = (X − μ)/σ
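A short sketch of standardization in use: the probability P(a ≤ X ≤ b) is computed once via Z and the standard normal distribution, and once directly from N(μ, σ). The values of μ, σ, a and b are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 170.0, 8.0   # hypothetical mean and standard deviation
a, b = 162.0, 186.0

# Standardize the interval endpoints: Z = (X - mu) / sigma.
z_a = (a - mu) / sigma   # -1.0
z_b = (b - mu) / sigma   #  2.0

std_normal = NormalDist(0, 1)
p_via_z = std_normal.cdf(z_b) - std_normal.cdf(z_a)

original = NormalDist(mu, sigma)
p_direct = original.cdf(b) - original.cdf(a)

print(round(p_via_z, 4), round(p_direct, 4))  # both 0.8186
```

This is why a single table for the standard normal distribution is enough for any μ and σ.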
What are 0 and 1 in a standard normal distribution?
0 = μ (the mean)
1 = σ (the standard deviation)
What does positive and negative covariance mean?
Positive covariance - the variance of the sum is greater than in the case of independence
Negative covariance - the variance of the sum is smaller than in the case of independence
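Both statements follow from the standard identity for the variance of a sum, written out here for reference (the cards above do not state it explicitly). Under independence the covariance term is zero.

```latex
\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\,\operatorname{Cov}(X, Y)
```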
What are the assumptions in a Bernoulli experiment?
binary outcome
independent observations
probability p is the same for each trial
Classic example: flip a coin
Which terms are used to describe the outcome of a Bernoulli trial?
0= failure
1 = success
What is a geometric distribution?
Independent Bernoulli trials, each with probability of success p, are performed until the first success. Let X be the total number of trials. Then,
P(X = n) = (1 − p)^(n−1) p
We say that X follows a Geometric Distribution and we denote this X ∼ Geom(p)
Easy explanation:
Bernoulli trials until first success
The number of trials varies - the number is the outcome
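The geometric probabilities can be sketched directly from the formula above. The example p = 1/6 (waiting for the first six on a fair die) and the function name geom_pmf are chosen here for illustration.

```python
def geom_pmf(n, p):
    """P(X = n): n - 1 failures followed by the first success."""
    return (1 - p) ** (n - 1) * p

p = 1 / 6  # e.g. waiting for the first six on a fair die
first_five = [geom_pmf(n, p) for n in range(1, 6)]
print([round(q, 4) for q in first_five])

# The probabilities over n = 1, 2, 3, ... add up to 1 (here truncated at 499).
total = sum(geom_pmf(n, p) for n in range(1, 500))
print(round(total, 6))
```

Each extra trial multiplies the probability by (1 − p), so the probabilities shrink steadily as n grows.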
What is a binomial distribution?
Also a number of identically distributed Bernoulli trials
• Number of trials n is predetermined
• X is the number of successes from the n trials
How is the binomial distribution denoted?
• Notation: X ∼ Bin(n, p)
• X follows a binomial distribution with parameters n and p
• X is the sum of n independent and identically distributed
Bernoulli trials
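A minimal sketch of binomial probabilities, assuming the standard formula P(X = k) = C(n, k) p^k (1 − p)^(n − k), which the cards above do not spell out. The values n = 10 and p = 0.5 are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5  # e.g. number of heads in 10 coin flips
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

print(round(pmf[5], 4))     # P(X = 5) = 252/1024
print(round(sum(pmf), 6))   # the probabilities add up to 1
```

Unlike the geometric case, n is fixed in advance and the random quantity is the number of successes.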
What is a uniform distribution?
The density is the same wherever f(x) > 0
What is a Poisson distribution?
Discrete
Used for counting
The occurrence of one event should not affect the probability of the next event
The rate is constant
Two events cannot occur at exactly the same time
What is the parameter of a Poisson distribution?
λ (lambda)
What is maximum likelihood?
Maximum Likelihood - choosing the λ that maximizes the
probability of the data that we have
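A sketch of the idea for Poisson data: evaluate the log-likelihood on a grid of candidate λ values and keep the one that makes the observed counts most probable. The counts and the grid are hypothetical; for the Poisson distribution the maximizer coincides with the sample mean.

```python
import math

counts = [2, 4, 3, 1, 5, 3]  # hypothetical counts per time interval

def log_likelihood(lam, data):
    """Poisson log-likelihood: sum over x of log(lam^x * exp(-lam) / x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

# Candidate values 0.01, 0.02, ..., 10.00.
grid = [i / 100 for i in range(1, 1001)]
lam_hat = max(grid, key=lambda lam: log_likelihood(lam, counts))

print(lam_hat, sum(counts) / len(counts))  # grid maximizer and sample mean
```

Working with the log of the likelihood avoids multiplying many small probabilities and does not change where the maximum is.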
What is difference between t-distribution and normal distribution?
Student-t is fat-tailed meaning that extreme outcomes are more probable than for the normal distribution
• As ν increases, it starts to resemble the normal distribution
When do we use a t-distribution and when do we use a normal distribution?
If the population variance σ^2 is known, the standardized mean follows a normal distribution.
If the population variance σ^2 is unknown and must be estimated with s^2, the standardized mean follows a Student's t-distribution with ν = n − 1 degrees of freedom.
What is a statistic used for?
To estimate a population parameter
What are μ and X̄?
• The population mean μ is estimated using the estimator X̄
What is a confidence interval?
A 95% confidence interval is a kind of interval that has a 95%
probability of capturing the true value every time a
proper sample is drawn
What does Unbiasedness mean?
Unbiasedness means the estimator is correct on average, over all possible samples.
What happens if n increases?
SD(p̂) decreases as the sample size n increases.
• Law of Large Numbers - p̂ will be close to p in large samples.
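A quick numeric illustration, assuming the standard formula SD(p̂) = √(p(1 − p)/n), which is not written out in the cards above. The value p = 0.5 and the sample sizes are chosen for illustration.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion."""
    return math.sqrt(p * (1 - p) / n)

sample_sizes = (25, 100, 400, 1600)
sds = [sd_p_hat(0.5, n) for n in sample_sizes]
for n, sd in zip(sample_sizes, sds):
    print(n, round(sd, 4))
```

Quadrupling n halves SD(p̂), because n sits under a square root.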
What is the formula for bias?
Bias of p̂
Bias(p̂) = E[p̂] − p = 0.
When is the normal approximation sufficiently accurate?
Sample size n ≥ 30 (Central Limit Theorem)
Success-Failure Condition
np ≥ 10 and nq ≥ 10 for estimation of proportion
Independence assumption must be (reasonably) satisfied
The sample is at most 10% of the population
What is the standard error?
The standard deviation of a sampling distribution is called
standard error. It measures how much the statistic (e.g., the sample mean) is expected to vary from sample to sample. We often denote this SE.
What happens if the confidence level is higher?
• Trade-off: Higher confidence level ⇒ larger margin of error.
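The trade-off can be seen numerically. This sketch assumes the normal-approximation interval for a proportion, p̂ ± z·√(p̂(1 − p̂)/n); the values p̂ = 0.40 and n = 200 and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, level):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

widths = []
for level in (0.90, 0.95, 0.99):
    low, high = proportion_ci(p_hat=0.40, n=200, level=level)
    widths.append(high - low)
    print(f"{level:.0%}: ({low:.3f}, {high:.3f})")
```

With the data held fixed, the only thing that changes between the three intervals is the critical value z.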
When is n ≥ 30 not required?
When we sample from a normal distribution, n ≥ 30 is not
required (we then do not need the central limit theorem)
What are H0 and HA?
H0 = no change (we first assume H0 is true)
HA = change
What are the steps to perform hypothesis testing?
To perform a hypothesis test
1. State the hypothesis
2. State the test statistic
3. Find the critical value. Draw!
4. Formulate the decision rule
5. Calculate the observed test statistic
6. Draw a conclusion by comparing the observed test statistic to
the decision rule
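The six steps can be walked through for a right-tailed test of a proportion. This is a sketch under assumed values: the hypotheses, α = 0.05, n = 100 and 60 successes are hypothetical, and the critical value is computed instead of read from a table.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses (hypothetical example): H0: p = 0.5, HA: p > 0.5.
p0, alpha = 0.5, 0.05
n, successes = 100, 60

# Step 2: test statistic Z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n).
# Step 3: critical value for a right-tailed test at level alpha.
z_crit = NormalDist().inv_cdf(1 - alpha)

# Step 4: decision rule: reject H0 if z_obs > z_crit.
# Step 5: observed test statistic.
p_hat = successes / n
z_obs = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Step 6: conclusion, plus the p-value for the same test.
reject = z_obs > z_crit
p_value = 1 - NormalDist().cdf(z_obs)

print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.3f}, "
      f"reject H0: {reject}, p-value = {p_value:.4f}")
```

Rejecting when z_obs exceeds the critical value and rejecting when the p-value is below α lead to the same decision.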
What are β0 and β1?
• β0 is the population intercept
• β1 is the population slope of the regression line
What are the assumptions for a linear regression analysis?
1. The relationship between y and x is linear
2. The error terms εi are independent
3. The error terms have a constant standard deviation
(homoskedasticity)
4. The error terms are normally distributed
What is the prediction interval?
Prediction interval for ŷ* - two sources of uncertainty
The unknown parameters β0 and β1
The variation in individual y-values around the regression line. All observations are "hit by an ε" with standard deviation σε.
What is the variance inflation factor?
Variance Inflation Factor (VIF): the inflation of the variance of a regression coefficient due to multicollinearity. A high VIF indicates that a predictor has a strong linear relationship with other predictors.
How do I calculate the expected value for a chi-square test?
Exp = (sumColumn*sumRow)/sumN
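A sketch of the expected-count formula applied to every cell of a table. The 2x2 counts are hypothetical, and the final line assumes the standard chi-square statistic Σ (Obs − Exp)² / Exp, which the card above does not state.

```python
# Hypothetical 2x2 table of observed counts (rows x columns).
observed = [
    [20, 30],
    [30, 20],
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

# Exp = (column sum * row sum) / N for each cell.
expected = [[r * c / total for c in col_sums] for r in row_sums]
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]

# Assumed standard statistic: sum of (Obs - Exp)^2 / Exp over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(chi2)  # 4.0
```

Here every expected count is 25, so the requirement of at least 5 per cell is met.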
What are the assumptions for a chi-square test?
Frequency Data We assume we have frequency data for individuals, with values for two variables.
Independence We assume the observations are independent, e.g., from a random sample.
Sufficient Cell Frequency We assume the expected count (Exp) is at least 5 in each cell.
What are type 1 and type 2 errors?
What is type II error?
β = P(fail to reject H0 | HA true)
Falsely failing to reject the null
What is type I error?
Type I Error
α = P(reject H0 | H0 true).
• Falsely rejecting the null
How is test power defined?
Test Power
1 − β = P(reject H0 | HA true)
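A sketch of how power could be computed for a right-tailed z-test of a proportion, using the normal approximation. The function name, the test setup and the numbers are assumptions for illustration, not taken from the cards.

```python
import math
from statistics import NormalDist

def power_right_tailed(p0, p_true, n, alpha=0.05):
    """Approximate P(reject H0 | true proportion is p_true)."""
    z = NormalDist()
    # H0 is rejected when p_hat exceeds this cutoff.
    cutoff = p0 + z.inv_cdf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)
    # Probability of exceeding the cutoff under the true proportion.
    se_true = math.sqrt(p_true * (1 - p_true) / n)
    return 1 - z.cdf((cutoff - p_true) / se_true)

for n in (50, 100, 400):
    print(n, round(power_right_tailed(p0=0.5, p_true=0.6, n=n), 3))
```

When p_true equals p0 the function returns α, the type I error rate, and the power grows as n increases.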