Statistics and Data Analysis for Business Administration

by Paula N.

Give an example for trial, outcome and sample space

Trial: Flip a coin
Outcome: heads
Sample space: S = {heads, tails} (all possible outcomes)

How do I write hypotheses for paired data?

H0 : μp − μn = 0

HA : μp − μn > 0

What does d0 mean when checking paired data?

Hypothesized mean difference (usually 0)

What does b0 mean?

The hypothesized value of the true coefficient (from the null hypothesis)

How should I structure my answer to a 10-point hypothesis testing question?

Hypotheses
Test statistic (name the test, e.g. "This is a test for a proportion", and give the formula)
Critical value (from the table)
Decision rule
Calculation of the observed test statistic
Conclusion

Explain the p-value

The p-value is the probability of getting a test statistic this extreme (here 2.181818) or more extreme in favor of the alternative hypothesis if the null hypothesis is true.
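As a rough illustration, the p-value for the observed statistic 2.181818 can be computed numerically. This is only a sketch: it assumes that the test statistic is standard normal under the null hypothesis and that the alternative is right-sided, neither of which is stated on the card.

```python
from statistics import NormalDist

# Observed test statistic taken from the card above.
z_obs = 2.181818

# Assumption (not stated on the card): under H0 the statistic is standard
# normal and HA is right-sided, so "more extreme" means "larger than z_obs".
p_value = 1 - NormalDist().cdf(z_obs)

print(f"p-value = {p_value:.4f}")
```

Under these assumptions the p-value is below 0.05, so the null hypothesis would be rejected at the 5% level.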

What are the key assumptions of matched pairs (paired design)?

Independence of observations between subjects
Normality of the differences
No carryover or order effects (random assignment of what is done in the groups)
Controlled or matched confounding variables

Give an interpretation of the prediction interval when MIDPARENTAL is 175 cm

(159.9, 178.8)

A 95% prediction interval means that for a child whose MIDPARENTAL height is 175 cm, we can be 95% confident that this child’s height as an adult will fall between 159.9 cm and 178.8 cm.

What does normal distribution mean in a linear regression model?

In linear regression, the normality assumption refers to the residuals from the model (i.e., the differences between observed and predicted values), not the original data. We assume that the residuals are normally distributed with mean zero for each value predicted by the model.

Briefly explain what homoskedasticity is and how you can use plots to investigate if this assumption may be violated.

Homoskedasticity means that the error terms in a linear regression model have constant variance. If the variance of the error term instead depends on the x-values, we have heteroskedasticity and this is a violation of one of the assumptions of the linear regression model.

We can investigate the presence of homoskedasticity by plotting the residuals against the fitted y-values. If there is no clear pattern, we are good. If we see a pattern with much larger spread among the residuals in one part of the plot compared to other parts, this is a sign of heteroskedasticity.
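The idea behind the residual plot can also be sketched numerically. The example below is a stand-in for the plot, not a formal test: it simulates two hypothetical data sets (all numbers are made up), fits a least-squares line, and compares the spread of the residuals for low and high fitted values.

```python
import random
from statistics import mean, stdev

def residuals_and_fitted(x, y):
    """Fit y = b0 + b1*x by least squares; return (fitted values, residuals)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid

def spread_ratio(fitted, resid):
    """SD of residuals in the upper half of fitted values / SD in the lower half."""
    pairs = sorted(zip(fitted, resid))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return stdev(high) / stdev(low)

rng = random.Random(1)  # fixed seed so the run is reproducible
x = [1 + 9 * i / 399 for i in range(400)]

# Hypothetical data: constant error SD versus error SD that grows with x.
y_const = [2 + 3 * xi + rng.gauss(0, 2) for xi in x]
y_fan = [2 + 3 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

ratio_const = spread_ratio(*residuals_and_fitted(x, y_const))
ratio_fan = spread_ratio(*residuals_and_fitted(x, y_fan))
print(f"constant variance: {ratio_const:.2f}, growing variance: {ratio_fan:.2f}")
```

A ratio near 1 corresponds to a residual plot without a pattern; a ratio well above 1 corresponds to the fan shape that signals heteroskedasticity.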

What is an event in probability theory?

collection of outcomes

What is theoretical probability? (example: rolling a die)

Also called logical probability. Consider the physical properties of a die. If all sides are equally probable, we get P(six) = 1/6 ≈ 0.1667.

What is empirical probability? (example: rolling a die)

The proportion of sixes if I roll the die an infinite number of times.
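We cannot roll a die infinitely often, but a simulation with many rolls gives a feel for the idea. This is a minimal sketch assuming a fair die; the seed and the number of rolls are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible
n_rolls = 100_000

# Count how often a simulated fair die shows a six.
sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)

empirical = sixes / n_rolls
theoretical = 1 / 6
print(f"empirical: {empirical:.4f}, theoretical: {theoretical:.4f}")
```

The proportion of sixes should land close to 1/6, and it tends to get closer as the number of rolls grows.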

What is subjective probability?

My previous experience of die rolls and my belief about the properties of the die tell me that my probability to roll a six is 1/6 ≈ 0.1667

What are disjoint events?

Events with no common elements

What are intersecting events?

Events with common elements

How do you write the intersection of A and B?

A ∩ B

What does the union of A and B mean?

A or B occurs (at least one of them)
A ∪ B

What is the complement of A?

A^c (complement)

The event that A does not occur

Which axioms does P(A) fulfill?

0 ≤ P(A) ≤ 1
P(S) = 1 (S = sample space)
P(A ∪ B) = P(A) + P(B) (if A and B are disjoint)

In which case are A and B independent?

P(A ∩ B) = P(A) * P(B) (events A and B are independent if the knowledge that B has taken place does not affect the probability of A, and vice versa)

What is the general addition rule for P(A ∪ B)?

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
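The addition rule can be checked by counting outcomes for a single roll of a fair die. The events A ("even number") and B ("more than 3") are hypothetical examples chosen for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of one die roll
A = {2, 4, 6}            # hypothetical event: even number
B = {4, 5, 6}            # hypothetical event: more than 3

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                        # P(A ∪ B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # general addition rule
independent = prob(A & B) == prob(A) * prob(B)

print(lhs, rhs, independent)
```

Both sides give 2/3. Subtracting P(A ∩ B) is needed because the outcomes 4 and 6 belong to both events and would otherwise be counted twice. These two events are not independent, since P(A ∩ B) = 1/3 differs from P(A) * P(B) = 1/4.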

What does n! mean?

n! = n(n − 1)(n − 2) · · · 2 · 1

What does 0! equal?

0! = 1

What does P(B|A) mean?

The probability of B given that A has happened

How do you calculate P(B|A)?

P(B|A) = (P(A ∩ B))/P(A)

What are the joint, marginal and conditional probabilities of A and B?

Joint: P(A ∩ B)
Marginal: P(A) or P(B)
Conditional: P(B|A)

What is Bayes' rule?

P(B|A) = P(A|B) P(B) / P(A)

What does prevalence mean?

Prevalence: P(Covid) - share of population that is infected

What is sensitivity?

Sensitivity: P(pos|Covid) - how sensitive is the test for detecting Covid?

What is specificity?

Specificity: P(neg|not Covid) - is the test specific for Covid, or is it often triggered when Covid is not present?

Spe-ci-fi-ci-ty
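Prevalence, sensitivity and specificity are exactly the inputs Bayes' rule needs to answer "If I test positive, how likely is it that I have Covid?". The numbers below are hypothetical and only serve to show the calculation.

```python
# Hypothetical values, chosen only for illustration.
prevalence = 0.05    # P(Covid)
sensitivity = 0.90   # P(pos | Covid)
specificity = 0.95   # P(neg | not Covid)

# Total probability of a positive test:
# true positives plus false positives.
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' rule: P(Covid | pos) = P(pos | Covid) * P(Covid) / P(pos)
p_covid_given_pos = sensitivity * prevalence / p_pos

print(f"P(pos) = {p_pos:.4f}, P(Covid | pos) = {p_covid_given_pos:.4f}")
```

With these made-up numbers, fewer than half of the positive tests come from infected people, because the low prevalence means that false positives from the large healthy group weigh heavily.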

What do capital X and small x mean?

We write random variables with capital letters, X, Y, ... and their numerical outcomes with lower case letters, x, y, ...

Example: The random variable X is the number of dots on a die. We roll a die and get x = 3.

When is a variable discrete?

A random variable is discrete if you can count the possible outcomes, even if the number of outcomes is infinite.

When is a variable continuous?

A random variable is continuous if the outcomes are uncountable.

Example: the body temperature of a patient with fever

Which letter is used to represent E(X)?

We often use the Greek letter μ to represent E(X).

Where is the expected value in a distribution?

The expected value is the center of the probability distribution.

How is the variance denoted?

• The variance is often denoted by the symbol σ^2.

What are the parameters of a normal distribution?

Normal Distribution X ∼ N(μ, σ)

• Expected Value E(X) = μ

• Standard Deviation SD(X) = σ

What is the area under a density function f(x)?

The total area under f(x) is 1.

What do you use a density function for?

The density function is used to calculate probabilities:

P(a ≤ X ≤ b) = area under f(x) between a and b.

How do you switch from a general normal distribution to the standard normal distribution?

Z = (X − μ)/σ
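A short sketch of the standardization in practice. The values of μ, σ and x are hypothetical; the point is that a probability for X can be read from the standard normal distribution once x has been converted to z.

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(mu = 175, sigma = 7), and we want P(X <= 182).
mu, sigma = 175.0, 7.0
x = 182.0

z = (x - mu) / sigma                      # standardize: Z = (X - mu) / sigma
p_via_z = NormalDist().cdf(z)             # look up z in the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)   # same probability without standardizing

print(f"z = {z}, P(X <= {x}) = {p_via_z:.4f}")
```

Both routes give the same probability, which is why a single standard normal table is enough for every normal distribution.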

What are 0 and 1 in a standard normal distribution?

0 = μ (the mean)
1 = σ (the standard deviation)

What do positive and negative covariance mean?

Positive covariance - the variance of the sum is greater than in the case of independence
Negative covariance - the variance of the sum is smaller than in the case of independence
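The statement about the variance of the sum follows from the standard formula for the variance of a sum of two random variables X and Y (the symbols are the usual textbook ones, not taken from the cards):

```latex
\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\,\operatorname{Cov}(X, Y)
```

Under independence the covariance is 0, so the last term vanishes. A positive covariance adds to the variance of the sum and a negative covariance subtracts from it.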

What are the assumptions in a Bernoulli experiment?

Binary outcome
Independent observations
Probability p is the same for each trial

Classic example: flip a coin

Which terms are used to describe the outcome of a Bernoulli trial?

0 = failure
1 = success

What is a geometric distribution?

Independent Bernoulli trials, each with probability of success p, are performed until the first success. Let X be the total number of trials. Then,

P(X = n) = (1 − p)^(n−1) p

We say that X follows a geometric distribution and we denote this X ∼ Geom(p)

Easy explanation:

Bernoulli trials until first success
The number of trials varies - the number is the outcome
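A minimal sketch of the geometric probability function, using the hypothetical example "roll a fair die until the first six" (so p = 1/6). Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def geom_pmf(n, p):
    """P(X = n): n - 1 failures followed by one success."""
    return (1 - p) ** (n - 1) * p

p = Fraction(1, 6)  # hypothetical example: success = rolling a six

# Probability that the first six appears on the third roll.
p_third = geom_pmf(3, p)

# The probabilities over n = 1, 2, 3, ... add up to 1 (checked here up to n = 200).
total = sum(geom_pmf(n, p) for n in range(1, 201))

print(p_third, float(total))
```

The first six arrives on the third roll with probability (5/6)^2 * (1/6) = 25/216.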

What is a binomial distribution?

Also a number of identically distributed Bernoulli trials

• Number of trials n is predetermined

• X is the number of successes from the n trials

How is the binomial distribution denoted?

• Notation: X ∼ Bin(n, p)

• X follows a binomial distribution with parameters n and p

• X is the sum of n independent and identically distributed Bernoulli trials
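The cards do not give the binomial probability function, so the sketch below uses the standard formula P(X = k) = C(n, k) p^k (1 − p)^(n−k). The example of 10 coin flips is hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): choose which k of the n trials succeed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 flips of a fair coin, X = number of heads.
n, p = 10, 0.5
p_five_heads = binom_pmf(5, n, p)

# The probabilities for k = 0, ..., n add up to 1.
total = sum(binom_pmf(k, n, p) for k in range(n + 1))

print(f"P(X = 5) = {p_five_heads:.4f}, total = {total:.4f}")
```

Here the number of trials is fixed in advance and the count of successes is random, which is the opposite of the geometric setup.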

What is a uniform distribution?

The density f(x) is the same everywhere where f(x) > 0

What is a Poisson distribution?

Discrete
Used for counting
The occurrence of one event should not affect the probability of the next event
The rate is constant
Two events cannot occur at exactly the same time

What is the parameter of a Poisson distribution?

λ (lambda)

What is maximum likelihood?

Maximum likelihood - choosing the λ that maximizes the probability of the data that we have
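The idea can be sketched with a simple grid search. The counts below are made up, and the Poisson probability function P(X = k) = e^(−λ) λ^k / k! is the standard one rather than something stated on the cards.

```python
from math import exp, factorial, log

# Hypothetical count data, e.g. number of events per day.
counts = [2, 0, 3, 1, 4, 2, 2]

def log_likelihood(lam, data):
    """Log of the joint Poisson probability of the data for a given lambda."""
    return sum(log(exp(-lam) * lam**k / factorial(k)) for k in data)

# Try lambda = 0.01, 0.02, ..., 6.00 and keep the value with the highest likelihood.
grid = [i / 100 for i in range(1, 601)]
lam_hat = max(grid, key=lambda lam: log_likelihood(lam, counts))

sample_mean = sum(counts) / len(counts)
print(f"lambda_hat = {lam_hat}, sample mean = {sample_mean}")
```

The grid search lands on the sample mean, which matches the known result that the maximum likelihood estimate of λ in a Poisson model is the average count.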

What is the difference between the t-distribution and the normal distribution?

Student's t is fat-tailed, meaning that extreme outcomes are more probable than for the normal distribution

• As the degrees of freedom ν increase, it starts to resemble the normal distribution

When do we use a t-distribution and when do we use a normal distribution?

If the population variance σ^2 is known, the standardized mean follows a normal distribution.

If the population variance σ^2 is unknown and must be estimated with s^2, then the standardized mean follows a Student’s t-distribution with ν = n − 1 degrees of freedom.

What is a statistic used for?

To estimate a population parameter

What are μ and X̄?

• The population mean μ is estimated using the estimator X̄ (the sample mean)

What is a confidence interval?

A 95% confidence interval is the kind of interval that has a 95% probability of capturing the true value every time a proper sample is drawn.

What does Unbiasedness mean?

Unbiasedness means the estimator is correct on average, over all possible samples.

What happens if n increases?

SD(p̂) decreases as the sample size n increases.

• Law of Large Numbers - p̂ will be close to p in large samples.

What is the formula for bias?

Bias of p̂

Bias(p̂) = E[p̂] − p = 0.

When is the normal approximation sufficiently accurate?

Sample size n ≥ 30 (Central Limit Theorem)
Success-Failure Condition: np ≥ 10 and nq ≥ 10 for estimation of a proportion
Independence assumption must be (reasonably) satisfied
The sample is at most 10% of the population

What is the standard error?

The standard deviation of a sampling distribution is called the standard error. It measures how much the statistic (e.g., the sample mean) is expected to vary from sample to sample. We often denote this SE.

What happens if the confidence level is higher?

• Trade-off: Higher confidence level ⇒ larger margin of error.
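The trade-off can be seen by computing the margin of error for a proportion at several confidence levels. The sample values are hypothetical, and the sketch assumes the normal approximation, with margin of error z* · SE and SE = sqrt(p̂(1 − p̂)/n).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: 80 successes out of n = 200, so p_hat = 0.40.
p_hat, n = 0.40, 200
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat

def margin_of_error(level):
    """z* times SE, where z* leaves (1 - level) / 2 in each tail."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * se

margins = {level: margin_of_error(level) for level in (0.90, 0.95, 0.99)}
for level, me in margins.items():
    print(f"{level:.0%} confidence: p_hat ± {me:.4f}")
```

The same data give a wider interval at 99% than at 90%: more confidence is paid for with less precision.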

When is n ≥ 30 not required?

When we sample from a normal distribution, n ≥ 30 is not required (we then do not need the Central Limit Theorem).

What are H0 and HA?

H0 = no change (we first assume H0 is true)

HA = change

What are the steps to perform hypothesis testing?

To perform a hypothesis test:

1. State the hypotheses

2. State the test statistic

3. Find the critical value. Draw!

4. Formulate the decision rule

5. Calculate the observed test statistic

6. Draw a conclusion by comparing the observed test statistic to the decision rule
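The six steps can be sketched for a right-sided test of a proportion. All numbers (p0 = 0.5, n = 100, 60 successes, α = 0.05) are hypothetical, and the normal approximation is assumed to be adequate.

```python
from math import sqrt
from statistics import NormalDist

# 1. Hypotheses (hypothetical example): H0: p = 0.5, HA: p > 0.5
p0 = 0.5
alpha = 0.05
n, successes = 100, 60

# 2. Test statistic: z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# 3. Critical value for a right-sided test at level alpha
z_crit = NormalDist().inv_cdf(1 - alpha)

# 4. Decision rule: reject H0 if z_obs > z_crit

# 5. Observed test statistic
p_hat = successes / n
z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# 6. Conclusion
reject_h0 = z_obs > z_crit
print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

In an exam the critical value would come from the table rather than from `inv_cdf`; the logic of comparing the observed statistic with the decision rule is the same.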

What are β0 and β1?

• β0 is the population intercept

• β1 is the population slope of the regression line

What are the assumptions for a linear regression analysis?

1. The relationship between y and x is linear

2. The error terms εi are independent

3. The error terms have a constant standard deviation (homoskedasticity)

4. The error terms are normally distributed

What is the prediction interval?

Prediction interval for ŷ⋆ - two sources of uncertainty

The unknown parameters β0 and β1
The variation in individual y-values around the regression line. All observations are "hit by an ε" with standard deviation σε.
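The two sources of uncertainty show up as separate terms in the standard prediction interval formula for simple linear regression. The notation is an assumption, since the cards do not give the formula: x⋆ is the x-value we predict for, s_e is the residual standard deviation (the estimate of σε), and t⋆ is the critical value with n − 2 degrees of freedom.

```latex
\hat{y}^\star \pm t^\star_{n-2}\, s_e
\sqrt{\underbrace{\frac{1}{n} + \frac{(x^\star - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}_{\text{unknown } \beta_0,\ \beta_1}
\;+\; \underbrace{1}_{\text{individual } \varepsilon}}
```

The first group of terms shrinks as n grows, but the final 1 does not, so a prediction interval never becomes narrower than the individual variation around the line allows.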

What is the variance inflation factor?

Variance Inflation Factor (VIF): the inflation of the variance of a regression coefficient due to multicollinearity. A high VIF indicates that a predictor has a strong linear relationship with other predictors.
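The cards do not give the formula, but the usual definition is VIF = 1 / (1 − R²), where R² comes from regressing the predictor in question on all other predictors. A minimal sketch with hypothetical R² values:

```python
def vif(r_squared):
    """VIF for one predictor, given R^2 from regressing it on the other predictors."""
    return 1 / (1 - r_squared)

# Hypothetical R^2 values for illustration.
vif_unrelated = vif(0.0)   # predictor unrelated to the others
vif_strong = vif(0.9)      # predictor largely explained by the others

print(vif_unrelated, vif_strong)
```

A predictor that is unrelated to the others has VIF = 1 (no inflation), while R² = 0.9 gives a VIF of about 10.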

How do I calculate the expected value for a chi-square test?

Exp = (sumColumn*sumRow)/sumN
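Applied to a whole table, the formula gives one expected count per cell. The 2×2 table below is hypothetical; the sketch also computes the usual chi-square statistic, the sum of (Obs − Exp)² / Exp over all cells, which is not spelled out on the card.

```python
# Hypothetical 2x2 table of observed counts (rows and columns are two variables).
observed = [[10, 20],
            [30, 40]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
n_total = sum(row_sums)

# Exp = (sumColumn * sumRow) / sumN for every cell.
expected = [[r * c / n_total for c in col_sums] for r in row_sums]

# Chi-square statistic: sum of (Obs - Exp)^2 / Exp over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected, round(chi_square, 4))
```

The expected counts are what each cell would hold if the two variables were independent, given the same row and column totals.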

What are the assumptions for a chi-square test?

Frequency data: We assume we have frequency data for individuals, with values for two variables.
Independence: We assume the observations are independent, e.g., from a random sample.
Sufficient cell frequency: We assume the expected count (Exp) is at least 5 in each cell.

What are type 1 and type 2 errors?