Binomial Distribution (discrete)
describes Bernoulli experiments;
a binomially distributed random variable results from repeating a Bernoulli experiment n times
X ∼ Bin(n, p): X = "number of successes after repeating the experiment n times"
Example: How likely is it to get k hits in n tries? E.g., how often did 65% or more of the respondents agree that their manager is biased when it comes to raises?
Random variable X=“number of delayed projects in one year”
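The binomial probabilities behind R's dbinom/pbinom can be sketched by hand; the Python code below is an illustration only, and the numbers (10 projects, delay probability 0.2) are made up:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k), the analogue of R's pbinom(k, n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Hypothetical example: 10 projects, each delayed with probability 0.2.
print(binom_pmf(2, 10, 0.2))  # P(exactly 2 delayed projects)
print(binom_cdf(2, 10, 0.2))  # P(at most 2 delayed projects)
```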
When is the logarithm transformation especially useful?
when the association between variables is more meaningful on a percentage scale
Geometric Distribution (discrete)
based on independent Bernoulli experiments
"distribution of the waiting time until the first success"
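A sketch of the geometric probabilities in Python (note: R's pgeom counts failures before the first success, while the version below counts the trial on which the first success occurs; the coin example is made up):

```python
def geom_pmf(k, p):
    """P(first success exactly on trial k), k = 1, 2, ...: (1-p)^(k-1) * p."""
    return (1 - p)**(k - 1) * p

def geom_cdf(k, p):
    """P(first success within the first k trials) = 1 - (1-p)^k."""
    return 1 - (1 - p)**k

# Hypothetical example: fair coin, success = heads (p = 0.5).
print(geom_pmf(3, 0.5))  # first head exactly on toss 3
print(geom_cdf(3, 0.5))  # a head somewhere in the first 3 tosses
```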
Poisson Distribution (discrete)
what is it used for?
what is λ?
mainly used when counting how often an event occurs within a fixed period of time in a random experiment
X ∼ P(λ) : X ="number of events that happen in a period of time"
λ coincides with the expected number of events happening in the specified period of time
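The Poisson probabilities mirror R's dpois/ppois; the Python sketch below picks up the delayed-projects card, with λ = 3 as an assumed rate:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ P(lam): lam^k * e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

def pois_cdf(k, lam):
    """P(X <= k), the analogue of R's ppois(k, lam)."""
    return sum(pois_pmf(i, lam) for i in range(k + 1))

# Hypothetical example: on average 3 delayed projects per year (lam = 3).
print(pois_pmf(2, 3))  # P(exactly 2 delays this year)
print(pois_cdf(2, 3))  # P(at most 2 delays this year)
```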
Model forms / transformations:
Linear: x², y²
Log-linear: log y, 1/y
Log-log: log x, 1/x
Linear-log
Exponential (continuous)
models the waiting time between events
(X ∼ Exp(λ))
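The exponential CDF is simple enough to write out directly; a Python sketch with an assumed rate of λ = 2 events per hour:

```python
from math import exp

def exp_cdf(x, lam):
    """P(X <= x) for X ~ Exp(lam): 1 - e^(-lam*x); the mean waiting time is 1/lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

# Hypothetical example: events arrive at rate lam = 2 per hour.
print(exp_cdf(1, 2))  # P(next event within one hour)
```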
Uniform (continuous)
when the probability is uniformly distributed within an interval
(X ∼ U(a, b))
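A sketch of the U(a, b) CDF in Python; the bus-arrival example is made up:

```python
def unif_cdf(x, a, b):
    """P(X <= x) for X ~ U(a, b): probability spread evenly over [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Hypothetical example: a bus arrives uniformly between minute 0 and minute 10.
print(unif_cdf(3, 0, 10))  # P(arrival within the first 3 minutes)
```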
Normal (continuous)
Sample Size Conditions (CLT)
np ≥ 10 and n(1 − p) ≥ 10 (success/failure condition); if they hold you can use X ∼ N(μ, SE)
SE of a Sample Proportion
SE = √(p̂(1 − p̂)/n)
pbinom: q, size, prob
ppois: q, lambda
pgeom: q, prob
pnorm: q, mean, sd
qnorm: p, mean, sd
(multiply by 100 if a percentage is needed)
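For the normal-distribution pair above, Python's standard library happens to offer direct analogues of pnorm and qnorm via statistics.NormalDist (the 1.96/0.975 values are the usual two-sided 95% quantities):

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)

# pnorm(q, mean, sd) analogue: P(X <= q)
print(z.cdf(1.96))       # roughly 0.975

# qnorm(p, mean, sd) analogue: the quantile for a given probability
print(z.inv_cdf(0.975))  # roughly 1.96
```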
TTestA
mean, sd, length(data), “two-sided”, 0.95, mu
two-sided, less, greater
mu, when it is a hypothesis test for the mean
without mu, for the CI
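The arithmetic behind a one-sample t-test from summary statistics (as TTestA takes them) can be sketched as follows; the summary numbers are made up, and the critical value 1.984 ≈ t₀.₉₇₅ with 99 df must normally be looked up:

```python
from math import sqrt

def t_statistic(mean, sd, n, mu):
    """One-sample t statistic: (x_bar - mu) / (sd / sqrt(n))."""
    return (mean - mu) / (sd / sqrt(n))

def mean_ci(mean, sd, n, t_crit):
    """Confidence interval for the mean, given a critical t value."""
    half_width = t_crit * sd / sqrt(n)
    return (mean - half_width, mean + half_width)

# Hypothetical summary statistics: x_bar = 5.2, sd = 1.0, n = 100, H0: mu = 5.
print(t_statistic(5.2, 1.0, 100, 5))
print(mean_ci(5.2, 1.0, 100, t_crit=1.984))
```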
which
which(data <= threshold)
operators: <=, ==, >=
define threshold beforehand
SE
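R's which(data <= threshold) idiom can be mimicked in Python with a list comprehension (note the 0-based indices, unlike R's 1-based ones; data and threshold here are made up):

```python
threshold = 3           # define the threshold beforehand
data = [5, 2, 7, 1, 3]  # hypothetical values

# indices of all elements satisfying the condition, like R's which()
indices = [i for i, v in enumerate(data) if v <= threshold]
print(indices)
```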
When do we use transformation?
What does it improve?
What is required to be able to transform our data?
Reduces skewness of our original data
Improves linearity
Boosts validity of our data
Required: original data must (approximately) follow a log-normal distribution
Breusch-Pagan Test
For what?
Interpretation
Heteroscedasticity
if p-value ≤ 0.05: reject H0, the data is heteroscedastic
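In practice one would call an existing implementation (e.g. lmtest::bptest in R); purely to illustrate the mechanics, the sketch below hand-rolls the LM form of the test for one predictor: regress the squared residuals on x and compute LM = n·R², which is compared against the χ²(1) critical value 3.841 at α = 0.05. All data is made up:

```python
def fit(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx)**2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def breusch_pagan_lm(x, y):
    """LM statistic: regress squared residuals on x; LM = n * R^2 (1 df here)."""
    b0, b1 = fit(x, y)
    e2 = [(yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y)]
    a0, a1 = fit(x, e2)
    me2 = sum(e2) / len(e2)
    ss_tot = sum((v - me2)**2 for v in e2)
    ss_res = sum((v - (a0 + a1 * xi))**2 for xi, v in zip(x, e2))
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return len(x) * r2

# Hypothetical data whose spread grows with x (heteroscedastic pattern):
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.4, 3.5, 5.9, 5.0, 8.3, 6.8]
lm = breusch_pagan_lm(x, y)
print(lm, "-> compare with the chi^2(1) critical value 3.841 at alpha = 0.05")
```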
VIF
Multicollinearity
VIF > 5 or 10 suggests the variables are highly redundant
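For two predictors, the VIF reduces to 1/(1 − r²) with r their correlation; a hand-rolled Python sketch on made-up, nearly collinear data:

```python
from math import sqrt

def vif_two_predictors(x1, x2):
    """VIF of one predictor against another: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r = cov / sqrt(sum((a - m1)**2 for a in x1) * sum((b - m2)**2 for b in x2))
    return 1 / (1 - r * r)

# Hypothetical, nearly collinear predictors:
x1 = [1, 2, 3, 4]
x2 = [1.1, 1.9, 3.2, 3.9]
print(vif_two_predictors(x1, x2))  # well above the rule-of-thumb cutoff of 5
```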
F-Test
H0: β1 = β2 = … = 0 (all slope coefficients are zero)
H1: H0 is false (at least one coefficient is nonzero)
ANOVA
After F-Test
Durbin-Watson Test
Autocorrelation
If p-value < 0.05: reject H0, there is autocorrelation
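The Durbin-Watson statistic itself is easy to compute from the residuals; values near 2 indicate no autocorrelation, toward 0 positive and toward 4 negative autocorrelation. A Python sketch on made-up residuals:

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1])**2
              for t in range(1, len(residuals)))
    den = sum(e**2 for e in residuals)
    return num / den

# Hypothetical alternating residuals (a negative-autocorrelation pattern):
print(durbin_watson([1, -1, 1, -1]))
```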
Fixing Heteroscedasticity - step by step
Detect heteroscedasticity (bp)
Revise the model
rename variables, e.g. invheight (new x) and volumeperft (new y)
new X: 1/x
new Y: y/x
Run regression
Check residual plot
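The transformation step above can be sketched in Python; the height/volume data is made up (chosen so that volume = 4·height exactly, which makes the refit easy to check):

```python
def fit(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx)**2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

# Hypothetical data; transform as in the steps above:
height = [1.0, 2.0, 4.0, 5.0]
volume = [4.0, 8.0, 16.0, 20.0]  # here volume = 4 * height exactly

invheight = [1 / h for h in height]                     # new x: 1/x
volumeperft = [v / h for v, h in zip(volume, height)]   # new y: y/x

b0, b1 = fit(invheight, volumeperft)
print(b0, b1)  # the intercept now recovers the original slope of 4
```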
Regression diagnostic
Name the Assumptions of Simple/Multiple Linear Regression
Linearity
Strict Exogeneity
No Corr of X and the Error Term
No Corr of Error Terms (Autocorrelation)
Homoscedasticity
Normality (QQ Plot)
No perfect Multicollinearity (only MRM)
Principle of Parsimony
Keep it simple
P(E|F) = P(E ∩ F) / P(F) (conditional probability)
Probability Mass Function
P(X=a)
Cumulative Distribution Function (CDF)
P(X<=a)
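The PMF/CDF relationship can be illustrated with a fair die (an assumed example, not from the deck): the CDF at a is just the PMF summed up to a.

```python
from fractions import Fraction

# PMF of a fair die: P(X = a) = 1/6 for a in 1..6.
pmf = {a: Fraction(1, 6) for a in range(1, 7)}

def cdf(a):
    """CDF: P(X <= a) = sum of the PMF over all values up to a."""
    return sum(p for value, p in pmf.items() if value <= a)

print(pmf[3])  # P(X = 3)
print(cdf(3))  # P(X <= 3)
```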
Kurtosis
What is it?
What kinds of kurtosis are existing?
indicates how heavy the tails of a distribution are and how peaked its curve is
Types
Mesokurtic: = 3 (e.g., the normal distribution)
Excess Kurtosis = kurtosis − 3
Leptokurtic: > 3 indicates heavier (fatter) tails
Platykurtic: < 3 indicates lighter (thinner) tails, a flatter peak
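The types above can be checked with the moment-based sample kurtosis; a Python sketch on made-up two-point data (the most platykurtic shape possible):

```python
def kurtosis(data):
    """Sample kurtosis: n * sum((x - mean)^4) / (sum((x - mean)^2))^2.
    The normal distribution gives about 3; subtract 3 for excess kurtosis."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m)**2 for x in data)
    m4 = sum((x - m)**4 for x in data)
    return n * m4 / m2**2

# Hypothetical two-point data: kurtosis 1, i.e. excess kurtosis -2 (platykurtic).
print(kurtosis([-1.0, 1.0, -1.0, 1.0]))
```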
y vs. x terminology:
y: dependent, effect, response, outcome
x: independent, cause, explanatory, predictor