Binomial Distribution (discrete)
describes Bernoulli experiments;
a binomially distributed random variable results from repeating a Bernoulli experiment n times
X ∼ Bin(n, p): X = "number of successes after repeating the experiment n times"
Example: How likely is it to get k hits in n tries? E.g., how often did 65% or more of the respondents agree that their manager is biased when it comes to raises?
Random variable X=“number of delayed projects in one year”
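The binomial probabilities behind R's dbinom/pbinom can be sketched by hand; the Python code below is an illustration only, and the numbers (10 projects, delay probability 0.2) are made up:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k), the analogue of R's pbinom(k, n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Hypothetical example: 10 projects, each delayed with probability 0.2.
print(binom_pmf(2, 10, 0.2))  # P(exactly 2 delayed projects)
print(binom_cdf(2, 10, 0.2))  # P(at most 2 delayed projects)
```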
When is the logarithm transformation especially useful?
when the association between variables is more meaningful on a percentage scale
Geometric Distribution (discrete)
based on independent Bernoulli experiments
"distribution of the waiting time until the first success"
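A sketch of the geometric probabilities in Python (note: R's pgeom counts failures before the first success, while the version below counts the trial on which the first success occurs; the coin example is made up):

```python
def geom_pmf(k, p):
    """P(first success exactly on trial k), k = 1, 2, ...: (1-p)^(k-1) * p."""
    return (1 - p)**(k - 1) * p

def geom_cdf(k, p):
    """P(first success within the first k trials) = 1 - (1-p)^k."""
    return 1 - (1 - p)**k

# Hypothetical example: fair coin, success = heads (p = 0.5).
print(geom_pmf(3, 0.5))  # first head exactly on toss 3
print(geom_cdf(3, 0.5))  # a head somewhere in the first 3 tosses
```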
Poisson Distribution (discrete)
what is it used for?
what is λ?
mainly used when counting how often an event occurs within a fixed period of time in a random experiment
X ∼ P(λ) : X ="number of events that happen in a period of time"
λ coincides with the expected number of events happening in the specified period of time
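The Poisson probabilities mirror R's dpois/ppois; the Python sketch below picks up the delayed-projects card, with λ = 3 as an assumed rate:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ P(lam): lam^k * e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

def pois_cdf(k, lam):
    """P(X <= k), the analogue of R's ppois(k, lam)."""
    return sum(pois_pmf(i, lam) for i in range(k + 1))

# Hypothetical example: on average 3 delayed projects per year (lam = 3).
print(pois_pmf(2, 3))  # P(exactly 2 delays this year)
print(pois_cdf(2, 3))  # P(at most 2 delays this year)
```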
Model forms / transformations:
Linear: x², y²
Log-linear: log y, 1/y
Log-log: log x, 1/x
Linear-log
Exponential (continuous)
models the waiting time between events
(X ∼ Exp(λ))
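The exponential CDF is simple enough to write out directly; a Python sketch with an assumed rate of λ = 2 events per hour:

```python
from math import exp

def exp_cdf(x, lam):
    """P(X <= x) for X ~ Exp(lam): 1 - e^(-lam*x); the mean waiting time is 1/lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

# Hypothetical example: events arrive at rate lam = 2 per hour.
print(exp_cdf(1, 2))  # P(next event within one hour)
```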
Uniform (continuous)
when the probability is uniformly distributed within an interval
(X ∼ U(a, b))
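A sketch of the U(a, b) CDF in Python; the bus-arrival example is made up:

```python
def unif_cdf(x, a, b):
    """P(X <= x) for X ~ U(a, b): probability spread evenly over [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Hypothetical example: a bus arrives uniformly between minute 0 and minute 10.
print(unif_cdf(3, 0, 10))  # P(arrival within the first 3 minutes)
```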
Normal (continuous)
Sample Size Conditions (CLT)
np ≥ 10 and n(1 − p) ≥ 10 (success/failure condition); if they hold you can use X ∼ N(μ, SE)
SE of a Sample Proportion
SE = √(p̂(1 − p̂)/n)
pbinom: q, size, prob
ppois: q, lambda
pgeom: q, prob
pnorm: q, mean, sd
qnorm: p, mean, sd
(multiply by 100 if a percentage is needed)
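For the normal-distribution pair above, Python's standard library happens to offer direct analogues of pnorm and qnorm via statistics.NormalDist (the 1.96/0.975 values are the usual two-sided 95% quantities):

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)

# pnorm(q, mean, sd) analogue: P(X <= q)
print(z.cdf(1.96))       # roughly 0.975

# qnorm(p, mean, sd) analogue: the quantile for a given probability
print(z.inv_cdf(0.975))  # roughly 1.96
```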
TTestA
mean, sd, length(data), “two-sided”, 0.95, mu
two-sided, less, greater
mu, when it is a hypothesis test for the mean
without mu, for the CI
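The arithmetic behind a one-sample t-test from summary statistics (as TTestA takes them) can be sketched as follows; the summary numbers are made up, and the critical value 1.984 ≈ t₀.₉₇₅ with 99 df must normally be looked up:

```python
from math import sqrt

def t_statistic(mean, sd, n, mu):
    """One-sample t statistic: (x_bar - mu) / (sd / sqrt(n))."""
    return (mean - mu) / (sd / sqrt(n))

def mean_ci(mean, sd, n, t_crit):
    """Confidence interval for the mean, given a critical t value."""
    half_width = t_crit * sd / sqrt(n)
    return (mean - half_width, mean + half_width)

# Hypothetical summary statistics: x_bar = 5.2, sd = 1.0, n = 100, H0: mu = 5.
print(t_statistic(5.2, 1.0, 100, 5))
print(mean_ci(5.2, 1.0, 100, t_crit=1.984))
```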
which
which(data <= threshold)
operators: <=, ==, >=
define threshold beforehand
SE
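R's which(data <= threshold) idiom can be mimicked in Python with a list comprehension (note the 0-based indices, unlike R's 1-based ones; data and threshold here are made up):

```python
threshold = 3           # define the threshold beforehand
data = [5, 2, 7, 1, 3]  # hypothetical values

# indices of all elements satisfying the condition, like R's which()
indices = [i for i, v in enumerate(data) if v <= threshold]
print(indices)
```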
When do we use transformation?
What does it improve?
What is required to be able to transform our data?
Reduces skewness of our original data
Improves linearity
Boosts validity of our data
Required: original data must (approximately) follow a log-normal distribution
Breusch-Pagan Test
For what?
Interpretation
Heteroscedasticity
if p-value ≤ 0.05: reject H0, the data is heteroscedastic
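In practice one would call an existing implementation (e.g. lmtest::bptest in R); purely to illustrate the mechanics, the sketch below hand-rolls the LM form of the test for one predictor: regress the squared residuals on x and compute LM = n·R², which is compared against the χ²(1) critical value 3.841 at α = 0.05. All data is made up:

```python
def fit(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx)**2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def breusch_pagan_lm(x, y):
    """LM statistic: regress squared residuals on x; LM = n * R^2 (1 df here)."""
    b0, b1 = fit(x, y)
    e2 = [(yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y)]
    a0, a1 = fit(x, e2)
    me2 = sum(e2) / len(e2)
    ss_tot = sum((v - me2)**2 for v in e2)
    ss_res = sum((v - (a0 + a1 * xi))**2 for xi, v in zip(x, e2))
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return len(x) * r2

# Hypothetical data whose spread grows with x (heteroscedastic pattern):
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.4, 3.5, 5.9, 5.0, 8.3, 6.8]
lm = breusch_pagan_lm(x, y)
print(lm, "-> compare with the chi^2(1) critical value 3.841 at alpha = 0.05")
```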
VIF
Multicollinearity
VIF > 5 or 10 suggests the variables are highly redundant
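For two predictors, the VIF reduces to 1/(1 − r²) with r their correlation; a hand-rolled Python sketch on made-up, nearly collinear data:

```python
from math import sqrt

def vif_two_predictors(x1, x2):
    """VIF of one predictor against another: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r = cov / sqrt(sum((a - m1)**2 for a in x1) * sum((b - m2)**2 for b in x2))
    return 1 / (1 - r * r)

# Hypothetical, nearly collinear predictors:
x1 = [1, 2, 3, 4]
x2 = [1.1, 1.9, 3.2, 3.9]
print(vif_two_predictors(x1, x2))  # well above the rule-of-thumb cutoff of 5
```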
F-Test
H0: β1 = β2 = … = 0 (all slope coefficients are zero)
H1: H0 is false (at least one coefficient is nonzero)
ANOVA
After F-Test
Durbin-Watson Test
Autocorrelation
If p-value < 0.05: reject H0, there is autocorrelation
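The Durbin-Watson statistic itself is easy to compute from the residuals; values near 2 indicate no autocorrelation, toward 0 positive and toward 4 negative autocorrelation. A Python sketch on made-up residuals:

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1])**2
              for t in range(1, len(residuals)))
    den = sum(e**2 for e in residuals)
    return num / den

# Hypothetical alternating residuals (a negative-autocorrelation pattern):
print(durbin_watson([1, -1, 1, -1]))
```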
Fixing Heteroscedasticity - step by step
Detect heteroscedasticity (bp)
Revise the model
rename variables, e.g. invheight (new x) and volumeperft (new y)
new X: 1/x
new Y: y/x
Run regression
Check residual plot
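The transformation step above can be sketched in Python; the height/volume data is made up (chosen so that volume = 4·height exactly, which makes the refit easy to check):

```python
def fit(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx)**2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

# Hypothetical data; transform as in the steps above:
height = [1.0, 2.0, 4.0, 5.0]
volume = [4.0, 8.0, 16.0, 20.0]  # here volume = 4 * height exactly

invheight = [1 / h for h in height]                     # new x: 1/x
volumeperft = [v / h for v, h in zip(volume, height)]   # new y: y/x

b0, b1 = fit(invheight, volumeperft)
print(b0, b1)  # the intercept now recovers the original slope of 4
```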
Regression diagnostic
Name the Assumptions of Simple/Multiple Linear Regression
Linearity
Strict Exogeneity
No Corr of X and the Error Term
No Corr of Error Terms (Autocorrelation)
Homoscedasticity
Normality (QQ Plot)
No perfect Multicollinearity (only MRM)
Principle of Parsimony
Keep it simple
P(E|F) = P(E ∩ F) / P(F) (conditional probability)
Probability Mass Function
P(X=a)
Cumulative Distribution Function (CDF)
P(X<=a)
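The PMF/CDF relationship can be illustrated with a fair die (an assumed example, not from the deck): the CDF at a is just the PMF summed up to a.

```python
from fractions import Fraction

# PMF of a fair die: P(X = a) = 1/6 for a in 1..6.
pmf = {a: Fraction(1, 6) for a in range(1, 7)}

def cdf(a):
    """CDF: P(X <= a) = sum of the PMF over all values up to a."""
    return sum(p for value, p in pmf.items() if value <= a)

print(pmf[3])  # P(X = 3)
print(cdf(3))  # P(X <= 3)
```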
Kurtosis
What is it?
What kinds of kurtosis are existing?
indicates how heavy the tails of a distribution are and how peaked its curve is
Types
Mesokurtic: = 3 (e.g., the normal distribution)
Excess Kurtosis = kurtosis − 3
Leptokurtic: > 3 indicates heavier (fatter) tails
Platykurtic: < 3 indicates lighter (thinner) tails, a flatter peak
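The types above can be checked with the moment-based sample kurtosis; a Python sketch on made-up two-point data (the most platykurtic shape possible):

```python
def kurtosis(data):
    """Sample kurtosis: n * sum((x - mean)^4) / (sum((x - mean)^2))^2.
    The normal distribution gives about 3; subtract 3 for excess kurtosis."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m)**2 for x in data)
    m4 = sum((x - m)**4 for x in data)
    return n * m4 / m2**2

# Hypothetical two-point data: kurtosis 1, i.e. excess kurtosis -2 (platykurtic).
print(kurtosis([-1.0, 1.0, -1.0, 1.0]))
```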
y vs. x terminology:
y: dependent, effect, response, outcome
x: independent, cause, explanatory, predictor