Buffl

RMBA

by Luca I.

Panel Data

combines cross-sectional and time series data
it contains for the same observation units (cross-section) data for several points in time (time series)
the total number of observations is T*N

-> small N can be compensated by large T

What types of panel data exist?

Balanced panel: all observation units have measurements for all time periods
Unbalanced panel: measurements are not available for all units for all time periods
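
A minimal sketch of how the balanced/unbalanced distinction could be checked in code. The data and the helper name `is_balanced` are hypothetical and only serve as an illustration; it assumes the panel is stored in long format with one (unit, period) pair per observation.

```python
from collections import defaultdict

# Hypothetical long-format panel: one entry per (movie, year) observation.
rows = [
    ("A", 2019), ("A", 2020), ("A", 2021),
    ("B", 2019), ("B", 2020), ("B", 2021),
    ("C", 2019), ("C", 2021),  # movie C has no 2020 measurement
]

def is_balanced(observations):
    """Return True if every unit is observed in every time period."""
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    # All periods that occur anywhere in the panel.
    all_periods = set().union(*periods_by_unit.values())
    return all(periods == all_periods for periods in periods_by_unit.values())

print(is_balanced(rows))      # False: C is missing 2020
print(is_balanced(rows[:6]))  # True: A and B cover all three years
```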

What are N and T here?

T = 5 (Years)

N = 3 (Movies)

What does panel data mean for variation?

Now we have two sources:

between units (cross-sectional variation, e.g., between movies)
within each unit (variation between different points in time)
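
The two sources of variation can be made concrete by splitting the total sum of squares into a between part and a within part. This is only a sketch: the streaming numbers are invented, chosen to match N = 3 movies and T = 5 years.

```python
from statistics import mean

# Hypothetical hours streamed (in thousands) for 3 movies over 5 years.
hours = {
    "Movie A": [10, 12, 11, 13, 14],
    "Movie B": [20, 21, 19, 22, 23],
    "Movie C": [5, 6, 7, 6, 6],
}

grand_mean = mean(v for series in hours.values() for v in series)

# Between variation: distance of each movie's own mean from the grand mean.
between_ss = sum(
    len(series) * (mean(series) - grand_mean) ** 2 for series in hours.values()
)

# Within variation: distance of each yearly value from its movie's mean.
within_ss = sum(
    (v - mean(series)) ** 2 for series in hours.values() for v in series
)

total_ss = sum((v - grand_mean) ** 2 for series in hours.values() for v in series)

print(between_ss, within_ss, total_ss)  # between + within adds up to total
```

In this made-up data most of the variation lies between movies, which is typical when units differ strongly in level but change little over time.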

Advantages of panel data

high external and internal validity
more degrees of freedom -> higher efficiency
control the impact of unobserved heterogeneity
-> reduces the problem of potential omitted variable bias
facilitates constructing and testing more complex hypotheses -> study dynamics. A treatment effect cannot be observed in a single period or with a single person; it requires multiple persons observed over multiple periods

Challenges of panel data

Data collection is costly and time-consuming
“Panel mortality” or “panel attrition”: units drop out of the panel (e.g., firms lose interest or disappear). Excluding them might result in a bias, because there is usually a reason why they disappeared
Missing observations → “unbalanced panel”
Similar problems as for time series data (e.g., autocorrelation, seasonal effects etc.)

Problem of unobserved heterogeneity

What are potential effects we might want to consider for the movie example?

unobserved heterogeneity = omitted variable

Movie effects:

Some movies are of higher quality, but as quality is difficult to measure, we do not observe it (we call this unobserved heterogeneity)
So what is influencing the number of hours streamed? Is it really the marketing money we spent on the film? Or do we unknowingly spend more on films of higher quality, which would mean that the number of streams depends on quality rather than on the marketing money spent?
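
A small sketch of how unobserved quality can distort a pooled regression. All numbers are invented for illustration: the assumed true marketing effect is 2, quality is 0 for one movie and 50 for the other, and the better movie happens to receive the larger budget. The helper `ols_slope` is a hypothetical name for a plain one-regressor OLS slope.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

TRUE_EFFECT = 2.0  # assumed hours gained per unit of marketing spend

# Hypothetical data: quality is unobserved by the analyst.
quality = {"low-quality movie": 0.0, "high-quality movie": 50.0}
marketing = {"low-quality movie": [1, 2, 3], "high-quality movie": [6, 7, 8]}
hours = {
    movie: [quality[movie] + TRUE_EFFECT * m for m in spend]
    for movie, spend in marketing.items()
}

# Pooled OLS ignores which movie an observation belongs to.
pooled_x = [m for spend in marketing.values() for m in spend]
pooled_y = [h for series in hours.values() for h in series]
pooled = ols_slope(pooled_x, pooled_y)

# Regressing within a single movie holds quality fixed.
per_movie = {movie: ols_slope(marketing[movie], hours[movie]) for movie in hours}

print(round(pooled, 2))  # far above 2: quality is credited to marketing
print(per_movie)         # 2.0 for each movie
```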

Time-related effects:

We already know that during some months people watch less -> if we decide to advertise movies during those months, our analysis might again suffer from omitted variable bias

How to analyze panel data?

fixed effects
random effects
Pooled OLS

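Assess whether a fixed effects or a random effects model is appropriate

Use the Hausman test -> the Hausman test checks whether the unobserved individual effects (αᵢ) are correlated with the explanatory variables (X). If they are correlated, use fixed effects; if they are not, random effects is consistent and more efficient.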
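
A sketch of the Hausman statistic for the special case of a single regressor. In general the statistic is the quadratic form (b_FE − b_RE)' [Var(b_FE) − Var(b_RE)]⁻¹ (b_FE − b_RE), compared against a chi-square distribution with as many degrees of freedom as tested coefficients. The function name and the input estimates below are hypothetical; in practice the coefficients and variances come from the fitted FE and RE models.

```python
import math

def hausman_single_coefficient(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one regressor.

    H = (b_fe - b_re)^2 / (var_fe - var_re), compared against a
    chi-square distribution with 1 degree of freedom.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("var_fe must exceed var_re for the test to be defined")
    h = (b_fe - b_re) ** 2 / var_diff
    # Survival function of chi-square(1): P(X > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical estimates of the marketing coefficient from both models.
h, p = hausman_single_coefficient(b_fe=0.8, b_re=0.5, var_fe=0.03, var_re=0.01)
print(f"H = {h:.2f}, p = {p:.3f}")
# A small p-value rejects "alpha_i is uncorrelated with X": prefer fixed effects.
```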

Entity fixed effects:

Adding entity/unit dummies

-> estimates the quality level of every movie -> computationally intensive

First Differences:

Using “first differences” between successive time periods eliminates 𝛼𝑗 as it is time independent.

Quality per movie stays the same over time -> it drops out
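
A minimal sketch of why differencing removes the unit effect. It assumes, purely for illustration, a noise-free model hours = alpha + beta * marketing for one movie; `ALPHA`, `BETA` and the data are made up.

```python
def first_differences(series):
    """Return successive differences x_t - x_(t-1); one observation is lost."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

ALPHA = 50.0  # hypothetical time-invariant movie quality (unobserved)
BETA = 2.0    # assumed effect of marketing on hours streamed

marketing = [6.0, 7.0, 9.0, 8.0]
hours = [ALPHA + BETA * m for m in marketing]

d_marketing = first_differences(marketing)
d_hours = first_differences(hours)

# ALPHA cancels: each differenced outcome equals BETA times the differenced input.
print(d_marketing)  # [1.0, 2.0, -1.0]
print(d_hours)      # [2.0, 4.0, -2.0]
```

Note that T observations per unit turn into T − 1 differences, so the first period is lost.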

Within group fixed effects:

We can also eliminate 𝛼𝑗 by subtracting from each variable for each unit its mean value (over time)
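
The within transformation can be sketched as follows. The two movies and their values are hypothetical: they are built with an assumed marketing effect of 2 and unobserved quality levels of 0 and 50, and `demean` is an illustrative helper name.

```python
from statistics import mean

def demean(series):
    """Subtract the unit's own time average from each of its observations."""
    avg = mean(series)
    return [value - avg for value in series]

# Hypothetical movies: hours = quality + 2 * marketing, quality 0 (A) and 50 (B).
marketing = {"A": [1.0, 2.0, 3.0], "B": [6.0, 7.0, 8.0]}
hours = {"A": [2.0, 4.0, 6.0], "B": [62.0, 64.0, 66.0]}

x_within = [v for movie in marketing for v in demean(marketing[movie])]
y_within = [v for movie in hours for v in demean(hours[movie])]

# OLS without intercept on the demeaned data (both variables have mean zero).
beta_within = sum(x * y for x, y in zip(x_within, y_within)) / sum(
    x * x for x in x_within
)
print(beta_within)  # 2.0
```

Because each movie's quality is part of its own mean, subtracting that mean removes it without having to estimate one dummy per movie.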

Difference between first differences and within-group fixed effects

Time fixed effects

Adding seasonal dummies:

Add a dummy variable 𝐴𝑚 for each month (𝐴𝑚 equals 1 only for observations in month m)
The different constants capture the combined effects of several (or many) unknown time-related effects that are different between periods but constant for all observation units (e.g., winter effect)

-> Often, unit-fixed effects and time-fixed effects are combined.
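
A sketch of combining unit and time fixed effects. It uses two-way demeaning (value minus unit mean minus period mean plus grand mean), which for a balanced panel gives the same slope as including both sets of dummies. The data-generating numbers (`BETA`, `quality`, `month_effect`) are hypothetical.

```python
from statistics import mean

def two_way_demean(panel):
    """Apply y_it - mean_i - mean_t + grand_mean to a balanced panel.

    `panel` maps each unit to a list with one value per period.
    """
    units = list(panel)
    n_periods = len(panel[units[0]])
    unit_mean = {u: mean(panel[u]) for u in units}
    period_mean = [mean(panel[u][t] for u in units) for t in range(n_periods)]
    grand_mean = mean(v for u in units for v in panel[u])
    return {
        u: [panel[u][t] - unit_mean[u] - period_mean[t] + grand_mean
            for t in range(n_periods)]
        for u in units
    }

BETA = 2.0                       # assumed marketing effect
quality = {"A": 0.0, "B": 50.0}  # hypothetical unit effects
month_effect = [0.0, -5.0, 3.0]  # hypothetical effects shared by all movies

marketing = {"A": [1.0, 2.0, 4.0], "B": [6.0, 7.0, 8.0]}
hours = {
    u: [quality[u] + month_effect[t] + BETA * m for t, m in enumerate(spend)]
    for u, spend in marketing.items()
}

x_dd = two_way_demean(marketing)
y_dd = two_way_demean(hours)
xs = [v for u in x_dd for v in x_dd[u]]
ys = [v for u in y_dd for v in y_dd[u]]
beta_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(round(beta_hat, 6))  # recovers the assumed effect of 2
```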

Before vs After fixed effects

Black arrow = heterogeneity

Random effects:

In a random effects regression, we assume that 𝛼 is purely random, uncorrelated with the observed variables 𝑋𝑘𝑖𝑡. This means a random effects model considers 𝛼 as a random variable.

Important!

𝑢𝑖𝑡 will be subject to autocorrelation:

• OLS is inefficient and the standard errors it computes are wrong

• Alternative approach: Generalized least squares (GLS)
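
A sketch of the idea behind random effects GLS, under the assumption (not spelled out in these notes) that the composite error is u_it = alpha_i + eps_it with variances var_alpha and var_eps, and that the panel is balanced with T periods. GLS can then be carried out as OLS on quasi-demeaned data, where only a share theta of each unit mean is subtracted. Function names are illustrative.

```python
import math

def re_theta(var_alpha, var_eps, n_periods):
    """Quasi-demeaning weight used by random effects GLS (balanced panel)."""
    return 1.0 - math.sqrt(var_eps / (var_eps + n_periods * var_alpha))

def error_autocorrelation(var_alpha, var_eps):
    """Correlation of u_it = alpha_i + eps_it between two periods of one unit."""
    return var_alpha / (var_alpha + var_eps)

def quasi_demean(series, theta):
    """Subtract only the share theta of the unit mean from each observation."""
    avg = sum(series) / len(series)
    return [value - theta * avg for value in series]

# Hypothetical variance components.
print(error_autocorrelation(var_alpha=4.0, var_eps=1.0))   # 0.8
print(round(re_theta(var_alpha=4.0, var_eps=1.0, n_periods=5), 3))

# theta = 0 leaves the data unchanged (pooled OLS);
# theta = 1 is the full within transformation (fixed effects).
print(quasi_demean([2.0, 4.0, 6.0], theta=0.0))  # [2.0, 4.0, 6.0]
print(quasi_demean([2.0, 4.0, 6.0], theta=1.0))  # [-2.0, 0.0, 2.0]
```

Random effects therefore sits between pooled OLS and fixed effects: the larger the variance of alpha relative to eps (or the larger T), the closer theta is to 1.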

How to choose the right model

Estimation and interpretation

Panel regression

not so much an analysis method but a type of data set or data structure
Many types of models can be used with panel data:
- OLS
- Logit, probit
- Autoregressive models
- Other time series models

Estimation using Python

Causality

Correlation is not causality!
What we actually want to make are recommendations to managers and policymakers -> we need causality: increasing A leads to an increase in B
To prove that a causal mechanism created the correlation, we need to be able to make a ceteris paribus statement
Ceteris paribus statement: “Keeping all other factors equal, increasing A increases B.”

Causality Problems:


Last changed
2 months ago
