Panel Data
combines cross-sectional and time series data
it contains data on the same observation units (cross-section) at several points in time (time series)
the total number of observations is T*N
-> small N can be compensated by large T
What types of panel data exist?
Balanced panel: all observation units have measurements for all time periods
Unbalanced panel: measurements are not available for all units for all time periods
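A minimal sketch of how one could check whether a panel is balanced, assuming the data is stored in long format (one row per unit and period). The DataFrame, its column names and the helper `is_balanced` are hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical long-format panel: one row per (movie, year) observation.
# Movie C has no measurement for 2022, so the panel is unbalanced.
panel = pd.DataFrame({
    "movie": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "year":  [2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021],
    "hours": [10.0, 12.0, 11.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

def is_balanced(df, unit_col, time_col):
    """Return True if every unit is observed exactly once in every period."""
    n_units = df[unit_col].nunique()      # N
    n_periods = df[time_col].nunique()    # T
    no_duplicates = not df.duplicated([unit_col, time_col]).any()
    # A balanced panel has exactly T*N rows.
    return no_duplicates and len(df) == n_units * n_periods

balanced = is_balanced(panel, "movie", "year")
```

Here `balanced` is False because only 8 of the 3*3 = 9 possible observations exist; dropping movie C would leave a balanced panel.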
What are N and T here?
T = 5 (Years)
N = 3 (Movies)
What does panel data mean for variation?
Now we have two sources of variation:
between units (cross-section, e.g., between movies)
within each unit (variation between different points in time)
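The two sources of variation can be made concrete by splitting the total sum of squares of a variable into a between part and a within part. The following is a small sketch with made-up numbers; the column names `movie` and `hours` are assumptions for the movie example.

```python
import pandas as pd

# Hypothetical panel: hours streamed for 3 movies in 2 periods each.
panel = pd.DataFrame({
    "movie": ["A", "A", "B", "B", "C", "C"],
    "hours": [10.0, 14.0, 4.0, 6.0, 7.0, 7.0],
})

overall_mean = panel["hours"].mean()
# Mean of each movie over time, repeated on every row of that movie.
unit_mean = panel.groupby("movie")["hours"].transform("mean")

# Total variation around the overall mean.
total_ss = ((panel["hours"] - overall_mean) ** 2).sum()
# Between variation: how much the movie means differ from each other.
between_ss = ((unit_mean - overall_mean) ** 2).sum()
# Within variation: how much each movie moves around its own mean over time.
within_ss = ((panel["hours"] - unit_mean) ** 2).sum()
```

In this toy data the total variation (62) splits into 52 between movies and 10 within movies, so most of the variation comes from differences between movies.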
Advantages of panel data
high external and internal validity
more degrees of freedom -> higher efficiency
controls for the impact of unobserved heterogeneity
-> reduces problem of potential omitted variable bias
facilitates constructing and testing more complex hypotheses -> study dynamics. A treatment effect cannot be observed in a single period or with a single person; it requires multiple persons over multiple periods of time
Challenges of panel data
Data collection is costly and time-consuming
“Panel mortality” or “panel attrition”: units drop out of the panel (e.g., firms lose interest or disappear). Excluding them might result in a bias, because there is usually a reason why they dropped out
Missing observations → “unbalanced panel”
Similar problems as for time series data (e.g., autocorrelation, seasonal effects etc.)
Problem of unobserved heterogeneity
What are potential effects we might want to consider for the movie example?
heterogeneity = omitted variable
Movie effects:
Some movies are of higher quality (but as this is difficult to measure, we do not observe this) (we call this unobserved heterogeneity)
So what is influencing the number of hours streamed? Is it really the marketing money we spent on the film? Or do we unknowingly spend more on films of higher quality, which would mean that the number of streams depends on the quality and not on the marketing money spent?
Time-related effects:
we already know that during some months people watch less -> If we decide to advertise movies during those months, our analysis might again suffer from an omitted variable bias
How to analyze panel data?
fixed effects
random effects
Pooled OLS
Assess whether a fixed effects or a random effects model is appropriate
Use the Hausman test -> The Hausman test checks whether the unobserved individual effects (αᵢ) are correlated with the explanatory variables (X).
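For reference, the usual textbook form of the Hausman statistic is sketched below. The symbols are assumptions not used elsewhere in these notes: \hat\beta_{FE} and \hat\beta_{RE} are the fixed effects and random effects estimates of the coefficients on the time-varying regressors, and k is the number of those coefficients.

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})^{\top}
    \left[ \widehat{\operatorname{Var}}(\hat\beta_{FE})
         - \widehat{\operatorname{Var}}(\hat\beta_{RE}) \right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
\;\sim\; \chi^2_k \quad \text{under } H_0
```

Under H_0 (αᵢ uncorrelated with X) both estimators are consistent and random effects is more efficient, so the two estimates should be close. A large H (small p-value) rejects H_0 and points to the fixed effects model.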
Entity fixed effects:
Adding entity/unit dummies
-> estimates the quality for each movie -> computationally intensive
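A minimal sketch of the dummy-variable approach with NumPy, using made-up, noise-free data in which the unobserved quality is correlated with marketing spend. All numbers and variable names are hypothetical; the true marketing effect is set to 2.

```python
import numpy as np

# Hypothetical data: 3 movies (rows), 4 periods (columns).
alpha = np.array([1.0, 5.0, 9.0])            # unobserved quality per movie
x = np.array([[1., 2., 3., 4.],              # marketing spend
              [3., 4., 5., 6.],
              [5., 6., 7., 8.]])
y = alpha[:, None] + 2.0 * x                 # hours streamed, true effect = 2

n_units, n_periods = x.shape
x_long, y_long = x.ravel(), y.ravel()
unit_id = np.repeat(np.arange(n_units), n_periods)

# Pooled OLS: intercept + marketing, ignoring the movie effects.
X_pooled = np.column_stack([np.ones_like(x_long), x_long])
beta_pooled = np.linalg.lstsq(X_pooled, y_long, rcond=None)[0][1]

# Entity fixed effects: one dummy per movie instead of a common intercept.
dummies = (unit_id[:, None] == np.arange(n_units)).astype(float)
X_lsdv = np.column_stack([x_long, dummies])
coef_lsdv = np.linalg.lstsq(X_lsdv, y_long, rcond=None)[0]
beta_lsdv = coef_lsdv[0]      # marketing effect
alpha_hat = coef_lsdv[1:]     # estimated movie effects
```

In this toy example the pooled slope lies well above 2 because high-quality movies also get more marketing, while the dummy regression recovers both the effect of 2 and the movie effects. With many units the number of dummies grows with N, which is the computational cost noted above.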
First Differences:
Using “first differences” between successive time periods eliminates 𝛼𝑗 as it is time independent.
The quality of each movie stays constant over time -> it drops out
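A small sketch of the first-difference idea with pandas. The data is made up and noise-free (true marketing effect 3, movie qualities 10 and 2), and the column names are assumptions for the movie example.

```python
import pandas as pd

# Hypothetical panel: 2 movies observed in 4 periods.
panel = pd.DataFrame({
    "movie": ["A"] * 4 + ["B"] * 4,
    "period": [1, 2, 3, 4] * 2,
    "marketing": [1.0, 2.0, 4.0, 5.0, 3.0, 5.0, 6.0, 9.0],
})
quality = {"A": 10.0, "B": 2.0}   # time-constant, normally unobserved
panel["hours"] = panel["movie"].map(quality) + 3.0 * panel["marketing"]

# Difference each variable between successive periods within each movie.
panel = panel.sort_values(["movie", "period"])
diffs = panel.groupby("movie")[["hours", "marketing"]].diff().dropna()

# OLS without intercept on the differenced data: the quality term is gone.
beta_fd = (diffs["marketing"] * diffs["hours"]).sum() / (diffs["marketing"] ** 2).sum()
```

Each movie loses its first period through differencing (6 rows remain), and the slope on the differenced data equals the true effect of 3 because the constant quality term cancels.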
Within group fixed effects:
We can also eliminate 𝛼𝑗 by subtracting from each variable for each unit its mean value (over time)
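The within transformation can be sketched in a few lines of pandas. Again the data is made up and noise-free (true marketing effect 1.5), and the column names are assumptions.

```python
import pandas as pd

# Hypothetical panel: 3 movies observed in 3 periods.
panel = pd.DataFrame({
    "movie": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "marketing": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 2.0, 5.0, 5.0],
})
quality = {"A": 0.0, "B": 7.0, "C": -3.0}   # time-constant movie effects
panel["hours"] = panel["movie"].map(quality) + 1.5 * panel["marketing"]

# Subtract from each variable the mean of its movie over time.
cols = ["hours", "marketing"]
demeaned = panel[cols] - panel.groupby("movie")[cols].transform("mean")

# OLS without intercept on the demeaned data gives the within estimator.
beta_within = (demeaned["marketing"] * demeaned["hours"]).sum() / (demeaned["marketing"] ** 2).sum()
```

Because the movie effect is constant over time, it is part of each movie's mean and disappears after demeaning, so `beta_within` recovers 1.5 here without estimating any dummy coefficients.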
Difference between first differences and within-group fixed effects
time fixed effects
Adding seasonal dummies:
Add a dummy variable 𝐴𝑚 for each month (𝐴𝑚 equals 1 only for observations in month m)
The different constants capture the combined effects of several (or many) unknown time-related effects that are different between periods but constant for all observation units (e.g., winter effect)
-> Often, unit-fixed effects and time-fixed effects are combined.
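A sketch of combining unit dummies and time dummies in one regression, using NumPy and made-up, noise-free data (true marketing effect 0.8). All names and numbers are hypothetical. One time dummy is left out as the reference period, since the full set of unit dummies already acts as the intercept.

```python
import numpy as np

n_units, n_periods = 3, 4
# Hypothetical marketing spend: rows = movies, columns = periods.
x = np.array([[1., 2., 4., 3.],
              [2., 5., 3., 6.],
              [4., 4., 7., 9.]])
unit_effect = np.array([5.0, 1.0, 3.0])        # e.g. movie quality
time_effect = np.array([0.0, -2.0, 1.0, 4.0])  # e.g. seasonal effect
y = unit_effect[:, None] + time_effect[None, :] + 0.8 * x

unit_id = np.repeat(np.arange(n_units), n_periods)
time_id = np.tile(np.arange(n_periods), n_units)

# One dummy per movie; one dummy per period except the first (reference).
unit_dummies = (unit_id[:, None] == np.arange(n_units)).astype(float)
time_dummies = (time_id[:, None] == np.arange(1, n_periods)).astype(float)

X = np.column_stack([x.ravel(), unit_dummies, time_dummies])
coef = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
beta_hat = coef[0]                 # marketing effect
time_hat = coef[1 + n_units:]      # period effects relative to period 0
```

The estimated period effects are measured relative to the omitted first period, which is why they match `time_effect[1:]` in this example.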
Before vs After fixed effects
Black arrow = heterogeneity
Random effects:
In a random effects regression, we assume that 𝛼 is purely random, uncorrelated with the observed variables 𝑋𝑘𝑖𝑡. This means a random effects model considers 𝛼 as a random variable.
Important!
𝑢𝑖𝑡 will be subject to autocorrelation:
• OLS is inefficient and the standard errors it computes are wrong
• Alternative approach: Generalized least squares (GLS)
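Why the error term is autocorrelated can be seen from a short derivation. It assumes the standard composite error, which these notes do not spell out: 𝑢𝑖𝑡 consists of the random unit effect 𝛼𝑖 plus an idiosyncratic error ε𝑖𝑡, the two are uncorrelated, and ε𝑖𝑡 is uncorrelated over time with constant variance.

```latex
\begin{align*}
u_{it} &= \alpha_i + \varepsilon_{it} \\
\operatorname{Var}(u_{it}) &= \sigma_\alpha^2 + \sigma_\varepsilon^2 \\
\operatorname{Cov}(u_{it}, u_{is}) &= \sigma_\alpha^2 \qquad (t \neq s) \\
\operatorname{Corr}(u_{it}, u_{is}) &= \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}
\end{align*}
```

Two errors of the same unit share 𝛼𝑖, so they are correlated whenever the variance of 𝛼𝑖 is positive. GLS takes this known correlation structure into account.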
How to choose the right model
Estimation and interpretation
Panel regression
not so much an analysis method but a type of data set or data structure
Many types of models can be used with panel data:
OLS
Logit, probit
Autoregressive models
Other time series models
Estimation using Python
Causality
Correlation is not causality! (see next slide)
What we actually want to make are recommendations to managers and policymakers -> need causality: increasing A leads to an increase in B
To prove that a causal mechanism created the correlation, we need to be able to make a ceteris paribus statement
Ceteris paribus statement: “keeping all other factors equal, increasing A increases B.”
Causality Problems: