Research Process
Formulate Research Question
Literature Review
Develop a theory
Formulate hypothesis
Design empirical tests for hypothesis testing
Interpret results
Theory
logically coherent set of ideas that can explain and predict empirical observations
Properties of good hypotheses
are unambiguous
are simple
specify a specific relation between X and Y
are testable
Empirical approach
archival
experimental
survey
qualitative
Econometrics
use of statistical methods to analyze economic data
Objectives Econometrics
Estimating relationships between economic variables
Testing economic theories and hypotheses
Forecasting (macro) economic variables
Evaluating and implementing government and firm policies/actions
Causality -> judgement whether theory fits observations (Smith 2011)
Covariation -> variables of interest move together
Cause prior to effect -> causal event should occur before effect event
Absence of plausible rival hypotheses -> no endogeneity concerns (omitted variables)
Types of economic data
cross-sectional data -> many different units (e.g. individuals, firms) observed at one point in time (observations are more or less independent)
time-series data -> observations of a variable over time (serially correlated)
pooled cross sections -> combination of cross-sectional samples drawn independently at different points in time
panel/longitudinal data -> same cross-sectional units followed over time
Construct
abstract idea which is not directly observable or measurable
Variable
observable item which can assume different values and is used to measure a theoretical construct
Construct validity
degree to which variable captures the underlying theoretical construct it is supposed to measure
-> internal & external validity
Internal validity
degree of confidence that tested causal relationship is trustworthy and not influenced by other factors
External validity
extent to which studied results can be applied to other situations, groups or events
Construct reliability
degree to which a measurement provides consistent estimates
sample statistic
estimate of population parameter based on random sample from population
Law of large numbers
sample averages converge to the population mean as sample size increases & standard error decreases as sample size increases
Central Limit theorem
even if the population is not normally distributed -> sampling distribution of sample means will be approximately normally distributed (if sample size is large enough)
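A small simulation can make both statements concrete. The sketch below is only an illustration under assumed settings: the exponential population, the sample sizes (5 and 200) and the 2000 replications are choices made here, not part of the notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    # Draw n values from an exponential distribution (mean 1, strongly skewed)
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Sampling distribution of the mean for a small and a large sample size
means_small = [sample_mean(5) for _ in range(2000)]
means_large = [sample_mean(200) for _ in range(2000)]

# Law of large numbers: with larger n the sample means cluster
# more tightly around the population mean of 1.0
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

# Central limit theorem: for large n roughly 95% of the sample means
# should fall within 1.96 standard errors of the population mean
se_large = 1.0 / 200 ** 0.5  # population sd of this distribution is 1
share_within = sum(abs(m - 1.0) <= 1.96 * se_large for m in means_large) / len(means_large)

print(spread_small, spread_large, share_within)
```

The spread of the sample means shrinks with n even though the underlying population is far from normal.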
sample standard error
estimate of standard deviation of the sampling distribution of a statistic
-> e.g. sample standard error of means
Testing Strategy
draw a sample of n observations from the population
calculate sample statistic
calculate standard error of sample statistic
calculate t-statistic
compare t-statistic with critical values
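The steps can be sketched for a one-sample test of a mean. The data and the null value `mu_0` below are made up for illustration; the critical value 2.262 is the two-sided 5% value of the t distribution with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of n = 10 observations (made-up numbers)
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 4.7, 5.9, 5.3, 5.5]
mu_0 = 5.0  # population mean under the null hypothesis (assumed)

n = len(sample)
mean = statistics.fmean(sample)   # sample statistic
sd = statistics.stdev(sample)     # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(n)            # standard error of the sample mean
t_stat = (mean - mu_0) / se       # t-statistic

t_crit = 2.262                    # two-sided 5% critical value, 9 degrees of freedom
reject_null = abs(t_stat) > t_crit

print(round(mean, 2), round(t_stat, 2), reject_null)
```

With these numbers the t-statistic exceeds the critical value, so the null hypothesis would be rejected at the 5% level.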
Type 1 Error
rejecting null hypothesis that is actually true -> significance level alpha
Type 2 Error
failing to reject a null hypothesis that is actually false -> probability beta
Size of test
probability of a type 1 error
power of a test
probability of correctly rejecting null hypothesis when it is indeed false
= 1 - beta
Power of test increasing
constant alpha
increasing sample size n
effect size is larger
trade-off type 1 & type 2 error
decreasing alpha (5% -> 1%) -> lower risk of type 1 error but higher risk of type 2 error (lower power)
vs.
increasing alpha (1% -> 5%) -> higher risk of type 1 error but also higher power
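The effect of alpha, sample size and effect size on power can be computed directly for a simple case. The sketch assumes a one-sided z-test with known standard deviation; the function name `power_one_sided_z` and all parameter values are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z-test of H0: mu = 0 against H1: mu = effect > 0."""
    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)          # rejection threshold under H0
    shift = effect * math.sqrt(n) / sigma  # mean of the test statistic under H1
    return 1 - z.cdf(z_crit - shift)       # probability of rejecting H0 under H1

power_5pct = power_one_sided_z(effect=0.3, sigma=1.0, n=50, alpha=0.05)
power_1pct = power_one_sided_z(effect=0.3, sigma=1.0, n=50, alpha=0.01)
power_larger_n = power_one_sided_z(effect=0.3, sigma=1.0, n=200, alpha=0.05)
power_larger_effect = power_one_sided_z(effect=0.6, sigma=1.0, n=50, alpha=0.05)

print(power_1pct, power_5pct, power_larger_n, power_larger_effect)
```

Moving alpha from 5% to 1% lowers power, while a larger sample or a larger effect raises it at constant alpha.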
Problems with published research
Strong publication bias for significant results
P-Hacking -> play around with specifications until you get a result with significant p-values
HARKing -> hypotheses are formulated after reviewing data and sold as ‘a priori’
Outcome switching -> check for various outcome variables and report only significant ones
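Why trying many specifications or outcomes is a problem can be shown with a short calculation. It assumes k independent tests of true null hypotheses, each at alpha = 5%; the choice of k = 20 is arbitrary.

```python
alpha = 0.05   # significance level of each individual test
k = 20         # number of independent tests tried (assumed)

# Probability that at least one test is significant purely by chance
p_any_false_positive = 1 - (1 - alpha) ** k

print(round(p_any_false_positive, 3))
```

Under these assumptions the chance of finding at least one "significant" result is well above one half, even though every null hypothesis is true.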
Mediating variable
variable that explains the mechanism between independent and dependent variable
Moderating variable
factor that strengthens or weakens the relation between dependent and independent variable
control variable
captures the effects of other factors (Z) that are related to X, Y or both but are not of direct interest when examining the effect of X on Y
counterfactual
indicates what would have happened to the dependent variable if, under identical circumstances, the independent variable had not changed
Ideal procedure establishing causal effects (with counterfactual)
measure the change of Y from before to after the change in X
measure the change of Y if X had not changed (counterfactual)
test whether the difference between both measurements is statistically significant
sources endogeneity
presence of omitted correlated variables
reverse causality or simultaneity
measurement error in the independent variables
Beta coefficient
replace y and each variable x with a standardized version -> subtract mean and divide by standard deviation
-> coefficient reflects the standard deviation change of y for a one standard deviation change in x
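For a regression with a single x, the standardization can be sketched as follows. The data are made up, and the helper names `ols_slope` and `standardize` are hypothetical.

```python
import statistics

# Made-up data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

def ols_slope(xs, ys):
    """OLS slope of a simple regression of ys on xs (with intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

raw_slope = ols_slope(x, y)                             # in original units
beta_coef = ols_slope(standardize(x), standardize(y))   # in standard deviations

# The beta coefficient equals the raw slope rescaled by sd(x) / sd(y)
rescaled = raw_slope * statistics.stdev(x) / statistics.stdev(y)
print(raw_slope, beta_coef, rescaled)
```

Running the regression on standardized variables gives the same number as rescaling the raw slope, which is why beta coefficients are comparable across variables measured in different units.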
association studies
causal inferences are difficult to make due to endogeneity concerns
optimal control variables in regression
Z is correlated with X and affects Y -> incl. otherwise endogeneity
Z is correlated with Y but not with X -> incl. to reduce overall error term (more of Y is explained)
Z is correlated with X but not with Y -> excl. because multicollinearity inflates the standard errors of the regression coefficients
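The first case (Z correlated with X and affecting Y) can be illustrated with simulated data. Everything below is an assumption for illustration: the true effect of X on Y is set to 1.0, and Z is partialled out of X and Y before regressing the residuals, which gives the same coefficient on X as including Z as a control.

```python
import random
import statistics

random.seed(1)  # fixed seed for reproducibility
n = 5000

# Simulated data: Z drives both X and Y, true effect of X on Y is 1.0 (assumed)
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + random.gauss(0, 1) for zi in z]
y = [1.0 * xi + 2.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def slope(xs, ys):
    """OLS slope of a simple regression of ys on xs (with intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def residuals(ys, xs):
    """Residuals of a simple regression of ys on xs (with intercept)."""
    b = slope(xs, ys)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return [yi - my - b * (xi - mx) for xi, yi in zip(xs, ys)]

slope_without_z = slope(x, y)                                # Z omitted
slope_with_z = slope(residuals(x, z), residuals(y, z))       # Z controlled for

print(slope_without_z, slope_with_z)
```

Omitting Z pushes the estimated effect of X well above the assumed true value, while controlling for Z brings it back close to 1.0.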
disadvantages fixed effects
no estimations of the effect of certain variables that are constant within the level of fixed effects
reduction of variation in dataset and less precision -> equivalent to de-meaning all variables
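Both points can be seen in a tiny example of de-meaning within firms. The two-firm panel and the variable `listed` are made up for illustration.

```python
import statistics

# Hypothetical panel: two firms, three years each (made-up numbers).
# "listed" is constant within each firm.
panel = {
    "firm_A": {"x": [1.0, 2.0, 3.0], "y": [12.0, 14.1, 15.9], "listed": [1.0, 1.0, 1.0]},
    "firm_B": {"x": [2.0, 3.0, 5.0], "y": [4.1, 6.0, 9.9], "listed": [0.0, 0.0, 0.0]},
}

def demean(values):
    """Subtract the firm-level mean from each observation."""
    m = statistics.fmean(values)
    return [v - m for v in values]

x_within, y_within, listed_within = [], [], []
for firm in panel.values():
    x_within += demean(firm["x"])
    y_within += demean(firm["y"])
    listed_within += demean(firm["listed"])

# Within (fixed effects) slope: regression on the de-meaned data
within_slope = sum(a * b for a, b in zip(x_within, y_within)) / sum(a * a for a in x_within)

print(within_slope, listed_within)
```

After de-meaning, `listed` is zero for every observation, so its effect cannot be estimated, and only the within-firm variation in x and y is left to identify the slope.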
crucial assumption Difference in Difference
whatever happened to control group over time would also have happened to treated group in absence of treatment -> parallel trends
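Under this assumption the change in the control group serves as the counterfactual for the treated group. A minimal sketch with made-up outcome values:

```python
import statistics

# Made-up outcome values before and after the treatment date
treated_pre = [10.0, 11.0, 9.5, 10.5]
treated_post = [14.0, 15.5, 13.5, 15.0]
control_pre = [8.0, 8.5, 7.5, 8.0]
control_post = [9.0, 9.5, 8.5, 9.0]

# Change over time within each group
change_treated = statistics.fmean(treated_post) - statistics.fmean(treated_pre)
change_control = statistics.fmean(control_post) - statistics.fmean(control_pre)

# Difference-in-differences estimate: the control group's change stands in
# for what would have happened to the treated group without treatment
did_estimate = change_treated - change_control

print(change_treated, change_control, did_estimate)
```

The estimate is only as credible as the parallel trends assumption; the sketch does not test it and does not compute a standard error.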
Arguments supporting control group
treated & control group should have generally similar characteristics
treated & control group had similar trajectories for the dependent variable before treatment