What is a distribution of a variable?
A distribution says how often given values occur when you randomly sample that variable over and over
-> e.g. the distribution of a coin toss is that half the time it gives you Heads and half the time it gives you Tails
Summarizing categorical variables
If the variable is categorical (it takes several discrete values, like Heads and Tails), often the best way to describe its distribution is simply to count the number (or fraction) of observations in each category
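A minimal sketch of this counting approach, using a made-up sample of ten coin tosses (the data and variable names are illustrative assumptions, not part of the notes):

```python
from collections import Counter

# Hypothetical sample of ten coin tosses (made-up data for illustration).
tosses = ["Heads", "Tails", "Heads", "Heads", "Tails",
          "Tails", "Heads", "Tails", "Heads", "Tails"]

counts = Counter(tosses)   # number of observations in each category
n = len(tosses)
fractions = {value: count / n for value, count in counts.items()}

for value in sorted(counts):
    print(f"{value}: {counts[value]} observations, fraction {fractions[value]:.2f}")
```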
Summarizing continuous variables
When the variable is continuous, we cannot exactly count the number of times each value comes up
-> So we look at the number of times it falls within a particular interval of values (e.g. in each decile, using a histogram)
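Both ways of grouping a continuous variable into intervals can be sketched as follows. The sample is simulated (an assumption for illustration): decile cut points give ten equally populated intervals, while a histogram counts observations in equal-width intervals.

```python
import random
import statistics

random.seed(0)
# Hypothetical continuous variable: 1,000 simulated draws (illustration only).
values = [random.gauss(50, 10) for _ in range(1000)]

# Nine cut points split the sample into ten equally populated intervals (deciles).
cut_points = statistics.quantiles(values, n=10)

# A histogram instead counts observations in equal-width intervals.
n_bins = 10
low, high = min(values), max(values)
width = (high - low) / n_bins
bin_counts = [0] * n_bins
for v in values:
    index = min(int((v - low) / width), n_bins - 1)  # the maximum goes in the last bin
    bin_counts[index] += 1

for i, count in enumerate(bin_counts):
    left = low + i * width
    print(f"[{left:6.1f}, {left + width:6.1f}): {count}")
```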
Descriptive statistics & Key concepts
Descriptive statistics are a set of techniques used to summarize and describe the main features of a dataset. Also called summary statistics - they provide a summary of what the distribution looks like
Means and Medians: Describe where the center of the distribution is
Percentiles (e.g. 25th and 75th): Describe the value below which a given percentage of observations fall
Standard deviations and variances: Describe how spread out the distribution is
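As a rough illustration, these summary statistics can be computed with Python's standard `statistics` module. The sample is made up, and the population formulas (dividing by N) are used here as an assumption; `statistics.variance` and `statistics.stdev` give the sample versions (dividing by N - 1).

```python
import statistics

# Hypothetical sample (made-up numbers for illustration).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)                   # center: arithmetic average
median = statistics.median(sample)               # center: middle value
q1, q2, q3 = statistics.quantiles(sample, n=4)   # 25th, 50th and 75th percentiles
variance = statistics.pvariance(sample)          # spread: mean squared deviation from the mean
std_dev = statistics.pstdev(sample)              # spread: square root of the variance

print(mean, median, (q1, q2, q3), variance, std_dev)
```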
What does it mean to be related - Dependence & Correlation
Variables are dependent on each other if telling you the value of one gives you information about the distribution of the other
Dependency: Implies a broader concept of one variable influencing or being related to another.
Variables are correlated if knowing whether one of them is unusually high gives you information about whether the other is unusually high (positive correlation) or unusually low (negative correlation)
Correlation: Specifically measures the linear relationship between two variables, providing a quantitative measure of the strength and direction of that relationship.
-> Example of negative correlation: Population and growth in GDP per capita
Explaining one variable Y with another X means predicting your Y by looking at the distribution of Y for your value of X
Example: Output per worker and Capital per worker
The Solow model suggests that capital per worker k is an essential determinant of output per worker y
Question: Is there any relationship between these two variables?
Check for dependence
Check for correlation
Using descriptive statistics:
Using a scatter plot:
Dependence
For dependence, simply see if the distribution of one variable changes for the different values of the other
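One simple way to run this check is to split the sample by one variable and compare the distribution of the other across the groups. The sketch below uses made-up (k, y) pairs and compares only the group means; both the data and the median split are illustrative assumptions.

```python
import statistics

# Hypothetical (capital per worker, output per worker) pairs; made-up numbers.
data = [(1, 2.0), (2, 2.5), (3, 3.1), (4, 3.0), (5, 4.2),
        (6, 4.8), (7, 5.1), (8, 5.0), (9, 6.3), (10, 6.9)]

# Split the observations into low-k and high-k groups at the median of k.
k_median = statistics.median(k for k, _ in data)
y_low_k = [y for k, y in data if k <= k_median]
y_high_k = [y for k, y in data if k > k_median]

# If y were independent of k, the two groups would look alike.
print("mean y given low k :", statistics.mean(y_low_k))
print("mean y given high k:", statistics.mean(y_high_k))
```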
Correlation
We are interested in whether two variables tend to move together (positive correlation) or move apart (negative correlation)
-> One basic way to do this is to see whether values tend to be high together
Count the observations above the median for both variables
Correlation ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation)
Important: Just because two variables are related doesn’t mean we know why
-> If corr(k,y) is positive, it could be that k causes y … or that y causes k, or that something else causes both!
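The median-counting check and the correlation coefficient can be sketched as below. The data are made up, and `pearson` is a hypothetical helper that implements the usual Pearson formula (covariance divided by the product of the standard deviations).

```python
import math
import statistics

# Hypothetical data: capital per worker (k) and output per worker (y).
k = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.0, 2.5, 3.1, 3.0, 4.2, 4.8, 5.1, 5.0, 6.3, 6.9]

# Basic check: how often are both variables above (or both below) their medians?
k_med, y_med = statistics.median(k), statistics.median(y)
both_high = sum(1 for ki, yi in zip(k, y) if ki > k_med and yi > y_med)
both_low = sum(1 for ki, yi in zip(k, y) if ki < k_med and yi < y_med)

def pearson(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

r = pearson(k, y)
print(both_high, both_low, round(r, 3))
```

A positive value of `r` only says that the two variables move together; it does not say which one, if either, causes the other.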
What is a Regression?
A measure of the relation between the mean value of one variable (e.g. output per worker) and corresponding values of other variables (e.g. capital per worker)
A regression model quantifies the relationship between a given variable (y) and one or more other variables (x1,x2,…)
What are the 3 questions to ask in order to find out the relationship between k and y?
Is there any relationship between these two variables?
-> There seems to be a relationship
If yes, what sign does this relationship have?
-> It is likely to be positive
What is the magnitude of this relationship?
-> We don’t have enough information
-> We need a regression model to answer question 3
Bivariate Linear Regression Model:
Steps to build a regression model (4)
Formulate a clear question of interest
-> How does a country’s k affect its y
Construct an economic model to guide your understanding of the problem
-> y = f(x)
-> In the standard notation of regressions, the explained variable (on the left) is denoted by y and the explanatory variable (on the right) is denoted by x
Find the data for the variables that you need
Turn the economic model into an econometric model, specifying the form of the function f(x), such as
General construct of a Bivariate Linear Regression Model
A two-variable (bivariate) linear regression model can be expressed with the following regression function:
Where:
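The formula itself is not reproduced in these notes. In standard textbook notation (an assumption; the original symbols may differ), the bivariate linear regression model is usually written as:

```latex
y_i = \beta_0 + \beta_1 x_i + u_i
```

Here y_i is the explained variable for observation i (e.g. output per worker of country i), x_i is the explanatory variable (e.g. capital per worker), \beta_0 is the intercept, \beta_1 is the slope, and u_i is an error term collecting everything else that affects y_i.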
How does the Bivariate Linear Regression Model look in our example? How can data differ from estimations (Population regression function & Sample regression function)?
What does the fitted line look like? Which parameters represent the intercept and the slope?
What does it mean to best fit the data?
How can we find the one line that best fits the data?
-> Ordinary Least Squares (OLS)
How does Ordinary Least Squares work?
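The notes give no answer here. As a hedged sketch of the standard textbook idea: OLS picks the intercept and slope that minimize the sum of squared residuals (the squared vertical distances between the data points and the line). For one explanatory variable this has a closed-form solution, implemented below; the function names and data are illustrative assumptions.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                    # co-movement of x and y relative to variation in x
    intercept = y_bar - slope * x_bar    # the fitted line passes through (x_bar, y_bar)
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity that OLS minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: capital per worker (k) and output per worker (y).
k = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.0, 2.5, 3.1, 3.0, 4.2, 4.8, 5.1, 5.0, 6.3, 6.9]

intercept, slope = ols_fit(k, y)
print(intercept, slope, sum_squared_residuals(k, y, intercept, slope))
```

Any other line, for instance one with a slightly shifted intercept or slope, gives a larger sum of squared residuals on the same data.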
How to interpret Regression Results & Circle back to 3 questions
The estimated coefficients are reported next to the variable name
The standard errors of the estimates indicate the precision of the estimates and are reported in the brackets below the coefficients
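The stars flag how statistically significant each estimate is, i.e. how unlikely it would be to obtain an estimate this far from zero if the true coefficient were zero
-> They indicate a level of significance (p-value)
(*) p-value less than 0.05
(**) p-value less than 0.01
(***) p-value less than 0.001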
The R-Squared (between 0 and 1) is a measure of fit: it reports the fraction of variance of the dependent variable explained by the explanatory variables
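A small sketch of two items from this list: the R-squared as one minus the ratio of residual to total variation, and the star notation, assuming the thresholds above refer to p-values. Both function names are hypothetical helpers, not part of any regression package.

```python
def r_squared(ys, fitted):
    """Fraction of the variance of y explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)               # total variation in y
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return 1 - ss_resid / ss_total

def stars(p_value):
    """Map a p-value to significance stars, using the thresholds listed above."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# Hypothetical outcomes and fitted values (made-up numbers).
ys = [2.0, 3.0, 5.0, 6.0]
fitted = [2.2, 3.1, 4.6, 6.1]
print(round(r_squared(ys, fitted), 3), stars(0.03))
```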
Units of Measurement
Nonlinear Relationship & Interpretation of regression coefficients
Estimating the Solow Model & Results
Results:
Positive relationship between savings rate and log GDP per worker
Negative relationship between population growth and log GDP per worker
Both relationships are statistically significant (high estimation precision)
Differences in savings and population growth account for a large fraction of the cross-country variation in income per capita (e.g. for the intermediate sample, the R-squared is 0.59)
-> However, the implied α is much higher than 1/3 (e.g. 0.59 for the intermediate sample) - casting some doubts on the textbook Solow model
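The regression equation is not reproduced in these notes. Assuming it is the standard textbook Solow steady-state equation in logs (with s the savings rate, n population growth, g technology growth and \delta depreciation), the implied \alpha can be recovered from the estimated coefficient as follows:

```latex
\ln y_i = a + \frac{\alpha}{1-\alpha}\,\ln s_i - \frac{\alpha}{1-\alpha}\,\ln(n_i + g + \delta) + \varepsilon_i

b \equiv \frac{\alpha}{1-\alpha} \quad\Longrightarrow\quad \alpha = \frac{b}{1+b}
```

Under this assumption, \alpha = 1/3 would correspond to a coefficient of b = 0.5, whereas an implied \alpha of 0.59 corresponds to b \approx 1.44.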
What is causality? & Examples (obvious & less obvious)
We say that X causes Y if, were we to intervene and change the value of X without changing anything else, Y would also change as a result
Examples of non-causal relation (Reverse causality & Omitted variable)
Non-zero correlations sometimes are not causal
This typically happens when:
The direction of causality is Y -> X (reverse causality)
-> Rooster crowing sounds are followed closely by sunrise
There is a third variable (omitted variable) that causes both X and Y
-> People tend to wear shorts on days when ice cream trucks are out (omitted variable: weather determines both)
-> When Y has multiple determinants and we leave one of them out, we may estimate an incorrect magnitude for the effect of X: the omitted variable causes a bias
In some cases, the variables jointly determine each other, Y <-> X (simultaneity)
-> Equilibrium price of a good and quantity demanded
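The omitted-variable case can be illustrated with a small simulation. Everything below is made up: weather drives both shorts-wearing and ice cream truck activity, and neither affects the other. The raw correlation is clearly positive, but it vanishes once the part explained by weather is removed from both variables.

```python
import math
import random

random.seed(1)

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residuals(ys, xs):
    """Part of ys not explained by a least-squares line in xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

n = 5000
weather = [random.gauss(0, 1) for _ in range(n)]        # the omitted variable
shorts = [w + random.gauss(0, 1) for w in weather]      # caused by weather only
ice_cream = [w + random.gauss(0, 1) for w in weather]   # caused by weather only

raw_corr = correlation(shorts, ice_cream)
partial_corr = correlation(residuals(shorts, weather), residuals(ice_cream, weather))
print(round(raw_corr, 2), round(partial_corr, 2))
```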
What is causal inference, and what is a counterfactual? How can we do this?
The main goal we have in doing causal inference is in making as good a guess as possible as to what that Y would have been if X had been different
That “would have been” is called a counterfactual - counter to the fact of what actually happened
In doing so, we want to think about two people/firms/countries that are basically exactly the same except that one has X=0 and one has X=1
Experiments:
A common way to do this in many fields is an experiment (e.g. Randomized Controlled Trials)
-> If you can randomly assign X, then you know that the people with X=0 are, on average, exactly the same as the people with X=1
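A simulated illustration of why random assignment works. All numbers are hypothetical: `ability` stands for an unobserved characteristic that affects Y, and `true_effect` is the causal effect the experiment is trying to recover.

```python
import random
import statistics

random.seed(2)
n = 10000
true_effect = 2.0   # hypothetical causal effect of X on Y

# Unobserved characteristic that also affects the outcome.
ability = [random.gauss(0, 1) for _ in range(n)]
# X is assigned by a coin flip, independently of everything else.
treated = [random.random() < 0.5 for _ in range(n)]
outcome = [a + true_effect * t + random.gauss(0, 1)
           for a, t in zip(ability, treated)]

# Randomization balances the groups: average ability is nearly identical.
balance = (statistics.mean(a for a, t in zip(ability, treated) if t)
           - statistics.mean(a for a, t in zip(ability, treated) if not t))

# So the difference in mean outcomes estimates the causal effect of X.
estimate = (statistics.mean(y for y, t in zip(outcome, treated) if t)
            - statistics.mean(y for y, t in zip(outcome, treated) if not t))
print(round(balance, 3), round(estimate, 3))
```

The control group's average outcome serves as the counterfactual for the treated group: it estimates what the treated would have experienced with X=0.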
Models:
When we’re working with people/firms/countries, running experiments is often infeasible, impossible, or unethical
So we have to think hard about a model of what the world looks like. We can then use our model to figure out what the counterfactual would be
In causal inference, the model is our idea of what we think is the process that generated the data
We have to make some assumptions about what this is!