Which are the five steps of theory construction (methodology)?
Identify a relevant phenomenon
Observation: People go to sleep when it gets dark
Formulate a prototheory
Idea: darkness makes people tired
Develop a formal model
The longer there is darkness, the more tired people get
Check the adequacy of the formal model
Does it work? If it doesn’t, go get more data
Evaluate the overall worth of the constructed theory
How are theory, data and phenomena related to each other?
Theory - phenomena
Theory explains/predicts phenomena
Theory is abducted from phenomena
Phenomena - data
Data are generalized into a phenomenon
Phenomena predict data
Explain the differences between theories and models.
Theory
Conceptual framework
Explains or predicts phenomena
Model
Simplified, formal representation of specific relationship
Derived from theory, but NOT a theory itself
Tests, explores and refines theories
Provides a mathematical/empirical framework to operationalize theoretical constructs
Why should we formalize theories as statistical computational models?
Models clarify the understanding of complex phenomena
Theories can be vague and imprecise
Models translate those conceptual frameworks into mathematical or computational form, forcing
precision
clarity
Benefit
clarifies assumptions
allows objective scrutiny
Helps identify which parts of a theory are supported by data
Models facilitate iterative theory development
Models serve as a tool to
test theories
explore theories
refine theories
When model predictions diverge from empirical results, it drives the refinement/rejection of theoretical assumptions
Benefit: promotes continuous improvement and understanding of the world
Models make testable predictions
Theories are generalized statements about relationships
Models allow us to make quantitative predictions
This offers testable and falsifiable predictions
Models quantify how much one variable affects another
Benefits: offer a framework to test hypotheses and validate theoretical claims with empirical data
Models allow generalization and prediction of future outcomes
Models use historical data to predict future observations
Benefit: useful for forecasting in both research and applied settings
Models inform intervention and policy decisions
Theories suggest how variables should be related
Models can simulate the effects of interventions and predict how those changes will affect outcomes
Benefit: help policymakers, clinicians or researchers evaluate the impact of interventions before implementing them -> better informed decisions
Explain the differences between predictive and causal models
Predictive models
Predict outcomes in similar contexts with high accuracy without needing to understand causality (A and B often occur together; why they occur together is not relevant)
Enough for pure prediction
Problematic when it comes to intervention
Causal models
Generalize predictions to new contexts
Causal drivers make for more effective interventions
What is data?
Observations and measurements from the real world
Starting point of most theories
What is model structure?
Mathematical framework/equation which broadly describes how observations are related
Example: simple regression model with
y = b0 + b1 × Temperature + e
no specific values for the parameters, but there is a slope, a baseline etc.
General framework of how observations are related
What are model parameters?
The value of the parameters which best explain the data are estimated (i.e. how much “is” the slope numerically?)
Model fit procedures adjust the parameters to minimize the difference between model predictions and actual data
Criterion: typically done by minimizing an objective function like the sum of squared errors (SSE) or maximizing likelihood
SSE = sum(observed − predicted)²
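The fitting idea above can be sketched in a few lines of plain Python. The data values are invented for illustration; the closed-form least-squares estimates are what minimizing the SSE yields for a simple regression (the same criterion R's lm() uses):

```python
# Sketch: estimate b0 and b1 of y = b0 + b1*x by minimizing SSE.
# Data values are made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for simple regression
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
     sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

# SSE at the fitted parameters: sum of squared residuals
predicted = [b0 + b1 * xi for xi in x]
sse = sum((obs - pred) ** 2 for obs, pred in zip(y, predicted))
```

Any other choice of b0 and b1 would give a larger SSE for these data; that is exactly what "model fit procedures adjust the parameters to minimize the objective function" means.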
Which are the key components of statistical models?
Data
Model structure
Model parameters
What are nested observations?
Grouped observations (couples, families, school class, repeated measurements within individual)
Ignoring nesting means assuming independence between observations, leading to:
Biased standard errors (Type I, Type II errors)
Inaccurate effect estimation
Observations within the same group are correlated
Random Effects in Linear Mixed Models account for this dependency by modelling within-group variability
Accounting for nesting allows us to distinguish population-level trends from individual- or group-level deviations, leading to more
Accurate predictions
Inference
Theories
What are linear models?
Include fixed effects
Capture the relationship between predictors and outcomes
Fixed effects (betas): average effects of predictors across all individuals
eij: residual error for the j-th observation of individual i, assumed to be
independent
normally distributed
What are linear mixed effects models?
Models that extend traditional linear models by incorporating random effects to account for individual- or group-level variability
Can include only random intercept or also random slope
What are the characteristics of a random intercept in a linear mixed effects model?
Formula for LMM with solely a random intercept:
yij = b0 + u0i + b1 × xij + eij
u0i: random effect specific to individual i
Residual error varies for every individual and every observation, while the random intercept only varies across individuals but is constant across observations
Random intercept: the difference in intercepts is potentially an interesting difference (it might, for instance, signify a different baseline level in different classes), while the residual error is noise (hence why it has to vary across observations and cannot be constant; it wouldn't be random otherwise)
It stands for an individual's baseline level, which deviates from the trend (and is thus added to the intercept)
What are the characteristics of a random slope in a linear mixed effects model?
Formula for LMM with both random intercept and random slope:
yij = b0 + u0i + (b1 + u1i) × xij + eij
Random slope allows the effect of predictor X1 to vary across individuals
Which distribution are the random effects in Linear Mixed Effects Models expected to follow?
Multivariate normal distribution
bi = vector of random effects for individual or group i
N = multivariate normal distribution
0 = mean vector, assumed to be zero because the assumption is that most people do not deviate from the average trend
G = Covariance matrix
What is a multivariate normal distribution?
If multiple normally distributed variables are laid over each other
Highest point: where all variables "cross"
Outer regions are, for instance, people who might be typical on two of the variables but not on the third
What can be said about the covariance matrix G in relation to random effects in Linear Mixed Effects Models?
Covariance matrix of the random effects includes
Variance (diagonal): how much the individuals' random effects vary from individual to individual (how strongly can the intercept or the slope differ from the average for each given individual?)
Covariance of random intercept and slope: How (if at all) are those random effects correlated with each other?
G: deviations follow a structured pattern
G can include both random intercepts and random slope, allowing for flexibility in modelling individual or group level variability
σ² intercept: variance of the random intercepts
σ² slope: variance of the random slopes
σ intercept,slope: covariance between random intercept and slope
Positive: individuals with a higher than average intercept also have a higher than average slope
Their starting point is higher than average and increases/decreases "faster"
Negative: individuals with a higher than average intercept tend to have a lower than average slope
Their starting point is higher, but it grows/diminishes more slowly
Close to zero: likely no relationship between slope and intercept
Correlation between random intercept and slope helps us understand whether people with higher baseline outcomes also show greater or lesser sensitivity to predictors (random slopes)
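As a numeric illustration (all numbers hypothetical, not from any real fit), the intercept–slope correlation implied by a 2×2 covariance matrix G can be computed directly:

```python
import math

# Hypothetical 2x2 covariance matrix G of the random effects
# (intercept, slope) — every number here is made up.
var_intercept = 25.0      # sigma^2 intercept
var_slope = 4.0           # sigma^2 slope
cov_int_slope = -6.0      # sigma intercept,slope

G = [[var_intercept, cov_int_slope],
     [cov_int_slope, var_slope]]   # symmetric: G[0][1] == G[1][0]

# correlation = covariance / (SD_intercept * SD_slope)
corr = cov_int_slope / (math.sqrt(var_intercept) * math.sqrt(var_slope))
# corr = -6 / (5 * 2) = -0.6:
# higher starting points tend to go with flatter slopes
```

A correlation of −0.6 would mean people with a higher baseline tend to show a weaker effect of the predictor, which is exactly the "negative covariance" case described above.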
How can a (simple) linear model be created in R?
Linearm <- lm(formula = Reaction ~ Days, data = sleepstudy)
lm = linear model
Reaction ~ Days: Reaction predicted by Days
data = sleepstudy: used data
summary(Linearm)
How can a linear mixed effects model with a random intercept be created in R?
LMM1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
lmer = linear mixed model function (from the lme4 package)
(1 | Subject): the 1 means the intercept is allowed to vary, grouped by Subject
summary(LMM1)
How can a linear mixed effects model with a random intercept and random slope be created in R?
LMM2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(LMM2)
What is SEM/Structural Equation Modeling?
Framework that integrates
Factor Analysis: modeling relationships between observed and latent variables
Linear Regression/Path models: modeling causal relationships among variables (observed and latent)
-> encompasses measurement models (relating observed to latent variables) and structural models (relating latent to latent variables)
Factor models in SEM diagrams are represented with
ovals for latent variables
rectangles for observed variables
Why are SEM/structural equation models useful?
Allows for testing complex relationships between multiple
Dependent and independent variables
Latent and manifest variables
Major reasons for its use in psychology are:
Ability to use
multiple variables
noisy variables
observed variables
to estimate a latent variable
The use of multiple dependent variables sometimes better represents a theoretical idea than any individual indicator
The ability to include more than one type of dependent variable in the model allows for models that represent entire theories rather than small pieces
Visual representation sometimes helps to understand implications and correct problems
How should SEM/structural equation models be interpreted?
Path coefficients (arrows)
represent the strength and direction of a relationship between variables
Direct effect: the effect of one variable directly on another
Analogous to regression coefficients -> represent how much one variable changes in response to change in another
Direct effects from a latent factor to an indicator (manifest variable) are sometimes called factor loadings (as in factor analysis)
Indirect effect: the effect of one variable on another through a mediator
Covariances (unstandardized) and correlations (standardized)
Variances
Which assumptions to SEM/structural equation models make?
Linearity: relationships between variables are linear
Multivariate Normality: Residuals should be normally distributed (most near the expected value)
Independence of Residuals: residuals should not contain information about other residuals
No measurement error in predictors: assume no error in the measurement of exogenous variables (predictors) unless explicitly modeled
How do SEM/structural equation models work?
Compare observed mean and covariance matrix to model-implied mean and covariance matrix, given estimated parameters
Evaluate fit: does the model account for the observed means, variances and covariances
Typical basis of comparison: “saturated” (freely estimated) means and covariance matrix
Chi-square difference tests are typical for this, but many fit indices exist (non-significant result = good fit)
What is the general goal of SEM/Structural equation models (or also statistical models in general)?
Enough parameters to represent the relationships in the data
Relate parameters to theoretical concerns
Avoid “over-fitting” -> minimum number of parameters necessary to explain data
How do you fit a structural equation model that represents change in reaction time without any random effects in R with lavaan?
Specify the simple linear growth model (model structure without model parameters)
name <-
'i =~ 1 * Day0 + 1 * Day2 + 1 * Day4 + 1 * Day9
-> this defines the intercept; it always has the same weight of 1 for the four measured days
-> this is about the measurement occasions, not about the data recorded on those days
s =~ 0 * Day0 + 2 * Day2 + 4 * Day4 + 9 * Day9
-> the slope "grows", i.e. the effect of the slope is zero on day 0 of sleep deprivation, weighted by 2 on day 2, etc.
i ~ imean * 1
s ~ smean * 1
-> asking lavaan to also estimate the intercept and slope means
Day0 ~~ residualVar * Day0
Day2 ~~ residualVar * Day2
Day4 ~~ residualVar * Day4
Day9 ~~ residualVar * Day9
-> all days share the same residual variance, which captures whatever variability was not yet captured'
Fit the model to the data -> now adding actual data to the structure
fit_1 <- lavaan(name, data = sleepstudy)
specify structure + dataset
summary(fit_1)
What is the output of a summary of the fit between a structural equation model and data?
Model Test User Model:
Test statistic -> chi-square, higher value = worse fit
Degrees of freedom
P-Value (of Chi-Square, below .05 = bad)
Intercept and slope with p value
Variances with p values -> how much of observed variance is NOT explained by model -> lower is better
How do you create a path diagram on a fitted model?
Graph layout: specify what goes where:
graph <- matrix(c(NA, NA, NA, 's',
-> this is the first row of the visual layout with 4 cells, where s sits completely on the right
'i', NA, NA, NA,
'Day0', 'Day2', 'Day4', 'Day9'), ncol = 4, byrow = TRUE)
-> ncol = 4 means four columns (which makes sense, because each row has 4 elements)
This is a general layout with no values inserted yet
Insert actual values
graph_sem(model = fit_1, layout = graph, spacing_y = 2, variance_diameter = .3)
graph_sem = plotting function
model = fit_1 -> fitted model with structure and data, which was fitted before
layout = layout which was specified before with matrix()
spacing_y = how much vertical space
variance_diameter = size of the variance circles
How do you add a
random slope
random intercept
covariance between random intercept and random slope
Zero covariance between random intercept and slope
in a structural equation model in R (with lavaan)?
random slope: s ~~ s
random intercept: i ~~ i
covariance between random intercept and random slope: i ~~ s
Zero covariance between random intercept and slope: i ~~ 0*s
How do you compare two structural equation models with R?
lavTestLRT(fit1, fit2)
How can this result of the comparison of two structural equation models with lavTestLRT be interpreted?
fit4 has a slightly smaller AIC and BIC -> better fit
their Chisq values are quite similar, with fit3 showing a smaller (and thus somewhat better) value
Chisq diff: difference in Chisq between fit3 and fit4
Df diff: difference in degrees of freedom; more df means fewer estimated parameters -> fit4 is simpler
Pr(>Chisq) = 0.3697, which is bigger than 0.05 -> insignificant, i.e. the difference in fit between the models is not statistically significant
-> the two models are not statistically significantly different, but fit4 is simpler and might therefore be preferred
What do path coefficients represent in Structural equation models and which distinction needs to be made?
Path coefficients represent the strength and direction of relationships between variables, similar to regression coefficients (betas) in standard linear models
When predictors are uncorrelated, each path coefficient represents the direct effect of that predictor on that specific variable (DV)
e.g. the influence of one predictor (e.g. hours of sleep) on the dependent variable (e.g. concentration) is not dependent on another predictor (e.g. what someone ate)
When predictors are correlated, the path coefficients represent the unique direct effect after accounting for the covariance with the other predictor
e.g. the path sleep -> concentration represents the effect of sleep alone, while effects shared with, for instance, mood are subtracted from this path (and accounted for elsewhere)
What are variance and covariance?
Variance: the spread of a variable, i.e. how far the data strays from the expected value
√Variance = standard deviation
±1 SD around the mean covers ~68% of the data (for a normal distribution)
Covariance: how two variables change together, i.e. if one variable increases if another increases
For instance relationship between x and y
If they are uncorrelated, the covariance is 0
Correlation: standardized, takes on values between -1 and 1 (covariance ranges from negative infinity to positive infinity)
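These definitions can be checked with a small plain-Python example (the data values are made up; population formulas, i.e. dividing by n):

```python
import math

# Made-up data: y is a perfect linear function of x,
# so the correlation should come out as exactly 1.
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

var_x = sum((v - mx) ** 2 for v in x) / n   # spread of x around its mean
sd_x = math.sqrt(var_x)                     # sqrt(variance) = SD
var_y = sum((v - my) ** 2 for v in y) / n
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# standardized covariance -> always between -1 and 1
corr = cov_xy / (sd_x * math.sqrt(var_y))
```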
How does variance change based on whether predictors are correlated or uncorrelated (exemplified with the money distribution allegory)?
Experiment:
Both groups get two random amounts of money (with a mean of 100 and a SD of 10) -> normal distribution
One group draws lots twice (independent/uncorrelated)
correlation = 0, Covariance = 0
Another group draws lots once and then gets the exact same amount they have drawn a second time in the second round -> perfectly correlated/covariance
Correlation = 1.0, Covariance = 100
-> Both groups will get (approximately) the same amount of money, but the variance of the second group will be significantly bigger
How is variance calculated when predictors are correlated vs. when they are uncorrelated?
Variance = sum of (actual value − mean)² / number of included values
Variance of the sum when predictors are uncorrelated:
Variance draw 1 + Variance draw 2
-> the covariance term is 0
Variance of the sum when predictors are correlated:
Variance draw 1 + Variance draw 2 + 2 × Covariance(draw 1, draw 2)
-> additional variance (with perfect correlation, Covariance = 100 here)
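The allegory's numbers plug straight into the variance-sum formula; a minimal Python check using the stated mean-100/SD-10 setup:

```python
# Var(X1 + X2) = Var(X1) + Var(X2) + 2 * Cov(X1, X2)
# Each draw has SD 10, so variance 100 (numbers from the allegory).
var_draw = 100.0

# Group 1: two independent draws -> covariance is 0
var_uncorrelated = var_draw + var_draw + 2 * 0.0       # 200

# Group 2: the second "draw" is an exact copy of the first,
# so Cov(X1, X2) = Var(X1) = 100 (perfect correlation)
var_correlated = var_draw + var_draw + 2 * var_draw    # 400
```

Both groups end up with the same expected total (200), but the perfectly correlated group's total varies twice as much, exactly as the allegory claims.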
What do the single components of this SEM represent?
Triangles
on top: intercept/starting point
Triangle on the side: weight of other influences, here how much money is already in the wallet
Circles:
Predictors, here two drawings of money
have their respective variance on the side
Path coefficients
100 = expected mean value
b1 = 1.0 -> weight of the path coefficient, similar to betas
Dashed line: in some cases covariance, in others no covariance
Square: Result, dependent variable
Resvar = residual variance, unexplained
How would the total variance of y be calculated based on this model?
var(y) = var(x1) × b1² + var(x2) × b2² + 2 × cov(x1, x2) × b1 × b2 + resvar(y)
= 100 × 1² + 100 × 1² + 2 × cov (either 0 or 100) × 1 × 1 + resvar(y)
How would the expected mean of y be calculated based on this model?
E(y) = 1 × expected mean of x1 + 1 × expected mean of x2 + 1 × path coefficient 50 (baseline amount)
= 1 × 100 + 1 × 100 + 1 × 50 = 250
What is RAM notation and what do the single components of RAM notation mean?
Compact way of expressing the different SEM relations, used in some software
I = identity matrix, diagonal of 1
tells you simply how many variables there are
A = Asymmetric (direct path) matrix
shows how strong effects are
Asymmetric because it might show the effect of x on y but not vice versa
S = Symmetric Matrix (non-direct paths/covariance)
shows covariance -> symmetric, because variables move together
Also shows variances and residual variances
M = Means Vector
average means of variables
What is the model implied covariance and how is it calculated in RAM?
Model implied covariance: how all variables move together
i.e. if you’re connected by a rope and someone pulls, how much does everyone move?
-> model implied means: where does everyone land after the pull?
normally model implied covariance calculated as follows:
(I − A)^-1 × S × (I − A)^-T
= inverse of (identity matrix minus asymmetric matrix), which accumulates the one-directional effects/betas including indirect paths, × symmetric matrix (covariances) × its transpose
Takes into account direct effects, indirect effects, and covariances; the transpose ensures the resulting covariance matrix is symmetric
Combination of direct, indirect and mutual influence between variables
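A minimal Python sketch of the RAM covariance formula for a hypothetical two-variable model with a single path y <- b·x (all numbers made up, no filter matrix needed since both variables are observed):

```python
# Variables in order [x, y]; one direct path y <- b*x.
b = 2.0
var_x, resvar_y = 1.5, 0.5

I = [[1.0, 0.0], [0.0, 1.0]]     # identity: which variables exist
A = [[0.0, 0.0], [b, 0.0]]       # A[i][j]: direct effect of variable j on i
S = [[var_x, 0.0], [0.0, resvar_y]]  # (residual) variances, covariances

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):  # inverse of a 2x2 matrix
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

IA = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
B = inv2(IA)                      # (I - A)^-1: total (direct + indirect) effects
BT = [[B[j][i] for j in range(2)] for i in range(2)]  # its transpose

implied_cov = matmul(matmul(B, S), BT)
# implied var(y) = b^2 * var(x) + resvar(y);  implied cov(x, y) = b * var(x)
```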
What is the model implied mean and how is it calculated in RAM?
i.e. if you’re connected by a rope and someone pulls, where does everyone land?
-> model implied covariance: i.e. if someone pulls, how much does everyone move?
RAM calculation
(I − A)^-1 × M
(I − A)^-1 takes into account direct and indirect relationships
M = basic means; the result adjusts the average values of variables based on how they influence each other
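Continuing the rope analogy, a small Python check of (I − A)^-1 × M for a hypothetical one-path model y <- b·x (numbers made up):

```python
# Variables in order [x, y]; one direct path y <- b*x.
b = 2.0
M = [10.0, 1.0]   # basic means: mean of x, intercept of y

# For this lower-triangular one-path model,
# (I - A)^-1 works out to [[1, 0], [b, 1]].
inv_IA = [[1.0, 0.0], [b, 1.0]]

implied_means = [sum(inv_IA[i][k] * M[k] for k in range(2))
                 for i in range(2)]
# implied mean of y = b * mean(x) + intercept of y = 2*10 + 1 = 21
```

The "pull" on x (its mean of 10) is transmitted along the path to y, which is why y's implied mean is not just its own intercept.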
What needs to be done to make different measurement types (i.e. in latent variables) equivalent? What are the implications for latent variables in psychology?
To make things equivalent, we might need to allow for differences in
Factor loading
Measurement specific intercepts
With latent variables (and thus often in psychological measurements), we do not know the intercepts and the factor loadings, therefore we have to estimate them
What is an important step to take before interpreting estimated model parameters and why is it relevant?
Before interpreting estimated model parameters, it is important to ensure that the model which has been fit to the data provides a good representation of the underlying relationships in the data
i.e. the interpretation will be faulty if certain covariances are not represented in the model
What is model fit?
When a model which has been fit to data provides a good representation of the underlying relationship
What happens when data is fit to a model which does not adequately represent the relationship between the variables (i.e. bad model fit)?
Model is likely to make poor predictions regarding future data
The estimated parameters may misinform us about relations between psychological constructs
Extreme examples: Assuming independence between different extraversion measures
When is the least squares approach a good approach to find the best model fit and when does it make sense to use another statistical approach?
LS good choice for estimating the parameters of linear regression models
Residual variation is assumed to be the same across all data points -> makes sense to use least squares, as this approach aims to find estimates which minimise the sum of squared residuals
Not a very good choice for SEM
There are different dependent variables, hence also often different residual variances
100 residual variance for one variable might be not so much, while for another variable a residual variance of 0.2 might be a lot
LS would however focus on minimising the RV of the variable with variance 100 because it is bigger (neglecting how this changes the RV of the 0.2 variable) -> gets the relative importance of the residuals very wrong
-> Some means is needed to weight the residuals while accounting for their relative importance (i.e. how large a prediction error they represent) -> likelihood approach
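A small Python sketch (illustrative only, not the SEM machinery itself) of why the likelihood weights a same-sized residual differently depending on the residual variance:

```python
import math

# Normal log-density of a residual r given residual variance v —
# this is the quantity likelihood-based fitting evaluates per observation.
def norm_loglik(r, v):
    return -0.5 * (math.log(2 * math.pi * v) + r * r / v)

# Drop in log-likelihood caused by a residual of 1.0, relative to a
# perfect prediction (residual 0), for two residual variances:
drop_large_var = norm_loglik(0.0, 100.0) - norm_loglik(1.0, 100.0)  # 0.005
drop_small_var = norm_loglik(0.0, 0.2) - norm_loglik(1.0, 0.2)      # 2.5

# Least squares would treat both residuals as equally bad (both have
# squared error 1.0); the likelihood penalizes the miss on the
# small-variance variable 500 times more heavily.
```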
What defines the likelihood of a row of data in typical SEM modelling? What makes for a better likelihood?
The multivariate normal distribution with an expected mean and covariance matrix
The closer the observed values are to the model-implied means and covariances, the higher the likelihood