What are the five steps of theory construction (methodology)?
Identifying a relevant phenomenon
Should be generalizations which fulfill the following criteria:
robust
stable
reproducible
Finding a good, stable phenomenon is very important, since it is the basis of the theory
Better to choose a boring but stable phenomenon than a spectacular one
Observation: People go to sleep when it gets dark
Formulate a prototheory
Done with abductive reasoning
Based on a small set of generally valid principles which could explain the phenomenon
Normally represented verbally at this step
Idea: darkness makes people tired
Develop a formal model
The generally valid principles are translated into rules or equations
Not the same as a data model
Should help transcend cognitive limitations
The longer the darkness lasts, the more tired people get
Check the adequacy of the formal model
Does it work? If it doesn’t, go get more data
Evaluate the overall worth of the constructed theory
How are latent growth models and general structural equation models (SEM) different?
Latent growth models are part of SEM BUT
LG models specifically model changes over different time points
SEM more generally models relationships between observed and latent (or latent and latent, or observed and observed) variables, which can be causal paths or factor loadings
What are intercepts, broadly speaking, and what is their meaning in
Regression Models
Latent Growth Models
General SEM Models
Defining features of intercepts across all models
General terms
Reflects the constant value which is added to a variable
Provides a reference point for understanding the baseline or average level of a variable when the other predictors in the model are zero or have no effect
Intercepts in regression Models
b0
Predicted value of the dependent variable if all predictors/independent variables (x1, x2, etc.) are zero
Intercepts in latent Growth Models
Represents the initial level or starting point of an individual's trajectory over time
Captures individual differences in initial levels across participants when random variation is allowed
There might also be an intercept for some or all observed variables, representing their deviation above or below the implied latent trajectory (i.e. the latent intercept)
Intercepts in general SEM Models
Each observed variable has an intercept, representing its expected value when all predictors (direct paths towards it) are zero
Intercepts contribute to the mean structure in SEM, enabling it to represent non-zero means across variables
Example in CFA model on personality traits:
Intercepts represent average item response or latent trait means
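A minimal lavaan sketch of how intercepts/means can be requested (hypothetical indicators x1-x3 and data frame dat; names are only illustrative):
library(lavaan)
cfa_model <- '
f =~ x1 + x2 + x3     # measurement model: latent factor f with three indicators
x1 ~ 1                # intercept of observed variable x1
x2 ~ 1
x3 ~ 1
'
fit <- cfa(cfa_model, data = dat, meanstructure = TRUE)   # meanstructure adds the mean structure
summary(fit)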
How are theory, data and phenomena related to each other?
Theory - phenomena
Theory explains/predicts phenomena
which is made visible by data
Theory is abducted from phenomena
If the world is how theory A says it is, then phenomenon A must be true
Phenomena - data
Data is generalized into a phenomenon
A phenomenon predicts (a specific pattern in) data
Data offers evidence for the existence of phenomena
Explain the differences between theories and models.
Theory
Conceptual framework
Explains or predicts phenomena
Model
Simplified, formal representation of a specific relationship
Derived from theory, but NOT a theory itself
Tests, explores and refines theories
Provides a mathematical/empirical framework to operationalize theoretical constructs
Why should we formalize theories as statistical computational models?
Models clarify the understanding of complex phenomena
Theories can be vague and imprecise
Models translate those conceptual frameworks into mathematical or computational form, forcing
precision
clarity
Benefit
clarifies assumptions
allows objective scrutiny
Helps identify which parts of a theory are supported by data
Models facilitate iterative theory development
Models serve as a tool to
test theories
explore theories
Refine theories
When model predictions diverge from empirical results, it drives the refinement/rejection of theoretical assumptions
Benefit: promotes continuous improvement and understanding of the world
Models make testable predictions
Theories are generalized statements about relationships
Models allow us to make quantitative predictions
This offers testable and falsifiable predictions
Models quantify how much one variable affects another
Benefits: offer a framework to test hypotheses and validated theoretical claims with empirical data
Models allow generalization and prediction of future outcomes
Models use historical data to predict future observations
Benefit: useful for forecasting in both research and applied settings
Models inform intervention and policy decisions
Theories suggest how variables should be related
Models can simulate the effects of interventions and predict how those changes will affect outcomes
Benefit: help policymakers, clinicians or researchers evaluate the impact of interventions before implementing them -> better informed decisions
Explain the differences between predictive and causal models
Predictive models
Predict outcomes in similar contexts with high accuracy without needing to understand causality (A and B often occur together; why they occur together is not relevant)
Enough for pure prediction
Problematic when it comes to interventions
Causal models
Generalize predictions to new contexts and outcomes
Causal drivers make for more effective interventions
What is data?
Observations and measurements from the real world
Starting point of most theories
What is model structure?
Mathematical framework/equation which broadly describes how observations are related
Example: simple regression model with
y = b0 + b1 x Temperature + e
no specific values for the parameters yet, but there is a slope, a baseline etc.
General framework of how observations are related
What are model parameters?
The values of the parameters which best explain the data are estimated (i.e. how large is the slope numerically?)
Model fit procedures adjust the parameters to minimize the difference between model predictions and actual data
Criterion: typically done by minimizing an objective function like the sum of squared errors (SSE) or maximizing the likelihood
SSE = sum((observed - predicted)^2)
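A minimal R sketch of these components, assuming a hypothetical data frame d with columns sales and Temperature (names are only illustrative):
fit <- lm(sales ~ Temperature, data = d)   # model structure: b0 + b1 x Temperature + e
coef(fit)                                  # model parameters: estimated b0 and b1
sse <- sum(residuals(fit)^2)               # SSE = sum((observed - predicted)^2)
sse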
What are the key components of statistical models?
Data
Model structure
Model parameters
What are nested observations?
Grouped observations (couples, families, school classes, repeated measurements within an individual)
Ignoring nesting means assuming independence between observations, leading to:
Biased standard errors (Type I and Type II errors)
Inaccurate effect estimation
Observations within the same group are correlated
Random effects in Linear Mixed Models account for this dependency by modelling within-group variability
Accounting for nesting allows us to distinguish population-level trends from individual- or group-level deviations, leading to more accurate
predictions
inference
theories
What are linear models?
Include fixed effects
Capture the relationship between predictors and outcomes
Fixed effects (betas): average effects of predictors across all individuals
eij: residual error for the j-th observation of individual i, assumed to be
independent
normally distributed
What are linear mixed effects models?
Models which extend traditional linear models by incorporating random effects to account for individual- or group-level variability
Can include only a random intercept, or also a random slope
What are the characteristics of a random intercept in a linear mixed effects model? How is it different from the residual error?
Formula for LMM with solely a random intercept: yij = b0 + b1 x xij + u0i + eij
u0i: random effect specific to individual i
The residual error varies for every individual and every observation, while the random intercept only varies across individuals but is constant across observations
Random intercept: the difference in intercept is potentially an interesting difference (it might, for instance, signify a different baseline level in different classes), while the residual error is noise (hence it has to vary across observations and cannot be constant; it would not be random otherwise)
It stands for an individual's baseline level, which deviates from the overall trend (and is thus added to the fixed intercept)
What are the characteristics of a random slope in a linear mixed effects model?
Formula for LMM with both random intercept and random slope: yij = b0 + b1 x xij + u0i + u1i x xij + eij
The random slope allows the effect of predictor X1 to vary across individuals
Which distribution are the random effects in Linear Mixed Effects Models expected to follow?
Multivariate normal distribution: bi ~ N(0, G)
bi = vector of random effects for individual or group i
N = multivariate normal distribution
0 = mean vector, assumed to be zero because the assumption is that most people do not deviate from the average trend
G = covariance matrix
What is a multivariate normal distribution?
The joint distribution that results when multiple normally distributed variables are laid over each other
Highest point: where the means of all variables coincide
Outer regions: for instance people who are close to the mean on two variables, but not on the third
What can be said about the covariance matrix G in relation to random effects in Linear Mixed Effects Models?
Covariance matrix of the random effects includes
Variance (diagonal): how much the random effects vary from individual to individual (how strongly can the intercept or the slope differ from the average for any given individual?)
Covariance of random intercept and slope: how (if at all) are those random effects correlated with each other?
G: deviations from the average trend follow a structured pattern
G can include both random intercepts and random slopes, allowing for flexibility in modelling individual- or group-level variability
o^2 intercept: variance of the random intercept
o^2 slope: variance of the random slope
o intercept,slope: covariance between random intercept and slope
Positive: individuals with a higher than average intercept also have a higher than average slope
Their starting point is higher than average and increases/decreases “faster”
Negative: individuals with a higher than average intercept tend to have a lower than average slope
their starting point is higher, but it grows/diminishes more slowly
Close to zero: likely no relationship between slope and intercept
Correlation between random intercept and slope helps us understand whether people with higher baseline outcomes also show greater or lesser sensitivity to predictors (random slopes)
How can a (simple) linear model be created in R?
Linearm <- lm(formula = Reaction ~ Days, data = sleepstudy)
lm = linear model
Reaction ~ Days: Reaction predicted by Days
data = sleepstudy: the data set used
summary(Linearm)
How can a linear mixed effects model with a random intercept be created in R?
LMM1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
lmer = linear mixed model function (from the lme4 package)
(1 | Subject): the intercept (1) is allowed to vary, grouped by Subject
summary(LMM1)
How can a linear mixed effects model with a random intercept and random slope be created in R?
LMM2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(LMM2)
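A possible usage sketch for comparing the two fits above: anova() on lmer fits refits them with maximum likelihood and runs a likelihood ratio test.
library(lme4)
anova(LMM1, LMM2)   # random intercept vs. random intercept + slope; also reports AIC and BIC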
What are SEM/structural equation models?
Framework that integrates
Factor analysis: modeling relationships between observed and latent variables
Linear regression/path models: modeling causal relationships among variables (observed and latent)
-> encompasses measurement models (relating observed to latent variables) and structural models (relating latent to latent variables)
Factor models in SEM are represented with
ovals for latent variables
rectangles for observed variables
Why are SEM/structural equation models useful?
Allows for testing complex relationships between multiple
Dependent and independent variables
Latent and manifest variables
Major reasons for its use in psychology are:
Ability to use
multiple variables
noisy variables
observed variables
to estimate a latent variable
The use of multiple dependent variables sometimes better represents a theoretical idea than any individual indicator
The ability to include more than one type of dependent variable in the model allows for models that represent entire theories rather than small pieces
Visual representation sometimes helps to understand implications and correct problems
How should SEM/structural equation models be interpreted?
Path coefficients (arrows)
represent the strength and direction of a relationship between variables
Direct effect: the effect of one variable directly on another
Analogous to regression coefficients -> represent how much one variable changes in response to a change in another
Direct effects from a latent factor to an indicator (manifest variable) are sometimes called factor loadings (as in factor analysis)
Indirect effect: the effect of one variable on another through a mediator
Covariances (unstandardized) and correlations (standardized)
Variances
Which assumptions do SEM/structural equation models make?
Linearity: relationships between variables are linear
Multivariate Normality: Residuals should be normally distributed (most near the expected value)
Independence of Residuals: residuals should not contain information about other residuals
No measurement error in predictors: assume no error in the measurement of exogenous variables (predictors) unless explicitly modeled
How do SEM/structural equation models work?
Compare observed mean and covariance matrix to model-implied mean and covariance matrix, given estimated parameters
Evaluate fit: does the model account for the observed means, variances and covariances
Typical basis of comparison: “saturated” (freely estimated) means and covariance matrix
Chi-square difference tests are typical for this, but many fit indices exist (a non-significant result = good fit)
What is the general goal of SEM/structural equation models (or of statistical models in general)?
Enough parameters to represent the relationships in the data
Relate parameters to theoretical concerns
Avoid “over-fitting” -> minimum number of parameters necessary to explain data
How do you fit a structural equation model that represents change in reaction time without any random effects in R with lavaan?
Specify the simple linear growth model (model structure without model parameters)
name <-
'i =~ 1*Day0 + 1*Day2 + 1*Day4 + 1*Day9
-> this defines the intercept; it always loads with the same weight (1) on each of the four measurement days
-> this is about the time points, not about the data collected on those days
s =~ 0*Day0 + 2*Day2 + 4*Day4 + 9*Day9
-> the slope "grows", i.e. the effect of the slope on day 0 of sleep deprivation is none, on day 2 it is multiplied by two, etc.
i ~ imean * 1
s ~ smean * 1
-> asking lavaan to also estimate the means of the intercept and slope (labelled imean and smean)
Day0 ~~ residualVar * Day0
Day2 ~~ residualVar * Day2
Day4 ~~ residualVar * Day4
Day9 ~~ residualVar * Day9'
-> all observed variables share the same residual variance (label residualVar), which captures whatever variability has not yet been captured
Fit the model to the data -> now adding actual data to the structure
fit_1 <- lavaan(name, data = sleepstudy)
specify structure + dataset
summary(fit_1)
What is the output of a summary of the fit between a structural equation model and data?
Model Test User Model:
Test statistic -> chi-square, higher value = worse fit
Degrees of freedom
p-value (of the chi-square; below .05 = bad fit)
Intercept and slope with p value
Variances with p values -> how much of observed variance is NOT explained by model -> lower is better
How do you create a path diagram on a fitted model?
Graph layout: specify what goes where:
graph <- matrix(c(NA, NA, NA, 's',
'i', NA, NA, NA,
'Day0', 'Day2', 'Day4', 'Day9'), ncol = 4, byrow = TRUE)
-> the first row is the visual representation of a row with 4 spaces, where s sits completely on the right
-> ncol = 4 means four columns (which makes sense, because each row has 4 elements)
This is a general layout with no values inserted yet
Insert actual values
graph_sem(model = fit_1, layout = graph, spacing_y = 2, variance_diameter = .3)
graph_sem = the plotting function (from the tidySEM package)
model = fit_1 -> the fitted model (structure + data) which was fitted before
layout = the layout which was specified before with matrix()
spacing_y = how much space vertically
variance_diameter = size of the variance circles
How do you add a
random slope
random intercept
covariance between random intercept and random slope
Zero covariance between random intercept and slope
in a structural equation model in R (with lavaan)?
random slope: s ~~ s
random intercept: i ~~ i
covariance between random intercept and random slope: i ~~ s
Zero covariance between random intercept and slope: i ~~ 0*s
How do you compare two structural equation models with R?
lavTestLRT(fit1, fit2)
How can this result of the comparison of two structural equation models with lavTestLRT be interpreted?
Model 4 has a slightly smaller AIC and BIC -> better expected fit
Their chi-square values are quite similar, with model 3 showing a smaller (and thus somewhat better) value
Chisq diff: difference in chi-square between models 3 and 4
Df diff: difference in degrees of freedom; more df means a simpler model -> fit4 is simpler
Pr(>Chisq) = 0.3697, which is bigger than 0.05 -> not significant, i.e. the difference in fit between the models is not statistically significant
-> the two models are not statistically significantly different, but model 4 is simpler and might therefore be preferred
What do path coefficients represent in structural equation models and which distinction needs to be made?
Path coefficients represent the strength and direction of relationships between variables, similar to regression coefficients (betas) in standard linear models
When predictors are uncorrelated, each path coefficient represents the direct effect of that predictor on that specific variable (DV)
i.e. the influence of one predictor (e.g. hours of sleep) on the dependent variable (e.g. concentration) does not depend on another predictor (e.g. what someone ate)
When predictors are correlated, the path coefficients represent the unique direct effect after accounting for the covariance with the other predictor
i.e. the path sleep -> concentration represents the effect of sleep alone, while the effects shared with, for instance, mood are removed from this path (and accounted for elsewhere)
What are variance and covariance?
Variance: the spread of a variable, i.e. how far the data strays from the expected value
sqrt(Variance) = standard deviation
±1 SD around the mean covers about 68% of the data (for a normal distribution)
Covariance: how two variables change together, i.e. whether one variable increases when another increases
For instance the relationship between x and y
If they are uncorrelated, the covariance is 0
Correlation: standardized, takes on values between -1 and 1 (covariance can range from negative to positive infinity)
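A small R illustration of these quantities, using made-up vectors x and y:
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
var(x)       # variance of x
sd(x)^2      # the same value: SD is the square root of the variance
cov(x, y)    # unstandardized: how x and y change together
cor(x, y)    # standardized covariance, always between -1 and 1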
How does variance change based on whether predictors are correlated or uncorrelated (exemplified by the money distribution allegory)?
Experiment:
Both groups get two random amounts of money (with a mean of 100 and an SD of 10) -> normal distribution
One group draws lots twice (independent/uncorrelated)
correlation = 0, covariance = 0
Another group draws lots once and then gets the exact same amount they drew the first time again in the second round -> perfectly correlated
Correlation = 1.0, covariance = 100
-> Both groups will get (approximately) the same amount of money on average, but the variance of the total in the second group will be significantly bigger
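A quick simulation sketch of the allegory (made-up seed and sample size; draws with mean 100 and SD 10, so each draw has variance 100):
set.seed(1)
n <- 100000
draw1 <- rnorm(n, mean = 100, sd = 10)
draw2_indep <- rnorm(n, mean = 100, sd = 10)   # group 1: second draw independent of the first
total_indep <- draw1 + draw2_indep
total_corr  <- draw1 + draw1                   # group 2: the same amount a second time (correlation = 1)
mean(total_indep); mean(total_corr)            # both around 200
var(total_indep)                               # around 100 + 100 = 200
var(total_corr)                                # around 100 + 100 + 2*100 = 400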
How is the variance of a sum calculated when predictors are correlated vs. when they are uncorrelated?
Variance of a single variable: sum of (actual value - mean)^2 / number of included values
Variance of a sum, in general:
Var(draw 1 + draw 2) = Variance of draw 1 + Variance of draw 2 + 2 x Covariance(draw 1, draw 2)
-> if the draws are uncorrelated, the covariance is 0, so the total variance is just the sum of the two variances
-> if the draws are correlated (e.g. perfectly, as in the second group), the 2 x Covariance term is added
-> additional variance
What do the single components of this SEM represent?
Triangles
on top: intercept/starting point
Triangle on side: weight of other things, i.e. here how much money is in the wallet already
Circles:
Predictors, here two drawings of money
have their respective variance on the side
Path coefficients
100 = expected mean value
b1 = 1.0 -> weight of the path coefficient, similar to a beta
Dashed line: in some cases covariance, in others no covariance
Square: result, dependent variable
Resvar = residual variance, unexplained
How would the total variance of y be calculated based on this model?
Var(y) = var(x1) x b1^2 + var(x2) x b2^2 + 2 x cov(x1, x2) x b1 x b2 + resvar(y)
= 100 x 1^2 + 100 x 1^2 + 2 x cov (either 0 or 100) x 1 x 1 + resvar(y)
How would the expected mean of y be calculated based on this model?
b1 x expected mean of x1 + b2 x expected mean of x2 + 1 x 50 (the baseline amount from the constant)
= 1 x 100 + 1 x 100 + 50 = 250
What is RAM notation and what do the single components of RAM notation mean?
Compact way of expressing the different SEM relations, used in some software
I = identity matrix, diagonal of 1
tells you simply how many variables there are
A = Asymmetric (direct path) matrix
shows how strong the direct effects are
Asymmetric because it might show the effect of x on y but not vice versa
S = Symmetric Matrix (non-direct paths/covariance)
shows covariance -> symmetric, because variables move together
Also shows variance and restvariance
M = Means Vector
average means of variables
What is the model implied covariance and how is it calculated in RAM?
Model implied covariance: how all variables move together
i.e. if you’re connected by a rope and someone pulls, how much does everyone move?
-> model implied means: where does everyone land after the pull?
normally the model implied covariance is calculated as follows:
(I - A)^-1 x S x (I - A)^-T
= the inverse of (identity matrix minus the asymmetric matrix of one-directional effects/betas), times the symmetric matrix (variances and covariances), times the transpose of that inverse
Takes into account direct effects, indirect effects (through the inverse) and covariances
Combination of direct, indirect and mutual influence between variables
What is the model implied mean and how is it calculated in RAM?
i.e. if you're connected by a rope and someone pulls, where does everyone land?
-> model implied covariance: i.e. if someone pulls, how much does everyone move?
RAM calculation
(I - A)^-1 x M
(I - A)^-1 takes into account direct and indirect relationships
M = vector of basic means/intercepts -> the result adjusts the average values of the variables based on how they influence each other
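A small R sketch of both RAM formulas, using made-up matrices for two variables (variable 1 predicts variable 2 with a path of 0.5; all numbers are only illustrative):
A <- matrix(c(0,   0,
              0.5, 0), nrow = 2, byrow = TRUE)   # asymmetric paths: row 2, column 1 = path from variable 1 to variable 2
S <- matrix(c(1, 0,
              0, 0.75), nrow = 2, byrow = TRUE)  # variance of variable 1, residual variance of variable 2
M <- c(0, 2)                                     # means/intercepts
I <- diag(2)                                     # identity matrix
IAinv <- solve(I - A)                            # (I - A)^-1
implied_cov  <- IAinv %*% S %*% t(IAinv)         # model implied covariance: (I - A)^-1 x S x (I - A)^-T
implied_mean <- IAinv %*% M                      # model implied means: (I - A)^-1 x M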
What needs to be done to make different measurement types (i.e. in latent variables) equivalent? What are the implications for latent variables in psychology?
To make things equivalent, we might need to allow for differences in
Factor loading
Measurement specific intercepts
With latent variables (and thus often in psychological measurement), we do not know the intercepts and the factor loadings, therefore we have to estimate them
What is an important step to take before interpreting estimated model parameters and why is it relevant?
Before interpreting estimated model parameters, it is important to ensure that the model which has been fit to the data provides a good representation of the underlying relationships in the data
i.e. interpretation will be faulty if certain covariances are not represented in the model
What is model fit?
When a model which has been fit to data provides a good representation of the underlying relationship
What happens when data is fit to a model which does not adequately represent the relationship between the variables (i.e. bad model fit)?
Model is likely to make poor predictions regarding future data
The estimated parameters may misinform us about relations between psychological constructs
Extreme examples: Assuming independence between different extraversion measures
When is the least squares approach a good approach to find the best model fit and when does it make sense to use another statistical approach?
LS good choice for estimating the parameters of linear regression models
Residual variation is assumed to be the same across all data points -> it makes sense to use least squares, as this approach aims to find estimates which minimise the sum of squared residuals
Not a very good choice for SEM
There are different dependent variables, hence also often different residual variances
A residual variance of 100 for one variable might not be much, while for another variable a residual variance of 0.2 might be a lot
LS would however focus on minimising the RV of the variable with residual variance 100 because it is bigger (neglecting how this changes the RV of the 0.2 variable) -> it gets the relative importance of the residuals very wrong
-> some means is needed to weight the residuals while accounting for their relative importance (i.e. how large a prediction error they represent) -> the likelihood approach
What defines the likelihood of a row of data in typical SEM modelling? What makes for a better likelihood?
The multivariate normal distribution with an expected mean and covariance matrix
A higher likelihood results when an observation
is close to the expected mean
has a small variance
has a small covariance
What can be said about the connection between likelihood and log?
In SEM, for numerical reasons we normally work with the log likelihood of all the data we are fitting
This is done by taking the sum of the log likelihoods of each row of data
= sum of the log of the multivariate normal density of each set of observations y, given the expected mean and expected covariance matrix
In SEM modelling it has become convention to instead use two times the negative log likelihood (-2LL) and to minimise it rather than maximise it
What are nested models? Which model comparison can be used for nested models?
Nested models: models that can be obtained from a more complex model by constraining some of its parameters -> simpler
i.e. setting a factor loading or covariance to zero
removing a predictor
-> fewer free parameters which have to be estimated
Also called a null model or restricted model
Alternative model: more complex model which includes additional free parameters
also called unrestricted model
-> if the more complex model is not a significantly better fit, the null model is kept because otherwise there is a risk of overfitting the data
Test used for comparison: the Likelihood Ratio Test (LRT)
What does the likelihood test ratio do?
Test used to compare null model and more complex model
The test compares the log likelihoods of the two models
LLunrestricted = log likelihood of the more complex/unrestricted model
LLrestricted = log likelihood of the simpler (restricted)/null model
The likelihood ratio test statistic (chi^2) is given by
chi^2 = -2 x (LLrestricted - LLunrestricted)
-> always results in a positive value, since LLrestricted < LLunrestricted (the negative difference is then multiplied by -2)
The statistic follows a chi-square distribution with degrees of freedom equal to the difference between the number of free parameters between the two models
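A small R sketch of this computation by hand, assuming two already fitted nested models fit_restricted and fit_unrestricted (hypothetical names; logLik() works for lavaan and lmer fits):
LL_r <- as.numeric(logLik(fit_restricted))      # log likelihood of the simpler model
LL_u <- as.numeric(logLik(fit_unrestricted))    # log likelihood of the more complex model
chisq <- -2 * (LL_r - LL_u)                     # likelihood ratio test statistic
df_diff <- attr(logLik(fit_unrestricted), "df") - attr(logLik(fit_restricted), "df")
pchisq(chisq, df = df_diff, lower.tail = FALSE) # p-value; in practice lavTestLRT()/anova() do this for you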
What is a chi^2 distribution?
Describes differences between the expected frequency (of an event) and the real, observed frequency
Answers the question of whether such a difference between expected and observed frequency is still likely by chance
How should the likelihood ratio test be interpreted?
If the test statistic is significant (p-value below a specified threshold like 0.05), it suggests that the difference in log likelihood is unlikely to be due to random sampling variation
-> the more complex model provides a significantly better fit to the data than the simple model
If the test statistic is not significant, it means that the simpler model fits the data just as well as the more complex model, thus the simpler model should be preferred
The more complex model might still fit the data better, but not enough to justify the inclusion of the extra parameters
The extra parameters are also more likely to be overfitting the data and thus worsening the performance when predicting new data and inferring relations
What simple way exists to check whether a model actually fits the data? How can this be implemented in R?
Comparing
the model implied means with the actual means
the model implied covariance matrix and the actual covariance matrix (variance and covariance)
In R
inspectSampleCov(model_1, big5)
= sample covariance & means matrix of actual data
OR
lavInspect(fit_piqDuration, "sampstat")
lavInspect(fit_1, what = 'exp')
expected (model-implied) covariance and mean matrix
lavInspect(fit_1, what = 'res')
-> output is the difference between observed and expected data -> residual covariances, means and variances
residuals(fit_piqDuration, type = 'cor')
BUT: this does not tell us if those differences are significant
Could be checked by adding an additional parameter and conducting a likelihood ratio test (thus comparing the two models)
When does standardization of data make sense?
Sometimes working with parameter estimates based on the raw units of measurement makes more sense, especially if the unit of measurement has obvious or well-understood implications (things like age, time, temperature, etc.)
"increase in the number of words known per year of age in children"
For psychological variables there is often no such implicit meaning of the measurement scales used - a 5 on extraversion does not mean anything without further context, and extraversion might have a scale of 0 to 5 in one measurement and a scale of 0 to 100 in another
-> in such cases it is much easier to interpret standardized estimates
Besides fixing the latent variance to 1 and scaling the observed variables so they have a variance of 1, there is an easier solution -> calculate the appropriate standardized form of the estimates based on the unstandardized estimates
What are standardized factor loadings and how are they calculated?
Raw factor loadings (Lambda): how much an observed variable changes for a one-unit change in the latent variable
If the observations have different scales (i.e. one observation measured in kg, another in meters etc.) the raw factor loadings are difficult to compare (is an increase by 5 kg more than an increase by 5 meter?) -> solution: standardization
Calculating a standardized factor loading:
Lambda_std = sqrt(o^2_explained / o^2_x)
o^2_explained = the variance of the observed variable which is explained by the latent factor
o^2_x = total variance of the observed variable
If the variables are already standardized (which is often the case in SEM), i.e. the total variance is set to 1, this results in
Lambda_std = sqrt((1 - resvar(observed variable)) / 1) = sqrt(explained variance) = sqrt(1 - resvar)
The standardized factor loading is equivalent to the correlation between the latent factor and the observed variable
interpretation: if the latent factor increases by one standard deviation, the observed variable increases by the standardized factor loading
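A short worked example with made-up numbers: if the observed variable is standardized (total variance 1) and its residual variance is 0.36, then Lambda_std = sqrt(1 - 0.36) = sqrt(0.64) = 0.8, i.e. a one-SD increase in the latent factor goes along with a 0.8-SD increase in the observed variable.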
What can be said about standardized variance in SEM?
In a standardized solution, all variances are expressed with respect to a total variance for each variable (latent and observed) of 1.00
Latent factor variance:
The variance of the latent factor is set to 1 in the standardized solution
therefore the latent factor is interpreted as a standardized latent construct
Residual variance: the residual variance is also standardized and represents the proportion of variance in an observed variable that is not explained by the latent factor
= 1 - Lambda_std^2, since Lambda_std^2 represents the proportion of explained variance
How can covariances between latent factors or observed variables be standardized to correlations?
pXY = Cov(X, Y) / (oX x oY)
oX = standard deviation of variable X
Cov(X, Y) -> assuming there is one latent factor Y connected to two observed variables X1 and X2:
Cov(X1, X2) = Variance(Y) x factor loading (Y -> X1) x factor loading (Y -> X2)
What are the benefits of standardization?
Interpretability: standardized estimates are easier to interpret because they express relationships as proportions of variance or correlations -> unit-free and comparable across variables
Comparison: Direct comparison of
factor loadings
correlations
variances
across different
models
variables
studies
Even if they use different measurement scales!
In software like lavaan in R, standardized estimates are often included in the output by using options like standardized = TRUE or by requesting standardized solutions in model summaries
How is the covariance between two observed variables calculated in SEM?
Assuming there is one latent factor Y connected to two observed variables X1 and X2:
Cov(X1, X2) = factor loading (Y -> X1) x factor loading (Y -> X2) x Variance(Y)
How can two (fitted) models be compared and what is the output thereof?
input: anova(model_1, model_2)
Output: AIC, BIC, ChiSq
How would you interpret those model-comparison outputs:
Slope model has
Smaller AIC
Smaller BIC
Additionally, the more complex (slope) model has a significant chi^2 value (p < .05), indicating it is a better fit
Without the direct model comparison, the amount of explained variance also indicated a better fit for the slope model
Fit a model in r for a structural equation model representing the relationship between two latent factors (linear regression). What would be needed for a random intercept and random slope?
Create model (model structure):
model1 <- '
i =~ 1*time0 + 1*time1 + 1*time2 + 1*time3
s =~ 0*time0 + 1*time1 + 2*time2 + 3*time3
i ~ imean*1
s ~ smean*1
time0 ~~ residualVar * time0
time1 ~~ residualVar * time1
time2 ~~ residualVar * time2
time3 ~~ residualVar * time3
i ~~ ivar * i
s ~~ svar * s
i ~~ covsi * s -> allows for covariance between intercept and slope
'
Fit to data/parameters:
fit_model <- lavaan(model = model1, data = modeldata)
Represent data
summary(fit_model)
What is the formula for calculating the expected variance of an observed variable in SEM?
Observed variable: A
Latent variables influencing A: B, C
Formula
Var(A) = Residual variance of A
+ Variance(B) x (Path B -> A)^2
+ Variance(C) x (Path C -> A)^2
+ 2 x Cov(B, C) x (Path B -> A) x (Path C -> A)
-> take the square root of this for the standard deviation
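A short worked example with made-up numbers: if the residual variance of A = 0.5, Variance(B) = 1 with path B -> A = 0.6, Variance(C) = 1 with path C -> A = 0.4, and Cov(B, C) = 0.3, then Var(A) = 0.5 + 1 x 0.6^2 + 1 x 0.4^2 + 2 x 0.3 x 0.6 x 0.4 = 1.164, and SD(A) = sqrt(1.164) ≈ 1.08.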
What can be said about latent factor models over time?
A factor showing the same concept at different times can look just like a model with multiple latent factors relating to each other
The “two” latent factors (i.e. Extraversion as a child vs. as an adult) probably have a certain covariance -> could tell us something about the stability of this latent factor
Requirements for estimation are the same for factors over time
we need to fix a certain amount of parameters to specific values
Requirements for interpretation might differ
Measurement parameters might be different, i.e. one item on an extraversion scale might be highly relevant as a child (playing with others) and way less relevant as an adult
What is measurement invariance (in SEM)?
Measurement invariance = when none of the measurement parameters differ across factors, i.e. measurement properties do not vary
For instance, if extraversion is measured in childhood (E1) and adulthood (E2), measurement invariance would mean that a manifest variable predicts extraversion equally well at both time points
Comparisons are easiest when there is measurement invariance
If all loadings and measurement parameters between two (or more) models are the same, it is called total measurement invariance
normally not the same
What assumptions does SEM make regarding measurement invariance and what can be said about this assumption?
Classical SEM perspective:
Comparisons regarding latent variables (i.e. a change up or down in the mean of the latent variable) are only possible when there is measurement invariance across the variables of interest (i.e. time)
This assumption is not really true
e.g. imagine studying extraversion over time, but some measures occurred during the pandemic lockdowns: average responses to questions about party attendance etc. could be substantially lower
Doesn't necessarily reflect changes in underlying extraversion
Reflects a change in the measurement property, not in the latent factor
Interpretation does however get harder when measurement properties change
meaning of latent factor is always determined by
measurement used
relation between measueres and latent factor
How can we test measurement invariance for a model in SEM?
Create a model where the two latent factors have the same measurement parameters
Create a model where the measurement parameters can vary
Compare the freer and the more restrictive model
What does this image represent?
Latent Variable of six people over 50 different time points
Black Dots: latent Variable
Latent Factor is constant across individuals (in terms of intercept and slope) and does not change over time
Residual errors and residual variance are not included
The manifest variable matches the latent factor perfectly and is measured without error, therefore it is not visible separately ("hidden" behind the latent factor)
What does this image represent
Residual errors and residual variance are now included
Development of one latent variable over 50 time points for multiple individuals
Different intercepts and slopes which are likely correlated -> people with lower intercepts have a bigger slope
Orange points: residual errors of manifest variable (/manifest variable + residual errors)
development of one latent variable for six individuals measured with three different manifest variables (red, blue, orange)
The manifest variables have the same factor loading but different intercept/mean structure
They all have approximately a factor loading of 1, but blue has a lower intercept and thus starts further down
The factor loading can be deduced from the slope
What does the blue line represent here
Representation of the development of one latent variable over 50 time points for six individuals
Measured with three different manifest variables
The variables all have the same factor loading -> their slopes look alike and follow the latent factor's slope -> probably 1
The blue line has less residual variance (0.1 instead of 1) -> an indicator of better measurement of the latent factor
Which aspects indicate that a manifest variable is a better indicator for a latent variable?
Lower residual variance
Higher factor loading
What does the blue line represent in this picture
Represented are the development of a latent variable for six individuals over 50 time points, measured with 3 manifest variables
The blue manifest variable has a higher factor loading -> it goes up more for each one-point increase in the latent factor than the latent factor itself -> this manifest variable is a better indicator
The expected value for the measurement is calculated as follows:
expected value = latent variable x factor loading + intercept of the manifest variable
therefore:
factor loading = (expected value - intercept of the manifest variable) / latent variable
What meaning does the scale of a latent variable have?
There is no true scale for a latent variable
Scale is simply set to represent things in relation to it
It does not change the relationship of the observed variables
this does not mean that the scale is meaningless - without a scaled latent factor, a change of 2 in a manifest variable could not be interpreted
When the latent factor has a mean of 100 and SD of 15, 2 is meaningless
When the latent factor has a mean of 0 and a SD of 1, a change of 2 is huge
Imagine you have to create a SEM model for how the IQ of patients changes at three different measurement points (piq_1, piq_2, piq_3). What meanings would the different components have?
model <-
'i =~ 1*piq_1 + 1*piq_2 + 1*piq_3
s =~ 0*piq_1 + 1*piq_2 + 2*piq_3
i ~ imean * 1
s ~ smean * 1
-> specifying the means of intercept and slope
piq_1 ~~ residualVar*piq_1
piq_2 ~~ residualVar*piq_2
piq_3 ~~ residualVar*piq_3
-> implies the same residual variance for all three
-> piq_1 ~~ piq_1 (without the label) would imply a free residual variance'
fit_piq <- lavaan(model, data = …)
summary(fit_piq)
Output
Intercept and slope
How are the requirements for latent factor models over time different and equal to the requirements for multiple latent factor models?
Equal: same requirements for estimation
Fixing one variance/factor loading and one mean
Often variance 1, latent mean 0
Different: requirements for interpretation
Comparisons are easiest when none of the measurement parameters differ over time
If a measurement of extraversion changes from year 1 to year 2 in terms of loading, it is hard to tell whether a change was due to time or due to measurement
What is measurement invariance? How is it connected to SEM?
When none of the measurement parameters vary across factors (= measurement properties do not vary)
Classical SEM view: comparison regarding latent variables only possible when measurement properties are invariant across all variables of interest (including time) -> often not very realistic though, since the context also influences those things
How can measurement invariance be tested between models capturing change over time?
Create one model where the parameters for both are fixed to be the same and a freer model where some or all parameters are allowed to vary across time
Compare the restrictive and the freer model
What is the relationship between variance and standard deviation?
Var(x) = (SD(X))^2
SD(X) = sqrt(Var(X))
-> in SEM, calculation of the total variance:
sqrt(Variance of factor x (loading factor -> observed variable)^2 + residual variance) -> SD of the observed variable
How can correlation between two observed variables be calculated in SEM?
Corr(x1, x2) = Cov(x1, x2) / (SD(x1) x SD(x2))
Explicitly, assuming x1 and x2 are influenced by the same latent factor F:
calculate the covariance:
Cov(x1, x2) = Loading(F -> x1) x Loading(F -> x2) x Variance(F)
calculate the SDs:
sqrt(total variance)
Total variance of x1 = Variance(F) x Loading(F -> x1)^2 + Residual variance of x1
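A small R sketch of this calculation with made-up values (loadings of 0.8 and 0.6 on a factor F with variance 1, residual variances of 0.5 each):
load_x1 <- 0.8; load_x2 <- 0.6; var_F <- 1
resvar_x1 <- 0.5; resvar_x2 <- 0.5
cov_x1x2 <- load_x1 * load_x2 * var_F          # implied covariance between x1 and x2
sd_x1 <- sqrt(load_x1^2 * var_F + resvar_x1)   # implied SD of x1
sd_x2 <- sqrt(load_x2^2 * var_F + resvar_x2)   # implied SD of x2
cov_x1x2 / (sd_x1 * sd_x2)                     # implied correlation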
How can the proportion of explained variance be calculated?
R^2_Y = 1 - Var(Y)residual / Var(Y)
How can the implied means of a variable X in a SEM be calculated?
Mean(X) = Loading(F -> X) x Mean(F) + Intercept of X
How can the variance of an observed variable be calculated if it is influenced by multiple latent factors?
Var(x) =
(Loading L1 -> x)^2 x Variance(L1)
+ (Loading L2 -> x)^2 x Variance(L2)
+ 2 x Cov(L1, L2) x (Loading L1 -> x) x (Loading L2 -> x)
+ Residual variance of x
If L1 and L2 do not covary, the covariance term is zero and drops out
How can RAM notation be used to draw SEM models?
Draw a rectangle for any observed variable
Draw a circle for every latent variable
Draw a triangle with a 1 inside for the constant (means)
For every element of the A matrix which is not zero, draw a single headed arrow
For every element of the S matrix which is not zero, draw a double headed arrow
if it is the same variable -> double headed arrow starting and ending on same variable
For the M matrix, for every element that is not zero draw a path from the triangle to that variable
How is the implied covariance matrix different from RAM?
Smaller matrices are sometimes used as an alternative to RAM
Assumes that the observed variables are all indicators of the latent factor
have no direct paths between themselves
Have no covariances between themselves
Looking at this output, answer the following questions:
are there obvious patterns in the data that the model is not capturing? What are these patterns, and what might they represent conceptually, i.e. with respect to the subjects and the research questions?
What aspect(s) could we include in our model to represent the idea that initial performance after coma is (possibly) related to the duration of the coma? And what if the recovery rate were related to the duration of the coma?
If such a feature were important, and we included it in the model, what would you expect to see happen with the residual correlations?
Covariance between observed variables is not being captured by the model
They might represent that a better performance at point 1 goes hand in hand with better performance at point 2
We would need to include the duration variable and model an effect from it to the latent intercept
This path would allow the initial performance IQ to be influenced by the duration of the coma
We would do similar for the latent slope: make it depend on the duration of the coma.
i ~ imean * 1 + duration
-> introduces a relationship between the intercept and duration, meaning changes in duration affect the initial level
s ~ smean * 1 + duration
duration ~ durationmean * 1
# Residual variances
piq_1 ~~ residualVar *piq_1
piq_2 ~~ residualVar *piq_2
piq_3 ~~ residualVar *piq_3
duration ~~ durationVar * duration
Residual correlations would decrease, since the variance explained (by the latent factors) would increase
Based on this image, does adding effects of duration to this model make sense/make for a better model fit? Why or why not?
No, it doesn't
The path coefficients of duration to slope/intercept are pretty much zero
If compared with the other graph, the residual variances have not decreased -> no significant amount of residual variance has been explained by adding duration
Fit a regression model where "piq" is predicted by duration and the interaction of duration with time
lm_piq <- lm(data = Wong, piq ~ duration + time*duration)
What can be said about time and causality in relation to SEM?
Not always so clear -> change in one thing can lead to change in another thing and so on
Mediating role of things often not direct, but rather happening at a significantly later time point
E.g. the mediating role of exercise motivation in people's fitness -> intervening on motivation (A) will immediately impact daily exercise (B), but won't change fitness (C) one day later
We can however measure change in motivation and estimate how this translates to later change in fitness
What is state dependent change and how is it implemented in SEM?
State dependent change: how a variable changes may depend on its own current or past values, or on the current/past values of other variables
This is different from a correlated intercept and slope: a correlated intercept and slope means that a lower intercept at point A goes with a constantly lower/higher slope, not that a lower intercept means the slope becomes steeper (or that learning becomes faster or slower) as time progresses
What is represented here
State dependence
Image 1:
state dependent slope -> change in where the single individuals start and thus in how steep slope is at any given point
All end up at same point: at lower skill level, skills increase faster and then come to a plateau
Image 2
State dependence with individual differences in equilibrium
The slope curve is similar, but plateaus at a different level
Might represent a learning curve by age (where at some point getting better is harder)
Random change in latent variable
In reality, latent variables do not change smoothly and predictably most of the time
Two sources of noise here
Residual (measurement) error -> everything we are not interested in
Random fluctuations in latent variable -> unpredictable changes in variable we are interested in
What are residuals and which different kinds of residuals exist?
Residuals in general
Term residual generally used for what is left after a prediction
Raw data residuals
Residual for each specific data point
i.e. how much noise around Day 3?
Difference between prediction and actual data point = residual
The prediction is made using model implied means and covariances
Each of those should be independent of all other raw data residuals
if not, it means there is information left in the residual that we could use to make better predictions (i.e. our model does not allow for covariance between two variables but there is actually covariance there) and the model is misspecified
if one residual can be used to predict another -> bad
Covariance/correlation matrix residuals
Not specifically about the raw data, but relates to higher-level patterns in the data -> covariances/correlations
Appears if there is a difference between the sample data covariance matrix and the model implied covariance matrix
Observing non-zero residual covariances is one way we can know that the "raw data residuals" will not be independent
Residual variances
Amount of variance we expect to see in the observed variable after accounting for the predictors, based on all other elements of the model
i.e. total variance - explained variance
if this is incorrect, the model is misspecified and inferences may be wrong
What can be said about modeling covariance between observed variables?
Explicitly modelling covariation between two observed variables is often considered undesirable because it implies the latent factor cannot explain all the covariation in the data
But this is not a problem with regard to the assumptions of SEM
It can actually be a way to avoid violating assumptions
The pattern of variance and covariance in the data needs to match the model, otherwise the raw data residuals will not be independent
What is the issue with large n’s and what can be done about this?
With lots of data, even minor imperfections will be statistically significant (with the chi-squared difference test)
There is no single solution to this
the right balance between the importance of model imperfection and model simplicity/complexity depends on the context
For predictions in a new scenario -> simpler model
For predictions in the same context -> no need to understand the model, choose the best performer
For interpretation, sacrificing a small amount of predictive power for the value of much simpler and more general concepts makes sense
RMSEA as a possible solution -> assesses model misfit while not simply favouring more complex models when more data is available
What is RMSEA and what does it do?
Model fit index
An index which does not simply favour more complex models when more data is available (as the chi^2 test does), by
including sample size in the model fit calculation
incorporating a division by sample size -> makes it less sensitive to dataset size
Population focused: RMSEA estimates how well the model would perform in the population, not just the sample
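For reference, the usual single-group formula is RMSEA = sqrt(max(chi^2 - df, 0) / (df x (N - 1))); the division by the sample size N is what makes it less sensitive to dataset size.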
How should RMSEA fit indices be interpreted?
RMSEA < 0.05
good fit
model closely approximates the population covariance matrix
0.05 < RMSEA < 0.08
acceptable fit
Model reasonable but could be improved
RMSEA > .10
Poor fit
Model fails to adequately represent data
What are interactions in SEM and what other name exists for interactions?
Also called moderators
Interactions : when we want to know how a particular parameter (i.e. correlation between latent factors, mean of a slope or intercept factor etc.) might differ as a function of a covariate like age, gender, treatment etc
What is the difference between simple and complex interactions in SEM and how can they be included in a SEM model?
Simple interactions: When we want to examine whether a variable makes a difference in terms of mean or variance AND the interaction term is observed (therefore not a latent variable) we can include this in the model as a covariate
More complex interactions: when we want to know whether there is a difference in one or more
covariances
variances terms as a function of some variable
How are interactions in SEM different when the moderator is a discrete vs. a continuous variable?
Discrete variable:
we can use multiple-group SEM to estimate different models for each group, specified by the interaction (grouping) variable
Example: create a model where measurement works the same way for young and old people, then create a model which makes a difference between the two -> compare the two models
Continuous variable
SEM cannot handle continuous interactions/moderators
For the same reason, SEM does not handle interactions between latent variables
How can you visualize group models in R for a SEM model?
Create the model first:
goop <- '
Neuroticism =~ N1 + N2 + N3
Conscientiousness =~ C1 + C2 + C3
Neuroticism ~~ 1*Neuroticism (-> latent variance, fixed to 1 for scaling)
Conscientiousness ~~ 1*Conscientiousness
Neuroticism ~~ Conscientiousness (-> structural relationship)
Residual variances
N1 ~~ N1
N2 ~~ N2
C1 ~~ C1 etc.
Mean structure
N1 ~ 1
N2 ~ 1
etc.'
Fit the model separately
fit_males <- lavaan(goop, data = big5[big5$gender == 'male', ])
fit_females <- lavaan(goop, data = big5[big5$gender == 'female', ])
create a path diagram of the fitted model
graphLayout <- matrix(c('N1', 'N2', 'N3',
'con', NA, 'Neuro',
'C1', 'C2', 'C3'), byrow = TRUE, ncol = 3)
Fit data to the path diagram
graph_sem(model = fit_males, layout = graphLayout)
graph_sem(model = fit_females, layout = graphLayout)
-> visual representation
summary(fit_males, standardized = TRUE) for a summary with standardized estimates
How can we figure out with SEM whether there is a group difference in terms of an effect (assuming the groups are discrete and not continuous)?
Create general model where effect of group can vary
Create model where groups are forced to have same relationship
model_restricted <-
Neuroticism ~~ c(corrNC, corrNC) * Conscientiousness
-> structural relationship; both groups are forced to have the same correlation between Neuroticism and Conscientiousness
-> to allow the correlation to differ per group: c(corrMale, corrFemale) * Conscientiousness
fit_restricted <- lavaan(model_restricted, data = big5, group = 'gender')
summary(fit_restricted)
Compare the two models
anova(fit_unrestricted, fit_restricted)
-> the restricted model has fewer free parameters, so if the test is significant, the more complex (unrestricted) model is a better fit
What does a saturated model look like in R and what is the point of saturated models?
Saturated models are the most unrestricted models in a way, allowing for all kinds of correlations and covariations
Model must be constructed in a way that all variables are free to have the relationship they want
saturated <- '
N1 ~~ N2 + N3 + C1 + C2 + C3
N2 ~~ N3 + C1 + C2 + C3
N3 ~~ C1 + C2 + C3
C1 ~~ C2 + C3
C2 ~~ C3
N2 ~~ N2 etc.
N3 ~ 1 etc.'
-> fit model
saturated models allow us to see if the fitted model really makes more sense than the freely estimated covariations -> if the saturated model is a significantly better fit, we have to inspect what went wrong (i.e. whether we have not paid attention to a covariance etc.) and can try to build those missing relationships into our model
We could now extract the residuals from the fitted data to see what has been forgotten about
How can the residuals of a fitted model be extracted in r?
residualsgroup1 <- residuals(fit_group1, type = 'cor')
print(residualsgroup1$cov)
Which different regression syntaxes exist in R to incorporate an interaction in a linear regression model, given
dataset = people
dependent variable = fear
independent variable 1 = bees
independent variable 2 = big noses
thehorrors <- lm(data = people, fear ~ bees*bignose) OR
thehorrors <- lm(data = people, fear ~ bees + bees*bignose) OR
thehorrors <- lm(data = people, fear ~ bignose + bees*bignose) OR
thehorrors <- lm(data = people, fear ~ bees + bignose + bees*bignose)
-> when we look at interactions we also automatically look at the main effects (i.e. it is the influence of the interaction of bees and bignose on fear that we look at, but automatically also the effects of bignose alone and bees alone on fear)
What is the meaning behind the regression syntax in R for linear random effects models?
lmer(data = Wong, piq ~ (1 | id))
-> 1 = intercept, a constant which depends on id
lmer(data = Wong, piq ~ (subject | id))
same as lmer(data = Wong, piq ~ (1 + subject | id))
-> subject = a (non-constant) effect which depends on id -> slope, but it also implicitly models 1 (a constant) which depends on id -> intercept
What is autocorrelation? When does it happen?
Autocorrelation: the score at the first observation is related to the score at the second observation, etc.
Happens often in multiple observations of the same thing (i.e. nested data)
Individual differences in slope and intercept are one source of autocorrelation, but not the only one
What is the difference between auto- and cross correlation and which different kinds of auto- and crosscorrelations exist?
Autocorrelation: earlier values of one variable are correlated with/can predict later values of that same variable
Positive: if the variable goes up at one point, it will likely go up even more at the next point
Negative (very rare and unlikely): if the variable goes up at point 1, it goes down at point 2, etc.
Crosscorrelation: earlier values of one variable are correlated with/can predict later values of a different variable
Positive crosscorrelation: when variable A goes up, variable B goes up shortly after (overlapping parallel lines)
Negative Crosscorrelation: When variable A goes up, Variable B goes down shortly after (overlapping mirrored lines)
Relation between variables over time can be very informative
When is model comparison used?
Comparing competing theories
Extend theories
Check whether a theoretical model matches data/observations in the world
What does the chi^2 test do in terms of model comparison and how should it be interpreted? What other name is there for the chi^2 test?
Also called the likelihood ratio test
Chi^2 used to compare two model fits
one model is simple and one more complex
Gives probability of observing the difference in likelihood of the two model fits
Nullhypothesis/Assumption: simpler model is better fit/equally as good as more complex model
if p < .05: there is only a 5% probability of observing such a difference if the simpler model were just as good as the more complex one (though technically there is still a certain probability that it is) -> should lead to rejection of the null hypothesis
What does the AIC do and how should it be interpreted?
Model comparison tool
Broadly applicable, not only to nested models (unlike the chi^2/likelihood ratio test)
Combines likelihood of a model with number of parameters
Interpretation
Lower AIC = this model is expected to perform/predict better than the one with higher AIC
Makes no claims about statistical significance though
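A minimal usage sketch, assuming two already fitted models fit1 and fit2 (hypothetical names, e.g. lm or lmer fits):
AIC(fit1, fit2)   # lower AIC = expected to predict better
BIC(fit1, fit2)   # BIC penalizes extra parameters more strongly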
What is the meaning of theories?
Intellectually:
important for collective history
allow new intellectual viewpoints
Practically:
Facilitate understanding of our surroundings and of empirical phenomena
Help us predict and control phenomena in our world
What is the role of theories in psychology?
Lack of strong theories in psychology -> could explain the replication crisis
Lack of theory-construction
Strong focus on theory-testing rather than theory creation
Toothbrush problem: theories in psychology are mostly the product of single individuals
Loyalty to the hypothetico-deductive method: the idea that scientific progress depends on repeated testing of theories
Why is the lack of strong theories in psychology an issue?
Danger of repeatedly reinventing the wheel
Lack of overview of existing theories
Lack of understanding of the connections between phenomena
Lack of understanding of whether and which phenomena come from the same source
Without strong theories it is hard to create effective interventions
Without strong theories it is harder to create/operationalize studies
Which two different starting points of scientific methodology exist?
hypothetico-deductive science
A putative theory which is repeatedly tested
Theory construction methodology (TCM)
Starting point: a set of relevant phenomena
Endpoint: a theory which explains the phenomena
What are phenomena and what is their role in science?
Stable and generally valid characteristics/features of the world
empirical generalizations
Science tries to explain phenomena
What is data and how is it different from phenomena?
Quite direct observations or reports about observations in the world
Distinct, related to a specific investigative context
Have specific empirical patterns (while phenomena have general empirical patterns)
Data is NOT/only indirectly explained by theories
Theories explain phenomena, which are made visible through data
What are theories?
Theories help explain phenomena
Set of linked statements
at least one of the statements expresses a general principle
Which ways exist to evaluate the overall worth of a constructed theory?
hypothetico-deductive method: evaluating a theory based on how well it can predict phenomena
Other criteria -> Kuhn's five features to evaluate a theory:
accuracy
consistency
scope
simplicity
productivity (fruitfulness)
Inference to the best explanation -> TCM
Which way of evaluating the overall worth of a constructed theory does TCM prefer?
Inference to the best explanation
Theory of explanatory coherence
Three criteria:
Explanatory breadth: the number of phenomena which are explained by the theory
Criterion of analogy: analogical thinking, repeated success
Criterion of simplicity: prefers theories with fewer parameters
What is TCM?
Haig's abductive theory of method
Scientific research is subdivided into two categories
Discovering empirical phenomena
Explaining phenomena through theories which exist to explain those phenomena
Multicriterial perspective: theories have two purposes
Predictive
Explanatory
What is analogical abduction?