What do I need a test theory for?
These tests and questionnaires aren’t flawless
Measurement error
Must be taken into account when interpreting test results
We need a quantification of measurement error
CTT is mainly about the
quantification of error
systematic variance
test reliability
based on observed scores of participants
CTT – Axioms
Systematic errors
Systematic errors are due to identified causes and can, in principle, be eliminated. They cause bias in scores: Bias affects all scores the same way, pushing them in the same direction
Systematic errors are typical attributes of the person or the exam that would occur across administrations, i.e., are replicable as they influence a person’s score in the same way at every repeated test administration
Example: An overly wordy item that poses a simple math problem, where the math problem is what is intended to be measured, not the candidate’s ability to sort through the verbiage
Rater severity or leniency effects: under- or overestimating students’ performance on essay exams
Random errors
Random errors are positive and negative fluctuations that cause about one-half of the measurements to be too high and one-half to be too low. Sources of random errors cannot always be identified. Errors are effects of lots of little random causes, all nearly independent of each other and of the variable being measured.
=> These errors would not occur across administrations (vary between measurements), i.e., are not replicable and therefore not predictable
Examples: Random response errors (e.g., momentary distractions, variation in reaction time, etc.), transcription errors, transient or temporal effects, e.g., poor performance on a cognitive assessment due to fatigue on a particular day
=> Not replicable, although the source (e.g., fatigue) may be systematic
Confusion between systematic and random errors
To avoid a misunderstanding: The source of random errors can be systematic (e.g., fatigue, mood, inattentiveness, noise, etc.).
However, a mathematical random error model will work as long as these sources of error are not replicable, that is, as long as they a) do not recur in the same way across test administrations (repeated measurement case), or b) vary across participants (single study case: some participants are affected by fatigue and some aren’t, some get a motivational boost resulting in an unexpectedly good performance, some show test anxiety and others don’t, and so on)
So, “random” refers to how the error behaves in the long run or across participants (predictability/replicability matters), not to the source of error
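The difference in behavior can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the function name `simulate_scores`, the true score of 100, the bias of -3 points (standing in for a severe rater), and the normally distributed random error are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_scores(true_score, bias, noise_sd, n_administrations, seed):
    """Hypothetical model: observed = true score + systematic bias + random error."""
    rng = random.Random(seed)
    return [
        true_score + bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_administrations)
    ]


TRUE_SCORE = 100.0  # assumed value, for illustration only

# Random error only: fluctuations around the true score.
unbiased = simulate_scores(TRUE_SCORE, bias=0.0, noise_sd=5.0,
                           n_administrations=20000, seed=1)
# Random error plus a systematic error that pushes every score down by 3 points.
biased = simulate_scores(TRUE_SCORE, bias=-3.0, noise_sd=5.0,
                         n_administrations=20000, seed=2)

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)

# Share of measurements that came out too high under random error alone.
share_too_high = sum(score > TRUE_SCORE for score in unbiased) / len(unbiased)

print(f"mean with random error only:  {mean_unbiased:.2f}")
print(f"mean with systematic error:   {mean_biased:.2f}")
print(f"share of scores too high:     {share_too_high:.3f}")
```

In this sketch the random error averages out (the mean stays close to 100 and roughly half of the scores are too high), whereas the systematic error does not: it shifts the mean by about the size of the bias, however many administrations are simulated.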
Sources of Error: Conditions of Test Administration and Construction
Changes in time limits
Changes in directions
Different scoring procedures
Interrupted testing session
Qualities of test administrator
Time test is taken
Sampling of items
Ambiguity in wording of items/questions
Ambiguous directions
Climate of test situation (heating, light, ventilation, etc.)
Differences in observers
Sources of Error: Conditions of the Person Taking the Test
Reaction to specific items
Health
Motivation
Mood
Fatigue
Luck
Memory and/or attention fluctuations
Attitudes
Test-taking skills (test-wiseness)
Ability to understand instructions
Anxiety
A person’s true score: A key element of CTT
In a perfect world, a person’s “true score” is an accurate measure of their abilities or skills
Definition of the true score: Expected value (i.e., informally the theoretical mean) a person would get from taking an infinite number of equivalent forms of a test under identical conditions
-> True score: Expected value of the observed test scores
Note: True score is error-free because random errors average out
True Score (Definition)
Expected value (= theoretical mean) of observed scores obtained over a theoretically infinite number of repeated trials with the same test or parallel tests (i.e., tests with (a) identical true scores, (b) identical means, (c) identical variances, and (d) identical error variances). Notice that the true score is a constant (i.e., does not vary)
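The idea of the true score as a long-run mean can be sketched numerically. The following is an illustration under stated assumptions, not a procedure from the notes: the constant `TRUE_SCORE`, the error standard deviation, and the helper `observed_score` are hypothetical, and the random error is assumed to be normally distributed with mean zero.

```python
import random
import statistics

TRUE_SCORE = 30.0  # assumed constant true score of one person
ERROR_SD = 4.0     # assumed spread of the random measurement error


def observed_score(rng):
    """One hypothetical administration: true score plus zero-mean random error."""
    return TRUE_SCORE + rng.gauss(0.0, ERROR_SD)


rng = random.Random(7)
scores = []
running_means = {}

# Mean of the observed scores after 10, 1,000, and 100,000 administrations.
for n in (10, 1000, 100000):
    while len(scores) < n:
        scores.append(observed_score(rng))
    running_means[n] = statistics.fmean(scores)

for n, mean in running_means.items():
    print(f"{n:>6} administrations: mean observed score = {mean:.3f}")
```

With only a few administrations the mean can still be noticeably off; as the number of administrations grows, it settles near the constant true score, which is what "expected value of the observed scores" expresses.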
Basic axioms
Case A
Case B
Case (a): The expected value (= theoretical mean) of the random measurement error is equal to zero for a population measured with a parallel test j (with j = 1,...,m)
– Random error influences affecting individuals of a population average out across individuals
Case (b): The expected value of the random measurement error is equal to zero for an infinite number of measurements of just one person (i.e., observational unit U) with the same test j.
– Random error influences affecting an individual across repeated test applications average out across measurement occasions
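In symbols, the two cases might be written as follows. The notation is an assumption, because the notes do not reproduce the formulas themselves: $\varepsilon_{ij}$ stands for the random error of person $i$ on test $j$, $\varepsilon_{Ujr}$ for the random error of person $U$ on the $r$-th administration of test $j$, and the subscript on $E$ indicates what the expectation is taken over.

```latex
\begin{align}
  % Case (a): expectation over the persons i of a population, for a fixed test j
  E_i\!\left(\varepsilon_{ij}\right) &= 0 \qquad \text{for each } j = 1, \dots, m \\
  % Case (b): expectation over repeated administrations r of test j, for one person U
  E_r\!\left(\varepsilon_{Ujr}\right) &= 0
\end{align}
```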
Implications concerning correlations (rho) of true scores and error scores
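The formulas for these implications are not reproduced in the notes. In the usual textbook presentation of CTT they take roughly the following form; the symbols $T_j$ and $E_j$ (true score and error score on test $j$) and $k$ (a second test) are notation assumed here.

```latex
\begin{align}
  \rho(T_j, E_j) &= 0 && \text{true and error scores of the same test are uncorrelated} \\
  \rho(E_j, E_k) &= 0 && \text{error scores of two different tests } j \neq k \text{ are uncorrelated} \\
  \rho(E_j, T_k) &= 0 && \text{error scores of one test are uncorrelated with true scores of another}
\end{align}
```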
You may say that the implications are unrealistic, detached from reality
It should be stressed that it is possible to develop factor-analytic models of psychological tests from classical test theory
These assumptions are therefore empirically falsifiable (i.e., need not be true)
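What the correlation implications look like when they do hold can be sketched with simulated data. This is only an illustration under stated assumptions: the data are generated so that the CTT assumptions are true by construction, which says nothing about whether they hold for a real test. The helper `pearson`, the sample size, and the standard deviations are hypothetical choices, and two parallel tests are modeled as sharing the same true scores with independent errors of equal variance.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


rng = random.Random(42)
N_PERSONS = 20000
TRUE_SD, ERROR_SD = 10.0, 5.0  # assumed values, for illustration only

# True scores vary across persons; errors are drawn independently per test.
true_scores = [rng.gauss(50.0, TRUE_SD) for _ in range(N_PERSONS)]
errors_a = [rng.gauss(0.0, ERROR_SD) for _ in range(N_PERSONS)]
errors_b = [rng.gauss(0.0, ERROR_SD) for _ in range(N_PERSONS)]

# Observed scores on two parallel tests: same true score, different error.
observed_a = [t + e for t, e in zip(true_scores, errors_a)]
observed_b = [t + e for t, e in zip(true_scores, errors_b)]

r_true_error = pearson(true_scores, errors_a)   # expected near 0
r_error_error = pearson(errors_a, errors_b)     # expected near 0
r_parallel = pearson(observed_a, observed_b)    # expected near the reliability

# Reliability as the share of true-score variance in observed-score variance.
theoretical_reliability = TRUE_SD**2 / (TRUE_SD**2 + ERROR_SD**2)

print(f"corr(true, error):        {r_true_error:.3f}")
print(f"corr(error A, error B):   {r_error_error:.3f}")
print(f"corr(test A, test B):     {r_parallel:.3f}")
print(f"var(T) / var(X):          {theoretical_reliability:.3f}")
```

In this simulation the correlation between the two parallel tests comes out close to the ratio of true-score variance to observed-score variance. With real data the true and error scores are not observed directly, so only observable consequences of this kind can be checked.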