Reliability (ρtt) is the accuracy (measurement precision) of a test
A test is reliable if a person's scores do not differ substantially when that person is tested repeatedly
If this holds true for all/most participants, we expect a high reliability of the test scores
Implication: We estimate reliability based on the variance of the observed test scores
This works because the observed test score variance consists of a systematic part (i.e., variance due to systematic differences among participants in their ability/personality) and an unsystematic part (i.e., variance due to random measurement error)
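A minimal sketch of this decomposition in standard CTT notation (the symbols X = observed score, T = true score, E = error are assumed here, not given on the card):
X = T + E, \qquad \mathrm{Cov}(T, E) = 0
\sigma^2_X = \sigma^2_T + \sigma^2_E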
Reliability (illustration)
Reliability (4+3)
A test's true score variance is not known, however, so reliability must be estimated rather than calculated directly
There are several ways to estimate a test's reliability based on the implications of CTT
In general, reliability can be measured by the correlation between two strictly parallel tests
– Strictly parallel: Two tests with
(a) identical true scores
(b) same means
(c) same variances, and
(d) same error variances
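A short derivation sketch (standard CTT reasoning, assuming the errors are uncorrelated with each other and with the true score; X_1, X_2 denote the two parallel test scores) of why this correlation recovers the reliability:
\mathrm{Corr}(X_1, X_2) = \frac{\mathrm{Cov}(T + E_1,\, T + E_2)}{\sigma_{X_1}\,\sigma_{X_2}} = \frac{\sigma^2_T}{\sigma^2_X} = \rho_{tt}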
Each method involves assessing the consistency of an examinee's scores over time (retest reliability), across different content samples (parallel test reliability), or across different items (internal consistency)
Reliability: Definition (with formula)
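The formula itself is not reproduced in the card text; the standard CTT definition is:
\rho_{tt} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
i.e., the proportion of observed score variance that is true (systematic) variance, ranging from 0 (pure error) to 1 (error-free measurement)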
Reliability Methods (3)
The interrelatedness of items measuring the same trait is computed (internal consistency)
used to assess the consistency of item responses across items within a test
Cronbach's alpha, McDonald's omega, and others
The correlation of two (parallel) tests which measure the same latent variable (e.g., intelligence) is computed
A test is administered twice with a certain interval and the correlation is computed (retest reliability)
Reliability: Internal Consistency
Situation: a single administration of one test form
Procedure: Divide test into comparable halves and correlate sum scores from both halves
Split-half with Spearman-Brown adjustment
Kuder-Richardson formulas 20 and 21 (KR-20, KR-21)
Cronbach's alpha
Meaning: consistency of item responses across the parts of a measuring instrument (“parts” = individual items or subgroups of items)
Correct interpretation: the proportion of systematic variance in the test scores. NOT: the degree of homogeneity (where homogeneity = unidimensionality)
A measure is said to be unidimensional if a single latent trait accounts for the correlations among a set of items
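A minimal Python sketch of two of these estimators (not from the card; the function names and the example matrix are illustrative, and `data` is assumed to be an examinees-by-items score matrix):

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total score variance)."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]                            # number of items
    item_vars = data.var(axis=0, ddof=1)         # variance of each item
    total_var = data.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def split_half_spearman_brown(data):
    """Split-half reliability (odd vs. even items) with Spearman-Brown step-up."""
    data = np.asarray(data, dtype=float)
    half1 = data[:, 0::2].sum(axis=1)            # sum score of one half of the items
    half2 = data[:, 1::2].sum(axis=1)            # sum score of the other half
    r_hh = np.corrcoef(half1, half2)[0, 1]       # correlation between the two halves
    return 2 * r_hh / (1 + r_hh)                 # Spearman-Brown adjustment to full test length

# Hypothetical example: 5 examinees x 4 items
scores = [[3, 4, 3, 4],
          [2, 2, 3, 2],
          [5, 4, 5, 5],
          [1, 2, 1, 2],
          [4, 4, 3, 4]]
print(cronbach_alpha(scores))
print(split_half_spearman_brown(scores))
```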
Retest Reliability
Situation: Same people taking two administrations of the same test
Procedure: Correlate the scores from the two administrations, which yields the coefficient of stability
Meaning: the extent to which scores on a test can be generalized over different occasions (temporal stability).
Appropriate use: Information about the stability of the trait over time
– Stable rank-ordering of test takers
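A minimal sketch of the coefficient of stability, assuming `time1` and `time2` hold the same persons' sum scores from the two administrations (the data are made up for illustration):

```python
import numpy as np

# Hypothetical scores of the same 6 persons at two measurement occasions
time1 = np.array([23, 31, 18, 27, 35, 22])
time2 = np.array([25, 30, 20, 26, 34, 21])

# Coefficient of stability = Pearson correlation between the two administrations
r_tt = np.corrcoef(time1, time2)[0, 1]
print(round(r_tt, 3))
```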
Parallel Test Reliability
Situation: Testing of same people on different but comparable/parallel forms of the test
Procedure: Correlate the scores from the two forms, which yields a coefficient of equivalence
Meaning: the consistency of responses to different item samples (where testing is immediate) and across occasions (where testing is delayed)
Provides information about the equivalence of the two forms
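A minimal sketch of the coefficient of equivalence under the same assumptions, with made-up scores `form_a` and `form_b` for the two parallel forms; the means/variances check reflects the equivalence-of-forms point above:

```python
import numpy as np

# Hypothetical scores of the same 6 persons on two parallel test forms
form_a = np.array([12, 19, 15, 22, 9, 17])
form_b = np.array([13, 18, 16, 21, 10, 18])

# Coefficient of equivalence = correlation between the two forms
r_eq = np.corrcoef(form_a, form_b)[0, 1]

# Strictly parallel forms should also show (approximately) equal means and variances
print(r_eq, form_a.mean(), form_b.mean(), form_a.var(ddof=1), form_b.var(ddof=1))
```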