Reliability (𝛒tt) is the measurement precision of a test, i.e., the degree to which its scores are free of random measurement error

A test is reliable if the scores of a person do not differ substantially when tested repeatedly

If this holds true for all/most participants, we expect a high reliability of the test scores

Implication: We estimate reliability based on the variance of the observed test scores, because this variance consists of a systematic part (i.e., due to systematic differences among participants in their ability/personality) and an unsystematic part (i.e., variance due to random measurement error)
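The variance decomposition described above can be written out as a short derivation. This is the standard CTT formulation; the symbols X (observed score), T (true score), and E (error) are notation assumed here, since the notes themselves only name 𝛒tt. The second step relies on the CTT assumption that true scores and errors are uncorrelated.

```latex
X = T + E, \qquad \operatorname{Cov}(T, E) = 0
\;\Rightarrow\;
\operatorname{Var}(X) = \operatorname{Var}(T) + \operatorname{Var}(E)

\rho_{tt} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}
          = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}
```

Read this way, reliability is the share of observed score variance that is systematic, so it ranges from 0 (only error) to 1 (no error).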

Reliability (illustration)

Reliability

(4+3)

A test's true score variance is not known, however, and reliability must be estimated rather than calculated directly

There are several ways to estimate a test's reliability based on the implications of CTT

In general, reliability can be estimated as the correlation between two strictly parallel tests

– Strictly parallel: Two tests with

(a) identical true scores

(b) same means

(c) same variances, and

(d) same error variances
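A small simulation can make the link between strictly parallel tests and reliability concrete. The sketch below is only an illustration under assumed values: the standard deviations, the sample size, and the names `form_a` and `form_b` are hypothetical and not taken from the notes. Both forms share the same true scores and have the same error variance, so their correlation should land close to Var(T) / (Var(T) + Var(E)).

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)            # fixed seed so the run is reproducible
n_persons = 20_000
true_sd, error_sd = 10.0, 5.0     # hypothetical values chosen for illustration

true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_persons)]

# Two strictly parallel forms: identical true scores, equal error variance,
# and errors drawn independently for each form.
form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]

theoretical = true_sd**2 / (true_sd**2 + error_sd**2)   # 100 / 125 = 0.8
estimated = pearson(form_a, form_b)

print(f"theoretical reliability:       {theoretical:.3f}")
print(f"correlation of parallel forms: {estimated:.3f}")
```

In real data the true scores are unobservable, which is why the correlation between forms serves as the estimate.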

Each method involves assessing the consistency of an examinee's scores over time (retest reliability), across different content samples (parallel test reliability), or across different items (internal consistency)

Reliability

Definition (with formula)

Reliability Methods (3)

The interrelatedness of items measuring the same trait is computed (internal consistency)

Used to assess the consistency of item responses across items within a test

Cronbach's alpha, McDonald's omega, and others

The correlation of two (parallel) tests that measure the same latent variable (e.g., intelligence) is computed (parallel test reliability)

A test is administered twice, separated by a certain time interval, and the correlation between the two administrations is computed (retest reliability)

Reliability: Consistency

Situation: a single administration of one test form

Procedure: Divide test into comparable halves and correlate sum scores from both halves

Split-half with Spearman-Brown adjustment

Kuder-Richardson formulas 20 and 21 (KR-20, KR-21)

Cronbach's alpha
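The split-half procedure can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the odd/even split, the function names, and the tiny response matrix are all assumptions made for the example. The Spearman-Brown step is needed because the correlation between two halves describes a test of only half the length.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to the full test length."""
    return 2 * r_half / (1 + r_half)


def split_half(item_scores):
    """Return (half-test correlation, Spearman-Brown adjusted reliability).

    item_scores: one row per person, one column per item.
    The halves are formed from items in odd and even positions.
    """
    half_1 = [sum(row[0::2]) for row in item_scores]
    half_2 = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(half_1, half_2)
    return r_half, spearman_brown(r_half)


# Hypothetical responses: 6 persons x 4 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

r_half, r_full = split_half(responses)
print(f"half-test correlation:   {r_half:.3f}")
print(f"Spearman-Brown adjusted: {r_full:.3f}")
```

The result depends on how the test is split, which is one reason coefficients that do not depend on a particular split are often reported as well.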

Meaning: consistency of item responses across the parts of a measuring instrument (“parts” = individual items or subgroups of items)

Correct interpretation: the proportion of systematic variance in test scores, NOT the degree of homogeneity (where homogeneity = unidimensionality)

A measure is said to be unidimensional if a single latent trait accounts for the correlations among a set of items
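As an illustration of internal consistency, Cronbach's alpha can be computed from the item variances and the variance of the sum score. The sketch below uses the common textbook formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total score); the function name and the small response matrix are hypothetical and only serve the example.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a persons x items score matrix.

    item_scores: one row per person, one column per item.
    Sample variances (n - 1 in the denominator) are used throughout.
    """
    k = len(item_scores[0])                       # number of items
    items = list(zip(*item_scores))               # columns = items
    item_var_sum = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 6 persons x 4 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")   # 0.667 for this matrix
```

A high alpha by itself does not show that a single latent trait underlies the items; unidimensionality has to be checked separately.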

Retest Reliability

Situation: Same people taking two administrations of the same test

Procedure: Correlate the scores from the two administrations, which yields the coefficient of stability

Meaning: the extent to which scores on a test can be generalized over different occasions (temporal stability).

Appropriate use: Information about the stability of the trait over time

– Stable rank-ordering of test takers
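The coefficient of stability is simply the correlation between the two administrations. The sketch below uses made-up scores (`time_1` and `time_2` are hypothetical) and also shows why the coefficient speaks to rank-order stability: if every person gains the same number of points, the correlation stays at 1 even though the mean has changed.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores of the same six people at two occasions.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [14, 15, 10, 19, 18, 10]

stability = pearson(time_1, time_2)
print(f"coefficient of stability: {stability:.3f}")

# A uniform gain of 5 points changes the mean but not the rank order,
# so the correlation with the original scores remains perfect.
shifted = [score + 5 for score in time_1]
stability_after_shift = pearson(time_1, shifted)
print(f"after uniform gain:       {stability_after_shift:.3f}")
```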

Parallel Test Reliability

Situation: Testing of same people on different but comparable/parallel forms of the test

Procedure: Correlate the scores from the two forms, which yields the coefficient of equivalence

Meaning: the consistency of responses to different item samples (where testing is immediate) and across occasions (where testing is delayed)

Appropriate use: Information about the equivalence of forms
