Item difficulty

Definition

Interpretation

optimal difficulty

Mean item response score

p low (< .20): difficult test item

p moderate (.20 - .80): moderately difficult

p high (> .80): easy item

=> Variance / SD is maximized at p = 0.5 for dichotomous items
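Why p = 0.5 maximizes the variance can be seen with a short derivation. This is a sketch that assumes the usual variance of a 0/1 scored item, Var(X) = p*q with q = 1 - p:

```latex
\begin{align*}
\operatorname{Var}(X) &= p\,q = p(1-p) \\
\frac{d}{dp}\,p(1-p) &= 1 - 2p = 0 \;\Rightarrow\; p = 0.5 \\
\operatorname{Var}(X)\big|_{p=0.5} &= 0.25, \qquad SD = \sqrt{0.25} = 0.5
\end{align*}
```

The second derivative is -2 (negative), so p = 0.5 is a maximum rather than a minimum.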

Item discrimination

Linear relationship of item to total scale score (scale score: mean item responses per participant or sum score)

Correlation between item score and total score on test

– Total score: Sum score or mean response on all items per test taker

Item discrimination answers the question: Does the item differentiate among test takers varying in their ability / personality trait?
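A minimal sketch of how such an item-total correlation could be computed, using only the Python standard library. The response matrix and the function names (`pearson`, `item_total_correlations`) are made up for illustration and are not part of the original notes:

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def item_total_correlations(responses):
    """Correlate each item with the sum score over ALL items (uncorrected)."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals)
            for j in range(n_items)]

# Hypothetical data: 6 test takers (rows) x 3 dichotomous items (columns)
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

discriminations = item_total_correlations(responses)
for j, r in enumerate(discriminations, start=1):
    print(f"Item {j}: r = {r:.3f}")
```

With 0/1 items, the Pearson correlation computed here is numerically the same as the point-biserial correlation.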

Item Discrimination

Why do some items show high discrimination while others show low discrimination?

Consider the following mathematical ability items

How many quarters are in three bushels? a) 12 b) 24

What is 10 times 10? a) 10 b) 100

Both items require the ability to perform multiplication

The first item, however, also requires knowledge of what a bushel is. This kind of knowledge is irrelevant to math ability and therefore induces error variance in item responses

Consequently, this item would have a low item discrimination as it is only weakly related to math ability

Dichotomous items: Point-biserial correlation (i.e., correlation between a dichotomous and a continuous variable)

Rating scales: Pearson correlation

Positive values closer to 1 are desirable

– What do negative discriminations imply?

Check the scoring key => is the item reverse-coded?

Item-total correlations are directly related to reliability

– Because the more each item correlates with the test as a whole, the higher all items correlate with each other

Part-whole corrected item discriminations

Item discriminations tend to be spuriously inflated (biased) because each item is correlated with a total score of which that item is itself a part => The correlation is partly due to the correlation of the item with itself

This is why we usually interpret part-whole corrected item discriminations (based on correction formulas)
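One common way to obtain a part-whole corrected discrimination is to correlate each item with the "rest score", i.e. the sum of all other items. The sketch below takes that approach instead of an algebraic correction formula. The data and function names are hypothetical and only meant to show the size of the inflation:

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def corrected_item_total(responses):
    """Correlate each item with the sum of the remaining items (rest score)."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        result.append(pearson(item, rest))
    return result

# Hypothetical data: 6 test takers (rows) x 3 dichotomous items (columns)
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

corrected = corrected_item_total(responses)
for j, r in enumerate(corrected, start=1):
    print(f"Item {j}: corrected r = {r:.3f}")
```

In this toy data set the corrected value for item 1 is about .58, clearly lower than the uncorrected item-total correlation of about .87 for the same item, which illustrates the inflation described above.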

Point-biserial correlation

Formula

Explanation
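The formula itself is not reproduced in these notes. As an assumption about what was shown, the standard textbook form of the point-biserial correlation between a 0/1 item and the total score is:

```latex
r_{pbis} = \frac{M_1 - M_0}{s_X}\,\sqrt{p\,q}
```

Here M_1 is the mean total score of test takers who answered the item correctly, M_0 is the mean total score of those who answered incorrectly, s_X is the standard deviation of the total scores (computed with n in the denominator), p is the proportion answering correctly, and q = 1 - p.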

At what value of p is r_pbis maximized?

Relationship between p-value (difficulty) and item discrimination

=> It is not unlikely to see item discrimination values < .30 for very hard or very easy items! These low item discriminations can be a mathematical artifact
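The artifact can be made concrete with a small sketch. It assumes the standard form r_pbis = (M1 - M0)/s_X * sqrt(p*q), in which the factor sqrt(p*q) depends only on item difficulty. The function name `pq_factor` is made up for this illustration:

```python
import math

def pq_factor(p):
    """Difficulty-dependent factor sqrt(p*q) in the point-biserial formula."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.sqrt(p * (1.0 - p))

# The factor peaks at p = .50 and shrinks toward both extremes
for p in (0.05, 0.20, 0.50, 0.80, 0.95):
    print(f"p = {p:.2f}  sqrt(p*q) = {pq_factor(p):.3f}")
```

For the same standardized difference between the two groups, an item with p = .05 or p = .95 therefore yields less than half the correlation of an item with p = .50.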

Guidelines Summary (3)

Consider dropping or revising items with discriminations lower than .30

But be careful: Low item discrimination can be due to the mathematical artifact!

Not unlikely to see item discrimination values < .30 for very hard or easy items!

Not advised to put too much emphasis on maximizing Cronbach’s alpha (i.e., omitting items from the test to maximize alpha)

Binary [0, 1]

The proportion of people who answered the item correctly (p)

Used with dichotomously scored items

– Correct answer: score = 1

– Incorrect answer: score = 0

Item difficulty a.k.a. p-value (but not to be confused with the p-value of significance tests!)

• Dichotomous items: Mean = p

• Example with n = 5: p = (1 + 1 + 0 + 1 + 0)/5 = .6

Var(X) = p*q, where q = 1-p
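The n = 5 example can be reproduced in a few lines. This is a minimal sketch with made-up function names; the variance uses the p*q formula from the notes (the population variance, with n in the denominator):

```python
def item_difficulty(scores):
    """Proportion of correct (1) responses on a dichotomous item."""
    return sum(scores) / len(scores)

def item_variance(scores):
    """Variance of a dichotomous item, computed as p * q with q = 1 - p."""
    p = item_difficulty(scores)
    return p * (1 - p)

scores = [1, 1, 0, 1, 0]      # the five responses from the example
p = item_difficulty(scores)   # 3 correct out of 5, i.e. .6
var = item_variance(scores)   # .6 * .4, i.e. approximately .24

print(f"p = {p:.2f}, Var(X) = {var:.2f}")
```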

Interpretation of R output

item difficulty

Variability of item scores

Sample size

Guidelines

Should we only choose items of p = .50?

Not necessarily ...

1) When wanting to screen the very top group of applicants (e.g., admission to university or medical school).

=> Cutoffs may be much higher, so harder items are appropriate (e.g., p <= .20)

2) Other institutions want to ensure a minimum level (e.g., a minimum reading level)

=> Cutoffs may be much lower, so easier items are appropriate (e.g., p >= .80)

Item Difficulty

Guidelines (4)

High p-values, item is easy; low p-values, item is hard

If p-value = 1 (or 0), everyone answered the question correctly (or incorrectly) and there is no variability in item scores

If the p-value is too low, the item is too difficult and needs revision, or perhaps the test is too long (i.e., not all participants could complete the test)

Good to have a mixture of difficulty in items on test
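These guidelines could be turned into a simple screening helper. The sketch below uses the bands given in these notes (p < .20 difficult, .20 to .80 moderate, p > .80 easy). The function name, the labels, and the item p-values are hypothetical:

```python
def classify_difficulty(p):
    """Label a dichotomous item by its p-value (proportion correct)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p == 0.0 or p == 1.0:
        return "no variability"   # everyone incorrect or everyone correct
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderate"

# Hypothetical p-values for five items
p_values = {"item1": 0.10, "item2": 0.45, "item3": 0.80, "item4": 0.93, "item5": 1.00}
labels = {name: classify_difficulty(p) for name, p in p_values.items()}

for name, label in labels.items():
    print(f"{name}: p = {p_values[name]:.2f} -> {label}")
```

A test with a good mixture of difficulties would show items in more than one band; items flagged as "no variability" cannot discriminate among test takers at all.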

Rating scale items

The mean of the item responses on the rating scale

Example: Rating scale items with a 5-point Likert scale: “Strongly disagree” (1) to “Strongly agree” (5)
