Item Analysis

Buffl

Diagnostik (LMU P3.2)

by Emily P.

Item difficulty

Definition: Mean item response score

Interpretation:
p low (< .20): difficult item
p moderate (.20 - .80): moderately difficult item
p high (> .80): easy item

Optimal difficulty:
=> For dichotomous items, the item variance (and SD) is maximized at p = .50
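A quick numerical check of the optimal-difficulty claim. The sketch below uses a hypothetical helper `item_variance`, evaluates the variance p*q of a 0/1 item on a grid of p values, and keeps the p with the largest variance.

```python
def item_variance(p):
    """Variance of a dichotomous (0/1) item with difficulty p: p * q, where q = 1 - p."""
    return p * (1 - p)

# Evaluate p = 0.00, 0.01, ..., 1.00 and keep the p with the largest variance.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=item_variance)
print(best_p, item_variance(best_p))  # 0.5 0.25
```

Because the SD is sqrt(p*q), it peaks at the same point (p = .50, SD = .50), and both drop to 0 at p = 0 or p = 1.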

Item discrimination

Linear relationship of an item to the total scale score (scale score: mean item response per participant, or sum score)

Correlation between the item score and the total score on the test
– Total score: sum score or mean response across all items, per test taker
Item discrimination answers the question: Does the item differentiate between test takers who differ in their ability / personality trait?
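A minimal sketch of the item-total correlation, assuming hypothetical 5-point ratings from five test takers on four items. The helper `pearson` is written out so the example needs only the standard library. It also shows that using the sum score or the mean score gives the same correlation, since the mean is just the sum divided by the number of items.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: rows = test takers, columns = items (5-point ratings).
responses = [
    [4, 5, 3, 4],
    [2, 1, 2, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 2],
    [1, 2, 1, 2],
]
sum_scores = [sum(row) for row in responses]
mean_scores = [sum(row) / len(row) for row in responses]

for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    r_sum = pearson(item, sum_scores)
    r_mean = pearson(item, mean_scores)  # same value: mean = sum / k is a linear rescaling
    print(f"item {j + 1}: r = {r_sum:.3f} (sum score), {r_mean:.3f} (mean score)")
```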

Item Discrimination

Why do some items show high discrimination while others show low discrimination?

Consider the following mathematical ability items:
- How many quarters are in three bushels? a) 12 b) 24
- What is 10 times 10? a) 10 b) 100
Both items require the ability to perform multiplication.
The first item, however, also requires knowing what a bushel is. This knowledge is irrelevant to math ability and therefore introduces error variance into the item responses.
Consequently, this item would have low item discrimination, because its responses are only weakly related to math ability.

Item Discrimination

Dichotomous items: point-biserial correlation (i.e., correlation between a dichotomous and a continuous variable)

Rating scales: Pearson correlation

Positive values close to 1 are desirable
– What do negative discriminations imply?
- Check the scoring key => was a reverse-coded item left un-recoded?
Item-total correlations are directly related to reliability
– Because the more each item correlates with the test as a whole, the more highly all items correlate with each other
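A small sketch of the negative-discrimination check, assuming a hypothetical reverse-worded item `c_raw` on a 1 to 5 scale that was left un-recoded. Its correlation with the sum of the other (correctly keyed) items is negative; recoding with 6 - x (i.e., (max + min) - x) flips the sign. All data and names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 5-point responses from five test takers.
a = [5, 4, 2, 1, 3]        # positively keyed
b = [4, 5, 1, 2, 3]        # positively keyed
c_raw = [1, 2, 4, 5, 3]    # reverse-worded item, NOT yet recoded

rest = [x + y for x, y in zip(a, b)]   # total of the other items
r_raw = pearson(c_raw, rest)

c_recoded = [6 - x for x in c_raw]     # 1 <-> 5, 2 <-> 4, 3 stays 3
r_recoded = pearson(c_recoded, rest)

print(f"before recoding: {r_raw:.2f}, after recoding: {r_recoded:.2f}")
```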

Item Discrimination

Part-whole corrected item discriminations

Item discriminations tend to be spuriously inflated (biased) because each item is correlated with a total score of which that item is a part => the correlation partly reflects the correlation of the item with itself
This is why we usually interpret part-whole corrected item discriminations (based on correction formulas)
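The correction formula itself is not given here. A common way to obtain the corrected value, which gives the same result as the usual part-whole correction formula, is the corrected item-total (item-rest) correlation: correlate each item with the sum of all *other* items, so the item no longer correlates with itself. A minimal sketch with hypothetical data:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: rows = test takers, columns = items (5-point ratings).
responses = [
    [4, 5, 3, 4],
    [2, 1, 2, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 2],
    [1, 2, 1, 2],
]
totals = [sum(row) for row in responses]

uncorrected, corrected = [], []
for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    rest = [t - x for t, x in zip(totals, item)]  # total score without item j
    uncorrected.append(pearson(item, totals))
    corrected.append(pearson(item, rest))

for j, (u, c) in enumerate(zip(uncorrected, corrected), start=1):
    print(f"item {j}: uncorrected r = {u:.3f}, corrected r = {c:.3f}")
```

The gap between the two values tends to be largest when a test has few items, because each item then makes up a larger share of the total.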

Item discrimination

Point-biserial correlation

Formula
Explanation
At what value of p is r pbis maximized?
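The formula did not survive in these notes. One standard textbook form of the point-biserial correlation (given for reference; not necessarily the exact form used in the lecture) is:

```latex
r_{pbis} = \frac{\bar{X}_1 - \bar{X}_0}{s_X}\,\sqrt{p\,q}, \qquad q = 1 - p
```

Here \bar{X}_1 and \bar{X}_0 are the mean total scores of test takers who scored 1 and 0 on the item, s_X is the SD of the total scores (computed with denominator n), and p is the item difficulty. For a fixed ratio (\bar{X}_1 - \bar{X}_0)/s_X, the factor \sqrt{pq} is largest at p = .50 (where it equals .50), which is why r_pbis tends to be highest for items of medium difficulty.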

Relationship between p-value (difficulty) and item discrimination

=> It is not unusual to see item discrimination values < .30 for very hard or very easy items! These low item discriminations can be a mathematical artifact
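To see why low values can be an artifact, the sketch below assumes hypothetical, evenly spread total scores 1 to 100 and an item answered correctly by exactly the top scorers, which gives the largest r_pbis possible at that difficulty. It also computes r_pbis with the textbook formula (M1 - M0) / s_X * sqrt(p*q) to confirm that it equals the plain Pearson correlation. The helper names are illustrative.

```python
import math
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rpbis_formula(item, totals):
    """r_pbis = (M1 - M0) / s_X * sqrt(p * q), with s_X using denominator n."""
    m1 = mean(t for t, i in zip(totals, item) if i == 1)
    m0 = mean(t for t, i in zip(totals, item) if i == 0)
    p = mean(item)
    return (m1 - m0) / pstdev(totals) * math.sqrt(p * (1 - p))

# Hypothetical total scores of 100 test takers, evenly spread from 1 to 100.
totals = list(range(1, 101))

def best_case_item(p):
    """0/1 item answered correctly by exactly the top p * 100 scorers (0 < p < 1)."""
    k = round(p * len(totals))
    return [1 if t > len(totals) - k else 0 for t in totals]

for p in (0.50, 0.20, 0.10, 0.05, 0.02):
    item = best_case_item(p)
    print(f"p = {p:.2f}: r_pbis = {pearson(item, totals):.3f}")
```

Even this best-case item falls below .30 at p = .02. Real items are less than perfect, so their values sit below these best-case numbers.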

Item discrimination

Guidelines Summary (3)

Consider dropping or revising items with discriminations lower than .30
But be careful: low item discrimination can be due to the mathematical artifact!
- It is not unusual to see item discrimination values < .30 for very hard or very easy items!
It is not advisable to put too much emphasis on maximizing Cronbach’s alpha (i.e., omitting items from the test just to maximize alpha)
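A hedged sketch of applying these guidelines, assuming a hypothetical helper `review_items` that takes each item's difficulty p and (corrected) discrimination r. The .30 threshold comes from the guideline above; treating p < .20 or p > .80 as "very hard / very easy" borrows the difficulty bands and is an assumption. The item values are made up.

```python
def review_items(stats, min_r=0.30, extreme=(0.20, 0.80)):
    """Return review notes for items whose discrimination is below min_r.

    stats maps item name -> (p, r): p = difficulty, r = corrected item-total
    correlation. The 'extreme' p bounds are an assumption, not a fixed rule.
    """
    lo, hi = extreme
    notes = {}
    for name, (p, r) in stats.items():
        if r >= min_r:
            continue  # discrimination acceptable, nothing to flag
        if p < lo or p > hi:
            notes[name] = "low r at extreme p: possibly a mathematical artifact"
        else:
            notes[name] = "low r at moderate p: consider revising or dropping"
    return notes

# Hypothetical item statistics: name -> (p, r)
example = {"item1": (0.55, 0.45), "item2": (0.92, 0.18), "item3": (0.50, 0.12)}
notes = review_items(example)
for name, note in notes.items():
    print(f"{name}: {note}")
```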

Dichotomous items

Binary [0, 1]

Item difficulty = the proportion of people who answered the item correctly (p)
- Used with dichotomously scored items: correct answer => score = 1; incorrect answer => score = 0
- Item difficulty a.k.a. p-value (not to be confused with the p-value of significance tests!)
- For dichotomous items, Mean = p
  Example with n = 5: p = (1 + 1 + 0 + 1 + 0)/5 = .6
- Var(X) = p*q, where q = 1 - p (in the example: .6 * .4 = .24)
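The n = 5 example can be checked with Python's standard `statistics` module. Note that p*q equals the variance computed with denominator n (`pvariance`). The sample variance with denominator n - 1 (`variance`) is larger, here .3 instead of .24.

```python
from statistics import mean, pvariance, variance

scores = [1, 1, 0, 1, 0]   # the n = 5 example: 1 = correct, 0 = incorrect

p = mean(scores)           # item difficulty = proportion correct
q = 1 - p
print(p, p * q)            # difficulty and item variance p*q
print(pvariance(scores))   # same as p*q: variance with denominator n
print(variance(scores))    # sample variance (denominator n - 1) is larger
```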

Interpretation of R output

item difficulty
Variability of item scores
Sample size
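The R output itself is not reproduced in these notes. As a hedged illustration of the three quantities it reports per item, the sketch below computes difficulty, the SD of the item scores, and the sample size from hypothetical 0/1 data. It assumes `None` marks a missing response, so n can differ between items. `statistics.stdev` uses denominator n - 1, as R's `sd()` does.

```python
from statistics import mean, stdev

# Hypothetical 0/1 responses; None marks a missing response (e.g., item not reached).
items = {
    "item1": [1, 1, 0, 1, 0, 1],
    "item2": [0, 0, 1, 0, None, None],
    "item3": [1, 1, 1, 1, 1, 1],
}

def describe_item(scores):
    """Difficulty (p), SD of item scores, and n of valid responses."""
    valid = [s for s in scores if s is not None]
    return {"p": mean(valid), "sd": stdev(valid), "n": len(valid)}

for name, scores in items.items():
    print(name, describe_item(scores))
```

Here item3 has p = 1 and SD = 0: everyone answered correctly, so the item carries no variability.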

Item difficulty

Guidelines

Should we only choose items with p = .50?

Not necessarily ...

1) When wanting to screen the very top group of applicants (e.g., admission to university or medical school)

=> The cutoff is much higher, so harder items are needed (e.g., p <= .20)

2) Other institutions want to ensure a minimum level (e.g., a minimum reading level)

=> The cutoff is much lower, so easier items are needed (e.g., p >= .80)

Item Difficulty

Guidelines (4)

High p-value: item is easy; low p-value: item is hard
If p = 1 (or 0), everyone answered the question correctly (or incorrectly), so there is no variability in item scores
If p is too low, the item is too difficult and needs revision, or perhaps the test is too long (i.e., not all participants could complete the test)
It is good to have a mixture of item difficulties on a test
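A minimal sketch of these guidelines, assuming a hypothetical helper `difficulty_band` that labels each item with the bands from the interpretation card (< .20 difficult, > .80 easy) and flags p = 0 or p = 1 as having no variance. Counting the labels gives a rough view of how mixed the difficulties on a test are. The p values are made up.

```python
from collections import Counter

def difficulty_band(p):
    """Label an item's difficulty p using the < .20 / > .80 bands."""
    if p in (0.0, 1.0):
        return "no variance"   # everyone wrong (p = 0) or everyone right (p = 1)
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderate"

p_values = {"item1": 0.15, "item2": 0.50, "item3": 0.85, "item4": 1.0, "item5": 0.62}
bands = {name: difficulty_band(p) for name, p in p_values.items()}
counts = Counter(bands.values())  # how mixed are the difficulties on this test?
print(bands)
print(counts)
```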

Item Difficulty

Rating scale items

The mean of the item responses on the rating scale
Example: rating-scale items on a 5-point Likert scale from “Strongly disagree” (1) to “Strongly agree” (5)
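For rating-scale items, the mean response takes the place of p. A minimal sketch with hypothetical 5-point Likert responses; a higher mean means more agreement, which can be read roughly like an "easier" item. Reverse-coded items would need recoding (6 - x) before averaging.

```python
from statistics import mean

# Hypothetical responses on a 5-point Likert scale:
# 1 = "Strongly disagree" ... 5 = "Strongly agree"
responses = {
    "item1": [4, 5, 3, 4, 5],
    "item2": [2, 1, 2, 3, 1],
    "item3": [5, 4, 4, 5, 5],
}

# Item "difficulty" for a rating-scale item = mean response.
difficulty = {name: mean(scores) for name, scores in responses.items()}
print(difficulty)
```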
