Abstract:
This paper reports a comparison of human and computer marking of approximately
600 essays produced by 11-year-olds in the UK. Each essay script was
scored by three human markers. Scripts were also scored by the e-rater
program. There was good agreement between human and machine marking.
Scripts with highly discrepant scores were flagged and assessed blind by expert
markers for characteristics considered likely to produce human–machine discrepancies.
As hypothesised, essays marked higher by humans than by e-rater exhibited more
abstract qualities such as interest and relevance, while there was little, if any,
difference in more mechanical features such as paragraph demarcation.