the study’s major finding states only that “the results demonstrated that overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items.” A paragraph on p. 21 reiterates the point: “By and large, the scoring engines did a good [job] of replicating the mean scores for all of the data sets.” In other words, all this hoopla about a study Tom Vander Ark calls “groundbreaking” is based on a final conclusion saying only that automated essay scoring engines are able to spew out a number that “by and large” might be “similar” to what a bored, overworked, underpaid, possibly underqualified, temporarily employed human scorer skimming through an essay every two minutes might also spew out. I ask you, has there ever been a lower bar?

Todd Farley via