I get 6-10 email messages each day from my students. They’re often silly questions that even a detail-oriented professor doesn’t care about (“do I put the title of my paper before the abstract or after?”). Sometimes they ask really good questions, and other times they send me messages to tell me about some new technology or scientific discovery they read about. Yesterday, I got a message from a student saying, “hey, aren’t you glad we already turned in our final research papers so we can’t use this program that claims to be able to write a college-level paper in less than a second?”
Well, let me grade the paper you did turn in first, and then I’ll answer the question.
I’m dubious of a computer’s ability to write a “near-perfect” paper, just like I’m dubious of any actual person’s ability to do it, myself included. But that skepticism only galvanizes scientists, including MIT’s former director of writing, Les Perelman, who recently created a program he calls the “Babel Generator,” which delivers an entire college essay after the student inputs a few key words. Thankfully, Perelman isn’t suggesting that the program can write a sophisticated, substantive paper—but he is suggesting that it can write the type of paper that can score well if graded automatically.
As much as the idea of getting some grading help from computers excites me, it also terrifies me. For one thing, I don’t want to be put out of work by a machine. I’d love something to make the onerous process of grading 20-page research papers easier, but not something that invalidates a human response to a complex assignment. But, the world being what it is, of course there are automated grading systems, especially for tests like the SAT. This is what Perelman has been studying for a while, noting that such systems actually encourage poor writing skills in students, largely because the systems reward length over any other trait. The College Board website provides 23 graded essays as examples, as well as an additional 16 to help graders calibrate their own scoring. The shorter essays, of about 100 words, almost always got a 1, the lowest grade, while the 400-word essays got 6s, the highest grade. Often, those longer essays had errors of fact as well as grammar, but it didn’t seem to matter.
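To make that incentive concrete, here’s a toy scorer of my own devising (not any real grading engine) that rewards nothing but length, calibrated to the 100-word and 400-word endpoints from the College Board samples above:

```python
# A toy illustration, NOT any real scoring engine: a grader that, like
# the systems Perelman criticizes, rewards length above all else.
# The 100-words -> 1 and 400-words -> 6 endpoints are taken from the
# College Board sample essays described above.

def naive_length_score(essay: str) -> float:
    """Map word count linearly onto the 1-6 SAT essay scale."""
    words = len(essay.split())
    # 100 words or fewer -> 1; 400 words or more -> 6; linear in between
    score = 1 + (words - 100) * (6 - 1) / (400 - 100)
    return round(min(6.0, max(1.0, score)), 1)

print(naive_length_score("word " * 100))  # -> 1.0
print(naive_length_score("word " * 250))  # -> 3.5
print(naive_length_score("word " * 400))  # -> 6.0
```

Note that the scorer never reads a single word of the essay; it only counts them, which is exactly the complaint.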
Perelman argues that automatic grading systems “are not measuring any of the real constructs that have to do with writing.” They don’t know facts, and they can’t tell whether what they’re looking at is word salad or has any actual meaning. The Babel Generator is the latest attempt to demonstrate the shortcomings of these programs. At the click of a mouse, it uses a few keywords to produce sentences stuffed to the brim with million-dollar words and complex constructions that are ultimately as empty as an old swimming pool. Yet those essays consistently receive high scores—usually somewhere around a 5.5 out of 6—from the computerized assessment systems. Point taken.
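For a sense of the trick, here’s a crude imitation of the idea (nothing like Perelman’s actual program, just my own sketch): stitch a keyword into grandiose templates to produce fluent-sounding, meaningless prose.

```python
import random

# A toy imitation of the Babel Generator's premise, NOT Perelman's program:
# dress a single keyword up in pompous templates to get sentences that
# sound scholarly but say nothing.

TEMPLATES = [
    "The quandary of {kw} has, by its very nature, been profoundly paradoxical.",
    "Insofar as {kw} remains ubiquitous, scholars postulate its inexorable salience.",
    "{kw}, though ostensibly axiomatic, engenders a veritable plethora of conjecture.",
]

def babble(keyword: str, sentences: int = 3) -> str:
    """Return `sentences` template sentences built around one keyword."""
    kw = keyword.capitalize()
    return " ".join(random.choice(TEMPLATES).format(kw=kw) for _ in range(sentences))

print(babble("privacy"))
```

A length-rewarding grader would happily give paragraphs like this a top score, which is precisely Perelman’s point.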
Still, computer scientists are working on ways to create grading systems that reflect individual preferences and tendencies, building on dozens of professor-graded samples. Something tells me it’ll be a while before computers can grade with my level of sarcasm, but it’d be fun to see them try.