Who Grades Our Children's Tests?
I just caught this great Times op-ed from Sunday on standardized test scoring, by Todd Farley, who has published an exposé book on the subject. He writes:
For one project our huge group spent weeks scoring ninth-grade movie reviews, each of us reading approximately 30 essays an hour (yes, one every two minutes), for eight hours a day, five days a week. At one point the woman beside me asked my opinion about the essay she was reading, a review of the X-rated movie “Debbie Does Dallas.” The woman thought it deserved a 3 (on a 6-point scale), but she settled on that only after weighing the student’s strong writing skills against the “inappropriate” subject matter. I argued the essay should be given a 6, as the comprehensive analysis of the movie was artfully written and also made me laugh my head off. All of the 100 or so scorers in the room soon became embroiled in the debate. Eventually we came to the “consensus” that the essay deserved a 6 (“genius”), or 4 (well-written but “naughty”), or a zero (“filth”). The essay was ultimately given a zero.
I'm cautiously enthusiastic about the National Governors' Association-led effort to move toward national education standards. (Even though the first draft of the English/Language Arts standards was maddeningly vague.) But with national standards will come national standardized tests, so it's an especially good time to rethink how these exams are scored, and by whom. Perhaps teachers and principals should be scoring tests, not $8 an hour part-timers. In that case it would be important, especially with the push for merit pay, to make sure teachers aren't grading their own students' tests, to decrease the temptation to engage in foul play.
The fact of the matter is, scoring isn't fun, and it's not something teachers will want to spend more time doing. In some countries with great schools, like Finland, detailed national educational standards have not led to a major focus on standardized tests. For a variety of reasons, that outcome doesn't look likely in the U.S. So I only hope that we learn from past testing mistakes and create a more consistent, humane system, driven by deep respect for the critical thinking and writing skills necessary for success in higher education and on the job market.
--Dana Goldstein

COMMENTS (3)
It is mind numbing work but agreement rates are actually not that bad. The company I worked for had some or all of the essays read twice (depending on the contract with the state) and gave us much statistical feedback including agreement rates of around 75%. A few people who couldn't achieve that had extra training or were let go.
Posted by: unemployed teacher | September 28, 2009 11:25 PM
I graded standardized tests one summer several years ago, and a lot of my co-workers were teachers trying to make some extra money on their summer breaks. Of course, that doesn't mean they're experts in the areas being tested - I spent 4 weeks grading a language arts exam while sitting between a choir director and a social studies teacher. I haven't read Farley's book yet, but I don't think that the quality of the graders is the biggest problem with standardized testing. I'd really like to see if he goes into any detail on the costs involved when states contract with private testing companies to administer and grade these exams, especially over the past 10 years or so. That's probably the real scandal, and anyone who argues that more money should be going to teacher salaries and school building repairs is going to have to face down state DOEs that are already paying for comprehensive statewide standardized testing.
Posted by: eric l | September 29, 2009 1:06 AM
Mr. Farley’s book is both important and entertaining. But it is far from authoritative because, to his own credit, Mr. Farley knows that he doesn’t know much about scoring student work.
You mention in your blog entry that…
“Perhaps teachers and principals should be scoring tests, not $8 an hour part-timers.”
There’s over 100 years of research on this issue and educators have been shown to be the LEAST ACCURATE group of people in judging student work. The research is summarized in Robert Marzano’s “Transforming Classroom Grading”.
If we want better accuracy in scoring, the best approach is as follows:
1. Hire intelligent people from the general public and pay them a fair wage.
2. Give them more time to score thoughtfully.
3. Provide good training with anchor paper examples at all score points.
4. Check at least once a day for inter-rater reliability and then re-train individuals as necessary.
5. Get as many people as possible to score the same piece of work. (Multi-rater feedback with two to four people usually does the trick in most cases.)
All assessments of student work can be expressed as TRUTH + ERROR. Sometimes the ERROR is a little high; sometimes it’s a little low. The greater the number of assessments we have, the more the ERRORs cancel each other out. This is sometimes known as “The Wisdom of Crowds” theory and it works for just about everything.
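The TRUTH + ERROR idea above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of any real scoring operation: it assumes each rater's error is independent, unbiased, and normally distributed, and the true score (4.0), the error spread (1.0 point), and the function names are all made up for illustration. If raters share a bias (for example, everyone marking down a "naughty" topic), averaging would not remove it.

```python
import random
import statistics


def simulate_scores(true_score, n_raters, error_sd, rng):
    """Each rater's score is the true score plus independent random error."""
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_raters)]


def mean_abs_error(true_score, n_raters, error_sd, trials, rng):
    """Average distance between the panel's mean score and the true score."""
    errors = []
    for _ in range(trials):
        scores = simulate_scores(true_score, n_raters, error_sd, rng)
        errors.append(abs(statistics.mean(scores) - true_score))
    return statistics.mean(errors)


# Hypothetical values: a "true" score of 4.0 and a rater error spread of 1.0.
rng = random.Random(0)  # fixed seed so the run is repeatable
results = {n: mean_abs_error(4.0, n, 1.0, 5000, rng) for n in (1, 2, 4)}

for n, err in results.items():
    print(f"{n} rater(s): mean absolute error = {err:.2f}")
```

Under these assumptions, the averaged score's typical error shrinks roughly with the square root of the number of raters, so going from one reader to four should cut it about in half.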
When I have been in charge of large-scale scorings, like Mr. Farley was, these are the techniques I have used. Accuracy and fairness have never been in question because there’s good science available – all we have to do is read it and use it.
Steve Peha
President, Teaching That Makes Sense
P.S. Another simple check on the system would be to let teachers keep copies of their kids’ tests and then check student scores against student work themselves. This would not only catch the few mistakes that might sneak through, it would give teachers an even better sense of how standards are interpreted through testing.
Posted by: Steve Peha | September 30, 2009 3:50 PM