Connecting test scores to teacher evaluations: Why not?

Mike Wiser at The Quad-City Times reported today on the controversy here in Iowa around connecting student test scores to teacher evaluations (aka ‘value-added modeling’ or ‘VAM’). Last week I shared the research and prevailing opinion of scholars supporting why this should not be done.

The article quotes the argument that ‘teacher accountability has to be part of it, or it’s not reform.’ This is consonant with policymakers’ general willingness to ignore the rating volatility concerns associated with VAM. As Amrein-Beardsley, et al. (2013) noted:

Policymakers have come to accept VAM as an objective, reliable, and valid measure of teacher quality. At the same time, [they ignore] the technical and methodological issues.

There appears to be a blind faith by many legislators in the objectivity of VAM, even though the actual data show that there is extremely high volatility in teacher ratings from year to year. Somehow policymakers are able to dismiss that rating instability as unimportant, even though it has tremendous impacts on teachers’ lives and reputations and public faith in the educational system. When Teachers of the Year are being rated ‘unsatisfactory’ by VAM systems, parents are rightfully suspicious. When high-achieving schools are rated as ‘needing improvement’, the public rightfully suspects that something’s not right. It’s important to note that legislators are not asking other professions to accept evaluation schemes in which 30 to 50 percent (or more) of their ratings fluctuate widely and completely randomly.

Of greater concern to me, however, is the response of Tom Narak, lobbyist for the School Administrators of Iowa (SAI). SAI represents all of the principals and superintendents in the state and is supposed to be knowledgeable about educational research and policy. Yet Mr. Narak says about VAM, “Why wouldn’t you? It’s the way (evaluations) are going now.”

Well, Mr. Narak, here are a few big reasons why we wouldn’t:

  • Because year-to-year ratings for teachers are randomly varying 30%, 40%, 50%, or even higher [Di Carlo; Economic Policy Institute; Baker; National Education Policy Center]. In other words, extremely high percentages of teachers’ evaluations have absolutely nothing to do with their actual performance. As lobbyist for the administrators responsible for evaluating teachers, this should be alarming to you, not dismissed out-of-hand. Do you want principals and superintendents to send the message to their teaching staffs that they don’t care if evaluations are fair?
  • Because even when student test scores are averaged over 3 to 5 years, random variation in teacher ratings still results in over 25% to 48% of teachers being rated inaccurately [U.S. Department of Education; Di Carlo]. In other words, when it comes to rating instability, looking over a longer time frame helps some but not a lot.
  • Because the National Research Council, the National Academy of Education, the American Educational Research Association, RAND, the Annenberg Institute for School Reform at Brown University, the National Education Policy Center, James Popham, Gerald Bracey, Robert Linn, and many, many other of our most-respected scholars and research organizations all have looked at the research and said vehemently that we shouldn’t. In short, it’s a “who’s who” of educational policy and research – the folks we have trusted to inform us on policy decisions – all uniformly aligned against VAM systems because of their volatility and unfairness.
  • Because when VAM systems are implemented, predictably ludicrous and harmful results occur. These policy decisions have real consequences for our teachers for whom we supposedly have such great respect.
  • Because even if we could devise a fair VAM system (which right now no one seems to be able to do), research shows consistently that the contribution of teachers to overall student test scores is 10% to 15% at most. The rest is attributable to other school factors or non-school factors. Any VAM system that imputes greater teacher responsibility than that small percentage would be highly unethical.
  • Because holding teachers ‘accountable’ for random variation and/or factors outside of their control violates both the equal protection and due process rights due teachers under the U.S. Constitution.
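For readers who want to see the mechanics behind the rating-instability bullets above, here is a minimal simulation sketch. It is not drawn from any of the studies cited; every number in it is an assumption I made up for illustration: each hypothetical teacher has a stable ‘true’ effect, each year’s rating adds independent random noise of the same size (`noise_sd=1.0`), and a teacher counts as ‘misrated’ if the rating puts him or her in the wrong half of the distribution. The function name `simulate_misrating` and its parameters are likewise hypothetical.

```python
import random


def simulate_misrating(n_teachers=2000, n_years=1, noise_sd=1.0, seed=0):
    """Return the share of simulated teachers placed in the wrong half.

    All parameters are illustrative assumptions, not estimates from research.
    """
    rng = random.Random(seed)

    # Assumed stable 'true' effect for each teacher (standard normal).
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]

    # Observed rating: true effect plus independent yearly noise,
    # averaged over n_years of data.
    observed = [
        sum(t + rng.gauss(0.0, noise_sd) for _ in range(n_years)) / n_years
        for t in true_effect
    ]

    # Cut each distribution at its own median (top half vs. bottom half).
    true_cut = sorted(true_effect)[n_teachers // 2]
    obs_cut = sorted(observed)[n_teachers // 2]

    # A teacher is misrated when the two halves disagree.
    wrong = sum(
        (t >= true_cut) != (o >= obs_cut)
        for t, o in zip(true_effect, observed)
    )
    return wrong / n_teachers


one_year = simulate_misrating(n_years=1)
three_year = simulate_misrating(n_years=3)
print(f"Misrated with 1 year of data:  {one_year:.1%}")
print(f"Misrated with 3 years of data: {three_year:.1%}")
```

The exact percentages depend entirely on the assumed noise level, so they should not be read as estimates of real VAM error rates. The pattern is the point: under these assumptions, averaging more years shrinks the share of misrated teachers but does not drive it to zero.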

If Mr. Narak and SAI are going to take a policy position on teacher evaluation, they should be up on the research I cited last week. In fact, on April 21 I e-mailed Mr. Narak the research noted above. Apparently, like many legislators, he and SAI don’t seem to care that the teacher evaluation systems for which they’re expressing support are inherently unfair and probably illegal. Would they feel the same if we were talking about the principals and superintendents whom they represent?

“Dear principal, 33% of your year-to-year evaluation will be completely random. Even though what you did this year isn’t substantially different from what you did last year, you may end up being rated highly or you may be rated near the bottom. Despite the extreme rating instability, there will be real consequences for you depending on the results. Good luck.”

Our teachers deserve evaluation systems that are fair. If they’re not fair, they’re unethical. If they’re not fair, they’re illegal. And right now, despite their intuitive appeal and legislative popularity in certain circles, VAM systems are unable to meet the basic principle of fairness and thus should not be supported by SAI or any other knowledgeable educational organization or policymaker.

[I’ll also note as an aside that some states are starting to talk about evaluating administrators based on student test scores. If we are rightfully concerned about volatility in teacher ratings, wait until we remove the connection to students one additional step and try to tie scores to administrators. In other words, SAI, be careful what you advocate for because the principals and superintendents you represent are next…]

Finally, I’ll close with a plea to Jason Glass, Director of the Iowa Department of Education (DE), to publicly release the research that he has which supposedly supports VAM. Over the past months Jason has said repeatedly that DE and the Governor were not advocating for VAM approaches. And yet, here at the end of the legislative session, we somehow find ourselves discussing VAM systems and both DE and the Governor are supporting them. Whatever research Jason has, it’s going to somehow have to address the concerns noted above. Given that leading scholars and our most respected educational research/policy organizations are familiar with and have summarized the literature base and yet still strongly advocate against VAM, I’m skeptical. But, hey, maybe he’s got a bunch of dispositive studies with which both I and they are unfamiliar…


I recognize that this post likely is going to make me unpopular with SAI (and even more unpopular than I already am with DE), which I regret because I’ve had good relations with them for a long time. But when the weight of the evidence is overwhelmingly against the policy position for which they’re advocating, I can’t just sit by and say nothing, not when it has very real, negative consequences for Iowa educators. John Ewing, President of Math for America, notes:

Of course we should hold teachers accountable, but this does not mean we have to pretend that mathematical models can do something they cannot.

I’ll state emphatically that we absolutely should not, under any circumstances, pretend that mathematical imprecision in evaluative processes has no impact on teachers’ lives and the fairness of our educational systems.

As always, I await your thoughts…

11 Responses to “Connecting test scores to teacher evaluations: Why not?”

  1. Mr. Narak reminds me of this pointy-haired administrator in today’s Dilbert comic:

  2. Hi Scott,

    Teachers should absolutely be judged based on student test performance. But only after meeting Michele Kerr’s conditions:

    (1) Teachers be assessed based on only those students with 90 percent or higher attendance.
    (2) Teachers be allowed to remove disruptive students from their classroom on a day-to-day basis.
    (3) Students who don’t achieve “basic” proficiency in a state test be prohibited from moving forward to the next class in the progression.
    (4) That teachers be assessed on student improvement, not an absolute standard — the so-called value-added assessment.


  3. @Doug: You can’t correct for external factors without a lot more time and money than student test performance. Certainly not for every teacher and student, every year. There are also a number of ways to game the system, from all sides.

    Taking out the human judgement precludes adjustments for proven experience, potential, and growth. It becomes a quota system – some will never achieve the minimum, despite excellent performance, while others will max out the scale with little effort, and not bother to aim higher.

    These are people issues. When people are reduced to numbers, it no longer adds up.


  4. Hi Scott,

    Since you referenced me I am sharing the note that I emailed to you. I do have a tremendous amount of confidence in the teachers and administrators in Iowa’s schools. I believe they could help resolve this issue in a positive way.
    School funding for the coming school year is being held up until the state policy is decided, and it is very important that both political parties find some compromise that makes sense or the results could be devastating to the students, teachers, and administrators in many of our schools.
    Here is my email response to your email:

    I believe that if Iowa’s teachers and administrators had the opportunity to help develop a fair and appropriate way to include student learning indicators in teacher and administrator evaluations, it can be done well.

    It has the potential to be much better than what has happened in other states with the concerns on which you are focusing.

    With the requirement for a waiver from NCLB to have this component, we cannot ignore the importance of removing the burdens of that ineffective federal mandate.

    Tom Narak
    Government Relations Director

    • Thank you, Tom, for interacting here in this space. I am including below my response to your above email so that my readers can see the full exchange that we had…

      With due respect, Tom, this is a pretty blind faith – and non-evidence-based belief – that we somehow ‘can do better,’ isn’t it? Do we believe that our educators, researchers, and policymakers are smarter and more capable than those in other states? I absolutely adore our folks here in Iowa but I’m not willing to fool myself that we live in Lake Wobegon…

      The fact that numerous other states/researchers have tried to incorporate student learning outcomes (aka ‘student test scores’) into educator evaluations but have failed miserably should be of great concern to us. Until we can devise stable rating systems in which volatility and resultant unfairness are greatly reduced, it’s unethical and inappropriate to devise and advocate for such legislation. Given the resultant harmful impacts on teachers, this should be more alarming to legislators and SAI than it apparently is. As I noted in my research summary, at best the research supports carefully-designed, low-stakes pilot projects.

      And all of this leaves aside the issue of whether NCLB waiver requirements should be driving our state educational policy…

      All my best, SCOTT

      [I’ll conclude here by noting that many of us do NOT think that NCLB waiver requirements should be driving Iowa educational policy or SAI legislative advocacy. As someone said to me recently, SAI should not be ‘willing to craft bad policy in order to meet bad policy.’]

  5. Randy Richardson, April 28, 2013 at 9:20 pm

    Scott has already done a great job of referencing the research so I’ll avoid rehashing that information. I spent 19 years in the classroom and still maintain close contact with many of my former students. One of those students said that the greatest gift I gave him was the ability to think. Another told me that I was his favorite teacher because I could call every student by name. None of those things show up on standardized tests. I’m sure my former administrators would tell you what a pain I was since I served as the chief negotiator for the local association for many years and since I required the district to develop a policy on the teaching of controversial issues, but my kids did learn. I also worked as a coordinator for talented and gifted programs and took time to work with Talent Search kids to teach them how to take the ACT. My point is that the vast majority of the things I did that were positive for students weren’t reflected on standardized tests. I don’t know of a single effective teacher who would disagree with this. I do know that some administrators want the state to get a waiver for NCLB. I don’t understand why we would pursue bad policy in our evaluations just to get federal approval that would result in waiving the restrictions created by NCLB.

  6. I read the article and the reference to the bell curve is one of the items in the article that concerned me. “In an ideal system, teacher effectiveness based on student scores across a school would look like a bell curve, with 20 percent or so being highly effective, 65 percent being effective and a tail end of 15 percent of the teachers being deemed not as effective. That bell curve could then be used to help administrators evaluate their teaching corps.” That is not a valid use of the bell curve. The number of teachers within a school is too small a data set to distribute over a bell curve.

  7. I blogged about the article shared in the QC Times that mentions these topics. Thank you for standing up on these issues. Here are my two cents’ worth, not that it means much.

  8. Scott,

    I just wanted to thank you for pulling all this together. We have equally confident legislators pushing for equally unreliable systems for teacher evaluation in North Carolina — and while I’ve always felt like VAM was unfair, I never had easy access to the research to prove it.

    I’ll be sharing this widely.

    BTW: Just posted a bit on how high-stakes testing and VAM scores are going to change the work I do with students:

    That’s a completely DIFFERENT reason that these kinds of policies are bad for education — in our attempt to “make the grade,” however impossible that is, teachers are changing their instruction in a really ugly way.

    And that includes me.



  9. Can you tell me where the states came up with the idea that 50% of your evaluation should come from student performance scores on state-mandated tests? Is it in RttT or was it from ALEC?

    • Carrie, I don’t know. Sorry. I am pretty sure that a fixed percentage is not specified in Race to the Top requirements. I’m not sure if ALEC is advocating a specific percentage…
