Student achievement and teacher evaluations: The math doesn’t add up?

1+1=3

Many states are applying for NCLB waivers. But they are adopting teacher evaluation criteria that are statistically, professionally, and morally inappropriate. As Education Week notes:

Federal officials say they have generally approved systems in which student growth counts for between 20 percent and 50 percent of a teacher’s evaluation. But also acceptable is a “trigger” mechanism, like one in Arkansas, where a teacher can’t be rated as effective if he or she fails to meet expectations for student growth.

Another acceptable method is a matrix system, like one in Massachusetts, in which student growth doesn’t receive a specific weighting but is coupled with other measures, such as unannounced teacher observations.

This would be fine if 20% to 50% (or more) of student achievement could be attributed to teachers. But decades of peer-reviewed research show that teachers are responsible for 10% to 15% of student achievement at best. The remaining influences on student achievement are other school factors (another 5% to 10%), non-school factors (60% or so), and random error (about 20%). As Matthew Di Carlo states:

though precise estimates vary, the preponderance of evidence shows that achievement differences between students are overwhelmingly attributable to factors outside of schools and classrooms.

Let’s simplify this even further:

  1. Decades of research show that teachers are responsible for 10% to 15% of student achievement
  2. State laws hold teachers responsible for 20% to 50% (or more) of student achievement
  3. Teachers are thus held responsible for 5% to 40% (or more) of student achievement over which they have NO CONTROL, and under so-called ‘accountability’ schemes negative consequences ensue
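Item 3 is interval arithmetic on the two ranges above. The smallest gap pairs the lowest evaluation weight with the highest research estimate, and the largest gap pairs the highest weight with the lowest estimate. Here is a minimal Python sketch that takes the post's figures at face value and, like the list, treats a share of achievement and a share of an evaluation as directly comparable:

```python
# Ranges are (low, high) percentages taken from the list above.
TEACHER_SHARE = (10, 15)      # share of student achievement attributed to teachers
EVALUATION_WEIGHT = (20, 50)  # share of the evaluation tied to student growth

def excess_range(weight, share):
    """Return the (smallest, largest) amount by which weight exceeds share."""
    smallest = weight[0] - share[1]  # lowest weight minus highest share
    largest = weight[1] - share[0]   # highest weight minus lowest share
    return smallest, largest

print(excess_range(EVALUATION_WEIGHT, TEACHER_SHARE))  # (5, 40)
```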

Does anyone want to argue that this is fair or reasonable or valid?

This can’t be said enough: It is morally inappropriate (and probably illegal) for policymakers to evaluate teachers and hold them ‘accountable’ for factors beyond their control. But that’s exactly what appears to be happening in state after state after state.

In Iowa, lawmakers and our new Commission on Educator Leadership and Compensation are working together over the next year to formulate teacher evaluation criteria. Even if we somehow become the first state in the nation to overcome all of the other statistical volatility and operational unreliability issues associated with tying teacher evaluations to numerical student learning outcomes, will we do what’s right and ensure that the student achievement component of teacher evaluations is at most 10% to 15%? If we do, will the federal government even let us? If we don’t, how long until the first due process lawsuit is filed?

[UPDATE: Be sure to see the 4 scenarios below. Which seems most fair to you?]

Image credit: 1+1=3, Austin Kleon

24 Responses to “Student achievement and teacher evaluations: The math doesn’t add up?”

  1. Scott
    I have been grappling with how to explain PA’s system to our teachers when they return for a new school year without getting them frustrated, fearful, and furious. PA’s formula is terribly convoluted and confusing – and IMHO unsound. In addition, the focus on our poorly constructed state assessments and on using Benchmark assessments of the same ilk to gather “useful data” is enough to drive us all mad – especially when we know this is not the type of education our kids need! It is terribly disturbing!

  2. I’m sorry, but is the argument here that because 10-15% of student achievement can be attributed to teachers, 10-15% of the teacher evaluation should be based on student performance?

    If yes, which I’m pretty sure is the case, then it follows that you’re arguing teacher evaluations should mirror the factors associated with student achievement. So the rest of a teacher’s evaluation would then be:

    “other school factors (another 5% to 10%), non-school factors (60% or so), and random error (about 20%)”

    I’m not going to argue that student achievement should or shouldn’t drive the majority of the teacher’s evaluation, but you’re improperly transferring percentages that should not be matched up point for point.

    • No, the rest of the evaluation comes from observations, non-teaching assignments or extra work (i.e. they do their extra stuff on time and correctly).

  3. Thanks for the comment, Jason. What I’m saying here is that teachers should not be held ‘accountable’ for factors over which they have no control.

    I believe there are 4 possible scenarios…

    SCENARIO 1: EXPECTED GAINS & TEACHER EVALUATION BOTH MIRROR STATISTICAL RESPONSIBILITY

    State expects Student to make 7 points of growth this year. Teacher is, statistically speaking, generally responsible for about 1 point of that growth. State holds Teacher ‘accountable’ for that 1 point of growth and makes that component worth 1/7 of her evaluation.

    SCENARIO 2: NEITHER EXPECTED GAINS NOR TEACHER EVALUATION MIRRORS STATISTICAL RESPONSIBILITY

    State expects Student to make 7 points of growth this year. Teacher is, statistically speaking, generally responsible for about 1 point of that growth. State holds Teacher ‘accountable’ for 3 points (or more) of growth and makes that component worth 3/7 of her evaluation.

    SCENARIO 3: EXPECTED GAINS MIRROR STATISTICAL RESPONSIBILITY BUT TEACHER EVALUATION DOES NOT

    State expects Student to make 7 points of growth this year. Teacher is, statistically speaking, generally responsible for about 1 point of that growth. State holds Teacher ‘accountable’ for that 1 point of growth and makes that component worth 3/7 of her evaluation.

    SCENARIO 4: TEACHER EVALUATION MIRRORS STATISTICAL RESPONSIBILITY BUT EXPECTED GAINS DO NOT

    State expects Student to make 7 points of growth this year. Teacher is, statistically speaking, generally responsible for about 1 point of that growth. State holds Teacher ‘accountable’ for 3 points (or more) of growth and makes that component worth 1/7 of her evaluation.

    Which scenario seems most fair to you? I believe that the reciprocal mirroring in Scenario 1 is most fair. I believe that Scenario 2 best resembles what’s happening in most states.
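    The four scenarios differ on just two switches: how many points the teacher is held accountable for, and how much the growth component weighs. The small Python sketch below tabulates them. It uses exact fractions so that 1/7 is not rounded, and it also checks each weight against the 20% to 50% band that federal officials reportedly approve, per the Education Week quote in the post:

```python
from fractions import Fraction

EXPECTED_GAIN = 7   # points of growth the state expects
TEACHER_POINTS = 1  # points statistically attributable to the teacher

# scenario number -> (points the teacher is held accountable for, evaluation weight)
SCENARIOS = {
    1: (1, Fraction(1, 7)),
    2: (3, Fraction(3, 7)),
    3: (1, Fraction(3, 7)),
    4: (3, Fraction(1, 7)),
}

def describe(points, weight):
    """Return (gains mirror share, weight mirrors share, weight within 20-50%)."""
    share = Fraction(TEACHER_POINTS, EXPECTED_GAIN)
    gains_mirror = points == TEACHER_POINTS
    weight_mirrors = weight == share
    in_federal_band = Fraction(1, 5) <= weight <= Fraction(1, 2)
    return gains_mirror, weight_mirrors, in_federal_band

for number, (points, weight) in SCENARIOS.items():
    gains_ok, weight_ok, in_band = describe(points, weight)
    print(f"Scenario {number}: gains mirror={gains_ok}, weight mirrors={weight_ok}, "
          f"weight={float(weight):.1%}, within 20-50% band={in_band}")
```

    Only the 3/7 weights (about 42.9%) fall inside the reported federal band. The 1/7 weights (about 14.3%) fall below it.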

    Also, the rest of a teacher’s evaluation shouldn’t be based on other student achievement factors that are outside her control (as you state in your comment). That would make student achievement 100% of the teacher’s evaluation. We should base the rest of a teacher’s evaluation on other non-student achievement factors that are within her control.

    • Scott,

      I appreciate the point I believe you’re trying to make: that judgments of teacher performance should not be driven primarily by measures of student achievement. What I am saying is that improper use of numbers is, dare I say, dangerously irrelevant.

      You argue that “State laws hold teachers responsible for 20% to 50% (or more) of student achievement.” Based on what you have written here, this is not true. What is true is that 20-50% of the teacher’s evaluation is based on student achievement. The student’s achievement and the teacher’s evaluation are two very different things.
      The statement you make, “Decades of research show that teachers are responsible for 10% to 15% of student achievement,” is also misleading. The 10-15% is not the share of the student’s total achievement associated with the quality of the teacher, but the share of the variance in student performance that can be attributed to a teacher. This is a subtle but important difference. It does not mean that if we expect a student to improve 7 points, a teacher is responsible for 1 point of that achievement; it means that if student achievement varies by, say, 10 points from one student to the next, 1-1.5 points of that difference could be attributed to the teacher.
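      Jason's distinction between a share of each student's gain and a share of the variance across students can be made concrete with a toy example. Everything below is hypothetical. There are four students with a baseline gain of 7 points. Each has a teacher effect of plus or minus 1 point and a non-teacher effect of plus or minus 3 points. The two effects are built to be uncorrelated so that their variances add:

```python
from statistics import pvariance

BASELINE_GAIN = 7                # hypothetical average growth, in points
teacher_effect = [1, 1, -1, -1]  # hypothetical: two students have the stronger teacher
other_effect = [3, -3, 3, -3]    # hypothetical non-teacher factors, uncorrelated with the above

gains = [BASELINE_GAIN + t + o for t, o in zip(teacher_effect, other_effect)]

def variance_share(component, total):
    """Fraction of the total variance explained by one component.

    Meaningful here only because the components were built to be uncorrelated,
    so their variances sum to the variance of the total.
    """
    return pvariance(component) / pvariance(total)

print(gains)                                  # [11, 5, 9, 3]
print(variance_share(teacher_effect, gains))  # 0.1
```

      Here the teacher accounts for 10% of the variance in gains. That does not make 10% (or 1/7) of any individual student's gain "the teacher's." It only says how much of the spread between students lines up with which teacher they had.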

      So, of your scenarios (as presented), yes, numbers 1 and 3 are equally fair given the data, even if I prickle a little at how you are interpreting it. There is absolutely no reason that the proportion of the student’s overall achievement attributable to quality of teaching should mirror the proportion of what the teacher’s evaluation is made up of. There is a proportion of the variation in student achievement that can be attributed to the teacher, and that proportion is what the teacher should be evaluated on. The question of what proportion of the evaluation should be based on student achievement is entirely separate from the question of what proportion of student achievement can be attributed to teacher quality. Of the 4 scenarios you present, only #1 and #4 (where the growth component is worth 1/7, about 14%, of the evaluation) are excluded by “Federal officials say they have generally approved systems in which student growth counts for between 20 percent and 50 percent of a teacher’s evaluation.”

      From skimming the studies on student achievement available in the links you’ve provided, I think it is clear that measuring achievement is very difficult, i.e., there is about 20% error in measurement. It also seems clear that teachers do have a meaningful impact on student achievement. For example, one of the major conclusions of the Hanushek 1998 paper (cited on the Shanker Blog that you link to) is that even after considering all other factors that a school can influence, “…differences in teacher quality would swamp all other school inputs.” From this, I don’t think it would be hard to argue that a large proportion of teacher evaluations should be based on how a teacher impacts student achievement.

      I personally believe that measures of student achievement aren’t particularly valuable in any case. There are plenty of questions about how SAT scores correlate to Freshman GPA for example, and whether or not GPA is really a good measure in any case. I would very much like to see a better case made to minimize how we value measures of student achievement, both for the students and in evaluating schools and teachers. I think you’ll rally the troops who already agree with you by saying the numbers don’t add up, but I also think the argument you’ve made is logically flawed. I’m asking you to provide me with a rationale for why the proportion of variance in student achievement attributable to quality of teaching should mirror the proportion of a teacher’s evaluation.

      Now, with all of that said, I actually agree most with what Steve Peterson has said here. I don’t think that evaluation will do much to improve individuals or systems. Systems need to be empowered to remove individuals who are clearly not performing, and I have faith that this can be done without a lot of statistical hoop jumping and measurements of student achievement. It isn’t that hard to tell who is competent, dedicated and hard-working. Resources should be prioritized to support those competent and dedicated individuals rather than evaluating them.

      • Hi Jason,

        Thank you for taking the time to leave me this long, thoughtful comment. I am trying hard to be careful with my numbers (indeed, so that my argument is not ‘dangerously irrelevant’), which is why I added my 4 scenarios in my follow-up comment and also linked to them in my original post. A few thoughts in response…

        1. You are correct that in my post I used ‘student achievement’ to stand for ‘student achievement growth.’ While the technical differences that you point out hold true, for practical purposes I don’t think that it matters since all policymakers care about is whether there is student growth or not in a given year and for how much of that we should hold teachers ‘accountable.’

        2. Perhaps I’m missing something in the literature but I think that most VAM systems are modeling statistically-adjusted test score gains of individual students. If the dependent variable is the yearly growth of students – which is then adjusted with various, often arcane, incomprehensible statistical ‘controls’ and weights – that is a focus on the change in test scores of each student, not the variance from student to student, no? See the original quote from Education Week, for example: Policymakers are interested in what proportion of students’ test score growth (or lack thereof) can be attributed to their teachers.

        3. Most importantly, if I am reading your 2nd and 4th paragraphs correctly, I understand what I think is your biggest beef with my original post. I realized that was unclear too, which is why I added my 4 scenarios. When you say “There is absolutely no reason that the proportion of the student’s overall achievement attributable to quality of teaching should mirror the proportion of what the teacher’s evaluation is made up of,” I think you’re arguing for the defensibility of Scenario 3 and a careful, reasoned, principled case could perhaps be made for that scenario. I disagree and am arguing for Scenario 1 because I think that a sense of proportionality is exactly what’s warranted here. When I see what policymakers are doing on this front, I wouldn’t apply the words ‘careful’ or ‘reasoned’ to what is occurring. An overweening emphasis on teacher ‘accountability’ is leading to grave political interpretations and misuses of student bubble test scores. I believe that as educators and citizens we should try and hold that in check as much as possible. Proportionality is also warranted because there are numerous other things on which we should want teachers to focus besides student outcomes on fixed-response assessments of low-level mental work. The greater emphasis that we place on this one aspect of learning and teaching, the more it skews everything else in the system.

        4. Teacher inputs do indeed swamp all other school inputs. But school inputs are dwarfed by non-school inputs by a factor of 3 to 1. So if we look at overall student achievement (growth), approximately 20% is attributable to school factors, 60% to non-school factors, and 20% to random error (roughly speaking; this differs a bit from study to study). Of the 20% attributable to school factors, 1/2 to 3/4 of that chunk is due to teachers (so 10% to 15% of overall achievement growth) and 1/4 to 1/2 (5% to 10% of overall achievement growth) is due to other school factors such as administrators, demographics of other students in the building, culture/climate, etc. It would be one thing if policymakers were saying, “Look, teachers, we understand that you only have a very limited impact on overall student achievement (growth). In fact, most of what impacts students’ overall achievement (growth) is due to factors beyond your control and even the school’s control. However, for the very limited portion of student learning outcomes for which you’re responsible, we’re going to make that a large percentage of your yearly evaluation because you’re the most important school factor that we have.” But that’s not what policymakers are saying, which is why, again, I am arguing for a sense of proportionality and Scenario 1, not Scenario 3.

        5. Ultimately you and Steve Peterson are both correct that much larger, systemic issues are present here. Teacher ‘accountability’ systems are going to have very limited impacts on the success or failure of either students or school systems. But they’re political talking points right now that are having very detrimental impacts on public opinion, teacher morale, the future of the teaching profession, and, indeed, the very notion of public, common schools, which is why we have to fight to ensure that, if such systems are going to be put into place, they at least be done so in careful, measured, reasoned, defensible ways. Right now we are seeing anything but that.
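        The arithmetic in point 4 can be checked in a few lines of Python. The percentages are the rough figures given there, which, as noted, vary from study to study. Using Fraction keeps the 1/2 and 3/4 splits exact:

```python
from fractions import Fraction

# Rough figures from point 4 (they vary from study to study).
SCHOOL = 20       # % of overall achievement growth attributed to school factors
NON_SCHOOL = 60   # % attributed to non-school factors
ERROR = 20        # % attributed to random error
TEACHER_PART = (Fraction(1, 2), Fraction(3, 4))  # teachers' slice of the school share

assert SCHOOL + NON_SCHOOL + ERROR == 100  # the three buckets cover everything

def split_school_share(school, teacher_part):
    """Return (teacher range, other-school range) as % of overall growth."""
    low, high = teacher_part
    teachers = (school * low, school * high)
    others = (school * (1 - high), school * (1 - low))
    return teachers, others

teachers, others = split_school_share(SCHOOL, TEACHER_PART)
print([int(x) for x in teachers])  # [10, 15]
print([int(x) for x in others])    # [5, 10]
```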

        I hope that my responses are clear and address some of what you (and I) thought were some of my ‘logical flaws’ in my original post. Thanks again for contributing to the dialogue.

        • Scott,
          Thank you again for another timely and thoughtful response; the dialogue is likewise appreciated.
          Every time I’ve used student achievement or student performance, my intention has been to mirror the measures in the studies you’ve referred to. Those I have looked over examine the change in a student’s score on some sort of standardized test. Personally, I see no difference in calling this achievement, growth, performance, aptitude, or any combination thereof. So, yes, for practical purposes all that matters is what this 10-15% variance in student “pick your terminology” actually means.
          Value-added models take students’ past scores, see how well they predict future scores, and adjust for a number of variables depending on the model. The dependent variable, then, is the change in the student’s test score, and what is of interest is the variance in that change that is attributable to teaching. We can then use those changes to compare one student with another, the students taught by Teacher A with those taught by Teacher B, students in District 1 with those in District 2, and so on. As the link provided by Stuart Buck shows, this 10-15% variance can lead to a very meaningful change, i.e., a teacher having a classroom achieve 66% proficiency on an exam as opposed to 34%. In other words, your example treats 1 out of 7 points of achievement as attributable to a teacher. The proper interpretation is that if scores varied overall by as much as 100 points, a good teacher could provide as much as 10-15 points of improvement.
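          The procedure described above can be sketched in a deliberately stripped-down form. First, fit one line predicting current scores from prior scores across all students. Then average each teacher's residuals (actual minus predicted). The records, teacher labels, and single prior-score predictor are hypothetical. Real value-added models add many more controls, multiple prior years, and statistical adjustments that are not shown here:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, current-year score).
records = [
    ("A", 40, 52), ("A", 55, 66), ("A", 70, 80),
    ("B", 45, 50), ("B", 60, 62), ("B", 75, 78),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def teacher_value_added(records):
    """Average residual (actual minus predicted current score) per teacher."""
    intercept, slope = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(res) for teacher, res in residuals.items()}

print(teacher_value_added(records))
```

          With these invented numbers, Teacher A's students beat the prediction on average and Teacher B's fall short by the same amount. The six least-squares residuals sum to zero and each class has three students, so the two averages cancel.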
          We are in agreement that teachers should not be evaluated for factors beyond their control; as such, your scenarios 2 and 4 do not meet my standards, and if I were feeling the need for hyperbole I might even venture to borrow your words and say “It is morally inappropriate (and probably illegal) for policymakers to evaluate teachers and hold them ‘accountable’ for factors beyond their control.”
          You are absolutely correct; my main concern is that there is absolutely no reason to mirror the proportion of variance in student achievement to the proportion of the total teacher evaluation. I agree that focusing on teacher accountability is not a productive and positive direction, but you are fighting the right battle on the wrong front with the wrong weapon. Proportionality itself is not warranted, and it puts you in a precarious situation. If we agree that teachers can only be evaluated on what can be controlled, then a proportionality argument could follow: because the teacher affects student achievement more than any other factor under the school’s control, the teacher’s effect on student achievement should be a larger portion of the teacher’s evaluation than any other component. That is why I feel your argument is dangerously irrelevant. If you cherry-pick the 10-15% figure out of the total variation to assign it proportionally to the teacher evaluation, then your opponents are equally entitled to argue that the teacher accounts for 50-75% of the variance that the school has control over, ergo 50-75% of the teacher evaluation. I sincerely hope that we do not see that number being utilized.
          The school system should focus on what the school system is capable of and tasked with controlling. However, if you’re arguing for this proportionality, another logical extension would be that 10-15% of all school resources should be focused on improving teacher quality to the end of improving achievement scores. I’m willing to speak for you here, and say that we would both bristle at that idea. I think it’s much easier to see how big of a deal 10-15% can be in those terms. Again, would it then be ok to say teachers must spend 10-15% of their classroom time directly focusing on improving performance on standardized exams? Again, I say no. The VAMs show that quality teaching is the best thing that schools can institute to improve performance, they say nothing about what the teacher evaluation should be composed of and nothing about whether or not teacher evaluations, however they are constituted, improve teacher performance. And most importantly, they say nothing about whether or not performance on standardized tests actually matters at all. Those are the fronts you should be fighting on, not a contrived proportionality that pushes the numbers in a direction that presently favors your position. I ask you, if current evaluation systems placed 5% of the weight of the teacher’s evaluation on standardized test performance by students, would you still be arguing to shift the number to 10-15% for the sake of proportionality?

          • I’d like to change

            “The VAMs show that quality teaching is the best thing that schools can institute to improve performance”

            to read:

            “VAMs show that quality of teaching is the largest measured factor under school control that has an effect on performance”

            I’d hate to exclude the possibility that schools might come up with progressive solutions.

      • @Jason
        You’re also missing the 800-lb gorilla in the room, which is “Are the tests valid?” and “Do they actually measure student performance?” In many states the test questions and answers are not released, nor are the methods used for determining “Passing” scores. So even if there is a difference in the Proficiency rating, is that actually any reflection on the quality of teaching (given the limited impact teaching has on test scores)? Anyone who puts any faith in standardized test scores has probably not looked at the actual tests.

        • I’ll quote myself:

          “I personally believe that measures of student achievement aren’t particularly valuable in any case. There are plenty of questions about how SAT scores correlate to Freshman GPA for example, and whether or not GPA is really a good measure in any case. I would very much like to see a better case made to minimize how we value measures of student achievement, both for the students and in evaluating schools and teachers.”

          My contention is with Scott’s claim that the numbers don’t add up, and that deviation from the proportionality he is lobbying for is “morally inappropriate (and probably illegal).”

          To cut to the heart of the matter, I’ll ask you the same question I asked at the end of my last post:

          If current evaluation systems placed 5% of the weight of the teacher’s evaluation on standardized test performance by students, would you still be arguing to shift the number to 10-15% for the sake of proportionality?

          Taking the right side of an argument does not make the inappropriate use of a percentage the right thing to do.

          If the goal is to convince the proponents of standardized testing that standardized testing is a bad thing, then you need data and analysis that deal with the effectiveness of the testing. The VAMs and the variance in student achievement that Scott’s original post refers to do not do this.

          As an additional question, if we are to rely on proportionality of components of the teacher’s evaluation to the proportion of some other measure, what other measures are you going to use to fill out the remaining 85-90% of the teacher evaluation?

          It is precisely because I feel that standardized tests are overvalued that I am bothered by the way in which Scott is trying to make this argument here. As I’ve stated previously, and as Scott himself pointed out, if we use the same logic of the need for proportionality then the argument could be made from these numbers that 50-75% of teacher evaluations should be based on student achievement as measured by standardized exams.

          • My point is GIGO (Garbage In, Garbage Out). If the tests are not valid, arguing what percentage of an evaluation they should be makes no sense! Scott seems to be trying to minimize the damage from a bad idea, rather than just making the point that it’s a bad idea. You wouldn’t want an invalid medical test used to determine your treatment, you would not merely ask that it be given a smaller role. Why would we accept it in education?

          • @ Bill

            RE: GIGO

            All I’m trying to say is that the “math doesn’t add up” argument doesn’t hold water. It’s going to be incredibly easy for someone arguing in favor of “accountability” to dismiss this part of Scott’s argument.

            Standardized tests are a polarizing issue. Making ill-formed arguments to suit your needs does not help make progress; it pushes the sides farther apart. Convincing arguments are grounded in good evidence. If the evidence and analysis you put into an argument are garbage, what do you think you’re going to get out?

            Scott, Bill, and anyone else who thinks this “math doesn’t add up” argument is a great one, please answer my question:

            If current evaluation systems placed 5% of the weight of the teacher’s evaluation on standardized test performance by students, would you still be arguing to shift the number to 10-15% for the sake of proportionality?

  4. Playing Devil’s Advocate here: wouldn’t historical data capture a lot of the external factors? Assuming that, historically speaking, the class/content being taught in your school has a failure rate of 50%, couldn’t that help establish a benchmark against which the teacher would be evaluated? A teacher with a 60% failure rate (long term) would be less effective than a teacher with a 40% failure rate, again allowing for year/semester variances but looking for an overall trend. Maybe allow for a +/- for each of those external factors (at least the ones that can be measured).
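    Andrew's proposal amounts to pooling several years of a teacher's results and comparing the pooled failure rate with the course's historical rate. Below is a minimal Python sketch with invented counts chosen to reproduce his 60% and 40% examples. It leaves out the per-factor +/- adjustments he mentions, and it does not address the cohort-to-cohort noise raised in the reply below:

```python
HISTORICAL_FAILURE_RATE = 0.50  # course-level baseline from the example above

def pooled_gap(yearly_counts, baseline=HISTORICAL_FAILURE_RATE):
    """Pool (failed, enrolled) counts across years and subtract the baseline.

    A negative gap means fewer failures than the historical norm.
    """
    failed = sum(f for f, _ in yearly_counts)
    enrolled = sum(n for _, n in yearly_counts)
    return failed / enrolled - baseline

# Hypothetical three-year records of (students failed, students enrolled).
teacher_x = [(20, 30), (16, 30), (18, 30)]  # 54 of 90 failed: 60% overall
teacher_y = [(10, 30), (14, 30), (12, 30)]  # 36 of 90 failed: 40% overall

print(round(pooled_gap(teacher_x), 2))  # 0.1
print(round(pooled_gap(teacher_y), 2))  # -0.1
```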

    • Andrew, I think the value-added models that attempt to take into account historical performance do so by accounting for previous student performance (in order to then measure THIS YEAR’S student gains), not teacher performance across cohorts of students. Despite researchers’ best attempts to statistically control for numerous factors, there is a lot of noise in those models, and the reliability/validity of the results is extremely volatile (and thus unfair). More at my VAM page: http://bit.ly/11rP2DP

  5. These are some questions that I am pondering:

    Are the gains that students are supposed to make expected of every student, including students with IEPs for various needs and students at various levels of English language learning? Are these gains expected for each testing area, or is this some sort of average achievement over all testing areas? What if a middle school student is joining your subject area for the first time, for example, taking science for the first time ever (then I suppose the non-FAY rule applies, but….)? What about teachers of classes that aren’t directly tested, such as band, chorus, or art? Would they just get matched up with testing results as a whole, or not at all?

    Something else I wonder about: last year our school celebrated the 5th grade class being 100% proficient in math. This was a joint effort among the two 5th grade teachers, the special ed teacher, the title teachers, and of course the students. I don’t know where they stood in terms of proficiency the previous year, but it wasn’t 100%. I think this is great for our students, but it worries me as a teacher who doesn’t directly have access to these other professionals working with my students. And then, I think it raises the question: how does the evaluation show the individual teacher’s effect size?

  6. Anne, the piece that worries me the most in your situation is what happens the next year, when there are one or two special ed 5th graders who are mainstreamed but just really not cognitively capable of doing 5th grade math (yet?) Do you dump the learning disabled kids someplace else so they don’t wreck your perfect score? Do you leave them in 5th grade and blow the 100% proficiency and the other 10-year-olds all know that little Johnny is the reason we don’t get a pizza party this year because he can’t do math, and the parents all believe the teachers are slacking because the scores went down?

  7. A few questions and some discussion.
    1. Is it really that difficult to tell who is a good teacher and who isn’t?
    2. Can finely-tuned summative evaluation of teachers make schools better?

    I’m a teacher. By all accounts a pretty good one, too. But I think this teacher evaluation piece will not make schools better for kids. Why?

    From my experience, I think it’s actually not that difficult to tell who is doing a good job and who isn’t. Figuring out what to do with that information is the difficult part. By the way, I think the number of teachers who are really poor isn’t all that high. There are many who are mediocre (in the same way that many managers in business are mediocre), but the labor market can’t promise significantly better teaching for lots of reasons; not the least of these being that teaching, like lots of other high-skill occupations, requires putting in those 10,000 hours of reflective “practice” in order to develop high levels of expertise and the costs of high turnover are seriously large.

    Second, teacher evaluation as a driver of educational reform assumes that individual teachers are what make for great school systems. I disagree. Instead, I agree with Fullan and DuFour that systems built on reflection, collaboration, inquiry, and constant, incremental change are what will improve education. Sure, individuals can lead that change (and will always strive to be great despite summative evaluation!), but we are fooling ourselves if we think that we can evaluate ourselves to where we need to be. Focusing so much of our energy and effort on individual summative evaluation beyond the competent / incompetent level creates the illusion that individuals are the problem and the solution, and it undermines the real work that it takes to create cultures of collaboration, reflection, and inquiry.

    • Linda Darling-Hammond famously said, “You can’t fire your way to Finland.”

      One of the biggest problems with ‘value-added’ teacher evaluation models is that the most noise (i.e., statistical volatility and unreliability) is at the ends, precisely the areas targeted by policymakers as they try to implement merit pay for the ‘best’ teachers and punishments for the ‘worst.’ Both the research and practitioner literature are replete with discussions of how these teacher ‘accountability’ models actually aren’t very helpful in identifying struggling teachers and usually have so much noise that there are extremely high percentages of teachers unfairly labeled as ‘unsuccessful’ and extremely high year-to-year variations in which teachers receive that label.

      Instead of fixing the system (Deming, anyone?), policymakers are trying to blame those embedded in the system (because, as those responsible for the system, otherwise they’d have to do something meaningful and difficult). What amazes me is how many educators and educational organizations are willing to be complicit with these schemes.

      To quote Chris Lehmann: “Our schools are structurally dysfunctional places. . . . Let’s stop falling victim to the soft thinking that just finding more ‘great teachers’ and getting rid of all the bad ones is the way to reform education and start asking ourselves, ‘How do we create schools that make it easier for all students and teachers to shine?'” (Beyond the Great Teacher Myth, http://bit.ly/1379Gba)

  8. The notion that teachers account for 10% of the variance has almost nothing to do with the total effect of teacher quality. See http://stuartbuck.blogspot.com/2012/01/dont-believe-defenders-of-teachers.html

    • Thank you for the link, Stuart. I’m struggling with the assertion of “well, teacher quality COULD matter more” versus what in practice actually does seem to matter. There’s wishful thinking and there’s the reality of what we see day-to-day. We can hope all we want that miracle teachers can somehow overcome all of those other non-teacher factors that you minimize but, a few exceptions here and there aside, it’s not happening and won’t happen because we can’t isolate one part of a larger, complex system and expect it to change the entire system.

      We don’t live in Lake Wobegon. We don’t expect any other profession to be staffed 100% by exceptional people. Instead of hoping for the miracle of ‘great’ teachers in every classroom, we should be trying to create systems in which most students and most educators – not just the exceptional ones – can be successful. That’s reality, regardless of statistical possibilities.

      No one’s asserting that teachers are unimportant or that they can’t make an impact on student outcomes. But we can base our policies on realistic expectations or we can base them on wishes. I’d like to stay rooted in reality and work on overall system factors rather than simply isolating and blaming one component.

      • I agree with Scott about making a real difference in learning and teaching by improving systems, not focusing on individuals. Ironically, focusing on systems may be the way to create better individual teachers, too.

        As a teacher, I’m amazed (and dismayed) at all the effort that goes into figuring out exactly how much of what score is attributable to my efforts vs. other people’s efforts. Set aside the question of whether the scores represent important learning, and set aside the real effects the tests have on students taking them year after year. (Here’s a blog post I wrote about how the tests changed some students in my classroom this year: http://insidethedog.edublogs.org/2013/04/13/testing-kids-and-relationships/.)

        I’m not against being evaluated. In fact, like a lot of teachers, I really enjoy it when people talk to me about my teaching. I just don’t put much stock in summative evaluations as an avenue for real change, so I wonder what the point is, other than establishing simple competence or incompetence. To me, it’s clear that pressures other than marginally better teachers inhibit good teaching and learning, pressures that evaluation won’t come close to touching. These include the fragmentation of our workplaces (which the minute grading of schools and teachers only exacerbates), the cultures of our schools, and, not least, the economic realities our students face. Why would we think putting a potentially “excellent” or “better” teacher into a dysfunctional environment is enough to make real change? Can’t the change work the other way, too? Teacher turnover numbers suggest that this might be the case.

        If the top-down, accountability kind of talk were just neutral, it might be more tolerable. Unfortunately, I think it has some pretty important negative side effects (besides on students!) and actually undermines the very kind of system-wide changes that are needed. Just one small example: even in my small, rural school district in Iowa we employ a full-time person whose job is basically data and compliance management. She’s really busy. There is no dedicated position designed to promote and facilitate system-wide collaboration that might help improve teaching and learning. We are responding to pressures from outside.

        I learn the most, and do my best teaching, when I’m embedded in a strong network of empowered people who use what they know to learn more, and tweak from there. The accountability that comes from this kind of system is based on our desire to do our best, coupled with a community that supports and critiques my next, best efforts. That’s stronger accountability (and more hopeful in the long run) than externally driven change can ever be.

        Sorry about being long-winded responding to this post. This is an issue that touches a nerve with me because, from inside, it seems so obviously wrong-headed. Thanks for raising it.

  9. All of you are wasting your time. Student growth counts 100% for evaluations of teachers under these new systems.

    People like numbers. We have known for decades that in-person classroom observations and evaluations are practically good for nothing when principals’ ratings are compared with outcomes on testing: principals are poor judges of teacher quality as measured by student outcomes on standardized tests.

    State departments, and the feds, are aware of this research. They implemented these systems with “multiple measures” on purpose in order to hassle and bash teachers more than they already are. Politicians love this crap because it means a few fewer teachers will be taking funds from taxpayers in pension systems.

    The bottom line is, student growth could be 1% on paper, but in real life, administrators are going to pay special attention to those teachers who bring in low growth scores whether it be their fault or not.

    What’s sad is that these systems are highly unstable, as Darling-Hammond and others have shown. For example, during the pilot years of receiving VAM outcomes, 2009 – 2012 (that’s three years’ worth), the first year I was average, with growth of “0”. The next year, I was horrible, probably the worst in our county, at “-3”. The next year, I was at the top, with a VAM estimate of “+3”.

    Now what the crap is that? How believable is that? This occurs frequently. I think Darling-Hammond showed that a large percentage of teachers will move beyond what is statistically predictable. Also, the study out of Houston and its use of EVAAS showed that teachers felt VAM outcomes were like winning the lottery, and that teachers’ rankings changed greatly when they changed grade levels.

    Glazerman (Mathematica), and others, have been straightforward about the fact that some teachers will be misidentified. Three years of data is supposed to help with this some, but it’s all about the message.

    And the message to teachers is this: reel in high test scores or you’re fired. Campbell’s law shows what will happen: teachers will teach to the test, prep and coach their kids for the test, and, even worse, downright cheat.

    Too many teachers have too-easy access to these tests, via read-alouds to kids whose IEPs demand such service and in other ways that will not be mentioned here.

    We’ve got a HUGE problem on our hands. What we have implemented is a system that values test scores way more than anything else. What do you expect teachers to do? What would anybody do in this case?

    TEACH TO THE TEST. Screw critical thinking. Screw rationalization. Screw answering real world questions and producing real world evidence of learning through authentic means. Screw portfolios – just more stuff to grade. I care about one thing and ONE thing only – TEST SCORES.

    We have ruined our public schools. Or should I say, reformers have ruined our public schools.

  10. …And one more thing: what do you think principals will do when VAM outcomes come back showing that teachers they rated as “Distinguished” are ineffective? What do you think will happen next time?

    This is already happening in FL, where there is evidence that administrators were able to go in after VAM outcomes came back and change their original ratings to bring them into line with the VAM outcomes.

    Administrators will not want to contradict VAM outcomes – they’ll hear it from those above them, at least if it occurs too much.

    This is a messy, messy situation we are in. Shame on Bill Gates and his stack ranking. Shame on Eli Broad for his hatred of teachers and public schools. Shame on the Waltons for hating teachers’ unions. Shame on our unions for accepting bribery in the form of $$$$ in order to go along with this crap.

  11. One way I am exploring the VAM equation for Florida is to look at it like this: A = B + C + D, etc. I know from the Shanker Institute that school scores go right along with poverty. I can’t find how they are getting the school score; they may be using the mu term as a population mean, in which case it would be each student’s score added up and then divided by the number of students. But then each student’s score could be the A, with factors regarding the school itself falling within, say, the D term. I’m trying to see whether the weighting determines which factors can be dropped. 75% of teachers don’t have a test for their subject area. But even for those who do, the student’s score has to depend partly on last year’s teacher “teaching the kid to learn” and so preparing the student for the next teacher. Results have shown that “student growth” itself tracks poverty level, which Florida forbids adjusting for. So I know that A has been shown to go along with poverty. I’m plotting Florida counties to show that each factor goes along with poverty. Without “peer-reviewed” papers, the material from the state never even fully defines its terms. Any help would be appreciated. The new teachers’ contract in Orange County, Florida, for year-to-year teachers uses the VAM, which will fire all the teachers in a high-poverty school within three years. I have 60 days until they vote on it. (There is also a new, conflicting state law that says they can’t use the scores of students a teacher hasn’t taught, so it is very confusing.)
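    For the county-by-county check described above, one simple first step is the correlation between each county’s mean score and its poverty rate. This is a minimal sketch of a plain Pearson correlation, not Florida’s VAM. Treating the school or county score as a simple mean of student scores is only the guess about the mu term mentioned above, and the county names and numbers below are hypothetical placeholders to be replaced with real data.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    # Population mean: add up every score and divide by the count
    # (one possible reading of the "mu" term; an assumption, not the state's definition).
    return sum(values) / len(values)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence has zero variance")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # HYPOTHETICAL placeholder values for illustration only; replace with
    # real county poverty rates and student scores.
    student_scores_by_county = {
        "County 1": [310.0, 295.0, 305.0],
        "County 2": [290.0, 280.0, 300.0],
        "County 3": [270.0, 285.0, 275.0],
    }
    poverty_rate_by_county = {"County 1": 0.12, "County 2": 0.20, "County 3": 0.31}

    counties = sorted(student_scores_by_county)
    county_means = [mean(student_scores_by_county[c]) for c in counties]
    poverty = [poverty_rate_by_county[c] for c in counties]
    print(f"r = {pearson_r(poverty, county_means):.2f}")
```

    A strongly negative r on real data would be consistent with scores tracking poverty, but on its own it would not show which terms of the model could be dropped.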
