Teacher Evaluation Pt. 1: A Good Idea Gone Bad?

By Scott Marion | National Center for the Improvement of Educational Assessment | January 3, 2019

"The push towards a more centralized approach grew out of the belief that credible and consistent evaluations would better reveal inequities in teacher quality... However, the new approach didn’t deliver the results we imagined."

Let’s be honest about state-mandated teacher evaluations: they have not fulfilled the policy promises of increased teacher quality and improved student learning, and high-quality teachers remain inequitably distributed across states and districts.

I’ve spent significant time over the past 10 years helping design and implement state-controlled educator evaluation systems – and I must admit these policies have not worked as many had hoped. Does that mean we should give up on teacher evaluation? No, but we need to attend to the many lessons learned as we rethink educator support and evaluation.

A Very Brief History
The Race to the Top competition and the waiver authority exercised under the No Child Left Behind Act by the Obama-Duncan administration shifted control of educator evaluation systems from districts to states. The push towards a more centralized approach for evaluating teachers grew out of policymakers’ belief that credible and consistent evaluations would better reveal inequities in teacher quality. Another factor contributing to increased state control was the advent of sophisticated statistical models for evaluating “student growth” (Brill, 2011). Because these models generally rely on large-scale data, it made sense to apply the results in similar ways across a state. However, the new approach didn’t deliver the credibility and consistency we imagined.

New Data, Big Hopes
State-led teacher evaluation systems designed to meet the Race to the Top and waiver rules generally included two major components: teacher practices and student “growth.” Student growth was required to be weighted “significantly” in the evaluation, interpreted by most states as somewhere between 20% and 50% of the overall rating[1]. Thanks to Race to the Top, an increase in available data allowed researchers to better estimate the long-term effects of having higher-quality teachers. A well-publicized study of 2.5 million children by Raj Chetty and colleagues (2011) found a strong relationship between having high-performing teachers and later student success: students with these teachers were “more likely to attend college, attend higher ranked colleges, earn higher salaries, live in higher SES neighborhoods, and save more for retirement.” The challenge of incorporating measures of student learning into teacher evaluations also spurred related improvements. We learned how to design and implement student learning objectives in ways that could support improvements in instruction, and we designed better approaches to accountability for teachers in classes without state test data.

On the teacher practice side of the equation, many long-standing rating tools and processes – such as those developed by Charlotte Danielson, Robert Marzano, and Kim Marshall – received much more attention as principals were expected to improve the accuracy, effectiveness, and consistency of their ratings of teacher practices.

What Went Wrong?
In spite of these gains, the evidence seems pretty clear that state-led teacher evaluation systems failed to achieve their policy aims. In fact, one of the key problems was that the policy aims were often unstated or unclear. Were policymakers interested in formalizing the system so it would be easier to remove very low-performing teachers? Or were they interested in a system that could support improvements in teacher quality across the board? Most state plans tended to indicate the latter, even though the results offer at best weak support for the former.

In other words, after years of intensive design work and implementation support, the distribution of teacher ratings appears to differ only slightly from those found in the Widget Effect (Weisberg, et al., 2009). That report documented that teacher evaluation methods failed to identify truly exceptional teachers, did not produce specific feedback to improve teacher quality, and did not result in consequences for poor-performing teachers; it concluded that, under typical teacher evaluation systems, “excellence goes unrecognized and poor performance goes unaddressed.” To be fair, recent analyses (Kraft and Gilmour, 2017) show more variability than previously documented, except in the lowest category, where states report remarkably few teachers (generally less than 1%).

Beyond the lack of clarity, state-led teacher evaluations suffered from many technical problems. Much has been written about the challenges of employing value-added or student growth percentile models to evaluate individual teachers[2]. There is no question this is a serious issue, but it pales in comparison to the insurmountable hurdles of evaluating “student growth” in non-tested subjects and grades (NTSG). Approximately 70 to 80 percent of educators teach in non-tested subjects or grades, and trying to attribute changes in student achievement to these teachers in comparable ways across a state is nearly impossible (Marion, et al., 2012). More on that in the Center for Assessment report here.

Principal-Proofing
Of all the shortcomings, perhaps the most egregious was that states—following federal guidelines—appeared intent on creating mechanistic, “principal-proof” evaluation systems. This was about as effective as “teacher-proof” curriculum: it cannot be done. While the intention to limit personal bias is noble, principals are critical to evaluating their teachers; they provide at least half of the evaluation through teacher practice ratings. Despite factors that could bias their ratings – school culture, labor pools, union battles – there is growing evidence that principals can differentiate teacher practices better than their official ratings suggest (Kraft and Gilmour, 2017). However, principals must work with their teachers every day and are often unwilling to risk giving poor ratings unless they are absolutely sure of them or have other corroborating information about the teacher.

So, here we are. The state systems are not working for either teachers or students. But there is some hope. In my next blog, I offer suggestions for moving forward – chiefly at the local district level – about why and how we should continue to support and evaluate teachers in order to improve the quality of student learning.

Scott Marion is a national leader in designing innovative and comprehensive assessment systems to support both instructional and accountability uses. He is the Executive Director of the National Center for the Improvement of Educational Assessment and coordinates and/or serves on five district and state Technical Advisory Committees for assessment, accountability, and educator evaluation.

Comments

  1. A major problem with tying teacher evaluation to student test results is that it totally dismisses any variation in student abilities or backgrounds. This is naive at best and dishonest at worst – dishonest because it knowingly presumes an untruth: that all students are alike.
