Accountability

  • Research On Teacher Evaluation Metrics: The Weaponization Of Correlations

    Written on July 21, 2015

    Our guest author today is Cara Jackson, Assistant Director of Research and Evaluation at the Urban Teacher Center.

    In recent years, many districts have implemented multiple-measure teacher evaluation systems, partly in response to federal pressure from No Child Left Behind waivers and incentives from the Race to the Top grant program. These systems have not been without controversy, largely owing to the perception – not entirely unfounded – that such systems might be used to penalize teachers. One ongoing controversy in the field of teacher evaluation is whether these measures are sufficiently reliable and valid to be used for high-stakes decisions, such as dismissal or tenure. That is a topic that deserves considerably more attention than a single post; here, I discuss just one of the issues that arise when investigating validity.

    The diagram below is a visualization of a multiple-measure evaluation system, one that combines information on teaching practice (e.g. ratings from a classroom observation rubric) with student achievement-based measures (e.g. value-added or student growth percentiles) and student surveys. The system need not be limited to three components; the point is simply that classroom observations are not the sole means of evaluating teachers.

    In validating the various components of an evaluation system, researchers often examine their correlation with other components.  To the extent that each component is an attempt to capture something about the teacher’s underlying effectiveness, it’s reasonable to expect that different measurements taken of the same teacher will be positively related.  For example, we might examine whether ratings from a classroom observation rubric are positively correlated with value-added.

  • Empower Teachers To Lead, Encourage Students To Be Curious

    Written on July 9, 2015

    Our guest author today is Ashim Shanker, a former English Language Arts teacher in public schools in Tokyo, Japan. Ashim has a Master’s Degree in International Education Policy from Harvard University and is the author of three books, including Don’t Forget to Breathe. Follow him on Twitter at @ashimshanker.

    In the 11 years that I was a public school teacher in Japan, I came to view education as a holistic enterprise. Schools in Japan not only imbued students with relevant skills, but also nurtured within them the wherewithal to experience a sense of connection with the larger world, and the exploratory capacity to discover their place within it.

    In my language arts classes, I encouraged students to read about current events and human rights issues around the world. I asked them to make lists of the electronics they used, the garments they wore, and the food products they consumed on a daily basis. I then had them research where these products were made and under what labor conditions.

    The students gave presentations on child labor and modern-day slavery. They debated government secrecy laws in Japan and cover-ups in the aftermath of the Fukushima nuclear disaster. They read an essay on self-reliance by Emerson and excerpts on civil disobedience by Thoreau, and I asked them how these two activists might have felt about the actions of groups like Anonymous, or about whistleblowers like Edward Snowden. We discussed the Milgram Experiment and the Stanford Prison Experiment, exploring how obedience and situational role conformity might tip even those with the best of intentions toward acts of cruelty. We talked about bullying, and shared anecdotes of instances in which we might unintentionally have hurt others. There were opportunities for self-reflection, engagement, and character building – attributes that I would like to think foster the empathic foundations for better civic engagement and global citizenship.

  • Do We Know How To Hold Teacher Preparation Programs Accountable?

    Written on June 30, 2015

    This piece is co-authored by Cory Koedel and Matthew Di Carlo. Koedel is an Associate Professor of Economics and Public Policy at the University of Missouri, Columbia.

    The United States Department of Education (USED) has proposed regulations requiring states to hold teacher preparation programs accountable for the performance of their graduates. According to the proposal, states must begin assigning ratings to each program within the next 2-3 years, based on outcomes such as graduates’ “value-added” to student test scores, their classroom observation scores, how long they stay in teaching, whether they teach in high-needs schools, and surveys of their principals’ satisfaction.

    In the long term, we are very receptive to, and indeed optimistic about, the idea of outcomes-based accountability for teacher preparation programs (TPPs). In the short to medium term, however, we contend that the evidence base underlying the USED regulations is nowhere near sufficient to guide a national effort toward high-stakes TPP accountability.

    This is a situation in which the familiar refrain of “it’s imperfect but better than nothing” is false, and rushing into nationwide design and implementation could be quite harmful.

  • Will Value-Added Reinforce The Walls Of The Egg-Crate School?

    Written on June 25, 2015

    Our guest author today is Susan Moore Johnson, Jerome T. Murphy Research Professor in Education at Harvard Graduate School of Education. Johnson directs the Project on the Next Generation of Teachers, which examines how best to recruit, develop, and retain a strong teaching force.

    Academic scholars are often dismayed when policymakers pass laws that disregard or misinterpret their research findings. The use of value-added methods (VAMS) in education policy is a case in point.

    About a decade ago, researchers reported that teachers are the most important school-level factor in students’ learning, and that their effectiveness varies widely within schools (McCaffrey, Koretz, Lockwood, & Hamilton 2004; Rivkin, Hanushek, & Kain 2005; Rockoff 2004). Many policymakers interpreted these findings to mean that teacher quality rests with the individual rather than the school and that, because some teachers are more effective than others, schools should concentrate on increasing their number of effective teachers.

    Based on these assumptions, proponents of VAMS began to argue that schools could be improved substantially if they would only dismiss teachers with low VAMS ratings and replace them with teachers who have average or higher ratings (Hanushek 2009). Although panels of scholars warned against using VAMS to make high-stakes decisions because of their statistical limitations (American Statistical Association, 2014; National Research Council & National Academy of Education, 2010), policymakers in many states and districts moved quickly to do just that, requiring that VAMS scores be used as a substantial component in teacher evaluation.

  • Trust: The Foundation Of Student Achievement

    Written on May 21, 2015

    When sharing with me the results of some tests, my doctor once said, "You are a scientist, you know a single piece of data can't provide all the answers or suffice to make a diagnosis. We can't look at a single number in isolation, we need to look at all results in combination." Was my doctor suggesting that I ignore that piece of information we had? No. Was my doctor deemphasizing the result? No. He simply said that we needed additional evidence to make informed decisions. This is, of course, correct.

    In education, however, it is frequently implied or even stated directly that the bottom line when it comes to school performance is student test scores, whereas any other outcomes, such as cooperation between staff or a supportive learning environment, are ultimately "soft" and, at best, of secondary importance. This test-based, individual-focused position is viewed as serious, rigorous, and data driven. Deviation from it -- e.g., equal emphasis on additional, systemic aspects of schools and the people in them -- is sometimes derided as an evidence-free mindset. Now, granted, few people are “purely” in one camp or the other. Most probably see themselves as pragmatists, and, as such, somewhere in between: Test scores are probably not all that matters, but since the rest seems so difficult to measure, we might as well focus on "hard data" and hope for the best.

    Why this narrow focus on individual measures such as student test scores or teacher quality? I am sure there are many reasons, but one is probably a lack of familiarity with the growing research showing that we must go beyond the individual teacher and student and examine the social-organizational aspects of schools, which are associated (most likely causally) with student achievement. In other words, all the factors that skeptics and pragmatists might think are a distraction and/or a luxury are actually relevant for the one thing we all care about: Student achievement. Moreover, increasing focus on these factors might actually help us understand what’s really important: Not simply whether testing results went up or down, but why or why not.

  • The Purpose And Potential Impact Of The Common Core

    Written on May 19, 2015

    I think it makes sense to have clear, high standards for what students should know and be able to do, and so I am generally a supporter of the Common Core State Standards (CCSS). That said, I’m not comfortable with the way CCSS is being advertised as a means for boosting student achievement (i.e., test scores), nor with the frequency with which I have heard speculation about whether and when the CCSS will generate a “bump” in NAEP scores.

    To be clear, I think it is plausible to argue that, to the degree that the new standards can help improve the coherence and breadth/depth of the content students must learn, they may lead to some improvement over the long term – for example, by minimizing the degree to which student mobility disrupts learning or by enabling the adoption of coherent learning progressions across grade levels. It remains to be seen whether the standards, as implemented, can be helpful in attaining these goals.

    The standards themselves, after all, only discuss the level and kind of learning that students should be pursuing at a given point in their education. They do not say what particular content should be taught when (curricular frameworks), how it should be taught (instructional materials), who will be doing the teaching and with what professional development, or what resources will be made available to teachers and students. And these are the primary drivers of productivity improvements. Saying how high the bar should be raised (or what it should consist of) is important, but outcomes are determined by whether or not the tools are available with which to accomplish that raising. The purpose of having better or higher standards is just that – better or higher standards. If you're relying on immediate test-based gratification due solely to CCSS, you're confusing a road map with how to get to your destination.

  • Teaching = Thinking + Relationship

    Written on May 5, 2015

    Our guest author today is Bryan Mascio, who taught for over ten years in New Hampshire, primarily working with students who had been unsuccessful in traditional school settings. Bryan is now a doctoral student at the Harvard Graduate School of Education, where he conducts research on the cognitive aspects of teaching, and works with schools to support teachers in improving relationships with their students.

    Before I became a teacher, I worked as a caretaker for a wide variety of animals. Transitioning from one profession to the other was quite instructive. When I trained dogs, for example, it was straightforward: When the dog sat on command, I would give him praise and a treat. After enough training, anyone else could give the command and the dog would perform just as well and as predictably. When I worked with students, on the other hand, it was far more complex – we worked together in a relationship, with give and take as they learned and grew. Regrettably, when I look at how we train teachers today, it reminds me more of my first profession than my second.

    Teaching is far more than a mechanized set of actions. Our most masterful teachers aren’t just following scripts or using pre-packaged curricula. They are tailoring lessons, making professional judgments, and forging deep bonds with students – all of which is far more difficult to see or understand.  Teaching is a cognitive skill that has human relationships at its center. Unfortunately, we typically don't view teaching this way in the United States. As a result, we usually don't prepare teachers like (or for) this, we don’t evaluate them like this, and we don’t even study them like this. In our public discussion of education, we typically frame teaching as a collection of behaviors, and teachers as though they are simply technicians.  This doesn’t just create a demoralized workforce; it also leaves students in the care of well-meaning and hard-working teachers who are, nonetheless, largely unable to meet their students' individual needs – due either to lack of preparation for, or mandates that prevent, meeting them.

  • Measurement And Incentives In The USED Teacher Preparation Regulations

    Written on April 22, 2015

    Late last year, the U.S. Department of Education (USED) released a set of regulations, the primary purpose of which is to require states to design formal systems of accountability for teacher preparation (TP) programs. Specifically, states are required to evaluate annually the programs operating within their boundaries, and assign performance ratings. Importantly, the regulations specify that programs receiving low ratings should face possible consequences, such as the loss of federal funding.

    The USED regulations on TP accountability put forth several outcomes that states are to employ in their ratings, including: Student outcomes (e.g., test-based effectiveness of graduates); employment outcomes (e.g., placement/retention); and surveys (e.g., satisfaction among graduates/employers). USED proposes that states have their initial designs completed by the end of this year, and start generating ratings in 2017-18.

    As was the case with the previous generation of teacher evaluations, teacher preparation is an area in which there is widespread agreement about the need for improvement. And formal high-stakes accountability systems can (even should) be a part of that at some point. Right now, however, requiring all states to begin assigning performance ratings to schools, and imposing high-stakes accountability for those ratings within a few years, is premature. The available measures have very serious problems, and the research on them is in its relative infancy. If we cannot reliably distinguish between programs in terms of their effectiveness, it is ill-advised to hold them formally accountable for that effectiveness. The primary rationale for the current focus on teacher quality and evaluations was established over decades of good research. We are nowhere near that point for TP programs. This is one of those circumstances in which the familiar refrain of “it’s imperfect but better than nothing” is false, and potentially dangerous.

  • Charter Schools, Special Education Students, And Test-Based Accountability

    Written on April 7, 2015

    Opponents often argue that charter schools tend to serve a disproportionately low number of special education students. And, while there may be exceptions and certainly a great deal of variation, that argument is essentially accurate. Regardless of why this is the case (and there is plenty of contentious debate about that), some charter school supporters have acknowledged that it may be a problem insofar as charters are viewed as a large-scale alternative to regular public schools.

    For example, Robin Lake, writing for the Center for Reinventing Public Education, takes issue with her fellow charter supporters who assert that “we cannot expect every school to be all things to every child.” She argues instead that schools, regardless of their governance structures, should never “send the soft message that kids with significant differences are not welcome,” or treat them as if “they are somebody else’s problem.” Rather, Ms. Lake calls upon charter school operators to take up the banner of serving the most vulnerable and challenging students and “work for systemic special education solutions.”

    These are, needless to say, noble thoughts, with which many charter opponents and supporters can agree. Still, there is a somewhat more technocratic but perhaps more actionable issue lurking beneath the surface here: Put simply, until test-based accountability systems in the U.S. are redesigned such that they stop penalizing schools for the students they serve, rather than their effectiveness in serving those students, there will be a rather strong disincentive for charters to focus aggressively on serving special education students. Moreover, whatever accountability disadvantage may be faced by regular public schools that serve higher proportions of special education students pales in comparison with that faced by all schools, charter and regular public, located in higher-poverty areas. In this sense, then, addressing this problem is something that charter supporters and opponents should be doing together.

  • How Not To Improve New Teacher Evaluation Systems

    Written on March 9, 2015

    One of the more interesting recurring education stories over the past couple of years has been the release of results from several states’ and districts’ new teacher evaluation systems, including those from New York, Indiana, Minneapolis, Michigan and Florida. In most of these instances, the primary focus has been on the distribution of teachers across ratings categories. Specifically, there seems to be a pattern emerging, in which the vast majority of teachers receive one of the higher ratings, whereas very few receive the lowest ratings.

    This has prompted some advocates, and even some high-level officials, essentially to deem the new systems failures, since their results suggest that the vast majority of teachers are “effective” or better. As I have written before, this issue cuts both ways. On the one hand, the results coming out of some states and districts seem problematic, and these systems may need adjustment. On the other hand, there is a danger here: States may respond by making rash, ill-advised changes in order to achieve “differentiation for the sake of differentiation,” and the changes may end up undermining the credibility and threatening the validity of the systems on which these states have spent so much time and money.

    Granted, whether and how to alter new evaluations are difficult decisions, and there is no tried and true playbook. That said, New York Governor Andrew Cuomo’s proposals provide a stunning example of how not to approach these changes. To see why, let’s look at some sound general principles for improving teacher evaluation systems based on the first rounds of results, and how they compare with the New York approach.*


DISCLAIMER

This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from shankerblog.org. The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability from damages which may result from the utilization of the information provided herein. The content in the Shanker Blog may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.