Real observations on the classroom

Once in a while, a piece of purely academic research opens a path around political bottlenecks. A paper by Tom Kane, Eric Taylor, John Tyler, and Amy Wooten (just out in the Journal of Human Resources) offers a way out of the student-test-scores-for-teacher-evaluation fight. I’ll talk more about the paper below. For now, the oversimplified summary is that carefully done teacher evaluations based on classroom observation can do an excellent job of picking up the effect of teachers on student outcomes as measured by value-added scores.

To begin, the goal of schools is to produce informed, productive, well-functioning young adults. Research and applied teacher evaluation are relevant insofar as they produce measures that tell us how to distinguish actions that contribute toward reaching this goal from those that don’t.

A good teacher makes a huge difference in moving her students toward the goal. What’s more, the difference in teachers is so great that an experienced educator can size up a teacher after about two minutes of observing a classroom…and almost always be right. This suggests that a wide variety of different evaluation systems might do a good job of sorting teachers. (A related, but distinct, issue is whether evaluation systems provide useful feedback to help a teacher do better. More about this on Friday.)

Economists, and increasingly state legislators, are enamored of value-added measures (VAM) based on improvements in standardized test scores. Count me in (up to a point). It’s not that hard to measure whether a student can read and cypher. And there isn’t any doubt that “narrow” academic abilities contribute to making a successful adult.

Using VAM for research purposes is obviously right. Using it for evaluating individual teachers has some problems. On the practical level, existing measures are very imperfect. They contain a good deal of noise, which makes them potentially unfair to individual teachers. Nor is it clear that VAM completely adjusts for student background. What’s more, all sorts of things that we want for our children aren’t measured by standardized tests. I think that VAM measures have an important role to play as a piece of teacher evaluation systems. But it doesn’t matter what I think, because teachers, teacher unions, and much of the public hate evaluation based on standardized tests.

The alternative is to evaluate teachers based on what the teacher does rather than how well the teacher’s students do. Economists have not much liked this method, largely because existing evaluation methods are either shams (everyone is rated above average) or are based on behavior that is irrelevant to the ultimate goal of producing successful students (getting a master’s degree). However, economic theory in no way suggests that measures of input (teacher behavior) are in principle inferior to measures of output (test scores). Both are indirect measures of movement toward the ultimate goal.

People really like measures that they think are controllable by the person being evaluated. So classroom observations have a big political advantage.

To break through the political bottleneck we want to find measures of observable teacher behavior that do as well at identifying student outcomes as do VAM measures. And this new research shows this is very doable.

The coauthors looked at teacher evaluation scores in Cincinnati, which has a very detailed program in which teachers are evaluated using four classroom observations scored against a detailed rubric. The researchers then asked how well teacher evaluation scores explain student VAM measures. The answer is “pretty well.”

Here’s the relevant measurement. Take a high score (top quartile) and a low score (bottom quartile) on the teacher evaluation. Run the score difference through the model and ask how much of a VAM difference it predicts. Compare this with the difference in high and low direct VAM rankings of teachers. It turns out that Cincinnati’s system explains one-third to one-half of the VAM difference. (My reading of the paper is that the authors are quite conservative in making this claim. The effect is probably larger.)
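
For readers who like to see the arithmetic, here is a rough sketch of that comparison in Python. It is my own simplification, not the authors' model: it assumes a plain one-variable least-squares regression of VAM on evaluation score, and the function names (quartile_gap, ols_slope, share_explained) are made up for illustration. The paper's actual specification is considerably more careful.

```python
from statistics import mean

def quartile_gap(values):
    """Mean of the top quarter minus mean of the bottom quarter."""
    ordered = sorted(values)
    k = max(1, len(ordered) // 4)  # size of one quartile, at least one teacher
    return mean(ordered[-k:]) - mean(ordered[:k])

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def share_explained(eval_scores, vam_scores):
    """Fraction of the direct top-vs-bottom VAM gap that the
    evaluation-score gap predicts (hypothetical helper)."""
    # VAM difference predicted by moving from a bottom-quartile
    # to a top-quartile evaluation score.
    predicted_gap = ols_slope(eval_scores, vam_scores) * quartile_gap(eval_scores)
    # VAM difference between teachers ranked directly by VAM.
    direct_gap = quartile_gap(vam_scores)
    return predicted_gap / direct_gap
```

Feed it one evaluation score and one VAM estimate per teacher, and a result between one-third and one-half would correspond to the range reported for Cincinnati. Because real VAM estimates are noisy, this naive version would likely understate the true share.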

So a well-constructed classroom observation system can get us much of the information available through test scores with little of the political controversy.

Further, a completely independent research study done on New York City schools by Susanna Loeb and colleagues, on a much smaller sample, comes to much the same conclusion.

Could this research be the catalyst to push toward widespread use of meaningful, observation-based teacher evaluation methods?

  1. gls says:

    The NEA did just endorse some type of evaluations, though apparently not any from standardized tests now in existence. Perhaps this is the answer they’d like if any of them bother to read academic journals.

