Tucker’s Lens: International Comparative Data on Student Achievement – A Guide for the Perplexed

This is a second version of this article intended to correct an error made in the first version.*

By Marc Tucker


My apologies to Maimonides.  But I would not blame you if you were perplexed about the recent dust-up after the latest PIRLS and TIMSS data came out.  Some of the best-known names in education research worldwide came out with guns blazing, mostly at one another, in a rapid-fire exchange about what the numbers meant.  I thought some of you might welcome a guide to the shooters and the shots, and a bit of commentary on the profound meaning of it all.

Tom Loveless, the head of the Brown Center at the Brookings Institution, jumped on the data to say that they called for a “rethinking of the Finnish miracle success story….If Finland were a state taking the 8th grade NAEP [the sample survey used in the United States to monitor the progress of American students over time], it would probably score in the middle of the pack.”  Jack Buckley, Commissioner of the National Center for Education Statistics in the U.S. Department of Education, confessed that, “I’ve always been a little puzzled” by the high level of attention paid to Finland.  Well, so much for Finland!

Martin Carnoy and Richard Rothstein wrote an analysis of the data claiming to show that while reading achievement of American students on PISA was growing between 2000 and 2009, it was falling by an even larger amount in Finland.  Similarly, they said, in math, US students from the lowest social class were also gaining substantially, while scores of comparable Finnish students declined.  “This is surprising,” they said, “because the proportion of disadvantaged students in Finland also fell…” And they go on to say that, by their analysis, the achievement gap between the most and the least advantaged students in the United States is actually smaller than in “similar postindustrial countries, and often only slightly larger than gaps in top-scoring nations.”

Ha!  That means that the withering criticism showered on American schools for their poor performance was totally undeserved.  The problem, if there is a problem, lies not in the schools, dear Brutus, which have been doing a much better job than anyone has given them credit for, but in the enormous disparities in family income that have opened up in American society.  And Finland, according to this analysis, hardly deserves its status as the model that the United States should be adopting.

Not so fast, say Paul Peterson, Eric Hanushek and Ludger Woessmann.  Peterson is at Harvard, Hanushek at Stanford and Woessmann at the University of Munich.  The data, they say, don’t show that at all.  What they actually show is that, even if such corrections are made, American students at the top do not perform anywhere near as well as the students in the top performing countries, or, at least, not such a high proportion of them do.  Things are just as bad as they always said they were, and the need to turn up the heat on the schools to perform up to international standards is as great as ever.

But wait a minute, says Andreas Schleicher.  The Carnoy-Rothstein analysis depends, he said, on a challenge to the methods used by OECD-PISA to do its survey research, and that challenge, says Schleicher, just won’t hold up in court.  To which Carnoy and Rothstein said in reply to the reply, Oh yes it will.

So what is going on here?  Why are all these people so exercised about this data?  What are their agendas anyway?  Who is right and who is wrong?  Why does it matter?  And what does it mean?

I know that research is supposed to go where the evidence leads it and the researcher is only there to record the ineluctable result, without fear or favor.  But the reality is that researchers have values to support and reputations to protect, and their conclusions are more often than not influenced by both their values and the reputations they have established as a result of the policy positions they have taken.  So, perhaps it would help to sketch in the positions taken on the relevant issues by the people I have named.

It should surprise no one that spokespeople for the Brookings Institution and the United States National Center for Education Statistics should be waiting to pounce on Finland and on the people who have used the Finns’ standing in the international league tables to make a case for using the educational strategies the Finns have embraced.  Both Brookings and a series of U.S. Department of Education research executives, some of whom went to Brookings when they left the Department, have been deeply skeptical of international education benchmarking and ardent advocates of what they have described as the “gold standard” of education research, meaning the use of experimental research techniques as the only legitimate way to attribute cause in social research.  It is obviously impossible to randomly assign national “treatments” to national populations in the arena of education, so, from their point of view, all statements that this or that set of policies “causes” these or those national outcomes in the arena of education policy are necessarily suspect.

Brookings and the Peterson, Hanushek, Woessmann team are both strong supporters of charters and the introduction of market forces generally as school reform strategies.  Brookings, as well as other Washington-based think tanks, is eager to deflate the recent enthusiasm for international education benchmarking in part because they fear that close examination of the strategies used by the top-performing countries will show little evidence that charters or market strategies in general are effective strategies for raising student achievement at a national scale.

Peterson, Hanushek and Woessmann each have their own views on what is most important in education reform, but all are advocates of charters and reform agendas based on market forces, and all appear to believe that it will take fear of foreign competitors to put this reform agenda over the top in the United States.  They have also done research that they say supports their claim that market strategies do work in the top-performing countries.  Implicitly, then, they believe, unlike their Brookings colleagues, that it is possible to do rigorous research using comparative data gleaned from these international surveys that attributes cause and from which, therefore, it is possible to draw policy conclusions.  Like my own organization, this team of researchers has consistently advanced the view that economic ruin will be the fate of any nation that fails to hold its own in international education competition, though, working from pretty much the same data, they arrive at prescriptions quite different from our own as to the most effective policy agenda.

But Carnoy and Rothstein come from a very different place.  They believe that the relatively poor performance of American students on the international surveys of student achievement is a function of the large and increasing disparity in incomes among Americans, in absolute terms and in relation to other countries.  They are outraged that organizations like my own and researchers like Peterson, Hanushek and Woessmann hold the schools accountable for poor student performance, when they think the fault lies not in the schools and teachers, but rather in a society that tolerates gross and increasing disparities in income among Americans.  They would have us focus on promoting policies that would result in a fairer distribution of income in the United States.

Which puts them in direct conflict not just with Peterson, Hanushek and Woessmann, but also with Andreas Schleicher, the driver of the whole PISA system at the OECD.  Schleicher’s primary framework for the analysis of the PISA data displays the country data on two axes, one for student achievement on the subjects assessed by PISA and the other for equity, the pattern of the distribution of results from the poorest to the best performers within countries.  Countries with short tails in that distribution are described as having high equity; those with long tails are described as having low equity.  Schleicher points out that the United States just barely escapes being among those countries in the worst quartile on both measures.  Another table in Schleicher’s slide deck shows that, when socio-economic status is held constant, the schools of some nations do a much better job than others of reducing achievement disparities among students.  Carnoy and Rothstein would take American teachers off the hook, saying that the performance of poor and minority students is actually improving, the gap is not so large as was thought, and the performance of poor and minority students in the top performing countries is actually declining.  To the extent there is a problem, it is a problem caused by socio-economic status of the students, not the teachers’ performance.  Schleicher would say, no, that is not so.  Even when we look at students from comparable socio-economic backgrounds, American schools do less to close the gap with the students from more favored backgrounds than schools in most other countries.  They cannot both be right.

So it is no wonder that Carnoy and Rothstein go after Schleicher and his data and methods with hammer and tongs.

So who is right and who is wrong here?  All of the people I have named are competent researchers from well-regarded institutions.  Just as each of these people has their own values and established positions on the relevant policy issues, the same is true of me and the organization with which I am associated.  Our analysis of the dynamics of the global economy strongly suggests that high-wage countries like the United States will find it increasingly difficult to maintain their standard of living unless they figure out how to provide a kind and quality of education to virtually all their children that they formerly thought appropriate only for a few.  And we also believe that the most likely source of good ideas for strategies that will enable them to do that is the countries that have already done it.  We think that whether the source of poor performance is mainly growing disparities of income or relatively poor performance of the education system, the dynamics of the global economy are unforgiving, and countries like the United States do not have the option of saying that the educators can do nothing, that the only thing that will save us is income redistribution.  We do not think that the only way to learn what strategies are likely to work is research methods derived from the experimental sciences.  Indeed, we think that the record clearly shows that American business recovered from a devastating assault from Japanese firms in part by inventing and using the very method—industrial benchmarking—that we and others are now using in the field of education.

To me, the most important conclusion to be drawn from the debate whose contours I have just rather roughly outlined is that now, for the first time in the United States, the international surveys of student achievement really matter.  That is a big, big change.  It was not the case before that advocates of the most hotly debated education reforms in the United States felt that they needed to take the data from these surveys seriously, to defend their positions or to advance them.  Clearly, they do now.

The second point is that the data from the international surveys is being used to make points not about peripheral issues, but central issues.  It really matters whether the cause of the United States’ relatively low standing in the international league tables is income disparities among the students’ families or poor education in the schools.  It really matters whether or not countries like Finland have important lessons for the rest of the world.  It matters whether the survey methods being used by the organizations that design and administer them bear up to scientific scrutiny or not.  And, lastly, it also matters whether the methods used by those who do research comparing the effects of different policies and practices on student achievement in multiple countries have enough scientific merit to justify their use by policy makers to make national policy. These are consequential questions.  This is the first time that we have seen a sustained debate by some of America’s leading scholars on these matters.  It is not likely to be the last, and that appears to herald an era in which, for the first time in the United States, international surveys of student achievement are likely to take a prominent place in the public debate about education policy.
You may be wondering where I come out on the welter of claims and counterclaims I described above.  Now that I have laid my analytical framework on the table along with those of the other analysts, you are in a position to apply the same dose of skepticism to my conclusions as I urged you to apply to the others.   My take on the data we now have in hand is more or less as follows.

First, the usual note of caution.  One snapshot does not a movie make.  We should not declare a trend before we have more than one data point.  So we might want to see whether the changes in rankings suggested by the recent PIRLS and TIMSS data hold up over time.

Second, as many have pointed out, TIMSS and PIRLS put the accent on measuring how students do on what amounts to a consensus curriculum.  Did they learn what international experts think they should have been taught in the subjects they assess?  PISA measures the capacity of students to apply what they have learned in the classroom to proxies for real-world problems of the sort they might actually encounter outside the classroom.  I have a strong preference for the latter goal over the former, which mainly comes from an experience I had years ago, when Archie Lapointe, then the director of the Young Adult Literacy Survey, told me the following.  The survey asked the young people surveyed to add a column of figures and take a percentage of the result.  Almost all could do it.  It also asked the same respondents to take a restaurant check, add up the items, get a total and calculate a tip.  Very few could do it.  Like Alfred North Whitehead, I have very little use for what he called “inert knowledge.”

Third, we need to keep in mind that the fine-grained distinctions in the rankings, for most countries that are near one another, are not statistically significant.  What we should really be paying attention to is the groupings of countries in the rankings, when countries are grouped in such a way that the measured differences among the groups are statistically significant.  If you look at it from this perspective, what we see is that the United States still has a long way to go before the vast majority of its students score in the front ranks of performance at many grade or age levels in many subjects, which is how I would define top performers.

Fourth, I think it is pretty clear from the OECD data that smaller proportions of American students score in the higher deciles of performance on the PISA tests, and more in the lower deciles, than is the case for students from the top-performing countries.  If that is true, then it cannot also be true that the United States would do as well as the top-performing countries if only the poor, Black and Hispanic students were taken out of the rankings, as many American teachers and some policymakers maintain.  It is also clear from the OECD-PISA analysis, as I pointed out above, that, when the data are corrected for students’ socio-economic status, American schools are less effective than the schools of most of the countries measured at closing the gap between these students and students with higher socio-economic status.

This, of course, is not where Carnoy and Rothstein come out, but I think Andreas Schleicher won the battle between him, on the one hand, and Carnoy and Rothstein on the other.  But don’t take my word for it.  Read the claims and arguments made by both sides carefully.  There is a lot at stake in this conflict.

So, what then are we to make of the fact that, if Massachusetts, North Carolina and Florida were countries, they would have done very well indeed in the most recently released rankings?

The case of Florida, I think, is pretty straightforward.  The Florida Center for Reading Research, administered by Florida State University, is one of the nation’s leading centers for reading research.  Its methods are widely admired throughout the United States.  The state of Florida has managed to leverage this research program and its key figures to produce widespread implementation throughout the state of the methods advocated by the Center.  We can see the results in the PIRLS fourth grade reading results.  The question, of course, is what effect, if any, this will have on student performance in the upper grades as the students who have benefitted from these programs mature through the years.  That story has yet to be told.

In North Carolina, we are looking at a program of education reform that began with Governor Terry Sanford, whose first term as governor began in 1961.  Sanford’s unrelenting emphasis on improving education in the state laid the base for Governor James B. Hunt, Jr., who served as governor from 1977 to 1985 and again from 1993 to 2001, making him the longest-serving governor in the state’s history.  Through that whole period, he never lost his focus on education as the key to the state’s economic growth, and, during that period, North Carolina showed more progress on student achievement as measured by the National Assessment of Educational Progress than any other state in the United States.  Hunt’s agenda for education reform was profoundly affected by what he was learning about the strategies adopted by the top-performing countries in the world.  Like them, he focused on teacher quality, high-quality instructional systems and early childhood education.  North Carolina was among the very first states in the United States to send delegations of key state policymakers abroad to study the top performers.

Massachusetts is a similar story.  In this case the first phase of the reforms was driven by the business community, organized by Jack Rennie, a very successful businessman who worked hard to organize that community, and Paul Reville, a public policy analyst.  They played the key role in pushing the Massachusetts Education Reform Act of 1993 through the legislature.  The Act provided hundreds of millions in new funding for the schools in exchange for explicit performance standards for students, set to international benchmarks; carefully drawn curriculum frameworks, also set to international benchmarks; a new comprehensive assessment system tied to those standards and curriculum frameworks; much tougher standards for entry into teaching, intended to greatly ratchet up teachers’ command of the subjects they would teach; and a system to disclose student performance, school by school, with results reported by student subgroup, so that poor performance by these subgroups would not be hidden in the average scores for the school.  Right after the Act was passed, David Driscoll, until then the Deputy Commissioner of Education, was made Commissioner and remained in that position for ten years.  Under Driscoll’s leadership, and despite a great deal of pressure to do otherwise, Massachusetts never backed off its decision to set and to maintain internationally benchmarked standards, for both student performance and teacher certification.  After Driscoll left, the new governor created a new position in state government, to provide leadership to all the parts of government concerned primarily with education at all levels.  He filled that position with Paul Reville.
Between them, Driscoll and Reville provided the same kind of strength and continuity of leadership that Governor Hunt provided in North Carolina, and for a very similar agenda, an agenda that is in many respects consistent with our own analysis of the strategies used by the top performing nations to get to the top of the league tables.

You may or may not agree with my analysis of the kerfuffle over the release of the TIMSS and PIRLS results.  You may or may not agree with my explanation for the rise of Florida, Massachusetts and North Carolina on the PIRLS and TIMSS league tables.  But, in any case, I urge you to look at the contending papers, and come to your own conclusions.  All of us could benefit greatly from a long, loud, contentious effort to define what it means to be educated, and to better understand why some nations are more successful than others at educating the vast majority of their young people to whatever standard they choose.

* This is a second version of the original post for this month.  We misstated the conclusions presented by Martin Carnoy and Richard Rothstein in the report described in this newsletter.  We believe we have stated those conclusions accurately here, and apologize to the authors for the error.

For the record, however, the version of the Carnoy-Rothstein conclusions on which we based our first statement was itself based on the version of the report that Carnoy and Rothstein originally released, which claimed that their re-estimate of United States PISA scores would result in the United States ranking 4th among OECD countries in reading and 10th in math, a major upward revision of the U.S. PISA rankings.  In the most recent version of their report, released last week, Carnoy and Rothstein revised these numbers downward somewhat, to 6th in reading and 13th in math, but, as the post points out, even these numbers are contested.