On Standardized Tests

We believe it is important to directly address those critics who attack the use of standardized, comparable assessments (such as the TCAP, CMAS, ACT, and SAT) in K-12 education. A good test of the strength of an argument is whether it still makes sense in another context. How many investors are demanding that we stop reporting corporate performance? How many citizens would like to see municipalities stop reporting crime and fire statistics? Or progress on rehabilitating Superfund sites? Or hospital safety statistics? How many parents and coaches are demanding that we reduce the number of games in sports so that our young athletes can spend more time at practice? How many parents, theater, art, and music teachers are demanding that we reduce the number of performances, shows, and recitals so that our young artists can spend more time developing their craft? After all, aren't games, shows, recitals, and performances all just different types of assessments that enable players, parents, and coaches to see how well a body of knowledge and skills has been mastered, and that highlight the areas where more development is needed? You get the point.

At a time when a number of parties are increasingly attacking the use of assessments like the TCAP and ACT, it is important that we step back and understand what is really at stake in these arguments.

First, in Colorado the TCAP and CMAS assess the extent to which students in grades 3 through 10 have met the state's academic standards in reading, writing, math, and science. A student who consistently meets these standards (i.e., scores proficient or advanced) should also meet the college and career readiness standards on the national ACT test that every Colorado student takes in Grade 11. This report from CDE makes it clear that both TCAP and ACT scores can be used to predict whether a student will need to pay for non-credit remedial courses in college. This report from the ACT organization makes the same point, and this one provides very sobering data on how many students who fall behind in elementary and middle school actually catch up. In sum, it is clear that both the TCAP and the ACT assessments provide information that has predictive value. Moreover, isn't an attack on the standardized, comparable assessment of student performance against state or national standards really an attack on those standards themselves? After all, if we are not going to assess students' performance against them, why do we need standards at all?

It is also very important to be aware of what state proficiency tests are really telling us. Back in 2007, the Fordham Institute issued this excellent report on what they termed the "proficiency illusion." By comparing the percent of students found to be proficient on state achievement tests to the percent proficient on the National Assessment of Educational Progress (NAEP), Fordham researchers found that many states were giving parents and policymakers an overly rosy view of their students' capabilities. That may have helped them politically, but it certainly did the kids no favors, as too many of them have no doubt learned the hard way. This discrepancy between state standards was a major impetus for aligning them across states -- something that still makes great sense to me, even though the particular project that did this -- known as the Common Core -- has since become anathema for some people. Unfortunately, this 2015 report shows that the proficiency illusion is still with us. Here is a good commentary on its findings. And here is a short summary that shows the gap between what Colorado defines as "proficiency" and the much more stringent proficiency standard used by the NAEP.

More broadly, here is an excellent study that maps all state proficiency standards onto the 2013 NAEP results, and here is another study that maps state standards onto international achievement benchmarks.

Fordham's Mike Petrilli also has more to say about this continuing "honesty gap." And here is yet another commentary on how some states -- but not others -- have been strengthening their state standards in recent years. Unfortunately, the article also shows the extent to which the definition of "proficiency" varies widely across states.

And if anybody has any doubts about the enormous economic impact of the achievement gaps these assessments have uncovered -- and which we have yet to close -- this new report from RAND should dispel them once and for all.

Some will argue that the letter grades assigned by teachers are sufficient to assess student performance against these standards. However, both this report from the College Board and this one from the ACT organization find just the opposite: because of grade inflation and varying grading standards across schools (e.g., a "B" at one school may be an "A" at another), the grades assigned by a teacher are inadequate as a means of assessing a student's performance versus state standards, and are not comparable across students, schools, districts, and states.

A different line of attack by testing critics is to claim that the ACT only tests for college readiness, and not every high school student goes to college. Our response to this attack is twofold. First, the ACT explicitly tests "college and career readiness." And as this article has noted, college readiness today is career readiness, given the changing nature of career demands. Second, for those who doubt that the ACT tests career readiness, we note that the correlation between ACT composite score results and scores on the military's ASVAB test is .77 -- further evidence that the ACT does, in fact, measure career as well as college readiness.

Here is another excellent presentation on college and career readiness. And here is a frustrating research report on the failure of so many high school students to meet the standard of the ASVAB -- the military's entrance test.

Another claim that keeps coming up is that TCAP and ACT assessment results are invalid, because students don't take them seriously. With respect to the ACT, we struggle to believe that students who intend to go to college (which is well over 50% of them these days) don't take the ACT test seriously. More quantitatively, if students didn't take the ACT test seriously, then what would we expect to see in the ACT test results data? If students simply refused to fill in any bubbles, we would see a high percentage of tests with the minimum score. But we don't see that in the actual data. Alternatively, if students were simply randomly filling in bubbles, then we would see a distribution of scores centered at the midpoint (in a large sample). But again, that is not what we see; the average score on the Grade 11 ACT, which all students take, is above the midpoint.
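The random-guessing scenario is easy to check with a quick simulation. The sketch below uses entirely hypothetical parameters (10,000 students, a 60-question test with four answer choices) rather than actual ACT specifications; the point is simply that random bubble-filling produces a tight, predictable distribution of raw scores clustered around the chance rate, which is not what the actual statewide data look like.

```python
import random

random.seed(42)

N_STUDENTS = 10_000   # hypothetical statewide sample
N_QUESTIONS = 60      # hypothetical test length
N_CHOICES = 4         # hypothetical answer options per question

# Each simulated "student" guesses uniformly at random on every question.
scores = []
for _ in range(N_STUDENTS):
    correct = sum(1 for _ in range(N_QUESTIONS)
                  if random.randrange(N_CHOICES) == 0)
    scores.append(correct)

mean_pct = 100 * sum(scores) / (N_STUDENTS * N_QUESTIONS)
print(f"mean percent correct: {mean_pct:.1f}%")   # clusters near the 25% chance rate
print(f"raw score range: {min(scores)} to {max(scores)}")
```

Under these assumptions, mass random guessing would leave an unmistakable statistical fingerprint: a narrow pile-up of low scores near the chance rate, rather than the broad, above-chance distribution actually observed.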

And what about the TCAP? If students aren't taking that test seriously, but are taking the ACT seriously, then we would expect to see a big increase in proficiency rates from the Grade 10 TCAP to the Grade 11 ACT. But that's not what we see; rather, the proficiency rates are quite consistent between the two tests. Nor do we see statistical evidence of large numbers of kids simply filling in random bubbles on the TCAP tests, which would produce a statewide sample in which each possible TCAP score had about the same number of students associated with it. If that were the case, we would see statewide average grade-level TCAP scale scores corresponding to a high percentage of students in either the partially proficient or proficient category. But that is not what we see; rather, we see a steady grade-to-grade decline in the percent of students who score at least proficient on their TCAP tests. This outcome is not consistent with the assertion that kids don't take these tests seriously and are either not answering the questions at all or simply filling in bubbles at random.

We have also observed a disturbing tendency in some school districts to use obscure rankings (rather than TCAP and ACT scores) to paint their achievement performance in a more favorable light (not that this is unknown in the corporate world either…).

For example, some districts seem to have developed a fascination with the U.S. News and World Report ranking of the "best high schools in America." All such rankings are inevitably fraught with problems. A critique of the USN&WR methodology can be found here. Needless to say, we don't think that a ranking of high schools based on a questionable methodology can offset more detailed achievement data that paints a much more disturbing picture (e.g., see this example from Jeffco, using eight years of overwhelming evidence that is based on detailed grade and subject matter achievement data for multiple student groups).

And here is an OpEd criticizing the Colorado School Grades and CDE School Performance Framework methodologies for measuring school, but not district, performance. And here is Peter Huidekoper’s excellent column explaining why the high percentage of Colorado students who must pay for non-credit remedial courses in college suggests that we shouldn’t brag too much about rising high school graduation rates.

Finally, the attack on assessments also raises a larger question: how can any individual, team, or organization -- including K-12 education professionals, schools, and districts -- improve their performance in the absence of consistent, comparable feedback over time? A cynic might observe that, given the poor state of too many districts' achievement track records, if you were a teacher, you might also be tempted to attack the whole idea of performance measurement... That said, as this excellent new article by the Brookings Institution's Russ Whitehurst makes clear, the current system of test-based accountability is in a state of flux, and a number of different outcomes are possible. It is a thought-provoking and very useful read.