I’ve recently had a dawning realization, one that I haven’t heard anyone else out there discuss. I was thinking about Maja Wilson’s terrific book, Rethinking Rubrics in Writing Assessment, and was struck again by the idea that the multiple choice test, as originally developed, was designed to serve a “gatekeeper” function.
The original IQ tests, developed by Alfred Binet, were at first designed to help determine which children might need extra help. Later, the IQ test was adapted for the purpose of figuring out which soldiers were best suited for which tasks in the army. The ultimate purpose of some of these tests, according to Nigel Brush, was to “rank social classes and races according to their intellectual development.” In other words, there was an inherent element of racism involved. The testing was to “rank” students – not to help them.
In her book, Maja Wilson relates the history of how the College Board moved from written tests to more reliable standardized multiple choice tests, so that colleges could rank students and determine who was college material and who was not. The very concept of these tests was to decide who got in and who got left out. One variation of the multiple choice test, the Army Alpha Test, actually led to the National Origins Act of 1924, which created quotas designed to keep out the least intelligent immigrants – immigrants whose intelligence as a race had been determined by their test performance. Testing could be used to keep people out of colleges and universities, and out of the country. The tests weren’t devised initially to be something everybody could pass. They were devised to be so hard only a few people could pass them.
Now we are saying that everyone must pass them. This change is a stunning reversal. One might say that we have moved from norm referenced tests designed to rank to criterion referenced tests designed to show mastery – but that is just a label. The style of multiple choice question has changed very little, and is the same for both kinds of tests. What was once designed as a small gate to keep people out is now being used as a gate we must shove everyone through – but we have kept the gate just as small as it ever was. I’m seeing signs that we are starting to crush a few students in our attempt to get them through that narrow gate.
The other reversal I noted came as we looked at last year’s test scores at school during pre-planning week. We were looking at our students’ test rankings, which go from a Level 5 (the highest) to a Level 1 (the lowest), with a 3 being a “passing” score. At nearly every grade level, in both Math and Reading, the majority of students were in the “Level 3” range. Fewer students had scores of “2” or “4,” and fewest of all had a “1” or a “5.”
I found myself muttering, “It’s a bell curve!”
When I was a student, and even in some of my college education classes, this “bell curve” was something to be desired. If everyone was passing a test, maybe it was a little too easy. If everyone was scoring high, it was much, much too easy. If everyone scored low, it was too hard. A bell curve meant that you had a “Goldilocks” assessment – not too easy or too difficult, but just right. It meant that you were doing your job. A bell curve was what everyone expected to see – and indeed is what one sees on a “norm referenced” test that distributes students according to their percentile rank. But now the bell curve is bad. Now everyone must pass, but without making the test any easier to accommodate this requirement. And yet, we are still seeing the bell curve on our criterion referenced test. Could it be that the bell curve just appears in nature, even when we try to avoid it?
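There is a statistical reason to suspect the bell curve really does “just appear.” If each student’s score is the sum of many small, independent influences, the central limit theorem predicts a roughly bell-shaped distribution no matter how we label the cut scores. Here is a toy simulation of that idea – the student counts, number of factors, and level cut points are all my own arbitrary assumptions, chosen only for illustration:

```python
import random

random.seed(0)

def simulate_scores(n_students=10000, n_factors=30):
    # Each of many small independent factors contributes 0 or 1 point.
    # Summing them produces an approximately normal (bell-shaped) total.
    return [sum(random.randint(0, 1) for _ in range(n_factors))
            for _ in range(n_students)]

def to_level(score, cuts=(9, 12, 18, 21)):
    # Bin a raw score into achievement Levels 1-5.
    # The cut points here are hypothetical, not from any real test.
    for level, cut in enumerate(cuts, start=1):
        if score < cut:
            return level
    return 5

scores = simulate_scores()
levels = [to_level(s) for s in scores]
counts = {lvl: levels.count(lvl) for lvl in range(1, 6)}
print(counts)  # most students land in Level 3; the tails are thin
```

Even though nothing in the simulation ranks students against one another, the familiar shape emerges: Level 3 dominates, Levels 2 and 4 are smaller, and Levels 1 and 5 are smallest of all.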
Don’t get me wrong. I want all children to learn. I try to get all my students to learn. But I don’t think we are acknowledging the complete reversal of policy we are experiencing. The kinds of tests that were created as a very high benchmark to keep unworthy students out of colleges are now being set as the bar all students must reach. The bell curve, which used to be a sign of educational validity, is now a sign that you are failing many of your students.
If our goals have changed, from exclusion to inclusion, from ranking students to reaching students, shouldn’t our tools for evaluating learning change as well? Hasn’t science taught us that the way we look at something often determines what we see?
I’m merely raising questions. I don’t have the answers yet. But I think the questions are worth raising.