Grading on the Curve Is Always a Bad Idea

Via Teresa Nielsen Hayden, a smart overview of and response to an upcoming Vanity Fair article about Microsoft’s “lost decade”. The main culprit, according to the article, was apparently the rigid use of “stack ranking” in assessing the work of teams and groups, requiring a manager to rank the work of individuals within a group according to fixed percentages: a small number “excellent”, a large number “adequate”, a small number “poor” (with the bottom-ranking members usually terminated).

It looks as if the article (and some of the excellent supporting links that Nielsen Hayden assembles) will document that this especially rigid use of stack ranking drove off some of Microsoft’s best employees, and that managers dreaded it as much as the people they were evaluating. Nielsen Hayden rightly points out that if you spend a lot of time and effort assembling the best team possible, the last thing you want is to be saddled with a mechanical, rigid evaluation system that will force you to shuck off part of that team and to passively insult the bulk of the remaining members by ranking them as “average”.

Like Nielsen Hayden, I see the comparison to other grading and evaluation schemes immediately. I’ve always viscerally disliked the concept of “grading to the curve” and found it morally dubious and distasteful. The Microsoft case is empirical evidence that this approach doesn’t even achieve its own goals of cultivating and refining excellence. What I find especially wrong-headed is forcing students or employees to conform to a pre-determined distribution of performance when the group being evaluated is the outcome of a prior process of intense selectivity. If you’ve spent considerable resources trying to hire the best employees or admit the best students, forcing some of them to fail simply because you believe that in a “normal distribution” there should be failures is a confession that all your efforts at prior selectivity are a waste of time. It’s using a technical device as a cover for a mule-headed moral belief that even under the best circumstances with the best people, someone should fail, and most people should be simply adequate.

That’s the kind of hidden ideology that feeds into the abuse of meritocratic privilege that Christopher Hayes has dissected recently. This approach allows an entrenched elite to believe that it is the entitled outcome of relentless sifting and to ignore the systematic consequences of such sifting for overall institutions, communities or organizations–or the overall society. If you keep having to shift the goals in order to ensure that real excellence is always achievable by only a small percentage, you completely lose sight of what the general outcome or goal of a class, a discipline, a workplace team actually is. The curve becomes the goal.

Grading is always unfair in some respect. But a curve is not a prophylactic against being arbitrary or having to make sensitive evaluations on an individual basis. It just transfers the weight of arbitrary judgment to a mechanism, outsources evaluation to a positive feedback loop that will inevitably push the entire system to a point of breakdown.

10 Responses to Grading on the Curve Is Always a Bad Idea

  1. Tiercelet says:

    Lurker here to point out that Theresa Nielsen Hayden’s last name is Nielsen Hayden. See .

  2. Timothy Burke says:

    Fixed! Thanks.

  3. Cait says:

    Note that grading on a curve can obscure mediocrity as well as excellence. I had law school professors who taught their classes very little, but because they graded on a curve, they had to give out the same percentage of A’s and B’s as good professors. Students had no incentive to complain, and there were no signs to the administration or to employers that students were learning nothing from the class.

  4. Timothy Burke says:

    Right. This point is coming up some on Twitter too, that many professors use a curve to raise up the lower scores rather than force conformity to a bell curve distribution. It’s not always punitive. But in that case, I’d still argue that the curve itself is a distraction from the deeper issue: what’s the goal here? What is a class trying to accomplish? Why are the outcomes being sought ones that students can’t seem to accomplish at as high a rate as the professor might want? (Raising up the lower end of the scores to a higher grade implies that the professor wants a favorable outcome, or wants students to meet a certain standard.) If the standard is routinely not met, something’s wrong with the standard or something’s wrong with the students. In either case, that ought to be faced squarely. If it’s the standard, change the standard. If it’s the students, a different pedagogy is required or a different selectivity is needed. The curve in that case is a sort of stalling.

    If it’s just a one-time thing, then the answer is easier: the test or exercise was too hard, too easy, something that the professor goofed on. I’ve done that. I don’t use a formal curve but if I think my assignment was implausibly difficult or convoluted, I nudge everyone up one-by-one unless I think the problem with a test or assignment is really an individual student’s fault.

  5. Nadav says:

    Just wanted to give a shout-out to the author of the Vanity Fair piece, Kurt Eichenwald, former Swatty and founding member of the a cappella group 16 Feet. Everything else he’s done with his life has paled by comparison.

  6. Tiercelet says:

    As a follow-up on the notion of using a curve to raise scores, speaking hypothetically and without any methodical research on the subject, I could nevertheless see this being appropriate in some types of courses where it’s routinely used — such as hard sciences. The issue there isn’t that the standard is too high, but rather that upper-end ability is too clustered: the sense is that you will lose resolution among your higher-end students if you make the test easy enough that most students are getting passing grades. Again without a proper statistical analysis, my impression is that there’s also a bimodal distribution of skills/preparedness in these classes: many people have strong background, and many lack adequate preparation for even an intro-level class (which sucks and more should be done to remediate rather than weed out, but that’s a whole different fight); and there’s a big middle where there can be movement and learning. So you write the test to be very discriminating among the top students, and then the curve pushes the scores back up into a more comfortable range and keeps the middles from failing while still weeding out the under-prepared.

    You see (or saw; I haven’t seen the math since the new scoring scale) the opposite thing happen in the Quantitative section of the GRE: the test could not be written to be very discriminating, so it’s basically high-school math, but a lot of people have strong background for this. As a result, a perfect score would put you in something like the 93rd percentile. There’s no other option if you don’t want to completely smash the midrange who are already rusty on algebra and geometry — basically you have to pick which part of the population the test will have good resolution for.

  7. For a test like the GRE, the scale and purpose of the test cause that bunching. But in a class, you’re not trying to evaluate the aptitude or skills of students–you’re trying to teach them the content and skills. In theory, you should be perfectly happy if 100% of the students hit your targets. I think you have other ways of knowing who the very top students are, and over time, a curriculum tends to branch out and offer some paths that differentiate students from one another. Curving in this case just seems to me to be an after-the-fact adaptation of sorts rather than an active, anticipatory pedagogical plan.

  8. Timothy Burke: Thank you — and yes, that’s it exactly. Grading on a curve doesn’t assess performance or measure success. Like “zero tolerance” policies, it’s an abnegation of judgment.

    If I imagine myself in charge of a department or work group in a firm that uses it for employee evaluations, the problem instantly becomes clear: this system says I’m no better at hiring, training, and managing my people, or at getting work done, than any other supervisor in the company. Since I’m actually pretty good at that, and I care about it passionately, I find the whole premise of stack ranking offensive.

    Good work is done by people whose hearts are in it. A management system that pretends that doesn’t matter is an abomination.

    Even if everyone in your department understands how arbitrary the rankings are, there’s no way you can explain why you gave someone a “no better than they should be” score. It’s their life. You can’t tell them not to take it personally. And even if you were to make the rank assignments completely random, for instance by drawing numbers out of a hat, you’re going to have employees who’ve done an extraordinary job, or have shouldered responsibilities that kept them working longer hours than anyone else, or who loyally stayed on when another company offered them a promotion and raise, or who aspire to eventually move on to some variety of work they’ll never get a chance to do if their evaluation scores aren’t high enough. They may say they don’t need a top ranking — but they do.

    It would be interesting to discuss stack ranking and fake meritocracy sometime. You know those motivational posters that only pointy-haired bosses believe in? If you think of them as a single unified divination system, you can scry some interesting fortunes in them.

  9. Carl says:

    As usual I’m getting to this too late to do any good, but just wanted to leave a trace of appreciation for this great discussion and add that these were the issues that finally knocked me out of reading for the AP World History exams. I got far enough into it to see how the sausage was being made and it was pretty gross to me, for all the reasons mentioned here. The tests themselves are pretty impressive and ask for the right things, but basically what their curve does is guarantee a certain rate of pre-approval for college credit (relatively) regardless of actual attainment of college levels of skill. That’s ugly when kids who aren’t college-ready get curved in, and it’s perhaps even more ugly when kids who are college-ready get curved out.
