Big Pharma, Big Wonkery

In the current issue, Discover has an interesting article on pharmaceutical testing. On one level, it’s consistent with many other critiques of the pharmaceutical industry and of academic and medical researchers who do its bidding.

However, the article also raises a question about how literate even medical experts and researchers are, let alone the press and public, when it comes to reading or judging published research. The article details a number of the ways that pharmaceutical testing and research are often much less than they appear to be: small study populations that conceal the potential magnitude and frequency of dangerous or fatal side effects, benefits that are marginal or at the edge of statistical significance but are implied to be far greater in size, reviews of existing research that are cursory or cherry-picked for supporting data, and so on.

What I find interesting is that I think a significant proportion of all quantitatively-based social science research also has these characteristics. These kinds of practices are far more consequential and dangerous when they involve medical issues, but more than a few social scientists know how to use sleight-of-hand to get media attention, argue urgently for a new policy or piece of legislation, or redirect institutional efforts. More importantly, many researchers do so without any intent to defraud or consciously manipulate their results. It’s simply the standard for professional work: any pattern or finding that rises to statistical significance quickly crosses into being urgently important. This is precisely what Deirdre McCloskey called the “secret sin” of economics, but it afflicts more than just economics as a discipline: social psychology, political science, sociology, sociolinguistics, population science, any quantitative work that deals with human society and human individuals tends to have the same problem.

My own understanding of this pattern came through reading a lot of the work that has been done on the effects of mass media, most particularly on the effects of violent images and representations on children. Many researchers active in relevant fields of study will tell you that the negative effects of those images have been demonstrated beyond a shadow of a doubt, that there is overwhelming scientific consensus. Look again and you’ll find a far more complicated picture. Antiquated studies with transparently bad research design that were conducted fifty years ago but are still cited in literature reviews as supporting evidence. (Thanks to several readers here who recommended I look at Richard Hamilton’s The Social Misconstruction of Reality on this point in particular: it’s a good description of how this comes to pass.) Literature reviews that blithely cherry-pick and ignore any studies which contradict their stated premise. Laboratory studies and controlled experiments that are simply assumed to predict and describe large-scale social patterns without any discussion of whether they actually do or not. Correlations confused happily with causations. And most importantly, teeny-tiny effect sizes magnified to monumental proportions.

Again, this is mostly not done with conscious intent. It’s how professionals work, it’s how careers are advanced, it’s how scholars avoid slumping into a grey morass where all causations, effects and patterns are indistinguishable from one another, and nothing can be said about anything. Quantitative or qualitative, whatever the methodology, all scholarly work on human societies has to reduce and manage tremendous complexity. Any contemplated action or intervention into human problems requires simplifications.

Whether we’re talking pharmaceutical studies or social policy, however, we need a professional process which pushes back on this tendency. Peer review in its conventional form simply isn’t good enough, for a variety of reasons. Instead what we need is a circuit-breaker of some kind between research and action, representation and intervention. Statistical significance as adequate for reporting a research finding, but some other test of significance entirely for when you want to counsel action, recommend policy, implement concrete changes or disseminate new treatments. At the same time, we need to stop making the delivery of policy or treatment or intervention the gold standard for “research which matters” within academic institutions, and to enshrine a new respect for negative or neutral results.
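To make the distinction concrete, here’s a minimal sketch (in Python, using made-up simulated data, not any real study) of how a trivially small effect becomes “statistically significant” once the sample is large enough. The effect size of 0.03 standard deviations is an arbitrary illustrative choice:

```python
import math
import random

random.seed(42)

# Two simulated groups: a "treatment" effect of 0.03 standard deviations,
# far too small to matter in practice, but detectable with a big sample.
n = 200_000
control = [random.gauss(0.0, 1.0) for _ in range(n)]
treated = [random.gauss(0.03, 1.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

diff = mean(treated) - mean(control)
se = math.sqrt(var(treated) / n + var(control) / n)
z = diff / se
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approximation

pooled_sd = math.sqrt((var(treated) + var(control)) / 2)
cohens_d = diff / pooled_sd  # standardized effect size

print(f"p-value: {p:.2e}")           # "significant" by any conventional cutoff
print(f"Cohen's d: {cohens_d:.3f}")  # a negligible effect in practice
```

The p-value clears any conventional significance threshold by many orders of magnitude, yet the standardized effect size remains tiny: exactly the gap between a finding that is reportable and a finding that warrants action.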

This entry was posted in Academia.

11 Responses to Big Pharma, Big Wonkery

  1. isorkin1 says:

    I’m sympathetic to the claim that there is a lot of bad statistics out there and people and the media (often) make more out of results than they should.

    But I also find the McCloskey argument about the use of statistical rather than economic significance wildly at odds with my own experience: at seminars on empirical papers, people always ask about the practical importance (economic significance) of the result, and economics papers now often include a back-of-the-envelope calculation of what the practical importance of the finding actually is. When authors don’t provide convincing examples of the practical importance of their findings, the post-seminar talk is always dismissive, on the view that the author hasn’t convinced the audience that they’ve detected some feature or fact about the world that matters.

    I worry that the talk of bad statistics starts to sound like statistics is bad or can’t accomplish anything, which is as patently silly as dismissing archival research as useless or misleading based on the many badly done archive-based monographs.

  2. mencius says:

    Professor Burke,

    Obviously, I can find nothing to criticize in this post. So I’ll try to extend it a little.

    As I imagine you know, there is one piece of pop culture whose depiction of the reality of modern government is incontrovertible: the old British TV show Yes, Minister. (If you don’t know the show, there is no need to watch it – the bound scripts are available, highly readable, and do not distract the reader with awful ’60s-BBC production values.)

    One of the major sources for Yes, Minister was the diary of a man named Richard Crossman, who was Minister for Housing during a period which was not, in retrospect, the golden age of residential architecture. That was the highest Crossman ever got, but he was no two-bit time-server – he was perhaps one of the last two real intellectuals in British politics. (No points for guessing the other.) In the late ’40s, for example, Crossman edited the New Fabian Essays, which at least purported to be a new manifesto for the postwar age (and makes pretty funny reading in 2008).

    Crossman describes the same practice of policy formation from below that my mother, for example, saw when she worked at DoE’s renewables policy shop in the Clinton era. Of course this is exactly the practice you describe above.

    At the time she was very distressed because she saw charlatans using it to extract billions of dollars from the Federal Government. In retrospect, of course, this was a flesh wound. All the same crooks are still at it, and they’ve worked up to trillions. (The name “Steve McIntyre” probably means nothing to you. But perhaps it should.)

    What’s even more distressing is that, in my opinion, this problem is not cosmetic. It is a fundamental engineering flaw of the entire progressive design for government, going all the way back to the Liberal Republicans or “Mugwumps,” the Pendleton Act, etc. Basically, people like the Adams brothers (Henry and C.F., whom I absolutely revere) decided that democracy, in the old traditional sense of a system in which the people elect leaders who manage the country, was a disaster. At least, it had produced the bombastic stupidity and corruption of the so-called Gilded Age.

    Therefore, they decided, government policies should be formulated and implemented not by politicians or their stooges and patsies, but by impartial experts. While it worked quite well for a while, it was just another case of the same old perpetual-motion machine: setting super-watchmen to watch the watchmen. The machine spins quite nicely for a while, but in the end you have exchanged corrupt watchmen for corrupt super-watchmen. Not an improvement. (And nor is returning to the corrupt watchmen a terribly tempting solution.)

    The specific result in government today is the ruthless domination of process. For anyone with any experience in the private sector, the overwhelming smell of government is the smell of sclerotic, mindless procedure. This is not a happy, flowery summer smell.

    But it remains, because it allows policy to be made by people who take no personal responsibility for the outcome of said policy. This carrion prize is too tasty to last long untasted. Who lost China? Certainly not Owen Lattimore. Why, he never even worked for the State Department. And so on.

    In the corporate world, major policy choices are routinely made by individuals who have what the military calls “mission orders” – they are responsible for results, but free as to tactics and (even more important) personnel. This executive freedom was a major characteristic of the most successful colonial regimes – the spectacular results of Clive, Hastings, etc, who could do so much simply because they were completely out of touch. (In the end, of course, it was the telegraph that killed the Raj.)

    The Danish political scientist Bent Flyvbjerg, retranslating a word from Aristotle, calls the quality by which a responsible individual makes an accurate intuitive decision “phronesis.” I think a more colloquial translation is “wisdom.” (This is especially understandable, of course, for the D&D generation – many of us have still not figured out the difference between intelligence and wisdom.)

    Phronesis can only be learned by making a series of progressively larger decisions for which the decider is personally responsible. Obviously, it is entirely absent from a process of decision-making in which the decision is controlled by whoever massages the fudged numbers in a pseudo-study.

    Unfortunately, phronetic government is Toryism, pure and simple. There is no sense in which a committee can exercise phronesis. A phronetic decision is a personal choice made by an individual who is responsible for results and free as to techniques.

    If you believe phronesis can produce better decisions than “social science,” you agree with the fundamental administrative principle of the Prussian General Staff, and you disagree with the fundamental administrative principle of both the Progressive Era and the progressive movement. In short: a case of Conquest’s first law. I’m not sure you would go that far, but I suspect you’d find Flyvbjerg’s work interesting, if you haven’t already looked into it.

  3. mencius says:

    isorkin1,

    If my experience in a very different field – computer science – is any guide, I believe you’ll find that while economists must present a “back-of-the-envelope calculation” of the practical value of their work, when they are challenged and evaluated they are generally challenged and evaluated on the details of their model, or theory, or whatever, and very seldom on the realism of the aforementioned calculation.

    Basically, as we used to put it in CS, the first paragraph of your paper needs to explain that if you can solve the atomic distributed multicast problem, you can cure cancer. (Nowadays, it is probably global warming.) Therefore, to solve the atomic distributed multicast problem: X. (Insert body of paper here.) “And in conclusion, our algorithm represents a significant step forward in the war on cancer.” Okay, I exaggerate. Slightly.

  4. peter55 says:

    There have been several critiques of statistical significance testing in the direction you are heading, and this is something known to most every statistician. (I speak as a recovering statistician.) The entire debate on the precautionary principle is, IMHO, about this very issue. One important fact to bear in mind (which you may already realize) is that the standard significance levels (5%) were chosen precisely because of an evaluation of the relative costs and benefits of acting under the two types of errors — but in a specific decision context, that of deciding the efficacy of new crop varieties in agricultural research. It is by no means obvious (and indeed, is clearly false) that the same levels of significance are appropriate in other decision contexts.

  5. isorkin1 says:

    Mencius,

    I can’t comment on how academic economists are evaluated. But even granting the premise that people are evaluated on technique rather than importance, I still think it noteworthy for this discussion that people’s priors about the world are shifted by practical importance rather than statistical significance. Ultimately, what this discussion worries about is that people are using the wrong metric, statistical significance, to decide how the world works and what matters in it; I would contend that, in my experience, that is false.

  6. isorkin1 says:

    (And that my experience may have some external validity to other situations where people decide how the world works).

  7. mencius says:

    peter55, that crop-varieties thing blows my mind. Do you have a reference for it?

    isorkin1, the problem is that there is no obvious substitute for statistical significance. At least not as far as I’m aware, though I am not very aware.

    “Priors” demonstrates the problem – it presumes Bayesian thinking, which may be all very well if you’re a computer, but is not phronetic.

    Consider the whole motivation for this enterprise. Governments love statistics because they provide a way to camouflage the fact that every decision is, in the end, the personal choice of some person or persons.

    The ideal of objective, mechanically computed “public policy” – which these days you will see derived not only from the false analogy to physics, but also via “legal realism,” “human rights,” and other dubious inventions of the great century just ended – has been the holy grail of bureaucrats since Pontius Pilate was a little boy.

    We certainly see quantitative management and planning techniques in private enterprise, where the problem of sweeping responsibility under the carpet does not (or at least should not) exist. (It does exist, but typically at very minor levels.) But we see far less reliance on them, and far more on the principle of executive authority.

    Suffice it to say that if responsible, effective government is ever restored to the Earth, the process will produce a large number of “recovering statisticians.” In fact, it may even be necessary to set up treatment centers. The good news, though, is that if you have the math chops to be a statistician, you have the math chops to be a petroleum geologist.

  8. peter55 says:

    mencius —

    My source is:

    Lancelot Hogben [1957]: “Statistical Theory” (W. W. Norton).

    Note that the now-standard formal procedure for testing statistical hypotheses was first presented by Neyman and Pearson in:

    J. Neyman and E. S. Pearson [1928]: On the use and interpretation of certain test criteria for purposes of statistical inference, Part I. Biometrika, vol. 20A, pp. 175–240.

    The most famous statistician of the day was Ronald Fisher, who worked at Rothamsted Agricultural Research Station, in Harpenden, UK, and helped make it the statistical powerhouse it remains to this day.

  9. peter55 says:

    mencius —

    Since you frequently comment on the African-related posts of Timothy, this nugget of information may interest you: one of Rothamsted’s former staff members is the computer-scientist-turned-novelist J. M. Coetzee. Despite his Nobel Prize, they don’t seem to claim him as an alumnus on their website, though.

  10. Timothy Burke says:

    Isaac, my feeling (and McCloskey’s) is that what’s sometimes missing when scholars move to argue that a finding should motivate policy or action is a philosophical argument about why the finding matters; the mere fact that the finding has significance is used to make the argument that it is therefore something one should act upon. So, for example, I actually agree that there are a few more recent studies of the effects of violent representations in visual media on children that confirm there is a statistically significant predisposition to violent action immediately following that exposure. But: the effect is small, it appears to be temporary, no one has any idea how to quantify its effects on everyday behavior and practice, and the predictions that one might make from the effect at the large scale don’t hold up. (E.g., the same kinds of researchers argue that media to which children are exposed have become vastly more violent in the last fifty years; if exposure = violent actions, there should have been a similar increase in reports of violence among children. There hasn’t been, and that’s *even* with an observer bias toward reporting more such incidents, given the degree to which schools now monitor violent and antisocial behavior to a far greater degree than in 1950.)

    But none of that is what’s really missing. What’s really missing is a philosophical argument about why acting to remove representations of violence from media to which children are exposed produces a social good which outweighs the social harm of removing those representations. First, because most such researchers don’t really understand what they mean by “violence” in anything but the most blandly social-scientistic way. What is the social harm if .5% more boys aged 6-10 push each other more often, or get into fights more often? Put that way, it’s hard to say. It takes stories, narratives, anecdotes for us to even begin to think about what the significance of that social change might be–and you can’t guarantee that any given story (good or bad) is the one that will be changed by this shift in social policy.

    On the other hand, there’s a real harm involved in removing that content from media to which children are exposed. First, an economic harm to cultural producers, because they have to stop making certain kinds of media that sell well now, and because making this change also requires hiring new staff to monitor content and circulation of product (e.g., larger Standards & Practices divisions). Second, an imaginative and cultural harm to children themselves–because violence is an important part of drama, an important goad to imagination, an important part of the life of children. This is assuming that the policy doesn’t involve active government censorship, and that’s not a safe assumption, as many researchers and advocates in this area have called for active governmental controls.

    The point is that most of the people doing the work on the effects of violence don’t step anywhere into this argument. They publish a finding, and act as if the finding itself warrants action, simply because it’s a finding.

  11. mencius says:

    peter55, thanks.

    “What’s really missing is a philosophical argument about why acting to remove representations of violence from media to which children are exposed produces a social good which outweighs the social harm of removing those representations.”

    This is the phronesis problem in a nutshell. As so often in government (or even private management), a practical decision has to be taken on grounds that are philosophical, intuitive, etc. It is very easy to construct a broad, inclusive process (ie: a committee) for making important decisions on the basis of science. It is just about impossible to construct a broad, inclusive process for making important decisions on the basis of philosophy. So all you have is a hammer and everything looks like a nail.
