Deirdre McCloskey’s great little pamphlet The Secret Sins of Economics, whose arguments she develops at length in If You’re So Smart and (with Stephen Ziliak) The Cult of Statistical Significance, argues that one of the two “secret sins” mentioned in the title is that economics treats statistically significant results as if they were significant in every sense of that term. I don’t agree with her that this is peculiar to economics, however. A lot of social science that rests on quantitative data has the same issue.
I agree that if a researcher can establish that a particular effect or phenomenon plays a statistically significant role in social behavior, that is a finding worth reporting. The problem, as McCloskey notes, is that some findings matter more than others, and that how much a given finding matters cannot be settled by statistical argument alone: the weight we should give it has to come from some philosophical, moral, political or normative claim.
This perspective hit me most forcefully when I first started reading the literature on media effects in relation to violence and children’s television (mostly from social psychology), and it still holds for most media effects studies, such as work on video games. I’ve written before about how astonishingly weak some of the earlier foundational research in the field is when you look at it closely, such as Albert Bandura’s “Bobo doll” experiments. But I’m perfectly willing to accept that some of the later research demonstrates that there’s some kind of hazy relationship between media consumption by children and an immediate propensity to act aggressively in a controlled setting: some quantifiable effect. It’s just that the effect strikes me as pretty minor compared to all the other influences on behavior in motion in the complexity of the real world, in the explanation of real actions.
In part, I think that it’s minor because if the effect size were significant enough as to warrant serious discussion of a major change in public policy towards the content of popular culture (a change which would have, to put it mildly, philosophical or moral implications for an open or free society), the effect would have very singular and visible consequences on a large scale. At a general level, it’s fair to summarize the history of visual media accessible to children in the United States as having the following trends between 1940 and 2009: vastly more media consumption, far more unmediated by parents or adults, and media with a wider variety and type of representations of violence. So if you’ve found in a laboratory setting that among children there’s an observable relationship between consumption of representations of violence and some action that can plausibly be labeled aggressive, and you stack that up against the basic trends in children’s media over 70 years, you have a prediction. Several generations of children should be successively more and more aggressive or violent. It doesn’t work out that way, however you want to talk about what constitutes measurable violence at the large scale of American society. There are trends in violent crime, trends in interpersonal violence, trends in social tolerance of aggression, but they don’t match at all well against the steadily increasing prevalence of violent images in media accessible to children. That’s just sticking with the United States. Get comparative on a bigger scale and it gets even messier.
So if you want to argue against children’s media in general, or against representations of violence, you need to stop saying that the science proves it, that it’s all in the numbers. The studies that found small laboratory effects entailed a prediction about large-scale consequences, and that prediction already went bust: it didn’t pan out, and it isn’t going to. You’re going to have to roll up your sleeves and get into arguments about morality and politics, freedom and constraint, rights and ethics.
Sometimes when I make this argument, the rejoinder I hear from people who are strongly invested in media effects research is that media effects only become important among uneducated and impoverished populations, that elsewhere they’re cancelled out by education, wealth, strong family structures, good parenting, and so on. Fine. Then the point still holds: focusing on media effects is a red herring, when the discussion should really be about education or poverty or family life or parenting practices.
——
I’m thinking about this today in part because I heard this morning on NPR a news story about new figures on average life expectancy in the developed world and how they’re expected to continue to climb steadily in the future. The researcher who was interviewed noted that the basic driver for this increase in life expectancy is the relative wealth of those societies and how that produces beneficial health dividends at several key points: better nutrition and caloric supply in childhood, better clinical medicine, better geriatric care, better standard of living all around.
The interesting thing is that most people in Western societies are aware that not just life expectancy but, to some extent, the quality of health during life has been improving. Sometimes I find that people project recent trends towards longer life backwards too far into the past, concluding that most premodern people suddenly dropped dead at 30 because that was life expectancy at birth back then, when the truth is that the low average was driven by how many of those born died in infancy and early childhood. If people made it past childhood, they usually lived a lifespan not that far off the mid-20th-century norm. Human beings stopped dying so much in infancy first, well before they started living in steadily greater numbers past the age of 65.
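A toy calculation shows how the arithmetic works; every figure below is invented for illustration, not a historical estimate:

```python
# Toy arithmetic: high early-childhood mortality drags life expectancy
# at birth down to ~30 even when survivors live a near-modern lifespan.
# All numbers here are invented for illustration, not historical data.

died_young = 0.5          # hypothetical: half of those born die by age 5
mean_age_young_death = 2  # hypothetical average age at those early deaths
survivor_lifespan = 58    # hypothetical typical lifespan for survivors

life_expectancy_at_birth = (
    died_young * mean_age_young_death
    + (1 - died_young) * survivor_lifespan
)
print(life_expectancy_at_birth)  # 0.5*2 + 0.5*58 = 30.0
```

A life expectancy at birth of 30 is entirely compatible with adults routinely living into their late 50s or beyond; the average is dominated by deaths at the very start of life.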
So we’re aware that in this pretty important sense, the population of the developed world is healthier at this moment than it has ever been in world history, and barring some sudden catastrophic intervention such as a devastating pandemic, this trend appears likely to continue. In the developing world, not so much. Life expectancy in Zimbabwe, for example, has been moving full-throttle in the opposite direction for the past decade. Life expectancy is also a big indicator of inequality within societies, as different populations within the same society often have quite different life expectancies.
Anyway. I raise this in relation to the effect size problem because on the whole, this overall trend in the health of human populations ought to be a meaningful check on certain other kinds of conversations about public health. By no means all or even most of them: if you’re a researcher studying the health consequences of heroin addiction or the impact of antibiotic-resistant staph or innumerable other conditions that have pronounced effects on particular groups or individuals, it hardly matters that as a whole, human beings are healthier and more long-lived than they were a century ago.

But if, for example, you’re seriously concerned by rising obesity and you think that trend is very serious, you need to think about which kind of serious you mean. If you mean serious in the sense of “more obesity means higher health care costs”, that’s pretty valid. If you mean “more obesity means lower quality of life”, that might be, but you’ve just started chasing a different kind of argumentative white rabbit down a different kind of hole. If you mean, in some form or another, “more and more of us are going to die earlier and earlier”, it rather looks like you’re wrong. And yet, if you look at popular rhetoric in the U.S. about rising obesity levels, that’s pretty much what it sounds like. You might do what media effects researchers do and clarify that what you really mean is that poor people are going to see a larger hit to their life expectancy because of rising obesity in their demographic, but then, as with media effects research, why get hung up on something other than the underlying problem, poverty? Or maybe there is an effect on life expectancy from rising obesity across the whole of society that is substantially cancelled out by trends with much larger effects (quality of medical care, overall nutrition and calorie supply, basic levels of physical fitness, etc.). In which case, you’re getting too strongly worried, in terms that are too strongly voiced, about a phenomenon that isn’t as important as you think it is.
There are a lot of examples like this spread across social science and natural science. Probably one of the root issues here, which I’ve talked about a lot on the blog, is that there’s very little interest in or reward for research which makes modest claims about mildly significant results. But maybe this is an even more important kind of research than straightforwardly negative findings, at least as far as fueling public policy and public discussion. If we don’t know what kinds of effects are present but not hugely significant in our lived environment, we can’t really know where we need to take small incremental actions instead of sweeping and drastic action that’s festooned with alarm bells and cries of urgency.
Herr Burke, it may surprise you but I generally agree with what you are saying about the effects of media consumption on violence. But I still think that media consumption has other serious adverse effects on society because it takes time away from better uses. Children’s brains have only one chance to grow. Watching representations of violence wastes, and I do mean “wastes,” precious time.
You know, more research saying, “poor people are disproportionately vulnerable to bad outcomes” has the same kind of problem as research making modest, limited claims based on a small effect size: there’s not much interest in it, and the pay-off for a researcher who makes that conclusion is small. Duh, poverty makes you more vulnerable to any kind of negative situation. Shocker! Also: provides no usable intervention.
Social scientists, public health researchers, and social service organizations spend a lot of time looking for ways to help poor people, and it’s easy to get caught up in the search for a magic bullet. Is it learning self-control through play? Early childhood literacy? Grocery stores in the inner city? Vegetables in school lunch? Gang tattoo removal programs? Workforce education? Using former violent offenders to intervene and talk down bad situations? Stricter gun licensing? These are all real proposals that people bring up, and a lot of them would be useful, but none of them address the core problem: poverty sucks. And I think that’s because there’s no strong constituency for addressing poverty per se in American politics and society, so policy-makers and social service organizations can’t get too focused on it if they want to be successful (in getting money and power).
“I agree that if a researcher can establish that a particular effect or phenomenon plays a statistically significant role in social behavior, that is a finding worth reporting.”
Err, no. All that “statistically significant” means is this: assuming an inferential statistic is based on a random sample and the model assumptions hold, a measure this far from some a priori well-we’re-not-going-to-publish-this value (usually 0) would be unlikely to arise by chance alone. Saying that a measure is non-zero and statistically significant at the .05 level means that, under the model assumptions, a sample would show an effect this large less than 5% of the time if the true value were really zero (or on the other side of it).
There are three major problems with common social-science and behavioral-studies uses of statistical significance. One is what you describe, the confusion of statistical significance with meaning, and on that point, you should not back down (though you do back down in the quoted sentence, if not in the rest). Even if all of the model assumptions are correct, you can easily get to statistical significance simply by having a large enough sample. This is very common with census public-use microdata samples. Don’t back down: effect size, effect size, effect size!
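Here’s a quick simulation to make that concrete. It’s a minimal sketch using NumPy and SciPy, and the tiny “true effect” of 0.02 standard deviations is invented purely for illustration:

```python
# Sketch: a negligible effect becomes "statistically significant" once the
# sample is large enough, while the effect size stays negligible throughout.
# The true effect (0.02 standard deviations) is invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.02  # tiny shift, in standard-deviation units

for n in (100, 10_000, 1_000_000):
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
    cohens_d = sample.mean() / sample.std(ddof=1)  # standardized effect size
    print(f"n={n:>9,}  p={p_value:.3g}  Cohen's d={cohens_d:+.3f}")
```

The p-value collapses toward zero as n grows, but Cohen’s d sits around 0.02 the whole way: the “significance” is a function of sample size, not of anything that matters.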
A second major problem arises when the study is not based on a random sample of a broader population, either because you have a complete-population calculation or because you have a convenience sample (e.g., psychologists using undergraduate students in psych classes). In that case, using statistical significance implies some hypothetical superpopulation of which the study population is a random sample. This is an interesting absurdity: if you’re looking at a study using the entire population of home foreclosures in 2008, for what hypothetical superpopulation would this be a random sample? All of the hypothetical years of home foreclosures when we had George W. Bush as president and a once-in-a-lifetime financial crisis?
The third major problem is subtler and has to do with logical inconsistencies in standard inferential statistics and the a priori cut-lines for statistical significance. Who said that we want to be 95% confident that the measure is different from 0? Who set 0 as the a priori “we don’t care anymore” value? There are alternatives that some statisticians and philosophers advocate (Bayesian perspectives on statistics), but that’s well outside my area of expertise.
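To make the arbitrariness of the cut-line concrete, here’s a minimal sketch; the reported t-statistic and degrees of freedom are invented purely for illustration:

```python
# Sketch: the same result passes one conventional cutoff and fails another,
# so the verdict depends entirely on the a priori choice of alpha.
from scipy import stats

# Suppose a study reports t = 2.2 with 79 degrees of freedom (invented numbers).
t_stat, df = 2.2, 79
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value, roughly 0.03
print(f"p = {p_value:.3f}")

for alpha in (0.05, 0.01):
    verdict = "reject the null" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha}: {verdict}")
```

Nothing about the data changes between those two lines of output; only the convention does.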
Good points. I just want to throw a bone to social science as it is commonly done, to make room for the work that people have done and are still doing.