By now you’ll all have seen Robert Yeh’s paper showing that there is no evidence that parachutes save the lives of people jumping from a plane. This raises a question for all social scientists: what counts as evidence? I suspect we tend to overweight some kinds of evidence, and underweight others.
Yeh’s paper is a lovely illustration of a general problem with randomized controlled trials – that they tell us how a treatment worked under particular circumstances, but are silent about its effects in other circumstances. They can lack external validity. Yeh shows that parachutes are useless for someone jumping from a plane when it is on the ground. But this tells us nothing about their value when the plane is in the air – which is an important omission.
We should place this problem with RCTs alongside two other Big Facts in the social sciences. One is the replicability crisis. This afflicts not only psychology but other disciplines too: Campbell Harvey has said that most findings in financial economics are “likely false”. The other (related) is the fetishization of statistical significance despite the fact that, as Deirdre McCloskey has said (pdf), it “has little to do with a defensible notion of scientific inference, error analysis, or rational decision making” and “is neither necessary nor sufficient for proving discovery of a scientific or commercially relevant result.”
If we take all this together, it suggests that a lot of conventional evidence isn’t as compelling as it seems. Which suggests that maybe the converse is true. Perhaps there are some types of evidence we under-appreciate. For example:
- Non-quantitative observation. Some economists are sniffy about “casual empiricism”. But there needn’t be anything casual about it. In sociology and anthropology, ethnography is an accepted, laudable tradition. And it used to be in economics. Adam Smith’s work was based upon decades of observation of real people. And perhaps Coase’s The Nature of the Firm (pdf) was founded on ethnographic evidence.
- Personal experience. Just before Christmas Betsey Stevenson and Robin Hanson argued about sexual harassment. A big part of the issue was what counts as evidence: Betsey emphasized the personal experience of victims, whilst Robin emphasized thought experiments and surveys. For me, Betsey has a point. One thing I find irritating about some “as if” modelling is that it ignores personal experience: Greg Clark’s story (pdf) of the rise of the factory system is one example, and the idea of unemployment as arising from a taste for leisure is another.
- Memory. The Easterlin paradox – that economic growth doesn’t make rich societies happier – has been well challenged in statistical terms. But what lends it credence for me is not so much the statistical evidence as memory. UK GDP per head is more than 50% higher than it was 30 years ago. But are we really happier than we were then? My memory suggests not. Of course, this doesn’t prove there’s zero link between GDP and happiness, but for me it’s evidence of a weak one.
- Music. Bruce Springsteen’s album Born in the USA – which is now a third of a century old – told us that deindustrialization was linked to threats to male identity and a yearning for the past. He was years ahead of political scientists in highlighting the issues that contributed to Trump’s election. Similarly, the rise of punk in the 70s betokened harsher economic and social times; the recent popularity of drill music alerts us to the fact that inner city youth have violent lives; and the ubiquity of homesickness in old folk songs suggests urbanization was accompanied by a sense of loss. And there are countless individual songs that tell us something. To take just one example, the Pistol Annies’ Got My Name Changed Back is gloriously vivid corroboration of Andrew Clark’s finding that we adapt to traumatic events. The Boss was exaggerating when he sang “We learned more from a three-minute record, baby, than we ever learned in school”, but he had a point.
- Literature. Eugenio Proto and colleagues have used machine reading of thousands of books to measure historic happiness. But we can of course use them less quantitatively. Surely Jane Eyre or Tess of the D’Urbervilles tell us something about the lives of real women in the 19th century.
You might object that these informal methods don’t generate proof. True. But do we need it? In finance, if you wait for proof that a strategy works before adopting it, you will buy at the top of the market. It’s entirely reasonable to act on probabilities. For example, if there’s a small chance of a disaster that can be cheaply averted, it’s wise to do so without waiting for more evidence.
In making this Feyerabendian argument for empirical diversity I am not discounting conventional scientific methods. Very often, these are vital. For example, I find the most convincing evidence that markets are prone to irrational bubbles to come from laboratory experiments (pdf) which show them to be commonplace: “real-world” mispricings can be explained in other ways. We should ask: what would count as evidence here? The answer will vary from context to context. I fear this trivial thought is under-appreciated.