Discussion: Inferring when Association Implies Causation

Noah Haber
Standard health research dogma dictates that the “correct” way for authors to deal with weak causal inference is to just call it association. Papers that say that coffee is “associated/linked/correlated” with cancer are acceptable for publication, even if they don’t give any useful inference about the actual impact of drinking coffee, as long as they don’t use the word “caused.” While it is extremely difficult or even impossible to estimate the causal effect of coffee on cancer, it is relatively easy to publish a paper about the association between the two. As others have noted, this creates a serious issue where a huge number of misleading studies are published, get distributed to the public, distract from good studies, and do real harm.

Association is a powerful research tool when the question calls for it and the methods match, but it is no substitute when the question actually requires causal inference. Stranger yet, the culture of “don’t use the word cause” is so strong that there are even papers which find really strong evidence of causality, but stay on the “conservative” side and just say association.

In CLAIMS, reviewers found that 34% of authors in our sample used stronger technical causal language than was appropriate given the methods. Most academic authors follow the technical rules, if not their spirit. What about the remaining 66%? How many of those implied causality through means other than “technical” language? Can we reasonably infer how these studies might have misled through sloppy methods, hints, nudges, and reasonable misinterpretation?

If technical language is an unreliable method of determining whether the study implied causality, how can we infer those implications? I have a few ideas below for discussion, but would LOVE to hear your thoughts on where I get this wrong, better explanations, general disagreements, etc.

Decision implications

Go straight to the discussion section and read what the authors say people should do or change based on their results. In almost all cases in which the authors recommend changing the main exposure to change the outcome of interest, they implied causality. A study about the association between coffee and cancer that concludes that you should drink more or less coffee to avoid cancer, or even one that simply says coffee is “safe” to drink, relies on estimating the causal effect of coffee on cancer. If their methods weren’t up for the task, the study is misleading.

In general, if the study was truly useful for association only, changing the exposure of interest will usually not be the main action implication. If the question of interest is disparities in outcomes between groups (such as race), the authors would, in general, not suggest that people switch groups. Similarly, finding associations to better target interventions doesn’t imply that we need to change the exposure, but rather that the exposure is a useful metric for identifying targets of interventions.

This can get tricky, particularly when the exposure of interest is a proxy for changing something else that is harder to measure, such as laws as a proxy for the causal impact of the political and cultural circumstances that bring about a change in the law, plus the impact of the law itself. As usual, there is no simple rule or formula to follow.

Question of interest

In the great words of Randall Munroe of XKCD (hidden in the mouseover): “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there.'”

Some associations inherently imply causality. Virtually every study in which individual consumption of something is the exposure and some health effect is the outcome of interest implies causality. One way in which the association might inherently imply causation is simply the lack of useful alternative interpretations. For example, there is little plausible reason why merely studying the association between coffee and cancer is useful for anything except when you have identified causal effects of coffee on cancer.

I find it helpful to try to think about plausible ways that the association between X and Y can be useful, first on my own and then from those the authors describe. For each item, I strike out ones that require causality to be inferred. If I have no items remaining and/or if the remaining items seem implausible, that may hint that the question of interest has inherent causal implications. Even then, there are two caveats: 1) my inability to come up with a good non-causal use does not mean one does not exist, and 2) even if one does exist, the association could still inherently imply causation.
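For anyone who likes to see a checklist written down, the strike-out step above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the names `ProposedUse`, `non_causal_uses`, and `hints_at_causal_question` are made up for this sketch, and the `requires_causality` flag is a judgment call that I have filled in by hand, not something the code can work out for you.

```python
from dataclasses import dataclass


@dataclass
class ProposedUse:
    """One plausible way an X-Y association could be useful (hypothetical type)."""
    description: str
    requires_causality: bool  # a human judgment call, not an objective fact


def non_causal_uses(uses):
    """Strike out every use that needs causal inference; return what is left."""
    return [u for u in uses if not u.requires_causality]


def hints_at_causal_question(uses):
    """True when nothing survives the strike-out step.

    This is only a hint: failing to think of a non-causal use does not
    prove that none exists.
    """
    return len(non_causal_uses(uses)) == 0


# Example lists built from the cases discussed in this post.
coffee_and_cancer = [
    ProposedUse("Advise people to drink more or less coffee", True),
    ProposedUse("Declare coffee 'safe' to drink", True),
]
targeting = [
    ProposedUse("Change the exposure to improve the outcome", True),
    ProposedUse("Use the exposure to identify who should get an intervention", False),
]

print(hints_at_causal_question(coffee_and_cancer))  # True
print(hints_at_causal_question(targeting))          # False
```

The sketch deliberately leaves out the "remaining items seem implausible" check and both caveats, since those depend on subject-matter judgment.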

Look for language in the grey zone

The list of words that are taboo because they mean causality is short, consisting mainly of “cause” and “impact.” The list of words in the grey zone is much longer, and not always obvious. My personal favorite is the word “effect.” For some reason, the phrase “the effect of X on Y” is more often considered technically equivalent to “the association between X and Y” than “the causal impact of changing X on Y.” While “effect” is sometimes used purely as shorthand, I find that it is more often used when authors want to imply causality but can’t say it. Curiously, “confounding”/“confounders” is not on the causal language taboo list, even though it implies causation by definition.

Statistical methods

Some statistical methods and data scenarios strongly imply causality. In many cases, this is simply because the methods eliminate all alternative interpretations, such as when authors control for dozens of “confounding” covariates. Some methods are developed specifically to estimate causal effects, and have limited application outside of causal inference.

This one is unfortunately in the “statistical/causal inference experts only” zone, since it requires a fairly deep understanding of what the statistics actually do and assume to tease out implications of causality.
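To make the point about controlling for “confounders” a bit more concrete, here is a small simulated sketch. It is not drawn from any study discussed here, and every number in it is an assumption chosen for illustration: a hypothetical confounder `z` drives both the exposure `x` and the outcome `y`, while `x` has no effect on `y` at all. The sketch compares the unadjusted slope of `y` on `x` with the slope after adjusting for `z` (done by regressing both on `z` and using the residuals, which gives the same coefficient as including `z` in the regression).

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = slope(x, y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0):
    """Simulate a confounded association where x has NO effect on y.

    Assumed (made-up) data-generating process:
        z ~ Normal(0, 1)         the confounder
        x = z + noise            exposure, driven by z
        y = 2 * z + noise        outcome, driven by z only
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [2 * zi + rng.gauss(0, 1) for zi in z]

    unadjusted = slope(x, y)
    # Adjust for z by residualizing both x and y on z first.
    adjusted = slope(residuals(z, x), residuals(z, y))
    return unadjusted, adjusted


unadjusted, adjusted = simulate()
print(f"unadjusted slope of y on x: {unadjusted:.2f}")
print(f"slope after adjusting for z: {adjusted:.2f}")
```

Under these assumptions the unadjusted slope should land near 1 and the adjusted slope near 0. The adjusted number is only interesting as an attempt at the causal effect of `x` on `y`, and it is only right here because the simulation guarantees that `z` is the one and only confounder. Real data gives no such guarantee.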

Intent vs. implication

It is important to understand that the study authors making these implications aren’t generally bad people, and may genuinely not have intended to imply causality when inappropriate. In some cases, they may simply not notice that their framing carries causal implications. In other cases, they may have been led to certain uses of language by reviewers, editors, co-authors, or media writers. Alternatively, the most misleading articles are simply the ones that will be most likely to be published and written about, and therefore most likely to be seen.

However, as always, some of the blame and responsibility lies with us, the researchers. We should be careful about generating studies where causation is implied, regardless of what the technical dogma tells us is right and wrong. We should learn to be more honest about what we are studying, embrace the limitations of science and statistics, and fight to create systems that allow us to do so.

Thoughts and comments welcome