Opinion: Misleading coffee studies have hidden consequences

Noah Haber
The following is the opinion of the author, and does not necessarily reflect scientific findings or theory.

A few weeks ago, Alex and I broke down why a study on coffee and its related media articles were misleading. While it might seem obvious that bad studies are bad for our health, the real damage that studies like this do is much deeper, though harder to see and measure. To understand why, we need to start from the obvious.

Direct impact: weak and misleading medical science leads to bad medical decisions

In 2013, two studies came out claiming that statins, drugs typically used to lower high cholesterol, cause deadly side effects. Both papers were severely misleading, and statements in both were later retracted, but before that happened the media ran with their claims. A 2016 study led by Anthony Matthews looked at statin prescription and refill rates in the UK, and found compelling evidence that these two studies and their media coverage caused huge disruptions in statin refills and prescriptions, resulting in over 200,000 people ceasing taking their statins for a few months. I have plenty of nits to pick with this study, but my biggest is that they probably underestimated the total impact.

It is remarkably difficult to find the causal effect of weak and misleading causal evidence, but occasionally we get some hints. The example of statins is a particularly dramatic story for which we have the rare privilege of having strong evidence, and you can imagine that this sort of thing happens all the time and goes unmeasured.

Which brings us back to the coffee study in question. You would be right in thinking that coffee studies probably do little to no direct harm or good. It’s just coffee. However, you would be wrong to think the problem stops there.

Weak and misleading articles crowd out rigorous ones

That headline space is precious. In principle, every one of those articles could have been about better studies that could be more useful to decision makers. Even better, those media articles could have been written about topics on which there is scientific consensus. Similarly, the time and funding those researchers spent on this misleading coffee article probably could have been put to better scientific use, although it is worth noting that many of the proposed mechanisms to achieve more intense control of scientific studies would probably do more harm than good.

However, headline space, scientific progress, funding, and consumer exposure are not really zero sum games. Taking away one headline does not automatically mean that it will be replaced with a better one, or replaced with anything at all. Further, while this may be a particularly expensive coffee study due to the genetics aspect, most are dirt cheap. If I had to guess, putting the time and money spent on those to other studies would probably not result in a huge net gain for public health.

The most important and impactful reasons why these studies and their media coverage are damaging are far more subtle, and far more insidious.

Weak and misleading science erodes public trust and discourse in science

As usual, a comedian is the one that described it best: Lewis Black’s late 90’s rant on scientific studies flip-flopping on eggs.

As we showed in CLAIMS, the majority of what people see of health science is weak, misleading, and/or inaccurate. These headlines make up nearly 100% of almost everyone’s exposure to health science. While that represents only a fraction of health science, extremely few are privileged with getting to see the big picture, and most of those people are not writing for mainstream news. If the near entirety of what people see of studies looks like scientists flip-flopping on eggs, it shouldn’t be surprising that trust in scientific institutions is cracking. If people only see the least reliable health science, distrust is a reasonable response.

Unfortunately, many of us were indeed caught by surprise over the last few years as we watched severe backlash against scientific thought and institutions coming from news outlets and political rhetoric. When it is difficult for people to distinguish scientific strength, and people are used to weak science, it allows anyone with sufficient lack of knowledge and/or willingness to take advantage of the situation to more easily reject scientific consensus without cause.

We own this, and we need to fix it

A study like the one we saw a few weeks ago should never have entered into the public sphere, and maybe should not have been done at all. It adds little to nothing of note to our scientific knowledge, misleads health decisions, and continues the erosion of public trust in our institutions. Other studies, such as the statins example, have more immediate consequences.

We have a responsibility as scientists to educate, collaborate with, listen to, and intervene in public discussion. We also have a responsibility to encourage our best science, and reject our worst. Sometimes, that means trying things that are uncomfortable and risky.

The family of projects starting with CLAIMS is explicitly intended to be used to intervene and change the interaction of scientific institutions, media, and social media. Many of them are based in critique. Critical scientific review is an unusual thing to be doing at a time when trust in scientific institutions is low, and it makes for some strange (and severely mistaken) bedfellows. It’s a risk which we hope produces net positives for scientific progress and its impact on human lives.

Count the covariates: A proposed simple test for research consumers

Noah Haber
Trying to determine if a study shows causal effects is difficult and time consuming. Most of us don’t have that kind of time or training (yes, that includes almost all medical professionals too). I have a proposed idea for a potential test that anyone can do for any article linking some X to some health Y, and I want to hear your thoughts: count the covariates.

TL;DR: You may be able to get a decent idea of whether or not the study you just saw on social media about X linking to some Y shows a causal relationship by counting the number of covariates needed for the main analysis. The fewer variables controlled for, the more likely the study is to be interpretable as having strong causal inference. The more covariates, the more likely it is to be misleading.

A few important caveats: 1) THIS IS CURRENTLY UNTESTED, but we are currently working on formally testing a pilot of the idea; 2) It will certainly be imperfect, but it might be a good guideline; 3) This probably only works for studies shared on social media; and 4) This is an idea intended for people who don’t have graduate degrees in epidemiology, econometrics, biostats, etc., but the more you know, the better.

Why it could work:

The key intuition here is twofold. A study that is “controlling” for a lot of variables 1) is usually trying to isolate a causal effect, regardless of the language used; but 2) can’t.

Let’s see why this might work, using that coffee study from last week as an example.

Controlling for a lot of variables implies estimating causal effects

The logic comes down to what it means to “control” for something. For example, smoking. The reason the authors control for smoking is because smoking messes with their estimation of the effect of coffee on mortality. People who drink more coffee are also more likely to smoke. Smoking is bad for you. One reason, then, that people who drink more coffee might have different life expectancies is because they are likely to die earlier from smoking. So it makes sense to “control” for smoking then, right?

It does make sense, if you are trying to isolate the effect of drinking coffee on mortality. If you don’t care about that causal effect, and have some other reason to want to know this association, you generally don’t need or want to control for other variables. The more variables you control for, the less plausible it is that you are doing anything other than estimating a causal effect.
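To make the smoking example concrete, here is a minimal simulation sketch in Python. Every number in it (the share of smokers, the coffee habits, the death risks) is made up for illustration and does not come from the coffee study, and coffee is deliberately given zero true effect on death. The point is only to show what "controlling" for smoking does to the coffee and mortality association.

```python
import random

def simulate(n=200_000, seed=0):
    """Return (smoker, coffee, died) rows from a toy world where coffee does nothing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        # Assumed: smokers are more likely to be coffee drinkers.
        coffee = rng.random() < (0.7 if smoker else 0.3)
        # Assumed: death risk depends on smoking only, never on coffee.
        died = rng.random() < (0.20 if smoker else 0.10)
        rows.append((smoker, coffee, died))
    return rows

def death_rate(rows):
    return sum(died for _, _, died in rows) / len(rows)

def risk_difference(rows):
    """Death rate among coffee drinkers minus death rate among everyone else."""
    drinkers = [r for r in rows if r[1]]
    others = [r for r in rows if not r[1]]
    return death_rate(drinkers) - death_rate(others)

rows = simulate()
crude = risk_difference(rows)
within_smokers = risk_difference([r for r in rows if r[0]])
within_nonsmokers = risk_difference([r for r in rows if not r[0]])

print(f"crude difference:        {crude:+.3f}")
print(f"among smokers only:      {within_smokers:+.3f}")
print(f"among non-smokers only:  {within_nonsmokers:+.3f}")
```

In this toy world the crude comparison makes coffee look harmful, while the comparisons within smokers and within non-smokers sit near zero. Splitting by smoking is the simplest form of "controlling" for it, and it only makes sense to do if the number you are after is the effect of coffee itself.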

Controlling for a lot of variables implies inadequate methods to estimate a causal effect

Some research strategies get you great causal effect estimation without having to control for much of anything at all, such as randomized controlled trials, “natural experiments,” and many kinds of observational data analysis methods in the right scenarios. You can’t always do this successfully. Sometimes, you have to control or adjust for alternative explanations.

The problem is when you have to control for a LOT of alternative explanations. That generally means that there was no “cleaner” way to go about the study that didn’t require controlling for so many variables. That also means that there are probably a thousand other variables that they didn’t control for, or even have the data on those variables to start with. It only takes one uncontrolled for factor to ruin the effect analysis, and there are too many to count. There are also some slightly weirder statistical issues when you imperfectly control for something, and that’s more likely to happen when you are controlling for a lot of stuff.

In that coffee study, the authors controlled for the kitchen sink. However, coffee is related to basically everything we do. People from different cultural backgrounds have different coffee drinking habits. People with different kinds of jobs drink coffee differently. Fitness. Geographic region. Genetics. Social attitudes. You name it, and it is related to coffee. That’s not a problem by itself. What IS a problem is that all of those things ALSO impact how long you are going to live. If you have to control for everything, you can’t.
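Here is a rough sketch of the "it only takes one" problem, again in Python and again with entirely invented numbers. In this toy world coffee has no effect on death. Smoking is measured, but a second factor (labeled `employed` here, a hypothetical stand-in for anything the researchers did not or could not measure) also drives both coffee drinking and mortality.

```python
import random
from collections import defaultdict

def simulate(n=200_000, seed=1):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.3      # measured confounder
        employed = rng.random() < 0.6    # hypothetical confounder missing from the data
        # Assumed: both factors make coffee drinking more likely.
        coffee = rng.random() < 0.2 + 0.3 * smoker + 0.4 * employed
        # Assumed: smoking raises death risk, employment lowers it, coffee does nothing.
        died = rng.random() < 0.15 + 0.10 * smoker - 0.08 * employed
        rows.append({"smoker": smoker, "employed": employed,
                     "coffee": coffee, "died": died})
    return rows

def risk_difference(rows):
    drinkers = [r["died"] for r in rows if r["coffee"]]
    others = [r["died"] for r in rows if not r["coffee"]]
    return sum(drinkers) / len(drinkers) - sum(others) / len(others)

def adjusted_risk_difference(rows, controls):
    """Average the coffee/no-coffee difference across strata of the control variables."""
    strata = defaultdict(list)
    for r in rows:
        strata[tuple(r[c] for c in controls)].append(r)
    return sum(len(s) / len(rows) * risk_difference(s) for s in strata.values())

rows = simulate()
smoking_only = adjusted_risk_difference(rows, ["smoker"])
both = adjusted_risk_difference(rows, ["smoker", "employed"])

print(f"controlling for smoking only:      {smoking_only:+.3f}")
print(f"controlling for both confounders:  {both:+.3f}")
```

With only smoking controlled for, coffee appears to protect against death even though it does nothing. The estimate only returns to roughly zero once the second factor is also controlled for, which is impossible if nobody collected it. Real studies face many such factors, not one.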

Count the covariates

To review: controlling for a lot of variables implies that you are looking for a causal effect, but ALSO implies that there is more that needed to be controlled for to actually have estimated a causal effect. See the catch-22?

We can also take a look at causal language here. Studies are often considered acceptable in scientific circles (i.e. peer review in journals) as long as they use “technically correct” language with regard to causality. We think that is seriously misleading, but that doesn’t stop those studies from hitting our newsfeeds.

The most likely scenario for most people seeing a study that uses strong causal language and controls for very little is that it’s one of those studies that actually can estimate causality, such as most randomized controlled trials. On the other hand, a study that uses weak causal language and controls for very little probably isn’t actually trying to estimate a causal effect, and our proposed rule doesn’t really say much about whether or not these studies are misleading.

We can also look at the language used, where studies may use stronger (effect/impact/cause) or weaker (association/correlation/link) causal language. It’s also worth considering how the authors state their evidence can be used, as that can also imply that their results are causal. The kinds of studies that control for a lot of variables and state it as such are a strange bunch and unlikely to be seen in your social media news feed. This rule just doesn’t work as well for them, but most are unlikely to see them anyway, so the rule is still mostly ok.

Important considerations and discussion

Multiple specifications can make this hard to deal with. In the phrase “number of covariates required for the main analysis,” there are two tricky words: required and main. Most studies have several ways of going at the same problem, and it’s difficult to determine what the “main” one is. It is common that a study might have both a “controlled” and “uncontrolled” version, which may or may not have very different numbers produced. If the numbers don’t change much between those two versions (or, even better, you have the background to know what is required and not), controlling for them probably wasn’t “required,” so they may not need to count. It is notable that the coffee study we keep talking about doesn’t do anything of the kind. All plausible main analyses are heavily controlled, and as such would fail any version and interpretation of this test.

There is probably a paradox that occurs here (credit to Alex Breskin for pointing this out). In the case of multiple studies on the same topic using roughly the same methods, observational trials controlling for more covariates probably do better with regard to causal inference. But because we are not selecting among studies in that way, and we are intending this as a guideline for ALL studies on social media, the opposite may be true.

It is also worth noting that this may end up being mostly indistinguishable from RCT vs. everything else, which is not the intent.

There are also some sets of methods which do require moderate numbers of covariates to work, and occasionally these articles appear in our news feeds. One example from Ellen Moscoe is difference-in-difference studies for causal effects of policies. These typically need controls for time and place, which is at minimum two covariates.
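As a hedged illustration of why difference-in-difference designs need those two controls, here is a minimal two-place, two-period sketch in Python. The outcome scale, the size of the place difference, the time trend, and the policy effect (`TRUE_EFFECT`) are all hypothetical values chosen for the example.

```python
import random
from statistics import mean

TRUE_EFFECT = -2.0  # hypothetical effect of the policy on the outcome

def simulate(n_per_cell=20_000, seed=2):
    """Return the average outcome for each (place, period) cell."""
    rng = random.Random(seed)
    cells = {}
    for place in ("treated", "control"):
        for period in ("before", "after"):
            level = 50.0
            level += 5.0 if place == "treated" else 0.0   # assumed fixed difference between places
            level += 3.0 if period == "after" else 0.0    # assumed trend shared by both places
            if place == "treated" and period == "after":
                level += TRUE_EFFECT                      # policy only applies here
            cells[(place, period)] = mean(
                rng.gauss(level, 10.0) for _ in range(n_per_cell)
            )
    return cells

cells = simulate()

# Comparing the treated place to itself over time ignores the shared time trend.
before_after = cells[("treated", "after")] - cells[("treated", "before")]
# Comparing places after the policy ignores the fixed difference between places.
treated_vs_control = cells[("treated", "after")] - cells[("control", "after")]
# Difference-in-differences removes both, assuming the places would have trended alike.
did = before_after - (cells[("control", "after")] - cells[("control", "before")])

print(f"before vs. after, treated place only: {before_after:+.2f}")
print(f"treated vs. control, after only:      {treated_vs_control:+.2f}")
print(f"difference-in-differences:            {did:+.2f}")
```

Neither simple comparison lands near the built-in effect, but accounting for both place and time does. Those two covariates are part of the design itself, which is why a rule based on counting covariates should probably not penalize them.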

We also just don’t know if this idea actually works. But it might, and we can test it.

Thoughts?

Any thoughts on why this might fail? Alternative proposed tests? Let us know in the comments or get in touch!

Do coffee studies make causal inference statisticians die earlier?

Alexander Breskin
Noah Haber
This week, yet another article about the association between coffee and mortality plastered our social media feeds. This trope is so common that we used it as an example in our post on LSE’s Impact Blog, which happened to be released the very same day this study was published. We helped comment on the study and reporting for a post in Health News Review, which focused on how the media misinterpreted this study. Most news media made unjustifiable claims that suggested that drinking more coffee would increase life expectancy. The media side, however, is only half of the story. The other half is what went wrong on the academic side.

In order to have estimated a causal effect, the researchers would have needed to find a way to account for all possible reasons that people who drink more coffee might have higher/lower mortality that aren’t the direct result of coffee. For example, maybe people who drink a lot of coffee do so because they have to wake up early for work. Since people with jobs tend to be healthier than those who don’t, people who drink coffee may be living longer because they are healthy enough to work. However, this study can’t control for everything, so what they find is an association, but not an association that is useful for people wondering whether they should drink more or less coffee.

The study is very careful to use language which does not technically mean that they found that drinking more coffee causes longer life. That makes them technically correct, because their study is simply incapable of rigorously estimating a causal effect, and they don’t technically claim they do. Unfortunately, in the specific case of this study, hiding behind technically correct language is at least mildly disingenuous. Here is why:

1) The authors implied causation in their methodological approach

The analytic strategy provides key clues that the study was designed to answer a causal question. Remember above where we talked about controlling for alternative explanations? If you are only interested in association (and there might be some reasons why you might want this, albeit a bit contrived), you don’t need to control for alternative explanations. As soon as you start trying to eliminate/control for alternative explanations, you are, by definition, trying to isolate the one effect of interest. This study tries to control for a lot of variables, and by doing so, is trying to rule out alternative explanations for the association they found. There is no reason to eliminate “alternatives” unless you are interested in a specific effect.

2) The authors implied causality in their language, even without technically saying so

The authors propose several mechanistic theories for why the association was found, including “reduced inflammation, improved insulin sensitivity, and effects on liver enzyme levels and endothelial function.” Each of those theories implies a causal effect. When interpreting their results, they state that “coffee drinking can be a part of a healthy diet.” Again, that is a conclusion which is only relevant if they were looking at the causal effect of coffee on health, which they cannot make. How can you say if coffee is ok to drink if you didn’t tell me anything about the effect of drinking coffee?

3) Alternative purposes of this study are implausible or meaningless

Effect modification by genetics

The stated purpose of the study and its contribution to the literature is about the role of genetics in regulating the impact of coffee on mortality. The problem here, again, is that in order to determine the impact of genetics on regulating the effect of coffee on mortality, you first have to have isolated the effect of coffee on mortality. You can not have “effect modification” without first having an “effect.” That’s a shame, because it is totally plausible that there was some neat genetics science in this study that we aren’t qualified to talk about.

Contribution to a greater literature

In general, we should ignore individual studies, and look at the consensus of evidence that is built up by many studies. However, there are literally hundreds of studies about coffee and mortality, almost all of which commit the exact same errors with regard to causation. One more study that is wrong for the same reason that all the other studies are wrong gives a net contribution of nearly nothing. They may be contributing to the genetics literature, but this study does not add any meaningful evidence to the question of whether or not I should have another coffee.

4) Duh.

Studying whether coffee is linked to mortality is inherently a causal question. To pretend otherwise is like a batter missing a swing, and then claiming they didn’t want to hit the ball anyway. Just by conducting this study, a causal effect is implied, but as we already noted this kind of study is not useful for causal inference. This specific issue is unfortunately common for studies in our media feeds, and was one of the reasons we did the CLAIMS study in the first place. We contend that researchers need to be upfront about the fact that they want to estimate causal effects, and to then consider whether or not it is reasonable to do so for the exposures and outcomes they are considering.

We also cannot stress enough a more general point: the authors of this study and the peer review process made a lot of mistakes, but this study does not represent all of academic research. It is a shame that studies like these are what make the top headlines time and time again instead of the excellent work done elsewhere.

Can’t we please just accept coffee (and wine and chocolate) for what it is: delicious?

Getting (very) meta part 3: Generating funding for future work

Noah Haber

The CLAIMS study took a team of 22 highly skilled people across multiple institutions about 800 person-hours carefully reading, reviewing, rating, and debating papers, not to mention the countless hours spent designing the review tool and protocol, managing the study, doing the analysis, and writing the manuscript. That is a difficult operation in the best of circumstances, but as shown near the top of the published article:

“Funding: The author(s) received no specific funding for this work.”

The entire CLAIMS study was done without any financial support, with all effort and data being donated for free. In part, the lack of funding helps keep us away from possible conflicts of interest, real or imagined, as we criticize ourselves and our peers. However, doing reviews at this scale without funding is a trick we could probably pull off only once.

We designed CLAIMS to stand on its own, but also act as a launching pad for a series of much larger projects. Our next steps – including measuring how and where scientific information is being distorted and designing better tools to do that kind of review at scale – require funding. If we are lucky, CLAIMS and this site will help generate interest in the topic through academic and social media. One constant feature of social media is that people talk about what is wrong with social media. Social media drives media press coverage. Press coverage of scientific studies may improve chances of funding future studies. Our study explores the state of health science at the point of social media consumption. At the very least, that is no coincidence.

Put another way, CLAIMS is a health science study which critically examines health science in social media, while also designed to itself be consumed in social media to help fund further studies through the same mechanisms causing problems in the first place. Exploring and embracing that irony is one reason why we have this blog. We are experimenting in the intersection of social media and science using our own study, and documenting the process as transparently as possible.

If you are interested in what we are doing, and want to help out in any way, get in touch! We are looking for all kinds of people, whether you are a journalist, a scientist, a social media mogul, or a potential funder.

Getting (very) meta part 2: Press releases

Noah Haber

It’s been four weeks since CLAIMS was published, and we have now gotten a look at some of the discussion about the study. First up: the press releases.

The majority of our co-authors for CLAIMS are/were with either the Harvard TH Chan School of Public Health (HSPH) or the University of North Carolina at Chapel Hill (UNC) Gillings School of Global Public Health, so it was fitting that we had press-releases from these institutions. In both cases, we initiated contact with the press offices, who then wrote initial draft press releases. We then made suggestions and edits to those releases. The Harvard University press release was published on June 5, with the UNC press release on June 19.

In the CLAIMS study itself, our sample had nine studies whose author list had at least one Harvard University affiliation listed, and only one from UNC. These schools are two of the most respected and highest ranked institutions in the field. It is intuitive that big name private universities are more likely to garner press than less generally-recognizable schools, and that seems to be a factor when someone drops the Big H. So far it is looking like this also holds true for click generation, as we are getting more traffic from the Harvard University press releases (more on that once we have collected more data). Name recognition matters.

As for content, in my opinion they were generally very accurate (we should hope so, since we helped edit them). The HSPH press release focused a touch more on media misinterpretation than might be ideal through the use of a side quote, given that we can’t do much to distinguish between the impact of academia, media, and social media spin and preference. The UNC one was a bit more neutral in that regard. We should note that it is incredibly difficult to craft these statements, and it is near impossible to write something like this with perfect emphasis on all the subtleties of the study.

One interesting difference between the two was how each institution emphasized its own institutional role in the study. As an example, I was a doctoral student at HSPH when we did the main part of the research, but a postdoc at UNC when the article was published, so I have both affiliations for the publication. I was the same person in each press release, but my descriptors were notably different:

From Harvard University:

“Noah Haber, who recently completed his Sc.D. in health economics in the Department of Global Health and Population at Harvard T.H. Chan School of Public Health. . .”

From UNC:

“. . . Noah Haber, ScD, postdoctoral scholar at UNC’s Carolina Population Center. . .”

During this process, we learned first-hand just how incredibly difficult it is to write science for a general audience. In the case of our public explainer, we had the luxury of space; if a concept was best described in two paragraphs, we could use them. In a press release, writers have two paragraphs for their audience to absorb ideas it typically took years of intense study and investigation to work with. On top of that, the most important purpose of a press release is to generate press (and therefore funding). Spin, whether intended or not, is tempting, and easy to miss when it happens.

Did we do a good job editing these press releases? Did we, ironically, overstate the strength of our own findings? Did we mislead, whether intentionally or not? We would like to believe that we did our best to be accurate and honest, but we are rarely the best judges of ourselves.

Getting (very) meta

Noah Haber

What happens when the authors of a meta-science study involving scientific translation on media and social media evaluate the translation of their own study?

The CLAIMS study is, in part, about the intersections and interactions between academia, media, and social media. We like experimenting with new ideas and new approaches to tackling the spread of misinformation in health research across the whole pathway from research generation to consumption. With that in mind, we’re getting (very) meta on this site, and playing with ideas on how our own study is received and interpreted.

  1. A public explainer of the CLAIMS study
    Traditionally, academic authors’ responsibility for communicating research findings stops at the point at which the paper is published in an academic journal. After that point, it’s left to other academics, press release writers, journalists, bloggers, and sharers. Translation into more popular news and social media formats leaves opportunities for spin and misunderstanding. We wanted to make sure that people discussing our work understand it well by putting it into everyday language that we designed and believe to be an accurate translation.
  2. Suggestions on how to (and how NOT to) discuss on social media
    Crafting a tweet is hard in the best of circumstances, and extra hard when there is important nuance to the study. In our main page, we give some suggested headlines that are reasonably accurate. More importantly, we give tweets we expect people might be tempted to write that are NOT accurate, and explain why.
  3. News and social media tracking and evaluation
    Given that we have provided #1 and #2 for anyone interested in writing about the study, we can do something even more interesting: keep track of who writes about and cites our work, and evaluate it. We have provided a lot of resources for those who are interested. We’re also reaching out to media outlets to help make sure discussion of the work is accurate. We don’t know who might be interested in writing about the study, but we are very curious to see what they get right and wrong if they do. The plan for now is to simply search for academic papers, blogs, and news media about and/or citing our study, keep a copy of it, and see what we think. If there are a lot of articles, we may do something more formal and systematically evaluate those articles for potential misinformation. This is not intended to be a research-grade study, but it may help inform some future hypotheses and give some fodder for interesting discussion.

Will major news outlets write about our study? What aspects of the study are of greatest interest? What do people get right, and more importantly, what do people get wrong? I honestly have no idea, and that’s my favorite kind of hypothesis.

Welcome to MetaCausal

Noah Haber
MetaCausal is dedicated to the relationships between science, statistics, and people. We feature research, discussion, news, and everything in between, starting with our own published work. We plan on growing from a small scale blog to something quite a bit bigger, and we’re excited to be scaling up very soon.

This is also the home of our kickoff project: the CLAIMS study. The CLAIMS study looks at the strength of causal inference in studies and articles shared in social media. Your news feed is probably filled with articles saying things like “study finds chocolate linked to cancer.” We wanted to know whether those studies shared on social media actually find that changing chocolate consumption caused/prevented cancer, or whether their methods were unable to distinguish correlation from causation.

The CLAIMS team identified the most popular media articles about academic studies assessing the association between any exposure and health outcome, and systematically reviewed those studies and the media articles about them for causal strength and language. We found that the studies most likely to be seen by social media consumers in 2015 were very unlikely to show causation, and were slightly overstated. The media articles most likely to be read about them were very likely to be overstated and inaccurately described. This study is accepted for publication in PLoS One, pending final publication.

This site hosts the full dataset, code, and methods for full transparency. This site is in a holding pattern while we wait for official publication of the CLAIMS study. Once that happens, we’re going to be spending a lot of time thinking about and discussing the state of science on this site, based in part on the findings of the CLAIMS study.

We’ll post public explainers about science and stats. We’ll do some oddball public experiments. We’ll post our own public versions of the studies we’ve been making. We’ll have opinions and analysis on a whole range of topics from technical science to social media, reports on our own studies, and everything in between.