XvY: Augmented, Targeted, and Amplified Peer Review for Health Science

The pathway from research generation to consumption is a complex process which can yield inaccurate, overstated, and/or inappropriately selected health science at every step along the way, from academia to social and traditional media. As demonstrated by the CLAIMS study, what consumers get at the end of that pathway is typically weak and misleading. CLAIMS was designed as the foundation of a much, much larger project to try to dig deep into why and how we got here, and more importantly, how to fix it. We are making the plan public, and looking for your help to make it happen.

XvY is designed to bolster science’s self-righting mechanisms, extend them into public discourse and decision-making, and help navigate its future through refreshing, augmenting, and amplifying a very old idea in science: peer review. It is a vehicle for both developing methods of determining and translating strengths and weaknesses of studies, and deploying that information all the way from academia to social media. Instead of replacing existing institutions, we want to change the scientific, media, and social environment to let the best science and best ideas shine.

XvY is designed to:

  • Develop a set of resources and services which give researchers, consumers, and decision makers credible and useful peer review that uses scientific expertise to tell weak and misleading research from strong and accurate research.
  • Integrate and deploy review with social media delivery systems, whether as simple as reply tweets or as deep as integrations with social media platforms.
  • Publish public review datasets for decision-makers to look up detailed study strength from both individual studies and track records beyond publication counts.
  • Make it all:
    • Easy to use and understand for everyone from high school students to PhD-level specialist researchers
    • Deployable at the speed of modern publishing and social media
    • Inexpensive enough to sustain for decades
    • Credible enough to be trusted by researchers, decision-makers, and the public

We are starting with one of the central issues in health science: causal inference. Strong causal inference is always hard, as is figuring out whether a study could actually show that some X causes some health outcome Y. Unfortunately, correlation is often mistaken for causation (even in academia), resulting in a perpetual stream of misleading studies and translation. That leads to people making bad decisions from bad information and deep societal issues with scientific and media credibility, deserved or not.

The idea has four components, each of which is designed and integrated in a way that strengthens and feeds the other parts. Part 1) is large scale targeted scientific review at all points from journal publication to consumption. Part 2) uses that review data to develop and augment review tools and processes to be more effective, less costly, faster, and more accessible. Part 3) identifies mechanisms and processes which impact research strength, translation, and utilization. Finally, Part 4) takes the enhanced review tools and strategies, and deploys them en masse across scientific journals, media, and social media.

Part 1) Causal Inference Scientific Review from Generation to Consumption

XvY has its entire foundation rooted in peer review, starting with the CLAIMS study. The CLAIMS study looked at the end of the pathway, at what people were receiving on social media compared to what the studies themselves were able (or unable) to show. Part 1) of XvY extends this idea all the way along the pathway from publication to dissemination. We will start by pulling from random selections of scientific articles from the top medical journals, based on what is popular and unpopular in social media, different levels of citation counts, and volume of articles about them in traditional media. Our expert reviewers will systematically tease out the strengths and weaknesses of the studies and the language used to describe them. The reviews will yield a relatively complete picture of where along the pathway things are going wrong most and give us leads to study how and why things go wrong. More importantly, they give us a dataset to work with to develop better review tools.

Part 2) Review tool development and augmentation

The CLAIMS review tool was an early test of our proposed review tool concepts, integrating multidisciplinary expertise, technical statistics, internal validity, and generalization to generate strength and language review ratings based on applicability to research and actual decision-making. We learned quite a bit from CLAIMS, but most importantly found that the review tool itself and the review process were largely successful. When we can pair review tool development with actual review, we can start to make our review tools much better, faster, and easier to use for different purposes and users.

In the first round, we can simply improve the existing CLAIMS review tool, by examining how items in the review tool predict final strength ratings. Items which had little impact might be shortened or eliminated, while items which had more impact might be expanded upon or made more prominent. We can identify areas which reviewers had problems with and add clarification. In the full version, we can experiment with some unusual and counter-intuitive items and question formats. We will also develop a simplified short version to be used as an add-on module for peer reviewers for academic journals.
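To make the first-round idea concrete, here is a minimal sketch of one way item impact could be examined: rank each review-tool item by how strongly it correlates with the final strength rating. The item names, the 1 to 5 rating scale, and the example reviews are all made up for illustration; they are not the CLAIMS items or data, and a real analysis would likely use a proper regression model rather than simple correlations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # an item that never varies cannot predict anything
    return cov / sqrt(var_x * var_y)


def rank_items(reviews, rating_key="final_strength"):
    """Rank items by the absolute correlation of their answers with the final rating."""
    items = [key for key in reviews[0] if key != rating_key]
    ratings = [review[rating_key] for review in reviews]
    scores = {
        item: pearson([review[item] for review in reviews], ratings)
        for item in items
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical reviews: 1 = yes, 0 = no; final_strength on an assumed 1-5 scale.
reviews = [
    {"randomized": 1, "confounding_addressed": 1, "sample_size_reported": 1, "final_strength": 4},
    {"randomized": 0, "confounding_addressed": 0, "sample_size_reported": 1, "final_strength": 1},
    {"randomized": 0, "confounding_addressed": 1, "sample_size_reported": 1, "final_strength": 2},
    {"randomized": 1, "confounding_addressed": 1, "sample_size_reported": 1, "final_strength": 5},
]

ranked = rank_items(reviews)
for item, score in ranked:
    print(f"{item}: {score:.2f}")
```

In this toy data, an item that every reviewer answers the same way ends up at the bottom of the ranking, which is the kind of item that might be shortened or eliminated.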

In the second round, we can develop augmented tools which add expert-review-based guidance for end users. Using the training data from in-depth review, we can start to add algorithms which choose more impactful items for reviewers to consider, suggest responses, and/or cut the review off early if sufficient evidence of strength or weakness has been provided. The amount of assistance will depend on the data available and reviewer needs.

For example, a journal or media editor is screening studies for further consideration, and comes across a cross-sectional chocolate vs. cancer study. The reviewer enters some basic information about the study based on prompts. As the reviewer goes, the augmentation engine compares the entered data with the pre-existing review dataset, and might become increasingly confident that the study is weak and/or misleading, based on similarities with other already-reviewed studies. The tool could indicate this directly, modify the questionnaire logic to probe specific areas, or do nothing and allow the reviewer to come to their own conclusions, depending on the reviewer’s needs.
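The screening example above could be sketched roughly as follows. This is only an illustration under stated assumptions: the similarity measure (share of matching answers), the thresholds, the field names, and the tiny dataset are all hypothetical, and the real augmentation engine may work quite differently.

```python
def similarity(partial, reviewed):
    """Share of the answers entered so far that match an already-reviewed study."""
    shared = [key for key in partial if key in reviewed["answers"]]
    if not shared:
        return 0.0
    matches = sum(partial[key] == reviewed["answers"][key] for key in shared)
    return matches / len(shared)


def augmented_signal(partial, dataset, min_similarity=0.75, min_neighbors=3, cutoff=0.9):
    """Return (share_weak, n_neighbors, suggestion) for a partially entered review.

    All thresholds are illustrative assumptions, not tuned values.
    """
    neighbors = [s for s in dataset if similarity(partial, s) >= min_similarity]
    if len(neighbors) < min_neighbors:
        # Too little comparable data: say so instead of guessing.
        return None, len(neighbors), "not enough similar reviews; continue full review"
    share_weak = sum(s["weak"] for s in neighbors) / len(neighbors)
    if share_weak >= cutoff:
        suggestion = "likely weak; early cutoff possible"
    elif share_weak <= 1 - cutoff:
        suggestion = "likely strong; early cutoff possible"
    else:
        suggestion = "unclear; probe further"
    return share_weak, len(neighbors), suggestion


# Hypothetical previously reviewed studies.
dataset = [
    {"answers": {"design": "cross-sectional", "randomized": False}, "weak": True},
    {"answers": {"design": "cross-sectional", "randomized": False}, "weak": True},
    {"answers": {"design": "cross-sectional", "randomized": False, "adjusted": True}, "weak": True},
    {"answers": {"design": "rct", "randomized": True}, "weak": False},
]

# What the reviewer has entered so far for the chocolate vs. cancer study.
partial = {"design": "cross-sectional", "randomized": False}
share_weak, n_neighbors, suggestion = augmented_signal(partial, dataset)
print(share_weak, n_neighbors, suggestion)

# A study unlike anything in the dataset gets no automated signal.
no_signal = augmented_signal({"design": "cohort"}, dataset)
```

Returning the neighbor count alongside the suggestion lets the interface decide whether to show the signal, change the questionnaire, or stay silent, depending on what the reviewer needs.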

The tool will be tunable to different audiences and different purposes, while also understanding and indicating when there isn’t enough data for augmented review or when more expertise is required. Better yet, all versions can come from the same central unified framework, allowing additional reviews to be added to the training dataset and improving augmentation, leading to a positive reinforcement cycle.

Part 3) Mechanisms for impact

The third part explores what, where, and how misleading information in causal inference develops and spreads, and what knobs we can try turning to make the biggest impact. The actual design of these studies will be largely determined based on what we find in the previous two parts, but we’re looking at everything from psych lab studies and public social media experiments to creating randomly generated studies to test publication biases.

For example, we have a hunch that most people assume causality when presented with association/correlation/link, etc. If we can understand a bit more about how people think when seeing standard headlines, we might be able to make better headlines, or better yet, better systems to counter misleading ones. Psychology laboratory settings also provide relatively inexpensive ways of pre-piloting the designs of our interventions.

In certain circumstances, we can quantify the causal effect of biases and perverse incentives on research production, dissemination, and actual health impact. We are currently working on designing an automated study generator, which randomizes the properties of studies to experimentally estimate the effects of publication and reporting biases. We can further try to better understand the pathways and mechanisms that produce and fail to screen out misleading information. We can estimate network, cyclical, and multiplier effects that may result in perverse races to the bottom, particularly among public interest items.
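As a rough illustration of the study generator idea, the sketch below randomizes a handful of study properties independently and then compares acceptance rates across the levels of one property. The property names and levels are assumptions made for this example, and the acceptance outcomes are a toy stand-in rule rather than real data; in an actual experiment they would be observed editorial or sharing decisions.

```python
import random

# Hypothetical property levels; the real generator's properties are not specified here.
DESIGNS = ["randomized trial", "cohort", "cross-sectional"]
RESULTS = ["positive", "null", "negative"]
SAMPLE_SIZES = [50, 500, 5000]


def generate_study(rng):
    """Draw each property independently so its effect can be isolated."""
    return {
        "design": rng.choice(DESIGNS),
        "sample_size": rng.choice(SAMPLE_SIZES),
        "result": rng.choice(RESULTS),
        "causal_language": rng.choice([True, False]),
    }


def generate_studies(n, seed=0):
    """Generate n randomized study descriptions, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [generate_study(rng) for _ in range(n)]


def acceptance_rate_by(studies, outcomes, prop):
    """Share of studies accepted at each level of one randomized property."""
    totals, accepted = {}, {}
    for study, was_accepted in zip(studies, outcomes):
        level = study[prop]
        totals[level] = totals.get(level, 0) + 1
        accepted[level] = accepted.get(level, 0) + int(was_accepted)
    return {level: accepted[level] / totals[level] for level in totals}


studies = generate_studies(200, seed=1)

# Toy stand-in for observed decisions: an imaginary outlet that never accepts null results.
outcomes = [study["result"] != "null" for study in studies]

rates = acceptance_rate_by(studies, outcomes, "result")
print(rates)
```

Because the properties are assigned at random, a gap in acceptance rates between levels of a property can be read as an estimate of that property's effect on acceptance, which is the logic behind using generated studies to measure publication bias.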

We can take the targets we identified in the first part of XvY and find the best knobs to try to turn. Then, we see what happens when we turn them in the real world.

Part 4) Implementation, experimentation, and deployment

As specialists in causal inference, economics, and epidemiology, we view our work in terms of the impact it has on human lives. To make that impact, we plan on deploying our systems across the entire health research spectrum, from pre-publication to Twitter, and carefully measuring the impact we make. Everything will be deployed in ways that let us identify what does and doesn’t work, how, and why. Here’s the plan:

Social media deployment

Initial experimental deployment on social media will start with simple reply-tweeting and posting of bite-sized review summaries, with links to more thorough multi-level review that includes both consumer and research-grade review information. For example, a person sharing a misleading coffee study might receive a reply that says “An expert team of reviewers found that this study could not estimate whether drinking more coffee had any impact on health, and contained moderately misleading language. More information available at XvY.science/q89agsb.” Later, we can integrate our services with social media platforms themselves, similar to how Facebook and others are integrating third-party fact-checking services. The end goal is to influence sharing behaviors, both of the original sender and more importantly of their recipients.

Beating Twitter to the punch on all research might seem impossible, but we have two important things going for us:

1) A small number of articles take up VASTLY outsized proportions of media and social media space. Our 50 articles in CLAIMS represented over half of health research social media shares in 2015. We don’t need to fully review every article that gets published; we just need to do a bit of predicting and get most of the ones most likely to become popular. On top of knowing what predicts popularity from earlier parts of this project, we can use signals like press releases and social media tools to target high impact articles before they go viral.

2) Most popular articles are weak and misleading in ways that are relatively easy to detect. Fast screening versions of our tools can cast a wide net to identify the weakest and most misleading articles first, which will capture the large majority of popular articles. We can then follow up with more thorough review for articles that are harder to screen out, more popular, and more complicated. We can further improve targeting and effect using many of the same strategies and data that social media moguls and advertisers use, from identifying key movers to tailoring messages to specific demographics.

Public dataset deployment

As we continue to expand and refine our review datasets and methods, we will have an ever-increasing set of studies and media articles which we have reviewed, and will make them publicly available and easily readable by all kinds of engines. The full reviews will be made publicly available and summarized at multiple levels. The top-level review will be a simple, tweet-length, public-language strength and language summary. The second level will be a one-paragraph public-language summary of the review. The third level will be a single-page technical summary of review findings, intended largely for researchers and people with research backgrounds. Finally, the full original review documents, methods, and revision history will be fully available as downloads. We may also make a public API to allow other services to utilize review data.
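One possible machine-readable shape for such a multi-level review record is sketched below. The class, field names, 280-character limit for "tweet-length", the placeholder text, and the example.org URL are all assumptions made for illustration; no schema or API has been defined yet.

```python
import json
from dataclasses import dataclass, asdict

MAX_HEADLINE_CHARS = 280  # assumed reading of "tweet-length"


@dataclass
class ReviewRecord:
    """Hypothetical record holding the four summary levels of one review."""
    study_id: str
    headline_summary: str    # level 1: tweet-length, public language
    paragraph_summary: str   # level 2: one paragraph, public language
    technical_summary: str   # level 3: single-page technical summary
    full_review_url: str     # level 4: full documents, methods, revision history

    def __post_init__(self):
        if len(self.headline_summary) > MAX_HEADLINE_CHARS:
            raise ValueError("headline summary is longer than tweet length")

    def at_level(self, level):
        """Return the content for a given level of detail (1 = shortest)."""
        levels = {
            1: self.headline_summary,
            2: self.paragraph_summary,
            3: self.technical_summary,
            4: self.full_review_url,
        }
        if level not in levels:
            raise ValueError("level must be 1, 2, 3, or 4")
        return levels[level]

    def to_json(self):
        """Serialize the record so other services can read it."""
        return json.dumps(asdict(self), indent=2)


# Placeholder content only; this is not an actual review.
record = ReviewRecord(
    study_id="example-study",
    headline_summary="Placeholder: short public-language strength and language summary.",
    paragraph_summary="Placeholder: one-paragraph public-language summary of the review.",
    technical_summary="Placeholder: single-page technical summary for researchers.",
    full_review_url="https://example.org/reviews/example-study",
)

print(record.at_level(1))
print(record.to_json())
```

Keeping all four levels in one record means a reply on social media, a lookup page, and a download link can all be generated from the same source.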

For example, review data may look something like this review summary, taken from the CLAIMS review:

[Figure: review summary from the CLAIMS review]

In addition to impact through allowing individuals to share review information with peers, the dataset will serve as a source for determining the credibility of individuals, media outlets, journals, and other institutions. While funders and hiring organizations typically rely on citation and publication counts, the dataset provides an independent measure of strength and value. Positive reviews may help the reputations of more credible researchers, journalists, and institutions among funders, hirers, decision-makers, and the public. That is particularly valuable for those with stronger methods and language, but lower publication volume. Of course, the opposite is also true.

Long term goals and sustainability


The model for impact is multi-pronged, which reflects the nature of modern science. XvY is designed to foster better alignment of incentives and outcomes all across the health research impact pathway both through direct effects and indirect impact on the scientific environment. We will create the tools and resources to help decision-makers of all kinds better synthesize and critically understand research. We will push that information out to those decision makers and influencers to make better-informed decisions. XvY will both highlight areas where change is needed and create pressure to engage in and experiment with new models and ideas. We anticipate the largest impacts of XvY will be through catalyzing other scientific, media, and social media reform efforts.

Academic partnership

Independent critical assessment requires an unusual blend of participation in and independence from research institutions. We plan to do both by founding a small research center tied to one or more major research universities, while maintaining largely independent administrative and funding structures. Maintaining academic status ensures a grounding in cutting edge methodologies and access to highly skilled researchers and reviewers, while independent funding reduces perceived and actual conflicts of interest.

Sustainability through tools as a service

While early phases of XvY will most likely be funded through research and private donor sources, we plan on offering and expanding the continual public review processes for free public use well into the long run. Ideally, we would like to do so by moving from relying entirely on ephemeral donor and grant support to a self-sustaining model. We may be able to take advantage of our experience and expertise in statistical review and offer review and critique as a service. While the tools will continually be improved, there is a limit to how far the review tools can supplant real expertise. Over time, we will have developed a network of skilled researchers and reviewers, as well as strategies and mechanisms for hiring and training new ones. Media organizations, journal editors, foundations, policy organizations, and others may want detailed expert review, but may lack in-house capabilities. Major foundations, for example, may want to better understand the causal impact that their funded projects have on human lives. Foundations could collaborate with XvY to critically review reports from funded organizations, suggest improvements in the way that results are estimated, and work together on project design to improve evaluation. Journals could employ services to screen out potentially weak and misleading studies. All funds generated through these activities would go toward sustaining, expanding, and improving publicly-available review tools and resources.

Who are we?

The project is designed and led by Noah Haber alongside many of the same folks behind the original CLAIMS study, most notably Ellen Moscoe, Emily R. Smith, and Alexander Breskin. Our partnerships extend as widely as the problems we hope to address, from academics specializing in everything from psychology to statistics, to science journalists. We’re always looking to extend our network, so please get in touch!

We need your help

At the end of the day, we want to make XvY invisible, unnecessary, and obsolete. The larger our impact, the less we need to do. We want to make it so that the research decision-makers see is high quality and conclusive when such research is available, with little weak and misleading research filling the gap when it is not. We want to foster a scientific environment that is positively engaged with the public without the perverse incentives that come with popular appeal. Unfortunately, we are a long way from those goals. To get there, we need your help. We are looking for talented people to help us build our tools. We are looking for people and organizations who can champion these ideas, as well as those who can challenge them. Of course, we are also looking for funding to get us going.

If you are in any way interested in the project, whether just to ask questions or to help out, please get in touch through our contact form, and we will get back to you as soon as we can.