A few weeks ago, I was approached to guide public-facing strength-of-evidence assessments for two separate projects related to COVID-19. I got hold of a few folks (main credit to Emily Smith here) to create a framework that accomplishes two main goals: 1) frame strength-of-evidence review against a near-universal fixed standard, and 2) translate strength of evidence into its applicability to decision-making for public audiences.

In the interest of getting these ideas circulating and getting critique, feedback, and suggestions, I wanted to make this framework public. If you like the ideas, feel free to steal, remix, or revise them. If you don’t, let us know why!

Framework for peer review in open science

Consider the main claims and question of interest of the paper, drawing particularly from the abstract and title. Consider a hypothetical ideal study to test that question of interest in the population of greatest interest without regard to whether such a study is feasible, ethical, or even physically possible. 

Examples (in drop-down, mouseover, etc.): If the study is about the impact of city masking orders, the hypothetical ideal study might be a cluster randomized controlled trial in which every city was randomly assigned to have or not have masking orders. However, for the question of whether masks are protective, the hypothetical ideal study might involve random assignment of actually using masks on an individual level (with perfect compliance and adherence). A hypothetical ideal study for a diagnostic or screening test might involve testing against a fictional perfect test with no false positives or negatives. The hypothetical ideal study will be specific to each question of interest.

Now consider the following categories for strength of evidence:

Strong: The main study claims are very well-justified by its methods and data. There is little room for doubt that the study produced results and conclusions very similar to those of the relevant hypothetical ideal study. The study’s main claims should be considered conclusive and actionable without reservation.

Reliable: The main study claims are generally justified by its methods and data. The results and conclusions are likely to be similar to the hypothetical ideal study. There are some minor caveats or limitations, but they would/do not change the major claims of the study. The study provides sufficient strength of evidence on its own that its main claims should be considered actionable, with some room for future revision.

Potentially informative: The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.

Not informative: The flaws in the data and methods in this study are sufficiently serious that they do not substantially justify the claims made. It is not possible to say whether the results and conclusions would match those of the hypothetical ideal study. The study should not be considered as evidence by decision-makers.

Misleading: Serious flaws and errors in the methods and data render the study conclusions misinformative. The hypothetical ideal study is at least as likely to contradict this study’s results and conclusions as to agree with them. Decision-makers should not consider this evidence in any decision.

At-a-glance table

| Category | Claims are _ by the methods and data | Decision-makers should consider the claims in this study _ based on the methods and data |
| --- | --- | --- |
| Strong | very well-justified | actionable without reservation |
| Reliable | generally justified | actionable with limitations |
| Potentially informative | not strongly justified, but may yield some insight | not actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it |
| Not informative | not substantially justified | not actionable |
| Misleading | not at all justified | misinformative |
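
For anyone who wants to embed these ratings in a review form or a results page, the at-a-glance table could be encoded as a simple lookup. This is only a minimal sketch in Python, not part of the framework itself: the names `Strength`, `Rating`, `AT_A_GLANCE`, and `summarize` are hypothetical, and the wording is copied from the table's fill-in-the-blank phrases.

```python
from enum import Enum
from typing import NamedTuple


class Strength(Enum):
    """The five strength-of-evidence categories (hypothetical encoding)."""
    STRONG = "Strong"
    RELIABLE = "Reliable"
    POTENTIALLY_INFORMATIVE = "Potentially informative"
    NOT_INFORMATIVE = "Not informative"
    MISLEADING = "Misleading"


class Rating(NamedTuple):
    justification: str   # fills "Claims are _ by the methods and data"
    actionability: str   # fills "Decision-makers should consider the claims ... _"


# Wording taken from the at-a-glance table; the structure is an assumption.
AT_A_GLANCE = {
    Strength.STRONG: Rating("very well-justified", "actionable without reservation"),
    Strength.RELIABLE: Rating("generally justified", "actionable with limitations"),
    Strength.POTENTIALLY_INFORMATIVE: Rating(
        "not strongly justified, but may yield some insight",
        "not actionable, unless the weaknesses are clearly understood and "
        "there is other theory and evidence to further support it",
    ),
    Strength.NOT_INFORMATIVE: Rating("not substantially justified", "not actionable"),
    Strength.MISLEADING: Rating("not at all justified", "misinformative"),
}


def summarize(category: Strength) -> str:
    """Fill in the two blanks from the table for one category."""
    rating = AT_A_GLANCE[category]
    return (
        f"{category.value}: claims are {rating.justification} by the methods "
        f"and data; decision-makers should consider the claims in this study "
        f"{rating.actionability} based on the methods and data."
    )


if __name__ == "__main__":
    for category in Strength:
        print(summarize(category))
```

A reviewer would still choose the category by judgment and explain why; the lookup only keeps the public-facing wording consistent across reviews.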

Which of these categories best represents your view of the methods, data, results, and claims from the study, and why?

Thoughts and comments welcome

