
Despite Numerous Guidelines, the Gap Remains Wide Between What Meta-Analysts Should Do and What They Actually Do

Introduction


The list of guidelines and best-practice recommendations for meta-analysts is extensive. The Cochrane Handbook and a growing number of discipline-specific guides (e.g., Havránek et al., 2020; Iršová et al., 2024) outline in detail how meta-analyses should be conducted—from how studies are selected, to how data dependence is handled, to how publication bias is assessed. Despite that, there is little evidence about what meta-analysts actually do.


In a recent paper, my coauthors (Allen Wu, Jane Duan, and Elizabeth Tipton) and I analyzed 1,000 meta-analyses across 10 different disciplines to describe prevailing meta-analytic practices and to assess how well they align with established recommendations. Our goal was to document the state of the field, identify areas for improvement, and provide a cross-disciplinary benchmark for both applied researchers and methodologists designing simulation studies.


What Meta-Analysts Do


Our study spanned disciplines as diverse as medicine, psychology, education, biology, environmental science, and economics, sampling 100 meta-analyses from each field published around November 2021. This yielded a uniquely broad snapshot of current practice.

Perhaps not surprisingly, we found that meta-analytic practice varies widely across disciplines — in study size, methodological sophistication, and adherence to statistical principles. Below is a selective, list-based summary of our main findings.


1. Dependence Among Effect Sizes is Often Ignored

A majority (57%) of meta-analyses included multiple effect sizes per primary study yet treated these as independent observations, violating key statistical assumptions.


2. Unpublished Studies Are Infrequently Included

Only about 31% of meta-analyses included unpublished ('grey literature') studies, and disciplines differed substantially in this respect: fewer than 20% of meta-analyses in medicine, biology, and related fields incorporated unpublished studies, while roughly two-thirds of those in psychology and economics did so.


3. Effect Size Metrics Vary — and Are Sometimes Misused

Different fields use different effect size measures: ratios in medicine, mean differences in psychology and education, and correlations in economics. Correlation-based metrics pose problems because their standard errors depend on the effect size itself, which biases weighted estimates unless corrected (e.g., via the Fisher's z transformation or sample-size-based weights).
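To make this concrete, here is a minimal Python sketch (illustrative only; the data and the approximate variance formula for r are not from our paper): the usual standard error of a raw correlation shrinks as the correlation grows, whereas the standard error of Fisher's z depends only on the sample size.

```python
import numpy as np

def se_r(r, n):
    """Approximate large-sample standard error of a Pearson correlation r.
    It depends on r itself, so weights built from it are tied to the effect size."""
    return (1 - r**2) / np.sqrt(n - 1)

def fisher_z(r, n):
    """Fisher's z transformation of r and its standard error,
    which depends only on the sample size."""
    return np.arctanh(r), 1 / np.sqrt(n - 3)

n = 100
for r in (0.1, 0.5, 0.8):
    z, se_z = fisher_z(r, n)
    print(f"r = {r:.1f}: SE(r) = {se_r(r, n):.3f}, SE(z) = {se_z:.3f}")
```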


4. Random-Effects Models Dominate

Roughly 83% of all meta-analyses used random-effects estimators. Fixed effects are used relatively rarely, and almost never alone. Few meta-analyses use multilevel or Bayesian models.
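For readers new to these models, below is a minimal sketch (made-up data, plain numpy) of one common textbook random-effects estimator, the DerSimonian–Laird approach; our paper does not prescribe a particular estimator, so treat this purely as an illustration of the mechanics.

```python
import numpy as np

# Illustrative effect sizes and their sampling variances (made-up data)
yi = np.array([0.20, 0.35, 0.10, 0.50, 0.25])
vi = np.array([0.02, 0.05, 0.01, 0.08, 0.03])

# Fixed-effect (inverse-variance) weights and Cochran's Q
wi = 1 / vi
q = np.sum(wi * (yi - np.sum(wi * yi) / np.sum(wi))**2)
df = len(yi) - 1

# DerSimonian-Laird estimate of the between-study variance tau^2
c = np.sum(wi) - np.sum(wi**2) / np.sum(wi)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights, pooled estimate, and its standard error
wi_re = 1 / (vi + tau2)
mu = np.sum(wi_re * yi) / np.sum(wi_re)
se_mu = np.sqrt(1 / np.sum(wi_re))
print(f"tau^2 = {tau2:.3f}, pooled effect = {mu:.3f} (SE = {se_mu:.3f})")
```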


5. Heterogeneity is High and Underexplored

Median I² across disciplines was 80%, indicating considerable heterogeneity, yet fewer than half of all meta-analyses used meta-regression to explore its sources.
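I² is straightforward to compute from the same ingredients as the random-effects model above: it re-expresses Cochran's Q as the share of total variability attributable to between-study heterogeneity. A minimal sketch with made-up data:

```python
import numpy as np

# Illustrative effect sizes and sampling variances (made-up data)
yi = np.array([0.20, 0.35, 0.10, 0.50, 0.25])
vi = np.array([0.02, 0.05, 0.01, 0.08, 0.03])

wi = 1 / vi
q = np.sum(wi * (yi - np.sum(wi * yi) / np.sum(wi))**2)  # Cochran's Q
df = len(yi) - 1

# I^2: share of total variability due to between-study heterogeneity
i2 = max(0.0, (q - df) / q) * 100
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.1f}%")
```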


6. Publication Bias Testing Is Common, but Effect Estimates Are Rarely Adjusted

About 73% of meta-analyses tested for publication bias, though most relied on funnel plots or Egger regressions, both of which have low power. Few took the next step: only around 20% actually adjusted their reported effect sizes.
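As a point of reference, the sketch below implements an Egger-type regression test on made-up data: the standardized effect is regressed on precision, and an intercept far from zero signals funnel-plot asymmetry. As noted above, a non-significant result from such a test is weak evidence of no bias.

```python
import numpy as np

# Illustrative effect sizes and standard errors (made-up data)
yi = np.array([0.42, 0.31, 0.55, 0.12, 0.60, 0.25, 0.48, 0.09])
sei = np.array([0.20, 0.12, 0.25, 0.05, 0.30, 0.10, 0.22, 0.04])

# Egger's test: regress the standardized effect (y / SE) on precision (1 / SE);
# a non-zero intercept suggests small-study / funnel-plot asymmetry.
y, x = yi / sei, 1 / sei
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Rough t-statistic for the intercept from the OLS covariance matrix
resid = y - X @ beta
sigma2 = resid @ resid / (len(y) - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)
print(f"Egger intercept = {beta[0]:.2f}, t = {beta[0] / np.sqrt(cov[0, 0]):.2f}")
```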


What Meta-Analysts Should Do


The diversity of methods across disciplines is natural and even healthy. However, our review revealed several systematic gaps between what meta-analysts do and what they should do given current, best-practice recommendations. Below is a list of areas for improvement along with recommendations.


Recommendation: Use Meta-Regression More Often and More Effectively


R1a. Meta-analysts should use meta-regression more often to explore why effect sizes differ across studies. It helps identify patterns that a simple pooled mean hides and can be used to calculate best-practice estimates.


R1b. Meta-regressions should include multiple covariates, rather than including one variable at a time, to see whether findings hold once other factors are taken into account.
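A minimal sketch of what R1a and R1b look like in practice (made-up effect sizes and moderators; fixed-effect, inverse-variance weights are used for brevity, whereas a random-effects meta-regression would add an estimate of tau² to the sampling variances):

```python
import numpy as np

# Illustrative effect sizes, sampling variances, and two made-up
# study-level moderators (publication year and a design dummy)
yi = np.array([0.10, 0.35, 0.22, 0.50, 0.05, 0.41])
vi = np.array([0.02, 0.04, 0.01, 0.06, 0.02, 0.05])
year = np.array([2010, 2015, 2012, 2020, 2008, 2018])
rct = np.array([0, 1, 0, 1, 0, 1])

# Weighted least squares meta-regression with multiple moderators at once
X = np.column_stack([np.ones_like(yi), year - year.mean(), rct])
W = np.diag(1 / vi)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ yi)
print(dict(zip(["intercept", "year (centered)", "rct"], np.round(beta, 3))))
```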


Recommendation: Get the Weighting Right


R2a. Ensure that study weights are independent of the effect sizes themselves. Weighting schemes that depend on the effect can distort results.


R2b. When using correlations as effect sizes, avoid tests for publication bias based on the correlation’s standard error. More appropriate alternatives include proxy measures derived from sample size or transformations to Fisher’s z.
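The sketch below illustrates R2b with made-up data: the same Egger-type regression as before, but run on the Fisher's z scale, where the standard error depends only on the sample size and is therefore no longer tied to the effect being tested.

```python
import numpy as np

# Illustrative correlations and sample sizes (made-up data)
r = np.array([0.12, 0.25, 0.40, 0.08, 0.33, 0.18])
n = np.array([450, 80, 40, 900, 60, 200])

# Transform to Fisher's z; its standard error depends only on n,
# so the bias test no longer uses a precision measure tied to the effect size.
z = np.arctanh(r)
se_z = 1 / np.sqrt(n - 3)

# Egger-type regression of the standardized effect on precision
y, x = z / se_z, 1 / se_z
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"asymmetry intercept on the z scale = {beta[0]:.2f}")
```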


Recommendation: Correct for Publication Bias and Check Robustness


R3a. Always adjust for publication bias, even when tests don’t flag it—most tests lack the power to detect moderate bias.


R3b. Use selection models and multivariate regression approaches as complementary robustness checks.


R3c. Apply two or more correction methods and compare results; convergence across methods strengthens confidence in the findings.
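Selection models are best left to dedicated software, but the regression-based corrections mentioned in R3b are easy to sketch. Below is an illustrative PET–PEESE-style comparison on made-up data (PET regresses effects on their standard errors, PEESE on squared standard errors; in both cases the intercept serves as the bias-adjusted estimate), set alongside the naive pooled mean in the spirit of R3c.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares coefficients."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Illustrative effect sizes and standard errors (made-up data)
yi = np.array([0.42, 0.31, 0.55, 0.12, 0.60, 0.25, 0.48, 0.09])
sei = np.array([0.20, 0.12, 0.25, 0.05, 0.30, 0.10, 0.22, 0.04])
w = 1 / sei**2

# PET: regress effects on SE; the intercept is the bias-adjusted estimate
pet = wls(np.column_stack([np.ones_like(sei), sei]), yi, w)[0]

# PEESE: regress effects on SE^2 instead; often preferred when a genuine effect exists
peese = wls(np.column_stack([np.ones_like(sei), sei**2]), yi, w)[0]

# Naive inverse-variance pooled mean, for comparison
naive = np.sum(w * yi) / np.sum(w)
print(f"naive = {naive:.3f}, PET = {pet:.3f}, PEESE = {peese:.3f}")
```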


Recommendation: Account for Dependence Among Effects


R4a. Don’t treat multiple estimates from the same study as independent. Ignoring dependence leads to overstated precision and inefficient estimates.


R4b. When using cluster-robust standard errors, prefer the CR2 estimator with Satterthwaite degrees of freedom—it provides more reliable small-sample inference than the older CR1 version.
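As a rough illustration of R4a and R4b (not the full CR2/Satterthwaite machinery, which is best obtained from dedicated meta-analysis software), the sketch below compares the naive standard error that treats every effect size as independent with a basic cluster-robust (CR1-type) standard error that allows effects from the same study to be correlated. Data and study IDs are made up.

```python
import numpy as np

# Illustrative effect sizes, sampling variances, and study IDs (made-up data);
# several studies contribute more than one effect size.
yi = np.array([0.30, 0.25, 0.40, 0.10, 0.15, 0.50, 0.45, 0.20])
vi = np.array([0.02, 0.02, 0.03, 0.01, 0.01, 0.05, 0.05, 0.02])
study = np.array([1, 1, 2, 3, 3, 4, 4, 4])

w = 1 / vi
mu = np.sum(w * yi) / np.sum(w)   # inverse-variance weighted mean effect
resid = yi - mu

# Naive SE: pretends all effect sizes are independent
se_naive = np.sqrt(1 / np.sum(w))

# Cluster-robust (CR1-type) SE: sums weighted residuals within each study
# before squaring, so within-study dependence no longer inflates precision.
clusters = np.unique(study)
m = len(clusters)
meat = sum(np.sum(w[study == g] * resid[study == g]) ** 2 for g in clusters)
se_cr = np.sqrt(m / (m - 1) * meat) / np.sum(w)

print(f"naive SE = {se_naive:.3f}, cluster-robust SE = {se_cr:.3f}")
```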


Bringing It All Together


Meta-analysis has transformed how evidence is synthesized across disciplines. Yet our review of 1,000 recent studies shows that, despite abundant guidelines and powerful software, practice still trails principle. In a final analysis, we examined each of the ten disciplines to see how well it aligned with the recommendations above, tracking the percentage of studies in each field that met the respective standards. To facilitate reading, we color-coded the resulting compliance rates:


Blue: High compliance (67% and up)

Yellow: Medium compliance (34-66%)

Orange: Low compliance (33% and below)

Gray: Not applicable (NA)


The disciplines are ordered lexicographically by their overall level of compliance—starting with those that meet the largest number of recommendations at the highest threshold (blue cells), followed by those with more medium-compliance instances (yellow cells), and finally those with the greatest number of low-compliance instances (orange cells).


[Table: discipline-by-recommendation compliance, color-coded as described above]

We hope our study will serve as a resource for researchers conducting their first meta-analyses, a benchmark for the design of simulation studies, and a reference point for applied meta-analysts aiming to align their methods more closely with best-practice standards.

