# Addressing sample overlap in meta-analysis

As you know, empirical studies in economics often use very similar (or even the same) data, causing empirical outcomes to be highly correlated. Heiko Rachinger and I recently published __a paper in Research Synthesis Methods__ providing a solution to this sample overlap problem. We show that, in the case of regression estimates, the correlation between any pair of estimates with overlapping samples is relatively simple to calculate if we know the number of overlapping observations in the primary datasets. Doing this for all pairs of estimates allows us to determine the optimal weight for each estimate. We call the resulting estimator the generalized-weights (GW) estimator. Intuitively, two estimates with highly overlapping samples get lower weights than they would if the samples did not overlap.
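To make the intuition concrete, here is a minimal Python sketch of overlap-adjusted weighting. It is not the procedure from the paper: it assumes the correlation between two estimates equals the number of shared observations divided by the square root of the product of the two sample sizes, which holds exactly for simple sample means and serves here only as an illustrative stand-in. The function names `solve` and `gw_weights` and all numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gw_weights(std_errors, sample_sizes, overlaps):
    """Weights proportional to Omega^{-1} 1, normalized to sum to one.

    Assumption (illustrative only): corr(i, j) = overlap_ij / sqrt(n_i * n_j).
    Diagonal entries of `overlaps` are ignored.
    """
    k = len(std_errors)
    omega = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                rho = 1.0
            else:
                rho = overlaps[i][j] / (sample_sizes[i] * sample_sizes[j]) ** 0.5
            omega[i][j] = rho * std_errors[i] * std_errors[j]
    z = solve(omega, [1.0] * k)
    total = sum(z)
    return [v / total for v in z]


# Three estimates with equal precision; the first two share 80 of 100 observations.
se = [0.1, 0.1, 0.1]
n = [100, 100, 100]
overlaps = [[100, 80, 0],
            [80, 100, 0],
            [0, 0, 100]]
weights = gw_weights(se, n, overlaps)
print(weights)
```

With no overlap the three weights would each be one third, the usual inverse-variance result for equally precise estimates. In this example the two overlapping estimates each receive a weight of 5/19 (about 0.26), while the independent estimate receives 9/19 (about 0.47).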

The main drawback of the GW procedure is the extra effort it requires at the coding stage, since the number of overlapping observations between pairs of estimates is the key input. To facilitate the implementation of the GW estimator, we have made available __Stata code__ that automates the whole procedure. All the user needs to supply is a geographical identifier (e.g., an index of countries) and the sample period. The current version of the code can only handle relatively simple overlap structures, but we are working on extensions to cover some complex cases that are likely to be found in practice (for example, overlap between data defined at different levels of aggregation). Heiko will discuss some of these extensions at the MAER-net colloquium in Athens. If you have encountered any particular type of sample overlap in your work that you would like to share with us, we would really appreciate it!
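As a rough illustration of why a geographical identifier and a sample period can be enough in simple cases, the Python sketch below counts overlapping observations between two samples. It is not the Stata code mentioned above. It assumes balanced annual country panels with one observation per country and year, and the function name `overlap_count` and the example samples are hypothetical.

```python
def overlap_count(countries_a, period_a, countries_b, period_b):
    """Count shared country-year observations between two samples.

    Assumes balanced annual panels: every country is observed in every
    year of its sample period. Periods are (first_year, last_year), inclusive.
    """
    shared_countries = set(countries_a) & set(countries_b)
    first = max(period_a[0], period_b[0])
    last = min(period_a[1], period_b[1])
    shared_years = max(0, last - first + 1)  # zero if the periods do not intersect
    return len(shared_countries) * shared_years


# Two hypothetical primary samples.
sample_a = (["DE", "FR", "IT"], (1990, 2000))
sample_b = (["FR", "IT", "ES"], (1995, 2005))
shared = overlap_count(sample_a[0], sample_a[1], sample_b[0], sample_b[1])
print(shared)  # 2 shared countries x 6 shared years (1995-2000) = 12
```

Unbalanced panels or data at different levels of aggregation would break the simple product rule used here.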