
Building Meta-Analysis Datasets with AI Assistance: A Case Study. Part 2: Using ChatGPT to Assist with Collecting and Organizing Studies


Introduction

In the previous post, I described how I used ChatGPT to help reason through the conceptual structure of a meta-analysis on inflation and subjective well-being and to arrive at a set of Boolean search terms suitable for use across multiple bibliographic databases. That post ended at a natural stopping point: the search logic was defined, inclusion and exclusion criteria were explicit, and the remaining task was to take those search terms into the world and see how they behaved in practice.

This post picks up at that point. Its focus is not on constructing search strings, but on executing them—running the same conceptual search across different databases, dealing with database-specific quirks and institutional constraints, deciding how much to retrieve from each source, and organizing the resulting records in Zotero in a way that supports transparent screening downstream.

Since this blog series is about how AI can help with meta-analyses, I should also note that most of this post was drafted by Claude.ai. As with the first post, I copied and pasted my interactions with ChatGPT into Claude and asked it to write a blog post. While I did some editing, much, if not most, of what follows was written by Claude.ai.

What follows documents what actually happened when I took a single set of Boolean terms into Scopus, Business Source Complete (via EBSCO), SSRN, and RePEc. Although the conceptual target was held constant, the databases behaved very differently. Some searches returned manageable result sets immediately; others exploded into thousands of records. Some platforms supported clean bulk export; others did not. Several judgment calls had to be made along the way—not only about how to refine searches, but also about when to stop refining and when further tightening would risk excluding relevant studies.

Throughout this process, ChatGPT functioned less as a retrieval tool than as a way to make those judgments explicit. It helped diagnose why particular databases behaved as they did, clarify which refinements were substantively defensible, and separate technical constraints from conceptual ones. The result was not a perfectly frictionless workflow, but a traceable one—one in which each decision can be explained, and if necessary defended, after the fact.

Searching Scopus: establishing a baseline

I began with Scopus. Scopus is far from perfect, but for this topic it has several advantages: good coverage of economics and social science journals, transparent field restrictions, and reliable interaction with reference managers such as Zotero. With ChatGPT's assistance, I constructed a Boolean query requiring at least one inflation-related term and at least one subjective well-being term, applied to titles, abstracts, and keywords. The query also explicitly excluded common mental-health and satisfaction terms that I had already decided were out of scope.
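
To make this concrete, a stylized version of such a query in Scopus advanced-search syntax might look like the following. The terms shown here are illustrative placeholders, not the exact list developed in Part 1:

TITLE-ABS-KEY (
    ( inflation OR "price level" OR "cost of living" )
    AND ( "subjective well-being" OR "life satisfaction" OR happiness )
    AND NOT ( depression OR anxiety OR "job satisfaction" )
)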

The first Scopus run returned just under 200 records. Importantly, this result set felt right. Scanning titles and abstracts, I recognized many familiar papers alongside some newer or less obvious ones. The proportion of false positives was modest, and the noise was manageable. This initial success was instructive. It provided a reference point for what reasonable recall looked like for this topic. Later, when other databases returned thousands of hits, that Scopus search served as a reminder that the problem was not my conceptual framing but the behavior of the database.

All Scopus results were imported directly into Zotero using the Zotero browser connector and placed into a dedicated collection labeled clearly by database. At this stage, I imported metadata only—titles, abstracts, authors, keywords, DOIs. I deliberately did not bulk-download PDFs.

The EconLit problem

The next step, at least in principle, should have been EconLit. For economists, EconLit often plays a canonical role in systematic searches, and I fully expected to include it. However, when I attempted to access EconLit through my university library, it turned out that my institution does not subscribe to EconLit. This is less unusual than many economists realize, but it presents an immediate methodological choice: either accept incomplete coverage or identify an alternative database.

This is exactly the sort of mundane constraint that rarely appears in published methods sections but often shapes what is actually done. Rather than glossing over it, I treated it as a design problem worth thinking through carefully. After some discussion, I decided to substitute Business Source Complete (BSC) via EBSCOhost. This decision was pragmatic rather than ideal. Business Source Complete has broad coverage of economics-adjacent journals but also includes management, psychology, analytics, and practitioner outlets. Used incautiously, it can easily swamp a well-defined search with thousands of irrelevant records.

First pass in Business Source Complete: a diagnostic failure

The first time I ran my inflation–well-being query in Business Source Complete, the result count exceeded 6,000 records. That number alone was enough to signal a problem.

At that scale, importing into Zotero would be unwieldy, and title–abstract screening would be inefficient and error-prone. Importantly, this explosion was not a sign that the topic was suddenly broader—it was a sign that the database interpreted key terms like "happiness" and "well-being" in far more expansive ways than Scopus.

Here, ChatGPT was useful as a diagnostic partner. Instead of reacting by tightening the search arbitrarily, I could reason through why this database behaved differently and what kinds of constraints would be legitimate to impose.

Practical constraints on retrieval from Business Source Complete

A second issue surfaced almost immediately once it became clear that Business Source Complete would require substantial refinement: unlike Scopus, it does not support true bulk export of large result sets. There is no option to select all records, no ability to download hundreds of citations in a single action, and native export tools are limited to small, page-by-page batches. At the scale of several thousand hits, this made direct export impractical even before questions of screening efficiency arose.

This constraint matters because it shapes what counts as a feasible workflow. In Scopus, broad exploratory searches are tolerable precisely because the platform supports clean, one-shot ingestion into reference managers. Business Source Complete behaves more like a content delivery platform than a bibliographic database in this respect, and attempting to treat it as the latter leads quickly to friction.

Rather than working around this limitation by repeatedly exporting small batches—a process that would have been both time-consuming and error-prone—I relied on Zotero's browser connector to capture records directly from the results list once the search logic had been brought under control. This approach allowed me to import bibliographic metadata (titles, abstracts, authors, and identifiers) without attempting to download full texts and without relying on EBSCO's constrained export functionality.

The inability to bulk download, therefore, provided an additional reason to focus effort on refining the search before importing anything. It reinforced the broader methodological principle guiding the process: retrieval design and dataset construction should minimize manual intervention and UI-driven decisions, not multiply them. Once the search had been tightened to a manageable and conceptually coherent set of records, Zotero's connector provided a stable and transparent bridge between Business Source Complete and the downstream screening workflow.

Iterative tightening as a methodological process

Rather than jumping straight from 6,000 records to some arbitrary cutoff, I tightened the search in stages, checking the effect of each change. The first adjustment was to introduce an explicit economics anchor—terms such as "macroeconomic," "monetary policy," or "fiscal." Initially, I applied these terms across all fields. The result count barely changed. This was a useful failure. It revealed that these words appear frequently in full-text metadata, references, or journal descriptions and therefore do little to discriminate at the search stage.

The next adjustment was more effective: restricting the anchor terms to the abstract field only. This single change dramatically reduced noise. Abstract-level constraints are stronger because abstracts summarize what a paper is actually about, rather than incidental contexts. Even then, the result set was larger than ideal, hovering around 1,500 records. At this point, I made a further decision that aligned directly with my research question: require explicit mention of inflation in the abstract. Since inflation had to be the exposure variable anyway, this was not an arbitrary restriction but an enforcement of a substantive requirement.
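
In EBSCOhost syntax, where the AB field code restricts a clause to the abstract, the cumulative effect of these refinements can be sketched roughly as follows (the terms are again illustrative rather than the exact list used):

AB inflation
AND AB ( "subjective well-being" OR "life satisfaction" OR happiness )
AND AB ( macroeconomic OR "monetary policy" OR fiscal )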

Finally, I applied the Academic (Peer-Reviewed) Journals filter available in EBSCO. I was careful about this step. Filtering by publication type is often legitimate; filtering by outcomes, countries, or methods is more questionable. In this case, since I planned to cover working papers separately via SSRN and RePEc, restricting Business Source Complete to peer-reviewed journals made sense.

After these refinements, the EBSCO result set dropped to 174 records. At this point, the search was no longer noisy but also not implausibly narrow. Titles and abstracts looked recognizably similar to those returned by Scopus, though with some distinct additions.

Knowing when to stop refining

One of the hardest judgments in search design is deciding when to stop. There is almost always one more tweak that could reduce the result count further. However, every additional restriction increases the risk of false negatives. This is where having a clear sense of the conceptual target—and a reference point from Scopus—was helpful. A set of 174 records from Business Source Complete was consistent with what EconLit might plausibly have returned for this topic. Further tightening would have been driven more by convenience than by methodological necessity. At that point, I locked the search.

Importing into Zotero: metadata, not PDFs

All 174 EBSCO records were imported into Zotero using the Zotero browser connector and stored in a separate collection labeled clearly by database and search round. Again, I imported metadata only. Several times during this process, Zotero warned me that my library exceeded cloud storage limits for attachment syncing. This was not a problem, since I was not intending to download PDFs at this stage. Bulk-downloading full texts before screening is inefficient and creates file-management overhead for papers that will later be excluded. My guiding principle was simple: search first, screen second, retrieve PDFs last.

Searching SSRN and RePEc

In addition to Scopus and Business Source Complete, I also searched SSRN and RePEc to capture relevant working papers and preprints. These platforms played a supplementary role in the overall workflow, extending coverage beyond peer-reviewed journals without relying on Google Scholar as a primary retrieval tool.

The mechanics of ingestion differed across the two platforms. For SSRN, Zotero's browser connector functioned as expected. Individual records or small sets of records could be captured directly from search results or paper landing pages, with bibliographic metadata and abstracts imported cleanly into Zotero. As with the journal databases, I did not attempt to bulk-download PDFs at this stage; working papers identified through SSRN were treated as bibliographic records to be screened alongside journal articles later.

RePEc, by contrast, did not support direct ingestion via the Zotero connector in a reliable way. Instead, records had to be exported manually using RePEc's BibTeX export functionality and then imported into Zotero. This added an extra step to the workflow, but it also made explicit a distinction that often remains implicit: even within the category of "working paper repositories," platforms differ sharply in how well they integrate with reference management tools.

These differences did not affect the conceptual role of SSRN and RePEc in the review. In both cases, searches were keyword-based rather than Boolean-intensive, reflecting the more limited search functionality available on these platforms. Records from SSRN and RePEc were stored in separate Zotero collections to preserve provenance and were screened using the same title–abstract criteria as journal articles. What differed was not the logic of inclusion, but the mechanics of getting the metadata into a form that could be handled systematically downstream.

Taken together, these searches yielded a raw set of records that reflected both the breadth of the literature and the heterogeneity of the data sources. Before any deduplication, the database-specific counts were as follows: 196 records from Scopus, 174 from Business Source Complete, 39 from SSRN, and 58 from RePEc. Each set was stored in its own Zotero collection to preserve source provenance and to make subsequent duplicate detection transparent. At this stage, the objective was coverage rather than parsimony; overlap across databases was expected and intentionally left unresolved until all sources had been ingested.

Tracking provenance and deduplication

Once all records had been imported from the four databases, the next step was to deduplicate and to assess how much unique coverage each database provided. As each database was imported, I tagged every item with its source: "source:Scopus," "source:EBSCO," and so on. This tagging was done immediately upon import, because provenance information would be lost once deduplication began.
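
Tagging of this kind can be done by hand in the Zotero client (select all items in a collection and drag them onto a tag), but it can also be scripted against the Zotero Web API. Below is a minimal sketch using the third-party pyzotero library; the library ID, API key, and collection keys are placeholders, and the collection-to-tag mapping is an assumption about how the collections are named:

from pyzotero import zotero

# Placeholders: substitute your own Zotero user ID and API key.
LIBRARY_ID = "1234567"
API_KEY = "your-zotero-api-key"

# Hypothetical collection keys mapped to the source tags used in this post.
SOURCE_COLLECTIONS = {
    "ABCD1234": "source:Scopus",
    "EFGH5678": "source:EBSCO",
    "IJKL9012": "source:SSRN",
    "MNOP3456": "source:RePEc",
}

zot = zotero.Zotero(LIBRARY_ID, "user", API_KEY)
for coll_key, tag in SOURCE_COLLECTIONS.items():
    # everything() pages through the full result set; add_tags() writes
    # the tag back to the server for each item.
    for item in zot.everything(zot.collection_items_top(coll_key)):
        zot.add_tags(item, tag)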

The raw counts before deduplication were 467 items in total (Scopus: 196, EBSCO: 174, RePEc: 58, SSRN: 39). After a first pass with Zotero's duplicate detection tool and the removal of a handful of foreign-language papers, the library dropped to 440 items. Merging the remaining duplicates into a single MASTER collection, with Zotero preserving the source tags of merged items, left 361 unique items.
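
Zotero's duplicate detection works from metadata fields such as DOI and title. As an illustration of the underlying idea (not of Zotero's actual algorithm), a short script over a CSV export of the library could flag likely duplicates by keying on DOI where present and a normalized title otherwise; the file name and column names here assume Zotero's standard CSV export:

import csv
import re

def normalize_title(title):
    # Lowercase and collapse punctuation/whitespace so trivially
    # different renderings of the same title compare equal.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

seen = {}
duplicates = []
with open("master_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        key = row.get("DOI", "").strip().lower() or normalize_title(row.get("Title", ""))
        if key and key in seen:
            duplicates.append(row.get("Title", ""))
        elif key:
            seen[key] = row.get("Title", "")

print(f"{len(duplicates)} likely duplicates flagged for manual review")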

To determine how many items were uniquely contributed by each database, I used Zotero's Advanced Search function four times. For each database, I constructed a search requiring that database's tag while excluding all others. For example, to identify items unique to Scopus: Tag—is—source:Scopus AND Tag—is not—source:EBSCO AND Tag—is not—source:SSRN AND Tag—is not—source:RePEc.

The unique contributions were: Scopus (143), EBSCO (127), RePEc (17), and SSRN (20), totaling 307 items found by exactly one database; the remaining 54 of the 361 unique items surfaced in two or more sources. This revealed substantial overlap across databases (106 of the 467 raw records, a duplication rate of roughly 23%) but confirmed the value of searching multiple sources. Scopus and EBSCO contributed the vast majority of unique material, while RePEc and SSRN added relatively little, likely because many working papers eventually appear in journals already indexed elsewhere. The source-tagging approach proved essential for PRISMA-compliant reporting and for evaluating whether the search strategy could be simplified in future work.
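
These unique counts can also be cross-checked outside Zotero. Assuming a CSV export of the MASTER collection in which the Manual Tags column holds semicolon-separated tags (Zotero's standard CSV layout), a few lines of code reproduce the per-database tallies:

import csv
from collections import Counter

SOURCE_TAGS = {"source:Scopus", "source:EBSCO", "source:SSRN", "source:RePEc"}
unique_counts = Counter()

with open("master_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        tags = {t.strip() for t in row.get("Manual Tags", "").split(";")}
        sources = tags & SOURCE_TAGS
        if len(sources) == 1:  # item found by exactly one database
            unique_counts[sources.pop()] += 1

for source, count in sorted(unique_counts.items()):
    print(source, count)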

At this point—361 unique items, organized by source, with metadata imported and provenance tracked—the dataset was ready for export. I exported the MASTER collection from Zotero as an RIS file and uploaded it to Sysrev for title–abstract screening. PDFs had still not been downloaded. That step would wait until after screening reduced the set to a more manageable number of likely-relevant studies.
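
Because RIS is a plain-text format in which every record opens with a TY field, the export can be sanity-checked before upload with a one-line count; this sketch assumes the file was saved as master.ris:

# Each RIS record begins with a "TY  - " line, so counting those
# lines counts the records; the total should match the 361 items.
with open("master.ris", encoding="utf-8") as f:
    record_count = sum(1 for line in f if line.startswith("TY  - "))
print(f"{record_count} records ready for upload to Sysrev")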

Organizing Zotero collections deliberately

Each database—Scopus, Business Source Complete, SSRN, RePEc—was represented by its own Zotero collection. This was not just an organizational convenience. It preserves provenance, facilitates transparent PRISMA reporting later, and makes deduplication more controlled. Deduplication was postponed intentionally until after all databases had been ingested. Deduplicating mid-stream risks inconsistent merges and lost provenance information. ChatGPT did not make these decisions for me, but it helped me articulate why I was making them—and why certain temptations (such as deleting "obviously irrelevant" records early) should be resisted.

What ChatGPT did—and did not—do

It is worth being explicit about the limits of ChatGPT's role here.

ChatGPT did not:

  • search databases on my behalf,

  • identify "the right" papers,

  • decide which studies were relevant,

  • or replace judgment at any stage.

What it did do was function as an always-available methodologist: asking what a given restriction accomplished, whether it aligned with the research question, and how a reviewer might view the choice later. In several cases, that prevented me from making changes that would have been convenient but hard to justify.

Lessons from the process

A few general lessons emerged from this exercise.

First, search design is not trivial, even for apparently straightforward topics. Database behavior matters enormously.

Second, institutional constraints—such as lack of access to EconLit—are not failures, but they require transparent adaptation rather than quiet omission.

Third, separating conceptual decisions from technical implementation is critical. ChatGPT helped mainly with the former.

Finally, stopping rules matter. There is a point beyond which further refinement yields diminishing returns and increasing risk of bias.

Looking ahead

At the point where this post ends, all records from Scopus, Business Source Complete, SSRN, and RePEc were safely stored in Zotero, organized by database, deduplicated with provenance intact, and exported for title–abstract screening in Sysrev. PDFs had not yet entered the picture, and that was intentional.

In my next post, I will describe how screening proceeded and how AI-assisted tools can—and cannot—support that stage. But that is a separate problem. Searching is hard enough on its own!

Final reflection

Using ChatGPT in this process did not make searching automatic or effortless. What it did was slow me down in the right places. It encouraged explicit reasoning, made tacit judgments visible, and helped me respond to real-world constraints in a principled way.

For me, that is where conversational AI adds value in academic research—not by replacing expertise, but by making it more self-aware.
