Staff Scientist, Lister Hill National Center for Biomedical Communications
Objectives: Observational data is increasingly important in healthcare research because it provides valuable contextual information about real-world treatments and cost-effective opportunities for large-scale studies. Currently, no standards exist to guide how citations to the observational data research literature are collected and presented. Applying an existing standard such as the Medical Subject Headings (MeSH) controlled vocabulary offers an improved way to reveal themes in this dynamic subset of healthcare research. This project proposes a methodology that informatics librarians and researchers can use to analyze the existing corpus of research listed in the bibliographies of selected observational databases.
Methods: The Clinical Practice Research Datalink (CPRD) was chosen as the gold-standard observational database for developing a methodology to characterize research outputs. Using Python and the NCBI E-utilities, metadata was collected for the articles listed in CPRD’s bibliography and analyzed to produce frequency tables of MeSH keywords. The methodology was then applied to two more observational databases, PEDSnet and FDA Sentinel, to test whether it could be reproduced with different datasets and to identify potential challenges.
Results: This project developed a broad methodology for characterizing the research associated with the CPRD, PEDSnet, and FDA Sentinel observational databases. The top twenty MeSH keywords for each database were identified, and visualizations showing the breakdown of research were created. Additionally, the project identified key barriers to collecting observational database bibliographies and proposed directions for future research.
Conclusions: The methodology developed in this project demonstrates one way of using the MeSH controlled vocabulary to characterize observational data research, specifically for observational databases that track their research outputs through a bibliography. Librarians and researchers can replicate it for other similar databases. Future research in this area should consider how to automate the identification of observational data research and how to construct bibliographies for databases that do not regularly track research outputs.
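The Methods step of collecting article metadata through the NCBI E-utilities and tabulating MeSH keyword frequencies could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (`fetch_pubmed_xml`, `mesh_frequencies`, `top_keywords`) and the `SAMPLE_XML` records are hypothetical. It also assumes the bibliography has already been resolved to a list of PubMed IDs, and that each descriptor is counted once per article.

```python
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Public EFetch endpoint of the NCBI E-utilities.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_pubmed_xml(pmids):
    """Fetch PubMed records as XML bytes for a list of PMID strings.

    Makes a network request; a long bibliography would need batching
    and rate limiting, which this sketch leaves out.
    """
    params = urlencode({"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"})
    with urlopen(f"{EFETCH_URL}?{params}") as response:
        return response.read()


def mesh_frequencies(pubmed_xml):
    """Count how many articles are indexed with each MeSH descriptor."""
    counts = Counter()
    root = ET.fromstring(pubmed_xml)
    for article in root.iter("PubmedArticle"):
        # A set ensures a descriptor is counted at most once per article.
        descriptors = {
            d.text
            for d in article.iterfind(".//MeshHeading/DescriptorName")
            if d.text
        }
        counts.update(descriptors)
    return counts


def top_keywords(counts, n=20):
    """Return the n most frequent descriptors as (descriptor, count) pairs."""
    return counts.most_common(n)


# Invented two-article sample in the shape of EFetch output, for offline use.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><MeshHeadingList>
    <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
    <MeshHeading><DescriptorName>Asthma</DescriptorName></MeshHeading>
  </MeshHeadingList></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><MeshHeadingList>
    <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
  </MeshHeadingList></MedlineCitation></PubmedArticle>
</PubmedArticleSet>
"""

if __name__ == "__main__":
    for descriptor, count in top_keywords(mesh_frequencies(SAMPLE_XML)):
        print(f"{descriptor}\t{count}")
```

To run this against a real bibliography, the output of `fetch_pubmed_xml` would replace `SAMPLE_XML`. Articles that PubMed has not yet indexed with MeSH terms contribute nothing to the counts, which is one reason frequency tables may under-represent recent publications.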