Objectives: Each year, National Library of Medicine (NLM) analysts evaluate proposed terms for inclusion in the Medical Subject Headings (MeSH) controlled vocabulary. This project demonstrates how Python can automate stages of the new MeSH term analysis workflow, and how librarians and researchers can use the same method to gauge whether a preferred term is likely to appear in MeSH in the near future and to see how literature using that term has previously been indexed.
Methods: Two features of new MeSH term analysis were implemented and refined, and a third was designed on an exploratory basis, using original Python code supported by existing open source Python libraries, potential new MeSH terms, and PubMed citation information. Dan Cho, Lead MeSH Analyst and sponsor of this project, created a Jupyter Notebook, which two NLM Associate Fellows refined by identifying and testing areas where computational efficiency and user results could be improved. Finally, the associates researched natural language processing (NLP) techniques in Python and established preliminary parameters for a fully developed NLP feature.
Results: Three measurable efficiency improvements were identified by comparing computational processing times, and the modifications were integrated into the process drafted in the original Jupyter Notebook. In total, these improvements reduced computational processing time by approximately 90%, although the reduction was not uniform across tests. The project provides functionality that accepts a user's term as input, counts its appearances in the titles and abstracts of MEDLINE PubMed citations, and returns the percentages of existing MeSH terms under which that group of citations has previously been indexed.
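The term-scoping functionality described in the Results can be illustrated with a minimal sketch. It is not the project's notebook code: the function name `mesh_heading_percentages`, the record layout (dictionaries with `title`, `abstract`, and `mesh` keys), and the sample citations are all assumptions made for illustration, and the sketch works on records already held in memory rather than retrieving them from PubMed.

```python
from collections import Counter

# Hypothetical, hand-written records standing in for retrieved citations.
SAMPLE_CITATIONS = [
    {"title": "Long COVID in adults", "abstract": "A cohort study.",
     "mesh": ["COVID-19", "Fatigue"]},
    {"title": "Post-acute outcomes", "abstract": "Symptoms of long COVID persist.",
     "mesh": ["COVID-19"]},
    {"title": "Seasonal influenza trends", "abstract": "Surveillance data.",
     "mesh": ["Influenza, Human"]},
]


def mesh_heading_percentages(term, citations):
    """Return (match_count, {heading: percent}) for citations mentioning term.

    A citation matches when the term appears, case-insensitively, in its
    title or abstract. Each percentage is the share of matching citations
    indexed under that MeSH heading.
    """
    needle = term.casefold()
    matched = [
        c for c in citations
        if needle in c.get("title", "").casefold()
        or needle in c.get("abstract", "").casefold()
    ]
    counts = Counter()
    for c in matched:
        # set() guards against a heading listed twice on one citation.
        counts.update(set(c.get("mesh", [])))
    total = len(matched)
    if total == 0:
        return 0, {}
    # most_common() orders headings from most to least frequent.
    return total, {h: 100.0 * n / total for h, n in counts.most_common()}


total, percentages = mesh_heading_percentages("long covid", SAMPLE_CITATIONS)
print(total, percentages)  # 2 {'COVID-19': 100.0, 'Fatigue': 50.0}
```

Plain substring matching is used here for brevity; it would also match the term inside longer words, so a production version would likely need tokenization or phrase-aware matching.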
Conclusions: The modifications to new MeSH term analysis presented in this project streamline the process of obtaining scoping information about existing and potential MeSH terms and the literature associated with them. Further testing, and implementation of these automation processes in a format that does not require Python knowledge, would be a useful next step. Future research could explore the viability of natural language processing solutions to further inform analysts and provide additional context.