Perelman School of Medicine at the University of Pennsylvania
Introduction: The grading of antenatal hydronephrosis (ANH) on postnatal renal ultrasound dictates the management of patients with ANH. Multiple grading systems attempt to standardize hydronephrosis grading, yet inter-observer reliability remains poor. Machine learning methods may provide tools to improve the accuracy and efficiency of hydronephrosis grading. We sought to develop an automated convolutional neural network (CNN) model that classifies renal ultrasounds according to the Urinary Tract Dilation (UTD) system as an adjunct to hydronephrosis grading.
Methods: We obtained a retrospective, single-institution cohort of postnatal renal ultrasounds from patients with and without hydronephrosis, each graded by a radiologist using the UTD system. An algorithm used image labels to select the sagittal renal images from each study. These preprocessed images were analyzed by a VGG16 CNN pre-trained on ImageNet. 3K-fold stratified cross-validation was used to generate predictions across the dataset. The model classified images into five UTD classes: normal, mild hydronephrosis not meeting criteria for P1, P1, P2, or P3. These predictions were compared with radiologist grading, and confusion matrices were used to evaluate model performance. Gradient-weighted class activation mapping (Grad-CAM) identified the image features driving model predictions.
Results: We identified 610 patients with 2,828 renal ultrasound series. Per radiologist grading, 847 were normal, 601 were mild hydronephrosis, 336 were P1, 662 were P2, and 442 were P3. The model predicted hydronephrosis grade with 79% overall accuracy and classified 97% of studies either correctly or within one grade of the radiologist grade. The model correctly classified 94% of normal studies, 74% of mild hydronephrosis studies, 68% of P1 studies, 74% of P2 studies, and 90% of P3 studies. Grad-CAM showed that the renal collecting system drove the model's predictions.
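The evaluation described above (a confusion matrix of model predictions against radiologist grades, summarized as overall accuracy, agreement within one grade, and per-class accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the five UTD classes are ordinal (normal < mild < P1 < P2 < P3), so "within one grade" means the predicted and reference classes are adjacent, and it reads per-class accuracy as the fraction of studies with a given radiologist grade that the model labeled correctly. The function names and sample labels are hypothetical.

```python
# Assumed ordinal order of the five UTD classes named in the abstract.
CLASSES = ["normal", "mild", "P1", "P2", "P3"]


def confusion_matrix(true_labels, pred_labels, classes=CLASSES):
    """Rows are radiologist (reference) grades; columns are model predictions."""
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def summarize(matrix, classes=CLASSES):
    """Overall accuracy, within-one-grade agreement, and per-class accuracy."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    exact = sum(matrix[i][i] for i in range(n))
    # Ordinal assumption: a prediction counts as "within one grade"
    # when its class index differs from the reference index by at most 1.
    within_one = sum(
        matrix[i][j] for i in range(n) for j in range(n) if abs(i - j) <= 1
    )
    # Per-class accuracy = correct predictions / all studies of that reference grade.
    per_class = {
        c: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
        for i, c in enumerate(classes)
    }
    return {
        "accuracy": exact / total,
        "within_one_grade": within_one / total,
        "per_class_accuracy": per_class,
    }


# Hypothetical toy labels, for illustration only (not study data).
true_grades = ["normal", "normal", "mild", "P1", "P2", "P3", "P3"]
pred_grades = ["normal", "mild", "mild", "P2", "P2", "P3", "P1"]
matrix = confusion_matrix(true_grades, pred_grades)
summary = summarize(matrix)
print(matrix)
print(summary)
```

In this toy example, a P1 study predicted as P2 counts toward within-one-grade agreement, whereas a P3 study predicted as P1 does not; this is how the within-one-grade figure can be substantially higher than exact accuracy.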
Conclusions: An automated CNN model graded hydronephrosis on renal ultrasounds according to the UTD system with promising accuracy relative to prior studies. Its predictions relied on appropriate imaging features, namely the renal collecting system. These findings suggest a possible adjunctive role for machine learning systems in the grading of ANH.
Source of Funding: Pennsylvania Department of Health Tobacco Grant, Funding Agency Number SAP #4100085749