Machine Learning and Artificial Intelligence
Thomas Hadler, MSc
M. Sc.
Charité – Universitätsmedizin Berlin, ECRC, MDC, DZHK, Germany
Jan Gröschel, MD
MD
Charité – Universitätsmedizin Berlin, ECRC, MDC, DZHK, Germany
Steffen Lange
Prof.
Department of Computer Sciences, Hochschule Darmstadt - University of Applied Sciences, Darmstadt, Germany
Jeanette Schulz-Menger, MD
Professor
Charité – Universitätsmedizin Berlin, ECRC, MDC, Helios Klinikum Berlin Buch, DZHK, Berlin, Germany
Berlin, Berlin, Germany
Background: Cardiovascular magnetic resonance (CMR) provides state-of-the-art cardiac volume and mass assessments by contouring the right and left ventricles (RV, LV) and the left ventricular myocardium (LVM) [1]. Artificial intelligence (AI) approaches, such as UNets and architectural relatives, calculate clinical parameters within the range of expert interobserver differences [2]. However, AIs make human-atypical segmentation errors that disregard cardiac geometry [3], which impedes their integration into clinical routine. Enhancing AIs with the ability to reflect on the quality of their segmentations would be a step towards trustworthy AI in clinical routine (Fig. 1).
The aim of this abstract is to introduce an approach for a reflective AI and evaluate its proficiency.
Methods: We designed a reflective AI that offers both state-of-the-art segmentations and quality estimations of these segmentations. This is implemented by adding permanent dropout layers to a UNet (Dropunet). First, it produces a segmentation S for the input image. Next, it computes further segmentations for the same image, based on Monte Carlo dropout [4], which are compared to S. We use three metrics for these comparisons: Dice similarity coefficient (DSC), Hausdorff distance (HD) and area difference (AD). Finally, the comparison results are fed to a regression algorithm to compute the quality estimation of segmentation S (Fig. 2). The Dropunet segmented short-axis cine images of 150 patients taken from clinical routine and acquired with both a bSSFP and a compressed sensing (CS) sequence (1.5T Avanto Fit, Siemens). This dataset's bSSFP and CS images were contoured by an expert reader, producing 300 image sets with RV, LV and LVM segmentations. The Dropunet was trained three times on two thirds of the dataset to predict segmentations of the remaining third. It produced 31 distinct segmentation predictions per image, so that the first segmentation could be compared to the other 30 predictions for quality estimates. Pearson's correlation coefficient was calculated between predicted and real metric values. Interactive Dice correlation plots allowed the visualization of outliers.
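The comparison step described in the Methods can be illustrated with a minimal sketch. It is not the Dropunet implementation: masks are represented as sets of (row, col) pixel coordinates, the Hausdorff distance is taken over all mask pixels rather than contours, the area difference is signed, and the function names, the pixel_mm parameter and the choice of mean and standard deviation as regression inputs are all assumptions made for illustration. The network and the regression algorithm themselves are not shown.

```python
import math
from statistics import mean, pstdev


def dice(a, b):
    """Dice similarity coefficient of two pixel sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b, pixel_mm=1.0):
    """Symmetric Hausdorff distance in mm, brute force over all mask pixels."""
    if not a and not b:
        return 0.0
    if not a or not b:
        return math.inf

    def directed(src, dst):
        # Largest distance from any pixel in src to its nearest pixel in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return pixel_mm * max(directed(a, b), directed(b, a))


def area_difference(a, b, pixel_mm=1.0):
    """Signed area difference in cm^2 (assumption: first minus second mask)."""
    return (len(a) - len(b)) * pixel_mm ** 2 / 100.0


def comparison_features(reference, samples, pixel_mm=1.0):
    """Compare segmentation S (reference) with the Monte Carlo dropout samples.

    Returns the mean and population standard deviation of each metric.
    Using these summaries as regression inputs is an assumption of this sketch.
    """
    metrics = {
        "dsc": [dice(reference, s) for s in samples],
        "hd": [hausdorff(reference, s, pixel_mm) for s in samples],
        "ad": [area_difference(reference, s, pixel_mm) for s in samples],
    }
    features = {}
    for name, values in metrics.items():
        features[name + "_mean"] = mean(values)
        features[name + "_std"] = pstdev(values)
    return features


# Toy example: a 2x2 square and the same square shifted right by one pixel.
S = {(0, 0), (0, 1), (1, 0), (1, 1)}
shifted = {(0, 1), (0, 2), (1, 1), (1, 2)}
features = comparison_features(S, [S, shifted])
```

In the setting described above, the samples would be the 30 additional dropout predictions for one image, and the resulting feature values would be passed to the regression algorithm that predicts the quality of S.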
Results: The Dropunet produced segmentations for LV, RV and LVM with high agreement between network and expert on the segmentation metrics DSC/HD/AD (mean: 92%/4 mm/0 cm²) and on clinical parameters (mean ± std of differences: LV end-systolic volume (ESV): 1.4 ml ± 9 ml, LV end-diastolic volume (EDV): 1.4 ml ± 11 ml, RVESV: -1 ml ± 13 ml, RVEDV: 3.5 ml ± 15 ml, LVM mass: 2 g ± 11 g). Predicted Dice values and calculated error estimations correlated well (68%/76%/67%, 61%/84%/70%, 66%/82%/36% for RV, LV, LVM respectively). Visual inspection of the correlation plots revealed significant outliers (Fig. 3).
Conclusion: Our approach towards a reflective AI, the Dropunet, produces excellent segmentations and clinical results with the added benefit of promising quality assessments of these segmentations. Reflective AIs that offer error warnings for manual interventions will be more easily integrated into clinical routine.