Machine Learning and Artificial Intelligence
Graham Cole, MD, PhD
Dr
Imperial College Healthcare NHS Trust, United Kingdom
Sameer Zaman, MD
Cardiology Clinical Research Fellow
Imperial College Healthcare NHS Trust (supported by UK Research and Innovation [UKRI Centre for Doctoral Training in AI for Healthcare grant number EP/S023283/1]), United Kingdom
Kavitha Vimalesvaran
Dr
Imperial College Healthcare NHS Trust (supported by UK Research and Innovation [UKRI Centre for Doctoral Training in AI for Healthcare grant number EP/S023283/1]), United Kingdom
James P. Howard, MD, MA
Dr
Imperial College Healthcare NHS Trust (supported by The British Heart Foundation [FS/ICRF/22/26039]), United Kingdom
Digby Chappell
Mr
Imperial College London (supported by UK Research and Innovation [UKRI Centre for Doctoral Training in AI for Healthcare grant number EP/S023283/1]), United Kingdom
Marta Varela
Dr
National Heart and Lung Institute, Imperial College London (supported by The British Heart Foundation Centre of Research Excellence at Imperial College London [RE/18/4/34215]), United Kingdom
Nicholas Peters
Prof
National Heart and Lung Institute, Imperial College London, United Kingdom
Darrel Francis, MD
Prof
National Heart and Lung Institute, Imperial College London, United Kingdom
Anil Bharath
Prof
Imperial College London, United Kingdom
Nick Linton
Dr
Imperial College Healthcare NHS Trust, United Kingdom
Background:
Cardiac MRI (CMR) generates large imaging datasets, but their potential for machine learning is limited by a lack of high-volume labelled training data (1). Extracting the most labelling value from expert clinicians' limited time is a major challenge in the development of deep learning for CMR (2). We present a novel method for ground-truth labelling of CMR image data in which multiple clinician experts rank multiple images along a single ordinal axis, rather than manually labelling one image at a time. We apply this strategy to train a deep learning model to classify the anatomical position of CMR images, allowing the automated removal of slices that do not contain left ventricular (LV) myocardium.
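As a minimal sketch of the ranking idea, the Python snippet below aggregates expert batch orderings into a per-image ordinal score by averaging normalised ranks across viewings. The data layout and the averaging scheme are illustrative assumptions, not a published specification of the authors' pipeline.

```python
from collections import defaultdict

def aggregate_rankings(batches):
    """Aggregate expert batch rankings into a per-image ordinal score.

    `batches` is a list of lists: each inner list holds image IDs ordered
    by one expert from most basal to most apical. (Hypothetical storage
    format -- the abstract does not specify one.)
    """
    scores = defaultdict(list)
    for batch in batches:
        n = len(batch)
        for rank, image_id in enumerate(batch):
            # Normalised rank in [0, 1]: 0 = most basal, 1 = most apical.
            scores[image_id].append(rank / (n - 1))
    # Average over the multiple (>=3) viewings of each image.
    return {img: sum(v) / len(v) for img, v in scores.items()}
```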
Methods:
Anonymised late gadolinium enhancement (LGE) LV short-axis slices from 300 randomly selected scans (3552 individual images) were extracted from a clinical CMR database. The anatomical position of each image slice relative to the LV was labelled using two different strategies, each performed for five hours on a bespoke clinical image annotation platform (3): (i) 'one-image-at-a-time': each slice was labelled individually by one of three experts as 'too basal', 'LV' or 'too apical' (Figure 1); and (ii) 'multiple-image-ranking': three independent experts ordered image slices according to their relative position, from 'most basal' to 'most apical', in batches of eight until each image had been viewed at least three times (Figure 2). The labelled datasets were split into training, validation and test sets (80:10:10). Two convolutional neural networks (based on the ResNet architecture) were trained for a three-way classification task, each model using data from one labelling strategy. Model performance was evaluated by accuracy, F1-score and area under the receiver operating characteristic curve (ROC AUC).
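The sketch below shows one way a ResNet-based three-way slice classifier could be set up in PyTorch. The abstract states only that the networks were "based on the ResNet architecture"; the ResNet-18 depth, single-channel input, and optimiser settings here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Assumed configuration: ResNet-18 adapted for greyscale CMR slices.
model = models.resnet18(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                        bias=False)            # single-channel input
model.fc = nn.Linear(model.fc.in_features, 3)  # too basal / LV / too apical

criterion = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimisation step on a batch of labelled slices."""
    optimiser.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimiser.step()
    return loss.item()
```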
Results:
229 images were excluded due to artefact, leaving 3323 images labelled by both strategies. The model trained using labels from the 'multiple-image-ranking' strategy outperformed the model trained using labels from the 'one-image-at-a-time' strategy (accuracy 86% vs 72%, p=0.02; F1-score 0.86 vs 0.75; ROC AUC 0.95 vs 0.86) (Figure 2). For expert clinicians performing this task manually, intra-observer variability was low (Cohen's kappa = 0.90), but inter-observer variability was higher (Cohen's kappa = 0.77).
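For context, the reported metrics can be computed as below with scikit-learn; `y_true` and `y_prob` are hypothetical arrays, and the macro averaging for F1 and one-vs-rest ROC AUC are assumptions, since the abstract does not state the averaging scheme.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             roc_auc_score, cohen_kappa_score)

def evaluate(y_true, y_prob):
    """Evaluate a 3-class model given true labels and class probabilities."""
    y_pred = y_prob.argmax(axis=1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "roc_auc": roc_auc_score(y_true, y_prob,
                                 multi_class="ovr", average="macro"),
    }

# Inter-/intra-observer agreement between two sets of expert labels:
# kappa = cohen_kappa_score(labels_expert_a, labels_expert_b)
```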
Conclusion:
We present proof of concept that, for the same clinician labelling effort, comparing multiple images side by side using a 'multiple-image-ranking' strategy yields more accurate ground-truth labels for deep learning than classifying images individually (Table 1). We demonstrate a potential clinical application: the automatic removal of CMR images that are not required. This increases efficiency by focusing human and machine attention on the images needed to answer clinical questions.
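A minimal sketch of this filtering step, assuming the trained classifier above and an arbitrary class-index ordering (the abstract does not specify one):

```python
import torch

LV_CLASS = 1  # assumed index order: 0 = too basal, 1 = LV, 2 = too apical

@torch.no_grad()
def keep_lv_slices(model, slices):
    """Keep only the slices the model classifies as containing LV
    myocardium; basal and apical slices are discarded automatically."""
    model.eval()
    preds = model(slices).argmax(dim=1)
    return slices[preds == LV_CLASS]
```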