Erik A. Holzwanger, MD1, Mohammad Bilal, MD2, Jeremy R. Glissen Brown, MD2, Shailendra Singh, MD3, Aymeric Becq, MD4, Kenneth Ernest-Suarez, MD5, Tyler M. Berzin, MD2; 1Tufts University Medical Center, Boston, MA; 2Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA; 3West Virginia University, Charleston, WV; 4Saint Antoine Hospital, Paris, France; 5Hospital México, University of Costa Rica, San Jose, Costa Rica
Introduction: An important outcome in most studies of computer-aided colon polyp detection (CADe) is the number of false positive (FP) alarms. However, there is no consensus on how a false positive should be defined for CADe polyp detection, and definitions vary substantially across studies, making it difficult to compare the performance of different CADe systems. We aimed to study the diagnostic performance of CADe during colonoscopy under different threshold definitions for FP alarms.
Methods: A previously validated CADe system for colon polyp detection was applied to 62 colonoscopy videos. Benchmarks for FP alerts were defined by the length of time an alert box was continuously displayed by the system: i) > 0.5 seconds [Group 1], ii) > 1 second [Group 2], and iii) > 2 seconds [Group 3]. Our primary outcome was the variation in the number of FPs across these benchmarks and its impact on the specificity of the CADe system.
Results: A total of 62 colonoscopies were analyzed, with 1635 false positive alerts in total (mean 26.3 FPs per colonoscopy). There were 111 FPs in Group 1, 23 in Group 2, and 3 in Group 3. With an FP threshold of > 0.5 seconds, specificity and accuracy were 93.2% and 97.8%, respectively; with a threshold of > 1 second, 98.6% and 99.5%; and with a threshold of > 2 seconds, 99.8% and 99.9%. Fair or poor bowel preparation was associated with a higher rate of false positive alarms (OR 12.5; 95% CI 2.5-63.3; p = 0.002).
Discussion: Our analysis demonstrates that different threshold definitions for false positives can dramatically change the number of reported FPs and, thus, the perceived diagnostic performance of CADe for colon polyp detection.
We suggest that a consensus benchmark for defining false positives is needed to standardize the interpretation of CADe data in colonoscopy. Defining a false positive as an alert box displayed continuously for > 2 seconds is a reasonable and clinically practical threshold for future clinical studies of CADe.
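The duration-based FP benchmarks described in the Methods can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each alert is recorded as per-frame booleans (alert box shown or not) at a hypothetical frame rate of 30 fps, neither of which the abstract specifies, and it treats every continuous run of displayed frames as one alert without merging brief flickers. The helper names `alert_durations` and `count_fp` are hypothetical.

```python
from itertools import groupby

# Hypothetical frame rate; the abstract does not report one.
FPS = 30.0


def alert_durations(frame_flags, fps=FPS):
    """Return the duration (seconds) of each continuous run of frames
    in which an alert box is displayed (True)."""
    return [
        sum(1 for _ in run) / fps
        for shown, run in groupby(frame_flags)
        if shown
    ]


def count_fp(durations, threshold_s):
    """Count alerts whose continuous display strictly exceeds threshold_s,
    matching the '>' benchmarks (Group 1: > 0.5 s, Group 2: > 1 s, Group 3: > 2 s)."""
    return sum(1 for d in durations if d > threshold_s)


# Toy, made-up input: four false positive alerts of 9, 20, 45, and 75 frames.
flags = (
    [False] * 10 + [True] * 9
    + [False] * 5 + [True] * 20
    + [False] * 5 + [True] * 45
    + [False] * 3 + [True] * 75
    + [False] * 4
)
durations = alert_durations(flags)

for threshold in (0.5, 1.0, 2.0):
    print(f"> {threshold} s: {count_fp(durations, threshold)} FP")
```

With this toy input the four alerts last 0.3, about 0.67, 1.5, and 2.5 seconds, so the Group 1, 2, and 3 definitions count 3, 2, and 1 false positives: the same video yields different FP totals depending only on the chosen threshold.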
Table 1. Patient and Procedural Characteristics
Table 2. Diagnostic Performance of Computer-Aided Detection Using Different Benchmarks for False Positive Alerts
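The abstract does not state the denominator used for specificity. As a hedged arithmetic check, the reported specificities are reproduced to one decimal place if the 1635 false positive alerts are taken as the pool, so that alerts below the threshold count as true negatives and specificity = (1635 - FP) / 1635. This is our assumption, not the authors' stated method; the accuracy values cannot be checked this way because true positive and false negative counts are not reported. The name `specificity` below is a hypothetical helper.

```python
# Counts taken from the abstract's Results.
TOTAL_FP_ALERTS = 1635  # all FP alerts regardless of duration
FP_BY_THRESHOLD_S = {0.5: 111, 1.0: 23, 2.0: 3}  # Groups 1, 2, 3


def specificity(fp, n=TOTAL_FP_ALERTS):
    """Specificity = TN / (TN + FP).

    Assumption (not stated in the abstract): every alert shorter than the
    threshold is counted as a true negative, so TN = n - fp.
    """
    tn = n - fp
    return tn / (tn + fp)


for threshold, fp in FP_BY_THRESHOLD_S.items():
    print(f"> {threshold} s: FP = {fp}, specificity = {100 * specificity(fp):.1f}%")
```

Running it prints specificities of 93.2%, 98.6%, and 99.8% for the three thresholds, matching the values reported in the Results.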
Disclosures: Erik Holzwanger indicated no relevant financial relationships. Mohammad Bilal indicated no relevant financial relationships. Jeremy Glissen Brown indicated no relevant financial relationships. Shailendra Singh indicated no relevant financial relationships. Aymeric Becq indicated no relevant financial relationships. Kenneth Ernest-Suarez indicated no relevant financial relationships. Tyler Berzin: Boston Scientific – Consultant. Fujifilm – Consultant. Medtronic – Consultant. Wision AI – Consultant.