Breast cancer detection accuracy of AI in an entire screening population: a retrospective, multicentre study

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Breast cancer detection accuracy of AI in an entire screening population: a retrospective, multicentre study. / Elhakim, Mohammad Talal; Stougaard, Sarah Wordenskjold; Graumann, Ole; Nielsen, Mads; Lång, Kristina; Gerke, Oke; Larsen, Lisbet Brønsro; Rasmussen, Benjamin Schnack Brandt.

In: Cancer Imaging, Vol. 23, No. 1, 127, 2023.

Harvard

Elhakim, MT, Stougaard, SW, Graumann, O, Nielsen, M, Lång, K, Gerke, O, Larsen, LB & Rasmussen, BSB 2023, 'Breast cancer detection accuracy of AI in an entire screening population: a retrospective, multicentre study', Cancer Imaging, vol. 23, no. 1, 127. https://doi.org/10.1186/s40644-023-00643-x

APA

Elhakim, M. T., Stougaard, S. W., Graumann, O., Nielsen, M., Lång, K., Gerke, O., Larsen, L. B., & Rasmussen, B. S. B. (2023). Breast cancer detection accuracy of AI in an entire screening population: a retrospective, multicentre study. Cancer Imaging, 23(1), [127]. https://doi.org/10.1186/s40644-023-00643-x

Vancouver

Elhakim MT, Stougaard SW, Graumann O, Nielsen M, Lång K, Gerke O et al. Breast cancer detection accuracy of AI in an entire screening population: a retrospective, multicentre study. Cancer Imaging. 2023;23(1). 127. https://doi.org/10.1186/s40644-023-00643-x

Author

Elhakim, Mohammad Talal ; Stougaard, Sarah Wordenskjold ; Graumann, Ole ; Nielsen, Mads ; Lång, Kristina ; Gerke, Oke ; Larsen, Lisbet Brønsro ; Rasmussen, Benjamin Schnack Brandt. / Breast cancer detection accuracy of AI in an entire screening population : a retrospective, multicentre study. In: Cancer Imaging. 2023 ; Vol. 23, No. 1.

Bibtex

@article{b9d08b6ba8644aadb81fe2142aa056a6,
title = "Breast cancer detection accuracy of AI in an entire screening population: a retrospective, multicentre study",
abstract = "Background: Artificial intelligence (AI) systems are proposed as a replacement of the first reader in double reading within mammography screening. We aimed to assess cancer detection accuracy of an AI system in a Danish screening population. Methods: We retrieved a consecutive screening cohort from the Region of Southern Denmark including all participating women between Aug 4, 2014, and August 15, 2018. Screening mammograms were processed by a commercial AI system and detection accuracy was evaluated in two scenarios, Standalone AI and AI-integrated screening replacing first reader, with first reader and double reading with arbitration (combined reading) as comparators, respectively. Two AI-score cut-off points were applied by matching at mean first reader sensitivity (AIsens) and specificity (AIspec). Reference standard was histopathology-proven breast cancer or cancer-free follow-up within 24 months. Coprimary endpoints were sensitivity and specificity, and secondary endpoints were positive predictive value (PPV), negative predictive value (NPV), recall rate, and arbitration rate. Accuracy estimates were calculated using McNemar{\textquoteright}s test or exact binomial test. Results: Out of 272,008 screening mammograms from 158,732 women, 257,671 (94.7%) with adequate image data were included in the final analyses. Sensitivity and specificity were 63.7% (95% CI 61.6%-65.8%) and 97.8% (97.7-97.8%) for first reader, and 73.9% (72.0-75.8%) and 97.9% (97.9-98.0%) for combined reading, respectively. Standalone AIsens showed a lower specificity (-1.3%) and PPV (-6.1%), and a higher recall rate (+ 1.3%) compared to first reader (p < 0.0001 for all), while Standalone AIspec had a lower sensitivity (-5.1%; p < 0.0001), PPV (-1.3%; p = 0.01) and NPV (-0.04%; p = 0.0002). 
Compared to combined reading, Integrated AIsens achieved higher sensitivity (+ 2.3%; p = 0.0004), but lower specificity (-0.6%) and PPV (-3.9%) as well as higher recall rate (+ 0.6%) and arbitration rate (+ 2.2%; p < 0.0001 for all). Integrated AIspec showed no significant difference in any outcome measures apart from a slightly higher arbitration rate (p < 0.0001). Subgroup analyses showed higher detection of interval cancers by Standalone AI and Integrated AI at both thresholds (p < 0.0001 for all) with a varying composition of detected cancers across multiple subgroups of tumour characteristics. Conclusions: Replacing first reader in double reading with an AI could be feasible but choosing an appropriate AI threshold is crucial to maintaining cancer detection accuracy and workload.",
keywords = "Artificial intelligence, Breast cancer, Deep learning, Double reading, Mammography screening",
author = "Elhakim, {Mohammad Talal} and Stougaard, {Sarah Wordenskjold} and Ole Graumann and Mads Nielsen and Kristina L{\aa}ng and Oke Gerke and Larsen, {Lisbet Br{\o}nsro} and Rasmussen, {Benjamin Schnack Brandt}",
note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",
year = "2023",
doi = "10.1186/s40644-023-00643-x",
language = "English",
volume = "23",
journal = "Cancer Imaging",
issn = "1740-5025",
publisher = "BioMed Central Ltd.",
number = "1",
}

RIS

TY - JOUR

T1 - Breast cancer detection accuracy of AI in an entire screening population

T2 - a retrospective, multicentre study

AU - Elhakim, Mohammad Talal

AU - Stougaard, Sarah Wordenskjold

AU - Graumann, Ole

AU - Nielsen, Mads

AU - Lång, Kristina

AU - Gerke, Oke

AU - Larsen, Lisbet Brønsro

AU - Rasmussen, Benjamin Schnack Brandt

N1 - Publisher Copyright: © 2023, The Author(s).

PY - 2023

Y1 - 2023

N2 - Background: Artificial intelligence (AI) systems are proposed as a replacement of the first reader in double reading within mammography screening. We aimed to assess cancer detection accuracy of an AI system in a Danish screening population. Methods: We retrieved a consecutive screening cohort from the Region of Southern Denmark including all participating women between Aug 4, 2014, and August 15, 2018. Screening mammograms were processed by a commercial AI system and detection accuracy was evaluated in two scenarios, Standalone AI and AI-integrated screening replacing first reader, with first reader and double reading with arbitration (combined reading) as comparators, respectively. Two AI-score cut-off points were applied by matching at mean first reader sensitivity (AIsens) and specificity (AIspec). Reference standard was histopathology-proven breast cancer or cancer-free follow-up within 24 months. Coprimary endpoints were sensitivity and specificity, and secondary endpoints were positive predictive value (PPV), negative predictive value (NPV), recall rate, and arbitration rate. Accuracy estimates were calculated using McNemar’s test or exact binomial test. Results: Out of 272,008 screening mammograms from 158,732 women, 257,671 (94.7%) with adequate image data were included in the final analyses. Sensitivity and specificity were 63.7% (95% CI 61.6%-65.8%) and 97.8% (97.7-97.8%) for first reader, and 73.9% (72.0-75.8%) and 97.9% (97.9-98.0%) for combined reading, respectively. Standalone AIsens showed a lower specificity (-1.3%) and PPV (-6.1%), and a higher recall rate (+ 1.3%) compared to first reader (p < 0.0001 for all), while Standalone AIspec had a lower sensitivity (-5.1%; p < 0.0001), PPV (-1.3%; p = 0.01) and NPV (-0.04%; p = 0.0002). 
Compared to combined reading, Integrated AIsens achieved higher sensitivity (+ 2.3%; p = 0.0004), but lower specificity (-0.6%) and PPV (-3.9%) as well as higher recall rate (+ 0.6%) and arbitration rate (+ 2.2%; p < 0.0001 for all). Integrated AIspec showed no significant difference in any outcome measures apart from a slightly higher arbitration rate (p < 0.0001). Subgroup analyses showed higher detection of interval cancers by Standalone AI and Integrated AI at both thresholds (p < 0.0001 for all) with a varying composition of detected cancers across multiple subgroups of tumour characteristics. Conclusions: Replacing first reader in double reading with an AI could be feasible but choosing an appropriate AI threshold is crucial to maintaining cancer detection accuracy and workload.

AB - Background: Artificial intelligence (AI) systems are proposed as a replacement of the first reader in double reading within mammography screening. We aimed to assess cancer detection accuracy of an AI system in a Danish screening population. Methods: We retrieved a consecutive screening cohort from the Region of Southern Denmark including all participating women between Aug 4, 2014, and August 15, 2018. Screening mammograms were processed by a commercial AI system and detection accuracy was evaluated in two scenarios, Standalone AI and AI-integrated screening replacing first reader, with first reader and double reading with arbitration (combined reading) as comparators, respectively. Two AI-score cut-off points were applied by matching at mean first reader sensitivity (AIsens) and specificity (AIspec). Reference standard was histopathology-proven breast cancer or cancer-free follow-up within 24 months. Coprimary endpoints were sensitivity and specificity, and secondary endpoints were positive predictive value (PPV), negative predictive value (NPV), recall rate, and arbitration rate. Accuracy estimates were calculated using McNemar’s test or exact binomial test. Results: Out of 272,008 screening mammograms from 158,732 women, 257,671 (94.7%) with adequate image data were included in the final analyses. Sensitivity and specificity were 63.7% (95% CI 61.6%-65.8%) and 97.8% (97.7-97.8%) for first reader, and 73.9% (72.0-75.8%) and 97.9% (97.9-98.0%) for combined reading, respectively. Standalone AIsens showed a lower specificity (-1.3%) and PPV (-6.1%), and a higher recall rate (+ 1.3%) compared to first reader (p < 0.0001 for all), while Standalone AIspec had a lower sensitivity (-5.1%; p < 0.0001), PPV (-1.3%; p = 0.01) and NPV (-0.04%; p = 0.0002). 
Compared to combined reading, Integrated AIsens achieved higher sensitivity (+ 2.3%; p = 0.0004), but lower specificity (-0.6%) and PPV (-3.9%) as well as higher recall rate (+ 0.6%) and arbitration rate (+ 2.2%; p < 0.0001 for all). Integrated AIspec showed no significant difference in any outcome measures apart from a slightly higher arbitration rate (p < 0.0001). Subgroup analyses showed higher detection of interval cancers by Standalone AI and Integrated AI at both thresholds (p < 0.0001 for all) with a varying composition of detected cancers across multiple subgroups of tumour characteristics. Conclusions: Replacing first reader in double reading with an AI could be feasible but choosing an appropriate AI threshold is crucial to maintaining cancer detection accuracy and workload.

KW - Artificial intelligence

KW - Breast cancer

KW - Deep learning

KW - Double reading

KW - Mammography screening

U2 - 10.1186/s40644-023-00643-x

DO - 10.1186/s40644-023-00643-x

M3 - Journal article

C2 - 38124111

AN - SCOPUS:85180254657

VL - 23

JO - Cancer Imaging

JF - Cancer Imaging

SN - 1740-5025

IS - 1

M1 - 127

ER -

ID: 378184756