Clinical Validation of Artificial Intelligence–Augmented Pathology Diagnosis Demonstrates Significant Gains in Diagnostic Accuracy in Prostate Cancer Detection
Patricia Raciti, MD; Jillian Sue, MS; Juan A. Retamero, MD; Rodrigo Ceballos, MSc; Ran Godrich, MS; Jeremy D. Kunz, MSc; Adam Casson, BS; Dilip Thiagarajan, MS; Zahra Ebrahimzadeh, MSc; Julian Viret, MEng; Donghun Lee, MEng; Peter J. Schüffler, DrSc; George DeMuth, MS; Emre Gulturk, MSc; Christopher Kanan, PhD; Brandon Rothrock, PhD; Jorge Reis-Filho, MD, PhD, FRCPath; David S. Klimstra, MD; Victor Reuter, MD; Thomas J. Fuchs, DrSc
Independent real-world application of a clinical-grade automated prostate cancer detection system
Leonard M da Silva, Emilio M Pereira, Paulo Go Salles, Ran Godrich, Rodrigo Ceballos, Jeremy D Kunz, Adam Casson, Julian Viret, Sarat Chandarlapaty, Carlos Gil Ferreira, Bruno Ferrari, Brandon Rothrock, Patricia Raciti, Victor Reuter, Belma Dogdas, George DeMuth, Jillian Sue, Christopher Kanan, Leo Grady, Thomas J Fuchs, Jorge S Reis-Filho
Keywords: artificial intelligence; deep learning; diagnosis; histopathology; machine learning; prostate cancer; screening
Artificial intelligence (AI)-based systems applied to histopathology whole-slide images have the potential to improve patient care by mitigating the challenges posed by diagnostic variability, histopathology caseload, and the shortage of pathologists. We sought to define the performance of an AI-based automated prostate cancer detection system, Paige Prostate, when applied to independent real-world data. The algorithm was employed to classify slides into two categories: benign (no further review needed) or suspicious (additional histologic and/or immunohistochemical analysis required). We assessed the sensitivity, specificity, positive predictive values (PPVs), and negative predictive values (NPVs) of a local pathologist, two central pathologists, and Paige Prostate in the diagnosis of 600 transrectal ultrasound-guided prostate needle core biopsy regions (‘part-specimens’) from 100 consecutive patients, and ascertained the impact of Paige Prostate on diagnostic accuracy and efficiency. Paige Prostate displayed high sensitivity (0.99; CI 0.96-1.0), NPV (1.0; CI 0.98-1.0), and specificity (0.93; CI 0.90-0.96) at the part-specimen level. At the patient level, Paige Prostate displayed optimal sensitivity (1.0; CI 0.93-1.0) and NPV (1.0; CI 0.91-1.0) at a specificity of 0.78 (CI 0.64-0.89). The 27 part-specimens that Paige Prostate considered suspicious but whose final diagnosis was benign comprised atrophy (n = 14), atrophy and apical prostate tissue (n = 1), apical/benign prostate tissue (n = 9), adenosis (n = 2), and post-atrophic hyperplasia (n = 1). Paige Prostate resulted in the identification of four additional patients whose diagnoses were upgraded from benign/suspicious to malignant. Additionally, this AI-based test provided an estimated 65.5% reduction in diagnostic time for the material analyzed. Given its optimal sensitivity and NPV, Paige Prostate has the potential to be employed for the automated identification of patients whose histologic slides could forgo full histopathologic review. In addition to providing incremental improvements in diagnostic accuracy and efficiency, this AI-based system identified patients whose prostate cancers were not initially diagnosed by three experienced histopathologists.
© 2021 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.
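The four measures reported in this abstract (sensitivity, specificity, PPV, and NPV) all derive from the same 2x2 table of test result against final diagnosis. The sketch below shows the standard definitions only. The counts are hypothetical, chosen for easy arithmetic, and are not the study's data. The names `ConfusionCounts` and `diagnostic_metrics` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of slides by (test result, final diagnosis)."""
    tp: int  # flagged suspicious, final diagnosis malignant
    fp: int  # flagged suspicious, final diagnosis benign
    tn: int  # flagged benign, final diagnosis benign
    fn: int  # flagged benign, final diagnosis malignant


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a row or column of the table is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of the four measures named in the abstract."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # malignant cases that were flagged
        "specificity": _ratio(c.tn, c.tn + c.fp),  # benign cases that were not flagged
        "ppv": _ratio(c.tp, c.tp + c.fp),          # flagged cases that were malignant
        "npv": _ratio(c.tn, c.tn + c.fn),          # unflagged cases that were benign
    }


# Hypothetical counts for illustration only (NOT taken from the study).
counts = ConfusionCounts(tp=90, fp=10, tn=190, fn=10)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A triage tool intended to let benign slides skip full review depends mainly on sensitivity and NPV, which is why the abstract emphasizes those two values.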
An independent assessment of an artificial intelligence system for prostate cancer detection shows strong diagnostic accuracy
Modern Pathology
Sudhir Perincheri, Angelique Wolf Levi, Romulo Celli, Peter Gershkovich, David Rimm, Jon Stanley Morrow, Brandon Rothrock, Patricia Raciti, David Klimstra, John Sinard
Prostate cancer is a leading cause of morbidity and mortality for adult males in the US. The diagnosis of prostate carcinoma is usually made on prostate core needle biopsies obtained through a transrectal approach. These biopsies may account for a significant portion of the pathologists’ workload, yet variability in experience and expertise, as well as pathologist fatigue, may adversely affect the reliability of cancer detection. Machine-learning algorithms are increasingly being developed as tools to aid and improve diagnostic accuracy in anatomic pathology. The Paige Prostate AI-based digital diagnostic is one such tool. It was trained on the digital slide archive of New York’s Memorial Sloan Kettering Cancer Center (MSKCC) and categorizes a prostate biopsy whole-slide image as either “Suspicious” or “Not Suspicious” for prostatic adenocarcinoma. To evaluate the performance of this program on prostate biopsies secured, processed, and independently diagnosed at an unrelated institution, we used Paige Prostate to review 1876 prostate core biopsy whole-slide images (WSIs) from our practice at Yale Medicine. Paige Prostate categorizations were compared to the pathology diagnosis originally rendered on the glass slides for each core biopsy. Each discrepancy between the rendered diagnosis and the categorization by Paige Prostate was manually reviewed by pathologists with specialized genitourinary pathology expertise. Paige Prostate showed a sensitivity of 97.7% and positive predictive value of 97.9%, and a specificity of 99.3% and negative predictive value of 99.2% in identifying core biopsies with cancer in a data set derived from an independent institution. Areas for improvement were identified in Paige Prostate’s handling of poor quality scans. Overall, these results demonstrate the feasibility of porting a machine-learning algorithm to an institution remote from its training set, and highlight the potential of such algorithms as a powerful workflow tool for the evaluation of prostate core biopsies in surgical pathology practices.
Novel artificial intelligence system increases the detection of prostate cancer in whole slide images of core needle biopsies
Modern Pathology
Patricia Raciti, Jillian Sue, Rodrigo Ceballos, Ran Godrich, Jeremy D. Kunz, Supriya Kapur, Victor Reuter, Leo Grady, Christopher Kanan, David S. Klimstra, Thomas J. Fuchs
Prostate cancer (PrCa) is the second most common cancer among men in the United States. The gold standard for detecting PrCa is the examination of prostate needle core biopsies. Diagnosis can be challenging, especially for small, well-differentiated cancers. Recently, machine learning algorithms have been developed for detecting PrCa in whole slide images (WSIs) with high test accuracy. However, the impact of these artificial intelligence systems on pathologic diagnosis is not known. To address this, we investigated how pathologists interact with Paige Prostate Alpha, a state-of-the-art PrCa detection system, in WSIs of prostate needle core biopsies stained with hematoxylin and eosin. Three AP-board certified pathologists assessed 304 anonymized prostate needle core biopsy WSIs in 8 hours. The pathologists classified each WSI as benign or cancerous. After ~4 weeks, pathologists were tasked with re-reviewing each WSI with the aid of Paige Prostate Alpha. For each WSI, Paige Prostate Alpha was used to perform cancer detection and, for WSIs where cancer was detected, the system marked the area where cancer was detected with the highest probability. The original diagnosis for each slide was rendered by genitourinary pathologists and incorporated any ancillary studies requested during the original diagnostic assessment. The pathologists and Paige Prostate Alpha were measured against this ground truth. Without Paige Prostate Alpha, pathologists had an average sensitivity of 74% and an average specificity of 97%. With Paige Prostate Alpha, the average sensitivity for pathologists significantly increased to 90% with no statistically significant change in specificity. With Paige Prostate Alpha, pathologists more often correctly classified smaller, lower grade tumors, and spent less time analyzing each WSI. Future studies will investigate whether a similar benefit is obtained when such a system is used to detect other forms of cancer in a setting that more closely emulates real practice.
Clinical-grade computational pathology using weakly supervised deep learning on whole slide images
Nature Medicine
Gabriele Campanella, Matthew G. Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J. Busam, Edi Brogi, Victor E. Reuter, David S. Klimstra, Thomas J. Fuchs
The development of decision support systems for pathology and their deployment in clinical practice have been hindered by the need for large manually annotated datasets. To overcome this problem, we present a multiple instance learning-based deep learning system that uses only the reported diagnoses as labels for training, thereby avoiding expensive and time-consuming pixel-wise manual annotations. We evaluated this framework at scale on a dataset of 44,732 whole slide images from 15,187 patients without any form of data curation. Tests on prostate cancer, basal cell carcinoma and breast cancer metastases to axillary lymph nodes resulted in areas under the curve above 0.98 for all cancer types. Its clinical application would allow pathologists to exclude 65–75% of slides while retaining 100% sensitivity. Our results show that this system has the ability to train accurate classification models at unprecedented scale, laying the foundation for the deployment of computational decision support systems in clinical practice.
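Two ideas in this abstract can be sketched in a few lines: slide-level scoring from tile-level predictions under multiple instance learning, and the choice of an operating point that retains 100% sensitivity while excluding as many slides as possible. The abstract does not describe how tile predictions are aggregated, so the sketch assumes plain max pooling, which is the standard multiple instance assumption (a slide is positive if at least one tile is positive). All tile probabilities and labels are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
from typing import List, Sequence, Tuple


def slide_score(tile_probs: Sequence[float]) -> float:
    """Max pooling (an assumption): the slide is as suspicious as its most suspicious tile."""
    return max(tile_probs)


def exclusion_at_full_sensitivity(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (threshold, fraction of slides excluded) with no positive slide missed.

    labels: 1 = cancer reported, 0 = benign. Slides scoring below the
    threshold are excluded from review; the threshold is the lowest score
    of any positive slide, so every positive slide stays at or above it.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not positive_scores:
        raise ValueError("at least one positive slide is needed to set the threshold")
    threshold = min(positive_scores)
    excluded = sum(1 for s in scores if s < threshold)
    return threshold, excluded / len(scores)


# Hypothetical tile probabilities per slide (NOT data from the study).
slides: List[List[float]] = [
    [0.10, 0.20],  # benign
    [0.05, 0.90],  # cancer
    [0.30, 0.10],  # benign
    [0.60, 0.20],  # cancer
    [0.02, 0.04],  # benign
    [0.40, 0.80],  # cancer
    [0.70, 0.10],  # benign, but scores above the threshold (a false positive)
    [0.01, 0.03],  # benign
]
labels = [0, 1, 0, 1, 0, 1, 0, 1 - 1]

scores = [slide_score(tiles) for tiles in slides]
threshold, excluded_fraction = exclusion_at_full_sensitivity(scores, labels)
print(f"threshold={threshold:.2f}, excluded={excluded_fraction:.0%}")
```

In this toy data, half of the slides fall below the threshold and could be set aside without missing a cancer. A threshold fitted this way is only guaranteed for the data it was chosen on, so it would need validation on held-out slides.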