Blog

January 22, 2024

The Virchow Foundation Model, Explained: A Q&A with an AI Scientist

On January 18, 2024, Paige released new performance results for Virchow. Some of these results are detailed below; the full set can be read on arXiv.


In the fall of 2023, Paige announced its collaboration with Microsoft Research to build the world’s largest image-based AI model to fight cancer. True to that commitment, Paige promptly delivered, releasing results for Virchow, its million-slide digital pathology Foundation Model.¹ In the few weeks since the initial results were made public, substantial advancements have already been made in enhancing the model’s capabilities.

To shed light on Paige’s unique approach to this groundbreaking version of the Foundation Model, we sat down with Siqi Liu, Director of AI Science, to gain a deeper understanding of Paige’s methodology and its implications for the future of AI in cancer diagnosis and treatment.

Q: Can you provide an overview of the updates made to the Foundation Model, Virchow?

A: We are thrilled to introduce Virchow V1, a major upgrade to our Foundation Model, now fully trained on 1.5 million H&E-stained slides, yielding superior task performance. The model produces detailed tile embeddings from whole slide images (WSIs), offering a rich, data-driven foundation for a broad array of digital pathology applications. These embeddings can be seen as intricate digital fingerprints of tissue tiles, capturing unique histological features that empower pathologists with advanced insights for diagnosis, research, and tailored patient care.
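
To make the idea of tile embeddings concrete, below is a minimal sketch of how embeddings might be extracted from a WSI. It uses openslide for slide I/O and a generic pretrained ViT from the timm library as a stand-in for the Virchow encoder; the tile size, pyramid level, and preprocessing are illustrative assumptions, not Paige’s actual pipeline.

    # Minimal sketch: extract tile embeddings from a whole slide image.
    # Assumptions (not Paige's pipeline): openslide for WSI I/O and a generic
    # timm ViT standing in for the Virchow encoder; tile size, pyramid level,
    # and normalization constants are illustrative placeholders.
    import openslide
    import timm
    import torch
    from torchvision import transforms

    TILE = 224  # tile edge length in pixels (illustrative)

    encoder = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
    encoder.eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def tile_embeddings(wsi_path: str) -> torch.Tensor:
        """Slide a TILE x TILE window over level 0 of the WSI and embed each tile."""
        slide = openslide.OpenSlide(wsi_path)
        width, height = slide.level_dimensions[0]
        feats = []
        with torch.no_grad():
            for y in range(0, height - TILE + 1, TILE):
                for x in range(0, width - TILE + 1, TILE):
                    tile = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
                    feats.append(encoder(preprocess(tile).unsqueeze(0)))
        return torch.cat(feats)  # (num_tiles, embed_dim): one "fingerprint" per tile

In practice a pipeline like this would also filter out background tiles before embedding; the sketch omits that step for brevity.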

The updated model represents a true breakthrough in computational pathology for several reasons:

First, the novel Virchow embeddings have enabled us to develop a pan-tumor detection system capable of identifying cancer across various organ types, one of the first of its kind in pathology. Our findings show that the Virchow embeddings surpass baseline models in accuracy, with especially strong performance in the detection of rare cancers.

Additionally, our updated model leads in tile-level benchmarks, surpassing both baselines and its predecessor. This includes a range of public and internally curated pan-tissue benchmarks, reinforcing our model’s robustness and versatility.

Finally, our efforts have also advanced the frontiers of predictive analytics, as the model exhibits extraordinary precision in pinpointing digital biomarkers, a testament to the potential of AI and machine learning in enhancing diagnostic methodologies.

Q: What are some of the early results of the pan-tumor model? What does this mean for the future of AI in cancer detection?

A: Through well-designed experiments leveraging high-quality clinical data, our pan-tumor model—powered by Virchow’s embeddings—has been found to excel in detecting a broad spectrum of cancers. It shows a specimen-level AUC of 0.95 for common cancers and 0.93 for rare cancers occurring in fewer than 40,000 cases annually in the US.
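
As a rough illustration of how a specimen-level AUC like this is computed from tile embeddings, here is a toy sketch using scikit-learn. Mean pooling followed by logistic regression is a deliberately simplified stand-in for the actual pan-tumor aggregator, which is not described here, and the data are synthetic.

    # Toy sketch: specimen-level cancer detection from bags of tile embeddings,
    # scored with AUC. Mean pooling + logistic regression is a simplified
    # stand-in for the actual pan-tumor aggregator; all data are synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Each specimen is a bag of tile embeddings with shape (num_tiles, embed_dim);
    # label 1 marks a cancerous specimen, 0 a benign one.
    labels = rng.integers(0, 2, size=100)
    specimens = [rng.normal(loc=y, size=(rng.integers(50, 200), 64)) for y in labels]

    # Aggregate each bag into a single specimen-level feature vector.
    X = np.stack([bag.mean(axis=0) for bag in specimens])

    clf = LogisticRegression(max_iter=1000).fit(X[:80], labels[:80])
    scores = clf.predict_proba(X[80:])[:, 1]
    print("specimen-level AUC:", roc_auc_score(labels[80:], scores))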

The pan-tumor model’s effectiveness soundly demonstrates the Foundation Model’s strength, bolstering our confidence in a unified AI-driven approach and setting the stage for continued development of tools that can support pathologists in critical areas, like rare tumor detection, that have previously lacked AI support. The Virchow model also enables the simultaneous development of multiple AI applications, encompassing cancer detection, grading, subtyping, measurement, quantification, segmentation, and more. This will significantly enhance Paige’s ability to develop robust product suites across many tissue types, offering comprehensive support to pathologists throughout their clinical workflow.

Q: Why were other applications not previously able to effectively identify rare cancers? How has the Foundation Model solved this issue?

A: Traditional machine learning models often struggle with rare cancers due to limited data, impeding the accurate pattern recognition necessary for diagnosis. Our Foundation Model overcomes these challenges by utilizing a diverse, large-scale dataset that enables the model to learn from a vast array of tissue types, understand cancer morphology, and apply this knowledge to effectively discern the tissue patterns of rare cancers despite data scarcity.

Q: Why is training on a million-slide dataset crucial for advancing digital pathology imaging?

A: Training on a massive, million-slide dataset is vital for digital pathology to create algorithms that are precise and universally applicable. Such extensive data encompasses the diverse and complex scenarios encountered in clinical practice, ensuring that the algorithms are not only empirically robust but also clinically valid.

Real-world clinical case data is ideal for Foundation Models as it includes the subtle variations and intricate patterns necessary for accurate diagnostics, especially for rare conditions often underrepresented in smaller datasets. This equips the model to perform effectively in real-world medical settings, leading to improved patient care. The dataset also includes common artifacts seen on whole slide images, such as cracks, bubbles, dust, pen markings, and variations in slide preparation and staining reagents.

For example, powered by robust clinical data, the new pan-tumor model can recognize cancers that pathologists, especially in smaller hospitals around the globe, may not have seen before, ensuring that these cancers are not overlooked and that all patients have access to the best possible care.

Q: How is the Virchow model tailored to meet the demands of real-world digital pathology applications?

A: The model employs a Vision Transformer (ViT-H), striking a strategic balance between model representation power and computational cost. This makes it both powerful enough to process complex pathology data and cost-effective for widespread use in real clinical environments, thus aligning with the practical needs of digital pathology products.
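
To illustrate that tradeoff, the sketch below compares parameter counts across standard ViT sizes as implemented in the timm library; these are generic configurations used for illustration, not Virchow’s exact architecture or training setup.

    # Sketch of the capacity/cost tradeoff behind choosing a ViT-H backbone:
    # parameter counts across standard timm ViT configurations. These generic
    # configs are for illustration only, not Virchow's exact architecture.
    import timm

    for name in ["vit_base_patch16_224", "vit_large_patch16_224", "vit_huge_patch14_224"]:
        model = timm.create_model(name, pretrained=False, num_classes=0)
        n_params = sum(p.numel() for p in model.parameters())
        print(f"{name}: {n_params / 1e6:.0f}M parameters")

Roughly speaking, ViT-B has about 86M parameters, ViT-L about 304M, and ViT-H about 632M; each step up buys richer representations at a higher inference cost, which is the balance described above.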

Q: What makes Virchow V1 the state-of-the-art Foundation Model for computational pathology?

A: When we describe the Virchow model as ‘state-of-the-art,’ we’re referring to:

  • Its exceptional performance across a spectrum of benchmark tasks specifically selected for their relevance to practical digital pathology
  • Its use of the most advanced computer vision and AI technologies to date, tailored for computational pathology

This superior level of performance indicates that products developed using our Foundation Model are poised to deliver greater clinical impact. By integrating the latest advancements in AI and machine learning, these products can enhance diagnostic precision, accelerate pathology workflows, and support faster pharmaceutical research. The term ‘state-of-the-art’ reflects not just our current technological excellence but also the potential to transform future practices in digital pathology, helping to improve patient outcomes and unlock more efficient healthcare delivery systems.

Q: What’s next for Virchow?

A: We plan to broaden our benchmark suite, enhancing the testing of the Virchow V1 model across a wider spectrum of computational pathology applications. This expansion will yield deeper insights for ongoing improvements and pinpoint how Virchow’s embeddings can be leveraged to enhance existing and develop new digital pathology AI applications.

In parallel, we’re excited to advance the development of Virchow V2. In collaboration with the exceptional team at Microsoft Research, our focus will be twofold:

Firstly, we aim to significantly enlarge the training dataset and model size while also extending our capabilities to include a broader spectrum of stains beyond H&E.

Secondly, we’re dedicated to refining our training methodologies to establish the most effective strategies for model research and development.

These two goals will work in tandem to evolve the concept of a Foundation Model in digital pathology and oncology more broadly. With cutting-edge algorithm research, Virchow V2 is set to surpass the current capabilities of Virchow V1, paving the way for new innovations in the field that will positively impact patients.

In advancing our roadmap, we aim to complement the collective efforts in the academic field and the industry. Our goal with the Virchow V1 model is to highlight the significant potential of scaling in computational pathology. We are committed to collaborating and sharing our findings with the broader research community to collectively push the boundaries of the field.


To learn more about Paige’s Foundation Model and the published results, view our publication on arXiv.

¹Vorontsov E, Bozkurt A, Casson A, et al. Virchow: A Million-Slide Digital Pathology Foundation Model. arXiv. Updated preprint posted online January 18, 2024. arXiv:2309.07778v5