Blog

January 22, 2024

The Virchow Foundation Model, Explained: A Q&A with an AI Scientist

On January 18, 2024, Paige released new performance results for Virchow. Some of these results are detailed below; they can be read in full on arXiv.


In the fall of 2023, Paige announced its collaboration with Microsoft Research to build the world’s largest image-based AI model to fight cancer. True to that commitment, Paige promptly delivered, releasing results for its million-slide digital pathology Foundation Model, Virchow1. In the weeks since the initial results were made public1, substantial advances have already been made in enhancing the model’s capabilities.

To shed light on Paige’s unique approach to this groundbreaking version of the Foundation Model, we sat down with Siqi Liu, Director of AI Science, to gain a deeper understanding of Paige’s methodology and its implications for the future of AI in cancer diagnosis and treatment.

Q: Can you provide an overview of the updates made to the Foundation Model, Virchow?

A: We are thrilled to introduce Virchow V1, a major upgrade to our Foundation Model, now fully trained on 1.5 million H&E-stained slides for superior task performance. The model produces detailed tile embeddings from whole slide images (WSIs), offering a rich, data-driven foundation for a broad array of digital pathology applications. These embeddings can be seen as intricate digital fingerprints of tissue tiles, capturing unique histological features that empower pathologists with advanced insights for diagnosis, research, and tailored patient care.
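
To make the idea of tile embeddings concrete, here is a minimal sketch of the general workflow: tiles are cut from a WSI, normalized, and passed through a frozen vision transformer backbone to produce one feature vector per tile. The backbone, tile tensors, and dimensions below are illustrative stand-ins (a generic ViT-H from the timm library), not the actual Virchow weights or preprocessing.

```python
# Minimal sketch: turning WSI tiles into embedding vectors with a frozen
# ViT-H backbone. The backbone is a generic timm ViT-H stand-in, NOT the
# Virchow checkpoint; the tile tensors are random placeholders.
import torch
import timm

backbone = timm.create_model("vit_huge_patch14_224", pretrained=False, num_classes=0)
backbone.eval()

def embed_tiles(tiles: torch.Tensor) -> torch.Tensor:
    """tiles: (N, 3, 224, 224) normalized H&E tiles -> (N, D) embeddings."""
    with torch.no_grad():
        return backbone(tiles)  # num_classes=0 yields pooled feature vectors

tiles = torch.randn(8, 3, 224, 224)  # stand-in for real WSI tiles
embeddings = embed_tiles(tiles)
print(embeddings.shape)              # torch.Size([8, 1280]) for ViT-H
```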

The updated model represents a true breakthrough in computational pathology for several reasons:

First, the novel Virchow embeddings have enabled us to develop a pan-tumor detection system capable of identifying cancer across a wide range of organ types, one of the first systems of its kind in pathology. Our findings demonstrate that the Virchow embeddings surpass baseline models in accuracy, with especially strong performance in the detection of rare cancers.
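
The post does not describe the pan-tumor system’s architecture, but conceptually a specimen-level detector sits on top of the frozen tile embeddings. The sketch below uses a deliberately simple stand-in, mean pooling plus a linear head; production systems typically use learned attention-based aggregation instead.

```python
# Hypothetical specimen-level detector over frozen tile embeddings.
# Mean pooling plus a linear head is a simplification; the actual
# Virchow pan-tumor aggregator is not described in this post.
import torch
import torch.nn as nn

class SpecimenClassifier(nn.Module):
    def __init__(self, embed_dim: int = 1280):
        super().__init__()
        self.head = nn.Linear(embed_dim, 1)   # cancer-vs-benign logit

    def forward(self, tile_embeddings: torch.Tensor) -> torch.Tensor:
        # tile_embeddings: (num_tiles, embed_dim) for one specimen
        pooled = tile_embeddings.mean(dim=0)  # aggregate tiles into one vector
        return self.head(pooled)

clf = SpecimenClassifier()
prob = torch.sigmoid(clf(torch.randn(500, 1280)))  # probability of cancer
print(float(prob))
```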

Additionally, our updated model leads in tile-level benchmarks, surpassing both baselines and its predecessor. This includes a range of public and internally curated pan-tissue benchmarks, reinforcing our model’s robustness and versatility.
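
A common protocol for tile-level benchmarks of foundation models (the post does not specify the exact one used here) is linear probing: a simple classifier is fit on frozen embeddings, so the score reflects the quality of the representation itself rather than any fine-tuning. A sketch with synthetic data:

```python
# Linear-probe sketch for a tile-level benchmark: fit a logistic
# regression on frozen tile embeddings. The data is synthetic, so the
# printed accuracy only demonstrates the flow, not real performance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 1280)), rng.integers(0, 2, 1000)
X_test, y_test = rng.normal(size=(200, 1280)), rng.integers(0, 2, 200)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("tile-level accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```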

Finally, our efforts have also advanced the frontiers of predictive analytics, as the model exhibits extraordinary precision in pinpointing digital biomarkers, a testament to the potential of AI and machine learning in enhancing diagnostic methodologies.

Q: What are some of the early results of the pan-tumor model? What does this mean for the future of AI in cancer detection?

A: In well-designed experiments leveraging high-quality clinical data, our pan-tumor model, powered by Virchow’s embeddings, excels at detecting a broad spectrum of cancers. It achieves a specimen-level AUC of 0.95 for common cancers and 0.93 for rare cancers, defined as those occurring in fewer than 40,000 cases annually in the US.
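
For readers who want to run this kind of evaluation on their own data, the reported figures correspond to ROC AUC computed at the specimen level, stratified by cancer rarity. A sketch with placeholder scores and labels:

```python
# Specimen-level ROC AUC per rarity group, mirroring how the reported
# 0.95 / 0.93 figures are structured. All data here is synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 300)                    # 1 = cancer present
scores = labels * 0.6 + rng.normal(0.2, 0.3, 300)   # model output scores
is_rare = rng.integers(0, 2, 300).astype(bool)      # rare-cancer specimens

for group, mask in [("common", ~is_rare), ("rare", is_rare)]:
    print(group, "AUC:", round(roc_auc_score(labels[mask], scores[mask]), 3))
```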

The pan-tumor model’s effectiveness soundly demonstrates the Foundation Model’s strength, bolstering our confidence in a unified AI-driven approach and setting the stage for continued development of tools that can support pathologists in critical areas, like rare tumor detection, that have previously lacked AI support. The Virchow model also enables the simultaneous development of multiple AI applications, encompassing cancer detection, grading, subtyping, measurement, quantification, and segmentation. This will significantly enhance Paige’s ability to develop robust product suites across many tissue types, offering comprehensive support to pathologists throughout their clinical workflow.

Q: Why were other applications not previously able to effectively identify rare cancers? How has the Foundation Model solved this issue?

A: Traditional machine learning models often struggle with rare cancers because limited data impedes the pattern recognition necessary for accurate diagnosis. Our Foundation Model overcomes this challenge by training on a diverse, large-scale dataset, which enables it to learn from a vast array of tissue types, understand cancer morphology, and apply that knowledge to discern the tissue patterns of rare cancers despite data scarcity.

Q: Why is training on a million-slide dataset crucial for advancing digital pathology imaging?

A: Training on a massive, million-slide dataset is vital for creating digital pathology algorithms that are both precise and universally applicable. Such extensive data encompasses the diverse and complex scenarios encountered in clinical practice, ensuring that the algorithms are not only empirically robust but also clinically valid.

Real-world clinical case data is ideal for Foundation Models because it includes the subtle variations and intricate patterns necessary for accurate diagnostics, especially for rare conditions often underrepresented in smaller datasets. This equips the model to perform effectively in real-world medical settings, leading to improved patient care. The dataset also includes common artifacts seen on WSIs, such as cracks, bubbles, dust, pen markings, and variations in slide preparation and staining reagents.

For example, powered by robust clinical data, the new pan-tumor model can recognize cancers that pathologists, especially in smaller hospitals around the globe, may not have seen before, ensuring that these cancers are not overlooked and that all patients have access to the best possible care.

Q: How is the Virchow model tailored to meet the demands of real-world digital pathology applications?

A: The model employs a Vision Transformer (ViT-H), striking a strategic balance between representational power and computational cost. This makes it both powerful enough to process complex pathology data and cost-effective enough for widespread use in real clinical environments, aligning with the practical needs of digital pathology products.
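
To give a rough sense of that capacity-versus-cost tradeoff, one can compare parameter counts across standard ViT variants. These are generic architecture definitions from the timm library, not Virchow’s weights:

```python
# Comparing parameter counts of standard ViT variants with timm, to
# illustrate why ViT-H sits between raw capacity and deployability.
# Generic architecture definitions only; no Virchow weights involved.
import timm

for name in ("vit_base_patch16_224", "vit_large_patch16_224", "vit_huge_patch14_224"):
    model = timm.create_model(name, pretrained=False, num_classes=0)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```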

Q: What makes Virchow V1 the state-of-the-art Foundation Model for computational pathology?

A: When we describe the Virchow model as ‘state-of-the-art,’ we’re referring to:

  • Its exceptional performance across a spectrum of benchmark tasks specifically selected for their relevance to practical digital pathology
  • Its use of the most advanced computer vision and AI technologies to date, tailored for computational pathology

This superior level of performance indicates that products developed using our Foundation Model are poised to deliver greater clinical impact. By integrating the latest advancements in AI and machine learning, these products can enhance diagnostic precision, accelerate pathology workflows, and support faster pharmaceutical research. The term ‘state-of-the-art’ reflects not just our current technological excellence but also the potential to transform future practices in digital pathology, helping to improve patient outcomes and unlock more efficient healthcare delivery systems.

Q: What’s next for Virchow?

A: We plan to broaden our benchmark suite, enhancing the testing of the Virchow V1 model across a wider spectrum of computational pathology applications. This expansion will yield deeper insights for ongoing improvements and pinpoint how Virchow’s embeddings can be leveraged to enhance existing digital pathology AI applications and to develop new ones.

In parallel, we’re excited to advance the development of Virchow V2. In collaboration with the exceptional team at Microsoft Research, our focus will be twofold:

Firstly, we aim to significantly enlarge the training dataset and model size while also extending our capabilities to include a broader spectrum of stains beyond H&E.

Secondly, we’re dedicated to refining our training methodologies to establish the most effective strategies for model research and development.

These two goals will work in tandem to evolve the concept of a Foundation Model in digital pathology and oncology more broadly. With cutting-edge algorithm research, Virchow V2 is set to surpass the current capabilities of Virchow V1, paving the way for new innovations in the field that will positively impact patients.

In advancing our roadmap, we aim to complement the collective efforts in the academic field and the industry. Our goal with the Virchow V1 model is to highlight the significant potential of scaling in computational pathology. We are committed to collaborating and sharing our findings with the broader research community to collectively push the boundaries of the field.


To learn more about Paige’s Foundation Model and the published results, view our publication on arXiv.

1Vorontsov E, Bozkurt A, Casson A, et al. Virchow: A Million-Slide Digital Pathology Foundation Model. arXiv. Updated preprint posted online January 18, 2024. arXiv:2309.07778v5