November 29, 2023

How Foundation Models Can Transform Pathology

A Q&A with AI scientists working in the field of pathology

Paige was founded with the mission of revolutionizing cancer care. To do so, it is critical that we dedicate ourselves to tireless innovation and pushing the boundaries of what is possible with digital pathology technology.

Leading this charge are Brandon Rothrock, PhD, an expert in computer vision, machine learning, robotics, and autonomous systems, and Siqi Liu, PhD, a leader in industrial-scale machine learning and medical data analysis. Together, they have been pioneering our latest work on foundation model development.

We sat down with Brandon and Siqi to learn more about what foundation models are, how their development might impact pathology, and what this work means for the future of cancer care:

What is challenging about building pathology AI today?

First, AI systems for detecting cancer require extensive data to generalize accurately across organs and cancer subtypes, and that data is difficult to gather and process. Second, ensuring the accuracy of the ground-truth data used for training is challenging at large scale. Third, the subtle and diverse patterns in histopathology that are crucial for cancer detection demand advanced algorithms for precise recognition. As we consider how to build systems for increasingly rare cancers, the burden of sourcing sufficient samples exacerbates these challenges and quickly becomes infeasible. The lack of available data for developing models of rare biomarkers or drug response makes the problem even more dire.

What are foundation models and how can they be applied to pathology?

"Foundation model" is a general term for a large-scale model trained on expansive and diverse data. Typically, these models are trained using self-supervision and do not need ground-truth labels, yet they can be effectively adapted to specific applications using a relatively small volume of labeled data. This presents a direct solution to the data cliff problem, where labeled data for rare cancers, biomarkers, or drug response is too scarce to train a dedicated model from scratch.
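
To make the adaptation step concrete, here is a minimal sketch of fitting a small classifier on top of a frozen encoder using only a few labeled examples (often called a linear probe). Everything in it is an illustrative assumption rather than Paige's method: `frozen_encoder` is a stand-in random projection, not the Virchow model, and the tiles and labels are synthetic.

```python
import numpy as np

def frozen_encoder(tiles, weights):
    # Stand-in for a pretrained encoder: a fixed nonlinear projection.
    # The weights are never updated during adaptation.
    return np.tanh(tiles @ weights)

def log_loss(features, labels, w, b):
    # Mean binary cross-entropy, written as log(1 + exp(z)) - y * z.
    logits = features @ w + b
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))

def train_linear_probe(features, labels, lr=0.1, steps=300):
    # Logistic regression on frozen features, fit by plain gradient descent.
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        err = probs - labels
        w = w - lr * (features.T @ err) / n
        b = b - lr * err.mean()
    return w, b

rng = np.random.default_rng(0)
encoder_weights = rng.normal(scale=0.1, size=(64, 16))  # hypothetical "pretrained" weights
frozen_copy = encoder_weights.copy()

# A deliberately small labeled set: 40 synthetic "tiles" with binary labels.
tiles = rng.normal(size=(40, 64))
features = frozen_encoder(tiles, encoder_weights)
hidden_direction = rng.normal(size=16)
labels = (features @ hidden_direction > 0).astype(float)

loss_before = log_loss(features, labels, np.zeros(16), 0.0)
w, b = train_linear_probe(features, labels)
loss_after = log_loss(features, labels, w, b)
accuracy = float(np.mean((features @ w + b > 0) == (labels == 1.0)))
print(f"loss {loss_before:.3f} -> {loss_after:.3f}, training accuracy {accuracy:.2f}")
```

Only the 17 probe parameters (`w` and `b`) are learned here; the encoder stays untouched, which is why a small labeled set can be enough.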

The foundation model being developed at Paige is a large-scale model trained on the natural distribution of all cases handled at one of the world’s leading cancer centers. This dataset, comprising millions of slides, spans all tissue types and cancer conditions. Although the model is still under development, we have already demonstrated excellent performance on a wide range of tasks.

Are there any challenges with the foundation model approach?

Yes, challenges still exist in developing foundation models for pathology. One of the most prominent is handling extreme data imbalance. The natural prevalence of cancer types follows a long-tail distribution: a few common cancers account for the majority of cases, while many rare cancers each contribute very few. Furthermore, only a small number of slides within a case, or potentially only a small focus within a single slide, may contain the cancer of interest. The challenge is to design AI algorithms and training procedures that learn to differentiate the histologic patterns of interest, particularly when those patterns have no ground-truth labels and may be swamped by an overwhelming number of confounding patterns.
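
As a rough illustration of the imbalance problem, the sketch below computes inverse-frequency sampling weights so that each cancer type is drawn equally often during training. This is one common mitigation and not necessarily the approach used at Paige; the case counts are invented, and the sketch assumes a coarse case-level type label (for example from case metadata) is available, which the interview does not state.

```python
from collections import Counter

def sampling_weights(case_labels):
    # Give every type the same total probability mass, split evenly
    # across the cases of that type.
    counts = Counter(case_labels)
    n_types = len(counts)
    return [1.0 / (n_types * counts[label]) for label in case_labels]

# Hypothetical long-tailed case mix: two common types and three rare ones.
case_labels = (
    ["common_a"] * 60
    + ["common_b"] * 30
    + ["rare_a"] * 6
    + ["rare_b"] * 3
    + ["rare_c"] * 1
)
weights = sampling_weights(case_labels)

# Total sampling mass per type after reweighting.
mass = Counter()
for label, weight in zip(case_labels, weights):
    mass[label] += weight

for label in sorted(mass):
    print(label, round(mass[label], 3))
```

Without reweighting, `rare_c` would appear in 1% of samples; with these weights every type is equally likely to be drawn. The trade-off is that the few rare cases are repeated many times, so reweighting alone cannot substitute for more data.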

What is unique about the foundation model Paige is currently developing?

The Paige Virchow model marks a revolutionary advance in AI for cancer detection. It is the first model to be trained on a million-scale dataset, a feat that required substantial compute resources and engineering effort and that sets a new benchmark in the field. Further, the model employs cutting-edge self-supervised learning algorithms, eliminating the need for manually annotated ground truth. This approach allows for greater scalability and adaptability in training. The model also includes adjustments tailored to the unique characteristics of digital pathology images, enhancing its effectiveness in this specialized area.

How will Paige’s foundation model bring new capabilities to pathology?

Our foundation models in pathology are currently used both to improve our existing models and to develop new diagnostic and biomarker AI algorithms. The ability to rapidly adapt the foundation model to new tasks while maintaining a high level of performance and generalization allows us to build mature models much faster. Paige is also investigating exposing this AI development workflow externally, allowing third parties to rapidly develop new AI systems using our foundation model in a manner that preserves data privacy and still retains the advantages of large-scale pre-training.

Beyond building conventional diagnostic and biomarker detection capabilities, there are many novel and exciting applications of a foundation model in pathology that are currently speculative but have the potential to be transformative. These could include allowing pathologists and scientists to interact with the system in a natural way to explain a prediction, cooperatively discover new insights, or automatically advise on or create reports and analytics.

How do we develop these models safely and responsibly?

As with any AI model for healthcare, safety and responsibility are paramount to realizing such benefits. We address the safety and reliability of our foundation model in at least three key ways. First, we do not expose the foundation model directly to the user. Second, we rely on conventional testing and validation for the safety of our AI systems. Third, the data used for development is innately objective. Unlike generative AI systems developed in other industries, which rely heavily on innately biased human-generated data such as natural language or human feedback, our foundation model learns directly from histology imagery that is free from this influence. As such, we can rely on established and mature methods for dealing with bias and performance qualification.

The current version of the Virchow foundation model Paige has created has already marked a turning point in pathology AI. Now, we are continuing to develop this model to go above and beyond what is possible in pathology today. As we accelerate pathology AI technology further, we hope to unlock novel capabilities that can redefine the way pathologists work and ultimately impact patient care for the better.
