Scientific Approach - Human Immunome Project

Decoding the Immune System

The Human Immunome Project (HIP) is overseeing the generation of the largest immunological dataset in history and will use these data to build AI models of the immune system that can accelerate medical research and drug discovery, decrease healthcare costs worldwide, and most importantly, improve health for all.

Our first five years will focus on building an immunological dataset, representative of the diversity of humanity and immune responses, that will power our AI models.

Background

The human immune system is a dynamic, multi-scale network that involves uniquely complex interactions between molecules, cells, and organs. It monitors for, reacts to, and defends the body against invading pathogens and other stressors and therefore plays a critical role in maintaining human health.

Despite tremendous advances in applying multi-omics approaches and immune monitoring technologies, immunological data remain limited in both size and scope. As a result, less than one percent of the immunological data necessary to understand immune diversity on a global scale are available. These gaps prevent us from harnessing the power of the immune system to improve human health.

Our Plan to Generate the World’s Largest Immunological Dataset

The Human Immunome Project is focused on generating multi-omic immunological baseline data that reflect the diverse human population across age, geography, sex, and socioeconomic status. Data generation will occur in phases.

Phase I: Begin Data Generation and Set Stage for Scale

Three Goals:

1. Establish and Collect Data at Pilot Sites

We are establishing up to ten state-of-the-art collection sites in Africa, Australia, East Asia, Europe, the Middle East, North America, and South America. Each of these “Pilot Sites” will collect extensive, high-resolution data from approximately 5,000 participants over a five-year period.

Blood and saliva samples will be taken every six months. For the first three years, all participants will receive an annual flu vaccine, with samples taken immediately before vaccination and again 24 hours, seven days, and 28 days after vaccination. These samples are in addition to the standard semiannual samples taken from all participants. After the initial three-year period, vaccinations will be provided to half of the participants. Data collected will include thousands of immunological parameters (cells, transcripts, antibodies, proteins, epigenetic modifications, and metabolites) from each individual, a selection of which are outlined in the table below.

Biological Entity Measured	Assay	Benefit
Cells (abundance)	Multi-parameter flow cytometry	Cells are the atomic unit of the immune system. Captures the abundance of each cell type.
Cells (state)	CITE-seq: simultaneous single-cell protein and mRNA measurement	Different cells have different functions. Defines cell function through high-dimensional mapping of mRNA transcript state. Identifies new cell types and states.
Cells (specificity)	Simultaneous single-cell TCR and BCR sequencing with mRNA	Immune cells can build specific recognition programs based on past encounters. Maps the diversity of these specificity programs and their strength (clonality) for each cell.
Serum proteins	Multiplex targeted measurement of circulating blood proteins	Maps immune cell communication signals.
mRNA (gene expression)	Bulk RNA-seq	Measures the average expression level of all genes across all sampled cells from a person. Robust and relatively cheap.
Metabolites	Mass spectrometry	Cell function requires energy. Approximates cell activity from the concentration of small molecules (metabolites).
DNA (human nuclear + mitochondrial)	Whole-genome sequencing	Associates phenotypic variance with underlying genomic variance.
Cell-free chromatin	Chromatin state in cell-free DNA in serum	Provides information on cell states in internal organs from cell remnants in blood.
Stool microbiome	16S rRNA sequencing + metagenomics	Identifies relationships between the gut microbiome and immune cell state variation.

These data will allow us to assess developmental and age-dependent changes and to computationally stitch together the temporal trajectories of different age groups of both males and females worldwide. Additionally, they will enable us to conduct high-resolution measurements of major immune cell frequencies; conduct single-cell analyses of proteins, transcripts, immune repertoire, and chromatin accessibility; and measure circulating protein and metabolite levels, as well as cell-free DNA, and track how these change over time and among different populations.
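As an illustration only, the Pilot Site sampling schedule described in this section can be sketched as a list of visit days per participant. The 365-day year, the placement of vaccination on day 90 of each study year, and the names `Visit` and `sampling_schedule` are assumptions made for this sketch; the protocol text does not specify them, and it does not say how sampling around vaccination continues after year three, so that period is left out.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365  # simplifying assumption for this sketch
# Offsets relative to vaccination day: just before, 24 hours, 7 days, 28 days.
VACCINE_OFFSETS_DAYS = (0, 1, 7, 28)


@dataclass(frozen=True)
class Visit:
    day: int   # days since enrollment
    kind: str  # "routine" (six-monthly) or "vaccine" (peri-vaccination)


def sampling_schedule(study_years=5, vaccine_years=3, vaccine_day_in_year=90):
    """Return all sample visits for one participant, sorted by day.

    vaccine_day_in_year is a hypothetical parameter: the source only says
    the flu vaccine is given annually during the first three years.
    """
    visits = []
    # Routine blood and saliva samples every six months.
    for i in range(study_years * 2):
        visits.append(Visit(day=i * DAYS_PER_YEAR // 2, kind="routine"))
    # Four additional samples around each annual vaccination.
    for year in range(vaccine_years):
        vaccination_day = year * DAYS_PER_YEAR + vaccine_day_in_year
        for offset in VACCINE_OFFSETS_DAYS:
            visits.append(Visit(day=vaccination_day + offset, kind="vaccine"))
    return sorted(visits, key=lambda v: v.day)


if __name__ == "__main__":
    for visit in sampling_schedule():
        print(visit.day, visit.kind)
```

Under these assumptions each participant contributes 10 routine visits and 12 peri-vaccination visits; changing the assumed parameters changes those counts.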

In addition to launching our data collection efforts, the Pilot Sites will serve as a testing ground for HIP to finalize its protocols and infrastructure and address any issues that arise.

2. Finalize HIP Site Plan for Scaling

In close collaboration with the Pilot Sites, and drawing from their experiences, our team will refine the protocol and site structure, establishing a replicable site model that can be scaled in Phase II to 100+ sites around the globe, including in areas with limited resources and infrastructure.

This process will include refining (and further developing, if needed) the required assays, ensuring standardized quality, and prioritizing the most important immune parameters to capture as we scale data collection. Crucially, HIP will support local capacity building and train personnel in sample collection.

The result will be a fleet of human immunology study sites supported by the Human Immunome Project and a network of global partners that provide the necessary tools for researchers and practitioners in every corner of the world.

3. Engineer Immune Monitoring Kits (IMKs)

Due to the complexity of the immune system, many platforms and assays are needed to generate relevant data, leading to high costs and too large a burden on patients. While the scientific community has developed the advanced tools required to measure the immune system broadly at high resolution, there is still a trade-off between technology maturity and resolution. Namely, the more resolution an assay provides (e.g., multimodal single-cell approaches), the more likely it is to be expensive and not yet ready for robust, reproducible deployment at the scale required by HIP.

To overcome the challenges of cost, universal deployment, and patient sample collection, we are engineering our own Immune Monitoring Kits, collections of assays that will allow us to capture standardized, multi-omic data from the diverse human population at scale.

The kits will be engineered through multiple iterations using HIP-generated data and computational modeling to simplify data collection and limit cost by prioritizing a subset of assays. For example, the Pilot Sites will use IMKv1, which consists of 11 distinct high-resolution assays to generate the initial deep immune profiling required in Phase I. Artificial intelligence models will use these data to identify the key elements of variance in the immune system, and this prioritized subset of focused features will inform the next iteration of the IMKs. As a result, IMKv2 will capture data only on targeted immune variables, allowing HIP to deploy these targeted yet comprehensive assays to generate data on a larger scale in Phase II. This process of (1) generating high-resolution data from a limited subpopulation, (2) automating dimensionality reduction, and (3) integrating more data from more subjects will continue until the IMKs are fully optimized, enabling the measurement of immune function and variance at scale across all populations, including special populations like children and the elderly.
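To make the idea of narrowing IMKv1 measurements down to a prioritized subset more concrete, here is a minimal sketch of one possible reduction step. It is not HIP's method: a simple correlation-based redundancy filter stands in for the AI models described above, and the function names, the feature names, and the 0.9 threshold are all hypothetical. Features are visited in a caller-supplied priority order (for example, cheapest assay first), and a feature is kept only if it is not nearly redundant with one already kept.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, max_abs_corr=0.9):
    """Greedy redundancy filter.

    columns: dict mapping feature name -> list of measurements, one per
    participant, in priority order (dicts preserve insertion order).
    Returns the names of the features that are kept.
    """
    kept = []
    for name, values in columns.items():
        redundant = any(
            abs(pearson(values, columns[other])) > max_abs_corr for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: "marker_b" is almost a rescaled copy of "marker_a".
    marker_a = [random.gauss(0, 1) for _ in range(200)]
    marker_b = [2 * x + 0.01 * random.gauss(0, 1) for x in marker_a]
    marker_c = [random.gauss(0, 1) for _ in range(200)]
    data = {"marker_a": marker_a, "marker_b": marker_b, "marker_c": marker_c}
    print(select_features(data))
```

In this toy example the second marker is dropped because it carries almost the same information as the first. A real pipeline would need to handle missing values, nonlinear relationships, and cross-assay dependencies, which this sketch ignores.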

Phase I data will be used to develop and validate a machine learning architecture for HIP’s AI models. The goal at this point is to develop, train, and validate one or more models that capture immune system dynamics and can predict, given the baseline immune state, an individual’s response to perturbation.

Phase II: Scale Data Collection and Advance AI Models

Two Goals:

1. Establish 100+ Data Collection Sites Globally

Achieving a holistic understanding of immune status, response, and variance requires data from all human populations at a scientifically relevant scale. In Phase II we will drastically increase longitudinal worldwide data generation to move toward this goal, expanding from approximately 10 Pilot Sites to more than 100 study sites worldwide.

Our global study design and scaled geographic coverage will enable HIP sites to capture the diversity of individuals and thus immune functions and responses. To this end, Phase II will also see the expansion from blood and saliva samples to the additional inclusion of non-blood accessible tissues such as skin, gut, and tonsil. The prioritization of diversity in samples and populations is critical to ensuring our dataset reflects immune variance across age, ethnicity, sex, and socioeconomic status. Below is an overview of the cohort demographics to be captured at each site.

Geography	Sex	Age	Socioeconomic Status
• Africa • Australia • East Asia • Europe • Middle East • North America • South America	• Male • Female	• Newborn (0-1 month) • Pre-pubertal children (1 month to 10 years) • Every decade of life from 1-100	Regional divisions based on income: • Low • Low Middle • Middle • High Middle • High
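As a rough sketch of how a site might track coverage of these demographics, the categories in the table can be crossed into individual strata. This is an illustration, not HIP's study design: the ten decade labels are one reading of "every decade of life from 1-100," their overlap with the pre-pubertal band is left as it appears in the table, and the helper names `cohort_strata` and `missing_strata` are hypothetical.

```python
from itertools import product

GEOGRAPHIES = [
    "Africa", "Australia", "East Asia", "Europe",
    "Middle East", "North America", "South America",
]
SEXES = ["Male", "Female"]
# Assumption: ten decade bands (1-10, 11-20, ..., 91-100 years).
AGE_BANDS = (
    ["Newborn (0-1 month)", "Pre-pubertal (1 month to 10 years)"]
    + [f"{10 * i + 1}-{10 * i + 10} years" for i in range(10)]
)
INCOME_TIERS = ["Low", "Low Middle", "Middle", "High Middle", "High"]


def cohort_strata():
    """Every (geography, sex, age band, income tier) combination."""
    return list(product(GEOGRAPHIES, SEXES, AGE_BANDS, INCOME_TIERS))


def missing_strata(enrolled):
    """Strata with no enrolled participant yet.

    enrolled: iterable of (geography, sex, age band, income tier) tuples.
    """
    seen = set(enrolled)
    return [stratum for stratum in cohort_strata() if stratum not in seen]


if __name__ == "__main__":
    print(len(cohort_strata()), "strata under these assumptions")
    enrolled = [("Europe", "Female", "31-40 years", "Middle")]
    print(len(missing_strata(enrolled)), "strata still uncovered")
```

Under these assumptions the cross product has 7 × 2 × 12 × 5 = 840 strata; a different reading of the age bands would change that number.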

Unlike the Pilot Sites, existing infrastructure and capacity among the Phase II locations will be varied; HIP’s scientific plan is designed to address their specific gaps. Phase II will feature numerous site types to ensure high-quality, standardized data and build immunological capacity in all locations. These include standard study sites that will fully execute HIP’s protocol; “Sister Sites” that receive mentorship from the Pilot Sites to ensure effective implementation; “Collection Sites” that will take samples but not perform any measurements; and “Hubs,” advanced labs where full measurements and analysis can be conducted. The regional Hubs will also serve as a source of further knowledge transfer and capacity building, offering mentorship to the sites within their area. These sites will be established on a rolling basis worldwide to ensure full coverage of all populations.

While data collection sites will be highly distributed, HIP will serve as the centralized “mission control,” overseeing the global network and managing logistics, scientific strategy, data stewardship, and the AI build. Our Scientific and Medical Leadership Committee, composed of global experts in immunology, AI, clinical operations, and beyond, is responsible for study design, operations and logistics, and overall scientific direction, including final decisions on inclusion/exclusion of assays and measurement modalities.

2. Deploy Immune Monitoring Kits

A key enabling component for initiating Phase II is the evolution and global deployment of HIP’s Immune Monitoring Kits: cost-efficient, researcher-friendly tools for capturing standardized, multi-omic data.

IMKv2, which will capture refined data only on targeted immune variables, will be launched during Phase II to enable data generation on a larger scale. Initial data from the Phase II sites will be used to further refine and automate the IMKs, eventually leading to more advanced versions that will further decrease cost and improve usability.

The Immune Monitoring Kits are central to HIP’s scientific approach because they not only ensure consistent measurements but also empower local sites to develop data generation, processing, and storage capacities. As a result, the IMKs will equip low- and middle-income countries with the tools to contribute data and analyses, and to reap the benefits, at the same level as the most advanced immunological institutes in the world, establishing high quality standards and further encouraging global buy-in and participation.

The IMKs are not only critical to the execution of the Human Immunome Project but can also set a new standard in immune monitoring that can be used by stakeholders in every environment worldwide.

Successful execution of Phase II will enable HIP to move from training and validating a predictive AI model to building a more comprehensive AI model that can predict immune responses and health trajectories of individuals using baseline information.

Phase III: Modeling a New Frontier in Science and Medicine

While the first years focus on generating data and developing a predictive understanding of the immune system, the subsequent years are dedicated to both growing the dataset and expanding the AI models from quantitative and predictive to mechanistic and “computable.” These advanced, publicly available models will not only allow prediction of immune response outcomes using baseline information but also provide quantitative, mechanistic information on how the immune system operates and identify novel immune targets. Such information will transform our ability to generate immune interventions to optimize health outcomes, with applications in all areas of health including vaccine development, infectious diseases, autoimmunity, pandemic preparedness, cancer, and neurodegeneration.

Two additional pillars will be pursued to achieve these goals. First, we will develop artificial immune systems to recapitulate the major features of tissue immunity and multi-tissue immune interactions, which underlie all major functions of the immune system (from protection against infection and cancer to maintaining homeostasis). This process will allow rapid data generation and testing of interventions, including both therapeutic and genetic modifications of the immune system. Second, we will develop quantitative, predictive models for molecular recognition. Immune functions result from molecular interactions; for example, lymphocyte receptors such as T- and B-cell receptors can recognize and remember molecular patterns. Yet the immune field currently has extremely limited capacity, both experimentally and computationally, to interrogate and predict such molecular recognition at scale. By tackling these two major pillars, HIP will generate the technology and data needed to achieve not only predictive models of the immune system but also tools that enable intervention and a truly mechanistic understanding of the immune system at the personal level.