Scientific Approach

Decoding the Immune System

The Human Immunome Project (HIP) is overseeing the generation of the largest immunological dataset in history and will use these data to build AI models of the immune system that can accelerate medical research and drug discovery, decrease healthcare costs worldwide, and most importantly, improve health for all.

Our first five years will focus on building an immunological dataset, representative of the diversity of humanity and immune responses, that will power our AI models.

Background

The human immune system is a dynamic, multi-scale network that involves uniquely complex interactions between molecules, cells, and organs. It monitors for, reacts to, and defends the body against invading pathogens and other stressors and therefore plays a critical role in maintaining human health.

Despite tremendous advances in applying multi-omics approaches and immune monitoring technologies, existing data remain limited in both size and scope. As a result, immunologists estimate that less than 1% of the data required to understand global immune function and diversity are available. These gaps prevent us from harnessing the power of the immune system to improve human health. The Human Immunome Project’s scientific plan is designed to address this data gap.

Our Plan to Generate the World’s Largest Immunological Dataset

The Human Immunome Project is focused on generating multi-omic immunological baseline and longitudinal data that reflect humanity’s diversity across age, geography, sex, and socioeconomic status. Data generation will occur in phases. When complete, the result will be the world’s largest and most diverse immunological dataset, incorporating immune measurements from ~225,000 people.

Phase I: Launch Data Generation and Establish Machine Learning Model

Three Goals:

1. Establish and Collect Data at Pilot Sites

We are establishing 10 to 13 state-of-the-art collection sites in Africa, Australia, East Asia, Europe, the Middle East, North America, and South America. Each of these “Pilot Sites” will collect extensive, high-resolution data from approximately 1,500 participants over a five-year period.

Participants will provide five blood samples a year and receive an annual flu vaccine so that their response to perturbation can be monitored. Samples will be collected before vaccination to establish a baseline and again at 24 hours, 7 days, 28 days, and 6 months after vaccination. Data collected will include thousands of immunological parameters (cells, transcripts, antibodies, proteins, epigenetic modifications, and metabolites) from each individual, a selection of which are outlined in the table below.

Biological Entity Measured | Assay | Benefit
Cells (abundance) | Multi-parameter flow cytometry | Cells are the atomic unit of the immune system. Capture how much we have of each cell type.
Cells (state) | CITE-seq: simultaneous single-cell protein and mRNA | Different cells have different functions. Define cell functionality through high-dimensional mapping of each cell's mRNA transcript state. Identify new cell types and states.
Cells (specificity) | Simultaneous single-cell TCR and BCR sequencing and mRNA | Immune cells can build specific recognition programs based on past encounters. Map the diversity of specificity programs and their strength (clonality) for each cell.
Serum proteins | Multiplex targeted measurement of blood-circulating proteins | Map immune cell communication signals.
mRNA genes | Gene expression: bulk RNA-seq | Measures the average expression level of all genes across all cells in a person. Robust and relatively cheap.
Metabolites | Mass spectrometry | Cell function requires energy. Approximate cell activity by concentration of small molecules (metabolites).
DNA (human + mitochondria) | Genomic data: whole genome sequencing | Associate phenotypic variance with underlying genomic variance.
Cell-free chromatin | Chromatin state in cell-free DNA in serum | Access information on cell states from internal organs based on remnants of cells in blood.
Stool microbiome | 16S + metagenomics | Identify the relation between gut microbiome and immune cell state variation.

These data will allow us to assess developmental and age-dependent changes and to computationally stitch together the temporal trajectory of different age groups of both males and females worldwide. Additionally, they will enable us to conduct high-resolution measurements of major immune cell frequencies; conduct single-cell analyses of proteins, transcripts, immune repertoire, and chromatin accessibility; and measure circulating protein and metabolite levels, as well as cell-free DNA, and see how these change over time and among different populations.

In addition to launching our data collection efforts, the Pilot Sites will serve as a testing ground for HIP to finalize its protocols and infrastructure and address any issues that arise.
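The sampling schedule and cohort sizes described for the Pilot Sites can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the vaccination date is made up, drawing the baseline sample on the vaccination day itself and treating "6 months" as 182 days are assumptions, and the sample totals assume full retention with no missed draws.

```python
from datetime import date, timedelta

# Offsets (in days) of each blood draw relative to the vaccination day.
# Assumptions: baseline is drawn on the vaccination day (offset 0),
# and "6 months" is approximated as 182 days.
SAMPLE_OFFSETS_DAYS = {
    "baseline": 0,
    "24 hours": 1,
    "7 days": 7,
    "28 days": 28,
    "6 months": 182,
}


def sampling_schedule(vaccination_day):
    """Return {time point: calendar date} for one annual vaccination cycle."""
    return {
        label: vaccination_day + timedelta(days=offset)
        for label, offset in SAMPLE_OFFSETS_DAYS.items()
    }


def planned_samples(participants, years=5):
    """Planned blood samples, assuming full retention and no missed draws."""
    return participants * len(SAMPLE_OFFSETS_DAYS) * years


# Hypothetical vaccination date for one participant.
schedule = sampling_schedule(date(2025, 10, 1))

# About 1,500 participants per Pilot Site, five samples a year, five years.
samples_per_site = planned_samples(1_500)

# The text gives a range of 10 to 13 Pilot Sites.
samples_low = 10 * samples_per_site
samples_high = 13 * samples_per_site
```

Under these assumptions, each participant contributes 25 samples over the five years, so a single Pilot Site would handle on the order of tens of thousands of blood samples before any assay is run.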

2. Engineer Immune Monitoring Kits (IMKs)

Due to the complexity of the immune system, many platforms and assays are needed to generate relevant data, leading to high costs and too large a burden on patients. While the scientific community has developed the advanced tools required to measure the immune system broadly at high resolution, there is still a trade-off between technology maturity and resolution. Namely, the more resolution provided by an assay (e.g., multimodal single-cell approaches), the more likely it is to be expensive and not yet ready for robust, reproducible deployment at the scale required by HIP.

To overcome financial, universal deployment, and patient collection challenges, we are developing and engineering our own Immune Monitoring Kits, a collection of assays, which will allow us to capture standardized and multi-omic data from the diverse human population at scale.

The kits will be engineered through multiple iterations using HIP-generated data and computational modeling to simplify data collection and limit cost by prioritizing a subset of assays. For example, the Pilot Sites will use IMKv1, which consists of 11 distinct high-resolution assays to generate the initial deep immune profiling required in Phase I. Artificial intelligence models will use these data to identify the key elements of variance in the immune system, and this prioritized subset of focused features will inform the next iteration of the IMKs. As a result, IMKv2 will capture data only on targeted immune variables, allowing HIP to deploy these targeted and comprehensive assays to generate data on a larger scale. This process of (1) generating high-resolution data from a limited subpopulation, (2) automating the process of reducing dimension, and (3) integrating more data from more subjects will continue until the IMKs are fully optimized, enabling the measurement of immune function and variance across all populations, including special populations like children and the elderly, at scale.
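The idea of narrowing a broad panel down to a prioritized subset of features can be illustrated with a toy example. The text does not say which method HIP's models will use, so the sketch below substitutes a simple greedy correlation filter: a feature is kept only if it varies across participants and is not nearly redundant with a feature already kept. The feature names, values, and the 0.9 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def prioritize_features(data, max_abs_corr=0.9):
    """Greedy filter: keep varying features not redundant with kept ones.

    data maps a feature name to one value per participant.
    """
    kept = []
    for name, values in data.items():
        if len(set(values)) <= 1:
            continue  # no variance across participants, nothing to learn
        if all(abs(pearson(values, data[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Made-up measurements for five participants (illustrative only).
toy_panel = {
    "cd4_t_cell_freq": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cd4_t_cell_count": [2.0, 4.0, 6.0, 8.0, 10.1],  # nearly redundant
    "il6_serum": [5.0, 1.0, 4.0, 2.0, 3.0],
    "constant_marker": [7.0, 7.0, 7.0, 7.0, 7.0],    # never varies
}

selected = prioritize_features(toy_panel)
```

In this toy panel the redundant count and the constant marker are dropped, leaving two features. A real prioritization would have to weigh assay cost, sample burden, and predictive value, which this filter ignores.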

3. Develop Machine Learning Model

The data collected in Phase I will profile immune states across scales: from molecular (genes and proteins, including various immune receptors), cellular (immune cell counts), and organismic (serum proteins and metabolites) to clinical levels (comorbidities, exposure history, markers of inflammation). These high-resolution, multimodal data will provide the information needed to develop, optimize, and validate machine learning architectures that can integrate various biological data types and capture the distinct mechanisms and hierarchical levels of immune regulation. To date, such a model has not been feasible due to a lack of data.

HIP is filling this data gap and, by the completion of Phase I, will develop and validate a machine learning architecture that can integrate various modalities in a biologically meaningful manner. The goal at this stage is a model (or models) that captures immune system dynamics and can predict, given the baseline immune state, an individual’s response to perturbation.
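The prediction task described here (baseline immune state in, response to perturbation out) can be made concrete with a deliberately tiny stand-in. The sketch below is not HIP's architecture, which the text does not specify; it fits a one-variable ordinary least squares line to synthetic numbers purely to show the shape of the problem. The variable names and all values are hypothetical.

```python
def fit_line(baseline, response):
    """Ordinary least squares for response = slope * baseline + intercept."""
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    if sxx == 0:
        raise ValueError("baseline values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, baseline_value):
    """Predict the post-perturbation response from a baseline measurement."""
    slope, intercept = model
    return slope * baseline_value + intercept


# Synthetic data: one baseline marker per participant and the
# corresponding post-vaccination response (made-up numbers).
baseline_marker = [1.0, 2.0, 3.0, 4.0]
post_vaccine_response = [3.0, 5.0, 7.0, 9.0]

model = fit_line(baseline_marker, post_vaccine_response)
predicted = predict(model, 5.0)  # response predicted for a new participant
```

A real model would take thousands of baseline parameters across modalities rather than a single number, but the interface is the same: fit on participants with known responses, then predict for a participant from baseline alone.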

The model(s) will also play a crucial role in identifying the molecular and cellular markers that must be measured by the Immune Monitoring Kits to obtain a full profile of an individual’s immune state. As such, it will be a necessary component of moving to Phase II of the scientific plan, in which we will scale data collection, deploy IMKs worldwide, and build more advanced AI models.


Phase II: Scale Data Collection and Advance AI Models

Three Goals:

1. Establish 150 Data Collection Sites Globally

Achieving a holistic understanding of immune status, response, and variance requires data from all human populations at a scientifically relevant scale. In Phase II we will drastically increase longitudinal data generation to move toward this goal, expanding from approximately 10 to 13 Pilot Sites to approximately 150 study sites worldwide. Sites will be distributed globally, with 75% located in low- and middle-income countries and 25% in high-income countries.

Our global study design and scaled geographic coverage will enable HIP sites to capture the diversity of individuals and thus immune functions and responses. The prioritization of population diversity is critical to ensuring our dataset reflects immune variance across age, ethnicity, environment, sex, and socioeconomic status. Below is an overview of the cohort demographics to be captured at each site.

Geography:
• Africa
• Australia
• East Asia
• Europe
• Middle East
• North America
• South America

Sex:
• Male
• Female

Age:
• Newborn: 0–1 month
• Pre-pubertal children: 1 month – 10 years
• Every decade of life from 1-100

Socioeconomic (regional divisions based on income):
• Low
• Low Middle
• Middle
• High Middle
• High

Unlike the Pilot Sites, the Phase II locations will vary in existing infrastructure and capacity; HIP’s scientific plan is designed to address their specific gaps. Phase II will feature two main site categories to ensure high-quality, standardized data and build immunological capacity in all locations: “Hubs,” advanced labs that collect, store, analyze, and model data, and lower-capacity “Collection Sites” that will collect data from the population in their area and transport samples to their associated Hub. Each Hub will oversee four Collection Sites, providing mentorship and serving as a source of knowledge transfer and capacity-building. These sites will be established on a rolling basis worldwide to ensure full coverage of all populations.
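The site figures given for Phase II can be combined into rough arithmetic. The sketch below rests on an assumption the text does not confirm: that the approximately 150 study sites include both Hubs and Collection Sites, so that each Hub plus its four Collection Sites forms a group of five. The function name is hypothetical, and the income split is left unrounded because 75% of 150 is not a whole number.

```python
def plan_network(total_sites, collection_sites_per_hub, lmic_share):
    """Split a site total into Hubs and Collection Sites, plus an income split.

    Assumes total_sites counts Hubs and Collection Sites together.
    """
    group_size = collection_sites_per_hub + 1  # one Hub plus its sites
    hubs = total_sites // group_size
    collection_sites = hubs * collection_sites_per_hub
    return {
        "hubs": hubs,
        "collection_sites": collection_sites,
        "unassigned_sites": total_sites - hubs - collection_sites,
        "lmic_sites": total_sites * lmic_share,
        "high_income_sites": total_sites * (1 - lmic_share),
    }


# Figures from the text: ~150 sites, four Collection Sites per Hub,
# 75% of sites in low- and middle-income countries.
network = plan_network(150, 4, 0.75)
```

Under this reading the network would consist of 30 Hubs and 120 Collection Sites, with roughly 112 to 113 sites in low- and middle-income countries. If the 150 figure instead counts only one category of site, the totals change accordingly.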

While data collection sites will be highly distributed, HIP will serve as the centralized “mission control,” overseeing the global network and managing logistics, scientific strategy, data stewardship, and the AI build. Our Scientific and Medical Leadership Committee, composed of global experts in immunology, AI, clinical operations, and beyond, is responsible for study design, operations and logistics, and overall scientific direction, including final decisions on inclusion/exclusion of assays and measurement modalities.

2. Deploy Immune Monitoring Kit

A key enabling component to initiate Phase II is the evolution and global deployment of HIP’s Immune Monitoring Kits, the cost-efficient and researcher-friendly tool for capturing standardized, multi-omic data.

IMKv2, which will capture refined data only on targeted immune variables, will be launched during Phase II to enable data generation on a larger scale. Initial data from the Phase II sites will be used to further refine and automate the IMKs, eventually leading to more advanced versions that will further decrease cost and improve usability.

The Immune Monitoring Kits are central to HIP’s scientific approach: they not only ensure consistent measurements but also empower local sites to develop data generation, processing, and storage capacities. As a result, the IMKs will give low- and middle-income countries the tools to contribute data and analyses, and reap the benefits, at the same level as the most advanced immunological institutes in the world, establishing high quality standards and further encouraging global buy-in and participation.

The IMKs can also set a new standard in immune monitoring that can be used by stakeholders in every environment worldwide.

3. Advance AI Models

Successful execution of Phase II will enable HIP to move from training and validating a predictive machine learning model to building a more comprehensive AI model that can predict immune responses and health trajectories of individuals using baseline information.

As more and more immune profiles become available, including in diverse contexts such as cancer patients receiving immune-modulating anti-cancer therapies and individuals receiving rheumatologic interventions, we can start to further train the model(s) to predict, once again based on the pre-intervention state, the individual response to a broader set of immune perturbations, including more specific ones such as checkpoint inhibitors and TNF blockers.

Phase III: Modeling a New Frontier in Science and Medicine

While the first years focus on the generation of data and development of a predictive understanding of the immune system, the subsequent years are dedicated to both growing the dataset and expanding the AI models from quantitative and predictive to mechanistic and “computable.” These advanced and publicly available models will not only allow prediction of immune response outcomes using baseline information, but will also provide quantitative, mechanistic information on how the immune system operates and identify novel immune targets. Such information will transform our ability to generate immune interventions to optimize health outcomes, with applications in all areas of health including vaccine development, infectious diseases, autoimmunity, pandemic preparedness, cancer, and neurodegeneration.

Two additional pillars will be pursued to achieve these goals. First, we will develop artificial immune systems to recapitulate the major features of tissue immunity and multi-tissue immune interactions, which underlie all major functions of the immune system (e.g., from protection against infection and cancer to maintaining homeostasis). This process will allow rapid data generation and testing of interventions, including both therapeutic and genetic modifications of the immune system. Second, we will develop quantitative, predictive models for molecular recognition. Though immune functions result from molecular interactions (for example, lymphocyte receptors such as T- and B-cell receptors can recognize and remember molecular patterns), the immune field currently has extremely limited capacity, both experimentally and computationally, to interrogate and predict such molecular recognition at scale. By tackling these two major pillars, HIP will generate the technology and data needed to achieve not only predictive models of the immune system but also tools to enable intervention and a truly mechanistic understanding of the immune system at the personal level.