Mapping the Immune System
Beginning in 2024, the Human Immunome Project (HIP) will oversee the generation of the largest immunological dataset in history and will use this data to build AI models of the immune system that can accelerate medical research and drug discovery, improve health, and decrease healthcare costs worldwide.
Our first five years will focus on building an immunological dataset, representative of the diversity of humanity and immune responses, that will power our AI models.
The human immune system is a dynamic, multi-scale network that involves uniquely complex interactions between molecules, cells, and organs. It monitors for, reacts to, and defends the body against invading pathogens and other stressors and therefore plays a critical role in maintaining human health.
Despite tremendous advances in applying multi-omics approaches and immune monitoring technologies, data remains limited in both size and scope. As a result, less than one percent of the immunological data necessary to understand immune diversity on a global scale is available. These gaps prevent us from harnessing the power of the immune system to improve human health.
Our Five-Year Plan to Generate the World’s Largest Immunological Dataset
Over the next five years, the Human Immunome Project is focused on generating multi-omic immunological baseline data that reflects the diverse human population across age, geography, health conditions, sex, and socioeconomic status. Data generation will occur in a phased approach.
1. Collect Data at Pioneering Sites
We are establishing seven state-of-the-art collection sites in Africa, Australia, East Asia, Europe, the Middle East, North America, and South America. Dubbed “Pioneering Sites,” each of these locations will collect extensive data from 500 participants. Over the course of three months, we will collect two blood samples from all participants, as well as tissue samples (specifically skin, gut, and tonsil biopsies) from a small subset (10%), and conduct high-resolution analysis of thousands of immunological parameters (cells, transcripts, antibodies, proteins, epigenetic modifications, and metabolites) to establish an immunologic baseline dataset. Data will also be collected from a subset of participants after they receive a routine vaccination, such as the annual influenza or SARS-CoV-2 vaccine, to assess functional responses to a perturbation and how those responses vary by sex, age, geography, and other determinants.
|Biological Entity Measured||Assay||Benefit|
|Cells (abundance)||Multi-parameter flow cytometry||Cells are the atomic unit of the immune system. Captures how many cells of each type are present.|
|Cells (state)||CITE-seq: simultaneous single-cell protein and mRNA||Different cells have different functions. Defines cell function through high-dimensional mapping of mRNA transcript state. Identifies new cell types and states.|
|Cells (specificity)||Simultaneous single-cell TCR and BCR sequencing and mRNA||Immune cells can build specific recognition programs based on past encounters. Maps the diversity of specificity programs and their strength (clonality) for each cell.|
|Serum proteins||Multiplex targeted measurement of blood-circulating proteins||Maps immune cell communication signals.|
|mRNA genes||Gene expression: bulk RNA-seq||Measures the average expression level of all genes across all cells in a person. Robust and relatively cheap.|
|Metabolites||Mass spectrometry||Cell function requires energy. Approximates cell activity by the concentration of small molecules (metabolites).|
|DNA (human + mitochondria)||Genomic data: whole genome sequencing||Associates phenotypic variance with underlying genomic variance.|
|Cell-free chromatin||Chromatin state in cell-free DNA in serum||Accesses information on cell states in internal organs based on remnants of cells in blood.|
|Stool microbiome||16S + metagenomics||Identifies relationships between the gut microbiome and immune cell state variation.|
After the baseline collection, we will continue to draw samples from the participants every six months for a three-year period. This longitudinal data will allow us to assess developmental and age-dependent changes and to computationally stitch together the temporal trajectories of different age groups of both males and females across the globe. Additionally, it will enable us to conduct high-resolution measurements of major immune cell frequencies; conduct single-cell analyses of proteins, transcripts, immune repertoire, and chromatin accessibility; measure circulating protein and metabolite levels, as well as cell-free DNA; and see how these measurements change over time and among different populations.
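The idea of stitching short follow-up windows into one long trajectory can be illustrated with a toy example. The sketch below is not HIP's analysis method: the marker function, its noise level, and the cohort of starting ages are all hypothetical, and only the six-month interval and three-year follow-up come from the plan. Each simulated participant contributes seven time points (baseline plus six follow-ups), and pooling them by whole year of age yields a curve spanning far more years than any one person was observed.

```python
import random
from collections import defaultdict
from statistics import mean

FOLLOW_UP_INTERVAL_YEARS = 0.5   # from the plan: samples every six months
FOLLOW_UP_YEARS = 3              # from the plan: three-year follow-up


def visit_ages(start_age):
    """Ages at baseline and at each six-month follow-up visit."""
    n_follow_ups = int(FOLLOW_UP_YEARS / FOLLOW_UP_INTERVAL_YEARS)
    return [start_age + i * FOLLOW_UP_INTERVAL_YEARS for i in range(n_follow_ups + 1)]


def toy_marker(age, rng):
    """Hypothetical immune marker: linear decline with age plus Gaussian noise."""
    return 100.0 - 0.5 * age + rng.gauss(0, 2.0)


def stitch(start_ages, seed=0):
    """Pool every participant's short series into one mean-by-age curve."""
    rng = random.Random(seed)
    by_age = defaultdict(list)
    for start in start_ages:
        for age in visit_ages(start):
            by_age[int(age)].append(toy_marker(age, rng))  # bin by whole year of age
    return {age: mean(values) for age, values in sorted(by_age.items())}


# Hypothetical cohort: ten participants enrolled at every second year of age.
start_ages = [age for age in range(0, 98, 2) for _ in range(10)]
curve = stitch(start_ages)
print(f"{len(visit_ages(30))} time points per participant, "
      f"curve spans ages {min(curve)} to {max(curve)}")
```

Binning by whole year is the simplest possible choice here; a real analysis would more likely fit a mixed-effects or spline model that separates within-person change from between-person differences.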
2. Establish HIP Site Plan for Scaling
In close collaboration with the Pioneering Sites, and drawing from their experiences, our team will refine the protocol and site structure, establishing a replicable site model that can be scaled to 70-100 sites around the globe (see below), including in areas with limited resources and infrastructure, in Phase II.
This process will include refining—and further developing, if needed—the required assays, ensuring a standardized quality and prioritizing the most important immune parameters to capture as we scale data collection. Crucially, HIP will support local capacity building, training personnel in sample collection, first in blood and skin and then in other tissues, including gut biopsies.
The result will be a fleet of human immunology study sites supported by the Human Immunome Project and a network of global partners that provide the necessary tools for researchers and practitioners in every corner of the world.
3. Engineer Immune Monitoring Kit (IMK)
While the scientific community has developed the advanced tools required to measure the immune system broadly at high resolution, there is still a trade-off between technology maturity and resolution. Namely, the more resolution an assay provides (e.g., multimodal single-cell approaches), the more likely it is to be expensive and not yet ready for robust, reproducible deployment at the scale required by HIP.
To overcome replicability, financial, and universal deployment challenges, we are developing and engineering our own Immune Monitoring Kit, which will allow us to capture standardized, multi-omic data from the diverse human population at scale. The kit will simplify data collection and limit cost by focusing on a prioritized subset of assays that captures most (>80%) of the information needed while ensuring reproducible results. The kit will measure a significantly smaller number of immune parameters than those targeted during Phase I, which is possible because immune parameters are correlated and therefore carry inherently redundant information.
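Why a smaller panel can retain most of the signal when parameters are correlated can be shown with a toy simulation. The sketch below is an illustration only, not the procedure HIP will use to design the IMK: the data are synthetic, the function names are hypothetical, and a simple greedy correlation filter stands in for whatever prioritization method is ultimately chosen. Forty simulated parameters are generated as noisy copies of five hidden drivers, and a parameter is kept only if no already-kept parameter tracks it closely.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def simulate_parameters(n_people=300, n_latent=5, per_latent=8, noise=0.3, seed=0):
    """Synthetic data: each measured parameter is a noisy copy of one hidden driver."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_latent)]
    params = {}
    for k, driver in enumerate(latent):
        for j in range(per_latent):
            params[f"param_{k}_{j}"] = [v + rng.gauss(0, noise) for v in driver]
    return params


def select_panel(params, threshold=0.8):
    """Greedy filter: keep a parameter only if no already-kept one correlates above threshold."""
    kept = []
    for name, values in params.items():
        if all(abs(pearson(values, params[other])) < threshold for other in kept):
            kept.append(name)
    return kept


params = simulate_parameters()
panel = select_panel(params)
print(f"{len(panel)} of {len(params)} parameters kept: {panel}")
```

With these settings the panel should collapse to about one parameter per hidden driver. Real immune data will be far less tidy, so the threshold and the selection method itself would need to be validated against the Phase I measurements.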
The final engineering of the IMK will be done in close collaboration with the Pioneering Sites, whose data will inform the kit's development through repeated cycles of testing and evaluation followed by engineering and refinement. When complete, the kit will focus on three data types: (1) measurement of major immune cell frequencies; (2) single-cell measurements of a subset of proteins and transcripts; and (3) single-cell measurements of chromatin accessibility.
Achievement of these three goals during Phase I will substantially increase the amount of immunological data available and will enable the Human Immunome Project to initiate Phase II of our scientific plan, drastically expanding and rapidly scaling data collection worldwide.
1. Establish 70-100 Data Collection Sites Globally
In 2027, the Human Immunome Project will begin to implement Phase II, which starts with scaling the study sites, building on the achievements of Phase I. We will scale to 70-100 sites worldwide, with the number of participants per site increasing from 500 to approximately 10,000. Our global study protocol and scaled participant numbers are designed to capture the diversity of individuals and thus of immune functions and responses. This prioritization is critical to ensuring our dataset reflects immune variation in age, ethnicity, health conditions, sex, and socioeconomic status. Far too many scientific studies have failed in this regard, and we cannot address global public health without capturing global diversity. Below is an overview of the cohort demographics.
• Geography: Africa, East Asia, Middle East, North America, South America
• Sex: Male, Female
• Age: Newborn (0-1 month); pre-pubertal children (1 month – 10 years); every decade of life from 1-100
• Socioeconomic status (regional divisions based on income): Low Middle, High Middle
• Health conditions: Healthy individuals, immuno-compromised, chronic infections
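The scale-up between the two phases follows directly from the figures given in the plan (seven Pioneering Sites with 500 participants each, then 70-100 sites with approximately 10,000 each). The short calculation below only multiplies those stated numbers; the variable names are illustrative, and the totals assume every site reaches its target enrollment.

```python
# Figures stated in the plan
PHASE_I_SITES = 7
PHASE_I_PER_SITE = 500
PHASE_II_SITES = (70, 100)    # planned range of sites
PHASE_II_PER_SITE = 10_000    # "approximately 10,000 per site"

# Assumption: every site enrolls its full target number of participants.
phase_i_total = PHASE_I_SITES * PHASE_I_PER_SITE
phase_ii_low, phase_ii_high = (n * PHASE_II_PER_SITE for n in PHASE_II_SITES)

print(f"Phase I:  {phase_i_total:,} participants")
print(f"Phase II: {phase_ii_low:,} to {phase_ii_high:,} participants")
print(f"Scale-up: {phase_ii_low // phase_i_total}x to {phase_ii_high / phase_i_total:.0f}x")
# Phase I:  3,500 participants
# Phase II: 700,000 to 1,000,000 participants
# Scale-up: 200x to 286x
```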
While data collection sites will be highly distributed, HIP will serve as the centralized “mission control,” overseeing the global network and managing the logistics and scientific strategy. A Scientific Steering Committee, composed of global experts in immunology, AI, and beyond, is responsible for study design, operations and logistics, and overall scientific direction, including final decisions on inclusion/exclusion of assays and measurement modalities.
2. Deploy Immune Monitoring Kit
A key enabling component for initiating Phase II is the completion and global deployment of HIP’s Immune Monitoring Kit. As the number of sites and participants is scaled, the IMK will ensure the collection of standardized data.
The Immune Monitoring Kit is central to HIP’s scientific approach, as it not only ensures consistent measurements but also empowers local sites to develop data generation, processing, and storage capacities. As a result, the IMK will endow lower- and middle-income countries with the tools to contribute data and analysis, and reap the benefits, at the same level as the most advanced immunological institutes in the world, establishing high quality standards and further encouraging global buy-in and participation.
Successful execution of Phase II will enable the development of AI models of the immune system that can predict immune responses and health trajectories of individuals using baseline information. Development of this platform is ongoing with AI industry partners and will occur simultaneously with the generation of HIP’s immunological dataset.
The Next Five Years, 2029-2033
While the first five years focus on the generation of data and the development of a predictive understanding of the immune system, the following five will be dedicated to growing the quantitative and predictive models and building the first-ever publicly available, mechanistic, “computable” models of the immune system. These models will not only allow prediction of immune response outcomes using baseline information but will also provide quantitative, mechanistic information on how the immune system operates. Such information will transform our ability to generate immune interventions to optimize health outcomes, with applications in all areas of health including vaccine development, infectious diseases, autoimmunity, pandemic preparedness, cancer, and neurodegeneration.
Two additional pillars will be pursued to achieve these goals. First, we will develop artificial immune systems to recapitulate the major features of tissue immunity and multi-tissue immune interactions, which underlie all major functions of the immune system (from protection against infection and cancer to maintaining homeostasis). This process will allow rapid data generation and testing of interventions, including both therapeutic and genetic modifications of the immune system. Second, we will develop quantitative, predictive models for molecular recognition. Although immune functions result from molecular interactions (for example, lymphocyte receptors such as T- and B-cell receptors can recognize and remember molecular patterns), we currently have extremely limited capacity, both experimentally and computationally, to interrogate and predict such molecular recognition at scale. By tackling these two major pillars, HIP will generate the technology and data needed to achieve not only predictive models of the immune system but also tools that enable intervention and a truly mechanistic understanding of the immune system at the personal level.