Cells interact with their local environment to enact global tissue function. By harnessing gene–gene covariation in cellular neighborhoods from spatial transcriptomics data, the covariance environment (COVET) niche representation and the environmental variational inference (ENVI) data integration method model phenotype–microenvironment interplay and reconstruct the spatial context of dissociated single-cell RNA sequencing datasets.
The mission
The recent proliferation of methods for high-resolution spatial molecular measurement is broadening our potential to investigate how cells interact with their neighbors and organize tissue features¹. However, we lack robust and information-dense ways to represent a cell’s microenvironment quantitatively. Most analysis methods simply encode the cell microenvironment using cell-type fractions, thereby collapsing expression data along a single dimension. We sought to develop a mathematical representation of the cellular niche that captures the richness of its constituent phenotypic states and that is efficient to compute, robust to noise, and biologically meaningful. Spatial datasets are currently limited to a few hundred genes, underpowering the search for important spatial trends in gene expression. This could be remedied by a method that leverages matched transcriptome-wide single-cell RNA sequencing (scRNA-seq) data to power spatial inference with our new quantitative representation of the cellular niche. The approach would surpass the capabilities of current tools by painting spatial information onto dissociated scRNA-seq data.
The solution
We based our niche representation, COVET, on the gene–gene expression covariation of a cell with its nearest cellular neighbors, as this covariation captures biological relationships and is insensitive to measurement artifacts² (Fig. 1). The key breakthroughs were defining covariation relative to the entire dataset, to make cells comparable, and devising a metric to measure similarity between the COVET matrices of individual cells. We developed a fast method for calculating optimal transport between COVET matrices; critically, this similarity measure can be plugged directly into all the existing powerful methods for single-cell transcriptomics analysis. The COVET niche representation can thus be easily and rapidly computed for millions of cells in spatial data and subjected to dimensionality reduction, clustering, trajectory and diffusion component analysis to find highly interpretable spatial trends in both cell and gene expression space.
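As a rough illustration of the idea of covariation defined relative to the entire dataset, the sketch below computes one gene–gene matrix per cell from its k nearest spatial neighbors, centering expression on the dataset-wide mean rather than on each niche's own mean. It is a minimal NumPy sketch under stated assumptions, not the published implementation: the function name `covet_matrices`, the default `k=8`, the brute-force neighbor search, the 1/k normalization, and counting each cell as its own neighbor are all illustrative choices.

```python
import numpy as np

def covet_matrices(coords, expr, k=8):
    """Per-cell niche covariance, shifted by the dataset-wide mean expression.

    coords: (n_cells, 2) spatial positions; expr: (n_cells, n_genes) expression.
    Returns an array of shape (n_cells, n_genes, n_genes).
    Hypothetical helper: name, k, and normalization are assumptions.
    """
    # Pairwise squared Euclidean distances between cells (fine for small n).
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Indices of the k nearest cells; each cell counts as its own neighbor here.
    neighbors = np.argsort(d2, axis=1)[:, :k]
    # Center on the global mean, not the niche mean, so niches stay comparable.
    centered = expr - expr.mean(axis=0, keepdims=True)
    niche = centered[neighbors]  # shape (n_cells, k, n_genes)
    # Sum of outer products over the k neighbors, divided by k.
    return np.einsum("nkg,nkh->ngh", niche, niche) / k

# Toy data: 50 cells at random positions with 5 genes each.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
expr = rng.poisson(2.0, size=(50, 5)).astype(float)
covet = covet_matrices(coords, expr, k=8)
print(covet.shape)  # (50, 5, 5)
```

Each resulting matrix is symmetric and positive semi-definite by construction. For datasets with millions of cells, the dense distance matrix used here would need to be replaced by a tree-based or approximate nearest-neighbor search.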
The usefulness of COVET grows with the number of genes measured by spatial profiling technologies. To take full advantage of current spatial datasets, which measure relatively few genes, we developed ENVI, a conditional variational autoencoder that incorporates COVET’s niche information and unifies complementary spatial and single-cell modalities into a single latent space. Unlike other machine learning approaches for spatial and single-cell data integration, ENVI models the full transcriptome as well as spatial information for all cells, and notably, it harbors two decoders that generate output from its latent space: one that imputes missing genes in a spatial image and another that confers spatial information directly onto dissociated single cells. ENVI worked successfully across disparate spatial measurement platforms, finding both discrete and continuous spatial expression trends in early gastrulation and spinal cord development, localizing rare interneuron subtypes in the motor cortex, and delineating related immune subpopulations from Xenium imaging of brain tissue bearing a metastasis that spatial data alone failed to distinguish.
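The two-decoder layout described above can be pictured with a forward pass through a toy conditional variational autoencoder. This is only a sketch under assumptions, not the ENVI implementation: the layer sizes, the `tanh` activation, the random untrained weights, the choice to encode only the genes shared by both modalities, and the construction of the niche output as a factor times its transpose are all illustrative, and the training objective is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random weights and zero bias for one linear layer (untrained)."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def forward(x, modality, params, n_latent, n_cov):
    """Encode shared-gene expression with a modality flag, then decode twice."""
    # Condition on the modality: 0 = spatial cell, 1 = dissociated scRNA-seq cell.
    flag = np.full((x.shape[0], 1), float(modality))
    w, b = params["enc_hidden"]
    hidden = np.tanh(np.concatenate([x, flag], axis=1) @ w + b)
    w, b = params["enc_out"]
    stats = hidden @ w + b
    mu, log_var = stats[:, :n_latent], stats[:, n_latent:]
    # Reparameterization: sample the latent as mean plus scaled Gaussian noise.
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)
    zc = np.concatenate([z, flag], axis=1)
    # Decoder 1: expression over the full gene set, including unmeasured genes.
    w, b = params["dec_expr"]
    expr_hat = zc @ w + b
    # Decoder 2: a niche matrix, formed as F @ F.T so it is symmetric and PSD.
    w, b = params["dec_niche"]
    factor = (zc @ w + b).reshape(-1, n_cov, n_cov)
    covet_hat = factor @ np.swapaxes(factor, 1, 2)
    return expr_hat, covet_hat

# Illustrative sizes (assumptions): 6 shared genes, 20 genes in the full set.
n_shared, n_full, n_hidden, n_latent, n_cov = 6, 20, 16, 4, 6
params = {
    "enc_hidden": dense(n_shared + 1, n_hidden),
    "enc_out": dense(n_hidden, 2 * n_latent),
    "dec_expr": dense(n_latent + 1, n_full),
    "dec_niche": dense(n_latent + 1, n_cov * n_cov),
}
sc_cells = rng.poisson(1.0, size=(3, n_shared)).astype(float)
expr_hat, covet_hat = forward(sc_cells, 1, params, n_latent, n_cov)
print(expr_hat.shape, covet_hat.shape)  # (3, 20) (3, 6, 6)
```

Passing a dissociated cell through the niche decoder corresponds to conferring spatial context on scRNA-seq data, while passing a spatial cell through the expression decoder corresponds to imputing genes missing from the spatial panel.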
Future directions
We look forward to seeing researchers use COVET and ENVI to learn which genes drive spatial trends and interactions across diverse environments. Our efficient calculation of optimal transport ensures that very large datasets, which often contain rare but important biological signals, can benefit from interrogation tools developed for single-cell analysis. ENVI reveals that supervising machine learning with spatial data can recover key environmental factors from gene expression alone. For spatial technologies that evolve to measure many hundreds to thousands of genes, COVET will provide rich transcriptional information without the need for single-cell data integration.
Our approach could be augmented to identify a broader set of spatial trends. COVET is currently limited to assessing a single user-determined niche size at a time, whereas intercellular interactions can span very different length scales. Although different COVET representations can be computed rapidly for different niche sizes on the same data, a multiscale implementation could uncover biological trends at multiple scales more systematically.
As spatial platforms capture more intercellular signaling molecules, computation will also evolve to reveal more functional interactions between cells. Future work may be able to extract such information from the COVET representation by exploiting modularity in receptor and ligand expression³.
Doron Haviv & Tal Nawy
Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Expert opinion
“The novelty of the ENVI model lies in its ability to utilize spatial context (COVET) and full transcriptome information to learn reliable information transfer between modalities, which is one of the main reasons why ENVI outperforms other algorithms.” Kun Qu, University of Science and Technology of China, Hefei, China.
Behind the paper
As spatial transcriptomics technologies become ubiquitous, it is imperative to provide researchers with methods that put spatial context first. COVET capitalizes on the full continuous and quantitative aspects of spatial data. Our goal was to bridge complementary information from spatial and single-cell data in silico, to produce full-transcriptome datasets containing spatial information. One of the greatest impacts on COVET development was not conceptual, but mathematical. Serendipitously, by approximating the optimal transport distance between Gaussians, we found that covariance matrices can be efficiently compared via a variant of Euclidean distance. This renders the interpretation of COVET practically instantaneous using existing popular scRNA-seq analysis tools. This study required expertise across machine learning, cancer immunology and developmental biology, demonstrating the power of collaborative efforts spanning multiple domains of knowledge. D.H.
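One way to see how an optimal transport distance between Gaussians can reduce to a Euclidean-style distance: for zero-mean Gaussians whose covariance matrices commute, the 2-Wasserstein distance equals the Frobenius norm of the difference between the matrix square roots. The sketch below uses that expression as an approximation for general covariance matrices. Treating this as the "variant of Euclidean distance" mentioned above is an assumption, and the helper names `psd_sqrt` and `approx_ot_distance` are hypothetical.

```python
import numpy as np

def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def approx_ot_distance(cov_a, cov_b):
    """Frobenius distance between matrix square roots (assumed approximation)."""
    return np.linalg.norm(psd_sqrt(cov_a) - psd_sqrt(cov_b), ord="fro")

# Diagonal matrices commute, so here the approximation matches the exact value.
cov_a = np.diag([4.0, 9.0])
cov_b = np.diag([1.0, 1.0])
dist = approx_ot_distance(cov_a, cov_b)
print(dist)  # sqrt(5), about 2.236
```

Because the distance is an ordinary Euclidean norm on the square-root matrices, each cell's square root can be computed once, flattened into a vector, and passed to standard nearest-neighbor, clustering, or embedding tools without any pairwise optimal transport solves.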
From the editor
“While there are an abundance of computational analysis tools for spatial transcriptomics data, this paper stood out because of the innovative way in which spatial correlations are modeled.” Editorial Team, Nature Biotechnology.
References
1. Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022). A review article that covers recent developments in multiplexed spatial measurement technologies and analysis methods.
2. Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308 (2018). This paper analyzes breast tumor immune data, showing that gene–gene covariance is a robust measure that overcomes artifacts such as technical batch effects.
3. Burdziak, C. et al. Epigenetic plasticity cooperates with cell–cell interactions to direct pancreatic tumorigenesis. Science 380, eadd5327 (2023). This paper profiles genetic mouse models of pancreatic cancer to show that oncogenic mutation and tissue injury reshape communication between receptor and ligand gene modules in the tumor microenvironment.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This is a summary of: Haviv, D. et al. The covariance environment defines cellular niches for spatial inference. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02193-4 (2024)
Cite this article
Capturing and modeling cellular niches from dissociated single-cell and spatial data. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02207-1