Abstract
Modern multiomic technologies can generate deep multiscale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make the analysis and integration of high-dimensional omic datasets challenging. Here we present Significant Latent Factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors and the corresponding inference, and has rigorous false discovery rate control. Using SLIDE on single-cell and spatial omic datasets, we uncovered significant interacting latent factors underlying a range of molecular, cellular and organismal phenotypes. SLIDE outperforms or performs at least as well as a wide range of state-of-the-art approaches, including other latent factor approaches. More importantly, it provides biological inference beyond prediction that other methods do not afford. Thus, SLIDE is a versatile engine for biological discovery from modern multiomic datasets.
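To make the notion of interacting latent factors concrete, the following is a minimal toy sketch and not the SLIDE algorithm itself. It assumes two hypothetical latent factors (`z1`, `z2`) that are already known, simulates an outcome driven by one marginal effect and one interaction, and compares an additive linear fit with a fit that includes the product term. The helper `r_squared`, the effect sizes and the noise level are all illustrative assumptions. SLIDE additionally infers the latent factors from the data and controls the false discovery rate, neither of which is shown here.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical latent factors; in a real analysis these would be
# inferred from high-dimensional omic features rather than simulated.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)

# Simulated outcome: one marginal effect (z1) plus one interaction (z1 * z2).
y = 1.0 * z1 + 1.5 * z1 * z2 + rng.normal(scale=0.5, size=n)

# Additive model uses only the factors; the second model adds their product.
r2_additive = r_squared(np.column_stack([z1, z2]), y)
r2_interaction = r_squared(np.column_stack([z1, z2, z1 * z2]), y)

print(f"additive model R^2:   {r2_additive:.2f}")
print(f"with interaction R^2: {r2_interaction:.2f}")
```

In this simulation the additive model misses most of the outcome variance because the dominant signal lies in the product of the two factors, which is the kind of structure an interaction-aware method is meant to recover.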
Data availability
All data including the SSc scRNA-seq, 10X Visium, Slide-seq, CD4 T cell scRNA-seq and TCR-seq datasets and associated documentation are available at https://github.com/jishnu-lab/SLIDE and at https://github.com/jishnu-lab/SLIDEpre. Corresponding stable releases are available at https://doi.org/10.5281/zenodo.10159961 and https://doi.org/10.5281/zenodo.10159957, respectively. The relevant datasets have also been deposited at the Gene Expression Omnibus (accession IDs: GSE245112 and GSE247410 for the spatial and T1D datasets, respectively). Source data are provided with this paper.
Code availability
All code and documentation are available at https://github.com/jishnu-lab/SLIDE and at https://github.com/jishnu-lab/SLIDEpre. Corresponding stable releases are available at https://doi.org/10.5281/zenodo.10159961 and https://doi.org/10.5281/zenodo.10159957, respectively.
Acknowledgements
J.D. was supported in part by NIAID DP2AI164325, NIAID R01AI170108 and NHGRI U01HG012041. The authors acknowledge support from the University of Pittsburgh Center for Research Computing through the high-performance computing resources provided. The authors acknowledge all members of the Das lab for helpful discussions.
Author information
Contributions
J.D. conceived of the project and supervised all aspects. X.B., F.B. and M.W. developed the theoretical foundations of the method. J.R., H.X., A.R. and A.B.I.R. implemented SLIDE. R.A.L. designed and assembled the SSc cohort; T.T. carried out the corresponding scRNA-seq experiments. A.V.J. designed the T1D scRNA-seq/TCR-seq experiments, which were executed by P.M.Z. A.C.P. designed the 10X Visium and Slide-seq experiments, which were carried out by K.H. J.D. designed all computational analyses, which were carried out by J.R. and H.X. J.R., H.X. and J.D. interpreted results with input from R.A.L., A.V.J. and A.C.P. J.D., J.R. and H.X. wrote the manuscript with input from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling editor: Madhura Mukhopadhyay, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–6, Note 1 and Method.
Source data
Source Data Fig. 2
Statistical source data for Fig. 2c,f–h,i–k.
Source Data Fig. 3
Statistical source data for Fig. 3c,f,h,g,j,m.
Source Data Fig. 4
Statistical source data for Fig. 4c,f,g,i,l.
Source Data Fig. 5
Statistical source data for Fig. 5c,g,h,k,l.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rahimikollu, J., Xiao, H., Rosengart, A. et al. SLIDE: Significant Latent Factor Interaction Discovery and Exploration across biological domains. Nat Methods 21, 835–845 (2024). https://doi.org/10.1038/s41592-024-02175-z
DOI: https://doi.org/10.1038/s41592-024-02175-z