Revisiting the use of structural similarity index in Hi-C

Lee, Hanjun; Blumberg, Bruce; Lawrence, Michael S.; Shioda, Toshihiro

doi:10.1038/s41588-023-01594-6

Matters Arising
Published: 05 December 2023

Nature Genetics volume 55, pages 2049–2052 (2023)

A Reply to this Matters Arising⁴ was published on 05 December 2023.

The Original Article was published on 19 October 2020

arising from S. Galan et al. Nature Genetics https://doi.org/10.1038/s41588-020-00712-y (2020)

Identifying dynamic changes in chromatin conformation is a fundamental task in genetics that is rapidly advancing our understanding of how genes are expressed and regulated in cells. CHESS (Comparison of Hi-C Experiments using Structural Similarity), a computational algorithm developed by Galan, Machnik et al.¹, is a prominent example of the field's growing effort to systematically identify structural differences from chromosome conformation capture studies, such as those performed with Hi-C. Here we report that the main output of CHESS, the structural similarity index (SSIM), is more strongly influenced by unrelated genomic variables than by actual structural differences. As a consequence, we found, and independently reproduced, that the genome-wide distribution of SSIM remains largely unchanged across diverse query–reference pairs, even when query and reference reads are shuffled together to eliminate differences. Our findings advise caution in the use of the CHESS algorithm, specifically in interpreting the SSIM profiles it generates, and emphasize the need for standardized quality control protocols in chromosome conformation capture studies to accurately evaluate whether a metric depends on structural differences.
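For readers unfamiliar with SSIM, the following is a minimal sketch, not CHESS's implementation, of the SSIM formula of Wang et al.¹² applied to two contact matrices flattened into lists, using a single window over all values (CHESS and scikit-image³ compute SSIM differently, e.g. with sliding windows, so absolute values will not match). It also runs a toy simulation, under the assumption of a simple 1/(1 + distance) contact-decay model, in which two replicates are drawn from the *same* underlying structure at different read depths. In this toy setting, SSIM changes with depth alone, which illustrates how a variable unrelated to structure, such as coverage, can move the metric. The names `global_ssim`, `sample_contacts` and `replicate_ssim` are hypothetical.

```python
import random


def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM (Wang et al. 2004 formula) of two equal-length value lists.

    Uses one window covering all values and population (1/n) moments;
    sliding-window implementations average many local SSIM values instead.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def sample_contacts(n_bins, n_reads, rng):
    """Draw n_reads contacts from a toy distance-decay model (an assumption).

    Returns contact frequencies for the upper triangle (including the diagonal).
    """
    cells = [(i, j) for i in range(n_bins) for j in range(i, n_bins)]
    weights = [1.0 / (1 + j - i) for i, j in cells]
    counts = dict.fromkeys(cells, 0)
    for cell in rng.choices(cells, weights=weights, k=n_reads):
        counts[cell] += 1
    return [counts[c] / n_reads for c in cells]


def replicate_ssim(n_reads, n_bins=20, seed=0):
    """SSIM between two replicates with identical underlying structure."""
    rng = random.Random(seed)
    rep1 = sample_contacts(n_bins, n_reads, rng)
    rep2 = sample_contacts(n_bins, n_reads, rng)
    both = rep1 + rep2
    return global_ssim(rep1, rep2, data_range=max(both) - min(both))


if __name__ == "__main__":
    for depth in (200, 2_000, 200_000):
        print(depth, round(replicate_ssim(depth), 3))
```

Because both replicates share the same expected contact map, any drop in SSIM at low depth in this sketch comes from sampling noise rather than from a structural difference.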


**Fig. 1: Distributions of mean SSIM in Hi-C experiments.**

**Fig. 2: Intrinsic limitations of applying SSIM in Hi-C experiments.**

Data availability

HIC files for both the DLBCL and healthy B cell datasets² are available for download at https://github.com/vaquerizaslab/chess/tree/master/examples/dlbcl, while the raw FASTQ files can be accessed from the ArrayExpress archive under accession code E-MTAB-5875. COOL files for the reproduced and shuffled data can be accessed from https://github.com/hanjunlee21/StructuralSimilarity/tree/main/COOL and have been deposited at Zenodo (https://doi.org/10.5281/zenodo.7937194). HIC files for seven human cell types⁵,⁶ are available for download at the Gene Expression Omnibus under accession code GSE63525. FASTQ files for the GM12878 dataset are available for download at GSM2360314. The DNase I hypersensitivity assay dataset for GM12878 is available for download at https://www.encodeproject.org/experiments/ENCSR000EMT/. Source data are provided with this paper.

Code availability

All code required for the reproduction of our findings is available on GitHub (https://github.com/hanjunlee21/StructuralSimilarity) and has been deposited at Zenodo (https://doi.org/10.5281/zenodo.7937194). The HiCShuffle source code is publicly available at https://github.com/hanjunlee21/HiCShuffle and is indexed in PyPI as hicshuffle. The HiCShuffle source code has been deposited in Zenodo at https://doi.org/10.5281/zenodo.7937187. The CHESS source code¹ is publicly available at https://github.com/vaquerizaslab/CHESS and is indexed in PyPI as chess-hic.

References

1. Galan, S. et al. CHESS enables quantitative comparison of chromatin contact data and automatic feature extraction. Nat. Genet. 52, 1247–1255 (2020).
2. Díaz, N. et al. Chromatin conformation analysis of primary patient tissue using a low input Hi-C method. Nat. Commun. 9, 4938 (2018).
3. Van Der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).
4. Ing-Simmons, E., Machnik, N. & Vaquerizas, J. M. Reply to: Revisiting the use of structural similarity index in Hi-C. Nat. Genet. https://doi.org/10.1038/s41588-023-01595-5 (2023).
5. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
6. Sanborn, A. L. et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl Acad. Sci. USA 112, E6456–E6465 (2015).
7. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
8. Müller, C. A. et al. The dynamics of genome replication using deep sequencing. Nucleic Acids Res. 42, e3 (2014).
9. Van Steensel, B. & Belmont, A. S. Lamina-associated domains: links with chromosome architecture, heterochromatin, and gene repression. Cell 169, 780–791 (2017).
10. Djekidel, M. N., Chen, Y. & Zhang, M. Q. FIND: difFerential chromatin INteractions Detection using a spatial Poisson process. Genome Res. 28, 412–422 (2018).
11. Knight, P. A. & Ruiz, D. A fast algorithm for matrix balancing. IMA J. Numer. Anal. 33, 1029–1047 (2013).
12. Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
13. Busby, M. A. et al. Expression divergence measured by transcriptome sequencing of four yeast species. BMC Genomics 12, 635 (2011).

Author information

Authors and Affiliations

Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA, USA
Hanjun Lee, Michael S. Lawrence & Toshihiro Shioda
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Hanjun Lee & Michael S. Lawrence
Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Hanjun Lee
Department of Pathology, Harvard Medical School, Charlestown, MA, USA
Hanjun Lee & Michael S. Lawrence
Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
Bruce Blumberg
Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, USA
Bruce Blumberg
Department of Medicine, Harvard Medical School, Charlestown, MA, USA
Toshihiro Shioda


Contributions

H.L., B.B., M.S.L. and T.S. conceptualized the study. H.L. designed the software and carried out the investigation. H.L. prepared and wrote the original draft of the manuscript. H.L., B.B., M.S.L. and T.S. reviewed and edited the draft. H.L., M.S.L. and T.S. supervised the study.

Corresponding authors

Correspondence to Hanjun Lee or Toshihiro Shioda.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Schematic of data shuffling.

To eliminate any significant differences in chromatin contacts between the query and reference input files of CHESS, reads from the FASTQ files of Díaz et al.² were shuffled to create hybrid FASTQ files, each containing an identical fraction of reads from the DLBCL and NORMAL libraries. DLBCL, diffuse large B-cell lymphoma.
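The shuffling idea can be sketched as a random partition of read-pair identifiers: each hybrid library receives a random half of the query pairs and a random half of the reference pairs, so both hybrids share the same query:reference mix and any genuine query-versus-reference difference is averaged out. This is an illustrative sketch only; `make_hybrid_libraries` is a hypothetical name, and the inputs are assumed to be lists of read-pair identifiers rather than FASTQ files.

```python
import random


def make_hybrid_libraries(query_ids, reference_ids, seed=0):
    """Split two libraries into two hybrids, each holding half of each library.

    Read pairs are kept whole (one identifier per pair), so mates never separate.
    With odd counts, the extra pair goes to the second hybrid.
    """
    rng = random.Random(seed)
    query = list(query_ids)
    reference = list(reference_ids)
    rng.shuffle(query)
    rng.shuffle(reference)
    q_half, r_half = len(query) // 2, len(reference) // 2
    hybrid_a = query[:q_half] + reference[:r_half]
    hybrid_b = query[q_half:] + reference[r_half:]
    return hybrid_a, hybrid_b


if __name__ == "__main__":
    a, b = make_hybrid_libraries([f"q{i}" for i in range(6)], [f"r{i}" for i in range(4)])
    print(a, b)
```

Comparing the two hybrids with CHESS then gives a baseline in which no true structural difference should exist.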

Extended Data Fig. 2 Assessment of the mean SSIM subtraction approach proposed by Ing-Simmons et al.⁴

a, Distributions of mean SSIM in chromosome 2p for each chromatin-contact map comparison. The gray line indicates the subtracted mean SSIM value, defined as the difference between the mean SSIM value of diffuse large B cell lymphoma versus healthy B cells (blue) and the mean SSIM value of two shuffled datasets (red). b, Scatter plot of the relationship between the mean SSIM values of diffuse large B cell lymphoma versus healthy B cells and the subtracted mean SSIM values (Pearson’s r = −0.012, P = 0.796; two-tailed test). DLBCL, diffuse large B cell lymphoma; SSIM, structural similarity index measure.

Source data
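The subtraction and correlation described in this caption can be sketched as follows, assuming per-region mean SSIM values have already been computed for the real comparison (DLBCL versus healthy B cells) and for the shuffled comparison over the same regions. The function names and example values are hypothetical, and only Pearson's r is computed here; a two-tailed P value could be obtained with, for example, `scipy.stats.pearsonr`.

```python
import math


def subtracted_mean_ssim(real_ssim, shuffled_ssim):
    """Per-region mean SSIM (real comparison) minus mean SSIM (shuffled comparison)."""
    if len(real_ssim) != len(shuffled_ssim):
        raise ValueError("inputs must cover the same regions")
    return [a - b for a, b in zip(real_ssim, shuffled_ssim)]


def pearson_r(x, y):
    """Pearson correlation coefficient; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    real = [0.62, 0.55, 0.71, 0.48]      # hypothetical per-region values
    shuffled = [0.60, 0.57, 0.66, 0.50]  # hypothetical per-region values
    subtracted = subtracted_mean_ssim(real, shuffled)
    print(subtracted, pearson_r(real, subtracted))
```

A correlation near zero between the raw and subtracted values, as reported in the caption, means the subtraction reorders regions rather than simply shifting them.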

Extended Data Fig. 3 Assessment of the heuristic approach proposed by Ing-Simmons et al.⁴

a, Scatter plots of three key metrics (mean SSIM, inverse of the Fano factor, and mean absolute fold change). Magenta dots indicate regions that passed the heuristically defined thresholds proposed by Ing-Simmons et al.⁴ (bottom 10th percentile for mean SSIM and 90th percentile for the Fano factor); gray dots indicate regions that failed the thresholds. For each group, three representative regions were selected for further analyses (panels 1–6). b, Chromatin-contact maps for panels 1–6. Regions that passed the heuristically defined thresholds exhibited shallow read coverage and showed limited evidence of differential chromatin contact. DLBCL, diffuse large B-cell lymphoma; SSIM, structural similarity index measure.

Source data
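The percentile thresholds in this caption can be sketched as a region filter. This assumes per-region mean SSIM and Fano factor values are already available (how the Fano factor is derived from CHESS output is not specified here), and it reads the thresholds as mean SSIM at or below the 10th percentile and Fano factor at or above the 90th percentile, which is our interpretation. Percentiles use `statistics.quantiles(..., method="inclusive")`, which matches NumPy's default linear interpolation; other percentile conventions can shift the cutoffs slightly. The function names are hypothetical.

```python
import statistics


def fano_factor(values):
    """Variance-to-mean ratio, using the population variance."""
    return statistics.pvariance(values) / statistics.fmean(values)


def passes_heuristic(mean_ssim, fano, ssim_pct=10, fano_pct=90):
    """Indices of regions with low mean SSIM and high Fano factor.

    quantiles(n=100) returns the 1st..99th percentiles, so index k - 1 is the kth.
    """
    ssim_cut = statistics.quantiles(mean_ssim, n=100, method="inclusive")[ssim_pct - 1]
    fano_cut = statistics.quantiles(fano, n=100, method="inclusive")[fano_pct - 1]
    return [
        i
        for i, (s, f) in enumerate(zip(mean_ssim, fano))
        if s <= ssim_cut and f >= fano_cut
    ]


if __name__ == "__main__":
    ssim = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # hypothetical
    fano = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]                      # hypothetical
    print(passes_heuristic(ssim, fano))
```

As panel b shows, passing such a filter does not by itself indicate a differential contact; read coverage of the selected regions still needs to be checked.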

Extended Data Fig. 4 Schematic of data shuffling using HiCShuffle.

HiCShuffle is Python-based software indexed in PyPI as hicshuffle. For paired-end experiments, HiCShuffle generates four GZIP-compressed shuffled FASTQ files, each containing half of the query FASTQ file and half of the reference FASTQ file. HiCShuffle accepts both FASTQ and GZIP-compressed FASTQ input and runs on UNIX-based systems.
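The input/output behavior described above can be illustrated with a small sketch. This is not HiCShuffle's source code, and its command-line interface is not reproduced here; `shuffle_paired` and the output names `shuffled1_R1.fastq.gz` etc. are hypothetical. The sketch reads plain or GZIP-compressed paired-end FASTQ files, keeps mates together, sends a random half of the query pairs and half of the reference pairs to each of two shuffled samples, and writes four GZIP-compressed FASTQ files. It loads whole files into memory, so it is suitable only for small test inputs.

```python
import gzip
import os
import random


def open_fastq(path):
    """Open plain or GZIP-compressed FASTQ as text, detected via the gzip magic bytes."""
    with open(path, "rb") as fh:
        is_gz = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gz else open(path, "rt")


def read_records(path):
    """Return FASTQ records as 4-line tuples (header, sequence, '+', qualities)."""
    with open_fastq(path) as fh:
        lines = [line.rstrip("\n") for line in fh]
    if len(lines) % 4:
        raise ValueError(f"{path}: truncated FASTQ")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def read_pairs(r1_path, r2_path):
    r1, r2 = read_records(r1_path), read_records(r2_path)
    if len(r1) != len(r2):
        raise ValueError("R1/R2 record counts differ")
    return list(zip(r1, r2))


def write_gz(path, records):
    with gzip.open(path, "wt") as out:
        for rec in records:
            out.write("\n".join(rec) + "\n")


def shuffle_paired(query, reference, out_dir, seed=0):
    """query/reference: (R1 path, R2 path). Returns the four output paths."""
    rng = random.Random(seed)
    halves = ([], [])
    for r1, r2 in (query, reference):
        pairs = read_pairs(r1, r2)
        rng.shuffle(pairs)
        mid = len(pairs) // 2
        halves[0].extend(pairs[:mid])
        halves[1].extend(pairs[mid:])
    paths = []
    for k, pairs in enumerate(halves, start=1):
        for mate in (0, 1):  # write R1 and R2 in the same pair order
            path = os.path.join(out_dir, f"shuffled{k}_R{mate + 1}.fastq.gz")
            write_gz(path, [pair[mate] for pair in pairs])
            paths.append(path)
    return paths
```

Writing R1 and R2 from the same ordered list of pairs keeps mates synchronized, which downstream Hi-C aligners expect.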

Supplementary information

Reporting Summary

Source data

Source Data Extended Data Figs. 2 and 3

Statistical source data.

About this article

Cite this article

Lee, H., Blumberg, B., Lawrence, M.S. et al. Revisiting the use of structural similarity index in Hi-C. Nat Genet 55, 2049–2052 (2023). https://doi.org/10.1038/s41588-023-01594-6

Received: 07 September 2021
Accepted: 17 October 2023
Published: 05 December 2023
Issue Date: December 2023
DOI: https://doi.org/10.1038/s41588-023-01594-6

This article is cited by

Reply to: Revisiting the use of structural similarity index in Hi-C
- Elizabeth Ing-Simmons
- Nick Machnik
- Juan M. Vaquerizas
Nature Genetics (2023)