Pirastu et al.¹ perform the largest GWAS to date on male-pattern baldness (MPB), discover 71 loci (of which 30 are new) and draw inferences about its heritability and genetic architecture. They report a SNP heritability on the scale of liability (\(h_l^2\)) of 94%, with 38% of total heritability explained by the 71 loci. From these estimates, they draw strong conclusions about the genetic architecture of MPB. However, the chosen definition of the phenotype and the transformation applied to reach the unobserved scale of liability have led to a large upward bias in the estimates of these parameters, as we show here in theory and from data.
In the UK Biobank (UKB), MPB is measured on a four-point ordinal scale (values 1–4, with 1 representing no sign of baldness). Using the same UKB sub-sample selection as Pirastu et al. (unrelated British, genetically Caucasian, n = 54,813), the proportions of men self-reporting MPB scores of 1, 2, 3 and 4 are 0.317, 0.229, 0.269 and 0.185, respectively. In their analysis, the authors ignore the 23% of the population with a score of 2, and define ‘cases’ as those with self-reported scores of 3 or 4, and ‘controls’ as those with self-reported scores of 1, leading to a ‘prevalence’ of 59%. Yet the reported \(h_l^2\) estimates are presented as if they were parameters of the whole population. An implicit assumption of their approach is that those self-reporting a score of 2, which they consider to be ‘rather dubious baldness’, are randomly drawn from the population. To determine whether this assumption is valid, we took the 47 most associated independent autosomal loci that were identified independently of the UKB data²⁻⁶,¹⁰ (to avoid bias) and then used the same UKB data as Pirastu et al. to estimate the frequencies of the trait-increasing alleles for each of the four scores. The results (Fig. 1) show that these frequencies are approximately linear in scores 1–4, so score 2 is clearly not random with respect to liability. Moreover, the observed pattern is consistent with an additive model on the scale of these scores. Therefore, since a score of 2 is correlated with liability to MPB, ignoring individuals with a score of 2, without accounting for the resulting extreme tail ascertainment, will bias the estimates of genetic parameters. We derived from theory the general transformation equation that should be applied to the estimate of heritability made on the binary observed scale in samples that are ascertained based on tail selection and/or oversampling of cases or controls (\(h_{o[s]}^2\)) to achieve unbiased estimates of \(h_l^2\) (equation [1] in Supplementary Methods).
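The arithmetic behind the 59% ‘prevalence’ can be checked directly from the four category proportions quoted above. The short Python sketch below is only an illustration of that calculation; the variable names are ours and do not come from either paper.

```python
# Proportions of men in each self-reported MPB category (scores 1-4),
# as quoted in the text for the UKB sub-sample.
proportions = {1: 0.317, 2: 0.229, 3: 0.269, 4: 0.185}

# Case/control definition described in the text: scores 3 or 4 are
# 'cases', score 1 is 'controls', score 2 is dropped from the analysis.
cases = proportions[3] + proportions[4]
controls = proportions[1]
excluded = proportions[2]

# Fraction of cases among the individuals actually analysed.
case_fraction_in_sample = cases / (cases + controls)

print(f"scores 3+4 in the whole population: {cases:.3f}")
print(f"excluded (score 2):                 {excluded:.3f}")
print(f"cases among analysed individuals:   {case_fraction_in_sample:.3f}")
```

The case fraction among analysed individuals (about 0.59) is noticeably higher than the share of scores 3 and 4 in the whole population (about 0.45), which is why the 59% figure cannot be read as a population prevalence.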
We first replicated the results of Pirastu et al., using their sampling design and model (as best we could deduce from the details provided) and the same UK Biobank data. The estimate of \(h_{o[s]}^2\) for scores 3 + 4 vs. score 1 using GCTA⁷ was 0.61 (standard error, s.e. = 0.03). If this is transformed to the scale of liability using the standard equation⁸ (equation [2] in Supplementary Methods), then the estimate of \(h_l^2\) is 0.98 (s.e. = 0.04), similar to the estimate reported by Pirastu et al. However, the correct transformation (equation [1] in Supplementary Methods) generates an estimate of 0.64 (s.e. = 0.03). To empirically explore assumptions of the liability threshold model, we analysed random samples of 20,000 males dichotomised in a number of ways (Table 1). These analyses generated estimates of \(h_l^2\) in the range of 0.61–0.75. We also analysed MPB on the continuous scale of 1–4, which does not remove information through dichotomisation, and transformed the estimate of heritability to the liability scale⁹ (equation [3] in Supplementary Methods), giving \(h_l^2\) = 0.69 (s.e. = 0.03).
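To make the effect of the standard transformation concrete, the sketch below applies the widely used observed-to-liability conversion in the form given by Lee et al. (2011) to the observed-scale estimate of 0.61. Equation [2] itself is in the Supplementary Methods and is not reproduced here, so treating it as this formula, and treating 0.59 as both the population prevalence K and the sample case fraction P, are assumptions made for illustration. The corrected transformation (equation [1]) is not sketched because its form is not given in the text.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, sample_case_fraction=None):
    """Convert an observed 0/1-scale heritability to the liability scale.

    Assumed form (Lee et al. 2011):
        h2_l = h2_o * [K(1-K)]^2 / (z^2 * P(1-P))
    where K is the population prevalence, P the case fraction in the
    sample and z the standard normal density at the liability threshold.
    With P == K this reduces to h2_o * K(1-K) / z^2.
    """
    k = prevalence
    p = k if sample_case_fraction is None else sample_case_fraction
    standard_normal = NormalDist()
    threshold = standard_normal.inv_cdf(1.0 - k)  # liability cut-off
    z = standard_normal.pdf(threshold)            # density at the cut-off
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Observed-scale estimate from the text, with 0.59 used as K and P
# (an assumption about how the uncorrected calculation was done).
naive_h2_liability = liability_scale_h2(0.61, 0.59)
print(f"{naive_h2_liability:.3f}")
```

Under these assumptions the function returns a value of roughly 0.98, in line with the uncorrected estimate quoted above. The calculation takes no account of the fact that individuals with a score of 2 were removed, which is the ascertainment that the corrected transformation is designed to handle.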
We estimated the variance explained by the 107-SNP predictor from the difference in the estimate of total phenotypic variance between models excluding and including the predictor as a fixed effect. This method of estimating the contribution of the SNP predictor to trait variation differs from that presented by Pirastu et al. In contrast to their approach, it does not depend on unbiased estimation of genetic variance in the two models. Moreover, it is accurate (the s.e. of an estimate of phenotypic variance is small) and quantifies the parameter that is most relevant to epidemiology and risk prediction. From the estimate of the variance explained by the predictor, we calculated the proportion of variance it explained on the observed scale and then transformed this proportion to the scale of liability. The results (Table 1) imply that the variance in liability attributable to this predictor is ~15–20%, substantially less than claimed by the authors.
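The idea of measuring a predictor's contribution as a drop in phenotypic variance can be illustrated with a deliberately simplified sketch. It uses ordinary least squares on simulated data rather than the mixed model fitted to the UKB data, and the sample size, the true proportion of 0.2 and all names are hypothetical choices made for the example; it also stops at the observed scale and does not perform the subsequent transformation to liability.

```python
import random
import statistics


def proportion_explained_by_predictor(phenotype, predictor):
    """Relative drop in phenotypic variance when `predictor` is fitted
    as a single fixed effect (simple linear regression)."""
    mean_y = statistics.fmean(phenotype)
    mean_x = statistics.fmean(predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, phenotype))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    slope = sxy / sxx
    residuals = [y - mean_y - slope * (x - mean_x)
                 for x, y in zip(predictor, phenotype)]
    var_without = statistics.pvariance(phenotype)  # model excluding predictor
    var_with = statistics.pvariance(residuals)     # model including predictor
    return (var_without - var_with) / var_without


# Hypothetical simulation: a predictor that truly explains 20% of the
# variance of a phenotype with total variance 1.
random.seed(1)
n = 20_000
predictor = [random.gauss(0.0, 0.2 ** 0.5) for _ in range(n)]
phenotype = [x + random.gauss(0.0, 0.8 ** 0.5) for x in predictor]

estimate = proportion_explained_by_predictor(phenotype, predictor)
print(f"proportion of variance explained: {estimate:.3f}")
```

Because the quantity is a difference between two phenotypic variances, it does not require the genetic variance component to be estimated without bias in either model, which is the property highlighted in the paragraph above.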
In conclusion, the evidence presented by Pirastu et al. is not consistent with the claims that virtually all variation in liability to MPB is genetic and that common SNPs capture all of that variation. A correct transformation from the observed scale to the scale of liability results in an estimate of SNP heritability of ~60–70%, and the 71 loci (107-SNP predictor) explain about 15–20% of the variation in liability.
Change history
20 November 2018
The original version of this Article contained an error in the spelling of the author Julia Sidorenko, which was incorrectly given as Julia Sirodenko. This has now been corrected in both the PDF and HTML versions of the Article. Further, the sixth sentence of the second paragraph of the Correspondence and the legend to Fig. 1 incorrectly omitted citation of work by Heilmann-Helmbach, S. et al. This has now been corrected in both the PDF and HTML versions of the Article.
References
1. Pirastu, N. et al. GWAS for male-pattern baldness identifies 71 susceptibility loci explaining 38% of the risk. Nat. Commun. 8, 1584 (2017).
2. Li, R. et al. Six novel susceptibility loci for early-onset androgenetic alopecia and their unexpected association with common diseases. PLOS Genet. 8, e1002746 (2012).
3. Hillmer, A. M. et al. Susceptibility variants for male-pattern baldness on chromosome 20p11. Nat. Genet. 40, 1279–1281 (2008).
4. Brockschmidt, F. F. et al. Susceptibility variants on chromosome 7p21.1 suggest HDAC9 as a new candidate gene for male-pattern baldness. Br. J. Dermatol. 165, 1293–1302 (2011).
5. Richards, J. B. et al. Male-pattern baldness susceptibility locus at 20p11. Nat. Genet. 40, 1282–1284 (2008).
6. Heilmann, S. et al. Androgenetic alopecia: identification of four genetic risk loci and evidence for the contribution of WNT signaling to its etiology. J. Invest. Dermatol. 133, 1489–1496 (2013).
7. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
8. Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).
9. Gianola, D. Heritability of polychotomous characters. Genetics 93, 1051–1055 (1979).
10. Heilmann-Helmbach, S. et al. Meta-analysis identifies novel risk loci and yields systematic insights into the biology of male-pattern baldness. Nat. Commun. 8, 14694 (2017).
Acknowledgements
This research has been conducted using the UK Biobank Resource under project 12514.
Author information
Contributions
P.M.V. and N.R.W. designed the experiment and derived theory. C.Y., J.S., R.E.M., and L.Y. performed analyses, and all authors contributed to writing the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yap, C.X., Sidorenko, J., Marioni, R.E. et al. Misestimation of heritability and prediction accuracy of male-pattern baldness. Nat Commun 9, 2537 (2018). https://doi.org/10.1038/s41467-018-04807-3
DOI: https://doi.org/10.1038/s41467-018-04807-3