Main

Germline mutations are the proximate source of genomic innovation and inherited diseases4. Consequently, considerable effort has been spent on characterizing the molecular processes underlying these mutations and estimating germline mutation rates (GMRs). Mutations are rare events, yet the frequency at which they are introduced into genomes at each generation varies considerably across taxa, from approximately 10−11 mutations per site per generation in unicellular eukaryotes up to approximately 10−7 mutations per site per generation in multicellular eukaryotes1,5,6. Inferring the driving forces of GMR evolution has important implications for understanding the mechanisms underlying mutagenesis. Several hypotheses have been proposed to explain variation in GMRs among lineages. Some of these invoke molecular mechanisms such as DNA methylation7 or microsatellite instability8, whereas others invoke external factors such as exposure to mutagenic environments9. Other studies have argued that life-history traits might explain some of the variation both in the prevalence of mutations and in the ability to repair DNA. In particular, the generation time10 and the metabolic rate11 have been suggested to be key life-history traits that could be associated with germline mutations. From a long-term evolutionary perspective, the ‘drift barrier hypothesis’ proposes that lower mutation rates may reflect the increased efficiency of natural selection at reducing the occurrence of mutations in species with large effective population sizes3.

However, a lack of accurate and standardized GMR estimation has so far precluded testing current hypotheses of GMR evolution. Pedigree-based estimates of GMRs per generation have recently been published for a handful of vertebrate species, mainly focusing on humans and primates12,13,14,15,16,17. Furthermore, a recent comparative study of 16 mammalian species identified an effect of lifespan on somatic mutation rates inferred from the sequencing of intestinal crypts18. Nevertheless, interspecific comparisons of GMR variation remain restricted in taxonomic scope19, partly due to the difficulty of comparing GMR estimates derived using different methodologies2. For example, alternative bioinformatic pipelines used in different studies can yield GMR estimates that vary by a factor of two, even when applied to the same parent–offspring trios2. This highlights the importance of applying consistent analytical pipelines for interspecies comparisons of GMRs. We therefore generated high-depth genome sequences (average coverage of more than 67×) for 323 individuals representing 151 trios of 68 vertebrate species, including 36 mammals, 18 birds, 8 ray-finned fishes and 6 reptiles (Supplementary Table 1). We then quantified species-specific GMRs across this wide range of vertebrate taxa using consistent bioinformatics pipelines to test long-standing evolutionary hypotheses on GMR evolution.

Per-generation mutation rate variation

We first estimated the per generation GMR (µgeneration) for each trio (that is, mother, father and offspring) by comparing parental and offspring genomes (Fig. 1a, Supplementary Tables 2 and 3 and Supplementary Figs. 1–5 for details on the method). Overall, µgeneration varies by a factor of 40 across all species. On average, mutation rates per generation are higher in reptiles (average of all species 1.17 × 10−8, 95% CI of the mean = 5.34 × 10−9 to 1.80 × 10−8) and birds (average of all species 1.01 × 10−8, 95% CI of the mean = 6.10 × 10−9 to 1.42 × 10−8) than in mammals (average of all species 7.97 × 10−9, 95% CI of the mean = 7.04 × 10−9 to 8.90 × 10−9) and fishes (average of all species 5.97 × 10−9, 95% CI of the mean = 4.39 × 10−9 to 7.55 × 10−9). However, the difference among the four major classes of vertebrates is not overall statistically significant (analysis of variance (ANOVA): F = 1.86, P = 0.15). Furthermore, the amount of variation in µgeneration among species tends to be higher for birds and lower for mammals and fishes (Fig. 1a), although this variation is arguably modest given large differences in life-history traits among these species (for example, there is a 2.8 million-fold difference in the body mass of killer whales and Siamese fighting fish, and there is a 93-fold difference in the generation time between humans and Texas banded geckos).

Fig. 1: Variation in GMRs and their association with life-history traits across 68 vertebrate species.

a, The phylogenetic tree of 68 species is based on UCE data and is calibrated with fossil data at 14 nodes (see Methods; Extended Data Fig. 3 and Supplementary Fig. 8). The average pedigree-based mutation rates per generation for each species, which are represented by the squares, show 40-fold variation among species. The 95% binomial confidence intervals are shown, and individual trios are represented by round points. See Supplementary Table 8 and Extended Data Fig. 4 for a comparison with published estimated rates of closely related species. b, The per-generation mutation rate is significantly associated with the average parental age at the time of offspring production across all individuals with known paternal age (105 trios), using linear regression. For birds, this relationship is statistically significant after removing a single outlier, the Darwin’s rhea. c, The male-to-female contribution ratio (α) is estimated for groups of vertebrates having at least 30 mutations phased to their parents of origin in each group. The highest male bias (7.6:1) is found in two bird lineages, whereas fishes and reptiles show negligible male bias. The data are represented with 95% confidence intervals based on the binomial variance. The silhouette of Syngnathus scovelli was created by J.S. All other silhouettes are from PhyloPic (http://phylopic.org), except for one of the silhouettes of Sarcophilus harrisii, which was created by S. Werning, and the silhouette of Pan troglodytes, which was created by T. M. Keesey (vectorization) and T. Hisgett (photography); both are available under a CC-BY 3.0 licence (https://creativecommons.org/licenses/by/3.0).

Species with longer generation intervals are expected to have higher per-generation mutation rates due to a combination of a larger number of cell divisions in spermatogenesis and more time for DNA damage to accumulate12,13,14,20. For the 105 trios for which parental age was known at reproduction, we found a significant positive association between µgeneration and the average parental age at reproduction (linear regression adjusted r2 = 0.14, P = 3.9 × 10−5; Fig. 1b). This pattern is also significant for the 60 mammalian trios with known parental ages (linear regression adjusted r2 = 0.37, P = 1.6 × 10−7) and for the 32 bird trios after excluding a single outlier, the Darwin’s rhea (linear regression adjusted r2 = 0.31, P = 0.0005). Furthermore, all three of these regressions have similar positive y-intercept values on the order of approximately 0.59 × 10−8 mutations per site per generation. For the trios with known parental ages, paternal and maternal ages at conception are strongly correlated (linear regression adjusted r2 = 0.77, P < 2.2 × 10−16; Extended Data Fig. 1). However, multiple linear regression showed that the age of the father is the most significant explanatory variable (adjusted r2 = 0.15, P = 9.3 × 10−5; paternal age P = 0.018; maternal age P = 0.785). Thus, a stronger effect of paternal than maternal age on the mutation rate seems to be universal for birds and mammals due to more germline mutations accumulating throughout the life of the male.

The specific types of de novo mutations (DNMs) observed across the 151 trios are concordant with the results of previous studies of individual species12,13,14,21,22,23,24,25, including a ratio of transitions over transversions of 2.3 (95% CI on binomial distribution = 2.2–2.5) and a high proportion (48.5%, 95% CI on binomial distribution = 46.7–50.3%) of transitions from strong base pairing to weak base pairing (C:G > T:A) across all DNMs (Supplementary Table 4). Among C:G > T:A mutations, 42.4% (95% CI on binomial distribution = 39.9–45.0%) occurred at CpG sites. The direction of mutations from one base to another (that is, the spectrum of mutation) differed significantly across vertebrate classes (χ2 = 30.0, d.f. = 15, P = 0.012; Supplementary Table 4 and Supplementary Fig. 6). We also found significant differences among vertebrate classes for A > C mutations (χ2 = 16.2, d.f. = 3, P = 0.001) and for C > A mutations (χ2 = 8.8, d.f. = 3, P = 0.032). In particular, fish species exhibit significantly fewer A > C mutations and significantly more C > A mutations than the other vertebrate classes. However, this mutation pattern does not appear to be associated with genome-wide CG content, as overall, the CG content of fishes is similar to that of mammals and birds and lower than that of reptiles (Supplementary Fig. 7). Finally, there is no significant difference between the classes of species in the percentage of all mutations located in CpG sites (χ2 = 4.3, d.f. = 3, P = 0.23), implying that high mutation rates at CpG sites are a conserved feature across vertebrates.

Variable male-driven evolution

In mammals and birds, the much larger number of germ-cell divisions per generation in the male germ line leads to the expectation of a male mutation rate bias, coined the ‘male-driven evolution hypothesis’26,27. However, very little is known about interspecific variation in the magnitude of the male-to-female ratio of the contribution of germline mutations (α). Previous studies have reported high α values in mammals (ranging from 1.0 to 20.1)28 and birds (ranging from 3.9 to 6.5)29 based on indirect estimates obtained by comparing rates of sequence divergence on the autosomes and sex chromosomes (see Extended Data Fig. 2 and Supplementary Table 5). However, other evolutionary forces can also act differently on the X chromosome and autosomes. For example, stronger natural selection on the X chromosome could lead to lower than expected divergence from the common ancestor, upwardly biasing estimates of α28. Furthermore, estimates of α derived in this way are averages over a phylogenetic branch and may thus differ from the contemporary species α. Here we directly quantified α by assigning the parental origin of the DNMs. Around 48% of all 3,034 DNMs across all of the trios could be phased to their parental origin (see Supplementary Table 6 for positions of all mutations). Owing to the relatively small number of mutations in each trio (Supplementary Table 2), we analysed male bias after taxonomically grouping the species into classes and orders (Fig. 1c).
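As a minimal sketch of how α and its interval could be obtained from phased mutation counts, the example below takes the ratio of paternal to maternal DNMs and transforms a normal-approximation binomial interval on the paternal fraction p into p/(1 − p). The counts are hypothetical, and the interval construction is an assumption made for illustration; it is not necessarily the exact procedure used for Fig. 1c.

```python
import math

def male_bias(paternal, maternal, z=1.96):
    """Return (alpha, lower, upper) for the male-to-female mutation ratio.

    alpha = paternal / maternal. The interval is built on the paternal
    fraction p using the binomial variance p(1 - p)/n, then each bound is
    mapped to the ratio scale with p / (1 - p). This is an illustrative
    assumption, not the published procedure.
    """
    total = paternal + maternal
    p = paternal / total
    se = math.sqrt(p * (1 - p) / total)
    low_p = max(p - z * se, 0.0)
    high_p = p + z * se
    lower = low_p / (1 - low_p)
    # If the upper bound on p reaches 1, the ratio is unbounded.
    upper = math.inf if high_p >= 1 else high_p / (1 - high_p)
    return paternal / maternal, lower, upper

# Hypothetical counts of phased DNMs for one taxonomic group.
alpha, lower, upper = male_bias(paternal=70, maternal=30)
print(f"alpha = {alpha:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the transformation p/(1 − p) is nonlinear, the resulting interval is asymmetric around α, as are the intervals reported in the text.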

Mammals showed a male bias of α = 2.3 (95% CI = 2.0–2.6). In general, our α estimates are in line with previous estimates derived for similar species based on genome alignments30,31. For example, we found that among mammals, primates have the largest male bias with α = 3.8 (95% CI = 2.6–5.7), similar to what was previously reported for several species belonging to this group12,13,14,21,22,32,33. Rodents have the lowest male bias among the mammals in our study, with α = 2.1 (95% CI = 1.4–3.1), consistent with a previous study based on mouse pedigrees34. This pattern can be explained by the short generation time of rodents, which leads to a smaller difference in cell divisions between the male and female germ lines35. However, the variation in α is relatively small given the variation in generation time among species (for example, between 30 years for humans and 8 months for the short-tailed opossum). Thus, an alternative hypothesis to explain the observed α would be a higher contribution of DNA damage, specifically in the male germ line, for species with long generation times31.

Birds also showed an overall high male bias with α = 3.2 (95% CI = 2.5–4.1), although there is appreciable variation among different lineages. In particular, passerine birds and waterbirds (Pelecaniformes and Sphenisciformes) exhibited the largest male bias, both with α = 7.6 (95% CI = 4.3–13.5 for Passeriformes and 95% CI = 3.5–16.3 for Pelecaniformes and Sphenisciformes). High levels of male–male competition will lead to an increased amount of sperm being produced and faster sperm turnover, which would be expected to cause a higher male bias36. Indeed, many passerine birds have large cloacal protuberances37 and relatively heavy testes38, which are often used as proxies of sperm competition39. For instance, in two of the passerine species included in our study, testes represent between 1.2% (for Turdus merula) and over 2% (for Saxicola maurus) of the total body mass38. Moreover, extra-pair mating is common in many passerine birds40 as well as in penguins41, also indicating a high level of sperm competition. Overall, our results lend further support to the male-driven hypothesis in birds and mammals27.

By contrast, reptiles have a relatively small male bias with α = 1.5 (95% CI = 1.2–1.8), whereas fishes appear to have a greater proportion of mutations of maternal origin (α = 0.8), although the 95% CI overlaps 1 (95% CI = 0.5–1.4). This variation among vertebrate classes can be explained by differences in the process of gametogenesis. Although most birds and mammals produce sperm cells continuously through time42, reptiles and fishes tend to be seasonal breeders, producing sperm cells during a limited period before the mating season43,44,45, which will tend to reduce differences in cell division numbers between males and females, leading to more equal α. Moreover, female fishes are usually synchronous ovulators46, producing hundreds to millions of eggs at the same time followed by a proliferation of new oogonia47. This implies that females continually produce germ cells throughout their life, which would further reduce the difference in cell division number between males and females.

Species with lower sex bias also exhibited a larger proportion of shared mutations between siblings, with 12.0% (s.e. of 6.5%) of shared mutations between siblings for fish and 8.1% (s.e. of 5.3%) for reptiles compared with 1.5% (s.e. of 0.7%) for mammals and 2.2% (s.e. of 1.4%) for birds (Supplementary Table 7). An explanation for the repeated occurrence of those mutations is that they appear during the primordial germ cell specification in one of the parents48. The occurrence of primordial germ cell specification mutations is independent of parental sex. Consequently, a higher number of primordial germ cell specification mutations in some vertebrate groups could be an alternative explanation for the lower male-biased contribution to DNMs.

Yearly mutation rates

To use our results for phylogenetic dating and to compare the speed of evolution among species with different generation times, we needed estimates of yearly mutation rates. Different methods have been used in the literature to estimate yearly mutation rates. When sample sizes are small, yearly rates are commonly inferred by dividing the per-generation rate by the average age of the parents (or the generation time if parental age is unknown)49,50,51. However, this method implicitly assumes a constant accumulation of mutations from conception to reproduction, that is, the regression line of mutation rate on parental age should run through the origin. Our results (Fig. 1b), as well as previous studies of mice, humans and cats20,34, imply that parents always carry a minimum number of mutations in their gametes regardless of their age. This could lead to the yearly rate being overestimated for a given species if the sampled trio (or trios) had young parents compared with the average generation time for that species52. Consequently, we built a model that incorporates this mutational contribution at birth. Unfortunately, small per-species sample sizes in our dataset precluded modelling the effects of parental age separately for each species. However, we observed very similar intercepts and slopes across taxonomic groups, allowing us to fit a joint model for all species. A Poisson model explaining the number of mutations in each trio using a mutational contribution at birth and a weighted average of paternal and maternal age fits the data surprisingly well. To incorporate interspecific variation in male bias, we used the per-species fraction of paternal and maternal mutations estimated using read-backed phasing to weight the average of the parental ages for each trio. Using this model, the number of predicted mutations matches the observed number with an overall r2 of 0.73 (mammalian r2 = 0.58, avian r2 = 0.51; Supplementary Note 1).
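The structure of such a model can be sketched as follows. This is a simplified illustration, not the parameterization of Supplementary Note 1: the function names, the linear form of the rate and all numerical values are assumptions. The expected number of DNMs in a trio is taken to be the callable diploid genome size multiplied by a per-site rate composed of a contribution at birth plus a slope times the phasing-weighted parental age.

```python
import math

def weighted_parental_age(paternal_age, maternal_age, paternal_fraction):
    """Average parental age weighted by the fraction of DNMs of paternal origin."""
    return paternal_fraction * paternal_age + (1 - paternal_fraction) * maternal_age

def expected_dnms(mu_birth, slope, age, callable_sites):
    """Expected DNM count: per-site rate times two haploid callable genomes."""
    return 2 * callable_sites * (mu_birth + slope * age)

def poisson_log_likelihood(observed, expected):
    """Log-probability of an observed count under a Poisson with the given mean."""
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)

# Hypothetical trio: all values below are illustrative assumptions.
age = weighted_parental_age(paternal_age=10.0, maternal_age=8.0, paternal_fraction=0.7)
mean = expected_dnms(mu_birth=5.9e-9, slope=2e-10, age=age, callable_sites=2e9)
print(f"weighted age = {age:.1f} years, expected DNMs = {mean:.2f}")
print(f"log-likelihood of observing 31 DNMs = {poisson_log_likelihood(31, mean):.3f}")
```

Fitting would then consist of choosing the contribution at birth and the slope that maximize the summed log-likelihood across all trios.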

The yearly rates inferred with the naive method of dividing the per-generation rate by parental age (µyearly) and the rates obtained with our model (µyearly_modelled) yielded similar results (Pearson’s correlation r2 = 0.40, P = 0.002), and for 55% of the species, µyearly falls within the 95% confidence interval of the µyearly_modelled. As expected, the estimates showed the greatest differences for those species in which the parents reproduced far from the generation time, with the model-based estimates being smaller for those species that reproduced earlier than their generation time and larger for those species that reproduced later than their generation time. For example, the pigs in our dataset reproduced at around 6 months of age, which is more than 5 years earlier than the estimated generation time of this species. Thus, µyearly = 8.64 × 10−9 mutations per site per year was potentially overestimated compared with the µyearly_modelled = 1.05 × 10−9 mutations per site per year at the generation time. Conversely, the yearly rate of the Texas banded gecko was potentially underestimated at µyearly = 3.17 × 10−9 mutations per site per year using the reproductive age of 2 years of age from our dataset, whereas the modelled rate was µyearly_modelled = 1.96 × 10−8 mutations per site per year at a generation time of between 3 and 4 months. Both the naive method and the modelled method have been used in the literature to estimate yearly rates and both have caveats owing to the underlying assumptions they require. Bearing this in mind, we decided to use µyearly_modelled for the current analysis as we believe that this measure is more representative of the yearly rate at the generation time for each species (estimated yearly rates are provided in Supplementary Table 9 for comparison).
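The direction of this bias can be illustrated with a short numerical sketch. It assumes, for illustration only, that the per-generation rate is a linear function of parental age with a positive intercept, and that the modelled yearly rate is the predicted per-generation rate at the generation time divided by the generation time. The intercept is of the order reported above for Fig. 1b; the slope, ages and generation time are hypothetical.

```python
def per_generation_rate(age, intercept, slope):
    """Per-site, per-generation rate under an assumed linear age effect."""
    return intercept + slope * age

def yearly_rate(per_generation, years):
    """Convert a per-generation rate into a per-year rate."""
    return per_generation / years

INTERCEPT = 5.9e-9   # contribution at birth (order of magnitude from Fig. 1b)
SLOPE = 2e-10        # hypothetical increase per year of parental age

sampled_age = 0.5      # hypothetical: parents reproduced at 6 months
generation_time = 5.0  # hypothetical species generation time in years

# Naive estimate: observed per-generation rate divided by the sampled age.
naive = yearly_rate(per_generation_rate(sampled_age, INTERCEPT, SLOPE), sampled_age)

# Model-based estimate: predicted rate at the generation time, divided by it.
modelled = yearly_rate(per_generation_rate(generation_time, INTERCEPT, SLOPE), generation_time)

print(f"naive yearly rate    = {naive:.2e}")
print(f"modelled yearly rate = {modelled:.2e}")
```

With a non-zero intercept, dividing by a parental age much shorter than the generation time spreads the contribution at birth over too few years, which inflates the naive estimate.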

The estimated average µyearly_modelled varies more than 120-fold among species (Supplementary Note 1 and Supplementary Table 9), with the highest µyearly_modelled estimated for the Texas banded gecko at 1.96 × 10−8 mutations per site per year (95% CI = 1.23 × 10−8 to 2.83 × 10−8), whereas the lowest µyearly_modelled estimates were obtained for two bird species, the griffon vulture and the snowy owl, both with less than 0.18 × 10−9 mutations per site per year (snowy owl: µyearly_modelled = 0.16 × 10−9, 95% CI = 0.05 × 10−9 to 0.34 × 10−9; griffon vulture: µyearly_modelled = 0.17 × 10−9, 95% CI = 0.07 × 10−9 to 0.32 × 10−9). This large amount of interspecific variation is remarkable given that pedigree-based GMR estimates of individual species assessed by previous separate studies only show an approximately 16-fold variation in yearly GMRs34,51. Within primates, we observed a twofold variation across species and found a general trend for rates to be higher in the New World monkeys than in the great apes. This is consistent with previous independent estimates from primates19 and supports the ‘hominoid slowdown’ hypothesis53,54,55,56.

Next, we used µyearly_modelled to assess the strength of the association between GMRs and long-term evolutionary substitution rates. To obtain an estimate of the long-term substitution rate, we used the alignment of ultraconserved elements (UCEs), which are more likely to align among taxonomically distant species, plus 1,000 bp of flanking regions on each side of the UCE sequences, which will more closely reflect the neutral substitution rate57. We found a significant positive correlation between µyearly_modelled and the UCE substitution rate after excluding domesticated species owing to their overall much higher yearly mutation rates (see the following section; phylogenetic generalized least squares (PGLS): adjusted r2 = 0.23, P = 0.002; Fig. 2a). This pattern is especially pronounced for mammals (PGLS: adjusted r2 = 0.44, P = 0.0008), even after removing the two outliers (PGLS: adjusted r2 = 0.32, P = 0.009). We also found a significant relationship between µyearly_modelled and the long-term substitution rate inferred using whole-genome alignments (PGLS: adjusted r2 = 0.12, P = 0.02; Fig. 2b).

Fig. 2: GMRs are associated with long-term substitution rates.

a,b, There is a positive association between the modelled yearly pedigree-based mutation rates and the macroevolutionary substitution rates when using phylogenetic regression (PGLS) on both UCEs and their flanking sequences (a) and whole-genome alignments (WGAs) (b). The grey dashed lines indicate equality. See Extended Data Fig. 5 for plots of the same data on a log scale and Extended Data Fig. 6 for a comparison of UCE and WGA methods.

Life-history traits shape GMR variation

To test various hypotheses relating to the causes of GMR variation among species, we tested for associations between the modelled mutation rate per generation (µgeneration_modelled) and life-history traits including mating system (monogamy versus polygamy), maturation time, body mass, longevity, fecundity and the generation time (Supplementary Table 9). We used the µgeneration_modelled instead of the µgeneration as the former is less dependent on the age of the parents and is more representative of the rate at generation time for a given species. When phylogenetic relatedness is taken into account, many of these traits are significantly associated with µgeneration_modelled including the generation time (PGLS: adjusted r2 = 0.15, P = 0.002; Fig. 3a), the maturation time (PGLS: adjusted r2 = 0.18, P = 0.0006; Fig. 3b) and the number of offspring per generation (PGLS: adjusted r2 = 0.10, P = 0.013; Fig. 3c). Species with a higher number of offspring per generation also showed significantly higher µgeneration_modelled when considering only mammalian species (PGLS: adjusted r2 = 0.17, P = 0.011), but this relationship was not significant for birds (PGLS: adjusted r2 = −0.066, P = 0.720). Collectively, these traits explain almost 18% of the variation in µgeneration_modelled (multiple PGLS: adjusted r2 = 0.18, P = 0.004). The other life-history traits that we tested, including longevity, mating strategy and body mass, are not significantly associated with µgeneration_modelled (see Extended Data Fig. 7).

Fig. 3: Predictors of interspecific variation in GMRs.

a–c, Significant positive associations are found using phylogenetic regression (PGLS) between the modelled per-generation mutation rates and three life-history traits: species-specific mean generation time (a), age at sexual maturity (b) and the number of offspring per generation (c). In total there are 55 species with modelled per-generation rates, including 32 mammalian and 15 avian species. The box plot in c represents the median, the interquartile range and the maximum and minimum excluding outliers. d, Species-specific average per-generation mutation rates are negatively associated with the harmonic mean of the effective population size (Ne) over the past 1 million years, using phylogenetic regression (PGLS).

Another key parameter for species evolution is the effective population size (Ne), which impacts genetic drift and the efficacy of selection. To investigate the effect of Ne on µgeneration_modelled and to test the drift barrier hypothesis3, which predicts the evolution of higher mutation rates in species with small Ne, we calculated Ne using the pairwise sequentially Markovian coalescent method based on one randomly selected father per species. To avoid circularity, we estimated Ne based on the substitution rate calculated from the UCE alignment (Supplementary Table 9). Indeed, if Ne was estimated using the pedigree-based mutation rate, a stronger correlation might arise between Ne and the mutation rate (see Extended Data Fig. 8). We found a significant negative association between µgeneration_modelled and the harmonic mean Ne per species over the past 30,000–1,000,000 years (PGLS: adjusted r2 = 0.08, P = 0.020; Fig. 3d) as would be expected under the drift barrier hypothesis. This relationship is mainly driven by mammals (PGLS: adjusted r2 = 0.31, P = 0.0006), a signal that is also observed when using the harmonic average Ne over a smaller timescale (30,000–130,000 years; PGLS: adjusted r2 = 0.10, P = 0.04, Extended Data Fig. 8). The most appropriate timeframe used to estimate Ne depends on the evolutionary time necessary for the mutation rate to adapt to changes in Ne. However, the pairwise sequentially Markovian coalescent method cannot accurately estimate recent Ne. To overcome this limitation, we also estimated Ne as π/4μ, with nucleotide diversity (π) and the substitution rate per site per generation (μ) estimated from the UCE alignments. This results in a similar negative association between Ne and µgeneration_modelled (linear regression: adjusted r2 = 0.83, P < 2.2 × 10−16; Extended Data Fig. 9), further supporting the drift barrier hypothesis. However, caution should be taken as Ne estimates rely on generation times inferred from contemporary observations, whereas generation times could conceivably have changed over evolutionary timescales. Furthermore, population size depends negatively on the generation time (PGLS Ne in log scale: adjusted r2 = 0.20, P = 0.0004). Therefore, a negative association between Ne and μ could potentially be driven by a large effect of the generation time on per-generation mutation rates.
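The two summaries of Ne used here can be written compactly. The sketch below computes Ne = π/4μ from a nucleotide diversity and a per-generation rate, and a harmonic mean over a series of Ne values such as those returned for successive time intervals by a coalescent method. All input values are hypothetical and the function names are chosen for illustration.

```python
def ne_from_diversity(pi, mu):
    """Effective population size from diversity and per-generation rate: pi / (4 * mu)."""
    return pi / (4 * mu)

def harmonic_mean(values):
    """Harmonic mean, which is dominated by the smallest values in the series."""
    return len(values) / sum(1 / v for v in values)

# Hypothetical nucleotide diversity and per-site, per-generation rate.
print(ne_from_diversity(pi=0.001, mu=1e-8))

# Hypothetical Ne trajectory across time intervals, including a bottleneck.
print(harmonic_mean([10_000, 40_000]))
```

The harmonic mean is the conventional long-term summary because periods of small population size contribute disproportionately to genetic drift.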

High yearly rates in domesticated species

Domestication imposes strong artificial selection, recurrent genetic bottlenecks or both. Our dataset includes 22 domesticated or semi-wild species that have been bred in captivity for many generations. When using the naive method of dividing the per-generation rate by the parental age, these species show significantly higher µyearly than the non-domesticated species (PGLS: adjusted r2 = 0.13, P = 0.0015; Fig. 4a). The higher mutation rates of domesticated animals are likely due to strong artificial selection for traits such as shorter generation times. Indeed, using µyearly_modelled, we found no difference between domesticated and non-domesticated species (PGLS: adjusted r2 = 0.037, P = 0.08; Fig. 4b). Consequently, the higher yearly mutation rate observed in domesticated species is more likely to be explained by the lowering of reproductive age associated with domestication rather than by an inherent change to the mutational process caused by relaxed selection on the mutation rate due to small population sizes and bottlenecks associated with domestication58,59.

Fig. 4: The yearly GMRs are higher in domesticated species than in non-domesticated species.

a, Yearly GMRs are significantly higher in domesticated or farmed species than in wild species (using phylogenetic regression (PGLS) on a total of 68 species). b, Using the modelled mutation rate instead (using phylogenetic regression (PGLS) on a total number of 55 species) shows that there is no difference in yearly GMRs between domesticated and non-domesticated animals, suggesting that this difference is mainly driven by the shorter generation time of domesticated species. The box plots represent the median, the interquartile range and the maximum and minimum excluding outliers.

Conclusions

Here we analysed pedigree-based GMR variation in an unprecedentedly broad phylogenetic context. We showed that there is a consistent male bias in mammals and birds, whereas reptiles and fish exhibited more evenly matched contributions of DNMs between parents. This could be due to contrasting mutagenic processes, such as differences in male and female germline cell division observed in mammals and birds, or differences among species in the proportion of DNMs occurring in primordial germ cell specification versus in the parental germ lines. Our results also support the drift barrier hypothesis, as we found a negative association between the per-generation mutation rate and effective population size. Moreover, our results suggest that an appreciable proportion of the variation in the GMR can be explained by life-history traits, including maturation time and the number of offspring per generation. Our study also highlights the importance of the generation time, as illustrated by the particular case of domesticated animals, in which exceptionally high yearly mutation rate estimates can be explained by artificially induced short generation times. In addition, some of the trio samples in our study were collected from captive animals at zoos or conservation centres. These populations might have different generation times than those in the wild, which could potentially introduce biases into some of our mutation rate estimates. Future studies should focus on wild pedigree samples, which can be accessed from long-term conservation and monitoring programmes60.

Methods

Samples

Samples were collected from zoos, zoological museums, research institutes and farms from all over the world. Samples were provided by collaborators for research that was undertaken at the Natural History Museum of Denmark, permit 2020-12-7186-00733 from the Danish Ministry of Environment and Food, and when applicable, CITES Certificate of Scientific Exchange number DK003. Genomic DNA was extracted using DNeasy Blood and Tissue Kits (Qiagen) following the manufacturer’s instructions. BGIseq libraries were built in China National GeneBank (CNGB), Shenzhen, China, and whole-genome paired-end sequencing (read length 2 × 100 bp) was performed on the BGISEQ500 platform. We aimed for 60–80× raw sequence coverage per sample. A total of 68 species for which a reference genome was available were retained in the final dataset, representing 151 trios for which whole blood or other tissue material was available for DNA extraction and for which parentage had been genetically determined61. Information on the samples is provided in Supplementary Table 1.

GMR estimation

We applied a similar bioinformatic analysis pipeline to our previous study of rhesus macaques12. Raw reads were trimmed with SOAPnuke filter62. The mapping was conducted with BWA-MEM version 0.7.15 (ref. 63). The versions of the reference genomes for each species are provided in Supplementary Table 9. A post-mapping step removed any reads mapping to multiple regions of the genome as well as duplicated reads using Picard MarkDuplicates 2.7.1. We called variants for each individual using HaplotypeCaller in BP_RESOLUTION mode with GATK 4.0.7.0 (ref. 64). This mode returns a genotype quality and depth for all positions of the genome, not only the polymorphic sites. As recommended by GATK best practices, GenomicsDBImport combined all gVCF files per species into a single file and GenotypeGVCFs applied a joint genotyping of all samples within a given species (see Supplementary Table 3 for details of raw sequence coverage, mapping quality, and coverage after mapping and variant calling). Similar filtering methods to those in our previous study were then applied to detect DNMs12. Each trio was therefore filtered as follows:

  1. For site filtering, variant positions meeting any of the following criteria were removed: QualByDepth (QD) < 2.0, FisherStrand (FS) > 20.0, RMSMappingQuality (MQ) < 40.0, MQRankSum < −2.0, MQRankSum > 4.0, ReadPosRankSum < −3.0, ReadPosRankSum > 3.0 and StrandOddsRatio (SOR) > 3.0, according to previously tested filters12.

  2. For Mendelian violations, variants that deviated from Mendelian inheritance were selected using GATK SelectVariants and refined with an R script to keep only sites in which both parents were homozygous for the reference allele (HomRef) and the offspring was heterozygous (Het).

  3. For the allelic balance filter, in the case of a DNM, approximately 50% of the reads in the offspring should support the alternative allele. Our allelic balance filter cut-off was 30–70% of the reads supporting the alternative allele, similar to previous studies12,32,65,66.

  4. For the depth filter (DP), only positions with a DP > 0.5 × mdepth and DP < 2 × mdepth for each individual were kept, with mdepth being the average depth of the trio. This strict DP filter minimized the effects of sequencing errors in regions of low sequencing depth and mis-mapping errors in high-coverage regions.

  5. For the genotype quality filter (GQ), to ensure that only high-quality genotypes were retained for the analysis of trios, we removed all sites where one individual of the trio had a GQ < 60 (see Supplementary Fig. 2 for a comparison of various GQ thresholds on a subset of species).
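The per-trio filters 2 to 5 listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which relied on GATK and an R script): the `Genotype` container and the function names are hypothetical, the allelic balance is computed as alternative reads over total depth as a simplification, and the 30–70% bounds are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Genotype:
    """Per-individual call at one site (hypothetical container, not a GATK type)."""
    alleles: tuple   # e.g. (0, 0) for HomRef, (0, 1) for Het
    depth: int       # read depth (DP)
    gq: int          # genotype quality (GQ)
    alt_reads: int   # reads supporting the alternative allele


def is_hom_ref(g):
    return tuple(g.alleles) == (0, 0)


def is_het(g):
    # Simplification: only reference/first-alternative heterozygotes are considered.
    return sorted(g.alleles) == [0, 1]


def passes_trio_filters(father, mother, child, mdepth,
                        min_gq=60, ab_range=(0.3, 0.7)):
    """Return True if a candidate site passes filters 2 to 5 for one trio."""
    trio = (father, mother, child)

    # Filter 2: Mendelian violation with both parents HomRef and offspring Het.
    if not (is_hom_ref(father) and is_hom_ref(mother) and is_het(child)):
        return False

    # Filter 3: allelic balance of the offspring within 30-70% (assumed inclusive).
    if child.depth == 0:
        return False
    allelic_balance = child.alt_reads / child.depth
    if not (ab_range[0] <= allelic_balance <= ab_range[1]):
        return False

    # Filter 4: depth of every individual strictly between 0.5x and 2x mdepth.
    if any(not (0.5 * mdepth < g.depth < 2 * mdepth) for g in trio):
        return False

    # Filter 5: every individual must reach the genotype quality threshold.
    if any(g.gq < min_gq for g in trio):
        return False

    return True
```

Here `mdepth` is the average depth of the trio, as defined in filter 4; the site filters of step 1 act on variant-level annotations and are therefore left out of this per-genotype sketch.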

In addition, we called variants with bcftools (version 1.2)67 in the region of the candidate DNMs and removed the sites that appeared as false-positive calls (that is, at least one parent had the same variant as the offspring or the offspring had no variant). The number of candidates discarded varied among species (Supplementary Table 2). This quality control step produced similar results to a manual check with IGV68. Moreover, calling variants with different variant callers has been shown to be an efficient method to reduce false-positive calls2. All positions of DNMs are provided in Supplementary Table 6. In addition, we showed that sample type, reference genome quality and mapping quality can affect the number of candidates, the false-positive rate and the false-negative rate (FNR), but that the estimated mutation rates are not affected (Supplementary Figs. 3–5).

To estimate per-generation rates, we divided the number of candidate DNMs, excluding the apparent false-positive candidates, by the size of the callable genome. A site was considered callable when it passed the same filters as the polymorphic sites, that is, when both parents were HomRef (filter 2) and the three individuals passed the depth filter (filter 4) and the genotype quality threshold (filter 5). On the sites considered callable, we applied a correction for the FNR, that is, the proportion of sites where true DNMs will not be called as such. Two methods have been used in the literature to estimate FNR: one is the simulation of mutations and the other is a correction on the filters that are not accounted for in the callable genome. As in our previous study of GMR12, we used the latter method, which is more conservative. This corrected for the remaining filters that can only be applied on polymorphic sites, namely the site filters (filter 1) and the allelic balance filter (filter 3). We estimated the proportion of sites that would be filtered away by the site filters on the parameters following a known distribution (FS, MQRankSum and ReadPosRankSum), and the expected sites filtered away by the allelic balance filter as the number of true heterozygote sites (one parent HomRef, the other parent HomAlt and their offspring Het) outside the allelic balance threshold. The mutation rate per site per generation was then estimated per trio as µgeneration = DNMs/((1 − FNR) × 2 × CG), where CG is the number of callable sites (the callable genome) and the factor of 2 accounts for the two haploid copies of the genome. We estimated the 95% binomial confidence interval per species using the binconf() function in R, with the default Wilson score.
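As a worked illustration of the rate formula and the Wilson score interval, the following Python sketch may help. The function names are hypothetical, and the way the counts are passed to the interval (DNMs as successes, the FNR-corrected number of callable haploid sites as trials) is an assumption, since the exact inputs given to binconf() are not stated here.

```python
import math


def mutation_rate(n_dnm, callable_sites, fnr):
    """Per-site, per-generation rate: DNMs / ((1 - FNR) * 2 * CG)."""
    return n_dnm / ((1.0 - fnr) * 2.0 * callable_sites)


def wilson_interval(successes, trials, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / trials
                               + z * z / (4.0 * trials * trials)) / denom
    return centre - half_width, centre + half_width


# Illustrative numbers only, not taken from the study.
n_dnm, callable_sites, fnr = 30, 2_000_000_000, 0.05
mu = mutation_rate(n_dnm, callable_sites, fnr)

# Assumption: trials = FNR-corrected number of callable haploid sites.
trials = (1.0 - fnr) * 2.0 * callable_sites
lower, upper = wilson_interval(n_dnm, trials)
```

With these illustrative inputs the point estimate is 30 / (0.95 × 2 × 2 × 10^9), and the interval brackets it asymmetrically, as expected for a Wilson interval on a small count.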

To calculate yearly rates (µyearly), we divided the per-generation rate by the average age of the parents at the time of reproduction weighted by the relative contribution of each parent (inferred with α for 105 trios) or by the generation time (for 46 trios without parental ages). The resulting µyearly estimates were averaged per species (for 29 species with multiple trios available). These yearly rates are dependent on the age of reproduction of the parents. Therefore, to calculate a yearly rate at generation time, we first modelled how the mutation rate of a trio was affected by the weighted average of the parental ages (using the paternal fraction estimated for that species as a weight). We then extended the model to fit how each species deviated from the average and used this to correct for differences between the observed reproductive age in our dataset and the expected generation time of a species (see Supplementary Note 1). With this, we estimated a new µyearly_modelled and a µgeneration_modelled that are more representative of the rate at generation time for each species.
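The basic yearly-rate calculation (before the modelling step described in Supplementary Note 1) could look like the sketch below. It assumes that α denotes the ratio of paternal to maternal DNMs, so that the paternal weight is α/(1 + α); this reading and all function names are assumptions made for illustration.

```python
def paternal_fraction(alpha):
    """Paternal share of DNMs, assuming alpha = paternal / maternal DNMs."""
    return alpha / (1.0 + alpha)


def weighted_parental_age(age_father, age_mother, alpha):
    """Average parental age weighted by each parent's relative contribution."""
    w = paternal_fraction(alpha)
    return w * age_father + (1.0 - w) * age_mother


def yearly_rate(mu_generation, age_father=None, age_mother=None,
                alpha=None, generation_time=None):
    """Yearly rate from the per-generation rate.

    Uses the weighted parental age when both ages and alpha are known,
    and falls back to the species' generation time otherwise.
    """
    if None not in (age_father, age_mother, alpha):
        return mu_generation / weighted_parental_age(age_father, age_mother, alpha)
    if generation_time is not None:
        return mu_generation / generation_time
    raise ValueError("need parental ages with alpha, or a generation time")
```

For example, with α = 3, a 10-year-old father and a 6-year-old mother, the weighted age is 0.75 × 10 + 0.25 × 6 = 9 years, so a per-generation rate of 9 × 10^−9 corresponds to roughly 1 × 10^−9 per site per year.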

Phylogenetic analysis

The phylogeny was built based on two sets of UCEs: 5,472 baits for 5,060 UCEs in tetrapods57 and 2,628 baits for 1,314 UCEs in acanthomorphs69. We used the Phyluce software70 to locate the probes in the reference genomes of our 68 species and of 6 additional species contained in our original dataset. We extracted a flanking region of ±1,000 bp for each probe and aligned them with MAFFT version 7.470 (ref. 71). We then created a 75% completion matrix, that is, each alignment contains at least 75% of the taxa (55 species), resulting in 63 alignments from the acanthomorph set and 2,742 alignments from the tetrapod set (all alignments are available on Figshare). A phylogenetic tree was built using IQ-TREE version 2.0.3 (ref. 72), with the appropriate substitution model inferred for each of the 2,805 alignments, a maximum likelihood tree search and 1,000 bootstrap replicates. To validate our tree, we also estimated a second tree based on a MultiZ alignment to the human genome and obtained similar results (Extended Data Fig. 9). The phylogenetic tree was calibrated to absolute time using the chronos function of the ‘ape’ package in R, with a smoothing parameter lambda of 0 and a ‘relaxed’ model73,74. The robustness of the tree was assessed by removing each node independently (see Extended Data Fig. 3). Fourteen nodes were calibrated following previously published calibrations:

  1. Actinopterygii/Sarcopterygii: divergence time 416 million years ago (Ma), upper bound 425.4 Ma75

  2. The first node in the Actinopterygii group: divergence time 378.2 Ma76

  3. Sauropsida (birds and reptiles)/Synapsida (mammals): divergence time 313.4 Ma77

  4. Archosauria (birds)/Testudines: divergence time 260 Ma78

  5. The basal nodes of the Lepidosauria: divergence time 222.8 Ma79

  6. First mammalian node, Eutheria/Metatheria: divergence time 160.7 Ma75

  7. Galloanserae/Neoaves: divergence time 66 Ma77

  8. Glires/Primates: divergence time 61.7 Ma77

  9. Basal gekkotan node: divergence time 54 Ma80

  10. Passeriformes/Psittaciformes: divergence time 51.81 Ma81

  11. Cynoglossidae/Paralichthyidae: divergence time 50 Ma76

  12. Sus scrofa/other Cetartiodactyla: divergence time 48.5 Ma77

  13. Canidae/Arctoidea: divergence time 37.1 Ma75

  14. Hominoidea/Cercopithecoidea: divergence time 23.5 Ma77
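The 75% completion matrix used to select UCE alignments before tree building can be sketched as follows. This is a hypothetical helper, not part of Phyluce; the threshold is rounded down so that 75% of 74 taxa gives the 55 species quoted above, which is an assumption about how the cut-off was applied.

```python
import math


def filter_by_completeness(alignments, n_taxa, min_fraction=0.75):
    """Keep loci whose alignment contains at least min_fraction of the taxa.

    alignments: dict mapping a locus name to the set of taxa present in it.
    The taxon threshold is rounded down (assumption: 0.75 * 74 -> 55).
    """
    min_taxa = math.floor(min_fraction * n_taxa)
    return {locus: taxa for locus, taxa in alignments.items()
            if len(taxa) >= min_taxa}


# Toy example with 74 taxa labelled sp0 ... sp73 (illustrative data only).
all_taxa = [f"sp{i}" for i in range(74)]
toy = {
    "uce-1": set(all_taxa),        # complete locus, kept
    "uce-2": set(all_taxa[:55]),   # exactly at the threshold, kept
    "uce-3": set(all_taxa[:54]),   # one taxon short, dropped
}
kept = filter_by_completeness(toy, n_taxa=74)
```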

Mutational spectrum and sex bias

To analyse the spectrum of mutation, we grouped the trios into higher taxonomic levels, that is, mammals, birds, fishes and reptiles. Thus, the percentages reported are based on the total candidate mutations from each group of species. We explored the genomic context of the mutations from a C or a G base to determine whether they were located in CpG sites (respectively followed by a G or preceded by a C) (see Supplementary Table 4). We phased the DNMs to their parental origin using the read-backed phasing method described previously (GitHub: https://github.com/besenbacher/POOHA)82. This method uses the read-pairs containing a DNM and another heterozygous variant to determine the parental origin of the mutation when the heterozygous variant is present in both the offspring and one of the parents. The phasing allowed us to identify parental biases in the contribution of the DNMs by grouping multiple species to increase the number of phased mutations and obtain a minimum of 30 phased mutations per taxon. From this analysis, we omitted the Egyptian rousette (Rousettus aegyptiacus), Chinese tree shrew (Tupaia belangeri), griffon vulture (Gyps fulvus), blue-throated macaw (Ara glaucogularis), snowy owl (Bubo scandiacus) and Darwin’s rhea (Rhea pennata), as these could not be grouped with other species into a monophyletic clade. To quantify the effect of parental age, a linear regression between the per-generation mutation rate and the average parental age at the time of reproduction was implemented using the lm function in R. Multiple linear regression was also used to identify whether paternal or maternal age was the strongest predictor of the empirical mutation rate.
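The CpG context rule stated above (a mutated C followed by a G, or a mutated G preceded by a C) can be written as a short Python check. The function name and the use of a plain string with a 0-based position are illustrative assumptions; the actual analysis worked on reference genomes.

```python
def is_cpg_site(sequence, pos):
    """Return True if the base at 0-based position `pos` lies in a CpG site.

    A C counts when it is followed by a G; a G counts when it is
    preceded by a C. Other bases are never CpG sites.
    """
    base = sequence[pos].upper()
    if base == "C":
        return pos + 1 < len(sequence) and sequence[pos + 1].upper() == "G"
    if base == "G":
        return pos > 0 and sequence[pos - 1].upper() == "C"
    return False
```

For the sequence `ACGT`, positions 1 (C) and 2 (G) are both reported as CpG, whereas the C in `ACAT` is not.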

Life-history trait analysis

We tested the effect of various life-history traits (fitted as continuous and discrete variables) on the yearly rate for each species using PGLS analysis in the R package ‘caper’83 (see Supplementary Table 9 for details about each life-history trait).

Effective population size

We used pairwise sequentially Markovian coalescent (PSMC) models to estimate the effective population size of each species84. Consensus sequences were generated from the BAM alignments of one randomly selected father per species and converted into fastq format using the samtools mpileup command and vcf2fq. As recommended, the minimum depth was set to one-third of the sample's average depth and the maximum depth to twice the average. For mammals, fish and reptiles, the parameters of the PSMC were set to -N25 for the maximum number of iterations of the algorithm, -t15 as the upper limit for the time to the most recent common ancestor, -r5 for the initial θ/ρ value, and finally the atomic intervals -p of ‘4+25*2+4+6’. These parameters were used previously for PSMC analysis of various species, including primates84,85, cetaceans86, Felidae87, fishes88,89 and turtles90. For birds, we used different parameters according to the literature with -N30 -t5 -r5 (ref. 91). Finally, to simulate the history inferred by PSMC, we parameterized the model with the generation time and the mutation rate inferred from the UCE alignment. We then explored the effect of the harmonic mean Ne over windows of 30,000 years to 1,000,000 years. We also compared the Ne estimates obtained with this method with those estimated based on Ne = π/(4μ). Nucleotide diversity (π) was calculated using ANGSD92. This approach was implemented in three consecutive steps. From the alignment files, a global estimate of the site frequency spectrum was inferred using a maximum likelihood method, then the empirical π value was estimated per site, and finally, a sliding window approach was used to estimate π for each species. We used a window size of 50 kb and a step size of 10 kb together with an average pairwise estimation of π to obtain global estimates of π. This analysis was restricted to unrelated individuals from each species, which corresponded to the 2 unrelated parents for 55 species and between 3 and 7 individuals for 10 species; 3 species were excluded from this analysis as the parents were first-degree relatives.
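The two Ne summaries mentioned above can be illustrated with a small Python sketch. It assumes that the PSMC history is available as piecewise-constant intervals of (start, end, Ne) in years and that the harmonic mean is weighted by the time each interval overlaps the window; both the data layout and the weighting are assumptions, as the text does not specify them.

```python
def harmonic_mean_ne(intervals, t_min, t_max):
    """Time-weighted harmonic mean of Ne between t_min and t_max (years).

    intervals: iterable of (start, end, ne) tuples describing a
    piecewise-constant Ne trajectory (assumed layout).
    """
    total_time = 0.0
    weighted_inverse = 0.0
    for start, end, ne in intervals:
        overlap = min(end, t_max) - max(start, t_min)
        if overlap > 0:
            total_time += overlap
            weighted_inverse += overlap / ne
    if total_time == 0:
        raise ValueError("window does not overlap the Ne trajectory")
    return total_time / weighted_inverse


def ne_from_diversity(pi, mu_generation):
    """Diversity-based estimate Ne = pi / (4 * mu)."""
    return pi / (4.0 * mu_generation)


# Illustrative trajectory only, not taken from the study.
history = [(0, 50_000, 1_000.0), (50_000, 1_000_000, 4_000.0)]
ne_window = harmonic_mean_ne(history, 30_000, 70_000)
ne_pi = ne_from_diversity(0.001, 1e-8)
```

In the toy trajectory the window spends 20,000 years at Ne = 1,000 and 20,000 years at Ne = 4,000, giving a harmonic mean of 1,600, which shows how strongly periods of low Ne dominate this summary.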

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.