Michael Schmitt and Jesse Salk picked a tough problem to tackle. Working in the laboratory of Lawrence Loeb at the University of Washington, they wanted to measure the rate of low-frequency mutations in cancer cells. Salk, a medical resident, explains, “Cancer is a disease highly related to genetic heterogeneity; there is a lot of underlying variation that confers resistance to therapy.”

The immediate obstacle they ran into was that high-throughput sequencing platforms have an error rate that is orders of magnitude too high to detect many naturally occurring variants. Illumina's HiSeq, for example, introduces one error in every 1,000 bases, whereas the somatic mutation rate is estimated to lie between 1 in 10⁸ and 1 in 10¹¹.

Over the years, a number of other researchers have developed techniques to lower the error rate of high-throughput sequencing, either computationally—by improving the quality of base calls—or experimentally—by tagging all reads derived from the same original fragment and discarding any changes that are not seen in all reads with the same tag. These approaches can reduce the error rate to at best 10⁻⁵, but this is still not accurate enough for Schmitt and Salk's question.

Salk recalls being inspired by nature's ways of error correction. “There is reciprocally stored information in DNA,” he says. “If there is an error in one strand that does not match the other side, it is recognized and corrected.” They wanted to use the same approach to distinguish artifacts from real mutations and thus enlisted the help of Scott Kennedy, a postdoctoral fellow in the Loeb lab, to work out the computational and experimental problems.

The team's solution sounds deceptively simple but took some trial and error to bring to fruition. Each double-stranded DNA fragment is labeled on both ends with a duplex tag (a random, double-stranded, complementary adaptor) prior to single-strand sequencing. The resulting reads can then be grouped in two ways: by clustering single-strand sequences with the same tags to derive the single-strand consensus sequence—an approach similar to previous error-correction methods—or by matching the two complementary strands among single-strand consensus clusters to get the duplex consensus sequence. A mutation is only counted as real if it is supported by both strands.
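The two grouping steps reduce to clustering reads by tag pair and then matching clusters whose tags appear in swapped order. The sketch below is a minimal illustration of that logic, not the team's actual pipeline; it assumes each read carries its fragment's tag pair, that reads from the complementary strand carry the same two tags in swapped order, and that all sequences are already aligned to the reference orientation.

```python
from collections import defaultdict

def consensus(seqs):
    """Position-wise consensus: a base is kept only if every sequence
    agrees on it; disagreements become 'N' and can be discarded later."""
    return "".join(
        col[0] if len(set(col)) == 1 else "N"
        for col in zip(*seqs)
    )

def duplex_consensus(tagged_reads):
    """tagged_reads: iterable of ((tag_a, tag_b), sequence) pairs."""
    # Step 1: single-strand consensus -- cluster reads sharing a tag pair.
    clusters = defaultdict(list)
    for tag, seq in tagged_reads:
        clusters[tag].append(seq)
    sscs = {tag: consensus(seqs) for tag, seqs in clusters.items()}

    # Step 2: duplex consensus -- pair each cluster with the cluster
    # whose tags are swapped, i.e. the fragment's other strand, and
    # keep only positions on which both strands agree.
    duplex = {}
    for (a, b), seq in sscs.items():
        key = min((a, b), (b, a))        # one entry per fragment
        partner = sscs.get((b, a))
        if partner is not None and key not in duplex:
            duplex[key] = consensus([seq, partner])
    return duplex
```

A variant present on only one strand (a sequencing or amplification artifact) thus survives step 1 but is masked in step 2, which is the core of the error-suppression idea.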

In a first proof-of-principle experiment, the researchers sequenced a 7-kilobase bacteriophage genome with a mutation frequency of 3 × 10⁻⁶. Their single-strand consensus sequences showed a mutation rate of 3.4 × 10⁻⁵, indicating that 90% of all mutations seen were still artifacts. Their duplex consensus sequence, however, resulted in a rate of 2.5 × 10⁻⁶, very close to the actual frequency.
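The 90% figure follows directly from the two rates: if the true mutation frequency is 3 × 10⁻⁶ but the single-strand consensus reports 3.4 × 10⁻⁵, then the excess calls must be artifacts. A quick back-of-the-envelope check:

```python
true_rate = 3e-6    # known mutation frequency of the phage stock
sscs_rate = 3.4e-5  # rate observed in single-strand consensus sequences

# Fraction of SSCS mutation calls that are not real mutations.
artifact_fraction = (sscs_rate - true_rate) / sscs_rate
print(f"{artifact_fraction:.0%} of SSCS calls are artifacts")  # ~91%
```

This lands at roughly 91%, consistent with the approximately 90% quoted in the study.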

The accuracy of duplex sequencing is impressive, but it requires considerable sequencing depth. To accurately quantify rare mutations, one needs enough depth to detect variants at frequencies an order of magnitude below the mutation frequency. For the M13 phage genome, this is doable; for an entire human genome, it would be very expensive. Instead, Schmitt and Kennedy applied duplex sequencing to the human mitochondrial genome, showing that the majority of mutations occur at the origin of replication. Similarly, exomes or a select number of genes should be within reach of the method.
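The cost argument can be made concrete with a rough estimate. The parameters below (a tenfold safety margin over the mutation frequency, and ~20 raw reads consumed per duplex consensus) are illustrative assumptions for this sketch, not figures from the study:

```python
def raw_bases_needed(genome_size, mutation_freq,
                     reads_per_duplex=20, margin=10):
    """Very rough estimate of raw sequencing throughput required.

    To resolve mutations at frequency f, aim for roughly margin / f
    duplex-consensus bases per position; each duplex consensus in turn
    consumes about reads_per_duplex raw reads. All parameters here are
    illustrative assumptions.
    """
    duplex_depth = margin / mutation_freq
    return genome_size * duplex_depth * reads_per_duplex

phage = raw_bases_needed(7_000, 3e-6)   # ~5e11 raw bases: feasible
human = raw_bases_needed(3.2e9, 3e-6)   # ~2e17 raw bases: prohibitive
```

Under these assumptions the whole-genome requirement is about five orders of magnitude larger than the phage experiment, which is why a small target such as the ~16.5-kb mitochondrial genome, an exome, or a gene panel is the practical setting for the method.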

Loeb plans to use duplex sequencing on a few areas in a tumor genome. “So far the predominant mutations that are measured with sequencing are clonal,” says Loeb, “but now I have the opportunity to measure [mutation frequency] subclonally and random[ly]. We can now look at cells with a great level of precision that has not been possible before.”

Rare variants are of interest not only to the cancer field. Loeb and his colleagues also want to explore their effect in aging and neurodegeneration and seek rare variants in mixed microbial populations.

Now that the idea is developed, it's easy to plug it into existing pipelines and pursue rare variants in any system of choice.