Research articles

A method for identifying genetic heterogeneity within phenotypically defined disease subgroups

James Liley, John Todd and Chris Wallace present a statistical method for determining whether disease-associated variants have different effect sizes in phenotypically defined subgroups of disease cases. The test can be combined with existing methods to determine whether genetic heterogeneity is driven by population stratification or by different mechanisms of disease pathology.

James Liley
John A Todd
Chris Wallace
Technical Report, 26 Dec 2016

Robust and scalable inference of population history from hundreds of unphased whole genomes

Yun Song and colleagues present SMC++, a statistical method for population history inference capable of analyzing unphased whole genomes and sample sizes much larger than can be analyzed by current methods. The authors apply SMC++ to sequence data from human, Drosophila and finch populations.

Jonathan Terhorst
John A Kamm
Yun S Song
Technical Report, 26 Dec 2016

Scaling probabilistic models of genetic variation to millions of humans

John Storey, David Blei and colleagues present a method, TeraStructure, for estimating population structure from human genomic data sets on a scale not possible with current methods. TeraStructure is able to analyze data from the Human Genome Diversity Panel and the 1000 Genomes Project in less than three hours.

Prem Gopalan
Wei Hao
John D Storey
Technical Report, 07 Nov 2016

M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity

Gill Bejerano and colleagues present M-CAP, a classifier that estimates variant pathogenicity in clinical exome data sets. They show that M-CAP outperforms existing methods at all thresholds and correctly dismisses 60% of rare missense variants of uncertain significance at 95% sensitivity.

Karthik A Jagadeesh
Aaron M Wenger
Gill Bejerano
Technical Report, 24 Oct 2016

Reference-based phasing using the Haplotype Reference Consortium panel

Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and 20% improvement in speed compared to a competing method, SHAPEIT2.

Po-Ru Loh
Petr Danecek
Alkes L Price
Technical Report, 03 Oct 2016

Unsupervised detection of cancer driver mutations with parsimony-guided learning

Runjun Kumar, S. Joshua Swamidass and Ron Bose present an unsupervised parsimony-guided method, ParsSNP, for prioritizing candidate cancer driver mutations. They apply ParsSNP to a gastric cancer data set and predict potential driver mutations not detected by other methods, including truncations in known tumor-suppressor genes and previously confirmed drivers.

Runjun D Kumar
S Joshua Swamidass
Ron Bose
Technical Report, 12 Sept 2016

Tensor decomposition for multiple-tissue gene expression experiments

Victoria Hore, Jonathan Marchini and colleagues present a method for multiple-tissue gene expression studies aimed at uncovering gene networks linked to genetic variation. They apply their method to RNA sequencing data from adipose, skin and lymphoblastoid cell lines and identify several biologically relevant gene networks with a genetic basis.

Victoria Hore
Ana Viñuela
Jonathan Marchini
Technical Report, 01 Aug 2016

Rapid genotype imputation from sequence without reference panels

Richard Mott, Simon Myers and colleagues present a new imputation method, STITCH, which does not require genotyping arrays or high-quality reference panels. They use STITCH to accurately impute genotypes in both outbred laboratory mice and a sample human population directly from low-coverage (<2×) sequencing data.

Robert W Davies
Jonathan Flint
Richard Mott
Technical Report, 04 Jul 2016

Fast and accurate long-range phasing in a UK Biobank cohort

Po-Ru Loh, Pier Francesco Palamara and Alkes Price develop a new long-range phasing method, Eagle, that harnesses long, shared identical-by-descent tracts and can be applied to large outbred populations. They use Eagle to phase samples from the UK Biobank and find that it is faster and more accurate than existing methods.

Po-Ru Loh
Pier Francesco Palamara
Alkes L Price
Technical Report, 06 Jun 2016

Haplotype estimation for biobank-scale data sets

Jonathan Marchini and colleagues develop a new method for haplotype phasing, SHAPEIT3, capable of handling large data sets from biobanks containing >100,000 genotyped samples. They find that their method is fast and accurate, with a low switch error rate, and can be scaled to data sets from increasingly large cohorts.

Jared O'Connell
Kevin Sharp
Jonathan Marchini
Technical Report, 06 Jun 2016

A method to decipher pleiotropy by detecting underlying heterogeneity driven by hidden subgroups applied to autoimmune and neuropsychiatric diseases

Soumya Raychaudhuri, Buhm Han and colleagues present a statistical method to distinguish whether shared genetic risk variants among complex traits are driven by whole-group pleiotropy or a subset of individuals who constitute a genetically heterogeneous subgroup. They use the method to examine genetic sharing among autoimmune diseases and between major depressive disorder and schizophrenia and find that most genetic sharing cannot be explained by subgroup heterogeneity but that, in contrast, seronegative rheumatoid arthritis is a heterogeneous condition.

Buhm Han
Jennie G Pouget
Soumya Raychaudhuri
Technical Report, 16 May 2016

A multiple-phenotype imputation method for genetic studies

Andy Dahl and colleagues present a method for imputing missing phenotype data in genetic studies with multiple correlated phenotypes where samples can have any level of relatedness. They apply their method to simulated and real data sets and show that it improves the sensitivity to detect association signals.

Andrew Dahl
Valentina Iotchkova
Jonathan Marchini
Technical Report, 22 Feb 2016

A spectral approach integrating functional genomic annotations for coding and noncoding variants

Iuliana Ionita-Laza, Kenneth McCallum and colleagues develop an unsupervised statistical approach, Eigen, that integrates different functional annotations into a single measure of functional importance for coding and noncoding variants. Their meta-score can outperform the recently proposed CADD score and can be applied to fine-mapping studies.

Iuliana Ionita-Laza
Kenneth McCallum
Joseph D Buxbaum
Technical Report, 04 Jan 2016
