Research articles

Filter By:

Article Type
Year
  • James Liley, John Todd and Chris Wallace present a statistical method for determining whether disease-associated variants have different effect sizes in phenotypically defined subgroups of disease cases. The test can be combined with existing methods to determine whether genetic heterogeneity is driven by population stratification or by different mechanisms of disease pathology.

    • James Liley
    • John A Todd
    • Chris Wallace
    Technical Report
  • John Storey, David Blei and colleagues present a method, TeraStructure, for estimating population structure from human genomic data sets on a scale not possible with current methods. TeraStructure is able to analyze data from the Human Genome Diversity Panel and the 1000 Genomes Project in less than three hours.

    • Prem Gopalan
    • Wei Hao
    • John D Storey
    Technical Report
  • Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and 20% improvement in speed compared to a competing method, SHAPEIT2.

    • Po-Ru Loh
    • Petr Danecek
    • Alkes L Price
    Technical Report
  • Runjun Kumar, S. Joshua Swamidass and Ron Bose present an unsupervised parsimony-guided method, ParsSNP, for prioritizing candidate cancer driver mutations. They apply ParsSNP to a gastric cancer data set and predict potential driver mutations not detected by other methods, including truncations in known tumor-suppressor genes and previously confirmed drivers.

    • Runjun D Kumar
    • S Joshua Swamidass
    • Ron Bose
    Technical Report
  • Victoria Hore, Jonathan Marchini and colleagues present a method for multiple-tissue gene expression studies aimed at uncovering gene networks linked to genetic variation. They apply their method to RNA sequencing data from adipose, skin and lymphoblastoid cell lines and identify several biologically relevant gene networks with a genetic basis.

    • Victoria Hore
    • Ana Viñuela
    • Jonathan Marchini
    Technical Report
  • Richard Mott, Simon Myers and colleagues present a new imputation method, STITCH, which does not require genotyping arrays or high-quality reference panels. They use STITCH to accurately impute genotypes in both outbred laboratory mice and a sample human population directly from low-coverage (<2×) sequencing data.

    • Robert W Davies
    • Jonathan Flint
    • Richard Mott
    Technical Report
  • Po-Ru Loh, Pier Francesco Palamara and Alkes Price develop a new long-range phasing method, Eagle, that harnesses long, shared identical-by-descent tracts and can be applied to large outbred populations. They use Eagle to phase samples from the UK Biobank and find that it is faster and has better accuracy than existing methods.

    • Po-Ru Loh
    • Pier Francesco Palamara
    • Alkes L Price
    Technical Report
  • Jonathan Marchini and colleagues develop a new method for haplotype phasing, SHAPEIT3, capable of handling large data sets from biobanks containing >100,000 genotyped samples. They find that their method is fast and accurate, with a low switch error rate, and can be scaled to data sets from increasingly larger cohorts.

    • Jared O'Connell
    • Kevin Sharp
    • Jonathan Marchini
    Technical Report
  • Soumya Raychaudhuri, Buhm Han and colleagues present a statistical method to distinguish whether shared genetic risk variants among complex traits are driven by whole-group pleiotropy or a subset of individuals who constitute a genetically heterogeneous subgroup. They use the method to examine genetic sharing among autoimmune diseases and between major depressive disorder and schizophrenia and find that most genetic sharing cannot be explained by subgroup heterogeneity but that, in contrast, seronegative rheumatoid arthritis is a heterogeneous condition.

    • Buhm Han
    • Jennie G Pouget
    • Soumya Raychaudhuri
    Technical Report
  • Andy Dahl and colleagues present a method for imputing missing phenotype data in genetic studies with multiple correlated phenotypes where samples can have any level of relatedness. They apply their method to simulated and real data sets and show that it improves the sensitivity to detect association signals.

    • Andrew Dahl
    • Valentina Iotchkova
    • Jonathan Marchini
    Technical Report
  • Iuliana Ionita-Laza, Kenneth McCallum and colleagues developed an unsupervised statistical approach, Eigen, that integrates different functional annotations into a single measure of functional importance for coding and noncoding variants. Their meta-score can outperform the recently proposed CADD score and can be applied to fine-mapping studies.

    • Iuliana Ionita-Laza
    • Kenneth McCallum
    • Joseph D Buxbaum
    Technical Report