Technical Reports

  • LeafCutter is a new tool that identifies variable intron splicing events from RNA-seq data for analysis of complex alternative splicing. The method does not require transcript annotation and can be used to map splicing quantitative trait loci.

    • Yang I. Li
    • David A. Knowles
    • Jonathan K. Pritchard
    Technical Report
  • Covariates for multiphenotype studies (CMS), a new approach for testing for associations in large-scale datasets, leverages genetic and environmental factors shared between correlated variables measured on the same samples. Applying CMS to real and simulated data yields a large increase in power, equivalent to that gained by doubling the sample size.

    • Hugues Aschard
    • Vincent Guillemot
    • Noah Zaitlen
    Technical Report
  • Graphtyper is a fast and scalable method for variant genotyping that aligns short-read sequence data to a pangenome. Graphtyper was able to accurately genotype ∼90 million sequence variants in the whole genomes of ∼28,000 Icelanders, including those in six HLA genes.

    • Hannes P Eggertsson
    • Hakon Jonsson
    • Bjarni V Halldorsson
    Technical Report
  • Adam Siepel and colleagues report a new computational method, LINSIGHT, that combines evolutionary conservation and functional genomic information to predict the fitness consequences of noncoding mutations in the human genome. They use LINSIGHT to show that the fitness consequences of enhancer mutations depend on tissue and cell-type specificity and on promoter constraints.

    • Yi-Fei Huang
    • Brad Gulko
    • Adam Siepel
    Technical Report
  • Kun Zhang and colleagues present a metric called methylation haplotype load (MHL) that quantifies methylation patterns within blocks of tightly linked CpG dinucleotides. They show that the MHL can distinguish samples from different human somatic tissues and that it can be used to improve detection of cancer-derived circulating DNA and identify its tissue of origin.

    • Shicheng Guo
    • Dinh Diep
    • Kun Zhang
    Technical Report
  • Adam Phillippy, Curtis Van Tassell, Timothy Smith and colleagues present a new reference genome assembly for the domestic goat using a pipeline that improves contiguity of the assembly by more than 250-fold. The pipeline uses a combination of short- and long-read sequencing, optical mapping, and chromatin interaction mapping.

    • Derek M Bickhart
    • Benjamin D Rosen
    • Timothy P L Smith
    Technical Report · Open Access
  • Stuart Orkin, Daniel Bauer and colleagues present DNA Striker, a computational tool to design variant-aware saturating-mutagenesis screens with multiple CRISPR-associated nucleases. They apply their methodology to the HBS1L-MYB intergenic region, which is associated with red-blood-cell traits, and identify putative regulatory elements that control MYB expression.

    • Matthew C Canver
    • Samuel Lessard
    • Stuart H Orkin
    Technical Report
  • James Liley, John Todd and Chris Wallace present a statistical method for determining whether disease-associated variants have different effect sizes in phenotypically defined subgroups of disease cases. The test can be combined with existing methods to determine whether genetic heterogeneity is driven by population stratification or by different mechanisms of disease pathology.

    • James Liley
    • John A Todd
    • Chris Wallace
    Technical Report
  • John Storey, David Blei and colleagues present a method, TeraStructure, for estimating population structure from human genomic data sets at a scale not possible with existing methods. TeraStructure is able to analyze data from the Human Genome Diversity Panel and the 1000 Genomes Project in less than three hours.

    • Prem Gopalan
    • Wei Hao
    • John D Storey
    Technical Report
  • Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and 20% improvement in speed compared to a competing method, SHAPEIT2.

    • Po-Ru Loh
    • Petr Danecek
    • Alkes L Price
    Technical Report
  • Runjun Kumar, S. Joshua Swamidass and Ron Bose present an unsupervised parsimony-guided method, ParsSNP, for prioritizing candidate cancer driver mutations. They apply ParsSNP to a gastric cancer data set and predict potential driver mutations not detected by other methods, including truncations in known tumor-suppressor genes and previously confirmed drivers.

    • Runjun D Kumar
    • S Joshua Swamidass
    • Ron Bose
    Technical Report
  • Victoria Hore, Jonathan Marchini and colleagues present a method for multiple-tissue gene expression studies aimed at uncovering gene networks linked to genetic variation. They apply their method to RNA sequencing data from adipose, skin and lymphoblastoid cell lines and identify several biologically relevant gene networks with a genetic basis.

    • Victoria Hore
    • Ana Viñuela
    • Jonathan Marchini
    Technical Report
  • Richard Mott, Simon Myers and colleagues present a new imputation method, STITCH, which does not require genotyping arrays or high-quality reference panels. They use STITCH to accurately impute genotypes in both outbred laboratory mice and a human population sample directly from low-coverage (<2×) sequencing data.

    • Robert W Davies
    • Jonathan Flint
    • Richard Mott
    Technical Report
  • Po-Ru Loh, Pier Francesco Palamara and Alkes Price develop a new long-range phasing method, Eagle, that harnesses long, shared identical-by-descent tracts and can be applied to large outbred populations. They use Eagle to phase samples from the UK Biobank and find that it is faster and more accurate than existing methods.

    • Po-Ru Loh
    • Pier Francesco Palamara
    • Alkes L Price
    Technical Report
  • Jonathan Marchini and colleagues develop a new method for haplotype phasing, SHAPEIT3, capable of handling large data sets from biobanks containing >100,000 genotyped samples. They find that their method is fast and accurate, with a low switch error rate, and can be scaled to data sets from increasingly large cohorts.

    • Jared O'Connell
    • Kevin Sharp
    • Jonathan Marchini
    Technical Report
  • Soumya Raychaudhuri, Buhm Han and colleagues present a statistical method to distinguish whether shared genetic risk variants among complex traits are driven by whole-group pleiotropy or by a subset of individuals who constitute a genetically heterogeneous subgroup. They use the method to examine genetic sharing among autoimmune diseases and between major depressive disorder and schizophrenia. They find that subgroup heterogeneity does not explain most of this sharing, but that seronegative rheumatoid arthritis is a heterogeneous condition.

    • Buhm Han
    • Jennie G Pouget
    • Soumya Raychaudhuri
    Technical Report
  • Andy Dahl and colleagues present a method for imputing missing phenotype data in genetic studies with multiple correlated phenotypes, in which samples can have any level of relatedness. They apply their method to simulated and real data sets and show that it improves sensitivity to detect association signals.

    • Andrew Dahl
    • Valentina Iotchkova
    • Jonathan Marchini
    Technical Report
  • Iuliana Ionita-Laza, Kenneth McCallum and colleagues develop an unsupervised statistical approach, Eigen, that integrates different functional annotations into a single measure of functional importance for coding and noncoding variants. Their meta-score can outperform the recently proposed CADD score and can be applied to fine-mapping studies.

    • Iuliana Ionita-Laza
    • Kenneth McCallum
    • Joseph D Buxbaum
    Technical Report
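
The methylation haplotype load (MHL) in the Kun Zhang et al. summary can be illustrated with a minimal Python sketch. The sketch assumes the definition MHL = Σ w_i·P(MH_i) / Σ w_i. Here P(MH_i) is the fraction of length-i substrings of the read-level methylation calls that are fully methylated, and the weight is w_i = i. The function name and the "1"/"0" string encoding of reads are hypothetical choices for this example, not the authors' software.

```python
from typing import Sequence


def methylation_haplotype_load(reads: Sequence[str]) -> float:
    """Hypothetical sketch of MHL for one block of tightly linked CpGs.

    Each read is a string of per-CpG calls at consecutive CpG sites,
    '1' = methylated, '0' = unmethylated (an encoding assumed here).
    Assumed definition: MHL = sum_i(w_i * P(MH_i)) / sum_i(w_i), with w_i = i,
    where P(MH_i) is the fraction of length-i substrings that are all '1'.
    """
    max_len = max((len(r) for r in reads), default=0)
    if max_len == 0:
        raise ValueError("need at least one read covering a CpG")
    numerator = 0.0
    denominator = 0.0
    for i in range(1, max_len + 1):
        fully_methylated = "1" * i
        total = 0       # length-i substrings observed across all reads
        methylated = 0  # of those, how many are fully methylated
        for read in reads:
            for start in range(len(read) - i + 1):
                total += 1
                if read[start:start + i] == fully_methylated:
                    methylated += 1
        # total >= 1 here: the longest read always yields a length-i substring
        weight = i  # assumed weighting: longer haplotypes count more
        numerator += weight * methylated / total
        denominator += weight
    return numerator / denominator
```

Consider the reads "110" and "101". Both have two of three CpGs methylated, but the first gives an MHL of 5/18 (about 0.28) and the second 1/9 (about 0.11). Only the first contains a methylated run longer than one CpG. Weighting longer substrings more heavily makes this score track co-methylation of linked CpGs rather than average methylation alone.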
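
The SHAPEIT3 summary cites a low switch error rate as evidence of accuracy. Below is a minimal Python sketch of the commonly used definition, which works in three steps:

1. Restrict to heterozygous sites.
2. At each site, record whether the inferred haplotype follows the true haplotype or its complement.
3. Divide the number of orientation changes between adjacent sites by the number of adjacent pairs.

The function name and the 0/1 allele encoding are hypothetical. The exact normalization used in any particular evaluation may differ from this sketch.

```python
from typing import Sequence


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Hypothetical sketch of the switch error rate for one individual.

    Both inputs give the allele (0 or 1) carried on one haplotype at each
    heterozygous site, in genomic order; the other haplotype is the complement.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same heterozygous sites")
    if len(true_hap) < 2:
        raise ValueError("need at least two heterozygous sites")
    # True where the inferred haplotype follows the first true haplotype,
    # False where it follows the complementary true haplotype.
    orientation = [t == h for t, h in zip(true_hap, inferred_hap)]
    # A switch error is a change of orientation between adjacent sites.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)
```

Only changes in orientation are counted, so an inferred haplotype that is the exact complement of the truth scores 0. Swapping the two haplotype labels is not a phasing error.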