Featured
Article
| Open Access
Structured information extraction from scientific text with large language models
Extracting scientific data from published research is a complex task that requires specialised tools. Here the authors present a scheme based on large language models to automate the retrieval of information from text in a flexible and accessible manner.
- John Dagdelen
- , Alexander Dunn
- & Anubhav Jain
-
Article
| Open Access
Overlay databank unlocks data-driven analyses of biomolecules for all
In this work, the authors report the NMRlipids Databank, which promotes decentralised sharing of biomolecular molecular dynamics (MD) simulation data through an overlay design. Programmatic access enables analyses of rare phenomena and advances the training of machine learning models.
- Anne M. Kiirikki
- , Hanne S. Antila
- & O. H. Samuli Ollila
-
Article
| Open Access
A simulation-based analysis of the impact of rhetorical citations in science
Authors of scientific papers are generally discouraged from citing works that had no direct influence on their research. This paper uses simulations to show that such rhetorical citations may have underappreciated effects on the scientific community, such as shifting attention away from already highly cited papers.
- Honglin Bao
- & Misha Teplitskiy
-
Article
| Open Access
vcfdist: accurately benchmarking phased small variant calls in human genomes
Accurate benchmarking of small variant calling is critical for the continued improvement of human genome sequencing. Here, the authors show that current approaches are biased towards certain variant representations and develop a new approach to ensure consistent and accurate benchmarking, regardless of the original variant representations.
- Tim Dunn
- & Satish Narayanasamy
-
Article
| Open Access
Extracting medicinal chemistry intuition via preference machine learning
Over their careers, medicinal chemists develop a gut feeling for what is a promising molecule. Here, the authors use machine learning models to learn this intuition and show that it can be successfully applied in several drug discovery scenarios.
- Oh-Hyeon Choung
- , Riccardo Vianello
- & José Jiménez-Luna
-
Article
| Open Access
lesSDRF is more: maximizing the value of proteomics data through streamlined metadata annotation
Public proteomics data often lack essential metadata, limiting their potential. To address this, the authors developed lesSDRF, a tool to simplify the process of metadata annotation, thereby ensuring that data leave a lasting, impactful legacy well beyond their initial publication.
- Tine Claeys
- , Tim Van Den Bossche
- & Lennart Martens
-
Article
| Open Access
Simulation of undiagnosed patients with novel genetic conditions
Rare Mendelian disorders pose a major diagnostic challenge, but evaluation of automated tools that aim to uncover causal genes is limited. Here, the authors present a computational pipeline that simulates realistic clinical datasets to address this deficit.
- Emily Alsentzer
- , Samuel G. Finlayson
- & Isaac S. Kohane
-
Article
| Open Access
CD36 mediates SARS-CoV-2-envelope-protein-induced platelet activation and thrombosis
Aberrant coagulation and thrombosis are associated with severe SARS-CoV-2 infection. Here, the authors show that the E protein is associated with coagulation disorders in COVID-19 patients and could directly enhance platelet activation and thrombosis through a CD36/p38 MAPK/NF-κB signaling axis.
- Zihan Tang
- , Yanyan Xu
- & Tingting Liu
-
Article
| Open Access
An open resource combining multi-contrast MRI and microscopy in the macaque brain
Linking microscale cellular structures to macroscale features of the brain is required to fully understand its structure and function. Here, the authors present a resource which combines multi-contrast microscopy and MRI of a single whole macaque brain to facilitate multimodal analyses.
- Amy F. D. Howard
- , Istvan N. Huszar
- & Karla L. Miller
-
Article
| Open Access
Uncertainty in non-CO2 greenhouse gas mitigation contributes to ambiguity in global climate policy feasibility
The potential for the mitigation of global non-CO2 greenhouse gases is highly uncertain. Harmsen et al. estimate this uncertainty and show that it has large implications for the feasibility of reaching the Paris Climate Agreement targets.
- Mathijs Harmsen
- , Charlotte Tabak
- & Detlef van Vuuren
-
Comment
| Open Access
Greater genetic diversity is needed in human pluripotent stem cell models
While there are a growing number of human pluripotent stem cell repositories, genetic diversity remains limited in most collections and studies. Here, we discuss the importance of incorporating diverse ancestries in these models to improve equity and accelerate biological discovery.
- Sulagna Ghosh
- , Ralda Nehme
- & Lindy E. Barrett
-
Article
| Open Access
Systematic evidence and gap map of research linking food security and nutrition to mental health
There is a broad range of research available on the relationship between food security and mental health. Here the authors carry out a systematic mapping of evidence on food security and nutrition related to mental health and identify trends in themes, setting, and study design over the 20-year period studied.
- Thalia M. Sparling
- , Megan Deeney
- & Suneetha Kadiyala
-
Comment
| Open Access
Crowd-sourcing observations of volcanic eruptions during the 2021 Fagradalsfjall and Cumbre Vieja events
This study explores the scientific potential of crowdsourced observations during volcanic eruptions, using the 2021 Fagradalsfjall (Iceland) and Cumbre Vieja (Canary Islands) events as case studies.
- Fabian B. Wadsworth
- , Edward W. Llewellin
- & Alejandro Polo Santabárbara
-
Article
| Open Access
The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data
This paper describes the ‘4DN Data Portal’ that hosts data generated by the 4D Nucleome network, including Hi-C and other chromatin conformation capture assays, as well as various sequencing-based and imaging-based assays. Raw data have been uniformly processed to increase comparability and the portal is implemented with visualization tools to browse the data without download.
- Sarah B. Reiff
- , Andrew J. Schroeder
- & Peter J. Park
-
Article
| Open Access
BAMboozle removes genetic variation from human sequence data for open data sharing
Transparent data sharing is central to scientific progress, but limited for human sequencing data because of patient privacy concerns. Here, the authors propose an approach that removes certain types of genetic information in sequencing data, without affecting count-based downstream analyses.
- Christoph Ziegenhain
- & Rickard Sandberg
-
Article
| Open Access
The influence of decision-making in tree ring-based climate reconstructions
Tree rings are a crucial archive for Common Era climate reconstructions, but the degree to which methodological decisions influence outcomes is not well known. Here, the authors show how different approaches taken by 15 different groups influence the ensemble temperature reconstruction from the same data.
- Ulf Büntgen
- , Kathy Allen
- & Jan Esper
-
Article
| Open Access
Predictive performance of international COVID-19 mortality forecasting models
Forecasts of COVID-19 mortality have been critical inputs into a range of policies, and decision-makers need information about their predictive performance. Here, the authors gather a panel of global epidemiological models and assess their predictive performance across time and space.
- Joseph Friedman
- , Patrick Liu
- & Emmanuela Gakidou
-
Article
| Open Access
Sarcoma classification by DNA methylation profiling
Sarcomas are morphologically heterogeneous tumours, which makes their classification challenging. Here the authors develop a classifier using DNA methylation data from several soft tissue and bone sarcoma subtypes, which has the potential to improve classification for research and clinical purposes.
- Christian Koelsche
- , Daniel Schrimpf
- & Andreas von Deimling
-
Article
| Open Access
Framework for quality assessment of whole genome cancer sequences
Working with cancer genomes from multiple projects can increase investigative power, but quality of sequences can vary. Here, the authors present a framework for comparing whole genome sequencing quality to help researchers guide downstream analyses and exclude poor quality samples.
- Justin P. Whalley
- , Ivo Buchhalter
- & Ivo G. Gut
-
Article
| Open Access
Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples
With the generation of large pan-cancer whole-exome and whole-genome sequencing projects, a question remains about how comparable these datasets are. Here, using The Cancer Genome Atlas samples analysed as part of the Pan-Cancer Analysis of Whole Genomes project, the authors explore the concordance of mutations called by whole exome sequencing and whole genome sequencing techniques.
- Matthew H. Bailey
- , William U. Meyerson
- & Christian von Mering
-
Article
| Open Access
Different scaling of linear models and deep learning in UKBiobank brain images versus machine-learning datasets
Schulz et al. systematically benchmark performance scaling with increasingly sophisticated prediction algorithms and with increasing sample size in reference machine-learning and biomedical datasets. Complicated nonlinear intervariable relationships remain largely inaccessible for predicting key phenotypes from typical brain scans.
- Marc-Andre Schulz
- , B. T. Thomas Yeo
- & Danilo Bzdok
-
Article
| Open Access
Blind spots in global soil biodiversity and ecosystem function research
Soil organism biodiversity contributes to ecosystem function, but biodiversity and function have not been equivalently studied across the globe. Here the authors identify locations, environment types, and taxonomic groups for which there is currently a lack of biodiversity and ecosystem function data in the existing literature.
- Carlos A. Guerra
- , Anna Heintz-Buschart
- & Nico Eisenhauer
-
Perspective
| Open Access
Inferring causation from time series in Earth system sciences
Questions of causality are ubiquitous in Earth system sciences and beyond, yet correlation techniques still prevail. This Perspective provides an overview of causal inference methods, identifies promising applications and methodological challenges, and initiates a causality benchmark platform.
- Jakob Runge
- , Sebastian Bathiany
- & Jakob Zscheischler
-
Comment
| Open Access
The problem with unadjusted multiple and sequential statistical testing
In research studies, the need for additional samples to obtain sufficient statistical power often has to be balanced against the experimental costs. One approach to this end is to collect data sequentially until there are sufficient measurements, e.g., until the p-value drops below 0.05. I outline how common this approach is and how unadjusted sequential sampling leads to severe statistical issues, such as an inflated rate of false-positive findings. As a consequence, the results of such studies are untrustworthy. I identify statistical methods that can be implemented to account for sequential sampling.
- Casper Albers
-
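The inflation described in this Comment can be illustrated with a small simulation. The sketch below is not taken from the Comment itself: it assumes a two-sided z-test with known unit variance, a true mean of zero (so every "significant" result is a false positive), and a hypothetical stopping rule that tests after every new observation from 10 up to 100 samples. The function names and parameter values are illustrative assumptions.

```python
import math
import random


def two_sided_p(sample_mean, n, sigma=1.0):
    """Two-sided z-test p-value for H0: mean = 0, assuming known sigma."""
    z = abs(sample_mean) * math.sqrt(n) / sigma
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)) for the standard normal CDF Phi
    return math.erfc(z / math.sqrt(2))


def false_positive_rates(n_sims=2000, n_start=10, n_max=100, alpha=0.05, seed=0):
    """Compare unadjusted sequential testing with a single fixed-n test.

    Returns (sequential_rate, fixed_rate): the fraction of simulated
    null experiments declared significant under each design.
    """
    rng = random.Random(seed)
    sequential_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        total = 0.0
        crossed = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # the null hypothesis is true
            # Sequential design: peek after every observation from n_start on
            if n >= n_start and two_sided_p(total / n, n) < alpha:
                crossed = True
        sequential_hits += int(crossed)
        # Fixed design: one test, only once all n_max observations are in
        fixed_hits += int(two_sided_p(total / n_max, n_max) < alpha)
    return sequential_hits / n_sims, fixed_hits / n_sims


if __name__ == "__main__":
    seq_rate, fixed_rate = false_positive_rates()
    print(f"sequential (unadjusted): {seq_rate:.3f}")
    print(f"fixed sample size:       {fixed_rate:.3f}")
```

Under these assumptions the fixed-sample design stays close to the nominal 5% level, while the unadjusted sequential design rejects the true null far more often, because each additional look gives chance another opportunity to cross the threshold.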
Article
| Open Access
Why rankings of biomedical image analysis competitions should be interpreted with care
Biomedical image analysis challenges have increased in the last ten years, but common practices have not been established yet. Here the authors analyze 150 recent challenges and demonstrate that outcome varies based on the metrics used and that limited information reporting hampers reproducibility.
- Lena Maier-Hein
- , Matthias Eisenmann
- & Annette Kopp-Schneider
-
Article
| Open Access
Assessment of the impact of shared brain imaging data on the scientific literature
Data sharing is recognized as a way to promote scientific collaboration and reproducibility, but some are concerned over whether research based on shared data can achieve high impact. Here, the authors show that neuroimaging papers using shared data are no less likely to appear in top-ranked journals.
- Michael P. Milham
- , R. Cameron Craddock
- & Arno Klein