A primer on deep learning in genomics.
Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.
View details for PubMedID 30478442
Exploring patterns enriched in a dataset with contrastive principal component analysis
2018; 9: 2134
Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.
View details for PubMedID 29849030
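As a rough illustration of the idea in the cPCA abstract above, the following is a minimal sketch that assumes the common formulation of cPCA: take the top eigenvectors of the target covariance minus a scaled background covariance. The function name `cpca`, the contrast parameter `alpha`, and the synthetic data are assumptions made for this example, not the authors' published implementation.

```python
import numpy as np


def cpca(target, background, alpha=1.0, n_components=2):
    """Project `target` onto directions enriched relative to `background`.

    Hypothetical helper: eigendecomposes cov(target) - alpha * cov(background).
    """
    # Center each dataset on its own mean.
    target_c = target - target.mean(axis=0)
    background_c = background - background.mean(axis=0)

    # Sample covariance matrices (features x features).
    cov_target = target_c.T @ target_c / (len(target_c) - 1)
    cov_background = background_c.T @ background_c / (len(background_c) - 1)

    # Contrast matrix: variance in the target, penalized by background variance.
    contrast = cov_target - alpha * cov_background

    # The contrast matrix is symmetric, so eigh applies; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, order]
    return target_c @ directions, directions


# Synthetic example (made-up data): feature 0 is high-variance noise shared by
# both datasets; feature 1 separates two subgroups that exist only in the target.
rng = np.random.default_rng(0)
scale = np.array([10.0, 1.0, 1.0])
background = rng.normal(size=(500, 3)) * scale
target = rng.normal(size=(500, 3)) * scale
target[:250, 1] += 6.0
target[250:, 1] -= 6.0

projected, directions = cpca(target, background, alpha=2.0, n_components=1)

# For comparison: ordinary PCA direction of the target alone.
target_cov = np.cov(target, rowvar=False)
pca_direction = np.linalg.eigh(target_cov)[1][:, -1]
```

In this toy setup, ordinary PCA on the target is dominated by the shared noise in feature 0, while the contrastive direction aligns with feature 1, where the target-only subgroups differ. The choice of `alpha` controls how strongly background variance is penalized and would normally be explored over a range of values.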
Model-Based Estimation of Respiratory Parameters from Capnography, With Application to Diagnosing Obstructive Lung Disease
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING
2017; 64 (12): 2957–67
We use a single-alveolar-compartment model to describe the partial pressure of carbon dioxide in exhaled breath, as recorded in time-based capnography. Respiratory parameters are estimated using this model, and then related to the clinical status of patients with obstructive lung disease. Given appropriate assumptions, we derive an analytical solution of the model, describing the exhalation segment of the capnogram. This solution is parametrized by alveolar CO2 concentration, dead-space fraction, and the time constant associated with exhalation. These quantities are estimated from individual capnogram data on a breath-by-breath basis. The model is applied to analyzing datasets from normal (n = 24) and chronic obstructive pulmonary disease (COPD) (n = 22) subjects, as well as from patients undergoing methacholine challenge testing for asthma (n = 22). A classifier based on linear discriminant analysis in logarithmic coordinates, using estimated dead-space fraction and exhalation time constant as features, and trained on data from five normal and five COPD subjects, yielded an area under the receiver operating characteristic curve (AUC) of 0.99 in classifying the remaining 36 subjects as normal or COPD. Bootstrapping with 50 replicas yielded a 95% confidence interval of AUCs from 0.96 to 1.00. For patients undergoing methacholine challenge testing, qualitatively meaningful trends were observed in the parameter variations over the course of the test. A simple mechanistic model allows estimation of underlying respiratory parameters from the capnogram, and may be applied to diagnosis and monitoring of chronic and reversible obstructive lung disease.
View details for DOI 10.1109/TBME.2017.2699972
View details for Web of Science ID 000417722600020
View details for PubMedID 28475040
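The classification step described in the capnography abstract above (linear discriminant analysis in logarithmic coordinates on two features, scored by AUC on held-out subjects) can be sketched as follows. This is only an illustration under stated assumptions: the feature values are synthetic and invented for the example, the helper names `fit_lda`, `lda_score`, and `auc` are hypothetical, and nothing here reproduces the paper's data or results.

```python
import numpy as np


def fit_lda(features, labels):
    """Two-class Fisher LDA on log-transformed, strictly positive features."""
    x = np.log(features)
    x0, x1 = x[labels == 0], x[labels == 1]
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    # Pooled within-class covariance of the log features.
    scatter = (x0 - mu0).T @ (x0 - mu0) + (x1 - mu1).T @ (x1 - mu1)
    pooled = scatter / (len(x) - 2)
    # Discriminant direction and an offset placing the boundary midway between means.
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b


def lda_score(features, w, b):
    """Signed discriminant score; larger values favor class 1."""
    return np.log(features) @ w + b


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


# Synthetic stand-ins (NOT from the paper) for the two features named in the
# abstract: column 0 = dead-space fraction, column 1 = exhalation time constant.
rng = np.random.default_rng(0)
n_per_class = 30
class0 = np.exp(rng.normal(np.log([0.3, 0.3]), 0.2, size=(n_per_class, 2)))
class1 = np.exp(rng.normal(np.log([0.5, 0.8]), 0.2, size=(n_per_class, 2)))
features = np.vstack([class0, class1])
labels = np.repeat([0, 1], n_per_class)

# Train on five subjects per class, evaluate on everyone else.
train = np.r_[0:5, n_per_class:n_per_class + 5]
held_out = np.ones(len(labels), dtype=bool)
held_out[train] = False

w, b = fit_lda(features[train], labels[train])
held_out_auc = auc(lda_score(features[held_out], w, b), labels[held_out])
```

Working in log coordinates makes the linear boundary correspond to a power-law relationship between the two original features, which is one plausible reason to prefer it for strictly positive physiological parameters.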