I am a statistician who works on methods for gaining insight into multivariate biological data, with the ultimate goal of unlocking the potential of big data to improve human health.
Doctor of Philosophy, Rice University (2013)
Bachelor of Arts, Harvard University (2005)
Chiara Sabatti, Postdoctoral Faculty Sponsor
Joint Bayesian variable and graph selection for regression models with network-structured predictors
STATISTICS IN MEDICINE
2016; 35 (7): 1017-1031
In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival. Copyright © 2015 John Wiley & Sons, Ltd.
DOI: 10.1002/sim.6792
Web of Science ID: 000371680500005
PubMed ID: 26514925
Bayesian Inference of Multiple Gaussian Graphical Models
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2015; 110 (509): 159-174
Bayesian Graphical Network Analyses Reveal Complex Biological Interactions Specific to Alzheimer's Disease
JOURNAL OF ALZHEIMERS DISEASE
2015; 44 (3): 917-925
With different approaches to finding prognostic or diagnostic biomarkers for Alzheimer's disease (AD), many studies pursue only brief lists of biomarkers or disease specific pathways, potentially dismissing information from groups of correlated biomarkers. Using a novel Bayesian graphical network method, with data from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging, the aim of this study was to assess the biological connectivity between AD associated blood-based proteins. Briefly, three groups of protein markers (18, 37, and 48 proteins, respectively) were assessed for the posterior probability of biological connection both within and between clinical classifications. Clinical classification was defined in four groups: high performance healthy controls (hpHC), healthy controls (HC), participants with mild cognitive impairment (MCI), and participants with AD. Using the smaller group of proteins, posterior probabilities of network similarity between clinical classifications were very high, indicating no difference in biological connections between groups. Increasing the number of proteins increased the capacity to separate both hpHC and HC apart from the AD group (0 for complete separation, 1 for complete similarity), with posterior probabilities shifting from 0.89 for the 18 protein group, through to 0.54 for the 37 protein group, and finally 0.28 for the 48 protein group. Using this approach, we identified beta-2 microglobulin (β2M) as a potential master regulator of multiple proteins across all classifications, demonstrating that this approach can be used across many data sets to identify novel insights into diseases like AD.
DOI: 10.3233/JAD-141497
Web of Science ID: 000349616000020
PubMed ID: 25613103
Characterization of biological pathways associated with a 1.37 Mbp genomic region protective of hypertension in Dahl S rats
PHYSIOLOGICAL GENOMICS
2014; 46 (11): 398-410
The goal of the present study was to narrow a region of chromosome 13 to only several genes and then apply unbiased statistical approaches to identify molecular networks and biological pathways relevant to blood-pressure salt sensitivity in Dahl salt-sensitive (SS) rats. The analysis of 13 overlapping subcongenic strains identified a 1.37 Mbp region on chromosome 13 that influenced the mean arterial blood pressure by at least 25 mmHg in SS rats fed a high-salt diet. DNA sequencing and analysis filled genomic gaps and provided identification of five genes in this region, Rfwd2, Fam5b, Astn1, Pappa2, and Tnr. A cross-platform normalization of transcriptome data sets obtained from our previously published Affymetrix GeneChip dataset and newly acquired RNA-seq data from renal outer medullary tissue provided 90 observations for each gene. Two Bayesian methods were used to analyze the data: 1) a linear model analysis to assess 243 biological pathways for their likelihood to discriminate blood pressure levels across experimental groups and 2) a Bayesian graphical modeling of pathways to discover genes with potential relationships to the candidate genes in this region. As none of these five genes are known to be involved in hypertension, this unbiased approach has provided useful clues to be experimentally explored. Of these five genes, Rfwd2, the gene most strongly expressed in the renal outer medulla, was notably associated with pathways that can affect blood pressure via renal transcellular Na(+) and K(+) electrochemical gradients and tubular Na(+) transport, mitochondrial TCA cycle and cell energetics, and circadian rhythms.
DOI: 10.1152/physiolgenomics.00179.2013
Web of Science ID: 000336716200003
PubMed ID: 24714719
Investigating Multiple Candidate Genes and Nutrients in the Folate Metabolism Pathway to Detect Genetic and Nutritional Risk Factors for Lung Cancer
PLOS ONE
2013; 8 (1)
Folate metabolism, with its importance to DNA repair, provides a promising region for genetic investigation of lung cancer risk. This project investigates genes (MTHFR, MTR, MTRR, CBS, SHMT1, TYMS), folate metabolism related nutrients (B vitamins, methionine, choline, and betaine) and their gene-nutrient interactions. We analyzed 115 tag single nucleotide polymorphisms (SNPs) and 15 nutrients from 1239 and 1692 non-Hispanic white, histologically-confirmed lung cancer cases and controls, respectively, using stochastic search variable selection (a Bayesian model averaging approach). Analyses were stratified by current, former, and never smoking status. Rs6893114 in MTRR (odds ratio [OR] = 2.10; 95% credible interval [CI]: 1.20-3.48) and alcohol (drinkers vs. non-drinkers, OR = 0.48; 95% CI: 0.26-0.84) were associated with lung cancer risk in current smokers. Rs13170530 in MTRR (OR = 1.70; 95% CI: 1.10-2.87) and two SNP*nutrient interactions [betaine*rs2658161 (OR = 0.42; 95% CI: 0.19-0.88) and betaine*rs16948305 (OR = 0.54; 95% CI: 0.30-0.91)] were associated with lung cancer risk in former smokers. SNPs in MTRR (rs13162612; OR = 0.25; 95% CI: 0.11-0.58; rs10512948; OR = 0.61; 95% CI: 0.41-0.90; rs2924471; OR = 3.31; 95% CI: 1.66-6.59), and MTHFR (rs9651118; OR = 0.63; 95% CI: 0.43-0.95) and three SNP*nutrient interactions (choline*rs10475407; OR = 1.62; 95% CI: 1.11-2.42; choline*rs11134290; OR = 0.51; 95% CI: 0.27-0.92; and riboflavin*rs8767412; OR = 0.40; 95% CI: 0.15-0.95) were associated with lung cancer risk in never smokers. This study identified possible nutrient and genetic factors related to folate metabolism associated with lung cancer risk, which could potentially lead to nutritional interventions tailored by smoking status to reduce lung cancer risk.
DOI: 10.1371/journal.pone.0053475
Web of Science ID: 000314021500014
PubMed ID: 23372658
Inferring metabolic networks using the Bayesian adaptive graphical lasso with informative priors
STATISTICS AND ITS INTERFACE
2013; 6 (4): 547-558
DOI: 10.4310/SII.2013.v6.n4.a12
Regularized Partial Least Squares with an Application to NMR Spectroscopy
STATISTICAL ANALYSIS AND DATA MINING
2013; 6 (4): 302-314
DOI: 10.1002/sam.11169