I am a statistician who works on methods for gaining insight into multivariate biological data, with the ultimate goal of unlocking the potential of big data to improve human health.

Professional Education

  • Doctor of Philosophy, Rice University (2013)
  • Bachelor of Arts, Harvard University (2005)

All Publications

  • Joint Bayesian variable and graph selection for regression models with network-structured predictors STATISTICS IN MEDICINE Peterson, C. B., Stingo, F. C., Vannucci, M. 2016; 35 (7): 1017-1031


    In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival. Copyright © 2015 John Wiley & Sons, Ltd.

    View details for DOI 10.1002/sim.6792

    View details for Web of Science ID 000371680500005

    View details for PubMedID 26514925

  • Bayesian Inference of Multiple Gaussian Graphical Models JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Peterson, C., Stingo, F. C., Vannucci, M. 2015; 110 (509): 159-174
  • Bayesian Graphical Network Analyses Reveal Complex Biological Interactions Specific to Alzheimer's Disease JOURNAL OF ALZHEIMERS DISEASE Rembach, A., Stingo, F. C., Peterson, C., Vannucci, M., Do, K., Wilson, W. J., Macaulay, S. L., Ryan, T. M., Martins, R. N., Ames, D., Masters, C. L., Doecke, J. D. 2015; 44 (3): 917-925


    With different approaches to finding prognostic or diagnostic biomarkers for Alzheimer's disease (AD), many studies pursue only brief lists of biomarkers or disease specific pathways, potentially dismissing information from groups of correlated biomarkers. Using a novel Bayesian graphical network method, with data from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging, the aim of this study was to assess the biological connectivity between AD associated blood-based proteins. Briefly, three groups of protein markers (18, 37, and 48 proteins, respectively) were assessed for the posterior probability of biological connection both within and between clinical classifications. Clinical classification was defined in four groups: high performance healthy controls (hpHC), healthy controls (HC), participants with mild cognitive impairment (MCI), and participants with AD. Using the smaller group of proteins, posterior probabilities of network similarity between clinical classifications were very high, indicating no difference in biological connections between groups. Increasing the number of proteins increased the capacity to separate both hpHC and HC apart from the AD group (0 for complete separation, 1 for complete similarity), with posterior probabilities shifting from 0.89 for the 18 protein group, through to 0.54 for the 37 protein group, and finally 0.28 for the 48 protein group. Using this approach, we identified beta-2 microglobulin (β2M) as a potential master regulator of multiple proteins across all classifications, demonstrating that this approach can be used across many data sets to identify novel insights into diseases like AD.

    View details for DOI 10.3233/JAD-141497

    View details for Web of Science ID 000349616000020

    View details for PubMedID 25613103

  • Characterization of biological pathways associated with a 1.37 Mbp genomic region protective of hypertension in Dahl S rats PHYSIOLOGICAL GENOMICS Cowley, A. W., Moreno, C., Jacob, H. J., Peterson, C. B., Stingo, F. C., Ahn, K. W., Liu, P., Vannucci, M., Laud, P. W., Reddy, P., Lazar, J., Evans, L., Yang, C., Kurth, T., Liang, M. 2014; 46 (11): 398-410


    The goal of the present study was to narrow a region of chromosome 13 to only several genes and then apply unbiased statistical approaches to identify molecular networks and biological pathways relevant to blood-pressure salt sensitivity in Dahl salt-sensitive (SS) rats. The analysis of 13 overlapping subcongenic strains identified a 1.37 Mbp region on chromosome 13 that influenced the mean arterial blood pressure by at least 25 mmHg in SS rats fed a high-salt diet. DNA sequencing and analysis filled genomic gaps and provided identification of five genes in this region, Rfwd2, Fam5b, Astn1, Pappa2, and Tnr. A cross-platform normalization of transcriptome data sets obtained from our previously published Affymetrix GeneChip dataset and newly acquired RNA-seq data from renal outer medullary tissue provided 90 observations for each gene. Two Bayesian methods were used to analyze the data: 1) a linear model analysis to assess 243 biological pathways for their likelihood to discriminate blood pressure levels across experimental groups and 2) a Bayesian graphical modeling of pathways to discover genes with potential relationships to the candidate genes in this region. As none of these five genes are known to be involved in hypertension, this unbiased approach has provided useful clues to be experimentally explored. Of these five genes, Rfwd2, the gene most strongly expressed in the renal outer medulla, was notably associated with pathways that can affect blood pressure via renal transcellular Na⁺ and K⁺ electrochemical gradients and tubular Na⁺ transport, mitochondrial TCA cycle and cell energetics, and circadian rhythms.

    View details for DOI 10.1152/physiolgenomics.00179.2013

    View details for Web of Science ID 000336716200003

    View details for PubMedID 24714719

  • Investigating Multiple Candidate Genes and Nutrients in the Folate Metabolism Pathway to Detect Genetic and Nutritional Risk Factors for Lung Cancer PLOS ONE Swartz, M. D., Peterson, C. B., Lupo, P. J., Wu, X., Forman, M. R., Spitz, M. R., Hernandez, L. M., Vannucci, M., Shete, S. 2013; 8 (1)


    Folate metabolism, with its importance to DNA repair, provides a promising region for genetic investigation of lung cancer risk. This project investigates genes (MTHFR, MTR, MTRR, CBS, SHMT1, TYMS), folate metabolism related nutrients (B vitamins, methionine, choline, and betaine) and their gene-nutrient interactions. We analyzed 115 tag single nucleotide polymorphisms (SNPs) and 15 nutrients from 1239 and 1692 non-Hispanic white, histologically-confirmed lung cancer cases and controls, respectively, using stochastic search variable selection (a Bayesian model averaging approach). Analyses were stratified by current, former, and never smoking status. Rs6893114 in MTRR (odds ratio [OR] = 2.10; 95% credible interval [CI]: 1.20-3.48) and alcohol (drinkers vs. non-drinkers, OR = 0.48; 95% CI: 0.26-0.84) were associated with lung cancer risk in current smokers. Rs13170530 in MTRR (OR = 1.70; 95% CI: 1.10-2.87) and two SNP*nutrient interactions [betaine*rs2658161 (OR = 0.42; 95% CI: 0.19-0.88) and betaine*rs16948305 (OR = 0.54; 95% CI: 0.30-0.91)] were associated with lung cancer risk in former smokers. SNPs in MTRR (rs13162612; OR = 0.25; 95% CI: 0.11-0.58; rs10512948; OR = 0.61; 95% CI: 0.41-0.90; rs2924471; OR = 3.31; 95% CI: 1.66-6.59), and MTHFR (rs9651118; OR = 0.63; 95% CI: 0.43-0.95) and three SNP*nutrient interactions (choline*rs10475407; OR = 1.62; 95% CI: 1.11-2.42; choline*rs11134290; OR = 0.51; 95% CI: 0.27-0.92; and riboflavin*rs8767412; OR = 0.40; 95% CI: 0.15-0.95) were associated with lung cancer risk in never smokers. This study identified possible nutrient and genetic factors related to folate metabolism associated with lung cancer risk, which could potentially lead to nutritional interventions tailored by smoking status to reduce lung cancer risk.

    View details for DOI 10.1371/journal.pone.0053475

    View details for Web of Science ID 000314021500014

    View details for PubMedID 23372658

  • Inferring metabolic networks using the Bayesian adaptive graphical lasso with informative priors Statistics and Its Interface Peterson, C., Vannucci, M., Karakas, C., Choi, W., Ma, L., Maletic-Savatic, M. 2013; 6 (4): 547-558
  • Regularized Partial Least Squares with an Application to NMR Spectroscopy Statistical Analysis and Data Mining Allen, G., Peterson, C., Vannucci, M., Maletic-Savatic, M. 2013; 6 (4): 302-314

    View details for DOI 10.1002/sam.11169