Bio


Trained in the French school of Data Analysis in Montpellier, Susan Holmes has been working in nonparametric multivariate statistics applied to Biology since 1985. She has taught at MIT and Harvard and was an Associate Professor of Biometry at Cornell before moving to Stanford in 1998. She created the Thinking Matters class "Breaking Codes and Finding Patterns" and likes working on big, messy data sets, mostly from the areas of Immunology, Cancer Biology and Microbial Ecology. Her theoretical interests include applied probability, MCMC (Markov chain Monte Carlo), Graph Limit Theory, Differential Geometry and the topology of the space of Phylogenetic Trees. She wrote the book Modern Statistics for Modern Biology with Wolfgang Huber from EMBL and teaches the material as a crash course (BIOS221) every year. Her current focus is improving the statistical analyses and reproducibility of data in perturbation studies of the Human Microbiome.

Administrative Appointments


  • CoDirector, Mathematical and Computational Sciences IDP (2002 - 2017)
  • Director, VIGRE program in Statistics (2005 - 2014)
  • Professor, Member, BioX (2003 - Present)

Honors & Awards


  • CASBS Fellow, Center for Advanced Study in the Behavioral Sciences (2017-2018)
  • Breiman Lecturer, N(eur)IPS (December, 2016)
  • Fellow, Fields Institute in Mathematical Sciences, Toronto, Canada (2015)
  • Director's Transformative Research Award, NIH (2013)
  • John Henry Samter University Fellow in Undergraduate Education, Stanford (2012)
  • Fellow of the Institute of Mathematical Statistics, IMS (2005)

Boards, Advisory Committees, Professional Organizations


  • Member and Chair (2010-2012), Science Board, NimBioS (2007 - 2012)
  • Science Boards, Fields Research Institute in the Mathematical Sciences (2012 - 2015)
  • Council Member, Institute of Mathematical Statistics (2018 - 2021)
  • Scientific Advisory Board, Sydney Mathematics Research Institute (2018 - Present)

Program Affiliations


  • Symbolic Systems Program

Current Research and Scholarly Interests


Our work focuses on large heterogeneous multi-layer data analyses. Whether using image analysis and segmentation for the study of cancer and immune cell interactions, or brain imaging and DNA sequence analyses for the study of dependencies between genetic and neurological dynamics, all these statistical studies have involved large complex datasets of different types where dynamics of interactions between different components of a system are the key to understanding the underlying biology.

We have generalized methods such as Principal Components Analysis (PCA) to more diverse data incorporating spatial information as well as tree dependency structures. This has proved useful in the study of drug resistant mutations in HIV and in the study of the dynamics of bacterial communities in the Human Microbiome.

The statistical bases for these nonparametric methods are computer-intensive methods using optimization and kernels. We often find useful embeddings of high-dimensional data in low-dimensional structures, the extreme case being finding a natural ordering in high-dimensional data. More general manifolds have also proved useful in one of our current projects, joint with Xavier Pennec of INRIA Sophia Antipolis, which focuses on the uses of differential geometry in computational anatomy and image processing.

In a long-term collaboration with Professor David Relman (Stanford Medical School), we are developing a multi-table toolbox of nonparametric methods that enable users to normalize and visualize the multiple facets of the microbiota in the human body under different classes of perturbations. The tools developed in this project are all open source packages developed in R and provide an example of reproducible research in action.

Projects


  • Study of the dynamics of the human microbiome, Stanford University (March 1, 2013 - 2018)

    Building on multivariate statistical methods, we are developing new methods for modeling and visualizing the dynamics of bacterial community networks in the human microbiome.

    Location

    United States

  • Hierarchical Testing, Stanford University (8/1/2012)

    Develop tools for testing evolutionary signals in bacterial data.

    Location

    Stanford

    Collaborators

    • Alfred Spormann, Independent Labs, Institutes, and Centers (Dean of Research)
  • Multivariate Data Analysis of Drug Resistance in HIV, Stanford University (9/1/2003)

    We use phylogenetic trees, networks and multivariate analyses to decompose the complexities of drug resistance in HIV.

    Location

    Stanford

  • Multiway Data Analysis for the Human Microbiome, Stanford University (11/1/2013 - 10/31/2018)

    Combining 16S rRNA data with metabolic and transcriptomic data to predict resilience in the human microbiome after perturbations.

    Location

    Stanford

    Collaborators

    • David Relman, Professor, Stanford

All Publications


  • Sub-communities of the vaginal microbiota in pregnant and non-pregnant women. Proceedings. Biological sciences Symul, L., Jeganathan, P., Costello, E. K., France, M., Bloom, S. M., Kwon, D. S., Ravel, J., Relman, D. A., Holmes, S. 2023; 290 (2011): 20231461

    Abstract

    Diverse and non-Lactobacillus-dominated vaginal microbial communities are associated with adverse health outcomes such as preterm birth and the acquisition of sexually transmitted infections. Despite the importance of recognizing and understanding the key risk-associated features of these communities, their heterogeneous structure and properties remain ill-defined. Clustering approaches are commonly used to characterize vaginal communities, but they lack sensitivity and robustness in resolving substructures and revealing transitions between potential sub-communities. Here, we address this need with an approach based on mixed membership topic models. Using longitudinal data from cohorts of pregnant and non-pregnant study participants, we show that topic models more accurately describe sample composition, longitudinal changes, and better predict the loss of Lactobacillus dominance. We identify several non-Lactobacillus-dominated sub-communities common to both cohorts and independent of reproductive status. In non-pregnant individuals, we find that the menstrual cycle modulates transitions between and within sub-communities, as well as the concentrations of half of the cytokines and 18% of metabolites. Overall, our analyses based on mixed membership models reveal substructures of vaginal ecosystems which may have important clinical and biological associations.

    View details for DOI 10.1098/rspb.2023.1461

    View details for PubMedID 38018105

  • Comparative analysis of cell-cell communication at single-cell resolution. Nature biotechnology Wilk, A. J., Shalek, A. K., Holmes, S., Blish, C. A. 2023

    Abstract

    Inference of cell-cell communication from single-cell RNA sequencing data is a powerful technique to uncover intercellular communication pathways, yet existing methods perform this analysis at the level of the cell type or cluster, discarding single-cell-level information. Here we present Scriabin, a flexible and scalable framework for comparative analysis of cell-cell communication at single-cell resolution that is performed without cell aggregation or downsampling. We use multiple published atlas-scale datasets, genetic perturbation screens and direct experimental validation to show that Scriabin accurately recovers expected cell-cell communication edges and identifies communication networks that can be obscured by agglomerative methods. Additionally, we use spatial transcriptomic data to show that Scriabin can uncover spatial features of interaction from dissociated data alone. Finally, we demonstrate applications to longitudinal datasets to follow communication pathways operating between timepoints. Our approach represents a broadly applicable strategy to reveal the full structure of niche-phenotype relationships in health and disease.

    View details for DOI 10.1038/s41587-023-01782-z

    View details for PubMedID 37169965

    View details for PubMedCentralID 8104132

  • Generative Models: An Interdisciplinary Perspective ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION Sankaran, K., Holmes, S. P. 2023; 10: 325-352
  • Latent variable modeling for the microbiome BIOSTATISTICS Sankaran, K., Holmes, S. P. 2019; 20 (4): 599–614
  • Ten quick tips for effective dimensionality reduction. PLoS computational biology Nguyen, L. H., Holmes, S. 2019; 15 (6): e1006907

    View details for DOI 10.1371/journal.pcbi.1006907

    View details for PubMedID 31220072

  • TRACKING NETWORK DYNAMICS: A SURVEY USING GRAPH DISTANCES ANNALS OF APPLIED STATISTICS Donnat, C., Holmes, S. 2018; 12 (2): 971–1012
  • Bayesian Nonparametric Ordination for the Analysis of Microbial Communities JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Ren, B., Bacallado, S., Favaro, S., Holmes, S., Trippa, L. 2017; 112 (520): 1430–42

    Abstract

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

    View details for PubMedID 29430070

    View details for PubMedCentralID PMC5804367

  • DADA2: High-resolution sample inference from Illumina amplicon data. Nature methods Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J., Holmes, S. P. 2016; 13 (7): 581-583

    Abstract

    We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.

    View details for DOI 10.1038/nmeth.3869

    View details for PubMedID 27214047

    View details for PubMedCentralID PMC4927377

  • Waste not, want not: why rarefying microbiome data is inadmissible. PLoS computational biology McMurdie, P. J., Holmes, S. 2014; 10 (4)

    Abstract

    Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.

    View details for DOI 10.1371/journal.pcbi.1003531

    View details for PubMedID 24699258

  • Pro-inflammatory feedback loops define immune responses to pathogenic Lentivirus infection. Genome medicine Wilk, A. J., Marceau, J. O., Kazer, S. W., Fleming, I., Miao, V. N., Galvez-Reyes, J., Kimata, J. T., Shalek, A. K., Holmes, S., Overbaugh, J., Blish, C. A. 2024; 16 (1): 24

    Abstract

    BACKGROUND: The Lentivirus human immunodeficiency virus (HIV) causes chronic inflammation and AIDS in humans, with variable rates of disease progression between individuals driven by both host and viral factors. Similarly, simian lentiviruses vary in their pathogenicity based on characteristics of both the host species and the virus strain, yet the immune underpinnings that drive differential Lentivirus pathogenicity remain incompletely understood. METHODS: We profile immune responses in a unique model of differential lentiviral pathogenicity where pig-tailed macaques are infected with highly genetically similar variants of SIV that differ in virulence. We apply longitudinal single-cell transcriptomics to this cohort, along with single-cell resolution cell-cell communication techniques, to understand the immune mechanisms underlying lentiviral pathogenicity. RESULTS: Compared to a minimally pathogenic lentiviral variant, infection with a highly pathogenic variant results in a more delayed, broad, and sustained activation of inflammatory pathways, including an extensive global interferon signature. Conversely, individual cells infected with highly pathogenic Lentivirus upregulated fewer interferon-stimulated genes at a lower magnitude, indicating that highly pathogenic Lentivirus has evolved to partially escape from interferon responses. Further, we identify CXCL10 and CXCL16 as important molecular drivers of inflammatory pathways specifically in response to highly pathogenic Lentivirus infection. Immune responses to highly pathogenic Lentivirus infection are characterized by amplifying regulatory circuits of pro-inflammatory cytokines with dense longitudinal connectivity. CONCLUSIONS: Our work presents a model of lentiviral pathogenicity where failures in early viral control mechanisms lead to delayed, sustained, and amplifying pro-inflammatory circuits, which in turn drives disease progression.

    View details for DOI 10.1186/s13073-024-01290-y

    View details for PubMedID 38317183

  • Longitudinal gut microbiota composition of South African and Nigerian infants in relation to tetanus vaccine responses. Microbiology spectrum Iwase, S. C., Osawe, S., Happel, A. U., Gray, C. M., Holmes, S. P., Blackburn, J. M., Abimiku, A., Jaspan, H. B. 2024: e0319023

    Abstract

    Gut microbiota plays an essential role in immune system development. Since infants HIV-exposed and uninfected (iHEU) are more vulnerable to infectious diseases than unexposed infants, we explored the impact of HIV exposure on gut microbiota and its association with vaccine responses. This study was conducted in two African countries with rapidly increasing numbers of iHEU. Infant HIV exposure did not substantially affect gut microbial succession, but geographic location had a strong effect. However, both the relative abundance of specific gut microbes and HIV exposure were independently associated with tetanus titers, which were also influenced by baseline tetanus titers (maternal transfer). Our findings provide insight into the effect of HIV exposure, passive maternal antibody, and gut microbiota on infant humoral vaccine responses.

    View details for DOI 10.1128/spectrum.03190-23

    View details for PubMedID 38230936

  • Abrupt perturbation and delayed recovery of the vaginal ecosystem following childbirth. Nature communications Costello, E. K., DiGiulio, D. B., Robaczewska, A., Symul, L., Wong, R. J., Shaw, G. M., Stevenson, D. K., Holmes, S. P., Kwon, D. S., Relman, D. A. 2023; 14 (1): 4141

    Abstract

    The vaginal ecosystem is closely tied to human health and reproductive outcomes, yet its dynamics in the wake of childbirth remain poorly characterized. Here, we profile the vaginal microbiota and cytokine milieu of participants sampled longitudinally throughout pregnancy and for at least one year postpartum. We show that delivery, regardless of mode, is associated with a vaginal pro-inflammatory cytokine response and the loss of Lactobacillus dominance. By contrast, neither the progression of gestation nor the approach of labor strongly altered the vaginal ecosystem. At 9.5 months postpartum (the latest timepoint at which cytokines were assessed), elevated inflammation coincided with vaginal bacterial communities that had remained perturbed (highly diverse) from the time of delivery. Time-to-event analysis indicated a one-year postpartum probability of transitioning to Lactobacillus dominance of 49.4%. As diversity and inflammation declined during the postpartum period, dominance by L. crispatus, the quintessential health-associated commensal, failed to return: its prevalence before, immediately after, and one year after delivery was 41%, 4%, and 9%, respectively. Revisiting our pre-delivery data, we found that a prior live birth was associated with a lower odds of L. crispatus dominance in pregnant participants, an outcome modestly tempered by a longer (>18-month) interpregnancy interval. Our results suggest that reproductive history and childbirth in particular remodel the vaginal ecosystem and that the timing and degree of recovery from delivery may help determine the subsequent health of the woman and of future pregnancies.

    View details for DOI 10.1038/s41467-023-39849-9

    View details for PubMedID 37438386

    View details for PubMedCentralID 4355684

  • Longitudinal gut microbiota composition of South African and Nigerian infants in relation to tetanus vaccine responses. Research square Iwase, S. C., Jaspan, H. B., Happel, A. U., Holmes, S. P., Abimiku, A., Osawe, S., Gray, C. M., Blackburn, J. M. 2023

    Abstract

    Infants who are exposed to HIV but uninfected (iHEU) have higher risk of infectious morbidity than infants who are HIV-unexposed and uninfected (iHUU), possibly due to altered immunity. As infant gut microbiota may influence immune development, we evaluated the effects of HIV exposure on infant gut microbiota and its association with tetanus toxoid (TT) vaccine responses. We evaluated gut microbiota by 16S rRNA gene sequencing in 278 South African and Nigerian infants during the first and at 15 weeks of life and measured antibodies against TT vaccine by enzyme-linked immunosorbent assay (ELISA) at matched time points. Infant gut microbiota and its succession were more strongly influenced by geographical location and age than by HIV exposure. Microbiota of Nigerian infants drastically changed over 15 weeks, becoming dominated by Bifidobacterium longum subspecies infantis. This change was not observed among EBF South African infants. Lasso regression suggested that HIV exposure and gut microbiota were independently associated with TT vaccine responses at week 15, and that high passive antibody levels may mitigate these effects. In two African cohorts, HIV exposure minimally altered the infant gut microbiota compared to age and country, but both specific gut microbes and HIV exposure independently predicted humoral vaccine responses.

    View details for DOI 10.21203/rs.3.rs-3112263/v1

    View details for PubMedID 37461449

    View details for PubMedCentralID PMC10350179

  • Modeling the heterogeneity in COVID-19's reproductive number and its impact on predictive scenarios. Journal of applied statistics Donnat, C., Holmes, S. 2023; 50 (11-12): 2518-2546

    Abstract

    The correct evaluation of the reproductive number R for COVID-19 is central in the quantification of the potential scope of the pandemic and the selection of an appropriate course of action. In most models, R is modeled as a constant - effectively averaging out the inherent variability of the transmission process due to varying individual contact rates, population densities, or temporal factors amongst many. Yet, due to the exponential nature of epidemic growth, the error due to this simplification can be rapidly amplified, and its extent remains unknown. How can this intrinsic variability be percolated into epidemic models, and its impact, better quantified? We study this question here through a Bayesian perspective that captures at scale the heterogeneity of a population and environmental conditions, creating a bridge between the traditional agent-based and compartmental approaches. We use our model to simulate the spread as well as the impact of different social distancing strategies on real COVID-19 data, and highlight the significant impact of the heterogeneity. We emphasize that the contribution of this paper focuses on discussing the importance of the impact of R's heterogeneity on uncertainty quantification from a statistical viewpoint, rather than developing new predictive models.

    View details for DOI 10.1080/02664763.2021.1941806

    View details for PubMedID 37554662

    View details for PubMedCentralID PMC10405777

  • Disrupted memory T cell expansion in HIV-exposed uninfected infants is preceded by premature skewing of T cell receptor clonality. bioRxiv : the preprint server for biology Dzanibe, S., Wilk, A. J., Canny, S., Ranganath, T., Alinde, B., Rubelt, F., Huang, H., Davis, M. M., Holmes, S., Jaspan, H. B., Blish, C. A., Gray, C. M. 2023

    Abstract

    While preventing vertical HIV transmission has been very successful, the increasing number of HIV-exposed uninfected infants (iHEU) experience an elevated risk of infections compared to HIV-unexposed and uninfected infants (iHUU). Immune developmental differences between iHEU and iHUU remain poorly understood, and here we present a longitudinal multimodal analysis of infant immune ontogeny that highlights the impact of HIV/ARV exposure. Using mass cytometry, we show alterations and differences in the emergence of NK cell populations and T cell memory differentiation between iHEU and iHUU. Specific NK cells observed at birth were also predictive of acellular pertussis and rotavirus vaccine-induced IgG and IgA responses, respectively, at 3 and 9 months of life. T cell receptor Vβ clonotypic diversity was significantly and persistently lower in iHEU preceding the expansion of T cell memory. Our findings show that HIV/ARV exposure disrupts innate and adaptive immunity from birth which may underlie relative vulnerability to infections.

    View details for DOI 10.1101/2023.05.19.540713

    View details for PubMedID 37292866

    View details for PubMedCentralID PMC10245741

  • Profiling the human intestinal environment under physiological conditions. Nature Shalon, D., Culver, R. N., Grembi, J. A., Folz, J., Treit, P. V., Shi, H., Rosenberger, F. A., Dethlefsen, L., Meng, X., Yaffe, E., Aranda-Diaz, A., Geyer, P. E., Mueller-Reif, J. B., Spencer, S., Patterson, A. D., Triadafilopoulos, G., Holmes, S. P., Mann, M., Fiehn, O., Relman, D. A., Huang, K. C. 2023

    Abstract

    The spatiotemporal structure of the human microbiome, proteome and metabolome reflects and determines regional intestinal physiology and may have implications for disease. Yet, little is known about the distribution of microorganisms, their environment and their biochemical activity in the gut because of reliance on stool samples and limited access to only some regions of the gut using endoscopy in fasting or sedated individuals. To address these deficiencies, we developed an ingestible device that collects samples from multiple regions of the human intestinal tract during normal digestion. Collection of 240 intestinal samples from 15 healthy individuals using the device and subsequent multi-omics analyses identified significant differences between bacteria, phages, host proteins and metabolites in the intestines versus stool. Certain microbial taxa were differentially enriched and prophage induction was more prevalent in the intestines than in stool. The host proteome and bile acid profiles varied along the intestines and were highly distinct from those of stool. Correlations between gradients in bile acid concentrations and microbial abundance predicted species that altered the bile acid pool through deconjugation. Furthermore, microbially conjugated bile acid concentrations exhibited amino acid-dependent trends that were not apparent in stool. Overall, non-invasive, longitudinal profiling of microorganisms, proteins and bile acids along the intestinal tract under physiological conditions can help elucidate the roles of the gut microbiome and metabolome in human physiology and disease.

    View details for DOI 10.1038/s41586-023-05989-7

    View details for PubMedID 37165188

  • HIV-1 Group M Capsid Amino Acid Variability: Implications for Sequence Quality Control of Genotypic Resistance Testing. Viruses Tao, K., Rhee, S. Y., Tzou, P. L., Osman, Z. A., Pond, S. L., Holmes, S. P., Shafer, R. W. 2023; 15 (4)

    Abstract

    With the approval of the HIV-1 capsid inhibitor, lenacapavir, capsid sequencing will be required for managing lenacapavir-experienced individuals with detectable viremia. Successful sequence interpretation will require examining new capsid sequences in the context of previously published sequence data. We analyzed published HIV-1 group M capsid sequences from 21,012 capsid-inhibitor naïve individuals to characterize amino acid variability at each position and influence of subtype and cytotoxic T lymphocyte (CTL) selection pressure. We determined the distributions of usual mutations, defined as amino acid differences from the group M consensus, with a prevalence ≥ 0.1%. Co-evolving mutations were identified using a phylogenetically-informed Bayesian graphical model method. 162 (70.1%) positions had no usual mutations (45.9%) or only conservative usual mutations with a positive BLOSUM62 score (24.2%). Variability correlated independently with subtype-specific amino acid occurrence (Spearman rho = 0.83; p < 1 × 10^-9) and the number of times positions were reported to contain an HLA-associated polymorphism, an indicator of CTL pressure (rho = 0.43; p = 0.0002). Knowing the distribution of usual capsid mutations is essential for sequence quality control. Comparing capsid sequences from lenacapavir-treated and lenacapavir-naïve individuals will enable the identification of additional mutations potentially associated with lenacapavir therapy.

    View details for DOI 10.3390/v15040992

    View details for PubMedID 37112972

    View details for PubMedCentralID PMC10143361

  • Highly Ambiguous HIV-1 pol Positions Encoding Multiple Amino Acids Usually Result from Antiviral or Immune Selection Pressure. AIDS research and human retroviruses Tao, K., Rhee, S., Tzou, P. L., Holmes, S., Shafer, R. W. 2022

    Abstract

    HIV-1 pol nucleotide ambiguities encoding amino acid mixtures occur commonly during population-based genotypic drug resistance testing. However, few studies have addressed the validity of sequences with fully ambiguous codons (FACs) containing codons translatable to more than four amino acids. We identified 839 published HIV-1 pol sequences with 846 FACs at 131 positions and determined their distribution relative to 215 HLA-associated pol positions (HAPs) and 84 drug-resistance positions. Among HIV-1 RT and protease sequences from ART-naive and -experienced persons, there was a strong correlation between the likelihood a position was a FAC and that it was an HAP (Spearman's correlation coefficient rho >0.40; p<1e-6). Among HIV-1 RT sequences from ART-experienced persons, there was a correlation between the likelihood that a position was a FAC and that it was a drug-resistance position (rho=0.2; p=8e-4). In the context of population-based genotypic resistance testing, FACs usually result from antiviral or immune selection pressure.

    View details for DOI 10.1089/AID.2022.0094

    View details for PubMedID 36515174

  • SARS-CoV-2 escapes direct NK cell killing through Nsp1-mediated downregulation of ligands for NKG2D. Cell reports Lee, M. J., Leong, M. W., Rustagi, A., Beck, A., Zeng, L., Holmes, S., Qi, L. S., Blish, C. A. 2022: 111892

    Abstract

    Natural killer (NK) cells are cytotoxic effector cells that target and lyse virally infected cells; many viruses therefore encode mechanisms to escape such NK cell killing. Here, we interrogate the ability of SARS-CoV-2 to modulate NK cell recognition and lysis of infected cells. We find that NK cells exhibit poor cytotoxic responses against SARS-CoV-2-infected targets, preferentially killing uninfected bystander cells. We demonstrate that this escape is driven by downregulation of ligands for the activating receptor NKG2D (NKG2D-L). Indeed, early in viral infection, prior to NKG2D-L downregulation, NK cells are able to target and kill infected cells; however, this ability is lost as viral proteins are expressed. Finally, we find that SARS-CoV-2 non-structural protein 1 (Nsp1) mediates downregulation of NKG2D-L and that Nsp1 alone is sufficient to confer resistance to NK cell killing. Collectively, our work demonstrates that SARS-CoV-2 evades direct NK cell cytotoxicity and describes a mechanism by which this occurs.

    View details for DOI 10.1016/j.celrep.2022.111892

    View details for PubMedID 36543165

  • Genotypic correlates of resistance to the HIV-1 strand transfer integrase inhibitor cabotegravir. Antiviral research Rhee, S., Parkin, N., Harrigan, P. R., Holmes, S., Shafer, R. W. 2022: 105427

    Abstract

    Cabotegravir (CAB) is an integrase strand transfer inhibitor (INSTI) formulated as a long-acting injectable drug approved for pre-exposure prophylaxis and use with a long acting rilpivirine formulation for therapy in patients with virological suppression. However, there has been no comprehensive review of the genetic mechanisms of CAB resistance. Studies reporting the selection of drug resistance mutations (DRMs) by CAB and the results of in vitro CAB susceptibility testing were reviewed. The impact of integrase mutations on CAB susceptibility was assessed using regularized regression analysis. The most commonly selected mutations in the 24 persons developing virological failure while receiving CAB included Q148R (n = 15), N155H (n = 7), and E138K (n = 5). T97A, G118R, G140 A/R/S, and R263K each developed in 1-2 persons. With the exception of T97A, G118R, and G140 A/R, these DRMs were also selected in vitro while G140R was selected in the SIV macaque model. Although these DRMs are similar to those occurring in persons receiving the related INSTI dolutegravir, Q148R was more likely to occur with CAB while G118R and R263K were more likely to occur with dolutegravir. Regularized regression analysis identified 14 DRMs significantly associated with reduced CAB susceptibility including six primary DRMs which reduced susceptibility on their own including G118R, Q148 H/K/R, N155H, and R263K, and eight accessory DRMs including M50I, L74 F/M, T97A, E138K, and G140 A/C/S. Isolates with Q148 H/K/R in combination with L74M, E138 A/K, G140 A/S, and N155H often had >10-fold reduced CAB susceptibility. M50I, L74M, and T97A are polymorphic mutations that alone did not appear to increase the risk of virological failure in persons receiving a CAB-containing regimen. Careful patient screening is required to prevent CAB from being used during active virus replication. Close virological monitoring is required to minimize CAB exposure to active replication to prevent the emergence of DRMs associated with cross-resistance to other INSTIs.

    View details for DOI 10.1016/j.antiviral.2022.105427

    View details for PubMedID 36191692

  • Chimpanzee and pig-tailed macaque iPSCs: Improved culture and generation of primate cross-species embryos. Cell reports Roodgar, M., Suchy, F. P., Nguyen, L. H., Bajpai, V. K., Sinha, R., Vilches-Moure, J. G., Van Bortle, K., Bhadury, J., Metwally, A., Jiang, L., Jian, R., Chiang, R., Oikonomopoulos, A., Wu, J. C., Weissman, I. L., Mankowski, J. L., Holmes, S., Loh, K. M., Nakauchi, H., VandeVoort, C. A., Snyder, M. P. 2022; 40 (9): 111264

    Abstract

    As our closest living relatives, non-human primates uniquely enable explorations of human health, disease, development, and evolution. Considerable effort has thus been devoted to generating induced pluripotent stem cells (iPSCs) from multiple non-human primate species. Here, we establish improved culture methods for chimpanzee (Pan troglodytes) and pig-tailed macaque (Macaca nemestrina) iPSCs. Such iPSCs spontaneously differentiate in conventional culture conditions, but can be readily propagated by inhibiting endogenous WNT signaling. As a unique functional test of these iPSCs, we injected them into the pre-implantation embryos of another non-human species, rhesus macaques (Macaca mulatta). Ectopic expression of gene BCL2 enhances the survival and proliferation of chimpanzee and pig-tailed macaque iPSCs within the pre-implantation embryo, although the identity and long-term contribution of the transplanted cells warrants further investigation. In summary, we disclose transcriptomic and proteomic data, cell lines, and cell culture resources that may be broadly enabling for non-human primate iPSCs research.

    View details for DOI 10.1016/j.celrep.2022.111264

    View details for PubMedID 36044843

  • Robust variation in infant gut microbiome assembly across a spectrum of lifestyles. Science (New York, N.Y.) Olm, M. R., Dahan, D., Carter, M. M., Merrill, B. D., Yu, F. B., Jain, S., Meng, X., Tripathi, S., Wastyk, H., Neff, N., Holmes, S., Sonnenburg, E. D., Jha, A. R., Sonnenburg, J. L. 2022; 376 (6598): 1220-1223

    Abstract

    Infant microbiome assembly has been intensely studied in infants from industrialized nations, but little is known about this process in nonindustrialized populations. We deeply sequenced infant stool samples from the Hadza hunter-gatherers of Tanzania and analyzed them in a global meta-analysis. Infant microbiomes develop along lifestyle-associated trajectories, with more than 20% of genomes detected in the Hadza infant gut representing novel species. Industrialized infants, even those who are breastfed, have microbiomes characterized by a paucity of Bifidobacterium infantis and gene cassettes involved in human milk utilization. Strains within lifestyle-associated taxonomic groups are shared between mother-infant dyads, consistent with early life inheritance of lifestyle-shaped microbiomes. The population-specific differences in infant microbiome composition and function underscore the importance of studying microbiomes from people outside of wealthy, industrialized nations.

    View details for DOI 10.1126/science.abj2972

    View details for PubMedID 35679413

  • Statistical Modeling for Practical Pooled Testing During the COVID-19 Pandemic STATISTICAL SCIENCE Comess, S., Wang, H., Holmes, S., Donnat, C. 2022; 37 (2): 229-250

    View details for DOI 10.1214/22-STS857

    View details for Web of Science ID 000798149000006

  • Labeling Self-Tracked Menstrual Health Records With Hidden Semi-Markov Models IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS Symul, L., Holmes, S. 2022; 26 (3): 1297-1308

    Abstract

    Globally, millions of women track their menstrual cycle and fertility via smartphone-based health apps, generating multivariate time series with frequent missing data. To leverage this type of data for studies of fertility or studies of the effect of the menstrual cycle on symptoms and diseases, it is critical to have methods for identifying reproductive events, such as ovulation, pregnancy losses or births. Here, we present a hierarchical approach relying on hidden semi-Markov models that adapts to changes in tracking behavior, explicitly captures variable and state dependent missingness, allows for variables of different type, and quantifies uncertainty. The accuracy on simulated data reaches 98% with no missing data and 90% with realistic missingness. On our partially labeled real-world time series, the accuracy reaches 93%. Our method also accurately predicts cycle length by learning user characteristics. Its implementation is publicly available (HiddenSemiMarkov R package) and transferable to any health time series, including self-reported symptoms and occasional tests.

    View details for DOI 10.1109/JBHI.2021.3110716

    View details for Web of Science ID 000766665300038

    View details for PubMedID 34495854

  • Natural Killer Cell Receptors and Ligands Are Associated With Markers of HIV-1 Persistence in Chronically Infected ART Suppressed Patients. Frontiers in cellular and infection microbiology Ivison, G. T., Vendrame, E., Martinez-Colon, G. J., Ranganath, T., Vergara, R., Zhao, N. Q., Martin, M. P., Bendall, S. C., Carrington, M., Cyktor, J. C., McMahon, D. K., Eron, J., Jones, R. B., Mellors, J. W., Bosch, R. J., Gandhi, R. T., Holmes, S., Blish, C. A., ACTG 5321 Team 2022; 12: 757846

    Abstract

    The latent HIV-1 reservoir represents a major barrier to achieving a long-term antiretroviral therapy (ART)-free remission or cure for HIV-1. Natural Killer (NK) cells are innate immune cells that play a critical role in controlling viral infections and have been shown to be involved in preventing HIV-1 infection and, in those who are infected, delaying time to progression to AIDS. However, their role in limiting HIV-1 persistence on long term ART is still uncharacterized. To identify associations between markers of HIV-1 persistence and the NK cell receptor-ligand repertoire, we used twin mass cytometry panels to characterize the peripheral blood NK receptor-ligand repertoire in individuals with long-term antiretroviral suppression enrolled in the AIDS Clinical Trial Group A5321 study. At the time of testing, participants had been on ART for a median of 7 years, with virological suppression <50 copies/mL since at most 48 weeks on ART. We found that the NK cell receptor and ligand repertoires did not change across three longitudinal samples over one year (a median of 25 weeks and 50 weeks after the initial sampling). To determine the features of the receptor-ligand repertoire that associate with markers of HIV-1 persistence, we performed a LASSO normalized regression. This analysis revealed that the NK cell ligands CD58, HLA-B, and CRACC, as well as the killer cell immunoglobulin-like receptors (KIRs) KIR2DL1, KIR2DL3, and KIR2DS4 were robustly predictive of markers of HIV-1 persistence, as measured by total HIV-1 cell-associated DNA, HIV-1 cell-associated RNA, and single copy HIV-RNA assays. To characterize the roles of cell populations defined by multiple markers, we augmented the LASSO analysis with FlowSOM clustering. This analysis found that a less mature NK cell phenotype (CD16+CD56dimCD57-LILRB1-NKG2C-) was associated with lower HIV-1 cell associated DNA. Finally, we found that surface expression of HLA-Bw6 measured by CyTOF was associated with lower HIV-1 persistence. Genetic analysis revealed that this was driven by lower HIV-1 persistence in HLA-Bw4/6 heterozygotes. These findings suggest that there may be a role for NK cells in controlling HIV-1 persistence in individuals on long-term ART, which must be corroborated by future studies.

    View details for DOI 10.3389/fcimb.2022.757846

    View details for PubMedID 35223535

  • Arcsine laws for random walks generated from random permutations with applications to genomics JOURNAL OF APPLIED PROBABILITY Fang, X., Gan, H. L., Holmes, S., Huang, H., Pekoz, E., Rollin, A., Tang, W. 2021; 58 (4): 851-867

  • Stereotypic Expansion of T Regulatory and Th17 Cells during Infancy Is Disrupted by HIV Exposure and Gut Epithelial Damage. Journal of immunology (Baltimore, Md. : 1950) Dzanibe, S., Lennard, K., Kiravu, A., Seabrook, M. S., Alinde, B., Holmes, S. P., Blish, C. A., Jaspan, H. B., Gray, C. M. 2021

    Abstract

    Few studies have investigated immune cell ontogeny throughout the neonatal and early pediatric period, when there is often increased vulnerability to infections. In this study, we evaluated the dynamics of two critical T cell populations, T regulatory (Treg) cells and Th17 cells, over the first 36 wk of human life. First, we observed distinct CD4+ T cells phenotypes between cord blood and peripheral blood, collected within 12 h of birth, showing that cord blood is not a surrogate for newborn blood. Second, both Treg and Th17 cells expanded in a synchronous fashion over 36 wk of life. However, comparing infants exposed to HIV in utero, but remaining uninfected, with HIV-unexposed uninfected control infants, there was a lower frequency of peripheral blood Treg cells at birth, resulting in a delayed expansion, and then declining again at 36 wk. Focusing on birth events, we found that Treg cells coexpressing CCR4 and alpha4beta7 inversely correlated with plasma concentrations of CCL17 (the ligand for CCR4) and intestinal fatty acid binding protein, IL-7, and CCL20. This was in contrast with Th17 cells, which showed a positive association with these plasma analytes. Thus, despite the stereotypic expansion of both cell subsets over the first few months of life, there was a disruption in the balance of Th17 to Treg cells at birth likely being a result of gut damage and homing of newborn Treg cells from the blood circulation to the gut.

    View details for DOI 10.4049/jimmunol.2100503

    View details for PubMedID 34819390

  • Reporting guidelines for human microbiome research: the STORMS checklist. Nature medicine Mirzayi, C., Renson, A., Genomic Standards Consortium, Massive Analysis and Quality Control Society, Zohra, F., Elsafoury, S., Geistlinger, L., Kasselman, L. J., Eckenrode, K., van de Wijgert, J., Loughman, A., Marques, F. Z., MacIntyre, D. A., Arumugam, M., Azhar, R., Beghini, F., Bergstrom, K., Bhatt, A., Bisanz, J. E., Braun, J., Bravo, H. C., Buck, G. A., Bushman, F., Casero, D., Clarke, G., Collado, M. C., Cotter, P. D., Cryan, J. F., Demmer, R. T., Devkota, S., Elinav, E., Escobar, J. S., Fettweis, J., Finn, R. D., Fodor, A. A., Forslund, S., Franke, A., Furlanello, C., Gilbert, J., Grice, E., Haibe-Kains, B., Handley, S., Herd, P., Holmes, S., Jacobs, J. P., Karstens, L., Knight, R., Knights, D., Koren, O., Kwon, D. S., Langille, M., Lindsay, B., McGovern, D., McHardy, A. C., McWeeney, S., Mueller, N. T., Nezi, L., Olm, M., Palm, N., Pasolli, E., Raes, J., Redinbo, M. R., Ruhlemann, M., Balfour Sartor, R., Schloss, P. D., Schriml, L., Segal, E., Shardell, M., Sharpton, T., Smirnova, E., Sokol, H., Sonnenburg, J. L., Srinivasan, S., Thingholm, L. B., Turnbaugh, P. J., Upadhyay, V., Walls, R. L., Wilmes, P., Yamada, T., Zeller, G., Zhang, M., Zhao, N., Zhao, L., Bao, W., Culhane, A., Devanarayan, V., Dopazo, J., Fan, X., Fischer, M., Jones, W., Kusko, R., Mason, C. E., Mercer, T. R., Sansone, S., Scherer, A., Shi, L., Thakkar, S., Tong, W., Wolfinger, R., Hunter, C., Segata, N., Huttenhower, C., Dowd, J. B., Jones, H. E., Waldron, L., Furlanello, C., Sansone, S. 2021

    Abstract

    The particularly interdisciplinary nature of human microbiome research makes the organization and reporting of results spanning epidemiology, biology, bioinformatics, translational medicine and statistics a challenge. Commonly used reporting guidelines for observational or genetic epidemiology studies lack key features specific to microbiome studies. Therefore, a multidisciplinary group of microbiome epidemiology researchers adapted guidelines for observational and genetic studies to culture-independent human microbiome studies, and also developed new reporting elements for laboratory, bioinformatics and statistical analyses tailored to microbiome studies. The resulting tool, called 'Strengthening The Organization and Reporting of Microbiome Studies' (STORMS), is composed of a 17-item checklist organized into six sections that correspond to the typical sections of a scientific publication, presented as an editable table for inclusion in supplementary materials. The STORMS checklist provides guidance for concise and complete reporting of microbiome studies that will facilitate manuscript preparation, peer review, and reader comprehension of publications and comparative analysis of published results.

    View details for DOI 10.1038/s41591-021-01552-x

    View details for PubMedID 34789871

  • Author Correction: Community-wide hackathons to identify central themes in single-cell multi-omics. Genome biology Cao, K. L., Abadi, A. J., Davis-Marcisak, E. F., Hsu, L., Arora, A., Coullomb, A., Deshpande, A., Feng, Y., Jeganathan, P., Loth, M., Meng, C., Mu, W., Pancaldi, V., Sankaran, K., Righelli, D., Singh, A., Sodicoff, J. S., Stein-O'Brien, G. L., Subramanian, A., Welch, J. D., You, Y., Argelaguet, R., Carey, V. J., Dries, R., Greene, C. S., Holmes, S., Love, M. I., Ritchie, M. E., Yuan, G., Culhane, A. C., Fertig, E. 2021; 22 (1): 246

    View details for DOI 10.1186/s13059-021-02468-y

    View details for PubMedID 34433496

  • Community-wide hackathons to identify central themes in single-cell multi-omics. Genome biology Le Cao, K., Abadi, A. J., Davis-Marcisak, E. F., Hsu, L., Arora, A., Coullomb, A., Deshpande, A., Feng, Y., Jeganathan, P., Loth, M., Meng, C., Mu, W., Pancaldi, V., Sankaran, K., Singh, A., Sodicoff, J. S., Stein-O'Brien, G. L., Subramanian, A., Welch, J. D., You, Y., Argelaguet, R., Carey, V. J., Dries, R., Greene, C. S., Holmes, S., Love, M. I., Ritchie, M. E., Yuan, G., Culhane, A. C., Fertig, E. 2021; 22 (1): 220

    View details for DOI 10.1186/s13059-021-02433-9

    View details for PubMedID 34353350

  • Multi-omic profiling reveals widespread dysregulation of innate immunity and hematopoiesis in COVID-19. The Journal of experimental medicine Wilk, A. J., Lee, M. J., Wei, B., Parks, B., Pi, R., Martinez-Colon, G. J., Ranganath, T., Zhao, N. Q., Taylor, S., Becker, W., Stanford COVID-19 Biobank, Jimenez-Morales, D., Blomkalns, A. L., O'Hara, R., Ashley, E. A., Nadeau, K. C., Yang, S., Holmes, S., Rabinovitch, M., Rogers, A. J., Greenleaf, W. J., Blish, C. A. 2021; 218 (8)

    Abstract

    Our understanding of protective versus pathological immune responses to SARS-CoV-2, the virus that causes coronavirus disease 2019 (COVID-19), is limited by inadequate profiling of patients at the extremes of the disease severity spectrum. Here, we performed multi-omic single-cell immune profiling of 64 COVID-19 patients across the full range of disease severity, from outpatients with mild disease to fatal cases. Our transcriptomic, epigenomic, and proteomic analyses revealed widespread dysfunction of peripheral innate immunity in severe and fatal COVID-19, including prominent hyperactivation signatures in neutrophils and NK cells. We also identified chromatin accessibility changes at NF-kappaB binding sites within cytokine gene loci as a potential mechanism for the striking lack of pro-inflammatory cytokine production observed in monocytes in severe and fatal COVID-19. We further demonstrated that emergency myelopoiesis is a prominent feature of fatal COVID-19. Collectively, our results reveal disease severity-associated immune phenotypes in COVID-19 and identify pathogenesis-associated pathways that are potential targets for therapeutic intervention.

    View details for DOI 10.1084/jem.20210582

    View details for PubMedID 34128959

  • Modeling the heterogeneity in COVID-19's reproductive number and its impact on predictive scenarios JOURNAL OF APPLIED STATISTICS Donnat, C., Holmes, S. 2021

  • A Statistical Perspective on the Challenges in Molecular Microbial Biology. Journal of agricultural, biological, and environmental statistics Jeganathan, P., Holmes, S. P. 2021; 26 (2): 131-160

    Abstract

    High throughput sequencing (HTS)-based technology enables identifying and quantifying non-culturable microbial organisms in all environments. Microbial sequences have enhanced our understanding of the human microbiome, the soil and plant environment, and the marine environment. All molecular microbial data pose statistical challenges due to contamination sequences from reagents, batch effects, unequal sampling, and undetected taxa. Technical biases and heteroscedasticity have the strongest effects, but different strains across subjects and environments also make direct differential abundance testing unwieldy. We provide an introduction to a few statistical tools that can overcome some of these difficulties and demonstrate those tools on an example. We show how standard statistical methods, such as simple hierarchical mixture and topic models, can facilitate inferences on latent microbial communities. We also review some nonparametric Bayesian approaches that combine visualization and uncertainty quantification. The intersection of molecular microbial biology and statistics is an exciting new venue. Finally, we list some of the important open problems that would benefit from more careful statistical method development.

    View details for DOI 10.1007/s13253-021-00447-1

    View details for PubMedID 36398283

    View details for PubMedCentralID PMC9667415

  • CytoGLMM: conditional differential analysis for flow and mass cytometry experiments. BMC bioinformatics Seiler, C., Ferreira, A., Kronstad, L. M., Simpson, L. J., Le Gars, M., Vendrame, E., Blish, C. A., Holmes, S. 2021; 22 (1): 137

    Abstract

    BACKGROUND: Flow and mass cytometry are important modern immunology tools for measuring expression levels of multiple proteins on single cells. The goal is to better understand the mechanisms of responses on a single cell basis by studying differential expression of proteins. Most current data analysis tools compare expressions across many computationally discovered cell types. Our goal is to focus on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier to control statistical guarantees. RESULTS: Differential analysis of marker expressions can be difficult due to marker correlations and inter-subject heterogeneity, particularly for studies of human immunology. We address these challenges with two multiple regression strategies: a bootstrapped generalized linear model and a generalized linear mixed model. On simulated datasets, we compare the robustness towards marker correlations and heterogeneity of both strategies. For paired experiments, we find that both strategies maintain the target false discovery rate under medium correlations and that mixed models are statistically more powerful under the correct model specification. For unpaired experiments, our results indicate that much larger patient sample sizes are required to detect differences. We illustrate the CytoGLMM R package and workflow for both strategies on a pregnancy dataset. CONCLUSION: Our approach to finding differential proteins in flow and mass cytometry data reduces biases arising from marker correlations and safeguards against false discoveries induced by patient heterogeneity.

    View details for DOI 10.1186/s12859-021-04067-x

    View details for PubMedID 33752595

  • Mass Cytometry Analysis of the NK Cell Receptor-Ligand Repertoire Reveals Unique Differences between Dengue-Infected Children and Adults. ImmunoHorizons McKechnie, J. L., Beltran, D., Ferreira, A. M., Vergara, R., Saenz, L., Vergara, O., Estripeaut, D., Arauz, A. B., Simpson, L. J., Holmes, S., Lopez-Verges, S., Blish, C. A. 2020; 4 (10): 634–47

    Abstract

    Dengue virus (DENV) is a significant cause of morbidity in many regions of the world, with children at the greatest risk of developing severe dengue. NK cells, characterized by their ability to rapidly recognize and kill virally infected cells, are activated during acute DENV infection. However, their role in viral clearance versus pathogenesis has not been fully elucidated. Our goal was to profile the NK cell receptor-ligand repertoire to provide further insight into the function of NK cells during pediatric and adult DENV infection. We used mass cytometry to phenotype isolated NK cells and PBMCs from a cohort of DENV-infected children and adults. Using unsupervised clustering, we found that pediatric DENV infection leads to a decrease in total NK cell frequency with a reduction in the percentage of CD56dimCD38bright NK cells and an increase in the percentage of CD56dimperforinbright NK cells. No such changes were observed in adults. Next, we identified markers predictive of DENV infection using a differential state test. In adults, NK cell expression of activation markers, including CD69, perforin, and Fas-L, and myeloid cell expression of activating NK cell ligands, namely Fas, were predictive of infection. In contrast, increased NK cell expression of the maturation marker CD57 and myeloid cell expression of inhibitory ligands, such as HLA class I molecules, were predictive of pediatric DENV infection. These findings suggest that acute pediatric DENV infection may result in diminished NK cell activation, which could contribute to enhanced pathogenesis and disease severity.

    View details for DOI 10.4049/immunohorizons.2000074

    View details for PubMedID 33067399

  • Effect of water, sanitation, handwashing and nutrition interventions on enteropathogens in children 14 months old: a cluster-randomized controlled trial in rural Bangladesh. The Journal of infectious diseases Grembi, J. A., Lin, A., Karim, M. A., Islam, M. O., Miah, R., Arnold, B. F., McQuade, E. T., Ali, S., Rahman, M. Z., Hussain, Z., Shoab, A. K., Famida, S. L., Hossen, M. S., Mutsuddi, P., Rahman, M., Unicomb, L., Haque, R., Taniuchi, M., Liu, J., Platts-Mills, J. A., Holmes, S. P., Stewart, C. P., Benjamin-Chung, J., Colford, J. M., Houpt, E. R., Luby, S. P. 2020

    Abstract

    BACKGROUND: We evaluated the impact of low-cost water, sanitation, handwashing (WSH) and child nutrition interventions on enteropathogen carriage in the WASH Benefits cluster-randomized controlled trial in rural Bangladesh. METHODS: We analyzed 1411 routine fecal samples from children 14±2 months old in the WSH (n = 369), nutrition counseling plus lipid-based nutrient supplement (n = 353), nutrition plus WSH (n = 360), and control (n = 329) arms for 34 enteropathogens using quantitative PCR. Outcomes included the number of co-occurring pathogens; cumulative quantity of four stunting-associated pathogens; and prevalence and quantity of individual pathogens. Masked analysis was by intention-to-treat. RESULTS: 326 (99.1%) control children had one or more enteropathogens detected (mean 3.8±1.8). Children receiving WSH interventions had lower prevalence and quantity of individual viruses than controls (prevalence difference for norovirus: -11% [95% confidence interval [CI], -5 to -17%]; sapovirus: -9% [95%CI, -3 to -15%]; and adenovirus 40/41: -9% [95%CI, -2 to -15%]). There was no difference in bacteria, parasites, or cumulative quantity of stunting-associated pathogens between controls and any intervention arm. CONCLUSIONS: WSH interventions were associated with fewer enteric viruses in children aged 14 months. Different strategies are needed to reduce enteric bacteria and parasites at this critical young age.

    View details for DOI 10.1093/infdis/jiaa549

    View details for PubMedID 32861214

  • Microbiota assembly, structure, and dynamics among Tsimane horticulturalists of the Bolivian Amazon. Nature communications Sprockett, D. D., Martin, M., Costello, E. K., Burns, A. R., Holmes, S. P., Gurven, M. D., Relman, D. A. 2020; 11 (1): 3772

    Abstract

    Selective and neutral forces shape human microbiota assembly in early life. The Tsimane are an indigenous Bolivian population with infant care-associated behaviors predicted to increase mother-infant microbial dispersal. Here, we characterize microbial community assembly in 47 infant-mother pairs from six Tsimane villages, using 16S rRNA gene amplicon sequencing of longitudinal stool and tongue swab samples. We find that infant consumption of dairy products, vegetables, and chicha (a fermented drink inoculated with oral microbes) is associated with stool microbiota composition. In stool and tongue samples, microbes shared between mothers and infants are more abundant than non-shared microbes. Using a neutral model of community assembly, we find that neutral processes alone explain the prevalence of 79% of infant-colonizing microbes, but explain microbial prevalence less well in adults from river villages with more regular access to markets. Our results underscore the importance of neutral forces during microbiota assembly. Changing lifestyle factors may alter traditional modes of microbiota assembly by decreasing the role of neutral processes.

    View details for DOI 10.1038/s41467-020-17541-6

    View details for PubMedID 32728114

  • Cytokine profile in plasma of severe COVID-19 does not differ from ARDS and sepsis. JCI insight Wilson, J. G., Simpson, L. J., Ferreira, A., Rustagi, A., Roque, J. A., Asuni, A., Ranganath, T., Grant, P. M., Subramanian, A. K., Rosenberg-Hasson, Y., Maecker, H., Holmes, S., Levitt, J. E., Blish, C., Rogers, A. J. 2020

    Abstract

    BACKGROUND: Elevated levels of inflammatory cytokines have been associated with poor outcomes among COVID-19 patients. It is unknown, however, how these levels compare to those observed in critically ill patients with ARDS or sepsis due to other causes. METHODS: We used a luminex assay to determine expression of 76 cytokines from plasma of hospitalized COVID-19 patients and banked plasma samples from ARDS and sepsis patients. Our analysis focused on detecting statistical differences in levels of 6 cytokines associated with cytokine storm (IL-1b, IL-1RA, IL-6, IL-8, IL-18, and TNFalpha) between patients with moderate COVID-19, severe COVID-19, and ARDS or sepsis. RESULTS: 15 hospitalized COVID-19 patients, 9 of whom were critically ill, were compared to critically ill patients with ARDS (n = 12) or sepsis (n = 16). There were no statistically significant differences in baseline levels of IL-1b, IL-1RA, IL-6, IL-8, IL-18, and TNFalpha between patients with COVID-19 and critically ill controls with ARDS or sepsis. CONCLUSIONS: Levels of inflammatory cytokines were not higher in severe COVID-19 patients than in moderate COVID-19 or critically ill patients with ARDS or sepsis in this small cohort. Broad use of immunosuppressive therapies in ARDS has failed in numerous Phase 3 studies; use of these therapies in unselected patients with COVID-19 may be unwarranted. FUNDING: A.J.R.: Stanford ICU Biobank NHLBI K23 HL125663. C.A.B.: Burroughs Wellcome Fund Investigators in the Pathogenesis of Infectious Diseases #1016687; NIH/NIAID U19AI057229-16 (PI MM Davis); Stanford Maternal Child Health Research Institute; Chan Zuckerberg Biohub.

    View details for DOI 10.1172/jci.insight.140289

    View details for PubMedID 32706339

  • Author Correction: Gut microbiota plasticity is correlated with sustained weight loss on a low-carb or low-fat dietary intervention. Scientific reports Grembi, J. A., Nguyen, L. H., Haggerty, T. D., Gardner, C. D., Holmes, S. P., Parsonnet, J. 2020; 10 (1): 11095

    Abstract

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

    View details for DOI 10.1038/s41598-020-68280-z

    View details for PubMedID 32606436

  • Variability in the analysis of a single neuroimaging dataset by many teams. Nature Botvinik-Nezer, R., Holzmeister, F., Camerer, C. F., Dreber, A., Huber, J., Johannesson, M., Kirchler, M., Iwanir, R., Mumford, J. A., Adcock, R. A., Avesani, P., Baczkowski, B. M., Bajracharya, A., Bakst, L., Ball, S., Barilari, M., Bault, N., Beaton, D., Beitner, J., Benoit, R. G., Berkers, R. M., Bhanji, J. P., Biswal, B. B., Bobadilla-Suarez, S., Bortolini, T., Bottenhorn, K. L., Bowring, A., Braem, S., Brooks, H. R., Brudner, E. G., Calderon, C. B., Camilleri, J. A., Castrellon, J. J., Cecchetti, L., Cieslik, E. C., Cole, Z. J., Collignon, O., Cox, R. W., Cunningham, W. A., Czoschke, S., Dadi, K., Davis, C. P., Luca, A. D., Delgado, M. R., Demetriou, L., Dennison, J. B., Di, X., Dickie, E. W., Dobryakova, E., Donnat, C. L., Dukart, J., Duncan, N. W., Durnez, J., Eed, A., Eickhoff, S. B., Erhart, A., Fontanesi, L., Fricke, G. M., Fu, S., Galván, A., Gau, R., Genon, S., Glatard, T., Glerean, E., Goeman, J. J., Golowin, S. A., González-García, C., Gorgolewski, K. J., Grady, C. L., Green, M. A., Guassi Moreira, J. F., Guest, O., Hakimi, S., Hamilton, J. P., Hancock, R., Handjaras, G., Harry, B. B., Hawco, C., Herholz, P., Herman, G., Heunis, S., Hoffstaedter, F., Hogeveen, J., Holmes, S., Hu, C. P., Huettel, S. A., Hughes, M. E., Iacovella, V., Iordan, A. D., Isager, P. M., Isik, A. I., Jahn, A., Johnson, M. R., Johnstone, T., Joseph, M. J., Juliano, A. C., Kable, J. W., Kassinopoulos, M., Koba, C., Kong, X. Z., Koscik, T. R., Kucukboyaci, N. E., Kuhl, B. A., Kupek, S., Laird, A. R., Lamm, C., Langner, R., Lauharatanahirun, N., Lee, H., Lee, S., Leemans, A., Leo, A., Lesage, E., Li, F., Li, M. Y., Lim, P. C., Lintz, E. N., Liphardt, S. W., Losecaat Vermeer, A. B., Love, B. C., Mack, M. L., Malpica, N., Marins, T., Maumet, C., McDonald, K., McGuire, J. T., Melero, H., Méndez Leal, A. S., Meyer, B., Meyer, K. N., Mihai, G., Mitsis, G. D., Moll, J., Nielson, D. M., Nilsonne, G., Notter, M. P., Olivetti, E., Onicas, A. I., Papale, P., Patil, K. R., Peelle, J. E., Pérez, A., Pischedda, D., Poline, J. B., Prystauka, Y., Ray, S., Reuter-Lorenz, P. A., Reynolds, R. C., Ricciardi, E., Rieck, J. R., Rodriguez-Thompson, A. M., Romyn, A., Salo, T., Samanez-Larkin, G. R., Sanz-Morales, E., Schlichting, M. L., Schultz, D. H., Shen, Q., Sheridan, M. A., Silvers, J. A., Skagerlund, K., Smith, A., Smith, D. V., Sokol-Hessner, P., Steinkamp, S. R., Tashjian, S. M., Thirion, B., Thorp, J. N., Tinghög, G., Tisdall, L., Tompson, S. H., Toro-Serey, C., Torre Tresols, J. J., Tozzi, L., Truong, V., Turella, L., van 't Veer, A. E., Verguts, T., Vettel, J. M., Vijayarajah, S., Vo, K., Wall, M. B., Weeda, W. D., Weis, S., White, D. J., Wisniewski, D., Xifra-Porxas, A., Yearling, E. A., Yoon, S., Yuan, R., Yuen, K. S., Zhang, L., Zhang, X., Zosky, J. E., Nichols, T. E., Poldrack, R. A., Schonberg, T. 2020; 582 (7810): 84-88

    Abstract

    Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.

    View details for DOI 10.1038/s41586-020-2314-9

    View details for PubMedID 32483374

  • Microbial biogeography and ecology of the mouth and implications for periodontal diseases. Periodontology 2000 Proctor, D. M., Shelef, K. M., Gonzalez, A., Davis, C. L., Dethlefsen, L., Burns, A. R., Loomer, P. M., Armitage, G. C., Ryder, M. I., Millman, M. E., Knight, R., Holmes, S. P., Relman, D. A. 2020; 82 (1): 26–41

    Abstract

    In humans, the composition of microbial communities differs among body sites and between habitats within a single site. Patterns of variation in the distribution of organisms across time and space are referred to as "biogeography." The human oral cavity is a critical observatory for exploring microbial biogeography because it is spatially structured, easily accessible, and its microbiota has been linked to the promotion of both health and disease. The biogeographic features of microbial communities residing in spatially distinct, but ecologically similar, environments on the human body, including the subgingival crevice, have not yet been adequately explored. The purpose of this paper is twofold. First, we seek to provide the dental community with a primer on biogeographic theory, highlighting its relevance to the study of the human oral cavity. We summarize what is known about the biogeographic variation of dental caries and periodontitis and postulate that disease occurrence reflects spatial patterning in the composition and structure of oral microbial communities. Second, we present a number of methods that investigators can use to test specific hypotheses using biogeographic theory. To anchor our discussion, we apply each method to a case study and examine the spatial variation of the human subgingival microbiota in 2 individuals. Our case study suggests that the composition of subgingival communities may conform to an anterior-to-posterior gradient within the oral cavity. The gradient appears to be structured by both deterministic and nondeterministic processes, although additional work is needed to confirm these findings. A better understanding of biogeographic patterns and processes will lead to improved efficacy of dental interventions targeting the oral microbiota.

    View details for DOI 10.1111/prd.12268

    View details for PubMedID 31850642

  • Expanded Spectrum of Antiretroviral-Selected Mutations in Human Immunodeficiency Virus Type 2. The Journal of infectious diseases Tzou, P. L., Descamps, D., Rhee, S., Raugi, D. N., Charpentier, C., Taveira, N., Smith, R. A., Soriano, V., de Mendoza, C., Holmes, S. P., Gottlieb, G. S., Shafer, R. W. 2020

    Abstract

    INTRODUCTION: HIV-1 and HIV-2 differ in their antiretroviral (ARV) susceptibilities and drug resistance mutations (DRMs). METHODS: We analyzed published HIV-2 pol sequences to identify HIV-2 treatment-selected mutations (TSMs). Mutation prevalences were determined by HIV-2 group and ARV status. Nonpolymorphic mutations were those in <1% of ARV-naive persons. TSMs were those associated with ARV therapy after multiple comparisons adjustment. RESULTS: We analyzed protease (PR) sequences from 483 PR inhibitor (PI)-naive and 232 PI-treated persons; RT sequences from 333 nucleoside RT inhibitor (NRTI)-naive and 252 NRTI-treated persons; and integrase (IN) sequences from 236 IN inhibitor (INSTI)-naive and 60 INSTI-treated persons. In PR, 12 nonpolymorphic TSMs occurred in ≥11 persons: V33I, K45R, V47A, I50V, I54M, T56V, V62A, A73G, I82F, I84V, F85L, L90M. In RT, 9 nonpolymorphic TSMs occurred in ≥10 persons: K40R, A62V, K70R, Y115F, Q151M, M184VI, S215Y. In IN, 11 nonpolymorphic TSMs occurred in ≥4 persons: Q91R, E92AQ, T97A, G140S, Y143G, Q148R, A153G, N155H, H156R, R231 5-amino acid insertions. Nine of 32 nonpolymorphic TSMs were previously unreported. CONCLUSIONS: This meta-analysis confirmed the ARV association of previously reported HIV-2 DRMs and identified novel TSMs. Genotypic and phenotypic studies of HIV-2 TSMs will improve approaches to predicting HIV-2 ARV susceptibility and treating HIV-2-infected persons.

    View details for DOI 10.1093/infdis/jiaa026

    View details for PubMedID 31965175

  • Gut microbiota plasticity is correlated with sustained weight loss on a low-carb or low-fat dietary intervention Scientific Reports Grembi, J. A., Nguyen, L. H., Haggerty, T. D., Gardner, C. D., Holmes, S. P., Parsonnet, J. 2020; 10
  • Estimation of Orientation and Camera Parameters from Cryo-Electron Microscopy Images with Variational Autoencoders and Generative Adversarial Networks Miolane, N., Poitevin, F., Li, Y., Holmes, S., IEEE Computer Society. 2020: 4174-4183
  • Characterization of the Impact of Daclizumab Beta on Circulating Natural Killer Cells by Mass Cytometry. Frontiers in immunology Ranganath, T., Simpson, L. J., Ferreira, A. M., Seiler, C., Vendrame, E., Zhao, N., Fontenot, J. D., Holmes, S., Blish, C. A. 2020; 11: 714

    Abstract

    Daclizumab beta is a humanized monoclonal antibody that binds to CD25 and selectively inhibits high-affinity IL-2 receptor signaling. As a former treatment for relapsing forms of multiple sclerosis (RMS), daclizumab beta induces robust expansion of the CD56bright subpopulation of NK cells that is correlated with the drug's therapeutic effects. As NK cells represent a heterogeneous population of lymphocytes with a range of phenotypes and functions, the goal of this study was to better understand how daclizumab beta altered the NK cell repertoire to provide further insight into the possible mechanism(s) of action in RMS. We used mass cytometry to evaluate expression patterns of NK cell markers and provide a comprehensive assessment of the NK cell repertoire in individuals with RMS treated with daclizumab beta or placebo over the course of 1 year. Treatment with daclizumab beta significantly altered the NK cell repertoire compared to placebo treatment. As previously reported, daclizumab beta significantly increased expression of CD56 on total NK cells. Within the CD56bright NK cells, treatment was associated with multiple phenotypic changes, including increased expression of NKG2A and NKp44, and diminished expression of CD244, CD57, and NKp46. These alterations occurred broadly across the CD56bright population, and were not associated with a specific subset of CD56bright NK cells. While the changes were less dramatic, CD56dim NK cells responded distinctly to daclizumab beta treatment, with higher expression of CD2 and NKG2A, and lower expression of FAS-L, HLA-DR, NTB-A, NKp30, and Perforin. Together, these data indicate that the expanded CD56bright NK cells share features of both immature and mature NK cells. These findings show that daclizumab beta treatment is associated with unique changes in NK cells that may enhance their ability to kill autoreactive T cells or to exert immunomodulatory functions.

    View details for DOI 10.3389/fimmu.2020.00714

    View details for PubMedID 32391016

    View details for PubMedCentralID PMC7194113

  • Influenza-Induced Interferon Lambda Response Is Associated With Longer Time to Delivery Among Pregnant Kenyan Women. Frontiers in immunology Seiler, C., Bayless, N. L., Vergara, R., Pintye, J., Kinuthia, J., Osborn, L., Matemo, D., Richardson, B. A., John-Stewart, G., Holmes, S., Blish, C. A. 2020; 11: 452

    Abstract

    Specific causes of preterm birth remain unclear. Several recent studies have suggested that immune changes during pregnancy are associated with the timing of delivery, yet few studies have been performed in low-income country settings where the rates of preterm birth are the highest. We conducted a retrospective nested case-control evaluation within a longitudinal study among HIV-uninfected pregnant Kenyan women. To characterize immune function in these women, we evaluated unstimulated and stimulated peripheral blood mononuclear cells in vitro with the A/California/2009 strain of influenza to understand the influenza-induced immune response. We then evaluated transcript expression profiles using the Affymetrix Human GeneChip Transcriptome Array 2.0. Transcriptional profiles of sufficient quality for analysis were obtained from 54 women; 19 of these women delivered <34 weeks and were defined as preterm cases and 35 controls delivered >37 weeks. The median time to birth from sample collection was 13 weeks. No transcripts were significantly associated with preterm birth in a case-control study of matched term and preterm birth (n = 42 women). In the influenza-stimulated samples, expression of IFNL1 was associated with longer time to delivery, defined as the amount of time between sample collection and delivery (n = 54 women). A qPCR analysis confirmed that influenza-induced IFNL expression was associated with longer time to delivery. These data indicate that during pregnancy, ex vivo influenza stimulation results in altered transcriptional response and is associated with time to delivery in a cohort of women residing in an area with high preterm birth prevalence.

    View details for DOI 10.3389/fimmu.2020.00452

    View details for PubMedID 32256497

    View details for PubMedCentralID PMC7089959

  • Treated HIV Infection Alters Phenotype but Not HIV-Specific Function of Peripheral Blood Natural Killer Cells. Frontiers in immunology Zhao, N. Q., Ferreira, A., Grant, P. M., Holmes, S., Blish, C. A. 2020; 11: 829

    Abstract

    Natural killer (NK) cells are the predominant antiviral cells of the innate immune system, and may play an important role in acquisition and disease progression of HIV. While untreated HIV infection is associated with distinct alterations in the peripheral blood NK cell repertoire, less is known about how NK phenotype is altered in the setting of long-term viral suppression with antiretroviral therapy (ART), as well as how NK memory can impact functional responses. As such, we sought to identify changes in NK cell phenotype and function using high-dimensional mass cytometry to simultaneously analyze both surface and functional marker expression of peripheral blood NK cells in a cohort of ART-suppressed, HIV+ patients and HIV- healthy controls. We found that the NK cell repertoire following IL-2 treatment was altered in individuals with treated HIV infection compared to healthy controls, with increased expression of markers including NKG2C and CD2, and decreased expression of CD244 and NKp30. Using co-culture assays with autologous, in vitro HIV-infected CD4 T cells, we identified a subset of NK cells with enhanced responsiveness to HIV-1-infected cells, but no differences in the magnitude of anti-HIV NK cell responses between the HIV+ and HIV- groups. In addition, by profiling of NK cell receptors on responding cells, we found similar phenotypes of HIV-responsive NK cell subsets in both groups. Lastly, we identified clusters of NK cells that are altered in individuals with treated HIV infection compared to healthy controls, but found that these clusters are distinct from those that respond to HIV in vitro. As such, we conclude that while chronic, treated HIV infection induces a reshaping of the IL-2-stimulated peripheral blood NK cell repertoire, it does so in a way that does not make the repertoire more HIV-specific.

    View details for DOI 10.3389/fimmu.2020.00829

    View details for PubMedID 32477342

  • Analysis of unusual and signature APOBEC-mutations in HIV-1 pol next-generation sequences. PloS one Tzou, P. L., Kosakovsky Pond, S. L., Avila-Rios, S., Holmes, S. P., Kantor, R., Shafer, R. W. 2020; 15 (2): e0225352

    Abstract

    At low mutation-detection thresholds, next generation sequencing (NGS) for HIV-1 genotypic resistance testing is susceptible to artifactual detection of mutations arising from PCR error and APOBEC-mediated G-to-A hypermutation. We analyzed published HIV-1 pol Illumina NGS data to characterize the distribution of mutations at eight NGS mutation detection thresholds: 20%, 10%, 5%, 2%, 1%, 0.5%, 0.2%, and 0.1%. At each threshold, we determined proportions of amino acid mutations that were unusual (defined as having a prevalence <0.01% in HIV-1 group M sequences) or signature APOBEC mutations. Eight studies, containing 855 samples, in the NCBI Sequence Read Archive were analyzed. As detection thresholds were lowered, there was a progressive increase in the proportion of positions with usual and unusual mutations and in the proportion of all mutations that were unusual. The median proportion of positions with an unusual mutation increased gradually from 0% at the 20% threshold to 0.3% at the 1% threshold and then exponentially to 1.3% (0.5% threshold), 6.9% (0.2% threshold), and 23.2% (0.1% threshold). In two of three studies with available plasma HIV-1 RNA levels, the proportion of positions with unusual mutations was negatively associated with virus levels. Although the complete set of signature APOBEC mutations was much smaller than that of unusual mutations, the former outnumbered the latter in one-sixth of samples at the 0.5%, 1%, and 2% thresholds. The marked increase in the proportion of positions with unusual mutations at thresholds below 1% and in samples with lower virus loads suggests that, at low thresholds, many unusual mutations are artifactual, reflecting PCR error or G-to-A hypermutation. Profiling the numbers of unusual and signature APOBEC pol mutations at different NGS mutation detection thresholds may be useful to avoid selecting a threshold that is too low and poses an unacceptable risk of identifying artifactual mutations.

    View details for DOI 10.1371/journal.pone.0225352

    View details for PubMedID 32102090

  • TIGIT is upregulated by HIV-1 infection and marks a highly functional adaptive and mature subset of natural killer cells. AIDS (London, England) Vendrame, E., Seiler, C., Ranganath, T., Zhao, N. Q., Vergara, R., Alary, M., Labbé, A. C., Guédou, F., Poudrier, J., Holmes, S., Roger, M., Blish, C. A. 2020

    Abstract

    Our objective was to investigate the mechanisms that govern natural killer (NK) cell responses to HIV, with a focus on specific receptor-ligand interactions involved in HIV recognition by NK cells. We first performed a mass cytometry-based screen of NK cell receptor expression patterns in healthy controls and HIV individuals. We then focused mechanistic studies on the expression and function of T cell immunoreceptor with Ig and ITIM domains (TIGIT). The mass cytometry screen revealed that TIGIT is upregulated on NK cells of untreated HIV women, but not in antiretroviral-treated women. TIGIT is an inhibitory receptor that is thought to mark exhausted NK cells; however, blocking TIGIT did not improve anti-HIV NK cell responses. In fact, the TIGIT ligands CD112 and CD155 were not upregulated on CD4 T cells in vitro or in vivo, providing an explanation for the lack of benefit from TIGIT blockade. TIGIT expression marked a unique subset of NK cells that express significantly higher levels of NK cell activating receptors (DNAM-1, NTB-A, 2B4, CD2) and exhibit a mature/adaptive phenotype (CD57, NKG2C, LILRB1, FcRγ, Syk). Furthermore, TIGIT NK cells had increased responses to mock-infected and HIV-infected autologous CD4 T cells, and to PMA/ionomycin, cytokine stimulation and the K562 cancer cell line. TIGIT expression is increased on NK cells from untreated HIV individuals. Although TIGIT does not participate directly in the response to HIV-infected cells, it marks a population of mature/adaptive NK cells with increased functional responses.

    View details for DOI 10.1097/QAD.0000000000002488

    View details for PubMedID 32028328

  • Natural killer cell phenotype is altered in HIV-exposed seronegative women. PloS one Zhao, N. Q., Vendrame, E., Ferreira, A., Seiler, C., Ranganath, T., Alary, M., Labbe, A., Guedou, F., Poudrier, J., Holmes, S., Roger, M., Blish, C. A. 2020; 15 (9): e0238347

    Abstract

    Highly exposed seronegative (HESN) individuals present a unique setting to study mechanisms of protection against HIV acquisition. As natural killer (NK) cell activation and function have been implicated as a correlate of protection in HESN individuals, we sought to better understand the features of NK cells that may confer protection. We used mass cytometry to phenotypically profile NK cells from a cohort of Beninese sex workers and healthy controls. We found that NK cells from HESN women had increased expression of NKG2A, NKp30 and LILRB1, as well as the Fc receptor CD16, and decreased expression of DNAM-1, CD94, Siglec-7, and NKp44. Using functional assessments of NK cells from healthy donors against autologous HIV-infected CD4+ T cells, we observed that NKp30+ and Siglec-7+ cells had improved functional activity. Further, we found that NK cells from HESN women trended towards increased antibody-dependent cellular cytotoxicity (ADCC) activity; this activity correlated with increased CD16 expression. Overall, we identify features of NK cells in HESN women that may contribute to protection from HIV infection. Follow up studies with larger cohorts are warranted to confirm these findings.

    View details for DOI 10.1371/journal.pone.0238347

    View details for PubMedID 32870938

  • Chromosome-level de novo assembly of the pig-tailed macaque genome using linked-read sequencing and HiC proximity scaffolding. GigaScience Roodgar, M., Babveyh, A., Nguyen, L. H., Zhou, W., Sinha, R., Lee, H., Hanks, J. B., Avula, M., Jiang, L., Jian, R., Lee, H., Song, G., Chaib, H., Weissman, I. L., Batzoglou, S., Holmes, S., Smith, D. G., Mankowski, J. L., Prost, S., Snyder, M. P. 2020; 9 (7)

    Abstract

    Macaque species share >93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. To close this gap and enhance functional genomics approaches, we used a combination of de novo linked-read assembly and scaffolding using proximity ligation assay (HiC) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large scaffolds at chromosome level with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells derived from the same animal. Reconstruction of the evolutionary tree using whole-genome annotation and orthologous comparisons among 3 macaque species, human, and mouse genomes revealed extensive homology between human and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques exhibit a closer evolutionary distance to each other than either species exhibits to humans or pig-tailed macaques. These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.

    View details for DOI 10.1093/gigascience/giaa069

    View details for PubMedID 32649757

  • Successful strategies for human microbiome data generation, storage and analyses. Journal of biosciences Holmes, S. 2019; 44 (5)

    Abstract

    Current interest in the potential for clinical use of new tools for improving human health is now focused on techniques for the study of the human microbiome and its interaction with environmental and clinical covariates. This review outlines the use of statistical strategies that have been developed in past studies and can inform successful design and analyses of controlled perturbation experiments performed in the human microbiome. We carefully outline what the data are, their imperfections and how we need to transform, decontaminate and denoise them. We show how to identify the important unknown parameters and how we can leverage the variability we see to produce efficient models for prediction and uncertainty quantification. We encourage a reproducible strategy that builds on best practice principles that can be adapted for effective experimental design and reproducible workflows. Nonparametric, data-driven denoising strategies already provide the best strain identification and decontamination methods. Data-driven models can be combined with uncertainty quantification to provide reproducible aids to decision making in the clinical context, as long as careful, separate, registered confirmatory testing is undertaken. Here we provide guidelines for effective longitudinal studies and their analyses. Lessons learned along the way are that visualizations at every step can pinpoint problems and outliers, and that normalization and filtering improve power in downstream testing. We recommend collecting and binding the metadata and covariates to sample descriptors and recording complete computer scripts into an R markdown supplement that can reduce opportunities for human error and enable collaborators and readers to replicate all the steps of the study. Finally, we note that optimizing the bioinformatic and statistical workflow involves adopting a wait-and-see approach that is particularly effective in cases where the features such as 'mass spectrometry peaks' and metagenomic tables can only be partially annotated.

    View details for PubMedID 31719220

  • Specific gut microbiome members are associated with distinct immune markers in pediatric allogeneic hematopoietic stem cell transplantation. Microbiome Ingham, A. C., Kielsen, K., Cilieborg, M. S., Lund, O., Holmes, S., Aarestrup, F. M., Muller, K. G., Pamp, S. J. 2019; 7 (1): 131

    Abstract

    BACKGROUND: Increasing evidence reveals the importance of the microbiome in health and disease and inseparable host-microbial dependencies. Host-microbe interactions are highly relevant in patients receiving allogeneic hematopoietic stem cell transplantation (HSCT), i.e., a replacement of the cellular components of the patients' immune system with that of a foreign donor. HSCT is employed as curative immunotherapy for a number of non-malignant and malignant hematologic conditions, including cancers such as acute lymphoblastic leukemia. The procedure can be accompanied by severe side effects such as infections, acute graft-versus-host disease (aGvHD), and death. Here, we performed a longitudinal analysis of immunological markers, immune reconstitution and gut microbiota composition in relation to clinical outcomes in children undergoing HSCT. Such an analysis could reveal biomarkers, e.g., at the time point prior to HSCT, that in the future could be used to predict which patients are at high risk in relation to side effects and clinical outcomes and guide treatment strategies accordingly. RESULTS: In two multivariate analyses (sparse partial least squares regression and canonical correspondence analysis), we identified three consistent clusters: (1) high concentrations of the antimicrobial peptide human beta-defensin 2 (hBD2) prior to the transplantation in patients with high abundances of Lactobacillaceae, who later developed moderate or severe aGvHD and exhibited high mortality. (2) Rapid reconstitution of NK and B cells in patients with high abundances of obligate anaerobes such as Ruminococcaceae, who developed no or mild aGvHD and exhibited low mortality. (3) High inflammation, indicated by high levels of C-reactive protein, in patients with high abundances of facultative anaerobic bacteria such as Enterobacteriaceae. Furthermore, we observed that antibiotic treatment influenced the bacterial community state. CONCLUSIONS: We identify multivariate associations between specific microbial taxa, host immune markers, immune cell reconstitution, and clinical outcomes in relation to HSCT. Our findings encourage further investigations into establishing longitudinal surveillance of the intestinal microbiome and relevant immune markers, such as hBD2, in HSCT patients. Profiling of the microbiome may prove useful as a prognostic tool that could help identify patients at risk of poor immune reconstitution and adverse outcomes, such as aGvHD and death, upon HSCT, providing actionable information in guiding precision medicine.

    View details for DOI 10.1186/s40168-019-0745-z

    View details for PubMedID 31519210

  • Multitable Methods for Microbiome Data Integration. Frontiers in genetics Sankaran, K., Holmes, S. P. 2019; 10: 627

    Abstract

    The simultaneous study of multiple measurement types is a frequently encountered problem in practical data analysis. It is especially common in microbiome research, where several sources of data (for example, 16S rRNA, metagenomic, metabolomic, or transcriptomic data) can be collected on the same physical samples. There has been a proliferation of proposals for analyzing such multitable microbiome data, as is often the case when new data sources become more readily available, facilitating inquiry into new types of scientific questions. However, stepping back from the rush for new methods for multitable analysis in the microbiome literature, it is worthwhile to recognize the broader landscape of multitable methods, as they have been relevant in problem domains ranging across economics, robotics, genomics, chemometrics, and neuroscience. In different contexts, these techniques are called data integration, multi-omic, and multitask methods, for example. Of course, there is no unique optimal algorithm to use across domains: different instances of the multitable problem possess specific structure or variation that is worth incorporating in methodology. Our purpose here is not to develop new algorithms, but rather to 1) distill relevant themes across different analysis approaches and 2) provide concrete workflows for approaching analysis, as a function of ultimate analysis goals and data characteristics (heterogeneity, dimensionality, sparsity). Towards the second goal, we have made code for all analysis and figures available online at https://github.com/krisrs1128/multitable_review.

    View details for DOI 10.3389/fgene.2019.00627

    View details for PubMedID 31555316

    View details for PubMedCentralID PMC6724662

  • Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature biotechnology Bolyen, E., Rideout, J. R., Dillon, M. R., Bokulich, N. A., Abnet, C. C., Al-Ghalith, G. A., Alexander, H., Alm, E. J., Arumugam, M., Asnicar, F., Bai, Y., Bisanz, J. E., Bittinger, K., Brejnrod, A., Brislawn, C. J., Brown, C. T., Callahan, B. J., Caraballo-Rodriguez, A. M., Chase, J., Cope, E. K., Da Silva, R., Diener, C., Dorrestein, P. C., Douglas, G. M., Durall, D. M., Duvallet, C., Edwardson, C. F., Ernst, M., Estaki, M., Fouquier, J., Gauglitz, J. M., Gibbons, S. M., Gibson, D. L., Gonzalez, A., Gorlick, K., Guo, J., Hillmann, B., Holmes, S., Holste, H., Huttenhower, C., Huttley, G. A., Janssen, S., Jarmusch, A. K., Jiang, L., Kaehler, B. D., Kang, K. B., Keefe, C. R., Keim, P., Kelley, S. T., Knights, D., Koester, I., Kosciolek, T., Kreps, J., Langille, M. G., Lee, J., Ley, R., Liu, Y., Loftfield, E., Lozupone, C., Maher, M., Marotz, C., Martin, B. D., McDonald, D., McIver, L. J., Melnik, A. V., Metcalf, J. L., Morgan, S. C., Morton, J. T., Naimey, A. T., Navas-Molina, J. A., Nothias, L. F., Orchanian, S. B., Pearson, T., Peoples, S. L., Petras, D., Preuss, M. L., Pruesse, E., Rasmussen, L. B., Rivers, A., Robeson, M. S., Rosenthal, P., Segata, N., Shaffer, M., Shiffer, A., Sinha, R., Song, S. J., Spear, J. R., Swafford, A. D., Thompson, L. R., Torres, P. J., Trinh, P., Tripathi, A., Turnbaugh, P. J., Ul-Hasan, S., van der Hooft, J. J., Vargas, F., Vazquez-Baeza, Y., Vogtmann, E., von Hippel, M., Walters, W., Wan, Y., Wang, M., Warren, J., Weber, K. C., Williamson, C. H., Willis, A. D., Xu, Z. Z., Zaneveld, J. R., Zhang, Y., Zhu, Q., Knight, R., Caporaso, J. G. 2019

    View details for DOI 10.1038/s41587-019-0209-9

    View details for PubMedID 31341288

  • Nuclear degradation dynamics in a nonapoptotic programmed cell death. Cell death and differentiation Yalonetskaya, A., Mondragon, A. A., Hintze, Z. J., Holmes, S., McCall, K. 2019

    Abstract

    Nuclear degradation is a major event during programmed cell death (PCD). The breakdown of nuclear components has been well characterized during apoptosis, one form of PCD. Many nonapoptotic forms of PCD have been identified, but our understanding of nuclear degradation during those events is limited. Here, we take advantage of Drosophila oogenesis to investigate nuclear degeneration during stress-induced apoptotic and developmental nonapoptotic cell death in the same cell type in vivo. We find that nuclear Lamin, a caspase substrate, dissociates from the nucleus as an early event during apoptosis, but remains associated with nuclei during nonapoptotic cell death. Lamin reveals a series of changes in nuclear architecture during nonapoptotic death, including nuclear crenellations and involutions. Stretch follicle cells contribute to these architecture changes, and phagocytic and lysosome-associated machinery in stretch follicle cells promote Lamin degradation. More specifically, we find that the lysosomal cathepsin CP1 facilitates Lamin degradation.

    View details for DOI 10.1038/s41418-019-0382-x

    View details for PubMedID 31285547

  • Treatment-Specific Composition of the Gut Microbiota Is Associated With Disease Remission in a Pediatric Crohn's Disease Cohort. Inflammatory bowel diseases Sprockett, D., Fischer, N., Boneh, R. S., Turner, D., Kierkus, J., Sladek, M., Escher, J. C., Wine, E., Yerushalmi, B., Dias, J. A., Shaoul, R., Kori, M., Snapper, S. B., Holmes, S., Bousvaros, A., Levine, A., Relman, D. A. 2019

    Abstract

    BACKGROUND: The beneficial effects of antibiotics in Crohn's disease (CD) depend in part on the gut microbiota but are inadequately understood. We investigated the impact of metronidazole (MET) and metronidazole plus azithromycin (MET+AZ) on the microbiota in pediatric CD and the use of microbiota features as classifiers or predictors of disease remission. METHODS: 16S rRNA-based microbiota profiling was performed on stool samples from 67 patients in a multinational, randomized, controlled, longitudinal, 12-week trial of MET vs MET+AZ in children with mild to moderate CD. Profiles were analyzed together with disease activity, and then used to construct random forest models to classify remission or predict treatment response. RESULTS: Both MET and MET+AZ significantly decreased diversity of the microbiota and caused large treatment-specific shifts in microbiota structure at week 4. Disease remission was associated with a treatment-specific microbiota configuration. Random forest models constructed from microbiota profiles before and during antibiotic treatment with metronidazole accurately classified disease remission in this treatment group (area under the curve [AUC], 0.879; 95% confidence interval, 0.683-0.9877; sensitivity, 0.7778; specificity, 1.000; P < 0.001). A random forest model trained on pre-antibiotic microbiota profiles predicted disease remission at week 4 with modest accuracy (AUC, 0.8; P = 0.24). CONCLUSIONS: MET and MET+AZ antibiotic regimens in pediatric CD lead to distinct gut microbiota structures at remission. It may be possible to classify and predict remission based in part on microbiota profiles, but larger cohorts will be needed to realize this goal.

    View details for DOI 10.1093/ibd/izz130

    View details for PubMedID 31276165

  • Convex Hierarchical Clustering for Graph-Structured Data Donnat, C., Holmes, S., Matthews, M. B. IEEE. 2019: 1999–2006
  • Pregnancy-Induced Alterations in NK Cell Phenotype and Function. Frontiers in immunology Le Gars, M., Seiler, C., Kay, A. W., Bayless, N. L., Starosvetsky, E., Moore, L., Shen-Orr, S. S., Aziz, N., Khatri, P., Dekker, C. L., Swan, G. E., Davis, M. M., Holmes, S., Blish, C. A. 2019; 10: 2469

    Abstract

    Pregnant women are particularly susceptible to complications of influenza A virus infection, which may result from pregnancy-induced changes in the function of immune cells, including natural killer (NK) cells. To better understand NK cell function during pregnancy, we assessed the ability of the two main subsets of NK cells, CD56dim and CD56bright NK cells, to respond to influenza virus-infected cells and tumor cells. During pregnancy, CD56dim and CD56bright NK cells displayed enhanced functional responses to both infected and tumor cells, with increased expression of degranulation markers and elevated frequency of NK cells producing IFN-gamma. To better understand the mechanisms driving this enhanced function, we profiled CD56dim and CD56bright NK cells from pregnant and non-pregnant women using mass cytometry. NK cells from pregnant women displayed significantly increased expression of several functional and activation markers such as CD38 on both subsets and NKp46 on CD56dim NK cells. NK cells also displayed diminished expression of the chemokine receptor CXCR3 during pregnancy. Overall, these data demonstrate that functional and phenotypic shifts occur in NK cells during pregnancy that can influence the magnitude of the immune response to both infections and tumors.

    View details for DOI 10.3389/fimmu.2019.02469

    View details for PubMedID 31708922

  • Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome Davis, N. M., Proctor, D. M., Holmes, S. P., Relman, D. A., Callahan, B. J. 2018; 6 (1): 226

    Abstract

    BACKGROUND: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants: DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam ( https://github.com/benjjneb/decontam ), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples and are often found in negative controls. RESULTS: Decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome and that some low-frequency taxa seemingly associated with preterm birth were contaminants. CONCLUSIONS: Decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. Decontam integrates easily with existing MGS workflows and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.

    View details for PubMedID 30558668

  • Interactive Visualization of Hierarchically Structured Data. Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America Sankaran, K., Holmes, S. 2018; 27 (3): 553-563

    Abstract

    We introduce methods for visualization of data structured along trees, especially hierarchically structured collections of time series. To this end, we identify questions that often emerge when working with hierarchical data and provide an R package to simplify their investigation. Our key contribution is the adaptation of the visualization principles of focus-plus-context and linking to the study of tree-structured data. Our motivating application is to the analysis of bacterial time series, where an evolutionary tree relating bacteria is available a priori. However, we have identified common problem types where, if a tree is not directly available, it can be constructed from data and then studied using our techniques. We perform detailed case studies to describe the alternative use cases, interpretations, and utility of the proposed visualization methods.

    View details for DOI 10.1080/10618600.2017.1392866

    View details for PubMedID 30416327

    View details for PubMedCentralID PMC6223648

  • Differential Induction of IFN-alpha and Modulation of CD112 and CD54 Expression Govern the Magnitude of NK Cell IFN-gamma Response to Influenza A Viruses JOURNAL OF IMMUNOLOGY Kronstad, L. M., Seiler, C., Vergara, R., Holmes, S. P., Blish, C. A. 2018; 201 (7): 2117–31
  • Metagenomic analysis with strain-level resolution reveals fine-scale variation in the human pregnancy microbiome. Genome research Goltsman, D. S., Sun, C. L., Proctor, D. M., DiGiulio, D. B., Robaczewska, A., Thomas, B. C., Shaw, G. M., Stevenson, D. K., Holmes, S. P., Banfield, J. F., Relman, D. A. 2018; 28 (10): 1467–80

    Abstract

    Recent studies suggest that the microbiome has an impact on gestational health and outcome. However, characterization of the pregnancy-associated microbiome has largely relied on 16S rRNA gene amplicon-based surveys. Here, we describe an assembly-driven, metagenomics-based, longitudinal study of the vaginal, gut, and oral microbiomes in 292 samples from 10 subjects sampled every three weeks throughout pregnancy. Nonhuman sequences in the amount of 1.53 Gb were assembled into scaffolds, and functional genes were predicted for gene- and pathway-based analyses. Vaginal assemblies were binned into 97 draft quality genomes. Redundancy analysis (RDA) of microbial community composition at all three body sites revealed gestational age to be a significant source of variation in patterns of gene abundance. In addition, health complications were associated with variation in community functional gene composition in the mouth and gut. The diversity of Lactobacillus iners-dominated communities in the vagina, unlike most other vaginal community types, significantly increased with gestational age. The genomes of co-occurring Gardnerella vaginalis strains with predicted distinct functions were recovered in samples from two subjects. In seven subjects, gut samples contained strains of the same Lactobacillus species that dominated the vaginal community of that same subject and not other Lactobacillus species; however, these within-host strains were divergent. CRISPR spacer analysis suggested shared phage and plasmid populations across body sites and individuals. This work underscores the dynamic behavior of the microbiome during pregnancy and suggests the potential importance of understanding the sources of this behavior for fetal development and gestational outcome.

    View details for PubMedID 30232199

  • Differential Induction of IFN-alpha and Modulation of CD112 and CD54 Expression Govern the Magnitude of NK Cell IFN-gamma Response to Influenza A Viruses. Journal of immunology (Baltimore, Md. : 1950) Kronstad, L. M., Seiler, C., Vergara, R., Holmes, S. P., Blish, C. A. 2018

    Abstract

    In human and murine studies, IFN-gamma is a critical mediator of immunity to influenza. IFN-gamma production is critical for viral clearance and the development of adaptive immune responses, yet excessive production of IFN-gamma and other cytokines as part of a cytokine storm is associated with poor outcomes of influenza infection in humans. As NK cells are the main population of lung innate immune cells capable of producing IFN-gamma early in infection, we set out to identify the drivers of the human NK cell IFN-gamma response to influenza A viruses. We found that influenza triggers NK cells to secrete IFN-gamma in the absence of T cells and in a manner dependent upon signaling from both cytokines and receptor-ligand interactions. Further, we discovered that the pandemic A/California/07/2009 (H1N1) strain elicits a seven-fold greater IFN-gamma response than other strains tested, including a seasonal A/Victoria/361/2011 (H3N2) strain. These differential responses were independent of memory NK cells. Instead, we discovered that the A/Victoria/361/2011 influenza strain suppresses the NK cell IFN-gamma response by downregulating NK-activating ligands CD112 and CD54 and by repressing the type I IFN response in a viral replication-dependent manner. In contrast, the A/California/07/2009 strain fails to repress the type I IFN response or to downregulate CD54 and CD112 to the same extent, which leads to the enhanced NK cell IFN-gamma response. Our results indicate that influenza implements a strain-specific mechanism governing NK cell production of IFN-gamma and identify a previously unrecognized influenza innate immune evasion strategy.

    View details for PubMedID 30143589

  • Characterization of the impact of daclizumab beta on circulating natural killer cells by mass cytometry Ranganath, T., Seiler, C., Vendrame, E., Le Gars, M., Fontenot, J. D., Fam, S., Holmes, S., Blish, C. LIPPINCOTT WILLIAMS & WILKINS. 2018
  • A spatial gradient of bacterial diversity in the human oral cavity shaped by salivary flow. Nature communications Proctor, D. M., Fukuyama, J. A., Loomer, P. M., Armitage, G. C., Lee, S. A., Davis, N. M., Ryder, M. I., Holmes, S. P., Relman, D. A. 2018; 9 (1): 681

    Abstract

    Spatial and temporal patterns in microbial communities provide insights into the forces that shape them, their functions and roles in health and disease. Here, we used spatial and ecological statistics to analyze the role that saliva plays in structuring bacterial communities of the human mouth using >9000 dental and mucosal samples. We show that regardless of tissue type (teeth, alveolar mucosa, keratinized gingiva, or buccal mucosa), surface-associated bacterial communities vary along an ecological gradient from the front to the back of the mouth, and that on exposed tooth surfaces, the gradient is more pronounced on lingual than on buccal surfaces. Furthermore, our data suggest that this gradient is attenuated in individuals with low salivary flow due to Sjögren's syndrome. Taken together, our findings imply that salivary flow influences the spatial organization of microbial communities and that biogeographical patterns may be useful for understanding host physiological processes and for predicting disease.

    View details for PubMedID 29445174

    View details for PubMedCentralID PMC5813034

  • Gut microbiome transition across a lifestyle gradient in Himalaya. PLoS biology Jha, A. R., Davenport, E. R., Gautam, Y., Bhandari, D., Tandukar, S., Ng, K. M., Fragiadakis, G. K., Holmes, S., Gautam, G. P., Leach, J., Sherchand, J. B., Bustamante, C. D., Sonnenburg, J. L. 2018; 16 (11): e2005396

    Abstract

    The composition of the gut microbiome in industrialized populations differs from that of populations living traditional lifestyles. However, it has been difficult to separate the contributions of human genetic and geographic factors from lifestyle. Whether shifts away from the foraging lifestyle that characterized much of humanity's past influence the gut microbiome, and to what degree, remains unclear. Here, we characterize the stool bacterial composition of four Himalayan populations to investigate how the gut community changes in response to shifts in traditional human lifestyles. These groups led seminomadic hunting-gathering lifestyles until transitioning to varying levels of dependence upon farming. The Tharu began farming 250-300 years ago, the Raute and Raji transitioned 30-40 years ago, and the Chepang retain many aspects of a foraging lifestyle. We assess the contributions of dietary and environmental factors on their gut-associated microbes and find that differences in the lifestyles of Himalayan foragers and farmers are strongly correlated with microbial community variation. Furthermore, the gut microbiomes of all four traditional Himalayan populations are distinct from that of the Americans, indicating that industrialization may further exacerbate differences in the gut community. The Chepang foragers harbor an elevated abundance of taxa associated with foragers around the world. Conversely, the gut microbiomes of the populations that have transitioned to farming are more similar to those of Americans, with agricultural dependence and several associated lifestyle and environmental factors correlating with the extent of microbiome divergence from the foraging population. The gut microbiomes of Raute and Raji reveal an intermediate state between the Chepang and Tharu, indicating that divergence from a stereotypical foraging microbiome can occur within a single generation. Our results also show that environmental factors such as drinking water source and solid cooking fuel are significantly associated with the gut microbiome. Despite the pronounced differences in gut bacterial composition across populations, we found little difference in alpha diversity across lifestyles. These findings in genetically similar populations living in the same geographical region establish the key role of lifestyle in determining human gut microbiome composition and point to the next challenging steps of determining how large-scale gut microbiome reconfiguration impacts human biology.

    View details for PubMedID 30439937

  • Topologically Constrained Template Estimation via Morse-Smale Complexes Controls Its Statistical Consistency SIAM JOURNAL ON APPLIED ALGEBRA AND GEOMETRY Miolane, N., Holmes, S., Pennec, X. 2018; 2 (2): 348–75

    View details for DOI 10.1137/17M1129222

    View details for Web of Science ID 000437369100007

  • Multivariate Heteroscedasticity Models for Functional Brain Connectivity FRONTIERS IN NEUROSCIENCE Seiler, C., Holmes, S. 2017; 11: 696

    Abstract

    Functional brain connectivity is the co-occurrence of brain activity in different areas during resting and while doing tasks. The data of interest are multivariate timeseries measured simultaneously across brain parcels using resting-state fMRI (rfMRI). We analyze functional connectivity using two heteroscedasticity models. Our first model is low-dimensional and scales linearly in the number of brain parcels. Our second model scales quadratically. We apply both models to data from the Human Connectome Project (HCP) comparing connectivity between short and conventional sleepers. We find stronger functional connectivity in short than conventional sleepers in brain areas consistent with previous findings. This might be due to subjects falling asleep in the scanner. Consequently, we recommend the inclusion of average sleep duration as a covariate to remove unwanted variation in rfMRI studies. A power analysis using the HCP data shows that a sample size of 40 detects 50% of the connectivity at a false discovery rate of 20%. We provide implementations using R and the probabilistic programming language Stan.

    View details for PubMedID 29311777

  • Multidomain analyses of a longitudinal human microbiome intestinal cleanout perturbation experiment. PLoS computational biology Fukuyama, J., Rumker, L., Sankaran, K., Jeganathan, P., Dethlefsen, L., Relman, D. A., Holmes, S. P. 2017; 13 (8): e1005706

    Abstract

    Our work focuses on the stability, resilience, and response to perturbation of the bacterial communities in the human gut. Informative flash flood-like disturbances that eliminate most gastrointestinal biomass can be induced using a clinically-relevant iso-osmotic agent. We designed and executed such a disturbance in human volunteers using a dense longitudinal sampling scheme extending before and after induced diarrhea. This experiment has enabled a careful multidomain analysis of a controlled perturbation of the human gut microbiota with a new level of resolution. These new longitudinal multidomain data were analyzed using recently developed statistical methods that demonstrate improvements over current practices. By imposing sparsity constraints we have enhanced the interpretability of the analyses and by employing a new adaptive generalized principal components analysis, incorporated modulated phylogenetic information and enhanced interpretation through scoring of the portions of the tree most influenced by the perturbation. Our analyses leverage the taxa-sample duality in the data to show how the gut microbiota recovers following this perturbation. Through a holistic approach that integrates phylogenetic, metagenomic and abundance information, we elucidate patterns of taxonomic and functional change that characterize the community recovery process across individuals. We provide complete code and illustrations of new sparse statistical methods for high-dimensional, longitudinal multidomain data that provide greater interpretability than existing methods.

    View details for DOI 10.1371/journal.pcbi.1005706

    View details for PubMedID 28821012

    View details for PubMedCentralID PMC5576755

  • Parallel imaging of Drosophila embryos for quantitative analysis of genetic perturbations of the Ras pathway. Disease models & mechanisms Goyal, Y., Levario, T. J., Mattingly, H. H., Holmes, S., Shvartsman, S. Y., Lu, H. 2017

    Abstract

    The Ras pathway patterns the poles of the Drosophila embryo by downregulating the levels and activity of a DNA-binding transcriptional repressor Capicua (Cic). We demonstrate that the spatiotemporal pattern of Cic during this signaling event can be harnessed for functional studies of the Ras-pathway mutations from human diseases. Our approach relies on a new microfluidic device that enables parallel imaging of Cic dynamics in dozens of live embryos. We found that although the pattern of Cic in early embryos is complex, it can be accurately approximated by a product of one spatial profile and one time-dependent amplitude. Analysis of these functions of space and time alone reveals the differential effects of mutations within the Ras pathway. Given the highly-conserved nature of Ras-dependent control of Cic, our approach opens a new way for functional analysis of multiple sequence variants from developmental abnormalities and cancers.

    View details for DOI 10.1242/dmm.030163

    View details for PubMedID 28495673

  • Mutational Correlates of Virological Failure in Individuals Receiving a WHO-Recommended Tenofovir-Containing First-Line Regimen: An International Collaboration. EBioMedicine Rhee, S., Varghese, V., Holmes, S. P., van Zyl, G. U., Steegen, K., Boyd, M. A., Cooper, D. A., Nsanzimana, S., Saravanan, S., Charpentier, C., de Oliveira, T., Etiebet, M. A., Garcia, F., Goedhals, D., Gomes, P., Günthard, H. F., Hamers, R. L., Hoffmann, C. J., Hunt, G., Jiamsakul, A., Kaleebu, P., Kanki, P., Kantor, R., Kerschberger, B., Marconi, V. C., D'amour Ndahimana, J., Ndembi, N., Ngo-Giang-Huong, N., Rokx, C., Santoro, M. M., Schapiro, J. M., Schmidt, D., Seu, L., Sigaloff, K. C., Sirivichayakul, S., Skhosana, L., Sunpath, H., Tang, M., Yang, C., Carmona, S., Gupta, R. K., Shafer, R. W. 2017; 18: 225-235

    Abstract

    Tenofovir disoproxil fumarate (TDF) genotypic resistance defined by K65R/N and/or K70E/Q/G occurs in 20% to 60% of individuals with virological failure (VF) on a WHO-recommended TDF-containing first-line regimen. However, the full spectrum of reverse transcriptase (RT) mutations selected in individuals with VF on such a regimen is not known. To identify TDF regimen-associated mutations (TRAMs), we compared the proportion of each RT mutation in 2873 individuals with VF on a WHO-recommended first-line TDF-containing regimen to its proportion in a cohort of 50,803 antiretroviral-naïve individuals. To identify TRAMs specifically associated with TDF-selection pressure, we compared the proportion of each TRAM to its proportion in a cohort of 5805 individuals with VF on a first-line thymidine analog-containing regimen. We identified 83 TRAMs including 33 NRTI-associated, 40 NNRTI-associated, and 10 uncommon mutations of uncertain provenance. Of the 33 NRTI-associated TRAMs, 12 - A62V, K65R/N, S68G/N/D, K70E/Q/T, L74I, V75L, and Y115F - were more common among individuals receiving a first-line TDF-containing compared to a first-line thymidine analog-containing regimen. These 12 TDF-selected TRAMs will be important for monitoring TDF-associated transmitted drug-resistance and for determining the extent of reduced TDF susceptibility in individuals with VF on a TDF-containing regimen.

    View details for DOI 10.1016/j.ebiom.2017.03.024

    View details for PubMedID 28365230

    View details for PubMedCentralID PMC5405160

  • Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women. Proceedings of the National Academy of Sciences of the United States of America Callahan, B. J., DiGiulio, D. B., Goltsman, D. S., Sun, C. L., Costello, E. K., Jeganathan, P., Biggio, J. R., Wong, R. J., Druzin, M. L., Shaw, G. M., Stevenson, D. K., Holmes, S. P., Relman, D. A. 2017

    Abstract

    Preterm birth (PTB) is the leading cause of neonatal morbidity and mortality. Previous studies have suggested that the maternal vaginal microbiota contributes to the pathophysiology of PTB, but conflicting results in recent years have raised doubts. We conducted a study of PTB compared with term birth in two cohorts of pregnant women: one predominantly Caucasian (n = 39) at low risk for PTB, the second predominantly African American and at high-risk (n = 96). We profiled the taxonomic composition of 2,179 vaginal swabs collected prospectively and weekly during gestation using 16S rRNA gene sequencing. Previously proposed associations between PTB and lower Lactobacillus and higher Gardnerella abundances replicated in the low-risk cohort, but not in the high-risk cohort. High-resolution bioinformatics enabled taxonomic assignment to the species and subspecies levels, revealing that Lactobacillus crispatus was associated with low risk of PTB in both cohorts, while Lactobacillus iners was not, and that a subspecies clade of Gardnerella vaginalis explained the genus association with PTB. Patterns of cooccurrence between L. crispatus and Gardnerella were highly exclusive, while Gardnerella and L. iners often coexisted at high frequencies. We argue that the vaginal microbiota is better represented by the quantitative frequencies of these key taxa than by classifying communities into five community state types. Our findings extend and corroborate the association between the vaginal microbiota and PTB, demonstrate the benefits of high-resolution statistical bioinformatics in clinical microbiome studies, and suggest that previous conflicting results may reflect the different risk profile of women of black race.

    View details for PubMedID 28847941

  • Prevalence of Drug-Resistant Minority Variants in Untreated HIV-1-Infected Individuals With and Those Without Transmitted Drug Resistance Detected by Sanger Sequencing. The Journal of infectious diseases Clutter, D. S., Zhou, S., Varghese, V., Rhee, S. Y., Pinsky, B. A., Fessel, W. J., Klein, D. B., Spielvogel, E., Holmes, S. P., Hurley, L. B., Silverberg, M. J., Swanstrom, R., Shafer, R. W. 2017; 216 (3): 387–91

    Abstract

    Minority variant human immunodeficiency virus type 1 (HIV-1) nonnucleoside reverse transcriptase inhibitor (NNRTI) resistance mutations are associated with an increased risk of virological failure during treatment with NNRTI-containing regimens. To determine whether individuals to whom variants with isolated NNRTI-associated drug resistance were transmitted are at increased risk of virological failure during treatment with a non-NNRTI-containing regimen, we identified minority variant resistance mutations in 33 individuals with isolated NNRTI-associated transmitted drug resistance and 49 matched controls. We found similar proportions of overall and nucleoside reverse transcriptase inhibitor-associated minority variant resistance mutations in both groups, suggesting that isolated NNRTI-associated transmitted drug resistance may not be a risk factor for virological failure during treatment with a non-NNRTI-containing regimen.

    View details for PubMedID 28859436

  • Discussion of "50 Years of Data Science" JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS Holmes, S., Josse, J. 2017; 26 (4): 768–69
  • Multi-Table Differential Correlation Analysis of Neuroanatomical and Cognitive Interactions in Turner Syndrome. Neuroinformatics Seiler, C., Green, T., Hong, D., Chromik, L., Huffman, L., Holmes, S., Reiss, A. L. 2017

    Abstract

    Girls and women with Turner syndrome (TS) have a completely or partially missing X chromosome. Extensive studies on the impact of TS on neuroanatomy and cognition have been conducted. The integration of neuroanatomical and cognitive information into one consistent analysis through multi-table methods is difficult and most standard tests are underpowered. We propose a new two-sample testing procedure that compares associations between two tables in two groups. The procedure combines multi-table methods with permutation tests. In particular, we construct cluster size test statistics that incorporate spatial dependencies. We apply our new procedure to a newly collected dataset comprising structural brain scans and cognitive test scores from girls with TS and healthy control participants (age and sex matched). We measure neuroanatomy with Tensor-Based Morphometry (TBM) and cognitive function with Wechsler IQ and NEuroPSYchological tests (NEPSY-II). We compare our multi-table testing procedure to a single-table analysis. Our new procedure reports differential correlations between two voxel clusters and a wide range of cognitive tests whereas the single-table analysis reports no differences. Our findings are consistent with the hypothesis that girls with TS have a different brain-cognition association structure than healthy controls.

    View details for PubMedID 29270892

  • Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME journal Callahan, B. J., McMurdie, P. J., Holmes, S. P. 2017

    Abstract

    Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of molecular operational taxonomic units (OTUs): clusters of sequencing reads that differ by less than a fixed dissimilarity threshold. New methods control errors sufficiently such that amplicon sequence variants (ASVs) can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region. The benefits of finer resolution are immediately apparent, and arguments for ASV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits that derive from the status of ASVs as consistent labels with intrinsic biological meaning identified independently from a reference database. Here we discuss how these features grant ASVs the combined advantages of closed-reference OTUs (including computational costs that scale linearly with study size, simple merging between independently processed data sets, and forward prediction) and of de novo OTUs (including accurate measurement of diversity and applicability to communities lacking deep coverage in reference databases). We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that ASVs should replace OTUs as the standard unit of marker-gene analysis and reporting. The ISME Journal advance online publication, 21 July 2017; doi:10.1038/ismej.2017.119.

    View details for PubMedID 28731476

  • STATISTICAL PROOF? THE PROBLEM OF IRREPRODUCIBILITY Bulletin of the American Mathematical Society Holmes, S. P. 2017

    View details for DOI 10.1090/bull/1597

  • Template Shape Estimation: Correcting an Asymptotic Bias SIAM J Imaging Science Miolane, N., Holmes, S., Pennec, X. 2017; 10 (2): 808-844

    View details for DOI 10.1137/16M1084493

  • Bayesian Nonparametric Ordination for the Analysis of Microbial Communities Journal of the American Statistical Association Ren, B., Bacallado, S., Favaro, S., Holmes, S., Trippa, L. 2017
  • Mass Cytometry Analytical Approaches Reveal Cytokine-Induced Changes in Natural Killer Cells. Cytometry. Part B, Clinical cytometry Vendrame, E., Fukuyama, J., Strauss-Albee, D. M., Holmes, S., Blish, C. A. 2017; 92 (1): 57-67

    Abstract

    Natural killer (NK) cells have antiviral and antitumor activity that could be harnessed for the treatment of infections and malignancies. To maintain cell viability and enhance antiviral and antitumor effects, NK cells are frequently treated with cytokines. Here we performed an extensive assessment of the effects of cytokines on the phenotype and function of human NK cells. We used cytometry by time-of-flight (CyTOF) to evaluate NK cell repertoire changes after stimulation with interleukin (IL)-2, IL-15 or a combination of IL-12/IL-15/IL-18. To analyze the high dimensional CyTOF data, we used several statistical and visualization tools, including viSNE (Visualization of t-Distributed Stochastic Neighbor Embedding), Citrus (Cluster identification, characterization, and regression), correspondence analysis, and the Friedman-Rafsky test. All three treatments (IL-2, IL-15, and IL-12/IL-15/IL-18) increase expression of CD56 and CD69. The effects of treatment with IL-2 and IL-15 are nearly indistinguishable and characterized principally by increased expression of surface markers including CD56, NKp30, NKp44, and increased expression of functional markers, such as perforin, granzyme B, and MIP-1β. The combination of IL-12/IL-15/IL-18 induces a profound shift in the repertoire structure, decreasing expression of CD16, CD57, CD8, NKp30, NKp46, and NKG2D, and dramatically increasing expression of IFN-γ. CyTOF provides insights into the effects of cytokines on the phenotype and function of NK cells, which could inform future research efforts and approaches to NK cell immunotherapy. There are several analytical approaches to CyTOF data, and the appropriate method should be carefully selected based on which aspect of the dataset is being explored.

    View details for DOI 10.1002/cyto.b.21500

    View details for PubMedID 27933717

  • Measuring multivariate association and beyond. Statistics surveys Josse, J., Holmes, S. 2016; 10: 132-167

    Abstract

    Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once it has been ascertained that there is such association through testing, then a next step, often ignored, is to explore and uncover the association's underlying patterns. This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarii where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research.

    View details for DOI 10.1214/16-SS116

    View details for PubMedID 29081877

    View details for PubMedCentralID PMC5658146

  • 1,2-Dichloroethane Exposure Alters the Population Structure, Metabolism, and Kinetics of a Trichloroethene-Dechlorinating Dehalococcoides mccartyi Consortium. Environmental science & technology Mayer-Blackwell, K., Fincker, M., Molenda, O., Callahan, B., Sewell, H., Holmes, S., Edwards, E. A., Spormann, A. M. 2016

    Abstract

    Bioremediation of groundwater contaminated with chlorinated aliphatic hydrocarbons such as perchloroethene and trichloroethene can result in the accumulation of the undesirable intermediate vinyl chloride. Such accumulation can either be due to the absence of specific vinyl chloride respiring Dehalococcoides mccartyi or to the inhibition of such strains by the metabolism of other microorganisms. The fitness of vinyl chloride respiring Dehalococcoides mccartyi subpopulations is particularly uncertain in the presence of chloroethene/chloroethane cocontaminant mixtures, which are commonly found in contaminated groundwater. Therefore, we investigated the structure of Dehalococcoides populations in a continuously fed reactor system under changing chloroethene/ethane influent conditions. We observed that increasing the influent ratio of 1,2-dichloroethane to trichloroethene was associated with ecological selection of a tceA-containing Dehalococcoides population relative to a vcrA-containing Dehalococcoides population. Although both vinyl chloride and 1,2-dichloroethane could be simultaneously transformed to ethene, prolonged exposure to 1,2-dichloroethane diminished the vinyl chloride transforming capacity of the culture. Kinetic tests revealed that dechlorination of 1,2-dichloroethane by the consortium was strongly inhibited by cis-dichloroethene but not vinyl chloride. Native polyacrylamide gel electrophoresis and mass spectrometry revealed that a trichloroethene reductive dehalogenase (TceA) homologue was the most consistently expressed of four detectable reductive dehalogenases during 1,2-dichloroethane exposure, suggesting that it catalyzes the reductive dihaloelimination of 1,2-dichloroethane to ethene.

    View details for PubMedID 27809491

  • HIV-1 Protease, Reverse Transcriptase, and Integrase Variation JOURNAL OF VIROLOGY Rhee, S., Sankaran, K., Varghese, V., Winters, M. A., Hurt, C. B., Eron, J. J., Parkin, N., Holmes, S. P., Holodniy, M., Shafer, R. W. 2016; 90 (13): 6058-6070

    Abstract

    HIV-1 protease (PR), reverse transcriptase (RT), and integrase (IN) variability presents a challenge to laboratories performing genotypic resistance testing. This challenge will grow with increased sequencing of samples enriched for proviral DNA such as dried blood spots and increased use of next-generation sequencing (NGS) to detect low-abundance HIV-1 variants. We analyzed PR and RT sequences from >100,000 individuals and IN sequences from >10,000 individuals to characterize variation at each amino acid position, identify mutations indicating APOBEC-mediated G-to-A editing, and identify mutations resulting from selective drug pressure. Forty-seven percent of PR, 37% of RT and 34% of IN positions had one or more amino acid variants with a prevalence ≥1%. Seventy percent of PR, 60% of RT and 60% of IN positions had one or more variants with a prevalence ≥0.1%. Overall 201 PR, 636 RT and 346 IN variants had a prevalence ≥0.1%. The median inter-subtype prevalence-ratio was 2.9-, 2.1- and 1.9-fold for these PR, RT and IN variants, respectively. Only 5.0% of PR, 3.7% of RT and 2.0% of IN variants had a median inter-subtype prevalence-ratio ≥10-fold. Variants at lower prevalences were more likely to differ biochemically and to be part of an electrophoretic mixture compared to high prevalence variants. There were 209 mutations indicative of APOBEC-mediated G-to-A editing and 326 non-polymorphic treatment-selected mutations. Identifying viruses with a high number of APOBEC-associated mutations will facilitate the quality control of dried blood spot sequencing. Identifying sequences with a high proportion of rare mutations will facilitate the quality control of NGS.

    Most antiretroviral drugs target three HIV-1 proteins: PR, RT, and IN. These proteins are highly variable: many different amino acids can be present at the same position in viruses from different individuals. Some of the amino acid variants cause drug resistance and occur mainly in individuals receiving antiretroviral drugs. Some variants result from a human cellular defense mechanism called APOBEC-mediated hypermutation. Many variants result from naturally occurring mutation. Some variants may represent technical artifacts. We studied PR and RT sequences from >100,000 individuals and IN sequences from >10,000 individuals to quantify variation at each amino acid position in these three HIV-1 proteins. We performed analyses to determine which amino acid variants resulted from antiretroviral drug selection pressure, APOBEC-mediated editing, and naturally occurring variation. Our results provide information essential to clinical, research, and public health laboratories performing genotypic resistance testing by sequencing HIV-1 PR, RT, and IN.

    View details for DOI 10.1128/JVI.00495-16

    View details for Web of Science ID 000378340300019

    View details for PubMedID 27099321

  • REPRODUCIBLE RESEARCH WORKFLOW IN R FOR THE ANALYSIS OF PERSONALIZED HUMAN MICROBIOME DATA. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Callahan, B., Proctor, D., Relman, D., Fukuyama, J., Holmes, S. 2016; 21: 183-194

    Abstract

    This article presents a reproducible research workflow for amplicon-based microbiome studies in personalized medicine created using Bioconductor packages and the knitr markdown interface. We show that sometimes a multiplicity of choices and lack of consistent documentation at each stage of the sequential processing pipeline used for the analysis of microbiome data can lead to spurious results. We propose its replacement with reproducible and documented analysis using R packages dada2, knitr, and phyloseq. This workflow implements both key stages of amplicon analysis: the initial filtering and denoising steps needed to construct taxonomic feature tables from error-containing sequencing reads (dada2), and the exploratory and inferential analysis of those feature tables and associated sample metadata (phyloseq). This workflow facilitates reproducible interrogation of the full set of choices required in microbiome studies. We present several examples in which we leverage existing packages for analysis in a way that allows easy sharing and modification by others, and give pointers to articles that depend on this reproducible workflow for the study of longitudinal and spatial series analyses of the vaginal microbiome in pregnancy and the oral microbiome in humans with healthy dentition and intra-oral tissues.

    View details for PubMedID 26776185

  • Bioconductor workflow for microbiome data analysis: from raw reads to community analyses. F1000Research Callahan, B. J., Sankaran, K., Fukuyama, J. A., McMurdie, P. J., Holmes, S. P. 2016; 5: 1492

    Abstract

    High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.

    View details for DOI 10.12688/f1000research.8986.1

    View details for PubMedID 27508062

    View details for PubMedCentralID PMC4955027

  • Interpreting Prevotella and Bacteroides as biomarkers of diet and lifestyle. Microbiome Gorvitovskaia, A., Holmes, S. P., Huse, S. M. 2016; 4 (1): 15

    Abstract

    In a series of studies of the gut microbiome, "enterotypes" have been used to classify gut microbiome samples that cluster together in ordination analyses. Initially, three distinct enterotypes were described, although later studies reduced this to two clusters, one dominated by Bacteroides or Clostridiales species found more commonly in Western (American and Western European) subjects and the other dominated by Prevotella more often associated with non-Western subjects. The two taxa, Bacteroides and Prevotella, have been presumed to represent consistent underlying microbial communities, but no one has demonstrated the presence of additional microbial taxa across studies that can define these communities. We analyzed the combined microbiome data from five previous studies with samples across five continents. We clearly demonstrate that there are no consistent bacterial taxa associated with either Bacteroides- or Prevotella-dominated communities across the studies. By increasing the number and diversity of samples, we found gradients of both Bacteroides and Prevotella and a lack of the distinct clusters in the principal coordinate plots originally proposed in the "enterotypes" hypothesis. The apparent segregation of the samples seen in many ordination plots is due to the differences in the samples' Prevotella and Bacteroides abundances and does not represent consistent microbial communities within the "enterotypes" and is not associated with other taxa across studies. The projections we see are consistent with a continuum of values created from a simple mixture of Bacteroides and Prevotella; these two biomarkers are significantly correlated to the projection axes. We suggest that previous findings citing Bacteroides- and Prevotella-dominated clusters are the result of an artifact caused by the greater relative abundance of these two taxa over other taxa in the human gut and the sparsity of Prevotella abundant samples. We believe that the term "enterotypes" is misleading because it implies both an underlying consistency of community taxa and a clear separation of sets of human gut samples, neither of which is supported by the broader data. We propose the use of "biomarker" as a more accurate description of these and other taxa that correlate with diet, lifestyle, and disease state.

    View details for DOI 10.1186/s40168-016-0160-7

    View details for PubMedID 27068581

    View details for PubMedCentralID PMC4828855

  • More effective drugs lead to harder selective sweeps in the evolution of drug resistance in HIV-1. eLife Feder, A. F., Rhee, S., Holmes, S. P., Shafer, R. W., Petrov, D. A., Pennings, P. S. 2016; 5

    Abstract

    In the early days of HIV treatment, drug resistance occurred rapidly and predictably in all patients, but under modern treatments, resistance arises slowly, if at all. The probability of resistance should be controlled by the rate of generation of resistance mutations. If many adaptive mutations arise simultaneously, then adaptation proceeds by soft selective sweeps in which multiple adaptive mutations spread concomitantly, but if adaptive mutations occur rarely in the population, then a single adaptive mutation should spread alone in a hard selective sweep. Here, we use 6717 HIV-1 consensus sequences from patients treated with first-line therapies between 1989 and 2013 to confirm that the transition from fast to slow evolution of drug resistance was indeed accompanied with the expected transition from soft to hard selective sweeps. This suggests more generally that evolution proceeds via hard sweeps if resistance is unlikely and via soft sweeps if it is likely.

    View details for DOI 10.7554/eLife.10670

    View details for PubMedID 26882502

    View details for PubMedCentralID PMC4764592

  • Marine mammals harbor unique microbiotas shaped by and yet distinct from the sea. Nature communications Bik, E. M., Costello, E. K., Switzer, A. D., Callahan, B. J., Holmes, S. P., Wells, R. S., Carlin, K. P., Jensen, E. D., Venn-Watson, S., Relman, D. A. 2016; 7: 10516

    Abstract

    Marine mammals play crucial ecological roles in the oceans, but little is known about their microbiotas. Here we study the bacterial communities in 337 samples from 5 body sites in 48 healthy dolphins and 18 healthy sea lions, as well as those of adjacent seawater and other hosts. The bacterial taxonomic compositions are distinct from those of other mammals, dietary fish and seawater, are highly diverse and vary according to body site and host species. Dolphins harbour 30 bacterial phyla, with 25 of them in the mouth, several abundant but poorly characterized Tenericutes species in gastric fluid and a surprising paucity of Bacteroidetes in distal gut. About 70% of near-full length bacterial 16S ribosomal RNA sequences from dolphins are unique. Host habitat, diet and phylogeny all contribute to variation in marine mammal distal gut microbiota composition. Our findings help elucidate the factors structuring marine mammal microbiotas and may enhance monitoring of marine mammal health.

    View details for DOI 10.1038/ncomms10516

    View details for PubMedID 26839246

    View details for PubMedCentralID PMC4742810

  • Variation in Taxonomic Composition of the Fecal Microbiota in an Inbred Mouse Strain across Individuals and Time PLOS ONE Hoy, Y. E., Bik, E. M., Lawley, T. D., Holmes, S. P., Monack, D. M., Theriot, J. A., Relman, D. A. 2015; 10 (11)

    Abstract

    Genetics, diet, and other environmental exposures are thought to be major factors in the development and composition of the intestinal microbiota of animals. However, the relative contributions of these factors in adult animals, as well as variation with time in a variety of important settings, are still not fully understood. We studied a population of inbred, female mice fed the same diet and housed under the same conditions. We collected fecal samples from 46 individual mice over two weeks, sampling four of these mice for periods as long as 236 days for a total of 190 samples, and determined the phylogenetic composition of their microbial communities after analyzing 1,849,990 high-quality pyrosequencing reads of the 16S rRNA gene V3 region. Even under these controlled conditions, we found significant inter-individual variation in community composition, as well as variation within an individual over time, including increases in alpha diversity during the first 2 months of co-habitation. Some variation was explained by mouse membership in different cage and vendor shipment groups. The differences among individual mice from the same shipment group and cage were still significant. Overall, we found that 23% of the variation in intestinal microbiota composition was explained by changes within the fecal microbiota of a mouse over time, 12% was explained by persistent differences among individual mice, 14% by cage, and 18% by shipment group. Our findings suggest that the microbiota of controlled populations of inbred laboratory animals may not be as uniform as previously thought, that animal rearing and handling may account for some variation, and that as yet unidentified factors may explain additional components of variation in the composition of the microbiota within populations and individuals over time. These findings have implications for the design and interpretation of experiments involving laboratory animals.

    View details for DOI 10.1371/journal.pone.0142825

    View details for Web of Science ID 000367628500058

    View details for PubMedCentralID PMC4643986

  • Temporal and spatial variation of the human microbiota during pregnancy PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA DiGiulio, D. B., Callahan, B. J., McMurdie, P. J., Costello, E. K., Lyell, D. J., Robaczewska, A., Sun, C. L., Goltsman, D. S., Wong, R. J., Shaw, G., Stevenson, D. K., Holmes, S. P., Relman, D. A. 2015; 112 (35): 11060-11065

    Abstract

    Despite the critical role of the human microbiota in health, our understanding of microbiota compositional dynamics during and after pregnancy is incomplete. We conducted a case-control study of 49 pregnant women, 15 of whom delivered preterm. From 40 of these women, we analyzed bacterial taxonomic composition of 3,767 specimens collected prospectively and weekly during gestation and monthly after delivery from the vagina, distal gut, saliva, and tooth/gum. Linear mixed-effects modeling, medoid-based clustering, and Markov chain modeling were used to analyze community temporal trends, community structure, and vaginal community state transitions. Microbiota community taxonomic composition and diversity remained remarkably stable at all four body sites during pregnancy (P > 0.05 for trends over time). Prevalence of a Lactobacillus-poor vaginal community state type (CST 4) was inversely correlated with gestational age at delivery (P = 0.0039). Risk for preterm birth was more pronounced for subjects with CST 4 accompanied by elevated Gardnerella or Ureaplasma abundances. This finding was validated with a set of 246 vaginal specimens from nine women (four of whom delivered preterm). Most women experienced a postdelivery disturbance in the vaginal community characterized by a decrease in Lactobacillus species and an increase in diverse anaerobes such as Peptoniphilus, Prevotella, and Anaerococcus species. This disturbance was unrelated to gestational age at delivery and persisted for up to 1 y. These findings have important implications for predicting premature labor, a major global health problem, and for understanding the potential impact of a persistent, altered postpartum microbiota on maternal health, including outcomes of pregnancies following short interpregnancy intervals.

    View details for DOI 10.1073/pnas.1502875112

    View details for Web of Science ID 000360383200068

    View details for PubMedID 26283357

  • Human NK cell repertoire diversity reflects immune experience and correlates with viral susceptibility SCIENCE TRANSLATIONAL MEDICINE Strauss-Albee, D. M., Fukuyama, J., Liang, E. C., Yao, Y., Jarrell, J. A., Drake, A. L., Kinuthia, J., Montgomery, R. R., John-Stewart, G., Holmes, S., Blish, C. A. 2015; 7 (297)

    Abstract

    Innate natural killer (NK) cells are diverse at the single-cell level because of variegated expressions of activating and inhibitory receptors, yet the developmental roots and functional consequences of this diversity remain unknown. Because NK cells are critical for antiviral and antitumor responses, a better understanding of their diversity could lead to an improved ability to harness them therapeutically. We found that NK diversity is lower at birth than in adults. During an antiviral response to either HIV-1 or West Nile virus, NK diversity increases, resulting in terminal differentiation and cytokine production at the cost of cell division and degranulation. In African women matched for HIV-1 exposure risk, high NK diversity is associated with increased risk of HIV-1 acquisition. Existing diversity may therefore decrease the flexibility of the antiviral response. Collectively, the data reveal that human NK diversity is a previously undefined metric of immune history and function that may be clinically useful in forecasting the outcomes of infection and malignancy.

    View details for DOI 10.1126/scitranslmed.aac5722

    View details for Web of Science ID 000358738700007

  • de Finetti Priors using Markov chain Monte Carlo computations STATISTICS AND COMPUTING Bacallado, S., Diaconis, P., Holmes, S. 2015; 25 (4): 797-808

    Abstract

    Recent advances in Monte Carlo methods allow us to revisit work by de Finetti who suggested the use of approximate exchangeability in the analyses of contingency tables. This paper gives examples of computational implementations using Metropolis Hastings, Langevin and Hamiltonian Monte Carlo to compute posterior distributions for test statistics relevant for testing independence, reversible or three way models for discrete exponential families using polynomial priors and Gröbner bases.

    View details for DOI 10.1007/s11222-015-9562-9

    View details for Web of Science ID 000356828600009

    View details for PubMedCentralID PMC4578810

    View details for PubMedID 26412947

  • Pregnancy does not attenuate the antibody or plasmablast response to inactivated influenza vaccine Bayless, N., Kay, A., Fukuyama, J., Aziz, N., Dekker, C., Mackey, S., Swan, G., Davis, M., Holmes, S., Blish, C. AMER ASSOC IMMUNOLOGISTS. 2015
  • Shiny-phyloseq: Web application for interactive microbiome analysis with provenance tracking. Bioinformatics McMurdie, P. J., Holmes, S. 2015; 31 (2): 282-283

    Abstract

    We have created a Shiny-based web application, called Shiny-phyloseq, for dynamic interaction with microbiome data that runs on any modern web-browser, and requires no programming to use - increasing the accessibility and decreasing the entrance requirement to using phyloseq and related R tools. Along with a data- and context-aware dynamic interface for exploring the effects of parameter and method choices, Shiny-phyloseq also records the complete user-input and subsequent graphical results of a user's session, allowing the user to archive, share, and reproduce the sequence of steps that created their result - without writing any new code themselves. Availability and Implementation: Shiny-phyloseq is implemented entirely in the R language. It can be hosted/launched by any system with R installed, including Windows, Mac OS, and most Linux distributions. Information technology administrators can also host Shiny-phyloseq from a remote server, in which case users need only have a web-browser installed. Shiny-phyloseq is provided free of charge under a GPL-3 open-source license through GitHub at http://joey711.github.io/shiny-phyloseq/. Contact: mcmurdie@stanford.edu.

    View details for DOI 10.1093/bioinformatics/btu616

    View details for PubMedID 25262154

  • Biological Threats MATHEMATICS OF PLANET EARTH: MATHEMATICIANS REFLECT ON HOW TO DISCOVER, ORGANIZE, AND PROTECT OUR PLANET Lewis, M., Crowley, J., Muthukumaraswamy, K., Conway, J. M., Basor, E., Diaconis, P., Holmes, S., Smith, R., Kaper, H., Rousseau, C. 2015; 140: 173–86
  • Enhanced natural killer-cell and T-cell responses to influenza A virus during pregnancy PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kay, A. W., Fukuyama, J., Aziz, N., Dekker, C. L., Mackey, S., Swan, G. E., Davis, M. M., Holmes, S., Blish, C. A. 2014; 111 (40): 14506-14511

    Abstract

    Pregnant women experience increased morbidity and mortality after influenza infection, for reasons that are not understood. Although some data suggest that natural killer (NK)- and T-cell responses are suppressed during pregnancy, influenza-specific responses have not been previously evaluated. Thus, we analyzed the responses of women that were pregnant (n = 21) versus those that were not (n = 29) immediately before inactivated influenza vaccination (IIV), 7 d after vaccination, and 6 wk postpartum. Expression of CD107a (a marker of cytolysis) and production of IFN-γ and macrophage inflammatory protein (MIP) 1β were assessed by flow cytometry. Pregnant women had a significantly increased percentage of NK cells producing a MIP-1β response to pH1N1 virus compared with nonpregnant women pre-IIV [median, 6.66 vs. 0.90% (P = 0.0149)] and 7 d post-IIV [median, 11.23 vs. 2.81% (P = 0.004)], indicating a heightened chemokine response in pregnant women that was further enhanced by the vaccination. Pregnant women also exhibited significantly increased T-cell production of MIP-1β and polyfunctionality in NK and T cells to pH1N1 virus pre- and post-IIV. NK- and T-cell polyfunctionality was also enhanced in pregnant women in response to the H3N2 viral strain. In contrast, pregnant women had significantly reduced NK- and T-cell responses to phorbol 12-myristate 13-acetate and ionomycin. This type of stimulation led to the conclusion that NK- and T-cell responses during pregnancy are suppressed, but clearly this conclusion is not correct relative to the more biologically relevant assays described here. Robust cellular immune responses to influenza during pregnancy could drive pulmonary inflammation, explaining increased morbidity and mortality.

    View details for DOI 10.1073/pnas.1416569111

    View details for Web of Science ID 000342633900054

  • structSSI: Simultaneous and Selective Inference for Grouped or Hierarchically Structured Data. Journal of statistical software Sankaran, K., Holmes, S. 2014; 59 (13): 1-21

    Abstract

    The R package structSSI provides an accessible implementation of two recently developed simultaneous and selective inference techniques: the group Benjamini-Hochberg and hierarchical false discovery rate procedures. Unlike many multiple testing schemes, these methods specifically incorporate existing information about the grouped or hierarchical dependence between hypotheses under consideration while controlling the false discovery rate. Doing so increases statistical power and interpretability. Furthermore, these procedures provide novel approaches to the central problem of encoding complex dependency between hypotheses. We briefly describe the group Benjamini-Hochberg and hierarchical false discovery rate procedures and then illustrate them using two examples, one a measure of ecological microbial abundances and the other a global temperature time series. For both procedures, we detail the steps associated with the analysis of these particular data sets, including establishing the dependence structures, performing the test, and interpreting the results. These steps are encapsulated by R functions, and we explain their applicability to general data sets.

    View details for DOI 10.18637/jss.v059.i13

    View details for Web of Science ID 000341807500001

    View details for PubMedID 26917999

    View details for PubMedCentralID PMC4764101

  • Connections and Extensions: A Discussion of the Paper by Girolami and Byrne SCANDINAVIAN JOURNAL OF STATISTICS Diaconis, P., Seiler, C., Holmes, S. 2014; 41 (1): 3–7

    View details for DOI 10.1111/sjos.12070

    View details for Web of Science ID 000331606500002

  • HIV-1 transmission networks in a small world. Journal of infectious diseases Pennings, P. S., Holmes, S. P., Shafer, R. W. 2014; 209 (2): 180-182

    View details for DOI 10.1093/infdis/jit525

    View details for PubMedID 24151310

    View details for PubMedCentralID PMC3873789

  • Positive Curvature and Hamiltonian Monte Carlo Seiler, C., Rubinstein-Salzedo, S., Holmes, S., Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., Weinberger, K. Q. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2014
  • Nasal Microenvironments and Interspecific Interactions Influence Nasal Microbiota Complexity and S. aureus Carriage. Cell host & microbe Yan, M., Pamp, S. J., Fukuyama, J., Hwang, P. H., Cho, D., Holmes, S., Relman, D. A. 2013; 14 (6): 631-640

    Abstract

    The indigenous microbiota of the nasal cavity plays important roles in human health and disease. Patterns of spatial variation in microbiota composition may help explain Staphylococcus aureus colonization and reveal interspecies and species-host interactions. To assess the biogeography of the nasal microbiota, we sampled healthy subjects, representing both S. aureus carriers and noncarriers at three nasal sites (anterior naris, middle meatus, and sphenoethmoidal recess). Phylogenetic compositional and sparse linear discriminant analyses revealed communities that differed according to site epithelium type and S. aureus culture-based carriage status. Corynebacterium accolens and C. pseudodiphtheriticum were identified as the most important microbial community determinants of S. aureus carriage, and competitive interactions were only evident at sites with ciliated pseudostratified columnar epithelium. In vitro cocultivation experiments provided supporting evidence of interactions among these species. These results highlight spatial variation in nasal microbial communities and differences in community composition between S. aureus carriers and noncarriers.

    View details for DOI 10.1016/j.chom.2013.11.005

    View details for PubMedID 24331461

  • Detection of cytomegalovirus drug resistance mutations by next-generation sequencing. Journal of clinical microbiology Sahoo, M. K., Lefterova, M. I., Yamamoto, F., Waggoner, J. J., Chou, S., Holmes, S. P., Anderson, M. W., Pinsky, B. A. 2013; 51 (11): 3700-3710

    Abstract

    Antiviral therapy for cytomegalovirus (CMV) plays an important role in the clinical management of solid organ and hematopoietic stem cell transplant recipients. However, CMV antiviral therapy can be complicated by drug resistance associated with mutations in the phosphotransferase UL97 and the DNA polymerase UL54. We have developed an amplicon-based high-throughput sequencing strategy for detecting CMV drug resistance mutations in clinical plasma specimens using a microfluidics PCR platform for multiplexed library preparation and a benchtop next-generation sequencing instrument. Plasmid clones of the UL97 and UL54 genes were used to demonstrate the low overall empirical error rate of the assay (0.189%) and to develop a statistical algorithm for identifying authentic low-abundance variants. The ability of the assay to detect resistance mutations was tested with mixes of wild-type and mutant plasmids, as well as clinical CMV isolates and plasma samples that were known to contain mutations that confer resistance. Finally, 48 clinical plasma specimens with a range of viral loads (394 to 2,191,011 copies/ml plasma) were sequenced using multiplexing of up to 24 specimens per run. This led to the identification of seven resistance mutations, three of which were present in <20% of the sequenced population. Thus, this assay offers more sensitive detection of minor variants and a higher multiplexing capacity than current methods for the genotypic detection of CMV drug resistance mutations.

    View details for DOI 10.1128/JCM.01605-13

    View details for PubMedID 23985916

  • Genetically dictated change in host mucus carbohydrate landscape exerts a diet-dependent effect on the gut microbiota PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kashyap, P. C., Marcobal, A., Ursell, L. K., Smits, S. A., Sonnenburg, E. D., Costello, E. K., Higginbottom, S. K., Domino, S. E., Holmes, S. P., Relman, D. A., Knight, R., Gordon, J. I., Sonnenburg, J. L. 2013; 110 (42): 17059-17064

    Abstract

    We investigate how host mucus glycan composition interacts with dietary carbohydrate content to influence the composition and expressed functions of a human gut community. The humanized gnotobiotic mice mimic humans with a nonsecretor phenotype due to knockout of their α1-2 fucosyltransferase (Fut2) gene. The fecal microbiota of Fut2(-) mice that lack fucosylated host glycans show decreased alpha diversity relative to Fut2(+) mice and exhibit significant differences in community composition. A glucose-rich plant polysaccharide-deficient (PD) diet exerted a strong effect on the microbiota membership but eliminated the effect of Fut2 genotype. Additionally fecal metabolites predicted host genotype in mice on a polysaccharide-rich standard diet but not on a PD diet. A more detailed mechanistic analysis of these interactions involved colonization of gnotobiotic Fut2(+) and Fut2(-) mice with Bacteroides thetaiotaomicron, a prominent member of the human gut microbiota known to adaptively forage host mucosal glycans when dietary polysaccharides are absent. Within Fut2(-) mice, the B. thetaiotaomicron fucose catabolic pathway was markedly down-regulated, whereas BT4241-4247, an operon responsive to terminal β-galactose, the precursor that accumulates in the Fut2(-) mice, was significantly up-regulated. These changes in B. thetaiotaomicron gene expression were only evident in mice fed a PD diet, wherein B. thetaiotaomicron relies on host mucus consumption. Furthermore, up-regulation of the BT4241-4247 operon was also seen in humanized Fut2(-) mice. Together, these data demonstrate that differences in host genotype that affect the carbohydrate landscape of the distal gut interact with diet to alter the composition and function of resident microbes in a diet-dependent manner.

    View details for DOI 10.1073/pnas.1306070110

    View details for Web of Science ID 000325634200076

    View details for PubMedID 24062455

    View details for PubMedCentralID PMC3800993

  • Prototypical Recombinant Multi-Protease-Inhibitor-Resistant Infectious Molecular Clones of Human Immunodeficiency Virus Type 1 ANTIMICROBIAL AGENTS AND CHEMOTHERAPY Varghese, V., Mitsuya, Y., Fessel, W. J., Liu, T. F., Melikian, G. L., Katzenstein, D. A., Schiffer, C. A., Holmes, S. P., Shafer, R. W. 2013; 57 (9): 4290-4299

    Abstract

    The many genetic manifestations of HIV-1 protease inhibitor (PI) resistance present challenges to research into the mechanisms of PI-resistance and the assessment of new PIs. To address these challenges, we created a panel of recombinant multi-PI resistant infectious molecular clones designed to represent the spectrum of clinically relevant multi-PI resistant viruses. To assess the representativeness of this panel, we examined the sequences of the panel's viruses in the context of a correlation network of PI-resistance amino acid substitutions in sequences from more than 10,000 patients. The panel of recombinant infectious molecular clones comprised 29 of 41 study-defined PI-resistance amino acid substitutions and 23 of the 27 tightest amino acid substitution clusters. Based on their phenotypic properties, the clones were classified into four groups with increasing cross-resistance to the PIs most commonly used for salvage therapy: lopinavir (LPV), tipranavir (TPV), and darunavir (DRV). The panel of recombinant infectious molecular clones has been made available without restriction through the NIH AIDS Research and Reference Reagent Program. The public availability of the panel makes it possible to compare the inhibitory activity of different PIs with one another. The diversity of the panel and the high-level PI resistance of its clones suggest that investigational PIs active against the clones in this panel will retain antiviral activity against most, if not all clinically relevant PI-resistant viruses.

    View details for DOI 10.1128/AAC.00614-13

    View details for Web of Science ID 000323285500025

    View details for PubMedID 23796938

  • ANALYSIS OF CASINO SHELF SHUFFLING MACHINES ANNALS OF APPLIED PROBABILITY Diaconis, P., Fulman, J., Holmes, S. 2013; 23 (4): 1692-1720

    View details for DOI 10.1214/12-AAP884

    View details for Web of Science ID 000321678200014

  • Harvester ants use interactions to regulate forager activation and availability ANIMAL BEHAVIOUR Pinter-Wollman, N., Bala, A., Merrell, A., Queirolo, J., Stumpe, M. C., Holmes, S., Gordon, D. M. 2013; 86 (1): 197-207

    Abstract

    Social groups balance flexibility and robustness in their collective response to environmental changes using feedback between behavioural processes that operate at different timescales. Here we examine how behavioural processes operating at two timescales regulate the foraging activity of colonies of the harvester ant, Pogonomyrmex barbatus, allowing them to balance their response to food availability and predation. Previous work showed that the rate at which foragers return to the nest with food influences the rate at which foragers leave the nest. To investigate how interactions inside the nest link the rates of returning and outgoing foragers, we observed outgoing foragers inside the nest in field colonies using a novel observation method. We found that the interaction rate experienced by outgoing foragers inside the nest corresponded to forager return rate, and that the interactions of outgoing foragers were spatially clustered. Activation of a forager occurred on the timescale of seconds: a forager left the nest 3-8 s after a substantial increase in interactions with returning foragers. The availability of outgoing foragers to become activated was adjusted on the timescale of minutes: when forager return was interrupted for more than 4-5 min, available foragers waiting near the nest entrance went deeper into the nest. Thus, forager activation and forager availability both increased with the rate at which foragers returned to the nest. This process was checked by negative feedback between forager activation and forager availability. Regulation of foraging activation on the timescale of seconds provides flexibility in response to fluctuations in food abundance, whereas regulation of forager availability on the timescale of minutes provides robustness in response to sustained disturbance such as predation.

    View details for DOI 10.1016/j.anbehav.2013.05.012

    View details for Web of Science ID 000321758700026

    View details for PubMedID 24031094

    View details for PubMedCentralID PMC3767282

  • Nucleoside reverse transcriptase inhibitor resistance mutations associated with first-line stavudine-containing antiretroviral therapy: programmatic implications for countries phasing out stavudine. Journal of infectious diseases Tang, M. W., Rhee, S., Bertagnolio, S., Ford, N., Holmes, S., Sigaloff, K. C., Hamers, R. L., de Wit, T. F., Fleury, H. J., Kanki, P. J., Ruxrungtham, K., Hawkins, C. A., Wallis, C. L., Stevens, W., van Zyl, G. U., Manosuthi, W., Hosseinipour, M. C., Ngo-Giang-Huong, N., Belec, L., Peeters, M., Aghokeng, A., Bunupuradah, T., Burda, S., Cane, P., Cappelli, G., Charpentier, C., Dagnra, A. Y., Deshpande, A. K., El-Katib, Z., Eshleman, S. H., Fokam, J., Gody, J., Katzenstein, D., Koyalta, D. D., Kumwenda, J. J., Lallemant, M., Lynen, L., Marconi, V. C., Margot, N. A., Moussa, S., Ndung'u, T., Nyambi, P. N., Orrell, C., Schapiro, J. M., Schuurman, R., Sirivichayakul, S., Smith, D., Zolfo, M., Jordan, M. R., Shafer, R. W. 2013; 207: S70-7

    Abstract

    Background: The World Health Organization Antiretroviral Treatment Guidelines recommend phasing-out stavudine because of its risk of long-term toxicity. There are two mutational pathways of stavudine resistance with different implications for zidovudine and tenofovir cross-resistance, the primary candidates for replacing stavudine. However, because resistance testing is rarely available in resource-limited settings, it is critical to identify the cross-resistance patterns associated with first-line stavudine failure. Methods: We analyzed HIV-1 resistance mutations following first-line stavudine failure from 35 publications comprising 1,825 individuals. We also assessed the influence of concomitant nevirapine vs. efavirenz, therapy duration, and HIV-1 subtype on the proportions of mutations associated with zidovudine vs. tenofovir cross-resistance. Results: Mutations with preferential zidovudine activity, K65R or K70E, occurred in 5.3% of individuals. Mutations with preferential tenofovir activity, ≥two thymidine analog mutations (TAMs) or Q151M, occurred in 22% of individuals. Nevirapine increased the risk of TAMs, K65R, and Q151M. Longer therapy increased the risk of TAMs and Q151M but not K65R. Subtype C and CRF01_AE increased the risk of K65R, but only CRF01_AE increased the risk of K65R without Q151M. Conclusions: Regardless of concomitant nevirapine vs. efavirenz, therapy duration, or subtype, tenofovir was more likely than zidovudine to retain antiviral activity following first-line d4T therapy.

    View details for DOI 10.1093/infdis/jit114

    View details for PubMedID 23687292

    View details for PubMedCentralID PMC3657117

  • Interval Graph Limits ANNALS OF COMBINATORICS Diaconis, P., Holmes, S., Janson, S. 2013; 17 (1): 27-52

    Abstract

    We work out a graph limit theory for dense interval graphs. The theory developed departs from the usual description of a graph limit as a symmetric function W (x, y) on the unit square, with x and y uniform on the interval (0, 1). Instead, we fix a W and change the underlying distribution of the coordinates x and y. We find choices such that our limits are continuous. Connections to random interval graphs are given, including some examples. We also show a continuity result for the chromatic number and clique number of interval graphs. Some results on uniqueness of the limit description are given for general graph limits.

    View details for DOI 10.1007/s00026-012-0175-0

    View details for Web of Science ID 000319358600003

    View details for PubMedID 26405368

    View details for PubMedCentralID PMC4578824

  • phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PloS one McMurdie, P. J., Holmes, S. 2013; 8 (4)

    Abstract

    The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

    View details for DOI 10.1371/journal.pone.0061217

    View details for PubMedID 23630581

    View details for PubMedCentralID PMC3632530

  • Advancing Our Understanding of the Human Microbiome Using QIIME. Methods in enzymology Navas-Molina, J. A., Peralta-Sánchez, J. M., González, A., McMurdie, P. J., Vázquez-Baeza, Y., Xu, Z., Ursell, L. K., Lauber, C., Zhou, H., Song, S. J., Huntley, J., Ackermann, G. L., Berg-Lyons, D., Holmes, S., Caporaso, J. G., Knight, R. 2013; 531: 371-444

    Abstract

    High-throughput DNA sequencing technologies, coupled with advanced bioinformatics tools, have enabled rapid advances in microbial ecology and our understanding of the human microbiome. QIIME (Quantitative Insights Into Microbial Ecology) is an open-source bioinformatics software package designed for microbial community analysis based on DNA sequence data, which provides a single analysis framework for analysis of raw sequence data through publication-quality statistical analyses and interactive visualizations. In this chapter, we demonstrate the use of the QIIME pipeline to analyze microbial communities obtained from several sites on the bodies of transgenic and wild-type mice, as assessed using 16S rRNA gene sequences generated on the Illumina MiSeq platform. We present our recommended pipeline for performing microbial community analysis and provide guidelines for making critical choices in the process. We present examples of some of the types of analyses that are enabled by QIIME and discuss how other tools, such as phyloseq and R, can be applied to expand upon these analyses.

    View details for DOI 10.1016/B978-0-12-407863-5.00019-8

    View details for PubMedID 24060131

  • Sampling from a Manifold Festschrift for Joe Eaton Diaconis, P., Holmes, S., Shahshahani, M. IMS. 2013
  • Low-Level Persistence of Drug Resistance Mutations in Hepatitis B Virus-Infected Subjects with a Past History of Lamivudine Treatment ANTIMICROBIAL AGENTS AND CHEMOTHERAPY Margeridon-Thermet, S., Svarovskaia, E. S., Babrzadeh, F., Martin, R., Liu, T. F., Pacold, M., Reuman, E. C., Holmes, S. P., Borroto-Esoda, K., Shafer, R. W. 2013; 57 (1): 343-349

    Abstract

    We sought to determine the prevalence of hepatitis B virus (HBV) lamivudine (LAM)-resistant minority variants in subjects who once received LAM but had discontinued it prior to virus sampling. We performed direct PCR Sanger sequencing and ultradeep pyrosequencing (UDPS) of HBV reverse transcriptase (RT) of plasma viruses from 45 LAM-naive subjects and 46 LAM-experienced subjects who had discontinued LAM a median of 24 months earlier. UDPS was performed to a depth of ∼3,000 reads per nucleotide. Minority variants were defined as differences from the Sanger sequence present in ≥0.5% of UDPS reads in a sample. Sanger sequencing identified ≥1 LAM resistance mutations (rtL80I/V, rtM204I, and rtA181T) in samples from 5 (11%) of 46 LAM-experienced and none of 45 LAM-naive subjects (0%; P = 0.06). UDPS detected ≥1 LAM resistance mutations (rtL80I/V, rtV173L, rtL180M, rtA181T, and rtM204I/V) in 10 (22%) of the 46 LAM-experienced subjects, including 5 in whom LAM resistance mutations were not identified by Sanger sequencing. Overall, LAM resistance mutations were more likely to be present in LAM-experienced (10/46, 22%) than LAM-naive subjects (0/45, 0%; P = 0.001). The median time since LAM discontinuation was 12.8 months in the 10 subjects with a LAM resistance mutation compared to 30.5 months in the 36 LAM-experienced subjects without a LAM resistance mutation (P < 0.001). The likelihood of detecting a LAM resistance mutation was significantly increased using UDPS compared to Sanger sequencing and was inversely associated with the time since LAM discontinuation.

    View details for DOI 10.1128/AAC.01601-12

    View details for Web of Science ID 000312958400044

    View details for PubMedID 23114756

    View details for PubMedCentralID PMC3535911

  • PRC2/EED-EZH2 Complex Is Up-Regulated in Breast Cancer Lymph Node Metastasis Compared to Primary Tumor and Correlates with Tumor Proliferation In Situ PLOS ONE Yu, H., Simons, D. L., Segall, I., Carcamo-Cavazos, V., Schwartz, E. J., Yan, N., Zuckerman, N. S., Dirbas, F. M., Johnson, D. L., Holmes, S. P., Lee, P. P. 2012; 7 (12)

    Abstract

    Lymph node metastasis is a key event in the progression of breast cancer. Therefore it is important to understand the underlying mechanisms which facilitate regional lymph node metastatic progression. We performed gene expression profiling of purified tumor cells from human breast tumor and lymph node metastasis. By microarray network analysis, we found an increased expression of polycomb repression complex 2 (PRC2) core subunits EED and EZH2 in lymph node metastatic tumor cells over primary tumor cells which were validated through real-time PCR. Additionally, immunohistochemical (IHC) staining and quantitative image analysis of whole tissue sections showed a significant increase of EZH2 expressing tumor cells in lymph nodes over paired primary breast tumors, which strongly correlated with tumor cell proliferation in situ. We further explored the mechanisms of PRC2 gene up-regulation in metastatic tumor cells and found up-regulation of E2F genes, MYC targets and down-regulation of tumor suppressor gene E-cadherin targets in lymph node metastasis through GSEA analyses. Using IHC, the expression of potential EZH2 target, E-cadherin was examined in paired primary/lymph node samples and was found to be significantly decreased in lymph node metastases over paired primary tumors. This study identified an over expression of the epigenetic silencing complex PRC2/EED-EZH2 in breast cancer lymph node metastasis as compared to primary tumor and its positive association with tumor cell proliferation in situ. Concurrently, PRC2 target protein E-cadherin was significantly decreased in lymph node metastases, suggesting PRC2 promotes epithelial mesenchymal transition (EMT) in lymph node metastatic process through repression of E-cadherin. These results indicate that epigenetic regulation mediated by PRC2 proteins may provide additional advantage for the outgrowth of metastatic tumor cells in lymph nodes. This opens up epigenetic drug development possibilities for the treatment and prevention of lymph node metastasis in breast cancer.

    View details for DOI 10.1371/journal.pone.0051239

    View details for PubMedID 23251464

  • Denoising PCR-amplified metagenome data BMC BIOINFORMATICS Rosen, M. J., Callahan, B. J., Fisher, D. S., Holmes, S. P. 2012; 13

    Abstract

    PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy. We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche's 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise. DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.

    View details for DOI 10.1186/1471-2105-13-283

    View details for Web of Science ID 000314687600001

    View details for PubMedID 23113967

    View details for PubMedCentralID PMC3563472

  • Nest site and weather affect the personality of harvester ant colonies. Behavioral ecology : official journal of the International Society for Behavioral Ecology Pinter-Wollman, N., Gordon, D. M., Holmes, S. 2012; 23 (5): 1022-1029

    Abstract

    Environmental conditions and physical constraints both influence an animal's behavior. We investigate whether behavioral variation among colonies of the black harvester ant, Messor andrei, remains consistent across foraging and disturbance situations and ask whether consistent colony behavior is affected by nest site and weather. We examined variation among colonies in responsiveness to food baits and to disturbance, measured as a change in numbers of active ants, and in the speed with which colonies retrieved food and removed debris. Colonies differed consistently, across foraging and disturbance situations, in both responsiveness and speed. Increased activity in response to food was associated with a smaller decrease in response to alarm. Speed of retrieving food was correlated with speed of removing debris. In all colonies, speed was greater in dry conditions, reducing the amount of time ants spent outside the nest. While a colony occupied a certain nest site, its responsiveness was consistent in both foraging and disturbance situations, suggesting that nest structure influences colony personality.

    View details for DOI 10.1093/beheco/ars066

    View details for Web of Science ID 000308228200017

    View details for PubMedID 22936841

    View details for PubMedCentralID PMC3431114

  • Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees. Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America Chakerian, J., Holmes, S. 2012; 21 (3): 581-599

    Abstract

    Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics, and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and "generalizability" of these summaries. This article describes the implementation of the geometric distance between trees developed by Billera, Holmes, and Vogtmann (2001) equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in evaluating tree estimates. In particular, since Billera et al. (2001) have shown that the space of trees is negatively curved (called a CAT(0) space), a collection of trees can naturally be represented as a tree. We compare this representation to the Euclidean approximations of treespace made available through both a classical multidimensional scaling and a Kernel multidimensional scaling of the matrix of the distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence of both certain columns (positions, variables, or genes) and certain rows (species, observations, or arrays) on the construction of such trees. It also can provide a way of detecting heterogeneous mixtures in the input data. Supplementary materials for this article are available online.

    View details for DOI 10.1080/10618600.2012.640901

    View details for PubMedID 32982128

    View details for PubMedCentralID PMC7518125

  • The Molecular Architecture of the Eukaryotic Chaperonin TRiC/CCT STRUCTURE Leitner, A., Joachimiak, L. A., Bracher, A., Moenkemeyer, L., Walzthoeni, T., Chen, B., Pechmann, S., Holmes, S., Cong, Y., Ma, B., Ludtke, S., Chiu, W., Hartl, F. U., Aebersold, R., Frydman, J. 2012; 20 (5): 814-825

    Abstract

    TRiC/CCT is a highly conserved and essential chaperonin that uses ATP cycling to facilitate folding of approximately 10% of the eukaryotic proteome. This 1 MDa hetero-oligomeric complex consists of two stacked rings of eight paralogous subunits each. Previously proposed TRiC models differ substantially in their subunit arrangements and ring register. Here, we integrate chemical crosslinking, mass spectrometry, and combinatorial modeling to reveal the definitive subunit arrangement of TRiC. In vivo disulfide mapping provided additional validation for the crosslinking-derived arrangement as the definitive TRiC topology. This subunit arrangement allowed the refinement of a structural model using existing X-ray diffraction data. The structure described here explains all available crosslink experiments, provides a rationale for previously unexplained structural features, and reveals a surprising asymmetry of charges within the chaperonin folding chamber.

    View details for DOI 10.1016/j.str.2012.03.007

    View details for Web of Science ID 000304214400008

    View details for PubMedID 22503819

    View details for PubMedCentralID PMC3350567

  • Phyloseq: a bioconductor package for handling and analysis of high-throughput phylogenetic sequence data. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing McMurdie, P. J., Holmes, S. 2012: 235-246

    Abstract

    We present a detailed description of a new Bioconductor package, phyloseq, for integrated data and analysis of taxonomically-clustered phylogenetic sequencing data in conjunction with related data types. The phyloseq package integrates abundance data, phylogenetic information and covariates so that exploratory transformations, plots, and confirmatory testing and diagnostic plots can be carried out seamlessly. The package is built following the S4 object-oriented framework of the R language so that once the data have been input the user can easily transform, plot and analyze the data. We present some examples that highlight the methods and the ease with which we can leverage existing packages.

    View details for PubMedID 22174279

  • Comparisons of distance methods for combining covariates and abundances in microbiome studies. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Fukuyama, J., McMurdie, P. J., Dethlefsen, L., Relman, D. A., Holmes, S. 2012: 213-224

    Abstract

    This article compares different methods for combining abundance data, phylogenetic trees and clinical covariates in a nonparametric setting. In particular we study the output from the principal coordinates analysis on UNIFRAC and WEIGHTED UNIFRAC distances and the output from a double principal coordinate analysis (DPCoA) using distances computed on the phylogenetic tree. We also present power comparisons for some of the standard tests of phylogenetic signal between different types of samples. These methods are compared both on simulated and real data sets. Our study shows that DPCoA is less robust to outliers, and more robust to small noisy fluctuations around zero.

    View details for PubMedID 22174277

  • A multifaceted analysis of HIV-1 protease multidrug resistance phenotypes BMC BIOINFORMATICS Doherty, K. M., Nakka, P., King, B. M., Rhee, S., Holmes, S. P., Shafer, R. W., Radhakrishnan, M. L. 2011; 12

    Abstract

    Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to arise within the population. Understanding the phenotypic and genotypic patterns responsible for multi-PI resistance is necessary for developing PIs that are active against clinically-relevant PI-resistant HIV-1 variants. In this work, we use globally optimal integer programming-based clustering techniques to elucidate multi-PI phenotypic resistance patterns using a data set of 398 HIV-1 protease sequences that have each been phenotyped for susceptibility toward the nine clinically-approved HIV-1 PIs. We validate the information content of the clusters by evaluating their ability to predict the level of decreased susceptibility to each of the available PIs using a cross validation procedure. We demonstrate the finding that as a result of phenotypic cross resistance, the considered clinical HIV-1 protease isolates are confined to ~6% or less of the clinically-relevant phenotypic space. Clustering and feature selection methods are used to find representative sequences and mutations for major resistance phenotypes to elucidate their genotypic signatures. We show that phenotypic similarity does not imply genotypic similarity, that different PI-resistance mutation patterns can give rise to HIV-1 isolates with similar phenotypic profiles. Rather than characterizing HIV-1 susceptibility toward each PI individually, our study offers a unique perspective on the phenomenon of PI class resistance by uncovering major multidrug-resistant phenotypic patterns and their often diverse genotypic determinants, providing a methodology that can be applied to understand clinically-relevant phenotypic patterns to aid in the design of novel inhibitors that target other rapidly evolving molecular targets as well.

    View details for DOI 10.1186/1471-2105-12-477

    View details for Web of Science ID 000301382000001

    View details for PubMedID 22172090

    View details for PubMedCentralID PMC3305535

  • THE DUALITY DIAGRAM IN DATA ANALYSIS: EXAMPLES OF MODERN APPLICATIONS ANNALS OF APPLIED STATISTICS De la Cruz, O., Holmes, S. 2011; 5 (4): 2266-2277

    Abstract

    Today's data-heavy research environment requires the integration of different sources of information into structured datasets that cannot be analyzed as simple matrices. We introduce an old technique, known in the European data analysis circles as the Duality Diagram Approach, put to new uses through the use of a variety of metrics and ways of combining different diagrams together. This issue of the Annals of Applied Statistics contains contemporary examples of how this approach provides solutions to hard problems in data integration. We present here the genesis of the technique and how it can be seen as a precursor of the modern kernel based approaches.

    View details for DOI 10.1214/10-AOAS408

    View details for Web of Science ID 000300382800002

    View details for PubMedCentralID PMC3265363

  • The effect of individual variation on the structure and function of interaction networks in harvester ants JOURNAL OF THE ROYAL SOCIETY INTERFACE Pinter-Wollman, N., Wollman, R., Guetz, A., Holmes, S., Gordon, D. M. 2011; 8 (64): 1562-1573

    Abstract

    Social insects exhibit coordinated behaviour without central control. Local interactions among individuals determine their behaviour and regulate the activity of the colony. Harvester ants are recruited for outside work, using networks of brief antennal contacts, in the nest chamber closest to the nest exit: the entrance chamber. Here, we combine empirical observations, image analysis and computer simulations to investigate the structure and function of the interaction network in the entrance chamber. Ant interactions were distributed heterogeneously in the chamber, with an interaction hot-spot at the entrance leading further into the nest. The distribution of the total interactions per ant followed a right-skewed distribution, indicating the presence of highly connected individuals. Numbers of ant encounters observed positively correlated with the duration of observation. Individuals varied in interaction frequency, even after accounting for the duration of observation. An ant's interaction frequency was explained by its path shape and location within the entrance chamber. Computer simulations demonstrate that variation among individuals in connectivity accelerates information flow to an extent equivalent to an increase in the total number of interactions. Individual variation in connectivity, arising from variation among ants in location and spatial behaviour, creates interaction centres, which may expedite information flow.

    View details for DOI 10.1098/rsif.2011.0059

    View details for Web of Science ID 000295211200003

    View details for PubMedID 21490001

    View details for PubMedCentralID PMC3177612

  • Adaptive importance sampling for network growth models ANNALS OF OPERATIONS RESEARCH Guetz, A. N., Holmes, S. P. 2011; 189 (1): 187-203

    Abstract

    Network Growth Models such as Preferential Attachment and Duplication/Divergence are popular generative models with which to study complex networks in biology, sociology, and computer science. However, analyzing them within the framework of model selection and statistical inference is often complicated and computationally difficult, particularly when comparing models that are not directly related or nested. In practice, ad hoc methods are often used with uncertain results. If possible, the use of standard likelihood-based statistical model selection techniques is desirable. With this in mind, we develop an Adaptive Importance Sampling algorithm for estimating likelihoods of Network Growth Models. We introduce the use of the classic Plackett-Luce model of rankings as a family of importance distributions. Updates to importance distributions are performed iteratively via the Cross-Entropy Method with an additional correction for degeneracy/over-fitting inspired by the Minimum Description Length principle. This correction can be applied to other estimation problems using the Cross-Entropy method for integration/approximate counting, and it provides an interpretation of Adaptive Importance Sampling as iterative model selection. Empirical results for the Preferential Attachment model are given, along with a comparison to an alternative established technique, Annealed Importance Sampling.

    View details for DOI 10.1007/s10479-010-0685-2

    View details for Web of Science ID 000294689000010

    View details for PubMedID 27182098

    View details for PubMedCentralID PMC4863242

  • Colonic Contribution to Uremic Solutes JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY Aronov, P. A., Luo, F. J., Plummer, N. S., Quan, Z., Holmes, S., Hostetter, T. H., Meyer, T. W. 2011; 22 (9): 1769-1776

    Abstract

    Microbes in the colon produce compounds, normally excreted by the kidneys, which are potential uremic toxins. Although p-cresol sulfate and indoxyl sulfate are well studied examples, few other compounds are known. Here, we compared plasma from hemodialysis patients with and without colons to identify and further characterize colon-derived uremic solutes. HPLC confirmed the colonic origin of p-cresol sulfate and indoxyl sulfate, but levels of hippurate, methylamine, and dimethylamine were not significantly lower in patients without colons. High-resolution mass spectrometry detected more than 1000 features in predialysis plasma samples. Hierarchical clustering based on these features clearly separated dialysis patients with and without colons. Compared with patients with colons, we identified more than 30 individual features in patients without colons that were either absent or present in lower concentration. Almost all of these features were more prominent in plasma from dialysis patients than normal subjects, suggesting that they represented uremic solutes. We used a panel of indole and phenyl standards to identify five colon-derived uremic solutes: α-phenylacetyl-l-glutamine, 5-hydroxyindole, indoxyl glucuronide, p-cresol sulfate, and indoxyl sulfate. However, compounds with accurate mass values matching most of the colon-derived solutes could not be found in standard metabolomic databases. These results suggest that colonic microbes may produce an important portion of uremic solutes, most of which remain unidentified.

    View details for DOI 10.1681/ASN.2010121220

    View details for Web of Science ID 000295705800024

    View details for PubMedID 21784895

    View details for PubMedCentralID PMC3171947

  • Site-Specific Mobilization of Vinyl Chloride Respiration Islands by a Mechanism Common in Dehalococcoides BMC GENOMICS McMurdie, P. J., Hug, L. A., Edwards, E. A., Holmes, S., Spormann, A. M. 2011; 12

    Abstract

    Vinyl chloride is a widespread groundwater pollutant and Group 1 carcinogen. A previous comparative genomic analysis revealed that the vinyl chloride reductase operon, vcrABC, of Dehalococcoides sp. strain VS is embedded in a horizontally-acquired genomic island that integrated at the single-copy tmRNA gene, ssrA. We targeted conserved positions in available genomic islands to amplify and sequence four additional vcrABC-containing genomic islands from previously-unsequenced vinyl chloride respiring Dehalococcoides enrichments. We identified a total of 31 ssrA-specific genomic islands from Dehalococcoides genomic data, accounting for 47 reductive dehalogenase homologous genes and many other non-core genes. Sixteen of these genomic islands contain a syntenic module of integration-associated genes located adjacent to the predicted site of integration, and among these islands, eight contain vcrABC as genetic 'cargo'. These eight vcrABC-containing genomic islands are syntenic across their ~12 kbp length, but have two phylogenetically discordant segments that unambiguously differentiate the integration module from the vcrABC cargo. Using available Dehalococcoides phylogenomic data we estimate that these ssrA-specific genomic islands are at least as old as the Dehalococcoides group itself, which in turn is much older than human civilization. The vcrABC-containing genomic islands are a recently-acquired subset of a diverse collection of ssrA-specific mobile elements that are a major contributor to strain-level diversity in Dehalococcoides, and may have been throughout its evolution. The high similarity between vcrABC sequences is quantitatively consistent with recent horizontal acquisition driven by ~100 years of industrial pollution with chlorinated ethenes.

    View details for DOI 10.1186/1471-2164-12-287

    View details for Web of Science ID 000293280200001

    View details for PubMedID 21635780

    View details for PubMedCentralID PMC3146451

  • Colony variation in the collective regulation of foraging by harvester ants BEHAVIORAL ECOLOGY Gordon, D. M., Guetz, A., Greene, M. J., Holmes, S. 2011; 22 (2): 429-435

    Abstract

    This study investigates variation in collective behavior in a natural population of colonies of the harvester ant, Pogonomyrmex barbatus. Harvester ant colonies regulate foraging activity to adjust to current food availability; the rate at which inactive foragers leave the nest on the next trip depends on the rate at which successful foragers return with food. This study investigates differences among colonies in foraging activity and how these differences are associated with variation among colonies in the regulation of foraging. Colonies differ in the baseline rate at which patrollers leave the nest, without stimulation from returning ants. This baseline rate predicts a colony's foraging activity, suggesting there is a colony-specific activity level that influences how quickly any ant leaves the nest. When a colony's foraging activity is high, the colony is more likely to regulate foraging. Moreover, colonies differ in the propensity to adjust the rate of outgoing foragers to the rate of forager return. Naturally occurring variation in the regulation of foraging may lead to variation in colony survival and reproductive success.

    View details for DOI 10.1093/beheco/arq218

    View details for Web of Science ID 000289299500033

    View details for PubMedID 22479133

    View details for PubMedCentralID PMC3071749

  • PhyloChip microarray analysis reveals altered gastrointestinal microbial communities in a rat model of colonic hypersensitivity NEUROGASTROENTEROLOGY AND MOTILITY Nelson, T. A., Holmes, S., Alekseyenko, A. V., Shenoy, M., DeSantis, T., Wu, C. H., Andersen, G. L., Winston, J., Sonnenburg, J., Pasricha, P. J., Spormann, A. 2011; 23 (2)

    Abstract

    Irritable bowel syndrome (IBS) is a chronic, episodic gastrointestinal disorder that is prevalent in a significant fraction of western human populations; and changes in the microbiota of the large bowel have been implicated in the pathology of the disease. Using a novel comprehensive, high-density DNA microarray (PhyloChip) we performed a phylogenetic analysis of the microbial community of the large bowel in a rat model in which intracolonic acetic acid in neonates was used to induce long lasting colonic hypersensitivity and decreased stool water content and frequency, representing the equivalent of human constipation-predominant IBS. Our results revealed a significantly increased compositional difference in the microbial communities in rats with neonatal irritation as compared with controls. Even more striking was the dramatic change in the ratio of Firmicutes relative to Bacteroidetes, where neonatally irritated rats were enriched more with Bacteroidetes and also contained a different composition of species within this phylum. Our study also revealed differences at the level of bacterial families and species. The PhyloChip is a useful and convenient method to study enteric microflora. Further, this rat model system may be a useful experimental platform to study the causes and consequences of changes in microbial community composition associated with IBS.

    View details for DOI 10.1111/j.1365-2982.2010.01637.x

    View details for Web of Science ID 000286211600017

    View details for PubMedID 21129126

    View details for PubMedCentralID PMC3353725

  • Visualization and statistical comparisons of microbial communities using R packages on Phylochip data. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Holmes, S., Alekseyenko, A., Timme, A., Nelson, T., Pasricha, P. J., Spormann, A. 2011: 142-153

    Abstract

    This article explains the statistical and computational methodology used to analyze species abundances collected using the LNBL Phylochip in a study of Irritable Bowel Syndrome (IBS) in rats. Some tools already available for the analysis of ordinary microarray data are useful in this type of statistical analysis. For instance in correcting for multiple testing we use Family Wise Error rate control and step-down tests (available in the multtest package). Once the most significant species are chosen we use the hypergeometric tests familiar for testing GO categories to test specific phyla and families. We provide examples of normalization, multivariate projections, batch effect detection and integration of phylogenetic covariation, as well as tree equalization and robustification methods.

    View details for PubMedID 21121042

  • Bioinformatics and Data Analysis Handbook of Molecular and Cellular Methods in Biology and Medicine de La Cruz, O., Holmes, S. CRC. 2011; 3rd: 409–436
  • A classification model for G-to-A hypermutation in hepatitis B virus ultra-deep pyrosequencing reads BIOINFORMATICS Reuman, E. C., Margeridon-Thermet, S., Caudill, H. B., Liu, T., Borroto-Esoda, K., Svarovskaia, E. S., Holmes, S. P., Shafer, R. W. 2010; 26 (23): 2929-2932

    Abstract

    G → A hypermutation is an innate antiviral defense mechanism, mediated by host enzymes, which leads to the mutational impairment of viruses. Sensitive and specific identification of host-mediated G → A hypermutation is a novel sequence analysis challenge, particularly for viral deep sequencing studies. For example, two of the most common hepatitis B virus (HBV) reverse transcriptase (RT) drug-resistance mutations, A181T and M204I, arise from G → A changes and are routinely detected as low-abundance variants in nearly all HBV deep sequencing samples. We developed a classification model using measures of G → A excess and predicted indicators of lethal mutation and applied this model to 325 920 unique deep sequencing reads from plasma virus samples from 45 drug treatment-naïve HBV-infected individuals. The 2.9% of sequence reads that were classified as hypermutated by our model included most of the reads with A181T and/or M204I, indicating the usefulness of this model for distinguishing viral adaptive changes from host-mediated viral editing. Source code and sequence data are available at http://hivdb.stanford.edu/pages/resources.html. Contact: ereuman@stanfordalumni.org. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btq570

    View details for Web of Science ID 000284430900001

    View details for PubMedID 20937597

    View details for PubMedCentralID PMC2982158

  • Quantitative, Architectural Analysis of Immune Cell Subsets in Tumor-Draining Lymph Nodes from Breast Cancer Patients and Healthy Lymph Nodes PLOS ONE Setiadi, A. F., Ray, N. C., Kohrt, H. E., Kapelner, A., Carcamo-Cavazos, V., Levic, E. B., Yadegarynia, S., van der Loos, C. M., Schwartz, E. J., Holmes, S., Lee, P. P. 2010; 5 (8)

    Abstract

    To date, pathological examination of specimens remains largely qualitative. Quantitative measures of tissue spatial features are generally not captured. To gain additional mechanistic and prognostic insights, a need for quantitative architectural analysis arises in studying immune cell-cancer interactions within the tumor microenvironment and tumor-draining lymph nodes (TDLNs). We present a novel, quantitative image analysis approach incorporating 1) multi-color tissue staining, 2) high-resolution, automated whole-section imaging, 3) custom image analysis software that identifies cell types and locations, and 4) spatial statistical analysis. As a proof of concept, we applied this approach to study the architectural patterns of T and B cells within tumor-draining lymph nodes from breast cancer patients versus healthy lymph nodes. We found that the spatial grouping patterns of T and B cells differed between healthy and breast cancer lymph nodes, and this could be attributed to the lack of B cell localization in the extrafollicular region of the TDLNs. Our integrative approach has made quantitative analysis of complex visual data possible. Our results highlight spatial alterations of immune cells within lymph nodes from breast cancer patients as an independent variable from numerical changes. This opens up new areas of investigations in research and medicine. Future application of this approach will lead to a better understanding of immune changes in the tumor microenvironment and TDLNs, and how they affect clinical outcomes.

    View details for DOI 10.1371/journal.pone.0012420

    View details for Web of Science ID 000281234700034

    View details for PubMedID 20811638

    View details for PubMedCentralID PMC2928294

  • Constrained patterns of covariation and clustering of HIV-1 non-nucleoside reverse transcriptase inhibitor resistance mutations JOURNAL OF ANTIMICROBIAL CHEMOTHERAPY Reuman, E. C., Rhee, S., Holmes, S. P., Shafer, R. W. 2010; 65 (7): 1477-1485

    Abstract

    We characterized pairwise and higher order patterns of non-nucleoside reverse transcriptase inhibitor (NNRTI)-selected mutations because multiple mutations are usually required for clinically significant resistance to second-generation NNRTIs. We analysed viruses from 13 039 individuals with sequences containing at least one of 52 published NNRTI-selected mutations, including 1133 viruses from individuals who received efavirenz but no other NNRTI and 1510 viruses from individuals who received nevirapine but no other NNRTI. Of the 17 reported etravirine resistance-associated mutations (RAMs), Y181C/I/V, L100I, K101P and M230L were considered major based on published in vitro susceptibility data. Efavirenz preferentially selected for 16 mutations, including L100I (14% versus 0.1%, P < 0.001), K101P (3.3% versus 0.4%, P < 0.001) and M230L (2.8% versus 1.3%, P = 0.004), whereas nevirapine preferentially selected for 12 mutations, including Y181C/I/V (48% versus 6.9%, P < 0.001). Twenty-nine pairs of NNRTI-selected mutations covaried significantly, including Y181C with seven other mutations (A98G, K101E/H, V108I, G190A/S and H221Y), L100I with K103N, and K101P with K103S. Two pairs (Y181C + V179F and Y181C + G190S) were predicted to confer >10-fold decreased etravirine susceptibility. Seventeen percent of sequences had three or more NNRTI-selected mutations, mostly in clusters of covarying mutations. Many clusters had Y181C plus a non-major etravirine RAM; few had more than one major etravirine RAM. Although major etravirine RAMs rarely occur in combination, 2 of 29 pairs of covarying mutations were associated with >10-fold decreased etravirine susceptibility. Viruses with three or more NNRTI-selected mutations often contained Y181C in combination with one or more minor etravirine RAMs; however, phenotypic and clinical correlates for most of these higher order combinations have not been published.

    View details for DOI 10.1093/jac/dkq140

    View details for Web of Science ID 000279926500027

    View details for PubMedID 20462946

    View details for PubMedCentralID PMC2882873

  • Localized Plasticity in the Streamlined Genomes of Vinyl Chloride Respiring Dehalococcoides PLOS GENETICS McMurdie, P. J., Behrens, S. F., Mueller, J. A., Goeke, J., Ritalahti, K. M., Wagner, R., Goltsman, E., Lapidus, A., Holmes, S., Loeffler, F. E., Spormann, A. M. 2009; 5 (11)

    Abstract

    Vinyl chloride (VC) is a human carcinogen and widespread priority pollutant. Here we report the first, to our knowledge, complete genome sequences of microorganisms able to respire VC, Dehalococcoides sp. strains VS and BAV1. Notably, the respective VC reductase encoding genes, vcrAB and bvcAB, were found embedded in distinct genomic islands (GEIs) with different predicted integration sites, suggesting that these genes were acquired horizontally and independently by distinct mechanisms. A comparative analysis that included two previously sequenced Dehalococcoides genomes revealed a contextually conserved core that is interrupted by two high plasticity regions (HPRs) near the Ori. These HPRs contain the majority of GEIs and strain-specific genes identified in the four Dehalococcoides genomes, an elevated number of repeated elements including insertion sequences (IS), as well as 91 of 96 rdhAB, genes that putatively encode terminal reductases in organohalide respiration. Only three core rdhA orthologous groups were identified, and only one of these groups is supported by synteny. The low number of core rdhAB, contrasted with the high rdhAB numbers per genome (up to 36 in strain VS), as well as their colocalization with GEIs and other signatures for horizontal transfer, suggests that niche adaptation via organohalide respiration is a fundamental ecological strategy in Dehalococcoides. This adaptation has been exacted through multiple mechanisms of recombination that are mainly confined within HPRs of an otherwise remarkably stable, syntenic, streamlined genome among the smallest of any free-living microorganism.

    View details for DOI 10.1371/journal.pgen.1000714

    View details for Web of Science ID 000272419500010

    View details for PubMedID 19893622

    View details for PubMedCentralID PMC2764846

  • Nonpolymorphic Human Immunodeficiency Virus Type 1 Protease and Reverse Transcriptase Treatment-Selected Mutations ANTIMICROBIAL AGENTS AND CHEMOTHERAPY Shahriar, R., Rhee, S., Liu, T. F., Fessel, W. J., Scarsella, A., Towner, W., Holmes, S. P., Zolopa, A. R., Shafer, R. W. 2009; 53 (11): 4869-4878

    Abstract

    The spectrum of human immunodeficiency virus type 1 (HIV-1) protease and reverse transcriptase (RT) mutations selected by antiretroviral (ARV) drugs requires ongoing reassessment as ARV treatment patterns evolve and increasing numbers of protease and RT sequences of different viral subtypes are published. Accordingly, we compared the prevalences of protease and RT mutations in HIV-1 group M sequences from individuals with and without a history of previous treatment with protease inhibitors (PIs) or RT inhibitors (RTIs). Mutations in protease sequences from 26,888 individuals and in RT sequences from 25,695 individuals were classified according to whether they were nonpolymorphic in untreated individuals and whether their prevalence increased fivefold with ARV therapy. This analysis showed that 88 PI-selected and 122 RTI-selected nonpolymorphic mutations had a prevalence that was fivefold higher in individuals receiving ARVs than in ARV-naïve individuals. This was an increase of 47% and 77%, respectively, compared with the 60 PI- and 69 RTI-selected mutations identified in a similar analysis that we published in 2005 using subtype B sequences obtained from one-fourth as many individuals. In conclusion, many nonpolymorphic mutations in protease and RT are under ARV selection pressure. The spectrum of treatment-selected mutations is changing as data for more individuals are collected, treatment exposures change, and the number of available sequences from non-subtype B viruses increases.

    View details for DOI 10.1128/AAC.00592-09

    View details for Web of Science ID 000270881200040

    View details for PubMedID 19721070

    View details for PubMedCentralID PMC2772298

  • Impaired interferon signaling is a common immune defect in human cancer PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Critchley-Thorne, R. J., Simons, D. L., Yan, N., Miyahira, A. K., Dirbas, F. M., Johnson, D. L., Swetter, S. M., Carlson, R. W., Fisher, G. A., Koong, A., Holmes, S., Lee, P. P. 2009; 106 (22): 9010-9015

    Abstract

    Immune dysfunction develops in patients with many cancer types and may contribute to tumor progression and failure of immunotherapy. Mechanisms underlying cancer-associated immune dysfunction are not fully understood. Efficient IFN signaling is critical to lymphocyte function; animals rendered deficient in IFN signaling develop cancer at higher rates. We hypothesized that altered IFN signaling may be a key mechanism of immune dysfunction common to cancer. To address this, we assessed the functional responses to IFN in peripheral blood lymphocytes from patients with 3 major cancers: breast cancer, melanoma, and gastrointestinal cancer. Type-I IFN (IFN-alpha)-induced signaling was reduced in T cells and B cells from all 3 cancer patient groups compared to healthy controls. Type-II IFN (IFN-gamma)-induced signaling was reduced in B cells from all 3 cancer patient groups, but not in T cells or natural killer cells. Impaired IFN signaling was equally evident in stage II, III, and IV breast cancer patients, and downstream functional defects in T cell activation were identified. Taken together, these findings indicate that defects in lymphocyte IFN signaling arise in patients with breast cancer, melanoma, and gastrointestinal cancer, and these defects may represent a common cancer-associated mechanism of immune dysfunction.

    View details for DOI 10.1073/pnas.0901329106

    View details for PubMedID 19451644

  • How site fidelity leads to individual differences in the foraging activity of harvester ants BEHAVIORAL ECOLOGY Beverly, B. D., McLendon, H., Nacu, S., Holmes, S., Gordon, D. M. 2009; 20 (3): 633-638
  • Ultra-Deep Pyrosequencing of Hepatitis B Virus Quasispecies from Nucleoside and Nucleotide Reverse-Transcriptase Inhibitor (NRTI)-Treated Patients and NRTI-Naive Patients 15th Conference on Retroviruses and Opportunistic Infections Margeridon-Thermet, S., Shulman, N. S., Ahmed, A., Shahriar, R., Liu, T., Wang, C., Holmes, S. P., Babrzadeh, F., Gharizadeh, B., Hanczaruk, B., Simen, B. B., Egholm, M., Shafer, R. W. OXFORD UNIV PRESS INC. 2009: 1275–85

    Abstract

    The dynamics of emerging nucleoside and nucleotide reverse-transcriptase inhibitor (NRTI) resistance in hepatitis B virus (HBV) are not well understood because standard dideoxynucleotide direct polymerase chain reaction (PCR) sequencing assays detect drug-resistance mutations only after they have become dominant. To obtain insight into NRTI resistance, we used a new sequencing technology to characterize the spectrum of low-prevalence NRTI-resistance mutations in HBV obtained from 20 plasma samples from 11 NRTI-treated patients and 17 plasma samples from 17 NRTI-naive patients, by using standard direct PCR sequencing and ultra-deep pyrosequencing (UDPS). UDPS detected drug-resistance mutations that were not detected by PCR in 10 samples from 5 NRTI-treated patients, including the lamivudine-resistance mutation V173L (in 5 samples), the entecavir-resistance mutations T184S (in 2 samples) and S202G (in 1 sample), the adefovir-resistance mutation N236T (in 1 sample), and the lamivudine and adefovir-resistance mutations V173L, L180M, A181T, and M204V (in 1 sample). G-to-A hypermutation mediated by the apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like family of cytidine deaminases was estimated to be present in 0.6% of reverse-transcriptase genes. Genotype A coinfection was detected by UDPS in each of 3 patients in whom genotype G virus was detected by direct PCR sequencing. UDPS detected low-prevalence HBV variants with NRTI-resistance mutations, G-to-A hypermutation, and low-level dual genotype infection with a sensitivity not previously possible.

    View details for DOI 10.1086/597808

    View details for Web of Science ID 000265035500007

    View details for PubMedID 19301976

    View details for PubMedCentralID PMC3353721

  • An Interactive Java Statistical Image Segmentation System: GemIdent. Journal of statistical software Holmes, S., Kapelner, A., Lee, P. P. 2009; 30 (10): 1-20

    Abstract

    Supervised learning can be used to segment/identify regions of interest in images using both color and morphological information. A novel object identification algorithm was developed in Java to locate immune and cancer cells in images of immunohistochemically-stained lymph node tissue from a recent study published by Kohrt et al. (2005). The algorithms are also showing promise in other domains. The success of the method depends heavily on the use of color, the relative homogeneity of object appearance and on interactivity. As is often the case in segmentation, an algorithm specifically tailored to the application works better than using broader methods that work passably well on any problem. Our main innovation is the interactive feature extraction from color images. We also enable the user to improve the classification with an interactive visualization system. This is then coupled with the statistical learning algorithms and intensive feedback from the user over many classification-correction iterations, resulting in a highly accurate and user-friendly solution. The system ultimately provides the locations of every cell recognized in the entire tissue in a text file tailored to be easily imported into R (Ihaka and Gentleman 1996; R Development Core Team 2009) for further statistical analyses. This data is invaluable in the study of spatial and multidimensional relationships between cell populations and tumor structure. This system is available at http://www.GemIdent.com/ together with three demonstration videos and a manual.

    View details for PubMedID 21614138

    View details for PubMedCentralID PMC3100170

  • Minority Human Immunodeficiency Virus Type 1 Variants in Antiretroviral-Naive Persons with Reverse Transcriptase Codon 215 Revertant Mutations JOURNAL OF VIROLOGY Mitsuya, Y., Varghese, V., Wang, C., Liu, T. F., Holmes, S. P., Jayakumar, P., Gharizadeh, B., Ronaghi, M., Klein, D., Fessel, W. J., Shafer, R. W. 2008; 82 (21): 10747-10755

    Abstract

    T215 revertant mutations such as T215C/D/E/S that evolve from the nucleoside reverse transcriptase (RT) inhibitor mutations T215Y/F have been found in about 3% of human immunodeficiency virus type 1 (HIV-1) isolates from newly diagnosed HIV-1-infected persons. We used a newly developed sequencing method, ultradeep pyrosequencing (UDPS; 454 Life Sciences), to determine the frequency with which T215Y/F or other RT inhibitor resistance mutations could be detected as minority variants in samples from untreated persons that contain T215 revertants ("revertant" samples) compared with samples from untreated persons that lack such revertants ("control" samples). Among the 22 revertant and 29 control samples, UDPS detected a mean of 3.8 and 4.8 additional RT amino acid mutations, respectively. In 6 of 22 (27%) revertant samples and in 4 of 29 control samples (14%; P = 0.4), UDPS detected one or more RT inhibitor resistance mutations. T215Y or T215F was not detected in any of the revertant or control samples; however, 4 of 22 revertant samples had one or more T215 revertants that were detected by UDPS but not by direct PCR sequencing. The failure to detect viruses with T215Y/F in the 22 revertant samples in this study may result from the overwhelming replacement of transmitted T215Y variants by the more fit T215 revertants or from the primary transmission of a T215 revertant in a subset of persons with T215 revertants.

    View details for DOI 10.1128/JVI.01827-07

    View details for Web of Science ID 000260109600041

    View details for PubMedID 18715933

    View details for PubMedCentralID PMC2573178

  • HORSESHOES IN MULTIDIMENSIONAL SCALING AND LOCAL KERNEL METHODS ANNALS OF APPLIED STATISTICS Diaconis, P., Goel, S., Holmes, S. 2008; 2 (3): 777-807

    View details for DOI 10.1214/08-AOAS165

    View details for Web of Science ID 000261057900001

  • Natural variation of HIV-1 group M integrase: Implications for a new class of antiretroviral inhibitors RETROVIROLOGY Rhee, S., Liu, T. F., Kiuchi, M., Zioni, R., Gifford, R. J., Holmes, S. P., Shafer, R. W. 2008; 5

    Abstract

    HIV-1 integrase is the third enzymatic target of antiretroviral (ARV) therapy. However, few data have been published on the distribution of naturally occurring amino acid variation in this enzyme. We therefore characterized the distribution of integrase variants among more than 1,800 published group M HIV-1 isolates from more than 1,500 integrase inhibitor (INI)-naïve individuals. Polymorphism rates equal to or above 0.5% were found for 34% of the central core domain positions, 42% of the C-terminal domain positions, and 50% of the N-terminal domain positions. Among 727 ARV-naïve individuals in whom the complete pol gene was sequenced, integrase displayed significantly decreased inter- and intra-subtype diversity and a lower Shannon's entropy than protease or RT. All primary INI-resistance mutations, with the exception of E157Q (which was present in 1.1% of sequences), were nonpolymorphic. Several accessory INI-resistance mutations including L74M, T97A, V151I, G163R, and S230N were also polymorphic, with polymorphism rates ranging from 0.5% to 2.0%.

    View details for DOI 10.1186/1742-4690-5-74

    View details for Web of Science ID 000259485200001

    View details for PubMedID 18687142

    View details for PubMedCentralID PMC2546438

  • Genomic interrogation of ancestral Mycobacterium tuberculosis from south India 8th International Congress on Molecular Epidemiology and Evolutionary Genetics of Infectious Diseases Narayanan, S., Gagneux, S., Hari, L., Tsolaki, A. G., Rajasekhar, S., Narayanan, P. R., Small, P. M., Holmes, S., DeRiemer, K. ELSEVIER SCIENCE BV. 2008: 474–83

    Abstract

    Mycobacterium tuberculosis is a very important global pathogen. One quarter of the world's TB cases occur in India. The tuberculosis strains isolated from south Indian patients exhibit certain phenotypic characteristics such as low virulence in guinea pigs, resistance to isoniazid, thiophene-2-carboxylic acid hydrazide (TCH) and para-amino salicylic acid (PAS), and enhanced susceptibility to H2O2. In addition, a large percentage of the isolates harbor only a single copy of IS6110, which makes these strains distinct. Hence, we have studied the genotypic characteristics of these strains by using advanced techniques such as deletion microarray, deletion PCR, and allelic discrimination RT-PCR using several lineage-specific markers and the KatG G1388T (non-synonymous) polymorphism, along with spoligotyping. The analysis of 1215 tuberculosis patient isolates from south India revealed that 85.2% belonged to the ancestral lineage of M. tuberculosis. Comparative whole-genome hybridization identified six new genomic regions within this lineage that were variably deleted.

    View details for DOI 10.1016/j.meegid.2007.09.007

    View details for Web of Science ID 000257001400012

    View details for PubMedID 18024233

  • An interactive statistical imaging system and pilot application to characterize axillary lymph nodes in breast cancer Kohrt, H. E., Kapelner, A., Holmes, S., Lee, P. P. AMER SOC CLINICAL ONCOLOGY. 2008
  • The short-term regulation of foraging in harvester ants BEHAVIORAL ECOLOGY Gordon, D. M., Holmes, S., Nacu, S. 2008; 19 (1): 217-222
  • Threshold Graph Limits and Random Threshold Graphs. Internet mathematics Diaconis, P., Holmes, S., Janson, S. 2008; 5 (3): 267-320

    Abstract

    We study the limit theory of large threshold graphs and apply this to a variety of models for random threshold graphs. The results give a nice set of examples for the emerging theory of graph limits.

    View details for PubMedID 20811581

  • Dynamical bias in the coin toss SIAM REVIEW Diaconis, P., Holmes, S., Montgomery, R. 2007; 49 (2): 211-235
  • HIV-1 subtype B protease and reverse transcriptase amino acid covariation PLOS COMPUTATIONAL BIOLOGY Rhee, S., Liu, T. F., Holmes, S. P., Shafer, R. W. 2007; 3 (5): 836-843

    Abstract

    Despite the high degree of HIV-1 protease and reverse transcriptase (RT) mutation in the setting of antiretroviral therapy, the spectrum of possible virus variants appears to be limited by patterns of amino acid covariation. We analyzed patterns of amino acid covariation in protease and RT sequences from more than 7,000 persons infected with HIV-1 subtype B viruses obtained from the Stanford HIV Drug Resistance Database (http://hivdb.stanford.edu). In addition, we examined the relationship between conditional probabilities associated with a pair of mutations and the order in which those mutations developed in viruses for which longitudinal sequence data were available. Patterns of RT covariation were dominated by the distinct clustering of Type I and Type II thymidine analog mutations and the Q151M-associated mutations. Patterns of protease covariation were dominated by the clustering of nelfinavir-associated mutations (D30N and N88D), two main groups of protease inhibitor (PI)-resistance mutations associated either with V82A or L90M, and a tight cluster of mutations associated with decreased susceptibility to amprenavir and the most recently approved PI darunavir. Different patterns of covariation were frequently observed for different mutations at the same position including the RT mutations T69D versus T69N, L74V versus L74I, V75I versus V75M, T215F versus T215Y, and K219Q/E versus K219N/R, and the protease mutations M46I versus M46L, I54V versus I54M/L, and N88D versus N88S. Sequence data from persons with correlated mutations in whom earlier sequences were available confirmed that the conditional probabilities associated with correlated mutation pairs could be used to predict the order in which the mutations were likely to have developed. Whereas accessory nucleoside RT inhibitor-resistance mutations nearly always follow primary nucleoside RT inhibitor-resistance mutations, accessory PI-resistance mutations often preceded primary PI-resistance mutations.

    View details for DOI 10.1371/journal.pcbi.0030087

    View details for Web of Science ID 000249105100007

    View details for PubMedID 17500586

    View details for PubMedCentralID PMC1866358

  • Down-regulation of the interferon signaling pathway in T lymphocytes from patients with metastatic melanoma PLOS MEDICINE Critchley-Thorne, R. J., Yan, N., Nacu, S., Weber, J., Holmes, S. P., Lee, P. P. 2007; 4 (5): 897-911

    Abstract

    Dysfunction of the immune system has been documented in many types of cancers. The precise nature and molecular basis of immune dysfunction in the cancer state are not well defined. To gain insights into the molecular mechanisms of immune dysfunction in cancer, gene expression profiles of pure sorted peripheral blood lymphocytes from 12 patients with melanoma were compared to 12 healthy controls. Of 25 significantly altered genes in T cells and B cells from melanoma patients, 17 are interferon (IFN)-stimulated genes. These microarray findings were further confirmed by quantitative PCR and functional responses to IFNs. The median percentage of lymphocytes that phosphorylate STAT1 in response to interferon-alpha was significantly reduced (Delta = 16.8%; 95% confidence interval, 0.98% to 33.35%) in melanoma patients (n = 9) compared to healthy controls (n = 9) in Phosflow analysis. The Phosflow results also identified two subgroups of patients with melanoma: IFN-responsive (33%) and low-IFN-response (66%). The defect in IFN signaling in the melanoma patient group as a whole was partially overcome at the level of expression of IFN-stimulated genes by prolonged stimulation with the high concentration of IFN-alpha that is achievable only in IFN therapy used in melanoma. The lowest responders to IFN-alpha in the Phosflow assay also showed the lowest gene expression in response to IFN-alpha. Finally, T cells from low-IFN-response patients exhibited functional abnormalities, including decreased expression of activation markers CD69, CD25, and CD71; TH1 cytokines interleukin-2, IFN-gamma, and tumor necrosis factor alpha, and reduced survival following stimulation with anti-CD3/CD28 antibodies compared to controls. Defects in interferon signaling represent novel, dominant mechanisms of immune dysfunction in cancer. These findings may be used to design therapies to counteract immune dysfunction in melanoma and to improve cancer immunotherapy.

    View details for DOI 10.1371/journal.pmed.0040176

    View details for Web of Science ID 000246889700018

    View details for PubMedID 17488182

    View details for PubMedCentralID PMC1865558

  • Gene expression network analysis and applications to immunology BIOINFORMATICS Nacu, S., Critchley-Thorne, R., Lee, P., Holmes, S. 2007; 23 (7): 850-858

    Abstract

    We address the problem of using expression data and prior biological knowledge to identify differentially expressed pathways or groups of genes. Following an idea of Ideker et al. (2002), we construct a gene interaction network and search for high-scoring subnetworks. We make several improvements in terms of scoring functions and algorithms, resulting in higher speed and accuracy and easier biological interpretation. We also assign significance levels to our results, adjusted for multiple testing. Our methods are successfully applied to three human microarray data sets, related to cancer and the immune system, retrieving several known and potential pathways. The method, denoted by the acronym GXNA (Gene eXpression Network Analysis), is implemented in software that is publicly available and can be used on virtually any microarray data set. The source code and executable for the software, as well as certain supplemental materials, can be downloaded from http://stat.stanford.edu/~serban/gxna.

    View details for DOI 10.1093/bioinformatics/btm019

    View details for Web of Science ID 000246120400010

    View details for PubMedID 17267429

  • Unusual codon bias in vinyl chloride reductase genes of Dehalococcoides species APPLIED AND ENVIRONMENTAL MICROBIOLOGY McMurdie, P. J., Behrens, S. F., Holmes, S., Spormann, A. M. 2007; 73 (8): 2744-2747

    Abstract

    Vinyl chloride reductases (VC-RDase) are the key enzymes for complete microbial reductive dehalogenation of chloroethenes, including the groundwater pollutants tetrachloroethene and trichloroethene. Analysis of the codon usage of the VC-RDase genes vcrA and bvcA showed that these genes are highly unusual and are characterized by a low G+C fraction at the third position. The third position of codons in VC-RDase genes is biased toward the nucleotide T, even though available Dehalococcoides genome sequences indicate the absence of any tRNAs matching codons that end in T. The comparatively high level of abnormality in the codon usage of VC-RDase genes suggests an evolutionary history that is different from that of most other Dehalococcoides genes.

    View details for DOI 10.1128/AEM.02768-06

    View details for Web of Science ID 000246542400039

    View details for PubMedID 17308190

    View details for PubMedCentralID PMC1855607

  • An interactive statistical image segmentation and visualization system 4th International Conference on Medical Information Visualisation Kapelner, A., Lee, P. R., Holmes, S. IEEE COMPUTER SOC. 2007: 81–86
  • Differences in the temporal emergence of primary and secondary NRTI and PI drug-resistance mutations (DRMs) 16th International HIV Drug Resistance Workshop Rhee, S. Y., Liu, T. F., Holmes, S. P., Shafer, R. W. INT MEDICAL PRESS LTD. 2007: S67–S67
  • Forager activation and food availability in harvester ants ANIMAL BEHAVIOUR Schafer, R. J., Holmes, S., Gordon, D. M. 2006; 71: 815-822
  • Constraints on Yukawa-type deviations from Newtonian gravity at 20 microns PHYSICAL REVIEW D Smullin, S. J., Geraci, A. A., Weld, D. M., Chiaverini, J., Holmes, S., Kapitulnik, A. 2005; 72 (12)
  • Profile of immune cells in axillary lymph nodes predicts disease-free survival in breast cancer PLOS MEDICINE Kohrt, H. E., Nouri, N., Nowels, K., Johnson, D., Holmes, S., Lee, P. P. 2005; 2 (9): 904-919

    Abstract

    While lymph node metastasis is among the strongest predictors of disease-free and overall survival for patients with breast cancer, the immunological nature of tumor-draining lymph nodes is often ignored, and may provide additional prognostic information on clinical outcome. We performed immunohistochemical analysis of 47 sentinel and 104 axillary (nonsentinel) nodes from 77 breast cancer patients with 5 y of follow-up to determine if alterations in CD4, CD8, and CD1a cell populations predict nodal metastasis or disease-free survival. Sentinel and axillary node CD4 and CD8 T cells were decreased in breast cancer patients compared to control nodes. CD1a dendritic cells were also diminished in sentinel and tumor-involved axillary nodes, but increased in tumor-free axillary nodes. Axillary node, but not sentinel node, CD4 T cell and dendritic cell populations were highly correlated with disease-free survival, independent of axillary metastasis. Immune profiling of axillary lymph nodes (ALN) from a test set of 48 patients, applying CD4 T cell and CD1a dendritic cell population thresholds of CD4 ≥ 7.0% and CD1a ≥ 0.6%, determined from analysis of a learning set of 29 patients, provided significant risk stratification into favorable and unfavorable prognostic groups superior to clinicopathologic characteristics including tumor size, extent or size of nodal metastasis (CD4, p < 0.001 and CD1a, p < 0.001). Moreover, axillary node CD4 T cell and CD1a dendritic cell populations allowed more significant stratification of disease-free survival of patients with T1 (primary tumor size 2 cm or less) and T2 (5 cm or larger) tumors than all other patient characteristics. Finally, sentinel node immune profiles correlated primarily with the presence of infiltrating tumor cells, while axillary node immune profiles appeared largely independent of nodal metastases, raising the possibility that, within axillary lymph nodes, immune profile changes and nodal metastases represent independent processes. These findings demonstrate that the immune profile of tumor-draining lymph nodes is of novel biologic and clinical importance for patients with early stage breast cancer.

    View details for DOI 10.1371/journal.pmed.0020284

    View details for Web of Science ID 000232433600019

    View details for PubMedID 16124834

    View details for PubMedCentralID PMC1198041

  • Rapid assessment of recognition efficiency and functional capacity of antigen-specific T-cell responses JOURNAL OF IMMUNOTHERAPY Kohrt, H. E., Shu, C. T., Stuge, T. B., Holmes, S. P., Weber, J., Lee, P. P. 2005; 28 (4): 297-305

    Abstract

    It is increasingly recognized that cells within an antigen-specific CD8 T-cell population may be diverse in recognition efficiency for target, which may significantly affect the overall efficacy of the response in clinical settings such as viral infections and cancer. CD8 T cells with seemingly identical antigen specificity, particularly those elicited by cancer vaccines, may be heterogeneous for sensitivity and recognition efficiency for the cognate peptide and functional state in vivo. Analysis of individual T-cell clones derived from an antigen-specific T-cell population would provide an accurate assessment of the overall response; however, this is time- and labor-intensive, preventing rapid and routine assessment of patient samples from clinical trials. By stimulating antigen-specific T cells that otherwise appear homogeneous on tetramer staining with graded amounts of cognate peptides, the authors show that individual cells downmodulate surface T-cell receptors (TCR) and thus lose tetramer reactivity with variable dynamics within the T-cell population. The dynamics of TCR downregulation represent an accurate assessment of an individual cell's antigen sensitivity, recognition efficiency, and relative functional state within an antigen-specific population and have direct correlation to killing capacity by chromium release as well as degranulation by CD107 mobilization. Furthermore, despite correlation of average T-cell function by all three techniques, TCR downregulation uncovered heterogeneity in T-cell responses after vaccination among patient samples directly ex vivo. When examined using this novel technique, antigen-specific T cells elicited by vaccination with heteroclitic peptides exhibited significantly different recognition efficiencies for the heteroclitic versus native peptides, translating into differences in functional responses. With advancing cancer vaccine trials, the capacity to detect and functionally characterize antigen-specific T-cell responses in detail is critical. Techniques, as presented here, that rapidly assess the overall antigen sensitivity, recognition efficiency, and functional status of patients' T-cell responses will guide future vaccine trials and immunotherapies.

    View details for Web of Science ID 000230219200004

    View details for PubMedID 16000947

  • Memory T cells have gene expression patterns intermediate between naive and effector PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Holmes, S., He, M., Xu, T., Lee, P. P. 2005; 102 (15): 5519-5523

    Abstract

    The biological basis underlying differentiation of naive (NAI) T cells into effector (EFFE) and memory (MEM) cells is incompletely understood. Furthermore, whether NAI T cells serially differentiate into EFFE and then MEM cells (linear differentiation) or whether they concurrently differentiate into either EFFE or MEM cells (parallel differentiation) remains unresolved. We isolated NAI, EFFE, and MEM CD8(+) T cell subsets from human peripheral blood and analyzed their gene expression by using microarrays. We identified 156 genes that strongly differentiate NAI, EFFE, and MEM CD8(+) T cells; these genes provide previously unrecognized markers to help identify each cell type. Using several statistical approaches to analyze and group the data (standard heat-map and hierarchical clustering, a unique circular representation, multivariate analyses based on principal components, and a clustering method based on phylogenetic parsimony analysis), we assessed the lineage relationships between these subsets and showed that MEM cells have gene expression patterns intermediate between NAI and EFFE T cells. Our analysis suggests a common differentiation pathway to an intermediate state followed by a split into EFFE or MEM cells, hence supporting the parallel differentiation model. As such, conditions under which NAI T cells are activated may determine the magnitude of both EFFE and MEM cells, which arise subsequently. A better understanding of these conditions may be very useful in the design of future vaccine strategies to maximize MEM cell generation.

    View details for DOI 10.1073/pnas.0501437102

    View details for Web of Science ID 000228376600042

    View details for PubMedID 15809420

    View details for PubMedCentralID PMC556264

  • Sequential Monte Carlo methods for statistical analysis of tables JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Chen, Y. G., Diaconis, P., Holmes, S. R., Liu, J. S. 2005; 100 (469): 109-120
  • Immune profile in tumor-draining nodes predicts disease-free survival in breast cancer Kohrt, H. E., Nowels, K., Holmes, S., Lee, P. P., Johnson, D. SPRINGER. 2005: S17
  • Error distribution for gene expression data STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY Purdom, E., Holmes, S. P. 2005; 4

    Abstract

    We present a new instance of Laplace's second Law of Errors and show how it can be used in the analysis of data from microarray experiments. This error distribution is shown to fit microarray expression data much better than a normal distribution. The use of this distribution in a parametric bootstrap leads to more powerful tests as we show that the t-test is conservative in this setting. We propose a biological explanation for this distribution based on the Pareto distribution of the variables used to compute the log ratios.

    View details for Web of Science ID 000238478100022

    View details for PubMedID 16646833

  • Gene expression diversity among Mycobacterium tuberculosis clinical isolates MICROBIOLOGY-SGM Gao, Q., Kripke, K. E., Saldanha, A. J., Yan, W. H., Holmes, S., Small, P. M. 2005; 151: 5-14

    Abstract

    Intraspecies genetic diversity has been demonstrated to be important in the pathogenesis and epidemiology of several pathogens, such as HIV, influenza, Helicobacter and Salmonella. It is also important to consider strain-to-strain variation when identifying drug targets and vaccine antigens and developing tools for molecular diagnostics. Here, the authors present a description of the variability in gene expression patterns among ten clinical isolates of Mycobacterium tuberculosis, plus the laboratory strains H37Rv and H37Ra, growing in liquid culture. They identified 527 genes (15 % of those tested) that are variably expressed among the isolates studied. The remaining genes were divided into three categories based on their expression levels: unexpressed (38 %), low to undetectable expression (31 %) and consistently expressed (16 %). The expression categories were compared with functional categories and three biologically interesting gene lists: genes that are deleted among clinical isolates, T-cell antigens and essential genes. There were significant associations between expression variability and the classification of genes as T-cell antigens, involved in lipid metabolism, PE/PPE, insertion sequences and phages, and deleted among clinical isolates. This survey of mRNA expression among clinical isolates of M. tuberculosis demonstrates that genes with important functions can vary in their expression levels between strains grown under identical conditions.

    View details for Web of Science ID 000226352800002

    View details for PubMedID 15632420

  • Heterogeneity within antigen-specific T cell responses revealed by differential dynamics of TCR downregulation. 46th Annual Meeting of the American Society of Hematology Kohrt, H. E., Shu, C. T., Holmes, S. P., Weber, J., Lee, P. P. AMER SOC HEMATOLOGY. 2004: 50B
  • Diversity and recognition efficiency of T cell responses to cancer PLOS MEDICINE Stuge, T. B., Holmes, S. P., Saharan, S., Tuettenberg, A., Roederer, M., Weber, J. S., Lee, P. P. 2004; 1 (2): 149-160

    Abstract

    Melanoma patients vaccinated with tumor-associated antigens frequently develop measurable peptide-specific CD8+ T cell responses; however, such responses often do not confer clinical benefit. Understanding why vaccine-elicited responses are beneficial in some patients but not in others will be important to improve targeted cancer immunotherapies. We analyzed peptide-specific CD8+ T cell responses in detail, by generating and characterizing over 200 cytotoxic T lymphocyte clones derived from T cell responses to heteroclitic peptide vaccination, and compared these responses to endogenous anti-tumor T cell responses elicited naturally (a heteroclitic peptide is a modification of a native peptide sequence involving substitution of an amino acid at an anchor residue to enhance the immunogenicity of the peptide). We found that vaccine-elicited T cells are diverse in T cell receptor variable chain beta expression and exhibit a different recognition profile for heteroclitic versus native peptide. In particular, vaccine-elicited T cells respond to native peptide with predominantly low recognition efficiency--a measure of the sensitivity of a T cell to different cognate peptide concentrations for stimulation--and, as a result, are inefficient in tumor lysis. In contrast, endogenous tumor-associated-antigen-specific T cells show a predominantly high recognition efficiency for native peptide and efficiently lyse tumor targets. These results suggest that factors that shape the peptide-specific T cell repertoire after vaccination may be different from those that affect the endogenous response. Furthermore, our findings suggest that current heteroclitic peptide vaccination protocols drive expansion of peptide-specific T cells with a diverse range of recognition efficiencies, a significant proportion of which are unable to respond to melanoma cells. Therefore, it is critical that the recognition efficiency of vaccine-elicited T cells be measured, with the goal of advancing those modalities that elicit T cells with the greatest potential of tumor reactivity.

    View details for DOI 10.1371/journal.pmed.0010028

    View details for Web of Science ID 000227378800015

    View details for PubMedID 15578105

    View details for PubMedCentralID PMC529423

  • Microarray analysis reveals differences in gene expression of circulating CD8+ T cells in melanoma patients and healthy donors CANCER RESEARCH Xu, T., Shu, C. T., Purdom, E., Dang, D., Ilsley, D., Guo, Y., Weber, J., Holmes, S. P., Lee, P. P. 2004; 64 (10): 3661-3667

    Abstract

    Circulating T cells from many cancer patients are known to be dysfunctional and undergo spontaneous apoptosis. We used microarray technology to determine whether gene expression differences exist in T cells from melanoma patients versus healthy subjects, which may underlie these abnormalities. To maximize the resolution of our data, we sort-purified CD8(+) subsets and amplified the extracted RNA for microarray analysis. These analyses show subtle but statistically significant expression differences for 10 genes in T cells from melanoma patients versus healthy controls, which were additionally confirmed by quantitative real-time PCR analysis. Whereas none of these genes are members of the classical apoptosis pathways, several may be linked to apoptosis. To additionally investigate the significance of these 10 genes, we combined them into a classifier and found that they provide a much better discrimination between melanoma and healthy T cells as compared with a classifier built uniquely with classical apoptosis-related genes. These results suggest the possible engagement of an alternative apoptosis pathway in circulating T cells from cancer patients.

    View details for Web of Science ID 000221419100043

    View details for PubMedID 15150126

  • Bioinformatics and management science: Some common tools and techniques OPERATIONS RESEARCH Abbas, A. E., Holmes, S. R. 2004; 52 (2): 165-190
  • An in vitro human cell-based assay to rank the relative immunogenicity of proteins TOXICOLOGICAL SCIENCES Stickler, M., Rochanayon, N., Razo, O. J., Mucha, J., Gebel, W., Faravashi, N., Chin, R., Holmes, S., Harding, F. A. 2004; 77 (2): 280-289

    Abstract

    A method to rank proteins based on their relative immunogenicity has been devised. A statistical analysis of peptide-specific responses in large human donor pools provides a structure index value metric that ranked four industrial enzymes in the order determined by both mouse and guinea pig exposure models. The ranking method also compared favorably with human sensitization rates measured in occupationally exposed workers. Structure index values for other proteins known to cause immune responses in humans were also determined and found to be higher than the value determined for human beta2-microglobulin. Using values from known immunogenic and putative nonimmunogenic proteins, a cut-off value was established. The structure index value calculation provides a comparative method to predict subsequent immunogenicity on a human population basis without the need to use animal models. Information provided by this assay can be used in the early development of protein therapies and other protein-based applications to select or create reduced immunogenicity variants.

    View details for DOI 10.1093/toxsci/kfh021

    View details for Web of Science ID 000188987000013

    View details for PubMedID 14691215

  • Stein's method: expository lectures and applications Institute of Mathematical Statistics Lecture Notes - Monograph Series. edited by Diaconis, P., Holmes, S. Inst. Math. Statist. 2004
  • Probability by surprise Mathematical Adventures for Students and Amateurs Holmes, S. edited by Shubin, T., Hayes, D. MAA. 2004: 135–153
  • Human population-based identification of CD4(+) T-cell peptide epitope determinants JOURNAL OF IMMUNOLOGICAL METHODS Stickler, M., Chin, R., Faravashi, N., Gebel, W., Razo, O. J., Rochanayon, N., Power, S., Valdes, A. M., Holmes, S., Harding, F. A. 2003; 281 (1-2): 95-108

    Abstract

    A human cell-based method to identify functional CD4(+) T-cell epitopes in any protein has been developed. Proteins are tested as synthetic 15-mer peptides offset by three amino acids. Percent responses within a large donor population are tabulated for each peptide in the set. Peptide epitope regions are designated by difference in response frequency from the overall background response rate for the compiled dataset. Epitope peptide responses are reproducible, with a median coefficient of variance of 21% when tested on multiple random-donor sets. The overall average response rate within the dataset increases with increasing putative human population antigenic exposure to a given protein. The background rate was high for HPV16 E6, and was low for human-derived cytokine proteins. The assay identified recall epitope regions within the donor population for the protein staphylokinase. For an industrial protease with minimal presumed population exposure, immunodominant epitope peptides were identified that were found to bind promiscuously to many HLA class II molecules in vitro. The peptide epitope regions identified in presumably unexposed donors represent a subset of the total recall epitopes. Finally, as a negative control, the assay found no peptide epitope regions in human beta2-microglobulin. This method identifies functional CD4(+) T-cell epitopes in any protein without pre-selection for HLA class II, suggests whether a donor population is pre-exposed to a protein of interest, and does not require sensitized donors for in vitro testing.

    View details for DOI 10.1016/S0022-1759(03)00279-5

    View details for Web of Science ID 000186487500009

    View details for PubMedID 14580884

  • Bootstrapping phylogenetic trees: Theory and methods STATISTICAL SCIENCE Holmes, S. 2003; 18 (2): 241-255
  • Bradley Efron: A conversation with good friends STATISTICAL SCIENCE Holmes, S., Morris, C., Tibshirani, R. 2003; 18 (2): 268-281
  • An in vitro human cell-based assay to rank the relative immunogenicity of proteins Harding, F. A., Rochanayon, N., Razo, J., Gebel, W., Faravashi, N., Chin, R., Holmes, S., Mucha, J., Stickler, M. M. FEDERATION AMER SOC EXP BIOL. 2003: C308
  • The i-mune assay for the identification of functional CD4+T cell epitopes in any protein: population-based immunodominant primary and memory epitope regions Harding, F. A., Chin, R., Faravashi, N., Gebel, W., Razo, J., Rochanayon, N., Holmes, S., Valdes, A. M., Stickler, M. M. FEDERATION AMER SOC EXP BIOL. 2003: C308
  • Statistics for phylogenetic trees THEORETICAL POPULATION BIOLOGY Holmes, S. 2003; 63 (1): 17-32

    Abstract

    This paper poses the problem of estimating and validating phylogenetic trees in statistical terms. The problem is hard enough to warrant several tacks: we reason by analogy to rounding real numbers, and dealing with ranking data. These are both cases where, as in phylogeny, the parameters of interest are not real numbers. Then we pose the problem in geometrical terms, using distances and measures on a natural space of trees. We do not solve the problems of inference on tree space, but suggest some coherent ways of tackling them.

    View details for Web of Science ID 000180151400002

    View details for PubMedID 12464492

  • A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome haplotypes AMERICAN JOURNAL OF HUMAN GENETICS Cruciani, F., Santolamazza, P., Shen, P. D., Macaulay, V., Moral, P., Olckers, A., Modiano, D., Holmes, S., Destro-Bisol, G., Coia, V., Wallace, D. C., Oefner, P. J., Torroni, A., Cavalli-Sforza, L. L., Scozzari, R., Underhill, P. A. 2002; 70 (5): 1197-1214

    Abstract

    The variation of 77 biallelic sites located in the nonrecombining portion of the Y chromosome was examined in 608 male subjects from 22 African populations. This survey revealed a total of 37 binary haplotypes, which were combined with microsatellite polymorphism data to evaluate internal diversities and to estimate coalescence ages of the binary haplotypes. The majority of binary haplotypes showed a nonuniform distribution across the continent. Analysis of molecular variance detected a high level of interpopulation diversity (PhiST=0.342), which appears to be partially related to the geography (PhiCT=0.230). In sub-Saharan Africa, the recent spread of a set of haplotypes partially erased pre-existing diversity, but a high level of population (PhiST=0.332) and geographic (PhiCT=0.179) structuring persists. Correspondence analysis shows that three main clusters of populations can be identified: northern, eastern, and sub-Saharan Africans. Among the latter, the Khoisan, the Pygmies, and the northern Cameroonians are clearly distinct from a tight cluster formed by the Niger-Congo-speaking populations from western, central western, and southern Africa. Phylogeographic analyses suggest that a large component of the present Khoisan gene pool is eastern African in origin and that Asia was the source of a back migration to sub-Saharan Africa. Haplogroup IX Y chromosomes appear to have been involved in such a migration, the traces of which can now be observed mostly in northern Cameroon.

    View details for Web of Science ID 000175012400010

    View details for PubMedID 11910562

    View details for PubMedCentralID PMC447595

  • Discussion on the meeting on 'Statistical modelling and analysis of genetic data' JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY Balding, D. J., Carothers, A. D., Marchini, J. L., Cardon, L. R., Vetta, A., Griffiths, B., Weir, B. S., Hill, W. G., Goldstein, D., Strimmer, K., Myers, S., Beaumont, M. A., Glasbey, C. A., Mayer, C. D., Richardson, S., Marshall, C., Durrett, R., Nielsen, R., Visscher, P. M., Knott, S. A., Haley, C. S., Ball, R. D., Hackett, C. A., Holmes, S., Husmeier, D., Jansen, R. C., Ter Braak, C. J., Maliepaard, C. A., Boer, M. P., Joyce, P., Li, N., Stephens, M., Marcoulides, G. A., Drezner, Z., Mardia, K., McVean, G., Meng, X. L., Ochs, M. F., Pagel, M., Sha, N., Vannucci, M., Sillanpaa, M. J., Sisson, S., Yandell, B. S., Jin, C. F., Satagopan, J. M., Gaffney, P. J., Zeng, Z. B., Broman, K. W., Speed, T. P., Fearnhead, P., Donnelly, P., Larget, B., Simon, D. L., Kadane, J. B., Nicholson, G., Smith, A. V., Jonsson, F., Gustafsson, O., Stefansson, K., Donnelly, P., Parmigiani, G., Garrett, E. S., Anbazhagan, R., Gabrielson, E. 2002; 64: 737-775
  • Geometry of the space of phylogenetic trees ADVANCES IN APPLIED MATHEMATICS Billera, L. J., Holmes, S. P., Vogtmann, K. 2001; 27 (4): 733-767
  • Statistical problems involving permutations with restricted positions Symposium on State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet Diaconis, P., Graham, R., Holmes, S. P. INST MATHEMATICAL STATISTICS. 2001: 195–222
  • Analysis of a nonreversible Markov chain sampler ANNALS OF APPLIED PROBABILITY Diaconis, P., Holmes, S., Neal, R. M. 2000; 10 (3): 726-752
  • Matchings and phylogenetic trees PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Diaconis, P. W., Holmes, S. P. 1998; 95 (25): 14600-14602

    Abstract

    This paper presents a natural coordinate system for phylogenetic trees using a correspondence with the set of perfect matchings in the complete graph. This correspondence produces a distance between phylogenetic trees, and a way of enumerating all trees in a minimal step order. It is useful in randomized algorithms because it enables moves on the space of trees that make random optimization strategies "mix" quickly. It also promises a generalization to intermediary trees when data are not decisive as to their choice of tree, and a new way of constructing Bayesian priors on tree space.

    View details for Web of Science ID 000077436700005

    View details for PubMedID 9843935

  • The next linear collider test accelerator's RF pulse compression and transmission systems 17th Particle Accelerator Conference Tantawi, S. G., Adolphsen, C., Holmes, S., Lavine, T., Loewen, R. J., Nantista, C., Pearson, C., Pope, R., Rifkin, J., Ruth, R. D., Vlieks, A. E. IEEE. 1998: 3192–3194
  • Addressing geographical data errors in a classification tree for soil unit prediction INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE Lagacherie, P., Holmes, S. 1997; 11 (2): 183-198
  • Are there still things to do in Bayesian statistics? Conference on Probability, Dynamics and Causality Diaconis, P., Holmes, S. KLUWER ACADEMIC PUBL. 1997: 5–18
  • Bootstrap confidence levels for phylogenetic trees (vol 93, pg 7085, 1996) PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Efron, B., Halloran, E., Holmes, S. 1996; 93 (23): 13429-13434

    Abstract

    Evolutionary trees are often estimated from DNA or RNA sequence data. How much confidence should we have in the estimated trees? In 1985, Felsenstein [Felsenstein, J. (1985) Evolution 39, 783-791] suggested the use of the bootstrap to answer this question. Felsenstein's method, which in concept is a straightforward application of the bootstrap, is widely used, but has been criticized as biased in the genetics literature. This paper concerns the use of the bootstrap in the tree problem. We show that Felsenstein's method is not biased, but that it can be corrected to better agree with standard ideas of confidence levels and hypothesis testing. These corrections can be made by using the more elaborate bootstrap method presented here, at the expense of considerably more computation.

    View details for Web of Science ID A1996VT05400135

    View details for PubMedID 8917608

    View details for PubMedCentralID PMC24110

  • Gray codes for randomization procedures STATISTICS AND COMPUTING Diaconis, P., Holmes, S. 1994; 4 (4): 287-302
  • Correlations among quality parameters of peach fruit JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE Genard, M., Souty, M., Holmes, S., Reich, M., Breuils, L. 1994; 66 (2): 241-245