Trained in the French school of Data Analysis in Montpellier, Susan Holmes has been working in non parametric multivariate statistics applied to Biology since 1985. She has taught at MIT, Harvard and was an Associate Professor of Biometry at Cornell before moving to Stanford in 1998. She teaches the Thinking Matters class: Breaking Codes and Finding patterns and likes working on big messy data sets, mostly from the areas of Immunology, Cancer Biology and Microbial Ecology. Her theoretical interests include applied probability, MCMC (Monte Carlo Markov chains), Graph Limit Theory, Differential Geometry and the topology of the space of Phylogenetic Trees.
CoDirector, Mathematical and Computational Sciences IDP (2002 - Present)
Director, VIGRE program in Statistics (2005 - 2014)
Professor, Member, BioX (2003 - Present)
Honors & Awards
CASBS Fellow, Center for the Advanced study of the Behavioral Sciences (2017-2018)
Breiman Lecturer, NIPS (December, 2016)
Fellow, Fields Institute in Mathematical Sciences, Toronto, Canada (2015)
Director's Transformative Research Award, NIH (2013)
John Henry Samter University Fellow in Undergraduate Education, Stanford (2012)
Fellow of the Institute of Mathematical Statistics, IMS (2005)
Boards, Advisory Committees, Professional Organizations
Science Boards, Fields Research Institute in the Mathematical Sciences (2012 - 2015)
Member and Chair (2010-2012), Science Board, NimBioS (2007 - 2012)
Current Research and Scholarly Interests
Our work focuses on large heterogeneous multi-layer data analyses. Whether using image analysis and segmentation for the study of cancer and immune cell interactions, or brain imaging and DNA sequence analyses for the study of dependencies between genetic and neurological dynamics, all these statistical studies have involved large complex datasets of different types where dynamics of interactions between different components of a system are the key to understanding the underlying biology.
We have generalized methods such as Principal Components Analysis (PCA) to more diverse data incorporating spatial information as well as tree dependency structures. This has proved useful in the study of drug resistant mutations in HIV and in the study of the dynamics of bacterial communities in the Human Microbiome.
The statistical bases for these nonparametric methods are computer intensive methods using optimization and Kernels and we often find useful embeddings of high dimensional data in low dimensional structures, the extreme case being finding a natural ordering in high dimensional data. More general manifolds have also proved useful in one of our current projects, joint with Xavier Pennec of INRIA-SophiaAntiolis which focuses on the uses of differential geometry in computational anatomy and image processing.
In a long term collaboration with Professor David Relman (Stanford Medical School) we are developing a multi-table toolbox of non parametric methods that enable users to normalize and visualize the multiple facets of the microbiota in the human body under different classes of perturbations. The tools developed in this project are all open source packages developed in R and provide an example of reproducible research in action.
Study of the dynamics of the human microbiome., Stanford University (March 1, 2013)
We are using statistical multivariate methods we are devloping new methods for modeling and visualizing the dynamics of bacterial community networks in the human microbiome.
- David Relman, Professor, Stanford University
Hierarchical Testing, Stanford University (8/1/2012)
Develop tools for testing evokutionary signals in bacterial data.
- Alfred Spormann, Member, Bio-X, Independent Labs, Institutes, and Centers (Dean of Research)
Use of Differential Geometry for Identification of Features in Spinal Images Predictive of Lower Back Pain, Stanford University and INRIA Sophia-Antipolis (1/1/2013)
Using modern Monte Carlo Markov chain techniques to generate posterior distributions on the space of transformations on manifolds, we are discovering the geometric features predictive of back pain using high resolution imagery.
- Xavier Pennec, Directeur de Recherche, INRIA- SophiaAntipolis
Multivariate Data Analysis of Drug Resistance in HIV, Stanford University (9/1/2003)
We use phylogenetic trees, networks and multivariate analyses to deompose the complexities of drug resistance in HIV.
- Robert Shafer, Professor, Stanford University School of Medicine
Multiple Component Longitudinal Analysis of Fragile X syndrome, Stanford (2/1/2014)
We use deformation based statistics to analyse images of brain images from Fragile X patients compared to healthy controls.
Stanford, CA 94305
Multiway Data Analysis for the Human Microbiome, Stanford University (11/1/2013 - 10/31/2018)
Combining 16S sRNA data, metabolic and transcriptomic data to predict resilience in the human microbiome after perturbations.
- David Relman, Professor, Stanford
Independent Studies (11)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Independent Study
STATS 199 (Aut, Win, Spr, Sum)
- Independent Study
STATS 299 (Aut, Win, Spr, Sum)
- Industrial Research for Statisticians
STATS 298 (Win, Spr, Sum)
- Industrial Research for Statisticians
STATS 398 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Ph.D. Research
CME 400 (Win, Spr, Sum)
- Practical Training
MCS 198 (Sum)
PHYSICS 490 (Aut)
STATS 399 (Aut, Win, Spr, Sum)
- Biomedical Informatics Teaching Methods
Prior Year Courses
- Breaking Codes, Finding Patterns
THINK 3 (Aut)
- Modern Statistics for Modern Biology
STATS 366 (Sum)
- PhD First Year Student Workshop
STATS 303 (Aut, Win, Spr)
- Statistical Methods in Computational Genetics
STATS 155 (Aut)
- Breaking Codes, Finding Patterns
THINK 3 (Spr)
- PhD First Year Student Workshop
STATS 303 (Aut, Win, Spr)
- Statistical Methods in Computational Genetics
STATS 155 (Aut)
- Breaking Codes, Finding Patterns
Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women.
Proceedings of the National Academy of Sciences of the United States of America
Preterm birth (PTB) is the leading cause of neonatal morbidity and mortality. Previous studies have suggested that the maternal vaginal microbiota contributes to the pathophysiology of PTB, but conflicting results in recent years have raised doubts. We conducted a study of PTB compared with term birth in two cohorts of pregnant women: one predominantly Caucasian (n = 39) at low risk for PTB, the second predominantly African American and at high-risk (n = 96). We profiled the taxonomic composition of 2,179 vaginal swabs collected prospectively and weekly during gestation using 16S rRNA gene sequencing. Previously proposed associations between PTB and lower Lactobacillus and higher Gardnerella abundances replicated in the low-risk cohort, but not in the high-risk cohort. High-resolution bioinformatics enabled taxonomic assignment to the species and subspecies levels, revealing that Lactobacillus crispatus was associated with low risk of PTB in both cohorts, while Lactobacillus iners was not, and that a subspecies clade of Gardnerella vaginalis explained the genus association with PTB. Patterns of cooccurrence between L. crispatus and Gardnerella were highly exclusive, while Gardnerella and L. iners often coexisted at high frequencies. We argue that the vaginal microbiota is better represented by the quantitative frequencies of these key taxa than by classifying communities into five community state types. Our findings extend and corroborate the association between the vaginal microbiota and PTB, demonstrate the benefits of high-resolution statistical bioinformatics in clinical microbiome studies, and suggest that previous conflicting results may reflect the different risk profile of women of black race.
View details for DOI 10.1073/pnas.1705899114
View details for PubMedID 28847941
Exact sequence variants should replace operational taxonomic units in marker-gene data analysis.
The ISME journal
Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of molecular operational taxonomic units (OTUs): clusters of sequencing reads that differ by less than a fixed dissimilarity threshold. New methods control errors sufficiently such that amplicon sequence variants (ASVs) can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region. The benefits of finer resolution are immediately apparent, and arguments for ASV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits that derive from the status of ASVs as consistent labels with intrinsic biological meaning identified independently from a reference database. Here we discuss how these features grant ASVs the combined advantages of closed-reference OTUs-including computational costs that scale linearly with study size, simple merging between independently processed data sets, and forward prediction-and of de novo OTUs-including accurate measurement of diversity and applicability to communities lacking deep coverage in reference databases. We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that ASVs should replace OTUs as the standard unit of marker-gene analysis and reporting.The ISME Journal advance online publication, 21 July 2017; doi:10.1038/ismej.2017.119.
View details for DOI 10.1038/ismej.2017.119
View details for PubMedID 28731476
STATISTICAL PROOF? THE PROBLEM OF IRREPRODUCIBILITY
Bulletin of the American Mathematical Society
View details for DOI 10.1090/bull/1597
Multidomain analyses of a longitudinal human microbiome intestinal cleanout perturbation experiment
PLOS Computational Biology
2017; 13 (8): e1005706
View details for DOI 10.1371/journal.pcbi.1005706
DADA2: High-resolution sample inference from Illumina amplicon data.
2016; 13 (7): 581-583
We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.
View details for DOI 10.1038/nmeth.3869
View details for PubMedID 27214047
View details for PubMedCentralID PMC4927377
Waste not, want not: why rarefying microbiome data is inadmissible.
PLoS computational biology
2014; 10 (4)
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
View details for DOI 10.1371/journal.pcbi.1003531
View details for PubMedID 24699258
Parallel imaging of Drosophila embryos for quantitative analysis of genetic perturbations of the Ras pathway.
Disease models & mechanisms
The Ras pathway patterns the poles of the Drosophila embryo by downregulating the levels and activity of a DNA-binding transcriptional repressor Capicua (Cic). We demonstrate that the spatiotemporal pattern of Cic during this signaling event can be harnessed for functional studies of the Ras-pathway mutations from human diseases. Our approach relies on a new microfluidic device that enables parallel imaging of Cic dynamics in dozens of live embryos. We found that although the pattern of Cic in early embryos is complex, it can be accurately approximated by a product of one spatial profile and one time-dependent amplitude. Analysis of these functions of space and time alone reveals the differential effects of mutations within the Ras pathway. Given the highly-conserved nature of Ras-dependent control of Cic, our approach opens a new way for functional analysis of multiple sequence variants from developmental abnormalities and cancers.
View details for DOI 10.1242/dmm.030163
View details for PubMedID 28495673
Mutational Correlates of Virological Failure in Individuals Receiving a WHO-Recommended Tenofovir-Containing First-Line Regimen: An International Collaboration.
2017; 18: 225-235
Tenofovir disoproxil fumarate (TDF) genotypic resistance defined by K65R/N and/or K70E/Q/G occurs in 20% to 60% of individuals with virological failure (VF) on a WHO-recommended TDF-containing first-line regimen. However, the full spectrum of reverse transcriptase (RT) mutations selected in individuals with VF on such a regimen is not known. To identify TDF regimen-associated mutations (TRAMs), we compared the proportion of each RT mutation in 2873 individuals with VF on a WHO-recommended first-line TDF-containing regimen to its proportion in a cohort of 50,803 antiretroviral-naïve individuals. To identify TRAMs specifically associated with TDF-selection pressure, we compared the proportion of each TRAM to its proportion in a cohort of 5805 individuals with VF on a first-line thymidine analog-containing regimen. We identified 83 TRAMs including 33 NRTI-associated, 40 NNRTI-associated, and 10 uncommon mutations of uncertain provenance. Of the 33 NRTI-associated TRAMs, 12 - A62V, K65R/N, S68G/N/D, K70E/Q/T, L74I, V75L, and Y115F - were more common among individuals receiving a first-line TDF-containing compared to a first-line thymidine analog-containing regimen. These 12 TDF-selected TRAMs will be important for monitoring TDF-associated transmitted drug-resistance and for determining the extent of reduced TDF susceptibility in individuals with VF on a TDF-containing regimen.
View details for DOI 10.1016/j.ebiom.2017.03.024
View details for PubMedID 28365230
View details for PubMedCentralID PMC5405160
Mass Cytometry Analytical Approaches Reveal Cytokine-Induced Changes in Natural Killer Cells.
Cytometry. Part B, Clinical cytometry
2017; 92 (1): 57-67
Natural killer (NK) cells have antiviral and antitumor activity that could be harnessed for the treatment of infections and malignancies. To maintain cell viability and enhance antiviral and antitumor effects, NK cells are frequently treated with cytokines. Here we performed an extensive assessment of the effects of cytokines on the phenotype and function of human NK cells.We used cytometry by time-of-flight (CyTOF) to evaluate NK cell repertoire changes after stimulation with interleukin (IL)-2, IL-15 or a combination of IL- 12/IL-15/IL-18. To analyze the high dimensional CyTOF data, we used several statistical and visualization tools, including viSNE (Visualization of t-Distributed Stochastic Neighbor Embedding), Citrus (Cluster identification, characterization, and regression), correspondence analysis, and the Friedman-Rafsky test.All three treatments (IL-2, IL-15, and IL-12/IL-15/IL-18) increase expression of CD56 and CD69. The effects of treatment with IL-2 and IL-15 are nearly indistinguishable and characterized principally by increased expression of surface markers including CD56, NKp30, NKp44, and increased expression of functional markers, such as perforin, granzyme B, and MIP-1β. The combination of IL-12/IL-15/IL- 18 induces a profound shift in the repertoire structure, decreasing expression of CD16, CD57, CD8, NKp30, NKp46, and NKG2D, and dramatically increasing expression of IFN-γ.CyTOF provides insights into the effects of cytokines on the phenotype and function of NK cells, which could inform future research efforts and approaches to NK cell immunotherapy. There are several analytical approaches to CyTOF data, and the appropriate method should be carefully selected based on which aspect of the dataset is being explored. This article is protected by copyright. All rights reserved.
View details for DOI 10.1002/cyto.b.21500
View details for PubMedID 27933717
Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations.
2017; 18 (Suppl 10): 394
Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous process, often unknown. Estimating data points' 'natural ordering' and their corresponding uncertainties can help researchers draw insights about the mechanisms involved.We introduce a Bayesian Unidimensional Scaling (BUDS) technique which extracts dominant sources of variation in high dimensional datasets and produces their visual data summaries, facilitating the exploration of a hidden continuum. The method maps multivariate data points to latent one-dimensional coordinates along their underlying trajectory, and provides estimated uncertainty bounds. By statistically modeling dissimilarities and applying a DiSTATIS registration method to their posterior samples, we are able to incorporate visualizations of uncertainties in the estimated data trajectory across different regions using confidence contours for individual data points. We also illustrate the estimated overall data density across different areas by including density clouds. One-dimensional coordinates recovered by BUDS help researchers discover sample attributes or covariates that are factors driving the main variability in a dataset. We demonstrated usefulness and accuracy of BUDS on a set of published microbiome 16S and RNA-seq and roll call data.Our method effectively recovers and visualizes natural orderings present in datasets. Automatic visualization tools for data exploration and analysis are available at: https://nlhuong.shinyapps.io/visTrajectory/ .
View details for DOI 10.1186/s12859-017-1790-x
View details for PubMedID 28929970
View details for PubMedCentralID PMC5606221
Interactive Visualization of Hierarchically Structured Data
Journal of Computational and Graphical Statistics
View details for DOI 10.1080/10618600.2017.1392866
Template Shape Estimation: Correcting an Asymptotic Bias
SIAM J Imaging Science
2017; 10 (2): 808-844
View details for DOI 10.1137/16M1084493
- Bayesian Nonparametric Ordination for the Analysis of Microbial Communities Journal of the American Statistical Association 2017
1,2-Dichloroethane Exposure Alters the Population Structure, Metabolism, and Kinetics of a Trichloroethene-Dechlorinating Dehalococcoides mccartyi Consortium.
Environmental science & technology
Bioremediation of groundwater contaminated with chlorinated aliphatic hydrocarbons such as perchloroethene and trichloroethene can result in the accumulation of the undesirable intermediate vinyl chloride. Such accumulation can either be due to the absence of specific vinyl chloride respiring Dehalococcoides mccartyi or to the inhibition of such strains by the metabolism of other microorganisms. The fitness of vinyl chloride respiring Dehalococcoides mccartyi subpopulations is particularly uncertain in the presence of chloroethene/chloroethane cocontaminant mixtures, which are commonly found in contaminated groundwater. Therefore, we investigated the structure of Dehalococcoides populations in a continuously fed reactor system under changing chloroethene/ethane influent conditions. We observed that increasing the influent ratio of 1,2-dichloroethane to trichloroethene was associated with ecological selection of a tceA-containing Dehalococcoides population relative to a vcrA-containing Dehalococcoides population. Although both vinyl chloride and 1,2-dichloroethane could be simultaneously transformed to ethene, prolonged exposure to 1,2-dichloroethane diminished the vinyl chloride transforming capacity of the culture. Kinetic tests revealed that dechlorination of 1,2-dichloroethane by the consortium was strongly inhibited by cis-dichloroethene but not vinyl chloride. Native polyacrylamide gel electrophoresis and mass spectrometry revealed that a trichloroethene reductive dehalogenase (TceA) homologue was the most consistently expressed of four detectable reductive dehalogenases during 1,2-dichloroethane exposure, suggesting that it catalyzes the reductive dihaloelimination of 1,2-dichloroethane to ethene.
View details for PubMedID 27809491
HIV-1 Protease, Reverse Transcriptase, and Integrase Variation
JOURNAL OF VIROLOGY
2016; 90 (13): 6058-6070
HIV-1 protease (PR), reverse transcriptase (RT), and integrase (IN) variability presents a challenge to laboratories performing genotypic resistance testing. This challenge will grow with increased sequencing of samples enriched for proviral DNA such as dried blood spots and increased use of next-generation sequencing (NGS) to detect low-abundance HIV-1 variants. We analyzed PR and RT sequences from >100,000 individuals and IN sequences from >10,000 individuals to characterize variation at each amino acid position, identify mutations indicating APOBEC-mediated G-to-A editing, and identify mutations resulting from selective drug pressure. Forty-seven percent of PR, 37% of RT and 34% of IN positions had one or more amino acid variants with a prevalence ≥1%. Seventy percent of PR, 60% of RT and 60% of IN positions had one or more variants with a prevalence ≥0.1%. Overall 201 PR, 636 RT and 346 IN variants had a prevalence ≥0.1%. The median inter-subtype prevalence-ratio was 2.9-, 2.1- and 1.9-fold for these PR, RT and IN variants, respectively. Only 5.0% of PR, 3.7% of RT and 2.0% of IN variants had a median inter-subtype prevalence-ratio ≥10-fold. Variants at lower prevalences were more likely to differ biochemically and to be part of an electrophoretic mixture compared to high prevalence variants. There were 209 mutations indicative of APOBEC-mediated G-to-A editing and 326 non-polymorphic treatment-selected mutations. Identifying viruses with a high number of APOBEC-associated mutations will facilitate the quality control of dried blood spot sequencing. Identifying sequences with a high proportion of rare mutations will facilitate the quality control of NGS.Most antiretroviral drugs target three HIV-1 proteins: PR, RT, and IN. These proteins are highly variable: many different amino acids can be present at the same position in viruses from different individuals. Some of the amino acid variants cause drug resistance and occur mainly in individuals receiving antiretroviral drugs. Some variants result from a human cellular defense mechanism called APOBEC-mediated hypermutation. Many variants result from naturally occurring mutation. Some variants may represent technical artifacts. We studied PR and RT sequences from >100,000 individuals and IN sequences from >10,000 individuals to quantify variation at each amino acid position in these three HIV-1 proteins. We performed analyses to determine which amino acid variants resulted from antiretroviral drug selection pressure, APOBEC-mediated editing, and naturally occurring variation. Our results provide information essential to clinical, research, and public health laboratories performing genotypic resistance testing by sequencing HIV-1 PR, RT, and IN.
View details for DOI 10.1128/JVI.00495-16
View details for Web of Science ID 000378340300019
View details for PubMedID 27099321
REPRODUCIBLE RESEARCH WORKFLOW IN R FOR THE ANALYSIS OF PERSONALIZED HUMAN MICROBIOME DATA.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2016; 21: 183-194
This article presents a reproducible research workflow for amplicon-based microbiome studies in personalized medicine created using Bioconductor packages and the knitr markdown interface.We show that sometimes a multiplicity of choices and lack of consistent documentation at each stage of the sequential processing pipeline used for the analysis of microbiome data can lead to spurious results. We propose its replacement with reproducible and documented analysis using R packages dada2, knitr, and phyloseq. This workflow implements both key stages of amplicon analysis: the initial filtering and denoising steps needed to construct taxonomic feature tables from error-containing sequencing reads (dada2), and the exploratory and inferential analysis of those feature tables and associated sample metadata (phyloseq). This workow facilitates reproducible interrogation of the full set of choices required in microbiome studies. We present several examples in which we leverage existing packages for analysis in a way that allows easy sharing and modification by others, and give pointers to articles that depend on this reproducible workflow for the study of longitudinal and spatial series analyses of the vaginal microbiome in pregnancy and the oral microbiome in humans with healthy dentition and intra-oral tissues.
View details for PubMedID 26776185
Bioconductor workflow for microbiome data analysis: from raw reads to community analyses.
2016; 5: 1492-?
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parameteric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.
View details for DOI 10.12688/f1000research.8986.1
View details for PubMedID 27508062
Interpreting Prevotella and Bacteroides as biomarkers of diet and lifestyle.
2016; 4 (1): 15-?
In a series of studies of the gut microbiome, "enterotypes" have been used to classify gut microbiome samples that cluster together in ordination analyses. Initially, three distinct enterotypes were described, although later studies reduced this to two clusters, one dominated by Bacteroides or Clostridiales species found more commonly in Western (American and Western European) subjects and the other dominated by Prevotella more often associated with non-Western subjects. The two taxa, Bacteroides and Prevotella, have been presumed to represent consistent underlying microbial communities, but no one has demonstrated the presence of additional microbial taxa across studies that can define these communities.We analyzed the combined microbiome data from five previous studies with samples across five continents. We clearly demonstrate that there are no consistent bacterial taxa associated with either Bacteroides- or Prevotella-dominated communities across the studies. By increasing the number and diversity of samples, we found gradients of both Bacteroides and Prevotella and a lack of the distinct clusters in the principal coordinate plots originally proposed in the "enterotypes" hypothesis. The apparent segregation of the samples seen in many ordination plots is due to the differences in the samples' Prevotella and Bacteroides abundances and does not represent consistent microbial communities within the "enterotypes" and is not associated with other taxa across studies. The projections we see are consistent with a continuum of values created from a simple mixture of Bacteroides and Prevotella; these two biomarkers are significantly correlated to the projection axes. We suggest that previous findings citing Bacteroides- and Prevotella-dominated clusters are the result of an artifact caused by the greater relative abundance of these two taxa over other taxa in the human gut and the sparsity of Prevotella abundant samples.We believe that the term "enterotypes" is misleading because it implies both an underlying consistency of community taxa and a clear separation of sets of human gut samples, neither of which is supported by the broader data. We propose the use of "biomarker" as a more accurate description of these and other taxa that correlate with diet, lifestyle, and disease state.
View details for DOI 10.1186/s40168-016-0160-7
View details for PubMedID 27068581
Measuring multivariate association and beyond.
2016; 10: 132-167
View details for DOI 10.1214/16-SS116
More effective drugs lead to harder selective sweeps in the evolution of drug resistance in HIV-1.
In the early days of HIV treatment, drug resistance occurred rapidly and predictably in all patients, but under modern treatments, resistance arises slowly, if at all. The probability of resistance should be controlled by the rate of generation of resistance mutations. If many adaptive mutations arise simultaneously, then adaptation proceeds by soft selective sweeps in which multiple adaptive mutations spread concomitantly, but if adaptive mutations occur rarely in the population, then a single adaptive mutation should spread alone in a hard selective sweep. Here, we use 6717 HIV-1 consensus sequences from patients treated with first-line therapies between 1989 and 2013 to confirm that the transition from fast to slow evolution of drug resistance was indeed accompanied with the expected transition from soft to hard selective sweeps. This suggests more generally that evolution proceeds via hard sweeps if resistance is unlikely and via soft sweeps if it is likely.
View details for DOI 10.7554/eLife.10670
View details for PubMedID 26882502
View details for PubMedCentralID PMC4764592
Marine mammals harbor unique microbiotas shaped by and yet distinct from the sea.
2016; 7: 10516-?
Marine mammals play crucial ecological roles in the oceans, but little is known about their microbiotas. Here we study the bacterial communities in 337 samples from 5 body sites in 48 healthy dolphins and 18 healthy sea lions, as well as those of adjacent seawater and other hosts. The bacterial taxonomic compositions are distinct from those of other mammals, dietary fish and seawater, are highly diverse and vary according to body site and host species. Dolphins harbour 30 bacterial phyla, with 25 of them in the mouth, several abundant but poorly characterized Tenericutes species in gastric fluid and a surprisingly paucity of Bacteroidetes in distal gut. About 70% of near-full length bacterial 16S ribosomal RNA sequences from dolphins are unique. Host habitat, diet and phylogeny all contribute to variation in marine mammal distal gut microbiota composition. Our findings help elucidate the factors structuring marine mammal microbiotas and may enhance monitoring of marine mammal health.
View details for DOI 10.1038/ncomms10516
View details for PubMedID 26839246
View details for PubMedCentralID PMC4742810
- Variation in Taxonomic Composition of the Fecal Microbiota in an Inbred Mouse Strain across Individuals and Time PLOS ONE 2015; 10 (11)
Temporal and spatial variation of the human microbiota during pregnancy
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2015; 112 (35): 11060-11065
Despite the critical role of the human microbiota in health, our understanding of microbiota compositional dynamics during and after pregnancy is incomplete. We conducted a case-control study of 49 pregnant women, 15 of whom delivered preterm. From 40 of these women, we analyzed bacterial taxonomic composition of 3,767 specimens collected prospectively and weekly during gestation and monthly after delivery from the vagina, distal gut, saliva, and tooth/gum. Linear mixed-effects modeling, medoid-based clustering, and Markov chain modeling were used to analyze community temporal trends, community structure, and vaginal community state transitions. Microbiota community taxonomic composition and diversity remained remarkably stable at all four body sites during pregnancy (P > 0.05 for trends over time). Prevalence of a Lactobacillus-poor vaginal community state type (CST 4) was inversely correlated with gestational age at delivery (P = 0.0039). Risk for preterm birth was more pronounced for subjects with CST 4 accompanied by elevated Gardnerella or Ureaplasma abundances. This finding was validated with a set of 246 vaginal specimens from nine women (four of whom delivered preterm). Most women experienced a postdelivery disturbance in the vaginal community characterized by a decrease in Lactobacillus species and an increase in diverse anaerobes such as Peptoniphilus, Prevotella, and Anaerococcus species. This disturbance was unrelated to gestational age at delivery and persisted for up to 1 y. These findings have important implications for predicting premature labor, a major global health problem, and for understanding the potential impact of a persistent, altered postpartum microbiota on maternal health, including outcomes of pregnancies following short interpregnancy intervals.
View details for DOI 10.1073/pnas.1502875112
View details for Web of Science ID 000360383200068
- Human NK cell repertoire diversity reflects immune experience and correlates with viral susceptibility SCIENCE TRANSLATIONAL MEDICINE 2015; 7 (297)
- de Finetti Priors using Markov chain Monte Carlo computations STATISTICS AND COMPUTING 2015; 25 (4): 797-808
Shiny-phyloseq: Web application for interactive microbiome analysis with provenance tracking.
2015; 31 (2): 282-283
We have created a Shiny-based web application, called Shiny-phyloseq, for dynamic interaction with microbiome data that runs on any modern web-browser, and requires no programming to use - increasing the accessibility and decreasing the entrance requirement to using phyloseq and related R tools. Along with a data- and context-aware dynamic interface for exploring the effects of parameter and method choices, Shiny-phyloseq also records the complete user-input and subsequent graphical results of a user's session, allowing the user to archive, share, and reproduce the sequence of steps that created their result - without writing any new code themselves. Availability and Implementation: Shiny-phyloseq is implemented entirely in the R language. It can be hosted/launched by any system with R installed, including Windows, Mac OS, and most Linux distributions. Information technology administrators can also host Shiny-phyloseq from a remote server, in which case users need only have a web-browser installed. Shiny-phyloseq is provided free of charge under a GPL-3 open-source license through GitHub at http://email@example.com.
View details for DOI 10.1093/bioinformatics/btu616
View details for PubMedID 25262154
- Enhanced natural killer-cell and T-cell responses to influenza A virus during pregnancy PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 2014; 111 (40): 14506-14511
structSSI: Simultaneous and Selective Inference for Grouped or Hierarchically Structured Data
JOURNAL OF STATISTICAL SOFTWARE
2014; 59 (13)
View details for Web of Science ID 000341807500001
- HIV-1 transmission networks in a small world. journal of infectious diseases 2014; 209 (2): 180-182
Nasal Microenvironments and Interspecific Interactions Influence Nasal Microbiota Complexity and S. aureus Carriage.
Cell host & microbe
2013; 14 (6): 631-640
The indigenous microbiota of the nasal cavity plays important roles in human health and disease. Patterns of spatial variation in microbiota composition may help explain Staphylococcus aureus colonization and reveal interspecies and species-host interactions. To assess the biogeography of the nasal microbiota, we sampled healthy subjects, representing both S. aureus carriers and noncarriers at three nasal sites (anterior naris, middle meatus, and sphenoethmoidal recess). Phylogenetic compositional and sparse linear discriminant analyses revealed communities that differed according to site epithelium type and S. aureus culture-based carriage status. Corynebacterium accolens and C. pseudodiphtheriticum were identified as the most important microbial community determinants of S. aureus carriage, and competitive interactions were only evident at sites with ciliated pseudostratified columnar epithelium. In vitro cocultivation experiments provided supporting evidence of interactions among these species. These results highlight spatial variation in nasal microbial communities and differences in community composition between S. aureus carriers and noncarriers.
View details for DOI 10.1016/j.chom.2013.11.005
View details for PubMedID 24331461
Detection of cytomegalovirus drug resistance mutations by next-generation sequencing.
Journal of clinical microbiology
2013; 51 (11): 3700-3710
Antiviral therapy for cytomegalovirus (CMV) plays an important role in the clinical management of solid organ and hematopoietic stem cell transplant recipients. However, CMV antiviral therapy can be complicated by drug resistance associated with mutations in the phosphotransferase UL97 and the DNA polymerase UL54. We have developed an amplicon-based high-throughput sequencing strategy for detecting CMV drug resistance mutations in clinical plasma specimens using a microfluidics PCR platform for multiplexed library preparation and a benchtop next-generation sequencing instrument. Plasmid clones of the UL97 and UL54 genes were used to demonstrate the low overall empirical error rate of the assay (0.189%) and to develop a statistical algorithm for identifying authentic low-abundance variants. The ability of the assay to detect resistance mutations was tested with mixes of wild-type and mutant plasmids, as well as clinical CMV isolates and plasma samples that were known to contain mutations that confer resistance. Finally, 48 clinical plasma specimens with a range of viral loads (394 to 2,191,011 copies/ml plasma) were sequenced using multiplexing of up to 24 specimens per run. This led to the identification of seven resistance mutations, three of which were present in <20% of the sequenced population. Thus, this assay offers more sensitive detection of minor variants and a higher multiplexing capacity than current methods for the genotypic detection of CMV drug resistance mutations.
View details for DOI 10.1128/JCM.01605-13
View details for PubMedID 23985916
Genetically dictated change in host mucus carbohydrate landscape exerts a diet-dependent effect on the gut microbiota
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2013; 110 (42): 17059-17064
We investigate how host mucus glycan composition interacts with dietary carbohydrate content to influence the composition and expressed functions of a human gut community. The humanized gnotobiotic mice mimic humans with a nonsecretor phenotype due to knockout of their α1-2 fucosyltransferase (Fut2) gene. The fecal microbiota of Fut2(-) mice that lack fucosylated host glycans show decreased alpha diversity relative to Fut2(+) mice and exhibit significant differences in community composition. A glucose-rich plant polysaccharide-deficient (PD) diet exerted a strong effect on the microbiota membership but eliminated the effect of Fut2 genotype. Additionally fecal metabolites predicted host genotype in mice on a polysaccharide-rich standard diet but not on a PD diet. A more detailed mechanistic analysis of these interactions involved colonization of gnotobiotic Fut2(+) and Fut2(-) mice with Bacteroides thetaiotaomicron, a prominent member of the human gut microbiota known to adaptively forage host mucosal glycans when dietary polysaccharides are absent. Within Fut2(-) mice, the B. thetaiotaomicron fucose catabolic pathway was markedly down-regulated, whereas BT4241-4247, an operon responsive to terminal β-galactose, the precursor that accumulates in the Fut2(-) mice, was significantly up-regulated. These changes in B. thetaiotaomicron gene expression were only evident in mice fed a PD diet, wherein B. thetaiotaomicron relies on host mucus consumption. Furthermore, up-regulation of the BT4241-4247 operon was also seen in humanized Fut2(-) mice. Together, these data demonstrate that differences in host genotype that affect the carbohydrate landscape of the distal gut interact with diet to alter the composition and function of resident microbes in a diet-dependent manner.
View details for DOI 10.1073/pnas.1306070110
View details for Web of Science ID 000325634200076
View details for PubMedID 24062455
- Prototypical Recombinant Multi-Protease-Inhibitor-Resistant Infectious Molecular Clones of Human Immunodeficiency Virus Type 1 ANTIMICROBIAL AGENTS AND CHEMOTHERAPY 2013; 57 (9): 4290-4299
- ANALYSIS OF CASINO SHELF SHUFFLING MACHINES ANNALS OF APPLIED PROBABILITY 2013; 23 (4): 1692-1720
- Harvester ants use interactions to regulate forager activation and availability ANIMAL BEHAVIOUR 2013; 86 (1): 197-207
Nucleoside reverse transcriptase inhibitor resistance mutations associated with first-line stavudine-containing antiretroviral therapy: programmatic implications for countries phasing out stavudine.
journal of infectious diseases
2013; 207: S70-7
Background The World Health Organization Antiretroviral Treatment Guidelines recommend phasing-out stavudine because of its risk of long-term toxicity. There are two mutational pathways of stavudine resistance with different implications for zidovudine and tenofovir cross-resistance, the primary candidates for replacing stavudine. However, because resistance testing is rarely available in resource-limited settings, it is critical to identify the cross-resistance patterns associated with first-line stavudine failure. Methods We analyzed HIV-1 resistance mutations following first-line stavudine failure from 35 publications comprising 1,825 individuals. We also assessed the influence of concomitant nevirapine vs. efavirenz, therapy duration, and HIV-1 subtype on the proportions of mutations associated with zidovudine vs. tenofovir cross-resistance. Results Mutations with preferential zidovudine activity, K65R or K70E, occurred in 5.3% of individuals. Mutations with preferential tenofovir activity, ≥two thymidine analog mutations (TAMs) or Q151M, occurred in 22% of individuals. Nevirapine increased the risk of TAMs, K65R, and Q151M. Longer therapy increased the risk of TAMs and Q151M but not K65R. Subtype C and CRF01_AE increased the risk of K65R, but only CRF01_AE increased the risk of K65R without Q151M. Conclusions Regardless of concomitant nevirapine vs. efavirenz, therapy duration, or subtype, tenofovir was more likely than zidovudine to retain antiviral activity following first-line d4T therapy.
View details for DOI 10.1093/infdis/jit114
View details for PubMedID 23687292
- Nucleoside reverse transcriptase inhibitor resistance mutations associated with first-line Stavudine-containing antiretroviral therapy: programmatic implications for countries phasing out Stavudine. journal of infectious diseases 2013; 207: S70-7
- Interval Graph Limits ANNALS OF COMBINATORICS 2013; 17 (1): 27-52
phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.
2013; 8 (4)
the analysis of microbial communities through dna sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
View details for DOI 10.1371/journal.pone.0061217
View details for PubMedID 23630581
Advancing Our Understanding of the Human Microbiome Using QIIME.
Methods in enzymology
2013; 531: 371-444
High-throughput DNA sequencing technologies, coupled with advanced bioinformatics tools, have enabled rapid advances in microbial ecology and our understanding of the human microbiome. QIIME (Quantitative Insights Into Microbial Ecology) is an open-source bioinformatics software package designed for microbial community analysis based on DNA sequence data, which provides a single analysis framework for analysis of raw sequence data through publication-quality statistical analyses and interactive visualizations. In this chapter, we demonstrate the use of the QIIME pipeline to analyze microbial communities obtained from several sites on the bodies of transgenic and wild-type mice, as assessed using 16S rRNA gene sequences generated on the Illumina MiSeq platform. We present our recommended pipeline for performing microbial community analysis and provide guidelines for making critical choices in the process. We present examples of some of the types of analyses that are enabled by QIIME and discuss how other tools, such as phyloseq and R, can be applied to expand upon these analyses.
View details for DOI 10.1016/B978-0-12-407863-5.00019-8
View details for PubMedID 24060131
- Sampling from a Manifold Festchrift for Joe Eaton IMS. 2013
Low-Level Persistence of Drug Resistance Mutations in Hepatitis B Virus-Infected Subjects with a Past History of Lamivudine Treatment
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY
2013; 57 (1): 343-349
We sought to determine the prevalence of hepatitis B virus (HBV) lamivudine (LAM)-resistant minority variants in subjects who once received LAM but had discontinued it prior to virus sampling. We performed direct PCR Sanger sequencing and ultradeep pyrosequencing (UDPS) of HBV reverse transcriptase (RT) of plasma viruses from 45 LAM-naive subjects and 46 LAM-experienced subjects who had discontinued LAM a median of 24 months earlier. UDPS was performed to a depth of ∼3,000 reads per nucleotide. Minority variants were defined as differences from the Sanger sequence present in ≥0.5% of UDPS reads in a sample. Sanger sequencing identified ≥1 LAM resistance mutations (rtL80I/V, rtM204I, and rtA181T) in samples from 5 (11%) of 46 LAM-experienced and none of 45 LAM-naive subjects (0%; P = 0.06). UDPS detected ≥1 LAM resistance mutations (rtL80I/V, rtV173L, rtL180M, rtA181T, and rtM204I/V) in 10 (22%) of the 46 LAM-experienced subjects, including 5 in whom LAM resistance mutations were not identified by Sanger sequencing. Overall, LAM resistance mutations were more likely to be present in LAM-experienced (10/46, 22%) than LAM-naive subjects (0/45, 0%; P = 0.001). The median time since LAM discontinuation was 12.8 months in the 10 subjects with a LAM resistance mutation compared to 30.5 months in the 36 LAM-experienced subjects without a LAM resistance mutation (P < 0.001). The likelihood of detecting a LAM resistance mutation was significantly increased using UDPS compared to Sanger sequencing and was inversely associated with the time since LAM discontinuation.
View details for DOI 10.1128/AAC.01601-12
View details for Web of Science ID 000312958400044
View details for PubMedID 23114756
PRC2/EED-EZH2 Complex Is Up-Regulated in Breast Cancer Lymph Node Metastasis Compared to Primary Tumor and Correlates with Tumor Proliferation In Situ
2012; 7 (12)
Lymph node metastasis is a key event in the progression of breast cancer. Therefore it is important to understand the underlying mechanisms which facilitate regional lymph node metastatic progression.We performed gene expression profiling of purified tumor cells from human breast tumor and lymph node metastasis. By microarray network analysis, we found an increased expression of polycomb repression complex 2 (PRC2) core subunits EED and EZH2 in lymph node metastatic tumor cells over primary tumor cells which were validated through real-time PCR. Additionally, immunohistochemical (IHC) staining and quantitative image analysis of whole tissue sections showed a significant increase of EZH2 expressing tumor cells in lymph nodes over paired primary breast tumors, which strongly correlated with tumor cell proliferation in situ. We further explored the mechanisms of PRC2 gene up-regulation in metastatic tumor cells and found up-regulation of E2F genes, MYC targets and down-regulation of tumor suppressor gene E-cadherin targets in lymph node metastasis through GSEA analyses. Using IHC, the expression of potential EZH2 target, E-cadherin was examined in paired primary/lymph node samples and was found to be significantly decreased in lymph node metastases over paired primary tumors.This study identified an over expression of the epigenetic silencing complex PRC2/EED-EZH2 in breast cancer lymph node metastasis as compared to primary tumor and its positive association with tumor cell proliferation in situ. Concurrently, PRC2 target protein E-cadherin was significant decreased in lymph node metastases, suggesting PRC2 promotes epithelial mesenchymal transition (EMT) in lymph node metastatic process through repression of E-cadherin. These results indicate that epigenetic regulation mediated by PRC2 proteins may provide additional advantage for the outgrowth of metastatic tumor cells in lymph nodes. This opens up epigenetic drug development possibilities for the treatment and prevention of lymph node metastasis in breast cancer.
View details for DOI 10.1371/journal.pone.0051239
View details for Web of Science ID 000312201900057
View details for PubMedID 23251464
View details for PubMedCentralID PMC3519681
Denoising PCR-amplified metagenome data
PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy.We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche's 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise.DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.
View details for DOI 10.1186/1471-2105-13-283
View details for Web of Science ID 000314687600001
View details for PubMedID 23113967
- Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS 2012; 21 (3): 581-599
Nest site and weather affect the personality of harvester ant colonies
2012; 23 (5): 1022-1029
Environmental conditions and physical constraints both influence an animal's behavior. We investigate whether behavioral variation among colonies of the black harvester ant, Messor andrei, remains consistent across foraging and disturbance situations and ask whether consistent colony behavior is affected by nest site and weather. We examined variation among colonies in responsiveness to food baits and to disturbance, measured as a change in numbers of active ants, and in the speed with which colonies retrieved food and removed debris. Colonies differed consistently, across foraging and disturbance situations, in both responsiveness and speed. Increased activity in response to food was associated with a smaller decrease in response to alarm. Speed of retrieving food was correlated with speed of removing debris. In all colonies, speed was greater in dry conditions, reducing the amount of time ants spent outside the nest. While a colony occupied a certain nest site, its responsiveness was consistent in both foraging and disturbance situations, suggesting that nest structure influences colony personality.
View details for DOI 10.1093/beheco/ars066
View details for Web of Science ID 000308228200017
The Molecular Architecture of the Eukaryotic Chaperonin TRiC/CCT
2012; 20 (5): 814-825
TRiC/CCT is a highly conserved and essential chaperonin that uses ATP cycling to facilitate folding of approximately 10% of the eukaryotic proteome. This 1 MDa hetero-oligomeric complex consists of two stacked rings of eight paralogous subunits each. Previously proposed TRiC models differ substantially in their subunit arrangements and ring register. Here, we integrate chemical crosslinking, mass spectrometry, and combinatorial modeling to reveal the definitive subunit arrangement of TRiC. In vivo disulfide mapping provided additional validation for the crosslinking-derived arrangement as the definitive TRiC topology. This subunit arrangement allowed the refinement of a structural model using existing X-ray diffraction data. The structure described here explains all available crosslink experiments, provides a rationale for previously unexplained structural features, and reveals a surprising asymmetry of charges within the chaperonin folding chamber.
View details for DOI 10.1016/j.str.2012.03.007
View details for Web of Science ID 000304214400008
View details for PubMedID 22503819
Phyloseq: a bioconductor package for handling and analysis of high-throughput phylogenetic sequence data.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
We present a detailed description of a new Bioconductor package, phyloseq, for integrated data and analysis of taxonomically-clustered phylogenetic sequencing data in conjunction with related data types. The phyloseq package integrates abundance data, phylogenetic information and covariates so that exploratory transformations, plots, and confirmatory testing and diagnostic plots can be carried out seamlessly. The package is built following the S4 object-oriented framework of the R language so that once the data have been input the user can easily transform, plot and analyze the data. We present some examples that highlight the methods and the ease with which we can leverage existing packages.
View details for PubMedID 22174279
Comparisons of distance methods for combining covariates and abundances in microbiome studies.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
This article compares different methods for combining abundance data, phylogenetic trees and clinical covariates in a nonparametric setting. In particular we study the output from the principal coordinates analysis on UNIFRAC and WEIGHTED UNIFRAC distances and the output from a double principal coordinate analyses DPCOA using distances computed on the phylogenetic tree. We also present power comparisons for some of the standard tests of phylogenetic signal between different types of samples. These methods are compared both on simulated and real data sets. Our study shows that DPCoA is less robust to outliers, and more robust to small noisy fluctuations around zero.
View details for PubMedID 22174277
A multifaceted analysis of HIV-1 protease multidrug resistance phenotypes
Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to arise within the population. Understanding the phenotypic and genotypic patterns responsible for multi-PI resistance is necessary for developing PIs that are active against clinically-relevant PI-resistant HIV-1 variants.In this work, we use globally optimal integer programming-based clustering techniques to elucidate multi-PI phenotypic resistance patterns using a data set of 398 HIV-1 protease sequences that have each been phenotyped for susceptibility toward the nine clinically-approved HIV-1 PIs. We validate the information content of the clusters by evaluating their ability to predict the level of decreased susceptibility to each of the available PIs using a cross validation procedure. We demonstrate the finding that as a result of phenotypic cross resistance, the considered clinical HIV-1 protease isolates are confined to ~6% or less of the clinically-relevant phenotypic space. Clustering and feature selection methods are used to find representative sequences and mutations for major resistance phenotypes to elucidate their genotypic signatures. We show that phenotypic similarity does not imply genotypic similarity, that different PI-resistance mutation patterns can give rise to HIV-1 isolates with similar phenotypic profiles.Rather than characterizing HIV-1 susceptibility toward each PI individually, our study offers a unique perspective on the phenomenon of PI class resistance by uncovering major multidrug-resistant phenotypic patterns and their often diverse genotypic determinants, providing a methodology that can be applied to understand clinically-relevant phenotypic patterns to aid in the design of novel inhibitors that target other rapidly evolving molecular targets as well.
View details for DOI 10.1186/1471-2105-12-477
View details for Web of Science ID 000301382000001
View details for PubMedID 22172090
- THE DUALITY DIAGRAM IN DATA ANALYSIS: EXAMPLES OF MODERN APPLICATIONS ANNALS OF APPLIED STATISTICS 2011; 5 (4): 2266-2277
The effect of individual variation on the structure and function of interaction networks in harvester ants
JOURNAL OF THE ROYAL SOCIETY INTERFACE
2011; 8 (64): 1562-1573
Social insects exhibit coordinated behaviour without central control. Local interactions among individuals determine their behaviour and regulate the activity of the colony. Harvester ants are recruited for outside work, using networks of brief antennal contacts, in the nest chamber closest to the nest exit: the entrance chamber. Here, we combine empirical observations, image analysis and computer simulations to investigate the structure and function of the interaction network in the entrance chamber. Ant interactions were distributed heterogeneously in the chamber, with an interaction hot-spot at the entrance leading further into the nest. The distribution of the total interactions per ant followed a right-skewed distribution, indicating the presence of highly connected individuals. Numbers of ant encounters observed positively correlated with the duration of observation. Individuals varied in interaction frequency, even after accounting for the duration of observation. An ant's interaction frequency was explained by its path shape and location within the entrance chamber. Computer simulations demonstrate that variation among individuals in connectivity accelerates information flow to an extent equivalent to an increase in the total number of interactions. Individual variation in connectivity, arising from variation among ants in location and spatial behaviour, creates interaction centres, which may expedite information flow.
View details for DOI 10.1098/rsif.2011.0059
View details for Web of Science ID 000295211200003
View details for PubMedID 21490001
- Adaptive importance sampling for network growth models ANNALS OF OPERATIONS RESEARCH 2011; 189 (1): 187-203
Colonic Contribution to Uremic Solutes
JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY
2011; 22 (9): 1769-1776
Microbes in the colon produce compounds, normally excreted by the kidneys, which are potential uremic toxins. Although p-cresol sulfate and indoxyl sulfate are well studied examples, few other compounds are known. Here, we compared plasma from hemodialysis patients with and without colons to identify and further characterize colon-derived uremic solutes. HPLC confirmed the colonic origin of p-cresol sulfate and indoxyl sulfate, but levels of hippurate, methylamine, and dimethylamine were not significantly lower in patients without colons. High-resolution mass spectrometry detected more than 1000 features in predialysis plasma samples. Hierarchical clustering based on these features clearly separated dialysis patients with and without colons. Compared with patients with colons, we identified more than 30 individual features in patients without colons that were either absent or present in lower concentration. Almost all of these features were more prominent in plasma from dialysis patients than normal subjects, suggesting that they represented uremic solutes. We used a panel of indole and phenyl standards to identify five colon-derived uremic solutes: α-phenylacetyl-l-glutamine, 5-hydroxyindole, indoxyl glucuronide, p-cresol sulfate, and indoxyl sulfate. However, compounds with accurate mass values matching most of the colon-derived solutes could not be found in standard metabolomic databases. These results suggest that colonic microbes may produce an important portion of uremic solutes, most of which remain unidentified.
View details for DOI 10.1681/ASN.2010121220
View details for Web of Science ID 000295705800024
View details for PubMedID 21784895
View details for PubMedCentralID PMC3171947
Site-Specific Mobilization of Vinyl Chloride Respiration Islands by a Mechanism Common in Dehalococcoides
Vinyl chloride is a widespread groundwater pollutant and Group 1 carcinogen. A previous comparative genomic analysis revealed that the vinyl chloride reductase operon, vcrABC, of Dehalococcoides sp. strain VS is embedded in a horizontally-acquired genomic island that integrated at the single-copy tmRNA gene, ssrA.We targeted conserved positions in available genomic islands to amplify and sequence four additional vcrABC -containing genomic islands from previously-unsequenced vinyl chloride respiring Dehalococcoides enrichments. We identified a total of 31 ssrA-specific genomic islands from Dehalococcoides genomic data, accounting for 47 reductive dehalogenase homologous genes and many other non-core genes. Sixteen of these genomic islands contain a syntenic module of integration-associated genes located adjacent to the predicted site of integration, and among these islands, eight contain vcrABC as genetic 'cargo'. These eight vcrABC -containing genomic islands are syntenic across their ~12 kbp length, but have two phylogenetically discordant segments that unambiguously differentiate the integration module from the vcrABC cargo. Using available Dehalococcoides phylogenomic data we estimate that these ssrA-specific genomic islands are at least as old as the Dehalococcoides group itself, which in turn is much older than human civilization.The vcrABC -containing genomic islands are a recently-acquired subset of a diverse collection of ssrA-specific mobile elements that are a major contributor to strain-level diversity in Dehalococcoides, and may have been throughout its evolution. The high similarity between vcrABC sequences is quantitatively consistent with recent horizontal acquisition driven by ~100 years of industrial pollution with chlorinated ethenes.
View details for DOI 10.1186/1471-2164-12-287
View details for Web of Science ID 000293280200001
View details for PubMedID 21635780
Colony variation in the collective regulation of foraging by harvester ants
2011; 22 (2): 429-435
This study investigates variation in collective behavior in a natural population of colonies of the harvester ant, Pogonomyrmex barbatus. Harvester ant colonies regulate foraging activity to adjust to current food availability; the rate at which inactive foragers leave the nest on the next trip depends on the rate at which successful foragers return with food. This study investigates differences among colonies in foraging activity and how these differences are associated with variation among colonies in the regulation of foraging. Colonies differ in the baseline rate at which patrollers leave the nest, without stimulation from returning ants. This baseline rate predicts a colony's foraging activity, suggesting there is a colony-specific activity level that influences how quickly any ant leaves the nest. When a colony's foraging activity is high, the colony is more likely to regulate foraging. Moreover, colonies differ in the propensity to adjust the rate of outgoing foragers to the rate of forager return. Naturally occurring variation in the regulation of foraging may lead to variation in colony survival and reproductive success.
View details for DOI 10.1093/beheco/arq218
View details for Web of Science ID 000289299500033
PhyloChip microarray analysis reveals altered gastrointestinal microbial communities in a rat model of colonic hypersensitivity
NEUROGASTROENTEROLOGY AND MOTILITY
2011; 23 (2)
Irritable bowel syndrome (IBS) is a chronic, episodic gastrointestinal disorder that is prevalent in a significant fraction of western human populations; and changes in the microbiota of the large bowel have been implicated in the pathology of the disease.Using a novel comprehensive, high-density DNA microarray (PhyloChip) we performed a phylogenetic analysis of the microbial community of the large bowel in a rat model in which intracolonic acetic acid in neonates was used to induce long lasting colonic hypersensitivity and decreased stool water content and frequency, representing the equivalent of human constipation-predominant IBS.Our results revealed a significantly increased compositional difference in the microbial communities in rats with neonatal irritation as compared with controls. Even more striking was the dramatic change in the ratio of Firmicutes relative to Bacteroidetes, where neonatally irritated rats were enriched more with Bacteroidetes and also contained a different composition of species within this phylum. Our study also revealed differences at the level of bacterial families and species.The PhyloChip is a useful and convenient method to study enteric microflora. Further, this rat model system may be a useful experimental platform to study the causes and consequences of changes in microbial community composition associated with IBS.
View details for DOI 10.1111/j.1365-2982.2010.01637.x
View details for Web of Science ID 000286211600017
View details for PubMedID 21129126
View details for PubMedCentralID PMC3353725
Visualization and statistical comparisons of microbial communities using R packages on Phylochip data.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
This article explains the statistical and computational methodology used to analyze species abundances collected using the LNBL Phylochip in a study of Irritable Bowel Syndrome (IBS) in rats. Some tools already available for the analysis of ordinary microarray data are useful in this type of statistical analysis. For instance in correcting for multiple testing we use Family Wise Error rate control and step-down tests (available in the multtest package). Once the most significant species are chosen we use the hypergeometric tests familiar for testing GO categories to test specific phyla and families. We provide examples of normalization, multivariate projections, batch effect detection and integration of phylogenetic covariation, as well as tree equalization and robustification methods.
View details for PubMedID 21121042
- Bioinformatics and Data Analysis Handbook of Molecular and Cellular Methods in Biology and Medicine CRC. 2011; 3rd: 409–436
A classification model for G-to-A hypermutation in hepatitis B virus ultra-deep pyrosequencing reads
2010; 26 (23): 2929-2932
G → A hypermutation is an innate antiviral defense mechanism, mediated by host enzymes, which leads to the mutational impairment of viruses. Sensitive and specific identification of host-mediated G → A hypermutation is a novel sequence analysis challenge, particularly for viral deep sequencing studies. For example, two of the most common hepatitis B virus (HBV) reverse transcriptase (RT) drug-resistance mutations, A181T and M204I, arise from G → A changes and are routinely detected as low-abundance variants in nearly all HBV deep sequencing samples.We developed a classification model using measures of G → A excess and predicted indicators of lethal mutation and applied this model to 325 920 unique deep sequencing reads from plasma virus samples from 45 drug treatment-naïve HBV-infected individuals. The 2.9% of sequence reads that were classified as hypermutated by our model included most of the reads with A181T and/or M204I, indicating the usefulness of this model for distinguishing viral adaptive changes from host-mediated viral editing.Source code and sequence data are available at http://firstname.lastname@example.orgSupplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btq570
View details for Web of Science ID 000284430900001
View details for PubMedID 20937597
Quantitative, Architectural Analysis of Immune Cell Subsets in Tumor-Draining Lymph Nodes from Breast Cancer Patients and Healthy Lymph Nodes
2010; 5 (8)
To date, pathological examination of specimens remains largely qualitative. Quantitative measures of tissue spatial features are generally not captured. To gain additional mechanistic and prognostic insights, a need for quantitative architectural analysis arises in studying immune cell-cancer interactions within the tumor microenvironment and tumor-draining lymph nodes (TDLNs).We present a novel, quantitative image analysis approach incorporating 1) multi-color tissue staining, 2) high-resolution, automated whole-section imaging, 3) custom image analysis software that identifies cell types and locations, and 4) spatial statistical analysis. As a proof of concept, we applied this approach to study the architectural patterns of T and B cells within tumor-draining lymph nodes from breast cancer patients versus healthy lymph nodes. We found that the spatial grouping patterns of T and B cells differed between healthy and breast cancer lymph nodes, and this could be attributed to the lack of B cell localization in the extrafollicular region of the TDLNs.Our integrative approach has made quantitative analysis of complex visual data possible. Our results highlight spatial alterations of immune cells within lymph nodes from breast cancer patients as an independent variable from numerical changes. This opens up new areas of investigations in research and medicine. Future application of this approach will lead to a better understanding of immune changes in the tumor microenvironment and TDLNs, and how they affect clinical outcomes.
View details for DOI 10.1371/journal.pone.0012420
View details for Web of Science ID 000281234700034
View details for PubMedID 20811638
Constrained patterns of covariation and clustering of HIV-1 non-nucleoside reverse transcriptase inhibitor resistance mutations
JOURNAL OF ANTIMICROBIAL CHEMOTHERAPY
2010; 65 (7): 1477-1485
We characterized pairwise and higher order patterns of non-nucleoside reverse transcriptase inhibitor (NNRTI)-selected mutations because multiple mutations are usually required for clinically significant resistance to second-generation NNRTIs.We analysed viruses from 13 039 individuals with sequences containing at least one of 52 published NNRTI-selected mutations, including 1133 viruses from individuals who received efavirenz but no other NNRTI and 1510 viruses from individuals who received nevirapine but no other NNRTI. Of the 17 reported etravirine resistance-associated mutations (RAMs), Y181C/I/V, L100I, K101P and M230L were considered major based on published in vitro susceptibility data.Efavirenz preferentially selected for 16 mutations, including L100I (14% versus 0.1%, P < 0.001), K101P (3.3% versus 0.4%, P < 0.001) and M230L (2.8% versus 1.3%, P = 0.004), whereas nevirapine preferentially selected for 12 mutations, including Y181C/I/V (48% versus 6.9%, P < 0.001). Twenty-nine pairs of NNRTI-selected mutations covaried significantly, including Y181C with seven other mutations (A98G, K101E/H, V108I, G190A/S and H221Y), L100I with K103N, and K101P with K103S. Two pairs (Y181C + V179F and Y181C + G190S) were predicted to confer >10-fold decreased etravirine susceptibility. Seventeen percent of sequences had three or more NNRTI-selected mutations, mostly in clusters of covarying mutations. Many clusters had Y181C plus a non-major etravirine RAM; few had more than one major etravirine RAM.Although major etravirine RAMs rarely occur in combination, 2 of 29 pairs of covarying mutations were associated with >10-fold decreased etravirine susceptibility. Viruses with three or more NNRTI-selected mutations often contained Y181C in combination with one or more minor etravirine RAMs; however, phenotypic and clinical correlates for most of these higher order combinations have not been published.
View details for DOI 10.1093/jac/dkq140
View details for Web of Science ID 000279926500027
View details for PubMedID 20462946
Localized Plasticity in the Streamlined Genomes of Vinyl Chloride Respiring Dehalococcoides
2009; 5 (11)
Vinyl chloride (VC) is a human carcinogen and widespread priority pollutant. Here we report the first, to our knowledge, complete genome sequences of microorganisms able to respire VC, Dehalococcoides sp. strains VS and BAV1. Notably, the respective VC reductase encoding genes, vcrAB and bvcAB, were found embedded in distinct genomic islands (GEIs) with different predicted integration sites, suggesting that these genes were acquired horizontally and independently by distinct mechanisms. A comparative analysis that included two previously sequenced Dehalococcoides genomes revealed a contextually conserved core that is interrupted by two high plasticity regions (HPRs) near the Ori. These HPRs contain the majority of GEIs and strain-specific genes identified in the four Dehalococcoides genomes, an elevated number of repeated elements including insertion sequences (IS), as well as 91 of 96 rdhAB, genes that putatively encode terminal reductases in organohalide respiration. Only three core rdhA orthologous groups were identified, and only one of these groups is supported by synteny. The low number of core rdhAB, contrasted with the high rdhAB numbers per genome (up to 36 in strain VS), as well as their colocalization with GEIs and other signatures for horizontal transfer, suggests that niche adaptation via organohalide respiration is a fundamental ecological strategy in Dehalococccoides. This adaptation has been exacted through multiple mechanisms of recombination that are mainly confined within HPRs of an otherwise remarkably stable, syntenic, streamlined genome among the smallest of any free-living microorganism.
View details for DOI 10.1371/journal.pgen.1000714
View details for Web of Science ID 000272419500010
View details for PubMedID 19893622
Nonpolymorphic Human Immunodeficiency Virus Type 1 Protease and Reverse Transcriptase Treatment-Selected Mutations
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY
2009; 53 (11): 4869-4878
The spectrum of human immunodeficiency virus type 1 (HIV-1) protease and reverse transcriptase (RT) mutations selected by antiretroviral (ARV) drugs requires ongoing reassessment as ARV treatment patterns evolve and increasing numbers of protease and RT sequences of different viral subtypes are published. Accordingly, we compared the prevalences of protease and RT mutations in HIV-1 group M sequences from individuals with and without a history of previous treatment with protease inhibitors (PIs) or RT inhibitors (RTIs). Mutations in protease sequences from 26,888 individuals and in RT sequences from 25,695 individuals were classified according to whether they were nonpolymorphic in untreated individuals and whether their prevalence increased fivefold with ARV therapy. This analysis showed that 88 PI-selected and 122 RTI-selected nonpolymorphic mutations had a prevalence that was fivefold higher in individuals receiving ARVs than in ARV-naïve individuals. This was an increase of 47% and 77%, respectively, compared with the 60 PI- and 69 RTI-selected mutations identified in a similar analysis that we published in 2005 using subtype B sequences obtained from one-fourth as many individuals. In conclusion, many nonpolymorphic mutations in protease and RT are under ARV selection pressure. The spectrum of treatment-selected mutations is changing as data for more individuals are collected, treatment exposures change, and the number of available sequences from non-subtype B viruses increases.
View details for DOI 10.1128/AAC.00592-09
View details for Web of Science ID 000270881200040
View details for PubMedID 19721070
View details for PubMedCentralID PMC2772298
Impaired interferon signaling is a common immune defect in human cancer
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (22): 9010-9015
Immune dysfunction develops in patients with many cancer types and may contribute to tumor progression and failure of immunotherapy. Mechanisms underlying cancer-associated immune dysfunction are not fully understood. Efficient IFN signaling is critical to lymphocyte function; animals rendered deficient in IFN signaling develop cancer at higher rates. We hypothesized that altered IFN signaling may be a key mechanism of immune dysfunction common to cancer. To address this, we assessed the functional responses to IFN in peripheral blood lymphocytes from patients with 3 major cancers: breast cancer, melanoma, and gastrointestinal cancer. Type-I IFN (IFN-alpha)-induced signaling was reduced in T cells and B cells from all 3 cancer-patient groups compared to healthy controls. Type-II IFN (IFN-gamma)-induced signaling was reduced in B cells from all 3 cancer patient groups, but not in T cells or natural killer cells. Impaired-IFN signaling was equally evident in stage II, III, and IV breast cancer patients, and downstream functional defects in T cell activation were identified. Taken together, these findings indicate that defects in lymphocyte IFN signaling arise in patients with breast cancer, melanoma, and gastrointestinal cancer, and these defects may represent a common cancer-associated mechanism of immune dysfunction.
View details for DOI 10.1073/pnas.0901329106
View details for Web of Science ID 000266580500043
View details for PubMedID 19451644
An Interactive Java Statistical Image Segmentation System: GemIdent
JOURNAL OF STATISTICAL SOFTWARE
2009; 30 (10): 1-20
View details for Web of Science ID 000267708100001
- How site fidelity leads to individual differences in the foraging activity of harvester ants BEHAVIORAL ECOLOGY 2009; 20 (3): 633-638
Ultra-Deep Pyrosequencing of Hepatitis B Virus Quasispecies from Nucleoside and Nucleotide Reverse-Transcriptase Inhibitor (NRTI)-Treated Patients and NRTI-Naive Patients
15th Conference on Retroviruses and Opportunistic Infections
OXFORD UNIV PRESS INC. 2009: 1275–85
The dynamics of emerging nucleoside and nucleotide reverse-transcriptase inhibitor (NRTI) resistance in hepatitis B virus (HBV) are not well understood because standard dideoxynucleotide direct polymerase chain reaction (PCR) sequencing assays detect drug-resistance mutations only after they have become dominant. To obtain insight into NRTI resistance, we used a new sequencing technology to characterize the spectrum of low-prevalence NRTI-resistance mutations in HBV obtained from 20 plasma samples from 11 NRTI-treated patients and 17 plasma samples from 17 NRTI-naive patients, by using standard direct PCR sequencing and ultra-deep pyrosequencing (UDPS). UDPS detected drug-resistance mutations that were not detected by PCR in 10 samples from 5 NRTI-treated patients, including the lamivudine-resistance mutation V173L (in 5 samples), the entecavir-resistance mutations T184S (in 2 samples) and S202G (in 1 sample), the adefovir-resistance mutation N236T (in 1 sample), and the lamivudine and adefovir-resistance mutations V173L, L180M, A181T, and M204V (in 1 sample). G-to-A hypermutation mediated by the apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like family of cytidine deaminases was estimated to be present in 0.6% of reverse-transcriptase genes. Genotype A coinfection was detected by UDPS in each of 3 patients in whom genotype G virus was detected by direct PCR sequencing. UDPS detected low-prevalence HBV variants with NRTI-resistance mutations, G-to-A hypermutation, and low-level dual genotype infection with a sensitivity not previously possible.
View details for DOI 10.1086/597808
View details for Web of Science ID 000265035500007
View details for PubMedID 19301976
Minority Human Immunodeficiency Virus Type 1 Variants in Antiretroviral-Naive Persons with Reverse Transcriptase Codon 215 Revertant Mutations
JOURNAL OF VIROLOGY
2008; 82 (21): 10747-10755
T215 revertant mutations such as T215C/D/E/S that evolve from the nucleoside reverse transcriptase (RT) inhibitor mutations T215Y/F have been found in about 3% of human immunodeficiency virus type 1 (HIV-1) isolates from newly diagnosed HIV-1-infected persons. We used a newly developed sequencing method-ultradeep pyrosequencing (UDPS; 454 Life Sciences)--to determine the frequency with which T215Y/F or other RT inhibitor resistance mutations could be detected as minority variants in samples from untreated persons that contain T215 revertants ("revertant" samples) compared with samples from untreated persons that lack such revertants ("control" samples). Among the 22 revertant and 29 control samples, UDPS detected a mean of 3.8 and 4.8 additional RT amino acid mutations, respectively. In 6 of 22 (27%) revertant samples and in 4 of 29 control samples (14%; P = 0.4), UDPS detected one or more RT inhibitor resistance mutations. T215Y or T215F was not detected in any of the revertant or control samples; however, 4 of 22 revertant samples had one or more T215 revertants that were detected by UDPS but not by direct PCR sequencing. The failure to detect viruses with T215Y/F in the 22 revertant samples in this study may result from the overwhelming replacement of transmitted T215Y variants by the more fit T215 revertants or from the primary transmission of a T215 revertant in a subset of persons with T215 revertants.
View details for DOI 10.1128/JVI.01827-07
View details for Web of Science ID 000260109600041
View details for PubMedID 18715933
- HORSESHOES IN MULTIDIMENSIONAL SCALING AND LOCAL KERNEL METHODS ANNALS OF APPLIED STATISTICS 2008; 2 (3): 777-807
Natural variation of HIV-1 group M integrase: Implications for a new class of antiretroviral inhibitors
HIV-1 integrase is the third enzymatic target of antiretroviral (ARV) therapy. However, few data have been published on the distribution of naturally occurring amino acid variation in this enzyme. We therefore characterized the distribution of integrase variants among more than 1,800 published group M HIV-1 isolates from more than 1,500 integrase inhibitor (INI)-naïve individuals. Polymorphism rates equal or above 0.5% were found for 34% of the central core domain positions, 42% of the C-terminal domain positions, and 50% of the N-terminal domain positions. Among 727 ARV-naïve individuals in whom the complete pol gene was sequenced, integrase displayed significantly decreased inter- and intra-subtype diversity and a lower Shannon's entropy than protease or RT. All primary INI-resistance mutations with the exception of E157Q--which was present in 1.1% of sequences--were nonpolymorphic. Several accessory INI-resistance mutations including L74M, T97A, V151I, G163R, and S230N were also polymorphic with polymorphism rates ranging between 0.5% to 2.0%.
View details for DOI 10.1186/1742-4690-5-74
View details for Web of Science ID 000259485200001
View details for PubMedID 18687142
View details for PubMedCentralID PMC2546438
Genomic interrogation of ancestral Mycobacterium tuberculosis from south India
8th International Congress on Molecular Epidemiology and Evolutionary Genetics of Infectious Diseases
ELSEVIER SCIENCE BV. 2008: 474–83
Mycobacterium tuberculosis is a very important global pathogen. One quarter of the world's TB cases occur in India. The tuberculosis strains isolated from south Indian patients exhibit certain phenotypic characteristics like low virulence in guinea-pigs, resistance to isoniazid, thiophene-2-carboxylic acid hydrazide (TCH) and para-amino salicylic acid (PAS), and enhanced susceptibility to H2O2. Besides this, a large percentage of the isolates harbor only a single copy of IS 6110 which makes these strains distinct. Hence, we have studied the genotypic characteristics of these strains by using advanced techniques like Deletion Micro array, deletion PCR, allelic discrimination RT-PCR using several lineage specific markers and KatG G1388T (non-synonymous) polymorphism along with spoligotyping. The analysis of 1215 tuberculosis patient isolates from south India revealed that 85.2% belonged to the ancestral lineage of M. tuberculosis. Comparative whole-genome hybridization identified six new genomic regions within this lineage that were variably deleted.
View details for DOI 10.1016/j.meegid.2007.09.007
View details for Web of Science ID 000257001400012
View details for PubMedID 18024233
- The short-term regulation of foraging in harvester ants BEHAVIORAL ECOLOGY 2008; 19 (1): 217-222
Threshold Graph Limits and Random Threshold Graphs.
2008; 5 (3): 267-320
We study the limit theory of large threshold graphs and apply this to a variety of models for random threshold graphs. The results give a nice set of examples for the emerging theory of graph limits.
View details for PubMedID 20811581
- Dynamical bias in the coin toss SIAM REVIEW 2007; 49 (2): 211-235
HIV-1 subtype B protease and reverse transcriptase amino acid covariation
PLOS COMPUTATIONAL BIOLOGY
2007; 3 (5): 836-843
Despite the high degree of HIV-1 protease and reverse transcriptase (RT) mutation in the setting of antiretroviral therapy, the spectrum of possible virus variants appears to be limited by patterns of amino acid covariation. We analyzed patterns of amino acid covariation in protease and RT sequences from more than 7,000 persons infected with HIV-1 subtype B viruses obtained from the Stanford HIV Drug Resistance Database (http://hivdb.stanford.edu). In addition, we examined the relationship between conditional probabilities associated with a pair of mutations and the order in which those mutations developed in viruses for which longitudinal sequence data were available. Patterns of RT covariation were dominated by the distinct clustering of Type I and Type II thymidine analog mutations and the Q151M-associated mutations. Patterns of protease covariation were dominated by the clustering of nelfinavir-associated mutations (D30N and N88D), two main groups of protease inhibitor (PI)-resistance mutations associated either with V82A or L90M, and a tight cluster of mutations associated with decreased susceptibility to amprenavir and the most recently approved PI darunavir. Different patterns of covariation were frequently observed for different mutations at the same position including the RT mutations T69D versus T69N, L74V versus L74I, V75I versus V75M, T215F versus T215Y, and K219Q/E versus K219N/R, and the protease mutations M46I versus M46L, I54V versus I54M/L, and N88D versus N88S. Sequence data from persons with correlated mutations in whom earlier sequences were available confirmed that the conditional probabilities associated with correlated mutation pairs could be used to predict the order in which the mutations were likely to have developed. Whereas accessory nucleoside RT inhibitor-resistance mutations nearly always follow primary nucleoside RT inhibitor-resistance mutations, accessory PI-resistance mutations often preceded primary PI-resistance mutations.
View details for DOI 10.1371/journal.pcbi.0030087
View details for Web of Science ID 000249105100007
View details for PubMedID 17500586
View details for PubMedCentralID PMC1866358
Down-regulation of the interferon signaling pathway in T lymphocytes from patients with metastatic melanoma
2007; 4 (5): 897-911
Dysfunction of the immune system has been documented in many types of cancers. The precise nature and molecular basis of immune dysfunction in the cancer state are not well defined.To gain insights into the molecular mechanisms of immune dysfunction in cancer, gene expression profiles of pure sorted peripheral blood lymphocytes from 12 patients with melanoma were compared to 12 healthy controls. Of 25 significantly altered genes in T cells and B cells from melanoma patients, 17 are interferon (IFN)-stimulated genes. These microarray findings were further confirmed by quantitative PCR and functional responses to IFNs. The median percentage of lymphocytes that phosphorylate STAT1 in response to interferon-alpha was significantly reduced (Delta = 16.8%; 95% confidence interval, 0.98% to 33.35%) in melanoma patients (n = 9) compared to healthy controls (n = 9) in Phosflow analysis. The Phosflow results also identified two subgroups of patients with melanoma: IFN-responsive (33%) and low-IFN-response (66%). The defect in IFN signaling in the melanoma patient group as a whole was partially overcome at the level of expression of IFN-stimulated genes by prolonged stimulation with the high concentration of IFN-alpha that is achievable only in IFN therapy used in melanoma. The lowest responders to IFN-alpha in the Phosflow assay also showed the lowest gene expression in response to IFN-alpha. Finally, T cells from low-IFN-response patients exhibited functional abnormalities, including decreased expression of activation markers CD69, CD25, and CD71; TH1 cytokines interleukin-2, IFN-gamma, and tumor necrosis factor alpha, and reduced survival following stimulation with anti-CD3/CD28 antibodies compared to controls.Defects in interferon signaling represent novel, dominant mechanisms of immune dysfunction in cancer. These findings may be used to design therapies to counteract immune dysfunction in melanoma and to improve cancer immunotherapy.
View details for DOI 10.1371/journal.pmed.0040176
View details for Web of Science ID 000246889700018
View details for PubMedID 17488182
Unusual codon bias in vinyl chloride reductase genes of Dehalococcoides species
APPLIED AND ENVIRONMENTAL MICROBIOLOGY
2007; 73 (8): 2744-2747
Vinyl chloride reductases (VC-RDase) are the key enzymes for complete microbial reductive dehalogenation of chloroethenes, including the groundwater pollutants tetrachloroethene and trichloroethene. Analysis of the codon usage of the VC-RDase genes vcrA and bvcA showed that these genes are highly unusual and are characterized by a low G+C fraction at the third position. The third position of codons in VC-RDase genes is biased toward the nucleotide T, even though available Dehalococcoides genome sequences indicate the absence of any tRNAs matching codons that end in T. The comparatively high level of abnormality in the codon usage of VC-RDase genes suggests an evolutionary history that is different from that of most other Dehalococcoides genes.
View details for DOI 10.1128/AEM.02768-06
View details for Web of Science ID 000246542400039
View details for PubMedID 17308190
Gene expression network analysis and applications to immunology
2007; 23 (7): 850-858
We address the problem of using expression data and prior biological knowledge to identify differentially expressed pathways or groups of genes. Following an idea of Ideker et al. (2002), we construct a gene interaction network and search for high-scoring subnetworks. We make several improvements in terms of scoring functions and algorithms, resulting in higher speed and accuracy and easier biological interpretation. We also assign significance levels to our results, adjusted for multiple testing. Our methods are successfully applied to three human microarray data sets, related to cancer and the immune system, retrieving several known and potential pathways. The method, denoted by the acronym GXNA (Gene eXpression Network Analysis) is implemented in software that is publicly available and can be used on virtually any microarray data set.The source code and executable for the software, as well as certain supplemental materials, can be downloaded from http://stat.stanford.edu/~serban/gxna.
View details for DOI 10.1093/bioinformatics/btm019
View details for Web of Science ID 000246120400010
View details for PubMedID 17267429
An interactive statistical image segmentation and visualization system
4th International Conference on Medical Information Visualisation
IEEE COMPUTER SOC. 2007: 81–86
View details for Web of Science ID 000248082900013
Differences in the temporal emergence of primary and secondary NRTI and PI drug-resistance mutations (DRMs)
16th International HIV Drug Resistance Workshop
INT MEDICAL PRESS LTD. 2007: S67–S67
View details for Web of Science ID 000248546900077
- Forager activation and food availability in harvester ants ANIMAL BEHAVIOUR 2006; 71: 815-822
- Constraints on Yukawa-type deviations from Newtonian gravity at 20 microns PHYSICAL REVIEW D 2005; 72 (12)
Profile of immune cells in axillary lymph nodes predicts disease-free survival in breast cancer
2005; 2 (9): 904-919
While lymph node metastasis is among the strongest predictors of disease-free and overall survival for patients with breast cancer, the immunological nature of tumor-draining lymph nodes is often ignored, and may provide additional prognostic information on clinical outcome.We performed immunohistochemical analysis of 47 sentinel and 104 axillary (nonsentinel) nodes from 77 breast cancer patients with 5 y of follow-up to determine if alterations in CD4, CD8, and CD1a cell populations predict nodal metastasis or disease-free survival. Sentinel and axillary node CD4 and CD8 T cells were decreased in breast cancer patients compared to control nodes. CD1a dendritic cells were also diminished in sentinel and tumor-involved axillary nodes, but increased in tumor-free axillary nodes. Axillary node, but not sentinel node, CD4 T cell and dendritic cell populations were highly correlated with disease-free survival, independent of axillary metastasis. Immune profiling of ALN from a test set of 48 patients, applying CD4 T cell and CD1a dendritic cell population thresholds of CD4 > or = 7.0% and CD1a > or = 0.6%, determined from analysis of a learning set of 29 patients, provided significant risk stratification into favorable and unfavorable prognostic groups superior to clinicopathologic characteristics including tumor size, extent or size of nodal metastasis (CD4, p < 0.001 and CD1a, p < 0.001). Moreover, axillary node CD4 T cell and CD1a dendritic cell populations allowed more significant stratification of disease-free survival of patients with T1 (primary tumor size 2 cm or less) and T2 (5 cm or larger) tumors than all other patient characteristics. Finally, sentinel node immune profiles correlated primarily with the presence of infiltrating tumor cells, while axillary node immune profiles appeared largely independent of nodal metastases, raising the possibility that, within axillary lymph nodes, immune profile changes and nodal metastases represent independent processes.These findings demonstrate that the immune profile of tumor-draining lymph nodes is of novel biologic and clinical importance for patients with early stage breast cancer.
View details for DOI 10.1371/journal.pmed.0020284
View details for Web of Science ID 000232433600019
View details for PubMedID 16124834
Rapid assessment of recognition efficiency and functional capacity of antigen-specific T-cell responses
JOURNAL OF IMMUNOTHERAPY
2005; 28 (4): 297-305
It is increasingly recognized that cells within an antigen-specific CD8 T-cell population may be diverse in recognition efficiency for target, which may significantly affect the overall efficacy of the response in clinical settings such as viral infections and cancer. CD8 T cells with seemingly identical antigen specificity, particularly those elicited by cancer vaccines, may be heterogeneous for sensitivity and recognition efficiency for the cognate peptide and functional state in vivo. Analysis of individual T-cell clones derived from an antigen-specific T-cell population would provide an accurate assessment of the overall response; however, this is time- and labor-intensive, preventing rapid and routine assessment of patient samples from clinical trials. By stimulating antigen-specific T cells that otherwise appear homogeneous on tetramer staining with graded amounts of cognate peptides, the authors show that individual cells downmodulate surface T-cell receptors (TCR) and thus lose tetramer reactivity with variable dynamics within the T-cell population. The dynamics of TCR downregulation represent an accurate assessment of an individual cell's antigen sensitivity, recognition efficiency, and relative functional state within an antigen-specific population and have direct correlation to killing capacity by chromium release as well as degranulation by CD107 mobilization. Furthermore, despite correlation of average T-cell function by all three techniques, TCR downregulation uncovered heterogeneity in T-cell responses after vaccination among patient samples directly ex vivo. When examined using this novel technique, antigen-specific T cells elicited by vaccination with heteroclitic peptides exhibited significantly different recognition efficiencies for the heteroclitic versus native peptides, translating into differences in functional responses. With advancing cancer vaccine trials, the capacity to detect and functionally characterize antigen-specific T-cell responses in detail is critical. Techniques, as presented here, that rapidly assess the overall antigen sensitivity, recognition efficiency, and functional status of patients' T-cell responses will guide future vaccine trials and immunotherapies.
View details for Web of Science ID 000230219200004
View details for PubMedID 16000947
Memory T cells have gene expression patterns intermediate between naive and effector
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2005; 102 (15): 5519-5523
The biological basis underlying differentiation of naive (NAI) T cells into effector (EFFE) and memory (MEM) cells is incompletely understood. Furthermore, whether NAI T cells serially differentiate into EFFE and then MEM cells (linear differentiation) or whether they concurrently differentiate into either EFFE or MEM cells (parallel differentiation) remains unresolved. We isolated NAI, EFFE, and MEM CD8(+) T cell subsets from human peripheral blood and analyzed their gene expression by using microarrays. We identified 156 genes that strongly differentiate NAI, EFFE, and MEM CD8(+) T cells; these genes provide previously unrecognized markers to help identify each cell type. Using several statistical approaches to analyze and group the data (standard heat-map and hierarchical clustering, a unique circular representation, multivariate analyses based on principal components, and a clustering method based on phylogenetic parsimony analysis), we assessed the lineage relationships between these subsets and showed that MEM cells have gene expression patterns intermediate between NAI and EFFE T cells. Our analysis suggests a common differentiation pathway to an intermediate state followed by a split into EFFE or MEM cells, hence supporting the parallel differentiation model. As such, conditions under which NAI T cells are activated may determine the magnitude of both EFFE and MEM cells, which arise subsequently. A better understanding of these conditions may be very useful in the design of future vaccine strategies to maximize MEM cell generation.
View details for DOI 10.1073/pnas.0501437102
View details for Web of Science ID 000228376600042
View details for PubMedID 15809420
- Sequential Monte Carlo methods for statistical analysis of tables JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 2005; 100 (469): 109-120
Error distribution for gene expression data
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY
We present a new instance of Laplace's second Law of Errors and show how it can be used in the analysis of data from microarray experiments. This error distribution is shown to fit microarray expression data much better than a normal distribution. The use of this distribution in a parametric bootstrap leads to more powerful tests as we show that the t-test is conservative in this setting. We propose a biological explanations for this distribution based on the Pareto distribution of the variables used to compute the log ratios.
View details for Web of Science ID 000238478100022
View details for PubMedID 16646833
Gene expression diversity among Mycobacterium tuberculosis clinical isolates
2005; 151: 5-14
Intraspecies genetic diversity has been demonstrated to be important in the pathogenesis and epidemiology of several pathogens, such as HIV, influenza, Helicobacter and Salmonella. It is also important to consider strain-to-strain variation when identifying drug targets and vaccine antigens and developing tools for molecular diagnostics. Here, the authors present a description of the variability in gene expression patterns among ten clinical isolates of Mycobacterium tuberculosis, plus the laboratory strains H37Rv and H37Ra, growing in liquid culture. They identified 527 genes (15 % of those tested) that are variably expressed among the isolates studied. The remaining genes were divided into three categories based on their expression levels: unexpressed (38 %), low to undetectable expression (31 %) and consistently expressed (16 %). The expression categories were compared with functional categories and three biologically interesting gene lists: genes that are deleted among clinical isolates, T-cell antigens and essential genes. There were significant associations between expression variability and the classification of genes as T-cell antigens, involved in lipid metabolism, PE/PPE, insertion sequences and phages, and deleted among clinical isolates. This survey of mRNA expression among clinical isolates of M. tuberculosis demonstrates that genes with important functions can vary in their expression levels between strains grown under identical conditions.
View details for Web of Science ID 000226352800002
View details for PubMedID 15632420
Heterogeneity within antigen-specific T cell responses revealed by differential dynamics of TCR downregulation.
46th Annual Meeting of the American-Society-of-Hematology
AMER SOC HEMATOLOGY. 2004: 50B–50B
View details for Web of Science ID 000225127700183
Diversity and recognition efficiency of T cell responses to cancer
2004; 1 (2): 149-160
Melanoma patients vaccinated with tumor-associated antigens frequently develop measurable peptide-specific CD8+ T cell responses; however, such responses often do not confer clinical benefit. Understanding why vaccine-elicited responses are beneficial in some patients but not in others will be important to improve targeted cancer immunotherapies.We analyzed peptide-specific CD8+ T cell responses in detail, by generating and characterizing over 200 cytotoxic T lymphocyte clones derived from T cell responses to heteroclitic peptide vaccination, and compared these responses to endogenous anti-tumor T cell responses elicited naturally (a heteroclitic peptide is a modification of a native peptide sequence involving substitution of an amino acid at an anchor residue to enhance the immunogenicity of the peptide). We found that vaccine-elicited T cells are diverse in T cell receptor variable chain beta expression and exhibit a different recognition profile for heteroclitic versus native peptide. In particular, vaccine-elicited T cells respond to native peptide with predominantly low recognition efficiency--a measure of the sensitivity of a T cell to different cognate peptide concentrations for stimulation--and, as a result, are inefficient in tumor lysis. In contrast, endogenous tumor-associated-antigen-specific T cells show a predominantly high recognition efficiency for native peptide and efficiently lyse tumor targets.These results suggest that factors that shape the peptide-specific T cell repertoire after vaccination may be different from those that affect the endogenous response. Furthermore, our findings suggest that current heteroclitic peptide vaccination protocols drive expansion of peptide-specific T cells with a diverse range of recognition efficiencies, a significant proportion of which are unable to respond to melanoma cells. Therefore, it is critical that the recognition efficiency of vaccine-elicited T cells be measured, with the goal of advancing those modalities that elicit T cells with the greatest potential of tumor reactivity.
View details for DOI 10.1371/journal.pmed.0010028
View details for Web of Science ID 000227378800015
View details for PubMedID 15578105
Microarray analysis reveals differences in gene expression of circulating CD8+ T cells in melanoma patients and healthy donors
2004; 64 (10): 3661-3667
Circulating T cells from many cancer patients are known to be dysfunctional and undergo spontaneous apoptosis. We used microarray technology to determine whether gene expression differences exist in T cells from melanoma patients versus healthy subjects, which may underlie these abnormalities. To maximize the resolution of our data, we sort purified CD8(+) subsets and amplified the extracted RNA for microarray analysis. These analyses show subtle but statistically significant expression differences for 10 genes in T cells from melanoma patients versus healthy controls, which were additionally confirmed by quantitative real-time PCR analysis. Whereas none of these genes are members of the classical apoptosis pathways, several may be linked to apoptosis. To additionally investigate the significance of these 10 genes, we combined them into a classifier and found that they provide a much better discrimination between melanoma and healthy T cells as compared with a classifier built uniquely with classical apoptosis-related genes. These results suggest the possible engagement of an alternative apoptosis pathway in circulating T cells from cancer patients.
View details for Web of Science ID 000221419100043
View details for PubMedID 15150126
- Bioinformatics and management science: Some common tools and techniques OPERATIONS RESEARCH 2004; 52 (2): 165-190
An in vitro human cell-based assay to rank the relative immunogenicity of proteins
2004; 77 (2): 280-289
A method to rank proteins based on their relative immunogenicity has been devised. A statistical analysis of peptide-specific responses in large human donor pools provides a structure index value metric that ranked four industrial enzymes in the order determined by both mouse and guinea pig exposure models. The ranking method also compared favorably with human sensitization rates measured in occupationally exposed workers. Structure index values for other proteins known to cause immune responses in humans were also determined and found to be higher than the value determined for human beta2-microglobulin. Using values from known immunogenic and putative nonimmunogenic proteins, a cut-off value was established. The structure index value calculation provides a comparative method to predict subsequent immunogenicity on a human population basis without the need to use animal models. Information provided by this assay can be used in the early development of protein therapies and other protein-based applications to select or create reduced immunogenicity variants.
View details for DOI 10.1093/toxsci/kfh021
View details for Web of Science ID 000188987000013
View details for PubMedID 14691215
- Stein's method: expository lectures and applications Institute of Mathematical Statistics Lecture Notes—Monograph Series. edited by Diaconis, P., Holmes, S. Inst. Math. Statist. . 2004
- Probability by surprise Mathematical Adventures for Students and Amateurs edited by Shubin, T., Hayes, D. MAA. 2004: 135–153
Human population-based identification of CD4(+) T-cell peptide epitope determinants
JOURNAL OF IMMUNOLOGICAL METHODS
2003; 281 (1-2): 95-108
A human cell-based method to identify functional CD4(+) T-cell epitopes in any protein has been developed. Proteins are tested as synthetic 15-mer peptides offset by three amino acids. Percent responses within a large donor population are tabulated for each peptide in the set. Peptide epitope regions are designated by difference in response frequency from the overall background response rate for the compiled dataset. Epitope peptide responses are reproducible, with a median coefficient of variance of 21% when tested on multiple random-donor sets. The overall average response rate within the dataset increases with increasing putative human population antigenic exposure to a given protein. The background rate was high for HPV16 E6, and was low for human-derived cytokine proteins. The assay identified recall epitope regions within the donor population for the protein staphylokinase. For an industrial protease with minimal presumed population exposure, immunodominant epitope peptides were identified that were found to bind promiscuously to many HLA class II molecules in vitro. The peptide epitope regions identified in presumably unexposed donors represent a subset of the total recall epitopes. Finally, as a negative control, the assay found no peptide epitope regions in human beta2-microglobulin. This method identifies functional CD4(+) T-cell epitopes in any protein without pre-selection for HLA class II, suggests whether a donor population is pre-exposed to a protein of interest, and does not require sensitized donors for in vitro testing.
View details for DOI 10.1016/S0022-1759(03)00279-5
View details for Web of Science ID 000186487500009
View details for PubMedID 14580884
Bootstrapping phylogenetic trees: Theory and methods
2003; 18 (2): 241-255
View details for Web of Science ID 000185861100013
Statistics for phylogenetic trees
THEORETICAL POPULATION BIOLOGY
2003; 63 (1): 17-32
This paper poses the problem of estimating and validating phylogenetic trees in statistical terms. The problem is hard enough to warrant several tacks: we reason by analogy to rounding real numbers, and dealing with ranking data. These are both cases where, as in phylogeny the parameters of interest are not real numbers. Then we pose the problem in geometrical terms, using distances and measures on a natural space of trees. We do not solve the problems of inference on tree space, but suggest some coherent ways of tackling them.
View details for Web of Science ID 000180151400002
View details for PubMedID 12464492
A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome haplotypes
AMERICAN JOURNAL OF HUMAN GENETICS
2002; 70 (5): 1197-1214
The variation of 77 biallelic sites located in the nonrecombining portion of the Y chromosome was examined in 608 male subjects from 22 African populations. This survey revealed a total of 37 binary haplotypes, which were combined with microsatellite polymorphism data to evaluate internal diversities and to estimate coalescence ages of the binary haplotypes. The majority of binary haplotypes showed a nonuniform distribution across the continent. Analysis of molecular variance detected a high level of interpopulation diversity (PhiST=0.342), which appears to be partially related to the geography (PhiCT=0.230). In sub-Saharan Africa, the recent spread of a set of haplotypes partially erased pre-existing diversity, but a high level of population (PhiST=0.332) and geographic (PhiCT=0.179) structuring persists. Correspondence analysis shows that three main clusters of populations can be identified: northern, eastern, and sub-Saharan Africans. Among the latter, the Khoisan, the Pygmies, and the northern Cameroonians are clearly distinct from a tight cluster formed by the Niger-Congo-speaking populations from western, central western, and southern Africa. Phylogeographic analyses suggest that a large component of the present Khoisan gene pool is eastern African in origin and that Asia was the source of a back migration to sub-Saharan Africa. Haplogroup IX Y chromosomes appear to have been involved in such a migration, the traces of which can now be observed mostly in northern Cameroon.
View details for Web of Science ID 000175012400010
View details for PubMedID 11910562
- Geometry of the space of phylogenetic trees ADVANCES IN APPLIED MATHEMATICS 2001; 27 (4): 733-767
Statistical problems involving permutations with restricted positions
Symposium on State of the Art in Probability and Statistics: Festschrift for Willem R VanZwet
INST MATHEMATICAL STATISTICS. 2001: 195–222
View details for Web of Science ID 000175458600011
Analysis of a nonreversible Markov chain sampler
ANNALS OF APPLIED PROBABILITY
2000; 10 (3): 726-752
View details for Web of Science ID 000166779900003
Matchings and phylogenetic trees
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1998; 95 (25): 14600-14602
This paper presents a natural coordinate system for phylogenetic trees using a correspondence with the set of perfect matchings in the complete graph. This correspondence produces a distance between phylogenetic trees, and a way of enumerating all trees in a minimal step order. It is useful in randomized algorithms because it enables moves on the space of trees that make random optimization strategies "mix" quickly. It also promises a generalization to intermediary trees when data are not decisive as to their choice of tree, and a new way of constructing Bayesian priors on tree space.
View details for Web of Science ID 000077436700005
View details for PubMedID 9843935
The next linear collider test accelerator's rf pulse compression and transmission systems
17th Particle Accelerator Conference
IEEE. 1998: 3192–3194
View details for Web of Science ID 000079582801038
Are there still things to do in Bayesian statistics?
Conference on Probability, Dynamics and Causality
KLUWER ACADEMIC PUBL. 1997: 5–18
View details for Web of Science ID 000071894300002
Bootstrap confidence levels for phylogenetic trees (vol 93, pg 7085, 1996)
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1996; 93 (23): 13429-13434
Evolutionary trees are often estimated from DNA or RNA sequence data. How much confidence should we have in the estimated trees? In 1985, Felsenstein [Felsenstein, J. (1985) Evolution 39, 783-791] suggested the use of the bootstrap to answer this question. Felsenstein's method, which in concept is a straightforward application of the bootstrap, is widely used, but has been criticized as biased in the genetics literature. This paper concerns the use of the bootstrap in the tree problem. We show that Felsenstein's method is not biased, but that it can be corrected to better agree with standard ideas of confidence levels and hypothesis testing. These corrections can be made by using the more elaborate bootstrap method presented here, at the expense of considerably more computation.
View details for Web of Science ID A1996VT05400135
View details for PubMedID 8917608
GRAY CODES FOR RANDOMIZATION PROCEDURES
STATISTICS AND COMPUTING
1994; 4 (4): 287-302
View details for Web of Science ID A1994PW03200009