Basic Life Science Research Associate, Statistics
REPRODUCIBLE RESEARCH WORKFLOW IN R FOR THE ANALYSIS OF PERSONALIZED HUMAN MICROBIOME DATA.
Pacific Symposium on Biocomputing
2016; 21: 183-194
This article presents a reproducible research workflow for amplicon-based microbiome studies in personalized medicine created using Bioconductor packages and the knitr markdown interface. We show that sometimes a multiplicity of choices and lack of consistent documentation at each stage of the sequential processing pipeline used for the analysis of microbiome data can lead to spurious results. We propose its replacement with reproducible and documented analysis using R packages dada2, knitr, and phyloseq. This workflow implements both key stages of amplicon analysis: the initial filtering and denoising steps needed to construct taxonomic feature tables from error-containing sequencing reads (dada2), and the exploratory and inferential analysis of those feature tables and associated sample metadata (phyloseq). This workflow facilitates reproducible interrogation of the full set of choices required in microbiome studies. We present several examples in which we leverage existing packages for analysis in a way that allows easy sharing and modification by others, and give pointers to articles that depend on this reproducible workflow for the study of longitudinal and spatial series analyses of the vaginal microbiome in pregnancy and the oral microbiome in humans with healthy dentition and intra-oral tissues.
View details for PubMedID 26776185
Marine mammals harbor unique microbiotas shaped by and yet distinct from the sea.
2016; 7: 10516
Marine mammals play crucial ecological roles in the oceans, but little is known about their microbiotas. Here we study the bacterial communities in 337 samples from 5 body sites in 48 healthy dolphins and 18 healthy sea lions, as well as those of adjacent seawater and other hosts. The bacterial taxonomic compositions are distinct from those of other mammals, dietary fish and seawater, are highly diverse and vary according to body site and host species. Dolphins harbour 30 bacterial phyla, with 25 of them in the mouth, several abundant but poorly characterized Tenericutes species in gastric fluid and a surprising paucity of Bacteroidetes in distal gut. About 70% of near-full length bacterial 16S ribosomal RNA sequences from dolphins are unique. Host habitat, diet and phylogeny all contribute to variation in marine mammal distal gut microbiota composition. Our findings help elucidate the factors structuring marine mammal microbiotas and may enhance monitoring of marine mammal health.
View details for DOI 10.1038/ncomms10516
View details for PubMedID 26839246
Temporal and spatial variation of the human microbiota during pregnancy
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2015; 112 (35): 11060-11065
Despite the critical role of the human microbiota in health, our understanding of microbiota compositional dynamics during and after pregnancy is incomplete. We conducted a case-control study of 49 pregnant women, 15 of whom delivered preterm. From 40 of these women, we analyzed bacterial taxonomic composition of 3,767 specimens collected prospectively and weekly during gestation and monthly after delivery from the vagina, distal gut, saliva, and tooth/gum. Linear mixed-effects modeling, medoid-based clustering, and Markov chain modeling were used to analyze community temporal trends, community structure, and vaginal community state transitions. Microbiota community taxonomic composition and diversity remained remarkably stable at all four body sites during pregnancy (P > 0.05 for trends over time). Prevalence of a Lactobacillus-poor vaginal community state type (CST 4) was inversely correlated with gestational age at delivery (P = 0.0039). Risk for preterm birth was more pronounced for subjects with CST 4 accompanied by elevated Gardnerella or Ureaplasma abundances. This finding was validated with a set of 246 vaginal specimens from nine women (four of whom delivered preterm). Most women experienced a postdelivery disturbance in the vaginal community characterized by a decrease in Lactobacillus species and an increase in diverse anaerobes such as Peptoniphilus, Prevotella, and Anaerococcus species. This disturbance was unrelated to gestational age at delivery and persisted for up to 1 y. These findings have important implications for predicting premature labor, a major global health problem, and for understanding the potential impact of a persistent, altered postpartum microbiota on maternal health, including outcomes of pregnancies following short interpregnancy intervals.
View details for DOI 10.1073/pnas.1502875112
View details for Web of Science ID 000360383200068
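As a rough illustration of the Markov chain modeling of community state transitions mentioned in the abstract above, the sketch below estimates transition probabilities from per-subject sequences of community state types. The state labels, the example sequences, and the function name are hypothetical and are not taken from the study; this is a minimal sketch of the general technique, not the authors' code.

```python
from collections import Counter


def estimate_transition_matrix(sequences, states):
    """Estimate first-order Markov transition probabilities by counting
    consecutive state pairs within each subject's time-ordered sequence."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1

    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        # A state that is never left gets an all-zero row.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in states
        }
    return matrix


# Hypothetical weekly state assignments for two subjects (not real data).
STATES = ["CST1", "CST4"]
SEQUENCES = [
    ["CST1", "CST1", "CST1", "CST4"],
    ["CST4", "CST4", "CST1", "CST1"],
]
TRANSITIONS = estimate_transition_matrix(SEQUENCES, STATES)
```

Each row of TRANSITIONS sums to 1 for any state that was observed with a successor, so the diagonal entries give a simple measure of how persistent each state is from one sampling time to the next.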
Rapid evolution of adaptive niche construction in experimental microbial populations
EVOLUTION
2014; 68 (11): 3307-3316
Evolutionary dynamics and information hierarchies in biological systems
CONFERENCE REPORTS: EVOLUTIONARY DYNAMICS AND INFORMATION HIERARCHIES IN BIOLOGICAL SYSTEMS: ASPEN CENTER FOR PHYSICS WORKSHOP AND CRACKING THE NEURAL CODE: THIRD ANNUAL ASPEN BRAIN FORUMS
2013; 1305: 1-17
The study of evolution has entered a revolutionary new era, where quantitative and predictive methods are transforming the traditionally qualitative and retrospective approaches of the past. Genomic sequencing and modern computational techniques are permitting quantitative comparisons between variation in the natural world and predictions rooted in neo-Darwinian theory, revealing the shortcomings of current evolutionary theory, particularly with regard to large-scale phenomena like macroevolution. Current research spanning and uniting diverse fields and exploring the physical and chemical nature of organisms across temporal, spatial, and organizational scales is replacing the model of evolution as a passive filter selecting for random changes at the nucleotide level with a paradigm in which evolution is a dynamic process both constrained and driven by the informational architecture of organisms across scales, from DNA and chromatin regulation to interactions within and between species and the environment.
View details for DOI 10.1111/nyas.12140
View details for Web of Science ID 000329568700001
View details for PubMedID 23691975
Denoising PCR-amplified metagenome data
PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy. We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche's 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise. DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.
View details for DOI 10.1186/1471-2105-13-283
View details for Web of Science ID 000314687600001
View details for PubMedID 23113967
The length scale of selection in protein evolution
2012; 6 (1): 16-20
Central to the study of molecular evolution, and an area of long-standing debate, is the appropriate model for the fitness landscape of proteins. Much of this debate has focused on the strength and frequency of positive and purifying selection, but the form and frequency of selective correlations is also a vital element. The constituent amino acids within a protein generically interact and share selective pressures in predictable ways, which conflicts with the selective independence assumed by common caricatures of the fitness landscape. Here, I discuss a recent study by myself and coauthors that used whole-genome comparisons of orthologous molecular sequences from closely related Drosophilids to explore the form of the selective correlations and selective interactions (epistasis) between the amino acids within a protein. I outline our results and highlight our finding of a selective length scale of ten amino acids within which individual amino acids are substantially and generically more likely to share selective pressures and interact epistatically. I then focus on the evidence presented in our study supporting a substantial role for epistasis in the process of molecular evolution, and discuss further the implications of this widespread epistasis on the overdispersion of the molecular clock and the efficacy of common tests for positive selection.
View details for DOI 10.4161/fly.18305
View details for Web of Science ID 000304412700003
View details for PubMedID 22198524
Heterozygote advantage as a natural consequence of adaptation in diploids
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (51): 20666-20671
Molecular adaptation is typically assumed to proceed by sequential fixation of beneficial mutations. In diploids, this picture presupposes that for most adaptive mutations, the homozygotes have a higher fitness than the heterozygotes. Here, we show that contrary to this expectation, a substantial proportion of adaptive mutations should display heterozygote advantage. This feature of adaptation in diploids emerges naturally from the primary importance of the fitness of heterozygotes for the invasion of new adaptive mutations. We formalize this result in the framework of Fisher's influential geometric model of adaptation. We find that in diploids, adaptation should often proceed through a succession of short-lived balanced states that maintain substantially higher levels of phenotypic and fitness variation in the population compared with classic adaptive walks. In fast-changing environments, this variation produces a diversity advantage that allows diploids to remain better adapted compared with haploids despite the disadvantage associated with the presence of unfit homozygotes. The short-lived balanced states arising during adaptive walks should be mostly invisible to current scans for long-term balancing selection. Instead, they should leave signatures of incomplete selective sweeps, which do appear to be common in many species. Our results also raise the possibility that balancing selection, as a natural consequence of frequent adaptation, might play a more prominent role among the forces maintaining genetic variation than is commonly recognized.
View details for DOI 10.1073/pnas.1114573108
View details for Web of Science ID 000298289400081
View details for PubMedID 22143780
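The argument in the abstract above can be illustrated with a one-dimensional toy version of Fisher's geometric model. The sketch assumes a Gaussian fitness function, an optimum at 0, and a heterozygote phenotype that lies exactly halfway between the two homozygotes; these choices, the function names, and the numbers are illustrative assumptions rather than the paper's actual parameterization.

```python
import math


def fitness(phenotype, optimum=0.0):
    """Gaussian fitness that declines with distance from the optimum."""
    return math.exp(-0.5 * (phenotype - optimum) ** 2)


def classify_mutation(wild_type, homozygous_effect):
    """Classify a new mutation by comparing the fitness of the three genotypes.

    Assumes the heterozygote shifts the phenotype by half the homozygous
    effect (an illustrative assumption).
    """
    het = wild_type + 0.5 * homozygous_effect
    hom = wild_type + homozygous_effect
    w_wt, w_het, w_hom = fitness(wild_type), fitness(het), fitness(hom)

    # A rare mutation occurs almost only in heterozygotes, so it can
    # spread only if the heterozygote beats the wild-type homozygote.
    if w_het <= w_wt:
        return "cannot invade"
    # If the mutant homozygote overshoots the optimum, the heterozygote
    # is the fittest genotype and the polymorphism is balanced.
    if w_het > w_hom:
        return "heterozygote advantage"
    return "directional"


# Wild type sits 1.0 away from the optimum (hypothetical numbers).
LARGE_STEP = classify_mutation(1.0, -1.8)   # homozygote overshoots to -0.8
SMALL_STEP = classify_mutation(1.0, -0.5)   # homozygote lands at 0.5
WRONG_WAY = classify_mutation(1.0, 0.5)     # moves away from the optimum
```

In this toy setting a mutation invades on the strength of its heterozygous effect alone, so a large step that helps the heterozygote can overshoot in the homozygote, which yields the short-lived balanced state the abstract describes.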
Correlated Evolution of Nearby Residues in Drosophilid Proteins
2011; 7 (2)
Here we investigate the correlations between coding sequence substitutions as a function of their separation along the protein sequence. We consider both substitutions between the reference genomes of several Drosophilids as well as polymorphisms in a population sample of Zimbabwean Drosophila melanogaster. We find that amino acid substitutions are "clustered" along the protein sequence, that is, the frequency of additional substitutions is strongly enhanced within ≈10 residues of a first such substitution. No such clustering is observed for synonymous substitutions, supporting a "correlation length" associated with selection on proteins as the causative mechanism. Clustering is stronger between substitutions that arose in the same lineage than it is between substitutions that arose in different lineages. We consider several possible origins of clustering, concluding that epistasis (interactions between amino acids within a protein that affect function) and positional heterogeneity in the strength of purifying selection are primarily responsible. The role of epistasis is directly supported by the tendency of nearby substitutions that arose on the same lineage to preserve the total charge of the residues within the correlation length and by the preferential cosegregation of neighboring derived alleles in our population sample. We interpret the observed length scale of clustering as a statistical reflection of the functional locality (or modularity) of proteins: amino acids that are near each other on the protein backbone are more likely to contribute to, and collaborate toward, a common subfunction.
View details for DOI 10.1371/journal.pgen.1001315
View details for Web of Science ID 000287697300030
View details for PubMedID 21383965
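The notion of clustering in the abstract above can be made concrete with a small calculation: count the pairs of substitutions that fall within a given separation and compare that with the count expected if the same number of substitutions were placed uniformly at random along the protein. This is only a simplified sketch of the idea; the function names, the window of 10 residues, and the example positions are hypothetical, and the paper's actual statistics are not reproduced here.

```python
from itertools import combinations
from math import comb


def observed_close_pairs(positions, window):
    """Count pairs of substitutions separated by at most `window` residues."""
    ordered = sorted(positions)
    return sum(1 for a, b in combinations(ordered, 2) if b - a <= window)


def expected_close_pairs(n_substitutions, protein_length, window):
    """Expected number of close pairs if substitutions fall on distinct
    sites chosen uniformly at random (requires window < protein_length)."""
    # Number of site pairs at separation d is (protein_length - d).
    close_site_pairs = sum(protein_length - d for d in range(1, window + 1))
    p_close = close_site_pairs / comb(protein_length, 2)
    return comb(n_substitutions, 2) * p_close


# Hypothetical substitution positions in a 100-residue protein.
POSITIONS = [10, 12, 15, 60, 63]
OBSERVED = observed_close_pairs(POSITIONS, window=10)
EXPECTED = expected_close_pairs(len(POSITIONS), protein_length=100, window=10)
ENRICHMENT = OBSERVED / EXPECTED
```

An enrichment well above 1 for amino acid substitutions, but not for synonymous ones, is the kind of contrast the abstract points to as evidence for a selective correlation length.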