I am a bioinformatician and microbiologist interested in studying the human microbiome and fine-scale microbial population genetics. See my personal website for more info.

All Publications

  • Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries. mSystems Olm, M. R., Crits-Christoph, A., Diamond, S., Lavy, A., Matheus Carnevali, P. B., Banfield, J. F. 2020; 5 (1)


    Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.

    View details for DOI 10.1128/mSystems.00731-19

    View details for PubMedID 31937678

    View details for PubMedCentralID PMC6967389

  • Necrotizing enterocolitis is preceded by increased gut bacterial replication, Klebsiella, and fimbriae-encoding bacteria SCIENCE ADVANCES Olm, M. R., Bhattacharya, N., Crits-Christoph, A., Firek, B. A., Baker, R., Song, Y. S., Morowitz, M. J., Banfield, J. F. 2019; 5 (12): eaax5727


    Necrotizing enterocolitis (NEC) is a devastating intestinal disease that occurs primarily in premature infants. We performed genome-resolved metagenomic analysis of 1163 fecal samples from premature infants to identify microbial features predictive of NEC. Features considered include genes, bacterial strain types, eukaryotes, bacteriophages, plasmids, and growth rates. A machine learning classifier found that samples collected before NEC diagnosis harbored significantly more Klebsiella, bacteria encoding fimbriae, and bacteria encoding secondary metabolite gene clusters related to quorum sensing and bacteriocin production. Notably, replication rates of all bacteria, especially Enterobacteriaceae, were significantly higher 2 days before NEC diagnosis. The findings uncover biomarkers that could lead to early detection of NEC and targets for microbiome-based therapeutics.

    View details for DOI 10.1126/sciadv.aax5727

    View details for Web of Science ID 000505069600043

    View details for PubMedID 31844663

    View details for PubMedCentralID PMC6905865

  • Genome-resolved metagenomics of eukaryotic populations during early colonization of premature infants and in hospital rooms MICROBIOME Olm, M. R., West, P. T., Brooks, B., Firek, B. A., Baker, R., Morowitz, M. J., Banfield, J. F. 2019; 7: 26


    Fungal infections are a significant cause of mortality and morbidity in hospitalized preterm infants, yet little is known about eukaryotic colonization of infants and of the neonatal intensive care unit as a possible source of colonizing strains. This is partly because microbiome studies often utilize bacterial 16S rRNA marker gene sequencing, a technique that is blind to eukaryotic organisms. Knowledge gaps exist regarding the phylogeny and microdiversity of eukaryotes that colonize hospitalized infants, as well as potential reservoirs of eukaryotes in the hospital room built environment. Genome-resolved analysis of 1174 time-series fecal metagenomes from 161 premature infants revealed fungal colonization of 10 infants. Relative abundance levels reached as high as 97% and were significantly higher in the first weeks of life (p = 0.004). When fungal colonization occurred, multiple species were present more often than expected by random chance (p = 0.008). Twenty-four metagenomic samples were analyzed from hospital rooms of six different infants. Compared to floor and surface samples, hospital sinks hosted diverse and highly variable communities containing genomically novel species, including from Diptera (fly) and Rhabditida (worm) for which genomes were assembled. With the exception of Diptera and two other organisms, zygosity of the newly assembled diploid eukaryote genomes was low. Interestingly, Malassezia and Candida species were present in both room and infant gut samples. Increased levels of fungal co-colonization may reflect synergistic interactions or differences in infant susceptibility to fungal colonization. Discovery of eukaryotic organisms that have not been sequenced previously highlights the benefit of genome-resolved analyses, and low zygosity of assembled genomes could reflect inbreeding or strong selection imposed by room conditions.

    View details for DOI 10.1186/s40168-019-0638-1

    View details for Web of Science ID 000458988100002

    View details for PubMedID 30770768

    View details for PubMedCentralID PMC6377789

  • dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication ISME JOURNAL Olm, M. R., Brown, C. T., Brooks, B., Banfield, J. F. 2017; 11 (12): 2864–68


    The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that reduces the computational time for pairwise genome comparisons by sequentially applying a fast, inaccurate estimation of genome distance, and a slow, accurate measure of average nucleotide identity. dRep achieves a 28 × increase in speed with perfect recall and precision when benchmarked against previously developed algorithms. We demonstrate the use of dRep for genome recovery from time-series datasets. Each metagenome was assembled separately, and dRep was used to identify groups of essentially identical genomes and select the best genome from each replicate set. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using co-assembly.

    View details for DOI 10.1038/ismej.2017.126

    View details for Web of Science ID 000415947900019

    View details for PubMedID 28742071

    View details for PubMedCentralID PMC5702732

  • Identical bacterial populations colonize premature infant gut, skin, and oral microbiomes and exhibit different in situ growth rates GENOME RESEARCH Olm, M. R., Brown, C. T., Brooks, B., Firek, B., Baker, R., Burstein, D., Soenjoyo, K., Thomas, B. C., Morowitz, M., Banfield, J. F. 2017; 27 (4): 601–12


    The initial microbiome impacts the health and future development of premature infants. Methodological limitations have led to gaps in our understanding of the habitat range and subpopulation complexity of founding strains, as well as how different body sites support microbial growth. Here, we used metagenomics to reconstruct genomes of strains that colonized the skin, mouth, and gut of two hospitalized premature infants during the first month of life. Seven bacterial populations, considered to be identical given whole-genome average nucleotide identity of >99.9%, colonized multiple body sites, yet none were shared between infants. Gut-associated Citrobacter koseri genomes harbored 47 polymorphic sites that we used to define 10 subpopulations, one of which appeared in the gut after 1 wk but did not spread to other body sites. Differential genome coverage was used to measure bacterial population replication rates in situ. In all cases where the same bacterial population was detected in multiple body sites, replication rates were faster in mouth and skin compared to the gut. The ability of identical strains to colonize multiple body sites underscores the habitat flexibility of initial colonists, whereas differences in microbial replication rates between body sites suggest differences in host control and/or resource availability. Population genomic analyses revealed microdiversity within bacterial populations, implying initial inoculation by multiple individual cells with distinct genotypes. Overall, however, the overlap of strains across body sites implies that the premature infant microbiome can exhibit very low microbial diversity.

    View details for DOI 10.1101/gr.213256.116

    View details for Web of Science ID 000398058600010

    View details for PubMedID 28073918

    View details for PubMedCentralID PMC5378178

  • The Source and Evolutionary History of a Microbial Contaminant Identified Through Soil Metagenomic Analysis MBIO Olm, M. R., Butterfield, C. N., Copeland, A., Boles, T., Thomas, B. C., Banfield, J. F. 2017; 8 (1)


    In this study, strain-resolved metagenomics was used to solve a mystery. A 6.4-Mbp complete closed genome was recovered from a soil metagenome and found to be astonishingly similar to that of Delftia acidovorans SPH-1, which was isolated in Germany a decade ago. It was suspected that this organism was not native to the soil sample because it lacked the diversity that is characteristic of other soil organisms; this suspicion was confirmed when PCR testing failed to detect the bacterium in the original soil samples. D. acidovorans was also identified in 16 previously published metagenomes from multiple environments, but detailed-scale single nucleotide polymorphism analysis grouped these into five distinct clades. All of the strains indicated as contaminants fell into one clade. Fragment length anomalies were identified in paired reads mapping to the contaminant clade genotypes only. This finding was used to establish that the DNA was present in specific size selection reagents used during sequencing. Ultimately, the source of the contaminant was identified as bacterial biofilms growing in tubing. On the basis of direct measurement of the rate of fixation of mutations across the period of time in which contamination was occurring, we estimated the time of separation of the contaminant strain from the genomically sequenced ancestral population within a factor of 2. This research serves as a case study of high-resolution microbial forensics and strain tracking accomplished through metagenomics-based comparative genomics. The specific case reported here is unusual in that the study was conducted in the background of a soil metagenome and the conclusions were confirmed by independent methods. IMPORTANCE It is often important to determine the source of a microbial strain. Examples include tracking a bacterium linked to a disease epidemic, contaminating the food supply, or used in bioterrorism. Strain identification and tracking are generally approached by using cultivation-based or relatively nonspecific gene fingerprinting methods. Genomic methods have the ability to distinguish strains, but this approach typically has been restricted to isolates or relatively low-complexity communities. We demonstrate that strain-resolved metagenomics can be applied to extremely complex soil samples. We genotypically defined a soil-associated bacterium and identified it as a contaminant. By linking together snapshots of the bacterial genome over time, it was possible to estimate how long the contaminant had been diverging from a likely source population. The results are congruent with the derivation of the bacterium from a strain isolated in Germany and sequenced a decade ago and highlight the utility of metagenomics in strain tracking.

    View details for DOI 10.1128/mBio.01969-16

    View details for Web of Science ID 000395835000052

    View details for PubMedID 28223457

    View details for PubMedCentralID PMC5358914

  • Transporter genes in biosynthetic gene clusters predict metabolite characteristics and siderophore activity. Genome research Crits-Christoph, A., Bhattacharya, N., Olm, M. R., Song, Y. S., Banfield, J. F. 2020


    Biosynthetic gene clusters (BGCs) are operonic sets of microbial genes that synthesize specialized metabolites with diverse functions, including siderophores and antibiotics, which often require export to the extracellular environment. For this reason, genes for transport across cellular membranes are essential for the production of specialized metabolites, and are often genomically colocalized with BGCs. Here we conducted a comprehensive computational analysis of transporters associated with characterized BGCs. In addition to known exporters, in BGCs we found many importer-specific transmembrane domains that co-occur with substrate binding proteins possibly for uptake of siderophores or metabolic precursors. Machine learning models using transporter gene frequencies were predictive of known siderophore activity, molecular weights, and a measure of lipophilicity (log P) for corresponding BGC-synthesized metabolites. Transporter genes associated with BGCs were often equally or more predictive of metabolite features than biosynthetic genes. Given the importance of siderophores as pathogenicity factors, we used transporters specific for siderophore BGCs to identify both known and uncharacterized siderophore-like BGCs in genomes from metagenomes from the infant and adult gut microbiome. We find that 23% of microbial genomes from the infant gut have siderophore-like BGCs, but only 3% of those assembled from adult gut microbiomes do. While siderophore-like BGCs from the infant gut are predominantly associated with Enterobacteriaceae and Staphylococcus, siderophore-like BGCs can be identified from taxa in the adult gut microbiome that have rarely been recognized for siderophore production. Taken together, these results show that consideration of BGC-associated transporter genes can inform predictions of specialized metabolite structure and function.

    View details for DOI 10.1101/gr.268169.120

    View details for PubMedID 33361114

  • Soil bacterial populations are shaped by recombination and gene-specific selection across a grassland meadow. The ISME journal Crits-Christoph, A., Olm, M. R., Diamond, S., Bouma-Gregson, K., Banfield, J. F. 2020


    Soil microbial diversity is often studied from the perspective of community composition, but less is known about genetic heterogeneity within species. The relative impacts of clonal interference, gene-specific selection, and recombination in many abundant but rarely cultivated soil microbes remain unknown. Here we track genome-wide population genetic variation for 19 highly abundant bacterial species sampled from across a grassland meadow. Genomic inferences about population structure are made using the millions of sequencing reads that are assembled de novo into consensus genomes from metagenomes, as each read pair describes a short genomic sequence from a cell in each population. Genomic nucleotide identity of assembled genomes was significantly associated with local geography for over half of the populations studied, and for a majority of populations within-sample nucleotide diversity could often be as high as meadow-wide nucleotide diversity. Genes involved in metabolite biosynthesis and extracellular transport were characterized by elevated nucleotide diversity in multiple species. Microbial populations displayed varying degrees of homologous recombination and recombinant variants were often detected at 7-36% of loci genome-wide. Within multiple populations we identified genes with unusually high spatial differentiation of alleles, fewer recombinant events, elevated ratios of nonsynonymous to synonymous variants, and lower nucleotide diversity, suggesting recent selective sweeps for gene variants. Taken together, these results indicate that recombination and gene-specific selection commonly shape genetic variation in several understudied soil bacterial lineages.

    View details for DOI 10.1038/s41396-020-0655-x

    View details for PubMedID 32327732

  • Clades of huge phages from across Earth's ecosystems. Nature Al-Shayeb, B., Sachdeva, R., Chen, L., Ward, F., Munk, P., Devoto, A., Castelle, C. J., Olm, M. R., Bouma-Gregson, K., Amano, Y., He, C., Meheust, R., Brooks, B., Thomas, A., Lavy, A., Matheus-Carnevali, P., Sun, C., Goltsman, D. S., Borton, M. A., Sharrar, A., Jaffe, A. L., Nelson, T. C., Kantor, R., Keren, R., Lane, K. R., Farag, I. F., Lei, S., Finstad, K., Amundson, R., Anantharaman, K., Zhou, J., Probst, A. J., Power, M. E., Tringe, S. G., Li, W., Wrighton, K., Harrison, S., Morowitz, M., Relman, D. A., Doudna, J. A., Lehours, A., Warren, L., Cate, J. H., Santini, J. M., Banfield, J. F. 2020


    Bacteriophages typically have small genomes and depend on their bacterial hosts for replication. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.

    View details for DOI 10.1038/s41586-020-2007-4

    View details for PubMedID 32051592

  • Combined analysis of microbial metagenomic and metatranscriptomic sequencing data to assess in situ physiological conditions in the premature infant gut. PloS one Sher, Y., Olm, M. R., Raveh-Sadka, T., Brown, C. T., Sher, R., Firek, B., Baker, R., Morowitz, M. J., Banfield, J. F. 2020; 15 (3): e0229537


    Microbes alter their transcriptomic profiles in response to the environment. The physiological conditions experienced by a microbial community can thus be inferred using meta-transcriptomic sequencing by comparing transcription levels of specifically chosen genes. However, this analysis requires accurate reference genomes to identify the specific genes from which RNA reads originate. In addition, such an analysis should avoid biases in transcript counts related to differences in organism abundance. In this study we describe an approach to address these difficulties. Sample-specific meta-genomic assembled genomes (MAGs) were used as reference genomes to accurately identify the origin of RNA reads, and transcript ratios of genes with opposite transcription responses were compared to eliminate biases related to differences in organismal abundance, an approach hereafter named the "diametric ratio" method. We used this approach to probe the environmental conditions experienced by Escherichia spp. in the gut of 4 premature infants, 2 of whom developed necrotizing enterocolitis (NEC), a severe inflammatory intestinal disease. We analyzed twenty fecal samples taken from four premature infants (4-6 time points from each infant), and found significantly higher diametric ratios of genes associated with low oxygen levels in samples of infants later diagnosed with NEC than in samples without NEC. We also show this method can be used for examining other physiological conditions, such as exposure to nitric oxide and osmotic pressure. These study results should be treated with caution, due to the presence of confounding factors that might also distinguish between NEC and control infants. Nevertheless, together with benchmarking analyses, we show here that the diametric ratio approach can be applied for evaluating the physiological conditions experienced by microbes in situ. Results from similar studies can be further applied for designing diagnostic methods to detect NEC in its early developmental stages.

    View details for DOI 10.1371/journal.pone.0229537

    View details for PubMedID 32130257

  • Impacts of microbial assemblage and environmental conditions on the distribution of anatoxin-a producing cyanobacteria within a river network ISME JOURNAL Bouma-Gregson, K., Olm, M. R., Probst, A. J., Anantharaman, K., Power, M. E., Banfield, J. F. 2019; 13 (6): 1618–34


    Blooms of planktonic cyanobacteria have long been of concern in lakes, but more recently, harmful impacts of riverine benthic cyanobacterial mats have been recognized. As yet, we know little about how various benthic cyanobacteria are distributed in river networks, or how environmental conditions or other associated microbes in their consortia affect their biosynthetic capacities. We performed metagenomic sequencing for 22 Oscillatoriales-dominated (Cyanobacteria) microbial mats collected across the Eel River network in Northern California and investigated factors associated with anatoxin-a producing cyanobacteria. All microbial communities were dominated by one or two cyanobacterial species, so the key mat metabolisms involve oxygenic photosynthesis and carbon oxidation. Only a few metabolisms fueled the growth of the mat communities, with little evidence for anaerobic metabolic pathways. We genomically defined four cyanobacterial species, all of which shared <96% average nucleotide identity with reference Oscillatoriales genomes and are potentially novel species in the genus Microcoleus. One of the Microcoleus species contained the anatoxin-a biosynthesis genes, and we describe the first anatoxin-a gene cluster from the Microcoleus clade within Oscillatoriales. Occurrence of these four Microcoleus species in the watershed was correlated with total dissolved nitrogen and phosphorus concentrations, and the species that contains the anatoxin-a gene cluster was found in sites with higher nitrogen concentrations. Microbial assemblages in mat samples with the anatoxin-a gene cluster consistently had a lower abundance of Burkholderiales (Betaproteobacteria) species than did mats without the anatoxin-producing genes. The associations of water nutrient concentrations and certain co-occurring microbes with anatoxin-a producing Microcoleus motivate further exploration for their roles as potential controls on the distributions of toxigenic benthic cyanobacteria in river networks.

    View details for DOI 10.1038/s41396-019-0374-3

    View details for Web of Science ID 000468529400018

    View details for PubMedID 30809011

    View details for PubMedCentralID PMC6776057

  • Megaphages infect Prevotella and variants are widespread in gut microbiomes NATURE MICROBIOLOGY Devoto, A. E., Santini, J. M., Olm, M. R., Anantharaman, K., Munk, P., Tung, J., Archie, E. A., Turnbaugh, P., Seed, K. D., Blekhman, R., Aarestrup, F. M., Thomas, B. C., Banfield, J. F. 2019; 4 (4): 693–700


    Bacteriophages (phages) dramatically shape microbial community composition, redistribute nutrients via host lysis and drive evolution through horizontal gene transfer. Despite their importance, much remains to be learned about phages in the human microbiome. We investigated the gut microbiomes of humans from Bangladesh and Tanzania, two African baboon social groups and Danish pigs; many of these microbiomes contain phages belonging to a clade with genomes >540 kilobases in length, the largest yet reported in the human microbiome and close to the maximum size ever reported for phages. We refer to these as Lak phages. CRISPR spacer targeting indicates that Lak phages infect bacteria of the genus Prevotella. We manually curated to completion 15 distinct Lak phage genomes recovered from metagenomes. The genomes display several interesting features, including use of an alternative genetic code, large intergenic regions that are highly expressed and up to 35 putative transfer RNAs, some of which contain enigmatic introns. Different individuals have distinct phage genotypes, and shifts in variant frequencies over consecutive sampling days reflect changes in the relative abundance of phage subpopulations. Recent homologous recombination has resulted in extensive genome admixture of nine baboon Lak phage populations. We infer that Lak phages are widespread in gut communities that contain the Prevotella species, and conclude that megaphages, with fascinating and underexplored biology, may be common but largely overlooked components of human and animal gut microbiomes.

    View details for DOI 10.1038/s41564-018-0338-9

    View details for Web of Science ID 000461999200022

    View details for PubMedID 30692672

    View details for PubMedCentralID PMC6784885

  • Hydrogen-based metabolism as an ancestral trait in lineages sibling to the Cyanobacteria NATURE COMMUNICATIONS Carnevali, P., Schulz, F., Castelle, C. J., Kantor, R. S., Shih, P. M., Sharon, I., Santini, J. M., Olm, M. R., Amano, Y., Thomas, B. C., Anantharaman, K., Burstein, D., Becraft, E. D., Stepanauskas, R., Woyke, T., Banfield, J. F. 2019; 10: 463


    The evolution of aerobic respiration was likely linked to the origins of oxygenic Cyanobacteria. Close phylogenetic neighbors to Cyanobacteria, such as Margulisbacteria (RBX-1 and ZB3), Saganbacteria (WOR-1), Melainabacteria and Sericytochromatia, may constrain the metabolic platform in which aerobic respiration arose. Here, we analyze genomic sequences and predict that sediment-associated Margulisbacteria have a fermentation-based metabolism featuring a variety of hydrogenases, a streamlined nitrogenase, and electron bifurcating complexes involved in cycling of reducing equivalents. The genomes of ocean-associated Margulisbacteria encode an electron transport chain that may support aerobic growth. Some Saganbacteria genomes encode various hydrogenases, and others may be able to use O2 under certain conditions via a putative novel type of heme copper O2 reductase. Similarly, Melainabacteria have diverse energy metabolisms and are capable of fermentation and aerobic or anaerobic respiration. The ancestor of all these groups may have been an anaerobe in which fermentation and H2 metabolism were central metabolic features. The ability to use O2 as a terminal electron acceptor must have been subsequently acquired by these lineages.

    View details for DOI 10.1038/s41467-018-08246-y

    View details for Web of Science ID 000456828500001

    View details for PubMedID 30692531

    View details for PubMedCentralID PMC6349859

  • The developing premature infant gut microbiome is a major factor shaping the microbiome of neonatal intensive care unit rooms MICROBIOME Brooks, B., Olm, M. R., Firek, B. A., Baker, R., Geller-McGrath, D., Reimer, S. R., Soenjoyo, K. R., Yip, J. S., Dahan, D., Thomas, B. C., Morowitz, M. J., Banfield, J. F. 2018; 6: 112


    The neonatal intensive care unit (NICU) contains a unique cohort of patients with underdeveloped immune systems and nascent microbiome communities. Patients often spend several months in the same room, and it has been previously shown that the gut microbiomes of these infants often resemble the microbes found in the NICU. Little is known, however, about the identity, persistence, and absolute abundance of NICU room-associated bacteria over long stretches of time. Here, we couple droplet digital PCR (ddPCR), 16S rRNA gene surveys, and recently published metagenomics data from infant gut samples to infer the extent to which the NICU microbiome is shaped by its room occupants. Over 2832 swabs, wipes, and air samples were collected from 16 private-style NICU rooms housing very low birth weight (< 1500 g), premature (< 31 weeks' gestation) infants. For each infant, room samples were collected daily, Monday through Friday, for 1 month. The first samples from the first infant and the last samples from the last infant were collected 383 days apart. Twenty-two NICU locations spanning room surfaces, hands, electronics, sink basins, and air were collected. Results point to an incredibly simple room community where 5-10 taxa, mostly skin-associated, account for over 50% of the amplicon reads. Biomass estimates reveal four to five orders of magnitude difference between the least to the most dense microbial communities, air, and sink basins, respectively. Biomass trends from bioaerosol samples and petri dish dust collectors suggest occupancy to be a main driver of suspended biological particles within the NICU. Using a machine learning algorithm to classify the origin of room samples, we show that each room has a unique microbial fingerprint. Several important taxa driving this model were dominant gut colonizers of infants housed within each room. Despite regular cleaning of hospital surfaces, bacterial biomass was detectable at varying densities. A room-specific microbiome signature was detected, suggesting microbes seeding NICU surfaces are sourced from reservoirs within the room and that these reservoirs contain actively dividing cells. Collectively, the data suggests that hospitalized infants, in combination with their caregivers, shape the microbiome of NICU rooms.

    View details for PubMedID 29925423

  • Hospitalized Premature Infants Are Colonized by Related Bacterial Strains with Distinct Proteomic Profiles MBIO Brown, C. T., Xiong, W., Olm, M. R., Thomas, B. C., Baker, R., Firek, B., Morowitz, M. J., Hettich, R. L., Banfield, J. F. 2018; 9 (2)


    During the first weeks of life, microbial colonization of the gut impacts human immune system maturation and other developmental processes. In premature infants, aberrant colonization has been implicated in the onset of necrotizing enterocolitis (NEC), a life-threatening intestinal disease. To study the premature infant gut colonization process, genome-resolved metagenomics was conducted on 343 fecal samples collected during the first 3 months of life from 35 premature infants housed in a neonatal intensive care unit, 14 of whom developed NEC, and metaproteomic measurements were made on 87 samples. Microbial community composition and proteomic profiles remained relatively stable on the time scale of a week, but the proteome was more variable. Although genetically similar organisms colonized many infants, most infants were colonized by distinct strains with metabolic profiles that could be distinguished using metaproteomics. Microbiome composition correlated with infant, antibiotics administration, and NEC diagnosis. Communities were found to cluster into seven primary types, and community type switched within infants, sometimes multiple times. Interestingly, some communities sampled from the same infant at subsequent time points clustered with those of other infants. In some cases, switches preceded onset of NEC; however, no species or community type could account for NEC across the majority of infants. In addition to a correlation of protein abundances with organism replication rates, we found that organism proteomes correlated with overall community composition. Thus, this genome-resolved proteomics study demonstrated that the contributions of individual organisms to microbiome development depend on microbial community context. IMPORTANCE Humans are colonized by microbes at birth, a process that is important to health and development. However, much remains to be known about the fine-scale microbial dynamics that occur during the colonization period. We conducted a genome-resolved study of microbial community composition, replication rates, and proteomes during the first 3 months of life of both healthy and sick premature infants. Infants were found to be colonized by similar microbes, but each underwent a distinct colonization trajectory. Interestingly, related microbes colonizing different infants were found to have distinct proteomes, indicating that microbiome function is not only driven by which organisms are present, but also largely depends on microbial responses to the unique set of physiological conditions in the infant gut.

    View details for DOI 10.1128/mBio.00441-18

    View details for Web of Science ID 000431279600073

    View details for PubMedID 29636439

    View details for PubMedCentralID PMC5893878

  • Machine Learning Leveraging Genomes from Metagenomes Identifies Influential Antibiotic Resistance Genes in the Infant Gut Microbiome. mSystems Rahman, S. F., Olm, M. R., Morowitz, M. J., Banfield, J. F. 2018; 3 (1)


    Antibiotic resistance in pathogens is extensively studied, and yet little is known about how antibiotic resistance genes of typical gut bacteria influence microbiome dynamics. Here, we leveraged genomes from metagenomes to investigate how genes of the premature infant gut resistome correspond to the ability of bacteria to survive under certain environmental and clinical conditions. We found that formula feeding impacts the resistome. Random forest models corroborated by statistical tests revealed that the gut resistome of formula-fed infants is enriched in class D beta-lactamase genes. Interestingly, Clostridium difficile strains harboring this gene are at higher abundance in formula-fed infants than C. difficile strains lacking this gene. Organisms with genes for major facilitator superfamily drug efflux pumps have higher replication rates under all conditions, even in the absence of antibiotic therapy. Using a machine learning approach, we identified genes that are predictive of an organism's direction of change in relative abundance after administration of vancomycin and cephalosporin antibiotics. The most accurate results were obtained by reducing annotated genomic data to five principal components classified by boosted decision trees. Among the genes involved in predicting whether an organism increased in relative abundance after treatment are those that encode subclass B2 beta-lactamases and transcriptional regulators of vancomycin resistance. This demonstrates that machine learning applied to genome-resolved metagenomics data can identify key genes for survival after antibiotics treatment and predict how organisms in the gut microbiome will respond to antibiotic administration. IMPORTANCE The process of reconstructing genomes from environmental sequence data (genome-resolved metagenomics) allows unique insight into microbial systems. We apply this technique to investigate how the antibiotic resistance genes of bacteria affect their ability to flourish in the gut under various conditions. Our analysis reveals that strain-level selection in formula-fed infants drives enrichment of beta-lactamase genes in the gut resistome. Using genomes from metagenomes, we built a machine learning model to predict how organisms in the gut microbial community respond to perturbation by antibiotics. This may eventually have clinical applications.

    View details for DOI 10.1128/mSystems.00123-17

    View details for PubMedID 29359195

    View details for PubMedCentralID PMC5758725

  • Strain-resolved analysis of hospital rooms and infants reveals overlap between the human and room microbiome NATURE COMMUNICATIONS Brooks, B., Olm, M. R., Firek, B. A., Baker, R., Thomas, B. C., Morowitz, M. J., Banfield, J. F. 2017; 8: 1814


    Preterm infants exhibit different microbiome colonization patterns relative to full-term infants, and it is speculated that the hospital room environment may contribute to infant microbiome development. Here, we present a genome-resolved metagenomic study of microbial genotypes from the gastrointestinal tracts of infants and from the neonatal intensive care unit (NICU) room environment. Some strains detected in hospitalized infants also occur in sinks and on surfaces, and belong to species such as Staphylococcus epidermidis, Enterococcus faecalis, Pseudomonas aeruginosa, and Klebsiella pneumoniae, which are frequently implicated in nosocomial infection and preterm infant gut colonization. Of the 15 K. pneumoniae strains detected in the study, four were detected in both infant gut and room samples. Time series experiments showed that nearly all strains associated with infant gut colonization can be detected in the room after, and often before, detection in the gut. Thus, we conclude that a component of premature infant gut colonization is the cycle of microbial exchange between the room and the occupant.

    View details for DOI 10.1038/s41467-017-02018-w

    View details for Web of Science ID 000416293600030

    View details for PubMedID 29180750

    View details for PubMedCentralID PMC5703836

  • Measurement of bacterial replication rates in microbial communities NATURE BIOTECHNOLOGY Brown, C. T., Olm, M. R., Thomas, B. C., Banfield, J. F. 2016; 34 (12): 1256–63


    Culture-independent microbiome studies have increased our understanding of the complexity and metabolic potential of microbial communities. However, to understand the contribution of individual microbiome members to community functions, it is important to determine which bacteria are actively replicating. We developed an algorithm, iRep, that uses draft-quality genome sequences and single time-point metagenome sequencing to infer microbial population replication rates. The algorithm calculates an index of replication (iRep) based on the sequencing coverage trend that results from bi-directional genome replication from a single origin of replication. We apply this method to show that microbial replication rates increase after antibiotic administration in human infants. We also show that uncultivated, groundwater-associated, Candidate Phyla Radiation bacteria only rarely replicate quickly in subsurface communities undergoing substantial changes in geochemistry. Our method can be applied to any genome-resolved microbiome study to track organism responses to varying conditions, identify actively growing populations and measure replication rates for use in modeling studies.

    View details for DOI 10.1038/nbt.3704

    View details for Web of Science ID 000390185300016

    View details for PubMedID 27819664

    View details for PubMedCentralID PMC5538567

  • Function, expression, specificity, diversity and incompatibility of actinobacteriophage parABS systems MOLECULAR MICROBIOLOGY Dedrick, R. M., Mavrich, T. N., Ng, W. L., Reyes, J., Olm, M. R., Rush, R. E., Jacobs-Sera, D., Russell, D. A., Hatfull, G. F. 2016; 101 (4): 625–44


    More than 180 individual phages infecting hosts in the phylum Actinobacteria have been sequenced and grouped into Cluster A because of their similar overall nucleotide sequences and genome architectures. These Cluster A phages are either temperate or derivatives of temperate parents, and most have an integration cassette near the centre of the genome containing an integrase gene and attP. However, about 20% of the phages lack an integration cassette, which is replaced by a 1.4 kbp segment with predicted partitioning functions, including plasmid-like parA and parB genes. Phage RedRock forms stable lysogens in Mycobacterium smegmatis in which the prophage replicates at 2.4 copies/chromosome and the partitioning system confers prophage maintenance. The parAB genes are expressed upon RedRock infection of M. smegmatis, but are downregulated once lysogeny is established by binding of RedRock ParB to parS-L, one of two centromere-like sites flanking the parAB genes. The RedRock parS-L and parS-R sites are composed of eight directly repeated copies of an 8 bp motif that is recognized by ParB. The actinobacteriophage parABS cassettes span considerable sequence diversity and specificity, providing a suite of tools for use in mycobacterial genetics.

    View details for DOI 10.1111/mmi.13414

    View details for Web of Science ID 000382541900007

    View details for PubMedID 27146086

    View details for PubMedCentralID PMC4998052

  • Cluster M Mycobacteriophages Bongo, PegLeg, and Rey with Unusually Large Repertoires of tRNA Isotypes JOURNAL OF VIROLOGY Pope, W. H., Anders, K. R., Baird, M., Bowman, C. A., Boyle, M. M., Broussard, G. W., Chow, T., Clase, K. L., Cooper, S., Cornely, K. A., DeJong, R. J., Delesalle, V. A., Deng, L., Dunbar, D., Edgington, N. P., Ferreira, C. M., Hafer, K., Hartzog, G. A., Hatherill, J., Hughes, L. E., Ipapo, K., Krukonis, G. P., Meier, C. G., Monti, D. L., Olm, M. R., Page, S. T., Peebles, C. L., Rinehart, C. A., Rubin, M. R., Russell, D. A., Sanders, E. R., Schoer, M., Shaffer, C. D., Wherley, J., Vazquez, E., Yuan, H., Zhang, D., Cresawn, S. G., Jacobs-Sera, D., Hendrix, R. W., Hatfull, G. F. 2014; 88 (5): 2461–80


    Genomic analysis of a large set of phages infecting the common host Mycobacterium smegmatis mc²155 shows that they span considerable genetic diversity. There are more than 20 distinct types that lack nucleotide similarity with each other, and there is considerable diversity within most of the groups. Three newly isolated temperate mycobacteriophages, Bongo, PegLeg, and Rey, constitute a new group (cluster M), with the closely related phages Bongo and PegLeg forming subcluster M1 and the more distantly related Rey forming subcluster M2. The cluster M mycobacteriophages have siphoviral morphologies with unusually long tails, are homoimmune, and have larger than average genomes (80.2 to 83.7 kbp). They exhibit a variety of features not previously described in other mycobacteriophages, including noncanonical genome architectures and several unusual sets of conserved repeated sequences suggesting novel regulatory systems for both transcription and translation. In addition to containing transfer-messenger RNA and RtcB-like RNA ligase genes, their genomes encode 21 to 24 tRNA genes encompassing complete or nearly complete sets of isotypes. We predict that these tRNAs are used in late lytic growth, likely compensating for the degradation or inadequacy of host tRNAs. They may represent a complete set of tRNAs necessary for late lytic growth, especially when taken together with the apparent lack of codons in the same late genes that correspond to tRNAs that the genomes of the phages do not obviously encode. The bacteriophage population is vast, dynamic, and old and plays a central role in bacterial pathogenicity. We know surprisingly little about the genetic diversity of the phage population, although metagenomic and phage genome sequencing indicates that it is great. Probing the depth of genetic diversity of phages of a common host, Mycobacterium smegmatis, provides a higher resolution of the phage population and how it has evolved. Three new phages constituting a new cluster M further expand the diversity of the mycobacteriophages and introduce novel features. As such, they provide insights into phage genome architecture, virion structure, and gene regulation at the transcriptional and translational levels.

    View details for DOI 10.1128/JVI.03363-13

    View details for Web of Science ID 000331131700010

    View details for PubMedID 24335314

    View details for PubMedCentralID PMC3958112