I am a bioinformatician and microbiologist interested in studying the human microbiome and fine-scale microbial population genetics.
PhD, The University of California, Berkeley, Microbiology (2019)
BS, University of Pittsburgh, Microbiology (2014)
Justin Sonnenburg, Postdoctoral Faculty Sponsor
Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries.
2020; 5 (1)
Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. 
Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.
View details for DOI 10.1128/mSystems.00731-19
View details for PubMedID 31937678
View details for PubMedCentralID PMC6967389
Necrotizing enterocolitis is preceded by increased gut bacterial replication, Klebsiella, and fimbriae-encoding bacteria
2019; 5 (12): eaax5727
Necrotizing enterocolitis (NEC) is a devastating intestinal disease that occurs primarily in premature infants. We performed genome-resolved metagenomic analysis of 1163 fecal samples from premature infants to identify microbial features predictive of NEC. Features considered include genes, bacterial strain types, eukaryotes, bacteriophages, plasmids, and growth rates. A machine learning classifier found that samples collected before NEC diagnosis harbored significantly more Klebsiella, bacteria encoding fimbriae, and bacteria encoding secondary metabolite gene clusters related to quorum sensing and bacteriocin production. Notably, replication rates of all bacteria, especially Enterobacteriaceae, were significantly higher 2 days before NEC diagnosis. The findings uncover biomarkers that could lead to early detection of NEC and targets for microbiome-based therapeutics.
View details for DOI 10.1126/sciadv.aax5727
View details for Web of Science ID 000505069600043
View details for PubMedID 31844663
View details for PubMedCentralID PMC6905865
Genome-resolved metagenomics of eukaryotic populations during early colonization of premature infants and in hospital rooms
2019; 7: 26
Fungal infections are a significant cause of mortality and morbidity in hospitalized preterm infants, yet little is known about eukaryotic colonization of infants and of the neonatal intensive care unit as a possible source of colonizing strains. This is partly because microbiome studies often utilize bacterial 16S rRNA marker gene sequencing, a technique that is blind to eukaryotic organisms. Knowledge gaps exist regarding the phylogeny and microdiversity of eukaryotes that colonize hospitalized infants, as well as potential reservoirs of eukaryotes in the hospital room built environment. Genome-resolved analysis of 1174 time-series fecal metagenomes from 161 premature infants revealed fungal colonization of 10 infants. Relative abundance levels reached as high as 97% and were significantly higher in the first weeks of life (p = 0.004). When fungal colonization occurred, multiple species were present more often than expected by random chance (p = 0.008). Twenty-four metagenomic samples were analyzed from hospital rooms of six different infants. Compared to floor and surface samples, hospital sinks hosted diverse and highly variable communities containing genomically novel species, including from Diptera (fly) and Rhabditida (worm) for which genomes were assembled. With the exception of Diptera and two other organisms, zygosity of the newly assembled diploid eukaryote genomes was low. Interestingly, Malassezia and Candida species were present in both room and infant gut samples. Increased levels of fungal co-colonization may reflect synergistic interactions or differences in infant susceptibility to fungal colonization. Discovery of eukaryotic organisms that have not been sequenced previously highlights the benefit of genome-resolved analyses, and low zygosity of assembled genomes could reflect inbreeding or strong selection imposed by room conditions.
View details for DOI 10.1186/s40168-019-0638-1
View details for Web of Science ID 000458988100002
View details for PubMedID 30770768
View details for PubMedCentralID PMC6377789
dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication
2017; 11 (12): 2864–68
The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that reduces the computational time for pairwise genome comparisons by sequentially applying a fast, inaccurate estimation of genome distance, and a slow, accurate measure of average nucleotide identity. dRep achieves a 28× increase in speed with perfect recall and precision when benchmarked against previously developed algorithms. We demonstrate the use of dRep for genome recovery from time-series datasets. Each metagenome was assembled separately, and dRep was used to identify groups of essentially identical genomes and select the best genome from each replicate set. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using co-assembly.
View details for DOI 10.1038/ismej.2017.126
View details for Web of Science ID 000415947900019
View details for PubMedID 28742071
View details for PubMedCentralID PMC5702732
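The two-stage idea described in the dRep abstract above (a fast, approximate comparison to form coarse groups, followed by a slow, accurate comparison only within each group, then selection of the best genome per replicate set) can be illustrated with a minimal sketch. This is not dRep's implementation: the k-mer Jaccard similarity, the position-wise identity function, the thresholds, the toy sequences, and the quality scores are all assumptions chosen for illustration.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fast_similarity(a, b, k=4):
    """Assumed stand-in for a fast, inaccurate estimate: k-mer Jaccard index."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def slow_identity(a, b):
    """Assumed stand-in for a slow, accurate measure: fraction of matching
    positions, relative to the longer sequence (toy, alignment-free)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def cluster(names, linked_pairs):
    """Single-linkage clustering of names given linked pairs (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in linked_pairs:
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def two_stage_dereplicate(genomes, quality, coarse=0.5, fine=0.95):
    """Return (representative genomes, number of slow comparisons made)."""
    names = sorted(genomes)
    # Stage 1: cheap comparison of all pairs to form coarse primary clusters.
    coarse_pairs = [(a, b) for a, b in combinations(names, 2)
                    if fast_similarity(genomes[a], genomes[b]) >= coarse]
    slow_calls = 0
    winners = []
    for group in cluster(names, coarse_pairs):
        # Stage 2: expensive comparison only within each primary cluster.
        fine_pairs = []
        for a, b in combinations(group, 2):
            slow_calls += 1
            if slow_identity(genomes[a], genomes[b]) >= fine:
                fine_pairs.append((a, b))
        # Keep the highest-quality genome from each replicate set.
        for replicate_set in cluster(group, fine_pairs):
            winners.append(max(replicate_set, key=lambda n: quality[n]))
    return sorted(winners), slow_calls


# Hypothetical toy data: three related sequences and one unrelated sequence.
genomes = {
    "A":     "ACGTACGATCAGCATCGACTAGCATCAGCTAC",
    "A_dup": "ACGTACGATCAGCATCGACTAGCATCAGCTAG",  # 1 mismatch vs A
    "A_var": "TAGTACGATCAGCATCGACTAGCATCAGCTAC",  # 2 mismatches vs A
    "B":     "TTTTTTTTGGGGGGGGTTTTGGGGTTTTGGGG",
}
quality = {"A": 0.90, "A_dup": 0.95, "A_var": 0.80, "B": 0.70}

representatives, slow_calls = two_stage_dereplicate(genomes, quality)
print(representatives, slow_calls)
```

In this toy run, the slow measure is evaluated only for pairs inside the coarse cluster of related sequences rather than for every pair in the set, which is the source of the time savings the abstract describes.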
Identical bacterial populations colonize premature infant gut, skin, and oral microbiomes and exhibit different in situ growth rates
2017; 27 (4): 601–12
The initial microbiome impacts the health and future development of premature infants. Methodological limitations have led to gaps in our understanding of the habitat range and subpopulation complexity of founding strains, as well as how different body sites support microbial growth. Here, we used metagenomics to reconstruct genomes of strains that colonized the skin, mouth, and gut of two hospitalized premature infants during the first month of life. Seven bacterial populations, considered to be identical given whole-genome average nucleotide identity of >99.9%, colonized multiple body sites, yet none were shared between infants. Gut-associated Citrobacter koseri genomes harbored 47 polymorphic sites that we used to define 10 subpopulations, one of which appeared in the gut after 1 wk but did not spread to other body sites. Differential genome coverage was used to measure bacterial population replication rates in situ. In all cases where the same bacterial population was detected in multiple body sites, replication rates were faster in mouth and skin compared to the gut. The ability of identical strains to colonize multiple body sites underscores the habitat flexibility of initial colonists, whereas differences in microbial replication rates between body sites suggest differences in host control and/or resource availability. Population genomic analyses revealed microdiversity within bacterial populations, implying initial inoculation by multiple individual cells with distinct genotypes. Overall, however, the overlap of strains across body sites implies that the premature infant microbiome can exhibit very low microbial diversity.
View details for DOI 10.1101/gr.213256.116
View details for Web of Science ID 000398058600010
View details for PubMedID 28073918
View details for PubMedCentralID PMC5378178
The Source and Evolutionary History of a Microbial Contaminant Identified Through Soil Metagenomic Analysis
2017; 8 (1)
In this study, strain-resolved metagenomics was used to solve a mystery. A 6.4-Mbp complete closed genome was recovered from a soil metagenome and found to be astonishingly similar to that of Delftia acidovorans SPH-1, which was isolated in Germany a decade ago. It was suspected that this organism was not native to the soil sample because it lacked the diversity that is characteristic of other soil organisms; this suspicion was confirmed when PCR testing failed to detect the bacterium in the original soil samples. D. acidovorans was also identified in 16 previously published metagenomes from multiple environments, but detailed-scale single nucleotide polymorphism analysis grouped these into five distinct clades. All of the strains indicated as contaminants fell into one clade. Fragment length anomalies were identified in paired reads mapping to the contaminant clade genotypes only. This finding was used to establish that the DNA was present in specific size selection reagents used during sequencing. Ultimately, the source of the contaminant was identified as bacterial biofilms growing in tubing. On the basis of direct measurement of the rate of fixation of mutations across the period of time in which contamination was occurring, we estimated the time of separation of the contaminant strain from the genomically sequenced ancestral population within a factor of 2. This research serves as a case study of high-resolution microbial forensics and strain tracking accomplished through metagenomics-based comparative genomics. The specific case reported here is unusual in that the study was conducted in the background of a soil metagenome and the conclusions were confirmed by independent methods. IMPORTANCE It is often important to determine the source of a microbial strain. Examples include tracking a bacterium linked to a disease epidemic, contaminating the food supply, or used in bioterrorism. 
Strain identification and tracking are generally approached by using cultivation-based or relatively nonspecific gene fingerprinting methods. Genomic methods have the ability to distinguish strains, but this approach typically has been restricted to isolates or relatively low-complexity communities. We demonstrate that strain-resolved metagenomics can be applied to extremely complex soil samples. We genotypically defined a soil-associated bacterium and identified it as a contaminant. By linking together snapshots of the bacterial genome over time, it was possible to estimate how long the contaminant had been diverging from a likely source population. The results are congruent with the derivation of the bacterium from a strain isolated in Germany and sequenced a decade ago and highlight the utility of metagenomics in strain tracking.
View details for DOI 10.1128/mBio.01969-16
View details for Web of Science ID 000395835000052
View details for PubMedID 28223457
View details for PubMedCentralID PMC5358914
Clades of huge phages from across Earth's ecosystems.
Bacteriophages typically have small genomes and depend on their bacterial hosts for replication. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.
View details for DOI 10.1038/s41586-020-2007-4
View details for PubMedID 32051592
Impacts of microbial assemblage and environmental conditions on the distribution of anatoxin-a producing cyanobacteria within a river network
2019; 13 (6): 1618–34
Blooms of planktonic cyanobacteria have long been of concern in lakes, but more recently, harmful impacts of riverine benthic cyanobacterial mats have been recognized. As yet, we know little about how various benthic cyanobacteria are distributed in river networks, or how environmental conditions or other associated microbes in their consortia affect their biosynthetic capacities. We performed metagenomic sequencing for 22 Oscillatoriales-dominated (Cyanobacteria) microbial mats collected across the Eel River network in Northern California and investigated factors associated with anatoxin-a producing cyanobacteria. All microbial communities were dominated by one or two cyanobacterial species, so the key mat metabolisms involve oxygenic photosynthesis and carbon oxidation. Only a few metabolisms fueled the growth of the mat communities, with little evidence for anaerobic metabolic pathways. We genomically defined four cyanobacterial species, all of which shared <96% average nucleotide identity with reference Oscillatoriales genomes and are potentially novel species in the genus Microcoleus. One of the Microcoleus species contained the anatoxin-a biosynthesis genes, and we describe the first anatoxin-a gene cluster from the Microcoleus clade within Oscillatoriales. Occurrence of these four Microcoleus species in the watershed was correlated with total dissolved nitrogen and phosphorus concentrations, and the species that contains the anatoxin-a gene cluster was found in sites with higher nitrogen concentrations. Microbial assemblages in mat samples with the anatoxin-a gene cluster consistently had a lower abundance of Burkholderiales (Betaproteobacteria) species than did mats without the anatoxin-producing genes. The associations of water nutrient concentrations and certain co-occurring microbes with anatoxin-a producing Microcoleus motivate further exploration for their roles as potential controls on the distributions of toxigenic benthic cyanobacteria in river networks.
View details for DOI 10.1038/s41396-019-0374-3
View details for Web of Science ID 000468529400018
View details for PubMedID 30809011
View details for PubMedCentralID PMC6776057
Megaphages infect Prevotella and variants are widespread in gut microbiomes
2019; 4 (4): 693–700
Bacteriophages (phages) dramatically shape microbial community composition, redistribute nutrients via host lysis and drive evolution through horizontal gene transfer. Despite their importance, much remains to be learned about phages in the human microbiome. We investigated the gut microbiomes of humans from Bangladesh and Tanzania, two African baboon social groups and Danish pigs; many of these microbiomes contain phages belonging to a clade with genomes >540 kilobases in length, the largest yet reported in the human microbiome and close to the maximum size ever reported for phages. We refer to these as Lak phages. CRISPR spacer targeting indicates that Lak phages infect bacteria of the genus Prevotella. We manually curated to completion 15 distinct Lak phage genomes recovered from metagenomes. The genomes display several interesting features, including use of an alternative genetic code, large intergenic regions that are highly expressed and up to 35 putative transfer RNAs, some of which contain enigmatic introns. Different individuals have distinct phage genotypes, and shifts in variant frequencies over consecutive sampling days reflect changes in the relative abundance of phage subpopulations. Recent homologous recombination has resulted in extensive genome admixture of nine baboon Lak phage populations. We infer that Lak phages are widespread in gut communities that contain the Prevotella species, and conclude that megaphages, with fascinating and underexplored biology, may be common but largely overlooked components of human and animal gut microbiomes.
View details for DOI 10.1038/s41564-018-0338-9
View details for Web of Science ID 000461999200022
View details for PubMedID 30692672
View details for PubMedCentralID PMC6784885
Hydrogen-based metabolism as an ancestral trait in lineages sibling to the Cyanobacteria
2019; 10: 463
The evolution of aerobic respiration was likely linked to the origins of oxygenic Cyanobacteria. Close phylogenetic neighbors to Cyanobacteria, such as Margulisbacteria (RBX-1 and ZB3), Saganbacteria (WOR-1), Melainabacteria and Sericytochromatia, may constrain the metabolic platform in which aerobic respiration arose. Here, we analyze genomic sequences and predict that sediment-associated Margulisbacteria have a fermentation-based metabolism featuring a variety of hydrogenases, a streamlined nitrogenase, and electron bifurcating complexes involved in cycling of reducing equivalents. The genomes of ocean-associated Margulisbacteria encode an electron transport chain that may support aerobic growth. Some Saganbacteria genomes encode various hydrogenases, and others may be able to use O2 under certain conditions via a putative novel type of heme copper O2 reductase. Similarly, Melainabacteria have diverse energy metabolisms and are capable of fermentation and aerobic or anaerobic respiration. The ancestor of all these groups may have been an anaerobe in which fermentation and H2 metabolism were central metabolic features. The ability to use O2 as a terminal electron acceptor must have been subsequently acquired by these lineages.
View details for DOI 10.1038/s41467-018-08246-y
View details for Web of Science ID 000456828500001
View details for PubMedID 30692531
View details for PubMedCentralID PMC6349859
The developing premature infant gut microbiome is a major factor shaping the microbiome of neonatal intensive care unit rooms
2018; 6: 112
The neonatal intensive care unit (NICU) contains a unique cohort of patients with underdeveloped immune systems and nascent microbiome communities. Patients often spend several months in the same room, and it has been previously shown that the gut microbiomes of these infants often resemble the microbes found in the NICU. Little is known, however, about the identity, persistence, and absolute abundance of NICU room-associated bacteria over long stretches of time. Here, we couple droplet digital PCR (ddPCR), 16S rRNA gene surveys, and recently published metagenomics data from infant gut samples to infer the extent to which the NICU microbiome is shaped by its room occupants. Over 2832 swabs, wipes, and air samples were collected from 16 private-style NICU rooms housing very low birth weight (< 1500 g), premature (< 31 weeks' gestation) infants. For each infant, room samples were collected daily, Monday through Friday, for 1 month. The first samples from the first infant and the last samples from the last infant were collected 383 days apart. Twenty-two NICU locations spanning room surfaces, hands, electronics, sink basins, and air were collected. Results point to an incredibly simple room community where 5-10 taxa, mostly skin-associated, account for over 50% of the amplicon reads. Biomass estimates reveal four to five orders of magnitude difference between the least and the most dense microbial communities, air and sink basins, respectively. Biomass trends from bioaerosol samples and petri dish dust collectors suggest occupancy to be a main driver of suspended biological particles within the NICU. Using a machine learning algorithm to classify the origin of room samples, we show that each room has a unique microbial fingerprint. Several important taxa driving this model were dominant gut colonizers of infants housed within each room. Despite regular cleaning of hospital surfaces, bacterial biomass was detectable at varying densities. 
A room-specific microbiome signature was detected, suggesting microbes seeding NICU surfaces are sourced from reservoirs within the room and that these reservoirs contain actively dividing cells. Collectively, the data suggests that hospitalized infants, in combination with their caregivers, shape the microbiome of NICU rooms.
View details for PubMedID 29925423
Hospitalized Premature Infants Are Colonized by Related Bacterial Strains with Distinct Proteomic Profiles
2018; 9 (2)
During the first weeks of life, microbial colonization of the gut impacts human immune system maturation and other developmental processes. In premature infants, aberrant colonization has been implicated in the onset of necrotizing enterocolitis (NEC), a life-threatening intestinal disease. To study the premature infant gut colonization process, genome-resolved metagenomics was conducted on 343 fecal samples collected during the first 3 months of life from 35 premature infants housed in a neonatal intensive care unit, 14 of whom developed NEC, and metaproteomic measurements were made on 87 samples. Microbial community composition and proteomic profiles remained relatively stable on the time scale of a week, but the proteome was more variable. Although genetically similar organisms colonized many infants, most infants were colonized by distinct strains with metabolic profiles that could be distinguished using metaproteomics. Microbiome composition correlated with infant, antibiotics administration, and NEC diagnosis. Communities were found to cluster into seven primary types, and community type switched within infants, sometimes multiple times. Interestingly, some communities sampled from the same infant at subsequent time points clustered with those of other infants. In some cases, switches preceded onset of NEC; however, no species or community type could account for NEC across the majority of infants. In addition to a correlation of protein abundances with organism replication rates, we found that organism proteomes correlated with overall community composition. Thus, this genome-resolved proteomics study demonstrated that the contributions of individual organisms to microbiome development depend on microbial community context. IMPORTANCE Humans are colonized by microbes at birth, a process that is important to health and development. However, much remains to be known about the fine-scale microbial dynamics that occur during the colonization period. 
We conducted a genome-resolved study of microbial community composition, replication rates, and proteomes during the first 3 months of life of both healthy and sick premature infants. Infants were found to be colonized by similar microbes, but each underwent a distinct colonization trajectory. Interestingly, related microbes colonizing different infants were found to have distinct proteomes, indicating that microbiome function is not only driven by which organisms are present, but also largely depends on microbial responses to the unique set of physiological conditions in the infant gut.
View details for DOI 10.1128/mBio.00441-18
View details for Web of Science ID 000431279600073
View details for PubMedID 29636439
View details for PubMedCentralID PMC5893878
Machine Learning Leveraging Genomes from Metagenomes Identifies Influential Antibiotic Resistance Genes in the Infant Gut Microbiome.
2018; 3 (1)
Antibiotic resistance in pathogens is extensively studied, and yet little is known about how antibiotic resistance genes of typical gut bacteria influence microbiome dynamics. Here, we leveraged genomes from metagenomes to investigate how genes of the premature infant gut resistome correspond to the ability of bacteria to survive under certain environmental and clinical conditions. We found that formula feeding impacts the resistome. Random forest models corroborated by statistical tests revealed that the gut resistome of formula-fed infants is enriched in class D beta-lactamase genes. Interestingly, Clostridium difficile strains harboring this gene are at higher abundance in formula-fed infants than C. difficile strains lacking this gene. Organisms with genes for major facilitator superfamily drug efflux pumps have higher replication rates under all conditions, even in the absence of antibiotic therapy. Using a machine learning approach, we identified genes that are predictive of an organism's direction of change in relative abundance after administration of vancomycin and cephalosporin antibiotics. The most accurate results were obtained by reducing annotated genomic data to five principal components classified by boosted decision trees. Among the genes involved in predicting whether an organism increased in relative abundance after treatment are those that encode subclass B2 beta-lactamases and transcriptional regulators of vancomycin resistance. This demonstrates that machine learning applied to genome-resolved metagenomics data can identify key genes for survival after antibiotics treatment and predict how organisms in the gut microbiome will respond to antibiotic administration. IMPORTANCE The process of reconstructing genomes from environmental sequence data (genome-resolved metagenomics) allows unique insight into microbial systems. 
We apply this technique to investigate how the antibiotic resistance genes of bacteria affect their ability to flourish in the gut under various conditions. Our analysis reveals that strain-level selection in formula-fed infants drives enrichment of beta-lactamase genes in the gut resistome. Using genomes from metagenomes, we built a machine learning model to predict how organisms in the gut microbial community respond to perturbation by antibiotics. This may eventually have clinical applications.
View details for DOI 10.1128/mSystems.00123-17
View details for PubMedID 29359195
View details for PubMedCentralID PMC5758725
Strain-resolved analysis of hospital rooms and infants reveals overlap between the human and room microbiome
2017; 8: 1814
Preterm infants exhibit different microbiome colonization patterns relative to full-term infants, and it is speculated that the hospital room environment may contribute to infant microbiome development. Here, we present a genome-resolved metagenomic study of microbial genotypes from the gastrointestinal tracts of infants and from the neonatal intensive care unit (NICU) room environment. Some strains detected in hospitalized infants also occur in sinks and on surfaces, and belong to species such as Staphylococcus epidermidis, Enterococcus faecalis, Pseudomonas aeruginosa, and Klebsiella pneumoniae, which are frequently implicated in nosocomial infection and preterm infant gut colonization. Of the 15 K. pneumoniae strains detected in the study, four were detected in both infant gut and room samples. Time series experiments showed that nearly all strains associated with infant gut colonization can be detected in the room after, and often before, detection in the gut. Thus, we conclude that a component of premature infant gut colonization is the cycle of microbial exchange between the room and the occupant.
View details for DOI 10.1038/s41467-017-02018-w
View details for Web of Science ID 000416293600030
View details for PubMedID 29180750
View details for PubMedCentralID PMC5703836
Measurement of bacterial replication rates in microbial communities
2016; 34 (12): 1256–63
Culture-independent microbiome studies have increased our understanding of the complexity and metabolic potential of microbial communities. However, to understand the contribution of individual microbiome members to community functions, it is important to determine which bacteria are actively replicating. We developed an algorithm, iRep, that uses draft-quality genome sequences and single time-point metagenome sequencing to infer microbial population replication rates. The algorithm calculates an index of replication (iRep) based on the sequencing coverage trend that results from bi-directional genome replication from a single origin of replication. We apply this method to show that microbial replication rates increase after antibiotic administration in human infants. We also show that uncultivated, groundwater-associated, Candidate Phyla Radiation bacteria only rarely replicate quickly in subsurface communities undergoing substantial changes in geochemistry. Our method can be applied to any genome-resolved microbiome study to track organism responses to varying conditions, identify actively growing populations and measure replication rates for use in modeling studies.
View details for DOI 10.1038/nbt.3704
View details for Web of Science ID 000390185300016
View details for PubMedID 27819664
View details for PubMedCentralID PMC5538567
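The coverage-trend idea in the iRep abstract above can be illustrated with a small sketch: when a population replicates bi-directionally from a single origin, regions near the origin are sequenced more deeply than regions near the terminus, and the ratio between the two reflects replication activity. The code below is a simplified illustration, not the published algorithm. Sorting the window coverages (so that contig order in a draft genome does not matter), the log-linear least-squares fit, and the simulated coverage profile are all assumptions made for this example.

```python
import math
import random


def replication_index(window_coverage):
    """Estimate an origin-to-terminus coverage ratio from unordered windows.

    Assumed approach: sort per-window coverage, fit a straight line to
    log2(coverage) versus normalized rank, and report the fitted ratio
    between the highest and lowest ends of the line.
    """
    cov = sorted(c for c in window_coverage if c > 0)
    n = len(cov)
    if n < 2:
        raise ValueError("need at least two windows with non-zero coverage")
    xs = [i / (n - 1) for i in range(n)]      # normalized rank, 0..1
    ys = [math.log2(c) for c in cov]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # The fitted line rises by `slope` (in log2 units) from rank 0 to rank 1.
    return 2 ** slope


def simulate_coverage(peak_to_trough, n_windows=100, base=20.0, seed=0):
    """Hypothetical coverage for a circular genome with one origin.

    Coverage falls log-linearly from the origin (position 0) to the terminus
    (position 0.5). Windows are shuffled to mimic a draft genome whose
    fragments are in unknown order.
    """
    cov = []
    for i in range(n_windows):
        position = i / n_windows
        distance = 2 * min(position, 1 - position)  # 0 at origin, 1 at terminus
        cov.append(base * peak_to_trough ** (1 - distance))
    random.Random(seed).shuffle(cov)
    return cov


for true_ratio in (1.0, 1.5, 2.0):
    estimate = replication_index(simulate_coverage(true_ratio))
    print(f"simulated ratio {true_ratio:.1f} -> estimated index {estimate:.2f}")
```

Because the windows are sorted before fitting, the estimate does not depend on knowing where each fragment sits in the genome, which is consistent with the abstract's point that draft-quality genomes and a single time point suffice. A value near 1 in this sketch corresponds to a population with no detectable coverage trend.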
Function, expression, specificity, diversity and incompatibility of actinobacteriophage parABS systems
2016; 101 (4): 625–44
More than 180 individual phages infecting hosts in the phylum Actinobacteria have been sequenced and grouped into Cluster A because of their similar overall nucleotide sequences and genome architectures. These Cluster A phages are either temperate or derivatives of temperate parents, and most have an integration cassette near the centre of the genome containing an integrase gene and attP. However, about 20% of the phages lack an integration cassette, which is replaced by a 1.4 kbp segment with predicted partitioning functions, including plasmid-like parA and parB genes. Phage RedRock forms stable lysogens in Mycobacterium smegmatis in which the prophage replicates at 2.4 copies/chromosome and the partitioning system confers prophage maintenance. The parAB genes are expressed upon RedRock infection of M. smegmatis, but are downregulated once lysogeny is established by binding of RedRock ParB to parS-L, one of two centromere-like sites flanking the parAB genes. The RedRock parS-L and parS-R sites are composed of eight directly repeated copies of an 8 bp motif that is recognized by ParB. The actinobacteriophage parABS cassettes span considerable sequence diversity and specificity, providing a suite of tools for use in mycobacterial genetics.
View details for DOI 10.1111/mmi.13414
View details for Web of Science ID 000382541900007
View details for PubMedID 27146086
View details for PubMedCentralID PMC4998052
Cluster M Mycobacteriophages Bongo, PegLeg, and Rey with Unusually Large Repertoires of tRNA Isotypes
JOURNAL OF VIROLOGY
2014; 88 (5): 2461–80
Genomic analysis of a large set of phages infecting the common host Mycobacterium smegmatis mc²155 shows that they span considerable genetic diversity. There are more than 20 distinct types that lack nucleotide similarity with each other, and there is considerable diversity within most of the groups. Three newly isolated temperate mycobacteriophages, Bongo, PegLeg, and Rey, constitute a new group (cluster M), with the closely related phages Bongo and PegLeg forming subcluster M1 and the more distantly related Rey forming subcluster M2. The cluster M mycobacteriophages have siphoviral morphologies with unusually long tails, are homoimmune, and have larger than average genomes (80.2 to 83.7 kbp). They exhibit a variety of features not previously described in other mycobacteriophages, including noncanonical genome architectures and several unusual sets of conserved repeated sequences suggesting novel regulatory systems for both transcription and translation. In addition to containing transfer-messenger RNA and RtcB-like RNA ligase genes, their genomes encode 21 to 24 tRNA genes encompassing complete or nearly complete sets of isotypes. We predict that these tRNAs are used in late lytic growth, likely compensating for the degradation or inadequacy of host tRNAs. They may represent a complete set of tRNAs necessary for late lytic growth, especially when taken together with the apparent lack of codons in the same late genes that correspond to tRNAs that the genomes of the phages do not obviously encode. The bacteriophage population is vast, dynamic, and old and plays a central role in bacterial pathogenicity. We know surprisingly little about the genetic diversity of the phage population, although metagenomic and phage genome sequencing indicates that it is great. Probing the depth of genetic diversity of phages of a common host, Mycobacterium smegmatis, provides a higher resolution of the phage population and how it has evolved. 
Three new phages constituting a new cluster M further expand the diversity of the mycobacteriophages and introduce novel features. As such, they provide insights into phage genome architecture, virion structure, and gene regulation at the transcriptional and translational levels.
View details for DOI 10.1128/JVI.03363-13
View details for Web of Science ID 000331131700010
View details for PubMedID 24335314
View details for PubMedCentralID PMC3958112