Dmitri Petrov

Michelle and Kevin Douglas Professor in the School of Humanities and Sciences

Biology

Academic Appointments

Professor, Biology
Member, Bio-X
Member, Maternal & Child Health Research Institute (MCHRI)
Member, Stanford Cancer Institute
Affiliate, Stanford Woods Institute for the Environment

Current Research and Scholarly Interests

Evolution of genomes and population genomics of adaptation and variation

2025-26 Courses

Independent Studies (10)
- Biomedical Informatics Teaching Methods
  BMDS 295 (Aut, Win, Spr)
- Directed Investigation
  BIOE 392 (Sum)
- Directed Reading
  BMDS 299 (Aut, Win, Spr)
- Directed Reading in Biology
  BIO 198 (Aut, Win, Spr, Sum)
- Graduate Research
  BIO 300 (Aut, Win, Spr, Sum)
- Graduate Research
  BIOPHYS 300 (Aut, Win, Spr, Sum)
- Medical Scholars Research
  BMDS 370 (Aut, Win, Spr)
- Out-of-Department Undergraduate Research
  BIO 199X (Aut, Win, Spr)
- Teaching Practicum in Biology
  BIO 290 (Aut, Win, Spr)
- Undergraduate Research
  BIO 199 (Aut, Win, Spr, Sum)

Prior Year Courses

2024-25 Courses
- Evolution
  BIO 85 (Win)
- Fundamentals of Molecular Evolution
  BIO 113, BIO 244 (Win)
2023-24 Courses
- Evolution
  BIO 85 (Win)
- Fundamentals of Molecular Evolution
  BIO 113, BIO 244 (Win)
2022-23 Courses
- Evolution
  BIO 85 (Win)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Tristram Dodge, Egor Lappo, Jay Yeam
Postdoctoral Faculty Sponsor
Stefan Bassler, Mark Bitter, Shreyas Gopalakrishnan, Alexandra Khristich, Jean Vila, Elisa Visher, Haiqing Xu
Doctoral Dissertation Advisor (AC)
Tatiana Bellagio, Sofia Beskid, Hannah Gellert, James Hemker, Nicholas Hoeffner, Jess Rhodes, Karen Shih, Alan Su, Sophie Walton
Doctoral Dissertation Co-Advisor (AC)
Olivia Ghosh, Victoria Grant, Christopher Kirby, Anastasia Lyulina, Shaili Mathur

Graduate and Fellowship Programs

Biology (School of Humanities and Sciences) (PhD Program)
Biomedical Data Science (PhD Program)

All Publications

Tumor suppressor genotype influences the extent and mode of immunosurveillance in lung cancer. Nature communications Adler, K. M., Xu, H., Gladstein, A. C., Irizarry-Negron, V. M., Robertson, M. R., Doerig, K. R., Petrov, D. A., Winslow, M. M., Feldser, D. M. 2026

Abstract

The impact of cancer driving mutations on immunosurveillance throughout tumor development remains poorly understood. To better understand the contribution of tumor genotype to immunosurveillance, we generated and validated lentiviral-based vectors that create increasingly immunogenic neoantigens. This vector system is compatible with autochthonous Cre-regulated cancer models, CRISPR/Cas9-mediated somatic genome editing, and tumor barcoding. Here, we show that in the context of oncogenic KRAS-driven lung cancer and strong neoantigen expression, tumor suppressor genotype dictates the degree of immune cell recruitment, positive selection of tumors with neoantigen silencing, and tumor outgrowth. By quantifying the impact of 11 commonly inactivated tumor suppressor genes on tumor growth across neoantigenic contexts, we show that the growth-promoting effects of tumor suppressor gene inactivation correlate with increasing sensitivity to immunosurveillance. Importantly, some genotypes also dramatically changed sensitivity to immunosurveillance independently of their growth-promoting effects. We propose a model of immunoediting in which tumor suppressor gene inactivation works in tandem with neoantigen expression to shape tumor immunosurveillance and immunoediting such that the same neoantigens uniquely modulate tumor immunoediting depending on the genetic context.

View details for DOI 10.1038/s41467-026-74023-x

View details for PubMedID 42297823
Genetic Diversity and Population Structure of the Black-Footed Cat: Insights into Felis's Deadliest Predator. bioRxiv : the preprint server for biology Grant, V. B., Hunnicutt, K., Schroeder, M., Küsters, M., Oppenheimer, J., Banerjee, S., Baczenas, J. J., Petrov, D., Bishop, J. M., Lamberski, N., Wilson, B., Sliwa, A., Shapiro, B., Solari, K. A., Aguillón, S. M., Armstrong, E. E., Schumer, M. 2026

Abstract

Black-footed cats (Felis nigripes) are one of Africa's least studied felines. The population dynamics and demographic history of this solitary species have not been well-described. Reports of ongoing decline of present-day populations resulted in the IUCN Red List categorizing the species as vulnerable to extinction. As populations decline and become isolated from each other, they become susceptible to strong genetic drift and inbreeding, which can lead to the accumulation of deleterious alleles and increased extinction risk. However, the IUCN cited data deficiencies across the species range as a limitation in this categorization for black-footed cats. In cases where ecological surveys are lacking, range-wide population genomic surveys can improve our understanding of population dynamics. In the first genomic study of free-roaming individuals, we sequenced whole genomes of black-footed cats (N=44) from across their distribution. To do so, we incorporated whole genome sequences generated from both modern biological samples and century-old museum specimens. We assembled a highly contiguous reference genome using a combination of PacBio HiFi data and publicly available Hi-C data and investigated the demographic history, population structure, and genetic diversity of wild black-footed cats. We found evidence of historical effective population sizes of ~11,500 individuals, which is lower than estimates reported in other felid species. Consistent with modest historical population sizes, we found that present-day genome-wide diversity was low (π ≈ 0.0004). However, despite low genetic diversity, we find that black-footed cat genomes do not harbor long runs of homozygosity. Simulation results indicate that low present-day genetic diversity may simply result from modest historical population size. However, other analyses point to evidence of a population contraction in the last 50 generations, which could contribute to future genomic erosion. We also compared genomic variation in populations across the range to evaluate patterns of population structure, finding evidence of higher genetic similarity between individuals in closer geographic proximity. Overall, these results provide range-wide information about the demographic history and present-day genetic diversity of an understudied species. Together with analyses of population structure, we speculate that there may be greater connectivity between populations of black-footed cats than previously assumed. Our study underscores the utility of genomic data in providing insights into population dynamics for better conservation management.

View details for DOI 10.64898/2026.05.29.728895

View details for PubMedID 42282534

View details for PubMedCentralID PMC13251930
Tumor suppressor gene inactivation shapes the landscape of EGFR-mutant lung adenocarcinoma progression with therapeutic implications Carmo, M., Martin, M., Rosen, M., Blair, L., Tribe, A., Maemura, K., Foggetti, G., Exposito, F., Ugur, Z., Sebastian, L., Tran, V., Lai, I., Katti, A., Winters, I., Petrov, D. A., Floc'h, N., Winslow, M. M., Politi, K. A. AMER ASSOC CANCER RESEARCH. 2026

View details for DOI 10.1158/1538-7445.AM2026-6058

View details for Web of Science ID 001734104900036
Inactivation of CDKN2A ARF Promotes p53-Independent Remodeling of the PDAC Tumor Microenvironment. Cancer research Ferreira, S., Flowers, B. M., Choi, W. Y., Farina-Morillas, M., Gatto, A., Bhattacharyya, S., Boross, G., Hassan, G., Mulligan, A. S., Vogel, H., Wood, L. D., Weaver, V. M., Winslow, M. M., Petrov, D. A., Sherman, M. H., Choi, H. Y., Hayes, D. N., Aguirre, A. J., Seoane, J. A., Attardi, L. D. 2026

Abstract

The CDKN2A locus, which is frequently deleted in pancreatic ductal adenocarcinoma (PDAC), encodes two tumor suppressors, ARF and INK4A, that may influence tumorigenesis through distinct mechanisms. Distinguishing their individual contributions to cancer could help improve the understanding of PDAC pathogenesis and potentially uncover targetable vulnerabilities. Moreover, while ARF is known to enhance p53 function, defining its p53-independent activities could elucidate processes that drive PDAC development. Here, we sought to understand ARF function in PDAC suppression. Expression and mutational patterns in human TCGA data indicated that CDKN2A ARF and CDKN2A INK4A are commonly both affected by point mutations and/or deletions, suggesting that their combined inactivation contributes to PDAC development. In genetically engineered mouse models (GEMMs), Arf inactivation accelerated KRAS G12D-driven PDAC development, both in the presence and absence of Trp53, demonstrating that ARF is a PDAC suppressor and can act in a p53-independent manner. Transcriptomic analyses of PDACs supported a p53-independent role for ARF, with ARF deficiency promoting extracellular matrix, collagen synthesis/assembly, and epithelial-mesenchymal transition gene expression programs. Accordingly, ARF-deficient PDACs displayed extensive remodeling of the tumor microenvironment (TME), associated with collagen deposition, increased tissue stiffness, and higher fibroblast content - hallmarks of aggressive and treatment-resistant PDAC stroma. Together, this study shows how ARF deficiency associated with CDKN2A inactivation sculpts the PDAC TME in a p53-independent fashion. Given the central role of the TME in PDAC progression and therapeutic resistance, these findings may provide insight critical for improving therapeutic interventions for PDAC.

View details for DOI 10.1158/0008-5472.CAN-25-1969

View details for PubMedID 41811433
Manual validation finds ultra-long-read sequencing best enables faithful, population-level structural variant calling in Drosophila melanogaster euchromatin with nanopore. G3 (Bethesda, Md.) Hemker, J. A., Gellert, H. R., Smiley-Rhodes, J. A., Kim, B. Y., Petrov, D. A. 2026

Abstract

The increasing accessibility of long-read sequencing and the rapid development of automated variant callers are promoting the generation of population-level structural variation data. However, the effect of the length of long-reads on automated variant callers is not well understood, especially for non-human species. Here we show that only ultra-long long-reads, with read N50s greater than 50 kb, are capable of accurately calling structural variants of any size in Drosophila melanogaster euchromatin. We used Oxford Nanopore Technologies to long-read sequence eight, inbred D. melanogaster strains to extremely high coverage (mean 238×), and we then downsampled the reads to create read pools of different length distributions. We assembled genomes from these different read-length pools and used both read-based and assembly-based structural variant callers to call variants in each strain before merging the calls into population-level datasets. We manually validated over 2,300 putative structural variants to assess the precision of the variant calls across the different read-length distributions and to determine the cause and rates of false positive errors. We found that more than half of all structural-variant-calling errors stem from misaligned reads that contain mobile elements or are located in repetitive and complex regions. Overall, our results show that long reads should be at least three times longer than the largest transposable elements found in the genome in order to accurately call structural variants at the population level.

View details for DOI 10.1093/g3journal/jkag043

View details for PubMedID 41806374
Genotype-fitness mapping of adaptive mutants reveals shifting low-dimensional structure across divergent environments. PLoS biology Ghosh, O. M., Kinsler, G., Good, B. H., Petrov, D. A. 2026; 24 (3): e3003618

Abstract

A central goal in evolutionary biology is to predict the effect of a genetic mutation on fitness. This is a major challenge because it requires knowledge of both the phenotypic effects of a mutation and their importance in an arbitrary environment, which are high-dimensional quantities and difficult to guess a priori. Here, we address this problem by taking a top-down, data-driven approach to infer the mapping between genotypes, latent phenotypes, and fitness. We measure the fitness effects of a large collection of adaptive yeast mutants in many lab environments, from which we build low-dimensional, linear fitness landscapes. We find that these models are highly predictive of fitness variation for thousands of adaptive mutants, both in environments similar to where they evolved and also in divergent environments. This implies that the underlying genotype-phenotype-fitness maps for these adaptive mutants tend to be broadly low-dimensional. We further demonstrate that these maps only partially overlap across divergent environments, suggesting that the phenotypic determinants of fitness shift with the environment but remain low-dimensional. These results combine to emphasize the importance of environmental context in evolution, and suggest that top-down, low-dimensional fitness landscapes pave the way for evolutionary prediction.

View details for DOI 10.1371/journal.pbio.3003618

View details for PubMedID 41886417
Integrating noninvasive genetics and SECR to estimate snow leopard population in Pakistan BIOLOGICAL CONSERVATION Ahmad, S., Solari, K. A., Durbach, I., Ali, H., Hameed, S., Din, J., Asif, M., Petrov, D. A., Nawaz, M. 2026; 315

View details for DOI 10.1016/j.biocon.2026.111709

View details for Web of Science ID 001678837800001
EMBO Press co-evolves with molecular ecology and evolutionary biology. The EMBO journal Moran, Y., Coelho, S. M., Ettema, T. J., Feschotte, C., Kaltenpoth, M., Khila, A., Laine, A. L., Liow, L. H., Petrov, D., Ramakrishnan, U., Sarkies, P., Srivastava, M., Voolstra, C., Pulverer, B. 2026

View details for DOI 10.1038/s44318-026-00723-1

View details for PubMedID 41708875

View details for PubMedCentralID 3196472
Comparative gene annotation and orthology assignments across 301 species of Drosophilidae. PLoS biology Dhakad, P., Kim, B. Y., Petrov, D. A., Obbard, D. J. 2026; 24 (2): e3003663

Abstract

High-quality genome annotations are essential if we are to address central questions in comparative genomics, such as the origin of new genes, the drivers of genome size variation, and the evolutionary forces shaping gene content and structure. Here, we present protein-coding gene annotations for 301 species of the family Drosophilidae, generated using the Comparative Annotation Toolkit (CAT) and BRAKER3, and incorporating available RNA-seq and protein evidence. We take a comparative phylogenetic approach to annotation, with the aim of improving consistency and accuracy, and to generate a robust set of gene annotations and orthology assignments. We analyze our annotations using a phylogenetic mixed-model approach and find that gene number and CDS length exhibit moderate phylogenetic heritability (40% and 9.7%, respectively). For comparison, we also present analyses using a subset of the 215 highest quality genomes, although the findings were not markedly different. Our work suggests that while evolutionary history contributes to variation in these traits, species-specific factors-including assembly error-play a substantial role in shaping observed differences. To illustrate the utility of our annotations for comparative analyses, we investigate codon usage bias and amino acid composition across Drosophilidae. We find that codon usage is correlated with overall GC content and evolves slowly, but that it is also strongly shaped by selection-such that, in general, species with the strongest selection on synonymous codon usage show the lowest GC bias in third codon positions. This comparative annotation dataset forms part of an ongoing collaborative project to sequence and annotate all species of Drosophilidae, with data and annotations being made rapidly and freely available on an ongoing basis. We hope that this effort will serve as a foundation for studies in evolutionary and functional genomics and comparative biology across Drosophilidae.

View details for DOI 10.1371/journal.pbio.3003663

View details for PubMedID 41706752
Comparative gene annotation of 301 species of Drosophilidae. bioRxiv : the preprint server for biology Dhakad, P., Kim, B., Petrov, D., Obbard, D. J. 2026

Abstract

High-quality genome annotations are essential if we are to address central questions in comparative genomics, such as the origin of new genes, the drivers of genome size variation, and the evolutionary forces shaping gene content and structure. Here, we present protein-coding gene annotations for 301 species of the family Drosophilidae, generated using the Comparative Annotation Toolkit (CAT) and BRAKER3, and incorporating available RNA-seq and protein evidence. We take a comparative phylogenetic approach to annotation, with the aim of improving consistency and accuracy, and to generate a robust set of gene annotations and orthology assignments. We analyze our annotations using a phylogenetic mixed-model approach and find that gene number and CDS length exhibit moderate phylogenetic heritability (40% and 9.7%, respectively). For comparison, we also present analyses using a subset of the 215 highest quality genomes, although the findings were not markedly different. Our work suggests that while evolutionary history contributes to variation in these traits, species-specific factors-including assembly error-play a substantial role in shaping observed differences. To illustrate the utility of our annotations for comparative analyses, we investigate codon usage bias and amino acid composition across Drosophilidae. We find that codon usage is correlated with overall GC content and evolves slowly, but that it is also strongly shaped by selection-such that, in general, species with the strongest selection on synonymous codon usage show the lowest GC bias in third codon positions. This comparative annotation dataset forms part of an on-going collaborative project to sequence and annotate all species of Drosophilidae, with data and annotations being made rapidly and freely available on an on-going basis. We hope that this effort will serve as a foundation for studies in evolutionary and functional genomics and comparative biology across Drosophilidae.

View details for DOI 10.1101/2025.04.14.648771

View details for PubMedID 41542448

View details for PubMedCentralID PMC12803070
Variation in the resource environment affects patterns of seasonal adaptation at phenotypic and genomic levels in Drosophila melanogaster. Evolution letters Beltz, J. K., Bitter, M. C., Goldfischer, A., Petrov, D. A., Schmidt, P. 2025; 9 (6): 663-674

Abstract

Natural populations often experience heterogeneity in the quality and abundance of environmentally acquired resources across both space and time, and this variation can influence population demographics and evolutionary dynamics. In this study, we directly manipulated diet in replicate populations of Drosophila melanogaster cultured in experimental mesocosms in the field. We found no significant effect of resource variation on estimates of adult census size. Resource variation altered patterns of phenotypic and genomic evolution across replicate populations; however, we find that this effect is secondary to selection driven by the fluctuating seasonal environment. Seasonal adaptation was observed for all traits assayed and elicited genome-wide signatures of selection. In contrast, adaptation to the resource environment was trait-specific and exhibited an oligogenic architecture. This illustrates the capacity of populations to adapt to a specific axis of variation (the resource environment) without hindering the adaptive response to seasonal change. This, in turn, suggests that resource variation may be an important force driving fluctuating selection across natural populations, ultimately contributing to the maintenance of genetic and phenotypic variation.

View details for DOI 10.1093/evlett/qraf031

View details for PubMedID 41357151

View details for PubMedCentralID PMC12676458
Community coalescence reveals strong selection and coexistence within species in complex microbial communities. bioRxiv : the preprint server for biology Walton, S. J., Xu, Q., Sharma, R., Gellert, H. R., Yeh, C. F., Cremer, J., Xue, K. S., Petrov, D. A., Good, B. H. 2025

Abstract

Complex microbial ecosystems harbor extensive intra-species diversity, but the fitness consequences of this genetic variation are poorly understood in community settings. Here we address this question by competing in vitro gut communities derived from different human donors, revealing the emergent fitness differences between conspecific strains as they competed within larger communities. Most pairs of strains experienced strong and context-dependent selection, even when their parent communities were originally selected in the same nutrient environment. However, these fitness differences typically attenuated over time due to biotic interactions within the community, leading to extended coexistence within many species, and competitive exclusion in others. These results support the view that conspecific strains can fulfill distinct ecological roles when competing within a diverse community, even when their genomic diversity exhibits the hallmarks of a single biological species.

View details for DOI 10.1101/2025.11.06.687011

View details for PubMedID 41278818

View details for PubMedCentralID PMC12637602
Aging represses oncogenic KRAS-driven lung tumorigenesis and alters tumor suppression. Nature aging Shuldiner, E. G., Karmakar, S., Tsai, M. K., Hebert, J. D., Tang, Y. J., Andrejka, L., Robertson, M. R., Wang, M., Detrick, C. R., Cai, H., Tang, R., Kunder, C. A., Feldser, D. M., Petrov, D. A., Winslow, M. M. 2025

Abstract

Most cancers are diagnosed in people over 60 years of age, but little is known about how age impacts tumorigenesis. While aging is accompanied by mutation accumulation (widely understood to contribute to cancer risk) it is associated with numerous other cellular and molecular changes likely to impact tumorigenesis. Moreover, cancer incidence decreases in the oldest part of the population, suggesting that very old age may reduce carcinogenesis. Here we show that aging represses oncogenic KRAS-driven tumor initiation and growth in genetically engineered mouse models of human lung cancer. Moreover, aging dampens the impact of inactivating many tumor suppressor genes with the impact of inactivating PTEN, a negative regulator of the PI3K-AKT pathway, weakened disproportionately. Single-cell transcriptomic analysis revealed that neoplastic cells in aged mice retain age-related transcriptomic changes, showing that the impact of age persists through oncogenic transformation. Furthermore, the consequences of PTEN inactivation were strikingly age-dependent, with PTEN deficiency reducing signatures of aging in cancer cells and the tumor microenvironment. Our findings underscore the interconnectedness of the pathways involved in aging and tumorigenesis and document tumor-suppressive effects of aging that may contribute to the deceleration in cancer incidence with age.

View details for DOI 10.1038/s43587-025-00986-z

View details for PubMedID 41188600

View details for PubMedCentralID 8003441
Exceedingly low genetic diversity in snow leopards due to persistently small population size. Proceedings of the National Academy of Sciences of the United States of America Solari, K. A., Morgan, S., Poyarkov, A. D., Weckworth, B., Samelius, G., Sharma, K., Ostrowski, S., Ramakrishnan, U., Kubanychbekov, Z., Kachel, S., Johansson, Ö., Lkhagvajav, P., Hemmingmoore, H., Alexandrov, D. Y., Bayaraa, M., Grachev, A., Korablev, M. P., Hernandez-Blanco, J. A., Munkhtsog, B., Rosenbaum, B., Rozhnov, V. V., Madad Rajabi, A., Noori, H., Suryawanshi, K. R., Armstrong, E. E., Petrov, D. A. 2025; 122 (41): e2502584122

Abstract

Snow leopards (Panthera uncia) serve as an umbrella species whose conservation benefits their high-elevation Asian habitat. Their numbers are believed to be in decline due to numerous anthropogenic threats; however, their conservation is hindered by numerous knowledge gaps. In particular, the dearth of genetic data, unique among all big cat species, hinders a full understanding of their population structure, historical population size, and current levels of genetic diversity. Here, we use whole-genome sequencing data for 41 snow leopards (37 newly sequenced) to offer insights into these unresolved aspects of snow leopard biology. Among our samples, we find evidence of a primary genetic divide between the northern and southern part of the range around the Dzungarian Basin-as previously identified using landscape models and fecal microsatellite markers-and a secondary divide south of Kyrgyzstan around the Taklamakan Desert. Most noteworthy, we find that snow leopards have the lowest genetic diversity of any big cat species, likely due to a persistently small population size throughout their evolutionary history rather than recent inbreeding. We also find that snow leopards have significantly less highly deleterious homozygous load compared to numerous Panthera species, suggesting effective purging during their evolutionary history at small population sizes. Without a large population size or ample standing genetic variation to help buffer them from any forthcoming anthropogenic challenges, snow leopard persistence may be more tenuous than currently appreciated.

View details for DOI 10.1073/pnas.2502584122

View details for PubMedID 41055990
An empirical long-term competition among natural yeast isolates reveals that short-term fitness largely but not entirely predicts long-term outcomes. bioRxiv : the preprint server for biology Khristich, A. N., Ghosh, O. M., Vila, J. C., Mathur, S., Dutta, A., Garin, M., Schacherer, J., Petrov, D. A. 2025

Abstract

In this study, we investigate the relative contribution of initial fitness to the long-term success of a genotype competing in a naturally diverse population. Specifically, we compete over 300 genetically barcoded S. cerevisiae isolates in a pooled setting for over 700 generations. We found that the strains that remain at detectable frequency until the end of the competition uniformly come from the top 95th percentile in the initial fitness values, making initial fitness the most significant predictor of long-term success. However, we occasionally see heterogeneity in the competition outcomes, which suggests a role of stochastic adaptation, clonal interference, and possibly frequency-dependent changes in strains' fitness. We demonstrate that the "finalists" of our competition change on the genetic level, and that the spectrum of de novo mutations depends both on the strains' genotype and environment. Finally, we show that gene targets of the novel mutations are specific to the combination of strain identity and environment, even among the genetically similar strains and environments that select for the same strains in the beginning of the competition.

View details for DOI 10.1101/2025.10.09.681448

View details for PubMedID 41279677

View details for PubMedCentralID PMC12632622
Inactivation of CDKN2A ARF promotes p53-independent remodeling of the PDAC tumor microenvironment Ferreira, S., Flowers, B. M., Choi, W., Farina-Morillas, M., Gatto, A., Bhattacharyya, S., Boross, G., Hassan, G., Mulligan, A. S., Vogel, H., Wood, L. D., Weaver, V. M., Winslow, M. M., Petrov, D., Sherman, M. H., Choi, H., Hayes, D., Aguirre, A. J., Seoane, J. A., Attardi, L. D. AMER ASSOC CANCER RESEARCH. 2025

View details for DOI 10.1158/1538-7445.PANCREATIC25-A030

View details for Web of Science ID 001588168500030
Functional mapping of epigenomic regulators uncovers coordinated tumor suppression by the HBO1 and MLL1 complexes. Cancer discovery Tang, Y. J., Xu, H., Hughes, N. W., Ruiz, P., Kim, S. H., Shuldiner, E. G., Lopez, S. S., Hebert, J. D., Karmakar, S., Andrejka, L., Dolcen, D. N., Boross, G., Chu, P., Kunder, C. A., Detrick, C., Pierce, S. E., Ashkin, E. L., Greenleaf, W. J., Voss, A. K., Thomas, T., van de Rijn, M., Petrov, D. A., Winslow, M. M. 2025

Abstract

Epigenomic dysregulation is widespread in cancer. However, the specific epigenomic regulators and the processes they control to drive cancer phenotypes are poorly understood. We employed a novel high-throughput in vivo method to perform iterative functional screens of >250 epigenomic regulators within autochthonous oncogenic KRAS-driven lung tumors. We identified many previously unappreciated epigenomic tumor-suppressor and tumor-dependency genes. We show that a specific HBO1 complex and MLL1 complex are robust tumor suppressors in lung adenocarcinoma. Histone modifications generated by HBO1 complex are frequently reduced in human lung adenocarcinomas and are associated with worse clinical features. HBO1 and MLL1 complexes co-occupy shared genomic regions, impact chromatin accessibility, and control the expression of canonical tumor suppressor genes and lineage fidelity. The HBO1 complex is epistatic with the MLL1 complex and other tumor suppressor genes in lung adenocarcinoma development. Collectively, these results provide a phenotypic roadmap of epigenomic regulators in lung tumorigenesis in vivo.

View details for DOI 10.1158/2159-8290.CD-24-1565

View details for PubMedID 40997327
EML4-ALK variant-specific genetic interactions shape lung tumorigenesis. Cancer discovery Diaz-Jimenez, A., Shuldiner, E. G., Somogyi, K., Shih, K., Gonzalez-Velasco, O., Najajreh, M., Kim, S., Akkas, F., Murray, C. W., Andrejka, L., Tsai, M. K., Brors, B., Hofmann, I., Sivakumar, S., Sisoudiya, S. D., Sokol, E. S., Cai, H., Petrov, D. A., Winslow, M. M., Sotillo, R. 2025

Abstract

Diverse fusions of EML4 and ALK are oncogenic drivers in lung adenocarcinomas. EML4-ALK variants have distinct breakpoints within EML4, but their functional differences remain poorly understood. Here, we use somatic genome editing to generate autochthonous mouse models of EML4-ALK-driven lung tumors and show that V3 is more oncogenic than V1. By employing multiplexed genome editing and quantifying the effects of 29 putative tumor suppressor genes on V1- and V3-driven lung cancer growth, we show that many tumor suppressor genes have variant-specific effects on tumorigenesis. Pharmacogenomic analyses further suggest that tumor genotype can influence therapeutic responses. Analysis of human EML4-ALK-positive lung cancers also identified variant-specific differences in their genomic landscapes. These findings suggest that EML4-ALK variants behave more like distinct oncogenes rather than a uniform entity and highlight the dramatic impact of oncogenic fusion partner proteins and coincident tumor suppressor gene alterations on the biology of oncogenic fusion-driven cancers.

View details for DOI 10.1158/2159-8290.CD-24-1417

View details for PubMedID 40986428
Variation in the resource environment affects patterns of seasonal adaptation at phenotypic and genomic levels in Drosophila melanogaster EVOLUTION LETTERS Beltz, J. K., Bitter, M., Goldfischer, A., Petrov, D. A., Schmidt, P. 2025

View details for DOI 10.1093/evlett/qraf031

View details for Web of Science ID 001575516500001
Beneficial reversal of dominance maintains a large-effect resistance polymorphism under fluctuating insecticide selection. Nature ecology & evolution Karageorgi, M., Lyulina, A. S., Bitter, M. C., Lappo, E., Greenblum, S. I., Mouza, Z. K., Tran, C. T., Huynh, A. V., Oken, H., Schmidt, P., Petrov, D. A. 2025

Abstract

Large-effect standing genetic variation is commonly found in natural populations and must be maintained in the face of directional natural selection. Theory suggests that under fluctuating selective pressures, beneficial reversal of dominance-where alleles are dominant when beneficial and recessive when deleterious-can strongly stabilize large-effect polymorphisms. However, empirical evidence for this mechanism remains limited because testing requires measurements of selection and dominance in fitness in natural conditions. Here we investigate large-effect fitness polymorphisms at the Ace locus of Drosophila melanogaster that confer insecticide resistance and persist at intermediate frequencies worldwide. By combining laboratory and large-scale field mesocosm experiments with insecticide manipulation and mathematical modelling, we show that the benefits of the resistant Ace alleles are dominant in pesticide-rich environments, while their fitness costs are recessive in pesticide-free environments. We further show that temporally fluctuating insecticide selection generates chromosome-scale genomic perturbations at sites linked to the resistant Ace alleles. Overall, our results suggest that beneficial reversal of dominance under temporally fluctuating selection might plausibly contribute to the maintenance of functional genetic variation and, by stabilizing large frequency fluctuations, impact long-range patterns of genomic variation.

View details for DOI 10.1038/s41559-025-02853-x

View details for PubMedID 40954284

View details for PubMedCentralID 4222749
Drosophila melanogaster pigmentation demonstrates adaptive phenotypic parallelism over multiple spatiotemporal scales. Evolution letters Berardi, S., Rhodes, J. A., Berner, M. C., Greenblum, S. I., Bitter, M. C., Behrman, E. L., Betancourt, N. J., Bergland, A. O., Petrov, D. A., Rajpurohit, S., Schmidt, P. 2025; 9 (4): 408-420

Abstract

Populations are capable of responding to environmental change over ecological timescales via adaptive tracking. However, the translation from patterns of allele frequency change to rapid adaptation of complex traits remains unresolved. We used abdominal pigmentation in Drosophila melanogaster as a model phenotype to address the nature, genetic architecture, and repeatability of rapid adaptation in the field. We show that D. melanogaster pigmentation evolves as a highly parallel and deterministic response to shared environmental variation across latitude and season in natural North American populations. We then experimentally evolved replicate, genetically diverse fly populations in field mesocosms to remove any confounding effects of demography and/or cryptic structure that may drive patterns in wild populations; we show that pigmentation rapidly responds, in parallel, in fewer than 15 generations. Thus, pigmentation evolves concordantly in response to spatial and temporal climatic axes. We next examined whether phenotypic differentiation was associated with allele frequency change at loci with established links to genetic variance in pigmentation in natural populations. We found that across all spatial and temporal scales, phenotypic patterns were associated with variation at pigmentation-related loci, and the sets of genes we identified at each scale were largely nonoverlapping. Therefore, our findings suggest that parallel phenotypic evolution is associated with distinct components of the polygenic architecture shifting across each environmental axis to produce redundant adaptive patterns.

View details for DOI 10.1093/evlett/qraf008

View details for PubMedID 40980706

View details for PubMedCentralID PMC12448190
Footprints of Worldwide Adaptation in Structured Populations of Drosophila melanogaster Through the Expanded DEST 2.0 Genomic Resource. Molecular biology and evolution Nunez, J. C., Coronado-Zamora, M., Gautier, M., Kapun, M., Steindl, S., Ometto, L., Hoedjes, K., Beets, J., Wiberg, R. A., Mazzeo, G. R., Bass, D. J., Radionov, D., Kozeretska, I., Zinchenko, M., Protsenko, O., Serga, S. V., Amor-Jimenez, C., Casillas, S., Sánchez-Gracia, A., Patenkovic, A., Glaser-Schmitt, A., Barbadilla, A., Buendia-Ruíz, A. J., Bertelli, A. C., Kiss, B., Önder, B. S., Matrín, B. R., Wertheim, B., Deschamps, C., Arboleda-Bustos, C. E., Tinedo, C., Feller, C., Schlötterer, C., Lawler, C., Fricke, C., Vieira, C. P., Vieira, C., Obbard, D. J., Orengo, D. J., Vela, D., Amat, E., Loreto, E., Kerdaffrec, E., Mitchell, E. D., Puerma, E., Staubach, F., Camus, M. F., Colinet, H., Hrcek, J., Sørensen, J. G., Abbott, J., Torro, J., Parsch, J., Vieira, J., Olmo, J. L., Khfif, K., Wojciechowski, K., Madi-Ravazzi, L., Kankare, M., Schou, M. F., Ladoukakis, E. D., Gómez-Julián, M. J., Espinosa-Jimenez, M. L., Garcia Guerreiro, M. P., Parakatselaki, M. E., Savic Veselinovic, M., Tanaskovic, M., Stamenkovic-Radak, M., Paris, M., Pascual, M., Ritchie, M. G., Rera, M., Jelić, M., Ansari, M. H., Rakic, M., Merenciano, M., Hernandes, N., Gora, N., Rode, N., Rota-Stabelli, O., Sepulveda, P., Gibert, P., Carazo, P., Kohlmeier, P., Erickson, P. A., Vitalis, R., Torres, J. R., Guirao-Rico, S., Ramos-Onsins, S. E., Castillo, S., Paulo, T. F., Tyukmaeva, V., Alonso, Z., Alatortsev, V. E., Pasyukova, E., Mukha, D. V., Petrov, D. A., Schmidt, P., Flatt, T., Bergland, A. O., Gonzalez, J. 2025; 42 (8)

Abstract

Large-scale genomic resources can place genetic variation into an ecologically informed context. To advance our understanding of the population genetics of the fruit fly Drosophila melanogaster, we present an expanded release of the community-generated population genomics resource Drosophila Evolution over Space and Time (DEST 2.0; https://dest.bio/). This release includes 530 high-quality pooled libraries from flies collected across six continents over more than a decade (2009 to 2021), most at multiple time points per year; 211 of these libraries are sequenced and shared here for the first time. We used this enhanced resource to elucidate several aspects of the species' demographic history and identify novel signs of adaptation across spatial and temporal dimensions. For example, we showed that the spatial genetic structure of populations is stable over time, but that drift due to seasonal contractions of population size causes populations to diverge over time. We identified signals of adaptation that vary between continents in genomic regions associated with xenobiotic resistance, consistent with independent adaptation to common pesticides. Moreover, by analyzing samples collected during spring and fall across Europe, we provide new evidence for seasonal adaptation related to loci associated with pathogen response. Furthermore, we have also released an updated version of the DEST genome browser. This is a useful tool for studying spatiotemporal patterns of genetic variation in this classic model system.

View details for DOI 10.1093/molbev/msaf132

View details for PubMedID 40824865

View details for PubMedCentralID PMC12360290
RIT1 Drives Oncogenic Transformation and is an Actionable Target in Lung Adenocarcinoma. Cancer research Mozzarelli, A. M., Cuevas-Navarro, A., Shuldiner, E. G., Vega, M., Chatila, W. K., Xu, J., Walch, H. S., Niu, Y., Petrov, D. A., Schultz, N., Urisman, A., Rudin, C. M., Winslow, M. M., Castel, P. 2025

Abstract

RIT1 is a small GTPase of the RAS family, and RIT1 mutations have been identified in lung cancer, leukemias, and the developmental disorder Noonan syndrome. Mutations in RIT1 lead to increased protein levels due to impaired proteolysis, resulting in dysregulation of RAS/MAPK signaling and other pathways. Here, we documented the diversity of RIT1 mutations in human lung cancer and showed that physiological expression of RIT1 M90I is sufficient to drive autochthonous lung tumor development in vivo in mouse models. Evaluation of complementary methods to either inhibit RIT1 directly or the downstream RAS/MAPK pathway revealed that RIT1 M90I tumors are sensitive to SHP2 inhibitors and RAS nucleotide exchange inhibition. Additionally, a proof-of-concept chemical biology approach identified that RAS tri-complex inhibitors bind directly to GTP-bound RIT1, resulting in tumor shrinkage. These molecules provide a feasible therapeutic approach for RIT1-driven lung tumors.

View details for DOI 10.1158/0008-5472.CAN-24-3819

View details for PubMedID 40644578
Efficient and multiplexed somatic genome editing with Cas12a mice. Nature biomedical engineering Hebert, J. D., Xu, H., Tang, Y. J., Ruiz, P. A., Detrick, C. R., Wang, J., Hughes, N. W., Donosa, O., Siah, V. P., Andrejka, L., Karmakar, S., Aboiralor, I., Tang, R., Sotillo, R., Sage, J., Cong, L., Petrov, D. A., Winslow, M. M. 2025

Abstract

Somatic genome editing in mouse models has increased our understanding of the in vivo effects of genetic alterations. However, existing models have a limited ability to create multiple targeted edits, hindering our understanding of complex genetic interactions. Here we generate transgenic mice with Cre-regulated and constitutive expression of enhanced Acidaminococcus sp. Cas12a (enAsCas12a), which robustly generates compound genotypes, including diverse cancers driven by inactivation of trios of tumour suppressor genes or an oncogenic translocation. We integrate these modular CRISPR RNA (crRNA) arrays with clonal barcoding to quantify the size and number of tumours with each array, as well as the impact of varying the guide number and position within a four-guide array. Finally, we generate tumours with inactivation of all combinations of nine tumour suppressor genes and find that the fitness of triple-knockout genotypes is largely explainable by one- and two-gene effects. These Cas12a alleles will enable further rapid creation of disease models and high-throughput investigation of coincident genomic alterations in vivo.

View details for DOI 10.1038/s41551-025-01407-7

View details for PubMedID 40447760

View details for PubMedCentralID 4530801
Integrative multiomic approaches reveal ZMAT3 and p21 as conserved hubs in the p53 tumor suppression network. Cell death and differentiation Boutelle, A. M., Mabene, A. R., Yao, D., Xu, H., Wang, M., Tang, Y. J., Lopez, S. S., Sinha, S., Demeter, J., Cheng, R., Benard, B. A., McCrea, E. M., Valente, L. J., Drainas, A. P., Fischer, M., Majeti, R., Petrov, D. A., Jackson, P. K., Yang, F., Winslow, M. M., Bassik, M. C., Attardi, L. D. 2025

Abstract

TP53, the most frequently mutated gene in human cancer, encodes a transcriptional activator that induces myriad downstream target genes. Despite the importance of p53 in tumor suppression, the specific p53 target genes important for tumor suppression remain unclear. Recent studies have identified the p53-inducible gene Zmat3 as a critical effector of tumor suppression, but many questions remain regarding its p53-dependence, activity across contexts, and mechanism of tumor suppression alone and in cooperation with other p53-inducible genes. To address these questions, we used Tuba-seqUltra somatic genome editing and tumor barcoding in a mouse lung adenocarcinoma model, combinatorial in vivo CRISPR/Cas9 screens, meta-analyses of gene expression and Cancer Dependency Map data, and integrative RNA-sequencing and shotgun proteomic analyses. We established Zmat3 as a core component of p53-mediated tumor suppression and identified Cdkn1a as the most potent cooperating p53-induced gene in tumor suppression. We discovered that ZMAT3/CDKN1A serve as near-universal effectors of p53-mediated tumor suppression that regulate cell division, migration, and extracellular matrix organization. Accordingly, combined Zmat3-Cdkn1a inactivation dramatically enhanced cell proliferation and migration compared to controls, akin to p53 inactivation. Together, our findings place ZMAT3 and CDKN1A as hubs of a p53-induced gene program that opposes tumorigenesis across various cellular and genetic contexts.

View details for DOI 10.1038/s41418-025-01513-8

View details for PubMedID 40263541

View details for PubMedCentralID 3927368
Low-dimensional genotype-fitness mapping across divergent environments suggests a limiting functions model of fitness. bioRxiv : the preprint server for biology Ghosh, O. M., Kinsler, G., Good, B. H., Petrov, D. A. 2025

Abstract

A central goal in evolutionary biology is to be able to predict the effect of a genetic mutation on fitness. This is a major challenge because fitness depends both on phenotypic changes due to the mutation, and how these phenotypes map onto fitness in a particular environment. Genotype, phenotype, and environment spaces are all extremely complex, rendering bottom-up prediction unlikely. Here we show, using a large collection of adaptive yeast mutants, that fitness across a set of lab environments can be well-captured by top-down, low-dimensional linear models that generate abstract genotype-phenotype-fitness maps. We find that these maps are low-dimensional not only in the environment where the adaptive mutants evolved, but also in more divergent environments. We further find that the genotype-phenotype-fitness spaces implied by these maps overlap only partially across environments. We argue that these patterns are consistent with a "limiting functions" model of fitness, whereby only a small number of limiting functions can be modified to affect fitness in any given environment. The pleiotropic side-effects on non-limiting functions are effectively hidden from natural selection locally, but can be revealed globally. These results combine to emphasize the importance of environmental context in genotype-phenotype-fitness mapping, and have implications for the predictability and trajectory of evolution in complex environments.

View details for DOI 10.1101/2025.04.05.647371

View details for PubMedID 40291729
Drosophila melanogaster pigmentation demonstrates adaptive phenotypic parallelism over multiple spatiotemporal scales EVOLUTION LETTERS Berardi, S., Rhodes, J. A., Berner, M., Greenblum, S., Bitter, M. C., Behrman, E. L., Betancourt, N. J., Bergland, A. O., Petrov, D. A., Rajpurohit, S., Schmidt, P. 2025

View details for DOI 10.1093/evlett/qraf008

View details for Web of Science ID 001461241000001
Parameterizing Pantherinae: de novo mutation rate estimates from Panthera and Neofelis pedigrees. Genome biology and evolution Armstrong, E. E., Carey, S. B., Harkess, A., Zenato Lazzari, G., Solari, K. A., Maldonado, J. E., Fleischer, R. C., Aziz, N., Walsh, P., Koepfli, K. P., Eizirik, E., Petrov, D. A., Campana, M. G. 2025

Abstract

Estimates of de novo mutation rates are essential for phylogenetic and demographic analyses, but their inference has previously been impeded by high error rates in sequence data and uncertainty in the fossil record. Here, we directly estimate de novo germline mutation rates for all extant members of Panthera, as well as the closely related outgroup Neofelis nebulosa, using pedigrees. We use a previously validated pipeline (RatesTools) to calculate mutation rates for each species and subsequently explore the impacts of the novel rates on historic effective population size estimates in each of these charismatic felids of conservation concern. Importantly, we find that the choice of reference genome, the data type and coverage, and the individual all impact estimates of the mutation rate, but these can be largely ameliorated through extensive manual curation. Despite these stochastic effects, manual validation of de novo mutation candidates permitted the reliable inference of pantherine mutation rates. We inferred that base pair mutation rates for all species fell between 3.6 × 10⁻⁹ and 7.6 × 10⁻⁹ per generation per base pair (mean 5.5 × 10⁻⁹ ± 1.7 × 10⁻⁹ across Pantherinae at a mean parental age of 5.5 years). Similar to other studies, we show a positive trend of mean parental age with mutation rate and our inferred rates are well within the expected range for other mammals.

View details for DOI 10.1093/gbe/evaf060

View details for PubMedID 40171701
Competition for shared resources increases dependence on initial population size during coalescence of gut microbial communities. Proceedings of the National Academy of Sciences of the United States of America Goldman, D. A., Xue, K. S., Parrott, A. B., Lopez, J. A., Vila, J. C., Jeeda, R. R., Franzese, L. R., Porter, R. L., Gray, I. J., DeFelice, B. C., Petrov, D. A., Good, B. H., Relman, D. A., Huang, K. C. 2025; 122 (11): e2322440122

Abstract

The long-term success of introduced populations depends on both their initial size and ability to compete against existing residents, but it remains unclear how these factors collectively shape colonization dynamics. Here, we investigate how initial population (propagule) size shapes the outcome of community coalescence by systematically mixing eight pairs of in vitro microbial communities at ratios that vary over six orders of magnitude, and we compare our results to neutral ecological theory. Although the composition of the resulting cocultures deviated substantially from neutral expectations, each coculture contained species whose relative abundance depended on propagule size even after ~40 generations of growth. Using a consumer-resource model, we show that this dose-dependent colonization can arise when resident and introduced species have high niche overlap and consume shared resources at similar rates. Strain isolates displayed longer-lasting dose dependence when introduced into diverse communities than in pairwise cocultures, consistent with our model's prediction that propagule size should have larger, more persistent effects in diverse communities. Our model also successfully predicted that species with similar resource-utilization profiles, as inferred from growth in spent media and untargeted metabolomics, would show stronger dose dependence in pairwise coculture. This work demonstrates that transient, dose-dependent colonization dynamics can emerge from resource competition and exert long-term effects on the outcomes of community coalescence.

View details for DOI 10.1073/pnas.2322440122

View details for PubMedID 40063808
Combinatorial in vivo genome editing identifies widespread epistasis and an accessible fitness landscape during lung tumorigenesis. Molecular biology and evolution Hebert, J. D., Tang, Y. J., Szamecz, M., Andrejka, L., Lopez, S. S., Petrov, D. A., Boross, G., Winslow, M. M. 2025

Abstract

Lung adenocarcinoma, the most common subtype of lung cancer, is genomically complex, with tumors containing tens to hundreds of non-synonymous mutations. However, little is understood about how genes interact with each other to enable the evolution of cancer in vivo, largely due to a lack of methods for investigating genetic interactions in a high-throughput and quantitative manner. Here, we employed a novel platform to generate tumors with inactivation of pairs of ten diverse tumor suppressor genes within an autochthonous mouse model of oncogenic KRAS-driven lung cancer. By quantifying the fitness of tumors with every single and double mutant genotype, we show that most tumor suppressor genetic interactions exhibited negative epistasis, with diminishing returns on tumor fitness. In contrast, Apc inactivation showed positive epistasis with the inactivation of several other genes, including synergistic effects on tumor fitness in combination with Lkb1 or Nf1 inactivation. Sign epistasis was extremely rare, suggesting a surprisingly accessible fitness landscape during lung tumorigenesis. These findings expand our understanding of the evolutionary interactions that drive tumorigenesis in vivo.

View details for DOI 10.1093/molbev/msaf023

View details for PubMedID 39907430
Next-Generation Snow Leopard Population Assessment Tool: Multiplex-PCR SNP Panel for Individual Identification From Faeces. Molecular ecology resources Solari, K. A., Ahmad, S., Armstrong, E. E., Campana, M. G., Ali, H., Hameed, S., Ullah, J., Khan, B. U., Nawaz, M. A., Petrov, D. A. 2025: e14074

Abstract

In recent years, numerous single nucleotide polymorphism (SNP) panel methods to genotype non-invasive faecal samples have been developed. However, none of these existing methods fit all of the criteria necessary to make a SNP panel broadly usable for conservation projects in any country: cost effective, streamlined lab protocol and user-friendly open-source bioinformatics protocols for panel design and analysis. Here, we present such a method and display its utility by developing a multiplex PCR SNP panel for conducting individual ID of snow leopards, Panthera uncia, from faecal samples. The SNP panel we present consists of 144 SNPs and utilises next-generation sequencing technology. We validate our SNP panel with paired tissue and faecal samples from zoo individuals, showing a minimum of 96.7% accuracy in allele calls per run. We then generate SNP data from 235 field-collected faecal samples from across Pakistan to show that the panel can reliably identify individuals from low-quality faecal samples of unknown age and is robust to contamination. We also show that our SNP panel has the capability to identify first-order relatives among sampled zoo individuals and provides insights into the geographic origin of samples. This SNP panel will empower the snow leopard research community in their efforts to assess local and global snow leopard population sizes. More broadly, we present a SNP panel development method that can be used for any species of interest for which adequate genomic reference data is available.

View details for DOI 10.1111/1755-0998.14074

View details for PubMedID 39887922
A STAG2-PAXIP1/PAGR1 axis suppresses lung tumorigenesis. The Journal of experimental medicine Ashkin, E. L., Tang, Y. J., Xu, H., Hung, K. L., Belk, J. A., Cai, H., Lopez, S. S., Dolcen, D. N., Hebert, J. D., Li, R., Ruiz, P. A., Keal, T., Andrejka, L., Chang, H. Y., Petrov, D. A., Dixon, J. R., Xu, Z., Winslow, M. M. 2025; 222 (1)

Abstract

The cohesin complex is a critical regulator of gene expression. STAG2 is the most frequently mutated cohesin subunit across several cancer types and is a key tumor suppressor in lung cancer. Here, we coupled somatic CRISPR-Cas9 genome editing and tumor barcoding with an autochthonous oncogenic KRAS-driven lung cancer model and showed that STAG2 is uniquely tumor-suppressive among all core and auxiliary cohesin components. The heterodimeric complex components PAXIP1 and PAGR1 have highly correlated effects with STAG2 in human lung cancer cell lines, are tumor suppressors in vivo, and are epistatic to STAG2 in oncogenic KRAS-driven lung tumorigenesis in vivo. STAG2 inactivation elicits changes in gene expression, chromatin accessibility, and 3D genome conformation that impact the cancer cell state. Gene expression and chromatin accessibility similarities between STAG2- and PAXIP1-deficient neoplastic cells further relate STAG2-cohesin to PAXIP1/PAGR1. These findings reveal a STAG2-PAXIP1/PAGR1 tumor-suppressive axis and uncover novel PAXIP1-dependent and PAXIP1-independent STAG2-cohesin-mediated mechanisms of lung tumor suppression.

View details for DOI 10.1084/jem.20240765

View details for PubMedID 39652422
A high-resolution two-step evolution experiment in yeast reveals a shift from pleiotropic to modular adaptation. PLoS biology Kinsler, G., Li, Y., Sherlock, G., Petrov, D. A. 2024; 22 (12): e3002848

Abstract

Evolution by natural selection is expected to be a slow and gradual process. In particular, the mutations that drive evolution are predicted to be small and modular, incrementally improving a small number of traits. However, adaptive mutations identified early in microbial evolution experiments, cancer, and other systems often provide substantial fitness gains and pleiotropically improve multiple traits at once. We asked whether such pleiotropically adaptive mutations are common throughout adaptation or are instead a rare feature of early steps in evolution that tend to target key signaling pathways. To do so, we conducted barcoded second-step evolution experiments initiated from 5 first-step mutations identified from a prior yeast evolution experiment. We then isolated hundreds of second-step mutations from these evolution experiments, measured their fitness and performance in several growth phases, and conducted whole genome sequencing of the second-step clones. Here, we found that while the vast majority of mutants isolated from the first-step of evolution in this condition show patterns of pleiotropic adaptation (improving both performance in fermentation and respiration growth phases), second-step mutations show a shift towards modular adaptation, mostly improving respiration performance and only rarely improving fermentation performance. We also identified a shift in the molecular basis of adaptation from genes in cellular signaling pathways towards genes involved in respiration and mitochondrial function. Our results suggest that the genes in cellular signaling pathways may be more likely to provide large, adaptively pleiotropic benefits to the organism due to their ability to coherently affect many phenotypes at once. As such, these genes may serve as the source of pleiotropic adaptation in the early stages of evolution, and once these become exhausted, organisms then adapt more gradually, acquiring smaller, more modular mutations.

View details for DOI 10.1371/journal.pbio.3002848

View details for PubMedID 39636818
A Pipeline and Recommendations for Population and Individual Diagnostic SNP Selection in Non-Model Species. Molecular ecology resources Armstrong, E. E., Li, C., Campana, M. G., Ferrari, T., Kelley, J. L., Petrov, D. A., Solari, K. A., Mooney, J. A. 2024: e14048

Abstract

Despite substantial reductions in the cost of sequencing over the last decade, genetic panels remain relevant due to their cost-effectiveness and flexibility across a variety of sample types. In particular, single nucleotide polymorphism (SNP) panels are increasingly favoured for conservation applications. SNP panels are often used because of their adaptability, effectiveness with low-quality samples, and cost-efficiency for population monitoring and forensics. However, the selection of diagnostic SNPs for population assignment and individual identification can be challenging. The consequences of poor SNP selection are under-powered panels, inaccurate results, and monetary loss. Here, we develop a novel and user-friendly SNP selection pipeline (mPCRselect) that can be used to select SNPs for population assignment and/or individual identification. mPCRselect allows any researcher, who has sufficient SNP-level data, to design a successful and cost-effective SNP panel for a diploid species of conservation concern.

View details for DOI 10.1111/1755-0998.14048

View details for PubMedID 39611246
Cancers adapt to their mutational load by buffering protein misfolding stress. eLife Tilk, S., Frydman, J., Curtis, C., Petrov, D. A. 2024; 12

Abstract

In asexual populations that don't undergo recombination, such as cancer, deleterious mutations are expected to accrue readily due to genome-wide linkage between mutations. Despite this mutational load of often thousands of deleterious mutations, many tumors thrive. How tumors survive the damaging consequences of this mutational load is not well understood. Here, we investigate the functional consequences of mutational load in 10,295 human tumors by quantifying their phenotypic response through changes in gene expression. Using a generalized linear mixed model (GLMM), we find that high mutational load tumors up-regulate proteostasis machinery related to the mitigation and prevention of protein misfolding. We replicate these expression responses in cancer cell lines and show that the viability in high mutational load cancer cells is strongly dependent on complexes that degrade and refold proteins. This indicates that the upregulation of proteostasis machinery is causally important for high mutational burden tumors and uncovers new therapeutic vulnerabilities.

View details for DOI 10.7554/eLife.87301

View details for PubMedID 39585785
Genomics of a sexually selected sperm ornament and female preference in Drosophila. Nature ecology & evolution Syed, Z. A., Gomez, R. A., Borziak, K., Asif, A., Cong, A. S., O'Grady, P. M., Kim, B. Y., Suvorov, A., Petrov, D. A., Lüpold, S., Wengert, P., McDonough-Goldstein, C., Ahmed-Braimah, Y. H., Dorus, S., Pitnick, S. 2024

Abstract

Our understanding of animal ornaments and the mating preferences driving their exaggeration is limited by knowledge of their genetics. Post-copulatory sexual selection is credited with the rapid evolution of female sperm-storage organ morphology and corresponding sperm quality traits across diverse taxa. In Drosophila, the mechanisms by which longer flagella convey an advantage in the competition among sperm for limited storage space in the female, and by which female sperm-storage organ morphology biases fertilization in favour of longer sperm have been resolved. However, the evolutionary genetics underlying this model post-copulatory ornament and preference system have remained elusive. Here we combined comparative analyses of 149 Drosophila species, a genome-wide association study in Drosophila melanogaster and molecular evolutionary analysis of ~9,400 genes to elucidate how sperm and female sperm-storage organ length co-evolved into one of nature's most extreme ornaments and preferences. Our results reveal a diverse repertoire of pleiotropic genes linking sperm length and seminal receptacle length expression to central nervous system development and sensory biology. Sperm length development appears condition-dependent and is governed by conserved hormonal (insulin/insulin-like growth factor) and developmental (including Notch and Fruitless) pathways. Central developmental pathway genes, including Notch, also comprised the majority of a restricted set of genes contributing to both intraspecific and interspecific variation in sperm length. Our findings support 'good genes' models of female preference evolution.

View details for DOI 10.1038/s41559-024-02587-2

View details for PubMedID 39578595

View details for PubMedCentralID 7523561
Massively parallel experimental interrogation of natural variants in ancient signaling pathways reveals both purifying selection and local adaptation. bioRxiv : the preprint server for biology Aguilar-Rodríguez, J., Vila, J., Chen, S. A., Razo-Mejia, M., Ghosh, O., Fraser, H. B., Jarosz, D. F., Petrov, D. A. 2024

Abstract

The nature of standing genetic variation remains a central debate in population genetics, with differing perspectives on whether common variants are mostly neutral or have functional effects. We address this question by directly mapping the fitness effects of over 9,000 natural variants in the Ras/PKA and TOR/Sch9 pathways (key regulators of cell proliferation in eukaryotes) across four conditions in Saccharomyces cerevisiae. While many variants are neutral in our assay, on the order of 3,500 exhibited significant fitness effects. These non-neutral variants tend to be missense and affect conserved, more densely packed, and less solvent-exposed protein regions. They are also typically younger, occur at lower frequencies, and more often found in heterozygous states, suggesting they are subject to purifying selection. A substantial fraction of non-neutral variants showing strong fitness effects in our experiments, however, is present at high frequencies in the population. These variants show signs of local adaptation as they tend to be found specifically in domesticated strains adapted to human-made environments. Our findings support the view that while common variants are often neutral, a significant proportion have adaptive functional consequences and are driven into the population by local positive selection. This study highlights the potential to explore the functional effects of natural genetic variation on a genome scale with quantitative fitness measurements in the laboratory, bridging the gap between population genetics and functional genomics to understand evolutionary dynamics in the wild.

View details for DOI 10.1101/2024.10.30.621178

View details for PubMedID 39553990

View details for PubMedCentralID PMC11565963
Unraveling the genomic diversity and admixture history of captive tigers in the United States. Proceedings of the National Academy of Sciences of the United States of America Armstrong, E. E., Mooney, J. A., Solari, K. A., Kim, B. Y., Barsh, G. S., Grant, V. B., Greenbaum, G., Kaelin, C. B., Panchenko, K., Pickrell, J. K., Rosenberg, N., Ryder, O. A., Yokoyama, T., Ramakrishnan, U., Petrov, D. A., Hadly, E. A. 2024; 121 (39): e2402924121

Abstract

Genomic studies of endangered species have primarily focused on describing diversity patterns and resolving phylogenetic relationships, with the overarching goal of informing conservation efforts. However, few studies have investigated genomic diversity housed in captive populations. For tigers (Panthera tigris), captive individuals vastly outnumber those in the wild, but their diversity remains largely unexplored. Privately owned captive tiger populations have remained an enigma in the conservation community, with some believing that these individuals are severely inbred, while others believe they may be a source of now-extinct diversity. Here, we present a large-scale genetic study of the private (non-zoo) captive tiger population in the United States, also known as "Generic" tigers. We find that the Generic tiger population has an admixture fingerprint comprising all six extant wild tiger subspecies. Of the 138 Generic individuals sequenced for the purpose of this study, no individual had ancestry from only one subspecies. We show that the Generic tiger population has a comparable amount of genetic diversity relative to most wild subspecies, few private variants, and fewer deleterious mutations. We observe inbreeding coefficients similar to wild populations, although there are some individuals within both the Generic and wild populations that are substantially inbred. Additionally, we develop a reference panel for tigers that can be used with imputation to accurately distinguish individuals and assign ancestry with ultralow coverage (0.25×) data. By providing a cost-effective alternative to whole-genome sequencing (WGS), the reference panel provides a resource to assist in tiger conservation efforts for both ex- and in situ populations.

View details for DOI 10.1073/pnas.2402924121

View details for PubMedID 39298482
Continuously fluctuating selection reveals fine granularity of adaptation. Nature Bitter, M. C., Berardi, S., Oken, H., Huynh, A., Lappo, E., Schmidt, P., Petrov, D. A. 2024

Abstract

Temporally fluctuating environmental conditions are a ubiquitous feature of natural habitats. Yet, how finely natural populations adaptively track fluctuating selection pressures via shifts in standing genetic variation is unknown. Here we generated genome-wide allele frequency data every 1-2 generations from a genetically diverse population of Drosophila melanogaster in extensively replicated field mesocosms from late June to mid-December (a period of approximately 12 total generations). Adaptation throughout the fundamental ecological phases of population expansion, peak density and collapse was underpinned by extremely rapid, parallel changes in genomic variation across replicates. Yet, the dominant direction of selection fluctuated repeatedly, even within each of these ecological phases. Comparing patterns of change in allele frequency to an independent dataset procured from the same experimental system demonstrated that the targets of selection are predictable across years. In concert, our results reveal a fitness relevance of standing variation that is likely to be masked by inference approaches based on static population sampling or insufficiently resolved time-series data. We propose that such fine-scaled, temporally fluctuating selection may be an important force contributing to the maintenance of functional genetic variation in natural populations and an important stochastic force impacting genome-wide patterns of diversity at linked neutral sites, akin to genetic draft.

View details for DOI 10.1038/s41586-024-07834-x

View details for PubMedID 39143223

View details for PubMedCentralID 8385344
Improving the accuracy of bulk fitness assays by correcting barcode processing biases. Molecular biology and evolution McGee, R. S., Kinsler, G., Petrov, D., Tikhonov, M. 2024

Abstract

Measuring the fitnesses of genetic variants is a fundamental objective in evolutionary biology. A standard approach for measuring microbial fitnesses in bulk involves labeling a library of genetic variants with unique sequence barcodes, competing the labeled strains in batch culture, and using deep sequencing to track changes in the barcode abundances over time. However, idiosyncratic properties of barcodes can induce non-uniform amplification or uneven sequencing coverage that causes some barcodes to be over- or under-represented in samples. This systematic bias can result in erroneous read count trajectories and misestimates of fitness. Here we develop a computational method, REBAR, for inferring the effects of barcode processing bias by leveraging the structure of systematic deviations in the data. We illustrate this approach by applying it to two independent data sets, and demonstrate that this method estimates and corrects for bias more accurately than standard proxies, such as GC-based corrections. REBAR mitigates bias and improves fitness estimates in high-throughput assays without introducing additional complexity to the experimental protocols, with potential applications in a range of experimental evolution and mutation screening contexts.

View details for DOI 10.1093/molbev/msae152

View details for PubMedID 39041198
Environmental memory alters the fitness effects of adaptive mutations in fluctuating environments. Nature ecology & evolution Abreu, C. I., Mathur, S., Petrov, D. A. 2024

Abstract

Evolution in a static laboratory environment often proceeds via large-effect beneficial mutations that may become maladaptive in other environments. Conversely, natural settings require populations to endure environmental fluctuations. A sensible assumption is that the fitness of a lineage in a fluctuating environment is the time average of its fitness over the sequence of static conditions it encounters. However, transitions between conditions may pose entirely new challenges, which could cause deviations from this time average. To test this, we tracked hundreds of thousands of barcoded yeast lineages evolving in static and fluctuating conditions and subsequently isolated 900 mutants for pooled fitness assays in 15 environments. Here we find that fitness in fluctuating environments indeed often deviates from the time average, leading to fitness non-additivity. Moreover, closer examination reveals that fitness in one component of a fluctuating environment is often strongly influenced by the previous component. We show that this environmental memory is especially common for mutants with high variance in fitness across tested environments. We use a simple mathematical model and whole-genome sequencing to propose mechanisms underlying this effect, including lag time evolution and sensing mutations. Our results show that environmental fluctuations impact fitness and suggest that variance in static environments can explain these impacts.

View details for DOI 10.1038/s41559-024-02475-9

View details for PubMedID 39020024

View details for PubMedCentralID 1482574
Single-fly genome assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life. PLoS biology Kim, B. Y., Gellert, H. R., Church, S. H., Suvorov, A., Anderson, S. S., Barmina, O., Beskid, S. G., Comeault, A. A., Crown, K. N., Diamond, S. E., Dorus, S., Fujichika, T., Hemker, J. A., Hrcek, J., Kankare, M., Katoh, T., Magnacca, K. N., Martin, R. A., Matsunaga, T., Medeiros, M. J., Miller, D. E., Pitnick, S., Schiffer, M., Simoni, S., Steenwinkel, T. E., Syed, Z. A., Takahashi, A., Wei, K. H., Yokoyama, T., Eisen, M. B., Kopp, A., Matute, D., Obbard, D. J., O'Grady, P. M., Price, D. K., Toda, M. J., Werner, T., Petrov, D. A. 2024; 22 (7): e3002697

Abstract

Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. We previously developed a cost-effective hybrid Oxford Nanopore (ONT) long-read and Illumina short-read sequencing approach and used it to assemble 101 drosophilid genomes from laboratory cultures, greatly increasing the number of genome assemblies for this taxonomic group. The next major challenge is to address the laboratory culture bias in taxon sampling by sequencing genomes of species that cannot easily be reared in the lab. Here, we build upon our previous methods to perform amplification-free ONT sequencing of single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections, greatly improving the representation of lesser studied drosophilid taxa in whole-genome data. Using Illumina NovaSeq X Plus and ONT P2 sequencers with R10.4.1 chemistry, we set a new benchmark for inexpensive hybrid genome assembly at US $150 per genome while assembling genomes from as little as 35 ng of genomic DNA from a single fly. We present 183 new genome assemblies for 179 species as a resource for drosophilid systematics, phylogenetics, and comparative genomics. Of these genomes, 62 are from pooled lab strains and 121 from single adult flies. Despite the sample limitations of working with small insects, most single-fly diploid assemblies are comparable in contiguity (>1 Mb contig N50), completeness (>98% complete dipteran BUSCOs), and accuracy (>QV40 genome-wide with ONT R10.4.1) to assemblies from inbred lines. We present a well-resolved multi-locus phylogeny for 360 drosophilid and 4 outgroup species encompassing all publicly available (as of August 2023) genomes for this group.
Finally, we present a Progressive Cactus whole-genome, reference-free alignment built from a subset of 298 suitably high-quality drosophilid genomes. The new assemblies and alignment, along with updated laboratory protocols and computational pipelines, are released as an open resource and as a tool for studying evolution at the scale of an entire insect family.

View details for DOI 10.1371/journal.pbio.3002697

View details for PubMedID 39024225
Bayesian inference of relative fitness on high-throughput pooled competition assays. PLoS computational biology Razo-Mejia, M., Mani, M., Petrov, D. 2024; 20 (3): e1011937

Abstract

The tracking of lineage frequencies via DNA barcode sequencing enables the quantification of microbial fitness. However, experimental noise coming from biotic and abiotic sources complicates the computation of a reliable inference. We present a Bayesian pipeline to infer relative microbial fitness from high-throughput lineage tracking assays. Our model accounts for multiple sources of noise and propagates uncertainties throughout all parameters in a systematic way. Furthermore, using modern variational inference methods based on automatic differentiation, we are able to scale the inference to a large number of unique barcodes. We extend this core model to analyze multi-environment assays, replicate experiments, and barcodes linked to genotypes. On simulations, our method recovers known parameters within posterior credible intervals. This work provides a generalizable Bayesian framework to analyze lineage tracking experiments. The accompanying open-source software library enables the adoption of principled statistical methods in experimental evolution.

View details for DOI 10.1371/journal.pcbi.1011937

View details for PubMedID 38489348
Combinatorial in vivo genome editing identifies widespread epistasis during lung tumorigenesis. bioRxiv : the preprint server for biology Hebert, J. D., Tang, Y. J., Andrejka, L., Lopez, S. S., Petrov, D. A., Boross, G., Winslow, M. M. 2024

Abstract

Lung adenocarcinoma, the most common subtype of lung cancer, is genomically complex, with tumors containing tens to hundreds of non-synonymous mutations. However, little is understood about how genes interact with each other to enable tumorigenesis in vivo, largely due to a lack of methods for investigating genetic interactions in a high-throughput and multiplexed manner. Here, we employed a novel platform to generate tumors with all pairwise inactivation of ten tumor suppressor genes within an autochthonous mouse model of oncogenic KRAS-driven lung cancer. By quantifying the fitness of tumors with every single and double mutant genotype, we show that most tumor suppressor genetic interactions exhibited negative epistasis, with diminishing returns on tumor fitness. In contrast, Apc inactivation showed positive epistasis with the inactivation of several other genes, including dramatically synergistic effects on tumor fitness in combination with Lkb1 or Nf1 inactivation. This approach has the potential to expand the scope of genetic interactions that may be functionally characterized in vivo, which could lead to a better understanding of how complex tumor genotypes impact each step of carcinogenesis.

View details for DOI 10.1101/2024.03.07.583981

View details for PubMedID 38496564
Competition for shared resources increases dependence on initial population size during coalescence of gut microbial communities. bioRxiv : the preprint server for biology Goldman, D. A., Xue, K. S., Parrott, A. B., Jeeda, R. R., Franzese, L. R., Lopez, J. G., Vila, J. C., Petrov, D. A., Good, B. H., Relman, D. A., Huang, K. C. 2023

Abstract

The long-term success of introduced populations depends on their initial size and ability to compete against existing residents, but it remains unclear how these factors collectively shape colonization. Here, we investigate how initial population (propagule) size and resource competition interact during community coalescence by systematically mixing eight pairs of in vitro microbial communities at ratios that vary over six orders of magnitude, and we compare our results to a neutral ecological model. Although the composition of the resulting co-cultures deviated substantially from neutral expectations, each co-culture contained species whose relative abundance depended on propagule size even after ~40 generations of growth. Using a consumer-resource model, we show that this dose-dependent colonization can arise when resident and introduced species have high niche overlap and consume shared resources at similar rates. This model predicts that propagule size will have larger, longer-lasting effects in diverse communities in which niche overlap is higher, and we experimentally confirm that strain isolates show stronger dose dependence when introduced into diverse communities than in pairwise co-culture. This work shows how neutral-like colonization dynamics can emerge from non-neutral resource competition and have lasting effects on the outcomes of community coalescence.

View details for DOI 10.1101/2023.11.29.569120

View details for PubMedID 38076867

View details for PubMedCentralID PMC10705444
Evolution of haploid and diploid populations reveals common, strong, and variable pleiotropic effects in non-home environments. eLife Chen, V., Johnson, M. S., Hérissant, L., Humphrey, P. T., Yuan, D. C., Li, Y., Agarwala, A., Hoelscher, S. B., Petrov, D. A., Desai, M. M., Sherlock, G. 2023; 12

Abstract

Adaptation is driven by the selection for beneficial mutations that provide a fitness advantage in the specific environment in which a population is evolving. However, environments are rarely constant or predictable. When an organism well adapted to one environment finds itself in another, pleiotropic effects of mutations that made it well adapted to its former environment will affect its success. To better understand such pleiotropic effects, we evolved both haploid and diploid barcoded budding yeast populations in multiple environments, isolated adaptive clones, and then determined the fitness effects of adaptive mutations in "non-home" environments in which they were not selected. We find that pleiotropy is common, with most adaptive evolved lineages showing fitness effects in non-home environments. Consistent with other studies, we find that these pleiotropic effects are unpredictable: they are beneficial in some environments and deleterious in others. However, we do find that lineages with adaptive mutations in the same genes tend to show similar pleiotropic effects. We also find that ploidy influences the observed adaptive mutational spectra in a condition-specific fashion. In some conditions, haploids and diploids are selected with adaptive mutations in identical genes, while in others they accumulate mutations in almost completely disjoint sets of genes.

View details for DOI 10.7554/eLife.92899

View details for PubMedID 37861305
Bayesian inference of relative fitness on high-throughput pooled competition assays. bioRxiv : the preprint server for biology Razo-Mejia, M., Mani, M., Petrov, D. 2023

Abstract

The tracking of lineage frequencies via DNA barcode sequencing enables the quantification of microbial fitness. However, experimental noise coming from biotic and abiotic sources complicates the computation of a reliable inference. We present a Bayesian pipeline to infer relative microbial fitness from high-throughput lineage tracking assays. Our model accounts for multiple sources of noise and propagates uncertainties throughout all parameters in a systematic way. Furthermore, using modern variational inference methods based on automatic differentiation, we are able to scale the inference to a large number of unique barcodes. We extend this core model to analyze multi-environment assays, replicate experiments, and barcodes linked to genotypes. On simulations, our method recovers known parameters within posterior credible intervals. This work provides a generalizable Bayesian framework to analyze lineage tracking experiments. The accompanying open-source software library enables the adoption of principled statistical methods in experimental evolution.

View details for DOI 10.1101/2023.10.14.562365

View details for PubMedID 37904971
Oncogenic context shapes the fitness landscape of tumor suppression. Nature communications Blair, L. M., Juan, J. M., Sebastian, L., Tran, V. B., Nie, W., Wall, G. D., Gerceker, M., Lai, I. K., Apilado, E. A., Grenot, G., Amar, D., Foggetti, G., Do Carmo, M., Ugur, Z., Deng, D., Chenchik, A., Paz Zafra, M., Dow, L. E., Politi, K., MacQuitty, J. J., Petrov, D. A., Winslow, M. M., Rosen, M. J., Winters, I. P. 2023; 14 (1): 6422

Abstract

Tumors acquire alterations in oncogenes and tumor suppressor genes in an adaptive walk through the fitness landscape of tumorigenesis. However, the interactions between oncogenes and tumor suppressor genes that shape this landscape remain poorly resolved and cannot be revealed by human cancer genomics alone. Here, we use a multiplexed, autochthonous mouse platform to model and quantify the initiation and growth of more than one hundred genotypes of lung tumors across four oncogenic contexts: KRAS G12D, KRAS G12C, BRAF V600E, and EGFR L858R. We show that the fitness landscape is rugged (the effect of tumor suppressor inactivation often switches between beneficial and deleterious depending on the oncogenic context) and shows no evidence of diminishing-returns epistasis within variants of the same oncogene. These findings argue against a simple linear signaling relationship amongst these three oncogenes and imply a critical role for off-axis signaling in determining the fitness effects of inactivating tumor suppressors.

View details for DOI 10.1038/s41467-023-42156-y

View details for PubMedID 37828026

View details for PubMedCentralID 8412936
Single-fly assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life. bioRxiv : the preprint server for biology Kim, B. Y., Gellert, H. R., Church, S. H., Suvorov, A., Anderson, S. S., Barmina, O., Beskid, S. G., Comeault, A. A., Crown, K. N., Diamond, S. E., Dorus, S., Fujichika, T., Hemker, J. A., Hrcek, J., Kankare, M., Katoh, T., Magnacca, K. N., Martin, R. A., Matsunaga, T., Medeiros, M. J., Miller, D. E., Pitnick, S., Simoni, S., Steenwinkel, T. E., Schiffer, M., Syed, Z. A., Takahashi, A., Wei, K. H., Yokoyama, T., Eisen, M. B., Kopp, A., Matute, D., Obbard, D. J., O'Grady, P. M., Price, D. K., Toda, M. J., Werner, T., Petrov, D. A. 2023

Abstract

Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. We previously developed a cost-effective hybrid Oxford Nanopore (ONT) long-read and Illumina short-read sequencing approach and used it to assemble 101 drosophilid genomes from laboratory cultures, greatly increasing the number of genome assemblies for this taxonomic group. The next major challenge is to address the laboratory culture bias in taxon sampling by sequencing genomes of species that cannot easily be reared in the lab. Here, we build upon our previous methods to perform amplification-free ONT sequencing of single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections, greatly improving the representation of lesser studied drosophilid taxa in whole-genome data. Using Illumina NovaSeq X Plus and ONT P2 sequencers with R10.4.1 chemistry, we set a new benchmark for inexpensive hybrid genome assembly at US $150 per genome while assembling genomes from as little as 35 ng of genomic DNA from a single fly. We present 183 new genome assemblies for 179 species as a resource for drosophilid systematics, phylogenetics, and comparative genomics. Of these genomes, 62 are from pooled lab strains and 121 from single adult flies. Despite the sample limitations of working with small insects, most single-fly diploid assemblies are comparable in contiguity (>1 Mb contig N50), completeness (>98% complete dipteran BUSCOs), and accuracy (>QV40 genome-wide with ONT R10.4.1) to assemblies from inbred lines. We present a well-resolved multi-locus phylogeny for 360 drosophilid and 4 outgroup species encompassing all publicly available (as of August 2023) genomes for this group.
Finally, we present a Progressive Cactus whole-genome, reference-free alignment built from a subset of 298 suitably high-quality drosophilid genomes. The new assemblies and alignment, along with updated laboratory protocols and computational pipelines, are released as an open resource and as a tool for studying evolution at the scale of an entire insect family.

View details for DOI 10.1101/2023.10.02.560517

View details for PubMedID 37873137

View details for PubMedCentralID PMC10592941
Prolonged delays in human microbiota transmission after a controlled antibiotic perturbation. bioRxiv : the preprint server for biology Xue, K. S., Walton, S. J., Goldman, D. A., Morrison, M. L., Verster, A. J., Parrott, A. B., Yu, F. B., Neff, N. F., Rosenberg, N. A., Ross, B. D., Petrov, D. A., Huang, K. C., Good, B. H., Relman, D. A. 2023

Abstract

Humans constantly encounter new microbes, but few become long-term residents of the adult gut microbiome. Classical theories predict that colonization is determined by the availability of open niches, but it remains unclear whether other ecological barriers limit commensal colonization in natural settings. To disentangle these effects, we used a controlled perturbation with the antibiotic ciprofloxacin to investigate the dynamics of gut microbiome transmission in 22 households of healthy, cohabiting adults. Colonization was rare in three-quarters of antibiotic-taking subjects, whose resident strains rapidly recovered in the week after antibiotics ended. In contrast, the remaining subjects exhibited lasting responses to antibiotics, with extensive species losses and transient expansions of potential opportunistic pathogens. These subjects experienced elevated rates of commensal colonization, but only after long delays: many new colonizers underwent sudden, correlated expansions months after the antibiotic perturbation. Furthermore, strains that had previously transmitted between cohabiting partners rarely recolonized after antibiotic disruptions, showing that colonization displays substantial historical contingency. This work demonstrates that there remain substantial ecological barriers to colonization even after major microbiome disruptions, suggesting that dispersal interactions and priority effects limit the pace of community change.

View details for DOI 10.1101/2023.09.26.559480

View details for PubMedID 37808827

View details for PubMedCentralID PMC10557656
Strong environmental memory revealed by experimental evolution in static and fluctuating environments. bioRxiv : the preprint server for biology Abreu, C. I., Mathur, S., Petrov, D. A. 2023

Abstract

Evolution in a static environment, such as a laboratory setting with constant and uniform conditions, often proceeds via large-effect beneficial mutations that may become maladaptive in other environments. Conversely, natural settings require populations to endure environmental fluctuations. A sensible assumption is that the fitness of a lineage in a fluctuating environment is the time-average of its fitness over the sequence of static conditions it encounters. However, transitions between conditions may pose entirely new challenges, which could cause deviations from this time-average. To test this, we tracked hundreds of thousands of barcoded yeast lineages evolving in static and fluctuating conditions and subsequently isolated 900 mutants for pooled fitness assays in 15 environments. We find that fitness in fluctuating environments indeed often deviates from the expectation based on static components, leading to fitness non-additivity. Moreover, closer examination reveals that fitness in one component of a fluctuating environment is often strongly influenced by the previous component. We show that this environmental memory is especially common for mutants with high variance in fitness across tested environments, even if the components of the focal fluctuating environment are excluded from this variance. We employ a simple mathematical model and whole-genome sequencing to propose mechanisms underlying this effect, including lag time evolution and sensing mutations. Our results demonstrate that environmental fluctuations have large impacts on fitness and suggest that variance in static environments can explain these impacts.

View details for DOI 10.1101/2023.09.14.557739

View details for PubMedID 37745585

View details for PubMedCentralID PMC10515930
Fully accessible fitness landscape of oncogene-negative lung adenocarcinoma. Proceedings of the National Academy of Sciences of the United States of America Yousefi, M., Andrejka, L., Szamecz, M., Winslow, M. M., Petrov, D. A., Boross, G. 2023; 120 (38): e2303224120

Abstract

Cancer genomes are almost invariably complex with genomic alterations cooperating during each step of carcinogenesis. In cancers that lack a single dominant oncogene mutation, cooperation between the inactivation of multiple tumor suppressor genes can drive tumor initiation and growth. Here, we shed light on how the sequential acquisition of genomic alterations generates oncogene-negative lung tumors. We couple tumor barcoding with combinatorial and multiplexed somatic genome editing to characterize the fitness landscapes of three tumor suppressor genes NF1, RASA1, and PTEN, the inactivation of which jointly drives oncogene-negative lung adenocarcinoma initiation and growth. The fitness landscape was surprisingly accessible, with each additional mutation leading to growth advantage. Furthermore, the fitness landscapes remained fully accessible across backgrounds with the inactivation of additional tumor suppressor genes. These results suggest that while predicting cancer evolution will be challenging, acquiring the multiple alterations that drive the growth of oncogene-negative tumors can be facilitated by the lack of constraints on mutational order.

View details for DOI 10.1073/pnas.2303224120

View details for PubMedID 37695905
Author Correction: Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nature neuroscience Zhu, X., Zhou, B., Pattni, R., Gleason, K., Tan, C., Kalinowski, A., Sloan, S., Fiston-Lavier, A. S., Mariani, J., Petrov, D., Barres, B. A., Duncan, L., Abyzov, A., Vogel, H., Moran, J. V., Vaccarino, F. M., Tamminga, C. A., Levinson, D. F., Urban, A. E. 2023

View details for DOI 10.1038/s41593-023-01438-w

View details for PubMedID 37648813
In Vitro Reconstitution and Analysis of SARS-CoV-2/Host Protein-Protein Interactions. ACS omega Moradi, S. V., Wu, Y., Walden, P., Cui, Z., Johnston, W. A., Petrov, D., Alexandrov, K. 2023; 8 (28): 25009-25019

Abstract

The emergence of viral threats such as Ebola, Zika, and severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) requires a rapid and efficient approach for elucidating mechanisms of pathogenesis and development of therapeutics. In this context, cell-free protein synthesis (CFPS) holds promise to resolve the bottlenecks of multiplexed protein production and interaction analysis among host and pathogen proteins. Here, we applied a eukaryotic CFPS system based on Leishmania tarentolae extract (LTE) protein expression in combination with AlphaLISA proximity-based protein interaction technology to identify intraviral and viral-human protein interactions of SARS-CoV-2 virus that can potentially be targeted by the existing or novel antiviral therapeutics. We produced and tested 54 putative human-viral protein pairs in vitro and identified 45 direct binary protein interactions. As a case example of the assay's suitability for drug development applications, we analyzed the effect of a putative biologic on the human angiotensin-converting enzyme 2/receptor-binding domain (hACE2/RBD) interaction. This suggests that the presented pathogen characterization platform can facilitate the development of new therapeutic agents.

View details for DOI 10.1021/acsomega.3c01625

View details for PubMedID 37483225

View details for PubMedCentralID PMC10357528
Extreme Sensitivity of Fitness to Environmental Conditions: Lessons from #1BigBatch. Journal of molecular evolution Kinsler, G., Schmidlin, K., Newell, D., Eder, R., Apodaca, S., Lam, G., Petrov, D., Geiler-Samerotte, K. 2023

Abstract

The phrase "survival of the fittest" has become an iconic descriptor of how natural selection works. And yet, precisely measuring fitness, even for single-celled microbial populations growing in controlled laboratory conditions, remains a challenge. While numerous methods exist to perform these measurements, including recently developed methods utilizing DNA barcodes, all methods are limited in their precision to differentiate strains with small fitness differences. In this study, we rule out some major sources of imprecision, but still find that fitness measurements vary substantially from replicate to replicate. Our data suggest that very subtle and difficult-to-avoid environmental differences between replicates create systematic variation across fitness measurements. We conclude by discussing how fitness measurements should be interpreted given their extreme environment dependence. This work was inspired by the scientific community who followed us and gave us tips as we live-tweeted a high-replicate fitness measurement experiment at #1BigBatch.

View details for DOI 10.1007/s00239-023-10114-3

View details for PubMedID 37237236
Fully accessible fitness landscape of oncogene-negative lung adenocarcinoma. bioRxiv : the preprint server for biology Yousefi, M., Andrejka, L., Winslow, M. M., Petrov, D. A., Boross, G. 2023

Abstract

Cancer genomes are almost invariably complex with genomic alterations cooperating during each step of carcinogenesis. In cancers that lack a single dominant oncogene mutation, cooperation between the inactivation of multiple tumor suppressor genes can drive tumor initiation and growth. Here, we shed light on how the sequential acquisition of genomic alterations generates oncogene-negative lung tumors. We couple tumor barcoding with combinatorial and multiplexed somatic genome editing to characterize the fitness landscapes of three tumor suppressor genes NF1, RASA1, and PTEN, the inactivation of which jointly drives oncogene-negative lung adenocarcinoma initiation and growth. The fitness landscape was surprisingly accessible, with each additional mutation leading to growth advantage. Furthermore, the fitness landscapes remained fully accessible across backgrounds with additional tumor suppressor mutations. These results suggest that while predicting cancer evolution will be challenging, acquiring the multiple alterations required for the growth of oncogene-negative tumors can be facilitated by the lack of constraints on mutational order.

View details for DOI 10.1101/2023.01.30.526178

View details for PubMedID 36778226
Antigenic diversity in malaria parasites is maintained on extrachromosomal DNA. bioRxiv : the preprint server for biology Ebel, E. R., Kim, B. Y., McDew-White, M., Egan, E. S., Anderson, T. J., Petrov, D. A. 2023

Abstract

Sequence variation among antigenic var genes enables Plasmodium falciparum malaria parasites to evade host immunity. Using long sequence reads from haploid clones from a mutation accumulation experiment, we detect var diversity inconsistent with simple chromosomal inheritance. We discover putatively circular DNA that is strongly enriched for var genes, which exist in multiple alleles per locus separated by recombination and indel events. Extrachromosomal DNA likely contributes to rapid antigenic diversification in P. falciparum.

View details for DOI 10.1101/2023.02.02.526885

View details for PubMedID 36778235
A multiplexed in vivo approach to identify driver genes in small cell lung cancer. Cell reports Lee, M. C., Cai, H., Murray, C. W., Li, C., Shue, Y. T., Andrejka, L., He, A. L., Holzem, A. M., Drainas, A. P., Ko, J. H., Coles, G. L., Kong, C., Zhu, S., Zhu, C., Wang, J., van de Rijn, M., Petrov, D. A., Winslow, M. M., Sage, J. 2023; 42 (1): 111990

Abstract

Small cell lung cancer (SCLC) is a lethal form of lung cancer. Here, we develop a quantitative multiplexed approach on the basis of lentiviral barcoding with somatic CRISPR-Cas9-mediated genome editing to functionally investigate candidate regulators of tumor initiation and growth in genetically engineered mouse models of SCLC. We found that naphthalene pre-treatment enhances lentiviral vector-mediated SCLC initiation, enabling high multiplicity of tumor clones for analysis through high-throughput sequencing methods. Candidate drivers of SCLC identified from a meta-analysis across multiple human SCLC genomic datasets were tested using this approach, which defines both positive and detrimental impacts of inactivating 40 genes across candidate pathways on SCLC development. This analysis and subsequent validation in human SCLC cells establish TSC1 in the PI3K-AKT-mTOR pathway as a robust tumor suppressor in SCLC. This approach should illuminate drivers of SCLC, facilitate the development of precision therapies for defined SCLC genotypes, and identify therapeutic targets.

View details for DOI 10.1016/j.celrep.2023.111990

View details for PubMedID 36640300
Multiplexed screens identify RAS paralogues HRAS and NRAS as suppressors of KRAS-driven lung cancer growth. Nature cell biology Tang, R., Shuldiner, E. G., Kelly, M., Murray, C. W., Hebert, J. D., Andrejka, L., Tsai, M. K., Hughes, N. W., Parker, M. I., Cai, H., Li, Y. C., Wahl, G. M., Dunbrack, R. L., Jackson, P. K., Petrov, D. A., Winslow, M. M. 2023

Abstract

Oncogenic KRAS mutations occur in approximately 30% of lung adenocarcinoma. Despite several decades of effort, oncogenic KRAS-driven lung cancer remains difficult to treat, and our understanding of the regulators of RAS signalling is incomplete. Here, to uncover the impact of diverse KRAS-interacting proteins on lung cancer growth, we combined multiplexed somatic CRISPR/Cas9-based genome editing in genetically engineered mouse models with tumour barcoding and high-throughput barcode sequencing. Through a series of CRISPR/Cas9 screens in autochthonous lung cancer models, we show that HRAS and NRAS are suppressors of KRAS G12D-driven tumour growth in vivo and confirm these effects in oncogenic KRAS-driven human lung cancer cell lines. Mechanistically, RAS paralogues interact with oncogenic KRAS, suppress KRAS-KRAS interactions, and reduce downstream ERK signalling. Furthermore, HRAS and NRAS mutations identified in oncogenic KRAS-driven human tumours partially abolished this effect. By comparing the tumour-suppressive effects of HRAS and NRAS in oncogenic KRAS- and oncogenic BRAF-driven lung cancer models, we confirm that RAS paralogues are specific suppressors of KRAS-driven lung cancer in vivo. Our study outlines a technological avenue to uncover positive and negative regulators of oncogenic KRAS-driven cancer in a multiplexed manner in vivo and highlights the role of RAS paralogue imbalance in oncogenic KRAS-driven lung cancer.

View details for DOI 10.1038/s41556-022-01049-w

View details for PubMedID 36635501
Genome Report: Chromosome-level draft assemblies of the snow leopard, African leopard, and tiger (Panthera uncia, Panthera pardus pardus, and Panthera tigris). G3 (Bethesda, Md.) Armstrong, E. E., Campana, M. G., Solari, K. A., Morgan, S. R., Ryder, O. A., Naude, V. N., Samelius, G., Sharma, K., Hadly, E. A., Petrov, D. A. 2022

Abstract

The big cats (genus Panthera) represent some of the most popular and charismatic species on the planet. Although some reference genomes are available for this clade, few are at the chromosome level, inhibiting high-resolution genomic studies. We assembled genomes from three members of the genus, the tiger (Panthera tigris), the snow leopard (Panthera uncia), and the African leopard (Panthera pardus pardus), at chromosome or near-chromosome level. We used a combination of short- and long-read technologies, as well as proximity ligation data from Hi-C technology, to achieve high continuity and contiguity for each individual. We hope these genomes will aid in further evolutionary and conservation research of this iconic group of mammals.

View details for DOI 10.1093/g3journal/jkac277

View details for PubMedID 36250809
Most cancers carry a substantial deleterious load due to Hill-Robertson interference. eLife Tilk, S., Tkachenko, S., Curtis, C., Petrov, D. A., McFarland, C. D. 2022; 11

Abstract

Cancer genomes exhibit surprisingly weak signatures of negative selection. This may be because selective pressures are relaxed or because genome-wide linkage prevents deleterious mutations from being removed (Hill-Robertson interference). By stratifying tumors by their genome-wide mutational burden, we observe negative selection (dN/dS ~ 0.56) in low mutational burden tumors, while remaining cancers exhibit dN/dS ratios ~1. This suggests that most tumors do not remove deleterious passengers. To buffer against deleterious passengers, tumors upregulate heat shock pathways as their mutational burden increases. Finally, evolutionary modeling finds that Hill-Robertson interference alone can reproduce patterns of attenuated selection and estimates the total fitness cost of passengers to be 46% per cell on average. Collectively, our findings suggest that the lack of observed negative selection in most tumors is not due to relaxed selective pressures, but rather the inability of selection to remove deleterious mutations in the presence of genome-wide linkage.

View details for DOI 10.7554/eLife.67790

View details for PubMedID 36047771
Dissecting the role of Stag2 in lung adenocarcinoma Ashkin, E. L., Cai, H., Tang, Y. J., Li, C., Chew, S., Hung, K., Belk, J., Karmakar, S., Hebert, J., Yousefi, M., Swanton, C., Petrov, D. A., Winslow, M. AMER ASSOC CANCER RESEARCH. 2022

View details for Web of Science ID 000892509506122
A journey to deconvolute the multifaceted functions and context-dependency of cancer driver genes Cai, H., Chew, S., Li, C., Murray, C. W., Andrejka, L., Hebert, J. D., Tsai, M. K., Tang, R., Hughes, N. W., Shuldiner, E. G., Ashkin, E. L., Lee, S. C., Yousefi, M., Petrov, D. A., Swanton, C., Winslow, M. W. AMER ASSOC CANCER RESEARCH. 2022

View details for Web of Science ID 000892509506127
A quantitative in vivo pharmacogenomics platform uncovers biomarkers of therapy response Rosen, M., Amar, D., Winters, I., Rizvi, H., Nie, W., Wall, G., Petrov, D., Winslow, M., Rudin, C., Juan, J. AMER ASSOC CANCER RESEARCH. 2022

View details for Web of Science ID 000892509505311
Combinatorial Inactivation of Tumor Suppressors Efficiently Initiates Lung Adenocarcinoma with Therapeutic Vulnerabilities. Cancer research Yousefi, M., Boross, G., Weiss, C., Murray, C. W., Hebert, J. D., Cai, H., Ashkin, E. L., Karmakar, S., Andrejka, L., Chen, L., Wang, M., Tsai, M. K., Lin, W., Li, C., Yakhchalian, P., Colon, C. I., Chew, S., Chu, P., Swanton, C., Kunder, C. A., Petrov, D. A., Winslow, M. M. 2022; 82 (8): 1589-1602

Abstract

Lung cancer is the leading cause of cancer death worldwide, with lung adenocarcinoma being the most common subtype. Many oncogenes and tumor suppressor genes are altered in this cancer type, and the discovery of oncogene mutations has led to the development of targeted therapies that have improved clinical outcomes. However, a large fraction of lung adenocarcinomas lacks mutations in known oncogenes, and the genesis and treatment of these oncogene-negative tumors remain enigmatic. Here, we perform iterative in vivo functional screens using quantitative autochthonous mouse model systems to uncover the genetic and biochemical changes that enable efficient lung tumor initiation in the absence of oncogene alterations. Generation of hundreds of diverse combinations of tumor suppressor alterations demonstrates that inactivation of suppressors of the RAS and PI3K pathways drives the development of oncogene-negative lung adenocarcinoma. Human genomic data and histology identified RAS/MAPK and PI3K pathway activation as a common event in oncogene-negative human lung adenocarcinomas. These Onc-negative RAS/PI3K tumors and related cell lines are vulnerable to pharmacologic inhibition of these signaling axes. These results transform our understanding of this prevalent yet understudied subtype of lung adenocarcinoma. SIGNIFICANCE: To address the large fraction of lung adenocarcinomas lacking mutations in proto-oncogenes for which targeted therapies are unavailable, this work uncovers driver pathways of oncogene-negative lung adenocarcinomas and demonstrates their therapeutic vulnerabilities.

View details for DOI 10.1158/0008-5472.CAN-22-0059

View details for PubMedID 35425962
Direct observation of adaptive tracking on ecological time scales in Drosophila. Science (New York, N.Y.) Rudman, S. M., Greenblum, S. I., Rajpurohit, S., Betancourt, N. J., Hanna, J., Tilk, S., Yokoyama, T., Petrov, D. A., Schmidt, P. 2022; 375 (6586): eabj7484

Abstract

Direct observation of evolution in response to natural environmental change can resolve fundamental questions about adaptation, including its pace, temporal dynamics, and underlying phenotypic and genomic architecture. We tracked the evolution of fitness-associated phenotypes and allele frequencies genome-wide in 10 replicate field populations of Drosophila melanogaster over 10 generations from summer to late fall. Adaptation was evident over each sampling interval (one to four generations), with exceptionally rapid phenotypic adaptation and large allele frequency shifts at many independent loci. The direction and basis of the adaptive response shifted repeatedly over time, consistent with the action of strong and rapidly fluctuating selection. Overall, we found clear phenotypic and genomic evidence of adaptive tracking occurring contemporaneously with environmental change, thus demonstrating the temporally dynamic nature of adaptation.

View details for DOI 10.1126/science.abj7484

View details for PubMedID 35298245
Revisiting the malaria hypothesis: accounting for polygenicity and pleiotropy. Trends in parasitology Ebel, E. R., Uricchio, L. H., Petrov, D. A., Egan, E. S. 2022

Abstract

The malaria hypothesis predicts local, balancing selection of deleterious alleles that confer strong protection from malaria. Three protective variants, recently discovered in red cell genes, are indeed more common in African than European populations. Still, up to 89% of the heritability of severe malaria is attributed to many genome-wide loci with individually small effects. Recent analyses of hundreds of genome-wide association studies (GWAS) in humans suggest that most functional, polygenic variation is pleiotropic for multiple traits. Interestingly, GWAS alleles and red cell traits associated with small reductions in malaria risk are not enriched in African populations. We propose that other selective and neutral forces, in addition to malaria prevalence, explain the global distribution of most genetic variation impacting malaria risk.

View details for DOI 10.1016/j.pt.2021.12.007

View details for PubMedID 35065882
Tumor suppressor pathways shape EGFR-driven lung tumor progression and response to treatment. Molecular & cellular oncology Foggetti, G., Li, C., Cai, H., Petrov, D. A., Winslow, M. M., Politi, K. 2022; 9 (1): 1994328

Abstract

In vivo modeling combined with CRISPR/Cas9-mediated somatic genome editing has contributed to elucidating the functional importance of specific genetic alterations in human tumors. Our recent work uncovered tumor suppressor pathways that affect EGFR-driven lung tumor growth and sensitivity to tyrosine kinase inhibitors and reflect the mutational landscape and treatment outcomes in the human disease.

View details for DOI 10.1080/23723556.2021.1994328

View details for PubMedID 35252550

View details for PubMedCentralID PMC8890383
Tumor suppressor pathways shape EGFR-driven lung tumor progression and response to treatment MOLECULAR & CELLULAR ONCOLOGY Foggetti, G., Li, C., Cai, H., Petrov, D. A., Winslow, M. M., Politi, K. 2021

View details for DOI 10.1080/23723556.2021.1994328

View details for Web of Science ID 000742922000001
The Tetragnatha kauaiensis genome sheds light on the origins of genomic novelty in spiders. Genome biology and evolution Cerca, J., Armstrong, E. E., Vizueta, J., Fernandez, R., Dimitrov, D., Petersen, B., Prost, S., Rozas, J., Petrov, D., Gillespie, R. G. 2021

Abstract

Spiders (Araneae) have a diverse spectrum of morphologies, behaviours and physiologies. Attempts to understand the genomic basis of this diversity are often hindered by their large, heterozygous and AT-rich genomes with high repeat content, resulting in highly fragmented, poor-quality assemblies. As a result, the key attributes of spider genomes, including gene family evolution, repeat content, and gene function, remain poorly understood. Here, we used Illumina and Dovetail Chicago technologies to sequence the genome of the long-jawed spider Tetragnatha kauaiensis, producing an assembly distributed along 3,925 scaffolds with an N50 of 2 Mb. Using comparative genomics tools, we explore genome evolution across available spider assemblies. Our findings suggest that the previously reported and vast genome size variation in spiders is linked to the different representation and number of transposable elements. Using statistical tools to uncover gene-family level evolution, we find expansions associated with the sensory perception of taste, immunity and metabolism. In addition, we report strikingly different histories of chemosensory, venom and silk gene families, with the first two evolving much earlier, affected by the ancestral whole genome duplication in Arachnopulmonata (450 million years ago) and exhibiting higher numbers. Together, our findings reveal that spider genomes are highly variable and that genomic novelty may have been driven by the burst of an ancient whole genome duplication, followed by gene family and transposable element expansion.

View details for DOI 10.1093/gbe/evab262

View details for PubMedID 34849853
Common host variation drives malaria parasite fitness in healthy human red cells. eLife Ebel, E. R., Kuypers, F. A., Lin, C., Petrov, D. A., Egan, E. S. 2021; 10

Abstract

The replication of Plasmodium falciparum parasites within red blood cells (RBCs) causes severe disease in humans, especially in Africa. Deleterious alleles like hemoglobin S are well-known to confer strong resistance to malaria, but the effects of common RBC variation are largely undetermined. Here we collected fresh blood samples from 121 healthy donors, most with African ancestry, and performed exome sequencing, detailed RBC phenotyping, and parasite fitness assays. Over one third of healthy donors unknowingly carried alleles for G6PD deficiency or hemoglobinopathies, which were associated with characteristic RBC phenotypes. Among non-carriers alone, variation in RBC hydration, membrane deformability, and volume was strongly associated with P. falciparum growth rate. Common genetic variants in PIEZO1, SPTA1/SPTB, and several P. falciparum invasion receptors were also associated with parasite growth rate. Interestingly, we observed little or negative evidence for divergent selection on non-pathogenic RBC variation between Africans and Europeans. These findings suggest a model in which globally widespread variation in a moderate number of genes and phenotypes modulates P. falciparum fitness in RBCs.

View details for DOI 10.7554/eLife.69808

View details for PubMedID 34553687
Common host variation drives malaria parasite fitness in healthy human red cells ELIFE Ebel, E. R., Kuypers, F. A., Lin, C., Petrov, D. A., Egan, E. S. 2021; 10

View details for DOI 10.7554/eLife.69808.sa2

View details for Web of Science ID 000706286200001
Richard C. Lewontin (1929-2021). Science (New York, N.Y.) Berry, A., Petrov, D. A. 2021; 373 (6556): 745

View details for DOI 10.1126/science.abl5430

View details for PubMedID 34385385
Highly contiguous assemblies of 101 drosophilid genomes. eLife Kim, B. Y., Wang, J., Miller, D. E., Barmina, O., Delaney, E. K., Thompson, A., Comeault, A. A., Peede, D., D'Agostino, E. R., Pelaez, J., Aguilar, J. M., Haji, D., Matsunaga, T., Armstrong, E., Zych, M., Ogawa, Y., Stamenkovic-Radak, M., Jelic, M., Veselinovic, M. S., Tanaskovic, M., Eric, P., Gao, J., Katoh, T. K., Toda, M. J., Watabe, H., Watada, M., Davis, J. S., Moyle, L., Manoli, G., Bertolini, E., Kostal, V., Hawley, R. S., Takahashi, A., Jones, C. D., Price, D. K., Whiteman, N. K., Kopp, A., Matute, D. R., Petrov, D. A. 2021; 10

Abstract

Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.

View details for DOI 10.7554/eLife.66405

View details for PubMedID 34279216
Quantitative in vivo analyses reveal a complex pharmacogenomic landscape in lung adenocarcinoma. Cancer research Li, C., Lin, W., Rizvi, H., Cai, H., McFarland, C. D., Rogers, Z. N., Yousefi, M., Winters, I. P., Rudin, C. M., Petrov, D. A., Winslow, M. M. 2021

Abstract

The lack of knowledge about the relationship between tumor genotypes and therapeutic responses remains one of the most critical gaps in enabling the effective use of cancer therapies. Here we couple a multiplexed and quantitative experimental platform with robust statistical methods to enable pharmacogenomic mapping of lung cancer treatment responses in vivo. The complex map of genotype-specific treatment responses uncovered that over 20% of possible interactions show significant resistance or sensitivity. Known and novel interactions were identified, and one of these interactions, the resistance of KEAP1 mutant lung tumors to platinum therapy, was validated using a large patient response dataset. These results highlight the broad impact of tumor suppressor genotype on treatment responses and define a strategy to identify the determinants of precision therapies.

View details for DOI 10.1158/0008-5472.CAN-21-0716

View details for PubMedID 34215621
Broad geographic sampling reveals the shared basis and environmental correlates of seasonal adaptation in Drosophila. eLife Machado, H. E., Bergland, A., Taylor, R. W., Tilk, S., Behrman, E., Dyer, K., Fabian, D. K., Flatt, T., Gonzalez, J., Karasov, T. L., Kim, B. Y., Kozeretska, I., Lazzaro, B. P., Merritt, T., Pool, J. E., O'Brien, K., Rajpurohit, S., Roy, P. R., Schaeffer, S. W., Serga, S., Schmidt, P., Petrov, D. A. 2021; 10

Abstract

To advance our understanding of adaptation to temporally varying selection pressures, we identified signatures of seasonal adaptation occurring in parallel among Drosophila melanogaster populations. Specifically, we estimated allele frequencies genome-wide from flies sampled early and late in the growing season from 20 widely dispersed populations. We identified parallel seasonal allele frequency shifts across North America and Europe, demonstrating that seasonal adaptation is a general phenomenon of temperate fly populations. Seasonally fluctuating polymorphisms are enriched in large chromosomal inversions and we find a broad concordance between seasonal and spatial allele frequency change. The direction of allele frequency change at seasonally variable polymorphisms can be predicted by weather conditions in the weeks prior to sampling, linking the environment and the genomic response to selection. Our results suggest that fluctuating selection is an important evolutionary force affecting patterns of genetic variation in Drosophila.

View details for DOI 10.7554/eLife.67577

View details for PubMedID 34155971
Functional biology in its natural context: A search for emergent simplicity. eLife Bergelson, J., Kreitman, M., Petrov, D. A., Sanchez, A., Tikhonov, M. 2021; 10

Abstract

The immeasurable complexity at every level of biological organization creates a daunting task for understanding biological function. Here, we highlight the risks of stripping it away at the outset and discuss a possible path toward arriving at emergent simplicity of understanding while still embracing the ever-changing complexity of biotic interactions that we see in nature.

View details for DOI 10.7554/eLife.67646

View details for PubMedID 34096867
The cis-regulatory effects of modern human-specific variants. eLife Weiss, C. V., Harshman, L., Inoue, F., Fraser, H. B., Petrov, D. A., Ahituv, N., Gokhman, D. 2021; 10

Abstract

The Neanderthal and Denisovan genomes enabled the discovery of sequences that differ between modern and archaic humans, the majority of which are noncoding. However, our understanding of the regulatory consequences of these differences remains limited, in part due to the decay of regulatory marks in ancient samples. Here, we used a massively parallel reporter assay in embryonic stem cells, neural progenitor cells, and bone osteoblasts to investigate the regulatory effects of the 14,042 single-nucleotide modern human-specific variants. Overall, 1791 (13%) of sequences containing these variants showed active regulatory activity, and 407 (23%) of these drove differential expression between human groups. Differentially active sequences were associated with divergent transcription factor binding motifs, and with genes enriched for vocal tract and brain anatomy and function. This work provides insight into the regulatory function of variants that emerged along the modern human lineage and the recent evolution of human gene expression.

View details for DOI 10.7554/eLife.63713

View details for PubMedID 33885362
The AMBRA1 E3 ligase adaptor regulates the stability of cyclin D. Nature Chaikovsky, A. C., Li, C., Jeng, E. E., Loebell, S., Lee, M. C., Murray, C. W., Cheng, R., Demeter, J., Swaney, D. L., Chen, S., Newton, B. W., Johnson, J. R., Drainas, A. P., Shue, Y. T., Seoane, J. A., Srinivasan, P., He, A., Yoshida, A., Hipkins, S. Q., McCrea, E., Poltorack, C. D., Krogan, N. J., Diehl, J. A., Kong, C., Jackson, P. K., Curtis, C., Petrov, D. A., Bassik, M. C., Winslow, M. M., Sage, J. 2021

Abstract

The initiation of cell division integrates a large number of intra- and extracellular inputs. D-type cyclins (hereafter, cyclin D) couple these inputs to the initiation of DNA replication. Increased levels of cyclin D promote cell division by activating cyclin-dependent kinases 4 and 6 (hereafter, CDK4/6), which in turn phosphorylate and inactivate the retinoblastoma tumour suppressor. Accordingly, increased levels and activity of cyclin D-CDK4/6 complexes are strongly linked to unchecked cell proliferation and cancer. However, the mechanisms that regulate levels of cyclin D are incompletely understood. Here we show that autophagy and beclin 1 regulator 1 (AMBRA1) is the main regulator of the degradation of cyclin D. We identified AMBRA1 in a genome-wide screen to investigate the genetic basis of the response to CDK4/6 inhibition. Loss of AMBRA1 results in high levels of cyclin D in cells and in mice, which promotes proliferation and decreases sensitivity to CDK4/6 inhibition. Mechanistically, AMBRA1 mediates ubiquitylation and proteasomal degradation of cyclin D as a substrate receptor for the cullin 4 E3 ligase complex. Loss of AMBRA1 enhances the growth of lung adenocarcinoma in a mouse model, and low levels of AMBRA1 correlate with worse survival in patients with lung adenocarcinoma. Thus, AMBRA1 regulates cellular levels of cyclin D, and contributes to cancer development and the response of cancer cells to CDK4/6 inhibitors.

View details for DOI 10.1038/s41586-021-03474-7

View details for PubMedID 33854239
Historical trends and new surveillance of Plasmodium falciparum drug resistance markers in Angola. Malaria journal Ebel, E. R., Reis, F., Petrov, D. A., Beleza, S. 2021; 20 (1): 175

Abstract

BACKGROUND: Plasmodium falciparum resistance to chloroquine (CQ) and sulfadoxine-pyrimethamine (SP) has historically posed a major threat to malaria control throughout the world. The country of Angola officially replaced CQ with artemisinin-based combination therapy (ACT) as a first-line treatment in 2006, but malaria cases and deaths have recently been rising. Many classic resistance mutations are relevant for the efficacy of currently available drugs, making it important to continue monitoring their frequency in Angola. METHODS: Plasmodium falciparum DNA was sampled from the blood of 50 hospital patients in Cabinda, Angola from October-December of 2018. Each infection was genotyped for 13 alleles in the genes crt, mdr1, dhps, dhfr, and kelch13, which are collectively involved in resistance to six common anti-malarials. To compare frequency patterns over time, P. falciparum genotype data were also collated from studies published from across Angola in the last two decades. RESULTS: The two most important alleles for CQ resistance, crt 76T and mdr1 86Y, were found at respective frequencies of 71.4% and 6.5%. Historical data suggest that mdr1 N86 has been steadily replacing 86Y throughout Angola in the last decade, while the frequency of crt 76T has been more variable across studies. Over a third of new samples from Cabinda were 'quintuple mutants' for SP resistance in dhfr/dhps, with a sixth mutation at dhps A581G present at 9.6% frequency. The markers dhfr 51I, dhfr 108N, and dhps 437G have been nearly fixed in Angola since the early 2000s, whereas dhfr 59R may have risen to high frequency more recently. Finally, no non-synonymous polymorphisms were detected in kelch13, which is involved in artemisinin resistance in Southeast Asia. CONCLUSIONS: Genetic markers of P. falciparum resistance to CQ are likely declining in frequency in Angola, consistent with the official discontinuation of CQ in 2006.
The high frequency of multiple genetic markers of SP resistance is consistent with the continued public and private use of SP. In the future, more complete haplotype data from mdr1, dhfr, and dhps will be critical for understanding the changing efficacy of multiple anti-malarial drugs. These data can be used to support effective drug policy decisions in Angola.

View details for DOI 10.1186/s12936-021-03713-2

View details for PubMedID 33827587
Genetic determinants of EGFR-Driven Lung Cancer Growth and Therapeutic Response In Vivo. Cancer discovery Foggetti, G., Li, C., Cai, H., Hellyer, J. A., Lin, W., Ayeni, D., Hastings, K., Choi, J., Wurtz, A., Andrejka, L., Maghini, D. G., Rashleigh, N., Levy, S., Homer, R., Gettinger, S. N., Diehn, M., Wakelee, H. A., Petrov, D. A., Winslow, M. M., Politi, K. 2021

Abstract

In lung adenocarcinoma, oncogenic EGFR mutations co-occur with many tumor suppressor gene alterations; however, the extent to which these contribute to tumor growth and response to therapy in vivo remains largely unknown. By quantifying the effects of inactivating ten putative tumor suppressor genes in a mouse model of EGFR-driven Trp53-deficient lung adenocarcinoma, we found that Apc, Rb1, or Rbm10 inactivation strongly promoted tumor growth. Unexpectedly, inactivation of Lkb1 or Setd2 - the strongest drivers of growth in a Kras-driven model - reduced EGFR-driven tumor growth. These results are consistent with mutational frequencies in human EGFR- and KRAS-driven lung adenocarcinomas. Furthermore, Keap1 inactivation reduced the sensitivity of EGFR-driven tumors to the EGFR inhibitor osimertinib, and mutations in the KEAP1 pathway were associated with decreased time on tyrosine kinase inhibitor treatment in patients. Our study highlights how the impact of genetic alterations differs across oncogenic contexts and that the fitness landscape shifts upon treatment.

View details for DOI 10.1158/2159-8290.CD-20-1385

View details for PubMedID 33707235
Detection of hard and soft selective sweeps from Drosophila melanogaster population genomic data. PLoS genetics Garud, N. R., Messer, P. W., Petrov, D. A. 2021; 17 (2): e1009373

Abstract

Whether hard sweeps or soft sweeps dominate adaptation has been a matter of much debate. Recently, we developed haplotype homozygosity statistics that (i) can detect both hard and soft sweeps with similar power and (ii) can classify the detected sweeps as hard or soft. The application of our method to population genomic data from a natural population of Drosophila melanogaster (DGRP) allowed us to rediscover three known cases of adaptation at the loci Ace, Cyp6g1, and CHKov1 known to be driven by soft sweeps, and detected additional candidate loci for recent and strong sweeps. Surprisingly, all of the top 50 candidates showed patterns much more consistent with soft rather than hard sweeps. Recently, Harris et al. 2018 criticized this work, suggesting that all the candidate loci detected by our haplotype statistics, including the positive controls, are unlikely to be sweeps at all and that instead these haplotype patterns can be more easily explained by complex neutral demographic models. They also claim that these neutral non-sweeps are likely to be hard instead of soft sweeps. Here, we reanalyze the DGRP data using a range of complex admixture demographic models and reconfirm our original published results suggesting that the majority of recent and strong sweeps in D. melanogaster are first likely to be true sweeps, and second, that they do appear to be soft. Furthermore, we discuss ways to take this work forward given that most demographic models employed in such analyses are necessarily too simple to capture the full demographic complexity, while more realistic models are unlikely to be inferred correctly because they require a large number of free parameters.

View details for DOI 10.1371/journal.pgen.1009373

View details for PubMedID 33635910
The clarifying role of time series data in the population genetics of HIV. PLoS genetics Feder, A. F., Pennings, P. S., Petrov, D. A. 2021; 17 (1): e1009050

Abstract

HIV can evolve remarkably quickly in response to antiretroviral therapies and the immune system. This evolution stymies treatment effectiveness and prevents the development of an HIV vaccine. Consequently, there has been a great interest in using population genetics to disentangle the forces that govern the HIV adaptive landscape (selection, drift, mutation, and recombination). Traditional population genetics approaches look at the current state of genetic variation and infer the processes that can generate it. However, because HIV evolves rapidly, we can also sample populations repeatedly over time and watch evolution in action. In this paper, we demonstrate how time series data can bound evolutionary parameters in a way that complements and informs traditional population genetic approaches. Specifically, we focus on our recent paper (Feder et al., 2016, eLife), in which we show that, as improved HIV drugs have led to fewer patients failing therapy due to resistance evolution, less genetic diversity has been maintained following the fixation of drug resistance mutations. Because soft sweeps of multiple drug resistance mutations spreading simultaneously have been previously documented in response to the less effective HIV therapies used early in the epidemic, we interpret the maintenance of post-sweep diversity in response to poor therapies as further evidence of soft sweeps and therefore a high population mutation rate (θ) in these intra-patient HIV populations. Because improved drugs resulted in rarer resistance evolution accompanied by lower post-sweep diversity, we suggest that both observations can be explained by decreased population mutation rates and a resultant transition to hard selective sweeps. 
A recent paper (Harris et al., 2018, PLOS Genetics) proposed an alternative interpretation: Diversity maintenance following drug resistance evolution in response to poor therapies may have been driven by recombination during slow, hard selective sweeps of single mutations. Then, if better drugs have led to faster hard selective sweeps of resistance, recombination will have less time to rescue diversity during the sweep, recapitulating the decrease in post-sweep diversity as drugs have improved. In this paper, we use time series data to show that drug resistance evolution during ineffective treatment is very fast, providing new evidence that soft sweeps drove early HIV treatment failure.

View details for DOI 10.1371/journal.pgen.1009050

View details for PubMedID 33444376
Widespread introgression across a phylogeny of 155 Drosophila genomes. Current biology : CB Suvorov, A., Kim, B. Y., Wang, J., Armstrong, E. E., Peede, D., D'Agostino, E. R., Price, D. K., Waddell, P., Lang, M., Courtier-Orgogozo, V., David, J. R., Petrov, D., Matute, D. R., Schrider, D. R., Comeault, A. A. 2021

Abstract

Genome-scale sequence data have invigorated the study of hybridization and introgression, particularly in animals. However, outside of a few notable cases, we lack systematic tests for introgression at a larger phylogenetic scale across entire clades. Here, we leverage 155 genome assemblies from 149 species to generate a fossil-calibrated phylogeny and conduct multilocus tests for introgression across 9 monophyletic radiations within the genus Drosophila. Using complementary phylogenomic approaches, we identify widespread introgression across the evolutionary history of Drosophila. Mapping gene-tree discordance onto the phylogeny revealed that both ancient and recent introgression has occurred across most of the 9 clades that we examined. Our results provide the first evidence of introgression occurring across the evolutionary history of Drosophila and highlight the need to continue to study the evolutionary consequences of hybridization and introgression in this genus and across the tree of life.

View details for DOI 10.1016/j.cub.2021.10.052

View details for PubMedID 34788634
Drosophila Evolution over Space and Time (DEST) - A New Population Genomics Resource. Molecular biology and evolution Kapun, M., Nunez, J. C., Bogaerts-Márquez, M., Murga-Moreno, J., Paris, M., Outten, J., Coronado-Zamora, M., Tern, C., Rota-Stabelli, O., García Guerreiro, M. P., Casillas, S., Orengo, D. J., Puerma, E., Kankare, M., Ometto, L., Loeschcke, V., Onder, B. S., Abbott, J. K., Schaeffer, S. W., Rajpurohit, S., Behrman, E. L., Schou, M. F., Merritt, T. J., Lazzaro, B. P., Glaser-Schmitt, A., Argyridou, E., Staubach, F., Wang, Y., Tauber, E., Serga, S. V., Fabian, D. K., Dyer, K. A., Wheat, C. W., Parsch, J., Grath, S., Veselinovic, M. S., Stamenkovic-Radak, M., Jelic, M., Buendía-Ruíz, A. J., Gómez-Julián, M. J., Espinosa-Jimenez, M. L., Gallardo-Jiménez, F. D., Patenkovic, A., Eric, K., Tanaskovic, M., Ullastres, A., Guio, L., Merenciano, M., Guirao-Rico, S., Horváth, V., Obbard, D. J., Pasyukova, E., Alatortsev, V. E., Vieira, C. P., Vieira, J., Torres, J. R., Kozeretska, I., Maistrenko, O. M., Montchamp-Moreau, C., Mukha, D. V., Machado, H. E., Lamb, K., Paulo, T., Yusuf, L., Barbadilla, A., Petrov, D., Schmidt, P., Gonzalez, J., Flatt, T., Bergland, A. O. 2021

Abstract

Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome datasets from natural populations of this species have been published over the last years. A major challenge is the integration of disparate datasets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic variant caller (SNAPE-pooled). We use this pipeline to generate the largest data repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in > 20 countries on four continents. Several of these locations have been sampled at different seasons across multiple years. This dataset, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental meta-data. A web-based genome browser and web portal provide easy access to the SNP dataset. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource which can be easily extended via future efforts for an even more extensive cosmopolitan dataset. Our resource will enable population geneticists to analyze spatio-temporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.

View details for DOI 10.1093/molbev/msab259

View details for PubMedID 34469576
Publisher Correction: Human-chimpanzee fused cells reveal cis-regulatory divergence underlying skeletal evolution. Nature genetics Gokhman, D., Agoglia, R. M., Kinnebrew, M., Gordon, W., Sun, D., Bajpai, V. K., Naqvi, S., Chen, C., Chan, A., Chen, C., Petrov, D. A., Ahituv, N., Zhang, H., Mishina, Y., Wysocka, J., Rohatgi, R., Fraser, H. B. 2021

View details for DOI 10.1038/s41588-021-00849-4

View details for PubMedID 33762754
Human-chimpanzee fused cells reveal cis-regulatory divergence underlying skeletal evolution. Nature genetics Gokhman, D., Agoglia, R. M., Kinnebrew, M., Gordon, W., Sun, D., Bajpai, V. K., Naqvi, S., Chen, C., Chan, A., Chen, C., Petrov, D. A., Ahituv, N., Zhang, H., Mishina, Y., Wysocka, J., Rohatgi, R., Fraser, H. B. 2021

Abstract

Gene regulatory divergence is thought to play a central role in determining human-specific traits. However, our ability to link divergent regulation to divergent phenotypes is limited. Here, we utilized human-chimpanzee hybrid induced pluripotent stem cells to study gene expression separating these species. The tetraploid hybrid cells allowed us to separate cis- from trans-regulatory effects, and to control for nongenetic confounding factors. We differentiated these cells into cranial neural crest cells, the primary cell type giving rise to the face. We discovered evidence of lineage-specific selection on the hedgehog signaling pathway, including a human-specific sixfold down-regulation of EVC2 (LIMBIN), a key hedgehog gene. Inducing a similar down-regulation of EVC2 substantially reduced hedgehog signaling output. Mice and humans lacking functional EVC2 show striking phenotypic parallels to human-chimpanzee craniofacial differences, suggesting that the regulatory divergence of hedgehog signaling may have contributed to the unique craniofacial morphology of humans.

View details for DOI 10.1038/s41588-021-00804-3

View details for PubMedID 33731941
Recent evolutionary history of tigers highlights contrasting roles of genetic drift and selection. Molecular biology and evolution Armstrong, E. E., Khan, A., Taylor, R. W., Gouy, A., Greenbaum, G., Thiéry, A., Kang, J. T., Redondo, S. A., Prost, S., Barsh, G., Kaelin, C., Phalke, S., Chugani, A., Gilbert, M., Miquelle, D., Zachariah, A., Borthakur, U., Reddy, A., Louis, E., Ryder, O. A., Jhala, Y. V., Petrov, D., Excoffier, L., Hadly, E., Ramakrishnan, U. 2021

Abstract

Species conservation can be improved by knowledge of evolutionary and genetic history. Tigers are among the most charismatic of endangered species and garner significant conservation attention. However, their evolutionary history and genomic variation remain poorly known, especially for Indian tigers. With 70% of the world's wild tigers living in India, such knowledge is critical. We re-sequenced 65 individual tiger genomes representing most extant subspecies with a specific focus on tigers from India. As suggested by earlier studies, we found strong genetic differentiation between the putative tiger subspecies. Despite high total genomic diversity in India, individual tigers host longer runs of homozygosity, potentially suggesting recent inbreeding or founding events, possibly due to small and fragmented protected areas. We suggest the impacts of ongoing connectivity loss on inbreeding and persistence of Indian tigers be closely monitored. Surprisingly, demographic models suggest recent divergence (within the last 20,000 years) between subspecies, and strong population bottlenecks. Amur tiger genomes revealed the strongest signals of selection related to metabolic adaptation to cold, while Sumatran tigers show evidence of weak selection for genes involved in body size regulation. We recommend detailed investigation of local adaptation in Amur and Sumatran tigers prior to initiating genetic rescue.

View details for DOI 10.1093/molbev/msab032

View details for PubMedID 33592092
A functional taxonomy of tumor suppression in oncogenic KRAS-driven lung cancer. Cancer discovery Cai, H., Chew, S. K., Li, C., Tsai, M. K., Andrejka, L., Murray, C. W., Hughes, N. W., Shuldiner, E. G., Ashkin, E. L., Tang, R., Hung, K. L., Chen, L. C., Lee, S. Y., Yousefi, M., Lin, W. Y., Kunder, C. A., Cong, L., McFarland, C. D., Petrov, D. A., Swanton, C., Winslow, M. M. 2021

Abstract

Cancer genotyping has identified a large number of putative tumor suppressor genes. Carcinogenesis is a multi-step process; however, the importance and specific roles of many of these genes during tumor initiation, growth and progression remain unknown. Here we use a multiplexed mouse model of oncogenic KRAS-driven lung cancer to quantify the impact of forty-eight known and putative tumor suppressor genes on diverse aspects of carcinogenesis at an unprecedented scale and resolution. We uncover many previously understudied functional tumor suppressors that constrain cancer in vivo. Inactivation of some genes substantially increased growth, while the inactivation of others increased tumor initiation and/or the emergence of exceptionally large tumors. These functional in vivo analyses revealed an unexpectedly complex landscape of tumor suppression that has implications for understanding cancer evolution, interpreting clinical cancer genome sequencing data, and directing approaches to limit tumor initiation and progression.

View details for DOI 10.1158/2159-8290.CD-20-1325

View details for PubMedID 33608386
Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nature neuroscience Zhu, X., Zhou, B., Pattni, R., Gleason, K., Tan, C., Kalinowski, A., Sloan, S., Fiston-Lavier, A. S., Mariani, J., Petrov, D., Barres, B. A., Duncan, L., Abyzov, A., Vogel, H., Moran, J. V., Vaccarino, F. M., Tamminga, C. A., Levinson, D. F., Urban, A. E. 2021

Abstract

Retrotransposons can cause somatic genome variation in the human nervous system, which is hypothesized to have relevance to brain development and neuropsychiatric disease. However, the detection of individual somatic mobile element insertions presents a difficult signal-to-noise problem. Using a machine-learning method (RetroSom) and deep whole-genome sequencing, we analyzed L1 and Alu retrotransposition in sorted neurons and glia from human brains. We characterized two brain-specific L1 insertions in neurons and glia from a donor with schizophrenia. There was anatomical distribution of the L1 insertions in neurons and glia across both hemispheres, indicating retrotransposition occurred during early embryogenesis. Both insertions were within the introns of genes (CNNM2 and FRMD4A) inside genomic loci associated with neuropsychiatric disorders. Proof-of-principle experiments revealed these L1 insertions significantly reduced gene expression. These results demonstrate that RetroSom has broad applications for studies of brain development and may provide insight into the possible pathological effects of somatic retrotransposition.

View details for DOI 10.1038/s41593-020-00767-4

View details for PubMedID 33432196
Fitness variation across subtle environmental perturbations reveals local modularity and global pleiotropy of adaptation ELIFE Kinsler, G., Geiler-Samerotte, K., Petrov, D. 2020; 9

View details for Web of Science ID 000615779900001
Ancient RNA virus epidemics through the lens of recent adaptation in human genomes. Philosophical transactions of the Royal Society of London. Series B, Biological sciences Enard, D., Petrov, D. A. 2020; 375 (1812): 20190575

Abstract

Over the course of the last several million years of evolution, humans probably have been plagued by hundreds or perhaps thousands of epidemics. Little is known about such ancient epidemics and a deep evolutionary perspective on current pathogenic threats is lacking. The study of past epidemics has typically been limited in temporal scope to recorded history, and in physical scope to pathogens that left sufficient DNA behind, such as Yersinia pestis during the Great Plague. Host genomes, however, offer an indirect way to detect ancient epidemics beyond the current temporal and physical limits. Arms races with pathogens have shaped the genomes of the hosts by driving a large number of adaptations at many genes, and these signals can be used to detect and further characterize ancient epidemics. Here, we detect the genomic footprints left by ancient viral epidemics that took place in the past approximately 50 000 years in the 26 human populations represented in the 1000 Genomes Project. By using the enrichment in signals of adaptation at approximately 4500 host loci that interact with specific types of viruses, we provide evidence that RNA viruses have driven a particularly large number of adaptive events across diverse human populations. These results suggest that different types of viruses may have exerted different selective pressures during human evolution. Knowledge of these past selective pressures will provide a deeper evolutionary perspective on current pathogenic threats. This article is part of the theme issue 'Insights into health and disease from ancient biomolecules'.

View details for DOI 10.1098/rstb.2019.0575

View details for PubMedID 33012231
Genetic Adaptation in New York City Rats. Genome biology and evolution Harpak, A., Garud, N., Rosenberg, N. A., Petrov, D. A., Combs, M., Pennings, P. S., Munshi-South, J. 2020

Abstract

Brown rats (Rattus norvegicus) thrive in urban environments by navigating the anthropocentric environment and taking advantage of human resources and by-products. From the human perspective, rats are a chronic problem that causes billions of dollars in damage to agriculture, health and infrastructure. Did genetic adaptation play a role in the spread of rats in cities? To approach this question, we collected whole-genome sequences from 29 brown rats from New York City (NYC) and scanned for genetic signatures of adaptation. We tested for (i) high-frequency, extended haplotypes that could indicate selective sweeps and (ii) loci of extreme genetic differentiation between the NYC sample and a sample from the presumed ancestral range of brown rats in northeast China. We found candidate selective sweeps near or inside genes associated with metabolism, diet, the nervous system and locomotory behavior. Patterns of differentiation between NYC and Chinese rats at putative sweep loci suggest that many sweeps began after the split from the ancestral population. Together, our results suggest several hypotheses on adaptation in rats living in close proximity to humans.

View details for DOI 10.1093/gbe/evaa247

View details for PubMedID 33211096
Genetic determinants of EGFR-driven lung cancer growth and therapeutic response in vivo Foggetti, G., Li, C., Cai, H., Lin, W., Ayeni, D., Hastings, K., Andrejka, L., Maghini, D., Homer, R., Petrov, D. A., Winslow, M. M., Politi, K. AMER ASSOC CANCER RESEARCH. 2020

View details for DOI 10.1158/1538-7445.AM2020-1094

View details for Web of Science ID 000590059303367
Multiplexed functional cancer genomics. Cai, H., Li, C., Chew, S., Yousefi, M., Foggetti, G., Lin, W., Rogers, Z. N., Winters, I. P., McFarland, C. D., Politi, K., Swanton, C., Petrov, D. A., Winslow, M. M. AMER ASSOC CANCER RESEARCH. 2020: 23

View details for Web of Science ID 000537844900022
Pervasive Strong Selection at the Level of Codon Usage Bias in Drosophila melanogaster. Genetics Machado, H. E., Lawrie, D. S., Petrov, D. A. 2020; 214 (2): 511-528

Abstract

Codon usage bias (CUB), where certain codons are used more frequently than expected by chance, is a ubiquitous phenomenon and occurs across the tree of life. The dominant paradigm is that the proportion of preferred codons is set by weak selection. While experimental changes in codon usage have at times shown large phenotypic effects in contrast to this paradigm, genome-wide population genetic estimates have supported the weak selection model. Here we use deep genomic population sequencing of two Drosophila melanogaster populations to measure selection on synonymous sites in a way that allowed us to estimate the prevalence of both weak and strong purifying selection. We find that selection in favor of preferred codons ranges from weak (|Nes| ∼ 1) to strong (|Nes| > 10), with strong selection acting on 10-20% of synonymous sites in preferred codons. While previous studies indicated that selection at synonymous sites could be strong, this is the first study to detect and quantify strong selection specifically at the level of CUB. Further, we find that CUB-associated polymorphism accounts for the majority of strong selection on synonymous sites, with secondary contributions of splicing (selection on alternatively spliced genes, splice junctions, and spliceosome-bound sites) and transcription factor binding. Our findings support a new model of CUB and indicate that the functional importance of CUB, as well as of synonymous sites in general, has been underestimated.

View details for DOI 10.1534/genetics.119.302542

View details for PubMedID 33954361
Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long-read data. BMC biology Armstrong, E. E., Taylor, R. W., Miller, D. E., Kaelin, C. B., Barsh, G. S., Hadly, E. A., Petrov, D. 2020; 18 (1): 3

Abstract

BACKGROUND: The lion (Panthera leo) is one of the most popular and iconic feline species on the planet, yet in spite of its popularity, the last century has seen massive declines for lion populations worldwide. Genomic resources for endangered species represent an important way forward for the field of conservation, enabling high-resolution studies of demography, disease, and population dynamics. Here, we present a chromosome-level assembly from a captive African lion from the Exotic Feline Rescue Center (Center Point, IN) as a resource for current and subsequent genetic work of the sole social species of the Panthera clade. RESULTS: Our assembly is composed of 10x Genomics Chromium data, Dovetail Hi-C, and Oxford Nanopore long-read data. Synteny is highly conserved between the lion, other Panthera genomes, and the domestic cat. We find variability in the length of runs of homozygosity across lion genomes, indicating contrasting histories of recent and possibly intense inbreeding and bottleneck events. Demographic analyses reveal similar ancient histories across all individuals during the Pleistocene except the Asiatic lion, which shows a more rapid decline in population size. We show a substantial influence of the reference genome choice on the inference of demographic history and heterozygosity. CONCLUSIONS: We demonstrate that the choice of reference genome is important when comparing heterozygosity estimates across species and those inferred from different references should not be compared to each other. In addition, estimates of heterozygosity or the amount or length of runs of homozygosity should not be taken as reflective of a species, as these can differ substantially among individuals. This high-quality genome will greatly aid in the continuing research and conservation efforts for the lion, which is rapidly moving towards becoming a species in danger of extinction.

View details for DOI 10.1186/s12915-019-0734-5

View details for PubMedID 31915011
Fitness variation across subtle environmental perturbations reveals local modularity and global pleiotropy of adaptation. eLife Kinsler, G., Geiler-Samerotte, K., Petrov, D. A. 2020; 9

Abstract

Building a genotype-phenotype-fitness map of adaptation is a central goal in evolutionary biology. It is difficult even when adaptive mutations are known because it is hard to enumerate which phenotypes make these mutations adaptive. We address this problem by first quantifying how the fitness of hundreds of adaptive yeast mutants responds to subtle environmental shifts. We then model the number of phenotypes these mutations collectively influence by decomposing these patterns of fitness variation. We find that a small number of inferred phenotypes can predict fitness of the adaptive mutations near their original glucose-limited evolution condition. Importantly, inferred phenotypes that matter little to fitness at or near the evolution condition can matter strongly in distant environments. This suggests that adaptive mutations are locally modular (affecting a small number of phenotypes that matter to fitness in the environment where they evolved) yet globally pleiotropic (affecting additional phenotypes that may reduce or improve fitness in new environments).

View details for DOI 10.7554/eLife.61271

View details for PubMedID 33263280
Accurate Allele Frequencies from Ultra-low Coverage Pool-Seq Samples in Evolve-and-Resequence Experiments. G3 (Bethesda, Md.) Tilk, S., Bergland, A., Goodman, A., Schmidt, P., Petrov, D., Greenblum, S. 2019

Abstract

Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally-pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.

View details for DOI 10.1534/g3.119.400755

View details for PubMedID 31636085
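The core idea in the abstract above, deriving allele frequencies from inferred founder haplotype frequencies, can be sketched as a weighted sum over founders. This is a minimal sketch and not the HAF-pipe implementation: it assumes the founder haplotype frequencies in a window have already been inferred, that founders are homozygous with alleles coded 0/1, and the function name and toy data are hypothetical.

```python
def haplotype_derived_frequencies(founder_genotypes, haplotype_freqs):
    """Estimate per-SNP alternate-allele frequencies in one genomic window.

    founder_genotypes: one list per founder, with a 0/1 allele at each SNP.
    haplotype_freqs: inferred frequency of each founder haplotype (assumed given).
    """
    if len(founder_genotypes) != len(haplotype_freqs):
        raise ValueError("one frequency is required per founder")
    total = sum(haplotype_freqs)
    if total <= 0:
        raise ValueError("haplotype frequencies must sum to a positive value")
    n_snps = len(founder_genotypes[0])
    freqs = []
    for j in range(n_snps):
        # Each founder contributes its allele weighted by its haplotype frequency.
        weighted = sum(w * g[j] for g, w in zip(founder_genotypes, haplotype_freqs))
        freqs.append(weighted / total)
    return freqs


# Toy window (assumed data): three founders, three bi-allelic SNPs.
founders = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
weights = [0.5, 0.3, 0.2]
allele_freqs = haplotype_derived_frequencies(founders, weights)
```

Because every SNP in the window shares the same small set of haplotype frequencies, reads from the whole window inform each site, which is one way to see why shallow coverage can still yield usable estimates.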
Single nucleotide mapping of trait space reveals Pareto fronts that constrain adaptation. Nature ecology & evolution Li, Y., Petrov, D. A., Sherlock, G. 2019

Abstract

Trade-offs constrain the improvement of performance of multiple traits simultaneously. Such trade-offs define Pareto fronts, which represent a set of optimal individuals that cannot be improved in any one trait without reducing performance in another. Surprisingly, experimental evolution often yields genotypes with improved performance in all measured traits, perhaps indicating an absence of trade-offs at least in the short term. Here we densely sample adaptive mutations in Saccharomyces cerevisiae to ask whether first-step adaptive mutations result in trade-offs during the growth cycle. We isolated thousands of adaptive clones evolved under carefully chosen conditions and quantified their performances in each part of the growth cycle. We too find that some first-step adaptive mutations can improve all traits to a modest extent. However, our dense sampling allowed us to identify trade-offs and establish the existence of Pareto fronts between fermentation and respiration, and between respiration and stationary phases. Moreover, we establish that no single mutation in the ancestral genome can circumvent the detected trade-offs. Finally, we sequenced hundreds of these adaptive clones, revealing new targets of adaptation and defining the genetic basis of the identified trade-offs.

View details for DOI 10.1038/s41559-019-0993-0

View details for PubMedID 31611676
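To make the notion of a Pareto front in the abstract above concrete, the sketch below picks out the non-dominated points from a set of clones measured on two traits. It is a generic illustration rather than the authors' analysis; the trait values are invented and higher values are assumed to be better for every trait.

```python
def pareto_front(points):
    """Return the points that no other point dominates (higher is better)."""
    front = []
    for p in points:
        dominated = any(
            # q dominates p if it is at least as good everywhere and better somewhere.
            all(qi >= pi for qi, pi in zip(q, p))
            and any(qi > pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (trait 1, trait 2) performance values for five clones.
clones = [(1.0, 3.0), (2.0, 2.5), (3.0, 1.0), (1.5, 1.5), (2.0, 2.0)]
front = pareto_front(clones)
```

Clones on the front cannot improve in one trait without losing in the other, while the remaining clones sit behind the front and could in principle improve in both.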
Microbiome composition shapes rapid genomic adaptation of Drosophila melanogaster. Proceedings of the National Academy of Sciences of the United States of America Rudman, S. M., Greenblum, S., Hughes, R. C., Rajpurohit, S., Kiratli, O., Lowder, D. B., Lemmon, S. G., Petrov, D. A., Chaston, J. M., Schmidt, P. 2019

Abstract

Population genomic data has revealed patterns of genetic variation associated with adaptation in many taxa. Yet understanding the adaptive process that drives such patterns is challenging; it requires disentangling the ecological agents of selection, determining the relevant timescales over which evolution occurs, and elucidating the genetic architecture of adaptation. Doing so for the adaptation of hosts to their microbiome is of particular interest with growing recognition of the importance and complexity of host-microbe interactions. Here, we track the pace and genomic architecture of adaptation to an experimental microbiome manipulation in replicate populations of Drosophila melanogaster in field mesocosms. Shifts in microbiome composition altered population dynamics and led to divergence between treatments in allele frequencies, with regions showing strong divergence found on all chromosomes. Moreover, at divergent loci previously associated with adaptation across natural populations, we found that the more common allele in fly populations experimentally enriched for a certain microbial group was also more common in natural populations with high relative abundance of that microbial group. These results suggest that microbiomes may be an agent of selection that shapes the pattern and process of adaptation and, more broadly, that variation in a single ecological factor within a complex environment can drive rapid, polygenic adaptation over short timescales.

View details for DOI 10.1073/pnas.1907787116

View details for PubMedID 31527278
Evolutionary Dynamics in Structured Populations Under Strong Population Genetic Forces. G3 (Bethesda, Md.) Feder, A. F., Pennings, P. S., Hermisson, J., Petrov, D. A. 2019

Abstract

In the long-term neutral equilibrium, high rates of migration between subpopulations result in little population differentiation. However, in the short-term, even very abundant migration may not be enough for subpopulations to equilibrate immediately. In this study, we investigate dynamical patterns of short-term population differentiation in adapting populations via stochastic and analytical modeling through time. We characterize a regime in which selection and migration interact to create non-monotonic patterns of population differentiation over time when migration is weaker than selection, but stronger than drift. We demonstrate how these patterns can be leveraged to estimate high migration rates using approximate Bayesian computation. We apply this approach to estimate fast migration in a rapidly adapting intra-host Simian-HIV population sampled from different anatomical locations. We find differences in estimated migration rates between different compartments, even though all are above Nem = 1. This work demonstrates how studying demographic processes on the timescale of selective sweeps illuminates processes too fast to leave signatures on neutral timescales.

View details for DOI 10.1534/g3.119.400605

View details for PubMedID 31462443
Exploiting selection at linked sites to infer the rate and strength of adaptation NATURE ECOLOGY & EVOLUTION Uricchio, L. H., Petrov, D. A., Enard, D. 2019; 3 (6): 977–84

View details for DOI 10.1038/s41559-019-0890-6

View details for Web of Science ID 000470917200022
Empowering conservation practice with efficient and economical genotyping from poor quality samples. Methods in ecology and evolution Natesh, M., Taylor, R. W., Truelove, N. K., Hadly, E. A., Palumbi, S. R., Petrov, D. A., Ramakrishnan, U. 2019; 10 (6): 853-859

Abstract

Moderate- to high-density genotyping (100+ SNPs) is widely used to determine and measure individual identity, relatedness, fitness, population structure and migration in wild populations. However, these important tools are difficult to apply when high-quality genetic material is unavailable. Most genomic tools are developed for high-quality DNA sources from laboratory or medical settings. As a result, most genetic data from market or field settings is limited to easily amplified mitochondrial DNA or a few microsatellites. To enable genotyping in conservation contexts, we used next-generation sequencing of multiplex PCR products from very low-quality DNA extracted from faeces, hair and cooked samples. We demonstrated utility and wide-ranging potential application in endangered wild tigers and tracking commercial trade in Caribbean queen conch. We genotyped 100 SNPs from degraded tiger samples to identify individuals, discern close relatives and detect population differentiation. Co-occurring carnivores do not amplify (e.g. Indian wild dog/dhole) or are monomorphic (e.g. leopard). Sixty-two SNPs from conch fritters and field-collected samples were used to test relatedness and detect population structure. We provide proof of concept for a rapid, simple, cost-effective and scalable method (for both samples and number of loci), a framework that can be applied to other conservation scenarios previously limited by low-quality DNA samples. These approaches provide a critical advance for wildlife monitoring and forensics, open the door to field-ready testing, and will strengthen the use of science in policy decisions and wildlife trade.

View details for DOI 10.1111/2041-210X.13173

View details for PubMedID 31511786

View details for PubMedCentralID PMC6738957
Empowering conservation practice with efficient and economical genotyping from poor quality samples METHODS IN ECOLOGY AND EVOLUTION Natesh, M., Taylor, R. W., Truelove, N. K., Hadly, E. A., Palumbi, S. R., Petrov, D. A., Ramakrishnan, U. 2019; 10 (6): 853–59

View details for DOI 10.1111/2041-210X.13173

View details for Web of Science ID 000470017200011
Exploiting selection at linked sites to infer the rate and strength of adaptation. Nature ecology & evolution Uricchio, L. H., Petrov, D. A., Enard, D. 2019

Abstract

Genomic data encode past evolutionary events and have the potential to reveal the strength, rate and biological drivers of adaptation. However, joint estimation of adaptation rate (alpha) and adaptation strength remains challenging because evolutionary processes such as demography, linkage and non-neutral polymorphism can confound inference. Here, we exploit the influence of background selection to reduce the fixation rate of weakly beneficial alleles to jointly infer the strength and rate of adaptation. We develop a McDonald-Kreitman-based method to infer adaptation rate and strength, and estimate alpha=0.135 in human protein-coding sequences, 72% of which is contributed by weakly adaptive variants. We show that, in this adaptation regime, alpha is reduced ~25% by linkage genome-wide. Moreover, we show that virus-interacting proteins undergo adaptation that is both stronger and nearly twice as frequent as the genome average (alpha=0.224, 56% due to strongly beneficial alleles). Our results suggest that, while most adaptation in human proteins is weakly beneficial, adaptation to viruses is often strongly beneficial. Our method provides a robust framework for estimation of adaptation rate and strength across species.

View details for PubMedID 31061475
Cost-effective assembly of the African wild dog (Lycaon pictus) genome using linked reads GIGASCIENCE Armstrong, E. E., Taylor, R. W., Prost, S., Blinston, P., van der Meer, E., Madzikanda, H., Mufute, O., Mandisodza-Chikerema, R., Stuelpnagel, J., Sillero-Zubiri, C., Petrov, D. 2019; 8 (2)

View details for DOI 10.1093/gigascience/giy124

View details for Web of Science ID 000462551600001
Stress response, behavior, and development are shaped by transposable element-induced mutations in Drosophila. PLoS genetics Rech, G. E., Bogaerts-Marquez, M., Barron, M. G., Merenciano, M., Villanueva-Canas, J. L., Horvath, V., Fiston-Lavier, A., Luyten, I., Venkataram, S., Quesneville, H., Petrov, D. A., Gonzalez, J. 2019; 15 (2): e1007900

Abstract

Most of the current knowledge on the genetic basis of adaptive evolution is based on the analysis of single nucleotide polymorphisms (SNPs). Despite increasing evidence for their causal role, the contribution of structural variants to adaptive evolution remains largely unexplored. In this work, we analyzed the population frequencies of 1,615 Transposable Element (TE) insertions annotated in the reference genome of Drosophila melanogaster, in 91 samples from 60 worldwide natural populations. We identified a set of 300 polymorphic TEs that are present at high population frequencies, and located in genomic regions with high recombination rate, where the efficiency of natural selection is high. The age and the length of these 300 TEs are consistent with relatively young and long insertions reaching high frequencies due to the action of positive selection. Besides, we identified a set of 21 fixed TEs also likely to be adaptive. Indeed, we, and others, found evidence of selection for 84 of these reference TE insertions. The analysis of the genes located nearby these 84 candidate adaptive insertions suggested that the functional response to selection is related with the GO categories of response to stimulus, behavior, and development. We further showed that a subset of the candidate adaptive TEs affects expression of nearby genes, and five of them have already been linked to an ecologically relevant phenotypic effect. Our results provide a more complete understanding of the genetic variation and the fitness-related traits relevant for adaptive evolution. Similar studies should help uncover the importance of TE-induced adaptive mutations in other species as well.

View details for PubMedID 30753202
Stress response, behavior, and development are shaped by transposable element-induced mutations in Drosophila PLOS GENETICS Rech, G. E., Bogaerts-Marquez, M., Barron, M. G., Merenciano, M., Luis Villanueva-Canas, J., Horvath, V., Fiston-Lavier, A., Luyten, I., Venkataram, S., Quesneville, H., Petrov, D. A., Gonzalez, J. 2019; 15 (2)

View details for DOI 10.1371/journal.pgen.1007900

View details for Web of Science ID 000459970100013
Pervasive Strong Selection at the Level of Codon Usage Bias in Drosophila melanogaster. Genetics Machado, H. E., Lawrie, D. S., Petrov, D. A. 2019

Abstract

Codon usage bias (CUB), where certain codons are used more frequently than expected by chance, is a ubiquitous phenomenon and occurs across the tree of life. The dominant paradigm is that the proportion of preferred codons is set by weak selection. While experimental changes in codon usage have at times shown large phenotypic effects in contrast to this paradigm, genome-wide population genetic estimates have supported the weak selection model. Here we use deep genomic population sequencing of two Drosophila melanogaster populations to measure selection on synonymous sites in a way that allowed us to estimate the prevalence of both weak and strong purifying selection. We find that selection in favor of preferred codons ranges from weak (|Nes| ∼ 1) to strong (|Nes| > 10), with strong selection acting on 10-20% of synonymous sites in preferred codons. While previous studies indicated that selection at synonymous sites could be strong, this is the first study to detect and quantify strong selection specifically at the level of CUB. Further, we find that CUB-associated polymorphism accounts for the majority of strong selection on synonymous sites, with secondary contributions of splicing (selection on alternatively spliced genes, splice junctions and spliceosome-bound sites) and transcription factor binding. Our findings support a new model of CUB and indicate that the functional importance of CUB, as well as synonymous sites in general, have been underestimated.

View details for DOI 10.1534/genetics.119.302542

View details for PubMedID 31871131
MACHINE LEARNING ANALYSIS OF ULTRA-DEEP WHOLE-GENOME SEQUENCING IN HUMAN BRAIN REVEALS SOMATIC GENOMIC RETROTRANSPOSITION IN GLIA AS WELL AS IN NEURONS Urban, A., Zhu, X., Zhou, B., Sloan, S., Pattni, R., Fiston-Lavier, A., Snyder, M., Petrov, D., Abyzov, A., Vaccarino, F., Barres, B., Vogel, H., Tamminga, C., Levinson, D. ELSEVIER. 2019: 1240

View details for DOI 10.1016/j.euroneuro.2018.08.316

View details for Web of Science ID 000477708400398
Tissue-Specific cis-Regulatory Divergence Implicates eloF in Inhibiting Interspecies Mating in Drosophila CURRENT BIOLOGY Combs, P. A., Krupp, J. J., Khosla, N. M., Bua, D., Petrov, D. A., Levine, J. D., Fraser, H. B. 2018; 28 (24): 3969-+

View details for DOI 10.1016/j.cub.2018.10.036

View details for Web of Science ID 000453543800025
Tissue-Specific cis-Regulatory Divergence Implicates eloF in Inhibiting Interspecies Mating in Drosophila. Current biology : CB Combs, P. A., Krupp, J. J., Khosla, N. M., Bua, D., Petrov, D. A., Levine, J. D., Fraser, H. B. 2018

Abstract

Reproductive isolation is a key component of speciation. In many insects, a major driver of this isolation is cuticular hydrocarbon pheromones, which help to identify potential intraspecific mates [1-3]. When the distributions of related species overlap, there may be strong selection on mate choice for intraspecific partners [4-9] because interspecific hybridization carries significant fitness costs [10]. Drosophila has been a key model for the investigation of reproductive isolation; although both male and female mate choices have been extensively investigated [6,11-16], the genes underlying species recognition remain largely unknown. To explore the molecular mechanisms underlying Drosophila speciation, we measured tissue-specific cis-regulatory divergence using RNA sequencing (RNA-seq) in D. simulans × D. sechellia hybrids. By focusing on cis-regulatory changes specific to female oenocytes, the tissue that produces cuticular hydrocarbons, we rapidly identified a small number of candidate genes. We found that one of these, the fatty acid elongase eloF, broadly affects the hydrocarbons present on D. sechellia and D. melanogaster females, as well as the propensity of D. simulans males to mate with them. Therefore, cis-regulatory changes in eloF may be a major driver in the sexual isolation of D. simulans from multiple other species. Our RNA-seq approach proved to be far more efficient than quantitative trait locus (QTL) mapping in identifying candidate genes; the same framework can be used to pinpoint candidate drivers of cis-regulatory divergence in traits differing between any interfertile species.

View details for PubMedID 30503619
Cost-effective assembly of the African wild dog (Lycaon pictus) genome using linked reads. GigaScience Armstrong, E. E., Taylor, R. W., Prost, S., Blinston, P., van der Meer, E., Madzikanda, H., Mufute, O., Mandisodza-Chikerema, R., Stuelpnagel, J., Sillero-Zubiri, C., Petrov, D. 2018

Abstract

Background: A high-quality reference genome assembly is a valuable tool for the study of non-model organisms. Genomic techniques can provide important insights about past population sizes, local adaptation, and aid in the development of breeding management plans. This information is important for fields like conservation genetics, where endangered species require critical and immediate attention. However, funding for genomic-based methods can be sparse for conservation projects, as costs for general species management can consume budgets. Findings: Here we report the generation of high-quality reference genomes for the African wild dog (Lycaon pictus) at a low cost (< $3000), thereby facilitating future studies of this endangered canid. We generated assemblies for three individuals using the linked-read 10x Genomics Chromium system. The most continuous assembly had a scaffold and contig N50 of 21 Mb and 83 Kb, respectively, and completely reconstructed 95% of a set of conserved mammalian genes. Additionally, we estimate the heterozygosity and demographic history of African wild dogs, revealing that although they have historically low effective population sizes, heterozygosity remains high. Conclusions: We show that 10x Genomics Chromium data can be used to effectively generate high-quality genomes from Illumina short-read data of intermediate coverage (25-50x). Interestingly, the wild dog shows higher heterozygosity than other species of conservation concern, possibly due to its behavioral ecology. The availability of reference genomes for non-model organisms will facilitate better genetic monitoring of threatened species such as the African wild dog and help conservationists to better understand the ecology and adaptability of those species in a changing environment.

View details for PubMedID 30346553
Evidence that RNA Viruses Drove Adaptive Introgression between Neanderthals and Modern Humans CELL Enard, D., Petrov, D. A. 2018; 175 (2): 360-+

Abstract

Neanderthals and modern humans interbred at least twice in the past 100,000 years. While there is evidence that most introgressed DNA segments from Neanderthals to modern humans were removed by purifying selection, less is known about the adaptive nature of introgressed sequences that were retained. We hypothesized that interbreeding between Neanderthals and modern humans led to (1) the exposure of each species to novel viruses and (2) the exchange of adaptive alleles that provided resistance against these viruses. Here, we find that long, frequent (and more likely adaptive) segments of Neanderthal ancestry in modern humans are enriched for proteins that interact with viruses (VIPs). We found that VIPs that interact specifically with RNA viruses were more likely to belong to introgressed segments in modern Europeans. Our results show that retained segments of Neanderthal ancestry can be used to detect ancient epidemics.

View details for PubMedID 30290142

View details for PubMedCentralID PMC6176737
Spatiotemporal dynamics and genome-wide association analysis of desiccation tolerance in Drosophila melanogaster MOLECULAR ECOLOGY Rajpurohit, S., Gefen, E., Bergland, A. O., Petrov, D. A., Gibbs, A. G., Schmidt, P. S. 2018; 27 (17): 3525–40

Abstract

Water availability is a major environmental challenge to a variety of terrestrial organisms. In insects, desiccation tolerance varies predictably over spatial and temporal scales and is an important physiological determinant of fitness in natural populations. Here, we examine the dynamics of desiccation tolerance in North American populations of Drosophila melanogaster using: (a) natural populations sampled across latitudes and seasons; (b) experimental evolution in field mesocosms over seasonal time; (c) genome-wide associations to identify SNPs/genes associated with variation for desiccation tolerance; and (d) subsequent analysis of patterns of clinal/seasonal enrichment in existing pooled sequencing data of populations sampled in both North America and Australia. A cline in desiccation tolerance was observed, for which tolerance exhibited a positive association with latitude; tolerance also varied predictably with culture temperature, demonstrating a significant degree of thermal plasticity. Desiccation tolerance evolved rapidly in field mesocosms, although only males showed differences in desiccation tolerance between spring and autumn collections from natural populations. Water loss rates did not vary significantly among latitudinal or seasonal populations; however, changes in metabolic rates during prolonged exposure to dry conditions are consistent with increased tolerance in higher latitude populations. Genome-wide associations in a panel of inbred lines identified twenty-five SNPs in twenty-one loci associated with sex-averaged desiccation tolerance, but there is no robust signal of spatially varying selection on genes associated with desiccation tolerance. Together, our results suggest that desiccation tolerance is a complex and important fitness component that evolves rapidly and predictably in natural populations.

View details for PubMedID 30051644
Functional lung cancer genomics through in vivo genome editing Winters, I. P., Rogers, Z. N., McFarland, C. D., Lalgudi, P. V., Chiou, S., Kay, M. A., Petrov, D., Winslow, M. M. AMER ASSOC CANCER RESEARCH. 2018

View details for DOI 10.1158/1557-3265.AACRIASLC18-IA03

View details for Web of Science ID 000582244900004
Tripolar chromosome segregation drives the association between maternal genotype at variants spanning PLK4 and aneuploidy in human preimplantation embryos HUMAN MOLECULAR GENETICS McCoy, R. C., Newnham, L. J., Ottolini, C. S., Hoffmann, E. R., Chatzimeletiou, K., Cornejo, O. E., Zhan, Q., Zaninovic, N., Rosenwaks, Z., Petrov, D. A., Demko, Z. P., Sigurjonsson, S., Handyside, A. H. 2018; 27 (14): 2573–85

Abstract

Aneuploidy is prevalent in human embryos and is the leading cause of pregnancy loss. Many aneuploidies arise during oogenesis, increasing with maternal age. Superimposed on these meiotic aneuploidies are frequent errors occurring during early mitotic divisions, contributing to widespread chromosomal mosaicism. Here we reanalyzed a published dataset comprising preimplantation genetic testing for aneuploidy in 24 653 blastomere biopsies from day-3 cleavage-stage embryos, as well as 17 051 trophectoderm biopsies from day-5 blastocysts. We focused on complex abnormalities that affected multiple chromosomes simultaneously, seeking insights into their formation. In addition to well-described patterns such as triploidy and haploidy, we identified 4.7% of blastomeres possessing characteristic hypodiploid karyotypes. We inferred this signature to have arisen from tripolar chromosome segregation in normally fertilized diploid zygotes or their descendant diploid cells. This could occur via segregation on a tripolar mitotic spindle or by rapid sequential bipolar mitoses without an intervening S-phase. Both models are consistent with time-lapse data from an intersecting set of 77 cleavage-stage embryos, which were enriched for the tripolar signature among embryos exhibiting abnormal cleavage. The tripolar signature was strongly associated with common maternal genetic variants spanning the centrosomal regulator PLK4, driving the association we previously reported with overall mitotic errors. Our findings are consistent with the known capacity of PLK4 to induce tripolar mitosis or precocious M-phase upon dysregulation. Together, our data support tripolar chromosome segregation as a key mechanism generating complex aneuploidy in cleavage-stage embryos and implicate maternal genotype at a quantitative trait locus spanning PLK4 as a factor influencing its occurrence.

View details for PubMedID 29688390

View details for PubMedCentralID PMC6030883
Quantitative and multiplex analysis of the genomic determinants of tumorigenesis. Winters, I., Rogers, Z., McFarland, C., Petrov, D., Winslow, M. M. AMER ASSOC CANCER RESEARCH. 2018: 15–16

View details for Web of Science ID 000432307300004
Mapping the in vivo fitness landscape of lung adenocarcinoma tumor suppression in mice NATURE GENETICS Rogers, Z. N., McFarland, C. D., Winters, I. P., Seoane, J. A., Brady, J. J., Yoon, S., Curtis, C., Petrov, D. A., Winslow, M. M. 2018; 50 (4): 483-+

Abstract

The functional impact of most genomic alterations found in cancer, alone or in combination, remains largely unknown. Here we integrate tumor barcoding, CRISPR/Cas9-mediated genome editing and ultra-deep barcode sequencing to interrogate pairwise combinations of tumor suppressor alterations in autochthonous mouse models of human lung adenocarcinoma. We map the tumor suppressive effects of 31 common lung adenocarcinoma genotypes and identify a landscape of context dependence and differential effect strengths.

View details for PubMedID 29610476
Hidden Complexity of Yeast Adaptation under Simple Evolutionary Conditions CURRENT BIOLOGY Li, Y., Venkataram, S., Agarwala, A., Dunn, B., Petrov, D. A., Sherlock, G., Fisher, D. S. 2018; 28 (4): 515-+

Abstract

Few studies have "quantitatively" probed how adaptive mutations result in increased fitness. Even in simple microbial evolution experiments, with full knowledge of the underlying mutations and specific growth conditions, it is challenging to determine where within a growth-saturation cycle those fitness gains occur. A common implicit assumption is that most benefits derive from an increased exponential growth rate. Here, we instead show that, in batch serial transfer experiments, adaptive mutants' fitness gains can be dominated by benefits that are accrued in one growth cycle, but not realized until the next growth cycle. For thousands of evolved clones (most with only a single mutation), we systematically varied the lengths of fermentation, respiration, and stationary phases to assess how their fitness, as measured by barcode sequencing, depends on these phases of the growth-saturation-dilution cycles. These data revealed that, whereas all adaptive lineages gained similar and modest benefits from fermentation, most of the benefits for the highest fitness mutants came instead from the time spent in respiration. From monoculture and high-resolution pairwise fitness competition experiments for a dozen of these clones, we determined that the benefits "accrued" during respiration are only largely "realized" later as a shorter duration of lag phase in the following growth cycle. These results reveal hidden complexities of the adaptive process even under ostensibly simple evolutionary conditions, in which fitness gains can accrue during time spent in a growth phase with little cell division, and reveal that the memory of those gains can be realized in the subsequent growth cycle.

View details for PubMedID 29429618

View details for PubMedCentralID PMC5823527
Rapid seasonal evolution in innate immunity of wild Drosophila melanogaster PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES Behrman, E. L., Howick, V. M., Kapun, M., Staubach, F., Bergland, A. O., Petrov, D. A., Lazzaro, B. P., Schmidt, P. S. 2018; 285 (1870)

Abstract

Understanding the rate of evolutionary change and the genetic architecture that facilitates rapid adaptation is a current challenge in evolutionary biology. Comparative studies show that genes with immune function are among the most rapidly evolving genes across a range of taxa. Here, we use immune defence in natural populations of Drosophila melanogaster to understand the rate of evolution in natural populations and the genetics underlying rapid change. We probed the immune system using the natural pathogens Enterococcus faecalis and Providencia rettgeri to measure post-infection survival and bacterial load of wild D. melanogaster populations collected across seasonal time along a latitudinal transect along eastern North America (Massachusetts, Pennsylvania and Virginia). There are pronounced and repeatable changes in the immune response over the approximately 10 generations between spring and autumn collections, with a significant but less distinct difference observed among geographical locations. Genes with known immune function are not enriched among alleles that cycle with seasonal time, but the immune function of a subset of seasonally cycling alleles in immune genes was tested using reconstructed outbred populations. We find that flies containing seasonal alleles in Thioester-containing protein 3 (Tep3) have different functional responses to infection and that epistatic interactions among seasonal Tep3 and Drosomycin-like 6 (Dro6) alleles underlie the immune phenotypes observed in natural populations. This rapid, cyclic response to seasonal environmental pressure broadens our understanding of the complex ecological and genetic interactions determining the evolution of immune defence in natural populations.

View details for PubMedID 29321302

View details for PubMedCentralID PMC5784205
Seasonally fluctuating selection can maintain polymorphism at many loci via segregation lift PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wittmann, M. J., Bergland, A. O., Feldman, M. W., Schmidt, P. S., Petrov, D. A. 2017; 114 (46): E9932–E9941

Abstract

Most natural populations are affected by seasonal changes in temperature, rainfall, or resource availability. Seasonally fluctuating selection could potentially make a large contribution to maintaining genetic polymorphism in populations. However, previous theory suggests that the conditions for multilocus polymorphism are restrictive. Here, we explore a more general class of models with multilocus seasonally fluctuating selection in diploids. In these models, the multilocus genotype is mapped to fitness in two steps. The first mapping is additive across loci and accounts for the relative contributions of heterozygous and homozygous loci, that is, dominance. The second step uses a nonlinear fitness function to account for the strength of selection and epistasis. Using mathematical analysis and individual-based simulations, we show that stable polymorphism at many loci is possible if currently favored alleles are sufficiently dominant. This general mechanism, which we call "segregation lift," requires seasonal changes in dominance, a phenomenon that may arise naturally in situations with antagonistic pleiotropy and seasonal changes in the relative importance of traits for fitness. Segregation lift works best under diminishing-returns epistasis, is not affected by problems of genetic load, and is robust to differences in parameters across loci and seasons. Under segregation lift, loci can exhibit conspicuous seasonal allele-frequency fluctuations, but often fluctuations may be small and hard to detect. An important direction for future work is to formally test for segregation lift in empirical data and to quantify its contribution to maintaining genetic variation in natural populations.

View details for PubMedID 29087300
High rate of adaptation of mammalian proteins that interact with Plasmodium and related parasites PLOS GENETICS Ebel, E. R., Telis, N., Venkataram, S., Petrov, D. A., Enard, D. 2017; 13 (9): e1007023

Abstract

Plasmodium parasites, along with their Piroplasm relatives, have caused malaria-like illnesses in terrestrial mammals for millions of years. Several Plasmodium-protective alleles have recently evolved in human populations, but little is known about host adaptation to blood parasites over deeper evolutionary timescales. In this work, we analyze mammalian adaptation in ~500 Plasmodium- or Piroplasm-interacting proteins (PPIPs) manually curated from the scientific literature. We show that (i) PPIPs are enriched for both immune functions and pleiotropy with other pathogens, and (ii) the rate of adaptation across mammals is significantly elevated in PPIPs, compared to carefully matched control proteins. PPIPs with high pathogen pleiotropy show the strongest signatures of adaptation, but this pattern is fully explained by their immune enrichment. Several pieces of evidence suggest that blood parasites specifically have imposed selection on PPIPs. First, even non-immune PPIPs that lack interactions with other pathogens have adapted at twice the rate of matched controls. Second, PPIP adaptation is linked to high expression in the liver, a critical organ in the parasite life cycle. Finally, our detailed investigation of alpha-spectrin, a major red blood cell membrane protein, shows that domains with particularly high rates of adaptation are those known to interact specifically with P. falciparum. Overall, we show that host proteins that interact with Plasmodium and Piroplasm parasites have experienced elevated rates of adaptation across mammals, and provide evidence that some of this adaptation has likely been driven by blood parasites.

View details for PubMedID 28957326
A quantitative and multiplexed approach to uncover the fitness landscape of tumor suppression in vivo. Nature methods Rogers, Z. N., McFarland, C. D., Winters, I. P., Naranjo, S., Chuang, C., Petrov, D., Winslow, M. M. 2017

Abstract

Cancer growth is a multistage, stochastic evolutionary process. While cancer genome sequencing has been instrumental in identifying the genomic alterations that occur in human tumors, the consequences of these alterations on tumor growth remain largely unexplored. Conventional genetically engineered mouse models enable the study of tumor growth in vivo, but they are neither readily scalable nor sufficiently quantitative to unravel the magnitude and mode of action of many tumor-suppressor genes. Here, we present a method that integrates tumor barcoding with ultradeep barcode sequencing (Tuba-seq) to interrogate tumor-suppressor function in mouse models of human cancer. Tuba-seq uncovers genotype-dependent distributions of tumor sizes. By combining Tuba-seq with multiplexed CRISPR-Cas9-mediated genome editing, we quantified the effects of 11 tumor-suppressor pathways that are frequently altered in human lung adenocarcinoma. Tuba-seq enables the broad quantification of the function of tumor-suppressor genes with unprecedented resolution, parallelization, and precision.

View details for DOI 10.1038/nmeth.4297

View details for PubMedID 28530655
A spatio-temporal assessment of simian/human immunodeficiency virus (SHIV) evolution reveals a highly dynamic process within the host. PLoS pathogens Feder, A. F., Kline, C., Polacino, P., Cottrell, M., Kashuba, A. D., Keele, B. F., Hu, S., Petrov, D. A., Pennings, P. S., Ambrose, Z. 2017; 13 (5)

Abstract

The process by which drug-resistant HIV-1 arises and spreads spatially within an infected individual is poorly understood. Studies have found variable results relating how HIV-1 in the blood differs from virus sampled in tissues, offering conflicting findings about whether HIV-1 throughout the body is homogeneously distributed. However, most of these studies sample only two compartments and few have data from multiple time points. To directly measure how drug resistance spreads within a host and to assess how spatial structure impacts its emergence, we examined serial sequences from four macaques infected with RT-SHIVmne027, a simian immunodeficiency virus encoding HIV-1 reverse transcriptase (RT), and treated with RT inhibitors. Both viral DNA and RNA (vDNA and vRNA) were isolated from the blood (including plasma and peripheral blood mononuclear cells), lymph nodes, gut, and vagina at a median of four time points and RT was characterized via single-genome sequencing. The resulting sequences reveal a dynamic system in which vRNA rapidly acquires drug resistance concomitantly across compartments through multiple independent mutations. Fast migration results in the same viral genotypes present across compartments, but not so fast as to equilibrate their frequencies immediately. The blood and lymph nodes were found to be compartmentalized rarely, while both the blood and lymph node were more frequently different from mucosal tissues. This study suggests that even oft-sampled blood does not fully capture the viral dynamics in other parts of the body, especially the gut where vRNA turnover was faster than the plasma and vDNA retained fewer wild-type viruses than other sampled compartments. Our findings of transient compartmentalization across multiple tissues may help explain the varied results of previous compartmentalization studies in HIV-1.

View details for DOI 10.1371/journal.ppat.1006358

View details for PubMedID 28542550
Soft Selective Sweeps in Evolutionary Rescue. Genetics Wilson, B. A., Pennings, P. S., Petrov, D. A. 2017

Abstract

Evolutionary rescue occurs when a population that is declining in size because of an environmental change is rescued from extinction by genetic adaptation. Evolutionary rescue is an important phenomenon at the intersection of ecology and population genetics, and the study of evolutionary rescue is critical to understanding processes ranging from species conservation to the evolution of drug and pesticide resistance. While most population-genetic models of evolutionary rescue focus on estimating the probability of rescue, we focus on whether one or more adaptive lineages contribute to evolutionary rescue. We find that when evolutionary rescue is likely, it is often driven by soft selective sweeps where multiple adaptive mutations spread through the population simultaneously. We give full analytic results for the probability of evolutionary rescue and the probability that evolutionary rescue occurs via soft selective sweeps. We expect that these results will find utility in understanding the genetic signatures associated with various evolutionary rescue scenarios in large populations, such as the evolution of drug resistance in viral, bacterial, or eukaryotic pathogens.

View details for DOI 10.1534/genetics.116.191478

View details for PubMedID 28213477

View details for PubMedCentralID PMC5378114
Seeking Goldilocks During Evolution of Drug Resistance. PLoS biology Sherlock, G., Petrov, D. A. 2017; 15 (2)

Abstract

Speciation can occur when a population is split and the resulting subpopulations evolve independently, accumulating mutations over time that make them incompatible with one another. It is thought that such incompatible mutations, known as Bateson-Dobzhansky-Muller (BDM) incompatibilities, may arise when the two populations face different environments, which impose different selective pressures. However, a new study in PLOS Biology by Ono et al. finds that the first-step mutations selected in yeast populations evolving in parallel in the presence of the antifungal drug nystatin are frequently incompatible with one another. This incompatibility is environment dependent, such that the combination of two incompatible alleles can become advantageous under increasing drug concentrations. This suggests that the activity for the affected pathway must have an optimum level, the value of which varies according to the drug concentration. It is likely that many biological processes similarly have an optimum under a given environment and many single-step adaptive ways to reach it; thus, not only should BDM incompatibilities commonly arise during parallel evolution, they might be virtually inevitable, as the combination of two such steps is likely to overshoot the optimum.

View details for DOI 10.1371/journal.pbio.2001872

View details for PubMedID 28158184

View details for PubMedCentralID PMC5291373
Extremely Rare Polymorphisms in Saccharomyces cerevisiae Allow Inference of the Mutational Spectrum. PLoS genetics Zhu, Y. O., Sherlock, G., Petrov, D. A. 2017; 13 (1)

Abstract

The characterization of mutational spectra is usually carried out in one of three ways: by direct observation through mutation accumulation (MA) experiments, through parent-offspring sequencing, or by indirect inference from sequence data. Direct observations of spontaneous mutations with MA experiments are limited, given (i) the rarity of spontaneous mutations, (ii) applicability only to laboratory model species with short generation times, and (iii) the possibility that mutational spectra under lab conditions might be different from those observed in nature. Trio sequencing is an elegant solution, but it is not applicable in all organisms. Indirect inference, usually from divergence data, faces no such technical limitations, but relies upon critical assumptions regarding the strength of natural selection that are likely to be violated. Ideally, new mutational events would be directly observed before the biased filter of selection, and without the technical limitations common to lab experiments. One approach is to identify very young mutations from population sequencing data. Here we do so by leveraging two characteristics common to all new mutations: they are necessarily rare in the population, and absent in the genomes of immediate relatives. From 132 clinical yeast strains, we were able to identify 1,425 putatively new mutations and show that they exhibit extremely low signatures of selection, as well as display a mutational spectrum that is similar to that identified by a large scale MA experiment. We verify that population sequencing data are a potential wealth of information for inferring mutational spectra, and should be considered for analysis where MA experiments are infeasible or especially tedious.

View details for DOI 10.1371/journal.pgen.1006455

View details for PubMedID 28046117

View details for PubMedCentralID PMC5207638
Deep sequencing of natural and experimental populations of Drosophila melanogaster reveals biases in the spectrum of new mutations. Genome research Assaf, Z. J., Tilk, S., Park, J., Siegal, M. L., Petrov, D. A. 2017; 27 (12): 1988–2000

Abstract

Mutations provide the raw material of evolution, and thus our ability to study evolution depends fundamentally on having precise measurements of mutational rates and patterns. We generate a data set for this purpose using (1) de novo mutations from mutation accumulation experiments and (2) extremely rare polymorphisms from natural populations. The first, mutation accumulation (MA) lines are the product of maintaining flies in tiny populations for many generations, therefore rendering natural selection ineffective and allowing new mutations to accrue in the genome. The second, rare genetic variation from natural populations allows the study of mutation because extremely rare polymorphisms are relatively unaffected by the filter of natural selection. We use both methods in Drosophila melanogaster, first generating our own novel data set of sequenced MA lines and performing a meta-analysis of all published MA mutations (∼2000 events) and then identifying a high quality set of ∼70,000 extremely rare (≤0.1%) polymorphisms that are fully validated with resequencing. We use these data sets to precisely measure mutational rates and patterns. Highlights of our results include: a high rate of multinucleotide mutation events at both short (∼5 bp) and long (∼1 kb) genomic distances, showing that mutation drives GC content lower in already GC-poor regions, and using our precise context-dependent mutation rates to predict long-term evolutionary patterns at synonymous sites. We also show that de novo mutations from independent MA experiments display similar patterns of single nucleotide mutation and well match the patterns of mutation found in natural populations.

View details for PubMedID 29079675
Multiplexed in vivo homology-directed repair and tumor barcoding enables parallel quantification of Kras variant oncogenicity. Nature communications Winters, I. P., Chiou, S. H., Paulk, N. K., McFarland, C. D., Lalgudi, P. V., Ma, R. K., Lisowski, L., Connolly, A. J., Petrov, D. A., Kay, M. A., Winslow, M. M. 2017; 8 (1): 2053

Abstract

Large-scale genomic analyses of human cancers have cataloged somatic point mutations thought to initiate tumor development and sustain cancer growth. However, determining the functional significance of specific alterations remains a major bottleneck in our understanding of the genetic determinants of cancer. Here, we present a platform that integrates multiplexed AAV/Cas9-mediated homology-directed repair (HDR) with DNA barcoding and high-throughput sequencing to simultaneously investigate multiple genomic alterations in de novo cancers in mice. Using this approach, we introduce a barcoded library of non-synonymous mutations into hotspot codons 12 and 13 of Kras in adult somatic cells to initiate tumors in the lung, pancreas, and muscle. High-throughput sequencing of barcoded KrasHDR alleles from bulk lung and pancreas reveals surprising diversity in Kras variant oncogenicity. Rapid, cost-effective, and quantitative approaches to simultaneously investigate the function of precise genomic alterations in vivo will help uncover novel biological and clinically actionable insights into carcinogenesis.

View details for PubMedID 29233960

View details for PubMedCentralID PMC5727199
Adaptive dynamics of cuticular hydrocarbons in Drosophila JOURNAL OF EVOLUTIONARY BIOLOGY Rajpurohit, S., Hanus, R., Vrkoslav, V., Behrman, E. L., Bergland, A. O., Petrov, D., Cvacka, J., Schmidt, P. S. 2017; 30 (1): 66-80

View details for DOI 10.1111/jeb.12988

View details for Web of Science ID 000394852200006
Adaptive dynamics of cuticular hydrocarbons in Drosophila. Journal of evolutionary biology Rajpurohit, S., Hanus, R., Vrkoslav, V., Behrman, E. L., Bergland, A. O., Petrov, D., Cvacka, J., Schmidt, P. S. 2016

Abstract

Cuticular hydrocarbons (CHCs) are hydrophobic compounds deposited on the arthropod cuticle that are of functional significance with respect to stress tolerance, social interactions and mating dynamics. We characterized CHC profiles in natural populations of Drosophila melanogaster at five levels: across a latitudinal transect in the eastern United States, as a function of developmental temperature during culture, across seasonal time in replicate years, and as a function of rapid evolution in experimental mesocosms in the field. Furthermore, we also characterized spatial and temporal changes in allele frequencies for SNPs in genes that are associated with the production and chemical profile of CHCs. Our data demonstrate a striking degree of parallelism for clinal and seasonal variation in CHCs in this taxon; CHC profiles also demonstrate significant plasticity in response to rearing temperature, and the observed patterns of plasticity parallel the spatiotemporal patterns observed in nature. We find that these congruent shifts in CHC profiles across time and space are also mirrored by predictable shifts in allele frequencies at SNPs associated with CHC chain length. Finally, we observed rapid and predictable evolution of CHC profiles in experimental mesocosms in the field. Together, these data strongly suggest that CHC profiles respond rapidly and adaptively to environmental parameters that covary with latitude and season, and that this response reflects the process of local adaptation in natural populations of D. melanogaster.

View details for DOI 10.1111/jeb.12988

View details for PubMedID 27718537

View details for PubMedCentralID PMC5214518
Development of a Comprehensive Genotype-to-Fitness Map of Adaptation-Driving Mutations in Yeast. Cell Venkataram, S., Dunn, B., Li, Y., Agarwala, A., Chang, J., Ebel, E. R., Geiler-Samerotte, K., Hérissant, L., Blundell, J. R., Levy, S. F., Fisher, D. S., Sherlock, G., Petrov, D. A. 2016; 166 (6): 1585-1596 e22

Abstract

Adaptive evolution plays a large role in generating the phenotypic diversity observed in nature, yet current methods are impractical for characterizing the molecular basis and fitness effects of large numbers of individual adaptive mutations. Here, we used a DNA barcoding approach to generate the genotype-to-fitness map for adaptation-driving mutations from a Saccharomyces cerevisiae population experimentally evolved by serial transfer under limiting glucose. We isolated and measured the fitness of thousands of independent adaptive clones and sequenced the genomes of hundreds of clones. We found only two major classes of adaptive mutations: self-diploidization and mutations in the nutrient-responsive Ras/PKA and TOR/Sch9 pathways. Our large sample size and precision of measurement allowed us to determine that there are significant differences in fitness between mutations in different genes, between different paralogs, and even between different classes of mutations within the same gene.

View details for DOI 10.1016/j.cell.2016.08.002

View details for PubMedID 27594428

View details for PubMedCentralID PMC5070919
An Intrinsically Disordered Region of the DNA Repair Protein Nbs1 Is a Species-Specific Barrier to Herpes Simplex Virus 1 in Primates. Cell host & microbe Lou, D. I., Kim, E. T., Meyerson, N. R., Pancholi, N. J., Mohni, K. N., Enard, D., Petrov, D. A., Weller, S. K., Weitzman, M. D., Sawyer, S. L. 2016; 20 (2): 178-188

Abstract

Humans occasionally transmit herpes simplex virus 1 (HSV-1) to captive primates, who reciprocally harbor alphaherpesviruses poised for zoonotic transmission to humans. To understand the basis for the species-specific restriction of HSV-1 in primates, we simulated what might happen during the cross-species transmission of HSV-1 and found that the DNA repair protein Nbs1 from only some primate species is able to promote HSV-1 infection. The Nbs1 homologs that promote HSV-1 infection also interact with the HSV-1 ICP0 protein. ICP0 interaction mapped to a region of structural disorder in the Nbs1 protein. Chimeras reversing patterns of disorder in Nbs1 reversed titers of HSV-1 produced in the cell. By extending this analysis to 1,237 virus-interacting mammalian proteins, we show that proteins that interact with viruses are highly enriched in disorder, suggesting that viruses commonly interact with host proteins through intrinsically disordered domains.

View details for DOI 10.1016/j.chom.2016.07.003

View details for PubMedID 27512903
Whole Genome Analysis of 132 Clinical Saccharomyces cerevisiae Strains Reveals Extensive Ploidy Variation G3-GENES GENOMES GENETICS Zhu, Y. O., Sherlock, G., Petrov, D. A. 2016; 6 (8): 2421-2434

Abstract

Budding yeast has undergone several independent transitions from commercial to clinical lifestyles. The frequency of such transitions suggests that clinical yeast strains are derived from environmentally available yeast populations, including commercial sources. However, despite their important role in adaptive evolution, the prevalence of polyploidy and aneuploidy has not been extensively analyzed in clinical strains. In this study, we have looked for patterns governing the transition to clinical invasion in the largest screen of clinical yeast isolates to date. In particular, we have focused on the hypothesis that ploidy changes have influenced adaptive processes. We sequenced 144 yeast strains, 132 of which are clinical isolates. We found pervasive large-scale genomic variation in both overall ploidy (34% of strains identified as 3n/4n) and individual chromosomal copy numbers (36% of strains identified as aneuploid). We also found evidence for the highly dynamic nature of yeast genomes, with 35 strains showing partial chromosomal copy number changes and eight strains showing multiple independent chromosomal events. Intriguingly, a lineage identified to be baker's/commercial derived with a unique damaging mutation in NDC80 was particularly prone to polyploidy, with 83% of its members being triploid or tetraploid. Polyploidy was in turn associated with a >2× increase in aneuploidy rates as compared to other lineages. This dataset provides a rich source of information on the genomics of clinical yeast strains and highlights the potential importance of large-scale genomic copy variation in yeast adaptation.

View details for DOI 10.1534/g3.116.029397

View details for Web of Science ID 000381282300017

View details for PubMedID 27317778

View details for PubMedCentralID PMC4978896
Heterozygote Advantage Is a Common Outcome of Adaptation in Saccharomyces cerevisiae GENETICS Sellis, D., Kvitek, D. J., Dunn, B., Sherlock, G., Petrov, D. A. 2016; 203 (3): 1401-?

Abstract

Adaptation in diploids is predicted to proceed via mutations that are at least partially dominant in fitness. Recently, we argued that many adaptive mutations might also be commonly overdominant in fitness. Natural (directional) selection acting on overdominant mutations should drive them into the population but then, instead of bringing them to fixation, should maintain them as balanced polymorphisms via heterozygote advantage. If true, this would make adaptive evolution in sexual diploids differ drastically from that of haploids. The validity of this prediction has not yet been tested experimentally. Here, we performed four replicate evolutionary experiments with diploid yeast populations (Saccharomyces cerevisiae) growing in glucose-limited continuous cultures. We sequenced 24 evolved clones and identified initial adaptive mutations in all four chemostats. The first adaptive mutations in all four chemostats were three copy number variations, all of which proved to be overdominant in fitness. The fact that fitness overdominant mutations were always the first step in independent adaptive walks supports the prediction that heterozygote advantage can arise as a common outcome of directional selection in diploids and demonstrates that overdominance of de novo adaptive mutations in diploids is not rare.

View details for DOI 10.1534/genetics.115.185165

View details for Web of Science ID 000379473600028

View details for PubMedID 27194750

View details for PubMedCentralID PMC4937471
Elevated Linkage Disequilibrium and Signatures of Soft Sweeps Are Common in Drosophila melanogaster GENETICS Garud, N. R., Petrov, D. A. 2016; 203 (2): 863-?

Abstract

The extent to which selection and demography impact patterns of genetic diversity in natural populations of Drosophila melanogaster is yet to be fully understood. We previously observed that linkage disequilibrium (LD) at scales of ∼10 kb in the Drosophila Genetic Reference Panel (DGRP), consisting of 145 inbred strains from Raleigh, North Carolina, measured both between pairs of sites and as haplotype homozygosity, is elevated above neutral demographic expectations. We also demonstrated that signatures of strong and recent soft sweeps are abundant. However, the extent to which these patterns are specific to this derived and admixed population is unknown. It is also unclear whether these patterns are a consequence of the extensive inbreeding performed to generate the DGRP data. Here we analyze LD statistics in a sample of >100 fully sequenced strains from Zambia, an ancestral population to the Raleigh population that has experienced little to no admixture and was generated by sequencing haploid embryos rather than inbred strains. We find an elevation in long-range LD and haplotype homozygosity compared to neutral expectations in the Zambian sample, thus showing the elevation in LD is not specific to the DGRP data set. This elevation in LD and haplotype structure remains even after controlling for possible confounders including genomic inversions, admixture, population substructure, close relatedness of individual strains, and recombination rate variation. Furthermore, signatures of partial soft sweeps similar to those found in the DGRP as well as partial hard sweeps are common in Zambia. These results suggest that while the selective forces and sources of adaptive mutations may differ in Zambia and Raleigh, elevated long-range LD and signatures of soft sweeps are generic in D. melanogaster.

View details for DOI 10.1534/genetics.115.184002

View details for Web of Science ID 000377462800022

View details for PubMedID 27098909

View details for PubMedCentralID PMC4896199
Viruses are a dominant driver of protein adaptation in mammals ELIFE Enard, D., Cai, L., Gwennap, C., Petrov, D. A. 2016; 5

Abstract

Viruses interact with hundreds to thousands of proteins in mammals, yet adaptation against viruses has only been studied in a few proteins specialized in antiviral defense. Whether adaptation to viruses typically involves only specialized antiviral proteins or affects a broad array of virus-interacting proteins is unknown. Here, we analyze adaptation in ~1300 virus-interacting proteins manually curated from a set of 9900 proteins conserved in all sequenced mammalian genomes. We show that viruses (i) use the more evolutionarily constrained proteins within the cellular functions they interact with and that (ii) despite this high constraint, virus-interacting proteins account for a high proportion of all protein adaptation in humans and other mammals. Adaptation is elevated in virus-interacting proteins across all functional categories, including both immune and non-immune functions. We conservatively estimate that viruses have driven close to 30% of all adaptive amino acid changes in the part of the human proteome conserved within mammals. Our results suggest that viruses are one of the most dominant drivers of evolutionary change across mammalian and human proteomes.

View details for DOI 10.7554/eLife.12469

View details for Web of Science ID 000376921100001

View details for PubMedID 27187613

View details for PubMedCentralID PMC4869911
Effects of maternal age on euploidy rates in a large cohort of embryos analyzed with 24-chromosome single-nucleotide polymorphism-based preimplantation genetic screening FERTILITY AND STERILITY Demko, Z. P., Simon, A. L., McCoy, R. C., Petrov, D. A., Rabinowitz, M. 2016; 105 (5): 1307-1313

Abstract

To determine the effect of maternal age on the average number of euploid embryos retrieved during oocyte harvest as part of an in vitro fertilization (IVF) cycle, including the probability of retrieving at least one euploid embryo in a cohort (PrE). Retrospective study. Preimplantation genetic screening (PGS) laboratory. Women aged 18 to 48 years undergoing IVF treatment. Use of 24-chromosome single-nucleotide polymorphism (SNP)-based PGS of day-3 and day-5 embryo biopsies. Relationships between maternal age and the rate of embryos that tested as euploid (hereafter referred to as "euploid embryos"), the average number and proportion of euploid embryos per IVF cycle, and PrE. We analyzed 22,599 day-3 embryos and 15,112 day-5 embryos. In women aged 27 to 35 years, the median proportion of euploid embryos in each cycle remained constant at ∼35% in day-3 biopsies and ∼55% in day-5 biopsies, but it decreased rapidly after age 35. On average, women in their late 20s had four euploid embryos (day 3 or day 5) per cycle, but this number decreased linearly (R² ≥ 0.983) after 35 years of age. The effect of maternal age on PrE was similar, with a rapid exponential decline (R² = 0.986). Across all maternal ages, the euploid proportion and number of embryos per cycle were counterbalanced, so the number of euploid embryos per cycle was the same for day-3 and day-5 biopsies. This suggests that the loss of embryos from day 3 to day 5 was primarily due to aneuploidy. Our results confirm the known inverse relationship between advanced maternal age (>35 years) and embryo euploidy, demonstrating that equal numbers of euploid embryos are available at day 3 and day 5.

View details for DOI 10.1016/j.fertnstert.2016.01.025

View details for Web of Science ID 000375871200040

View details for PubMedID 26868992
Global Transcriptional Profiling of Diapause and Climatic Adaptation in Drosophila melanogaster. Molecular biology and evolution Zhao, X., Bergland, A. O., Behrman, E. L., Gregory, B. D., Petrov, D. A., Schmidt, P. S. 2016; 33 (3): 707-720

Abstract

Wild populations of the model organism Drosophila melanogaster experience highly heterogeneous environments over broad geographical ranges as well as over seasonal and annual timescales. Diapause is a primary adaptation to environmental heterogeneity, and in D. melanogaster the propensity to enter diapause varies predictably with latitude and season. Here we performed global transcriptomic profiling of naturally occurring variation in diapause expression elicited by short day photoperiod and moderately low temperature in two tissue types associated with neuroendocrine and endocrine signaling: heads and ovaries. We show that diapause in D. melanogaster is an actively regulated phenotype at the transcriptional level, suggesting that diapause is not a simple physiological or reproductive quiescence. Differentially expressed genes and pathways are highly distinct in heads and ovaries, demonstrating that the diapause response is not uniform throughout the soma and suggesting that it may be comprised of functional modules associated with specific tissues. Genes downregulated in heads of diapausing flies are significantly enriched for clinally varying single nucleotide polymorphisms (SNPs) and seasonally oscillating SNPs, consistent with the hypothesis that diapause is a driving phenotype of climatic adaptation. We also show that chromosome location-based coregulation of gene expression is present in the transcriptional regulation of diapause. Taken together, these results demonstrate that diapause is a complex phenotype actively regulated in multiple tissues, and support the hypothesis that natural variation in diapause propensity underlies adaptation to spatially and temporally varying selective pressures.

View details for DOI 10.1093/molbev/msv263

View details for PubMedID 26568616
Secondary contact and local adaptation contribute to genome-wide patterns of clinal variation in Drosophila melanogaster. Molecular ecology Bergland, A. O., Tobler, R., González, J., Schmidt, P., Petrov, D. 2016; 25 (5): 1157-1174

Abstract

Populations arrayed along broad latitudinal gradients often show patterns of clinal variation in phenotype and genotype. Such population differentiation can be generated and maintained by both historical demographic events and local adaptation. These evolutionary forces are not mutually exclusive and can in some cases produce nearly identical patterns of genetic differentiation among populations. Here, we investigate the evolutionary forces that generated and maintain clinal variation genome-wide among populations of Drosophila melanogaster sampled in North America and Australia. We contrast patterns of clinal variation in these continents with patterns of differentiation among ancestral European and African populations. Using established and novel methods we derive here, we show that recently derived North America and Australia populations were likely founded by both European and African lineages and that this hybridization event likely contributed to genome-wide patterns of parallel clinal variation between continents. The pervasive effects of admixture mean that differentiation at only several hundred loci can be attributed to the operation of spatially varying selection using an FST outlier approach. Our results provide novel insight into the well-studied system of clinal differentiation in D. melanogaster and provide a context for future studies seeking to identify loci contributing to local adaptation in a wide variety of organisms, including other invasive species as well as temperate endemics.

View details for DOI 10.1111/mec.13455

View details for PubMedID 26547394
Comparative population genomics of latitudinal variation in Drosophila simulans and Drosophila melanogaster. Molecular ecology Machado, H. E., Bergland, A. O., O'Brien, K. R., Behrman, E. L., Schmidt, P. S., Petrov, D. A. 2016; 25 (3): 723-740

Abstract

Examples of clinal variation in phenotypes and genotypes across latitudinal transects have served as important models for understanding how spatially varying selection and demographic forces shape variation within species. Here, we examine the selective and demographic contributions to latitudinal variation through the largest comparative genomic study to date of Drosophila simulans and Drosophila melanogaster, with genomic sequence data from 382 individual fruit flies, collected across a spatial transect of 19 degrees latitude and at multiple time points over 2 years. Consistent with phenotypic studies, we find less clinal variation in D. simulans than D. melanogaster, particularly for the autosomes. Moreover, we find that clinally varying loci in D. simulans are less stable over multiple years than comparable clines in D. melanogaster. D. simulans shows a significantly weaker pattern of isolation by distance than D. melanogaster and we find evidence for a stronger contribution of migration to D. simulans population genetic structure. While population bottlenecks and migration can plausibly explain the differences in stability of clinal variation between the two species, we also observe a significant enrichment of shared clinal genes, suggesting that the selective forces associated with climate are acting on the same genes and phenotypes in D. simulans and D. melanogaster.

View details for DOI 10.1111/mec.13446

View details for PubMedID 26523848
More effective drugs lead to harder selective sweeps in the evolution of drug resistance in HIV-1. eLife Feder, A. F., Rhee, S., Holmes, S. P., Shafer, R. W., Petrov, D. A., Pennings, P. S. 2016; 5

Abstract

In the early days of HIV treatment, drug resistance occurred rapidly and predictably in all patients, but under modern treatments, resistance arises slowly, if at all. The probability of resistance should be controlled by the rate of generation of resistance mutations. If many adaptive mutations arise simultaneously, then adaptation proceeds by soft selective sweeps in which multiple adaptive mutations spread concomitantly, but if adaptive mutations occur rarely in the population, then a single adaptive mutation should spread alone in a hard selective sweep. Here, we use 6717 HIV-1 consensus sequences from patients treated with first-line therapies between 1989 and 2013 to confirm that the transition from fast to slow evolution of drug resistance was indeed accompanied with the expected transition from soft to hard selective sweeps. This suggests more generally that evolution proceeds via hard sweeps if resistance is unlikely and via soft sweeps if it is likely.

View details for DOI 10.7554/eLife.10670

View details for PubMedID 26882502

View details for PubMedCentralID PMC4764592
Evidence of Selection against Complex Mitotic-Origin Aneuploidy during Preimplantation Development PLOS GENETICS McCoy, R. C., Demko, Z. P., Ryan, A., Banjevic, M., Hill, M., Sigurjonsson, S., Rabinowitz, M., Petrov, D. A. 2015; 11 (10)

Abstract

Whole-chromosome imbalances affect over half of early human embryos and are the leading cause of pregnancy loss. While these errors frequently arise in oocyte meiosis, many such whole-chromosome abnormalities affecting cleavage-stage embryos are the result of chromosome missegregation occurring during the initial mitotic cell divisions. The first wave of zygotic genome activation at the 4-8 cell stage results in the arrest of a large proportion of embryos, the vast majority of which contain whole-chromosome abnormalities. Thus, the full spectrum of meiotic and mitotic errors can only be detected by sampling after the initial cell divisions, but prior to this selective filter. Here, we apply 24-chromosome preimplantation genetic screening (PGS) to 28,052 single-cell day-3 blastomere biopsies and 18,387 multi-cell day-5 trophectoderm biopsies from 6,366 in vitro fertilization (IVF) cycles. We precisely characterize the rates and patterns of whole-chromosome abnormalities at each developmental stage and distinguish errors of meiotic and mitotic origin without embryo disaggregation, based on informative chromosomal signatures. We show that mitotic errors frequently involve multiple chromosome losses that are not biased toward maternal or paternal homologs. This outcome is characteristic of spindle abnormalities and chaotic cell division detected in previous studies. In contrast to meiotic errors, our data also show that mitotic errors are not significantly associated with maternal age. PGS patients referred due to previous IVF failure had elevated rates of mitotic error, while patients referred due to recurrent pregnancy loss had elevated rates of meiotic error, controlling for maternal age. These results support the conclusion that mitotic error is the predominant mechanism contributing to pregnancy losses occurring prior to blastocyst formation. This high-resolution view of the full spectrum of whole-chromosome abnormalities affecting early embryos provides insight into the cytogenetic mechanisms underlying their formation and the consequences for human fertility.

View details for DOI 10.1371/journal.pgen.1005601

View details for Web of Science ID 000364401600065

View details for PubMedID 26491874

View details for PubMedCentralID PMC4619652
Investigation of the prevalence of antagonistic pleiotropy Herissant, L., Yuan, D., Jerison, E., Agarwala, A., Fisher, D., Desai, M., Petrov, D., Sherlock, G. WILEY-BLACKWELL. 2015: S263–S264

View details for Web of Science ID 000361466200464
Exploring the adaptive mutation spectrum in massively tagged populations of experimentally evolving yeast Dunn, B., Venkataram, S., Levy, S., Blundell, J., Herissant, L., Li, Y., Chang, J., Geiler-Samerotte, K., Agarwala, A., Fisher, D., Petrov, D., Sherlock, G. WILEY-BLACKWELL. 2015: S89

View details for Web of Science ID 000361466200119
Quantification of GC-biased gene conversion in the human genome GENOME RESEARCH Glemin, S., Arndt, P. F., Messer, P. W., Petrov, D., Galtier, N., Duret, L. 2015; 25 (8): 1215-1228

Abstract

Much evidence indicates that GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, a detailed quantification of the process is still lacking. The strength of gBGC can be measured from the analysis of derived allele frequency spectra (DAF), but this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors and by spatial heterogeneity in gBGC strength. We propose a new general method to quantify gBGC from DAF spectra, incorporating polarization errors, taking spatial heterogeneity into account, and jointly estimating mutation bias. Applying it to human polymorphism data from the 1000 Genomes Project, we show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. Genome-wide, the intensity of gBGC is in the nearly neutral area. However, given that recombination occurs primarily within recombination hotspots, 1%-2% of the human genome is subject to strong gBGC. On average, gBGC is stronger in African than in non-African populations, reflecting differences in effective population sizes. However, due to more heterogeneous recombination landscapes, the fraction of the genome affected by strong gBGC is larger in non-African than in African populations. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that, in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.

View details for DOI 10.1101/gr.185488.114

View details for Web of Science ID 000358957500013

View details for PubMedID 25995268

View details for PubMedCentralID PMC4510005
Imperfect drug penetration leads to spatial monotherapy and rapid evolution of multidrug resistance PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Moreno-Gamez, S., Hill, A. L., Rosenbloom, D. I., Petrov, D. A., Nowak, M. A., Pennings, P. S. 2015; 112 (22): E2874-E2883

Abstract

Infections with rapidly evolving pathogens are often treated using combinations of drugs with different mechanisms of action. One of the major goals of combination therapy is to reduce the risk of drug resistance emerging during a patient's treatment. Although this strategy generally has significant benefits over monotherapy, it may also select for multidrug-resistant strains, particularly during long-term treatment for chronic infections. Infections with these strains present an important clinical and public health problem. Complicating this issue, for many antimicrobial treatment regimes, individual drugs have imperfect penetration throughout the body, so there may be regions where only one drug reaches an effective concentration. Here we propose that mismatched drug coverage can greatly speed up the evolution of multidrug resistance by allowing mutations to accumulate in a stepwise fashion. We develop a mathematical model of within-host pathogen evolution under spatially heterogeneous drug coverage and demonstrate that even very small single-drug compartments lead to dramatically higher resistance risk. We find that it is often better to use drug combinations with matched penetration profiles, although there may be a trade-off between preventing eventual treatment failure due to resistance in this way and temporarily reducing pathogen levels systemically. Our results show that drugs with the most extensive distribution are likely to be the most vulnerable to resistance. We conclude that optimal combination treatments should be designed to prevent this spatial effective monotherapy. These results are widely applicable to diverse microbial infections including viruses, bacteria, and parasites.

View details for DOI 10.1073/pnas.1424184112

View details for Web of Science ID 000355832200008

View details for PubMedID 26038564

View details for PubMedCentralID PMC4460514
Obstruction of adaptation in diploids by recessive, strongly deleterious alleles. Proceedings of the National Academy of Sciences of the United States of America Assaf, Z. J., Petrov, D. A., Blundell, J. R. 2015; 112 (20): E2658-66

Abstract

Recessive deleterious mutations are common, causing many genetic disorders in humans and producing inbreeding depression in the majority of sexually reproducing diploids. The abundance of recessive deleterious mutations in natural populations suggests they are likely to be present on a chromosome when a new adaptive mutation occurs, yet the dynamics of recessive deleterious hitchhikers and their impact on adaptation remain poorly understood. Here we model how a recessive deleterious mutation impacts the fate of a genetically linked dominant beneficial mutation. The frequency trajectory of the adaptive mutation in this case is dramatically altered and results in what we have termed a "staggered sweep." It is named for its three-phased trajectory: (i) Initially, the two linked mutations have a selective advantage while rare and will increase in frequency together, then (ii), at higher frequencies, the recessive hitchhiker is exposed to selection and can cause a balanced state via heterozygote advantage (the staggered phase), and (iii) finally, if recombination unlinks the two mutations, then the beneficial mutation can complete the sweep to fixation. Using both analytics and simulations, we show that strongly deleterious recessive mutations can substantially decrease the probability of fixation for nearby beneficial mutations, thus creating zones in the genome where adaptation is suppressed. These mutations can also significantly prolong the number of generations a beneficial mutation takes to sweep to fixation, and cause the genomic signature of selection to resemble that of soft or partial sweeps. We show that recessive deleterious variation could impact adaptation in humans and Drosophila.
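
The first two phases of the trajectory described above can be illustrated with the textbook deterministic recursion for one locus under heterozygote advantage. This is a minimal sketch and not the authors' model: it assumes complete linkage (no recombination, so phase iii never occurs), an infinite population, and a hypothetical fitness scheme in which the wild-type homozygote has fitness 1, the heterozygote 1 + s_b, and the haplotype homozygote (1 + s_b)(1 - s_d). The function names and parameter values are illustrative assumptions.

```python
def next_freq(x, s_b, s_d):
    """One generation of deterministic selection on the linked haplotype.

    x is the frequency of the haplotype carrying both the dominant
    beneficial and the recessive deleterious mutation (assumed scheme).
    """
    w_wt = 1.0                          # wild-type homozygote
    w_het = 1.0 + s_b                   # beneficial is dominant, deleterious is masked
    w_hom = (1.0 + s_b) * (1.0 - s_d)   # recessive deleterious allele is exposed
    mean_w = (1 - x) ** 2 * w_wt + 2 * x * (1 - x) * w_het + x ** 2 * w_hom
    return (x * x * w_hom + x * (1 - x) * w_het) / mean_w


def balanced_frequency(s_b, s_d):
    """Equilibrium haplotype frequency; meaningful only if the heterozygote is fittest."""
    w_het = 1.0 + s_b
    w_hom = (1.0 + s_b) * (1.0 - s_d)
    a = w_het - 1.0      # heterozygote advantage over the wild-type homozygote
    b = w_het - w_hom    # heterozygote advantage over the haplotype homozygote
    return a / (a + b)


def trajectory(x0, s_b, s_d, generations):
    """Haplotype frequency over time, starting from frequency x0."""
    xs = [x0]
    for _ in range(generations):
        xs.append(next_freq(xs[-1], s_b, s_d))
    return xs


if __name__ == "__main__":
    xs = trajectory(x0=0.001, s_b=0.05, s_d=0.5, generations=2000)
    print("final frequency:", xs[-1])
    print("predicted balance point:", balanced_frequency(0.05, 0.5))
```

Under these assumptions the haplotype rises while rare and then stalls at an intermediate frequency rather than fixing, which is the qualitative behavior the abstract calls the staggered phase.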

View details for DOI 10.1073/pnas.1424949112

View details for PubMedID 25941393

View details for PubMedCentralID PMC4443376
Common variants spanning PLK4 are associated with mitotic-origin aneuploidy in human embryos SCIENCE McCoy, R. C., Demko, Z., Ryan, A., Banjevic, M., Hill, M., Sigurjonsson, S., Rabinowitz, M., Fraser, H. B., Petrov, D. A. 2015; 348 (6231): 235-238

Abstract

Aneuploidy, the inheritance of an atypical chromosome complement, is common in early human development and is the primary cause of pregnancy loss. By screening day-3 embryos during in vitro fertilization cycles, we identified an association between aneuploidy of putative mitotic origin and linked genetic variants on chromosome 4 of maternal genomes. This associated region contains a candidate gene, Polo-like kinase 4 (PLK4), that plays a well-characterized role in centriole duplication and has the ability to alter mitotic fidelity upon minor dysregulation. Mothers with the high-risk genotypes contributed fewer embryos for testing at day 5, suggesting that their embryos are less likely to survive to blastocyst formation. The associated region coincides with a signature of a selective sweep in ancient humans, suggesting that the causal variant was either the target of selection or hitchhiked to substantial frequency.

View details for DOI 10.1126/science.aaa3337

View details for Web of Science ID 000352613700046

View details for PubMedID 25859044
Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature Levy, S. F., Blundell, J. R., Venkataram, S., Petrov, D. A., Fisher, D. S., Sherlock, G. 2015; 519 (7542): 181-186

Abstract

Evolution of large asexual cell populations underlies ∼30% of deaths worldwide, including those caused by bacteria, fungi, parasites, and cancer. However, the dynamics underlying these evolutionary processes remain poorly understood because they involve many competing beneficial lineages, most of which never rise above extremely low frequencies in the population. To observe these normally hidden evolutionary dynamics, we constructed a sequencing-based ultra high-resolution lineage tracking system in Saccharomyces cerevisiae that allowed us to monitor the relative frequencies of ∼500,000 lineages simultaneously. In contrast to some expectations, we found that the spectrum of fitness effects of beneficial mutations is neither exponential nor monotonic. Early adaptation is a predictable consequence of this spectrum and is strikingly reproducible, but the initial small-effect mutations are soon outcompeted by rarer large-effect mutations that result in variability between replicates. These results suggest that early evolutionary dynamics may be deterministic for a period of time before stochastic effects become important.

View details for DOI 10.1038/nature14279

View details for PubMedID 25731169
T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic acids research Fiston-Lavier, A., Barrón, M. G., Petrov, D. A., González, J. 2015; 43 (4)

Abstract

Transposable elements (TEs) constitute the most active, diverse and ancient component in a broad range of genomes. Complete understanding of genome function and evolution cannot be achieved without a thorough understanding of TE impact and biology. However, in-depth analysis of TEs still represents a challenge due to the repetitive nature of these genomic entities. In this work, we present a broadly applicable and flexible tool: T-lex2. T-lex2 is the only available software that allows routine, automatic and accurate genotyping of individual TE insertions and estimation of their population frequencies both using individual strain and pooled next-generation sequencing data. Furthermore, T-lex2 also assesses the quality of the calls allowing the identification of misannotated TEs and providing the necessary information to re-annotate them. The flexible and customizable design of T-lex2 allows running it in any genome and for any type of TE insertion. Here, we tested the fidelity of T-lex2 using the fly and human genomes. Overall, T-lex2 represents a significant improvement in our ability to analyze the contribution of TEs to genome function and evolution as well as learning about the biology of TEs. T-lex2 is freely available online at http://sourceforge.net/projects/tlex.

View details for DOI 10.1093/nar/gku1250

View details for PubMedID 25510498

View details for PubMedCentralID PMC4344482
Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS genetics Garud, N. R., Messer, P. W., Buzbas, E. O., Petrov, D. A. 2015; 11 (2)

Abstract

Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.
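
The abstract does not spell out the statistics, so the following is a minimal sketch based on their usual definitions: with haplotype frequencies p1 >= p2 >= ... in a genomic window, H1 is the sum of squared frequencies, H12 pools the two most common haplotypes before squaring, and H2 is H1 without the most common haplotype. The function name, the input format (one hashable label per sampled haplotype), and the toy samples are assumptions for illustration; window choice and haplotype calling are not shown.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Compute H1, H12 and H2/H1 for one window of sampled haplotypes."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    # Haplotype frequencies, most common first.
    freqs = sorted((c / n for c in counts.values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0

    h1 = sum(p * p for p in freqs)                       # expected haplotype homozygosity
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])  # top two haplotypes pooled
    h2 = h1 - p1 * p1                                    # homozygosity excluding the top haplotype
    return {"H1": h1, "H12": h12, "H2/H1": h2 / h1}


if __name__ == "__main__":
    hard_like = ["A"] * 9 + ["B"]               # one haplotype dominates
    soft_like = ["A"] * 5 + ["B"] * 4 + ["C"]   # two haplotypes at high frequency
    print(haplotype_homozygosity(hard_like))
    print(haplotype_homozygosity(soft_like))
```

In the toy samples both windows have high H12, but H2/H1 is much larger when two haplotypes are common, which is the sense in which the pair of statistics can separate soft from hard sweeps.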

View details for DOI 10.1371/journal.pgen.1005004

View details for PubMedID 25706129

View details for PubMedCentralID PMC4338236
Genomic Evidence of Rapid and Stable Adaptive Oscillations over Seasonal Time Scales in Drosophila PLOS GENETICS Bergland, A. O., Behrman, E. L., O'Brien, K. R., Schmidt, P. S., Petrov, D. A. 2014; 10 (11)

Abstract

In many species, genomic data have revealed pervasive adaptive evolution indicated by the fixation of beneficial alleles. However, when selection pressures are highly variable along a species' range or through time, adaptive alleles may persist at intermediate frequencies for long periods. So-called "balanced polymorphisms" have long been understood to be an important component of standing genetic variation, yet direct evidence of the strength of balancing selection and the stability and prevalence of balanced polymorphisms has remained elusive. We hypothesized that environmental fluctuations among seasons in a North American orchard would impose temporally variable selection on Drosophila melanogaster that would drive repeatable adaptive oscillations at balanced polymorphisms. We identified hundreds of polymorphisms whose frequency oscillates among seasons and argue that these loci are subject to strong, temporally variable selection. We show that these polymorphisms respond to acute and persistent changes in climate and are associated in predictable ways with seasonally variable phenotypes. In addition, our results suggest that adaptively oscillating polymorphisms are likely millions of years old, with some possibly predating the divergence between D. melanogaster and D. simulans. Taken together, our results are consistent with a model of balancing selection wherein rapid temporal fluctuations in climate over generational time promote adaptive genetic diversity at loci underlying polygenic variation in fitness-related phenotypes.

View details for DOI 10.1371/journal.pgen.1004775

View details for Web of Science ID 000345455200026

View details for PubMedID 25375361

View details for PubMedCentralID PMC4222749
Soft Selective Sweeps in Complex Demographic Scenarios GENETICS Wilson, B. A., Petrov, D. A., Messer, P. W. 2014; 198 (2): 669-684

Abstract

Adaptation from de novo mutation can produce so-called soft selective sweeps, where adaptive alleles of independent mutational origin sweep through the population at the same time. Population genetic theory predicts that such soft sweeps should be likely if the product of the population size and the mutation rate toward the adaptive allele is sufficiently large, such that multiple adaptive mutations can establish before one has reached fixation; however, it remains unclear how demographic processes affect the probability of observing soft sweeps. Here we extend the theory of soft selective sweeps to realistic demographic scenarios that allow for changes in population size over time. We first show that population bottlenecks can lead to the removal of all but one adaptive lineage from an initially soft selective sweep. The parameter regime under which such "hardening" of soft selective sweeps is likely is determined by a simple heuristic condition. We further develop a generalized analytical framework, based on an extension of the coalescent process, for calculating the probability of soft sweeps under arbitrary demographic scenarios. Two important limits emerge within this analytical framework: In the limit where population-size fluctuations are fast compared to the duration of the sweep, the likelihood of soft sweeps is determined by the harmonic mean of the variance effective population size estimated over the duration of the sweep; in the opposing slow fluctuation limit, the likelihood of soft sweeps is determined by the instantaneous variance effective population size at the onset of the sweep. We show that as a consequence of this finding the probability of observing soft sweeps becomes a function of the strength of selection. Specifically, in species with sharply fluctuating population size, strong selection is more likely to produce soft sweeps than weak selection. Our results highlight the importance of accurate demographic estimates over short evolutionary timescales for understanding the population genetics of adaptation from de novo mutation.
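
The two limits named in the abstract differ sharply once population size fluctuates, because a harmonic mean is dominated by its smallest values. The sketch below only illustrates that arithmetic; it is not the paper's coalescent framework. The function names, the per-generation size list, and the mutation rate are hypothetical, and the conventional scaling factor (2 or 4) in the population mutation parameter is deliberately left out.

```python
from statistics import harmonic_mean, mean


def fast_fluctuation_size(sizes_during_sweep):
    """Relevant size when fluctuations are fast relative to the sweep: harmonic mean."""
    return harmonic_mean(sizes_during_sweep)


def slow_fluctuation_size(sizes_during_sweep):
    """Relevant size when fluctuations are slow: the size at the onset of the sweep."""
    return sizes_during_sweep[0]


def size_times_mutation_rate(size, mu):
    """Product of population size and mutation rate toward the adaptive allele (unscaled)."""
    return size * mu


if __name__ == "__main__":
    # Hypothetical history: nine large generations and one sharp bottleneck.
    sizes = [1_000_000] * 9 + [1_000]
    mu = 1e-8  # assumed mutation rate, for illustration only

    print("arithmetic mean:", mean(sizes))
    print("fast limit (harmonic mean):", fast_fluctuation_size(sizes))
    print("slow limit (onset size):", slow_fluctuation_size(sizes))
    print("N*mu, fast limit:", size_times_mutation_rate(fast_fluctuation_size(sizes), mu))
    print("N*mu, slow limit:", size_times_mutation_rate(slow_fluctuation_size(sizes), mu))
```

A single bottleneck generation pulls the harmonic mean roughly two orders of magnitude below the onset size in this toy history, so the same mutation rate gives a much smaller product in the fast limit than in the slow limit.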

View details for DOI 10.1534/genetics.114.165571

View details for Web of Science ID 000343885300027

View details for PubMedID 25060100

View details for PubMedCentralID PMC4266194
Reply to Chen and Zhang: On interpreting genome-wide trends from yeast mutation accumulation data PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhu, Y. O., Siegal, M. L., Hall, D. W., Petrov, D. A. 2014; 111 (39): E4063

View details for PubMedID 25217565
Precise estimates of mutation rate and spectrum in yeast PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhu, Y. O., Siegal, M. L., Hall, D. W., Petrov, D. A. 2014; 111 (22): E2310-E2318

Abstract

Mutation is the ultimate source of genetic variation. The most direct and unbiased method of studying spontaneous mutations is via mutation accumulation (MA) lines. Until recently, MA experiments were limited by the cost of sequencing and thus provided us with small numbers of mutational events and therefore imprecise estimates of rates and patterns of mutation. We used whole-genome sequencing to identify nearly 1,000 spontaneous mutation events accumulated over ∼311,000 generations in 145 diploid MA lines of the budding yeast Saccharomyces cerevisiae. MA experiments are usually assumed to have negligible levels of selection, but even mild selection will remove strongly deleterious events. We take advantage of such patterns of selection and show that mutation classes such as indels and aneuploidies (especially monosomies) are proportionately much more likely to contribute mutations of large effect. We also provide conservative estimates of indel, aneuploidy, environment-dependent dominant lethal, and recessive lethal mutation rates. To our knowledge, for the first time in yeast MA data, we identified a sufficiently large number of single-nucleotide mutations to measure context-dependent mutation rates and were able to (i) confirm strong AT bias of mutation in yeast driven by high rate of mutations from C/G to T/A and (ii) detect a higher rate of mutation at C/G nucleotides in two specific contexts consistent with cytosine methylation in S. cerevisiae.

View details for DOI 10.1073/pnas.1323011111

View details for Web of Science ID 000336687900012

View details for PubMedID 24847077

View details for PubMedCentralID PMC4050626
Genome-wide signals of positive selection in human evolution. Genome research Enard, D., Messer, P. W., Petrov, D. A. 2014; 24 (6): 885-895

Abstract

The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci, and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1000 Genomes Project and detect signatures of positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to argue that the observed signatures require a high rate of strongly adaptive substitutions near amino acid changes. We further demonstrate that the observed signatures of positive selection correlate better with the presence of regulatory sequences, as predicted by the ENCODE Project Consortium, than with the positions of amino acid substitutions. Our results suggest that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson that adaptive divergence is primarily driven by regulatory changes.

View details for DOI 10.1101/gr.164822.113

View details for PubMedID 24619126

View details for PubMedCentralID PMC4032853
Comparative population genomics: power and principles for the inference of functionality TRENDS IN GENETICS Lawrie, D. S., Petrov, D. A. 2014; 30 (4): 133-139

Abstract

The availability of sequenced genomes from multiple related organisms allows the detection and localization of functional genomic elements based on the idea that such elements evolve more slowly than neutral sequences. Although such comparative genomics methods have proven useful in discovering functional elements and ascertaining levels of functional constraint in the genome as a whole, here we outline limitations intrinsic to this approach that cannot be overcome by sequencing more species. We argue that it is essential to supplement comparative genomics with ultra-deep sampling of populations from closely related species to enable substantially more powerful genomic scans for functional elements. The convergence of sequencing technology and population genetics theory has made such projects feasible and has exciting implications for functional genomics.

View details for DOI 10.1016/j.tig.2014.02.002

View details for Web of Science ID 000335426300003

View details for PubMedID 24656563
Genomic inference accurately predicts the timing and severity of a recent bottleneck in a nonmodel insect population. Molecular ecology McCoy, R. C., Garud, N. R., Kelley, J. L., Boggs, C. L., Petrov, D. A. 2014; 23 (1): 136-150

Abstract

The analysis of molecular data from natural populations has allowed researchers to answer diverse ecological questions that were previously intractable. In particular, ecologists are often interested in the demographic history of populations, information that is rarely available from historical records. Methods have been developed to infer demographic parameters from genomic data, but it is not well understood how inferred parameters compare to true population history or depend on aspects of experimental design. Here, we present and evaluate a method of SNP discovery using RNA sequencing and demographic inference using the program δaδi, which uses a diffusion approximation to the allele frequency spectrum to fit demographic models. We test these methods in a population of the checkerspot butterfly Euphydryas gillettii. This population was intentionally introduced to Gothic, Colorado in 1977 and has experienced extreme fluctuations including bottlenecks of fewer than 25 adults, as documented by nearly annual field surveys. Using RNA sequencing of eight individuals from Colorado and eight individuals from a native population in Wyoming, we generate the first genomic resources for this system. While demographic inference is commonly used to examine ancient demography, our study demonstrates that our inexpensive, all-in-one approach to marker discovery and genotyping provides sufficient data to accurately infer the timing of a recent bottleneck. This demographic scenario is relevant for many species of conservation concern, few of which have sequenced genomes. Our results are remarkably insensitive to sample size or number of genomic markers, which has important implications for applying this method to other nonmodel systems.

View details for DOI 10.1111/mec.12591

View details for PubMedID 24237665
Population genomics of transposable elements in Drosophila. Annual review of genetics Barrón, M. G., Fiston-Lavier, A., Petrov, D. A., González, J. 2014; 48: 561-581

Abstract

Studies of the population dynamics of transposable elements (TEs) in Drosophila melanogaster indicate that consistent forces are affecting TEs independently of their modes of transposition and regulation. New sequencing technologies enable biologists to sample genomes at an unprecedented scale in order to quantify genome-wide polymorphism for annotated and novel TE insertions. In this review, we first present new insights gleaned from high-throughput data for population genomics studies of D. melanogaster. We then consider the latest population genomics models for TE evolution and present examples of functional evidence revealed by genome-wide studies of TE population dynamics in D. melanogaster. Although most of the TE insertions are deleterious or neutral, some TE insertions increase the fitness of the individual that carries them and play a role in genome adaptation.

View details for DOI 10.1146/annurev-genet-120213-092359

View details for PubMedID 25292358
Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PloS one McCoy, R. C., Taylor, R. W., Blauwkamp, T. A., Kelley, J. L., Kertesz, M., Pushkarev, D., Petrov, D. A., Fiston-Lavier, A. 2014; 9 (9)

Abstract

High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (∼0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.

View details for DOI 10.1371/journal.pone.0106689

View details for PubMedID 25188499

View details for PubMedCentralID PMC4154752
Population genomics of rapid adaptation by soft selective sweeps TRENDS IN ECOLOGY & EVOLUTION Messer, P. W., Petrov, D. A. 2013; 28 (11): 659-669

Abstract

Organisms can often adapt surprisingly quickly to evolutionary challenges, such as the application of pesticides or antibiotics, suggesting an abundant supply of adaptive genetic variation. In these situations, adaptation should commonly produce 'soft' selective sweeps, where multiple adaptive alleles sweep through the population at the same time, either because the alleles were already present as standing genetic variation or arose independently by recurrent de novo mutations. Most well-known examples of rapid molecular adaptation indeed show signatures of such soft selective sweeps. Here, we review the current understanding of the mechanisms that produce soft sweeps and the approaches used for their identification in population genomic data. We argue that soft sweeps might be the dominant mode of adaptation in many species.

View details for DOI 10.1016/j.tree.2013.08.003

View details for Web of Science ID 000326666200007

View details for PubMedID 24075201

View details for PubMedCentralID PMC3834262
Host Species and Environmental Effects on Bacterial Communities Associated with Drosophila in the Laboratory and in the Natural Environment PLOS ONE Staubach, F., Baines, J. F., Kuenzel, S., Bik, E. M., Petrov, D. A. 2013; 8 (8)

Abstract

The fruit fly Drosophila is a classic model organism to study adaptation as well as the relationship between genetic variation and phenotypes. Although associated bacterial communities might be important for many aspects of Drosophila biology, knowledge about their diversity, composition, and factors shaping them is limited. We used 454-based sequencing of a variable region of the bacterial 16S ribosomal RNA gene to characterize the bacterial communities associated with wild and laboratory Drosophila isolates. In order to specifically investigate effects of food source and host species on bacterial communities, we analyzed samples from wild Drosophila melanogaster and D. simulans collected from a variety of natural substrates, as well as from adults and larvae of nine laboratory-reared Drosophila species. We find no evidence for host species effects in lab-reared flies; instead, lab of origin and stochastic effects, which could influence studies of Drosophila phenotypes, are pronounced. In contrast, the natural Drosophila-associated microbiota appears to be predominantly shaped by food substrate with an additional but smaller effect of host species identity. We identify a core member of this natural microbiota that belongs to the genus Gluconobacter and is common to all wild-caught flies in this study, but absent from the laboratory. This makes it a strong candidate for being part of what could be a natural D. melanogaster and D. simulans core microbiome. Furthermore, we were able to identify candidate pathogens in natural fly isolates.

View details for DOI 10.1371/journal.pone.0070749

View details for Web of Science ID 000323115800019

View details for PubMedID 23967097

View details for PubMedCentralID PMC3742674
Frequent adaptation and the McDonald-Kreitman test. Proceedings of the National Academy of Sciences of the United States of America Messer, P. W., Petrov, D. A. 2013; 110 (21): 8615-8620

Abstract

Population genomic studies have shown that genetic draft and background selection can profoundly affect the genome-wide patterns of molecular variation. We performed forward simulations under realistic gene-structure and selection scenarios to investigate whether such linkage effects impinge on the ability of the McDonald-Kreitman (MK) test to infer the rate of positive selection (α) from polymorphism and divergence data. We find that in the presence of slightly deleterious mutations, MK estimates of α severely underestimate the true rate of adaptation even if all polymorphisms with population frequencies under 50% are excluded. Furthermore, already under intermediate rates of adaptation, genetic draft substantially distorts the site frequency spectra at neutral and functional sites from the expectations under mutation-selection-drift balance. MK-type approaches that first infer demography from synonymous sites and then use the inferred demography to correct the estimation of α obtain almost the correct α in our simulations. However, these approaches typically infer a severe past population expansion although there was no such expansion in the simulations, casting doubt on the accuracy of methods that infer demography from synonymous polymorphism data. We propose a simple asymptotic extension of the MK test that yields accurate estimates of α in our simulations and should provide a fruitful direction for future studies.
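
As a point of reference, the classic (non-asymptotic) MK estimate of α can be sketched as below. This is the standard textbook estimator, α = 1 − (Ds·Pn)/(Dn·Ps), and not the asymptotic extension proposed in the paper. The function name and the example counts are illustrative assumptions.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic McDonald-Kreitman estimate of alpha.

    dn, ds: nonsynonymous and synonymous divergence counts
    pn, ps: nonsynonymous and synonymous polymorphism counts
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    # alpha = 1 - (Ds * Pn) / (Dn * Ps)
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, chosen only to illustrate the calculation
print(mk_alpha(dn=100, ds=100, pn=50, ps=100))
```

Slightly deleterious polymorphisms inflate Pn, which is one way this simple estimator can underestimate α.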

View details for DOI 10.1073/pnas.1220835110

View details for PubMedID 23650353

View details for PubMedCentralID PMC3666677
Strong purifying selection at synonymous sites in D. melanogaster. PLoS genetics Lawrie, D. S., Messer, P. W., Hershberg, R., Petrov, D. A. 2013; 9 (5)

Abstract

Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in Drosophila melanogaster evolve under very strong selective constraint while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for those positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization generating canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupae, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the observed widespread constraint acting on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.

View details for DOI 10.1371/journal.pgen.1003527

View details for PubMedID 23737754

View details for PubMedCentralID PMC3667748
Evaluating methods of demographic inference and testing for balancing selection using genomic data from the checkerspot butterfly Euphydryas gillettii Annual Meeting of the Society for Integrative and Comparative Biology (SICB) McCoy, R. C., Boggs, C. B., Petrov, D. A. OXFORD UNIV PRESS INC. 2013: E329–E329

View details for Web of Science ID 000316991402374
Evolutionary Biology for the 21st Century PLOS BIOLOGY Losos, J. B., Arnold, S. J., Bejerano, G., Brodie, E. D., Hibbett, D., Hoekstra, H. E., Mindell, D. P., Monteiro, A., Moritz, C., Orr, H. A., Petrov, D. A., Renner, S. S., Ricklefs, R. E., Soltis, P. S., Turner, T. L. 2013; 11 (1)

View details for DOI 10.1371/journal.pbio.1001466

View details for Web of Science ID 000314648700006

View details for PubMedID 23319892

View details for PubMedCentralID PMC3539946
On the Limitations of Using Ribosomal Genes as References for the Study of Codon Usage: A Rebuttal PLOS ONE Hershberg, R., Petrov, D. A. 2012; 7 (12)

Abstract

In a recent paper published in PLOS ONE, Wang et al. challenge our finding that the identity of optimal codons in different genomes follows a set of clear rules. Here we provide a rebuttal of their paper and demonstrate that the results of our original PLOS Genetics paper stand. This provides us with an opportunity to bring up an aspect of how codon usage has been studied that should be of general interest. The Wang et al. study, as well as many other studies, used ribosomal genes as a reference set for the study of patterns of codon usage. We discuss here the assumptions that are made in order to justify using ribosomal genes to study codon bias, suggest that this practice can at times be problematic, and discuss its limitations.

View details for DOI 10.1371/journal.pone.0049060

View details for Web of Science ID 000312794500008

View details for PubMedID 23284622

View details for PubMedCentralID PMC3527481
LDx: Estimation of Linkage Disequilibrium from High-Throughput Pooled Resequencing Data PLOS ONE Feder, A. F., Petrov, D. A., Bergland, A. O. 2012; 7 (11)

Abstract

High-throughput pooled resequencing offers significant potential for whole genome population sequencing. However, its main drawback is the loss of haplotype information. In order to regain some of this information, we present LDx, a computational tool for estimating linkage disequilibrium (LD) from pooled resequencing data. LDx uses an approximate maximum likelihood approach to estimate LD (r²) between pairs of SNPs that can be observed within and among single reads. LDx also reports r² estimates derived solely from observed genotype counts. We demonstrate that the LDx estimates are highly correlated with r² estimated from individually resequenced strains. We discuss the performance of LDx using more stringent quality conditions and infer via simulation the degree to which performance can improve based on read depth. Finally we demonstrate two possible uses of LDx with real and simulated pooled resequencing data. First, we use LDx to infer genomewide patterns of decay of LD with physical distance in D. melanogaster population resequencing data. Second, we demonstrate that r² estimates from LDx are capable of distinguishing alternative demographic models representing plausible demographic histories of D. melanogaster.

View details for DOI 10.1371/journal.pone.0048588

View details for Web of Science ID 000312272600012

View details for PubMedID 23152785

View details for PubMedCentralID PMC3494690
Genome Patterns of Selection and Introgression of Haplotypes in Natural Populations of the House Mouse (Mus musculus) PLOS GENETICS Staubach, F., Lorenc, A., Messer, P. W., Tang, K., Petrov, D. A., Tautz, D. 2012; 8 (8)

Abstract

General parameters of selection, such as the frequency and strength of positive selection in natural populations or the role of introgression, are still insufficiently understood. The house mouse (Mus musculus) is a particularly well-suited model system to approach such questions, since it has a defined history of splits into subspecies and populations and since extensive genome information is available. We have used high-density single-nucleotide polymorphism (SNP) typing arrays to assess genomic patterns of positive selection and introgression of alleles in two natural populations of each of the subspecies M. m. domesticus and M. m. musculus. Applying different statistical procedures, we find a large number of regions subject to apparent selective sweeps, indicating frequent positive selection on rare alleles or novel mutations. Genes in the regions include well-studied imprinted loci (e.g. Plagl1/Zac1), homologues of human genes involved in adaptations (e.g. alpha-amylase genes) or in genetic diseases (e.g. Huntingtin and Parkin). Haplotype matching between the two subspecies reveals a large number of haplotypes that show patterns of introgression from specific populations of the respective other subspecies, with at least 10% of the genome being affected by partial or full introgression. Using neutral simulations for comparison, we find that the size and the fraction of introgressed haplotypes are not compatible with a pure migration or incomplete lineage sorting model. Hence, it appears that introgressed haplotypes can rise in frequency due to positive selection and thus can contribute to the adaptive genomic landscape of natural populations. Our data support the notion that natural genomes are subject to complex adaptive processes, including the introgression of haplotypes from other differentiated populations or species at a larger scale than previously assumed for animals. This implies that some of the admixture found in inbred strains of mice may also have a natural origin.

View details for DOI 10.1371/journal.pgen.1002891

View details for Web of Science ID 000308529300048

View details for PubMedID 22956910

View details for PubMedCentralID PMC3431316
Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster PLOS ONE Zhu, Y., Bergland, A. O., Gonzalez, J., Petrov, D. A. 2012; 7 (7)

Abstract

The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in individual strains as a reference. We also estimated allele frequency of the SNPs using "pooled" data and compared them with "true" frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive.
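
To illustrate what an error "well approximated by binomial sampling" implies, here is a minimal sketch of the binomial standard error of an allele frequency estimated from pooled reads. It assumes reads at a site are drawn independently from the pool and ignores unequal DNA contributions from individual strains. The function name and the numbers are hypothetical and not taken from the paper.

```python
import math


def binomial_se(freq, read_depth):
    """Standard error of an allele frequency estimated from read_depth
    reads, assuming simple binomial sampling of reads from the pool."""
    if read_depth <= 0:
        raise ValueError("read_depth must be positive")
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    return math.sqrt(freq * (1.0 - freq) / read_depth)


# Hypothetical example: an allele at frequency 0.5 covered by 100 reads
print(binomial_se(0.5, 100))
```

Under this assumption the error shrinks with the square root of read depth, so quadrupling coverage halves the expected sampling error.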

View details for DOI 10.1371/journal.pone.0041901

View details for Web of Science ID 000309240600056

View details for PubMedID 22848651

View details for PubMedCentralID PMC3406057
Origins and rates of aneuploidy in human blastomeres FERTILITY AND STERILITY Rabinowitz, M., Ryan, A., Gemelos, G., Hill, M., Baner, J., Cinnioglu, C., Banjevic, M., Potter, D., Petrov, D. A., Demko, Z. 2012; 97 (2): 395-401

Abstract

To characterize chromosomal error types and parental origin of aneuploidy in cleavage-stage embryos using an informatics-based technique that enables the elucidation of aneuploidy-causing mechanisms. Analysis of blastomeres biopsied from cleavage-stage embryos for preimplantation genetic screening during IVF. Laboratory. Couples undergoing IVF treatment. Two hundred seventy-four blastomeres were subjected to array-based genotyping and informatics-based techniques to characterize chromosomal error types and parental origin of aneuploidy across all 24 chromosomes. Chromosomal error types (monosomy vs. trisomy; mitotic vs. meiotic) and parental origin (maternal vs. paternal). The rate of maternal meiotic trisomy rose significantly with age, whereas other types of trisomy showed no correlation with age. Trisomies were mostly maternal in origin, whereas paternal and maternal monosomies were roughly equal in frequency. No examples of paternal meiotic trisomy were observed. Segmental error rates were found to be independent of maternal age. All types of aneuploidy that rose with increasing maternal age can be attributed to disjunction errors during meiosis of the oocyte. Chromosome gains were predominantly maternal in origin and occurred during meiosis, whereas chromosome losses were not biased in terms of parental origin of the chromosome. The ability to determine the parental origin for each chromosome, as well as being able to detect whether multiple homologs from a single parent were present, allowed greater insights into the origin of aneuploidy.

View details for DOI 10.1016/j.fertnstert.2011.11.034

View details for Web of Science ID 000299961800028

View details for PubMedID 22195772
Evolution of genome content: population dynamics of transposable elements in flies and humans. Methods in molecular biology (Clifton, N.J.) González, J., Petrov, D. A. 2012; 855: 361-383

Abstract

Recent research is starting to shed light on the factors that influence the population and evolutionary dynamics of transposable elements (TEs) and TE life cycles. Genomes differ sharply in the number of TE copies, in the level of TE activity, in the diversity of TE families and types, and in the proportion of old and young TEs. In this chapter, we focus on two well-studied genomes with strikingly different architectures, humans and Drosophila, which represent two extremes in terms of TE diversity and population dynamics. We argue that some of the answers might lie in (1) the larger population size and consequently more effective selection against new TE insertions due to ectopic recombination in flies compared to humans; and (2) in the faster rate of DNA loss in flies compared to humans leading to much faster removal of fixed TE copies from the fly genome.

View details for DOI 10.1007/978-1-61779-582-4_13

View details for PubMedID 22407716
Heterozygote advantage as a natural consequence of adaptation in diploids PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Sellis, D., Callahan, B. J., Petrov, D. A., Messer, P. W. 2011; 108 (51): 20666-20671

Abstract

Molecular adaptation is typically assumed to proceed by sequential fixation of beneficial mutations. In diploids, this picture presupposes that for most adaptive mutations, the homozygotes have a higher fitness than the heterozygotes. Here, we show that contrary to this expectation, a substantial proportion of adaptive mutations should display heterozygote advantage. This feature of adaptation in diploids emerges naturally from the primary importance of the fitness of heterozygotes for the invasion of new adaptive mutations. We formalize this result in the framework of Fisher's influential geometric model of adaptation. We find that in diploids, adaptation should often proceed through a succession of short-lived balanced states that maintain substantially higher levels of phenotypic and fitness variation in the population compared with classic adaptive walks. In fast-changing environments, this variation produces a diversity advantage that allows diploids to remain better adapted compared with haploids despite the disadvantage associated with the presence of unfit homozygotes. The short-lived balanced states arising during adaptive walks should be mostly invisible to current scans for long-term balancing selection. Instead, they should leave signatures of incomplete selective sweeps, which do appear to be common in many species. Our results also raise the possibility that balancing selection, as a natural consequence of frequent adaptation, might play a more prominent role among the forces maintaining genetic variation than is commonly recognized.

View details for DOI 10.1073/pnas.1114573108

View details for Web of Science ID 000298289400081

View details for PubMedID 22143780

View details for PubMedCentralID PMC3251125
High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes GENOME RESEARCH Markova-Raina, P., Petrov, D. 2011; 21 (6): 863-874

Abstract

We investigate the effect of aligner choice on inferences of positive selection using site-specific models of molecular evolution. We find that independently of the choice of aligner, the rate of false positives is unacceptably high. Our study is a whole-genome analysis of all protein-coding genes in 12 Drosophila genomes annotated in either all 12 species (~6690 genes) or in the six melanogaster group species. We compare six popular aligners: PRANK, T-Coffee, ClustalW, ProbCons, AMAP, and MUSCLE, and find that the aligner choice strongly influences the estimates of positive selection. Differences persist when we use (1) different stringency cutoffs, (2) different selection inference models, (3) alignments with or without gaps, and/or additional masking, (4) per-site versus per-gene statistics, (5) closely related melanogaster group species versus more distant 12 Drosophila genomes. Furthermore, we find that these differences are consequential for downstream analyses such as determination of over/under-represented GO terms associated with positive selection. Visual analysis indicates that most sites inferred as positively selected are, in fact, misaligned at the codon level, resulting in false positive rates of 48%-82%. PRANK, which has been reported to outperform other aligners in simulations, performed best in our empirical study as well. Unfortunately, PRANK still had a high false positive rate of 50%-55%, which is unacceptable for most applications. We identify misannotations and indels, many of which appear to be located in disordered protein regions, as primary culprits for the high misalignment-related error levels and discuss possible workaround approaches to this apparently pervasive problem in genome-wide evolutionary analyses.

View details for DOI 10.1101/gr.115949.110

View details for Web of Science ID 000291153400006

View details for PubMedID 21393387

View details for PubMedCentralID PMC3106319
Population Genomics of Transposable Elements in Drosophila melanogaster MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Fiston-Lavier, A., Lipatov, M., Lenkov, K., Gonzalez, J. 2011; 28 (5): 1633-1644

Abstract

Transposable elements (TEs) are the primary contributors to the genome bulk in many organisms and are major players in genome evolution. A clear and thorough understanding of the population dynamics of TEs is therefore essential for full comprehension of eukaryotic genome evolution and function. Although TEs in Drosophila melanogaster have received much attention, the population dynamics of most TE families in this species remain entirely unexplored. It is not clear whether the same population processes can account for the population behaviors of all TEs in Drosophila or whether, as has been suggested previously, different orders behave according to very different rules. In this work, we analyzed population frequencies for a large number of individual TEs (755 TEs) in five North American and one sub-Saharan African D. melanogaster populations (75 strains in total). These TEs have been annotated in the reference D. melanogaster euchromatic genome and have been sampled from all three major orders (non-LTR, LTR, and TIR) and from all families with more than 20 TE copies (55 families in total). We find strong evidence that TEs in Drosophila across all orders and families are subject to purifying selection at the level of ectopic recombination. We showed that the strength of this selection varies predictably with recombination rate, length of individual TEs, and copy number and length of other TEs in the same family. Importantly, these rules do not appear to vary across orders. Finally, we built a statistical model that considered only individual TE-level (such as the TE length) and family-level properties (such as the copy number) and were able to explain more than 40% of the variation in TE frequencies in D. melanogaster.

View details for DOI 10.1093/molbev/msq337

View details for Web of Science ID 000289841500011

View details for PubMedID 21172826

View details for PubMedCentralID PMC3080135
T-lex: a program for fast and accurate assessment of transposable element presence using next-generation sequencing data NUCLEIC ACIDS RESEARCH Fiston-Lavier, A., Carrigan, M., Petrov, D. A., Gonzalez, J. 2011; 39 (6)

Abstract

Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into annotation of TE copies in reference genomes. The sequencing cost reduction and the newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence/absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex detecting presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected with 100% sensitivity and 97% specificity. We show that even at low levels of coverage T-lex produces accurate results for TE copies that it can identify reliably but that the rate of 'no data' calls increases as the coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used in any genome provided the availability of the reference genome, individual TE copy annotation and NGS data.

View details for DOI 10.1093/nar/gkq1291

View details for Web of Science ID 000289166400004

View details for PubMedID 21177644

View details for PubMedCentralID PMC3064797
Faster than Neutral Evolution of Constrained Sequences: The Complex Interplay of Mutational Biases and Weak Selection GENOME BIOLOGY AND EVOLUTION Lawrie, D. S., Petrov, D. A., Messer, P. W. 2011; 3: 383-395

Abstract

Comparative genomics has become widely accepted as the major framework for the ascertainment of functionally important regions in genomes. The underlying paradigm of this approach is that most of the functional regions are assumed to be under selective constraint, which in turn reduces the rate of evolution relative to neutrality. This assumption allows detection of functional regions through sequence conservation. However, constraint does not always lead to sequence conservation. When purifying selection is weak and mutation is biased, constrained regions can even evolve faster than neutral sequences and thus can appear to be under positive selection. Moreover, conservation estimates depend also on the orientation of selection relative to mutational biases and can vary over time. In the light of recent data of the ubiquity of mutational biases and weak selective forces, these effects should reduce the power of conservation analyses to define functional regions using comparative genomics data. We argue that the estimation of true mutational biases and the use of explicit evolutionary models are essential to improve methods inferring the action of natural selection and functionality in genome sequences.

View details for DOI 10.1093/gbe/evr032

View details for Web of Science ID 000295693200004

View details for PubMedID 21498884

View details for PubMedCentralID PMC3101017
Drosophila melanogaster recombination rate calculator GENE Fiston-Lavier, A., Singh, N. D., Lipatov, M., Petrov, D. A. 2010; 463 (1-2): 18-20

Abstract

Recombination rate is a key evolutionary parameter that determines the degree to which sites are linked. Estimating recombination rates is thus of crucial importance for population genetic and molecular evolutionary studies. We present here a user-friendly web-based tool that can be used to retrieve recombination rate estimates for single and/or multiple loci in the Drosophila melanogaster genome given a user-defined choice of the genome release. We used the Marey map approach that is based on comparing the genetic and physical maps to infer recombination rates along the major chromosomes of the D. melanogaster genome. Our implementation of this approach is based on building third-order polynomials which are used to interpolate recombination rates at all points on the chromosome except for telomeric and centromeric regions in which such polynomials are known to provide particularly poor estimation.
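
The Marey map idea can be sketched as follows: if genetic position (cM) is modelled as a third-order polynomial of physical position (Mb), the local recombination rate (cM/Mb) is the derivative of that polynomial. The function name and the coefficients below are made-up assumptions for illustration. They are not the values or the code used by the calculator.

```python
def recombination_rate(coeffs, position_mb):
    """Local recombination rate (cM/Mb) from a cubic Marey-map fit.

    coeffs: (a, b, c, d) such that
        genetic_position_cM = a + b*x + c*x**2 + d*x**3
    position_mb: physical position x in Mb
    """
    a, b, c, d = coeffs
    # Derivative of the cubic with respect to physical position
    return b + 2 * c * position_mb + 3 * d * position_mb ** 2


# Hypothetical coefficients for one chromosome arm
example_coeffs = (0.0, 1.0, 0.5, -0.1)
print(recombination_rate(example_coeffs, 2.0))
```

A cubic fitted this way can give negative or otherwise implausible rates near the chromosome ends, which is consistent with the abstract's caveat about telomeric and centromeric regions.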

View details for DOI 10.1016/j.gene.2010.04.015

View details for Web of Science ID 000280751600003

View details for PubMedID 20452408
Evidence That Mutation Is Universally Biased towards AT in Bacteria PLOS GENETICS Hershberg, R., Petrov, D. A. 2010; 6 (9)

Abstract

Mutation is the engine that drives evolution and adaptation forward in that it generates the variation on which natural selection acts. Mutation is a random process that nevertheless occurs according to certain biases. Elucidating mutational biases and the way they vary across species and within genomes is crucial to understanding evolution and adaptation. Here we demonstrate that clonal pathogens that evolve under severely relaxed selection are uniquely suitable for studying mutational biases in bacteria. We estimate mutational patterns using sequence datasets from five such clonal pathogens belonging to four diverse bacterial clades that span most of the range of genomic nucleotide content. We demonstrate that across different types of sites and in all four clades mutation is consistently biased towards AT. This is true even in clades that have high genomic GC content. In all studied cases the mutational bias towards AT is primarily due to the high rate of C/G to T/A transitions. These results suggest that bacterial mutational biases are far less variable than previously thought. They further demonstrate that variation in nucleotide content cannot stem entirely from variation in mutational biases and that natural selection and/or a natural selection-like process such as biased gene conversion strongly affect nucleotide content.

View details for DOI 10.1371/journal.pgen.1001115

View details for Web of Science ID 000282369200053

View details for PubMedID 20838599

View details for PubMedCentralID PMC2936535
Evidence that Adaptation in Drosophila Is Not Limited by Mutation at Single Sites PLOS GENETICS Karasov, T., Messer, P. W., Petrov, D. A. 2010; 6 (6)

Abstract

Adaptation in eukaryotes is generally assumed to be mutation-limited because of small effective population sizes. This view is difficult to reconcile, however, with the observation that adaptation to anthropogenic changes, such as the introduction of pesticides, can occur very rapidly. Here we investigate adaptation at a key insecticide resistance locus (Ace) in Drosophila melanogaster and show that multiple simple and complex resistance alleles evolved quickly and repeatedly within individual populations. Our results imply that the current effective population size of modern D. melanogaster populations is likely to be substantially larger (≥100-fold) than commonly believed. This discrepancy arises because estimates of the effective population size are generally derived from levels of standing variation and thus reveal long-term population dynamics dominated by sharp, even if infrequent, bottlenecks. The short-term effective population sizes relevant for strong adaptation, on the other hand, might be much closer to census population sizes. Adaptation in Drosophila may therefore not be limited by waiting for mutations at single sites, and complex adaptive alleles can be generated quickly without fixation of intermediate states. Adaptive events should also commonly involve the simultaneous rise in frequency of independently generated adaptive mutations. These so-called soft sweeps have very distinct effects on the linked neutral polymorphisms compared to the standard hard sweeps in mutation-limited scenarios. Methods for the mapping of adaptive mutations or association mapping of evolutionarily relevant mutations may thus need to be reconsidered.

View details for DOI 10.1371/journal.pgen.1000924

View details for Web of Science ID 000279805200003

View details for PubMedID 20585551

View details for PubMedCentralID PMC2887467
Genome-Wide Patterns of Adaptation to Temperate Environments Associated with Transposable Elements in Drosophila PLOS GENETICS Gonzalez, J., Karasov, T. L., Messer, P. W., Petrov, D. A. 2010; 6 (4)

Abstract

Investigating spatial patterns of loci under selection can give insight into how populations evolved in response to selective pressures and can provide monitoring tools for detecting the impact of environmental changes on populations. Drosophila is a particularly good model to study adaptation to environmental heterogeneity since it is a tropical species that originated in sub-Saharan Africa and has only recently colonized the rest of the world. There is strong evidence for the adaptive role of Transposable Elements (TEs) in the evolution of Drosophila, and TEs might play an important role specifically in adaptation to temperate climates. In this work, we analyzed the frequency of a set of putatively adaptive and putatively neutral TEs in populations with contrasting climates that were collected near the endpoints of two known latitudinal clines in Australia and North America. The contrasting results obtained for putatively adaptive and putatively neutral TEs and the consistency of the patterns between continents strongly suggest that putatively adaptive TEs are involved in adaptation to temperate climates. We integrated information on population behavior, possible environmental selective agents, and both molecular and functional information of the TEs and their nearby genes to infer the plausible phenotypic consequences of these insertions. We conclude that adaptation to temperate environments is widespread in Drosophila and that TEs play a significant role in this adaptation. It is remarkable that such a diverse set of TEs located next to a diverse set of genes are consistently adaptive to temperate climate-related factors. We argue that reverse population genomic analyses, such as the one described in this work, are necessary to arrive at a comprehensive picture of adaptation.

View details for DOI 10.1371/journal.pgen.1000905

View details for Web of Science ID 000277354200022

View details for PubMedID 20386746

View details for PubMedCentralID PMC2851572
Adaptive Evolution of Pelvic Reduction in Sticklebacks by Recurrent Deletion of a Pitx1 Enhancer SCIENCE Chan, Y. F., Marks, M. E., Jones, F. C., Villarreal, G., Shapiro, M. D., Brady, S. D., Southwick, A. M., Absher, D. M., Grimwood, J., Schmutz, J., Myers, R. M., Petrov, D., Jonsson, B., Schluter, D., Bell, M. A., Kingsley, D. M. 2010; 327 (5963): 302-305

Abstract

The molecular mechanisms underlying major phenotypic changes that have evolved repeatedly in nature are generally unknown. Pelvic loss in different natural populations of threespine stickleback fish has occurred through regulatory mutations deleting a tissue-specific enhancer of the Pituitary homeobox transcription factor 1 (Pitx1) gene. The high prevalence of deletion mutations at Pitx1 may be influenced by inherent structural features of the locus. Although Pitx1 null mutations are lethal in laboratory animals, Pitx1 regulatory mutations show molecular signatures of positive selection in pelvic-reduced populations. These studies illustrate how major expression and morphological changes can arise from single mutational leaps in natural populations, producing new adaptive alleles via recurrent regulatory alterations in a key developmental control gene.

View details for DOI 10.1126/science.1182213

View details for Web of Science ID 000273629700034

View details for PubMedID 20007865

View details for PubMedCentralID PMC3109066
Relaxed Purifying Selection and Possibly High Rate of Adaptation in Primate Lineage-Specific Genes GENOME BIOLOGY AND EVOLUTION Cai, J. J., Petrov, D. A. 2010; 2: 393-409

Abstract

Genes in the same organism vary in the time since their evolutionary origin. Without horizontal gene transfer, young genes are necessarily restricted to a few closely related species, whereas old genes can be broadly distributed across the phylogeny. It has been shown that young genes evolve faster than old genes; however, the evolutionary forces responsible for this pattern remain obscure. Here, we classify human-chimp protein-coding genes into different age classes, according to the breadth of their phylogenetic distribution. We estimate the strength of purifying selection and the rate of adaptive selection for genes in different age classes. We find that older genes carry fewer and less frequent nonsynonymous single-nucleotide polymorphisms than younger genes, suggesting that older genes experience stronger purifying selection at the protein-coding level. We infer the distribution of fitness effects of new deleterious mutations and find that older genes have proportionally more slightly deleterious mutations and fewer nearly neutral mutations than younger genes. To investigate the role of adaptive selection of genes in different age classes, we determine the selection coefficient (gamma = 2N(e)s) of genes using the MKPRF approach and estimate the ratio of the rate of adaptive nonsynonymous substitution to synonymous substitution (omega(A)) using the DoFE method. Although the proportion of positively selected genes (gamma > 0) is significantly higher in younger genes, we find no correlation between omega(A) and gene age. Collectively, these results provide strong evidence that younger genes are subject to weaker purifying selection and more tenuous evidence that they also undergo adaptive evolution more frequently.

View details for DOI 10.1093/gbe/evq019

View details for Web of Science ID 000280480000035

View details for PubMedID 20624743

View details for PubMedCentralID PMC2997544
Broker Genes in Human Disease GENOME BIOLOGY AND EVOLUTION Cai, J. J., Borenstein, E., Petrov, D. A. 2010; 2: 815-825

Abstract

Genes that underlie human disease are important subjects of systems biology research. In the present study, we demonstrate that Mendelian and complex disease genes have distinct and consistent protein-protein interaction (PPI) properties. We show that five different network properties can be reduced to two independent metrics when applied to the human PPI network. These two metrics largely coincide with the degree (number of connections) and the clustering coefficient (the number of connections among the neighbors of a particular protein). We demonstrate that disease genes have simultaneously unusually high degree and unusually low clustering coefficient. Such genes can be described as brokers in that they connect many proteins that would not be connected otherwise. We show that these results are robust to the effect of gene age and inspection bias variation. Notably, genes identified in genome-wide association study (GWAS) have network patterns that are almost indistinguishable from the network patterns of nondisease genes and significantly different from the network patterns of complex disease genes identified through non-GWAS means. This suggests either that GWAS focused on a distinct set of diseases associated with an unusual set of genes or that mapping of GWAS-identified single nucleotide polymorphisms onto the causally affected neighboring genes is error prone.
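The two metrics named in this abstract, degree and clustering coefficient, can be illustrated with a minimal sketch. It assumes the standard normalized local clustering coefficient (the fraction of possible links among a node's neighbors that actually exist), which may differ from the paper's exact definition. The toy network and node names are hypothetical and are not data from the study.

```python
from itertools import combinations


def degree_and_clustering(edges):
    """Return {node: (degree, local clustering coefficient)} for an undirected graph."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    result = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            # Clustering is undefined with fewer than two neighbors; report 0.0.
            result[node] = (k, 0.0)
            continue
        # Count the links that exist among this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        result[node] = (k, links / (k * (k - 1) / 2))
    return result


# Hypothetical toy network: "B" connects partners that share no other links
# (a broker-like node), while "H" sits in a tightly knit group.
toy_edges = [
    ("B", "P1"), ("B", "P2"), ("B", "P3"), ("B", "P4"), ("B", "H"),
    ("H", "Q1"), ("H", "Q2"), ("H", "Q3"),
    ("Q1", "Q2"), ("Q1", "Q3"), ("Q2", "Q3"),
]
stats = degree_and_clustering(toy_edges)
print(stats["B"])  # high degree, zero clustering
print(stats["H"])  # similar degree, higher clustering
```

In this toy example, node "B" has high degree and low clustering, the combination the abstract describes as a broker.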

View details for DOI 10.1093/gbe/evq064

View details for Web of Science ID 000291467300023

View details for PubMedID 20937604

View details for PubMedCentralID PMC2988523
Time for DNA Disclosure SCIENCE Krane, D. E., Bahn, V., Balding, D., Barlow, B., Cash, H., Desportes, B. L., D'Eustachio, P., Devlin, K., Doom, T. E., Dror, I., Ford, S., Funk, C., Gilder, J., Hampikian, G., Inman, K., Jamieson, A., Kent, P. E., Koppl, R., Kornfield, I., Krimsky, S., Mnookin, J., Mueller, L., Murphy, E., Paoletti, D. R., Petrov, D. A., Raymer, M., Risinger, D. M., Roth, A., Rudin, N., Shields, W., Siegel, J. A., Slatkin, M., Song, Y. S., Speed, T., Spiegelman, C., Sullivan, P., Swienton, A. R., Tarpey, T., Thompson, W. C., Ungvarsky, E., Zabell, S. 2009; 326 (5960): 1631-1632

View details for Web of Science ID 000272839000027

View details for PubMedID 20019271
The adaptive role of transposable elements in the Drosophila genome GENE Gonzalez, J., Petrov, D. A. 2009; 448 (2): 124-133

Abstract

Transposable elements (TEs) are short DNA sequences with the capacity to move between different sites in the genome. This ability provides them with the capacity to mutate the genome in many different ways, from subtle regulatory mutations to gross genomic rearrangements. The potential adaptive significance of TEs was recognized by those involved in their initial discovery although it was hotly debated afterwards. For more than two decades, TEs were considered to be intragenomic parasites leading to almost exclusively detrimental effects to the host genome. The sequencing of the Drosophila melanogaster genome provided an unprecedented opportunity to study TEs and led to the identification of the first TE-induced adaptations in this species. These studies were followed by a systematic genome-wide search for adaptive insertions that made it possible, for the first time, to infer that TEs contribute substantially to adaptive evolution. This study also revealed that there are at least twice as many TE-induced adaptations that remain to be identified. To gain a better understanding of the adaptive role of TEs in the genome we clearly need to (i) identify as many adaptive TEs as possible in a range of Drosophila species as well as (ii) carry out in-depth investigations of the effects of adaptive TEs on as many phenotypes as possible.

View details for DOI 10.1016/j.gene.2009.06.008

View details for Web of Science ID 000271972200004

View details for PubMedID 19555747

View details for PubMedCentralID PMC2784284
MITEs-The Ultimate Parasites SCIENCE Gonzalez, J., Petrov, D. 2009; 325 (5946): 1352-1353

View details for DOI 10.1126/science.1179556

View details for Web of Science ID 000269699100025

View details for PubMedID 19745141
A Recent Adaptive Transposable Element Insertion Near Highly Conserved Developmental Loci in Drosophila melanogaster MOLECULAR BIOLOGY AND EVOLUTION Gonzalez, J., Macpherson, J. M., Petrov, D. A. 2009; 26 (9): 1949-1961

Abstract

A recent genomewide screen identified 13 transposable elements that are likely to have been adaptive during or after the spread of Drosophila melanogaster out of Africa. One of these insertions, Bari-Juvenile hormone epoxy hydrolase (Bari-Jheh), was associated with the selective sweep of its flanking neutral variation and with reduction of expression of one of its neighboring genes: Jheh3. Here, we provide further evidence that Bari-Jheh insertion is adaptive. We delimit the extent of the selective sweep and show that Bari-Jheh is the only mutation linked to the sweep. Bari-Jheh also lowers the expression of its other flanking gene, Jheh2. Subtle consequences of Bari-Jheh insertion on life-history traits are consistent with the effects of reduced expression of the Jheh genes. Finally, we analyze molecular evolution of Jheh genes in both the long- and the short-term and conclude that Bari-Jheh appears to be a very rare adaptive event in the history of these genes. We discuss the implications of these findings for the detection and understanding of adaptation.

View details for DOI 10.1093/molbev/msp107

View details for Web of Science ID 000269001500003

View details for PubMedID 19458110

View details for PubMedCentralID PMC2734154
From trait to base pairs: Parallel evolution of pelvic reduction in three-spined sticklebacks occurs by repeated deletion of a tissue-specific pelvic enhancer at Pitx1 Chan, Y., Villarreal, G., Marks, M., Shapiro, M., Jones, F., Petrov, D., Dickson, M., Southwick, A., Absher, D., Grimwood, J., Schmutz, J., Myers, R., Jonsson, B., Schluter, D., Bell, M., Kingsley, D. ELSEVIER SCIENCE BV. 2009: S14–S15

View details for DOI 10.1016/j.mod.2009.06.980

View details for Web of Science ID 000270034900038
General Rules for Optimal Codon Choice PLOS GENETICS Hershberg, R., Petrov, D. A. 2009; 5 (7)

Abstract

Different synonymous codons are favored by natural selection for translation efficiency and accuracy in different organisms. The rules governing the identities of favored codons in different organisms remain obscure. In fact, it is not known whether such rules exist or whether favored codons are chosen randomly in evolution in a process akin to a series of frozen accidents. Here, we study this question by identifying for the first time the favored codons in 675 bacteria, 52 archaea, and 10 fungi. We use a number of tests to show that the identified codons are indeed likely to be favored and find that across all studied organisms the identity of favored codons tracks the GC content of the genomes. Once the effect of the genomic GC content on selectively favored codon choice is taken into account, additional universal amino acid specific rules governing the identity of favored codons become apparent. Our results provide for the first time a clear set of rules governing the evolution of selectively favored codon usage. Based on these results, we describe a putative scenario for how evolutionary shifts in the identity of selectively favored codons can occur without even temporary weakening of natural selection for codon bias.

View details for DOI 10.1371/journal.pgen.1000556

View details for Web of Science ID 000269219500033

View details for PubMedID 19593368

View details for PubMedCentralID PMC2700274
Pervasive Natural Selection in the Drosophila Genome? PLOS GENETICS Sella, G., Petrov, D. A., Przeworski, M., Andolfatto, P. 2009; 5 (6)

Abstract

Over the past four decades, the predominant view of molecular evolution saw little connection between natural selection and genome evolution, assuming that the functionally constrained fraction of the genome is relatively small and that adaptation is sufficiently infrequent to play little role in shaping patterns of variation within and even between species. Recent evidence from Drosophila, reviewed here, suggests that this view may be invalid. Analyses of genetic variation within and between species reveal that much of the Drosophila genome is under purifying selection, and thus of functional importance, and that a large fraction of coding and noncoding differences between species are adaptive. The findings further indicate that, in Drosophila, adaptations may be both common and strong enough that the fate of neutral mutations depends on their chance linkage to adaptive mutations as much as on the vagaries of genetic drift. The emerging evidence has implications for a wide variety of fields, from conservation genetics to bioinformatics, and presents challenges to modelers and experimentalists alike.

View details for DOI 10.1371/journal.pgen.1000495

View details for Web of Science ID 000268444600003

View details for PubMedID 19503600

View details for PubMedCentralID PMC2684638
Molecular Evolution of the Testis TAFs of Drosophila MOLECULAR BIOLOGY AND EVOLUTION Li, V. C., Davis, J. C., Lenkov, K., Bolival, B., Fuller, M. T., Petrov, D. A. 2009; 26 (5): 1103-1116

Abstract

The basal transcription machinery is responsible for initiating transcription at core promoters. During metazoan evolution, its components have expanded in number and diversified to increase the complexity of transcriptional regulation in tissues and developmental stages. To explore the evolutionary events and forces underlying this diversification, we analyzed the evolution of the Drosophila testis TAFs (TBP-associated factors), paralogs of TAFs from the basal transcription factor TFIID that are essential for normal transcription during spermatogenesis of a large set of specific genes involved in terminal differentiation of male gametes. There are five testis-specific TAFs in Drosophila, each expressed only in primary spermatocytes and each a paralog of a different generally expressed TFIID subunit. An examination of the presence of paralogs across taxa as well as molecular clock dating indicates that all five testis TAFs likely arose within a span of approximately 38 My, 63-250 Ma, by independent duplication events from their generally expressed paralogs. Furthermore, the evolution of the testis TAFs has been rapid, with apparent further accelerations in multiple Drosophila lineages. Analysis of between-species divergence and intraspecies polymorphism indicates that the major forces of evolution on these genes have been reduced purifying selection, pervasive positive selection, and coevolution. Other genes that exhibit similar patterns of evolution in the Drosophila lineages are also characterized by enriched expression in the testis, suggesting that the pervasive positive selection acting on the tTAFs is likely to be related to their expression in the testis.

View details for DOI 10.1093/molbev/msp030

View details for Web of Science ID 000265274000014

View details for PubMedID 19244474

View details for PubMedCentralID PMC2727373
Inferring the Strength of Selection in Drosophila under Complex Demographic Models MOLECULAR BIOLOGY AND EVOLUTION Gonzalez, J., Macpherson, J. M., Messer, P. W., Petrov, D. A. 2009; 26 (3): 513-526

Abstract

Transposable elements (TEs) constitute a substantial fraction of the genomes of many species, and it is thus important to understand their population dynamics. The strength of natural selection against TEs is a key parameter in understanding these dynamics. In principle, the strength of selection can be inferred from the frequencies of a sample of TEs. However, complicated demographic histories, such as found in Drosophila melanogaster, could lead to a substantial distortion of the TE frequency distribution compared with that expected for a panmictic, constant-sized population. The current methodology for the estimation of selection intensity acting against TEs does not take into account demographic history and might generate erroneous estimates especially for TE families under weak selection. Here, we develop a flexible maximum likelihood methodology that explicitly accounts both for demographic history and for the ascertainment biases of identifying TEs. We apply this method to the newly generated frequency data of the BS family of non-long terminal repeat retrotransposons in D. melanogaster in concert with two recent models of the demographic history of the species to infer the intensity of selection against this family. We find the estimate to differ substantially compared with a prior estimate that was made assuming a model of constant population size. Further, we find there to be relatively little information about selection intensity present in the derived non-African frequency data and that the ancestral African subpopulation is much more informative in this respect. These findings highlight the importance of accounting for demographic history and bear on study design for the inference of selection coefficients generally.

View details for DOI 10.1093/molbev/msn270

View details for Web of Science ID 000263420900005

View details for PubMedID 19033258

View details for PubMedCentralID PMC2767090
Similarly Strong Purifying Selection Acts on Human Disease Genes of All Evolutionary Ages GENOME BIOLOGY AND EVOLUTION Cai, J. J., Borenstein, E., Chen, R., Petrov, D. A. 2009; 1: 131-144

Abstract

A number of studies have shown that recently created genes differ from the genes created in deep evolutionary past in many aspects. Here, we determined the age of emergence and propensity for gene loss (PGL) of all human protein-coding genes and compared disease genes with non-disease genes in terms of their evolutionary rate, strength of purifying selection, mRNA expression, and genetic redundancy. The older and the less prone to loss, non-disease genes have been evolving 1.5- to 3-fold slower between humans and chimps than young non-disease genes, whereas Mendelian disease genes have been evolving very slowly regardless of their ages and PGL. Complex disease genes showed an intermediate pattern. Disease genes also have higher mRNA expression heterogeneity across multiple tissues than non-disease genes regardless of age and PGL. Young and middle-aged disease genes have fewer similar paralogs than non-disease genes of the same age. We reasoned that genes were more likely to be involved in human disease if they were under a strong functional constraint, expressed heterogeneously across tissues, and lacked genetic redundancy. Young human genes that have been evolving under strong constraint between humans and chimps might also be enriched for genes that encode important primate or even human-specific functions.

View details for DOI 10.1093/gbe/evp013

View details for Web of Science ID 000275269200014

View details for PubMedID 20333184

View details for PubMedCentralID PMC2817408
Pervasive Hitchhiking at Coding and Regulatory Sites in Humans PLOS GENETICS Cai, J. J., Macpherson, J. M., Sella, G., Petrov, D. A. 2009; 5 (1)

Abstract

Much effort and interest have focused on assessing the importance of natural selection, particularly positive natural selection, in shaping the human genome. Although scans for positive selection have identified candidate loci that may be associated with positive selection in humans, such scans do not indicate whether adaptation is frequent in general in humans. Studies based on the reasoning of the McDonald-Kreitman test, which, in principle, can be used to evaluate the extent of positive selection, suggested that adaptation is detectable in the human genome but that it is less common than in Drosophila or Escherichia coli. Both positive and purifying natural selection at functional sites should affect levels and patterns of polymorphism at linked nonfunctional sites. Here, we search for these effects by analyzing patterns of neutral polymorphism in humans in relation to the rates of recombination, functional density, and functional divergence with chimpanzees. We find that the levels of neutral polymorphism are lower in the regions of lower recombination and in the regions of higher functional density or divergence. These correlations persist after controlling for the variation in GC content, density of simple repeats, selective constraint, mutation rate, and depth of sequencing coverage. We argue that these results are most plausibly explained by the effects of natural selection at functional sites -- either recurrent selective sweeps or background selection -- on the levels of linked neutral polymorphism. Natural selection at both coding and regulatory sites appears to affect linked neutral polymorphism, reducing neutral polymorphism by 6% genome-wide and by 11% in the gene-rich half of the human genome. These findings suggest that the effects of natural selection at linked sites cannot be ignored in the study of neutral human polymorphism.
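The McDonald-Kreitman reasoning mentioned in this abstract compares nonsynonymous and synonymous counts for divergence (Dn, Ds) and polymorphism (Pn, Ps). A minimal sketch follows, assuming the commonly used estimator alpha = 1 - (Ds * Pn) / (Dn * Ps) for the fraction of adaptive nonsynonymous substitutions. This estimator is not necessarily the one used in the studies the abstract cites. The function name and the counts are hypothetical illustrations, not data from the paper.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate the fraction of adaptive nonsynonymous substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    # Under neutrality Dn/Ds should equal Pn/Ps, which gives alpha = 0.
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts chosen for illustration only.
alpha = mk_alpha(dn=60, ds=100, pn=30, ps=100)
print(f"alpha = {alpha:.2f}")
```

An excess of nonsynonymous divergence relative to polymorphism gives a positive alpha, and slightly deleterious polymorphisms tend to bias the estimate downward.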

View details for DOI 10.1371/journal.pgen.1000336

View details for Web of Science ID 000266221100019

View details for PubMedID 19148272

View details for PubMedCentralID PMC2613029
High Functional Diversity in Mycobacterium tuberculosis Driven by Genetic Drift and Human Demography PLOS BIOLOGY Hershberg, R., Lipatov, M., Small, P. M., Sheffer, H., Niemann, S., Homolka, S., Roach, J. C., Kremer, K., Petrov, D. A., Feldman, M. W., Gagneux, S. 2008; 6 (12): 2658-2671

Abstract

Mycobacterium tuberculosis infects one third of the human world population and kills someone every 15 seconds. For more than a century, scientists and clinicians have been distinguishing between the human- and animal-adapted members of the M. tuberculosis complex (MTBC). However, all human-adapted strains of MTBC have traditionally been considered to be essentially identical. We surveyed sequence diversity within a global collection of strains belonging to MTBC using seven megabase pairs of DNA sequence data. We show that the members of MTBC affecting humans are more genetically diverse than generally assumed, and that this diversity can be linked to human demographic and migratory events. We further demonstrate that these organisms are under extremely reduced purifying selection and that, as a result of increased genetic drift, much of this genetic diversity is likely to have functional consequences. Our findings suggest that the current increases in human population, urbanization, and global travel, combined with the population genetic characteristics of M. tuberculosis described here, could contribute to the emergence and spread of drug-resistant tuberculosis.

View details for DOI 10.1371/journal.pbio.0060311

View details for Web of Science ID 000261913700009

View details for PubMedID 19090620

View details for PubMedCentralID PMC2602723
High Rate of Recent Transposable Element-Induced Adaptation in Drosophila melanogaster PLOS BIOLOGY Gonzalez, J., Lenkov, K., Lipatov, M., Macpherson, J. M., Petrov, D. A. 2008; 6 (10): 2109-2129

Abstract

Although transposable elements (TEs) are known to be potent sources of mutation, their contribution to the generation of recent adaptive changes has never been systematically assessed. In this work, we conduct a genome-wide screen for adaptive TE insertions in Drosophila melanogaster that have taken place during or after the spread of this species out of Africa. We determine population frequencies of 902 of the 1,572 TEs in Release 3 of the D. melanogaster genome and identify a set of 13 putatively adaptive TEs. These 13 TEs increased in population frequency sharply after the spread out of Africa. We argue that many of these TEs are in fact adaptive by demonstrating that the regions flanking five of these TEs display signatures of partial selective sweeps. Furthermore, we show that eight out of the 13 putatively adaptive elements show population frequency heterogeneity consistent with these elements playing a role in adaptation to temperate climates. We conclude that TEs have contributed considerably to recent adaptive evolution (one TE-induced adaptation every 200-1,250 y). The majority of these adaptive insertions are likely to be involved in regulatory changes. Our results also suggest that TE-induced adaptations arise more often from standing variants than from new mutations. Such a high rate of TE-induced adaptation is inconsistent with the number of fixed TEs in the D. melanogaster genome, and we discuss possible explanations for this discrepancy.

View details for DOI 10.1371/journal.pbio.0060251

View details for Web of Science ID 000260423900008

View details for PubMedID 18942889

View details for PubMedCentralID PMC2570423
Pervasive and Persistent Redundancy among Duplicated Genes in Yeast PLOS GENETICS Dean, E. J., Davis, J. C., Davis, R. W., Petrov, D. A. 2008; 4 (7)

Abstract

The loss of functional redundancy is the key process in the evolution of duplicated genes. Here we systematically assess the extent of functional redundancy among a large set of duplicated genes in Saccharomyces cerevisiae. We quantify growth rate in rich medium for a large number of S. cerevisiae strains that carry single and double deletions of duplicated and singleton genes. We demonstrate that duplicated genes can maintain substantial redundancy for extensive periods of time following duplication (approximately 100 million years). We find high levels of redundancy among genes duplicated both via the whole genome duplication and via smaller scale duplications. Further, we see no evidence that two duplicated genes together contribute to fitness in rich medium substantially beyond that of their ancestral progenitor gene. We argue that duplicate genes do not often evolve to behave like singleton genes even after very long periods of time.

View details for DOI 10.1371/journal.pgen.1000113

View details for Web of Science ID 000260410600025

View details for PubMedID 18604285

View details for PubMedCentralID PMC2440806
Nonadaptive explanations for signatures of partial selective sweeps in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Macpherson, J. M., Gonzalez, J., Witten, D. M., Davis, J. C., Rosenberg, N. A., Hirsh, A. E., Petrov, D. A. 2008; 25 (6): 1025-1042

Abstract

A beneficial mutation that has nearly but not yet fixed in a population produces a characteristic haplotype configuration, called a partial selective sweep. Whether nonadaptive processes might generate similar haplotype configurations has not been extensively explored. Here, we consider 5 population genetic data sets taken from regions flanking high-frequency transposable elements in North American strains of Drosophila melanogaster, each of which appears to be consistent with the expectations of a partial selective sweep. We use coalescent simulations to explore whether incorporation of the species' demographic history, purifying selection against the element, or suppression of recombination caused by the element could generate putatively adaptive haplotype configurations. Whereas most of the data sets would be rejected as nonneutral under the standard neutral null model, only the data set for which there is strong external evidence in support of an adaptive transposition appears to be nonneutral under the more complex null model and in particular when demography is taken into account. High-frequency, derived mutations from a recently bottlenecked population, such as we study here, are of great interest to evolutionary genetics in the context of scans for adaptive events; we discuss the broader implications of our findings in this context.

View details for DOI 10.1093/molbev/msn007

View details for Web of Science ID 000255758200004

View details for PubMedID 18199829

View details for PubMedCentralID PMC3299400
Selection on Codon Bias ANNUAL REVIEW OF GENETICS Hershberg, R., Petrov, D. A. 2008; 42: 287-299

Abstract

In a wide variety of organisms, synonymous codons are used with different frequencies, a phenomenon known as codon bias. Population genetic studies have shown that synonymous sites are under weak selection and that codon bias is maintained by a balance between selection, mutation, and genetic drift. It appears that the major cause for selection on codon bias is that certain preferred codons are translated more accurately and/or efficiently. However, additional and sometimes maybe even contradictory selective forces appear to affect codon usage as well. In this review, we discuss the current understanding of the ways in which natural selection participates in the creation and maintenance of codon bias. We also raise several open questions: (i) Is natural selection weak independently of the level of codon bias? It is possible that selection for preferred codons is weak only when codon bias approaches equilibrium and may be quite strong on genes with codon bias levels that are much lower and/or above equilibrium. (ii) What determines the identity of the major codons? (iii) How do shifts in codon bias occur? (iv) What is the exact nature of selection on codon bias? We discuss these questions in depth and offer some ideas on how they can be addressed using a combination of computational and experimental analyses.

View details for DOI 10.1146/annurev.genet.42.110807.091442

View details for Web of Science ID 000261767000014

View details for PubMedID 18983258
Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila GENETICS Macpherson, J. M., Sella, G., Davis, J. C., Petrov, D. A. 2007; 177 (4): 2083-2099

Abstract

The effect of recurrent selective sweeps is a spatially heterogeneous reduction in neutral polymorphism throughout the genome. The pattern of reduction depends on the selective advantage and recurrence rate of the sweeps. Because many adaptive substitutions responsible for these sweeps also contribute to nonsynonymous divergence, the spatial distribution of nonsynonymous divergence also reflects the distribution of adaptive substitutions. Thus, the spatial correspondence between neutral polymorphism and nonsynonymous divergence may be especially informative about the process of adaptation. Here we study this correspondence using genomewide polymorphism data from Drosophila simulans and the divergence between D. simulans and D. melanogaster. Focusing on highly recombining portions of the autosomes, at a spatial scale appropriate to the study of selective sweeps, we find that neutral polymorphism is both lower and, as measured by a new statistic Q(S), less homogeneous where nonsynonymous divergence is higher and that the spatial structure of this correlation is best explained by the action of strong recurrent selective sweeps. We introduce a method to infer, from the spatial correspondence between polymorphism and divergence, the rate and selective strength of adaptation. Our results independently confirm a high rate of adaptive substitution (approximately 1/3000 generations) and newly suggest that many adaptations are of surprisingly great selective effect (approximately 1%), reducing the effective population size by approximately 15% even in highly recombining regions of the genome.

View details for DOI 10.1534/genetics.107.080226

View details for Web of Science ID 000251949800011

View details for PubMedID 18073425

View details for PubMedCentralID PMC2219485
Similar levels of X-linked and autosomal nucleotide variation in African and non-African populations of Drosophila melanogaster BMC EVOLUTIONARY BIOLOGY Singh, N. D., Macpherson, J. M., Jensen, J. D., Petrov, D. A. 2007; 7

Abstract

Levels of molecular diversity in Drosophila have repeatedly been shown to be higher in ancestral, African populations than in derived, non-African populations. This pattern holds for both coding and noncoding regions for a variety of molecular markers including single nucleotide polymorphisms and microsatellites. Comparisons of X-linked and autosomal diversity have yielded results largely dependent on population of origin. In an attempt to further elucidate patterns of sequence diversity in Drosophila melanogaster, we studied nucleotide variation at putatively nonfunctional X-linked and autosomal loci in sub-Saharan African and North American strains of D. melanogaster. We combine our experimental results with data from previous studies of molecular polymorphism in this species. We confirm that levels of diversity are consistently higher in African versus North American strains. The relative reduction of diversity for X-linked and autosomal loci in the derived, North American strains depends heavily on the studied loci. While the compiled dataset, comprised primarily of regions within or in close proximity to genes, shows a much more severe reduction of diversity on the X chromosome compared to autosomes in derived strains, the dataset consisting of intergenic loci located far from genes shows very similar reductions of diversities for X-linked and autosomal loci in derived strains. In addition, levels of diversity at X-linked and autosomal loci in the presumably ancestral African population are more similar than expected under an assumption of neutrality and equal numbers of breeding males and females. We show that simple demographic scenarios under assumptions of neutral theory cannot explain all of the observed patterns of molecular diversity. We suggest that the simplest model is a population bottleneck that retains an ancestral female-biased sex ratio, coupled with higher rates of positive selection at X-linked loci in close proximity to genes specifically in derived, non-African populations.

View details for DOI 10.1186/1471-2148-7-202

View details for Web of Science ID 000251904900001

View details for PubMedID 17961244

View details for PubMedCentralID PMC2164965
The mode and tempo of genome size evolution in eukaryotes GENOME RESEARCH Oliver, M. J., Petrov, D., Ackerly, D., Falkowski, P., Schofield, O. M. 2007; 17 (5): 594-601

Abstract

Eukaryotic genome size varies over five orders of magnitude; however, the distribution is strongly skewed toward small values. Genome size is highly correlated to a number of phenotypic traits, suggesting that the relative lack of large genomes in eukaryotes is due to selective removal. Using phylogenetic contrasts, we show that the rate of genome size evolution is proportional to genome size, with the fastest rates occurring in the largest genomes. This trend is evident across the 20 major eukaryotic clades analyzed, indicating that over long time scales, proportional change is the dominant and universal mode of genome-size evolution in eukaryotes. Our results reveal that the evolution of eukaryotic genome size can be described by a simple proportional model of evolution. This model explains the skewed distribution of eukaryotic genome sizes without invoking strong selection against large genomes.

View details for DOI 10.1101/gr.6096207

View details for Web of Science ID 000246297900006

View details for PubMedID 17420184

View details for PubMedCentralID PMC1855170
Evolution of gene function on the X chromosome versus the autosomes. Genome dynamics Singh, N. D., Petrov, D. A. 2007; 3: 101-118

Abstract

Sex chromosomes have arisen from autosomes many times over the course of evolution. This process generates chromosomal heteromorphy between the sexes, which has important implications for the evolution of coding and noncoding sequences on the sex chromosomes versus the autosomes. The formation of sex chromosomes from autosomes involves a reduction in gene dosage, which can modify properties of selection pressure on sex-linked genes. This transition also generates differences in the effective population size and dominance characteristics of novel mutations on the sex chromosome versus the autosomes. All of these changes may affect both patterns of in situ gene evolution and the rates of interchromosomal gene duplication and movement. Here we present a synopsis of the current understanding of the origin of sex chromosomes, theoretical context for differences in rates and patterns of molecular evolution on the X chromosome versus the autosomes, as well as a summary of empirical molecular evolutionary data from Drosophila and mammalian genomes.

View details for DOI 10.1159/000107606

View details for PubMedID 18753787
Reduced selection leads to accelerated gene loss in Shigella GENOME BIOLOGY Hershberg, R., Tang, H., Petrov, D. A. 2007; 8 (8)

Abstract

Obligate pathogenic bacteria lose more genes relative to facultative pathogens, which, in turn, lose more genes than free-living bacteria. It was suggested that the increased gene loss in obligate pathogens may be due to a reduction in the effectiveness of purifying selection. Less attention has been given to the causes of increased gene loss in facultative pathogens. We examined in detail the rate of gene loss in two groups of facultative pathogenic bacteria: pathogenic Escherichia coli, and Shigella. We show that Shigella strains are losing genes at an accelerated rate relative to pathogenic E. coli. We demonstrate that a genome-wide reduction in the effectiveness of selection contributes to the observed increase in the rate of gene loss in Shigella. When compared with their closely related pathogenic E. coli relatives, the more niche-limited Shigella strains appear to be losing genes at a significantly accelerated rate. A genome-wide reduction in the effectiveness of purifying selection plays a role in creating this observed difference. Our results demonstrate that differences in the effectiveness of selection contribute to differences in rate of gene loss in facultative pathogenic bacteria. We discuss how the lifestyle and pathogenicity of Shigella may alter the effectiveness of selection, thus influencing the rate of gene loss.

View details for DOI 10.1186/gb-2007-8-8-r164

View details for Web of Science ID 000253938500016

View details for PubMedID 17686180

View details for PubMedCentralID PMC2374995
Minor shift in background substitutional patterns in the Drosophila saltans and willistoni lineages is insufficient to explain GC content of coding sequences BMC BIOLOGY Singh, N. D., Arndt, P. F., Petrov, D. A. 2006; 4

Abstract

Several lines of evidence suggest that codon usage in the Drosophila saltans and D. willistoni lineages has shifted towards a less frequent use of GC-ending codons. Introns in these lineages show a parallel shift toward a lower GC content. These patterns have been alternatively ascribed to either a shift in mutational patterns or changes in the definition of preferred and unpreferred codons in these lineages. To gain additional insight into this question, we quantified background substitutional patterns in the saltans/willistoni group using inactive copies of a novel, Q-like retrotransposable element. We demonstrate that the pattern of background substitutions in the saltans/willistoni lineage has shifted to a significant degree, primarily due to changes in mutational biases. These differences predict a lower equilibrium GC content in the genomes of the saltans/willistoni species compared with that in the D. melanogaster species group. The magnitude of the difference can readily account for changes in intronic GC content, but it appears insufficient to explain changes in codon usage within the saltans/willistoni lineage. We suggest that the observed changes in codon usage in the saltans/willistoni clade reflect either lineage-specific changes in the definitions of preferred and unpreferred codons, or a weaker selective pressure on codon bias in this lineage.

View details for DOI 10.1186/1741-7007-4-37

View details for Web of Science ID 000241651800001

View details for PubMedID 17049096

View details for PubMedCentralID PMC1626080
Fitness cost of LINE-1 (L1) activity in humans PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Boissinot, S., Davis, J., Entezam, A., Petrov, D., Furano, A. V. 2006; 103 (25): 9590-9594

Abstract

The self-replicating LINE-1 (L1) retrotransposon family is the dominant retrotransposon family in mammals and has generated 30-40% of their genomes. Active L1 families are present in modern mammals but the important question of whether these currently active families affect the genetic fitness of their hosts has not been addressed. This issue is of particular relevance to humans as Homo sapiens contains the active L1 Ta1 subfamily of the human specific Ta (L1Pa1) L1 family. Although DNA insertions generated by the Ta1 subfamily can cause genetic defects in current humans, these are relatively rare, and it is not known whether Ta1-generated inserts or any other property of Ta1 elements have been sufficiently deleterious to reduce the fitness of humans. Here we show that full-length (FL) Ta1 elements, but not the truncated Ta1 elements or SINE (Alu) insertions generated by Ta1 activity, were subject to negative selection. Thus, one or more properties unique to FL L1 elements constitute a genetic burden for modern humans. We also found that the FL Ta1 elements became more deleterious as the expansion of Ta1 has proceeded. Because this expansion is ongoing, the Ta1 subfamily almost certainly continues to decrease the fitness of modern humans.

View details for DOI 10.1073/pnas.0603334103

View details for Web of Science ID 000238660400038

View details for PubMedID 16766655
A novel method distinguishes between mutation rates and fixation biases in patterns of single-nucleotide substitution JOURNAL OF MOLECULAR EVOLUTION Lipatov, M., Arndt, P. F., Hwa, T., Petrov, D. A. 2006; 62 (2): 168-175

Abstract

Analysis of the genome-wide patterns of single-nucleotide substitution reveals that the human GC content structure is out of equilibrium. The substitutions are decreasing the overall GC content (GC), at the same time making its range narrower. Investigation of single-nucleotide polymorphisms (SNPs) revealed that presently the decrease in GC content is due to a uniform mutational preference for A:T pairs, while its projected range is due to a variability in the fixation preference for G:C pairs. However, it is important to determine whether lessons learned about evolutionary processes operating at the present time (that is reflected in the SNP data) can be extended back into the evolutionary past. We describe here a new approach to this problem that utilizes the juxtaposition of forward and reverse substitution rates to determine the relative importance of variability in mutation rates and fixation probabilities in shaping long-term substitutional patterns. We use this approach to demonstrate that the forces shaping GC content structure over the recent past (since the appearance of the SNPs) extend all the way back to the mammalian radiation approximately 90 million years ago. In addition, we find a small but significant effect that has not been detected in the SNP data: relatively high rates of C:G-->A:T germline mutation in low-GC regions of the genome.

View details for DOI 10.1007/s00239-005-0207-z

View details for Web of Science ID 000235866300005

View details for PubMedID 16362483
A novel method distinguishes between mutation rates and fixation biases in patterns of single-nucleotide substitution (vol 62, pg 62, 2006) JOURNAL OF MOLECULAR EVOLUTION Lipatov, M., Arndt, P. F., Hwa, T., Petrov, D. A. 2006; 62 (2): 245

View details for DOI 10.1007/s00239-006-7207-5

View details for Web of Science ID 000235866300011
Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome BMC BIOLOGY Lipatov, M., Lenkov, K., Petrov, D. A., Bergman, C. M. 2005; 3

Abstract

Recent analysis of the human and mouse genomes has shown that a substantial proportion of protein coding genes and cis-regulatory elements contain transposable element (TE) sequences, implicating TE domestication as a mechanism for the origin of genetic novelty. To understand the general role of TE domestication in eukaryotic genome evolution, it is important to assess the acquisition of functional TE sequences by host genomes in a variety of different species, and to understand in greater depth the population dynamics of these mutational events. Using an in silico screen for host genes that contain TE sequences, we identified a set of 63 mature "chimeric" transcripts supported by expressed sequence tag (EST) evidence in the Drosophila melanogaster genome. We found a paucity of chimeric TEs relative to expectations derived from non-chimeric TEs, indicating that the majority (approximately 80%) of TEs that generate chimeric transcripts are deleterious and are not observed in the genome sequence. Using a pooled-PCR strategy to assay the presence of gene-TE chimeras in wild strains, we found that over half of the observed chimeric TE insertions are restricted to the sequenced strain, and approximately 15% are found at high frequencies in North American D. melanogaster populations. Estimated population frequencies of chimeric TEs did not differ significantly from non-chimeric TEs, suggesting that the distribution of fitness effects for the observed subset of chimeric TEs is indistinguishable from the general set of TEs in the genome sequence. In contrast to mammalian genomes, we found that fewer than 1% of Drosophila genes produce mRNAs that include bona fide TE sequences. This observation can be explained by the results of our population genomic analysis, which indicates that most potential chimeric TEs in D. melanogaster are deleterious but that a small proportion may contribute to the evolution of novel gene sequences such as nested or intercalated gene structures. Our results highlight the need to establish the fixity of putative cases of TE domestication identified using genome sequences in order to demonstrate their functional importance, and reveal that the contribution of TE domestication to genome evolution may vary drastically among animal taxa.

View details for DOI 10.1186/1741-7007-3-24

View details for Web of Science ID 000236370200001

View details for PubMedID 16283942

View details for PubMedCentralID PMC1308810
Do disparate mechanisms of duplication add similar genes to the genome? TRENDS IN GENETICS Davis, J. C., Petrov, D. A. 2005; 21 (10): 548-551

Abstract

Gene duplication is the fundamental source of new genes. Biases in duplication have profound implications for the dynamics of gene content during evolution. In this article, we compare genes arising from whole genome duplication (WGD), smaller scale duplication (SSD) and singletons in Saccharomyces cerevisiae. Our results demonstrate that genes duplicated by WGD and SSD are similarly biased with respect to codon bias and evolutionary rate, although differing significantly in their functional constituency.

View details for DOI 10.1016/j.tig.2005.07.008

View details for Web of Science ID 000232444400005

View details for PubMedID 16098632
Codon bias and noncoding GC content correlate negatively with recombination rate on the Drosophila X chromosome JOURNAL OF MOLECULAR EVOLUTION Singh, N. D., Davis, J. C., Petrov, D. A. 2005; 61 (3): 315-324

Abstract

The patterns and processes of molecular evolution may differ between the X chromosome and the autosomes in Drosophila melanogaster. This may in part be due to differences in the effective population size between the two chromosome sets and in part to the hemizygosity of the X chromosome in Drosophila males. These and other factors may lead to differences both in the gene complements of the X and the autosomes and in the properties of the genes residing on those chromosomes. Here we show that codon bias and recombination rate are correlated strongly and negatively on the X chromosome, and that this correlation cannot be explained by indirect relationships with other known determinants of codon bias. This is in dramatic contrast to the weak positive correlation found on the autosomes. We explored possible explanations for these patterns, which required a comprehensive analysis of the relationships among multiple genetic properties such as protein length and expression level. This analysis highlights conserved features of coding sequence evolution on the X and the autosomes and illuminates interesting differences between these two chromosome sets.

View details for DOI 10.1007/s00239-004-0287-1

View details for Web of Science ID 000231732400004

View details for PubMedID 16044248
X-linked genes evolve higher codon bias in Drosophila and Caenorhabditis GENETICS Singh, N. D., Davis, J. C., Petrov, D. A. 2005; 171 (1): 145-155

Abstract

Comparing patterns of molecular evolution between autosomes and sex chromosomes (such as X and W chromosomes) can provide insight into the forces underlying genome evolution. Here we investigate patterns of codon bias evolution on the X chromosome and autosomes in Drosophila and Caenorhabditis. We demonstrate that X-linked genes have significantly higher codon bias compared to autosomal genes in both Drosophila and Caenorhabditis. Furthermore, genes that become X-linked evolve higher codon bias gradually, over tens of millions of years. We provide several lines of evidence that this elevation in codon bias is due exclusively to their chromosomal location and not to any other property of X-linked genes. We present two possible explanations for these observations. One possibility is that natural selection is more efficient on the X chromosome due to effective haploidy of the X chromosomes in males and persistently low effective numbers of reproducing males compared to that of females. Alternatively, X-linked genes might experience stronger natural selection for higher codon bias as a result of maladaptive reduction of their dosage engendered by the loss of the Y-linked homologs.

View details for DOI 10.1534/genetics.105.043497

View details for Web of Science ID 000232494400014

View details for PubMedID 15965246

View details for PubMedCentralID PMC1456507
Pesticide resistance via transposition-mediated adaptive gene truncation in Drosophila SCIENCE Aminetzach, Y. T., Macpherson, J. M., Petrov, D. A. 2005; 309 (5735): 764-767

Abstract

To study adaptation, it is essential to identify multiple adaptive mutations and to characterize their molecular, phenotypic, selective, and ecological consequences. Here we describe a genomic screen for adaptive insertions of transposable elements in Drosophila. Using a pilot application of this screen, we have identified an adaptive transposable element insertion, which truncates a gene and apparently generates a functional protein in the process. The insertion of this transposable element confers increased resistance to an organophosphate pesticide and has spread in D. melanogaster recently.

View details for DOI 10.1126/science.1112699

View details for Web of Science ID 000230938200048

View details for PubMedID 16051794
Substantial regional variation in substitution rates in the human genome: Importance of GC content, gene density, and telomere-specific effects JOURNAL OF MOLECULAR EVOLUTION Arndt, P. F., Hwa, T., Petrov, D. A. 2005; 60 (6): 748-U28

Abstract

This study presents the first global, 1-Mbp-level analysis of patterns of nucleotide substitutions along the human lineage. The study is based on the analysis of a large amount of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. This analysis revealed substantial and consistent variability of rates of substitution, with the variability ranging up to twofold among different regions. The rates of substitutions of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates, suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp's. Our analysis revealed that GC-content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution appears very different near telomeres compared to the rest of the genome and cannot be explained by the genome-wide correlations of the substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC-content of the sequences that are within 10-15 Mbp away from the telomere.

View details for DOI 10.1007/s00239-004-0222-5

View details for Web of Science ID 000230077700006

View details for PubMedID 15959677
Protein evolution in the context of Drosophila development JOURNAL OF MOLECULAR EVOLUTION Davis, J. C., Brandman, O., Petrov, D. A. 2005; 60 (6): 774-U42

Abstract

The tempo at which a protein evolves depends not only on the rate at which mutations arise but also on the selective effects that those mutations have at the organismal level. It is intuitive that proteins functioning during different stages of development may be predisposed to having mutations of different selective effects. For example, it has been hypothesized that changes to proteins expressed during early development should have larger phenotypic consequences because later stages depend on them. Conversely, changes to proteins expressed much later in development should have smaller consequences at the organismal level. Here we assess whether proteins expressed at different times during Drosophila development vary systematically in their rates of evolution. We find that proteins expressed early in development and particularly during mid-late embryonic development evolve unusually slowly. In addition, proteins expressed in adult males show an elevated evolutionary rate. These two trends are independent of each other and cannot be explained by peculiar rates of mutation or levels of codon bias. Moreover, the observed patterns appear to hold across several functional classes of genes, although the exact developmental time of the slowest protein evolution differs among each class. We discuss our results in connection with data on the evolution of development.

View details for DOI 10.1007/s00239-004-0241-2

View details for Web of Science ID 000230077700008

View details for PubMedID 15909223
Genomic heterogeneity of background substitutional patterns in Drosophila melanogaster GENETICS Singh, N. D., Arndt, P. F., Petrov, D. A. 2005; 169 (2): 709-722

Abstract

Mutation is the underlying force that provides the variation upon which evolutionary forces can act. It is important to understand how mutation rates vary within genomes and how the probabilities of fixation of new mutations vary as well. If substitutional processes across the genome are heterogeneous, then examining patterns of coding sequence evolution without taking these underlying variations into account may be misleading. Here we present the first rigorous test of substitution rate heterogeneity in the Drosophila melanogaster genome using almost 1500 nonfunctional fragments of the transposable element DNAREP1_DM. Not only do our analyses suggest that substitutional patterns in heterochromatic and euchromatic sequences are different, but also they provide support in favor of a recombination-associated substitutional bias toward G and C in this species. The magnitude of this bias is entirely sufficient to explain recombination-associated patterns of codon usage on the autosomes of the D. melanogaster genome. We also document a bias toward lower GC content in the pattern of small insertions and deletions (indels). In addition, the GC content of noncoding DNA in Drosophila is higher than would be predicted on the basis of the pattern of nucleotide substitutions and small indels. However, we argue that the fast turnover of noncoding sequences in Drosophila makes it difficult to assess the importance of the GC biases in nucleotide substitutions and small indels in shaping the base composition of noncoding sequences.

View details for DOI 10.1534/genetics.104.032250

View details for Web of Science ID 000227697200018

View details for PubMedID 15520267

View details for PubMedCentralID PMC1449091
Elevated evolutionary rates in the laboratory strain of Saccharomyces cerevisiae PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gu, Z. L., David, L., Petrov, D., Jones, T., Davis, R. W., Steinmetz, L. M. 2005; 102 (4): 1092-1097

Abstract

By using the maximum likelihood method, we made a genome-wide comparison of the evolutionary rates in the lineages leading to the laboratory strain (S288c) and a wild strain (YJM789) of Saccharomyces cerevisiae and found that genes in the laboratory strain tend to evolve faster than in the wild strain. The pattern of elevated evolution suggests that relaxation of selection intensity is the dominant underlying reason, which is consistent with recurrent bottlenecks in the S. cerevisiae laboratory strain population. Supporting this conclusion are the following observations: (i) the increases in nonsynonymous evolutionary rate occur for genes in all functional categories; (ii) most of the synonymous evolutionary rate increases in S288c occur in genes with strong codon usage bias; (iii) genes under stronger negative selection have a larger increase in nonsynonymous evolutionary rate; and (iv) more genes with adaptive evolution were detected in the laboratory strain, but they do not account for the majority of the increased evolution. The present discoveries suggest that experimental and possible industrial manipulations of the laboratory strain of yeast could have had a strong effect on the genetic makeup of this model organism. Furthermore, they imply an evolution of laboratory model organisms away from their wild counterparts, questioning the relevancy of the models especially when extensive laboratory cultivation has occurred. In addition, these results shed light on the evolution of livestock and crop species that have been under human domestication for years.

View details for DOI 10.1073/pnas.0409159102

View details for Web of Science ID 000226617900026

View details for PubMedID 15647350

View details for PubMedCentralID PMC545845
The large genome constraint hypothesis: Evolution, ecology and phenotype 2nd Plant Genome Size Workshop and Discussion Meeting Knight, C. A., Molinari, N. A., Petrov, D. A. OXFORD UNIV PRESS. 2005: 177–90

Abstract

If large genomes are truly saturated with unnecessary 'junk' DNA, it would seem natural that there would be costs associated with accumulation and replication of this excess DNA. Here we examine the available evidence to support this hypothesis, which we term the 'large genome constraint'. We examine the large genome constraint at three scales: evolution, ecology, and the plant phenotype. In evolution, we tested the hypothesis that plant lineages with large genomes are diversifying more slowly. We found that genera with large genomes are less likely to be highly specious -- suggesting a large genome constraint on speciation. In ecology, we found that species with large genomes are under-represented in extreme environments -- again suggesting a large genome constraint for the distribution and abundance of species. Ultimately, if these ecological and evolutionary constraints are real, the genome size effect must be expressed in the phenotype and confer selective disadvantages. Therefore, in phenotype, we review data on the physiological correlates of genome size, and present new analyses involving maximum photosynthetic rate and specific leaf area. Most notably, we found that species with large genomes have reduced maximum photosynthetic rates - again suggesting a large genome constraint on plant performance. Finally, we discuss whether these phenotypic correlations may help explain why species with large genomes are trimmed from the evolutionary tree and have restricted ecological distributions. Our review tentatively supports the large genome constraint hypothesis.

View details for DOI 10.1093/aob/rnci011

View details for Web of Science ID 000226370900011

View details for PubMedID 15596465
Enhancer choice in cis and in trans in Drosophila melanogaster: Role of the promoter GENETICS Morris, J. R., Petrov, D. A., Lee, A. M., Wu, C. T. 2004; 167 (4): 1739-1747

Abstract

Eukaryotic enhancers act over very long distances, yet still show remarkable specificity for their own promoter. To better understand mechanisms underlying this enhancer-promoter specificity, we used transvection to analyze enhancer choice between two promoters, one located in cis to the enhancer and the other in trans to the enhancer, at the yellow gene of Drosophila melanogaster. Previously, we demonstrated that enhancers at yellow prefer to act on the cis-linked promoter, but that mutation of core promoter elements in the cis-linked promoter releases enhancers to act in trans. Here, we address the mechanism by which these elements affect enhancer choice. We consider and explicitly test three models that are based on promoter competency, promoter pairing, and promoter identity. Through targeted gene replacement of the endogenous yellow gene, we show that competency of the cis-linked promoter is a key parameter in the cis-trans choice of an enhancer. In fact, complete replacement of the yellow promoter with both TATA-containing and TATA-less heterologous promoters maintains enhancer action in cis.

View details for DOI 10.1534/genetics.104.026955

View details for Web of Science ID 000223720300018

View details for PubMedID 15342512

View details for PubMedCentralID PMC1471007
Rapid sequence turnover at an intergenic locus in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Singh, N. D., Petrov, D. A. 2004; 21 (4): 670-680

Abstract

Closely related species of Drosophila tend to have similar genome sizes. The strong imbalance in favor of small deletions relative to insertions implies that the unconstrained DNA in Drosophila is unlikely to be passively inherited from even closely related ancestors, and yet most DNA in Drosophila genomes is intergenic and potentially unconstrained. In an attempt to investigate the maintenance of this intergenic DNA, we studied the evolution of an intergenic locus on the fourth chromosome of the Drosophila melanogaster genome. This 1.2-kb locus is marked by two distinct, large insertion events: a nuclear transposition of a mitochondrial sequence and a transposition of a nonautonomous DNA transposon DNAREP1_DM. Because we could trace the evolutionary histories of these sequences, we were able to reconstruct the length evolution of this region in some detail. We sequenced this locus in all four species of the D. melanogaster species complex: D. melanogaster, D. simulans, D. sechellia, and D. mauritiana. Although this locus is similar in size in these four species, less than 10% of the sequence from the most recent common ancestor remains in D. melanogaster and all of its sister species. This region appears to have increased in size through several distinct insertions in the ancestor of the D. melanogaster species complex and has been shrinking since the split of these lineages. In addition, we found no evidence suggesting that the size of this locus has been maintained over evolutionary time; these results are consistent with the model of a dynamic equilibrium between persistent DNA loss through small deletions and more sporadic DNA gain through less frequent but longer insertions. The apparent stability of genome size in Drosophila may belie very rapid sequence turnover at intergenic loci.

View details for DOI 10.1093/molbev/msh060

View details for Web of Science ID 000220685200006

View details for PubMedID 14739245
Preferential duplication of conserved proteins in eukaryotic genomes. PLoS biology Davis, J. C., Petrov, D. A. 2004; 2 (3): E55-?

View details for PubMedID 15024414
Preferential duplication of conserved proteins in eukaryotic genomes PLOS BIOLOGY Davis, J. C., Petrov, D. A. 2004; 2 (3): 318-326

Abstract

A central goal in genome biology is to understand the origin and maintenance of genic diversity. Over evolutionary time, each gene's contribution to the genic content of an organism depends not only on its probability of long-term survival, but also on its propensity to generate duplicates that are themselves capable of long-term survival. In this study we investigate which types of genes are likely to generate functional and persistent duplicates. We demonstrate that genes that have generated duplicates in the C. elegans and S. cerevisiae genomes were 25%-50% more constrained prior to duplication than the genes that failed to leave duplicates. We further show that conserved genes have been consistently prolific in generating duplicates for hundreds of millions of years in these two species. These findings reveal one way in which gene duplication shapes the content of eukaryotic genomes. Our finding that the set of duplicate genes is biased has important implications for genome-scale studies.

View details for DOI 10.1371/journal.pbio.0020055

View details for Web of Science ID 000220512000008

View details for PubMedCentralID PMC368158
Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation MOLECULAR BIOLOGY AND EVOLUTION Arndt, P. F., Petrov, D. A., Hwa, T. 2003; 20 (11): 1887-1896

Abstract

Differences in the regional substitution patterns in the human genome created patterns of large-scale variation of base composition known as genomic isochores. To gain insight into the origin of the genomic isochores, we develop a maximum-likelihood approach to determine the history of substitution patterns in the human genome. This approach utilizes the vast amount of repetitive sequence deposited in the human genome over the past approximately 250 Myr. Using this approach, we estimate the frequencies of seven types of substitutions: the four transversions, two transitions, and the methyl-assisted transition of cytosine in CpG. Comparing substitutional patterns in repetitive elements of various ages, we reconstruct the history of the base-substitutional process in the different isochores for the past 250 Myr. At around 90 MYA (around the time of the mammalian radiation), we find an abrupt fourfold to eightfold increase of the cytosine transition rate in CpG pairs compared with that of the reptilian ancestor. Further analysis of nucleotide substitutions in regions with different GC content reveals concurrent changes in the substitutional patterns. Although the substitutional pattern was dependent on the regional GC content in such ways that it preserved the regional GC content before the mammalian radiation, it lost this dependence afterward. The substitutional pattern changed from an isochore-preserving to an isochore-degrading one. We conclude that isochores have been established before the radiation of the eutherian mammals and have been subject to the process of homogenization since then.

View details for DOI 10.1093/molbev/msg204

View details for Web of Science ID 000186618200017

View details for PubMedID 12885958
Rates of DNA duplication and mitochondrial DNA insertion in the human genome JOURNAL OF MOLECULAR EVOLUTION Bensasson, D., Feldman, M. W., Petrov, D. A. 2003; 57 (3): 343-354

Abstract

The hundreds of mitochondrial pseudogenes in the human nuclear genome sequence (numts) constitute an excellent system for studying and dating DNA duplications and insertions. These pseudogenes are associated with many complete mitochondrial genome sequences and through those with a good fossil record. By comparing individual numts with primate and other mammalian mitochondrial genome sequences, we estimate that these numts arose continuously over the last 58 million years. Our pairwise comparisons between numts suggest that most human numts arose from different mitochondrial insertion events and not by DNA duplication within the nuclear genome. The nuclear genome appears to accumulate mtDNA insertions at a rate high enough to predict within-population polymorphism for the presence/absence of many recent mtDNA insertions. Pairwise analysis of numts and their flanking DNA produces an estimate for the DNA duplication rate in humans of 2.2 × 10⁻⁹ per numt per year. Thus, a nucleotide site is about as likely to be involved in a duplication event as it is to change by point substitution. This estimate of the rate of DNA duplication of noncoding DNA is based on sequences that are not in duplication hotspots, and is close to the rate reported for functional genes in other species.

View details for DOI 10.1007/s00239-003-2485-7

View details for Web of Science ID 000184992800012

View details for PubMedID 14629044
Size matters: Non-LTR retrotransposable elements and ectopic recombination in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Aminetzach, Y. T., Davis, J. C., Bensasson, D., Hirsh, A. E. 2003; 20 (6): 880-892

Abstract

The Drosophila melanogaster genome contains approximately 100 distinct families of transposable elements (TEs). In the euchromatic part of the genome, each family is present in a small number of copies (5-150 copies), with individual copies of TEs often present at very low frequencies in populations. This pattern is likely to reflect a balance between the inflow of TEs by transposition and the removal of TEs by natural selection. The nature of natural selection acting against TEs remains controversial. We provide evidence that selection against chromosome abnormalities caused by ectopic recombination limits the spread of some TEs. We also demonstrate for the first time that some TE families in the Drosophila euchromatin appear to be only marginally affected by purifying selection and contain many copies at high population frequencies. We argue that TEs in these families attain high population frequencies and even reach fixation as a result of low family-wide transposition rates leading to low TE copy numbers and consequently reduced strength of selection acting on individual TE copies. Fixation of TEs in these families should provide an upward pressure on the size of intergenic sequences counterbalancing rapid DNA loss through small deletions. Copy-number-dependent selection on TE families caused by ectopic recombination may also promote diversity among TEs in the Drosophila genome.

View details for DOI 10.1093/molbev/msg102

View details for Web of Science ID 000183138500004

View details for PubMedID 12716993
Transposable elements in clonal lineages: lethal hangover from sex Nuzhdin, S. V., Petrov, D. A. OXFORD UNIV PRESS. 2003: 33–41

View details for DOI 10.1046/j.1095-8312.2003.00188.x

View details for Web of Science ID 000182913100005
How intron splicing affects the deletion and insertion profile in Drosophila melanogaster GENETICS Ptak, S. E., Petrov, D. A. 2002; 162 (3): 1233-1244

Abstract

Studies of "dead-on-arrival" transposable elements in Drosophila melanogaster found that deletions outnumber insertions approximately 8:1 with a median size for deletions of approximately 10 bp. These results are consistent with the deletion and insertion profiles found in most other Drosophila pseudogenes. In contrast, a recent study of D. melanogaster introns found a deletion/insertion ratio of 1.35:1, with 84% of deletions being shorter than 10 bp. This discrepancy could be explained if deletions, especially long deletions, are more frequently strongly deleterious than insertions and are eliminated disproportionately from intron sequences. To test this possibility, we use analysis and simulations to examine how deletions and insertions of different lengths affect different components of splicing and determine the distribution of deletions and insertions that preserve the original exons. We find that, consistent with our predictions, longer deletions affect splicing at a much higher rate compared to insertions and short deletions. We also explore other potential constraints in introns and show that most of these also disproportionately affect large deletions. Altogether we demonstrate that constraints in introns may explain much of the difference in the pattern of deletions and insertions observed in Drosophila introns and pseudogenes.

View details for Web of Science ID 000179739900020

View details for PubMedID 12454069
SEGE: A database on 'intron less/single exonic' genes from eukaryotes BIOINFORMATICS Sakharkar, M. K., Kangueane, P., Petrov, D. A., Kolaskar, A. S., Subbiah, S. 2002; 18 (9): 1266-1267

Abstract

Eukaryotes have both 'intron containing' and 'intron less' genes. Several databases are available for 'intron containing' genes in eukaryotes. In this note, we describe a database for 'intron less' genes from eukaryotes. 'Intron less' eukaryotic genes having prokaryotic architecture will help to understand gene evolution in a much simpler way unlike 'intron containing' genes. SEGE is available at http://intron.bic.nus.edu.sg/seg/ (contact: mmeena@ntu.edu.sg)

View details for Web of Science ID 000178001400015

View details for PubMedID 12217920
Mutational equilibrium model of genome size evolution THEORETICAL POPULATION BIOLOGY Petrov, D. A. 2002; 61 (4): 531-544

Abstract

The paper describes a mutational equilibrium model of genome size evolution. This model is different from both adaptive and junk DNA models of genome size evolution in that it does not assume that genome size is maintained either by positive or stabilizing selection for the optimum genome size (as in adaptive theories) or by purifying selection against too much junk DNA (as in junk DNA theories). Instead the genome size is suggested to evolve until the loss of DNA through more frequent small deletions is equal to the rate of DNA gain through more frequent long insertions. The empirical basis for this theory is the finding of a strong correlation and of a clear power-function relationship between the rate of mutational DNA loss (per bp) through small deletions and genome size in animals. Genome size scales as a negative 1.3 power function of the deletion rate per nucleotide. Such a relationship is not predicted by either adaptive or junk DNA theories. However, if genome size is maintained at equilibrium by the balance of mutational forces, this empirical relationship can be readily accommodated. Within this framework, this finding would imply that the rate of DNA gain through large insertions scales as a quarter-power function of genome size. On this view, as genome size grows, the rate of growth through large insertions is increasing as a quarter power function of genome size and the rate of DNA loss through small deletions increases linearly, until eventually, at the stable equilibrium genome size value, rates of growth and loss equal each other. The current data also suggest that the long-term variation in genome size in animals is brought about to a significant extent by changes in the intrinsic rates of DNA loss through small deletions. Both the origin of mutational biases and the adaptive consequences of such a mode of evolution of genome size are discussed.
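The balance described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that total DNA loss is d * G (per-nucleotide deletion rate d times genome size G) and that total DNA gain is c * G**a. The function names, the coefficient c, and the default exponent a = 0.25 are hypothetical choices and are not taken from the paper.

```python
import math


def equilibrium_genome_size(deletion_rate, gain_coeff=1.0, gain_exponent=0.25):
    """Genome size G* at which loss (d * G) equals gain (c * G**a).

    Illustrative assumption: loss is linear in G, gain is a power function of G.
    """
    # d * G = c * G**a  =>  G**(1 - a) = c / d  =>  G = (c / d) ** (1 / (1 - a))
    return (gain_coeff / deletion_rate) ** (1.0 / (1.0 - gain_exponent))


def scaling_exponent(gain_exponent=0.25):
    """Exponent k in G* proportional to d**k implied by the balance above."""
    return -1.0 / (1.0 - gain_exponent)


if __name__ == "__main__":
    # Hypothetical deletion rates, chosen only to show the trend.
    for d in (1e-3, 1e-2, 1e-1):
        print(d, equilibrium_genome_size(d))
    print("implied exponent:", scaling_exponent())
```

Under these assumptions a gain exponent of one quarter implies that equilibrium genome size scales as the -4/3 power of the deletion rate, which is close to the -1.3 exponent reported in the abstract. Because loss grows faster with G than gain does, the equilibrium in this sketch is stable.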

View details for DOI 10.1006/tpbi.2002.1605

View details for Web of Science ID 000177739500016

View details for PubMedID 12167373
DNA loss and evolution of genome size in Drosophila GENETICA Petrov, D. A. 2002; 115 (1): 81-91

Abstract

Mutation is often said to be random. Although it must be true that mutation is ignorant about the adaptive needs of the organism and thus is random relative to them as a rule, mutation is not truly random in other respects. Nucleotide substitutions, deletions, insertions, inversions, duplications and other types of mutation occur at different rates and are effected by different mechanisms. Moreover the rates of different mutations vary from organism to organism. Differences in mutational biases, along with natural selection, could impact gene and genome evolution in important ways. For instance, several recent studies have suggested that differences in insertion/deletion biases lead to profound differences in the rate of DNA loss in animals and that this difference per se can lead to significant changes in genome size. In particular, Drosophila melanogaster appears to have a very high rate of deletions, a correspondingly high rate of DNA loss, and a very compact genome. To assess the validity of these studies we must first assess the validity of the measurements of indel biases themselves. Here I demonstrate the robustness of indel bias measurements in Drosophila, by comparing indel patterns in different types of nonfunctional sequences. The indel pattern and the high rate of DNA loss appear to be shared by all known nonfunctional sequences, both euchromatic and heterochromatic, transposable and non-transposable, repetitive and unique. Unfortunately all available nonfunctional sequences are untranscribed and thus effects of transcription on indel bias cannot be assessed. I also discuss in detail why it is unlikely that natural selection for or against DNA loss significantly affects current estimates of indel biases.

View details for Web of Science ID 000176413900007

View details for PubMedID 12188050
Gene galaxies in the maize genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Walbot, V., Petrov, D. A. 2001; 98 (15): 8163-8164

View details for Web of Science ID 000169967000003

View details for PubMedID 11459945

View details for PubMedCentralID PMC37413
Genomic gigantism: DNA loss is slow in mountain grasshoppers MOLECULAR BIOLOGY AND EVOLUTION Bensasson, D., Petrov, D. A., Zhang, D. X., Hartl, D. L., Hewitt, G. M. 2001; 18 (2): 246-253

Abstract

Several studies have shown DNA loss to be inversely correlated with genome size in animals. These studies include a comparison between Drosophila and the cricket, Laupala, but there has been no assessment of DNA loss in insects with very large genomes. Podisma pedestris, the brown mountain grasshopper, has a genome over 100 times as large as that of Drosophila and 10 times as large as that of Laupala. We used 58 paralogous nuclear pseudogenes of mitochondrial origin to study the characteristics of insertion, deletion, and point substitution in P. pedestris and Italopodisma. In animals, these pseudogenes are "dead on arrival"; they are abundant in many different eukaryotes, and their mitochondrial origin simplifies the identification of point substitutions accumulated in nuclear pseudogene lineages. There appears to be a mononucleotide repeat within the 643-bp pseudogene sequence studied that acts as a strong hot spot for insertions or deletions (indels). Because the data for other insect species did not contain such an unusual region, hot spots were excluded from species comparisons. The rate of DNA loss relative to point substitution appears to be considerably and significantly lower in the grasshoppers studied than in Drosophila or Laupala. This suggests that the inverse correlation between genome size and the rate of DNA loss can be extended to comparisons between insects with large or gigantic genomes (i.e., Laupala and Podisma). The low rate of DNA loss implies that in grasshoppers, the accumulation of point mutations is a more potent force for obscuring ancient pseudogenes than their loss through indel accumulation, whereas the reverse is true for Drosophila. The main factor contributing to the difference in the rates of DNA loss estimated for grasshoppers, crickets, and Drosophila appears to be deletion size. Large deletions are relatively rare in Podisma and Italopodisma.

View details for Web of Science ID 000166775100015

View details for PubMedID 11158383
Evolution of genome size: new approaches to an old problem TRENDS IN GENETICS Petrov, D. A. 2001; 17 (1): 23-28

Abstract

Eukaryotic genomes come in a wide variety of sizes. Haploid DNA contents (C values) range > 80,000-fold without an apparent correlation with either the complexity of the organism or the number of genes. This puzzling observation, the C-value paradox, has remained a mystery for almost half a century, despite much progress in the elucidation of the structure and function of genomes. Here I argue that new approaches focussing on the genetic mechanisms that generate genome-size differences could shed much light on the evolution of genome size.

View details for Web of Science ID 000168717900007

View details for PubMedID 11163918
Pseudogene evolution and natural selection for a compact genome Symposium on Genetic Diversity and Evolution Petrov, D. A., Hartl, D. L. OXFORD UNIV PRESS INC. 2000: 221–27

Abstract

Pseudogenes are nonfunctional copies of protein-coding genes that are presumed to evolve without selective constraints on their coding function. They are of considerable utility in evolutionary genetics because, in the absence of selection, different types of mutations in pseudogenes should have equal probabilities of fixation. This theoretical inference justifies the estimation of patterns of spontaneous mutation from the analysis of patterns of substitutions in pseudogenes. Although it is possible to test whether pseudogene sequences evolve without constraints for their protein-coding function, it is much more difficult to ascertain whether pseudogenes may affect fitness in ways unrelated to their nucleotide sequence. Consider the possibility that a pseudogene affects fitness merely by increasing genome size. If a larger genome is deleterious--for example, because of increased energetic costs associated with genome replication and maintenance--then deletions, which decrease the length of a pseudogene, should be selectively advantageous relative to insertions or nucleotide substitutions. In this article we examine the implications of selection for genome size relative to small (1-400 bp) deletions, in light of empirical evidence pertaining to the size distribution of deletions observed in Drosophila and mammalian pseudogenes. There is a large difference in the deletion spectra between these organisms. We argue that this difference cannot easily be attributed to selection for overall genome size, since the magnitude of selection is unlikely to be strong enough to significantly affect the probability of fixation of small deletions in Drosophila.

View details for Web of Science ID 000087190900007

View details for PubMedID 10833048
Evidence for DNA loss as a determinant of genome size SCIENCE Petrov, D. A., Sangster, T. A., Johnston, J. S., Hartl, D. L., Shaw, K. L. 2000; 287 (5455): 1060-1062

Abstract

Eukaryotic genome sizes range over five orders of magnitude. This variation cannot be explained by differences in organismic complexity (the C value paradox). To test the hypothesis that some variation in genome size can be attributed to differences in the patterns of insertion and deletion (indel) mutations among organisms, this study examines the indel spectrum in Laupala crickets, which have a genome size 11 times larger than that of Drosophila. Consistent with the hypothesis, DNA loss is more than 40 times slower in Laupala than in Drosophila.

View details for Web of Science ID 000085245400053

View details for PubMedID 10669421
Genome size as a mutation-selection-drift process Fukuoka International Symposium of Population Genetics Lozovskaya, E. R., Nurminsky, D., Petrov, D. A., Hartl, D. L. GENETICS SOC JAPAN. 1999: 201–7

Abstract

A novel method for estimating neutral rates and patterns of DNA evolution in Drosophila takes advantage of the propensity of non-LTR retrotransposable elements to create nonfunctional, transpositionally inactive copies as a product of transposition. For many LINE elements, most copies present in a genome at any one time are nonfunctional "dead-on-arrival" (DOA) copies. Because these are off-shoots of active, transpositionally competent "master" lineages, in a gene tree of a LINE element from multiple samples from related species, the DOA lineages are expected to map to the terminal branches and the active lineages to the internal branches, the primary exceptions being when the sample includes DOA copies that are allelic or orthologous. Analysis of nucleotide substitutions and other changes along the terminal branches therefore allows estimation of the fixation process in the DOA copies, which are unconstrained with respect to protein coding; and under selective neutrality, the fixation process estimates the underlying mutational pattern. We have studied the retroelement Helena in Drosophila. An unexpectedly high rate of DNA loss was observed, yielding a half-life of unconstrained DNA sequences approximately 60-fold faster in Drosophila than in mammals. The high rate of DNA loss suggests a straightforward explanation of the seeming paradox that Drosophila has many fewer pseudogenes than found in mammalian species. Differential rates of deletion in different taxa might also contribute to the celebrated C-value paradox of why some closely related organisms can have very different DNA contents. New data presented here rule out the possibility that the transposition process itself is highly mutagenic, hence the observed linear relation between number of deletions and number of nucleotide substitutions is most easily explained by the hypothesis that both types of changes accumulate in unconstrained sequences over time.

View details for Web of Science ID 000085786200003

View details for PubMedID 10734601
Patterns of nucleotide substitution in Drosophila and mammalian genomes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Petrov, D. A., Hartl, D. L. 1999; 96 (4): 1475-1479

Abstract

To estimate patterns of molecular evolution of unconstrained DNA sequences, we used maximum parsimony to separate phylogenetic trees of a non-long terminal repeat retrotransposable element into either internal branches, representing mainly the constrained evolution of active lineages, or into terminal branches, representing mainly nonfunctional "dead-on-arrival" copies that are unconstrained by selection and evolve as pseudogenes. The pattern of nucleotide substitutions in unconstrained sequences is expected to be congruent with the pattern of point mutation. We examined the retrotransposon Helena in the Drosophila virilis species group (subgenus Drosophila) and the Drosophila melanogaster species subgroup (subgenus Sophophora). The patterns of point mutation are indistinguishable, suggesting considerable stability over evolutionary time (40-60 million years). The relative frequencies of different point mutations are unequal, but the "transition bias" results largely from an approximately 2-fold excess of G·C to A·T substitutions. Spontaneous mutation is biased toward A·T base pairs, with an expected mutational equilibrium of approximately 65% A + T (quite similar to that of long introns). These data also enable the first detailed comparison of patterns of point mutations in Drosophila and mammals. Although the patterns are different, all of the statistical significance comes from a much greater rate of G·C to A·T substitution in mammals, probably because of methylated cytosine "hotspots." When the G·C to A·T substitutions are discounted, the remaining differences are considerably reduced and not statistically significant.

View details for Web of Science ID 000078698400056

View details for PubMedID 9990048

View details for PubMedCentralID PMC15487
Pseudogene evolution in Drosophila suggests a high rate of DNA loss MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Chao, Y. C., Stephenson, E. C., Hartl, D. L. 1998; 15 (11): 1562-1567

View details for Web of Science ID 000076888400019

View details for PubMedID 12572619
Genome size and intron size in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Moriyama, E. N., Petrov, D. A., Hartl, D. L. 1998; 15 (6): 770-773

View details for Web of Science ID 000073759400016

View details for PubMedID 9615458
High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Hartl, D. L. 1998; 15 (3): 293-302

Abstract

We recently proposed that patterns of evolution of non-LTR retrotransposable elements can be used to study patterns of spontaneous mutation. Transposition of non-LTR retrotransposable elements commonly results in creation of 5' truncated, "dead-on-arrival" copies. These inactive copies are effectively pseudogenes and, according to the neutral theory, their molecular evolution ought to reflect rates and patterns of spontaneous mutation. Maximum parsimony can be used to separate the evolution of active lineages of a non-LTR element from the fate of the "dead-on-arrival" insertions and to directly assess the relative frequencies of different types of spontaneous mutations. We applied this approach using a non-LTR element, Helena, in the Drosophila virilis group and have demonstrated a surprisingly high incidence of large deletions and the virtual absence of insertions. Based on these results, we suggested that Drosophila in general may exhibit a high rate of spontaneous large deletions and have hypothesized that such a high rate of DNA loss may help to explain the puzzling dearth of bona fide pseudogenes in Drosophila. We also speculated that variation in the rate of spontaneous deletion may contribute to the divergence of genome size in different taxa by affecting the amount of superfluous "junk" DNA such as, for example, pseudogenes or long introns. In this paper, we extend our analysis to the D. melanogaster subgroup, which last shared a common ancestor with the D. virilis group approximately 40 MYA. In a different region of the same transposable element, Helena, we demonstrate that inactive copies accumulate deletions in species of the D. melanogaster subgroup at a rate very similar to that of the D. virilis group. These results strongly suggest that the high rate of DNA loss is a general feature of Drosophila and not a peculiar property of a particular stretch of DNA in a particular species group.

View details for Web of Science ID 000072361600007

View details for PubMedID 9501496
Trash DNA is what gets thrown away: high rate of DNA loss in Drosophila International-Society-of-Molecular-Evolution Symposium on Junk DNA - the Role and the Evolution of Non-Coding Sequences Petrov, D. A., Hartl, D. L. ELSEVIER SCIENCE BV. 1997: 279–89

Abstract

We have recently described a novel method of estimating neutral rates and patterns of spontaneous mutation (Petrov et al., 1996). This method takes advantage of the propensity of non-LTR retrotransposable elements to create non-functional, 'dead-on-arrival' copies as a product of transposition. Maximum parsimony analysis is used to separate the evolution of actively transposing lineages of a non-LTR element from the fate of individual inactive insertions, and thereby allows one to assess directly the relative rates of different types of mutation, including point substitutions, deletions and insertions. Because non-LTR elements enjoy wide phylogenetic distribution, this method can be used in taxa that do not harbor a significant number of bona fide pseudogenes, as is the case in Drosophila (Jeffs and Ashburner, 1991; Weiner et al., 1986). We used this method with Helena, a non-LTR retrotransposable element present in the Drosophila virilis species group. A striking finding was the virtual absence of insertions and remarkably high incidence of large deletions, which combine to produce a high overall rate of DNA loss. On average, the rate of DNA loss in D. virilis is approximately 75 times faster than that estimated for mammalian pseudogenes (Petrov et al., 1996). The high rate of DNA loss should lead to rapid elimination of non-essential DNA and thus may explain the seemingly paradoxical dearth of pseudogenes in Drosophila. Varying rates of DNA loss may also contribute to differences in genome size (Graur et al., 1989; Petrov et al., 1996), thus explaining the celebrated 'C-value' paradox (John and Miklos, 1988). In this paper we outline the theoretical basis of our method, examine the data from this perspective, and discuss potential problems that may bias our estimates.

View details for Web of Science ID 000071411800030

View details for PubMedID 9461402
Slow but steady: Reduction of genome size through biased mutation PLANT CELL Petrov, D. 1997; 9 (11): 1900-1901

View details for Web of Science ID A1997YJ50700003
High intrinsic rate of DNA loss in Drosophila NATURE Petrov, D. A., Lozovskaya, E. R., Hartl, D. L. 1996; 384 (6607): 346-349

Abstract

Pseudogenes are common in mammals but virtually absent in Drosophila. All putative Drosophila pseudogenes show patterns of molecular evolution that are inconsistent with the lack of functional constraints. The absence of bona fide pseudogenes is not only puzzling, it also hampers attempts to estimate rates and patterns of neutral DNA change. The estimation problem is especially acute in the case of deletions and insertions, which are likely to have large effects when they occur in functional genes and are therefore subject to strong purifying selection. We propose a solution to this problem by taking advantage of the propensity of retrotransposable elements without long terminal repeats (non-LTR) to create non-functional, 'dead-on-arrival' copies of themselves as a common by-product of their transpositional cycle. Phylogenetic analysis of a non-LTR element, Helena, demonstrates that copies lose DNA at an unusually high rate, suggesting that lack of pseudogenes in Drosophila is the product of rampant deletion of DNA in unconstrained regions. This finding has important implications for the study of genome evolution in general and the 'C-value paradox' in particular.

View details for Web of Science ID A1996VV27100045

View details for PubMedID 8934517
Triple-ligation strategy with advantages over directional cloning BIOTECHNIQUES Siegal, M. L., Petrov, D. A., DeAguiar, D. 1996; 21 (4): 614-?

View details for Web of Science ID A1996VL40500009

View details for PubMedID 8891209
Genomic regulation of transposable elements in Drosophila CURRENT OPINION IN GENETICS & DEVELOPMENT Lozovskaya, E. R., Hartl, D. L., Petrov, D. A. 1995; 5 (6): 768-773

Abstract

Transposable elements are a major source of genetic change, including the creation of novel genes, the alteration of gene expression in development, and the genesis of major genomic rearrangements. They are ubiquitous among contemporary organisms and probably as old as life itself. The long coexistence of transposable elements in the genome would be expected to be accompanied by host-element coevolution. Indeed, the important role of host factors in the regulation of transposable elements has been illuminated by recent studies of several systems in Drosophila. These include host factors that regulate the P element, a host mutation that renders the genome permissive for gypsy mobilization and infection, and newly induced mutations that affect the expression of transposon insertion mutations. The finding of a type of hybrid dysgenesis in D. virilis, in which multiple unrelated transposable elements are mobilized simultaneously, may also be relevant to host-factor regulation of transposition.

View details for Web of Science ID A1995TJ92700009

View details for PubMedID 8745075
Diverse transposable elements are mobilized in hybrid dysgenesis in Drosophila virilis PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Petrov, D. A., Schutzman, J. L., Hartl, D. L., Lozovskaya, E. R. 1995; 92 (17): 8050-8054

Abstract

We describe a system of hybrid dysgenesis in Drosophila virilis in which at least four unrelated transposable elements are all mobilized following a dysgenic cross. The data are largely consistent with the superposition of at least three different systems of hybrid dysgenesis, each repressing a different transposable element, which break down following the hybrid cross, possibly because they share a common pathway in the host. The data are also consistent with a mechanism in which mobilization of a single element triggers that of others, perhaps through chromosome breakage. The mobilization of multiple, unrelated elements in hybrid dysgenesis is reminiscent of McClintock's evidence [McClintock, B. (1955) Brookhaven Symp. Biol. 8, 58-74] for simultaneous mobilization of different transposable elements in maize.

View details for Web of Science ID A1995RP74800092

View details for PubMedID 7644536

View details for PubMedCentralID PMC41284
A combined molecular and cytogenetic approach to genome evolution in Drosophila using large-fragment DNA cloning CHROMOSOMA Lozovskaya, E. R., Petrov, D. A., Hartl, D. L. 1993; 102 (4): 253-266

Abstract

Methods of genome analysis, including the cloning and manipulation of large fragments of DNA, have opened new strategies for uniting molecular evolutionary genetics with chromosome evolution. We have begun the development of a physical map of the genome of Drosophila virilis based on large DNA fragments cloned in bacteriophage P1. A library of 10,080 P1 clones with average insert sizes of 65.8 kb, containing approximately 3.7 copies of the haploid genome of D. virilis, has been constructed and characterized. Approximately 75% of the clones have inserts exceeding 50 kb, and approximately 25% have inserts exceeding 80 kb. A sample of 186 randomly selected clones was mapped by in situ hybridization with the salivary gland chromosomes. A method for identifying D. virilis clones containing homologs of D. melanogaster genes has also been developed using hybridization with specific probes obtained from D. melanogaster by means of the polymerase chain reaction. This method proved successful for nine of ten genes and resulted in the recovery of 14 clones. The hybridization patterns of a sample of P1 clones containing repetitive DNA were also determined. A significant fraction of these clones hybridizes to multiple euchromatic sites but not to the chromocenter, which is a pattern of hybridization that is very rare among clones derived from D. melanogaster. The materials and methods described will make it possible to carry out a direct study of molecular evolution at the level of chromosome structure and organization as well as at the level of individual genes.

View details for Web of Science ID A1993KU57400005

View details for PubMedID 8486077
A REPETITIVE DNA ELEMENT, ASSOCIATED WITH TELOMERIC SEQUENCES IN DROSOPHILA-MELANOGASTER, CONTAINS OPEN READING FRAMES CHROMOSOMA Danilevskaya, O. N., Petrov, D. A., Pavlova, M. N., Koga, A., Kurenova, E. V., Hartl, D. L. 1992; 102 (1): 32-40

Abstract

He-T sequences are a complex repetitive family of DNA sequences in Drosophila that are associated with telomeric regions, pericentromeric heterochromatin, and the Y chromosome. A component of the He-T family containing open reading frames (ORFs) is described. These ORF-containing elements within the He-T family are designated T-elements, since hybridization in situ with the polytene salivary gland chromosomes results in detectable signal exclusively at the chromosome tips. One T-element that has been sequenced includes ORFs of 1,428 and 1,614 bp. The ORFs are overlapping but one nucleotide out of frame with respect to each other. The longer ORF contains cysteine-histidine motifs strongly resembling nucleic acid binding domains of gag-like proteins, and the overall organization of the T-element ORFs is reminiscent of LINE elements. The T-elements are transcribed and appear to be conserved in Drosophila species related to D. melanogaster. The results suggest that T-elements may play a role in the structure and/or function of telomeres.

View details for Web of Science ID A1992KB27500006

View details for PubMedID 1291227
GENETIC-DIFFERENCES AT 4 DNA TYPING LOCI IN FINNISH, ITALIAN, AND MIXED CAUCASIAN POPULATIONS PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Krane, D. E., Allen, R. W., Sawyer, S. A., Petrov, D. A., Hartl, D. L. 1992; 89 (22): 10583-10587

Abstract

Highly polymorphic segments of the human genome containing variable numbers of tandem repeats (VNTRs) have been widely used to establish DNA profiles of individuals for use in forensics. Methods of estimating the probability of occurrence of matching DNA profiles between two randomly selected individuals have been subject to extensive debate regarding the possibility of significant substructure occurring within the major races. We have sampled two Caucasian subpopulations, Finns and Italians, at four commonly used VNTR loci to determine the extent to which the subgroups differ from each other and from a mixed Caucasian database. The data were also analyzed for the occurrence of linkage disequilibrium among the loci. The allele frequency distributions of some loci were found to differ significantly among the subpopulations in a manner consistent with population substructure. Major differences were also found in the probability of occurrence of matching DNA profiles between two individuals chosen at random from the same subpopulation. With respect to the Finnish and Italian subpopulations, the conventional product rule for estimating the probability of a multilocus VNTR match using a mixed Caucasian database consistently yields estimates that are artificially small. Systematic errors of this type were not found using the interim ceiling principle recently advocated in the National Research Council's report [National Research Council (1992) DNA Technology in Forensic Science (Natl. Acad. Sci., Washington)]. The interim ceiling principle is based on currently available racial or ethnic databases and sets an arbitrary lower limit on each VNTR allele frequency. In the future the ceiling frequencies are expected to be established from more adequate data acquired for relevant VNTR loci from multiple subpopulations.

View details for Web of Science ID A1992JY87400005

View details for PubMedID 1438254
