Current Research and Scholarly Interests

Evolution of genomes and population genomics of adaptation and variation

Graduate and Fellowship Programs

  • Biology (School of Humanities and Sciences) (PhD Program)
  • Biomedical Informatics (PhD Program)

All Publications

  • Extreme Sensitivity of Fitness to Environmental Conditions: Lessons from #1BigBatch. Journal of molecular evolution Kinsler, G., Schmidlin, K., Newell, D., Eder, R., Apodaca, S., Lam, G., Petrov, D., Geiler-Samerotte, K. 2023


    The phrase "survival of the fittest" has become an iconic descriptor of how natural selection works. And yet, precisely measuring fitness, even for single-celled microbial populations growing in controlled laboratory conditions, remains a challenge. While numerous methods exist to perform these measurements, including recently developed methods utilizing DNA barcodes, all methods are limited in their precision to differentiate strains with small fitness differences. In this study, we rule out some major sources of imprecision, but still find that fitness measurements vary substantially from replicate to replicate. Our data suggest that very subtle and difficult to avoid environmental differences between replicates create systematic variation across fitness measurements. We conclude by discussing how fitness measurements should be interpreted given their extreme environment dependence. This work was inspired by the scientific community who followed us and gave us tips as we live tweeted a high-replicate fitness measurement experiment at #1BigBatch.

    View details for DOI 10.1007/s00239-023-10114-3

    View details for PubMedID 37237236

  • Fully accessible fitness landscape of oncogene-negative lung adenocarcinoma. bioRxiv : the preprint server for biology Yousefi, M., Andrejka, L., Winslow, M. M., Petrov, D. A., Boross, G. 2023


    Cancer genomes are almost invariably complex with genomic alterations cooperating during each step of carcinogenesis. In cancers that lack a single dominant oncogene mutation, cooperation between the inactivation of multiple tumor suppressor genes can drive tumor initiation and growth. Here, we shed light on how the sequential acquisition of genomic alterations generates oncogene-negative lung tumors. We couple tumor barcoding with combinatorial and multiplexed somatic genome editing to characterize the fitness landscapes of three tumor suppressor genes NF1, RASA1, and PTEN, the inactivation of which jointly drives oncogene-negative lung adenocarcinoma initiation and growth. The fitness landscape was surprisingly accessible, with each additional mutation leading to growth advantage. Furthermore, the fitness landscapes remained fully accessible across backgrounds with additional tumor suppressor mutations. These results suggest that while predicting cancer evolution will be challenging, acquiring the multiple alterations required for the growth of oncogene-negative tumors can be facilitated by the lack of constraints on mutational order.

    View details for DOI 10.1101/2023.01.30.526178

    View details for PubMedID 36778226
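
    The notion of a "fully accessible" landscape in the abstract above (each additional mutation gives a growth advantage) can be illustrated with a minimal sketch. The fitness values below are invented for illustration and are not data from the study; the function and variable names are hypothetical.

```python
from itertools import combinations

# Gene names come from the abstract; everything else here is an illustrative assumption.
GENES = ("NF1", "RASA1", "PTEN")


def all_genotypes(genes):
    """Enumerate every combination of inactivated genes, from none to all."""
    return [frozenset(c) for r in range(len(genes) + 1) for c in combinations(genes, r)]


def is_fully_accessible(fitness, genes):
    """Return True if adding any single mutation to any genotype increases fitness."""
    for genotype in all_genotypes(genes):
        for gene in genes:
            if gene not in genotype and fitness[genotype | {gene}] <= fitness[genotype]:
                return False
    return True


# Toy landscape (hypothetical numbers): fitness grows with the number of inactivated genes.
toy_fitness = {g: 1.0 + 0.5 * len(g) for g in all_genotypes(GENES)}

# A second toy landscape in which one double mutant is less fit than its single mutants.
blocked_fitness = dict(toy_fitness)
blocked_fitness[frozenset({"NF1", "PTEN"})] = 0.9

print(is_fully_accessible(toy_fitness, GENES))
print(is_fully_accessible(blocked_fitness, GENES))
```

    Under this definition, a landscape is fully accessible only if every mutational order is uphill at every step, which is why a single low-fitness intermediate is enough to fail the check.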

  • Antigenic diversity in malaria parasites is maintained on extrachromosomal DNA. bioRxiv : the preprint server for biology Ebel, E. R., Kim, B. Y., McDew-White, M., Egan, E. S., Anderson, T. J., Petrov, D. A. 2023


    Sequence variation among antigenic var genes enables Plasmodium falciparum malaria parasites to evade host immunity. Using long sequence reads from haploid clones from a mutation accumulation experiment, we detect var diversity inconsistent with simple chromosomal inheritance. We discover putatively circular DNA that is strongly enriched for var genes, which exist in multiple alleles per locus separated by recombination and indel events. Extrachromosomal DNA likely contributes to rapid antigenic diversification in P. falciparum.

    View details for DOI 10.1101/2023.02.02.526885

    View details for PubMedID 36778235

  • A multiplexed in vivo approach to identify driver genes in small cell lung cancer. Cell reports Lee, M. C., Cai, H., Murray, C. W., Li, C., Shue, Y. T., Andrejka, L., He, A. L., Holzem, A. M., Drainas, A. P., Ko, J. H., Coles, G. L., Kong, C., Zhu, S., Zhu, C., Wang, J., van de Rijn, M., Petrov, D. A., Winslow, M. M., Sage, J. 2023; 42 (1): 111990


    Small cell lung cancer (SCLC) is a lethal form of lung cancer. Here, we develop a quantitative multiplexed approach on the basis of lentiviral barcoding with somatic CRISPR-Cas9-mediated genome editing to functionally investigate candidate regulators of tumor initiation and growth in genetically engineered mouse models of SCLC. We found that naphthalene pre-treatment enhances lentiviral vector-mediated SCLC initiation, enabling high multiplicity of tumor clones for analysis through high-throughput sequencing methods. Candidate drivers of SCLC identified from a meta-analysis across multiple human SCLC genomic datasets were tested using this approach, which defines both positive and detrimental impacts of inactivating 40 genes across candidate pathways on SCLC development. This analysis and subsequent validation in human SCLC cells establish TSC1 in the PI3K-AKT-mTOR pathway as a robust tumor suppressor in SCLC. This approach should illuminate drivers of SCLC, facilitate the development of precision therapies for defined SCLC genotypes, and identify therapeutic targets.

    View details for DOI 10.1016/j.celrep.2023.111990

    View details for PubMedID 36640300

  • Multiplexed screens identify RAS paralogues HRAS and NRAS as suppressors of KRAS-driven lung cancer growth. Nature cell biology Tang, R., Shuldiner, E. G., Kelly, M., Murray, C. W., Hebert, J. D., Andrejka, L., Tsai, M. K., Hughes, N. W., Parker, M. I., Cai, H., Li, Y. C., Wahl, G. M., Dunbrack, R. L., Jackson, P. K., Petrov, D. A., Winslow, M. M. 2023


    Oncogenic KRAS mutations occur in approximately 30% of lung adenocarcinoma. Despite several decades of effort, oncogenic KRAS-driven lung cancer remains difficult to treat, and our understanding of the regulators of RAS signalling is incomplete. Here, to uncover the impact of diverse KRAS-interacting proteins on lung cancer growth, we combined multiplexed somatic CRISPR/Cas9-based genome editing in genetically engineered mouse models with tumour barcoding and high-throughput barcode sequencing. Through a series of CRISPR/Cas9 screens in autochthonous lung cancer models, we show that HRAS and NRAS are suppressors of KRAS G12D-driven tumour growth in vivo and confirm these effects in oncogenic KRAS-driven human lung cancer cell lines. Mechanistically, RAS paralogues interact with oncogenic KRAS, suppress KRAS-KRAS interactions, and reduce downstream ERK signalling. Furthermore, HRAS and NRAS mutations identified in oncogenic KRAS-driven human tumours partially abolished this effect. By comparing the tumour-suppressive effects of HRAS and NRAS in oncogenic KRAS- and oncogenic BRAF-driven lung cancer models, we confirm that RAS paralogues are specific suppressors of KRAS-driven lung cancer in vivo. Our study outlines a technological avenue to uncover positive and negative regulators of oncogenic KRAS-driven cancer in a multiplexed manner in vivo and highlights the role of RAS paralogue imbalance in oncogenic KRAS-driven lung cancer.

    View details for DOI 10.1038/s41556-022-01049-w

    View details for PubMedID 36635501

  • Genome Report: Chromosome-level draft assemblies of the snow leopard, African leopard, and tiger (Panthera uncia, Panthera pardus pardus, and Panthera tigris). G3 (Bethesda, Md.) Armstrong, E. E., Campana, M. G., Solari, K. A., Morgan, S. R., Ryder, O. A., Naude, V. N., Samelius, G., Sharma, K., Hadly, E. A., Petrov, D. A. 2022


    The big cats (genus Panthera) represent some of the most popular and charismatic species on the planet. Although some reference genomes are available for this clade, few are at the chromosome level, inhibiting high-resolution genomic studies. We assembled genomes from three members of the genus, the tiger (Panthera tigris), the snow leopard (Panthera uncia), and the African leopard (Panthera pardus pardus), at chromosome or near-chromosome level. We used a combination of short- and long-read technologies, as well as proximity ligation data from Hi-C technology, to achieve high continuity and contiguity for each individual. We hope these genomes will aid in further evolutionary and conservation research of this iconic group of mammals.

    View details for DOI 10.1093/g3journal/jkac277

    View details for PubMedID 36250809

  • Most cancers carry a substantial deleterious load due to Hill-Robertson interference. eLife Tilk, S., Tkachenko, S., Curtis, C., Petrov, D. A., McFarland, C. D. 2022; 11


    Cancer genomes exhibit surprisingly weak signatures of negative selection. This may be because selective pressures are relaxed or because genome-wide linkage prevents deleterious mutations from being removed (Hill-Robertson interference). By stratifying tumors by their genome-wide mutational burden, we observe negative selection (dN/dS ~ 0.56) in low mutational burden tumors, while remaining cancers exhibit dN/dS ratios ~1. This suggests that most tumors do not remove deleterious passengers. To buffer against deleterious passengers, tumors upregulate heat shock pathways as their mutational burden increases. Finally, evolutionary modeling finds that Hill-Robertson interference alone can reproduce patterns of attenuated selection and estimates the total fitness cost of passengers to be 46% per cell on average. Collectively, our findings suggest that the lack of observed negative selection in most tumors is not due to relaxed selective pressures, but rather the inability of selection to remove deleterious mutations in the presence of genome-wide linkage.

    View details for DOI 10.7554/eLife.67790

    View details for PubMedID 36047771

  • Dissecting the role of Stag2 in lung adenocarcinoma Ashkin, E. L., Cai, H., Tang, Y. J., Li, C., Chew, S., Hung, K., Belk, J., Karmakar, S., Hebert, J., Yousefi, M., Swanton, C., Petrov, D. A., Winslow, M. AMER ASSOC CANCER RESEARCH. 2022
  • A journey to deconvolute the multifaceted functions and context-dependency of cancer driver genes Cai, H., Chew, S., Li, C., Murray, C. W., Andrejka, L., Hebert, J. D., Tsai, M. K., Tang, R., Hughes, N. W., Shuldiner, E. G., Ashkin, E. L., Lee, S. C., Yousefi, M., Petrov, D. A., Swanton, C., Winslow, M. W. AMER ASSOC CANCER RESEARCH. 2022
  • A quantitative in vivo pharmacogenomics platform uncovers biomarkers of therapy response Rosen, M., Amar, D., Winters, I., Rizvi, H., Nie, W., Wall, G., Petrov, D., Winslow, M., Rudin, C., Juan, J. AMER ASSOC CANCER RESEARCH. 2022
  • Combinatorial Inactivation of Tumor Suppressors Efficiently Initiates Lung Adenocarcinoma with Therapeutic Vulnerabilities. Cancer research Yousefi, M., Boross, G., Weiss, C., Murray, C. W., Hebert, J. D., Cai, H., Ashkin, E. L., Karmakar, S., Andrejka, L., Chen, L., Wang, M., Tsai, M. K., Lin, W., Li, C., Yakhchalian, P., Colon, C. I., Chew, S., Chu, P., Swanton, C., Kunder, C. A., Petrov, D. A., Winslow, M. M. 2022; 82 (8): 1589-1602


    Lung cancer is the leading cause of cancer death worldwide, with lung adenocarcinoma being the most common subtype. Many oncogenes and tumor suppressor genes are altered in this cancer type, and the discovery of oncogene mutations has led to the development of targeted therapies that have improved clinical outcomes. However, a large fraction of lung adenocarcinomas lacks mutations in known oncogenes, and the genesis and treatment of these oncogene-negative tumors remain enigmatic. Here, we perform iterative in vivo functional screens using quantitative autochthonous mouse model systems to uncover the genetic and biochemical changes that enable efficient lung tumor initiation in the absence of oncogene alterations. Generation of hundreds of diverse combinations of tumor suppressor alterations demonstrates that inactivation of suppressors of the RAS and PI3K pathways drives the development of oncogene-negative lung adenocarcinoma. Human genomic data and histology identified RAS/MAPK and PI3K pathway activation as a common event in oncogene-negative human lung adenocarcinomas. These Onc-negative RAS/PI3K tumors and related cell lines are vulnerable to pharmacologic inhibition of these signaling axes. These results transform our understanding of this prevalent yet understudied subtype of lung adenocarcinoma. SIGNIFICANCE: To address the large fraction of lung adenocarcinomas lacking mutations in proto-oncogenes for which targeted therapies are unavailable, this work uncovers driver pathways of oncogene-negative lung adenocarcinomas and demonstrates their therapeutic vulnerabilities.

    View details for DOI 10.1158/0008-5472.CAN-22-0059

    View details for PubMedID 35425962

  • Direct observation of adaptive tracking on ecological time scales in Drosophila. Science (New York, N.Y.) Rudman, S. M., Greenblum, S. I., Rajpurohit, S., Betancourt, N. J., Hanna, J., Tilk, S., Yokoyama, T., Petrov, D. A., Schmidt, P. 2022; 375 (6586): eabj7484


    Direct observation of evolution in response to natural environmental change can resolve fundamental questions about adaptation, including its pace, temporal dynamics, and underlying phenotypic and genomic architecture. We tracked the evolution of fitness-associated phenotypes and allele frequencies genome-wide in 10 replicate field populations of Drosophila melanogaster over 10 generations from summer to late fall. Adaptation was evident over each sampling interval (one to four generations), with exceptionally rapid phenotypic adaptation and large allele frequency shifts at many independent loci. The direction and basis of the adaptive response shifted repeatedly over time, consistent with the action of strong and rapidly fluctuating selection. Overall, we found clear phenotypic and genomic evidence of adaptive tracking occurring contemporaneously with environmental change, thus demonstrating the temporally dynamic nature of adaptation.

    View details for DOI 10.1126/science.abj7484

    View details for PubMedID 35298245

  • Revisiting the malaria hypothesis: accounting for polygenicity and pleiotropy. Trends in parasitology Ebel, E. R., Uricchio, L. H., Petrov, D. A., Egan, E. S. 2022


    The malaria hypothesis predicts local, balancing selection of deleterious alleles that confer strong protection from malaria. Three protective variants, recently discovered in red cell genes, are indeed more common in African than European populations. Still, up to 89% of the heritability of severe malaria is attributed to many genome-wide loci with individually small effects. Recent analyses of hundreds of genome-wide association studies (GWAS) in humans suggest that most functional, polygenic variation is pleiotropic for multiple traits. Interestingly, GWAS alleles and red cell traits associated with small reductions in malaria risk are not enriched in African populations. We propose that other selective and neutral forces, in addition to malaria prevalence, explain the global distribution of most genetic variation impacting malaria risk.

    View details for DOI 10.1016/

    View details for PubMedID 35065882

  • Tumor suppressor pathways shape EGFR-driven lung tumor progression and response to treatment. Molecular & cellular oncology Foggetti, G., Li, C., Cai, H., Petrov, D. A., Winslow, M. M., Politi, K. 2022; 9 (1): 1994328


    In vivo modeling combined with CRISPR/Cas9-mediated somatic genome editing has contributed to elucidating the functional importance of specific genetic alterations in human tumors. Our recent work uncovered tumor suppressor pathways that affect EGFR-driven lung tumor growth and sensitivity to tyrosine kinase inhibitors and reflect the mutational landscape and treatment outcomes in the human disease.

    View details for DOI 10.1080/23723556.2021.1994328

    View details for PubMedID 35252550

    View details for PubMedCentralID PMC8890383

  • The Tetragnatha kauaiensis genome sheds light on the origins of genomic novelty in spiders. Genome biology and evolution Cerca, J., Armstrong, E. E., Vizueta, J., Fernandez, R., Dimitrov, D., Petersen, B., Prost, S., Rozas, J., Petrov, D., Gillespie, R. G. 2021


    Spiders (Araneae) have a diverse spectrum of morphologies, behaviours and physiologies. Attempts to understand the genomic basis of this diversity are often hindered by their large, heterozygous and AT-rich genomes with high repeat content, resulting in highly fragmented, poor-quality assemblies. As a result, the key attributes of spider genomes, including gene family evolution, repeat content, and gene function, remain poorly understood. Here, we used Illumina and Dovetail Chicago technologies to sequence the genome of the long-jawed spider Tetragnatha kauaiensis, producing an assembly distributed along 3,925 scaffolds with an N50 of 2 Mb. Using comparative genomics tools, we explore genome evolution across available spider assemblies. Our findings suggest that the previously reported and vast genome size variation in spiders is linked to the different representation and number of transposable elements. Using statistical tools to uncover gene-family level evolution, we find expansions associated with the sensory perception of taste, immunity and metabolism. In addition, we report strikingly different histories of chemosensory, venom and silk gene families, with the first two evolving much earlier, affected by the ancestral whole genome duplication in Arachnopulmonata (450 million years ago) and exhibiting higher numbers. Together, our findings reveal that spider genomes are highly variable and that genomic novelty may have been driven by the burst of an ancient whole genome duplication, followed by gene family and transposable element expansion.

    View details for DOI 10.1093/gbe/evab262

    View details for PubMedID 34849853

  • Common host variation drives malaria parasite fitness in healthy human red cells. eLife Ebel, E. R., Kuypers, F. A., Lin, C., Petrov, D. A., Egan, E. S. 2021; 10


    The replication of Plasmodium falciparum parasites within red blood cells (RBCs) causes severe disease in humans, especially in Africa. Deleterious alleles like hemoglobin S are well-known to confer strong resistance to malaria, but the effects of common RBC variation are largely undetermined. Here we collected fresh blood samples from 121 healthy donors, most with African ancestry, and performed exome sequencing, detailed RBC phenotyping, and parasite fitness assays. Over one third of healthy donors unknowingly carried alleles for G6PD deficiency or hemoglobinopathies, which were associated with characteristic RBC phenotypes. Among non-carriers alone, variation in RBC hydration, membrane deformability, and volume was strongly associated with P. falciparum growth rate. Common genetic variants in PIEZO1, SPTA1/SPTB, and several P. falciparum invasion receptors were also associated with parasite growth rate. Interestingly, we observed little or negative evidence for divergent selection on non-pathogenic RBC variation between Africans and Europeans. These findings suggest a model in which globally widespread variation in a moderate number of genes and phenotypes modulates P. falciparum fitness in RBCs.

    View details for DOI 10.7554/eLife.69808

    View details for PubMedID 34553687

  • Richard C. Lewontin (1929-2021). Science (New York, N.Y.) Berry, A., Petrov, D. A. 2021; 373 (6556): 745

    View details for DOI 10.1126/science.abl5430

    View details for PubMedID 34385385

  • Highly contiguous assemblies of 101 drosophilid genomes. eLife Kim, B. Y., Wang, J., Miller, D. E., Barmina, O., Delaney, E. K., Thompson, A., Comeault, A. A., Peede, D., D'Agostino, E. R., Pelaez, J., Aguilar, J. M., Haji, D., Matsunaga, T., Armstrong, E., Zych, M., Ogawa, Y., Stamenkovic-Radak, M., Jelic, M., Veselinovic, M. S., Tanaskovic, M., Eric, P., Gao, J., Katoh, T. K., Toda, M. J., Watabe, H., Watada, M., Davis, J. S., Moyle, L., Manoli, G., Bertolini, E., Kostal, V., Hawley, R. S., Takahashi, A., Jones, C. D., Price, D. K., Whiteman, N. K., Kopp, A., Matute, D. R., Petrov, D. A. 2021; 10


    Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.

    View details for DOI 10.7554/eLife.66405

    View details for PubMedID 34279216
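
    The contig N50 quoted in the abstract above is a standard contiguity summary: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch follows; the contig lengths are made-up toy values, not figures from the assemblies, and the function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in Mb (hypothetical): total 54, half 27; 10 + 9 + 8 reaches 27.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # prints 8
```

    Because N50 is weighted by length, a few very long contigs can dominate it, which is why it is usually reported alongside a completeness measure such as BUSCO.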

  • Quantitative in vivo analyses reveal a complex pharmacogenomic landscape in lung adenocarcinoma. Cancer research Li, C., Lin, W., Rizvi, H., Cai, H., McFarland, C. D., Rogers, Z. N., Yousefi, M., Winters, I. P., Rudin, C. M., Petrov, D. A., Winslow, M. M. 2021


    The lack of knowledge about the relationship between tumor genotypes and therapeutic responses remains one of the most critical gaps in enabling the effective use of cancer therapies. Here we couple a multiplexed and quantitative experimental platform with robust statistical methods to enable pharmacogenomic mapping of lung cancer treatment responses in vivo. The complex map of genotype-specific treatment responses uncovered that over 20% of possible interactions show significant resistance or sensitivity. Known and novel interactions were identified, and one of these interactions, the resistance of KEAP1 mutant lung tumors to platinum therapy, was validated using a large patient response dataset. These results highlight the broad impact of tumor suppressor genotype on treatment responses and define a strategy to identify the determinants of precision therapies.

    View details for DOI 10.1158/0008-5472.CAN-21-0716

    View details for PubMedID 34215621

  • Broad geographic sampling reveals the shared basis and environmental correlates of seasonal adaptation in Drosophila. eLife Machado, H. E., Bergland, A., Taylor, R. W., Tilk, S., Behrman, E., Dyer, K., Fabian, D. K., Flatt, T., Gonzalez, J., Karasov, T. L., Kim, B. Y., Kozeretska, I., Lazzaro, B. P., Merritt, T., Pool, J. E., O'Brien, K., Rajpurohit, S., Roy, P. R., Schaeffer, S. W., Serga, S., Schmidt, P., Petrov, D. A. 2021; 10


    To advance our understanding of adaptation to temporally varying selection pressures, we identified signatures of seasonal adaptation occurring in parallel among Drosophila melanogaster populations. Specifically, we estimated allele frequencies genome-wide from flies sampled early and late in the growing season from 20 widely dispersed populations. We identified parallel seasonal allele frequency shifts across North America and Europe, demonstrating that seasonal adaptation is a general phenomenon of temperate fly populations. Seasonally fluctuating polymorphisms are enriched in large chromosomal inversions and we find a broad concordance between seasonal and spatial allele frequency change. The direction of allele frequency change at seasonally variable polymorphisms can be predicted by weather conditions in the weeks prior to sampling, linking the environment and the genomic response to selection. Our results suggest that fluctuating selection is an important evolutionary force affecting patterns of genetic variation in Drosophila.

    View details for DOI 10.7554/eLife.67577

    View details for PubMedID 34155971

  • Functional biology in its natural context: A search for emergent simplicity. eLife Bergelson, J., Kreitman, M., Petrov, D. A., Sanchez, A., Tikhonov, M. 2021; 10


    The immeasurable complexity at every level of biological organization creates a daunting task for understanding biological function. Here, we highlight the risks of stripping it away at the outset and discuss a possible path toward arriving at emergent simplicity of understanding while still embracing the ever-changing complexity of biotic interactions that we see in nature.

    View details for DOI 10.7554/eLife.67646

    View details for PubMedID 34096867

  • The cis-regulatory effects of modern human-specific variants. eLife Weiss, C. V., Harshman, L., Inoue, F., Fraser, H. B., Petrov, D. A., Ahituv, N., Gokhman, D. 2021; 10


    The Neanderthal and Denisovan genomes enabled the discovery of sequences that differ between modern and archaic humans, the majority of which are noncoding. However, our understanding of the regulatory consequences of these differences remains limited, in part due to the decay of regulatory marks in ancient samples. Here, we used a massively parallel reporter assay in embryonic stem cells, neural progenitor cells, and bone osteoblasts to investigate the regulatory effects of the 14,042 single-nucleotide modern human-specific variants. Overall, 1791 (13%) of sequences containing these variants showed active regulatory activity, and 407 (23%) of these drove differential expression between human groups. Differentially active sequences were associated with divergent transcription factor binding motifs, and with genes enriched for vocal tract and brain anatomy and function. This work provides insight into the regulatory function of variants that emerged along the modern human lineage and the recent evolution of human gene expression.

    View details for DOI 10.7554/eLife.63713

    View details for PubMedID 33885362

  • The AMBRA1 E3 ligase adaptor regulates the stability of cyclin D. Nature Chaikovsky, A. C., Li, C., Jeng, E. E., Loebell, S., Lee, M. C., Murray, C. W., Cheng, R., Demeter, J., Swaney, D. L., Chen, S., Newton, B. W., Johnson, J. R., Drainas, A. P., Shue, Y. T., Seoane, J. A., Srinivasan, P., He, A., Yoshida, A., Hipkins, S. Q., McCrea, E., Poltorack, C. D., Krogan, N. J., Diehl, J. A., Kong, C., Jackson, P. K., Curtis, C., Petrov, D. A., Bassik, M. C., Winslow, M. M., Sage, J. 2021


    The initiation of cell division integrates a large number of intra- and extracellular inputs. D-type cyclins (hereafter, cyclin D) couple these inputs to the initiation of DNA replication. Increased levels of cyclin D promote cell division by activating cyclin-dependent kinases 4 and 6 (hereafter, CDK4/6), which in turn phosphorylate and inactivate the retinoblastoma tumour suppressor. Accordingly, increased levels and activity of cyclin D-CDK4/6 complexes are strongly linked to unchecked cell proliferation and cancer. However, the mechanisms that regulate levels of cyclin D are incompletely understood. Here we show that autophagy and beclin 1 regulator 1 (AMBRA1) is the main regulator of the degradation of cyclin D. We identified AMBRA1 in a genome-wide screen to investigate the genetic basis of the response to CDK4/6 inhibition. Loss of AMBRA1 results in high levels of cyclin D in cells and in mice, which promotes proliferation and decreases sensitivity to CDK4/6 inhibition. Mechanistically, AMBRA1 mediates ubiquitylation and proteasomal degradation of cyclin D as a substrate receptor for the cullin 4 E3 ligase complex. Loss of AMBRA1 enhances the growth of lung adenocarcinoma in a mouse model, and low levels of AMBRA1 correlate with worse survival in patients with lung adenocarcinoma. Thus, AMBRA1 regulates cellular levels of cyclin D, and contributes to cancer development and the response of cancer cells to CDK4/6 inhibitors.

    View details for DOI 10.1038/s41586-021-03474-7

    View details for PubMedID 33854239

  • Historical trends and new surveillance of Plasmodium falciparum drug resistance markers in Angola. Malaria journal Ebel, E. R., Reis, F., Petrov, D. A., Beleza, S. 2021; 20 (1): 175


    BACKGROUND: Plasmodium falciparum resistance to chloroquine (CQ) and sulfadoxine-pyrimethamine (SP) has historically posed a major threat to malaria control throughout the world. The country of Angola officially replaced CQ with artemisinin-based combination therapy (ACT) as a first-line treatment in 2006, but malaria cases and deaths have recently been rising. Many classic resistance mutations are relevant for the efficacy of currently available drugs, making it important to continue monitoring their frequency in Angola. METHODS: Plasmodium falciparum DNA was sampled from the blood of 50 hospital patients in Cabinda, Angola from October-December of 2018. Each infection was genotyped for 13 alleles in the genes crt, mdr1, dhps, dhfr, and kelch13, which are collectively involved in resistance to six common anti-malarials. To compare frequency patterns over time, P. falciparum genotype data were also collated from studies published from across Angola in the last two decades. RESULTS: The two most important alleles for CQ resistance, crt 76T and mdr1 86Y, were found at respective frequencies of 71.4% and 6.5%. Historical data suggest that mdr1 N86 has been steadily replacing 86Y throughout Angola in the last decade, while the frequency of crt 76T has been more variable across studies. Over a third of new samples from Cabinda were 'quintuple mutants' for SP resistance in dhfr/dhps, with a sixth mutation at dhps A581G present at 9.6% frequency. The markers dhfr 51I, dhfr 108N, and dhps 437G have been nearly fixed in Angola since the early 2000s, whereas dhfr 59R may have risen to high frequency more recently. Finally, no non-synonymous polymorphisms were detected in kelch13, which is involved in artemisinin resistance in Southeast Asia. CONCLUSIONS: Genetic markers of P. falciparum resistance to CQ are likely declining in frequency in Angola, consistent with the official discontinuation of CQ in 2006. The high frequency of multiple genetic markers of SP resistance is consistent with the continued public and private use of SP. In the future, more complete haplotype data from mdr1, dhfr, and dhps will be critical for understanding the changing efficacy of multiple anti-malarial drugs. These data can be used to support effective drug policy decisions in Angola.

    View details for DOI 10.1186/s12936-021-03713-2

    View details for PubMedID 33827587

  • Genetic determinants of EGFR-Driven Lung Cancer Growth and Therapeutic Response In Vivo. Cancer discovery Foggetti, G., Li, C., Cai, H., Hellyer, J. A., Lin, W., Ayeni, D., Hastings, K., Choi, J., Wurtz, A., Andrejka, L., Maghini, D. G., Rashleigh, N., Levy, S., Homer, R., Gettinger, S. N., Diehn, M., Wakelee, H. A., Petrov, D. A., Winslow, M. M., Politi, K. 2021


    In lung adenocarcinoma, oncogenic EGFR mutations co-occur with many tumor suppressor gene alterations; however, the extent to which these contribute to tumor growth and response to therapy in vivo remains largely unknown. By quantifying the effects of inactivating ten putative tumor suppressor genes in a mouse model of EGFR-driven Trp53-deficient lung adenocarcinoma, we found that Apc, Rb1, or Rbm10 inactivation strongly promoted tumor growth. Unexpectedly, inactivation of Lkb1 or Setd2 - the strongest drivers of growth in a Kras-driven model - reduced EGFR-driven tumor growth. These results are consistent with mutational frequencies in human EGFR- and KRAS-driven lung adenocarcinomas. Furthermore, Keap1 inactivation reduced the sensitivity of EGFR-driven tumors to the EGFR inhibitor osimertinib, and mutations in the KEAP1 pathway were associated with decreased time on tyrosine kinase inhibitor treatment in patients. Our study highlights how the impact of genetic alterations differs across oncogenic contexts and that the fitness landscape shifts upon treatment.

    View details for DOI 10.1158/2159-8290.CD-20-1385

    View details for PubMedID 33707235

  • Detection of hard and soft selective sweeps from Drosophila melanogaster population genomic data. PLoS genetics Garud, N. R., Messer, P. W., Petrov, D. A. 2021; 17 (2): e1009373


    Whether hard sweeps or soft sweeps dominate adaptation has been a matter of much debate. Recently, we developed haplotype homozygosity statistics that (i) can detect both hard and soft sweeps with similar power and (ii) can classify the detected sweeps as hard or soft. The application of our method to population genomic data from a natural population of Drosophila melanogaster (DGRP) allowed us to rediscover three known cases of adaptation at the loci Ace, Cyp6g1, and CHKov1 known to be driven by soft sweeps, and detected additional candidate loci for recent and strong sweeps. Surprisingly, all of the top 50 candidates showed patterns much more consistent with soft rather than hard sweeps. Recently, Harris et al. 2018 criticized this work, suggesting that all the candidate loci detected by our haplotype statistics, including the positive controls, are unlikely to be sweeps at all and that instead these haplotype patterns can be more easily explained by complex neutral demographic models. They also claim that these neutral non-sweeps are likely to be hard instead of soft sweeps. Here, we reanalyze the DGRP data using a range of complex admixture demographic models and reconfirm our original published results suggesting that the majority of recent and strong sweeps in D. melanogaster are first likely to be true sweeps, and second, that they do appear to be soft. Furthermore, we discuss ways to take this work forward given that most demographic models employed in such analyses are necessarily too simple to capture the full demographic complexity, while more realistic models are unlikely to be inferred correctly because they require a large number of free parameters.

    View details for DOI 10.1371/journal.pgen.1009373

    View details for PubMedID 33635910
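
    As an illustration of the kind of statistic described in the abstract above, the sketch below computes haplotype homozygosity in a single genomic window. It assumes the statistics in question are H1, H12 and H2/H1 as defined in the authors' earlier work (an assumption; the abstract does not name them). The function name and the toy inputs are hypothetical.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes observed in one window.

    Assumed definitions, with p_1 >= p_2 >= ... the haplotype frequencies:
      H1  = sum of p_i^2 over all haplotypes
      H12 = (p_1 + p_2)^2 + sum of p_i^2 for i > 2
      H2  = H1 - p_1^2
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Frequencies sorted from most to least common.
    freqs = sorted((c / n for c in counts.values()), reverse=True)
    h1 = sum(p * p for p in freqs)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


# Toy windows (made-up data): one dominant haplotype vs. two common ones.
hard_like = ["A"] * 8 + ["B", "C"]
soft_like = ["A"] * 4 + ["B"] * 4 + ["C", "D"]
print(haplotype_homozygosity(hard_like))
print(haplotype_homozygosity(soft_like))
```

    In these toy inputs, the window dominated by a single haplotype has a high H12 and a low H2/H1, while the window with two equally common haplotypes keeps H12 fairly high but has a much larger H2/H1.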

  • The clarifying role of time series data in the population genetics of HIV. PLoS genetics Feder, A. F., Pennings, P. S., Petrov, D. A. 2021; 17 (1): e1009050


    HIV can evolve remarkably quickly in response to antiretroviral therapies and the immune system. This evolution stymies treatment effectiveness and prevents the development of an HIV vaccine. Consequently, there has been a great interest in using population genetics to disentangle the forces that govern the HIV adaptive landscape (selection, drift, mutation, and recombination). Traditional population genetics approaches look at the current state of genetic variation and infer the processes that can generate it. However, because HIV evolves rapidly, we can also sample populations repeatedly over time and watch evolution in action. In this paper, we demonstrate how time series data can bound evolutionary parameters in a way that complements and informs traditional population genetic approaches. Specifically, we focus on our recent paper (Feder et al., 2016, eLife), in which we show that, as improved HIV drugs have led to fewer patients failing therapy due to resistance evolution, less genetic diversity has been maintained following the fixation of drug resistance mutations. Because soft sweeps of multiple drug resistance mutations spreading simultaneously have been previously documented in response to the less effective HIV therapies used early in the epidemic, we interpret the maintenance of post-sweep diversity in response to poor therapies as further evidence of soft sweeps and therefore a high population mutation rate (θ) in these intra-patient HIV populations. Because improved drugs resulted in rarer resistance evolution accompanied by lower post-sweep diversity, we suggest that both observations can be explained by decreased population mutation rates and a resultant transition to hard selective sweeps. 
A recent paper (Harris et al., 2018, PLOS Genetics) proposed an alternative interpretation: Diversity maintenance following drug resistance evolution in response to poor therapies may have been driven by recombination during slow, hard selective sweeps of single mutations. Then, if better drugs have led to faster hard selective sweeps of resistance, recombination will have less time to rescue diversity during the sweep, recapitulating the decrease in post-sweep diversity as drugs have improved. In this paper, we use time series data to show that drug resistance evolution during ineffective treatment is very fast, providing new evidence that soft sweeps drove early HIV treatment failure.

    View details for DOI 10.1371/journal.pgen.1009050

    View details for PubMedID 33444376

  • Widespread introgression across a phylogeny of 155 Drosophila genomes. Current biology : CB Suvorov, A., Kim, B. Y., Wang, J., Armstrong, E. E., Peede, D., D'Agostino, E. R., Price, D. K., Waddell, P., Lang, M., Courtier-Orgogozo, V., David, J. R., Petrov, D., Matute, D. R., Schrider, D. R., Comeault, A. A. 2021


    Genome-scale sequence data have invigorated the study of hybridization and introgression, particularly in animals. However, outside of a few notable cases, we lack systematic tests for introgression at a larger phylogenetic scale across entire clades. Here, we leverage 155 genome assemblies from 149 species to generate a fossil-calibrated phylogeny and conduct multilocus tests for introgression across 9 monophyletic radiations within the genus Drosophila. Using complementary phylogenomic approaches, we identify widespread introgression across the evolutionary history of Drosophila. Mapping gene-tree discordance onto the phylogeny revealed that both ancient and recent introgression has occurred across most of the 9 clades that we examined. Our results provide the first evidence of introgression occurring across the evolutionary history of Drosophila and highlight the need to continue to study the evolutionary consequences of hybridization and introgression in this genus and across the tree of life.

    View details for DOI 10.1016/j.cub.2021.10.052

    View details for PubMedID 34788634

  • Drosophila Evolution over Space and Time (DEST) - A New Population Genomics Resource. Molecular biology and evolution Kapun, M., Nunez, J. C., Bogaerts-Márquez, M., Murga-Moreno, J., Paris, M., Outten, J., Coronado-Zamora, M., Tern, C., Rota-Stabelli, O., García Guerreiro, M. P., Casillas, S., Orengo, D. J., Puerma, E., Kankare, M., Ometto, L., Loeschcke, V., Onder, B. S., Abbott, J. K., Schaeffer, S. W., Rajpurohit, S., Behrman, E. L., Schou, M. F., Merritt, T. J., Lazzaro, B. P., Glaser-Schmitt, A., Argyridou, E., Staubach, F., Wang, Y., Tauber, E., Serga, S. V., Fabian, D. K., Dyer, K. A., Wheat, C. W., Parsch, J., Grath, S., Veselinovic, M. S., Stamenkovic-Radak, M., Jelic, M., Buendía-Ruíz, A. J., Gómez-Julián, M. J., Espinosa-Jimenez, M. L., Gallardo-Jiménez, F. D., Patenkovic, A., Eric, K., Tanaskovic, M., Ullastres, A., Guio, L., Merenciano, M., Guirao-Rico, S., Horváth, V., Obbard, D. J., Pasyukova, E., Alatortsev, V. E., Vieira, C. P., Vieira, J., Torres, J. R., Kozeretska, I., Maistrenko, O. M., Montchamp-Moreau, C., Mukha, D. V., Machado, H. E., Lamb, K., Paulo, T., Yusuf, L., Barbadilla, A., Petrov, D., Schmidt, P., Gonzalez, J., Flatt, T., Bergland, A. O. 2021


    Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome datasets from natural populations of this species have been published over the last years. A major challenge is the integration of disparate datasets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic variant caller (SNAPE-pooled). We use this pipeline to generate the largest data repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in > 20 countries on four continents. Several of these locations have been sampled at different seasons across multiple years. This dataset, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental meta-data. A web-based genome browser and web portal provide easy access to the SNP dataset. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource which can be easily extended via future efforts for an even more extensive cosmopolitan dataset. Our resource will enable population geneticists to analyze spatio-temporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.

    View details for DOI 10.1093/molbev/msab259

    View details for PubMedID 34469576

  • Publisher Correction: Human-chimpanzee fused cells reveal cis-regulatory divergence underlying skeletal evolution. Nature genetics Gokhman, D., Agoglia, R. M., Kinnebrew, M., Gordon, W., Sun, D., Bajpai, V. K., Naqvi, S., Chen, C., Chan, A., Chen, C., Petrov, D. A., Ahituv, N., Zhang, H., Mishina, Y., Wysocka, J., Rohatgi, R., Fraser, H. B. 2021

    View details for DOI 10.1038/s41588-021-00849-4

    View details for PubMedID 33762754

  • Human-chimpanzee fused cells reveal cis-regulatory divergence underlying skeletal evolution. Nature genetics Gokhman, D., Agoglia, R. M., Kinnebrew, M., Gordon, W., Sun, D., Bajpai, V. K., Naqvi, S., Chen, C., Chan, A., Chen, C., Petrov, D. A., Ahituv, N., Zhang, H., Mishina, Y., Wysocka, J., Rohatgi, R., Fraser, H. B. 2021


    Gene regulatory divergence is thought to play a central role in determining human-specific traits. However, our ability to link divergent regulation to divergent phenotypes is limited. Here, we utilized human-chimpanzee hybrid induced pluripotent stem cells to study gene expression separating these species. The tetraploid hybrid cells allowed us to separate cis- from trans-regulatory effects, and to control for nongenetic confounding factors. We differentiated these cells into cranial neural crest cells, the primary cell type giving rise to the face. We discovered evidence of lineage-specific selection on the hedgehog signaling pathway, including a human-specific sixfold down-regulation of EVC2 (LIMBIN), a key hedgehog gene. Inducing a similar down-regulation of EVC2 substantially reduced hedgehog signaling output. Mice and humans lacking functional EVC2 show striking phenotypic parallels to human-chimpanzee craniofacial differences, suggesting that the regulatory divergence of hedgehog signaling may have contributed to the unique craniofacial morphology of humans.

    View details for DOI 10.1038/s41588-021-00804-3

    View details for PubMedID 33731941

  • Recent evolutionary history of tigers highlights contrasting roles of genetic drift and selection. Molecular biology and evolution Armstrong, E. E., Khan, A., Taylor, R. W., Gouy, A., Greenbaum, G., Thiéry, A., Kang, J. T., Redondo, S. A., Prost, S., Barsh, G., Kaelin, C., Phalke, S., Chugani, A., Gilbert, M., Miquelle, D., Zachariah, A., Borthakur, U., Reddy, A., Louis, E., Ryder, O. A., Jhala, Y. V., Petrov, D., Excoffier, L., Hadly, E., Ramakrishnan, U. 2021


    Species conservation can be improved by knowledge of evolutionary and genetic history. Tigers are among the most charismatic of endangered species and garner significant conservation attention. However, their evolutionary history and genomic variation remain poorly known, especially for Indian tigers. With 70% of the world's wild tigers living in India, such knowledge is critical. We re-sequenced 65 individual tiger genomes representing most extant subspecies with a specific focus on tigers from India. As suggested by earlier studies, we found strong genetic differentiation between the putative tiger subspecies. Despite high total genomic diversity in India, individual tigers host longer runs of homozygosity, potentially suggesting recent inbreeding or founding events, possibly due to small and fragmented protected areas. We suggest the impacts of ongoing connectivity loss on inbreeding and persistence of Indian tigers be closely monitored. Surprisingly, demographic models suggest recent divergence (within the last 20,000 years) between subspecies, and strong population bottlenecks. Amur tiger genomes revealed the strongest signals of selection related to metabolic adaptation to cold, while Sumatran tigers show evidence of weak selection for genes involved in body size regulation. We recommend detailed investigation of local adaptation in Amur and Sumatran tigers prior to initiating genetic rescue.

    View details for DOI 10.1093/molbev/msab032

    View details for PubMedID 33592092

  • A functional taxonomy of tumor suppression in oncogenic KRAS-driven lung cancer. Cancer discovery Cai, H., Chew, S. K., Li, C., Tsai, M. K., Andrejka, L., Murray, C. W., Hughes, N. W., Shuldiner, E. G., Ashkin, E. L., Tang, R., Hung, K. L., Chen, L. C., Lee, S. Y., Yousefi, M., Lin, W. Y., Kunder, C. A., Cong, L., McFarland, C. D., Petrov, D. A., Swanton, C., Winslow, M. M. 2021


    Cancer genotyping has identified a large number of putative tumor suppressor genes. Carcinogenesis is a multi-step process; however, the importance and specific roles of many of these genes during tumor initiation, growth and progression remain unknown. Here we use a multiplexed mouse model of oncogenic KRAS-driven lung cancer to quantify the impact of forty-eight known and putative tumor suppressor genes on diverse aspects of carcinogenesis at an unprecedented scale and resolution. We uncover many previously understudied functional tumor suppressors that constrain cancer in vivo. Inactivation of some genes substantially increased growth, while the inactivation of others increased tumor initiation and/or the emergence of exceptionally large tumors. These functional in vivo analyses revealed an unexpectedly complex landscape of tumor suppression that has implications for understanding cancer evolution, interpreting clinical cancer genome sequencing data, and directing approaches to limit tumor initiation and progression.

    View details for DOI 10.1158/2159-8290.CD-20-1325

    View details for PubMedID 33608386

  • Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nature neuroscience Zhu, X., Zhou, B., Pattni, R., Gleason, K., Tan, C., Kalinowski, A., Sloan, S., Fiston-Lavier, A. S., Mariani, J., Petrov, D., Barres, B. A., Duncan, L., Abyzov, A., Vogel, H., Moran, J. V., Vaccarino, F. M., Tamminga, C. A., Levinson, D. F., Urban, A. E. 2021


    Retrotransposons can cause somatic genome variation in the human nervous system, which is hypothesized to have relevance to brain development and neuropsychiatric disease. However, the detection of individual somatic mobile element insertions presents a difficult signal-to-noise problem. Using a machine-learning method (RetroSom) and deep whole-genome sequencing, we analyzed L1 and Alu retrotransposition in sorted neurons and glia from human brains. We characterized two brain-specific L1 insertions in neurons and glia from a donor with schizophrenia. There was anatomical distribution of the L1 insertions in neurons and glia across both hemispheres, indicating retrotransposition occurred during early embryogenesis. Both insertions were within the introns of genes (CNNM2 and FRMD4A) inside genomic loci associated with neuropsychiatric disorders. Proof-of-principle experiments revealed these L1 insertions significantly reduced gene expression. These results demonstrate that RetroSom has broad applications for studies of brain development and may provide insight into the possible pathological effects of somatic retrotransposition.

    View details for DOI 10.1038/s41593-020-00767-4

    View details for PubMedID 33432196

  • Ancient RNA virus epidemics through the lens of recent adaptation in human genomes. Philosophical transactions of the Royal Society of London. Series B, Biological sciences Enard, D., Petrov, D. A. 2020; 375 (1812): 20190575


    Over the course of the last several million years of evolution, humans probably have been plagued by hundreds or perhaps thousands of epidemics. Little is known about such ancient epidemics and a deep evolutionary perspective on current pathogenic threats is lacking. The study of past epidemics has typically been limited in temporal scope to recorded history, and in physical scope to pathogens that left sufficient DNA behind, such as Yersinia pestis during the Great Plague. Host genomes, however, offer an indirect way to detect ancient epidemics beyond the current temporal and physical limits. Arms races with pathogens have shaped the genomes of the hosts by driving a large number of adaptations at many genes, and these signals can be used to detect and further characterize ancient epidemics. Here, we detect the genomic footprints left by ancient viral epidemics that took place in the past approximately 50 000 years in the 26 human populations represented in the 1000 Genomes Project. By using the enrichment in signals of adaptation at approximately 4500 host loci that interact with specific types of viruses, we provide evidence that RNA viruses have driven a particularly large number of adaptive events across diverse human populations. These results suggest that different types of viruses may have exerted different selective pressures during human evolution. Knowledge of these past selective pressures will provide a deeper evolutionary perspective on current pathogenic threats. This article is part of the theme issue 'Insights into health and disease from ancient biomolecules'.

    View details for DOI 10.1098/rstb.2019.0575

    View details for PubMedID 33012231

  • Genetic Adaptation in New York City Rats. Genome biology and evolution Harpak, A., Garud, N., Rosenberg, N. A., Petrov, D. A., Combs, M., Pennings, P. S., Munshi-South, J. 2020


    Brown rats (Rattus norvegicus) thrive in urban environments by navigating the anthropocentric environment and taking advantage of human resources and by-products. From the human perspective, rats are a chronic problem that causes billions of dollars in damage to agriculture, health and infrastructure. Did genetic adaptation play a role in the spread of rats in cities? To approach this question, we collected whole-genome sequences from 29 brown rats from New York City (NYC) and scanned for genetic signatures of adaptation. We tested for (i) high-frequency, extended haplotypes that could indicate selective sweeps and (ii) loci of extreme genetic differentiation between the NYC sample and a sample from the presumed ancestral range of brown rats in northeast China. We found candidate selective sweeps near or inside genes associated with metabolism, diet, the nervous system and locomotory behavior. Patterns of differentiation between NYC and Chinese rats at putative sweep loci suggest that many sweeps began after the split from the ancestral population. Together, our results suggest several hypotheses on adaptation in rats living in close proximity to humans.

    View details for DOI 10.1093/gbe/evaa247

    View details for PubMedID 33211096

  • Genetic determinants of EGFR-driven lung cancer growth and therapeutic response in vivo Foggetti, G., Li, C., Cai, H., Lin, W., Ayeni, D., Hastings, K., Andrejka, L., Maghini, D., Homer, R., Petrov, D. A., Winslow, M. M., Politi, K. AMER ASSOC CANCER RESEARCH. 2020
  • Multiplexed functional cancer genomics. Cai, H., Li, C., Chew, S., Yousefi, M., Foggetti, G., Lin, W., Rogers, Z. N., Winters, I. P., McFarland, C. D., Politi, K., Swanton, C., Petrov, D. A., Winslow, M. M. AMER ASSOC CANCER RESEARCH. 2020: 23
  • Pervasive Strong Selection at the Level of Codon Usage Bias in Drosophila melanogaster. Genetics Machado, H. E., Lawrie, D. S., Petrov, D. A. 2020; 214 (2): 511-528


    Codon usage bias (CUB), where certain codons are used more frequently than expected by chance, is a ubiquitous phenomenon and occurs across the tree of life. The dominant paradigm is that the proportion of preferred codons is set by weak selection. While experimental changes in codon usage have at times shown large phenotypic effects in contrast to this paradigm, genome-wide population genetic estimates have supported the weak selection model. Here we use deep genomic population sequencing of two Drosophila melanogaster populations to measure selection on synonymous sites in a way that allowed us to estimate the prevalence of both weak and strong purifying selection. We find that selection in favor of preferred codons ranges from weak (|Nes| ∼ 1) to strong (|Nes| > 10), with strong selection acting on 10-20% of synonymous sites in preferred codons. While previous studies indicated that selection at synonymous sites could be strong, this is the first study to detect and quantify strong selection specifically at the level of CUB. Further, we find that CUB-associated polymorphism accounts for the majority of strong selection on synonymous sites, with secondary contributions of splicing (selection on alternatively spliced genes, splice junctions, and spliceosome-bound sites) and transcription factor binding. Our findings support a new model of CUB and indicate that the functional importance of CUB, as well as of synonymous sites in general, has been underestimated.

    View details for DOI 10.1534/genetics.119.302542

    View details for PubMedID 33954361

  • Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long-read data. BMC biology Armstrong, E. E., Taylor, R. W., Miller, D. E., Kaelin, C. B., Barsh, G. S., Hadly, E. A., Petrov, D. 2020; 18 (1): 3


    BACKGROUND: The lion (Panthera leo) is one of the most popular and iconic feline species on the planet, yet in spite of its popularity, the last century has seen massive declines for lion populations worldwide. Genomic resources for endangered species represent an important way forward for the field of conservation, enabling high-resolution studies of demography, disease, and population dynamics. Here, we present a chromosome-level assembly from a captive African lion from the Exotic Feline Rescue Center (Center Point, IN) as a resource for current and subsequent genetic work of the sole social species of the Panthera clade. RESULTS: Our assembly is composed of 10x Genomics Chromium data, Dovetail Hi-C, and Oxford Nanopore long-read data. Synteny is highly conserved between the lion, other Panthera genomes, and the domestic cat. We find variability in the length of runs of homozygosity across lion genomes, indicating contrasting histories of recent and possibly intense inbreeding and bottleneck events. Demographic analyses reveal similar ancient histories across all individuals during the Pleistocene except the Asiatic lion, which shows a more rapid decline in population size. We show a substantial influence on the reference genome choice in the inference of demographic history and heterozygosity. CONCLUSIONS: We demonstrate that the choice of reference genome is important when comparing heterozygosity estimates across species and those inferred from different references should not be compared to each other. In addition, estimates of heterozygosity or the amount or length of runs of homozygosity should not be taken as reflective of a species, as these can differ substantially among individuals. This high-quality genome will greatly aid in the continuing research and conservation efforts for the lion, which is rapidly moving towards becoming a species in danger of extinction.

    View details for DOI 10.1186/s12915-019-0734-5

    View details for PubMedID 31915011

  • Fitness variation across subtle environmental perturbations reveals local modularity and global pleiotropy of adaptation. eLife Kinsler, G., Geiler-Samerotte, K., Petrov, D. A. 2020; 9


    Building a genotype-phenotype-fitness map of adaptation is a central goal in evolutionary biology. It is difficult even when adaptive mutations are known because it is hard to enumerate which phenotypes make these mutations adaptive. We address this problem by first quantifying how the fitness of hundreds of adaptive yeast mutants responds to subtle environmental shifts. We then model the number of phenotypes these mutations collectively influence by decomposing these patterns of fitness variation. We find that a small number of inferred phenotypes can predict fitness of the adaptive mutations near their original glucose-limited evolution condition. Importantly, inferred phenotypes that matter little to fitness at or near the evolution condition can matter strongly in distant environments. This suggests that adaptive mutations are locally modular (affecting a small number of phenotypes that matter to fitness in the environment where they evolved) yet globally pleiotropic (affecting additional phenotypes that may reduce or improve fitness in new environments).

    View details for DOI 10.7554/eLife.61271

    View details for PubMedID 33263280

  • Accurate Allele Frequencies from Ultra-low Coverage Pool-Seq Samples in Evolve-and-Resequence Experiments. G3 (Bethesda, Md.) Tilk, S., Bergland, A., Goodman, A., Schmidt, P., Petrov, D., Greenblum, S. 2019


    Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally-pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.

    View details for DOI 10.1534/g3.119.400755

    View details for PubMedID 31636085
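
    The final step of a haplotype-derived frequency estimate, as described in the abstract above, can be sketched as a weighted sum over founders. This is not HAF-pipe's implementation: it assumes founder haplotype frequencies for a window have already been inferred by some upstream tool, that founders are homozygous (alleles coded 0 or 1), and the function and variable names are hypothetical.

```python
def haplotype_derived_frequencies(hap_freqs, founder_alleles):
    """Estimate per-SNP allele frequencies in one genomic window.

    hap_freqs:       dict founder -> inferred haplotype frequency in the pool
                     (assumed to come from an upstream inference step).
    founder_alleles: dict founder -> list of 0/1 alleles, one per SNP,
                     for homozygous sequenced founder strains.
    Returns a list with the estimated alternate-allele frequency per SNP.
    """
    total = sum(hap_freqs.values())
    if total <= 0:
        raise ValueError("haplotype frequencies must sum to a positive value")
    n_snps = len(next(iter(founder_alleles.values())))
    freqs = [0.0] * n_snps
    for founder, weight in hap_freqs.items():
        alleles = founder_alleles[founder]
        for i, allele in enumerate(alleles):
            # Each founder contributes its allele weighted by its
            # (normalized) haplotype frequency in the window.
            freqs[i] += (weight / total) * allele
    return freqs


# Made-up example: three founders, two SNPs in the window.
hap_freqs = {"f1": 0.5, "f2": 0.3, "f3": 0.2}
founder_alleles = {"f1": [1, 0], "f2": [1, 1], "f3": [0, 1]}
print(haplotype_derived_frequencies(hap_freqs, founder_alleles))
```

    Because every SNP in the window borrows information from the same few haplotype frequencies, the estimate does not depend on the read depth at any single site, which is the intuition behind using shallow sequencing.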

  • Single nucleotide mapping of trait space reveals Pareto fronts that constrain adaptation. Nature ecology & evolution Li, Y., Petrov, D. A., Sherlock, G. 2019


    Trade-offs constrain the improvement of performance of multiple traits simultaneously. Such trade-offs define Pareto fronts, which represent a set of optimal individuals that cannot be improved in any one trait without reducing performance in another. Surprisingly, experimental evolution often yields genotypes with improved performance in all measured traits, perhaps indicating an absence of trade-offs at least in the short term. Here we densely sample adaptive mutations in Saccharomyces cerevisiae to ask whether first-step adaptive mutations result in trade-offs during the growth cycle. We isolated thousands of adaptive clones evolved under carefully chosen conditions and quantified their performances in each part of the growth cycle. We too find that some first-step adaptive mutations can improve all traits to a modest extent. However, our dense sampling allowed us to identify trade-offs and establish the existence of Pareto fronts between fermentation and respiration, and between respiration and stationary phases. Moreover, we establish that no single mutation in the ancestral genome can circumvent the detected trade-offs. Finally, we sequenced hundreds of these adaptive clones, revealing new targets of adaptation and defining the genetic basis of the identified trade-offs.

    View details for DOI 10.1038/s41559-019-0993-0

    View details for PubMedID 31611676
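
    The definition of a Pareto front given in the abstract above can be made concrete with a short sketch. It assumes each clone is summarized by a tuple of trait performances where higher is better; the function name and the numbers are hypothetical and are not data from the study.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    A point q dominates p if q is at least as good as p in every trait
    and strictly better in at least one (higher values are better).
    """
    def dominates(q, p):
        return (all(x >= y for x, y in zip(q, p))
                and any(x > y for x, y in zip(q, p)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up performances in two traits for five hypothetical clones.
clones = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.5, 1.5), (0.5, 0.5)]
print(pareto_front(clones))
```

    Clones on the front cannot be improved in one trait without losing performance in the other, whereas the remaining clones are outperformed in both traits by at least one clone on the front.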

  • Microbiome composition shapes rapid genomic adaptation of Drosophila melanogaster. Proceedings of the National Academy of Sciences of the United States of America Rudman, S. M., Greenblum, S., Hughes, R. C., Rajpurohit, S., Kiratli, O., Lowder, D. B., Lemmon, S. G., Petrov, D. A., Chaston, J. M., Schmidt, P. 2019


    Population genomic data has revealed patterns of genetic variation associated with adaptation in many taxa. Yet understanding the adaptive process that drives such patterns is challenging; it requires disentangling the ecological agents of selection, determining the relevant timescales over which evolution occurs, and elucidating the genetic architecture of adaptation. Doing so for the adaptation of hosts to their microbiome is of particular interest with growing recognition of the importance and complexity of host-microbe interactions. Here, we track the pace and genomic architecture of adaptation to an experimental microbiome manipulation in replicate populations of Drosophila melanogaster in field mesocosms. Shifts in microbiome composition altered population dynamics and led to divergence between treatments in allele frequencies, with regions showing strong divergence found on all chromosomes. Moreover, at divergent loci previously associated with adaptation across natural populations, we found that the more common allele in fly populations experimentally enriched for a certain microbial group was also more common in natural populations with high relative abundance of that microbial group. These results suggest that microbiomes may be an agent of selection that shapes the pattern and process of adaptation and, more broadly, that variation in a single ecological factor within a complex environment can drive rapid, polygenic adaptation over short timescales.

    View details for DOI 10.1073/pnas.1907787116

    View details for PubMedID 31527278

  • Evolutionary Dynamics in Structured Populations Under Strong Population Genetic Forces. G3 (Bethesda, Md.) Feder, A. F., Pennings, P. S., Hermisson, J., Petrov, D. A. 2019


    In the long-term neutral equilibrium, high rates of migration between subpopulations result in little population differentiation. However, in the short term, even very abundant migration may not be enough for subpopulations to equilibrate immediately. In this study, we investigate dynamical patterns of short-term population differentiation in adapting populations via stochastic and analytical modeling through time. We characterize a regime in which selection and migration interact to create non-monotonic patterns of population differentiation over time when migration is weaker than selection, but stronger than drift. We demonstrate how these patterns can be leveraged to estimate high migration rates using approximate Bayesian computation. We apply this approach to estimate fast migration in a rapidly adapting intra-host Simian-HIV population sampled from different anatomical locations. We find differences in estimated migration rates between different compartments, even though all are above N_e m = 1. This work demonstrates how studying demographic processes on the timescale of selective sweeps illuminates processes too fast to leave signatures on neutral timescales.

    View details for DOI 10.1534/g3.119.400605

    View details for PubMedID 31462443

  • Exploiting selection at linked sites to infer the rate and strength of adaptation NATURE ECOLOGY & EVOLUTION Uricchio, L. H., Petrov, D. A., Enard, D. 2019; 3 (6): 977–84
  • Empowering conservation practice with efficient and economical genotyping from poor quality samples. Methods in ecology and evolution Natesh, M., Taylor, R. W., Truelove, N. K., Hadly, E. A., Palumbi, S. R., Petrov, D. A., Ramakrishnan, U. 2019; 10 (6): 853-859


    Moderate- to high-density genotyping (100+ SNPs) is widely used to determine and measure individual identity, relatedness, fitness, population structure and migration in wild populations. However, these important tools are difficult to apply when high-quality genetic material is unavailable. Most genomic tools are developed for high-quality DNA sources from laboratory or medical settings. As a result, most genetic data from market or field settings is limited to easily amplified mitochondrial DNA or a few microsatellites. To enable genotyping in conservation contexts, we used next-generation sequencing of multiplex PCR products from very low-quality DNA extracted from faeces, hair and cooked samples. We demonstrated utility and wide-ranging potential application in endangered wild tigers and tracking commercial trade in Caribbean queen conch. We genotyped 100 SNPs from degraded tiger samples to identify individuals, discern close relatives and detect population differentiation. Co-occurring carnivores do not amplify (e.g. Indian wild dog/dhole) or are monomorphic (e.g. leopard). Sixty-two SNPs from conch fritters and field-collected samples were used to test relatedness and detect population structure. We provide proof of concept for a rapid, simple, cost-effective and scalable method (for both samples and number of loci), a framework that can be applied to other conservation scenarios previously limited by low-quality DNA samples. These approaches provide a critical advance for wildlife monitoring and forensics, open the door to field-ready testing, and will strengthen the use of science in policy decisions and wildlife trade.

    View details for DOI 10.1111/2041-210X.13173

    View details for PubMedID 31511786

    View details for PubMedCentralID PMC6738957

  • Exploiting selection at linked sites to infer the rate and strength of adaptation. Nature ecology & evolution Uricchio, L. H., Petrov, D. A., Enard, D. 2019


    Genomic data encode past evolutionary events and have the potential to reveal the strength, rate and biological drivers of adaptation. However, joint estimation of adaptation rate (alpha) and adaptation strength remains challenging because evolutionary processes such as demography, linkage and non-neutral polymorphism can confound inference. Here, we exploit the influence of background selection to reduce the fixation rate of weakly beneficial alleles to jointly infer the strength and rate of adaptation. We develop a McDonald-Kreitman-based method to infer adaptation rate and strength, and estimate alpha=0.135 in human protein-coding sequences, 72% of which is contributed by weakly adaptive variants. We show that, in this adaptation regime, alpha is reduced ~25% by linkage genome-wide. Moreover, we show that virus-interacting proteins undergo adaptation that is both stronger and nearly twice as frequent as the genome average (alpha=0.224, 56% due to strongly beneficial alleles). Our results suggest that, while most adaptation in human proteins is weakly beneficial, adaptation to viruses is often strongly beneficial. Our method provides a robust framework for estimation of adaptation rate and strength across species.

    View details for PubMedID 31061475

  • Cost-effective assembly of the African wild dog (Lycaon pictus) genome using linked reads GIGASCIENCE Armstrong, E. E., Taylor, R. W., Prost, S., Blinston, P., van der Meer, E., Madzikanda, H., Mufute, O., Mandisodza-Chikerema, R., Stuelpnagel, J., Sillero-Zubiri, C., Petrov, D. 2019; 8 (2)
  • Stress response, behavior, and development are shaped by transposable element-induced mutations in Drosophila. PLoS genetics Rech, G. E., Bogaerts-Marquez, M., Barron, M. G., Merenciano, M., Villanueva-Canas, J. L., Horvath, V., Fiston-Lavier, A., Luyten, I., Venkataram, S., Quesneville, H., Petrov, D. A., Gonzalez, J. 2019; 15 (2): e1007900


    Most of the current knowledge on the genetic basis of adaptive evolution is based on the analysis of single nucleotide polymorphisms (SNPs). Despite increasing evidence for their causal role, the contribution of structural variants to adaptive evolution remains largely unexplored. In this work, we analyzed the population frequencies of 1,615 Transposable Element (TE) insertions annotated in the reference genome of Drosophila melanogaster, in 91 samples from 60 worldwide natural populations. We identified a set of 300 polymorphic TEs that are present at high population frequencies, and located in genomic regions with high recombination rate, where the efficiency of natural selection is high. The age and the length of these 300 TEs are consistent with relatively young and long insertions reaching high frequencies due to the action of positive selection. Besides, we identified a set of 21 fixed TEs also likely to be adaptive. Indeed, we, and others, found evidence of selection for 84 of these reference TE insertions. The analysis of the genes located nearby these 84 candidate adaptive insertions suggested that the functional response to selection is related with the GO categories of response to stimulus, behavior, and development. We further showed that a subset of the candidate adaptive TEs affects expression of nearby genes, and five of them have already been linked to an ecologically relevant phenotypic effect. Our results provide a more complete understanding of the genetic variation and the fitness-related traits relevant for adaptive evolution. Similar studies should help uncover the importance of TE-induced adaptive mutations in other species as well.

    View details for PubMedID 30753202

  • Pervasive Strong Selection at the Level of Codon Usage Bias in Drosophila melanogaster. Genetics Machado, H. E., Lawrie, D. S., Petrov, D. A. 2019


    Codon usage bias (CUB), where certain codons are used more frequently than expected by chance, is a ubiquitous phenomenon and occurs across the tree of life. The dominant paradigm is that the proportion of preferred codons is set by weak selection. While experimental changes in codon usage have at times shown large phenotypic effects in contrast to this paradigm, genome-wide population genetic estimates have supported the weak selection model. Here we use deep genomic population sequencing of two Drosophila melanogaster populations to measure selection on synonymous sites in a way that allowed us to estimate the prevalence of both weak and strong purifying selection. We find that selection in favor of preferred codons ranges from weak (|Nes| ∼ 1) to strong (|Nes| > 10), with strong selection acting on 10-20% of synonymous sites in preferred codons. While previous studies indicated that selection at synonymous sites could be strong, this is the first study to detect and quantify strong selection specifically at the level of CUB. Further, we find that CUB-associated polymorphism accounts for the majority of strong selection on synonymous sites, with secondary contributions of splicing (selection on alternatively spliced genes, splice junctions and spliceosome-bound sites) and transcription factor binding. Our findings support a new model of CUB and indicate that the functional importance of CUB, as well as synonymous sites in general, have been underestimated.

    View details for DOI 10.1534/genetics.119.302542

    View details for PubMedID 31871131

  • MACHINE LEARNING ANALYSIS OF ULTRA-DEEP WHOLE-GENOME SEQUENCING IN HUMAN BRAIN REVEALS SOMATIC GENOMIC RETROTRANSPOSITION IN GLIA AS WELL AS IN NEURONS Urban, A., Zhu, X., Zhou, B., Sloan, S., Pattni, R., Fiston-Lavier, A., Snyder, M., Petrov, D., Abyzov, A., Vaccarino, F., Barres, B., Vogel, H., Tamminga, C., Levinson, D. ELSEVIER. 2019: 1240
  • Tissue-Specific cis-Regulatory Divergence Implicates eloF in Inhibiting Interspecies Mating in Drosophila CURRENT BIOLOGY Combs, P. A., Krupp, J. J., Khosla, N. M., Bua, D., Petrov, D. A., Levine, J. D., Fraser, H. B. 2018; 28 (24): 3969-+
  • Tissue-Specific cis-Regulatory Divergence Implicates eloF in Inhibiting Interspecies Mating in Drosophila. Current biology : CB Combs, P. A., Krupp, J. J., Khosla, N. M., Bua, D., Petrov, D. A., Levine, J. D., Fraser, H. B. 2018


    Reproductive isolation is a key component of speciation. In many insects, a major driver of this isolation is cuticular hydrocarbon pheromones, which help to identify potential intraspecific mates [1-3]. When the distributions of related species overlap, there may be strong selection on mate choice for intraspecific partners [4-9] because interspecific hybridization carries significant fitness costs [10]. Drosophila has been a key model for the investigation of reproductive isolation; although both male and female mate choices have been extensively investigated [6,11-16], the genes underlying species recognition remain largely unknown. To explore the molecular mechanisms underlying Drosophila speciation, we measured tissue-specific cis-regulatory divergence using RNA sequencing (RNA-seq) in D. simulans × D. sechellia hybrids. By focusing on cis-regulatory changes specific to female oenocytes, the tissue that produces cuticular hydrocarbons, we rapidly identified a small number of candidate genes. We found that one of these, the fatty acid elongase eloF, broadly affects the hydrocarbons present on D. sechellia and D. melanogaster females, as well as the propensity of D. simulans males to mate with them. Therefore, cis-regulatory changes in eloF may be a major driver in the sexual isolation of D. simulans from multiple other species. Our RNA-seq approach proved to be far more efficient than quantitative trait locus (QTL) mapping in identifying candidate genes; the same framework can be used to pinpoint candidate drivers of cis-regulatory divergence in traits differing between any interfertile species.

    View details for PubMedID 30503619

  • Cost-effective assembly of the African wild dog (Lycaon pictus) genome using linked reads. GigaScience Armstrong, E. E., Taylor, R. W., Prost, S., Blinston, P., van der Meer, E., Madzikanda, H., Mufute, O., Mandisodza-Chikerema, R., Stuelpnagel, J., Sillero-Zubiri, C., Petrov, D. 2018


    Background: A high-quality reference genome assembly is a valuable tool for the study of non-model organisms. Genomic techniques can provide important insights about past population sizes, local adaptation, and aid in the development of breeding management plans. This information is important for fields like conservation genetics, where endangered species require critical and immediate attention. However, funding for genomic-based methods can be sparse for conservation projects, as costs for general species management can consume budgets. Findings: Here we report the generation of high-quality reference genomes for the African wild dog (Lycaon pictus) at a low cost (< $3000), thereby facilitating future studies of this endangered canid. We generated assemblies for three individuals using the linked-read 10x Genomics Chromium system. The most continuous assembly had a scaffold and contig N50 of 21 Mb and 83 Kb, respectively, and completely reconstructed 95% of a set of conserved mammalian genes. Additionally, we estimate the heterozygosity and demographic history of African wild dogs, revealing that although they have historically low effective population sizes, heterozygosity remains high. Conclusions: We show that 10x Genomics Chromium data can be used to effectively generate high-quality genomes from Illumina short-read data of intermediate coverage (25-50x). Interestingly, the wild dog shows higher heterozygosity than other species of conservation concern, possibly due to its behavioral ecology. The availability of reference genomes for non-model organisms will facilitate better genetic monitoring of threatened species such as the African wild dog and help conservationists to better understand the ecology and adaptability of those species in a changing environment.

    View details for PubMedID 30346553

  • Evidence that RNA Viruses Drove Adaptive Introgression between Neanderthals and Modern Humans CELL Enard, D., Petrov, D. A. 2018; 175 (2): 360-+


    Neanderthals and modern humans interbred at least twice in the past 100,000 years. While there is evidence that most introgressed DNA segments from Neanderthals to modern humans were removed by purifying selection, less is known about the adaptive nature of introgressed sequences that were retained. We hypothesized that interbreeding between Neanderthals and modern humans led to (1) the exposure of each species to novel viruses and (2) the exchange of adaptive alleles that provided resistance against these viruses. Here, we find that long, frequent (and more likely adaptive) segments of Neanderthal ancestry in modern humans are enriched for proteins that interact with viruses (VIPs). We found that VIPs that interact specifically with RNA viruses were more likely to belong to introgressed segments in modern Europeans. Our results show that retained segments of Neanderthal ancestry can be used to detect ancient epidemics.

    View details for PubMedID 30290142

    View details for PubMedCentralID PMC6176737

  • Spatiotemporal dynamics and genome-wide association analysis of desiccation tolerance in Drosophila melanogaster MOLECULAR ECOLOGY Rajpurohit, S., Gefen, E., Bergland, A. O., Petrov, D. A., Gibbs, A. G., Schmidt, P. S. 2018; 27 (17): 3525–40


    Water availability is a major environmental challenge to a variety of terrestrial organisms. In insects, desiccation tolerance varies predictably over spatial and temporal scales and is an important physiological determinant of fitness in natural populations. Here, we examine the dynamics of desiccation tolerance in North American populations of Drosophila melanogaster using: (a) natural populations sampled across latitudes and seasons; (b) experimental evolution in field mesocosms over seasonal time; (c) genome-wide associations to identify SNPs/genes associated with variation for desiccation tolerance; and (d) subsequent analysis of patterns of clinal/seasonal enrichment in existing pooled sequencing data of populations sampled in both North America and Australia. A cline in desiccation tolerance was observed, for which tolerance exhibited a positive association with latitude; tolerance also varied predictably with culture temperature, demonstrating a significant degree of thermal plasticity. Desiccation tolerance evolved rapidly in field mesocosms, although only males showed differences in desiccation tolerance between spring and autumn collections from natural populations. Water loss rates did not vary significantly among latitudinal or seasonal populations; however, changes in metabolic rates during prolonged exposure to dry conditions are consistent with increased tolerance in higher latitude populations. Genome-wide associations in a panel of inbred lines identified twenty-five SNPs in twenty-one loci associated with sex-averaged desiccation tolerance, but there is no robust signal of spatially varying selection on genes associated with desiccation tolerance. Together, our results suggest that desiccation tolerance is a complex and important fitness component that evolves rapidly and predictably in natural populations.

    View details for PubMedID 30051644

  • Functional lung cancer genomics through in vivo genome editing Winters, I. P., Rogers, Z. N., McFarland, C. D., Lalgudi, P. V., Chiou, S., Kay, M. A., Petrov, D., Winslow, M. M. AMER ASSOC CANCER RESEARCH. 2018
  • Tripolar chromosome segregation drives the association between maternal genotype at variants spanning PLK4 and aneuploidy in human preimplantation embryos HUMAN MOLECULAR GENETICS McCoy, R. C., Newnham, L. J., Ottolini, C. S., Hoffmann, E. R., Chatzimeletiou, K., Cornejo, O. E., Zhan, Q., Zaninovic, N., Rosenwaks, Z., Petrov, D. A., Demko, Z. P., Sigurjonsson, S., Handyside, A. H. 2018; 27 (14): 2573–85


    Aneuploidy is prevalent in human embryos and is the leading cause of pregnancy loss. Many aneuploidies arise during oogenesis, increasing with maternal age. Superimposed on these meiotic aneuploidies are frequent errors occurring during early mitotic divisions, contributing to widespread chromosomal mosaicism. Here we reanalyzed a published dataset comprising preimplantation genetic testing for aneuploidy in 24 653 blastomere biopsies from day-3 cleavage-stage embryos, as well as 17 051 trophectoderm biopsies from day-5 blastocysts. We focused on complex abnormalities that affected multiple chromosomes simultaneously, seeking insights into their formation. In addition to well-described patterns such as triploidy and haploidy, we identified 4.7% of blastomeres possessing characteristic hypodiploid karyotypes. We inferred this signature to have arisen from tripolar chromosome segregation in normally fertilized diploid zygotes or their descendant diploid cells. This could occur via segregation on a tripolar mitotic spindle or by rapid sequential bipolar mitoses without an intervening S-phase. Both models are consistent with time-lapse data from an intersecting set of 77 cleavage-stage embryos, which were enriched for the tripolar signature among embryos exhibiting abnormal cleavage. The tripolar signature was strongly associated with common maternal genetic variants spanning the centrosomal regulator PLK4, driving the association we previously reported with overall mitotic errors. Our findings are consistent with the known capacity of PLK4 to induce tripolar mitosis or precocious M-phase upon dysregulation. Together, our data support tripolar chromosome segregation as a key mechanism generating complex aneuploidy in cleavage-stage embryos and implicate maternal genotype at a quantitative trait locus spanning PLK4 as a factor influencing its occurrence.

    View details for PubMedID 29688390

    View details for PubMedCentralID PMC6030883

  • Quantitative and multiplex analysis of the genomic determinants of tumorigenesis. Winters, I., Rogers, Z., McFarland, C., Petrov, D., Winslow, M. M. AMER ASSOC CANCER RESEARCH. 2018: 15–16
  • Mapping the in vivo fitness landscape of lung adenocarcinoma tumor suppression in mice NATURE GENETICS Rogers, Z. N., McFarland, C. D., Winters, I. P., Seoane, J. A., Brady, J. J., Yoon, S., Curtis, C., Petrov, D. A., Winslow, M. M. 2018; 50 (4): 483-+


    The functional impact of most genomic alterations found in cancer, alone or in combination, remains largely unknown. Here we integrate tumor barcoding, CRISPR/Cas9-mediated genome editing and ultra-deep barcode sequencing to interrogate pairwise combinations of tumor suppressor alterations in autochthonous mouse models of human lung adenocarcinoma. We map the tumor suppressive effects of 31 common lung adenocarcinoma genotypes and identify a landscape of context dependence and differential effect strengths.

    View details for PubMedID 29610476

  • Hidden Complexity of Yeast Adaptation under Simple Evolutionary Conditions CURRENT BIOLOGY Li, Y., Venkataram, S., Agarwala, A., Dunn, B., Petrov, D. A., Sherlock, G., Fisher, D. S. 2018; 28 (4): 515-+


    Few studies have "quantitatively" probed how adaptive mutations result in increased fitness. Even in simple microbial evolution experiments, with full knowledge of the underlying mutations and specific growth conditions, it is challenging to determine where within a growth-saturation cycle those fitness gains occur. A common implicit assumption is that most benefits derive from an increased exponential growth rate. Here, we instead show that, in batch serial transfer experiments, adaptive mutants' fitness gains can be dominated by benefits that are accrued in one growth cycle, but not realized until the next growth cycle. For thousands of evolved clones (most with only a single mutation), we systematically varied the lengths of fermentation, respiration, and stationary phases to assess how their fitness, as measured by barcode sequencing, depends on these phases of the growth-saturation-dilution cycles. These data revealed that, whereas all adaptive lineages gained similar and modest benefits from fermentation, most of the benefits for the highest fitness mutants came instead from the time spent in respiration. From monoculture and high-resolution pairwise fitness competition experiments for a dozen of these clones, we determined that the benefits "accrued" during respiration are only largely "realized" later as a shorter duration of lag phase in the following growth cycle. These results reveal hidden complexities of the adaptive process even under ostensibly simple evolutionary conditions, in which fitness gains can accrue during time spent in a growth phase with little cell division, and reveal that the memory of those gains can be realized in the subsequent growth cycle.

    View details for PubMedID 29429618

    View details for PubMedCentralID PMC5823527

  • Rapid seasonal evolution in innate immunity of wild Drosophila melanogaster PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES Behrman, E. L., Howick, V. M., Kapun, M., Staubach, F., Bergland, A. O., Petrov, D. A., Lazzaro, B. P., Schmidt, P. S. 2018; 285 (1870)


    Understanding the rate of evolutionary change and the genetic architecture that facilitates rapid adaptation is a current challenge in evolutionary biology. Comparative studies show that genes with immune function are among the most rapidly evolving genes across a range of taxa. Here, we use immune defence in natural populations of Drosophila melanogaster to understand the rate of evolution in natural populations and the genetics underlying rapid change. We probed the immune system using the natural pathogens Enterococcus faecalis and Providencia rettgeri to measure post-infection survival and bacterial load of wild D. melanogaster populations collected across seasonal time along a latitudinal transect along eastern North America (Massachusetts, Pennsylvania and Virginia). There are pronounced and repeatable changes in the immune response over the approximately 10 generations between spring and autumn collections, with a significant but less distinct difference observed among geographical locations. Genes with known immune function are not enriched among alleles that cycle with seasonal time, but the immune function of a subset of seasonally cycling alleles in immune genes was tested using reconstructed outbred populations. We find that flies containing seasonal alleles in Thioester-containing protein 3 (Tep3) have different functional responses to infection and that epistatic interactions among seasonal Tep3 and Drosomycin-like 6 (Dro6) alleles underlie the immune phenotypes observed in natural populations. This rapid, cyclic response to seasonal environmental pressure broadens our understanding of the complex ecological and genetic interactions determining the evolution of immune defence in natural populations.

    View details for PubMedID 29321302

    View details for PubMedCentralID PMC5784205

  • Seasonally fluctuating selection can maintain polymorphism at many loci via segregation lift PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wittmann, M. J., Bergland, A. O., Feldman, M. W., Schmidt, P. S., Petrov, D. A. 2017; 114 (46): E9932–E9941


    Most natural populations are affected by seasonal changes in temperature, rainfall, or resource availability. Seasonally fluctuating selection could potentially make a large contribution to maintaining genetic polymorphism in populations. However, previous theory suggests that the conditions for multilocus polymorphism are restrictive. Here, we explore a more general class of models with multilocus seasonally fluctuating selection in diploids. In these models, the multilocus genotype is mapped to fitness in two steps. The first mapping is additive across loci and accounts for the relative contributions of heterozygous and homozygous loci, that is, dominance. The second step uses a nonlinear fitness function to account for the strength of selection and epistasis. Using mathematical analysis and individual-based simulations, we show that stable polymorphism at many loci is possible if currently favored alleles are sufficiently dominant. This general mechanism, which we call "segregation lift," requires seasonal changes in dominance, a phenomenon that may arise naturally in situations with antagonistic pleiotropy and seasonal changes in the relative importance of traits for fitness. Segregation lift works best under diminishing-returns epistasis, is not affected by problems of genetic load, and is robust to differences in parameters across loci and seasons. Under segregation lift, loci can exhibit conspicuous seasonal allele-frequency fluctuations, but often fluctuations may be small and hard to detect. An important direction for future work is to formally test for segregation lift in empirical data and to quantify its contribution to maintaining genetic variation in natural populations.

    View details for PubMedID 29087300

  • High rate of adaptation of mammalian proteins that interact with Plasmodium and related parasites PLOS GENETICS Ebel, E. R., Telis, N., Venkataram, S., Petrov, D. A., Enard, D. 2017; 13 (9): e1007023


    Plasmodium parasites, along with their Piroplasm relatives, have caused malaria-like illnesses in terrestrial mammals for millions of years. Several Plasmodium-protective alleles have recently evolved in human populations, but little is known about host adaptation to blood parasites over deeper evolutionary timescales. In this work, we analyze mammalian adaptation in ~500 Plasmodium- or Piroplasm-interacting proteins (PPIPs) manually curated from the scientific literature. We show that (i) PPIPs are enriched for both immune functions and pleiotropy with other pathogens, and (ii) the rate of adaptation across mammals is significantly elevated in PPIPs, compared to carefully matched control proteins. PPIPs with high pathogen pleiotropy show the strongest signatures of adaptation, but this pattern is fully explained by their immune enrichment. Several pieces of evidence suggest that blood parasites specifically have imposed selection on PPIPs. First, even non-immune PPIPs that lack interactions with other pathogens have adapted at twice the rate of matched controls. Second, PPIP adaptation is linked to high expression in the liver, a critical organ in the parasite life cycle. Finally, our detailed investigation of alpha-spectrin, a major red blood cell membrane protein, shows that domains with particularly high rates of adaptation are those known to interact specifically with P. falciparum. Overall, we show that host proteins that interact with Plasmodium and Piroplasm parasites have experienced elevated rates of adaptation across mammals, and provide evidence that some of this adaptation has likely been driven by blood parasites.

    View details for PubMedID 28957326

  • A quantitative and multiplexed approach to uncover the fitness landscape of tumor suppression in vivo. Nature methods Rogers, Z. N., McFarland, C. D., Winters, I. P., Naranjo, S., Chuang, C., Petrov, D., Winslow, M. M. 2017


    Cancer growth is a multistage, stochastic evolutionary process. While cancer genome sequencing has been instrumental in identifying the genomic alterations that occur in human tumors, the consequences of these alterations on tumor growth remain largely unexplored. Conventional genetically engineered mouse models enable the study of tumor growth in vivo, but they are neither readily scalable nor sufficiently quantitative to unravel the magnitude and mode of action of many tumor-suppressor genes. Here, we present a method that integrates tumor barcoding with ultradeep barcode sequencing (Tuba-seq) to interrogate tumor-suppressor function in mouse models of human cancer. Tuba-seq uncovers genotype-dependent distributions of tumor sizes. By combining Tuba-seq with multiplexed CRISPR-Cas9-mediated genome editing, we quantified the effects of 11 tumor-suppressor pathways that are frequently altered in human lung adenocarcinoma. Tuba-seq enables the broad quantification of the function of tumor-suppressor genes with unprecedented resolution, parallelization, and precision.

    View details for DOI 10.1038/nmeth.4297

    View details for PubMedID 28530655

  • A spatio-temporal assessment of simian/human immunodeficiency virus (SHIV) evolution reveals a highly dynamic process within the host. PLoS pathogens Feder, A. F., Kline, C., Polacino, P., Cottrell, M., Kashuba, A. D., Keele, B. F., Hu, S., Petrov, D. A., Pennings, P. S., Ambrose, Z. 2017; 13 (5)


    The process by which drug-resistant HIV-1 arises and spreads spatially within an infected individual is poorly understood. Studies have found variable results relating how HIV-1 in the blood differs from virus sampled in tissues, offering conflicting findings about whether HIV-1 throughout the body is homogeneously distributed. However, most of these studies sample only two compartments and few have data from multiple time points. To directly measure how drug resistance spreads within a host and to assess how spatial structure impacts its emergence, we examined serial sequences from four macaques infected with RT-SHIVmne027, a simian immunodeficiency virus encoding HIV-1 reverse transcriptase (RT), and treated with RT inhibitors. Both viral DNA and RNA (vDNA and vRNA) were isolated from the blood (including plasma and peripheral blood mononuclear cells), lymph nodes, gut, and vagina at a median of four time points and RT was characterized via single-genome sequencing. The resulting sequences reveal a dynamic system in which vRNA rapidly acquires drug resistance concomitantly across compartments through multiple independent mutations. Fast migration results in the same viral genotypes present across compartments, but not so fast as to equilibrate their frequencies immediately. The blood and lymph nodes were found to be compartmentalized rarely, while both the blood and lymph node were more frequently different from mucosal tissues. This study suggests that even oft-sampled blood does not fully capture the viral dynamics in other parts of the body, especially the gut where vRNA turnover was faster than the plasma and vDNA retained fewer wild-type viruses than other sampled compartments. Our findings of transient compartmentalization across multiple tissues may help explain the varied results of previous compartmentalization studies in HIV-1.

    View details for DOI 10.1371/journal.ppat.1006358

    View details for PubMedID 28542550

  • Soft Selective Sweeps in Evolutionary Rescue. Genetics Wilson, B. A., Pennings, P. S., Petrov, D. A. 2017


    Evolutionary rescue occurs when a population that is declining in size because of an environmental change is rescued from extinction by genetic adaptation. Evolutionary rescue is an important phenomenon at the intersection of ecology and population genetics, and the study of evolutionary rescue is critical to understanding processes ranging from species conservation to the evolution of drug and pesticide resistance. While most population-genetic models of evolutionary rescue focus on estimating the probability of rescue, we focus on whether one or more adaptive lineages contribute to evolutionary rescue. We find that when evolutionary rescue is likely, it is often driven by soft selective sweeps where multiple adaptive mutations spread through the population simultaneously. We give full analytic results for the probability of evolutionary rescue and the probability that evolutionary rescue occurs via soft selective sweeps. We expect that these results will find utility in understanding the genetic signatures associated with various evolutionary rescue scenarios in large populations, such as the evolution of drug resistance in viral, bacterial, or eukaryotic pathogens.

    View details for DOI 10.1534/genetics.116.191478

    View details for PubMedID 28213477

    View details for PubMedCentralID PMC5378114

  • Seeking Goldilocks During Evolution of Drug Resistance. PLoS biology Sherlock, G., Petrov, D. A. 2017; 15 (2)


    Speciation can occur when a population is split and the resulting subpopulations evolve independently, accumulating mutations over time that make them incompatible with one another. It is thought that such incompatible mutations, known as Bateson-Dobzhansky-Muller (BDM) incompatibilities, may arise when the two populations face different environments, which impose different selective pressures. However, a new study in PLOS Biology by Ono et al. finds that the first-step mutations selected in yeast populations evolving in parallel in the presence of the antifungal drug nystatin are frequently incompatible with one another. This incompatibility is environment dependent, such that the combination of two incompatible alleles can become advantageous under increasing drug concentrations. This suggests that the activity for the affected pathway must have an optimum level, the value of which varies according to the drug concentration. It is likely that many biological processes similarly have an optimum under a given environment and many single-step adaptive ways to reach it; thus, not only should BDM incompatibilities commonly arise during parallel evolution, they might be virtually inevitable, as the combination of two such steps is likely to overshoot the optimum.

    View details for DOI 10.1371/journal.pbio.2001872

    View details for PubMedID 28158184

    View details for PubMedCentralID PMC5291373

  • Extremely Rare Polymorphisms in Saccharomyces cerevisiae Allow Inference of the Mutational Spectrum. PLoS genetics Zhu, Y. O., Sherlock, G., Petrov, D. A. 2017; 13 (1)


    The characterization of mutational spectra is usually carried out in one of three ways: by direct observation through mutation accumulation (MA) experiments, through parent-offspring sequencing, or by indirect inference from sequence data. Direct observations of spontaneous mutations with MA experiments are limited, given (i) the rarity of spontaneous mutations, (ii) applicability only to laboratory model species with short generation times, and (iii) the possibility that mutational spectra under lab conditions might be different from those observed in nature. Trio sequencing is an elegant solution, but it is not applicable in all organisms. Indirect inference, usually from divergence data, faces no such technical limitations, but relies upon critical assumptions regarding the strength of natural selection that are likely to be violated. Ideally, new mutational events would be directly observed before the biased filter of selection, and without the technical limitations common to lab experiments. One approach is to identify very young mutations from population sequencing data. Here we do so by leveraging two characteristics common to all new mutations: new mutations are necessarily rare in the population, and absent in the genomes of immediate relatives. From 132 clinical yeast strains, we were able to identify 1,425 putatively new mutations and show that they exhibit extremely low signatures of selection, as well as display a mutational spectrum that is similar to that identified by a large scale MA experiment. We verify that population sequencing data are a potential wealth of information for inferring mutational spectra, and should be considered for analysis where MA experiments are infeasible or especially tedious.

    View details for DOI 10.1371/journal.pgen.1006455

    View details for PubMedID 28046117

    View details for PubMedCentralID PMC5207638

  • Deep sequencing of natural and experimental populations of Drosophila melanogaster reveals biases in the spectrum of new mutations. Genome research Assaf, Z. J., Tilk, S., Park, J., Siegal, M. L., Petrov, D. A. 2017; 27 (12): 1988–2000


    Mutations provide the raw material of evolution, and thus our ability to study evolution depends fundamentally on having precise measurements of mutational rates and patterns. We generate a data set for this purpose using (1) de novo mutations from mutation accumulation experiments and (2) extremely rare polymorphisms from natural populations. The first, mutation accumulation (MA) lines are the product of maintaining flies in tiny populations for many generations, therefore rendering natural selection ineffective and allowing new mutations to accrue in the genome. The second, rare genetic variation from natural populations allows the study of mutation because extremely rare polymorphisms are relatively unaffected by the filter of natural selection. We use both methods in Drosophila melanogaster, first generating our own novel data set of sequenced MA lines and performing a meta-analysis of all published MA mutations (∼2000 events) and then identifying a high quality set of ∼70,000 extremely rare (≤0.1%) polymorphisms that are fully validated with resequencing. We use these data sets to precisely measure mutational rates and patterns. Highlights of our results include: a high rate of multinucleotide mutation events at both short (∼5 bp) and long (∼1 kb) genomic distances, showing that mutation drives GC content lower in already GC-poor regions, and using our precise context-dependent mutation rates to predict long-term evolutionary patterns at synonymous sites. We also show that de novo mutations from independent MA experiments display similar patterns of single nucleotide mutation and well match the patterns of mutation found in natural populations.

    View details for PubMedID 29079675

  • Multiplexed in vivo homology-directed repair and tumor barcoding enables parallel quantification of Kras variant oncogenicity. Nature communications Winters, I. P., Chiou, S. H., Paulk, N. K., McFarland, C. D., Lalgudi, P. V., Ma, R. K., Lisowski, L., Connolly, A. J., Petrov, D. A., Kay, M. A., Winslow, M. M. 2017; 8 (1): 2053


    Large-scale genomic analyses of human cancers have cataloged somatic point mutations thought to initiate tumor development and sustain cancer growth. However, determining the functional significance of specific alterations remains a major bottleneck in our understanding of the genetic determinants of cancer. Here, we present a platform that integrates multiplexed AAV/Cas9-mediated homology-directed repair (HDR) with DNA barcoding and high-throughput sequencing to simultaneously investigate multiple genomic alterations in de novo cancers in mice. Using this approach, we introduce a barcoded library of non-synonymous mutations into hotspot codons 12 and 13 of Kras in adult somatic cells to initiate tumors in the lung, pancreas, and muscle. High-throughput sequencing of barcoded KrasHDR alleles from bulk lung and pancreas reveals surprising diversity in Kras variant oncogenicity. Rapid, cost-effective, and quantitative approaches to simultaneously investigate the function of precise genomic alterations in vivo will help uncover novel biological and clinically actionable insights into carcinogenesis.

    View details for PubMedID 29233960

    View details for PubMedCentralID PMC5727199

  • Adaptive dynamics of cuticular hydrocarbons in Drosophila JOURNAL OF EVOLUTIONARY BIOLOGY Rajpurohit, S., Hanus, R., Vrkoslav, V., Behrman, E. L., Bergland, A. O., Petrov, D., Cvacka, J., Schmidt, P. S. 2017; 30 (1): 66-80

    View details for DOI 10.1111/jeb.12988

    View details for Web of Science ID 000394852200006

  • Adaptive dynamics of cuticular hydrocarbons in Drosophila. Journal of evolutionary biology Rajpurohit, S., Hanus, R., Vrkoslav, V., Behrman, E. L., Bergland, A. O., Petrov, D., Cvacka, J., Schmidt, P. S. 2016


    Cuticular hydrocarbons (CHCs) are hydrophobic compounds deposited on the arthropod cuticle that are of functional significance with respect to stress tolerance, social interactions and mating dynamics. We characterized CHC profiles in natural populations of Drosophila melanogaster at five levels: across a latitudinal transect in the eastern United States, as a function of developmental temperature during culture, across seasonal time in replicate years, and as a function of rapid evolution in experimental mesocosms in the field. Furthermore, we also characterized spatial and temporal changes in allele frequencies for SNPs in genes that are associated with the production and chemical profile of CHCs. Our data demonstrate a striking degree of parallelism for clinal and seasonal variation in CHCs in this taxon; CHC profiles also demonstrate significant plasticity in response to rearing temperature, and the observed patterns of plasticity parallel the spatiotemporal patterns observed in nature. We find that these congruent shifts in CHC profiles across time and space are also mirrored by predictable shifts in allele frequencies at SNPs associated with CHC chain length. Finally, we observed rapid and predictable evolution of CHC profiles in experimental mesocosms in the field. Together, these data strongly suggest that CHC profiles respond rapidly and adaptively to environmental parameters that covary with latitude and season, and that this response reflects the process of local adaptation in natural populations of D. melanogaster.

    View details for DOI 10.1111/jeb.12988

    View details for PubMedID 27718537

    View details for PubMedCentralID PMC5214518

  • Development of a Comprehensive Genotype-to-Fitness Map of Adaptation-Driving Mutations in Yeast. Cell Venkataram, S., Dunn, B., Li, Y., Agarwala, A., Chang, J., Ebel, E. R., Geiler-Samerotte, K., Hérissant, L., Blundell, J. R., Levy, S. F., Fisher, D. S., Sherlock, G., Petrov, D. A. 2016; 166 (6): 1585-1596.e22


    Adaptive evolution plays a large role in generating the phenotypic diversity observed in nature, yet current methods are impractical for characterizing the molecular basis and fitness effects of large numbers of individual adaptive mutations. Here, we used a DNA barcoding approach to generate the genotype-to-fitness map for adaptation-driving mutations from a Saccharomyces cerevisiae population experimentally evolved by serial transfer under limiting glucose. We isolated and measured the fitness of thousands of independent adaptive clones and sequenced the genomes of hundreds of clones. We found only two major classes of adaptive mutations: self-diploidization and mutations in the nutrient-responsive Ras/PKA and TOR/Sch9 pathways. Our large sample size and precision of measurement allowed us to determine that there are significant differences in fitness between mutations in different genes, between different paralogs, and even between different classes of mutations within the same gene.

    View details for DOI 10.1016/j.cell.2016.08.002

    View details for PubMedID 27594428

    View details for PubMedCentralID PMC5070919

  • An Intrinsically Disordered Region of the DNA Repair Protein Nbs1 Is a Species-Specific Barrier to Herpes Simplex Virus 1 in Primates. Cell host & microbe Lou, D. I., Kim, E. T., Meyerson, N. R., Pancholi, N. J., Mohni, K. N., Enard, D., Petrov, D. A., Weller, S. K., Weitzman, M. D., Sawyer, S. L. 2016; 20 (2): 178-188


    Humans occasionally transmit herpes simplex virus 1 (HSV-1) to captive primates, who reciprocally harbor alphaherpesviruses poised for zoonotic transmission to humans. To understand the basis for the species-specific restriction of HSV-1 in primates, we simulated what might happen during the cross-species transmission of HSV-1 and found that the DNA repair protein Nbs1 from only some primate species is able to promote HSV-1 infection. The Nbs1 homologs that promote HSV-1 infection also interact with the HSV-1 ICP0 protein. ICP0 interaction mapped to a region of structural disorder in the Nbs1 protein. Chimeras reversing patterns of disorder in Nbs1 reversed titers of HSV-1 produced in the cell. By extending this analysis to 1,237 virus-interacting mammalian proteins, we show that proteins that interact with viruses are highly enriched in disorder, suggesting that viruses commonly interact with host proteins through intrinsically disordered domains.

    View details for DOI 10.1016/j.chom.2016.07.003

    View details for PubMedID 27512903

  • Whole Genome Analysis of 132 Clinical Saccharomyces cerevisiae Strains Reveals Extensive Ploidy Variation G3-GENES GENOMES GENETICS Zhu, Y. O., Sherlock, G., Petrov, D. A. 2016; 6 (8): 2421-2434


    Budding yeast has undergone several independent transitions from commercial to clinical lifestyles. The frequency of such transitions suggests that clinical yeast strains are derived from environmentally available yeast populations, including commercial sources. However, despite their important role in adaptive evolution, the prevalence of polyploidy and aneuploidy has not been extensively analyzed in clinical strains. In this study, we have looked for patterns governing the transition to clinical invasion in the largest screen of clinical yeast isolates to date. In particular, we have focused on the hypothesis that ploidy changes have influenced adaptive processes. We sequenced 144 yeast strains, 132 of which are clinical isolates. We found pervasive large-scale genomic variation in both overall ploidy (34% of strains identified as 3n/4n) and individual chromosomal copy numbers (36% of strains identified as aneuploid). We also found evidence for the highly dynamic nature of yeast genomes, with 35 strains showing partial chromosomal copy number changes and eight strains showing multiple independent chromosomal events. Intriguingly, a lineage identified to be baker's/commercial derived with a unique damaging mutation in NDC80 was particularly prone to polyploidy, with 83% of its members being triploid or tetraploid. Polyploidy was in turn associated with a >2× increase in aneuploidy rates as compared to other lineages. This dataset provides a rich source of information on the genomics of clinical yeast strains and highlights the potential importance of large-scale genomic copy variation in yeast adaptation.

    View details for DOI 10.1534/g3.116.029397

    View details for Web of Science ID 000381282300017

    View details for PubMedID 27317778

    View details for PubMedCentralID PMC4978896

  • Heterozygote Advantage Is a Common Outcome of Adaptation in Saccharomyces cerevisiae GENETICS Sellis, D., Kvitek, D. J., Dunn, B., Sherlock, G., Petrov, D. A. 2016; 203 (3): 1401-?


    Adaptation in diploids is predicted to proceed via mutations that are at least partially dominant in fitness. Recently, we argued that many adaptive mutations might also be commonly overdominant in fitness. Natural (directional) selection acting on overdominant mutations should drive them into the population but then, instead of bringing them to fixation, should maintain them as balanced polymorphisms via heterozygote advantage. If true, this would make adaptive evolution in sexual diploids differ drastically from that of haploids. The validity of this prediction has not yet been tested experimentally. Here, we performed four replicate evolutionary experiments with diploid yeast populations (Saccharomyces cerevisiae) growing in glucose-limited continuous cultures. We sequenced 24 evolved clones and identified initial adaptive mutations in all four chemostats. The first adaptive mutations in all four chemostats were three copy number variations, all of which proved to be overdominant in fitness. The fact that fitness overdominant mutations were always the first step in independent adaptive walks supports the prediction that heterozygote advantage can arise as a common outcome of directional selection in diploids and demonstrates that overdominance of de novo adaptive mutations in diploids is not rare.

    View details for DOI 10.1534/genetics.115.185165

    View details for Web of Science ID 000379473600028

    View details for PubMedID 27194750

    View details for PubMedCentralID PMC4937471

  • Elevated Linkage Disequilibrium and Signatures of Soft Sweeps Are Common in Drosophila melanogaster GENETICS Garud, N. R., Petrov, D. A. 2016; 203 (2): 863-?


    The extent to which selection and demography impact patterns of genetic diversity in natural populations of Drosophila melanogaster is yet to be fully understood. We previously observed that linkage disequilibrium (LD) at scales of ∼10 kb in the Drosophila Genetic Reference Panel (DGRP), consisting of 145 inbred strains from Raleigh, North Carolina, measured both between pairs of sites and as haplotype homozygosity, is elevated above neutral demographic expectations. We also demonstrated that signatures of strong and recent soft sweeps are abundant. However, the extent to which these patterns are specific to this derived and admixed population is unknown. It is also unclear whether these patterns are a consequence of the extensive inbreeding performed to generate the DGRP data. Here we analyze LD statistics in a sample of >100 fully-sequenced strains from Zambia; an ancestral population to the Raleigh population that has experienced little to no admixture and was generated by sequencing haploid embryos rather than inbred strains. We find an elevation in long-range LD and haplotype homozygosity compared to neutral expectations in the Zambian sample, thus showing the elevation in LD is not specific to the DGRP data set. This elevation in LD and haplotype structure remains even after controlling for possible confounders including genomic inversions, admixture, population substructure, close relatedness of individual strains, and recombination rate variation. Furthermore, signatures of partial soft sweeps similar to those found in the DGRP as well as partial hard sweeps are common in Zambia. These results suggest that while the selective forces and sources of adaptive mutations may differ in Zambia and Raleigh, elevated long-range LD and signatures of soft sweeps are generic in D. melanogaster.

    View details for DOI 10.1534/genetics.115.184002

    View details for Web of Science ID 000377462800022

    View details for PubMedID 27098909

    View details for PubMedCentralID PMC4896199

  • Viruses are a dominant driver of protein adaptation in mammals ELIFE Enard, D., Cai, L., Gwennap, C., Petrov, D. A. 2016; 5


    Viruses interact with hundreds to thousands of proteins in mammals, yet adaptation against viruses has only been studied in a few proteins specialized in antiviral defense. Whether adaptation to viruses typically involves only specialized antiviral proteins or affects a broad array of virus-interacting proteins is unknown. Here, we analyze adaptation in ~1300 virus-interacting proteins manually curated from a set of 9900 proteins conserved in all sequenced mammalian genomes. We show that viruses (i) use the more evolutionarily constrained proteins within the cellular functions they interact with and that (ii) despite this high constraint, virus-interacting proteins account for a high proportion of all protein adaptation in humans and other mammals. Adaptation is elevated in virus-interacting proteins across all functional categories, including both immune and non-immune functions. We conservatively estimate that viruses have driven close to 30% of all adaptive amino acid changes in the part of the human proteome conserved within mammals. Our results suggest that viruses are one of the most dominant drivers of evolutionary change across mammalian and human proteomes.

    View details for DOI 10.7554/eLife.12469

    View details for Web of Science ID 000376921100001

    View details for PubMedID 27187613

    View details for PubMedCentralID PMC4869911

  • Effects of maternal age on euploidy rates in a large cohort of embryos analyzed with 24-chromosome single-nucleotide polymorphism-based preimplantation genetic screening FERTILITY AND STERILITY Demko, Z. P., Simon, A. L., McCoy, R. C., Petrov, D. A., Rabinowitz, M. 2016; 105 (5): 1307-1313


    To determine the effect of maternal age on the average number of euploid embryos retrieved during oocyte harvest as part of an in vitro fertilization (IVF) cycle, including the probability of retrieving at least one euploid embryo in a cohort (PrE). Retrospective study. Preimplantation genetic screening (PGS) laboratory. Women aged 18 to 48 years undergoing IVF treatment. Use of 24-chromosome single-nucleotide polymorphism (SNP)-based PGS of day-3 and day-5 embryo biopsies. Relationships between maternal age and the rate of embryos that tested as euploid (hereafter referred to as "euploid embryos"), the average number and proportion of euploid embryos per IVF cycle, and PrE. We analyzed 22,599 day-3 embryos and 15,112 day-5 embryos. In women aged 27 to 35 years, the median proportion of euploid embryos in each cycle remained constant at ∼35% in day-3 biopsies and ∼55% in day-5 biopsies, but it decreased rapidly after age 35. On average, women in their late 20s had four euploid embryos (day 3 or day 5) per cycle, but this number decreased linearly (R² ≥ 0.983) after 35 years of age. The effect of maternal age on PrE was similar, with a rapid exponential decline (R² = 0.986). Across all maternal ages, the euploid proportion and number of embryos per cycle were counterbalanced, so the number of euploid embryos per cycle was the same for day-3 and day-5 biopsies. This suggests that the loss of embryos from day 3 to day 5 was primarily due to aneuploidy. Our results confirm the known inverse relationship between advanced maternal age (>35 years) and embryo euploidy, demonstrating that equal numbers of euploid embryos are available at day 3 and day 5.

    View details for DOI 10.1016/j.fertnstert.2016.01.025

    View details for Web of Science ID 000375871200040

    View details for PubMedID 26868992

  • Global Transcriptional Profiling of Diapause and Climatic Adaptation in Drosophila melanogaster. Molecular biology and evolution Zhao, X., Bergland, A. O., Behrman, E. L., Gregory, B. D., Petrov, D. A., Schmidt, P. S. 2016; 33 (3): 707-720


    Wild populations of the model organism Drosophila melanogaster experience highly heterogeneous environments over broad geographical ranges as well as over seasonal and annual timescales. Diapause is a primary adaptation to environmental heterogeneity, and in D. melanogaster the propensity to enter diapause varies predictably with latitude and season. Here we performed global transcriptomic profiling of naturally occurring variation in diapause expression elicited by short day photoperiod and moderately low temperature in two tissue types associated with neuroendocrine and endocrine signaling: heads and ovaries. We show that diapause in D. melanogaster is an actively regulated phenotype at the transcriptional level, suggesting that diapause is not a simple physiological or reproductive quiescence. Differentially expressed genes and pathways are highly distinct in heads and ovaries, demonstrating that the diapause response is not uniform throughout the soma and suggesting that it may be composed of functional modules associated with specific tissues. Genes downregulated in heads of diapausing flies are significantly enriched for clinally varying single nucleotide polymorphisms (SNPs) and seasonally oscillating SNPs, consistent with the hypothesis that diapause is a driving phenotype of climatic adaptation. We also show that chromosome location-based coregulation of gene expression is present in the transcriptional regulation of diapause. Taken together, these results demonstrate that diapause is a complex phenotype actively regulated in multiple tissues, and support the hypothesis that natural variation in diapause propensity underlies adaptation to spatially and temporally varying selective pressures.

    View details for DOI 10.1093/molbev/msv263

    View details for PubMedID 26568616

  • Secondary contact and local adaptation contribute to genome-wide patterns of clinal variation in Drosophila melanogaster. Molecular ecology Bergland, A. O., Tobler, R., González, J., Schmidt, P., Petrov, D. 2016; 25 (5): 1157-1174


    Populations arrayed along broad latitudinal gradients often show patterns of clinal variation in phenotype and genotype. Such population differentiation can be generated and maintained by both historical demographic events and local adaptation. These evolutionary forces are not mutually exclusive and can in some cases produce nearly identical patterns of genetic differentiation among populations. Here, we investigate the evolutionary forces that generated and maintain clinal variation genome-wide among populations of Drosophila melanogaster sampled in North America and Australia. We contrast patterns of clinal variation in these continents with patterns of differentiation among ancestral European and African populations. Using established and novel methods we derive here, we show that recently derived North America and Australia populations were likely founded by both European and African lineages and that this hybridization event likely contributed to genome-wide patterns of parallel clinal variation between continents. The pervasive effects of admixture mean that differentiation at only several hundred loci can be attributed to the operation of spatially varying selection using an FST outlier approach. Our results provide novel insight into the well-studied system of clinal differentiation in D. melanogaster and provide a context for future studies seeking to identify loci contributing to local adaptation in a wide variety of organisms, including other invasive species as well as temperate endemics.

    View details for DOI 10.1111/mec.13455

    View details for PubMedID 26547394

  • Comparative population genomics of latitudinal variation in Drosophila simulans and Drosophila melanogaster. Molecular ecology Machado, H. E., Bergland, A. O., O'Brien, K. R., Behrman, E. L., Schmidt, P. S., Petrov, D. A. 2016; 25 (3): 723-740


    Examples of clinal variation in phenotypes and genotypes across latitudinal transects have served as important models for understanding how spatially varying selection and demographic forces shape variation within species. Here, we examine the selective and demographic contributions to latitudinal variation through the largest comparative genomic study to date of Drosophila simulans and Drosophila melanogaster, with genomic sequence data from 382 individual fruit flies, collected across a spatial transect of 19 degrees latitude and at multiple time points over 2 years. Consistent with phenotypic studies, we find less clinal variation in D. simulans than D. melanogaster, particularly for the autosomes. Moreover, we find that clinally varying loci in D. simulans are less stable over multiple years than comparable clines in D. melanogaster. D. simulans shows a significantly weaker pattern of isolation by distance than D. melanogaster and we find evidence for a stronger contribution of migration to D. simulans population genetic structure. While population bottlenecks and migration can plausibly explain the differences in stability of clinal variation between the two species, we also observe a significant enrichment of shared clinal genes, suggesting that the selective forces associated with climate are acting on the same genes and phenotypes in D. simulans and D. melanogaster.

    View details for DOI 10.1111/mec.13446

    View details for PubMedID 26523848

  • More effective drugs lead to harder selective sweeps in the evolution of drug resistance in HIV-1. eLife Feder, A. F., Rhee, S., Holmes, S. P., Shafer, R. W., Petrov, D. A., Pennings, P. S. 2016; 5


    In the early days of HIV treatment, drug resistance occurred rapidly and predictably in all patients, but under modern treatments, resistance arises slowly, if at all. The probability of resistance should be controlled by the rate of generation of resistance mutations. If many adaptive mutations arise simultaneously, then adaptation proceeds by soft selective sweeps in which multiple adaptive mutations spread concomitantly, but if adaptive mutations occur rarely in the population, then a single adaptive mutation should spread alone in a hard selective sweep. Here, we use 6717 HIV-1 consensus sequences from patients treated with first-line therapies between 1989 and 2013 to confirm that the transition from fast to slow evolution of drug resistance was indeed accompanied with the expected transition from soft to hard selective sweeps. This suggests more generally that evolution proceeds via hard sweeps if resistance is unlikely and via soft sweeps if it is likely.

    View details for DOI 10.7554/eLife.10670

    View details for PubMedID 26882502

    View details for PubMedCentralID PMC4764592

  • Evidence of Selection against Complex Mitotic-Origin Aneuploidy during Preimplantation Development PLOS GENETICS McCoy, R. C., Demko, Z. P., Ryan, A., Banjevic, M., Hill, M., Sigurjonsson, S., Rabinowitz, M., Petrov, D. A. 2015; 11 (10)


    Whole-chromosome imbalances affect over half of early human embryos and are the leading cause of pregnancy loss. While these errors frequently arise in oocyte meiosis, many such whole-chromosome abnormalities affecting cleavage-stage embryos are the result of chromosome missegregation occurring during the initial mitotic cell divisions. The first wave of zygotic genome activation at the 4-8 cell stage results in the arrest of a large proportion of embryos, the vast majority of which contain whole-chromosome abnormalities. Thus, the full spectrum of meiotic and mitotic errors can only be detected by sampling after the initial cell divisions, but prior to this selective filter. Here, we apply 24-chromosome preimplantation genetic screening (PGS) to 28,052 single-cell day-3 blastomere biopsies and 18,387 multi-cell day-5 trophectoderm biopsies from 6,366 in vitro fertilization (IVF) cycles. We precisely characterize the rates and patterns of whole-chromosome abnormalities at each developmental stage and distinguish errors of meiotic and mitotic origin without embryo disaggregation, based on informative chromosomal signatures. We show that mitotic errors frequently involve multiple chromosome losses that are not biased toward maternal or paternal homologs. This outcome is characteristic of spindle abnormalities and chaotic cell division detected in previous studies. In contrast to meiotic errors, our data also show that mitotic errors are not significantly associated with maternal age. PGS patients referred due to previous IVF failure had elevated rates of mitotic error, while patients referred due to recurrent pregnancy loss had elevated rates of meiotic error, controlling for maternal age. These results support the conclusion that mitotic error is the predominant mechanism contributing to pregnancy losses occurring prior to blastocyst formation. This high-resolution view of the full spectrum of whole-chromosome abnormalities affecting early embryos provides insight into the cytogenetic mechanisms underlying their formation and the consequences for human fertility.

    View details for DOI 10.1371/journal.pgen.1005601

    View details for Web of Science ID 000364401600065

    View details for PubMedID 26491874

    View details for PubMedCentralID PMC4619652

  • Investigation of the prevalence of antagonistic pleiotropy Herissant, L., Yuan, D., Jerison, E., Agarwala, A., Fisher, D., Desai, M., Petrov, D., Sherlock, G. WILEY-BLACKWELL. 2015: S263–S264
  • Exploring the adaptive mutation spectrum in massively tagged populations of experimentally evolving yeast Dunn, B., Venkataram, S., Levy, S., Blundell, J., Herissant, L., Li, Y., Chang, J., Geiler-Samerotte, K., Agarwala, A., Fisher, D., Petrov, D., Sherlock, G. WILEY-BLACKWELL. 2015: S89
  • Quantification of GC-biased gene conversion in the human genome GENOME RESEARCH Glemin, S., Arndt, P. F., Messer, P. W., Petrov, D., Galtier, N., Duret, L. 2015; 25 (8): 1215-1228


    Much evidence indicates that GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, a detailed quantification of the process is still lacking. The strength of gBGC can be measured from the analysis of derived allele frequency spectra (DAF), but this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors and by spatial heterogeneity in gBGC strength. We propose a new general method to quantify gBGC from DAF spectra, incorporating polarization errors, taking spatial heterogeneity into account, and jointly estimating mutation bias. Applying it to human polymorphism data from the 1000 Genomes Project, we show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. Genome-wide, the intensity of gBGC is in the nearly neutral area. However, given that recombination occurs primarily within recombination hotspots, 1%-2% of the human genome is subject to strong gBGC. On average, gBGC is stronger in African than in non-African populations, reflecting differences in effective population sizes. However, due to more heterogeneous recombination landscapes, the fraction of the genome affected by strong gBGC is larger in non-African than in African populations. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that, in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.

    View details for DOI 10.1101/gr.185488.114

    View details for Web of Science ID 000358957500013

    View details for PubMedID 25995268

    View details for PubMedCentralID PMC4510005

  • Imperfect drug penetration leads to spatial monotherapy and rapid evolution of multidrug resistance PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Moreno-Gamez, S., Hill, A. L., Rosenbloom, D. I., Petrov, D. A., Nowak, M. A., Pennings, P. S. 2015; 112 (22): E2874-E2883


    Infections with rapidly evolving pathogens are often treated using combinations of drugs with different mechanisms of action. One of the major goals of combination therapy is to reduce the risk of drug resistance emerging during a patient's treatment. Although this strategy generally has significant benefits over monotherapy, it may also select for multidrug-resistant strains, particularly during long-term treatment for chronic infections. Infections with these strains present an important clinical and public health problem. Complicating this issue, for many antimicrobial treatment regimes, individual drugs have imperfect penetration throughout the body, so there may be regions where only one drug reaches an effective concentration. Here we propose that mismatched drug coverage can greatly speed up the evolution of multidrug resistance by allowing mutations to accumulate in a stepwise fashion. We develop a mathematical model of within-host pathogen evolution under spatially heterogeneous drug coverage and demonstrate that even very small single-drug compartments lead to dramatically higher resistance risk. We find that it is often better to use drug combinations with matched penetration profiles, although there may be a trade-off between preventing eventual treatment failure due to resistance in this way and temporarily reducing pathogen levels systemically. Our results show that drugs with the most extensive distribution are likely to be the most vulnerable to resistance. We conclude that optimal combination treatments should be designed to prevent this spatial effective monotherapy. These results are widely applicable to diverse microbial infections including viruses, bacteria, and parasites.

    View details for DOI 10.1073/pnas.1424184112

    View details for Web of Science ID 000355832200008

    View details for PubMedID 26038564

    View details for PubMedCentralID PMC4460514

  • Obstruction of adaptation in diploids by recessive, strongly deleterious alleles. Proceedings of the National Academy of Sciences of the United States of America Assaf, Z. J., Petrov, D. A., Blundell, J. R. 2015; 112 (20): E2658-66


    Recessive deleterious mutations are common, causing many genetic disorders in humans and producing inbreeding depression in the majority of sexually reproducing diploids. The abundance of recessive deleterious mutations in natural populations suggests they are likely to be present on a chromosome when a new adaptive mutation occurs, yet the dynamics of recessive deleterious hitchhikers and their impact on adaptation remain poorly understood. Here we model how a recessive deleterious mutation impacts the fate of a genetically linked dominant beneficial mutation. The frequency trajectory of the adaptive mutation in this case is dramatically altered and results in what we have termed a "staggered sweep." It is named for its three-phased trajectory: (i) Initially, the two linked mutations have a selective advantage while rare and will increase in frequency together, then (ii), at higher frequencies, the recessive hitchhiker is exposed to selection and can cause a balanced state via heterozygote advantage (the staggered phase), and (iii) finally, if recombination unlinks the two mutations, then the beneficial mutation can complete the sweep to fixation. Using both analytics and simulations, we show that strongly deleterious recessive mutations can substantially decrease the probability of fixation for nearby beneficial mutations, thus creating zones in the genome where adaptation is suppressed. These mutations can also significantly prolong the number of generations a beneficial mutation takes to sweep to fixation, and cause the genomic signature of selection to resemble that of soft or partial sweeps. We show that recessive deleterious variation could impact adaptation in humans and Drosophila.

    View details for DOI 10.1073/pnas.1424949112

    View details for PubMedID 25941393

    View details for PubMedCentralID PMC4443376

  • Common variants spanning PLK4 are associated with mitotic-origin aneuploidy in human embryos SCIENCE McCoy, R. C., Demko, Z., Ryan, A., Banjevic, M., Hill, M., Sigurjonsson, S., Rabinowitz, M., Fraser, H. B., Petrov, D. A. 2015; 348 (6231): 235-238


    Aneuploidy, the inheritance of an atypical chromosome complement, is common in early human development and is the primary cause of pregnancy loss. By screening day-3 embryos during in vitro fertilization cycles, we identified an association between aneuploidy of putative mitotic origin and linked genetic variants on chromosome 4 of maternal genomes. This associated region contains a candidate gene, Polo-like kinase 4 (PLK4), that plays a well-characterized role in centriole duplication and has the ability to alter mitotic fidelity upon minor dysregulation. Mothers with the high-risk genotypes contributed fewer embryos for testing at day 5, suggesting that their embryos are less likely to survive to blastocyst formation. The associated region coincides with a signature of a selective sweep in ancient humans, suggesting that the causal variant was either the target of selection or hitchhiked to substantial frequency.

    View details for DOI 10.1126/science.aaa3337

    View details for Web of Science ID 000352613700046

    View details for PubMedID 25859044

  • Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature Levy, S. F., Blundell, J. R., Venkataram, S., Petrov, D. A., Fisher, D. S., Sherlock, G. 2015; 519 (7542): 181-186


    Evolution of large asexual cell populations underlies ∼30% of deaths worldwide, including those caused by bacteria, fungi, parasites, and cancer. However, the dynamics underlying these evolutionary processes remain poorly understood because they involve many competing beneficial lineages, most of which never rise above extremely low frequencies in the population. To observe these normally hidden evolutionary dynamics, we constructed a sequencing-based ultra high-resolution lineage tracking system in Saccharomyces cerevisiae that allowed us to monitor the relative frequencies of ∼500,000 lineages simultaneously. In contrast to some expectations, we found that the spectrum of fitness effects of beneficial mutations is neither exponential nor monotonic. Early adaptation is a predictable consequence of this spectrum and is strikingly reproducible, but the initial small-effect mutations are soon outcompeted by rarer large-effect mutations that result in variability between replicates. These results suggest that early evolutionary dynamics may be deterministic for a period of time before stochastic effects become important.

    View details for DOI 10.1038/nature14279

    View details for PubMedID 25731169

  • T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic acids research Fiston-Lavier, A., Barrón, M. G., Petrov, D. A., González, J. 2015; 43 (4)


    Transposable elements (TEs) constitute the most active, diverse and ancient component in a broad range of genomes. Complete understanding of genome function and evolution cannot be achieved without a thorough understanding of TE impact and biology. However, in-depth analysis of TEs still represents a challenge due to the repetitive nature of these genomic entities. In this work, we present a broadly applicable and flexible tool: T-lex2. T-lex2 is the only available software that allows routine, automatic and accurate genotyping of individual TE insertions and estimation of their population frequencies both using individual strain and pooled next-generation sequencing data. Furthermore, T-lex2 also assesses the quality of the calls allowing the identification of misannotated TEs and providing the necessary information to re-annotate them. The flexible and customizable design of T-lex2 allows running it in any genome and for any type of TE insertion. Here, we tested the fidelity of T-lex2 using the fly and human genomes. Overall, T-lex2 represents a significant improvement in our ability to analyze the contribution of TEs to genome function and evolution as well as learning about the biology of TEs. T-lex2 is freely available online at

    View details for DOI 10.1093/nar/gku1250

    View details for PubMedID 25510498

    View details for PubMedCentralID PMC4344482

  • Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS genetics Garud, N. R., Messer, P. W., Buzbas, E. O., Petrov, D. A. 2015; 11 (2)


    Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.
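
    The haplotype homozygosity statistics named in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the formulas, so the definitions used here are an assumption drawn from common usage in the soft-sweep literature: H1 is the sum of squared haplotype frequencies, H12 pools the two most frequent haplotypes before squaring, and H2 is H1 without the contribution of the most frequent haplotype. The function name `haplotype_homozygosity` and the toy haplotype labels are hypothetical and are not taken from the authors' software.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes observed in one window.

    Assumed definitions (not stated in the abstract itself):
      H1  = sum_i p_i^2
      H12 = (p_1 + p_2)^2 + sum_{i>2} p_i^2
      H2  = H1 - p_1^2
    where p_1 >= p_2 >= ... are the sample haplotype frequencies.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    # Sample frequencies, sorted from most to least common.
    freqs = sorted((count / n for count in Counter(haplotypes).values()),
                   reverse=True)

    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0

    h1 = sum(p * p for p in freqs)
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


if __name__ == "__main__":
    # Hypothetical windows: one dominant haplotype vs. two common haplotypes.
    hard_like = ["A"] * 9 + ["B"]
    soft_like = ["A"] * 5 + ["B"] * 4 + ["C"]
    for label, window in (("hard-like", hard_like), ("soft-like", soft_like)):
        h1, h12, ratio = haplotype_homozygosity(window)
        print(f"{label}: H1={h1:.3f} H12={h12:.3f} H2/H1={ratio:.3f}")
```

    Under these assumed definitions, both toy windows give a high H12, while H2/H1 is small when a single haplotype dominates and larger when two haplotypes are common. This matches the abstract's description of H12 as sensitive to both kinds of sweep and of H2/H1 as the statistic that, in combination with H12, helps tell them apart.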

    View details for DOI 10.1371/journal.pgen.1005004

    View details for PubMedID 25706129

    View details for PubMedCentralID PMC4338236

  • Genomic Evidence of Rapid and Stable Adaptive Oscillations over Seasonal Time Scales in Drosophila PLOS GENETICS Bergland, A. O., Behrman, E. L., O'Brien, K. R., Schmidt, P. S., Petrov, D. A. 2014; 10 (11)


    In many species, genomic data have revealed pervasive adaptive evolution indicated by the fixation of beneficial alleles. However, when selection pressures are highly variable along a species' range or through time, adaptive alleles may persist at intermediate frequencies for long periods. So-called "balanced polymorphisms" have long been understood to be an important component of standing genetic variation, yet direct evidence of the strength of balancing selection and the stability and prevalence of balanced polymorphisms has remained elusive. We hypothesized that environmental fluctuations among seasons in a North American orchard would impose temporally variable selection on Drosophila melanogaster that would drive repeatable adaptive oscillations at balanced polymorphisms. We identified hundreds of polymorphisms whose frequency oscillates among seasons and argue that these loci are subject to strong, temporally variable selection. We show that these polymorphisms respond to acute and persistent changes in climate and are associated in predictable ways with seasonally variable phenotypes. In addition, our results suggest that adaptively oscillating polymorphisms are likely millions of years old, with some possibly predating the divergence between D. melanogaster and D. simulans. Taken together, our results are consistent with a model of balancing selection wherein rapid temporal fluctuations in climate over generational time promote adaptive genetic diversity at loci underlying polygenic variation in fitness-related phenotypes.

    View details for DOI 10.1371/journal.pgen.1004775

    View details for Web of Science ID 000345455200026

    View details for PubMedCentralID PMC4222749

    View details for PubMedID 25375361


  • Soft Selective Sweeps in Complex Demographic Scenarios GENETICS Wilson, B. A., Petrov, D. A., Messer, P. W. 2014; 198 (2): 669-684


    Adaptation from de novo mutation can produce so-called soft selective sweeps, where adaptive alleles of independent mutational origin sweep through the population at the same time. Population genetic theory predicts that such soft sweeps should be likely if the product of the population size and the mutation rate toward the adaptive allele is sufficiently large, such that multiple adaptive mutations can establish before one has reached fixation; however, it remains unclear how demographic processes affect the probability of observing soft sweeps. Here we extend the theory of soft selective sweeps to realistic demographic scenarios that allow for changes in population size over time. We first show that population bottlenecks can lead to the removal of all but one adaptive lineage from an initially soft selective sweep. The parameter regime under which such "hardening" of soft selective sweeps is likely is determined by a simple heuristic condition. We further develop a generalized analytical framework, based on an extension of the coalescent process, for calculating the probability of soft sweeps under arbitrary demographic scenarios. Two important limits emerge within this analytical framework: In the limit where population-size fluctuations are fast compared to the duration of the sweep, the likelihood of soft sweeps is determined by the harmonic mean of the variance effective population size estimated over the duration of the sweep; in the opposing slow fluctuation limit, the likelihood of soft sweeps is determined by the instantaneous variance effective population size at the onset of the sweep. We show that as a consequence of this finding the probability of observing soft sweeps becomes a function of the strength of selection. Specifically, in species with sharply fluctuating population size, strong selection is more likely to produce soft sweeps than weak selection. Our results highlight the importance of accurate demographic estimates over short evolutionary timescales for understanding the population genetics of adaptation from de novo mutation.
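
    The fast-fluctuation limit described in this abstract rests on a harmonic mean, which a few lines of code can make concrete. This is only an illustrative sketch: the function name `harmonic_mean_ne` and the population-size trajectory are hypothetical, and the sketch computes a plain harmonic mean over per-generation sizes rather than the full analytical framework the abstract refers to.

```python
def harmonic_mean_ne(sizes):
    """Harmonic mean of per-generation effective population sizes.

    `sizes` is a sequence of positive numbers, one per generation spanning
    the (assumed) duration of the sweep.
    """
    if len(sizes) == 0:
        raise ValueError("at least one generation is required")
    if any(n <= 0 for n in sizes):
        raise ValueError("population sizes must be positive")
    return len(sizes) / sum(1.0 / n for n in sizes)


if __name__ == "__main__":
    # Hypothetical trajectory: nine large generations and one sharp bottleneck.
    trajectory = [1_000_000] * 9 + [1_000]
    arithmetic = sum(trajectory) / len(trajectory)
    harmonic = harmonic_mean_ne(trajectory)
    print(f"arithmetic mean: {arithmetic:,.0f}")
    print(f"harmonic mean:   {harmonic:,.0f}")
```

    In this made-up trajectory the single bottleneck generation pulls the harmonic mean far below the arithmetic mean, which is why brief population crashes can matter so much for whether a sweep stays soft.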

    View details for DOI 10.1534/genetics.114.165571

    View details for Web of Science ID 000343885300027

    View details for PubMedCentralID PMC4266194

    View details for PubMedID 25060100


  • Reply to Chen and Zhang: On interpreting genome-wide trends from yeast mutation accumulation data PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhu, Y. O., Siegal, M. L., Hall, D. W., Petrov, D. A. 2014; 111 (39): E4063

    View details for PubMedID 25217565

  • Precise estimates of mutation rate and spectrum in yeast PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhu, Y. O., Siegal, M. L., Hall, D. W., Petrov, D. A. 2014; 111 (22): E2310-E2318


    Mutation is the ultimate source of genetic variation. The most direct and unbiased method of studying spontaneous mutations is via mutation accumulation (MA) lines. Until recently, MA experiments were limited by the cost of sequencing and thus provided us with small numbers of mutational events and therefore imprecise estimates of rates and patterns of mutation. We used whole-genome sequencing to identify nearly 1,000 spontaneous mutation events accumulated over ∼311,000 generations in 145 diploid MA lines of the budding yeast Saccharomyces cerevisiae. MA experiments are usually assumed to have negligible levels of selection, but even mild selection will remove strongly deleterious events. We take advantage of such patterns of selection and show that mutation classes such as indels and aneuploidies (especially monosomies) are proportionately much more likely to contribute mutations of large effect. We also provide conservative estimates of indel, aneuploidy, environment-dependent dominant lethal, and recessive lethal mutation rates. To our knowledge, for the first time in yeast MA data, we identified a sufficiently large number of single-nucleotide mutations to measure context-dependent mutation rates and were able to (i) confirm strong AT bias of mutation in yeast driven by high rate of mutations from C/G to T/A and (ii) detect a higher rate of mutation at C/G nucleotides in two specific contexts consistent with cytosine methylation in S. cerevisiae.

    View details for DOI 10.1073/pnas.1323011111

    View details for Web of Science ID 000336687900012

    View details for PubMedID 24847077

    View details for PubMedCentralID PMC4050626

  • Genome-wide signals of positive selection in human evolution. Genome research Enard, D., Messer, P. W., Petrov, D. A. 2014; 24 (6): 885-895


    The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci, and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1000 Genomes Project and detect signatures of positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to argue that the observed signatures require a high rate of strongly adaptive substitutions near amino acid changes. We further demonstrate that the observed signatures of positive selection correlate better with the presence of regulatory sequences, as predicted by the ENCODE Project Consortium, than with the positions of amino acid substitutions. Our results suggest that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson that adaptive divergence is primarily driven by regulatory changes.

    View details for DOI 10.1101/gr.164822.113

    View details for PubMedID 24619126

    View details for PubMedCentralID PMC4032853

  • Comparative population genomics: power and principles for the inference of functionality TRENDS IN GENETICS Lawrie, D. S., Petrov, D. A. 2014; 30 (4): 133-139


    The availability of sequenced genomes from multiple related organisms allows the detection and localization of functional genomic elements based on the idea that such elements evolve more slowly than neutral sequences. Although such comparative genomics methods have proven useful in discovering functional elements and ascertaining levels of functional constraint in the genome as a whole, here we outline limitations intrinsic to this approach that cannot be overcome by sequencing more species. We argue that it is essential to supplement comparative genomics with ultra-deep sampling of populations from closely related species to enable substantially more powerful genomic scans for functional elements. The convergence of sequencing technology and population genetics theory has made such projects feasible and has exciting implications for functional genomics.

    View details for DOI 10.1016/j.tig.2014.02.002

    View details for Web of Science ID 000335426300003

    View details for PubMedID 24656563

  • Genomic inference accurately predicts the timing and severity of a recent bottleneck in a nonmodel insect population. Molecular ecology McCoy, R. C., Garud, N. R., Kelley, J. L., Boggs, C. L., Petrov, D. A. 2014; 23 (1): 136-150


    The analysis of molecular data from natural populations has allowed researchers to answer diverse ecological questions that were previously intractable. In particular, ecologists are often interested in the demographic history of populations, information that is rarely available from historical records. Methods have been developed to infer demographic parameters from genomic data, but it is not well understood how inferred parameters compare to true population history or depend on aspects of experimental design. Here, we present and evaluate a method of SNP discovery using RNA sequencing and demographic inference using the program δaδi, which uses a diffusion approximation to the allele frequency spectrum to fit demographic models. We test these methods in a population of the checkerspot butterfly Euphydryas gillettii. This population was intentionally introduced to Gothic, Colorado in 1977 and has experienced extreme fluctuations including bottlenecks of fewer than 25 adults, as documented by nearly annual field surveys. Using RNA sequencing of eight individuals from Colorado and eight individuals from a native population in Wyoming, we generate the first genomic resources for this system. While demographic inference is commonly used to examine ancient demography, our study demonstrates that our inexpensive, all-in-one approach to marker discovery and genotyping provides sufficient data to accurately infer the timing of a recent bottleneck. This demographic scenario is relevant for many species of conservation concern, few of which have sequenced genomes. Our results are remarkably insensitive to sample size or number of genomic markers, which has important implications for applying this method to other nonmodel systems.

    View details for DOI 10.1111/mec.12591

    View details for PubMedID 24237665


  • Population genomics of transposable elements in Drosophila. Annual review of genetics Barrón, M. G., Fiston-Lavier, A., Petrov, D. A., González, J. 2014; 48: 561-581


    Studies of the population dynamics of transposable elements (TEs) in Drosophila melanogaster indicate that consistent forces are affecting TEs independently of their modes of transposition and regulation. New sequencing technologies enable biologists to sample genomes at an unprecedented scale in order to quantify genome-wide polymorphism for annotated and novel TE insertions. In this review, we first present new insights gleaned from high-throughput data for population genomics studies of D. melanogaster. We then consider the latest population genomics models for TE evolution and present examples of functional evidence revealed by genome-wide studies of TE population dynamics in D. melanogaster. Although most of the TE insertions are deleterious or neutral, some TE insertions increase the fitness of the individual that carries them and play a role in genome adaptation.

    View details for DOI 10.1146/annurev-genet-120213-092359

    View details for PubMedID 25292358

  • Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PloS one McCoy, R. C., Taylor, R. W., Blauwkamp, T. A., Kelley, J. L., Kertesz, M., Pushkarev, D., Petrov, D. A., Fiston-Lavier, A. 2014; 9 (9)


    High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (∼0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.

    View details for DOI 10.1371/journal.pone.0106689

    View details for PubMedID 25188499

    View details for PubMedCentralID PMC4154752
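
    The N50 contig size quoted in the abstract above is a standard assembly summary statistic. As a minimal sketch, assuming the usual definition (the contig length at which contigs of that length or longer account for at least half of the total assembled bases), it could be computed as follows. The function name and the example lengths are illustrative and are not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```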

  • Population genomics of rapid adaptation by soft selective sweeps. Trends in ecology & evolution Messer, P. W., Petrov, D. A. 2013; 28 (11): 659-669


    Organisms can often adapt surprisingly quickly to evolutionary challenges, such as the application of pesticides or antibiotics, suggesting an abundant supply of adaptive genetic variation. In these situations, adaptation should commonly produce 'soft' selective sweeps, where multiple adaptive alleles sweep through the population at the same time, either because the alleles were already present as standing genetic variation or arose independently by recurrent de novo mutations. Most well-known examples of rapid molecular adaptation indeed show signatures of such soft selective sweeps. Here, we review the current understanding of the mechanisms that produce soft sweeps and the approaches used for their identification in population genomic data. We argue that soft sweeps might be the dominant mode of adaptation in many species.

    View details for DOI 10.1016/j.tree.2013.08.003

    View details for Web of Science ID 000326666200007

    View details for PubMedID 24075201

    View details for PubMedCentralID PMC3834262

  • Host Species and Environmental Effects on Bacterial Communities Associated with Drosophila in the Laboratory and in the Natural Environment PLOS ONE Staubach, F., Baines, J. F., Kuenzel, S., Bik, E. M., Petrov, D. A. 2013; 8 (8)


    The fruit fly Drosophila is a classic model organism to study adaptation as well as the relationship between genetic variation and phenotypes. Although associated bacterial communities might be important for many aspects of Drosophila biology, knowledge about their diversity, composition, and factors shaping them is limited. We used 454-based sequencing of a variable region of the bacterial 16S ribosomal RNA gene to characterize the bacterial communities associated with wild and laboratory Drosophila isolates. In order to specifically investigate effects of food source and host species on bacterial communities, we analyzed samples from wild Drosophila melanogaster and D. simulans collected from a variety of natural substrates, as well as from adults and larvae of nine laboratory-reared Drosophila species. We find no evidence for host species effects in lab-reared flies; instead, lab of origin and stochastic effects, which could influence studies of Drosophila phenotypes, are pronounced. In contrast, the natural Drosophila-associated microbiota appears to be predominantly shaped by food substrate with an additional but smaller effect of host species identity. We identify a core member of this natural microbiota that belongs to the genus Gluconobacter and is common to all wild-caught flies in this study, but absent from the laboratory. This makes it a strong candidate for being part of what could be a natural D. melanogaster and D. simulans core microbiome. Furthermore, we were able to identify candidate pathogens in natural fly isolates.

    View details for DOI 10.1371/journal.pone.0070749

    View details for Web of Science ID 000323115800019

    View details for PubMedID 23967097

    View details for PubMedCentralID PMC3742674

  • Frequent adaptation and the McDonald-Kreitman test. Proceedings of the National Academy of Sciences of the United States of America Messer, P. W., Petrov, D. A. 2013; 110 (21): 8615-8620


    Population genomic studies have shown that genetic draft and background selection can profoundly affect the genome-wide patterns of molecular variation. We performed forward simulations under realistic gene-structure and selection scenarios to investigate whether such linkage effects impinge on the ability of the McDonald-Kreitman (MK) test to infer the rate of positive selection (α) from polymorphism and divergence data. We find that in the presence of slightly deleterious mutations, MK estimates of α severely underestimate the true rate of adaptation even if all polymorphisms with population frequencies under 50% are excluded. Furthermore, already under intermediate rates of adaptation, genetic draft substantially distorts the site frequency spectra at neutral and functional sites from the expectations under mutation-selection-drift balance. MK-type approaches that first infer demography from synonymous sites and then use the inferred demography to correct the estimation of α obtain almost the correct α in our simulations. However, these approaches typically infer a severe past population expansion although there was no such expansion in the simulations, casting doubt on the accuracy of methods that infer demography from synonymous polymorphism data. We propose a simple asymptotic extension of the MK test that yields accurate estimates of α in our simulations and should provide a fruitful direction for future studies.

    View details for DOI 10.1073/pnas.1220835110

    View details for PubMedID 23650353

    View details for PubMedCentralID PMC3666677
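
    For orientation, the classical MK-type estimate of α that the abstract above builds on can be sketched in Python as α = 1 − (Ds·Pn)/(Dn·Ps), where Dn and Ds are nonsynonymous and synonymous divergence counts and Pn and Ps are the corresponding polymorphism counts. The second function mirrors the practice, mentioned in the abstract, of excluding polymorphisms below a frequency cutoff. This is a sketch of the standard estimator only; it does not implement the asymptotic extension the authors propose, and all names and inputs are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Standard MK-type estimate: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


def mk_alpha_with_cutoff(dn, ds, nonsyn_freqs, syn_freqs, cutoff=0.5):
    """Same estimate, counting only polymorphisms at frequency >= cutoff.

    nonsyn_freqs and syn_freqs are the population frequencies of the
    nonsynonymous and synonymous polymorphisms, respectively.
    """
    pn = sum(1 for f in nonsyn_freqs if f >= cutoff)
    ps = sum(1 for f in syn_freqs if f >= cutoff)
    return mk_alpha(dn, ds, pn, ps)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mk_alpha(dn=100, ds=100, pn=50, ps=100))
```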

  • Strong purifying selection at synonymous sites in D. melanogaster. PLoS genetics Lawrie, D. S., Messer, P. W., Hershberg, R., Petrov, D. A. 2013; 9 (5)


    Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in Drosophila melanogaster evolve under very strong selective constraint while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for those positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization generating canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupae, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the observed widespread constraint acting on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.

    View details for DOI 10.1371/journal.pgen.1003527

    View details for PubMedID 23737754

    View details for PubMedCentralID PMC3667748

  • Evaluating methods of demographic inference and testing for balancing selection using genomic data from the checkerspot butterfly Euphydryas gillettii Annual Meeting of the Society for Integrative and Comparative Biology (SICB) McCoy, R. C., Boggs, C. B., Petrov, D. A. OXFORD UNIV PRESS INC. 2013: E329–E329

  • Evolutionary Biology for the 21st Century PLOS BIOLOGY Losos, J. B., Arnold, S. J., Bejerano, G., Brodie, E. D., Hibbett, D., Hoekstra, H. E., Mindell, D. P., Monteiro, A., Moritz, C., Orr, H. A., Petrov, D. A., Renner, S. S., Ricklefs, R. E., Soltis, P. S., Turner, T. L. 2013; 11 (1)

    View details for DOI 10.1371/journal.pbio.1001466

    View details for Web of Science ID 000314648700006

    View details for PubMedID 23319892

    View details for PubMedCentralID PMC3539946

  • On the Limitations of Using Ribosomal Genes as References for the Study of Codon Usage: A Rebuttal PLOS ONE Hershberg, R., Petrov, D. A. 2012; 7 (12)


    In a recent paper published in PLOS ONE, Wang et al. challenge our finding that the identity of optimal codons in different genomes follows a set of clear rules. Here we provide a rebuttal of their paper and demonstrate that the results of our original PLOS Genetics paper stand. This provides us with an opportunity to bring up an aspect of how codon usage has been studied that should be of general interest. The Wang et al. study, as well as many other studies, used ribosomal genes as a reference set for the study of patterns of codon usage. We discuss here the assumptions that are made in order to justify using ribosomal genes to study codon bias, suggest that this practice can at times be problematic, and discuss its limitations.

    View details for DOI 10.1371/journal.pone.0049060

    View details for Web of Science ID 000312794500008

    View details for PubMedID 23284622

    View details for PubMedCentralID PMC3527481

  • LDx: Estimation of Linkage Disequilibrium from High-Throughput Pooled Resequencing Data PLOS ONE Feder, A. F., Petrov, D. A., Bergland, A. O. 2012; 7 (11)


    High-throughput pooled resequencing offers significant potential for whole genome population sequencing. However, its main drawback is the loss of haplotype information. In order to regain some of this information, we present LDx, a computational tool for estimating linkage disequilibrium (LD) from pooled resequencing data. LDx uses an approximate maximum likelihood approach to estimate LD (r²) between pairs of SNPs that can be observed within and among single reads. LDx also reports r² estimates derived solely from observed genotype counts. We demonstrate that the LDx estimates are highly correlated with r² estimated from individually resequenced strains. We discuss the performance of LDx using more stringent quality conditions and infer via simulation the degree to which performance can improve based on read depth. Finally we demonstrate two possible uses of LDx with real and simulated pooled resequencing data. First, we use LDx to infer genomewide patterns of decay of LD with physical distance in D. melanogaster population resequencing data. Second, we demonstrate that r² estimates from LDx are capable of distinguishing alternative demographic models representing plausible demographic histories of D. melanogaster.

    View details for DOI 10.1371/journal.pone.0048588

    View details for Web of Science ID 000312272600012

    View details for PubMedID 23152785

    View details for PubMedCentralID PMC3494690
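
    As background for the LD statistic named in the abstract above, the following Python sketch computes the textbook squared correlation between two biallelic SNPs from counts of the four two-site haplotypes, for example as tallied from reads that span both sites. It is not the approximate maximum likelihood procedure that LDx implements; the function name and argument names are hypothetical.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Textbook LD r^2 from counts of the four two-SNP haplotypes.

    D = p_AB - p_A * p_B, and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("at least one haplotype observation is required")
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = n_AB / n - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical counts: complete association, then no association.
    print(r_squared(50, 0, 0, 50))
    print(r_squared(25, 25, 25, 25))
```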

  • Genome Patterns of Selection and Introgression of Haplotypes in Natural Populations of the House Mouse (Mus musculus) PLOS GENETICS Staubach, F., Lorenc, A., Messer, P. W., Tang, K., Petrov, D. A., Tautz, D. 2012; 8 (8)


    General parameters of selection, such as the frequency and strength of positive selection in natural populations or the role of introgression, are still insufficiently understood. The house mouse (Mus musculus) is a particularly well-suited model system to approach such questions, since it has a defined history of splits into subspecies and populations and since extensive genome information is available. We have used high-density single-nucleotide polymorphism (SNP) typing arrays to assess genomic patterns of positive selection and introgression of alleles in two natural populations of each of the subspecies M. m. domesticus and M. m. musculus. Applying different statistical procedures, we find a large number of regions subject to apparent selective sweeps, indicating frequent positive selection on rare alleles or novel mutations. Genes in the regions include well-studied imprinted loci (e.g. Plagl1/Zac1), homologues of human genes involved in adaptations (e.g. alpha-amylase genes) or in genetic diseases (e.g. Huntingtin and Parkin). Haplotype matching between the two subspecies reveals a large number of haplotypes that show patterns of introgression from specific populations of the respective other subspecies, with at least 10% of the genome being affected by partial or full introgression. Using neutral simulations for comparison, we find that the size and the fraction of introgressed haplotypes are not compatible with a pure migration or incomplete lineage sorting model. Hence, it appears that introgressed haplotypes can rise in frequency due to positive selection and thus can contribute to the adaptive genomic landscape of natural populations. Our data support the notion that natural genomes are subject to complex adaptive processes, including the introgression of haplotypes from other differentiated populations or species at a larger scale than previously assumed for animals. This implies that some of the admixture found in inbred strains of mice may also have a natural origin.

    View details for DOI 10.1371/journal.pgen.1002891

    View details for Web of Science ID 000308529300048

    View details for PubMedID 22956910

    View details for PubMedCentralID PMC3431316

  • Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster PLOS ONE Zhu, Y., Bergland, A. O., Gonzalez, J., Petrov, D. A. 2012; 7 (7)


    The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in individual strains as a reference. We also estimated allele frequency of the SNPs using "pooled" data and compared them with "true" frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive.

    View details for DOI 10.1371/journal.pone.0041901

    View details for Web of Science ID 000309240600056

    View details for PubMedID 22848651

    View details for PubMedCentralID PMC3406057
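
    The abstract above reports that the error in pooled allele frequency estimates is well approximated by binomial sampling. A minimal Python sketch of what that approximation implies is given below, under the simplifying assumption that each read at a site is an independent draw from the pool; it ignores the additional noise from unequal DNA contributions that the abstract warns about. Function names and numbers are hypothetical.

```python
import math


def allele_frequency(alt_reads, depth):
    """Estimate the alternate allele frequency at a site from read counts."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    return alt_reads / depth


def binomial_se(p, depth):
    """Binomial standard error of a frequency estimate: sqrt(p (1 - p) / depth)."""
    if depth <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need depth > 0 and 0 <= p <= 1")
    return math.sqrt(p * (1.0 - p) / depth)


if __name__ == "__main__":
    # Hypothetical site: 50 alternate reads out of 100.
    p_hat = allele_frequency(50, 100)
    print(p_hat, binomial_se(p_hat, 100))
```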

  • Origins and rates of aneuploidy in human blastomeres FERTILITY AND STERILITY Rabinowitz, M., Ryan, A., Gemelos, G., Hill, M., Baner, J., Cinnioglu, C., Banjevic, M., Potter, D., Petrov, D. A., Demko, Z. 2012; 97 (2): 395-401


    Objective: To characterize chromosomal error types and parental origin of aneuploidy in cleavage-stage embryos using an informatics-based technique that enables the elucidation of aneuploidy-causing mechanisms. Design: Analysis of blastomeres biopsied from cleavage-stage embryos for preimplantation genetic screening during IVF. Setting: Laboratory. Patient(s): Couples undergoing IVF treatment. Intervention(s): Two hundred seventy-four blastomeres were subjected to array-based genotyping and informatics-based techniques to characterize chromosomal error types and parental origin of aneuploidy across all 24 chromosomes. Main Outcome Measure(s): Chromosomal error types (monosomy vs. trisomy; mitotic vs. meiotic) and parental origin (maternal vs. paternal). Result(s): The rate of maternal meiotic trisomy rose significantly with age, whereas other types of trisomy showed no correlation with age. Trisomies were mostly maternal in origin, whereas paternal and maternal monosomies were roughly equal in frequency. No examples of paternal meiotic trisomy were observed. Segmental error rates were found to be independent of maternal age. Conclusion(s): All types of aneuploidy that rose with increasing maternal age can be attributed to disjunction errors during meiosis of the oocyte. Chromosome gains were predominantly maternal in origin and occurred during meiosis, whereas chromosome losses were not biased in terms of parental origin of the chromosome. The ability to determine the parental origin for each chromosome, as well as being able to detect whether multiple homologs from a single parent were present, allowed greater insights into the origin of aneuploidy.

    View details for DOI 10.1016/j.fertnstert.2011.11.034

    View details for Web of Science ID 000299961800028

    View details for PubMedID 22195772

  • Evolution of genome content: population dynamics of transposable elements in flies and humans. Methods in molecular biology (Clifton, N.J.) González, J., Petrov, D. A. 2012; 855: 361-383


    Recent research is starting to shed light on the factors that influence the population and evolutionary dynamics of transposable elements (TEs) and TE life cycles. Genomes differ sharply in the number of TE copies, in the level of TE activity, in the diversity of TE families and types, and in the proportion of old and young TEs. In this chapter, we focus on two well-studied genomes with strikingly different architectures, humans and Drosophila, which represent two extremes in terms of TE diversity and population dynamics. We argue that some of the answers might lie in (1) the larger population size and consequently more effective selection against new TE insertions due to ectopic recombination in flies compared to humans; and (2) in the faster rate of DNA loss in flies compared to humans leading to much faster removal of fixed TE copies from the fly genome.

    View details for DOI 10.1007/978-1-61779-582-4_13

    View details for PubMedID 22407716

  • Heterozygote advantage as a natural consequence of adaptation in diploids PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Sellis, D., Callahan, B. J., Petrov, D. A., Messer, P. W. 2011; 108 (51): 20666-20671


    Molecular adaptation is typically assumed to proceed by sequential fixation of beneficial mutations. In diploids, this picture presupposes that for most adaptive mutations, the homozygotes have a higher fitness than the heterozygotes. Here, we show that contrary to this expectation, a substantial proportion of adaptive mutations should display heterozygote advantage. This feature of adaptation in diploids emerges naturally from the primary importance of the fitness of heterozygotes for the invasion of new adaptive mutations. We formalize this result in the framework of Fisher's influential geometric model of adaptation. We find that in diploids, adaptation should often proceed through a succession of short-lived balanced states that maintain substantially higher levels of phenotypic and fitness variation in the population compared with classic adaptive walks. In fast-changing environments, this variation produces a diversity advantage that allows diploids to remain better adapted compared with haploids despite the disadvantage associated with the presence of unfit homozygotes. The short-lived balanced states arising during adaptive walks should be mostly invisible to current scans for long-term balancing selection. Instead, they should leave signatures of incomplete selective sweeps, which do appear to be common in many species. Our results also raise the possibility that balancing selection, as a natural consequence of frequent adaptation, might play a more prominent role among the forces maintaining genetic variation than is commonly recognized.

    View details for DOI 10.1073/pnas.1114573108

    View details for Web of Science ID 000298289400081

    View details for PubMedID 22143780

    View details for PubMedCentralID PMC3251125

  • High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes GENOME RESEARCH Markova-Raina, P., Petrov, D. 2011; 21 (6): 863-874


    We investigate the effect of aligner choice on inferences of positive selection using site-specific models of molecular evolution. We find that independently of the choice of aligner, the rate of false positives is unacceptably high. Our study is a whole-genome analysis of all protein-coding genes in 12 Drosophila genomes annotated in either all 12 species (~6690 genes) or in the six melanogaster group species. We compare six popular aligners: PRANK, T-Coffee, ClustalW, ProbCons, AMAP, and MUSCLE, and find that the aligner choice strongly influences the estimates of positive selection. Differences persist when we use (1) different stringency cutoffs, (2) different selection inference models, (3) alignments with or without gaps, and/or additional masking, (4) per-site versus per-gene statistics, (5) closely related melanogaster group species versus more distant 12 Drosophila genomes. Furthermore, we find that these differences are consequential for downstream analyses such as determination of over/under-represented GO terms associated with positive selection. Visual analysis indicates that most sites inferred as positively selected are, in fact, misaligned at the codon level, resulting in false positive rates of 48%-82%. PRANK, which has been reported to outperform other aligners in simulations, performed best in our empirical study as well. Unfortunately, PRANK still had a high, and unacceptable for most applications, false positives rate of 50%-55%. We identify misannotations and indels, many of which appear to be located in disordered protein regions, as primary culprits for the high misalignment-related error levels and discuss possible workaround approaches to this apparently pervasive problem in genome-wide evolutionary analyses.

    View details for DOI 10.1101/gr.115949.110

    View details for Web of Science ID 000291153400006

    View details for PubMedID 21393387

    View details for PubMedCentralID PMC3106319

  • Population Genomics of Transposable Elements in Drosophila melanogaster MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Fiston-Lavier, A., Lipatov, M., Lenkov, K., Gonzalez, J. 2011; 28 (5): 1633-1644


    Transposable elements (TEs) are the primary contributors to the genome bulk in many organisms and are major players in genome evolution. A clear and thorough understanding of the population dynamics of TEs is therefore essential for full comprehension of the eukaryotic genome evolution and function. Although TEs in Drosophila melanogaster have received much attention, population dynamics of most TE families in this species remains entirely unexplored. It is not clear whether the same population processes can account for the population behaviors of all TEs in Drosophila or whether, as has been suggested previously, different orders behave according to very different rules. In this work, we analyzed population frequencies for a large number of individual TEs (755 TEs) in five North American and one sub-Saharan African D. melanogaster populations (75 strains in total). These TEs have been annotated in the reference D. melanogaster euchromatic genome and have been sampled from all three major orders (non-LTR, LTR, and TIR) and from all families with more than 20 TE copies (55 families in total). We find strong evidence that TEs in Drosophila across all orders and families are subject to purifying selection at the level of ectopic recombination. We showed that strength of this selection varies predictably with recombination rate, length of individual TEs, and copy number and length of other TEs in the same family. Importantly, these rules do not appear to vary across orders. Finally, we built a statistical model that considered only individual TE-level (such as the TE length) and family-level properties (such as the copy number) and were able to explain more than 40% of the variation in TE frequencies in D. melanogaster.

    View details for DOI 10.1093/molbev/msq337

    View details for Web of Science ID 000289841500011

    View details for PubMedID 21172826

    View details for PubMedCentralID PMC3080135

  • T-lex: a program for fast and accurate assessment of transposable element presence using next-generation sequencing data NUCLEIC ACIDS RESEARCH Fiston-Lavier, A., Carrigan, M., Petrov, D. A., Gonzalez, J. 2011; 39 (6)


    Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into annotation of TE copies in reference genomes. The sequencing cost reduction and the newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence/absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex detecting presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected with 100% sensitivity and 97% specificity. We show that even at low levels of coverage T-lex produces accurate results for TE copies that it can identify reliably but that the rate of 'no data' calls increases as the coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used in any genome provided the availability of the reference genome, individual TE copy annotation and NGS data.

    View details for DOI 10.1093/nar/gkq1291

    View details for Web of Science ID 000289166400004

    View details for PubMedID 21177644

    View details for PubMedCentralID PMC3064797

  • Faster than Neutral Evolution of Constrained Sequences: The Complex Interplay of Mutational Biases and Weak Selection GENOME BIOLOGY AND EVOLUTION Lawrie, D. S., Petrov, D. A., Messer, P. W. 2011; 3: 383-395


    Comparative genomics has become widely accepted as the major framework for the ascertainment of functionally important regions in genomes. The underlying paradigm of this approach is that most of the functional regions are assumed to be under selective constraint, which in turn reduces the rate of evolution relative to neutrality. This assumption allows detection of functional regions through sequence conservation. However, constraint does not always lead to sequence conservation. When purifying selection is weak and mutation is biased, constrained regions can even evolve faster than neutral sequences and thus can appear to be under positive selection. Moreover, conservation estimates depend also on the orientation of selection relative to mutational biases and can vary over time. In the light of recent data of the ubiquity of mutational biases and weak selective forces, these effects should reduce the power of conservation analyses to define functional regions using comparative genomics data. We argue that the estimation of true mutational biases and the use of explicit evolutionary models are essential to improve methods inferring the action of natural selection and functionality in genome sequences.

    View details for DOI 10.1093/gbe/evr032

    View details for Web of Science ID 000295693200004

    View details for PubMedID 21498884

    View details for PubMedCentralID PMC3101017

  • Drosophila melanogaster recombination rate calculator GENE Fiston-Lavier, A., Singh, N. D., Lipatov, M., Petrov, D. A. 2010; 463 (1-2): 18-20


    Recombination rate is a key evolutionary parameter that determines the degree to which sites are linked. Estimating recombination rates is thus of crucial importance for population genetic and molecular evolutionary studies. We present here a user-friendly web-based tool that can be used to retrieve recombination rate estimates for single and/or multiple loci in the Drosophila melanogaster genome given a user-defined choice of the genome release. We used the Marey map approach that is based on comparing the genetic and physical maps to infer recombination rates along the major chromosomes of the D. melanogaster genome. Our implementation of this approach is based on building third-order polynomials which are used to interpolate recombination rates at all points on the chromosome except for telomeric and centromeric regions in which such polynomials are known to provide particularly poor estimation.

    View details for DOI 10.1016/j.gene.2010.04.015

    View details for Web of Science ID 000280751600003

    View details for PubMedID 20452408

  • Evidence That Mutation Is Universally Biased towards AT in Bacteria PLOS GENETICS Hershberg, R., Petrov, D. A. 2010; 6 (9)


    Mutation is the engine that drives evolution and adaptation forward in that it generates the variation on which natural selection acts. Mutation is a random process that nevertheless occurs according to certain biases. Elucidating mutational biases and the way they vary across species and within genomes is crucial to understanding evolution and adaptation. Here we demonstrate that clonal pathogens that evolve under severely relaxed selection are uniquely suitable for studying mutational biases in bacteria. We estimate mutational patterns using sequence datasets from five such clonal pathogens belonging to four diverse bacterial clades that span most of the range of genomic nucleotide content. We demonstrate that across different types of sites and in all four clades mutation is consistently biased towards AT. This is true even in clades that have high genomic GC content. In all studied cases the mutational bias towards AT is primarily due to the high rate of C/G to T/A transitions. These results suggest that bacterial mutational biases are far less variable than previously thought. They further demonstrate that variation in nucleotide content cannot stem entirely from variation in mutational biases and that natural selection and/or a natural selection-like process such as biased gene conversion strongly affect nucleotide content.
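
    The argument that mutational bias alone cannot explain high GC content can be illustrated with a short calculation. Under mutation alone, in a simple two-class model, the expected equilibrium GC content is the AT-to-GC rate divided by the sum of both rates. The rates below are hypothetical, and `equilibrium_gc` is an illustrative name, not a function from the paper.

```python
def equilibrium_gc(rate_at_to_gc, rate_gc_to_at):
    """Equilibrium GC fraction expected from mutation alone (two-class model)."""
    return rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at)

# Hypothetical AT-biased mutation: GC->AT changes occur twice as often as AT->GC
gc_eq = equilibrium_gc(rate_at_to_gc=1.0, rate_gc_to_at=2.0)

# A genome with, say, 65% GC sits far above this expectation, so some other
# force (selection or biased gene conversion) would be needed to maintain it.
observed_gc = 0.65
excess = observed_gc - gc_eq
```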

    View details for DOI 10.1371/journal.pgen.1001115

    View details for Web of Science ID 000282369200053

    View details for PubMedID 20838599

    View details for PubMedCentralID PMC2936535

  • Evidence that Adaptation in Drosophila Is Not Limited by Mutation at Single Sites PLOS GENETICS Karasov, T., Messer, P. W., Petrov, D. A. 2010; 6 (6)


    Adaptation in eukaryotes is generally assumed to be mutation-limited because of small effective population sizes. This view is difficult to reconcile, however, with the observation that adaptation to anthropogenic changes, such as the introduction of pesticides, can occur very rapidly. Here we investigate adaptation at a key insecticide resistance locus (Ace) in Drosophila melanogaster and show that multiple simple and complex resistance alleles evolved quickly and repeatedly within individual populations. Our results imply that the current effective population size of modern D. melanogaster populations is likely to be substantially larger (≥100-fold) than commonly believed. This discrepancy arises because estimates of the effective population size are generally derived from levels of standing variation and thus reveal long-term population dynamics dominated by sharp, even if infrequent, bottlenecks. The short-term effective population sizes relevant for strong adaptation, on the other hand, might be much closer to census population sizes. Adaptation in Drosophila may therefore not be limited by waiting for mutations at single sites, and complex adaptive alleles can be generated quickly without fixation of intermediate states. Adaptive events should also commonly involve the simultaneous rise in frequency of independently generated adaptive mutations. These so-called soft sweeps have very distinct effects on the linked neutral polymorphisms compared to the standard hard sweeps in mutation-limited scenarios. Methods for the mapping of adaptive mutations or association mapping of evolutionarily relevant mutations may thus need to be reconsidered.
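
    Why a 100-fold larger population size matters for mutation limitation can be seen from a back-of-the-envelope sketch. In a diploid population of N individuals, a specific point mutation with per-site rate mu arises on average 2*N*mu times per generation. The mutation rate and population sizes below are hypothetical round numbers, not estimates from the paper.

```python
def mutation_supply(n_individuals, mu_per_site):
    """Expected new copies of one specific point mutation per generation (diploid)."""
    return 2.0 * n_individuals * mu_per_site

def generations_per_mutation(n_individuals, mu_per_site):
    """Average waiting time, in generations, between new copies of that mutation."""
    return 1.0 / mutation_supply(n_individuals, mu_per_site)

MU = 1e-9          # hypothetical per-site mutation rate per generation
N_SMALL = 1e6      # hypothetical long-term effective population size
N_LARGE = 1e8      # hypothetical short-term size, 100-fold larger

wait_small = generations_per_mutation(N_SMALL, MU)
wait_large = generations_per_mutation(N_LARGE, MU)
```

    With these toy numbers the waiting time shrinks 100-fold in the larger population, which is the sense in which adaptation stops being limited by the supply of mutations at single sites.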

    View details for DOI 10.1371/journal.pgen.1000924

    View details for Web of Science ID 000279805200003

    View details for PubMedID 20585551

    View details for PubMedCentralID PMC2887467

  • Genome-Wide Patterns of Adaptation to Temperate Environments Associated with Transposable Elements in Drosophila PLOS GENETICS Gonzalez, J., Karasov, T. L., Messer, P. W., Petrov, D. A. 2010; 6 (4)


    Investigating spatial patterns of loci under selection can give insight into how populations evolved in response to selective pressures and can provide monitoring tools for detecting the impact of environmental changes on populations. Drosophila is a particularly good model to study adaptation to environmental heterogeneity since it is a tropical species that originated in sub-Saharan Africa and has only recently colonized the rest of the world. There is strong evidence for the adaptive role of Transposable Elements (TEs) in the evolution of Drosophila, and TEs might play an important role specifically in adaptation to temperate climates. In this work, we analyzed the frequency of a set of putatively adaptive and putatively neutral TEs in populations with contrasting climates that were collected near the endpoints of two known latitudinal clines in Australia and North America. The contrasting results obtained for putatively adaptive and putatively neutral TEs and the consistency of the patterns between continents strongly suggest that putatively adaptive TEs are involved in adaptation to temperate climates. We integrated information on population behavior, possible environmental selective agents, and both molecular and functional information of the TEs and their nearby genes to infer the plausible phenotypic consequences of these insertions. We conclude that adaptation to temperate environments is widespread in Drosophila and that TEs play a significant role in this adaptation. It is remarkable that such a diverse set of TEs located next to a diverse set of genes are consistently adaptive to temperate climate-related factors. We argue that reverse population genomic analyses, such as the one described in this work, are necessary to arrive at a comprehensive picture of adaptation.

    View details for DOI 10.1371/journal.pgen.1000905

    View details for Web of Science ID 000277354200022

    View details for PubMedID 20386746

    View details for PubMedCentralID PMC2851572

  • Adaptive Evolution of Pelvic Reduction in Sticklebacks by Recurrent Deletion of a Pitx1 Enhancer SCIENCE Chan, Y. F., Marks, M. E., Jones, F. C., Villarreal, G., Shapiro, M. D., Brady, S. D., Southwick, A. M., Absher, D. M., Grimwood, J., Schmutz, J., Myers, R. M., Petrov, D., Jonsson, B., Schluter, D., Bell, M. A., Kingsley, D. M. 2010; 327 (5963): 302-305


    The molecular mechanisms underlying major phenotypic changes that have evolved repeatedly in nature are generally unknown. Pelvic loss in different natural populations of threespine stickleback fish has occurred through regulatory mutations deleting a tissue-specific enhancer of the Pituitary homeobox transcription factor 1 (Pitx1) gene. The high prevalence of deletion mutations at Pitx1 may be influenced by inherent structural features of the locus. Although Pitx1 null mutations are lethal in laboratory animals, Pitx1 regulatory mutations show molecular signatures of positive selection in pelvic-reduced populations. These studies illustrate how major expression and morphological changes can arise from single mutational leaps in natural populations, producing new adaptive alleles via recurrent regulatory alterations in a key developmental control gene.

    View details for DOI 10.1126/science.1182213

    View details for Web of Science ID 000273629700034

    View details for PubMedID 20007865

    View details for PubMedCentralID PMC3109066

  • Relaxed Purifying Selection and Possibly High Rate of Adaptation in Primate Lineage-Specific Genes GENOME BIOLOGY AND EVOLUTION Cai, J. J., Petrov, D. A. 2010; 2: 393-409


    Genes in the same organism vary in the time since their evolutionary origin. Without horizontal gene transfer, young genes are necessarily restricted to a few closely related species, whereas old genes can be broadly distributed across the phylogeny. It has been shown that young genes evolve faster than old genes; however, the evolutionary forces responsible for this pattern remain obscure. Here, we classify human-chimp protein-coding genes into different age classes, according to the breadth of their phylogenetic distribution. We estimate the strength of purifying selection and the rate of adaptive selection for genes in different age classes. We find that older genes carry fewer and less frequent nonsynonymous single-nucleotide polymorphisms than younger genes suggesting that older genes experience a stronger purifying selection at the protein-coding level. We infer the distribution of fitness effects of new deleterious mutations and find that older genes have proportionally more slightly deleterious mutations and fewer nearly neutral mutations than younger genes. To investigate the role of adaptive selection of genes in different age classes, we determine the selection coefficient (gamma = 2N(e)s) of genes using the MKPRF approach and estimate the ratio of the rate of adaptive nonsynonymous substitution to synonymous substitution (omega(A)) using the DoFE method. Although the proportion of positively selected genes (gamma > 0) is significantly higher in younger genes, we find no correlation between omega(A) and gene age. Collectively, these results provide strong evidence that younger genes are subject to weaker purifying selection and more tenuous evidence that they also undergo adaptive evolution more frequently.

    View details for DOI 10.1093/gbe/evq019

    View details for Web of Science ID 000280480000035

    View details for PubMedID 20624743

    View details for PubMedCentralID PMC2997544

  • Broker Genes in Human Disease GENOME BIOLOGY AND EVOLUTION Cai, J. J., Borenstein, E., Petrov, D. A. 2010; 2: 815-825


    Genes that underlie human disease are important subjects of systems biology research. In the present study, we demonstrate that Mendelian and complex disease genes have distinct and consistent protein-protein interaction (PPI) properties. We show that five different network properties can be reduced to two independent metrics when applied to the human PPI network. These two metrics largely coincide with the degree (number of connections) and the clustering coefficient (the number of connections among the neighbors of a particular protein). We demonstrate that disease genes have simultaneously unusually high degree and unusually low clustering coefficient. Such genes can be described as brokers in that they connect many proteins that would not be connected otherwise. We show that these results are robust to the effect of gene age and inspection bias variation. Notably, genes identified in genome-wide association study (GWAS) have network patterns that are almost indistinguishable from the network patterns of nondisease genes and significantly different from the network patterns of complex disease genes identified through non-GWAS means. This suggests either that GWAS focused on a distinct set of diseases associated with an unusual set of genes or that mapping of GWAS-identified single nucleotide polymorphisms onto the causally affected neighboring genes is error prone.
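
    The two network metrics named above can be sketched on a toy graph. The clustering coefficient here is the standard local version: the fraction of possible links among a node's neighbors that actually exist. The graph, node names, and helper functions are illustrative assumptions, not data or code from the study.

```python
from itertools import combinations

def degree(graph, node):
    """Number of direct interaction partners of a node."""
    return len(graph[node])

def clustering(graph, node):
    """Local clustering coefficient: share of neighbor pairs that are linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))

# Toy undirected network stored as adjacency sets.
# "broker" links four proteins that do not interact with each other;
# "x", "y", "z" form a tightly knit triangle.
toy = {
    "broker": {"a", "b", "c", "d"},
    "a": {"broker"}, "b": {"broker"}, "c": {"broker"}, "d": {"broker"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}

broker_profile = (degree(toy, "broker"), clustering(toy, "broker"))
triangle_profile = (degree(toy, "x"), clustering(toy, "x"))
```

    The broker node combines a high degree with a clustering coefficient of zero, which is the pattern the abstract attributes to disease genes.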

    View details for DOI 10.1093/gbe/evq064

    View details for Web of Science ID 000291467300023

    View details for PubMedID 20937604

    View details for PubMedCentralID PMC2988523

  • Time for DNA Disclosure SCIENCE Krane, D. E., Bahn, V., Balding, D., Barlow, B., Cash, H., Desportes, B. L., D'Eustachio, P., Devlin, K., Doom, T. E., Dror, I., Ford, S., Funk, C., Gilder, J., Hampikian, G., Inman, K., Jamieson, A., Kent, P. E., Koppl, R., Kornfield, I., Krimsky, S., Mnookin, J., Mueller, L., Murphy, E., Paoletti, D. R., Petrov, D. A., Raymer, M., Risinger, D. M., Roth, A., Rudin, N., Shields, W., Siegel, J. A., Slatkin, M., Song, Y. S., Speed, T., Spiegelman, C., Sullivan, P., Swienton, A. R., Tarpey, T., Thompson, W. C., Ungvarsky, E., Zabell, S. 2009; 326 (5960): 1631-1632

    View details for Web of Science ID 000272839000027

    View details for PubMedID 20019271

  • The adaptive role of transposable elements in the Drosophila genome GENE Gonzalez, J., Petrov, D. A. 2009; 448 (2): 124-133


    Transposable elements (TEs) are short DNA sequences with the capacity to move between different sites in the genome. This ability provides them with the capacity to mutate the genome in many different ways, from subtle regulatory mutations to gross genomic rearrangements. The potential adaptive significance of TEs was recognized by those involved in their initial discovery although it was hotly debated afterwards. For more than two decades, TEs were considered to be intragenomic parasites leading to almost exclusively detrimental effects to the host genome. The sequencing of the Drosophila melanogaster genome provided an unprecedented opportunity to study TEs and led to the identification of the first TE-induced adaptations in this species. These studies were followed by a systematic genome-wide search for adaptive insertions that made it possible to infer, for the first time, that TEs contribute substantially to adaptive evolution. This study also revealed that there are at least twice as many TE-induced adaptations that remain to be identified. To gain a better understanding of the adaptive role of TEs in the genome we clearly need to (i) identify as many adaptive TEs as possible in a range of Drosophila species as well as (ii) carry out in-depth investigations of the effects of adaptive TEs on as many phenotypes as possible.

    View details for DOI 10.1016/j.gene.2009.06.008

    View details for Web of Science ID 000271972200004

    View details for PubMedID 19555747

    View details for PubMedCentralID PMC2784284

  • MITEs-The Ultimate Parasites SCIENCE Gonzalez, J., Petrov, D. 2009; 325 (5946): 1352-1353

    View details for DOI 10.1126/science.1179556

    View details for Web of Science ID 000269699100025

    View details for PubMedID 19745141

  • A Recent Adaptive Transposable Element Insertion Near Highly Conserved Developmental Loci in Drosophila melanogaster MOLECULAR BIOLOGY AND EVOLUTION Gonzalez, J., Macpherson, J. M., Petrov, D. A. 2009; 26 (9): 1949-1961


    A recent genomewide screen identified 13 transposable elements that are likely to have been adaptive during or after the spread of Drosophila melanogaster out of Africa. One of these insertions, Bari-Juvenile hormone epoxy hydrolase (Bari-Jheh), was associated with the selective sweep of its flanking neutral variation and with reduction of expression of one of its neighboring genes: Jheh3. Here, we provide further evidence that Bari-Jheh insertion is adaptive. We delimit the extent of the selective sweep and show that Bari-Jheh is the only mutation linked to the sweep. Bari-Jheh also lowers the expression of its other flanking gene, Jheh2. Subtle consequences of Bari-Jheh insertion on life-history traits are consistent with the effects of reduced expression of the Jheh genes. Finally, we analyze molecular evolution of Jheh genes in both the long- and the short-term and conclude that Bari-Jheh appears to be a very rare adaptive event in the history of these genes. We discuss the implications of these findings for the detection and understanding of adaptation.

    View details for DOI 10.1093/molbev/msp107

    View details for Web of Science ID 000269001500003

    View details for PubMedID 19458110

    View details for PubMedCentralID PMC2734154

  • From trait to base pairs: Parallel evolution of pelvic reduction in three-spined sticklebacks occurs by repeated deletion of a tissue-specific pelvic enhancer at Pitx1 Chan, Y., Villarreal, G., Marks, M., Shapiro, M., Jones, F., Petrov, D., Dickson, M., Southwick, A., Absher, D., Grimwood, J., Schmutz, J., Myers, R., Jonsson, B., Schluter, D., Bell, M., Kingsley, D. ELSEVIER SCIENCE BV. 2009: S14–S15

  • General Rules for Optimal Codon Choice PLOS GENETICS Hershberg, R., Petrov, D. A. 2009; 5 (7)


    Different synonymous codons are favored by natural selection for translation efficiency and accuracy in different organisms. The rules governing the identities of favored codons in different organisms remain obscure. In fact, it is not known whether such rules exist or whether favored codons are chosen randomly in evolution in a process akin to a series of frozen accidents. Here, we study this question by identifying for the first time the favored codons in 675 bacteria, 52 archaea, and 10 fungi. We use a number of tests to show that the identified codons are indeed likely to be favored and find that across all studied organisms the identity of favored codons tracks the GC content of the genomes. Once the effect of the genomic GC content on selectively favored codon choice is taken into account, additional universal amino acid specific rules governing the identity of favored codons become apparent. Our results provide for the first time a clear set of rules governing the evolution of selectively favored codon usage. Based on these results, we describe a putative scenario for how evolutionary shifts in the identity of selectively favored codons can occur without even temporary weakening of natural selection for codon bias.

    View details for DOI 10.1371/journal.pgen.1000556

    View details for Web of Science ID 000269219500033

    View details for PubMedID 19593368

    View details for PubMedCentralID PMC2700274

  • Pervasive Natural Selection in the Drosophila Genome? PLOS GENETICS Sella, G., Petrov, D. A., Przeworski, M., Andolfatto, P. 2009; 5 (6)


    Over the past four decades, the predominant view of molecular evolution saw little connection between natural selection and genome evolution, assuming that the functionally constrained fraction of the genome is relatively small and that adaptation is sufficiently infrequent to play little role in shaping patterns of variation within and even between species. Recent evidence from Drosophila, reviewed here, suggests that this view may be invalid. Analyses of genetic variation within and between species reveal that much of the Drosophila genome is under purifying selection, and thus of functional importance, and that a large fraction of coding and noncoding differences between species are adaptive. The findings further indicate that, in Drosophila, adaptations may be both common and strong enough that the fate of neutral mutations depends on their chance linkage to adaptive mutations as much as on the vagaries of genetic drift. The emerging evidence has implications for a wide variety of fields, from conservation genetics to bioinformatics, and presents challenges to modelers and experimentalists alike.

    View details for DOI 10.1371/journal.pgen.1000495

    View details for Web of Science ID 000268444600003

    View details for PubMedID 19503600

    View details for PubMedCentralID PMC2684638

  • Molecular Evolution of the Testis TAFs of Drosophila MOLECULAR BIOLOGY AND EVOLUTION Li, V. C., Davis, J. C., Lenkov, K., Bolival, B., Fuller, M. T., Petrov, D. A. 2009; 26 (5): 1103-1116


    The basal transcription machinery is responsible for initiating transcription at core promoters. During metazoan evolution, its components have expanded in number and diversified to increase the complexity of transcriptional regulation in tissues and developmental stages. To explore the evolutionary events and forces underlying this diversification, we analyzed the evolution of the Drosophila testis TAFs (TBP-associated factors), paralogs of TAFs from the basal transcription factor TFIID that are essential for normal transcription during spermatogenesis of a large set of specific genes involved in terminal differentiation of male gametes. There are five testis-specific TAFs in Drosophila, each expressed only in primary spermatocytes and each a paralog of a different generally expressed TFIID subunit. An examination of the presence of paralogs across taxa as well as molecular clock dating indicates that all five testis TAFs likely arose within a span of approximately 38 My 63-250 Ma by independent duplication events from their generally expressed paralogs. Furthermore, the evolution of the testis TAFs has been rapid, with apparent further accelerations in multiple Drosophila lineages. Analysis of between-species divergence and intraspecies polymorphism indicates that the major forces of evolution on these genes have been reduced purifying selection, pervasive positive selection, and coevolution. Other genes that exhibit similar patterns of evolution in the Drosophila lineages are also characterized by enriched expression in the testis, suggesting that the pervasive positive selection acting on the tTAFs is likely to be related to their expression in the testis.

    View details for DOI 10.1093/molbev/msp030

    View details for Web of Science ID 000265274000014

    View details for PubMedID 19244474

    View details for PubMedCentralID PMC2727373

  • Inferring the Strength of Selection in Drosophila under Complex Demographic Models MOLECULAR BIOLOGY AND EVOLUTION Gonzalez, J., Macpherson, J. M., Messer, P. W., Petrov, D. A. 2009; 26 (3): 513-526


    Transposable elements (TEs) constitute a substantial fraction of the genomes of many species, and it is thus important to understand their population dynamics. The strength of natural selection against TEs is a key parameter in understanding these dynamics. In principle, the strength of selection can be inferred from the frequencies of a sample of TEs. However, complicated demographic histories, such as found in Drosophila melanogaster, could lead to a substantial distortion of the TE frequency distribution compared with that expected for a panmictic, constant-sized population. The current methodology for the estimation of selection intensity acting against TEs does not take into account demographic history and might generate erroneous estimates especially for TE families under weak selection. Here, we develop a flexible maximum likelihood methodology that explicitly accounts both for demographic history and for the ascertainment biases of identifying TEs. We apply this method to the newly generated frequency data of the BS family of non-long terminal repeat retrotransposons in D. melanogaster in concert with two recent models of the demographic history of the species to infer the intensity of selection against this family. We find the estimate to differ substantially compared with a prior estimate that was made assuming a model of constant population size. Further, we find there to be relatively little information about selection intensity present in the derived non-African frequency data and that the ancestral African subpopulation is much more informative in this respect. These findings highlight the importance of accounting for demographic history and bear on study design for the inference of selection coefficients generally.

    View details for DOI 10.1093/molbev/msn270

    View details for Web of Science ID 000263420900005

    View details for PubMedID 19033258

    View details for PubMedCentralID PMC2767090

  • Similarly Strong Purifying Selection Acts on Human Disease Genes of All Evolutionary Ages GENOME BIOLOGY AND EVOLUTION Cai, J. J., Borenstein, E., Chen, R., Petrov, D. A. 2009; 1: 131-144


    A number of studies have shown that recently created genes differ from the genes created in the deep evolutionary past in many aspects. Here, we determined the age of emergence and propensity for gene loss (PGL) of all human protein-coding genes and compared disease genes with non-disease genes in terms of their evolutionary rate, strength of purifying selection, mRNA expression, and genetic redundancy. Older non-disease genes that are less prone to loss have been evolving 1.5- to 3-fold more slowly between humans and chimps than young non-disease genes, whereas Mendelian disease genes have been evolving very slowly regardless of their ages and PGL. Complex disease genes showed an intermediate pattern. Disease genes also have higher mRNA expression heterogeneity across multiple tissues than non-disease genes regardless of age and PGL. Young and middle-aged disease genes have fewer similar paralogs than non-disease genes of the same age. We reasoned that genes were more likely to be involved in human disease if they were under a strong functional constraint, expressed heterogeneously across tissues, and lacked genetic redundancy. Young human genes that have been evolving under strong constraint between humans and chimps might also be enriched for genes that encode important primate or even human-specific functions.

    View details for DOI 10.1093/gbe/evp013

    View details for Web of Science ID 000275269200014

    View details for PubMedID 20333184

    View details for PubMedCentralID PMC2817408

  • Pervasive Hitchhiking at Coding and Regulatory Sites in Humans PLOS GENETICS Cai, J. J., Macpherson, J. M., Sella, G., Petrov, D. A. 2009; 5 (1)


    Much effort and interest have focused on assessing the importance of natural selection, particularly positive natural selection, in shaping the human genome. Although scans for positive selection have identified candidate loci that may be associated with positive selection in humans, such scans do not indicate whether adaptation is frequent in general in humans. Studies based on the reasoning of the McDonald-Kreitman test, which, in principle, can be used to evaluate the extent of positive selection, suggested that adaptation is detectable in the human genome but that it is less common than in Drosophila or Escherichia coli. Both positive and purifying natural selection at functional sites should affect levels and patterns of polymorphism at linked nonfunctional sites. Here, we search for these effects by analyzing patterns of neutral polymorphism in humans in relation to the rates of recombination, functional density, and functional divergence with chimpanzees. We find that the levels of neutral polymorphism are lower in the regions of lower recombination and in the regions of higher functional density or divergence. These correlations persist after controlling for the variation in GC content, density of simple repeats, selective constraint, mutation rate, and depth of sequencing coverage. We argue that these results are most plausibly explained by the effects of natural selection at functional sites (either recurrent selective sweeps or background selection) on the levels of linked neutral polymorphism. Natural selection at both coding and regulatory sites appears to affect linked neutral polymorphism, reducing neutral polymorphism by 6% genome-wide and by 11% in the gene-rich half of the human genome. These findings suggest that the effects of natural selection at linked sites cannot be ignored in the study of neutral human polymorphism.
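
    The McDonald-Kreitman reasoning mentioned above can be sketched with a commonly used estimator of the fraction of adaptive nonsynonymous substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps), where D and P are divergence and polymorphism counts at nonsynonymous (n) and synonymous (s) sites. This is a minimal illustration with made-up counts, not the analysis performed in the paper, and `mk_alpha` is a hypothetical helper name.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimated fraction of nonsynonymous substitutions fixed by positive selection.

    dn, ds: fixed nonsynonymous and synonymous differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within a species
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be nonzero for the estimator to be defined")
    return 1.0 - (ds * pn) / (dn * ps)

# Toy counts: an excess of nonsynonymous divergence relative to polymorphism
alpha_excess = mk_alpha(dn=40, ds=100, pn=20, ps=100)

# Toy counts: divergence and polymorphism ratios match, as expected under neutrality
alpha_neutral = mk_alpha(dn=20, ds=100, pn=20, ps=100)
```

    Slightly deleterious polymorphisms tend to bias this simple estimator downward, which is one reason more elaborate methods exist.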

    View details for DOI 10.1371/journal.pgen.1000336

    View details for Web of Science ID 000266221100019

    View details for PubMedID 19148272

    View details for PubMedCentralID PMC2613029

  • High Functional Diversity in Mycobacterium tuberculosis Driven by Genetic Drift and Human Demography PLOS BIOLOGY Hershberg, R., Lipatov, M., Small, P. M., Sheffer, H., Niemann, S., Homolka, S., Roach, J. C., Kremer, K., Petrov, D. A., Feldman, M. W., Gagneux, S. 2008; 6 (12): 2658-2671


    Mycobacterium tuberculosis infects one third of the human world population and kills someone every 15 seconds. For more than a century, scientists and clinicians have been distinguishing between the human- and animal-adapted members of the M. tuberculosis complex (MTBC). However, all human-adapted strains of MTBC have traditionally been considered to be essentially identical. We surveyed sequence diversity within a global collection of strains belonging to MTBC using seven megabase pairs of DNA sequence data. We show that the members of MTBC affecting humans are more genetically diverse than generally assumed, and that this diversity can be linked to human demographic and migratory events. We further demonstrate that these organisms are under extremely reduced purifying selection and that, as a result of increased genetic drift, much of this genetic diversity is likely to have functional consequences. Our findings suggest that the current increases in human population, urbanization, and global travel, combined with the population genetic characteristics of M. tuberculosis described here, could contribute to the emergence and spread of drug-resistant tuberculosis.

    View details for DOI 10.1371/journal.pbio.0060311

    View details for Web of Science ID 000261913700009

    View details for PubMedID 19090620

    View details for PubMedCentralID PMC2602723

  • High Rate of Recent Transposable Element-Induced Adaptation in Drosophila melanogaster PLOS BIOLOGY Gonzalez, J., Lenkov, K., Lipatov, M., Macpherson, J. M., Petrov, D. A. 2008; 6 (10): 2109-2129


    Although transposable elements (TEs) are known to be potent sources of mutation, their contribution to the generation of recent adaptive changes has never been systematically assessed. In this work, we conduct a genome-wide screen for adaptive TE insertions in Drosophila melanogaster that have taken place during or after the spread of this species out of Africa. We determine population frequencies of 902 of the 1,572 TEs in Release 3 of the D. melanogaster genome and identify a set of 13 putatively adaptive TEs. These 13 TEs increased in population frequency sharply after the spread out of Africa. We argue that many of these TEs are in fact adaptive by demonstrating that the regions flanking five of these TEs display signatures of partial selective sweeps. Furthermore, we show that eight out of the 13 putatively adaptive elements show population frequency heterogeneity consistent with these elements playing a role in adaptation to temperate climates. We conclude that TEs have contributed considerably to recent adaptive evolution (one TE-induced adaptation every 200-1,250 y). The majority of these adaptive insertions are likely to be involved in regulatory changes. Our results also suggest that TE-induced adaptations arise more often from standing variants than from new mutations. Such a high rate of TE-induced adaptation is inconsistent with the number of fixed TEs in the D. melanogaster genome, and we discuss possible explanations for this discrepancy.

    View details for DOI 10.1371/journal.pbio.0060251

    View details for Web of Science ID 000260423900008

    View details for PubMedID 18942889

    View details for PubMedCentralID PMC2570423

  • Pervasive and Persistent Redundancy among Duplicated Genes in Yeast PLOS GENETICS Dean, E. J., Davis, J. C., Davis, R. W., Petrov, D. A. 2008; 4 (7)


    The loss of functional redundancy is the key process in the evolution of duplicated genes. Here we systematically assess the extent of functional redundancy among a large set of duplicated genes in Saccharomyces cerevisiae. We quantify growth rate in rich medium for a large number of S. cerevisiae strains that carry single and double deletions of duplicated and singleton genes. We demonstrate that duplicated genes can maintain substantial redundancy for extensive periods of time following duplication (approximately 100 million years). We find high levels of redundancy among genes duplicated both via the whole genome duplication and via smaller scale duplications. Further, we see no evidence that two duplicated genes together contribute to fitness in rich medium substantially beyond that of their ancestral progenitor gene. We argue that duplicate genes do not often evolve to behave like singleton genes even after very long periods of time.

    View details for DOI 10.1371/journal.pgen.1000113

    View details for Web of Science ID 000260410600025

    View details for PubMedID 18604285

    View details for PubMedCentralID PMC2440806

  • Nonadaptive explanations for signatures of partial selective sweeps in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Macpherson, J. M., Gonzalez, J., Witten, D. M., Davis, J. C., Rosenberg, N. A., Hirsh, A. E., Petrov, D. A. 2008; 25 (6): 1025-1042


    A beneficial mutation that has nearly but not yet fixed in a population produces a characteristic haplotype configuration, called a partial selective sweep. Whether nonadaptive processes might generate similar haplotype configurations has not been extensively explored. Here, we consider 5 population genetic data sets taken from regions flanking high-frequency transposable elements in North American strains of Drosophila melanogaster, each of which appears to be consistent with the expectations of a partial selective sweep. We use coalescent simulations to explore whether incorporation of the species' demographic history, purifying selection against the element, or suppression of recombination caused by the element could generate putatively adaptive haplotype configurations. Whereas most of the data sets would be rejected as nonneutral under the standard neutral null model, only the data set for which there is strong external evidence in support of an adaptive transposition appears to be nonneutral under the more complex null model and in particular when demography is taken into account. High-frequency, derived mutations from a recently bottlenecked population, such as we study here, are of great interest to evolutionary genetics in the context of scans for adaptive events; we discuss the broader implications of our findings in this context.

    View details for DOI 10.1093/molbev/msn007

    View details for Web of Science ID 000255758200004

    View details for PubMedID 18199829

    View details for PubMedCentralID PMC3299400

  • Selection on Codon Bias ANNUAL REVIEW OF GENETICS Hershberg, R., Petrov, D. A. 2008; 42: 287-299


    In a wide variety of organisms, synonymous codons are used with different frequencies, a phenomenon known as codon bias. Population genetic studies have shown that synonymous sites are under weak selection and that codon bias is maintained by a balance between selection, mutation, and genetic drift. It appears that the major cause for selection on codon bias is that certain preferred codons are translated more accurately and/or efficiently. However, additional and sometimes maybe even contradictory selective forces appear to affect codon usage as well. In this review, we discuss the current understanding of the ways in which natural selection participates in the creation and maintenance of codon bias. We also raise several open questions: (i) Is natural selection weak independently of the level of codon bias? It is possible that selection for preferred codons is weak only when codon bias approaches equilibrium and may be quite strong on genes with codon bias levels that are much lower and/or above equilibrium. (ii) What determines the identity of the major codons? (iii) How do shifts in codon bias occur? (iv) What is the exact nature of selection on codon bias? We discuss these questions in depth and offer some ideas on how they can be addressed using a combination of computational and experimental analyses.

    View details for DOI 10.1146/annurev.genet.42.110807.091442

    View details for Web of Science ID 000261767000014

    View details for PubMedID 18983258

  • Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in drosophila GENETICS Macpherson, J. M., Sella, G., Davis, J. C., Petrov, D. A. 2007; 177 (4): 2083-2099


    The effect of recurrent selective sweeps is a spatially heterogeneous reduction in neutral polymorphism throughout the genome. The pattern of reduction depends on the selective advantage and recurrence rate of the sweeps. Because many adaptive substitutions responsible for these sweeps also contribute to nonsynonymous divergence, the spatial distribution of nonsynonymous divergence also reflects the distribution of adaptive substitutions. Thus, the spatial correspondence between neutral polymorphism and nonsynonymous divergence may be especially informative about the process of adaptation. Here we study this correspondence using genomewide polymorphism data from Drosophila simulans and the divergence between D. simulans and D. melanogaster. Focusing on highly recombining portions of the autosomes, at a spatial scale appropriate to the study of selective sweeps, we find that neutral polymorphism is both lower and, as measured by a new statistic Q(S), less homogeneous where nonsynonymous divergence is higher and that the spatial structure of this correlation is best explained by the action of strong recurrent selective sweeps. We introduce a method to infer, from the spatial correspondence between polymorphism and divergence, the rate and selective strength of adaptation. Our results independently confirm a high rate of adaptive substitution (approximately 1/3000 generations) and newly suggest that many adaptations are of surprisingly great selective effect (approximately 1%), reducing the effective population size by approximately 15% even in highly recombining regions of the genome.

    View details for DOI 10.1534/genetics.107.080226

    View details for Web of Science ID 000251949800011

    View details for PubMedID 18073425

    View details for PubMedCentralID PMC2219485

  • Similar levels of X-linked and autosomal nucleotide variation in African and non-African populations of Drosophila melanogaster BMC EVOLUTIONARY BIOLOGY Singh, N. D., Macpherson, J. M., Jensen, J. D., Petrov, D. A. 2007; 7


    Levels of molecular diversity in Drosophila have repeatedly been shown to be higher in ancestral, African populations than in derived, non-African populations. This pattern holds for both coding and noncoding regions for a variety of molecular markers including single nucleotide polymorphisms and microsatellites. Comparisons of X-linked and autosomal diversity have yielded results largely dependent on population of origin. In an attempt to further elucidate patterns of sequence diversity in Drosophila melanogaster, we studied nucleotide variation at putatively nonfunctional X-linked and autosomal loci in sub-Saharan African and North American strains of D. melanogaster. We combine our experimental results with data from previous studies of molecular polymorphism in this species. We confirm that levels of diversity are consistently higher in African versus North American strains. The relative reduction of diversity for X-linked and autosomal loci in the derived, North American strains depends heavily on the studied loci. While the compiled dataset, comprised primarily of regions within or in close proximity to genes, shows a much more severe reduction of diversity on the X chromosome compared to autosomes in derived strains, the dataset consisting of intergenic loci located far from genes shows very similar reductions of diversities for X-linked and autosomal loci in derived strains. In addition, levels of diversity at X-linked and autosomal loci in the presumably ancestral African population are more similar than expected under an assumption of neutrality and equal numbers of breeding males and females. We show that simple demographic scenarios under assumptions of neutral theory cannot explain all of the observed patterns of molecular diversity. We suggest that the simplest model is a population bottleneck that retains an ancestral female-biased sex ratio, coupled with higher rates of positive selection at X-linked loci in close proximity to genes specifically in derived, non-African populations.

    View details for DOI 10.1186/1471-2148-7-202

    View details for Web of Science ID 000251904900001

    View details for PubMedID 17961244

    View details for PubMedCentralID PMC2164965

  • The mode and tempo of genome size evolution in eukaryotes GENOME RESEARCH Oliver, M. J., Petrov, D., Ackerly, D., Falkowski, P., Schofield, O. M. 2007; 17 (5): 594-601


    Eukaryotic genome size varies over five orders of magnitude; however, the distribution is strongly skewed toward small values. Genome size is highly correlated to a number of phenotypic traits, suggesting that the relative lack of large genomes in eukaryotes is due to selective removal. Using phylogenetic contrasts, we show that the rate of genome size evolution is proportional to genome size, with the fastest rates occurring in the largest genomes. This trend is evident across the 20 major eukaryotic clades analyzed, indicating that over long time scales, proportional change is the dominant and universal mode of genome-size evolution in eukaryotes. Our results reveal that the evolution of eukaryotic genome size can be described by a simple proportional model of evolution. This model explains the skewed distribution of eukaryotic genome sizes without invoking strong selection against large genomes.

    View details for DOI 10.1101/gr.6096207

    View details for Web of Science ID 000246297900006

    View details for PubMedID 17420184

    View details for PubMedCentralID PMC1855170

  • Evolution of gene function on the X chromosome versus the autosomes. Genome dynamics Singh, N. D., Petrov, D. A. 2007; 3: 101-118


    Sex chromosomes have arisen from autosomes many times over the course of evolution. This process generates chromosomal heteromorphy between the sexes, which has important implications for the evolution of coding and noncoding sequences on the sex chromosomes versus the autosomes. The formation of sex chromosomes from autosomes involves a reduction in gene dosage, which can modify properties of selection pressure on sex-linked genes. This transition also generates differences in the effective population size and dominance characteristics of novel mutations on the sex chromosome versus the autosomes. All of these changes may affect both patterns of in situ gene evolution and the rates of interchromosomal gene duplication and movement. Here we present a synopsis of the current understanding of the origin of sex chromosomes, theoretical context for differences in rates and patterns of molecular evolution on the X chromosome versus the autosomes, as well as a summary of empirical molecular evolutionary data from Drosophila and mammalian genomes.

    View details for DOI 10.1159/000107606

    View details for PubMedID 18753787

  • Reduced selection leads to accelerated gene loss in Shigella GENOME BIOLOGY Hershberg, R., Tang, H., Petrov, D. A. 2007; 8 (8)


    Obligate pathogenic bacteria lose more genes relative to facultative pathogens, which, in turn, lose more genes than free-living bacteria. It was suggested that the increased gene loss in obligate pathogens may be due to a reduction in the effectiveness of purifying selection. Less attention has been given to the causes of increased gene loss in facultative pathogens. We examined in detail the rate of gene loss in two groups of facultative pathogenic bacteria: pathogenic Escherichia coli, and Shigella. We show that Shigella strains are losing genes at an accelerated rate relative to pathogenic E. coli. We demonstrate that a genome-wide reduction in the effectiveness of selection contributes to the observed increase in the rate of gene loss in Shigella. When compared with their closely related pathogenic E. coli relatives, the more niche-limited Shigella strains appear to be losing genes at a significantly accelerated rate. A genome-wide reduction in the effectiveness of purifying selection plays a role in creating this observed difference. Our results demonstrate that differences in the effectiveness of selection contribute to differences in rate of gene loss in facultative pathogenic bacteria. We discuss how the lifestyle and pathogenicity of Shigella may alter the effectiveness of selection, thus influencing the rate of gene loss.

    View details for DOI 10.1186/gb-2007-8-8-r164

    View details for Web of Science ID 000253938500016

    View details for PubMedID 17686180

    View details for PubMedCentralID PMC2374995

  • Minor shift in background substitutional patterns in the Drosophila saltans and willistoni lineages is insufficient to explain GC content of coding sequences BMC BIOLOGY Singh, N. D., Arndt, P. F., Petrov, D. A. 2006; 4


    Several lines of evidence suggest that codon usage in the Drosophila saltans and D. willistoni lineages has shifted towards a less frequent use of GC-ending codons. Introns in these lineages show a parallel shift toward a lower GC content. These patterns have been alternatively ascribed to either a shift in mutational patterns or changes in the definition of preferred and unpreferred codons in these lineages. To gain additional insight into this question, we quantified background substitutional patterns in the saltans/willistoni group using inactive copies of a novel, Q-like retrotransposable element. We demonstrate that the pattern of background substitutions in the saltans/willistoni lineage has shifted to a significant degree, primarily due to changes in mutational biases. These differences predict a lower equilibrium GC content in the genomes of the saltans/willistoni species compared with that in the D. melanogaster species group. The magnitude of the difference can readily account for changes in intronic GC content, but it appears insufficient to explain changes in codon usage within the saltans/willistoni lineage. We suggest that the observed changes in codon usage in the saltans/willistoni clade reflect either lineage-specific changes in the definitions of preferred and unpreferred codons, or a weaker selective pressure on codon bias in this lineage.

    View details for DOI 10.1186/1741-7007-4-37

    View details for Web of Science ID 000241651800001

    View details for PubMedID 17049096

    View details for PubMedCentralID PMC1626080

  • Fitness cost of LINE-1 (L1) activity in humans PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Boissinot, S., Davis, J., Entezam, A., Petrov, D., Furano, A. V. 2006; 103 (25): 9590-9594


    The self-replicating LINE-1 (L1) retrotransposon family is the dominant retrotransposon family in mammals and has generated 30-40% of their genomes. Active L1 families are present in modern mammals but the important question of whether these currently active families affect the genetic fitness of their hosts has not been addressed. This issue is of particular relevance to humans as Homo sapiens contains the active L1 Ta1 subfamily of the human specific Ta (L1Pa1) L1 family. Although DNA insertions generated by the Ta1 subfamily can cause genetic defects in current humans, these are relatively rare, and it is not known whether Ta1-generated inserts or any other property of Ta1 elements have been sufficiently deleterious to reduce the fitness of humans. Here we show that full-length (FL) Ta1 elements, but not the truncated Ta1 elements or SINE (Alu) insertions generated by Ta1 activity, were subject to negative selection. Thus, one or more properties unique to FL L1 elements constitute a genetic burden for modern humans. We also found that the FL Ta1 elements became more deleterious as the expansion of Ta1 has proceeded. Because this expansion is ongoing, the Ta1 subfamily almost certainly continues to decrease the fitness of modern humans.

    View details for DOI 10.1073/pnas.0603334103

    View details for Web of Science ID 000238660400038

    View details for PubMedID 16766655

  • A novel method distinguishes between mutation rates and fixation biases in patterns of single-nucleotide substitution JOURNAL OF MOLECULAR EVOLUTION Lipatov, M., Arndt, P. F., Hwa, T., Petrov, D. A. 2006; 62 (2): 168-175


    Analysis of the genome-wide patterns of single-nucleotide substitution reveals that the human GC content structure is out of equilibrium. The substitutions are decreasing the overall GC content (GC), at the same time making its range narrower. Investigation of single-nucleotide polymorphisms (SNPs) revealed that presently the decrease in GC content is due to a uniform mutational preference for A:T pairs, while its projected range is due to a variability in the fixation preference for G:C pairs. However, it is important to determine whether lessons learned about evolutionary processes operating at the present time (that is reflected in the SNP data) can be extended back into the evolutionary past. We describe here a new approach to this problem that utilizes the juxtaposition of forward and reverse substitution rates to determine the relative importance of variability in mutation rates and fixation probabilities in shaping long-term substitutional patterns. We use this approach to demonstrate that the forces shaping GC content structure over the recent past (since the appearance of the SNPs) extend all the way back to the mammalian radiation approximately 90 million years ago. In addition, we find a small but significant effect that has not been detected in the SNP data: relatively high rates of C:G→A:T germline mutation in low-GC regions of the genome.

    View details for DOI 10.1007/s00239-005-0207-z

    View details for Web of Science ID 000235866300005

    View details for PubMedID 16362483

  • A novel method distinguishes between mutation rates and fixation biases in patterns of single-nucleotide substitution (vol 62, pg 62, 2006) JOURNAL OF MOLECULAR EVOLUTION Lipatov, M., Arndt, P. F., Hwa, T., Petrov, D. A. 2006; 62 (2): 245
  • Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome BMC BIOLOGY Lipatov, M., Lenkov, K., Petrov, D. A., Bergman, C. M. 2005; 3


    Recent analysis of the human and mouse genomes has shown that a substantial proportion of protein coding genes and cis-regulatory elements contain transposable element (TE) sequences, implicating TE domestication as a mechanism for the origin of genetic novelty. To understand the general role of TE domestication in eukaryotic genome evolution, it is important to assess the acquisition of functional TE sequences by host genomes in a variety of different species, and to understand in greater depth the population dynamics of these mutational events. Using an in silico screen for host genes that contain TE sequences, we identified a set of 63 mature "chimeric" transcripts supported by expressed sequence tag (EST) evidence in the Drosophila melanogaster genome. We found a paucity of chimeric TEs relative to expectations derived from non-chimeric TEs, indicating that the majority (approximately 80%) of TEs that generate chimeric transcripts are deleterious and are not observed in the genome sequence. Using a pooled-PCR strategy to assay the presence of gene-TE chimeras in wild strains, we found that over half of the observed chimeric TE insertions are restricted to the sequenced strain, and approximately 15% are found at high frequencies in North American D. melanogaster populations. Estimated population frequencies of chimeric TEs did not differ significantly from non-chimeric TEs, suggesting that the distribution of fitness effects for the observed subset of chimeric TEs is indistinguishable from the general set of TEs in the genome sequence. In contrast to mammalian genomes, we found that fewer than 1% of Drosophila genes produce mRNAs that include bona fide TE sequences. This observation can be explained by the results of our population genomic analysis, which indicates that most potential chimeric TEs in D. melanogaster are deleterious but that a small proportion may contribute to the evolution of novel gene sequences such as nested or intercalated gene structures. Our results highlight the need to establish the fixity of putative cases of TE domestication identified using genome sequences in order to demonstrate their functional importance, and reveal that the contribution of TE domestication to genome evolution may vary drastically among animal taxa.

    View details for DOI 10.1186/1741-7007-3-24

    View details for Web of Science ID 000236370200001

    View details for PubMedID 16283942

    View details for PubMedCentralID PMC1308810

  • Do disparate mechanisms of duplication add similar genes to the genome? TRENDS IN GENETICS Davis, J. C., Petrov, D. A. 2005; 21 (10): 548-551


    Gene duplication is the fundamental source of new genes. Biases in duplication have profound implications for the dynamics of gene content during evolution. In this article, we compare genes arising from whole genome duplication (WGD), smaller scale duplication (SSD) and singletons in Saccharomyces cerevisiae. Our results demonstrate that genes duplicated by WGD and SSD are similarly biased with respect to codon bias and evolutionary rate, although differing significantly in their functional constituency.

    View details for DOI 10.1016/j.tig.2005.07.008

    View details for Web of Science ID 000232444400005

    View details for PubMedID 16098632

  • Codon bias and noncoding GC content correlate negatively with recombination rate on the Drosophila X chromosome JOURNAL OF MOLECULAR EVOLUTION Singh, N. D., Davis, J. C., Petrov, D. A. 2005; 61 (3): 315-324


    The patterns and processes of molecular evolution may differ between the X chromosome and the autosomes in Drosophila melanogaster. This may in part be due to differences in the effective population size between the two chromosome sets and in part to the hemizygosity of the X chromosome in Drosophila males. These and other factors may lead to differences both in the gene complements of the X and the autosomes and in the properties of the genes residing on those chromosomes. Here we show that codon bias and recombination rate are correlated strongly and negatively on the X chromosome, and that this correlation cannot be explained by indirect relationships with other known determinants of codon bias. This is in dramatic contrast to the weak positive correlation found on the autosomes. We explored possible explanations for these patterns, which required a comprehensive analysis of the relationships among multiple genetic properties such as protein length and expression level. This analysis highlights conserved features of coding sequence evolution on the X and the autosomes and illuminates interesting differences between these two chromosome sets.

    View details for DOI 10.1007/s00239-004-0287-1

    View details for Web of Science ID 000231732400004

    View details for PubMedID 16044248

  • X-linked genes evolve higher codon bias in Drosophila and Caenorhabditis GENETICS Singh, N. D., Davis, J. C., Petrov, D. A. 2005; 171 (1): 145-155


    Comparing patterns of molecular evolution between autosomes and sex chromosomes (such as X and W chromosomes) can provide insight into the forces underlying genome evolution. Here we investigate patterns of codon bias evolution on the X chromosome and autosomes in Drosophila and Caenorhabditis. We demonstrate that X-linked genes have significantly higher codon bias compared to autosomal genes in both Drosophila and Caenorhabditis. Furthermore, genes that become X-linked evolve higher codon bias gradually, over tens of millions of years. We provide several lines of evidence that this elevation in codon bias is due exclusively to their chromosomal location and not to any other property of X-linked genes. We present two possible explanations for these observations. One possibility is that natural selection is more efficient on the X chromosome due to effective haploidy of the X chromosomes in males and persistently low effective numbers of reproducing males compared to that of females. Alternatively, X-linked genes might experience stronger natural selection for higher codon bias as a result of maladaptive reduction of their dosage engendered by the loss of the Y-linked homologs.

    View details for DOI 10.1534/genetics.105.043497

    View details for Web of Science ID 000232494400014

    View details for PubMedID 15965246

    View details for PubMedCentralID PMC1456507

  • Pesticide resistance via transposition-mediated adaptive gene truncation in Drosophila SCIENCE Aminetzach, Y. T., Macpherson, J. M., Petrov, D. A. 2005; 309 (5735): 764-767


    To study adaptation, it is essential to identify multiple adaptive mutations and to characterize their molecular, phenotypic, selective, and ecological consequences. Here we describe a genomic screen for adaptive insertions of transposable elements in Drosophila. Using a pilot application of this screen, we have identified an adaptive transposable element insertion, which truncates a gene and apparently generates a functional protein in the process. The insertion of this transposable element confers increased resistance to an organophosphate pesticide and has spread in D. melanogaster recently.

    View details for DOI 10.1126/science.1112699

    View details for Web of Science ID 000230938200048

    View details for PubMedID 16051794

  • Substantial regional variation in substitution rates in the human genome: Importance of GC content, gene density, and telomere-specific effects JOURNAL OF MOLECULAR EVOLUTION Arndt, P. F., Hwa, T., Petrov, D. A. 2005; 60 (6): 748-U28


    This study presents the first global, 1-Mbp-level analysis of patterns of nucleotide substitutions along the human lineage. The study is based on the analysis of a large amount of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. This analysis revealed substantial and consistent variability of rates of substitution, with the variability ranging up to twofold among different regions. The rates of substitutions of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates, suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp's. Our analysis revealed that GC-content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution appears very different near telomeres compared to the rest of the genome and cannot be explained by the genome-wide correlations of the substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC-content of the sequences that are within 10-15 Mbp away from the telomere.

    View details for DOI 10.1007/s00239-004-0222-5

    View details for Web of Science ID 000230077700006

    View details for PubMedID 15959677

  • Protein evolution in the context of Drosophila development JOURNAL OF MOLECULAR EVOLUTION Davis, J. C., Brandman, O., Petrov, D. A. 2005; 60 (6): 774-U42


    The tempo at which a protein evolves depends not only on the rate at which mutations arise but also on the selective effects that those mutations have at the organismal level. It is intuitive that proteins functioning during different stages of development may be predisposed to having mutations of different selective effects. For example, it has been hypothesized that changes to proteins expressed during early development should have larger phenotypic consequences because later stages depend on them. Conversely, changes to proteins expressed much later in development should have smaller consequences at the organismal level. Here we assess whether proteins expressed at different times during Drosophila development vary systematically in their rates of evolution. We find that proteins expressed early in development and particularly during mid-late embryonic development evolve unusually slowly. In addition, proteins expressed in adult males show an elevated evolutionary rate. These two trends are independent of each other and cannot be explained by peculiar rates of mutation or levels of codon bias. Moreover, the observed patterns appear to hold across several functional classes of genes, although the exact developmental time of the slowest protein evolution differs among each class. We discuss our results in connection with data on the evolution of development.

    View details for DOI 10.1007/s00239-004-0241-2

    View details for Web of Science ID 000230077700008

    View details for PubMedID 15909223

  • Genomic heterogeneity of background substitutional patterns in Drosophila melanogaster GENETICS Singh, N. D., Arndt, P. F., Petrov, D. A. 2005; 169 (2): 709-722


    Mutation is the underlying force that provides the variation upon which evolutionary forces can act. It is important to understand how mutation rates vary within genomes and how the probabilities of fixation of new mutations vary as well. If substitutional processes across the genome are heterogeneous, then examining patterns of coding sequence evolution without taking these underlying variations into account may be misleading. Here we present the first rigorous test of substitution rate heterogeneity in the Drosophila melanogaster genome using almost 1500 nonfunctional fragments of the transposable element DNAREP1_DM. Not only do our analyses suggest that substitutional patterns in heterochromatic and euchromatic sequences are different, but also they provide support in favor of a recombination-associated substitutional bias toward G and C in this species. The magnitude of this bias is entirely sufficient to explain recombination-associated patterns of codon usage on the autosomes of the D. melanogaster genome. We also document a bias toward lower GC content in the pattern of small insertions and deletions (indels). In addition, the GC content of noncoding DNA in Drosophila is higher than would be predicted on the basis of the pattern of nucleotide substitutions and small indels. However, we argue that the fast turnover of noncoding sequences in Drosophila makes it difficult to assess the importance of the GC biases in nucleotide substitutions and small indels in shaping the base composition of noncoding sequences.

    View details for DOI 10.1534/genetics.104.032250

    View details for Web of Science ID 000227697200018

    View details for PubMedID 15520267

    View details for PubMedCentralID PMC1449091

  • Elevated evolutionary rates in the laboratory strain of Saccharomyces cerevisiae PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gu, Z. L., David, L., Petrov, D., Jones, T., Davis, R. W., Steinmetz, L. M. 2005; 102 (4): 1092-1097


    By using the maximum likelihood method, we made a genome-wide comparison of the evolutionary rates in the lineages leading to the laboratory strain (S288c) and a wild strain (YJM789) of Saccharomyces cerevisiae and found that genes in the laboratory strain tend to evolve faster than in the wild strain. The pattern of elevated evolution suggests that relaxation of selection intensity is the dominant underlying reason, which is consistent with recurrent bottlenecks in the S. cerevisiae laboratory strain population. Supporting this conclusion are the following observations: (i) the increases in nonsynonymous evolutionary rate occur for genes in all functional categories; (ii) most of the synonymous evolutionary rate increases in S288c occur in genes with strong codon usage bias; (iii) genes under stronger negative selection have a larger increase in nonsynonymous evolutionary rate; and (iv) more genes with adaptive evolution were detected in the laboratory strain, but they do not account for the majority of the increased evolution. The present discoveries suggest that experimental and possible industrial manipulations of the laboratory strain of yeast could have had a strong effect on the genetic makeup of this model organism. Furthermore, they imply an evolution of laboratory model organisms away from their wild counterparts, questioning the relevancy of the models especially when extensive laboratory cultivation has occurred. In addition, these results shed light on the evolution of livestock and crop species that have been under human domestication for years.

    View details for DOI 10.1073/pnas.0409159102

    View details for Web of Science ID 000226617900026

    View details for PubMedID 15647350

    View details for PubMedCentralID PMC545845

  • The large genome constraint hypothesis: Evolution, ecology and phenotype 2nd Plant Genome Size Workshop and Discussion Meeting Knight, C. A., Molinari, N. A., Petrov, D. A. OXFORD UNIV PRESS. 2005: 177–90


    If large genomes are truly saturated with unnecessary 'junk' DNA, it would seem natural that there would be costs associated with accumulation and replication of this excess DNA. Here we examine the available evidence to support this hypothesis, which we term the 'large genome constraint'. We examine the large genome constraint at three scales: evolution, ecology, and the plant phenotype. In evolution, we tested the hypothesis that plant lineages with large genomes are diversifying more slowly. We found that genera with large genomes are less likely to be highly speciose -- suggesting a large genome constraint on speciation. In ecology, we found that species with large genomes are under-represented in extreme environments -- again suggesting a large genome constraint for the distribution and abundance of species. Ultimately, if these ecological and evolutionary constraints are real, the genome size effect must be expressed in the phenotype and confer selective disadvantages. Therefore, in phenotype, we review data on the physiological correlates of genome size, and present new analyses involving maximum photosynthetic rate and specific leaf area. Most notably, we found that species with large genomes have reduced maximum photosynthetic rates -- again suggesting a large genome constraint on plant performance. Finally, we discuss whether these phenotypic correlations may help explain why species with large genomes are trimmed from the evolutionary tree and have restricted ecological distributions. Our review tentatively supports the large genome constraint hypothesis.

    View details for DOI 10.1093/aob/mci011

    View details for Web of Science ID 000226370900011

    View details for PubMedID 15596465

  • Enhancer choice in cis and in trans in Drosophila melanogaster: Role of the promoter GENETICS Morris, J. R., Petrov, D. A., Lee, A. M., Wu, C. T. 2004; 167 (4): 1739-1747


    Eukaryotic enhancers act over very long distances, yet still show remarkable specificity for their own promoter. To better understand mechanisms underlying this enhancer-promoter specificity, we used transvection to analyze enhancer choice between two promoters, one located in cis to the enhancer and the other in trans to the enhancer, at the yellow gene of Drosophila melanogaster. Previously, we demonstrated that enhancers at yellow prefer to act on the cis-linked promoter, but that mutation of core promoter elements in the cis-linked promoter releases enhancers to act in trans. Here, we address the mechanism by which these elements affect enhancer choice. We consider and explicitly test three models that are based on promoter competency, promoter pairing, and promoter identity. Through targeted gene replacement of the endogenous yellow gene, we show that competency of the cis-linked promoter is a key parameter in the cis-trans choice of an enhancer. In fact, complete replacement of the yellow promoter with both TATA-containing and TATA-less heterologous promoters maintains enhancer action in cis.

    View details for DOI 10.1534/genetics.104.026955

    View details for Web of Science ID 000223720300018

    View details for PubMedID 15342512

    View details for PubMedCentralID PMC1471007

  • Rapid sequence turnover at an intergenic locus in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Singh, N. D., Petrov, D. A. 2004; 21 (4): 670-680


    Closely related species of Drosophila tend to have similar genome sizes. The strong imbalance in favor of small deletions relative to insertions implies that the unconstrained DNA in Drosophila is unlikely to be passively inherited from even closely related ancestors, and yet most DNA in Drosophila genomes is intergenic and potentially unconstrained. In an attempt to investigate the maintenance of this intergenic DNA, we studied the evolution of an intergenic locus on the fourth chromosome of the Drosophila melanogaster genome. This 1.2-kb locus is marked by two distinct, large insertion events: a nuclear transposition of a mitochondrial sequence and a transposition of a nonautonomous DNA transposon DNAREP1_DM. Because we could trace the evolutionary histories of these sequences, we were able to reconstruct the length evolution of this region in some detail. We sequenced this locus in all four species of the D. melanogaster species complex: D. melanogaster, D. simulans, D. sechellia, and D. mauritiana. Although this locus is similar in size in these four species, less than 10% of the sequence from the most recent common ancestor remains in D. melanogaster and all of its sister species. This region appears to have increased in size through several distinct insertions in the ancestor of the D. melanogaster species complex and has been shrinking since the split of these lineages. In addition, we found no evidence suggesting that the size of this locus has been maintained over evolutionary time; these results are consistent with the model of a dynamic equilibrium between persistent DNA loss through small deletions and more sporadic DNA gain through less frequent but longer insertions. The apparent stability of genome size in Drosophila may belie very rapid sequence turnover at intergenic loci.

    View details for DOI 10.1093/molbev/msh060

    View details for Web of Science ID 000220685200006

    View details for PubMedID 14739245

  • Preferential duplication of conserved proteins in eukaryotic genomes PLOS BIOLOGY Davis, J. C., Petrov, D. A. 2004; 2 (3): 318-326


    A central goal in genome biology is to understand the origin and maintenance of genic diversity. Over evolutionary time, each gene's contribution to the genic content of an organism depends not only on its probability of long-term survival, but also on its propensity to generate duplicates that are themselves capable of long-term survival. In this study we investigate which types of genes are likely to generate functional and persistent duplicates. We demonstrate that genes that have generated duplicates in the C. elegans and S. cerevisiae genomes were 25%-50% more constrained prior to duplication than the genes that failed to leave duplicates. We further show that conserved genes have been consistently prolific in generating duplicates for hundreds of millions of years in these two species. These findings reveal one way in which gene duplication shapes the content of eukaryotic genomes. Our finding that the set of duplicate genes is biased has important implications for genome-scale studies.

    View details for DOI 10.1371/journal.pbio.0020055

    View details for Web of Science ID 000220512000008

    View details for PubMedID 15024414

    View details for PubMedCentralID PMC368158

  • Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation MOLECULAR BIOLOGY AND EVOLUTION Arndt, P. F., Petrov, D. A., Hwa, T. 2003; 20 (11): 1887-1896


    Differences in the regional substitution patterns in the human genome created patterns of large-scale variation of base composition known as genomic isochores. To gain insight into the origin of the genomic isochores, we develop a maximum-likelihood approach to determine the history of substitution patterns in the human genome. This approach utilizes the vast amount of repetitive sequence deposited in the human genome over the past approximately 250 Myr. Using this approach, we estimate the frequencies of seven types of substitutions: the four transversions, two transitions, and the methyl-assisted transition of cytosine in CpG. Comparing substitutional patterns in repetitive elements of various ages, we reconstruct the history of the base-substitutional process in the different isochores for the past 250 Myr. At around 90 MYA (around the time of the mammalian radiation), we find an abrupt fourfold to eightfold increase of the cytosine transition rate in CpG pairs compared with that of the reptilian ancestor. Further analysis of nucleotide substitutions in regions with different GC content reveals concurrent changes in the substitutional patterns. Although the substitutional pattern was dependent on the regional GC content in such ways that it preserved the regional GC content before the mammalian radiation, it lost this dependence afterward. The substitutional pattern changed from an isochore-preserving to an isochore-degrading one. We conclude that isochores have been established before the radiation of the eutherian mammals and have been subject to the process of homogenization since then.

    View details for DOI 10.1093/molbev/msg204

    View details for Web of Science ID 000186618200017

    View details for PubMedID 12885958

  • Rates of DNA duplication and mitochondrial DNA insertion in the human genome JOURNAL OF MOLECULAR EVOLUTION Bensasson, D., Feldman, M. W., Petrov, D. A. 2003; 57 (3): 343-354


    The hundreds of mitochondrial pseudogenes in the human nuclear genome sequence (numts) constitute an excellent system for studying and dating DNA duplications and insertions. These pseudogenes are associated with many complete mitochondrial genome sequences and through those with a good fossil record. By comparing individual numts with primate and other mammalian mitochondrial genome sequences, we estimate that these numts arose continuously over the last 58 million years. Our pairwise comparisons between numts suggest that most human numts arose from different mitochondrial insertion events and not by DNA duplication within the nuclear genome. The nuclear genome appears to accumulate mtDNA insertions at a rate high enough to predict within-population polymorphism for the presence/absence of many recent mtDNA insertions. Pairwise analysis of numts and their flanking DNA produces an estimate for the DNA duplication rate in humans of 2.2 × 10⁻⁹ per numt per year. Thus, a nucleotide site is about as likely to be involved in a duplication event as it is to change by point substitution. This estimate of the rate of DNA duplication of noncoding DNA is based on sequences that are not in duplication hotspots, and is close to the rate reported for functional genes in other species.

    View details for DOI 10.1007/s00239-003-2485-7

    View details for Web of Science ID 000184992800012

    View details for PubMedID 14629044

  • Size matters: Non-LTR retrotransposable elements and ectopic recombination in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Aminetzach, Y. T., Davis, J. C., Bensasson, D., Hirsh, A. E. 2003; 20 (6): 880-892


    The Drosophila melanogaster genome contains approximately 100 distinct families of transposable elements (TEs). In the euchromatic part of the genome, each family is present in a small number of copies (5-150 copies), with individual copies of TEs often present at very low frequencies in populations. This pattern is likely to reflect a balance between the inflow of TEs by transposition and the removal of TEs by natural selection. The nature of natural selection acting against TEs remains controversial. We provide evidence that selection against chromosome abnormalities caused by ectopic recombination limits the spread of some TEs. We also demonstrate for the first time that some TE families in the Drosophila euchromatin appear to be only marginally affected by purifying selection and contain many copies at high population frequencies. We argue that TEs in these families attain high population frequencies and even reach fixation as a result of low family-wide transposition rates leading to low TE copy numbers and consequently reduced strength of selection acting on individual TE copies. Fixation of TEs in these families should provide an upward pressure on the size of intergenic sequences counterbalancing rapid DNA loss through small deletions. Copy-number-dependent selection on TE families caused by ectopic recombination may also promote diversity among TEs in the Drosophila genome.

    View details for DOI 10.1093/molbev/msg102

    View details for Web of Science ID 000183138500004

    View details for PubMedID 12716993

  • Transposable elements in clonal lineages: lethal hangover from sex Nuzhdin, S. V., Petrov, D. A. OXFORD UNIV PRESS. 2003: 33–41

  • How intron splicing affects the deletion and insertion profile in Drosophila melanogaster GENETICS Ptak, S. E., Petrov, D. A. 2002; 162 (3): 1233-1244


    Studies of "dead-on-arrival" transposable elements in Drosophila melanogaster found that deletions outnumber insertions approximately 8:1 with a median size for deletions of approximately 10 bp. These results are consistent with the deletion and insertion profiles found in most other Drosophila pseudogenes. In contrast, a recent study of D. melanogaster introns found a deletion/insertion ratio of 1.35:1, with 84% of deletions being shorter than 10 bp. This discrepancy could be explained if deletions, especially long deletions, are more frequently strongly deleterious than insertions and are eliminated disproportionately from intron sequences. To test this possibility, we use analysis and simulations to examine how deletions and insertions of different lengths affect different components of splicing and determine the distribution of deletions and insertions that preserve the original exons. We find that, consistent with our predictions, longer deletions affect splicing at a much higher rate compared to insertions and short deletions. We also explore other potential constraints in introns and show that most of these also disproportionately affect large deletions. Altogether we demonstrate that constraints in introns may explain much of the difference in the pattern of deletions and insertions observed in Drosophila introns and pseudogenes.

    View details for Web of Science ID 000179739900020

    View details for PubMedID 12454069

  • SEGE: A database on 'intron less/single exonic' genes from eukaryotes BIOINFORMATICS Sakharkar, M. K., Kangueane, P., Petrov, D. A., Kolaskar, A. S., Subbiah, S. 2002; 18 (9): 1266-1267


    Eukaryotes have both 'intron containing' and 'intron less' genes. Several databases are available for 'intron containing' genes in eukaryotes. In this note, we describe a database for 'intron less' genes from eukaryotes. 'Intron less' eukaryotic genes having prokaryotic architecture will help to understand gene evolution in a much simpler way unlike 'intron containing' genes. SEGE is available at

    View details for Web of Science ID 000178001400015

    View details for PubMedID 12217920

  • Mutational equilibrium model of genome size evolution THEORETICAL POPULATION BIOLOGY Petrov, D. A. 2002; 61 (4): 531-544


    The paper describes a mutational equilibrium model of genome size evolution. This model is different from both adaptive and junk DNA models of genome size evolution in that it does not assume that genome size is maintained either by positive or stabilizing selection for the optimum genome size (as in adaptive theories) or by purifying selection against too much junk DNA (as in junk DNA theories). Instead the genome size is suggested to evolve until the loss of DNA through more frequent small deletions is equal to the rate of DNA gain through more frequent long insertions. The empirical basis for this theory is the finding of a strong correlation and of a clear power-function relationship between the rate of mutational DNA loss (per bp) through small deletions and genome size in animals. Genome size scales as a negative 1.3 power function of the deletion rate per nucleotide. Such a relationship is not predicted by either adaptive or junk DNA theories. However, if genome size is maintained at equilibrium by the balance of mutational forces, this empirical relationship can be readily accommodated. Within this framework, this finding would imply that the rate of DNA gain through large insertions scales as a quarter-power function of genome size. On this view, as genome size grows, the rate of growth through large insertions is increasing as a quarter power function of genome size and the rate of DNA loss through small deletions increases linearly, until eventually, at the stable equilibrium genome size value, rates of growth and loss equal each other. The current data also suggest that the long-term variation in genome size in animals is brought about to a significant extent by changes in the intrinsic rates of DNA loss through small deletions. Both the origin of mutational biases and the adaptive consequences of such a mode of evolution of genome size are discussed.

    View details for DOI 10.1006/tpbi.2002.1605

    View details for Web of Science ID 000177739500016

    View details for PubMedID 12167373
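
    The scaling argument in the abstract above can be worked through in a few lines. This is only a sketch under stated assumptions, not the paper's own derivation: the symbols G (genome size), d (per-bp rate of DNA loss through small deletions), c (a constant setting the scale of DNA gain) and \alpha (the exponent of DNA gain) are introduced here for illustration, and c is assumed not to vary with d across taxa. Only the linear loss term, the power-function gain term, and the exponent -1.3 are taken from the abstract.

    ```latex
    % Assumed functional forms (illustrative notation, not from the paper):
    %   loss through small deletions, linear in genome size:   L(G) = d\,G
    %   gain through large insertions, power function of G:    I(G) = c\,G^{\alpha}
    \begin{align}
      d\,G^{*} &= c\,(G^{*})^{\alpha}
        && \text{(equilibrium: loss equals gain)} \\
      G^{*} &= \left(\frac{c}{d}\right)^{1/(1-\alpha)} \;\propto\; d^{-1/(1-\alpha)}
        && \text{(solve for } G^{*}\text{)} \\
      \frac{1}{1-\alpha} &= 1.3
        && \text{(match the empirical } G \propto d^{-1.3}\text{)} \\
      \alpha &= 1 - \frac{1}{1.3} \approx 0.23
        && \text{(roughly a quarter power)}
    \end{align}
    % Stability: for \alpha < 1, I(G) > L(G) when G < G^{*} and I(G) < L(G)
    % when G > G^{*}, so genome size is pushed back toward G^{*}.
    ```

    Under these assumptions an exponent near 0.23 is consistent with the "quarter-power" description, and because the exponent is below 1 the equilibrium is stable, as the abstract states.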

  • DNA loss and evolution of genome size in Drosophila GENETICA Petrov, D. A. 2002; 115 (1): 81-91


    Mutation is often said to be random. Although it must be true that mutation is ignorant about the adaptive needs of the organism and thus is random relative to them as a rule, mutation is not truly random in other respects. Nucleotide substitutions, deletions, insertions, inversions, duplications and other types of mutation occur at different rates and are effected by different mechanisms. Moreover, the rates of different mutations vary from organism to organism. Differences in mutational biases, along with natural selection, could impact gene and genome evolution in important ways. For instance, several recent studies have suggested that differences in insertion/deletion biases lead to profound differences in the rate of DNA loss in animals and that this difference per se can lead to significant changes in genome size. In particular, Drosophila melanogaster appears to have a very high rate of deletions, a correspondingly high rate of DNA loss, and a very compact genome. To assess the validity of these studies we must first assess the validity of the measurements of indel biases themselves. Here I demonstrate the robustness of indel bias measurements in Drosophila, by comparing indel patterns in different types of nonfunctional sequences. The indel pattern and the high rate of DNA loss appear to be shared by all known nonfunctional sequences, both euchromatic and heterochromatic, transposable and non-transposable, repetitive and unique. Unfortunately all available nonfunctional sequences are untranscribed and thus effects of transcription on indel bias cannot be assessed. I also discuss in detail why it is unlikely that natural selection for or against DNA loss significantly affects current estimates of indel biases.

    View details for Web of Science ID 000176413900007

    View details for PubMedID 12188050

  • Gene galaxies in the maize genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Walbot, V., Petrov, D. A. 2001; 98 (15): 8163-8164

    View details for Web of Science ID 000169967000003

    View details for PubMedID 11459945

    View details for PubMedCentralID PMC37413

  • Genomic gigantism: DNA loss is slow in mountain grasshoppers MOLECULAR BIOLOGY AND EVOLUTION Bensasson, D., Petrov, D. A., Zhang, D. X., Hartl, D. L., Hewitt, G. M. 2001; 18 (2): 246-253


    Several studies have shown DNA loss to be inversely correlated with genome size in animals. These studies include a comparison between Drosophila and the cricket, Laupala, but there has been no assessment of DNA loss in insects with very large genomes. Podisma pedestris, the brown mountain grasshopper, has a genome over 100 times as large as that of Drosophila and 10 times as large as that of Laupala. We used 58 paralogous nuclear pseudogenes of mitochondrial origin to study the characteristics of insertion, deletion, and point substitution in P. pedestris and Italopodisma. In animals, these pseudogenes are "dead on arrival"; they are abundant in many different eukaryotes, and their mitochondrial origin simplifies the identification of point substitutions accumulated in nuclear pseudogene lineages. There appears to be a mononucleotide repeat within the 643-bp pseudogene sequence studied that acts as a strong hot spot for insertions or deletions (indels). Because the data for other insect species did not contain such an unusual region, hot spots were excluded from species comparisons. The rate of DNA loss relative to point substitution appears to be considerably and significantly lower in the grasshoppers studied than in Drosophila or Laupala. This suggests that the inverse correlation between genome size and the rate of DNA loss can be extended to comparisons between insects with large or gigantic genomes (i.e., Laupala and Podisma). The low rate of DNA loss implies that in grasshoppers, the accumulation of point mutations is a more potent force for obscuring ancient pseudogenes than their loss through indel accumulation, whereas the reverse is true for Drosophila. The main factor contributing to the difference in the rates of DNA loss estimated for grasshoppers, crickets, and Drosophila appears to be deletion size. Large deletions are relatively rare in Podisma and Italopodisma.

    View details for Web of Science ID 000166775100015

    View details for PubMedID 11158383

  • Evolution of genome size: new approaches to an old problem TRENDS IN GENETICS Petrov, D. A. 2001; 17 (1): 23-28


    Eukaryotic genomes come in a wide variety of sizes. Haploid DNA contents (C values) range > 80,000-fold without an apparent correlation with either the complexity of the organism or the number of genes. This puzzling observation, the C-value paradox, has remained a mystery for almost half a century, despite much progress in the elucidation of the structure and function of genomes. Here I argue that new approaches focussing on the genetic mechanisms that generate genome-size differences could shed much light on the evolution of genome size.

    View details for Web of Science ID 000168717900007

    View details for PubMedID 11163918

  • Pseudogene evolution and natural selection for a compact genome Symposium on Genetic Diversity and Evolution Petrov, D. A., Hartl, D. L. OXFORD UNIV PRESS INC. 2000: 221–27


    Pseudogenes are nonfunctional copies of protein-coding genes that are presumed to evolve without selective constraints on their coding function. They are of considerable utility in evolutionary genetics because, in the absence of selection, different types of mutations in pseudogenes should have equal probabilities of fixation. This theoretical inference justifies the estimation of patterns of spontaneous mutation from the analysis of patterns of substitutions in pseudogenes. Although it is possible to test whether pseudogene sequences evolve without constraints for their protein-coding function, it is much more difficult to ascertain whether pseudogenes may affect fitness in ways unrelated to their nucleotide sequence. Consider the possibility that a pseudogene affects fitness merely by increasing genome size. If a larger genome is deleterious--for example, because of increased energetic costs associated with genome replication and maintenance--then deletions, which decrease the length of a pseudogene, should be selectively advantageous relative to insertions or nucleotide substitutions. In this article we examine the implications of selection for genome size relative to small (1-400 bp) deletions, in light of empirical evidence pertaining to the size distribution of deletions observed in Drosophila and mammalian pseudogenes. There is a large difference in the deletion spectra between these organisms. We argue that this difference cannot easily be attributed to selection for overall genome size, since the magnitude of selection is unlikely to be strong enough to significantly affect the probability of fixation of small deletions in Drosophila.

    View details for Web of Science ID 000087190900007

    View details for PubMedID 10833048

  • Evidence for DNA loss as a determinant of genome size SCIENCE Petrov, D. A., Sangster, T. A., Johnston, J. S., Hartl, D. L., Shaw, K. L. 2000; 287 (5455): 1060-1062


    Eukaryotic genome sizes range over five orders of magnitude. This variation cannot be explained by differences in organismic complexity (the C value paradox). To test the hypothesis that some variation in genome size can be attributed to differences in the patterns of insertion and deletion (indel) mutations among organisms, this study examines the indel spectrum in Laupala crickets, which have a genome size 11 times larger than that of Drosophila. Consistent with the hypothesis, DNA loss is more than 40 times slower in Laupala than in Drosophila.

    View details for Web of Science ID 000085245400053

    View details for PubMedID 10669421

  • Genome size as a mutation-selection-drift process Fukuoka International Symposium of Population Genetics Lozovskaya, E. R., Nurminsky, D., Petrov, D. A., Hartl, D. L. GENETICS SOC JAPAN. 1999: 201–7


    A novel method for estimating neutral rates and patterns of DNA evolution in Drosophila takes advantage of the propensity of non-LTR retrotransposable elements to create nonfunctional, transpositionally inactive copies as a product of transposition. For many LINE elements, most copies present in a genome at any one time are nonfunctional "dead-on-arrival" (DOA) copies. Because these are off-shoots of active, transpositionally competent "master" lineages, in a gene tree of a LINE element from multiple samples from related species, the DOA lineages are expected to map to the terminal branches and the active lineages to the internal branches, the primary exceptions being when the sample includes DOA copies that are allelic or orthologous. Analysis of nucleotide substitutions and other changes along the terminal branches therefore allows estimation of the fixation process in the DOA copies, which are unconstrained with respect to protein coding; and under selective neutrality, the fixation process estimates the underlying mutational pattern. We have studied the retroelement Helena in Drosophila. An unexpectedly high rate of DNA loss was observed, yielding a half-life of unconstrained DNA sequences approximately 60-fold faster in Drosophila than in mammals. The high rate of DNA loss suggests a straightforward explanation of the seeming paradox that Drosophila has many fewer pseudogenes than found in mammalian species. Differential rates of deletion in different taxa might also contribute to the celebrated C-value paradox of why some closely related organisms can have very different DNA contents. New data presented here rule out the possibility that the transposition process itself is highly mutagenic, hence the observed linear relation between number of deletions and number of nucleotide substitutions is most easily explained by the hypothesis that both types of changes accumulate in unconstrained sequences over time.

    View details for Web of Science ID 000085786200003

    View details for PubMedID 10734601

  • Patterns of nucleotide substitution in Drosophila and mammalian genomes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Petrov, D. A., Hartl, D. L. 1999; 96 (4): 1475-1479


    To estimate patterns of molecular evolution of unconstrained DNA sequences, we used maximum parsimony to separate phylogenetic trees of a non-long terminal repeat retrotransposable element into either internal branches, representing mainly the constrained evolution of active lineages, or into terminal branches, representing mainly nonfunctional "dead-on-arrival" copies that are unconstrained by selection and evolve as pseudogenes. The pattern of nucleotide substitutions in unconstrained sequences is expected to be congruent with the pattern of point mutation. We examined the retrotransposon Helena in the Drosophila virilis species group (subgenus Drosophila) and the Drosophila melanogaster species subgroup (subgenus Sophophora). The patterns of point mutation are indistinguishable, suggesting considerable stability over evolutionary time (40-60 million years). The relative frequencies of different point mutations are unequal, but the "transition bias" results largely from an approximately 2-fold excess of G·C to A·T substitutions. Spontaneous mutation is biased toward A·T base pairs, with an expected mutational equilibrium of approximately 65% A + T (quite similar to that of long introns). These data also enable the first detailed comparison of patterns of point mutations in Drosophila and mammals. Although the patterns are different, all of the statistical significance comes from a much greater rate of G·C to A·T substitution in mammals, probably because of methylated cytosine "hotspots." When the G·C to A·T substitutions are discounted, the remaining differences are considerably reduced and not statistically significant.

    View details for Web of Science ID 000078698400056

    View details for PubMedID 9990048

    View details for PubMedCentralID PMC15487

  • Pseudogene evolution in Drosophila suggests a high rate of DNA loss MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Chao, Y. C., Stephenson, E. C., Hartl, D. L. 1998; 15 (11): 1562-1567

    View details for Web of Science ID 000076888400019

    View details for PubMedID 12572619

  • Genome size and intron size in Drosophila MOLECULAR BIOLOGY AND EVOLUTION Moriyama, E. N., Petrov, D. A., Hartl, D. L. 1998; 15 (6): 770-773

    View details for Web of Science ID 000073759400016

    View details for PubMedID 9615458

  • High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups MOLECULAR BIOLOGY AND EVOLUTION Petrov, D. A., Hartl, D. L. 1998; 15 (3): 293-302


    We recently proposed that patterns of evolution of non-LTR retrotransposable elements can be used to study patterns of spontaneous mutation. Transposition of non-LTR retrotransposable elements commonly results in creation of 5' truncated, "dead-on-arrival" copies. These inactive copies are effectively pseudogenes and, according to the neutral theory, their molecular evolution ought to reflect rates and patterns of spontaneous mutation. Maximum parsimony can be used to separate the evolution of active lineages of a non-LTR element from the fate of the "dead-on-arrival" insertions and to directly assess the relative frequencies of different types of spontaneous mutations. We applied this approach using a non-LTR element, Helena, in the Drosophila virilis group and have demonstrated a surprisingly high incidence of large deletions and the virtual absence of insertions. Based on these results, we suggested that Drosophila in general may exhibit a high rate of spontaneous large deletions and have hypothesized that such a high rate of DNA loss may help to explain the puzzling dearth of bona fide pseudogenes in Drosophila. We also speculated that variation in the rate of spontaneous deletion may contribute to the divergence of genome size in different taxa by affecting the amount of superfluous "junk" DNA such as, for example, pseudogenes or long introns. In this paper, we extend our analysis to the D. melanogaster subgroup, which last shared a common ancestor with the D. virilis group approximately 40 MYA. In a different region of the same transposable element, Helena, we demonstrate that inactive copies accumulate deletions in species of the D. melanogaster subgroup at a rate very similar to that of the D. virilis group. These results strongly suggest that the high rate of DNA loss is a general feature of Drosophila and not a peculiar property of a particular stretch of DNA in a particular species group.

    View details for Web of Science ID 000072361600007

    View details for PubMedID 9501496

  • Trash DNA is what gets thrown away: high rate of DNA loss in Drosophila International Society of Molecular Evolution Symposium on Junk DNA - the Role and the Evolution of Non-Coding Sequences Petrov, D. A., Hartl, D. L. ELSEVIER SCIENCE BV. 1997: 279–89


    We have recently described a novel method of estimating neutral rates and patterns of spontaneous mutation (Petrov et al., 1996). This method takes advantage of the propensity of non-LTR retrotransposable elements to create non-functional, 'dead-on-arrival' copies as a product of transposition. Maximum parsimony analysis is used to separate the evolution of actively transposing lineages of a non-LTR element from the fate of individual inactive insertions, and thereby allows one to assess directly the relative rates of different types of mutation, including point substitutions, deletions and insertions. Because non-LTR elements enjoy wide phylogenetic distribution, this method can be used in taxa that do not harbor a significant number of bona fide pseudogenes, as is the case in Drosophila (Jeffs and Ashburner, 1991; Weiner et al., 1986). We used this method with Helena, a non-LTR retrotransposable element present in the Drosophila virilis species group. A striking finding was the virtual absence of insertions and remarkably high incidence of large deletions, which combine to produce a high overall rate of DNA loss. On average, the rate of DNA loss in D. virilis is approximately 75 times faster than that estimated for mammalian pseudogenes (Petrov et al., 1996). The high rate of DNA loss should lead to rapid elimination of non-essential DNA and thus may explain the seemingly paradoxical dearth of pseudogenes in Drosophila. Varying rates of DNA loss may also contribute to differences in genome size (Graur et al., 1989; Petrov et al., 1996), thus explaining the celebrated 'C-value' paradox (John and Miklos, 1988). In this paper we outline the theoretical basis of our method, examine the data from this perspective, and discuss potential problems that may bias our estimates.

    View details for Web of Science ID 000071411800030

    View details for PubMedID 9461402

  • Slow but steady: Reduction of genome size through biased mutation PLANT CELL Petrov, D. 1997; 9 (11): 1900-1901
  • High intrinsic rate of DNA loss in Drosophila NATURE Petrov, D. A., Lozovskaya, E. R., Hartl, D. L. 1996; 384 (6607): 346-349


    Pseudogenes are common in mammals but virtually absent in Drosophila. All putative Drosophila pseudogenes show patterns of molecular evolution that are inconsistent with the lack of functional constraints. The absence of bona fide pseudogenes is not only puzzling, it also hampers attempts to estimate rates and patterns of neutral DNA change. The estimation problem is especially acute in the case of deletions and insertions, which are likely to have large effects when they occur in functional genes and are therefore subject to strong purifying selection. We propose a solution to this problem by taking advantage of the propensity of retrotransposable elements without long terminal repeats (non-LTR) to create non-functional, 'dead-on-arrival' copies of themselves as a common by-product of their transpositional cycle. Phylogenetic analysis of a non-LTR element, Helena, demonstrates that copies lose DNA at an unusually high rate, suggesting that lack of pseudogenes in Drosophila is the product of rampant deletion of DNA in unconstrained regions. This finding has important implications for the study of genome evolution in general and the 'C-value paradox' in particular.

    View details for Web of Science ID A1996VV27100045

    View details for PubMedID 8934517

  • Triple-ligation strategy with advantages over directional cloning BIOTECHNIQUES Siegal, M. L., Petrov, D. A., DeAguiar, D. 1996; 21 (4): 614-?

    View details for Web of Science ID A1996VL40500009

    View details for PubMedID 8891209



    Transposable elements are a major source of genetic change, including the creation of novel genes, the alteration of gene expression in development, and the genesis of major genomic rearrangements. They are ubiquitous among contemporary organisms and probably as old as life itself. The long coexistence of transposable elements in the genome would be expected to be accompanied by host-element coevolution. Indeed, the important role of host factors in the regulation of transposable elements has been illuminated by recent studies of several systems in Drosophila. These include host factors that regulate the P element, a host mutation that renders the genome permissive for gypsy mobilization and infection, and newly induced mutations that affect the expression of transposon insertion mutations. The finding of a type of hybrid dysgenesis in D. virilis, in which multiple unrelated transposable elements are mobilized simultaneously, may also be relevant to host-factor regulation of transposition.

    View details for Web of Science ID A1995TJ92700009

    View details for PubMedID 8745075



    We describe a system of hybrid dysgenesis in Drosophila virilis in which at least four unrelated transposable elements are all mobilized following a dysgenic cross. The data are largely consistent with the superposition of at least three different systems of hybrid dysgenesis, each repressing a different transposable element, which break down following the hybrid cross, possibly because they share a common pathway in the host. The data are also consistent with a mechanism in which mobilization of a single element triggers that of others, perhaps through chromosome breakage. The mobilization of multiple, unrelated elements in hybrid dysgenesis is reminiscent of McClintock's evidence [McClintock, B. (1955) Brookhaven Symp. Biol. 8, 58-74] for simultaneous mobilization of different transposable elements in maize.

    View details for Web of Science ID A1995RP74800092

    View details for PubMedID 7644536

    View details for PubMedCentralID PMC41284



    Methods of genome analysis, including the cloning and manipulation of large fragments of DNA, have opened new strategies for uniting molecular evolutionary genetics with chromosome evolution. We have begun the development of a physical map of the genome of Drosophila virilis based on large DNA fragments cloned in bacteriophage P1. A library of 10,080 P1 clones with average insert sizes of 65.8 kb, containing approximately 3.7 copies of the haploid genome of D. virilis, has been constructed and characterized. Approximately 75% of the clones have inserts exceeding 50 kb, and approximately 25% have inserts exceeding 80 kb. A sample of 186 randomly selected clones was mapped by in situ hybridization with the salivary gland chromosomes. A method for identifying D. virilis clones containing homologs of D. melanogaster genes has also been developed using hybridization with specific probes obtained from D. melanogaster by means of the polymerase chain reaction. This method proved successful for nine of ten genes and resulted in the recovery of 14 clones. The hybridization patterns of a sample of P1 clones containing repetitive DNA were also determined. A significant fraction of these clones hybridizes to multiple euchromatic sites but not to the chromocenter, which is a pattern of hybridization that is very rare among clones derived from D. melanogaster. The materials and methods described will make it possible to carry out a direct study of molecular evolution at the level of chromosome structure and organization as well as at the level of individual genes.

    View details for Web of Science ID A1993KU57400005

    View details for PubMedID 8486077



    He-T sequences are a complex repetitive family of DNA sequences in Drosophila that are associated with telomeric regions, pericentromeric heterochromatin, and the Y chromosome. A component of the He-T family containing open reading frames (ORFs) is described. These ORF-containing elements within the He-T family are designated T-elements, since hybridization in situ with the polytene salivary gland chromosomes results in detectable signal exclusively at the chromosome tips. One T-element that has been sequenced includes ORFs of 1,428 and 1,614 bp. The ORFs are overlapping but one nucleotide out of frame with respect to each other. The longer ORF contains cysteine-histidine motifs strongly resembling nucleic acid binding domains of gag-like proteins, and the overall organization of the T-element ORFs is reminiscent of LINE elements. The T-elements are transcribed and appear to be conserved in Drosophila species related to D. melanogaster. The results suggest that T-elements may play a role in the structure and/or function of telomeres.

    View details for Web of Science ID A1992KB27500006

    View details for PubMedID 1291227



    Highly polymorphic segments of the human genome containing variable numbers of tandem repeats (VNTRs) have been widely used to establish DNA profiles of individuals for use in forensics. Methods of estimating the probability of occurrence of matching DNA profiles between two randomly selected individuals have been subject to extensive debate regarding the possibility of significant substructure occurring within the major races. We have sampled two Caucasian subpopulations, Finns and Italians, at four commonly used VNTR loci to determine the extent to which the subgroups differ from each other and from a mixed Caucasian database. The data were also analyzed for the occurrence of linkage disequilibrium among the loci. The allele frequency distributions of some loci were found to differ significantly among the subpopulations in a manner consistent with population substructure. Major differences were also found in the probability of occurrence of matching DNA profiles between two individuals chosen at random from the same subpopulation. With respect to the Finnish and Italian subpopulations, the conventional product rule for estimating the probability of a multilocus VNTR match using a mixed Caucasian database consistently yields estimates that are artificially small. Systematic errors of this type were not found using the interim ceiling principle recently advocated in the National Research Council's report [National Research Council (1992) DNA Technology in Forensic Science (Natl. Acad. Sci., Washington)]. The interim ceiling principle is based on currently available racial or ethnic databases and sets an arbitrary lower limit on each VNTR allele frequency. In the future the ceiling frequencies are expected to be established from more adequate data acquired for relevant VNTR loci from multiple subpopulations.

    View details for Web of Science ID A1992JY87400005

    View details for PubMedID 1438254