All Publications


  • The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder. eLife Weibel, C. A., Wheeler, A. L., James, J. E., Willis, S. M., McShea, H., Masel, J. 2024; 12

    Abstract

    The nearly neutral theory of molecular evolution posits variation among species in the effectiveness of selection. In an idealized model, the census population size determines both the minimum magnitude of the selection coefficient required for deleterious variants to be reliably purged and the amount of neutral diversity. Empirically, an 'effective population size' is often estimated from the amount of putatively neutral genetic diversity and is assumed to also capture a species' effectiveness of selection. A potentially more direct measure of the effectiveness of selection is the degree to which selection maintains preferred codons. However, past metrics that compare codon bias across species are confounded by among-species variation in %GC content and/or amino acid composition. Here, we propose a new Codon Adaptation Index of Species (CAIS), based on Kullback-Leibler divergence, that corrects for both confounders. We demonstrate the use of CAIS correlations, as well as the Effective Number of Codons, to show that the protein domains of more highly adapted vertebrate species evolve higher intrinsic structural disorder.

    View details for DOI 10.7554/eLife.87335

    View details for PubMedID 39239703
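
    The abstract above describes a codon-bias metric built on Kullback-Leibler divergence that corrects for %GC content. As a rough illustration of that idea only, and not the published CAIS formula, the sketch below measures how far the observed usage of synonymous codons departs from what nucleotide composition alone would predict. The function names, the independent-nucleotide null model, and the example frequencies are all assumptions made for illustration.

```python
import math


def expected_codon_freqs(codons, gc):
    """Relative frequencies of synonymous codons under a simple null model.

    Assumption (not from the paper): each nucleotide is drawn independently,
    with G and C each at gc/2 and A and T each at (1 - gc)/2.
    """
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    raw = {c: math.prod(base_prob[b] for b in c) for c in codons}
    norm = sum(raw.values())
    return {c: v / norm for c, v in raw.items()}


def kl_divergence(observed, expected):
    """D_KL(observed || expected) in nats; zero-probability terms contribute 0."""
    total = 0.0
    for codon, p in observed.items():
        if p > 0:
            total += p * math.log(p / expected[codon])
    return total


if __name__ == "__main__":
    lysine = ["AAA", "AAG"]
    null = expected_codon_freqs(lysine, gc=0.5)
    # Hypothetical observed usage, skewed toward AAG.
    observed = {"AAA": 0.25, "AAG": 0.75}
    print(kl_divergence(observed, null))
```

    A divergence of zero means codon usage is fully explained by the null model; larger values indicate usage that nucleotide composition alone does not account for.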

  • The effectiveness of selection in a species affects the direction of amino acid frequency evolution. bioRxiv : the preprint server for biology McShea, H., Weibel, C., Wehbi, S., Goodman, P., James, J. E., Wheeler, A. L., Masel, J. 2024

    Abstract

    Nearly neutral theory predicts that species with higher effective population size (Ne) are better able to purge slightly deleterious mutations. We compare evolution in high-Ne vs. low-Ne vertebrates to reveal which amino acid frequencies are subject to subtle selective preferences. We take three complementary approaches, two measuring flux and one measuring outcomes. First, we fit non-stationary substitution models of amino acid flux using maximum likelihood, comparing the high-Ne clade of rodents and lagomorphs to its low-Ne sister clade of primates and colugos. Second, we compare evolutionary outcomes across a wider range of vertebrates, via correlations between amino acid frequencies and Ne. Third, we dissect the details of flux in human, chimpanzee, mouse, and rat, as scored by parsimony; this also enables comparison to a historical paper. All three methods agree on which amino acids are preferred under more effective selection. Preferred amino acids tend to be smaller, less costly to synthesize, and to promote intrinsic structural disorder. Parsimony-induced bias in the historical study produces an apparent reduction in structural disorder, perhaps driven by slightly deleterious substitutions. Within highly exchangeable pairs of amino acids, arginine is strongly preferred over lysine, and valine over isoleucine, consistent with more effective selection preferring a marginally larger free energy of folding. These two preferences match differences between thermophiles and mesophilic relatives. These results reveal the biophysical consequences of mutation-selection-drift balance, and demonstrate the utility of nearly neutral theory for understanding protein evolution.

    View details for DOI 10.1101/2023.02.01.526552

    View details for PubMedID 38948853

    View details for PubMedCentralID PMC11212923

  • nQMaker: estimating time non-reversible amino acid substitution models. Systematic biology Dang, C. C., Minh, B. Q., McShea, H., Masel, J., James, J. E., Vinh, L. S., Lanfear, R. 2022

    Abstract

    Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All commonly used amino acid models available to date are time-reversible, an assumption designed for computational convenience but not for biological reality. Another significant downside to time-reversible models is that they do not allow inference of rooted trees without outgroups. In this paper, we introduce a maximum likelihood approach, nQMaker, an extension of the recently published QMaker method, that allows the estimation of time non-reversible amino acid substitution models and rooted phylogenetic trees from a set of protein sequence alignments. We show that the non-reversible models estimated with nQMaker are a much better fit to empirical alignments than pre-existing reversible models, across a wide range of datasets including mammals, birds, plants, fungi, and other taxa, and that the improvements in model fit scale with the size of the dataset. Notably, for the recently published plant and bird trees, these non-reversible models correctly recovered the commonly estimated root placements with very high statistical support without the need to use an outgroup. We provide nQMaker as an easy-to-use feature in the IQ-TREE software (http://www.iqtree.org), allowing users to estimate non-reversible models and rooted phylogenies from their own protein datasets. The datasets and scripts used in this paper are available at https://doi.org/10.6084/m9.figshare.14516712.

    View details for DOI 10.1093/sysbio/syac007

    View details for PubMedID 35139203
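
    The abstract above turns on the distinction between time-reversible and non-reversible substitution models. A rate matrix Q with stationary frequencies pi is time-reversible when it satisfies detailed balance, pi[i] * Q[i][j] == pi[j] * Q[j][i] for every pair of states. The sketch below checks that condition on toy 3-state matrices; it is not part of nQMaker or IQ-TREE, and the function names and matrices are hypothetical.

```python
def is_stationary(Q, pi, tol=1e-9):
    """True if pi is a stationary distribution of rate matrix Q (pi Q = 0)."""
    n = len(Q)
    return all(
        abs(sum(pi[i] * Q[i][j] for i in range(n))) <= tol for j in range(n)
    )


def is_time_reversible(Q, pi, tol=1e-9):
    """True if Q satisfies detailed balance with respect to pi."""
    n = len(Q)
    return all(
        abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy matrices (assumptions for illustration); each row sums to zero.
uniform = [1 / 3, 1 / 3, 1 / 3]
symmetric_Q = [[-2, 1, 1], [1, -2, 1], [1, 1, -2]]
# Flux only runs 0 -> 1 -> 2 -> 0, so the process has a direction in time.
cyclic_Q = [[-1, 1, 0], [0, -1, 1], [1, 0, -1]]

if __name__ == "__main__":
    for name, Q in [("symmetric", symmetric_Q), ("cyclic", cyclic_Q)]:
        print(name, is_stationary(Q, uniform), is_time_reversible(Q, uniform))
```

    Both toy matrices share the same stationary distribution, but only the symmetric one is reversible. The cyclic matrix shows how a non-reversible model can carry directional information that a reversible one cannot.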

  • Reconstructing the evolutionary history of nitrogenases: Evidence for ancestral molybdenum-cofactor utilization. Geobiology Garcia, A. K., McShea, H., Kolaczkowski, B., Kacar, B. 2020

    Abstract

    The nitrogenase metalloenzyme family, essential for supplying fixed nitrogen to the biosphere, is one of life's key biogeochemical innovations. The three forms of nitrogenase differ in their metal dependence, each binding either a FeMo-, FeV-, or FeFe-cofactor where the reduction of dinitrogen takes place. The history of nitrogenase metal dependence has been of particular interest due to the possible implication that ancient marine metal availabilities have significantly constrained nitrogenase evolution over geologic time. Here, we reconstructed the evolutionary history of nitrogenases, and combined phylogenetic reconstruction, ancestral sequence inference, and structural homology modeling to evaluate the potential metal dependence of ancient nitrogenases. We find that active-site sequence features can reliably distinguish extant Mo-nitrogenases from V- and Fe-nitrogenases and that inferred ancestral sequences at the deepest nodes of the phylogeny suggest these ancient proteins most resemble modern Mo-nitrogenases. Taxa representing early-branching nitrogenase lineages lack one or more biosynthetic nifE and nifN genes that both contribute to the assembly of the FeMo-cofactor in studied organisms, suggesting that early Mo-nitrogenases may have utilized an alternate and/or simplified pathway for cofactor biosynthesis. Our results underscore the profound impacts that protein-level innovations likely had on shaping global biogeochemical cycles throughout the Precambrian, in contrast to organism-level innovations that characterize the Phanerozoic Eon.

    View details for DOI 10.1111/gbi.12381

    View details for PubMedID 32065506