Bio


Dr. Palacios seeks to provide statistically rigorous answers to concrete, data-driven questions in evolutionary genetics and public health. Dr. Palacios's research involves probabilistic modeling of evolutionary forces and the development of computationally tractable methods applicable to big data problems. Past and current research relies heavily on the theory of stochastic processes, Bayesian nonparametrics, and recent developments in machine learning and statistical theory for big data.

Honors & Awards


  • Frederick E. Terman Fellow 2017, Stanford University (2017-2019)
  • Alfred P. Sloan Research Fellowship 2018, Sloan Foundation (2018-2020)

Professional Education


  • Ph.D., University of Washington, Statistics (2013)

All Publications


  • Bayesian Estimation of Population Size Changes by Sampling Tajima's Trees. Genetics Palacios, J. A., Véber, A., Cappello, L., Wang, Z., Wakeley, J., Ramachandran, S. 2019

    Abstract

    The large state space of gene genealogies is a major hurdle for inference methods based on Kingman's coalescent. Here, we present a new Bayesian approach for inferring past population sizes which relies on a lower resolution coalescent process we refer to as "Tajima's coalescent". Tajima's coalescent has a drastically smaller state space, and hence it is a computationally more efficient model, than the standard Kingman coalescent. We provide a new algorithm for efficient and exact likelihood calculations for data without recombination, which exploits a directed acyclic graph and a correspondingly tailored Markov chain Monte Carlo method. We compare the performance of our Bayesian Estimation of population size changes by Sampling Tajima's Trees (BESTT) with a popular implementation of coalescent-based inference in BEAST using simulated data and human data. We empirically demonstrate that BESTT can accurately infer effective population sizes, and it further provides an efficient alternative to Kingman's coalescent. The algorithms described here are implemented in the R package phylodyn, which is available for download at https://github.com/JuliaPalacios/phylodyn.

    View details for DOI 10.1534/genetics.119.302373

    View details for PubMedID 31511299

  • Exact limits of inference in coalescent models. Theoretical population biology Johndrow, J. E., Palacios, J. A. 2018

    Abstract

    Recovery of population size history from molecular sequence data is an important problem in population genetics. Inference commonly relies on a coalescent model linking the population size history to genealogies. The high computational cost of estimating parameters from these models usually compels researchers to select a subset of the available data or to rely on insufficient summary statistics for statistical inference. We consider the problem of recovering the true population size history from two possible alternatives on the basis of coalescent time data previously considered by Kim et al. (2015). We improve upon previous results by giving exact expressions for the probability of correctly distinguishing between the two hypotheses as a function of the separation between the alternative size histories, the number of individuals, loci, and the sampling times. In more complicated settings we estimate the exact probability of correct recovery by Monte Carlo simulation. Our results give considerably more pessimistic inferential limits than those previously reported. We also extend our analyses to pairwise SMC and SMC' models of recombination. This work is relevant for optimal design when the inference goal is to test scientific hypotheses about population size trajectories in coalescent models with and without recombination.

    View details for DOI 10.1016/j.tpb.2018.11.004

    View details for PubMedID 30571959

  • No Evidence for Recent Selection at FOXP2 among Diverse Human Populations CELL Atkinson, E., Audesse, A., Palacios, J., Bobo, D., Webb, A., Ramachandran, S., Henn, B. 2018; 174 (6): 1424-+

    Abstract

    FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.

    View details for PubMedID 30078708

    View details for PubMedCentralID PMC6128738

  • PHYLODYN: an R package for phylodynamic simulation and inference MOLECULAR ECOLOGY RESOURCES Karcher, M. D., Palacios, J. A., Lan, S., Minin, V. N. 2017; 17 (1): 96-100

    Abstract

    We introduce phylodyn, an R package for phylodynamic analysis based on gene genealogies. The package's main functionality is Bayesian nonparametric estimation of effective population size fluctuations over time. Our implementation includes several Markov chain Monte Carlo-based methods and an integrated nested Laplace approximation-based approach for phylodynamic inference that have been developed in recent years. Genealogical data describe the timed ancestral relationships of individuals sampled from a population of interest. Here, individuals are assumed to be sampled at the same point in time (isochronous sampling) or at different points in time (heterochronous sampling); in addition, sampling events can be modelled with preferential sampling, which means that the intensity of sampling events is allowed to depend on the effective population size trajectory. We assume the coalescent and the sequentially Markov coalescent processes as generative models of genealogies. We include several coalescent simulation functions that are useful for testing our phylodynamics methods via simulation studies. We compare the performance and outputs of various methods implemented in phylodyn and outline their strengths and weaknesses. The R package phylodyn is available at https://github.com/mdkarcher/phylodyn.

    View details for DOI 10.1111/1755-0998.12630

    View details for Web of Science ID 000390413500012

    View details for PubMedID 27801980

  • Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference. PLoS computational biology Karcher, M. D., Palacios, J. A., Bedford, T., Suchard, M. A., Minin, V. N. 2016; 12 (3)

    Abstract

    Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals' genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples.

    View details for DOI 10.1371/journal.pcbi.1004789

    View details for PubMedID 26938243

  • An efficient Bayesian inference framework for coalescent-based nonparametric phylodynamics BIOINFORMATICS Lan, S., Palacios, J. A., Karcher, M., Minin, V. N., Shahbaba, B. 2015; 31 (20): 3282-3289

    Abstract

    The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been extensively used in many areas of biology but is particularly useful for studying the spread of quickly evolving infectious disease agents, e.g. influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. Although this approach is quite powerful, large datasets collected during infectious disease surveillance challenge the state-of-the-art of Bayesian phylodynamics and demand inferential methods with relatively low computational cost. To satisfy this demand, we provide a computationally efficient Bayesian inference framework based on Hamiltonian Monte Carlo for coalescent process models. Moreover, we show that by splitting the Hamiltonian function, we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on elliptical slice sampler and Metropolis-adjusted Langevin algorithm. The R code for all simulation studies and real data analysis conducted in this article is publicly available at http://www.ics.uci.edu/~slan/lanzi/CODES.html and in the R package phylodyn available at https://github.com/mdkarcher/phylodyn. Contact: S.Lan@warwick.ac.uk or babaks@uci.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btv378

    View details for Web of Science ID 000362846600007

    View details for PubMedID 26093147

  • Bayesian Nonparametric Inference of Population Size Changes from Sequential Genealogies GENETICS Palacios, J. A., Wakeley, J., Ramachandran, S. 2015; 201 (1): 281-?

    Abstract

    Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model that allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum-likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method's credible intervals for population size as a function of time cover 90% of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with ARGweaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.

    View details for DOI 10.1534/genetics.115.177980

    View details for Web of Science ID 000361206400021

    View details for PubMedID 26224734

  • Phylogeography of the Trans-Volcanic bunchgrass lizard (Sceloporus bicanthalis) across the highlands of south-eastern Mexico BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY Leache, A. D., Palacios, J. A., Minin, V. N., Bryson, R. W. 2013; 110 (4): 852-865

    View details for DOI 10.1111/bij.12172

    View details for Web of Science ID 000330183200012

  • Gaussian Process-Based Bayesian Nonparametric Inference of Population Size Trajectories from Gene Genealogies BIOMETRICS Palacios, J. A., Minin, V. N. 2013; 69 (1): 8-18

    Abstract

    Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.

    View details for DOI 10.1111/biom.12003

    View details for Web of Science ID 000317303500003

    View details for PubMedID 23409705