Julia Palacios
Associate Professor of Statistics and of Biomedical Data Science
Bio
Dr. Palacios seeks to provide statistically rigorous answers to concrete, data-driven questions in evolutionary genetics and public health. Her research involves probabilistic modeling of evolutionary forces and the development of computationally tractable methods that are applicable to big data problems. Past and current research relies heavily on the theory of stochastic processes, Bayesian nonparametrics, and recent developments in machine learning and statistical theory for big data.
Academic Appointments
- Associate Professor, Statistics
- Associate Professor, Department of Biomedical Data Science
- Member, Bio-X
Honors & Awards
- Frederick E. Terman Fellow 2017, Stanford University (2017-2019)
- Alfred P. Sloan Research Fellowship 2018, Sloan Foundation (2018-2020)
Professional Education
- Ph.D., University of Washington, Statistics (2013)
2024-25 Courses
- Introduction to Stochastic Processes I: STATS 217 (Win)
- Workshop in Biostatistics: BIODS 260C, STATS 260C (Spr)
Independent Studies (8)
- Biomedical Informatics Teaching Methods: BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research: BIODS 299 (Aut, Win, Spr, Sum)
- Directed Reading and Research: BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Independent Study: STATS 199 (Aut, Win, Spr, Sum)
- Master's Research: CME 291 (Aut, Win, Spr, Sum)
- Medical Scholars Research: BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Ph.D. Research Rotation: CME 391 (Aut, Win, Spr, Sum)
- Research: STATS 399 (Aut, Win, Spr, Sum)
Prior Year Courses
2023-24 Courses
- Introduction to Statistical Inference: STATS 200 (Aut)
- Introduction to Stochastic Processes I: STATS 217 (Win)
- Workshop in Biostatistics: BIODS 260C, STATS 260C (Spr)
2022-23 Courses
- Theory of Statistics II: STATS 300B (Win)
- Workshop in Biostatistics: BIODS 260C, STATS 260C (Spr)
2021-22 Courses
- Theory of Probability: STATS 116 (Aut)
- Theory of Statistics II: STATS 300B (Win)
- Workshop in Biostatistics: BIODS 260C, STATS 260C (Spr)
Stanford Advisees
- Doctoral Dissertation Reader (AC): Paula Gablenz, Amber Hu, Jimmy Smith
- Postdoctoral Faculty Sponsor: Isaac Goldstein, Bingjing Tang
- Doctoral Dissertation Advisor (AC): Leda Liang, Julie Zhang
All Publications
-
Enumeration of binary trees compatible with a perfect phylogeny.
Journal of mathematical biology
2022; 84 (6): 54
Abstract
Evolutionary models used for describing molecular sequence variation suppose that at a non-recombining genomic segment, sequences share ancestry that can be represented as a genealogy: a rooted, binary, timed tree with tips corresponding to individual sequences. Under the infinitely-many-sites mutation model, mutations are randomly superimposed along the branches of the genealogy, so that every mutation occurs at a chromosomal site that has not previously mutated; if a mutation occurs at an interior branch, then all individuals descending from that branch carry the mutation. The implication is that observed patterns of molecular variation from this model impose combinatorial constraints on the hidden state space of genealogies. In particular, observed molecular variation can be represented in the form of a perfect phylogeny, a tree structure that fully encodes the mutational differences among sequences. For a sample of n sequences, a perfect phylogeny might not possess n distinct leaves, and hence might be compatible with many possible binary tree structures that could describe the evolutionary relationships among the n sequences. Here, we investigate enumerative properties of the set of binary ranked and unranked tree shapes that are compatible with a perfect phylogeny, and hence, the binary ranked and unranked tree shapes conditioned on an observed pattern of mutations under the infinitely-many-sites mutation model. We provide a recursive enumeration of these shapes. We consider both perfect phylogenies that can be represented as binary and those that are multifurcating. The results have implications for computational aspects of the statistical inference of evolutionary parameters that underlie sets of molecular sequences.
View details for DOI 10.1007/s00285-022-01748-w
View details for PubMedID 35552538
-
The impact of the COVID-19 preventive measures on influenza transmission: molecular and epidemiological evidence.
International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases
Abstract
OBJECTIVE: We quantify the impact of COVID-19-related control measures on the spread of human influenza virus H1N1 and H3N2. METHODS: We analyzed case numbers to estimate the length of the 2019-2020 influenza season and compare its length to the median of the previous nine seasons. In addition, we used influenza molecular data to compare within-region and between-region genetic diversity and effective population size from 2019 to 2020. Finally, we analyzed personal behavior data and policy stringency data for each region. RESULTS: The 2019-2020 influenza season was shorter than the median of the previous nine seasons in all regions. For H1N1 and H3N2, there was an increase in between-region genetic diversity in almost all pairs of regions between 2019 and 2020. For 10 of 11 regions for H1N1 and 9 of 11 regions for H3N2, there was a decrease in within-region genetic diversity. For 10 of 13 regions for H1N1 and 3 of 7 regions for H3N2, there was a decrease in effective population size. CONCLUSIONS: We found consistent evidence of a decrease in influenza incidence after the introduction of preventive measures due to COVID-19 emergence.
View details for DOI 10.1016/j.ijid.2021.12.323
View details for PubMedID 34902583
-
Adaptive Preferential Sampling in Phylodynamics With an Application to SARS-CoV-2
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
2021
View details for DOI 10.1080/10618600.2021.1987256
View details for Web of Science ID 000723471400001
-
Distance metrics for ranked evolutionary trees.
Proceedings of the National Academy of Sciences of the United States of America
2020
Abstract
Genealogical tree modeling is essential for estimating evolutionary parameters in population genetics and phylogenetics. Recent mathematical results concerning ranked genealogies without leaf labels unlock opportunities in the analysis of evolutionary trees. In particular, comparisons between ranked genealogies facilitate the study of evolutionary processes of different organisms sampled at multiple time periods. We propose metrics on ranked tree shapes and ranked genealogies for lineages isochronously and heterochronously sampled. Our proposed tree metrics make it possible to conduct statistical analyses of ranked tree shapes and timed ranked tree shapes or ranked genealogies. Such analyses allow us to assess differences in tree distributions, quantify estimation uncertainty, and summarize tree distributions. We show the utility of our metrics via simulations and an application in infectious diseases.
View details for DOI 10.1073/pnas.1922851117
View details for PubMedID 33139566
-
SEQUENTIAL IMPORTANCE SAMPLING FOR MULTIRESOLUTION KINGMAN-TAJIMA COALESCENT COUNTING.
The annals of applied statistics
2020; 14 (2): 727-751
Abstract
Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. A strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, resulting in smaller hidden state spaces. An estimate of the cardinality of the state space of genealogical trees at different resolutions is essential to decide the best modeling strategy for a given dataset. To our knowledge, there is neither an exact nor an approximate method to determine these cardinalities. We propose a sequential importance sampling algorithm to estimate the cardinality of the sample space of genealogical trees under different coalescent resolutions. Our sampling scheme proceeds sequentially across the set of combinatorial constraints imposed by the data, which in this work are completely linked sequences of DNA at a non-recombining segment. We analyze the cardinality of different genealogical tree spaces on simulations to study the settings that favor coarser resolutions. We apply our method to estimate the cardinality of genealogical tree spaces from mtDNA data from the 1000 Genomes Project and a sample from a Melanesian population at the β-globin locus.
View details for DOI 10.1214/19-AOAS1313
View details for Web of Science ID 000545338700009
View details for PubMedID 33995755
View details for PubMedCentralID PMC8118586
-
Discussion on "Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories" by James R. Faulkner, Andrew F. Magee, Beth Shapiro, and Vladimir N. Minin.
Biometrics
2020
View details for DOI 10.1111/biom.13275
View details for PubMedID 32378742
-
Bayesian Estimation of Population Size Changes by Sampling Tajima's Trees.
Genetics
2019; 213 (3): 967-986
Abstract
The large state space of gene genealogies is a major hurdle for inference methods based on Kingman's coalescent. Here, we present a new Bayesian approach for inferring past population sizes, which relies on a lower-resolution coalescent process that we refer to as "Tajima's coalescent." Tajima's coalescent has a drastically smaller state space, and hence it is a computationally more efficient model, than the standard Kingman coalescent. We provide a new algorithm for efficient and exact likelihood calculations for data without recombination, which exploits a directed acyclic graph and a correspondingly tailored Markov chain Monte Carlo method. We compare the performance of our Bayesian Estimation of population size changes by Sampling Tajima's Trees (BESTT) with a popular implementation of coalescent-based inference in BEAST using simulated and human data. We empirically demonstrate that BESTT can accurately infer effective population sizes, and it further provides an efficient alternative to Kingman's coalescent. The algorithms described here are implemented in the R package phylodyn, which is available for download at https://github.com/JuliaPalacios/phylodyn.
View details for DOI 10.1534/genetics.119.302373
View details for PubMedID 33954685
View details for PubMedID 33954667
View details for PubMedID 31511299
-
Exact limits of inference in coalescent models.
Theoretical population biology
2018
Abstract
Recovery of population size history from molecular sequence data is an important problem in population genetics. Inference commonly relies on a coalescent model linking the population size history to genealogies. The high computational cost of estimating parameters from these models usually compels researchers to select a subset of the available data or to rely on insufficient summary statistics for statistical inference. We consider the problem of recovering the true population size history from two possible alternatives on the basis of coalescent time data previously considered by Kim et al. (2015). We improve upon previous results by giving exact expressions for the probability of correctly distinguishing between the two hypotheses as a function of the separation between the alternative size histories, the number of individuals, loci, and the sampling times. In more complicated settings we estimate the exact probability of correct recovery by Monte Carlo simulation. Our results give considerably more pessimistic inferential limits than those previously reported. We also extend our analyses to pairwise SMC and SMC' models of recombination. This work is relevant for optimal design when the inference goal is to test scientific hypotheses about population size trajectories in coalescent models with and without recombination.
View details for DOI 10.1016/j.tpb.2018.11.004
View details for PubMedID 30571959
-
No Evidence for Recent Selection at FOXP2 among Diverse Human Populations
CELL
2018; 174 (6): 1424-+
Abstract
FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.
View details for PubMedID 30078708
View details for PubMedCentralID PMC6128738
-
PHYLODYN: an R package for phylodynamic simulation and inference
MOLECULAR ECOLOGY RESOURCES
2017; 17 (1): 96-100
Abstract
We introduce phylodyn, an R package for phylodynamic analysis based on gene genealogies. The package's main functionality is Bayesian nonparametric estimation of effective population size fluctuations over time. Our implementation includes several Markov chain Monte Carlo-based methods and an integrated nested Laplace approximation-based approach for phylodynamic inference that have been developed in recent years. Genealogical data describe the timed ancestral relationships of individuals sampled from a population of interest. Here, individuals are assumed to be sampled at the same point in time (isochronous sampling) or at different points in time (heterochronous sampling); in addition, sampling events can be modelled with preferential sampling, which means that the intensity of sampling events is allowed to depend on the effective population size trajectory. We assume the coalescent and the sequentially Markov coalescent processes as generative models of genealogies. We include several coalescent simulation functions that are useful for testing our phylodynamics methods via simulation studies. We compare the performance and outputs of various methods implemented in phylodyn and outline their strengths and weaknesses. The R package phylodyn is available at https://github.com/mdkarcher/phylodyn.
View details for DOI 10.1111/1755-0998.12630
View details for Web of Science ID 000390413500012
View details for PubMedID 27801980
-
Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference.
PLoS computational biology
2016; 12 (3)
Abstract
Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals' genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples.
View details for DOI 10.1371/journal.pcbi.1004789
View details for PubMedID 26938243
-
An efficient Bayesian inference framework for coalescent-based nonparametric phylodynamics
BIOINFORMATICS
2015; 31 (20): 3282-3289
Abstract
The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been extensively used in many areas of biology but is particularly useful for studying the spread of quickly evolving infectious disease agents, e.g. influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. Although this approach is quite powerful, large datasets collected during infectious disease surveillance challenge the state-of-the-art of Bayesian phylodynamics and demand inferential methods with relatively low computational cost. To satisfy this demand, we provide a computationally efficient Bayesian inference framework based on Hamiltonian Monte Carlo for coalescent process models. Moreover, we show that by splitting the Hamiltonian function, we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on elliptical slice sampler and Metropolis-adjusted Langevin algorithm. The R code for all simulation studies and real data analysis conducted in this article is publicly available at http://www.ics.uci.edu/~slan/lanzi/CODES.html and in the R package phylodyn available at https://github.com/mdkarcher/phylodyn. Contact: S.Lan@warwick.ac.uk or babaks@uci.edu. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btv378
View details for Web of Science ID 000362846600007
View details for PubMedID 26093147
-
Bayesian Nonparametric Inference of Population Size Changes from Sequential Genealogies
GENETICS
2015; 201 (1): 281-?
Abstract
Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model that allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum-likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method's credible intervals for population size as a function of time cover 90% of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with ARGweaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.
View details for DOI 10.1534/genetics.115.177980
View details for Web of Science ID 000361206400021
View details for PubMedID 26224734
-
Phylogeography of the Trans-Volcanic bunchgrass lizard (Sceloporus bicanthalis) across the highlands of south-eastern Mexico
BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY
2013; 110 (4): 852-865
View details for DOI 10.1111/bij.12172
View details for Web of Science ID 000330183200012
-
Gaussian Process-Based Bayesian Nonparametric Inference of Population Size Trajectories from Gene Genealogies
BIOMETRICS
2013; 69 (1): 8-18
Abstract
Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.
View details for DOI 10.1111/biom.12003
View details for Web of Science ID 000317303500003
View details for PubMedID 23409705