I use methods from statistical inference and physics to improve RNA structure prediction, with the goal of someday making complex RNA machines like in vivo neural nets a reality.
Honors & Awards
Churchill Scholar, Sir Winston Churchill Foundation of the USA (2015-2016)
Education & Certifications
M.Phil., University of Cambridge, Chemistry (2016)
B.A., Pomona College, Chemistry and Mathematics double major, Music minor (2015)
Evaluating riboswitch optimality.
Methods in enzymology
2019; 623: 417–50
Riboswitches are RNA elements that recognize diverse chemical and biomolecular inputs, and transduce this recognition process to genetic, fluorescent, and other engineered outputs using RNA conformational changes. These systems are pervasive in cellular biology and are a promising biotechnology with applications in genetic regulation and biosensing. Here, we derive a simple expression bounding the activation ratio (the proportion of RNA in the active vs. inactive states) for both ON and OFF riboswitches that operate near thermodynamic equilibrium: 1 + [I]/KdI, where [I] is the input ligand concentration and KdI is the intrinsic dissociation constant of the aptamer module toward the input ligand. A survey of published studies of natural and synthetic riboswitches confirms that the vast majority of empirically measured activation ratios have remained well below this thermodynamic limit. A few natural and synthetic riboswitches achieve activation ratios close to the limit, and these molecules highlight important principles for achieving high riboswitch performance. For several applications, including "light-up" fluorescent sensors and chemically controlled CRISPR/Cas complexes, the thermodynamic limit has not yet been achieved, suggesting that current tools are operating at suboptimal efficiencies. Future riboswitch studies will benefit from comparing observed activation ratios to this simple expression for the optimal activation ratio. We present experimental and computational suggestions for how to make these quantitative comparisons and suggest new molecular mechanisms that may allow non-equilibrium riboswitches to surpass the derived limit.
View details for DOI 10.1016/bs.mie.2019.05.028
View details for PubMedID 31239056
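As a rough illustration of the bound stated in the riboswitch abstract above, the sketch below evaluates 1 + [I]/KdI and reports how close an observed activation ratio comes to it. The function names and the example numbers are hypothetical and are not taken from the paper; the only assumption carried over from the abstract is the expression itself, with [I] and KdI given in the same concentration units.

```python
def max_activation_ratio(ligand_conc, kd_intrinsic):
    """Thermodynamic bound 1 + [I]/KdI for a near-equilibrium riboswitch.

    ligand_conc and kd_intrinsic must use the same concentration units.
    """
    if ligand_conc < 0 or kd_intrinsic <= 0:
        raise ValueError("need ligand_conc >= 0 and kd_intrinsic > 0")
    return 1.0 + ligand_conc / kd_intrinsic


def fraction_of_limit(observed_ratio, ligand_conc, kd_intrinsic):
    """Observed activation ratio as a fraction of the thermodynamic bound."""
    return observed_ratio / max_activation_ratio(ligand_conc, kd_intrinsic)


if __name__ == "__main__":
    # Hypothetical numbers: 100 uM ligand, 1 uM intrinsic Kd, observed ratio 10.1
    bound = max_activation_ratio(100.0, 1.0)
    print(f"bound = {bound:.1f}")
    print(f"fraction of bound = {fraction_of_limit(10.1, 100.0, 1.0):.2f}")
```

A fraction well below 1 would correspond to the situation the abstract describes for most published riboswitches, where measured activation ratios sit below the limit.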
A modular DNA scaffold to study protein-protein interactions at single-molecule resolution.
The residence time of a drug on its target has been suggested as a more pertinent metric of therapeutic efficacy than the traditionally used affinity constant. Here, we introduce junctured-DNA tweezers as a generic platform that enables real-time observation, at the single-molecule level, of biomolecular interactions. The tool is a double-stranded DNA scaffold that can be nanomanipulated and onto which proteins of interest can be attached using widely used genetic tagging strategies. Junctured-DNA tweezers thus provide straightforward and robust access to single-molecule force spectroscopy in drug discovery and, more generally, in biophysics. Proof-of-principle experiments are provided for the rapamycin-mediated association between FKBP12 and FRB, a system relevant in both medicine and chemical biology. Individual interactions were monitored under a range of applied forces and temperatures, and analysis of these measurements yielded the characteristic features of the energy profile along the dissociation landscape.
View details for DOI 10.1038/s41565-019-0542-7
View details for PubMedID 31548690
Note: Variational encoding of protein dynamics benefits from maximizing latent autocorrelation.
The Journal of chemical physics
2018; 149 (21): 216101
As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the time scale of the latent space while inferring a reduced coordinate, which assists in finding slow processes in accordance with the variational approach to conformational dynamics. We provide evidence that the VDE framework [Hernandez et al., Phys. Rev. E 97, 062412 (2018)], which uses this autocorrelation loss along with a time-lagged reconstruction loss, obtains a variationally optimized latent coordinate in comparison with related loss functions. We thus recommend leveraging the autocorrelation of the latent space while training neural network models of biomolecular simulation data to better represent slow processes.
View details for PubMedID 30525733
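To make the idea of a latent autocorrelation objective in the note above more concrete, here is a minimal NumPy sketch that computes the time-lagged autocorrelation of a one-dimensional latent coordinate and turns it into a loss by negating it. This is an assumption-laden illustration, not the VDE implementation: the function names, the Pearson-style normalization, and the choice to return 0.0 for a constant series are all choices made here.

```python
import numpy as np


def lagged_autocorrelation(z, lag):
    """Pearson correlation between z(t) and z(t + lag) for a 1D series."""
    z = np.asarray(z, dtype=float)
    if z.ndim != 1 or not 0 < lag < len(z):
        raise ValueError("z must be 1D and 0 < lag < len(z)")
    x0 = z[:-lag] - z[:-lag].mean()  # centered values at time t
    xt = z[lag:] - z[lag:].mean()    # centered values at time t + lag
    denom = np.sqrt(np.sum(x0 ** 2) * np.sum(xt ** 2))
    if denom == 0.0:
        return 0.0  # constant series: define the correlation as zero
    return float(np.sum(x0 * xt) / denom)


def autocorrelation_loss(z, lag):
    """Loss that decreases as the latent coordinate becomes slower."""
    return -lagged_autocorrelation(z, lag)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slow = np.cumsum(rng.normal(size=5000))  # random walk, slowly decorrelating
    fast = rng.normal(size=5000)             # white noise, no memory
    print(lagged_autocorrelation(slow, 10), lagged_autocorrelation(fast, 10))
```

In a training loop this quantity would be computed on the encoder output with a differentiable framework; the NumPy version only shows what is being maximized.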
Constructing interpretable computational models of protein dynamics using information theory and variance minimization
AMER CHEMICAL SOC. 2018
View details for Web of Science ID 000447609106319
Variational encoding of complex dynamics
PHYSICAL REVIEW E
2018; 97 (6): 062412
Often the analysis of time-dependent chemical and biophysical systems produces high-dimensional time-series data for which it can be difficult to interpret which individual features are most salient. While recent work from our group and others has demonstrated the utility of time-lagged covariate models to study such systems, linearity assumptions can limit the compression of inherently nonlinear dynamics into just a few characteristic components. Recent work in the field of deep learning has led to the development of the variational autoencoder (VAE), which is able to compress complex datasets into simpler manifolds. We present the use of a time-lagged VAE, or variational dynamics encoder (VDE), to reduce complex, nonlinear processes to a single embedding with high fidelity to the underlying dynamics. We demonstrate how the VDE is able to capture nontrivial dynamics in a variety of examples, including Brownian dynamics and atomistic protein folding. Additionally, we demonstrate a method for analyzing the VDE model, inspired by saliency mapping, to determine what features are selected by the VDE model to describe dynamics. The VDE presents an important step in applying techniques from deep learning to more accurately model and interpret complex biophysics.
View details for PubMedID 30011547
Transferable Neural Networks for Enhanced Sampling of Protein Dynamics
JOURNAL OF CHEMICAL THEORY AND COMPUTATION
2018; 14 (4): 1887–94
Variational autoencoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single nonlinear embedding. In this work, we illustrate how this nonlinear latent embedding can be used as a collective variable for enhanced sampling and present a simple modification that allows us to rapidly perform sampling in multiple related systems. We first demonstrate our method is able to describe the effects of force field changes in capped alanine dipeptide after learning about a model using AMBER99. We further provide a simple extension to variational dynamics encoders that allows the model to be trained in a more efficient manner on larger systems by encoding the outputs of a linear transformation using time-structure based independent component analysis (tICA). Using this technique, we show how such a model trained for one protein, the WW domain, can efficiently be transferred to perform enhanced sampling on a related mutant protein, the GTT mutation. This method shows promise for its ability to rapidly sample related systems using a single transferable collective variable, enabling us to probe the effects of variation in increasingly large systems of biophysical interest.
View details for DOI 10.1021/acs.jctc.8b00025
View details for Web of Science ID 000430023200007
View details for PubMedID 29529369
On the Origins of Regulated Disorder within the C-Terminus of P53
CELL PRESS. 2018: 428A
View details for Web of Science ID 000430450000637
Hierarchical Clustering of Markov State Models Reveals Sequence Effects in p53-CTD Dynamic Behavior
CELL PRESS. 2018: 561A
View details for Web of Science ID 000430563200555
A Minimum Variance Clustering Approach Produces Robust and Interpretable Coarse-Grained Models
JOURNAL OF CHEMICAL THEORY AND COMPUTATION
2018; 14 (2): 1071–82
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics data sets, such as protein folding simulations, because of their straightforward construction and statistical rigor. The coarse-graining of MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here we present the minimum variance clustering approach (MVCA) for the coarse-graining of MSMs into macrostate models. The method utilizes agglomerative clustering with Ward's minimum variance objective function, and the similarity of the microstate dynamics is determined using the Jensen-Shannon divergence between the corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system and is robust toward long-duration statistical artifacts. MVCA is then applied to two protein folding simulations of the same protein in different force fields to demonstrate that a different number of macrostates is appropriate for each model, revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can be used to analyze a data set containing many MSMs from simulations in different force fields by aggregating them into groups and quantifying their dynamical similarity in the context of force field parameter choices. The minimum variance clustering approach with the Jensen-Shannon divergence provides a powerful tool to group dynamics by similarity, both among model states and among dynamical models themselves.
View details for PubMedID 29253336
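The coarse-graining recipe described in the MVCA abstract above (Jensen-Shannon divergence between rows of the transition matrix, followed by agglomerative clustering with Ward's objective) can be sketched with NumPy and SciPy as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and feeding the square root of the Jensen-Shannon divergence to SciPy's `linkage` with `method="ward"` is an assumption made here, since the abstract does not specify those details.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute zero
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def coarse_grain(transition_matrix, n_macrostates):
    """Assign each microstate (row) to a macrostate label in 1..n_macrostates."""
    T = np.asarray(transition_matrix, dtype=float)
    n = T.shape[0]
    # Condensed pairwise distances in the (i < j) order SciPy expects.
    dists = [
        np.sqrt(max(js_divergence(T[i], T[j]), 0.0))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    Z = linkage(np.array(dists), method="ward")
    return fcluster(Z, t=n_macrostates, criterion="maxclust")


if __name__ == "__main__":
    # Toy 4-state MSM: states 0/1 and states 2/3 have similar outgoing rows.
    T = [
        [0.6, 0.4, 0.0, 0.0],
        [0.5, 0.5, 0.0, 0.0],
        [0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.4, 0.6],
    ]
    print(coarse_grain(T, 2))
```

Microstates whose outgoing transition probabilities look alike end up in the same macrostate, which is the sense in which the abstract speaks of grouping dynamics by similarity.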
Investigating the role of boundary bricks in DNA brick self-assembly
2017; 13 (8): 1670–80
In the standard DNA brick set-up, distinct 32-nucleotide strands of single-stranded DNA are each designed to bind specifically to four other such molecules. Experimentally, it has been demonstrated that the overall yield is increased if certain bricks which occur on the outer faces of target structures are merged with adjacent bricks. However, it is not well understood by what mechanism such 'boundary bricks' increase the yield, as they likely influence both the nucleation process and the final stability of the target structure. Here, we use Monte Carlo simulations with a patchy particle model of DNA bricks to investigate the role of boundary bricks in the self-assembly of complex multicomponent target structures. We demonstrate that boundary bricks lower the free-energy barrier to nucleation and that boundary bricks on edges stabilize the final structure. However, boundary bricks are also more prone to aggregation, as they can stabilize partially assembled intermediates. We explore some design strategies that permit us to benefit from the stabilizing role of boundary bricks whilst minimizing their ability to hinder assembly; in particular, we show that maximizing the total number of boundary bricks is not an optimal strategy.
View details for DOI 10.1039/c6sm02719a
View details for Web of Science ID 000396026200016
View details for PubMedID 28165104