A Minimum Variance Clustering Approach Produces Robust and Interpretable Coarse-Grained Models
Journal of Chemical Theory and Computation
2018; 14 (2): 1071–82
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics data sets, such as protein folding simulations, because of their straightforward construction and statistical rigor. The coarse-graining of MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here we present the minimum variance clustering approach (MVCA) for the coarse-graining of MSMs into macrostate models. The method utilizes agglomerative clustering with Ward's minimum variance objective function, and the similarity of the microstate dynamics is determined using the Jensen-Shannon divergence between the corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system and is robust toward long-duration statistical artifacts. MVCA is then applied to two protein folding simulations of the same protein in different force fields to demonstrate that a different number of macrostates is appropriate for each model, revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can be used to analyze a data set containing many MSMs from simulations in different force fields by aggregating them into groups and quantifying their dynamical similarity in the context of force field parameter choices. The minimum variance clustering approach with the Jensen-Shannon divergence provides a powerful tool to group dynamics by similarity, both among model states and among dynamical models themselves.
DOI: 10.1021/acs.jctc.7b01004
Web of Science ID: 000425840100052
PubMed ID: 29253336
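A minimal sketch of the coarse-graining idea in this abstract: compute the Jensen-Shannon divergence between every pair of rows of a row-stochastic MSM transition matrix `T`, then agglomerate microstates with Ward's Lance-Williams update until `n_macrostates` groups remain. The function names are hypothetical, and feeding raw JS divergences into Ward's update (rather than squared Euclidean distances) is an assumption of this sketch, not necessarily what the published MVCA implementation does.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 * log(0) = 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mvca_sketch(T, n_macrostates):
    """Group microstates whose transition-matrix rows are similar (hypothetical helper).

    Agglomerative clustering with Ward's Lance-Williams update applied to
    pairwise JS divergences between rows of the row-stochastic matrix T.
    Returns one macrostate label per microstate.
    """
    n = T.shape[0]
    members = {i: [i] for i in range(n)}  # active cluster id -> microstates
    dist = {(i, j): js_divergence(T[i], T[j])
            for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(members) > n_macrostates:
        a, b = min(dist, key=dist.get)  # closest pair of active clusters
        na, nb = len(members[a]), len(members[b])
        d_ab = dist[(a, b)]
        merged = members.pop(a) + members.pop(b)
        new_dist = {}
        for k in members:
            nk = len(members[k])
            d_ka = dist[(min(k, a), max(k, a))]
            d_kb = dist[(min(k, b), max(k, b))]
            # Ward's Lance-Williams update for the merged cluster
            new_dist[(k, next_id)] = ((na + nk) * d_ka + (nb + nk) * d_kb
                                      - nk * d_ab) / (na + nb + nk)
        # drop every pair that involved a or b, then add the merged cluster
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        dist.update(new_dist)
        members[next_id] = merged
        next_id += 1
    labels = np.empty(n, dtype=int)
    for label, states in enumerate(members.values()):
        labels[states] = label
    return labels
```

On a block-structured 4-state matrix with rows (0.9, 0.1, 0, 0), (0.8, 0.2, 0, 0), (0, 0, 0.3, 0.7), and (0, 0, 0.2, 0.8), requesting two macrostates groups states 0 and 1 together and states 2 and 3 together. The dictionary-based loop is only meant for small examples; for many microstates a library routine such as SciPy's hierarchical clustering would be the practical choice.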
Ward Clustering Improves Cross-Validated Markov State Models of Protein Folding
Journal of Chemical Theory and Computation
Markov state models (MSMs) are a powerful framework for analyzing protein dynamics. MSMs require the decomposition of conformation space into states via clustering, which can be cross-validated when a prediction method is available for the clustering method. We present an algorithm for predicting cluster assignments of new data points with Ward's minimum variance method. We then show that clustering with Ward's method produces better or equivalent cross-validated MSMs for protein folding than other clustering algorithms.
DOI: 10.1021/acs.jctc.6b01238
PubMed ID: 28195713
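The abstract does not spell out its prediction algorithm. One rule consistent with Ward's objective, sketched below under that assumption, assigns a new point to whichever existing cluster's within-cluster sum of squares would grow the least if the point were added. That cost, n/(n+1)·‖x − μ‖², makes large clusters slightly more expensive to join than plain nearest-centroid assignment would. `ward_predict` is a hypothetical name, and this is not claimed to be the paper's algorithm.

```python
import numpy as np


def ward_predict(X_train, labels, X_new):
    """Assign each new point to the cluster whose within-cluster sum of
    squares would grow the least if the point were added to it.

    Adding x to a cluster with n members and centroid mu increases that
    cluster's sum of squared deviations by n / (n + 1) * ||x - mu||^2.
    """
    ids = np.unique(labels)
    centroids = np.array([X_train[labels == c].mean(axis=0) for c in ids])
    counts = np.array([np.sum(labels == c) for c in ids])
    # squared distances from each new point to each centroid, shape (n_new, n_clusters)
    sq = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    cost = counts / (counts + 1.0) * sq
    return ids[np.argmin(cost, axis=1)]
```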
Identification of simple reaction coordinates from complex dynamics
Journal of Chemical Physics
2017; 146 (4): 044109
Reaction coordinates are widely used throughout chemical physics to model and understand complex chemical transformations. We introduce a definition of the natural reaction coordinate, suitable for condensed phase and biomolecular systems, as a maximally predictive one-dimensional projection. We then show that this criterion is uniquely satisfied by a dominant eigenfunction of an integral operator associated with the ensemble dynamics. We present a new sparse estimator for these eigenfunctions which can search through a large candidate pool of structural order parameters and build simple, interpretable approximations that employ only a small number of these order parameters. Example applications with a small molecule's rotational dynamics and simulations of protein conformational change and folding show that this approach can filter through statistical noise to identify simple reaction coordinates from complex dynamics.
DOI: 10.1063/1.4974306
PubMed ID: 28147508
PubMed Central ID: PMC5272828
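As a rough illustration of a "dominant eigenfunction" built from structural order parameters, the sketch below uses a linear variational (tICA-style) estimate: solve C(τ)v = λC(0)v over a set of candidate features and keep the top eigenpair. Sparsity is imitated by greedy forward selection of `k` features, a plain stand-in and not the paper's sparse estimator. The function names, the lag τ, and the assumption of linearly independent features are all hypothetical choices.

```python
import numpy as np


def dominant_eigenfunction(X, lag):
    """Linear variational estimate of the slowest eigenfunction.

    Solves C(lag) v = lam C(0) v for features X of shape (n_frames, n_features)
    and returns (lam, v). Assumes the features are linearly independent.
    """
    X = X - X.mean(axis=0)
    A, B = X[:-lag], X[lag:]
    n = len(A)
    C0 = (A.T @ A + B.T @ B) / (2 * n)  # symmetrized instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * n)  # symmetrized time-lagged covariance
    # whiten with C0^(-1/2), then solve an ordinary symmetric eigenproblem
    w, U = np.linalg.eigh(C0)
    W = U @ np.diag(w ** -0.5) @ U.T
    lam, V = np.linalg.eigh(W @ Ct @ W)
    return lam[-1], W @ V[:, -1]  # eigh sorts ascending; last pair is dominant


def greedy_sparse_rc(X, lag, k):
    """Greedily pick k feature columns whose linear combination has the
    largest dominant eigenvalue (a simple stand-in for a sparse estimator)."""
    chosen = []
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        scores = [dominant_eigenfunction(X[:, chosen + [j]], lag)[0]
                  for j in remaining]
        chosen.append(remaining[int(np.argmax(scores))])
    return chosen
```

With one slowly relaxing order parameter mixed among pure-noise columns, the greedy search should select the slow column first, since its lag-τ autocorrelation dominates.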
MSMBuilder: Statistical Models for Biomolecular Dynamics
2017; 112 (1): 10-15
MSMBuilder is a software package for building statistical models of high-dimensional time-series data. It is designed with a particular focus on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change. MSMBuilder is named for its ability to construct Markov state models (MSMs), a class of models that has gained favor among computational biophysicists. In addition to both well-established and newer MSM methods, the package includes complementary algorithms for understanding time-series data such as hidden Markov models and time-structure based independent component analysis. MSMBuilder boasts an easy-to-use command-line interface, as well as clear and consistent abstractions through its Python application programming interface. MSMBuilder was developed with careful consideration for compatibility with the broader machine learning community by following the design of scikit-learn. The package is used primarily by practitioners of molecular dynamics, but is just as applicable to other computational or experimental time-series measurements.
Web of Science ID: 000392163500004
PubMed ID: 28076801
PubMed Central ID: PMC5232355
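To illustrate the scikit-learn-style design described above (hyperparameters in the constructor, a `fit` method that returns `self`, learned attributes with a trailing underscore), here is a toy MSM estimator. `SimpleMSM` is a hypothetical class, not MSMBuilder's API. It estimates the transition matrix by naively symmetrizing lagged transition counts and assumes every state is visited at least once.

```python
import numpy as np


class SimpleMSM:
    """Toy scikit-learn-style Markov state model (hypothetical class).

    fit() takes a list of integer state trajectories; learned attributes end
    in an underscore, following the scikit-learn convention.
    """

    def __init__(self, n_states, lag_time=1):
        self.n_states = n_states
        self.lag_time = lag_time

    def fit(self, sequences):
        counts = np.zeros((self.n_states, self.n_states))
        for seq in sequences:
            seq = np.asarray(seq)
            # count transitions i -> j separated by lag_time steps
            np.add.at(counts, (seq[:-self.lag_time], seq[self.lag_time:]), 1)
        sym = counts + counts.T  # naive symmetrization enforces detailed balance
        self.countsmat_ = counts
        self.transmat_ = sym / sym.sum(axis=1, keepdims=True)
        self.populations_ = sym.sum(axis=1) / sym.sum()
        return self

    def timescales(self):
        eigvals = np.sort(np.linalg.eigvals(self.transmat_).real)[::-1]
        # implied timescales -lag / ln(lambda_i) for the non-stationary eigenvalues
        return -self.lag_time / np.log(eigvals[1:])
```

For the single trajectory `[0, 0, 1, 1, 0, 0, 1, 1]` at lag 1, the symmetrized counts are [[4, 3], [3, 4]], giving a transition matrix with rows (4/7, 3/7) and (3/7, 4/7) and one implied timescale of 1/ln 7 ≈ 0.51 steps.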
Optimized parameter selection reveals trends in Markov state models for protein folding
Journal of Chemical Physics
2016; 145 (19)
As molecular dynamics simulations access increasingly longer time scales, complementary advances in the analysis of biomolecular time-series data are necessary. Markov state models offer a powerful framework for this analysis by describing a system's states and the transitions between them. A recently established variational theorem for Markov state models now enables modelers to systematically determine the best way to describe a system's dynamics. In the context of the variational theorem, we analyze ultra-long folding simulations for a canonical set of twelve proteins [K. Lindorff-Larsen et al., Science 334, 517 (2011)] by creating and evaluating many types of Markov state models. We present a set of guidelines for constructing Markov state models of protein folding; namely, we recommend the use of cross-validation and a kinetically motivated dimensionality reduction step for improved descriptions of folding dynamics. We also warn that precise kinetics predictions rely on the features chosen to describe the system and pose the description of kinetic uncertainty across ensembles of models as an open issue.
DOI: 10.1063/1.4967809
Web of Science ID: 000388956900007
PubMed ID: 27875868
PubMed Central ID: PMC5116026
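The variational theorem mentioned here lets candidate models be scored by how well their slow eigenvectors capture the dynamics. One common form of that score is the generalized matrix Rayleigh quotient (GMRQ), Tr[(VᵀC(τ)V)(VᵀSV)⁻¹]. The sketch below fits the top `k` eigenvectors on training trajectories of discrete states and scores them on held-out trajectories, the cross-validation pattern the abstract recommends. Function names are hypothetical, and the sketch assumes every state is visited in the training data.

```python
import numpy as np


def _correlations(sequences, n_states, lag):
    """Symmetrized lagged correlation C and overlap S for one-hot state indicators."""
    C = np.zeros((n_states, n_states))
    for seq in sequences:
        seq = np.asarray(seq)
        np.add.at(C, (seq[:-lag], seq[lag:]), 1.0)
    C = C + C.T
    total = C.sum()
    S = np.diag(C.sum(axis=1)) / total
    return C / total, S


def gmrq_score(train, test, n_states, lag, k):
    """Fit top-k eigenvectors on train trajectories and score them on test ones."""
    C, S = _correlations(train, n_states, lag)
    # generalized eigenproblem C v = lam S v; S is diagonal for one-hot features
    d = np.sqrt(np.diag(S))
    lam, U = np.linalg.eigh(C / np.outer(d, d))
    V = U[:, ::-1][:, :k] / d[:, None]  # top-k eigenvectors in the state basis
    C_te, S_te = _correlations(test, n_states, lag)
    return float(np.trace((V.T @ C_te @ V) @ np.linalg.inv(V.T @ S_te @ V)))
```

Scoring the training data against itself returns the sum of the top `k` training eigenvalues (for `k = 1`, exactly 1, from the stationary eigenvector); it is the held-out scores that should be compared across model choices.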
Impurity effects on solid-solid transitions in atomic clusters
2016; 8 (43): 18326-18340
We use the harmonic superposition approach to examine how a single atom substitution affects low-temperature anomalies in the vibrational heat capacity (C_V) of model nanoclusters. Each anomaly is linked to competing solidlike "phases", where crossover of the corresponding free energies defines a solid-solid transition temperature (T_s). For selected Lennard-Jones clusters we show that T_s and the corresponding C_V peak can be tuned over a wide range by varying the relative atomic size and binding strength of the impurity, but excessive atom-size mismatch can destroy a transition and may produce another. In some tunable cases we find up to two additional C_V peaks emerging below T_s, signalling one- or two-step delocalisation of the impurity within the ground-state geometry. Results for Ni74X and Au54X clusters (X = Au, Ag, Al, Cu, Ni, Pd, Pt, Pb), modelled by the many-body Gupta potential, further corroborate the possibility of tuning, engineering, and suppressing finite-system analogues of a solid-solid transition in nanoalloys.
PubMed ID: 27775141
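In the classical harmonic superposition approximation, each local minimum a with energy E_a, geometric-mean vibrational frequency ν̄_a, and n_a permutational isomers contributes Z_a ∝ n_a(βν̄_a)^(-κ)exp(-βE_a), where κ is the number of vibrational degrees of freedom. Differentiating ln Z twice gives C_V/k_B = κ + β²Var(E_a), with the variance taken over the occupation probabilities of the minima, so a C_V peak appears where occupation shifts between competing minima. The sketch below evaluates this textbook classical form in reduced units (k_B = h = 1). It is not claimed to match the paper's exact treatment, and the function name and parameters are hypothetical.

```python
import numpy as np


def harmonic_superposition_cv(T, energies, mean_freqs, degeneracies, kappa):
    """Classical harmonic-superposition heat capacity in reduced units (k_B = h = 1).

    Each minimum a contributes Z_a ~ n_a * (beta * nu_a)**(-kappa) * exp(-beta * E_a);
    C_V = kappa + beta**2 * Var(E_a) over the occupation probabilities p_a.
    """
    beta = 1.0 / np.asarray(T, dtype=float)[:, None]
    E = np.asarray(energies, dtype=float)
    # log-weights of each minimum; beta**(-kappa) is common to all and cancels in p_a
    logw = np.log(degeneracies) - kappa * np.log(mean_freqs) - beta * E
    logw -= logw.max(axis=1, keepdims=True)  # guard against overflow in exp
    p = np.exp(logw)
    p /= p.sum(axis=1, keepdims=True)
    mean_E = (p * E).sum(axis=1)
    var_E = (p * E ** 2).sum(axis=1) - mean_E ** 2
    return kappa + beta[:, 0] ** 2 * var_E
```

For two minima with energies 0 and 1, frequencies 2 and 1, equal degeneracies, and κ = 3, the occupation probabilities cross at T = 1/(3 ln 2) ≈ 0.48, and C_V rises above its harmonic baseline of 3 around that temperature.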