Michael Levitt
Robert W. and Vivian K. Cahill Professor of Cancer Research
Structural Biology
Web page: http://csb.stanford.edu/levitt
Bio
The worldwide COVID-19 coronavirus pandemic has hijacked all our academic attention. Please see Presentations.
Is it possible to understand the molecular structure and function of proteins and nucleic acids in enough detail to make accurate predictions about structure and function? We are mounting a two-pronged attack on this problem using both molecular dynamics simulation and molecular modeling. (i) Simulation attempts to reproduce the structural, thermodynamic and dynamic properties of a macromolecule as accurately as possible. Starting with simple but realistic expressions for the interactions between atoms and classical laws of motion, we calculate a trajectory that specifies the position and velocity of every atom as a function of time. The time-step between calculated structures is small, at 10⁻¹⁵ seconds, and we need to reduce hundreds of thousands of sets of atomic coordinates into a simple coherent description. We have simulated with reasonable fidelity the measurable static and dynamic properties of several different proteins surrounded by thousands of water molecules. Simulation at different temperatures has allowed exploration of the pathways of protein denaturation of entire proteins and small fragments of protein secondary structure (alpha-helices and beta-hairpins). Companion studies of DNA double-helix segments in solution preserve the classical double helix while still showing a wide repertoire of interesting motions. (ii) Molecular modeling attempts to build a model of a macromolecule using known three-dimensional structures and energy minimization as complementary guidelines. Specific examples of this work include the automatic modeling of antibody variable domains, the general modeling of homologous proteins and studies of DNA base-pair mismatches. Questions we are trying to answer include: How can a protein be stabilized by a single amino acid change? How does the sequence of DNA cause local variations of double-helix conformation and stability?
Extensive use is made of sophisticated programming, sequence and structural databases, and computer graphics.
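The trajectory calculation described in (i) can be illustrated with a minimal sketch. It is not the group's simulation code: it assumes a toy system of two atoms in one dimension joined by a harmonic bond, uses the velocity Verlet scheme (a common integrator, chosen here only for illustration), and works in arbitrary reduced units in which the step `dt` plays the role of the 10⁻¹⁵-second time-step. All function and parameter names are hypothetical.

```python
def harmonic_force(x1, x2, k, r0):
    """Forces on atoms 1 and 2 from a 1D harmonic bond U = 0.5*k*(r - r0)**2."""
    r = x2 - x1
    f = k * (r - r0)  # positive when stretched: atom 1 is pulled toward atom 2
    return f, -f


def potential(x1, x2, k, r0):
    """Potential energy of the harmonic bond."""
    return 0.5 * k * (x2 - x1 - r0) ** 2


def velocity_verlet(x, v, m, k, r0, dt, n_steps):
    """Integrate Newton's equations of motion for the two atoms.

    Returns a trajectory: a list of (time, positions, velocities) frames,
    i.e. the position and velocity of every atom as a function of time.
    """
    x, v = list(x), list(v)
    f = harmonic_force(x[0], x[1], k, r0)
    traj = [(0.0, tuple(x), tuple(v))]
    for step in range(1, n_steps + 1):
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / m[i]  # half-step velocity update
            x[i] += dt * v[i]               # full-step position update
        f = harmonic_force(x[0], x[1], k, r0)  # forces at the new positions
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / m[i]  # second half-step velocity update
        traj.append((step * dt, tuple(x), tuple(v)))
    return traj


def total_energy(frame, m, k, r0):
    """Kinetic plus potential energy of one trajectory frame."""
    _, x, v = frame
    kinetic = sum(0.5 * mi * vi * vi for mi, vi in zip(m, v))
    return kinetic + potential(x[0], x[1], k, r0)


# Illustrative parameters in reduced units (assumptions, not fitted values).
masses = (1.0, 1.0)
k_bond, r_eq, dt = 1.0, 1.0, 0.01

# Start with the bond stretched by 0.2 and both atoms at rest.
traj = velocity_verlet((0.0, 1.2), (0.0, 0.0), masses, k_bond, r_eq, dt, 1000)
e_start = total_energy(traj[0], masses, k_bond, r_eq)
e_end = total_energy(traj[-1], masses, k_bond, r_eq)
```

Comparing `e_start` with `e_end` is a quick sanity check: with a sufficiently small step the total energy should stay nearly constant. A real simulation replaces the single bond with a full force field over many thousands of atoms, which is why the resulting coordinate sets then have to be reduced to a simpler description.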
Academic Appointments
-
Professor, Structural Biology
-
Member, Bio-X
-
Member, Wu Tsai Neurosciences Institute
Administrative Appointments
-
Chair, Department of Structural Biology (1993 - 2004)
-
Associate Chair, Department of Structural Biology (2005 - 2010)
Honors & Awards
-
Nobel Prize in Chemistry, Nobel Foundation (2013)
-
Member, American Academy of Arts & Sciences (2010)
-
Blaise Pascal Professor of Research, Fondation de l'Ecole Normale Superieure, Paris, France (2003-2004)
-
Member, Editorial Board Proc. Natl. Acad. Sci. USA (2002)
-
Member, The US National Academy of Sciences (2002)
-
Fellow, The Royal Society (2001)
-
Co-director of Program in Mathematics and Molecular Biology, Mathematics and Molecular Biology (1997-2002)
-
Anniversary Prize, Federation of European Biochemical Societies (1986)
-
Member, European Molecular Biology Organization (1981)
Boards, Advisory Committees, Professional Organizations
-
Member, National Academy of Sciences (2013 - Present)
Professional Education
-
PhD, Gonville and Caius College, Cambridge, Structural Biology (1971)
Current Research and Scholarly Interests
I pioneered computational biology, setting up the conceptual and theoretical framework for a field that I am still actively involved in at all levels. More specifically, I still write and maintain computer programs of all types, including large simulation packages and molecular graphics interfaces. I have also developed a high level of expertise in Perl scripting, as well as in the advanced use of the Office Suite of programs (Word, Excel and PowerPoint), which is more important and rare than it may seem. My research focuses on three different but interrelated areas. First, we are interested in predicting the folding of a polypeptide chain into a protein with a unique native structure, with particular emphasis on how the hydrophobic forces affect the pathway. We expect hydrophobic interactions to energetically favor structures that are more native-like. In this way, the same stabilizing interactions that exist in the final folded state make the search tractable. Second, we are interested in predicting protein structure from sequence without regard for the process of folding. Such prediction relies on the well-established paradigm that similar protein sequences imply similar three-dimensional structures. We have focused on the hardest problem in homology modeling: the refinement of a near-native structure to make it more precisely like the actual native structure of the protein. We have also focused on how the general similarity of all protein sequences, resulting from their evolution from a common ancestor sequence, affects the nature of the protein universe. Third, we are focusing on mesoscale modeling of large macromolecular complexes such as RNA polymerase and the mammalian chaperonin. In this work, done in close collaboration with experimentalists, we use new morphing strategies combined with normal mode analysis in torsion angle space to overcome problems caused by the size and complexity of these critical, biomedically important systems.
All this work depends on the way a molecular structure is represented in terms of the force-field that allows calculation of the potential energy of the system. We employ a very wide variety of such energy functions that extend from knowledge-based statistical potentials for a single interaction center per residue to quantum-mechanical force-fields that include inductive effects as well as polarization.
2024-25 Courses
-
Independent Studies (23)
- Advanced Reading and Research
CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research
CS 499P (Aut, Win, Spr, Sum)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390A (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390B (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390C (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Biophysics
BIOPHYS 399 (Aut, Win, Spr, Sum)
- Directed Reading in Structural Biology
SBIO 299 (Aut, Win, Spr, Sum)
- Graduate Research
BIOPHYS 300 (Aut, Win, Spr, Sum)
- Graduate Research
SBIO 399 (Aut, Win, Spr, Sum)
- Independent Project
CS 399 (Aut, Win, Spr, Sum)
- Independent Project
CS 399P (Aut, Win, Spr, Sum)
- Independent Work
CS 199 (Aut, Win, Spr, Sum)
- Independent Work
CS 199P (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
SBIO 370 (Aut, Win, Spr, Sum)
- Part-time Curricular Practical Training
CS 390D (Aut, Win, Spr, Sum)
- Programming Service Project
CS 192 (Aut, Win, Spr, Sum)
- Research
PHYSICS 490 (Aut, Win, Spr, Sum)
- Senior Project
CS 191 (Aut, Win, Spr, Sum)
- Undergraduate Research
SBIO 199 (Aut, Win, Spr, Sum)
- Writing Intensive Senior Research Project
CS 191W (Aut, Win, Spr)
Graduate and Fellowship Programs
-
Biomedical Data Science (PhD Program)
All Publications
-
Single-residue effects on the behavior of a nascent polypeptide chain inside the ribosome exit tunnel.
bioRxiv : the preprint server for biology
2024
Abstract
Nascent polypeptide chains (NCs) are extruded from the ribosome through an exit tunnel (ET) traversing the large ribosomal subunit. The ET's irregular and chemically complex wall allows for various NC-ET interactions. Translational arrest peptides (APs) bind in the ET to induce translational arrest, a property that can be exploited to study NC-ET interactions by Force Profile Analysis (FPA). We employed FPA and molecular dynamics (MD) simulations to investigate how individual residues placed in a glycine-serine repeat segment within an AP-stalled NC interact with the ET to exert a pulling force on the AP and release stalling. Our results indicate that large and hydrophobic residues generate a pulling force on the NC when placed ≳10 residues away from the peptidyl transfer center (PTC). Moreover, an asparagine placed 12 residues from the PTC makes a specific stabilizing interaction with the tip of ribosomal protein uL22 that reduces the pulling force on the NC, while a lysine or leucine residue in the same position increases the pulling force. Finally, the MD simulations suggest how the Mannheimia succiniproducens SecM AP interacts with the ET to promote translational stalling.
View details for DOI 10.1101/2024.08.20.608737
View details for PubMedID 39229094
View details for PubMedCentralID PMC11370347
-
Panel stacking is a threat to consensus statement validity.
Journal of clinical epidemiology
2024: 111428
Abstract
Consensus statements can be very influential in medicine and public health. Some of these statements use systematic evidence synthesis but others fail on this front. Many consensus statements use panels of experts to deduce perceived consensus through Delphi processes. We argue that stacking of panel members towards one particular position or narrative is a major threat, especially in absence of systematic evidence review. Stacking may involve financial conflicts of interest, but non-financial conflicts of strong advocacy can also cause major bias. Given their emerging importance, we describe here how such consensus statements may be misleading, by analysing in depth a recent high-impact Delphi consensus statement on COVID-19 recommendations as a case example. We demonstrate that many of the selected panel members and at least 35% of the core panel members had advocated towards COVID-19 elimination (zero-COVID) during the pandemic and were leading members of aggressive advocacy groups. These advocacy conflicts were not declared in the Delphi consensus publication, with rare exceptions. Therefore, we propose that consensus statements should always require rigorous evidence synthesis and maximal transparency on potential biases towards advocacy or lobbyist groups to be valid. While advocacy can have many important functions, its biased impact on consensus panels should be carefully avoided.
View details for DOI 10.1016/j.jclinepi.2024.111428
View details for PubMedID 38897481
-
The Determination of Free Energy of Hydration of Water Ions from First Principles.
Journal of chemical theory and computation
2024
Abstract
We model the autoionization of water by determining the free energy of hydration of the major intermediate species of water ions. We represent the smallest ions (the hydroxide ion OH⁻, the hydronium ion H₃O⁺, and the Zundel ion H₅O₂⁺) by bonded models and the more extended ionic structures by strong nonbonded interactions (e.g., the Eigen H₉O₄⁺ = H₃O⁺ + 3(H₂O) and the Stoyanov H₁₃O₆⁺ = H₅O₂⁺ + 4(H₂O)). Our models are faithful to the precise QM energies and their components to within 1% or less. Using the calculated free energies and atomization energies, we compute the pKa of pure water from first principles as a consistency check and arrive at a value within 1.3 log units of the experimental one. From these calculations, we conclude that the hydronium ion, and its hydrated state, the Eigen cation, are the dominant species in the water autoionization process.
View details for DOI 10.1021/acs.jctc.3c01411
View details for PubMedID 38842599
-
Neural Network Corrections to Intermolecular Interaction Terms of a Molecular Force Field Capture Nuclear Quantum Effects in Calculations of Liquid Thermodynamic Properties.
Journal of chemical theory and computation
2024
Abstract
We incorporate nuclear quantum effects (NQE) in condensed matter simulations by introducing short-range neural network (NN) corrections to the ab initio fitted molecular force field ARROW. Force field NN corrections are fitted to average interaction energies and forces of molecular dimers, which are simulated using the Path Integral Molecular Dynamics (PIMD) technique with restrained centroid positions. The NN-corrected force field allows reproduction of the NQE for computed liquid water and methane properties such as density, radial distribution function (RDF), heat of evaporation (HVAP), and solvation free energy. Accounting for NQE through molecular force field corrections circumvents the need for explicit computationally expensive PIMD simulations in accurate calculations of the properties of chemical and biological systems. The accuracy and locality of pairwise NN NQE corrections indicate that this approach could be applicable to complex heterogeneous systems, such as proteins.
View details for DOI 10.1021/acs.jctc.3c00921
View details for PubMedID 38240485
-
Combining Force Fields and Neural Networks for an Accurate Representation of Bonded Interactions.
The journal of physical chemistry. A
2024
Abstract
We present a formalism of a neural network encoding bonded interactions in molecules. This intramolecular encoding is consistent with the models of intermolecular interactions previously designed by this group. Variants of the encoding fed into a corresponding neural network may be used to economically improve the representation of torsional degrees of freedom in any force field. We test the accuracy of the reproduction of the ab initio potential energy surface on a set of conformations of two dipeptides, methyl-capped ALA and ASP, in several scenarios. The encoding, either alone or in conjunction with an analytical potential, improves agreement with ab initio energies that are on par with those of other neural network-based potentials. Using the encoding and neural nets in tandem with an analytical model places the agreements firmly within "chemical accuracy" of ±0.5 kcal/mol.
View details for DOI 10.1021/acs.jpca.3c07598
View details for PubMedID 38232765
-
Variability in excess deaths across countries with different vulnerability during 2020-2023.
Proceedings of the National Academy of Sciences of the United States of America
2023; 120 (49): e2309557120
Abstract
Excess deaths provide total impact estimates of major crises, such as the COVID-19 pandemic. We evaluated excess death trajectories across countries with accurate death registration and population age structure data and assessed relationships with vulnerability indicators. Using the Human Mortality Database on 34 countries, excess deaths were calculated for 2020-2023 (to week 29, 2023) using 2017-2019 as reference, with adjustment for 5 age strata. Countries were divided into less and more vulnerable; the latter had per capita nominal GDP < $30,000, Gini > 0.35 for income inequality and/or at least ≥2.5% of their population living in poverty. Excess deaths (as proportion of expected deaths, p%) were inversely correlated with per capita GDP (r = -0.60), correlated with proportion living in poverty (r = 0.66), and modestly correlated with income inequality (r = 0.45). Incidence rate ratio for deaths was 1.062 (95% CI, 1.038-1.087) in more versus less vulnerable countries. Excess deaths started deviating in the two groups after the first wave. Between-country heterogeneity diminished gradually within each group. Less vulnerable countries had mean p% = -0.8% and 0.4% in 0-64 and >65-y-old strata. More vulnerable countries had mean p% = 7.0% and 7.2%, respectively. Lower death rates were seen in children of age 0-14 y during 2020-2023 versus prepandemic years. While the pandemic hit some countries earlier than others, country vulnerability dominated eventually the cumulative impact. Half the analyzed countries witnessed no substantial excess deaths versus prepandemic levels, while the others suffered major death tolls.
View details for DOI 10.1073/pnas.2309557120
View details for PubMedID 38019858
-
Combining Force Fields and Neural Networks for an Accurate Representation of Chemically Diverse Molecular Interactions.
Journal of the American Chemical Society
2023
Abstract
A key goal of molecular modeling is the accurate reproduction of the true quantum mechanical potential energy of arbitrary molecular ensembles with a tractable classical approximation. The challenges are that analytical expressions found in general purpose force fields struggle to faithfully represent the intermolecular quantum potential energy surface at close distances and in strong interaction regimes; that the more accurate neural network approximations do not capture crucial physics concepts, e.g., nonadditive inductive contributions and application of electric fields; and that the ultra-accurate narrowly targeted models have difficulty generalizing to the entire chemical space. We therefore designed a hybrid wide-coverage intermolecular interaction model consisting of an analytically polarizable force field combined with a short-range neural network correction for the total intermolecular interaction energy. Here, we describe the methodology and apply the model to accurately determine the properties of water, the free energy of solvation of neutral and charged molecules, and the binding free energy of ligands to proteins. The correction is subtyped for distinct chemical species to match the underlying force field, to segment and reduce the amount of quantum training data, and to increase accuracy and computational speed. For the systems considered, the hybrid ab initio parametrized Hamiltonian reproduces the two-body dimer quantum mechanics (QM) energies to within 0.03 kcal/mol and the nonadditive many-molecule contributions to within 2%. Simulations of molecular systems using this interaction model run at speeds of several nanoseconds per day.
View details for DOI 10.1021/jacs.3c07628
View details for PubMedID 37856313
-
Variability in excess deaths across countries with different vulnerability during 2020-2023.
medRxiv : the preprint server for health sciences
2023
Abstract
Excess deaths provide total impact estimates of major crises, such as the COVID-19 pandemic. We evaluated excess death's trajectories during 2020-2023 across countries with accurate death registration and population age structure data; and assessed relationships with economic indicators of vulnerability. Using the Human Mortality Database on 34 countries, excess deaths were calculated for 2020-2023 (to week 29, 2023) using 2017-2019 as reference, with weekly expected death calculations and adjustment for 5 age strata. Countries were divided into less and more vulnerable; the latter had per capita nominal GDP<$30,000, Gini>0.35 for income inequality and/or at least 2.5% of their population living in poverty. Excess deaths (as proportion of expected deaths, p%) were inversely correlated with per capita GDP (r=-0.60), correlated with proportion living in poverty (r=0.66) and modestly correlated with income inequality (r=0.45). Incidence rate ratio for deaths was 1.06 (95% confidence interval, 1.04-1.08) in the more versus less vulnerable countries. Excess deaths started deviating in the two groups after the first wave. Between-country heterogeneity diminished over time within each of the two groups. Less vulnerable countries had mean p%=-0.8% and 0.4% in 0-64 and >65 year-old strata while more vulnerable countries had mean p%=7.0% and 7.2%, respectively. Usually lower death rates were seen in children 0-14 years old during 2020-2023 versus pre-pandemic years. While the pandemic hit some countries earlier than others, country vulnerability dominated eventually the cumulative impact. Half of the analyzed countries witnessed no substantial excess deaths versus pre-pandemic levels, while the other half suffered major death tolls.
View details for DOI 10.1101/2023.04.24.23289066
View details for PubMedID 37162934
View details for PubMedCentralID PMC10168510
-
What Really Happened During the Massive SARS-CoV-2 Omicron Wave in China?
JAMA internal medicine
2023
Abstract
This Viewpoint discusses reports from China after its zero COVID-19 policy ended in December 2022.
View details for DOI 10.1001/jamainternmed.2023.1547
View details for PubMedID 37184847
-
Flaws and uncertainties in pandemic global excess death calculations.
European journal of clinical investigation
2023: e14008
Abstract
Several teams have been publishing global estimates of excess deaths during the COVID-19 pandemic. Here, we examine potential flaws and underappreciated sources of uncertainty in global excess death calculations. Adjusting for changing population age structure is essential. Otherwise, excess deaths are markedly overestimated in countries with increasingly aging populations. Adjusting for changes in other high-risk indicators, such as residence in long-term facilities, may also make a difference. Death registration is highly incomplete in most countries; completeness corrections should allow for substantial uncertainty and consider that completeness may have changed during pandemic years. Excess death estimates have high sensitivity to modeling choice. Therefore different options should be considered and the full range of results should be shown for different choices of pre-pandemic reference periods and imposed models. Any post-modeling corrections in specific countries should be guided by pre-specified rules. Modeling of all-cause mortality (ACM) in countries that have ACM data and extrapolating these models to other countries is precarious; models may lack transportability. Existing global excess death estimates underestimate the overall uncertainty that is multiplicative across diverse sources of uncertainty. Informative excess death estimates require risk stratification, including age groups and ethnic/racial strata. Data to-date suggest a death deficit among children during the pandemic and marked socioeconomic differences in deaths, widening inequalities. Finally, causal explanations require great caution in disentangling SARS-CoV-2 deaths, indirect pandemic effects, and effects from measures taken. We conclude that excess deaths have many uncertainties, but globally deaths from SARS-CoV-2 may be the minority of calculated excess deaths.
View details for DOI 10.1111/eci.14008
View details for PubMedID 37067255
-
Excess death estimates from multiverse analysis in 2009-2021.
European journal of epidemiology
2023
Abstract
Excess death estimates have great value in public health, but they can be sensitive to analytical choices. Here we propose a multiverse analysis approach that considers all possible different time periods for defining the reference baseline and a range of 1 to 4 years for the projected time period for which excess deaths are calculated. We used data from the Human Mortality Database on 33 countries with detailed age-stratified death information on an annual basis during the period 2009-2021. The use of different time periods for reference baseline led to large variability in the absolute magnitude of the exact excess death estimates. However, the relative ranking of different countries compared to others for specific years remained largely unaltered. The relative ranking of different years for the specific country was also largely independent of baseline. Averaging across all possible analyses, distinct time patterns were discerned across different countries. Countries had declines between 2009 and 2019, but the steepness of the decline varied markedly. There were also large differences across countries on whether the COVID-19 pandemic years 2020-2021 resulted in an increase of excess deaths and by how much. Consideration of longer projected time windows resulted in substantial shrinking of the excess deaths in many, but not all countries. Multiverse analysis of excess deaths over long periods of interest can offer an approach that better accounts for the uncertainty in estimating expected mortality patterns, comparative mortality trends across different countries, and the nature of observed mortality peaks.
View details for DOI 10.1007/s10654-023-00998-2
View details for PubMedID 37043153
View details for PubMedCentralID 9225924
-
AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor.
Chemical science
2023; 14 (6): 1443-1452
Abstract
The application of artificial intelligence (AI) has been considered a revolutionary change in drug discovery and development. In 2020, the AlphaFold computer program predicted protein structures for the whole human genome, which has been considered a remarkable breakthrough in both AI applications and structural biology. Despite the varying confidence levels, these predicted structures could still significantly contribute to structure-based drug design of novel targets, especially the ones with no or limited structural information. In this work, we successfully applied AlphaFold to our end-to-end AI-powered drug discovery engines, including a biocomputational platform PandaOmics and a generative chemistry platform Chemistry42. A novel hit molecule against a novel target without an experimental structure was identified, starting from target selection towards hit identification, in a cost- and time-efficient manner. PandaOmics provided the protein of interest for the treatment of hepatocellular carcinoma (HCC) and Chemistry42 generated the molecules based on the structure predicted by AlphaFold, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for cyclin-dependent kinase 20 (CDK20) with a binding constant Kd value of 9.2 ± 0.5 μM (n = 3) within 30 days from target selection and after only synthesizing 7 compounds. Based on the available data, a second round of AI-powered compound generation was conducted and through this, a more potent hit molecule, ISM042-2-048, was discovered with an average Kd value of 566.7 ± 256.2 nM (n = 3). Compound ISM042-2-048 also showed good CDK20 inhibitory activity with an IC50 value of 33.4 ± 22.6 nM (n = 3). In addition, ISM042-2-048 demonstrated selective anti-proliferation activity in an HCC cell line with CDK20 overexpression, Huh7, with an IC50 of 208.7 ± 3.3 nM, compared to a counter screen cell line HEK293 (IC50 = 1706.7 ± 670.0 nM). 
This work is the first demonstration of applying AlphaFold to the hit identification process in drug discovery.
View details for DOI 10.1039/d2sc05709c
View details for PubMedID 36794205
View details for PubMedCentralID PMC9906638
-
Estimates of COVID-19 deaths in Mainland China after abandoning zero COVID policy.
European journal of clinical investigation
2023: e13956
Abstract
BACKGROUND: China witnessed a surge of Omicron infections after abandoning "zero COVID" strategies on December 7, 2022. The authorities report very sparse deaths based on very restricted criteria, but massive deaths are speculated. METHODS: We aimed to estimate the COVID-19 fatalities in Mainland China until summer 2023 using the experiences of Hong Kong and of South Korea in 2022 as prototypes. Both these locations experienced massive Omicron waves after having had very few SARS-CoV-2 infections during 2020-2021. We estimated age-stratified infection fatality rates (IFRs) in Hong Kong and South Korea during 2022 and extrapolated to the population age structure of Mainland China. We also accounted separately for deaths of residents in long-term care facilities in both Hong Kong and South Korea. RESULTS: IFR estimates in non-elderly strata were modestly higher in Hong Kong than South Korea and projected 987,455 and 619,549 maximal COVID-19 deaths, respectively, if the entire China population was infected. Expected COVID-19 deaths in Mainland China until summer 2023 ranged from 49,962 to 691,219 assuming 25-70% of the non-elderly population being infected and variable protection of elderly (from none to three-quarter reduction in fatalities). The main analysis (45% of non-elderly population infected and fatality impact among elderly reduced by half) estimated 152,886-249,094 COVID-19 deaths until summer 2023. Large uncertainties exist regarding potential changes in dominant variant, health system strain, and impact on non-COVID-19 deaths. CONCLUSIONS: The most critical factor that can affect total COVID-19 fatalities in China is the extent to which the elderly can be protected.
View details for DOI 10.1111/eci.13956
View details for PubMedID 36691703
-
Estimates of COVID-19 deaths in Mainland China after abandoning zero COVID policy.
medRxiv : the preprint server for health sciences
2023
Abstract
Background: China witnessed a surge of Omicron infections after abandoning zero COVID strategies on December 7, 2022. The authorities report very sparse deaths based on very restricted criteria, but massive deaths are speculated. Methods: We aimed to estimate the COVID-19 fatalities in Mainland China until summer 2023 using the experiences of Hong Kong and of South Korea in 2022 as prototypes. Both these locations experienced massive Omicron waves after having had very few SARS-CoV-2 infections during 2020-2021. We estimated age-stratified infection fatality rates (IFRs) in Hong Kong and South Korea during 2022 and extrapolated to the population age structure of Mainland China. We also accounted separately for deaths of residents in long-term care facilities in both Hong Kong and South Korea. Results: IFR estimates in non-elderly strata were modestly higher in Hong Kong than South Korea and projected 987,455 and 619,549 maximal COVID-19 deaths, respectively, if the entire China population was infected. Expected COVID-19 deaths in Mainland China until summer 2023 ranged from 49,962 to 691,219 assuming 25-70% of the non-elderly population being infected and variable protection of elderly (from none to three-quarter reduction in fatalities). The main analysis (45% of non-elderly population infected and fatality impact among elderly reduced by half) estimated 152,886-249,094 COVID-19 deaths until summer 2023. Large uncertainties exist regarding potential changes in dominant variant, health system strain, and impact on non-COVID-19 deaths. Conclusions: The most critical factor that can affect total COVID-19 fatalities in China is the extent to which the elderly can be protected.
View details for DOI 10.1101/2022.12.29.22284048
View details for PubMedID 36597526
-
AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor
CHEMICAL SCIENCE
2023
View details for DOI 10.1039/d2sc05709c
View details for Web of Science ID 000915554000001
-
Protein-Ligand Binding Free-Energy Calculations with ARROW─A Purely First-Principles Parameterized Polarizable Force Field.
Journal of chemical theory and computation
2022
Abstract
Protein-ligand binding free-energy calculations using molecular dynamics (MD) simulations have emerged as a powerful tool for in silico drug design. Here, we present results obtained with the ARROW force field (FF)─a multipolar polarizable and physics-based model with all parameters fitted entirely to high-level ab initio quantum mechanical (QM) calculations. ARROW has already proven its ability to determine solvation free energy of arbitrary neutral compounds with unprecedented accuracy. The ARROW FF parameterization is now extended to include coverage of all amino acids including charged groups, allowing molecular simulations of a series of protein-ligand systems and prediction of their relative binding free energies. We ensure adequate sampling by applying a novel technique that is based on coupling the Hamiltonian Replica exchange (HREX) with a conformation reservoir generated via potential softening and nonequilibrium MD. ARROW provides predictions with near chemical accuracy (mean absolute error of 0.5 kcal/mol) for two of the three protein systems studied here (MCL1 and Thrombin). The third protein system (CDK2) reveals the difficulty in accurately describing dimer interaction energies involving polar and charged species. Overall, for all of the three protein systems studied here, ARROW FF predicts relative binding free energies of ligands with a similar accuracy level as leading nonpolarizable force fields.
View details for DOI 10.1021/acs.jctc.2c00930
View details for PubMedID 36459593
-
Virus spread on a scale-free network reproduces the Gompertz growth observed in isolated COVID-19 outbreaks.
Advances in biological regulation
2022: 100915
Abstract
The counts of confirmed cases and deaths in isolated SARS-CoV-2 outbreaks follow the Gompertz growth function for locations of very different sizes. This lack of dependence on region size leads us to hypothesize that virus spread depends on the universal properties of the network of social interactions. We test this hypothesis by simulating the propagation of a virus on networks of different topologies or connectivities. Our main finding is that we can reproduce the Gompertz growth observed for many early outbreaks with a simple virus spread model on a scale-free network, in which nodes with many more neighbors than average are common. Nodes that have very many neighbors are infected early in the outbreak and then spread the infection very rapidly. When these nodes are no longer infectious, the remaining nodes that have most neighbors take over and continue to spread the infection. In this way, the rate of spread is fastest at the very start and slows down immediately. Geometrically we see that the "surface" of the epidemic, the number of susceptible nodes in contact with the infected nodes, starts to rapidly decrease very early in the epidemic and as soon as the larger nodes have been infected. In our simulation, the speed and impact of an outbreak depend on three parameters: the average number of contacts each node makes, the probability of being infected by a neighbor, and the probability of recovery. Intelligent interventions to reduce the impact of future outbreaks need to focus on these critical parameters in order to minimize economic and social collateral damage.
View details for DOI 10.1016/j.jbior.2022.100915
View details for PubMedID 36220735
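The kind of simulation described in the abstract above can be sketched with the Python standard library alone. This is not the authors' code: the preferential-attachment generator (Barabasi-Albert style), the discrete-time update rule, and all parameter values are assumptions chosen for illustration. The three parameters named in the abstract map onto `m` (which sets the average number of contacts, roughly `2 * m`), `p_infect`, and `p_recover`.

```python
import random


def scale_free_network(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    neighbors = {i: set() for i in range(n)}
    endpoints = []  # every node appears once per edge it belongs to
    for i in range(m + 1):  # fully connected starting core of m + 1 nodes
        for j in range(i):
            neighbors[i].add(j)
            neighbors[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            neighbors[new].add(old)
            neighbors[old].add(new)
            endpoints += [new, old]
    return neighbors


def simulate_outbreak(neighbors, p_infect, p_recover, rng,
                      seed_node=0, max_steps=1000):
    """Return the cumulative number of ever-infected nodes at each step."""
    infected = {seed_node}
    ever_infected = {seed_node}
    cumulative = [1]
    for _ in range(max_steps):
        if not infected:
            break
        newly = set()
        for node in infected:
            for nb in neighbors[node]:
                if nb not in ever_infected and rng.random() < p_infect:
                    newly.add(nb)
        recovered = {node for node in infected if rng.random() < p_recover}
        infected = (infected - recovered) | newly
        ever_infected |= newly
        cumulative.append(len(ever_infected))
    return cumulative


rng = random.Random(1)  # fixed seed so the run is repeatable
network = scale_free_network(500, 2, rng)
cumulative = simulate_outbreak(network, p_infect=0.2, p_recover=0.3, rng=rng)
```

Starting the outbreak at node 0 places the seed in the highly connected core, which is where the abstract says infection concentrates early on. Whether a given run resembles Gompertz growth would need to be checked by fitting the resulting `cumulative` curve.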
-
Comparison of pandemic excess mortality in 2020-2021 across different empirical calculations.
Environmental research
2022: 113754
Abstract
Different modeling approaches can be used to calculate excess deaths for the COVID-19 pandemic period. We compared 6 calculations of excess deaths (4 previously published [3 without age-adjustment] and two new ones that we performed with and without age-adjustment) for 2020-2021. With each approach, we calculated excess deaths metrics and the ratio R of excess deaths over recorded COVID-19 deaths. The main analysis focused on 33 high-income countries with weekly deaths in the Human Mortality Database (HMD at mortality.org) and reliable death registration. Secondary analyses compared calculations for other countries, whenever available. Across the 33 high-income countries, excess deaths were 2.0-2.8 million without age-adjustment, and 1.6-2.1 million with age-adjustment with large differences across countries. In our analyses after age-adjustment, 8 of 33 countries had no overall excess deaths; there was a death deficit in children; and 0.478 million (29.7%) of the excess deaths were in people <65 years old. In countries like France, Germany, Italy, and Spain excess death estimates differed 2 to 4-fold between highest and lowest figures. The R values' range exceeded 0.3 in all 33 countries. In 16 of 33 countries, the range of R exceeded 1. In 25 of 33 countries some calculations suggest R > 1 (excess deaths exceeding COVID-19 deaths) while others suggest R < 1 (excess deaths smaller than COVID-19 deaths). Inferred data from 4 evaluations for 42 countries and from 3 evaluations for another 98 countries are very tenuous. Estimates of excess deaths are analysis-dependent and age-adjustment is important to consider. Excess deaths may be lower than previously calculated.
View details for DOI 10.1016/j.envres.2022.113754
View details for PubMedID 35753371
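The ratio R from the abstract above is excess deaths divided by recorded COVID-19 deaths, where excess deaths are observed deaths minus the deaths expected under some baseline. The sketch below shows how different baselines alone can move R across 1. The method names and all counts are hypothetical and are not taken from the paper.

```python
def excess_death_ratio(observed, expected, covid_deaths):
    """R = (observed - expected) / recorded COVID-19 deaths."""
    return (observed - expected) / covid_deaths


# Hypothetical country: same observed and recorded deaths, three baselines.
observed_deaths = 110_000
recorded_covid_deaths = 8_000
expected_by_method = {
    "method_a": 100_000,
    "method_b": 104_000,
    "method_c": 97_000,
}

ratios = {
    name: excess_death_ratio(observed_deaths, expected, recorded_covid_deaths)
    for name, expected in expected_by_method.items()
}
r_range = max(ratios.values()) - min(ratios.values())
```

Here the three baselines give R of 1.25, 0.75, and 1.625, so the same country appears on both sides of R = 1 depending on the calculation, which is the pattern the abstract reports for 25 of 33 countries.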
-
Accurate determination of solvation free energies of neutral organic compounds from first principles.
Nature communications
2022; 13 (1): 414
Abstract
The main goal of molecular simulation is to accurately predict experimental observables of molecular systems. Another long-standing goal is to devise models for arbitrary neutral organic molecules with little or no reliance on experimental data. While separately these goals have been met to various degrees, for an arbitrary system of molecules they have not been achieved simultaneously. For biophysical ensembles that exist at room temperature and pressure, and where the entropic contributions are on par with interaction strengths, it is the free energies that are both most important and most difficult to predict. We compute the free energies of solvation for a diverse set of neutral organic compounds using a polarizable force field fitted entirely to ab initio calculations. The mean absolute errors (MAE) of hydration, cyclohexane solvation, and corresponding partition coefficients are 0.2 kcal/mol, 0.3 kcal/mol and 0.22 log units, i.e. within chemical accuracy. The model (ARROWFF) is multipolar, polarizable, and its accompanying simulation stack includes nuclear quantum effects (NQE). The simulation tools' computational efficiency is on a par with current state-of-the-art packages. The construction of a wide-coverage molecular modelling toolset from first principles, together with its excellent predictive ability in the liquid phase is a major advance in biomolecular simulation.
View details for DOI 10.1038/s41467-022-28041-0
View details for PubMedID 35058472
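The link between the solvation free energies and the partition coefficient quoted in the abstract above follows from a standard thermodynamic relation: the log of the partition coefficient is the free-energy difference divided by RT ln 10. The sketch below assumes a cyclohexane/water convention and a temperature of 298.15 K; neither is stated in the abstract. The function names and the sample values are illustrative.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041  # kcal/(mol K)


def log_partition_coefficient(dg_hydration, dg_cyclohexane, temperature=298.15):
    """log10 P (cyclohexane/water) from solvation free energies in kcal/mol.
    A more negative cyclohexane solvation energy favors the organic phase."""
    rt_ln10 = GAS_CONSTANT_KCAL * temperature * math.log(10)
    return (dg_hydration - dg_cyclohexane) / rt_ln10


def mean_absolute_error(predicted, reference):
    """Mean of |predicted - reference| over paired values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# A 0.3 kcal/mol free-energy difference expressed in log units at 298.15 K.
log_units_per_0_3_kcal = log_partition_coefficient(0.3, 0.0)

# Hypothetical predicted vs. reference hydration free energies (kcal/mol).
mae = mean_absolute_error([-4.9, -2.1, -6.3], [-5.0, -2.4, -6.1])
```

At 298.15 K, RT ln 10 is about 1.364 kcal/mol, so a 0.3 kcal/mol difference corresponds to roughly 0.22 log units, which is the same scale as the errors quoted in the abstract.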
-
SARS-CoV-2 Omicron variant: viral spread dynamics, disease burden, and vaccine effectiveness.
Current medicine (Cham, Switzerland)
2022; 1 (1): 14
View details for DOI 10.1007/s44194-022-00014-x
View details for PubMedID 36062216
-
Probing Interplays between Human XBP1u Translational Arrest Peptide and 80S Ribosome.
Journal of chemical theory and computation
2021
Abstract
The ribosome stalling mechanism is a crucial biological process, yet its atomistic underpinning is still elusive. In this framework, the human XBP1u translational arrest peptide (AP) plays a central role in regulating the unfolded protein response (UPR) in eukaryotic cells. Here, we report multimicrosecond all-atom molecular dynamics simulations designed to probe the interactions between the XBP1u AP and the mammalian ribosome exit tunnel, both for the wild type AP and for four mutant variants of different arrest potencies. Enhanced sampling simulations allow investigating the AP release process of the different variants, shedding light on this complex mechanism. The present outcomes are in qualitative/quantitative agreement with available experimental data. In conclusion, we provide an unprecedented atomistic picture of this biological process and clear-cut insights into the key AP-ribosome interactions.
View details for DOI 10.1021/acs.jctc.1c00796
View details for PubMedID 34881571
-
The Gompertz Growth of COVID-19 Outbreaks is Caused by Super-Spreaders.
ArXiv
2021
Abstract
In individual SARS-CoV-2 outbreaks, the counts of confirmed cases and deaths follow a Gompertz growth function for locations of very different sizes. This lack of dependence on region size leads us to hypothesize that virus spread depends on universal properties of the network of social interactions. We test this hypothesis by simulating the propagation of a virus on networks of different topologies. Our main finding is that Gompertz growth observed for early outbreaks occurs only for a scale-free network, in which nodes with many more neighbors than average are common. These nodes that have very many neighbors are infected early in the outbreak and then spread the infection very rapidly. When these nodes are no longer infectious, the remaining nodes that have most neighbors take over and continue to spread the infection. In this way, the rate of spread is fastest at the very start and slows down immediately. Geometrically it is seen that the "surface" of the epidemic, the number of susceptible nodes in contact with the infected nodes, starts to rapidly decrease very early in the epidemic and as soon as the larger nodes have been infected. In our simulation, the speed and impact of an outbreak depend on three parameters: the average number of contacts each node makes, the probability of being infected by a neighbor, and the probability of recovery. Intelligent interventions to reduce the impact of future outbreaks need to focus on these critical parameters in order to minimize economic and social collateral damage.
View details for PubMedID 34981031
View details for PubMedCentralID PMC8722603
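The Gompertz growth function referred to in the abstract above can be written in several equivalent ways. The sketch below uses one common parameterization, N(t) = N_max * exp(-exp(-(t - t0) / tau)), which is an assumption here since the abstract does not give a formula. The parameter names and values are illustrative.

```python
import math


def gompertz(t, n_max, t0, tau):
    """Cumulative count at time t for a Gompertz curve with plateau n_max,
    inflection time t0, and time constant tau."""
    return n_max * math.exp(-math.exp(-(t - t0) / tau))


def log_growth_rate(t, t0, tau):
    """d(ln N)/dt for the curve above: it decays exponentially with time."""
    return math.exp(-(t - t0) / tau) / tau


# Hypothetical outbreak: plateau of 1000 cases, t0 = 10 days, tau = 5 days.
days = list(range(0, 40, 5))
counts = [gompertz(t, 1000, 10, 5) for t in days]
rates = [log_growth_rate(t, 10, 5) for t in days]
```

For this form the relative growth rate is largest at the earliest time and falls by a constant factor per unit time, which matches the abstract's description of spread that is fastest at the very start and slows down immediately.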
-
Insights on cross-species transmission of SARS-CoV-2 from structural modeling.
bioRxiv : the preprint server for biology
2020
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the ongoing global pandemic that has infected more than 6 million people in more than 180 countries worldwide. Like other coronaviruses, SARS-CoV-2 is thought to have been transmitted to humans from wild animals. Given the scale and widespread geographical distribution of the current pandemic, the question emerges whether human-to-animal transmission is possible and if so, which animal species are most at risk. Here, we investigated the structural properties of several ACE2 orthologs bound to the SARS-CoV-2 spike protein. We found that species known not to be susceptible to SARS-CoV-2 infection have non-conservative mutations in several ACE2 amino acid residues that disrupt key polar and charged contacts with the viral spike protein. Our models also predict affinity-enhancing mutations that could be used to design ACE2 variants for therapeutic purposes. Finally, our study provides a blueprint for modeling viral-host protein interactions and highlights several important considerations when designing these computational studies and analyzing their results.
View details for DOI 10.1101/2020.06.05.136861
View details for PubMedID 32577636
View details for PubMedCentralID PMC7302186
-
Interfacea: Open-Source Library for Protein Interface Analysis
CELL PRESS. 2020: 516A
View details for Web of Science ID 000513023203328
-
Insights on cross-species transmission of SARS-CoV-2 from structural modeling.
PLoS computational biology
2020; 16 (12): e1008449
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the ongoing global pandemic that has infected more than 31 million people in more than 180 countries worldwide. Like other coronaviruses, SARS-CoV-2 is thought to have been transmitted to humans from wild animals. Given the scale and widespread geographical distribution of the current pandemic and confirmed cases of cross-species transmission, the question of the extent to which this transmission is possible emerges, as well as what molecular features distinguish susceptible from non-susceptible animal species. Here, we investigated the structural properties of several ACE2 orthologs bound to the SARS-CoV-2 spike protein. We found that species known not to be susceptible to SARS-CoV-2 infection have non-conservative mutations in several ACE2 amino acid residues that disrupt key polar and charged contacts with the viral spike protein. Our models also allow us to predict affinity-enhancing mutations that could be used to design ACE2 variants for therapeutic purposes. Finally, our study provides a blueprint for modeling viral-host protein interactions and highlights several important considerations when designing these computational studies and analyzing their results.
View details for DOI 10.1371/journal.pcbi.1008449
View details for PubMedID 33270653
-
Solving the structure of Lgl2, a difficult blind test of unsupervised structure determination.
Proceedings of the National Academy of Sciences of the United States of America
2019
Abstract
In the companion paper by Ufimtsev and Levitt [Ufimtsev IS, Levitt M (2019) Proc Natl Acad Sci USA, 10.1073/pnas.1821512116], we presented a method for unsupervised solution of protein crystal structures and demonstrated its utility by solving several test cases of known structure in the 2.9- to 3.45-Å resolution range. Here we apply this method to solve the crystal structure of a 966-amino acid construct of human lethal giant larvae protein (Lgl2) that resisted years of structure determination efforts, at 3.2-Å resolution. The structure was determined starting with a molecular replacement (MR) model identified by unsupervised refinement of a pool of 50 candidate MR models. This initial model had 2.8-Å RMSD from the solution. The solved structure was validated by comparison with a model subsequently derived from an alternative crystal form diffracting to higher resolution. This model could phase an anomalous difference Fourier map from an Hg derivative, and a single-wavelength anomalous dispersion phased density map made from these sites aligned with the refined structure.
View details for PubMedID 31088964
-
Unsupervised determination of protein crystal structures.
Proceedings of the National Academy of Sciences of the United States of America
2019
Abstract
We present a method for automatic solution of protein crystal structures. The method proceeds with a single initial model obtained, for instance, by molecular replacement (MR). If a good-quality search model is not available, as often is the case with MR of distant homologs, our method first can automatically screen a large pool of poorly placed models and single out promising candidates for further processing if there are any. We demonstrate its utility by solving a set of synthetic cases in the 2.9- to 3.45-Å resolution range.
View details for PubMedID 31088963
-
Automatic Inference of Sequence from Low-Resolution Crystallographic Data.
Structure (London, England : 1993)
2018
Abstract
At resolutions worse than 3.5 Å, the electron density is weak or nonexistent at the locations of the side chains. Consequently, the assignment of the protein sequences to their correct positions along the backbone is a difficult problem. In this work, we propose a fully automated computational approach to assign sequence at low resolution. It is based on our surprising observation that standard reciprocal-space indicators, such as the initial unrefined R value, are sensitive enough to detect an erroneous sequence assignment of even a single backbone position. Our approach correctly determines the amino acid type for 15%, 13%, and 9% of the backbone positions in crystallographic datasets with resolutions of 4.0 Å, 4.5 Å, and 5.0 Å, respectively. We implement these findings in an application for threading a sequence onto a backbone structure. For the three resolution ranges, the application threads 83%, 81%, and 64% of the sequences exactly as in the deposited PDB structures.
View details for PubMedID 30293812
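The R value mentioned in the abstract above is, in its textbook form, the summed absolute difference between observed and calculated structure-factor amplitudes divided by the summed observed amplitudes. The sketch below uses that definition with a single overall scale factor and then picks the candidate residue type with the lowest R. The scaling choice, the selection rule, and all amplitudes are assumptions for illustration and not the paper's procedure.

```python
def r_value(f_obs, f_calc):
    """Crystallographic R = sum(|Fobs - k * Fcalc|) / sum(Fobs), with a single
    overall scale k that matches the summed amplitudes."""
    scale = sum(f_obs) / sum(f_calc)
    numerator = sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Hypothetical amplitudes for three reflections.
observed = [10.0, 20.0, 30.0]
calculated_by_residue = {
    "ALA": [12.0, 18.0, 30.0],  # model built with alanine at one position
    "LEU": [10.0, 21.0, 29.0],  # same model with leucine at that position
}

r_by_residue = {res: r_value(observed, calc)
                for res, calc in calculated_by_residue.items()}
best_residue = min(r_by_residue, key=r_by_residue.get)
```

In real data the R differences between candidate residues at one position would be far smaller than in this toy case, which is why the abstract calls the sensitivity of such indicators surprising.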
-
On the importance of accounting for nuclear quantum effects in ab initio calibrated force fields in biological simulations.
Proceedings of the National Academy of Sciences of the United States of America
2018
Abstract
In many important processes in chemistry, physics, and biology the nuclear degrees of freedom cannot be described using the laws of classical mechanics. At the same time, the vast majority of molecular simulations that employ wide-coverage force fields treat atomic motion classically. In light of the increasing desire for and accelerated development of quantum mechanics (QM)-parameterized interaction models, we reexamine whether the classical treatment is sufficient for a simple but crucial chemical species: alkanes. We show that when using an interaction model or force field in excellent agreement with the "gold standard" QM data, even very basic simulated properties of liquid alkanes, such as densities and heats of vaporization, deviate significantly from experimental values. Inclusion of nuclear quantum effects via techniques that treat nuclear degrees of freedom using the laws of quantum mechanics brings the simulated properties much closer to reality.
View details for PubMedID 30127031
-
Proteomic analysis of monolayer-integrated proteins on lipid droplets identifies amphipathic interfacial alpha-helical membrane anchors.
Proceedings of the National Academy of Sciences of the United States of America
2018
Abstract
Despite not spanning phospholipid bilayers, monotopic integral proteins (MIPs) play critical roles in organizing biochemical reactions on membrane surfaces. Defining the structural basis by which these proteins are anchored to membranes has been hampered by the paucity of unambiguously identified MIPs and a lack of computational tools that accurately distinguish monolayer-integrating motifs from bilayer-spanning transmembrane domains (TMDs). We used quantitative proteomics and statistical modeling to identify 87 high-confidence candidate MIPs in lipid droplets, including 21 proteins with predicted TMDs that cannot be accommodated in these monolayer-enveloped organelles. Systematic cysteine-scanning mutagenesis showed the predicted TMD of one candidate MIP, DHRS3, to be a partially buried amphipathic alpha-helix in both lipid droplet monolayers and the cytoplasmic leaflet of endoplasmic reticulum membrane bilayers. Coarse-grained molecular dynamics simulations support these observations, suggesting that this helix is most stable at the solvent-membrane interface. The simulations also predicted similar interfacial amphipathic helices when applied to seven additional MIPs from our dataset. Our findings suggest that interfacial helices may be a common motif by which MIPs are integrated into membranes, and provide high-throughput methods to identify and study MIPs.
View details for PubMedID 30104359
-
Unique function words characterize genomic proteins.
Proceedings of the National Academy of Sciences of the United States of America
2018; 115 (26): 6703–8
Abstract
Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of "words" or UFWs (57% shared), the "sentences" (MDAs) are different (1.3% shared).
View details for PubMedID 29895692
-
The solution structure of monomeric CCL5 in complex with a doubly sulfated N-terminal segment of CCR5
FEBS JOURNAL
2018; 285 (11): 1988–2003
Abstract
The inflammatory chemokine CCL5, which binds the chemokine receptor CCR5 in a two-step mechanism so as to activate signaling pathways in hematopoietic cells, plays an important role in immune surveillance, inflammation, and development as well as in several immune system pathologies. The recently published crystal structure of CCR5 bound to a high-affinity variant of CCL5 lacks the N-terminal segment of the receptor that is post-translationally sulfated and is known to be important for high-affinity binding. Here, we report the NMR solution structure of monomeric CCL5 bound to a synthetic doubly sulfated peptide corresponding to the missing first 27 residues of CCR5. Our structures show that two sulfated tyrosine residues, sY10 and sY14, as well as the unsulfated Y15 form a network of strong interactions with a groove on a surface of CCL5 that is formed from evolutionarily conserved basic and hydrophobic amino acids. We then use our NMR structures, in combination with available crystal data, to create an atomic model of full-length wild-type CCR5:CCL5. Our findings reveal the structural determinants involved in the recognition of CCL5 by the CCR5 N terminus. These findings, together with existing structural data, provide a complete structural framework with which to understand the specificity of receptor:chemokine interactions. Structural data are available in the PDB under the accession number 6FGP.
View details for PubMedID 29619777
-
An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12.
Scientific reports
2018; 8 (1): 9939
Abstract
Every two years groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field but many hurdles still remain, which may require new ideas and collaborations. In 2012, a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines showing improvements according to the CASP assessment as well as areas that require further adjustments and research.
View details for PubMedID 29967418
-
Emerging β-Sheet Rich Conformations in Supercompact Huntingtin Exon-1 Mutant Structures.
Journal of the American Chemical Society
2017; 139 (26): 8820-8827
Abstract
There exists a strong correlation between the extended polyglutamines (polyQ) within exon-1 of Huntingtin protein (Htt) and age of onset of Huntington's disease (HD); however, the underlying molecular mechanism is still poorly understood. Here we apply extensive molecular dynamics simulations to study the folding of Htt-exon-1 across five different polyQ-lengths. We find an increase in secondary structure motifs at longer Q-lengths, including β-sheet content that seems to contribute to the formation of increasingly compact structures. More strikingly, these longer Q-lengths adopt supercompact structures as evidenced by a surprisingly small power-law scaling exponent (0.22) between the radius-of-gyration and Q-length that is substantially below expected values for compact globule structures (∼0.33) and unstructured proteins (∼0.50). Hydrogen bond analyses further revealed that the supercompact behavior of polyQ is mainly due to the "glue-like" behavior of glutamine's side chains with significantly more side chain-side chain H-bonds than regular proteins in the Protein Data Bank (PDB). The orientation of the glutamine side chains also tends to be "buried" inside, explaining why polyQ domains are insoluble on their own.
View details for DOI 10.1021/jacs.7b00838
View details for PubMedID 28609090
View details for PubMedCentralID PMC5835228
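A power-law scaling exponent like the one quoted in the abstract above is usually obtained by fitting a straight line to log(radius of gyration) against log(chain length). The sketch below does this with ordinary least squares on synthetic data generated from an exponent of 0.22. The Q-lengths, the prefactor, and the function name are hypothetical, and the fit procedure is an assumption rather than the paper's documented method.

```python
import math


def fit_power_law(lengths, radii):
    """Fit radii = prefactor * lengths ** exponent by least squares in
    log-log space; returns (prefactor, exponent)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    prefactor = math.exp(mean_y - slope * mean_x)
    return prefactor, slope


# Synthetic data: five hypothetical Q-lengths, exact power law with nu = 0.22.
q_lengths = [20, 30, 40, 50, 60]
radii_of_gyration = [5.0 * n ** 0.22 for n in q_lengths]

prefactor, exponent = fit_power_law(q_lengths, radii_of_gyration)
```

With noise-free synthetic input the fit returns the generating exponent. With simulation data the interesting comparison is whether the fitted value falls below the roughly 0.33 expected for a compact globule, as the abstract reports.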
-
Future of fundamental discovery in US biomedical research.
Proceedings of the National Academy of Sciences of the United States of America
2017
Abstract
Young researchers are crucially important for basic science as they make unexpected, fundamental discoveries. Since 1982, we find a steady drop in the number of grant-eligible basic-science faculty [principal investigators (PIs)] younger than 46. This fall occurred over a 32-y period when inflation-corrected congressional funds for NIH almost tripled. During this time, the PI success ratio (fraction of basic-science PIs who are R01 grantees) dropped for younger PIs (below 46) and increased for older PIs (above 55). This age-related bias seems to have caused the steady drop in the number of young basic-science PIs and could reduce future US discoveries in fundamental biomedical science. The NIH recognized this bias in its 2008 early-stage investigator (ESI) policy to fund young PIs at higher rates. We show this policy is working and recommend that it be enhanced by using better data. Together with the National Institute of General Medical Sciences (NIGMS) Maximizing Investigators' Research Award (MIRA) program to reward senior PIs with research time in exchange for less funding, this may reverse a decades-long trend of more money going to older PIs. To prepare young scientists for increased demand, additional resources should be devoted to transitional postdoctoral fellowships already offered by NIH.
View details for DOI 10.1073/pnas.1609996114
View details for PubMedID 28584129
-
Sequential allosteric mechanism of ATP hydrolysis by the CCT/TRiC chaperone is revealed through Arrhenius analysis
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2017; 114 (20): 5189-5194
Abstract
Knowing the mechanism of allosteric switching is important for understanding how molecular machines work. The CCT/TRiC chaperonin nanomachine undergoes ATP-driven conformational changes that are crucial for its folding function. Here, we demonstrate that insight into its allosteric mechanism of ATP hydrolysis can be achieved by Arrhenius analysis. Our results show that ATP hydrolysis triggers sequential "conformational waves." They also suggest that these waves start from subunits CCT6 and CCT8 (or CCT3 and CCT6) and proceed clockwise and counterclockwise, respectively.
View details for DOI 10.1073/pnas.1617746114
View details for Web of Science ID 000401314700058
View details for PubMedID 28461478
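The basic step of an Arrhenius analysis, as named in the abstract above, is to fit ln k against 1/T: the Arrhenius law k = A * exp(-Ea / (R * T)) makes the slope equal to -Ea / R. The sketch below recovers Ea and A from synthetic rates. The temperatures, the prefactor, and the activation energy are hypothetical, and the abstract does not say how the paper's analysis goes beyond this basic fit.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)


def arrhenius_rate(prefactor, activation_energy, temperature):
    """Rate constant k = A * exp(-Ea / (R * T)), with Ea in J/mol, T in K."""
    return prefactor * math.exp(-activation_energy / (GAS_CONSTANT * temperature))


def fit_arrhenius(temperatures, rates):
    """Least-squares line through (1/T, ln k); returns (A, Ea in J/mol)."""
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope * GAS_CONSTANT


# Synthetic rates from hypothetical A = 1e7 per second and Ea = 50 kJ/mol.
temperatures = [283.15, 293.15, 303.15, 313.15]
rates = [arrhenius_rate(1.0e7, 50_000.0, t) for t in temperatures]

fitted_prefactor, fitted_energy = fit_arrhenius(temperatures, rates)
```

A single straight line in such a plot indicates one dominant barrier over the temperature range. Departures from linearity are the kind of feature an Arrhenius analysis can use to distinguish mechanisms.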
-
The language of the protein universe
CURRENT OPINION IN GENETICS & DEVELOPMENT
2015; 35: 50-56
Abstract
Proteins, the main cell machinery which play a major role in nearly every cellular process, have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing the understanding of genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher the ancient Egyptian hieroglyphics. In this review, we consider aspects of the protein domain architectures repertoire that are closely related to those of human languages and aim to provide some insights about the language of proteins.
View details for DOI 10.1016/j.gde.2015.08.010
View details for Web of Science ID 000366900600008
View details for PubMedID 26451980
View details for PubMedCentralID PMC4695241
-
Birth and Future of Multiscale Modeling for Macromolecular Systems (Nobel Lecture)
ANGEWANDTE CHEMIE-INTERNATIONAL EDITION
2014; 53 (38): 10006-10018
View details for DOI 10.1002/anie.201403691
View details for Web of Science ID 000342761700002
View details for PubMedID 25100216
-
WeFold: A coopetition for protein structure prediction
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2014; 82 (9): 1850-1868
Abstract
The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org.
View details for DOI 10.1002/prot.24538
View details for Web of Science ID 000340940300014
View details for PubMedID 24677212
-
Deformable elastic network refinement for low-resolution macromolecular crystallography.
Acta crystallographica. Section D, Biological crystallography
2014; 70: 2241-2255
Abstract
Crystals of membrane proteins and protein complexes often diffract to low resolution owing to their intrinsic molecular flexibility, heterogeneity or the mosaic spread of micro-domains. At low resolution, the building and refinement of atomic models is a more challenging task. The deformable elastic network (DEN) refinement method developed previously has been instrumental in the determination of several structures at low resolution. Here, DEN refinement is reviewed, recommendations for its optimal usage are provided and its limitations are discussed. Representative examples of the application of DEN refinement to challenging cases of refinement at low resolution are presented. These cases include soluble as well as membrane proteins determined at limiting resolutions ranging from 3 to 7 Å. Potential extensions of the DEN refinement technique and future perspectives for the interpretation of low-resolution crystal structures are also discussed.
View details for DOI 10.1107/S1399004714016496
View details for PubMedID 25195739
-
Deformable elastic network refinement for low-resolution macromolecular crystallography
ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY
2014; 70: 2241-2255
View details for DOI 10.1107/S1399004714016496
View details for Web of Science ID 000341819500001
View details for PubMedCentralID PMC4157441
-
Redundancy-weighting for better inference of protein structural features.
Bioinformatics
2014; 30 (16): 2295-2301
Abstract
Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or even not at all. The standard solution to this problem is to limit the studies to non-redundant subsets of the PDB. While alleviating biases, this solution hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, as most newly solved structures fall within existing families. In this study we explore the concept of redundancy-weighted datasets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted datasets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted datasets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant datasets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular biology. Contact: cheny@il.ibm.com or chen.keasar@gmail.com.
View details for DOI 10.1093/bioinformatics/btu242
View details for PubMedID 24771517
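The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible scheme, in which each structure is weighted by one over the size of its homolog cluster; the abstract mentions three schemes without describing them, so this is not necessarily any of them. The cluster labels, feature values and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def redundancy_weights(cluster_of):
    """Weight each structure by 1 / (number of structures in its homolog cluster)."""
    sizes = Counter(cluster_of.values())
    return {s: 1.0 / sizes[c] for s, c in cluster_of.items()}

def weighted_distribution(features, weights):
    """Normalized, weighted frequency of each feature value."""
    totals = defaultdict(float)
    for structure, value in features.items():
        totals[value] += weights[structure]
    norm = sum(totals.values())
    return {value: total / norm for value, total in totals.items()}

def entropy(dist):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Hypothetical toy data: family A has three entries, family B has one.
cluster_of = {"a1": "famA", "a2": "famA", "a3": "famA", "b1": "famB"}
features = {"a1": "helix", "a2": "helix", "a3": "strand", "b1": "strand"}

weights = redundancy_weights(cluster_of)
dist = weighted_distribution(features, weights)
print(dist, entropy(dist))
```

Every structure contributes to the distribution, but each family contributes a total weight of one, so the over-represented family does not dominate.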
-
Millisecond dynamics of RNA polymerase II translocation at atomic resolution
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2014; 111 (21): 7665-7670
Abstract
Transcription is a central step in gene expression, in which the DNA template is processively read by RNA polymerase II (Pol II), synthesizing a complementary messenger RNA transcript. At each cycle, Pol II moves exactly one register along the DNA, a process known as translocation. Although X-ray crystal structures have greatly enhanced our understanding of the transcription process, the underlying molecular mechanisms of translocation remain unclear. Here we use sophisticated simulation techniques to observe Pol II translocation on a millisecond timescale and at atomistic resolution. We observe multiple cycles of forward and backward translocation and identify two previously unidentified intermediate states. We show that the bridge helix (BH) plays a key role in accelerating the translocation of both the RNA:DNA hybrid and transition nucleotide by directly interacting with them. The conserved BH residues, Thr831 and Tyr836, mediate these interactions. To date, this study delivers the most detailed picture of the mechanism of Pol II translocation at atomic level.
View details for DOI 10.1073/pnas.1315751111
View details for Web of Science ID 000336411300044
View details for PubMedID 24753580
View details for PubMedCentralID PMC4040580
-
Training-free atomistic prediction of nucleosome occupancy
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2014; 111 (17): 6293-6298
Abstract
Nucleosomes alter gene expression by preventing transcription factors from occupying binding sites along DNA. DNA methylation can affect nucleosome positioning and so alter gene expression epigenetically (without changing DNA sequence). Conventional methods to predict nucleosome occupancy are trained on observed DNA sequence patterns or known DNA oligonucleotide structures. They are statistical and lack the physics needed to predict subtle epigenetic changes due to DNA methylation. The training-free method presented here uses physical principles and state-of-the-art all-atom force fields to predict both nucleosome occupancy along genomic sequences as well as binding to known positioning sequences. Our method calculates the energy of both nucleosomal and linear DNA of the given sequence. Based on the DNA deformation energy, we accurately predict the in vitro occupancy profile observed experimentally for a 20,000-bp genomic region as well as the experimental locations of nucleosomes along 13 well-established positioning sequence elements. DNA with all C bases methylated at the 5 position shows less variation of nucleosome binding: Strong binding is weakened and weak binding is strengthened compared with normal DNA. Methylation also alters the preference of nucleosomes for some positioning sequences but not others.
View details for DOI 10.1073/pnas.1404475111
View details for Web of Science ID 000335199000052
View details for PubMedID 24733939
-
Architecture of an RNA Polymerase II Transcription Pre-Initiation Complex
SCIENCE
2013; 342 (6159): 709-?
Abstract
The protein density and arrangement of subunits of a complete, 32-protein, RNA polymerase II (pol II) transcription pre-initiation complex (PIC) were determined by means of cryogenic electron microscopy and a combination of chemical cross-linking and mass spectrometry. The PIC showed a marked division in two parts, one containing all the general transcription factors (GTFs) and the other pol II. Promoter DNA was associated only with the GTFs, suspended above the pol II cleft and not in contact with pol II. This structural principle of the PIC underlies its conversion to a transcriptionally active state; the PIC is poised for the formation of a transcription bubble and descent of the DNA into the pol II cleft.
View details for DOI 10.1126/science.1238724
View details for Web of Science ID 000326647600034
-
Architecture of an RNA polymerase II transcription pre-initiation complex.
Science
2013; 342 (6159): 1238724-?
View details for DOI 10.1126/science.1238724
View details for PubMedID 24072820
-
The crystal structures of the eukaryotic chaperonin CCT reveal its functional partitioning.
Structure
2013; 21 (4): 540-549
Abstract
In eukaryotes, CCT is essential for the correct and efficient folding of many cytosolic proteins, most notably actin and tubulin. Structural studies of CCT have been hindered by the failure of standard crystallographic analysis to resolve its eight different subunit types at low resolutions. Here, we exhaustively assess the R value fit of all possible CCT models to available crystallographic data of the closed and open forms with resolutions of 3.8 Å and 5.5 Å, respectively. This unbiased analysis finds the native subunit arrangements with overwhelming significance. The resulting structures provide independent crystallographic proof of the subunit arrangement of CCT and map major asymmetrical features of the particle onto specific subunits. The actin and tubulin substrates both bind around subunit CCT6, which shows other structural anomalies. CCT is thus clearly partitioned, both functionally and evolutionarily, into a substrate-binding side that is opposite to the ATP-hydrolyzing side.
View details for DOI 10.1016/j.str.2013.01.017
View details for PubMedID 23478063
View details for PubMedCentralID PMC3622207
-
The Crystal Structures of the Eukaryotic Chaperonin CCT Reveal Its Functional Partitioning
STRUCTURE
2013; 21 (4): 540-549
View details for DOI 10.1016/j.str.2013.01.017
View details for Web of Science ID 000317800100004
View details for PubMedID 23478063
-
On the Universe of Protein Folds
ANNUAL REVIEW OF BIOPHYSICS, VOL 42
2013; 42: 559-582
Abstract
In the fifty years since the first atomic structure of a protein was revealed, tens of thousands of additional structures have been solved. Like all objects in biology, protein structures show common patterns that seem to define family relationships. Classification of protein structures, which started in the 1970s with about a dozen structures, has continued with increasing enthusiasm, leading to two main fold classifications, SCOP and CATH, as well as many additional databases. Classification is complicated by deciding what constitutes a domain, the fundamental unit of structure. Also difficult is deciding when two given structures are similar. Like all of biology, fold classification is beset by exceptions to all rules. Thus, the perspectives of protein fold space that the fold classifications offer differ from each other. In spite of these ambiguities, fold classifications are useful for prediction of structure and function. Studying the characteristics of fold space can shed light on protein evolution and the physical laws that govern protein behavior.
View details for DOI 10.1146/annurev-biophys-083012-130432
View details for Web of Science ID 000321695700025
View details for PubMedID 23527781
-
Evolutionarily consistent families in SCOP: sequence, structure and function
BMC STRUCTURAL BIOLOGY
2012; 12
Abstract
SCOP is a hierarchical domain classification system for proteins of known structure. The superfamily level has a clear definition: Protein domains belong to the same superfamily if there is structural, functional and sequence evidence for a common evolutionary ancestor. Superfamilies are sub-classified into families; however, there is not such a clear basis for the family level groupings. Do SCOP families group together domains with sequence similarity, do they group domains with similar structure or by common function? It is these questions we answer, but most importantly, whether each family represents a distinct phylogenetic group within a superfamily. Several phylogenetic trees were generated for each superfamily: one derived from a multiple sequence alignment, one based on structural distances, and the final two from presence/absence of GO terms or EC numbers assigned to domains. The topologies of the resulting trees and confidence values were compared to the SCOP family classification. We show that SCOP family groupings are evolutionarily consistent to a very high degree with respect to classical sequence phylogenetics. The trees built from (automatically generated) structural distances correlate well, but are not always consistent with SCOP (hand annotated) groupings. Trees derived from functional data are less consistent with the family level than those from structure or sequence, though the majority still agree. Much of GO and EC annotation applies directly to one family or subset of the family; relatively few terms apply at the superfamily level. Maximum sequence diversity within a family is on average 22% but close to zero for superfamilies.
View details for DOI 10.1186/1472-6807-12-27
View details for Web of Science ID 000311335600001
View details for PubMedID 23078280
-
KoBaMIN: a knowledge-based minimization web server for protein structure refinement
NUCLEIC ACIDS RESEARCH
2012; 40 (W1): W323-W328
View details for DOI 10.1093/nar/gks376
View details for Web of Science ID 000306670900053
-
KoBaMIN: a knowledge-based minimization web server for protein structure refinement.
Nucleic acids research
2012; 40 (Web Server issue): W323-8
Abstract
The KoBaMIN web server provides an online interface to a simple, consistent and computationally efficient protein structure refinement protocol based on minimization of a knowledge-based potential of mean force. The server can be used to refine either a single protein structure or an ensemble of proteins starting from their unrefined coordinates in PDB format. The refinement method is particularly fast and accurate due to the underlying knowledge-based potential derived from structures deposited in the PDB; as such, the energy function implicitly includes the effects of solvent and the crystal environment. Our server allows for an optional but recommended step that optimizes stereochemistry using the MESHI software. The KoBaMIN server also allows comparison of the refined structures with a provided reference structure to assess the changes brought about by the refinement protocol. The performance of KoBaMIN has been benchmarked widely on a large set of decoys, all models generated at the seventh worldwide experiments on critical assessment of techniques for protein structure prediction (CASP7) and it was also shown to produce top-ranking predictions in the refinement category at both CASP8 and CASP9, yielding consistently good results across a broad range of model quality values. The web server is fully functional and freely available at http://csb.stanford.edu/kobamin.
View details for DOI 10.1093/nar/gks376
View details for PubMedID 22564897
View details for PubMedCentralID PMC3394243
-
Multiscale natural moves refine macromolecules using single-particle electron microscopy projection images
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (25): 9845-9850
Abstract
The method presented here refines molecular conformations directly against projections of single particles measured by electron microscopy. By optimizing the orientation of the projection at the same time as the conformation, the method is well-suited to two-dimensional class averages from cryoelectron microscopy. Such direct use of two-dimensional images circumvents the need for a three-dimensional density map, which may be difficult to reconstruct from projections due to structural heterogeneity or preferred orientations of the sample on the grid. Our refinement protocol exploits Natural Move Monte Carlo to model a macromolecule as a small number of segments connected by flexible loops, on multiple scales. After tests on artificial data from lysozyme, we applied the method to the Methanococcus maripaludis chaperonin. We successfully refined its conformation from a closed-state initial model to an open-state final model using just one class-averaged projection. We also used Natural Moves to iteratively refine against heterogeneous projection images of Methanococcus maripaludis chaperonin in a mix of open and closed states. Our results suggest a general method for electron microscopy refinement specially suited to macromolecules with significant conformational flexibility. The algorithm is available in the program Methodologies for Optimization and Sampling In Computational Studies.
View details for DOI 10.1073/pnas.1205945109
View details for Web of Science ID 000306061400043
View details for PubMedID 22665770
View details for PubMedCentralID PMC3382478
-
Improving the accuracy of macromolecular structure refinement at 7 Å resolution.
Structure
2012; 20 (6): 957-966
Abstract
In X-ray crystallography, molecular replacement and subsequent refinement is challenging at low resolution. We compared refinement methods using synchrotron diffraction data of photosystem I at 7.4 Å resolution, starting from different initial models with increasing deviations from the known high-resolution structure. Standard refinement spoiled the initial models, moving them further away from the true structure and leading to high R(free)-values. In contrast, DEN refinement improved even the most distant starting model as judged by R(free), atomic root-mean-square differences to the true structure, significance of features not included in the initial model, and connectivity of electron density. The best protocol was DEN refinement with initial segmented rigid-body refinement. For the most distant initial model, the fraction of atoms within 2 Å of the true structure improved from 24% to 60%. We also found a significant correlation between R(free) values and the accuracy of the model, suggesting that R(free) is useful even at low resolution.
View details for DOI 10.1016/j.str.2012.04.020
View details for PubMedID 22681901
View details for PubMedCentralID PMC3380535
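The accuracy measures quoted in this abstract (root-mean-square difference and the fraction of atoms within 2 Å of the true structure) can be sketched as follows. This is an illustrative calculation only: it assumes the model and the reference list the same atoms in the same order and already share a coordinate frame, and the coordinates below are made up.

```python
import math

def atom_deviations(model, reference):
    """Per-atom distance between matched atoms, in the units of the input (Å)."""
    return [math.dist(a, b) for a, b in zip(model, reference)]

def rmsd(model, reference):
    """Root-mean-square deviation over all matched atoms."""
    d = atom_deviations(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))

def fraction_within(model, reference, cutoff=2.0):
    """Fraction of atoms whose deviation is at most `cutoff`."""
    d = atom_deviations(model, reference)
    return sum(x <= cutoff for x in d) / len(d)

# Hypothetical coordinates: two atoms are off by 1 Å and two by 3 Å.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
model = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (6.0, 0.0, 0.0), (7.5, 0.0, 0.0)]

print(rmsd(model, reference), fraction_within(model, reference))
```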
-
Improving the Accuracy of Macromolecular Structure Refinement at 7 Å Resolution
STRUCTURE
2012; 20 (6): 957-966
View details for DOI 10.1016/j.str.2012.04.020
View details for Web of Science ID 000305094500004
View details for PubMedCentralID PMC3380535
-
Comparative modeling and protein-like features of hydrophobic-polar models on a two-dimensional lattice
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2012; 80 (6): 1683-1693
Abstract
Lattice models of proteins have been extensively used to study protein thermodynamics, folding dynamics, and evolution. Our study considers two different hydrophobic-polar (HP) models on the 2D square lattice: the purely HP model and a model where a compactness-favoring term is added. We exhaustively enumerate all the possible structures in our models and perform the study of their corresponding folds, HP arrangements in space and shapes. The two models considered differ greatly in their numbers of structures, folds, arrangements, and shapes. Despite their differences, both lattice models have distinctive protein-like features: (1) Shapes are compact in both models, especially when a compactness-favoring energy term is added. (2) The residue composition is independent of the chain length and is very close to 50% hydrophobic in both models, as we observe in real proteins. (3) Comparative modeling works well in both models, particularly in the more compact one. The fact that our models show protein-like features suggests that lattice models incorporate the fundamental physical principles of proteins. Our study supports the use of lattice models to study questions about proteins that require exactness and extensive calculations, such as protein design and evolution, which are often too complex and computationally demanding to be addressed with more detailed models.
View details for DOI 10.1002/prot.24067
View details for Web of Science ID 000303759000014
View details for PubMedID 22411636
View details for PubMedCentralID PMC3348970
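Exhaustive enumeration of a two-dimensional HP model, as described in this abstract, can be sketched for very short chains. The sketch below assumes the common convention of an energy of -1 for each pair of H residues that are lattice neighbours but not bonded along the chain; the compactness-favoring term mentioned in the abstract is not specified there and is not included. The function names and the toy sequence are hypothetical.

```python
def self_avoiding_walks(n):
    """Yield every n-residue self-avoiding walk on the 2D square lattice.

    The first bond is fixed along +x, which removes the four rotations;
    mirror images are still counted separately.
    """
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            site = (x + dx, y + dy)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(site)

    start = [(0, 0), (1, 0)]
    yield from extend(start[:n], set(start[:n]))

def hp_energy(sequence, path):
    """Return -1 for each non-bonded pair of H residues on adjacent sites."""
    index_at = {site: i for i, site in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Looking only in the +x and +y directions counts each pair once.
        for site in ((x + 1, y), (x, y + 1)):
            j = index_at.get(site)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts

sequence = "HPPH"  # toy sequence, not taken from the paper
walks = list(self_avoiding_walks(len(sequence)))
energies = [hp_energy(sequence, w) for w in walks]
ground_states = [w for w, e in zip(walks, energies) if e == min(energies)]
print(len(walks), min(energies), len(ground_states))
```

The number of walks grows exponentially with chain length, so this brute-force approach is practical only for short chains.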
-
Modeling nucleic acids
CURRENT OPINION IN STRUCTURAL BIOLOGY
2012; 22 (3): 273-278
Abstract
Nucleic acids are an important class of biological macromolecules that carry out a variety of cellular roles. For many functions, naturally occurring DNA and RNA molecules need to fold into precise three-dimensional structures. Due to their self-assembling characteristics, nucleic acids have also been widely studied in the field of nanotechnology, and a diverse range of intricate three-dimensional nanostructures have been designed and synthesized. Different physical terms such as base-pairing and stacking interactions, tertiary contacts, electrostatic interactions and entropy all affect nucleic acid folding and structure. Here we review general computational approaches developed to model nucleic acid systems. We focus on four key areas of nucleic acid modeling: molecular representation, potential energy function, degrees of freedom and sampling algorithm. Appropriate choices in each of these key areas in nucleic acid modeling can effectively combine to aid interpretation of experimental data and facilitate prediction of nucleic acid structure.
View details for DOI 10.1016/j.sbi.2012.03.012
View details for Web of Science ID 000306347800004
View details for PubMedID 22538125
-
EVALUATING MIXTURE MODELS FOR BUILDING RNA KNOWLEDGE-BASED POTENTIALS
JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
2012; 10 (2)
Abstract
Ribonucleic acid (RNA) molecules play important roles in a variety of biological processes. To properly function, RNA molecules usually have to fold to specific structures, and therefore understanding RNA structure is vital in comprehending how RNA functions. One approach to understanding and predicting biomolecular structure is to use knowledge-based potentials built from experimentally determined structures. These types of potentials have been shown to be effective for predicting both protein and RNA structures, but their utility is limited by their significantly rugged nature. This ruggedness (and hence the potential's usefulness) depends heavily on the choice of bin width to sort structural information (e.g. distances) but the appropriate bin width is not known a priori. To circumvent the binning problem, we compared knowledge-based potentials built from inter-atomic distances in RNA structures using different mixture models (Kernel Density Estimation, Expectation Maximization and Dirichlet Process). We show that the smooth knowledge-based potential built from Dirichlet process is successful in selecting native-like RNA models from different sets of structural decoys with comparable efficacy to a potential developed by spline-fitting - a commonly taken approach - to binned distance histograms. The less rugged nature of our potential suggests its applicability in diverse types of structural modeling.
View details for DOI 10.1142/S0219720012410107
View details for Web of Science ID 000302951300009
View details for PubMedID 22809345
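To make the binning problem concrete, the following minimal sketch builds a smooth distance potential from a Gaussian kernel density estimate instead of a histogram. It makes several assumptions: the inverse-Boltzmann form E(r) = -ln(p_obs(r) / p_ref(r)) in units of kT, a uniform reference state, a fixed bandwidth, and made-up distances. None of these choices is taken from the paper, and the Dirichlet process model it favours is not shown.

```python
import math

def gaussian_kde(samples, grid, bandwidth):
    """Kernel density estimate: the average of Gaussians centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((r - s) / bandwidth) ** 2) for s in samples)
        for r in grid
    ]

def inverse_boltzmann(density, reference, floor=1e-12):
    """Energy in units of kT; `floor` avoids taking the log of zero."""
    return [-math.log(max(p, floor) / q) for p, q in zip(density, reference)]

# Hypothetical inter-atomic distances (Å) with two preferred separations.
distances = [3.8, 3.9, 4.0, 4.0, 4.1, 4.2, 6.3, 6.5, 6.7]
step = 0.05
grid = [2.0 + step * i for i in range(141)]  # 2.0 Å to 9.0 Å

density = gaussian_kde(distances, grid, bandwidth=0.3)
reference = [1.0 / (grid[-1] - grid[0])] * len(grid)  # uniform reference (assumption)
energy = inverse_boltzmann(density, reference)

r_min = grid[energy.index(min(energy))]
print(round(r_min, 2))
```

The resulting curve is smooth and differentiable everywhere, but the bandwidth now plays the role that the bin width played for a histogram.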
-
Application of DEN refinement and automated model building to a difficult case of molecular-replacement phasing: the structure of a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum
ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY
2012; 68: 391-403
Abstract
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
View details for DOI 10.1107/S090744491104978X
View details for Web of Science ID 000302138400008
View details for PubMedID 22505259
View details for PubMedCentralID PMC3322598
-
Modeling and design by hierarchical natural moves
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (8): 2890-2895
Abstract
We develop a unique algorithm implemented in the program MOSAICS (Methodologies for Optimization and Sampling in Computational Studies) that is capable of nanoscale modeling without compromising the resolution of interest. This is achieved by modeling with customizable hierarchical degrees of freedom, thereby circumventing major limitations of conventional molecular modeling. With the emergence of RNA-based nanotechnology, large RNAs in all-atom representation are used here to benchmark our algorithm. Our method locates all favorable structural states of a model RNA of significant complexity while improving sampling accuracy and increasing speed many fold over existing all-atom RNA modeling methods. We also modeled the effects of sequence mutations on the structural building blocks of tRNA-based nanotechnology. With its flexibility in choosing arbitrary degrees of freedom as well as in allowing different all-atom energy functions, MOSAICS is an ideal tool to model and design biomolecules of the nanoscale.
View details for DOI 10.1073/pnas.1119918109
View details for Web of Science ID 000300495100048
View details for PubMedID 22308445
View details for PubMedCentralID PMC3287004
-
Subunit order of eukaryotic TRiC/CCT chaperonin by cross-linking, mass spectrometry, and combinatorial homology modeling
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (8): 2884-2889
Abstract
The TRiC/CCT chaperonin is a 1-MDa hetero-oligomer of 16 subunits that assists the folding of proteins in eukaryotes. Low-resolution structural studies confirmed the TRiC particle to be composed of two stacked octameric rings enclosing a folding cavity. The exact arrangement of the different proteins in the rings underlies the functionality of TRiC and is likely to be conserved across all eukaryotes. Yet despite its importance it has not been determined conclusively, mainly because the different subunits appear nearly identical under low resolution. This work successfully addresses the arrangement problem by the emerging technique of cross-linking, mass spectrometry, and modeling. We cross-linked TRiC under native conditions with a cross-linker that is primarily reactive toward exposed lysine side chains that are spatially close in the context of the particle. Following digestion and mass spectrometry we were able to identify over 60 lysine pairs that underwent cross-linking, thus providing distance restraints between specific residues in the complex. Independently of the cross-link set, we constructed 40,320 (= 8 factorial) computational models of the TRiC particle, which exhaustively enumerate all the possible arrangements of the different subunits. When we assessed the compatibility of each model with the cross-link set, we discovered that one specific model is significantly more compatible than any other model. Furthermore, bootstrapping analysis confirmed that this model is 10 times more likely to result from this cross-link set than the next best-fitting model. Our subunit arrangement is very different than any of the previously reported models and changes the context of existing and future findings on TRiC.
View details for DOI 10.1073/pnas.1119472109
View details for Web of Science ID 000300495100047
View details for PubMedID 22308438
View details for PubMedCentralID PMC3287007
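The exhaustive search described in this abstract can be illustrated with a much simplified sketch. It enumerates all 8! = 40,320 orderings of eight subunits and scores each by how many cross-linked subunit pairs end up as neighbours in a single ring. The subunit labels, the cross-link list and the scoring rule are hypothetical; the study itself scored three-dimensional models of the full particle against residue-level distance restraints.

```python
import math
from itertools import permutations

SUBUNITS = "ABCDEFGH"  # hypothetical labels for the eight subunit types

def ring_neighbors(order):
    """Unordered pairs of subunits that sit next to each other in a closed ring."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}

def score(order, crosslinked_pairs):
    """Number of cross-linked pairs that are neighbours in this arrangement."""
    return len(ring_neighbors(order) & crosslinked_pairs)

# Hypothetical cross-links, chosen to be consistent with the ring A-B-...-H.
crosslinks = {frozenset(p) for p in ["AB", "BC", "CD", "DE", "EF", "FG", "GH", "HA"]}

models = list(permutations(SUBUNITS))
scores = [score(m, crosslinks) for m in models]
best = max(scores)
n_best = scores.count(best)
print(len(models) == math.factorial(8), best, n_best)
```

In this single-ring toy score, the rotations and mirror images of one ring order are indistinguishable and tie for the best score, which is why more than one permutation reaches the maximum.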
-
Symmetry-free cryo-EM structures of the chaperonin TRiC along its ATPase-driven conformational cycle
EMBO JOURNAL
2012; 31 (3): 720-730
Abstract
The eukaryotic group II chaperonin TRiC/CCT is a 16-subunit complex with eight distinct but similar subunits arranged in two stacked rings. Substrate folding inside the central chamber is triggered by ATP hydrolysis. We present five cryo-EM structures of TRiC in apo and nucleotide-induced states without imposing symmetry during the 3D reconstruction. These structures reveal the intra- and inter-ring subunit interaction pattern changes during the ATPase cycle. In the apo state, the subunit arrangement in each ring is highly asymmetric, whereas all nucleotide-containing states tend to be more symmetrical. We identify and structurally characterize a one-ring closed intermediate induced by ATP hydrolysis wherein the closed TRiC ring exhibits an observable chamber expansion. This likely represents the physiological substrate folding state. Our structural results suggest mechanisms for inter-ring negative cooperativity, intra-ring positive cooperativity, and protein-folding chamber closure of TRiC. Intriguingly, these mechanisms are different from other group I and II chaperonins despite their similar architecture.
View details for DOI 10.1038/emboj.2011.366
View details for Web of Science ID 000300871700019
View details for PubMedID 22045336
View details for PubMedCentralID PMC3273382
-
Optimized Torsion-Angle Normal Modes Reproduce Conformational Changes More Accurately Than Cartesian Modes
BIOPHYSICAL JOURNAL
2011; 101 (12): 2966-2969
Abstract
We present what to our knowledge is a new method of optimized torsion-angle normal-mode analysis, in which the normal modes move along curved paths in Cartesian space. We show that optimized torsion-angle normal modes reproduce protein conformational changes more accurately than Cartesian normal modes. We also show that orthogonalizing the displacement vectors from torsion-angle normal-mode analysis and projecting them as straight lines in Cartesian space does not lead to better performance than Cartesian normal modes. Clearly, protein motion is more naturally described by curved paths in Cartesian space.
View details for DOI 10.1016/j.bpj.2011.10.054
View details for Web of Science ID 000298445500012
View details for PubMedID 22208195
-
Remarkable patterns of surface water ordering around polarized buckminsterfullerene
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (35): 14455-14460
Abstract
Accurate description of water structure affects simulation of protein folding, substrate binding, macromolecular recognition, and complex formation. We study the hydration of buckminsterfullerene, the smallest hydrophobic nanosphere, by molecular dynamics simulations using a state-of-the-art quantum mechanical polarizable force field (QMPFF3), derived from quantum mechanical data at the MP2/aug-cc-pVTZ(-hp) level augmented by CCSD(T). QMPFF3 calculation of the hydrophobic effect is compared to that obtained with empirical force fields. Using a novel and highly sensitive method, we see that polarization increases ordered water structure so that the imprint of the hydrophobic surface atoms on the surrounding waters is stronger and extends to long range. We see less water order for empirical force fields. The greater order seen with QMPFF3 will affect biological processes through a stronger hydrophobic effect.
View details for DOI 10.1073/pnas.1110626108
View details for Web of Science ID 000294425900024
View details for PubMedID 21844369
View details for PubMedCentralID PMC3167499
-
Fully differentiable coarse-grained and all-atom knowledge-based potentials for RNA structure evaluation
RNA-A PUBLICATION OF THE RNA SOCIETY
2011; 17 (6): 1066-1075
Abstract
RNA molecules play integral roles in gene regulation, and understanding their structures gives us important insights into their biological functions. Despite recent developments in template-based and parameterized energy functions, the structure of RNA--in particular the nonhelical regions--is still difficult to predict. Knowledge-based potentials have proven efficient in protein structure prediction. In this work, we describe two differentiable knowledge-based potentials derived from a curated data set of RNA structures, with all-atom or coarse-grained representation, respectively. We focus on one aspect of the prediction problem: the identification of native-like RNA conformations from a set of near-native models. Using a variety of near-native RNA models generated from three independent methods, we show that our potential is able to distinguish the native structure and identify native-like conformations, even at the coarse-grained level. The all-atom version of our knowledge-based potential performs better and appears to be more effective at discriminating near-native RNA conformations than one of the most highly regarded parameterized potentials. The fully differentiable form of our potentials will additionally likely be useful for structure refinement and/or molecular dynamics simulations.
View details for DOI 10.1261/rna.2543711
View details for Web of Science ID 000290666300008
View details for PubMedID 21521828
-
Cryo-EM Structure of a Group II Chaperonin in the Prehydrolysis ATP-Bound State Leading to Lid Closure
STRUCTURE
2011; 19 (5): 633-639
Abstract
Chaperonins are large ATP-driven molecular machines that mediate cellular protein folding. Group II chaperonins use their "built-in lid" to close their central folding chamber. Here we report the structure of an archaeal group II chaperonin in its prehydrolysis ATP-bound state at subnanometer resolution using single particle cryo-electron microscopy (cryo-EM). Structural comparison of Mm-cpn in ATP-free, ATP-bound, and ATP-hydrolysis states reveals that ATP binding alone causes the chaperonin to close slightly with a ∼45° counterclockwise rotation of the apical domain. The subsequent ATP hydrolysis drives each subunit to rock toward the folding chamber and to close the lid completely. These motions are attributable to the local interactions of specific active site residues with the nucleotide, the tight couplings between the apical and intermediate domains within the subunit, and the aligned interactions between two subunits across the rings. This mechanism of structural changes in response to ATP is entirely different from those found in group I chaperonins.
View details for DOI 10.1016/j.str.2011.03.005
View details for Web of Science ID 000290815500006
View details for PubMedID 21565698
-
Normal Modes of Prion Proteins: From Native to Infectious Particle
BIOCHEMISTRY
2011; 50 (12): 2243-2248
Abstract
Prion proteins (PrP) are the infectious agent in transmissible spongiform encephalopathies (i.e., mad cow disease). To be infectious, prion proteins must undergo a conformational change involving a decrease in α-helical content along with an increase in β-strand content. This conformational change was evaluated by means of elastic normal modes. Elastic normal modes show a diminution of two α-helices by one and two residues, as well as an extension of two β-strands by three residues each, which could instigate the conformational change. The conformational change occurs in a region that is compatible with immunological studies, and it is observed more frequently in mutant prions that are prone to conversion than in wild-type prions because of differences in their starting structures, which are amplified through normal modes. These findings are valuable for our comprehension of the conversion mechanism associated with the conformational change in prion proteins.
View details for DOI 10.1021/bi1010514
View details for Web of Science ID 000288573500027
View details for PubMedID 21338080
View details for PubMedCentralID PMC3070235
-
Clustering to identify RNA conformations constrained by secondary structure
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (9): 3590-3595
Abstract
RNA often folds hierarchically, so that its sequence defines its secondary structure (helical base-paired regions connected by single-stranded junctions), which subsequently defines its tertiary fold. To preserve base-pairing and chain connectivity, the three-dimensional conformations that RNA can explore are strongly confined compared to when secondary structure constraints are not enforced. Using three examples, we studied how secondary structure confines and dictates an RNA's preferred conformations. We made use of Macromolecular Conformations by SYMbolic programming (MC-Sym) fragment assembly to generate RNA conformations constrained by secondary structure. Then, to understand the correlations between different helix placements and orientations, we robustly clustered all RNA conformations by employing unique methods to remove outliers and estimate the best number of conformational clusters. We observed that the preferred conformation (as judged by largest cluster size) for each type of RNA junction molecule tested is consistent with its biological function. Further, the improved quality of models in our pruned datasets facilitates subsequent discrimination using scoring functions based either on statistical analysis (knowledge based) or experimental data.
View details for DOI 10.1073/pnas.1018653108
View details for Web of Science ID 000287844400031
View details for PubMedID 21317361
View details for PubMedCentralID PMC3048103
-
To what extent does the citation advantage of collaboration depend on the citation counting system?
13th Conference of the International-Society-for-Scientometrics-and-Informetrics (ISSI)
INT SOC SCIENTOMETRICS & INFORMETRICS-ISSI. 2011: 398–408
View details for Web of Science ID 000305337100042
-
RNA polymerase II trigger loop residues stabilize and position the incoming nucleotide triphosphate in transcription
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2010; 107 (36): 15745-15750
Abstract
A structurally conserved element, the trigger loop, has been suggested to play a key role in substrate selection and catalysis of RNA polymerase II (pol II) transcription elongation. Recently resolved X-ray structures showed that the trigger loop forms direct interactions with the beta-phosphate and base of the matched nucleotide triphosphate (NTP) through residues His1085 and Leu1081, respectively. In order to understand the role of these two critical residues in stabilizing active site conformation in the dynamic complex, we performed all-atom molecular dynamics simulations of the wild-type pol II elongation complex and its mutants in explicit solvent. In the wild-type complex, we found that the trigger loop is stabilized in the "closed" conformation, and His1085 forms a stable interaction with the NTP. Simulations of point mutations of His1085 are shown to affect this interaction; simulations of alternative protonation states, which are inaccessible through experiment, indicate that only the protonated form is able to stabilize the His1085-NTP interaction. Another trigger loop residue, Leu1081, stabilizes the incoming nucleotide position through interaction with the nucleotide base. Our simulations of this Leu mutant suggest a three-component mechanism for correctly positioning the incoming NTP in which (i) hydrophobic contact through Leu1081, (ii) base stacking, and (iii) base pairing work together to minimize the motion of the incoming NTP base. These results complement experimental observations and provide insight into the role of the trigger loop on transcription fidelity.
View details for DOI 10.1073/pnas.1009898107
View details for Web of Science ID 000281637800024
View details for PubMedID 20798057
View details for PubMedCentralID PMC2936645
-
Consistent refinement of submitted models at CASP using a knowledge-based potential
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2010; 78 (12): 2668-2678
Abstract
Protein structure refinement is an important but unsolved problem; it must be solved if we are to predict biological function that is very sensitive to structural details. Specifically, critical assessment of techniques for protein structure prediction (CASP) shows that the accuracy of predictions in the comparative modeling category is often worse than that of the template on which the homology model is based. Here we describe a refinement protocol that is able to consistently refine submitted predictions for all categories at CASP7. The protocol uses direct energy minimization of the knowledge-based potential of mean force that is based on the interaction statistics of 167 atom types (Summa and Levitt, Proc Natl Acad Sci USA 2007; 104:3177-3182). Our protocol is thus computationally very efficient; it only takes a few minutes of CPU time to run typical protein models (300 residues). We observe an average structural improvement of 1% in GDT_TS, for predictions that have low and medium homology to known PDB structures (Global Distance Test score or GDT_TS between 50 and 80%). We also observe a marked improvement in the stereochemistry of the models. The level of improvement varies amongst the various participants at CASP, but we see large improvements (>10% increase in GDT_TS) even for models predicted by the best performing groups at CASP7. In addition, our protocol consistently improved the best predicted models in the refinement category at CASP7 and CASP8. These improvements in structure and stereochemistry prove the usefulness of our computationally inexpensive, powerful and automatic refinement protocol.
View details for DOI 10.1002/prot.22781
View details for Web of Science ID 000280822000008
View details for PubMedID 20589633
View details for PubMedCentralID PMC2911515
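The GDT_TS score quoted in the abstract above can be illustrated with a simplified calculation. The sketch below assumes the model has already been superimposed on the native structure and that per-residue C-alpha distances are available; the full Global Distance Test searches over many superpositions for each cutoff, which is omitted here. The function name `gdt_ts` and the sample distances are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Distance cutoffs (in angstroms) conventionally used for GDT_TS.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT_TS (in percent) from per-residue C-alpha distances.

    Assumes a single, fixed superposition. The real score maximizes the
    fraction of residues under each cutoff over many superpositions.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean of the four.
    fractions = [sum(d <= c for d in distances) / n for c in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Hypothetical distances for a five-residue model.
example = [0.5, 1.5, 3.0, 6.0, 10.0]
print(f"GDT_TS = {gdt_ts(example):.1f}")
```

On this scale, the 1% average improvement reported above corresponds to a one-point change in the returned value.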
-
Conformational Optimization with Natural Degrees of Freedom: A Novel Stochastic Chain Closure Algorithm
JOURNAL OF COMPUTATIONAL BIOLOGY
2010; 17 (8): 993-1010
Abstract
The present article introduces a set of novel methods that facilitate the use of "natural moves" or arbitrary degrees of freedom that can give rise to collective rearrangements in the structure of biological macromolecules. While such "natural moves" may spoil the stereochemistry and even break the bonded chain at multiple locations, our new method restores the correct chain geometry by adjusting bond and torsion angles in an arbitrary defined molten zone. This is done by successive stages of partial closure that propagate the location of the chain break backwards along the chain. At the end of these stages, the size of the chain break is generally reduced so much that it can be repaired by adjusting the position of a single atom. Our chain closure method is efficient with a computational complexity of O(N(d)), where N(d) is the number of degrees of freedom used to repair the chain break. The new method facilitates the use of arbitrary degrees of freedom including the "natural" degrees of freedom inferred from analyzing experimental (X-ray crystallography and nuclear magnetic resonance [NMR]) structures of nucleic acids and proteins. In terms of its ability to generate large conformational moves and its effectiveness in locating low energy states, the new method is robust and computationally efficient.
View details for DOI 10.1089/cmb.2010.0016
View details for Web of Science ID 000281199700003
View details for PubMedID 20726792
View details for PubMedCentralID PMC3119633
-
MOTIF-EM: an automated computational tool for identifying conserved regions in CryoEM structures
BIOINFORMATICS
2010; 26 (12): i301-i309
Abstract
We present a new, first-of-its-kind, fully automated computational tool MOTIF-EM for identifying regions or domains or motifs in cryoEM maps of large macromolecular assemblies (such as chaperonins, viruses, etc.) that remain conformationally conserved. As a by-product, regions in structures that are not conserved are revealed: this can indicate local molecular flexibility related to biological activity. MOTIF-EM takes cryoEM volumetric maps as inputs. The technique used by MOTIF-EM to detect conserved sub-structures is inspired by a recent breakthrough in 2D object recognition. The technique works by constructing rotationally invariant, low-dimensional representations of local regions in the input cryoEM maps. Correspondences are established between the reduced representations (by comparing them using a simple metric) across the input maps. The correspondences are clustered using hash tables and graph theory is used to retrieve conserved structural domains or motifs. MOTIF-EM has been used to extract conserved domains occurring in large macromolecular assembly maps, including those of viruses P22 and epsilon 15, Ribosome 70S, and GroEL, that remain structurally conserved in different functional states. Our method can also be used to build atomic models for some maps. We also used MOTIF-EM to identify the conserved folds shared among dsDNA bacteriophages HK97, Epsilon 15, and φ29, though they have low sequence similarity. Supplementary information: Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btq195
View details for Web of Science ID 000278689000037
View details for PubMedID 20529921
View details for PubMedCentralID PMC2881380
-
Super-resolution biomolecular crystallography with low-resolution data
NATURE
2010; 464 (7292): 1218-U146
Abstract
X-ray diffraction plays a pivotal role in the understanding of biological systems by revealing atomic structures of proteins, nucleic acids and their complexes, with much recent interest in very large assemblies like the ribosome. As crystals of such large assemblies often diffract weakly (resolution worse than 4 Å), we need methods that work at such low resolution. In macromolecular assemblies, some of the components may be known at high resolution, whereas others are unknown: current refinement methods fail as they require a high-resolution starting structure for the entire complex. Determining the structure of such complexes, which are often of key biological importance, should be possible in principle as the number of independent diffraction intensities at a resolution better than 5 Å generally exceeds the number of degrees of freedom. Here we introduce a method that adds specific information from known homologous structures but allows global and local deformations of these homology models. Our approach uses the observation that local protein structure tends to be conserved as sequence and function evolve. Cross-validation with R(free) (the free R-factor) determines the optimum deformation and influence of the homology model. For test cases at 3.5-5 Å resolution with known structures at high resolution, our method gives significant improvements over conventional refinement in the model as monitored by coordinate accuracy, the definition of secondary structure and the quality of electron density maps. For re-refinements of a representative set of 19 low-resolution crystal structures from the Protein Data Bank, we find similar improvements. Thus, a structure derived from low-resolution diffraction data can have quality similar to a high-resolution structure. Our method is applicable to the study of weakly diffracting crystals using X-ray micro-diffraction as well as data from new X-ray light sources. Use of homology information is not restricted to X-ray crystallography and cryo-electron microscopy: as optical imaging advances to subnanometre resolution, it can use similar tools.
View details for DOI 10.1038/nature08892
View details for Web of Science ID 000276891100043
View details for PubMedID 20376006
View details for PubMedCentralID PMC2859093
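The free R-factor used for cross-validation in the abstract above follows the usual crystallographic definition: the R-factor is the sum of absolute differences between observed and calculated structure-factor amplitudes divided by the sum of observed amplitudes, and the free variant is evaluated only on reflections held out of refinement. The sketch below is a minimal illustration of that definition, assuming the calculated amplitudes are already on the same scale as the observed ones; the function names, the flag convention and the sample amplitudes are hypothetical.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), with no scale factor applied."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(
    f_obs: Sequence[float],
    f_calc: Sequence[float],
    free_flags: Sequence[bool],
) -> Tuple[float, float]:
    """Split reflections by flag and return (R_work, R_free).

    A True flag marks a reflection held out of refinement (the test set).
    """
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Hypothetical amplitudes; the third reflection is in the test set.
obs = [100.0, 80.0, 60.0, 40.0]
calc = [90.0, 85.0, 50.0, 44.0]
flags = [False, False, True, False]
r_work, r_free = r_work_and_free(obs, calc, flags)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the held-out reflections never influence refinement, a parameter such as the restraint weight can be chosen by scanning candidate values and keeping the one with the lowest free R-factor.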
-
Mechanism of folding chamber closure in a group II chaperonin
NATURE
2010; 463 (7279): 379-U130
Abstract
Group II chaperonins are essential mediators of cellular protein folding in eukaryotes and archaea. These oligomeric protein machines, approximately 1 megadalton, consist of two back-to-back rings encompassing a central cavity that accommodates polypeptide substrates. Chaperonin-mediated protein folding is critically dependent on the closure of a built-in lid, which is triggered by ATP hydrolysis. The structural rearrangements and molecular events leading to lid closure are still unknown. Here we report four single particle cryo-electron microscopy (cryo-EM) structures of Mm-cpn, an archaeal group II chaperonin, in the nucleotide-free (open) and nucleotide-induced (closed) states. The 4.3 Å resolution of the closed conformation allowed building of the first ever atomic model directly from the single particle cryo-EM density map, in which we were able to visualize the nucleotide and more than 70% of the side chains. The model of the open conformation was obtained by using the deformable elastic network modelling with the 8 Å resolution open-state cryo-EM density restraints. Together, the open and closed structures show how local conformational changes triggered by ATP hydrolysis lead to an alteration of intersubunit contacts within and across the rings, ultimately causing a rocking motion that closes the ring. Our analyses show that there is an intricate and unforeseen set of interactions controlling allosteric communication and inter-ring signalling, driving the conformational cycle of group II chaperonins. Beyond this, we anticipate that our methodology of combining single particle cryo-EM and computational modelling will become a powerful tool in the determination of atomic details involved in the dynamic processes of macromolecular machines in solution.
View details for DOI 10.1038/nature08701
View details for Web of Science ID 000273748100049
View details for PubMedID 20090755
View details for PubMedCentralID PMC2834796
-
Insights into the intra-ring subunit order of TRiC/CCT: a structural and evolutionary analysis.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2010: 252-259
Abstract
TRiC is an important group II chaperonin that facilitates the folding of many eukaryotic proteins. The TRiC complex consists of two stacked rings, each comprised of eight paralogous subunits with a mutual sequence identity of 30-35%. Each subunit has unique functional roles that are manifested by corresponding sequence conservation. It is generally assumed that the subunit order within each ring is fixed, but this order is still uncertain. Here we address the problem of the intra-ring subunit order by combining two sources of information: evolutionary conservation and a structural hypothesis. Specifically, we identify residues in the TRiC subunits that are likely to be part of the intra-unit interface, based on homology modeling to the solved thermosome structure. Within this set of residues, we search for a subset that shows an evolutionary conservation pattern that is indicative of the subunit order key. This pattern shows considerable conservation across species, but large variation across the eight subunits. By this approach we were able to locate two parts of the interface where complementary interactions seem to favor certain pairings of subunits. This knowledge leads to restrictions on the 5,040 (=7!) possible subunit arrangements in the ring, and limits them to just 72. Although our findings give only partial understanding of the inter-subunit interactions that determine their order, we conclude that they are comprised of complementary charged, polar and hydrophobic interactions that occur in both the equatorial and middle domains of each subunit.
View details for PubMedID 19908377
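The count of 5,040 (=7!) arrangements in the abstract above follows from placing eight distinct subunits around a ring and treating rotations of the ring as equivalent. The sketch below checks that figure by brute-force enumeration; the single-letter subunit labels and the function name `ring_arrangements` are placeholders chosen for illustration, and the filtering that reduces the set to 72 arrangements is not reproduced here.

```python
from itertools import permutations
from math import factorial
from typing import Iterator, Sequence, Tuple


def ring_arrangements(subunits: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield arrangements of distinct subunits around a ring.

    Rotations are treated as the same arrangement, so the first subunit
    is pinned in place and the remaining ones are permuted. Mirror images
    are counted separately, which is what a count of (n-1)! implies.
    """
    first, rest = subunits[0], subunits[1:]
    for perm in permutations(rest):
        yield (first,) + perm


# Placeholder labels for the eight paralogous subunits.
subunits = tuple("ABCDEFGH")
count = sum(1 for _ in ring_arrangements(subunits))
print(count, factorial(7))
```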
-
Nature of the protein universe
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (27): 11079-11084
Abstract
The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by approximately 15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families.
View details for DOI 10.1073/pnas.0905029106
View details for Web of Science ID 000267796100040
View details for PubMedID 19541617
View details for PubMedCentralID PMC2698892
-
Structural Basis of Transcription: Backtracked RNA Polymerase II at 3.4 Angstrom Resolution
SCIENCE
2009; 324 (5931): 1203-1206
Abstract
Transcribing RNA polymerases oscillate between three stable states, two of which, pre- and posttranslocated, were previously subjected to x-ray crystal structure determination. We report here the crystal structure of RNA polymerase II in the third state, the reverse translocated, or "backtracked" state. The defining feature of the backtracked structure is a binding site for the first backtracked nucleotide. This binding site is occupied in case of nucleotide misincorporation in the RNA or damage to the DNA, and is termed the "P" site because it supports proofreading. The predominant mechanism of proofreading is the excision of a dinucleotide in the presence of the elongation factor SII (TFIIS). Structure determination of a cocrystal with TFIIS reveals a rearrangement whereby cleavage of the RNA may take place.
View details for DOI 10.1126/science.1168729
View details for Web of Science ID 000266410100047
View details for PubMedID 19478184
View details for PubMedCentralID PMC2718261
-
Outcome of a Workshop on Applications of Protein Models in Biomedical Research
STRUCTURE
2009; 17 (2): 151-159
Abstract
We describe the proceedings and conclusions from the "Workshop on Applications of Protein Models in Biomedical Research" (the Workshop) that was held at the University of California, San Francisco on 11 and 12 July, 2008. At the Workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) the requirements and challenges for different applications, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
View details for DOI 10.1016/j.str.2008.12.014
View details for Web of Science ID 000263384800003
View details for PubMedID 19217386
View details for PubMedCentralID PMC2739730
-
Generalized ensemble methods for de novo structure prediction
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (5): 1415-1420
Abstract
Current methods for predicting protein structure depend on two interrelated components: (i) an energy function that should have a low value near the correct structure and (ii) a method for searching through different conformations of the polypeptide chain. Identification of the most efficient search methods is essential if we are to be able to apply such methods broadly and with confidence. In addition, efficient search methods provide a rigorous test of existing energy functions, which are generally knowledge-based and contain different terms added together with arbitrary weights. Here, we test different search methods with one of the most accurate and predictive energy functions, namely Rosetta, the knowledge-based force field from Baker's group [Simons K, Kooperberg C, Huang E, Baker D (1997) J Mol Biol 268:209-225]. We use an implementation of a generalized ensemble search method to scale relevant parts of the energy function. This method, known as Hamiltonian Replica Exchange Monte Carlo, outperforms the original Monte Carlo Simulated Annealing used in the Rosetta package in terms of sampling low-energy states. It also outperforms another widely used generalized ensemble search method known as Temperature Replica Exchange Monte Carlo. Our results reveal clear deficiencies in the low-resolution Rosetta energy function in that the lowest energy structures are not necessarily the most native-like. By using a set of nonnative low-energy structures found by our extensive sampling, we discovered that the long-range and short-range backbone hydrogen-bonding energy terms of the Rosetta energy discriminate between the nonnative and native-like structures significantly better than the low-resolution score used in Rosetta.
View details for DOI 10.1073/pnas.0812510106
View details for Web of Science ID 000263074600025
View details for PubMedID 19171891
View details for PubMedCentralID PMC2631076
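The two generalized ensemble methods named in the abstract above differ in what varies between replicas: temperature in one case, the energy function itself in the other. The sketch below shows the textbook Metropolis acceptance probabilities for exchanging configurations between two replicas in each scheme. It is a generic illustration and not the implementation used in the paper; the function names, the toy quadratic energy functions and the numeric values are all assumptions made for the example.

```python
import math
from typing import Callable


def _metropolis(log_ratio: float) -> float:
    """Return min(1, exp(log_ratio)) without overflowing for large values."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def temperature_swap_prob(
    beta_i: float, beta_j: float, energy_i: float, energy_j: float
) -> float:
    """Temperature replica exchange: one energy function, two temperatures.

    energy_i is the energy of the configuration currently held by the
    replica at inverse temperature beta_i (likewise for j).
    """
    return _metropolis((beta_i - beta_j) * (energy_i - energy_j))


def hamiltonian_swap_prob(
    beta: float,
    h_i: Callable[[float], float],
    h_j: Callable[[float], float],
    x_i: float,
    x_j: float,
) -> float:
    """Hamiltonian replica exchange: one temperature, two energy functions.

    x_i is the configuration currently held by the replica with energy
    function h_i (likewise for j).
    """
    before = h_i(x_i) + h_j(x_j)
    after = h_i(x_j) + h_j(x_i)
    return _metropolis(-beta * (after - before))


# Toy example: replica j uses the same quadratic energy scaled by 0.5.
def full(x: float) -> float:
    return x * x


def scaled(x: float) -> float:
    return 0.5 * x * x


print(temperature_swap_prob(1.0, 0.5, 1.0, 2.0))
print(hamiltonian_swap_prob(1.0, full, scaled, 1.0, 2.0))
```

In both cases a swap that moves the lower-energy configuration to the more stringent replica is always accepted, and the reverse swap is accepted with a probability that decays exponentially.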
-
Can Morphing Methods Predict Intermediate Structures?
JOURNAL OF MOLECULAR BIOLOGY
2009; 385 (2): 665-674
Abstract
Movement is crucial to the biological function of many proteins, yet crystallographic structures of proteins can give us only a static snapshot. The protein dynamics that are important to biological function often happen on a timescale that is unattainable through detailed simulation methods such as molecular dynamics as they often involve crossing high-energy barriers. To address this coarse-grained motion, several methods have been implemented as web servers in which a set of coordinates is usually linearly interpolated from an initial crystallographic structure to a final crystallographic structure. We present a new morphing method that does not extrapolate linearly and can therefore go around high-energy barriers and which can produce different trajectories between the same two starting points. In this work, we evaluate our method and other established coarse-grained methods according to an objective measure: how close a coarse-grained dynamics method comes to a crystallographically determined intermediate structure when calculating a trajectory between the initial and final crystal protein structure. We test this with a set of five proteins with at least three crystallographically determined on-pathway high-resolution intermediate structures from the Protein Data Bank. For simple hinging motions involving a small conformational change, segmentation of the protein into two rigid sections outperforms other more computationally involved methods. However, large-scale conformational change is best addressed using a nonlinear approach and we suggest that there is merit in further developing such methods.
View details for DOI 10.1016/j.jmb.2008.10.064
View details for Web of Science ID 000262916900027
View details for PubMedID 18996395
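The baseline that the abstract above contrasts with, linear interpolation of coordinates between an initial and a final structure, can be written in a few lines. The sketch below illustrates only that baseline, not the nonlinear morphing method the paper presents; the function name `linear_morph` and the two-atom coordinates are hypothetical, and both structures are assumed to list the same atoms in the same order in a common frame.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def linear_morph(
    start: Sequence[Point], end: Sequence[Point], n_frames: int
) -> List[List[Point]]:
    """Return n_frames coordinate sets interpolated linearly from start to end.

    The first frame equals start and the last frame equals end.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    if len(start) != len(end):
        raise ValueError("structures must contain the same number of atoms")
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # interpolation parameter running from 0 to 1
        frame = [
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames


# Hypothetical two-atom structures.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
end = [(0.0, 0.0, 2.0), (1.0, 4.0, 0.0)]
for frame in linear_morph(start, end, 3):
    print(frame)
```

Every atom moves along a straight line at constant speed, so intermediate frames can pass through sterically impossible or high-energy geometries, which is the limitation a nonlinear path is meant to avoid.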
-
Protein segment finder: an online search engine for segment motifs in the PDB
NUCLEIC ACIDS RESEARCH
2009; 37: D224-D228
Abstract
Finding related conformations in the Protein Data Bank (PDB) is essential in many areas of bioscience. To assist this task, we designed a search engine that uses a compact database to quickly identify protein segments obeying a set of primary, secondary and tertiary structure constraints. The database contains information such as amino acid sequence, secondary structure, disulfide bonds, hydrogen bonds and atoms in contact as calculated from all protein structures in the PDB. The search engine parses the database and returns hits that match the queried parameters. The conformation search engine, which is notable for its high speed and interactive feedback, is expected to assist scientists in discovering conformation homologs and predicting protein structure. The engine is publicly available at http://ari.stanford.edu/psf and it will also be used in-house in an automatic mode aimed at discovering new protein motifs.
View details for DOI 10.1093/nar/gkn833
View details for Web of Science ID 000261906200041
View details for PubMedID 18974183
View details for PubMedCentralID PMC2686524
-
Solvent dramatically affects protein structure refinement
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2008; 105 (51): 20239-20244
Abstract
One of the most challenging problems in protein structure prediction is improvement of homology models (structures within 1-3 Å C(alpha) rmsd of the native structure), also known as the protein structure refinement problem. It has been shown that improvement could be achieved using in vacuo energy minimization with molecular mechanics and statistically derived continuously differentiable hybrid knowledge-based (KB) potential functions. Globular proteins, however, fold and function in aqueous solution in vivo and in vitro. In this work, we study the role of solvent in protein structure refinement. Molecular dynamics in explicit solvent and energy minimization in both explicit and implicit solvent were performed on a set of 75 native proteins to test the various energy potentials. A more stringent test for refinement was performed on 729 near-native decoys for each native protein. We use a powerfully convergent energy minimization method to show that implicit solvent (GBSA) provides greater improvement for some proteins than the KB potential: 24 of 75 proteins showing an average improvement of >20% in C(alpha) rmsd from the native structure with GBSA, compared to just 7 proteins with KB. Molecular dynamics in explicit solvent moved the structures further away from their native conformation than the initial, unrefined decoys. Implicit solvent gives rise to a deep, smooth potential energy attractor basin that pulls toward the native structure.
View details for DOI 10.1073/pnas.0810818105
View details for Web of Science ID 000261995600042
View details for PubMedID 19073921
View details for PubMedCentralID PMC2600579
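The refinement criterion in the abstract above, percentage improvement in C-alpha rmsd from the native structure, can be made concrete with a small calculation. The sketch below assumes the structures are already superimposed on the native one (no optimal fitting is performed) and uses hypothetical two-atom coordinates; the function names are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = sum(
        (a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))


def percent_improvement(rmsd_before: float, rmsd_after: float) -> float:
    """Positive when refinement moved the model closer to the native structure."""
    return 100.0 * (rmsd_before - rmsd_after) / rmsd_before


# Hypothetical C-alpha coordinates (angstroms).
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
refined = [(0.0, 0.0, 1.5), (1.0, 0.0, 1.5)]

before = rmsd(decoy, native)
after = rmsd(refined, native)
print(f"{before:.2f} -> {after:.2f} A, {percent_improvement(before, after):.0f}% improvement")
```

A negative return value from `percent_improvement` would indicate a model that drifted away from the native structure.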
-
Inhibition mechanism of the acetylcholine receptor by alpha-neurotoxins as revealed by normal-mode dynamics
BIOCHEMISTRY
2008; 47 (13): 4065-4070
Abstract
The nicotinic acetylcholine receptor (AChR) is the prototype of ligand-gated ion channels. Here, we calculate the dynamics of the muscle AChR using normal modes. The calculations reveal a twist-like gating motion responsible for channel opening. The ion channel diameter is shown to increase with this twist motion. Strikingly, the twist motion and the increase in channel diameter are not observed for the AChR in complex with two alpha-bungarotoxin (alphaBTX) molecules. The toxins seem to lock together neighboring receptor subunits, thereby inhibiting channel opening. Interestingly, one alphaBTX molecule suffices to prevent the twist motion. These results shed light on the gating mechanism of the AChR and present a complementary inhibition mechanism by snake-venom-derived alpha-neurotoxins.
View details for DOI 10.1021/bi702272j
View details for Web of Science ID 000254408200010
View details for PubMedID 18327915
View details for PubMedCentralID PMC2750825
-
How hydrophobic Buckminsterfullerene affects surrounding water structure
JOURNAL OF PHYSICAL CHEMISTRY B
2008; 112 (10): 2981-2990
Abstract
The hydrophobic hydration of fullerenes in water is of significant interest as the most common Buckminsterfullerene (C60) is a mesoscale sphere; C60 also has potential in pharmaceutical and nanomaterial applications. We use an all-atom molecular dynamics simulation lasting hundreds of nanoseconds to determine the behavior of a single molecule of C60 in a periodic box of water, and compare this to methane. A C60 molecule does not induce drying at the surface; however, unlike a hard sphere methane, a hard sphere C60 solute does. This is due to a larger number of attractive Lennard-Jones interactions between the carbon atom centers in C60 and the surrounding waters. In these simulations, water is not uniformly arranged but rather adopts a range of orientations in the first hydration shell despite the spherical symmetry of both solutes. There is a clear effect of solute size on the orientation of the first hydration shell waters. There is a large increase in hydrogen-bonding contacts between waters in the C60 first hydration shell. There is also a disruption of hydrogen bonds between waters in the first and second hydration shells. Water molecules in the first hydration shell preferentially create triangular structures that minimize the net water dipole near both the methane and C60 surfaces, reducing the total energy of the system. Additionally, in the first and second hydration shells, the water dipoles are ordered to a distance of 8 Å from the solute surface. We conclude that, with a diameter of approximately 1 nm, C60 behaves as a large hydrophobic solute.
View details for DOI 10.1021/jp076416h
View details for Web of Science ID 000253784700031
View details for PubMedID 18275178
-
Probing protein fold space with a simplified model
JOURNAL OF MOLECULAR BIOLOGY
2008; 375 (4): 920-933
Abstract
We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains spanning all topological classes and having a wide range of lengths (33-300 residues), amino acid composition and number of secondary structural elements. The degrees of freedom are taken as the loop torsion angles. This choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-point per residue, three-dimensional models with statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of parallel tempering and equi-energy Monte Carlo, we find that the three-point model captures the known stability of protein native structures with stable energy basins that are near-native (all alpha: 4.77 Å, all beta: 2.93 Å, alpha/beta: 3.09 Å, alpha+beta: 4.89 Å on average and within 6 Å for 71.41%, 92.85%, 94.29% and 64.28% for all-alpha, all-beta, alpha/beta and alpha+beta classes, respectively). Denatured structures also occur and these have interesting structural properties that shed light on the different landscape characteristics of alpha and beta folds. We find that alpha/beta proteins with alternating alpha and beta segments (such as the beta-barrel) are more stable than proteins in other fold classes.
View details for DOI 10.1016/j.jmb.2007.10.087
View details for Web of Science ID 000253098000004
View details for PubMedID 18054792
View details for PubMedCentralID PMC2254652
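The abstract above names parallel tempering as one of its sampling methods. As a minimal sketch of the standard replica-exchange acceptance rule (not the authors' implementation), two replicas at temperatures T_i and T_j with energies E_i and E_j exchange configurations with probability min(1, exp[(1/kT_i - 1/kT_j)(E_i - E_j)]). The function names, the reduced units (`k_b=1.0`) and the list-based bookkeeping are assumptions made for illustration.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Metropolis probability of exchanging the configurations of two replicas.

    Reduced units (k_b = 1) are an assumption of this sketch.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Moving the lower-energy configuration to the colder replica is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energies, temps, i, j, rng=random):
    """Try to exchange replicas i and j in place; return True if the swap happened."""
    p = swap_probability(energies[i], energies[j], temps[i], temps[j])
    if rng.random() < p:
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

In a full sampler the configurations themselves would be exchanged along with the energies; only the energies are tracked here to keep the sketch short.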
-
Combining efficient conformational sampling with a deformable elastic network model facilitates structure refinement at low resolution
STRUCTURE
2007; 15 (12): 1630-1641
Abstract
Structural studies of large proteins and protein assemblies are a difficult and pressing challenge in molecular biology. Experiments often yield only low-resolution or sparse data that are not sufficient to fully determine atomistic structures. We have developed a general geometry-based algorithm that efficiently samples conformational space under constraints imposed by low-resolution density maps obtained from electron microscopy or X-ray crystallography experiments. A deformable elastic network (DEN) is used to restrain the sampling to prior knowledge of an approximate structure. The DEN restraints dramatically reduce over-fitting, especially at low resolution. Cross-validation is used to optimally weight the structural information and experimental data. Our algorithm is robust even for noise-added density maps and has a large radius of convergence for our test case. The DEN restraints can also be used to enhance reciprocal space simulated annealing refinement.
View details for DOI 10.1016/j.str.2007.09.021
View details for Web of Science ID 000251655400015
View details for PubMedID 18073112
View details for PubMedCentralID PMC2213367
-
Simulations of RNA base pairs in a nanodroplet reveal solvation-dependent stability
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (30): 12336-12340
Abstract
We show that RNA base pairs have variable stability depending on their degree of solvation. This finding has far-reaching biological implications for nucleic acid structure in a partially solvated cellular environment such as inside RNA-protein complexes. Molecular dynamics simulations of partially solvated Watson-Crick RNA base pairs show that whereas water serves to destabilize a base pair by competing for and disrupting base-base hydrogen bonds, when sufficient water molecules are present, fewer hydrogen bonds are available to disrupt the base pairs and the destabilization effect is reduced. The result is that base pairs exist at a stability minimum when solvated in between 20 and 100 water molecules, the upper limit of which corresponds to the approximate number of water molecules contained in the first hydration shell.
View details for DOI 10.1073/pnas.0705573104
View details for Web of Science ID 000248472100020
View details for PubMedID 17636124
View details for PubMedCentralID PMC1920539
-
Growth of novel protein structural data
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (9): 3183-3188
Abstract
Contrary to popular assumption, the rate of growth of structural data has slowed, and the Protein Data Bank (PDB) has not been growing exponentially since 1995. Reaching such a dramatic conclusion requires careful measurement of growth of novel structures, which can be achieved by clustering entry sequences, or by using a novel index to down-weight entries with a higher number of sequence neighbors. These measures agree, and growth rates are very similar for entire PDB files, clusters, and weighted chains. The overall sizes of Structural Classification of Proteins (SCOP) categories (number of families, superfamilies, and folds) appear to be directly proportional to the number of deposited PDB files. Using our weighted chain count, which is most correlated to the change in the size of each SCOP category in any time period, shows that the rate of increase of SCOP categories is actually slowing down. This enables the final size of each of these SCOP categories to be predicted without examining or comparing protein structures. In the last 3 years, structures solved by structural genomics (SG) initiatives, especially the United States National Institutes of Health Protein Structure Initiative, have begun to redress the slowing growth of the PDB. Structures solved by SG are 3.8 times less sequence-redundant than typical PDB structures. Since mid-2004, SG programs have contributed half the novel structures measured by weighted chain counts. Our analysis does not rely on visual inspection of coordinate sets: it is done automatically, providing an accurate, up-to-date measure of the growth of novel protein structural data.
View details for DOI 10.1073/pnas.0611678104
View details for Web of Science ID 000244661400031
View details for PubMedID 17360626
View details for PubMedCentralID PMC1802002
-
Near-native structure refinement using in vacuo energy minimization
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (9): 3177-3182
Abstract
One of the greatest shortcomings of macromolecular energy minimization and molecular dynamics techniques is that they generally do not preserve the native structure of proteins as observed by x-ray crystallography. This deformation of the native structure means that these methods are not generally used to refine structures produced by homology-modeling techniques. Here, we use a database of 75 proteins to test the ability of a variety of popular molecular mechanics force fields to maintain the native structure. Minimization from the native structure is a weak test of potential energy functions: It is complemented by a much stronger test in which the same methods are compared for their ability to attract a near-native decoy protein structure toward the native structure. We use a powerfully convergent energy-minimization method and show that, of the traditional molecular mechanics potentials tested, only one showed a modest net improvement over a large data set of structurally diverse proteins. A smooth, differentiable knowledge-based pairwise atomic potential performs better on this test than traditional potential functions. This work is expected to have important implications for protein structure refinement, homology modeling, and structure prediction.
View details for DOI 10.1073/pnas.0611593104
View details for Web of Science ID 000244661400030
View details for PubMedID 17360625
View details for PubMedCentralID PMC1802011
-
Discussion of "equi-energy sampler" by Kou, Zhou and Wong
ANNALS OF STATISTICS
2006; 34 (4): 1636-1641
View details for DOI 10.1214/009053606000000470
View details for Web of Science ID 000242314100004
-
Spatial regulation and the rate of signal transduction activation
PLOS COMPUTATIONAL BIOLOGY
2006; 2 (5): 343-349
Abstract
Of the many important signaling events that take place on the surface of a mammalian cell, activation of signal transduction pathways via interactions of cell surface receptors is one of the most important. Evidence suggests that cell surface proteins are not as freely diffusible as implied by the classic fluid mosaic model and that their confinement to membrane domains is regulated. It is unknown whether these dynamic localization mechanisms function to enhance signal transduction activation rate or to minimize cross talk among pathways that share common intermediates. To determine which of these two possibilities is more likely, we derive an explicit equation for the rate at which cell surface membrane proteins interact based on a Brownian motion model in the presence of endocytosis and exocytosis. We find that in the absence of any diffusion constraints, cell surface protein interaction rate is extremely high relative to cytoplasmic protein interaction rate even in a large mammalian cell with a receptor abundance of a mere two hundred molecules. Since a larger number of downstream signaling events needs to take place, each occurring at a much slower rate than the initial activation via association of cell surface proteins, we conclude that the role of co-localization is most likely that of cross-talk reduction rather than coupling efficiency enhancement.
View details for DOI 10.1371/journal.pcbi.0020044
View details for Web of Science ID 000239493900003
View details for PubMedID 16699596
View details for PubMedCentralID PMC1458967
-
Theory and simulation - Accuracy and reliability in modelling proteins and complexes
CURRENT OPINION IN STRUCTURAL BIOLOGY
2006; 16 (2): 139-141
View details for DOI 10.1016/j.sbi.2006.03.012
View details for Web of Science ID 000237234500001
-
An atomic environment potential for use in protein structure prediction
JOURNAL OF MOLECULAR BIOLOGY
2005; 352 (4): 986-1001
Abstract
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.
View details for DOI 10.1016/j.jmb.2005.07.054
View details for Web of Science ID 000232188100018
View details for PubMedID 16126228
-
Describing RNA structure by libraries of clustered nucleotide doublets
JOURNAL OF MOLECULAR BIOLOGY
2005; 351 (1): 26-38
Abstract
The rapidly increasing wealth of structural information on RNA and knowledge of its varying roles in biology have facilitated the study of RNA structure using computational methods. Here, we present a new method to describe RNA structure based on nucleotide doublets, where a doublet is any two nucleotides in a structure. We restrict our search to doublets that are close together in space, but not necessarily in sequence, and obtain doublet libraries of various sizes by clustering a large set of doublets taken from a data set of high-resolution RNA structures. We demonstrate that these libraries are able to both capture structural features present in RNA and fit local RNA structure with a high level of accuracy. Libraries ranging in size from ten to 100 doublets are examined, and a detailed analysis shows that a library with as few as 30 doublets is sufficient to capture the most common structural features, while larger libraries would be more appropriate for accurate modeling. We anticipate many uses for these libraries, from annotation to structure refinement and prediction.
View details for DOI 10.1016/j.jmb.2005.06.024
View details for Web of Science ID 000230803100004
View details for PubMedID 15993894
View details for PubMedCentralID PMC2746451
-
Nonpolar solutes enhance water structure within hydration shells while reducing interactions between them
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2005; 102 (19): 6777-6782
Abstract
The origins of the hydrophobic effect are widely thought to lie in structural changes of the water molecules surrounding a nonpolar solute. The spatial distribution functions of the water molecules surrounding benzene and cyclohexane computed previously from molecular dynamics simulations show a high density first hydration shell surrounding both solutes. In addition, benzene showed a strong preference for hydrogen bonding with two water molecules, one to each face of the benzene ring. The position data alone, however, do not describe the majority of orientational changes in the water molecules in the first hydration shells surrounding these solutes. In this paper, we measure the changes in orientation of the water molecules with respect to the solute through spatial orientation functions as well as radial/angular distribution functions. These data show that the water molecules hydrogen bonded to benzene have a strong orientation preference, whereas those around cyclohexane show a weaker tendency. In addition, the water-water interactions within and between the first two hydration shells were measured as a function of distance and "best" hydrogen bonding angle. Water molecules within the first hydration shell have increased hydrogen bonding structure; water molecules interacting across shell 1 and shell 2 have reduced hydrogen bonding structure.
View details for DOI 10.1073/pnas.0500225102
View details for Web of Science ID 000229048500026
View details for PubMedID 15867152
View details for PubMedCentralID PMC1100774
-
Comprehensive evaluation of protein structure alignment methods: Scoring by geometric measures
JOURNAL OF MOLECULAR BIOLOGY
2005; 346 (4): 1173-1188
Abstract
We report the largest and most comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning all 8,581,970 protein structure pairs in a test set of 2930 protein domains specially selected from CATH v.2.4 to ensure sequence diversity. We consider an alignment good if it matches many residues, and the two substructures are geometrically similar. Even with this definition, evaluating structural alignment methods is not straightforward. At first, we compared the rates of true and false positives using receiver operating characteristic (ROC) curves with the CATH classification taken as a gold standard. This proved unsatisfactory in that the quality of the alignments is not taken into account: sometimes a method that finds less good alignments scores better than a method that finds better alignments. We correct this intrinsic limitation by using four different geometric match measures (SI, MI, SAS, and GSAS) to evaluate the quality of each structural alignment. With this improved analysis we show that there is a wide variation in the performance of different methods; the main reason for this is that it can be difficult to find a good structural alignment between two proteins even when such an alignment exists. We find that STRUCTAL and SSM perform best, followed by LSQMAN and CE. Our focus on the intrinsic quality of each alignment allows us to propose a new method, called "Best-of-All" that combines the best results of all methods. Many commonly used methods miss 10-50% of the good Best-of-All alignments. By putting existing structural alignments into proper perspective, our study allows better comparison of protein structures. By highlighting limitations of existing methods, it will spur the further development of better structural alignment methods. This will have significant biological implications now that structural comparison has come to play a central role in the analysis of experimental work on protein structure, protein function and protein evolution.
View details for DOI 10.1016/j.jmb.2004.12.032
View details for Web of Science ID 000227187800018
View details for PubMedID 15701525
View details for PubMedCentralID PMC2692023
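The abstract above calls an alignment good when it matches many residues and the matched substructures are geometrically similar. Two of the measures it names, SAS and SI, are usually defined as an RMSD rescaled by the number of aligned residues. The abstract does not give the formulas, so the definitions below are assumptions based on how these measures are commonly stated, and the function names are hypothetical. Lower values indicate better alignments.

```python
def sas(rmsd, n_aligned):
    """Structural Alignment Score: RMSD scaled to a 100-residue alignment (assumed definition)."""
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * 100.0 / n_aligned


def si(rmsd, n_aligned, len_a, len_b):
    """Similarity Index: RMSD scaled by the shorter chain length (assumed definition)."""
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * min(len_a, len_b) / n_aligned


# A long alignment with a slightly higher RMSD can outscore a short, tight one.
long_alignment = sas(rmsd=2.0, n_aligned=100)   # 2.0
short_alignment = sas(rmsd=1.5, n_aligned=30)   # 5.0
```

Scoring on RMSD alone would prefer the 30-residue match; both measures above penalize it for covering fewer residues.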
-
Inverse kinematics in biology: The protein loop closure problem
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH
2005; 24 (2-3): 151-163
View details for DOI 10.1177/0278364905050352
View details for Web of Science ID 000227409900004
-
Diffusion of nucleoside triphosphates and role of the entry site to the RNA polymerase II active center
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2004; 101 (50): 17361-17364
Abstract
Nucleoside triphosphates (NTPs) diffuse to the active center of RNA polymerase II through a funnel-shaped opening that narrows to a negatively charged pore. Computer simulation shows that the funnel and pore reduce the rate of diffusion by a factor of approximately 2 × 10⁻⁷. The resulting limitation on the rate of RNA synthesis under conditions of low NTP concentration may be overcome by NTP binding to an entry site adjacent to the active center. Binding to the entry site greatly enhances the lifetime of an NTP in the active center region, and it prevents "backtracking" and the consequent occlusion of the active site.
View details for DOI 10.1073/pnas.0408168101
View details for Web of Science ID 000225803400011
View details for PubMedID 15574497
View details for PubMedCentralID PMC536049
-
The area derivative of a space-filling diagram
DISCRETE & COMPUTATIONAL GEOMETRY
2004; 32 (3): 293-308
View details for DOI 10.1007/s00454-004-1099-1
View details for Web of Science ID 000223650400001
-
Detailed hydration maps of benzene and cyclohexane reveal distinct water structures
JOURNAL OF PHYSICAL CHEMISTRY B
2004; 108 (35): 13492-13500
View details for DOI 10.1021/jp049481p
View details for Web of Science ID 000223600800057
-
Improved protein structure selection using decoy-dependent discriminatory functions
BMC STRUCTURAL BIOLOGY
2004; 4: 8-?
Abstract
A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, there is poor correlation between the scores and the accuracy of the generated conformations. We present a simple and nonparametric formula to estimate the accuracy of predicted conformations (or decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Calpha RMSD (Root Mean Square Deviation) calculation in a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores have correlations as high as 0.9 with the Calpha RMSDs of the decoy conformations, measured relative to the experimental conformation for each decoy. We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function called self-RAPDF, where we compiled the atom-atom contact probabilities from all the conformations in a decoy set instead of using an ensemble of native conformations, with a weighting scheme based on the density scores. The self-RAPDF has a higher correlation with Calpha RMSD than RAPDF for 76/83 decoy sets, and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement. Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.
View details for PubMedID 15207004
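The density score in the abstract above rests on an all-against-all Calpha RMSD calculation within a decoy set. The sketch below shows one plausible reading: each decoy is scored by its mean RMSD to every other decoy after optimal superposition, so decoys in densely populated regions score low. Taking the mean is an assumption, since the abstract does not state how the pairwise values are combined, and the function names are hypothetical. Superposition uses the standard Kabsch (SVD) approach with NumPy.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """Minimum RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best fit would need a reflection.
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def density_scores(decoys):
    """Mean pairwise RMSD of each decoy to all others (assumed form of the density score)."""
    n = len(decoys)
    rmsd = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rmsd[i, j] = rmsd[j, i] = kabsch_rmsd(decoys[i], decoys[j])
    return rmsd.sum(axis=1) / (n - 1)
```

Each element of `decoys` is an array of Calpha coordinates for the same sequence, so all arrays share the same shape. The cost grows quadratically with the number of decoys.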
-
Simulating protein evolution in sequence and structure space
CURRENT OPINION IN STRUCTURAL BIOLOGY
2004; 14 (2): 202-207
Abstract
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.
View details for DOI 10.1016/j.sbi.2004.03.001
View details for Web of Science ID 000221340700012
View details for PubMedID 15093835
-
Funnel-like organization in sequence space determines the distributions of protein stability and folding rate preferred by evolution
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2004; 55 (1): 107-114
Abstract
To understand the physical and evolutionary determinants of protein folding, we map out the complete organization of thermodynamic and kinetic properties for protein sequences that share the same fold. The exhaustive nature of our study necessitates using simplified models of protein folding. We obtain a stability map and a folding rate map in sequence space. Comparison of the two maps reveals a common organizational principle: optimality decreases more or less uniformly with distance from the optimal sequence in the sequence space. This gives a funnel-shaped optimality surface. Evolutionary dynamics of a sequence population on these two maps reveal how the simple organization of sequence space affects the distributions of stability and folding rate preferred by evolution.
View details for DOI 10.1002/prot.10563
View details for Web of Science ID 000220980300011
View details for PubMedID 14997545
View details for PubMedCentralID PMC2745081
-
The ASTRAL Compendium in 2004
NUCLEIC ACIDS RESEARCH
2004; 32: D189-D192
Abstract
The ASTRAL Compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54,745 domains, more than three times as many as the initial release 4 years ago. ASTRAL has undergone major transformations in the past 2 years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database, as well as integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods. ASTRAL may be accessed at http://astral.stanford.edu/.
View details for DOI 10.1093/nar/gkh034
View details for Web of Science ID 000188079000043
View details for PubMedID 14681391
-
Funnel sculpting for in silico assembly of secondary structure elements of proteins
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2003; 100 (19): 10700-10705
Abstract
We present a method for designing a funnel-shaped free-energy surface that reproducibly assembles secondary structure elements of proteins into their native conformations from a random extended configuration. Assuming a priori knowledge of secondary structure, our method can design a funnel-shaped surface for folding of alpha, beta, and alphabeta structures individually. We design energy surfaces that fold up to five unrelated sequences with the same energy parameters. We develop a measure of the foldability of an energy landscape in silico and present an alternative way to view energy landscapes.
View details for DOI 10.1073/pnas.1732312100
View details for Web of Science ID 000185415300025
View details for PubMedID 12925740
View details for PubMedCentralID PMC293046
-
A novel approach to decoy set generation: Designing a physical energy function having local minima with native structure characteristics
JOURNAL OF MOLECULAR BIOLOGY
2003; 329 (1): 159-174
Abstract
We suggest a new approach to the generation of candidate structures (decoys) for ab initio prediction of protein structures. Our method is based on random sampling of conformation space and subsequent local energy minimization. At the core of this approach lies the design of a novel type of energy function. This energy function has local minima with native structure characteristics and wide basins of attraction. The current work presents our motivation for deriving such an energy function and also tests the derived energy function. Our approach is novel in that it takes advantage of the inherently rough energy landscape of proteins, which is generally considered a major obstacle for protein structure prediction. When local minima have wide basins of attraction, the protein's conformation space can be greatly reduced by the convergence of large regions of the space into single points, namely the local minima corresponding to these funnels. We have implemented this concept by an iterative process. The potential is first used to generate decoy sets and then we study these sets of decoys to guide further development of the potential. A key feature of our potential is the use of cooperative multi-body interactions that mimic the role of the entropic and solvent contributions to the free energy. The validity and value of our approach is demonstrated by applying it to 14 diverse, small proteins. We show that, for these proteins, the size of conformation space is considerably reduced by the new energy function. In fact, the reduction is so substantial as to allow efficient conformational sampling. As a result we are able to find a significant number of near-native conformations in random searches performed with limited computational resources.
View details for DOI 10.1016/S0022-2836(03)00323-1
View details for Web of Science ID 000183067000013
View details for PubMedID 12742025
-
Protein decoy assembly using short fragments under geometric constraints
BIOPOLYMERS
2003; 68 (3): 278-285
Abstract
A small set of protein fragments can represent adequately all known local protein structure. This set of fragments, along with a construction scheme that assembles these fragments into structures, defines a discrete (relatively small) conformation space, which approximates protein structures accurately. We generate protein decoys by sampling geometrically valid structures from this conformation space, biased by the secondary structure prediction for the protein. Unlike other methods, secondary structure prediction is the only protein-specific information used for generating the decoys. Nevertheless, these decoys are qualitatively similar to those found by others. The method works well for all-alpha proteins, and shows promising results for alpha and beta proteins.
View details for DOI 10.1002/bip.10262
View details for Web of Science ID 000181363000006
View details for PubMedID 12601789
-
Evidence of turn and salt bridge contributions to beta-hairpin stability: MD simulations of C-terminal fragment from the B1 domain of protein G
BIOPHYSICAL CHEMISTRY
2002; 101: 187-201
Abstract
We ran and analyzed a total of eighteen 10 ns molecular dynamics simulations of two C-terminal beta-hairpins from the B1 domain of Protein G: twelve runs for the last 16 residues and six runs for the last 15 residues, G41-E56 and E42-E56, respectively. Based on their Calpha RMS deviation from the starting structure and the pattern of stabilizing interactions (hydrogen bonds, hydrophobic contacts, and salt bridges), we were able to classify the twelve runs on G41-E56 into one of three general states of the beta-hairpin ensemble: 'Stable', 'Unstable', and 'Unfolded'. Comparing the specific interactions between these states, we find that on average the stable beta-hairpin buries 287 Å² of hydrophobic surface area, makes 13 hydrogen bonds, and forms 3 salt-bridges. We find that the hydrophobic core prefers to make some specific contacts; however, this core does not require optimal packing. Side-chain hydrogen bonds stabilize the beta-hairpin turn with strong stabilizing interactions primarily due to the carboxyl of D46 with contributions from T49 hydroxyl. Buoyed by the strength of the hydrophobic core, other hydrogen bonds, primarily main-chain, guide the beta-hairpin into registration by forming a loose network of interactions, making an approximately constant number of hydrogen bonds from a pool of possible candidates. In simulations on E42-E56, where the salt bridge closing the termini is not favored, we observe that all the simulations show no 'Stable' behavior, but are 'Unstable' or 'Unfolded'. We can estimate that the salt-bridge between the termini provides approximately 1.3 kcal/mol. Altogether, the results suggest that the beta-hairpin folds beginning at the turn, followed by hydrophobic collapse, and then hydrogen bond formation. Salt bridges help to stabilize the folded conformations by inhibiting unfolded states.
View details for Web of Science ID 000180165100020
-
Sequence variations within protein families are linearly related to structural variations
JOURNAL OF MOLECULAR BIOLOGY
2002; 323 (3): 551-562
Abstract
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30% and little is known then about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
View details for DOI 10.1016/S0022-2836(02)00971-3
View details for Web of Science ID 000179083800012
View details for PubMedID 12381308
View details for PubMedCentralID PMC2692051
-
Small libraries of protein fragments model native protein structures accurately
JOURNAL OF MOLECULAR BIOLOGY
2002; 323 (2): 297-307
Abstract
Prediction of protein structure depends on the accuracy and complexity of the models used. Here, we represent the polypeptide chain by a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fit to the native structure using a greedy build-up method. This gives a one-dimensional representation of native protein three-dimensional structure whose quality depends on the nature of the library. We use a novel clustering method to construct libraries that differ in the fragment length (four to seven residues) and number of representative fragments they contain (25-300). Each library is characterized by the quality of fit (accuracy) and the number of allowed states per residue (complexity). We find that the accuracy depends on the complexity and varies from 2.9 Å for a 2.7-state model on the basis of fragments of length 7 to 0.76 Å for a 15-state model on the basis of fragments of length 5. Our goal is to find representations that are both accurate and economical (low complexity). The models defined here are substantially better in this regard: with ten states per residue we approximate native protein structure to 1 Å compared to over 20 states per residue needed previously. For the same complexity, we find that longer fragments provide better fits. Unfortunately, libraries of longer fragments must be much larger (for ten states per residue, a seven-residue library is 100 times larger than a five-residue library). As the number of known protein native structures increases, it will be possible to construct larger libraries to better exploit this correlation between neighboring residues. Our fragment libraries, which offer a wide range of optimal fragments suited to different accuracies of fit, may prove to be useful for generating better decoy sets for ab initio protein folding and for generating accurate loop conformations in homology modeling.
View details for DOI 10.1016/S0022-2836(02)00942-7
View details for Web of Science ID 000178976500011
View details for PubMedID 12381322
-
Roles of mutation and recombination in the evolution of protein thermodynamics
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2002; 99 (16): 10382-10387
Abstract
We present a comprehensive study of the evolutionary origin of the thermodynamic behavior of proteins. With the use of a simplified model, we exhaustively enumerate the space of all sequences and the space of all structures, simulate the evolutionary relationship between sequences and structures, and characterize the steady-state sequence distribution for all structures in terms of several thermodynamic variables. We assess the effects of two major forces of evolution: mutation and recombination. Three simplifications are made. First, a two-dimensional lattice model is used to represent protein sequences and structures. Second, proteins undergo neutral evolution so that the fitness landscape has a flat allowed region inside of which all sequences are equally fit. Third, we ignore otherwise important factors such as finite population size and evolutionary time. Two scenarios emerge from our study. The first occurs when evolution is dominated by mutation events. Even though the prototype sequence that is most mutationally robust is preferred by evolution, the preference is not strong enough to offset the huge size of sequence space. Most native sequences are located near the boundary of the fitness region and are marginally compatible with the native structure. The second scenario occurs when evolution is dominated by recombination events. Now evolutionary preference for prototype sequence is strong enough to overcome the size of sequence space so that most native sequences are located near the center of sequence-structure compatibility. We conclude that the relative frequency of mutation and recombination events is a major determinant of how optimal protein sequences are for their structures.
View details for DOI 10.1073/pnas.162097799
View details for Web of Science ID 000177343200032
View details for PubMedID 12149452
View details for PubMedCentralID PMC124923
-
Design of an optimal Chebyshev-expanded discrimination function for globular proteins
PROTEIN SCIENCE
2002; 11 (8): 2010-2021
Abstract
We describe the construction of a scoring function designed to model the free energy of protein folding. An optimization technique is used to determine the best functional forms of the hydrophobic, residue-residue and hydrogen-bonding components of the potential. The scoring function is expanded by use of Chebyshev polynomials, the coefficients of which are determined by minimizing the score, in units of standard deviation, of native structures in the ensembles of alternate decoy conformations. The derived effective potential is then tested on decoy sets used conventionally in such studies. Using our scoring function, we achieve a high level of discrimination between correct and incorrect folds. In addition, our method is able to represent functions of arbitrary shape with fewer parameters than the usual histogram potentials of similar resolution. Finally, our representation can be combined easily with many optimization methods, because the total energy is a linear function of the parameters. Our results show that the techniques of Z-score optimization and Chebyshev expansion work well.
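A minimal sketch of the two ideas named in this abstract: an energy that is linear in the coefficients of a Chebyshev expansion, and the Z-score of a native structure against its decoys. The feature construction (summing basis values over pairwise distances scaled to [-1, 1]), the cutoff `d_max`, and all function names are assumptions made for illustration, not the published functional form.

```python
from statistics import mean, pstdev


def chebyshev_basis(x: float, degree: int) -> list[float]:
    """Return T_0(x)..T_degree(x) using T_{k+1} = 2x*T_k - T_{k-1}."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def conformation_features(distances: list[float], degree: int, d_max: float) -> list[float]:
    """Sum each basis function over all distances (clipped at d_max and
    mapped to [-1, 1]); the energy is then linear in the coefficients."""
    totals = [0.0] * (degree + 1)
    for d in distances:
        x = 2.0 * min(d, d_max) / d_max - 1.0
        for k, t in enumerate(chebyshev_basis(x, degree)):
            totals[k] += t
    return totals


def energy(features: list[float], coefficients: list[float]) -> float:
    """Total score as a dot product of features and expansion coefficients."""
    return sum(c * f for c, f in zip(coefficients, features))


def z_score(native_energy: float, decoy_energies: list[float]) -> float:
    """Native energy in units of the decoy standard deviation."""
    return (native_energy - mean(decoy_energies)) / pstdev(decoy_energies)
```

Because the energy is a dot product, the features of every native and decoy conformation can be computed once and reused while the coefficients are varied to make the native Z-score as negative as possible.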
View details for DOI 10.1110/ps.0200702
View details for Web of Science ID 000177036500015
View details for PubMedID 12142455
View details for PubMedCentralID PMC2373672
-
A comprehensive analysis of 40 blind protein structure predictions.
BMC structural biology
2002; 2: 3-?
Abstract
We thoroughly analyse the results of 40 blind predictions for which an experimental answer was made available at the fourth meeting on the critical assessment of protein structure methods (CASP4). Using our comparative modelling and fold recognition methodologies, we made 29 predictions for targets that had sequence identities ranging from 50% to 10% to the nearest related protein with known structure. Using our ab initio methodologies, we made eleven predictions for targets that had no detectable sequence relationships. For 23 of these proteins, we produced models ranging from 1.0 to 6.0 Å root mean square deviation (RMSD) for the Cα atoms between the model and the corresponding experimental structure for all or large parts of the protein, with model accuracies scaling fairly linearly with respect to sequence identity (i.e., the higher the sequence identity, the better the prediction). We produced nine models with accuracies ranging from 4.0 to 6.0 Å Cα RMSD for 60-100 residue proteins (or large fragments of a protein), with a prediction accuracy of 4.0 Å Cα RMSD for residues 1-80 for T110/rbfa. The areas of protein structure prediction that work well, and areas that need improvement, are discernable by examining how our methods have performed over the past four CASP experiments. These results have implications for modelling the structure of all tractable proteins encoded by the genome of an organism.
View details for PubMedID 12150712
-
Protein topology and stability define the space of allowed sequences
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2002; 99 (3): 1280-1285
Abstract
We describe a new approach to explore and quantify the sequence space associated with a given protein structure. A set of sequences are optimized for a given target structure, using all-atom models and a physical energy function. Specificity of the sequence for its target is ensured by using the random energy model, which keeps the amino acid composition of the sequence constant. The designed sequences provide a multiple sequence alignment that describes the sequence space compatible with the structure of interest; here the size of this space is estimated by using an information entropy measure. In parallel, multiple alignments of naturally occurring sequences can be derived by using either sequence or structure alignments. We compared these 3 independent multiple sequence alignments for 10 different proteins, ranging in size from 56 to 310 residues. We observed that the subset of the sequence space derived by using our design procedure is similar in size to the sequence spaces observed in nature. These results suggest that the volume of sequence space compatible with a given protein fold is defined by the length of the protein as well as by the topology (i.e., geometry of the polypeptide chain) and the stability (i.e., free energy of denaturation) of the fold.
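One simple way to turn a multiple sequence alignment into an entropy-based size estimate is sketched here. It assumes that alignment columns are independent, so the total entropy is the sum of per-column Shannon entropies and its exponential gives an effective number of sequences. The abstract does not spell out the exact measure used, so this is an illustration of the idea and not the published procedure; the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in nats) of the residue frequencies in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(alignment: list[str]) -> float:
    """Sum of column entropies, treating columns as independent."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return sum(
        column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)
    )


def effective_sequence_count(alignment: list[str]) -> float:
    """exp(entropy): the size of the sequence space implied by the alignment."""
    return math.exp(alignment_entropy(alignment))


if __name__ == "__main__":
    designed = ["ACDE", "ACDF", "AGDE", "AGDF"]  # toy alignment, not real data
    print(effective_sequence_count(designed))
```

The same function can be applied to a designed alignment and to natural sequence-based or structure-based alignments, so that the resulting sizes can be compared on one scale.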
View details for Web of Science ID 000173752500035
View details for PubMedID 11805293
-
Within the twilight zone: A sensitive profile-profile comparison tool based on information theory
JOURNAL OF MOLECULAR BIOLOGY
2002; 315 (5): 1257-1275
Abstract
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity.
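The abstract does not say which information-theoretic score is used to compare profile columns. As one plausible illustration, the sketch here assumes that two columns (probability distributions over residue types) are compared with the Jensen-Shannon divergence, which is 0 for identical columns and 1 bit for columns with no residue types in common. The function names are hypothetical, and gap handling and statistical significance are left out.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 vanish."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence: mean KL divergence to the midpoint distribution."""
    midpoint = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)


def column_similarity(p: list[float], q: list[float]) -> float:
    """Map the divergence to a similarity in [0, 1] (1 means identical columns)."""
    return 1.0 - js_divergence(p, q)
```

A score of this kind can fill the cells of a dynamic-programming alignment matrix in place of the usual residue substitution scores.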
View details for DOI 10.1006/jmbi.2001.5293
View details for Web of Science ID 000173867900026
View details for PubMedID 11827492
-
Improved recognition of native-like protein structures using a family of designed sequences
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2002; 99 (2): 691-696
Abstract
The goal of the inverse protein folding problem is to identify amino acid sequences that stabilize a given target protein conformation. Methods that attempt to solve this problem have proven useful for protein sequence design. Here we show that the same methods can provide valuable information for protein fold recognition and for ab initio protein structure prediction. We present a measure of the compatibility of a test sequence with a target model structure, based on computational protein design. The model structure is used as input to design a family of low free energy sequences, and these sequences are compared with the test sequence by using a metric in sequence space based on nearest-neighbor connectivity. We find that this measure is able to recognize the native fold of a myoglobin sequence among different globin folds. It is also powerful enough to recognize near-native protein structures among non-native models.
View details for Web of Science ID 000173450100030
View details for PubMedID 11782533
-
ASTRAL compendium enhancements
NUCLEIC ACIDS RESEARCH
2002; 30 (1): 260-263
Abstract
The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. It is partially derived from the SCOP database of protein domains, and it includes sequences for each domain as well as other resources useful for studying these sequences and domain structures. Several major improvements have been made to the ASTRAL compendium since its initial release 2 years ago. The number of protein domain sequences included has doubled from 15 190 to 30 867, and additional databases have been added. The Rapid Access Format (RAF) database contains manually curated mappings linking the biological amino acid sequences described in the SEQRES records of PDB entries to the amino acid sequences structurally observed (provided in the ATOM records) in a format designed for rapid access by automated tools. This information is used to derive sequences for protein domains in the SCOP database. In cases where a SCOP domain spans several protein chains, all of which can be traced back to a single genetic source, a 'genetic domain' sequence is created by concatenating the sequences of each chain in the order found in the original gene sequence. Both the original-style library of SCOP sequences and a new library including genetic domain sequences are available. Selected representative subsets of each of these libraries, based on multiple criteria and degrees of similarity, are also included. ASTRAL may be accessed at http://astral.stanford.edu/.
View details for Web of Science ID 000173077100070
View details for PubMedID 11752310
-
Peter Kollman - Obituary
NATURE STRUCTURAL BIOLOGY
2001; 8 (8): 662-662
View details for Web of Science ID 000170139500008
-
Quantification of the hydrophobic interaction by simulations of the aggregation of small hydrophobic solutes in water
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2001; 98 (11): 5965-5969
Abstract
The hydrophobic interaction, the tendency for nonpolar molecules to aggregate in solution, is a major driving force in biology. In a direct approach to the physical basis of the hydrophobic effect, nanosecond molecular dynamics simulations were performed on increasing numbers of hydrocarbon solute molecules in water-filled boxes of different sizes. The intermittent formation of solute clusters gives a free energy that is proportional to the loss in exposed molecular surface area with a constant of proportionality of 45 ± 6 cal/mol Å². The molecular surface area is the envelope of the solute cluster that is impenetrable by solvent and is somewhat smaller than the more traditional solvent-accessible surface area, which is the area transcribed by the radius of a solvent molecule rolled over the surface of the cluster. When we apply a factor relating molecular surface area to solvent-accessible surface area, we obtain 24 cal/mol Å². Ours is the first direct calculation, to our knowledge, of the hydrophobic interaction from molecular dynamics simulations; the excellent qualitative and quantitative agreement with experiment proves that simple van der Waals interactions and atomic point-charge electrostatics account for the most important driving force in biology.
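The proportionality described in this abstract amounts to a one-line formula: free energy gain = constant × buried surface area. The sketch here uses the two constants quoted in the abstract; the buried area of 100 Å² in the example is a made-up input, and the function name is hypothetical.

```python
GAMMA_MOLECULAR = 45.0   # cal/mol per Å² of molecular surface area (from the abstract)
GAMMA_ACCESSIBLE = 24.0  # cal/mol per Å² of solvent-accessible surface area (from the abstract)


def hydrophobic_stabilization(buried_area: float, gamma: float) -> float:
    """Magnitude of the free energy gain (cal/mol) for burying `buried_area` Å²."""
    if buried_area < 0:
        raise ValueError("buried_area must be non-negative")
    return gamma * buried_area


if __name__ == "__main__":
    area = 100.0  # hypothetical loss of molecular surface area, in Å²
    print(hydrophobic_stabilization(area, GAMMA_MOLECULAR), "cal/mol")
    # Ratio of the two constants quoted in the abstract.
    print(GAMMA_ACCESSIBLE / GAMMA_MOLECULAR)
```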
View details for Web of Science ID 000168883700008
View details for PubMedID 11353861
-
The birth of computational structural biology
NATURE STRUCTURAL BIOLOGY
2001; 8 (5): 392-393
View details for Web of Science ID 000168315300008
View details for PubMedID 11323711
-
Determination of optimal Chebyshev-expanded hydrophobic discrimination function for globular proteins
IBM JOURNAL OF RESEARCH AND DEVELOPMENT
2001; 45 (3-4): 525-532
View details for Web of Science ID 000170154600013
-
A novel method for sampling alpha-helical protein backbones
JOURNAL OF MOLECULAR BIOLOGY
2001; 305 (2): 191-201
Abstract
We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible three-dimensional arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case, our method finds significant numbers of conformations close to the native structure. In addition, we assign coordinates to all atoms for four of the 25 proteins and show that this has a small effect on the number of near-native conformations. In the context of database-driven exhaustive enumeration, our method performs extremely well, yielding significant percentages of conformations (between 0.02% and 82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable tool for predicting protein structure.
View details for DOI 10.1006/jmbi.2000.4290
View details for Web of Science ID 000166413600002
View details for PubMedID 11124899
-
De novo protein design
4th NATO Advanced-Study-Institute on Dynamics, Structure and Function of Biological Macromolecules
I O S PRESS. 2001: 57–75
View details for Web of Science ID 000170487100005
-
Extracting knowledge-based energy functions from protein structures by error rate minimization: Comparison of methods using lattice model
JOURNAL OF CHEMICAL PHYSICS
2000; 113 (20): 9318-9330
View details for Web of Science ID 000165217000051
-
Constructing side chains on near-native main chains for ab initio protein structure prediction
PROTEIN ENGINEERING
2000; 13 (7): 453-457
Abstract
Is there value in constructing side chains while searching protein conformational space during an ab initio simulation? If so, what is the most computationally efficient method for constructing these side chains? To answer these questions, four published approaches were used to construct side chain conformations on a range of near-native main chains generated by ab initio protein structure prediction methods. The accuracy of these approaches was compared with a naive approach that selects the most frequently observed rotamer for a given amino acid to construct side chains. An all-atom conditional probability discriminatory function is useful at selecting conformations with overall low all-atom root mean square deviation (r.m.s.d.) and the discrimination improves on sets that are closer to the native conformation. In addition, the naive approach performs as well as more sophisticated methods in terms of the percentage of χ1 angles built accurately and the all-atom r.m.s.d. between the native and near-native conformations. The results suggest that the naive method would be extremely useful for fast and efficient side chain construction on vast numbers of conformations for ab initio prediction of protein structure.
View details for Web of Science ID 000088743100001
View details for PubMedID 10906341
-
Decoys 'R' Us: A database of incorrect conformations to improve protein structure prediction
PROTEIN SCIENCE
2000; 9 (7): 1399-1401
Abstract
The development of an energy or scoring function for protein structure prediction is greatly enhanced by testing the function on a set of computer-generated conformations (decoys) to determine whether it can readily distinguish native-like conformations from nonnative ones. We have created "Decoys 'R' Us," a database containing many such sets of conformations, to provide a resource that allows scoring functions to be improved.
View details for Web of Science ID 000088376100017
View details for PubMedID 10933507
-
Ab initio construction of protein tertiary structures using a hierarchical approach
JOURNAL OF MOLECULAR BIOLOGY
2000; 300 (1): 171-185
Abstract
We present a hierarchical method to predict protein tertiary structure models from sequence. We start with complete enumeration of conformations using a simple tetrahedral lattice model. We then build conformations with increasing detail, and at each step select a subset of conformations using empirical energy functions with increasing complexity. After enumeration on lattice, we select a subset of low energy conformations using a statistical residue-residue contact energy function, and generate all-atom models using predicted secondary structure. A combined knowledge-based atomic level energy function is then used to select subsets of the all-atom models. The final predictions are generated using a consensus distance geometry procedure. We test the feasibility of the procedure on a set of 12 small proteins covering a wide range of protein topologies. A rigorous double-blind test of our method was made under the auspices of the CASP3 experiment, where we did ab initio structure predictions for 12 proteins using this approach. The performance of our methodology at CASP3 is reasonably good and completely consistent with our initial tests.
View details for DOI 10.1006/jmbi.2000.3835
View details for Web of Science ID 000087980600015
View details for PubMedID 10864507
-
Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins.
Proceedings. International Conference on Intelligent Systems for Molecular Biology
2000; 8: 395-406
Abstract
In search of global principles that may explain the organization of the space of all possible proteins, we study all known protein sequences and structures. In this paper we present a global map of the protein space based on our analysis. Our protein space contains all protein sequences in a non-redundant (NR) database, which includes all major sequence databases. Using the PSI-BLAST procedure we defined 4,670 clusters of related sequences in this space. Of these clusters, 1,421 are centered on a sequence of known structure. All 4,670 clusters were then compared using either a structure metric (when 3D structures are known) or a novel sequence profile metric. These scores were used to define a unified and consistent metric between all clusters. Two schemes were employed to organize these clusters in a meta-organization. The first uses a graph theory method and clusters the clusters in a hierarchical organization. This organization extends our ability to predict the structure and function of many proteins beyond what is possible with existing tools for sequence analysis. The second uses a variation on a multidimensional scaling technique to embed the clusters in a low dimensional real space. This last approach resulted in a projection of the protein space onto a 2D plane that provides us with a bird's eye view of the protein space. Based on this map we suggest a list of possible target sequences with unknown structure that are likely to adopt new, unknown folds.
View details for PubMedID 10977100
-
Expectations from structural genomics
PROTEIN SCIENCE
2000; 9 (1): 197-200
Abstract
Structural genomics projects aim to provide an experimental structure or a good model for every protein in all completed genomes. Most of the experimental work for these projects will be directed toward proteins whose fold cannot be readily recognized by simple sequence comparison with proteins of known structure. Based on the history of proteins classified in the SCOP structure database, we expect that only about a quarter of the early structural genomics targets will have a new fold. Among the remaining ones, about half are likely to be evolutionarily related to proteins of known structure, even though the homology could not be readily detected by sequence analysis.
View details for Web of Science ID 000085042500023
View details for PubMedID 10739263
-
The ASTRAL compendium for protein structure and sequence analysis
NUCLEIC ACIDS RESEARCH
2000; 28 (1): 254-256
Abstract
The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. The SPACI scores included in the system summarize the overall characteristics of a protein structure. A structural alignments database indicates residue equivalencies in superimposed protein domain structures. The PDB sequence-map files provide a linkage between the amino acid sequence of the molecule studied (SEQRES records in a database entry) and the sequence of the atoms experimentally observed in the structure (ATOM records). These maps are combined with information in the SCOP database to provide sequences of protein domains. Selected subsets of the domain database, with varying degrees of similarity measured in several different ways, are also available. ASTRAL may be accessed at http://astral.stanford.edu/
View details for Web of Science ID 000084896300073
View details for PubMedID 10592239
-
Probing structure-function relationships of the DNA polymerase alpha-associated zinc-finger protein using computational approaches.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2000: 179-190
Abstract
We present the application of a method for protein structure prediction to aid the determination of structure-function relationships by experiment. The structure prediction method was rigorously tested by making blind predictions at the third meeting on the Critical Assessment of Protein Structure methods (CASP3). The method is a combined hierarchical approach involving exhaustive enumeration of all possible folds of a small protein sequence on a tetrahedral lattice. A set of filters, primarily in the form of discriminatory functions, are applied to these conformations. As the filters are applied, greater detail is added to the models resulting in a handful of all-atom "final" conformations. Encouraged by the results at CASP3, we used our approach to help solve a practical biological problem: the prediction of the structure and function of the 67-residue C-terminal zinc-finger region of the DNA polymerase alpha-associated zinc-finger (PAZ) protein. We discuss how the prediction points to a novel function relative to the sequence homologs, in conjunction with evidence from experiment, and how the predicted structure is guiding further experimental studies. This work represents a move from the theoretical realm to actual application of structure prediction methods for gaining unique insight to guide experimental biologists.
View details for PubMedID 10902167
-
De novo protein design. I. In search of stability and specificity
JOURNAL OF MOLECULAR BIOLOGY
1999; 293 (5): 1161-1181
Abstract
We have developed a fully automated protein design strategy that works on the entire sequence of the protein and uses a full atom representation. At each step of the procedure, an all-atom model of the protein is built using the template protein structure and the current designed sequence. The energy of the model is used to drive a Monte Carlo optimization in sequence space: random moves are either accepted or rejected based on the Metropolis criterion. We rely on the physical forces that stabilize native protein structures to choose the optimum sequence. Our energy function includes van der Waals interactions, electrostatics and an environment free energy. Successful protein design should be specific and generate a sequence compatible with the template fold and incompatible with competing folds. We impose specificity by maintaining the amino acid composition constant, based on the random energy model. The specificity of the optimized sequence is tested by fold recognition techniques. Successful sequence designs for the B1 domain of protein G, for the lambda repressor and for sperm whale myoglobin are presented. We show that each additional term of the energy function improves the performance of our design procedure: the van der Waals term ensures correct packing, the electrostatics term increases the specificity for the correct native fold, and the environment solvation term ensures a correct pattern of buried hydrophobic and exposed hydrophilic residues. For the globin family, we show that we can design a protein sequence that is stable in the myoglobin fold, yet incompatible with the very similar hemoglobin fold.
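The optimization loop described in this abstract can be sketched as follows. Two things are assumptions made for illustration: the move set swaps the residues at two positions (one simple way to keep the amino acid composition constant), and `toy_energy` with its `BURIED` positions is a hypothetical stand-in for the all-atom energy of a model built on the template. Neither is taken from the paper.

```python
import math
import random
from typing import Callable, Sequence

HYDROPHOBIC = set("AVLIMFW")
BURIED = {0, 2, 4}  # hypothetical buried positions of a toy template


def toy_energy(sequence: Sequence[str]) -> float:
    """Placeholder energy: one penalty unit for each hydrophobic residue at an
    exposed position and each polar residue at a buried position."""
    return float(
        sum(1 for i, aa in enumerate(sequence) if (aa in HYDROPHOBIC) != (i in BURIED))
    )


def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def design_sequence(
    start: str,
    energy_fn: Callable[[Sequence[str]], float],
    steps: int,
    temperature: float,
    seed: int = 0,
) -> tuple[str, float]:
    """Monte Carlo search in sequence space using composition-preserving swaps."""
    rng = random.Random(seed)
    current = list(start)
    current_energy = energy_fn(current)
    best, best_energy = current[:], current_energy
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        trial_energy = energy_fn(current)
        if metropolis_accept(trial_energy - current_energy, temperature, rng):
            current_energy = trial_energy
            if current_energy < best_energy:
                best, best_energy = current[:], current_energy
        else:
            current[i], current[j] = current[j], current[i]  # undo the rejected swap
    return "".join(best), best_energy


if __name__ == "__main__":
    print(design_sequence("KALEVKDL", toy_energy, steps=500, temperature=0.5))
```

Because every move is a swap, the designed sequence is always a permutation of the starting sequence, which is the composition constraint the abstract uses to impose specificity.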
View details for Web of Science ID 000083798400015
View details for PubMedID 10547293
-
De novo protein design. II. Plasticity in sequence space
JOURNAL OF MOLECULAR BIOLOGY
1999; 293 (5): 1183-1193
Abstract
It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.
View details for Web of Science ID 000083798400016
View details for PubMedID 10547294
-
Structure-based conformational preferences of amino acids
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1999; 96 (22): 12524-12529
Abstract
Proteins can be very tolerant to amino acid substitution, even within their core. Understanding the factors responsible for this behavior is of critical importance for protein engineering and design. Mutations in proteins have been quantified in terms of the changes in stability they induce. For example, guest residues in specific secondary structures have been used as probes of conformational preferences of amino acids, yielding propensity scales. Predicting these amino acid propensities would be a good test of any new potential energy functions used to mimic protein stability. We have recently developed a protein design procedure that optimizes whole sequences for a given target conformation based on the knowledge of the template backbone and on a semiempirical potential energy function. This energy function is purely physical, including steric interactions based on a Lennard-Jones potential, electrostatics based on a Coulomb potential, and hydrophobicity in the form of an environment free energy based on accessible surface area and interatomic contact areas. Sequences designed by this procedure for 10 different proteins were analyzed to extract conformational preferences for amino acids. The resulting structure-based propensity scales show significant agreements with experimental propensity scale values, both for alpha-helices and beta-sheets. These results indicate that amino acid conformational preferences are a natural consequence of the potential energy we use. This confirms the accuracy of our potential and indicates that such preferences should not be added as a design criterion.
View details for Web of Science ID 000083373000060
View details for PubMedID 10535955
-
Hierarchy of structure loss in MD simulations of src SH3 domain unfolding
JOURNAL OF MOLECULAR BIOLOGY
1999; 291 (1): 215-225
Abstract
To complement experimental studies of the src SH3 domain folding, we studied 30 independent, high-temperature, molecular dynamics simulations of src SH3 domain unfolding. These trajectories were observed to differ widely from each other. Thus, rather than analyzing individual trajectories, we sought to identify the recurrent features of the high-temperature unfolding process. The conformations from all simulations were combined and then divided into groups based on the number of native contacts. Average occupancies of each side-chain hydrophobic contact and hydrogen bond in the protein were then determined. In the symmetric funnel limit, the occupancies of all contacts should decrease in concert with the loss in total number of native contacts. If there is a lack of symmetry or hierarchy to the unfolding process, the occupancies of some contacts should decrease more slowly, and others more rapidly. Despite the heterogeneity of the individual trajectories, the ensemble averaging revealed an order to the unfolding process: contacts between the N and C-terminal strands are the first to disappear, whereas contacts within the distal beta-hairpin and a hydrogen-bonding network involving the distal loop beta-turn and the diverging turn persist well after the majority of the native contacts are lost. This hierarchy of events resembles but is somewhat less pronounced than that observed in our experimental studies of the folding of src SH3 domain.
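The ensemble analysis described in this abstract can be sketched as a two-step bookkeeping exercise: pool the conformations from all trajectories, bin them by how many native contacts they retain, and compute in each bin the fraction of conformations in which each native contact is present. The data layout (each conformation as a set of residue-pair contacts), the `bin_width` parameter, and the function name are assumptions for illustration.

```python
from collections import defaultdict

Contact = tuple[int, int]


def occupancy_by_bin(
    conformations: list[set[Contact]],
    native_contacts: set[Contact],
    bin_width: int,
) -> dict[int, dict[Contact, float]]:
    """Group conformations by number of native contacts (in bins of bin_width)
    and return the average occupancy of every native contact in each bin."""
    groups: dict[int, list[set[Contact]]] = defaultdict(list)
    for present in conformations:
        n_native = len(present & native_contacts)
        groups[n_native // bin_width].append(present)

    occupancies: dict[int, dict[Contact, float]] = {}
    for bin_index, members in groups.items():
        occupancies[bin_index] = {
            contact: sum(contact in m for m in members) / len(members)
            for contact in native_contacts
        }
    return occupancies
```

Contacts whose occupancy stays high in the bins with few native contacts are the ones that persist late in unfolding; contacts whose occupancy drops in the bins with many native contacts are lost early.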
View details for Web of Science ID 000081903500015
View details for PubMedID 10438616
-
Theory and simulation - Can theory challenge experiment? Editorial overview
CURRENT OPINION IN STRUCTURAL BIOLOGY
1999; 9 (2): 155-156
View details for Web of Science ID 000085219800001
View details for PubMedID 10465610
-
A brighter future for protein structure prediction.
Nature structural biology
1999; 6 (2): 108-111
View details for PubMedID 10048917
-
A combined approach for ab initio construction of low resolution protein tertiary structures from sequence.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
1999: 505-516
Abstract
An approach to construct low resolution models of protein structure from sequence information using a combination of different methodologies is described. All possible compact self-avoiding Cα conformations (approximately 10 million) of a small protein chain were exhaustively enumerated on a tetrahedral lattice. The best scoring 10,000 conformations were selected using a lattice-based scoring function. All-atom structures were then generated by fitting an off-lattice four-state phi/psi model to the lattice conformations, using idealised helix and sheet values based on predicted secondary structure. The all-atom conformations were minimised using ENCAD and scored using a second hybrid scoring function. The best scoring 50, 100, and 500 conformations were input to a consensus-based distance geometry routine that used constraints from each of the conformation sets and produced a single structure for each set (total of three). Secondary structures were again fitted to the three structures, and the resulting structures were minimised and scored. The lowest scoring conformation was taken to be the "correct" answer. The results of application of this method to twelve proteins are presented.
View details for PubMedID 10380223
-
Ab initio protein structure prediction using a combined hierarchical approach
3rd Meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP3)
WILEY-BLACKWELL. 1999: 194–198
Abstract
As part of the third Critical Assessment of Structure Prediction meeting (CASP3), we predict the three-dimensional structures for 13 proteins using a hierarchical approach. First, all possible compact conformations of a protein sequence are enumerated using a highly simplified tetrahedral lattice model. We select a large subset of these conformations using a lattice-based scoring function and build detailed all-atom models incorporating predicted secondary structure. A combined all-atom knowledge-based scoring function is then used to select three smaller subsets from these all-atom models. Finally, a consensus-based distance geometry procedure is used to generate the best conformations from each of the all-atom subsets. With this method, we are able to predict the global topology/shape for all or a large part of the sequence for six out of the thirteen proteins. For two other proteins, the topology/shape for shorter fragments is predicted. This represents a marked improvement in ab initio prediction since CASP was first instigated in 1994.
View details for Web of Science ID 000082804100024
View details for PubMedID 10526368
-
The PRESAGE database for structural genomics
NUCLEIC ACIDS RESEARCH
1999; 27 (1): 251-253
Abstract
The PRESAGE database is a collaborative resource for structural genomics. It provides a database of proteins to which researchers add annotations indicating current experimental status, structural predictions and suggestions. The database is intended to enhance communication among structural genomics researchers and aid dissemination of their results. The PRESAGE database may be accessed at http://presage.stanford.edu/
View details for Web of Science ID 000077983000064
View details for PubMedID 9847193
-
Accuracy of side-chain prediction upon near-native protein backbones generated by ab initio folding methods
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
1998; 33 (2): 204-217
Abstract
The ab initio folding problem can be divided into two sequential tasks of approximately equal computational complexity: the generation of native-like backbone folds and the positioning of side chains upon these backbones. The prediction of side-chain conformation in this context is challenging, because at best only the near-native global fold of the protein is known. To test the effect of displacements in the protein backbones on side-chain prediction for folds generated ab initio, sets of near-native backbones (≤ 4 Å C alpha RMS error) for four small proteins were generated by two methods. The steric environment surrounding each residue was probed by placing the side chains in the native conformation on each of these decoys, followed by torsion-space optimization to remove steric clashes on a rigid backbone. We observe that on average 40% of the chi1 angles were displaced by 40 degrees or more, effectively setting the limits in accuracy for side-chain modeling under these conditions. Three different algorithms were subsequently used for prediction of side-chain conformation. The average prediction accuracy for the three methods was remarkably similar: 49% to 51% of the chi1 angles were predicted correctly overall (33% to 36% of the chi1+2 angles). Interestingly, when the inter-side-chain interactions were disregarded, the mean accuracy increased. A consensus approach is described, in which side-chain conformations are defined based on the most frequently predicted chi angles for a given method upon each set of near-native backbones. We find that consensus modeling, which de facto includes backbone flexibility, improves side-chain prediction: chi1 accuracy improved to 51-54% (36-42% of chi1+2). Implications of a consensus method for ab initio protein structure prediction are discussed.
View details for Web of Science ID 000076257100005
View details for PubMedID 9779788
-
Simulating water and the molecules of life
SCIENTIFIC AMERICAN
1998; 279 (5): 100-105
View details for Web of Science ID 000076499900029
View details for PubMedID 9841379
-
A unified statistical framework for sequence comparison and structure comparison
Colloquium on Computational Biomolecular Science
NATL ACAD SCIENCES. 1998: 5913–20
Abstract
We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., BLAST and FASTA) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.
View details for Web of Science ID 000073852600013
View details for PubMedID 9600892
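As a rough illustration of attaching a P value to a comparison score under an extreme-value distribution, the sketch below uses the standard Gumbel (type-I extreme-value) form. The function names are hypothetical, and the method-of-moments fit is a simple stand-in chosen for brevity; the abstract does not specify how the distribution was fitted, and the score normalization used in the paper is not reproduced here.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) from background scores.

    Uses the method of moments (an assumption made for this sketch):
    mean = mu + gamma * beta and variance = (pi * beta)^2 / 6.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta

def evd_p_value(score, mu, beta):
    """Probability that a chance score is at least `score` under Gumbel(mu, beta)."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny P values.
    return -math.expm1(-math.exp(-z))
```

In use, `fit_gumbel_moments` would be run on scores from comparisons of unrelated pairs, and `evd_p_value` would then be evaluated for each new comparison score.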
-
Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins
PROTEIN SCIENCE
1998; 7 (2): 445-456
Abstract
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematical to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
View details for Web of Science ID 000072056400026
View details for PubMedID 9521122
-
Simulating the minimum core for hydrophobic collapse in globular proteins
PROTEIN SCIENCE
1997; 6 (12): 2606-2616
Abstract
To investigate the nature of hydrophobic collapse considered to be the driving force in protein folding, we have simulated aqueous solutions of two model hydrophobic solutes, methane and isobutylene. Using a novel methodology for determining contacts, we can precisely follow hydrophobic aggregation as it proceeds through three stages: dispersed, transition, and collapsed. Theoretical modeling of the cluster formation observed by simulation indicates that this aggregation is cooperative and that the simulations favor the formation of a single cluster midway through the transition stage. This defines a minimum solute hydrophobic core volume. We compare this with protein hydrophobic core volumes determined from solved crystal structures. Our analysis shows that the solute core volume roughly estimates the minimum core size required for independent hydrophobic stabilization of a protein and defines a limiting concentration of nonpolar residues that can cause hydrophobic collapse. These results suggest that the physical forces driving aggregation of hydrophobic molecules in water are indeed responsible for protein folding.
View details for Web of Science ID A1997YK91200012
View details for PubMedID 9416609
-
A structural census of the current population of protein sequences
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1997; 94 (22): 11911-11916
Abstract
We examine the occurrence of the approximately 300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (approximately 140,000) in structural terms, by matching them to known structures via sequence comparison (or by secondary-structure class prediction for those without structural homologues). Overall, we find that an appreciable fraction of the known folds are present in each of the major groups of organisms (e.g., bacteria and eukaryotes share 156 of 275 folds), and most of the common folds are associated with many families of nonhomologous sequences (i.e., >10 sequence families for each common fold). However, different groups of organisms have characteristically distinct distributions of folds. So, for instance, some of the most common folds in vertebrates, such as globins or zinc fingers, are rare or absent in bacteria. Many of these differences in fold usage are biologically reasonable, such as the folds of metabolic enzymes being common in bacteria and those associated with extracellular transport and communication being common in animals. They also have important implications for database-based methods for fold recognition, suggesting that an unknown sequence from a plant is more likely to have a certain fold (e.g., a TIM barrel) than an unknown sequence from an animal.
View details for Web of Science ID A1997YD50600031
View details for PubMedID 9342336
-
Calibration and testing of a water model for simulation of the molecular dynamics of proteins and nucleic acids in solution
JOURNAL OF PHYSICAL CHEMISTRY B
1997; 101 (25): 5051-5061
View details for Web of Science ID A1997XF41800028
-
Factors affecting the ability of energy functions to discriminate correct from incorrect folds
JOURNAL OF MOLECULAR BIOLOGY
1997; 266 (4): 831-846
Abstract
Eighteen low and medium resolution empirical energy functions were tested for their ability to distinguish correct from incorrect folds from three test sets of decoy protein conformations. The energy functions included 13 pairwise potentials of mean force, covering a wide range of functional forms and methods of parameterization, four potentials that attempt to detect properly formed hydrophobic cores, and one environment-based potential. The first of the three test sets consists of large ensembles of plausible conformations for eight small proteins, all of which have correct native secondary structure and are reasonably compact. The second is the set of all subconformations in a database of known protein structures applied to the sequences in that database (ungapped threading). The third is a set of ensembles of 1000 conformations each for seven small proteins taken from molecular dynamics simulations at 298 K and 498 K. Our results show that there are functions effective for each challenge set; moreover, success in one test is no guarantee of success in another. We examine the factors that seem to be important for accurate discrimination of correct structures in each of the test sets, and note that extremely simple functions are often as effective as more complex functions.
View details for Web of Science ID A1997WL95400017
View details for PubMedID 9102472
-
Packing as a structural basis of protein stability: understanding mutant properties from wildtype structure.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
1997: 245-255
Abstract
Modeling of protein core mutations using sidechain packing can forecast their effects on stability. We have assessed the structural basis of this approach, by evaluating the accuracy of our 1991 model of a three-site mutant of lambda repressor (V36L/M40L/V47I), against the recently reported crystal structure. The three mutated residues matched the crystal structure to within 0.89 Å (1.11 Å for sidechain atoms), giving fairly accurate sidechain placement and packing (81-99th percentile rank in coordinate accuracy). However, the model used different sidechain torsional angles than seen in the crystal structure at residues 36 and 40, apparently to compensate for the backbone shifts present in the actual mutant structure, but not accounted for in our modeling method. To understand the structural basis of stability across a set of lambda repressor core mutants, we have analyzed the mutant models, revealing several simple packing effects: V36I, predicted to be stabilized by filling a hydrophobic cavity; M40V, destabilized by a steric clash with the unusual structural demands of a helix-turn transition. These effects illustrate how mutant stability can often be understood directly from scrutiny of wildtype structure. Simply adding the calculated energies of neighboring point mutations predicts the stability effect of the combined mutant relatively well, with little apparent cooperativity, yielding simple rules for each site's amino acid preferences. Our treatment of core packing indicates that it can permit a large fraction of sequences to fit the native fold, as observed experimentally, far more than indicated by rotamer hard-sphere models.
View details for PubMedID 9390296
-
A retrospective analysis of CASP2 threading predictions
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
1997: 83-91
Abstract
Analysis of CASP2 protein threading results shows that the success rate of structure predictions varies widely among prediction targets. We set "critical" thresholds in fold recognition specificity and threading model accuracy at the points where "incorrect" CASP2 predictions just outnumber "correct" predictions. Using these thresholds we find that correct predictions were made for all of those targets and for only those targets where more than 50% of target residues may be superimposed on previously known structures. Three-fourths of these correct predictions were furthermore made for targets with greater than 12% residue identity in structural alignment, where characteristic sequence motifs are also present. Based on these observations we suggest that the sustained performance of threading methods is best characterized by counting the numbers of correct predictions for targets of increasing "difficulty." We suggest that target difficulty may be assigned, once the true structure of the target is known, according to the fraction of residues superimposable onto previously known structures and the fraction of identical residues in those structural alignments.
View details for Web of Science ID 000071920700012
View details for PubMedID 9485499
-
Protein folding: The endgame
ANNUAL REVIEW OF BIOCHEMISTRY
1997; 66: 549-579
Abstract
The last stage of protein folding, the "endgame," involves the ordering of amino acid side-chains into a well defined and closely packed configuration. We review a number of topics related to this process. We first describe how the observed packing in protein crystal structures is measured. Such measurements show that the protein interior is packed exceptionally tightly, more so than the protein surface or surrounding solvent and even more efficiently than crystals of simple organic molecules. In vitro protein folding experiments also show that the protein is close-packed in solution and that the tight packing and intercalation of side-chains is a final and essential step in the folding pathway. These experimental observations, in turn, suggest that a folded protein structure can be described as a kind of three-dimensional jigsaw puzzle and that predicting side-chain packing is possible in the sense of solving this puzzle. The major difficulty that must be overcome in predicting side-chain packing is a combinatorial "explosion" in the number of possible configurations. There has been much recent progress towards overcoming this problem, and we survey a variety of the approaches. These approaches differ principally in whether they use ab initio (physical) or more knowledge-based methods, how they divide up and search conformational space, and how they evaluate candidate configurations (using scoring functions). The accuracy of side-chain prediction depends crucially on the (assumed) positioning of the main-chain. Methods for predicting main-chain conformation are, in a sense, not as developed as that for side-chains. We conclude by surveying these methods. As with side-chain prediction, there are a great variety of approaches, which differ in how they divide up and search space and in how they score candidate conformations.
View details for Web of Science ID A1997XH20100019
View details for PubMedID 9242917
-
Competitive assessment of protein fold recognition and alignment accuracy
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
1997: 92-104
Abstract
The predictions made for fold recognition and modeling accuracy at the 1996 Critical Assessment of Structure Prediction meeting (CASP2) were assessed to discover which groups did best. With 32 groups making a total of 369 predictions, it was necessary to use simple criteria for distinguishing between the entries. By focusing on the predictors' ability to use the sequence of the unknown target structure to recognize the target fold from a database of known folds and also on the quality of the model judged by the accuracy of the predicted alignment, it is easy to determine the best predictions for a given target. Assessing overall performance of the predictors on all the targets is much more difficult and use was made of weighted averages of fold recognition and alignment accuracy with and without normalization for target difficulty. By plotting these results in two dimensions the winning groups stand out, allowing readers to focus their attention on the most promising methods. When the present results are compared with the results of the earlier CASP1 meeting, held in 1994, it is clear that threading predictions have progressed dramatically. For this assessor, the strongest lesson learned is that subjectivity is pervasive and affects us all. It is abundantly clear that the blind predictions made at CASP are essential if progress is to be made in predicting protein structure.
View details for Web of Science ID 000071920700013
View details for PubMedID 9485500
-
Finite-difference solution of the Poisson-Boltzmann equation: Complete elimination of self-energy
JOURNAL OF COMPUTATIONAL CHEMISTRY
1996; 17 (11): 1344-1351
View details for Web of Science ID A1996UX99400007
-
Keeping the shape but changing the charges: A simulation study of urea and its iso-steric analogs
JOURNAL OF CHEMICAL PHYSICS
1996; 104 (23): 9417-9430
View details for Web of Science ID A1996UQ67200016
-
Energy functions that discriminate X-ray and near-native folds from well-constructed decoys
JOURNAL OF MOLECULAR BIOLOGY
1996; 258 (2): 367-392
Abstract
This study generates ensembles of decoy or test structures for eight small proteins with a variety of different folds. Between 35,000 and 200,000 decoys were generated for each protein using our four-state off-lattice model together with a novel relaxation method. These give compact self-avoiding conformations each constrained to have native secondary structure. Ensembles of these decoy conformations were used to test the ability of several types of empirical contact, surface area and distance-dependent energy functions to distinguish between correct and incorrect conformations. These tests have shown that none of the functions is able to distinguish consistently either the X-ray conformation or the near-native conformations from others which are incorrect. Certain combinations of two of these energy functions were able, however, consistently to identify X-ray structures from amongst the decoy conformations. These same combinations are better also at identifying near-native conformations, consistently finding them with a hundred-fold higher frequency than chance. The fact that these combination energy functions perform better than generally accepted energy functions suggests their future use in folding simulations and perhaps threading predictions.
View details for Web of Science ID A1996UH33000013
View details for PubMedID 8627632
-
From structure to sequence and back again
JOURNAL OF MOLECULAR BIOLOGY
1996; 258 (1): 201-209
Abstract
With a simple lattice model and sequence design algorithm, we can design sequences to fit arbitrary compact globular structures. We judged the success of the design algorithm by performing exhaustive conformational searches to determine if a designed sequence's lowest energy conformation matched the target for which it was designed. Designed sequences tend to be much better optimized for their targets than a natural sequence is optimized for its lowest energy model conformation. We examined the effect of varying the number of available amino acid types on the success of the design method. It was more difficult but not impossible to successfully design discriminating sequences using fewer amino acid types.
View details for Web of Science ID A1996UG25600016
View details for PubMedID 8613988
-
Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations
JOURNAL OF MOLECULAR BIOLOGY
1996; 257 (3): 716-725
Abstract
There are several knowledge-based energy functions that can distinguish the native fold from a pool of grossly misfolded decoys for a given sequence of amino acids. These decoys, which are typically generated by mounting, or "threading", the sequence onto the backbones of unrelated protein structures, tend to be non-compact and quite different from the native structure: the root-mean-squared (RMS) deviations from the native are commonly in the range of 15 to 20 angstroms. Effective energy functions should also demonstrate a similar recognition capability when presented with compact decoys that depart only slightly in conformation from the correct structure (i.e. those with RMS deviations of approximately 5 angstroms or less). Recently, we developed a simple yet powerful method for native fold recognition based on the tendency for native folds to form hydrophobic cores. Our energy measure, which we call the hydrophobic fitness score, is challenged to recognize the native fold from 2000 near-native structures generated for each of five small monomeric proteins. First, 1000 conformations for each protein were generated by molecular dynamics simulation at room temperature. The average RMS deviation of this set of 5000 was 1.5 angstroms. A total of 323 decoys had energies lower than native; however, none of these had RMS deviations greater than 2 angstroms. Another 1000 structures were generated for each at high temperature, in which a greater range of conformational space was explored (4.3 angstroms RMS deviation). Out of this set, only seven decoys were misrecognized. The hydrophobic fitness energy of a conformation is strongly dependent upon the RMS deviation. On average our potential yields energy values which are lowest for the population of structures generated at room temperature, intermediate for those produced at high temperature and highest for those constructed by threading methods. In general, the lowest energy decoy conformations have backbones very close to native structure. The possible utility of our method for screening backbone candidates for the purpose of modelling by side-chain packing optimization is discussed.
View details for Web of Science ID A1996UC77300021
View details for PubMedID 8648635
-
Theory and simulation through the breach
CURRENT OPINION IN STRUCTURAL BIOLOGY
1996; 6 (2): 193-194
View details for Web of Science ID A1996UF60500009
-
Through the breach.
Current opinion in structural biology
1996; 6 (2): 193-194
View details for PubMedID 8794145
-
Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures.
Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology
1996; 4: 59-67
Abstract
We show how a basic pairwise alignment procedure can be improved to more accurately align conserved structural regions, by using variable, position-dependent gap penalties that depend on secondary structure and by taking the consensus of a number of suboptimal alignments. These improvements, which are novel for structural alignment, are direct analogs of what is possible with normal sequence alignment. They are feasible for us since our basic structural alignment procedure, unlike others, is so similar to normal sequence alignment. We further present preliminary results that show how our procedure can be generalized to produce a multiple alignment of a family of structures. Our approach is based on finding a "median" structure from doing all possible pairwise alignments and then aligning everything to it.
View details for PubMedID 8877505
-
Packing as a structural basis of protein stability: Understanding mutant properties from wildtype structure
2nd Pacific Symposium on Biocomputing (PSB)
WORLD SCIENTIFIC PUBL CO PTE LTD. 1996: 245–255
View details for Web of Science ID A1996BH75M00028
-
Simulating the dynamics of the DNA double helix in solution
NATO Advanced Study Institute / International-School-of-Biological-Magnetic-Resonances 2nd Course on Dynamics and the Problem of Recognition in Biological Macromolecules
PLENUM PRESS DIV PLENUM PUBLISHING CORP. 1996: 173–191
View details for Web of Science ID A1996BG71C00013
-
Recognizing native folds by the arrangement of hydrophobic and polar residues
JOURNAL OF MOLECULAR BIOLOGY
1995; 252 (5): 709-720
Abstract
Central to the ab initio protein folding problem is the development of an energy function for which the correct native structure has a lower energy than all other conformations. Existing potentials of mean force typically rely extensively on database-derived contact frequencies or knowledge of three-dimensional structural information in order to be successful in the problem of recognizing the native fold for a given sequence from a set of decoy backbone conformations. Is the detailed statistical information or sophisticated analysis used by these knowledge-based potentials needed to achieve the observed degree of success in fold recognition? Here we introduce a novel pairwise energy function that enumerates contacts between hydrophobic residues while weighting their sum by the total number of residues surrounding these hydrophobic residues. Thus it effectively selects compact folds with the desired structural feature of a buried, intact core. This approach represents an advance over using pairwise terms whose energies of interaction are independent of the position in the protein and greatly improves the discrimination capability of an energy function. Our results show that 85% of a set of 195 representative native folds were recognized correctly. The 29 exceptions were lipophilic proteins, small proteins with prosthetic groups or disulfide bonds, and oligomeric proteins. Overall, our method separates the native fold from incorrect folds by a larger margin (measured in standard deviation units) than has been previously demonstrated by more sophisticated methods. The arrangement of hydrophobic and polar residues alone, as evaluated by our novel scoring scheme, is unexpectedly effective at recognizing native folds in general. It is surprising that a simple binary pattern of hydrophobic and polar residues apparently selects a given unique fold topology.
View details for Web of Science ID A1995RY58400015
View details for PubMedID 7563083
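The abstract above reports discrimination as a margin "measured in standard deviation units". A common way to compute such a margin is a z-score of the native energy against the decoy energies, sketched below. The abstract does not give the scoring formula itself, so the energies here are plain inputs, and the function names and the use of the sample standard deviation are assumptions made for illustration.

```python
import statistics

def native_z_score(native_energy, decoy_energies):
    """Gap between native and mean decoy energy, in decoy standard deviations.

    More negative values mean the native fold is better separated from the
    decoys (lower energy is taken to be better).
    """
    mean = statistics.mean(decoy_energies)
    sd = statistics.stdev(decoy_energies)  # sample standard deviation (assumed)
    return (native_energy - mean) / sd

def native_is_recognized(native_energy, decoy_energies):
    """True when the native energy is strictly lower than every decoy energy."""
    return all(native_energy < e for e in decoy_energies)
```

For example, a native energy of -10.0 against decoy energies of -2.0, 0.0 and 2.0 lies five standard deviations below the decoy mean, and the native fold counts as recognized.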
-
Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution
COMPUTER PHYSICS COMMUNICATIONS
1995; 91 (1-3): 215-231
View details for Web of Science ID A1995TF32200011
-
The volume of atoms on the protein surface - calculated from simulation, using Voronoi polyhedra
JOURNAL OF MOLECULAR BIOLOGY
1995; 249 (5): 955-966
Abstract
We analyze the volume of atoms on the protein surface during a molecular-dynamics simulation of a small protein (pancreatic trypsin inhibitor). To calculate volumes, we use a particular geometric construction, called Voronoi polyhedra, that divides the total volume of the simulation box amongst the atoms, rendering them relatively larger or smaller depending on how tightly they are packed. We find that most of the atoms on the protein surface are larger than those buried in the core (by approximately 6%), except for the charged atoms, which decrease in size, presumably due to electroconstriction. We also find that water molecules are larger near apolar atoms on the protein surface and smaller near charged atoms, in comparison to "bulk" water molecules far from the protein. Taken together, these findings necessarily imply that apolar atoms on the protein surface and their associated water molecules are less tightly packed (than corresponding atoms in the protein core and bulk water) and the opposite is the case for charged atoms. This looser apolar packing and tighter charged packing fundamentally reflects protein-water distances that are larger or smaller than those expected from van der Waals radii. In addition to the calculation of mean volumes, simulations allow us to investigate the volume fluctuations and hence compressibilities of the protein and solvent atoms. The relatively large volume fluctuations of atoms at the protein-water interface indicate that they have a more variable packing than corresponding atoms in the protein core or in bulk water. We try to adhere to traditional conventions throughout our calculations. Nevertheless, we are aware of and discuss three complexities that significantly qualify our calculations: the positioning of the dividing plane between atoms, the problem of vertex error, and the choice of atom radii. In particular, our results highlight how poor a "compromise" the commonly accepted value of 1.4 Å is for the radius of a water molecule.
View details for Web of Science ID A1995RF04700011
View details for PubMedID 7540695
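As a rough illustration of the volume partitioning described in the abstract above, the following sketch estimates Voronoi volumes by assigning each point of a regular grid to its nearest atom. It is a simplified stand-in and not the paper's procedure: the function name, the non-periodic box, the grid sampling and the plain bisecting-plane rule (no radius-dependent placement of the dividing plane) are all assumptions made for illustration.

```python
import math


def voronoi_volumes_on_grid(atoms, box, n=20):
    """Approximate each atom's Voronoi volume by nearest-atom voting.

    atoms: list of (x, y, z) positions.
    box:   (lx, ly, lz) edge lengths of a non-periodic box (assumption).
    n:     number of grid cells along each axis (hypothetical parameter).
    """
    lx, ly, lz = box
    counts = [0] * len(atoms)
    for i in range(n):
        x = (i + 0.5) * lx / n
        for j in range(n):
            y = (j + 0.5) * ly / n
            for k in range(n):
                z = (k + 0.5) * lz / n
                # The grid cell centre belongs to whichever atom is closest.
                nearest = min(
                    range(len(atoms)),
                    key=lambda a: math.dist(atoms[a], (x, y, z)),
                )
                counts[nearest] += 1
    cell_volume = lx * ly * lz / n ** 3
    return [c * cell_volume for c in counts]


if __name__ == "__main__":
    # Three made-up atom positions in a unit box; a crowded atom gets less volume.
    demo_atoms = [(0.2, 0.5, 0.5), (0.3, 0.5, 0.5), (0.8, 0.5, 0.5)]
    print(voronoi_volumes_on_grid(demo_atoms, (1.0, 1.0, 1.0)))
```

By construction the volumes always sum to the box volume, so an atom can only appear larger if its neighbours appear smaller. A production calculation would use an exact polyhedron construction rather than grid sampling.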
-
THE COMPLEXITY AND ACCURACY OF DISCRETE STATE MODELS OF PROTEIN-STRUCTURE
JOURNAL OF MOLECULAR BIOLOGY
1995; 249 (2): 493-507
Abstract
The prediction of protein structure depends on the quality of the models used. In this paper, we examine the relationship between the complexity and accuracy of representation of various models of protein alpha-carbon backbone structure. First, we develop an efficient algorithm for the near optimal fitting of arbitrary lattice and off-lattice models of polypeptide chains to their true X-ray structures. Using this, we show that the relationship between the complexity of a model, taken as the number of possible conformational states per residue, and the simplest measure of accuracy, the root-mean-square deviation from the X-ray structure, is approximately (Accuracy) ∝ (Complexity)^(-1/2). This relationship is insensitive to the particularities of individual models, i.e. lattice and off-lattice models of the same complexity tend to have similar average root-mean-square deviations, and this also implies that improvements in model accuracy with increasing complexity are very small. However, other measures of model accuracy, such as the preservation of X-ray residue-residue contacts and the alpha-helix, do distinguish among models. In addition, we show that low complexity models, which take into account the uneven distribution of residue conformations in real proteins, can represent X-ray structures as accurately as more complex models, which do not: a selected 6-state model can represent protein structures almost as accurately (1.7 Å root-mean-square) as a 17-state lattice model (1.6 Å root-mean-square). Finally, we use a novel optimization procedure to generate eight 4-state models, which fit native proteins to an average of 2.4 Å, and preserve 85% of native residue-residue contacts. We discuss the implications of these findings for protein folding and the prediction of protein conformation.
View details for Web of Science ID A1995RB61600021
View details for PubMedID 7783205
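The inverse-square-root relation between accuracy and complexity stated in the abstract above can be made concrete with a short calculation. The sketch below only encodes that proportionality. The function names are hypothetical, and treating the 17-state lattice model (1.6 Å) as a point on the general trend is an assumption made purely for illustration.

```python
def rmsd_ratio(states_a, states_b):
    """Predicted RMSD(a) / RMSD(b) if RMSD scales as (states per residue) ** -0.5."""
    return (states_a / states_b) ** -0.5


def predicted_rmsd(states, ref_states, ref_rmsd):
    """Extrapolate an RMSD from one reference point along the assumed trend."""
    return ref_rmsd * rmsd_ratio(states, ref_states)


if __name__ == "__main__":
    # Quadrupling the number of states only halves the expected deviation.
    print(rmsd_ratio(4, 1))
    # Generic 6-state model extrapolated from the 17-state reference (assumption).
    print(predicted_rmsd(6, 17, 1.6))
```

Under these assumptions a generic 6-state model would be expected to fit at roughly 2.7 Å, whereas the abstract reports 1.7 Å for the selected 6-state model. That gap is the abstract's point: a model tuned to the uneven distribution of residue conformations can outperform the general trend.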
-
STRUCTURAL DIVERSITY IN A CONSERVED CHOLERA-TOXIN EPITOPE INVOLVED IN GANGLIOSIDE BINDING
PROTEIN SCIENCE
1995; 4 (5): 841-848
Abstract
Cholera is a widespread disease for which there is no efficient vaccine. A better understanding of the conformational rearrangements at the epitope might be very helpful for the development of a good vaccine. Cholera toxin (CT) as well as the closely related heat-labile toxin from Escherichia coli (LT) are composed of two subunits, A and B, which form an oligomeric assembly AB5. Residues 50-64 on the surface of the B subunits comprise a conserved loop (CTP3), which is involved in saccharide binding to the receptor on epithelial cells. This loop exhibits remarkable conformational plasticity induced by environmental constraints. The crystal structure of this loop is compared in the free and receptor-bound toxins as well as in the crystal and solution structures of a complex with TE33, a monoclonal antibody elicited against CTP3. In the toxins this loop forms an irregular structure connecting a beta-strand to the central alpha-helix. Ser 55 and Gln 56 exhibit considerable conformational variability in the five subunits of the unliganded toxins. Saccharide binding induces a change primarily in Ser 55 and Gln 56 to a conformation identical in all five copies. Thus, saccharide binding confers rigidity upon the loop. The conformation of CTP3 in complex with TE33 is quite different. The amino-terminal part of CTP3 forms a beta-turn that fits snugly into a deep binding pocket on TE33, in both the crystal and NMR-derived solution structure. Only 8 and 12 residues out of 15 are seen in the NMR and crystal structures, respectively. Despite these conformational differences, TE33 is cross-reactive with intact CT, albeit with a thousandfold decrease in affinity. This suggests a different interaction of TE33 with intact CT.
View details for Web of Science ID A1995QW98100003
View details for PubMedID 7545048
-
SIMULATION OF PROTEIN-FOLDING PATHWAYS - LOST IN (CONFORMATIONAL) SPACE
TRENDS IN BIOTECHNOLOGY
1995; 13 (1): 23-27
View details for Web of Science ID A1995QE52900007
-
EXPLORING CONFORMATIONAL SPACE WITH A SIMPLE LATTICE MODEL FOR PROTEIN-STRUCTURE
JOURNAL OF MOLECULAR BIOLOGY
1994; 243 (4): 668-682
Abstract
We present a low resolution lattice model for which we can exhaustively generate all possible compact backbone conformations for small proteins. Using simple structural and energetic criteria, for a variety of proteins, we can select for lattice structures that have significant similarities with their known native structures. Our energetic parameters are based on pairwise amino acid contact frequencies in a database of experimentally determined structures. A key step in our method involves the threading of a sequence onto every lattice model, such that a locally optimal pattern of tertiary interactions is formed. We evaluate our results against statistics collected for structures covering all of conformational space, and against statistics collected for permuted sequences. Despite the low resolution of the model, our low energy structures contain many native features. These results indicate that the overall pattern of hydrophobicity of a sequence significantly constrains the range of folds that sequence is likely to adopt.
View details for Web of Science ID A1994PQ66300012
View details for PubMedID 7966290
-
DIFFERENT PROTEIN SEQUENCES CAN GIVE RISE TO HIGHLY SIMILAR FOLDS THROUGH DIFFERENT STABILIZING INTERACTIONS
PROTEIN SCIENCE
1994; 3 (11): 1938-1944
Abstract
We report an interesting case of structural similarity between 2 small, nonhomologous proteins, the third domain of ovomucoid (ovomucoid) and the C-terminal fragment of ribosomal L7/L12 protein (CTF). The region of similarity consists of a 3-stranded beta-sheet and an alpha-helix. This region is highly similar; the corresponding elements of secondary structure share a common topology, and the RMS difference for "equivalent" C alpha atoms is 1.6 Å. Surprisingly, this common structure arises from completely different sequences. For the common core, the sequence identity is less than 3%, and there is neither significant sequence similarity nor similarity in the position or orientation of conserved hydrophobic residues. This superposition raises the question of how 2 entirely different sequences can produce an identical structure. Analyzing this common region in ovomucoid revealed that it is stabilized by disulfide bonds. In contrast, the corresponding structure in CTF is stabilized in the alpha-helix by a composition of residues with high helix-forming propensities. This result suggests that different sequences and different stabilizing interactions can produce an identical structure.
View details for Web of Science ID A1994PZ82400005
View details for PubMedID 7703840
-
PROTEIN FOLDING-UNFOLDING DYNAMICS
CURRENT OPINION IN STRUCTURAL BIOLOGY
1994; 4 (2): 291-295
View details for Web of Science ID A1994NG68900019
-
WATER - NOW YOU SEE IT, NOW YOU DONT
STRUCTURE
1993; 1 (4): 223-226
View details for Web of Science ID A1993NC28000002
View details for PubMedID 8081736
-
PROTEIN UNFOLDING PATHWAYS EXPLORED THROUGH MOLECULAR-DYNAMICS SIMULATIONS
JOURNAL OF MOLECULAR BIOLOGY
1993; 232 (2): 600-619
Abstract
Herein we describe the results of molecular dynamics simulations of the bovine pancreatic trypsin inhibitor (BPTI) in solution at a variety of temperatures both with and without disulfide bonds. The reduced form of the protein unfolded at high temperature to an ensemble of conformations with all the properties of the molten globule state. In this account we outline the structural details of the actual unfolding process between the native and molten globule states. The first steps of unfolding involved expansion of the protein, which disrupted packing interactions. The solvent-accessible surface area also quickly increased. The unfolding was localized mostly to the turn and loop regions of the molecule, while leaving the secondary structure intact. Then, there was more gradual unfolding of the secondary structure and non-native turns became prevalent. This same trajectory was continued and more drastic unfolding occurred that resulted in a relatively compact state devoid of stable secondary structure.
View details for Web of Science ID A1993LQ98500023
View details for PubMedID 7688428
-
STRUCTURAL SIMILARITY OF DNA-BINDING DOMAINS OF BACTERIOPHAGE REPRESSORS AND THE GLOBIN CORE
CURRENT BIOLOGY
1993; 3 (3): 141-148
Abstract
In recent years, the determination of large numbers of protein structures has created a need for automatic and objective methods for the comparison of structures or conformations. Many protein structures show similarities of conformation that are undetectable by comparing their sequences. Comparison of structures can reveal similarities between proteins thought to be unrelated, providing new insight into the interrelationships of sequence, structure and function. Using a new tool that we have developed to perform rapid structural alignment, we present the highlights of an exhaustive comparison of all pairs of protein structures in the Brookhaven protein database. Notably, we find that the DNA-binding domain of the bacteriophage repressor family is almost completely embedded in the larger eight-helix fold of the globin family of proteins. The significant match of specific residues is correlated with functional, structural and evolutionary information. Our method can help to identify structurally similar folds rapidly and with high sensitivity, providing a powerful tool for analyzing the ever-increasing number of protein structures being elucidated.
View details for Web of Science ID A1993LL95500002
View details for PubMedID 15335781
-
REALISTIC SIMULATIONS OF NATIVE-PROTEIN DYNAMICS IN SOLUTION AND BEYOND
ANNUAL REVIEW OF BIOPHYSICS AND BIOMOLECULAR STRUCTURE
1993; 22: 353-380
View details for Web of Science ID A1993LH45400014
View details for PubMedID 8347994
-
INDUCED PEPTIDE CONFORMATIONS IN DIFFERENT ANTIBODY COMPLEXES - MOLECULAR MODELING OF THE 3-DIMENSIONAL STRUCTURE OF PEPTIDE ANTIBODY COMPLEXES USING NMR-DERIVED DISTANCE RESTRAINTS
BIOCHEMISTRY
1992; 31 (30): 6884-6897
Abstract
Intramolecular interactions in bound cholera toxin peptide (CTP3) in three antibody complexes were studied by two-dimensional transferred NOE spectroscopy. These measurements together with previously recorded spectra that show intermolecular interactions in these complexes were used to obtain restraints on interproton distances in two of these complexes (TE32 and TE33). The NMR-derived distance restraints were used to dock the peptide into calculated models for the three-dimensional structure of the antibody combining site. It was found that TE32 and TE33 recognize a loop comprising the sequence VPGSQHID and a beta-turn formed by the sequence VPGS. The third antibody, TE34, recognizes a different epitope within the same peptide and a beta-turn formed by the sequence IDSQ. Neither of these two turns was observed in the free peptide. The formation of a beta-turn in the bound peptide gives a compact conformation that maximizes the contact with the antibody and that has greater conformational freedom than alpha-helix or beta-sheet secondary structure. A total of 15 antibody residues are involved in peptide contacts in the TE33 complex, and 73% of the contact area in the antibody combining site consists of the side chains of aromatic amino acids. A comparison of the NMR-derived models for CTP3 interacting with TE32 and TE33 with the previously derived model for TE34 reveals a relationship between amino acid sequence and combining site structure and function. (a) The three aromatic residues that interact with the peptide in TE32 and TE33 complexes, Tyr 32L, Tyr 32H, and Trp 50H, are invariant in all light chains sharing at least 65% identity with TE33 and TE32 and in all heavy chains sharing at least 75% identity with TE33. Although TE34 differs from TE32 and TE33 in its fine specificity, these aromatic residues are conserved in TE34 and interact with its antigen. 
Therefore, we conclude that the role of these three aromatic residues is to participate in nonspecific hydrophobic interactions with the antigen. (b) Residues 31, 31c, and 31e of CDR1 of the light chain interact with the antigen in all three antibodies that we have studied. The amino acids in these positions in TE34 differ from those in TE32 and TE33, and they are involved in specific polar interactions with the antigen. (c) CDR3 of the heavy chain varies considerably both in length and in sequence between TE34 and the two other anti-CTP3 antibodies. These changes modify the shape of the combining site and the hydrophobic and polar interactions of CDR3 with the peptide antigen.
View details for Web of Science ID A1992JG66700004
View details for PubMedID 1379072
-
ACCURATE MODELING OF PROTEIN CONFORMATION BY AUTOMATIC SEGMENT MATCHING
JOURNAL OF MOLECULAR BIOLOGY
1992; 226 (2): 507-533
Abstract
Segment match modeling uses a data base of highly refined known protein X-ray structures to build an unknown target structure from its amino acid sequence and the atomic coordinates of a few of its atoms (generally only the C alpha atoms). The target structure is first broken into a set of short segments. The data base is then searched for matching segments, which are fitted onto the framework of the target structure. Three criteria are used for choosing a matching data base segment: amino acid sequence similarity, conformational similarity (atomic co-ordinates), and compatibility with the target structure (van der Waals' interactions). The new method works surprisingly well: for eight test proteins ranging in size from 46 to 323 residues, the all-atom root-mean-square deviation of the modeled structures is between 0.93 Å and 1.73 Å (the average is 1.26 Å). Deviations of this magnitude are comparable with those found for protein co-ordinates before and after refinement against X-ray data or for co-ordinates of the same protein in different crystal packings. These results are insensitive to errors in the C alpha positions or to missing C alpha atoms: accurate models can be built with C alpha errors of up to 1 Å or by using only half the C alpha atoms. The fit to the X-ray structures is improved significantly by building several independent models based on different random choices and then averaging co-ordinates; this novel concept has general implications for other modeling tasks. The segment match modeling method is fully automatic, yields a complete set of atomic co-ordinates without any human intervention and is efficient (14 s/residue on the Silicon Graphics 4D/25 Personal Iris workstation).
View details for Web of Science ID A1992JF96600020
View details for PubMedID 1640463
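The abstract above notes that averaging the co-ordinates of several independently built models improves the fit to the X-ray structure. The toy sketch below shows why this can happen for independent errors. It is not the segment match procedure itself: the straight-line "reference structure", the Gaussian noise standing in for modeling error, the function names, and the omission of any superposition step are all assumptions made for illustration.

```python
import math
import random


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length coordinate lists (no superposition step)."""
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def average_models(models):
    """Average the co-ordinates of several models, atom by atom."""
    count = len(models)
    return [
        tuple(sum(model[i][d] for model in models) / count for d in range(3))
        for i in range(len(models[0]))
    ]


def noisy_copy(reference, sigma, rng):
    """Hypothetical 'independent model': the reference plus Gaussian noise."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in point) for point in reference]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [(float(i), 0.0, 0.0) for i in range(50)]  # made-up structure
    models = [noisy_copy(reference, 1.0, rng) for _ in range(10)]
    print("individual:", [round(rmsd(m, reference), 2) for m in models])
    print("averaged:  ", round(rmsd(average_models(models), reference), 2))
```

Because the error of the averaged model is the mean of the individual error vectors, its RMSD can never exceed the mean of the individual RMSDs. For uncorrelated errors it is typically much lower.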
-
A MODEL OF THE MOLTEN GLOBULE STATE FROM MOLECULAR-DYNAMICS SIMULATIONS
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1992; 89 (11): 5142-5146
Abstract
It is generally accepted that a protein's primary sequence determines its three-dimensional structure. It has proved difficult, however, to obtain detailed structural information about the actual protein folding process and intermediate states. We present the results of molecular dynamics simulations of the unfolding of reduced bovine pancreatic trypsin inhibitor. The resulting partially "denatured" state was compact but expanded relative to the native state (11-25%); the expansion was not caused by an influx of water molecules. The structures were mobile, with overall secondary structure contents comparable to those of the native protein. The protein experienced relatively local unfolding, with the largest changes in the structure occurring in the loop regions. A hydrophobic core was maintained although packing of the side chains was compromised. The properties displayed in the simulation are consistent with unfolding to a molten globule state. Our simulations provide an in-depth view of this state and details of water-protein interactions that cannot yet be obtained experimentally.
View details for Web of Science ID A1992HX16800075
View details for PubMedID 1594623
-
A LATTICE MODEL FOR PROTEIN-STRUCTURE PREDICTION AT LOW RESOLUTION
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1992; 89 (7): 2536-2540
Abstract
The prediction of the folded structure of a protein from its sequence has proven to be a very difficult computational problem. We have developed an exceptionally simple representation of a polypeptide chain, with which we can enumerate all possible backbone conformations of small proteins. A protein is represented by a self-avoiding path of connected vertices on a tetrahedral lattice, with several amino acid residues assigned to each lattice vertex. For five small structurally dissimilar proteins, we find that we can separate native-like structures from the vast majority of non-native folds by using only simple structural and energetic criteria. This method demonstrates significant generality and predictive power without requiring foreknowledge of any native structural details.
View details for Web of Science ID A1992HL81600006
View details for PubMedID 1557356
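To give a feel for the exhaustive enumeration mentioned in the abstract above, the sketch below counts self-avoiding paths on a tetrahedral (diamond) lattice by depth-first search. It covers only the path enumeration. The integer lattice co-ordinates, the function name, and the absence of any compactness or energy filter are assumptions for illustration, and the paper's own representation (several residues per vertex, structural and energetic selection) is not reproduced here.

```python
# The four bond directions leaving a site on the "even" sublattice of a
# diamond lattice; sites on the "odd" sublattice use the negated vectors.
BONDS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def count_self_avoiding_walks(n_steps):
    """Count self-avoiding walks of n_steps bonds starting at the origin."""

    def extend(site, visited, steps_done):
        if steps_done == n_steps:
            return 1
        # Sublattice alternates with every step taken from the origin.
        sign = 1 if steps_done % 2 == 0 else -1
        total = 0
        for dx, dy, dz in BONDS:
            nxt = (site[0] + sign * dx, site[1] + sign * dy, site[2] + sign * dz)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, steps_done + 1)
                visited.remove(nxt)
        return total

    origin = (0, 0, 0)
    return extend(origin, {origin}, 0)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, count_self_avoiding_walks(n))
```

The count grows roughly threefold per added bond, which is why exhaustive enumeration is feasible only for short chains and why a coarse representation that places several residues on each vertex keeps the search tractable.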
-
MOLECULAR-DYNAMICS SIMULATIONS OF HELIX DENATURATION
JOURNAL OF MOLECULAR BIOLOGY
1992; 223 (4): 1121-1138
Abstract
An understanding of the structural transitions that an alpha-helix undergoes will help to elucidate such motions in proteins and their role in protein folding. We present the results of molecular dynamics simulations to investigate these transitions in a short polyalanine peptide (13 residues) both in vacuo and in the presence of solvent. The denaturation of this peptide was monitored as a function of temperature (ranging from 5 to 200 degrees C). In vacuo, the helical state predominated at all temperatures, whereas in solution the helix melted with increasing temperature. The peptide was predominantly helical at low temperature in solution, while at intermediate temperatures the peptide spent the bulk of the time fluctuating between different conformations with intermediate amounts of helix, e.g. not completely helical nor entirely non-helical. Many of these conformations consisted of short helical segments with intervening non-helical residues. At high temperature the peptide unfolded and adopted various collapsed unstructured states. The intrahelical hydrogen bonds that break at high temperature were not fully compensated by hydrogen bonds with water molecules in the partially unfolded forms of the peptide. Increases in temperature disrupted both the helical structure and the peptide-water interactions. Water played a major but indirect role in facilitating unfolding, as opposed to specifically competing for the intrapeptide hydrogen bonds. The implications of our results to protein folding are discussed.
View details for Web of Science ID A1992HG60100023
View details for PubMedID 1538392
-
A MOLECULAR-DYNAMICS SIMULATION OF THE C-TERMINAL FRAGMENT OF THE L7/L12 RIBOSOMAL-PROTEIN IN SOLUTION
CHEMICAL PHYSICS
1991; 158 (2-3): 501-512
View details for Web of Science ID A1991GT76600021
-
STRUCTURAL AND KINETIC-STUDIES OF THE FAB FRAGMENT OF A MONOCLONAL ANTI-SPIN LABEL ANTIBODY BY NUCLEAR-MAGNETIC-RESONANCE
JOURNAL OF MOLECULAR BIOLOGY
1991; 221 (1): 257-270
Abstract
Nuclear magnetic resonance has been used to study the structure of the anti-spin label antibody AN02 combining site and kinetic rates for the hapten-antibody reaction. The association reaction for the hapten dinitrophenyl-diglycine (DNP-diGly) is diffusion-limited. The activation enthalpy for association, 5.1 kcal/mol, is close to the activation enthalpy for diffusion in water. Several reliable resonance assignments have been made with the aid of a recently reported crystal structure. Structural data deduced from the nuclear magnetic resonance (n.m.r.) spectra compare favorably with the crystal structure in terms of the combining site amino acid composition, distances of tyrosine residues from the unpaired electron of the hapten, and residues in direct contact with the hapten. Evidence is presented that a single binding site region tyrosine residue can assume two distinct conformations on binding of DNP-diGly. The AN02 antibody is an autoantibody. Dimerization of the Fab fragments is blocked by the hapten DNP-diGly. The n.m.r. spectra suggest that some of the amino acid residues involved in the binding of the DNP-hapten are also involved in the Fab dimerization.
View details for Web of Science ID A1991GG37200025
View details for PubMedID 1920409
-
ACCURATE PREDICTION OF THE STABILITY AND ACTIVITY EFFECTS OF SITE-DIRECTED MUTAGENESIS ON A PROTEIN CORE
NATURE
1991; 352 (6334): 448-451
Abstract
Theoretical prediction of the structure, stability and activity of proteins, an important unsolved problem in molecular biology, would be of use for guiding site-directed mutagenesis and other protein-engineering techniques. X-ray diffraction studies have provided extensive structural information for many proteins, challenging theorists to develop reliable techniques able to use such knowledge as a base for prediction of mutants' characteristics. Here we report theoretical calculation of stabilization energies for 78 triple-site sequence variants of lambda repressor characterized experimentally by Lim and Sauer. The calculated energies correlate with the mutants' measured activities; active and inactive mutations are discriminated with 92% reliability. They correlate even more directly with the mutants' thermostabilities, correctly identifying two of the mutants to be more stable than the wild type.
View details for Web of Science ID A1991FZ34600071
View details for PubMedID 1861725
-
REAL-TIME INTERACTIVE FREQUENCY FILTERING OF MOLECULAR-DYNAMICS TRAJECTORIES
JOURNAL OF MOLECULAR BIOLOGY
1991; 220 (1): 1-4
Abstract
Molecular dynamics simulations of atomic motion in protein and nucleic acid molecules must be done on a femtosecond time-scale. Much of this rapid motion is unimportant for the slower changes that are most relevant to biological function (conformational changes, substrate binding, protein folding). The high-frequency motion makes simulations computationally expensive. More importantly, the high frequencies obscure visualization of the relevant dynamics processes. Sessions, Dauber-Osguthorpe and Osguthorpe presented a method for removing high-frequency motions from atomic co-ordinates of trajectories generated by simulation. While that study used fast Fourier methods and emphasized the use of filtering for analysis of trajectories, this communication describes a new method that makes it much easier to use frequency filtering in programs that display trajectories as a sequence of moving images. Tests of the method on systems extending from pure water to proteins and nucleic acid molecules in vacuo and in solution have demonstrated its general utility. Impressed with the power and simplicity of the new method, we wish to present it in sufficient detail to allow others to implement it themselves.
View details for Web of Science ID A1991FW10200001
View details for PubMedID 2067008
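The abstract above does not spell out the filtering method, so the following is only a generic illustration of low-pass filtering a trajectory co-ordinate. A simple moving average is used as a stand-in, and it should not be read as the method of the paper. The function name, the window parameter and the synthetic signal are all assumptions.

```python
import math


def moving_average(series, window):
    """Low-pass filter a co-ordinate time series with a sliding mean.

    Returns len(series) - window + 1 values; each is the mean of `window`
    consecutive frames, which damps motions faster than the window length.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running = sum(series[:window])
    smoothed = [running / window]
    for i in range(window, len(series)):
        # Slide the window by one frame: add the new value, drop the oldest.
        running += series[i] - series[i - window]
        smoothed.append(running / window)
    return smoothed


if __name__ == "__main__":
    # Synthetic co-ordinate: slow drift plus a fast vibration (made-up numbers).
    frames = [0.01 * t + 0.5 * math.sin(2.0 * math.pi * t / 4.0) for t in range(200)]
    filtered = moving_average(frames, window=8)
    print(frames[:5])
    print(filtered[:5])
```

Because the running sum is updated incrementally, each new frame costs a constant amount of work, which is the kind of property that matters when filtering is applied while a trajectory is being displayed.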
-
ENHANCED STABILITY OF SUBTILISIN BY 3 POINT MUTATIONS
BIOTECHNOLOGY AND APPLIED BIOCHEMISTRY
1991; 13 (1): 12-24
Abstract
This study was undertaken to characterize the effect of three point mutations made on aprA-subtilisin on the stability of the protein to both heat- and detergent-induced denaturation. Asparagine residues at positions 109 and 218 were replaced with serine residues to prevent the possible cyclization between these asparagines and the adjacent glycine residues and hence to increase the long-term stability. The effect of these substitutions on conformational stability was examined by thermal denaturation. At high calcium concentrations, the Ser109-substituted analog showed a 3 degrees C higher transition temperature than that of aprA-subtilisin, while the Ser218 substituted analog had a 4 degrees C higher transition temperature. The analog with both changes had a 7 degrees C higher transition temperature than that of the original aprA-subtilisin, indicating that the contributions of the individual mutations were additive. The analog with both mutations also exhibited increased stability in the presence of sodium dodecyl sulfate (SDS) when compared to aprA-subtilisin. In addition to the above two mutations, the asparagine at position 76, located in the high affinity Ca(2+) binding loop of subtilisin, was changed to aspartic acid. The effect of this mutation on the thermal stability of the protein was examined at different calcium concentrations. The analog with all three mutations exhibited little dependence on calcium concentration below 1 mM levels, while the proteins without the mutation at asparagine-76 displayed a strong dependence of melting temperature on Ca(2+) concentration in this range. At much higher calcium concentrations, the analog with three mutations showed an increase in stability similar to that observed with aprA-subtilisin. The analog with three mutations also exhibited greater stability to SDS-induced denaturation than both aprA-subtilisin and the Ser109- and Ser218-substituted analogs. 
The activation energy barrier for loss of structure in 1% SDS for the analog with all three mutations was increased over that for aprA-subtilisin by 16 kcal/mol. These results suggest that the mutation of asparagine-76 to aspartic acid increases the affinity of the primary Ca(2+) binding site.
View details for Web of Science ID A1991EW55900002
View details for PubMedID 2054102
-
Protein Folding
Curr. Opinions Struct. Biol.
1991; 1: 224-229
-
NMR-DERIVED MODEL FOR A PEPTIDE-ANTIBODY COMPLEX
BIOCHEMISTRY
1990; 29 (43): 10032-10041
Abstract
The TE34 monoclonal antibody against cholera toxin peptide 3 (CTP3; VEVPGSQHIDSQKKA) was sequenced and investigated by two-dimensional transferred NOE difference spectroscopy and molecular modeling. The VH sequence of TE34, which does not bind cholera toxin, shares remarkable homology to that of TE32 and TE33, which are both anti-CTP3 antibodies that bind the toxin. However, due to a shortened heavy chain CDR3, TE34 assumes a radically different combining site structure. The assignment of the combining site interactions to specific peptide residues was completed by use of AcIDSQRKA, a truncated peptide analogue in which lysine-13 was substituted by arginine, specific deuteration of individual polypeptide chains of the antibody, and a computer model for the Fv fragment of TE34. NMR-derived distance restraints were then applied to the calculated model of the Fv to generate a three-dimensional structure of the TE34/CTP3 complex. The combining site was found to be a very hydrophobic cavity composed of seven aromatic residues. Charged residues are found in the periphery of the combining site. The peptide residues HIDSQKKA form a beta-turn inside the combining site. The contact area between the peptide and the TE34 antibody is 388 Å^2, about half of the contact area observed in protein-antibody complexes.
View details for Web of Science ID A1990EG39900004
View details for PubMedID 2271636
-
CONFORMATIONS OF IMMUNOGLOBULIN HYPERVARIABLE REGIONS
NATURE
1989; 342 (6252): 877-883
Abstract
On the basis of comparative studies of known antibody structures and sequences it has been argued that there is a small repertoire of main-chain conformations for at least five of the six hypervariable regions of antibodies, and that the particular conformation adopted is determined by a few key conserved residues. These hypotheses are now supported by reasonably successful predictions of the structures of most hypervariable regions of various antibodies, as revealed by comparison with their subsequently determined structures.
View details for Web of Science ID A1989CF63700048
View details for PubMedID 2687698
-
A HUMANIZED ANTIBODY THAT BINDS TO THE INTERLEUKIN-2 RECEPTOR
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1989; 86 (24): 10029-10033
Abstract
The anti-Tac monoclonal antibody is known to bind to the p55 chain of the human interleukin 2 receptor and to inhibit proliferation of T cells by blocking interleukin 2 binding. However, use of anti-Tac as an immunosuppressant drug would be impaired by the human immune response against this murine antibody. We have therefore constructed a "humanized" antibody by combining the complementarity-determining regions (CDRs) of the anti-Tac antibody with human framework and constant regions. The human framework regions were chosen to maximize homology with the anti-Tac antibody sequence. In addition, a computer model of murine anti-Tac was used to identify several amino acids which, while outside the CDRs, are likely to interact with the CDRs or antigen. These mouse amino acids were also retained in the humanized antibody. The humanized anti-Tac antibody has an affinity for p55 of 3 × 10^9 M^-1, about 1/3 that of murine anti-Tac.
View details for Web of Science ID A1989CE97600082
View details for PubMedID 2513570
-
PROBING ANTIBODY DIVERSITY BY 2D NMR - COMPARISON OF AMINO-ACID SEQUENCES, PREDICTED STRUCTURES, AND OBSERVED ANTIBODY ANTIGEN INTERACTIONS IN COMPLEXES OF 2 ANTIPEPTIDE ANTIBODIES
BIOCHEMISTRY
1989; 28 (18): 7168-7175
Abstract
The interactions between the aromatic amino acids of two monoclonal antibodies (TE32 and TE33) with specific amino acid residues of a peptide of cholera toxin (CTP3) have been determined by two-dimensional (2D) transferred NOE difference spectroscopy. Aromatic amino acids are found to play an important role in peptide binding. In both antibodies two tryptophan and two tyrosine residues and one histidine residue interact with the peptide. In TE33 there is an additional phenylalanine residue that also interacts with the peptide. The residues of the CTP3 peptide that have been found to interact with the antibody are val 3, pro 4, gly 5, gln 7, his 8, and asp 10. We have determined the amino acid sequences of the two antibodies by direct mRNA sequencing. Computerized molecular modeling has been used to build detailed all-atom models of both antibodies from the known conformations of other antibodies. These models allow unambiguous assignment of most of the antibody residues that interact with the peptide. A comparison of the amino acid sequences of the two anti-CTP3 antibodies with other antibodies from the same gene family reveals that the majority of the aromatic residues involved in the binding of CTP3 are conserved although these antibodies have different specificities. This similarity suggests that these aromatic residues create a general hydrophobic pocket and that other residues in the complementarity-determining regions (CDRs) modulate the shape and the polarity of the combining site to fit the specific antigens.
View details for Web of Science ID A1989AQ25000006
View details for PubMedID 2819059
-
STABILIZATION OF PHAGE-T4 LYSOZYME BY ENGINEERED DISULFIDE BONDS
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1989; 86 (17): 6562-6566
Abstract
Four different disulfide bridges (linking positions 9-164, 21-142, 90-122, and 127-154) were introduced into a cysteine-free phage T4 lysozyme at sites suggested by theoretical calculations and computer modeling. The new cysteines spontaneously formed disulfide bonds on exposure to air in vitro. In all cases the oxidized (crosslinked) lysozyme was more stable than the corresponding reduced (noncrosslinked) enzyme toward thermal denaturation. Relative to wild-type lysozyme, the melting temperatures of the 9-164 and 21-142 disulfide mutants were increased by 6.4 degrees C and 11.0 degrees C, whereas the other two mutants were either less stable or equally stable. Measurement of the equilibrium constants for the reduction of the engineered disulfide bonds by dithiothreitol indicates that the less thermostable mutants tend to have a less favorable crosslink in the native structure. The two disulfide bridges that are most effective in increasing the stability of T4 lysozyme have, in common, a large loop size and a location that includes a flexible part of the molecule. The results suggest that stabilization due to the effect of the crosslink on the entropy of the unfolded polypeptide is offset by the strain energy associated with formation of the disulfide bond in the folded protein. The design of disulfide bridges is discussed in terms of protein flexibility.
View details for Web of Science ID A1989AP19100026
View details for PubMedID 2671995
-
MOLECULAR-DYNAMICS OF MACROMOLECULES IN WATER
CHEMICA SCRIPTA
1989; 29A: 197-203
View details for Web of Science ID A1989CD22800028
-
ACCURATE SIMULATION OF PROTEIN DYNAMICS IN SOLUTION
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1988; 85 (20): 7557-7561
Abstract
Simulation of the molecular dynamics of a small protein, bovine pancreatic trypsin inhibitor, was found to be more realistic when water molecules were included than when in vacuo: the time-averaged structure was much more like that observed in high-resolution x-ray studies, the amplitudes of atomic vibration in solution were smaller, and fewer incorrect hydrogen bonds were formed. Our approach, which provides a sound basis for reliable simulation of diverse properties of biological macromolecules in solution, uses atom-centered forces and classical mechanics.
View details for Web of Science ID A1988Q580700030
View details for PubMedID 2459709
-
AROMATIC RINGS ACT AS HYDROGEN-BOND ACCEPTORS
JOURNAL OF MOLECULAR BIOLOGY
1988; 201 (4): 751-754
Abstract
Simple energy calculations show that there is a significant interaction between a hydrogen bond donor (like the >NH group) and the centre of a benzene ring, which acts as a hydrogen bond acceptor. This interaction, which is about half as strong as a normal hydrogen bond, contributes approximately 3 kcal/mol (1 cal = 4.184 J) of stabilizing enthalpy and is expected to play a significant role in molecular associations. It is of interest that the aromatic hydrogen bond arises from small partial charges centred on the ring carbon and hydrogen atoms: there is no need to consider delocalized electrons. Although some energy calculations have included such partial charges, their role in forming such a strong interaction was not appreciated until after aromatic hydrogen bonds had been observed in protein-drug complexes.
View details for Web of Science ID A1988N947200007
View details for PubMedID 3172202
-
CONTRIBUTION OF TRYPTOPHAN RESIDUES TO THE COMBINING SITE OF A MONOCLONAL ANTI DINITROPHENYL SPIN-LABEL ANTIBODY
BIOCHEMISTRY
1987; 26 (19): 6058-6064
Abstract
Two Fab fragments of the monoclonal anti dinitrophenyl (DNP) spin-label antibody AN02 were prepared by recombination of specifically deuterated heavy and light chains. In the recombinant H(I)L(II) all the tyrosines and phenylalanines were perdeuterated as were the tryptophan residues of the heavy chain. In the recombinant H(II)L(I) all the tyrosines and phenylalanines were perdeuterated as were the tryptophan residues of the light chain. Saturation of three resonances of H(I)L(II), assigned to tryptophan protons of the light chain, resulted in magnetization transfer to the aromatic proton at position 6 of the DNP ring and to the CH2 protons of the glycines linked to the DNP in a diamagnetic hapten (DNP-DG). Saturation of three resonances of H(II)L(I) assigned to tryptophan protons of the heavy chain resulted in magnetization transfer to the CH2 protons of the glycines in DNP-DG. From the dependence of the magnetization transfer on the irradiation time, the cross relaxation rates between the involved protons were estimated. The inferred distances between these protons of the hapten and certain tryptophan protons are 3-4 Å. It is concluded that in the combining site of AN02 there is one tryptophan from the light chain and one tryptophan from the heavy chain that are very near the hapten. When all tyrosines and phenylalanines were perdeuterated and all tryptophan aromatic protons were deuterated except for the protons at positions 2 and 5, titration of the Fab fragments with variable amounts of paramagnetic hapten showed that one proton from the light chain tryptophan is near (less than 7 Å) the unpaired electron and that three other protons are significantly closer than 15 Å. (ABSTRACT TRUNCATED AT 250 WORDS)
View details for Web of Science ID A1987K210700017
View details for PubMedID 3120771
-
THE PREDICTED STRUCTURE OF IMMUNOGLOBULIN-D1.3 AND ITS COMPARISON WITH THE CRYSTAL-STRUCTURE
SCIENCE
1986; 233 (4765): 755-758
Abstract
Predictions of the structures of the antigen-binding domains of an antibody, recorded before its experimental structure determination and tested subsequently, were based on comparative analysis of known antibody structures or on conformational energy calculations. The framework, the relative positions of the hypervariable regions, and the folds of four of the hypervariable loops were predicted correctly. This portion includes all residues in contact with the antigen, in this case hen egg white lysozyme, implying that the main chain conformation of the antibody combining site does not change upon ligation. The conformations of three residues in each of the other two hypervariable loops are different in the predicted models and the experimental structure.
View details for Web of Science ID A1986D523400028
View details for PubMedID 3090684
-
HELIX TO HELIX PACKING IN PROTEINS
JOURNAL OF MOLECULAR BIOLOGY
1981; 145 (1): 215-250
View details for Web of Science ID A1981KX76800010
View details for PubMedID 7265198
-
PERIODICITY OF DEOXYRIBONUCLEASE-I DIGESTION OF CHROMATIN
SCIENCE
1979; 204 (4395): 855-858
Abstract
Two methods have been used to measure the single-strand lengths of the DNA fragments produced by deoxyribonuclease I digestion of chromatin. The average lengths obtained are multiples of about 10.4 bases, significantly different from the value of 10 previously reported. This periodicity in fragment lengths is closely related to the periodicity of the DNA double helix in chromatin, but the two values need not be exactly the same.
View details for Web of Science ID A1979GV41300034
View details for PubMedID 441739
-
CONFORMATION OF AMINO-ACID SIDE-CHAINS IN PROTEINS
JOURNAL OF MOLECULAR BIOLOGY
1978; 125 (3): 357-386
View details for Web of Science ID A1978FY32300007
View details for PubMedID 731698
-
STRUCTURE OF PROTEINS - PACKING OF ALPHA-HELICES AND PLEATED SHEETS
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1977; 74 (10): 4130-4134
Abstract
Simple models are presented that describe the rules for almost all the packing that occurs between and among alpha-helices and pleated sheets. These packing rules, together with the primary and secondary structures, are the major determinants of the three-dimensional structure of proteins.
View details for Web of Science ID A1977DZ33900005
View details for PubMedID 270659
-
STRUCTURE OF NUCLEOSOME CORE PARTICLES OF CHROMATIN
NATURE
1977; 269 (5623): 29-36
Abstract
Crystals have been obtained of nucleosome cores and analysed by X-ray diffraction and electron microscopy. The core is a flat particle of dimensions about 110 × 110 × 57 Å, somewhat wedge shaped, and strongly divided into two 'layers', consistent with the DNA being wound into about 1 3/4 turns of a flat superhelix of a pitch about 28 Å. The organisation of the DNA can be correlated with the results of enzyme digestion studies. A change in the screw of the DNA double helix on nucleosome formation can be deduced.
View details for Web of Science ID A1977DS90100026
View details for PubMedID 895884
-
STRUCTURAL PATTERNS IN GLOBULAR PROTEINS
NATURE
1976; 261 (5561): 552-558
Abstract
A simple diagrammatic representation has been used to show the arrangement of alpha helices and beta sheets in 31 globular proteins, which are classified into four clearly separated classes. The observed arrangements are significantly non-random in that pieces of secondary structure adjacent in sequence along the polypeptide chain are also often in contact in three dimensions.
View details for Web of Science ID A1976BU52600022
View details for PubMedID 934293