Education & Certifications
B.S., Peking University, China, Biochemistry and Molecular Biology (1997)
M.S., Peking University, China, Biochemistry and Molecular Biology (2000)
Ph.D., University of Washington, Seattle, Computational Biology (2005)
My research interests lie in drug repurposing and personalization through structural biology and systems biology approaches. In particular, my focus is on relating the genetic features of patients to acquired genes and pathway dependencies and identifying small-molecule drugs that target them.
Relating Essential Proteins to Drug Side-Effects Using Canonical Component Analysis: A Structure-Based Approach
JOURNAL OF CHEMICAL INFORMATION AND MODELING
2015; 55 (7): 1483-1494
The molecular mechanism of many drug side-effects is unknown and difficult to predict. Previous methods for explaining side-effects have focused on known drug targets and their pathways. However, low affinity binding to proteins that are not usually considered drug targets may also drive side-effects. In order to assess these alternative targets, we used the 3D structures of 563 essential human proteins systematically to predict binding to 216 drugs. We first benchmarked our affinity predictions with available experimental data. We then combined singular value decomposition and canonical component analysis (SVD-CCA) to predict side-effects based on these novel target profiles. Our method predicts side-effects with good accuracy (average AUC: 0.82 for side effects present in <50% of drug labels). We also noted that side-effect frequency is the most important feature for prediction and can confound efforts at elucidating mechanism; our method allows us to remove the contribution of frequency and isolate novel biological signals. In particular, our analysis produces 2768 triplet associations between 50 essential proteins, 99 drugs, and 77 side-effects. Although experimental validation is difficult because many of our essential proteins do not have validated assays, we nevertheless attempted to validate a subset of these associations using experimental assay data. Our focus on essential proteins allows us to find potential associations that would likely be missed if we used recognized drug targets. Our associations provide novel insights about the molecular mechanisms of drug side-effects and highlight the need for expanded experimental efforts to investigate drug binding to proteins more broadly.
View details for DOI 10.1021/acs.jcim.5b00030
View details for Web of Science ID 000358821300020
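The accuracy figure quoted in the abstract above (average AUC of 0.82) is an area under the ROC curve. As a rough illustration only, and not the authors' evaluation code, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy scores and labels are assumptions made up for this example.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted scores, higher means "more likely positive".
    labels: 1 for a true association, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three true drug/side-effect pairs, three false ones.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a sort-based rank computation would be the usual choice for large data sets.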
Variations in the binding pocket of an inhibitor of the bacterial division protein FtsZ across genotypes and species.
PLoS computational biology
2015; 11 (3)
The recent increase in antibiotic resistance in pathogenic bacteria calls for new approaches to drug-target selection and drug development. Targeting the mechanisms of action of proteins involved in bacterial cell division bypasses problems associated with increasingly ineffective variants of older antibiotics; to this end, the essential bacterial cytoskeletal protein FtsZ is a promising target. Recent work on its allosteric inhibitor, PC190723, revealed in vitro activity on Staphylococcus aureus FtsZ and in vivo antimicrobial activities. However, the mechanism of drug action and its effect on FtsZ in other bacterial species are unclear. Here, we examine the structural environment of the PC190723 binding pocket using PocketFEATURE, a statistical method that scores the similarity between pairs of small-molecule binding sites based on 3D structure information about the local microenvironment, and molecular dynamics (MD) simulations. We observed that species and nucleotide-binding state have significant impacts on the structural properties of the binding site, with substantially disparate microenvironments for bacterial species not from the Staphylococcus genus. Based on PocketFEATURE analysis of MD simulations of S. aureus FtsZ bound to GTP or with mutations that are known to confer PC190723 resistance, we predict that PC190723 strongly prefers to bind Staphylococcus FtsZ in the nucleotide-bound state. Furthermore, MD simulations of an FtsZ dimer indicated that polymerization may enhance PC190723 binding. Taken together, our results demonstrate that a drug-binding pocket can vary significantly across species, genetic perturbations, and in different polymerization states, yielding important information for the further development of FtsZ inhibitors.
View details for DOI 10.1371/journal.pcbi.1004117
View details for PubMedID 25811761
- "Genotype-first" approaches on a curious case of idiopathic progressive cognitive decline BMC MEDICAL GENOMICS 2014; 7
Identifying druggable targets by protein microenvironments matching: application to transcription factors.
CPT: pharmacometrics & systems pharmacology
Druggability of a protein is its potential to be modulated by drug-like molecules. It is important in the target selection phase. We hypothesize that: (i) known drug-binding sites contain advantageous physicochemical properties for drug binding, or "druggable microenvironments" and (ii) given a target, the presence of multiple druggable microenvironments similar to those seen previously is associated with a high likelihood of druggability. We developed DrugFEATURE to quantify druggability by assessing the microenvironments in potential small-molecule binding sites. We benchmarked DrugFEATURE using two data sets. One data set measures druggability using NMR-based screening. DrugFEATURE correlates well with this metric. The second data set is based on historical drug discovery outcomes. Using the DrugFEATURE cutoffs derived from the first, we accurately discriminated druggable and difficult targets in the second. We further identified novel druggable transcription factors with implications for cancer therapy. DrugFEATURE provides useful insight for drug discovery, by evaluating druggability and suggesting specific regions for interacting with drug-like molecules.
CPT: Pharmacometrics & Systems Pharmacology (2014) 3, e93; doi:10.1038/psp.2013.66; published online 22 January 2014.
View details for DOI 10.1038/psp.2013.66
View details for PubMedID 24452614
Bioinformatics and variability in drug response: a protein structural perspective
JOURNAL OF THE ROYAL SOCIETY INTERFACE
2012; 9 (72): 1409-1437
Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk-benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein-drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein-drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants.
View details for DOI 10.1098/rsif.2011.0843
View details for Web of Science ID 000304437400001
View details for PubMedID 22552919
Using Multiple Microenvironments to Find Similar Ligand-Binding Sites: Application to Kinase Inhibitor Binding
PLOS COMPUTATIONAL BIOLOGY
2011; 7 (12)
The recognition of cryptic small-molecular binding sites in protein structures is important for understanding off-target side effects and for recognizing potential new indications for existing drugs. Current methods focus on the geometry and detailed chemical interactions within putative binding pockets, but may not recognize distant similarities where dynamics or modified interactions allow one ligand to bind apparently divergent binding pockets. In this paper, we introduce an algorithm that seeks similar microenvironments within two binding sites, and assesses overall binding site similarity by the presence of multiple shared microenvironments. The method has relatively weak geometric requirements (to allow for conformational change or dynamics in both the ligand and the pocket) and uses multiple biophysical and biochemical measures to characterize the microenvironments (to allow for diverse modes of ligand binding). We term the algorithm PocketFEATURE, since it focuses on pockets using the FEATURE system for characterizing microenvironments. We validate PocketFEATURE first by showing that it can better discriminate sites that bind similar ligands from those that do not, and by showing that we can recognize FAD-binding sites on a proteome scale with Area Under the Curve (AUC) of 92%. We then apply PocketFEATURE to evolutionarily distant kinases, for which the method recognizes several proven distant relationships, and predicts unexpected shared ligand binding. Using experimental data from ChEMBL and Ambit, we show that at high significance level, 40 kinase pairs are predicted to share ligands. Some of these pairs offer new opportunities for inhibiting two proteins in a single pathway.
View details for DOI 10.1371/journal.pcbi.1002326
View details for Web of Science ID 000299167800043
View details for PubMedID 22219723
Comparative Modeling: The State of the Art and Protein Drug Target Structure Prediction
COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING
2011; 14 (6): 532-547
The goal of computational protein structure prediction is to provide three-dimensional (3D) structures with resolution comparable to experimental results. Comparative modeling, which predicts the 3D structure of a protein based on its sequence similarity to homologous structures, is the most accurate computational method for structure prediction. In the last two decades, significant progress has been made on comparative modeling methods. Using the large number of protein structures deposited in the Protein Data Bank (~65,000), automatic prediction pipelines are generating a tremendous number of models (~1.9 million) for sequences whose structures have not been experimentally determined. Accurate models are suitable for a wide range of applications, such as prediction of protein binding sites, prediction of the effect of protein mutations, and structure-guided virtual screening. In particular, comparative modeling has enabled structure-based drug design against protein targets with unknown structures. In this review, we describe the theoretical basis of comparative modeling, the available automatic methods and databases, and the algorithms to evaluate the accuracy of predicted structures. Finally, we discuss relevant applications in the prediction of important drug target proteins, focusing on the G protein-coupled receptor (GPCR) and protein kinase families.
View details for Web of Science ID 000292772100008
View details for PubMedID 21521153
Identification of recurring protein structure microenvironments and discovery of novel functional sites around CYS residues
BMC STRUCTURAL BIOLOGY
The emergence of structural genomics presents significant challenges in the annotation of biologically uncharacterized proteins. Unfortunately, our ability to analyze these proteins is restricted by the limited catalog of known molecular functions and their associated 3D motifs. In order to identify novel 3D motifs that may be associated with molecular functions, we employ an unsupervised, two-phase clustering approach that combines k-means and hierarchical clustering with knowledge-informed cluster selection and annotation methods. We applied the approach to approximately 20,000 cysteine-based protein microenvironments (3D regions 7.5 Å in radius) and identified 70 interesting clusters, some of which represent known motifs (e.g. metal binding and phosphatase activity), and some of which are novel, including several zinc binding sites. Detailed annotation results are available online for all 70 clusters at http://feature.stanford.edu/clustering/cys. The use of microenvironments instead of backbone geometric criteria enables flexible exploration of protein function space, and detection of recurring motifs that are discontinuous in sequence and diverse in structure. Clustering microenvironments may thus help to functionally characterize novel proteins and better understand the protein structure-function relationship.
View details for DOI 10.1186/1472-6807-10-4
View details for Web of Science ID 000275410900001
View details for PubMedID 20122268
Prediction of calcium-binding sites by combining loop-modeling with machine learning
BMC STRUCTURAL BIOLOGY
Protein ligand-binding sites in the apo state exhibit structural flexibility. This flexibility often frustrates methods for structure-based recognition of these sites because it leads to the absence of electron density for these critical regions, particularly when they are in surface loops. Methods for recognizing functional sites in these missing loops would be useful for recovering additional functional information. We report a hybrid approach for recognizing calcium-binding sites in disordered regions. Our approach combines loop modeling with a machine learning method (FEATURE) for structure-based site recognition. For validation, we compared the performance of our method on known calcium-binding sites for which there are both holo and apo structures. When loops in the apo structures are rebuilt using modeling methods, FEATURE identifies 14 out of 20 crystallographically proven calcium-binding sites. It only recognizes 7 out of 20 calcium-binding sites in the initial apo crystal structures. We applied our method to unstructured loops in proteins from SCOP families known to bind calcium in order to discover potential cryptic calcium binding sites. We built 2745 missing loops and evaluated them for potential calcium binding. We made 102 predictions of calcium-binding sites. Ten predictions are consistent with independent experimental verifications. We found indirect experimental evidence for 14 other predictions. The remaining 78 predictions are novel predictions, some with intriguing potential biological significance. In particular, we see an enrichment of beta-sheet folds with predicted calcium binding sites in the connecting loops on the surface that may be important for calcium-mediated function switches. Protein crystal structures are a potentially rich source of functional information. When loops are missing in these structures, we may be losing important information about binding sites and active sites. We have shown that limited loop modeling (e.g. loops less than 17 residues) combined with pattern matching algorithms can recover functions and propose putative conformations associated with these functions.
View details for DOI 10.1186/1472-6807-9-72
View details for Web of Science ID 000273849100001
View details for PubMedID 20003365
A novel method for predicting and using distance constraints of high accuracy for refining protein structure prediction
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2009; 77 (1): 220-234
The principal bottleneck in protein structure prediction is the refinement of models from lower accuracies to the resolution observed by experiment. We developed a novel constraints-based refinement method that identifies a high number of accurate input constraints from initial models and rebuilds them using restrained torsion angle dynamics (rTAD). We previously created a Bayesian statistics-based residue-specific all-atom probability discriminatory function (RAPDF) to discriminate native-like models by measuring the probability of accuracy for atom type distances within a given model. Here, we exploit RAPDF to score (i.e., filter) constraints from initial predictions that may or may not be close to a native-like state, obtain consensus of top scoring constraints amongst five initial models, and compile sets with no redundant residue pair constraints. We find that this method consistently produces a large and highly accurate set of distance constraints from which to build refinement models. We further optimize the balance between accuracy and coverage of constraints by producing multiple structure sets using different constraint distance cutoffs, and note that the cutoff governs spatially near versus distant effects in model generation. This complete procedure of deriving distance constraints for rTAD simulations improves the quality of initial predictions significantly in all cases evaluated by us. Our procedure represents a significant step in solving the protein structure prediction and refinement problem, by enabling the use of consensus constraints, RAPDF, and rTAD for protein structure modeling and refinement.
View details for DOI 10.1002/prot.22434
View details for Web of Science ID 000269300000020
View details for PubMedID 19422061
- Predicting drug side-effects by chemical systems biology GENOME BIOLOGY 2009; 10 (9)
Improving the accuracy of template-based predictions by mixing and matching between initial models
BMC STRUCTURAL BIOLOGY
Comparative modeling is a technique to predict the three dimensional structure of a given protein sequence based primarily on its alignment to one or more proteins with experimentally determined structures. A major bottleneck of current comparative modeling methods is the lack of methods to accurately refine a starting initial model so that it approaches the resolution of the corresponding experimental structure. We investigate the effectiveness of a graph-theoretic clique finding approach to solve this problem. Our method takes into account the information presented in multiple templates/alignments at the three-dimensional level by mixing and matching regions between different initial comparative models. This method enables us to obtain an optimized conformation ensemble representing the best combination of secondary structures, resulting in the refined models of higher quality. In addition, the process of mixing and matching accumulates near-native conformations, resulting in discriminating the native-like conformation in a more effective manner. In the seventh Critical Assessment of Structure Prediction (CASP7) experiment, the refined models produced are more accurate than the starting initial models. This novel approach can be applied without any manual intervention to improve the quality of comparative predictions where multiple template/alignment combinations are available for modeling, producing conformational models of higher quality than the starting initial predictions.
View details for DOI 10.1186/1472-6807-8-24
View details for Web of Science ID 000256681000001
View details for PubMedID 18457597
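The abstract above describes a graph-theoretic clique finding approach for mixing and matching regions between initial models, without giving details. As a hedged illustration of clique finding in general, the sketch below enumerates maximal cliques with the textbook Bron-Kerbosch algorithm (no pivoting). The graph encoding is an assumption made up for this example: each node is a hypothetical (model, region) label, and an edge means two regions are taken to be mutually compatible. It is not the authors' implementation, and the compatibility criterion is not specified here.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph as a sorted list.

    adjacency: dict mapping each node to the set of its neighbours.
    Textbook Bron-Kerbosch recursion without pivoting.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield sorted(clique)          # nothing left to add: clique is maximal
            return
        for v in list(candidates):        # iterate over a snapshot
            yield from expand(
                clique | {v},
                candidates & adjacency[v],
                excluded & adjacency[v],
            )
            candidates = candidates - {v}  # v has been fully explored
            excluded = excluded | {v}

    yield from expand(set(), set(adjacency), set())


# Hypothetical compatibility graph: "m1_h1" = helix 1 taken from initial model 1, etc.
adjacency = {
    "m1_h1": {"m2_s1", "m3_h2"},
    "m2_s1": {"m1_h1", "m3_h2"},
    "m3_h2": {"m1_h1", "m2_s1", "m2_h1"},
    "m2_h1": {"m3_h2"},
}

cliques = sorted(maximal_cliques(adjacency))
best = max(cliques, key=len)   # largest mutually compatible set of regions
print(best)
```

In this toy graph the largest clique picks one region from each of three models; in a real setting the choice among cliques would presumably also depend on a scoring function rather than size alone.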
Scoring functions for de novo protein structure prediction revisited.
Methods in molecular biology (Clifton, N.J.)
2008; 413: 243-281
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.
View details for PubMedID 18075169
Scaffold proteins and the regeneration of visual pigments
PHOTOCHEMISTRY AND PHOTOBIOLOGY
2006; 82 (6): 1482-1488
CRALBP, cellular retinaldehyde-binding protein, is a retinoid-binding protein necessary for efficient regeneration of rod and cone visual pigments. The C terminus of CRALBP binds to the PDZ domains of EBP50/NHERF-1, which in turn bind to ezrin and actin, proteins localized to the apical processes of the retinal pigment epithelium. In this study, we examined structural features associated with the interaction of the two proteins. The C-terminal amino-acid sequence of 11 orthologous CRALBPs is either ENTAL, ENTAF or EDTAL. Peptides ending in each of these sequences inhibited the interaction of CRALBP and EBP50/NHERF-1 with the use of an overlay assay. Molecular modeling showed that both NTAL and NTAF formed similar networks of H bonds with PDZ1 of EBP50/NHERF-1, and the side chains of both C-terminal Leu and Phe fit into the peptide-binding groove of PDZ1. CRALBP x 11-cis-retinal and EBP50/NHERF-1 migrated as single components when analyzed individually by gel filtration and as a complex when mixed together before gel filtration. Complex formation was abolished by preincubation of EBP50/NHERF-1 with peptide EVENTAL. The ligand absorption spectrum of the complex was identical with that of CRALBP x 11-cis-retinal, demonstrating that complex formation did not perturb the ligand-binding domain of CRALBP.
View details for DOI 10.1562/2006-01-25-RA-784
View details for Web of Science ID 000243214100015
View details for PubMedID 16553463
The effect of experimental resolution on the performance of knowledge-based discriminatory functions for protein structure selection
PROTEIN ENGINEERING DESIGN & SELECTION
2006; 19 (9): 431-437
The key to an accurate method of protein structure prediction is the development of an effective discriminatory function. Knowledge-based discriminatory functions extract parameters from statistical analysis of experimentally determined protein structures. We assess how the quality of the protein structures used for compiling statistics affects the performance of a residue-specific all-atom probability discriminatory function (RAPDF). We find that the discriminatory power correlates with the quality of the structural dataset on which the RAPDF is parameterized in a statistically significant manner. The overrepresentation of unfavorable contacts in the low-resolution and NMR structures contributes to the major errors in the compilation of the conditional probabilities. Such errors weaken the discriminatory power of the function, especially when decoy conformations also contain considerable numbers of unfavorable contacts. This indicates that using high-resolution structural datasets after filtering out unfavorable contacts can improve the performance of knowledge-based discriminatory functions.
View details for DOI 10.1093/protein/gzl027
View details for Web of Science ID 000240544600005
View details for PubMedID 16845128
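The abstract above refers to compiling conditional probabilities from experimentally determined structures. As a minimal sketch of how a knowledge-based discriminatory function of this general kind can be built, the example below turns contact counts into a log-odds table, -ln(P(distance bin | atom-type pair) / P(distance bin)), and sums the table entries over the contacts seen in a candidate model, so that lower totals look more native-like. The atom types, distance bins, and counts are toy assumptions, the function names are hypothetical, and the published RAPDF parameterization may differ in its details.

```python
import math


def build_log_odds(counts):
    """Convert contact counts into a log-odds score table.

    counts[pair][b]: number of times atom-type pair `pair` was observed in
    distance bin `b` in a reference set of structures (all counts must be > 0
    in this sketch; real data would need pseudocounts or similar handling).
    Returns table[pair][b] = -ln( P(b | pair) / P(b) ).
    """
    n_bins = len(next(iter(counts.values())))
    bin_totals = [sum(c[b] for c in counts.values()) for b in range(n_bins)]
    grand_total = sum(bin_totals)

    table = {}
    for pair, c in counts.items():
        pair_total = sum(c)
        table[pair] = [
            -math.log((c[b] / pair_total) / (bin_totals[b] / grand_total))
            for b in range(n_bins)
        ]
    return table


def score_model(table, contacts):
    """Sum the log-odds terms over the (pair, bin) contacts seen in one model."""
    return sum(table[pair][b] for pair, b in contacts)


# Toy reference statistics (made up): two atom-type pairs, three distance bins.
counts = {
    ("C", "C"): [10, 30, 60],
    ("C", "O"): [40, 30, 30],
}
table = build_log_odds(counts)

# Two hypothetical candidate models described by their observed contacts.
model_a = [(("C", "O"), 0), (("C", "C"), 2)]   # contacts common in the reference set
model_b = [(("C", "C"), 0), (("C", "O"), 2)]   # contacts rare in the reference set
score_a = score_model(table, model_a)
score_b = score_model(table, model_b)
print(score_a < score_b)
```

Because the table is derived entirely from the reference counts, errors in those counts (for example, unfavorable contacts overrepresented in low-resolution structures) carry straight through to the scores, which is the dependency the abstract examines.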
- CRALBP ligand and protein interactions RETINAL DEGENERATIVE DISEASES 2006; 572: 477-483
Structural insights into the cellular retinaldehyde-binding protein (CRALBP)
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2005; 61 (2): 412-422
Cellular retinaldehyde-binding protein (CRALBP) is an essential protein in the human visual cycle without a known three-dimensional structure. Previous studies associate retinal pathologies to specific mutations in the CRALBP protein. Here we use homology modeling and molecular dynamics methods to investigate the structural mechanisms by which CRALBP functions in the visual cycle. We have constructed two conformations of CRALBP representing two states in the process of ligand association and dissociation. Notably, our homology models map the pathology-associated mutations either directly in or adjacent to the putative ligand-binding cavity. Furthermore, six novel residues have been identified to be crucial for the hinge movement of the lipid-exchange loop in CRALBP. We conclude that the binding and release of retinoid involve large conformational changes in the lipid-exchange loop at the entrance of the ligand-binding cavity.
View details for DOI 10.1002/prot.20621
View details for Web of Science ID 000232420800019
View details for PubMedID 16121400
PROTINFO: new algorithms for enhanced protein structure predictions
NUCLEIC ACIDS RESEARCH
2005; 33: W77-W80
We describe new algorithms and modules for protein structure prediction available as part of the PROTINFO web server. The modules, comparative and de novo modelling, have significantly improved back-end algorithms that were rigorously evaluated at the sixth meeting on the Critical Assessment of Protein Structure Prediction methods. We were one of four server groups invited to make an oral presentation (only the best performing groups are asked to do so). These two modules allow a user to submit a protein sequence and return atomic coordinates representing the tertiary structure of that protein. The PROTINFO server is available at http://protinfo.compbio.washington.edu.
View details for DOI 10.1093/nar/gki403
View details for Web of Science ID 000230271400012
View details for PubMedID 15980581
Identification of CRALBP ligand interactions by photoaffinity labeling, hydrogen/deuterium exchange, and structural modeling
JOURNAL OF BIOLOGICAL CHEMISTRY
2004; 279 (26): 27357-27364
Cellular retinaldehyde-binding protein (CRALBP) functions in the retinal pigment epithelium (RPE) as an acceptor of 11-cis-retinol in the isomerization step of the rod visual cycle and as a substrate carrier for 11-cis-retinol dehydrogenase. Toward a better understanding of CRALBP function, the ligand binding cavity in human recombinant CRALBP (rCRALBP) was characterized by photoaffinity labeling with 3-diazo-4-keto-11-cis-retinal and by high resolution mass spectrometric topological analyses. Eight photoaffinity-modified residues were identified in rCRALBP by liquid chromatography tandem mass spectrometry, including Tyr(179), Phe(197), Cys(198), Met(208), Lys(221), Met(222), Val(223), and Met(225). Multiple different adduct masses were found on the photolabeled residues, and the molecular identity of each modification remains unknown. Supporting the specificity of photo-labeling, 50% of the modified residues have been associated with retinoid interactions by independent analyses. In addition, topological analysis of apo- and holo-rCRALBP by hydrogen/deuterium exchange and mass spectrometry demonstrated that residues 198-255 incorporate significantly less deuterium when the retinoid binding pocket is occupied with 11-cis-retinal. This hydrophobic region encompasses all but one of the photo-labeled residues. A structural model of the CRALBP ligand binding domain was constructed based on the crystal structures of three homologues in the CRAL-TRIO family of lipid-binding proteins. In the model, all of the photolabeled residues line the ligand binding cavity except Met(208), which appears to reside in a flexible loop at the entrance/exit of the ligand cavity. Overall, the results expand to 12 the number of residues proposed to interact with ligand and provide further insight into CRALBP ligand and protein interactions.
View details for DOI 10.1074/jbc.M401960200
View details for Web of Science ID 000222120400066
View details for PubMedID 15100222