Honors & Awards
-
Honorary Professor, Keio University (2001)
-
Fellow, American College of Medical Informatics (2001)
-
Best Computer Application in Science, Computerworld-Smithsonian Award (1992)
-
Fellow, American Association for the Advancement of Science (1986)
-
Teacher Scholar Award, Henry and Camille Dreyfus (1979)
-
Young Investigator Award, Basil O'Connor National Foundation (1975)
-
Ph.D. with Great Distinction, Stanford Biochemistry Department (1972)
-
Henry M. Green Award for Undergraduate Research, California Institute of Technology (1968)
Boards, Advisory Committees, Professional Organizations
-
Board of Directors, IntelliCorp (1980 - 1985)
-
Scientific Advisory Board, IntelliCorp (1980 - 1986)
-
Board of Directors, IntelliGenetics (1986 - 1991)
-
Editorial Board, Journal of Computational Biology (1993 - 1998)
-
Co-Founder, International Society for Computational Biology (1996 - Present)
-
Chairman, Scientific Advisory Board, Time Logic (1997 - 2003)
-
Scientific Advisory Board, DoubleTwist (2000 - 2002)
-
Chairman, Scientific Advisory Board, Pathwork Informatics (2003 - 2006)
-
Presidential Advisory Board, Max Planck Institutes (2003 - 2011)
Professional Education
-
B.S. with Honors, California Inst. of Technology, Biology (1968)
-
Ph.D. with Great Distinction, Stanford University, Biochemistry (1972)
Community and International Work
-
Gordon Conference Organizer, Ventura, California
Topic
Structural, Functional and Evolutionary Genomics
Partnering Organization(s)
NIH
Populations Served
International
Location
US
Ongoing Project
No
Opportunities for Student Involvement
No
-
Presidential Advisor, Max Planck Institute, Germany
Topic
Scientific Evaluation of Bioinformatics at Max Planck
Populations Served
Max Planck Institutes
Location
International
Ongoing Project
Yes
Opportunities for Student Involvement
No
-
Dagstuhl Seminar on Bioinformatics, Dagstuhl, Germany
Topic
Computational Molecular Biology
Populations Served
International
Location
International
Ongoing Project
No
Opportunities for Student Involvement
No
-
International Conferences on Intelligent Systems in Molecular Biology
Topic
Computational Molecular Biology
Partnering Organization(s)
International Society for Computational Biology
Populations Served
International
Location
International
Ongoing Project
No
Opportunities for Student Involvement
No
-
Imperial Cancer Research Fund, London, England
Topic
Computational Molecular Biology
Populations Served
United Kingdom
Location
International
Ongoing Project
No
Opportunities for Student Involvement
No
-
Course on Computer Applications in Molecular Biology, Sydney, Australia
Topic
Computational Molecular Biology
Partnering Organization(s)
Australian National Genomic Information Centre
Populations Served
Australia
Location
International
Ongoing Project
No
Opportunities for Student Involvement
No
-
UNIDO Workshop on Computer Applications in Molecular Biology, Moscow, Russia
Topic
Computational Molecular Biology
Populations Served
Russia
Location
International
Ongoing Project
No
Opportunities for Student Involvement
No
-
GenBank Symposium, the first 15 years, Bethesda, Md
Partnering Organization(s)
National Institute of General Medical Sciences
Populations Served
International
Location
US
Ongoing Project
No
Opportunities for Student Involvement
No
-
UNIDO Course on Computational Biology, Trieste, Italy
Topic
Computational Molecular Biology
Partnering Organization(s)
International Centre for Genetic Engineering and Biotechnology
Populations Served
International
Location
International
Ongoing Project
No
Opportunities for Student Involvement
No
-
National Library of Medicine, Board of Scientific Counselors, Bethesda, Md
Topic
Informatics
Populations Served
International
Location
US
Ongoing Project
No
Opportunities for Student Involvement
No
-
GenBank Database, Bethesda, Md
Topic
Coinvestigator & Consultant
Populations Served
International
Location
US
Ongoing Project
No
Opportunities for Student Involvement
No
-
BIONET Resource for Computational Biology, Palo Alto, CA
Topic
Networking & Database Access
Partnering Organization(s)
National Institutes of Health
Populations Served
International
Location
California
Ongoing Project
No
Opportunities for Student Involvement
No
-
National Library of Medicine, Long Range Planning Panel, Bethesda, Md.
Topic
20 year planning panel
Partnering Organization(s)
National Library of Medicine
Populations Served
International
Location
International
Ongoing Project
No
Opportunities for Student Involvement
No
-
NIH Genetics Study Section, Bethesda, Md
Partnering Organization(s)
National Institutes of Health
Location
US
Ongoing Project
No
Opportunities for Student Involvement
No
Current Research and Scholarly Interests
My primary interest is to understand the flow of information from the genome to the phenotype of an organism. This interest includes predicting the structure and function of molecules from their primary sequence, predicting function from structure, and finally simulating protein folding and protein-ligand docking. These goals are the same as those of molecular biology; however, we use primarily computational approaches.
2021-22 Courses
-
Independent Studies (2)
- Directed Reading and Research
BIOMEDIN 299 (Aut)
- Medical Scholars Research
BIOMEDIN 370 (Aut)
Graduate and Fellowship Programs
-
Biomedical Informatics (PhD Program)
All Publications
-
Using Stochastic Roadmap Simulation to predict experimental quantities in protein folding kinetics: Folding rates and phi-values
10th Annual International Conference on Research in Computational Molecular Biology
MARY ANN LIEBERT INC. 2007: 578–93
Abstract
This paper presents a new method for studying protein folding kinetics. It uses the recently introduced Stochastic Roadmap Simulation (SRS) method to estimate the transition state ensemble (TSE) and predict the rates and the Phi-values for protein folding. The new method was tested on 16 proteins, whose rates and Phi-values have been determined experimentally. Comparison with experimental data shows that our method estimates the TSE much more accurately than an existing method based on dynamic programming. This improvement leads to better folding-rate predictions. We also compute the mean first passage time of the unfolded states and show that the computed values correlate with experimentally determined folding rates. The results on Phi-value predictions are mixed, possibly due to the simple energy model used in the tests. This is the first time that results obtained from SRS have been compared against a substantial amount of experimental data. The results further validate the SRS method and indicate its potential as a general tool for studying protein folding kinetics.
View details for DOI 10.1089/cmb.2007.R004
View details for Web of Science ID 000247927100005
View details for PubMedID 17683262
-
Dynamic use of multiple parameter sets in sequence alignment
NUCLEIC ACIDS RESEARCH
2007; 35 (2): 678-686
Abstract
The level of conservation between two homologous sequences often varies among sequence regions; functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used in alignment of homologous sequences with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm to allow dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, that in its predecessor GAP3, and an ordinary local alignment program SIM were evaluated on 257,716 pairs of homologous sequences from 100 protein families. On 168,475 of the 257,716 pairs (a rate of 65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.
View details for DOI 10.1093/nar/gkl1063
View details for Web of Science ID 000243993600038
View details for PubMedID 17182633
-
Genotypic predictors of human immunodeficiency virus type 1 drug resistance
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2006; 103 (46): 17355-17360
Abstract
Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large numbers of mutation patterns associated with cross-resistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. Learning methods were trained and tested on a public data set of genotype-phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations in ≥2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low/intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype-phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, cross-resistance patterns differ for different mutation subsets and that cross-resistance has been underestimated.
View details for DOI 10.1073/pnas.0607274103
View details for Web of Science ID 000242249400053
View details for PubMedID 17065321
View details for PubMedCentralID PMC1622926
-
A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites
NUCLEIC ACIDS RESEARCH
2006; 34 (20): 5730-5739
Abstract
Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs.
View details for DOI 10.1093/nar/gkl585
View details for Web of Science ID 000242474800009
View details for PubMedID 17041233
View details for PubMedCentralID PMC1635261
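The PSSM-based search that this abstract treats as the baseline can be illustrated with a minimal Python sketch. It is not the MotifScan algorithm; the example binding sites, the pseudocount of 1, the uniform background frequency of 0.25, and all function names are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from equal-length sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed column frequency against the background.
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def score_kmer(pssm, kmer):
    """Sum the per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pssm, kmer))

def scan(pssm, sequence):
    """Score every window of the sequence; returns (offset, score) pairs."""
    k = len(pssm)
    return [(i, score_kmer(pssm, sequence[i:i + k]))
            for i in range(len(sequence) - k + 1)]

def best_hit(pssm, sequence):
    """Return the highest-scoring (offset, score) pair."""
    return max(scan(pssm, sequence), key=lambda hit: hit[1])

# Hypothetical aligned binding sites, used only to exercise the functions.
example_sites = ["TATAAT", "TATAAA", "TATGAT", "TACAAT"]
example_pssm = build_pssm(example_sites)
```

Because each column is scored independently, a matrix like this cannot express dependencies between positions, which is the limitation the abstract cites as motivation for a graph-based representation.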
-
MotifCut: regulatory motifs finding with maximum density subgraphs
14th Conference on Intelligent Systems for Molecular Biology
OXFORD UNIV PRESS. 2006: E150–E157
Abstract
DNA motif finding is one of the core problems in computational biology, for which several probabilistic and discrete approaches have been developed. Most existing methods formulate motif finding as an intractable optimization problem and rely either on expectation maximization (EM) or on local heuristic searches. Another challenge is the choice of motif model: simpler models such as the position-specific scoring matrix (PSSM) impose biologically unrealistic assumptions such as independence of the motif positions, while more involved models are harder to parametrize and learn. We present MotifCut, a graph-theoretic approach to motif finding leading to a convex optimization problem with a polynomial time solution. We build a graph where the vertices represent all k-mers in the input sequences, and edges represent pairwise k-mer similarity. In this graph, we search for a motif as the maximum density subgraph, which is a set of k-mers that exhibit a large number of pairwise similarities. Our formulation does not make strong assumptions regarding the structure of the motif and in practice both motifs that fit well the PSSM model, and those that exhibit strong dependencies between position pairs are found as dense subgraphs. We benchmark MotifCut on both synthetic and real yeast motifs, and find that it compares favorably to existing popular methods. The ability of MotifCut to detect motifs appears to scale well with increasing input size. Moreover, the motifs we discover are different from those discovered by the other methods. MotifCut server and other materials can be found at motifcut.stanford.edu.
View details for DOI 10.1093/bioinformatics/btl243
View details for Web of Science ID 000250005000019
View details for PubMedID 16873465
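The graph formulation described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MotifCut implementation: edges here are unweighted and join k-mers from different sequences that differ in at most `max_mismatches` positions, and the densest subgraph is found with a simple greedy peeling heuristic (repeatedly removing a minimum-degree vertex), which gives an approximation rather than the exact polynomial-time solution the paper refers to. All function names are hypothetical.

```python
from itertools import combinations

def kmers(sequences, k):
    """Every k-mer occurrence as a (sequence index, offset, k-mer) tuple."""
    return [(si, i, seq[i:i + k])
            for si, seq in enumerate(sequences)
            for i in range(len(seq) - k + 1)]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_graph(nodes, max_mismatches):
    """Adjacency sets; the similarity rule is an assumption of this sketch."""
    adj = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        # Only connect occurrences from different sequences.
        if u[0] != v[0] and hamming(u[2], v[2]) <= max_mismatches:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def densest_subgraph_greedy(adj):
    """Greedy peeling: drop a minimum-degree vertex at each step and keep
    the intermediate vertex set with the highest edges/vertices ratio."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    edges = sum(len(vs) for vs in adj.values()) // 2
    best_density, best_nodes = 0.0, set(adj)
    while adj:
        density = edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        u = min(adj, key=lambda n: len(adj[n]))
        for v in adj[u]:
            adj[v].discard(u)
        edges -= len(adj[u])
        del adj[u]
    return best_nodes, best_density

def find_motif(sequences, k, max_mismatches=1):
    """Return the k-mer occurrences in the densest subgraph found."""
    graph = build_graph(kmers(sequences, k), max_mismatches)
    return densest_subgraph_greedy(graph)
```

Calling `find_motif(sequences, k)` on a set of co-regulated sequences returns the k-mer occurrences that are most densely connected by pairwise similarity, together with the density of that subgraph. The pairwise graph construction is quadratic in the number of k-mers, so this sketch is suitable only for small inputs.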
-
Development and validation of a consistency based multiple structure alignment algorithm
BIOINFORMATICS
2006; 22 (9): 1080-1087
Abstract
We introduce an algorithm that uses the information gained from simultaneous consideration of an entire group of related proteins to create multiple structure alignments (MSTAs). Consistency-based alignment (CBA) first harnesses the information contained within regions that are consistently aligned among a set of pairwise superpositions in order to realign pairs of proteins through both global and local refinement methods. It then constructs a multiple alignment that is maximally consistent with the improved pairwise alignments. We validate CBA's alignments by assessing their accuracy in regions where at least two of the aligned structures contain the same conserved sequence motif. CBA correctly aligns well over 90% of motif residues in superpositions of proteins belonging to the same family or superfamily, and it outperforms a number of previously reported MSTA algorithms.
View details for DOI 10.1093/bioinformatics/btl046
View details for Web of Science ID 000236997600007
View details for PubMedID 16473868
-
Nucleotide channel of RNA-dependent RNA polymerase used for intermolecular uridylylation of protein primer
JOURNAL OF MOLECULAR BIOLOGY
2006; 357 (2): 665-675
Abstract
Poliovirus VPg is a 22 amino acid residue peptide that serves as the protein primer for replication of the viral RNA genome. VPg is known to bind directly to the viral RNA-dependent RNA polymerase, 3D, for covalent uridylylation, yielding mono and di-uridylylated products, VPg-pU and VPg-pUpU, which are subsequently elongated. To model the docking of the VPg substrate to a putative VPg-binding site on the 3D polymerase molecule, we performed a variety of structure-based computations followed by experimental verification. First, potential VPg folded structures were identified, yielding a suite of predicted beta-hairpin structures. These putative VPg structures were then docked to the region of the polymerase implicated by genetic experiments to bind VPg, using grid-based and fragment-based methods. Residues in VPg predicted to affect binding were identified through molecular dynamics simulations, and their effects on the 3D-VPg interaction were tested computationally and biochemically. Experiments with mutant VPg and mutant polymerase molecules confirmed the predicted binding site for VPg on the back side of the polymerase molecule during the uridylylation reaction, opposite to that predicted to bind elongating RNA primers.
View details for DOI 10.1016/j.jmb.2005.12.044
View details for Web of Science ID 000236120200027
View details for PubMedID 16427083
-
A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2006; 103 (5): 1412-1417
Abstract
A striking feature of the human genome is the dearth of CpG dinucleotides (CpGs) interrupted occasionally by CpG islands (CGIs), regions with relatively high content of the dinucleotide. CGIs are generally associated with promoters; genes, whose promoters are especially rich in CpG sequences, tend to be expressed in most tissues. However, all working definitions of what constitutes a CGI rely on ad hoc thresholds. Here we adopt a direct and comprehensive survey to identify the locations of all CpGs in the human genome and find that promoters segregate naturally into two classes by CpG content. Seventy-two percent of promoters belong to the class with high CpG content (HCG), and 28% are in the class whose CpG content is characteristic of the overall genome (low CpG content). The enrichment of CpGs in the HCG class is symmetric and peaks around the core promoter. The broad-based expression of the HCG promoters is not a consequence of a correlation with CpG content because within the HCG class the breadth of expression is independent of the CpG content. The overall depletion of CpGs throughout the genome is thought to be a consequence of the methylation of some germ-line CpGs and their susceptibility to mutation. A comparison of the frequencies of inferred deamination mutations at CpG and GpC dinucleotides in the two classes of promoters using SNPs in human-chimpanzee sequence alignments shows that CpGs mutate at a lower frequency in the HCG promoters, suggesting that CpGs in the HCG class are hypomethylated in the germ line.
View details for DOI 10.1073/pnas.0510310103
View details for Web of Science ID 000235094300047
View details for PubMedID 16432200
View details for PubMedCentralID PMC1345710
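One common way to quantify the CpG content discussed in this abstract is the observed/expected CpG ratio. The Python sketch below assumes that conventional measure; the abstract itself does not state which normalization or which promoter windows the study used, so the formula and function names here are illustrative assumptions.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    This conventional normalization is an assumption of the sketch,
    not a formula quoted from the paper.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return seq.count("CG") * len(seq) / (c * g)
```

Computing this ratio for a window around each transcription start site and plotting the distribution would show whether promoters separate into two groups, as the abstract reports.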
-
Predicting experimental quantities in protein folding kinetics using Stochastic Roadmap Simulation
10th Annual International Conference on Research in Computational Molecular Biology
SPRINGER-VERLAG BERLIN. 2006: 410–424
View details for Web of Science ID 000236991800034
-
eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity
NUCLEIC ACIDS RESEARCH
2005; 33: D178-D182
Abstract
Classifying proteins into families and superfamilies allows identification of functionally important conserved domains. The motifs and scoring matrices derived from such conserved regions provide computational tools that recognize similar patterns in novel sequences, and thus enable the prediction of protein function for genomes. The eBLOCKs database enumerates a cascade of protein blocks with varied conservation levels for each functional domain. A biologically important region is most stringently conserved among a smaller family of highly similar proteins. The same region is often found in a larger group of more remotely related proteins with a reduced stringency. Through enumeration, highly specific signatures can be generated from blocks with more columns and fewer family members, while highly sensitive signatures can be derived from blocks with fewer columns and more members as in a superfamily. By applying PSI-BLAST and a modified K-means clustering algorithm, eBLOCKs automatically groups protein sequences according to different levels of similarity. Multiple sequence alignments are made and trimmed into a series of ungapped blocks. Motifs and position-specific scoring matrices were derived from eBLOCKs and made available for sequence search and annotation. The eBLOCKs database provides a tool for high-throughput genome annotation with maximal specificity and sensitivity. The eBLOCKs database is freely available on the World Wide Web at http://motif.stanford.edu/eblocks/ to all users for online usage. Academic and not-for-profit institutions wishing copies of the program may contact Douglas L. Brutlag (brutlag@stanford.edu). Commercial firms wishing copies of the program for internal installation may contact Jacqueline Tay at the Stanford Office of Technology Licensing (jacqueline.tay@stanford.edu; http://otl.stanford.edu/).
View details for DOI 10.1093/nar/gki060
View details for Web of Science ID 000226524300034
View details for PubMedID 15608172
-
Homology modeling of a human glycine alpha 1 receptor reveals a plausible anesthetic binding site
JOURNAL OF CHEMICAL INFORMATION AND MODELING
2005; 45 (1): 128-135
Abstract
The superfamily of ligand-gated ion channels (LGICs) has been implicated in anesthetic and alcohol responses. Mutations within glycine and GABA receptors have demonstrated that possible sites of anesthetic action exist within the transmembrane subunits of these receptors. The exact molecular arrangement of this transmembrane region remains at intermediate resolution with current experimental techniques. Homology modeling methods were therefore combined with experimental data to produce a more exact model of this region. A consensus from multiple bioinformatics techniques predicted the topology within the transmembrane domain of a glycine alpha one receptor (GlyRa1) to be alpha helical. This fold information was combined with sequence information using the SeqFold algorithm to search for modeling templates. Independently, the FoldMiner algorithm was used to search for templates that had structural folds similar to published coordinates of the homologous nAChR (1OED). Both SeqFold and Foldminer identified the same modeling template. The GlyRa1 sequence was aligned with this template using multiple scoring criteria. Refinement of the alignment closed gaps to produce agreement with labeling studies carried out on the homologous receptors of the superfamily. Structural assignment and refinement was achieved using Modeler. The final structure demonstrated a cavity within the core of a four-helix bundle. Residues known to be involved in modulating anesthetic potency converge on and line this cavity. This suggests that the binding sites for volatile anesthetics in the LGICs are the cavities formed within the core of transmembrane four-helix bundles.
View details for DOI 10.1021/ci0497399
View details for Web of Science ID 000227982800016
View details for PubMedID 15667138
-
Computational functional genomics
IEEE SIGNAL PROCESSING MAGAZINE
2004; 21 (6): 62-69
View details for Web of Science ID 000225031500008
-
A suite of web-based programs to search for transcriptional regulatory motifs
NUCLEIC ACIDS RESEARCH
2004; 32: W204-W207
Abstract
The identification of regulatory motifs is important for the study of gene expression. Here we present a suite of programs that we have developed to search for regulatory sequence motifs: (i) BioProspector, a Gibbs-sampling-based program for predicting regulatory motifs from co-regulated genes in prokaryotes or lower eukaryotes; (ii) CompareProspector, an extension to BioProspector which incorporates comparative genomics features to be used for higher eukaryotes; (iii) MDscan, a program for finding protein-DNA interaction sites from ChIP-on-chip targets. All three programs examine a group of sequences that may share common regulatory motifs and output a list of putative motifs as position-specific probability matrices, the individual sites used to construct the motifs and the location of each site on the input sequences. The web servers and executables can be accessed at http://seqmotifs.stanford.edu.
View details for DOI 10.1093/nar/gkh461
View details for Web of Science ID 000222273100043
View details for PubMedID 15215381
View details for PubMedCentralID PMC441599
-
FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web
NUCLEIC ACIDS RESEARCH
2004; 32: W536-W541
Abstract
The FoldMiner web server (http://foldminer.stanford.edu/) provides remote access to methods for protein structure alignment and unsupervised motif discovery. FoldMiner is unique among such algorithms in that it improves both the motif definition and the sensitivity of a structural similarity search by combining the search and motif discovery methods and using information from each process to enhance the other. In a typical run, a query structure is aligned to all structures in one of several databases of single domain targets in order to identify its structural neighbors and to discover a motif that is the basis for the similarity among the query and statistically significant targets. This process is fully automated, but options for manual refinement of the results are available as well. The server uses the Chime plugin and customized controls to allow for visualization of the motif and of structural superpositions. In addition, we provide an interface to the LOCK 2 algorithm for rapid alignments of a query structure to smaller numbers of user-specified targets.
View details for DOI 10.1093/nar/gkh389
View details for Web of Science ID 000222273100106
View details for PubMedID 15215444
View details for PubMedCentralID PMC441527
-
FoldMiner: Structural motif discovery using an improved superposition algorithm
PROTEIN SCIENCE
2004; 13 (1): 278-294
Abstract
We report an unsupervised structural motif discovery algorithm, FoldMiner, which is able to detect global and local motifs in a database of proteins without the need for multiple structure or sequence alignments and without relying on prior classification of proteins into families. Motifs, which are discovered from pairwise superpositions of a query structure to a database of targets, are described probabilistically in terms of the conservation of each secondary structure element's position and are used to improve detection of distant structural relationships. During each iteration of the algorithm, the motif is defined from the current set of homologs and is used both to recruit additional homologous structures and to discard false positives. FoldMiner thus achieves high specificity and sensitivity by distinguishing between homologous and nonhomologous structures by the regions of the query to which they align. We find that when two proteins of the same fold are aligned, highly conserved secondary structure elements in one protein tend to align to highly conserved elements in the second protein, suggesting that FoldMiner consistently identifies the same motif in members of a fold. Structural alignments are performed by an improved superposition algorithm, LOCK 2, which detects distant structural relationships by placing increased emphasis on the alignment of secondary structure elements. LOCK 2 obeys several properties essential in automated analysis of protein structure: It is symmetric, its alignments of secondary structure elements are transitive, its alignments of residues display a high degree of transitivity, and its scoring system is empirically found to behave as a metric.
View details for DOI 10.1110/ps.03239404
View details for Web of Science ID 000187587700027
View details for PubMedID 14691242
View details for PubMedCentralID PMC2286532
-
WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures
NUCLEIC ACIDS RESEARCH
2003; 31 (13): 3324-3327
Abstract
WebFEATURE (http://feature.stanford.edu/webfeature/) is a web-accessible structural analysis tool that allows users to scan query structures for functional sites in both proteins and nucleic acids. WebFEATURE is the public interface to the scanning algorithm of the FEATURE package, a supervised learning algorithm for creating and identifying 3D, physicochemical motifs in molecular structures. Given an input structure or Protein Data Bank identifier (PDB ID), and a statistical model of a functional site, WebFEATURE will return rank-scored 'hits' in 3D space that identify regions in the structure where similar distributions of physicochemical properties occur relative to the site model. Users can visualize and interactively manipulate scored hits and the query structure in web browsers that support the Chime plug-in. Alternatively, results can be downloaded and visualized through other freely available molecular modeling tools, like RasMol, PyMOL and Chimera. A major application of WebFEATURE is in rapid annotation of function to structures in the context of structural genomics.
View details for DOI 10.1093/nar/gkg553
View details for Web of Science ID 000183832900010
View details for PubMedID 12824318
View details for PubMedCentralID PMC168960
-
3MATRIX and 3MOTIF: a protein structure visualization system for conserved sequence motifs
NUCLEIC ACIDS RESEARCH
2003; 31 (13): 3328-3332
Abstract
Computational methods such as sequence alignment and motif construction are useful in grouping related proteins into families, as well as helping to annotate new proteins of unknown function. These methods identify conserved amino acids in protein sequences, but cannot determine the specific functional or structural roles of conserved amino acids without additional study. In this work, we present 3MATRIX (http://3matrix.stanford.edu) and 3MOTIF (http://3motif.stanford.edu), a web-based sequence motif visualization system that displays sequence motif information in its appropriate three-dimensional (3D) context. This system is flexible in that users can enter sequences, keywords, structures or sequence motifs to generate visualizations. In 3MOTIF, users can search using discrete sequence motifs such as PROSITE patterns, eMOTIFs, or any other regular expression-like motif. Similarly, 3MATRIX accepts an eMATRIX position-specific scoring matrix, or will convert a multiple sequence alignment block into an eMATRIX for visualization. Each query motif is used to search the protein structure database for matches, in which the motif is then visually highlighted in three dimensions. Important properties of motifs such as sequence conservation and solvent accessible surface area are also displayed in the visualizations, using carefully chosen color shading schemes.
View details for DOI 10.1093/nar/gkg564
View details for Web of Science ID 000183832900011
View details for PubMedID 12824319
View details for PubMedCentralID PMC168971
-
Remote homology detection: a motif based approach
BIOINFORMATICS
2003; 19: i26-i33
Abstract
Remote homology detection is the problem of detecting homology in cases of low sequence similarity. It is a hard computational problem with no approach that works well in all cases. We present a method for detecting remote homology that is based on the presence of discrete sequence motifs. The motif content of a pair of sequences is used to define a similarity that is used as a kernel for a Support Vector Machine (SVM) classifier. We test the method on two remote homology detection tasks: prediction of a previously unseen SCOP family and prediction of an enzyme class given other enzymes that have a similar function on other substrates. We find that it performs significantly better than an SVM method that uses BLAST or Smith-Waterman similarity scores as features.
View details for DOI 10.1093/bioinformatics/btg1002
View details for Web of Science ID 000207434200004
View details for PubMedID 12855434
-
3MOTIF: visualizing conserved protein sequence motifs in the protein structure database
BIOINFORMATICS
2003; 19 (4): 541-542
Abstract
3MOTIF is a web application that visually maps conserved sequence motifs onto three-dimensional protein structures in the Protein Data Bank (PDB; Berman et al., Nucleic Acids Res., 28, 235-242, 2000). Important properties of motifs such as conservation strength and solvent accessible surface area at each position are visually represented on the structure using a variety of color shading schemes. Users can manipulate the displayed motifs using the freely available Chime plugin. http://motif.stanford.edu/3motif/
View details for DOI 10.1093/bioinformatics/btf862
View details for Web of Science ID 000181410700014
View details for PubMedID 12611811
-
Automated construction of structural motifs for predicting functional sites on protein structures.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2003: 204-215
Abstract
Structural genomics initiatives are beginning to rapidly generate vast numbers of protein structures. For many of the structures, functions are not yet determined and high-throughput methods for determining function are necessary. Although there has been extensive work in function prediction at the sequence level, predicting function at the structure level may provide better sensitivity and predictive value. We describe a method to predict functional sites by automatically creating three dimensional structural motifs from amino acid sequence motifs. These structural motifs perform comparably well with manually generated structural motifs and perform better than sequence motifs. Automatically generated structural motifs can be used for structural-genomic scale function prediction on protein structures.
View details for PubMedID 12603029
-
Stochastic conformational roadmaps for computing ensemble properties of molecular motion
5th International Workshop on Algorithmic Foundations of Robotics
SPRINGER-VERLAG BERLIN. 2003: 131–147
View details for Web of Science ID 000189129600009
-
Automatic construction of 3D structural motifs for protein function prediction
2nd International Computational Systems Bioinformatics Conference
IEEE COMPUTER SOC. 2003: 613–614
View details for Web of Science ID 000188997700136
-
Stochastic roadmap simulation: An efficient representation and algorithm for analyzing molecular motion
6th Annual International Conference on Computational Biology (RECOMB 2002)
MARY ANN LIEBERT INC. 2003: 257–81
Abstract
Classic molecular motion simulation techniques, such as Monte Carlo (MC) simulation, generate motion pathways one at a time and spend most of their time in the local minima of the energy landscape defined over a molecular conformation space. Their high computational cost prevents them from being used to compute ensemble properties (properties requiring the analysis of many pathways). This paper introduces stochastic roadmap simulation (SRS) as a new computational approach for exploring the kinetics of molecular motion by simultaneously examining multiple pathways. These pathways are compactly encoded in a graph, which is constructed by sampling a molecular conformation space at random. This computation, which does not trace any particular pathway explicitly, circumvents the local-minima problem. Each edge in the graph represents a potential transition of the molecule and is associated with a probability indicating the likelihood of this transition. By viewing the graph as a Markov chain, ensemble properties can be efficiently computed over the entire molecular energy landscape. Furthermore, SRS converges to the same distribution as MC simulation. SRS is applied to two biological problems: computing the probability of folding, an important order parameter that measures the "kinetic distance" of a protein's conformation from its native state; and estimating the expected time to escape from a ligand-protein binding site. Comparison with MC simulations on protein folding shows that SRS produces arguably more accurate results, while reducing computation time by several orders of magnitude. Computational studies on ligand-protein binding also demonstrate SRS as a promising approach to study ligand-protein interactions.
View details for PubMedID 12935328
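The probability of folding described in the abstract above can be illustrated on a toy roadmap. The sketch below treats the roadmap as a Markov chain and applies first-step analysis: the folding probability of a node is the transition-weighted average of its neighbors' values, fixed at 1 for folded nodes and 0 for unfolded nodes. The function name, the dictionary layout, and the four-node example are assumptions made for illustration; a real roadmap would derive its transition probabilities from conformational energies.

```python
def folding_probability(transitions, folded, unfolded, tol=1e-12, max_iter=100000):
    """Iteratively solve p(i) = sum_j P(i, j) * p(j) over the roadmap.

    transitions: {node: {neighbor: probability}}, each row summing to 1.
    folded / unfolded: sets of nodes whose values are fixed at 1 and 0.
    """
    p = {node: 1.0 if node in folded else 0.0 for node in transitions}
    for _ in range(max_iter):
        change = 0.0
        for node, edges in transitions.items():
            if node in folded or node in unfolded:
                continue  # boundary values stay fixed
            new = sum(prob * p[nxt] for nxt, prob in edges.items())
            change = max(change, abs(new - p[node]))
            p[node] = new
        if change < tol:
            break
    return p

if __name__ == "__main__":
    # Toy chain: node 0 is unfolded, node 3 is folded.
    roadmap = {
        0: {0: 1.0},
        1: {0: 0.5, 2: 0.5},
        2: {1: 0.5, 3: 0.5},
        3: {3: 1.0},
    }
    print(folding_probability(roadmap, folded={3}, unfolded={0}))
```

Every node's value is obtained in one pass over the graph rather than by simulating individual pathways, which is the sense in which all paths are examined simultaneously.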
-
Stochastic roadmap simulation for the study of ligand-protein interactions
European Conference on Computational Biology (ECCB 2002)
OXFORD UNIV PRESS. 2002: S18–S26
Abstract
Understanding the dynamics of ligand-protein interactions is indispensable in the design of novel therapeutic agents. In this paper, we establish the use of Stochastic Roadmap Simulation (SRS) for the study of ligand-protein interactions through two studies. In our first study, we measure the effects of mutations on the catalytic site of a protein, a process called computational mutagenesis. In our second study, we focus on distinguishing the catalytic site from other putative binding sites. SRS compactly represents many Monte Carlo (MC) simulation paths in a compact graph structure, or roadmap. Furthermore, SRS allows us to analyze all the paths in this roadmap simultaneously. In our application of SRS to the domain of ligand-protein interactions, we consider a new parameter called escape time, the expected number of MC simulation steps required for the ligand to escape from the 'funnel of attraction' of the binding site, as a metric for analyzing such interactions. Although computing escape times would probably be infeasible with MC simulation, these computations can be performed very efficiently with SRS. Our results for six mutant complexes for the first study and seven ligand-protein complexes for the second study, are very promising: In particular, the first results agree well with the biological interpretation of the mutations, while the second results show that escape time is a good metric to distinguish the catalytic site for five out of seven complexes.
View details for Web of Science ID 000178836800004
View details for PubMedID 12385979
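Escape time, as defined in the abstract above, is an expected number of steps, so on a roadmap it satisfies a hitting-time recurrence: t(i) = 1 + sum_j P(i, j) * t(j) for nodes inside the funnel and t = 0 outside it. The following sketch solves that recurrence by simple iteration on a hypothetical three-node roadmap; the node names and probabilities are invented for illustration.

```python
def escape_times(transitions, funnel, tol=1e-12, max_iter=100000):
    """Expected number of steps to leave the funnel from each funnel node.

    transitions: {node: {neighbor: probability}}, each row summing to 1.
    funnel: list of nodes inside the funnel of attraction.
    """
    t = {node: 0.0 for node in transitions}  # nodes outside the funnel stay at 0
    for _ in range(max_iter):
        change = 0.0
        for node in funnel:
            new = 1.0 + sum(p * t[nxt] for nxt, p in transitions[node].items())
            change = max(change, abs(new - t[node]))
            t[node] = new
        if change < tol:
            break
    return t

if __name__ == "__main__":
    roadmap = {
        "bound": {"bound": 0.5, "rim": 0.5},
        "rim": {"bound": 0.25, "rim": 0.25, "free": 0.5},
        "free": {"free": 1.0},
    }
    print(escape_times(roadmap, funnel=["bound", "rim"]))
```

For this toy roadmap the recurrence can be solved by hand (3 steps from the rim, 5 from the bound node), which gives a quick check on the iteration.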
-
Using robotics to fold proteins and dock ligands
European Conference on Computational Biology (ECCB 2002)
OXFORD UNIV PRESS. 2002: S74–S74
Abstract
The problems of protein folding and ligand docking have been explored largely using molecular dynamics or Monte Carlo methods. These methods are very compute intensive because they often explore a much wider range of energies, conformations and time than necessary. In addition, Monte Carlo methods often get trapped in local minima. We initially showed that robotic motion planning permitted one to determine the energy of binding and dissociation of ligands from protein binding sites (Singh et al., 1999). The robotic motion planning method maps complicated three-dimensional conformational states into a much simpler, but higher dimensional space in which conformational rearrangements can be represented as linear paths. The dimensionality of the conformation space is of the same order as the number of degrees of conformational freedom in three-dimensional space. We were able to determine the relative energy of association and dissociation of a ligand to a protein by calculating the energetics of interaction for a few thousand conformational states in the vicinity of the protein and choosing the best path from the roadmap. More recently, we have applied roadmap planning to the problem of protein folding (Apaydin et al., 2002a). We represented multiple conformations of a protein as nodes in a compact graph with the edges representing the probability of moving between neighboring states. Instead of using Monte Carlo simulation to simulate thousands of possible paths through various conformational states, we were able to use Markov methods to calculate the steady state occupancy of each conformation, needing to calculate the energy of each conformation only once. We referred to this Markov method of representing multiple conformations and transitions as stochastic roadmap simulation or SRS. 
We demonstrated that the distribution of conformational states calculated with exhaustive Monte Carlo simulations asymptotically approached the Markov steady state if the same Boltzmann energy distribution was used in both methods. SRS permits one to calculate contributions from all possible paths simultaneously with far fewer energy calculations than Monte Carlo or molecular dynamics methods. The SRS method also permits one to represent multiple unfolded starting states and multiple, near-native, folded states and all possible paths between them simultaneously. The SRS method is also independent of the function used to calculate the energy of the various conformational states. In a paper to be presented at this conference (Apaydin et al., 2002b) we have also applied SRS to ligand docking in which we calculate the dynamics of ligand-protein association and dissociation in the region of various binding sites on a number of proteins. SRS permits us to determine the relative times of association to and dissociation from various catalytic and non-catalytic binding sites on protein surfaces. Instead of just following the best path in a roadmap, we can calculate the contribution of all the possible binding or dissociation paths and their relative probabilities and energies simultaneously.
View details for PubMedID 12385986
-
An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments
NATURE BIOTECHNOLOGY
2002; 20 (8): 835-839
Abstract
Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-array) has become a popular procedure for studying genome-wide protein-DNA interactions and transcription regulation. However, it can only map the probable protein-DNA interaction loci within 1-2 kilobases resolution. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP-array-selected sequences and searches for DNA sequence motifs representing the protein-DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP-array ranking information to accelerate searches and enhance their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP-array experiments in yeast (STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms. MDscan can be used to find DNA motifs not only in ChIP-array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
View details for DOI 10.1038/nbt717
View details for Web of Science ID 000177182500037
View details for PubMedID 12101404
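The abstract above describes combining word enumeration with weight-matrix updating. The sketch below shows only one simplified ingredient of that idea, under stated assumptions: starting from a seed word, it collects every window in the top-ranked sequences that agrees with the seed in at least `min_matches` positions and tallies those windows into a position-specific count matrix. The function names and the matching threshold are hypothetical; MDscan's actual scoring and refinement steps are not reproduced here.

```python
def matching_positions(a, b):
    """Number of positions at which two equal-length words agree."""
    return sum(x == y for x, y in zip(a, b))

def seed_count_matrix(seed, sequences, min_matches):
    """Tally all windows similar to the seed into a per-position count matrix.

    Assumes sequences contain only the characters A, C, G and T.
    """
    w = len(seed)
    counts = [{base: 0 for base in "ACGT"} for _ in range(w)]
    for seq in sequences:
        for i in range(len(seq) - w + 1):
            word = seq[i:i + w]
            if matching_positions(seed, word) >= min_matches:
                for pos, base in enumerate(word):
                    counts[pos][base] += 1
    return counts

if __name__ == "__main__":
    top_ranked = ["TTACGTTT", "GGACGAGG"]  # toy ChIP-enriched sequences
    for column in seed_count_matrix("ACGT", top_ranked, min_matches=3):
        print(column)
```

A count matrix built this way could then be converted to frequencies and used to rescan the remaining sequences, which is the general shape of a matrix-updating step.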
- Capturing Molecular Energy Landscapes with Probabilistic Conformational Roadmaps. International Conference on Robotics and Automation. 2001: 932-939
-
The EMOTIF database
NUCLEIC ACIDS RESEARCH
2001; 29 (1): 202-204
Abstract
The EMOTIF database is a collection of more than 170 000 highly specific and sensitive protein sequence motifs representing conserved biochemical properties and biological functions. These protein motifs are derived from 7697 sequence alignments in the BLOCKS+ database (released on June 23, 2000) and all 8244 protein sequence alignments in the PRINTS database (version 27.0) using the emotif-maker algorithm developed by Nevill-Manning et al. (Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865-5871; Nevill-Manning,C.G., Sethi,K.S., Wu,T. D. and Brutlag,D.L. (1997) ISMB-97, 5, 202-209). Since the amino acids and the groups of amino acids in these sequence motifs represent critical positions conserved in evolution, search algorithms employing the EMOTIF patterns can identify and classify more widely divergent sequences than methods based on global sequence similarity. The emotif protein pattern database is available at http://motif.stanford.edu/emotif/.
View details for Web of Science ID 000166360300055
View details for PubMedID 11125091
-
BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2001: 127-138
Abstract
The development of genome sequencing and DNA microarray analysis of gene expression gives rise to the demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. The significance of each motif found is judged based on a motif score distribution estimated by a Monte Carlo method. In addition, BioProspector modifies the motif model used in the earlier Gibbs samplers to allow for the modeling of gapped motifs and motifs with palindromic patterns. All these modifications greatly improve the performance of the program. Although testing and development are still in progress, the program has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP. We are currently working on combining BioProspector with a clustering program to explore gene expression networks and regulatory mechanisms.
View details for PubMedID 11262934
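The abstract above notes that BioProspector's Markov background parameters can be estimated from a sequence file. As a hedged illustration of what such an estimate involves (this is not BioProspector's C code), the sketch below counts how often each base follows each length-k context and normalizes the counts into conditional probabilities. The function name and the `pseudocount` parameter are assumptions.

```python
from collections import defaultdict

def train_background(sequences, order, pseudocount=1.0):
    """Estimate a k-th order Markov background model from DNA sequences.

    Returns {context: {base: P(base | context)}}, where each context is the
    preceding `order` bases (the empty string for a zero-order model).
    Assumes sequences contain only A, C, G and T.
    """
    counts = defaultdict(lambda: {b: pseudocount for b in "ACGT"})
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, base_counts in counts.items():
        total = sum(base_counts.values())
        model[context] = {b: n / total for b, n in base_counts.items()}
    return model

if __name__ == "__main__":
    upstream = ["ACGTACGTTTGA", "TTGACGTAACGT"]  # toy upstream regions
    model = train_background(upstream, order=1)
    print(model["A"])
```

Setting `order=0` reduces this to plain base frequencies, and higher orders simply use longer contexts, at the cost of needing more data per context.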
-
Capturing molecular energy landscapes with probabilistic conformational roadmaps
IEEE International Conference on Robotics and Automation
IEEE. 2001: 932–939
View details for Web of Science ID 000172615800151
-
Fast probabilistic analysis of sequence function using scoring matrices
BIOINFORMATICS
2000; 16 (3): 233-244
Abstract
We present techniques for increasing the speed of sequence analysis using scoring matrices. Our techniques are based on calculating, for a given scoring matrix, the quantile function, which assigns a probability, or p, value to each segmental score. Our techniques also permit the user to specify a p threshold to indicate the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow scoring matrices to be used more widely in large-scale sequencing and annotation projects. We develop three techniques for increasing the speed of sequence analysis: probability filtering, lookahead scoring, and permuted lookahead scoring. In probability filtering, we compute the score threshold that corresponds to the user-specified p threshold. We use the score threshold to limit the number of segments that are retained in the search process. In lookahead scoring, we test intermediate scores to determine whether they will possibly exceed the score threshold. In permuted lookahead scoring, we score each segment in a particular order designed to maximize the likelihood of early termination. Our two lookahead scoring techniques reduce substantially the number of residues that must be examined. The fraction of residues examined ranges from 62 to 6%, depending on the p threshold chosen by the user. These techniques permit sequence analysis with scoring matrices at speeds that are several times faster than existing programs. On a database of 12 177 alignment blocks, our techniques permit sequence analysis at a speed of 225 residues/s for a p threshold of 10(-6), and 541 residues/s for a p threshold of 10(-20). In order to compute the quantile function, we may use either an independence assumption or a Markov assumption.
We measure the effect of first- and second-order Markov assumptions and find that they tend to raise the p value of segments, when compared with the independence assumption, by average ratios of 1.30 and 1.69, respectively. We also compare our technique with the empirical 99.5th percentile scores compiled in the BLOCKSPLUS database, and find that they correspond on average to a p value of 1.5 x 10(-5). The techniques described above are implemented in a software package called EMATRIX. This package is available from the authors for free academic use or for licensed commercial use. The EMATRIX set of programs is also available on the Internet at http://motif.stanford.edu/ematrix.
View details for Web of Science ID 000087630000006
View details for PubMedID 10869016
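Lookahead scoring, as described in the abstract above, tests intermediate scores to see whether the threshold is still reachable. A minimal sketch of that idea follows, assuming a scoring matrix stored as a list of per-position dictionaries: the best achievable score for each remaining suffix is precomputed, and scoring of a segment stops as soon as the partial score plus that bound falls below the threshold. The names `max_remaining` and `lookahead_score` and the toy matrix are illustrative and are not taken from EMATRIX.

```python
def max_remaining(matrix):
    """suffix[i] is the best achievable score over positions i..end."""
    suffix = [0.0] * (len(matrix) + 1)
    for i in range(len(matrix) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + max(matrix[i].values())
    return suffix

def lookahead_score(matrix, segment, threshold, suffix):
    """Score a segment, stopping early once the threshold is unreachable.

    Returns (score, residues_examined); score is None if the segment was pruned.
    """
    score = 0.0
    for i, residue in enumerate(segment):
        score += matrix[i][residue]
        if score + suffix[i + 1] < threshold:
            return None, i + 1
    return score, len(segment)

if __name__ == "__main__":
    # Toy two-letter scoring matrix with three positions.
    matrix = [{"A": 2, "C": -1}, {"A": -1, "C": 3}, {"A": 1, "C": 0}]
    suffix = max_remaining(matrix)
    print(lookahead_score(matrix, "ACA", 5, suffix))
    print(lookahead_score(matrix, "CAA", 5, suffix))
```

The permuted variant mentioned in the abstract would visit positions in a different order to trigger pruning sooner; this sketch keeps the natural left-to-right order.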
-
Bayesian segmentation of protein secondary structure
JOURNAL OF COMPUTATIONAL BIOLOGY
2000; 7 (1-2): 233-248
Abstract
We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for alpha-helices, beta-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
View details for Web of Science ID 000087833300014
View details for PubMedID 10890399
-
Minimal-risk scoring matrices for sequence analysis
JOURNAL OF COMPUTATIONAL BIOLOGY
1999; 6 (2): 219-235
Abstract
We introduce a minimal-risk method for estimating the frequencies of amino acids at conserved positions in a protein family. Our method, called minimal-risk estimation, finds the optimal weighting between a set of observed amino acid counts and a set of pseudofrequencies, which represent prior information about the frequencies. We compute the optimal weighting by minimizing the expected distance between the estimated frequencies and the true population frequencies, measured by either a squared-error or a relative-entropy metric. Our method accounts for the source of the pseudofrequencies, which arise either from the background distribution of amino acids or from applying a substitution matrix to the observed data. Our frequency estimates therefore depend on the size and composition of the observed data as well as the source of the pseudofrequencies. We convert our frequency estimates into minimal-risk scoring matrices for sequence analysis. A large-scale cross-validation study, involving 48 variants of seven methods, shows that the best performing method is minimal-risk estimation using the squared-error metric. Our method is implemented in the package EMATRIX, which is available on the Internet at http://motif.stanford.edu/ematrix.
View details for Web of Science ID 000081343800005
View details for PubMedID 10421524
-
A motion planning approach to flexible ligand binding.
Proceedings. International Conference on Intelligent Systems for Molecular Biology
1999: 252-261
Abstract
Most computational models of protein-ligand interactions consider only the energetics of the final bound state of the complex and do not examine the dynamics of the ligand as it enters the binding site. We have developed a novel technique for studying the dynamics of protein-ligand interactions based on motion planning algorithms from the field of robotics. Our algorithm uses electrostatic and van der Waals potentials to compute the most energetically favorable path between any given initial and goal ligand configurations. We use probabilistic motion planning to sample the distribution of possible paths to a given goal configuration and compute an energy-based "difficulty weight" for each path. By statistically averaging this weight over several randomly generated starting configurations, we compute the relative difficulty of entering and leaving a given binding configuration. This approach yields details of the energy contours around the binding site and can be used to characterize and predict good binding sites. Results from tests with three protein-ligand complexes indicate that our algorithm is able to detect energy barriers around the true binding site that distinguish this site from other predicted low-energy binding sites.
View details for PubMedID 10786308
-
Regression analysis of multiple protein structures
JOURNAL OF COMPUTATIONAL BIOLOGY
1998; 5 (3): 585-595
Abstract
A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. Also, this approach can superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique that uses gap penalties that adapt to the given data. Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ at various positions in a structure. In addition, a method is introduced for finding an initial correspondence, by measuring the discrete curvature along each protein backbone. Discrete curvature also characterizes the secondary structure of a protein backbone, distinguishing among helical, strand, and loop regions. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
View details for Web of Science ID 000075921100016
View details for PubMedID 9773352
-
Directions for clinical research and genomic research into the next decade: Implications for informatics
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
1998; 5 (5): 404-411
Abstract
Medical informatics is defined largely by its host disciplines in clinical and biological medicine, and to project the agenda for informatics into the next decade, the health community must envision the broad context of biomedical research. This paper is a sketch of this vision, taking into account pressures from changes in the U.S. health care system, the need for more objective information on which to base health care decisions, and the accelerating progress and clinical impact of genomics research. The lessons of modern genomics research demonstrate the power of computing and communication tools to facilitate rapid progress through the adoption of open community standards for information exchange and collaboration. While aspects of this vision are speculative, it seems clear that the core agenda for informatics must be the development of interoperating systems that can facilitate the secure gathering, interchange, and analysis of high-quality information and can gain leverage from worldwide collaboration in advancing and applying new medical knowledge.
View details for Web of Science ID 000075932200003
View details for PubMedID 9760387
-
Genomics and computational molecular biology
CURRENT OPINION IN MICROBIOLOGY
1998; 1 (3): 340-345
Abstract
There has been a dramatic increase in the number of completely sequenced bacterial genomes during the past two years as a result of the efforts both of public genome agencies and the pharmaceutical industry. The availability of completely sequenced genomes permits more systematic analyses of genes, evolution and genome function than was otherwise possible. Using computational methods - which are used to identify genes and their functions including statistics, sequence similarity, motifs, profiles, protein folds and probabilistic models - it is possible to develop characteristic genome signatures, assign functions to genes, identify pathogenic genes, identify metabolic pathways, develop diagnostic probes and discover potential drug-binding sites. All of these directions are critical to understanding bacterial growth, pathogenicity and host-pathogen interactions.
View details for Web of Science ID 000075765200012
View details for PubMedID 10066490
-
Highly specific protein sequence motifs for genome analysis
Colloquium on Computational Biomolecular Science
NATL ACAD SCIENCES. 1998: 5865–71
Abstract
We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called EMOTIF (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, EMOTIF generates a set of motifs with a wide range of specificities and sensitivities. EMOTIF also can generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs often can represent the entire superfamily with high specificity and sensitivity. We have used EMOTIF to generate sets of motifs from all 7,000 protein alignments in the BLOCKS and PRINTS databases. The resulting database, called IDENTIFY (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs having a probability of matching a false positive that ranges from 10(-10) to 10(-5). Highly specific motifs are well suited for searching entire proteomes, while generating very few false predictions. IDENTIFY assigns biological functions to 25-30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, IDENTIFY assigned functions to 172 proteins of unknown function in the yeast genome.
View details for Web of Science ID 000073852600006
View details for PubMedID 9600885
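One way to see how a discrete motif can have a very small probability of matching by chance is to multiply, over the constrained positions, the summed background frequencies of the residues allowed at each position. The sketch below does this under an independence assumption with a uniform background; both are simplifications chosen for illustration, and `motif_probability` is a hypothetical helper rather than part of EMOTIF.

```python
def motif_probability(motif, background):
    """Chance that a random segment matches the motif, assuming independence.

    motif: list of strings, each listing the residues allowed at a position;
           "." marks an unconstrained position.
    background: {residue: frequency}.
    """
    prob = 1.0
    for allowed in motif:
        if allowed != ".":
            prob *= sum(background[a] for a in allowed)
    return prob

if __name__ == "__main__":
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    uniform = {a: 0.05 for a in amino_acids}  # assumed uniform background
    # Hypothetical motif: C, any residue, one of I/L/V, one of K/R.
    print(motif_probability(["C", ".", "ILV", "KR"], uniform))
```

Each additional constrained position multiplies the probability by a factor below one, so longer or stricter motifs become more specific while typically covering fewer family members.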
-
Modeling and superposition of multiple protein structures using affine transformations: analysis of the globins.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
1998: 509-520
Abstract
A novel approach for analyzing multiple protein structures is presented. A family of related protein structures may be characterized by an affine model, obtained by applying transformation matrices that permit both rotation and shear. The affine model and transformation matrices can be computed efficiently using a single eigen-decomposition. A novel method for finding correspondences is also introduced. This method matches curvatures along the protein backbone. The algorithm is applied to analyze a set of seven globin structures. Our method identifies 100 corresponding landmarks across all seven structures. Results show that most helices in globins can be identified by high curvature, with the exception of the C and D helices. Analysis of the superposition reveals that globins are most strongly conserved structurally in the mid-regions of the E and G helices.
View details for PubMedID 9697208
-
Enumerating and ranking discrete motifs
5th International Conference on Intelligent Systems for Molecular Biology (ISMB-97)
AMER ASSOC ARTIFICIAL INTELLIGENCE. 1997: 202–209
Abstract
Discrete motifs that discriminate functional classes of proteins are useful for classifying new sequences, capturing structural constraints, and identifying protein subclasses. Despite the fact that the space of such motifs can grow exponentially with sequence length and number, we show that in practice it usually does not, and we describe a technique that infers motifs from aligned protein sequences by exhaustively searching this space. Our method generates sequence motifs over a wide range of recall and precision, and chooses a representative motif based on a score that we derive from both statistical and information-theoretic frameworks. Finally, we show that the selected motifs perform well in practice, classifying unseen sequences with extremely high precision, and infer protein subclasses that correspond to known biochemical classes.
View details for Web of Science ID 000072320000029
View details for PubMedID 9322037
-
Hierarchical protein structure superposition using both secondary structure and atomic representations
5th International Conference on Intelligent Systems for Molecular Biology (ISMB-97)
AMER ASSOC ARTIFICIAL INTELLIGENCE. 1997: 284–293
Abstract
The structural comparison of proteins has become increasingly important as a means to identify protein motifs and fold families. In this paper we present a new algorithm for the comparison of proteins based on a hierarchy of structural representations, from the secondary structure level to the atomic level. Our technique represents alpha-helices and beta-strands as vectors and uses a set of seven scoring functions to compare pairs of vectors from different proteins. The scores obtained are used in a dynamic programming algorithm that finds the best local alignment of the two sets of vectors. The second step in our algorithm is based on the atomic coordinates of the protein structures and improves the initial vector alignment by iteratively minimizing the RMSD between pairs of nearest atoms from the two proteins. We refine the final alignment by determining a core of well aligned atoms and minimizing the RMSD of this core. In a comparison of our method to Holm and Sander's DALI algorithm, our program was able to detect structural similarity at the same level as DALI. We also performed searches of a representative set of the Protein Data Bank (PDB) using our program and detected structural similarity between several distantly related proteins.
View details for Web of Science ID 000072320000043
View details for PubMedID 9322051
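The atomic refinement step in the abstract above relies on the RMSD between pairs of nearest atoms. The sketch below shows just that measurement for two coordinate sets that are assumed to be already superimposed: each atom in the first structure is paired with its nearest atom in the second, and the RMSD over those pairs is computed. The function names are hypothetical, and the iterative re-superposition that the paper describes is not included.

```python
import math

def nearest_pairs(atoms_a, atoms_b):
    """Pair every atom in atoms_a with its nearest atom in atoms_b."""
    return [(a, min(atoms_b, key=lambda b: math.dist(a, b))) for a in atoms_a]

def rmsd(pairs):
    """Root-mean-square distance over a list of (atom, atom) coordinate pairs."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))

if __name__ == "__main__":
    structure_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    structure_b = [(3.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
    print(rmsd(nearest_pairs(structure_a, structure_b)))
```

In an iterative scheme, the pairs found this way would be used to compute a new superposition, after which the nearest-atom pairing is recomputed until the RMSD stops improving.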
-
Introns and reading frames: Correlation between splicing sites and their codon positions
MOLECULAR BIOLOGY AND EVOLUTION
1996; 13 (9): 1219-1223
Abstract
Computer analyses of the entire GenBank database were conducted to examine correlation between splicing sites and codon positions in reading frames. Intron insertion patterns (i.e., splicing site locations with respect to codon positions) have been analyzed for all of the 64 codons of all the eukaryote taxonomic groups: primates, rodents, mammals, vertebrates, invertebrates, and plants. We found that reading frames are interrupted by an intron at a codon boundary (as opposed to the middle of a codon) significantly more often than expected. This observation is consistent with the exon shuffling hypothesis, because exons that end at codon boundaries can be concatenated without causing a frame shift and thus are evolutionarily advantageous. On the other hand, when introns interrupt at the middles of codons, they exist in between the first and second bases much more frequently than between the second and third bases, despite the fact that boundaries between the first and second bases of codons are generally far more important than those between the second and third bases. The reason for this is not clear and has yet to be explained. We also show that the length of an exon is a multiple of 3 more frequently than expected. Furthermore, the total length of two consecutive exons is also more frequently a multiple of 3. All the observations above are consistent with results recently published by Long, Rosenberg, and Gilbert (1995).
View details for Web of Science ID A1996VR29900008
View details for PubMedID 8896374
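The codon-position bookkeeping described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names and the example exon lengths are hypothetical, and "phase" is defined here as the number of coding bases preceding the intron, modulo 3 (0 = at a codon boundary, 1 = between the first and second bases, 2 = between the second and third bases).

```python
from collections import Counter


def intron_phases(exon_lengths):
    """Return the phase of each intron, given coding exon lengths in order.

    An intron follows every exon except the last one. Its phase is the
    total coding length before it, modulo 3.
    """
    phases = []
    coding_so_far = 0
    for length in exon_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


def exons_multiple_of_three(exon_lengths):
    """Flag exons whose length is a multiple of 3."""
    return [length % 3 == 0 for length in exon_lengths]


# Hypothetical gene with three coding exons (lengths in bases).
example = [90, 61, 29]
phase_counts = Counter(intron_phases(example))
```

Tallying `phase_counts` over many genes and comparing against an expected distribution is the kind of comparison the abstract reports; the choice of null model is left open here.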
-
Sequences and topology - Challenges for algorithms and experts - Editorial overview
CURRENT OPINION IN STRUCTURAL BIOLOGY
1996; 6 (3): 343-345
View details for Web of Science ID A1996UU65400010
View details for PubMedID 8804838
-
Discovering empirically conserved amino acid substitution groups in databases of protein families.
Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology
1996; 4: 230-240
Abstract
This paper introduces a method for identifying empirically conserved amino acid substitution groups. In contrast with existing approaches that view amino acid substitution as a pairwise phenomenon, the method presented here identifies conserved groups of amino acids using a data structure called a conditional distribution matrix. The conditional distribution matrix extends the concept of a pairwise substitution matrix by changing the context of substitution from a single amino acid to a group of amino acids. The matrix tabulates information from a database of protein families that contains numerous aligned positions. Each row in the matrix contains the distribution of amino acids in those aligned positions that contain a given conditioning group of amino acids. The method converts a database of protein families into a conditional distribution matrix and then examines each possible substitution group for evidence of conservation. The algorithm is applied to the BLOCKS and HSSP databases. Twenty amino acid substitution groups are found to be conserved empirically in both databases. These groups provide insight into biochemical properties that are conserved in protein evolution.
View details for PubMedID 8877523
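A rough sketch of one row of a conditional distribution matrix follows. It assumes that an aligned position "contains" a conditioning group when every amino acid in the group occurs in that column; the abstract does not spell out this criterion, so treat it, the function name, and the toy columns as assumptions.

```python
from collections import Counter


def conditional_distribution(columns, group):
    """Distribution of amino acids over columns that contain `group`.

    `columns` is an iterable of strings, one string per aligned position.
    A column qualifies when it includes every residue in `group`
    (an assumed reading of "contain").
    """
    counts = Counter()
    wanted = set(group)
    for column in columns:
        if wanted <= set(column):
            counts.update(column)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: n / total for residue, n in counts.items()}


# Toy aligned positions, not taken from BLOCKS or HSSP.
toy_columns = ["IIVL", "DDEE", "ILVV"]
row_for_IV = conditional_distribution(toy_columns, "IV")
```

Repeating this for every candidate conditioning group would give the full matrix, one row per group.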
-
Identification of protein motifs using conserved amino acid properties and partitioning techniques.
Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology
1995; 3: 402-410
Abstract
Analyzing a set of protein sequences involves a fundamental relationship between the coherency of the set and the specificity of the motif that describes it. Motifs may be obscured by training sets that contain incoherent sequences, in part due to protein subclasses, contamination, or errors. We develop an algorithm for motif identification that systematically explores possible patterns of coherency within a set of protein sequences. Our algorithm constructs alternative partitions of the training set data, where one subset of each partition is presumed to contain coherent data and is used for forming a motif. The motif is represented by multiple overlapping amino acid groups based on evolutionary, biochemical, or physical properties. We demonstrate our method on a training set of reverse transcriptases that contains subclasses, sequence errors, misalignments, and contaminating sequences. Despite these complications, our program identifies a novel motif for the subclass of retroviral and retrovirus-related reverse transcriptases. This motif has a much higher specificity than previously reported motifs and suggests the importance of conserved hydrophilic and hydrophobic residues in the structure of reverse transcriptases.
View details for PubMedID 7584465
-
DISCOVERING STRUCTURAL CORRELATIONS IN ALPHA-HELICES
PROTEIN SCIENCE
1994; 3 (10): 1847-1857
Abstract
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to alpha-helical and beta-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely alpha-helices and beta-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.
View details for Web of Science ID A1994PR26000024
View details for PubMedID 7849600
-
On near-optimal alignments of biological sequences.
Journal of computational biology
1994; 1 (4): 349-366
Abstract
A near-optimal alignment between a pair of sequences is an alignment whose score lies within the neighborhood of the optimal score. We present an efficient method for representing all alignments whose score is within any given delta from the optimal score. The representation is a compact graph that makes it easy to impose additional biological constraints and select one desirable alignment from the large set of alignments. We study the combinatorial nature of near-optimal alignments, and define a set of "canonical" near-optimal alignments. We then show how to enumerate near-optimal alignments efficiently in order of their score, and count their number. When applied to comparisons of two distantly related proteins, near-optimal alignments reveal that the most conserved regions among the near-optimal alignments are the highly structured regions in the proteins. We also show that by counting the number of near-optimal alignments as a function of the distance from the optimal score, we can select a good set of parameters that best constrains the biologically relevant alignments.
View details for PubMedID 8790476
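The idea of "all alignments within delta of the optimum" can be illustrated with a forward and a backward dynamic programming pass. The sketch below is a simplified, cell-based variant for global alignment with a linear gap penalty. It is not the compact edge graph described in the abstract, and the scoring values and function names are assumptions chosen for illustration.

```python
def forward_scores(a, b, match=1, mismatch=-1, gap=-1):
    """F[i][j] = best global alignment score of a[:i] against b[:j]."""
    F = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                pair = match if a[i - 1] == b[j - 1] else mismatch
                candidates.append(F[i - 1][j - 1] + pair)
            if i > 0:
                candidates.append(F[i - 1][j] + gap)
            if j > 0:
                candidates.append(F[i][j - 1] + gap)
            F[i][j] = max(candidates)
    return F


def near_optimal_cells(a, b, delta):
    """Cells (i, j) lying on some alignment scoring >= optimum - delta."""
    n, m = len(a), len(b)
    F = forward_scores(a, b)
    # Suffix scores come from running the same recurrence on reversed inputs.
    R = forward_scores(a[::-1], b[::-1])
    optimum = F[n][m]
    cells = {
        (i, j)
        for i in range(n + 1)
        for j in range(m + 1)
        if F[i][j] + R[n - i][m - j] >= optimum - delta
    }
    return optimum, cells
```

Widening `delta` and watching how fast the set of cells grows gives a crude picture of how tightly a parameter choice constrains the alignment.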
-
Discovering side-chain correlation in alpha-helices.
Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology
1994; 2: 236-243
Abstract
Using a new representation for interactions in protein sequences based on correlations between pairs of amino acids, we have examined alpha-helical segments from known protein structures for important interactions. Traditional techniques for representing protein sequences usually make an explicit assumption of conditional independence of residues in the sequences. Protein structure analyses, however, have repeatedly demonstrated the importance of amino acid interactions for structural stability. We have developed an automated program for discovering sequence correlations in sets of aligned protein sequences using standard statistical tests and for representing them with Bayesian networks. In this paper, we demonstrate the power of our discovery program and representation by analyzing pairs of residues from alpha-helices. The sequence correlations we find represent physical and chemical interactions among amino-acid side chains in helical structures. Furthermore, these local interactions are likely to be important for stabilizing and packing alpha-helices. Lastly, we have also detected correlations in side-chain conformations that indicate important structural interactions but which don't appear as sequence correlations.
View details for PubMedID 7584396
-
BLAZE (TM) - AN IMPLEMENTATION OF THE SMITH-WATERMAN SEQUENCE COMPARISON ALGORITHM ON A MASSIVELY-PARALLEL COMPUTER
2ND INTERNATIONAL WORKSHOP ON OPEN PROBLEMS OF COMPUTATIONAL MOLECULAR BIOLOGY
PERGAMON-ELSEVIER SCIENCE LTD. 1993: 203–7
View details for Web of Science ID A1993LX57100011
-
Detection of correlations in tRNA sequences with structural implications.
Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology
1993; 1: 225-233
Abstract
Using a flexible representation of biological sequences, we have performed a comparative analysis of 1208 known tRNA sequences. We believe our technique is a more sensitive method for detecting structural and functional relationships in sets of aligned sequences because we use a flexible representation (for sequences), as well as a general statistical method that can detect a wide range of relationships between positions in a sequence. Our method utilizes functional classifications of the sequence building-blocks (nucleotide bases and amino acids) based on physical or chemical properties. This flexibility in sequence representation improves the significance of finding sequence relationships mediated by the defining property. For example, using a purine/pyrimidine classification, we can detect base-stacking interactions in sets of nucleotide sequences that form base-paired helices. We use several statistical measures, including chi-square tests, Monte Carlo simulations and an information measure to detect significant correlations in sequences. In this paper we illustrate our method by analyzing a set of tRNA sequences and showing that the correlations our program discovers, in each case, correspond to the known base-pairing and higher order interactions observed in tRNA crystal structures. Furthermore, we show that novel and interesting features of tRNAs are detected when sequence correlations with the charged amino acid (and anticodon) are evaluated. This technique is a powerful method for predicting the structure of RNAs and for analyzing specific functional characteristics.
View details for PubMedID 7584340
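The chi-square measure over a purine/pyrimidine classification mentioned in this abstract can be sketched as follows. This is an illustrative reduction, not the authors' program: the function names are hypothetical, gaps and ambiguous bases are not handled, and no significance threshold or Monte Carlo step is included.

```python
from collections import Counter

PURINES = set("AG")


def purine_pyrimidine(base):
    """Map a base to R (purine) or Y (pyrimidine); anything not A/G is Y."""
    return "R" if base in PURINES else "Y"


def chi_square(sequences, i, j, classify=purine_pyrimidine):
    """Pearson chi-square statistic between alignment columns i and j."""
    observed = Counter((classify(s[i]), classify(s[j])) for s in sequences)
    rows, cols = Counter(), Counter()
    for (a, b), count in observed.items():
        rows[a] += count
        cols[b] += count
    n = len(sequences)
    statistic = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            statistic += (observed[(a, b)] - expected) ** 2 / expected
    return statistic


# Toy two-column alignment in which the columns always pair R with Y.
paired = ["AU", "GC", "CG", "UA"]
unrelated = ["AA", "AC", "CA", "CC"]
```

Swapping in a different `classify` function (for example, one keyed on another chemical property) changes which relationships the statistic is sensitive to, which is the flexibility the abstract emphasizes.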
-
SEARCHING GENE AND PROTEIN-SEQUENCE DATABASES
M D COMPUTING
1991; 8 (3): 144-149
Abstract
A large-scale effort to map and sequence the human genome is now under way. Crucial to the success of this research is a group of computer programs that analyze and compare data on molecular sequences. This article describes the classic algorithms for similarity searching and sequence alignment. Because good performance of these algorithms is critical to searching very large and growing databases, we analyze the running times of the algorithms and discuss recent improvements in this area.
View details for Web of Science ID A1991FM59600003
View details for PubMedID 1857191
-
KNOWLEDGE-BASED SIMULATION OF DNA METABOLISM - PREDICTION OF ENZYME ACTION
COMPUTER APPLICATIONS IN THE BIOSCIENCES
1991; 7 (1): 9-19
Abstract
We have developed a knowledge-based simulation of DNA metabolism that accurately predicts the actions of enzymes on DNA under a large number of environmental conditions. Previous simulations of enzyme systems rely predominantly on mathematical models. We use a frame-based representation to model enzymes, substrates and conditions. Interactions between these objects are expressed using production rules and an underlying truth maintenance system. The system performs rapid inference and can explain its reasoning. A graphical interface provides access to all elements of the simulation, including object representations and explanation graphs. Predicting enzyme action is the first step in the development of a large knowledge base to envision the metabolic pathways of DNA replication and repair.
View details for Web of Science ID A1991EW01400002
View details for PubMedID 2004281
-
IMPROVED SENSITIVITY OF BIOLOGICAL SEQUENCE DATABASE SEARCHES
COMPUTER APPLICATIONS IN THE BIOSCIENCES
1990; 6 (3): 237-245
Abstract
We have increased the sensitivity of DNA and protein sequence database searches by allowing similar but non-identical amino acids or nucleotides to match. In addition, one can match k-tuples or words instead of matching individual residues in order to speed the search. A matching matrix specifies which k-tuples match each other. The matching matrix can be calculated from a similarity matrix of amino acids and a threshold of similarity required for matching. This permits amino acid similarity matrices or replacement matrices (PAM matrices) to be used in the first step of a sequence comparison rather than in a secondary scoring phase. The concept of matching non-identical k-tuples also increases the power of DNA database searches. For example, a matrix that specifies that any 3-tuple in a DNA sequence can match any other 3-tuple encoding the same amino acid permits a DNA database search using a DNA query sequence for regions that would encode a similar amino acid sequence.
View details for Web of Science ID A1990DU66600011
View details for PubMedID 2207748
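One way to derive a k-tuple matching matrix from a residue similarity matrix and a threshold is sketched below. Two assumptions are made that the abstract does not confirm: a k-tuple pair is scored as the sum of its per-position similarities, and the toy similarity function (identity 2, same group 1, otherwise -1) stands in for a real PAM matrix. All names are hypothetical.

```python
from itertools import product

# Toy residue groups used only to build an illustrative similarity score.
GROUPS = ["ILVM", "DE", "KR"]


def similarity(a, b):
    """Toy residue similarity: 2 if identical, 1 if same group, else -1."""
    if a == b:
        return 2
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -1


def tuples_match(t1, t2, threshold):
    """Two k-tuples match when their summed similarity reaches threshold."""
    return sum(similarity(a, b) for a, b in zip(t1, t2)) >= threshold


def matching_matrix(alphabet, k, threshold):
    """Map every k-tuple over `alphabet` to the set of k-tuples it matches."""
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    return {
        w: {v for v in words if tuples_match(w, v, threshold)}
        for w in words
    }


matrix = matching_matrix("ILK", 2, 3)
```

Because the matrix is precomputed, a word lookup during the search can accept similar words at the same cost as exact ones, which is the point of moving similarity into the first step.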
-
CONVERSION AND RECIPROCAL EXCHANGE BETWEEN TANDEM REPEATS IN DROSOPHILA-MELANOGASTER
MOLECULAR & GENERAL GENETICS
1989; 220 (1): 140-146
Abstract
We have developed an experimental system to assay conversion and reciprocal exchange between tandem repeats in Drosophila melanogaster. In this system, the recombining markers map 0.76 kb apart within the Adh gene, and the length of the repeated unit is 4.75 kb. Our results provide a preliminary record of germline frequencies of gene conversion and unequal exchange between these markers. Conversions involving dispersed repeats were not observed, and may be less frequent. This work demonstrates that conversion takes place at an appreciable frequency between tandem repeats in metazoan germline. It confirms that gene conversion can mediate homogenization of reiterated sequences in higher eukaryotes.
View details for Web of Science ID A1989CE16800022
View details for PubMedID 2514345
-
IS THERE A RELATIONSHIP BETWEEN DNA-SEQUENCES ENCODING PEPTIDE LIGANDS AND THEIR RECEPTORS
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1989; 86 (1): 42-45
Abstract
It has been suggested that the coding for a ligand and its receptor may have originated in inverse complementary strands of the same DNA. This would imply a deficiency of stop codons in the complementary strand of the ligand message sequence. We have sought evidence of such deficiencies by an analysis of the usage of selected codons in 23 human neuropeptide and hormone mRNA sequences. We have also searched directly for similarities between substance K or substance P and the substance K receptor. Although bovine proopiomelanocortin has an open reading frame for the full extent of the inverse complement of the coding region, this seems to be a unique case. The data as a whole do not support the hypothesis.
View details for Web of Science ID A1989R820200010
View details for PubMedID 2536158
-
EXPRESSION OF THE DROSOPHILA TYPE-II TOPOISOMERASE IS DEVELOPMENTALLY REGULATED
BIOCHEMISTRY
1988; 27 (2): 560-565
Abstract
The expression of the type II topoisomerase from Drosophila melanogaster was studied during development and in tissue culture cells. RNA blot and protein blot analyses using probes specific for Drosophila topoisomerase II show that the enzyme is developmentally regulated. Levels of both RNA transcript and protein appear highest during early embryogenesis and pupation, periods which are known to have the highest mitotic activity. Tissue culture analysis using Drosophila Kc cells supports these results as levels of topoisomerase II message are higher in rapidly dividing cells than in quiescent cells. Analysis of topoisomerase II levels in early embryos suggests that levels are adequately high for the enzyme to act in DNA replication or segregation at termination of replication. Apparent in vivo proteolysis of topoisomerase II is seen throughout the life cycle, in spite of careful precautions. Whether these proteolytic fragments are important in vivo is still uncertain.
View details for Web of Science ID A1988L864800009
View details for PubMedID 2831969
-
IDENTICAL SATELLITE DNA-SEQUENCES IN SIBLING SPECIES OF DROSOPHILA
JOURNAL OF MOLECULAR BIOLOGY
1987; 194 (2): 161-170
Abstract
The evolution of simple satellite DNAs was examined by DNA-DNA hybridization of ten Drosophila melanogaster satellite sequences to DNAs of the sibling species, Drosophila simulans and Drosophila erecta. Seven of these repeat types are present in tandem arrays in D. simulans and each of the ten sequences is repeated in D. erecta. In thermal melts, six of the seven satellite sequences in D. simulans and seven of the ten sequences in D. erecta melted within 1 °C of the corresponding values in D. melanogaster. The remaining sequences melted within 3 °C of the homologous hybrids. Therefore, there is little or no alteration in those satellite sequences held in common, despite a period of about ten million years since the divergence of D. melanogaster and D. simulans from a common ancestor. Simple satellite sequences appear to be more highly conserved than coding regions of the genome, on a per nucleotide basis. Since multiple copies of three satellite sequences could not be detected in D. simulans yet are present in D. erecta, a species more distantly related to D. melanogaster than is D. simulans, these sequences show discontinuities in evolution. There were major quantitative variations between species, showing that satellite DNAs are prone to massive amplification or diminution events over timespans as short as those separating sibling species. In D. melanogaster, these sequences amount to 21% of the genome but only 5% in D. simulans and 0.4% in D. erecta. There was a general trend of lower abundance with evolutionary distance for most satellites, suggesting that the amounts of different satellite sequences do not vary independently during evolution.
View details for Web of Science ID A1987G569100001
View details for PubMedID 3112413
-
ADJACENT SATELLITE DNA SEGMENTS IN DROSOPHILA - STRUCTURE OF JUNCTIONS
JOURNAL OF MOLECULAR BIOLOGY
1987; 194 (2): 171-179
Abstract
The structure of eight satellite DNA molecules containing a junction between tandem arrays of different repeated sequences is described. In one class of junctions there was an abrupt switch with the juxtaposition of two satellite arrays. These arrays were closely related and the periodicity of repeats was maintained in phase across the junction. These arrays usually showed extreme homogeneity in their repeating sequences. A second class of junctions was more complex, and in two cases may have arisen by the insertion of a mobile element into a satellite array. A novel mechanism of satellite formation is proposed to explain the precision of junctions and sequence similarities of neighboring satellite arrays. Homogeneous satellite arrays would be generated enzymatically by synthesis of a repeat using the preceding repeat as template. Occasional errors in copying of the template, either single base changes or misreading the length of the repeat unit, would lead to abrupt switches in the repeating sequence.
View details for Web of Science ID A1987G569100002
View details for PubMedID 3112414
-
MULTIPLE FORMS AND CELLULAR-LOCALIZATION OF DROSOPHILA DNA TOPOISOMERASE-II
JOURNAL OF BIOLOGICAL CHEMISTRY
1986; 261 (17): 8063-8069
Abstract
Purified type II topoisomerase from Drosophila melanogaster embryos was reported earlier to contain a major polypeptide of 166,000 daltons and several smaller peptides between 132,000 and 145,000 daltons (Shelton, E. R., Osheroff, N. and Brutlag, D. L. (1983) J. Biol. Chem. 258, 9530-9535). Using purified topoisomerase II we have raised antibodies against the 132,000-166,000-dalton cluster of polypeptides. In this paper we demonstrate that at least three of these polypeptides are also present in embryos immediately upon lysis. Using antigen-affinity purified antibody from the cluster of purified topoisomerase II antigens, we have also discovered several smaller polypeptides in the molecular size range of 30,000-40,000 daltons in embryo extracts. These observations suggest the presence of multiple forms of DNA topoisomerases in the cell. In addition, we demonstrate that purified Drosophila topoisomerase II antibody recognizes yeast topoisomerase II antigens expressed by lambda gt 11-yeast topoisomerase II recombinants (Goto, T. and Wang, J. C. (1984) Cell 36, 1073-1080) establishing a structural homology between yeast and Drosophila enzymes. Antibody preparations were also used to localize the distribution of topoisomerase II in polytene nuclei. In contrast with the distribution of topoisomerase I which is located primarily at puffs, the Drosophila topoisomerase II is distributed generally along the chromosomes paralleling the distribution of DNA itself.
View details for Web of Science ID A1986C767100080
View details for PubMedID 3011806
-
PROXIMITY-DEPENDENT ENHANCEMENT OF SGS-4 GENE-EXPRESSION IN DROSOPHILA-MELANOGASTER
CELL
1986; 44 (6): 879-883
Abstract
A weakly expressed allele of Sgs-4 obtained from the Kochi strain of D. melanogaster (Sgs-4K) increases 4-fold in levels of accumulated transcript when paired with a wild-type (Oregon-R) Sgs-4 allele (Sgs-4ORE) and increases 9-fold when paired with a duplication of the wild-type Sgs-4 allele Confluens (Sgs-4Co). There is no enhancement of expression when Sgs-4K is paired with the Sgs-4 null allele Ber-I (Sgs-4BER) or with another weakly expressed Sgs-4 allele, Hikone-R (Sgs-4HIK); there is minimal enhancement when chromosome pairing is disrupted near the Sgs-4 locus by rearrangement of the wild-type Sgs-4 allele. Cytological analysis of third instar salivary gland nuclei from Kochi/Confluens heterozygotes suggests that complete pairing is not essential for Sgs-4K enhancement. Interactions between homologous loci that could produce a trans enhancement effect are discussed and a hypothesis is formulated to test the significance of specific upstream sequences from both alleles in expression enhancement.
View details for Web of Science ID A1986A741800008
View details for PubMedID 3082518
-
MULTIPLICITY OF SATELLITE DNA-SEQUENCES IN DROSOPHILA-MELANOGASTER
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1986; 83 (3): 696-700
Abstract
Three Drosophila melanogaster satellite DNAs (1.672, 1.686, and 1.705 g/ml in CsCl), each containing a simple sequence repeated in tandem, were cloned in pBR322 as small fragments about 500 base pairs long. This precaution minimized deletions, since inserts of the same size as the fragments used for cloning were recovered in a stable form. A homogeneous tandem array of one sequence type usually extended the length of the insert. Eleven distinct repeat sequences were discovered, but only one sequence was predominant in each satellite preparation. The remaining classes were minor in amount. The repeat unit lengths were restricted to 5, 7, or 10 base pairs, with sequences closely related. Each sequence conforms to the expression (RRN)m(RN)n, where R is A or G. The multiplicity of simple repeated sequences revealed despite the small sample size suggests that numerous repeat sequences reside in heterochromatin and that particular rules apply to the structure of the repeating sequence.
View details for Web of Science ID A1986AZB0800036
View details for PubMedID 3080746
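The expression (RRN)m(RN)n from this abstract translates directly into a regular expression. The sketch below assumes m and n are both at least 1 and that N stands for any of A, C, G, or T; the function name and the test strings are illustrative and are not taken from the abstract.

```python
import re

# (RRN)m(RN)n with R = A or G and N = any base; m, n >= 1 is an assumption.
SATELLITE_UNIT = re.compile(r"(?:[AG][AG][ACGT])+(?:[AG][ACGT])+")


def conforms(repeat_unit):
    """True when the whole repeat unit fits the (RRN)m(RN)n form."""
    return SATELLITE_UNIT.fullmatch(repeat_unit.upper()) is not None


# Illustrative units of length 5, 7, and 10, plus one that breaks the rule.
examples = ["AATAT", "AAGAGAG", "AATAACATAG", "CAGAG"]
results = [conforms(unit) for unit in examples]
```

Under this form a unit has length 3m + 2n, so lengths 5, 7, and 10 correspond to (m, n) = (1, 1), (1, 2), and (2, 2).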
-
BIONET - NATIONAL COMPUTER RESOURCE FOR MOLECULAR-BIOLOGY
NUCLEIC ACIDS RESEARCH
1986; 14 (1): 17-20
Abstract
This paper describes briefly the BIONET National Computer Resource for Molecular Biology. This presentation is intended as information for scientists in molecular biology and related disciplines who require access to computational methods for sequence analysis. We describe the goals, and the service and research opportunities offered to the community by BIONET, the relationship of BIONET to other national and regional resources, our recent efforts toward distribution of the resource to BIONET Satellites, and procedures for investigators to gain access to the Resource.
View details for Web of Science ID A1986AYR7500005
View details for PubMedID 3945548
-
A FAMILY OF DISPERSED REPETITIVE EXTRAGENIC PALINDROMIC DNA-SEQUENCES IN ESCHERICHIA-COLI
EMBO JOURNAL
1984; 3 (6): 1417-1421
Abstract
We report the properties of 67 members of a family of dispersed repetitive palindromic extragenic bacterial DNA sequences. These sequences, called palindromic units, appear to be present at least several hundred times outside structural genes on the Escherichia coli chromosome. They are found either in clusters - as in a previously described intercistronic element - or in single occurrences. They are not only found within an operon but also between different operons, including between convergent ones. The palindromic units could yield a stem and loop structure at the level of DNA or RNA. The base of the stem is made of eight remarkably conserved base pairs while the rest varies somewhat in length and sequence. We analyse the data available on the palindromic units and we speculate on their possible roles with emphasis on transcription and mRNA stability or processing, as well as on their possible relation to transposition elements and the modular evolution of the genome.
View details for Web of Science ID A1984SU71000031
View details for PubMedID 6378622
-
SIMILARITIES IN STRUCTURE AND FUNCTION OF CALF THYMUS AND DROSOPHILA CASEIN KINASE-II
JOURNAL OF BIOLOGICAL CHEMISTRY
1984; 259 (14): 9001-9006
Abstract
Both calf and Drosophila contain a type II casein kinase with similar molecular structure and catalytic activity. Purified calf thymus casein kinase II is composed of three subunits of Mr = 44,000 (alpha), 40,000 (alpha'), and 26,000 (beta) (Dahmus, M.E. (1981) J. Biol. Chem. 256, 3319-3325), whereas the Drosophila enzyme is composed of two subunits of Mr = 36,700 (alpha) and 28,200 (beta) (Glover, C. V. C., Shelton, E. R., and Brutlag, D. L. (1983) J. Biol. Chem. 258, 3258-3265). The native form of the enzyme is an alpha 2 beta 2 tetramer. Polyclonal antibodies prepared against each enzyme react with both the alpha and beta subunits of the homologous enzyme and cross-react with both subunits of the heterologous enzyme. Reaction of polyclonal antibodies with proteins resolved by sodium dodecyl sulfate-polyacrylamide gel electrophoresis establishes that no significant difference in subunit molecular weight exists between the purified enzymes and the enzyme present in initial cell extracts. Each antibody effectively inhibits the in vitro activity of the homologous enzyme and causes a slight inhibition in the activity of the heterologous enzyme. Peptide maps derived from purified subunits indicate that the alpha and beta subunits are unique and that there is extensive primary sequence homology between the corresponding subunits of the calf and Drosophila enzyme. Casein kinase II from both sources phosphorylates the same subunits of calf thymus RNA polymerase II and an identical set of proteins in a complex mixture of acid-soluble proteins from Drosophila tissue culture cells. The striking similarity in molecular structure and catalytic activity between the calf and Drosophila enzyme suggests that casein kinase II has been highly conserved in evolution.
View details for Web of Science ID A1984TB56300053
View details for PubMedID 6589223
-
RAPID SEARCHES FOR COMPLEX PATTERNS IN BIOLOGICAL MOLECULES
NUCLEIC ACIDS RESEARCH
1984; 12 (1): 263-280
Abstract
The intrinsic redundancy of genetic information makes searching for patterns in biological sequences a difficult task. We have designed an interactive self-documenting computer program called QUEST that allows rapid searching of large DNA and protein data banks for highly redundant consensus sequences or character patterns. QUEST uses a concise language for specifying character patterns containing several levels of ambiguity and pattern arrangement. Examples of the use of this program for sequence data are given. Details of the algorithm and pattern optimization are explained.
View details for Web of Science ID A1984SA44700026
View details for PubMedID 6546419
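Searching for patterns that carry several levels of ambiguity can be illustrated by expanding ambiguity codes into a regular expression. This sketch uses the standard IUPAC nucleotide codes and Python's `re` module as a stand-in; it does not reproduce QUEST's own pattern language, which the abstract does not describe, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compile_pattern(consensus):
    """Turn a consensus such as 'TATAWR' into a compiled regex."""
    body = "".join("[" + IUPAC[symbol] + "]" for symbol in consensus.upper())
    # The lookahead lets overlapping occurrences be reported as well.
    return re.compile("(?=" + body + ")")


def find_pattern(consensus, sequence):
    """Return the 0-based start positions of every occurrence."""
    regex = compile_pattern(consensus)
    return [m.start() for m in regex.finditer(sequence.upper())]


hits = find_pattern("TATAWR", "GGTATAAAGG")
```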
-
DNA TOPOISOMERASE-II FROM DROSOPHILA-MELANOGASTER - RELAXATION OF SUPERCOILED DNA
JOURNAL OF BIOLOGICAL CHEMISTRY
1983; 258 (15): 9536-9543
Abstract
In order to study the double-strand DNA passage reaction of eukaryotic type II topoisomerases, a quantitative assay to monitor the enzymic conversion of supercoiled circular DNA to relaxed circular DNA was developed. Under conditions of maximal activity, relaxation catalyzed by the Drosophila melanogaster topoisomerase II was processive and the energy of activation was 14.3 kcal/mol. Removal of supercoils was accompanied by the hydrolysis of either ATP or dATP to inorganic phosphate and the corresponding nucleoside diphosphate. Apparent Km values were 200 µM for pBR322 plasmid DNA, 140 µM for SV40 viral DNA, 280 µM for ATP, and 630 µM for dATP. The turnover number for the Drosophila enzyme was at least 200 supercoils of DNA relaxed/min/molecule of topoisomerase II. The enzyme interacts preferentially with negatively supercoiled DNA over relaxed molecules, is capable of removing positive superhelical twists, and was found to be strongly inhibited by single-stranded DNA. Kinetic and inhibition studies indicated that the beta and gamma phosphate groups, the 2'-OH of the ribose sugar, and the C6-NH2 of the adenine ring are important for the interaction of ATP with the enzyme. While the binding of ATP to Drosophila topoisomerase II was sufficient to induce a DNA strand passage event, hydrolysis was required for enzyme turnover. The ATPase activity of the topoisomerase was stimulated 17-fold by the presence of negatively supercoiled DNA and approximately 4 molecules of ATP were hydrolyzed/supercoil removed. Finally, a kinetic model describing the switch from a processive to a distributive relaxation reaction is presented.
View details for Web of Science ID A1983RB88600076
View details for PubMedID 6308011
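To make the reported apparent Km values for ATP and dATP concrete, the sketch below evaluates the fraction of maximal rate at a given nucleotide concentration. It assumes simple Michaelis-Menten behavior, which is a simplification and not the kinetic model presented in the paper; the function and constant names are hypothetical.

```python
# Apparent Km values quoted in the abstract, in micromolar.
KM_ATP_UM = 280.0
KM_DATP_UM = 630.0


def fraction_of_vmax(substrate_um, km_um):
    """Michaelis-Menten rate as a fraction of Vmax: S / (Km + S)."""
    return substrate_um / (km_um + substrate_um)


# At a nucleotide concentration equal to the Km for ATP:
atp_fraction = fraction_of_vmax(280.0, KM_ATP_UM)
datp_fraction = fraction_of_vmax(280.0, KM_DATP_UM)
```

Under this assumption the enzyme runs at half-maximal rate with 280 µM ATP, while the same concentration of dATP gives a lower fraction because of its higher apparent Km.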
-
DNA TOPOISOMERASE-II FROM DROSOPHILA-MELANOGASTER - PURIFICATION AND PHYSICAL CHARACTERIZATION
JOURNAL OF BIOLOGICAL CHEMISTRY
1983; 258 (15): 9530-9535
Abstract
A type II DNA topoisomerase has been purified from the nuclei of Drosophila melanogaster 6- to 18-h-old embryos. The enzyme, as assayed by its ability to catenate supercoiled DNA, behaved as a single homogeneous species throughout the procedure and the yield was approximately 0.5 mg of protein/100 g of dechorionated embryos. The final product was entirely ATP-dependent and free of topoisomerase I, endonuclease and protease activities. The purified topoisomerase II had a Stokes radius of 69 Å and a sedimentation coefficient (S20,w) of 9.2 S, leading to a calculated native molecular weight of approximately 261,000. The protein consists of a single polypeptide of molecular weight 166,000, as determined by electrophoresis on sodium dodecyl sulfate-polyacrylamide gels. Taken together with the above hydrodynamic studies, the Drosophila enzyme is probably a homodimer, as has been observed for other eukaryotic type II enzymes. Thus, it appears that during the course of evolution the heterologous subunits which comprise bacterial type II topoisomerases have been combined into a single polypeptide chain in eukaryotes.
View details for Web of Science ID A1983RB88600075
View details for PubMedID 6308010
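The step from Stokes radius and sedimentation coefficient to a native molecular weight can be reproduced with the standard hydrodynamic relation M = 6·π·η·N·a·s / (1 − v̄·ρ). The abstract does not say which constants were used, so the partial specific volume (0.725 cm³/g, a typical protein value) and the water viscosity and density at 20 °C in this sketch are assumptions, as is the function name.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def native_molecular_weight(
    stokes_radius_angstrom,
    s20w_svedberg,
    partial_specific_volume=0.725,  # cm^3/g, assumed typical protein value
    viscosity=0.01002,              # g/(cm*s), water at 20 C (assumed)
    density=0.9982,                 # g/cm^3, water at 20 C (assumed)
):
    """Estimate molecular weight from Stokes radius and S20,w."""
    radius_cm = stokes_radius_angstrom * 1e-8
    s_seconds = s20w_svedberg * 1e-13
    buoyancy = 1.0 - partial_specific_volume * density
    numerator = 6.0 * math.pi * viscosity * AVOGADRO * radius_cm * s_seconds
    return numerator / buoyancy


# Values reported in the abstract: 69 angstroms and 9.2 S.
topo_ii_mw = native_molecular_weight(69.0, 9.2)
```

With these assumed constants the estimate lands close to the approximately 261,000 quoted in the abstract; a different partial specific volume would shift the result.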
-
PURIFICATION AND CHARACTERIZATION OF A TYPE-II CASEIN KINASE FROM DROSOPHILA-MELANOGASTER
JOURNAL OF BIOLOGICAL CHEMISTRY
1983; 258 (5): 3258-3265
Abstract
A cyclic nucleotide-independent protein kinase has been isolated from Drosophila melanogaster by chromatography on phosphocellulose and hydroxylapatite followed by gel filtration and glycerol gradient sedimentation. As determined by sodium dodecyl sulfate gel electrophoresis, the purified enzyme is greater than 95% homogeneous and is composed of two distinct subunits, alpha and beta, having Mr = 36,700 and 28,200, respectively. The native form of the enzyme is an alpha 2 beta 2 tetramer having a Stokes radius of 48 Å, a sedimentation coefficient of 6.4 S, and Mr approximately 130,000. The purified kinase undergoes an autocatalytic reaction resulting in the specific phosphorylation of the beta subunit, exhibits a low apparent Km for both ATP and GTP as nucleoside triphosphate donor (17 and 66 µM, respectively), phosphorylates both casein and phosvitin but neither histones nor protamine, modifies both serine and threonine residues in casein, and is strongly inhibited by heparin (I50 = 21 ng/ml). These properties are remarkably similar to those of casein kinase II, an enzyme previously described in several mammalian and avian species. The strong similarities among the insect, avian, and mammalian enzymes suggest that casein kinase II has been highly conserved during evolution.
View details for Web of Science ID A1983QE92000082
View details for PubMedID 6298230
-
MAXAMIZE - A DNA SEQUENCING STRATEGY ADVISOR
NUCLEIC ACIDS RESEARCH
1982; 10 (1): 295-304
Abstract
The MAXAMIZE advisory system determines from user-provided restriction maps an optimal strategy to do nucleotide sequencing by methods involving end-labeled fragments. The maps may be either simple linear restriction maps of fragments or complex circular maps including restriction sites of a vector. The whole system is interactive and is written in the Genetic English language provided by the GENESIS System, a molecular genetics knowledge representation and manipulation package. In addition, MAXAMIZE provides bookkeeping facilities for sequencing and offers advice on how to verify the newly obtained sequence data.
View details for Web of Science ID A1982MX56500027
View details for PubMedID 6278407
-
GENESIS, A KNOWLEDGE-BASED GENETIC-ENGINEERING SIMULATION SYSTEM FOR REPRESENTATION OF GENETIC DATA AND EXPERIMENT PLANNING
NUCLEIC ACIDS RESEARCH
1982; 10 (1): 323-340
Abstract
We have built a knowledge-based genetic engineering simulation system, GENESIS, capable of representing both domain-specific and general knowledge. Information is stored within a hierarchically-organized framework composed of structures called units. A series of sophisticated editors enables molecular geneticists who are not computer specialists to construct a knowledge base through direct interaction with the computer. Three types of knowledge specific to the domain of molecular genetics, MAPS, sequences and RULES, are discussed in detail with examples.
View details for Web of Science ID A1982MX56500029
View details for PubMedID 6950365
-
SEQ - A NUCLEOTIDE-SEQUENCE ANALYSIS AND RECOMBINATION SYSTEM
NUCLEIC ACIDS RESEARCH
1982; 10 (1): 279-294
Abstract
SEQ is an interactive, self-documenting computer program that contains procedures for the analysis of nucleotide sequences and the manipulation of such sequences to allow the simulation and prediction of the results of recombinant DNA experiments.
View details for Web of Science ID A1982MX56500026
View details for PubMedID 7063402
-
RIBONUCLEIC-ACID AND OTHER POLYANIONS FACILITATE CHROMATIN ASSEMBLY INVITRO
BIOCHEMISTRY
1981; 20 (9): 2594-2601
Abstract
Crude extracts of Drosophila embryos are a rich source of both DNA topoisomerase I and chromatin assembly activity [Nelson, T., Hsieh, T., & Brutlag, D.L. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 5510-5514; Hsieh, T., & Brutlag, D. L. (1980) Cell (Cambridge, Mass.) 21, 115-125]. Purified topoisomerase I from Drosophila embryos, however, is not sufficient for chromatin assembly. Rather, the ability of Drosophila embryo extracts to mediate chromatin assembly in vitro requires an anionic fraction which we demonstrate to be RNA. Exogenous natural and homopolymer RNAs, if of sufficient length, can also mediate chromatin assembly in vitro. The RNA acts stoichiometrically in assembly, being required in amounts at least equal in weight to the amount of histones present. Natural and homopolymer DNAs, whether single or double stranded, are inactive under the same conditions. The arginine-rich histones H3 and H4 or histone H4 alone is sufficient to produce nucleoprotein complexes with physiological numbers of supertwists in the DNA. Complexes containing these subsets of the core histones also resemble assembled complexes containing all four core histones with respect to some patterns of nuclease sensitivity, although complexes containing all four core histones more closely resemble native chromatin in nuclease digestions.
View details for Web of Science ID A1981LN09600035
View details for PubMedID 6165383
-
HISTONE ACETYLASE FROM DROSOPHILA-MELANOGASTER SPECIFIC FOR H-4
JOURNAL OF BIOLOGICAL CHEMISTRY
1981; 256 (9): 4578-4583
Abstract
Histone acetylation is a rapid and reversible modification which introduces significant changes in histone-DNA interactions. Such changes have been correlated with different states of DNA transcription and replication in the cell. We have purified a histone acetylase about 1200-fold from extracts of Drosophila melanogaster embryos. Major steps in the purification include chromatography on histone-Sepharose and Bio-Rex 70. This enzyme, the only histone acetylase detected in these extracts, acetylates only histone H4. All of the acetate groups are introduced within the NH2-terminal amino acids 4 to 17. This 14-residue peptide contains the four lysines which are acetylated in vivo. The acetylase is inhibited by its substrate, histone H4, and by several highly charged polymers including polylysine, polyarginine, DNA, RNA, and polyglutamic acid. It is not inhibited by polyethyleneimine, spermine, or the other histones H2A, H2B, H3, or H1. The enzyme does not acetylate H4 which is in chromatin. This enzyme is most likely involved in the acetylation of newly synthesized histones in the cytoplasm prior to chromatin assembly.
View details for Web of Science ID A1981LP45500071
View details for PubMedID 6783663
-
MOLECULAR ARRANGEMENT AND EVOLUTION OF HETEROCHROMATIC DNA
ANNUAL REVIEW OF GENETICS
1980; 14: 121-144
View details for Web of Science ID A1980KU80900006
View details for PubMedID 6260016
-
ATP-DEPENDENT DNA TOPOISOMERASE FROM DROSOPHILA-MELANOGASTER REVERSIBLY CATENATES DUPLEX DNA RINGS
CELL
1980; 21 (1): 115-125
Abstract
Extracts of Drosophila embryos contain an enzymatic activity that converts circular DNAs into huge networks of catenated rings in an ATP-dependent fashion. The catenation activity is resolved into two protein components during purification. One component is a novel DNA topoisomerase that requires the presence of ATP in order to relax supercoiled DNA. We have shown that the ATP-dependent DNA topoisomerase relaxes DNA by a mechanism distinct from that of nicking-closing enzymes. The Drosophila ATP-dependent topoisomerase allows one segment of a circular DNA to pass through transient breaks in both strands at another site on the DNA circle without any relative rotation between the ends at the transient break. This mechanism can convert negative supertwists to positive twists and vice versa until a relaxed equilibrium state is reached. The formation of catenated rings is mediated by an analogous bimolecular reaction which can occur between two nonhomologous DNA circles. The catenation reaction is fully reversible: in the presence of the second protein component, circular DNA is converted quantitatively into catenated forms; in its absence, the ATP-dependent topoisomerase resolves catenated networks back into monomer circles. The Drosophila ATP-dependent topoisomerase appears to be closely related to E. coli DNA gyrase in that both use a similar mechanism to change the topology of DNA, both require ATP and both are inhibited by the antibiotic novobiocin. The presence of an enzyme that allows one DNA helix to pass freely through another could not only be useful in relaxation of topological constraints, but also may be involved in the folding and unfolding of eucaryotic chromosomes.
View details for Web of Science ID A1980KE07200011
View details for PubMedID 6250707
-
Addition of homopolymers to the 3'-ends of duplex DNA with terminal transferase.
Methods in enzymology
1979; 68: 41-50
View details for PubMedID 542126
-
EXTRACTS OF DROSOPHILA EMBRYOS MEDIATE CHROMATIN ASSEMBLY INVITRO
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1979; 76 (11): 5510-5514
Abstract
Extracts of Drosophila embryos can mediate the assembly of a chromatinlike structure from histones and DNA under physiological conditions. The histone-DNA complex formed in vitro contains micrococcal nuclease-sensitive sites spaced at 200-base pair intervals. More extensive digestion of the complex by micrococcal nuclease generates 11S particles which cosediment with nucleosome core particles isolated from native chromatin. These particles contain 140-base pair DNA fragments which upon further cleavage with micrococcal nuclease give rise to a pattern of discretely sized DNA fragments characteristic of nucleosome core particles. We have assayed the chromatin assembly process both qualitatively by measuring the induction of supertwists into a relaxed circular DNA (a process requiring a nicking-closing enzyme) and quantitatively by measuring the formation of micrococcal nuclease-resistant DNA fragments from radioactively labeled linear DNA. The amount of chromatin formed depends primarily on the amount of histones, whereas the rate of assembly depends on the amount of extract protein added. The factors in the extract that mediate chromatin assembly appear to interact first with the DNA because preincubation of the DNA with the extract markedly increases the extent of assembly.
View details for Web of Science ID A1979HW11500022
View details for PubMedID 118449
-
PROTEIN THAT PREFERENTIALLY BINDS DROSOPHILA SATELLITE DNA
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1979; 76 (2): 726-730
Abstract
Using a nitrocellulose filter binding assay, we have detected and partially purified a protein from embryos of Drosophila melanogaster that preferentially binds to a highly repeated satellite DNA of the same species. Formation of the satellite DNA-protein complex requires physiological conditions of salt and temperature, but once formed, the complex is stable in high salt (1 M NaCl) or at low temperature. Optimal formation of the specific complex also requires the satellite DNA to be in a supertwisted conformation. The protein interacts with a limited region within the 359-base-pair repeated sequence of the satellite DNA.
View details for Web of Science ID A1979GL33900039
View details for PubMedID 106393
-
SEQUENCE AND SEQUENCE VARIATION WITHIN THE 1.688 G-CM3 SATELLITE DNA OF DROSOPHILA-MELANOGASTER
JOURNAL OF MOLECULAR BIOLOGY
1979; 135 (2): 465-481
View details for Web of Science ID A1979JA28200009
View details for PubMedID 231676
-
DETECTION AND RESOLUTION OF CLOSELY RELATED SATELLITE DNA-SEQUENCES BY MOLECULAR-CLONING
JOURNAL OF MOLECULAR BIOLOGY
1979; 135 (3): 581-593
View details for Web of Science ID A1979HZ70300004
View details for PubMedID 119872
-
DIFFERENT REGIONS OF A COMPLEX SATELLITE DNA VARY IN SIZE AND SEQUENCE OF THE REPEATING UNIT
JOURNAL OF MOLECULAR BIOLOGY
1979; 135 (2): 483-500
View details for Web of Science ID A1979JA28200010
View details for PubMedID 231677
-
SEQUENCES OF THE 1.672G-CM3 SATELLITE DNA OF DROSOPHILA-MELANOGASTER
JOURNAL OF MOLECULAR BIOLOGY
1979; 135 (3): 565-580
View details for Web of Science ID A1979HZ70300003
View details for PubMedID 119871
-
NICKING-CLOSING ENZYME ASSEMBLES NUCLEOSOME-LIKE STRUCTURES INVITRO
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1979; 76 (8): 3779-3783
Abstract
The four core histones (H2A, H2B, H3, and H4) and DNA were assembled into nucleosome-like particles at physiological ionic strengths either by an extract of chromatin rich in nicking-closing activity or by the purified nicking-closing enzyme itself. When histone-DNA complexes were assembled in vitro from relaxed circular DNA, nearly physiological numbers of superhelical turns were induced in the DNA molecule. Electron microscopy of the complexes assembled by the chromatin extract revealed a beaded structure and a reduction of the contour length compared to free DNA. Micrococcal nuclease digestion of the histone-DNA complexes yielded 145-base-pair DNA fragments typical of nucleosome core particles and shorter subnucleosomal DNA fragments of discrete length.
View details for Web of Science ID A1979HJ25800045
View details for PubMedID 226980
-
A gene adjacent to satellite DNA in Drosophila melanogaster.
Proceedings of the National Academy of Sciences of the United States of America
1978; 75 (12): 5898-5902
Abstract
Several copies of a sequence adjacent to 1.688 g/cm3 satellite DNA in the Drosophila melanogaster genome have been isolated by molecular cloning. This sequence, called the Dm142 gene, is homologous to a 1.6-kilobase RNA found in both D. melanogaster embryos and tissue culture cells. One cloned DNA segment includes two copies of the Dm142 gene and 1.688 g/cm3 satellite DNA sequences, which are located between and flanking both gene copies. The Dm142 gene is repeated many times in the D. melanogaster genome, and some copies are not flanked by 1.688 g/cm3 satellite DNA.
View details for PubMedID 104294
View details for PubMedCentralID PMC393083
-
GENE ADJACENT TO SATELLITE DNA IN DROSOPHILA-MELANOGASTER
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1978; 75 (12): 5898-5902
View details for Web of Science ID A1978GC25200035
-
DNA sequence organization in Drosophila heterochromatin.
Cold Spring Harbor symposia on quantitative biology
1978; 42: 1137-1146
View details for PubMedID 98265
-
ONE OF COPIA GENES IS ADJACENT TO SATELLITE DNA IN DROSOPHILA-MELANOGASTER
CELL
1978; 15 (3): 733-742
Abstract
A method for purifying sequences adjacent to satellite DNA in the heterochromatin of D. melanogaster is described. A cloned DNA segment containing part of a copia gene adjacent to 1.688 g/cm3 satellite DNA has been isolated. The copia genes compose a repeated gene family which codes for abundant cytoplasmic poly(A)-containing RNA (Young and Hogness, 1977; Finnegan et al., 1978). We have identified two major poly(A)-containing RNA species [5.2 and 2.1 kilobases (kb)] produced by the copia gene family. The cloned segment contains copia sequences homologous to the 5' end of RNA within 0.65 kb of the 1.688 satellite DNA sequences. Seven different cloned copia genes from elsewhere in the genome have also been isolated, and a 5.2 kb region present in five of the clones was identified as copia by heteroduplex analysis. In addition, three unusual copies of copia were found: a "partial" copy of the gene (3.7 kb) which has one endpoint in common with the 5.2 kb unit; a copia gene flanked on one side by a 1.6 kb sequence and on the other by the same 1.6 kb sequence in the inverted orientation; and a copia gene flanked only on one side by the same sequence.
View details for Web of Science ID A1978FX85700003
View details for PubMedID 103627
-
HIGHLY REPEATED DNA IN DROSOPHILA-MELANOGASTER
JOURNAL OF MOLECULAR BIOLOGY
1977; 112 (1): 31-47
View details for Web of Science ID A1977DH07000002
View details for PubMedID 407366
-
CLONING AND CHARACTERIZATION OF A COMPLEX SATELLITE DNA FROM DROSOPHILA-MELANOGASTER
CELL
1977; 11 (2): 371-381
View details for Web of Science ID A1977DL69200014
View details for PubMedID 408008
-
MITOCHONDRIAL-DNA OF DROSOPHILA-MELANOGASTER EXISTS IN 2 DISTINCT AND STABLE SUPERHELICAL FORMS
CELL
1977; 12 (2): 471-482
Abstract
We have studied the structure and replication of mitochondrial DNA from Drosophila melanogaster embryos, larvae, adult flies and two established tissue culture lines. The most striking observation is that the organism maintains at least two stable, distinct closed circular forms of mitochondrial DNA throughout development of the early embryo and in the adult fly. The major closed circular monomeric form comprises approximately 75% of the population and has a normal number of superhelical turns. In contrast, closed circular mitochondrial DNA isolated from Drosophila tissue culture cells is comprised almost entirely of molecules with the low superhelix density. We have been unable to detect the D loop form of mitochondrial DNA present in other eucaryotic systems, and find by electron microscope and pulse-chase labeling techniques that the time required to replicate Drosophila mitochondrial DNA is very short (less than 15 min) compared to the mouse L cell system (greater than 1 hr). We conclude that Drosophila mitochondrial DNA utilizes a replication mechanism different from that of other higher eucaryotes. We postulate that the maintenance of markedly different topological forms of mitochondrial DNA is most probably related to different demands for replication and transcription of the genome in these sources.
View details for Web of Science ID A1977DY09600015
View details for PubMedID 410503
-
SYNTHESIS OF HYBRID BACTERIAL PLASMIDS CONTAINING HIGHLY REPEATED SATELLITE DNA
CELL
1977; 10 (3): 509-519
Abstract
Hybrid plasmid molecules containing tandemly repeated Drosophila satellite DNA were constructed using a modification of the (dA)-(dT) homopolymer procedure of Lobban and Kaiser (1973). Recombinant plasmids recovered after transformation of recA bacteria contained 10% of the amount of satellite DNA present in the transforming molecules. The cloned plasmids were not homogeneous in size. Recombinant plasmids isolated from a single colony contained populations of circular molecules which varied both in the length of the satellite region and in the poly(dA)-(dT) regions linking satellite and vector. While subcloning reduced the heterogeneity of these plasmid populations, continued cell growth caused further variations in the size of the repeated regions. Two different simple sequence satellites of Drosophila melanogaster (1.672 and 1.705 g/cm3) were unstable in both recA and recBC hosts and in both pSC101 and pCR1 vectors. We propose that this recA-independent instability of tandemly repeated sequences is due to unequal intramolecular recombination events in replicating DNA molecules, a mechanism analogous to sister chromatid exchange in eucaryotes.
View details for Web of Science ID A1977CZ96300018
View details for PubMedID 403010
-
DNA-SEQUENCE ORGANIZATION IN DROSOPHILA HETEROCHROMATIN
COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY
1977; 42: 1137-1146
View details for Web of Science ID A1977FK40800052
-
The organization of highly repeated DNA sequences in Drosophila melanogaster chromosomes.
Cold Spring Harbor symposia on quantitative biology
1974; 38: 405-416
View details for PubMedID 4133985
-
Initiation of deoxyribonucleic acid synthesis. IV. Incorporation of the ribonucleic acid primer into the phage replicative form.
Journal of biological chemistry
1973; 248 (4): 1361-1364
View details for PubMedID 4568814
-
ORGANIZATION OF HIGHLY REPEATED DNA SEQUENCES IN DROSOPHILA-MELANOGASTER CHROMOSOMES
COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY
1973; 38: 405-416
View details for Web of Science ID A1973S874900041
-
INITIATION OF DEOXYRIBONUCLEIC ACID SYNTHESIS .4. INCORPORATION OF RIBONUCLEIC-ACID PRIMER INTO PHAGE REPLICATIVE FORM
JOURNAL OF BIOLOGICAL CHEMISTRY
1973; 248 (4): 1361-1364
View details for Web of Science ID A1973O892500035
-
DEOXYRIBONUCLEIC ACID POLYMERASE - 2 DISTINCT ENZYMES IN ONE POLYPEPTIDE .1. PROTEOLYTIC FRAGMENT CONTAINING POLYMERASE AND 3'→5' EXONUCLEASE FUNCTIONS
JOURNAL OF BIOLOGICAL CHEMISTRY
1972; 247 (1): 224-?
View details for Web of Science ID A1972L468000029
View details for PubMedID 4552924
-
ENZYMATIC-SYNTHESIS OF DEOXYRIBONUCLEIC ACID .36. PROOF-READING FUNCTION FOR 3'→5' EXONUCLEASE ACTIVITY IN DEOXYRIBONUCLEIC ACID POLYMERASES
JOURNAL OF BIOLOGICAL CHEMISTRY
1972; 247 (1): 241-?
View details for Web of Science ID A1972L468000031
View details for PubMedID 4336040
-
RNA SYNTHESIS INITIATES IN-VITRO CONVERSION OF M13 DNA TO ITS REPLICATIVE FORM
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1972; 69 (4): 965-?
Abstract
Soluble enzyme fractions from uninfected Escherichia coli convert M13 and φX174 viral single strands to their double-stranded replicative forms. Rifampicin, an inhibitor of RNA polymerase, blocks conversion of M13 single strands to the replicative forms in vivo and in vitro. However, rifampicin does not block synthesis of the replicative forms of φX174 either in vivo or in soluble extracts. The replicative form of M13 synthesized in vitro consists of a full-length, linear, complementary strand annealed to a viral strand. The conversion of single strands of M13 to the replicative form proceeds in two separate stages. The first stage requires enzymes, ribonucleoside triphosphates, and single-stranded DNA; the reaction is inhibited by rifampicin. The macromolecular product separated at this stage supports DNA synthesis with deoxyribonucleoside triphosphates and a fresh addition of enzymes; ribonucleoside triphosphates are not required in this second stage nor does rifampicin inhibit the reaction. We presume that in the first stage there is synthesis of a short RNA chain, which then primes the synthesis of a replicative form by a DNA polymerase.
View details for Web of Science ID A1972M193300045
View details for PubMedID 4554537
-
INITIATION OF DNA-SYNTHESIS .3. SYNTHESIS OF PHIX174 REPLICATIVE FORM REQUIRES RNA SYNTHESIS RESISTANT TO RIFAMPICIN
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1972; 69 (9): 2691-?
Abstract
Conversion of single-stranded DNA of phage φX174 to the double-stranded replicative form in Escherichia coli uses enzymes essential for initiation and replication of the host chromosome. These enzymes can now be purified by the assay that this phage system provides. The φX174 conversion is distinct from that of M13. The reaction requires different host enzymes and is resistant to rifampicin and streptolydigin, inhibitors of RNA polymerase. However, RNA synthesis is essential for φX174 DNA synthesis: the reaction is inhibited by low concentrations of actinomycin D, all four ribonucleoside triphosphates are required, and an average of one phosphodiester bond links DNA to RNA in the isolated double-stranded circles. Thus, we presume that, as in the case of M13, synthesis of a short RNA chain primes the synthesis of a replicative form by DNA polymerase. Initiation of DNA synthesis by RNA priming is a mechanism of wide significance.
View details for Web of Science ID A1972N512500079
View details for PubMedID 4560696
-
POSSIBLE ROLE FOR RNA POLYMERASE IN INITIATION OF M13 DNA SYNTHESIS
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1971; 68 (11): 2826-?
Abstract
The conversion of single-stranded DNA of bacteriophage M13 to the double-stranded replicative form in Escherichia coli is blocked by rifampicin, an antibiotic that specifically inhibits the host-cell RNA polymerase. Chloramphenicol, an inhibitor of protein synthesis, does not block this conversion. The next stage in phage DNA replication, multiplication of the double-stranded forms, is also inhibited by rifampicin; chloramphenicol, although inhibitory, has a much smaller effect. An E. coli mutant whose RNA polymerase is resistant to rifampicin action does not show inhibition of M13 DNA replication by rifampicin. These findings indicate that a specific rifampicin-RNA polymerase interaction is responsible for blocking new DNA synthesis. It now seems plausible that RNA polymerase has some direct role in the initiation of DNA replication, perhaps by forming a primer RNA that serves for covalent attachment of the deoxyribonucleotide that starts the new DNA chain.
View details for Web of Science ID A1971K842500050
View details for PubMedID 4941987
-
PROPERTIES OF FORMALDEHYDE-TREATED NUCLEOHISTONE
BIOCHEMISTRY
1969; 8 (8): 3214-?
View details for Web of Science ID A1969D883000013
View details for PubMedID 5809221
-
AN ACTIVE FRAGMENT OF DNA POLYMERASE PRODUCED BY PROTEOLYTIC CLEAVAGE
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
1969; 37 (6): 982-?
View details for Web of Science ID A1969E770800018
View details for PubMedID 4982877
-
Properties of chromosomal nonhistone protein of rat liver.
Biochemistry
1968; 7 (9): 3149-3155
View details for PubMedID 5684341
-
PROPERTIES OF CHROMOSOMAL NONHISTONE PROTEIN OF RAT LIVER
BIOCHEMISTRY
1968; 7 (9): 3149-?
View details for Web of Science ID A1968B745500018