1977 B.A., Chemistry and Biology, University of Rochester, NY
1978-1982 Ph.D. California Institute of Technology, CA Advisor: Dr. Norman Davidson
1982-1986 Postdoctoral Research Stanford University School of Medicine, CA Advisor: Dr. Ronald Davis
1986-2009 Faculty Dept of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT
2009-present Dept of Genetics, Stanford University School of Medicine, Stanford, CA
Chair, Dept. of Genetics (2009 - Present)
Director, Center for Genomics and Personalized Medicine (2009 - Present)
Current Research and Scholarly Interests
We are presently in an omics revolution in which genomes and other omes can be readily characterized. Our laboratory uses a variety of approaches to analyze genomes and regulatory networks. Our research focuses on yeast, a model organism ideally suited to genetic analysis, and on humans.
To annotate genomes, we developed RNA sequencing for annotating the yeast and human transcriptomes. We discovered that the eukaryotic transcriptome is much more complex than previously appreciated and that embryonic stem cells have more transcript isoforms than differentiated cells.
2) Transcription Factor Binding Networks
We have also developed methods for mapping transcription factor binding sites throughout the genome. We used these methods to build regulatory maps and are using them to help decipher the combinatorial regulatory code, that is, which factors work together to regulate which genes. Using this approach we have mapped out pathways crucial for metabolism and inflammation.
3) Integrated Regulatory Networks
In addition to transcription factor binding networks, we have also been mapping phosphorylation and metabolite-protein interaction networks. These studies have revealed novel global regulators and key points in integrated regulatory networks.
We have been analyzing differences between individuals and species at two levels: DNA sequence variation and variation in regulatory information. We developed paired-end sequencing for humans and found that humans have extensive structural variation (SV), i.e., deletions, insertions and inversions. This is likely to be a major cause of phenotypic variation and human disease. In addition, by mapping binding site differences among different yeast strains and among humans, we have found that individuals differ much more in their regulatory information than in their coding sequences. We can correlate these differences with those in SNPs and SVs, thereby associating noncoding DNA differences with regulatory information.
5) Human Disease
Finally, we are applying omics approaches (genome sequencing, transcriptomics, proteomics, metabolomics, DNA methylation and microbiome assays) to the analysis of human disease. These integrative omics approaches are being applied to help understand the molecular basis of disease and to support the development of diagnostics and therapeutics.
Understanding and Diagnosing Allergic Disease in Twins
The purpose of this study is to gain better understanding of how the immune system works in twins with and without allergic disease. Healthy volunteers are not specifically targeted. Healthy non-allergic study participants may be found through the course of evaluation for the presence of allergies.
Independent Studies (18)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Genetics
GENE 299 (Aut, Win, Spr, Sum)
- Directed Reading in Immunology
IMMUNOL 299 (Win, Spr)
- Directed Reading in Stem Cell Biology and Regenerative Medicine
STEMREM 299 (Aut, Win, Spr)
- Early Clinical Experience in Immunology
IMMUNOL 280 (Win, Spr)
- Graduate Research
GENE 399 (Aut, Win, Spr, Sum)
- Graduate Research
IMMUNOL 399 (Win, Spr)
- Graduate Research
STEMREM 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
GENE 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
STEMREM 370 (Aut, Win, Spr)
- Out-of-Department Advanced Research Laboratory in Experimental Biology
BIO 199X (Aut, Win, Spr)
- Supervised Study
GENE 260 (Aut, Win, Spr, Sum)
- Teaching in Immunology
IMMUNOL 290 (Win, Spr)
- Undergraduate Research
GENE 199 (Aut, Win, Spr, Sum)
- Undergraduate Research
IMMUNOL 199 (Win, Spr)
- Undergraduate Research
STEMREM 199 (Aut, Win, Spr)
Doctoral Dissertation Reader (AC)
Postdoctoral Faculty Sponsor
Charles Abbott, Sara Ahadi, Gireesh Bogu, Alessandra Breschi, Can Cenik, James Chappell, Justin Chen, Songjie Chen, Jessilyn Dunn, Jijuan Gu, Daniel Hornburg, Aaron Horning, Jingga Inlora, Chao Jiang, Brian Johnson, Ryan Kellogg, Samuel Lancaster, Brittany Lee, Hayan Lee, Xiao Li, Liang Liang, Andrew Lipchik, Qing Liu, David Marciano, Tejaswini Mishra, Emma Monte, Anil Narasimha, Baoxu Pang, Jeniffer Quijada, Ashwin Ram, Morteza Roodgar, Mohammad Reza Sailani, Donald Sharon, Ming Shian Tsai, Kevin Van Bortle, Ting Wang, Si Wu, Sai Zhang, Bingqing Zhao, Wenyu Zhou
Fetal de novo mutations and preterm birth.
2017; 13 (4)
Preterm birth (PTB) affects ~12% of pregnancies in the US. Despite its high mortality and morbidity, the molecular etiology underlying PTB has been unclear. Numerous studies have been devoted to identifying genetic factors in maternal and fetal genomes, but so far few genomic loci have been associated with PTB. By analyzing whole-genome sequencing data from 816 trio families, for the first time, we observed the role of fetal de novo mutations in PTB. We observed a significant increase in de novo mutation burden in PTB fetal genomes. Our genomic analyses further revealed that genes affected by PTB de novo mutations were dosage sensitive and intolerant to genomic deletions, and that their mouse orthologs were likely developmentally essential. These genes were significantly involved in early fetal brain development, which was further supported by our analysis of copy number variants identified from an independent PTB cohort. Our study indicates a new mechanism in PTB occurrence independently contributed from fetal genomes, and thus opens a new avenue for future PTB research.
View details for DOI 10.1371/journal.pgen.1006689
View details for PubMedID 28388617
De novo and rare mutations in the HSPA1L heat shock gene associated with inflammatory bowel disease
Inflammatory bowel disease (IBD) is a chronic, relapsing inflammatory disease of the gastrointestinal tract which includes ulcerative colitis and Crohn's disease. Genetic risk factors for IBD are not well understood. We performed a family-based whole exome sequencing (WES) analysis on a core family (Family A) to identify potential causal mutations and then analyzed exome data from a Caucasian pediatric cohort (136 patients and 106 controls) to validate the presence of mutations in the candidate gene, heat shock 70 kDa protein 1-like (HSPA1L). Biochemical assays of the de novo and rare (minor allele frequency, MAF < 0.01) mutation variant proteins further validated the predicted deleterious effects of the identified alleles. In the proband of Family A, we found a heterozygous de novo mutation (c.830C > T; p.Ser277Leu) in HSPA1L. Through analysis of WES data of 136 patients, we identified five additional rare HSPA1L mutations (p.Gly77Ser, p.Leu172del, p.Thr267Ile, p.Ala268Thr, p.Glu558Asp) in six patients. In contrast, rare HSPA1L mutations were not observed in controls, and were significantly enriched in patients (P = 0.02). Interestingly, we did not find non-synonymous rare mutations in the HSP70 isoforms HSPA1A and HSPA1B. Biochemical assays revealed that all six rare HSPA1L variant proteins showed decreased chaperone activity in vitro. Moreover, three variants demonstrated dominant negative effects on HSPA1L and HSPA1A protein activity. Our results indicate that de novo and rare mutations in HSPA1L are associated with IBD and provide insights into the pathogenesis of IBD, and also expand our understanding of the roles of HSP70s in human disease.
View details for DOI 10.1186/s13073-016-0394-9
View details for Web of Science ID 000393834000001
View details for PubMedID 28126021
Digital Health: Tracking Physiomes and Activity Using Wearable Biosensors Reveals Useful Health-Related Information.
2017; 15 (1)
A new wave of portable biosensors allows frequent measurement of health-related physiology. We investigated the use of these devices to monitor human physiological changes during various activities and their role in managing health and diagnosing and analyzing disease. By recording over 250,000 daily measurements for up to 43 individuals, we found personalized circadian differences in physiological parameters, replicating previous physiological findings. Interestingly, we found striking changes in particular environments, such as airline flights (decreased peripheral capillary oxygen saturation [SpO2] and increased radiation exposure). These events are associated with physiological macro-phenotypes such as fatigue, providing a strong association between reduced pressure/oxygen and fatigue on high-altitude flights. Importantly, we combined biosensor information with frequent medical measurements and made two important observations: First, wearable devices were useful in identification of early signs of Lyme disease and inflammatory responses; we used this information to develop a personalized, activity-based normalization framework to identify abnormal physiological signals from longitudinal data for facile disease detection. Second, wearables distinguish physiological differences between insulin-sensitive and -resistant individuals. Overall, these results indicate that portable biosensors provide useful information for monitoring personal activities and physiology and are likely to play an important role in managing health and enabling affordable health care access to groups traditionally limited by socioeconomic class or remote geography.
View details for DOI 10.1371/journal.pbio.2001402
View details for PubMedID 28081144
Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers.
Cell stem cell
In familial pulmonary arterial hypertension (FPAH), the autosomal dominant disease-causing BMPR2 mutation is only 20% penetrant, suggesting that genetic variation provides modifiers that alleviate the disease. Here, we used comparison of induced pluripotent stem cell-derived endothelial cells (iPSC-ECs) from three families with unaffected mutation carriers (UMCs), FPAH patients, and gender-matched controls to investigate this variation. Our analysis identified features of UMC iPSC-ECs related to modifiers of BMPR2 signaling or to differentially expressed genes. FPAH-iPSC-ECs showed reduced adhesion, survival, migration, and angiogenesis compared to UMC-iPSC-ECs and control cells. The "rescued" phenotype of UMC cells was related to an increase in specific BMPR2 activators and/or a reduction in inhibitors, and the improved cell adhesion could be attributed to preservation of related signaling. The improved survival was related to increased BIRC3 and was independent of BMPR2. Our findings therefore highlight protective modifiers for FPAH that could help inform development of future treatment strategies.
View details for DOI 10.1016/j.stem.2016.08.019
View details for PubMedID 28017794
Simul-seq: combined DNA and RNA sequencing for whole-genome and transcriptome profiling.
2016; 13 (11): 953-958
Paired DNA and RNA profiling is increasingly employed in genomics research to uncover molecular mechanisms of disease and to explore personal genotype and phenotype correlations. Here, we introduce Simul-seq, a technique for the production of high-quality whole-genome and transcriptome sequencing libraries from small quantities of cells or tissues. We apply the method to laser-capture-microdissected esophageal adenocarcinoma tissue, revealing a highly aneuploid tumor genome with extensive blocks of increased homozygosity and corresponding increases in allele-specific expression. Among this widespread allele-specific expression, we identify germline polymorphisms that are associated with response to cancer therapies. We further leverage this integrative data to uncover expressed mutations in several known cancer genes as well as a recurrent mutation in the motor domain of KIF3B that significantly affects kinesin-microtubule interactions. Simul-seq provides a new streamlined approach for generating comprehensive genome and transcriptome profiles from limited quantities of clinically relevant samples.
View details for DOI 10.1038/nmeth.4028
View details for PubMedID 27723755
- Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations NATURE GENETICS 2016; 48 (2): 117-125
Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome.
2016; 34 (1): 64-69
Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.
View details for DOI 10.1038/nbt.3416
View details for PubMedID 26655498
Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features.
2016; 7: 12474
Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients' prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs.
View details for DOI 10.1038/ncomms12474
View details for PubMedID 27527408
- Identification of Human Neuronal Protein Complexes Reveals Biochemical Activities and Convergent Mechanisms of Action in Autism Spectrum Disorders CELL SYSTEMS 2015; 1 (5): 361-374
- Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions CELL 2015; 162 (5): 1051-1065
Recurrent somatic mutations in regulatory regions of human cancer genomes.
2015; 47 (7): 710-716
Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus-specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.
View details for DOI 10.1038/ng.3332
View details for PubMedID 26053494
Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events
2015; 33 (7): 736-742
Alternative splicing shapes mammalian transcriptomes, with many RNA molecules undergoing multiple distant alternative splicing events. Comprehensive transcriptome analysis, including analysis of exon co-association in the same molecule, requires deep, long-read sequencing. Here we introduce an RNA sequencing method, synthetic long-read RNA sequencing (SLR-RNA-seq), in which small pools (≤1,000 molecules/pool, ≤1 molecule/gene for most genes) of full-length cDNAs are amplified, fragmented and short-read-sequenced. We demonstrate that these RNA sequences reconstructed from the short reads from each of the pools are mostly close to full length and contain few insertion and deletion errors. We report many previously undescribed isoforms (human brain: ∼13,800 affected genes, 14.5% of molecules; mouse brain ∼8,600 genes, 18% of molecules) and up to 165 human distant molecularly associated exon pairs (dMAPs) and distant molecularly and mutually exclusive pairs (dMEPs). Of 16 associated pairs detected in the mouse brain, 9 are conserved in human. Our results indicate conserved mechanisms that can produce distant but phased features on transcript and proteome isoforms.
View details for DOI 10.1038/nbt.3242
View details for Web of Science ID 000358396100029
Comparison of the transcriptional landscapes between human and mouse tissues
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2014; 111 (48): 17224-17229
Although the similarities between humans and mice are typically highlighted, morphologically and genetically, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.
View details for DOI 10.1073/pnas.1413624111
View details for Web of Science ID 000345920800059
View details for PubMedID 25413365
View details for PubMedCentralID PMC4260565
- Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature 2014; 512 (7515): 400-405
- Comparative analysis of regulatory information and circuits across distant species. Nature 2014; 512 (7515): 453-456
- Defining a personal, allele-specific, and single-molecule long-read transcriptome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 2014; 111 (27): 9869-9874
- Clinical interpretation and implications of whole-genome sequencing. JAMA : the journal of the American Medical Association 2014; 311 (10): 1035-1045
Divergence in a master variator generates distinct phenotypes and transcriptional responses
GENES & DEVELOPMENT
2014; 28 (4): 409-421
The genetic basis of phenotypic differences in individuals is an important area in biology and personalized medicine. Analysis of divergent Saccharomyces cerevisiae strains grown under different conditions revealed extensive variation in response to both drugs (e.g., 4-nitroquinoline 1-oxide [4NQO]) and different carbon sources. Differences in 4NQO resistance were due to amino acid variation in the transcription factor Yrr1. Yrr1(YJM789) conferred 4NQO resistance but caused slower growth on glycerol, and vice versa with Yrr1(S96), indicating that alleles of Yrr1 confer distinct phenotypes. The binding targets of Yrr1 alleles from diverse yeast strains varied considerably among different strains grown under the same conditions as well as for the same strain under different conditions, indicating that distinct molecular programs are conferred by the different Yrr1 alleles. Our results demonstrate that genetic variations in one important control gene (YRR1) lead to distinct regulatory programs and phenotypes in individuals. We term these polymorphic control genes "master variators."
View details for DOI 10.1101/gad.228940.113
View details for Web of Science ID 000331616100009
View details for PubMedID 24532717
Integrated systems analysis reveals a molecular network underlying autism spectrum disorders.
Molecular systems biology
2014; 10 (12): 774
Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized with a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA-sequencing of the corpus callosum from patients with autism exhibited extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology.
View details for DOI 10.15252/msb.20145487
View details for PubMedID 25549968
Extensive Variation in Chromatin States Across Humans
2013; 342 (6159): 750-752
The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.
View details for DOI 10.1126/science.1242510
View details for Web of Science ID 000326647600047
View details for PubMedID 24136358
A single-molecule long-read survey of the human transcriptome.
2013; 31 (11): 1009-1014
Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5' to 3' end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5' ends. For longer RNA molecules more 5' nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.
View details for DOI 10.1038/nbt.2705
View details for PubMedID 24108091
Dynamic trans-Acting Factor Colocalization in Human Cells
2013; 155 (3): 713-724
Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.
View details for DOI 10.1016/j.cell.2013.09.043
View details for Web of Science ID 000326571800023
View details for PubMedID 24243024
Whole-exome sequencing identifies tetratricopeptide repeat domain 7A (TTC7A) mutations for combined immunodeficiency with intestinal atresias.
Journal of Allergy and Clinical Immunology
2013; 132 (3): 656-664.e17
Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects. We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families. We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA. Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA. We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.
View details for DOI 10.1016/j.jaci.2013.06.013
View details for PubMedID 23830146
Systematic functional regulatory assessment of disease-associated variants.
Proceedings of the National Academy of Sciences of the United States of America
2013; 110 (23): 9607-9612
Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.
View details for DOI 10.1073/pnas.1219099110
View details for PubMedID 23690573
Specific plasma autoantibody reactivity in myelodysplastic syndromes.
2013; 3: 3311
Increased autoantibody reactivity in plasma from Myelodysplastic Syndromes (MDS) patients may provide novel disease signatures and enable possible early detection. In a two-stage study we investigated Immunoglobulin G reactivity in plasma from MDS, Acute Myeloid Leukemia post MDS patients, and a healthy cohort. In exploratory Stage I we utilized high-throughput protein arrays to identify 35 high-interest proteins showing increased reactivity in patient subgroups compared to healthy controls. In validation Stage II we designed new arrays focusing on 25 of the proteins identified in Stage I and expanded the initial cohort. We validated increased antibody reactivity against AKT3, FCGR3A and ARL8B in patients, which enabled sample classification into stable MDS and healthy individuals. We also detected elevated AKT3 protein levels in MDS patient plasma. The discovery of increased specific autoantibody reactivity in MDS patients provides molecular signatures for classification, supplementing existing risk categorizations, and may enhance diagnostic and prognostic capabilities for MDS.
View details for DOI 10.1038/srep03311
View details for PubMedID 24264604
Extensive genetic variation in somatic human tissues
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (44): 18018-18023
Genetic variation between individuals has been extensively investigated, but differences between tissues within individuals are far less understood. It is commonly assumed that all healthy cells that arise from the same zygote possess the same genomic content, with a few known exceptions in the immune system and germ line. However, a growing body of evidence shows that genomic variation exists between differentiated tissues. We investigated the scope of somatic genomic variation between tissues within humans. Analysis of copy number variation by high-resolution array-comparative genomic hybridization in diverse tissues from six unrelated subjects reveals a significant number of intraindividual genomic changes between tissues. Many (79%) of these events affect genes. Our results have important consequences for understanding normal genetic and phenotypic variation within individuals, and they have significant implications for both the etiology of genetic diseases such as cancer and for immortalized cell lines that might be used in research and therapeutics.
View details for DOI 10.1073/pnas.1213736109
View details for Web of Science ID 000311149900070
View details for PubMedID 23043118
An integrated encyclopedia of DNA elements in the human genome
2012; 489 (7414): 57-74
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
View details for DOI 10.1038/nature11247
View details for Web of Science ID 000308347000039
View details for PubMedID 22955616
View details for PubMedCentralID PMC3439153
Architecture of the human regulatory network derived from ENCODE data
2012; 489 (7414): 91-100
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
View details for DOI 10.1038/nature11245
View details for Web of Science ID 000308347000042
View details for PubMedID 22955619
Linking disease associations with regulatory information in the human genome
2012; 22 (9): 1748-1759
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
View details for DOI 10.1101/gr.136127.111
View details for Web of Science ID 000308272800016
View details for PubMedID 22955986
Annotation of functional variation in personal genomes using RegulomeDB
2012; 22 (9): 1790-1797
As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 full sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
View details for DOI 10.1101/gr.137323.112
View details for Web of Science ID 000308272800019
View details for PubMedID 22955989
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia
2012; 22 (9): 1813-1831
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
View details for DOI 10.1101/gr.136184.111
View details for Web of Science ID 000308272800021
View details for PubMedID 22955991
View details for PubMedCentralID PMC3431496
Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes
2012; 148 (6): 1293-1307
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
View details for DOI 10.1016/j.cell.2012.02.009
View details for Web of Science ID 000301889500023
View details for PubMedID 22424236
View details for PubMedCentralID PMC3341616
Detecting and annotating genetic variations using the HugeSeq pipeline
NATURE BIOTECHNOLOGY
2012; 30 (3): 226-229
Extensive Promoter-Centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation
2012; 148 (1-2): 84-98
Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.
View details for DOI 10.1016/j.cell.2011.12.014
View details for Web of Science ID 000299540700016
View details for PubMedID 22265404
View details for PubMedCentralID PMC3339270
Performance comparison of whole-genome sequencing platforms
NATURE BIOTECHNOLOGY
2012; 30 (1): 78-U118
Dissecting phosphorylation networks: lessons learned from yeast
EXPERT REVIEW OF PROTEOMICS
2011; 8 (6): 775-786
Protein phosphorylation continues to be regarded as one of the most important post-translational modifications found in eukaryotes and has been implicated in key roles in the development of a number of human diseases. In order to elucidate roles for the 518 human kinases, phosphorylation has routinely been studied using the budding yeast Saccharomyces cerevisiae as a model system. In recent years, a number of technologies have emerged to globally map phosphorylation in yeast. In this article, we review these technologies and discuss how these phosphorylation mapping efforts have shed light on our understanding of kinase signaling pathways and eukaryotic proteomic networks in general.
View details for DOI 10.1586/EPR.11.64
View details for Web of Science ID 000297299000013
View details for PubMedID 22087660
Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF
2001; 409 (6819): 533-538
Proteins interact with genomic DNA to bring the genome to life; and these interactions also define many functional features of the genome. SBF and MBF are sequence-specific transcription factors that activate gene expression during the G1/S transition of the cell cycle in yeast. SBF is a heterodimer of Swi4 and Swi6, and MBF is a heterodimer of Mbp1 and Swi6 (refs 1, 3). The related Swi4 and Mbp1 proteins are the DNA-binding components of the respective factors, and Swi6 may have a regulatory function. A small number of SBF and MBF target genes have been identified. Here we define the genomic binding sites of the SBF and MBF transcription factors in vivo, by using DNA microarrays. In addition to the previously characterized targets, we have identified about 200 new putative targets. Our results support the hypothesis that SBF activated genes are predominantly involved in budding, and in membrane and cell-wall biosynthesis, whereas DNA replication and repair are the dominant functions among MBF activated genes. The functional specialization of these factors may provide a mechanism for independent regulation of distinct molecular processes that normally occur in synchrony during the mitotic cell cycle.
View details for Web of Science ID 000166570500053
View details for PubMedID 11206552
Isolated Congenital Anosmia and CNGA2 Mutation.
2017; 7 (1): 2667-?
Isolated congenital anosmia (ICA) is a rare condition that is associated with life-long inability to smell. Here we report a genetic characterization of a large Iranian family segregating ICA. Whole exome sequencing in five affected family members and five healthy members revealed a stop gain mutation in CNGA2 (OMIM 300338) (chrX:150,911,102; CNGA2: c.577C>T; p.Arg193*). The mutation segregates in an X-linked pattern, as all the affected family members are hemizygotes, whereas healthy family members are either heterozygous or homozygous for the reference allele. Cnga2 knockout mice are congenitally anosmic and have abnormal olfactory system physiology; additionally, Karstensen et al. recently reported two anosmic brothers sharing a CNGA2 truncating variant. Our study, in concert with these findings, provides strong support for a role of the CNGA2 gene in the pathogenicity of ICA in humans. Together, these results indicate that mutations in key olfactory signaling pathway genes are responsible for human disease.
View details for DOI 10.1038/s41598-017-02947-y
View details for PubMedID 28572688
Succinate and its G-protein-coupled receptor stimulates osteoclastogenesis.
2017; 8: 15621-?
The mechanism underlying bone impairment in patients with diabetes mellitus, a metabolic disorder characterized by chronic hyperglycaemia and dysregulation in metabolism, is unclear. Here we show the difference in the metabolomics of bone marrow stromal cells (BMSCs) derived from hyperglycaemic (type 2 diabetes mellitus, T2D) and normoglycaemic mice. One hundred and forty-two metabolites are substantially regulated in BMSCs from T2D mice, with the tricarboxylic acid (TCA) cycle being one of the primary metabolic pathways impaired by hyperglycaemia. Importantly, succinate, an intermediate metabolite in the TCA cycle, is increased by 24-fold in BMSCs from T2D mice. Succinate functions as an extracellular ligand through binding to its specific receptor on osteoclastic lineage cells and stimulates osteoclastogenesis in vitro and in vivo. Strategies targeting the receptor activation inhibit osteoclastogenesis. This study reveals a metabolite-mediated mechanism of osteoclastogenesis modulation that contributes to bone dysregulation in metabolic disorders.
View details for DOI 10.1038/ncomms15621
View details for PubMedID 28561074
Multi-platform analysis reveals a complex transcriptome architecture of a circovirus.
2017; 237: 37-46
In this study, we used Pacific Biosciences RS II long-read and Illumina HiScanSQ short-read sequencing technologies for the characterization of porcine circovirus type 1 (PCV-1) transcripts. Our aim was to identify novel RNA molecules and transcript isoforms, as well as to determine the exact 5'- and 3'-end sequences of previously described transcripts with single base-pair accuracy. We discovered a novel 3'-UTR length isoform of the Cap transcript, and a non-spliced Cap transcript variant. Additionally, our analysis has revealed a 3'-UTR isoform of Rep and two 5'-UTR isoforms of Rep' transcripts, and a novel splice variant of the longer Rep' transcript. We also explored two novel long transcripts, one with a previously identified splice site, and a formerly undetected mRNA of ORF3. Altogether, our methods have identified nine novel RNA molecules, doubling the size of PCV-1 transcriptome that had been known before. Additionally, our investigations revealed an intricate pattern of transcript overlapping, which might produce transcriptional interference between the transcriptional machineries of adjacent genes, and thereby may potentially play a role in the regulation of gene expression in circoviruses.
View details for DOI 10.1016/j.virusres.2017.05.010
View details for PubMedID 28549855
Non-equivalence of Wnt and R-spondin ligands during Lgr5(+) intestinal stem-cell self-renewal
NATURE
2017; 545 (7653): 238-242
The canonical Wnt/β-catenin signalling pathway governs diverse developmental, homeostatic and pathological processes. Palmitoylated Wnt ligands engage cell-surface frizzled (FZD) receptors and LRP5 and LRP6 co-receptors, enabling β-catenin nuclear translocation and TCF/LEF-dependent gene transactivation. Mutations in Wnt downstream signalling components have revealed diverse functions thought to be carried out by Wnt ligands themselves. However, redundancy between the 19 mammalian Wnt proteins and 10 FZD receptors and Wnt hydrophobicity have made it difficult to attribute these functions directly to Wnt ligands. For example, individual mutations in Wnt ligands have not revealed homeostatic phenotypes in the intestinal epithelium - an archetypal canonical, Wnt pathway-dependent, rapidly self-renewing tissue, the regeneration of which is fueled by proliferative crypt Lgr5(+) intestinal stem cells (ISCs). R-spondin ligands (RSPO1-RSPO4) engage distinct LGR4-LGR6, RNF43 and ZNRF3 receptor classes, markedly potentiate canonical Wnt/β-catenin signalling, and induce intestinal organoid growth in vitro and Lgr5(+) ISCs in vivo. However, the interchangeability, functional cooperation and relative contributions of Wnt versus RSPO ligands to in vivo canonical Wnt signalling and ISC biology remain unknown. Here we identify the functional roles of Wnt and RSPO ligands in the intestinal crypt stem-cell niche. We show that the default fate of Lgr5(+) ISCs is to differentiate, unless both RSPO and Wnt ligands are present. However, gain-of-function studies using RSPO ligands and a new non-lipidated Wnt analogue reveal that these ligands have qualitatively distinct, non-interchangeable roles in ISCs. Wnt proteins are unable to induce Lgr5(+) ISC self-renewal, but instead confer a basal competency by maintaining RSPO receptor expression that enables RSPO ligands to actively drive and specify the extent of stem-cell expansion. This functionally non-equivalent yet cooperative interaction between Wnt and RSPO ligands establishes a molecular precedent for regulation of mammalian stem cells by distinct priming and self-renewal factors, with broad implications for precise control of tissue regeneration.
View details for DOI 10.1038/nature22313
View details for PubMedID 28467820
Histone variant H2A.J accumulates in senescent cells and promotes inflammatory gene expression
The senescence of mammalian cells is characterized by a proliferative arrest in response to stress and the expression of an inflammatory phenotype. Here we show that histone H2A.J, a poorly studied H2A variant found only in mammals, accumulates in human fibroblasts in senescence with persistent DNA damage. H2A.J also accumulates in mice with aging in a tissue-specific manner and in human skin. Knock-down of H2A.J inhibits the expression of inflammatory genes that contribute to the senescence-associated secretory phenotype (SASP), and overexpression of H2A.J increases the expression of some of these genes in proliferating cells. H2A.J accumulation may thus promote the signalling of senescent cells to the immune system, and it may contribute to chronic inflammation and the development of aging-associated diseases.
View details for DOI 10.1038/ncomms14995
View details for Web of Science ID 000400886800001
View details for PubMedID 28489069
Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens
CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens.
View details for DOI 10.1038/ncomms15178
View details for Web of Science ID 000400851300001
View details for PubMedID 28474669
A Case Report of Hypoglycemia and Hypogammaglobulinemia: DAVID syndrome in a patient with a novel NFKB2 mutation.
Journal of clinical endocrinology and metabolism
DAVID syndrome (Deficient Anterior pituitary with Variable Immune Deficiency) is a rare disorder in which children present with symptomatic ACTH deficiency preceded by hypogammaglobulinemia from B-cell dysfunction with recurrent infections, termed common variable immunodeficiency (CVID). Subsequent whole exome sequencing studies have revealed germline heterozygous C-terminal mutations of NFKB2 as either a cause of DAVID syndrome or of CVID without clinical hypopituitarism. However, to the best of our knowledge there have been no cases in which the endocrinopathy has presented in the absence of a prior clinical history of CVID. A previously healthy 7-year-old boy with no history of clinical immunodeficiency presented with profound hypoglycemia and seizures. He was found to have secondary adrenal insufficiency and was started on glucocorticoid replacement. An evaluation for autoimmune disease, including for anti-pituitary antibodies, was negative. Evaluation unexpectedly revealed hypogammaglobulinemia (decreased IgG, IgM, and IgA). He had moderately reduced serotype-specific IgG responses following pneumococcal polysaccharide vaccine. Subsequently, he was found to have growth hormone (GH) deficiency. Six years after initial presentation, whole exome sequencing revealed a novel de novo heterozygous NFKB2 missense mutation c.2596A>C (p.Ser866Arg) in the C-terminal region predicted to abrogate the processing of the p100 NFKB2 protein to its active p52 form. Isolated early-onset ACTH deficiency is rare and C-terminal region NFKB2 mutations should be considered as an etiology even in the absence of a clinical history of CVID. Early immunologic evaluation is indicated in the diagnosis and management of isolated ACTH deficiency.
View details for DOI 10.1210/jc.2017-00341
View details for PubMedID 28472507
Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers
CELL STEM CELL
2017; 20 (4): 490-?
Gpr124 is essential for blood-brain barrier integrity in central nervous system disease
2017; 23 (4): 450-?
Although blood-brain barrier (BBB) compromise is central to the etiology of diverse central nervous system (CNS) disorders, endothelial receptor proteins that control BBB function are poorly defined. The endothelial G-protein-coupled receptor (GPCR) Gpr124 has been reported to be required for normal forebrain angiogenesis and BBB function in mouse embryos, but the role of this receptor in adult animals is unknown. Here Gpr124 conditional knockout (CKO) in the endothelia of adult mice did not affect homeostatic BBB integrity, but resulted in BBB disruption and microvascular hemorrhage in mouse models of both ischemic stroke and glioblastoma, accompanied by reduced cerebrovascular canonical Wnt-β-catenin signaling. Constitutive activation of Wnt-β-catenin signaling fully corrected the BBB disruption and hemorrhage defects of Gpr124-CKO mice, with rescue of the endothelial gene tight junction, pericyte coverage and extracellular-matrix deficits. We thus identify Gpr124 as an endothelial GPCR specifically required for endothelial Wnt signaling and BBB integrity under pathological conditions in adult mice. This finding implicates Gpr124 as a potential therapeutic target for human CNS disorders characterized by BBB disruption.
View details for DOI 10.1038/nm.4309
View details for Web of Science ID 000398768100013
View details for PubMedID 28288111
Induced Pluripotent Stem Cell Model of Pulmonary Arterial Hypertension Reveals Novel Gene Expression and Patient Specificity
AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE
2017; 195 (7): 930-941
Characterization of the Dynamic Transcriptome of a Herpesvirus with Long-read Single Molecule Real-Time Sequencing.
2017; 7: 43751-?
Herpesvirus gene expression is co-ordinately regulated and sequentially ordered during productive infection. The viral genes can be classified into three distinct kinetic groups: immediate-early, early, and late classes. In this study, a massively parallel sequencing technique based on the PacBio Single Molecule Real-Time (SMRT) sequencing platform was used to quantify the poly(A) fraction of the lytic transcriptome of pseudorabies virus (PRV) throughout a 12-hour interval of productive infection on PK-15 cells. Other approaches, including microarray, real-time RT-PCR and Illumina sequencing, are capable of detecting only the aggregate transcriptional activity of particular genomic regions, but not individual herpesvirus transcripts. However, SMRT sequencing allows for a distinction between transcript isoforms, including length and splice variants, as well as between overlapping polycistronic RNA molecules. The non-amplified Isoform Sequencing (Iso-Seq) method was used to analyse the kinetic properties of the lytic PRV transcripts and then to classify them accordingly. Additionally, the present study demonstrates the general utility of long-read sequencing for the time-course analysis of global gene expression in practically any organism.
View details for DOI 10.1038/srep43751
View details for PubMedID 28256586
View details for PubMedCentralID PMC5335617
Association of AHSG with alopecia and mental retardation (APMR) syndrome.
2017; 136 (3): 287-296
Alopecia with mental retardation syndrome (APMR) is a very rare autosomal recessive condition that is associated with total or partial absence of hair from the scalp and other parts of the body as well as variable intellectual disability. Here we present whole-exome sequencing results of a large consanguineous family segregating APMR syndrome with seven affected family members. Our study revealed a novel predicted pathogenic, homozygous missense mutation in the AHSG (OMIM 138680) gene (AHSG: NM_001622:exon7:c.950G>A:p.Arg317His). The variant is predicted to affect a region of the protein required for protein processing and disrupts a phosphorylation motif. In addition, the altered protein migrates with an aberrant size relative to healthy individuals. Consistent with the phenotype, AHSG maps within APMR linkage region 1 (APMR 1) as reported before, and falls within runs of homozygosity (ROH). Previous families with APMR syndrome have been studied through linkage analyses, but the linkage resolution did not allow a single candidate gene to be pinpointed. Our study is the first report to identify a homozygous missense mutation for APMR syndrome through whole-exome sequencing.
View details for DOI 10.1007/s00439-016-1756-5
View details for PubMedID 28054173
A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N-1-methyladenosine modification
RNA (New York, N.Y.)
2017; 23 (3): 270-283
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N(1)-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N(1)-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.
View details for DOI 10.1261/rna.059105.116
View details for PubMedID 27994090
View details for PubMedCentralID PMC5311483
- Single cell transcriptomics reveals unanticipated features of early hematopoietic precursors NUCLEIC ACIDS RESEARCH 2017; 45 (3): 1281-1296
Pharmacological rescue of diabetic skeletal stem cell niches.
Science translational medicine
2017; 9 (372)
Diabetes mellitus (DM) is a metabolic disease frequently associated with impaired bone healing. Despite its increasing prevalence worldwide, the molecular etiology of DM-linked skeletal complications remains poorly defined. Using advanced stem cell characterization techniques, we analyzed intrinsic and extrinsic determinants of mouse skeletal stem cell (mSSC) function to identify specific mSSC niche-related abnormalities that could impair skeletal repair in diabetic (Db) mice. We discovered that high serum concentrations of tumor necrosis factor-α directly repressed the expression of Indian hedgehog (Ihh) in mSSCs and in their downstream skeletogenic progenitors in Db mice. When hedgehog signaling was inhibited during fracture repair, injury-induced mSSC expansion was suppressed, resulting in impaired healing. We reversed this deficiency by precise delivery of purified Ihh to the fracture site via a specially formulated, slow-release hydrogel. In the presence of exogenous Ihh, the injury-induced expansion and osteogenic potential of mSSCs were restored, culminating in the rescue of Db bone healing. Our results present a feasible strategy for precise treatment of molecular aberrations in stem and progenitor cell populations to correct skeletal manifestations of systemic disease.
View details for DOI 10.1126/scitranslmed.aag2809
View details for PubMedID 28077677
ChIA-PET2: a versatile and flexible pipeline for ChIA-PET data analysis.
Nucleic acids research
2017; 45 (1)
ChIA-PET2 is a versatile and flexible pipeline for analyzing different types of ChIA-PET data from raw sequencing reads to chromatin loops. ChIA-PET2 integrates all steps required for ChIA-PET data analysis, including linker trimming, read alignment, duplicate removal, peak calling and chromatin loop calling. It supports different kinds of ChIA-PET data generated from different ChIA-PET protocols and also provides quality controls for different steps of ChIA-PET analysis. In addition, ChIA-PET2 can use phased genotype data to call allele-specific chromatin interactions. We applied ChIA-PET2 to different ChIA-PET datasets, demonstrating its significantly improved performance as well as its ability to easily process ChIA-PET raw data. ChIA-PET2 is available at https://github.com/GuipengLi/ChIA-PET2.
View details for DOI 10.1093/nar/gkw809
View details for PubMedID 27625391
View details for PubMedCentralID PMC5224499
Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis
2016; 167 (7): 1734-?
Mutation of highly conserved residues in transcription factors may affect protein-protein or protein-DNA interactions, leading to gene network dysregulation and human disease. Human mutations in GATA4, a cardiogenic transcription factor, cause cardiac septal defects and cardiomyopathy. Here, iPS-derived cardiomyocytes from subjects with a heterozygous GATA4-G296S missense mutation showed impaired contractility, calcium handling, and metabolic activity. In human cardiomyocytes, GATA4 broadly co-occupied cardiac enhancers with TBX5, another transcription factor that causes septal defects when mutated. The GATA4-G296S mutation disrupted TBX5 recruitment, particularly to cardiac super-enhancers, concomitant with dysregulation of genes related to the phenotypic abnormalities, including cardiac septation. Conversely, the GATA4-G296S mutation led to failure of GATA4 and TBX5-mediated repression at non-cardiac genes and enhanced open chromatin states at endothelial/endocardial promoters. These results reveal how disease-causing missense mutations can disrupt transcriptional cooperativity, leading to aberrant chromatin states and cellular dysfunction, including those related to morphogenetic defects.
View details for DOI 10.1016/j.cell.2016.11.033
View details for Web of Science ID 000393114700013
View details for PubMedID 27984724
View details for PubMedCentralID PMC5180611
Can heavy isotopes increase lifespan? Studies of relative abundance in various organisms reveal chemical perspectives on aging.
2016; 38 (11): 1093-1101
Stable heavy isotopes co-exist with their lighter counterparts in all elements commonly found in biology. These heavy isotopes represent a low natural abundance in isotopic composition but impose great retardation effects in chemical reactions because of kinetic isotopic effects (KIEs). Previous isotope analyses have recorded pervasive enrichment or depletion of heavy isotopes in various organisms, strongly supporting the capability of biological systems to distinguish different isotopes. This capability has recently been found to lead to general decline of heavy isotopes in metabolites during yeast aging. Conversely, supplementing heavy isotopes in growth medium promotes longevity. Whether this observation prevails in other organisms is not known, but it potentially bears promise in promoting human longevity.
View details for DOI 10.1002/bies.201600040
View details for PubMedID 27554342
Nat1 Deficiency Is Associated with Mitochondrial Dysfunction and Exercise Intolerance in Mice
2016; 17 (2): 527-540
We recently identified human N-acetyltransferase 2 (NAT2) as an insulin resistance (IR) gene. Here, we examine the cellular mechanism linking NAT2 to IR and find that Nat1 (mouse ortholog of NAT2) is co-regulated with key mitochondrial genes. RNAi-mediated silencing of Nat1 led to mitochondrial dysfunction characterized by increased intracellular reactive oxygen species and mitochondrial fragmentation as well as decreased mitochondrial membrane potential, biogenesis, mass, cellular respiration, and ATP generation. These effects were consistent in 3T3-L1 adipocytes, C2C12 myoblasts, and in tissues from Nat1-deficient mice, including white adipose tissue, heart, and skeletal muscle. Nat1-deficient mice had changes in plasma metabolites and lipids consistent with a decreased ability to utilize fats for energy and a decrease in basal metabolic rate and exercise capacity without altered thermogenesis. Collectively, our results suggest that Nat1 deficiency results in mitochondrial dysfunction, which may constitute a mechanistic link between this gene and IR.
View details for DOI 10.1016/j.celrep.2016.09.005
View details for Web of Science ID 000385850700019
View details for PubMedID 27705799
View details for PubMedCentralID PMC5097870
Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution.
2016; 48 (10): 1193-1203
We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.
View details for DOI 10.1038/ng.3646
View details for PubMedID 27526324
View details for PubMedCentralID PMC5042844
- A proposal for validation of antibodies NATURE METHODS 2016; 13 (10): 823-?
Multiple Pairwise Analysis of Non-homologous Centromere Coupling Reveals Preferential Chromosome Size-Dependent Interactions and a Role for Bouquet Formation in Establishing the Interaction Pattern
2016; 12 (10)
During meiosis, chromosomes undergo a homology search in order to locate their homolog to form stable pairs and exchange genetic material. Early in prophase, chromosomes associate in mostly non-homologous pairs, tethered only at their centromeres. This phenomenon, conserved through higher eukaryotes, is termed centromere coupling in budding yeast. Both initiation of recombination and the presence of homologs are dispensable for centromere coupling (occurring in spo11 mutants and haploids induced to undergo meiosis) but the presence of the synaptonemal complex (SC) protein Zip1 is required. The nature and mechanism of coupling have yet to be elucidated. Here we present the first pairwise analysis of centromere coupling in an effort to uncover underlying rules that may exist within these non-homologous interactions. We designed a novel chromosome conformation capture (3C)-based assay to detect all possible interactions between non-homologous yeast centromeres during early meiosis. Using this variant of 3C-qPCR, we found a size-dependent interaction pattern, in which chromosomes assort preferentially with chromosomes of similar sizes, in haploid and diploid spo11 cells, but not in a coupling-defective mutant (spo11 zip1 haploid and diploid yeast). This pattern is also observed in wild-type diploids early in meiosis but disappears as meiosis progresses and homologous chromosomes pair. We found no evidence to support the notion that ancestral centromere homology plays a role in pattern establishment in S. cerevisiae post-genome duplication. Moreover, we found a role for the meiotic bouquet in establishing the size dependence of centromere coupling, as abolishing bouquet (using the bouquet-defective spo11 ndj1 mutant) reduces it. Coupling in spo11 ndj1 rather follows telomere clustering preferences. We propose that a chromosome size preference for centromere coupling helps establish efficient homolog recognition.
View details for DOI 10.1371/journal.pgen.1006347
View details for Web of Science ID 000386683300016
View details for PubMedID 27768699
iPSC-derived cardiomyocytes reveal abnormal TGF-β signalling in left ventricular non-compaction cardiomyopathy.
Nature cell biology
2016; 18 (10): 1031-1042
Left ventricular non-compaction (LVNC) is the third most prevalent cardiomyopathy in children and its pathogenesis has been associated with the developmental defect of the embryonic myocardium. We show that patient-specific induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) generated from LVNC patients carrying a mutation in the cardiac transcription factor TBX20 recapitulate a key aspect of the pathological phenotype at the single-cell level and this was associated with perturbed transforming growth factor beta (TGF-β) signalling. LVNC iPSC-CMs have decreased proliferative capacity due to abnormal activation of TGF-β signalling. TBX20 regulates the expression of TGF-β signalling modifiers including one known to be a genetic cause of LVNC, PRDM16, and genome editing of PRDM16 caused proliferation defects in iPSC-CMs. Inhibition of TGF-β signalling and genome correction of the TBX20 mutation were sufficient to reverse the disease phenotype. Our study demonstrates that iPSC-CMs are a useful tool for the exploration of pathological mechanisms underlying poorly understood cardiomyopathies including LVNC.
View details for DOI 10.1038/ncb3411
View details for PubMedID 27642787
Full-Length Isoform Sequencing Reveals Novel Transcripts and Substantial Transcriptional Overlaps in a Herpesvirus
2016; 11 (9)
Whole transcriptome studies have become essential for understanding the complexity of genetic regulation. However, the conventionally applied short-read sequencing platforms cannot be used to reliably distinguish between many transcript isoforms. The Pacific Biosciences (PacBio) RS II platform is capable of reading long nucleic acid stretches in a single sequencing run. The pseudorabies virus (PRV) is an excellent system to study herpesvirus gene expression and potential interactions between the transcriptional units. In this work, non-amplified and amplified isoform sequencing protocols were used to characterize the poly(A+) fraction of the lytic transcriptome of PRV, with the aim of a complete transcriptional annotation of the viral genes. The analyses revealed a previously unrecognized complexity of the PRV transcriptome including the discovery of novel protein-coding and non-coding genes, novel mono- and polycistronic transcription units, as well as extensive transcriptional overlaps between neighboring and distal genes. This study identified non-coding transcripts overlapping all three replication origins of the PRV, which might play a role in the control of DNA synthesis. We additionally established the relative expression levels of gene products. Our investigations revealed that the whole PRV genome is utilized for transcription, including both DNA strands in all coding and intergenic regions. The genome-wide occurrence of transcript overlaps suggests a crosstalk between genes through a network formed by interacting transcriptional machineries with a potential function in the control of gene expression.
View details for DOI 10.1371/journal.pone.0162868
View details for Web of Science ID 000384328500015
View details for PubMedID 27685795
Transcriptome Profiling of Patient-Specific Human iPSC-Cardiomyocytes Predicts Individual Drug Safety and Efficacy Responses In Vitro.
Cell stem cell
2016; 19 (3): 311-325
Understanding individual susceptibility to drug-induced cardiotoxicity is key to improving patient safety and preventing drug attrition. Human induced pluripotent stem cells (hiPSCs) enable the study of pharmacological and toxicological responses in patient-specific cardiomyocytes (CMs) and may serve as preclinical platforms for precision medicine. Transcriptome profiling in hiPSC-CMs from seven individuals lacking known cardiovascular disease-associated mutations and in three isogenic human heart tissue and hiPSC-CM pairs showed greater inter-patient variation than intra-patient variation, verifying that reprogramming and differentiation preserve patient-specific gene expression, particularly in metabolic and stress-response genes. Transcriptome-based toxicology analysis predicted and risk-stratified patient-specific susceptibility to cardiotoxicity, and functional assays in hiPSC-CMs using tacrolimus and rosiglitazone, drugs targeting pathways predicted to produce cardiotoxicity, validated inter-patient differential responses. CRISPR/Cas9-mediated pathway correction prevented drug-induced cardiotoxicity. Our data suggest that hiPSC-CMs can be used in vitro to predict and validate patient-specific drug safety and efficacy, potentially enabling future clinical approaches to precision medicine.
View details for DOI 10.1016/j.stem.2016.07.006
View details for PubMedID 27545504
Predicting Ovarian Cancer Patients' Clinical Response to Platinum-Based Chemotherapy by Their Tumor Proteomic Signatures
JOURNAL OF PROTEOME RESEARCH
2016; 15 (8): 2455-2465
Ovarian cancer is the deadliest gynecologic malignancy in the United States with most patients diagnosed in the advanced stage of the disease. Platinum-based antineoplastic therapeutics is indispensable to treating advanced ovarian serous carcinoma. However, patients have heterogeneous responses to platinum drugs, and it is difficult to predict these interindividual differences before administering medication. In this study, we investigated the tumor proteomic profiles and clinical characteristics of 130 ovarian serous carcinoma patients analyzed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), predicted the platinum drug response using supervised machine learning methods, and evaluated our prediction models through leave-one-out cross-validation. Our data-driven feature selection approach indicated that tumor proteomics profiles contain information for predicting binarized platinum response (P < 0.0001). We further built a least absolute shrinkage and selection operator (LASSO)-Cox proportional hazards model that stratified patients into early relapse and late relapse groups (P = 0.00013). The top proteomic features indicative of platinum response were involved in ATP synthesis pathways and Ran GTPase binding. Overall, we demonstrated that proteomic profiles of ovarian serous carcinoma patients predicted platinum drug responses as well as provided insights into the biological processes influencing the efficacy of platinum-based therapeutics. Our analytical approach is also extensible to predicting response to other antineoplastic agents or treatment modalities for both ovarian and other cancers.
View details for DOI 10.1021/acs.jproteome.5b01129
View details for Web of Science ID 000381235900010
View details for PubMedID 27312948
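The leave-one-out cross-validation mentioned in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for brevity: the toy feature vectors, the class labels, and the nearest-centroid classifier are hypothetical stand-ins, not the models or data used in the study.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x.

    Hypothetical stand-in model, chosen only to keep the sketch short.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Made-up two-feature "abundance" vectors with hypothetical response labels.
features = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
labels = ["sensitive"] * 3 + ["resistant"] * 3

accuracy = leave_one_out_accuracy(features, labels)
print(accuracy)
```

Because every sample is scored by a model that never saw it, the resulting accuracy is a less optimistic estimate than training-set accuracy, which matters when the cohort is small.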
EPHB4 kinase-inactivating mutations cause autosomal dominant lymphatic-related hydrops fetalis.
Journal of clinical investigation
2016; 126 (8): 3080-3088
Hydrops fetalis describes fluid accumulation in at least 2 fetal compartments, including abdominal cavities, pleura, and pericardium, or in body tissue. The majority of hydrops fetalis cases are nonimmune conditions that present with generalized edema of the fetus, and approximately 15% of these nonimmune cases result from a lymphatic abnormality. Here, we have identified an autosomal dominant, inherited form of lymphatic-related (nonimmune) hydrops fetalis (LRHF). Independent exome sequencing projects on 2 families with a history of in utero and neonatal deaths associated with nonimmune hydrops fetalis uncovered 2 heterozygous missense variants in the gene encoding Eph receptor B4 (EPHB4). Biochemical analysis determined that the mutant EPHB4 proteins are devoid of tyrosine kinase activity, indicating that loss of EPHB4 signaling contributes to LRHF pathogenesis. Further, inactivation of Ephb4 in lymphatic endothelial cells of developing mouse embryos led to defective lymphovenous valve formation and consequent subcutaneous edema. Together, these findings identify EPHB4 as a critical regulator of early lymphatic vascular development and demonstrate that mutations in the gene can cause an autosomal dominant form of LRHF that is associated with a high mortality rate.
View details for DOI 10.1172/JCI85794
View details for PubMedID 27400125
Omics Profiling in Precision Oncology.
Molecular & cellular proteomics
2016; 15 (8): 2525-2536
Cancer causes significant morbidity and mortality worldwide, and is the area most targeted in precision medicine. Recent development of high-throughput methods enables detailed omics analysis of the molecular mechanisms underpinning tumor biology. These studies have identified clinically actionable mutations and gene and protein expression patterns associated with prognosis, and have provided further insights into the molecular mechanisms indicative of cancer biology and new therapeutic strategies such as immunotherapy. In this review, we summarize the techniques used for tumor omics analysis, recapitulate the key findings in cancer omics studies, and point to areas requiring further research on precision oncology.
View details for DOI 10.1074/mcp.O116.059253
View details for PubMedID 27099341
Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer.
2016; 166 (3): 755-765
To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.
View details for DOI 10.1016/j.cell.2016.05.069
View details for PubMedID 27372738
Integrated Network Analysis Reveals an Association between Plasma Mannose Levels and Insulin Resistance
2016; 24 (1): 172-184
To investigate the biological processes that are altered in obese subjects, we generated cell-specific integrated networks (INs) by merging genome-scale metabolic, transcriptional regulatory and protein-protein interaction networks. We performed genome-wide transcriptomics analysis to determine the global gene expression changes in the liver and three adipose tissues from obese subjects undergoing bariatric surgery and integrated these data into the cell-specific INs. We found dysregulations in mannose metabolism in obese subjects and validated our predictions by detecting mannose levels in the plasma of the lean and obese subjects. We observed significant correlations between plasma mannose levels, BMI, and insulin resistance (IR). We also measured plasma mannose levels of the subjects in two additional different cohorts and observed that an increased plasma mannose level was associated with IR and insulin secretion. We finally identified mannose as one of the best plasma metabolites in explaining the variance in obesity-independent IR.
View details for DOI 10.1016/j.cmet.2016.05.026
View details for Web of Science ID 000380793400022
View details for PubMedID 27345421
Using Mass Spectrometry to Quantify Rituximab and Perform Individualized Immunoglobulin Phenotyping in ANCA-Associated Vasculitis
2016; 88 (12): 6317-6325
Therapeutic monoclonal immunoglobulins (mAbs) are used to treat patients with a wide range of disorders including autoimmune diseases. As pharmaceutical companies bring more fully humanized therapeutic mAb drugs to the healthcare market, analytical platforms that perform therapeutic drug monitoring (TDM) without relying on mAb-specific reagents will be needed. In this study we demonstrate that liquid-chromatography-mass spectrometry (LC-MS) can be used to perform TDM of mAbs in the same manner as smaller nonbiologic drugs. The assay uses commercially available reagents combined with heavy and light chain disulfide bond reduction followed by light chain analysis by microflow-LC-electrospray ionization-quadrupole-time-of-flight mass spectrometry (ESI-Q-TOF MS). Quantification is performed using the peak areas from multiply charged mAb light chain ions using a software package developed in-house for TDM of mAbs. The data presented here demonstrate the ability of an LC-MS assay to quantify a therapeutic mAb in a large cohort of patients in a clinical trial. The ability to quantify any mAb in serum via the reduced light chain without the need for reagents specific for each mAb demonstrates the unique capabilities of LC-MS. This fact, coupled with the ability to phenotype a patient's polyclonal repertoire in the same analysis, further shows the potential of this approach to mAb analysis.
View details for DOI 10.1021/acs.analchem.6b00544
View details for Web of Science ID 000378470200034
View details for PubMedID 27228216
Genome assembly from synthetic long read clouds.
Bioinformatics
2016; 32 (12): i216-i224
Despite rapid progress in sequencing technology, assembling de novo the genomes of new species as well as reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads. Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR's underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads. Our source code is freely available at https://email@example.com.
View details for DOI 10.1093/bioinformatics/btw267
View details for PubMedID 27307620
Concerted genomic targeting of H3K27 demethylase REF6 and chromatin-remodeling ATPase BRM in Arabidopsis
2016; 48 (6): 687-?
SWI/SNF-type chromatin remodelers, such as BRAHMA (BRM), and H3K27 demethylases both have active roles in regulating gene expression at the chromatin level, but how they are recruited to specific genomic sites remains largely unknown. Here we show that RELATIVE OF EARLY FLOWERING 6 (REF6), a plant-unique H3K27 demethylase, targets genomic loci containing a CTCTGYTY motif via its zinc-finger (ZnF) domains and facilitates the recruitment of BRM. Genome-wide analyses showed that REF6 colocalizes with BRM at many genomic sites with the CTCTGYTY motif. Loss of REF6 results in decreased BRM occupancy at BRM-REF6 co-targets. Furthermore, REF6 directly binds to the CTCTGYTY motif in vitro, and deletion of the motif from a target gene renders it inaccessible to REF6 in vivo. Finally, we show that, when its ZnF domains are deleted, REF6 loses its genomic targeting ability. Thus, our work identifies a new genomic targeting mechanism for an H3K27 demethylase and demonstrates its key role in recruiting the BRM chromatin remodeler.
View details for DOI 10.1038/ng.3555
View details for Web of Science ID 000376744200018
View details for PubMedID 27111034
The genetic predisposition to bronchopulmonary dysplasia
CURRENT OPINION IN PEDIATRICS
2016; 28 (3): 318-323
Bronchopulmonary dysplasia (BPD) is a prevalent chronic lung disease in premature infants. Twin studies have shown strong heritability underlying this disease; however, the genetic architecture of BPD remains unclear. A number of studies employed different approaches to characterize the genetic aberrations associated with BPD, including candidate gene studies, genome-wide association studies, exome sequencing, integrative omics analysis, and pathway analysis. Candidate gene studies identified a number of genes potentially involved with the development of BPD, but the etiological contribution from each gene is not substantial. Copy number variation studies and three independent genome-wide association studies did not identify genetic variations significantly and consistently associated with BPD. A recent exome-sequencing study pointed to rare variants implicated in the disease. In this review, we summarize these studies' methodology and findings, and suggest future research directions to better understand the genetic underpinnings of this potentially life-long lung disease. Genetic factors play a significant role in the development of BPD. Recent studies suggested that rare variants in genes participating in lung development pathways could contribute to BPD susceptibility.
View details for DOI 10.1097/MOP.0000000000000344
View details for Web of Science ID 000376387000010
View details for PubMedID 26963946
Age-Dependent Pancreatic Gene Regulation Reveals Mechanisms Governing Human beta Cell Function
2016; 23 (5): 909-920
Intensive efforts are focused on identifying regulators of human pancreatic islet cell growth and maturation to accelerate development of therapies for diabetes. After birth, islet cell growth and function are dynamically regulated; however, establishing these age-dependent changes in humans has been challenging. Here, we describe a multimodal strategy for isolating pancreatic endocrine and exocrine cells from children and adults to identify age-dependent gene expression and chromatin changes on a genomic scale. These profiles revealed distinct proliferative and functional states of islet α cells or β cells and histone modifications underlying age-dependent gene expression changes. Expression of SIX2 and SIX3, transcription factors without prior known functions in the pancreas and linked to fasting hyperglycemia risk, increased with age specifically in human islet β cells. SIX2 and SIX3 were sufficient to enhance insulin content or secretion in immature β cells. Our work provides a unique resource to study human-specific regulators of islet cell maturation and function.
View details for DOI 10.1016/j.cmet.2016.04.002
View details for Web of Science ID 000375550700021
View details for PubMedID 27133132
Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.
View details for DOI 10.1186/s12859-016-0957-1
View details for Web of Science ID 000370775000001
View details for PubMedID 26908256
Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations.
2016; 48 (2): 117-125
Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.
View details for DOI 10.1038/ng.3471
View details for PubMedID 26691984
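The idea of finding variably sized regions where mutations pile up can be sketched in one dimension. The example below is not the authors' method: it uses a simple gap-based grouping as a stand-in for density-based clustering, and the parameters `max_gap` and `min_count`, like the mutation positions, are hypothetical.

```python
def cluster_positions(positions, max_gap=10, min_count=3):
    """Group positions into runs whose neighbours are at most max_gap apart.

    Returns (start, end, count) for each run with at least min_count members.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_count:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final run.
    if len(current) >= min_count:
        clusters.append((current[0], current[-1], len(current)))
    return clusters

# Hypothetical mutation coordinates pooled across tumours.
mutations = [100, 104, 109, 500, 1200, 1203, 1207, 1210, 3000]
regions = cluster_positions(mutations)
```

Because the regions are defined only by where mutations fall, this kind of grouping does not depend on gene annotation, which is the property the abstract emphasises. Assessing whether a region is significantly mutated would additionally require a background mutation model.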
Proteome-wide survey of the autoimmune target repertoire in autoimmune polyendocrine syndrome type 1
Autoimmune polyendocrine syndrome type 1 (APS1) is a monogenic disorder that features multiple autoimmune disease manifestations. It is caused by mutations in the Autoimmune regulator (AIRE) gene, which promotes thymic display of thousands of peripheral tissue antigens in a process critical for establishing central immune tolerance. We here used proteome arrays to perform a comprehensive study of autoimmune targets in APS1. Interrogation of established autoantigens revealed highly reliable detection of autoantibodies, and by exploring the full panel of more than 9000 proteins we further identified MAGEB2 and PDILT as novel major autoantigens in APS1. Our proteome-wide assessment revealed a marked enrichment for tissue-specific immune targets, mirroring AIRE's selectiveness for this category of genes. Our findings also suggest that only a very limited portion of the proteome becomes targeted by the immune system in APS1, which contrasts with the broad defect of thymic presentation associated with AIRE deficiency and raises novel questions about what other factors are needed for break of tolerance.
View details for DOI 10.1038/srep20104
View details for Web of Science ID 000368996700001
View details for PubMedID 26830021
- Distance from sub-Saharan Africa predicts mutational load in diverse human genomes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 2016; 113 (4): E440-E449
- Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing G3-GENES GENOMES GENETICS 2016; 6 (1): 41-49
- NIH working group report-using genomic information to guide weight management: From universal to precision treatment OBESITY 2016; 24 (1): 14-22
- Metformin Improves Diabetic Bone Health by Re-Balancing Catabolism and Nitrogen Disposal PLOS ONE 2015; 10 (12)
- Integrated Proteomic and Genomic Analysis of Gastric Cancer Patient Tissues JOURNAL OF PROTEOME RESEARCH 2015; 14 (12): 4995-5006
Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans
2015; 25 (11): 1610-1621
Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin have been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy--many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation.
View details for DOI 10.1101/gr.193342.115
View details for Web of Science ID 000364355600003
View details for PubMedID 26297486
View details for PubMedCentralID PMC4617958
- Design and Implementation of the International Genetics and Translational Research in Transplantation Network TRANSPLANTATION 2015; 99 (11): 2401-2412
- Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data PLOS GENETICS 2015; 11 (10)
- Mango: a bias-correcting ChIA-PET analysis pipeline. Bioinformatics 2015; 31 (19): 3092-3098
Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.
2015; 11 (10)
High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.
View details for DOI 10.1371/journal.pgen.1005496
View details for PubMedID 26448358
View details for PubMedCentralID PMC4598191
Exome Sequencing of Neonatal Blood Spots and the Identification of Genes Implicated in Bronchopulmonary Dysplasia.
American journal of respiratory and critical care medicine
2015; 192 (5): 589-596
Bronchopulmonary dysplasia (BPD), a prevalent severe lung disease of premature infants, has a strong genetic component. Large-scale genome-wide association studies for common variants have not revealed its genetic basis. Given the historical high mortality rate of extremely preterm infants who now survive and develop BPD, we hypothesized that risk loci underlying this disease are under severe purifying selection during evolution; thus, rare variants likely explain greater risk of the disease. We performed exome sequencing on 50 BPD-affected and unaffected twin pairs using DNA isolated from neonatal blood spots and identified genes affected by extremely rare nonsynonymous mutations. Functional genomic approaches were then used to systematically compare these affected genes. We identified 258 genes with rare nonsynonymous mutations in patients with BPD. These genes were highly enriched for processes involved in pulmonary structure and function including collagen fibril organization, morphogenesis of embryonic epithelium, and regulation of Wnt signaling pathway; displayed significantly elevated expression in fetal and adult lungs; and were substantially up-regulated in a murine model of BPD. Analyses of mouse mutants revealed their phenotypic enrichment for embryonic development and the cyanosis phenotype, a clinical manifestation of BPD. Our study supports the role of rare variants in BPD, in contrast with the role of common variants targeted by genome-wide association studies. Overall, our study is the first to sequence BPD exomes from newborn blood spot samples and identify with high confidence genes implicated in BPD, thereby providing important insights into its biology and molecular etiology.
View details for DOI 10.1164/rccm.201501-0168OC
View details for PubMedID 26030808
- Evaluating Common Humoral Responses against Fungal Infections with Yeast Protein Microarrays JOURNAL OF PROTEOME RESEARCH 2015; 14 (9): 3924-3931
RNA Sequencing Analysis Detection of a Novel Pathway of Endothelial Dysfunction in Pulmonary Arterial Hypertension
AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE
2015; 192 (3): 356-366
Pulmonary arterial hypertension is characterized by endothelial dysregulation, but global changes in gene expression have not been related to perturbations in function. RNA sequencing was utilized to discriminate changes in transcriptomes of endothelial cells cultured from lungs of patients with idiopathic pulmonary arterial hypertension vs. controls and to assess the functional significance of major differentially expressed transcripts. The endothelial transcriptomes from seven control and six idiopathic pulmonary arterial hypertension patients' lungs were analyzed. Differentially expressed genes were related to BMPR2 signaling. Those downregulated were assessed for function in cultured cells, and in a transgenic mouse. Fold-differences in ten genes were significant (p<0.05), four increased and six decreased in patients vs. controls. No patient was mutant for BMPR2. However, knockdown of BMPR2 by siRNA in control pulmonary arterial endothelial cells recapitulated six/ten patient-related gene changes, including decreased collagen IV (COL4A1, COL4A2) and ephrinA1 (EFNA1). Reduction of BMPR2 regulated transcripts was related to decreased β-catenin. Reducing COL4A1, COL4A2 and EFNA1 by siRNA inhibited pulmonary endothelial adhesion, migration and tube formation. In mice null for the EFNA1 receptor, EphA2, vs. controls, VEGF receptor blockade and hypoxia caused more severe pulmonary hypertension, judged by elevated right ventricular systolic pressure, right ventricular hypertrophy and loss of small arteries. The novel relationship between BMPR2 dysfunction and reduced expression of endothelial COL4 and EFNA1 may underlie vulnerability to injury in pulmonary arterial hypertension.
View details for DOI 10.1164/rccm.201408-1528OC
View details for Web of Science ID 000359178500017
View details for PubMedID 26030479
- Probing High-density Functional Protein Microarrays to Detect Protein-protein Interactions JOVE-JOURNAL OF VISUALIZED EXPERIMENTS 2015
- Single-cell chromatin accessibility reveals principles of regulatory variation NATURE 2015; 523 (7561): 486-U264
Single-cell chromatin accessibility reveals principles of regulatory variation.
2015; 523 (7561): 486-490
Cell-to-cell variation is a universal feature of life that affects a wide range of biological phenomena, from developmental plasticity to tumour heterogeneity. Although recent advances have improved our ability to document cellular phenotypic variation, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of mammalian DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform. Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type-specific accessibility variance across eight cell types. Targeted perturbations of cell cycle or transcription factor signalling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization. Single-cell analysis of DNA accessibility provides new insight into cellular variation of the 'regulome'.
View details for DOI 10.1038/nature14590
View details for PubMedID 26083756
- Achieving high-sensitivity for clinical applications using augmented exome sequencing GENOME MEDICINE 2015; 7
- Recurrent somatic mutations in regulatory regions of human cancer genomes NATURE GENETICS 2015; 47 (7): 710-?
- Where Next for Genetics and Genomics? PLOS BIOLOGY 2015; 13 (7)
- Metabolome progression during early gut microbial colonization of gnotobiotic mice SCIENTIFIC REPORTS 2015; 5
Transglutaminase 4 as a prostate autoantigen in male subfertility
SCIENCE TRANSLATIONAL MEDICINE
2015; 7 (292)
Autoimmune polyendocrine syndrome type 1 (APS1), a monogenic disorder caused by AIRE gene mutations, features multiple autoimmune disease components. Infertility is common in both males and females with APS1. Although female infertility can be explained by autoimmune ovarian failure, the mechanisms underlying male infertility have remained poorly understood. We performed a proteome-wide autoantibody screen in APS1 patient sera to assess the autoimmune response against the male reproductive organs. By screening human protein arrays with male and female patient sera and by selecting for gender-imbalanced autoantibody signals, we identified transglutaminase 4 (TGM4) as a male-specific autoantigen. Notably, TGM4 is a prostatic secretory molecule with a critical role in male reproduction. TGM4 autoantibodies were detected in most of the adult male APS1 patients but were absent in all the young males. Consecutive serum samples further revealed that TGM4 autoantibodies first presented during pubertal age and subsequent to prostate maturation. We assessed the animal model for APS1, the Aire-deficient mouse, and found spontaneous development of TGM4 autoantibodies specifically in males. Aire-deficient mice failed to present TGM4 in the thymus, consistent with a defect in central tolerance for TGM4. In the mouse, we further link TGM4 immunity with a destructive prostatitis and compromised secretion of TGM4. Collectively, our findings in APS1 patients and Aire-deficient mice reveal prostate autoimmunity as a major manifestation of APS1 with a potential role in male subfertility.
View details for DOI 10.1126/scitranslmed.aaa9186
View details for Web of Science ID 000356390500008
View details for PubMedID 26084804
Transcriptome Signature and Regulation in Human Somatic Cell Reprogramming
STEM CELL REPORTS
2015; 4 (6): 1125-1139
Reprogramming of somatic cells produces induced pluripotent stem cells (iPSCs) that are invaluable resources for biomedical research. Here, we extended the previous transcriptome studies by performing RNA-seq on cells defined by a combination of multiple cellular surface markers. We found that transcriptome changes during early reprogramming occur independently from the opening of closed chromatin by OCT4, SOX2, KLF4, and MYC (OSKM). Furthermore, our data identify multiple spliced forms of genes uniquely expressed at each progressive stage of reprogramming. In particular, we found a pluripotency-specific spliced form of CCNE1 that is specific to human and significantly enhances reprogramming. In addition, single nucleotide polymorphism (SNP) expression analysis reveals that monoallelic gene expression is induced in the intermediate stages of reprogramming, while biallelic expression is recovered upon completion of reprogramming. Our transcriptome data provide unique opportunities in understanding human iPSC reprogramming.
View details for DOI 10.1016/j.stemcr.2015.04.009
View details for Web of Science ID 000356068100017
View details for PubMedID 26004630
Optimized Analytical Procedures for the Untargeted Metabolomic Profiling of Human Urine and Plasma by Combining Hydrophilic Interaction (HILIC) and Reverse-Phase Liquid Chromatography (RPLC)-Mass Spectrometry
MOLECULAR & CELLULAR PROTEOMICS
2015; 14 (6): 1684-1695
Profiling of body fluids is crucial for monitoring and discovering metabolic markers of health and disease and for providing insights into human physiology. Since human urine and plasma each contain an extreme diversity of metabolites, a single liquid chromatographic system when coupled to mass spectrometry (MS) is not sufficient to achieve reasonable metabolome coverage. Hydrophilic interaction liquid chromatography (HILIC) offers complementary information to reverse-phase liquid chromatography (RPLC) by retaining polar metabolites. With the objective of finding the optimal combined chromatographic solution to profile urine and plasma, we systematically investigated the performance of five HILIC columns with different chemistries operated at three different pH (acidic, neutral, basic) and five C18-silica RPLC columns. The zwitterionic column ZIC-HILIC operated at neutral pH provided optimal performance on a large set of hydrophilic metabolites. The RPLC columns Hypersil GOLD and Zorbax SB aq were proven to be best suited for the metabolic profiling of urine and plasma, respectively. Importantly, the optimized HILIC-MS method showed excellent intrabatch peak area reproducibility (CV < 12%) and good long-term interbatch (40 days) peak area reproducibility (CV < 22%) that were similar to those of RPLC-MS procedures. Finally, combining the optimal HILIC- and RPLC-MS approaches greatly expanded metabolome coverage with 44% and 108% new metabolic features detected compared with RPLC-MS alone for urine and plasma, respectively. The proposed combined LC-MS approaches improve the comprehensiveness of global metabolic profiling of body fluids and thus are valuable for monitoring and discovering metabolic changes associated with health and disease in clinical research studies.
View details for DOI 10.1074/mcp.M114.046508
View details for Web of Science ID 000355550400019
View details for PubMedID 25787789
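The reproducibility thresholds quoted above (CV < 12% intrabatch, CV < 22% interbatch) refer to the coefficient of variation of peak areas across repeated measurements. A minimal sketch of that calculation follows, assuming the sample standard deviation is used; the peak-area values are hypothetical and not taken from the study.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0

# Hypothetical peak areas for one metabolite across four replicate injections.
peak_areas = [1000.0, 1100.0, 950.0, 1050.0]
cv = coefficient_of_variation(peak_areas)

# Compare against the intrabatch threshold quoted in the abstract.
meets_intrabatch_threshold = cv < 12.0
```

In an untargeted experiment the same calculation would be repeated for every detected feature, and the distribution of CV values summarised per method.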
- High-Throughput Sequencing Technologies MOLECULAR CELL 2015; 58 (4): 586-597
High-throughput sequencing technologies.
2015; 58 (4): 586-597
The human genome sequence has profoundly altered our understanding of biology, human diversity, and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past 10 years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them, as well as the challenges facing current sequencing platforms and their clinical application.
View details for DOI 10.1016/j.molcel.2015.05.004
View details for PubMedID 26000844
Characterization of Novel Transcripts in Pseudorabies Virus
2015; 7 (5): 2727-2744
In this study we identified two 3'-coterminal RNA molecules in the pseudorabies virus. The highly abundant short transcript (CTO-S) proved to be encoded between the ul21 and ul22 genes in close vicinity of the replication origin (OriL) of the virus. The less abundant long RNA molecule (CTO-L) is a transcriptional readthrough product of the ul21 gene and overlaps OriL. These polyadenylated RNAs were characterized by ascertaining their nucleotide sequences with the Illumina HiScanSQ and Pacific Biosciences Real-Time (PacBio RSII) sequencing platforms and by analyzing their transcription kinetics through use of multi-time-point Real-Time RT-PCR and the PacBio RSII system. It emerged that transcription of the CTOs is fully dependent on the viral transactivator protein IE180 and CTO-S is not a microRNA precursor. We propose an interaction between the transcription and replication machineries at this genomic location, which might play an important role in the regulation of DNA synthesis.
View details for DOI 10.3390/v7052727
View details for Web of Science ID 000356228700027
View details for PubMedID 26008709
Impact of allele-specific peptides in proteome quantification
PROTEOMICS CLINICAL APPLICATIONS
2015; 9 (3-4): 432-436
MS-based proteome technologies have greatly improved our ability to detect and quantify proteomes across various biological samples. High-throughput bottom-up proteome profiling in combination with targeted MS methods, e.g. SRM assays, is emerging as a powerful approach in the field of biomarker discovery. In the past few years, an increasing number of studies have attempted to integrate genomic and proteomic data for biomarker discovery. Here, we describe how allele-specific peptides can be applied in biomarker discovery and their impact on protein quantification.
View details for DOI 10.1002/prca.201400126
View details for Web of Science ID 000353291000019
View details for PubMedID 25676416
Reassessment of Piwi Binding to the Genome and Piwi Impact on RNA Polymerase II Distribution
2015; 32 (6): 772-774
Drosophila Piwi was reported by Huang et al. (2013) to be guided by piRNAs to piRNA-complementary sites in the genome, which then recruits heterochromatin protein 1a and histone methyltransferase Su(Var)3-9 to the sites. Among additional findings, Huang et al. (2013) also reported Piwi binding sites in the genome and the reduction of RNA polymerase II in euchromatin but its increase in pericentric regions in piwi mutants. Marinov et al. (2015) disputed the validity of the Huang et al. bioinformatic pipeline that led to the last two claims. Here we report our independent reanalysis of the data using current bioinformatic methods. Our reanalysis agrees with Marinov et al. (2015) that Piwi's genomic targets still remain to be identified but confirms the Huang et al. claim that Piwi influences RNA polymerase II distribution in the genome. This Matters Arising Response addresses the Marinov et al. (2015) Matters Arising, published concurrently in this issue of Developmental Cell.
View details for DOI 10.1016/j.devcel.2015.03.004
View details for Web of Science ID 000351841900015
View details for PubMedID 25805139
The conserved histone deacetylase Rpd3 and its DNA binding subunit Ume6 control dynamic transcript architecture during mitotic growth and meiotic development
NUCLEIC ACIDS RESEARCH
2015; 43 (1): 115-128
It was recently reported that the sizes of many mRNAs change when budding yeast cells exit mitosis and enter the meiotic differentiation pathway. These differences were attributed to length variations of their untranslated regions. The function of UTRs in protein translation is well established. However, the mechanism controlling the expression of distinct transcript isoforms during mitotic growth and meiotic development is unknown. In this study, we order developmentally regulated transcript isoforms according to their expression at specific stages during meiosis and gametogenesis, as compared to vegetative growth and starvation. We employ regulatory motif prediction, in vivo protein-DNA binding assays, genetic analyses and monitoring of epigenetic amino acid modification patterns to identify a novel role for Rpd3 and Ume6, two components of a histone deacetylase complex already known to repress early meiosis-specific genes in dividing cells, in mitotic repression of meiosis-specific transcript isoforms. Our findings classify developmental stage-specific early, middle and late meiotic transcript isoforms, and they point to a novel HDAC-dependent control mechanism for flexible transcript architecture during cell growth and differentiation. Since Rpd3 is highly conserved and ubiquitously expressed in many tissues, our results are likely relevant for development and disease in higher eukaryotes.
View details for DOI 10.1093/nar/gku1185
View details for Web of Science ID 000350207100017
View details for PubMedID 25477386
View details for PubMedCentralID PMC4288150
- Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis. Nature communications 2015; 6: 8085-?
Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis.
2015; 6: 8085-?
Generalized lymphatic dysplasia (GLD) is a rare form of primary lymphoedema characterized by a uniform, widespread lymphoedema affecting all segments of the body, with systemic involvement such as intestinal and/or pulmonary lymphangiectasia, pleural effusions, chylothoraces and/or pericardial effusions. This may present prenatally as non-immune hydrops. Here we report homozygous and compound heterozygous mutations in PIEZO1, resulting in an autosomal recessive form of GLD with a high incidence of non-immune hydrops fetalis and childhood onset of facial and four limb lymphoedema. Mutations in PIEZO1, which encodes a mechanically activated ion channel, have been reported with autosomal dominant dehydrated hereditary stomatocytosis and non-immune hydrops of unknown aetiology. Besides its role in red blood cells, our findings indicate that PIEZO1 is also involved in the development of lymphatic structures.
View details for DOI 10.1038/ncomms9085
View details for PubMedID 26333996
Whole-Exome Enrichment with the Agilent SureSelect Human All Exon Platform.
Cold Spring Harbor protocols
2015; 2015 (7): pdb prot083659-?
There are multiple platforms available for whole-exome enrichment and sequencing (WES). This protocol is based on the Agilent SureSelect Human All Exon platform, which targets ∼50 Mb of the human exonic regions. The SureSelect system uses ∼120-base RNA probes to capture known coding DNA sequences (CDS) from the NCBI Consensus CDS Database as well as other major RNA coding sequence databases, such as Sanger miRBase. The protocol can be performed at the benchside without the need for automation, and the resulting library can be used for targeted next-generation sequencing on an Illumina HiSeq 2000 sequencer.
View details for DOI 10.1101/pdb.prot083659
View details for PubMedID 25762417
Metabolome progression during early gut microbial colonization of gnotobiotic mice.
2015; 5: 11589-?
The microbiome has been implicated directly in host health, especially host metabolic processes and development of immune responses. These are particularly important in infants where the gut first begins being colonized, and such processes may be modeled in mice. In this investigation we follow longitudinally the urine metabolome of ex-germ-free mice, which are colonized with two bacterial species, Bacteroides thetaiotaomicron and Bifidobacterium longum. High-throughput mass spectrometry profiling of urine samples revealed dynamic changes in the metabolome makeup, associated with the gut bacterial colonization, enabled by our adaptation of non-linear time-series analysis to urine metabolomics data. Results demonstrate both gradual and punctuated changes in metabolite production and that early colonization events profoundly impact the nature of small molecules circulating in the host. The identified small molecules are implicated in amino acid and carbohydrate metabolic processes, and offer insights into the dynamic changes occurring during the colonization process, using high-throughput longitudinal methodology.
View details for DOI 10.1038/srep11589
View details for PubMedID 26118551
Genomic analysis of fibrolamellar hepatocellular carcinoma.
Human molecular genetics
2015; 24 (1): 50-63
Pediatric tumors are relatively infrequent but are often associated with significant lethality and lifelong morbidity. A major goal of pediatric cancer research has been to identify key drivers of tumorigenesis to eventually develop targeted therapies to enhance cure rate and minimize acute and long-term toxic effects. Here we used genomics approaches to identify biomarkers and candidate drivers for fibrolamellar hepatocellular carcinoma (FL-HCC), a very rare subtype of pediatric liver cancer for which limited therapeutic options exist. In-depth genomics analyses of one tumor followed by immunohistochemistry validation on seven other tumors showed expression of neuroendocrine markers in FL-HCC. DNA and RNA sequencing data further showed that common cancer pathways are not visibly altered in FL-HCC but identified two novel structural variants, both resulting in fusion transcripts. The first, a 400 kb deletion, results in a DNAJB1-PRKACA fusion transcript, which leads to increased PKA activity in the index tumor case and other FL-HCC cases compared to normal liver. This PKA fusion protein is oncogenic in HCC cells. The second gene fusion event, a translocation between the CLPTM1L and GLIS3 genes, generates a transcript whose product also promotes cancer phenotypes in HCC cell lines. These experiments further highlight the tumorigenic role of gene fusions in the etiology of pediatric solid tumors and identify both candidate biomarkers and possible therapeutic targets for this lethal pediatric disease.
View details for DOI 10.1093/hmg/ddu418
View details for PubMedID 25122662
- Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss BMC GENOMICS 2014; 15
Genomic era diagnosis and management of hereditary and sporadic colon cancer.
World journal of clinical oncology
2014; 5 (5): 1036-1047
The morbidity and mortality attributable to heritable and sporadic carcinomas of the colon are substantial and affect children and adults alike. Despite current colonoscopy screening recommendations, colorectal adenocarcinoma (CRC) still accounts for almost 140000 cancer cases yearly. Familial adenomatous polyposis (FAP) is a colon cancer predisposition due to alterations in the adenomatous polyposis coli gene, which is mutated in most CRC. Since the beginning of the genomic era, next-generation sequencing analyses of CRC have continued to improve our understanding of the genetics of tumorigenesis and promise to expand our ability to identify and treat this disease. Advances in genome sequence analysis have facilitated the molecular diagnosis of individuals with FAP, which enables initiation of appropriate monitoring and timely intervention. Genome sequencing also has potential clinical impact for individuals with sporadic forms of CRC, providing means for molecular diagnosis of CRC tumor type, data guiding selection of tumor-targeted therapies, and pharmacogenomic profiles specifying patient-specific drug tolerances. There is even a potential role for genomic sequencing in surveillance for recurrence, and early detection, of CRC. We review strategies for diagnostic assessment and management of FAP and sporadic CRC in the current genomic era, with emphasis on the current, and potential for future, impact of genome sequencing on the clinical care of these conditions.
View details for DOI 10.5306/wjco.v5.i5.1036
View details for PubMedID 25493239
View details for PubMedCentralID PMC4259930
Widespread contribution of transposable elements to the innovation of gene regulatory networks
2014; 24 (12): 1963-1976
Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.
View details for DOI 10.1101/gr.168872.113
View details for Web of Science ID 000345810600005
View details for PubMedID 25319995
View details for PubMedCentralID PMC4248313
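The headline statistic in the abstract above (the share of binding sites embedded within TEs) comes down to an interval-containment count. The following is a minimal Python sketch of that idea, not the authors' pipeline. It assumes half-open `[start, end)` coordinates and treats a peak as TE-derived when its summit falls inside an annotated TE. The function names and this containment rule are illustrative assumptions.

```python
from bisect import bisect_right


def build_te_index(te_intervals):
    """Merge overlapping TE intervals per chromosome.

    te_intervals: iterable of (chrom, start, end), half-open [start, end).
    Returns a dict: chrom -> (sorted starts, matching ends).
    """
    by_chrom = {}
    for chrom, start, end in te_intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:
                # Overlapping or adjacent: extend the current merged interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def summit_in_te(index, chrom, summit):
    """Return True if the summit position lies inside any merged TE interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, summit) - 1  # last interval starting at or before summit
    return i >= 0 and summit < ends[i]


def te_fraction(index, summits):
    """Fraction of peak summits, given as (chrom, position), that fall inside a TE."""
    if not summits:
        return 0.0
    hits = sum(summit_in_te(index, chrom, pos) for chrom, pos in summits)
    return hits / len(summits)
```

Merging the TE intervals first keeps the lookup to a single binary search per peak even when annotated elements are nested. A real analysis would likely use a dedicated interval library and a stricter definition of "embedded".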
Genome-wide map of regulatory interactions in the human genome
2014; 24 (12): 1905-1917
Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) experiments targeting six broadly distributed factors. Bound regions covered 80% of DNase I hypersensitive sites including 99.7% of TSS and 98% of enhancers. Correlating this map with ChIP-seq and RNA-seq data sets revealed cohesin, CTCF, and ZNF143 as key components of three-dimensional chromatin structure and revealed how the distal chromatin state affects gene transcription. Comparison of interactions between cell types revealed that enhancer-promoter interactions were highly cell-type-specific. Construction and comparison of distal and proximal regulatory networks revealed stark differences in structure and biological function. Proximal binding events are enriched at genes with housekeeping functions, while distal binding events interact with genes involved in dynamic biological processes including response to stimulus. This study reveals new mechanistic and functional insights into regulatory region organization in the nucleus.
View details for DOI 10.1101/gr.176586.114
View details for Web of Science ID 000345810600001
View details for PubMedID 25228660
- A comparative encyclopedia of DNA elements in the mouse genome NATURE 2014; 515 (7527): 355-?
Principles of regulatory information conservation between mouse and human
2014; 515 (7527): 371-?
To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.
View details for DOI 10.1038/nature13985
View details for Web of Science ID 000345770600036
View details for PubMedCentralID PMC4343047
- Topologically associating domains are stable units of replication-timing regulation NATURE 2014; 515 (7527): 402-?
Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway.
Genetics in medicine
2014; 16 (10): 751-758
Purpose: The endoplasmic reticulum-associated degradation pathway is responsible for the translocation of misfolded proteins across the endoplasmic reticulum membrane into the cytosol for subsequent degradation by the proteasome. To define the phenotype associated with a novel inherited disorder of cytosolic endoplasmic reticulum-associated degradation pathway dysfunction, we studied a series of eight patients with deficiency of N-glycanase 1. Methods: Whole-genome, whole-exome, or standard Sanger sequencing techniques were employed. Retrospective chart reviews were performed in order to obtain clinical data. Results: All patients had global developmental delay, a movement disorder, and hypotonia. Other common findings included hypolacrima or alacrima (7/8), elevated liver transaminases (6/7), microcephaly (6/8), diminished reflexes (6/8), hepatocyte cytoplasmic storage material or vacuolization (5/6), and seizures (4/8). The nonsense mutation c.1201A>T (p.R401X) was the most common deleterious allele. Conclusion: NGLY1 deficiency is a novel autosomal recessive disorder of the endoplasmic reticulum-associated degradation pathway associated with neurological dysfunction, abnormal tear production, and liver disease. The majority of patients detected to date carry a specific nonsense mutation that appears to be associated with severe disease. The phenotypic spectrum is likely to enlarge as cases with a broader range of mutations are detected. Genet Med advance online publication 20 March 2014.
View details for DOI 10.1038/gim.2014.22
View details for PubMedID 24651605
Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures.
2014; 30 (19): 2808-2810
Motivation: Interpretation and communication of genomic data require flexible and quantitative tools to analyze and visualize diverse data types, and yet, a comprehensive tool to display all common genomic data types in publication quality figures does not exist to date. To address this shortcoming, we present Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visualizations into highly customizable, publication-ready, multi-panel figures from common genomic data formats including Browser Extensible Data (BED), bedGraph and Browser Extensible Data Paired-End (BEDPE). Sushi.R is open source and made publicly available through GitHub (https://github.com/dphansti/Sushi) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/Sushi.html).
View details for DOI 10.1093/bioinformatics/btu379
View details for PubMedID 24903420
- Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures BIOINFORMATICS 2014; 30 (19): 2808-2810
Comparative analysis of regulatory information and circuits across distant species.
2014; 512 (7515): 453-456
Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.
View details for DOI 10.1038/nature13668
View details for PubMedID 25164757
Regulatory analysis of the C. elegans genome with spatiotemporal resolution.
2014; 512 (7515): 400-405
Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.
View details for DOI 10.1038/nature13497
View details for PubMedID 25164749
Shared functions of plant and mammalian StAR-related lipid transfer (START) domains in modulating transcription factor activity
Steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains were first identified from mammalian proteins that bind lipid/sterol ligands via a hydrophobic pocket. In plants, predicted START domains are predominantly found in homeodomain leucine zipper (HD-Zip) transcription factors that are master regulators of cell-type differentiation in development. Here we utilized studies of Arabidopsis in parallel with heterologous expression of START domains in yeast to investigate the hypothesis that START domains are versatile ligand-binding motifs that can modulate transcription factor activity. Our results show that deletion of the START domain from Arabidopsis Glabra2 (GL2), a representative HD-Zip transcription factor involved in differentiation of the epidermis, results in a complete loss-of-function phenotype, although the protein is correctly localized to the nucleus. Despite low sequence similarity, the mammalian START domain from StAR can functionally replace the HD-Zip-derived START domain. Embedding the START domain within a synthetic transcription factor in yeast, we found that several mammalian START domains from StAR, MLN64 and PCTP stimulated transcription factor activity, as did START domains from two Arabidopsis HD-Zip transcription factors. Mutation of ligand-binding residues within StAR START reduced this activity, consistent with the yeast assay monitoring ligand-binding. The D182L missense mutation in StAR START was shown to affect GL2 transcription factor activity in maintenance of the leaf trichome cell fate. Analysis of in vivo protein-metabolite interactions by mass spectrometry provided direct evidence for analogous lipid-binding activity in mammalian and plant START domains in the yeast system.
Structural modeling predicted similar-sized ligand-binding cavities of a subset of plant START domains in comparison to mammalian counterparts. The START domain is required for transcription factor activity in HD-Zip proteins from plants, although it is not strictly necessary for the protein's nuclear localization. START domains from both mammals and plants are modular in that they can bind lipid ligands to regulate transcription factor function in a yeast system. The data provide evidence for an evolutionarily conserved mechanism by which lipid metabolites can orchestrate transcription. We propose a model in which the START domain is used by both plants and mammals to regulate transcription factor activity.
View details for DOI 10.1186/s12915-014-0070-8
View details for Web of Science ID 000342371100001
View details for PubMedID 25159688
- Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 2014; 111 (33): E3366-E3366
Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture.
2014; 10 (8)
Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. 
Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.
View details for DOI 10.1371/journal.pgen.1004549
View details for PubMedID 25121757
- Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS genetics 2014; 10 (8)
H3K4me3 Breadth Is Linked to Cell Identity and Transcriptional Consistency.
2014; 158 (3): 673-688
Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites of active genes. Here, we show that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type. Using the broadest H3K4me3 domains as a discovery tool in neural progenitor cells, we identify novel regulators of these cells. Machine learning models reveal that the broadest H3K4me3 domains represent a distinct entity, characterized by increased marks of elongation. The broadest H3K4me3 domains also have more paused polymerase at their promoters, suggesting a unique transcriptional output. Indeed, genes marked by the broadest H3K4me3 domains exhibit enhanced transcriptional consistency and increased transcriptional levels, and perturbation of H3K4me3 breadth leads to changes in transcriptional consistency. Thus, H3K4me3 breadth contains information that could ensure transcriptional precision at key cell identity/function genes.
View details for DOI 10.1016/j.cell.2014.06.027
View details for PubMedID 25083876
View details for PubMedCentralID PMC4137894
- Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nature biotechnology 2014; 32 (6): 562-568
- Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 2014; 509 (7500): 371-375
Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues.
2014; 10 (5)
Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.
View details for DOI 10.1371/journal.pgen.1004304
View details for PubMedID 24786518
- Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues PLOS GENETICS 2014; 10 (5)
Defining functional DNA elements in the human genome
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2014; 111 (17): 6131-6138
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
View details for DOI 10.1073/pnas.1318948111
View details for Web of Science ID 000335199000025
View details for PubMedID 24753594
View details for PubMedCentralID PMC4035993
Extended lifespan and reduced adiposity in mice lacking the FAT10 gene.
Proceedings of the National Academy of Sciences of the United States of America
2014; 111 (14): 5313-5318
The HLA-F adjacent transcript 10 (FAT10) is a member of the ubiquitin-like gene family that alters protein function/stability through covalent ligation. Although FAT10 is induced by inflammatory mediators and implicated in immunity, the physiological functions of FAT10 are poorly defined. We report the discovery that FAT10 regulates lifespan through pleiotropic actions on metabolism and inflammation. Median and overall lifespan are increased 20% in FAT10ko mice, coincident with elevated metabolic rate, preferential use of fat as fuel, and dramatically reduced adiposity. This phenotype is associated with metabolic reprogramming of skeletal muscle (i.e., increased AMP kinase activity, β-oxidation and -uncoupling, and decreased triglyceride content). Moreover, knockout mice have reduced circulating glucose and insulin levels and enhanced insulin sensitivity in metabolic tissues, consistent with elevated IL-10 in skeletal muscle and serum. These observations suggest novel roles of FAT10 in immune metabolic regulation that impact aging and chronic disease.
View details for DOI 10.1073/pnas.1323426111
View details for PubMedID 24706839
Haplotype structure and positive selection at TLR1
EUROPEAN JOURNAL OF HUMAN GENETICS
2014; 22 (4): 551-557
Toll-like receptor 1, when dimerized with Toll-like receptor 2, is a cell surface receptor that, upon recognition of bacterial lipoproteins, activates the innate immune system. Variants in TLR1 associate with the risk of a variety of medical conditions and diseases, including sepsis, leprosy, tuberculosis, and others. The foremost of these is rs5743618 c.2079T>G(p.(Ile602Ser)), the derived allele of which is associated with reduced risk of sepsis, leprosy, and other diseases. Interestingly, 602Ser, which shows signatures of selection, inhibits TLR1 surface trafficking and subsequent activation of NFκB upon recognition of a ligand. This suggests that reduced TLR1 activity may be beneficial for human health. To better understand TLR1 variation and its link to human health, we have typed all 7 high-frequency missense variants (>5% in at least one population) along with 17 other variants in and around TLR1 in 2548 individuals from 56 populations from around the globe. We have also found additional signatures of selection on missense variants not associated with rs5743618, suggesting that there may be multiple functional alleles under positive selection in this gene.
View details for DOI 10.1038/ejhg.2013.194
View details for Web of Science ID 000332938400027
View details for PubMedID 24002163
Clinical interpretation and implications of whole-genome sequencing.
2014; 311 (10): 1035-1045
Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance.
Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.
View details for DOI 10.1001/jama.2014.1717
View details for PubMedID 24618965
View details for PubMedCentralID PMC4119063
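The abstract above reports interrater agreement among five physicians as a Fleiss κ. For readers unfamiliar with the statistic, here is a minimal Python sketch of the standard Fleiss κ formula. The input layout (one row per rated item, one column per category, each cell counting the raters who chose that category) and the function name are assumptions for illustration. No data from the study are reproduced.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Mean observed agreement across items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Agreement expected by chance from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in proportions)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

With five raters, a row such as `[3, 2]` means three physicians chose one option and two chose the other. Values near 0 indicate agreement no better than chance, and 1 indicates perfect agreement.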
- Erratum: A single-molecule long-read survey of the human transcriptome. Nature biotechnology 2014; 32 (3): 291-?
Gene-centric Meta-analysis in 87,736 Individuals of European Ancestry Identifies Multiple Blood-Pressure-Related Loci.
American journal of human genetics
2014; 94 (3): 349-360
Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ~50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10⁻⁷) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification.
View details for DOI 10.1016/j.ajhg.2013.12.016
View details for PubMedID 24560520
View details for PubMedCentralID PMC3951943
Whole-genome haplotyping using long reads and statistical methods
2014; 32 (3): 261-266
The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.
View details for DOI 10.1038/nbt.2833
View details for Web of Science ID 000332819800024
View details for PubMedID 24561555
- Ordering and dynamical properties of superbright C-60 molecules on Ag(111) PHYSICAL REVIEW B 2014; 89 (8)
Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association.
2014; 10 (2)
Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.
View details for DOI 10.1371/journal.pgen.1004122
View details for PubMedID 24516403
View details for PubMedCentralID PMC3916285
- Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association. PLoS genetics 2014; 10 (2)
- Landscape and variation of RNA secondary structure across the human transcriptome. Nature 2014; 505 (7485): 706-709
iPOP and its role in participatory medicine
Michael Snyder shares his thoughts on participatory medicine and how omics profiling could fit into this new model of healthcare where patients are at the center of medicine.
View details for DOI 10.1186/gm512
View details for Web of Science ID 000335597000001
View details for PubMedID 24479626
- Identification of STAT5A and STAT5B Target Genes in Human T Cells. PloS one 2014; 9 (1)
Path-scan: a reporting tool for identifying clinically actionable variants.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2014; 19: 229-240
The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowd source the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.
View details for PubMedID 24297550
View details for PubMedCentralID PMC4008882
Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss.
2014; 15: 1155-?
The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45 x 10(-7)). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.
View details for DOI 10.1186/1471-2164-15-1155
View details for PubMedID 25528277
Global analysis of transcription factor-binding sites in yeast using ChIP-Seq.
Methods in molecular biology (Clifton, N.J.)
2014; 1205: 231-255
Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28-36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy.
View details for DOI 10.1007/978-1-4939-1363-3_15
View details for PubMedID 25213249
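The barcode-parsing step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read begins with a fixed-length barcode that must match exactly; the function name `demultiplex`, the barcodes, and the sample labels are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and trim the barcode off.

    Assumption (not from the source): barcodes sit at the 5' end of each
    read, all share one length, and are matched exactly.
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("all barcodes must share one length")
    (barcode_len,) = lengths

    bins = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            # Keep unmatched reads intact so they can be inspected later.
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[barcode_len:])
    return dict(bins)


# Hypothetical barcodes for a tagged strain and its untagged control.
barcodes = {"ACGT": "tagged_TF", "TGCA": "untagged_control"}
reads = ["ACGTTTGGA", "TGCAAAACC", "GGGGCCCC"]
result = demultiplex(reads, barcodes)
```

In practice, demultiplexing tools usually tolerate one or more barcode mismatches; exact matching is used here only to keep the sketch short.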
Identification of STAT5A and STAT5B target genes in human T cells.
2014; 9 (1)
Signal transducer and activator of transcription (STAT) comprises a family of universal transcription factors that help cells sense and respond to environmental signals. STAT5 refers to two highly related proteins, STAT5A and STAT5B, with critical function: their complete deficiency is lethal in mice; in humans, STAT5B deficiency alone leads to endocrine and immunological problems, while STAT5A deficiency has not been reported. STAT5A and STAT5B show peptide sequence similarities greater than 90%, but subtle structural differences suggest possible non-redundant roles in gene regulation. However, these roles remain unclear in humans. We applied chromatin immunoprecipitation followed by DNA sequencing using human CD4(+) T cells to detect candidate genes regulated by STAT5A and/or STAT5B, and quantitative-PCR in STAT5A or STAT5B knock-down (KD) human CD4(+) T cells to validate the findings. Our data show STAT5A and STAT5B play redundant roles in cell proliferation and apoptosis via SGK1 interaction. Interestingly, we found a novel, unique role for STAT5A in binding to genes involved in neural development and function (NDRG1, DNAJC6, and SSH2), while STAT5B appears to play a distinct role in T cell development and function via DOCK8, SNX9, FOXP3 and IL2RA binding. Our results also suggest that one or more co-activators for STAT5A and/or STAT5B may play important roles in establishing different binding abilities and gene regulation behaviors. The new identification of these genes regulated by STAT5A and/or STAT5B has major implications for understanding the pathophysiology of cancer progression, neural disorders, and immune abnormalities.
View details for DOI 10.1371/journal.pone.0086790
View details for PubMedID 24497979
Chromatin immunoprecipitation and multiplex sequencing (ChIP-Seq) to identify global transcription factor binding sites in the nematode Caenorhabditis elegans.
Methods in enzymology
2014; 539: 89-111
The global identification of transcription factor (TF) binding sites is a critical step in the elucidation of the functional elements of the genome. Several methods have been developed that map TF binding in human cells, yeast, and other model organisms. These methods make use of chromatin immunoprecipitation, or ChIP, and take advantage of the fact that formaldehyde fixation of living cells can be used to cross-link DNA sequences to the TFs that bind them in vivo. In ChIP, the cross-linked TF-DNA complexes are sheared by sonication, size fractionated, and incubated with antibody specific to the TF of interest to generate a library of TF-bound DNA sequences. ChIP-chip was the first technology developed to globally identify TF-bound DNA sequences and involves subsequent hybridization of the ChIP DNA to oligonucleotide microarrays. However, ChIP-chip proved to be costly, labor-intensive, and limited by the fixed number of probes available on the microarray chip. ChIP-Seq combines ChIP with massively parallel high-throughput sequencing (see Explanatory Chapter: Next Generation Sequencing) and has demonstrated vast improvement over ChIP-chip with respect to time and cost, signal-to-noise ratio, and resolution. In particular, multiplex sequencing can be used to achieve a higher throughput in ChIP-Seq analyses involving organisms with genomes of lower complexity than that of human (Lefrançois et al., 2009) and thereby reduce the cost and amount of time needed for each result. The multiplex ChIP-Seq method described in this section has been developed for Caenorhabditis elegans, but is easily adaptable for other organisms.
View details for DOI 10.1016/B978-0-12-420120-0.00007-4
View details for PubMedID 24581441
Strain Kaplan of Pseudorabies Virus Genome Sequenced by PacBio Single-Molecule Real-Time Sequencing Technology.
2014; 2 (4)
Pseudorabies virus (PRV) is a neurotropic herpesvirus that causes Aujeszky's disease in pigs. PRV strains are widely used as transsynaptic tracers for mapping neural circuits. We present here the complete and fully annotated genome sequence of strain Kaplan of PRV, determined by Pacific Biosciences RSII long-read sequencing technology.
View details for DOI 10.1128/genomeA.00628-14
View details for PubMedID 25035325
Serum profiling using protein microarrays to identify disease related antigens.
Methods in molecular biology (Clifton, N.J.)
2014; 1176: 169-178
Disease related antigens are of great importance in the clinic. They are used as markers to screen patients for various forms of cancer, to monitor response to therapy, or to serve as therapeutic targets (Chapman et al., Ann Oncol 18(5):868-873, 2007; Soussi et al., Cancer Res 60:1777-1788, 2000; Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Levenson, Biochim Biophy Acta 1770:847-856, 2007). In cancer endogenous levels of protein expression may be disrupted or proteins may be expressed in an aberrant fashion resulting in an immune response that bypasses self tolerance (Soussi et al., Cancer Res 60:1777-1788, 2000; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Molina et al., Breast Cancer Res Treat 51:109-119, 1998). Protein microarrays, which represent a large fraction of the human proteome, have been used to identify antigens in multiple diseases including cancer (Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499, 2007; Beyer et al., J Neuroimmunol 242:26-32, 2012). Typically, arrays are probed with immunoglobulin (Ig) samples from patients as well as healthy controls, then compared to determine which antigens (Ag's) are more reactive within the patient group (Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499).
View details for DOI 10.1007/978-1-4939-0992-6_14
View details for PubMedID 25030927
- Personalized sequencing and the future of medicine: discovery, diagnosis and defeat of disease PHARMACOGENOMICS 2014; 15 (14): 1771-1790
STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.
2014; 9 (1)
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.
View details for DOI 10.1371/journal.pone.0084860
View details for PubMedID 24454756
Distinct Splice Variants and Pathway Enrichment in the Cell-Line Models of Aggressive Human Breast Cancer Subtypes
JOURNAL OF PROTEOME RESEARCH
2014; 13 (1): 212-227
This study was conducted as a part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes in chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes; it contains many cancer-associated genes, including BRCA1, ERBB2, (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2 expressed breast cancer cell-line models of hormone-receptor-negative breast cancers by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic variations subtype: SKBR3 (ERBB2+ (overexpression)/ER-/PR-; adenocarcinoma), SUM190 (ERBB2+ (overexpression)/ER-/PR-; inflammatory breast cancer), and SUM149 (ERBB2 (low expression) ER-/PR-; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes that are in the signaling pathways downstream of ERBB2 along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated more similarities between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell-line specific variants pointed to distinct key mechanisms including: amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids in SUM190; cell-to-cell adhesion, integrin, and ERK1/ERK2 signaling; and translational control in SUM149. The analyses indicated an enrichment in the electron transport chain processes in the ERBB2 overexpressed cell line models and an association of nucleotide binding, RNA splicing, and translation processes with the IBC models, SUM190 and SUM149. 
Detailed experimental studies on the distinct variants identified from each of these three breast cancer cell line models may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.
View details for DOI 10.1021/pr400773v
View details for Web of Science ID 000329472700022
View details for PubMedID 24111759
STAT3 Targets Suggest Mechanisms of Aggressive Tumorigenesis in Diffuse Large B-Cell Lymphoma
G3-GENES GENOMES GENETICS
2013; 3 (12): 2173-2185
The signal transducer and activator of transcription 3 (STAT3) is a transcription factor that, when dysregulated, becomes a powerful oncogene found in many human cancers, including diffuse large B-cell lymphoma. Diffuse large B-cell lymphoma is the most common form of non-Hodgkin's lymphoma and has two major subtypes: germinal center B-cell-like and activated B-cell-like. Compared with the germinal center B-cell-like form, activated B-cell-like lymphomas respond much more poorly to current therapies and often exhibit overexpression or overactivation of STAT3. To investigate how STAT3 might contribute to this aggressive phenotype, we have integrated genome-wide studies of STAT3 DNA binding using chromatin immunoprecipitation-sequencing with whole-transcriptome profiling using RNA-sequencing. STAT3 binding sites are present near almost a third of all genes that differ in expression between the two subtypes, and examination of the affected genes identified previously undetected and clinically significant pathways downstream of STAT3 that drive oncogenesis. Novel treatments aimed at these pathways may increase the survivability of activated B-cell-like diffuse large B-cell lymphoma.
View details for DOI 10.1534/g3.113.007674
View details for Web of Science ID 000328334500008
View details for PubMedID 24142927
Metadata Checklist for the Integrated Personal Omics Study: Proteomics and Metabolomics Experiments.
2013; 1 (4): 202-206
The integrative personal omics profiling study introduced a novel, integrative approach based on personalized, longitudinal, multi-omics data. The study collected genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. The results revealed various medical risks and extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. The current article is a data publication that provides the checklists for the metadata of the proteomics (see Table 1) and metabolomics (see Table 2) datasets of the study. The proposed checklist was recently developed and endorsed by the Data-Enabled Life Sciences Alliance (DELSA Global). We call for the broader use of data publications using the metadata checklist to make omics data more discoverable, interpretable, and reusable, while enabling appropriate attribution to data generators and infrastructure science builders.
View details for DOI 10.1089/big.2013.0040
View details for PubMedID 27447252
- METADATA CHECKLIST FOR THE INTEGRATED PERSONAL OMICS STUDY: Proteomics and Metabolomics Experiments BIG DATA 2013; 1 (4): BD202-U81
- TOWARD MORE TRANSPARENT AND REPRODUCIBLE OMICS STUDIES THROUGH A COMMON METADATA CHECKLIST AND DATA PUBLICATIONS BIG DATA 2013; 1 (4): BD196-?
- A single-molecule long-read survey of the human transcriptome. Nature biotechnology 2013; 31 (11): 1009-1014
- Defective sphingosine 1-phosphate receptor 1 (S1P1) phosphorylation exacerbates TH17-mediated autoimmune neuroinflammation. Nature immunology 2013; 14 (11): 1166-1172
Comprehensive whole-genome sequencing of an early-stage primary myelofibrosis patient defines low mutational burden and non-recurrent candidate genes.
2013; 98 (11): 1689-1696
In order to identify novel somatic mutations associated with classic BCR/ABL1-negative myeloproliferative neoplasms, we performed high-coverage genome sequencing of DNA from peripheral blood granulocytes and cultured skin fibroblasts from a patient with MPL W515K-positive primary myelofibrosis. The primary myelofibrosis genome had a low somatic mutation rate, consistent with that observed in similar hematopoietic tumor genomes. Interfacing of whole-genome DNA sequence data with RNA expression data identified three somatic mutations of potential functional significance: a nonsense mutation in CARD6, implicated in modulation of NF-kappaB activation; a 19-base pair deletion involving a potential regulatory region in the 5'-untranslated region of BRD2, implicated in transcriptional regulation and cell cycle control; and a non-synonymous point mutation in KIAA0355, an uncharacterized protein. Additional mutations in three genes (CAP2, SOX30, and MFRP) were also evident, albeit with no support for expression at the RNA level. Re-sequencing of these six genes in 178 patients with polycythemia vera, essential thrombocythemia, and myelofibrosis did not identify recurrent somatic mutations in these genes. Finally, we describe methods for reducing false-positive variant calls in the analysis of hematologic malignancies with a low somatic mutation rate. This trial is registered with ClinicalTrials.gov (NCT01108159).
View details for DOI 10.3324/haematol.2013.092379
View details for PubMedID 23872309
Impacts of variation in the human genome on gene regulation.
Journal of molecular biology
2013; 425 (21): 3970-3977
Recent advances in fast and inexpensive DNA sequencing have enabled the extensive study of genomic and transcriptomic variation in humans. Human genomic variation is composed of sequence and structural changes including single-nucleotide and multinucleotide variants, short insertions or deletions (indels), larger copy number variants, and similarly sized copy neutral inversions and translocations. It is now well established that any two genomes differ extensively and that structural changes constitute the most prominent source of this variation. There have also been major technological advances in RNA sequencing to globally quantify and describe diversity in transcripts. Large consortia such as the 1000 Genomes Project and the Encyclopedia of DNA Elements Project are producing increasingly comprehensive maps outlining the regions of the human genome containing variants and functional elements, respectively. Integration of genetic variation data and extensive annotation of functional genomic elements, along with the ability to measure global transcription, allow the impacts of genetic variants on gene expression to be resolved. There are several well-established models by which genetic variants affect gene regulation depending on the type, nature, and position of the variant with respect to the affected genes. These effects can be manifested in two ways: changes to transcript sequences and isoforms by coding variants, and changes to transcript abundance by dosage or regulatory variants. Here, we review the current state of how genetic variations impact gene regulation locally and globally in the human genome.
View details for DOI 10.1016/j.jmb.2013.07.015
View details for PubMedID 23871684
Incorporating Motif Analysis into Gene Co-expression Networks Reveals Novel Modular Expression Pattern and New Signaling Pathways
2013; 9 (10)
Understanding of gene regulatory networks requires discovery of expression modules within gene co-expression networks and identification of promoter motifs and corresponding transcription factors that regulate their expression. A commonly used method for this purpose is a top-down approach based on clustering the network into a range of densely connected segments, treating these segments as expression modules, and extracting promoter motifs from these modules. Here, we describe a novel bottom-up approach to identify gene expression modules driven by known cis-regulatory motifs in the gene promoters. For a specific motif, genes in the co-expression network are ranked according to their probability of belonging to an expression module regulated by that motif. The ranking is conducted via motif enrichment or motif position bias analysis. Our results indicate that motif position bias analysis is an effective tool for genome-wide motif analysis. Sub-networks containing the top ranked genes are extracted and analyzed for inherent gene expression modules. This approach identified novel expression modules for the G-box, W-box, site II, and MYB motifs from an Arabidopsis thaliana gene co-expression network based on the graphical Gaussian model. The novel expression modules include those involved in house-keeping functions, primary and secondary metabolism, and abiotic and biotic stress responses. In addition to confirmation of previously described modules, we identified modules that include new signaling pathways. To associate transcription factors that regulate genes in these co-expression modules, we developed a novel reporter system. Using this approach, we evaluated MYB transcription factor-promoter interactions within MYB motif modules.
View details for DOI 10.1371/journal.pgen.1003840
View details for Web of Science ID 000330367200023
View details for PubMedID 24098147
Genome-wide Association Analysis of Blood-Pressure Traits in African-Ancestry Individuals Reveals Common Associated Genes in African and Non-African Populations
AMERICAN JOURNAL OF HUMAN GENETICS
2013; 93 (3): 545-554
High blood pressure (BP) is more prevalent and contributes to more severe manifestations of cardiovascular disease (CVD) in African Americans than in any other United States ethnic group. Several small African-ancestry (AA) BP genome-wide association studies (GWASs) have been published, but their findings have failed to replicate to date. We report on a large AA BP GWAS meta-analysis that includes 29,378 individuals from 19 discovery cohorts and subsequent replication in additional samples of AA (n = 10,386), European ancestry (EA) (n = 69,395), and East Asian ancestry (n = 19,601). Five loci (EVX1-HOXA, ULK4, RSPO3, PLEKHG1, and SOX6) reached genome-wide significance (p < 1.0 × 10(-8)) for either systolic or diastolic BP in a transethnic meta-analysis after correction for multiple testing. Three of these BP loci (EVX1-HOXA, RSPO3, and PLEKHG1) lack previous associations with BP. We also identified one independent signal in a known BP locus (SOX6) and provide evidence for fine mapping in four additional validated BP loci. We also demonstrate that validated EA BP GWAS loci, considered jointly, show significant effects in AA samples. Consequently, these findings suggest that BP loci might have universal effects across studied populations, demonstrating that multiethnic samples are an essential component in identifying, fine mapping, and understanding their trait variability.
View details for DOI 10.1016/j.ajhg.2013.07.010
View details for Web of Science ID 000330268900014
View details for PubMedID 23972371
Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females
2013; 341 (6145): 562-565
The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (T(MRCA)) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome T(MRCA) to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.
View details for DOI 10.1126/science.1237619
View details for Web of Science ID 000322586700057
View details for PubMedID 23908239
Genome-wide profiling of human cap-independent translation-enhancing elements.
2013; 10 (8): 747-750
We report an in vitro selection strategy to identify RNA sequences that mediate cap-independent initiation of translation. This method entails mRNA display of trillions of genomic fragments, selection for initiation of translation and high-throughput deep sequencing. We identified >12,000 translation-enhancing elements (TEEs) in the human genome, generated a high-resolution map of human TEE-bearing regions (TBRs), and validated the function of a subset of sequences in vitro and in cultured cells.
View details for DOI 10.1038/nmeth.2522
View details for PubMedID 23770754
- Genome-wide profiling of human cap-independent translation-enhancing elements NATURE METHODS 2013; 10 (8): 747-?
Functional genomic screen of human stem cell differentiation reveals pathways involved in neurodevelopment and neurodegeneration
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2013; 110 (30): 12361-12366
Human embryonic stem cells (hESCs) can be induced and differentiated to form a relatively homogeneous population of neuronal precursors in vitro. We have used this system to screen for genes necessary for neural lineage development by using a pooled human short hairpin RNA (shRNA) library screen and massively parallel sequencing. We confirmed known genes and identified several unpredicted genes with interrelated functions that were specifically required for the formation or survival of neuronal progenitor cells without interfering with the self-renewal capacity of undifferentiated hESCs. Among these are several genes that have been implicated in various neurodevelopmental disorders (i.e., brain malformations, mental retardation, and autism). Unexpectedly, a set of genes mutated in late-onset neurodegenerative disorders and with roles in the formation of RNA granules were also found to interfere with neuronal progenitor cell formation, suggesting their functional relevance in early neurogenesis. This study advances the feasibility and utility of using pooled shRNA libraries in combination with next-generation sequencing for a high-throughput, unbiased functional genomic screen. Our approach can also be used with patient-specific human-induced pluripotent stem cell-derived neural models to obtain unparalleled insights into developmental and degenerative processes in neurological or neuropsychiatric disorders with monogenic or complex inheritance.
View details for DOI 10.1073/pnas.1309725110
View details for Web of Science ID 000322112300054
View details for PubMedID 23836664
View details for PubMedCentralID PMC3725080
Variation and genetic control of protein abundance in humans
2013; 499 (7456): 79-82
Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.
View details for DOI 10.1038/nature12223
View details for Web of Science ID 000321285600037
View details for PubMedID 23676674
Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis
2013; 5 (7): 1664-1681
The West Nile virus (WNV) is an emerging infection of biodefense concern and there are no available treatments or vaccines. Here we used a high-throughput method based on a novel gene expression analysis, RNA-Seq, to give a global picture of differential gene expression by primary human macrophages of 10 healthy donors infected in vitro with WNV. From a total of 28 million reads per sample, we identified 1,514 transcripts that were differentially expressed after infection. Both predicted and novel gene changes were detected, as were gene isoforms, and while many of the genes were expressed by all donors, some were unique. Knock-down of genes not previously known to be associated with WNV resistance identified their critical role in control of viral infection. Our study distinguishes both common gene pathways as well as novel cellular responses. Such analyses will be valuable for translational studies of susceptible and resistant individuals--and for targeting therapeutics--in multiple biological settings.
View details for DOI 10.3390/v5071664
View details for Web of Science ID 000322172200005
View details for PubMedID 23881275
View details for PubMedCentralID PMC3738954
Genome Wide Proteomics of ERBB2 and EGFR and Other Oncogenic Pathways in Inflammatory Breast Cancer.
Journal of proteome research
2013; 12 (6): 2805-2817
In this study we selected three breast cancer cell lines (SKBR3, SUM149 and SUM190) with different oncogene expression levels involved in ERBB2 and EGFR signaling pathways as a model system for the evaluation of selective integration of subsets of transcriptomic and proteomic data. We assessed the oncogene status with reads per kilobase per million mapped reads (RPKM) values for ERBB2 (14.4, 400, and 300 for SUM149, SUM190, and SKBR3, respectively) and for EGFR (60.1, not detected, and 1.4 for the same 3 cell lines). We then used RNA-Seq data to identify those oncogenes with significant transcript levels in these cell lines (total 31) and interrogated the corresponding proteomics data sets for proteins with significant interaction values with these oncogenes. The number of observed interactors for each oncogene showed a significant range, e.g., 4.2% (JAK1) to 27.3% (MYC). The percentage is measured as a fraction of the total protein interactions in a given data set vs total interactors for that oncogene in STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, version 9.0) and I2D (Interologous Interaction Database, version 1.95). This approach allowed us to focus on 4 main oncogenes, ERBB2, EGFR, MYC, and GRB2, for pathway analysis. We used bioinformatics sites GeneGo, PathwayCommons and NCI receptor signaling networks to identify pathways that contained the four main oncogenes and had good coverage in the transcriptomic and proteomic data sets as well as a significant number of oncogene interactors. The four pathways identified were ERBB signaling, EGFR1 signaling, integrin outside-in signaling, and validated targets of C-MYC transcriptional activation. The greater dynamic range of the RNA-Seq values allowed the use of transcript ratios to correlate observed protein values with the relative levels of the ERBB2 and EGFR transcripts in each of the four pathways. 
This provided us with potential proteomic signatures for the SUM149 and SUM190 cell lines: growth factor receptor-bound protein 7 (GRB7), Crk-like protein (CRKL) and catenin delta-1 (CTNND1) for ERBB signaling; caveolin 1 (CAV1) and plectin (PLEC) for EGFR signaling; filamin A (FLNA) and actinin alpha1 (ACTN1) (associated with high levels of EGFR transcript) for integrin signaling; branched chain amino-acid transaminase 1 (BCAT1), carbamoyl-phosphate synthetase (CAD), nucleolin (NCL) (high levels of EGFR transcript); transferrin receptor (TFRC), metadherin (MTDH) (high levels of ERBB2 transcript) for MYC signaling; S100-A2 protein (S100A2), caveolin 1 (CAV1), serpin B5 (SERPINB5), stratifin (SFN), PYD and CARD domain containing (PYCARD), and EPH receptor A2 (EPHA2) for PI3K signaling, p53 subpathway. Future studies of inflammatory breast cancer (IBC), from which the cell lines were derived, will be used to explore the significance of these observations.
View details for DOI 10.1021/pr4001527
View details for PubMedID 23647160
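As a rough illustration of the RPKM figures quoted in the abstract above, the normalization can be sketched in a few lines of Python. This is a minimal sketch of the standard RPKM definition, not the authors' pipeline; the function names `rpkm` and `transcript_ratio`, the argument names, and the example counts are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # Scale reads by transcript length (in kb) and library size (in millions).
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def transcript_ratio(rpkm_a, rpkm_b):
    """Ratio of two RPKM values, e.g. one gene compared across two cell lines."""
    if rpkm_b == 0:
        raise ValueError("denominator transcript was not detected")
    return rpkm_a / rpkm_b


# Hypothetical counts: 2,000 reads on a 2 kb transcript in a 10-million-read library.
example = rpkm(2000, 2000, 10_000_000)
```

With the ERBB2 values reported above (400 for SUM190, 14.4 for SUM149), `transcript_ratio(400, 14.4)` gives roughly a 28-fold difference, which is the kind of transcript ratio the abstract relates to observed protein values.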
Overview of high throughput sequencing technologies to elucidate molecular pathways in cardiovascular diseases.
2013; 112 (12): 1613-1623
High throughput sequencing technologies have become essential in studies on genomics, epigenomics, and transcriptomics. Although sequencing information has traditionally been elucidated using a low throughput technique called Sanger sequencing, high throughput sequencing technologies are capable of sequencing multiple DNA molecules in parallel, enabling hundreds of millions of DNA molecules to be sequenced at a time. This advantage allows high throughput sequencing to be used to create large data sets, generating more comprehensive insights into the cellular genomic and transcriptomic signatures of various diseases and developmental stages. Within high throughput sequencing technologies, whole exome sequencing can be used to identify novel variants and other mutations that may underlie many genetic cardiac disorders, whereas RNA sequencing can be used to analyze how the transcriptome changes. Chromatin immunoprecipitation sequencing and methylation sequencing can be used to identify epigenetic changes, whereas ribosome sequencing can be used to determine which mRNA transcripts are actively being translated. In this review, we will outline the differences in various sequencing modalities and examine the main sequencing platforms on the market in terms of their relative read depths, speeds, and costs. Finally, we will discuss the development of future sequencing platforms and how these new technologies may improve on current sequencing platforms. Ultimately, these sequencing technologies will be instrumental in further delineating how the cardiovascular system develops and how perturbations in DNA and RNA can lead to cardiovascular disease.
View details for DOI 10.1161/CIRCRESAHA.113.300939
View details for PubMedID 23743227
- Metabolomics as a robust tool in systems biology and personalized medicine: an open letter to the metabolomics community METABOLOMICS 2013; 9 (3): 532-534
iPOP Goes the World: Integrated Personalized Omics Profiling and the Road toward Improved Health Care.
Chemistry & biology
2013; 20 (5): 660-666
The health of an individual depends upon their DNA as well as upon environmental factors (environome or exposome). It is expected that although the genome is the blueprint of an individual, its analysis with that of the other omes such as the DNA methylome, the transcriptome, proteome, and metabolome will further provide a dynamic assessment of the physiology and health state of an individual. This review will help to categorize the current progress of omics analyses and how omics integration can be used for medical research. We believe that integrative personal omics profiling (iPOP) is a stepping stone to a new road to personalized health care and may improve disease risk assessment, accuracy of diagnosis, disease monitoring, targeted treatments, and understanding the biological processes of disease states for their prevention.
View details for DOI 10.1016/j.chembiol.2013.05.001
View details for PubMedID 23706632
Identification of Potential Glycan Cancer Markers with Sialic Acid Attached to Sialic Acid and Up-regulated Fucosylated Galactose Structures in Epidermal Growth Factor Receptor Secreted from A431 Cell Line.
Molecular & cellular proteomics
2013; 12 (5): 1239-1249
We have used powerful HPLC-mass spectrometric approaches to characterize the secreted form of epidermal growth factor receptor (sEGFR). We demonstrated that the amino acid sequence lacked the cytoplasmic domain and was consistent with the primary sequence reported for EGFR purified from a human plasma pool. One of the sEGFR forms, attributed to the alternative RNA splicing, was also confirmed by transcriptional analysis (RNA sequencing). Two unusual types of glycan structures were observed in sEGFR as compared with membrane-bound EGFR from the A431 cell line. The unusual glycan structures were di-sialylated glycans (sialic acid attached to sialic acid) at Asn-151 and N-acetylhexosamine attached to a branched fucosylated galactose with N-acetylglucosamine moieties (HexNAc-(Fuc)Gal-GlcNAc) at Asn-420. These unusual glycans at specific sites were either present at a much lower level or were not observable in membrane-bound EGFR present in the A431 cell lysate. The observation of these di-sialylated glycan structures was consistent with the observed expression of the corresponding α-N-acetylneuraminide α-2,8-sialyltransferase 2 (ST8SiA2) and α-N-acetylneuraminide α-2,8-sialyltransferase 4 (ST8SiA4), by quantitative real time RT-PCR. The connectivity present at the branched fucosylated galactose was also confirmed by methylation of the glycans followed by analysis with sequential fragmentation in mass spectrometry. We hypothesize that the presence of such glycan structures could promote secretion via anionic or steric repulsion mechanisms and thus facilitate the observation of these glycan forms in the secreted fractions. We plan to use this model system to facilitate the search for novel glycan structures present at specific sites in sEGFR as well as other secreted oncoproteins such as Erbb2 as markers of disease progression in blood samples from cancer patients.
View details for DOI 10.1074/mcp.M112.024554
View details for PubMedID 23371026
Preparation of recombinant protein spotted arrays for proteome-wide identification of kinase targets.
Current protocols in protein science / editorial board, John E. Coligan ... [et al.]
2013; Chapter 27: Unit 27 4-?
Protein microarrays allow unique approaches for interrogating global protein interaction networks. Protein arrays can be divided into two categories: antibody arrays and functional protein arrays. Antibody arrays consist of various antibodies and are appropriate for profiling protein abundance and modifications. Functional full-length protein arrays employ full-length proteins with various post-translational modifications. A key advantage of the latter is rapid parallel processing of large numbers of proteins for studying highly controlled biochemical activities, protein-protein interactions, protein-nucleic acid interactions, and protein-small molecule interactions. This unit presents a protocol for constructing functional yeast protein microarrays for global kinase substrate identification. This approach enables the rapid determination of protein interaction networks in yeast on a proteome-wide level. The same methodology can be readily applied to higher eukaryotic systems with careful consideration of the overexpression strategy.
View details for DOI 10.1002/0471140864.ps2704s72
View details for PubMedID 23546622
Comparative annotation of functional regions in the human genome using epigenomic data
NUCLEIC ACIDS RESEARCH
2013; 41 (8): 4423-4432
Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of the epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements (ENCODE) cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and showed superior performance in identifying enhancers compared with state-of-the-art methods. In addition, given the fixed number of epigenetic states in the model, ChroModule allows straightforward illustration of epigenetic variability in multiple cell types. Using this feature, we found that invariable and variable epigenetic states across cell types correspond to housekeeping functions and stimulus response, respectively. In particular, we observed that enhancers, but not the other regulatory elements, dictate cell specificity, as similar cell types share common enhancers, and cell-type-specific enhancers are often bound by transcription factors playing critical roles in that cell type. More interestingly, we found that some genomic regions are dormant in one cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in comparative analysis and interpretation of multiple epigenomes.
View details for DOI 10.1093/nar/gkt143
View details for Web of Science ID 000318569700014
View details for PubMedID 23482391
View details for PubMedCentralID PMC3632130
- Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405 JOURNAL OF PROTEOME RESEARCH 2013; 12 (4): 1732-1742
A Major Epigenetic Programming Mechanism Guided by piRNAs
2013; 24 (5): 502-516
A central enigma in epigenetics is how epigenetic factors are guided to specific genomic sites for their function. Previously, we reported that a Piwi-piRNA complex associates with the piRNA-complementary site in the Drosophila genome and regulates its epigenetic state. Here, we report that Piwi-piRNA complexes bind to numerous piRNA-complementary sequences throughout the genome, implicating piRNAs as a major mechanism that guides Piwi and Piwi-associated epigenetic factors to program the genome. To test this hypothesis, we demonstrate that inserting piRNA-complementary sequences to an ectopic site leads to Piwi, HP1a, and Su(var)3-9 recruitment to the site as well as H3K9me2/3 enrichment and reduced RNA polymerase II association, indicating that piRNA is both necessary and sufficient to recruit Piwi and epigenetic factors to specific genomic sites. Piwi deficiency drastically changed the epigenetic landscape and polymerase II profile throughout the genome, revealing the Piwi-piRNA mechanism as a major epigenetic programming mechanism in Drosophila.
View details for DOI 10.1016/j.devcel.2013.01.023
View details for Web of Science ID 000316163000005
View details for PubMedID 23434410
Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing
G3-GENES GENOMES GENETICS
2013; 3 (3): 387-397
Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both data sets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are very much in line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making it ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.
View details for DOI 10.1534/g3.112.004812
View details for Web of Science ID 000315950000002
View details for PubMedID 23450794
Personal genomes, quantitative dynamic omics and personalized medicine.
Quantitative biology (Beijing, China)
2013; 1 (1): 71-90
The rapid technological developments following the Human Genome Project have made possible the availability of personalized genomes. As the focus now shifts from characterizing genomes to making personalized disease associations, in combination with the availability of other omics technologies, the next big push will be not only to obtain a personalized genome, but to quantitatively follow other omics. This will include transcriptomes, proteomes, metabolomes, antibodyomes, and new emerging technologies, enabling the profiling of thousands of molecular components in individuals. Furthermore, omics profiling performed longitudinally can probe the temporal patterns associated with both molecular changes and associated physiological health and disease states. Such data necessitates the development of computational methodology to not only handle and descriptively assess such data, but also construct quantitative biological models. Here we describe the availability of personal genomes and developing omics technologies that can be brought together for personalized implementations and how these novel integrated approaches may effectively provide a precise personalized medicine that focuses on not only characterization and treatment but ultimately the prevention of disease.
View details for PubMedID 25798291
Extensive Transcript Diversity and Novel Upstream Open Reading Frame Regulation in Yeast
G3-GENES GENOMES GENETICS
2013; 3 (2): 343-352
To understand the diversity of transcripts in yeast (Saccharomyces cerevisiae) we analyzed the transcriptional landscapes for cells grown under 18 different environmental conditions. Each sample was analyzed using RNA-sequencing, and a total of 670,446,084 uniquely mapped reads and 377,263 poly-adenylated end tags were produced. Consistent with previous studies, we find that the majority of yeast genes are expressed under one or more different conditions. By directly comparing the 5' and 3' ends of the transcribed regions, we find extensive differences in transcript ends across many conditions, especially those of stationary phase, growth in grape juice, and salt stimulation, suggesting differential choice of transcription start and stop sites is pervasive in yeast. Relative to the exponential growth condition (i.e., YPAD), transcripts differing at the 5' ends and 3' ends are predicted to differ in their annotated start codon in 21 genes and their annotated stop codon in 63 genes. Many (431) upstream open reading frames (uORFs) are found in alternate 5' ends and are significantly enriched in transcripts produced during the salt response. Mutational analysis of five genes with uORFs revealed that two sets of uORFs increase the expression of a reporter construct, indicating a role in activation which had not been reported previously, whereas two other uORFs decreased expression. In addition, RNA binding protein motifs are statistically enriched for alternate ends under many conditions. Overall, these results demonstrate enormous diversity of transcript ends, and that this heterogeneity is regulated under different environmental conditions. Moreover, transcript end diversity has important biological implications for the regulation of gene expression. In addition, our data also serve as a valuable resource for the scientific community.
View details for DOI 10.1534/g3.112.003640
View details for Web of Science ID 000314881600019
View details for PubMedID 23390610
SeqFold: Genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data
2013; 23 (2): 377-387
We present an integrative approach, SeqFold, that combines high-throughput RNA structure profiling data with computational prediction for genome-scale reconstruction of RNA secondary structures. SeqFold transforms experimental RNA structure information into a structure preference profile (SPP) and uses it to select stable RNA structure candidates representing the structure ensemble. Under a high-dimensional classification framework, SeqFold efficiently matches a given SPP to the most likely cluster of structures sampled from the Boltzmann-weighted ensemble. SeqFold is able to incorporate diverse types of RNA structure profiling data, including parallel analysis of RNA structure (PARS), selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), fragmentation sequencing (FragSeq) data generated by deep sequencing, and conventional SHAPE data. Using the known structures of a wide range of mRNAs and noncoding RNAs as benchmarks, we demonstrate that SeqFold outperforms or matches existing approaches in accuracy and is more robust to noise in experimental data. Application of SeqFold to reconstruct the secondary structures of the yeast transcriptome reveals the diverse impact of RNA secondary structure on gene regulation, including translation efficiency, transcription initiation, and protein-RNA interactions. SeqFold can be easily adapted to incorporate any new types of high-throughput RNA structure profiling data and is widely applicable to analyze RNA structures in any transcriptome.
View details for DOI 10.1101/gr.138545.112
View details for Web of Science ID 000314323100016
View details for PubMedID 23064747
Two methods for full-length RNA sequencing for low quantities of cells and single cells
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2013; 110 (2): 594-599
The ability to determine the gene expression pattern in low quantities of cells or single cells is important for resolving a variety of problems in many biological disciplines. A robust description of the expression signature of a single cell requires determination of the full-length sequence of the expressed mRNAs in the cell, yet existing methods have either 3' biased or variable transcript representation. Here, we report our protocols for the amplification and high-throughput sequencing of very small amounts of RNA, using either semirandom primed PCR or phi29 DNA polymerase-based DNA amplification of the cDNA generated with oligo-dT and/or random oligonucleotide primers. Unlike existing methods, these protocols produce relatively uniformly distributed sequences covering the full length of almost all transcripts independent of their sizes, from 1,000 to 10 cells, and even with single cells. Both protocols produced satisfactory detection/coverage of the abundant mRNAs from a single K562 erythroleukemic cell or a single dorsal root ganglion neuron. The phi29-based method produces long products with less noise, uses an isothermal reaction, and is simple to practice. The semirandom primed PCR procedure is more sensitive and reproducible at low transcript levels or with low quantities of cells. These methods provide tools for mRNA sequencing or RNA sequencing when only low quantities of cells, a single cell, or even degraded RNA are available for profiling.
View details for DOI 10.1073/pnas.1217322109
View details for Web of Science ID 000313906600047
View details for PubMedID 23267071
- Multimodal Dynamic Profiling of Healthy and Diseased States for Future Personalized Health Care CLINICAL PHARMACOLOGY & THERAPEUTICS 2013; 93 (1): 29-32
Integrative analysis of longitudinal metabolomics data from a personal multi-omics profile.
2013; 3 (3): 741-760
The integrative personal omics profile (iPOP) is a pioneering study that combines genomics, transcriptomics, proteomics, metabolomics and autoantibody profiles from a single individual over a 14-month period. The observation period includes two episodes of viral infection: a human rhinovirus and a respiratory syncytial virus. The profile studies give an informative snapshot into the biological functioning of an organism. We hypothesize that pathway expression levels are associated with disease status. To test this hypothesis, we use biological pathways to integrate metabolomics and proteomics iPOP data. The approach computes the pathways' differential expression levels at each time point, while taking into account the pathway structure and the longitudinal design. The resulting pathway levels show strong association with the disease status. Further, we identify temporal patterns in metabolite expression levels. The changes in metabolite expression levels also appear to be consistent with the disease status. The results of the integrative analysis suggest that changes in biological pathways may be used to predict and monitor the disease. The iPOP experimental design, data acquisition and analysis issues are discussed within the broader context of personal profiling.
View details for DOI 10.3390/metabo3030741
View details for PubMedID 24958148
View details for PubMedCentralID PMC3901289
Systematic investigation of protein-small molecule interactions
2013; 65 (1): 2-8
Cell signaling is extensively wired between cellular components to sustain cell proliferation, differentiation, and adaptation. The interaction network is often manifested in how protein function is regulated through interacting with other cellular components including small molecule metabolites. While many biochemical interactions have been established as reactions between protein enzymes and their substrates and products, much less is known at the system level about how small metabolites regulate protein functions through allosteric binding. In the past decade, study of protein-small molecule interactions has been lagging behind other types of interactions. Recent technological advances have explored several high-throughput platforms to reveal many "unexpected" protein-small molecule interactions that could have profound impact on our understanding of cell signaling. These interactions will help bridge gaps in existing regulatory loops of cell signaling and serve as new targets for medical intervention. In this review, we summarize recent advances in the systematic investigation of protein-metabolite/small molecule interactions, and discuss the potential impact of such studies on both biological research and medicine.
View details for DOI 10.1002/iub.1111
View details for Web of Science ID 000312886200002
View details for PubMedID 23225626
- Specific plasma autoantibody reactivity in myelodysplastic syndromes. Scientific reports 2013; 3: 3311-?
High-throughput sequencing for biology and medicine
MOLECULAR SYSTEMS BIOLOGY
Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that were not possible before. In this review, we discuss these innovative new approaches (including ever finer analyses of transcriptome dynamics, genome structure and genomic variation) and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.
View details for DOI 10.1038/msb.2012.61
View details for Web of Science ID 000314415800010
View details for PubMedID 23340846
A Chromosome-centric Human Proteome Project (C-HPP) to Characterize the Sets of Proteins Encoded in Chromosome 17
JOURNAL OF PROTEOME RESEARCH
2013; 12 (1): 45-57
We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. Also we have illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information.
In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.
View details for DOI 10.1021/pr300985j
View details for Web of Science ID 000313156300007
View details for PubMedID 23259914
- The variable somatic genome. Cell cycle 2013; 12 (1): 5-6
Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405.
Journal of proteome research
As part of the genome-wide and chromosome-centric human proteomic project (C-HPP), we have integrated a shotgun proteomics approach and a genome-wide transcriptomic approach (RNA-Seq) for a set of human colon cancer cell lines (LIM1215, LIM1899 and LIM2405) that were selected to represent a wide range of pathological states of colorectal cancer. The combination of a standard proteomics approach (1D-gel electrophoresis coupled to LC/ion trap mass spectrometry) and RNA-Seq allowed us to exploit the greater depth of the transcriptomics measurement (∼9800 transcripts per cell line) versus the protein observations (∼1900 protein identifications per cell line). Conversely, the proteomics data were helpful in identifying both cancer-associated proteins with differential expression patterns as well as protein networks and pathways which appear to be deregulated in these cell lines. Examples of potential markers include mortalin, nucleophosmin, ezrin, LASP1, alpha and beta forms of spectrin, exportin, the carcinoembryonic antigen family, EGFR and MET. Interaction analyses identified the large intermediate filament family, the protein folding network and adapter proteins in focal adhesion networks, which included the CDC42 and RHOA signaling pathways that may have potential for identifying phenotypic states representing poorly and moderately differentiated states of CRC, with or without metastases.
View details for DOI 10.1021/pr3010869
View details for PubMedID 23458625
- Tissue-specific direct targets of Caenorhabditis elegans Rb/E2F dictate distinct somatic and germline programs GENOME BIOLOGY 2013; 14 (1)
Exome sequencing by targeted enrichment.
Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.]
2013; Chapter 7: Unit7 12-?
This unit describes methods for targeted enrichment of the exon-coding portions of the genome using Agilent SureSelect Human All Exon 50 Mb and Roche Nimblegen SeqCap EZ Exome platforms. Each platform targets and enriches a large overlapping portion of the greater human exome. The protocols here describe the biochemical procedures used to enrich exomic DNA with each platform, including recommended modifications to the manufacturers' protocols. In addition, a brief description of the sequencing protocol and estimation of the needed amount of sequencing for each platform is included. Finally, a detailed analytical pipeline for processing the subsequent data is described. These protocols focus specifically on human exome sequencing platforms, but can be applied with some modification to other organisms and targeted enrichment approaches.
View details for DOI 10.1002/0471142727.mb0712s102
View details for PubMedID 23547016
Tissue-specific direct targets of Caenorhabditis elegans Rb/E2F dictate distinct somatic and germline programs.
2013; 14 (1): R5
BACKGROUND: The tumor suppressor Rb/E2F regulates gene expression to control differentiation in multiple tissues during development, although how it directs tissue-specific gene regulation in vivo is poorly understood. RESULTS: We determined the genome-wide binding profiles for Caenorhabditis elegans Rb/E2F-like components in the germline, in the intestine and broadly throughout the soma, and uncovered highly tissue-specific binding patterns and target genes. Chromatin association by LIN-35, the C. elegans ortholog of Rb, is impaired in the germline but robust in the soma, a characteristic that might govern differential effects on gene expression in the two cell types. In the intestine, LIN-35 and the heterochromatin protein HPL-2, the ortholog of Hp1, coordinately bind at many sites lacking E2F. Finally, selected direct target genes contribute to the soma-to-germline transformation of lin-35 mutants, including mes-4, a soma-specific target that promotes H3K36 methylation, and csr-1, a germline-specific target that functions in a 22G small RNA pathway. CONCLUSIONS: In sum, identification of tissue-specific binding profiles and effector target genes reveals important insights into the mechanisms by which Rb/E2F controls distinct cell fates in vivo.
View details for DOI 10.1186/gb-2013-14-1-r5
View details for PubMedID 23347407
Promise of personalized omics to precision medicine
WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE
2013; 5 (1): 73-82
The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.
View details for DOI 10.1002/wsbm.1198
View details for Web of Science ID 000312736200005
View details for PubMedID 23184638
Centromere-Like Regions in the Budding Yeast Genome
2013; 9 (1)
Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres.
View details for DOI 10.1371/journal.pgen.1003209
View details for Web of Science ID 000314651500052
View details for PubMedID 23349633
Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However, there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection.
View details for DOI 10.1186/1471-2105-13-305
View details for Web of Science ID 000314688600001
View details for PubMedID 23157288
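The read-depth idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' Bayesian method: it assumes a simple Poisson model of read counts per target, a hypothetical prior on deletions, and only two hypotheses (two copies versus one copy). The function names and the default prior are illustrative assumptions.

```python
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """Log of the Poisson probability of observing k reads when lam are expected."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def deletion_posterior(observed_reads: int,
                       expected_diploid_reads: float,
                       prior_deletion: float = 0.01) -> float:
    """Posterior probability that a target carries a heterozygous deletion.

    Assumption (not from the paper): reads are Poisson distributed, a
    two-copy target yields `expected_diploid_reads` on average, and a
    one-copy target yields half of that.
    """
    log_like_two = poisson_log_pmf(observed_reads, expected_diploid_reads)
    log_like_one = poisson_log_pmf(observed_reads, expected_diploid_reads / 2.0)
    log_post_one = math.log(prior_deletion) + log_like_one
    log_post_two = math.log(1.0 - prior_deletion) + log_like_two
    # Normalise in log space to avoid underflow with large read counts.
    top = max(log_post_one, log_post_two)
    weight_one = math.exp(log_post_one - top)
    weight_two = math.exp(log_post_two - top)
    return weight_one / (weight_one + weight_two)
```

With an expected depth of 100 reads, observing about 50 reads gives a posterior close to 1 for the deletion hypothesis, while observing about 100 reads gives a posterior close to 0. Real exome data would also need per-sample and per-target normalisation, which this sketch omits.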
Whole Genome Sequence Analysis of Primary Myelofibrosis.
54th Annual Meeting and Exposition of the American-Society-of-Hematology (ASH)
AMER SOC HEMATOLOGY. 2012
View details for Web of Science ID 000313838905376
- Genome interpretation and assembly-recent progress and next steps. Nature biotechnology 2012; 30 (11): 1081-1083
- Michael Snyder. Interview by Asher Mullard. Nature reviews. Drug discovery 2012; 11 (10): 744-?
Systems biology: personalized medicine for the future?
CURRENT OPINION IN PHARMACOLOGY
2012; 12 (5): 623-628
Systems biology is actively transforming the field of modern health care from symptom-based disease diagnosis and treatment to precision medicine in which patients are treated based on their individual characteristics. Development of high-throughput technologies such as high-throughput sequencing and mass spectrometry has enabled scientists and clinicians to examine genomes, transcriptomes, proteomes, metabolomes, and other omics information in unprecedented detail. The combined 'omics' information leads to a global profiling of health and disease, and provides new approaches for personalized health monitoring and preventative medicine. In this article, we review the efforts of systems biology in personalized medicine in the past 2 years, and discuss in detail achievements and concerns, as well as highlights and hurdles for future personalized health care.
View details for DOI 10.1016/j.coph.2012.07.011
View details for Web of Science ID 000310478800017
View details for PubMedID 22858243
SWI/SNF Chromatin-remodeling Factors: Multiscale Analyses and Diverse Functions
JOURNAL OF BIOLOGICAL CHEMISTRY
2012; 287 (37): 30897-30905
Chromatin-remodeling enzymes play essential roles in many biological processes, including gene expression, DNA replication and repair, and cell division. Although one such complex, SWI/SNF, has been extensively studied, new discoveries are still being made. Here, we review SWI/SNF biochemistry; highlight recent genomic and proteomic advances; and address the role of SWI/SNF in human diseases, including cancer and viral infections. These studies have greatly increased our understanding of complex nuclear processes.
View details for DOI 10.1074/jbc.R111.309302
View details for Web of Science ID 000308791300003
View details for PubMedID 22952240
Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors
2012; 22 (9): 1798-1812
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.
View details for DOI 10.1101/gr.139105.112
View details for Web of Science ID 000308272800020
View details for PubMedID 22955990
View details for PubMedCentralID PMC3431495
A highly integrated and complex PPARGC1A transcription factor binding network in HepG2 cells
2012; 22 (9): 1668-1679
PPARGC1A is a transcriptional coactivator that binds to and coactivates a variety of transcription factors (TFs) to regulate the expression of target genes. PPARGC1A plays a pivotal role in regulating energy metabolism and has been implicated in several human diseases, most notably type II diabetes. Previous studies have focused on the interplay between PPARGC1A and individual TFs, but little is known about how PPARGC1A combines with all of its partners across the genome to regulate transcriptional dynamics. In this study, we describe a core PPARGC1A transcriptional regulatory network operating in HepG2 cells treated with forskolin. We first mapped the genome-wide binding sites of PPARGC1A using chromatin-IP followed by high-throughput sequencing (ChIP-seq) and uncovered overrepresented DNA sequence motifs corresponding to known and novel PPARGC1A network partners. We then profiled six of these site-specific TF partners using ChIP-seq and examined their network connectivity and combinatorial binding patterns with PPARGC1A. Our analysis revealed extensive overlap of targets including a novel link between PPARGC1A and HSF1, a TF regulating the conserved heat shock response pathway that is misregulated in diabetes. Importantly, we found that different combinations of TFs bound to distinct functional sets of genes, thereby helping to reveal the combinatorial regulatory code for metabolic and other cellular processes. In addition, the different TFs often bound near the promoters and coding regions of each other's genes suggesting an intricate network of interdependent regulation. Overall, our study provides an important framework for understanding the systems-level control of metabolic gene expression in humans.
View details for DOI 10.1101/gr.127761.111
View details for Web of Science ID 000308272800009
View details for PubMedID 22955979
View details for PubMedCentralID PMC3431484
Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs
2012; 22 (9): 1616-1625
Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.
View details for DOI 10.1101/gr.134445.111
View details for Web of Science ID 000308272800004
View details for PubMedID 22955974
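As a rough illustration of a splicing-completion score like the coSI measure described in the abstract above, the sketch below computes the fraction of reads at an exon's borders that are already spliced. This is a simplified assumption, not the published definition, which may weight read classes differently. The function name and its two read-count arguments are hypothetical.

```python
from typing import Optional


def completed_splicing_index(junction_reads: int,
                             border_reads: int) -> Optional[float]:
    """Fraction of reads around an internal exon that show completed splicing.

    junction_reads: reads spanning an exon-exon junction (intron removed).
    border_reads:   reads crossing an exon-intron border (intron still present).
    Returns None when no informative reads cover the exon.
    """
    total = junction_reads + border_reads
    if total == 0:
        return None
    return junction_reads / total
```

Under this simplified definition, a value of 1.0 means every informative read is spliced, 0.0 means none is, and an exon with 75 junction reads and 25 border reads scores 0.75.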
Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements
2012; 22 (9): 1735-1747
Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.
View details for DOI 10.1101/gr.136366.111
View details for Web of Science ID 000308272800015
View details for PubMedID 22955985
VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment
2012; 28 (17): 2267-2269
The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
View details for DOI 10.1093/bioinformatics/bts368
View details for Web of Science ID 000308019200008
View details for PubMedID 22743228
Understanding transcriptional regulation by integrative analysis of transcription factor binding data
2012; 22 (9): 1658-1667
Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
View details for DOI 10.1101/gr.136838.111
View details for Web of Science ID 000308272800008
View details for PubMedID 22955978
A Genome-Scale Resource for In Vivo Tag-Based Protein Function Exploration in C. elegans
2012; 150 (4): 855-866
Understanding the in vivo dynamics of protein localization and their physical interactions is important for many problems in biology. To enable systematic protein function interrogation in a multicellular context, we built a genome-scale transgenic platform for in vivo expression of fluorescent- and affinity-tagged proteins in Caenorhabditis elegans under endogenous cis regulatory control. The platform combines computer-assisted transgene design, massively parallel DNA engineering, and next-generation sequencing to generate a resource of 14,637 genomic DNA transgenes, which covers 73% of the proteome. The multipurpose tag used allows any protein of interest to be localized in vivo or affinity purified using standard tag-based assays. We illustrate the utility of the resource by systematic chromatin immunopurification and automated 4D imaging, which produced detailed DNA binding and cell/tissue distribution maps for key transcription factor proteins.
View details for DOI 10.1016/j.cell.2012.08.001
View details for Web of Science ID 000308002300018
View details for PubMedID 22901814
Discovery of Stress Responsive DNA Regulatory Motifs in Arabidopsis
2012; 7 (8)
The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer--a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of the Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.
View details for DOI 10.1371/journal.pone.0043198
View details for Web of Science ID 000307500100069
View details for PubMedID 22912824
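The word-counting and position-bias steps described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the MotifIndexer implementation: it counts only exact words (no degenerate codes such as r, y or n), and the z-score compares the mean start position of a motif with a discrete uniform distribution over the promoter. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def count_kmers(promoters: Iterable[str], k: int = 8) -> Counter:
    """Count every overlapping k-letter word across a set of promoter sequences."""
    counts: Counter = Counter()
    for seq in promoters:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def position_bias_z(positions: Sequence[int], n_positions: int) -> float:
    """z-score of the mean motif start position against a uniform null.

    Assumption: under the null, each start position 0..n_positions-1 is
    equally likely, so the mean is (n_positions - 1) / 2 and the variance
    is (n_positions**2 - 1) / 12.
    """
    if not positions or n_positions < 2:
        raise ValueError("need at least one position and two possible sites")
    null_mean = (n_positions - 1) / 2.0
    null_var = (n_positions ** 2 - 1) / 12.0
    observed_mean = sum(positions) / len(positions)
    return (observed_mean - null_mean) / math.sqrt(null_var / len(positions))
```

If promoters are written 5' to 3' and end at the transcription start site, a large positive z-score would indicate that a motif clusters near the start site. Which end of the sequence corresponds to the start site is a convention the caller must fix.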
Investigating metabolite-protein interactions: An overview of available techniques
2012; 57 (4): 459-466
Metabolites comprise the molar majority of chemical substances in living cells, and metabolite-protein interactions are expected to be quite common. Many interactions have already been identified and have been shown to be involved in the regulation of different types of cellular processes including signaling events, enzyme activities, protein localizations and interactions. Recent technological advances have greatly facilitated the detection of metabolite-protein interactions at high sensitivity and some of these have been applied on a large scale. In this manuscript, we review the available in vitro, in silico and in vivo technologies for mapping small-molecule-protein interactions. Although some of these were developed for drug-protein interactions they can be applied for mapping metabolite-protein interactions. Information gained from the use of these approaches can be applied to the manipulation of cellular processes and therapeutic applications.
View details for DOI 10.1016/j.ymeth.2012.06.013
View details for Web of Science ID 000309625600009
View details for PubMedID 22750303
Patient-Specific Induced Pluripotent Stem Cells as a Model for Familial Dilated Cardiomyopathy
SCIENCE TRANSLATIONAL MEDICINE
2012; 4 (130)
Characterized by ventricular dilatation, systolic dysfunction, and progressive heart failure, dilated cardiomyopathy (DCM) is the most common form of cardiomyopathy in patients. DCM is the most common diagnosis leading to heart transplantation and places a significant burden on healthcare worldwide. The advent of induced pluripotent stem cells (iPSCs) offers an exceptional opportunity for creating disease-specific cellular models, investigating underlying mechanisms, and optimizing therapy. Here, we generated cardiomyocytes from iPSCs derived from patients in a DCM family carrying a point mutation (R173W) in the gene encoding sarcomeric protein cardiac troponin T. Compared to control healthy individuals in the same family cohort, cardiomyocytes derived from iPSCs from DCM patients exhibited altered regulation of calcium ion (Ca(2+)), decreased contractility, and abnormal distribution of sarcomeric α-actinin. When stimulated with a β-adrenergic agonist, DCM iPSC-derived cardiomyocytes showed characteristics of cellular stress such as reduced beating rates, compromised contraction, and a greater number of cells with abnormal sarcomeric α-actinin distribution. Treatment with β-adrenergic blockers or overexpression of sarcoplasmic reticulum Ca(2+) adenosine triphosphatase (Serca2a) improved the function of iPSC-derived cardiomyocytes from DCM patients. Thus, iPSC-derived cardiomyocytes from DCM patients recapitulate to some extent the morphological and functional phenotypes of DCM and may serve as a useful platform for exploring disease mechanisms and for drug screening.
View details for DOI 10.1126/scitranslmed.3003552
View details for Web of Science ID 000303045900004
View details for PubMedID 22517884
View details for PubMedCentralID PMC3657516
Extensive In vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses
Experimental Biology Meeting 2012
FEDERATION AMER SOC EXP BIOL. 2012
View details for Web of Science ID 000310711300218
A core erythroid transcriptional network is repressed by a master regulator of myelo-lymphoid differentiation
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (10): 3832-3837
Two mechanisms that play important roles in cell fate decisions are control of a "core transcriptional network" and repression of alternative transcriptional programs by antagonizing transcription factors. Whether these two mechanisms operate together is not known. Here we report that GATA-1, SCL, and Klf1 form an erythroid core transcriptional network by co-occupying >300 genes. Importantly, we find that PU.1, a negative regulator of terminal erythroid differentiation, is a highly integrated component of this network. GATA-1, SCL, and Klf1 act to promote, whereas PU.1 represses expression of many of the core network genes. PU.1 also represses the genes encoding GATA-1, SCL, Klf1, and important GATA-1 cofactors. Conversely, in addition to repressing PU.1 expression, GATA-1 also binds to and represses >100 PU.1 myelo-lymphoid gene targets in erythroid progenitors. Mathematical modeling further supports that this dual mechanism of repressing both the opposing upstream activator and its downstream targets provides a synergistic, robust mechanism for lineage specification. Taken together, these results amalgamate two key developmental principles, namely, regulation of a core transcriptional network and repression of an alternative transcriptional program, thereby enhancing our understanding of the mechanisms that establish cellular identity.
View details for DOI 10.1073/pnas.1121019109
View details for Web of Science ID 000301117700049
View details for PubMedID 22357756
Tcf7 Is an Important Regulator of the Switch of Self-Renewal and Differentiation in a Multipotential Hematopoietic Cell Line
2012; 8 (3)
A critical problem in biology is understanding how cells choose between self-renewal and differentiation. To generate a comprehensive view of the mechanisms controlling early hematopoietic precursor self-renewal and differentiation, we used systems-based approaches and murine EML multipotential hematopoietic precursor cells as a primary model. EML cells give rise to a mixture of self-renewing Lin-SCA+CD34+ cells and partially differentiated non-renewing Lin-SCA-CD34- cells in a cell autonomous fashion. We identified and validated the HMG box protein TCF7 as a regulator in this self-renewal/differentiation switch that operates in the absence of autocrine Wnt signaling. We found that Tcf7 is the most down-regulated transcription factor when CD34+ cells switch into CD34- cells, using RNA-Seq. We subsequently identified the target genes bound by TCF7, using ChIP-Seq. We show that TCF7 and RUNX1 (AML1) bind to each other's promoter regions and that TCF7 is necessary for the production of the short isoforms, but not the long isoforms of RUNX1, suggesting that TCF7 and the short isoforms of RUNX1 function coordinately in regulation. Tcf7 knock-down experiments and Gene Set Enrichment Analyses suggest that TCF7 plays a dual role in promoting the expression of genes characteristic of self-renewing CD34+ cells while repressing genes activated in partially differentiated CD34- state. Finally a network of up-regulated transcription factors of CD34+ cells was constructed. Factors that control hematopoietic stem cell (HSC) establishment and development, cell growth, and multipotency were identified. These studies in EML cells demonstrate fundamental cell-intrinsic properties of the switch between self-renewal and differentiation, and yield valuable insights for manipulating HSCs and other differentiating systems.
View details for DOI 10.1371/journal.pgen.1002565
View details for Web of Science ID 000302254800041
View details for PubMedID 22412390
View details for PubMedCentralID PMC3297581
- The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome NATURE BIOTECHNOLOGY 2012; 30 (3): 221-223
Correlation of Global MicroRNA Expression With Basal Cell Carcinoma Subtype
G3-GENES GENOMES GENETICS
2012; 2 (2): 279-286
Basal cell carcinomas (BCCs) are the most common cancers in the United States. The histologic appearance distinguishes several subtypes, each of which can have a different biologic behavior. In this study, global miRNA expression was quantified by high-throughput sequencing in nodular BCCs, a subtype that is slow growing, and infiltrative BCCs, aggressive tumors that extend through the dermis and invade structures such as cutaneous nerves. Principal components analysis correctly classified seven of eight infiltrative tumors on the basis of miRNA expression. The remaining tumor, on pathology review, contained a mixture of nodular and infiltrative elements. Nodular tumors did not cluster tightly, likely reflecting broader histopathologic diversity in this class, but trended toward forming a group separate from infiltrative BCCs. Quantitative polymerase chain reaction assays were developed for six of the miRNAs that showed significant differences between the BCC subtypes, and five of these six were validated in a replication set of four infiltrative and three nodular tumors. The expression level of miR-183, a miRNA that inhibits invasion and metastasis in several types of malignancies, was consistently lower in infiltrative than nodular tumors and could be one element underlying the difference in invasiveness. These results represent the first miRNA profiling study in BCCs and demonstrate that miRNA gene expression may be involved in tumor pathogenesis and particularly in determining the aggressiveness of these malignancies.
View details for DOI 10.1534/g3.111.001115
View details for Web of Science ID 000312411000015
View details for PubMedID 22384406
Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors
2012; 13 (9)
Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is nonetheless overwhelmingly complex and simultaneously incomplete since it covers only a small fraction of all human transcription factors. As part of the consortium effort in providing a concise abstraction of the data for facilitating various types of downstream analyses, we constructed statistical models that capture the genomic features of three paired types of regions by machine-learning methods: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found a significant fraction of transcription factor binding without clear sequence motifs and showed that this observation could be related to strong DNA accessibility of these regions. Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. Our machine learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data.
View details for DOI 10.1186/gb-2012-13-9-r48
View details for Web of Science ID 000313182600001
View details for PubMedID 22950945
- An encyclopedia of mouse DNA elements (Mouse ENCODE) GENOME BIOLOGY 2012; 13 (8)
- Q & A: the Snyderome GENOME BIOLOGY 2012; 13 (3)
Phosphorylation of Yeast Transcription Factors Correlates with the Evolution of Novel Sequence and Function
JOURNAL OF PROTEOME RESEARCH
2012; 11 (1): 261-268
Gene duplication is a significant source of novel genes and the dynamics of gene duplicate retention vs loss are poorly understood, particularly in terms of the functional and regulatory specialization of their gene products. We compiled a comprehensive data set of S. cerevisiae phosphosites to study the role of phosphorylation in yeast paralog divergence. We found that proteins coded by duplicated genes created in the Whole Genome Duplication (WGD) event and in a period prior to the WGD are significantly more phosphorylated than other duplicates or singletons. Though the amino acid sequence of each paralog of a given pair tends to diverge fairly similarly from their common ortholog in a related species, the phosphorylated amino acids tend to diverge in sequence from the ortholog at different rates. We observed that transcription factors (TFs) are disproportionately present among the set of duplicate genes and among the set of proteins that are phosphorylated. Interestingly, TFs that occur on higher levels of the transcription network hierarchy (i.e., tend to regulate other TFs) tend to be more phosphorylated than lower-level TFs. We found that TF paralog divergence in expression, binding, and sequence correlates with the abundance of phosphosites. Overall, these studies have important implications for understanding divergence of gene function and regulation in eukaryotes.
View details for DOI 10.1021/pr201065k
View details for Web of Science ID 000298827700024
View details for PubMedID 22141333
Interpretome: a freely available, modular, and secure personal genome interpretation engine.
Pacific Symposium on Biocomputing
The decreasing cost of genotyping and genome sequencing has ushered in an era of genomic personalized medicine. More than 100,000 individuals have been genotyped by direct-to-consumer genetic testing services, which offer a glimpse into the interpretation and exploration of a personal genome. However, these interpretations, which require extensive manual curation, are subject to the preferences of the company and are not customizable by the individual. Academic institutions teaching personalized medicine, as well as genetic hobbyists, may prefer to customize their analysis and have full control over the content and method of interpretation. We present the Interpretome, a system for private genome interpretation, which contains all genotype information in client-side interpretation scripts, supported by server-side databases. We provide state-of-the-art analyses for teaching clinical implications of personal genomics, including disease risk assessment and pharmacogenomics. Additionally, we have implemented client-side algorithms for ancestry inference, demonstrating the power of these methods without excessive computation. Finally, the modular nature of the system allows for plugin capabilities for custom analyses. This system will allow for personal genome exploration without compromising privacy, facilitating hands-on courses in genomics and personalized medicine.
View details for PubMedID 22174289
Characterization of Enhancer Function from Genome-Wide Analyses
ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 13
2012; 13: 29-57
There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets should become a standard approach to elucidate higher-order regulatory interactions.
View details for DOI 10.1146/annurev-genom-090711-163723
View details for Web of Science ID 000310143800002
View details for PubMedID 22703170
An encyclopedia of mouse DNA elements (Mouse ENCODE).
2012; 13 (8): 418
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
View details for DOI 10.1186/gb-2012-13-8-418
View details for PubMedID 22889292
A High-Resolution Whole-Genome Map of Key Chromatin Modifications in the Adult Drosophila melanogaster
2011; 7 (12)
Epigenetic research has been focused on cell-type-specific regulation; less is known about common features of epigenetic programming shared by diverse cell types within an organism. Here, we report a modified method for chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and its use to construct a high-resolution map of the Drosophila melanogaster key histone marks, heterochromatin protein 1a (HP1a) and RNA polymerase II (polII). These factors are mapped at 50-bp resolution genome-wide and at 5-bp resolution for regulatory sequences of genes, which reveals fundamental features of chromatin modification landscape shared by major adult Drosophila cell types: the enrichment of both heterochromatic and euchromatic marks in transposons and repetitive sequences, the accumulation of HP1a at transcription start sites with stalled polII, the signatures of histone code and polII level/position around the transcriptional start sites that predict both the mRNA level and functionality of genes, and the enrichment of elongating polII within exons at splicing junctions. These features, likely conserved among diverse epigenomes, reveal general strategies for chromatin modifications.
View details for DOI 10.1371/journal.pgen.1002380
View details for Web of Science ID 000299167900003
View details for PubMedID 22194694
Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms
2011; 6 (11)
Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous array-based platforms for CNV detection exist utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications.
View details for DOI 10.1371/journal.pone.0027859
View details for Web of Science ID 000298168100021
View details for PubMedID 22140474
Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data
PLOS COMPUTATIONAL BIOLOGY
2011; 7 (11)
We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly at various tissues, have more interacting partners, and are more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data becomes available in the near future, our methods of data integration have various potential applications.
View details for DOI 10.1371/journal.pcbi.1002190
View details for Web of Science ID 000297263700001
View details for PubMedID 22125477
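To illustrate the motif analysis mentioned in the abstract above, the following minimal Python sketch enumerates feed-forward loops (FFLs) in a small directed network, where an FFL is taken to mean that A regulates B and both A and B regulate C. The node names and the function `feed_forward_loops` are hypothetical and chosen only for illustration; the published analysis may define and count motifs differently.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return all (a, b, c) triples with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    # Brute-force check of every ordered triple; fine for small toy networks.
    return sorted(
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


if __name__ == "__main__":
    # Hypothetical toy network: a miRNA targeting a TF and that TF's target.
    toy_edges = [
        ("miR_a", "TF_b"),
        ("TF_b", "gene_c"),
        ("miR_a", "gene_c"),
        ("TF_b", "gene_d"),
    ]
    print(feed_forward_loops(toy_edges))
```

The brute-force enumeration scales cubically with the number of nodes, so a genome-scale network would need an adjacency-list approach instead.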
Performance comparison of exome DNA sequencing technologies
2011; 29 (10): 908-U206
Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.
View details for DOI 10.1038/nbt.1975
View details for Web of Science ID 000296273000017
View details for PubMedID 21947028
Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence
2011; 7 (9)
Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
View details for DOI 10.1371/journal.pgen.1002280
View details for Web of Science ID 000295419100031
View details for PubMedID 21935354
View details for PubMedCentralID PMC3174201
Arabidopsis RTNLB1 and RTNLB2 Reticulon-Like Proteins Regulate Intracellular Trafficking and Activity of the FLS2 Immune Receptor
2011; 23 (9): 3374-3391
Receptors localized at the plasma membrane are critical for the recognition of pathogens. The molecular determinants that regulate receptor transport to the plasma membrane are poorly understood. In a screen for proteins that interact with the FLAGELLIN-SENSITIVE2 (FLS2) receptor using Arabidopsis thaliana protein microarrays, we identified the reticulon-like protein RTNLB1. We showed that FLS2 interacts in vivo with both RTNLB1 and its homolog RTNLB2 and that a Ser-rich region in the N-terminal tail of RTNLB1 is critical for the interaction with FLS2. Transgenic plants that lack RTNLB1 and RTNLB2 (rtnlb1 rtnlb2) or overexpress RTNLB1 (RTNLB1ox) exhibit reduced activation of FLS2-dependent signaling and increased susceptibility to pathogens. In both rtnlb1 rtnlb2 and RTNLB1ox, FLS2 accumulation at the plasma membrane was significantly affected compared with the wild type. Transient overexpression of RTNLB1 led to FLS2 retention in the endoplasmic reticulum (ER) and affected FLS2 glycosylation but not FLS2 stability. Removal of the critical N-terminal Ser-rich region or either of the two Tyr-dependent sorting motifs from RTNLB1 causes partial reversion of the negative effects of excess RTNLB1 on FLS2 transport out of the ER and accumulation at the membrane. The results are consistent with a model whereby RTNLB1 and RTNLB2 regulate the transport of newly synthesized FLS2 to the plasma membrane.
View details for DOI 10.1105/tpc.111.089656
View details for Web of Science ID 000296739100025
View details for PubMedID 21949153
Cooperative transcription factor associations discovered using regulatory variation
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (32): 13353-13358
Regulation of gene expression at the transcriptional level is achieved by complex interactions of transcription factors operating at their target genes. Dissecting the specific combination of factors that bind each target is a significant challenge. Here, we describe in detail the Allele Binding Cooperativity test, which uses variation in transcription factor binding among individuals to discover combinations of factors and their targets. We developed the ALPHABIT (a large-scale process to hunt for allele binding interacting transcription factors) pipeline, which includes statistical analysis of binding sites followed by experimental validation, and demonstrate that this method predicts transcription factors that associate with NFκB. Our method successfully identifies factors that have been known to work with NFκB (E2A, STAT1, IRF2), but whose global coassociation and sites of cooperative action were not known. In addition, we identify a unique coassociation (EBF1) that had not been reported previously. We present a general approach for discovering combinatorial models of regulation and advance our understanding of the genetic basis of variation in transcription factor binding.
View details for DOI 10.1073/pnas.1103105108
View details for Web of Science ID 000293691400076
View details for PubMedID 21828005
View details for PubMedCentralID PMC3156166
A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans
2011; 7 (8)
As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.
View details for DOI 10.1371/journal.pgen.1002236
View details for Web of Science ID 000294297000031
View details for PubMedID 21876680
AlleleSeq: analysis of allele-specific expression and binding in a network framework
MOLECULAR SYSTEMS BIOLOGY
To study allele-specific expression (ASE) and binding (ASB), that is, differences between the maternally and paternally derived alleles, we have developed a computational pipeline (AlleleSeq). Our pipeline initially constructs a diploid personal genome sequence (and corresponding personalized gene annotation) using genomic sequence variants (SNPs, indels, and structural variants), and then identifies allele-specific events with significant differences in the number of mapped reads between maternal and paternal alleles. There are many technical challenges in the construction and alignment of reads to a personal diploid genome sequence that we address, for example, bias of reads mapping to the reference allele. We have applied AlleleSeq to variation data for NA12878 from the 1000 Genomes Project as well as matched, deeply sequenced RNA-Seq and ChIP-Seq data sets generated for this purpose. In addition to observing fairly widespread allele-specific behavior within individual functional genomic data sets (including results consistent with X-chromosome inactivation), we can study the interaction between ASE and ASB. Furthermore, we investigate the coordination between ASE and ASB from multiple transcription factors events using a regulatory network framework. Correlation analyses and network motifs show mostly coordinated ASB and ASE.
View details for DOI 10.1038/msb.2011.54
View details for Web of Science ID 000294537800003
View details for PubMedID 21811232
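The read-count comparison described in the AlleleSeq abstract above can be sketched as follows. This is a minimal illustration that assumes an exact two-sided binomial test against an expected 50:50 maternal-to-paternal ratio; the abstract does not name the statistical test, and the function names, site labels and threshold here are hypothetical. A real analysis would also need multiple-testing correction.

```python
from math import comb


def binomial_two_sided_p(maternal, paternal, p=0.5):
    """Exact two-sided binomial p-value for the observed maternal read count."""
    n = maternal + paternal
    if n == 0:
        return 1.0
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[maternal]
    # Sum every outcome that is at least as unlikely as the observed one.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def allele_specific_sites(counts, alpha=0.05):
    """Return the sites whose maternal/paternal read counts differ significantly.

    `counts` maps a site label to a (maternal_reads, paternal_reads) pair.
    """
    return [
        site
        for site, (maternal, paternal) in counts.items()
        if binomial_two_sided_p(maternal, paternal) < alpha
    ]


if __name__ == "__main__":
    # Hypothetical read counts at two heterozygous sites.
    example = {"site_A": (10, 0), "site_B": (5, 5)}
    print(allele_specific_sites(example))
```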
Identification of genomic indels and structural variations using split reads
Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.
View details for DOI 10.1186/1471-2164-12-375
View details for Web of Science ID 000294205500001
View details for PubMedID 21787423
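The calibration idea in the abstract above, scoring calls against simulated data to obtain sensitivity and positive predictive value (PPV), can be sketched in a few lines of Python. The correction used in `calibrated_count` (observed calls multiplied by PPV and divided by sensitivity) is an assumption made for illustration, not necessarily the authors' procedure, and all names and numbers are hypothetical.

```python
def sensitivity_and_ppv(true_events, called_events):
    """Compare called events with a simulated truth set.

    Both arguments are sets of event identifiers (e.g. breakpoint positions).
    Returns (sensitivity, positive predictive value).
    """
    true_positives = len(true_events & called_events)
    sensitivity = true_positives / len(true_events) if true_events else 0.0
    ppv = true_positives / len(called_events) if called_events else 0.0
    return sensitivity, ppv


def calibrated_count(observed_calls, sensitivity, ppv):
    """Estimate the true number of events from an observed call count.

    Assumed correction: keep the fraction of calls expected to be real (PPV),
    then scale up for the real events that were missed (1 / sensitivity).
    """
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return observed_calls * ppv / sensitivity


if __name__ == "__main__":
    simulated_truth = {1, 2, 3, 4}   # events planted in the simulation
    calls = {2, 3, 4, 5, 6}          # events reported by the caller
    sens, ppv = sensitivity_and_ppv(simulated_truth, calls)
    print(sens, ppv, calibrated_count(100, sens, ppv))
```

In practice the two rates would be estimated separately for each event class and length range, as the abstract notes, before being applied to real call counts.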
- Metabolites as global regulators: A new view of protein regulation BIOESSAYS 2011; 33 (7): 485-489
The Human Proteome Project: Current State and Future Direction
MOLECULAR & CELLULAR PROTEOMICS
2011; 10 (7)
After the successful completion of the Human Genome Project, the Human Proteome Organization has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given the lack of protein-level evidence for about 30% of the estimated 20,300 protein-coding genes, a systematic global effort will be necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP research groups will use the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge bases. The HPP participants will take advantage of the output and cross-analyses from the ongoing Human Proteome Organization initiatives and a chromosome-centric protein mapping strategy, termed C-HPP, with which many national teams are currently engaged. In addition, numerous biologically driven and disease-oriented projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents, and tools for protein studies and analyses, and a stronger basis for personalized medicine. The Human Proteome Organization urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.
View details for DOI 10.1074/mcp.M111.009993
View details for Web of Science ID 000292541500012
View details for PubMedID 21742803
View details for PubMedCentralID PMC3134076
- Landscape of Next-Generation Sequencing Technologies ANALYTICAL CHEMISTRY 2011; 83 (12): 4327-4341
Genome-wide chromatin occupancy analysis reveals a role for ASH2 in transcriptional pausing
NUCLEIC ACIDS RESEARCH
2011; 39 (11): 4628-4639
An important mechanism for gene regulation involves chromatin changes via histone modification. One such modification is histone H3 lysine 4 trimethylation (H3K4me3), which requires histone methyltransferase complexes (HMT) containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have identified genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have characterized the occupancy of phosphorylated forms of RNA Polymerase II and histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and histone activating marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants in comparison to wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of the ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in transcriptional pausing control.
View details for DOI 10.1093/nar/gkq1322
View details for Web of Science ID 000291755000015
View details for PubMedID 21310711
CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing
2011; 21 (6): 974-984
Copy number variation (CNV) in the genome is a complex phenomenon, and not completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method is based on combining the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. Because of this, we could use CNVnator for CNV discovery and genotyping in a population and characterization of atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%-96%), low false-discovery rate (3%-20%), high genotyping accuracy (93%-95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: It misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distribution strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.
View details for DOI 10.1101/gr.114876.110
View details for Web of Science ID 000291153400017
View details for PubMedID 21324876
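One of the refinements named in the CNVnator abstract above is GC correction of the read-depth (RD) signal. A minimal Python sketch follows, assuming the common rescaling in which each bin's depth is multiplied by the ratio of the genome-wide mean depth to the mean depth of bins with the same GC content. The function name and toy values are hypothetical, and the released tool may implement the correction differently.

```python
from collections import defaultdict
from statistics import mean


def gc_correct(read_depth, gc_percent):
    """Rescale per-bin read depth so that bins of every GC content share one mean.

    `read_depth` and `gc_percent` are equal-length sequences with one value
    per genomic bin; GC content is given as a whole-number percentage.
    """
    if not read_depth:
        return []
    global_mean = mean(read_depth)

    # Collect the depths observed at each GC percentage.
    depths_by_gc = defaultdict(list)
    for depth, gc in zip(read_depth, gc_percent):
        depths_by_gc[gc].append(depth)
    mean_by_gc = {gc: mean(depths) for gc, depths in depths_by_gc.items()}

    corrected = []
    for depth, gc in zip(read_depth, gc_percent):
        gc_mean = mean_by_gc[gc]
        # A GC class with no coverage at all cannot be rescaled.
        corrected.append(depth * global_mean / gc_mean if gc_mean > 0 else 0.0)
    return corrected


if __name__ == "__main__":
    # Toy example: bins with 60% GC are covered twice as deeply as 40% GC bins.
    print(gc_correct([10, 10, 20, 20], [40, 40, 60, 60]))
```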
A Large Gene Network in Immature Erythroid Cells Is Controlled by the Myeloid and B Cell Transcriptional Regulator PU.1
2011; 7 (6)
PU.1 is a hematopoietic transcription factor that is required for the development of myeloid and B cells. PU.1 is also expressed in erythroid progenitors, where it blocks erythroid differentiation by binding to and inhibiting the main erythroid promoting factor, GATA-1. However, other mechanisms by which PU.1 affects the fate of erythroid progenitors have not been thoroughly explored. Here, we used ChIP-Seq analysis for PU.1 and gene expression profiling in erythroid cells to show that PU.1 regulates an extensive network of genes that constitute major pathways for controlling growth and survival of immature erythroid cells. By analyzing fetal liver erythroid progenitors from mice with low PU.1 expression, we also show that the earliest erythroid committed cells are dramatically reduced in vivo. Furthermore, we find that PU.1 also regulates many of the same genes and pathways in other blood cells, leading us to propose that PU.1 is a multifaceted factor with overlapping, as well as distinct, functions in several hematopoietic lineages.
View details for DOI 10.1371/journal.pgen.1001392
View details for Web of Science ID 000292386300004
View details for PubMedID 21695229
Diverse protein kinase interactions identified by protein microarrays reveal novel connections between cellular processes
GENES & DEVELOPMENT
2011; 25 (7): 767-778
Protein kinases are key regulators of cellular processes. In spite of considerable effort, a full understanding of the pathways they participate in remains elusive. We globally investigated the proteins that interact with the majority of yeast protein kinases using protein microarrays. Eighty-five kinases were purified and used to probe yeast proteome microarrays. One-thousand-twenty-three interactions were identified, and the vast majority were novel. Coimmunoprecipitation experiments indicate that many of these interactions occurred in vivo. Many novel links of kinases to previously distinct cellular pathways were discovered. For example, the well-studied Kss1 filamentous pathway was found to bind components of diverse cellular pathways, such as those of the stress response pathway and the Ccr4-Not transcriptional/translational regulatory complex; genetic tests revealed that these different components operate in the filamentation pathway in vivo. Overall, our results indicate that kinases operate in a highly interconnected network that coordinates many activities of the proteome. Our results further demonstrate that protein microarrays uncover a diverse set of interactions not observed previously.
View details for DOI 10.1101/gad.1998811
View details for Web of Science ID 000289062700010
View details for PubMedID 21460040
- A User's Guide to the Encyclopedia of DNA Elements (ENCODE) PLOS BIOLOGY 2011; 9 (4)
Diverse Roles and Interactions of the SWI/SNF Chromatin Remodeling Complex Revealed Using Global Approaches
2011; 7 (3)
A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5' ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly, we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. Taken together, the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.
View details for DOI 10.1371/journal.pgen.1002008
View details for Web of Science ID 000288996600042
View details for PubMedID 21408204
Mapping copy number variation by population-scale genome sequencing
2011; 470 (7332): 59-65
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
View details for DOI 10.1038/nature09708
View details for Web of Science ID 000286886400033
View details for PubMedID 21293372
View details for PubMedCentralID PMC3077050
Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data
2011; 21 (2): 276-285
We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid level. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, among which more than 1000 are located in the intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting the expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (∼59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.
View details for DOI 10.1101/gr.110189.110
View details for Web of Science ID 000286804100013
View details for PubMedID 21177971
Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans
2011; 21 (2): 245-254
Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors to target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions between a single transcription factor and one or a few select target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding target genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as target genes. Examination of the binding targets of three related HOX factors (LIN-39, MAB-5, and EGL-5) indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, the comprehensive mapping of transcription factor binding sites will identify features of transcriptional networks that regulate C. elegans developmental processes.
View details for DOI 10.1101/gr.114587.110
View details for Web of Science ID 000286804100010
View details for PubMedID 21177963
RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries
2011; 27 (2): 281-283
The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. Availability and implementation: RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.
View details for DOI 10.1093/bioinformatics/btq643
View details for Web of Science ID 000286215200025
View details for PubMedID 21134889
View details for PubMedCentralID PMC3018817
Stat3 is essential for neuronal differentiation through direct transcriptional regulation of the Sox6 gene
2011; 585 (1): 148-152
The transcription factor Signal Transducer and Activator of Transcription 3 (Stat3) functions in various cellular processes including neuronal differentiation. We show that SRY-box containing gene 6 (Sox6), which is important for neuronal differentiation, is a direct target gene of Stat3. We demonstrate that in response to ligand stimulation, Stat3 binds to the Sox6 promoter and induces its expression. Furthermore, Stat3 is activated and Sox6 is induced during neuronal differentiation of P19 cells in the absence of exogenous ligand treatment. Moreover, using an RNA interference approach, we show that Stat3 is required for Sox6 expression during neuronal differentiation.
View details for DOI 10.1016/j.febslet.2010.11.030
View details for Web of Science ID 000285921500025
View details for PubMedID 21094641
Methods in molecular biology (Clifton, N.J.)
2011; 759: 125-132
This chapter describes the RNA sequencing (RNA-Seq) protocol, whereby RNA from yeast cells is prepared for sequencing on an Illumina Genome Analyzer. The protocol can easily be altered to use RNA from a different organism. This chapter covers RNA extraction, cDNA synthesis, cDNA fragmentation, and Illumina cDNA library generation and contains some brief remarks on bioinformatic analysis.
View details for DOI 10.1007/978-1-61779-173-4_8
View details for PubMedID 21863485
The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics
2011; 12 (3)
Biological data is often tabular, but finding statistically valid connections between entities in a sequence of tables can be problematic: for example, connecting particular entities in a drug property table to gene properties in a second table, using a third table associating genes with drugs. Here we present an approach (CRIT) to find connections such as these and show how it can be applied in a variety of genomic contexts including chemogenomics data.
View details for DOI 10.1186/gb-2011-12-3-r32
View details for Web of Science ID 000291309200012
View details for PubMedID 21453526
View details for PubMedCentralID PMC3129682
Regulatory Variation Within and Between Species
ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 12
2011; 12: 327-346
Understanding how individuals differ from one another and from closely related species is a fundamental problem in biology. Recent evidence suggests that much of the variation both within and between species is due to differential gene regulation. Here we review differential gene regulation focusing on evolutionary-developmental (evo-devo) biology, global comparison of genomic sequences, whole-genome gene expression, and transcription factor (TF) binding profiles. We also explore the relationship between divergence rate of regulatory sequences, coding sequences, and TF binding events using several different measures and discuss their implications in the context of evolution of regulatory networks. Finally, we discuss the current status and future challenges in relating regulatory variation to the divergence across and within species.
View details for DOI 10.1146/annurev-genom-082908-150139
View details for Web of Science ID 000295819900014
View details for PubMedID 21721942
Kinase substrate interactions.
Methods in molecular biology (Clifton, N.J.)
2011; 723: 201-212
Kinases have become popular therapeutic targets primarily due to their integral role in cell cycle and tumor progression. The efficacy of high-throughput screening efforts is dependent on the development of high quality multiplex tools capable of replacing lower-throughput technologies such as mass spectrometry or solution-based assays for the study of kinase-substrate interactions. Functional protein microarrays consist of thousands of proteins immobilized on glass slides and have been used successfully to identify protein-protein interactions. Here, we describe the application of functional protein microarrays for the identification of the phosphorylation targets of individual protein kinases using highly sensitive radioactive detection and robust informatics algorithms.
View details for DOI 10.1007/978-1-61779-043-0_13
View details for PubMedID 21370067
The human proteome project: Current state and future direction.
Molecular & cellular proteomics : MCP
After successful completion of the Human Genome Project (HGP), HUPO has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given that about 30% of the 20,300 protein gene products remain undisclosed, a systematic global effort is necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP groups employ the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge base. The HPP participants will take advantage of the output and cross-analyses from the ongoing HUPO initiatives and a chromosome-based protein mapping strategy, termed C-HPP, with many national teams currently engaged. In addition, numerous biologically-driven projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents and tools for protein studies and analyses, and a stronger basis for personalized medicine. HUPO urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in an HPP consortium of funders and investigators.
View details for DOI 10.1074/mcp.O111.009993
View details for PubMedID 21531903
Measuring the Evolutionary Rewiring of Biological Networks
PLOS COMPUTATIONAL BIOLOGY
2011; 7 (1)
We have accumulated a large amount of biological network data and expect even more to come. Soon, we anticipate being able to compare many different biological networks as we commonly do for molecular sequences. It has long been believed that many of these networks change, or "rewire", at different rates. It is therefore important to develop a framework to quantify the differences between networks in a unified fashion. We developed such a formalism based on analogy to simple models of sequence evolution, and used it to conduct a systematic study of network rewiring on all the currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway network, from fast to slow. This ordering was found in all comparisons we did of matched networks between organisms. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of "commonplace" networks such as family trees, co-authorships and Linux kernel function dependencies.
View details for DOI 10.1371/journal.pcbi.1001050
View details for Web of Science ID 000286652100009
View details for PubMedID 21253555
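To make the idea of quantifying network differences in "Measuring the Evolutionary Rewiring of Biological Networks" concrete, here is a minimal Python sketch. It is not the formalism developed in the paper: it assumes a much simpler stand-in measure, the fraction of edges gained or lost among nodes present in both networks, and it treats edges as undirected. The function names `shared_edges` and `rewiring_fraction` are hypothetical.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def shared_edges(edges: Iterable[Edge], shared: Set[Hashable]) -> Set[FrozenSet]:
    """Keep only edges whose two distinct endpoints are both in `shared` (undirected)."""
    return {
        frozenset((a, b))
        for a, b in edges
        if a != b and a in shared and b in shared
    }


def rewiring_fraction(edges_a: Iterable[Edge], edges_b: Iterable[Edge]) -> float:
    """Fraction of edges that differ between two networks over their common nodes.

    Assumption (not from the paper): rewiring is measured as
    |E_a symmetric-difference E_b| / |E_a union E_b| on nodes present in both networks.
    """
    edges_a, edges_b = list(edges_a), list(edges_b)
    nodes_a = {n for e in edges_a for n in e}
    nodes_b = {n for e in edges_b for n in e}
    shared = nodes_a & nodes_b
    ea, eb = shared_edges(edges_a, shared), shared_edges(edges_b, shared)
    union = ea | eb
    if not union:
        return 0.0  # no comparable edges, so no measurable rewiring
    return len(ea ^ eb) / len(union)


if __name__ == "__main__":
    # Hypothetical networks over matched (orthologous) node labels.
    net_a = [(1, 2), (2, 3), (3, 4)]
    net_b = [(1, 2), (2, 4), (3, 4), (4, 5)]
    print(rewiring_fraction(net_a, net_b))
```

Dividing such a fraction by the divergence time between two organisms would give a crude rate, but it would not correct for saturation at large divergences, which the abstract identifies as an important effect.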
Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project
2010; 330 (6012): 1775-1787
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
View details for DOI 10.1126/science.1196914
View details for Web of Science ID 000285603700031
View details for PubMedID 21177976
Statistical Issues in Mapping QTLs for RNA-seq Data
19th Annual Meeting of the International-Genetic-Epidemiology-Society
WILEY-BLACKWELL. 2010: 942–42
View details for Web of Science ID 000284719100104
Exploring successful community pharmacist-physician collaborative working relationships using mixed methods
RESEARCH IN SOCIAL & ADMINISTRATIVE PHARMACY
2010; 6 (4): 307-323
Collaborative working relationships (CWRs) between community pharmacists and physicians may foster the provision of medication therapy management services, disease state management, and other patient care activities; however, pharmacists have expressed difficulty in developing such relationships. Additional work is needed to understand the specific pharmacist-physician exchanges that effectively contribute to the development of CWRs. Data from successful pairs of community pharmacists and physicians may provide further insights into these exchange variables and expand research on models of professional collaboration. To describe the professional exchanges that occurred between community pharmacists and physicians engaged in successful CWRs, using a published conceptual model and tool for quantifying the extent of collaboration. A national pool of experts in community pharmacy practice identified community pharmacists engaged in CWRs with physicians. Five pairs of community pharmacists and physician colleagues participated in individual semistructured interviews, and 4 of these pairs completed the Pharmacist-Physician Collaborative Index (PPCI). Main outcome measures include quantitative (ie, scores on the PPCI) and qualitative information about professional exchanges within 3 domains found previously to influence relationship development: relationship initiation, trustworthiness, and role specification. On the PPCI, participants scored similarly on trustworthiness; however, physicians scored higher on relationship initiation and role specification. The qualitative interviews revealed that when initiating relationships, it was important for many pharmacists to establish open communication through face-to-face visits with physicians. Furthermore, physicians were able to recognize in these pharmacists a commitment to improved patient care. Trustworthiness was established by pharmacists making consistent contributions to care that improved patient outcomes over time. Open discussions regarding professional roles and an acknowledgment of professional norms (ie, physicians as decision makers) were essential. The findings support and extend the literature on pharmacist-physician CWRs by examining the exchange domains of relationship initiation, trustworthiness, and role specification qualitatively and quantitatively among pairs of practitioners. Relationships appeared to develop in a manner consistent with a published model for CWRs, including the pharmacist as relationship initiator, the importance of communication during early stages of the relationship, and an emphasis on high-quality pharmacist contributions.
View details for DOI 10.1016/j.sapharm.2009.11.008
View details for Web of Science ID 000285168400005
View details for PubMedID 21111388
Transformation of Candida albicans with a synthetic hygromycin B resistance gene
2010; 27 (12): 1039-1048
Synthetic genes that confer resistance to the antibiotic nourseothricin in the pathogenic fungus Candida albicans are available, but genes conferring resistance to other antibiotics are not. We found that multiple C. albicans strains were inhibited by hygromycin B, so we designed a 1026 bp gene (CaHygB) that encodes Escherichia coli hygromycin B phosphotransferase with C. albicans codons. CaHygB conferred hygromycin B resistance in C. albicans transformed with ars2-containing plasmids or single-copy integrating vectors. Since CaHygB did not confer nourseothricin resistance and since the nourseothricin resistance marker SAT-1 did not confer hygromycin B resistance, we reasoned that these two markers could be used for homologous gene disruptions in wild-type C. albicans. We used PCR to fuse CaHygB or SAT-1 to approximately 1 kb of 5' and 3' noncoding DNA from C. albicans ARG4, HIS1 and LEU2, and introduced the resulting amplicons into six wild-type C. albicans strains. Homologous targeting frequencies were approximately 50-70%, and disruption of ARG4, HIS1 and LEU2 alleles was verified by the respective transformants' inabilities to grow without arginine, histidine and leucine. CaHygB should be a useful tool for genetic manipulation of different C. albicans strains, including clinical isolates.
View details for DOI 10.1002/yea.1813
View details for Web of Science ID 000285210600006
View details for PubMedID 20737428
Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads
Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.
View details for DOI 10.1186/1471-2164-11-663
View details for Web of Science ID 000285303000001
View details for PubMedID 21106091
Extensive In Vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses
2010; 143 (4): 639-650
Natural small compounds comprise most cellular molecules and bind proteins as substrates, products, cofactors, and ligands. However, a large-scale investigation of in vivo protein-small metabolite interactions has not been performed. We developed a mass spectrometry assay for the large-scale identification of in vivo protein-hydrophobic small metabolite interactions in yeast and analyzed compounds that bind ergosterol biosynthetic proteins and protein kinases. Many of these proteins bind small metabolites; a few interactions were previously known, but the vast majority are new. Importantly, many key regulatory proteins such as protein kinases bind metabolites. Ergosterol was found to bind many proteins and may function as a general regulator. It is required for the activity of Ypk1, a mammalian AKT/SGK kinase homolog. Our study defines potential key regulatory steps in lipid biosynthetic pathways and suggests that small metabolites may play a more general role as regulators of protein activity and function than previously appreciated.
View details for DOI 10.1016/j.cell.2010.09.048
View details for Web of Science ID 000284149100020
View details for PubMedID 21035178
A map of human genome variation from population-scale sequencing
2010; 467 (7319): 1061-1073
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
View details for DOI 10.1038/nature09534
View details for Web of Science ID 000283548600039
View details for PubMedCentralID PMC3042601
Yeast proteomics and protein microarrays
JOURNAL OF PROTEOMICS
2010; 73 (11): 2147-2157
Our understanding of biological processes as well as human diseases has improved greatly thanks to studies on model organisms such as yeast. The power of scientific approaches with yeast lies in its relatively simple genome, its facile classical and molecular genetics, as well as the evolutionary conservation of many basic biological mechanisms. However, even in this simple model organism, systems biology studies, especially proteomic studies, had been an intimidating task. During the past decade, powerful high-throughput technologies in proteomic research have been developed for yeast, including protein microarray technology. Protein microarray technology allows the interrogation of protein-protein, protein-DNA, and protein-small molecule interaction networks as well as post-translational modification networks in a large-scale, high-throughput manner. With this technology, many groundbreaking findings have been established in studies with the budding yeast Saccharomyces cerevisiae, most of which would have been unachievable with traditional approaches. Discovery of these networks has a profound impact on explicating biological processes from a proteomic point of view, which may lead to a better understanding of normal biological phenomena as well as various human diseases.
View details for DOI 10.1016/j.jprot.2010.08.003
View details for Web of Science ID 000283903000008
View details for PubMedID 20728591
Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq
2010; 20 (10): 1451-1458
Candida albicans is the major invasive fungal pathogen of humans, causing diseases ranging from superficial mucosal infections to disseminated, systemic infections that are often life-threatening. We have used massively parallel high-throughput sequencing of cDNA (RNA-seq) to generate a high-resolution map of the C. albicans transcriptome under several different environmental conditions. We have quantitatively determined all of the regions that are transcribed under these different conditions, and have identified 602 novel transcriptionally active regions (TARs) and numerous novel introns that are not represented in the current genome annotation. Interestingly, the expression of many of these TARs is regulated in a condition-specific manner. This comprehensive transcriptome analysis significantly enhances the current genome annotation of C. albicans, a necessary framework for a complete understanding of the molecular mechanisms of pathogenesis for this important eukaryotic pathogen.
View details for DOI 10.1101/gr.109553.110
View details for Web of Science ID 000282375000015
View details for PubMedID 20810668
Annotating non-coding regions of the genome
NATURE REVIEWS GENETICS
2010; 11 (8): 559-571
Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
View details for DOI 10.1038/nrg2814
View details for Web of Science ID 000279988800012
View details for PubMedID 20628352
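The sequence of steps described in the abstract of "Annotating non-coding regions of the genome" (per-base signal, smoothing, segmentation into small blocks, clustering into larger annotations) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not taken from the review: the signal is plain read coverage, smoothing is a moving average, segmentation is a fixed threshold, and clustering simply merges nearby blocks. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base-pair coordinates


def coverage_signal(reads: Sequence[Interval], length: int) -> List[int]:
    """Step 1: turn mapped reads into a signal at each base pair (read depth)."""
    signal = [0] * length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, length)):
            signal[pos] += 1
    return signal


def smooth(signal: Sequence[float], window: int) -> List[float]:
    """Step 2: centered moving average; the window is truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def segment(signal: Sequence[float], threshold: float) -> List[Interval]:
    """Step 3: blocks of consecutive positions with signal >= threshold."""
    blocks, start = [], None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, len(signal)))
    return blocks


def merge_blocks(blocks: Sequence[Interval], max_gap: int) -> List[Interval]:
    """Step 4: cluster sorted blocks separated by at most `max_gap` bases."""
    merged: List[Interval] = []
    for start, end in blocks:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    reads = [(2, 6), (3, 7), (12, 15)]  # hypothetical read alignments
    signal = coverage_signal(reads, length=20)
    blocks = segment(smooth(signal, window=3), threshold=0.5)
    print(blocks, merge_blocks(blocks, max_gap=5))
```

Real pipelines typically replace the fixed threshold with a statistical model of background signal; the threshold is used here only to keep the example short.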
Initiation of the TORC1-Regulated G(0) Program Requires Igo1/2, which License Specific mRNAs to Evade Degradation via the 5'-3' mRNA Decay Pathway
2010; 38 (3): 345-355
Eukaryotic cell proliferation is controlled by growth factors and essential nutrients, in the absence of which cells may enter into a quiescent (G(0)) state. In yeast, nitrogen and/or carbon limitation causes downregulation of the conserved TORC1 and PKA signaling pathways and, consequently, activation of the PAS kinase Rim15, which orchestrates G(0) program initiation and ensures proper life span by controlling distal readouts, including the expression of specific genes. Here, we report that Rim15 coordinates transcription with posttranscriptional mRNA protection by phosphorylating the paralogous Igo1 and Igo2 proteins. This event, which stimulates Igo proteins to associate with the mRNA decapping activator Dhh1, shelters newly expressed mRNAs from degradation via the 5'-3' mRNA decay pathway, thereby enabling their proper translation during initiation of the G(0) program. These results delineate a likely conserved mechanism by which nutrient limitation leads to stabilization of specific mRNAs that are critical for cell differentiation and life span.
View details for DOI 10.1016/j.molcel.2010.02.039
View details for Web of Science ID 000277818400006
View details for PubMedID 20471941
MOTIPS: Automated Motif Analysis for Predicting Targets of Modular Protein Domains
Many protein interactions, especially those involved in signaling, involve short linear motifs consisting of 5-10 amino acid residues that interact with modular protein domains such as the SH3 binding domains and the kinase catalytic domains. One straightforward way of identifying these interactions is by scanning for matches to the motif against all the sequences in a target proteome. However, predicting domain targets by motif sequence alone without considering other genomic and structural information has been shown to be lacking in accuracy. We developed an efficient search algorithm to scan the target proteome for potential domain targets and to increase the accuracy of each hit by integrating a variety of pre-computed features, such as conservation, surface propensity, and disorder. The integration is performed using naïve Bayes and a training set of validated experiments. By integrating a variety of biologically relevant features to predict domain targets, we demonstrated a notably improved prediction of modular protein domain targets. Combined with emerging high-resolution data of domain specificities, we believe that our approach can assist in the reconstruction of many signaling pathways.
View details for DOI 10.1186/1471-2105-11-243
View details for Web of Science ID 000279728900007
View details for PubMedID 20459839
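The two ideas in the MOTIPS abstract, scanning a proteome for motif matches and then scoring each hit by combining features with naïve Bayes, can be sketched in a few lines of Python. This is an illustrative sketch only, not the MOTIPS implementation: the regular-expression motif, the binary features, and every probability below are hypothetical values that would in practice be estimated from a training set of validated interactions.

```python
import math
import re
from typing import Dict, List, Tuple


def scan_motif(proteome: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Return (protein_id, start, peptide) for every match, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping matches.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for protein_id, sequence in proteome.items():
        for match in regex.finditer(sequence):
            hits.append((protein_id, match.start(), match.group(1)))
    return hits


def naive_bayes_log_odds(
    features: Dict[str, bool],
    p_given_target: Dict[str, float],
    p_given_background: Dict[str, float],
    prior_target: float = 0.5,
) -> float:
    """Log-odds that a hit is a true target, assuming features are independent.

    p_given_target[f] is P(feature f present | true target);
    p_given_background[f] is P(feature f present | non-target).
    """
    score = math.log(prior_target / (1.0 - prior_target))
    for name, present in features.items():
        p_t = p_given_target[name] if present else 1.0 - p_given_target[name]
        p_b = p_given_background[name] if present else 1.0 - p_given_background[name]
        score += math.log(p_t / p_b)
    return score


if __name__ == "__main__":
    proteome = {"protA": "MPAAPKPLLP"}  # hypothetical sequence
    print(scan_motif(proteome, "P..P"))  # simplified proline-rich motif
    # Hypothetical conditional probabilities for two binary features.
    p_target = {"conserved": 0.8, "disordered": 0.6}
    p_background = {"conserved": 0.4, "disordered": 0.3}
    hit_features = {"conserved": True, "disordered": False}
    print(naive_bayes_log_odds(hit_features, p_target, p_background))
```

A positive log-odds score favors the hit being a genuine domain target; ranking hits by this score is one simple way to prioritize candidates for follow-up.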
Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells
NATURE STRUCTURAL & MOLECULAR BIOLOGY
2010; 17 (5): 635-U139
Genome-wide occupancy profiles of five components of the RNA polymerase III (Pol III) machinery in human cells identified the expected tRNA and noncoding RNA targets and revealed many additional Pol III-associated loci, mostly near short interspersed elements (SINEs). Several genes are targets of an alternative transcription factor IIIB (TFIIIB) containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike nonexpressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner.
View details for DOI 10.1038/nsmb.1794
View details for Web of Science ID 000277330700020
View details for PubMedID 20418883
Genetic analysis of variation in transcription factor binding in yeast
2010; 464 (7292): 1187-U106
Variation in transcriptional regulation is thought to be a major cause of phenotypic diversity. Although widespread differences in gene expression among individuals of a species have been observed, studies to examine the variability of transcription factor binding on a global scale have not been performed, and thus the extent and underlying genetic basis of transcription factor binding diversity is unknown. By mapping differences in transcription factor binding among individuals, here we present the genetic basis of such variation on a genome-wide scale. Whole-genome Ste12-binding profiles were determined using chromatin immunoprecipitation coupled with DNA sequencing in pheromone-treated cells of 43 segregants of a cross between two highly diverged yeast strains and their parental lines. We identified extensive Ste12-binding variation among individuals, and mapped underlying cis- and trans-acting loci responsible for such variation. We showed that most transcription factor binding variation is cis-linked, and that many variations are associated with polymorphisms residing in the binding motifs of Ste12 as well as those of several proposed Ste12 cofactors. We also identified two trans-factors, AMN1 and FLO8, that modulate Ste12 binding to promoters of more than ten genes under alpha-factor treatment. Neither of these two genes was previously known to regulate Ste12, and we suggest that they may be mediators of gene activity and phenotypic diversity. Ste12 binding strongly correlates with gene expression for more than 200 genes, indicating that binding variation is functional. Many of the variable-bound genes are involved in cell wall organization and biogenesis. Overall, these studies identified genetic regulators of molecular diversity among individuals and provide new insights into mechanisms of gene regulation.
View details for DOI 10.1038/nature08934
View details for Web of Science ID 000276891100036
View details for PubMedID 20237471
Variation in Transcription Factor Binding Among Humans
2010; 328 (5975): 232-235
Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (PolII) and a key regulator of immune responses, nuclear factor kappaB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing PolII binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.
View details for DOI 10.1126/science.1183621
View details for Web of Science ID 000276459600043
View details for PubMedID 20299548
View details for PubMedCentralID PMC2938768
Molecular Mechanisms of Ethanol-Induced Pathogenesis Revealed by RNA-Sequencing
2010; 6 (4)
Acinetobacter baumannii is a common pathogen whose recent resistance to drugs has emerged as a major health problem. Ethanol has been found to increase the virulence of A. baumannii in Dictyostelium discoideum and Caenorhabditis elegans models of infection. To better understand the causes of this effect, we examined the transcriptional profile of A. baumannii grown in the presence or absence of ethanol using RNA-Seq. Using the Illumina/Solexa platform, a total of 43,453,960 reads (35 nt) were obtained, of which 3,596,474 mapped uniquely to the genome. Our analysis revealed that ethanol induces the expression of 49 genes that belong to different functional categories. A strong induction was observed for genes encoding metabolic enzymes, indicating that ethanol is efficiently assimilated. In addition, we detected the induction of genes encoding stress proteins, including upsA, hsp90, groEL and lon as well as permeases, efflux pumps and a secreted phospholipase C. In stationary phase, ethanol strongly induced several genes involved with iron assimilation and a high-affinity phosphate transport system, indicating that A. baumannii makes a better use of the iron and phosphate resources in the medium when ethanol is used as a carbon source. To evaluate the role of phospholipase C (Plc1) in virulence, we generated and analyzed a deletion mutant for plc1. This strain exhibits a modest, but reproducible, reduction in the cytotoxic effect caused by A. baumannii on epithelial cells, suggesting that phospholipase C is important for virulence. Overall, our results indicate the power of applying RNA-Seq to identify key modulators of bacterial pathogenesis. We suggest that the effect of ethanol on the virulence of A. baumannii is multifactorial and includes a general stress response and other specific components such as phospholipase C.
View details for DOI 10.1371/journal.ppat.1000834
View details for Web of Science ID 000277722400007
View details for PubMedID 20368969
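As a quick check on the read counts quoted in the abstract above, the fraction of reads that mapped uniquely can be worked out directly. This is only an arithmetic sketch: the two counts come from the abstract, while the function and variable names are illustrative assumptions.

```python
# Read counts quoted in the abstract above (35-nt Illumina/Solexa reads).
TOTAL_READS = 43_453_960
UNIQUELY_MAPPED = 3_596_474


def unique_mapping_rate(mapped: int, total: int) -> float:
    """Return the fraction of reads that mapped uniquely to the genome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return mapped / total


rate = unique_mapping_rate(UNIQUELY_MAPPED, TOTAL_READS)
print(f"{rate:.1%} of reads mapped uniquely")  # roughly 8.3%
```

Only the uniquely mapping reads, roughly one in twelve, would feed a per-gene expression estimate in an analysis like this one.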
Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2010; 107 (11): 5254-5259
To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of hESCs into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation (N1, early initiation; N2, neural progenitor; N3, early glial-like) were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.
View details for DOI 10.1073/pnas.0914114107
View details for Web of Science ID 000275714300079
View details for PubMedID 20194744
View details for PubMedCentralID PMC2841935
Personal genome sequencing: current approaches and challenges
GENES & DEVELOPMENT
2010; 24 (5): 423-431
The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., "personal genomes." Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences.
View details for DOI 10.1101/gad.1864110
View details for Web of Science ID 000275055900001
View details for PubMedID 20194435
X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2010; 107 (8): 3704-3709
The DNA methylation status of human X chromosomes from male and female neutrophils was identified by high-throughput sequencing of HpaII and MspI digested fragments. In the intergenic and intragenic regions on the X chromosome, the sites outside CpG islands were heavily hypermethylated to the same degree in both genders. Nearly half of X chromosome promoters were either hypomethylated or hypermethylated in both females and males. Nearly one third of X chromosome promoters were a mixture of hypomethylated and heterogeneously methylated sites in females and were hypomethylated in males. Thus, a large fraction of genes that are silenced on the inactive X chromosome are hypomethylated in their promoter regions. These genes frequently belong to the evolutionarily younger strata of the X chromosome. The promoters that were hypomethylated at more than two sites contained most of the genes that escaped silencing on the inactive X chromosome. The overall levels of expression of X-linked genes were indistinguishable in females and males, regardless of the methylation state of the inactive X chromosome. Thus, in addition to DNA methylation, other factors are involved in the fine tuning of gene dosage compensation in neutrophils.
View details for DOI 10.1073/pnas.0914812107
View details for Web of Science ID 000275130900077
View details for PubMedID 20133578
Close association of RNA polymerase II and many transcription factors with Pol III genes
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2010; 107 (8): 3639-3644
Transcription of eukaryotic genomes is carried out by three distinct RNA polymerases (I, II, and III), whereby each polymerase is thought to independently transcribe a distinct set of genes. To investigate a possible relationship of RNA polymerases II and III, we mapped their in vivo binding sites throughout the human genome by using ChIP-Seq in two different cell lines, GM12878 and K562 cells. Pol III was found to bind near many known genes as well as several previously unidentified target genes. RNA-Seq studies indicate that a majority of the bound genes are expressed, although a subset are not, which is suggestive of stalling by RNA polymerase III. Pol II was found to bind near many known Pol III genes, including tRNA, U6, HVG, hY, 7SK and previously unidentified Pol III target genes. Similarly, in vivo binding studies also reveal that a number of transcription factors normally associated with Pol II transcription, including c-Fos, c-Jun and c-Myc, also tightly associate with most Pol III-transcribed genes. Inhibition of Pol II activity using alpha-amanitin reduced expression of a number of Pol III genes (e.g., U6, hY, HVG), suggesting that Pol II plays an important role in regulating their transcription. These results indicate that, contrary to previous expectations, polymerases can often work with one another to globally coordinate gene expression.
View details for DOI 10.1073/pnas.0911315106
View details for Web of Science ID 000275130900066
View details for PubMedID 20139302
Deciphering Protein Kinase Specificity Through Large-Scale Analysis of Yeast Phosphorylation Site Motifs
2010; 3 (109)
Phosphorylation is a universal mechanism for regulating cell behavior in eukaryotes. Although protein kinases target short linear sequence motifs on their substrates, the rules for kinase substrate recognition are not completely understood. We used a rapid peptide screening approach to determine consensus phosphorylation site motifs targeted by 61 of the 122 kinases in Saccharomyces cerevisiae. By correlating these motifs with kinase primary sequence, we uncovered previously unappreciated rules for determining specificity within the kinase family, including a residue determining P-3 arginine specificity among members of the CMGC [CDK (cyclin-dependent kinase), MAPK (mitogen-activated protein kinase), GSK (glycogen synthase kinase), and CDK-like] group of kinases. Furthermore, computational scanning of the yeast proteome enabled the prediction of thousands of new kinase-substrate relationships. We experimentally verified several candidate substrates of the Prk1 family of kinases in vitro and in vivo and identified a protein substrate of the kinase Vhs1. Together, these results elucidate how kinase catalytic domains recognize their phosphorylation targets and suggest general avenues for the identification of previously unknown kinase substrates across eukaryotes.
View details for DOI 10.1126/scisignal.2000482
View details for Web of Science ID 000275647900005
View details for PubMedID 20159853
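The computational scanning step mentioned in the abstract above can be illustrated with a very small sketch. It assumes a single hypothetical consensus motif, `R..[ST]` (arginine at the P-3 position relative to a phosphorylated serine or threonine), and a toy two-protein "proteome"; the motifs, scoring scheme and sequences used in the study itself are not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical consensus motif: Arg at P-3, any residue at P-2 and P-1,
# then a phosphoacceptor Ser/Thr at P0. The lookahead lets overlapping
# matches be reported.
MOTIF = re.compile(r"(?=(R..[ST]))")


def find_candidate_sites(proteome: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each protein name to 1-based positions of candidate phosphoacceptors."""
    sites: Dict[str, List[int]] = {}
    for name, sequence in proteome.items():
        # m.start() is the 0-based index of the Arg; the Ser/Thr sits three
        # residues later, so its 1-based position is m.start() + 4.
        positions = [m.start() + 4 for m in MOTIF.finditer(sequence)]
        if positions:
            sites[name] = positions
    return sites


# Toy sequences, invented for illustration only.
toy_proteome = {"protA": "MKRASSPRRTSLL", "protB": "MAAAGGG"}
candidates = find_candidate_sites(toy_proteome)
print(candidates)  # {'protA': [6, 11]}
```

A regular expression treats every matching site as equally good. A position-specific scoring matrix built from peptide screening data would instead rank sites, which is closer in spirit to what the abstract describes.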
Genome-Wide Identification of Binding Sites Defines Distinct Functions for Caenorhabditis elegans PHA-4/FOXA in Development and Environmental Response
2010; 6 (2)
Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.
View details for DOI 10.1371/journal.pgen.1000848
View details for Web of Science ID 000275262700016
View details for PubMedID 20174564
Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library
2010; 28 (1): 47-U76
Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.
View details for DOI 10.1038/nbt.1600
View details for Web of Science ID 000273430400020
View details for PubMedID 20037582
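The idea of scanning short reads against a breakpoint library, as described in the abstract above, can be sketched as follows. This is not the BreakSeq implementation: it assumes exact string matching, a hypothetical library format (variant name mapped to the two flanking sequences), and an arbitrary minimum overlap on each side of the breakpoint. All names and sequences are invented for illustration.

```python
from typing import Dict, List, Tuple


def count_spanning_reads(reads: List[str], left_flank: str, right_flank: str,
                         min_overlap: int = 3) -> int:
    """Count reads that match the junction exactly and cover at least
    min_overlap bases on both sides of the breakpoint."""
    junction = left_flank + right_flank
    breakpoint_index = len(left_flank)  # first base to the right of the breakpoint
    count = 0
    for read in reads:
        start = junction.find(read)
        while start != -1:
            end = start + len(read)
            if (start <= breakpoint_index - min_overlap
                    and end >= breakpoint_index + min_overlap):
                count += 1
                break  # count each read at most once
            start = junction.find(read, start + 1)
    return count


def scan_library(reads: List[str],
                 library: Dict[str, Tuple[str, str]]) -> Dict[str, int]:
    """Return the number of junction-spanning reads for each library entry."""
    return {name: count_spanning_reads(reads, left, right)
            for name, (left, right) in library.items()}


# Toy data: one hypothetical deletion junction and four short reads.
toy_library = {"sv1_deletion": ("ACGTACGTAA", "TTGCATGCAT")}
toy_reads = ["CGTAATTGCA", "ACGTACGT", "GGGGGGGG", "TAATTG"]
support = scan_library(toy_reads, toy_library)
print(support)  # {'sv1_deletion': 2}
```

Reads that fall entirely on one side of the breakpoint (the second toy read) are not counted, since they would also match the reference allele. A practical pipeline would use a read aligner that tolerates mismatches rather than exact substring search.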
CHIP-SEQ: USING HIGH-THROUGHPUT DNA SEQUENCING FOR GENOME-WIDE IDENTIFICATION OF TRANSCRIPTION FACTOR BINDING SITES
METHODS IN ENZYMOLOGY, VOL 470: GUIDE TO YEAST GENETICS
2010; 470: 77-104
Much of eukaryotic gene regulation is mediated by binding of transcription factors near or within their target genes. Transcription factor binding sites (TFBS) are often identified globally using chromatin immunoprecipitation (ChIP) in which specific protein-DNA interactions are isolated using an antibody against the factor of interest. Coupling ChIP with high-throughput DNA sequencing allows identification of TFBS in a direct, unbiased fashion; this technique is termed ChIP-Sequencing (ChIP-Seq). In this chapter, we describe the yeast ChIP-Seq procedure, including the protocols for ChIP, input DNA preparation, and Illumina DNA sequencing library preparation. Descriptions of Illumina sequencing and data processing and analysis are also included. The use of multiplex short-read sequencing (i.e., barcoding) enables the analysis of many ChIP samples simultaneously, which is especially valuable for organisms with small genomes such as yeast.
View details for DOI 10.1016/S0076-6879(10)70004-5
View details for Web of Science ID 000275827900004
View details for PubMedID 20946807
RNA-Seq: a method for comprehensive transcriptome analysis.
Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.]
2010; Chapter 4: Unit 4.11.1-13
A recently developed technique called RNA Sequencing (RNA-Seq) uses massively parallel sequencing to allow transcriptome analyses of genomes at a far higher resolution than is available with Sanger sequencing- and microarray-based methods. In the RNA-Seq method, complementary DNAs (cDNAs) generated from the RNA of interest are directly sequenced using next-generation sequencing technologies. The reads obtained from this can then be aligned to a reference genome in order to construct a whole-genome transcriptome map. RNA-Seq has been used successfully to precisely quantify transcript levels, confirm or revise previously annotated 5' and 3' ends of genes, and map exon/intron boundaries. This unit describes protocols for performing RNA-Seq using the Illumina sequencing platform.
View details for DOI 10.1002/0471142727.mb0411s89
View details for PubMedID 20069539
Systems biology approaches to disease marker discovery
2010; 28 (4): 209-224
Our understanding of human disease and potential therapeutics is improving rapidly. In order to take advantage of these developments it is important to be able to identify disease markers. Many new high-throughput genomics and proteomics technologies are being implemented to identify candidate disease markers. These technologies include protein microarrays, next-generation DNA sequencing and mass spectrometry platforms. Such methods are particularly important for elucidating the repertoire of molecular markers in the genome, transcriptome, proteome and metabolome of patients with diseases such as cancer, autoimmune diseases, and viral infections, resulting from the disruption of many biological pathways. These new technologies have identified many potential disease markers. These markers are expected to be valuable to achieve the promise of truly personalized medicine.
View details for DOI 10.3233/DMA-2010-0707
View details for Web of Science ID 000279321200003
View details for PubMedID 20534906
EBNA1 regulates cellular gene expression by binding cellular promoters
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (52): 22421-22426
Epstein-Barr virus (EBV) is associated with several types of lymphomas and epithelial tumors including Burkitt's lymphoma (BL), HIV-associated lymphoma, posttransplant lymphoproliferative disorder, and nasopharyngeal carcinoma. EBV nuclear antigen 1 (EBNA1) is expressed in all EBV-associated tumors and is required for latency and transformation. EBNA1 initiates latent viral replication in B cells, maintains the viral genome copy number, and regulates transcription of other EBV-encoded latent genes. These activities are mediated through the ability of EBNA1 to bind viral DNA. To further elucidate the role of EBNA1 in the host cell, we have examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1. Analysis of the data revealed distinct profiles of cellular gene changes in BJAB and 293 cell lines. Subsequently, chromatin immunoprecipitation revealed a direct binding of EBNA1 to cellular promoters. We have correlated EBNA1-bound promoters with changes in gene expression. Sequence analysis of the 100 promoters most enriched revealed a DNA motif that differs from the EBNA1 binding site in the EBV genome.
View details for DOI 10.1073/pnas.0911676106
View details for Web of Science ID 000273178700069
View details for PubMedID 20080792
Mapping accessible chromatin regions using Sono-Seq
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (35): 14926-14931
Disruptions in local chromatin structure often indicate features of biological interest such as regulatory regions. We find that sonication of cross-linked chromatin, when combined with a size-selection step and massively parallel short-read sequencing, can be used as a method (Sono-Seq) to map locations of high chromatin accessibility in promoter regions. Sono-Seq sites frequently correspond to actively transcribed promoter regions, as evidenced by their co-association with RNA Polymerase II ChIP regions, transcription start sites, histone H3 lysine 4 trimethylation (H3K4me3) marks, and CpG islands; signals over other sites, such as those bound by the CTCF insulator, are also observed. The pattern of breakage by Sono-Seq overlaps with, but is distinct from, that observed for FAIRE and DNase I hypersensitive sites. Our results demonstrate that Sono-Seq can be a useful and simple method by which to map many local alterations in chromatin structure. Furthermore, our results provide insights into the mapping of binding sites by using ChIP-Seq experiments and the value of reference samples that should be used in such experiments.
View details for DOI 10.1073/pnas.0905443106
View details for Web of Science ID 000269481000036
View details for PubMedID 19706456
Global analysis of the glycoproteome in Saccharomyces cerevisiae reveals new roles for protein glycosylation in eukaryotes
MOLECULAR SYSTEMS BIOLOGY
To further understand the roles of protein glycosylation in eukaryotes, we globally identified glycan-containing proteins in yeast. A fluorescent lectin binding assay was developed and used to screen protein microarrays containing over 5000 proteins purified from yeast. A total of 534 yeast proteins were identified that bound either Concanavalin A (ConA) or Wheat-Germ Agglutinin (WGA); 406 of them were novel. Among the novel glycoproteins, 45 were validated by mobility shift upon treatment with EndoH and PNGase F, thereby extending the number of validated yeast glycoproteins to 350. In addition to many components of the secretory pathway, we identified other types of proteins, such as transcription factors and mitochondrial proteins. To further explore the role of glycosylation in mitochondrial function, the localization of four mitochondrial proteins was examined in the presence and absence of tunicamycin, an inhibitor of N-linked protein glycosylation. For two proteins, localization to the mitochondria is diminished upon tunicamycin treatment, indicating that protein glycosylation is important for protein function. Overall, our studies greatly extend our understanding of protein glycosylation in eukaryotes through the cataloguing of glycoproteins, and describe a novel role for protein glycosylation in mitochondrial protein function and localization.
View details for DOI 10.1038/msb.2009.64
View details for Web of Science ID 000270456400002
View details for PubMedID 19756047
Impact of Chromatin Structures on DNA Processing for Genomic Analyses
2009; 4 (8)
Chromatin has an impact on recombination, repair, replication, and evolution of DNA. Here we report that chromatin structure also affects laboratory DNA manipulation in ways that distort the results of chromatin immunoprecipitation (ChIP) experiments. We initially discovered this effect at the Saccharomyces cerevisiae HMR locus, where we found that silenced chromatin was refractory to shearing, relative to euchromatin. Using input samples from ChIP-Seq studies, we detected a similar bias throughout the heterochromatic portions of the yeast genome. We also observed significant chromatin-related effects at telomeres, protein binding sites, and genes, reflected in the variation of input-Seq coverage. Experimental tests of candidate regions showed that chromatin influenced shearing at some loci, and that chromatin could also lead to enriched or depleted DNA levels in prepared samples, independently of shearing effects. Our results suggested that assays relying on immunoprecipitation of chromatin will be biased by intrinsic differences between regions packaged into different chromatin structures - biases which have been largely ignored to date. These results established the pervasiveness of this bias genome-wide, and suggested that this bias can be used to detect differences in chromatin structures across the genome.
View details for DOI 10.1371/journal.pone.0006700
View details for Web of Science ID 000269267400008
View details for PubMedID 19693276
Intrinsic histone-DNA interactions are not the major determinant of nucleosome positions in vivo
NATURE STRUCTURAL & MOLECULAR BIOLOGY
2009; 16 (8): 847-U70
We assess the role of intrinsic histone-DNA interactions by mapping nucleosomes assembled in vitro on genomic DNA. Nucleosomes strongly prefer yeast DNA over Escherichia coli DNA, indicating that the yeast genome evolved to favor nucleosome formation. Many yeast promoter and terminator regions intrinsically disfavor nucleosome formation, and nucleosomes assembled in vitro show strong rotational positioning. Nucleosome arrays generated by the ACF assembly factor have fewer nucleosome-free regions, reduced rotational positioning and less translational positioning than obtained by intrinsic histone-DNA interactions. Notably, nucleosomes assembled in vitro have only a limited preference for specific translational positions and do not show the pattern observed in vivo. Our results argue against a genomic code for nucleosome positioning, and they suggest that the nucleosomal pattern in coding regions arises primarily from statistical positioning from a barrier near the promoter that involves some aspect of transcriptional initiation by RNA polymerase II.
View details for DOI 10.1038/nsmb.1636
View details for Web of Science ID 000268738700012
View details for PubMedID 19620965
The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (29): 12031-12036
Down syndrome (DS), or trisomy 21, is a common disorder associated with several complex clinical phenotypes. Although several hypotheses have been put forward, it is unclear whether particular gene loci on chromosome 21 (HSA21) are sufficient to cause DS and its associated features. Here we present a high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21. By using state-of-the-art genomics technologies we mapped segmental trisomies at exon-level resolution and identified discrete regions of 1.8-16.3 Mb likely to be involved in the development of 8 DS phenotypes, 4 of which are congenital malformations, including acute megakaryocytic leukemia, transient myeloproliferative disorder, Hirschsprung disease, duodenal stenosis, imperforate anus, severe mental retardation, DS-Alzheimer Disease, and DS-specific congenital heart disease (DSCHD). Our DS-phenotypic maps located DSCHD to a <2-Mb interval. Furthermore, the map enabled us to present evidence against the necessary involvement of other loci as well as specific hypotheses that have been put forward in relation to the etiology of DS, i.e., the presence of a single DS consensus region and the sufficiency of DSCR1 and DYRK1A, or APP, in causing several severe DS phenotypes. Our study demonstrates the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.
View details for DOI 10.1073/pnas.0813248106
View details for Web of Science ID 000268178400040
View details for PubMedID 19597142
Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
PLOS COMPUTATIONAL BIOLOGY
2009; 5 (7)
The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.
View details for DOI 10.1371/journal.pcbi.1000432
View details for Web of Science ID 000269220100023
View details for PubMedID 19593373
Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy: The Amsterdam Principles
JOURNAL OF PROTEOME RESEARCH
2009; 8 (7): 3689-3692
Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a precompetitive basis. On August 14, 2008, the National Cancer Institute (NCI) convened the "International Summit on Proteomics Data Release and Sharing Policy" in Amsterdam, The Netherlands, to identify and address potential roadblocks to rapid and open access to data. The six principles agreed upon by key stakeholders at the summit addressed issues surrounding (1) timing, (2) comprehensiveness, (3) format, (4) deposition to repositories, (5) quality metrics, and (6) responsibility for proteomics data release. This summit report explores various approaches to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.
View details for DOI 10.1021/pr900023z
View details for Web of Science ID 000267694600043
View details for PubMedID 19344107
Unlocking the secrets of the genome
NATURE
2009; 459 (7249): 927-930
Dynamic and complex transcription factor binding during an inducible response in yeast
GENES & DEVELOPMENT
2009; 23 (11): 1351-1363
Complex biological processes are often regulated, at least in part, by the binding of transcription factors to their targets. Recently, considerable effort has been made to analyze the binding of relevant factors to the suite of targets they regulate, thereby generating a regulatory circuit map. However, for most studies the dynamics of binding have not been analyzed, and thus the temporal order of events and mechanisms by which this occurs are poorly understood. We globally analyzed in detail the temporal order of binding of several key factors involved in the salt response of yeast to their target genes. Analysis of Yap4 and Sko1 binding to their target genes revealed multiple temporal classes of binding patterns: (1) constant binding, (2) rapid induction, (3) slow induction, and (4) transient induction. These results demonstrate that individual transcription factors can have multiple binding patterns and help define the different types of temporal binding patterns used in eukaryotic gene regulation. To investigate these binding patterns further, we also analyzed the binding of seven other key transcription factors implicated in osmotic regulation, including Hot1, Msn1, Msn2, Msn4, Skn7, and Yap6, and found significant coassociation among the different factors at their gene targets. Moreover, the binding of several key factors was correlated with distinct classes of Yap4- and Sko1-binding patterns and with distinct types of genes. Gene expression studies revealed association of Yap4, Sko1, and other transcription factor-binding patterns with different gene expression patterns. The integration and analysis of binding and expression information reveals a complex dynamic and hierarchical circuit in which specific combinations of transcription factors target distinct sets of genes at discrete times to coordinate a rapid and important biological response.
View details for DOI 10.1101/gad.1781909
View details for Web of Science ID 000266524100009
View details for PubMedID 19487574
Integrated analysis of co-expressed MAP kinase substrates in Arabidopsis thaliana.
Plant signaling & behavior
2009; 4 (6): 524-527
View details for PubMedID 19816141
Distinct Genomic Aberrations Associated with ERG Rearranged Prostate Cancer
GENES CHROMOSOMES & CANCER
2009; 48 (4): 366-380
Emerging molecular and clinical data suggest that ETS fusion prostate cancer represents a distinct molecular subclass, driven most commonly by a hormonally regulated promoter and characterized by an aggressive natural history. The study of the genomic landscape of prostate cancer in the light of ETS fusion events is required to understand the foundation of this molecularly and clinically distinct subtype. We performed genome-wide profiling of 49 primary prostate cancers and identified 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses. Co-occurring events included losses at 19q13.32 and 1p22.1. We discovered three genomic events associated with ERG rearranged prostate cancer, affecting 6q, 7q, and 16q. 6q loss in nonrearranged prostate cancer is accompanied by gene expression deregulation in an independent dataset and by protein deregulation of MYO6. To analyze copy number alterations within the ETS genes, we performed a comprehensive analysis of all 27 ETS genes and of the 3 Mbp genomic area between ERG and TMPRSS2 (21q) with an unprecedented resolution (30 bp). We demonstrate that high-resolution tiling arrays can be used to pin-point breakpoints leading to fusion events. This study provides further support to define a distinct molecular subtype of prostate cancer based on the presence of ETS gene rearrangements.
View details for DOI 10.1002/gcc.20647
View details for Web of Science ID 000263572700007
View details for PubMedID 19156837
A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster
2009; 113 (11): 2526-2534
We have identified an intergenic transcriptional activity that is located between the human HOXA1 and HOXA2 genes, shows myeloid-specific expression, and is up-regulated during granulocytic differentiation. The novel gene, termed HOTAIRM1 (HOX antisense intergenic RNA myeloid 1), is transcribed antisense to the HOXA genes and originates from the same CpG island that embeds the start site of HOXA1. The transcript appears to be a noncoding RNA containing no long open-reading frame; sucrose gradient analysis shows no association with polyribosomal fractions. HOTAIRM1 is the most prominent intergenic transcript expressed and up-regulated during induced granulocytic differentiation of NB4 promyelocytic leukemia and normal human hematopoietic cells; its expression is specific to the myeloid lineage. Its induction during retinoic acid (RA)-driven granulocytic differentiation is through RA receptor and may depend on the expression of myeloid cell development factors targeted by RA signaling. Knockdown of HOTAIRM1 quantitatively blunted RA-induced expression of HOXA1 and HOXA4 during the myeloid differentiation of NB4 cells, and selectively attenuated induction of transcripts for the myeloid differentiation genes CD11b and CD18, but did not noticeably impact the more distal HOXA genes. These findings suggest that HOTAIRM1 plays a role in myelopoiesis through modulation of gene expression in the HOXA cluster.
View details for DOI 10.1182/blood-2008-06-162164
View details for Web of Science ID 000264110600021
View details for PubMedID 19144990
A high throughput embryonic stem cell screen identifies Oct-2 as a bifunctional regulator of neuronal differentiation
GENES & DEVELOPMENT
2009; 23 (5): 575-588
Neuronal differentiation is a complex process that involves a plethora of regulatory steps. To identify transcription factors that influence neuronal differentiation we developed a high throughput screen using embryonic stem (ES) cells. Seven-hundred human transcription factor clones were stably introduced into mouse ES (mES) cells and screened for their ability to induce neuronal differentiation of mES cells. Twenty-four factors that are capable of inducing neuronal differentiation were identified, including four known effectors of neuronal differentiation, 11 factors with limited evidence of involvement in regulating neuronal differentiation, and nine novel factors. One transcription factor, Oct-2, was studied in detail and found to be a bifunctional regulator: It can either repress or induce neuronal differentiation, depending on the particular isoform. Ectopic expression experiments demonstrate that isoform Oct-2.4 represses neuronal differentiation, whereas Oct-2.2 activates neuron formation. Consistent with a role in neuronal differentiation, Oct-2.2 expression is induced during differentiation, and cells depleted of Oct-2 and its homolog Oct-1 have a reduced capacity to differentiate into neurons. Our results reveal a number of transcription factors potentially important for mammalian neuronal differentiation, and indicate that Oct-2 may serve as a binary switch to repress differentiation in precursor cells and induce neuronal differentiation later during neuronal development.
View details for DOI 10.1101/gad.1772509
View details for Web of Science ID 000263918500005
View details for PubMedID 19270158
Quantifying environmental adaptation of metabolic pathways in metagenomics
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (5): 1374-1379
Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats, i.e., how network dynamics relates to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions.
View details for DOI 10.1073/pnas.0808022106
View details for Web of Science ID 000263074600018
View details for PubMedID 19164758
Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing
Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA PolII) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs. We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high quality data that was indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for PolII. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously. We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small genome model organisms. This method produces accurate results with higher throughput and reduced cost. Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to further decrease costs per sample and to accelerate the completion of large consortium projects such as modENCODE.
View details for DOI 10.1186/1471-2164-10-37
View details for Web of Science ID 000264970100002
View details for PubMedID 19159457
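The multiplex barcoding idea described in the abstract above can be illustrated with a short sketch. This is not the authors' pipeline: the barcode sequences, the 4-nt barcode length, and the function name `demultiplex` are assumptions made for illustration only. The sketch assigns each read to a sample by its leading barcode and strips the barcode before downstream mapping.

```python
from collections import defaultdict

# Hypothetical 4-nt barcodes; the real adapter sequences are not given above.
BARCODES = {"ACGT": "Ste12", "TGCA": "Cse4", "GATC": "PolII", "CTAG": "input"}


def demultiplex(reads, barcodes=BARCODES, barcode_len=4):
    """Group reads by their leading barcode and strip the barcode.

    Reads whose prefix matches no known barcode go to 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


if __name__ == "__main__":
    pooled = ["ACGTAAAA", "TGCACCCC", "NNNNGGGG"]
    print(demultiplex(pooled))
```

A production version would likely tolerate sequencing errors in the barcode (for example by allowing one mismatch), which this sketch deliberately omits.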
Proteomic-Based Detection of a Protein Cluster Dysregulated during Cardiovascular Development Identifies Biomarkers of Congenital Heart Defects
2009; 4 (1)
Cardiovascular development is vital for embryonic survival and growth. Early gestation embryo loss or malformation has been linked to yolk sac vasculopathy and congenital heart defects (CHDs). However, the molecular pathways that underlie these structural defects in humans remain largely unknown, hindering the development of molecular-based diagnostic tools and novel therapies. Murine embryos were exposed to high glucose, a condition known to induce cardiovascular defects in both animal models and humans. We further employed a mass spectrometry-based proteomics approach to identify proteins differentially expressed in embryos with defects from those with normal cardiovascular development. The proteins detected by mass spectrometry (WNT16, ST14, Pcsk1, Jumonji, Morca2a, TRPC5, and others) were validated by Western blotting and immunofluorescent staining of the yolk sac and heart. The proteins within the proteomic dataset clustered to adhesion/migration, differentiation, transport, and insulin signaling pathways. A functional role for several proteins (WNT16, ADAM15 and NOGO-A/B) was demonstrated in an ex vivo model of heart development. Additionally, a successful application of a cluster of protein biomarkers (WNT16, ST14 and Pcsk1) as a prenatal screen for CHDs was confirmed in a study of human amniotic fluid (AF) samples from women carrying normal fetuses and those with CHDs. The novel finding that WNT16, ST14 and Pcsk1 protein levels increase in fetuses with CHDs suggests that these proteins may play a role in the etiology of human CHDs. The information gained through this bed-side to bench translational approach contributes to a more complete understanding of the protein pathways dysregulated during cardiovascular development and provides novel avenues for diagnostic and therapeutic interventions, beneficial to fetuses at risk for CHDs.
View details for DOI 10.1371/journal.pone.0004221
View details for Web of Science ID 000265481900004
View details for PubMedID 19156209
Three Distinct Condensin Complexes Control C. elegans Chromosome Dynamics
2009; 19 (1): 9-19
Condensin complexes organize chromosome structure and facilitate chromosome segregation. Higher eukaryotes have two complexes, condensin I and condensin II, each essential for chromosome segregation. The nematode Caenorhabditis elegans was considered an exception, because it has a mitotic condensin II complex but appeared to lack mitotic condensin I. Instead, its condensin I-like complex (here called condensin I(DC)) dampens gene expression along hermaphrodite X chromosomes during dosage compensation. Here we report the discovery of a third condensin complex, condensin I, in C. elegans. We identify new condensin subunits and show that each complex has a conserved five-subunit composition. Condensin I differs from condensin I(DC) by only a single subunit. Yet condensin I binds to autosomes and X chromosomes in both sexes to promote chromosome segregation, whereas condensin I(DC) binds specifically to X chromosomes in hermaphrodites to regulate transcript levels. Both condensin I and II promote chromosome segregation, but associate with different chromosomal regions during mitosis and meiosis. Unexpectedly, condensin I also localizes to regions of cohesion between meiotic chromosomes before their segregation. We demonstrate that condensin subunits in C. elegans form three complexes, one that functions in dosage compensation and two that function in mitosis and meiosis. These results highlight how the duplication and divergence of condensin subunits during evolution may facilitate their adaptation to specialized chromosomal roles and illustrate the versatility of condensins to function in both gene regulation and chromosome segregation.
View details for DOI 10.1016/j.cub.2008.12.006
View details for Web of Science ID 000262584100022
View details for PubMedID 19119011
PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data
2009; 10 (2)
Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.
View details for DOI 10.1186/gb-2009-10-2-r23
View details for Web of Science ID 000266345600020
View details for PubMedID 19236709
MAPK target networks in Arabidopsis thaliana revealed using functional protein microarrays
GENES & DEVELOPMENT
2009; 23 (1): 80-92
Signaling through mitogen-activated protein kinase (MPK) cascades is a complex and fundamental process in eukaryotes, requiring MPK-activating kinases (MKKs) and MKK-activating kinases (MKKKs). However, to date only a limited number of MKK-MPK interactions and MPK phosphorylation substrates have been revealed. We determined which Arabidopsis thaliana MKKs preferentially activate 10 different MPKs in vivo and used the activated MPKs to probe high-density protein microarrays to determine their phosphorylation targets. Our analyses revealed known and novel signaling modules encompassing 570 MPK phosphorylation substrates; these substrates were enriched in transcription factors involved in the regulation of development, defense, and stress responses. Selected MPK substrates were validated by in planta reconstitution experiments. A subset of activated and wild-type MKKs induced cell death, indicating a possible role for these MKKs in the regulation of cell death. Interestingly, MKK7- and MKK9-induced death requires Sgt1, a known regulator of cell death induced during plant innate immunity. Our predicted MKK-MPK phosphorylation network constitutes a valuable resource to understand the function and specificity of MPK signaling systems.
View details for DOI 10.1101/gad.1740009
View details for Web of Science ID 000262369700008
View details for PubMedID 19095804
RNA-Seq: a revolutionary tool for transcriptomics
NATURE REVIEWS GENETICS
2009; 10 (1): 57-63
RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.
View details for DOI 10.1038/nrg2484
View details for Web of Science ID 000261866500012
View details for PubMedID 19015660
Global identification of protein kinase substrates by protein microarray analysis
2009; 4 (12): 1820-1827
Herein, we describe a protocol for the global identification of in vitro substrates targeted by protein kinases using protein microarray technology. Large numbers of fusion proteins tagged at their carboxy-termini are purified in 96-well format and spotted in duplicate onto amino-silane-coated slides in a spatially addressable manner. These arrays are incubated in the presence of purified kinase and radiolabeled ATP, and then washed, dried and analyzed by autoradiography. The extent of phosphorylation of each spot is quantified and normalized, and proteins that are reproducibly phosphorylated in the presence of the active kinase relative to control slides are scored as positive substrates. This approach enables the rapid determination of kinase-substrate relationship on a proteome-wide scale, and although developed using yeast, has since been adapted to higher eukaryotic systems. Expression, purification and printing of the yeast proteome require about 3 weeks. Afterwards, each kinase assay takes approximately 3 h to perform.
View details for DOI 10.1038/nprot.2009.194
View details for Web of Science ID 000274226100011
View details for PubMedID 20010933
MSB: A mean-shift-based approach for the analysis of structural variation in the genome
2009; 19 (1): 106-117
Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining this. Drawing relevant conclusions from array-CGH requires computational methods for partitioning the chromosome into segments of elevated, reduced, or unchanged copy number. Several approaches have been described, most of which attempt to explicitly model the underlying distribution of data based on particular assumptions. Often, they optimize likelihood functions for estimating model parameters, by expectation maximization or related approaches; however, this requires good parameter initialization through prespecifying the number of segments. Moreover, convergence is difficult to achieve, since many parameters are required to characterize an experiment. To overcome these limitations, we propose a nonparametric method without a global criterion to be optimized. Our method involves mean-shift-based (MSB) procedures; it considers the observed array-CGH signal as sampling from a probability-density function, uses a kernel-based approach to estimate local gradients for this function, and iteratively follows them to determine local modes of the signal. Overall, our method achieves robust discontinuity-preserving smoothing, thus accurately segmenting chromosomes into regions of duplication and deletion. It does not require the number of segments as input, nor does its convergence depend on this. We successfully applied our method to both simulated data and array-CGH experiments on glioblastoma and adenocarcinoma. We show that it performs at least as well as, and often better than, 10 previously published algorithms. Finally, we show that our approach can be extended to segmenting the signal resulting from the depth-of-coverage of mapped reads from next-generation sequencing.
View details for DOI 10.1101/gr.080069.108
View details for Web of Science ID 000262200000010
View details for PubMedID 19037015
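The mean-shift procedure outlined in the abstract above (treat the signal as samples from a density, estimate local gradients with a kernel, and follow them to local modes) can be sketched in a few lines. This is a simplified illustration, not the published MSB implementation: the Gaussian kernel, the joint position/value bandwidths `h_pos` and `h_val`, and the stopping rule are all assumptions.

```python
import math


def mean_shift_smooth(signal, h_pos=3.0, h_val=0.2, n_iter=50, tol=1e-6):
    """Discontinuity-preserving smoothing of a 1-D signal by mean shift.

    Each probe starts at (position, value) and is moved repeatedly to the
    kernel-weighted mean of the observed data around it, which climbs the
    estimated density toward a local mode. Bandwidths are illustrative.
    """
    smoothed = []
    for i, v in enumerate(signal):
        x, y = float(i), float(v)
        for _ in range(n_iter):
            wsum = xsum = ysum = 0.0
            for j, u in enumerate(signal):
                # Gaussian kernel over both probe position and log-ratio value
                d2 = ((j - x) / h_pos) ** 2 + ((u - y) / h_val) ** 2
                w = math.exp(-0.5 * d2)
                wsum += w
                xsum += w * j
                ysum += w * u
            nx, ny = xsum / wsum, ysum / wsum
            converged = abs(nx - x) + abs(ny - y) < tol
            x, y = nx, ny
            if converged:
                break
        smoothed.append(y)
    return smoothed


if __name__ == "__main__":
    # Toy signal: ten probes near 0 followed by ten probes near 1 (a gain)
    toy = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.0, 0.05, -0.05,
           1.1, 0.9, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
    print([round(s, 2) for s in mean_shift_smooth(toy)])
```

Because the kernel also weights by value, probes on either side of a copy-number breakpoint barely influence each other, so the step is preserved while within-segment noise is averaged out. Note that the number of segments is never supplied.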
PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls
2009; 27 (1): 66-75
Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
View details for DOI 10.1038/nbt.1518
View details for Web of Science ID 000262471200025
View details for PubMedID 19122651
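The second pass described in the abstract above (scoring candidate sites against a normalized control) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PeakSeq code: the control scaling factor is taken as already estimated, a one-sided binomial test with p = 0.5 is assumed as the significance measure, and the names `binom_sf` and `score_regions` are hypothetical.

```python
from math import comb


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def score_regions(regions, scale):
    """Score candidate regions against a depth-normalized control.

    regions: list of (name, chip_tags, control_tags)
    scale:   factor bringing control depth to ChIP depth (assumed given)
    Returns (name, enrichment, p_value) per region.
    """
    results = []
    for name, chip, ctrl in regions:
        ctrl_scaled = ctrl * scale
        n = chip + round(ctrl_scaled)
        # Under the null, a tag is equally likely to come from ChIP or control
        pval = binom_sf(chip, n, 0.5) if n > 0 else 1.0
        enrichment = chip / ctrl_scaled if ctrl_scaled > 0 else float("inf")
        results.append((name, enrichment, pval))
    return results


if __name__ == "__main__":
    for row in score_regions([("siteA", 50, 10), ("siteB", 10, 10)], scale=1.0):
        print(row)
```

In practice the p-values would also need multiple-testing correction across all candidate regions before a threshold is applied.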
Methods in molecular biology (Clifton, N.J.)
2009; 548: 209-222
Protein microarrays containing nearly the entire yeast proteome have been constructed. They are typically prepared by overexpression and high-throughput purification and printing onto microscope slides. The arrays can be used to screen nearly the entire proteome in an unbiased fashion and have enormous utility for a variety of applications. These include protein-protein interactions, identification of novel lipid- and nucleic acid-binding proteins, and finding targets of small molecules, protein kinases, and other modification enzymes. Protein microarrays are thus powerful tools for individual studies as well as systematic characterization of proteins and their biochemical activities and regulation.
View details for DOI 10.1007/978-1-59745-540-4_12
View details for PubMedID 19521827
Mismatch oligonucleotides in human and yeast: guidelines for probe design on tiling microarrays
Mismatched oligonucleotides are widely used on microarrays to differentiate specific from nonspecific hybridization. While many experiments rely on such oligos, the hybridization behavior of various degrees of mismatch (MM) structure has not been extensively studied. Here, we present the results of two large-scale microarray experiments on S. cerevisiae and H. sapiens genomic DNA, to explore MM oligonucleotide behavior with real sample mixtures under tiling-array conditions. We examined all possible nucleotide substitutions at the central position of 36-nucleotide probes, and found that nonspecific binding by MM oligos depends upon the individual nucleotide substitutions they incorporate: C-->A, C-->G and T-->A (yielding purine-purine mispairs) are most disruptive, whereas A-->X were least disruptive. We also quantify a marked GC skew effect: substitutions raising probe GC content exhibit higher intensity (and vice versa). This skew is small in highly-expressed regions (+/- 0.5% of total intensity range) and large (+/- 2% or more) elsewhere. Multiple mismatches per oligo are largely additive in effect: each MM added in a distributed fashion causes an additional 21% intensity drop relative to PM, three-fold more disruptive than adding adjacent mispairs (7% drop per MM). We investigate several parameters for oligonucleotide design, including the effects of each central nucleotide substitution on array signal intensity and of multiple MM per oligo. To avoid GC skew, individual substitutions should not alter probe GC content. RNA sample mixture complexity may increase the amount of nonspecific hybridization, magnify GC skew and boost the intensity of MM oligos at all levels.
View details for DOI 10.1186/1471-2164-9-635
View details for Web of Science ID 000264109200001
View details for PubMedID 19117516
Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history
2008; 18 (12): 1865-1874
Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which is decreasing for younger SDs and absent entirely for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then by the newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.
View details for DOI 10.1101/gr.081422.108
View details for Web of Science ID 000261398900002
View details for PubMedID 18842824
Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding
2008; 18 (12): 1906-1917
We characterized the relationship of H3K4me1 and H3K4me3 at distal and proximal regulatory elements by comparing ChIP-seq profiles for these histone modifications and for two functionally different transcription factors: STAT1 in the immortalized HeLa S3 cell line, with and without interferon-gamma (IFNG) stimulation; and FOXA2 in mouse adult liver tissue. In unstimulated and stimulated HeLa cells, respectively, we determined approximately 270,000 and approximately 301,000 H3K4me1-enriched regions, and approximately 54,500 and approximately 76,100 H3K4me3-enriched regions. In mouse adult liver, we determined approximately 227,000 and approximately 34,800 H3K4me1 and H3K4me3 regions. Seventy-five percent of the approximately 70,300 STAT1 binding sites in stimulated HeLa cells and 87% of the approximately 11,000 FOXA2 sites in mouse liver were distal to known gene TSS; in both cell types, approximately 83% of these distal sites were associated with at least one of the two histone modifications, and H3K4me1 was associated with over 96% of marked distal sites. After filtering against predicted transcription start sites, 50% of approximately 26,800 marked distal IFNG-stimulated STAT1 binding sites, but 95% of approximately 5800 marked distal FOXA2 sites, were associated with H3K4me1 only. Results for HeLa cells generated additional insights into transcriptional regulation involving STAT1. STAT1 binding was associated with 25% of all H3K4me1 regions in stimulated HeLa cells, suggesting that a single transcription factor can interact with an unexpectedly large fraction of regulatory regions. Strikingly, for a large majority of the locations of stimulated STAT1 binding, the dominant H3K4me1/me3 combinations were established before activation, suggesting mechanisms independent of IFNG stimulation and high-affinity STAT1 binding.
View details for DOI 10.1101/gr.078519.108
View details for Web of Science ID 000261398900006
View details for PubMedID 18787082
High-Resolution Copy-Number Variation Map Reflects Human Olfactory Receptor Diversity and Evolution
2008; 4 (11)
Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (approximately 55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that approximately 50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.
View details for DOI 10.1371/journal.pgen.1000249
View details for Web of Science ID 000261481000004
View details for PubMedID 18989455
A procedure for highly specific, sensitive, and unbiased whole-genome amplification
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2008; 105 (40): 15499-15504
Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a method using phi29 DNA polymerase and trehalose and optimized control of amplification to create micrograms of specific amplicons without TIPs from down to subfemtograms of DNA. With an input of as little as 0.5-2.5 ng of human gDNA or a few cells, the product could be close to native DNA in locus representation. The amplicons from 5 and 0.5 ng of DNA faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22 and even a homozygous deletion smaller than 1 kb with high-resolution chromosome-wide comparative genomic hybridization. With 550k Infinium BeadChip SNP typing, the >99.7% accuracy compared favorably with results on unamplified DNA. Importantly, underrepresentation of chromosome termini that occurred with GenomiPhi v2 was greatly rescued with the present procedure, and the call rate and accuracy of SNP typing were also improved for the amplicons with a 0.5-ng, partially degraded DNA input. In addition, the amplification proceeded logarithmically in terms of total yield before saturation; intact cells were amplified >50 times more efficiently than an equivalent amount of extracted DNA; and the locus imbalance for amplicons with 0.1 ng or lower input of DNA was variable, whereas for higher input it was largely reproducible. This procedure facilitates genomic analysis with single cells or other traces of DNA, and generates products suitable for analysis by massively parallel sequencing as well as microarray hybridization.
View details for DOI 10.1073/pnas.0808028105
View details for Web of Science ID 000260360500052
View details for PubMedID 18832167
High-quality binary protein interaction map of the yeast interactome network
2008; 322 (5898): 104-110
Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We carried out a comparative quality assessment of current yeast interactome data sets, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information. Because a large fraction of the yeast binary interactome remains to be mapped, we developed an empirically controlled mapping framework to produce a "second-generation" high-quality, high-throughput Y2H data set covering approximately 20% of all yeast binary interactions. Both Y2H and affinity purification followed by mass spectrometry (AP/MS) data are of equally high quality but of a fundamentally different and complementary nature, resulting in networks with different topological and biological properties. Compared to co-complex interactome models, this binary map is enriched for transient signaling interactions and intercomplex connections with a highly significant clustering between essential proteins. Rather than correlating with essentiality, protein connectivity correlates with genetic pleiotropy.
View details for DOI 10.1126/science.1158684
View details for Web of Science ID 000259680200048
View details for PubMedID 18719252
Modeling ChIP Sequencing In Silico with Applications
PLOS COMPUTATIONAL BIOLOGY
2008; 4 (8)
ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.
View details for DOI 10.1371/journal.pcbi.1000158
View details for Web of Science ID 000260041300021
View details for PubMedID 18725927
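The abstract above argues that background tag counts are better described by a nonuniform (for instance gamma-distributed) rate than by a uniform one. The following is a minimal sketch of that idea only, not the authors' in silico ChIP-seq method: the bin count, mean tag rate, and gamma shape are illustrative assumptions, and all function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_background(n_bins, mean_tags, shape, rng):
    """Simulate background tag counts per genomic bin.

    shape=None  -> every bin shares the same rate (uniform background).
    shape=float -> each bin's rate is gamma distributed with mean `mean_tags`
                   (assumed parameterization, chosen so both models share a mean).
    """
    counts = []
    for _ in range(n_bins):
        if shape is None:
            rate = mean_tags
        else:
            rate = rng.gammavariate(shape, mean_tags / shape)
        counts.append(poisson(rate, rng))
    return counts


def mean_and_variance(values):
    """Return the sample mean and (population) variance of a list of counts."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


rng = random.Random(0)  # fixed seed so the sketch is repeatable
uniform = simulate_background(20000, 2.0, None, rng)
nonuniform = simulate_background(20000, 2.0, 0.5, rng)

uniform_mean, uniform_var = mean_and_variance(uniform)
nonuniform_mean, nonuniform_var = mean_and_variance(nonuniform)
print("uniform background:    mean=%.2f variance=%.2f" % (uniform_mean, uniform_var))
print("gamma-rate background: mean=%.2f variance=%.2f" % (nonuniform_mean, nonuniform_var))
```

Both models have the same expected tag count per bin, but the gamma-rate background is overdispersed, with a much heavier right tail. A peak-scoring threshold calibrated against a uniform background would therefore tend to call more false positives under this assumed model.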
A genomic analysis of RNA polymerase II modification and chromatin architecture related to 3' end RNA polyadenylation
2008; 18 (8): 1224-1237
Genomic analyses have been applied extensively to analyze the process of transcription initiation in mammalian cells, but less to transcript 3' end formation and transcription termination. We used a novel approach to prepare 3' end fragments from polyadenylated RNA, and mapped the position of the poly(A) addition site using oligonucleotide arrays tiling 1% of the human genome. This approach revealed more 3' ends than had been annotated. The distribution of these ends relative to RNA polymerase II (PolII) and di- and trimethylated lysine 4 and lysine 36 of histone H3 was compared. A substantial fraction of unannotated 3' ends of RNA are intronic and antisense to the embedding gene. Poly(A) ends of annotated messages lie on average 2 kb upstream of the end of PolII binding (termination). Near the termination sites, and in some internal sites, unphosphorylated and C-terminal domain (CTD) serine 2 phosphorylated PolII (POLR2A) accumulate, suggesting pausing of the polymerase and perhaps dephosphorylation prior to release. Lysine 36 trimethylation occurs across transcribed genes, sometimes alternating with stretches of DNA in which lysine 36 dimethylation is more prominent. Lysine 36 methylation decreases at or near the site of polyadenylation, sometimes disappearing before disappearance of phosphorylated RNA PolII or release of PolII from DNA. Our results suggest that transcription termination involves loss of histone 3 lysine 36 methylation and later release of RNA polymerase. The latter is often associated with polymerase pausing. Overall, our study reveals extensive sites of poly(A) addition and provides insights into the events that occur during 3' end formation.
View details for DOI 10.1101/gr.075804.107
View details for Web of Science ID 000258116100004
View details for PubMedID 18487515
Genome-Wide Occupancy of SREBP1 and Its Partners NFY and SP1 Reveals Novel Functional Roles and Combinatorial Regulation of Distinct Classes of Genes
2008; 4 (7)
The sterol regulatory element-binding protein (SREBP) family member SREBP1 is a critical transcriptional regulator of cholesterol and fatty acid metabolism and has been implicated in insulin resistance, diabetes, and other diet-related diseases. We globally identified the promoters occupied by SREBP1 and its binding partners NFY and SP1 in a human hepatocyte cell line using chromatin immunoprecipitation combined with genome tiling arrays (ChIP-chip). We find that SREBP1 occupies the promoters of 1,141 target genes involved in diverse biological pathways, including novel targets with roles in lipid metabolism and insulin signaling. We also identify a conserved SREBP1 DNA-binding motif in SREBP1 target promoters, and we demonstrate that many SREBP1 target genes are transcriptionally activated by treatment with insulin and glucose using gene expression microarrays. Finally, we show that SREBP1 cooperates extensively with NFY and SP1 throughout the genome and that unique combinations of these factors target distinct functional pathways. Our results provide insight into the regulatory circuitry in which SREBP1 and its network partners coordinate a complex transcriptional response in the liver with cues from the diet.
View details for DOI 10.1371/journal.pgen.1000133
View details for Web of Science ID 000260410600005
View details for PubMedID 18654640
The transcriptional landscape of the yeast genome defined by RNA sequencing
2008; 320 (5881): 1344-1349
The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.
View details for DOI 10.1126/science.1158441
View details for Web of Science ID 000256441100046
View details for PubMedID 18451266
The current excitement about copy-number variation: how it relates to gene duplications and protein families
CURRENT OPINION IN STRUCTURAL BIOLOGY
2008; 18 (3): 366-374
Following recent technological advances there has been an increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs)--large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered as fundamental processes underlying gene duplication and loss; duplicated genes being the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends are likely reflective of CNV formation biases and natural selection, both of which differentially influence distinct protein families.
View details for DOI 10.1016/j.sbi.2008.02.005
View details for Web of Science ID 000257539100013
View details for PubMedID 18511261
Leptin affects endocardial cushion formation by modulating EMT and migration via Akt signaling cascades
JOURNAL OF CELL BIOLOGY
2008; 181 (2): 367-380
Blood circulation is dependent on heart valves to direct blood flow through the heart and great vessels. Valve development relies on epithelial to mesenchymal transition (EMT), a central feature of embryonic development and metastatic cancer. Abnormal EMT and remodeling contribute to the etiology of several congenital heart defects. Leptin and its receptor were detected in the mouse embryonic heart. Using an ex vivo model of cardiac EMT, the inhibition of leptin results in a signal transducer and activator of transcription 3 and Snail/vascular endothelial cadherin-independent decrease in EMT and migration. Our data suggest that an Akt signaling pathway underlies the observed phenotype. Furthermore, loss of leptin phenocopied the functional inhibition of αvβ3 integrin receptor and resulted in decreased αvβ3 integrin and matrix metalloprotease 2, suggesting that the leptin signaling pathway is involved in adhesion and migration processes. This study adds leptin to the repertoire of factors that mediate EMT and, for the first time, demonstrates a role for the interleukin 6 family in embryonic EMT.
View details for Web of Science ID 000255410300018
View details for PubMedID 18411306
Myo2p, a class V myosin in budding yeast, associates with a large ribonucleic acid-protein complex that contains mRNAs and subunits of the RNA-processing body
RNA-A PUBLICATION OF THE RNA SOCIETY
2008; 14 (3): 491-502
Myo2p is an essential class V myosin in budding yeast with several identified functions in organelle trafficking and spindle orientation. The present study demonstrates that Myo2p is a component of a large RNA-containing complex (Myo2p-RNP) that is distinct from polysomes based on sedimentation analysis and lack of ribosomal subunits in the Myo2p-RNP. Microarray analysis of RNAs that coimmunoprecipitate with Myo2p revealed the presence of a large number of mRNAs in this complex. The Myo2p-RNA complex is in part composed of the RNA processing body (P-body) based on coprecipitation with P-body protein subunits and partial colocalization of Myo2p with P-bodies. P-body disassembly is delayed in the motor mutant, myo2-66, indicating that Myo2p may facilitate the release of mRNAs from the P-body.
View details for DOI 10.1261/rna.665008
View details for Web of Science ID 000253565400012
View details for PubMedID 18218704
Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome
2008; 9 (1)
Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.
View details for DOI 10.1186/gb-2008-9-1-r3
View details for Web of Science ID 000253779800011
View details for PubMedID 18173853
The development of protein microarrays and their applications in DNA-protein and protein-protein interaction analyses of Arabidopsis transcription factors
2008; 1 (1): 27-41
We used our collection of Arabidopsis transcription factor (TF) ORFeome clones to construct protein microarrays containing as many as 802 TF proteins. These protein microarrays were used for both protein-DNA and protein-protein interaction analyses. For protein-DNA interaction studies, we examined AP2/ERF family TFs and their cognate cis-elements. By careful comparison of the DNA-binding specificity of 13 TFs on the protein microarray with previous non-microarray data, we showed that protein microarrays provide an efficient and high throughput tool for genome-wide analysis of TF-DNA interactions. This microarray protein-DNA interaction analysis allowed us to derive a comprehensive view of DNA-binding profiles of AP2/ERF family proteins in Arabidopsis. It also revealed four TFs that bound the EE (evening element) and had the expected phased gene expression under clock-regulation, thus providing a basis for further functional analysis of their roles in clock regulation of gene expression. We also developed procedures for detecting protein interactions using this TF protein microarray and discovered four novel partners that interact with HY5, which can be validated by yeast two-hybrid assays. Thus, plant TF protein microarrays offer an attractive high-throughput alternative to traditional techniques for TF functional characterization on a global scale.
View details for DOI 10.1093/mp/ssm009
View details for Web of Science ID 000259068900005
View details for PubMedID 19802365
RNA polymerase II stalling: loading at the start prepares genes for a sprint
2008; 9 (5)
Stalling of RNA polymerase II near the promoter has recently been found to be much more common than previously thought. Genome-wide surveys of the phenomenon suggest that it is likely to be a rate-limiting control on gene activation that poises developmental and stimulus-responsive genes for prompt expression when inducing signals are received.
View details for DOI 10.1186/gb-2008-9-5-220
View details for Web of Science ID 000257564800002
View details for PubMedID 18466645
Identification of differentially expressed proteins in ovarian cancer using high-density protein microarrays
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (44): 17494-17499
Ovarian cancer is a leading cause of death, yet many aspects of the biology of the disease and a routine means of its detection are lacking. We have used protein microarrays and autoantibodies from cancer patients to identify proteins that are aberrantly expressed in ovarian tissue. Sera from 30 cancer patients and 30 healthy individuals were used to probe microarrays containing 5,005 human proteins. Ninety-four antigens were identified that exhibited enhanced reactivity from sera in cancer patients relative to control sera. The differential reactivity of four antigens was tested by using immunoblot analysis and tissue microarrays. Lamin A/C, SSRP1, and RALBP1 were found to exhibit increased expression in the cancer tissue relative to controls. The combined signals from multiple antigens proved to be a robust test to identify cancerous ovarian tissue. These antigens were also reactive with tissue from other types of cancer and thus are not specific to ovarian cancer. Overall our studies identified candidate tissue marker proteins for ovarian cancer and demonstrate that protein microarrays provide a powerful approach to identify proteins aberrantly expressed in disease states.
View details for DOI 10.1073/pnas.0708572104
View details for Web of Science ID 000250638400048
View details for PubMedID 17954908
Paired-end mapping reveals extensive structural variation in the human genome
2007; 318 (5849): 420-426
Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.
View details for DOI 10.1126/science.1149504
View details for Web of Science ID 000250230400038
View details for PubMedID 17901297
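The paired-end mapping abstract above describes mapping the two ends of roughly 3-kb fragments onto a reference genome to find structural variants. The general logic of interpreting such pairs can be sketched as follows. This is a simplified illustration, not the published PEM pipeline: the expected span, the tolerance, the expected "+"/"-" orientation, and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Both ends of one sequenced fragment after mapping to the reference (hypothetical record)."""
    chrom: str
    left_pos: int       # reference coordinate of the leftmost end
    right_pos: int      # reference coordinate of the rightmost end
    left_strand: str    # "+" or "-"
    right_strand: str   # "+" or "-"


def classify_pair(pair, expected_span=3000, tolerance=1000):
    """Label one pair by comparing its mapped span and orientation with expectation.

    Assumed conventions: concordant pairs map as ("+", "-") about `expected_span`
    bases apart; `tolerance` is an arbitrary cutoff chosen for illustration.
    """
    if (pair.left_strand, pair.right_strand) != ("+", "-"):
        # An end mapping in the unexpected orientation hints at an inversion.
        return "inversion candidate"
    span = pair.right_pos - pair.left_pos
    if span > expected_span + tolerance:
        # Ends lie farther apart on the reference than in the sample:
        # the sample lacks sequence that the reference has.
        return "deletion candidate"
    if span < expected_span - tolerance:
        # Ends lie closer together on the reference than in the sample:
        # the sample carries sequence that the reference lacks.
        return "insertion candidate"
    return "concordant"


pairs = [
    MappedPair("chr22", 100000, 103100, "+", "-"),
    MappedPair("chr22", 200000, 209500, "+", "-"),
    MappedPair("chr22", 300000, 300900, "+", "-"),
    MappedPair("chr22", 400000, 403000, "+", "+"),
]
labels = [classify_pair(p) for p in pairs]
for p, label in zip(pairs, labels):
    print(p.chrom, p.left_pos, p.right_pos, label)
```

A single discordant pair is weak evidence on its own. In practice one would presumably require several independent pairs supporting the same event before reporting a variant, and that clustering step is omitted here.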
Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms
FUNCTIONAL & INTEGRATIVE GENOMICS
2007; 7 (4): 335-345
In recent years, techniques have been developed to map transcription factor binding sites using chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip). Initially, polymerase chain reaction (PCR)-based DNA arrays were used for the ChIP-chip procedure; however, high-density oligonucleotide (HDO) arrays, which allow for the production of thousands more features per array, have emerged as a competing array platform. To compare the two platforms, data from ChIP-chip analysis performed for three factors (Tec1, Ste12, and Sok2) using both HDO and PCR arrays under identical experimental conditions were compared. HDO arrays provided increased reproducibility and sensitivity, detecting approximately three times more binding events than the PCR arrays while also showing increased accuracy. The increased resolution provided by the HDO arrays also allowed for the identification of multiple binding peaks in close proximity and of novel binding events such as binding within ORFs. The HDO array platform provides a far more robust array system by all measures than PCR-based arrays, all of which is directly attributable to the large number of probes available.
View details for DOI 10.1007/s10142-007-0054-7
View details for Web of Science ID 000249808300006
View details for PubMedID 17638031
Arabidopsis protein microarrays for the high-throughput identification of protein-protein interactions.
Plant signaling & behavior
2007; 2 (5): 416-420
Protein microarray technology has emerged as a powerful new approach for the study of thousands of proteins simultaneously. Protein microarrays have been used for a wide variety of applications for the human and yeast systems. In a recent study, we demonstrated that Arabidopsis functional protein microarrays can be generated and employed to characterize the function of plant proteins. The arrayed proteins were produced using an optimized large-scale plant-based expression system. In a proof-of-concept study, 173 known and novel potential substrates of calmodulin (CaM) and calmodulin-like proteins (CML) were identified in an unbiased and high-throughput manner. The information documented here on novel potential CaM targets provides new testable hypotheses in the area of CaM/Ca(2+)-regulated processes and represents a resource of functional information for the scientific community.
View details for PubMedID 19704619
Divergence of transcription factor binding sites across related yeast species
2007; 317 (5839): 815-819
Characterization of interspecies differences in gene regulation is crucial for understanding the molecular basis of both phenotypic diversity and evolution. By means of chromatin immunoprecipitation and DNA microarray analysis, the divergence in the binding sites of the pseudohyphal regulators Ste12 and Tec1 was determined in the yeasts Saccharomyces cerevisiae, S. mikatae, and S. bayanus under pseudohyphal conditions. We have shown that most of these sites have diverged across these species, far exceeding the interspecies variation in orthologous genes. A group of Ste12 targets was shown to be bound only in S. mikatae and S. bayanus under pseudohyphal conditions. Many of these genes are targets of Ste12 during mating in S. cerevisiae, indicating that specialization between the two pathways has occurred in this species. Transcription factor binding sites have therefore diverged substantially faster than ortholog content. Thus, gene regulation resulting from transcription factor binding is likely to be a major cause of divergence between related species.
View details for DOI 10.1126/science.1140748
View details for Web of Science ID 000248624500044
View details for PubMedID 17690298
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
2007; 447 (7146): 799-816
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
View details for DOI 10.1038/nature05874
View details for Web of Science ID 000247207500034
View details for PubMedID 17571346
View details for PubMedCentralID PMC2212820
Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (24): 10110-10115
Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, "active" approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.
View details for DOI 10.1073/pnas.0703834104
View details for Web of Science ID 000247363000036
View details for PubMedID 17551006
Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome
2007; 17 (6): 886-897
Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.
View details for DOI 10.1101/gr.5014606
View details for Web of Science ID 000247226900020
View details for PubMedID 17119069
Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE)
2007; 17 (6): 910-916
Identifying the genome-wide binding sites of transcription factors is important in deciphering transcriptional regulatory networks. ChIP-chip (Chromatin immunoprecipitation combined with microarrays) has been widely used to map transcription factor binding sites in the human genome. However, whole genome ChIP-chip analysis is still technically challenging in vertebrates. We recently developed STAGE as an unbiased method for identifying transcription factor binding sites in the genome. STAGE is conceptually based on SAGE, except that the input is ChIP-enriched DNA. In this study, we implemented an improved sequencing strategy and analysis methods and applied STAGE to map the genomic binding profile of the transcription factor STAT1 after interferon treatment. STAT1 is mainly responsible for mediating the cellular responses to interferons, such as cell proliferation, apoptosis, immune surveillance, and immune responses. We present novel algorithms for STAGE tag analysis to identify enriched loci with high specificity, as verified by quantitative ChIP. STAGE identified several previously unknown STAT1 target genes, many of which are involved in mediating the response to interferon-gamma signaling. STAGE is thus a viable method for identifying the chromosomal targets of transcription factors and generating meaningful biological hypotheses that further our understanding of transcriptional regulatory networks.
View details for DOI 10.1101/gr.5574907
View details for Web of Science ID 000247226900022
View details for PubMedID 17568006
Structured RNAs in the ENCODE selected regions of the human genome
2007; 17 (6): 852-864
Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70% many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
View details for DOI 10.1101/gr.5650707
View details for Web of Science ID 000247226900017
View details for PubMedID 17568003
Mapping of transcription factor binding regions in mammalian cells by ChIP: Comparison of array- and sequencing-based technologies
2007; 17 (6): 898-909
Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.
View details for DOI 10.1101/gr.5583007
View details for Web of Science ID 000247226900021
View details for PubMedID 17568005
Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome
2007; 17 (6): 720-731
The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3'-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5'-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5'-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5'-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.
View details for DOI 10.1101/gr.5716607
View details for Web of Science ID 000247226900006
View details for PubMedID 17567992
Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions
2007; 17 (6): 787-797
The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE Project Consortium enables for the first time a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10-100-kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly nonuniform. We demonstrate, in fact, that regulatory elements are associated with the location of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that overall these elements are clustered into regulatory rich "islands" and poor "deserts." Next, we examine how consistent the nonuniform distribution is between different transcription factors. We perform on all the factors a multivariate analysis in the framework of a biplot, which enhances biological signals in the experiments. This groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites was generally reproducible for a specific factor between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from the transcription start sites are considered.
View details for DOI 10.1101/gr.5573107
View details for Web of Science ID 000247226900011
View details for PubMedID 17567997
The DART classification of unannotated transcription within the ENCODE regions: Associating transcription with known and novel loci
2007; 17 (6): 732-745
For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
View details for DOI 10.1101/gr.5696007
View details for Web of Science ID 000247226900007
View details for PubMedID 17567993
What is a gene, post-ENCODE? History and updated definition
2007; 17 (6): 669-681
While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century--from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.
View details for DOI 10.1101/gr.6339607
View details for Web of Science ID 000247226900002
View details for PubMedID 17567988
Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution
2007; 17 (6): 839-851
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
View details for DOI 10.1101/gr.5586307
View details for Web of Science ID 000247226900016
View details for PubMedID 17568002
Getting connected: analysis and principles of biological networks
GENES & DEVELOPMENT
2007; 21 (9): 1010-1024
The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and their modification have revealed complex molecular networks. These biological networks are significantly different from random networks and often exhibit ubiquitous properties in terms of their structure and organization. Analyzing these networks provides novel insights in understanding basic mechanisms controlling normal cellular processes and disease pathologies.
View details for DOI 10.1101/gad.1528707
View details for Web of Science ID 000246154100002
View details for PubMedID 17473168
Differential binding of calmodulin-related proteins to their targets revealed through high-density Arabidopsis protein microarrays
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (11): 4730-4735
Calmodulins (CaMs) are the most ubiquitous calcium sensors in eukaryotes. A number of CaM-binding proteins have been identified through classical methods, and many proteins have been predicted to bind CaMs based on their structural homology with known targets. However, multicellular organisms typically contain many CaM-like (CML) proteins, and a global identification of their targets and specificity of interaction is lacking. In an effort to develop a platform for large-scale analysis of proteins in plants we have developed a protein microarray and used it to study the global analysis of CaM/CML interactions. An Arabidopsis thaliana expression collection containing 1,133 ORFs was generated and used to produce proteins with an optimized medium-throughput plant-based expression system. Protein microarrays were prepared and screened with several CaMs/CMLs. A large number of previously known and novel CaM/CML targets were identified, including transcription factors, receptor and intracellular protein kinases, F-box proteins, RNA-binding proteins, and proteins of unknown function. Multiple CaM/CML proteins bound many binding partners, but the majority of targets were specific to one or a few CaMs/CMLs indicating that different CaM family members function through different targets. Based on our analyses, the emergent CaM/CML interactome is more extensive than previously predicted. Our results suggest that calcium functions through distinct CaM/CML proteins to regulate a wide range of targets and cellular activities.
View details for DOI 10.1073/pnas.0611615104
View details for Web of Science ID 000244972700086
View details for PubMedID 17360592
New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis
GENES & DEVELOPMENT
2007; 21 (5): 601-614
Acinetobacter baumannii has emerged as an important and problematic human pathogen as it is the causative agent of several types of infections including pneumonia, meningitis, septicemia, and urinary tract infections. We explored the pathogenic content of this harmful pathogen using a combination of DNA sequencing and insertional mutagenesis. The genome of this organism was sequenced using a strategy involving high-density pyrosequencing, a novel, rapid method of high-throughput sequencing. Excluding the rDNA repeats, the assembled genome is 3,976,746 base pairs (bp) and has 3830 ORFs. A significant fraction of ORFs (17.2%) are located in 28 putative alien islands, indicating that the genome has acquired a large amount of foreign DNA. Consistent with its role in pathogenesis, a remarkable number of the islands (16) contain genes implicated in virulence, indicating the organism devotes a considerable portion of its genes to pathogenesis. The largest island contains elements homologous to the Legionella/Coxiella Type IV secretion apparatus. Type IV secretion systems have been demonstrated to be important for virulence in other organisms and thus are likely to help mediate pathogenesis of A. baumannii. Insertional mutagenesis generated avirulent isolates of A. baumannii and verified that six of the islands contain virulence genes, including two novel islands containing genes that lacked homology with others in the databases. The DNA sequencing approach described in this study allows the rapid elucidation of the DNA sequence of any microbe and, when combined with genetic screens, can identify many novel genes important for microbial pathogenesis.
View details for DOI 10.1101/gad.1510307
View details for Web of Science ID 000244760600011
View details for PubMedID 17344419
Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool
NUCLEIC ACIDS RESEARCH
2007; 35 (2)
Microarray technology is currently one of the most widely-used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able to show that there are two types of positional artifacts in microarray data introducing spurious correlations between genes. First, we find that genes that are close on the microarray chips tend to have higher correlations between their expression profiles. We call this the 'chip artifact'. Our calculations suggest that the carry-over during the printing process is one of the major sources of this type of artifact, which is later confirmed by our experiments. Based on our experiments, the measured intensity of a microarray spot contains 0.1% (for fully-hybridized spots) to 93% (for un-hybridized ones) of noise resulting from this artifact. Secondly, we, for the first time, show that genes that are close on the microtiter plates in microarray experiments also tend to have higher correlations. We call this the 'plate artifact'. Both types of artifacts exist with different severity in all cDNA microarray experiments that we analyzed. Therefore, we develop an automated web tool, COP (COrrelations by Positional artifacts), to detect these artifacts in microarray experiments. COP has been integrated with the microarray data normalization tool, ExpressYourself, which is available at http://bioinfo.mbb.yale.edu/ExpressYourself/. Together, the two can eliminate most of the common noise in microarray data.
View details for DOI 10.1093/nar/gkl871
View details for Web of Science ID 000243993600001
View details for PubMedID 17158151
Yeast protein microarrays
YEAST GENE ANALYSIS, SECOND EDITION
2007; 36: 303-?
Protein microarray technology
3rd International Conference on Functional Genomics of Ageing
ELSEVIER IRELAND LTD. 2007: 161–67
Protein chips have emerged as a promising approach for a wide variety of applications including the identification of protein-protein interactions, protein-phospholipid interactions, small molecule targets, and substrates of protein kinases. They can also be used for clinical diagnostics and monitoring disease states. This article reviews current methods in the generation and applications of protein microarrays.
View details for DOI 10.1016/j.mad.2006.11.021
View details for Web of Science ID 000244301700024
View details for PubMedID 17126887
Tilescope: online analysis pipeline for high-density tiling microarray data
2007; 8 (5)
We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data http://tilescope.gersteinlab.org. In a completely automated fashion, Tilescope will normalize signals between channels and across arrays, combine replicate experiments, score each array element, and identify genomic features. The program is designed with a modular, three-tiered architecture, facilitating parallelism, and a graphic user-friendly interface, presenting results in an organized web page, downloadable for further analysis.
View details for DOI 10.1186/gb-2007-8-5-r81
View details for Web of Science ID 000246983100034
View details for PubMedID 17501994
A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and chIP-chip experiments: systematically incorporating validated biological knowledge
2006; 22 (24): 3016-3024
Large-scale tiling array experiments are becoming increasingly common in genomics. In particular, the ENCODE project requires the consistent segmentation of many different tiling array datasets into 'active regions' (e.g. finding transfrags from transcriptional data and putative binding sites from ChIP-chip experiments). Previously, such segmentation was done in an unsupervised fashion mainly based on characteristics of the signal distribution in the tiling array data itself. Here we propose a supervised framework for doing this. It has the advantage of explicitly incorporating validated biological knowledge into the model and allowing for formal training and testing. In particular, we use a hidden Markov model (HMM) framework, which is capable of explicitly modeling the dependency between neighboring probes and whose extended version (the generalized HMM) also allows explicit description of state duration density. We introduce a formal definition of the tiling-array analysis problem, and explain how we can use this to describe sampling small genomic regions for experimental validation to build up a gold-standard set for training and testing. We then describe various ideal and practical sampling strategies (e.g. maximizing signal entropy within a selected region versus using gene annotation or known promoters as positives for transcription or ChIP-chip data, respectively). For the practical sampling and training strategies, we show how the size and noise in the validated training data affects the performance of an HMM applied to the ENCODE transcriptional and ChIP-chip experiments. In particular, we show that the HMM framework is able to efficiently process tiling array data as well as or better than previous approaches.
For the idealized sampling strategies, we show how we can assess their performance in a simulation framework and how a maximum entropy approach, which samples sub-regions with very different signal intensities, gives the maximally performing gold-standard. This latter result has strong implications for the optimum way medium-scale validation experiments should be carried out to verify the results of the genome-scale tiling array experiments.
View details for DOI 10.1093/bioinformatics/btl515
View details for Web of Science ID 000242715200008
View details for PubMedID 17038339
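The supervised segmentation idea in the abstract above can be illustrated with a small two-state HMM. This is only a sketch under stated assumptions, not the authors' implementation: it assumes two states (0 = background, 1 = active), Gaussian emissions, add-one smoothing of transition counts, and a uniform start distribution, none of which are specified in the abstract. The function names (`train_supervised_hmm`, `viterbi`) and the toy signal values are hypothetical.

```python
import math

STATES = (0, 1)  # assumed: 0 = background probe, 1 = probe inside an active region


def train_supervised_hmm(signals, labels):
    """Estimate emission and transition parameters from validated (labeled) probes."""
    emissions = {}
    for s in STATES:
        vals = [x for x, lab in zip(signals, labels) if lab == s]
        mean = sum(vals) / len(vals)
        # Small constant keeps the variance strictly positive.
        var = sum((x - mean) ** 2 for x in vals) / len(vals) + 1e-6
        emissions[s] = (mean, var)
    # Count label-to-label transitions between neighboring probes (add-one smoothing).
    counts = {a: {b: 1.0 for b in STATES} for a in STATES}
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1.0
    transitions = {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in STATES}
        for a in STATES
    }
    return emissions, transitions


def log_gauss(x, mean, var):
    """Log density of a Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(signals, emissions, transitions):
    """Return the most likely state path for a series of probe signals."""
    score = {s: math.log(0.5) + log_gauss(signals[0], *emissions[s]) for s in STATES}
    backpointers = []
    for x in signals[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(transitions[p][s]))
            new_score[s] = (score[prev] + math.log(transitions[prev][s])
                            + log_gauss(x, *emissions[s]))
            ptr[s] = prev
        backpointers.append(ptr)
        score = new_score
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy training set standing in for an experimentally validated gold standard.
train_signal = [0.1, 0.2, 0.0, 0.1, 2.0, 2.1, 1.9, 0.1, 0.0, 0.2]
train_labels = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
emissions, transitions = train_supervised_hmm(train_signal, train_labels)
segmentation = viterbi([0.0, 0.1, 2.0, 2.2, 1.8, 0.1, 0.0], emissions, transitions)
```

Because the transition probabilities are learned from neighboring labeled probes, the decoder favors contiguous runs of the same state, which is the dependency between neighboring probes that the abstract highlights.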
High-throughput methods of regulatory element discovery
2006; 41 (6): 673-?
With the number of organisms whose genomes have been sequenced, a vast amount of information concerning the genetic structure of an organism's genome has been collected. However, effective experimental means to study how this information is accessed have only recently been developed. In this review, three basic methods for identifying regions of protein-DNA interaction will be introduced. The first two, chromatin immunoprecipitation (ChIP)-chip and ChIP-PET (for paired-end ditag), rely on the enrichment provided by chromosomal immunoprecipitation to interrogate the genomic sequence for the interaction sites of a protein of interest. In contrast, protein microarrays allow the identification of DNA-binding proteins that interact with a DNA sequence of interest. These complementary methods of exploring protein-DNA interactions will increase our fundamental knowledge of how the information contained within the genome sequence is accessed and processed.
View details for Web of Science ID 000242737100019
View details for PubMedID 17191608
HTRA1 promoter polymorphism in wet age-related macular degeneration
2006; 314 (5801): 989-992
Age-related macular degeneration (AMD), the most common cause of irreversible vision loss in individuals aged older than 50 years, is classified as either wet (neovascular) or dry (nonneovascular). Inherited variation in the complement factor H gene is a major risk factor for drusen in dry AMD. Here we report that a single-nucleotide polymorphism in the promoter region of HTRA1, a serine protease gene on chromosome 10q26, is a major genetic risk factor for wet AMD. A whole-genome association mapping strategy was applied to a Chinese population, yielding a P value of <10^-11. Individuals with the risk-associated genotype were estimated to have a likelihood of developing wet AMD 10 times that of individuals with the wild-type genotype.
View details for DOI 10.1126/science.1133807
View details for Web of Science ID 000241896000052
View details for PubMedID 17053108
Charging it up: global analysis of protein phosphorylation
TRENDS IN GENETICS
2006; 22 (10): 545-554
Protein phosphorylation affects most, if not all, cellular activities in eukaryotes and is essential for cell proliferation and development. An estimated 30% of cellular proteins are phosphorylated, representing the phosphoproteome, and phosphorylation can alter a protein's function, activity, localization and stability. Recent studies for large-scale identification of phosphosites using mass spectrometry are revealing the components of the phosphoproteome. The development of new tools, such as kinase assays using modified kinases or protein microarrays, enables rapid kinase substrate identification. The dynamics of specific phosphorylation events can now be monitored using mass spectrometry, single-cell analysis of flow cytometry, or fluorescent reporters. Together, these techniques are beginning to elucidate cellular processes and pathways regulated by phosphorylation, in addition to global regulatory networks.
View details for DOI 10.1016/j.tig.2006.08.005
View details for Web of Science ID 000241268400006
View details for PubMedID 16908088
TOS9 regulates white-opaque switching in Candida albicans
2006; 5 (10): 1674-1687
In Candida albicans, the a1-alpha2 complex represses white-opaque switching, as well as mating. Based upon the assumption that the a1-alpha2 corepressor complex binds to the gene that regulates white-opaque switching, a chromatin immunoprecipitation-microarray analysis strategy was used to identify 52 genes that bound to the complex. One of these genes, TOS9, exhibited an expression pattern consistent with a "master switch gene." TOS9 was only expressed in opaque cells, and its gene product, Tos9p, localized to the nucleus. Deletion of the gene blocked cells in the white phase, misexpression in the white phase caused stable mass conversion of cells to the opaque state, and misexpression blocked temperature-induced mass conversion from the opaque state to the white state. A model was developed for the regulation of spontaneous switching between the opaque state and the white state that includes stochastic changes of Tos9p levels above and below a threshold that induce changes in the chromatin state of an as-yet-unidentified switching locus. TOS9 has also been referred to as EAP2 and WOR1.
View details for DOI 10.1128/EC.00252-06
View details for Web of Science ID 000241344300010
View details for PubMedID 16950924
Predicting essential genes in fungal genomes
2006; 16 (9): 1126-1135
Essential genes are required for an organism's viability, and the ability to identify these genes in pathogens is crucial to directed drug development. Predicting essential genes through computational methods is appealing because it circumvents expensive and difficult experimental screens. Most such prediction is based on homology mapping to experimentally verified essential genes in model organisms. We present here a different approach, one that relies exclusively on sequence features of a gene to estimate essentiality and offers a promising way to identify essential genes in unstudied or uncultured organisms. We identified 14 characteristic sequence features potentially associated with essentiality, such as localization signals, codon adaptation, GC content, and overall hydrophobicity. Using the well-characterized baker's yeast Saccharomyces cerevisiae, we employed a simple Bayesian framework to measure the correlation of each of these features with essentiality. We then employed the 14 features to learn the parameters of a machine learning classifier capable of predicting essential genes. We trained our classifier on known essential genes in S. cerevisiae and applied it to the closely related and relatively unstudied yeast Saccharomyces mikatae. We assessed predictive success in two ways: First, we compared all of our predictions with those generated by homology mapping between these two species. Second, we verified a subset of our predictions with eight in vivo knockouts in S. mikatae, and we present here the first experimentally confirmed essential genes in this species.
View details for DOI 10.1101/gr.5144106
View details for Web of Science ID 000240238600007
View details for PubMedID 16899653
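The sequence-feature approach in the abstract above can be sketched as a small probabilistic classifier. The abstract does not name the classifier, so a Bernoulli naive Bayes model with Laplace smoothing is swapped in here purely for illustration. The three binary features (a localization signal, high codon adaptation, high hydrophobicity), the function names, and the toy training rows are all hypothetical and are not data from the study.

```python
import math


def train_naive_bayes(rows, labels):
    """Fit per-class priors and per-feature Bernoulli probabilities (Laplace smoothed)."""
    n_features = len(rows[0])
    model = {}
    for cls in (0, 1):  # assumed encoding: 1 = essential, 0 = non-essential
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        prior = (len(members) + 1) / (len(rows) + 2)
        probs = [
            (sum(r[j] for r in members) + 1) / (len(members) + 2)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model


def log_odds_essential(model, row):
    """Log posterior odds that a gene with these features is essential."""
    def log_posterior(cls):
        prior, probs = model[cls]
        return math.log(prior) + sum(
            math.log(p if x else 1.0 - p) for x, p in zip(row, probs)
        )
    return log_posterior(1) - log_posterior(0)


# Hypothetical binary features per gene:
# (has_localization_signal, high_codon_adaptation, high_hydrophobicity)
train_rows = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
train_labels = [1, 1, 1, 0, 0, 0]
model = train_naive_bayes(train_rows, train_labels)

score_likely = log_odds_essential(model, (1, 1, 0))
score_unlikely = log_odds_essential(model, (0, 0, 1))
```

A positive log-odds score marks a gene as a candidate essential gene. In the workflow the abstract describes, the model would be trained in one species and then applied to feature vectors computed for genes in a related species.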
Proteome chips for whole-organism assays
NATURE REVIEWS MOLECULAR CELL BIOLOGY
2006; 7 (8): 617-622
Over the past 5 years, protein-chip technology has emerged as a useful tool for the study of many kinds of protein interactions and biochemical activities. The construction of Saccharomyces cerevisiae whole-proteome arrays has enabled further studies of such interactions in a proteome-wide context. Here, we explore some of the recent advances that have been made at the '-omic' level using protein microarrays.
View details for DOI 10.1038/nrm1941
View details for Web of Science ID 000239240000019
View details for PubMedID 16723973
Linking DNA-binding proteins to their recognition sequences by using protein microarrays
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2006; 103 (26): 9940-9945
Analyses of whole-genome sequences and experimental data sets have revealed a large number of DNA sequence motifs that are conserved in many species and may be functional. However, methods of sufficient scale to explore the roles of these elements are lacking. We describe the use of protein arrays to identify proteins that bind to DNA sequences of interest. A microarray of 282 known and potential yeast transcription factors was produced and probed with oligonucleotides of evolutionarily conserved sequences that are potentially functional. Transcription factors that bound to specific DNA sequences were identified. One previously uncharacterized DNA-binding protein, Yjl103, was characterized in detail. We defined the binding site for this protein and identified a number of its target genes, many of which are involved in stress response and oxidative phosphorylation. Protein microarrays offer a high-throughput method for determining DNA-protein interactions.
View details for DOI 10.1073/pnas.0509185103
View details for Web of Science ID 000238872900036
View details for PubMedID 16785442
Defined culture conditions of human embryonic stem cells
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2006; 103 (15): 5688-5693
Human embryonic stem cells (hESCs) are pluripotent cells that have the potential to differentiate into any tissue in the human body; therefore, they are a valuable resource for regenerative medicine, drug screening, and developmental studies. However, the clinical application of hESCs is hampered by the difficulties of eliminating animal products in the culture medium and/or the complexity of conditions required to support hESC growth. We have developed a simple medium [termed hESC Cocktail (HESCO)] containing basic fibroblast growth factor, Wnt3a, April (a proliferation-inducing ligand)/BAFF (B cell-activating factor belonging to TNF), albumin, cholesterol, insulin, and transferrin, which is sufficient for hESC self-renewal and proliferation. Cells grown in HESCO were maintained in an undifferentiated state as determined by using six different stem cell markers, and their genomic integrity was confirmed by karyotyping. Cells cultured in HESCO readily form embryoid bodies in tissue culture and teratomas in mice. In both cases, the cells differentiated into each of the three cell lineages, ectoderm, endoderm, and mesoderm, indicating that they maintained their pluripotency. The use of a minimal medium sufficient for hESC growth is expected to greatly facilitate clinical application and developmental studies of hESCs.
View details for DOI 10.1073/pnas.0601383103
View details for Web of Science ID 000236896200012
View details for PubMedID 16595624
High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2006; 103 (12): 4534-4539
Deletions and amplifications of the human genomic sequence (copy number polymorphisms) are the cause of numerous diseases and a potential cause of phenotypic variation in the normal population. Comparative genomic hybridization (CGH) has been developed as a useful tool for detecting alterations in DNA copy number that involve blocks of DNA several kilobases or larger in size. We have developed high-resolution CGH (HR-CGH) to detect accurately and with relatively little bias the presence and extent of chromosomal aberrations in human DNA. Maskless array synthesis was used to construct arrays containing 385,000 oligonucleotides with isothermal probes of 45-85 bp in length; arrays tiling the beta-globin locus and chromosome 22q were prepared. Arrays with a 9-bp tiling path were used to map a 622-bp heterozygous deletion in the beta-globin locus. Arrays with an 85-bp tiling path were used to analyze DNA from patients with copy number changes in the pericentromeric region of chromosome 22q. Heterozygous deletions and duplications as well as partial triploidies and partial tetraploidies of portions of chromosome 22q were mapped with high resolution (typically up to 200 bp) in each patient, and the precise breakpoints of two deletions were confirmed by DNA sequencing. Additional peaks potentially corresponding to known and novel additional CNPs were also observed. Our results demonstrate that HR-CGH allows the detection of copy number changes in the human genome at an unprecedented level of resolution.
View details for DOI 10.1073/pnas.0511340103
View details for Web of Science ID 000236362600039
View details for PubMedID 16537408
Severe acute respiratory syndrome diagnostics using a coronavirus protein microarray
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2006; 103 (11): 4011-4016
To monitor severe acute respiratory syndrome (SARS) infection, a coronavirus protein microarray that harbors proteins from SARS coronavirus (SARS-CoV) and five additional coronaviruses was constructed. These microarrays were used to screen approximately 400 Canadian sera from the SARS outbreak, including samples from confirmed SARS-CoV cases, respiratory illness patients, and healthcare professionals. A computer algorithm that uses multiple classifiers to predict samples from SARS patients was developed and used to predict 206 sera from Chinese fever patients. The test assigned patients into two distinct groups: those with antibodies to SARS-CoV and those without. The microarray also identified patients with sera reactive against other coronavirus proteins. Our results correlated well with an indirect immunofluorescence test and demonstrated that viral infection can be monitored for many months after infection. We show that protein microarrays can serve as a rapid, sensitive, and simple tool for large-scale identification of viral-specific antibodies in sera.
View details for Web of Science ID 000236429300016
View details for PubMedID 16537477
Target hub proteins serve as master regulators of development in yeast
GENES & DEVELOPMENT
2006; 20 (4): 435-448
To understand the organization of the transcriptional networks that govern cell differentiation, we have investigated the transcriptional circuitry controlling pseudohyphal development in Saccharomyces cerevisiae. The binding targets of Ste12, Tec1, Sok2, Phd1, Mga1, and Flo8 were globally mapped across the yeast genome. The factors and their targets form a complex binding network, containing patterns characteristic of autoregulation, feedback and feed-forward loops, and cross-talk. Combinatorial binding to intergenic regions was commonly observed, which allowed for the identification of a novel binding association between Mga1 and Flo8, in which Mga1 requires Flo8 for binding to promoter regions. Further analysis of the network showed that the promoters of MGA1 and PHD1 were bound by all of the factors used in this study, identifying them as key target hubs. Overexpression of either of these two proteins specifically induced pseudohyphal growth under noninducing conditions, highlighting them as master regulators of the system. Our results indicate that target hubs can serve as master regulators whose activity is sufficient for the induction of complex developmental responses and therefore represent important regulatory nodes in biological networks.
View details for DOI 10.1101/gad.1389306
View details for Web of Science ID 000235428600007
View details for PubMedID 16449570
Mapping pathways and phenotypes by systematic gene overexpression
2006; 21 (3): 319-330
Many disease states result from gene overexpression, often in a specific genetic context. To explore gene overexpression phenotypes systematically, we assembled an array of 5280 yeast strains, each containing an inducible copy of an S. cerevisiae gene, covering >80% of the genome. Approximately 15% of the overexpressed genes (769) reduced growth rate. This gene set was enriched for cell cycle-regulated genes, signaling molecules, and transcription factors. Overexpression of most toxic genes resulted in phenotypes different from known deletion mutant phenotypes, suggesting that overexpression phenotypes usually reflect a specific regulatory imbalance rather than disruption of protein complex stoichiometry. Global overexpression effects were also assayed in the context of a cyclin-dependent kinase mutant (pho85Δ). The resultant gene set was enriched for Pho85p targets and identified the yeast calcineurin-responsive transcription factor Crz1p as a substrate. Large-scale application of this approach should provide a strategy for identifying target molecules regulated by specific signaling pathways.
View details for DOI 10.1016/j.molcel.2005.12.011
View details for Web of Science ID 000235436100003
View details for PubMedID 16455487
Yeast as a model for human disease.
Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.]
2006; Chapter 15: Unit 15 6-?
The sequencing of the human genome promised the identification of disease-causing genes and, subsequently, therapies for those diseases. However, when identifying the genetic basis of a disease, it is not uncommon to discover an abnormal protein whose normal function is unknown. The genetic manipulations required to assign function to genes is often extremely difficult, if not impossible, in human cells. Model organisms have been used to facilitate understanding of gene function because of the ease of genetic manipulations and because many features of eukaryotic physiology have been conserved across phyla. Yeast is a simple eukaryote with a tractable genome, a short generation time, and a large network of researchers who have generated a vast arsenal of research tools. These traits make yeast ideally suited to help reveal the function of genes implicated in human disease.
View details for DOI 10.1002/0471142905.hg1506s48
View details for PubMedID 18428391
Design optimization methods for genomic DNA tiling arrays
2006; 16 (2): 271-281
A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.
View details for DOI 10.1101/gr.4455906
View details for Web of Science ID 000235122000015
View details for PubMedID 16365382
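As an illustration of the tiling problem described in the abstract above, the following is a minimal sketch of a dynamic-programming tile-path search. It is not the authors' algorithm: the function name `optimal_tile_path`, the boolean repeat mask, and the objective (maximizing the number of covered non-repetitive bases) are assumptions made for this example. For a fixed range of tile sizes, its running time and memory grow linearly with sequence length.

```python
def optimal_tile_path(is_repeat, min_len, max_len):
    """Pick non-overlapping, repeat-free tiles that cover as many bases as possible.

    is_repeat: list of booleans, True where a base is masked as repetitive.
    Returns a list of (start, end) half-open intervals, 0-based.
    """
    n = len(is_repeat)
    best = [0] * (n + 1)    # best[i]: max bases covered within the first i bases
    choice = [0] * (n + 1)  # length of the tile ending at i, or 0 if none ends there
    run = 0                 # length of the repeat-free run ending at position i

    for i in range(1, n + 1):
        run = 0 if is_repeat[i - 1] else run + 1
        best[i] = best[i - 1]  # default: leave base i-1 uncovered
        # A tile of a given length ending at i is valid only if it fits in the run.
        for length in range(min_len, min(max_len, run) + 1):
            candidate = best[i - length] + length
            if candidate > best[i]:
                best[i] = candidate
                choice[i] = length

    # Walk back through the recorded choices to recover the tiles.
    tiles = []
    i = n
    while i > 0:
        if choice[i]:
            tiles.append((i - choice[i], i))
            i -= choice[i]
        else:
            i -= 1
    tiles.reverse()
    return tiles


if __name__ == "__main__":
    # Five unique bases, two repeat bases, three unique bases.
    mask = [False] * 5 + [True] * 2 + [False] * 3
    print(optimal_tile_path(mask, min_len=2, max_len=3))
```

In this toy input every non-repetitive base can be covered, because each repeat-free run can be split into tiles of length 2 or 3. On real genomes the repeat mask would come from a similarity search, which this sketch does not attempt.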
ProCAT: a data analysis approach for protein microarrays
2006; 7 (11)
Protein microarrays provide a versatile method for the analysis of many protein biochemical activities. Existing DNA microarray analytical methods do not translate to protein microarrays due to differences between the technologies. Here we report a new approach, ProCAT, which corrects for background bias and spatial artifacts, identifies significant signals, filters nonspecific spots, and normalizes the resulting signal to protein abundance. ProCAT provides a powerful and flexible new approach for analyzing many types of protein microarrays.
View details for DOI 10.1186/gb-2006-7-11-r110
View details for Web of Science ID 000243967000014
View details for PubMedID 17109749
BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments
2006; 7 (11)
Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin immunoprecipitation)-chip experiments with known binding site patterns. Our method (BoCaTFBS) uses boosted cascades of classifiers for optimum efficiency, in which components are alternating decision trees; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments. We applied BoCaTFBS within the ENCODE project and showed that it outperforms many traditional binding site identification methods (for instance, profiles).
View details for DOI 10.1186/gb-2006-7-11-r102
View details for Web of Science ID 000243967000006
View details for PubMedID 17078876
Genomic analysis of insertion behavior and target specificity of mini-Tn7 and Tn3 transposons in Saccharomyces cerevisiae
NUCLEIC ACIDS RESEARCH
2006; 34 (8)
Transposons are widely employed as tools for gene disruption. Ideally, they should display unbiased insertion behavior, and incorporate readily into any genomic DNA to which they are exposed. However, many transposons preferentially insert at specific nucleotide sequences. It is unclear to what extent such bias affects their usefulness as mutagenesis tools. Here, we examine insertion site specificity and global insertion behavior of two mini-transposons previously used for large-scale gene disruption in Saccharomyces cerevisiae: Tn3 and Tn7. Using an expanded set of insertion data, we confirm that Tn3 displays marked preference for the AT-rich 5 bp consensus site TA[A/T]TA, whereas Tn7 displays negligible target site preference. On a genome level, both transposons display marked non-uniform insertion behavior: certain sites are targeted far more often than expected, and both distributions depart drastically from Poisson. Thus, to compare their insertion behavior on a genome level, we developed a windowed Kolmogorov-Smirnov (K-S) test to analyze transposon insertion distributions in sequence windows of various sizes. We find that when scored in large windows (>300 bp), both Tn3 and Tn7 distributions appear uniform, whereas in smaller windows, Tn7 appears uniform while Tn3 does not. Thus, both transposons are effective tools for gene disruption, but Tn7 does so with less duplication and a more uniform distribution, better approximating the behavior of the ideal transposon.
View details for DOI 10.1093/nar/gkl184
View details for Web of Science ID 000237697000001
View details for PubMedID 16648358
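The abstract above does not spell out the windowed K-S procedure, so the following is only a rough sketch of the general idea under stated assumptions: insertion positions are binned into fixed-size windows, and the cumulative fraction of insertions is compared with the cumulative fraction expected under uniform insertion. The function name `ks_statistic_uniform` and the 0-based position convention are hypothetical, and no p-value is computed.

```python
def ks_statistic_uniform(positions, length, window):
    """K-S distance between window-binned insertions and a uniform expectation.

    positions: 0-based insertion coordinates, each in [0, length).
    length:    total sequence length in bp.
    window:    window size in bp used to bin the insertions.
    """
    if not positions:
        raise ValueError("at least one insertion position is required")

    n_windows = -(-length // window)  # ceiling division
    counts = [0] * n_windows
    for p in positions:
        counts[p // window] += 1

    total = len(positions)
    distance = 0.0
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        # Under uniform insertion, the expected cumulative fraction is
        # proportional to the sequence covered so far.
        expected = min((i + 1) * window, length) / length
        distance = max(distance, abs(cumulative / total - expected))
    return distance


if __name__ == "__main__":
    clustered = [1, 2, 3, 4]
    for w in (10, 50, 100):
        print(w, ks_statistic_uniform(clustered, length=100, window=w))
```

With these toy inputs the distance shrinks as the window grows, which mirrors the qualitative point in the abstract that distributions look more uniform when scored in larger windows.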
Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies
DNA MICROARRAYS, PART B: DATABASES AND STATISTICS
2006; 411: 282-311
A credit to microarray technology is its broad application. Two experiments--the tiling microarray experiment and the protein microarray experiment--are exemplars of the versatility of the microarrays. With the technology's expanding list of uses, the corresponding bioinformatics must evolve in step. There currently exists a rich literature developing statistical techniques for analyzing traditional gene-centric DNA microarrays, so the first challenge in analyzing the advanced technologies is to identify which of the existing statistical protocols are relevant and where and when revised methods are needed. A second challenge is making these often very technical ideas accessible to the broader microarray community. The aim of this chapter is to present some of the most widely used statistical techniques for normalizing and scoring traditional microarray data and indicate their potential utility for analyzing the newer protein and tiling microarray experiments. In so doing, we will assume little or no prior training in statistics of the reader. Areas covered include background correction, intensity normalization, spatial normalization, and the testing of statistical significance.
View details for DOI 10.1016/S0076-6879(06)11015-0
View details for Web of Science ID 000244506300015
View details for PubMedID 16939796
Novel transcribed regions in the human genome
71st Cold Spring Harbor Symposium on Quantitative Biology
COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT. 2006: 111–116
We have used genomic tiling arrays to identify transcribed regions throughout the human genome. Analysis of the mapping results of RNA isolated from five cell/tissue types, NB4 cells, NB4 cells treated with retinoic acid (RA), NB4 cells treated with 12-O-tetradecanoylphorbol-13 acetate (TPA), neutrophils, and placenta, throughout the ENCODE region reveals a large number of novel transcribed regions. Interestingly, neutrophils exhibit a great deal of novel expression in several intronic regions. Comparison of the hybridization results of NB4 cells treated with different stimuli relative to untreated cells reveals that many new regions are expressed upon cell differentiation. One such region is the Hox locus, which contains a large number of novel regions expressed in a number of cell types. Analysis of the trinucleotide composition of the novel transcribed regions reveals that it is similar to that of known exons. These results suggest that many of the novel transcribed regions may have a functional role.
View details for Web of Science ID 000245962800015
View details for PubMedID 17381286
Global changes in STAT target selection and transcription regulation upon interferon treatments
GENES & DEVELOPMENT
2005; 19 (24): 2953-2968
The STAT (signal transducer and activator of transcription) proteins play a crucial role in the regulation of gene expression, but their targets and the manner in which they select them remain largely unknown. Using chromatin immunoprecipitation and DNA microarray analysis (ChIP-chip), we have identified the regions of human chromosome 22 bound by STAT1 and STAT2 in interferon-treated cells. Analysis of the genomic loci proximal to these binding sites introduced new candidate STAT1 and STAT2 target genes, several of which are affiliated with proliferation and apoptosis. The genes on chromosome 22 that exhibited interferon-induced up- or down-regulated expression were determined and correlated with the STAT-binding site information, revealing the potential regulatory effects of STAT1 and STAT2 on their target genes. Importantly, the comparison of STAT1-binding sites upon interferon (IFN)-gamma and IFN-alpha treatments revealed dramatic changes in binding locations between the two treatments. The IFN-alpha induction revealed nonconserved STAT1 occupancy at IFN-gamma-induced sites, as well as novel sites of STAT1 binding not evident in IFN-gamma-treated cells. Many of these correlated with binding by STAT2, but others were STAT2 independent, suggesting that multiple mechanisms direct STAT1 binding to its targets under different activation conditions. Overall, our results reveal a wealth of new information regarding IFN/STAT-binding targets and also fundamental insights into mechanisms of regulation of gene expression in different cell states.
View details for DOI 10.1101/gad.1371305
View details for Web of Science ID 000234095500004
View details for PubMedID 16319195
Global analysis of protein phosphorylation in yeast
2005; 438 (7068): 679-684
Protein phosphorylation is estimated to affect 30% of the proteome and is a major regulatory mechanism that controls many basic cellular processes. Until recently, our biochemical understanding of protein phosphorylation on a global scale has been extremely limited; only one half of the yeast kinases have known in vivo substrates and the phosphorylating kinase is known for less than 160 phosphoproteins. Here we describe, with the use of proteome chip technology, the in vitro substrates recognized by most yeast protein kinases: we identified over 4,000 phosphorylation events involving 1,325 different proteins. These substrates represent a broad spectrum of different biochemical functions and cellular roles. Distinct sets of substrates were recognized by each protein kinase, including closely related kinases of the protein kinase A family and four cyclin-dependent kinases that vary only in their cyclin subunits. Although many substrates reside in the same cellular compartment or belong to the same functional category as their phosphorylating kinase, many others do not, indicating possible new roles for several kinases. Furthermore, integration of the phosphorylation results with protein-protein interaction and transcription factor binding data revealed novel regulatory modules. Our phosphorylation results have been assembled into a first-generation phosphorylation map for yeast. Because many yeast proteins and pathways are conserved, these results will provide insights into the mechanisms and roles of protein phosphorylation in many eukaryotes.
View details for DOI 10.1038/nature04187
View details for Web of Science ID 000233593100053
View details for PubMedID 16319894
Biochemical and genetic analysis of the yeast proteome with a movable ORF collection
GENES & DEVELOPMENT
2005; 19 (23): 2816-2826
Functional analysis of the proteome is an essential part of genomic research. To facilitate different proteomic approaches, a MORF (moveable ORF) library of 5854 yeast expression plasmids was constructed, each expressing a sequence-verified ORF as a C-terminal ORF fusion protein, under regulated control. Analysis of 5573 MORFs demonstrates that nearly all verified ORFs are expressed, suggests the authenticity of 48 ORFs characterized as dubious, and implicates specific processes including cytoskeletal organization and transcriptional control in growth inhibition caused by overexpression. Global analysis of glycosylated proteins identifies 109 new confirmed N-linked and 345 candidate glycoproteins, nearly doubling the known yeast glycome.
View details for Web of Science ID 000233765900003
View details for PubMedID 16322557
Advances in functional protein microarray technology
15th Biennial Conference on Methods in Protein Structure Analysis
WILEY-BLACKWELL. 2005: 5400–5411
Numerous innovations in high-throughput protein production and microarray surface technologies have enabled the development of addressable formats for proteins ordered at high spatial density. Protein array implementations have largely focused on antibody arrays for high-throughput protein profiling. However, it is also possible to construct arrays of full-length, functional proteins from a library of expression clones. The advent of protein-based microarrays allows the global observation of biochemical activities on an unprecedented scale, where hundreds or thousands of proteins can be simultaneously screened for protein-protein, protein-nucleic acid, and small molecule interactions. This technology holds great potential for basic molecular biology research, disease marker identification, toxicological response profiling and pharmaceutical target screening.
View details for DOI 10.1111/j.1742-4658.2005.04970.x
View details for Web of Science ID 000232772200003
View details for PubMedID 16262682
Checkpoint maintenance requires Ame1 and Okp1
2005; 4 (10): 1448-1456
Kinetochore proteins are required for high fidelity chromosome segregation and as a platform for checkpoint signaling. Ame1 is an essential component of the COMA (Ctf19, Okp1, Mcm21, Ame1) sub-complex of the central kinetochore of budding yeast. In this study, we describe the isolation and characterization of an Ame1 conditional mutant, ame1-4. ame1-4 cells exhibit chromosome segregation defects and Mad2-dependent cell cycle delay similar to okp1-5 cells. However, the viability of ame1-4 cells is markedly reduced relative to wild type and okp1-5 cells after three hours at restrictive temperature. To determine if ame1-4 cells enter anaphase with mis-segregated chromosomes, we monitored the localization of Bub3:VFP as a marker for anaphase onset. ame1-4 cells containing mis-segregated sister chromatids initially accumulate Bub3:VFP at kinetochores, indicating checkpoint activation and a metaphase arrest. Subsequently, Bub3:VFP de-localizes and cells reinitiate DNA duplication and budding without cytokinesis in the presence of un-segregated chromosomes. Overexpression of OKP1 in ame1-4 cells restores ame1-4 protein localization and a stable arrest. Based on our results, we propose that Ame1 and Okp1 are required for a sustained checkpoint arrest in the presence of mis-segregated chromosomes. Our results suggest that checkpoint response might be controlled not only at the level of activation but also via signals that ensure maintenance of the response.
View details for Web of Science ID 000233751500030
View details for PubMedID 16177574
A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray
PLANT MOLECULAR BIOLOGY
2005; 59 (1): 137-149
As the international efforts to sequence the rice genome are completed, an immediate challenge and opportunity is to comprehensively and accurately define all transcription units in the rice genome. Here we describe a strategy of using high-density oligonucleotide tiling-path microarrays to map transcription of the japonica rice genome. In a pilot experiment to test this approach, one array representing the reverse strand of the last 11.2 Mb sequence of chromosome 10 was analyzed in detail based on a mathematical model developed in this study. Analysis of the array data detected 77% of the reference gene models in a mixture of four RNA populations. Moreover, significant transcriptional activities were found in many of the previously annotated intergenic regions. These preliminary results demonstrate the utility of genome tiling microarrays in evaluating annotated rice gene models and in identifying novel transcription units that will facilitate rice genome annotation.
View details for DOI 10.1007/s11103-005-6164-5
View details for Web of Science ID 000232498000012
View details for PubMedID 16217608
Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping
TRENDS IN GENETICS
2005; 21 (8): 466-475
Traditional microarrays use probes complementary to known genes to quantitate the differential gene expression between two or more conditions. Genomic tiling microarray experiments differ in that probes that span a genomic region at regular intervals are used to detect the presence or absence of transcription. This difference means the same sets of biases and the methods for addressing them are unlikely to be relevant to both types of experiment. We introduce the informatics challenges arising in the analysis of tiling microarray experiments as open problems to the scientific community and present initial approaches for the analysis of this nascent technology.
View details for DOI 10.1016/j.tig.2005.06.007
View details for Web of Science ID 000231209200010
View details for PubMedID 15979196
Prospects and challenges in proteomics
PLANT PHYSIOLOGY
2005; 138 (2): 560-562
Sexual dimorphism in mammalian gene expression
TRENDS IN GENETICS
2005; 21 (5): 298-305
Males and females have obvious phenotypic differences; they also exhibit differences related to health, life span, cognitive abilities and have different responses to diseases such as anemia, coronary heart disease, hypertension and renal dysfunction. Although the anatomical, hormonal and chemical differences between the sexes are well known, there are few molecular descriptors for gender-specific physiological traits and health risks. Recent studies using microarrays and other methods have made significant progress towards elucidating the molecular differences between mammalian sexes in a variety of tissues and towards identifying the transcription factors that regulate sex-biased gene expression. These findings are providing new insights into the molecular and genetic differences that dictate the different behaviors and physiologies of mammalian sexes.
View details for DOI 10.1016/j.tig.2005.03.005
View details for Web of Science ID 000229143800012
View details for PubMedID 15851067
Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery
2005; 13 (3): 259-274
Microarrays have become a popular and important technology for surveying global patterns in gene expression and regulation. A number of innovative experiments have extended microarray applications beyond the measurement of mRNA expression levels, in order to uncover aspects of large-scale chromosome function and dynamics. This has been made possible due to the recent development of tiling arrays, where all non-repetitive DNA comprising a chromosome or locus is represented at various sequence resolutions. Since tiling arrays are designed to contain the entire DNA sequence without prior consultation of existing gene annotation, they enable the discovery of novel transcribed sequences and regulatory elements through the unbiased interrogation of genomic loci. The implementation of such methods for the global analysis of large eukaryotic genomes presents significant technical challenges. Nonetheless, tiling arrays are expected to become instrumental for the genome-wide identification and characterization of functional elements. Combined with computational methods to relate these data and map the complex interactions of transcriptional regulators, tiling array experiments can provide insight toward a more comprehensive understanding of fundamental molecular and cellular processes.
View details for DOI 10.1007/s10577-005-2165-0
View details for Web of Science ID 000228868500005
View details for PubMedID 15868420
Substrate specificity analysis of protein kinase complex Dbf2-Mob1 by peptide library and proteome array screening.
2005; 6: 22-?
The mitotic exit network (MEN) is a group of proteins that form a signaling cascade that is essential for cells to exit mitosis in Saccharomyces cerevisiae. The MEN has also been implicated in playing a role in cytokinesis. Two components of this signaling pathway are the protein kinase Dbf2 and its binding partner essential for its kinase activity, Mob1. The components of MEN that act upstream of Dbf2-Mob1 have been characterized, but physiological substrates for Dbf2-Mob1 have yet to be identified. Using a combination of peptide library selection, phosphorylation of optimal peptide variants, and screening of a phosphosite array, we found that Dbf2-Mob1 preferentially phosphorylated serine over threonine and required an arginine three residues upstream of the phosphorylated serine in its substrate. This requirement for arginine in peptide substrates could not be substituted with the similarly charged lysine. This specificity determined for peptide substrates was also evident in many of the proteins phosphorylated by Dbf2-Mob1 in a proteome chip analysis. We have determined by peptide library selection and phosphosite array screening that the protein kinase Dbf2-Mob1 preferentially phosphorylated substrates that contain an RXXS motif. A subsequent proteome microarray screen revealed proteins that can be phosphorylated by Dbf2-Mob1 in vitro. These proteins are enriched for RXXS motifs, and may include substrates that mediate the function of Dbf2-Mob1 in mitotic exit and cytokinesis. The relatively low degree of sequence restriction at the site of phosphorylation suggests that Dbf2 achieves specificity by docking its substrates at a site that is distinct from the phosphorylation site.
View details for PubMedID 16242037
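The RXXS motif reported in the abstract above (an arginine three residues upstream of the serine) is simple enough to scan for directly. The sketch below is illustrative only: the function name `find_rxxs_sites` and the example sequences are made up, and a sequence match by itself says nothing about whether a site is phosphorylated in vivo.

```python
import re

# R, any two residues, then S. The lookahead allows overlapping motifs
# (for example "RRKSS") to be reported separately.
RXXS = re.compile(r"(?=(R..S))")


def find_rxxs_sites(sequence):
    """Return the 1-based position of the serine in every RXXS match."""
    seq = sequence.upper()
    # m.start() is the 0-based index of the arginine; the serine sits
    # three residues downstream, hence +3, plus 1 for 1-based numbering.
    return [m.start() + 4 for m in RXXS.finditer(seq)]


if __name__ == "__main__":
    print(find_rxxs_sites("MRAAS"))   # arginine-based motif
    print(find_rxxs_sites("MKAAS"))   # lysine does not substitute
```

Counting such sites across a set of candidate substrates, and comparing with a background set, would be one way to check for the motif enrichment the abstract describes.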
Global analysis of protein function using protein microarrays
2nd International Conference on Functional Genomics of Ageing
ELSEVIER IRELAND LTD. 2005: 171–75
Protein microarrays containing thousands of proteins arrayed at high density can be prepared and probed for a wide variety of activities, thereby allowing the large scale analysis of many proteins simultaneously. In addition to identifying the activities of many previously uncharacterized proteins, protein microarrays can reveal new activities of well-characterized proteins, thus providing new insights about the functions of these proteins. Below, we describe the construction and use of protein microarrays and their applications using yeast as a model system.
View details for DOI 10.1016/j.mad.2004.09.019
View details for Web of Science ID 000226564000022
View details for PubMedID 15610776
Global identification of human transcribed sequences with genome tiling arrays
2004; 306 (5705): 2242-2246
Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA samples, reverse-transcribed from polyadenylated RNA obtained from human liver tissue. In addition to identifying many known and predicted genes, we found 10,595 transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.
View details for DOI 10.1126/science.1103388
View details for Web of Science ID 000225950000042
View details for PubMedID 15539566
DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2004; 101 (51): 17771-17776
Duplication of the genome during the S phase of the cell cycle does not occur simultaneously; rather, different sequences are replicated at different times. The replication timing of specific sequences can change during development; however, the determinants of this dynamic process are poorly understood. To gain insights into the contribution of developmental state, genomic sequence, and transcriptional activity to replication timing, we investigated the timing of DNA replication at high resolution along an entire human chromosome (chromosome 22) in two different cell types. The pattern of replication timing was correlated with respect to annotated genes, gene expression, novel transcribed regions of unknown function, sequence composition, and cytological features. We observed that chromosome 22 contains regions of early- and late-replicating domains of 100 kb to 2 Mb, many (but not all) of which are associated with previously described chromosomal bands. In both cell types, expressed sequences are replicated earlier than nontranscribed regions. However, several highly transcribed regions replicate late. Overall, the DNA replication-timing profiles of the two different cell types are remarkably similar, with only nine regions of difference observed. In one case, this difference reflects the differential expression of an annotated gene that resides in this region. Novel transcribed regions with low coding potential exhibit a strong propensity for early DNA replication. Although the cellular function of such transcripts is poorly understood, our results suggest that their activity is linked to the replication-timing program.
View details for DOI 10.1073/pnas.0408170101
View details for Web of Science ID 000225951500038
View details for PubMedID 15591350
Regulation of gene expression by a metabolic enzyme
2004; 306 (5695): 482-484
Gene expression in eukaryotes is normally believed to be controlled by transcriptional regulators that activate genes encoding structural proteins and enzymes. To identify previously unrecognized DNA binding activities, a yeast proteome microarray was screened with DNA probes; Arg5,6, a well-characterized mitochondrial enzyme involved in arginine biosynthesis, was identified. Chromatin immunoprecipitation experiments revealed that Arg5,6 is associated with specific nuclear and mitochondrial loci in vivo, and Arg5,6 binds to specific fragments in vitro. Deletion of Arg5,6 causes altered transcript levels of both nuclear and mitochondrial target genes. These results indicate that metabolic enzymes can directly regulate eukaryotic gene expression.
View details for Web of Science ID 000224626500052
View details for PubMedID 15486299
Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon
2004; 14 (10A): 1975-1986
We present here an unbiased and extremely versatile insertional library of yeast genomic DNA generated by in vitro mutagenesis with a multipurpose element derived from the bacterial transposon Tn7. This mini-Tn7 element has been engineered such that a single insertion can be used to generate a lacZ fusion, gene disruption, and epitope-tagged gene product. Using this transposon, we generated a plasmid-based library of approximately 300,000 mutant alleles; by high-throughput screening in yeast, we identified and sequenced 9032 insertions affecting 2613 genes (45% of the genome). From analysis of 7176 insertions, we found little bias in Tn7 target-site selection in vitro. In contrast, we also sequenced 10,174 Tn3 insertions and found a markedly stronger preference for an AT-rich 5-base pair target sequence. We further screened 1327 insertion alleles in yeast for hypersensitivity to the chemotherapeutic cisplatin. Fifty-one genes were identified, including four functionally uncharacterized genes and 25 genes involved in DNA repair, replication, transcription, and chromatin structure. In total, the collection reported here constitutes the largest plasmid-based set of sequenced yeast mutant alleles to date and, as such, should be singularly useful for gene and genome-wide functional analysis.
View details for DOI 10.1101/gr.2875304
View details for Web of Science ID 000224405900017
View details for PubMedID 15466296
Genomic analysis of regulatory network dynamics reveals large topological changes
2004; 431 (7006): 308-312
Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here--particularly the large-scale topological changes and hub transience--will apply to other biological networks, including complex sub-systems in higher eukaryotes.
View details for DOI 10.1038/nature02782
View details for Web of Science ID 000223864000041
View details for PubMedID 15372033
Major molecular differences between mammalian sexes are involved in drug metabolism and renal function
2004; 6 (6): 791-800
Many anatomical differences exist between males and females; these are manifested on a molecular level by different hormonal environments. Although several molecular differences in adult tissues have been identified, a comprehensive investigation of the gene expression differences between males and females has not been performed. We surveyed the expression patterns of 13,977 mouse genes in male and female hypothalamus, kidney, liver, and reproductive tissues. Extensive differential gene expression was observed not only in the reproductive tissues, but also in the kidney and liver. The differentially expressed genes are involved in drug and steroid metabolism, osmotic regulation, or as yet unresolved cellular roles. In contrast, very few molecular differences were observed between the male and female hypothalamus in both mice and humans. We conclude that there are persistent differences in gene expression between adult males and females. These molecular differences have important implications for the physiological differences between males and females.
View details for Web of Science ID 000222443200012
View details for PubMedID 15177028
CREB binds to multiple loci on human chromosome 22
MOLECULAR AND CELLULAR BIOLOGY
2004; 24 (9): 3804-3814
The cyclic AMP-responsive element-binding protein (CREB) is an important transcription factor that can be activated by hormonal stimulation and regulates neuronal function and development. An unbiased, global analysis of where CREB binds has not been performed. We have mapped for the first time the binding distribution of CREB along an entire human chromosome. Chromatin immunoprecipitation of CREB-associated DNA and subsequent hybridization of the associated DNA to a genomic DNA microarray containing all of the nonrepetitive DNA of human chromosome 22 revealed 215 binding sites corresponding to 192 different loci and 100 annotated potential gene targets. We found binding near or within many genes involved in signal transduction and neuronal function. We also found that only a small fraction of CREB binding sites lay near well-defined 5' ends of genes; the majority of sites were found elsewhere, including introns and unannotated regions. Several of the latter lay near novel unannotated transcriptionally active regions. Few CREB targets were found near full-length cyclic AMP response element sites; the majority contained shorter versions or close matches to this sequence. Several of the CREB targets were altered in their expression by treatment with forskolin; interestingly, both induced and repressed genes were found. Our results provide novel molecular insights into how CREB mediates its functions in humans.
View details for DOI 10.1128/MCB.24.9.3804-3814.2004
View details for Web of Science ID 000220898100021
View details for PubMedID 15082775
Microbial synergy via an ethanol-triggered pathway
MOLECULAR AND CELLULAR BIOLOGY
2004; 24 (9): 3874-3884
We have discovered a microbial interaction between yeast, bacteria, and nematodes. Upon coculturing, Saccharomyces cerevisiae stimulated the growth of several species of Acinetobacter, including A. baumannii, A. haemolyticus, A. johnsonii, and A. radioresistens, as well as several natural isolates of Acinetobacter. This enhanced growth was due to a diffusible factor that was shown to be ethanol by chemical assays and evaluation of strains lacking ADH1, ADH3, and ADH5, as all three genes are involved in ethanol production by yeast. This effect is specific to ethanol: methanol, butanol, and dimethyl sulfoxide were unable to stimulate growth to any appreciable level. Low doses of ethanol not only stimulated growth to a higher cell density but also served as a signaling molecule: in the presence of ethanol, Acinetobacter species were able to withstand the toxic effects of salt, indicating that ethanol alters cell physiology. Furthermore, ethanol-fed A. baumannii displayed increased pathogenicity when confronted with a predator, Caenorhabditis elegans. Our results are consistent with the concept that ethanol can serve as a signaling molecule which can affect bacterial physiology and survival.
View details for DOI 10.1128/MCB.24.9.3874-3884.2004
View details for Web of Science ID 000220898100027
View details for PubMedID 15082781
Regulation of polarized growth initiation and termination cycles by the polarisome and Cdc42 regulators
JOURNAL OF CELL BIOLOGY
2004; 164 (2): 207-218
The dynamic regulation of polarized cell growth allows cells to form structures of defined size and shape. We have studied the regulation of polarized growth using mating yeast as a model. Haploid yeast cells treated with a high concentration of pheromone form successive mating projections that initiate and terminate growth with regular periodicity. The mechanisms that control the frequency of growth initiation and termination under these conditions are not well understood. We found that the polarisome components Spa2, Pea2, and Bni1 and the Cdc42 regulators Cdc24 and Bem3 control the timing and frequency of projection formation. Loss of polarisome components and mutation of Cdc24 decrease the frequency of projection formation, while loss of Bem3 increases the frequency of projection formation. We found that polarisome components and the cell fusion proteins Fus1 and Fus2 are important for the termination of projection growth. Our results define the first molecular regulators that control the timing of growth initiation and termination during eukaryotic cell differentiation.
View details for DOI 10.1083/jcb.200307065
View details for Web of Science ID 000188370500006
View details for PubMedID 14734532
Fast optimal genome tiling with applications to microarray design and homology search
JOURNAL OF COMPUTATIONAL BIOLOGY
2004; 11 (4): 766-785
In this paper, we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size-bound parameters, we want to find a set of tiles of maximum total weight such that each tile satisfies the size bounds. A solution to this problem is important to a number of computational biology applications such as selecting genomic DNA fragments for PCR-based amplicon microarrays and performing homology searches with long sequence queries. Our goal is to design efficient algorithms with linear or near-linear time and space in the normal range of parameter values for these problems. For this purpose, we first discuss the solution to a basic online interval maximum problem via a sliding-window approach and show how to use this solution in a nontrivial manner for many of the tiling problems introduced. We also discuss NP-hardness results and approximation algorithms for generalizing our basic tiling problem to higher dimensions. Finally, computational results from applying our tiling algorithms to genomic sequences of five model eukaryotes are reported.
View details for Web of Science ID 000223974700015
View details for PubMedID 15579244
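The sliding-window idea mentioned in the abstract above can be illustrated on a much simpler case: choosing a single contiguous tile whose length lies between two bounds and whose total weight is maximal. The sketch below is an illustration only, not the algorithm from the paper; the function name `best_tile` and the parameters `lo` and `hi` are hypothetical. It uses prefix sums plus a monotonic deque, so each position enters and leaves the window at most once and the running time is linear.

```python
from collections import deque
from itertools import accumulate


def best_tile(weights, lo, hi):
    """Return (total, start, end) for the max-sum tile weights[start:end]
    with lo <= end - start <= hi, or None if no tile of length lo fits.

    Illustrative sketch only; names and interface are assumptions.
    """
    if not 1 <= lo <= hi:
        raise ValueError("require 1 <= lo <= hi")
    # prefix[i] is the sum of weights[:i], so a tile sum is prefix[end] - prefix[start].
    prefix = [0, *accumulate(weights)]
    window = deque()  # admissible start indices, prefix values strictly increasing
    best = None
    for end in range(lo, len(weights) + 1):
        start = end - lo  # newest start that still gives length >= lo
        # Keep the deque monotonic: a larger-or-equal prefix can never beat the new start.
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()
        window.append(start)
        # Discard starts that would make the tile longer than hi.
        while window[0] < end - hi:
            window.popleft()
        total = prefix[end] - prefix[window[0]]
        if best is None or total > best[0]:
            best = (total, window[0], end)
    return best
```

For example, with weights `[1, -2, 3, 4, -1, 2]` and tile lengths restricted to 2 or 3, the best tile is `weights[2:4]`, with total weight 7.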
Analyzing antibody specificity with whole proteome microarrays
2003; 21 (12): 1509-1512
Although approximately 10,000 antibodies are available from commercial sources, antibody reagents are still unavailable for most proteins. Furthermore, new applications such as antibody arrays and monoclonal antibody therapeutics have increased the demand for more specific antibodies to reduce cross-reactivity and side effects. An array containing every protein for the relevant organism represents the ideal format for an assay to test antibody specificity, because it allows the simultaneous screening of thousands of proteins for possible cross-reactivity. As an initial test of this approach, we screened 11 polyclonal and monoclonal antibodies to approximately 5,000 different yeast proteins deposited on a glass slide and found that, in addition to recognizing their cognate proteins, the antibodies cross-reacted with other yeast proteins to varying degrees. Some of the interactions of the antibodies with noncognate proteins could be deduced by alignment of the primary amino acid sequences of the antigens and cross-reactive proteins; however, these interactions could not be predicted a priori. Our findings show that proteome array technology has potential to improve antibody design and selection for applications in both medicine and research.
View details for DOI 10.1038/nbt910
View details for Web of Science ID 000186845200031
View details for PubMedID 14608365
Changes in the nutrient content of school lunches: results from the Pathways study
2003; 37 (6): S35-S45
Pathways, a randomized trial, evaluated the effectiveness of a school-based multicomponent intervention to reduce fatness in American-Indian schoolchildren. The goal of the Pathways food service intervention component was to reduce the fat in school lunches to no more than 30% of energy from fat while maintaining recommended levels of calories and key nutrients. The intervention was implemented by school food service staff in intervention schools over a 3-year period. Five consecutive days of school lunch menu items were collected from 20 control and 21 intervention schools at four time periods, and nutrient content was analyzed. There was a significantly greater mean reduction in percent energy from fat and saturated fat in the intervention schools compared to the control schools. Mean percentages of energy from fat decreased from 33.1% at baseline to 28.3% at the end of the study in intervention schools compared to 33.2% at baseline and 32.2% at follow-up in the control schools (P<0.003). There were no statistically significant differences for calories or nutrients between intervention and control schools. The Pathways school food lunch intervention documented the feasibility of successfully lowering the percent of energy from fat, as part of a coordinated obesity prevention program for American-Indian children.
View details for DOI 10.1016/j.ypmed.2003.08.009
View details for Web of Science ID 000187114300005
View details for PubMedID 14636807
Negative regulation of calcineurin signaling by Hrr25p, a yeast homolog of casein kinase I
GENES & DEVELOPMENT
2003; 17 (21): 2698-2708
Calcineurin is a Ca2+/calmodulin-regulated protein phosphatase required for Saccharomyces cerevisiae to respond to a variety of environmental stresses. Calcineurin promotes cell survival during stress by dephosphorylating and activating the Zn-finger transcription factor Crz1p/Tcn1p. Using a high-throughput assay, we screened 119 yeast kinases for their ability to phosphorylate Crz1p in vitro and identified the casein kinase I homolog Hrr25p. Here we show that Hrr25p negatively regulates Crz1p activity and nuclear localization in vivo. Hrr25p binds to and phosphorylates Crz1p in vitro and in vivo. Overexpression of Hrr25p decreases Crz1p-dependent transcription and antagonizes its Ca2+-induced nuclear accumulation. In the absence of Hrr25p, activation of Crz1p by Ca2+/calcineurin is potentiated. These findings represent the first identification of a negative regulator for Crz1p, and establish a novel physiological role for Hrr25p in antagonizing calcineurin signaling.
View details for DOI 10.1101/gad.1140603
View details for Web of Science ID 000186299700011
View details for PubMedID 14597664
Microarrays to characterize protein interactions on a whole-proteome scale
2003; 3 (11): 2190-2199
Protein microarrays contain a defined set of proteins spotted and analyzed at high density, and can be generally classified into two categories: protein profiling arrays and functional protein arrays. Functional protein arrays can be made up of any type of protein, and therefore have a diverse set of useful applications. Advantages of these arrays include low reagent consumption, rapid interpretation of results, and the ability to easily control experimental conditions. The ultimate form of a functional protein array consists of all of the proteins encoded by the genome of an organism; such an array would be the whole proteome equivalent of the whole genome DNA arrays that are now available. While proteome microarrays may not have reached the stage of maturity of DNA microarrays, recent developments have shown that many of the barriers holding back the technology can be overcome. Arrays of this type have already been used to rapidly screen large numbers of proteins simultaneously for biochemical activities, protein-protein interactions, protein-lipid interactions, protein-nucleic acid interactions, and protein-small molecule interactions. Eventually, functional protein arrays will be used to facilitate various steps in the drug discovery and early development processes that are currently bottlenecks in the drug development continuum.
View details for DOI 10.1002/pmic.200300610
View details for Web of Science ID 000186582500015
View details for PubMedID 14595818
A Bayesian networks approach for predicting protein-protein interactions from genomic data
2003; 302 (5644): 449-453
We have developed an approach using Bayesian networks to predict protein-protein interactions genome-wide in yeast. Our method naturally weights and combines into reliable predictions genomic features only weakly associated with interaction (e.g., messenger RNA coexpression, coessentiality, and colocalization). In addition to de novo predictions, it can integrate often noisy, experimental interaction data sets. We observe that at given levels of sensitivity, our predictions are more accurate than the existing high-throughput experimental data sets. We validate our predictions with TAP (tandem affinity purification) tagging experiments. Our analysis, which gives a comprehensive view of yeast interactions, is available at genecensus.org/intint.
View details for Web of Science ID 000185963200044
View details for PubMedID 14564010
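How several weakly informative features can combine into a confident prediction can be sketched with the simplest kind of Bayesian network, a naive Bayes model, in which the features are assumed to be conditionally independent. Under that assumption the posterior odds of interaction equal the prior odds multiplied by one likelihood ratio per feature. This is a minimal sketch, not the model from the paper; the function names and all numeric values below are hypothetical.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Combine evidence under a naive Bayes assumption (conditionally independent features).

    Each likelihood ratio is P(feature value | interacting) / P(feature value | not interacting).
    Work in log space so that many small factors do not underflow.
    """
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    return math.exp(log_odds)


def odds_to_probability(odds):
    """Convert odds in favor of an event into a probability."""
    return odds / (1.0 + odds)


# Hypothetical numbers: prior odds of 1 in 600 and three weak features
# (for example coexpression, coessentiality, colocalization) with ratios 10, 5 and 3.
example_odds = posterior_odds(1 / 600, [10, 5, 3])
example_probability = odds_to_probability(example_odds)
```

With these made-up values the three features together raise the odds by a factor of 150, which no single feature could do on its own.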
Distribution of NF-kappa B-binding sites across human chromosome 22
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2003; 100 (21): 12247-12252
We have mapped the chromosomal binding site distribution of a transcription factor in human cells. The NF-kappaB family of transcription factors plays an essential role in regulating the induction of genes involved in several physiological processes, including apoptosis, immunity, and inflammation. The binding sites of the NF-kappaB family member p65 were determined by using chromatin immunoprecipitation and a genomic microarray of human chromosome 22 DNA. Sites of binding were observed along the entire chromosome in both coding and noncoding regions, with an enrichment at the 5' end of genes. Strikingly, a significant proportion of binding was seen in intronic regions, demonstrating that transcription factor binding is not restricted to promoter regions. NF-kappaB binding was also found at genes whose expression was regulated by tumor necrosis factor alpha, a known inducer of NF-kappaB-dependent gene expression, as well as adjacent to genes whose expression is not affected by tumor necrosis factor alpha. Many of these latter genes are either known to be activated by NF-kappaB under other conditions or are consistent with NF-kappaB's role in the immune and apoptotic responses. Our results suggest that binding is not restricted to promoter regions and that NF-kappaB binding occurs at a significant number of genes whose expression is not altered, thereby suggesting that binding alone is not sufficient for gene activation.
View details for DOI 10.1073/pnas.2135255100
View details for Web of Science ID 000186024300058
View details for PubMedID 14527995
Cytoskeletal activation of a checkpoint kinase
2003; 12 (3): 663-673
The assembly of cytoskeletal structures is coupled to other cellular processes. We have studied the molecular mechanism by which assembly of the yeast septin cytoskeleton is monitored and coordinated with cell cycle progression by analyzing a key regulatory protein kinase, Hsl1, that becomes activated only when the septin cytoskeleton is properly assembled. We first identified a regulatory region of Hsl1 that physically associates with the kinase domain and found that it performs an autoinhibitory function both in vivo and in vitro. Several septin binding domains lie near and overlap the inhibitory domain; these are important for Hsl1 function, and binding of two septins, Cdc11 and Cdc12, relieves the autoinhibition imposed by the kinase inhibitory domain in vitro. Our results suggest that binding to multiple septins activates Hsl1 kinase activity, thereby promoting cell cycle progression. The high conservation of Hsl1 indicates that similar mechanisms may monitor cytoskeletal organization in other eukaryotes.
View details for Web of Science ID 000185613800015
View details for PubMedID 14527412
Specific protein targeting during cell differentiation: Polarized localization of Fus1p during mating depends on Chs5p in Saccharomyces cerevisiae
2003; 2 (4): 821-825
In budding yeast, chs5 mutants are defective in chitin synthesis and cell fusion during mating. Chs5p is a late-Golgi protein required for the polarized transport of the chitin synthase Chs3p to the membrane. Here we show that Chs5p is also essential for the polarized targeting of Fus1p, but not of other cell fusion proteins, to the membrane during mating.
View details for DOI 10.1128/EC.2.4.821-825.2003
View details for Web of Science ID 000184803000018
View details for PubMedID 12912901
ExpressYourself: a modular platform for processing and visualizing microarray data
NUCLEIC ACIDS RESEARCH
2003; 31 (13): 3477-3482
DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Having performed the experiments, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In completely automated fashion, it will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts. The program is freely available for use at http://bioinfo.mbb.yale.edu/expressyourself.
View details for DOI 10.1093/nar/gkg628
View details for Web of Science ID 000183832900040
View details for PubMedID 12824348
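As a rough illustration of what normalizing the Cy5 and Cy3 signals can involve, the sketch below applies one of the simplest schemes, global median centering of log2 ratios. It is not taken from ExpressYourself, whose methods also account for signal intensity and slide location; the function name is hypothetical and the inputs are assumed to be positive, background-corrected intensities.

```python
import math
from statistics import median


def median_normalized_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot, shifted so the median log ratio is zero.

    Assumes most spots are not differentially hybridized, so the median
    ratio reflects dye or scanner bias rather than biology.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of spots")
    log_ratios = [math.log2(red / green) for red, green in zip(cy5, cy3)]
    center = median(log_ratios)
    return [value - center for value in log_ratios]
```

Spots whose centered log ratio remains far from zero are the candidates for differential hybridization; scoring them properly also requires replicate experiments, as the abstract notes.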
Recent developments in analytical and functional protein microarrays
CURRENT OPINION IN MOLECULAR THERAPEUTICS
2003; 5 (3): 271-277
In recent years, the genomes of many different organisms have been fully sequenced and annotated. As a consequence of this information, a number of methods have emerged to study the function of many genes and proteins in parallel. One recent approach for the large-scale analysis of proteins is the use of protein microarrays in which hundreds to thousands of proteins are arrayed and assayed simultaneously. Protein arrays can be used for assessing protein levels and following disease markers, identifying biochemical activities, analyzing post-translational modifications, building interaction networks, and for drug discovery and development. In this review, we discuss the construction of different types of protein arrays, and their numerous and diverse applications.
View details for Web of Science ID 000184024600009
View details for PubMedID 12870437
Genomics - Defining genes in the genomics era
SCIENCE
2003; 300 (5617): 258-260
Molecular dissection of a yeast septin: Distinct domains are required for septin interaction, localization, and function
MOLECULAR AND CELLULAR BIOLOGY
2003; 23 (8): 2762-2777
The septins are a family of cytoskeletal proteins present in animal and fungal cells. They were first identified for their essential role in cytokinesis, but more recently, they have been found to play an important role in many cellular processes, including bud site selection, chitin deposition, cell compartmentalization, and exocytosis. Septin proteins self-associate into filamentous structures that, in yeast cells, form a cortical ring at the mother bud neck. Members of the septin family share common structural domains: a GTPase domain in the central region of the protein, a stretch of basic residues at the amino terminus, and a predicted coiled-coil domain at the carboxy terminus. We have studied the role of each domain in the Saccharomyces cerevisiae septin Cdc11 and found that the three domains are responsible for distinct and sometimes overlapping functions. All three domains are important for proper localization and function in cytokinesis and morphogenesis. The basic region was found to bind the phosphoinositides phosphatidylinositol 4-phosphate and phosphatidylinositol 5-phosphate. The coiled-coil domain is important for interaction with Cdc3 and Bem4. The GTPase domain is involved in Cdc11-septin interaction and targeting to the mother bud neck. Surprisingly, GTP binding appears to be dispensable for Cdc11 function, localization, and lipid binding. Thus, we find that septins are multifunctional proteins with specific domains involved in distinct molecular interactions required for assembly, localization, and function within the cell.
View details for DOI 10.1128/MCB.23.8.2762-2777.2003
View details for Web of Science ID 000182049900012
View details for PubMedID 12665577
Protein analysis on a proteomic scale
2003; 422 (6928): 208-215
The long-term challenge of proteomics is enormous: to define the identities, quantities, structures and functions of complete complements of proteins, and to characterize how these properties vary in different cellular contexts. One critical step in tackling this goal is the generation of sets of clones that express a representative of each protein of a proteome in a useful format, followed by the analysis of these sets on a genome-wide basis. Such studies enable genetic, biochemical and cell biological technologies to be applied on a systematic level, leading to the assignment of biochemical activities, the construction of protein arrays, the identification of interactions, and the localization of proteins within cellular compartments.
View details for DOI 10.1038/nature01512
View details for Web of Science ID 000181488900055
View details for PubMedID 12634794
The transcriptional activity of human Chromosome 22
GENES & DEVELOPMENT
2003; 17 (4): 529-540
A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)(+) RNA. We found that many of the known, related and predicted genes are expressed. More importantly, our study reveals twice as many transcribed bases as have been reported previously. Many of the newly discovered expressed fragments were verified by RNA blot analysis and a novel technique called differential hybridization mapping (DHM). Interestingly, a significant fraction of these novel fragments are expressed antisense to previously annotated introns. The coding potential of these novel expressed regions is supported by their sequence conservation in the mouse genome. This study has greatly increased our understanding of the biological information encoded on a human chromosome. To facilitate the dissemination of these results to the scientific community, we have developed a comprehensive Web resource to present the findings of this study and other features of human Chromosome 22 at http://array.mbb.yale.edu/chr22.
View details for DOI 10.1101/gad.1055203
View details for Web of Science ID 000181276200011
View details for PubMedID 12600945
Protein chip technology
CURRENT OPINION IN CHEMICAL BIOLOGY
2003; 7 (1): 55-63
Microarray technology has become a crucial tool for large-scale and high-throughput biology. It allows fast, easy and parallel detection of thousands of addressable elements in a single experiment. In the past few years, protein microarray technology has shown its great potential in basic research, diagnostics and drug discovery. It has been applied to analyse antibody-antigen, protein-protein, protein-nucleic-acid, protein-lipid and protein-small-molecule interactions, as well as enzyme-substrate interactions. Recent progress in the field of protein chips includes surface chemistry, capture molecule attachment, protein labeling and detection methods, high-throughput protein/antibody production, and applications to analyse entire proteomes.
View details for DOI 10.1016/S1367-5931(02)00005-4
View details for Web of Science ID 000180868900009
View details for PubMedID 12547427
Identification of novel functional elements in the human genome
67th Cold Spring Harbor Symposium on Quantitative Biology
COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT. 2003: 317-322
ANNUAL REVIEW OF BIOCHEMISTRY
2003; 72: 783-812
Fueled by ever-growing DNA sequence information, proteomics, the large-scale analysis of proteins, has become one of the most important disciplines for characterizing gene function, for building functional linkages between protein molecules, and for providing insight into the mechanisms of biological processes in a high-throughput mode. It is now possible to examine the expression of more than 1000 proteins using mass spectrometry technology coupled with various separation methods. High-throughput yeast two-hybrid approaches and analysis of protein complexes using affinity tag purification have yielded valuable protein-protein interaction maps. Large-scale protein tagging and subcellular localization projects have provided considerable information about protein function. Finally, recent developments in protein microarray technology provide a versatile tool to study protein-protein, protein-nucleic acid, protein-lipid, enzyme-substrate, and protein-drug interactions. Other types of microarrays, though not fully developed, also show great potential in diagnostics, protein profiling, and drug identification and validation. This review discusses high-throughput technologies for proteome analysis and their applications. Also discussed are the approaches used for the integrated analysis of the voluminous sets of data generated by proteome analysis conducted on a global scale.
View details for DOI 10.1146/annurev.biochem.72.121801.161511
View details for Web of Science ID 000185092500024
View details for PubMedID 14527327
Proteomic approaches for the global analysis of proteins
2002; 33 (6): 1308-1316
Improvements in technology that allow miniaturization and high-throughput analyses of thousands of genes and gene products have changed the focus and scope of research and development in both academia and industry. It is now possible to study entire proteomes with the goals of elucidating protein expression, subcellular localization, biochemical activities, and their regulation. Alterations in different cell types and conditions and in normal and disease states can be revealed. This wealth of information not only has facilitated our basic understanding of many biological processes but also has enormous potential for drug discovery and development.
View details for Web of Science ID 000179996500022
View details for PubMedID 12503317
Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae
GENES & DEVELOPMENT
2002; 16 (23): 3017-3033
In the yeast Saccharomyces cerevisiae, SBF (Swi4-Swi6 cell cycle box binding factor) and MBF (MluI binding factor) are the major transcription factors regulating the START of the cell cycle, a time just before DNA replication, bud growth initiation, and spindle pole body (SPB) duplication. These two factors bind to the promoters of 235 genes, but bind less than a quarter of the promoters upstream of genes with peak transcript levels at the G1 phase of the cell cycle. Several functional categories, which are known to be crucial for G1/S events, such as SPB duplication/migration and DNA synthesis, are under-represented in the list of SBF and MBF gene targets. SBF binds the promoters of several other transcription factors, including HCM1, PLM2, POG1, TOS4, TOS8, TYE7, YAP5, YHP1, and YOX1. Here, we demonstrate that these factors are targets of SBF using an independent assay. To further elucidate the transcriptional circuitry that regulates the G1-to-S-phase progression, these factors were epitope-tagged and their binding targets were identified by ChIP-chip analysis. These factors bind the promoters of genes with roles in G1/S events including DNA replication, bud growth, and spindle pole complex formation, as well as the general activities of mitochondrial function, transcription, and protein synthesis. Although functional overlap exists between these factors and MBF and SBF, each of these factors has distinct functional roles. Most of these factors bind the promoters of other transcription factors known to be cell cycle regulated or known to be important for cell cycle progression and differentiation processes, indicating that a complex network of transcription factors coordinates the diverse activities that initiate a new cell cycle.
View details for DOI 10.1101/gad.1039602
View details for Web of Science ID 000179649300005
View details for PubMedID 12464632
The alpha-factor receptor C-terminus is important for mating projection formation and orientation in Saccharomyces cerevisiae
CELL MOTILITY AND THE CYTOSKELETON
2002; 53 (4): 251-266
Successful mating of MATa Saccharomyces cerevisiae cells is dependent on Ste2p, the alpha-factor receptor. Besides receiving the pheromone signal and transducing it through the G-protein coupled MAP kinase pathway, Ste2p is active in the establishment and orientation of the mating projection. We investigated the role of the carboxyl terminus of the receptor in mating projection formation and orientation using a spatial gradient assay. Cells carrying the ste2-T326 mutation, truncating 105 of the 135 amino acids in the receptor tail including a motif necessary for its ligand-mediated internalization, display slow onset of projection formation, abnormal shmoo morphology, and reduced ability to orient the mating projection toward a pheromone source. This reduction was due to the increased loss of mating projection orientation in a pheromone gradient. Cells with a mutated endocytosis motif were defective in reorientation in a pheromone gradient. ste2-Delta296 cells, which carry a complete truncation of the Ste2p tail, exhibit a severe defect in projection formation, and those projections that do form are unable to orient in a pheromone gradient. These results suggest a complex role for the Ste2p carboxy-terminal tail in the formation, orientation, and directional adjustment of the mating projection, and that endocytosis of the receptor is important for this process. In addition, mutations in RSR1/BUD1 and SPA2, genes necessary for budding polarity, exhibited little or no defect in formation or orientation of mating projections. We conclude that mating projection orientation depends upon the carboxyl terminus of the pheromone receptor and not the directional machinery used in budding.
View details for DOI 10.1002/cm.10073
View details for Web of Science ID 000179314000001
View details for PubMedID 12378535
A novel mitochondrial protein, Tar1p, is encoded on the antisense strand of the nuclear 25S rDNA
GENES & DEVELOPMENT
2002; 16 (21): 2755-2760
In eukaryotes, it is widely assumed that genes coding for proteins and structural RNAs do not overlap. Using a transposon-tagging strategy to globally analyze the Saccharomyces cerevisiae genome for expressed genes, we identified multiple insertions in an open reading frame that is contained fully within and transcribed antisense to the 25S rRNA gene in the nuclear rDNA repeat region on Chromosome XII. Expression of this gene, TAR1 (Transcript Antisense to Ribosomal RNA), can be detected at the RNA and protein levels, and the primary sequence of the corresponding 124-amino-acid protein is conserved in several yeast species. Tar1p was found to localize to mitochondria, and overexpression of the protein suppresses the respiration-deficient petite phenotype of a point mutation in mitochondrial RNA polymerase that affects mitochondrial gene expression and mtDNA stability. These findings indicate that coding information for protein and structural RNAs can overlap, raising issues regarding the coevolution of such complex genes, and also suggest that rDNA transcription and mitochondrial function are coordinately regulated in eukaryotic cells.
View details for DOI 10.1101/gad.1035002
View details for Web of Science ID 000179027900004
View details for PubMedID 12414727
A dynamic approach to mapping coordinates between microplates and microarrays
JOURNAL OF BIOMEDICAL INFORMATICS
2002; 35 (5-6): 306-312
The retrieval of useful data from spotted microarray slides requires keeping track of which microplate well and DNA sample correspond to each spot on each array slide. Existing approaches are closely coupled with the type of arrayer in use and are computer operating-system-specific. To support the microarray research community at large, whose members use different arrayers and computer platforms, increased flexibility, generality, and portability of these approaches are required. In this paper, we describe a general algorithm that correlates the well positions of DNA samples in each microplate to the positions of the spots on each array slide. Based on this algorithm, we have implemented a flexible and platform-independent program named MicroArray Convolutor (MAC) that provides a Web solution allowing the user to: (a) import a text file that identifies the DNA samples and their well locations, (b) select a transformation method that converts data in 96-well plate format into 384-well plate format, and (c) specify the output format of the array lists dependent on the configuration of the array platform as well as the downstream analysis software chosen for the array. MAC and its source code can be accessed via the following Web address: http://ymd.med.yale.edu/kei-cgi/kc_mac_dev8.pl.
View details for DOI 10.1016/S1532-0464(03)00033-9
View details for Web of Science ID 000184879000004
View details for PubMedID 12968779
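The 96-well to 384-well transformation mentioned in the MAC abstract above can be illustrated with a minimal sketch. It assumes the common quadrant-interleaved layout, in which four 96-well plates are merged into one 384-well plate; the abstract does not say which transformation methods MAC offers, so the layout, the function names and the quadrant numbering here are illustrative assumptions and not MAC's actual algorithm.

```python
import string

# Dimensions of a standard 96-well plate: rows A-H, columns 1-12.
ROWS_96, COLS_96 = 8, 12


def parse_well(label):
    """Split a well label such as 'B7' into zero-based (row, column)."""
    row = string.ascii_uppercase.index(label[0].upper())
    col = int(label[1:]) - 1
    return row, col


def format_well(row, col):
    """Turn zero-based (row, column) back into a label such as 'B7'."""
    return f"{string.ascii_uppercase[row]}{col + 1}"


def map_96_to_384(quadrant, well_96):
    """Map a well of one of four 96-well plates onto a 384-well plate.

    Assumed (hypothetical) convention: quadrant 0 fills the odd rows and
    odd columns of the 384-well plate, quadrant 1 the odd rows and even
    columns, quadrant 2 the even rows and odd columns, and quadrant 3
    the even rows and even columns.
    """
    if quadrant not in range(4):
        raise ValueError("quadrant must be 0, 1, 2 or 3")
    row, col = parse_well(well_96)
    if not (0 <= row < ROWS_96 and 0 <= col < COLS_96):
        raise ValueError(f"{well_96} is not a valid 96-well position")
    return format_well(2 * row + quadrant // 2, 2 * col + quadrant % 2)
```

For example, under this assumed layout `map_96_to_384(3, "H12")` gives the last well of the 384-well plate, `"P24"`. A different arrayer or pipetting scheme would need a different mapping function, which is the kind of variation the abstract says MAC lets the user select.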
Global analysis of gene expression in yeast.
Functional & integrative genomics
2002; 2 (4-5): 171-180
In the past decade, there has been an intense effort to comprehensively catalogue the expressed genes in the yeast Saccharomyces cerevisiae and to determine the absolute and relative abundance of transcript and protein levels under different cellular conditions. Several methods have been developed to monitor gene expression: DNA microarray analysis, Serial Analysis of Gene Expression (SAGE), kinetic RT-PCR and monitoring expression of beta-galactosidase fusion proteins. These techniques have been used to measure transcript and protein abundance in different developmental states and under different environmental stimuli. A wealth of expression data for yeast is now publicly available through several web sites. The expression information that exists has the obvious benefits of providing a better understanding of the gene expression patterns that accompany changes in a yeast cell's environmental and developmental states. This data has also, however, provided clues to unraveling the complicated questions surrounding gene regulation: why and how is gene expression controlled?
View details for PubMedID 12192590
Functional profiling of the Saccharomyces cerevisiae genome
2002; 418 (6896): 387-391
Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.
View details for DOI 10.1038/nature00935
View details for Web of Science ID 000177009700029
View details for PubMedID 12140549
Microtubule capture by the cleavage apparatus is required for proper spindle positioning in yeast
GENES & DEVELOPMENT
2002; 16 (13): 1627-1639
Cell division is the result of two major cytoskeletal events: partition of the chromatids by the mitotic spindle and cleavage of the cell by the cytokinetic apparatus. Spatial coordination of these events ensures that each daughter cell inherits a nucleus. Here we show that, in budding yeast, capture and shrinkage of astral microtubules at the bud neck is required to position the spindle relative to the cleavage apparatus. Capture required the septins and the microtubule-associated protein Kar9. Like Kar9-defective cells, cells lacking the septin ring failed to position their spindle correctly and showed an increased frequency of nuclear missegregation. Microtubule attachment at the bud neck was followed by shrinkage and a pulling action on the spindle. Enhancement of microtubule shrinkage at the bud neck required the Par-1-related, septin-dependent kinases (SDK) Hsl1 and Gin4. Neither the formin Bnr1 nor the actomyosin contractile ring was required for either microtubule capture or microtubule shrinkage. Together, our results indicate that septins and septin-dependent kinases may coordinate microtubule and actin functions in cell division.
View details for DOI 10.1101/gad.222602
View details for Web of Science ID 000176679100004
View details for PubMedID 12101122
Large-scale identification of genes important for apical growth in Saccharomyces cerevisiae by directed allele replacement technology (DART) screening.
Functional & integrative genomics
2002; 1 (6): 345-356
In Saccharomyces cerevisiae, apical bud growth occurs for a brief period in G1 when the deposition of membrane and cell wall is restricted to the tip of the growing bud. To identify genes important for apical bud growth, we have utilized a novel transposon-based mutagenesis system termed DART (Directed Allele Replacement Technology) that allows the rapid transfer of defined insertion alleles into any strain background. A total of 4,810 insertion alleles affecting 1,392 different yeast genes were transferred into a cdc34-2 mutant strain that arrests in the apical growth phase when grown at the restrictive temperature of 37 degrees C. We identified 29 insertion alleles, containing mutations in 17 different genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, SEC22, FAB1, VPS36, VID22, RAS2, ECM33, OPI3, API1/YDR372c, API2/YDR525w, API3/YKR020w, and API4/YNL051w), which alter the elongated bud morphology of cdc34-2 cells arrested in the apical growth phase. Upon treatment with mating pheromone at 25 degrees C, cells containing insertion alleles affecting ten of these genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, FAB1, VPS36, VID22, and API2/YDR525w) form abnormal mating projections. Additionally, cells containing insertion alleles affecting SEC22, RAS2, API1/YDR372c, API3/YKR020w, and API4/YNL051w display severe mating projection formation defects at the elevated temperature of 37 degrees C. DART mutagenesis has many advantages over traditional mutagenesis methods and will be a useful tool for dissecting gene networks important for biological processes.
View details for PubMedID 11957109
Bud-site selection and cell polarity in budding yeast
CURRENT OPINION IN MICROBIOLOGY
2002; 5 (2): 179-186
Polarized growth involves a hierarchy of events such as selection of the growth site, polarization of the cytoskeleton to the selected growth site, and transport of secretory vesicles containing components required for growth. The budding yeast Saccharomyces cerevisiae is an excellent model system for the study of polarized cell growth. A large number of proteins have been found to be involved in these processes, although their mechanisms of action are not yet well-understood. Recent discoveries have helped elucidate many of the processes involved in cell polarity and bud-site selection in yeast and have modified the traditional view of cellular structures involved in these processes. This review focuses on recent advances on the roles of cortical tags, GTPases and the cytoskeleton in the generation and maintenance of cell polarity in yeast.
View details for Web of Science ID 000175460500009
View details for PubMedID 11934615
'Omic' approaches for unraveling signaling networks
CURRENT OPINION IN CELL BIOLOGY
2002; 14 (2): 173-179
Signaling pathways are crucial for cell differentiation and response to cellular environments. Recently, a large number of approaches for the global analysis of genes and proteins have been described. These have provided important new insights into the components of different pathways and the molecular and cellular responses of these pathways. This review covers genomic and proteomic (collectively referred to as "omic") approaches for the global analysis of cell signaling, including gene expression profiling and analysis, protein-protein interaction methods, protein microarrays, mass spectroscopy and gene-disruption and engineering approaches.
View details for DOI 10.1016/S0955-0674(02)00315-0
View details for Web of Science ID 000174193300007
View details for PubMedID 11891116
Carbohydrate analysis prepares to enter the "omics" era
CHEMISTRY & BIOLOGY
2002; 9 (4): 400-401
In this issue, Houseman and Mrksich describe a carbohydrate array preparation method that can be used to analyze protein-carbohydrate interactions and to characterize the substrate specificity of a carbohydrate-modifying enzyme. Carbohydrate chips were prepared by a novel procedure that allows the covalent attachment of carbohydrate-diene conjugates to a specially engineered monolayer surface. The surface presents a precisely controllable ratio of reactive benzoquinone and inert ethylene glycol groups. Nonspecific adsorption of proteins to the surface is extremely low, and the surface is compatible with popular detection techniques. The immobilization technique was demonstrated to be compatible with recently developed automated solid phase carbohydrate synthesis methods, paving the way for the development of highly complex carbohydrate arrays.
View details for Web of Science ID 000175379100002
View details for PubMedID 11983329
Subcellular localization of the yeast proteome
GENES & DEVELOPMENT
2002; 16 (6): 707-719
Protein localization data are a valuable information resource helpful in elucidating eukaryotic protein function. Here, we report the first proteome-scale analysis of protein localization within any eukaryote. Using directed topoisomerase I-mediated cloning strategies and genome-wide transposon mutagenesis, we have epitope-tagged 60% of the Saccharomyces cerevisiae proteome. By high-throughput immunolocalization of tagged gene products, we have determined the subcellular localization of 2744 yeast proteins. Extrapolating these data through a computational algorithm employing Bayesian formalism, we define the yeast localizome (the subcellular distribution of all 6100 yeast proteins). We estimate the yeast proteome to encompass approximately 5100 soluble proteins and >1000 transmembrane proteins. Our results indicate that 47% of yeast proteins are cytoplasmic, 13% mitochondrial, 13% exocytic (including proteins of the endoplasmic reticulum and secretory vesicles), and 27% nuclear/nucleolar. A subset of nuclear proteins was further analyzed by immunolocalization using surface-spread preparations of meiotic chromosomes. Of these proteins, 38% were found associated with chromosomal DNA. As determined from phenotypic analyses of nuclear proteins, 34% are essential for spore viability--a percentage nearly twice as great as that observed for the proteome as a whole. In total, this study presents experimentally derived localization data for 955 proteins of previously unknown function: nearly half of all functionally uncharacterized proteins in yeast. To facilitate access to these data, we provide a searchable database featuring 2900 fluorescent micrographs at http://ygac.med.yale.edu.
View details for DOI 10.1101/gad.970902
View details for Web of Science ID 000174516500007
View details for PubMedID 11914276
GATA-1 binding sites mapped in the beta-globin locus by using mammalian ChIP-chip analysis
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2002; 99 (5): 2924-2929
The expression of the beta-like globin genes is intricately regulated by a series of both general and tissue-restricted transcription factors. The hematopoietic lineage-specific transcription factor GATA-1 is important for erythroid differentiation and has been implicated in regulating the expression of the erythroid-specific genes including the genes of the beta-globin locus. In the human erythroleukemic K562 cell line, only one DNA region has been identified previously as a putative site of GATA-1 interaction by in vivo footprinting studies. We mapped GATA-1 binding throughout the beta-globin locus by using ChIP-chip analysis of K562 cells. We found that GATA-1 binds in a region encompassing the HS2 core element, as was previously identified, and an additional region of GATA-1 binding upstream of the gammaG gene. This approach will be of general utility for mapping transcription factor binding sites within the beta-globin locus and throughout the genome.
View details for DOI 10.1073/pnas.052706999
View details for Web of Science ID 000174284600061
View details for PubMedID 11867748
A question of size: the eukaryotic proteome and the problems in defining it
NUCLEIC ACIDS RESEARCH
2002; 30 (5): 1083-1090
We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the 'current' proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences ('the orfome'). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes ('dead' genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orfome)
View details for Web of Science ID 000174229900001
View details for PubMedID 11861898
A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution
JOURNAL OF MOLECULAR BIOLOGY
2002; 316 (3): 409-419
We surveyed the sequenced Saccharomyces cerevisiae genome (strain S288C) comprehensively for open reading frames (ORFs) that could encode full-length proteins but contain obvious mid-sequence disablements (frameshifts or premature stop codons). These pseudogenic features are termed disabled ORFs (dORFs). Using homology to annotated yeast ORFs and non-yeast proteins plus a simple region extension procedure, we have found 183 dORFs. Combined with the 38 existing annotations for potential dORFs, we have a total pool of up to 221 dORFs, corresponding to less than approximately 3% of the proteome. Additionally, we found 20 pairs of annotated ORFs for yeast that could be merged into a single ORF (termed a mORF) by read-through of the intervening stop codon, and may comprise a complete ORF in other yeast strains. Focussing on a core pool of 98 dORFs with a verifying protein homology, we find that most dORFs are substantially decayed, with approximately 90% having two or more disablements, and approximately 60% having four or more. dORFs are much more yeast-proteome specific than live yeast genes (having about half the chance that they are related to a non-yeast protein). They show a dramatically increased density at the telomeres of chromosomes, relative to genes. A microarray study shows that some dORFs are expressed even though they carry multiple disablements, and thus may be more resistant to nonsense-mediated decay. Many of the dORFs may be involved in responding to environmental stresses, as the largest functional groups include growth inhibition, flocculation, and the SRP/TIP1 family. Our results have important implications for proteome evolution. The characteristics of the dORF population suggest the sorts of genes that are likely to fall in and out of usage (and vary in copy number) in a strain-specific way and highlight the role of subtelomeric regions in engendering this diversity. 
Our results also have important implications for the effects of the [PSI+] prion. The dORFs disabled by only a single stop and the mORFs (together totalling 35) provide an estimate for the extent of the sequence population that can be resurrected readily through the demonstrated ability of the [PSI+] prion to cause nonsense-codon read-through. Also, the dORFs and mORFs that we find have properties (e.g. growth inhibition, flocculation, vanadate resistance, stress response) that are potentially related to the ability of [PSI+] to engender substantial phenotypic variation in yeast strains under different environmental conditions. (See genecensus.org/pseudogene for further information.)
View details for DOI 10.1006/jmbi.2001.5343
View details for Web of Science ID 000174216400001
View details for PubMedID 11866506
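One of the two disablement types named in the abstract above, the premature stop codon, is simple enough to illustrate in code. The sketch below is only an assumption-laden illustration: it scans a single reading frame for in-frame stop codons that occur before the final codon. It does not detect frameshifts and does not use the homology searches or region extension procedure the authors describe, and the function name `premature_stops` is hypothetical.

```python
# Stop codons of the standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq, frame=0):
    """Return zero-based positions of in-frame stop codons before the last codon.

    `seq` is a DNA coding-strand sequence; `frame` (0, 1 or 2) is the
    offset of the first codon. Incomplete trailing codons are ignored.
    """
    seq = seq.upper()
    # Collect (position, codon) pairs for every complete codon in the frame.
    codons = [(i, seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3)]
    # The final codon is the expected terminator, so it is excluded.
    return [pos for pos, codon in codons[:-1] if codon in STOP_CODONS]
```

Calling `premature_stops("ATGAAATAGGGGTAA")` reports one mid-sequence stop at position 6, whereas an intact sequence such as `"ATGAAAGGGTAA"` yields an empty list. Counting the entries per candidate gives a rough analogue of the "number of disablements" discussed in the abstract, restricted to stop codons.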
An integrated approach for finding overlooked genes in yeast
2002; 20 (1): 58-63
We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach integrating methods of gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process in which expressed sequences are first trapped using a modified transposon that produces protein fusions to beta-galactosidase (beta-gal); non-annotated open reading frames (ORFs) translated as beta-gal chimeras are selected as a candidate pool of potential genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. In complement to this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. As these methods are capable of identifying both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.
View details for Web of Science ID 000173031600037
View details for PubMedID 11753363
- Insertional mutagenesis: Transposon-insertion libraries as mutagens in yeast GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B 2002; 350: 219-229
- ChIP-chip: A genomic approach for identifying transcription factor binding sites GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B 2002; 350: 469-483
The TRIPLES database: a community resource for yeast molecular biology
NUCLEIC ACIDS RESEARCH
2002; 30 (1): 73-75
TRIPLES is a web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces cerevisiae, a relational database housing nearly half a million data points generated from an ongoing study using large-scale transposon mutagenesis to characterize gene function in yeast. At present, TRIPLES contains three principal data sets (i.e. phenotypic data, protein localization data and expression data) for over 3500 annotated yeast genes as well as several hundred non-annotated open reading frames. In addition, the TRIPLES web site provides online order forms linked to each data set so that users may request any strain or reagent generated from this project free of charge. In response to user requests, the TRIPLES web site has undergone several recent modifications. Our localization data have been supplemented with approximately 500 fluorescent micrographs depicting actual staining patterns observed upon indirect immunofluorescence analysis of indicated epitope-tagged proteins. These localization data, as well as all other data sets within TRIPLES, are now available in full as tab-delimited text. To accommodate increased reagent requests, all orders are now cataloged in a separate database, and users are notified immediately of order receipt and shipment. Also, TRIPLES is one of five sites incorporated into the new functional analysis tool Function Junction provided by the Saccharomyces Genome Database. TRIPLES may be accessed from the Yale Genome Analysis Center (YGAC) homepage at http://ygac.med.yale.edu.
View details for Web of Science ID 000173077100018
View details for PubMedID 11752258
YMD: A microarray database for large-scale gene expression analysis
Annual Symposium of the American-Medical-Informatics-Association
HANLEY & BELFUS INC MED PUBLISHERS. 2002: 140–144
The use of microarray technology to perform parallel analysis of the expression pattern of a large number of genes in a single experiment has created a new frontier of medical research. The vast amount of gene expression data generated from multiple microarray experiments requires a robust database system that allows efficient data storage, retrieval, secure access, data dissemination, and integrated data analyses. To address the growing needs of microarray researchers at Yale and their collaborators, we have built the Yale Microarray Database (YMD). YMD is Web-accessible with the following features: (i) a Web program that tracks DNA samples between source plates and arrays, (ii) the capability of finding common genes/clones across different array platforms, (iii) an image file server, (iv) laboratory-based user management and access privileges, (v) project management, (vi) template data entry, (vii) linking gene expression data to annotation databases for functional analysis. YMD is currently being used on a pilot basis by several laboratories for different organisms and array platforms.
View details for Web of Science ID 000189418100029
View details for PubMedID 12463803
Phosphorylation of gamma-tubulin regulates microtubule organization in budding yeast
2001; 1 (5): 621-631
gamma-Tubulin is essential for microtubule nucleation in yeast and other organisms; whether this protein is regulated in vivo has not been explored. We show that the budding yeast gamma-tubulin (Tub4p) is phosphorylated in vivo. Hyperphosphorylated Tub4p isoforms are restricted to G1. A conserved tyrosine near the carboxy terminus (Tyr445) is required for phosphorylation in vivo. A point mutation, Tyr445 to Asp, causes cells to arrest prior to anaphase. The frequency of new microtubules appearing in the SPB region and the number of microtubules are increased in tub4-Y445D cells, suggesting this mutation promotes microtubule assembly. These data suggest that modification of gamma-tubulin is important for controlling microtubule number, thereby influencing microtubule organization and function during the yeast cell cycle.
View details for Web of Science ID 000175301700008
View details for PubMedID 11709183
A filamentous growth response mediated by the yeast mating pathway
2001; 159 (3): 919-928
Haploid cells of the budding yeast Saccharomyces cerevisiae respond to mating pheromones by arresting their cell-division cycle in G1 and differentiating into a cell type capable of locating and fusing with mating partners. Yeast cells undergo chemotactic cell surface growth when pheromones are present above a threshold level for morphogenesis; however, the morphogenetic responses of cells to levels of pheromone below this threshold have not been systematically explored. Here we show that MATa haploid cells exposed to low levels of the alpha-factor mating pheromone undergo a novel cellular response: cells modulate their division patterns and cell shape, forming colonies composed of filamentous chains of cells. Time-lapse analysis of filament formation shows that its dynamics are distinct from that of pseudohyphal growth; during pheromone-induced filament formation, daughter cells are delayed relative to mother cells with respect to the timing of bud emergence. Filament formation requires the RSR1(BUD1), BUD8, SLK1/BCK1, and SPA2 genes and many elements of the STE11/STE7 MAP kinase pathway; this response is also independent of FAR1, a gene involved in orienting cell polarization during the mating response. We suggest that mating yeast cells undergo a complex response to low levels of pheromone that may enhance the ability of cells to search for mating partners through the modification of cell shape and alteration of cell-division patterns.
View details for Web of Science ID 000172665800002
View details for PubMedID 11729141
Global analysis of protein activities using proteome chips
2001; 293 (5537): 2101-2105
To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications.
View details for Web of Science ID 000171028700077
View details for PubMedID 11474067
A genomic study of the bipolar bud site selection pattern in Saccharomyces cerevisiae
MOLECULAR BIOLOGY OF THE CELL
2001; 12 (7): 2147-2170
A genome-wide screen of 4168 homozygous diploid yeast deletion strains has been performed to identify nonessential genes that participate in the bipolar budding pattern. By examining bud scar patterns representing the sites of previous cell divisions, 127 mutants representing three different phenotypes were found: unipolar, axial-like, and random. From this screen, 11 functional classes of known genes were identified, including those involved in actin-cytoskeleton organization, general bud site selection, cell polarity, vesicular transport, cell wall synthesis, protein modification, transcription, nuclear function, translation, and other functions. Four characterized genes that were not known previously to participate in bud site selection were also found to be important for the haploid axial budding pattern. In addition to known genes, we found 22 novel genes (20 are designated BUD13-BUD32) important for bud site selection. Deletion of one resulted in unipolar budding exclusively from the proximal pole, suggesting that this gene plays an important role in diploid distal budding. Mutations in 20 other novel BUD genes produced a random budding phenotype and one produced an axial-like budding defect. Several of the novel Bud proteins were fused to green fluorescence protein; two proteins were found to localize to sites of polarized cell growth (i.e., the bud tip in small budded cells and the neck in cells undergoing cytokinesis), similar to that postulated for the bipolar signals and proteins that target cell division site tags to their proper location in the cell. Four others localized to the nucleus, suggesting that they play a role in gene expression. The bipolar distal marker Bud8 was localized in a number of mutants; many showed an altered Bud8-green fluorescence protein localization pattern. 
Through the genome-wide identification and analysis of different mutants involved in bipolar bud site selection, an integrated pathway for this process is presented in which proximal and distal bud site selection tags are synthesized and localized at their appropriate poles, thereby directing growth at those sites. Genome-wide screens of defined collections of mutants hold significant promise for dissecting many biological processes in yeast.
View details for Web of Science ID 000170350300019
View details for PubMedID 11452010
Genome-wide transposon mutagenesis in yeast.
Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.]
2001; Chapter 13: Unit 13.3
This unit provides comprehensive protocols for the use of insertional libraries generated by shuttle mutagenesis. From the basic protocol, a small aliquot of insertional library DNA may be used to mutagenize yeast, producing strains containing a single transposon insertion within a transcribed and translated region of the genome. This transposon-mutagenized bank of yeast strains may be screened for any desired mutant phenotype. Alternatively, since the transposon contains a reporter gene lacking its start codon and promoter, transposon-tagged strains may also be screened for specific patterns of gene expression. Strains of interest may be characterized by vectorette PCR (protocol provided) in order to locate the precise genomic site of transposon insertion within each mutant. A method by which Cre/lox recombination may be used to reduce the transposon in yeast to a small insertion element encoding an epitope tag is described. This tag serves as a tool by which transposon-mutagenized gene products may be analyzed further (e.g., localized to a discrete subcellular site).
View details for DOI 10.1002/0471142727.mb1303s51
View details for PubMedID 18265099
Emerging technologies in yeast genomics
NATURE REVIEWS GENETICS
2001; 2 (4): 302-312
The genomic revolution is undeniable: in the past year alone, the term 'genomics' was found in nearly 500 research articles, and at least 6 journals are devoted solely to genomic biology. More than just a buzzword, molecular biology has genuinely embraced genomics (the systematic, large-scale study of genomes and their functions). With its facile genetics, the budding yeast Saccharomyces cerevisiae has emerged as an important model organism in the development of many current genomic methodologies. These techniques have greatly influenced the manner in which biology is studied in yeast and in other organisms. In this review, we summarize the most promising technologies in yeast genomics.
View details for Web of Science ID 000167837900015
View details for PubMedID 11283702
The Cbk1p pathway is important for polarized cell growth and cell separation in Saccharomyces cerevisiae
MOLECULAR AND CELLULAR BIOLOGY
2001; 21 (7): 2449-2462
During the early stages of budding, cell wall remodeling and polarized secretion are concentrated at the bud tip (apical growth). The CBK1 gene, encoding a putative serine/threonine protein kinase, was identified in a screen designed to isolate mutations that affect apical growth. Analysis of cbk1Delta cells reveals that Cbk1p is required for efficient apical growth, proper mating projection morphology, bipolar bud site selection in diploid cells, and cell separation. Epitope-tagged Cbk1p localizes to both sides of the bud neck in late anaphase, just prior to cell separation. CBK1 and another gene, HYM1, were previously identified in a screen for genes involved in transcriptional repression and proposed to function in the same pathway. Deletion of HYM1 causes phenotypes similar to those observed in cbk1Delta cells and disrupts the bud neck localization of Cbk1p. Whole-genome transcriptional analysis of cbk1Delta suggests that the kinase regulates the expression of a number of genes with cell wall-related functions, including two genes required for efficient cell separation: the chitinase-encoding gene CTS1 and the glucanase-encoding gene SCW11. The Ace2p transcription factor is required for expression of CTS1 and has been shown to physically interact with Cbk1p. Analysis of ace2Delta cells reveals that Ace2p is required for cell separation but not for polarized growth. Our results suggest that Cbk1p and Hym1p function to regulate two distinct cell morphogenesis pathways: an ACE2-independent pathway that is required for efficient apical growth and mating projection formation and an ACE2-dependent pathway that is required for efficient cell separation following cytokinesis. Cbk1p is most closely related to the Neurospora crassa Cot-1; Schizosaccharomyces pombe Orb6; Caenorhabditis elegans, Drosophila, and human Ndr; and Drosophila and mammalian WARTS/LATS kinases. Many Cbk1-related kinases have been shown to regulate cellular morphology.
View details for Web of Science ID 000167451500019
View details for PubMedID 11259593
Protein arrays and microarrays
CURRENT OPINION IN CHEMICAL BIOLOGY
2001; 5 (1): 40-45
In the past, studies of protein activities have focused on studying a single protein at a time, which is often time-consuming and expensive. Recently, with the sequencing of entire genomes, large-scale proteome analysis has begun. Arrays of proteins have been used for the determination of subcellular localization, analysis of protein-protein interactions and biochemical analysis of protein function. New protein-microarray technologies have been introduced that enable the high-throughput analysis of protein activities. These have the potential to revolutionize the analysis of entire proteomes.
View details for Web of Science ID 000167051500006
View details for PubMedID 11166646
Large-scale mutagenesis: yeast genetics in the genome era
CURRENT OPINION IN BIOTECHNOLOGY
2001; 12 (1): 28-34
The completion of the DNA sequence of the budding yeast Saccharomyces cerevisiae resulted in the identification of a large number of genes. However, the function of most of these genes is not known. One of the best ways to determine gene function is to carry out mutational and phenotypic analysis. In recent years, several approaches have been developed for the mutational analysis of yeast genes on a large scale. These include transposon-based insertional mutagenesis, and systematic deletions using PCR-based approaches. These projects have produced collections of yeast strains and plasmid alleles that can be screened using novel approaches. Analysis of these collections by the scientific community promises to reveal a great deal of biological information about this organism.
View details for Web of Science ID 000167209900005
View details for PubMedID 11167069
A metadata framework for interoperating heterogeneous genome data using XML
Annual Symposium of the American-Medical-Informatics-Association (AMIA 2001)
BMJ PUBLISHING GROUP. 2001: 110–114
The rapid advances in the Human Genome Project and genomic technologies have produced massive amounts of data populated in a large number of network-accessible databases. These technological advances and the associated data can have a great impact on biomedicine and healthcare. To answer many of the biologically or medically important questions, researchers often need to integrate data from a number of independent but related genome databases. One common practice is to download data sets (text files) from various genome Web sites and process them by some local programs. One main problem with this approach is that these programs are written on a case-by-case basis because the data sets involved are heterogeneous in structure. To address this problem, we define metadata that maps these heterogeneously structured files into a common eXtensible Markup Language (XML) structure to facilitate data interoperation. We illustrate this approach by interoperating two sets of essential yeast genes that are stored in two yeast genome databases (MIPS and YPD).
View details for Web of Science ID 000172263400024
View details for PubMedID 11825164
The carboxy terminus of Tub4p is required for gamma-tubulin function in budding yeast
JOURNAL OF CELL SCIENCE
2000; 113 (21): 3871-3882
The role of gamma-tubulin in microtubule nucleation is well established, however, its function in other aspects of microtubule organization is unknown. The carboxy termini of alpha/beta-tubulins influence the assembly and stability of microtubules. We investigated the role of the carboxy terminus of yeast gamma-tubulin (Tub4p) in microtubule organization. This region consists of a conserved domain (DSYLD), and acidic tail. Cells expressing truncations lacking the DSYLD domain, tail or both regions are temperature sensitive for growth. Growth defects of tub4 mutants lacking either or both carboxy-terminal domains are suppressed by the microtubule destabilizing drug benomyl. tub4 carboxy-terminal mutants arrest as large budded cells with short bipolar spindles positioned at the bud neck. Electron microscopic analysis of wild-type and CTR mutant cells reveals that SPBs are tightly associated with the bud neck/cortex by cytoplasmic microtubules in mutants lacking the tail region (tub4-delta 444, tub4-delta 448). Mutants lacking the DSYLD residues (tub4-delta 444, tub4-delta DSYLD) form many cytoplasmic microtubules. We propose that the carboxy terminus of Tub4p is required for re-organization of the microtubules upon completion of nuclear migration, and facilitates spindle elongation into the bud.
View details for Web of Science ID 000165515000019
View details for PubMedID 11034914
Analysis of yeast protein kinases using protein chips
2000; 26 (3): 283-289
We have developed a novel protein chip technology that allows the high-throughput analysis of biochemical activities, and used this approach to analyse nearly all of the protein kinases from Saccharomyces cerevisiae. Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides. The high density and small size of the wells allows for high-throughput batch processing and simultaneous analysis of many individual samples. Only small amounts of protein are required. Of 122 known and predicted yeast protein kinases, 119 were overexpressed and analysed using 17 different substrates and protein chips. We found many novel activities and that a large number of protein kinases are capable of phosphorylating tyrosine. The tyrosine phosphorylating enzymes often share common amino acid residues that lie near the catalytic region. Thus, our study identified a number of novel features of protein kinases and demonstrates that protein chip technology is useful for high-throughput screening of protein biochemical activity.
View details for Web of Science ID 000165176500015
View details for PubMedID 11062466
New antimicrobial flavanones from Physena madagascariensis
JOURNAL OF NATURAL PRODUCTS
2000; 63 (8): 1082-1089
Two new flavanones (1 and 2) with antibacterial activity were isolated from the methanolic extract of the dried leaves of Physena madagascariensis using activity against Staphylococcus aureus to guide the isolation. A third flavonoid, a flavanone dimer linked by a methylene group (3) was also isolated and proved to be inactive. The structures of 1 and 2 were established primarily from NMR studies, while that of 3 required more extensive mass spectrometric analysis. All three flavanones had lavandulyl units in the limonene form. Flavanones 1 and 2 were active against several bacteria at concentrations as low as 4 microM.
View details for DOI 10.1021/np000054m
View details for Web of Science ID 000089056600009
View details for PubMedID 10978202
Polarized growth controls cell shape and bipolar bud site selection in Saccharomyces cerevisiae
MOLECULAR AND CELLULAR BIOLOGY
2000; 20 (14): 5235-5247
We examined the relationship between polarized growth and division site selection, two fundamental processes important for proper development of eukaryotes. Diploid Saccharomyces cerevisiae cells exhibit an ellipsoidal shape and a specific division pattern (a bipolar budding pattern). We found that the polarity genes SPA2, PEA2, BUD6, and BNI1 participate in a crucial step of bud morphogenesis, apical growth. Deleting these genes results in round cells and diminishes bud elongation in mutants that exhibit pronounced apical growth. Examination of distribution of the polarized secretion marker Sec4 demonstrates that spa2Delta, pea2Delta, bud6Delta, and bni1Delta mutants fail to concentrate Sec4 at the bud tip during apical growth and at the division site during repolarization just prior to cytokinesis. Moreover, cell surface expansion is not confined to the distal tip of the bud in these mutants. In addition, we found that the p21-activated kinase homologue Ste20 is also important for both apical growth and bipolar bud site selection. We further examined how the duration of polarized growth affects bipolar bud site selection by using mutations in cell cycle regulators that control the timing of growth phases. The grr1Delta mutation enhances apical growth by stabilizing G1 cyclins and increases the distal-pole budding in diploids. Prolonging polarized growth phases by disrupting the G2/M cyclin gene CLB2 enhances the accuracy of bud site selection in wild-type, spa2Delta, and ste20Delta cells, whereas shortening the polarized growth phases by deleting SWE1 decreases the fidelity of bipolar budding. This study reports the identification of components required for apical growth and demonstrates the critical role of polarized growth in bipolar bud site selection. We propose that apical growth and repolarization at the site of cytokinesis are crucial for establishing spatial cues used by diploid yeast cells to position division planes.
View details for Web of Science ID 000087820000027
View details for PubMedID 10866679
The Kar3p kinesin-related protein forms a novel heterodimeric structure with its associated protein Cik1p
MOLECULAR BIOLOGY OF THE CELL
2000; 11 (7): 2373-2385
Proteins that physically associate with members of the kinesin superfamily are critical for the functional diversity observed for these microtubule motor proteins. However, quaternary structures of complexes between kinesins and kinesin-associated proteins are poorly defined. We have analyzed the nature of the interaction between the Kar3 motor protein, a minus-end-directed kinesin from yeast, and its associated protein Cik1. Extraction experiments demonstrate that Kar3p and Cik1p are tightly associated. Mapping of the interaction domains of the two proteins by two-hybrid analyses indicates that Kar3p and Cik1p associate in a highly specific manner along the lengths of their respective coiled-coil domains. Sucrose gradient velocity centrifugation and gel filtration experiments were used to determine the size of the Kar3-Cik1 complex from both mating pheromone-treated cells and vegetatively growing cells. These experiments predict a size for this complex that is consistent with that of a heterodimer containing one Kar3p subunit and one Cik1p subunit. Finally, immunoprecipitation of epitope-tagged and untagged proteins confirms that only one subunit of Kar3p and Cik1p are present in the Kar3-Cik1 complex. These findings demonstrate that the Kar3-Cik1 complex has a novel heterodimeric structure not observed previously for kinesin complexes.
View details for Web of Science ID 000088184800016
View details for PubMedID 10888675
Drivers and passengers wanted! The role of kinesin-associated proteins
TRENDS IN CELL BIOLOGY
2000; 10 (7): 281-289
Members of the kinesin superfamily of proteins participate in a wide variety of cellular processes. Although much attention has been devoted to the structural and biophysical properties of the force-generating motor domain of kinesins, the factors controlling the functional specificity of each kinesin have only recently been examined. Genetic and biochemical approaches have identified two classes of proteins that associate physically with the diverse non-motor domains of kinesins. These proteins can be divided into two general classes: first, those that form tight complexes with the kinesin and are instrumental in directing the distinct function of the motor (i.e. drivers) and, second, those proteins that might transiently interact with the motor or be an integral part of the motor's cargo (i.e. passengers). Here, we discuss known kinesin-binding proteins, and how they might participate in the activity of their motor partners.
View details for Web of Science ID 000087769300004
View details for PubMedID 10856931
Genome-wide mutant collections: toolboxes for functional genomics
CURRENT OPINION IN MICROBIOLOGY
2000; 3 (3): 309-315
The sequencing of entire genomes has led to the identification of many genes. A future challenge will be to determine the function of all of the genes of an organism. One of the best ways to ascertain function is to disrupt genes and determine the phenotype of the resulting organism. Novel large-scale approaches for generating gene disruptions and analyzing the resulting phenotype are underway in the budding yeast Saccharomyces cerevisiae and other organisms including flies, Mycoplasma, worms, plants and mice. These approaches and mutant collections will be extremely valuable to the scientific community and will dramatically alter the manner in which science is performed in the future.
View details for Web of Science ID 000087635200015
View details for PubMedID 10851164
An integrated web interface for large-scale characterization of sequence data.
Functional & integrative genomics
2000; 1 (1): 70-75
Large-scale genome projects require the analysis of large amounts of raw data. This analysis often involves the application of a chain of biology-based programs. Many of these programs are difficult to operate because they are non-integrated, command-line driven, and platform-dependent. The problem is compounded when the number of data files involved is large, making navigation and status-tracking difficult. To demonstrate how this problem can be addressed, we have created a platform-independent Web front end that integrates a set of programs used in a genomic project analyzing gene function by transposon mutagenesis in Saccharomyces cerevisiae. In particular, these programs help define a large number of transposon insertion events within the yeast genome, identifying both the precise site of transposon insertion as well as potential open reading frames disrupted by this insertion event. Our Web interface facilitates this analysis by performing the following tasks. Firstly, it allows each of the analysis programs to be launched against multiple directories of data files. Secondly, it allows the user to view, download, and upload files generated by the programs. Thirdly, it indicates which sets of data directories have been processed by each program. Although designed specifically to aid in this project, our interface exemplifies a general approach by which independent software programs may be integrated into an efficient protocol for large-scale genomic data processing.
View details for PubMedID 11793223
Compartmentalization of the cell cortex by septins is required for maintenance of cell polarity in yeast
2000; 5 (5): 841-851
Formation and maintenance of specialized plasma membrane domains are crucial for many biological processes, such as cell polarization and signaling. During isotropic bud growth, the yeast cell periphery is divided into two domains: the bud surface, an active site of exocytosis and growth, and the relatively quiescent surface of the mother cell. We found that cells lacking septins at the bud neck failed to maintain the exocytosis and morphogenesis factors Spa2, Sec3, Sec5, and Myo2 in the bud during isotropic growth. Furthermore, we found that septins were required for proper regulation of actin patch stability; septin-defective cells permitted to enter isotropic growth lost actin and growth polarity. We propose that septins maintain cell polarity by specifying a boundary between cortical domains.
View details for Web of Science ID 000087332500008
View details for PubMedID 10882120
Regulation of cytokinesis by the Elm1 protein kinase in Saccharomyces cerevisiae
JOURNAL OF CELL SCIENCE
2000; 113 (8): 1435-1445
A Saccharomyces cerevisiae mutant unable to grow in a cdc28-1N background was isolated and shown to be affected in the ELM1 gene. Elm1 is a protein kinase, thought to be a negative regulator of pseudo-hyphal growth. We show that Cdc11, one of the septins, is delocalised in the mutant, indicating that septin localisation is partly controlled by Elm1. Moreover, we show that cytokinesis is delayed in an elm1delta mutant. Elm1 levels peak at the end of the cell cycle and Elm1 is localised at the bud neck in a septin-dependent fashion from bud emergence until the completion of anaphase, at about the time of cell division. Genetic and biochemical evidence suggest that Elm1 and the three other septin-localised protein kinases, Hsl1, Gin4 and Kcc4, work in parallel pathways to regulate septin behaviour and cytokinesis. In addition, the elm1delta morphological defects can be suppressed by deletion of the SWE1 gene, but not the cytokinesis defect nor the septin mislocalisation. Our results indicate that cytokinesis in budding yeast is regulated by Elm1.
View details for Web of Science ID 000086855200012
View details for PubMedID 10725226
Sbe2p and Sbe22p, two homologous Golgi proteins involved in yeast cell wall formation
MOLECULAR BIOLOGY OF THE CELL
2000; 11 (2): 435-452
The cell wall of fungal cells is important for cell integrity and cell morphogenesis and protects against harmful environmental conditions. The yeast cell wall is a complex structure consisting mainly of mannoproteins, glucan, and chitin. The molecular mechanisms by which the cell wall components are synthesized and transported to the cell surface are poorly understood. We have identified and characterized two homologous yeast proteins, Sbe2p and Sbe22p, through their suppression of a chs5 spa2 mutant strain defective in chitin synthesis and cell morphogenesis. Although sbe2 and sbe22 null mutants are viable, sbe2 sbe22 cells display several phenotypes indicative of defects in cell integrity and cell wall structure. First, sbe2 sbe22 cells display a sorbitol-remediable lysis defect at 37 degrees C and are hypersensitive to SDS and calcofluor. Second, electron microscopic analysis reveals that sbe2 sbe22 cells have an aberrant cell wall structure with a reduced mannoprotein layer. Finally, immunofluorescence experiments reveal that in small-budded cells, sbe2 sbe22 mutants mislocalize Chs3p, a protein involved in chitin synthesis. In addition, sbe2 sbe22 diploids have a bud-site selection defect, displaying a random budding pattern. A Sbe2p-GFP fusion protein localizes to cytoplasmic patches, and Sbe2p cofractionates with Golgi proteins. Deletion of CHS5, which encodes a Golgi protein involved in the transport of Chs3p to the cell periphery, is lethal in combination with disruption of SBE2 and SBE22. Thus, we suggest a model in which Sbe2p and Sbe22p are involved in the transport of cell wall components from the Golgi apparatus to the cell surface periphery in a pathway independent of Chs5p.
View details for Web of Science ID 000085478500003
View details for PubMedID 10679005
Mutagenesis of murine cytomegalovirus using a Tn3-based transposon
2000; 266 (2): 264-274
A transposon derived from Escherichia coli Tn3 was introduced into the genome of murine cytomegalovirus (MCMV) to generate a pool of viral mutants. We analyzed three of the constructed recombinant viruses that contained the transposon within the M25, M27, and m155 open reading frames. Our studies provide the first direct evidence to suggest that M25 and M27 are not essential for viral replication in mouse NIH 3T3 cells. Studies in cultured cells and Balb/c mice indicated that the transposon insertion is stable during viral propagation both in vitro and in vivo. Moreover the virus that contained the insertion mutation in M25 exhibited a titer similar to that of the wild-type virus in the salivary glands, lungs, livers, spleens, and kidneys of the Balb/c mice that were intraperitoneally infected with these viruses. These results suggest that M25 is dispensable for viral growth in these organs and the presence of the transposon sequence in the viral genome does not significantly affect viral replication in vivo. The Tn3-based system can be used as a mutagenesis approach for studying the function of MCMV genes in both tissue culture and in animals.
View details for Web of Science ID 000085018400005
View details for PubMedID 10639313
Graphically-enabled integration of bioinformatics tools allowing parallel execution
Annual Symposium of the American-Medical-Informatics-Association
HANLEY & BELFUS INC. 2000: 141–145
Rapid analysis of large amounts of genomic data is of great biological as well as medical interest. This type of analysis will greatly benefit from the ability to rapidly assemble a set of related analysis programs and to exploit the power of parallel computing. TurboGenomics, which is a software package currently in its alpha-testing phase, allows integration of heterogeneous software components to be done graphically. In addition, the tool is capable of making the integrated components run in parallel. To demonstrate these abilities, we use the tool to develop a Web-based application that allows integrated access to a set of large-scale sequence data analysis programs used by a transposon-insertion based yeast genome project. We also contrast the differences in building such an application with and without using the TurboGenomics software.
View details for Web of Science ID 000170207500030
View details for PubMedID 11079861
TRIPLES: a database of gene function in Saccharomyces cerevisiae
NUCLEIC ACIDS RESEARCH
2000; 28 (1): 81-84
Using a novel multipurpose mini-transposon, we have generated a collection of defined mutant alleles for the analysis of disruption phenotypes, protein localization, and gene expression in Saccharomyces cerevisiae. To catalog this unique data set, we have developed TRIPLES, a Web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces. Encompassing over 250 000 data points, TRIPLES provides convenient access to information from nearly 7800 transposon-mutagenized yeast strains; within TRIPLES, complete data reports of each strain may be viewed in table format, or if desired, downloaded as tab-delimited text files. Each report contains external links to corresponding entries within the Saccharomyces Genome Database and International Nucleic Acid Sequence Data Library (GenBank). Unlike other yeast databases, TRIPLES also provides on-line order forms linked to each clone report; users may immediately request any desired strain free-of-charge by submitting a completed form. In addition to presenting a wealth of information for over 2300 open reading frames, TRIPLES constitutes an important medium for the distribution of useful reagents throughout the yeast scientific community. Maintained by the Yale Genome Analysis Center, TRIPLES may be accessed at http://ycmi.med.yale.edu/ygac/triples.htm
View details for Web of Science ID 000084896300021
View details for PubMedID 10592187
gamma-tubulin of budding yeast
CENTROSOME IN CELL REPLICATION AND EARLY DEVELOPMENT
2000; 49: 75-104
High-throughput methods for the large-scale analysis of gene function by transposon tagging
APPLICATIONS OF CHIMERIC GENES AND HYBRID PROTEINS, PT C
2000; 328: 550-574
Large-scale analysis of the yeast genome by transposon tagging and gene disruption
1999; 402 (6760): 413-418
Economical methods by which gene function may be analysed on a genomic scale are relatively scarce. To fill this need, we have developed a transposon-tagging strategy for the genome-wide analysis of disruption phenotypes, gene expression and protein localization, and have applied this method to the large-scale analysis of gene function in the budding yeast Saccharomyces cerevisiae. Here we present the largest collection of defined yeast mutants ever generated within a single genetic background--a collection of over 11,000 strains, each carrying a transposon inserted within a region of the genome expressed during vegetative growth and/or sporulation. These insertions affect nearly 2,000 annotated genes, representing about one-third of the 6,200 predicted genes in the yeast genome. We have used this collection to determine disruption phenotypes for nearly 8,000 strains using 20 different growth conditions; the resulting data sets were clustered to identify groups of functionally related genes. We have also identified over 300 previously non-annotated open reading frames and analysed by indirect immunofluorescence over 1,300 transposon-tagged proteins. In total, our study encompasses over 260,000 data points, constituting the largest functional analysis of the yeast genome ever undertaken.
View details for Web of Science ID 000083913600057
View details for PubMedID 10586881
- Rationale and design of the National Emphysema Treatment Trial (NETT): A prospective randomized trial of lung volume reduction surgery JOURNAL OF THORACIC AND CARDIOVASCULAR SURGERY 1999; 118 (3): 518-528
Functional characterization of the S-cerevisiae genome by gene deletion and parallel analysis
1999; 285 (5429): 901-906
The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs (more than one-third of the ORFs in the genome). Of the deleted ORFs, 17 percent were essential for viability in rich medium. The phenotypes of more than 500 deletion strains were assayed in parallel. Of the deletion strains, 40 percent showed quantitative growth defects in either rich or minimal medium.
View details for Web of Science ID 000081860900053
View details for PubMedID 10436161
Differential regulation of the Kar3p kinesin-related protein by two associated proteins, Cik1p and Vik1p
JOURNAL OF CELL BIOLOGY
1999; 144 (6): 1219-1233
The mechanisms by which kinesin-related proteins interact with other proteins to carry out specific cellular processes are poorly understood. The kinesin-related protein, Kar3p, has been implicated in many microtubule functions in yeast. Some of these functions require interaction with the Cik1 protein (Page, B.D., L.L. Satterwhite, M.D. Rose, and M. Snyder. 1994. J. Cell Biol. 124:507-519). We have identified a Saccharomyces cerevisiae gene, named VIK1, encoding a protein with sequence and structural similarity to Cik1p. The Vik1 protein is detected in vegetatively growing cells but not in mating pheromone-treated cells. Vik1p physically associates with Kar3p in a complex separate from that of the Kar3p-Cik1p complex. Vik1p localizes to the spindle-pole body region in a Kar3p-dependent manner. Reciprocally, concentration of Kar3p at the spindle poles during vegetative growth requires the presence of Vik1p, but not Cik1p. Phenotypic analysis suggests that Cik1p and Vik1p are involved in different Kar3p functions. Disruption of VIK1 causes increased resistance to the microtubule depolymerizing drug benomyl and partially suppresses growth defects of cik1Delta mutants. The vik1Delta and kar3Delta mutations, but not cik1Delta, partially suppress the temperature-sensitive growth defect of strains lacking the function of two other yeast kinesin-related proteins, Cin8p and Kip1p. Our results indicate that Kar3p forms functionally distinct complexes with Cik1p and Vik1p to participate in different microtubule-mediated events within the same cell.
View details for Web of Science ID 000079470900011
View details for PubMedID 10087265
SHC1, a high pH inducible gene required for growth at alkaline pH in Saccharomyces cerevisiae
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
1999; 255 (1): 116-122
In this study, we carried out a large-scale transposon-tagging screen to identify genes whose expression is regulated by ambient pH. Of 35,000 transformants, two strains carrying genes whose expression is strictly dependent on the pH of the growth medium were identified. One of the genes, induced 20-fold by alkaline pH, was identified as the SHC1 gene in the Yeast Genome Directory; its expression was highest at alkaline pH and moderately induced by osmotic stress. However, the gene was expressed neither at acidic pH nor under other stress conditions. The haploid mutant with a truncated shc1 gene showed growth retardation and an abnormal morphology at alkaline pH. On the other hand, the mutant strain carrying the wild-type SHC1 gene reverted to the wild-type phenotype. To confirm that Shc1p is an alkali-inducible protein, a monoclonal antibody to Shc1p was produced. While a 55-kDa protein band appeared on the Western blot of cells grown at alkaline pH, Shc1p was barely detectable on the blots of cells grown in YPD. Our results indicate that yeast cells have an efficient system for adapting to large variations in ambient pH and that SHC1 is one of the genes required for growth at alkaline pH.
View details for Web of Science ID 000078599700021
View details for PubMedID 10082665
Nim1-related kinases coordinate cell cycle progression with the organization of the peripheral cytoskeleton in yeast
GENES & DEVELOPMENT
1999; 13 (2): 176-187
The mechanisms that couple cell cycle progression with the organization of the peripheral cytoskeleton are poorly understood. In Saccharomyces cerevisiae, the Swe1 protein has been shown previously to phosphorylate and inactivate the cyclin-dependent kinase, Cdc28, thereby delaying the onset of mitosis. The nim1-related protein kinase, Hsl1, induces entry into mitosis by negatively regulating Swe1. We have found that Hsl1 physically associates with the septin cytoskeleton in vivo and that Hsl1 kinase activity depends on proper septin function. Genetic analysis indicates that two additional Hsl1-related kinases, Kcc4 and Gin4, act redundantly with Hsl1 to regulate Swe1. Kcc4, like Hsl1 and Gin4, was found to localize to the bud neck in a septin-dependent fashion. Interestingly, hsl1 kcc4 gin4 triple mutants develop a cellular morphology extremely similar to that of septin mutants. Consistent with the idea that Hsl1, Kcc4, and Gin4 link entry into mitosis to proper septin organization, we find that septin mutants incubated at the restrictive temperature trigger a Swe1-dependent mitotic delay that is necessary to maintain cell viability. These results reveal for the first time how cells monitor the organization of their cytoskeleton and demonstrate the existence of a cell cycle checkpoint that responds to defects in the peripheral cytoskeleton. Moreover, Hsl1, Kcc4, and Gin4 have homologs in higher eukaryotes, suggesting that the regulation of Swe1/Wee1 by this class of kinases is highly conserved.
View details for Web of Science ID 000078395100007
View details for PubMedID 9925642
- Transposon mutagenesis for the analysis of protein production, function, and localization CDNA PREPARATION AND CHARACTERIZATION 1999; 303: 512-532
Spa2p interacts with cell polarity proteins and signaling components involved in yeast cell morphogenesis
MOLECULAR AND CELLULAR BIOLOGY
1998; 18 (7): 4053-4069
The yeast protein Spa2p localizes to growth sites and is important for polarized morphogenesis during budding, mating, and pseudohyphal growth. To better understand the role of Spa2p in polarized growth, we analyzed regions of the protein important for its function and proteins that interact with Spa2p. Spa2p interacts with Pea2p and Bud6p (Aip3p) as determined by the two-hybrid system; all of these proteins exhibit similar localization patterns, and spa2Delta, pea2Delta, and bud6Delta mutants display similar phenotypes, suggesting that these three proteins are involved in the same biological processes. Coimmunoprecipitation experiments demonstrate that Spa2p and Pea2p are tightly associated with each other in vivo. Velocity sedimentation experiments suggest that a significant portion of Spa2p, Pea2p, and Bud6p cosediment, raising the possibility that these proteins form a large, 12S multiprotein complex. Bud6p has been shown previously to interact with actin, suggesting that the 12S complex functions to regulate the actin cytoskeleton. Deletion analysis revealed that multiple regions of Spa2p are involved in its localization to growth sites. One of the regions involved in Spa2p stability and localization interacts with Pea2p; this region contains a conserved domain, SHD-II. Although a portion of Spa2p is sufficient for localization of itself and Pea2p to growth sites, only the full-length protein is capable of complementing spa2 mutant defects, suggesting that other regions are required for Spa2p function. By using the two-hybrid system, Spa2p and Bud6p were also found to interact with components of two mitogen-activated protein kinase (MAPK) pathways important for polarized cell growth. 
Spa2p interacts with Ste11p (MAPK kinase [MEK] kinase) and Ste7p (MEK) of the mating signaling pathway as well as with the MEKs Mkk1p and Mkk2p of the Slt2p (Mpk1p) MAPK pathway; for both Mkk1p and Ste7p, the Spa2p-interacting region was mapped to the N-terminal putative regulatory domain. Bud6p interacts with Ste11p. The MEK-interacting region of Spa2p corresponds to the highly conserved SHD-I domain, which is shown to be important for mating and MAPK signaling. spa2 mutants exhibit reduced levels of pheromone signaling and an elevated level of Slt2p kinase activity. We thus propose that Spa2p, Pea2p, and Bud6p function together, perhaps as a complex, to promote polarized morphogenesis through regulation of the actin cytoskeleton and signaling pathways.
View details for Web of Science ID 000074380100044
View details for PubMedID 9632790
Ursodiol prophylaxis against hepatic complications of allogeneic bone marrow transplantation - A randomized, double-blind, placebo-controlled trial
ANNALS OF INTERNAL MEDICINE
1998; 128 (12): 975-?
Hepatic complications are a major cause of illness and death after bone marrow transplantation. To confirm the results of a pilot study that indicated that ursodiol prophylaxis could reduce the incidence of veno-occlusive disease of the liver. Randomized, double-blind, placebo-controlled study. Tertiary care teaching hospital. 67 consecutive patients undergoing transplantation with allogeneic bone marrow (donated by a relative) in whom busulfan plus cyclophosphamide was used as the preparative regimen and cyclosporine plus methotrexate was used to prevent graft-versus-host disease. Before the preparative regimen was started, patients were randomly assigned to receive ursodiol, 300 mg twice daily (or 300 mg in the morning and 600 mg in the evening if body weight was > 90 kg), or placebo. Patients were prospectively evaluated for the clinical diagnosis of veno-occlusive disease, the occurrence of acute graft-versus-host disease, and survival. The incidence of veno-occlusive disease was 40% (13 of 32 patients) in placebo recipients and 15% (5 of 34 patients) in ursodiol recipients (P = 0.03). Assignment to placebo was the only pretransplantation characteristic that predicted the development of veno-occlusive disease. The most significant predictor of 100-day mortality was the diagnosis of veno-occlusive disease. The difference in actuarial risk for hematologic relapse in patients with chronic myelogenous leukemia and nonhepatic toxicities between the two groups was not statistically significant (13% in the ursodiol group and 20% in the placebo group; P > 0.2). Ursodiol prophylaxis seemed to decrease the incidence of hepatic complications after allogeneic bone marrow transplantation in patients who received a preparative regimen with busulfan plus cyclophosphamide.
View details for Web of Science ID 000074201300002
View details for PubMedID 9625683
Pheromone-regulated genes required for yeast mating differentiation
JOURNAL OF CELL BIOLOGY
1998; 140 (3): 461-483
Yeast cells mate by an inducible pathway that involves agglutination, mating projection formation, cell fusion, and nuclear fusion. To obtain insight into the mating differentiation of Saccharomyces cerevisiae, we carried out a large-scale transposon tagging screen to identify genes whose expression is regulated by mating pheromone. 91,200 transformants containing random lacZ insertions were screened for beta-galactosidase (beta-gal) expression in the presence and absence of alpha factor, and 189 strains containing pheromone-regulated lacZ insertions were identified. Transposon insertion alleles corresponding to 20 genes that are novel or had not previously been known to be pheromone regulated were examined for effects on the mating process. Mutations in four novel genes, FIG1, FIG2, KAR5/FIG3, and FIG4 were found to cause mating defects. Three of the proteins encoded by these genes, Fig1p, Fig2p, and Fig4p, are dispensable for cell polarization in uniform concentrations of mating pheromone, but are required for normal cell polarization in mating mixtures, conditions that involve cell-cell communication. Fig1p and Fig2p are also important for cell fusion and conjugation bridge shape, respectively. The fourth protein, Kar5p/Fig3p, is required for nuclear fusion. Fig1p and Fig2p are likely to act at the cell surface as Fig1::beta-gal and Fig2::beta-gal fusion proteins localize to the periphery of mating cells. Fig4p is a member of a family of eukaryotic proteins that contain a domain homologous to the yeast Sac1p. Our results indicate that a variety of novel genes are expressed specifically during mating differentiation to mediate proper cell morphogenesis, cell fusion, and other steps of the mating process.
View details for Web of Science ID 000072026300002
View details for PubMedID 9456310
The Spa2-related protein, Sph1p, is important for polarized growth in yeast
JOURNAL OF CELL SCIENCE
1998; 111: 479-494
The Saccharomyces cerevisiae protein Sph1p is both structurally and functionally related to the polarity protein, Spa2p. Sph1p and Spa2p are predicted to share three 100-amino acid domains each exceeding 30% sequence identity, and the amino-terminal domain of each protein contains a direct repeat common to Homo sapiens and Caenorhabditis elegans protein sequences. sph1- and spa2-deleted cells possess defects in mating projection morphology and pseudohyphal growth. sph1(Delta) spa2(Delta) double mutants also exhibit a strong haploid invasive growth defect and an exacerbated mating projection defect relative to either sph1(Delta) or spa2(Delta) single mutants. Consistent with a role in polarized growth, Sph1p localizes to growth sites in a cell cycle-dependent manner: Sph1p concentrates as a cortical patch at the presumptive bud site in unbudded cells, at the tip of small, medium and large buds, and at the bud neck prior to cytokinesis. In pheromone-treated cells, Sph1p localizes to the tip of the mating projection. Proper localization of Sph1p to sites of active growth during budding and mating requires Spa2p. Sph1p interacts in the two-hybrid system with three mitogen-activated protein (MAP) kinase kinases (MAPKKs): Mkk1p and Mkk2p, which function in the cell wall integrity/cell polarization MAP kinase pathway, and Ste7p, which operates in the pheromone and pseudohyphal signaling response pathways. Sph1p also interacts weakly with STE11, the MAPKKK known to activate STE7. Moreover, two-hybrid interactions between SPH1 and STE7 and STE11 occur independently of STE5, a proposed scaffolding protein which interacts with several members of this MAP kinase module. We speculate that Spa2p and Sph1p may function during pseudohyphal and haploid invasive growth to help tether this MAP kinase module to sites of polarized growth. Our results indicate that Spa2p and Sph1p comprise two related proteins important for the control of cell morphogenesis in yeast.
View details for Web of Science ID 000072336900007
View details for PubMedID 9443897
Transposon tagging I: A novel system for monitoring protein production, function and localization
YEAST GENE ANALYSIS
1998; 26: 161-179
View details for Web of Science ID 000073851500011
Cell polarity and morphogenesis in budding yeast
ANNUAL REVIEW OF MICROBIOLOGY
1998; 52: 687-744
Eukaryotic cells respond to intracellular and extracellular cues to direct asymmetric cell growth and division. The yeast Saccharomyces cerevisiae undergoes polarized growth at several times during budding and mating and is a useful model organism for studying asymmetric growth and division. In recent years, many regulatory and cytoskeletal components important for directing and executing growth have been identified, and molecular mechanisms have been elucidated in yeast. Key signaling pathways that regulate polarization during the cell cycle and mating response have been described. Since many of the components important for polarized cell growth are conserved in other organisms, the basic mechanisms mediating polarized cell growth are likely to be universal among eukaryotes.
View details for Web of Science ID 000076541000021
View details for PubMedID 9891811
The Rho-GEF Rom2p localizes to sites of polarized cell growth and participates in cytoskeletal functions in Saccharomyces cerevisiae
MOLECULAR BIOLOGY OF THE CELL
1997; 8 (10): 1829-1844
Rom2p is a GDP/GTP exchange factor for Rho1p and Rho2p GTPases; Rho proteins have been implicated in control of actin cytoskeletal rearrangements. ROM2 and RHO2 were identified in a screen for high-copy number suppressors of cik1 delta, a mutant defective in microtubule-based processes in Saccharomyces cerevisiae. A Rom2p::3XHA fusion protein localizes to sites of polarized cell growth, including incipient bud sites, tips of small buds, and tips of mating projections. Disruption of ROM2 results in temperature-sensitive growth defects at 11 degrees C and 37 degrees C. rom2 delta cells exhibit morphological defects. At permissive temperatures, rom2 delta cells often form elongated buds and fail to form normal mating projections after exposure to pheromone; at the restrictive temperature, small budded cells accumulate. High-copy number plasmids containing either ROM2 or RHO2 suppress the temperature-sensitive growth defects of cik1 delta and kar3 delta strains. KAR3 encodes a kinesin-related protein that interacts with Cik1p. Furthermore, rom2 delta strains exhibit increased sensitivity to the microtubule depolymerizing drug benomyl. These results suggest a role for Rom2p in both polarized morphogenesis and functions of the microtubule cytoskeleton.
View details for Web of Science ID A1997YB66300001
View details for PubMedID 9348527
Human dishevelled genes constitute a DHR-containing multigene family
1997; 42 (2): 302-310
Three human genes encoding proteins homologous to Drosophila Dishevelled protein were cloned and characterized. Amino acid similarity between the different Dishevelled proteins is concentrated in three highly conserved regions. Two of these regions do not exhibit significant sequence similarity with other known proteins; the third is similar to the discs-large homology region, which was first found in a Drosophila Discs-large tumor suppressor protein (also known as GLGF or PDZ domain). We produced antibodies against human Dishevelled-2 and demonstrated that it is a phosphoprotein and can be detected in all cell lines and human embryonic tissues examined. Indirect immunofluorescence indicates that it is found throughout the cytoplasm. Our results indicate that the human dishevelled genes constitute a multigene family and that Dishevelled proteins are highly conserved among metazoans.
View details for Web of Science ID A1997XE86000015
View details for PubMedID 9192851
SBF cell cycle regulator as a target of the yeast PKC-MAP kinase pathway
1997; 275 (5307): 1781-1784
Protein kinase C (PKC) signaling is highly conserved among eukaryotes and has been implicated in the regulation of cellular processes such as cell proliferation and growth. In the budding yeast, PKC1 functions to activate the SLT2(MPK1) mitogen-activated protein (MAP) kinase cascade, which is required for the maintenance of cell integrity during asymmetric cell growth. Genetic studies, coimmunoprecipitation experiments, and analysis of protein phosphorylation in vivo and in vitro indicate that the SBF transcription factor (composed of Swi4p and Swi6p), an important regulator of gene expression at the G1 to S phase cell cycle transition, is a target of the Slt2p(Mpk1p) MAP kinase. These studies provide evidence for a direct role of the PKC1 pathway in the regulation of the yeast cell cycle and cell growth and indicate that conserved signaling pathways can act to control key regulators of cell division.
View details for Web of Science ID A1997WP05600038
View details for PubMedID 9065400
Targeting of chitin synthase 3 to polarized growth sites in yeast requires Chs5p and Myo2p
JOURNAL OF CELL BIOLOGY
1997; 136 (1): 95-110
Chitin is an essential structural component of the yeast cell wall whose deposition is regulated throughout the yeast life cycle. The temporal and spatial regulation of chitin synthesis was investigated during vegetative growth and mating of Saccharomyces cerevisiae by localization of the putative catalytic subunit of chitin synthase III, Chs3p, and its regulator, Chs5p. Immunolocalization of epitope-tagged Chs3p revealed a novel localization pattern that is cell cycle-dependent. Chs3p is polarized as a diffuse ring at the incipient bud site and at the neck between the mother and bud in small-budded cells; it is not found at the neck in large-budded cells containing a single nucleus. In large-budded cells undergoing cytokinesis, it reappears as a ring at the neck. In cells responding to mating pheromone, Chs3p is found throughout the projection. The appearance of Chs3p at cortical sites correlates with times that chitin synthesis is expected to occur. In addition to its localization at the incipient bud site and neck, Chs3p is also found in cytoplasmic patches in cells at different stages of the cell cycle. Epitope-tagged Chs5p also localizes to cytoplasmic patches; these patches contain Kex2p, a late Golgi-associated enzyme. Unlike Chs3p, Chs5p does not accumulate at the incipient bud site or neck. Nearly all Chs3p patches contain Chs5p, whereas some Chs5p patches lack detectable Chs3p. In the absence of Chs5p, Chs3p localizes in cytoplasmic patches, but it is no longer found at the neck or the incipient bud site, indicating that Chs5p is required for the polarization of Chs3p. Furthermore, Chs5p localization is not affected either by temperature shift or by the myo2-66 mutation; however, Chs3p polarization is affected by temperature shift and myo2-66. We suggest a model in which Chs3p polarization to cortical sites in yeast is dependent on both Chs5p and the actin cytoskeleton/Myo2p.
View details for Web of Science ID A1997WC96100009
View details for PubMedID 9008706
A multipurpose transposon system for analyzing protein production, localization, and function in Saccharomyces cerevisiae
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1997; 94 (1): 190-195
Analysis of the function of a particular gene product typically involves determining the expression profile of the gene, the subcellular location of the protein, and the phenotype of a null strain lacking the protein. Conditional alleles of the gene are often created as an additional tool. We have developed a multifunctional, transposon-based system that simultaneously generates constructs for all the above analyses and is suitable for mutagenesis of any given Saccharomyces cerevisiae gene. Depending on the transposon used, the yeast gene is fused to a coding region for beta-galactosidase or green fluorescent protein. Gene expression can therefore be monitored by chemical or fluorescence assays. The transposons create insertion mutations in the target gene, allowing phenotypic analysis. The transposon can be reduced by cre-lox site-specific recombination to a smaller element that leaves an epitope tag inserted in the encoded protein. In addition to its utility f