Bio


1977 B.A., Chemistry and Biology, University of Rochester, NY
1978-1982 Ph.D. California Institute of Technology, CA Advisor: Dr. Norman Davidson
1982-1986 Postdoctoral Research Stanford University School of Medicine, CA Advisor: Dr. Ronald Davis
1986-2009 Faculty Dept of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT
2009-present Dept of Genetics, Stanford University School of Medicine, Stanford, CA

Administrative Appointments


  • Chair, Dept. of Genetics (2009 - Present)
  • Director, Center for Genomics and Personalized Medicine (2009 - Present)

Current Research and Scholarly Interests


We are presently in an omics revolution in which genomes and other omes can be readily characterized. Our laboratory uses a variety of approaches to analyze genomes and regulatory networks. Our research focuses on yeast, a model organism ideally suited to genetic analysis, and humans.

1) Transcriptomes
We developed RNA sequencing to annotate the yeast and human transcriptomes. We discovered that the eukaryotic transcriptome is much more complex than previously appreciated and that embryonic stem cells have more transcript isoforms than differentiated cells.

2) Transcription Factor Binding Networks
We have also developed methods for mapping transcription factor binding sites throughout the genome. We have used these methods to build regulatory maps and to help decipher the combinatorial regulatory code: which factors work together to regulate which genes. Using this approach, we have mapped out pathways crucial for metabolism and inflammation.

3) Integrated Regulatory Networks
In addition to transcription factor binding networks, we have also been mapping phosphorylation and metabolite-protein interaction networks. These studies have revealed novel global regulators and key points in integrated regulatory networks.

4) Variation
We have been analyzing differences between individuals and species at two levels: DNA sequence variation and variation in regulatory information. We developed paired-end sequencing for humans and found that humans have extensive structural variation (SV), i.e., deletions, insertions and inversions. This is likely to be a major cause of phenotypic variation and human disease. In addition, by mapping binding site differences among different yeast strains and among humans, we have found that individuals differ much more in their regulatory information than in their coding sequences. We can correlate these differences with SNPs and SVs, thereby associating noncoding DNA differences with regulatory information.

5) Human Disease
Finally, we are applying omics approaches, including genome sequencing, transcriptomics, proteomics, metabolomics, DNA methylation and microbiome assays, to the analysis of human disease. These integrative omics approaches are being applied to help understand the molecular basis of disease and to develop diagnostics and therapeutics.

Clinical Trials


  • Multiomic Signatures of Microbial Metabolites Following Prebiotic Fiber Supplementation (Recruiting)

    The investigators propose a comprehensive, multiomic study that will integrate longitudinal data associating changes in specific gut bacteria and host in response to prebiotic fiber supplementation. These data will guide the development of an integrative biological signature relating bacterial-derived metabolites with biological outcome in the host. The open sharing of data generated by the proposed research represents a significant public resource that will support and accelerate future novel studies.


  • Precision Diets for Diabetes Prevention (Recruiting)

    With this study, the investigators want to understand the physiological differences among people developing pre-diabetes and diabetes. The investigators hypothesize that different individuals follow different paths in the development of the disease. By understanding the personal mechanism for developing disease, the investigators will find a personalized approach to prevent that development. The investigators also hope to find a biomarker that will pinpoint the particular defect and thus diagnose the problem at an earlier stage, providing the information needed to give personalized diet recommendations that prevent the development of diabetes more effectively.


  • Understanding and Diagnosing Allergic Disease in Twins (Recruiting)

    The purpose of this study is to gain better understanding of how the immune system works in twins with and without allergic disease. Healthy volunteers are not specifically targeted. Healthy non-allergic study participants may be found through the course of evaluation for the presence of allergies.


  • The 28 Day Challenge (Not Recruiting)

    The purpose of this study is to determine how a 28 Day Challenge influences mental health and well-being. This is a blinded study. Participants both with and without depression and anxiety will be included. A moderation analysis will be performed to see whether changes in depression after the intervention are a function of baseline depression and anxiety levels.

    Stanford is currently not accepting patients for this trial. For more information, please contact Ariel Ganz, PhD, 650-736-8099.


All Publications


  • HAT1 Coordinates Histone Production and Acetylation via H4 Promoter Binding. Molecular cell Gruber, J. J., Geller, B., Lipchik, A. M., Chen, J., Salahudeen, A. A., Ram, A. N., Ford, J. M., Kuo, C. J., Snyder, M. P. 2019

    Abstract

    The energetic costs of duplicating chromatin are large and therefore likely depend on nutrient sensing checkpoints and metabolic inputs. By studying chromatin modifiers regulated by epithelial growth factor, we identified histone acetyltransferase 1 (HAT1) as an induced gene that enhances proliferation through coordinating histone production, acetylation, and glucose metabolism. In addition to its canonical role as a cytoplasmic histone H4 acetyltransferase, we isolated a HAT1-containing complex bound specifically at promoters of H4 genes. HAT1-dependent transcription of H4 genes required an acetate-sensitive promoter element. HAT1 expression was critical for S-phase progression and maintenance of H3 lysine 9 acetylation at proliferation-associated genes, including histone genes. Therefore, these data describe a feedforward circuit whereby HAT1 captures acetyl groups on nascent histones and drives H4 production by chromatin binding to support chromatin replication and acetylation. These findings have important implications for human disease, since high HAT1 levels associate with poor outcomes across multiple cancer types.

    View details for DOI 10.1016/j.molcel.2019.05.034

    View details for PubMedID 31278053

  • The Integrative Human Microbiome Project NATURE Proctor, L. M., Creasy, H. H., Fettweis, J. M., Lloyd-Price, J., Mahurkar, A., Zhou, W., Buck, G. A., Snyder, M. P., Strauss, J. F., Weinstock, G. M., White, O., Huttenhower, C., Integrative HMP iHMP Res Network 2019; 569 (7758): 641–48

    Abstract

    The NIH Human Microbiome Project (HMP) has been carried out over ten years and two phases to provide resources, methods, and discoveries that link interactions between humans and their microbiomes to health-related outcomes. The recently completed second phase, the Integrative Human Microbiome Project, comprised studies of dynamic changes in the microbiome and host under three conditions: pregnancy and preterm birth; inflammatory bowel diseases; and stressors that affect individuals with prediabetes. The associated research begins to elucidate mechanisms of host-microbiome interactions under these conditions, provides unique data resources (at the HMP Data Coordination Center), and represents a paradigm for future multi-omic studies of the human microbiome.

    View details for DOI 10.1038/s41586-019-1238-8

    View details for Web of Science ID 000470144100031

    View details for PubMedID 31142853

  • Longitudinal multi-omics of host-microbe dynamics in prediabetes. Nature Zhou, W., Sailani, M. R., Contrepois, K., Zhou, Y., Ahadi, S., Leopold, S. R., Zhang, M. J., Rao, V., Avina, M., Mishra, T., Johnson, J., Lee-McMullen, B., Chen, S., Metwally, A. A., Tran, T. D., Nguyen, H., Zhou, X., Albright, B., Hong, B., Petersen, L., Bautista, E., Hanson, B., Chen, L., Spakowicz, D., Bahmani, A., Salins, D., Leopold, B., Ashland, M., Dagan-Rosenfeld, O., Rego, S., Limcaoco, P., Colbert, E., Allister, C., Perelman, D., Craig, C., Wei, E., Chaib, H., Hornburg, D., Dunn, J., Liang, L., Rose, S. M., Kukurba, K., Piening, B., Rost, H., Tse, D., McLaughlin, T., Sodergren, E., Weinstock, G. M., Snyder, M. 2019; 569 (7758): 663–71

    Abstract

    Type 2 diabetes mellitus (T2D) is a growing health problem, but little is known about its early disease stages, its effects on biological processes or the transition to clinical T2D. To understand the earliest stages of T2D better, we obtained samples from 106 healthy individuals and individuals with prediabetes over approximately four years and performed deep profiling of transcriptomes, metabolomes, cytokines, and proteomes, as well as changes in the microbiome. This rich longitudinal data set revealed many insights: first, healthy profiles are distinct among individuals while displaying diverse patterns of intra- and/or inter-personal variability. Second, extensive host and microbial changes occur during respiratory viral infections and immunization, and immunization triggers potentially protective responses that are distinct from responses to respiratory viral infections. Moreover, during respiratory viral infections, insulin-resistant participants respond differently than insulin-sensitive participants. Third, global co-association analyses among the thousands of profiled molecules reveal specific host-microbe interactions that differ between insulin-resistant and insulin-sensitive individuals. Last, we identified early personal molecular signatures in one individual that preceded the onset of T2D, including the inflammation markers interleukin-1 receptor agonist (IL-1RA) and high-sensitivity C-reactive protein (CRP) paired with xenobiotic-induced immune signalling. Our study reveals insights into pathways and responses that differ between glucose-dysregulated and healthy individuals during health and disease and provides an open-access data resource to enable further research into healthy, prediabetic and T2D states.

    View details for DOI 10.1038/s41586-019-1236-x

    View details for PubMedID 31142858

  • The NASA Twins Study: A multidimensional analysis of a year-long human spaceflight SCIENCE Garrett-Bakelman, F. E., Darshi, M., Green, S. J., Gur, R. C., Lin, L., Macias, B. R., McKenna, M. J., Meydan, C., Mishra, T., Nasrini, J., Piening, B. D., Rizzardi, L. F., Sharma, K., Siamwala, J. H., Taylor, L., Vitaterna, M., Afkarian, M., Afshinnekoo, E., Ahadi, S., Ambati, A., Arya, M., Bezdan, D., Callahan, C. M., Chen, S., Choi, A. K., Chlipala, G. E., Contrepois, K., Covington, M., Crucian, B. E., De Vivo, I., Dinges, D. F., Ebert, D. J., Feinberg, J. I., Gandara, J. A., George, K. A., Goutsias, J., Grills, G. S., Hargens, A. R., Heer, M., Hillary, R. P., Hoofnagle, A. N., Hook, V. H., Jenkinson, G., Jiang, P., Keshavarzian, A., Laurie, S. S., Lee-McMullen, B., Lumpkins, S. B., MacKay, M., Maienschein-Cline, M. G., Melnick, A. M., Moore, T. M., Nakahira, K., Patel, H. H., Pietrzyk, R., Rao, V., Saito, R., Salins, D. N., Schilling, J. M., Sears, D. D., Sheridan, C. K., Stenger, M. B., Tryggvadottir, R., Urban, A. E., Vaisar, T., Van Espen, B., Zhang, J., Ziegler, M. G., Zwart, S. R., Charles, J. B., Kundrot, C. E., Scott, G. I., Bailey, S. M., Basner, M., Feinberg, A. P., Lee, S. C., Mason, C. E., Mignot, E., Rana, B. K., Smith, S. M., Snyder, M. P., Turek, F. W. 2019; 364 (6436): 144-+
  • Gene-Environment Interaction in the Era of Precision Medicine CELL Li, J., Li, X., Zhang, S., Snyder, M. 2019; 177 (1): 38–44
  • A longitudinal big data approach for precision health. Nature medicine Schüssler-Fiorenza Rose, S. M., Contrepois, K., Moneghetti, K. J., Zhou, W., Mishra, T., Mataraso, S., Dagan-Rosenfeld, O., Ganz, A. B., Dunn, J., Hornburg, D., Rego, S., Perelman, D., Ahadi, S., Sailani, M. R., Zhou, Y., Leopold, S. R., Chen, J., Ashland, M., Christle, J. W., Avina, M., Limcaoco, P., Ruiz, C., Tan, M., Butte, A. J., Weinstock, G. M., Slavich, G. M., Sodergren, E., McLaughlin, T. L., Haddad, F., Snyder, M. P. 2019; 25 (5): 792–804

    Abstract

    Precision health relies on the ability to assess disease risk at an individual level, detect early preclinical conditions and initiate preventive strategies. Recent technological advances in omics and wearable monitoring enable deep molecular and physiological profiling and may provide important tools for precision health. We explored the ability of deep longitudinal profiling to make health-related discoveries, identify clinically relevant molecular pathways and affect behavior in a prospective longitudinal cohort (n = 109) enriched for risk of type 2 diabetes mellitus. The cohort underwent integrative personalized omics profiling from samples collected quarterly for up to 8 years (median, 2.8 years) using clinical measures and emerging technologies including genome, immunome, transcriptome, proteome, metabolome, microbiome and wearable monitoring. We discovered more than 67 clinically actionable health discoveries and identified multiple molecular pathways associated with metabolic, cardiovascular and oncologic pathophysiology. We developed prediction models for insulin resistance by using omics measurements, illustrating their potential to replace burdensome tests. Finally, study participation led the majority of participants to implement diet and exercise changes. Altogether, we conclude that deep longitudinal profiling can lead to actionable health discoveries and provide relevant information for precision health.

    View details for PubMedID 31068711

  • Chromatin Remodeling in Response to BRCA2-Crisis. Cell reports Gruber, J. J., Chen, J., Geller, B., Jäger, N., Lipchik, A. M., Wang, G., Kurian, A. W., Ford, J. M., Snyder, M. P. 2019; 28 (8): 2182–93.e6

    Abstract

    Individuals with a single functional copy of the BRCA2 tumor suppressor have elevated risks for breast, ovarian, and other solid tumor malignancies. The exact mechanisms of carcinogenesis due to BRCA2 haploinsufficiency remain unclear, but one possibility is that at-risk cells are subject to acute periods of decreased BRCA2 availability and function ("BRCA2-crisis"), which may contribute to disease. Here, we establish an in vitro model for BRCA2-crisis that demonstrates chromatin remodeling and activation of an NF-κB survival pathway in response to transient BRCA2 depletion. Mechanistically, we identify BRCA2 chromatin binding, histone acetylation, and associated transcriptional activity as critical determinants of the epigenetic response to BRCA2-crisis. These chromatin alterations are reflected in transcriptional profiles of pre-malignant tissues from BRCA2 carriers and, therefore, may reflect natural steps in human disease. By modeling BRCA2-crisis in vitro, we have derived insights into pre-neoplastic molecular alterations that may enhance the development of preventative therapies.

    View details for DOI 10.1016/j.celrep.2019.07.057

    View details for PubMedID 31433991

  • The NASA Twins Study: A multidimensional analysis of a year-long human spaceflight. Science (New York, N.Y.) Garrett-Bakelman, F. E., Darshi, M., Green, S. J., Gur, R. C., Lin, L., Macias, B. R., McKenna, M. J., Meydan, C., Mishra, T., Nasrini, J., Piening, B. D., Rizzardi, L. F., Sharma, K., Siamwala, J. H., Taylor, L., Vitaterna, M. H., Afkarian, M., Afshinnekoo, E., Ahadi, S., Ambati, A., Arya, M., Bezdan, D., Callahan, C. M., Chen, S., Choi, A. M., Chlipala, G. E., Contrepois, K., Covington, M., Crucian, B. E., De Vivo, I., Dinges, D. F., Ebert, D. J., Feinberg, J. I., Gandara, J. A., George, K. A., Goutsias, J., Grills, G. S., Hargens, A. R., Heer, M., Hillary, R. P., Hoofnagle, A. N., Hook, V. Y., Jenkinson, G., Jiang, P., Keshavarzian, A., Laurie, S. S., Lee-McMullen, B., Lumpkins, S. B., MacKay, M., Maienschein-Cline, M. G., Melnick, A. M., Moore, T. M., Nakahira, K., Patel, H. H., Pietrzyk, R., Rao, V., Saito, R., Salins, D. N., Schilling, J. M., Sears, D. D., Sheridan, C. K., Stenger, M. B., Tryggvadottir, R., Urban, A. E., Vaisar, T., Van Espen, B., Zhang, J., Ziegler, M. G., Zwart, S. R., Charles, J. B., Kundrot, C. E., Scott, G. B., Bailey, S. M., Basner, M., Feinberg, A. P., Lee, S. M., Mason, C. E., Mignot, E., Rana, B. K., Smith, S. M., Snyder, M. P., Turek, F. W. 2019; 364 (6436)

    Abstract

    To understand the health impact of long-duration spaceflight, one identical twin astronaut was monitored before, during, and after a 1-year mission onboard the International Space Station; his twin served as a genetically matched ground control. Longitudinal assessments identified spaceflight-specific changes, including decreased body mass, telomere elongation, genome instability, carotid artery distension and increased intima-media thickness, altered ocular structure, transcriptional and metabolic changes, DNA methylation changes in immune and oxidative stress-related pathways, gastrointestinal microbiota alterations, and some cognitive decline postflight. Although average telomere length, global gene expression, and microbiome changes returned to near preflight levels within 6 months after return to Earth, increased numbers of short telomeres were observed and expression of some genes was still disrupted. These multiomic, molecular, physiological, and behavioral datasets provide a valuable roadmap of the putative health risks for future human spaceflight.

    View details for PubMedID 30975860

  • High-Resolution Bisulfite-Sequencing of Peripheral Blood DNA Methylation in Early-Onset and Familial Risk Breast Cancer Patients. Clinical cancer research : an official journal of the American Association for Cancer Research Chen, J., Haanpää, M. K., Gruber, J. J., Jäger, N., Ford, J. M., Snyder, M. P. 2019

    Abstract

    Understanding and explaining hereditary predisposition to cancer has focused on the genetic etiology of the disease. However, mutations in known genes associated with breast cancer, such as BRCA1 and BRCA2, account for less than 25% of familial cases of breast cancer. Recently, specific epigenetic modifications at BRCA1 have been shown to promote hereditary breast cancer, but the broader potential for epigenetic contribution to hereditary breast cancer is not yet well understood. We examined DNA methylation through deep bisulfite sequencing of CpG islands and known promoter or regulatory regions in peripheral blood DNA from 99 familial or early-onset breast or ovarian cancer patients, 6 unaffected BRCA-mutation carriers, and 49 unaffected controls. In 9% of patients, we observed altered methylation in the promoter regions of genes known to be involved in cancer, including hypermethylation at the tumor suppressor PTEN and hypomethylation at the proto-oncogene TEX14. These alterations occur in the form of allelic methylation that spans up to hundreds of base-pairs in length. Our observations suggest a broader role for DNA methylation in early-onset, familial risk breast cancer. Further studies are warranted to clarify these mechanisms and the benefits of DNA methylation screening for early risk prediction of familial cancers.

    View details for DOI 10.1158/1078-0432.CCR-18-2423

    View details for PubMedID 31175093

  • Metformin Affects Heme Function as a Possible Mechanism of Action. G3 (Bethesda, Md.) Li, X., Wang, X., Snyder, M. P. 2018

    Abstract

    Metformin elicits pleiotropic effects that are beneficial for treating diabetes, as well as particular cancers and aging. In spite of its importance, a convincing and unifying mechanism to explain how metformin operates is lacking. Here we describe investigations into the mechanism of metformin action through heme and hemoprotein(s). Metformin suppresses heme production by 50% in yeast, and this suppression requires mitochondria function, which is necessary for heme synthesis. At high concentrations comparable to those in the clinic, metformin also suppresses heme production in human erythrocytes, erythropoietic cells and hepatocytes by 30-50%; the heme-targeting drug artemisinin operates at a greater potency. Significantly, metformin prevents oxidation of heme in three protein scaffolds, cytochrome c, myoglobin and hemoglobin, with Kd values < 3 mM suggesting a dual oxidation and reduction role in the regulation of heme redox transition. Since heme- and porphyrin-like groups operate in diverse enzymes that control important metabolic processes, we suggest that metformin acts, at least in part, through stabilizing appropriate redox states in heme and other porphyrin-containing groups to control cellular metabolism.

    View details for PubMedID 30554148

  • High Frequency Actionable Pathogenic Exome Variants in an Average-Risk Cohort. Cold Spring Harbor molecular case studies Rego, S., Dagan-Rosenfeld, O., Zhou, W., Sailani, M. R., Limcaoco, P., Colbert, E., Avina, M., Wheeler, J., Craig, C., Salins, D., Rost, H. L., Dunn, J., McLaughlin, T., Steinmetz, L. M., Bernstein, J. A., Snyder, M. P. 2018

    Abstract

    Exome sequencing is increasingly utilized in both clinical and non-clinical settings, but little is known about its utility in healthy individuals. Most previous studies on this topic have examined a small subset of genes known to be implicated in human disease and/or have used automated pipelines to assess pathogenicity of known variants. In order to determine the frequency of both medically actionable and non-actionable but medically relevant exome findings in the general population we assessed the exomes of 70 participants who have been extensively characterized over the past several years as part of a longitudinal integrated multi-omics profiling study. We analyzed exomes by identifying rare likely pathogenic and pathogenic variants in genes associated with Mendelian disease in the Online Mendelian Inheritance in Man (OMIM) database. We then used American College of Medical Genetics (ACMG) guidelines for the classification of rare sequence variants. Additionally, we assessed pharmacogenetic variants. Twelve out of 70 (17%) participants had medically actionable findings in Mendelian disease genes. Five had phenotypes or family histories associated with their genetic variants. The frequency of actionable variants is higher than that reported in most previous studies and suggests added benefit from utilizing expanded gene lists and manual curation to assess actionable findings. A total of 63 participants (90%) had additional non-actionable findings, including 60 who were found to be carriers for recessive diseases and 21 who have increased Alzheimer's disease risk due to heterozygous or homozygous APOE e4 alleles (18 participants had both). Our results suggest that exome sequencing may have considerably more utility for health management in the general population than previously thought.

    View details for PubMedID 30487145

  • Longitudinal personal DNA methylome dynamics in a human with a chronic condition. Nature medicine Chen, R., Xia, L., Tu, K., Duan, M., Kukurba, K., Li-Pook-Than, J., Xie, D., Snyder, M. 2018

    Abstract

    Epigenomics regulates gene expression and is as important as genomics in precision personal health, as it is heavily influenced by environment and lifestyle. We profiled whole-genome DNA methylation and the corresponding transcriptome of peripheral blood mononuclear cells collected from a human volunteer over a period of 36 months, generating 28 methylome and 57 transcriptome datasets. We found that DNA methylomic changes are associated with infrequent glucose level alteration, whereas the transcriptome underwent dynamic changes during events such as viral infections. Most DNA meta-methylome changes occurred 80-90 days before clinically detectable glucose elevation. Analysis of the deep personal methylome dataset revealed an unprecedented number of allelic differentially methylated regions that remain stable longitudinally and are preferentially associated with allele-specific gene regulation. Our results revealed that changes in different types of 'omics' data associate with different physiological aspects of this individual: DNA methylation with chronic conditions and transcriptome with acute events.

    View details for PubMedID 30397358

  • Dynamic Human Environmental Exposome Revealed by Longitudinal Personal Monitoring. Cell Jiang, C., Wang, X., Li, X., Inlora, J., Wang, T., Liu, Q., Snyder, M. 2018; 175 (1): 277

    Abstract

    Human health is dependent upon environmental exposures, yet the diversity and variation in exposures are poorly understood. We developed a sensitive method to monitor personal airborne biological and chemical exposures and followed the personal exposomes of 15 individuals for up to 890 days and over 66 distinct geographical locations. We found that individuals are potentially exposed to thousands of pan-domain species and chemical compounds, including insecticides and carcinogens. Personal biological and chemical exposomes are highly dynamic and vary spatiotemporally, even for individuals located in the same general geographical region. Integrated analysis of biological and chemical exposomes revealed strong location-dependent relationships. Finally, construction of an exposome interaction network demonstrated the presence of distinct yet interconnected human- and environment-centric clouds, comprised of interacting ecosystems such as human, flora, pets, and arthropods. Overall, we demonstrate that human exposomes are diverse, dynamic, spatiotemporally-driven interaction networks with the potential to impact human health.

    View details for PubMedID 30241608

  • Decoding the Genomics of Abdominal Aortic Aneurysm. Cell Li, J., Pan, C., Zhang, S., Spin, J. M., Deng, A., Leung, L. L., Dalman, R. L., Tsao, P. S., Snyder, M. 2018; 174 (6): 1361

    Abstract

    A key aspect of genomic medicine is to make individualized clinical decisions from personal genomes. We developed a machine-learning framework to integrate personal genomes and electronic health record (EHR) data and used this framework to study abdominal aortic aneurysm (AAA), a prevalent irreversible cardiovascular disease with unclear etiology. Performing whole-genome sequencing on AAA patients and controls, we demonstrated its predictive precision solely from personal genomes. By modeling personal genomes with EHRs, this framework quantitatively assessed the effectiveness of adjusting personal lifestyles given personal genome baselines, demonstrating its utility as a personal health management tool. We showed that this new framework agnostically identified genetic components involved in AAA, which were subsequently validated in human aortic tissues and in murine models. Our study presents a new framework for disease genome analysis, which can be used for both health management and understanding the biological architecture of complex diseases.

    View details for PubMedID 30193110

  • Glucotypes reveal new patterns of glucose dysregulation. PLoS biology Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., McLaughlin, T., Snyder, M. 2018; 16 (7): e2005143

    Abstract

    Diabetes is an increasing problem worldwide; almost 30 million people, nearly 10% of the population, in the United States are diagnosed with diabetes. Another 84 million are prediabetic, and without intervention, up to 70% of these individuals may progress to type 2 diabetes. Current methods for quantifying blood glucose dysregulation in diabetes and prediabetes are limited by reliance on single-time-point measurements or on average measures of overall glycemia and neglect glucose dynamics. We have used continuous glucose monitoring (CGM) to evaluate the frequency with which individuals demonstrate elevations in postprandial glucose, the types of patterns, and how patterns vary between individuals given an identical nutrient challenge. Measurement of insulin resistance and secretion highlights the fact that the physiology underlying dysglycemia is highly variable between individuals. We developed an analytical framework that can group individuals according to specific patterns of glycemic responses called "glucotypes" that reveal heterogeneity, or subphenotypes, within traditional diagnostic categories of glucose regulation. Importantly, we found that even individuals considered normoglycemic by standard measures exhibit high glucose variability using CGM, with glucose levels reaching prediabetic and diabetic ranges 15% and 2% of the time, respectively. We thus show that glucose dysregulation, as characterized by CGM, is more prevalent and heterogeneous than previously thought and can affect individuals considered normoglycemic by standard measures, and specific patterns of glycemic responses reflect variable underlying physiology. The interindividual variability in glycemic responses to standardized meals also highlights the personal nature of glucose regulation. 
    Through extensive phenotyping, we developed a model for identifying potential mechanisms of personal glucose dysregulation and built a webtool for visualizing a user-uploaded CGM profile and classifying individualized glucose patterns into glucotypes.

    View details for PubMedID 30040822

  • Natural Selection Has Differentiated the Progesterone Receptor among Human Populations. American journal of human genetics Li, J., Hong, X., Mesiano, S., Muglia, L. J., Wang, X., Snyder, M., Stevenson, D. K., Shaw, G. M. 2018

    Abstract

    The progesterone receptor (PGR) plays a central role in maintaining pregnancy and is significantly associated with medical conditions such as preterm birth, which affects 12.6% of all births in the U.S. PGR has been evolving rapidly since the common ancestor of human and chimpanzee, and we herein investigated evolutionary dynamics of PGR during recent human migration and population differentiation. Our study revealed substantial population differentiation at the PGR locus driven by natural selection, where very recent positive selection in East Asians has substantially decreased its genetic diversity by nearly fixing evolutionarily novel alleles. On the contrary, in European populations, the PGR locus has been promoted to a highly polymorphic state likely due to balancing selection. Integrating transcriptome data across multiple tissue types together with large-scale genome-wide association data for preterm birth, our study demonstrated the consequence of the selection event in East Asians on remodeling PGR expression specifically in the ovary and determined a significant association of early spontaneous preterm birth with the evolutionarily selected variants. To reconstruct its evolutionary trajectory on the human lineage, we observed substantial differentiation between modern and archaic humans at the PGR locus, including fixation of a deleterious missense allele in the Neanderthal genome that was later introgressed in modern human populations. Taken together, our study revealed substantial evolutionary innovation in PGR even during very recent human evolution, and its different forms among human populations likely result in differential susceptibility to progesterone-associated disease conditions including preterm birth.

    View details for PubMedID 29937092

  • Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining JOURNAL OF PROTEOME RESEARCH Yu, K., Lee, T., Wan, C., Chen, Y., Re, C., Kou, S. C., Chiang, J., Kohane, I. S., Snyder, M. 2018; 17 (4): 1383–96

    Abstract

    There are more than 3.7 million published articles on the biological functions or disease implications of proteins, constituting an important resource of proteomics knowledge. However, it is difficult to summarize the millions of proteomics findings in the literature manually and quantify their relevance to the biology and diseases of interest. We developed a fully automated bioinformatics framework to identify and prioritize proteins associated with any biological entity. We used the 22 targeted areas of the Biology/Disease-driven (B/D)-Human Proteome Project (HPP) as examples, prioritized the relevant proteins through their Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores, validated the relevance of the score by comparing the protein prioritization results with a curated database, computed the scores of proteins across the topics of B/D-HPP, and characterized the top proteins in the common model organisms. We further extended the bioinformatics workflow to identify the relevant proteins in all organ systems and human diseases and deployed a cloud-based tool to prioritize proteins related to any custom search terms in real time. Our tool can facilitate the prioritization of proteins for any organ system or disease of interest and can contribute to the development of targeted proteomic studies for precision medicine.

    View details for PubMedID 29505266

  • Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome GENOME RESEARCH Tilgner, H., Jahanbani, F., Gupta, I., Collier, P., Wei, E., Rasmussen, M., Snyder, M. 2018; 28 (2): 231–42

    Abstract

    Understanding transcriptome complexity is crucial for understanding human biology and disease. Technologies such as Synthetic long-read RNA sequencing (SLR-RNA-seq) delivered 5 million isoforms and allowed assessing splicing coordination. Pacific Biosciences and Oxford Nanopore increase throughput also but require high input amounts or amplification. Our new droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k-200k partitions of 10-200 molecules at a time, enabling analysis of 10-100 million RNA molecules. SpISO-seq requires less than 1 ng of input cDNA, limiting or removing the need for prior amplification with its associated biases. Adjusting the number of reads devoted to each molecule reduces sequencing lanes and cost, with little loss in detection power. The increased number of molecules expands our understanding of isoform complexity. In addition to confirming our previously published cases of splicing coordination (e.g., BIN1), the greater depth reveals many new cases, such as MAPT. Coordination of internal exons is found to be extensive among protein coding genes: 23.5%-59.3% (95% confidence interval) of highly expressed genes with distant alternative exons exhibit coordination, showcasing the need for long-read transcriptomics. However, coordination is less frequent for noncoding sequences, suggesting a larger role of splicing coordination in shaping proteins. Groups of genes with coordination are involved in protein-protein interactions with each other, raising the possibility that coordination facilitates complex formation and/or function. We also find new splicing coordination types, involving initial and terminal exons. Our results provide a more comprehensive understanding of the human transcriptome and a general, cost-effective method to analyze it.

    View details for PubMedID 29196558

    View details for PubMedCentralID PMC5793787

  • A genome-wide association study identifies only two ancestry specific variants associated with spontaneous preterm birth SCIENTIFIC REPORTS Rappoport, N., Toung, J., Hadley, D., Wong, R. J., Fujioka, K., Reuter, J., Abbott, C. W., Oh, S., Hu, D., Eng, C., Huntsman, S., Bodian, D. L., Niederhuber, J. E., Hong, X., Zhang, G., Sikora-Wohfeld, W., Gignoux, C. R., Wang, H., Oehlert, J., Jelliffe-Pawlowski, L. L., Gould, J. B., Darmstadt, G. L., Wang, X., Bustamante, C. D., Snyder, M. P., Ziv, E., Patsopoulos, N. A., Muglia, L. J., Burchard, E., Shaw, G. M., O'Brodovich, H. M., Stevenson, D. K., Butte, A. J., Sirota, M. 2018; 8: 226

    Abstract

    Preterm birth (PTB), or the delivery prior to 37 weeks of gestation, is a significant cause of infant morbidity and mortality. Although twin studies estimate that maternal genetic contributions account for approximately 30% of the incidence of PTB, and other studies reported fetal gene polymorphism association, to date no consistent associations have been identified. In this study, we performed the largest reported genome-wide association study analysis on 1,349 cases of PTB and 12,595 ancestry-matched controls from the focusing on genomic fetal signals. We tested over 2 million single nucleotide polymorphisms (SNPs) for associations with PTB across five subpopulations: African (AFR), the Americas (AMR), European, South Asian, and East Asian. We identified only two intergenic loci associated with PTB at a genome-wide level of significance: rs17591250 (P = 4.55E-09) on chromosome 1 in the AFR population and rs1979081 (P = 3.72E-08) on chromosome 8 in the AMR group. We have queried several existing replication cohorts and found no support of these associations. We conclude that the fetal genetic contribution to PTB is unlikely due to a single common genetic variant, but could be explained by interactions of multiple common variants, or of rare variants affected by environmental influences, all not detectable using a GWAS alone.

    View details for PubMedID 29317701

  • Integrative Personal Omics Profiles during Periods of Weight Gain and Loss. Cell systems Piening, B. D., Zhou, W., Contrepois, K., Röst, H., Gu Urban, G. J., Mishra, T., Hanson, B. M., Bautista, E. J., Leopold, S., Yeh, C. Y., Spakowicz, D., Banerjee, I., Chen, C., Kukurba, K., Perelman, D., Craig, C., Colbert, E., Salins, D., Rego, S., Lee, S., Zhang, C., Wheeler, J., Sailani, M. R., Liang, L., Abbott, C., Gerstein, M., Mardinoglu, A., Smith, U., Rubin, D. L., Pitteri, S., Sodergren, E., McLaughlin, T. L., Weinstock, G. M., Snyder, M. P. 2018

    Abstract

    Advances in omics technologies now allow an unprecedented level of phenotyping for human diseases, including obesity, in which individual responses to excess weight are heterogeneous and unpredictable. To aid the development of better understanding of these phenotypes, we performed a controlled longitudinal weight perturbation study combining multiple omics strategies (genomics, transcriptomics, multiple proteomics assays, metabolomics, and microbiomics) during periods of weight gain and loss in humans. Results demonstrated that: (1) weight gain is associated with the activation of strong inflammatory and hypertrophic cardiomyopathy signatures in blood; (2) although weight loss reverses some changes, a number of signatures persist, indicative of long-term physiologic changes; (3) we observed omics signatures associated with insulin resistance that may serve as novel diagnostics; (4) specific biomolecules were highly individualized and stable in response to perturbations, potentially representing stable personalized markers. Most data are available open access and serve as a valuable resource for the community.

    View details for PubMedID 29361466

  • Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma CELL SYSTEMS Yu, K., Berry, G. J., Rubin, D. L., Re, C., Altman, R. B., Snyder, M. 2017; 5 (6): 620

    Abstract

    Adenocarcinoma accounts for more than 40% of lung malignancy, and microscopic pathology evaluation is indispensable for its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. Here, we obtained H&E-stained whole-slide histopathology images, pathology reports, RNA sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas and used these to identify molecular pathways associated with histopathology patterns. We report cell-cycle regulation and nucleotide binding pathways underpinning tumor cell dedifferentiation, and we predicted histology grade using transcriptomics and proteomics signatures (area under curve >0.80). We built an integrative histopathology-transcriptomics model to generate better prognostic predictions for stage I patients (p = 0.0182 ± 0.0021) compared with gene expression or histopathology studies alone, and the results were replicated in an independent cohort (p = 0.0220 ± 0.0070). These results motivate the integration of histopathology and omics data to investigate molecular mechanisms of pathology findings and enhance clinical prognostic prediction.

    View details for PubMedID 29153840

    View details for PubMedCentralID PMC5746468

  • Plasma sterols and depressive symptom severity in a population-based cohort PLOS ONE Cenik, B., Cenik, C., Snyder, M. P., Brown, E. 2017; 12 (9): e0184382

    Abstract

    Convergent evidence strongly suggests major depressive disorder is heterogeneous in its etiology and clinical characteristics. Depression biomarkers hold potential for identifying etiological subtypes, improving diagnostic accuracy, predicting treatment response, and personalization of treatment. Human plasma contains numerous sterols that have not been systematically studied. Changes in cholesterol concentrations have been implicated in suicide and depression, suggesting plasma sterols may be depression biomarkers. Here, we investigated associations between plasma levels of 34 sterols (measured by mass spectrometry) and scores on the Quick Inventory of Depressive Symptomatology-Self Report (QIDS-SR16) scale in 3117 adult participants in the Dallas Heart Study, an ethnically diverse, population-based cohort. We built a random forest model using feature selection from a pool of 43 variables including demographics, general health indicators, and sterol concentrations. This model comprised 19 variables, 13 of which were sterol concentrations, and explained 15.5% of the variation in depressive symptoms. Desmosterol concentrations below the fifth percentile (1.9 ng/mL, OR 1.9, 95% CI 1.2-2.9) were significantly associated with depressive symptoms of at least moderate severity (QIDS-SR16 score ≥10.5). This is the first study reporting a novel association between plasma concentrations of cholesterol precursors and depressive symptom severity.

    View details for PubMedID 28886149

  • Fetal de novo mutations and preterm birth. PLoS genetics Li, J., Oehlert, J., Snyder, M., Stevenson, D. K., Shaw, G. M. 2017; 13 (4)

    Abstract

    Preterm birth (PTB) affects ~12% of pregnancies in the US. Despite its high mortality and morbidity, the molecular etiology underlying PTB has been unclear. Numerous studies have been devoted to identifying genetic factors in maternal and fetal genomes, but so far few genomic loci have been associated with PTB. By analyzing whole-genome sequencing data from 816 trio families, for the first time, we observed the role of fetal de novo mutations in PTB. We observed a significant increase in de novo mutation burden in PTB fetal genomes. Our genomic analyses further revealed that affected genes by PTB de novo mutations were dosage sensitive, intolerant to genomic deletions, and their mouse orthologs were likely developmentally essential. These genes were significantly involved in early fetal brain development, which was further supported by our analysis of copy number variants identified from an independent PTB cohort. Our study indicates a new mechanism in PTB occurrence independently contributed from fetal genomes, and thus opens a new avenue for future PTB research.

    View details for DOI 10.1371/journal.pgen.1006689

    View details for PubMedID 28388617

  • De novo and rare mutations in the HSPA1L heat shock gene associated with inflammatory bowel disease GENOME MEDICINE Takahashi, S., Andreoletti, G., Chen, R., Munehira, Y., Batra, A., Afzal, N. A., Beattie, R. M., Bernstein, J. A., Ennis, S., Snyder, M. 2017; 9

    Abstract

    Inflammatory bowel disease (IBD) is a chronic, relapsing inflammatory disease of the gastrointestinal tract which includes ulcerative colitis and Crohn's disease. Genetic risk factors for IBD are not well understood. We performed a family-based whole exome sequencing (WES) analysis on a core family (Family A) to identify potential causal mutations and then analyzed exome data from a Caucasian pediatric cohort (136 patients and 106 controls) to validate the presence of mutations in the candidate gene, heat shock 70 kDa protein 1-like (HSPA1L). Biochemical assays of the de novo and rare (minor allele frequency, MAF < 0.01) mutation variant proteins further validated the predicted deleterious effects of the identified alleles. In the proband of Family A, we found a heterozygous de novo mutation (c.830C > T; p.Ser277Leu) in HSPA1L. Through analysis of WES data of 136 patients, we identified five additional rare HSPA1L mutations (p.Gly77Ser, p.Leu172del, p.Thr267Ile, p.Ala268Thr, p.Glu558Asp) in six patients. In contrast, rare HSPA1L mutations were not observed in controls, and were significantly enriched in patients (P = 0.02). Interestingly, we did not find non-synonymous rare mutations in the HSP70 isoforms HSPA1A and HSPA1B. Biochemical assays revealed that all six rare HSPA1L variant proteins showed decreased chaperone activity in vitro. Moreover, three variants demonstrated dominant negative effects on HSPA1L and HSPA1A protein activity. Our results indicate that de novo and rare mutations in HSPA1L are associated with IBD and provide insights into the pathogenesis of IBD, and also expand our understanding of the roles of HSP70s in human disease.

    View details for DOI 10.1186/s13073-016-0394-9

    View details for PubMedID 28126021

  • Digital Health: Tracking Physiomes and Activity Using Wearable Biosensors Reveals Useful Health-Related Information. PLoS biology Li, X., Dunn, J., Salins, D., Zhou, G., Zhou, W., Schüssler-Fiorenza Rose, S. M., Perelman, D., Colbert, E., Runge, R., Rego, S., Sonecha, R., Datta, S., McLaughlin, T., Snyder, M. P. 2017; 15 (1)

    Abstract

    A new wave of portable biosensors allows frequent measurement of health-related physiology. We investigated the use of these devices to monitor human physiological changes during various activities and their role in managing health and diagnosing and analyzing disease. By recording over 250,000 daily measurements for up to 43 individuals, we found personalized circadian differences in physiological parameters, replicating previous physiological findings. Interestingly, we found striking changes in particular environments, such as airline flights (decreased peripheral capillary oxygen saturation [SpO2] and increased radiation exposure). These events are associated with physiological macro-phenotypes such as fatigue, providing a strong association between reduced pressure/oxygen and fatigue on high-altitude flights. Importantly, we combined biosensor information with frequent medical measurements and made two important observations: First, wearable devices were useful in identification of early signs of Lyme disease and inflammatory responses; we used this information to develop a personalized, activity-based normalization framework to identify abnormal physiological signals from longitudinal data for facile disease detection. Second, wearables distinguish physiological differences between insulin-sensitive and -resistant individuals. Overall, these results indicate that portable biosensors provide useful information for monitoring personal activities and physiology and are likely to play an important role in managing health and enabling affordable health care access to groups traditionally limited by socioeconomic class or remote geography.

    View details for DOI 10.1371/journal.pbio.2001402

    View details for PubMedID 28081144

  • Static and Dynamic DNA Loops form AP-1-Bound Activation Hubs during Macrophage Development. Molecular cell Phanstiel, D. H., Van Bortle, K., Spacek, D., Hess, G. T., Shamim, M. S., Machol, I., Love, M. I., Aiden, E. L., Bassik, M. C., Snyder, M. P. 2017; 67 (6): 1037–48.e6

    Abstract

    The three-dimensional arrangement of the human genome comprises a complex network of structural and regulatory chromatin loops important for coordinating changes in transcription during human development. To better understand the mechanisms underlying context-specific 3D chromatin structure and transcription during cellular differentiation, we generated comprehensive in situ Hi-C maps of DNA loops in human monocytes and differentiated macrophages. We demonstrate that dynamic looping events are regulatory rather than structural in nature and uncover widespread coordination of dynamic enhancer activity at preformed and acquired DNA loops. Enhancer-bound loop formation and enhancer activation of preformed loops together form multi-loop activation hubs at key macrophage genes. Activation hubs connect 3.4 enhancers per promoter and exhibit a strong enrichment for activator protein 1 (AP-1)-binding events, suggesting that multi-loop activation hubs involving cell-type-specific transcription factors represent an important class of regulatory chromatin structures for the spatiotemporal control of transcription.

    View details for PubMedID 28890333

  • Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers. Cell stem cell Gu, M., Shao, N., Sa, S., Li, D., Termglinchan, V., Ameen, M., Karakikes, I., Sosa, G., Grubert, F., Lee, J., Cao, A., Taylor, S., Ma, Y., Zhao, Z., Chappell, J., Hamid, R., Austin, E. D., Gold, J. D., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2016

    Abstract

    In familial pulmonary arterial hypertension (FPAH), the autosomal dominant disease-causing BMPR2 mutation is only 20% penetrant, suggesting that genetic variation provides modifiers that alleviate the disease. Here, we used comparison of induced pluripotent stem cell-derived endothelial cells (iPSC-ECs) from three families with unaffected mutation carriers (UMCs), FPAH patients, and gender-matched controls to investigate this variation. Our analysis identified features of UMC iPSC-ECs related to modifiers of BMPR2 signaling or to differentially expressed genes. FPAH-iPSC-ECs showed reduced adhesion, survival, migration, and angiogenesis compared to UMC-iPSC-ECs and control cells. The "rescued" phenotype of UMC cells was related to an increase in specific BMPR2 activators and/or a reduction in inhibitors, and the improved cell adhesion could be attributed to preservation of related signaling. The improved survival was related to increased BIRC3 and was independent of BMPR2. Our findings therefore highlight protective modifiers for FPAH that could help inform development of future treatment strategies.

    View details for DOI 10.1016/j.stem.2016.08.019

    View details for PubMedID 28017794

  • Simul-seq: combined DNA and RNA sequencing for whole-genome and transcriptome profiling. Nature methods Reuter, J. A., Spacek, D. V., Pai, R. K., Snyder, M. P. 2016; 13 (11): 953-958

    Abstract

    Paired DNA and RNA profiling is increasingly employed in genomics research to uncover molecular mechanisms of disease and to explore personal genotype and phenotype correlations. Here, we introduce Simul-seq, a technique for the production of high-quality whole-genome and transcriptome sequencing libraries from small quantities of cells or tissues. We apply the method to laser-capture-microdissected esophageal adenocarcinoma tissue, revealing a highly aneuploid tumor genome with extensive blocks of increased homozygosity and corresponding increases in allele-specific expression. Among this widespread allele-specific expression, we identify germline polymorphisms that are associated with response to cancer therapies. We further leverage this integrative data to uncover expressed mutations in several known cancer genes as well as a recurrent mutation in the motor domain of KIF3B that significantly affects kinesin-microtubule interactions. Simul-seq provides a new streamlined approach for generating comprehensive genome and transcriptome profiles from limited quantities of clinically relevant samples.

    View details for DOI 10.1038/nmeth.4028

    View details for PubMedID 27723755

  • Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations NATURE GENETICS Araya, C. L., Cenik, C., Reuter, J. A., Kiss, G., Pande, V. S., Snyder, M. P., Greenleaf, W. J. 2016; 48 (2): 117-125

    Abstract

    Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.

    View details for DOI 10.1038/ng.3471

    View details for Web of Science ID 000369043900008

  • Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nature biotechnology Kuleshov, V., Jiang, C., Zhou, W., Jahanbani, F., Batzoglou, S., Snyder, M. 2016; 34 (1): 64-69

    Abstract

    Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.

    View details for DOI 10.1038/nbt.3416

    View details for PubMedID 26655498

    View details for PubMedCentralID PMC4884093

  • Yeast longevity promoted by reversing aging-associated decline in heavy isotope content. NPJ aging and mechanisms of disease Li, X., Snyder, M. P. 2016; 2: 16004

    Abstract

    Dysregulation of metabolism develops with organismal aging. Both genetic and environmental manipulations promote longevity by effectively diverting various metabolic processes against aging. How these processes converge on the metabolome is not clear. Here we report that the heavy isotopic forms of common elements, a universal feature of metabolites, decline in yeast cells undergoing chronological aging. Supplementation of deuterium, a heavy hydrogen isotope, through heavy water (D2O) uptake extends yeast chronological lifespan (CLS) by up to 85% with minimal effects on growth. The CLS extension by D2O bypasses several known genetic regulators, but is abrogated by calorie restriction and mitochondrial deficiency. Heavy water substantially suppresses endogenous generation of reactive oxygen species (ROS) and slows the pace of metabolic consumption and disposal. Protection from aging by heavy isotopes might result from kinetic modulation of biochemical reactions. Altogether, our findings reveal a novel perspective of aging and new means for promoting longevity.

    View details for PubMedID 28721263

  • Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature communications Yu, K., Zhang, C., Berry, G. J., Altman, R. B., Ré, C., Rubin, D. L., Snyder, M. 2016; 7: 12474

    Abstract

    Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients' prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs.

    View details for DOI 10.1038/ncomms12474

    View details for PubMedID 27527408

  • Identification of Human Neuronal Protein Complexes Reveals Biochemical Activities and Convergent Mechanisms of Action in Autism Spectrum Disorders CELL SYSTEMS Li, J., Ma, Z., Shi, M., Malty, R. H., Aoki, H., Minic, Z., Phanse, S., Jin, K., Wall, D. P., Zhang, Z., Urban, A. E., Hallmayer, J., Babu, M., Snyder, M. 2015; 1 (5): 361-374

    Abstract

    The prevalence of autism spectrum disorders (ASDs) is rapidly growing, yet its molecular basis is poorly understood. We used a systems approach in which ASD candidate genes were mapped onto the ubiquitous human protein complexes and the resulting complexes were characterized. The studies revealed the role of histone deacetylases (HDAC1/2) in regulating the expression of ASD orthologs in the embryonic mouse brain. Proteome-wide screens for the co-complexed subunits with HDAC1 and six other key ASD proteins in neuronal cells revealed a protein interaction network, which displayed preferential expression in fetal brain development, exhibited increased deleterious mutations in ASD cases, and were strongly regulated by FMRP and MECP2 causal for Fragile X and Rett syndromes, respectively. Overall, our study reveals molecular components in ASD, suggests a shared mechanism between the syndromic and idiopathic forms of ASDs, and provides a systems framework for analyzing complex human diseases.

    View details for DOI 10.1016/j.cels.2015.11.002

    View details for Web of Science ID 000209926300009

    View details for PubMedCentralID PMC4776331

  • Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions CELL Grubert, F., Zaugg, J. B., Kasowski, M., Ursu, O., Spacek, D. V., Martin, A. R., Greenside, P., Srivas, R., Phanstiel, D. H., Pekowska, A., Heidari, N., Euskirchen, G., Huber, W., Pritchard, J. K., Bustamante, C. D., Steinmetz, L. M., Kundaje, A., Snyder, M. 2015; 162 (5): 1051-1065

    Abstract

    Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.

    View details for DOI 10.1016/j.cell.2015.07.048

    View details for Web of Science ID 000360589900015

    View details for PubMedCentralID PMC4556133

  • Recurrent somatic mutations in regulatory regions of human cancer genomes. Nature genetics Melton, C., Reuter, J. A., Spacek, D. V., Snyder, M. 2015; 47 (7): 710-716

    Abstract

    Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus-specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.

    View details for DOI 10.1038/ng.3332

    View details for PubMedID 26053494

    View details for PubMedCentralID PMC4485503

  • Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events NATURE BIOTECHNOLOGY Tilgner, H., Jahanbani, F., Blauwkamp, T., Moshrefi, A., Jaeger, E., Chen, F., Harel, I., Bustamante, C. D., Rasmussen, M., Snyder, M. P. 2015; 33 (7): 736-742

    Abstract

    Alternative splicing shapes mammalian transcriptomes, with many RNA molecules undergoing multiple distant alternative splicing events. Comprehensive transcriptome analysis, including analysis of exon co-association in the same molecule, requires deep, long-read sequencing. Here we introduce an RNA sequencing method, synthetic long-read RNA sequencing (SLR-RNA-seq), in which small pools (≤1,000 molecules/pool, ≤1 molecule/gene for most genes) of full-length cDNAs are amplified, fragmented and short-read-sequenced. We demonstrate that these RNA sequences reconstructed from the short reads from each of the pools are mostly close to full length and contain few insertion and deletion errors. We report many previously undescribed isoforms (human brain: ∼13,800 affected genes, 14.5% of molecules; mouse brain ∼8,600 genes, 18% of molecules) and up to 165 human distant molecularly associated exon pairs (dMAPs) and distant molecularly and mutually exclusive pairs (dMEPs). Of 16 associated pairs detected in the mouse brain, 9 are conserved in human. Our results indicate conserved mechanisms that can produce distant but phased features on transcript and proteome isoforms.

    View details for DOI 10.1038/nbt.3242

    View details for Web of Science ID 000358396100029

    View details for PubMedID 25985263

  • Comparison of the transcriptional landscapes between human and mouse tissues PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lin, S., Lin, Y., Nery, J. R., Urich, M. A., Breschi, A., Davis, C. A., Dobin, A., Zaleski, C., Beer, M. A., Chapman, W. C., Gingeras, T. R., Ecker, J. R., Snyder, M. P. 2014; 111 (48): 17224-17229

    Abstract

    Although the similarities between humans and mice are typically highlighted, morphologically and genetically, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.

    View details for DOI 10.1073/pnas.1413624111

    View details for Web of Science ID 000345920800059

    View details for PubMedID 25413365

    View details for PubMedCentralID PMC4260565

  • Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature Araya, C. L., Kawli, T., Kundaje, A., Jiang, L., Wu, B., Vafeados, D., Terrell, R., Weissdepp, P., Gevirtzman, L., Mace, D., Niu, W., Boyle, A. P., Xie, D., Ma, L., Murray, J. I., Reinke, V., Waterston, R. H., Snyder, M. 2014; 512 (7515): 400-405

    View details for DOI 10.1038/nature13497

    View details for PubMedID 25164749

  • Comparative analysis of regulatory information and circuits across distant species. Nature Boyle, A. P., Araya, C. L., Brdlik, C., Cayting, P., Cheng, C., Cheng, Y., Gardner, K., Hillier, L. W., Janette, J., Jiang, L., Kasper, D., Kawli, T., Kheradpour, P., Kundaje, A., Li, J. J., Ma, L., Niu, W., Rehm, E. J., Rozowsky, J., Slattery, M., Spokony, R., Terrell, R., Vafeados, D., Wang, D., Weisdepp, P., Wu, Y., Xie, D., Yan, K., Feingold, E. A., Good, P. J., Pazin, M. J., Huang, H., Bickel, P. J., Brenner, S. E., Reinke, V., Waterston, R. H., Gerstein, M., White, K. P., Kellis, M., Snyder, M. 2014; 512 (7515): 453-456

    View details for DOI 10.1038/nature13668

    View details for PubMedID 25164757

  • Defining a personal, allele-specific, and single-molecule long-read transcriptome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Tilgner, H., Grubert, F., Sharon, D., Snyder, M. P. 2014; 111 (27): 9869-9874

    Abstract

    Personal transcriptomes in which all of an individual's genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV, in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.

    View details for DOI 10.1073/pnas.1400447111

    View details for Web of Science ID 000338514800044

    View details for PubMedCentralID PMC4103364

  • Clinical interpretation and implications of whole-genome sequencing. JAMA : the journal of the American Medical Association Dewey, F. E., Grove, M. E., Pan, C., Goldstein, B. A., Bernstein, J. A., Chaib, H., Merker, J. D., Goldfeder, R. L., Enns, G. M., David, S. P., Pakdaman, N., Ormond, K. E., Caleshu, C., Kingham, K., Klein, T. E., Whirl-Carrillo, M., Sakamoto, K., Wheeler, M. T., Butte, A. J., Ford, J. M., Boxer, L., Ioannidis, J. P., Yeung, A. C., Altman, R. B., Assimes, T. L., Snyder, M., Ashley, E. A., Quertermous, T. 2014; 311 (10): 1035-1045

    Abstract

    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.

    View details for DOI 10.1001/jama.2014.1717

    View details for PubMedID 24618965

    View details for PubMedCentralID PMC4119063

  • Divergence in a master variator generates distinct phenotypes and transcriptional responses GENES & DEVELOPMENT Gallagher, J. E., Zheng, W., Rong, X., Miranda, N., Lin, Z., Dunn, B., Zhao, H., Snyder, M. P. 2014; 28 (4): 409-421

    Abstract

    Genetic basis of phenotypic differences in individuals is an important area in biology and personalized medicine. Analysis of divergent Saccharomyces cerevisiae strains grown under different conditions revealed extensive variation in response to both drugs (e.g., 4-nitroquinoline 1-oxide [4NQO]) and different carbon sources. Differences in 4NQO resistance were due to amino acid variation in the transcription factor Yrr1. Yrr1(YJM789) conferred 4NQO resistance but caused slower growth on glycerol, and vice versa with Yrr1(S96), indicating that alleles of Yrr1 confer distinct phenotypes. The binding targets of Yrr1 alleles from diverse yeast strains varied considerably among different strains grown under the same conditions as well as for the same strain under different conditions, indicating that distinct molecular programs are conferred by the different Yrr1 alleles. Our results demonstrate that genetic variations in one important control gene (YRR1), lead to distinct regulatory programs and phenotypes in individuals. We term these polymorphic control genes "master variators."

    View details for DOI 10.1101/gad.228940.113

    View details for Web of Science ID 000331616100009

    View details for PubMedID 24532717

    View details for PubMedCentralID PMC3937518

  • Integrated systems analysis reveals a molecular network underlying autism spectrum disorders. Molecular systems biology Li, J., Shi, M., Ma, Z., Zhao, S., Euskirchen, G., Ziskin, J., Urban, A., Hallmayer, J., Snyder, M. 2014; 10 (12): 774-?

    Abstract

    Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized with a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA-sequencing of the corpus callosum from patients with autism exhibited extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology.

    View details for DOI 10.15252/msb.20145487

    View details for PubMedID 25549968

    View details for PubMedCentralID PMC4300495

  • Principles of regulatory information conservation between mouse and human. Nature Cheng, Y., Ma, Z., Kim, B. H., Wu, W., Cayting, P., Boyle, A. P., Sundaram, V., Xing, X., Dogan, N., Li, J., Euskirchen, G., Lin, S., Lin, Y., Visel, A., Kawli, T., Yang, X., Patacsil, D., Keller, C. A., Giardine, B., Kundaje, A., Wang, T., Pennacchio, L. A., Weng, Z., Hardison, R. C., Snyder, M. P. 2014; 515 (7527): 371-375

    Abstract

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

    View details for PubMedID 25409826

  • Extensive Variation in Chromatin States Across Humans SCIENCE Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J. B., Kundaje, A., Liu, Y., Boyle, A. P., Zhang, Q. C., Zakharia, F., Spacek, D. V., Li, J., Xie, D., Olarerin-George, A., Steinmetz, L. M., Hogenesch, J. B., Kellis, M., Batzoglou, S., Snyder, M. 2013; 342 (6159): 750-752

    Abstract

    The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.

    View details for DOI 10.1126/science.1242510

    View details for PubMedID 24136358

  • A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2013; 31 (11): 1009-1014

    Abstract

    Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5' to 3' end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5' ends. For longer RNA molecules more 5' nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

    View details for DOI 10.1038/nbt.2705

    View details for PubMedID 24108091

  • Dynamic trans-Acting Factor Colocalization in Human Cells CELL Xie, D., Boyle, A. P., Wu, L., Zhai, J., Kawli, T., Snyder, M. 2013; 155 (3): 713-724

    Abstract

    Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.

    View details for DOI 10.1016/j.cell.2013.09.043

    View details for Web of Science ID 000326571800023

    View details for PubMedID 24243024

  • Whole-exome sequencing identifies tetratricopeptide repeat domain 7A (TTC7A) mutations for combined immunodeficiency with intestinal atresias. The Journal of allergy and clinical immunology Chen, R., Giliani, S., Lanzi, G., Mias, G. I., Lonardi, S., Dobbs, K., Manis, J., Im, H., Gallagher, J. E., Phanstiel, D. H., Euskirchen, G., Lacroute, P., Bettinger, K., Moratto, D., Weinacht, K., Montin, D., Gallo, E., Mangili, G., Porta, F., Notarangelo, L. D., Pedretti, S., Al-Herz, W., Alfahdli, W., Comeau, A. M., Traister, R. S., Pai, S., Carella, G., Facchetti, F., Nadeau, K. C., Snyder, M., Notarangelo, L. D. 2013; 132 (3): 656-664.e17

    Abstract

    Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects. We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families. We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA. Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA. We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.

    View details for DOI 10.1016/j.jaci.2013.06.013

    View details for Web of Science ID 000323612000018

    View details for PubMedID 23830146

  • Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences of the United States of America Karczewski, K. J., Dudley, J. T., Kukurba, K. R., Chen, R., Butte, A. J., Montgomery, S. B., Snyder, M. 2013; 110 (23): 9607-9612

    Abstract

    Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.

    View details for DOI 10.1073/pnas.1219099110

    View details for PubMedID 23690573

  • Specific plasma autoantibody reactivity in myelodysplastic syndromes. Scientific reports Mias, G. I., Chen, R., Zhang, Y., Sridhar, K., Sharon, D., Xiao, L., Im, H., Snyder, M. P., Greenberg, P. L. 2013; 3: 3311-?

    Abstract

    Increased autoantibody reactivity in plasma from Myelodysplastic Syndromes (MDS) patients may provide novel disease signatures, and possible early detection. In a two-stage study we investigated Immunoglobulin G reactivity in plasma from MDS, Acute Myeloid Leukemia post MDS patients, and a healthy cohort. In exploratory Stage I we utilized high-throughput protein arrays to identify 35 high-interest proteins showing increased reactivity in patient subgroups compared to healthy controls. In validation Stage II we designed new arrays focusing on 25 of the proteins identified in Stage I and expanded the initial cohort. We validated increased antibody reactivity against AKT3, FCGR3A and ARL8B in patients, which enabled sample classification into stable MDS and healthy individuals. We also detected elevated AKT3 protein levels in MDS patient plasma. The discovery of increased specific autoantibody reactivity in MDS patients, provides molecular signatures for classification, supplementing existing risk categorizations, and may enhance diagnostic and prognostic capabilities for MDS.

    View details for DOI 10.1038/srep03311

    View details for PubMedID 24264604

  • Extensive genetic variation in somatic human tissues PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA O'Huallachain, M., Karczewski, K. J., Weissman, S. M., Urban, A. E., Snyder, M. P. 2012; 109 (44): 18018-18023

    Abstract

    Genetic variation between individuals has been extensively investigated, but differences between tissues within individuals are far less understood. It is commonly assumed that all healthy cells that arise from the same zygote possess the same genomic content, with a few known exceptions in the immune system and germ line. However, a growing body of evidence shows that genomic variation exists between differentiated tissues. We investigated the scope of somatic genomic variation between tissues within humans. Analysis of copy number variation by high-resolution array-comparative genomic hybridization in diverse tissues from six unrelated subjects reveals a significant number of intraindividual genomic changes between tissues. Many (79%) of these events affect genes. Our results have important consequences for understanding normal genetic and phenotypic variation within individuals, and they have significant implications for both the etiology of genetic diseases such as cancer and for immortalized cell lines that might be used in research and therapeutics.

    View details for DOI 10.1073/pnas.1213736109

    View details for Web of Science ID 000311149900070

    View details for PubMedID 23043118

    View details for PubMedCentralID PMC3497787

  • An integrated encyclopedia of DNA elements in the human genome NATURE Dunham, I., Kundaje, A., Aldred, S. F., Collins, P. J., Davis, C., Doyle, F., Epstein, C. B., Frietze, S., Harrow, J., Kaul, R., Khatun, J., Lajoie, B. R., Landt, S. G., Lee, B., Pauli, F., Rosenbloom, K. R., Sabo, P., Safi, A., Sanyal, A., Shoresh, N., Simon, J. M., Song, L., Trinklein, N. D., Altshuler, R. C., Birney, E., Brown, J. B., Cheng, C., Djebali, S., Dong, X., Dunham, I., Ernst, J., Furey, T. S., Gerstein, M., Giardine, B., Greven, M., Hardison, R. C., Harris, R. S., Herrero, J., Hoffman, M. M., Iyer, S., Kellis, M., Khatun, J., Kheradpour, P., Kundaje, A., Lassmann, T., Li, Q., Lin, X., Marinov, G. K., Merkel, A., Mortazavi, A., Parker, S. C., Reddy, T. E., Rozowsky, J., Schlesinger, F., Thurman, R. E., Wang, J., Ward, L. D., Whitfield, T. W., Wilder, S. P., Wu, W., Xi, H. S., Yip, K. Y., Zhuang, J., Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., Snyder, M., Pazin, M. J., Lowdon, R. F., Dillon, L. A., Adams, L. B., Kelly, C. J., Zhang, J., Wexler, J. R., Green, E. D., Good, P. J., Feingold, E. A., Bernstein, B. E., Birney, E., Crawford, G. E., Dekker, J., Elnitski, L., Farnham, P. J., Gerstein, M., Giddings, M. C., Gingeras, T. R., Green, E. D., Guigo, R., Hardison, R. C., Hubbard, T. J., Kellis, M., Kent, W. J., Lieb, J. D., Margulies, E. H., Myers, R. M., Snyder, M., Stamatoyannopoulos, J. A., Tenenbaum, S. A., Weng, Z., White, K. P., Wold, B., Khatun, J., Yu, Y., Wrobel, J., Risk, B. A., Gunawardena, H. P., Kuiper, H. C., Maier, C. W., Xie, L., Chen, X., Giddings, M. C., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Kheradpour, P., Mikkelsen, T. S., Gillespie, S., Goren, A., Ram, O., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Truong, T., Ward, L. D., Altshuler, R. C., Eaton, M. L., Kellis, M., Djebali, S., Davis, C. 
A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Roeder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., Antoshechkin, I., Baer, M. T., Batut, P., Bell, I., Bell, K., Chakrabortty, S., Chen, X., Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R., Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez, D., Gordon, A., Gunawardena, H. P., Howald, C., Jha, S., Johnson, R., Kapranov, P., King, B., Kingswood, C., Li, G., Luo, O. J., Park, E., Preall, J. B., Presaud, K., Ribeca, P., Risk, B. A., Robyr, D., Ruan, X., Sammeth, M., Sandhu, K. S., Schaeffer, L., See, L., Shahab, A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang, H., Wrobel, J., Yu, Y., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T. J., Reymond, A., Antonarakis, S. E., Hannon, G. J., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P., Guigo, R., Gingeras, T. R., Rosenbloom, K. R., Sloan, C. A., Learned, K., Malladi, V. S., Wong, M. C., Barber, G., Cline, M. S., Dreszer, T. R., Heitner, S. G., Karolchik, D., Kent, W. J., Kirkup, V. M., Meyer, L. R., Long, J. C., Maddren, M., Raney, B. J., Furey, T. S., Song, L., Grasfeder, L. L., Giresi, P. G., Lee, B., Battenhouse, A., Sheffield, N. C., Simon, J. M., Showers, K. A., Safi, A., London, D., Bhinge, A. A., Shestak, C., Schaner, M. R., Kim, S. K., Zhang, Z. Z., Mieczkowski, P. A., Mieczkowska, J. O., Liu, Z., McDaniell, R. M., Ni, Y., Rashid, N. U., Kim, M. J., Adar, S., Zhang, Z., Wang, T., Winter, D., Keefe, D., Birney, E., Iyer, V. R., Lieb, J. D., Crawford, G. E., Li, G., Sandhu, K. S., Zheng, M., Wang, P., Luo, O. J., Shahab, A., Fullwood, M. J., Ruan, X., Ruan, Y., Myers, R. M., Pauli, F., Williams, B. A., Gertz, J., Marinov, G. K., Reddy, T. E., Vielmetter, J., Partridge, E. C., Trout, D., Varley, K. 
E., Gasper, C., Bansal, A., Pepke, S., Jain, P., Amrhein, H., Bowling, K. M., Anaya, M., Cross, M. K., King, B., Muratet, M. A., Antoshechkin, I., Newberry, K. M., McCue, K., Nesmith, A. S., Fisher-Aylor, K. I., Pusey, B., DeSalvo, G., Parker, S. L., Balasubramanian, S., Davis, N. S., Meadows, S. K., Eggleston, T., Gunter, C., Newberry, J. S., Levy, S. E., Absher, D. M., Mortazavi, A., Wong, W. H., Wold, B., Blow, M. J., Visel, A., Pennachio, L. A., Elnitski, L., Margulies, E. H., Parker, S. C., Petrykowska, H. M., Abyzov, A., Aken, B., Barrell, D., Barson, G., Berry, A., Bignell, A., Boychenko, V., Bussotti, G., Chrast, J., Davidson, C., Derrien, T., Despacio-Reyes, G., Diekhans, M., Ezkurdia, I., Frankish, A., Gilbert, J., Gonzalez, J. M., Griffiths, E., Harte, R., Hendrix, D. A., Howald, C., Hunt, T., Jungreis, I., Kay, M., Khurana, E., Kokocinski, F., Leng, J., Lin, M. F., Loveland, J., Lu, Z., Manthravadi, D., Mariotti, M., Mudge, J., Mukherjee, G., Notredame, C., Pei, B., Rodriguez, J. M., Saunders, G., Sboner, A., Searle, S., Sisu, C., Snow, C., Steward, C., Tanzer, A., Tapanari, E., Tress, M. L., van Baren, M. J., Walters, N., Washietl, S., Wilming, L., Zadissa, A., Zhang, Z., Brent, M., Haussler, D., Kellis, M., Valencia, A., Gerstein, M., Reymond, A., Guigo, R., Harrow, J., Hubbard, T. J., Landt, S. G., Frietze, S., Abyzov, A., Addleman, N., Alexander, R. P., Auerbach, R. K., Balasubramanian, S., Bettinger, K., Bhardwaj, N., Boyle, A. P., Cao, A. R., Cayting, P., Charos, A., Cheng, Y., Cheng, C., Eastman, C., Euskirchen, G., Fleming, J. D., Grubert, F., Habegger, L., Hariharan, M., Harmanci, A., Iyengar, S., Jin, V. X., Karczewski, K. J., Kasowski, M., Lacroute, P., Lam, H., Lamarre-Vincent, N., Leng, J., Lian, J., Lindahl-Allen, M., Min, R., Miotto, B., Monahan, H., Moqtaderi, Z., Mu, X. 
J., O'Geen, H., Ouyang, Z., Patacsil, D., Pei, B., Raha, D., Ramirez, L., Reed, B., Rozowsky, J., Sboner, A., Shi, M., Sisu, C., Slifer, T., Witt, H., Wu, L., Xu, X., Yan, K., Yang, X., Yip, K. Y., Zhang, Z., Struhl, K., Weissman, S. M., Gerstein, M., Farnham, P. J., Snyder, M., Tenenbaum, S. A., Penalva, L. O., Doyle, F., Karmakar, S., Landt, S. G., Bhanvadia, R. R., Choudhury, A., Domanus, M., Ma, L., Moran, J., Patacsil, D., Slifer, T., Victorsen, A., Yang, X., Snyder, M., White, K. P., Auer, T., Centanin, L., Eichenlaub, M., Gruhl, F., Heermann, S., Hoeckendorf, B., Inoue, D., Kellner, T., Kirchmaier, S., Mueller, C., Reinhardt, R., Schertel, L., Schneider, S., Sinn, R., Wittbrodt, B., Wittbrodt, J., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Aldred, S. F., Trinklein, N. D., Partridge, E. C., Myers, R. M., Dekker, J., Jain, G., Lajoie, B. R., Sanyal, A., Balasundaram, G., Bates, D. L., Byron, R., Canfield, T. K., Diegel, M. J., Dunn, D., Ebersol, A. K., Frum, T., Garg, K., Gist, E., Hansen, R. S., Boatman, L., Haugen, E., Humbert, R., Jain, G., Johnson, A. K., Johnson, E. M., Kutyavin, T. V., Lajoie, B. R., Lee, K., Lotakis, D., Maurano, M. T., Neph, S. J., Neri, F. V., Nguyen, E. D., Qu, H., Reynolds, A. P., Roach, V., Rynes, E., Sabo, P., Sanchez, M. E., Sandstrom, R. S., Sanyal, A., Shafer, A. O., Stergachis, A. B., Thomas, S., Thurman, R. E., Vernot, B., Vierstra, J., Vong, S., Wang, H., Weaver, M. A., Yan, Y., Zhang, M., Akey, J. M., Bender, M., Dorschner, M. O., Groudine, M., MacCoss, M. J., Navas, P., Stamatoyannopoulos, G., Kaul, R., Dekker, J., Stamatoyannopoulos, J. A., Dunham, I., Beal, K., Brazma, A., Flicek, P., Herrero, J., Johnson, N., Keefe, D., Lukk, M., Luscombe, N. M., Sobral, D., Vaquerizas, J. M., Wilder, S. P., Batzoglou, S., Sidow, A., Hussami, N., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M. W., Schaub, M. A., Kundaje, A., Hardison, R. C., Miller, W., Giardine, B., Harris, R. S., Wu, W., Bickel, P. 
J., Banfai, B., Boley, N. P., Brown, J. B., Huang, H., Li, Q., Li, J. J., Noble, W. S., Bilmes, J. A., Buske, O. J., Hoffman, M. M., Sahu, A. D., Kharchenko, P. V., Park, P. J., Baker, D., Taylor, J., Weng, Z., Iyer, S., Dong, X., Greven, M., Lin, X., Wang, J., Xi, H. S., Zhuang, J., Gerstein, M., Alexander, R. P., Balasubramanian, S., Cheng, C., Harmanci, A., Lochovsky, L., Min, R., Mu, X. J., Rozowsky, J., Yan, K., Yip, K. Y., Birney, E. 2012; 489 (7414): 57-74

    Abstract

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

    View details for DOI 10.1038/nature11247

    View details for Web of Science ID 000308347000039

    View details for PubMedID 22955616

    View details for PubMedCentralID PMC3439153

  • Architecture of the human regulatory network derived from ENCODE data NATURE Gerstein, M. B., Kundaje, A., Hariharan, M., Landt, S. G., Yan, K., Cheng, C., Mu, X. J., Khurana, E., Rozowsky, J., Alexander, R., Min, R., Alves, P., Abyzov, A., Addleman, N., Bhardwaj, N., Boyle, A. P., Cayting, P., Charos, A., Chen, D. Z., Cheng, Y., Clarke, D., Eastman, C., Euskirchen, G., Frietze, S., Fu, Y., Gertz, J., Grubert, F., Harmanci, A., Jain, P., Kasowski, M., Lacroute, P., Leng, J., Lian, J., Monahan, H., O'Geen, H., Ouyang, Z., Partridge, E. C., Patacsil, D., Pauli, F., Raha, D., Ramirez, L., Reddy, T. E., Reed, B., Shi, M., Slifer, T., Wang, J., Wu, L., Yang, X., Yip, K. Y., Zilberman-Schapira, G., Batzoglou, S., Sidow, A., Farnham, P. J., Myers, R. M., Weissman, S. M., Snyder, M. 2012; 489 (7414): 91-100

    Abstract

    Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.

    View details for DOI 10.1038/nature11245

    View details for PubMedID 22955619

  • Linking disease associations with regulatory information in the human genome GENOME RESEARCH Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S., Snyder, M. 2012; 22 (9): 1748-1759

    Abstract

    Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.

    View details for DOI 10.1101/gr.136127.111

    View details for PubMedID 22955986

  • Annotation of functional variation in personal genomes using RegulomeDB GENOME RESEARCH Boyle, A. P., Hong, E. L., Hariharan, M., Cheng, Y., Schaub, M. A., Kasowski, M., Karczewski, K. J., Park, J., Hitz, B. C., Weng, S., Cherry, J. M., Snyder, M. 2012; 22 (9): 1790-1797

    Abstract

    As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.

    View details for DOI 10.1101/gr.137323.112

    View details for PubMedID 22955989

  • ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia GENOME RESEARCH Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., Snyder, M. 2012; 22 (9): 1813-1831

    Abstract

    Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.

    View details for DOI 10.1101/gr.136184.111

    View details for PubMedID 22955991

  • Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes CELL Chen, R., Mias, G. I., Li-Pook-Than, J., Jiang, L., Lam, H. Y., Chen, R., Miriami, E., Karczewski, K. J., Hariharan, M., Dewey, F. E., Cheng, Y., Clark, M. J., Im, H., Habegger, L., Balasubramanian, S., O'Huallachain, M., Dudley, J. T., Hillenmeyer, S., Haraksingh, R., Sharon, D., Euskirchen, G., Lacroute, P., Bettinger, K., Boyle, A. P., Kasowski, M., Grubert, F., Seki, S., Garcia, M., Whirl-Carrillo, M., Gallardo, M., Blasco, M. A., Greenberg, P. L., Snyder, P., Klein, T. E., Altman, R. B., Butte, A. J., Ashley, E. A., Gerstein, M., Nadeau, K. C., Tang, H., Snyder, M. 2012; 148 (6): 1293-1307

    Abstract

    Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.

    View details for DOI 10.1016/j.cell.2012.02.009

    View details for PubMedID 22424236

  • Detecting and annotating genetic variations using the HugeSeq pipeline NATURE BIOTECHNOLOGY Lam, H. Y., Pan, C., Clark, M. J., Lacroute, P., Chen, R., Haraksingh, R., O'Huallachain, M., Gerstein, M. B., Kidd, J. M., Bustamante, C. D., Snyder, M. 2012; 30 (3): 226-229

    View details for Web of Science ID 000301303800013

    View details for PubMedID 22398614

  • Extensive Promoter-Centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation CELL Li, G., Ruan, X., Auerbach, R. K., Sandhu, K. S., Zheng, M., Wang, P., Poh, H. M., Goh, Y., Lim, J., Zhang, J., Sim, H. S., Peh, S. Q., Mulawadi, F. H., Ong, C. T., Orlov, Y. L., Hong, S., Zhang, Z., Landt, S., Raha, D., Euskirchen, G., Wei, C., Ge, W., Wang, H., Davis, C., Fisher-Aylor, K. I., Mortazavi, A., Gerstein, M., Gingeras, T., Wold, B., Sun, Y., Fullwood, M. J., Cheung, E., Liu, E., Sung, W., Snyder, M., Ruan, Y. 2012; 148 (1-2): 84-98

    Abstract

    Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.

    View details for DOI 10.1016/j.cell.2011.12.014

    View details for Web of Science ID 000299540700016

    View details for PubMedID 22265404

    View details for PubMedCentralID PMC3339270

  • Performance comparison of whole-genome sequencing platforms NATURE BIOTECHNOLOGY Lam, H. Y., Clark, M. J., Chen, R., Chen, R., Natsoulis, G., O'Huallachain, M., Dewey, F. E., Habegger, L., Ashley, E. A., Gerstein, M. B., Butte, A. J., Ji, H. P., Snyder, M. 2012; 30 (1): 78-U118

    View details for DOI 10.1038/nbt.2065

    View details for Web of Science ID 000299110600023

  • Dissecting phosphorylation networks: lessons learned from yeast EXPERT REVIEW OF PROTEOMICS Mok, J., Zhu, X., Snyder, M. 2011; 8 (6): 775-786

    Abstract

    Protein phosphorylation continues to be regarded as one of the most important post-translational modifications found in eukaryotes and has been implicated in key roles in the development of a number of human diseases. In order to elucidate roles for the 518 human kinases, phosphorylation has routinely been studied using the budding yeast Saccharomyces cerevisiae as a model system. In recent years, a number of technologies have emerged to globally map phosphorylation in yeast. In this article, we review these technologies and discuss how these phosphorylation mapping efforts have shed light on our understanding of kinase signaling pathways and eukaryotic proteomic networks in general.

    View details for DOI 10.1586/EPR.11.64

    View details for Web of Science ID 000297299000013

    View details for PubMedID 22087660

    View details for PubMedCentralID PMC3262144

  • Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF NATURE Iyer, V. R., Horak, C. E., Scafe, C. S., Botstein, D., Snyder, M., Brown, P. O. 2001; 409 (6819): 533-538

    Abstract

    Proteins interact with genomic DNA to bring the genome to life; and these interactions also define many functional features of the genome. SBF and MBF are sequence-specific transcription factors that activate gene expression during the G1/S transition of the cell cycle in yeast. SBF is a heterodimer of Swi4 and Swi6, and MBF is a heterodimer of Mbp1 and Swi6 (refs 1, 3). The related Swi4 and Mbp1 proteins are the DNA-binding components of the respective factors, and Swi6 may have a regulatory function. A small number of SBF and MBF target genes have been identified. Here we define the genomic binding sites of the SBF and MBF transcription factors in vivo, by using DNA microarrays. In addition to the previously characterized targets, we have identified about 200 new putative targets. Our results support the hypothesis that SBF activated genes are predominantly involved in budding, and in membrane and cell-wall biosynthesis, whereas DNA replication and repair are the dominant functions among MBF activated genes. The functional specialization of these factors may provide a mechanism for independent regulation of distinct molecular processes that normally occur in synchrony during the mitotic cell cycle.

    View details for Web of Science ID 000166570500053

    View details for PubMedID 11206552

  • Genome-wide effects of social status on DNA methylation in the brain of a cichlid fish, Astatotilapia burtoni. BMC genomics Hilliard, A. T., Xie, D., Ma, Z., Snyder, M. P., Fernald, R. D. 2019; 20 (1): 699

    Abstract

    BACKGROUND: Successful social behavior requires real-time integration of information about the environment, internal physiology, and past experience. The molecular substrates of this integration are poorly understood, but likely modulate neural plasticity and gene regulation. In the cichlid fish species Astatotilapia burtoni, male social status can shift rapidly depending on the environment, causing fast behavioral modifications and a cascade of changes in gene transcription, the brain, and the reproductive system. These changes can be permanent but are also reversible, implying the involvement of a robust but flexible mechanism that regulates plasticity based on internal and external conditions. One candidate mechanism is DNA methylation, which has been linked to social behavior in many species, including A. burtoni. But, the extent of its effects after A. burtoni social change were previously unknown. RESULTS: We performed the first genome-wide search for DNA methylation patterns associated with social status in the brains of male A. burtoni, identifying hundreds of Differentially Methylated genomic Regions (DMRs) in dominant versus non-dominant fish. Most DMRs were inside genes supporting neural development, synapse function, and other processes relevant to neural plasticity, and DMRs could affect gene expression in multiple ways. DMR genes were more likely to be transcription factors, have a duplicate elsewhere in the genome, have an anti-sense lncRNA, and have more splice variants than other genes. Dozens of genes had multiple DMRs that were often seemingly positioned to regulate specific splice variants. CONCLUSIONS: Our results revealed genome-wide effects of A. burtoni social status on DNA methylation in the brain and strongly suggest a role for methylation in modulating plasticity across multiple biological levels. They also suggest many novel hypotheses to address in mechanistic follow-up studies, and will be a rich resource for identifying the relationships between behavioral, neural, and transcriptional plasticity in the context of social status.

    View details for DOI 10.1186/s12864-019-6047-9

    View details for PubMedID 31506062

  • Systematic Identification of Host Cell Regulators of Legionella pneumophila Pathogenesis Using a Genome-wide CRISPR Screen. Cell host & microbe Jeng, E. E., Bhadkamkar, V., Ibe, N. U., Gause, H., Jiang, L., Chan, J., Jian, R., Jimenez-Morales, D., Stevenson, E., Krogan, N. J., Swaney, D. L., Snyder, M. P., Mukherjee, S., Bassik, M. C. 2019

    Abstract

    During infection, Legionella pneumophila translocates over 300 effector proteins into the host cytosol, allowing the pathogen to establish an endoplasmic reticulum (ER)-like Legionella-containing vacuole (LCV) that supports bacterial replication. Here, we perform a genome-wide CRISPR-Cas9 screen and secondary targeted screens in U937 human monocyte/macrophage-like cells to systematically identify host factors that regulate killing by L. pneumophila. The screens reveal known host factors hijacked by L. pneumophila, as well as genes spanning diverse trafficking and signaling pathways previously not linked to L. pneumophila pathogenesis. We further characterize C1orf43 and KIAA1109 as regulators of phagocytosis and show that RAB10 and its chaperone RABIF are required for optimal L. pneumophila replication and ER recruitment to the LCV. Finally, we show that Rab10 protein is recruited to the LCV and ubiquitinated by the effectors SidC/SdcA. Collectively, our results provide a wealth of previously undescribed insights into L. pneumophila pathogenesis and mammalian cell function.

    View details for DOI 10.1016/j.chom.2019.08.017

    View details for PubMedID 31540829

  • Large-Scale Analyses of Human Microbiomes Reveal Thousands of Small, Novel Genes. Cell Sberro, H., Fremin, B. J., Zlitni, S., Edfors, F., Greenfield, N., Snyder, M. P., Pavlopoulos, G. A., Kyrpides, N. C., Bhatt, A. S. 2019

    Abstract

    Small proteins are traditionally overlooked due to computational and experimental difficulties in detecting them. To systematically identify small proteins, we carried out a comparative genomics study on 1,773 human-associated metagenomes from four different body sites. We describe >4,000 conserved protein families, the majority of which are novel; 30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain and almost half are not represented in reference genomes. We identify putative housekeeping, mammalian-specific, defense-related, and protein families that are likely to be horizontally transferred. We provide evidence of transcription and translation for a subset of these families. Our study suggests that small proteins are highly abundant and those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.

    View details for DOI 10.1016/j.cell.2019.07.016

    View details for PubMedID 31402174

  • Simultaneous RNA purification and size selection using on-chip isotachophoresis with an ionic spacer. Lab on a chip Han, C. M., Catoe, D., Munro, S. A., Khnouf, R., Snyder, M. P., Santiago, J. G., Salit, M. L., Cenik, C. 2019

    Abstract

    We present an on-chip method for the extraction of RNA within a specific size range from low-abundance samples. We use isotachophoresis (ITP) with an ionic spacer and a sieving matrix to enable size-selection with a high yield of RNA in the target size range. The spacer zone separates two concentrated ITP peaks, the first containing unwanted single nucleotides and the second focusing RNA of the target size range (2-35 nt). Our ITP method excludes >90% of single nucleotides and >65% of longer RNAs (>35 nt). Compared to size selection using gel electrophoresis, ITP-based size-selection yields a 2.2-fold increase in the amount of extracted RNAs within the target size range. We also demonstrate compatibility of the ITP-based size-selection with downstream next generation sequencing. On-chip ITP-prepared samples reveal higher reproducibility of transcript-specific measurements compared to samples size-selected by gel electrophoresis. Our method offers an attractive alternative to conventional sample preparation for sequencing with shorter assay time, higher extraction efficiency and reproducibility. Potential applications of ITP-based size-selection include sequencing-based analyses of small RNAs from low-abundance samples such as rare cell types, samples from fluorescence activated cell sorting (FACS), or limited clinical samples.

    View details for DOI 10.1039/c9lc00311h

    View details for PubMedID 31328753

  • MISTERMINATE Mechanistically Links Mitochondrial Dysfunction with Proteostasis Failure. Molecular cell Wu, Z., Tantray, I., Lim, J., Chen, S., Li, Y., Davis, Z., Sitron, C., Dong, J., Gispert, S., Auburger, G., Brandman, O., Bi, X., Snyder, M., Lu, B. 2019

    Abstract

    Mitochondrial dysfunction and proteostasis failure frequently coexist as hallmarks of neurodegenerative disease. How these pathologies are related is not well understood. Here, we describe a phenomenon termed MISTERMINATE (mitochondrial-stress-induced translational termination impairment and protein carboxyl terminal extension), which mechanistically links mitochondrial dysfunction with proteostasis failure. We show that mitochondrial dysfunction impairs translational termination of nuclear-encoded mitochondrial mRNAs, including complex-I 30kD subunit (C-I30) mRNA, occurring on the mitochondrial surface in Drosophila and mammalian cells. Ribosomes stalled at the normal stop codon continue to add to the C terminus of C-I30 certain amino acids non-coded by mRNA template. C-terminally extended C-I30 is toxic when assembled into C-I and forms aggregates in the cytosol. Enhancing co-translational quality control prevents C-I30 C-terminal extension and rescues mitochondrial and neuromuscular degeneration in a Parkinson's disease model. These findings emphasize the importance of efficient translation termination and reveal unexpected link between mitochondrial health and proteome homeostasis mediated by MISTERMINATE.

    View details for DOI 10.1016/j.molcel.2019.06.031

    View details for PubMedID 31378462

  • Matrix stiffness induces a tumorigenic phenotype in mammary epithelium through changes in chromatin accessibility. Nature biomedical engineering Stowers, R. S., Shcherbina, A., Israeli, J., Gruber, J. J., Chang, J., Nam, S., Rabiee, A., Teruel, M. N., Snyder, M. P., Kundaje, A., Chaudhuri, O. 2019

    Abstract

    In breast cancer, the increased stiffness of the extracellular matrix is a key driver of malignancy. Yet little is known about the epigenomic changes that underlie the tumorigenic impact of extracellular matrix mechanics. Here, we show in a three-dimensional culture model of breast cancer that stiff extracellular matrix induces a tumorigenic phenotype through changes in chromatin state. We found that increased stiffness yielded cells with more wrinkled nuclei and with increased lamina-associated chromatin, that cells cultured in stiff matrices displayed more accessible chromatin sites, which exhibited footprints of Sp1 binding, and that this transcription factor acts along with the histone deacetylases 3 and 8 to regulate the induction of stiffness-mediated tumorigenicity. Just as cell culture on soft environments or in them rather than on tissue-culture plastic better recapitulates the acinar morphology observed in mammary epithelium in vivo, mammary epithelial cells cultured on soft microenvironments or in them also more closely replicate the in vivo chromatin state. Our results emphasize the importance of culture conditions for epigenomic studies, and reveal that chromatin state is a critical mediator of mechanotransduction.

    View details for DOI 10.1038/s41551-019-0420-5

    View details for PubMedID 31285581

  • Long-Read Sequencing - A Powerful Tool in Viral Transcriptome Research TRENDS IN MICROBIOLOGY Boldogkoi, Z., Moldovan, N., Balazs, Z., Snyder, M., Tombacz, D. 2019; 27 (7): 578–92
  • Comment on 'AIRE-deficient patients harbor unique high-affinity disease-ameliorating autoantibodies'. eLife Landegren, N., Rosen, L. B., Freyhult, E., Eriksson, D., Fall, T., Smith, G., Ferre, E. M., Brodin, P., Sharon, D., Snyder, M., Lionakis, M., Anderson, M., Kampe, O. 2019; 8

    Abstract

    The AIRE gene plays a key role in the development of central immune tolerance by promoting thymic presentation of tissue-specific molecules. Patients with AIRE-deficiency develop multiple autoimmune manifestations and display autoantibodies against the affected tissues. In 2016 it was reported that: i) the spectrum of autoantibodies in patients with AIRE-deficiency is much broader than previously appreciated; ii) neutralizing autoantibodies to type I interferons (IFNs) could provide protection against type 1 diabetes in these patients (Meyer et al., 2016). We attempted to replicate these new findings using a similar experimental approach in an independent patient cohort, and found no evidence for either conclusion.

    View details for DOI 10.7554/eLife.43578

    View details for PubMedID 31244471

  • Engineering Genetic Predisposition in Human Neuroepithelial Stem Cells Recapitulates Medulloblastoma Tumorigenesis. Cell stem cell Huang, M., Tailor, J., Zhen, Q., Gillmor, A. H., Miller, M. L., Weishaupt, H., Chen, J., Zheng, T., Nash, E. K., McHenry, L. K., An, Z., Ye, F., Takashima, Y., Clarke, J., Ayetey, H., Cavalli, F. M., Luu, B., Moriarity, B. S., Ilkhanizadeh, S., Chavez, L., Yu, C., Kurian, K. M., Magnaldo, T., Sevenet, N., Koch, P., Pollard, S. M., Dirks, P., Snyder, M. P., Largaespada, D. A., Cho, Y. J., Phillips, J. J., Swartling, F. J., Morrissy, A. S., Kool, M., Pfister, S. M., Taylor, M. D., Smith, A., Weiss, W. A. 2019

    Abstract

    Human neural stem cell cultures provide progenitor cells that are potential cells of origin for brain cancers. However, the extent to which genetic predisposition to tumor formation can be faithfully captured in stem cell lines is uncertain. Here, we evaluated neuroepithelial stem (NES) cells, representative of cerebellar progenitors. We transduced NES cells with MYCN, observing medulloblastoma upon orthotopic implantation in mice. Significantly, transcriptomes and patterns of DNA methylation from xenograft tumors were globally more representative of human medulloblastoma compared to a MYCN-driven genetically engineered mouse model. Orthotopic transplantation of NES cells generated from Gorlin syndrome patients, who are predisposed to medulloblastoma due to germline-mutated PTCH1, also generated medulloblastoma. We engineered candidate cooperating mutations in Gorlin NES cells, with mutation of DDX3X or loss of GSE1 both accelerating tumorigenesis. These findings demonstrate that human NES cells provide a potent experimental resource for dissecting genetic causation in medulloblastoma.

    View details for DOI 10.1016/j.stem.2019.05.013

    View details for PubMedID 31204176

  • Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis (vol 6, 8035, 2015) NATURE COMMUNICATIONS Fotiou, E., Martin-Almedina, S., Simpson, M. A., Lin, S., Gordon, K., Brice, G., Atton, G., Jeffery, I., Rees, D. C., Mignot, C., Vogt, J., Homfray, T., Snyder, M. P., Rockson, S. G., Jeffery, S., Mortimer, P. S., Mansour, S., Ostergaard, P. 2019; 10
  • Analysis of the Complete Genome Sequence of a Novel, Pseudorabies Virus Strain Isolated in Southeast Europe CANADIAN JOURNAL OF INFECTIOUS DISEASES & MEDICAL MICROBIOLOGY Csabai, Z., Tombacz, D., Deim, Z., Snyder, M., Boldogkoi, Z. 2019; 2019
  • Much ado about nothing: A qualitative study of the experiences of an average-risk population receiving results of exome sequencing JOURNAL OF GENETIC COUNSELING Rego, S., Dagan-Rosenfeld, O., Bivona, S. A., Snyder, M. P., Ormond, K. E. 2019; 28 (2): 428–37

    View details for DOI 10.1002/jgc4.1096

    View details for Web of Science ID 000463993600026

  • Much ado about nothing: A qualitative study of the experiences of an average-risk population receiving results of exome sequencing. Journal of genetic counseling Rego, S., Dagan-Rosenfeld, O., Bivona, S. A., Snyder, M. P., Ormond, K. E. 2019

    Abstract

    The increasing availability of exome sequencing to the general ("healthy") population raises questions about the implications of genomic testing for individuals without suspected Mendelian diseases. Little is known about this population's motivations for undergoing exome sequencing, their expectations, reactions, and perceptions of utility. In order to address these questions, we conducted in-depth semi-structured interviews with 12 participants recruited from a longitudinal multi-omics profiling study that included exome sequencing. Participants were interviewed after receiving exome results, which included Mendelian disease-associated pathogenic and likely pathogenic variants, pharmacogenetic variants, and risk assessments for multifactorial diseases such as type 2 diabetes. The primary motivation driving participation in exome sequencing was personal curiosity. While they reported feeling validation and relief, participants were frequently underwhelmed by the results and described having expected more from exome sequencing. All participants reported discussing the results with at least some family, friends, and healthcare providers. Participants' recollection of the results returned to them was sometimes incorrect or incomplete, in many cases aligning with their perceptions of their health risks when entering the study. These results underscore the need for different genetic counseling approaches for generally healthy patients undergoing exome sequencing, in particular the need to provide anticipatory guidance to moderate participants' expectations. They also provide a preview of potential challenges clinicians may face as genomic sequencing continues to scale-up in the general population despite a lack of full understanding of its impact.

    View details for PubMedID 30835913

  • Multi-Omics Profiling, Microscopic Cervical Remodeling, and Parturition: Insights from the Smart Diaphragm Study. Liang, L., Dunn, J. P., Chen, S., Tsai, M., Hornburg, D., Newmann, S., Avina, M., Leng, Y., Holman, R., Lee, T. H., Qureshi, S., Montelongo, E., Zhao, B., Jeliffe, L., Snyder, M., Rand, L. SAGE PUBLICATIONS INC. 2019: 216A
  • Lifelong physical activity is associated with promoter hypomethylation of genes involved in metabolism, myogenesis, contractile properties and oxidative stress resistance in aged human skeletal muscle. Scientific reports Sailani, M. R., Halling, J. F., Moller, H. D., Lee, H., Plomgaard, P., Pilegaard, H., Snyder, M. P., Regenberg, B. 2019; 9 (1): 3272

    Abstract

    Lifelong regular physical activity is associated with reduced risk of type 2 diabetes (T2D), maintenance of muscle mass and increased metabolic capacity. However, little is known about epigenetic mechanisms that might contribute to these beneficial effects in aged individuals. We investigated the effect of lifelong physical activity on global DNA methylation patterns in skeletal muscle of healthy aged men, who had either performed regular exercise or remained sedentary their entire lives (average age 62 years). DNA methylation was significantly lower in 714 promoters of the physically active than inactive men while methylation of introns, exons and CpG islands was similar in the two groups. Promoters for genes encoding critical insulin-responsive enzymes in glycogen metabolism, glycolysis and TCA cycle were hypomethylated in active relative to inactive men. Hypomethylation was also found in promoters of myosin light chain, dystrophin, actin polymerization, PAK regulatory genes and oxidative stress response genes. A cluster of genes regulated by GSK3beta-TCF7L2 also displayed promoter hypomethylation. Together, our results suggest that lifelong physical activity is associated with DNA methylation patterns that potentially allow for increased insulin sensitivity and a higher expression of genes in energy metabolism, myogenesis, contractile properties and oxidative stress resistance in skeletal muscle of aged individuals.

    View details for PubMedID 30824849

  • Long-Read Sequencing - A Powerful Tool in Viral Transcriptome Research. Trends in microbiology Boldogkoi, Z., Moldovan, N., Balazs, Z., Snyder, M., Tombacz, D. 2019

    Abstract

    Long-read sequencing (LRS) has become increasingly popular due to its strengths in de novo assembly and in resolving complex DNA regions as well as in determining full-length RNA molecules. Two important LRS technologies have been developed during the past few years, including single-molecule, real-time sequencing by Pacific Biosciences, and nanopore sequencing by Oxford Nanopore Technologies. Although current LRS methods produce lower coverage, and are more error prone than short-read sequencing, these methods continue to be superior in identifying transcript isoforms including multispliced RNAs and transcript-length variants as well as overlapping transcripts and alternative polycistronic RNA molecules. Viruses have small, compact genomes and therefore these organisms are ideal subjects for transcriptome analysis with the relatively low-throughput LRS techniques. Recent LRS studies have multiplied the number of previously known transcripts and have revealed complex networks of transcriptional overlaps in the examined viruses.

    View details for PubMedID 30824172

  • 2017 NIH-wide workshop report on "The Human Microbiome: Emerging Themes at the Horizon of the 21st Century" MICROBIOME Alm, E., Borenstein, E., Britton, R. A., Bultman, S. J., Chang, E. B., Cho, M., Dantas, G., Dominguez-Bello, M., Donovan, S. M., Dorrestein, P., Douglas, A. E., Gewirtz, A., Ghannoum, M., Goodman, A. L., Gordon, J. I., Huffnagle, G. B., Jenq, R. R., Jia, W., Knight, R., Koropatkin, N., Lampe, J. W., Lu, T., Ochman, H., Pamer, E. G., Patterson, A. D., Philpott, D., Pollard, K. S., Rawls, J. F., Salzman, N. H., Sears, C. L., Stappenbeck, T., Taga, M. E., Turnbaugh, P. J., Wang, H. H., Wu, G. D., Xavier, R. J., 2017 NIH-Wide Microbiome Workshop 2019; 7: 32

    Abstract

    The National Institutes of Health (NIH) organized a three-day human microbiome research workshop, August 16-18, 2017, to highlight the accomplishments of the 10-year Human Microbiome Project program, the outcomes of the investments made by the 21 NIH Institutes and Centers which now fund this area, and the technical challenges and knowledge gaps which will need to be addressed in order for this field to advance over the next 10 years. This report summarizes the key points in the talks, round table discussions, and Joint Agency Panel from this workshop.

    View details for DOI 10.1186/s40168-019-0627-4

    View details for Web of Science ID 000459927100002

    View details for PubMedID 30808401

    View details for PubMedCentralID PMC6391828

  • Whole-exome sequencing data of suicide victims who had suffered from major depressive disorder. Scientific data Tombacz, D., Maroti, Z., Kalmar, T., Palkovits, M., Snyder, M., Boldogkoi, Z. 2019; 6: 190010

    Abstract

    Suicide is one of the leading causes of mortality worldwide; it causes the death of more than one million patients each year. Suicide is a complex, multifactorial phenotype with environmental and genetic factors contributing to the risk of the forthcoming suicide. These factors first generally lead to mental disorders, such as depression, schizophrenia and bipolar disorder, which then become the direct cause of suicide. Here we present a high quality dataset (including processed BAM and VCF files) gained from the high-throughput whole-exome Illumina sequencing of 23 suicide victims - all of whom had suffered from major depressive disorder - and 21 control patients to a depth of at least 40-fold coverage in both cohorts. We identified ~130,000 variants per sample and altogether 442,270 unique variants in the cohort of 44 samples. To the best of our knowledge, this is the first whole-exome sequencing dataset from suicide victims. We expect that this dataset provides useful information for genomic studies of suicide and depression, and also for the analysis of the Hungarian population.

    View details for PubMedID 30720799

  • Whole-exome sequencing data of suicide victims who had suffered from major depressive disorder SCIENTIFIC DATA Tombacz, D., Maroti, Z., Kalmar, T., Palkovits, M., Snyder, M., Boldogkoi, Z. 2019; 6
  • Smooth Muscle Contact Drives Endothelial Regeneration by BMPR2-Notch1-Mediated Metabolic and Epigenetic Changes CIRCULATION RESEARCH Miyagawa, K., Shi, M., Chen, P., Hennigs, J. K., Zhao, Z., Wang, M., Li, C. G., Saito, T., Taylor, S., Sa, S., Cao, A., Wang, L., Snyder, M. P., Rabinovitch, M. 2019; 124 (2): 211–24
  • Activation of PDGF pathway links LMNA mutation to dilated cardiomyopathy. Nature Lee, J., Termglinchan, V., Diecke, S., Itzhaki, I., Lam, C. K., Garg, P., Lau, E., Greenhaw, M., Seeger, T., Wu, H., Zhang, J. Z., Chen, X., Gil, I. P., Ameen, M., Sallam, K., Rhee, J. W., Churko, J. M., Chaudhary, R., Chour, T., Wang, P. J., Snyder, M. P., Chang, H. Y., Karakikes, I., Wu, J. C. 2019

    Abstract

    Lamin A/C (LMNA) is one of the most frequently mutated genes associated with dilated cardiomyopathy (DCM). DCM related to mutations in LMNA is a common inherited cardiomyopathy that is associated with systolic dysfunction and cardiac arrhythmias. Here we modelled the LMNA-related DCM in vitro using patient-specific induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs). Electrophysiological studies showed that the mutant iPSC-CMs displayed aberrant calcium homeostasis that led to arrhythmias at the single-cell level. Mechanistically, we show that the platelet-derived growth factor (PDGF) signalling pathway is activated in mutant iPSC-CMs compared to isogenic control iPSC-CMs. Conversely, pharmacological and molecular inhibition of the PDGF signalling pathway ameliorated the arrhythmic phenotypes of mutant iPSC-CMs in vitro. Taken together, our findings suggest that the activation of the PDGF pathway contributes to the pathogenesis of LMNA-related DCM and point to PDGF receptor-β (PDGFRB) as a potential therapeutic target.

    View details for DOI 10.1038/s41586-019-1406-x

    View details for PubMedID 31316208

  • Phenotypically-Silent Bone Morphogenetic Protein Receptor 2 (Bmpr2) Mutations Predispose Rats to Inflammation-Induced Pulmonary Arterial Hypertension by Enhancing The Risk for Neointimal Transformation. Circulation Tian, W., Jiang, X., Sung, Y. K., Shuffle, E., Wu, T. H., Kao, P. N., Tu, A. B., Dorfmüller, P., Cao, A., Wang, L., Peng, G., Kim, Y., Zhang, P., Chappell, J., Pasupneti, S., Dahms, P., Maguire, P., Chaib, H., Zamanian, R., Peters-Golden, M., Snyder, M. P., Voelkel, N. F., Humbert, M., Rabinovitch, M., Nicolls, M. R. 2019

    Abstract

    Bmpr2 mutations are critical risk factors for hereditary pulmonary arterial hypertension (hPAH) with approximately 20% of carriers developing disease. There is an unmet medical need to understand how environmental factors, such as inflammation, render Bmpr2 mutants susceptible to PAH. Overexpressing 5-lipoxygenase (5-LO) provokes lung inflammation and transient PAH in Bmpr2+/- mice. Accordingly, 5-LO and its metabolite, leukotriene B4 (LTB4), are candidates for the 'second hit'. The purpose of this study was to determine how 5-LO-mediated pulmonary inflammation synergized with phenotypically-silent Bmpr2 defects to elicit significant pulmonary vascular disease in rats. Monoallelic Bmpr2 mutant rats were generated and found phenotypically normal for up to one year of observation. To evaluate whether a second hit would elicit disease, animals were exposed to 5-LO-expressing adenovirus (AdAlox5), monocrotaline, SU5416, SU5416 with chronic hypoxia or chronic hypoxia alone. Bmpr2-mutant hPAH patient samples were assessed for neointimal 5-LO expression. Pulmonary artery endothelial cells (PAECs) with impaired BMPR2 signaling were exposed to increased 5-LO-mediated inflammation and were assessed for phenotypic and transcriptomic changes. Lung inflammation, induced by intratracheal delivery of AdAlox5, elicited severe PAH with intimal remodeling in Bmpr2+/- rats but not in their wild-type littermates. Neointimal lesions in the diseased Bmpr2+/- rats gained endogenous 5-LO expression associated with elevated LTB4 biosynthesis. Bmpr2-mutant hPAH patients similarly expressed 5-LO in the neointimal cells. In vitro, BMPR2 deficiency, compounded by 5-LO-mediated inflammation, generated apoptosis-resistant, and proliferative PAECs with mesenchymal characteristics. These transformed cells expressed nuclear envelope-localized 5-LO consistent with induced LTB4 production, as well as a transcriptomic signature similar to clinical disease, including upregulated NF-κB, IL-6, and TGF-β signaling pathways. The reversal of PAH and vasculopathy in Bmpr2 mutants by TGF-β antagonism suggests that TGF-β is critical for neointimal transformation. In a new 'two-hit' model of disease, lung inflammation induced severe PAH pathology in Bmpr2+/- rats. Endothelial transformation required the activation of canonical and noncanonical TGF-β signaling pathways and was characterized by 5-LO nuclear envelope translocation with enhanced LTB4 production. This study offers one explanation of how an environmental injury unleashes the destructive potential of an otherwise-silent genetic mutation.

    View details for DOI 10.1161/CIRCULATIONAHA.119.040629

    View details for PubMedID 31462075

  • Progress on Identifying and Characterizing the Human Proteome: 2019 Metrics from the HUPO Human Proteome Project. Journal of proteome research Omenn, G. S., Lane, L., Overall, C. M., Corrales, F. J., Schwenk, J. M., Paik, Y. K., Van Eyk, J. E., Liu, S., Pennington, S., Snyder, M. P., Baker, M. S., Deutsch, E. W. 2019

    Abstract

    The Human Proteome Project (HPP) annually reports on progress made throughout the field in credibly identifying and characterizing the complete human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2019-01-11 contains 17 694 proteins with strong protein-level evidence (PE1), compliant with HPP Guidelines for Interpretation of MS Data v2.1; these represent 89% of all 19 823 neXtProt predicted coding genes (all PE1,2,3,4 proteins), up from 17 470 one year earlier. Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), has been reduced from 2949 to 2129 since 2016 through efforts throughout the community, including the chromosome-centric HPP. PeptideAtlas is the source of uniformly reanalyzed raw mass spectrometry data for neXtProt; PeptideAtlas added 495 canonical proteins between 2018 and 2019, especially from studies designed to detect hard-to-identify proteins. Meanwhile, the Human Protein Atlas has released version 18.1 with immunohistochemical evidence of expression of 17 000 proteins and survival plots as part of the Pathology Atlas. Many investigators apply multiplexed SRM-targeted proteomics for quantitation of organ-specific popular proteins in studies of various human diseases. The 19 teams of the Biology and Disease-driven B/D-HPP published a total of 160 publications in 2018, bringing proteomics to a broad array of biomedical research.

    View details for DOI 10.1021/acs.jproteome.9b00434

    View details for PubMedID 31430157

  • MACHINE LEARNING ANALYSIS OF ULTRA-DEEP WHOLE-GENOME SEQUENCING IN HUMAN BRAIN REVEALS SOMATIC GENOMIC RETROTRANSPOSITION IN GLIA AS WELL AS IN NEURONS Urban, A., Zhu, X., Zhou, B., Sloan, S., Pattni, R., Fiston-Lavier, A., Snyder, M., Petrov, D., Abyzov, A., Vaccarino, F., Barres, B., Vogel, H., Tamminga, C., Levinson, D. ELSEVIER. 2019: 1240
  • Smart Diaphragm Study: Multi-omics profiling and cervical device measurements during pregnancy Liang, L., Dunn, J. P., Chen, S., Tsai, M., Hornburg, D., Newmann, S., Chung, P., Avina, M., Leng, Y., Holman, R., Lee, T. H., Berrios, S., Qureshi, S. A., Baer, R., Etemadi, M., Montelongo, E., Paynter, R., Zhao, B., Roy, S., Jelliffe, L., Snyder, M., Rand, L. MOSBY-ELSEVIER. 2019: S649
  • Personalized Metabolomics. Methods in molecular biology (Clifton, N.J.) Marciano, D. P., Snyder, M. P. 2019; 1978: 447–56

    Abstract

    The human metabolome is the cumulative product of ingested metabolites and those produced by the body and its microbiota. Together these metabolites can dynamically report on the health and disease state of an individual, as well as their response to drug treatments and other external perturbations. Profiling metabolites in human body fluids provides an opportunity to identify biomarkers and stratify patients for personalized treatments but requires the development of high-throughput approaches compatible with large cohort and longitudinal studies. Here we review in detail sample preparation and analytical liquid chromatography-mass spectrometry (LC-MS) methods to measure the broad chemical diversity of metabolites found in human plasma and urine.

    View details for DOI 10.1007/978-1-4939-9236-2_27

    View details for PubMedID 31119679

  • Analysis of the Complete Genome Sequence of a Novel, Pseudorabies Virus Strain Isolated in Southeast Europe. The Canadian journal of infectious diseases & medical microbiology = Journal canadien des maladies infectieuses et de la microbiologie medicale Csabai, Z., Tombácz, D., Deim, Z., Snyder, M., Boldogkői, Z. 2019; 2019: 1806842

    Abstract

    Pseudorabies virus (PRV) is the causative agent of Aujeszky's disease giving rise to significant economic losses worldwide. Many countries have implemented national programs for the eradication of this virus. In this study, long-read sequencing was used to determine the nucleotide sequence of the genome of a novel PRV strain (PRV-MdBio) isolated in Serbia. In this study, a novel PRV strain was isolated and characterized. PRV-MdBio was found to exhibit similar growth properties to those of another wild-type PRV, the strain Kaplan. Single-molecule real-time (SMRT) sequencing has revealed that the new strain differs significantly in base composition even from strain Kaplan, to which it otherwise exhibits the highest similarity. We compared the genetic composition of PRV-MdBio to strain Kaplan and the China reference strain Ea and obtained that radical base replacements were the most common point mutations preceding conservative and silent mutations. We also found that the adaptation of PRV to cell culture does not lead to any tendentious genetic alteration in the viral genome. PRV-MdBio is a wild-type virus, which differs in base composition from other PRV strains to a relatively large extent.

    View details for PubMedID 31093307

  • Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy. Bioinformatics (Oxford, England) Ghaemi, M. S., DiGiulio, D. B., Contrepois, K., Callahan, B., Ngo, T. T., Lee-McMullen, B., Lehallier, B., Robaczewska, A., Mcilwain, D., Rosenberg-Hasson, Y., Wong, R. J., Quaintance, C., Culos, A., Stanley, N., Tanada, A., Tsai, A., Gaudilliere, D., Ganio, E., Han, X., Ando, K., McNeil, L., Tingle, M., Wise, P., Maric, I., Sirota, M., Wyss-Coray, T., Winn, V. D., Druzin, M. L., Gibbs, R., Darmstadt, G. L., Lewis, D. B., Partovi Nia, V., Agard, B., Tibshirani, R., Nolan, G., Snyder, M. P., Relman, D. A., Quake, S. R., Shaw, G. M., Stevenson, D. K., Angst, M. S., Gaudilliere, B., Aghaeepour, N. 2019; 35 (1): 95–103

    Abstract

    Motivation: Multiple biological clocks govern a healthy pregnancy. These biological mechanisms produce immunologic, metabolomic, proteomic, genomic and microbiomic adaptations during the course of pregnancy. Modeling the chronology of these adaptations during full-term pregnancy provides the frameworks for future studies examining deviations implicated in pregnancy-related pathologies including preterm birth and preeclampsia. Results: We performed a multiomics analysis of 51 samples from 17 pregnant women, delivering at term. The datasets included measurements from the immunome, transcriptome, microbiome, proteome and metabolome of samples obtained simultaneously from the same patients. Multivariate predictive modeling using the Elastic Net (EN) algorithm was used to measure the ability of each dataset to predict gestational age. Using stacked generalization, these datasets were combined into a single model. This model not only significantly increased predictive power by combining all datasets, but also revealed novel interactions between different biological modalities. Future work includes expansion of the cohort to preterm-enriched populations and in vivo analysis of immune-modulating interventions based on the mechanisms identified. Availability and implementation: Datasets and scripts for reproduction of results are available through: https://nalab.stanford.edu/multiomics-pregnancy/. Supplementary information: Supplementary data are available at Bioinformatics online.

    View details for PubMedID 30561547

  • A machine-compiled database of genome-wide association studies. Nature communications Kuleshov, V., Ding, J., Vo, C., Hancock, B., Ratner, A., Li, Y., Ré, C., Batzoglou, S., Snyder, M. 2019; 10 (1): 3341

    Abstract

    Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.

    View details for DOI 10.1038/s41467-019-11026-x

    View details for PubMedID 31350405

  • Mitigation of off-target toxicity in CRISPR-Cas9 screens for essential non-coding elements. Nature communications Tycko, J., Wainberg, M., Marinov, G. K., Ursu, O., Hess, G. T., Ego, B. K., Li, A., Truong, A., Trevino, A. E., Spees, K., Yao, D., Kaplow, I. M., Greenside, P. G., Morgens, D. W., Phanstiel, D. H., Snyder, M. P., Bintu, L., Greenleaf, W. J., Kundaje, A., Bassik, M. C. 2019; 10 (1): 4063

    Abstract

    Pooled CRISPR-Cas9 screens are a powerful method for functionally characterizing regulatory elements in the non-coding genome, but off-target effects in these experiments have not been systematically evaluated. Here, we investigate Cas9, dCas9, and CRISPRi/a off-target activity in screens for essential regulatory elements. The sgRNAs with the largest effects in genome-scale screens for essential CTCF loop anchors in K562 cells were not single guide RNAs (sgRNAs) that disrupted gene expression near the on-target CTCF anchor. Rather, these sgRNAs had high off-target activity that, while only weakly correlated with absolute off-target site number, could be predicted by the recently developed GuideScan specificity score. Screens conducted in parallel with CRISPRi/a, which do not induce double-stranded DNA breaks, revealed that a distinct set of off-targets also cause strong confounding fitness effects with these epigenome-editing tools. Promisingly, filtering of CRISPRi libraries using GuideScan specificity scores removed these confounded sgRNAs and enabled identification of essential regulatory elements.

    View details for DOI 10.1038/s41467-019-11955-7

    View details for PubMedID 31492858

  • Understanding health disparities. Journal of perinatology : official journal of the California Perinatal Association Stevenson, D. K., Wong, R. J., Aghaeepour, N., Angst, M. S., Darmstadt, G. L., DiGiulio, D. B., Druzin, M. L., Gaudilliere, B., Gibbs, R. S., B Gould, J., Katz, M., Li, J., Moufarrej, M. N., Quaintance, C. C., Quake, S. R., Relman, D. A., Shaw, G. M., Snyder, M. P., Wang, X., Wise, P. H. 2018

    Abstract

    Based upon our recent insights into the determinants of preterm birth, which is the leading cause of death in children under five years of age worldwide, we describe potential analytic frameworks that provide both a common understanding and, ultimately, the basis for effective, ameliorative action. Our research on preterm birth serves as an example that the framing of any human health condition is a result of complex interactions between the genome and the exposome. New discoveries of the basic biology of pregnancy, such as the complex immunological and signaling processes that dictate the health and length of gestation, have revealed a complexity in the interactions (current and ancestral) between genetic and environmental forces. Understanding of these relationships may help reduce disparities in preterm birth and guide productive research endeavors and ultimately, effective clinical and public health interventions.

    View details for PubMedID 30560947

  • Cross-Platform Comparison of Untargeted and Targeted Lipidomics Approaches on Aging Mouse Plasma. Scientific reports Contrepois, K., Mahmoudi, S., Ubhi, B. K., Papsdorf, K., Hornburg, D., Brunet, A., Snyder, M. 2018; 8 (1): 17747

    Abstract

    Lipidomics - the global assessment of lipids - can be performed using a variety of mass spectrometry (MS)-based approaches. However, choosing the optimal approach in terms of lipid coverage, robustness and throughput can be a challenging task. Here, we compare a novel targeted quantitative lipidomics platform known as the Lipidyzer to a conventional untargeted liquid chromatography (LC)-MS approach. We find that both platforms are efficient in profiling more than 300 lipids across 11 lipid classes in mouse plasma with precision and accuracy below 20% for most lipids. While the untargeted and targeted platforms detect similar numbers of lipids, the former identifies a broader range of lipid classes and can unambiguously identify all three fatty acids in triacylglycerols (TAG). Quantitative measurements from both approaches exhibit a median correlation coefficient (r) of 0.99 using a dilution series of deuterated internal standards and 0.71 using endogenous plasma lipids in the context of aging. Application of both platforms to plasma from aging mouse reveals similar changes in total lipid levels across all major lipid classes and in specific lipid species. Interestingly, TAG is the lipid class that exhibits the most changes with age, suggesting that TAG metabolism is particularly sensitive to the aging process in mice. Collectively, our data show that the Lipidyzer platform provides comprehensive profiling of the most prevalent lipids in plasma in a simple and automated manner.

    View details for PubMedID 30532037

  • Progress on Identifying and Characterizing the Human Proteome: 2018 Metrics from the HUPO Human Proteome Project JOURNAL OF PROTEOME RESEARCH Omenn, G. S., Lane, L., Overall, C. M., Corrales, F. J., Schwenk, J. M., Paik, Y., Van Eyk, J. E., Liu, S., Snyder, M., Baker, M. S., Deutsch, E. W. 2018; 17 (12): 4031–41

    Abstract

    The Human Proteome Project (HPP) annually reports on progress throughout the field in credibly identifying and characterizing the human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2018-01-17, the baseline for this sixth annual HPP special issue of the Journal of Proteome Research, contains 17 470 PE1 proteins, 89% of all neXtProt predicted PE1-4 proteins, up from 17 008 in release 2017-01-23 and 13 975 in release 2012-02-24. Conversely, the number of neXtProt PE2,3,4 missing proteins has been reduced from 2949 to 2579 to 2186 over the past two years. Of the PE1 proteins, 16 092 are based on mass spectrometry results, and 1378 on other kinds of protein studies, notably protein-protein interaction findings. PeptideAtlas has 15 798 canonical proteins, up 625 over the past year, including 269 from SUMOylation studies. The largest reason for missing proteins is low abundance. Meanwhile, the Human Protein Atlas has released its Cell Atlas, Pathology Atlas, and updated Tissue Atlas, and is applying recommendations from the International Working Group on Antibody Validation. Finally, there is progress using the quantitative multiplex organ-specific popular proteins targeted proteomics approach in various disease categories.

    View details for PubMedID 30099871

  • Transcriptomic study of Herpes simplex virus type-1 using full-length sequencing techniques. Scientific data Boldogkoi, Z., Szucs, A., Balazs, Z., Sharon, D., Snyder, M., Tombacz, D. 2018; 5: 180266

    Abstract

    Herpes simplex virus type-1 (HSV-1) is a human pathogenic member of the Alphaherpesvirinae subfamily of herpesviruses. The HSV-1 genome is a large double-stranded DNA specifying about 85 protein-coding genes. The latest surveys have demonstrated that the HSV-1 transcriptome is much more complex than previously thought. Here, we provide a long-read sequencing dataset, which was generated by using the RSII and Sequel systems from Pacific Biosciences (PacBio), as well as the MinION sequencing system from Oxford Nanopore Technologies (ONT). This dataset contains 39,096 reads of inserts (ROIs) mapped to the HSV-1 genome (X14112) in RSII sequencing, while Sequel sequencing yielded 77,851 ROIs. The MinION cDNA sequencing altogether resulted in 158,653 reads, while the direct RNA-seq produced 16,516 reads. This dataset can be utilized for the identification of novel HSV RNAs and transcript isoforms, as well as for the comparison of the quality and length of the sequencing reads derived from the currently available long-read sequencing platforms. The various library preparation approaches can also be compared with each other.

    View details for PubMedID 30480662

  • Macrophage de novo NAD+ synthesis specifies immune function in aging and inflammation. Nature immunology Minhas, P. S., Liu, L., Moon, P. K., Joshi, A. U., Dove, C., Mhatre, S., Contrepois, K., Wang, Q., Lee, B. A., Coronado, M., Bernstein, D., Snyder, M. P., Migaud, M., Majeti, R., Mochly-Rosen, D., Rabinowitz, J. D., Andreasson, K. I. 2018

    Abstract

    Recent advances highlight a pivotal role for cellular metabolism in programming immune responses. Here, we demonstrate that cell-autonomous generation of nicotinamide adenine dinucleotide (NAD+) via the kynurenine pathway (KP) regulates macrophage immune function in aging and inflammation. Isotope tracer studies revealed that macrophage NAD+ derives substantially from KP metabolism of tryptophan. Genetic or pharmacological blockade of de novo NAD+ synthesis depleted NAD+, suppressed mitochondrial NAD+-dependent signaling and respiration, and impaired phagocytosis and resolution of inflammation. Innate immune challenge triggered upstream KP activation but paradoxically suppressed cell-autonomous NAD+ synthesis by limiting the conversion of downstream quinolinate to NAD+, a profile recapitulated in aging macrophages. Increasing de novo NAD+ generation in immune-challenged or aged macrophages restored oxidative phosphorylation and homeostatic immune responses. Thus, KP-derived NAD+ operates as a metabolic switch to specify macrophage effector responses. Breakdown of de novo NAD+ synthesis may underlie declining NAD+ levels and rising innate immune dysfunction in aging and age-associated diseases.

    View details for PubMedID 30478397

  • Dynamic Transcriptome Profiling Dataset of Vaccinia Virus Obtained from Long-read Sequencing Techniques. GigaScience Tombacz, D., Prazsak, I., Szucs, A., Denes, B., Snyder, M., Boldogkoi, Z. 2018

    Abstract

    Background: Poxviruses are large DNA viruses infecting humans and animals. Vaccinia virus (VACV) has been applied as a live vaccine for immunization against smallpox, which was eradicated by 1980 as a result of worldwide vaccination. VACV is the prototype of poxviruses in the investigation of the molecular pathogenesis of the virus. Short-read sequencing methods have revolutionized transcriptomics, but they are not efficient in distinguishing between RNA isoforms and transcript overlaps. Long-read sequencing (LRS) is much better suited to solving these problems, and also allows direct RNA sequencing. Despite the scientific relevance of VACV, no LRS data have been generated for the viral transcriptome so far. Findings: For the deep characterization of the VACV RNA profile, various LRS platforms and library preparation approaches were applied. The raw reads were mapped to the VACV reference genome and also to the host (Chlorocebus sabaeus) genome. In this study, we applied the Pacific Biosciences RSII and Sequel platforms, which altogether resulted in 937,531 mapped reads of inserts (1.42 Gb), while we obtained 2,160,348 aligned reads (1.75 Gb) from the different library preparation methods, using the MinION device from Oxford Nanopore Technologies. Conclusions: By applying cutting-edge technologies, we were able to generate a large dataset that can serve as a valuable resource for the investigation of the dynamic VACV transcriptome, the virus-host interactions and RNA base modifications. These data can provide useful information for novel gene annotations in the VACV genome. Our dataset can also be applied for analyzing the currently available LRS platforms, library preparation methods and bioinformatics pipelines.

    View details for PubMedID 30476066

  • Smooth Muscle Contact Drives Endothelial Regeneration by BMPR2-Notch1 Mediated Metabolic and Epigenetic Changes. Circulation research Miyagawa, K., Shi, M., Chen, P., Hennigs, J. K., Zhao, Z., Wang, M., Li, C. G., Saito, T., Taylor, S., Sa, S., Cao, A., Wang, L., Snyder, M. P., Rabinovitch, M. 2018

    Abstract

    RATIONALE: Maintaining endothelial cells (EC) as a monolayer in the vessel wall depends on their metabolic state and gene expression profile, features influenced by contact with neighboring cells such as pericytes and smooth muscle cells (SMC). Failure to regenerate a normal EC monolayer in response to injury can result in occlusive neointima formation in diseases such as atherosclerosis and pulmonary arterial hypertension. OBJECTIVE: We investigated the nature and functional importance of contact-dependent communication between SMC and EC to maintain EC integrity. METHODS AND RESULTS: We found that in SMC and EC contact co-cultures, bone morphogenetic protein receptor 2 (BMPR2) is required by both cell types to produce collagen IV to activate integrin-linked kinase. This enzyme directs phospho c-Jun N-terminal kinase (p-JNK) to the EC membrane, where it stabilizes presenilin1 and releases Notch1 intracellular domain (N1ICD) to promote EC proliferation. This response is necessary for EC regeneration following carotid artery injury. It is deficient in EC-SMC Bmpr2 double heterozygous mice in association with reduced collagen IV production, decreased N1ICD and attenuated EC proliferation, but can be rescued by targeting N1ICD to EC. Deletion of EC-Notch1 in transgenic mice worsens hypoxia-induced pulmonary hypertension, in association with impaired EC regenerative function associated with loss of pre-capillary arteries. We further determined that N1ICD maintains EC proliferative capacity by increasing mitochondrial mass and by inducing the phosphofructokinase PFKFB3. ChIP-seq analyses showed that PFKFB3 is required for citrate-dependent histone acetylation (H3K27) at enhancer sites of genes regulated by the acetyl transferase p300, and by N1ICD or the N1ICD target MYC and necessary for EC proliferation and homeostasis. CONCLUSIONS: Thus, SMC-EC contact is required for activation of Notch1 by BMPR2, to coordinate metabolism with chromatin remodeling of genes that enable EC regeneration, to maintain monolayer integrity and vascular homeostasis in response to injury.

    View details for PubMedID 30582451

  • Identification of phagocytosis regulators using magnetic genome-wide CRISPR screens. Nature genetics Haney, M. S., Bohlen, C. J., Morgens, D. W., Ousey, J. A., Barkal, A. A., Tsui, C. K., Ego, B. K., Levin, R., Kamber, R. A., Collins, H., Tucker, A., Li, A., Vorselen, D., Labitigan, L., Crane, E., Boyle, E., Jiang, L., Chan, J., Rincon, E., Greenleaf, W. J., Li, B., Snyder, M. P., Weissman, I. L., Theriot, J. A., Collins, S. R., Barres, B. A., Bassik, M. C. 2018

    Abstract

    Phagocytosis is required for a broad range of physiological functions, from pathogen defense to tissue homeostasis, but the mechanisms required for phagocytosis of diverse substrates remain incompletely understood. Here, we developed a rapid magnet-based phenotypic screening strategy, and performed eight genome-wide CRISPR screens in human cells to identify genes regulating phagocytosis of distinct substrates. After validating select hits in focused miniscreens, orthogonal assays and primary human macrophages, we show that (1) the previously uncharacterized gene NHLRC2 is a central player in phagocytosis, regulating RhoA-Rac1 signaling cascades that control actin polymerization and filopodia formation, (2) very-long-chain fatty acids are essential for efficient phagocytosis of certain substrates and (3) the previously uncharacterized Alzheimer's disease-associated gene TM2D3 can preferentially influence uptake of amyloid-beta aggregates. These findings illuminate new regulators and core principles of phagocytosis, and more generally establish an efficient method for unbiased identification of cellular uptake mechanisms across diverse physiological and pathological contexts.

    View details for PubMedID 30397336

  • Systematic Screening For Environmental And Behavioral Determinants Identifies Factors Detrimental to Skeletal Health Oei, L., Wu, J., Oei, E., Rivadeneira, F., Uitterlinden, A., Ioannidis, J., Snyder, M., Patel, C. WILEY. 2018: 279
  • Evaluation of whole exome sequencing as an alternative to BeadChip and whole genome sequencing in human population genetic analysis. BMC genomics Maroti, Z., Boldogkoi, Z., Tombacz, D., Snyder, M., Kalmar, T. 2018; 19 (1): 778

    Abstract

    BACKGROUND: Understanding the underlying genetic structure of human populations is of fundamental interest to both biological and social sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation. The most widely used methods for collecting variant information at the DNA-level include whole genome sequencing, which remains costly, and the more economical solution of array-based techniques, as these are capable of simultaneously genotyping a pre-selected set of variable DNA sites in the human genome. The largest publicly accessible set of human genomic sequence data available today originates from exome sequencing that comprises around 1.2% of the whole genome (approximately 30 million base pairs). RESULTS: To unbiasedly compare the effect of SNP selection strategies in population genetic analysis we subsampled the variants of the same highly curated 1K Genome dataset to mimic genome, exome sequencing and array data in order to eliminate the effect of different chemistry and error profiles of these different approaches. Next we compared the application of the exome dataset to the array-based dataset and to the gold standard whole genome dataset using the same population genetic analysis methods. CONCLUSIONS: Our results draw attention to some of the inherent problems that arise from using pre-selected SNP sets for population genetic analysis. Additionally, we demonstrate that exome sequencing provides a better alternative to the array-based methods for population genetic analysis. In this study, we propose a strategy for unbiased variant collection from exome data and offer a bioinformatics protocol for proper data processing.

    View details for PubMedID 30373510

  • Precision Medicine: Role of Proteomics in Changing Clinical Management and Care. Journal of proteome research Van Eyk, J. E., Snyder, M. P. 2018

    Abstract

    It is now possible to collect large amounts of health-related data, which have the potential to transform healthcare. Proteomics, with its central position downstream of genetic and epigenetic inputs and upstream of biochemical outputs and integrators of environmental signals, is well-positioned to contribute to health discoveries and management. We present our perspective on the role of proteomics and other omics in precision health and medicine.

    View details for PubMedID 30296097

  • Wearables and the medical revolution. Personalized medicine Dunn, J., Runge, R., Snyder, M. 2018

    Abstract

    Wearable sensors are already impacting healthcare and medicine by enabling health monitoring outside of the clinic and prediction of health events. This paper reviews current and prospective wearable technologies and their progress toward clinical application. We describe technologies underlying common, commercially available wearable sensors and early-stage devices and outline research, when available, to support the use of these devices in healthcare. We cover applications in the following health areas: metabolic, cardiovascular and gastrointestinal monitoring; sleep, neurology, movement disorders and mental health; maternal, pre- and neo-natal care; and pulmonary health and environmental exposures. Finally, we discuss challenges associated with the adoption of wearable sensors in the current healthcare ecosystem and discuss areas for future research and development.

    View details for PubMedID 30259801

  • Dual Platform Long-Read RNA-Sequencing Dataset of the Human Cytomegalovirus Lytic Transcriptome FRONTIERS IN GENETICS Balazs, Z., Tombacz, D., Szucs, A., Snyder, M., Boldogkoi, Z. 2018; 9
  • Disruption of mesoderm formation during cardiac differentiation due to developmental exposure to 13-cis-retinoic acid. Scientific reports Liu, Q., Van Bortle, K., Zhang, Y., Zhao, M., Zhang, J. Z., Geller, B. S., Gruber, J. J., Jiang, C., Wu, J. C., Snyder, M. P. 2018; 8 (1): 12960

    Abstract

    13-cis-retinoic acid (isotretinoin, INN) is an oral pharmaceutical drug used for the treatment of skin acne, and is also a known teratogen. In this study, the molecular mechanisms underlying INN-induced developmental toxicity during early cardiac differentiation were investigated using both human induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs). Pre-exposure of hiPSCs and hESCs to a sublethal concentration of INN did not influence cell proliferation and pluripotency. However, mesodermal differentiation was disrupted when INN was included in the medium during differentiation. Transcriptomic profiling by RNA-seq revealed that INN exposure leads to aberrant expression of genes involved in several signaling pathways that control early mesoderm differentiation, such as TGF-beta signaling. In addition, genome-wide chromatin accessibility profiling by ATAC-seq suggested that INN-exposure leads to enhanced DNA-binding of specific transcription factors (TFs), including HNF1B, SOX10 and NFIC, often in close spatial proximity to genes that are dysregulated in response to INN treatment. Altogether, these results identify potential molecular mechanisms underlying INN-induced perturbation during mesodermal differentiation in the context of cardiac development. This study further highlights the utility of human stem cells as an alternative system for investigating congenital diseases of newborns that arise as a result of maternal drug exposure during pregnancy.

    View details for PubMedID 30154523

  • A Cloud-Based Metabolite and Chemical Prioritization System for the Biology/Disease-Driven Human Proteome Project. Journal of proteome research Yu, K., Lee, T. M., Chen, Y., Re, C., Kou, S. C., Chiang, J., Snyder, M., Kohane, I. S. 2018

    Abstract

    Targeted metabolomics and biochemical studies complement the ongoing investigations led by the Human Proteome Organization (HUPO) Biology/Disease-Driven Human Proteome Project (B/D-HPP). However, it is challenging to identify and prioritize metabolite and chemical targets. Literature-mining-based approaches have been proposed for target proteomics studies, but text mining methods for metabolite and chemical prioritization are hindered by a large number of synonyms and nonstandardized names of each entity. In this study, we developed a cloud-based literature mining and summarization platform that maps metabolites and chemicals in the literature to unique identifiers and summarizes the copublication trends of metabolites/chemicals and B/D-HPP topics using Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores. We successfully prioritized metabolites and chemicals associated with the B/D-HPP targeted fields and validated the results by checking against expert-curated associations and enrichment analyses. Compared with existing algorithms, our system achieved better precision and recall in retrieving chemicals related to B/D-HPP focused areas. Our cloud-based platform enables queries on all biological terms in multiple species, which will contribute to B/D-HPP and targeted metabolomics/chemical studies.

    View details for PubMedID 30094994

  • Long-Read Sequencing Revealed an Extensive Transcript Complexity in Herpesviruses FRONTIERS IN GENETICS Tombacz, D., Balazs, Z., Csabai, Z., Snyder, M., Boldogkoi, Z. 2018; 9
  • An integrated global regulatory network of hematopoietic precursor cell self-renewal and differentiation INTEGRATIVE BIOLOGY You, Y., Duran, R., Jiang, L., Dong, X., Zong, S., Snyder, M., Wu, J. 2018; 10 (7): 390–405

    Abstract

    Systematic study of the regulatory mechanisms of Hematopoietic Stem Cell and Progenitor Cell (HSPC) self-renewal is fundamentally important for understanding hematopoiesis and for manipulating HSPCs for therapeutic purposes. Previously, we have characterized gene expression and identified important transcription factors (TFs) regulating the switch between self-renewal and differentiation in a multipotent Hematopoietic Progenitor Cell (HPC) line, EML (Erythroid, Myeloid, and Lymphoid) cells. Herein, we report binding maps for additional TFs (SOX4 and STAT3) by using chromatin immunoprecipitation (ChIP)-Sequencing, to address the underlying mechanisms regulating self-renewal properties of lineage-CD34+ subpopulation (Lin-CD34+ EML cells). Furthermore, we applied the Assay for Transposase Accessible Chromatin (ATAC)-Sequencing to globally identify the open chromatin regions associated with TF binding in the self-renewing Lin-CD34+ EML cells. Mass spectrometry (MS) was also used to quantify protein relative expression levels. Finally, by integrating the protein-protein interaction database, we built an expanded transcriptional regulatory and interaction network. We found that MAPK (Mitogen-activated protein kinase) pathway and TGF-β/SMAD signaling pathway components were highly enriched among the binding targets of these TFs in Lin-CD34+ EML cells. The present study integrates regulatory information at multiple levels to paint a more comprehensive picture of the HSPC self-renewal mechanisms.

    View details for PubMedID 29892750

  • High Throughput Sequencing and Assessing Disease Risk. Cold Spring Harbor perspectives in medicine Rego, S. M., Snyder, M. P. 2018

    Abstract

    High-throughput sequencing has dramatically improved our ability to determine and diagnose the underlying causes of human disease. The use of whole-genome and whole-exome sequencing has facilitated faster and more cost-effective identification of new genes implicated in Mendelian disease. It has also improved our ability to identify disease-causing mutations for Mendelian diseases whose associated genes are already known. These benefits apply not only in cases in which the objective is to assess genetic disease risk in adults and children, but also for prenatal genetic testing and embryonic testing. High-throughput sequencing has also impacted our ability to assess risk for complex diseases and will likely continue to influence this area of disease research as more and more individuals undergo sequencing and we better understand the significance of variation, both rare and common, across the genome. Through these activities, high-throughput sequencing has the potential to revolutionize medicine.

    View details for PubMedID 29959131

  • Transcriptome-wide survey of pseudorabies virus using next- and third-generation sequencing platforms SCIENTIFIC DATA Tombacz, D., Sharon, D., Szucs, A., Moldovan, N., Snyder, M., Boldogkoi, Z. 2018; 5: 180119

    Abstract

    Pseudorabies virus (PRV) is an alphaherpesvirus of swine. PRV has a large double-stranded DNA genome and, as the latest investigations have revealed, a very complex transcriptome. Here, we present a large RNA-Seq dataset, derived from both short- and long-read sequencing. The dataset contains 1.3 million 100 bp paired-end reads that were obtained from the Illumina random-primed libraries, as well as 10 million 50 bp single-end reads generated by the Illumina polyA-seq. The Pacific Biosciences RSII non-amplified method yielded 57,021 reads of inserts (ROIs) aligned to the viral genome, the amplified method resulted in 158,396 PRV-specific ROIs, while we obtained 12,555 ROIs using the Sequel platform. The Oxford Nanopore's MinION device generated 44,006 reads using their regular cDNA-sequencing method, whereas 29,832 and 120,394 reads were produced by using the direct RNA-sequencing and the Cap-selection protocols, respectively. The raw reads were aligned to the PRV reference genome (KJ717942.1). Our provided dataset can be used to compare different sequencing approaches, library preparation methods, as well as for validation and testing bioinformatic pipelines.

    View details for PubMedID 29917014

  • Integrative omics for health and disease NATURE REVIEWS GENETICS Karczewski, K. J., Snyder, M. P. 2018; 19 (5): 299–310

    Abstract

    Advances in omics technologies - such as genomics, transcriptomics, proteomics and metabolomics - have begun to enable personalized medicine at an extraordinarily detailed molecular level. Individually, these technologies have contributed medical advances that have begun to enter clinical practice. However, each technology individually cannot capture the entire biological complexity of most human diseases. Integration of multiple technologies has emerged as an approach to provide a more comprehensive view of biology and disease. In this Review, we discuss the potential for combining diverse types of data and the utility of this approach in human health and disease. We provide examples of data integration to understand, diagnose and inform treatment of diseases, including rare and common diseases as well as cancer and transplant biology. Finally, we discuss technical and other challenges to clinical implementation of integrative omics.

    View details for PubMedID 29479082

    View details for PubMedCentralID PMC5990367

  • Personal Omics for Precision Health CIRCULATION RESEARCH Kellogg, R. A., Dunn, J., Snyder, M. P. 2018; 122 (9): 1169–71

    View details for PubMedID 29700064

  • Fast Metagenomic Binning via Hashing and Bayesian Clustering JOURNAL OF COMPUTATIONAL BIOLOGY Popic, V., Kuleshov, V., Snyder, M., Batzoglou, S. 2018

    Abstract

    We introduce GATTACA, a framework for fast unsupervised binning of metagenomic contigs. Similar to recent approaches, GATTACA clusters contigs based on their coverage profiles across a large cohort of metagenomic samples; however, unlike previous methods that rely on read mapping, GATTACA quickly estimates these profiles from kmer counts stored in a compact index. This approach can result in over an order of magnitude speedup, while matching the accuracy of earlier methods on synthetic and real data benchmarks. It also provides a way to index metagenomic samples (e.g., from public repositories such as the Human Microbiome Project) offline once and reuse them across experiments; furthermore, the small size of the sample indices allows them to be easily transferred and stored. Leveraging the MinHash technique, GATTACA also provides an efficient way to identify publicly available metagenomic data that can be incorporated into the set of reference metagenomes to further improve binning accuracy. Thus, enabling easy indexing and reuse of publicly available metagenomic data sets, GATTACA makes accurate metagenomic analyses accessible to a much wider range of researchers.

    View details for PubMedID 29658784

  • A global transcriptional network connecting noncoding mutations to changes in tumor gene expression NATURE GENETICS Zhang, W., Bojorquez-Gomez, A., Velez, D., Xu, G., Sanchez, K. S., Shen, J., Chen, K., Licon, K., Melton, C., Olson, K. M., Yu, M., Huang, J. K., Carter, H., Farley, E. K., Snyder, M., Fraley, S. I., Kreisberg, J. F., Ideker, T. 2018; 50 (4): 613-+

    Abstract

    Although cancer genomes are replete with noncoding mutations, the effects of these mutations remain poorly characterized. Here we perform an integrative analysis of 930 tumor whole genomes and matched transcriptomes, identifying a network of 193 noncoding loci in which mutations disrupt target gene expression. These 'somatic eQTLs' (expression quantitative trait loci) are frequently mutated in specific cancer tissues, and the majority can be validated in an independent cohort of 3,382 tumors. Among these, we find that the effects of noncoding mutations on DAAM1, MTG2 and HYI transcription are recapitulated in multiple cancer cell lines and that increasing DAAM1 expression leads to invasive cell migration. Collectively, the noncoding loci converge on a set of core pathways, permitting a classification of tumors into pathway-based subtypes. The somatic eQTL network is disrupted in 88% of tumors, suggesting widespread impact of noncoding mutations in cancer.

    View details for PubMedID 29610481

    View details for PubMedCentralID PMC5893414

  • NF90/ILF3 is a transcription factor that promotes proliferation over differentiation by hierarchical regulation in K562 erythroleukemia cells PLOS ONE Wu, T., Shi, L., Adrian, J., Shi, M., Nair, R. V., Snyder, M. P., Kao, P. N. 2018; 13 (3): e0193126

    Abstract

    NF90 and splice variant NF110 are DNA- and RNA-binding proteins encoded by the Interleukin enhancer-binding factor 3 (ILF3) gene that have been established to regulate RNA splicing, stabilization and export. The roles of NF90 and NF110 in regulating transcription as chromatin-interacting proteins have not been comprehensively characterized. Here, chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) identified 9,081 genomic sites specifically occupied by NF90/NF110 in K562 cells. One third of NF90/NF110 peaks occurred at promoters of annotated genes. NF90/NF110 occupancy colocalized with chromatin marks associated with active promoters and strong enhancers. Comparison with 150 ENCODE ChIP-seq experiments revealed that NF90/NF110 clustered with transcription factors exhibiting preference for promoters over enhancers (POLR2A, MYC, YY1). Differential gene expression analysis following shRNA knockdown of NF90/NF110 in K562 cells revealed that NF90/NF110 activates transcription factors that drive growth and proliferation (EGR1, MYC), while attenuating differentiation along the erythroid lineage (KLF1). NF90/NF110 associates with chromatin to hierarchically regulate transcription factors that promote proliferation and suppress differentiation.

    View details for PubMedID 29590119

  • Circular DNA elements of chromosomal origin are common in healthy human somatic tissue NATURE COMMUNICATIONS Moller, H., Mohiyuddin, M., Prada-Luengo, I., Sailani, M., Halling, J., Plomgaard, P., Maretty, L., Hansen, A., Snyder, M. P., Pilegaard, H., Lam, H. K., Regenberg, B. 2018; 9: 1069

    Abstract

    The human genome is generally organized into stable chromosomes, and only tumor cells are known to accumulate kilobase (kb)-sized extrachromosomal circular DNA elements (eccDNAs). However, it must be expected that kb eccDNAs exist in normal cells as a result of mutations. Here, we purify and sequence eccDNAs from muscle and blood samples from 16 healthy men, detecting ~100,000 unique eccDNA types from 16 million nuclei. Half of these structures carry genes or gene fragments and the majority are smaller than 25 kb. Transcription from eccDNAs suggests that eccDNAs reside in nuclei and recurrence of certain eccDNAs in several individuals implies DNA circularization hotspots. Gene-rich chromosomes contribute to more eccDNAs per megabase and the most transcribed protein-coding gene in muscle, TTN (titin), provides the most eccDNAs per gene. Thus, somatic genomes are rich in chromosome-derived eccDNAs that may influence phenotypes through altered gene copy numbers and transcription of full-length or truncated genes.

    View details for PubMedID 29540679

  • An Integrated Understanding of the Rapid Metabolic Benefits of a Carbohydrate-Restricted Diet on Hepatic Steatosis in Humans CELL METABOLISM Mardinoglu, A., Wu, H., Bjornson, E., Zhang, C., Hakkarainen, A., Rasanen, S. M., Lee, S., Mancina, R. M., Bergentall, M., Pietilainen, K. H., Soderlund, S., Matikainen, N., Stahlman, M., Bergh, P., Adiels, M., Piening, B. D., Graner, M., Lundbom, N., Williams, K. J., Romeo, S., Nielsen, J., Snyder, M., Uhlen, M., Bergstrom, G., Perkins, R., Marschall, H., Backhed, F., Taskinen, M., Boren, J. 2018; 27 (3): 559-+

    Abstract

    A carbohydrate-restricted diet is a widely recommended intervention for non-alcoholic fatty liver disease (NAFLD), but a systematic perspective on the multiple benefits of this diet is lacking. Here, we performed a short-term intervention with an isocaloric low-carbohydrate diet with increased protein content in obese subjects with NAFLD and characterized the resulting alterations in metabolism and the gut microbiota using a multi-omics approach. We observed rapid and dramatic reductions of liver fat and other cardiometabolic risk factors paralleled by (1) marked decreases in hepatic de novo lipogenesis; (2) large increases in serum β-hydroxybutyrate concentrations, reflecting increased mitochondrial β-oxidation; and (3) rapid increases in folate-producing Streptococcus and serum folate concentrations. Liver transcriptomic analysis on biopsy samples from a second cohort revealed downregulation of the fatty acid synthesis pathway and upregulation of folate-mediated one-carbon metabolism and fatty acid oxidation pathways. Our results highlight the potential of exploring diet-microbiota interactions for treating NAFLD.

    View details for PubMedID 29456073

  • Full Genome Sequence of the Western Reserve Strain of Vaccinia Virus Determined by Third-Generation Sequencing MICROBIOLOGY RESOURCE ANNOUNCEMENTS Prazsak, I., Tombacz, D., Szucs, A., Denes, B., Snyder, M., Boldogkoi, Z. 2018; 6 (11)

    Abstract

    The vaccinia virus is a large, complex virus belonging to the Poxviridae family. Here, we report the complete, annotated genome sequence of the neurovirulent Western Reserve laboratory strain of this virus, which was sequenced on the Pacific Biosciences RS II and Oxford Nanopore MinION platforms.

    View details for PubMedID 29545308

  • Applying genomics in heart transplantation TRANSPLANT INTERNATIONAL Keating, B. J., Pereira, A. C., Snyder, M., Piening, B. D. 2018; 31 (3): 278–90

    Abstract

    While advances in patient care and immunosuppressive pharmacotherapies have increased the lifespan of heart allograft recipients, there are still significant comorbidities post-transplantation and 5-year survival rates are still significant, at approximately 70%. The last decade has seen massive strides in genomics and other omics fields, including transcriptomics, with many of these advances now starting to impact heart transplant clinical care. This review summarizes a number of the key advances in genomics which are relevant for heart transplant outcomes, and we highlight the translational potential that such knowledge may bring to patient care within the next decade.

    View details for PubMedID 29363220

    View details for PubMedCentralID PMC5990370

  • Multiplatform next-generation sequencing identifies novel RNA molecules and transcript isoforms of the endogenous retrovirus isolated from cultured cells FEMS MICROBIOLOGY LETTERS Moldovan, N., Szucs, A., Tombacz, D., Balazs, Z., Csabai, Z., Snyder, M., Boldogkoi, Z. 2018; 365 (5)

    Abstract

    In this study, we applied short- and long-read RNA sequencing techniques, as well as PCR analysis, to investigate the transcriptome of the porcine endogenous retrovirus (PERV) expressed from the cultured porcine kidney cell line PK-15. This analysis has revealed six novel transcripts and eight transcript isoforms, including five length and three splice variants. We were able to establish whether a deletion in a transcript is the result of the splicing of mRNAs or of genomic deletion in one of the PERV clones. Additionally, we re-annotated the formerly identified RNA molecules. Our analysis revealed a higher complexity of the PERV transcriptome than was previously believed.

    View details for PubMedID 29361122

  • Biallelic Mutations in ATP5F1D, which Encodes a Subunit of ATP Synthase, Cause a Metabolic Disorder AMERICAN JOURNAL OF HUMAN GENETICS Olahova, M., Yoon, W., Thompson, K., Jangam, S., Fernandez, L., Davidson, J. M., Kyle, J. E., Grove, M. E., Fisk, D. G., Kohler, J. N., Holmes, M., Dries, A. M., Huang, Y., Zhao, C., Contrepois, K., Zappala, Z., Fresard, L., Waggott, D., Zink, E. M., Kim, Y., Heyman, H. M., Stratton, K. G., Webb-Robertson, B. M., Snyder, M., Merker, J. D., Montgomery, S. B., Fisher, P. G., Feichtinger, R. G., Mayr, J. A., Hall, J., Barbosa, I. A., Simpson, M. A., Deshpande, C., Waters, K. M., Koeller, D. M., Metz, T. O., Morris, A. A., Schelley, S., Cowan, T., Friederich, M. W., McFarland, R., Van Hove, J. K., Enns, G. M., Yamamoto, S., Ashley, E. A., Wangler, M. F., Taylor, R. W., Bellen, H. J., Bernstein, J. A., Wheeler, M. T., Undiagnosed Diseases Network 2018; 102 (3): 494–504

    Abstract

    ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.

    View details for PubMedID 29478781

  • Multi-Platform Sequencing Approach Reveals a Novel Transcriptome Profile in Pseudorabies Virus FRONTIERS IN MICROBIOLOGY Moldovan, N., Tombacz, D., Szucs, A., Csabai, Z., Snyder, M., Boldogkoi, Z. 2018; 8

  • Omics AnalySIs System for PRecision Oncology (OASISPRO): a web-based omics analysis tool for clinical phenotype prediction BIOINFORMATICS Yu, K., Fitzpatrick, M. R., Pappas, L., Chan, W., Kung, J., Snyder, M. 2018; 34 (2): 319–20

    Abstract

    Precision oncology is an approach that accounts for individual differences to guide cancer management. Omics signatures have been shown to predict clinical traits for cancer patients. However, the vast amount of omics information poses an informatics challenge in systematically identifying patterns associated with health outcomes, and no general-purpose data-mining tool exists for physicians, medical researchers, and citizen scientists without significant training in programming and bioinformatics. To bridge this gap, we built the Omics AnalySIs System for PRecision Oncology (OASISPRO), a web-based system to mine the quantitative omics information from The Cancer Genome Atlas (TCGA). This system effectively visualizes patients' clinical profiles, executes machine-learning algorithms of choice on the omics data, and evaluates the prediction performance using held-out test sets. With this tool, we successfully identified genes strongly associated with tumor stage, and accurately predicted patients' survival outcomes in many cancer types, including mesothelioma and adrenocortical carcinoma. By identifying the links between omics and clinical phenotypes, this system will facilitate omics studies on precision cancer medicine and contribute to establishing personalized cancer treatment plans. This web-based tool is available at http://tinyurl.com/oasispro; source code is available at http://tinyurl.com/oasisproSourceCode.

    View details for PubMedID 28968749

    View details for PubMedCentralID PMC5860203

  • How many human proteoforms are there? Nature chemical biology Aebersold, R., Agar, J. N., Amster, I. J., Baker, M. S., Bertozzi, C. R., Boja, E. S., Costello, C. E., Cravatt, B. F., Fenselau, C., Garcia, B. A., Ge, Y., Gunawardena, J., Hendrickson, R. C., Hergenrother, P. J., Huber, C. G., Ivanov, A. R., Jensen, O. N., Jewett, M. C., Kelleher, N. L., Kiessling, L. L., Krogan, N. J., Larsen, M. R., Loo, J. A., Ogorzalek Loo, R. R., Lundberg, E., MacCoss, M. J., Mallick, P., Mootha, V. K., Mrksich, M., Muir, T. W., Patrie, S. M., Pesavento, J. J., Pitteri, S. J., Rodriguez, H., Saghatelian, A., Sandoval, W., Schlüter, H., Sechi, S., Slavoff, S. A., Smith, L. M., Snyder, M. P., Thomas, P. M., Uhlén, M., Van Eyk, J. E., Vidal, M., Walt, D. R., White, F. M., Williams, E. R., Wohlschlager, T., Wysocki, V. H., Yates, N. A., Young, N. L., Zhang, B. 2018; 14 (3): 206–14

    Abstract

    Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry-based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?" We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.

    View details for PubMedID 29443976

  • Functional regulatory mechanism of smooth muscle cell-restricted LMOD1 coronary artery disease locus. PLoS genetics Nanda, V., Wang, T., Pjanic, M., Liu, B., Nguyen, T., Matic, L. P., Hedin, U., Koplev, S., Ma, L., Franzén, O., Ruusalepp, A., Schadt, E. E., Björkegren, J. L., Montgomery, S. B., Snyder, M. P., Quertermous, T., Leeper, N. J., Miller, C. L. 2018; 14 (11): e1007755

    Abstract

    Recent genome-wide association studies (GWAS) have identified multiple new loci which appear to alter coronary artery disease (CAD) risk via arterial wall-specific mechanisms. One of the annotated genes encodes LMOD1 (Leiomodin 1), a member of the actin filament nucleator family that is highly enriched in smooth muscle-containing tissues such as the artery wall. However, it is still unknown whether LMOD1 is the causal gene at this locus and also how the associated variants alter LMOD1 expression/function and CAD risk. Using epigenomic profiling we recently identified a non-coding regulatory variant, rs34091558, which is in tight linkage disequilibrium (LD) with the lead CAD GWAS variant, rs2820315. Herein we demonstrate through expression quantitative trait loci (eQTL) and statistical fine-mapping in GTEx, STARNET, and human coronary artery smooth muscle cell (HCASMC) datasets that rs34091558 is the top regulatory variant for LMOD1 in vascular tissues. Position weight matrix (PWM) analyses identify the protective allele rs34091558-TA to form a conserved Forkhead box O3 (FOXO3) binding motif, which is disrupted by the risk allele rs34091558-A. FOXO3 chromatin immunoprecipitation and reporter assays show reduced FOXO3 binding and LMOD1 transcriptional activity by the risk allele, consistent with effects of FOXO3 downregulation on LMOD1. LMOD1 knockdown results in increased proliferation and migration and decreased cell contraction in HCASMC, and immunostaining in atherosclerotic lesions in the SMC lineage tracing reporter mouse support a key role for LMOD1 in maintaining the differentiated SMC phenotype. These results provide compelling functional evidence that genetic variation is associated with dysregulated LMOD1 expression/function in SMCs, together contributing to the heritable risk for CAD.

    View details for PubMedID 30444878

  • Distinct transcriptomic and exomic abnormalities within myelodysplastic syndrome marrow cells. Leukemia & lymphoma Im, H., Rao, V., Sridhar, K., Bentley, J., Mishra, T., Chen, R., Hall, J., Graber, A., Zhang, Y., Li, X., Mias, G. I., Snyder, M. P., Greenberg, P. L. 2018: 1–11

    Abstract

    To provide biologic insights into mechanisms underlying myelodysplastic syndromes (MDS) we evaluated the CD34+ marrow cells transcriptome using high-throughput RNA sequencing (RNA-Seq). We demonstrated significant differential gene expression profiles (GEPs) between MDS and normal and identified 41 disease classifier genes. Additionally, two main clusters of GEPs distinguished patients based on their major clinical features, particularly between those whose disease remained stable versus patients who transformed into acute myeloid leukemia within 12 months. The genes whose expression was associated with disease outcome were involved in functional pathways and biologic processes highly relevant for MDS. Combined with exomic analysis we identified differential isoform usage of genes in MDS mutational subgroups, with consequent dysregulation of distinct biologic functions. This combination of clinical, transcriptomic and exomic findings provides valuable understanding of mechanisms underlying MDS and its progression to a more aggressive stage and also facilitates prognostic characterization of MDS patients.

    View details for PubMedID 29616851

  • Long-Read Sequencing Revealed an Extensive Transcript Complexity in Herpesviruses. Frontiers in genetics Tombácz, D., Balázs, Z., Csabai, Z., Snyder, M., Boldogkői, Z. 2018; 9: 259

    Abstract

    Long-read sequencing (LRS) techniques are very recent advancements, but they have already been used for transcriptome research in all of the three subfamilies of herpesviruses. These techniques have multiplied the number of known transcripts in each of the examined viruses. Meanwhile, they have revealed a so far hidden complexity of the herpesvirus transcriptome with the discovery of a large number of novel RNA molecules, including coding and non-coding RNAs, as well as transcript isoforms, and polycistronic RNAs. Additionally, LRS techniques have uncovered an intricate meshwork of transcriptional overlaps between adjacent and distally located genes. Here, we review the contribution of LRS to herpesvirus transcriptomics and present the complexity revealed by this technology, while also discussing the functional significance of this phenomenon.

    View details for PubMedID 30065753

  • Dual Platform Long-Read RNA-Sequencing Dataset of the Human Cytomegalovirus Lytic Transcriptome. Frontiers in genetics Balázs, Z., Tombácz, D., Szűcs, A., Snyder, M., Boldogkői, Z. 2018; 9: 432

    View details for PubMedID 30319694

  • SETD7 Drives Cardiac Lineage Commitment through Stage-Specific Transcriptional Activation. Cell stem cell Lee, J., Shao, N. Y., Paik, D. T., Wu, H., Guo, H., Termglinchan, V., Churko, J. M., Kim, Y., Kitani, T., Zhao, M. T., Zhang, Y., Wilson, K. D., Karakikes, I., Snyder, M. P., Wu, J. C. 2018; 22 (3): 428–44.e5

    Abstract

    Cardiac development requires coordinated and large-scale rearrangements of the epigenome. The roles and precise mechanisms through which specific epigenetic modifying enzymes control cardiac lineage specification, however, remain unclear. Here we show that the H3K4 methyltransferase SETD7 controls cardiac differentiation by reading H3K36 marks independently of its enzymatic activity. Through chromatin immunoprecipitation sequencing (ChIP-seq), we found that SETD7 targets distinct sets of genes to drive their stage-specific expression during cardiomyocyte differentiation. SETD7 associates with different co-factors at these stages, including SWI/SNF chromatin-remodeling factors during mesodermal formation and the transcription factor NKX2.5 in cardiac progenitors to drive their differentiation. Further analyses revealed that SETD7 binds methylated H3K36 in the bodies of its target genes to facilitate RNA polymerase II (Pol II)-dependent transcription. Moreover, abnormal SETD7 expression impairs functional attributes of terminally differentiated cardiomyocytes. Together, these results reveal how SETD7 acts at sequential steps in cardiac lineage commitment, and they provide insights into crosstalk between dynamic epigenetic marks and chromatin-modifying enzymes.

    View details for PubMedID 29499155

  • Value of Circulating Cytokine Profiling During Submaximal Exercise Testing in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Scientific reports Moneghetti, K. J., Skhiri, M., Contrepois, K., Kobayashi, Y., Maecker, H., Davis, M., Snyder, M., Haddad, F., Montoya, J. G. 2018; 8 (1): 2779

    Abstract

    Myalgic Encephalomyelitis or Chronic Fatigue Syndrome (ME/CFS) is a heterogeneous syndrome in which patients often experience severe fatigue and malaise following exertion. Immune and cardiovascular dysfunction have been postulated to play a role in the pathophysiology. We therefore examined whether cytokine profiling or cardiovascular testing following exercise would differentiate patients with ME/CFS. Twenty-four ME/CFS patients were matched to 24 sedentary controls and underwent cardiovascular and circulating immune profiling. Cardiovascular analysis included echocardiography, cardiopulmonary exercise and endothelial function testing. Cytokine and growth factor profiles were analyzed using a 51-plex Luminex bead kit at baseline and 18 hours following exercise. Cardiac structure and exercise capacity were similar between groups. Sparse partial least square discriminant analyses of cytokine profiles 18 hours post exercise offered the most reliable discrimination between ME/CFS and controls (κ = 0.62 (0.34, 0.84)). The most discriminatory cytokines post exercise were CD40L, platelet activator inhibitor, interleukin 1-β, interferon-α and CXCL1. In conclusion, cytokine profiling following exercise may help differentiate patients with ME/CFS from sedentary controls.

    View details for PubMedID 29426834

  • Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform SCIENTIFIC DATA Balazs, Z., Tombacz, D., Szucs, A., Snyder, M., Boldogkoi, Z. 2017; 4: 170194

    Abstract

    Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced by the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.

    View details for PubMedID 29257134

  • Challenges and recommendations for epigenomics in precision health NATURE BIOTECHNOLOGY Carter, A. C., Chang, H. Y., Church, G., Dombkowski, A., Ecker, J. R., Gil, E., Giresi, P. G., Greely, H., Greenleaf, W. J., Hacohen, N., He, C., Hill, D., Ko, J., Kohane, I., Kundaje, A., Palmer, M., Snyder, M. P., Tung, J., Urban, A., Vidal, M., Wong, W. 2017; 35 (12): 1128–32

    View details for PubMedID 29220033

  • Long-Read Sequencing of Human Cytomegalovirus Transcriptome Reveals RNA Isoforms Carrying Distinct Coding Potentials SCIENTIFIC REPORTS Balazs, Z., Tombacz, D., Szucs, A., Csabai, Z., Megyeri, K., Petrov, A. N., Snyder, M., Boldogkoi, Z. 2017; 7: 15989

    Abstract

    The human cytomegalovirus (HCMV) is a ubiquitous, human pathogenic herpesvirus. The complete viral genome is transcriptionally active during infection; however, a large part of its transcriptome has yet to be annotated. In this work, we applied the amplified isoform sequencing technique from Pacific Biosciences to characterize the lytic transcriptome of HCMV strain Towne varS. We developed a pipeline for transcript annotation using long-read sequencing data. We identified 248 transcriptional start sites, 116 transcriptional termination sites and 80 splicing events. Using this information, we have annotated 291 previously undescribed or only partially annotated transcript isoforms, including eight novel antisense transcripts and their isoforms, as well as a novel transcript (RS2) in the short repeat region, partially antisense to RS1. Similarly to other organisms, we discovered a high transcriptional diversity in HCMV, with many transcripts only slightly differing from one another. Comparing our transcriptome profiling results to an earlier ribosome footprint analysis, we have concluded that the majority of the transcripts contain multiple translationally active ORFs, and also that most isoforms contain unique combinations of ORFs. Based on these results, we propose that one important function of this transcriptional diversity may be to provide a regulatory mechanism at the level of translation.

    View details for PubMedID 29167532

  • Transcriptomic and epigenomic differences in human induced pluripotent stem cells generated from six reprogramming methods NATURE BIOMEDICAL ENGINEERING Churko, J. M., Lee, J., Ameen, M., Gu, M., Venkatasubramanian, M., Diecke, S., Sallam, K., Im, H., Wang, G., Gold, J. D., Salomonis, N., Snyder, M. P., Wu, J. C. 2017; 1 (10): 826–37

  • Long-Read Sequencing Reveals a GC Pressure during the Evolution of Porcine Endogenous Retrovirus MICROBIOLOGY RESOURCE ANNOUNCEMENTS Szucs, A., Moldovan, N., Tombacz, D., Csabai, Z., Snyder, M., Boldogkoi, Z. 2017; 5 (40)

    Abstract

    Here, we present the complete genome sequence of a porcine endogenous retrovirus determined by Pacific Biosciences sequencing. A comparison of the genome of this isolate with those of other strains revealed the operation of a mechanism resulting in the selective accumulation of G and C bases in the viral DNA.

    View details for PubMedID 28982996

  • Novel nonsense gain-of-function NFKB2 mutations associated with a combined immunodeficiency phenotype BLOOD Kuehn, H., Niemela, J. E., Sreedhara, K., Stoddard, J. L., Grossman, J., Wysocki, C. A., de la Morena, M., Garofalo, M., Inlora, J., Snyder, M. P., Lewis, D. B., Stratakis, C. A., Fleisher, T. A., Rosenzweig, S. D. 2017; 130 (13): 1553–64

    Abstract

    NF-κB signaling through its NFKB1-dependent canonical and NFKB2-dependent noncanonical pathways plays distinctive roles in a diverse range of immune processes. Recently, mutations in these 2 genes have been associated with common variable immunodeficiency (CVID). While studying patients with genetically uncharacterized primary immunodeficiencies, we detected 2 novel nonsense gain-of-function (GOF) NFKB2 mutations (E418X and R635X) in 3 patients from 2 families, and a novel missense change (S866R) in another patient. Their immunophenotype was assessed by flow cytometry and protein expression; activation of canonical and noncanonical pathways was examined in peripheral blood mononuclear cells and transfected HEK293T cells through immunoblotting, immunohistochemistry, luciferase activity, real-time polymerase chain reaction, and multiplex assays. The S866R change disrupted a C-terminal NF-κB2 critical site affecting protein phosphorylation and nuclear translocation, resulting in CVID with adrenocorticotropic hormone deficiency, growth hormone deficiency, and mild ectodermal dysplasia as previously described. In contrast, the nonsense mutations E418X and R635X observed in 3 patients led to constitutive nuclear localization and activation of both canonical and noncanonical NF-κB pathways, resulting in a combined immunodeficiency (CID) without endocrine or ectodermal manifestations. These changes were also found in 2 asymptomatic relatives. Thus, these novel NFKB2 GOF mutations produce a nonfully penetrant CID phenotype through a different pathophysiologic mechanism than previously described for mutations in NFKB2.

    View details for PubMedID 28778864

    View details for PubMedCentralID PMC5620416

  • Evaluation of the impact of ul54 gene-deletion on the global transcription and DNA replication of pseudorabies virus ARCHIVES OF VIROLOGY Csabai, Z., Takacs, I. F., Snyder, M., Boldogkoi, Z., Tombacz, D. 2017; 162 (9): 2679–94

    Abstract

    Pseudorabies virus (PRV) is an animal alphaherpesvirus with a wide host range. PRV has 67 protein-coding genes and several non-coding RNA molecules, which can be classified into three temporal groups: immediate early, early and late classes. The ul54 gene of PRV and its homolog icp27 of herpes simplex virus have a multitude of functions, including the regulation of viral DNA synthesis and the control of gene expression. Therefore, abrogation of PRV ul54 function was expected to exert a significant effect on the global transcriptome and on DNA replication. Real-time PCR and real-time RT-PCR platforms were used to investigate these presumed effects. Our analyses revealed a drastic impact of the ul54 mutation on the genome-wide expression of PRV genes, especially on the transcription of the true late genes. A delay of more than two hours was observed in the onset of DNA replication, and the amount of synthesized DNA molecules was significantly decreased in comparison to the wild-type virus. Furthermore, in this work, we were able to successfully demonstrate the utility of long-read SMRT sequencing for genotyping of mutant viruses.

    View details for PubMedID 28577213

    View details for PubMedCentralID PMC5927779

  • High-Coverage Whole-Exome Sequencing Identifies Candidate Genes for Suicide in Victims with Major Depressive Disorder SCIENTIFIC REPORTS Tombacz, D., Maroti, Z., Kalmar, T., Csabai, Z., Balazs, Z., Takahashi, S., Palkovits, M., Snyder, M., Boldogkoi, Z. 2017; 7: 7106

    Abstract

    We carried out whole-exome ultra-high throughput sequencing in brain samples of suicide victims who had suffered from major depressive disorder and control subjects who had died from other causes. This study aimed to reveal the selective accumulation of rare variants in the coding and the UTR sequences within the genes of suicide victims. We also analysed the potential effect of STR and CNV variations, as well as the infection of the brain with neurovirulent viruses in this behavioural disorder. As a result, we have identified several candidate genes, among others three calcium channel genes that may potentially contribute to completed suicide. We also explored the potential implication of the TGF-β signalling pathway in the pathogenesis of suicidal behaviour. To the best of our knowledge, this is the first study that uses whole-exome sequencing for the investigation of suicide.

    View details for PubMedID 28769055

  • Network analyses identify liver-specific targets for treating liver diseases MOLECULAR SYSTEMS BIOLOGY Lee, S., Zhang, C., Liu, Z., Klevstig, M., Mukhopadhyay, B., Bergentall, M., Cinar, R., Stahlman, M., Sikanic, N., Park, J. K., Deshmukh, S., Harzandi, A. M., Kuijpers, T., Grotli, M., Elsasser, S. J., Piening, B. D., Snyder, M., Smith, U., Nielsen, J., Backhed, F., Kunos, G., Uhlen, M., Boren, J., Mardinoglu, A. 2017; 13 (8): 938

    Abstract

    We performed integrative network analyses to identify targets that can be used for effectively treating liver diseases with minimal side effects. We first generated co-expression networks (CNs) for 46 human tissues and liver cancer to explore the functional relationships between genes and examined the overlap between functional and physical interactions. Since increased de novo lipogenesis is a characteristic of nonalcoholic fatty liver disease (NAFLD) and hepatocellular carcinoma (HCC), we investigated the liver-specific genes co-expressed with fatty acid synthase (FASN). CN analyses predicted that inhibition of these liver-specific genes decreases FASN expression. Experiments in human cancer cell lines, mouse liver samples, and primary human hepatocytes validated our predictions by demonstrating functional relationships between these liver genes, and showing that their inhibition decreases cell growth and liver fat content. In conclusion, we identified liver-specific genes linked to NAFLD pathogenesis, such as pyruvate kinase liver and red blood cell (PKLR), or to HCC pathogenesis, such as PKLR, patatin-like phospholipase domain containing 3 (PNPLA3), and proprotein convertase subtilisin/kexin type 9 (PCSK9), all of which are potential targets for drug development.

    View details for PubMedID 28827398

  • A Droplet Microfluidics Based Platform for Mining Metagenomic Libraries for Natural Compounds MICROMACHINES Theodorou, E., Scanga, R., Twardowski, M., Snyder, M. P., Brouzes, E. 2017; 8 (8)

    Abstract

    Historically, microbes from the environment have been a reliable source for novel bio-active compounds. Cloning and expression of metagenomic DNA in heterologous strains of bacteria has broadened the range of potential compounds accessible. However, such metagenomic libraries have been under-exploited for applications in mammalian cells because of a lack of integrated methods. We present an innovative platform to systematically mine natural resources for pro-apoptotic compounds that relies on the combination of bacterial delivery and droplet microfluidics. Using the violacein operon from C. violaceum as a model, we demonstrate that E. coli modified to be invasive can serve as an efficient delivery vehicle of natural compounds. This approach permits the seamless screening of metagenomic libraries with mammalian cell assays and alleviates the need for laborious extraction of natural compounds. In addition, we leverage the unique properties of droplet microfluidics to amplify bacterial clones and perform clonal screening at high-throughput in place of one-compound-per-well assays in multi-well format. We also use droplet microfluidics to establish a cell aggregate strategy that overcomes the issue of background apoptosis. Altogether, this work forms the foundation of a versatile platform to efficiently mine the metagenome for compounds with therapeutic potential.

    View details for PubMedID 30400422

  • Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis SCIENTIFIC REPORTS Ma, S., Snyder, M., Dinesh-Kumar, S. P. 2017; 7: 5557

    Abstract

    Deciphering gene regulatory networks requires identification of gene expression modules. We describe a novel bottom-up approach to identify gene modules regulated by cis-regulatory motifs from a human gene co-expression network. Target genes of a cis-regulatory motif were identified from the network via the motif's enrichment or biased distribution towards transcription start sites in the promoters of co-expressed genes. A gene sub-network containing the target genes was extracted and used to derive gene modules. The analysis revealed known and novel gene modules regulated by the NF-Y motif. The binding of NF-Y proteins to these modules' gene promoters was verified using ENCODE ChIP-Seq data. The analyses also identified 8,048 Sp1 motif target genes, many of which, interestingly, were not detected by ENCODE ChIP-Seq. These target genes assemble into house-keeping, tissue-specific developmental, and immune response modules. Integration of Sp1 modules with genomic and epigenomic data indicates epigenetic control of Sp1 targets' expression in a cell/tissue-specific manner. Finally, known and novel target genes and modules regulated by the YY1, RFX1, IRF1, and 34 other motifs were also identified. The study described here provides a valuable resource to understand transcriptional regulation of various human developmental, disease, or immunity pathways.

    View details for PubMedID 28717181

  • Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis NATURE COMMUNICATIONS Sahraeian, S., Mohiyuddin, M., Sebra, R., Tilgner, H., Afshar, P. T., Au, K., Asadi, N., Gerstein, M. B., Wong, W., Snyder, M. P., Schadt, E., Lam, H. K. 2017; 8: 59

    Abstract

    RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.

    View details for PubMedID 28680106

  • Long-Read Isoform Sequencing Reveals a Hidden Complexity of the Transcriptional Landscape of Herpes Simplex Virus Type 1 FRONTIERS IN MICROBIOLOGY Tombacz, D., Csabai, Z., Szucs, A., Balazs, Z., Moldovan, N., Sharon, D., Snyder, M., Boldogkoi, Z. 2017; 8: 1079

    Abstract

    In this study, we used the amplified isoform sequencing technique from Pacific Biosciences to characterize the poly(A)+ fraction of the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). Our analysis detected 34 formerly unidentified protein-coding genes, 10 non-coding RNAs, as well as 17 polycistronic and complex transcripts. This work also led us to identify many transcript isoforms, including 13 splice and 68 transcript end variants, as well as several transcript overlaps. Additionally, we determined previously unascertained transcriptional start and polyadenylation sites. We analyzed the transcriptional activity from the complementary DNA strand in five convergent HSV gene pairs with quantitative RT-PCR and detected antisense RNAs in each gene. This part of the study revealed an inverse correlation between the expressions of convergent partners. Our work adds new insights for understanding the complexity of the pervasive transcriptional overlaps by suggesting that there is a crosstalk between adjacent and distal genes through interaction between their transcription apparatuses. We also identified transcripts overlapping the HSV replication origins, which may indicate an interplay between the transcription and replication machineries. The relative abundance of HSV-1 transcripts has also been established by using a novel method based on the calculation of sequencing reads for the analysis.

    View details for DOI 10.3389/fmicb.2017.01079

    View details for Web of Science ID 000403758800001

    View details for PubMedID 28676792

    View details for PubMedCentralID PMC5476775

  • Bisulfite-independent analysis of CpG island methylation enables genome-scale stratification of single cells NUCLEIC ACIDS RESEARCH Han, L., Wu, H., Zhu, H., Kim, K., Marjani, S. L., Riester, M., Euskirchen, G., Zi, X., Yang, J., Han, J., Snyder, M., Park, I., Irizarry, R., Weissman, S. M., Michor, F., Fan, R., Pan, X. 2017; 45 (10): e77

    Abstract

    Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyzing single cells from two types of hematopoietic cells, K562 and GM12878, and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation difference between cells is detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population.

    View details for PubMedID 28126923

    View details for PubMedCentralID PMC5605247

  • Isolated Congenital Anosmia and CNGA2 Mutation. Scientific reports Sailani, M. R., Jingga, I., MirMazlomi, S. H., Bitarafan, F., Bernstein, J. A., Snyder, M. P., Garshasbi, M. 2017; 7 (1): 2667

    Abstract

    Isolated congenital anosmia (ICA) is a rare condition that is associated with life-long inability to smell. Here we report a genetic characterization of a large Iranian family segregating ICA. Whole exome sequencing in five affected family members and five healthy members revealed a stop-gain mutation in CNGA2 (OMIM 300338) (chrX:150,911,102; CNGA2: c.577C>T; p.Arg193*). The mutation segregates in an X-linked pattern, as all the affected family members are hemizygotes, whereas healthy family members are either heterozygous or homozygous for the reference allele. Cnga2 knockout mice are congenitally anosmic and have abnormal olfactory system physiology. Additionally, Karstensen et al. recently reported two anosmic brothers sharing a CNGA2 truncating variant. Our study, in concert with these findings, provides strong support for a role of the CNGA2 gene in the pathogenicity of ICA in humans. Together, these results indicate that mutations in key olfactory signaling pathway genes are responsible for human disease.

    View details for DOI 10.1038/s41598-017-02947-y

    View details for PubMedID 28572688

  • Succinate and its G-protein-coupled receptor stimulates osteoclastogenesis. Nature communications Guo, Y., Xie, C., Li, X., Yang, J., Yu, T., Zhang, R., Zhang, T., Saxena, D., Snyder, M., Wu, Y., Li, X. 2017; 8: 15621

    Abstract

    The mechanism underlying bone impairment in patients with diabetes mellitus, a metabolic disorder characterized by chronic hyperglycaemia and dysregulation in metabolism, is unclear. Here we show the difference in the metabolomics of bone marrow stromal cells (BMSCs) derived from hyperglycaemic (type 2 diabetes mellitus, T2D) and normoglycaemic mice. One hundred and forty-two metabolites are substantially regulated in BMSCs from T2D mice, with the tricarboxylic acid (TCA) cycle being one of the primary metabolic pathways impaired by hyperglycaemia. Importantly, succinate, an intermediate metabolite in the TCA cycle, is increased by 24-fold in BMSCs from T2D mice. Succinate functions as an extracellular ligand through binding to its specific receptor on osteoclastic lineage cells and stimulates osteoclastogenesis in vitro and in vivo. Strategies targeting the receptor activation inhibit osteoclastogenesis. This study reveals a metabolite-mediated mechanism of osteoclastogenesis modulation that contributes to bone dysregulation in metabolic disorders.

    View details for DOI 10.1038/ncomms15621

    View details for PubMedID 28561074

  • Multi-platform analysis reveals a complex transcriptome architecture of a circovirus. Virus research Moldován, N., Balázs, Z., Tombácz, D., Csabai, Z., Szucs, A., Snyder, M., Boldogkoi, Z. 2017; 237: 37-46

    Abstract

    In this study, we used Pacific Biosciences RS II long-read and Illumina HiScanSQ short-read sequencing technologies for the characterization of porcine circovirus type 1 (PCV-1) transcripts. Our aim was to identify novel RNA molecules and transcript isoforms, as well as to determine the exact 5'- and 3'-end sequences of previously described transcripts with single base-pair accuracy. We discovered a novel 3'-UTR length isoform of the Cap transcript, and a non-spliced Cap transcript variant. Additionally, our analysis has revealed a 3'-UTR isoform of Rep and two 5'-UTR isoforms of Rep' transcripts, and a novel splice variant of the longer Rep' transcript. We also explored two novel long transcripts, one with a previously identified splice site, and a formerly undetected mRNA of ORF3. Altogether, our methods have identified nine novel RNA molecules, doubling the size of the previously known PCV-1 transcriptome. Additionally, our investigations revealed an intricate pattern of transcript overlapping, which might produce transcriptional interference between the transcriptional machineries of adjacent genes, and may thereby play a role in the regulation of gene expression in circoviruses.

    View details for DOI 10.1016/j.virusres.2017.05.010

    View details for PubMedID 28549855

  • Non-equivalence of Wnt and R-spondin ligands during Lgr5(+) intestinal stem-cell self-renewal NATURE Yan, K. S., Janda, C. Y., Chang, J., Zheng, G. X., Larkin, K. A., Luca, V. C., Chia, L. A., Mah, A. T., Han, A., Terry, J. M., Ootani, A., Roelf, K., Lee, M., Yuan, J., Li, X., Bolen, C. R., Wilhelmy, J., Davies, P. S., Ueno, H., von Furstenberg, R. J., Belgrader, P., Ziraldo, S. B., Ordonez, H., Henning, S. J., Wong, M. H., Snyder, M. P., Weissman, I. L., Hsueh, A. J., Mikkelsen, T. S., Garcia, K. C., Kuo, C. J. 2017; 545 (7653): 238-242

    Abstract

    The canonical Wnt/β-catenin signalling pathway governs diverse developmental, homeostatic and pathological processes. Palmitoylated Wnt ligands engage cell-surface frizzled (FZD) receptors and LRP5 and LRP6 co-receptors, enabling β-catenin nuclear translocation and TCF/LEF-dependent gene transactivation. Mutations in Wnt downstream signalling components have revealed diverse functions thought to be carried out by Wnt ligands themselves. However, redundancy between the 19 mammalian Wnt proteins and 10 FZD receptors and Wnt hydrophobicity have made it difficult to attribute these functions directly to Wnt ligands. For example, individual mutations in Wnt ligands have not revealed homeostatic phenotypes in the intestinal epithelium, an archetypal canonical, Wnt pathway-dependent, rapidly self-renewing tissue, the regeneration of which is fueled by proliferative crypt Lgr5(+) intestinal stem cells (ISCs). R-spondin ligands (RSPO1-RSPO4) engage distinct LGR4-LGR6, RNF43 and ZNRF3 receptor classes, markedly potentiate canonical Wnt/β-catenin signalling, and induce intestinal organoid growth in vitro and Lgr5(+) ISCs in vivo. However, the interchangeability, functional cooperation and relative contributions of Wnt versus RSPO ligands to in vivo canonical Wnt signalling and ISC biology remain unknown. Here we identify the functional roles of Wnt and RSPO ligands in the intestinal crypt stem-cell niche. We show that the default fate of Lgr5(+) ISCs is to differentiate, unless both RSPO and Wnt ligands are present. However, gain-of-function studies using RSPO ligands and a new non-lipidated Wnt analogue reveal that these ligands have qualitatively distinct, non-interchangeable roles in ISCs. Wnt proteins are unable to induce Lgr5(+) ISC self-renewal, but instead confer a basal competency by maintaining RSPO receptor expression that enables RSPO ligands to actively drive and specify the extent of stem-cell expansion. This functionally non-equivalent yet cooperative interaction between Wnt and RSPO ligands establishes a molecular precedent for regulation of mammalian stem cells by distinct priming and self-renewal factors, with broad implications for precise control of tissue regeneration.

    View details for DOI 10.1038/nature22313

    View details for Web of Science ID 000400963800037

    View details for PubMedID 28467820

  • Histone variant H2A.J accumulates in senescent cells and promotes inflammatory gene expression NATURE COMMUNICATIONS Contrepois, K., Coudereau, C., Benayoun, B. A., Schuler, N., Roux, P., Bischof, O., Courbeyrette, R., Carvalho, C., Thuret, J., Ma, Z., Derbois, C., Nevers, M., Volland, H., Redon, C. E., Bonner, W. M., Deleuze, J., Wiel, C., Bernard, D., Snyder, M. P., Ruebe, C. E., Olaso, R., Fenaille, F., Mann, C. 2017; 8

    Abstract

    The senescence of mammalian cells is characterized by a proliferative arrest in response to stress and the expression of an inflammatory phenotype. Here we show that histone H2A.J, a poorly studied H2A variant found only in mammals, accumulates in human fibroblasts in senescence with persistent DNA damage. H2A.J also accumulates in mice with aging in a tissue-specific manner and in human skin. Knock-down of H2A.J inhibits the expression of inflammatory genes that contribute to the senescent-associated secretory phenotype (SASP), and over expression of H2A.J increases the expression of some of these genes in proliferating cells. H2A.J accumulation may thus promote the signalling of senescent cells to the immune system, and it may contribute to chronic inflammation and the development of aging-associated diseases.

    View details for DOI 10.1038/ncomms14995

    View details for Web of Science ID 000400886800001

    View details for PubMedID 28489069

  • Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens NATURE COMMUNICATIONS Morgens, D. W., Wainberg, M., Boyle, E. A., Ursu, O., Araya, C. L., Tsui, C. K., Haney, M. S., Hess, G. T., Han, K., Jeng, E. E., Li, A., Snyder, M. P., Greenleaf, W. J., Kundaje, A., Bassik, M. C. 2017; 8

    Abstract

    CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens.

    View details for DOI 10.1038/ncomms15178

    View details for PubMedID 28474669

  • A Case Report of Hypoglycemia and Hypogammaglobulinemia: DAVID syndrome in a patient with a novel NFKB2 mutation. Journal of clinical endocrinology and metabolism Lal, R. A., Bachrach, L. K., Hoffman, A. R., Inlora, J., Rego, S., Snyder, M. P., Lewis, D. B. 2017

    Abstract

    DAVID syndrome (Deficient Anterior pituitary with Variable Immune Deficiency) is a rare disorder in which children present with symptomatic ACTH deficiency preceded by hypogammaglobulinemia from B-cell dysfunction with recurrent infections, termed common variable immunodeficiency (CVID). Subsequent whole exome sequencing studies have revealed germline heterozygous C-terminal mutations of NFKB2 as either a cause of DAVID syndrome or of CVID without clinical hypopituitarism. However, to the best of our knowledge there have been no cases in which the endocrinopathy has presented in the absence of a prior clinical history of CVID. A previously healthy 7-year-old boy with no history of clinical immunodeficiency presented with profound hypoglycemia and seizures. He was found to have secondary adrenal insufficiency and was started on glucocorticoid replacement. An evaluation for autoimmune disease, including for anti-pituitary antibodies, was negative. Evaluation unexpectedly revealed hypogammaglobulinemia (decreased IgG, IgM, and IgA). He had moderately reduced serotype-specific IgG responses following pneumococcal polysaccharide vaccine. Subsequently, he was found to have growth hormone (GH) deficiency. Six years after initial presentation, whole exome sequencing revealed a novel de novo heterozygous NFKB2 missense mutation c.2596A>C (p.Ser866Arg) in the C-terminal region predicted to abrogate the processing of the p100 NFKB2 protein to its active p52 form. Isolated early-onset ACTH deficiency is rare and C-terminal region NFKB2 mutations should be considered as an etiology even in the absence of a clinical history of CVID. Early immunologic evaluation is indicated in the diagnosis and management of isolated ACTH deficiency.

    View details for DOI 10.1210/jc.2017-00341

    View details for PubMedID 28472507

  • Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers CELL STEM CELL Gu, M., Shao, N., Sa, S., Li, D., Termglinchan, V., Ameen, M., Karakikes, I., Sosa, G., Grubert, F., Lee, J., Cao, A., Taylor, S., Ma, Y., Zhao, Z., Chappell, J., Hamid, R., Austin, E. D., Gold, J. D., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2017; 20 (4): 490

  • Gpr124 is essential for blood-brain barrier integrity in central nervous system disease NATURE MEDICINE Chang, J., Mancuso, M. R., Maier, C., Liang, X., Yuki, K., Yang, L., Kwong, J. W., Wang, J., Rao, V., Vallon, M., Kosinski, C., Zhang, J. J., Mah, A. T., Xu, L., Li, L., Gholamin, S., Reyes, T. F., Li, R., Kuhnert, F., Han, X., Yuan, J., Chiou, S., Brettman, A. D., Daly, L., Corney, D. C., Cheshier, S. H., Shortliffe, L. D., Wu, X., Snyder, M., Chan, P., Giffard, R. G., Chang, H. Y., Andreasson, K., Kuo, C. J. 2017; 23 (4): 450

    Abstract

    Although blood-brain barrier (BBB) compromise is central to the etiology of diverse central nervous system (CNS) disorders, endothelial receptor proteins that control BBB function are poorly defined. The endothelial G-protein-coupled receptor (GPCR) Gpr124 has been reported to be required for normal forebrain angiogenesis and BBB function in mouse embryos, but the role of this receptor in adult animals is unknown. Here Gpr124 conditional knockout (CKO) in the endothelia of adult mice did not affect homeostatic BBB integrity, but resulted in BBB disruption and microvascular hemorrhage in mouse models of both ischemic stroke and glioblastoma, accompanied by reduced cerebrovascular canonical Wnt-β-catenin signaling. Constitutive activation of Wnt-β-catenin signaling fully corrected the BBB disruption and hemorrhage defects of Gpr124-CKO mice, with rescue of the endothelial gene tight junction, pericyte coverage and extracellular-matrix deficits. We thus identify Gpr124 as an endothelial GPCR specifically required for endothelial Wnt signaling and BBB integrity under pathological conditions in adult mice. This finding implicates Gpr124 as a potential therapeutic target for human CNS disorders characterized by BBB disruption.

    View details for DOI 10.1038/nm.4309

    View details for PubMedID 28288111

  • Induced Pluripotent Stem Cell Model of Pulmonary Arterial Hypertension Reveals Novel Gene Expression and Patient Specificity AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE Sa, S., Gu, M., Chappell, J., Shao, N., Ameen, M., Elliott, K. A., Li, D., Grubert, F., Li, C. G., Taylor, S., Cao, A., Ma, Y., Fong, R., Nguyen, L., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2017; 195 (7): 930-941

  • Characterization of the Dynamic Transcriptome of a Herpesvirus with Long-read Single Molecule Real-Time Sequencing. Scientific reports Tombácz, D., Balázs, Z., Csabai, Z., Moldován, N., Szucs, A., Sharon, D., Snyder, M., Boldogkoi, Z. 2017; 7: 43751

    Abstract

    Herpesvirus gene expression is co-ordinately regulated and sequentially ordered during productive infection. The viral genes can be classified into three distinct kinetic groups: immediate-early, early, and late classes. In this study, a massively parallel sequencing technique based on the PacBio Single Molecule Real-Time (SMRT) sequencing platform was used for quantifying the poly(A) fraction of the lytic transcriptome of pseudorabies virus (PRV) throughout a 12-hour interval of productive infection on PK-15 cells. Other approaches, including microarray, real-time RT-PCR and Illumina sequencing, are capable of detecting only the aggregate transcriptional activity of particular genomic regions, but not individual herpesvirus transcripts. However, SMRT sequencing allows for a distinction between transcript isoforms, including length and splice variants, as well as between overlapping polycistronic RNA molecules. The non-amplified Isoform Sequencing (Iso-Seq) method was used to analyse the kinetic properties of the lytic PRV transcripts and then to classify them accordingly. Additionally, the present study demonstrates the general utility of long-read sequencing for the time-course analysis of global gene expression in practically any organism.

    View details for DOI 10.1038/srep43751

    View details for PubMedID 28256586

    View details for PubMedCentralID PMC5335617

  • Association of AHSG with alopecia and mental retardation (APMR) syndrome. Human genetics Reza Sailani, M., Jahanbani, F., Nasiri, J., Behnam, M., Salehi, M., Sedghi, M., Hoseinzadeh, M., Takahashi, S., Zia, A., Gruber, J., Lynch, J. L., Lam, D., Winkelmann, J., Amirkiai, S., Pang, B., Rego, S., Mazroui, S., Bernstein, J. A., Snyder, M. P. 2017; 136 (3): 287-296

    Abstract

    Alopecia with mental retardation syndrome (APMR) is a very rare autosomal recessive condition that is associated with total or partial absence of hair from the scalp and other parts of the body as well as variable intellectual disability. Here we present whole-exome sequencing results of a large consanguineous family segregating APMR syndrome with seven affected family members. Our study revealed a novel predicted pathogenic, homozygous missense mutation in the AHSG (OMIM 138680) gene (AHSG: NM_001622:exon7:c.950G>A:p.Arg317His). The variant is predicted to affect a region of the protein required for protein processing and disrupts a phosphorylation motif. In addition, the altered protein migrates with an aberrant size relative to that of healthy individuals. Consistent with the phenotype, AHSG maps within APMR linkage region 1 (APMR 1) as reported before, and falls within runs of homozygosity (ROH). Previous families with APMR syndrome have been studied through linkage analyses, and the linkage resolution did not allow a single candidate gene to be pinpointed. Our study is the first report to identify a homozygous missense mutation for APMR syndrome through whole-exome sequencing.

    View details for DOI 10.1007/s00439-016-1756-5

    View details for PubMedID 28054173

  • A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N-1-methyladenosine modification RNA Cenik, C., Chua, H. N., Singh, G., Akef, A., Snyder, M. P., Palazzo, A. F., Moore, M. J., Roth, F. P. 2017; 23 (3): 270-283

    Abstract

    Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N(1)-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N(1)-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.

    View details for DOI 10.1261/rna.059105.116

    View details for Web of Science ID 000394467500002

    View details for PubMedCentralID PMC5311483

    View details for PubMedID 27994090

  • Single cell transcriptomics reveals unanticipated features of early hematopoietic precursors NUCLEIC ACIDS RESEARCH Yang, J., Tanaka, Y., Seay, M., Li, Z., Jin, J., Garmire, L. X., Zhu, X., Taylor, A., Li, W., Euskirchen, G., Halene, S., Kluger, Y., Snyder, M. P., Park, I., Pan, X., Weissman, S. M. 2017; 45 (3): 1281-1296

    Abstract

    Molecular changes underlying stem cell differentiation are of fundamental interest. scRNA-seq on murine hematopoietic stem cells (HSC) and their progeny MPP1 separated the cells into 3 main clusters with distinct features: active, quiescent, and an un-characterized cluster. Induction of anemia resulted in mobilization of the quiescent to the active cluster and of the early to later stage of cell cycle, with marked increase in expression of certain transcription factors (TFs) while maintaining expression of interferon response genes. Cells with surface markers of long term HSC increased the expression of a group of TFs expressed highly in normal cycling MPP1 cells. However, at least Id1 and Hes1 were significantly activated in both HSC and MPP1 cells in anemic mice. Lineage-specific genes were differently expressed between cells, and correlated with the cell cycle stages with a specific augmentation of erythroid related genes in the G2/M phase. Most lineage specific TFs were stochastically expressed in the early precursor cells, but a few, such as Klf1, were detected only at very low levels in few precursor cells. The activation of these factors may correlate with stages of differentiation. This study reveals effects of cell cycle progression on the expression of lineage specific genes in precursor cells, and suggests that hematopoietic stress changes the balance of renewal and differentiation in these homeostatic cells.

    View details for DOI 10.1093/nar/gkw1214

    View details for Web of Science ID 000397008000025

    View details for PubMedCentralID PMC5388401

  • Genetic Adaptation of Porcine Circovirus Type 1 to Cultured Porcine Kidney Cells Revealed by Single-Molecule Long-Read Sequencing Technology MICROBIOLOGY RESOURCE ANNOUNCEMENTS Tombacz, D., Moldovan, N., Balazs, Z., Csabai, Z., Snyder, M., Boldogkoi, Z. 2017; 5 (5)

    Abstract

    Porcine circovirus type 1 (PCV1) is a nonpathogenic circovirus, and a contaminant of the porcine kidney (PK-15) cell line. We present the complete and annotated genome sequence of strain Szeged of PCV1, determined by Pacific Biosciences RSII long-read sequencing platform.

    View details for PubMedID 28153895

  • Pharmacological rescue of diabetic skeletal stem cell niches. Science translational medicine Tevlin, R., Seo, E. Y., Marecic, O., McArdle, A., Tong, X., Zimdahl, B., Malkovskiy, A., Sinha, R., Gulati, G., Li, X., Wearda, T., Morganti, R., Lopez, M., Ransom, R. C., Duldulao, C. R., Rodrigues, M., Nguyen, A., Januszyk, M., Maan, Z., Paik, K., Yapa, K., Rajadas, J., Wan, D. C., Gurtner, G. C., Snyder, M., Beachy, P. A., Yang, F., Goodman, S. B., Weissman, I. L., Chan, C. K., Longaker, M. T. 2017; 9 (372)

    Abstract

    Diabetes mellitus (DM) is a metabolic disease frequently associated with impaired bone healing. Despite its increasing prevalence worldwide, the molecular etiology of DM-linked skeletal complications remains poorly defined. Using advanced stem cell characterization techniques, we analyzed intrinsic and extrinsic determinants of mouse skeletal stem cell (mSSC) function to identify specific mSSC niche-related abnormalities that could impair skeletal repair in diabetic (Db) mice. We discovered that high serum concentrations of tumor necrosis factor-α directly repressed the expression of Indian hedgehog (Ihh) in mSSCs and in their downstream skeletogenic progenitors in Db mice. When hedgehog signaling was inhibited during fracture repair, injury-induced mSSC expansion was suppressed, resulting in impaired healing. We reversed this deficiency by precise delivery of purified Ihh to the fracture site via a specially formulated, slow-release hydrogel. In the presence of exogenous Ihh, the injury-induced expansion and osteogenic potential of mSSCs were restored, culminating in the rescue of Db bone healing. Our results present a feasible strategy for precise treatment of molecular aberrations in stem and progenitor cell populations to correct skeletal manifestations of systemic disease.

    View details for DOI 10.1126/scitranslmed.aag2809

    View details for PubMedID 28077677

  • ChIA-PET2: a versatile and flexible pipeline for ChIA-PET data analysis. Nucleic acids research Li, G., Chen, Y., Snyder, M. P., Zhang, M. Q. 2017; 45 (1)

    Abstract

    ChIA-PET2 is a versatile and flexible pipeline for analyzing different types of ChIA-PET data from raw sequencing reads to chromatin loops. ChIA-PET2 integrates all steps required for ChIA-PET data analysis, including linker trimming, read alignment, duplicate removal, peak calling and chromatin loop calling. It supports different kinds of ChIA-PET data generated from different ChIA-PET protocols and also provides quality controls for different steps of ChIA-PET analysis. In addition, ChIA-PET2 can use phased genotype data to call allele-specific chromatin interactions. We applied ChIA-PET2 to different ChIA-PET datasets, demonstrating its significantly improved performance as well as its ability to easily process ChIA-PET raw data. ChIA-PET2 is available at https://github.com/GuipengLi/ChIA-PET2.

    View details for DOI 10.1093/nar/gkw809

    View details for PubMedID 27625391

    View details for PubMedCentralID PMC5224499

  • Identification of a novel mutation in APTX gene associated with Ataxia-oculomotor apraxia. Cold Spring Harbor molecular case studies Inlora, J., Sailani, M. R., Khodadadi, H., Teymurinezhad, A., Takahashi, S., Bernstein, J. A., Garshasbi, M., Snyder, M. P. 2017

    Abstract

    Hereditary ataxias are a clinically and genetically heterogeneous family of disorders defined by the inability to control gait and muscle coordination. Given the non-specific symptoms of many hereditary ataxias, precise diagnosis relies on molecular genetic testing. To this end, we conducted whole exome sequencing (WES) on a large consanguineous Iranian family with hereditary ataxia and oculomotor apraxia. WES in five affected and six unaffected individuals resulted in the identification of a novel homozygous stop-gain mutation in the APTX gene (c. 739T>A; p.Lys247Ter) that segregates with the phenotype. Mutations in the APTX gene are associated with ataxia with oculomotor apraxia type 1 (AOA1).

    View details for PubMedID 28652255

  • Genome-Wide Temporal Profiling of Transcriptome and Open Chromatin of Early Cardiomyocyte Differentiation Derived From hiPSCs and hESCs. Circulation research Liu, Q., Jiang, C., Xu, J., Zhao, M. T., Van Bortle, K., Cheng, X., Wang, G., Chang, H. Y., Wu, J. C., Snyder, M. P. 2017; 121 (4): 376–91

    Abstract

    Recent advances have improved our ability to generate cardiomyocytes from human induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs). However, our understanding of the transcriptional regulatory networks underlying early stages (ie, from mesoderm to cardiac mesoderm) of cardiomyocyte differentiation remains limited. To characterize transcriptome and chromatin accessibility during early cardiomyocyte differentiation from hiPSCs and hESCs. We profiled the temporal changes in transcriptome and chromatin accessibility at genome-wide levels during cardiomyocyte differentiation derived from 2 hiPSC lines and 2 hESC lines at 4 stages: pluripotent stem cells, mesoderm, cardiac mesoderm, and differentiated cardiomyocytes. Overall, RNA sequencing analysis revealed that transcriptomes during early cardiomyocyte differentiation were highly concordant between hiPSCs and hESCs, and clustering of 4 cell lines within each time point demonstrated that changes in genome-wide chromatin accessibility were similar across hiPSC and hESC cell lines. Weighted gene co-expression network analysis (WGCNA) identified several modules that were strongly correlated with different stages of cardiomyocyte differentiation. Several novel genes were identified with high weighted connectivity within modules and exhibited coexpression patterns with other genes, including noncoding RNA LINC01124 and uncharacterized RNA AK127400 in the module related to the mesoderm stage; E-box-binding homeobox 1 (ZEB1) in the module correlated with postcardiac mesoderm. We further demonstrated that ZEB1 is required for early cardiomyocyte differentiation. In addition, based on integrative analysis of both WGCNA and transcription factor motif enrichment analysis, we determined numerous transcription factors likely to play important roles at different stages during cardiomyocyte differentiation, such as T and eomesodermin (EOMES; mesoderm), lymphoid enhancer-binding factor 1 (LEF1) and mesoderm posterior BHLH transcription factor 1 (MESP1; from mesoderm to cardiac mesoderm), meis homeobox 1 (MEIS1) and GATA-binding protein 4 (GATA4) (postcardiac mesoderm), JUN and FOS families, and MEIS2 (cardiomyocyte). Both hiPSCs and hESCs share similar transcriptional regulatory mechanisms underlying early cardiac differentiation, and our results have revealed transcriptional regulatory networks and new factors (eg, ZEB1) controlling early stages of cardiomyocyte differentiation.

    View details for PubMedID 28663367

  • WISP3 mutation associated with Pseudorheumatoid Dysplasia. Cold Spring Harbor molecular case studies Sailani, M. R., Chappell, J., Inlora, J., Lynch, L., Narasimha, A., Mazroui, S., Zia, A., Bernstein, J., Aryani, O., Snyder, M. P. 2017

    Abstract

    Progressive pseudorheumatoid dysplasia (PPD) is a skeletal dysplasia characterized by predominant involvement of articular cartilage with progressive joint stiffness. Here we report genetic characterization of a consanguineous family segregating an uncharacterized form of skeletal dysplasia. Whole exome sequencing of four affected siblings and their parents identified a homozygous loss-of-function mutation in the WISP3 gene, leading to diagnosis of PPD in the affected individuals. The identified variant (chr6: 112382301; WISP3:c.156C>A p.Cys52*) is rare and predicted to cause premature termination of the WISP3 protein.

    View details for PubMedID 29092958

  • Transcriptomic and epigenomic differences in human induced pluripotent stem cells generated from six reprogramming methods. Nature biomedical engineering Churko, J. M., Lee, J., Ameen, M., Gu, M., Venkatasubramanian, M., Diecke, S., Sallam, K., Im, H., Wang, G., Gold, J. D., Salomonis, N., Snyder, M. P., Wu, J. C. 2017; 1 (10): 826–37

    Abstract

    Many reprogramming methods can generate human induced pluripotent stem cells (hiPSCs) that closely resemble human embryonic stem cells (hESCs). This has led to assessments of how similar hiPSCs are to hESCs, by evaluating differences in gene expression, epigenetic marks and differentiation potential. However, all previous studies were performed using hiPSCs acquired from different laboratories, passage numbers, culturing conditions, genetic backgrounds and reprogramming methods, all of which may contribute to the reported differences. Here, by using high-throughput sequencing under standardized cell culturing conditions and passage number, we compare the epigenetic signatures (H3K4me3, H3K27me3 and HDAC2 ChIP-seq profiles) and transcriptome differences (by RNA-seq) of hiPSCs generated from the same primary fibroblast population by using six different reprogramming methods. We found that the reprogramming method impacts the resulting transcriptome and that all hiPSC lines could terminally differentiate, regardless of the reprogramming method. Moreover, by comparing the differences between the hiPSC and hESC lines, we observed a significant proportion of differentially expressed genes that could be attributed to polycomb repressive complex targets.

    View details for PubMedID 30263871

  • Cloud-based interactive analytics for terabytes of genomic variants data. Bioinformatics (Oxford, England) Pan, C., McInnes, G., Deflaux, N., Snyder, M., Bingham, J., Datta, S., Tsao, P. S. 2017; 33 (23): 3709–15

    Abstract

    Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. Contact: cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for PubMedID 28961771

  • Single cell transcriptomics reveals unanticipated features of early hematopoietic precursors. Nucleic acids research Yang, J., Tanaka, Y., Seay, M., Li, Z., Jin, J., Garmire, L. X., Zhu, X., Taylor, A., Li, W., Euskirchen, G., Halene, S., Kluger, Y., Snyder, M. P., Park, I. H., Pan, X., Weissman, S. M. 2017; 45 (3): 1281–96

    Abstract

    Molecular changes underlying stem cell differentiation are of fundamental interest. scRNA-seq on murine hematopoietic stem cells (HSC) and their progeny MPP1 separated the cells into 3 main clusters with distinct features: active, quiescent, and an un-characterized cluster. Induction of anemia resulted in mobilization of the quiescent to the active cluster and of the early to later stage of cell cycle, with marked increase in expression of certain transcription factors (TFs) while maintaining expression of interferon response genes. Cells with surface markers of long term HSC increased the expression of a group of TFs expressed highly in normal cycling MPP1 cells. However, at least Id1 and Hes1 were significantly activated in both HSC and MPP1 cells in anemic mice. Lineage-specific genes were differently expressed between cells, and correlated with the cell cycle stages with a specific augmentation of erythroid related genes in the G2/M phase. Most lineage specific TFs were stochastically expressed in the early precursor cells, but a few, such as Klf1, were detected only at very low levels in few precursor cells. The activation of these factors may correlate with stages of differentiation. This study reveals effects of cell cycle progression on the expression of lineage specific genes in precursor cells, and suggests that hematopoietic stress changes the balance of renewal and differentiation in these homeostatic cells.

    View details for PubMedID 28003475

  • Multi-Platform Sequencing Approach Reveals a Novel Transcriptome Profile in Pseudorabies Virus. Frontiers in microbiology Moldován, N., Tombácz, D., Szűcs, A., Csabai, Z., Snyder, M., Boldogkői, Z. 2017; 8: 2708

    Abstract

    Third-generation sequencing is an emerging technology that is capable of solving several problems that earlier approaches were not able to, including the identification of transcript isoforms and overlapping transcripts. In this study, we used long-read sequencing for the analysis of the pseudorabies virus (PRV) transcriptome, including Oxford Nanopore Technologies MinION, PacBio RS-II, and Illumina HiScanSQ platforms. We also used data from our previous short-read and long-read sequencing studies for the comparison of the results and in order to confirm the obtained data. Our investigations identified 19 formerly unknown putative protein-coding genes, all of which are 5' truncated forms of earlier annotated longer PRV genes. Additionally, we detected 19 non-coding RNAs, including 5' and 3' truncated transcripts without in-frame ORFs, antisense RNAs, as well as RNA molecules encoded by those parts of the viral genome where no transcription had been detected before. This study has also led to the identification of three complex transcripts and 50 distinct length isoforms, including transcription start and end variants. We also detected 121 novel transcript overlaps, and two transcripts that overlap the replication origins of PRV. Furthermore, in silico analysis revealed 145 upstream ORFs, many of which are located on the longer 5' isoforms of the transcripts.

    View details for PubMedID 29403453

  • Topological organization and dynamic regulation of human tRNA genes during macrophage differentiation. Genome biology Van Bortle, K., Phanstiel, D. H., Snyder, M. P. 2017; 18 (1): 180

    Abstract

    The human genome is hierarchically organized into local and long-range structures that help shape cell-type-specific transcription patterns. Transfer RNA (tRNA) genes (tDNAs), which are transcribed by RNA polymerase III (RNAPIII) and encode RNA molecules responsible for translation, are dispersed throughout the genome and, in many cases, linearly organized into genomic clusters with other tDNAs. Whether the location and three-dimensional organization of tDNAs contribute to the activity of these genes has remained difficult to address, due in part to unique challenges related to tRNA sequencing. We therefore devised integrated tDNA expression profiling, a method that combines RNAPIII mapping with biotin-capture of nascent tRNAs. We apply this method to the study of dynamic tRNA gene regulation during macrophage development and further integrate these data with high-resolution maps of 3D chromatin structure. Integrated tDNA expression profiling reveals domain-level and loop-based organization of tRNA gene transcription during cellular differentiation. tRNA genes connected by DNA loops, which are proximal to CTCF binding sites and expressed at elevated levels compared to non-loop tDNAs, change coordinately with tDNAs and protein-coding genes at distal ends of interactions mapped by in situ Hi-C. We find that downregulated tRNA genes are specifically marked by enhanced promoter-proximal binding of MAF1, a transcriptional repressor of RNAPIII activity, altogether revealing multiple levels of tDNA regulation during cellular differentiation. We present evidence of both local and coordinated long-range regulation of human tDNA expression, suggesting the location and organization of tRNA genes contribute to dynamic tDNA activity during macrophage development.

    View details for PubMedID 28931413

    View details for PubMedCentralID PMC5607496

  • GATTACA: Lightweight Metagenomic Binning Using Kmer Counting Popic, V., Kuleshov, V., Snyder, M., Batzoglou, S., Sahinalp, S. C. SPRINGER-VERLAG BERLIN. 2017: 391–92

  • Cell Type-Specific Chromatin Signatures Underline Regulatory DNA Elements in Human Induced Pluripotent Stem Cells and Somatic Cells. Circulation research Zhao, M. T., Shao, N. Y., Hu, S., Ma, N., Srinivasan, R., Jahanbani, F., Lee, J., Zhang, S. L., Snyder, M. P., Wu, J. C. 2017; 121 (11): 1237–50

    Abstract

    Regulatory DNA elements in the human genome play important roles in determining the transcriptional abundance and spatiotemporal gene expression during embryonic heart development and somatic cell reprogramming. It is not well known how chromatin marks in regulatory DNA elements are modulated to establish cell type-specific gene expression in the human heart. We aimed to decipher the cell type-specific epigenetic signatures in regulatory DNA elements and how they modulate heart-specific gene expression. We profiled genome-wide transcriptional activity and a variety of epigenetic marks in the regulatory DNA elements using massive RNA-seq (n=12) and ChIP-seq (chromatin immunoprecipitation combined with high-throughput sequencing; n=84) in human endothelial cells (CD31+CD144+), cardiac progenitor cells (Sca-1+), fibroblasts (DDR2+), and their respective induced pluripotent stem cells. We uncovered 2 classes of regulatory DNA elements: class I was identified with ubiquitous enhancer (H3K4me1) and promoter (H3K4me3) marks in all cell types, whereas class II was enriched with H3K4me1 and H3K4me3 in a cell type-specific manner. Both class I and class II regulatory elements exhibited stimulatory roles in nearby gene expression in a given cell type. However, class I promoters displayed more dominant regulatory effects on transcriptional abundance regardless of distal enhancers. Transcription factor network analysis indicated that human induced pluripotent stem cells and somatic cells from the heart selected their preferential regulatory elements to maintain cell type-specific gene expression. In addition, we validated the function of these enhancer elements in transgenic mouse embryos and human cells and identified a few enhancers that could possibly regulate the cardiac-specific gene expression. Given that a large number of genetic variants associated with human diseases are located in regulatory DNA elements, our study provides valuable resources for deciphering the epigenetic modulation of regulatory DNA elements that fine-tune spatiotemporal gene expression in human cardiac development and diseases.

    View details for PubMedID 29030344

    View details for PubMedCentralID PMC5773062

  • Dynamic landscape and regulation of RNA editing in mammals. Nature Tan, M. H., Li, Q., Shanmugam, R., Piskol, R., Kohler, J., Young, A. N., Liu, K. I., Zhang, R., Ramaswami, G., Ariyoshi, K., Gupte, A., Keegan, L. P., George, C. X., Ramu, A., Huang, N., Pollina, E. A., Leeman, D. S., Rustighi, A., Goh, Y. P., Chawla, A., Del Sal, G., Peltz, G., Brunet, A., Conrad, D. F., Samuel, C. E., O'Connell, M. A., Walkley, C. R., Nishikura, K., Li, J. B. 2017; 550 (7675): 249–54

    Abstract

    Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.

    View details for PubMedID 29022589

  • Landscape of X chromosome inactivation across human tissues. Nature Tukiainen, T., Villani, A. C., Yen, A., Rivas, M. A., Marshall, J. L., Satija, R., Aguirre, M., Gauthier, L., Fleharty, M., Kirby, A., Cummings, B. B., Castel, S. E., Karczewski, K. J., Aguet, F., Byrnes, A., Lappalainen, T., Regev, A., Ardlie, K. G., Hacohen, N., MacArthur, D. G. 2017; 550 (7675): 244–48

    Abstract

    X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.

    View details for PubMedID 29022598

  • Molecular and functional resemblance of differentiated cells derived from isogenic human iPSCs and SCNT-derived ESCs. Proceedings of the National Academy of Sciences of the United States of America Zhao, M. T., Chen, H., Liu, Q., Shao, N. Y., Sayed, N., Wo, H. T., Zhang, J. Z., Ong, S. G., Liu, C., Kim, Y., Yang, H., Chour, T., Ma, H., Gutierrez, N. M., Karakikes, I., Mitalipov, S., Snyder, M. P., Wu, J. C. 2017

    Abstract

    Patient-specific pluripotent stem cells (PSCs) can be generated via nuclear reprogramming by transcription factors (i.e., induced pluripotent stem cells, iPSCs) or by somatic cell nuclear transfer (SCNT). However, abnormalities and preclinical application of differentiated cells generated by different reprogramming mechanisms have yet to be evaluated. Here we investigated the molecular and functional features, and drug response of cardiomyocytes (PSC-CMs) and endothelial cells (PSC-ECs) derived from genetically relevant sets of human iPSCs, SCNT-derived embryonic stem cells (nt-ESCs), as well as in vitro fertilization embryo-derived ESCs (IVF-ESCs). We found that differentiated cells derived from isogenic iPSCs and nt-ESCs showed comparable lineage gene expression, cellular heterogeneity, physiological properties, and metabolic functions. Genome-wide transcriptome and DNA methylome analysis indicated that iPSC derivatives (iPSC-CMs and iPSC-ECs) were more similar to isogenic nt-ESC counterparts than those derived from IVF-ESCs. Although iPSCs and nt-ESCs shared the same nuclear DNA and yet carried different sources of mitochondrial DNA, CMs derived from iPSC and nt-ESCs could both recapitulate doxorubicin-induced cardiotoxicity and exhibited insignificant differences on reactive oxygen species generation in response to stress condition. We conclude that molecular and functional characteristics of differentiated cells from human PSCs are primarily attributed to the genetic compositions rather than the reprogramming mechanisms (SCNT vs. iPSCs). Therefore, human iPSCs can replace nt-ESCs as alternatives for generating patient-specific differentiated cells for disease modeling and preclinical drug testing.

    View details for PubMedID 29203658

  • Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nature genetics 2017; 49 (12): 1664–70

    Abstract

    Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.

    View details for DOI 10.1038/ng.3969

    View details for PubMedID 29019975

  • The impact of rare variation on gene expression across tissues. Nature Li, X., Kim, Y., Tsang, E. K., Davis, J. R., Damani, F. N., Chiang, C., Hess, G. T., Zappala, Z., Strober, B. J., Scott, A. J., Li, A., Ganna, A., Bassik, M. C., Merker, J. D., Hall, I. M., Battle, A., Montgomery, S. B. 2017; 550 (7675): 239–43

    Abstract

    Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.

    View details for PubMedID 29022581

  • Genetic effects on gene expression across human tissues. Nature Battle, A., Brown, C. D., Engelhardt, B. E., Montgomery, S. B. 2017; 550 (7675): 204–13

    Abstract

    Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

    View details for PubMedID 29022597

  • Lineage-specific dynamic and pre-established enhancer-promoter contacts cooperate in terminal differentiation. Nature genetics Rubin, A. J., Barajas, B. C., Furlan-Magaril, M., Lopez-Pajares, V., Mumbach, M. R., Howard, I., Kim, D. S., Boxer, L. D., Cairns, J., Spivakov, M., Wingett, S. W., Shi, M., Zhao, Z., Greenleaf, W. J., Kundaje, A., Snyder, M., Chang, H. Y., Fraser, P., Khavari, P. A. 2017; 49 (10): 1522–28

    Abstract

    Chromosome conformation is an important feature of metazoan gene regulation; however, enhancer-promoter contact remodeling during cellular differentiation remains poorly understood. To address this, genome-wide promoter capture Hi-C (CHi-C) was performed during epidermal differentiation. Two classes of enhancer-promoter contacts associated with differentiation-induced genes were identified. The first class ('gained') increased in contact strength during differentiation in concert with enhancer acquisition of the H3K27ac activation mark. The second class ('stable') were pre-established in undifferentiated cells, with enhancers constitutively marked by H3K27ac. The stable class was associated with the canonical conformation regulator cohesin, whereas the gained class was not, implying distinct mechanisms of contact formation and regulation. Analysis of stable enhancers identified a new, essential role for a constitutively expressed, lineage-restricted ETS-family transcription factor, EHF, in epidermal differentiation. Furthermore, neither class of contacts was observed in pluripotent cells, suggesting that lineage-specific chromatin structure is established in tissue progenitor cells and is further remodeled in terminal differentiation.

    View details for PubMedID 28805829

  • Cloud-based Interactive Analytics for Terabytes of Genomic Variants Data Bioinformatics Pan, C., McInnes, G., Deflaux, N., Snyder, M. P., Bingham, J., Datta, S., Tsao, P. S. 2017: 3709–15

    Abstract

    Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. Contact: cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btx468

    View details for PubMedCentralID PMC5860318

  • Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis CELL Ang, Y., Rivas, R. N., Ribeiro, A. J., Srivas, R., Rivera, J., Stone, N. R., Pratt, K., Mohamed, T. M., Fu, J., Spencer, C. I., Tippens, N. D., Li, M., Narasimha, A., Radzinsky, E., Moon-Grady, A. J., Yu, H., Pruitt, B. L., Snyder, M. P., Srivastava, D. 2016; 167 (7): 1734-?

    Abstract

    Mutation of highly conserved residues in transcription factors may affect protein-protein or protein-DNA interactions, leading to gene network dysregulation and human disease. Human mutations in GATA4, a cardiogenic transcription factor, cause cardiac septal defects and cardiomyopathy. Here, iPS-derived cardiomyocytes from subjects with a heterozygous GATA4-G296S missense mutation showed impaired contractility, calcium handling, and metabolic activity. In human cardiomyocytes, GATA4 broadly co-occupied cardiac enhancers with TBX5, another transcription factor that causes septal defects when mutated. The GATA4-G296S mutation disrupted TBX5 recruitment, particularly to cardiac super-enhancers, concomitant with dysregulation of genes related to the phenotypic abnormalities, including cardiac septation. Conversely, the GATA4-G296S mutation led to failure of GATA4 and TBX5-mediated repression at non-cardiac genes and enhanced open chromatin states at endothelial/endocardial promoters. These results reveal how disease-causing missense mutations can disrupt transcriptional cooperativity, leading to aberrant chromatin states and cellular dysfunction, including those related to morphogenetic defects.

    View details for DOI 10.1016/j.cell.2016.11.033

    View details for Web of Science ID 000393114700013

    View details for PubMedID 27984724

    View details for PubMedCentralID PMC5180611

  • Can heavy isotopes increase lifespan? Studies of relative abundance in various organisms reveal chemical perspectives on aging. BioEssays Li, X., Snyder, M. P. 2016; 38 (11): 1093-1101

    Abstract

    Stable heavy isotopes co-exist with their lighter counterparts in all elements commonly found in biology. These heavy isotopes represent a low natural abundance in isotopic composition but impose great retardation effects in chemical reactions because of kinetic isotopic effects (KIEs). Previous isotope analyses have recorded pervasive enrichment or depletion of heavy isotopes in various organisms, strongly supporting the capability of biological systems to distinguish different isotopes. This capability has recently been found to lead to general decline of heavy isotopes in metabolites during yeast aging. Conversely, supplementing heavy isotopes in growth medium promotes longevity. Whether this observation prevails in other organisms is not known, but it potentially bears promise in promoting human longevity.

    View details for DOI 10.1002/bies.201600040

    View details for PubMedID 27554342

    View details for PubMedCentralID PMC5108472

  • iPSC Model of Pulmonary Arterial Hypertension Reveals Novel Gene Expression and Patient Specificity. American journal of respiratory and critical care medicine Sa, S., Gu, M., Chappell, J., Shao, N., Ameen, M., Elliott, K. A., Li, D., Grubert, F., Li, C. G., Taylor, S., Cao, A., Ma, Y., Fong, R., Nguyen, L., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2016: -?

    Abstract

    Idiopathic or heritable pulmonary arterial hypertension is characterized by loss and obliteration of lung vasculature. Endothelial cell dysfunction is pivotal to the pathophysiology but different causal mechanisms may reflect a need for patient-tailored therapies. Endothelial cells differentiated from induced pluripotent stem cells were compared to pulmonary arterial endothelial cells from the same patients with idiopathic or heritable pulmonary arterial hypertension, to determine whether they shared functional abnormalities and altered gene expression patterns, that differed from those in unused donor cells. We then investigated whether endothelial cells differentiated from pluripotent cells could serve as surrogates to test emerging therapies. Functional changes assessed included adhesion, migration, tube formation, and propensity to apoptosis. Expression of BMPR2 and its target, collagen IV, pSMAD1/5 signaling and transcriptomic profiles were also analyzed. Native pulmonary arterial and induced pluripotent stem cell-derived endothelial cells from idiopathic and heritable pulmonary arterial hypertension patients compared to controls, showed a similar reduction in adhesion, migration, survival, and tube formation, decreased BMPR2 and downstream signaling and collagen IV expression. Transcriptomic profiling revealed high KISS1 related to reduced migration and low CES1, to impaired survival in patient cells. A beneficial angiogenic response to potential therapies, FK-506 and Elafin, was related to reduced SLIT3, an anti-migratory factor. Despite the site of disease in the lung our study indicates that induced pluripotent stem cell derived endothelial cells are useful surrogates to uncover novel features related to disease mechanisms and to better match patients to therapies.

    View details for PubMedID 27779452

  • Nat1 Deficiency Is Associated with Mitochondrial Dysfunction and Exercise Intolerance in Mice CELL REPORTS Chennamsetty, I., Coronado, M., Contrepois, K., Keller, M. P., Carcamo-Orive, I., Sandin, J., Fajardo, G., Whittle, A. J., Fathzadeh, M., Snyder, M., Reaven, G., Attie, A. D., Bernstein, D., Quertermous, T., Knowles, J. W. 2016; 17 (2): 527-540

    Abstract

    We recently identified human N-acetyltransferase 2 (NAT2) as an insulin resistance (IR) gene. Here, we examine the cellular mechanism linking NAT2 to IR and find that Nat1 (mouse ortholog of NAT2) is co-regulated with key mitochondrial genes. RNAi-mediated silencing of Nat1 led to mitochondrial dysfunction characterized by increased intracellular reactive oxygen species and mitochondrial fragmentation as well as decreased mitochondrial membrane potential, biogenesis, mass, cellular respiration, and ATP generation. These effects were consistent in 3T3-L1 adipocytes, C2C12 myoblasts, and in tissues from Nat1-deficient mice, including white adipose tissue, heart, and skeletal muscle. Nat1-deficient mice had changes in plasma metabolites and lipids consistent with a decreased ability to utilize fats for energy and a decrease in basal metabolic rate and exercise capacity without altered thermogenesis. Collectively, our results suggest that Nat1 deficiency results in mitochondrial dysfunction, which may constitute a mechanistic link between this gene and IR.

    View details for DOI 10.1016/j.celrep.2016.09.005

    View details for Web of Science ID 000385850700019

    View details for PubMedID 27705799

    View details for PubMedCentralID PMC5097870

  • Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature genetics Corces, M. R., Buenrostro, J. D., Wu, B., Greenside, P. G., Chan, S. M., Koenig, J. L., Snyder, M. P., Pritchard, J. K., Kundaje, A., Greenleaf, W. J., Majeti, R., Chang, H. Y. 2016; 48 (10): 1193-1203

    Abstract

    We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.

    View details for DOI 10.1038/ng.3646

    View details for PubMedID 27526324

  • A proposal for validation of antibodies NATURE METHODS Uhlen, M., Bandrowski, A., Carr, S., Edwards, A., Ellenberg, J., Lundberg, E., Rimm, D. L., Rodriguez, H., Hiltke, T., Snyder, M., Yamamoto, T. 2016; 13 (10): 823-?

    View details for DOI 10.1038/NMETH.3995

    View details for Web of Science ID 000385194600015

    View details for PubMedID 27595404

  • Multiple Pairwise Analysis of Non-homologous Centromere Coupling Reveals Preferential Chromosome Size-Dependent Interactions and a Role for Bouquet Formation in Establishing the Interaction Pattern PLOS GENETICS Lefrancois, P., Rockmill, B., Xie, P., Roeder, G. S., Snyder, M. 2016; 12 (10)

    Abstract

    During meiosis, chromosomes undergo a homology search in order to locate their homolog to form stable pairs and exchange genetic material. Early in prophase, chromosomes associate in mostly non-homologous pairs, tethered only at their centromeres. This phenomenon, conserved through higher eukaryotes, is termed centromere coupling in budding yeast. Both initiation of recombination and the presence of homologs are dispensable for centromere coupling (occurring in spo11 mutants and haploids induced to undergo meiosis) but the presence of the synaptonemal complex (SC) protein Zip1 is required. The nature and mechanism of coupling have yet to be elucidated. Here we present the first pairwise analysis of centromere coupling in an effort to uncover underlying rules that may exist within these non-homologous interactions. We designed a novel chromosome conformation capture (3C)-based assay to detect all possible interactions between non-homologous yeast centromeres during early meiosis. Using this variant of 3C-qPCR, we found a size-dependent interaction pattern, in which chromosomes assort preferentially with chromosomes of similar sizes, in haploid and diploid spo11 cells, but not in a coupling-defective mutant (spo11 zip1 haploid and diploid yeast). This pattern is also observed in wild-type diploids early in meiosis but disappears as meiosis progresses and homologous chromosomes pair. We found no evidence to support the notion that ancestral centromere homology plays a role in pattern establishment in S. cerevisiae post-genome duplication. Moreover, we found a role for the meiotic bouquet in establishing the size dependence of centromere coupling, as abolishing bouquet (using the bouquet-defective spo11 ndj1 mutant) reduces it. Coupling in spo11 ndj1 rather follows telomere clustering preferences. We propose that a chromosome size preference for centromere coupling helps establish efficient homolog recognition.

    View details for DOI 10.1371/journal.pgen.1006347

    View details for Web of Science ID 000386683300016

    View details for PubMedID 27768699

    View details for PubMedCentralID PMC5074576

  • iPSC-derived cardiomyocytes reveal abnormal TGF-β signalling in left ventricular non-compaction cardiomyopathy. Nature cell biology Kodo, K., Ong, S., Jahanbani, F., Termglinchan, V., Hirono, K., Inanloorahatloo, K., Ebert, A. D., Shukla, P., Abilez, O. J., Churko, J. M., Karakikes, I., Jung, G., Ichida, F., Wu, S. M., Snyder, M. P., Bernstein, D., Wu, J. C. 2016; 18 (10): 1031-1042

    Abstract

    Left ventricular non-compaction (LVNC) is the third most prevalent cardiomyopathy in children and its pathogenesis has been associated with the developmental defect of the embryonic myocardium. We show that patient-specific induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) generated from LVNC patients carrying a mutation in the cardiac transcription factor TBX20 recapitulate a key aspect of the pathological phenotype at the single-cell level and this was associated with perturbed transforming growth factor beta (TGF-β) signalling. LVNC iPSC-CMs have decreased proliferative capacity due to abnormal activation of TGF-β signalling. TBX20 regulates the expression of TGF-β signalling modifiers including one known to be a genetic cause of LVNC, PRDM16, and genome editing of PRDM16 caused proliferation defects in iPSC-CMs. Inhibition of TGF-β signalling and genome correction of the TBX20 mutation were sufficient to reverse the disease phenotype. Our study demonstrates that iPSC-CMs are a useful tool for the exploration of pathological mechanisms underlying poorly understood cardiomyopathies including LVNC.

    View details for DOI 10.1038/ncb3411

    View details for PubMedID 27642787

  • Full-Length Isoform Sequencing Reveals Novel Transcripts and Substantial Transcriptional Overlaps in a Herpesvirus PLOS ONE Tombacz, D., Csabai, Z., Olah, P., Balazs, Z., Liko, I., Zsigmond, L., Sharon, D., Snyder, M., Boldogkoi, Z. 2016; 11 (9)

    Abstract

    Whole transcriptome studies have become essential for understanding the complexity of genetic regulation. However, the conventionally applied short-read sequencing platforms cannot be used to reliably distinguish between many transcript isoforms. The Pacific Biosciences (PacBio) RS II platform is capable of reading long nucleic acid stretches in a single sequencing run. The pseudorabies virus (PRV) is an excellent system to study herpesvirus gene expression and potential interactions between the transcriptional units. In this work, non-amplified and amplified isoform sequencing protocols were used to characterize the poly(A+) fraction of the lytic transcriptome of PRV, with the aim of a complete transcriptional annotation of the viral genes. The analyses revealed a previously unrecognized complexity of the PRV transcriptome including the discovery of novel protein-coding and non-coding genes, novel mono- and polycistronic transcription units, as well as extensive transcriptional overlaps between neighboring and distal genes. This study identified non-coding transcripts overlapping all three replication origins of the PRV, which might play a role in the control of DNA synthesis. We additionally established the relative expression levels of gene products. Our investigations revealed that the whole PRV genome is utilized for transcription, including both DNA strands in all coding and intergenic regions. The genome-wide occurrence of transcript overlaps suggests a crosstalk between genes through a network formed by interacting transcriptional machineries with a potential function in the control of gene expression.

    View details for DOI 10.1371/journal.pone.0162868

    View details for Web of Science ID 000384328500015

    View details for PubMedID 27685795

    View details for PubMedCentralID PMC5042381

  • Transcriptome Profiling of Patient-Specific Human iPSC-Cardiomyocytes Predicts Individual Drug Safety and Efficacy Responses In Vitro. Cell stem cell Matsa, E., Burridge, P. W., Yu, K., Ahrens, J. H., Termglinchan, V., Wu, H., Liu, C., Shukla, P., Sayed, N., Churko, J. M., Shao, N., Woo, N. A., Chao, A. S., Gold, J. D., Karakikes, I., Snyder, M. P., Wu, J. C. 2016; 19 (3): 311-325

    Abstract

    Understanding individual susceptibility to drug-induced cardiotoxicity is key to improving patient safety and preventing drug attrition. Human induced pluripotent stem cells (hiPSCs) enable the study of pharmacological and toxicological responses in patient-specific cardiomyocytes (CMs) and may serve as preclinical platforms for precision medicine. Transcriptome profiling in hiPSC-CMs from seven individuals lacking known cardiovascular disease-associated mutations and in three isogenic human heart tissue and hiPSC-CM pairs showed greater inter-patient variation than intra-patient variation, verifying that reprogramming and differentiation preserve patient-specific gene expression, particularly in metabolic and stress-response genes. Transcriptome-based toxicology analysis predicted and risk-stratified patient-specific susceptibility to cardiotoxicity, and functional assays in hiPSC-CMs using tacrolimus and rosiglitazone, drugs targeting pathways predicted to produce cardiotoxicity, validated inter-patient differential responses. CRISPR/Cas9-mediated pathway correction prevented drug-induced cardiotoxicity. Our data suggest that hiPSC-CMs can be used in vitro to predict and validate patient-specific drug safety and efficacy, potentially enabling future clinical approaches to precision medicine.

    View details for DOI 10.1016/j.stem.2016.07.006

    View details for PubMedID 27545504

  • Predicting Ovarian Cancer Patients' Clinical Response to Platinum-Based Chemotherapy by Their Tumor Proteomic Signatures JOURNAL OF PROTEOME RESEARCH Yu, K., Levine, D. A., Zhang, H., Chan, D. W., Zhang, Z., Snyder, M. 2016; 15 (8): 2455-2465

    Abstract

    Ovarian cancer is the deadliest gynecologic malignancy in the United States with most patients diagnosed in the advanced stage of the disease. Platinum-based antineoplastic therapeutics is indispensable to treating advanced ovarian serous carcinoma. However, patients have heterogeneous responses to platinum drugs, and it is difficult to predict these interindividual differences before administering medication. In this study, we investigated the tumor proteomic profiles and clinical characteristics of 130 ovarian serous carcinoma patients analyzed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), predicted the platinum drug response using supervised machine learning methods, and evaluated our prediction models through leave-one-out cross-validation. Our data-driven feature selection approach indicated that tumor proteomics profiles contain information for predicting binarized platinum response (P < 0.0001). We further built a least absolute shrinkage and selection operator (LASSO)-Cox proportional hazards model that stratified patients into early relapse and late relapse groups (P = 0.00013). The top proteomic features indicative of platinum response were involved in ATP synthesis pathways and Ran GTPase binding. Overall, we demonstrated that proteomic profiles of ovarian serous carcinoma patients predicted platinum drug responses as well as provided insights into the biological processes influencing the efficacy of platinum-based therapeutics. Our analytical approach is also extensible to predicting response to other antineoplastic agents or treatment modalities for both ovarian and other cancers.

    View details for DOI 10.1021/acs.jproteome.5b01129

    View details for Web of Science ID 000381235900010

    View details for PubMedID 27312948

  • EPHB4 kinase-inactivating mutations cause autosomal dominant lymphatic-related hydrops fetalis. Journal of clinical investigation Martin-Almedina, S., Martinez-Corral, I., Holdhus, R., Vicente, A., Fotiou, E., Lin, S., Petersen, K., Simpson, M. A., Hoischen, A., Gilissen, C., Jeffery, H., Atton, G., Karapouliou, C., Brice, G., Gordon, K., Wiseman, J. W., Wedin, M., Rockson, S. G., Jeffery, S., Mortimer, P. S., Snyder, M. P., Berland, S., Mansour, S., Makinen, T., Ostergaard, P. 2016; 126 (8): 3080-3088

    Abstract

    Hydrops fetalis describes fluid accumulation in at least 2 fetal compartments, including abdominal cavities, pleura, and pericardium, or in body tissue. The majority of hydrops fetalis cases are nonimmune conditions that present with generalized edema of the fetus, and approximately 15% of these nonimmune cases result from a lymphatic abnormality. Here, we have identified an autosomal dominant, inherited form of lymphatic-related (nonimmune) hydrops fetalis (LRHF). Independent exome sequencing projects on 2 families with a history of in utero and neonatal deaths associated with nonimmune hydrops fetalis uncovered 2 heterozygous missense variants in the gene encoding Eph receptor B4 (EPHB4). Biochemical analysis determined that the mutant EPHB4 proteins are devoid of tyrosine kinase activity, indicating that loss of EPHB4 signaling contributes to LRHF pathogenesis. Further, inactivation of Ephb4 in lymphatic endothelial cells of developing mouse embryos led to defective lymphovenous valve formation and consequent subcutaneous edema. Together, these findings identify EPHB4 as a critical regulator of early lymphatic vascular development and demonstrate that mutations in the gene can cause an autosomal dominant form of LRHF that is associated with a high mortality rate.

    View details for DOI 10.1172/JCI85794

    View details for PubMedID 27400125

    View details for PubMedCentralID PMC4966301

  • Omics Profiling in Precision Oncology. Molecular & cellular proteomics Yu, K., Snyder, M. 2016; 15 (8): 2525-2536

    Abstract

    Cancer causes significant morbidity and mortality worldwide, and is the area most targeted in precision medicine. Recent development of high-throughput methods enables detailed omics analysis of the molecular mechanisms underpinning tumor biology. These studies have identified clinically actionable mutations, gene and protein expression patterns associated with prognosis, and provided further insights into the molecular mechanisms indicative of cancer biology and new therapeutics strategies such as immunotherapy. In this review, we summarize the techniques used for tumor omics analysis, recapitulate the key findings in cancer omics studies, and point to areas requiring further research on precision oncology.

    View details for DOI 10.1074/mcp.O116.059253

    View details for PubMedID 27099341

  • Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer. Cell Zhang, H., Liu, T., Zhang, Z., Payne, S. H., Zhang, B., McDermott, J. E., Zhou, J., Petyuk, V. A., Chen, L., Ray, D., Sun, S., Yang, F., Chen, L., Wang, J., Shah, P., Cha, S. W., Aiyetan, P., Woo, S., Tian, Y., Gritsenko, M. A., Clauss, T. R., Choi, C., Monroe, M. E., Thomas, S., Nie, S., Wu, C., Moore, R. J., Yu, K., Tabb, D. L., Fenyö, D., Bafna, V., Wang, Y., Rodriguez, H., Boja, E. S., Hiltke, T., Rivers, R. C., Sokoll, L., Zhu, H., Shih, I., Cope, L., Pandey, A., Zhang, B., Snyder, M. P., Levine, D. A., Smith, R. D., Chan, D. W., Rodland, K. D. 2016; 166 (3): 755-765

    Abstract

    To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

    View details for DOI 10.1016/j.cell.2016.05.069

    View details for PubMedID 27372738

  • Integrated Network Analysis Reveals an Association between Plasma Mannose Levels and Insulin Resistance CELL METABOLISM Lee, S., Zhang, C., Kilicarslan, M., Piening, B. D., Bjornson, E., Hallstrom, B. M., Groen, A. K., Ferrannini, E., Laakso, M., Snyder, M., Bluher, M., Uhlen, M., Nielsen, J., Smith, U., Serlie, M. J., Boren, J., Mardinoglu, A. 2016; 24 (1): 172-184

    Abstract

    To investigate the biological processes that are altered in obese subjects, we generated cell-specific integrated networks (INs) by merging genome-scale metabolic, transcriptional regulatory and protein-protein interaction networks. We performed genome-wide transcriptomics analysis to determine the global gene expression changes in the liver and three adipose tissues from obese subjects undergoing bariatric surgery and integrated these data into the cell-specific INs. We found dysregulations in mannose metabolism in obese subjects and validated our predictions by detecting mannose levels in the plasma of the lean and obese subjects. We observed significant correlations between plasma mannose levels, BMI, and insulin resistance (IR). We also measured plasma mannose levels of the subjects in two additional different cohorts and observed that an increased plasma mannose level was associated with IR and insulin secretion. We finally identified mannose as one of the best plasma metabolites in explaining the variance in obesity-independent IR.

    View details for DOI 10.1016/j.cmet.2016.05.026

    View details for PubMedID 27345421

  • Using Mass Spectrometry to Quantify Rituximab and Perform Individualized Immunoglobulin Phenotyping in ANCA-Associated Vasculitis ANALYTICAL CHEMISTRY Mills, J. R., Cornec, D., Dasari, S., Ladwig, P. M., Hummel, A. M., Cheu, M., Murray, D. L., Willrich, M. A., Snyder, M. R., Hoffman, G. S., Kallenberg, C. G., Langford, C. A., Merkel, P. A., Monach, P. A., Seo, P., Spiera, R. F., St Cair, E. W., Stone, J. H., Specks, U., Barnidge, D. R. 2016; 88 (12): 6317-6325

    Abstract

    Therapeutic monoclonal immunoglobulins (mAbs) are used to treat patients with a wide range of disorders including autoimmune diseases. As pharmaceutical companies bring more fully humanized therapeutic mAb drugs to the healthcare market, analytical platforms that perform therapeutic drug monitoring (TDM) without relying on mAb specific reagents will be needed. In this study we demonstrate that liquid-chromatography-mass spectrometry (LC-MS) can be used to perform TDM of mAbs in the same manner as smaller nonbiologic drugs. The assay uses commercially available reagents combined with heavy and light chain disulfide bond reduction followed by light chain analysis by microflow-LC-electrospray ionization-quadrupole-time-of-flight mass spectrometry (ESI-Q-TOF MS). Quantification is performed using the peak areas from multiply charged mAb light chain ions using an in-house software package developed for TDM of mAbs. The data presented here demonstrate the ability of an LC-MS assay to quantify a therapeutic mAb in a large cohort of patients in a clinical trial. The ability to quantify any mAb in serum via the reduced light chain without the need for reagents specific for each mAb demonstrates the unique capabilities of LC-MS. This fact, coupled with the ability to phenotype a patient's polyclonal repertoire in the same analysis, further shows the potential of this approach to mAb analysis.

    View details for DOI 10.1021/acs.analchem.6b00544

    View details for Web of Science ID 000378470200034

    View details for PubMedID 27228216

  • Genome assembly from synthetic long read clouds. Bioinformatics Kuleshov, V., Snyder, M. P., Batzoglou, S. 2016; 32 (12): i216-i224

    Abstract

    Despite rapid progress in sequencing technology, assembling de novo the genomes of new species as well as reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads. Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR's underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads. Our source code is freely available at https://github.com/kuleshov/architect. Contact: kuleshov@stanford.edu.

    View details for DOI 10.1093/bioinformatics/btw267

    View details for Web of Science ID 000379734300025

    View details for PubMedID 27307620

    View details for PubMedCentralID PMC4908351

  • Effects of cellular origin on differentiation of human induced pluripotent stem cell-derived endothelial cells. JCI insight Hu, S., Zhao, M., Jahanbani, F., Shao, N., Lee, W. H., Chen, H., Snyder, M. P., Wu, J. C. 2016; 1 (8)

    Abstract

    Human induced pluripotent stem cells (iPSCs) can be derived from various types of somatic cells by transient overexpression of 4 Yamanaka factors (OCT4, SOX2, C-MYC, and KLF4). Patient-specific iPSC derivatives (e.g., neuronal, cardiac, hepatic, muscular, and endothelial cells [ECs]) hold great promise in drug discovery and regenerative medicine. In this study, we aimed to evaluate whether the cellular origin can affect the differentiation, in vivo behavior, and single-cell gene expression signatures of human iPSC-derived ECs. We derived human iPSCs from 3 types of somatic cells of the same individuals: fibroblasts (FB-iPSCs), ECs (EC-iPSCs), and cardiac progenitor cells (CPC-iPSCs). We then differentiated them into ECs by sequential administration of Activin, BMP4, bFGF, and VEGF. EC-iPSCs at early passage (10 < P < 20) showed higher EC differentiation propensity and gene expression of EC-specific markers (PECAM1 and NOS3) than FB-iPSCs and CPC-iPSCs. In vivo transplanted EC-iPSC-ECs were recovered with a higher percentage of CD31(+) population and expressed higher EC-specific gene expression markers (PECAM1, KDR, and ICAM) as revealed by microfluidic single-cell quantitative PCR (qPCR). In vitro EC-iPSC-ECs maintained a higher CD31(+) population than FB-iPSC-ECs and CPC-iPSC-ECs with long-term culturing and passaging. These results indicate that cellular origin may influence lineage differentiation propensity of human iPSCs; hence, the somatic memory carried by early passage iPSCs should be carefully considered before clinical translation.

    View details for PubMedID 27398408

  • The genetic predisposition to bronchopulmonary dysplasia CURRENT OPINION IN PEDIATRICS Yu, K., Li, J., Snyder, M., Shaw, G. M., O'Brodovich, H. M. 2016; 28 (3): 318-323

    Abstract

    Bronchopulmonary dysplasia (BPD) is a prevalent chronic lung disease in premature infants. Twin studies have shown strong heritability underlying this disease; however, the genetic architecture of BPD remains unclear. A number of studies employed different approaches to characterize the genetic aberrations associated with BPD, including candidate gene studies, genome-wide association studies, exome sequencing, integrative omics analysis, and pathway analysis. Candidate gene studies identified a number of genes potentially involved with the development of BPD, but the etiological contribution from each gene is not substantial. Copy number variation studies and three independent genome-wide association studies did not identify genetic variations significantly and consistently associated with BPD. A recent exome-sequencing study pointed to rare variants implicated in the disease. In this review, we summarize these studies' methodology and findings, and suggest future research directions to better understand the genetic underpinnings of this potentially life-long lung disease. Genetic factors play a significant role in the development of BPD. Recent studies suggested that rare variants in genes participating in lung development pathways could contribute to BPD susceptibility.

    View details for DOI 10.1097/MOP.0000000000000344

    View details for Web of Science ID 000376387000010

    View details for PubMedID 26963946

    View details for PubMedCentralID PMC4853271

  • Concerted genomic targeting of H3K27 demethylase REF6 and chromatin-remodeling ATPase BRM in Arabidopsis NATURE GENETICS Li, C., Gu, L., Gao, L., Chen, C., Wei, C., Qiu, Q., Chien, C., Wang, S., Jiang, L., Ai, L., Chen, C., Yang, S., Nguyen, V., Qi, Y., Snyder, M. P., Burlingame, A. L., Kohalmi, S. E., Huang, S., Cao, X., Wang, Z., Wu, K., Chen, X., Cui, Y. 2016; 48 (6): 687-?

    Abstract

    SWI/SNF-type chromatin remodelers, such as BRAHMA (BRM), and H3K27 demethylases both have active roles in regulating gene expression at the chromatin level, but how they are recruited to specific genomic sites remains largely unknown. Here we show that RELATIVE OF EARLY FLOWERING 6 (REF6), a plant-unique H3K27 demethylase, targets genomic loci containing a CTCTGYTY motif via its zinc-finger (ZnF) domains and facilitates the recruitment of BRM. Genome-wide analyses showed that REF6 colocalizes with BRM at many genomic sites with the CTCTGYTY motif. Loss of REF6 results in decreased BRM occupancy at BRM-REF6 co-targets. Furthermore, REF6 directly binds to the CTCTGYTY motif in vitro, and deletion of the motif from a target gene renders it inaccessible to REF6 in vivo. Finally, we show that, when its ZnF domains are deleted, REF6 loses its genomic targeting ability. Thus, our work identifies a new genomic targeting mechanism for an H3K27 demethylase and demonstrates its key role in recruiting the BRM chromatin remodeler.

    View details for DOI 10.1038/ng.3555

    View details for PubMedID 27111034

  • Age-Dependent Pancreatic Gene Regulation Reveals Mechanisms Governing Human beta Cell Function CELL METABOLISM Arda, H. E., Li, L., Tsai, J., Torre, E. A., Rosli, Y., Peiris, H., Spitale, R. C., Dai, C., Gu, X., Qu, K., Wang, P., Wang, J., Grompe, M., Scharfmann, R., Snyder, M. S., Bottino, R., Powers, A. C., Chang, H. Y., Kim, S. K. 2016; 23 (5): 909-920

    Abstract

    Intensive efforts are focused on identifying regulators of human pancreatic islet cell growth and maturation to accelerate development of therapies for diabetes. After birth, islet cell growth and function are dynamically regulated; however, establishing these age-dependent changes in humans has been challenging. Here, we describe a multimodal strategy for isolating pancreatic endocrine and exocrine cells from children and adults to identify age-dependent gene expression and chromatin changes on a genomic scale. These profiles revealed distinct proliferative and functional states of islet α cells or β cells and histone modifications underlying age-dependent gene expression changes. Expression of SIX2 and SIX3, transcription factors without prior known functions in the pancreas and linked to fasting hyperglycemia risk, increased with age specifically in human islet β cells. SIX2 and SIX3 were sufficient to enhance insulin content or secretion in immature β cells. Our work provides a unique resource to study human-specific regulators of islet cell maturation and function.

    View details for DOI 10.1016/j.cmet.2016.04.002

    View details for PubMedID 27133132

  • Can Metabolic Profiles Be Used as a Phenotypic Readout of the Genome to Enhance Precision Medicine? CLINICAL CHEMISTRY Contrepois, K., Liang, L., Snyder, M. 2016; 62 (5): 676–78

    View details for PubMedID 26960666

    View details for PubMedCentralID PMC4851585

  • Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection BMC BIOINFORMATICS Zhang, Q., Zeng, X., Younkin, S., Kawli, T., Snyder, M. P., Keles, S. 2016; 17

    Abstract

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.

    View details for DOI 10.1186/s12859-016-0957-1

    View details for Web of Science ID 000370775000001

    View details for PubMedID 26908256

    View details for PubMedCentralID PMC4765064

  • Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations. Nature genetics Araya, C. L., Cenik, C., Reuter, J. A., Kiss, G., Pande, V. S., Snyder, M. P., Greenleaf, W. J. 2016; 48 (2): 117-125

    Abstract

    Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.

    View details for DOI 10.1038/ng.3471

    View details for PubMedID 26691984

  • Protein substrates of the arginine methyltransferase Hmt1 identified by proteome arrays PROTEOMICS Low, J. K., Im, H., Erce, M. A., Hart-Smith, G., Snyder, M. P., Wilkins, M. R. 2016; 16 (3): 465–76

    Abstract

    Arginine methylation on nonhistone proteins is associated with a number of cellular processes including RNA splicing, protein localization, and the formation of protein complexes. In this manuscript, Saccharomyces cerevisiae proteome arrays carrying 4228 proteins were used with an antimethylarginine antibody to first identify 88 putatively arginine-methylated proteins. By treating the arrays with recombinant arginine methyltransferase Hmt1, 42 proteins were found to be possible substrates of this enzyme. Analysis of the putative arginine-methylated proteins revealed that they were predominantly nuclear or nucleolar in localization, consistent with the localization of Hmt1. Many are involved in known methylarginine-associated functions, such as RNA processing and ribonucleoprotein complex biogenesis, yet others are of newer classes, namely RNA/DNA helicases and tRNA-associated proteins. Using ex vivo methylation and MS/MS, a set of 12 proteins (Brr1, Dia4, Hts1, Mpp10, Mrd1, Nug1, Prp43, Rpa43, Rrp43, Spp381, Utp4, and Npl3), including the RNA helicase Prp43 and tRNA ligases Dia4 and Hts1, were all validated as Hmt1 substrates. Interestingly, the majority of these also had human orthologs, or family members, that have been documented elsewhere to carry arginine methylation. These results confirm arginine methylation as a widespread modification and Hmt1 as the major arginine methyltransferase in the S. cerevisiae cell.

    View details for PubMedID 26572822

  • Effects of Formalin Fixation Variables on DNA Integrity for Genomic Applications in Cancer Lefterova, M., Clark, M. J., Alla, R. K., Luo, S., Morra, M., Helman, E., Boyle, S. M., Kirk, S., Sripakdeevong, P., Karbelashvili, M., Church, D. M., Snyder, M. P., West, J., Chen, R. NATURE PUBLISHING GROUP. 2016: 516A–517A
  • Proteome-wide survey of the autoimmune target repertoire in autoimmune polyendocrine syndrome type 1 SCIENTIFIC REPORTS Landegren, N., Sharon, D., Freyhult, E., Hallgren, A., Eriksson, D., Edqvist, P., Bensing, S., Wahlberg, J., Nelson, L. M., Gustafsson, J., Husebye, E. S., Anderson, M. S., Snyder, M., Kampe, O. 2016; 6

    Abstract

    Autoimmune polyendocrine syndrome type 1 (APS1) is a monogenic disorder that features multiple autoimmune disease manifestations. It is caused by mutations in the Autoimmune regulator (AIRE) gene, which promotes thymic display of thousands of peripheral tissue antigens in a process critical for establishing central immune tolerance. We here used proteome arrays to perform a comprehensive study of autoimmune targets in APS1. Interrogation of established autoantigens revealed highly reliable detection of autoantibodies, and by exploring the full panel of more than 9000 proteins we further identified MAGEB2 and PDILT as novel major autoantigens in APS1. Our proteome-wide assessment revealed a marked enrichment for tissue-specific immune targets, mirroring AIRE's selectiveness for this category of genes. Our findings also suggest that only a very limited portion of the proteome becomes targeted by the immune system in APS1, which contrasts with the broad defect of thymic presentation associated with AIRE deficiency and raises novel questions about what other factors are needed for a break of tolerance.

    View details for DOI 10.1038/srep20104

    View details for PubMedID 26830021

  • Distance from sub-Saharan Africa predicts mutational load in diverse human genomes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Henn, B. M., Botigue, L. R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B. K., Martin, A. R., Musharoff, S., Cann, H., Snyder, M. P., Excoffier, L., Kidd, J. M., Bustamante, C. D. 2016; 113 (4): E440-E449

    Abstract

    The Out-of-Africa (OOA) dispersal ∼ 50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.

    View details for DOI 10.1073/pnas.1510805112

    View details for Web of Science ID 000368617900008

    View details for PubMedCentralID PMC4743782

  • Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing G3-GENES GENOMES GENETICS Shoemaker, L. D., Clark, M. J., Patwardhan, A., Chandratillake, G., Garcia, S., Chen, R., Morgan, A. A., Leng, N., Kirk, S., Chen, R., Cook, D. J., Snyder, M., Steinberg, G. K. 2016; 6 (1): 41-49

    Abstract

    Moyamoya disease (MMD) is a rare disorder characterized by cerebrovascular occlusion and development of hemorrhage-prone collateral vessels. Approximately 10-12% of cases are familial, with a presumed low penetrance autosomal dominant pattern of inheritance. Diagnosis commonly occurs only after clinical presentation. The recent identification of the RNF213 founder mutation (p.R4810K) in the Asian population has made a significant contribution, but the etiology of this disease remains unclear. To further develop the variant landscape of MMD, we performed high-depth whole exome sequencing of 125 unrelated, predominantly nonfamilial, ethnically diverse MMD patients in parallel with 125 internally sequenced, matched controls using the same exome and analysis platform. Three subpopulations were established: Asian, Caucasian, and non-RNF213 founder mutation cases. We provided additional support for the previously observed RNF213 founder mutation (p.R4810K) in Asian cases (P = 6.01×10⁻⁵) that was enriched among East Asians compared to Southeast Asian and Pacific Islander cases (P = 9.52×10⁻⁴) and was absent in all Caucasian cases. The most enriched variant in Caucasian (P = 7.93×10⁻⁴) and non-RNF213 founder mutation (P = 1.51×10⁻³) cases was ZXDC (p.P562L), a gene involved in MHC Class II activation. Collapsing variant methodology ranked OBSCN, a gene involved in myofibrillogenesis, as most enriched in Caucasian (P = 1.07×10⁻⁴) and non-RNF213 founder mutation cases (P = 5.31×10⁻⁵). These findings further support the East Asian origins of the RNF213 (p.R4810K) variant and more fully describe the genetic landscape of multiethnic MMD, revealing novel, alternative candidate variants and genes that may be important in MMD etiology and diagnosis.

    View details for DOI 10.1534/g3.115.020321

    View details for Web of Science ID 000367725000004

    View details for PubMedCentralID PMC4704723

  • Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proceedings of the National Academy of Sciences of the United States of America Henn, B. M., Botigué, L. R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B. K., Martin, A. R., Musharoff, S., Cann, H., Snyder, M. P., Excoffier, L., Kidd, J. M., Bustamante, C. D. 2016; 113 (4): E440–9

    View details for PubMedID 26712023

  • Secure cloud computing for genomic data Nature Biotechnology Datta, S., Bettinger, K., Snyder, M. 2016; 34 (6): 588-91

    View details for DOI 10.1038/nbt.3496

  • Yeast longevity promoted by reversing aging-associated decline in heavy isotope content npj Aging and Mechanisms of Disease Li, X., Snyder, M. P. 2016; 2 (16004): 16004

    Abstract

    Dysregulation of metabolism develops with organismal aging. Both genetic and environmental manipulations promote longevity by effectively diverting various metabolic processes against aging. How these processes converge on the metabolome is not clear. Here we report that the heavy isotopic forms of common elements, a universal feature of metabolites, decline in yeast cells undergoing chronological aging. Supplementation of deuterium, a heavy hydrogen isotope, through heavy water (D2O) uptake extends yeast chronological lifespan (CLS) by up to 85% with minimal effects on growth. The CLS extension by D2O bypasses several known genetic regulators, but is abrogated by calorie restriction and mitochondrial deficiency. Heavy water substantially suppresses endogenous generation of reactive oxygen species (ROS) and slows the pace of metabolic consumption and disposal. Protection from aging by heavy isotopes might result from kinetic modulation of biochemical reactions. Altogether, our findings reveal a novel perspective of aging and new means for promoting longevity.

    View details for DOI 10.1038/npjamd.2016.4

    View details for PubMedCentralID PMC5515009

  • HARNESSING BIG DATA FOR PRECISION MEDICINE: INFRASTRUCTURES AND APPLICATIONS. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Yu, K., Hart, S. N., Goldfeder, R., Zhang, Q. C., Parker, S. C., Snyder, M. 2016; 22: 635-639

    Abstract

    Precision medicine is a health management approach that accounts for individual differences in genetic backgrounds and environmental exposures. With the recent advancements in high-throughput omics profiling technologies, collections of large study cohorts, and the developments of data mining algorithms, big data in biomedicine is expected to provide novel insights into health and disease states, which can be translated into personalized disease prevention and treatment plans. However, petabytes of biomedical data generated by multiple measurement modalities pose a significant challenge for data analysis, integration, storage, and result interpretation. In addition, patient privacy preservation, coordination between participating medical centers and data analysis working groups, as well as discrepancies in data sharing policies remain important topics of discussion. In this workshop, we invite experts in omics integration, biobank research, and data management to share their perspectives on leveraging big data to enable precision medicine. Workshop website: http://tinyurl.com/PSB17BigData; HashTag: #PSB17BigData.

    View details for PubMedID 27897013

  • NIH working group report-using genomic information to guide weight management: From universal to precision treatment OBESITY Bray, M. S., Loos, R. J., McCaffery, J. M., Ling, C., Franks, P. W., Weinstock, G. M., Snyder, M. P., Vassy, J. L., Agurs-Collins, T. 2016; 24 (1): 14-22

    Abstract

    Precision medicine utilizes genomic and other data to optimize and personalize treatment. Although more than 2,500 genetic tests are currently available, largely for extreme and/or rare phenotypes, the question remains whether this approach can be used for the treatment of common, complex conditions like obesity, inflammation, and insulin resistance, which underlie a host of metabolic diseases. This review, developed from a Trans-NIH Conference titled "Genes, Behaviors, and Response to Weight Loss Interventions," provides an overview of the state of genetic and genomic research in the area of weight change and identifies key areas for future research. Although many loci have been identified that are associated with cross-sectional measures of obesity/body size, relatively little is known regarding the genes/loci that influence dynamic measures of weight change over time. Although successful short-term weight loss has been achieved using many different strategies, sustainable weight loss has proven elusive for many, and there are important gaps in our understanding of energy balance regulation. Elucidating the molecular basis of variability in weight change has the potential to improve treatment outcomes and inform innovative approaches that can simultaneously take into account information from genomic and other sources in devising individualized treatment plans.

    View details for DOI 10.1002/oby.21381

    View details for PubMedID 26692578

  • Metformin Improves Diabetic Bone Health by Re-Balancing Catabolism and Nitrogen Disposal PLOS ONE Li, X., Guo, Y., Yan, W., Snyder, M. P., Li, X. 2015; 10 (12)

    Abstract

    Metformin, a leading drug used to treat diabetic patients, is reported to benefit bone homeostasis under hyperglycemia in animal models. However, both the molecular targets and the biological pathways affected by metformin in bone are not well identified or characterized. The objective of this study is to investigate the bioenergetic pathways affected by metformin in bone marrow cells of mice. Metabolite levels were examined in bone marrow samples extracted from metformin- or PBS-treated healthy (wild type) and hyperglycemic (diabetic) mice using liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. We applied an untargeted high performance LC-MS approach which combined multimode chromatography (ion exchange, reversed phase and hydrophilic interaction (HILIC)) and Orbitrap-based ultra-high accuracy mass spectrometry to achieve a wide coverage. A multivariate clustering was applied to reveal the global trends and major metabolite players. A total of 346 unique metabolites were identified, and they are grouped into distinctive clusters that reflected general and diabetes-specific responses to metformin. As evidenced by changes in the TCA and urea cycles, increased catabolism and nitrogen waste that are commonly associated with diabetes were rebalanced upon treatment with metformin. In particular, we found that glutamate and succinate, whose levels were drastically elevated in diabetic animals, were brought back to normal levels by metformin. These two metabolites were further validated as the major targets of metformin in bone marrow stromal cells. Overall, using a limited sample size, our study revealed the metabolic pathways modulated by metformin in bones, which have broad implications for our understanding of bone remodeling under hyperglycemia and for finding therapeutic interventions in mammals.

    View details for DOI 10.1371/journal.pone.0146152

    View details for Web of Science ID 000367510500137

    View details for PubMedCentralID PMC4696809

  • Integrated Proteomic and Genomic Analysis of Gastric Cancer Patient Tissues JOURNAL OF PROTEOME RESEARCH Yan, J. F., Kim, H., Jeong, S., Lee, H., Sethi, M. K., Lee, L. Y., Beavis, R. C., Im, H., Snyder, M. P., Hofree, M., Ideker, T., Wu, S., Paik, Y., Fanayan, S., Hancock, W. S. 2015; 14 (12): 4995-5006

    Abstract

    V-erb-b2 erythroblastic leukemia viral oncogene homologue 2, known as ERBB2, is an important oncogene in the development of certain cancers. It can form a heterodimer with other epidermal growth factor receptor family members and activate kinase-mediated downstream signaling pathways. The ERBB2 gene is located on chromosome 17 and is amplified in a subset of cancers, such as breast, gastric, and colon cancer. Of particular interest to the Chromosome-Centric Human Proteome Project (C-HPP) initiative is the amplification mechanism that typically results in overexpression of a set of genes adjacent to ERBB2, which provides evidence of a linkage between gene location and expression. In this report we studied ERBB2-positive patient tumor samples together with adjacent control nontumor tissues. In addition, non-ERBB2-expressing patient samples were selected for comparison to study the effect of expression of this oncogene. We detected 196 proteins in ERBB2-positive patient tumor samples that had minimal overlap (29 proteins) with the non-ERBB2 tumor samples. Interaction and pathway analysis identified the extracellular signal regulated kinase (ERK) cascade and actin polymerization and actin-myosin assembly contraction as pathways of importance in ERBB2+ and ERBB2- gastric cancer samples, respectively. The raw data files are deposited at ProteomeXchange (identifier: PXD002674) as well as GPMDB.

    View details for DOI 10.1021/acs.jproteome.5b00827

    View details for PubMedID 26435392

  • Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans GENOME RESEARCH Cenik, C., Cenik, E. S., Byeon, G. W., Grubert, F., Candille, S. I., Spacek, D., Alsallakh, B., Tilgner, H., Araya, C. L., Tang, H., Ricci, E., Snyder, M. P. 2015; 25 (11): 1610-1621

    Abstract

    Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin has been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels, suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy; many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation.

    View details for DOI 10.1101/gr.193342.115

    View details for Web of Science ID 000364355600003

    View details for PubMedID 26297486

    View details for PubMedCentralID PMC4617958

  • Design and Implementation of the International Genetics and Translational Research in Transplantation Network TRANSPLANTATION Keating, B. J., van Setten, J., Jacobson, P. A., Holmes, M. V., Verma, S. S., Chandrupatla, H. R., Nair, N., Gao, H., Li, Y. R., Chang, B., Wong, C., Phillips, R., Cole, B. S., Mukhtar, E., Zhang, W., Cao, H., Mohebnasab, M., Hou, C., Lee, T., Steel, L., Shaked, O., Garifallou, J., Miller, M. B., Karczewski, K. J., Akdere, A., Gonzalez, A., Lloyd, K. M., McGinn, D., Michaud, Z., Colasacco, A., Lek, M., Fu, Y., Pawashe, M., Guettouche, T., Himes, A., Perez, L., Guan, W., Wu, B., Schladt, D., Menon, M., Zhang, Z., Tragante, V., de Jonge, N., Otten, H. G., de Weger, R. A., van de Graaf, E. A., Baan, C. C., Manintveld, O. C., De Vlaminck, I., Piening, B. D., Strehl, C., Shaw, M., Snieder, H., Klintmalm, G. B., O'Leary, J. G., Amaral, S., Goldfarb, S., Rand, E., Rossano, J. W., Kohli, U., Heeger, P., Stahl, E., Christie, J. D., Fuentes, M. H., Levine, J. E., Aplenc, R., Schadt, E. E., Stranger, B. E., Kluin, J., Potena, L., Zuckermann, A., Khush, K., Alzahrani, A. J., Al-Muhanna, F. A., Al-Ali, A. K., Al-Ali, R., Al-Rubaish, A. M., Al-Mueilo, S., Byrne, E. M., Miller, D., Alexander, S. I., Onengut-Gumuscu, S., Rich, S. S., Suthanthiran, M., Tedesco, H., Saw, C. L., Ragoussis, J., Kfoury, A. G., Horne, B., Carlquist, J., Gerstein, M. B., Reindl-Schwaighofer, R., Oberbauer, R., Wijmenga, C., Palmer, S., Pereira, A. C., Segovia, J., Alonso-Pulpon, L. A., Comez-Bueno, M., Vilches, C., Jaramillo, N., de Borst, M. H., Naesens, M., Hao, K., MacArthur, D., Balasubramanian, S., Conlon, P. J., Lord, G. M., Ritchie, M. D., Snyder, M., Olthoff, K. M., Moore, J. H., Petersdorf, E. W., Kamoun, M., Wang, J., Monos, D. S., de Bakker, P. I., Hakonarson, H., Murphy, B., Lankree, M. B., Garcia-Pavia, P., Oetting, W. S., Birdwell, K. A., Bakker, S. J., Israni, A. K., Shaked, A., Asselbergs, F. W. 2015; 99 (11): 2401-2412

    Abstract

    Genetic association studies of transplantation outcomes have been hampered by small samples and highly complex multifactorial phenotypes, hindering investigations of the genetic architecture of a range of comorbidities which significantly impact graft and recipient life expectancy. We describe here the rationale and design of the International Genetics & Translational Research in Transplantation Network. The network comprises 22 studies to date, including 16494 transplant recipients and 11669 donors, of whom more than 5000 are of non-European ancestry, all of whom have existing genomewide genotype data sets. We describe the rich genetic and phenotypic information available in this consortium comprising heart, kidney, liver, and lung transplant cohorts. We demonstrate significant power in International Genetics & Translational Research in Transplantation Network to detect main effect association signals across regions such as the MHC region as well as genomewide for transplant outcomes that span all solid organs, such as graft survival, acute rejection, new onset of diabetes after transplantation, and for delayed graft function in kidney only. This consortium is designed and statistically powered to deliver pioneering insights into the genetic architecture of transplant-related outcomes across a range of different solid-organ transplant studies. The study design allows a spectrum of analyses to be performed including recipient-only analyses, donor-recipient HLA mismatches with focus on loss-of-function variants and nonsynonymous single nucleotide polymorphisms.

    View details for DOI 10.1097/TP.0000000000000913

    View details for Web of Science ID 000369087800037

    View details for PubMedCentralID PMC4623847

  • Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data PLOS GENETICS Dewey, F. E., Grove, M. E., Priest, J. R., Waggott, D., Batra, P., Miller, C. L., Wheeler, M., Zia, A., Pan, C., Karzcewski, K. J., Miyake, C., Whirl-Carrillo, M., Klein, T. E., Datta, S., Altman, R. B., Snyder, M., Quertermous, T., Ashley, E. A. 2015; 11 (10)

    Abstract

    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

    View details for DOI 10.1371/journal.pgen.1005496

    View details for Web of Science ID 000364401600008

    View details for PubMedID 26448358

    View details for PubMedCentralID PMC4598191
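
    The abstract above mentions using Mendelian inheritance for quality control in father-mother-child trios. As a rough illustration of that idea (not the authors' implementation), the sketch below flags sites where a child's genotype cannot be formed from one paternal and one maternal allele. The function names and genotype calls are hypothetical, and the check assumes autosomal sites and ignores de novo mutations.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one allele from the mother."""
    child_sorted = sorted(child)
    return any(sorted((f, m)) == child_sorted for f, m in product(father, mother))


def mendelian_error_rate(trio_calls):
    """Fraction of sites whose genotypes violate Mendelian inheritance."""
    if not trio_calls:
        return 0.0
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trio_calls)
    return errors / len(trio_calls)


# Hypothetical (father, mother, child) genotype calls at three autosomal sites.
calls = [
    (("A", "G"), ("A", "A"), ("A", "G")),  # consistent
    (("C", "C"), ("C", "T"), ("T", "T")),  # violation: father cannot transmit T
    (("G", "G"), ("G", "G"), ("G", "G")),  # consistent
]
print(mendelian_error_rate(calls))
```

    A high error rate in a trio would typically point to genotyping or sample problems rather than biology, which is why such a check can serve as a quality-control signal.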

  • Mango: a bias-correcting ChIA-PET analysis pipeline. Bioinformatics Phanstiel, D. H., Boyle, A. P., Heidari, N., Snyder, M. P. 2015; 31 (19): 3092-3098

    Abstract

    Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) is an established method for detecting genome-wide looping interactions at high resolution. Current ChIA-PET analysis software packages either fail to correct for non-specific interactions due to genomic proximity or only address a fraction of the steps required for data processing. We present Mango, a complete ChIA-PET data analysis pipeline that provides statistical confidence estimates for interactions and corrects for major sources of bias including differential peak enrichment and genomic proximity. Comparison to the existing software packages, ChIA-PET Tool and ChiaSig, revealed that Mango interactions exhibit much better agreement with high-resolution Hi-C data. Importantly, Mango executes all steps required for processing ChIA-PET datasets, whereas ChiaSig only completes 20% of the required steps. Application of Mango to multiple available ChIA-PET datasets permitted the independent rediscovery of known trends in chromatin loops including enrichment of CTCF, RAD21, SMC3 and ZNF143 at the anchor regions of interactions and strong bias for convergent CTCF motifs. Mango is open source and distributed through GitHub at https://github.com/dphansti/mango. Contact: mpsnyder@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btv336

    View details for PubMedID 26034063

    View details for PubMedCentralID PMC4592333

  • Exome Sequencing of Neonatal Blood Spots and the Identification of Genes Implicated in Bronchopulmonary Dysplasia. American journal of respiratory and critical care medicine Li, J., Yu, K., Oehlert, J., Jeliffe-Pawlowski, L. L., Gould, J. B., Stevenson, D. K., Snyder, M., Shaw, G. M., O'Brodovich, H. M. 2015; 192 (5): 589-596

    Abstract

    Bronchopulmonary dysplasia (BPD), a prevalent severe lung disease of premature infants, has a strong genetic component. Large-scale genome-wide association studies for common variants have not revealed its genetic basis. Given the historical high mortality rate of extremely preterm infants who now survive and develop BPD, we hypothesized that risk loci underlying this disease are under severe purifying selection during evolution; thus, rare variants likely explain greater risk of the disease. We performed exome sequencing on 50 BPD-affected and unaffected twin pairs using DNA isolated from neonatal blood spots and identified genes affected by extremely rare nonsynonymous mutations. Functional genomic approaches were then used to systematically compare these affected genes. We identified 258 genes with rare nonsynonymous mutations in patients with BPD. These genes were highly enriched for processes involved in pulmonary structure and function including collagen fibril organization, morphogenesis of embryonic epithelium, and regulation of Wnt signaling pathway; displayed significantly elevated expression in fetal and adult lungs; and were substantially up-regulated in a murine model of BPD. Analyses of mouse mutants revealed their phenotypic enrichment for embryonic development and the cyanosis phenotype, a clinical manifestation of BPD. Our study supports the role of rare variants in BPD, in contrast with the role of common variants targeted by genome-wide association studies. Overall, our study is the first to sequence BPD exomes from newborn blood spot samples and identify with high confidence genes implicated in BPD, thereby providing important insights into its biology and molecular etiology.

    View details for DOI 10.1164/rccm.201501-0168OC

    View details for PubMedID 26030808

  • Genomic analysis of mycosis fungoides and Sézary syndrome identifies recurrent alterations in TNFR2. Nature genetics Ungewickell, A., Bhaduri, A., Rios, E., Reuter, J., Lee, C. S., Mah, A., Zehnder, A., Ohgami, R., Kulkarni, S., Armstrong, R., Weng, W., Gratzinger, D., Tavallaee, M., Rook, A., Snyder, M., Kim, Y., Khavari, P. A. 2015; 47 (9): 1056-1060

    Abstract

    Mycosis fungoides and Sézary syndrome comprise the majority of cutaneous T cell lymphomas (CTCLs), disorders notable for their clinical heterogeneity that can present in skin or peripheral blood. Effective treatment options for CTCL are limited, and the genetic basis of these T cell lymphomas remains incompletely characterized. Here we report recurrent point mutations and genomic gains of TNFRSF1B, encoding the tumor necrosis factor receptor TNFR2, in 18% of patients with mycosis fungoides and Sézary syndrome. Expression of the recurrent TNFR2 Thr377Ile mutant in T cells leads to enhanced non-canonical NF-κB signaling that is sensitive to the proteasome inhibitor bortezomib. Using an integrative genomic approach, we additionally discovered a recurrent CTLA4-CD28 fusion, as well as mutations in downstream signaling mediators of these receptors.

    View details for DOI 10.1038/ng.3370

    View details for PubMedID 26258847

  • Evaluating Common Humoral Responses against Fungal Infections with Yeast Protein Microarrays JOURNAL OF PROTEOME RESEARCH Coelho, P. S., Im, H., Clemons, K. V., Snyder, M. P., Stevens, D. A. 2015; 14 (9): 3924-3931

    Abstract

    We profiled the global immunoglobulin response against fungal infection by using yeast protein microarrays. Groups of CD-1 mice were infected systemically with human fungal pathogens (Coccidioides posadasii, Candida albicans, or Paracoccidioides brasiliensis) or inoculated with PBS as a control. Another group was inoculated with heat-killed yeast (HKY) of Saccharomyces cerevisiae. After 30 days, serum from mice in the groups were collected and used to probe S. cerevisiae protein microarrays containing 4800 full-length glutathione S-transferase (GST)-fusion proteins. Antimouse IgG conjugated with Alexafluor 555 and anti-GST antibody conjugated with Alexafluor 647 were used to detect antibody-antigen interactions and the presence of GST-fusion proteins, respectively. Serum after infection with C. albicans reacted with 121 proteins: C. posadasii, 81; P. brasiliensis, 67; and after HKY, 63 proteins on the yeast protein microarray, respectively. We identified a set of 16 antigenic proteins that were shared across the three fungal pathogens. These include retrotransposon capsid proteins, heat shock proteins, and mitochondrial proteins. Five of these proteins were identified in our previous study of fungal cell wall by mass spectrometry (Ann. N. Y. Acad. Sci. 2012, 1273, 44-51). The results obtained give a comprehensive view of the immunological responses to fungal infections at the proteomic level. They also offer insight into immunoreactive protein commonality among several fungal pathogens and provide a basis for a panfungal vaccine.

    View details for DOI 10.1021/acs.jproteome.5b00365

    View details for PubMedID 26258609

  • RNA Sequencing Analysis Detection of a Novel Pathway of Endothelial Dysfunction in Pulmonary Arterial Hypertension AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE Rhodes, C. J., Im, H., Cao, A., Hennigs, J. K., Wang, L., Sa, S., Chen, P., Nickel, N. P., Miyagawa, K., Hopper, R. K., Tojais, N. F., Li, C. G., Gu, M., Spiekerkoetter, E., Xian, Z., Chen, R., Zhao, M., Kaschwich, M., del Rosario, P. A., Bernstein, D., Zamanian, R. T., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2015; 192 (3): 356-366

    Abstract

    Pulmonary arterial hypertension is characterized by endothelial dysregulation, but global changes in gene expression have not been related to perturbations in function. RNA sequencing was utilized to discriminate changes in transcriptomes of endothelial cells cultured from lungs of patients with idiopathic pulmonary arterial hypertension vs. controls and to assess the functional significance of major differentially expressed transcripts. The endothelial transcriptomes from seven control and six idiopathic pulmonary arterial hypertension patients' lungs were analyzed. Differentially expressed genes were related to BMPR2 signaling. Those downregulated were assessed for function in cultured cells, and in a transgenic mouse. Fold-differences in ten genes were significant (p<0.05), four increased and six decreased in patients vs. controls. No patient was mutant for BMPR2. However, knockdown of BMPR2 by siRNA in control pulmonary arterial endothelial cells recapitulated six/ten patient-related gene changes, including decreased collagen IV (COL4A1, COL4A2) and ephrinA1 (EFNA1). Reduction of BMPR2 regulated transcripts was related to decreased β-catenin. Reducing COL4A1, COL4A2 and EFNA1 by siRNA inhibited pulmonary endothelial adhesion, migration and tube formation. In mice null for the EFNA1 receptor, EphA2, vs. controls, VEGF receptor blockade and hypoxia caused more severe pulmonary hypertension, judged by elevated right ventricular systolic pressure, right ventricular hypertrophy and loss of small arteries. The novel relationship between BMPR2 dysfunction and reduced expression of endothelial COL4 and EFNA1 may underlie vulnerability to injury in pulmonary arterial hypertension.

    View details for DOI 10.1164/rccm.201408-1528OC

    View details for PubMedID 26030479

  • Probing High-density Functional Protein Microarrays to Detect Protein-protein Interactions JOVE-JOURNAL OF VISUALIZED EXPERIMENTS Fasolo, J., Im, H., Snyder, M. P. 2015

    Abstract

    High-density functional protein microarrays containing ~4,200 recombinant yeast proteins are examined for kinase protein-protein interactions using an affinity purified yeast kinase fusion protein containing a V5-epitope tag for read-out. Purified kinase is obtained through culture of a yeast strain optimized for high copy protein production harboring a plasmid containing a Kinase-V5 fusion construct under a GAL inducible promoter. The yeast is grown in restrictive media with a neutral carbon source for 6 hr followed by induction with 2% galactose. Next, the culture is harvested and kinase is purified using standard affinity chromatographic techniques to obtain a highly purified protein kinase for use in the assay. The purified kinase is diluted with kinase buffer to an appropriate range for the assay and the protein microarrays are blocked prior to hybridization with the protein microarray. After the hybridization, the arrays are probed with monoclonal V5 antibody to identify proteins bound by the kinase-V5 protein. Finally, the arrays are scanned using a standard microarray scanner, and data is extracted for downstream informatics analysis to determine a high confidence set of protein interactions for downstream validation in vivo.

    View details for DOI 10.3791/51872

    View details for Web of Science ID 000361537100003

    View details for PubMedID 26274875

    View details for PubMedCentralID PMC4545172

  • Single-cell chromatin accessibility reveals principles of regulatory variation NATURE Buenrostro, J. D., Wu, B., Litzenburger, U. M., Ruff, D., Gonzales, M. L., Snyder, M. P., Chang, H. Y., Greenleaf, W. J. 2015; 523 (7561): 486-490

    Abstract

    Cell-to-cell variation is a universal feature of life that affects a wide range of biological phenomena, from developmental plasticity to tumour heterogeneity. Although recent advances have improved our ability to document cellular phenotypic variation, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of mammalian DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform. Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type-specific accessibility variance across eight cell types. Targeted perturbations of cell cycle or transcription factor signalling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization. Single-cell analysis of DNA accessibility provides new insight into cellular variation of the 'regulome'.

    View details for DOI 10.1038/nature14590

    View details for Web of Science ID 000358378900042

    View details for PubMedID 26083756

  • Achieving high-sensitivity for clinical applications using augmented exome sequencing GENOME MEDICINE Patwardhan, A., Harris, J., Leng, N., Bartha, G., Church, D. M., Luo, S., Haudenschild, C., Pratt, M., Zook, J., Salit, M., Tirch, J., Morra, M., Chervitz, S., Li, M., Clark, M., Garcia, S., Chandratillake, G., Kirk, S., Ashley, E., Snyder, M., Altman, R., Bustamante, C., Butte, A. J., West, J., Chen, R. 2015; 7

    Abstract

    Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment. Using sequence data obtained from the NA12878 reference sample and pre-defined lists of medically-relevant protein-coding and noncoding sequences, we compared the breadth and depth of coverage obtained among four commercial exome capture platforms and whole genome sequencing. In addition, we evaluated the performance of an augmented exome strategy, ACE, that extends coverage in medically relevant regions and enhances coverage in areas that are challenging to sequence. Leveraging reference call-sets, we also examined the effects of improved coverage on variant detection sensitivity. We observed coverage shortfalls with each of the conventional exome-capture and whole-genome platforms across several medically interpretable genes. These gaps included areas of the genome required for reporting recently established secondary findings (ACMG) and known disease-associated loci. The augmented exome strategy recovered many of these gaps, resulting in improved coverage in these areas. At clinically-relevant coverage levels (100 % bases covered at ≥20×), ACE improved coverage among genes in the medically interpretable genome (>90 % covered relative to 10-78 % with other platforms), the set of ACMG secondary finding genes (91 % covered relative to 4-75 % with other platforms) and a subset of variants known to be associated with human disease (99 % covered relative to 52-95 % with other platforms). Improved coverage translated into improvements in sensitivity, with ACE variant detection sensitivities (>97.5 % SNVs, >92.5 % InDels) exceeding that observed with conventional whole-exome and whole-genome platforms. Clinicians should consider analytical performance when making clinical assessments, given that even a few missed variants can lead to reporting false negative results. An augmented exome strategy provides a level of coverage not achievable with other platforms, thus addressing concerns regarding the lack of sensitivity in clinically important regions. In clinical applications where comprehensive coverage of medically interpretable areas of the genome requires higher localized sequencing depth, an augmented exome approach offers both cost and performance advantages over other sequencing-based tests.

    View details for DOI 10.1186/s13073-015-0197-4

    View details for Web of Science ID 000359428300001

    View details for PubMedID 26269718

    View details for PubMedCentralID PMC4534066
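
    The coverage criterion quoted in the abstract above (100 % of bases covered at ≥20×) can be made concrete with a small sketch. This is only an illustration of the metric, not the study's pipeline: the gene names and per-base depths are made up, and the helper functions are hypothetical.

```python
def fraction_covered(depths, min_depth=20):
    """Fraction of positions whose read depth is at least min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def fully_covered_genes(gene_depths, min_depth=20):
    """Names of genes in which every base meets the depth threshold."""
    return [
        gene
        for gene, depths in gene_depths.items()
        if depths and all(d >= min_depth for d in depths)
    ]


# Hypothetical per-base read depths for two toy genes.
gene_depths = {
    "GENE_A": [35, 42, 28, 20, 31],
    "GENE_B": [50, 12, 19, 44, 60],
}

for gene, depths in gene_depths.items():
    print(gene, fraction_covered(depths))
print(fully_covered_genes(gene_depths))
```

    Under this definition a gene counts as covered only if no base falls below the threshold, so a single low-depth position is enough to exclude it.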

  • Recurrent somatic mutations in regulatory regions of human cancer genomes NATURE GENETICS Melton, C., Reuter, J. A., Spacek, D. V., Snyder, M. 2015; 47 (7): 710-?

    Abstract

    Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus-specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.

    View details for DOI 10.1038/ng.3332

    View details for Web of Science ID 000357090300007

    View details for PubMedID 26053494

    View details for PubMedCentralID PMC4485503

  • Where Next for Genetics and Genomics? PLOS BIOLOGY Tyler-Smith, C., Yang, H., Landweber, L. F., Dunham, I., Knoppers, B. M., Donnelly, P., Mardis, E. R., Snyder, M., McVean, G. 2015; 13 (7)

    Abstract

    The last few decades have utterly transformed genetics and genomics, but what might the next ten years bring? PLOS Biology asked eight leaders spanning a range of related areas to give us their predictions. Without exception, the predictions are for more data on a massive scale and of more diverse types. All are optimistic and predict enormous positive impact on scientific understanding, while a recurring theme is the benefit of such data for the transformation and personalization of medicine. Several also point out that the biggest changes will very likely be those that we don't foresee, even now.

    View details for DOI 10.1371/journal.pbio.1002216

    View details for Web of Science ID 000360617100023

    View details for PubMedCentralID PMC4520474

  • Metabolome progression during early gut microbial colonization of gnotobiotic mice SCIENTIFIC REPORTS Marcobal, A., Yusufaly, T., Higginbottom, S., Snyder, M., Sonnenburg, J. L., Mias, G. I. 2015; 5

    Abstract

    The microbiome has been implicated directly in host health, especially host metabolic processes and development of immune responses. These are particularly important in infants where the gut first begins being colonized, and such processes may be modeled in mice. In this investigation we follow longitudinally the urine metabolome of ex-germ-free mice, which are colonized with two bacterial species, Bacteroides thetaiotaomicron and Bifidobacterium longum. High-throughput mass spectrometry profiling of urine samples revealed dynamic changes in the metabolome makeup, associated with the gut bacterial colonization, enabled by our adaptation of non-linear time-series analysis to urine metabolomics data. Results demonstrate both gradual and punctuated changes in metabolite production and that early colonization events profoundly impact the nature of small molecules circulating in the host. The identified small molecules are implicated in amino acid and carbohydrate metabolic processes, and offer insights into the dynamic changes occurring during the colonization process, using high-throughput longitudinal methodology.

    View details for DOI 10.1038/srep11589

    View details for Web of Science ID 000357041700001

    View details for PubMedID 26118551

    View details for PubMedCentralID PMC4484351

  • Transglutaminase 4 as a prostate autoantigen in male subfertility SCIENCE TRANSLATIONAL MEDICINE Landegren, N., Sharon, D., Shum, A. K., Khan, I. S., Fasano, K. J., Hallgren, A., Kampf, C., Freyhult, E., Ardesjo-Lundgren, B., Alimohammadi, M., Rathsman, S., Ludvigsson, J. F., Lundh, D., Motrich, R., Rivero, V., Fong, L., Giwercman, A., Gustafsson, J., Perheentupa, J., Husebye, E. S., Anderson, M. S., Snyder, M., Kampe, O. 2015; 7 (292)

    Abstract

    Autoimmune polyendocrine syndrome type 1 (APS1), a monogenic disorder caused by AIRE gene mutations, features multiple autoimmune disease components. Infertility is common in both males and females with APS1. Although female infertility can be explained by autoimmune ovarian failure, the mechanisms underlying male infertility have remained poorly understood. We performed a proteome-wide autoantibody screen in APS1 patient sera to assess the autoimmune response against the male reproductive organs. By screening human protein arrays with male and female patient sera and by selecting for gender-imbalanced autoantibody signals, we identified transglutaminase 4 (TGM4) as a male-specific autoantigen. Notably, TGM4 is a prostatic secretory molecule with a critical role in male reproduction. TGM4 autoantibodies were detected in most of the adult male APS1 patients but were absent in all the young males. Consecutive serum samples further revealed that TGM4 autoantibodies first presented during pubertal age and subsequent to prostate maturation. We assessed the animal model for APS1, the Aire-deficient mouse, and found spontaneous development of TGM4 autoantibodies specifically in males. Aire-deficient mice failed to present TGM4 in the thymus, consistent with a defect in central tolerance for TGM4. In the mouse, we further link TGM4 immunity with a destructive prostatitis and compromised secretion of TGM4. Collectively, our findings in APS1 patients and Aire-deficient mice reveal prostate autoimmunity as a major manifestation of APS1 with a potential role in male subfertility.

    View details for DOI 10.1126/scitranslmed.aaa9186

    View details for PubMedID 26084804

  • Transcriptome Signature and Regulation in Human Somatic Cell Reprogramming STEM CELL REPORTS Tanaka, Y., Hysolli, E., Su, J., Xiang, Y., Kim, K., Zhong, M., Li, Y., Heydari, K., Euskirchen, G., Snyder, M. P., Pan, X., Weissman, S. M., Park, I. 2015; 4 (6): 1125-1139

    Abstract

    Reprogramming of somatic cells produces induced pluripotent stem cells (iPSCs) that are invaluable resources for biomedical research. Here, we extended the previous transcriptome studies by performing RNA-seq on cells defined by a combination of multiple cellular surface markers. We found that transcriptome changes during early reprogramming occur independently from the opening of closed chromatin by OCT4, SOX2, KLF4, and MYC (OSKM). Furthermore, our data identify multiple spliced forms of genes uniquely expressed at each progressive stage of reprogramming. In particular, we found a pluripotency-specific spliced form of CCNE1 that is specific to human and significantly enhances reprogramming. In addition, single nucleotide polymorphism (SNP) expression analysis reveals that monoallelic gene expression is induced in the intermediate stages of reprogramming, while biallelic expression is recovered upon completion of reprogramming. Our transcriptome data provide unique opportunities in understanding human iPSC reprogramming.

    View details for DOI 10.1016/j.stemcr.2015.04.009

    View details for Web of Science ID 000356068100017

    View details for PubMedID 26004630

    View details for PubMedCentralID PMC4471828

  • Optimized Analytical Procedures for the Untargeted Metabolomic Profiling of Human Urine and Plasma by Combining Hydrophilic Interaction (HILIC) and Reverse-Phase Liquid Chromatography (RPLC)-Mass Spectrometry MOLECULAR & CELLULAR PROTEOMICS Contrepois, K., Jiang, L., Snyder, M. 2015; 14 (6): 1684-1695

    Abstract

    Profiling of body fluids is crucial for monitoring and discovering metabolic markers of health and disease and for providing insights into human physiology. Since human urine and plasma each contain an extreme diversity of metabolites, a single liquid chromatographic system when coupled to mass spectrometry (MS) is not sufficient to achieve reasonable metabolome coverage. Hydrophilic interaction liquid chromatography (HILIC) offers complementary information to reverse-phase liquid chromatography (RPLC) by retaining polar metabolites. With the objective of finding the optimal combined chromatographic solution to profile urine and plasma, we systematically investigated the performance of five HILIC columns with different chemistries operated at three different pH (acidic, neutral, basic) and five C18-silica RPLC columns. The zwitterionic column ZIC-HILIC operated at neutral pH provided optimal performance on a large set of hydrophilic metabolites. The RPLC columns Hypersil GOLD and Zorbax SB aq were proven to be best suited for the metabolic profiling of urine and plasma, respectively. Importantly, the optimized HILIC-MS method showed excellent intrabatch peak area reproducibility (CV < 12%) and good long-term interbatch (40 days) peak area reproducibility (CV < 22%) that were similar to those of RPLC-MS procedures. Finally, combining the optimal HILIC- and RPLC-MS approaches greatly expanded metabolome coverage with 44% and 108% new metabolic features detected compared with RPLC-MS alone for urine and plasma, respectively. The proposed combined LC-MS approaches improve the comprehensiveness of global metabolic profiling of body fluids and thus are valuable for monitoring and discovering metabolic changes associated with health and disease in clinical research studies.

    View details for DOI 10.1074/mcp.M114.046508

    View details for Web of Science ID 000355550400019

    View details for PubMedID 25787789

    View details for PubMedCentralID PMC4458729
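
    The reproducibility figures in the abstract above are coefficients of variation (CV) of peak areas. As a minimal sketch of how such a value is computed, assuming the sample standard deviation is used (the abstract does not say which estimator the authors applied) and using invented peak areas:

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD divided by mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical peak areas for one metabolite across five replicate injections.
peak_areas = [100, 110, 90, 105, 95]
print(round(cv_percent(peak_areas), 2))
```

    A metabolite would meet the intrabatch criterion quoted above if this value stayed below 12 across replicate injections within a batch.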

  • AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae (vol 10, e0120671, 2015) PLOS ONE Song, G., Dickins, B. A., Demeter, J., Engel, S., Gallagher, J., Choe, K., Dunn, B., Snyder, M., Cherry, J. 2015; 10 (5): e0129184

    View details for DOI 10.1371/journal.pone.0129184

    View details for Web of Science ID 000355185600125

    View details for PubMedID 26017550

    View details for PubMedCentralID PMC4446291

  • High-Throughput Sequencing Technologies MOLECULAR CELL Reuter, J. A., Spacek, D. V., Snyder, M. P. 2015; 58 (4): 586-597

    Abstract

    The human genome sequence has profoundly altered our understanding of biology, human diversity, and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past 10 years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them, as well as the challenges facing current sequencing platforms and their clinical application.

    View details for DOI 10.1016/j.molcel.2015.05.004

    View details for Web of Science ID 000355154000007

    View details for PubMedCentralID PMC4494749

    View details for PubMedID 26000844

  • Characterization of Novel Transcripts in Pseudorabies Virus VIRUSES-BASEL Tombacz, D., Csabai, Z., Olah, P., Havelda, Z., Sharon, D., Snyder, M., Boldogkoi, Z. 2015; 7 (5): 2727-2744

    Abstract

    In this study we identified two 3'-coterminal RNA molecules in the pseudorabies virus. The highly abundant short transcript (CTO-S) proved to be encoded between the ul21 and ul22 genes in close vicinity of the replication origin (OriL) of the virus. The less abundant long RNA molecule (CTO-L) is a transcriptional readthrough product of the ul21 gene and overlaps OriL. These polyadenylated RNAs were characterized by ascertaining their nucleotide sequences with the Illumina HiScanSQ and Pacific Biosciences Real-Time (PacBio RSII) sequencing platforms and by analyzing their transcription kinetics through use of multi-time-point Real-Time RT-PCR and the PacBio RSII system. It emerged that transcription of the CTOs is fully dependent on the viral transactivator protein IE180 and CTO-S is not a microRNA precursor. We propose an interaction between the transcription and replication machineries at this genomic location, which might play an important role in the regulation of DNA synthesis.

    View details for DOI 10.3390/v7052727

    View details for Web of Science ID 000356228700027

    View details for PubMedID 26008709

    View details for PubMedCentralID PMC4452928

  • Impact of allele-specific peptides in proteome quantification PROTEOMICS CLINICAL APPLICATIONS Wu, L., Snyder, M. 2015; 9 (3-4): 432-436

    Abstract

    MS-based proteome technologies have greatly improved our ability to detect and quantify proteomes across various biological samples. High-throughput bottom-up proteome profiling in combination with targeted MS methods, e.g., SRM assays, is emerging as a powerful approach in the field of biomarker discovery. In the past few years, an increasing number of studies have attempted to integrate genomic and proteomic data for biomarker discovery. Here, we describe how allele-specific peptides can be applied in biomarker discovery and their impact on protein quantification.

    View details for DOI 10.1002/prca.201400126

    View details for Web of Science ID 000353291000019

    View details for PubMedID 25676416

    View details for PubMedCentralID PMC4448739

  • Reassessment of Piwi Binding to the Genome and Piwi Impact on RNA Polymerase II Distribution DEVELOPMENTAL CELL Lin, H., Chen, M., Kundaje, A., Valouev, A., Yin, H., Liu, N., Neuenkirchen, N., Zhong, M., Snyder, M. 2015; 32 (6): 772-774

    Abstract

    Drosophila Piwi was reported by Huang et al. (2013) to be guided by piRNAs to piRNA-complementary sites in the genome, which then recruits heterochromatin protein 1a and histone methyltransferase Su(Var)3-9 to the sites. Among additional findings, Huang et al. (2013) also reported Piwi binding sites in the genome and the reduction of RNA polymerase II in euchromatin but its increase in pericentric regions in piwi mutants. Marinov et al. (2015) disputed the validity of the Huang et al. bioinformatic pipeline that led to the last two claims. Here we report our independent reanalysis of the data using current bioinformatic methods. Our reanalysis agrees with Marinov et al. (2015) that Piwi's genomic targets still remain to be identified but confirms the Huang et al. claim that Piwi influences RNA polymerase II distribution in the genome. This Matters Arising Response addresses the Marinov et al. (2015) Matters Arising, published concurrently in this issue of Developmental Cell.

    View details for DOI 10.1016/j.devcel.2015.03.004

    View details for PubMedID 25805139

  • The conserved histone deacetylase Rpd3 and its DNA binding subunit Ume6 control dynamic transcript architecture during mitotic growth and meiotic development NUCLEIC ACIDS RESEARCH Lardenois, A., Stuparevic, I., Liu, Y., Law, M. J., Becker, E., Smagulova, F., Waern, K., Guilleux, M., Horecka, J., Chu, A., Kervarrec, C., Strich, R., Snyder, M., Davis, R. W., Steinmetz, L. M., Primig, M. 2015; 43 (1): 115-128

    Abstract

    It was recently reported that the sizes of many mRNAs change when budding yeast cells exit mitosis and enter the meiotic differentiation pathway. These differences were attributed to length variations of their untranslated regions. The function of UTRs in protein translation is well established. However, the mechanism controlling the expression of distinct transcript isoforms during mitotic growth and meiotic development is unknown. In this study, we order developmentally regulated transcript isoforms according to their expression at specific stages during meiosis and gametogenesis, as compared to vegetative growth and starvation. We employ regulatory motif prediction, in vivo protein-DNA binding assays, genetic analyses and monitoring of epigenetic amino acid modification patterns to identify a novel role for Rpd3 and Ume6, two components of a histone deacetylase complex already known to repress early meiosis-specific genes in dividing cells, in mitotic repression of meiosis-specific transcript isoforms. Our findings classify developmental stage-specific early, middle and late meiotic transcript isoforms, and they point to a novel HDAC-dependent control mechanism for flexible transcript architecture during cell growth and differentiation. Since Rpd3 is highly conserved and ubiquitously expressed in many tissues, our results are likely relevant for development and disease in higher eukaryotes.

    View details for DOI 10.1093/nar/gku1185

    View details for Web of Science ID 000350207100017

    View details for PubMedID 25477386

    View details for PubMedCentralID PMC4288150

  • Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing. G3 (Bethesda, Md.) Shoemaker, L. D., Clark, M. J., Patwardhan, A., Chandratillake, G., Garcia, S., Chen, R., Morgan, A. A., Leng, N., Kirk, S., Chen, R., Cook, D. J., Snyder, M., Steinberg, G. K. 2015; 6 (1): 41-49

    Abstract

    Moyamoya disease (MMD) is a rare disorder characterized by cerebrovascular occlusion and development of hemorrhage-prone collateral vessels. Approximately 10-12% of cases are familial, with a presumed low penetrance autosomal dominant pattern of inheritance. Diagnosis commonly occurs only after clinical presentation. The recent identification of the RNF213 founder mutation (p.R4810K) in the Asian population has made a significant contribution, but the etiology of this disease remains unclear. To further develop the variant landscape of MMD, we performed high-depth whole exome sequencing of 125 unrelated, predominantly nonfamilial, ethnically diverse MMD patients in parallel with 125 internally sequenced, matched controls using the same exome and analysis platform. Three subpopulations were established: Asian, Caucasian, and non-RNF213 founder mutation cases. We provided additional support for the previously observed RNF213 founder mutation (p.R4810K) in Asian cases (P = 6.01×10(-5)) that was enriched among East Asians compared to Southeast Asian and Pacific Islander cases (P = 9.52×10(-4)) and was absent in all Caucasian cases. The most enriched variant in Caucasian (P = 7.93×10(-4)) and non-RNF213 founder mutation (P = 1.51×10(-3)) cases was ZXDC (p.P562L), a gene involved in MHC Class II activation. Collapsing variant methodology ranked OBSCN, a gene involved in myofibrillogenesis, as most enriched in Caucasian (P = 1.07×10(-4)) and non-RNF213 founder mutation cases (P = 5.31×10(-5)). These findings further support the East Asian origins of the RNF213 (p.R4810K) variant and more fully describe the genetic landscape of multiethnic MMD, revealing novel, alternative candidate variants and genes that may be important in MMD etiology and diagnosis.

    View details for DOI 10.1534/g3.115.020321

    View details for PubMedID 26530418

  • AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae. PloS one Song, G., Dickins, B. J., Demeter, J., Engel, S., Gallagher, J., Choe, K., Dunn, B., Snyder, M., Cherry, J. M. 2015; 10 (3)

    Abstract

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

    View details for DOI 10.1371/journal.pone.0120671

    View details for PubMedID 25781462

  • Where Next for Genetics and Genomics? PLoS biology Tyler-Smith, C., Yang, H., Landweber, L. F., Dunham, I., Knoppers, B. M., Donnelly, P., Mardis, E. R., Snyder, M., McVean, G. 2015; 13 (7): e1002216

    Abstract

    The last few decades have utterly transformed genetics and genomics, but what might the next ten years bring? PLOS Biology asked eight leaders spanning a range of related areas to give us their predictions. Without exception, the predictions are for more data on a massive scale and of more diverse types. All are optimistic and predict enormous positive impact on scientific understanding, while a recurring theme is the benefit of such data for the transformation and personalization of medicine. Several also point out that the biggest changes will very likely be those that we don't foresee, even now.

    View details for PubMedID 26225775

  • Metformin Improves Diabetic Bone Health by Re-Balancing Catabolism and Nitrogen Disposal. PloS one Li, X., Guo, Y., Yan, W., Snyder, M. P., Li, X. 2015; 10 (12)

    Abstract

    Metformin, a leading drug used to treat diabetic patients, is reported to benefit bone homeostasis under hyperglycemia in animal models. However, both the molecular targets and the biological pathways affected by metformin in bone are not well identified or characterized. The objective of this study is to investigate the bioenergetic pathways affected by metformin in bone marrow cells of mice. Metabolite levels were examined in bone marrow samples extracted from metformin- or PBS-treated healthy (wild type) and hyperglycemic (diabetic) mice using liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. We applied an untargeted high performance LC-MS approach which combined multimode chromatography (ion exchange, reversed phase and hydrophilic interaction (HILIC)) and Orbitrap-based ultra-high accuracy mass spectrometry to achieve a wide coverage. A multivariate clustering was applied to reveal the global trends and major metabolite players. A total of 346 unique metabolites were identified, and they are grouped into distinctive clusters that reflected general and diabetes-specific responses to metformin. As evidenced by changes in the TCA and urea cycles, increased catabolism and nitrogen waste that are commonly associated with diabetes were rebalanced upon treatment with metformin. In particular, we found that glutamate and succinate, whose levels were drastically elevated in diabetic animals, were brought back to normal levels by metformin. These two metabolites were further validated as the major targets of metformin in bone marrow stromal cells. Overall, using a limited sample size, our study revealed the metabolic pathways modulated by metformin in bones, which have broad implications for our understanding of bone remodeling under hyperglycemia and for finding therapeutic interventions in mammals.

    View details for DOI 10.1371/journal.pone.0146152

    View details for PubMedID 26716870

  • Achieving high-sensitivity for clinical applications using augmented exome sequencing. Genome medicine Patwardhan, A., Harris, J., Leng, N., Bartha, G., Church, D. M., Luo, S., Haudenschild, C., Pratt, M., Zook, J., Salit, M., Tirch, J., Morra, M., Chervitz, S., Li, M., Clark, M., Garcia, S., Chandratillake, G., Kirk, S., Ashley, E., Snyder, M., Altman, R., Bustamante, C., Butte, A. J., West, J., Chen, R. 2015; 7 (1): 71-?

    Abstract

    Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment. Using sequence data obtained from the NA12878 reference sample and pre-defined lists of medically-relevant protein-coding and noncoding sequences, we compared the breadth and depth of coverage obtained among four commercial exome capture platforms and whole genome sequencing. In addition, we evaluated the performance of an augmented exome strategy, ACE, that extends coverage in medically relevant regions and enhances coverage in areas that are challenging to sequence. Leveraging reference call-sets, we also examined the effects of improved coverage on variant detection sensitivity. We observed coverage shortfalls with each of the conventional exome-capture and whole-genome platforms across several medically interpretable genes. These gaps included areas of the genome required for reporting recently established secondary findings (ACMG) and known disease-associated loci. The augmented exome strategy recovered many of these gaps, resulting in improved coverage in these areas. At clinically-relevant coverage levels (100 % bases covered at ≥20×), ACE improved coverage among genes in the medically interpretable genome (>90 % covered relative to 10-78 % with other platforms), the set of ACMG secondary finding genes (91 % covered relative to 4-75 % with other platforms) and a subset of variants known to be associated with human disease (99 % covered relative to 52-95 % with other platforms). Improved coverage translated into improvements in sensitivity, with ACE variant detection sensitivities (>97.5 % SNVs, >92.5 % InDels) exceeding that observed with conventional whole-exome and whole-genome platforms. Clinicians should consider analytical performance when making clinical assessments, given that even a few missed variants can lead to reporting false negative results. An augmented exome strategy provides a level of coverage not achievable with other platforms, thus addressing concerns regarding the lack of sensitivity in clinically important regions. In clinical applications where comprehensive coverage of medically interpretable areas of the genome requires higher localized sequencing depth, an augmented exome approach offers both cost and performance advantages over other sequencing-based tests.

    View details for DOI 10.1186/s13073-015-0197-4

    View details for PubMedID 26269718

  • Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis. Nature communications Fotiou, E., Martin-Almedina, S., Simpson, M. A., Lin, S., Gordon, K., Brice, G., Atton, G., Jeffery, I., Rees, D. C., Mignot, C., Vogt, J., Homfray, T., Snyder, M. P., Rockson, S. G., Jeffery, S., Mortimer, P. S., Mansour, S., Ostergaard, P. 2015; 6: 8085-?

    Abstract

    Generalized lymphatic dysplasia (GLD) is a rare form of primary lymphoedema characterized by a uniform, widespread lymphoedema affecting all segments of the body, with systemic involvement such as intestinal and/or pulmonary lymphangiectasia, pleural effusions, chylothoraces and/or pericardial effusions. This may present prenatally as non-immune hydrops. Here we report homozygous and compound heterozygous mutations in PIEZO1, resulting in an autosomal recessive form of GLD with a high incidence of non-immune hydrops fetalis and childhood onset of facial and four limb lymphoedema. Mutations in PIEZO1, which encodes a mechanically activated ion channel, have been reported with autosomal dominant dehydrated hereditary stomatocytosis and non-immune hydrops of unknown aetiology. Besides its role in red blood cells, our findings indicate that PIEZO1 is also involved in the development of lymphatic structures.

    View details for DOI 10.1038/ncomms9085

    View details for PubMedID 26333996

  • Whole-Exome Enrichment with the Agilent SureSelect Human All Exon Platform. Cold Spring Harbor protocols Chen, R., Im, H., Snyder, M. 2015; 2015 (7): pdb prot083659-?

    Abstract

    There are multiple platforms available for whole-exome enrichment and sequencing (WES). This protocol is based on the Agilent SureSelect Human All Exon platform, which targets ∼50 Mb of the human exonic regions. The SureSelect system uses ∼120-base RNA probes to capture known coding DNA sequences (CDS) from the NCBI Consensus CDS Database as well as other major RNA coding sequence databases, such as Sanger miRBase. The protocol can be performed at the benchside without the need for automation, and the resulting library can be used for targeted next-generation sequencing on an Illumina HiSeq 2000 sequencer.

    View details for DOI 10.1101/pdb.prot083659

    View details for PubMedID 25762417

  • Metabolome progression during early gut microbial colonization of gnotobiotic mice. Scientific reports Marcobal, A., Yusufaly, T., Higginbottom, S., Snyder, M., Sonnenburg, J. L., Mias, G. I. 2015; 5: 11589-?

    Abstract

    The microbiome has been implicated directly in host health, especially host metabolic processes and development of immune responses. These are particularly important in infants where the gut first begins being colonized, and such processes may be modeled in mice. In this investigation we follow longitudinally the urine metabolome of ex-germ-free mice, which are colonized with two bacterial species, Bacteroides thetaiotaomicron and Bifidobacterium longum. High-throughput mass spectrometry profiling of urine samples revealed dynamic changes in the metabolome makeup, associated with the gut bacterial colonization, enabled by our adaptation of non-linear time-series analysis to urine metabolomics data. Results demonstrate both gradual and punctuated changes in metabolite production and that early colonization events profoundly impact the nature of small molecules circulating in the host. The identified small molecules are implicated in amino acid and carbohydrate metabolic processes, and offer insights into the dynamic changes occurring during the colonization process, using high-throughput longitudinal methodology.

    View details for DOI 10.1038/srep11589

    View details for PubMedID 26118551

    View details for PubMedCentralID PMC4484351

  • Genomic analysis of fibrolamellar hepatocellular carcinoma. Human molecular genetics Xu, L., Hazard, F. K., Zmoos, A., Jahchan, N., Chaib, H., Garfin, P. M., Rangaswami, A., Snyder, M. P., Sage, J. 2015; 24 (1): 50-63

    Abstract

    Pediatric tumors are relatively infrequent but are often associated with significant lethality and lifelong morbidity. A major goal of pediatric cancer research has been to identify key drivers of tumorigenesis to eventually develop targeted therapies to enhance cure rate and minimize acute and long-term toxic effects. Here we used genomics approaches to identify biomarkers and candidate drivers for fibrolamellar hepatocellular carcinoma (FL-HCC), a very rare subtype of pediatric liver cancer for which limited therapeutic options exist. In-depth genomics analyses of one tumor followed by immunohistochemistry validation on seven other tumors showed expression of neuroendocrine markers in FL-HCC. DNA and RNA sequencing data further showed that common cancer pathways are not visibly altered in FL-HCC but identified two novel structural variants, both resulting in fusion transcripts. The first, a 400kb deletion, results in a DNAJ1-PRKCA fusion transcript, which leads to increased PKA activity in the index tumor case and other FL-HCC cases compared to normal liver. This PKA fusion protein is oncogenic in HCC cells. The second gene fusion event, a translocation between the CLPTML1 and GLIS3 genes, generates a transcript whose product also promotes cancer phenotypes in HCC cell lines. These experiments further highlight the tumorigenic role of gene fusions in the etiology of pediatric solid tumors and identify both candidate biomarkers and possible therapeutic targets for this lethal pediatric disease.

    View details for DOI 10.1093/hmg/ddu418

    View details for PubMedID 25122662

  • Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss BMC GENOMICS Haraksingh, R. R., Jahanbani, F., Rodriguez-Paris, J., Gelernter, J., Nadeau, K. C., Oghalai, J. S., Schrijver, I., Snyder, M. P. 2014; 15

    Abstract

    The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45×10(-7)). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.

    View details for DOI 10.1186/1471-2164-15-1155

    View details for Web of Science ID 000209598100001

  • Genomic era diagnosis and management of hereditary and sporadic colon cancer. World journal of clinical oncology Esplin, E. D., Snyder, M. P. 2014; 5 (5): 1036-1047

    Abstract

    The morbidity and mortality attributable to heritable and sporadic carcinomas of the colon are substantial and affect children and adults alike. Despite current colonoscopy screening recommendations colorectal adenocarcinoma (CRC) still accounts for almost 140000 cancer cases yearly. Familial adenomatous polyposis (FAP) is a colon cancer predisposition due to alterations in the adenomatous polyposis coli gene, which is mutated in most CRC. Since the beginning of the genomic era next-generation sequencing analyses of CRC continue to improve our understanding of the genetics of tumorigenesis and promise to expand our ability to identify and treat this disease. Advances in genome sequence analysis have facilitated the molecular diagnosis of individuals with FAP, which enables initiation of appropriate monitoring and timely intervention. Genome sequencing also has potential clinical impact for individuals with sporadic forms of CRC, providing means for molecular diagnosis of CRC tumor type, data guiding selection of tumor targeted therapies, and pharmacogenomic profiles specifying patient specific drug tolerances. There is even a potential role for genomic sequencing in surveillance for recurrence, and early detection, of CRC. We review strategies for diagnostic assessment and management of FAP and sporadic CRC in the current genomic era, with emphasis on the current, and potential for future, impact of genome sequencing on the clinical care of these conditions.

    View details for DOI 10.5306/wjco.v5.i5.1036

    View details for PubMedID 25493239

    View details for PubMedCentralID PMC4259930

  • Widespread contribution of transposable elements to the innovation of gene regulatory networks GENOME RESEARCH Sundaram, V., Cheng, Y., Ma, Z., Li, D., Xing, X., Edge, P., Snyder, M. P., Wang, T. 2014; 24 (12): 1963-1976

    Abstract

    Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.

    View details for DOI 10.1101/gr.168872.113

    View details for Web of Science ID 000345810600005

    View details for PubMedID 25319995

    View details for PubMedCentralID PMC4248313

  • Genome-wide map of regulatory interactions in the human genome GENOME RESEARCH Heidari, N., Phanstiel, D. H., He, C., Grubert, F., Jahanbani, F., Kasowski, M., Zhang, M. Q., Snyder, M. P. 2014; 24 (12): 1905-1917

    Abstract

    Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) experiments targeting six broadly distributed factors. Bound regions covered 80% of DNase I hypersensitive sites including 99.7% of TSS and 98% of enhancers. Correlating this map with ChIP-seq and RNA-seq data sets revealed cohesin, CTCF, and ZNF143 as key components of three-dimensional chromatin structure and revealed how the distal chromatin state affects gene transcription. Comparison of interactions between cell types revealed that enhancer-promoter interactions were highly cell-type-specific. Construction and comparison of distal and proximal regulatory networks revealed stark differences in structure and biological function. Proximal binding events are enriched at genes with housekeeping functions, while distal binding events interact with genes involved in dynamic biological processes including response to stimulus. This study reveals new mechanistic and functional insights into regulatory region organization in the nucleus.

    View details for DOI 10.1101/gr.176586.114

    View details for PubMedID 25228660

  • A comparative encyclopedia of DNA elements in the mouse genome NATURE Yue, F., Cheng, Y., Breschi, A., Vierstra, J., Wu, W., Ryba, T., Sandstrom, R., Ma, Z., Davis, C., Pope, B. D., Shen, Y., Pervouchine, D. D., Djebali, S., Thurman, R. E., Kaul, R., Rynes, E., Kirilusha, A., Marinov, G. K., Williams, B. A., Trout, D., Amrhein, H., Fisher-Aylor, K., Antoshechkin, I., DeSalvo, G., See, L., Fastuca, M., Drenkow, J., Zaleski, C., Dobin, A., Prieto, P., Lagarde, J., Bussotti, G., Tanzer, A., Denas, O., Li, K., Bender, M. A., Zhang, M., Byron, R., Groudine, M. T., McCleary, D., Pham, L., Ye, Z., Kuan, S., Edsall, L., Wu, Y., Rasmussen, M. D., Bansal, M. S., Kellis, M., Keller, C. A., Morrissey, C. S., Mishra, T., Jain, D., Dogan, N., Harris, R. S., Cayting, P., Kawli, T., Boyle, A. P., Euskirchen, G., Kundaje, A., Lin, S., Lin, Y., Jansen, C., Malladi, V. S., Cline, M. S., Erickson, D. T., Kirkup, V. M., Learned, K., Sloan, C. A., Rosenbloom, K. R., De Sousa, B. L., Beal, K., Pignatelli, M., Flicek, P., Lian, J., Kahveci, T., Lee, D., Kent, W. J., Santos, M. R., Herrero, J., Notredame, C., Johnson, A., Vong, S., Lee, K., Bates, D., Neri, F., Diegel, M., Canfield, T., Sabo, P. J., Wilken, M. S., Reh, T. A., Giste, E., Shafer, A., Kutyavin, T., Haugen, E., Dunn, D., Reynolds, A. P., Neph, S., Humbert, R., Hansen, R. S., de Bruijn, M., Selleri, L., Rudensky, A., Josefowicz, S., Samstein, R., Eichler, E. E., Orkin, S. H., Levasseur, D., Papayannopoulou, T., Chang, K., Skoultchi, A., Gosh, S., Disteche, C., Treuting, P., Wang, Y., Weiss, M. J., Blobel, G. A., Cao, X., Zhong, S., Wang, T., Good, P. J., Lowdon, R. F., Adams, L. B., Zhou, X., Pazin, M. J., Feingold, E. A., Wold, B., Taylor, J., Mortazavi, A., Weissman, S. M., Stamatoyannopoulos, J. A., Snyder, M. P., Guigo, R., Gingeras, T. R., Gilbert, D. M., Hardison, R. C., Beer, M. A., Ren, B. 2014; 515 (7527): 355-?

    Abstract

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

    View details for DOI 10.1038/nature13992

    View details for Web of Science ID 000345770600034

    View details for PubMedID 25409824

  • Topologically associating domains are stable units of replication-timing regulation. Nature Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., Ren, B., Gilbert, D. M. 2014; 515 (7527): 402-405

    Abstract

    Eukaryotic chromosomes replicate in a temporal order known as the replication-timing program. In mammals, replication timing is cell-type-specific with at least half the genome switching replication timing during development, primarily in units of 400-800 kilobases ('replication domains'), whose positions are preserved in different cell types, conserved between species, and appear to confine long-range effects of chromosome rearrangements. Early and late replication correlate, respectively, with open and closed three-dimensional chromatin compartments identified by high-resolution chromosome conformation capture (Hi-C), and, to a lesser extent, late replication correlates with lamina-associated domains (LADs). Recent Hi-C mapping has unveiled substructure within chromatin compartments called topologically associating domains (TADs) that are largely conserved in their positions between cell types and are similar in size to replication domains. However, TADs can be further sub-stratified into smaller domains, challenging the significance of structures at any particular scale. Moreover, attempts to reconcile TADs and LADs to replication-timing data have not revealed a common, underlying domain structure. Here we localize boundaries of replication domains to the early-replicating border of replication-timing transitions and map their positions in 18 human and 13 mouse cell types. We demonstrate that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment. Moreover, cell-type-specific replication timing of TADs partitions the genome into two large-scale sub-nuclear compartments revealing that replication-timing transitions are indistinguishable from late-replicating regions in chromatin composition and lamina association and accounting for the reduced correlation of replication timing to LADs and heterochromatin. Our results reconcile cell-type-specific sub-nuclear compartmentalization and replication timing with developmentally stable structural domains and offer a unified model for large-scale chromosome structure and function.

    View details for DOI 10.1038/nature13986

    View details for PubMedID 25409831

  • Principles of regulatory information conservation between mouse and human NATURE Cheng, Y., Ma, Z., Kim, B., Wu, W., Cayting, P., Boyle, A. P., Sundaram, V., Xing, X., Dogan, N., Li, J., Euskirchen, G., Lin, S., Lin, Y., Visel, A., Kawli, T., Yang, X., Patacsil, D., Keller, C. A., Giardine, B., Kundaje, A., Wang, T., Pennacchio, L. A., Weng, Z., Hardison, R. C., Snyder, M. P. 2014; 515 (7527): 371-375

    Abstract

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

    View details for DOI 10.1038/nature13985

    View details for Web of Science ID 000345770600036

    View details for PubMedCentralID PMC4343047

  • Topologically associating domains are stable units of replication-timing regulation NATURE Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., Ren, B., Gilbert, D. M. 2014; 515 (7527): 402-405

    View details for DOI 10.1038/nature13986

    View details for Web of Science ID 000345770600043

    View details for PubMedCentralID PMC4251741

  • Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway GENETICS IN MEDICINE Enns, G. M., Shashi, V., Bainbridge, M., Gambello, M. J., Zahir, F. R., Bast, T., Crimian, R., Schoch, K., Platt, J., Cox, R., Bernstein, J. A., Scavina, M., Walter, R. S., Bibb, A., Jones, M., Hegde, M., Graham, B. H., Need, A. C., Oviedo, A., Schaaf, C. P., Boyle, S., Butte, A. J., Chen, R., Clark, M. J., Haraksingh, R., Cowan, T. M., He, P., Langlois, S., Zoghbi, H. Y., Snyder, M., Gibbs, R. A., Freeze, H. H., Goldstein, D. B. 2014; 16 (10): 751-758

    Abstract

    Purpose: The endoplasmic reticulum-associated degradation pathway is responsible for the translocation of misfolded proteins across the endoplasmic reticulum membrane into the cytosol for subsequent degradation by the proteasome. To define the phenotype associated with a novel inherited disorder of cytosolic endoplasmic reticulum-associated degradation pathway dysfunction, we studied a series of eight patients with deficiency of N-glycanase 1. Methods: Whole-genome, whole-exome, or standard Sanger sequencing techniques were employed. Retrospective chart reviews were performed in order to obtain clinical data. Results: All patients had global developmental delay, a movement disorder, and hypotonia. Other common findings included hypolacrima or alacrima (7/8), elevated liver transaminases (6/7), microcephaly (6/8), diminished reflexes (6/8), hepatocyte cytoplasmic storage material or vacuolization (5/6), and seizures (4/8). The nonsense mutation c.1201A>T (p.R401X) was the most common deleterious allele. Conclusion: NGLY1 deficiency is a novel autosomal recessive disorder of the endoplasmic reticulum-associated degradation pathway associated with neurological dysfunction, abnormal tear production, and liver disease. The majority of patients detected to date carry a specific nonsense mutation that appears to be associated with severe disease. The phenotypic spectrum is likely to enlarge as cases with a broader range of mutations are detected. Genet Med advance online publication 20 March 2014. Genetics in Medicine (2014); doi:10.1038/gim.2014.22.

    View details for DOI 10.1038/gim.2014.22

    View details for Web of Science ID 000342884500005

    View details for PubMedID 24651605

  • Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures. Bioinformatics Phanstiel, D. H., Boyle, A. P., Araya, C. L., Snyder, M. P. 2014; 30 (19): 2808-2810

    Abstract

    Motivation: Interpretation and communication of genomic data require flexible and quantitative tools to analyze and visualize diverse data types, and yet, a comprehensive tool to display all common genomic data types in publication quality figures does not exist to date. To address this shortcoming, we present Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visualizations into highly customizable, publication-ready, multi-panel figures from common genomic data formats including Browser Extensible Data (BED), bedGraph and Browser Extensible Data Paired-End (BEDPE). Sushi.R is open source and made publicly available through GitHub (https://github.com/dphansti/Sushi) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/Sushi.html). Contact: mpsnyder@stanford.edu or dphansti@stanford.edu.

    View details for DOI 10.1093/bioinformatics/btu379

    View details for PubMedID 24903420

    View details for Web of Science ID 000343082900016

  • Comparative analysis of regulatory information and circuits across distant species. Nature Boyle, A. P., Araya, C. L., Brdlik, C., Cayting, P., Cheng, C., Cheng, Y., Gardner, K., Hillier, L. W., Janette, J., Jiang, L., Kasper, D., Kawli, T., Kheradpour, P., Kundaje, A., Li, J. J., Ma, L., Niu, W., Rehm, E. J., Rozowsky, J., Slattery, M., Spokony, R., Terrell, R., Vafeados, D., Wang, D., Weisdepp, P., Wu, Y., Xie, D., Yan, K., Feingold, E. A., Good, P. J., Pazin, M. J., Huang, H., Bickel, P. J., Brenner, S. E., Reinke, V., Waterston, R. H., Gerstein, M., White, K. P., Kellis, M., Snyder, M. 2014; 512 (7515): 453-456

    Abstract

    Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.

    View details for DOI 10.1038/nature13668

    View details for PubMedID 25164757

  • Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature Araya, C. L., Kawli, T., Kundaje, A., Jiang, L., Wu, B., Vafeados, D., Terrell, R., Weissdepp, P., Gevirtzman, L., Mace, D., Niu, W., Boyle, A. P., Xie, D., Ma, L., Murray, J. I., Reinke, V., Waterston, R. H., Snyder, M. 2014; 512 (7515): 400-405

    Abstract

    Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.

    View details for DOI 10.1038/nature13497

    View details for PubMedID 25164749

  • Shared functions of plant and mammalian StAR-related lipid transfer (START) domains in modulating transcription factor activity BMC BIOLOGY Schrick, K., Bruno, M., Khosla, A., Cox, P. N., Marlatt, S. A., Roque, R. A., Nguyen, H. C., He, C., Snyder, M. P., Singh, D., Yadav, G. 2014; 12

    Abstract

    Steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains were first identified from mammalian proteins that bind lipid/sterol ligands via a hydrophobic pocket. In plants, predicted START domains are predominantly found in homeodomain leucine zipper (HD-Zip) transcription factors that are master regulators of cell-type differentiation in development. Here we utilized studies of Arabidopsis in parallel with heterologous expression of START domains in yeast to investigate the hypothesis that START domains are versatile ligand-binding motifs that can modulate transcription factor activity. Our results show that deletion of the START domain from Arabidopsis Glabra2 (GL2), a representative HD-Zip transcription factor involved in differentiation of the epidermis, results in a complete loss-of-function phenotype, although the protein is correctly localized to the nucleus. Despite low sequence similarity, the mammalian START domain from StAR can functionally replace the HD-Zip-derived START domain. Embedding the START domain within a synthetic transcription factor in yeast, we found that several mammalian START domains from StAR, MLN64 and PCTP stimulated transcription factor activity, as did START domains from two Arabidopsis HD-Zip transcription factors. Mutation of ligand-binding residues within StAR START reduced this activity, consistent with the yeast assay monitoring ligand-binding. The D182L missense mutation in StAR START was shown to affect GL2 transcription factor activity in maintenance of the leaf trichome cell fate. Analysis of in vivo protein-metabolite interactions by mass spectrometry provided direct evidence for analogous lipid-binding activity in mammalian and plant START domains in the yeast system. Structural modeling predicted similar sized ligand-binding cavities of a subset of plant START domains in comparison to mammalian counterparts. The START domain is required for transcription factor activity in HD-Zip proteins from plants, although it is not strictly necessary for the protein's nuclear localization. START domains from both mammals and plants are modular in that they can bind lipid ligands to regulate transcription factor function in a yeast system. The data provide evidence for an evolutionarily conserved mechanism by which lipid metabolites can orchestrate transcription. We propose a model in which the START domain is used by both plants and mammals to regulate transcription factor activity.

    View details for DOI 10.1186/s12915-014-0070-8

    View details for Web of Science ID 000342371100001

    View details for PubMedID 25159688

    View details for PubMedCentralID PMC4169639

  • Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (33): E3366-E3366

    View details for DOI 10.1073/pnas.1410434111

    View details for Web of Science ID 000340438800004

    View details for PubMedID 25275169

    View details for PubMedCentralID PMC4143047

  • Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS genetics Martin, A. R., Costa, H. A., Lappalainen, T., Henn, B. M., Kidd, J. M., Yee, M., Grubert, F., Cann, H. M., Snyder, M., Montgomery, S. B., Bustamante, C. D. 2014; 10 (8)

    Abstract

    Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.

    View details for DOI 10.1371/journal.pgen.1004549

    View details for PubMedID 25121757

  • H3K4me3 Breadth Is Linked to Cell Identity and Transcriptional Consistency. Cell Benayoun, B. A., Pollina, E. A., Ucar, D., Mahmoudi, S., Karra, K., Wong, E. D., Devarajan, K., Daugherty, A. C., Kundaje, A. B., Mancini, E., Hitz, B. C., Gupta, R., Rando, T. A., Baker, J. C., Snyder, M. P., Cherry, J. M., Brunet, A. 2014; 158 (3): 673-688

    Abstract

    Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites of active genes. Here, we show that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type. Using the broadest H3K4me3 domains as a discovery tool in neural progenitor cells, we identify novel regulators of these cells. Machine learning models reveal that the broadest H3K4me3 domains represent a distinct entity, characterized by increased marks of elongation. The broadest H3K4me3 domains also have more paused polymerase at their promoters, suggesting a unique transcriptional output. Indeed, genes marked by the broadest H3K4me3 domains exhibit enhanced transcriptional consistency and increased transcriptional levels, and perturbation of H3K4me3 breadth leads to changes in transcriptional consistency. Thus, H3K4me3 breadth contains information that could ensure transcriptional precision at key cell identity/function genes.

    View details for DOI 10.1016/j.cell.2014.06.027

    View details for PubMedID 25083876

  • Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proceedings of the National Academy of Sciences of the United States of America Tilgner, H., Grubert, F., Sharon, D., Snyder, M. P. 2014; 111 (27): 9869-9874

    Abstract

    Personal transcriptomes in which all of an individual's genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV, in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.

    View details for DOI 10.1073/pnas.1400447111

    View details for PubMedID 24961374

  • Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway (vol 111, pg 236, 2014) GENETICS IN MEDICINE Enns, G. M., Shashi, V., Bainbridge, M., Gambello, M. J., Zahir, F. R., Bast, T., Crimian, R., Schoch, K., Platt, J., Cox, R., Bernstein, J. A., Scavina, M., Walter, R. S., Bibb, A., Jones, M., Hegde, M., Graham, B. H., Need, A. C., Oviedo, A., Schaaf, C. P., Boyle, S., Butte, A. J., Chen, R., Chen, R., Clark, M. J., Haraksingh, R., Cowan, T. M., He, P., Langlois, S., Zoghbi, H. Y., Snyder, M., Gibbs, R. A., Freeze, H. H., Goldstein, D. B., Chen, R., FORGE Canada Consortium 2014; 16 (7): 568

  • Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nature biotechnology Buenrostro, J. D., Araya, C. L., Chircus, L. M., Layton, C. J., Chang, H. Y., Snyder, M. P., Greenleaf, W. J. 2014; 32 (6): 562-568

    Abstract

    RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10^7 RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.

    View details for DOI 10.1038/nbt.2880

    View details for PubMedID 24727714

  • Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature Treutlein, B., Brownfield, D. G., Wu, A. R., Neff, N. F., Mantalas, G. L., Espinoza, F. H., Desai, T. J., Krasnow, M. A., Quake, S. R. 2014; 509 (7500): 371-375

    View details for DOI 10.1038/nature13173

    View details for PubMedID 24739965

  • Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues PLOS GENETICS Kukurba, K. R., Zhang, R., Li, X., Smith, K. S., Knowles, D. A., Tan, M. H., Piskol, R., Lek, M., Snyder, M., MacArthur, D. G., Li, J. B., Montgomery, S. B. 2014; 10 (5)

    Abstract

    Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.

    View details for DOI 10.1371/journal.pgen.1004304

    View details for Web of Science ID 000337145100010

    View details for PubMedID 24786518

  • Defining functional DNA elements in the human genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J. A., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (17): 6131-6138

    Abstract

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

    View details for DOI 10.1073/pnas.1318948111

    View details for Web of Science ID 000335199000025

    View details for PubMedID 24753594

    View details for PubMedCentralID PMC4035993

  • Extended lifespan and reduced adiposity in mice lacking the FAT10 gene. Proceedings of the National Academy of Sciences of the United States of America Canaan, A., DeFuria, J., Perelman, E., Schultz, V., Seay, M., Tuck, D., Flavell, R. A., Snyder, M. P., Obin, M. S., Weissman, S. M. 2014; 111 (14): 5313-5318

    Abstract

    The HLA-F adjacent transcript 10 (FAT10) is a member of the ubiquitin-like gene family that alters protein function/stability through covalent ligation. Although FAT10 is induced by inflammatory mediators and implicated in immunity, the physiological functions of FAT10 are poorly defined. We report the discovery that FAT10 regulates lifespan through pleiotropic actions on metabolism and inflammation. Median and overall lifespan are increased 20% in FAT10ko mice, coincident with elevated metabolic rate, preferential use of fat as fuel, and dramatically reduced adiposity. This phenotype is associated with metabolic reprogramming of skeletal muscle (i.e., increased AMP kinase activity, β-oxidation and -uncoupling, and decreased triglyceride content). Moreover, knockout mice have reduced circulating glucose and insulin levels and enhanced insulin sensitivity in metabolic tissues, consistent with elevated IL-10 in skeletal muscle and serum. These observations suggest novel roles of FAT10 in immune metabolic regulation that impact aging and chronic disease.

    View details for DOI 10.1073/pnas.1323426111

    View details for PubMedID 24706839

    View details for PubMedCentralID PMC3986194

  • Haplotype structure and positive selection at TLR1 EUROPEAN JOURNAL OF HUMAN GENETICS Heffelfinger, C., Pakstis, A. J., Speed, W. C., Clark, A. P., Haigh, E., Fang, R., Furtado, M. R., Kidd, K. K., Snyder, M. P. 2014; 22 (4): 551-557

    Abstract

    Toll-like receptor 1, when dimerized with Toll-like receptor 2, is a cell surface receptor that, upon recognition of bacterial lipoproteins, activates the innate immune system. Variants in TLR1 associate with the risk of a variety of medical conditions and diseases, including sepsis, leprosy, tuberculosis, and others. The foremost of these is rs5743618 c.2079T>G(p.(Ile602Ser)), the derived allele of which is associated with reduced risk of sepsis, leprosy, and other diseases. Interestingly, 602Ser, which shows signatures of selection, inhibits TLR1 surface trafficking and subsequent activation of NFκB upon recognition of a ligand. This suggests that reduced TLR1 activity may be beneficial for human health. To better understand TLR1 variation and its link to human health, we have typed all 7 high-frequency missense variants (>5% in at least one population) along with 17 other variants in and around TLR1 in 2548 individuals from 56 populations from around the globe. We have also found additional signatures of selection on missense variants not associated with rs5743618, suggesting that there may be multiple functional alleles under positive selection in this gene.

    View details for DOI 10.1038/ejhg.2013.194

    View details for Web of Science ID 000332938400027

    View details for PubMedID 24002163

    View details for PubMedCentralID PMC3953919

  • Clinical interpretation and implications of whole-genome sequencing. JAMA Dewey, F. E., Grove, M. E., Pan, C., Goldstein, B. A., Bernstein, J. A., Chaib, H., Merker, J. D., Goldfeder, R. L., Enns, G. M., David, S. P., Pakdaman, N., Ormond, K. E., Caleshu, C., Kingham, K., Klein, T. E., Whirl-Carrillo, M., Sakamoto, K., Wheeler, M. T., Butte, A. J., Ford, J. M., Boxer, L., Ioannidis, J. P., Yeung, A. C., Altman, R. B., Assimes, T. L., Snyder, M., Ashley, E. A., Quertermous, T. 2014; 311 (10): 1035-1045

    Abstract

    Importance: Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. Objectives: To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. Design, Setting, and Participants: An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Main Outcomes and Measures: Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Results: Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). Conclusions and Relevance: In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.

    View details for DOI 10.1001/jama.2014.1717

    View details for PubMedID 24618965

  • Erratum: A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2014; 32 (3): 291-?

    View details for DOI 10.1038/nbt0314-291b

    View details for PubMedID 24727780

  • Gene-centric Meta-analysis in 87,736 Individuals of European Ancestry Identifies Multiple Blood-Pressure-Related Loci. American journal of human genetics Tragante, V., Barnes, M. R., Ganesh, S. K., Lanktree, M. B., Guo, W., Franceschini, N., Smith, E. N., Johnson, T., Holmes, M. V., Padmanabhan, S., Karczewski, K. J., Almoguera, B., Barnard, J., Baumert, J., Chang, Y. C., Elbers, C. C., Farrall, M., Fischer, M. E., Gaunt, T. R., Gho, J. M., Gieger, C., Goel, A., Gong, Y., Isaacs, A., Kleber, M. E., Mateo Leach, I., McDonough, C. W., Meijs, M. F., Melander, O., Nelson, C. P., Nolte, I. M., Pankratz, N., Price, T. S., Shaffer, J., Shah, S., Tomaszewski, M., van der Most, P. J., van Iperen, E. P., Vonk, J. M., Witkowska, K., Wong, C. O., Zhang, L., Beitelshees, A. L., Berenson, G. S., Bhatt, D. L., Brown, M., Burt, A., Cooper-DeHoff, R. M., Connell, J. M., Cruickshanks, K. J., Curtis, S. P., Davey-Smith, G., Delles, C., Gansevoort, R. T., Guo, X., Haiqing, S., Hastie, C. E., Hofker, M. H., Hovingh, G. K., Kim, D. S., Kirkland, S. A., Klein, B. E., Klein, R., Li, Y. R., Maiwald, S., Newton-Cheh, C., O'Brien, E. T., Onland-Moret, N. C., Palmas, W., Parsa, A., Penninx, B. W., Pettinger, M., Vasan, R. S., Ranchalis, J. E., Ridker, P. M., Rose, L. M., Sever, P., Shimbo, D., Steele, L., Stolk, R. P., Thorand, B., Trip, M. D., van Duijn, C. M., Verschuren, W. M., Wijmenga, C., Wyatt, S., Young, J. H., Zwinderman, A. H., Bezzina, C. R., Boerwinkle, E., Casas, J. P., Caulfield, M. J., Chakravarti, A., Chasman, D. I., Davidson, K. W., Doevendans, P. A., Dominiczak, A. F., FitzGerald, G. A., Gums, J. G., Fornage, M., Hakonarson, H., Halder, I., Hillege, H. L., Illig, T., Jarvik, G. P., Johnson, J. A., Kastelein, J. J., Koenig, W., Kumari, M., März, W., Murray, S. S., O'Connell, J. R., Oldehinkel, A. J., Pankow, J. S., Rader, D. J., Redline, S., Reilly, M. P., Schadt, E. E., Kottke-Marchant, K., Snieder, H., Snyder, M., Stanton, A. V., Tobin, M. D., Uitterlinden, A. G., van der Harst, P., van der Schouw, Y. T., Samani, N. J., Watkins, H., Johnson, A. D., Reiner, A. P., Zhu, X., de Bakker, P. I., Levy, D., Asselbergs, F. W., Munroe, P. B., Keating, B. J. 2014; 94 (3): 349-360

    Abstract

    Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ~50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10^-7) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification.

    View details for DOI 10.1016/j.ajhg.2013.12.016

    View details for PubMedID 24560520

  • Whole-genome haplotyping using long reads and statistical methods NATURE BIOTECHNOLOGY Kuleshov, V., Xie, D., Chen, R., Pushkarev, D., Ma, Z., Blauwkamp, T., Kertesz, M., Snyder, M. 2014; 32 (3): 261-266

    Abstract

    The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.

    View details for DOI 10.1038/nbt.2833

    View details for Web of Science ID 000332819800024

    View details for PubMedID 24561555

  • Ordering and dynamical properties of superbright C-60 molecules on Ag(111) PHYSICAL REVIEW B Li, H. I., Abreu, G. J., Shukla, A. K., Fournee, V., Ledieu, J., Loli, L. N., Rauterkus, S. E., Snyder, M. V., Su, S. Y., Marino, K. E., Diehl, R. D. 2014; 89 (8)
  • Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association. PLoS genetics Karczewski, K. J., Snyder, M., Altman, R. B., Tatonetti, N. P. 2014; 10 (2)

    Abstract

    Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.

    View details for DOI 10.1371/journal.pgen.1004122

    View details for PubMedID 24516403

    View details for PubMedCentralID PMC3916285

  • Landscape and variation of RNA secondary structure across the human transcriptome. Nature Wan, Y., Qu, K., Zhang, Q. C., Flynn, R. A., Manor, O., Ouyang, Z., Zhang, J., Spitale, R. C., Snyder, M. P., Segal, E., Chang, H. Y. 2014; 505 (7485): 706-709

    Abstract

    In parallel to the genetic code for protein synthesis, a second layer of information is embedded in all RNA transcripts in the form of RNA structure. RNA structure influences practically every step in the gene expression program. However, the nature of most RNA structures or effects of sequence variation on structure are not known. Here we report the initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child). This provides a comprehensive RSS map of human coding and non-coding RNAs. We identify unique RSS signatures that demarcate open reading frames and splicing junctions, and define authentic microRNA-binding sites. Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS information is encoded within RNA sequence. Over 1,900 transcribed single nucleotide variants (approximately 15% of all transcribed single nucleotide variants) alter local RNA structure. We discover simple sequence and spacing rules that determine the ability of point mutations to impact RSSs. Selective depletion of 'riboSNitches' versus structurally synonymous variants at precise locations suggests selection for specific RNA shapes at thousands of sites, including 3' untranslated regions, binding sites of microRNAs and RNA-binding proteins genome-wide. These results highlight the potentially broad contribution of RNA structure and its variation to gene regulation.

    View details for DOI 10.1038/nature12946

    View details for PubMedID 24476892

  • iPOP and its role in participatory medicine GENOME MEDICINE Snyder, M. 2014; 6

    Abstract

    Michael Snyder shares his thoughts on participatory medicine and how omics profiling could fit into this new model of healthcare where patients are at the center of medicine.

    View details for DOI 10.1186/gm512

    View details for Web of Science ID 000335597000001

    View details for PubMedID 24479626

    View details for PubMedCentralID PMC3978943

  • Identification of STAT5A and STAT5B Target Genes in Human T Cells. PloS one Kanai, T., Seki, S., Jenks, J. A., Kohli, A., Kawli, T., Martin, D. P., Snyder, M., Bacchetta, R., Nadeau, K. C. 2014; 9 (1)

    View details for DOI 10.1371/journal.pone.0086790

    View details for PubMedID 24497979

  • Path-scan: a reporting tool for identifying clinically actionable variants. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Daneshjou, R., Zappala, Z., Kukurba, K., Boyle, S. M., Ormond, K. E., Klein, T. E., Snyder, M., Bustamante, C. D., Altman, R. B., Montgomery, S. B. 2014; 19: 229-240

    Abstract

    The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowd source the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.

    View details for PubMedID 24297550

  • Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY Kolker, E., Ozdemir, V., Martens, L., Hancock, W., Anderson, G., Anderson, N., Aynacioglu, S., Baranova, A., Campagna, S. R., Chen, R., Choiniere, J., Dearth, S. P., Feng, W., Ferguson, L., Fox, G., Frishman, D., Grossman, R., Heath, A., Higdon, R., Hutz, M. H., Janko, I., Jiang, L., Joshi, S., Kel, A., Kemnitz, J. W., Kohane, I. S., Kolker, N., Lancet, D., Lee, E., Li, W., Lisitsa, A., Llerena, A., Macnealy-Koch, C., Marshall, J., Masuzzo, P., May, A., Mias, G., Monroe, M., Montague, E., Mooney, S., Nesvizhskii, A., Noronha, S., Omenn, G., Rajasimha, H., Ramamoorthy, P., Sheehan, J., Smarr, L., Smith, C. V., Smith, T., Snyder, M., Rapole, S., Srivastava, S., Stanberry, L., Stewart, E., Toppo, S., Uetz, P., Verheggen, K., Voy, B. H., Warnich, L., Wilhelm, S. W., Yandl, G. 2014; 18 (1): 10-14

    Abstract

    Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

    View details for DOI 10.1089/omi.2013.0149

    View details for Web of Science ID 000331085100002

    View details for PubMedID 24456465

    View details for PubMedCentralID PMC3903324

  • Personalized sequencing and the future of medicine: discovery, diagnosis and defeat of disease. Pharmacogenomics Esplin, E. D., Oei, L., Snyder, M. P. 2014; 15 (14): 1771–90

    Abstract

    The potential for personalized sequencing to individually optimize medical treatment in diseases such as cancer and for pharmacogenomic application is just beginning to be realized, and the utility of sequencing healthy individuals for managing health is also being explored. The data produced requires additional advancements in interpretation of variants of unknown significance to maximize clinical benefit. Nevertheless, personalized sequencing, only recently applied to clinical medicine, has already been broadly applied to the discovery and study of disease. It is poised to enable the earlier and more accurate diagnosis of disease risk and occurrence, guide prevention and individualized intervention as well as facilitate monitoring of healthy and treated patients, and play a role in the prevention and recurrence of future disease. This article documents the advancing capacity of personalized sequencing, reviews its impact on disease-oriented scientific discovery and anticipates its role in the future of medicine.

    View details for PubMedID 25493570

  • Metadata Checklist for the Integrated Personal OMICS Study: Proteomics and Metabolomics Experiments OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY Snyder, M., Mias, G., Stanberry, L., Kolker, E. 2014; 18 (1): 81–85

    View details for PubMedID 24456466

    View details for PubMedCentralID PMC3903326

  • Identification of STAT5A and STAT5B target genes in human T cells. PloS one Kanai, T., Seki, S., Jenks, J. A., Kohli, A., Kawli, T., Martin, D. P., Snyder, M., Bacchetta, R., Nadeau, K. C. 2014; 9 (1)

    Abstract

    Signal transducer and activator of transcription (STAT) comprises a family of universal transcription factors that help cells sense and respond to environmental signals. STAT5 refers to two highly related proteins, STAT5A and STAT5B, with critical function: their complete deficiency is lethal in mice; in humans, STAT5B deficiency alone leads to endocrine and immunological problems, while STAT5A deficiency has not been reported. STAT5A and STAT5B show peptide sequence similarities greater than 90%, but subtle structural differences suggest possible non-redundant roles in gene regulation. However, these roles remain unclear in humans. We applied chromatin immunoprecipitation followed by DNA sequencing using human CD4(+) T cells to detect candidate genes regulated by STAT5A and/or STAT5B, and quantitative-PCR in STAT5A or STAT5B knock-down (KD) human CD4(+) T cells to validate the findings. Our data show STAT5A and STAT5B play redundant roles in cell proliferation and apoptosis via SGK1 interaction. Interestingly, we found a novel, unique role for STAT5A in binding to genes involved in neural development and function (NDRG1, DNAJC6, and SSH2), while STAT5B appears to play a distinct role in T cell development and function via DOCK8, SNX9, FOXP3 and IL2RA binding. Our results also suggest that one or more co-activators for STAT5A and/or STAT5B may play important roles in establishing different binding abilities and gene regulation behaviors. The new identification of these genes regulated by STAT5A and/or STAT5B has major implications for understanding the pathophysiology of cancer progression, neural disorders, and immune abnormalities.

    View details for DOI 10.1371/journal.pone.0086790

    View details for PubMedID 24497979

    View details for PubMedCentralID PMC3907443

  • Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss. BMC genomics Haraksingh, R. R., Jahanbani, F., Rodriguez-Paris, J., Gelernter, J., Nadeau, K. C., Oghalai, J. S., Schrijver, I., Snyder, M. P. 2014; 15: 1155-?

    Abstract

    The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45 x 10^-7). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.

    View details for DOI 10.1186/1471-2164-15-1155

    View details for PubMedID 25528277

  • Global analysis of transcription factor-binding sites in yeast using ChIP-Seq. Methods in molecular biology (Clifton, N.J.) Lefrançois, P., Gallagher, J. E., Snyder, M. 2014; 1205: 231-255

    Abstract

    Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28-36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy.

    View details for DOI 10.1007/978-1-4939-1363-3_15

    View details for PubMedID 25213249

  • Strain Kaplan of Pseudorabies Virus Genome Sequenced by PacBio Single-Molecule Real-Time Sequencing Technology. Genome announcements Tombácz, D., Sharon, D., Oláh, P., Csabai, Z., Snyder, M., Boldogkoi, Z. 2014; 2 (4)

    Abstract

    Pseudorabies virus (PRV) is a neurotropic herpesvirus that causes Aujeszky's disease in pigs. PRV strains are widely used as transsynaptic tracers for mapping neural circuits. We present here the complete and fully annotated genome sequence of strain Kaplan of PRV, determined by Pacific Biosciences RSII long-read sequencing technology.

    View details for DOI 10.1128/genomeA.00628-14

    View details for PubMedID 25035325

    View details for PubMedCentralID PMC4102862

  • Serum profiling using protein microarrays to identify disease related antigens. Methods in molecular biology (Clifton, N.J.) Sharon, D., Snyder, M. 2014; 1176: 169-178

    Abstract

    Disease related antigens are of great importance in the clinic. They are used as markers to screen patients for various forms of cancer, to monitor response to therapy, or to serve as therapeutic targets (Chapman et al., Ann Oncol 18(5):868-873, 2007; Soussi et al., Cancer Res 60:1777-1788, 2000; Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Levenson, Biochim Biophys Acta 1770:847-856, 2007). In cancer, endogenous levels of protein expression may be disrupted or proteins may be expressed in an aberrant fashion, resulting in an immune response that bypasses self tolerance (Soussi et al., Cancer Res 60:1777-1788, 2000; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Molina et al., Breast Cancer Res Treat 51:109-119, 1998). Protein microarrays, which represent a large fraction of the human proteome, have been used to identify antigens in multiple diseases including cancer (Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499, 2007; Beyer et al., J Neuroimmunol 242:26-32, 2012). Typically, arrays are probed with immunoglobulin (Ig) samples from patients as well as healthy controls, then compared to determine which antigens (Ag's) are more reactive within the patient group (Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499, 2007).

    View details for DOI 10.1007/978-1-4939-0992-6_14

    View details for PubMedID 25030927

    View details for PubMedCentralID PMC4420618

  • Personalized sequencing and the future of medicine: discovery, diagnosis and defeat of disease PHARMACOGENOMICS Esplin, E. D., Oei, L., Snyder, M. P. 2014; 15 (14): 1771-1790

    Abstract

    The potential for personalized sequencing to individually optimize medical treatment in diseases such as cancer and for pharmacogenomic application is just beginning to be realized, and the utility of sequencing healthy individuals for managing health is also being explored. The data produced requires additional advancements in interpretation of variants of unknown significance to maximize clinical benefit. Nevertheless, personalized sequencing, only recently applied to clinical medicine, has already been broadly applied to the discovery and study of disease. It is poised to enable the earlier and more accurate diagnosis of disease risk and occurrence, guide prevention and individualized intervention as well as facilitate monitoring of healthy and treated patients, and play a role in the prevention and recurrence of future disease. This article documents the advancing capacity of personalized sequencing, reviews its impact on disease-oriented scientific discovery and anticipates its role in the future of medicine.

    View details for DOI 10.2217/pgs.14.117

    View details for Web of Science ID 000346180100006

    View details for PubMedCentralID PMC4336568

  • STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud. PloS one Karczewski, K. J., Fernald, G. H., Martin, A. R., Snyder, M., Tatonetti, N. P., Dudley, J. T. 2014; 9 (1)

    Abstract

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.

    View details for DOI 10.1371/journal.pone.0084860

    View details for PubMedID 24454756

  • Distinct Splice Variants and Pathway Enrichment in the Cell-Line Models of Aggressive Human Breast Cancer Subtypes JOURNAL OF PROTEOME RESEARCH Menon, R., Im, H., Zhang, E. (., Wu, S., Chen, R., Snyder, M., Hancock, W. S., Omenn, G. S. 2014; 13 (1): 212-227

    Abstract

    This study was conducted as a part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes in chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes; it contains many cancer-associated genes, including BRCA1, ERBB2 (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2-expressing breast cancer cell-line models of hormone-receptor-negative breast cancers by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic subtypes: SKBR3 (ERBB2+ (overexpression)/ER-/PR-; adenocarcinoma), SUM190 (ERBB2+ (overexpression)/ER-/PR-; inflammatory breast cancer), and SUM149 (ERBB2 (low expression) ER-/PR-; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes that are in the signaling pathways downstream of ERBB2 along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated more similarities between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell-line specific variants pointed to distinct key mechanisms including: amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids in SUM190; cell-to-cell adhesion, integrin, and ERK1/ERK2 signaling; and translational control in SUM149. The analyses indicated an enrichment in the electron transport chain processes in the ERBB2 overexpressed cell line models and an association of nucleotide binding, RNA splicing, and translation processes with the IBC models, SUM190 and SUM149. Detailed experimental studies on the distinct variants identified from each of these three breast cancer cell line models may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.

    View details for DOI 10.1021/pr400773v

    View details for Web of Science ID 000329472700022

    View details for PubMedID 24111759

  • Chromatin immunoprecipitation and multiplex sequencing (ChIP-Seq) to identify global transcription factor binding sites in the nematode Caenorhabditis elegans. Methods in enzymology Brdlik, C. M., Niu, W., Snyder, M. 2014; 539: 89-111

    Abstract

    The global identification of transcription factor (TF) binding sites is a critical step in the elucidation of the functional elements of the genome. Several methods have been developed that map TF binding in human cells, yeast, and other model organisms. These methods make use of chromatin immunoprecipitation, or ChIP, and take advantage of the fact that formaldehyde fixation of living cells can be used to cross-link DNA sequences to the TFs that bind them in vivo. In ChIP, the cross-linked TF-DNA complexes are sheared by sonication, size fractionated, and incubated with antibody specific to the TF of interest to generate a library of TF-bound DNA sequences. ChIP-chip was the first technology developed to globally identify TF-bound DNA sequences and involves subsequent hybridization of the ChIP DNA to oligonucleotide microarrays. However, ChIP-chip proved to be costly, labor-intensive, and limited by the fixed number of probes available on the microarray chip. ChIP-Seq combines ChIP with massively parallel high-throughput sequencing (see Explanatory Chapter: Next Generation Sequencing) and has demonstrated vast improvement over ChIP-chip with respect to time and cost, signal-to-noise ratio, and resolution. In particular, multiplex sequencing can be used to achieve a higher throughput in ChIP-Seq analyses involving organisms with genomes of lower complexity than that of human (Lefrançois et al., 2009) and thereby reduce the cost and amount of time needed for each result. The multiplex ChIP-Seq method described in this section has been developed for Caenorhabditis elegans, but is easily adaptable for other organisms.

    View details for DOI 10.1016/B978-0-12-420120-0.00007-4

    View details for PubMedID 24581441

  • STAT3 Targets Suggest Mechanisms of Aggressive Tumorigenesis in Diffuse Large B-Cell Lymphoma G3-GENES GENOMES GENETICS Hardee, J., Ouyang, Z., Zhang, Y., Kundaje, A., Lacroute, P., Snyder, M. 2013; 3 (12): 2173-2185

    Abstract

    The signal transducer and activator of transcription 3 (STAT3) is a transcription factor that, when dysregulated, becomes a powerful oncogene found in many human cancers, including diffuse large B-cell lymphoma. Diffuse large B-cell lymphoma is the most common form of non-Hodgkin's lymphoma and has two major subtypes: germinal center B-cell-like and activated B-cell-like. Compared with the germinal center B-cell-like form, activated B-cell-like lymphomas respond much more poorly to current therapies and often exhibit overexpression or overactivation of STAT3. To investigate how STAT3 might contribute to this aggressive phenotype, we have integrated genome-wide studies of STAT3 DNA binding using chromatin immunoprecipitation-sequencing with whole-transcriptome profiling using RNA-sequencing. STAT3 binding sites are present near almost a third of all genes that differ in expression between the two subtypes, and examination of the affected genes identified previously undetected and clinically significant pathways downstream of STAT3 that drive oncogenesis. Novel treatments aimed at these pathways may increase the survivability of activated B-cell-like diffuse large B-cell lymphoma.

    View details for DOI 10.1534/g3.113.007674

    View details for PubMedID 24142927

  • Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications. Big data Kolker, E., Özdemir, V., Martens, L., Hancock, W., Anderson, G., Anderson, N., Aynacioglu, S., Baranova, A., Campagna, S. R., Chen, R., Choiniere, J., Dearth, S. P., Feng, W., Ferguson, L., Fox, G., Frishman, D., Grossman, R., Heath, A., Higdon, R., Hutz, M. H., Janko, I., Jiang, L., Joshi, S., Kel, A., Kemnitz, J. W., Kohane, I. S., Kolker, N., Lancet, D., Lee, E., Li, W., Lisitsa, A., Llerena, A., Macnealy-Koch, C., Marshall, J., Masuzzo, P., May, A., Mias, G., Monroe, M., Montague, E., Mooney, S., Nesvizhskii, A., Noronha, S., Omenn, G., Rajasimha, H., Ramamoorthy, P., Sheehan, J., Smarr, L., Smith, C. V., Smith, T., Snyder, M., Rapole, S., Srivastava, S., Stanberry, L., Stewart, E., Toppo, S., Uetz, P., Verheggen, K., Voy, B. H., Warnich, L., Wilhelm, S. W., Yandl, G. 2013; 1 (4): 196-201

    Abstract

    Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

    View details for DOI 10.1089/big.2013.0039

    View details for PubMedID 27447251

  • Metadata Checklist for the Integrated Personal Omics Study: Proteomics and Metabolomics Experiments. Big data Snyder, M., Mias, G., Stanberry, L., Kolker, E. 2013; 1 (4): 202-206

    Abstract

    The integrative personal omics profiling study introduced a novel, integrative approach based on personalized, longitudinal, multi-omics data. The study collected genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. The results revealed various medical risks and extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. The current article is a data publication that provides the checklists for the metadata of the proteomics (see Table 1) and metabolomics (see Table 2) datasets of the study. The proposed checklist was recently developed and endorsed by the Data-Enabled Life Sciences Alliance (DELSA Global). We call for the broader use of data publications using the metadata checklist to make omics data more discoverable, interpretable, and reusable, while enabling appropriate attribution to data generators and infrastructure science builders.

    View details for DOI 10.1089/big.2013.0040

    View details for PubMedID 27447252

  • METADATA CHECKLIST FOR THE INTEGRATED PERSONAL OMICS STUDY: Proteomics and Metabolomics Experiments BIG DATA Snyder, M., Mias, G., Stanberry, L., Kolker, E. 2013; 1 (4): BD202-U81

    Abstract

    The integrative personal omics profiling study introduced a novel, integrative approach based on personalized, longitudinal, multi-omics data. The study collected genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. The results revealed various medical risks and extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. The current article is a data publication that provides the checklists for the metadata of the proteomics (see Table 1) and metabolomics (see Table 2) datasets of the study. The proposed checklist was recently developed and endorsed by the Data-Enabled Life Sciences Alliance (DELSA Global). We call for the broader use of data publications using the metadata checklist to make omics data more discoverable, interpretable, and reusable, while enabling appropriate attribution to data generators and infrastructure science builders.

    View details for DOI 10.1089/big.2013.0040

    View details for Web of Science ID 000209646300006

  • TOWARD MORE TRANSPARENT AND REPRODUCIBLE OMICS STUDIES THROUGH A COMMON METADATA CHECKLIST AND DATA PUBLICATIONS BIG DATA Kolker, E., Oezdemir, V., Martens, L., Hancock, W., Anderson, G., Anderson, N., Aynacioglu, S., Baranova, A., Campagna, S. R., Chen, R., Choiniere, J., Dearth, S. P., Feng, W., Ferguson, L., Fox, G., Frishman, D., Grossman, R., Heath, A., Higdon, R., Hutz, M. H., Janko, I., Jiang, L., Joshi, S., Kel, A., Kemnitz, J. W., Kohane, I. S., Kolker, N., Lancet, D., Lee, E., Li, W., Lisitsa, A., Llerena, A., Macnealy-Koch, C., Marshall, J., Masuzzo, P., May, A., Mias, G., Monroe, M., Montague, E., Mooney, S., Nesvizhskii, A., Noronha, S., Omenn, G., Rajasimha, H., Ramamoorthy, P., Sheehan, J., Smarr, L., Smith, C. V., Smith, T., Snyder, M., Rapole, S., Srivastava, S., Stanberry, L., Stewart, E., Toppo, S., Uetz, P., Verheggen, K., Voy, B. H., Warnich, L., Wilhelm, S. W., Yandl, G. 2013; 1 (4): BD196-?

    Abstract

    Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

    View details for DOI 10.1089/big.2013.0039

    View details for Web of Science ID 000209646300005

  • Impacts of variation in the human genome on gene regulation. Journal of molecular biology Haraksingh, R. R., Snyder, M. P. 2013; 425 (21): 3970-3977

    Abstract

    Recent advances in fast and inexpensive DNA sequencing have enabled the extensive study of genomic and transcriptomic variation in humans. Human genomic variation is composed of sequence and structural changes including single-nucleotide and multinucleotide variants, short insertions or deletions (indels), larger copy number variants, and similarly sized copy neutral inversions and translocations. It is now well established that any two genomes differ extensively and that structural changes constitute the most prominent source of this variation. There have also been major technological advances in RNA sequencing to globally quantify and describe diversity in transcripts. Large consortia such as the 1000 Genomes Project and the Encyclopedia of DNA Elements Project are producing increasingly comprehensive maps outlining the regions of the human genome containing variants and functional elements, respectively. Integration of genetic variation data and extensive annotation of functional genomic elements, along with the ability to measure global transcription, allow the impacts of genetic variants on gene expression to be resolved. There are several well-established models by which genetic variants affect gene regulation depending on the type, nature, and position of the variant with respect to the affected genes. These effects can be manifested in two ways: changes to transcript sequences and isoforms by coding variants, and changes to transcript abundance by dosage or regulatory variants. Here, we review the current state of how genetic variations impact gene regulation locally and globally in the human genome.

    View details for DOI 10.1016/j.jmb.2013.07.015

    View details for PubMedID 23871684

  • Defective sphingosine 1-phosphate receptor 1 (S1P1) phosphorylation exacerbates TH17-mediated autoimmune neuroinflammation. Nature immunology Garris, C. S., Wu, L., Acharya, S., Arac, A., Blaho, V. A., Huang, Y., Moon, B. S., Axtell, R. C., Ho, P. P., Steinberg, G. K., Lewis, D. B., Sobel, R. A., Han, D. K., Steinman, L., Snyder, M. P., Hla, T., Han, M. H. 2013; 14 (11): 1166-1172

    Abstract

    Sphingosine 1-phosphate (S1P) signaling regulates lymphocyte egress from lymphoid organs into systemic circulation. The sphingosine phosphate receptor 1 (S1P1) agonist FTY-720 (Gilenya) arrests immune trafficking and prevents multiple sclerosis (MS) relapses. However, alternative mechanisms of S1P-S1P1 signaling have been reported. Phosphoproteomic analysis of MS brain lesions revealed S1P1 phosphorylation on S351, a residue crucial for receptor internalization. Mutant mice harboring an S1pr1 gene encoding phosphorylation-deficient receptors (S1P1(S5A)) developed severe experimental autoimmune encephalomyelitis (EAE) due to autoimmunity mediated by interleukin 17 (IL-17)-producing helper T cells (TH17 cells) in the peripheral immune and nervous system. S1P1 directly activated the Jak-STAT3 signal-transduction pathway via IL-6. Impaired S1P1 phosphorylation enhances TH17 polarization and exacerbates autoimmune neuroinflammation. These mechanisms may be pathogenic in MS.

    View details for DOI 10.1038/ni.2730

    View details for PubMedID 24076635

  • Comprehensive whole-genome sequencing of an early-stage primary myelofibrosis patient defines low mutational burden and non-recurrent candidate genes. Haematologica Merker, J. D., Roskin, K. M., Ng, D., Pan, C., Fisk, D. G., King, J. J., Hoh, R., Stadler, M., Okumoto, L. M., Abidi, P., Hewitt, R., Jones, C. D., Gojenola, L., Clark, M. J., Zhang, B., Cherry, A. M., George, T. I., Snyder, M., Boyd, S. D., Zehnder, J. L., Fire, A. Z., Gotlib, J. 2013; 98 (11): 1689-1696

    Abstract

    In order to identify novel somatic mutations associated with classic BCR/ABL1-negative myeloproliferative neoplasms, we performed high-coverage genome sequencing of DNA from peripheral blood granulocytes and cultured skin fibroblasts from a patient with MPL W515K-positive primary myelofibrosis. The primary myelofibrosis genome had a low somatic mutation rate, consistent with that observed in similar hematopoietic tumor genomes. Interfacing of whole-genome DNA sequence data with RNA expression data identified three somatic mutations of potential functional significance: a nonsense mutation in CARD6, implicated in modulation of NF-kappaB activation; a 19-base pair deletion involving a potential regulatory region in the 5'-untranslated region of BRD2, implicated in transcriptional regulation and cell cycle control; and a non-synonymous point mutation in KIAA0355, an uncharacterized protein. Additional mutations in three genes (CAP2, SOX30, and MFRP) were also evident, albeit with no support for expression at the RNA level. Re-sequencing of these six genes in 178 patients with polycythemia vera, essential thrombocythemia, and myelofibrosis did not identify recurrent somatic mutations in these genes. Finally, we describe methods for reducing false-positive variant calls in the analysis of hematologic malignancies with a low somatic mutation rate. This trial is registered with ClinicalTrials.gov (NCT01108159).

    View details for DOI 10.3324/haematol.2013.092379

    View details for PubMedID 23872309

  • A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2013; 31 (11): 1009-1014

    Abstract

    Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5' to 3' end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5' ends. For longer RNA molecules more 5' nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

    View details for DOI 10.1038/nbt.2705

    View details for PubMedID 24108091

  • Incorporating Motif Analysis into Gene Co-expression Networks Reveals Novel Modular Expression Pattern and New Signaling Pathways PLOS GENETICS Ma, S., Shah, S., Bohnert, H. J., Snyder, M., Dinesh-Kumar, S. P. 2013; 9 (10)

    Abstract

    Understanding of gene regulatory networks requires discovery of expression modules within gene co-expression networks and identification of promoter motifs and corresponding transcription factors that regulate their expression. A commonly used method for this purpose is a top-down approach based on clustering the network into a range of densely connected segments, treating these segments as expression modules, and extracting promoter motifs from these modules. Here, we describe a novel bottom-up approach to identify gene expression modules driven by known cis-regulatory motifs in the gene promoters. For a specific motif, genes in the co-expression network are ranked according to their probability of belonging to an expression module regulated by that motif. The ranking is conducted via motif enrichment or motif position bias analysis. Our results indicate that motif position bias analysis is an effective tool for genome-wide motif analysis. Sub-networks containing the top ranked genes are extracted and analyzed for inherent gene expression modules. This approach identified novel expression modules for the G-box, W-box, site II, and MYB motifs from an Arabidopsis thaliana gene co-expression network based on the graphical Gaussian model. The novel expression modules include those involved in house-keeping functions, primary and secondary metabolism, and abiotic and biotic stress responses. In addition to confirmation of previously described modules, we identified modules that include new signaling pathways. To associate transcription factors that regulate genes in these co-expression modules, we developed a novel reporter system. Using this approach, we evaluated MYB transcription factor-promoter interactions within MYB motif modules.

    View details for DOI 10.1371/journal.pgen.1003840

    View details for Web of Science ID 000330367200023

    View details for PubMedID 24098147

    View details for PubMedCentralID PMC3789834

  • Genome-wide Association Analysis of Blood-Pressure Traits in African-Ancestry Individuals Reveals Common Associated Genes in African and Non-African Populations AMERICAN JOURNAL OF HUMAN GENETICS Franceschini, N., Fox, E., Zhang, Z., Edwards, T. L., Nalls, M. A., Sung, Y. J., Tayo, B. O., Sun, Y. V., Gottesman, O., Adeyemo, A., Johnson, A. D., Young, J. H., Rice, K., Duan, Q., Chen, F., Li, Y., Tang, H., Fornage, M., Keene, K. L., Andrews, J. S., Smith, J. A., Fau, J. D., Guangfa, Z., Guo, W., Liu, Y., Murray, S. S., Musani, S. K., Srinivasan, S., Edwards, D. R., Wang, H., Becker, L. C., Bovet, P., Bochud, M., Broecke, U., Burnier, M., Carty, C., Chasman, D. I., Ehret, G., Chen, W., Chen, G., Chen, W., Ding, J., Dreisbach, A. W., Evans, M. K., Guo, X., Garcia, M. E., Jensen, R., Keller, M. E., Lettre, G., Lotay, V., Martin, L. W., Moore, J. H., Morrison, A. C., Mosley, T. H., Ogunniyi, A., Palmas, W., Papanicolaou, G., Penman, A., Polak, J. F., Ridker, P. M., Salako, B., Singleton, A. B., Shriner, D., Taylor, K. D., Vasan, R., Wiggins, K., Williams, S. M., Yanek, L. R., Zhao, W., Zonderman, A. B., Becker, D. M., Berenson, G., Boerwinkle, E., Bottinger, E., Cushman, M., Eaton, C., Nyberg, F., Heiss, G., Hirschhorn, J. N., Howard, V. J., Karczewski, K. J., Lanktree, M. B., Liu, K., Liu, Y., Loos, R., Margolis, K., Snyder, M., Psaty, B. M., Schork, N. J., Weir, D. R., Rotimi, C. N., Sale, M. M., Harris, T., Kardia, S. L., Hunt, S. C., Arnett, D., Redline, S., Cooper, R. S., Risch, N. J., Rao, D. C., Rotter, J. I., Chakravarti, A., Reiner, A. P., Levy, D., Keating, B. J., Zhu, X. 2013; 93 (3): 545-554

    Abstract

    High blood pressure (BP) is more prevalent and contributes to more severe manifestations of cardiovascular disease (CVD) in African Americans than in any other United States ethnic group. Several small African-ancestry (AA) BP genome-wide association studies (GWASs) have been published, but their findings have failed to replicate to date. We report on a large AA BP GWAS meta-analysis that includes 29,378 individuals from 19 discovery cohorts and subsequent replication in additional samples of AA (n = 10,386), European ancestry (EA) (n = 69,395), and East Asian ancestry (n = 19,601). Five loci (EVX1-HOXA, ULK4, RSPO3, PLEKHG1, and SOX6) reached genome-wide significance (p < 1.0 × 10⁻⁸) for either systolic or diastolic BP in a transethnic meta-analysis after correction for multiple testing. Three of these BP loci (EVX1-HOXA, RSPO3, and PLEKHG1) lack previous associations with BP. We also identified one independent signal in a known BP locus (SOX6) and provide evidence for fine mapping in four additional validated BP loci. We also demonstrate that validated EA BP GWAS loci, considered jointly, show significant effects in AA samples. Consequently, these findings suggest that BP loci might have universal effects across studied populations, demonstrating that multiethnic samples are an essential component in identifying, fine mapping, and understanding their trait variability.

    View details for DOI 10.1016/j.ajhg.2013.07.010

    View details for Web of Science ID 000330268900014

    View details for PubMedID 23972371

    View details for PubMedCentralID PMC3769920

  • Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females SCIENCE Poznik, G. D., Henn, B. M., Yee, M., Sliwerska, E., Euskirchen, G. M., Lin, A. A., Snyder, M., Quintana-Murci, L., Kidd, J. M., Underhill, P. A., Bustamante, C. D. 2013; 341 (6145): 562-565

    Abstract

    The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (T(MRCA)) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome T(MRCA) to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.

    View details for DOI 10.1126/science.1237619

    View details for Web of Science ID 000322586700057

    View details for PubMedID 23908239

  • Genome-wide profiling of human cap-independent translation-enhancing elements. Nature methods Wellensiek, B. P., Larsen, A. C., Stephens, B., Kukurba, K., Waern, K., Briones, N., Liu, L., Snyder, M., Jacobs, B. L., Kumar, S., Chaput, J. C. 2013; 10 (8): 747-750

    Abstract

    We report an in vitro selection strategy to identify RNA sequences that mediate cap-independent initiation of translation. This method entails mRNA display of trillions of genomic fragments, selection for initiation of translation and high-throughput deep sequencing. We identified >12,000 translation-enhancing elements (TEEs) in the human genome, generated a high-resolution map of human TEE-bearing regions (TBRs), and validated the function of a subset of sequences in vitro and in cultured cells.

    View details for DOI 10.1038/nmeth.2522

    View details for Web of Science ID 000322453600023

    View details for PubMedID 23770754

  • Functional genomic screen of human stem cell differentiation reveals pathways involved in neurodevelopment and neurodegeneration PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhang, Y., Schulz, V. P., Reed, B. D., Wang, Z., Pan, X., Mariani, J., Euskirchen, G., Snyder, M. P., Vaccarino, F. M., Ivanova, N., Weissman, S. M., Szekely, A. M. 2013; 110 (30): 12361-12366

    Abstract

    Human embryonic stem cells (hESCs) can be induced and differentiated to form a relatively homogeneous population of neuronal precursors in vitro. We have used this system to screen for genes necessary for neural lineage development by using a pooled human short hairpin RNA (shRNA) library screen and massively parallel sequencing. We confirmed known genes and identified several unpredicted genes with interrelated functions that were specifically required for the formation or survival of neuronal progenitor cells without interfering with the self-renewal capacity of undifferentiated hESCs. Among these are several genes that have been implicated in various neurodevelopmental disorders (i.e., brain malformations, mental retardation, and autism). Unexpectedly, a set of genes mutated in late-onset neurodegenerative disorders and with roles in the formation of RNA granules were also found to interfere with neuronal progenitor cell formation, suggesting their functional relevance in early neurogenesis. This study advances the feasibility and utility of using pooled shRNA libraries in combination with next-generation sequencing for a high-throughput, unbiased functional genomic screen. Our approach can also be used with patient-specific human-induced pluripotent stem cell-derived neural models to obtain unparalleled insights into developmental and degenerative processes in neurological or neuropsychiatric disorders with monogenic or complex inheritance.

    View details for DOI 10.1073/pnas.1309725110

    View details for Web of Science ID 000322112300054

    View details for PubMedID 23836664

    View details for PubMedCentralID PMC3725080

  • Variation and genetic control of protein abundance in humans NATURE Wu, L., Candille, S. I., Choi, Y., Xie, D., Jiang, L., Li-Pook-Than, J., Tang, H., Snyder, M. 2013; 499 (7456): 79-82

    Abstract

    Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.

    View details for DOI 10.1038/nature12223

    View details for Web of Science ID 000321285600037

    View details for PubMedID 23676674

  • Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis VIRUSES-BASEL Qian, F., Chung, L., Zheng, W., Bruno, V., Alexander, R. P., Wang, Z., Wang, X., Kurscheid, S., Zhao, H., Fikrig, E., Gerstein, M., Snyder, M., Montgomery, R. R. 2013; 5 (7): 1664-1681

    Abstract

    The West Nile virus (WNV) is an emerging infection of biodefense concern and there are no available treatments or vaccines. Here we used a high-throughput method based on a novel gene expression analysis, RNA-Seq, to give a global picture of differential gene expression by primary human macrophages of 10 healthy donors infected in vitro with WNV. From a total of 28 million reads per sample, we identified 1,514 transcripts that were differentially expressed after infection. Both predicted and novel gene changes were detected, as were gene isoforms, and while many of the genes were expressed by all donors, some were unique. Knock-down of genes not previously known to be associated with WNV resistance identified their critical role in control of viral infection. Our study distinguishes both common gene pathways as well as novel cellular responses. Such analyses will be valuable for translational studies of susceptible and resistant individuals--and for targeting therapeutics--in multiple biological settings.

    View details for DOI 10.3390/v5071664

    View details for Web of Science ID 000322172200005

    View details for PubMedID 23881275

    View details for PubMedCentralID PMC3738954

  • Genome Wide Proteomics of ERBB2 and EGFR and Other Oncogenic Pathways in Inflammatory Breast Cancer. Journal of proteome research Zhang, E. Y., Cristofanilli, M., Robertson, F., Reuben, J. M., Mu, Z., Beavis, R. C., Im, H., Snyder, M., Hofree, M., Ideker, T., Omenn, G. S., Fanayan, S., Jeong, S., Paik, Y., Zhang, A. F., Wu, S., Hancock, W. S. 2013; 12 (6): 2805-2817

    Abstract

    In this study we selected three breast cancer cell lines (SKBR3, SUM149 and SUM190) with different oncogene expression levels involved in ERBB2 and EGFR signaling pathways as a model system for the evaluation of selective integration of subsets of transcriptomic and proteomic data. We assessed the oncogene status with reads per kilobase per million mapped reads (RPKM) values for ERBB2 (14.4, 400, and 300 for SUM149, SUM190, and SKBR3, respectively) and for EGFR (60.1, not detected, and 1.4 for the same 3 cell lines). We then used RNA-Seq data to identify those oncogenes with significant transcript levels in these cell lines (total 31) and interrogated the corresponding proteomics data sets for proteins with significant interaction values with these oncogenes. The number of observed interactors for each oncogene showed a significant range, e.g., 4.2% (JAK1) to 27.3% (MYC). The percentage is measured as a fraction of the total protein interactions in a given data set vs total interactors for that oncogene in STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, version 9.0) and I2D (Interologous Interaction Database, version 1.95). This approach allowed us to focus on 4 main oncogenes, ERBB2, EGFR, MYC, and GRB2, for pathway analysis. We used bioinformatics sites GeneGo, PathwayCommons and NCI receptor signaling networks to identify pathways that contained the four main oncogenes and had good coverage in the transcriptomic and proteomic data sets as well as a significant number of oncogene interactors. The four pathways identified were ERBB signaling, EGFR1 signaling, integrin outside-in signaling, and validated targets of C-MYC transcriptional activation. The greater dynamic range of the RNA-Seq values allowed the use of transcript ratios to correlate observed protein values with the relative levels of the ERBB2 and EGFR transcripts in each of the four pathways. This provided us with potential proteomic signatures for the SUM149 and SUM190 cell lines: growth factor receptor-bound protein 7 (GRB7), Crk-like protein (CRKL) and Catenin delta-1 (CTNND1) for ERBB signaling; caveolin 1 (CAV1), plectin (PLEC) for EGFR signaling; filamin A (FLNA) and actinin alpha1 (ACTN1) (associated with high levels of EGFR transcript) for integrin signaling; branched chain amino-acid transaminase 1 (BCAT1), carbamoyl-phosphate synthetase (CAD), nucleolin (NCL) (high levels of EGFR transcript); transferrin receptor (TFRC), metadherin (MTDH) (high levels of ERBB2 transcript) for MYC signaling; S100-A2 protein (S100A2), caveolin 1 (CAV1), Serpin B5 (SERPINB5), stratifin (SFN), PYD and CARD domain containing (PYCARD), and EPH receptor A2 (EPHA2) for PI3K signaling, p53 subpathway. Future studies of inflammatory breast cancer (IBC), from which the cell lines were derived, will be used to explore the significance of these observations.

    View details for DOI 10.1021/pr4001527

    View details for PubMedID 23647160

  • Overview of high throughput sequencing technologies to elucidate molecular pathways in cardiovascular diseases. Circulation research Churko, J. M., Mantalas, G. L., Snyder, M. P., Wu, J. C. 2013; 112 (12): 1613-1623

    Abstract

    High throughput sequencing technologies have become essential in studies on genomics, epigenomics, and transcriptomics. Although sequencing information has traditionally been elucidated using a low throughput technique called Sanger sequencing, high throughput sequencing technologies are capable of sequencing multiple DNA molecules in parallel, enabling hundreds of millions of DNA molecules to be sequenced at a time. This advantage allows high throughput sequencing to be used to create large data sets, generating more comprehensive insights into the cellular genomic and transcriptomic signatures of various diseases and developmental stages. Within high throughput sequencing technologies, whole exome sequencing can be used to identify novel variants and other mutations that may underlie many genetic cardiac disorders, whereas RNA sequencing can be used to analyze how the transcriptome changes. Chromatin immunoprecipitation sequencing and methylation sequencing can be used to identify epigenetic changes, whereas ribosome sequencing can be used to determine which mRNA transcripts are actively being translated. In this review, we will outline the differences in various sequencing modalities and examine the main sequencing platforms on the market in terms of their relative read depths, speeds, and costs. Finally, we will discuss the development of future sequencing platforms and how these new technologies may improve on current sequencing platforms. Ultimately, these sequencing technologies will be instrumental in further delineating how the cardiovascular system develops and how perturbations in DNA and RNA can lead to cardiovascular disease.

    View details for DOI 10.1161/CIRCRESAHA.113.300939

    View details for PubMedID 23743227

  • Metabolomics as a robust tool in systems biology and personalized medicine: an open letter to the metabolomics community METABOLOMICS Snyder, M., Li, X. 2013; 9 (3): 532-534

  • iPOP Goes the World: Integrated Personalized Omics Profiling and the Road toward Improved Health Care. Chemistry & biology Li-Pook-Than, J., Snyder, M. 2013; 20 (5): 660-666

    Abstract

    The health of an individual depends upon their DNA as well as upon environmental factors (environome or exposome). It is expected that although the genome is the blueprint of an individual, its analysis with that of the other omes such as the DNA methylome, the transcriptome, proteome, and metabolome will further provide a dynamic assessment of the physiology and health state of an individual. This review will help to categorize the current progress of omics analyses and how omics integration can be used for medical research. We believe that integrative personal omics profiling (iPOP) is a stepping stone to a new road to personalized health care and may improve disease risk assessment, accuracy of diagnosis, disease monitoring, targeted treatments, and understanding the biological processes of disease states for their prevention.

    View details for DOI 10.1016/j.chembiol.2013.05.001

    View details for PubMedID 23706632

  • Identification of Potential Glycan Cancer Markers with Sialic Acid Attached to Sialic Acid and Up-regulated Fucosylated Galactose Structures in Epidermal Growth Factor Receptor Secreted from A431 Cell Line. Molecular & cellular proteomics Wu, S., Taylor, A. D., Lu, Q., Hanash, S. M., Im, H., Snyder, M., Hancock, W. S. 2013; 12 (5): 1239-1249

    Abstract

    We have used powerful HPLC-mass spectrometric approaches to characterize the secreted form of epidermal growth factor receptor (sEGFR). We demonstrated that the amino acid sequence lacked the cytoplasmic domain and was consistent with the primary sequence reported for EGFR purified from a human plasma pool. One of the sEGFR forms, attributed to the alternative RNA splicing, was also confirmed by transcriptional analysis (RNA sequencing). Two unusual types of glycan structures were observed in sEGFR as compared with membrane-bound EGFR from the A431 cell line. The unusual glycan structures were di-sialylated glycans (sialic acid attached to sialic acid) at Asn-151 and N-acetylhexosamine attached to a branched fucosylated galactose with N-acetylglucosamine moieties (HexNAc-(Fuc)Gal-GlcNAc) at Asn-420. These unusual glycans at specific sites were either present at a much lower level or were not observable in membrane-bound EGFR present in the A431 cell lysate. The observation of these di-sialylated glycan structures was consistent with the observed expression of the corresponding α-N-acetylneuraminide α-2,8-sialyltransferase 2 (ST8SiA2) and α-N-acetylneuraminide α-2,8-sialyltransferase 4 (ST8SiA4), by quantitative real time RT-PCR. The connectivity present at the branched fucosylated galactose was also confirmed by methylation of the glycans followed by analysis with sequential fragmentation in mass spectrometry. We hypothesize that the presence of such glycan structures could promote secretion via anionic or steric repulsion mechanisms and thus facilitate the observation of these glycan forms in the secreted fractions. We plan to use this model system to facilitate the search for novel glycan structures present at specific sites in sEGFR as well as other secreted oncoproteins such as Erbb2 as markers of disease progression in blood samples from cancer patients.

    View details for DOI 10.1074/mcp.M112.024554

    View details for PubMedID 23371026

    View details for PubMedCentralID PMC3650335

  • Preparation of recombinant protein spotted arrays for proteome-wide identification of kinase targets. Current protocols in protein science / editorial board, John E. Coligan ... [et al.] Im, H., Snyder, M. 2013; Chapter 27: Unit 27 4-?

    Abstract

    Protein microarrays allow unique approaches for interrogating global protein interaction networks. Protein arrays can be divided into two categories: antibody arrays and functional protein arrays. Antibody arrays consist of various antibodies and are appropriate for profiling protein abundance and modifications. Functional full-length protein arrays employ full-length proteins with various post-translational modifications. A key advantage of the latter is rapid parallel processing of large numbers of proteins for studying highly controlled biochemical activities, protein-protein interactions, protein-nucleic acid interactions, and protein-small molecule interactions. This unit presents a protocol for constructing functional yeast protein microarrays for global kinase substrate identification. This approach enables the rapid determination of protein interaction networks in yeast on a proteome-wide level. The same methodology can be readily applied to higher eukaryotic systems with careful consideration of overexpression strategy.

    View details for DOI 10.1002/0471140864.ps2704s72

    View details for PubMedID 23546622

  • Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405 JOURNAL OF PROTEOME RESEARCH Fanayan, S., Smith, J. T., Lee, L. Y., Yan, F., Snyder, M., Hancock, W. S., Nice, E. 2013; 12 (4): 1732-1742

    Abstract

    As part of the genome-wide and chromosome-centric human proteomic project (C-HPP), we have integrated a shotgun proteomics approach with a genome-wide transcriptomic approach (RNA-Seq) for a set of human colon cancer cell lines (LIM1215, LIM1899 and LIM2405) that were selected to represent a wide range of pathological states of colorectal cancer. The combination of a standard proteomics approach (1D-gel electrophoresis coupled to LC/ion trap mass spectrometry) and RNA-Seq allowed us to exploit the greater depth of the transcriptomics measurement (∼9800 transcripts per cell line) versus the protein observations (∼1900 protein identifications per cell line). Conversely, the proteomics data were helpful in identifying both cancer-associated proteins with differential expression patterns and protein networks and pathways that appear to be deregulated in these cell lines. Examples of potential markers include mortalin, nucleophosmin, ezrin, LASP1, alpha and beta forms of spectrin, exportin, the carcinoembryonic antigen family, EGFR and MET. Interaction analyses identified the large intermediate filament family, the protein folding network and adapter proteins in focal adhesion networks, which included the CDC42 and RHOA signaling pathways that may have potential for identifying phenotypic states representing poorly and moderately differentiated states of CRC, with or without metastases.

    View details for DOI 10.1021/pr3010869

    View details for Web of Science ID 000317327500018

  • Comparative annotation of functional regions in the human genome using epigenomic data NUCLEIC ACIDS RESEARCH Won, K., Zhang, X., Wang, T., Ding, B., Raha, D., Snyder, M., Ren, B., Wang, W. 2013; 41 (8): 4423-4432

    Abstract

    Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of the epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and showed superior performance in identifying enhancers compared with state-of-the-art methods. In addition, given the fixed number of epigenetic states in the model, ChroModule allows straightforward illustration of epigenetic variability in multiple cell types. Using this feature, we found that invariable and variable epigenetic states across cell types correspond to housekeeping functions and stimulus response, respectively. In particular, we observed that enhancers, but not the other regulatory elements, dictate cell specificity, as similar cell types share common enhancers, and cell-type-specific enhancers are often bound by transcription factors playing critical roles in that cell type. More interestingly, we found that some genomic regions are dormant in one cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in comparative analysis and interpretation of multiple epigenomes.

    View details for DOI 10.1093/nar/gkt143

    View details for Web of Science ID 000318569700014

    View details for PubMedID 23482391

    View details for PubMedCentralID PMC3632130

  • A Major Epigenetic Programming Mechanism Guided by piRNAs DEVELOPMENTAL CELL Huang, X. A., Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2013; 24 (5): 502-516

    Abstract

    A central enigma in epigenetics is how epigenetic factors are guided to specific genomic sites for their function. Previously, we reported that a Piwi-piRNA complex associates with the piRNA-complementary site in the Drosophila genome and regulates its epigenetic state. Here, we report that Piwi-piRNA complexes bind to numerous piRNA-complementary sequences throughout the genome, implicating piRNAs as a major mechanism that guides Piwi and Piwi-associated epigenetic factors to program the genome. To test this hypothesis, we demonstrate that inserting piRNA-complementary sequences to an ectopic site leads to Piwi, HP1a, and Su(var)3-9 recruitment to the site as well as H3K9me2/3 enrichment and reduced RNA polymerase II association, indicating that piRNA is both necessary and sufficient to recruit Piwi and epigenetic factors to specific genomic sites. Piwi deficiency drastically changed the epigenetic landscape and polymerase II profile throughout the genome, revealing the Piwi-piRNA mechanism as a major epigenetic programming mechanism in Drosophila.

    View details for DOI 10.1016/j.devcel.2013.01.023

    View details for Web of Science ID 000316163000005

    View details for PubMedID 23434410

  • Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing G3-GENES GENOMES GENETICS Tilgner, H., Raha, D., Habegger, L., Mohiuddin, M., Gerstein, M., Snyder, M. 2013; 3 (3): 387-397

    Abstract

    Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both data sets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are very much in line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making it ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.

    View details for DOI 10.1534/g3.112.004812

    View details for Web of Science ID 000315950000002

    View details for PubMedID 23450794

    View details for PubMedCentralID PMC3583448

  • Personal genomes, quantitative dynamic omics and personalized medicine. Quantitative biology (Beijing, China) Mias, G. I., Snyder, M. 2013; 1 (1): 71-90

    Abstract

    The rapid technological developments following the Human Genome Project have made possible the availability of personalized genomes. As the focus now shifts from characterizing genomes to making personalized disease associations, in combination with the availability of other omics technologies, the next big push will be not only to obtain a personalized genome, but to quantitatively follow other omics. This will include transcriptomes, proteomes, metabolomes, antibodyomes, and new emerging technologies, enabling the profiling of thousands of molecular components in individuals. Furthermore, omics profiling performed longitudinally can probe the temporal patterns associated with both molecular changes and associated physiological health and disease states. Such data necessitates the development of computational methodology to not only handle and descriptively assess such data, but also construct quantitative biological models. Here we describe the availability of personal genomes and developing omics technologies that can be brought together for personalized implementations and how these novel integrated approaches may effectively provide a precise personalized medicine that focuses on not only characterization and treatment but ultimately the prevention of disease.

    View details for PubMedID 25798291

  • Extensive Transcript Diversity and Novel Upstream Open Reading Frame Regulation in Yeast G3-GENES GENOMES GENETICS Waern, K., Snyder, M. 2013; 3 (2): 343-352

    Abstract

    To understand the diversity of transcripts in yeast (Saccharomyces cerevisiae) we analyzed the transcriptional landscapes for cells grown under 18 different environmental conditions. Each sample was analyzed using RNA-sequencing, and a total of 670,446,084 uniquely mapped reads and 377,263 poly-adenylated end tags were produced. Consistent with previous studies, we find that the majority of yeast genes are expressed under one or more different conditions. By directly comparing the 5' and 3' ends of the transcribed regions, we find extensive differences in transcript ends across many conditions, especially those of stationary phase, growth in grape juice, and salt stimulation, suggesting differential choice of transcription start and stop sites is pervasive in yeast. Relative to the exponential growth condition (i.e., YPAD), transcripts differing at the 5' ends and 3' ends are predicted to differ in their annotated start codon in 21 genes and their annotated stop codon in 63 genes. Many (431) upstream open reading frames (uORFs) are found in alternate 5' ends and are significantly enriched in transcripts produced during the salt response. Mutational analysis of five genes with uORFs revealed that two sets of uORFs increase the expression of a reporter construct, indicating a role in activation which had not been reported previously, whereas two other uORFs decreased expression. In addition, RNA binding protein motifs are statistically enriched for alternate ends under many conditions. Overall, these results demonstrate enormous diversity of transcript ends, and that this heterogeneity is regulated under different environmental conditions. Moreover, transcript end diversity has important biological implications for the regulation of gene expression. In addition, our data also serve as a valuable resource for the scientific community.

    View details for DOI 10.1534/g3.112.003640

    View details for Web of Science ID 000314881600019

    View details for PubMedID 23390610

    View details for PubMedCentralID PMC3564994

  • SeqFold: Genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data GENOME RESEARCH Ouyang, Z., Snyder, M. P., Chang, H. Y. 2013; 23 (2): 377-387

    Abstract

    We present an integrative approach, SeqFold, that combines high-throughput RNA structure profiling data with computational prediction for genome-scale reconstruction of RNA secondary structures. SeqFold transforms experimental RNA structure information into a structure preference profile (SPP) and uses it to select stable RNA structure candidates representing the structure ensemble. Under a high-dimensional classification framework, SeqFold efficiently matches a given SPP to the most likely cluster of structures sampled from the Boltzmann-weighted ensemble. SeqFold is able to incorporate diverse types of RNA structure profiling data, including parallel analysis of RNA structure (PARS), selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), fragmentation sequencing (FragSeq) data generated by deep sequencing, and conventional SHAPE data. Using the known structures of a wide range of mRNAs and noncoding RNAs as benchmarks, we demonstrate that SeqFold outperforms or matches existing approaches in accuracy and is more robust to noise in experimental data. Application of SeqFold to reconstruct the secondary structures of the yeast transcriptome reveals the diverse impact of RNA secondary structure on gene regulation, including translation efficiency, transcription initiation, and protein-RNA interactions. SeqFold can be easily adapted to incorporate any new types of high-throughput RNA structure profiling data and is widely applicable to analyze RNA structures in any transcriptome.

    View details for DOI 10.1101/gr.138545.112

    View details for PubMedID 23064747

  • Two methods for full-length RNA sequencing for low quantities of cells and single cells PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Pan, X., Durrett, R. E., Zhu, H., Tanaka, Y., Li, Y., Zi, X., Marjani, S. L., Euskirchen, G., Ma, C., LaMotte, R. H., Park, I., Snyder, M. P., Mason, C. E., Weissman, S. M. 2013; 110 (2): 594-599

    Abstract

    The ability to determine the gene expression pattern in low quantities of cells or single cells is important for resolving a variety of problems in many biological disciplines. A robust description of the expression signature of a single cell requires determination of the full-length sequence of the expressed mRNAs in the cell, yet existing methods have either 3' biased or variable transcript representation. Here, we report our protocols for the amplification and high-throughput sequencing of very small amounts of RNA, using either semirandom primed PCR or phi29 DNA polymerase-based DNA amplification of the cDNA generated with oligo-dT and/or random oligonucleotide primers. Unlike existing methods, these protocols produce relatively uniformly distributed sequences covering the full length of almost all transcripts independent of their sizes, from 1,000 to 10 cells, and even with single cells. Both protocols produced satisfactory detection/coverage of the abundant mRNAs from a single K562 erythroleukemic cell or a single dorsal root ganglion neuron. The phi29-based method produces long products with less noise, uses an isothermal reaction, and is simple to practice. The semirandom primed PCR procedure is more sensitive and reproducible at low transcript levels or with low quantities of cells. These methods provide tools for mRNA sequencing or RNA sequencing when only low quantities of cells, a single cell, or even degraded RNA are available for profiling.

    View details for DOI 10.1073/pnas.1217322109

    View details for Web of Science ID 000313906600047

    View details for PubMedID 23267071

    View details for PubMedCentralID PMC3545756

  • Multimodal Dynamic Profiling of Healthy and Diseased States for Future Personalized Health Care CLINICAL PHARMACOLOGY & THERAPEUTICS Mias, G. I., Snyder, M. 2013; 93 (1): 29-32

    View details for DOI 10.1038/clpt.2012.204

    View details for Web of Science ID 000312618200021

    View details for PubMedID 23187877

  • Integrative analysis of longitudinal metabolomics data from a personal multi-omics profile. Metabolites Stanberry, L., Mias, G. I., Haynes, W., Higdon, R., Snyder, M., Kolker, E. 2013; 3 (3): 741-760

    Abstract

    The integrative personal omics profile (iPOP) is a pioneering study that combines genomics, transcriptomics, proteomics, metabolomics and autoantibody profiles from a single individual over a 14-month period. The observation period includes two episodes of viral infection: a human rhinovirus and a respiratory syncytial virus. The profile studies give an informative snapshot into the biological functioning of an organism. We hypothesize that pathway expression levels are associated with disease status. To test this hypothesis, we use biological pathways to integrate metabolomics and proteomics iPOP data. The approach computes the pathways' differential expression levels at each time point, while taking into account the pathway structure and the longitudinal design. The resulting pathway levels show strong association with the disease status. Further, we identify temporal patterns in metabolite expression levels. The changes in metabolite expression levels also appear to be consistent with the disease status. The results of the integrative analysis suggest that changes in biological pathways may be used to predict and monitor the disease. The iPOP experimental design, data acquisition and analysis issues are discussed within the broader context of personal profiling.

    View details for DOI 10.3390/metabo3030741

    View details for PubMedID 24958148

    View details for PubMedCentralID PMC3901289

  • Specific plasma autoantibody reactivity in myelodysplastic syndromes. Scientific reports Mias, G. I., Chen, R., Zhang, Y., Sridhar, K., Sharon, D., Xiao, L., Im, H., Snyder, M. P., Greenberg, P. L. 2013; 3: 3311-?

    View details for DOI 10.1038/srep03311

    View details for PubMedID 24264604

  • Tissue-specific direct targets of Caenorhabditis elegans Rb/E2F dictate distinct somatic and germline programs GENOME BIOLOGY Kudron, M., Niu, W., Lu, Z., Wang, G., Gerstein, M., Snyder, M., Reinke, V. 2013; 14 (1): R5

    Abstract

    BACKGROUND: The tumor suppressor Rb/E2F regulates gene expression to control differentiation in multiple tissues during development, although how it directs tissue-specific gene regulation in vivo is poorly understood. RESULTS: We determined the genome-wide binding profiles for Caenorhabditis elegans Rb/E2F-like components in the germline, in the intestine and broadly throughout the soma, and uncovered highly tissue-specific binding patterns and target genes. Chromatin association by LIN-35, the C. elegans ortholog of Rb, is impaired in the germline but robust in the soma, a characteristic that might govern differential effects on gene expression in the two cell types. In the intestine, LIN-35 and the heterochromatin protein HPL-2, the ortholog of Hp1, coordinately bind at many sites lacking E2F. Finally, selected direct target genes contribute to the soma-to-germline transformation of lin-35 mutants, including mes-4, a soma-specific target that promotes H3K36 methylation, and csr-1, a germline-specific target that functions in a 22G small RNA pathway. CONCLUSIONS: In sum, identification of tissue-specific binding profiles and effector target genes reveals important insights into the mechanisms by which Rb/E2F controls distinct cell fates in vivo.

    View details for DOI 10.1186/gb-2013-14-1-r5

    View details for Web of Science ID 000320155200005

    View details for PubMedID 23347407

  • High-throughput sequencing for biology and medicine MOLECULAR SYSTEMS BIOLOGY Soon, W. W., Hariharan, M., Snyder, M. P. 2013; 9

    Abstract

    Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that were not possible before. In this review, we discuss these innovative new approaches-including ever finer analyses of transcriptome dynamics, genome structure and genomic variation-and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.

    View details for DOI 10.1038/msb.2012.61

    View details for Web of Science ID 000314415800010

    View details for PubMedID 23340846

    View details for PubMedCentralID PMC3564260

  • Systematic investigation of protein-small molecule interactions IUBMB LIFE Li, X., Wang, X., Snyder, M. 2013; 65 (1): 2-8

    Abstract

    Cell signaling is extensively wired between cellular components to sustain cell proliferation, differentiation, and adaptation. The interaction network is often manifested in how protein function is regulated through interacting with other cellular components including small molecule metabolites. While many biochemical interactions have been established as reactions between protein enzymes and their substrates and products, much less is known at the system level about how small metabolites regulate protein functions through allosteric binding. In the past decade, study of protein-small molecule interactions has been lagging behind other types of interactions. Recent technological advances have explored several high-throughput platforms to reveal many "unexpected" protein-small molecule interactions that could have profound impact on our understanding of cell signaling. These interactions will help bridge gaps in existing regulatory loops of cell signaling and serve as new targets for medical intervention. In this review, we summarize recent advances in the systematic investigation of protein-metabolite/small molecule interactions, and discuss the potential impact of such studies on both biological research and medicine.

    View details for DOI 10.1002/iub.1111

    View details for Web of Science ID 000312886200002

    View details for PubMedID 23225626

  • A Chromosome-centric Human Proteome Project (C-HPP) to Characterize the Sets of Proteins Encoded in Chromosome 17 JOURNAL OF PROTEOME RESEARCH Liu, S., Im, H., Bairoch, A., Cristofanilli, M., Chen, R., Deutsch, E. W., Dalton, S., Fenyo, D., Fanayan, S., Gates, C., Gaudet, P., Hincapie, M., Hanash, S., Kim, H., Jeong, S., Lundberg, E., Mias, G., Menon, R., Mu, Z., Nice, E., Paik, Y., Uhlen, M., Wells, L., Wu, S., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Omenn, G. S., Beavis, R. C., Hancock, W. S. 2013; 12 (1): 45-57

    Abstract

    We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.

    View details for DOI 10.1021/pr300985j

    View details for Web of Science ID 000313156300007

    View details for PubMedID 23259914

  • Exome sequencing by targeted enrichment. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Clark, M. J., Chen, R., Snyder, M. 2013; Chapter 7: Unit7 12-?

    Abstract

    This unit describes methods for targeted enrichment of the exon-coding portions of the genome using Agilent SureSelect Human All Exon 50 Mb and Roche Nimblegen SeqCap EZ Exome platforms. Each platform targets and enriches a large overlapping portion of the greater human exome. The protocols here describe the biochemical procedures used to enrich exomic DNA with each platform, including recommended modifications to the manufacturers' protocols. In addition, a brief description of the sequencing protocol and estimation of the needed amount of sequencing for each platform is included. Finally, a detailed analytical pipeline for processing the subsequent data is described. These protocols focus specifically on human exome sequencing platforms, but can be applied with some modification to other organisms and targeted enrichment approaches.

    View details for DOI 10.1002/0471142727.mb0712s102

    View details for PubMedID 23547016

  • The variable somatic genome. Cell cycle O'Huallachain, M., Weissman, S. M., Snyder, M. P. 2013; 12 (1): 5-6

    View details for DOI 10.4161/cc.23069

    View details for PubMedID 23255102

    View details for PubMedCentralID PMC3570516

  • Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405. Journal of proteome research Fanayan, S., Smith, J. T., Lee, L. Y., Yan, F., Snyder, M., Hancock, W. S., Nice, E. 2013

    Abstract

    As part of the genome-wide and chromosome-centric human proteomic project (C-HPP), we have integrated a shotgun proteomics approach and a genome-wide transcriptomic approach (RNA-Seq) for a set of human colon cancer cell lines (LIM1215, LIM1899 and LIM2405) that were selected to represent a wide range of pathological states of colorectal cancer. The combination of a standard proteomics approach (1D-gel electrophoresis coupled to LC/ion trap mass spectrometry) and RNA-Seq allowed us to exploit the greater depth of the transcriptomics measurement (∼9800 transcripts per cell line) versus the protein observations (∼1900 protein identifications per cell line). Conversely, the proteomics data were helpful in identifying both cancer associated proteins with differential expression patterns as well as protein networks and pathways which appear to be deregulated in these cell lines. Examples of potential markers include mortalin, nucleophosmin, ezrin, LASP1, alpha and beta forms of spectrin, exportin, the carcinoembryonic antigen family, EGFR and MET. Interaction analyses identified the large intermediate filament family, the protein folding network and adapter proteins in focal adhesion networks, which included the CDC42 and RHOA signaling pathways that may have potential for identifying phenotypic states representing poorly and moderately differentiated states of CRC, with or without metastases.

    View details for DOI 10.1021/pr3010869

    View details for PubMedID 23458625

  • Promise of personalized omics to precision medicine WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE Chen, R., Snyder, M. 2013; 5 (1): 73-82

    Abstract

    The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.

    View details for DOI 10.1002/wsbm.1198

    View details for Web of Science ID 000312736200005

    View details for PubMedID 23184638

  • Centromere-Like Regions in the Budding Yeast Genome PLOS GENETICS Lefrancois, P., Auerbach, R. K., Yellman, C. M., Roeder, G. S., Snyder, M. 2013; 9 (1)

    Abstract

    Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres.

    View details for DOI 10.1371/journal.pgen.1003209

    View details for Web of Science ID 000314651500052

    View details for PubMedID 23349633

    View details for PubMedCentralID PMC3547844

  • Copy Number Variation detection from 1000 Genomes project exon capture sequencing data BMC BIOINFORMATICS Wu, J., Grzeda, K. R., Stewart, C., Grubert, F., Urban, A. E., Snyder, M. P., Marth, G. T. 2012; 13

    Abstract

    DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection.

    View details for DOI 10.1186/1471-2105-13-305

    View details for Web of Science ID 000314688600001

    View details for PubMedID 23157288

    View details for PubMedCentralID PMC3563612

  • Whole Genome Sequence Analysis of Primary Myelofibrosis. 54th Annual Meeting and Exposition of the American-Society-of-Hematology (ASH) Merker, J. D., Roskin, K., Ng, D., Pan, C., Fisk, D. G., Jones, C. D., Gojenola, L., Clark, M. J., Zhang, B., Cherry, M., Snyder, M., Boyd, S. D., Zehnder, J. L., Fire, A. Z., Gotlib, J. AMER SOC HEMATOLOGY. 2012

  • Genome interpretation and assembly-recent progress and next steps. Nature biotechnology Baker, S., Joecker, A., Church, G., Snyder, M., West, J., Salzberg, S., Worthey, E., Smith, T., Wang, J., Reid, J. G. 2012; 30 (11): 1081-1083

    View details for DOI 10.1038/nbt.2425

    View details for PubMedID 23138307

  • Michael Snyder. Interview by Asher Mullard. Nature reviews. Drug discovery Snyder, M. 2012; 11 (10): 744-?

    View details for DOI 10.1038/nrd3867

    View details for PubMedID 23023673

  • Systems biology: personalized medicine for the future? CURRENT OPINION IN PHARMACOLOGY Chen, R., Snyder, M. 2012; 12 (5): 623-628

    Abstract

    Systems biology is actively transforming the field of modern health care from symptom-based disease diagnosis and treatment to precision medicine in which patients are treated based on their individual characteristics. Development of high-throughput technologies such as high-throughput sequencing and mass spectrometry has enabled scientists and clinicians to examine genomes, transcriptomes, proteomes, metabolomes, and other omics information in unprecedented detail. The combined 'omics' information leads to a global profiling of health and disease, and provides new approaches for personalized health monitoring and preventative medicine. In this article, we review the efforts of systems biology in personalized medicine in the past 2 years, and discuss in detail achievements and concerns, as well as highlights and hurdles for future personalized health care.

    View details for DOI 10.1016/j.coph.2012.07.011

    View details for Web of Science ID 000310478800017

    View details for PubMedID 22858243

  • SWI/SNF Chromatin-remodeling Factors: Multiscale Analyses and Diverse Functions JOURNAL OF BIOLOGICAL CHEMISTRY Euskirchen, G., Auerbach, R. K., Snyder, M. 2012; 287 (37): 30897-30905

    Abstract

    Chromatin-remodeling enzymes play essential roles in many biological processes, including gene expression, DNA replication and repair, and cell division. Although one such complex, SWI/SNF, has been extensively studied, new discoveries are still being made. Here, we review SWI/SNF biochemistry; highlight recent genomic and proteomic advances; and address the role of SWI/SNF in human diseases, including cancer and viral infections. These studies have greatly increased our understanding of complex nuclear processes.

    View details for DOI 10.1074/jbc.R111.309302

    View details for Web of Science ID 000308791300003

    View details for PubMedID 22952240

    View details for PubMedCentralID PMC3438922

  • Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements GENOME RESEARCH Kundaje, A., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M., Smith, C. L., Raha, D., Winters, E. E., Johnson, S. M., Snyder, M., Batzoglou, S., Sidow, A. 2012; 22 (9): 1735-1747

    Abstract

    Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.

    View details for DOI 10.1101/gr.136366.111

    View details for PubMedID 22955985

  • A highly integrated and complex PPARGC1A transcription factor binding network in HepG2 cells GENOME RESEARCH Charos, A. E., Reed, B. D., Raha, D., Szekely, A. M., Weissman, S. M., Snyder, M. 2012; 22 (9): 1668-1679

    Abstract

    PPARGC1A is a transcriptional coactivator that binds to and coactivates a variety of transcription factors (TFs) to regulate the expression of target genes. PPARGC1A plays a pivotal role in regulating energy metabolism and has been implicated in several human diseases, most notably type II diabetes. Previous studies have focused on the interplay between PPARGC1A and individual TFs, but little is known about how PPARGC1A combines with all of its partners across the genome to regulate transcriptional dynamics. In this study, we describe a core PPARGC1A transcriptional regulatory network operating in HepG2 cells treated with forskolin. We first mapped the genome-wide binding sites of PPARGC1A using chromatin-IP followed by high-throughput sequencing (ChIP-seq) and uncovered overrepresented DNA sequence motifs corresponding to known and novel PPARGC1A network partners. We then profiled six of these site-specific TF partners using ChIP-seq and examined their network connectivity and combinatorial binding patterns with PPARGC1A. Our analysis revealed extensive overlap of targets including a novel link between PPARGC1A and HSF1, a TF regulating the conserved heat shock response pathway that is misregulated in diabetes. Importantly, we found that different combinations of TFs bound to distinct functional sets of genes, thereby helping to reveal the combinatorial regulatory code for metabolic and other cellular processes. In addition, the different TFs often bound near the promoters and coding regions of each other's genes suggesting an intricate network of interdependent regulation. Overall, our study provides an important framework for understanding the systems-level control of metabolic gene expression in humans.

    View details for DOI 10.1101/gr.127761.111

    View details for Web of Science ID 000308272800009

    View details for PubMedID 22955979

    View details for PubMedCentralID PMC3431484

  • Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs GENOME RESEARCH Tilgner, H., Knowles, D. G., Johnson, R., Davis, C. A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T. R., Guigo, R. 2012; 22 (9): 1616-1625

    Abstract

    Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.

    View details for DOI 10.1101/gr.134445.111

    View details for Web of Science ID 000308272800004

    View details for PubMedID 22955974

    View details for PubMedCentralID PMC3431479
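
The coSI idea in the abstract above (splicing completion around an internal exon, estimated from reads on exon junctions and exon borders) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the simple ratio below, the function name `cosi_like_score`, and its parameters are assumptions for illustration, not the authors' definition.

```python
from typing import Optional


def cosi_like_score(junction_reads: int, border_reads: int) -> Optional[float]:
    """Fraction of informative reads around an exon that support completed splicing.

    junction_reads: reads spanning exon-exon junctions (spliced evidence).
    border_reads:   reads spanning exon-intron borders (unspliced evidence).
    Both parameter names and this ratio are illustrative assumptions.
    """
    if junction_reads < 0 or border_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = junction_reads + border_reads
    if total == 0:
        return None  # no informative reads, so the score is undefined
    return junction_reads / total
```

Under this simplified definition, a score near 1 means the surrounding introns are mostly removed, and a score near 0 means they are mostly still present.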

  • VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment BIOINFORMATICS Habegger, L., Balasubramanian, S., Chen, D. Z., Khurana, E., Sboner, A., Harmanci, A., Rozowsky, J., Clarke, D., Snyder, M., Gerstein, M. 2012; 28 (17): 2267-2269

    Abstract

    The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.

    View details for DOI 10.1093/bioinformatics/bts368

    View details for Web of Science ID 000308019200008

    View details for PubMedID 22743228

    View details for PubMedCentralID PMC3426844
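
The "simple intersection of genomic coordinates with genomic features" mentioned in the abstract above can be sketched as follows. This is not VAT's implementation (VAT is written in C and PHP); the `Feature` type, the `annotate` function, and the example coordinates are hypothetical and only show why one variant can map to several transcripts.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A hypothetical annotated region, e.g. an exon of one transcript."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def annotate(chrom: str, pos: int, features: List[Feature]) -> List[str]:
    """Return the names of all features that overlap a single-base variant."""
    return [
        f.name
        for f in features
        if f.chrom == chrom and f.start <= pos <= f.end
    ]


# Made-up coordinates: two transcripts whose exons overlap in part.
features = [
    Feature("transcript A, exon 1", "chr1", 100, 200),
    Feature("transcript B, exon 1", "chr1", 150, 250),
]
```

A variant at position 180 falls in both exons, while one at position 120 falls only in transcript A, which is the kind of transcript-level difference the abstract highlights.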

  • Understanding transcriptional regulation by integrative analysis of transcription factor binding data GENOME RESEARCH Cheng, C., Alexander, R., Min, R., Leng, J., Yip, K. Y., Rozowsky, J., Yan, K., Dong, X., Djebali, S., Ruan, Y., Davis, C. A., Carninci, P., Lassman, T., Gingeras, T. R., Guigo, R., Birney, E., Weng, Z., Snyder, M., Gerstein, M. 2012; 22 (9): 1658-1667

    Abstract

    Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.

    View details for DOI 10.1101/gr.136838.111

    View details for Web of Science ID 000308272800008

    View details for PubMedID 22955978

    View details for PubMedCentralID PMC3431483

  • Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors GENOME RESEARCH Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T. W., Greven, M. C., Pierce, B. G., Dong, X., Kundaje, A., Cheng, Y., Rando, O. J., Birney, E., Myers, R. M., Noble, W. S., Snyder, M., Weng, Z. 2012; 22 (9): 1798-1812

    Abstract

    Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.

    View details for DOI 10.1101/gr.139105.112

    View details for Web of Science ID 000308272800020

    View details for PubMedID 22955990

    View details for PubMedCentralID PMC3431495

  • A Genome-Scale Resource for In Vivo Tag-Based Protein Function Exploration in C. elegans CELL Sarov, M., Murray, J. I., Schanze, K., Pozniakovski, A., Niu, W., Angermann, K., Hasse, S., Rupprecht, M., Vinis, E., Tinney, M., Preston, E., Zinke, A., Enst, S., Teichgraber, T., Janette, J., Reis, K., Janosch, S., Schloissnig, S., Ejsmont, R. K., Slightam, C., Xu, X., Kim, S. K., Reinke, V., Stewart, A. F., Snyder, M., Waterston, R. H., Hyman, A. A. 2012; 150 (4): 855-866

    Abstract

    Understanding the in vivo dynamics of protein localization and their physical interactions is important for many problems in biology. To enable systematic protein function interrogation in a multicellular context, we built a genome-scale transgenic platform for in vivo expression of fluorescent- and affinity-tagged proteins in Caenorhabditis elegans under endogenous cis regulatory control. The platform combines computer-assisted transgene design, massively parallel DNA engineering, and next-generation sequencing to generate a resource of 14,637 genomic DNA transgenes, which covers 73% of the proteome. The multipurpose tag used allows any protein of interest to be localized in vivo or affinity purified using standard tag-based assays. We illustrate the utility of the resource by systematic chromatin immunopurification and automated 4D imaging, which produced detailed DNA binding and cell/tissue distribution maps for key transcription factor proteins.

    View details for DOI 10.1016/j.cell.2012.08.001

    View details for Web of Science ID 000308002300018

    View details for PubMedID 22901814

  • Discovery of Stress Responsive DNA Regulatory Motifs in Arabidopsis PLOS ONE Ma, S., Bachan, S., Porto, M., Bohnert, H. J., Snyder, M., Dinesh-Kumar, S. P. 2012; 7 (8)

    Abstract

    The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer--a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.

    View details for DOI 10.1371/journal.pone.0043198

    View details for Web of Science ID 000307500100069

    View details for PubMedID 22912824

    View details for PubMedCentralID PMC3418279
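
The word-counting step described in the abstract above (indexing whether each 8-mer occurs in each promoter, then looking for over-represented words in a co-expressed gene set) can be sketched roughly as follows. This is not MotifIndexer itself: the function names, the pseudocount, and the `min_ratio` threshold are assumptions, and degenerate bases (r, y, s, w, m, k, n) and the position filter are left out.

```python
from collections import Counter
from typing import Dict, List


def index_kmers(promoters: List[str], k: int = 8) -> Counter:
    """Count, for each k-mer, how many promoter sequences contain it at least once."""
    counts: Counter = Counter()
    for seq in promoters:
        seq = seq.upper()
        # A set records existence per promoter, not the number of occurrences.
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(present)
    return counts


def overrepresented(foreground: List[str], background: List[str],
                    k: int = 8, min_ratio: float = 2.0) -> Dict[str, float]:
    """Return k-mers found in a larger fraction of foreground than background promoters."""
    fg = index_kmers(foreground, k)
    bg = index_kmers(background, k)
    hits: Dict[str, float] = {}
    for kmer, n in fg.items():
        fg_frac = n / len(foreground)
        # Pseudocount (an assumption) avoids division by zero for unseen k-mers.
        bg_frac = (bg.get(kmer, 0) + 1) / (len(background) + 1)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            hits[kmer] = ratio
    return hits
```

Here `foreground` would hold promoters of the co-expressed genes and `background` the promoters of all predicted genes; a real analysis would replace the plain ratio with a proper statistical test.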

  • Investigating metabolite-protein interactions: An overview of available techniques METHODS Yang, G. X., Li, X., Snyder, M. 2012; 57 (4): 459-466

    Abstract

    Metabolites comprise the molar majority of chemical substances in living cells, and metabolite-protein interactions are expected to be quite common. Many interactions have already been identified and have been shown to be involved in the regulation of different types of cellular processes including signaling events, enzyme activities, protein localizations and interactions. Recent technological advances have greatly facilitated the detection of metabolite-protein interactions at high sensitivity and some of these have been applied on a large scale. In this manuscript, we review the available in vitro, in silico and in vivo technologies for mapping small-molecule-protein interactions. Although some of these were developed for drug-protein interactions they can be applied for mapping metabolite-protein interactions. Information gained from the use of these approaches can be applied to the manipulation of cellular processes and therapeutic applications.

    View details for DOI 10.1016/j.ymeth.2012.06.013

    View details for Web of Science ID 000309625600009

    View details for PubMedID 22750303

    View details for PubMedCentralID PMC3448827

  • Patient-Specific Induced Pluripotent Stem Cells as a Model for Familial Dilated Cardiomyopathy SCIENCE TRANSLATIONAL MEDICINE Sun, N., Yazawa, M., Liu, J., Han, L., Sanchez-Freire, V., Abilez, O. J., Navarrete, E. G., Hu, S., Wang, L., Lee, A., Pavlovic, A., Lin, S., Chen, R., Hajjar, R. J., Snyder, M. P., Dolmetsch, R. E., Butte, M. J., Ashley, E. A., Longaker, M. T., Robbins, R. C., Wu, J. C. 2012; 4 (130)

    Abstract

    Characterized by ventricular dilatation, systolic dysfunction, and progressive heart failure, dilated cardiomyopathy (DCM) is the most common form of cardiomyopathy in patients. DCM is the most common diagnosis leading to heart transplantation and places a significant burden on healthcare worldwide. The advent of induced pluripotent stem cells (iPSCs) offers an exceptional opportunity for creating disease-specific cellular models, investigating underlying mechanisms, and optimizing therapy. Here, we generated cardiomyocytes from iPSCs derived from patients in a DCM family carrying a point mutation (R173W) in the gene encoding sarcomeric protein cardiac troponin T. Compared to control healthy individuals in the same family cohort, cardiomyocytes derived from iPSCs from DCM patients exhibited altered regulation of calcium ion (Ca(2+)), decreased contractility, and abnormal distribution of sarcomeric α-actinin. When stimulated with a β-adrenergic agonist, DCM iPSC-derived cardiomyocytes showed characteristics of cellular stress such as reduced beating rates, compromised contraction, and a greater number of cells with abnormal sarcomeric α-actinin distribution. Treatment with β-adrenergic blockers or overexpression of sarcoplasmic reticulum Ca(2+) adenosine triphosphatase (Serca2a) improved the function of iPSC-derived cardiomyocytes from DCM patients. Thus, iPSC-derived cardiomyocytes from DCM patients recapitulate to some extent the morphological and functional phenotypes of DCM and may serve as a useful platform for exploring disease mechanisms and for drug screening.

    View details for DOI 10.1126/scitranslmed.3003552

    View details for Web of Science ID 000303045900004

    View details for PubMedID 22517884

    View details for PubMedCentralID PMC3657516

  • Extensive In vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses Experimental Biology Meeting 2012 Snyder, M., Li, X., Gianoulis, T., Yip, K., Gerstein, M. FEDERATION AMER SOC EXP BIOL. 2012
  • A core erythroid transcriptional network is repressed by a master regulator of myelo-lymphoid differentiation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wontakal, S. N., Guo, X., Smith, C., MacCarthy, T., Bresnick, E. H., Bergman, A., Snyder, M. P., Weissman, S. M., Zheng, D., Skoultchi, A. I. 2012; 109 (10): 3832-3837

    Abstract

    Two mechanisms that play important roles in cell fate decisions are control of a "core transcriptional network" and repression of alternative transcriptional programs by antagonizing transcription factors. Whether these two mechanisms operate together is not known. Here we report that GATA-1, SCL, and Klf1 form an erythroid core transcriptional network by co-occupying >300 genes. Importantly, we find that PU.1, a negative regulator of terminal erythroid differentiation, is a highly integrated component of this network. GATA-1, SCL, and Klf1 act to promote, whereas PU.1 represses expression of many of the core network genes. PU.1 also represses the genes encoding GATA-1, SCL, Klf1, and important GATA-1 cofactors. Conversely, in addition to repressing PU.1 expression, GATA-1 also binds to and represses >100 PU.1 myelo-lymphoid gene targets in erythroid progenitors. Mathematical modeling further supports that this dual mechanism of repressing both the opposing upstream activator and its downstream targets provides a synergistic, robust mechanism for lineage specification. Taken together, these results amalgamate two key developmental principles, namely, regulation of a core transcriptional network and repression of an alternative transcriptional program, thereby enhancing our understanding of the mechanisms that establish cellular identity.

    View details for DOI 10.1073/pnas.1121019109

    View details for Web of Science ID 000301117700049

    View details for PubMedID 22357756

    View details for PubMedCentralID PMC3309740

  • Tcf7 Is an Important Regulator of the Switch of Self-Renewal and Differentiation in a Multipotential Hematopoietic Cell Line PLOS GENETICS Wu, J. Q., Seay, M., Schulz, V. P., Hariharan, M., Tuck, D., Lian, J., Du, J., Shi, M., Ye, Z., Gerstein, M., Snyder, M. P., Weissman, S. 2012; 8 (3)

    Abstract

    A critical problem in biology is understanding how cells choose between self-renewal and differentiation. To generate a comprehensive view of the mechanisms controlling early hematopoietic precursor self-renewal and differentiation, we used systems-based approaches and murine EML multipotential hematopoietic precursor cells as a primary model. EML cells give rise to a mixture of self-renewing Lin-SCA+CD34+ cells and partially differentiated non-renewing Lin-SCA-CD34- cells in a cell autonomous fashion. We identified and validated the HMG box protein TCF7 as a regulator in this self-renewal/differentiation switch that operates in the absence of autocrine Wnt signaling. We found that Tcf7 is the most down-regulated transcription factor when CD34+ cells switch into CD34- cells, using RNA-Seq. We subsequently identified the target genes bound by TCF7, using ChIP-Seq. We show that TCF7 and RUNX1 (AML1) bind to each other's promoter regions and that TCF7 is necessary for the production of the short isoforms, but not the long isoforms of RUNX1, suggesting that TCF7 and the short isoforms of RUNX1 function coordinately in regulation. Tcf7 knock-down experiments and Gene Set Enrichment Analyses suggest that TCF7 plays a dual role in promoting the expression of genes characteristic of self-renewing CD34+ cells while repressing genes activated in partially differentiated CD34- state. Finally a network of up-regulated transcription factors of CD34+ cells was constructed. Factors that control hematopoietic stem cell (HSC) establishment and development, cell growth, and multipotency were identified. These studies in EML cells demonstrate fundamental cell-intrinsic properties of the switch between self-renewal and differentiation, and yield valuable insights for manipulating HSCs and other differentiating systems.

    View details for DOI 10.1371/journal.pgen.1002565

    View details for Web of Science ID 000302254800041

    View details for PubMedID 22412390

    View details for PubMedCentralID PMC3297581

  • The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome NATURE BIOTECHNOLOGY Paik, Y., Jeong, S., Omenn, G. S., Uhlen, M., Hanash, S., Cho, S. Y., Lee, H., Na, K., Choi, E., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Cheng, Y., Chen, R., Marko-Varga, G., Deutsch, E. W., Kim, H., Kwon, J., Aebersold, R., Bairoch, A., Taylor, A. D., Kim, K. Y., Lee, E., Hochstrasser, D., Legrain, P., Hancock, W. S. 2012; 30 (3): 221-223

    View details for Web of Science ID 000301303800011

    View details for PubMedID 22398612

  • Correlation of Global MicroRNA Expression With Basal Cell Carcinoma Subtype G3-GENES GENOMES GENETICS Heffelfinger, C., Ouyang, Z., Engberg, A., Leffell, D. J., Hanlon, A. M., Gordon, P. B., Zheng, W., Zhao, H., Snyder, M. P., Bale, A. E. 2012; 2 (2): 279-286

    Abstract

    Basal cell carcinomas (BCCs) are the most common cancers in the United States. The histologic appearance distinguishes several subtypes, each of which can have a different biologic behavior. In this study, global miRNA expression was quantified by high-throughput sequencing in nodular BCCs, a subtype that is slow growing, and infiltrative BCCs, aggressive tumors that extend through the dermis and invade structures such as cutaneous nerves. Principal components analysis correctly classified seven of eight infiltrative tumors on the basis of miRNA expression. The remaining tumor, on pathology review, contained a mixture of nodular and infiltrative elements. Nodular tumors did not cluster tightly, likely reflecting broader histopathologic diversity in this class, but trended toward forming a group separate from infiltrative BCCs. Quantitative polymerase chain reaction assays were developed for six of the miRNAs that showed significant differences between the BCC subtypes, and five of these six were validated in a replication set of four infiltrative and three nodular tumors. The expression level of miR-183, a miRNA that inhibits invasion and metastasis in several types of malignancies, was consistently lower in infiltrative than nodular tumors and could be one element underlying the difference in invasiveness. These results represent the first miRNA profiling study in BCCs and demonstrate that miRNA gene expression may be involved in tumor pathogenesis and particularly in determining the aggressiveness of these malignancies.

    View details for DOI 10.1534/g3.111.001115

    View details for Web of Science ID 000312411000015

    View details for PubMedID 22384406

    View details for PubMedCentralID PMC3284335

  • An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome biology Stamatoyannopoulos, J. A., Snyder, M., Hardison, R., Ren, B., Gingeras, T., Gilbert, D. M., Groudine, M., Bender, M., Kaul, R., Canfield, T., Giste, E., Johnson, A., Zhang, M., Balasundaram, G., Byron, R., Roach, V., Sabo, P. J., Sandstrom, R., Stehling, A. S., Thurman, R. E., Weissman, S. M., Cayting, P., Hariharan, M., Lian, J., Cheng, Y., Landt, S. G., Ma, Z., Wold, B. J., Dekker, J., Crawford, G. E., Keller, C. A., Wu, W., Morrissey, C., Kumar, S. A., Mishra, T., Jain, D., Byrska-Bishop, M., Blankenberg, D., Lajoie, B. R., Jain, G., Sanyal, A., Chen, K. B., Denas, O., Taylor, J., Blobel, G. A., Weiss, M. J., Pimkin, M., Deng, W., Marinov, G. K., Williams, B. A., Fisher-Aylor, K. I., Desalvo, G., Kiralusha, A., Trout, D., Amrhein, H., Mortazavi, A., Edsall, L., McCleary, D., Kuan, S., Shen, Y., Yue, F., Ye, Z., Davis, C. A., Zaleski, C., Jha, S., Xue, C., Dobin, A., Lin, W., Fastuca, M., Wang, H., Guigo, R., Djebali, S., Lagarde, J., Ryba, T., Sasaki, T., Malladi, V. S., Cline, M. S., Kirkup, V. M., Learned, K., Rosenbloom, K. R., Kent, W. J., Feingold, E. A., Good, P. J., Pazin, M., Lowdon, R. F., Adams, L. B. 2012; 13 (8): 418

    Abstract

    To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

    View details for DOI 10.1186/gb-2012-13-8-418

    View details for PubMedID 22889292

    View details for PubMedCentralID PMC3491367

  • Deciphering DNA Sequence Information GENOME ORGANIZATION AND FUNCTION IN THE CELL NUCLEUS Kaganovich, M., Snyder, M., Rippe, K. 2012: 1–20
  • Q & A: the Snyderome GENOME BIOLOGY Snyder, M. 2012; 13 (3)

    Abstract

    Michael Snyder answers Genome Biology's questions on the human and professional stories underlying his Snyderome integrative omics project.

    View details for DOI 10.1186/gb-2012-13-3-147

    View details for Web of Science ID 000308544200010

    View details for PubMedID 22424393

    View details for PubMedCentralID PMC3439959

  • Phosphorylation of Yeast Transcription Factors Correlates with the Evolution of Novel Sequence and Function JOURNAL OF PROTEOME RESEARCH Kaganovich, M., Snyder, M. 2012; 11 (1): 261-268

    Abstract

    Gene duplication is a significant source of novel genes and the dynamics of gene duplicate retention vs loss are poorly understood, particularly in terms of the functional and regulatory specialization of their gene products. We compiled a comprehensive data set of S. cerevisiae phosphosites to study the role of phosphorylation in yeast paralog divergence. We found that proteins coded by duplicated genes created in the Whole Genome Duplication (WGD) event and in a period prior to the WGD are significantly more phosphorylated than other duplicates or singletons. Though the amino acid sequence of each paralog of a given pair tends to diverge fairly similarly from their common ortholog in a related species, the phosphorylated amino acids tend to diverge in sequence from the ortholog at different rates. We observed that transcription factors (TFs) are disproportionately present among the set of duplicate genes and among the set of proteins that are phosphorylated. Interestingly, TFs that occur on higher levels of the transcription network hierarchy (i.e., tend to regulate other TFs) tend to be more phosphorylated than lower-level TFs. We found that TF paralog divergence in expression, binding, and sequence correlates with the abundance of phosphosites. Overall, these studies have important implications for understanding divergence of gene function and regulation in eukaryotes.

    View details for DOI 10.1021/pr201065k

    View details for Web of Science ID 000298827700024

    View details for PubMedID 22141333

  • Interpretome: a freely available, modular, and secure personal genome interpretation engine. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Karczewski, K. J., Tirrell, R. P., Cordero, P., Tatonetti, N. P., Dudley, J. T., Salari, K., Snyder, M., Altman, R. B., Kim, S. K. 2012: 339-350

    Abstract

    The decreasing cost of genotyping and genome sequencing has ushered in an era of genomic personalized medicine. More than 100,000 individuals have been genotyped by direct-to-consumer genetic testing services, which offer a glimpse into the interpretation and exploration of a personal genome. However, these interpretations, which require extensive manual curation, are subject to the preferences of the company and are not customizable by the individual. Academic institutions teaching personalized medicine, as well as genetic hobbyists, may prefer to customize their analysis and have full control over the content and method of interpretation. We present the Interpretome, a system for private genome interpretation, which contains all genotype information in client-side interpretation scripts, supported by server-side databases. We provide state-of-the-art analyses for teaching clinical implications of personal genomics, including disease risk assessment and pharmacogenomics. Additionally, we have implemented client-side algorithms for ancestry inference, demonstrating the power of these methods without excessive computation. Finally, the modular nature of the system allows for plugin capabilities for custom analyses. This system will allow for personal genome exploration without compromising privacy, facilitating hands-on courses in genomics and personalized medicine.

    View details for PubMedID 22174289

  • Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors GENOME BIOLOGY Yip, K. Y., Cheng, C., Bhardwaj, N., Brown, J. B., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M., Gerstein, M. 2012; 13 (9)

    Abstract

    Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is nonetheless overwhelmingly complex and simultaneously incomplete since it covers only a small fraction of all human transcription factors. As part of the consortium effort in providing a concise abstraction of the data for facilitating various types of downstream analyses, we constructed statistical models that capture the genomic features of three paired types of regions by machine-learning methods: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found a significant fraction of transcription factor binding without clear sequence motifs and showed that this observation could be related to strong DNA accessibility of these regions. Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. Our machine learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data.

    View details for DOI 10.1186/gb-2012-13-9-r48

    View details for Web of Science ID 000313182600001

    View details for PubMedID 22950945

    View details for PubMedCentralID PMC3491392

  • Characterization of Enhancer Function from Genome-Wide Analyses ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 13 Maston, G. A., Landt, S. G., Snyder, M., Green, M. R. 2012; 13: 29-57

    Abstract

    There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets should become a standard approach to elucidate higher-order regulatory interactions.

    View details for DOI 10.1146/annurev-genom-090711-163723

    View details for Web of Science ID 000310143800002

    View details for PubMedID 22703170

  • A High-Resolution Whole-Genome Map of Key Chromatin Modifications in the Adult Drosophila melanogaster PLOS GENETICS Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2011; 7 (12)

    Abstract

    Epigenetic research has been focused on cell-type-specific regulation; less is known about common features of epigenetic programming shared by diverse cell types within an organism. Here, we report a modified method for chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and its use to construct a high-resolution map of the Drosophila melanogaster key histone marks, heterochromatin protein 1a (HP1a) and RNA polymerase II (polII). These factors are mapped at 50-bp resolution genome-wide and at 5-bp resolution for regulatory sequences of genes, which reveals fundamental features of chromatin modification landscape shared by major adult Drosophila cell types: the enrichment of both heterochromatic and euchromatic marks in transposons and repetitive sequences, the accumulation of HP1a at transcription start sites with stalled polII, the signatures of histone code and polII level/position around the transcriptional start sites that predict both the mRNA level and functionality of genes, and the enrichment of elongating polII within exons at splicing junctions. These features, likely conserved among diverse epigenomes, reveal general strategies for chromatin modifications.

    View details for DOI 10.1371/journal.pgen.1002380

    View details for Web of Science ID 000299167900003

    View details for PubMedID 22194694

  • Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms PLOS ONE Haraksingh, R. R., Abyzov, A., Gerstein, M., Urban, A. E., Snyder, M. 2011; 6 (11)

    Abstract

    Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous, array-based platforms for CNV detection exist utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications.

    View details for DOI 10.1371/journal.pone.0027859

    View details for Web of Science ID 000298168100021

    View details for PubMedID 22140474

    View details for PubMedCentralID PMC3227574

  • Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data PLOS COMPUTATIONAL BIOLOGY Cheng, C., Yan, K., Hwang, W., Qian, J., Bhardwaj, N., Rozowsky, J., Lu, Z. J., Niu, W., Alves, P., Kato, M., Snyder, M., Gerstein, M. 2011; 7 (11)

    Abstract

    We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly at various tissues, have more interacting partners, and are more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using other two data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data becomes available in the near future, our methods of data integration have various potential applications.

    View details for DOI 10.1371/journal.pcbi.1002190

    View details for Web of Science ID 000297263700001

    View details for PubMedID 22125477

    View details for PubMedCentralID PMC3219617

  • Performance comparison of exome DNA sequencing technologies NATURE BIOTECHNOLOGY Clark, M. J., Chen, R., Lam, H. Y., Karczewski, K. J., Chen, R., Euskirchen, G., Butte, A. J., Snyder, M. 2011; 29 (10): 908-U206

    Abstract

    Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

    View details for DOI 10.1038/nbt.1975

    View details for Web of Science ID 000296273000017

    View details for PubMedID 21947028

  • Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence PLOS GENETICS Dewey, F. E., Chen, R., Cordero, S. P., Ormond, K. E., Caleshu, C., Karczewski, K. J., Whirl-Carrillo, M., Wheeler, M. T., Dudley, J. T., Byrnes, J. K., Cornejo, O. E., Knowles, J. W., Woon, M., Sangkuhl, K., Gong, L., Thorn, C. F., Hebert, J. M., Capriotti, E., David, S. P., Pavlovic, A., West, A., Thakuria, J. V., Ball, M. P., Zaranek, A. W., Rehm, H. L., Church, G. M., West, J. S., Bustamante, C. D., Snyder, M., Altman, R. B., Klein, T. E., Butte, A. J., Ashley, E. A. 2011; 7 (9)

    Abstract

    Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

    View details for DOI 10.1371/journal.pgen.1002280

    View details for PubMedID 21935354

  • Arabidopsis RTNLB1 and RTNLB2 Reticulon-Like Proteins Regulate Intracellular Trafficking and Activity of the FLS2 Immune Receptor PLANT CELL Lee, H. Y., Bowen, C. H., Popescu, G. V., Kang, H., Kato, N., Ma, S., Dinesh-Kumar, S., Snyder, M., Popescu, S. C. 2011; 23 (9): 3374-3391

    Abstract

    Receptors localized at the plasma membrane are critical for the recognition of pathogens. The molecular determinants that regulate receptor transport to the plasma membrane are poorly understood. In a screen for proteins that interact with the FLAGELLIN-SENSITIVE2 (FLS2) receptor using Arabidopsis thaliana protein microarrays, we identified the reticulon-like protein RTNLB1. We showed that FLS2 interacts in vivo with both RTNLB1 and its homolog RTNLB2 and that a Ser-rich region in the N-terminal tail of RTNLB1 is critical for the interaction with FLS2. Transgenic plants that lack RTNLB1 and RTNLB2 (rtnlb1 rtnlb2) or overexpress RTNLB1 (RTNLB1ox) exhibit reduced activation of FLS2-dependent signaling and increased susceptibility to pathogens. In both rtnlb1 rtnlb2 and RTNLB1ox, FLS2 accumulation at the plasma membrane was significantly affected compared with the wild type. Transient overexpression of RTNLB1 led to FLS2 retention in the endoplasmic reticulum (ER) and affected FLS2 glycosylation but not FLS2 stability. Removal of the critical N-terminal Ser-rich region or either of the two Tyr-dependent sorting motifs from RTNLB1 causes partial reversion of the negative effects of excess RTNLB1 on FLS2 transport out of the ER and accumulation at the membrane. The results are consistent with a model whereby RTNLB1 and RTNLB2 regulate the transport of newly synthesized FLS2 to the plasma membrane.

    View details for DOI 10.1105/tpc.111.089656

    View details for Web of Science ID 000296739100025

    View details for PubMedID 21949153

    View details for PubMedCentralID PMC3203430

  • Cooperative transcription factor associations discovered using regulatory variation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Karczewski, K. J., Tatonetti, N. P., Landt, S. G., Yang, X., Slifer, T., Altman, R. B., Snyder, M. 2011; 108 (32): 13353-13358

    Abstract

    Regulation of gene expression at the transcriptional level is achieved by complex interactions of transcription factors operating at their target genes. Dissecting the specific combination of factors that bind each target is a significant challenge. Here, we describe in detail the Allele Binding Cooperativity test, which uses variation in transcription factor binding among individuals to discover combinations of factors and their targets. We developed the ALPHABIT (a large-scale process to hunt for allele binding interacting transcription factors) pipeline, which includes statistical analysis of binding sites followed by experimental validation, and demonstrate that this method predicts transcription factors that associate with NFκB. Our method successfully identifies factors that have been known to work with NFκB (E2A, STAT1, IRF2), but whose global coassociation and sites of cooperative action were not known. In addition, we identify a unique coassociation (EBF1) that had not been reported previously. We present a general approach for discovering combinatorial models of regulation and advance our understanding of the genetic basis of variation in transcription factor binding.

    View details for DOI 10.1073/pnas.1103105108

    View details for Web of Science ID 000293691400076

    View details for PubMedID 21828005

    View details for PubMedCentralID PMC3156166

  • A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans PLOS GENETICS Stewart, C., Kural, D., Stroemberg, M. P., Walker, J. A., Konkel, M. K., Stuetz, A. M., Urban, A. E., Grubert, F., Lam, H. Y., Lee, W., Busby, M., Indap, A. R., Garrison, E., Huff, C., Xing, J., Snyder, M. P., Jorde, L. B., Batzer, M. A., Korbel, J. O., Marth, G. T. 2011; 7 (8)

    Abstract

    As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.

    View details for DOI 10.1371/journal.pgen.1002236

    View details for Web of Science ID 000294297000031

    View details for PubMedID 21876680

    View details for PubMedCentralID PMC3158055

  • AlleleSeq: analysis of allele-specific expression and binding in a network framework MOLECULAR SYSTEMS BIOLOGY Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N., Bhardwaj, N., Rubin, M., Snyder, M., Gerstein, M. 2011; 7

    Abstract

    To study allele-specific expression (ASE) and binding (ASB), that is, differences between the maternally and paternally derived alleles, we have developed a computational pipeline (AlleleSeq). Our pipeline initially constructs a diploid personal genome sequence (and corresponding personalized gene annotation) using genomic sequence variants (SNPs, indels, and structural variants), and then identifies allele-specific events with significant differences in the number of mapped reads between maternal and paternal alleles. There are many technical challenges in the construction and alignment of reads to a personal diploid genome sequence that we address, for example, bias of reads mapping to the reference allele. We have applied AlleleSeq to variation data for NA12878 from the 1000 Genomes Project as well as matched, deeply sequenced RNA-Seq and ChIP-Seq data sets generated for this purpose. In addition to observing fairly widespread allele-specific behavior within individual functional genomic data sets (including results consistent with X-chromosome inactivation), we can study the interaction between ASE and ASB. Furthermore, we investigate the coordination between ASE and ASB from multiple transcription factors events using a regulatory network framework. Correlation analyses and network motifs show mostly coordinated ASB and ASE.

    View details for DOI 10.1038/msb.2011.54

    View details for Web of Science ID 000294537800003

    View details for PubMedID 21811232

    View details for PubMedCentralID PMC3208341

  • Identification of genomic indels and structural variations using split reads BMC GENOMICS Zhang, Z. D., Du, J., Lam, H., Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 12

    Abstract

    Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.

    View details for DOI 10.1186/1471-2164-12-375

    View details for Web of Science ID 000294205500001

    View details for PubMedID 21787423

    View details for PubMedCentralID PMC3161018

  • Metabolites as global regulators: A new view of protein regulation BIOESSAYS Li, X., Snyder, M. 2011; 33 (7): 485-489

    View details for DOI 10.1002/bies.201100026

    View details for Web of Science ID 000292710500002

    View details for PubMedID 21495048

  • The Human Proteome Project: Current State and Future Direction MOLECULAR & CELLULAR PROTEOMICS Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C. H., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Wu, C. H., Yamamoto, T., Paik, Y., Omenn, G. S. 2011; 10 (7)

    Abstract

    After the successful completion of the Human Genome Project, the Human Proteome Organization has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given the lack of protein-level evidence for about 30% of the estimated 20,300 protein-coding genes, a systematic global effort will be necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP research groups will use the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge bases. The HPP participants will take advantage of the output and cross-analyses from the ongoing Human Proteome Organization initiatives and a chromosome-centric protein mapping strategy, termed C-HPP, with which many national teams are currently engaged. In addition, numerous biologically driven and disease-oriented projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents, and tools for protein studies and analyses, and a stronger basis for personalized medicine. The Human Proteome Organization urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.

    View details for DOI 10.1074/mcp.M111.009993

    View details for Web of Science ID 000292541500012

    View details for PubMedID 21742803

    View details for PubMedCentralID PMC3134076

  • Landscape of Next-Generation Sequencing Technologies ANALYTICAL CHEMISTRY Niedringhaus, T. P., Milanova, D., Kerby, M. B., Snyder, M. P., Barron, A. E. 2011; 83 (12): 4327-4341

    View details for DOI 10.1021/ac2010857

    View details for Web of Science ID 000291499800001

    View details for PubMedID 21612267

    View details for PubMedCentralID PMC3437308

  • A Large Gene Network in Immature Erythroid Cells Is Controlled by the Myeloid and B Cell Transcriptional Regulator PU.1 PLOS GENETICS Wontakal, S. N., Guo, X., Will, B., Shi, M., Raha, D., Mahajan, M. C., Weissman, S., Snyder, M., Steidl, U., Zheng, D., Skoultchi, A. I. 2011; 7 (6)

    Abstract

    PU.1 is a hematopoietic transcription factor that is required for the development of myeloid and B cells. PU.1 is also expressed in erythroid progenitors, where it blocks erythroid differentiation by binding to and inhibiting the main erythroid promoting factor, GATA-1. However, other mechanisms by which PU.1 affects the fate of erythroid progenitors have not been thoroughly explored. Here, we used ChIP-Seq analysis for PU.1 and gene expression profiling in erythroid cells to show that PU.1 regulates an extensive network of genes that constitute major pathways for controlling growth and survival of immature erythroid cells. By analyzing fetal liver erythroid progenitors from mice with low PU.1 expression, we also show that the earliest erythroid committed cells are dramatically reduced in vivo. Furthermore, we find that PU.1 also regulates many of the same genes and pathways in other blood cells, leading us to propose that PU.1 is a multifaceted factor with overlapping, as well as distinct, functions in several hematopoietic lineages.

    View details for DOI 10.1371/journal.pgen.1001392

    View details for Web of Science ID 000292386300004

    View details for PubMedID 21695229

    View details for PubMedCentralID PMC3111485

  • CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing GENOME RESEARCH Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 21 (6): 974-984

    Abstract

    Copy number variation (CNV) in the genome is a complex phenomenon, and not completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method is based on combining the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. Because of this, we could use CNVnator for CNV discovery and genotyping in a population and characterization of atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%-96%), low false-discovery rate (3%-20%), high genotyping accuracy (93%-95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: It misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distribution strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.

    View details for DOI 10.1101/gr.114876.110

    View details for Web of Science ID 000291153400017

    View details for PubMedID 21324876

    View details for PubMedCentralID PMC3106330

  • Genome-wide chromatin occupancy analysis reveals a role for ASH2 in transcriptional pausing NUCLEIC ACIDS RESEARCH Perez-Lluch, S., Blanco, E., Carbonell, A., Raha, D., Snyder, M., Serras, F., Corominas, M. 2011; 39 (11): 4628-4639

    Abstract

    An important mechanism for gene regulation involves chromatin changes via histone modification. One such modification is histone H3 lysine 4 trimethylation (H3K4me3), which requires histone methyltransferase complexes (HMT) containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have identified genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have characterized the occupancy of phosphorylated forms of RNA Polymerase II and histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and histone activating marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants in comparison to wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of the ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in transcriptional pausing control.

    View details for DOI 10.1093/nar/gkq1322

    View details for Web of Science ID 000291755000015

    View details for PubMedID 21310711

    View details for PubMedCentralID PMC3113561

  • Diverse protein kinase interactions identified by protein microarrays reveal novel connections between cellular processes GENES & DEVELOPMENT Fasolo, J., Sboner, A., Sun, M. G., Yu, H., Chen, R., Sharon, D., Kim, P. M., Gerstein, M., Snyder, M. 2011; 25 (7): 767-778

    Abstract

    Protein kinases are key regulators of cellular processes. In spite of considerable effort, a full understanding of the pathways they participate in remains elusive. We globally investigated the proteins that interact with the majority of yeast protein kinases using protein microarrays. Eighty-five kinases were purified and used to probe yeast proteome microarrays. One-thousand-twenty-three interactions were identified, and the vast majority were novel. Coimmunoprecipitation experiments indicate that many of these interactions occurred in vivo. Many novel links of kinases to previously distinct cellular pathways were discovered. For example, the well-studied Kss1 filamentous pathway was found to bind components of diverse cellular pathways, such as those of the stress response pathway and the Ccr4-Not transcriptional/translational regulatory complex; genetic tests revealed that these different components operate in the filamentation pathway in vivo. Overall, our results indicate that kinases operate in a highly interconnected network that coordinates many activities of the proteome. Our results further demonstrate that protein microarrays uncover a diverse set of interactions not observed previously.

    View details for DOI 10.1101/gad.1998811

    View details for Web of Science ID 000289062700010

    View details for PubMedID 21460040

    View details for PubMedCentralID PMC3070938

  • A User's Guide to the Encyclopedia of DNA Elements (ENCODE) PLOS BIOLOGY Myers, R. M., Stamatoyannopoulos, J., Snyder, M., Dunham, I., Hardison, R. C., Bernstein, B. E., Gingeras, T. R., Kent, W. J., Birney, E., Wold, B., Crawford, G. E., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Mikkelsen, T. S., Kheradpour, P., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Thanh Truong, T., Ward, L. D., Altshuler, R. C., Lin, M. F., Kellis, M., Gingeras, T. R., Davis, C. A., Kapranov, P., Dobin, A., Zaleski, C., Schlesinger, F., Batut, P., Chakrabortty, S., Jha, S., Lin, W., Drenkow, J., Wang, H., Bell, K., Gao, H., Bell, I., Dumais, E., Dumais, J., Antonarakis, S. E., Ucla, C., Borel, C., Guigo, R., Djebali, S., Lagarde, J., Kingswood, C., Ribeca, P., Sammeth, M., Alioto, T., Merkel, A., Tilgner, H., Carninci, P., Hayashizaki, Y., Lassmann, T., Takahashi, H., Abdelhamid, R. F., Hannon, G., Fejes-Toth, K., Preall, J., Gordon, A., Sotirova, V., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Ruan, Y., Ruan, X., Shahab, A., Poh, W. T., Wei, C., Crawford, G. E., Furey, T. S., Boyle, A. P., Sheffield, N. C., Song, L., Shibata, Y., Vales, T., Winter, D., Zhang, Z., London, D., Wang, T., Birney, E., Keefe, D., Iyer, V. R., Lee, B., McDaniell, R. M., Liu, Z., Battenhouse, A., Bhinge, A. A., Lieb, J. D., Grasfeder, L. L., Showers, K. A., Giresi, P. G., Kim, S. K., Shestak, C., Myers, R. M., Pauli, F., Reddy, T. E., Gertz, J., Partridge, E. C., Jain, P., Sprouse, R. O., Bansal, A., Pusey, B., Muratet, M. A., Varley, K. E., Bowling, K. M., Newberry, K. M., Nesmith, A. S., Dilocker, J. A., Parker, S. L., Waite, L. L., Thibeault, K., Roberts, K., Absher, D. M., Wold, B., Mortazavi, A., Williams, B., Marinov, G., Trout, D., Pepke, S., King, B., McCue, K., Kirilusha, A., DeSalvo, G., Fisher-Aylor, K., Amrhein, H., Vielmetter, J., Sherlock, G., Sidow, A., Batzoglou, S., Rauch, R., Kundaje, A., Libbrecht, M., Margulies, E. H., Parker, S. C., Elnitski, L., Green, E. D., Hubbard, T., Harrow, J., Searle, S., Kokocinski, F., Aken, B., Frankish, A., Hunt, T., Despacio-Reyes, G., Kay, M., Mukherjee, G., Bignell, A., Saunders, G., Boychenko, V., Brent, M., van Baren, M. J., Brown, R. H., Gerstein, M., Khurana, E., Balasubramanian, S., Zhang, Z., Lam, H., Cayting, P., Robilotto, R., Lu, Z., Guigo, R., Derrien, T., Tanzer, A., Knowles, D. G., Mariotti, M., Kent, W. J., Haussler, D., Harte, R., Diekhans, M., Kellis, M., Lin, M., Kheradpour, P., Ernst, J., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Valencia, A., Tress, M., Manuel Rodriguez, J., Snyder, M., Landt, S. G., Raha, D., Shi, M., Euskirchen, G., Grubert, F., Kasowski, M., Lian, J., Cayting, P., Lacroute, P., Xu, Y., Monahan, H., Patacsil, D., Slifer, T., Yang, X., Charos, A., Reed, B., Wu, L., Auerbach, R. K., Habegger, L., Hariharan, M., Rozowsky, J., Abyzov, A., Weissman, S. M., Gerstein, M., Struhl, K., Lamarre-Vincent, N., Lindahl-Allen, M., Miotto, B., Moqtaderi, Z., Fleming, J. D., Newburger, P., Farnham, P. J., Frietze, S., O'Geen, H., Xu, X., Blahnik, K. R., Cao, A. R., Iyengar, S., Stamatoyannopoulos, J. A., Kaul, R., Thurman, R. E., Wang, H., Navas, P. A., Sandstrom, R., Sabo, P. J., Weaver, M., Canfield, T., Lee, K., Neph, S., Roach, V., Reynolds, A., Johnson, A., Rynes, E., Giste, E., Vong, S., Neri, J., Frum, T., Johnson, E. M., Nguyen, E. D., Ebersol, A. K., Sanchez, M. E., Sheffer, H. H., Lotakis, D., Haugen, E., Humbert, R., Kutyavin, T., Shafer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Kent, W. J., Rosenbloom, K. R., Dreszer, T. R., Raney, B. J., Barber, G. P., Meyer, L. R., Sloan, C. A., Malladi, V. S., Cline, M. S., Learned, K., Swing, V. K., Zweig, A. S., Rhead, B., Fujita, P. A., Roskin, K., Karolchik, D., Kuhn, R. M., Haussler, D., Birney, E., Dunham, I., Wilder, S. P., Keefe, D., Sobral, D., Herrero, J., Beal, K., Lukk, M., Brazma, A., Vaquerizas, J. M., Luscombe, N. M., Bickel, P. J., Boley, N., Brown, J. B., Li, Q., Huang, H., Gerstein, M., Habegger, L., Sboner, A., Rozowsky, J., Auerbach, R. K., Yip, K. Y., Cheng, C., Yan, K., Bhardwaj, N., Wang, J., Lochovsky, L., Jee, J., Gibson, T., Leng, J., Du, J., Hardison, R. C., Harris, R. S., Song, G., Miller, W., Haussler, D., Roskin, K., Suh, B., Wang, T., Paten, B., Noble, W. S., Hoffman, M. M., Buske, O. J., Weng, Z., Dong, X., Wang, J., Xi, H., Tenenbaum, S. A., Doyle, F., Penalva, L. O., Chittur, S., Tullius, T. D., Parker, S. C., White, K. P., Karmakar, S., Victorsen, A., Jameel, N., Bild, N., Grossman, R. L., Snyder, M., Landt, S. G., Yang, X., Patacsil, D., Slifer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Trinklein, N. D., Partridge, E. C., Myers, R. M., Giddings, M. C., Chen, X., Khatun, J., Maier, C., Yu, Y., Gunawardena, H., Risk, B., Feingold, E. A., Lowdon, R. F., Dillon, L. A., Good, P. J. 2011; 9 (4)

    Abstract

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

    View details for DOI 10.1371/journal.pbio.1001046

    View details for Web of Science ID 000289938900014

  • Diverse Roles and Interactions of the SWI/SNF Chromatin Remodeling Complex Revealed Using Global Approaches PLOS GENETICS Euskirchen, G. M., Auerbach, R. K., Davidov, E., Gianoulis, T. A., Zhong, G., Rozowsky, J., Bhardwaj, N., Gerstein, M. B., Snyder, M. 2011; 7 (3)

    Abstract

    A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5' ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. 
Taken together the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.

    View details for DOI 10.1371/journal.pgen.1002008

    View details for Web of Science ID 000288996600042

    View details for PubMedID 21408204

    View details for PubMedCentralID PMC3048368

  • Mapping copy number variation by population-scale genome sequencing NATURE Mills, R. E., Walter, K., Stewart, C., Handsaker, R. E., Chen, K., Alkan, C., Abyzov, A., Yoon, S. C., Ye, K., Cheetham, R. K., Chinwalla, A., Conrad, D. F., Fu, Y., Grubert, F., Hajirasouliha, I., Hormozdiari, F., Iakoucheva, L. M., Iqbal, Z., Kang, S., Kidd, J. M., Konkel, M. K., Korn, J., Khurana, E., Kural, D., Lam, H. Y., Leng, J., Li, R., Li, Y., Lin, C., Luo, R., Mu, X. J., Nemesh, J., Peckham, H. E., Rausch, T., Scally, A., Shi, X., Stromberg, M. P., Stuetz, A. M., Urban, A. E., Walker, J. A., Wu, J., Zhang, Y., Zhang, Z. D., Batzer, M. A., Ding, L., Marth, G. T., McVean, G., Sebat, J., Snyder, M., Wang, J., Ye, K., Eichler, E. E., Gerstein, M. B., Hurles, M. E., Lee, C., McCarroll, S. A., Korbel, J. O. 2011; 470 (7332): 59-65

    Abstract

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.

    View details for DOI 10.1038/nature09708

    View details for Web of Science ID 000286886400033

    View details for PubMedID 21293372

    View details for PubMedCentralID PMC3077050

  • Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data GENOME RESEARCH Lu, Z. J., Yip, K. Y., Wang, G., Shou, C., Hillier, L. W., Khurana, E., Agarwal, A., Auerbach, R., Rozowsky, J., Cheng, C., Kato, M., Miller, D. M., Slack, F., Snyder, M., Waterston, R. H., Reinke, V., Gerstein, M. B. 2011; 21 (2): 276-285

    Abstract

    We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid level. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, among which more than 1000 are located in the intergenic regions of C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting the expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (∼59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.

    View details for DOI 10.1101/gr.110189.110

    View details for Web of Science ID 000286804100013

    View details for PubMedID 21177971

    View details for PubMedCentralID PMC3032931

  • Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans GENOME RESEARCH Niu, W., Lu, Z. J., Zhong, M., Sarov, M., Murray, J. I., Brdlik, C. M., Janette, J., Chen, C., Alves, P., Preston, E., Slightham, C., Jiang, L., Hyman, A. A., Kim, S. K., Waterston, R. H., Gerstein, M., Snyder, M., Reinke, V. 2011; 21 (2): 245-254

    Abstract

    Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors with target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions between a single transcription factor with one or a few select target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding target genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as target genes. Examination of the binding targets of three related HOX factors--LIN-39, MAB-5, and EGL-5--indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, the comprehensive mapping of transcription factor binding sites will identify features of transcriptional networks that regulate C. elegans developmental processes.

    View details for DOI 10.1101/gr.114587.110

    View details for Web of Science ID 000286804100010

    View details for PubMedID 21177963

    View details for PubMedCentralID PMC3032928

  • RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries BIOINFORMATICS Habegger, L., Sboner, A., Gianoulis, T. A., Rozowsky, J., Agarwal, A., Snyder, M., Gerstein, M. 2011; 27 (2): 281-283

    Abstract

    The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. Availability and implementation: RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.

    View details for DOI 10.1093/bioinformatics/btq643

    View details for Web of Science ID 000286215200025

    View details for PubMedID 21134889

    View details for PubMedCentralID PMC3018817

  • Stat3 is essential for neuronal differentiation through direct transcriptional regulation of the Sox6 gene FEBS LETTERS Snyder, M., Huang, X., Zhang, J. J. 2011; 585 (1): 148-152

    Abstract

    The transcription factor Signal Transducer and Activator of Transcription 3 (Stat3) functions in various cellular processes including neuronal differentiation. We show that the SRY-box containing gene 6 (Sox6) gene, important for neuronal differentiation, is a direct target gene of Stat3. We demonstrate that in response to ligand stimulation, Stat3 binds to the Sox6 promoter and induces its expression. Furthermore, Stat3 is activated and Sox6 is induced during neuronal differentiation of P19 cells in the absence of exogenous ligand treatment. Moreover, using an RNA interference approach, we show that Stat3 is required for Sox6 expression during neuronal differentiation.

    View details for DOI 10.1016/j.febslet.2010.11.030

    View details for Web of Science ID 000285921500025

    View details for PubMedID 21094641

  • The human proteome project: Current state and future direction. Molecular & cellular proteomics : MCP Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Hu, C. H., Yamamoto, T., Paik, Y. K., Omenn, G. S. 2011

    Abstract

    After successful completion of the Human Genome Project (HGP), HUPO has recently officially launched a global Human Proteome Project (HPP) which is designed to map the entire human protein set. Given the presence of about 30% undisclosed proteins out of 20,300 protein gene products, a systematic global effort is necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP groups employ the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge base. The HPP participants will take advantage of the output and cross-analyses from the ongoing HUPO initiatives and a chromosome-based protein mapping strategy, termed C-HPP with many national teams currently engaged. In addition, numerous biologically-driven projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents and tools for protein studies and analyses, and a stronger basis for personalized medicine. HUPO urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.

    View details for DOI 10.1074/mcp.O111.009993

    View details for PubMedID 21531903

  • Embryonic Stem Cells: Discovery, Development, and Current Trends STEM CELLS AND REGENERATIVE MEDICINE: FROM MOLECULAR EMBRYOLOGY TO TISSUE ENGINEERING Theodorou, E., Snyder, M., Appasani, K., Appasani, R. K. 2011: 19–43
  • Analyzing In Vivo Metabolite-Protein Interactions By Large-Scale Systematic Analyses. Current protocols in chemical biology Li, X., Snyder, M. 2011; 3 (4): 181-196

    Abstract

    Metabolites interact with proteins in vivo in various ways other than enzymatic reactions. Profiling of such interactions may help disclose unknown molecular mechanisms that regulate protein functions, and provide potential targets for disease treatment. Here we describe a procedure for systematic analyses of metabolite-protein interactions in vivo. This procedure couples protein affinity purification and mass spectrometry to identify metabolite-protein interactions. The primary effort can be completed within one day and scaled to process hundreds of samples in a batch. Originally developed in yeast, the same principle and protocol can be adapted to other organisms.

    View details for PubMedID 22846927

  • The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics GENOME BIOLOGY Gianoulis, T. A., Agarwal, A., Snyder, M., Gerstein, M. B. 2011; 12 (3)

    Abstract

    Biological data is often tabular but finding statistically valid connections between entities in a sequence of tables can be problematic--for example, connecting particular entities in a drug property table to gene properties in a second table, using a third table associating genes with drugs. Here we present an approach (CRIT) to find connections such as these and show how it can be applied in a variety of genomic contexts including chemogenomics data.

    View details for DOI 10.1186/gb-2011-12-3-r32

    View details for Web of Science ID 000291309200012

    View details for PubMedID 21453526

    View details for PubMedCentralID PMC3129682

  • Regulatory Variation Within and Between Species ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 12 Zheng, W., Gianoulis, T. A., Karczewski, K. J., Zhao, H., Snyder, M. 2011; 12: 327-346

    Abstract

    Understanding how individuals differ from one another and from closely related species is a fundamental problem in biology. Recent evidence suggests that much of the variation both within and between species is due to differential gene regulation. Here we review differential gene regulation focusing on evolutionary-developmental (evo-devo) biology, global comparison of genomic sequences, whole-genome gene expression, and transcription factor (TF) binding profiles. We also explore the relationship between divergence rate of regulatory sequences, coding sequences, and TF binding events using several different measures and discuss their implications in the context of evolution of regulatory networks. Finally, we discuss the current status and future challenges in relating regulatory variation to the divergence across and within species.

    View details for DOI 10.1146/annurev-genom-082908-150139

    View details for Web of Science ID 000295819900014

    View details for PubMedID 21721942

  • Kinase substrate interactions. Methods in molecular biology (Clifton, N.J.) Smith, M. G., Ptacek, J., Snyder, M. 2011; 723: 201-212

    Abstract

    Kinases have become popular therapeutic targets primarily due to their integral role in cell cycle and tumor progression. The efficacy of high-throughput screening efforts is dependent on the development of high quality multiplex tools capable of replacing lower-throughput technologies such as mass spectroscopy or solution-based assays for the study of kinase-substrate interactions. Functional protein microarrays are comprised of thousands of immobilized proteins on glass slides that have been used successfully to identify protein-protein interactions. Here, we describe the application of functional protein microarrays for the identification of the phosphorylation targets of individual protein kinases using highly sensitive radioactive detection and robust informatics algorithms.

    View details for DOI 10.1007/978-1-61779-043-0_13

    View details for PubMedID 21370067

  • Measuring the Evolutionary Rewiring of Biological Networks PLOS COMPUTATIONAL BIOLOGY Shou, C., Bhardwaj, N., Lam, H. Y., Yan, K., Kim, P. M., Snyder, M., Gerstein, M. B. 2011; 7 (1)

    Abstract

    We have accumulated a large amount of biological network data and expect even more to come. Soon, we anticipate being able to compare many different biological networks as we commonly do for molecular sequences. It has long been believed that many of these networks change, or "rewire", at different rates. It is therefore important to develop a framework to quantify the differences between networks in a unified fashion. We developed such a formalism based on analogy to simple models of sequence evolution, and used it to conduct a systematic study of network rewiring on all the currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway network, from fast to slow. This ordering was found in all comparisons we did of matched networks between organisms. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of "commonplace" networks such as family trees, co-authorships and linux-kernel function dependencies.

    View details for DOI 10.1371/journal.pcbi.1001050

    View details for Web of Science ID 000286652100009

    View details for PubMedID 21253555

    View details for PubMedCentralID PMC3017101

  • RNA sequencing. Methods in molecular biology (Clifton, N.J.) Waern, K., Nagalakshmi, U., Snyder, M. 2011; 759: 125-132

    Abstract

    This chapter describes the RNA sequencing (RNA-Seq) protocol, whereby RNA from yeast cells is prepared for sequencing on an Illumina Genome Analyzer. The protocol can easily be altered to use RNA from a different organism. This chapter covers RNA extraction, cDNA synthesis, cDNA fragmentation, and Illumina cDNA library generation and contains some brief remarks on bioinformatic analysis.

    View details for DOI 10.1007/978-1-61779-173-4_8

    View details for PubMedID 21863485

  • Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project SCIENCE Gerstein, M. B., Lu, Z. J., Van Nostrand, E. L., Cheng, C., Arshinoff, B. I., Liu, T., Yip, K. Y., Robilotto, R., Rechtsteiner, A., Ikegami, K., Alves, P., Chateigner, A., Perry, M., Morris, M., Auerbach, R. K., Feng, X., Leng, J., Vielle, A., Niu, W., Rhrissorrakrai, K., Agarwal, A., Alexander, R. P., Barber, G., Brdlik, C. M., Brennan, J., Brouillet, J. J., Carr, A., Cheung, M., Clawson, H., Contrino, S., Dannenberg, L. O., Dernburg, A. F., Desai, A., Dick, L., Dose, A. C., Du, J., Egelhofer, T., Ercan, S., Euskirchen, G., Ewing, B., Feingold, E. A., Gassmann, R., Good, P. J., Green, P., Gullier, F., Gutwein, M., Guyer, M. S., Habegger, L., Han, T., Henikoff, J. G., Henz, S. R., Hinrichs, A., Holster, H., Hyman, T., Iniguez, A. L., Janette, J., Jensen, M., Kato, M., Kent, W. J., Kephart, E., Khivansara, V., Khurana, E., Kim, J. K., Kolasinska-Zwierz, P., Lai, E. C., Latorre, I., Leahey, A., Lewis, S., Lloyd, P., Lochovsky, L., Lowdon, R. F., Lubling, Y., Lyne, R., MacCoss, M., Mackowiak, S. D., Mangone, M., McKay, S., Mecenas, D., Merrihew, G., Miller, D. M., Muroyama, A., Murray, J. I., Ooi, S., Pham, H., Phippen, T., Preston, E. A., Rajewsky, N., Raetsch, G., Rosenbaum, H., Rozowsky, J., Rutherford, K., Ruzanov, P., Sarov, M., Sasidharan, R., Sboner, A., Scheid, P., Segal, E., Shin, H., Shou, C., Slack, F. J., Slightam, C., Smith, R., Spencer, W. C., Stinson, E. O., Taing, S., Takasaki, T., Vafeados, D., Voronina, K., Wang, G., Washington, N. L., Whittle, C. M., Wu, B., Yan, K., Zeller, G., Zha, Z., Zhong, M., Zhou, X., Ahringer, J., Strome, S., Gunsalus, K. C., Micklem, G., Liu, X. S., Reinke, V., Kim, S. K., Hillier, L. W., Henikoff, S., Piano, F., Snyder, M., Stein, L., Lieb, J. D., Waterston, R. H. 2010; 330 (6012): 1775-1787

    Abstract

    We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

    View details for DOI 10.1126/science.1196914

    View details for Web of Science ID 000285603700031

    View details for PubMedID 21177976

    View details for PubMedCentralID PMC3142569

  • Statistical Issues in Mapping QTLs for RNA-seq Data 19th Annual Meeting of the International-Genetic-Epidemiology-Society Zheng, W., Raha, D., Snyder, M., Zhao, H. WILEY-BLACKWELL. 2010: 942–42
  • Exploring successful community pharmacist-physician collaborative working relationships using mixed methods RESEARCH IN SOCIAL & ADMINISTRATIVE PHARMACY Snyder, M. E., Zillich, A. J., Primack, B. A., Rice, K. R., McGivney, M. A., Pringle, J. L., Smith, R. B. 2010; 6 (4): 307-323

    Abstract

    Collaborative working relationships (CWRs) between community pharmacists and physicians may foster the provision of medication therapy management services, disease state management, and other patient care activities; however, pharmacists have expressed difficulty in developing such relationships. Additional work is needed to understand the specific pharmacist-physician exchanges that effectively contribute to the development of CWR. Data from successful pairs of community pharmacists and physicians may provide further insights into these exchange variables and expand research on models of professional collaboration. To describe the professional exchanges that occurred between community pharmacists and physicians engaged in successful CWRs, using a published conceptual model and tool for quantifying the extent of collaboration. A national pool of experts in community pharmacy practice identified community pharmacists engaged in CWRs with physicians. Five pairs of community pharmacists and physician colleagues participated in individual semistructured interviews, and 4 of these pairs completed the Pharmacist-Physician Collaborative Index (PPCI). Main outcome measures include quantitative (ie, scores on the PPCI) and qualitative information about professional exchanges within 3 domains found previously to influence relationship development: relationship initiation, trustworthiness, and role specification. On the PPCI, participants scored similarly on trustworthiness; however, physicians scored higher on relationship initiation and role specification. The qualitative interviews revealed that when initiating relationships, it was important for many pharmacists to establish open communication through face-to-face visits with physicians. Furthermore, physicians were able to recognize in these pharmacists a commitment for improved patient care. Trustworthiness was established by pharmacists making consistent contributions to care that improved patient outcomes over time. 
Open discussions regarding professional roles and an acknowledgment of professional norms (ie, physicians as decision makers) were essential. The findings support and extend the literature on pharmacist-physician CWRs by examining the exchange domains of relationship initiation, trustworthiness, and role specification qualitatively and quantitatively among pairs of practitioners. Relationships appeared to develop in a manner consistent with a published model for CWRs, including the pharmacist as relationship initiator, the importance of communication during early stages of the relationship, and an emphasis on high-quality pharmacist contributions.

    View details for DOI 10.1016/j.sapharm.2009.11.008

    View details for Web of Science ID 000285168400005

    View details for PubMedID 21111388

  • Transformation of Candida albicans with a synthetic hygromycin B resistance gene YEAST Basso, L. R., Bartiss, A., Mao, Y., Gast, C. E., Coelho, P. S., Snyder, M., Wong, B. 2010; 27 (12): 1039-1048

    Abstract

    Synthetic genes that confer resistance to the antibiotic nourseothricin in the pathogenic fungus Candida albicans are available, but genes conferring resistance to other antibiotics are not. We found that multiple C. albicans strains were inhibited by hygromycin B, so we designed a 1026 bp gene (CaHygB) that encodes Escherichia coli hygromycin B phosphotransferase with C. albicans codons. CaHygB conferred hygromycin B resistance in C. albicans transformed with ars2-containing plasmids or single-copy integrating vectors. Since CaHygB did not confer nourseothricin resistance and since the nourseothricin resistance marker SAT-1 did not confer hygromycin B resistance, we reasoned that these two markers could be used for homologous gene disruptions in wild-type C. albicans. We used PCR to fuse CaHygB or SAT-1 to approximately 1 kb of 5' and 3' noncoding DNA from C. albicans ARG4, HIS1 and LEU2, and introduced the resulting amplicons into six wild-type C. albicans strains. Homologous targeting frequencies were approximately 50-70%, and disruption of ARG4, HIS1 and LEU2 alleles was verified by the respective transformants' inabilities to grow without arginine, histidine and leucine. CaHygB should be a useful tool for genetic manipulation of different C. albicans strains, including clinical isolates.

    View details for DOI 10.1002/yea.1813

    View details for PubMedID 20737428

  • Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads BMC GENOMICS Martin, J., Bruno, V. M., Fang, Z., Meng, X., Blow, M., Zhang, T., Sherlock, G., Snyder, M., Wang, Z. 2010; 11

    Abstract

    Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.

    View details for DOI 10.1186/1471-2164-11-663

    View details for Web of Science ID 000285303000001

    View details for PubMedID 21106091

    View details for PubMedCentralID PMC3152782

  • Extensive In Vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses CELL Li, X., Gianoulis, T. A., Yip, K. Y., Gerstein, M., Snyder, M. 2010; 143 (4): 639-650

    Abstract

    Natural small compounds comprise most cellular molecules and bind proteins as substrates, products, cofactors, and ligands. However, a large-scale investigation of in vivo protein-small metabolite interactions has not been performed. We developed a mass spectrometry assay for the large-scale identification of in vivo protein-hydrophobic small metabolite interactions in yeast and analyzed compounds that bind ergosterol biosynthetic proteins and protein kinases. Many of these proteins bind small metabolites; a few interactions were previously known, but the vast majority are new. Importantly, many key regulatory proteins such as protein kinases bind metabolites. Ergosterol was found to bind many proteins and may function as a general regulator. It is required for the activity of Ypk1, a mammalian AKT/SGK kinase homolog. Our study defines potential key regulatory steps in lipid biosynthetic pathways and suggests that small metabolites may play a more general role as regulators of protein activity and function than previously appreciated.

    View details for DOI 10.1016/j.cell.2010.09.048

    View details for Web of Science ID 000284149100020

    View details for PubMedID 21035178

    View details for PubMedCentralID PMC3005334

  • A map of human genome variation from population-scale sequencing NATURE Altshuler, D., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Collins, F. S., De La Vega, F. M., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S. B., Gibbs, R. A., Knoppers, B. M., Lander, E. S., Lehrach, H., Mardis, E. R., McVean, G. A., Nickerson, D., Peltonen, L., Schafer, A. J., Sherry, S. T., Wang, J., Wilson, R. K., Gibbs, R. A., Deiros, D., Metzker, M., Muzny, D., Reid, J., Wheeler, D., Wang, J., Li, J., Jian, M., Li, G., Li, R., Liang, H., Tian, G., Wang, B., Wang, J., Wang, W., Yang, H., Zhang, X., Zheng, H., Lander, E. S., Altshuler, D. L., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Jaffe, D. B., Shefler, E., Sougnez, C. L., Bentley, D. R., Gormley, N., Humphray, S., Kingsbury, Z., Koko-Gonzales, P., Stone, J., McKernan, K. J., Costa, G. L., Ichikawa, J. K., Lee, C. C., Sudbrak, R., Lehrach, H., Borodina, T. A., Dahl, A., Davydov, A. N., Marquardt, P., Mertes, F., Nietfeld, W., Rosenstiel, P., Schreiber, S., Soldatov, A. V., Timmermann, B., Tolzmann, M., Egholm, M., Affourtit, J., Ashworth, D., Attiya, S., Bachorski, M., Buglione, E., Burke, A., Caprio, A., Celone, C., Clark, S., Conners, D., Desany, B., Gu, L., Guccione, L., Kao, K., Kebbel, A., Knowlton, J., Labrecque, M., McDade, L., Mealmaker, C., Minderman, M., Nawrocki, A., Niazi, F., Pareja, K., Ramenani, R., Riches, D., Song, W., Turcotte, C., Wang, S., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Weinstock, G., Durbin, R. M., Burton, J., Carter, D. M., Churcher, C., Coffey, A., Cox, A., Palotie, A., Quail, M., Skelly, T., Stalker, J., Swerdlow, H. P., Turner, D., De Witte, A., Giles, S., Gibbs, R. A., Wheeler, D., Bainbridge, M., Challis, D., Sabo, A., Yu, F., Yu, J., Wang, J., Fang, X., Guo, X., Li, R., Li, Y., Luo, R., Tai, S., Wu, H., Zheng, H., Zheng, X., Zhou, Y., Yang, H., Marth, G. T., Garrison, E. 
P., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., Daly, M. J., DePristo, M. A., Altshuler, D. L., Ball, A. D., Banks, E., Bloom, T., Browning, B. L., Cibulskis, K., Fennell, T. J., Garimella, K. V., Grossman, S. R., Handsaker, R. E., Hanna, M., Hartl, C., Jaffe, D. B., Kernytsky, A. M., Korn, J. M., Li, H., Maguire, J. R., McCarroll, S. A., McKenna, A., Nemesh, J. C., Philippakis, A. A., Poplin, R. E., Price, A., Rivas, M. A., Sabeti, P. C., Schaffner, S. F., Shefler, E., Shlyakhter, I. A., Cooper, D. N., Ball, E. V., Mort, M., Phillips, A. D., Stenson, P. D., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Bustamante, C. D., Clark, A. G., Boyko, A., Degenhardt, J., Gravel, S., Gutenkunst, R. N., Kaganovich, M., Keinan, A., Lacroute, P., Ma, X., Reynolds, A., Clarke, L., Flicek, P., Cunningham, F., Herrero, J., Keenen, S., Kulesha, E., Leinonen, R., McLaren, W., Radhakrishnan, R., Smith, R. E., Zalunin, V., Zheng-Bradley, X., Korbel, J. O., Stuetz, A. M., Humphray, S., Bauer, M., Cheetham, R. K., Cox, T., Eberle, M., James, T., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Hyland, F. C., Manning, J. M., McLaughlin, S. F., Peckham, H. E., Sakarya, O., Sun, Y. A., Tsung, E. F., Batzer, M. A., Konkel, M. K., Walker, J. A., Sudbrak, R., Albrecht, M. W., Amstislavskiy, V. S., Herwig, R., Parkhomchuk, D. V., Sherry, S. T., Agarwala, R., Khouri, H., Morgulis, A. O., Paschall, J. E., Phan, L. D., Rotmistrovsky, K. E., Sanders, R. D., Shumway, M. F., Xiao, C., McVean, G. A., Auton, A., Iqbal, Z., Lunter, G., Marchini, J. L., Moutsianas, L., Myers, S., Tumian, A., Desany, B., Knight, J., Winer, R., Craig, D. W., Beckstrom-Sternberg, S. M., Christoforides, A., Kurdoglu, A. A., Pearson, J., Sinari, S. A., Tembe, W. D., Haussler, D., Hinrichs, A. S., Katzman, S. J., Kern, A., Kuhn, R. 
M., Przeworski, M., Hernandez, R. D., Howie, B., Kelley, J. L., Melton, S. C., Abecasis, G. R., Li, Y., Anderson, P., Blackwell, T., Chen, W., Cookson, W. O., Ding, J., Kang, H. M., Lathrop, M., Liang, L., Moffatt, M. F., Scheet, P., Sidore, C., Snyder, M., Zhan, X., Zoellner, S., Awadalla, P., Casals, F., Idaghdour, Y., Keebler, J., Stone, E. A., Zilversmit, M., Jorde, L., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. M., Sahinalp, S. C., Sudmant, P. H., Mardis, E. R., Chen, K., Chinwalla, A., Ding, L., Koboldt, D. C., McLellan, M. D., Dooling, D., Weinstock, G., Wallis, J. W., Wendl, M. C., Zhang, Q., Durbin, R. M., Albers, C. A., Ayub, Q., Balasubramaniam, S., Barrett, J. C., Carter, D. M., Chen, Y., Conrad, D. F., Danecek, P., Dermitzakis, E. T., Hu, M., Huang, N., Hurles, M. E., Jin, H., Jostins, L., Keane, T. M., Keane, T. M., Le, S. Q., Lindsay, S., Long, Q., MacArthur, D. G., Montgomery, S. B., Parts, L., Stalker, J., Tyler-Smith, C., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Abyzov, A., Balasubramanian, S., Bjornson, R., Du, J., Grubert, F., Habegger, L., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Li, Y., Luo, R., Marth, G. T., Garrison, E. P., Kural, D., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., McCarroll, S. A., Banks, E., DePristo, M. A., Handsaker, R. E., Hartl, C., Korn, J. M., Li, H., Nemesh, J. C., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Degenhardt, J., Kaganovich, M., Clarke, L., Smith, R. E., Zheng-Bradley, X., Korbel, J. O., Humphray, S., Cheetham, R. K., Eberle, M., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Peckham, H. E., Sun, Y. A., Batzer, M. A., Konkel, M. K., Xiao, C., Iqbal, Z., Desany, B., Blackwell, T., Snyder, M., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. 
M., Chen, K., Chinwalla, A., Ding, L., McLellan, M. D., Wallis, J. W., Hurles, M. E., Conrad, D. F., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Du, J., Grubert, F., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Gibbs, R. A., Bainbridge, M., Challis, D., Coafra, C., Dinh, H., Kovar, C., Lee, S., Muzny, D., Nazareth, L., Reid, J., Sabo, A., Yu, F., Yu, J., Marth, G. T., Garrison, E. P., Indap, A., Leong, W. F., Quinlan, A. R., Stewart, C., Ward, A. N., Wu, J., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Garimella, K. V., Hartl, C., Shefler, E., Sougnez, C. L., Wilkinson, J., Clark, A. G., Gravel, S., Grubert, F., Clarke, L., Flicek, P., Smith, R. E., Zheng-Bradley, X., Sherry, S. T., Khouri, H. M., Paschall, J. E., Shumway, M. F., Xiao, C., McVean, G. A., Katzman, S. J., Abecasis, G. R., Blackwell, T., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Koboldt, D. C., Durbin, R. M., Balasubramaniam, S., Coffey, A., Keane, T. M., MacArthur, D. G., Palotie, A., Scott, C., Stalker, J., Tyler-Smith, C., Gerstein, M. B., Balasubramanian, S., Chakravarti, A., Knoppers, B. M., Peltonen, L., Abecasis, G. R., Bustamante, C. D., Gharani, N., Gibbs, R. A., Jorde, L., Kaye, J. S., Kent, A., Li, T., McGuire, A. L., McVean, G. A., Ossorio, P. N., Rotimi, C. N., Su, Y., Toji, L. H., Tyler-Smith, C., Brooks, L. D., Felsenfeld, A. L., McEwen, J. E., Abdallah, A., Juenger, C. R., Clemm, N. C., Collins, F. S., Duncanson, A., Green, E. D., Guyer, M. S., Peterson, J. L., Schafer, A. J., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Durbin, R. M., Gibbs, R. A., Hurles, M. E., McVean, G. A. 2010; 467 (7319): 1061-1073

    Abstract

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    View details for DOI 10.1038/nature09534

    View details for Web of Science ID 000283548600039

    View details for PubMedCentralID PMC3042601

  • Yeast proteomics and protein microarrays JOURNAL OF PROTEOMICS Chen, R., Snyder, M. 2010; 73 (11): 2147-2157

    Abstract

    Our understanding of biological processes as well as human diseases has improved greatly thanks to studies on model organisms such as yeast. The power of scientific approaches with yeast lies in its relatively simple genome, its facile classical and molecular genetics, as well as the evolutionary conservation of many basic biological mechanisms. However, even in this simple model organism, systems biology studies, especially proteomic studies, had been an intimidating task. During the past decade, powerful high-throughput technologies in proteomic research have been developed for yeast, including protein microarray technology. Protein microarray technology allows the interrogation of protein-protein, protein-DNA, and protein-small molecule interaction networks as well as post-translational modification networks in a large-scale, high-throughput manner. With this technology, many groundbreaking findings have been established in studies with the budding yeast Saccharomyces cerevisiae, most of which would have been unachievable with traditional approaches. Discovery of these networks has a profound impact on explaining biological processes from a proteomic point of view, which may lead to a better understanding of normal biological phenomena as well as various human diseases.

    View details for DOI 10.1016/j.jprot.2010.08.003

    View details for Web of Science ID 000283903000008

    View details for PubMedID 20728591

    View details for PubMedCentralID PMC2949546

  • Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq GENOME RESEARCH Bruno, V. M., Wang, Z., Marjani, S. L., Euskirchen, G. M., Martin, J., Sherlock, G., Snyder, M. 2010; 20 (10): 1451-1458

    Abstract

    Candida albicans is the major invasive fungal pathogen of humans, causing diseases ranging from superficial mucosal infections to disseminated, systemic infections that are often life-threatening. We have used massively parallel high-throughput sequencing of cDNA (RNA-seq) to generate a high-resolution map of the C. albicans transcriptome under several different environmental conditions. We have quantitatively determined all of the regions that are transcribed under these different conditions, and have identified 602 novel transcriptionally active regions (TARs) and numerous novel introns that are not represented in the current genome annotation. Interestingly, the expression of many of these TARs is regulated in a condition-specific manner. This comprehensive transcriptome analysis significantly enhances the current genome annotation of C. albicans, a necessary framework for a complete understanding of the molecular mechanisms of pathogenesis for this important eukaryotic pathogen.

    View details for DOI 10.1101/gr.109553.110

    View details for Web of Science ID 000282375000015

    View details for PubMedID 20810668

    View details for PubMedCentralID PMC2945194

  • Annotating non-coding regions of the genome NATURE REVIEWS GENETICS Alexander, R. P., Fang, G., Rozowsky, J., Snyder, M., Gerstein, M. B. 2010; 11 (8): 559-571

    Abstract

    Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.

    View details for DOI 10.1038/nrg2814

    View details for Web of Science ID 000279988800012

    View details for PubMedID 20628352
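
    The sequence of steps outlined in the abstract above (a per-base signal, smoothing, segmentation into small blocks, then clustering into larger annotations) can be sketched roughly as follows. This is only an illustrative toy, not the method of the review: the function names, the moving-average smoother, the fixed threshold and the gap-based merging rule are all assumptions chosen for brevity.

```python
def smooth(signal, window=3):
    """Centered moving average; positions near the ends use the neighbours available."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def segment(signal, threshold):
    """Return half-open (start, end) intervals where the signal is >= threshold."""
    blocks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                      # a block opens here
        elif value < threshold and start is not None:
            blocks.append((start, i))      # the block closes just before i
            start = None
    if start is not None:
        blocks.append((start, len(signal)))
    return blocks


def merge_blocks(blocks, max_gap):
    """Cluster consecutive blocks whose gap is at most max_gap positions."""
    merged = []
    for start, end in blocks:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Hypothetical per-base signal (e.g. read depth); the values are made up.
raw = [0, 0, 9, 9, 9, 0, 0, 0, 9, 9, 0, 0]
small_blocks = segment(smooth(raw), threshold=5)
larger_annotations = merge_blocks(small_blocks, max_gap=3)
print(small_blocks, larger_annotations)
```

    In practice the smoothing kernel, the threshold and the merging rule would be chosen per experiment type; the sketch only shows how the three stages feed into one another.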

  • Initiation of the TORC1-Regulated G(0) Program Requires Igo1/2, which License Specific mRNAs to Evade Degradation via the 5'-3' mRNA Decay Pathway MOLECULAR CELL Talarek, N., Cameroni, E., Jaquenoud, M., Luo, X., Bontron, S., Lippman, S., Devgan, G., Snyder, M., Broach, J. R., De Virgilio, C. 2010; 38 (3): 345-355

    Abstract

    Eukaryotic cell proliferation is controlled by growth factors and essential nutrients, in the absence of which cells may enter into a quiescent (G(0)) state. In yeast, nitrogen and/or carbon limitation causes downregulation of the conserved TORC1 and PKA signaling pathways and, consequently, activation of the PAS kinase Rim15, which orchestrates G(0) program initiation and ensures proper life span by controlling distal readouts, including the expression of specific genes. Here, we report that Rim15 coordinates transcription with posttranscriptional mRNA protection by phosphorylating the paralogous Igo1 and Igo2 proteins. This event, which stimulates Igo proteins to associate with the mRNA decapping activator Dhh1, shelters newly expressed mRNAs from degradation via the 5'-3' mRNA decay pathway, thereby enabling their proper translation during initiation of the G(0) program. These results delineate a likely conserved mechanism by which nutrient limitation leads to stabilization of specific mRNAs that are critical for cell differentiation and life span.

    View details for DOI 10.1016/j.molcel.2010.02.039

    View details for Web of Science ID 000277818400006

    View details for PubMedID 20471941

  • MOTIPS: Automated Motif Analysis for Predicting Targets of Modular Protein Domains BMC BIOINFORMATICS Lam, H. Y., Kim, P. M., Mok, J., Tonikian, R., Sidhu, S. S., Turk, B. E., Snyder, M., Gerstein, M. B. 2010; 11

    Abstract

    Many protein interactions, especially those involved in signaling, involve short linear motifs consisting of 5-10 amino acid residues that interact with modular protein domains such as the SH3 binding domains and the kinase catalytic domains. One straightforward way of identifying these interactions is by scanning for matches to the motif against all the sequences in a target proteome. However, predicting domain targets by motif sequence alone without considering other genomic and structural information has been shown to be lacking in accuracy. We developed an efficient search algorithm to scan the target proteome for potential domain targets and to increase the accuracy of each hit by integrating a variety of pre-computed features, such as conservation, surface propensity, and disorder. The integration is performed using naïve Bayes and a training set of validated experiments. By integrating a variety of biologically relevant features to predict domain targets, we demonstrated a notably improved prediction of modular protein domain targets. Combined with emerging high-resolution data of domain specificities, we believe that our approach can assist in the reconstruction of many signaling pathways.

    View details for DOI 10.1186/1471-2105-11-243

    View details for Web of Science ID 000279728900007

    View details for PubMedID 20459839
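
    The two ideas in the abstract above, scanning a proteome for motif matches and then scoring each hit by combining independent features with naïve Bayes, can be illustrated with a minimal sketch. This is not the MOTIPS implementation: the regular-expression motif, the toy sequences, the feature names and the likelihood values are all hypothetical and chosen only to make the example runnable.

```python
import math
import re


def scan_motif(proteome, pattern):
    """Yield (protein_id, start, matched_text) for every match, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    regex = re.compile("(?=(" + pattern + "))")
    for protein_id, sequence in proteome.items():
        for match in regex.finditer(sequence):
            yield protein_id, match.start(), match.group(1)


def naive_bayes_log_odds(features, likelihoods, prior_pos=0.5):
    """Log-odds that a hit is a true target, treating features as independent.

    likelihoods[name] = (P(feature present | true target),
                         P(feature present | non-target)).
    """
    score = math.log(prior_pos / (1.0 - prior_pos))
    for name, present in features.items():
        p_true, p_false = likelihoods[name]
        if present:
            score += math.log(p_true / p_false)
        else:
            score += math.log((1.0 - p_true) / (1.0 - p_false))
    return score


# Hypothetical toy proteome and a simple P..P pattern (assumed for illustration).
proteome = {"protA": "MKPLLPAAPSTPEE", "protB": "MAAAGGG"}
hits = list(scan_motif(proteome, "P..P"))

# Made-up likelihoods; in a real setting these would be estimated from a
# training set of validated interactions.
likelihoods = {"conserved": (0.8, 0.2), "disordered": (0.7, 0.3)}
supported = naive_bayes_log_odds({"conserved": True, "disordered": True}, likelihoods)
unsupported = naive_bayes_log_odds({"conserved": False, "disordered": False}, likelihoods)
print(hits, supported, unsupported)
```

    A positive log-odds score means the features favour a true target under these assumed likelihoods; sequence matches alone would give all three hits in protA equal weight.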

  • Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells NATURE STRUCTURAL & MOLECULAR BIOLOGY Moqtaderi, Z., Wang, J., Raha, D., White, R. J., Snyder, M., Weng, Z., Struhl, K. 2010; 17 (5): 635-U139

    Abstract

    Genome-wide occupancy profiles of five components of the RNA polymerase III (Pol III) machinery in human cells identified the expected tRNA and noncoding RNA targets and revealed many additional Pol III-associated loci, mostly near short interspersed elements (SINEs). Several genes are targets of an alternative transcription factor IIIB (TFIIIB) containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike nonexpressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner.

    View details for DOI 10.1038/nsmb.1794

    View details for Web of Science ID 000277330700020

    View details for PubMedID 20418883

    View details for PubMedCentralID PMC3350333

  • Genetic analysis of variation in transcription factor binding in yeast NATURE Zheng, W., Zhao, H., Mancera, E., Steinmetz, L. M., Snyder, M. 2010; 464 (7292): 1187-U106

    Abstract

    Variation in transcriptional regulation is thought to be a major cause of phenotypic diversity. Although widespread differences in gene expression among individuals of a species have been observed, studies to examine the variability of transcription factor binding on a global scale have not been performed, and thus the extent and underlying genetic basis of transcription factor binding diversity is unknown. By mapping differences in transcription factor binding among individuals, here we present the genetic basis of such variation on a genome-wide scale. Whole-genome Ste12-binding profiles were determined using chromatin immunoprecipitation coupled with DNA sequencing in pheromone-treated cells of 43 segregants of a cross between two highly diverged yeast strains and their parental lines. We identified extensive Ste12-binding variation among individuals, and mapped underlying cis- and trans-acting loci responsible for such variation. We showed that most transcription factor binding variation is cis-linked, and that many variations are associated with polymorphisms residing in the binding motifs of Ste12 as well as those of several proposed Ste12 cofactors. We also identified two trans-factors, AMN1 and FLO8, that modulate Ste12 binding to promoters of more than ten genes under alpha-factor treatment. Neither of these two genes was previously known to regulate Ste12, and we suggest that they may be mediators of gene activity and phenotypic diversity. Ste12 binding strongly correlates with gene expression for more than 200 genes, indicating that binding variation is functional. Many of the variable-bound genes are involved in cell wall organization and biogenesis. Overall, these studies identified genetic regulators of molecular diversity among individuals and provide new insights into mechanisms of gene regulation.

    View details for DOI 10.1038/nature08934

    View details for Web of Science ID 000276891100036

    View details for PubMedID 20237471

    View details for PubMedCentralID PMC2941147

  • Variation in Transcription Factor Binding Among Humans SCIENCE Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S. M., Habegger, L., Rozowsky, J., Shi, M., Urban, A. E., Hong, M., Karczewski, K. J., Huber, W., Weissman, S. M., Gerstein, M. B., Korbel, J. O., Snyder, M. 2010; 328 (5975): 232-235

    Abstract

    Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (PolII) and a key regulator of immune responses, nuclear factor kappaB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing PolII binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.

    View details for DOI 10.1126/science.1183621

    View details for Web of Science ID 000276459600043

    View details for PubMedID 20299548

    View details for PubMedCentralID PMC2938768

  • Molecular Mechanisms of Ethanol-Induced Pathogenesis Revealed by RNA-Sequencing PLOS PATHOGENS Camarena, L., Bruno, V., Euskirchen, G., Poggio, S., Snyder, M. 2010; 6 (4)

    Abstract

    Acinetobacter baumannii is a common pathogen whose recent resistance to drugs has emerged as a major health problem. Ethanol has been found to increase the virulence of A. baumannii in Dictyostelium discoideum and Caenorhabditis elegans models of infection. To better understand the causes of this effect, we examined the transcriptional profile of A. baumannii grown in the presence or absence of ethanol using RNA-Seq. Using the Illumina/Solexa platform, a total of 43,453,960 reads (35 nt) were obtained, of which 3,596,474 mapped uniquely to the genome. Our analysis revealed that ethanol induces the expression of 49 genes that belong to different functional categories. A strong induction was observed for genes encoding metabolic enzymes, indicating that ethanol is efficiently assimilated. In addition, we detected the induction of genes encoding stress proteins, including upsA, hsp90, groEL and lon as well as permeases, efflux pumps and a secreted phospholipase C. In stationary phase, ethanol strongly induced several genes involved with iron assimilation and a high-affinity phosphate transport system, indicating that A. baumannii makes a better use of the iron and phosphate resources in the medium when ethanol is used as a carbon source. To evaluate the role of phospholipase C (Plc1) in virulence, we generated and analyzed a deletion mutant for plc1. This strain exhibits a modest, but reproducible, reduction in the cytotoxic effect caused by A. baumannii on epithelial cells, suggesting that phospholipase C is important for virulence. Overall, our results indicate the power of applying RNA-Seq to identify key modulators of bacterial pathogenesis. We suggest that the effect of ethanol on the virulence of A. baumannii is multifactorial and includes a general stress response and other specific components such as phospholipase C.

    View details for DOI 10.1371/journal.ppat.1000834

    View details for Web of Science ID 000277722400007

    View details for PubMedID 20368969

    View details for PubMedCentralID PMC2848557

  • Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wu, J. Q., Habegger, L., Noisa, P., Szekely, A., Qiu, C., Hutchison, S., Raha, D., Egholm, M., Lin, H., Weissman, S., Cui, W., Gerstein, M., Snyder, M. 2010; 107 (11): 5254-5259

    Abstract

    To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of hESCs into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation, namely N1 (early initiation), N2 (neural progenitor), and N3 (early glial-like), were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.

    View details for DOI 10.1073/pnas.0914114107

    View details for Web of Science ID 000275714300079

    View details for PubMedID 20194744

    View details for PubMedCentralID PMC2841935

  • Personal genome sequencing: current approaches and challenges GENES & DEVELOPMENT Snyder, M., Du, J., Gerstein, M. 2010; 24 (5): 423-431

    Abstract

    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., "personal genomes." Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences.

    View details for DOI 10.1101/gad.1864110

    View details for Web of Science ID 000275055900001

    View details for PubMedID 20194435

    View details for PubMedCentralID PMC2827837

  • X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Yasukochi, Y., Maruyama, O., Mahajan, M. C., Padden, C., Euskirchen, G. M., Schulz, V., Hirakawa, H., Kuhara, S., Pan, X., Newburger, P. E., Snyder, M., Weissman, S. M. 2010; 107 (8): 3704-3709

    Abstract

    The DNA methylation status of human X chromosomes from male and female neutrophils was identified by high-throughput sequencing of HpaII and MspI digested fragments. In the intergenic and intragenic regions on the X chromosome, the sites outside CpG islands were heavily hypermethylated to the same degree in both genders. Nearly half of X chromosome promoters were either hypomethylated or hypermethylated in both females and males. Nearly one third of X chromosome promoters were a mixture of hypomethylated and heterogeneously methylated sites in females and were hypomethylated in males. Thus, a large fraction of genes that are silenced on the inactive X chromosome are hypomethylated in their promoter regions. These genes frequently belong to the evolutionarily younger strata of the X chromosome. The promoters that were hypomethylated at more than two sites contained most of the genes that escaped silencing on the inactive X chromosome. The overall levels of expression of X-linked genes were indistinguishable in females and males, regardless of the methylation state of the inactive X chromosome. Thus, in addition to DNA methylation, other factors are involved in the fine tuning of gene dosage compensation in neutrophils.

    View details for DOI 10.1073/pnas.0914812107

    View details for Web of Science ID 000275130900077

    View details for PubMedID 20133578

  • Close association of RNA polymerase II and many transcription factors with Pol III genes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Raha, D., Wang, Z., Moqtaderi, Z., Wu, L., Zhong, G., Gerstein, M., Struhl, K., Snyder, M. 2010; 107 (8): 3639-3644

    Abstract

    Transcription of eukaryotic genomes is carried out by three distinct RNA polymerases, I, II, and III, whereby each polymerase is thought to independently transcribe a distinct set of genes. To investigate a possible relationship of RNA polymerases II and III, we mapped their in vivo binding sites throughout the human genome by using ChIP-Seq in two different cell lines, GM12878 and K562 cells. Pol III was found to bind near many known genes as well as several previously unidentified target genes. RNA-Seq studies indicate that a majority of the bound genes are expressed, although a subset are not, suggestive of stalling by RNA polymerase III. Pol II was found to bind near many known Pol III genes, including tRNA, U6, HVG, hY, 7SK and previously unidentified Pol III target genes. Similarly, in vivo binding studies also reveal that a number of transcription factors normally associated with Pol II transcription, including c-Fos, c-Jun and c-Myc, also tightly associate with most Pol III-transcribed genes. Inhibition of Pol II activity using alpha-amanitin reduced expression of a number of Pol III genes (e.g., U6, hY, HVG), suggesting that Pol II plays an important role in regulating their transcription. These results indicate that, contrary to previous expectations, polymerases can often work with one another to globally coordinate gene expression.

    View details for DOI 10.1073/pnas.0911315106

    View details for Web of Science ID 000275130900066

    View details for PubMedID 20139302

    View details for PubMedCentralID PMC2840497

  • Deciphering Protein Kinase Specificity Through Large-Scale Analysis of Yeast Phosphorylation Site Motifs SCIENCE SIGNALING Mok, J., Kim, P. M., Lam, H. Y., Piccirillo, S., Zhou, X., Jeschke, G. R., Sheridan, D. L., Parker, S. A., Desai, V., Jwa, M., Cameroni, E., Niu, H., Good, M., Remenyi, A., Ma, J. N., Sheu, Y., Sassi, H. E., Sopko, R., Chan, C. S., De Virgilio, C., Hollingsworth, N. M., Lim, W. A., Stern, D. F., Stillman, B., Andrews, B. J., Gerstein, M. B., Snyder, M., Turk, B. E. 2010; 3 (109)

    Abstract

    Phosphorylation is a universal mechanism for regulating cell behavior in eukaryotes. Although protein kinases target short linear sequence motifs on their substrates, the rules for kinase substrate recognition are not completely understood. We used a rapid peptide screening approach to determine consensus phosphorylation site motifs targeted by 61 of the 122 kinases in Saccharomyces cerevisiae. By correlating these motifs with kinase primary sequence, we uncovered previously unappreciated rules for determining specificity within the kinase family, including a residue determining P-3 arginine specificity among members of the CMGC [CDK (cyclin-dependent kinase), MAPK (mitogen-activated protein kinase), GSK (glycogen synthase kinase), and CDK-like] group of kinases. Furthermore, computational scanning of the yeast proteome enabled the prediction of thousands of new kinase-substrate relationships. We experimentally verified several candidate substrates of the Prk1 family of kinases in vitro and in vivo and identified a protein substrate of the kinase Vhs1. Together, these results elucidate how kinase catalytic domains recognize their phosphorylation targets and suggest general avenues for the identification of previously unknown kinase substrates across eukaryotes.

    View details for DOI 10.1126/scisignal.2000482

    View details for Web of Science ID 000275647900005

    View details for PubMedID 20159853

    View details for PubMedCentralID PMC2846625
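
The proteome scanning mentioned in the abstract above can be illustrated with a deliberately simplified sketch. It assumes a motif is just a set of allowed residues at fixed offsets from the phosphoacceptor (here a hypothetical "arginine at P-3" rule); the function name `scan_motif`, the example sequence, and the motif encoding are illustrative assumptions, not the scoring scheme used in the paper.

```python
def scan_motif(sequence, motif, acceptors="ST"):
    """Return 0-based positions of phosphoacceptor residues that satisfy a motif.

    `motif` maps an offset relative to the phosphoacceptor (e.g. -3 for P-3)
    to a string of allowed residues at that offset. This is a hypothetical,
    all-or-nothing encoding chosen for illustration only.
    """
    hits = []
    for i, residue in enumerate(sequence):
        if residue not in acceptors:
            continue  # only serine/threonine are considered as acceptors here
        matches = True
        for offset, allowed in motif.items():
            j = i + offset
            # Reject sites whose motif positions fall off the sequence
            # or carry a residue that the motif does not allow.
            if j < 0 or j >= len(sequence) or sequence[j] not in allowed:
                matches = False
                break
        if matches:
            hits.append(i)
    return hits


# Hypothetical example: require arginine three residues before the acceptor.
example_hits = scan_motif("MARKASLRGGTPS", {-3: "R"})
```

A real scan would more plausibly score each site against position-specific residue preferences and rank candidates, rather than apply a yes/no rule as this sketch does.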

  • Genome-Wide Identification of Binding Sites Defines Distinct Functions for Caenorhabditis elegans PHA-4/FOXA in Development and Environmental Response PLOS GENETICS Zhong, M., Niu, W., Lu, Z. J., Sarov, M., Murray, J. I., Janette, J., Raha, D., Sheaffer, K. L., Lam, H. Y., Preston, E., Slightham, C., Hillier, L. W., Brock, T., Agarwal, A., Auerbach, R., Hyman, A. A., Gerstein, M., Mango, S. E., Kim, S. K., Waterston, R. H., Reinke, V., Snyder, M. 2010; 6 (2)

    Abstract

    Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.

    View details for DOI 10.1371/journal.pgen.1000848

    View details for Web of Science ID 000275262700016

    View details for PubMedID 20174564

    View details for PubMedCentralID PMC2824807

  • Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library NATURE BIOTECHNOLOGY Lam, H. Y., Mu, X. J., Stuetz, A. M., Tanzer, A., Cayting, P. D., Snyder, M., Kim, P. M., Korbel, J. O., Gerstein, M. B. 2010; 28 (1): 47-U76

    Abstract

    Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.

    View details for DOI 10.1038/nbt.1600

    View details for Web of Science ID 000273430400020

    View details for PubMedID 20037582
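
The idea of scanning short reads against a breakpoint library, as described in the abstract above, can be sketched in a few lines. This is a minimal illustration under strong assumptions (exact string matching, deletions only, toy sequences); the function names and the `min_overlap` parameter are hypothetical and are not taken from the BreakSeq software.

```python
def deletion_junction(reference, del_start, del_end, flank):
    """Build the junction sequence of a deletion allele.

    The deleted interval is reference[del_start:del_end]. Returns the
    joined flanks and the offset of the breakpoint within that sequence.
    """
    left = reference[max(0, del_start - flank):del_start]
    right = reference[del_end:del_end + flank]
    return left + right, len(left)


def read_spans_junction(read, junction, breakpoint, min_overlap):
    """True if the read matches the junction exactly and covers at least
    `min_overlap` bases on each side of the breakpoint."""
    pos = junction.find(read)
    while pos != -1:
        covers_left = pos + min_overlap <= breakpoint
        covers_right = pos + len(read) >= breakpoint + min_overlap
        if covers_left and covers_right:
            return True
        pos = junction.find(read, pos + 1)  # try the next occurrence
    return False


def count_supporting_reads(reads, junction, breakpoint, min_overlap=2):
    """Count reads that support the deletion allele at this junction."""
    return sum(read_spans_junction(r, junction, breakpoint, min_overlap) for r in reads)


# Toy example: delete the middle eight bases of a 16-base reference.
junction, breakpoint = deletion_junction("AAAACCCCGGGGTTTT", 4, 12, flank=4)
support = count_supporting_reads(["AATT", "AAAA", "CCGG"], junction, breakpoint)
```

Requiring a minimum overlap on both sides keeps reads that lie entirely within one flank from being counted as evidence for the variant; a practical tool would also need to tolerate sequencing errors, which exact matching does not.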

  • CHIP-SEQ: USING HIGH-THROUGHPUT DNA SEQUENCING FOR GENOME-WIDE IDENTIFICATION OF TRANSCRIPTION FACTOR BINDING SITES METHODS IN ENZYMOLOGY, VOL 470: GUIDE TO YEAST GENETICS: Lefrancois, P., Zheng, W., Snyder, M. 2010; 470: 77-104

    Abstract

    Much of eukaryotic gene regulation is mediated by binding of transcription factors near or within their target genes. Transcription factor binding sites (TFBS) are often identified globally using chromatin immunoprecipitation (ChIP) in which specific protein-DNA interactions are isolated using an antibody against the factor of interest. Coupling ChIP with high-throughput DNA sequencing allows identification of TFBS in a direct, unbiased fashion; this technique is termed ChIP-Sequencing (ChIP-Seq). In this chapter, we describe the yeast ChIP-Seq procedure, including the protocols for ChIP, input DNA preparation, and Illumina DNA sequencing library preparation. Descriptions of Illumina sequencing and data processing and analysis are also included. The use of multiplex short-read sequencing (i.e., barcoding) enables the analysis of many ChIP samples simultaneously, which is especially valuable for organisms with small genomes such as yeast.

    View details for DOI 10.1016/S0076-6879(10)70004-5

    View details for Web of Science ID 000275827900004

    View details for PubMedID 20946807

  • RNA-Seq: a method for comprehensive transcriptome analysis. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Nagalakshmi, U., Waern, K., Snyder, M. 2010; Chapter 4: Unit 4.11.1-13

    Abstract

    A recently developed technique called RNA Sequencing (RNA-Seq) uses massively parallel sequencing to allow transcriptome analyses of genomes at a far higher resolution than is available with Sanger sequencing- and microarray-based methods. In the RNA-Seq method, complementary DNAs (cDNAs) generated from the RNA of interest are directly sequenced using next-generation sequencing technologies. The reads obtained from this can then be aligned to a reference genome in order to construct a whole-genome transcriptome map. RNA-Seq has been used successfully to precisely quantify transcript levels, confirm or revise previously annotated 5' and 3' ends of genes, and map exon/intron boundaries. This unit describes protocols for performing RNA-Seq using the Illumina sequencing platform.

    View details for DOI 10.1002/0471142727.mb0411s89

    View details for PubMedID 20069539

  • Systems biology approaches to disease marker discovery DISEASE MARKERS Sharon, D., Chen, R., Snyder, M. 2010; 28 (4): 209-224

    Abstract

    Our understanding of human disease and potential therapeutics is improving rapidly. In order to take advantage of these developments it is important to be able to identify disease markers. Many new high-throughput genomics and proteomics technologies are being implemented to identify candidate disease markers. These technologies include protein microarrays, next-generation DNA sequencing and mass spectrometry platforms. Such methods are particularly important for elucidating the repertoire of molecular markers in the genome, transcriptome, proteome and metabolome of patients with diseases such as cancer, autoimmune diseases, and viral infections, resulting from the disruption of many biological pathways. These new technologies have identified many potential disease markers. These markers are expected to be valuable to achieve the promise of truly personalized medicine.

    View details for DOI 10.3233/DMA-2010-0707

    View details for Web of Science ID 000279321200003

    View details for PubMedID 20534906

  • EBNA1 regulates cellular gene expression by binding cellular promoters PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Canaan, A., Haviv, I., Urban, A. E., Schulz, V. P., Hartman, S., Zhang, Z., Palejev, D., Deisseroth, A. B., Lacy, J., Snyder, M., Gerstein, M., Weissman, S. M. 2009; 106 (52): 22421-22426

    Abstract

    Epstein-Barr virus (EBV) is associated with several types of lymphomas and epithelial tumors including Burkitt's lymphoma (BL), HIV-associated lymphoma, posttransplant lymphoproliferative disorder, and nasopharyngeal carcinoma. EBV nuclear antigen 1 (EBNA1) is expressed in all EBV-associated tumors and is required for latency and transformation. EBNA1 initiates latent viral replication in B cells, maintains the viral genome copy number, and regulates transcription of other EBV-encoded latent genes. These activities are mediated through the ability of EBNA1 to bind viral DNA. To further elucidate the role of EBNA1 in the host cell, we have examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1. Analysis of the data revealed distinct profiles of cellular gene changes in BJAB and 293 cell lines. Subsequently, chromatin immunoprecipitation revealed a direct binding of EBNA1 to cellular promoters. We have correlated EBNA1-bound promoters with changes in gene expression. Sequence analysis of the 100 promoters most enriched revealed a DNA motif that differs from the EBNA1 binding site in the EBV genome.

    View details for DOI 10.1073/pnas.0911676106

    View details for Web of Science ID 000273178700069

    View details for PubMedID 20080792

  • Mapping accessible chromatin regions using Sono-Seq PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Auerbach, R. K., Euskirchen, G., Rozowsky, J., Lamarre-Vincent, N., Moqtaderi, Z., Lefrancois, P., Struhl, K., Gerstein, M., Snyder, M. 2009; 106 (35): 14926-14931

    Abstract

    Disruptions in local chromatin structure often indicate features of biological interest such as regulatory regions. We find that sonication of cross-linked chromatin, when combined with a size-selection step and massively parallel short-read sequencing, can be used as a method (Sono-Seq) to map locations of high chromatin accessibility in promoter regions. Sono-Seq sites frequently correspond to actively transcribed promoter regions, as evidenced by their co-association with RNA Polymerase II ChIP regions, transcription start sites, histone H3 lysine 4 trimethylation (H3K4me3) marks, and CpG islands; signals over other sites, such as those bound by the CTCF insulator, are also observed. The pattern of breakage by Sono-Seq overlaps with, but is distinct from, that observed for FAIRE and DNase I hypersensitive sites. Our results demonstrate that Sono-Seq can be a useful and simple method by which to map many local alterations in chromatin structure. Furthermore, our results provide insights into the mapping of binding sites by using ChIP-Seq experiments and the value of reference samples that should be used in such experiments.

    View details for DOI 10.1073/pnas.0905443106

    View details for Web of Science ID 000269481000036

    View details for PubMedID 19706456

  • Global analysis of the glycoproteome in Saccharomyces cerevisiae reveals new roles for protein glycosylation in eukaryotes MOLECULAR SYSTEMS BIOLOGY Kung, L. A., Tao, S., Qian, J., Smith, M. G., Snyder, M., Zhu, H. 2009; 5

    Abstract

    To further understand the roles of protein glycosylation in eukaryotes, we globally identified glycan-containing proteins in yeast. A fluorescent lectin binding assay was developed and used to screen protein microarrays containing over 5000 proteins purified from yeast. A total of 534 yeast proteins were identified that bound either Concanavalin A (ConA) or Wheat-Germ Agglutinin (WGA); 406 of them were novel. Among the novel glycoproteins, 45 were validated by mobility shift upon treatment with EndoH and PNGase F, thereby extending the number of validated yeast glycoproteins to 350. In addition to many components of the secretory pathway, we identified other types of proteins, such as transcription factors and mitochondrial proteins. To further explore the role of glycosylation in mitochondrial function, the localization of four mitochondrial proteins was examined in the presence and absence of tunicamycin, an inhibitor of N-linked protein glycosylation. For two proteins, localization to the mitochondria is diminished upon tunicamycin treatment, indicating that protein glycosylation is important for protein function. Overall, our studies greatly extend our understanding of protein glycosylation in eukaryotes through the cataloguing of glycoproteins, and describe a novel role for protein glycosylation in mitochondrial protein function and localization.

    View details for DOI 10.1038/msb.2009.64

    View details for Web of Science ID 000270456400002

    View details for PubMedID 19756047

  • Impact of Chromatin Structures on DNA Processing for Genomic Analyses PLOS ONE Teytelman, L., Oezaydin, B., Zill, O., Lefrancois, P., Snyder, M., Rine, J., Eisen, M. B. 2009; 4 (8)

    Abstract

    Chromatin has an impact on recombination, repair, replication, and evolution of DNA. Here we report that chromatin structure also affects laboratory DNA manipulation in ways that distort the results of chromatin immunoprecipitation (ChIP) experiments. We initially discovered this effect at the Saccharomyces cerevisiae HMR locus, where we found that silenced chromatin was refractory to shearing, relative to euchromatin. Using input samples from ChIP-Seq studies, we detected a similar bias throughout the heterochromatic portions of the yeast genome. We also observed significant chromatin-related effects at telomeres, protein binding sites, and genes, reflected in the variation of input-Seq coverage. Experimental tests of candidate regions showed that chromatin influenced shearing at some loci, and that chromatin could also lead to enriched or depleted DNA levels in prepared samples, independently of shearing effects. Our results suggested that assays relying on immunoprecipitation of chromatin will be biased by intrinsic differences between regions packaged into different chromatin structures - biases which have been largely ignored to date. These results established the pervasiveness of this bias genome-wide, and suggested that this bias can be used to detect differences in chromatin structures across the genome.

    View details for DOI 10.1371/journal.pone.0006700

    View details for Web of Science ID 000269267400008

    View details for PubMedID 19693276

  • Intrinsic histone-DNA interactions are not the major determinant of nucleosome positions in vivo NATURE STRUCTURAL & MOLECULAR BIOLOGY Zhang, Y., Moqtaderi, Z., Rattner, B. P., Euskirchen, G., Snyder, M., Kadonaga, J. T., Liu, X. S., Struhl, K. 2009; 16 (8): 847-U70

    Abstract

    We assess the role of intrinsic histone-DNA interactions by mapping nucleosomes assembled in vitro on genomic DNA. Nucleosomes strongly prefer yeast DNA over Escherichia coli DNA, indicating that the yeast genome evolved to favor nucleosome formation. Many yeast promoter and terminator regions intrinsically disfavor nucleosome formation, and nucleosomes assembled in vitro show strong rotational positioning. Nucleosome arrays generated by the ACF assembly factor have fewer nucleosome-free regions, reduced rotational positioning and less translational positioning than obtained by intrinsic histone-DNA interactions. Notably, nucleosomes assembled in vitro have only a limited preference for specific translational positions and do not show the pattern observed in vivo. Our results argue against a genomic code for nucleosome positioning, and they suggest that the nucleosomal pattern in coding regions arises primarily from statistical positioning from a barrier near the promoter that involves some aspect of transcriptional initiation by RNA polymerase II.

    View details for DOI 10.1038/nsmb.1636

    View details for Web of Science ID 000268738700012

    View details for PubMedID 19620965

  • The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Korbel, J. O., Tirosh-Wagner, T., Urban, A. E., Chen, X., Kasowski, M., Dai, L., Grubert, F., Erdman, C., Gao, M. C., Lange, K., Sobel, E. M., Barlow, G. M., Aylsworth, A. S., Carpenter, N. J., Clark, R. D., Cohen, M. Y., Doran, E., Falik-Zaccai, T., Lewin, S. O., Lott, I. T., McGillivray, B. C., Moeschler, J. B., Pettenati, M. J., Pueschel, S. M., Rao, K. W., Shaffer, L. G., Shohat, M., Van Riper, A. J., Warburton, D., Weissman, S., Gerstein, M. B., Snyder, M., Korenberg, J. R. 2009; 106 (29): 12031-12036

    Abstract

    Down syndrome (DS), or trisomy 21, is a common disorder associated with several complex clinical phenotypes. Although several hypotheses have been put forward, it is unclear whether particular gene loci on chromosome 21 (HSA21) are sufficient to cause DS and its associated features. Here we present a high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21. By using state-of-the-art genomics technologies we mapped segmental trisomies at exon-level resolution and identified discrete regions of 1.8-16.3 Mb likely to be involved in the development of 8 DS phenotypes, 4 of which are congenital malformations, including acute megakaryocytic leukemia, transient myeloproliferative disorder, Hirschsprung disease, duodenal stenosis, imperforate anus, severe mental retardation, DS-Alzheimer Disease, and DS-specific congenital heart disease (DSCHD). Our DS-phenotypic maps located DSCHD to a <2-Mb interval. Furthermore, the map enabled us to present evidence against the necessary involvement of other loci as well as specific hypotheses that have been put forward in relation to the etiology of DS, i.e., the presence of a single DS consensus region and the sufficiency of DSCR1 and DYRK1A, or APP, in causing several severe DS phenotypes. Our study demonstrates the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.

    View details for DOI 10.1073/pnas.0813248106

    View details for Web of Science ID 000268178400040

    View details for PubMedID 19597142

  • Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants PLOS COMPUTATIONAL BIOLOGY Du, J., Bjornson, R. D., Zhang, Z. D., Kong, Y., Snyder, M., Gerstein, M. B. 2009; 5 (7)

    Abstract

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

    View details for DOI 10.1371/journal.pcbi.1000432

    View details for Web of Science ID 000269220100023

    View details for PubMedID 19593373

  • Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy: The Amsterdam Principles JOURNAL OF PROTEOME RESEARCH Rodriguez, H., Snyder, M., Uhlen, M., Andrews, P., Beavis, R., Borchers, C., Chalkley, R. J., Cho, S. Y., Cottingham, K., Dunn, M., Dylag, T., Edgar, R., Hare, P., Heck, A. J., Hirsch, R. F., Kennedy, K., Kolar, P., Kraus, H., Mallick, P., Nesvizhskii, A., Ping, P., Ponten, F., Yang, L., Yates, J. R., Stein, S. E., Hermjakob, H., Kinsinger, C. R., Apweiler, R. 2009; 8 (7): 3689-3692

    Abstract

    Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a precompetitive basis. On August 14, 2008, the National Cancer Institute (NCI) convened the "International Summit on Proteomics Data Release and Sharing Policy" in Amsterdam, The Netherlands, to identify and address potential roadblocks to rapid and open access to data. The six principles agreed upon by key stakeholders at the summit addressed issues surrounding (1) timing, (2) comprehensiveness, (3) format, (4) deposition to repositories, (5) quality metrics, and (6) responsibility for proteomics data release. This summit report explores various approaches to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.

    View details for DOI 10.1021/pr900023z

    View details for Web of Science ID 000267694600043

    View details for PubMedID 19344107

  • Unlocking the secrets of the genome NATURE Celniker, S. E., Dillon, L. A., Gerstein, M. B., Gunsalus, K. C., Henikoff, S., Karpen, G. H., Kellis, M., Lai, E. C., Lieb, J. D., MacAlpine, D. M., Micklem, G., Piano, F., Snyder, M., Stein, L., White, K. P., Waterston, R. H. 2009; 459 (7249): 927-930

    View details for DOI 10.1038/459927a

    View details for Web of Science ID 000267063500031

    View details for PubMedID 19536255

  • Dynamic and complex transcription factor binding during an inducible response in yeast GENES & DEVELOPMENT Ni, L., Bruce, C., Hart, C., Leigh-Bell, J., Gelperin, D., Umansky, L., Gerstein, M. B., Snyder, M. 2009; 23 (11): 1351-1363

    Abstract

    Complex biological processes are often regulated, at least in part, by the binding of transcription factors to their targets. Recently, considerable effort has been made to analyze the binding of relevant factors to the suite of targets they regulate, thereby generating a regulatory circuit map. However, for most studies the dynamics of binding have not been analyzed, and thus the temporal order of events and mechanisms by which this occurs are poorly understood. We globally analyzed in detail the temporal order of binding of several key factors involved in the salt response of yeast to their target genes. Analysis of Yap4 and Sko1 binding to their target genes revealed multiple temporal classes of binding patterns: (1) constant binding, (2) rapid induction, (3) slow induction, and (4) transient induction. These results demonstrate that individual transcription factors can have multiple binding patterns and help define the different types of temporal binding patterns used in eukaryotic gene regulation. To investigate these binding patterns further, we also analyzed the binding of seven other key transcription factors implicated in osmotic regulation, including Hot1, Msn1, Msn2, Msn4, Skn7, and Yap6, and found significant coassociation among the different factors at their gene targets. Moreover, the binding of several key factors was correlated with distinct classes of Yap4- and Sko1-binding patterns and with distinct types of genes. Gene expression studies revealed association of Yap4, Sko1, and other transcription factor-binding patterns with different gene expression patterns. The integration and analysis of binding and expression information reveals a complex dynamic and hierarchical circuit in which specific combinations of transcription factors target distinct sets of genes at discrete times to coordinate a rapid and important biological response.

    View details for DOI 10.1101/gad.1781909

    View details for Web of Science ID 000266524100009

    View details for PubMedID 19487574

  • Integrated analysis of co-expressed MAP kinase substrates in Arabidopsis thaliana. Plant signaling & behavior Popescu, S. C., Popescu, G. V., Snyder, M., Dinesh-Kumar, S. P. 2009; 4 (6): 524-527

    View details for PubMedID 19816141

  • Distinct Genomic Aberrations Associated with ERG Rearranged Prostate Cancer GENES CHROMOSOMES & CANCER Demichelis, F., Setlur, S. R., Beroukhim, R., Perner, S., Korbel, J. O., LaFargue, C. J., Pflueger, D., Pina, C., Hofer, M. D., Sboner, A., Svensson, M. A., Rickman, D. S., Urban, A., Snyder, M., Meyerson, M., Lee, C., Gerstein, M. B., Kuefer, R., Rubin, M. A. 2009; 48 (4): 366-380

    Abstract

    Emerging molecular and clinical data suggest that ETS fusion prostate cancer represents a distinct molecular subclass, driven most commonly by a hormonally regulated promoter and characterized by an aggressive natural history. The study of the genomic landscape of prostate cancer in the light of ETS fusion events is required to understand the foundation of this molecularly and clinically distinct subtype. We performed genome-wide profiling of 49 primary prostate cancers and identified 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses. Co-occurring events included losses at 19q13.32 and 1p22.1. We discovered three genomic events associated with ERG rearranged prostate cancer, affecting 6q, 7q, and 16q. 6q loss in nonrearranged prostate cancer is accompanied by gene expression deregulation in an independent dataset and by protein deregulation of MYO6. To analyze copy number alterations within the ETS genes, we performed a comprehensive analysis of all 27 ETS genes and of the 3 Mbp genomic area between ERG and TMPRSS2 (21q) with an unprecedented resolution (30 bp). We demonstrate that high-resolution tiling arrays can be used to pin-point breakpoints leading to fusion events. This study provides further support to define a distinct molecular subtype of prostate cancer based on the presence of ETS gene rearrangements.

    View details for DOI 10.1002/gcc.20647

    View details for Web of Science ID 000263572700007

    View details for PubMedID 19156837

  • A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster BLOOD Zhang, X., Lian, Z., Padden, C., Gerstein, M. B., Rozowsky, J., Snyder, M., Gingeras, T. R., Kapranov, P., Weissman, S. M., Newburger, P. E. 2009; 113 (11): 2526-2534

    Abstract

    We have identified an intergenic transcriptional activity that is located between the human HOXA1 and HOXA2 genes, shows myeloid-specific expression, and is up-regulated during granulocytic differentiation. The novel gene, termed HOTAIRM1 (HOX antisense intergenic RNA myeloid 1), is transcribed antisense to the HOXA genes and originates from the same CpG island that embeds the start site of HOXA1. The transcript appears to be a noncoding RNA containing no long open-reading frame; sucrose gradient analysis shows no association with polyribosomal fractions. HOTAIRM1 is the most prominent intergenic transcript expressed and up-regulated during induced granulocytic differentiation of NB4 promyelocytic leukemia and normal human hematopoietic cells; its expression is specific to the myeloid lineage. Its induction during retinoic acid (RA)-driven granulocytic differentiation is through RA receptor and may depend on the expression of myeloid cell development factors targeted by RA signaling. Knockdown of HOTAIRM1 quantitatively blunted RA-induced expression of HOXA1 and HOXA4 during the myeloid differentiation of NB4 cells, and selectively attenuated induction of transcripts for the myeloid differentiation genes CD11b and CD18, but did not noticeably impact the more distal HOXA genes. These findings suggest that HOTAIRM1 plays a role in the myelopoiesis through modulation of gene expression in the HOXA cluster.

    View details for DOI 10.1182/blood-2008-06-162164

    View details for Web of Science ID 000264110600021

    View details for PubMedID 19144990

  • A high throughput embryonic stem cell screen identifies Oct-2 as a bifunctional regulator of neuronal differentiation GENES & DEVELOPMENT Theodorou, E., Dalembert, G., Heffelfinger, C., White, E., Weissman, S., Corcoran, L., Snyder, M. 2009; 23 (5): 575-588

    Abstract

    Neuronal differentiation is a complex process that involves a plethora of regulatory steps. To identify transcription factors that influence neuronal differentiation, we developed a high throughput screen using embryonic stem (ES) cells. Seven hundred human transcription factor clones were stably introduced into mouse ES (mES) cells and screened for their ability to induce neuronal differentiation of mES cells. Twenty-four factors that are capable of inducing neuronal differentiation were identified, including four known effectors of neuronal differentiation, 11 factors with limited evidence of involvement in regulating neuronal differentiation, and nine novel factors. One transcription factor, Oct-2, was studied in detail and found to be a bifunctional regulator: it can either repress or induce neuronal differentiation, depending on the particular isoform. Ectopic expression experiments demonstrate that isoform Oct-2.4 represses neuronal differentiation, whereas Oct-2.2 activates neuron formation. Consistent with a role in neuronal differentiation, Oct-2.2 expression is induced during differentiation, and cells depleted of Oct-2 and its homolog Oct-1 have a reduced capacity to differentiate into neurons. Our results reveal a number of transcription factors potentially important for mammalian neuronal differentiation, and indicate that Oct-2 may serve as a binary switch to repress differentiation in precursor cells and induce neuronal differentiation later during neuronal development.

    View details for DOI 10.1101/gad.1772509

    View details for Web of Science ID 000263918500005

    View details for PubMedID 19270158

  • Quantifying environmental adaptation of metabolic pathways in metagenomics PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gianoulis, T. A., Raes, J., Patel, P. V., Bjornson, R., Korbel, J. O., Letunic, I., Yamada, T., Paccanaro, A., Jensen, L. J., Snyder, M., Bork, P., Gerstein, M. B. 2009; 106 (5): 1374-1379

    Abstract

    Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats, i.e., how network dynamics relates to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions.

    View details for DOI 10.1073/pnas.0808022106

    View details for Web of Science ID 000263074600018

    View details for PubMedID 19164758
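
The one-to-one step mentioned in the metagenomics abstract above (correlating a single, continuously varying environmental factor with the abundance of individual pathways across sampling sites) can be illustrated with a minimal sketch. This is not the authors' code, and it does not implement the many-to-many canonical correlation analysis. The function names `pearson` and `rank_pathways`, the pathway labels, and all numbers are hypothetical toy values chosen for illustration, not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_pathways(env_factor, pathway_table):
    """Rank pathways by the absolute correlation of their per-site abundance
    with one environmental factor (strongest association first)."""
    scores = {
        name: pearson(env_factor, abundance)
        for name, abundance in pathway_table.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy input: one value per sampling site (assumption, not real data).
TOY_TEMPERATURE = [4.0, 10.0, 16.0, 22.0, 28.0]
TOY_PATHWAYS = {
    "pathway_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with temperature
    "pathway_B": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with temperature
    "pathway_C": [2.0, 1.0, 3.0, 1.0, 2.0],  # no linear trend
}

if __name__ == "__main__":
    for name, r in rank_pathways(TOY_TEMPERATURE, TOY_PATHWAYS):
        print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute correlation treats strongly negative associations as being just as informative as strongly positive ones; a multivariate method such as canonical correlation analysis would instead weight many pathways and many environmental variables jointly.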

  • Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing BMC GENOMICS Lefrancois, P., Euskirchen, G. M., Auerbach, R. K., Rozowsky, J., Gibson, T., Yellman, C. M., Gerstein, M., Snyder, M. 2009; 10

    Abstract

    Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosoma